Postdoc opportunity in NLP

Postdoctoral scholarships are available in my (Alexander Gelbukh) project. If you are interested, please contact me via Skype or WhatsApp (see contact). Please do not email me; if you do, let me know on Skype or WhatsApp that you have.

Postdoctoral scholarship: approx. USD 1,200 per month. Note: This is more than twice what is needed for a normal life in this city, renting a room or a small apartment near the school.

Family: We might be able to arrange simultaneous scholarships (for example, for you and your spouse), with a total duration of up to 3 years (for example, 1.5 years each). We also have PhD and MSc scholarships, if relevant for your spouse; let me know.

Location: Center for Computing Research, Instituto Politécnico Nacional, Mexico City, Mexico. Since the funding (salary) comes from the Government of Mexico, you are required to be present in person.

Our group consists of three faculty members (all three from Russia) and 25 PhD and MSc students, currently from 8 countries; see the Lab and my personal page.

Communication inside the group is in English. Learning Spanish is not required; many of our students don’t know any Spanish.

Requirements:

· Having obtained the PhD degree within the last 5 years.

· Not being employed elsewhere.

· Full-time involvement, from 6 to 24 months.

Goals: The main goal is joint publications (yours and mine) in top journals (with a high JCR impact factor). We can agree on research topics of interest to you. However, as part of your activities, you will develop the following project. The goals and activities of this project must be fulfilled exactly and completely, though only to the extent needed to report the project to the government as done: each activity and goal fulfilled to the extent promised in the proposal, but not more than that. The rest of the time can be used for more interesting research.

Project: Multilingual analysis of veracity and author profiling in social networks

General objective: Develop deep learning-based methods with convolutional and recurrent neural networks for the analysis of veracity and author profiling in comments on Web 2.0 pages, social networks, news media, and other massive information sources on the Internet, in the form of text, in Spanish and other languages.

Specific objectives:

1. Build lexical resources and datasets large enough to be used in deep learning.

2. Develop text analysis methods for the extraction of features and construction of linguistically rich representations using novel deep learning-based architectures.

3. Develop machine-learning methods for the analysis of veracity in massive information sources on the Internet.

4. Develop machine-learning methods for the identification of author profiles in social networks.

5. Develop algorithms that will use author profile information for recommender systems.

6. Develop an opinion analysis system in the political and public administration fields using veracity analysis.

Products:

N	Product
1	At least two papers in top journals related to the construction of resources and the extraction of features with deep learning.
2	At least two papers in top journals related to the recommender system improved with author profiling and to the opinion analysis system in politics and public administration.
3	At least five MSc and/or PhD theses.
4	At least two papers in top journals related to the developed methods for veracity analysis and author profile identification.
5	A prototype system for opinion analysis using veracity analysis.
6	Machine-learning models suitable for veracity analysis in massive information sources on the Internet.
7	Neural architectures based on recurrent networks to extract relevant features from texts.
8	Neural architectures based on convolutional networks to extract relevant features from texts.
9	Machine-learning models suitable for identifying author profiles from social media messages.
10	Corpus labeled with the veracity of the text, i.e., whether the text represents truthful information.
11	Corpus tagged with author profiles (gender, age range, language variety).
12	Lexical resources (lexica) with keywords that help identify the sociolect of the author of a text.
13	Prototypes of recommender systems using information on social media author profiles.

Activities:

N	Activity
1	Study of recent literature and existing resources. Review of criteria on the amount of text needed for deep learning
2	Identify textual sources and label them manually with the author's profile (gender, age, native language) according to social media metadata
3	Generate a controlled corpus of real and fictitious opinions, using crowdsourcing
4	Vectorization of words, phrases, and documents using known techniques such as word2vec, GloVe, FastText, and doc2vec
5	Implementation of deep-learning techniques based on convolutional neural networks (CNNs) to generate vector representation models of documents
6	Implementation of deep-learning techniques based on recurrent neural networks (RNNs) to generate vector representation models of documents
7	Presentation of the developed corpora and techniques at international conferences. Preparation of papers to be published in top journals
8	Experiments with the two deep-learning architectures (CNNs and RNNs) to generate veracity analysis models
9	Evaluation of characteristics and parameters of the algorithms
10	Evaluation of the results obtained with these techniques on the developed corpus
11	Preparation of a paper related to the model generated for veracity analysis, to be published in a top journal
12	Experiments with the two deep-learning architectures (CNNs and RNNs) for author profile identification
13	Evaluation of characteristics and parameters of the algorithms
14	Evaluation of the developed system at the PAN international author profiling competition
15	Preparation of a paper related to the model generated for identification of author profiles, to be published in a top journal
16	Organization of a workshop on deep-learning techniques for NLP and presentation of the obtained results
17	Building a basic recommender system using sentiment analysis of textual data
18	Improving the basic recommender system using author profile information as an additional feature
19	Evaluation of the improved recommender system with author profiles on a well-known corpus (benchmark)
20	Obtaining textual data from social networks in real time (streaming) to perform the analysis of opinions in the political domain
21	Automatic labeling of the veracity of the messages using the models generated at the previous stage
22	Development of a graphical user interface for the analysis of the results
23	Preparation of two papers related to applied systems: the improved recommender system and the opinion analysis system
24	Organization of a workshop on deep-learning techniques for NLP and presentation of the obtained results
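
To give a flavor of Activity 4 (vectorization of words, phrases, and documents), here is a minimal sketch of one common baseline: representing a document as the average of its word vectors. This is only an illustration under stated assumptions, not the method prescribed by the project. The toy embedding table and the names TOY_VECTORS, document_vector, and cosine are hypothetical; in practice the vectors would come from a trained model such as word2vec, GloVe, or FastText.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 2-dimensional toy embeddings, standing in for vectors that
# would normally be loaded from a trained word2vec, GloVe, or FastText model.
TOY_VECTORS: Dict[str, List[float]] = {
    "news": [1.0, 0.0],
    "fake": [0.0, 1.0],
    "real": [1.0, 1.0],
}


def document_vector(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of the known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known token: fall back to the zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc_a = document_vector(["fake", "news", "unknown"], TOY_VECTORS, 2)
    doc_b = document_vector(["real", "news"], TOY_VECTORS, 2)
    print(doc_a, doc_b, cosine(doc_a, doc_b))
```

Averaging discards word order, which is one reason the plan goes on to CNN- and RNN-based document representations in Activities 5 and 6.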