The dual mission of the project is to produce world-class research in computational social science and to promote AI methods in the social science community. More specifically, the proposal outlines a research program for applying AI to advanced text analysis tasks in the social sciences. While cutting-edge articles in international journals and book projects form the core of the research proposal, the project will also make several contributions to the wider academic community and beyond. Through its framework of multi-layered international cooperation, the project will benefit social science scholars as well as interested parties in business and government. The research team will produce new corpora, software, and websites that host new datasets and offer tools for everyday users, and it will organize outreach events for non-academic audiences and training opportunities. The project constitutes a genuinely new research direction, as the focus shifts from general quantitative text analysis to the application of artificial intelligence and machine learning in text mining.
Supervised machine learning for multiclass classification is the flagship project of POLTEXTLAB. Our aim is to (1) improve the external validity of the artificial intelligence-based classification method by extending it to domains beyond the Hungarian language, (2) raise the recall rate by applying active learning methods to recalcitrant items, and (3) enhance the proposed method, first with further combinations of algorithms (an ensemble approach) and then with state-of-the-art neural networks. These extensions of the conceptual framework offer considerable potential for applications in multilingual comparative research in the social sciences and in business contexts.
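
To illustrate how an ensemble approach and the handling of recalcitrant items could fit together, the following minimal sketch accepts a label only when several classifiers agree and sets the remaining items aside for human annotation. It is an illustration under stated assumptions, not the project's actual pipeline: the keyword-based classifiers, the labels ("health", "economy", "other"), the function names, and the agreement threshold are all hypothetical, and the sketch covers only the selection step of a disagreement-based (query-by-committee style) active learning loop, without retraining.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier is any function mapping a text to a single class label.
Classifier = Callable[[str], str]


def keyword_classifier(keywords: Dict[str, str], default: str = "other") -> Classifier:
    """Build a toy classifier (hypothetical stand-in for a trained model)."""
    def classify(text: str) -> str:
        lowered = text.lower()
        for keyword, label in keywords.items():
            if keyword in lowered:
                return label
        return default
    return classify


def ensemble_vote(classifiers: Sequence[Classifier], text: str) -> Tuple[str, float]:
    """Return the majority label and the share of classifiers that chose it."""
    votes = Counter(clf(text) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label, count / len(classifiers)


def split_by_agreement(
    classifiers: Sequence[Classifier],
    texts: Sequence[str],
    min_agreement: float = 1.0,  # assumed threshold: full agreement required
) -> Tuple[Dict[str, str], List[str]]:
    """Accept confidently labeled texts; queue the rest for annotation."""
    accepted: Dict[str, str] = {}
    to_annotate: List[str] = []
    for text in texts:
        label, agreement = ensemble_vote(classifiers, text)
        if agreement >= min_agreement:
            accepted[text] = label
        else:
            to_annotate.append(text)  # recalcitrant item: classifiers disagree
    return accepted, to_annotate


# Three toy classifiers with slightly different vocabularies (assumed data).
committee = [
    keyword_classifier({"hospital": "health", "tax": "economy"}),
    keyword_classifier({"health": "health", "hospital": "health",
                        "budget": "economy", "tax": "economy"}),
    keyword_classifier({"hospital": "health", "tax": "economy",
                        "budget": "economy"}),
]

texts = [
    "New hospital funding approved",
    "Tax reform debated",
    "Budget cuts announced",
]

accepted, to_annotate = split_by_agreement(committee, texts)
```

In this toy run the first two texts receive unanimous labels, while the third is queued for annotation because one classifier disagrees. In a full active learning cycle, the newly annotated items would be added to the training set and the classifiers retrained before the next round.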

Keywords: machine learning, elaboration of machine learning algorithms, corpus construction, quantitative text analysis