What’s the shelf-life of a classification? – A classification of anti-vax comments

The broader goal of our project on COVID-19 anti-vax sentiment has been to create a policy-support platform that draws on text generated online to provide a real-time picture of the opinions and reactions of the Hungarian population. Most classification projects work with datasets covering a closed period of time. However, the rapid growth of digitalization makes ever more data available, which may require an ongoing classification process. Such continuity is another important feature of our project. While certain classification models can continuously retrain themselves on new input data, such adaptation is more difficult for models built on a fixed, manually annotated training dataset. Because retraining is costly, it is important to consider how long a model is intended to be used before it is replaced.

In our sub-project, we developed a framework between 2020 and 2021 to monitor the discourse on vaccination and vaccine scepticism in the Hungarian online space. One of the core elements of this framework is a BERT model that classifies comments according to the degree of anti-vax sentiment they express. The initial model was completed in spring 2021, based on 10,000 manually classified comments. Because data collection continued and the discourse kept evolving in the meantime, the question of the model's temporal stability arose. To examine it, we prepared a new annotation at the end of 2021, based on comments from autumn 2021.
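One way to check temporal stability is to score the same trained model on two annotated samples: a held-out sample from the original annotation period and a sample from the later period. The minimal sketch below assumes the model's predictions for both samples are already available. The three-level label scale, the toy labels, the function names, and the choice of macro-averaged F1 as the metric are all hypothetical illustrations, not the project's actual setup.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    scores = []
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def temporal_drop(in_period, later_period):
    """Score one fixed model on two periods.

    Each argument is a (gold_labels, model_predictions) pair.
    Returns (in-period score, later-period score, difference).
    """
    before = macro_f1(*in_period)
    after = macro_f1(*later_period)
    return before, after, before - after


# Hypothetical toy labels: 0 = not anti-vax, 1 = mildly anti-vax, 2 = strongly anti-vax.
spring_sample = ([0, 0, 1, 2, 2, 1], [0, 0, 1, 2, 1, 1])  # (gold, predicted)
autumn_sample = ([0, 1, 1, 2, 0, 2], [0, 0, 1, 2, 1, 1])
before, after, drop = temporal_drop(spring_sample, autumn_sample)
print(f"in-period {before:.3f}, later period {after:.3f}, drop {drop:.3f}")
# prints: in-period 0.822, later period 0.522, drop 0.300
```

A large drop on the later sample would suggest that the model has aged and may need retraining; in practice the samples would need to be far larger than this toy example for the difference to be meaningful.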

The new annotation makes it possible to examine how annotations from different periods perform, whether it is worth treating all annotated data as a single unit, and how effective special techniques are, such as rewriting comments into a different category. In 2022, we focused primarily on how far the classification of comments can be improved by using unlabelled data, from which periods it may be worth including unlabelled data, and whether it makes sense to pre-select content according to certain criteria during the annotation process.
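The text does not specify which technique was used to exploit unlabelled data. One common option is self-training (pseudo-labelling): the current model labels unlabelled comments, and only its most confident predictions are added to the training data. The sketch below shows just the selection step, grouped by collection period so that the contribution of each period can be compared. The record layout, the 0.9 threshold, the period names, and the probabilities are hypothetical.

```python
from collections import defaultdict


def select_pseudo_labels(records, threshold=0.9):
    """Keep unlabelled comments whose top predicted class probability reaches threshold.

    Each record is (period, text, probs), where probs[i] is the model's
    probability for class i. Returns {period: [(text, pseudo_label), ...]}.
    """
    selected = defaultdict(list)
    for period, text, probs in records:
        best = max(range(len(probs)), key=lambda i: probs[i])  # most likely class index
        if probs[best] >= threshold:
            selected[period].append((text, best))
    return dict(selected)


# Hypothetical unlabelled comments with made-up model probabilities.
unlabelled = [
    ("2021-spring", "comment A", [0.95, 0.03, 0.02]),
    ("2021-spring", "comment B", [0.40, 0.35, 0.25]),
    ("2021-autumn", "comment C", [0.04, 0.04, 0.92]),
    ("2021-autumn", "comment D", [0.10, 0.85, 0.05]),
]
pseudo = select_pseudo_labels(unlabelled)
for period, items in sorted(pseudo.items()):
    print(period, items)
```

Retraining separately with pseudo-labels drawn from each period and comparing the resulting scores is one way to ask which periods are worth including; the threshold trades the number of added comments against the amount of label noise they bring in.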

Project participants
Krisztián Boros
Eszter Rita Katona
Zoltán Kmetty
Bence Kollányi
Árpád Knap
Anna Vancsó

Cooperating partners
SentiOne
Ynsight

Publication
Vancsó Anna, Kmetty Zoltán. Dominant Christian narratives of solidarity during the COVID pandemic in Hungary. Intersections. East European Journal of Society and Politics. Vol. 7 No. 3, 2021, pp. 101–119

Workshop
Anti-vaccine resistance in the cross-section of three methods, 12 January 2021, online meetup

Conference papers
Kmetty Zoltán. ‘Russian vaccine will be good for our politicians; I want Pfizer’ – Changing narratives of vaccination in Hungary. 7th International Conference on Computational Social Science (IC2S2), online, 27–31 July 2021
Kmetty Zoltán. Mapping the network of anti-vaxxer and pro-vaxxer supporters. Sunbelt and NetSci Conference, online, 6–11 July 2021
Decision Support Platform Demo, AI Coalition Workshop and Exhibition, BME Building Q, 15–16 June 2021
Kmetty Zoltán. Investigating the evolution of COVID-19 vaccination-related online discourse using text mining techniques. 8th Education and Research Methodology Workshop, Semmelweis University, 3 February 2021

Demo
MILAB COVID Dashboard