In the planned project, we address the difficulties of sentiment analysis of Hungarian texts across different domains. The current toolkit for sentiment analysis of Hungarian texts is modest: only two domain-independent sentiment dictionaries are available, and they have not proved effective in some text domains. Lexicons automatically translated from other languages, even domain-specific ones, are also ineffective in many sentiment analysis tasks. The main goal of our project is to develop a domain-dependent sentiment analysis method suitable for the effective analysis of Hungarian texts in a range of domains (e.g. political news). To accomplish this, we plan to apply several NLP methods and then measure and compare the effectiveness of each.
During the project, we shall create large sentiment corpora manually annotated at the document and sentence levels. These corpora will be used to 1) reveal the qualitative and quantitative features of sentiment content in different domains, 2) create dictionaries, 3) train machine learning models, and 4) measure the effectiveness of machine learning algorithms and dictionary-based methods. For the analysis, on the one hand, we shall create domain-dependent sentiment dictionaries from our corpora via label propagation, a method that has proven effective in international projects. We shall also experiment with various modifications of this method. On the other hand, using a manually annotated training set, we shall apply supervised machine learning, in particular a cloud-based classification procedure developed in the framework of the POLTEXT project, which can classify texts into more than 20 categories with high precision (up to 95%).
Keywords: sentiment analysis, Hungarian language, domain dependency, machine learning, dictionary-based method, dictionary creation, manually annotated corpora