
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Aspect term extraction based on word embedding

https://doi.org/10.15514/ISPRAS-2016-28(6)-16

Abstract

Many sites on the Internet allow users to share opinions and write reviews about all kinds of goods and services. These opinions are useful not only to other users but also to companies that want to track their reputation and receive timely feedback on their products and services. The most detailed formulation of the problem in this area is aspect-based sentiment analysis, which determines the user's attitude not only toward the object as a whole but also toward its individual aspects. In this paper we address the subtask of aspect term extraction in aspect-based sentiment analysis. A review of research in this area is given. Aspect term extraction is treated as a sequence labeling problem, which we solve with a conditional random field (CRF) model. The feature description of a sequence is built from distributed representations of words, derived from neural network models for the Russian language, and from the parts of speech of the analyzed words. The stages of the aspect term extraction software system are shown. Experiments with the developed system were carried out on the corpus of labeled restaurant reviews created for the International Workshop on Semantic Evaluation (SemEval-2016). We describe how the quality of aspect term extraction depends on the neural network model and on the variant of the feature description. The best result (F1-measure = 69%) is achieved by the version of the system that takes into account both the context and the parts of speech. The paper contains a detailed analysis of the errors made by the system, along with suggestions for correcting them. Finally, future research directions are presented.
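
To make the sequence labeling formulation concrete, the following is a minimal sketch of how per-token features for a CRF might be assembled from word embeddings, parts of speech, and neighbouring context, and how predicted labels might be turned back into aspect terms. It is an illustration under stated assumptions, not the authors' implementation: the BIO label scheme, the window size, the toy two-dimensional vectors, the English example tokens, and the names `EMBEDDINGS`, `token_features`, and `bio_to_terms` are all hypothetical. The CRF training step itself is omitted.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings. A real system would load vectors
# from a trained neural network model for the target language.
EMBEDDINGS: Dict[str, List[float]] = {
    "service": [0.9, 0.1],
    "was": [0.0, 0.2],
    "slow": [0.1, 0.8],
}
DIM = 2
UNKNOWN = [0.0] * DIM  # fallback for padding and out-of-vocabulary words


def token_features(tokens: Sequence[str], pos_tags: Sequence[str],
                   i: int, window: int = 1) -> Dict[str, float]:
    """Build a feature dict for token i from its own and its neighbours'
    embedding components and part-of-speech tags (assumed feature layout)."""
    feats: Dict[str, float] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            vec = EMBEDDINGS.get(tokens[j].lower(), UNKNOWN)
            feats[f"pos[{offset}]={pos_tags[j]}"] = 1.0
        else:
            # Position falls outside the sentence: use a padding marker.
            vec = UNKNOWN
            feats[f"pos[{offset}]=PAD"] = 1.0
        for k, value in enumerate(vec):
            feats[f"emb[{offset}][{k}]"] = value
    return feats


def bio_to_terms(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Collect aspect terms from BIO labels: B starts a term, I continues it,
    and any other label (or a stray I) closes the current term."""
    terms: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


if __name__ == "__main__":
    sentence = ["Service", "was", "slow"]
    tags = ["NOUN", "VERB", "ADJ"]
    print(token_features(sentence, tags, 0))
    print(bio_to_terms(["The", "wine", "list", "was", "great"],
                       ["O", "B", "I", "O", "O"]))
```

With `window=0` the feature dict describes only the token itself, which corresponds to a context-free variant of the feature description; larger windows add the neighbouring words.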

About the Authors

D. O. Mashkin
Vyatka State University
Russian Federation


E. V. Kotelnikov
Vyatka State University
Russian Federation




For citations:


Mashkin D.O., Kotelnikov E.V. Aspect term extraction based on word embedding. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(6):223-240. (In Russ.) https://doi.org/10.15514/ISPRAS-2016-28(6)-16



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)