Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Ranking in keyphrase extraction problem: is it suitable to use statistics of words occurrences?

https://doi.org/10.15514/ISPRAS-2014-26(4)-10

Abstract

The paper deals with the keyphrase extraction problem for single documents, e.g. scientific abstracts. Keyphrase extraction is an important task whose results can be used in a variety of applications: data indexing, document clustering and classification, meta-information extraction, automatic ontology creation, etc. In the paper we discuss an approach to keyphrase extraction whose first step is building candidate phrases; these are then ranked, and the best are selected as keyphrases. The paper focuses on evaluating approaches to weighting candidate phrases in unsupervised extraction methods. A number of procedures for weighting the words within a phrase are evaluated. Unsuitable weighting approaches are identified, and testing shows that some approaches are equivalent when applied to keyphrase extraction. A feature is proposed that improves the quality of extracted keyphrases and shows better results than the state of the art. Experiments are based on the Inspec dataset.
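The two-stage scheme described in the abstract (build candidate phrases, then rank them by weighting the words they contain) can be illustrated with a minimal Python sketch. It is not the authors' method: splitting candidates on stopwords and punctuation, the tiny `STOPWORDS` list, and scoring a phrase by the sum of its words' occurrence counts are all assumptions chosen for illustration, and the function names `candidate_phrases` and `rank_candidates` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption); real systems
# typically use a fuller list or part-of-speech patterns to form candidates.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from",
             "in", "is", "of", "on", "or", "the", "to", "with"}


def candidate_phrases(text: str) -> list[tuple[str, ...]]:
    """Stage 1 (assumed): candidates are maximal runs of non-stopword tokens;
    punctuation and stopwords both end the current phrase."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for token in re.findall(r"[a-z][a-z-]*", fragment):
            if token in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_candidates(text: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Stage 2 (assumed weighting): score each distinct candidate by summing
    how often each of its words occurs across all candidates in the document,
    then return the top_k phrases (ties broken alphabetically)."""
    phrases = candidate_phrases(text)
    word_freq = Counter(word for phrase in phrases for word in phrase)
    scores = {" ".join(p): float(sum(word_freq[w] for w in p)) for p in set(phrases)}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

For example, `rank_candidates("Ranking of candidate phrases. Extraction of candidate phrases from text.", top_k=2)` returns `[('candidate phrases', 4.0), ('extraction', 1.0)]`. Summing word counts favors longer candidates, while averaging them would favor short ones; choices of this kind are what an evaluation of in-phrase word weighting procedures has to compare.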

About the Authors

S. V. Popova
Saint-Petersburg State University; ITMO University
Russian Federation


I. A. Khodyrev
ITMO University
Russian Federation



For citations:


Popova S.V., Khodyrev I.A. Ranking in keyphrase extraction problem: is it suitable to use statistics of words occurrences? Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(4):123-136. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(4)-10



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)