
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Automatic Recognition of Domain-Specific Terms: an Experimental Evaluation

https://doi.org/10.15514/ISPRAS-2014-26(4)-5

Abstract

This paper presents an experimental evaluation of two state-of-the-art approaches to automatic term recognition that combine multiple features: a machine learning method and a voting algorithm. We show that in most cases the machine learning approach obtains the best results and needs little training data; we also identify the best subsets of the popular features.

About the Authors

D. Fedorenko
ISP RAS
Russian Federation


N. Astrakhantsev
ISP RAS
Russian Federation


D. Turdakov
ISP RAS
Russian Federation





For citations:


Fedorenko D., Astrakhantsev N., Turdakov D. Automatic Recognition of Domain-Specific Terms: an Experimental Evaluation. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(4):55-72. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(4)-5



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)