Texterra: A Framework for Text Analysis
https://doi.org/10.15514/ISPRAS-2014-26(1)-18
Abstract
About the Authors
Denis Turdakov
Russian Federation
Nikita Astrakhantsev
Russian Federation
Yaroslav Nedumov
Russian Federation
Andrey Sysoev
Russian Federation
Ivan Andrianov
Russian Federation
Vladimir Mayorov
Russian Federation
Denis Fedorenko
Russian Federation
Anton Korshunov
Russian Federation
Sergey Kuznetsov
Russian Federation
References
1. Bird S., Klein E., Loper E., Baldridge J. Multidisciplinary instruction with the Natural Language Toolkit. Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, 2008. pp. 62-70.
2. Cunningham, H., Tablan, V., Roberts, A., Bontcheva, K. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics. PLoS computational biology, 9(2), 2013.
3. Ferrucci D. et al. Towards an interoperability standard for text and multi-modal analytics. IBM Res. Technical report RC24122, 2006.
4. Nozhov I. Morfologicheskaya i sintaksicheskaya obrabotka teksta (modeli i programmy) [Morphological and syntactic text processing (models and programs)]. Tezisy dissertatsii [PhD Thesis], 2003. (in Russian).
5. Alekseev A., Dobrov B., Lukashevich N. Lingvisticheskaya ontologiya tezaurus RuTez [Linguistic ontology thesaurus RuTez] // Trudy konferentsii Open Semantic Technologies for Intelligent Systems [The Proceedings of Open Semantic Technologies for Intelligent Systems], 2013. pp. 153–158. (in Russian).
6. Braslavskij, P., Mukhin, M., Lyashevskaya, O. N., Bonch-Osmolovskaya, A. A., Krzhizhanovskij, A., Egorov, P. (2012). YARN: nachalo [YARN: The beginning]. Trudy konferentsii Dialog [The Proceedings of International Conference on Computational Linguistics Dialog], 2013.
7. Karkaletsis V., Fragkou P., Petasis G., Iosif E. Ontology based information extraction from text. Knowledge-Driven Multimedia Information Extraction and Ontology Evolution, ser. Lecture Notes in Computer Science, G. Paliouras, C. Spyropoulos, and G. Tsatsaronis, Eds. Springer Berlin / Heidelberg, 2011. vol. 6050, pp. 89-109. doi: 10.1007/978-3-642-20795-2_4
8. Unger C., Cimiano P. Pythia: Compositional meaning construction for ontology-based question answering on the semantic web. Natural Language Processing and Information Systems, ser. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2011. vol. 6716, pp. 153–160. doi: 10.1007/978-3-642-22327-3_15
9. Jimeno-Yepes A., Berlanga-Llavori R., Rebholz-Schuhmann D. Ontology refinement for improved information retrieval. Information Processing & Management, 2010. vol. 46, no. 4, pp. 426–435.
10. Grineva M., Turdakov D., Sysoev A. Blognoon: Exploring a topic in the blogosphere. Proceedings of the 20th international conference companion on World wide web, Hyderabad, India, 2011. pp. 213–216.
11. Biemann C. Ontology Learning from Text: A Survey of Methods. LDV-Forum, 2005. vol. 20, pp. 75–93.
12. Astrakhantsev N., Turdakov D. Automatic construction and enrichment of informal ontologies: A survey. Programming and Computer Software, 2013. vol. 39, no. 1, pp. 34-42. doi: 10.1134/S0361768813010039
13. Segalovich I. A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine. In MLMTA, 2003. pp. 273-280.
14. Bocharov V., Alexeeva S., Granovsky D., Protopopova E., Stepanova M., Surikov A. Crowdsourcing morphological annotation. Komp'yuternaya lingvistika i intellektual'nye tekhnologii: Po materialam ezhegodnoj Mezhdunarodnoj konferentsii «Dialog» [The Proceedings of International Conference on Computational Linguistics Dialog]. 2013. vol. 12, no. 19.
15. Lyashevskaya O., Plungyan V., Sichinava D. O morfologicheskom standarte Natsional'nogo korpusa russkogo yazyka [About morphological standard of Russian National Corpus]. Natsional'nyj korpus russkogo yazyka: 2003-2005. Rezul'taty i perspektivy [Russian National Corpus: 2003-2005. Results and Prospects], 2005. pp. 111–135.
16. Milne D., Witten I. H. Learning to link with wikipedia. Proceedings of the 17th ACM conference on Information and knowledge management (CIKM '08), 2008.
17. Stanford Twitter sentiment general domain dataset. Available at: http://www.stanford.edu/~alecmgo/cs224n/trainingandtestdata.zip
18. Sentiment140 Twitter sentiment general domain dataset. Available at: http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
19. KnowCenter Twitter sentiment general domain dataset. Available at: http://know-center.tugraz.at/loesungen/daten
20. UNED Twitter sentiment general domain dataset. Available at: http://nlp.uned.es/~damiano/datasets/entityProfiling_ORM_Twitter.html
21. International Conference on Weblogs and Social Media movie domain dataset. Available at: http://icwsm.cs.mcgill.ca
22. IMDb movie review dataset. Available at: http://www.cs.cornell.edu/people/pabo/movie-review-data/polarity_html.zip
Review
For citations:
Turdakov D., Astrakhantsev N., Nedumov Ya., Sysoev A., Andrianov I., Mayorov V., Fedorenko D., Korshunov A., Kuznetsov S. Texterra: A Framework for Text Analysis. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(1):421-438. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(1)-18