Proceedings of the Institute for System Programming of the RAS

Use of Multiple Features for Extracting Topics from News Clusters

This paper proposes a method for extracting chains of semantically related words and expressions that describe the different participants of a news story, referred to here as thematic nodes. The underlying assumption is that identifying the main participants will improve the quality of news cluster processing. The method relies on the structural organization of news clusters and on an analysis of the contexts in which linguistic expressions occur. Word contexts serve as the basis for extracting multiword expressions and for building thematic nodes. The proposed algorithm is evaluated on the task of generating overview summaries of news clusters.

About the authors

A. A. Alekseev
Moscow State University, Moscow

N. V. Loukachevitch
Moscow State University, Moscow

For citation:

Alekseev A.A., Loukachevitch N.V. Use of Multiple Features for Extracting Topics from News Clusters. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2012;23. (In Russ.)

Content is available under the Creative Commons Attribution 4.0 License.

ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)