SumHiS: Extractive Summarization Exploiting Hidden Structure
https://doi.org/10.15514/ISPRAS-2025-37(3)-17
Abstract
Extractive summarization is the task of highlighting the most important parts of a text. We introduce a new approach to extractive summarization that exploits the hidden clustering structure of the text. Experimental results on the CNN/DailyMail dataset demonstrate that our approach produces more accurate summaries than both extractive and abstractive methods, achieving state-of-the-art results in terms of the ROUGE-2 metric and exceeding previous approaches by 10%. Additionally, we show that the hidden structure of the text can be interpreted as aspects.
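As a rough illustration of the general idea, not of the SumHiS model itself, a clustering-based extractor can embed sentences, group them into hidden clusters, and return one representative sentence per cluster. In the minimal Python sketch below, the function name cluster_extract, the TF-IDF embedding, and the k-means clustering are all illustrative assumptions rather than the architecture described in the paper.

# Hypothetical sketch of clustering-based extractive summarization.
# This is NOT the authors' SumHiS method: TF-IDF + k-means stand in
# for whatever model actually induces the hidden structure.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_extract(sentences, n_clusters=3):
    """Return one representative sentence per hidden cluster."""
    k = min(n_clusters, len(sentences))        # guard against tiny inputs
    vec = TfidfVectorizer()
    X = vec.fit_transform(sentences)           # sparse sentence embeddings
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    picked = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # distance of each cluster member to its centroid
        d = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        picked.append(members[np.argmin(d)])
    picked.sort()                              # preserve document order
    return [sentences[i] for i in picked]

doc = [
    "The storm hit the coast on Monday.",
    "Thousands of homes lost power.",
    "Officials opened emergency shelters.",
    "Residents were urged to stay indoors.",
]
print(cluster_extract(doc, n_clusters=2))

Picking one sentence per cluster tends to cover the distinct topics of a document, which is the intuition behind treating hidden clusters as aspects.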
About the Authors
Pavel Alexandrovich TIKHONOV
Russian Federation
Master Sci. (Comp. Sci.), postgraduate student at Skoltech, researcher at AIRI. Research interests: Natural Language Processing, Generation Models, Interpretability.
Anastasia YANINA
Russian Federation
Head of LLM Department at Wildberries. Research interests: Language Models, Text Embeddings, Marketplace Search Optimization.
Valentin Andreevich MALYKH
Russian Federation
Cand. Sci. (Tech.), NLP Research Head at MTS AI. Research interests: Natural Language Generation, Deep Learning.
For citations:
TIKHONOV P.A., YANINA A., MALYKH V.A. SumHiS: Extractive Summarization Exploiting Hidden Structure. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(3):237-250. https://doi.org/10.15514/ISPRAS-2025-37(3)-17