Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Loss functions for train document image segmentation models

https://doi.org/10.15514/ISPRAS-2022-34(2)-8

Abstract

This work aims to improve the quality of neural network segmentation of document images, specifically scientific articles and legal acts, by training the models with modified loss functions that take into account the features of images in this subject area. Existing loss functions are analyzed, and new ones are developed that operate on the coordinates of the bounding boxes and also use information about the pixels of the input image. To assess quality, a neural network segmentation model is trained with the modified loss functions, and a theoretical assessment is carried out with a simulation experiment that measures the convergence rate and the segmentation error. The study yields rapidly converging loss functions that improve the quality of document image segmentation by using additional information about the input data.
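As a point of orientation, a loss that works only with bounding-box coordinates can be sketched as follows. This is a minimal illustration that assumes the widely used intersection-over-union (IoU) loss as a representative of that family. It is not one of the functions proposed in the article, and the names `Box`, `box_area`, `iou` and `iou_loss` are hypothetical. Variants that use pixel information are not shown, because the abstract does not specify them.

```python
from typing import Tuple

# Assumed box format (not taken from the article): (x1, y1, x2, y2)
# with x1 <= x2 and y1 <= y2, in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(pred: Box, target: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle; it is empty when the boxes are disjoint.
    inter_box = (
        max(pred[0], target[0]),
        max(pred[1], target[1]),
        min(pred[2], target[2]),
        min(pred[3], target[3]),
    )
    inter = box_area(inter_box)
    union = box_area(pred) + box_area(target) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Plain IoU loss: 0 for a perfect match, 1 for disjoint boxes."""
    return 1.0 - iou(pred, target)
```

The plain IoU loss is constant (equal to 1) for all disjoint boxes, so it gives no signal about how far apart they are. This is one reason modified losses of the kind the abstract describes are of interest.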

About the Authors

Andrey Igorevich PERMINOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Master’s student of the Department of System Programming



Denis Yurievich TURDAKOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

PhD, Head of Department at ISP RAS, associate professor of the Department of System Programming at MSU



Oksana Vladimirovna BELYAEVA
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

PhD Student



For citations:


PERMINOV A.I., TURDAKOV D.Yu., BELYAEVA O.V. Loss functions for train document image segmentation models. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(2):89-110. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(2)-8



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)