Vector Representations of Fonts: an Additional Feature for Understanding Documents
https://doi.org/10.15514/ISPRAS-2025-37(6)-44
Abstract
The article presents a model based on a convolutional neural network that maps a text image to an embedding vector encoding information about its font. The model consists of two identical convolutional blocks whose features are combined into a single vector, which linear layers then analyze to find differences between fonts. Trained in this way, the model distinguishes fonts while ignoring the text content, which makes it applicable to various types of documents. The embedding vectors are tested on additional tasks, such as classifying text by weight (boldness) and slant, where they show high accuracy, confirming their usefulness for analyzing stylistic features. Experiments with variable and handwritten fonts show the versatility of the model and its applicability to a variety of data. Comparison with the baseline model confirms the effectiveness of the proposed architecture. However, limitations were identified when working with low-quality data and multilingual texts. The code and models are published on GitHub (https://github.com/YRL-AIDA/FontEmb).
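The two-branch design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class names (FontEncoder, FontComparator), the layer sizes, the use of an absolute difference to combine the two embeddings, and the random untrained weights are all assumptions made for illustration. The published code is in the repository linked above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def conv2d_valid(image, kernels):
    """Naive 'valid' cross-correlation: (H, W) image, (K, kh, kw) kernels -> (K, H-kh+1, W-kw+1)."""
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


class FontEncoder:
    """Hypothetical convolutional block: conv -> ReLU -> global average pool -> linear projection."""

    def __init__(self, n_kernels=8, kernel_size=3, emb_dim=16):
        # Random, untrained weights (assumption: real weights would come from training).
        self.kernels = rng.normal(0.0, 0.1, (n_kernels, kernel_size, kernel_size))
        self.proj = rng.normal(0.0, 0.1, (n_kernels, emb_dim))

    def embed(self, image):
        feats = np.maximum(conv2d_valid(image, self.kernels), 0.0)  # ReLU
        pooled = feats.mean(axis=(1, 2))                            # one value per kernel
        return pooled @ self.proj                                   # font embedding vector


class FontComparator:
    """Hypothetical head: the same encoder serves both branches, a linear layer scores the difference."""

    def __init__(self, encoder, emb_dim=16):
        self.encoder = encoder
        self.w = rng.normal(0.0, 0.1, emb_dim)
        self.b = 0.0

    def score(self, img_a, img_b):
        emb_a = self.encoder.embed(img_a)
        emb_b = self.encoder.embed(img_b)
        diff = np.abs(emb_a - emb_b)      # assumed way of combining the two embeddings
        logit = diff @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-logit))  # sigmoid output in (0, 1)


encoder = FontEncoder()
comparator = FontComparator(encoder)
img_a = rng.random((32, 64))  # stand-ins for two grayscale text-line crops
img_b = rng.random((32, 64))
embedding = encoder.embed(img_a)
pair_score = comparator.score(img_a, img_b)
```

Because both branches share one encoder, the encoder can be used on its own after training to produce an embedding for a single text image, which is how the embedding would serve as an additional feature for downstream tasks such as weight or slant classification.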
About the Authors
Daniil Evgenievich KOPYLOV
Russian Federation
Master’s student at Irkutsk State University; employee of the Matrosov Institute for System Dynamics and Control Theory of the Siberian Branch of the Russian Academy of Sciences. Research interests: applied mathematics, data analysis.
Maria Viktorovna SHCHURIK
Russian Federation
Bachelor’s student at Irkutsk State University. Research interests: applied mathematics, data analysis, artificial intelligence.
For citations:
KOPYLOV D.E., SHCHURIK M.V. Vector Representations of Fonts: an Additional Feature for Understanding Documents. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):177-188. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(6)-44






