Text documents marking algorithm based on interword distances shifting invariant to format conversion
https://doi.org/10.15514/ISPRAS-2021-33(4)-10
Abstract
The article presents an algorithm for marking electronic text documents that embeds identification information by changing the intervals between words (interword distance shifting). The algorithm is intended to improve the protection of text documents against leakage through the transfer of printed paper documents and of electronic copies of those paper documents. During development, existing tools for protecting paper documents from leakage were analyzed, practical solutions for protecting text documents were considered, and their advantages and disadvantages were identified. Interword distance shifting serves as the approach to embedding information in electronic documents. The interword distances are changed by embedding a normalized space in selected areas of the text lines and adjusting the remaining interword spacings by calculated values. To make the embedded marker invariant to printing followed by scanning or photographing, algorithms for forming the embedding regions and the embedding matrix were developed. When the embedding regions are formed, the text lines of the source document yield arrays of spaces consisting of pairs: four and two spaces, or two spaces. The embedded information determines where in the formed regions the normalized space is inserted. When a marker is embedded, an embedding matrix containing the word displacement values is formed and embedded in the original document during printing. The developed marking algorithm makes it possible to introduce into the text structure of an electronic document a marker that is invariant to conversion of the document from electronic to paper form and back.
In addition, the features and limitations of the developed marking algorithm are presented, and directions for further research are identified.
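The general idea of interword distance shifting can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how regions are built or how the normalized space is computed, so the two-gap regions, the `ratio` parameter, and the function names `embed_bits` and `extract_bits` below are all assumptions made for illustration. Each bit narrows one gap of a pair to a "normalized" width and widens the other by the same amount, so the line width is unchanged, and the recorded shifts play the role of one row of a word displacement matrix.

```python
from typing import List, Tuple


def embed_bits(gaps: List[float], bits: List[int],
               ratio: float = 0.8) -> Tuple[List[float], List[float]]:
    """Embed bits into one text line by shifting words between gap pairs.

    gaps  -- widths of the interword gaps of the line, left to right
    bits  -- bits to embed, one per region of two adjacent gaps (assumed layout)
    ratio -- hypothetical parameter: narrow gap width as a fraction of the pair mean
    Returns the new gap widths and the horizontal shift applied to the
    word that sits between the two gaps of each used region.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    new_gaps = list(gaps)
    shifts: List[float] = []
    for i, bit in enumerate(bits):
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        a, b = 2 * i, 2 * i + 1
        if b >= len(gaps):
            break  # no complete region left on this line
        total = gaps[a] + gaps[b]
        narrow = ratio * total / 2   # the "normalized" gap of this sketch
        wide = total - narrow        # the other gap absorbs the difference
        first, second = (narrow, wide) if bit == 0 else (wide, narrow)
        shifts.append(first - gaps[a])  # positive: word moves to the right
        new_gaps[a], new_gaps[b] = first, second
    return new_gaps, shifts


def extract_bits(gaps: List[float], n_bits: int) -> List[int]:
    """Recover bits by comparing the two gaps of each region.

    Only the order of the two widths is used, so uniform scaling of the
    page (for example, a different scan resolution) does not change the result.
    An unmarked region with equal gaps is read as 1 in this sketch.
    """
    bits: List[int] = []
    for i in range(n_bits):
        a, b = 2 * i, 2 * i + 1
        if b >= len(gaps):
            break
        bits.append(0 if gaps[a] < gaps[b] else 1)
    return bits


if __name__ == "__main__":
    marked, row = embed_bits([10.0, 10.0, 10.0, 10.0, 10.0], [1, 0])
    print(marked, row, extract_bits(marked, 2))
```

Comparing the two gaps of a region, rather than measuring absolute widths, is one simple way to tolerate uniform rescaling. Real print-and-scan distortion also involves noise and skew, which this sketch does not model.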
About the Authors
Alexander Vasilievich KOZACHOK
Russian Federation
Employee of the Academy of Federal Guard Service
Sergey Alexandrovich KOPYLOV
Russian Federation
Employee of the Academy of Federal Guard Service
Pavel Nikolaevich GORBACHEV
Russian Federation
Employee of the Academy of Federal Guard Service
Artur Evgenevich GAYNOV
Russian Federation
Employee of the Ministry of Defence
Boris Vladimirovich KONDRAT’EV
Russian Federation
Employee of the Ministry of Defence
For citations:
KOZACHOK A.V., KOPYLOV S.A., GORBACHEV P.N., GAYNOV A.E., KONDRAT’EV B.V. Text documents marking algorithm based on interword distances shifting invariant to format conversion. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(4):131-146. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(4)-10