Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Printed text documents watermarking based on vertical word shift and word fragments brightness changing

https://doi.org/10.15514/ISPRAS-2021-33(5)-4

Abstract

This paper presents the results of developing methods for marking text documents represented as raster images. An important feature of the algorithms is the ability to erase the current mark from a document and embed a new one. The study concerns structural marking algorithms based on vertical word shifts and brightness changes in certain areas of words. The methods use segmentation tools to obtain the document layout, BCH codes for error correction, a maximum-likelihood method for mark extraction, and a neural network for recovering perturbed words. Testing confirmed the practical applicability of the algorithms to documents that are printed and scanned.

About the Authors

Dmitry Olegovich OBYDENKOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Graduate student



Alexander Evgenevich FROLOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Master's student



Yury Vitalievich MARKIN
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Researcher, PhD



Stanislav Alexandrovich FOMIN
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Leading programmer



Boris Vladimirovich KONDRAT’EV
Ministry of Defence of the Russian Federation
Russian Federation



For citations:


OBYDENKOV D.O., FROLOV A.E., MARKIN Yu.V., FOMIN S.A., KONDRAT’EV B.V. Printed text documents watermarking based on vertical word shift and word fragments brightness changing. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(5):65-82. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(5)-4



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)