Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Document Marking System for Leak Investigations

https://doi.org/10.15514/ISPRAS-2021-33(6)-11

Abstract

This paper presents a system for investigating leaks of confidential text documents, focused on two leak channels: document printing and screen photographing. An internal intruder may print a confidential document, take the paper copy outside the protected perimeter, scan it, and leak the resulting image anonymously. An intruder may also use a personal mobile phone to photograph a confidential document that is printed or displayed on a workstation screen. These leak channels are poorly covered by the traditional DLP systems that enterprises usually rely on to protect confidential information. Digital watermark (DWM) embedding, which changes the visual representation of the document image, is chosen as the protection mechanism. If a confidential document is leaked anonymously, the embedded DWM makes it possible to identify the employee who caused the leak, whether intentionally or by violating the security protocol. The system architecture consists of components of different types. Components on employees' workstations embed a DWM into documents that are sent for printing or displayed on screen. Information about each watermark embedding is sent to a remote server, which aggregates the marking records and provides them to the security officer during an investigation. Text document marking algorithms are developed that embed a DWM into printed documents and into documents displayed on screen. The screen watermark is embedded into the interline spacing: information is encoded by a sequence of lightened and darkened interline spaces. DWM embedding into printed documents is implemented by three algorithms, based on horizontal shift, vertical shift, and changing the brightness of font fragments. A testing methodology for the algorithms is developed with the production environment in mind, which helped to evaluate the application area of each algorithm. In addition, an intruder model is formulated, the security of the system is evaluated, and possible attack vectors are identified.
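
The screen-marking idea in the abstract, a sequence of lightened and darkened interline spaces, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the 16-bit identifier width, the base gray level of 248, and the brightness step of 4 are all assumptions chosen for illustration, and the sketch ignores rendering, photographing distortions, and error correction.

```python
from typing import List


def id_to_bits(marker_id: int, width: int = 16) -> List[int]:
    """Convert a hypothetical marker ID to a fixed-width bit list, MSB first."""
    if not 0 <= marker_id < (1 << width):
        raise ValueError("marker_id does not fit into the given width")
    return [(marker_id >> i) & 1 for i in range(width - 1, -1, -1)]


def bits_to_id(bits: List[int]) -> int:
    """Inverse of id_to_bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def gap_levels(bits: List[int], base: int = 248, delta: int = 4) -> List[int]:
    """Assign one background gray level per interline gap.

    Assumed convention: bit 1 lightens the gap (base + delta),
    bit 0 darkens it (base - delta). Levels are clamped to 0..255.
    """
    levels = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        level = base + delta if bit else base - delta
        levels.append(max(0, min(255, level)))
    return levels


def decode_levels(levels: List[int], base: int = 248) -> List[int]:
    """Recover bits by comparing each gap level with the base level."""
    return [1 if level > base else 0 for level in levels]


if __name__ == "__main__":
    bits = id_to_bits(1234)
    levels = gap_levels(bits)
    print(levels)
    print(bits_to_id(decode_levels(levels)))
```

In this sketch each interline gap carries one bit, so a page with N text lines holds at most N - 1 bits; a real system would also need synchronization and redundancy to survive photographing.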

About the Authors

Dmitry Olegovich OBYDENKOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

PhD Student



Aleksey Yur'evich YAKUSHEV
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Student



Yury Vital'evich MARKIN
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Researcher, PhD in Technical Sciences



Alexander Evgenevich FROLOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Master's Student



Stanislav Alexandrovich FOMIN
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Leading Programmer



Sergey Viktorovich KOZLOV
Academy of Federal Guard Service
Russian Federation

Candidate of Technical Sciences



Dmitry Dmitrievich GROMEY
Academy of Federal Guard Service
Russian Federation

Employee of the Academy of the Federal Guard Service of the Russian Federation



Alexander Vasilievich KOZACHOK
Academy of Federal Guard Service
Russian Federation

Doctor of Technical Sciences, Associate Professor. Employee of the Academy of the Federal Guard Service



Boris Vladimirovich KONDRAT’EV
Ministry of Defence of the Russian Federation
Russian Federation


For citations:


OBYDENKOV D.O., YAKUSHEV A.Yu., MARKIN Yu.V., FROLOV A.E., FOMIN S.A., KOZLOV S.V., GROMEY D.D., KOZACHOK A.V., KONDRAT’EV B.V. Document Marking System for Leak Investigations. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(6):161-174. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(6)-11



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)