Document Marking System for Leak Investigations
https://doi.org/10.15514/ISPRAS-2021-33(6)-11
Abstract
This paper presents a system for investigating leaks of confidential text documents, focused on two leak channels: printing documents and photographing screens. An internal intruder may print a confidential document, carry the paper copy outside the protected perimeter, scan it, and leak the image anonymously. An intruder may also use a personal mobile phone to photograph a printed confidential document or one displayed on a workstation screen. These leak channels are poorly covered by the traditional DLP systems that enterprises usually deploy to protect confidential information. Digital watermark (DWM) embedding, which alters the visual representation of the document image, was chosen as the document protection mechanism. If a confidential document is leaked anonymously, the embedded DWM makes it possible to identify the employee responsible for the leak, whether it was intentional or the result of a security protocol violation. The system architecture consists of components of several types. Components on employees' workstations embed a DWM into documents that are sent to print or displayed on screen. Information about each watermark embedding is sent to a remote server, which aggregates these marking records and provides them to the security officer during an investigation. Text document marking algorithms were developed to embed DWMs into printed documents and into documents displayed on screen. The screen watermark is embedded into the interline spacing, and information is encoded as a sequence of lightened and darkened interline spaces. DWM embedding into printed documents is implemented by three algorithms, based on horizontal shifting, vertical shifting, and changing the brightness of font fragments. A testing methodology for the algorithms was developed with the production environment in mind, which helped to determine the area of application of each algorithm. In addition, an intruder model was formulated, the security of the system was evaluated, and possible attack vectors were identified.
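The idea of encoding bits as lightened and darkened interline spaces can be illustrated with a minimal sketch. It assumes a grayscale page image given as rows of 0 to 255 values, already known row ranges of the interline gaps, a hypothetical unmarked background level `base` and brightness step `delta`, and decoding from an image that is already aligned with the original. The function names `embed_bits` and `extract_bits`, the threshold, and all parameter values are illustrative assumptions, not the authors' algorithm; geometric correction and brightness normalization of a real screen photo are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a grayscale page as rows of 0..255 ints,
# and an interline gap as a [top, bottom) range of row indices.
Image = List[List[int]]
Gap = Tuple[int, int]

TEXT_THRESHOLD = 128  # assumed: pixels darker than this belong to glyphs


def embed_bits(img: Image, gaps: Sequence[Gap], bits: Sequence[int],
               base: int = 235, delta: int = 6) -> Image:
    """Return a marked copy: gap i gets base+delta for bit 1 (lightened)
    or base-delta for bit 0 (darkened). base/delta are illustrative."""
    if len(bits) > len(gaps):
        raise ValueError("payload longer than the number of interline gaps")
    out = [row[:] for row in img]  # do not modify the caller's image
    for (top, bottom), bit in zip(gaps, bits):
        level = min(255, max(0, base + delta if bit else base - delta))
        for y in range(top, bottom):
            # Only recolor background pixels; leave any glyph pixels intact.
            out[y] = [level if p >= TEXT_THRESHOLD else p for p in out[y]]
    return out


def extract_bits(img: Image, gaps: Sequence[Gap], base: int = 235) -> List[int]:
    """Decode one bit per gap by comparing its mean background brightness
    with the assumed unmarked level `base`."""
    bits = []
    for top, bottom in gaps:
        pixels = [p for y in range(top, bottom) for p in img[y]
                  if p >= TEXT_THRESHOLD]
        if not pixels:
            raise ValueError(f"no background pixels in gap {top}..{bottom}")
        mean = sum(pixels) / len(pixels)
        bits.append(1 if mean > base else 0)
    return bits


if __name__ == "__main__":
    # Toy page: 4 dark "text lines" of 6 rows, each followed by a 4-row gap.
    page = [[235] * 20 for _ in range(40)]
    for start in (0, 10, 20, 30):
        for y in range(start, start + 6):
            page[y] = [30] * 20
    gaps = [(6, 10), (16, 20), (26, 30), (36, 40)]
    payload = [1, 0, 1, 1]
    marked = embed_bits(page, gaps, payload)
    print(extract_bits(marked, gaps))  # [1, 0, 1, 1]
```

Comparing against a fixed `base` keeps the sketch short. A photograph whose overall exposure differs from the original would shift every gap mean, so a practical decoder would presumably normalize brightness first (for example, relative to text-free margins) before thresholding; that step is an assumption here and is not shown.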
About the Authors
Dmitry Olegovich OBYDENKOV
Russian Federation
PhD Student
Aleksey Yur'evich YAKUSHEV
Russian Federation
Student
Yury Vital'evich MARKIN
Russian Federation
Researcher, PhD in Technical Sciences
Alexander Evgenevich FROLOV
Russian Federation
Master's Student
Stanislav Alexandrovich FOMIN
Russian Federation
Leading Programmer
Sergey Viktorovich KOZLOV
Russian Federation
Candidate of Technical Sciences
Dmitry Dmitrievich GROMEY
Russian Federation
Employee of the Academy of the Federal Guard Service of the Russian Federation
Alexander Vasilievich KOZACHOK
Russian Federation
Doctor of Technical Sciences, Associate Professor. Employee of the Academy of the Federal Guard Service
Boris Vladimirovich KONDRAT’EV
Russian Federation
For citations:
OBYDENKOV D.O., YAKUSHEV A.Yu., MARKIN Yu.V., FROLOV A.E., FOMIN S.A., KOZLOV S.V., GROMEY D.D., KOZACHOK A.V., KONDRAT’EV B.V. Document Marking System for Leak Investigations. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(6):161-174. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(6)-11