Identification of transparent, compressed and encrypted data in network traffic
https://doi.org/10.15514/ISPRAS-2021-33(4)-3
Abstract
The article is dedicated to the problem of classifying network traffic into three categories: transparent, compressed and encrypted, preferably in real time. It begins by describing the areas where this problem needs to be solved, then reviews existing solutions, their methods, advantages and limitations. Since most current research either separates traffic into transparent and opaque (compressed or encrypted) or separates compressed from encrypted, a subset of existing methods has to be combined to unite these two problems into one. After describing the main mathematical ideas behind the methods used by other researchers, the article compiles a list of the best-performing ones, which are combined and used as features for a random forest classifier that divides the input traffic into three classes. Only the best-performing features are kept, the optimal tree parameters are chosen, and the initial three-class classifier is split into two sequential ones to reduce the classification time for transparent packets. Next, a new method is proposed for classifying a whole network flow as one unit into one of the three classes; its validity is confirmed on several examples of the protocols most relevant to this area (SSH, SSL). The article concludes with directions for further research, mainly optimizing the method for real-time classification and obtaining more traffic samples suitable for experiments and demonstrations.
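The abstract does not list the selected features or the tuned forest parameters, so the following is only a minimal sketch of the two-stage idea, assuming two illustrative per-packet features: byte-level Shannon entropy (entropy measures appear in several of the referenced works, e.g. [4], [16], [21]) and a chi-square statistic of the byte histogram against a uniform distribution. The function names, the 6.0 bits/byte and 400 thresholds, and the threshold rules `toy_is_opaque` and `toy_opaque_kind` are hypothetical stand-ins for the two trained random forest classifiers described in the article.

```python
import math
import os
from collections import Counter
from typing import Callable, Dict

Features = Dict[str, float]


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    n = len(payload)
    if n == 0:
        return 0.0
    # sum of p * log2(1/p) over the observed byte values
    return sum((c / n) * math.log2(n / c) for c in Counter(payload).values())


def chi_square_uniform(payload: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against a uniform
    distribution over all 256 byte values (lower means closer to uniform)."""
    n = len(payload)
    if n == 0:
        return 0.0
    expected = n / 256
    counts = Counter(payload)  # missing byte values count as 0
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))


def extract_features(payload: bytes) -> Features:
    """Hypothetical feature vector; not the article's actual feature set."""
    return {"entropy": byte_entropy(payload), "chi2": chi_square_uniform(payload)}


def classify_two_stage(
    payload: bytes,
    is_opaque: Callable[[Features], bool],
    opaque_kind: Callable[[Features], str],
) -> str:
    """Stage 1 separates transparent from opaque traffic; stage 2 runs only
    for opaque payloads, so transparent packets skip the second classifier."""
    features = extract_features(payload)
    if not is_opaque(features):
        return "transparent"
    return opaque_kind(features)  # expected to return "compressed" or "encrypted"


# Hypothetical threshold rules standing in for the two trained random forests.
def toy_is_opaque(f: Features) -> bool:
    return f["entropy"] > 6.0


def toy_opaque_kind(f: Features) -> str:
    # Assumption: encrypted bytes are closer to uniform than compressed bytes.
    return "encrypted" if f["chi2"] < 400.0 else "compressed"


if __name__ == "__main__":
    http_text = b"GET /index.html HTTP/1.1\r\n" * 50
    print(classify_two_stage(http_text, toy_is_opaque, toy_opaque_kind))  # transparent
    print(classify_two_stage(os.urandom(4096), toy_is_opaque, toy_opaque_kind))
```

In this sketch a transparent packet is resolved after a single stage-1 decision, which reflects the time-saving motivation for splitting the three-class classifier. A real implementation could go further and compute stage-2 features lazily, only for packets that stage 1 marks as opaque.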
About the Authors
Aleksandr Igorevich GETMAN
Russian Federation
Senior researcher, PhD in physical and mathematical sciences
Maria Kirillovna IKONNIKOVA
Russian Federation
Postgraduate student
References
1. Luo S., Seideman J.D., Dietrich S. Fingerprinting Cryptographic Protocols with Key Exchange using an Entropy Measure. In Proc. of the IEEE Security and Privacy Workshops (SPW), 2018, pp. 170-179.
2. Choudhury P., Kumar K.R.P. et al. An empirical approach towards characterization of encrypted and unencrypted VoIP traffic. Multimedia Tools and Applications, vol. 79, issue 1, 2020, pp. 603-631.
3. Wood D., Apthorpe N., Feamster N. Cleartext data transmissions in consumer IoT medical devices. In Proc. of the 2017 Workshop on Internet of Things Security and Privacy, 2017, pp. 7-12.
4. Dorfinger P., Panholzer G., John W. Entropy estimation for real-time encrypted traffic identification. Lecture Notes in Computer Science, vol. 6613, 2011, pp. 164-171.
5. Hjelmvik E., John W. Breaking and improving protocol obfuscation. Chalmers University of Technology, Technical Report No. 2010-05, 2010, 34 p.
6. White A.M., Krishnan S. et al. Clear and Present Data: Opaque Traffic and its Security Implications for the Future. In Proc. of the 20th Annual Network & Distributed System Security Symposium, 2013, 16 p.
7. Roesch M. Snort: Lightweight intrusion detection for networks. In Proc. of the 13th USENIX Conference on System Administration (LISA '99), 1999, pp. 229-238.
8. Cha S., Kim H. Detecting encrypted traffic: a machine learning approach. Lecture Notes in Computer Science, vol. 10144, 2016, pp. 54-65.
9. Lewis R.J. An introduction to classification and regression tree (CART) analysis. In Proc. of the Annual Meeting of the Society for Academic Emergency Medicine in San Francisco, 2000, 15 p.
10. Rish I. An empirical study of the naive Bayes classifier. In Proc. of the Workshop on Empirical Methods in Artificial Intelligence, 2001, pp. 41-46.
11. Casino F., Choo K.K.R., Patsakis C. HEDGE: efficient traffic classification of encrypted and compressed packets. IEEE Transactions on Information Forensics and Security, vol. 14, issue 11, 2019, pp. 2916-2926.
12. Hahn D., Apthorpe N., Feamster N. Detecting compressed cleartext traffic from consumer internet of things devices. arXiv preprint arXiv:1805.02722, 2018.
13. Zhang H., Papadopoulos C. Early detection of high entropy traffic. In Proc. of the IEEE Conference on Communications and Network Security (CNS), 2015, pp. 104-112.
14. Wang Y., Zhang Z. et al. Using entropy to classify traffic more deeply. In Proc. of the IEEE Sixth International Conference on Networking, Architecture, and Storage, 2011, pp. 45-52.
15. Wang L. (ed.). Support vector machines: theory and applications. Springer Science & Business Media, 2005, 412 p.
16. Lyda R., Hamrock J. Using entropy analysis to find encrypted and packed malware. IEEE Security & Privacy, vol. 5, issue 2, 2007, pp. 40-45.
17. Rukhin A., Soto J. et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. NIST Special Publication 800-22, 2001, 131 p.
18. Sturgill M., Simske S. Mass Serialization Method for Document Encryption Policy Enforcement. In Proc. of the ACM Symposium on Document Engineering, 2016, pp. 193-196.
19. De Gaspari F., Hitaj D. et al. EnCoD: Distinguishing compressed and encrypted file fragments. Lecture Notes in Computer Science, vol. 12570, 2020, pp. 42-62.
20. De Gaspari F., Hitaj D. et al. Reliable Detection of Compressed and Encrypted Data. arXiv preprint arXiv:2103.17059, 2021.
21. Shannon C.E. A mathematical theory of communication. The Bell System Technical Journal, vol. 27, issue 3, 1948, pp. 379-423.
22. Goubault-Larrecq J., Olivain J. Detecting subverted cryptographic protocols by entropy checking. Research Report LSV-06-13, Laboratoire Spécification et Vérification, ENS Cachan, 2006.
23. Goubault-Larrecq J., Olivain J. On the efficiency of mathematics in intrusion detection: the NetEntropy case. In Proc. of the International Symposium on Foundations and Practice of Security, 2013, pp. 3-16.
24. Kozachok A.V. et al. Classification of pseudo-random sequences based on the random forest algorithm. In Proc. of the 2020 Ivannikov Memorial Workshop (IVMEM), 2020, pp. 55-58.
25. Zahid N., Abouelala O. et al. Fuzzy clustering based on K-nearest-neighbours rule. Fuzzy Sets and Systems, vol. 120, issue 2, 2001, pp. 239-247.
For citations:
GETMAN A.I., IKONNIKOVA M.K. Identification of transparent, compressed and encrypted data in network traffic. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(4):31-48. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(4)-3