
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Applicances of Different Kind of Storage Systems for Network Traffic Analysis Results

https://doi.org/10.15514/ISPRAS-2024-36(2)-1

Abstract

Network Traffic Analysis (NTA) helps identify security threats, monitor network performance, and plan for future capacity. While real-time analysis is ideal, it can be difficult due to high data volume and complexity. Large amounts of traffic require parsing, and real-time analysis may miss hidden threats. Post-analysis can address these challenges, but it depends heavily on choosing an effective and appropriate storage solution. A variety of storage systems exist, each employing different approaches and formats to retain data. This article explores the applications of various storage systems for NTA results. Three different types of storage systems are considered: Greenplum, NebulaGraph, and OpenSearch. A comparative approach is employed, analyzing the same dataset across these storage systems. This makes it possible to examine how different database structures and query capabilities influence the efficiency and accuracy of NTA. The resulting insights will not only provide valuable guidance for selecting the optimal storage solution for specific NTA tasks, but also serve as a foundation for future research in this area.

About the Authors

Vladislav Igorevich EGOROV
Ivannikov Institute for System Programming of the RAS
Russian Federation

Postgraduate student, intern researcher at ISP RAS. Research interests: processing, analysis and storage of network traffic analysis results.



Roman Evgenevich PONOMARENKO
Ivannikov Institute for System Programming of the RAS
Russian Federation

Postgraduate student, intern researcher at ISP RAS. Research interests: software architecture, program optimization, deep packet inspection.



Aleksandr Igorevich GETMAN
Ivannikov Institute for System Programming of the RAS, Moscow Institute of Physics and Technology, National Research University “Higher School of Economics”, Lomonosov Moscow State University
Russian Federation

Cand. Sci. (Phys.-Math.), senior researcher at ISP RAS, assistant at CMC MSU and MIPT, associate professor at HSE. Research interests: binary code analysis, data format recovery, network traffic analysis and classification.





For citations:


EGOROV V.I., PONOMARENKO R.E., GETMAN A.I. Applicances of Different Kind of Storage Systems for Network Traffic Analysis Results. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(2):7-20. https://doi.org/10.15514/ISPRAS-2024-36(2)-1



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)