Extraction of Functionality from Binary Code
Abstract
Semantic code analysis is an important but labor-intensive process used in many areas of programming. This paper studies a method for automating the semantic analysis of binary code: programs are split into semantic cores, using either partial execution traces or subgraphs extracted from the call graph, and the functionality of each core is then identified.
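The call-graph variant described above can be illustrated with a minimal sketch. It assumes the call graph has already been recovered from the binary (for example, by a disassembler) and is given as a Python dict mapping each function name to its callees. As a candidate semantic core it takes the functions reachable from a chosen root, within an optional call-depth limit. This reachability rule, the function `reachable_subgraph`, and the sample function names are hypothetical illustrations, not the authors' partitioning algorithm.

```python
from collections import deque


def reachable_subgraph(call_graph, root, max_depth=None):
    """Return the subgraph of call_graph induced by functions reachable from root.

    Hypothetical helper. call_graph maps a function name to an iterable of
    callee names; max_depth limits the call depth from root (None = unlimited).
    """
    depth_of = {root: 0}          # function -> call depth at which it was first reached
    queue = deque([root])
    while queue:                  # breadth-first traversal over call edges
        func = queue.popleft()
        depth = depth_of[func]
        if max_depth is not None and depth >= max_depth:
            continue              # do not expand functions at the depth limit
        for callee in call_graph.get(func, ()):
            if callee not in depth_of:
                depth_of[callee] = depth + 1
                queue.append(callee)
    # Keep only edges whose endpoints both belong to the extracted core.
    return {
        f: sorted(c for c in call_graph.get(f, ()) if c in depth_of)
        for f in depth_of
    }


# Hypothetical call graph of a small network service.
graph = {
    "main": ["parse_args", "handle_request"],
    "parse_args": ["strcmp"],
    "handle_request": ["recv_packet", "send_reply"],
    "recv_packet": ["recvfrom"],
    "send_reply": ["sendto"],
    "log": ["fprintf"],
}

core = reachable_subgraph(graph, "handle_request")
shallow_core = reachable_subgraph(graph, "handle_request", max_depth=1)
print(core)
```

Breadth-first order makes `max_depth` count call levels from the root, and the returned dict contains only edges between functions inside the core, so it can be passed on as a self-contained unit for further analysis.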
About the Authors
Anna Aleksandrovna ILINA
Russia
Master's student at the Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, and laboratory assistant at ISP RAS. Research interests: static analysis of binary code, symbolic execution, and applications of large language models.
Shamil Faimovich KURMANGALEEV
Russia
Candidate of Physical and Mathematical Sciences, head of the area of development of autonomous systems and technologies for building secure software at ISP RAS.
For citation:
ILINA A.A., KURMANGALEEV Sh.F. Extraction of Functionality from Binary Code. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(4):97-110.