About Methods of Extracting Algorithms from Binary Code
https://doi.org/10.15514/ISPRAS-2024-36(3)-10
Abstract
The paper proposes an iterative method for extracting algorithms from a binary code and constructing their high-level representation. The construction of algorithms using the proposed method is implemented in the form of analysis of dynamic slices. The method is based on an algorithm for tracking data flow in the forward and backward directions. Also, two levels of presentation of the extracted algorithms are proposed: a functional slice diagram and an algorithm execution diagram. The functional slice diagram is a structured slice representation and is a lower-level representation than the algorithm execution diagram. The flowchart of the algorithm is a representation consisting only of function models and their parameters. The proposed method for constructing algorithms and methods of their presentation allow to increase the analyst's productivity in solving problems of code security analysis, to improve the quality of the obtained analysis results. The developed ways of representing algorithms can be used to implement algorithms for automatic analysis of code security. In addition, the authors reviewed the approaches to extracting algorithms from binary code and how they are represented by static code analysis tools and considered some of their shortcomings and limitations.
About the Authors
Ivan Ivanovich KULAGINRussian Federation
Cand. Sci. (Tech.), Researcher in ISP RAS. Research interests: compiler construction, binary code analysis, compiler optimizations, polyhedral compilation, code generation, parallel programming models.
Vartan Andronikovich PADARYAN
Russian Federation
Cand. Sci. (Phys.-Math.), leading researcher at ISP RAS, associate professor of the Department of system programming of CMC of Lomonosov Moscow State University. Science Interests: compiler technologies, binary code analysis, cybersecurity, high performance calculating.
Viacheslav Alexandrovich KOSHKIN
Russian Federation
Research intern at ISP RAS. Science Interests: dynamic binary code analysis, high-level algorithm representation recovery, taint analysis, sensitive data leak detection, data structures type recovery.
References
1. D. Brumley, I. Jager, T. Avgerinos, E. J. Schwartz. BAP: A Binary Analysis Platform. In Proceedings of the 23rd International Conference on Computer Aided Verification. Snowbird, UT, 2011.
2. Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel,G. Vigna. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. IEEE Symposium on Security and Privacy, 2016.
3. The IDA Pro disassembler and debugger. [Online]. http://www.hex-rays.com/idapro/.
4. NSA, Ghidra is a software reverse engineering (SRE) framework. NSA. [Online]. https://github.com/NationalSecurityAgency/ghidra.
5. The Official Rizin Book. [Online]. https://book.rizin.re/.
6. ESIL: Radare2 book. [Online]. https://radare.gitbooks.io/radare2book/content/disassembling/esil.html.
7. Андрей Тихонов, Арутюн Аветисян, Вартан Падарян. Методика извлечения алгоритма из бинарного кода на основе динамического анализа. // Проблемы информационной безопасности. Компьютерные системы. №3 2008. стр. 66-71.
8. Retargetable Decompiler. [Online] https://retdec.com/.
9. Alessandro Di Federico, Mathias Payer, Giovanni Agosta. Rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. In Proceedings of the 26th International Conference on Compiler Construction (CC 2017). Association for Computing Machinery, New York, NY, USA, pp. 131–141.
10. Reps T., Balakrishnan G. Improved memory-access analysis for x86 executables // Proceedings of the International Conference on Compiler Construction, April 2008.
11. Balakrishnan G., Reps T. DIVINE: DIscovering Variables IN Executables // Proceedings of the VMCAI. Nice, France. Jan. 14-16, 2007.
12. M. Weiser. Program Slicing. In Int. Conf. on Softw. Eng. IEEE Comp. Soc., Wash., DC, pp. 439–449, 1981
13. Michael Vaughn, Thomas Reps. A Generating-Extension-Generator for Machine Code. 2020.
14. Venkatesh Srinivasan and Thomas Reps. Partial Evaluation of Machine Code. SIGPLAN Not. 50, 10 (Oct. 2015), pp. 860–879.
15. Venkatesh Srinivasan, Thomas Reps. An improved algorithm for slicing machine code. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). Association for Computing Machinery, New York, NY, USA, 378–393.
16. N. Jones, C. Gomard, P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Inc., 1993.
17. B. Yadegari, B. Johannesmeyer, B. Whitely, S. Debray. A generic approach to automatic deobfuscation of executable code. In S&P, 2015.
18. K. Coogan, G. Lu, S. Debray. Deobfuscation of virtualizationobfuscated software: A semantics-based approach. In CCS, 2011.
19. S. Bansal, A. Aiken. Automatic generation of peephole superoptimizers. In ASPLOS, 2006.
20. E. Schkufza, R. Sharma, A. Aiken. Stochastic superoptimization. In ASPLOS, 2013.
21. M. Abadi, M. Budiu, U. Erlingsson, J. Ligatti. Control-flow integrity. In CCS, 2005.
22. U. Erlingsson, F. Schneider. SASI enforcement of security policies: A retrospective. In Workshop on New Security Paradigms, 1999.
23. A. Slowinska, T. Stancescu, H. Bos. Body armor for binaries: Preventing buffer overflows without recompilation. In ATC, 2012.
24. L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, Univ. of Copenhagen, 1994.
25. C. Consel, L. Hornof, F. No¨el, J. Noy´e, and N. Volanschi. A uniform approach for compile-time and run-time specialization. Dagstuhl Seminar on Partial Evaluation, pages 54–72, 1996.
26. P. Kleinrubatscher, A. Kriegshaber, R. Z¨ochling, and R. Gl¨uck. Fortran program specialization. SIGPLAN Notices, 30(4), 1995.
27. П.М. Довгалюк, В.А. Макаров, В.А. Падарян, М.С. Романеев, Н.И. Фурсова. Применение программных эмуляторов в задачах анализа бинарного кода. Труды Института системного программирования РАН, том 26, вып. 1, 2014, стр. 277-296. DOI: 10.15514/ISPRAS-2014-26(1)-9.
28. A. B. Bugerya, I. I. Kulagin, V. A. Padaryan, M. A. Solovev, A. Y. Tikhonov, Recovery of High-Level Intermediate Representations of Algorithms from Binary Code," 2019 Ivannikov Memorial Workshop (IVMEM), 2019, pp. 57-63.
29. Z. Lin, X. Zhang, D. Xu. Automatic Reverse Engineering of Data Structures from Binary Execution. In Proceedings of the 11th Annual Information Security Symposium, West Lafayette, Indiana, 2010.
30. G. Ramalingam, J. Field, F. Tip. Aggregate Structure Identification and Its Application to Program Analysis. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Antonio, Texas, USA, 1999.
Review
For citations:
KULAGIN I.I., PADARYAN V.A., KOSHKIN V.A. About Methods of Extracting Algorithms from Binary Code. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(3):139-160. (In Russ.) https://doi.org/10.15514/ISPRAS-2024-36(3)-10