Practical Abstract Interpretation of Binary Code
https://doi.org/10.15514/ISPRAS-2020-32(6)-8
Abstract
The mathematical foundations of abstract interpretation provide a unified method of formalization and research of program analysis algorithms for a broad spectrum of practical problems. However, its practical usage for binary code analysis faces several challenges, of both scientific and engineering nature. In this paper we address some of those challenges. We describe an intermediate representation that is tailored to binary code analysis; unlike some other IRs it is still useable in system code analysis. To achieve this, we take into account the low-level specifics of how CPUs work; on the IR level this mostly pertains to modeling main memory in that accesses can fail, and addresses can alias. Further, we propose an infrastructure for carrying out abstract interpretation on top of the IR. The user needs to implement the abstract state and the transfer functions, and the infrastructure handles the rest: two executors are currently implemented, one for analysis of a single path, and one for fixed point analysis. Both executors handle interprocedural analysis internally, via inlining or using summaries, so the interpretations only consider only procedure at a time, which greatly simplifies implementation. The IR and the abstract interpretation framework are used together to define a model pipeline for a target instruction set architecture, consisting of a fetch stage, a decode stage, and an execute stage. A distinct fetch stage allows to model delay slots, hardware loops, etc. We currently have limited implementations for RISC-V and x86. The x86 implementation is evaluated in two experiments where concolic execution is used to automatically analyze a «crackme» program, both in dynamic (execution trace) and static (executable image) setting. In conclusion, we outline the future directions of our project.
Keywords
About the Authors
Mikhail Aleksandrovich SOLOVEVRussian Federation
Candidate of physical and mathematical sciences, senior researcher at the compiler technologies department of ISP RAS; senior lecturer at the system programming department of the faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University
Maksim Gennadevich BAKULIN
Russian Federation
Junior researcher
Sergei Sergeevich MAKAROV
Russian Federation
Student at the system programming department
Dmitrii Valerevich MANUSHIN
Russian Federation
Postgraduate student at the system programming department
Vartan Andronikovich PADARYAN
Russian Federation
Candidate of physical and mathematical sciences, leading researcher at the compiler technologies department of ISP RAS; associate professor of the system programming department of the faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University
References
1. Cousot P, Cousot R. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, 1977, pp. 238-252.
2. Kildall G. A unified approach to global program optimization. In Proc. of the 1st ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, 1973, pp. 194-206.
3. Mauborgne L. ASTRÉE: Verification of Absence of Run-Time Error. In Building the information society, IFIP International Federation for Information Processing, vol. 156, 2004, pp. 384-392.
4. Соловьев М.А., Бакулин М.Г., Горбачев М.С., Манушин Д.В., Падарян В.А., Панасенко С.С. О новом поколении промежуточных представлений, применяемом для анализа бинарного кода. Труды ИСП РАН, том 30, вып. 6, 2018, стр. 39-68 / Solovev M.A., Bakulin M.G., Gorbachev M.S., Manushin D.V., Padaryan V.A., Panasenko S.S. Next generation intermediate representations for binary code analysis. Trudy ISP RAN/Proc. ISP RAS, vol. 30, issue 6, 2018, pp. 39-68 (in Russian). DOI: 10.15514/ISPRAS-2018-30(6)-3.
5. Dullien T., Porst S. REIL: A platform-independent intermediate representation of disassembled code for static code analysis. In Proc. of the CanSecWest Conference, 2009, 7 p.
6. Jung M., Kim S., Han H., Choi J., Cha S.K. B2R2: building an efficient front-end for binary analysis. In Proc. of the NDSS workshop on Binary analysis research, 2019, 10 p.
7. Cruehead. URL: https://ansmirnov.ru/crackme-cruehead/ (in Russian), accessed: 23.10.2020.
8. The Z3 Theorem Prover. URL: https://github.com/Z3Prover/z3, accessed: 23.10.2020.
9. Boolector. URL: https://boolector.github.io/, accessed: 23.10.2020.
Review
For citations:
SOLOVEV M.A., BAKULIN M.G., MAKAROV S.S., MANUSHIN D.V., PADARYAN V.A. Practical Abstract Interpretation of Binary Code. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(6):101-110. (In Russ.) https://doi.org/10.15514/ISPRAS-2020-32(6)-8