Raising the level of abstraction of program execution trace
https://doi.org/10.15514/ISPRAS-2012-23-6
Abstract
This paper presents a method for raising the level of abstraction of a program execution trace by building of an algorithm model.
Dynamic analysis of executable files is often used to understand the logic of a program in the absence of source code. One of dynamic analysis methods is analysis of program execution traces containing register values and sequence of instructions executed by the processor. Traces are difficult for understanding because of the large amount of information.
First stage of building an algorithm model is identification of function calls in the trace. Input and output parameters are determined for each call. If the trace has got several calls for the same function, the information about them is combined, defining low-level parameters, return value, and dependencies between inputs and outputs.
Second stage is variable recovery. Low-level data elements are mapped on variables. Each processor has a fixed set of registers and limited memory and the number of higher-level variables in the program is not limited. Variable lifetime is evaluated for mapping variables to their locations. Lifetime is a range of trace step indexes from variable creation to its last usage. Return value and parameters of the function are recovered using its calling convention.
Third stage is examination of library calls that are used in the trace. Symbolic information can be extracted from binary libraries and added to the corresponding functions in the trace. Header files are available for some libraries. Full high level function prototypes can be found from them and mapped to the trace.
This allows us to get high level types of parameters that are propagated along the trace through global variables and function calls.
Function models are combined into a high level model algorithm that can be used to restore or analyse it.
About the Authors
A. G. NazarovRussian Federation
M. A. Klimushenkova
Russian Federation
P. M. Dovgaluk
Russian Federation
V. A. Makarov
Russian Federation
References
1. Cifuentes C., Gough K. J. Decompilation of binary programs. Softw. Pract. Exper., 25(7):811–829, 1995. ISSN 0038-0644. doi:10.1002/spe.4380250706
2. Padaryan V.А., Get'man А.I., Solov'ev M.А. Programmnaya sreda dlya dinamicheskogo analiza binarnogo koda [Software environment for dynamic analysis of binary code]. Trudy ISP RАN [The Proceedings of ISP RAS], 2009, vol. 17, pp. 51-72 (in Russian).
3. Tikhonov А.YU., Аvetisyan А.I. Kombinirovannyj (staticheskij i dinamicheskij) analiz binarnogo koda [Combined (static and dynamic) analysis of binary code]. Trudy ISP RАN [The Proceedings of ISP RAS], 2012, vol. 22, pp. 131-152 (in Russian).
4. Nathan E. Rosenblum, Barton P. Miller, Xiaojin Zhu. Extracting compiler provenance from program binaries. Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. Toronto, Ontario, Canada, 2010, pp. 21-28
5. Nazarov А.G. Аvtomaticheskaya identifikatsiya interfejsa bloka trassy v zadache vosstanovleniya modeli algoritma [Automatic interface identification of trace block in recovery algorithm model task]. Vestnik NovGU [Vestnik NovGU], 2012, vol. 62, pp. 41 - 50 (in Russian).
6. J. Caballero, N.M. Johnson,S.McCamant, and D. Song. Binary code extraction and interface identification for security applications. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, February 2010, 1-18 pp
7. clang: a C language family frontend for LLVM. clang.llvm.org
8. Intel® 64 and IA-32 ArchitecturesSoftware Developer’s Manual. download.intel.com/products/processor/manual/325462.pdf
9. ARM® v7-M Architecture Reference Manual. silver.arm.com/download/ARM_and_AMBA_Architecture/AR580-DC-11001-r0p0-02rel0/DDI0403D_arm_architecture_v7m_reference_manual_errata_markup_1_0.pdf
10. MIPS® Architecture For Programmers Volume I-A: Introduction to the MIPS64® Architecture. www.mips.com/auth/MD00083-2B-MIPS64INT-AFP-03.02.pdf
11. Klimushenkova M.А. Metody vosstanovleniya ob"ektov dannykh iz binarnogo koda [Methods for data objects recovery from a binary code]. Vestnik NovGU [Vestnik NovGU], 2012, vol. 62, pp. 51 - 57 (in Russian).
12. Alan Mycroft. Type-based decompilation. Proceedings of European Symposium on Programming, Mar. 1999, pp. 208 – 223
13. E. Dolgova and A. Chernov. Automatic reconstruction of data types in the decompilation problem. Programming and Computer Software 35, Mar. 2009, pp. 105 – 119
14. JongHyup Lee, Thanassis Avgerinos and David Brumley. TIE: Principled Reverse Engineering of Types in Binary Programs.Proceedings of the Distributed System Security Symposium, NDSS 2011, pp. 1 – 18
15. G. Balakrishnan and T. Reps. DIVINE: Discovering variables in executables. Proceedings of the International Conference on Verification Model Checking and Abstract Interpretation, Jan. 2007, pp. 1 - 28
16. G. Balakrishnan. WYSINWYX: What You See Is Not What You eXecute. PhD thesis, Computer Science Department, University of Wisconsin at Madison,Aug. 2007, pp. 20 – 97
17. Z. Lin, X. Zhang, and D. Xu. Automatic reverse engineering of data structures from binary execution. Proceedings of the Network and Distributed System Security Symposium, 2010, pp. 1 – 18
Review
For citations:
Nazarov A.G., Klimushenkova M.A., Dovgaluk P.M., Makarov V.A. Raising the level of abstraction of program execution trace. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2012;23. (In Russ.) https://doi.org/10.15514/ISPRAS-2012-23-6