Preview

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Advanced search

Next generation intermediate representations for binary code analysis

https://doi.org/10.15514/ISPRAS-2018-30(6)-3

Abstract

A lot of binary code analysis tools do not work directly with machine instructions, instead relying on an intermediate representation from the binary code. In this paper, we first analyze problems in binary code analysis that benefit from such an IR and construct a list of requirements that an IR suitable for solving these problems must meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that utilizes external specifications for describing target instruction sets. External specifications facilitate maintainability and allow for quickly adding support for new instruction sets. We analyze some of the more common ISAs, including those used in microcontrollers, and from that produce a list of requirements for a retargetable decoder. We then survey existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layered directed acyclic graph describing the decoding process in universal terms. The second component of an analysis platform is the actual architecture-neutral IR. In this paper we describe such existing IRs, and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions, and at the same time is also easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, a simpler alternative to phi-functions, and an extensible elementary operation set at the core. The IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together to fit them to most binary code analysis tasks through abstract interpretation on top of the IR. The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and a generic address space model where accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.

About the Authors

M. A. Solovev
Ivannikov Institute for System Programming of the Russian Academy of Sciences; Lomonosov Moscow State University
Russian Federation


M. G. Bakulin
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation


M. S. Gorbachev
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation


D. V. Manushin
Ivannikov Institute for System Programming of the Russian Academy of Sciences; Lomonosov Moscow State University
Russian Federation


V. A. Padaryan
Ivannikov Institute for System Programming of the Russian Academy of Sciences; Lomonosov Moscow State University
Russian Federation


S. S. Panasenko
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation


References

1. [1]. Wang X., Zeldovich N., Kaashoek M. F., Solar-Lezama A. A Differential Approach to Undefined Behavior Detection. ACM Transactions on Computer Systems, vol. 33, no. 1, art. 1, 2015, 29 p. DOI: 10.1145/2699678.

2. [2]. Nethercote N., Seward J. Valgrind: a framework for heavyweight dynamic binary instrumentation. ACM SIGPLAN Notices, 2007, vol. 42, no. 6, pp. 89-100.

3. [3]. Chipounov V., Candea G. Enabling sophisticated analyses of x86 binaries with RevGen. In Proc. of the IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W), 2011, pp. 211-216.

4. [4]. Lattner C., Adve V. LLVM: A compilation framework for lifelong program analysis & transformation. In Proc. of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, 2004, pp. 75-86.

5. [5]. Song D., Brumley D., Yin H., Caballero J., Jager I., Kang M.G., Liang Z., Newsome J., Poosankam P., Saxena P. BitBlaze: A new approach to computer security via binary analysis. Information systems security, 2008, pp. 1-25.

6. [6]. Padaryan V.A., Solov’ev M.A., Kononov A.I. Simulation of operational semantics of machine instructions. Programming and Computer Software, vol. 37, no. 3, 2011, pp. 161-170. DOI: 10.1134/S0361768811030030.

7. [7]. Brumley D., Jager I., Avgerinos T., Schwartz E.J. BAP: a binary analysis platform. Computer Aided Verification, 2011, pp. 463-469.

8. [8]. Dullien T., Porst S. REIL: A platform-independent intermediate representation of disassembled code for static code analysis. In Proc. of the CanSecWest Conference, 2009.

9. [9]. Bellard F. QEMU, a fast and portable dynamic translator. In Proc. of the USENIX Annual Technical Conference, 2005.

10. [10]. Luk C.K., Cohn R., Muth R., Patil H., Klauser A., Lowney G., Wallace S., Reddi V.J., Hazelwood K. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. ACM SIGPLAN Notices, vol. 40, no. 6, 2005, pp. 190-200.

11. [11]. Bruening D., Amarasinghe S. Efficient, transparent, and comprehensive runtime code manipulation. PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2004.

12. [12]. Chipounov V., Kuznetsov V. S2E: A Platform for In Vivo Multi-Path Analysis of Software Systems. In Proc. of the 16th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS. 2011.

13. [13]. Cha S. K., Avgerinos T., Rebert A., Brumley D. Unleashing mayhem on binary code. IEEE Symposium on Security and Privacy (SP), 2012, pp. 380-394.

14. [14]. Padaryan V.A., Kaushan V.V., Fedotov A.N. Automated exploit generaton method for stack buffer overflow vulnerabilities. Trudy ISP RAN/Proc. ISP RAN, vol. 26, no. 3, 2014, pp. 127-144 (in Russian). DOI: 10.15514/ISPRAS-2014-26(3)-7.

15. [15]. Kruegel C., Valeur F., Robertson W., Vigna G. Static Analysis of Obfuscated Binaries. In Proc. of the 13th USENIX Security Symposium, 2004, pp. 255-270.

16. [16]. Ben Khadra M. A., Stoffel D., Kunz W. Speculative disassembly of binary code. In Proc. of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 2016.

17. [17]. Balakrishnan G., Reps T. Analyzing Memory Accesses in x86 Executables. In Proc. of the 13th International Conference on Compiler Construction, 2004, pp. 5-23.

18. [18]. Aslanyan H., Asryan S., Hakobyan J., Vardanyan V., Sargsyan S., Kurmangaleev S. Multiplatform Static Analysis Framework for Program Defects Detection. In Proc. of the CSIT Conference, 2017.

19. [19]. Cousot P., Cousot R. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1977, pp. 238-252.

20. [20]. Padaryan V.A., Getman A.I., Solovyev M.A., Bakulin M.G., Borzilov A.I., Kaushan V.V., Ledovskikh I.N., Markin Yu.V., Panasensko S.S. Methods and software tools to support combined binary code analysis. Programming and Computer Software, vol. 40, no. 5, 2014, pp. 276-287.

21. [21]. GNU Binutils. URL: http://sourceware.org/binutils/, дата обращения: 03.12.2018.

22. [22]. Capstone. URL: http://www.capstone-engine.org/, дата обращения: 03.12.2018.

23. [23]. IDA Pro. URL: https://www.hex-rays.com/products/ida/index.shtml, дата обращения: 03.12.2018.

24. [24]. Fauth A., Van Praet J., Freericks M. Describing instruction set processors using nML. European Design and Test Conference, 1995, pp. 503-507.

25. [25]. Hadjiyiannis G., Hanono S., Devadas S. ISDL: An instruction set description language for retargetability. In Proc. of the 34th annual Design Automation Conference, 1997, pp. 299-302.

26. [26]. Fox A. Improved tool support for machine-code decompilation in HOL4. In Proc. of the International Conference on Interactive Theorem Proving, 2015, pp. 187-202.

27. [27]. Gray K.E., Kerneis G., Mulligan D., Pulte C., Sarkar S., Sewell, P. An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors. In Proc. of the 48th International Symposium on Microarchitecture, 2015, pp. 635-646.

28. [28]. Muchnick S.S. Advanced Compiler Design & Implementation. Morgan Kaufmann Publishers, 1997.


Review

For citations:


Solovev M.A., Bakulin M.G., Gorbachev M.S., Manushin D.V., Padaryan V.A., Panasenko S.S. Next generation intermediate representations for binary code analysis. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2018;30(6):39-68. (In Russ.) https://doi.org/10.15514/ISPRAS-2018-30(6)-3



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)