Preview

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Advanced search

Decoding of machine instructions for abstract interpretation of binary code

https://doi.org/10.15514/ISPRAS-2019-31(6)-4

Abstract

Not only developers require tools that work with binary code: it is impossible to achieve sufficient security in contemporary software without inspecting its properties at this level. A key component of binary code analysis toolset is the instruction decoder. Different instruction set architectures give rise to decoders that are differently structured, the decoding results are incompatible, and maintenance is hindered because of the ubiquitous practice of implementing decoders as cascades of conditional operators. Further binary code analysis (control and data flows, symbolic interpretation, etc.) cannot easily be ported from one target architecture to another because of limitations and peculiarities of decoder implementations. In this paper, we propose an approach to decoding machine instructions that is based on external specifications. The main distinction is an original way of representing the decoder instruction universally, i.e. in a way that does not differ from one architecture to another. The decoding process is handled by an abstract stack machine we have developed. Despite the inevitable overhead stemming from the approach’s universality, an implementation prototype displays only 1.5-2.5 times slowdown compared to traditional decoders; the measurements include time required to parse the specification and build the required data structures. The proposed approach to organizing decoding would allow, in the long run, to establish a unified stack of binary code analysis tools that would be applicable to different instruction set architectures. The paper further discusses questions of translating the decoded instructions into a machine-neutral internal representation for analyzing their operational semantics and carrying out abstract interpretation. We give examples of practically useful interpretations: the concrete interpretation and a “directing” interpretation that allows to tie the idea of abstract interpretation with the problem of deeper automatic analysis of individual paths in a program.

About the Authors

Mikhail Aleksandrovich Solovev
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation
Candidate of physical and mathematical sciences, senior researcher at the compiler technologies department of ISP RAS; senior lecturer at the system programming department of the faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University


Maksim Gennadevich Bakulin
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation
Junior researcher at the compiler technologies department


Sergei Sergeevich Makarov
Lomonosov Moscow State University
Russian Federation
Student at the system programming department of the faculty of Computational Mathematics and Cybernetics


Dmitrii Valerevich Manushin
Lomonosov Moscow State University
Russian Federation
PhD student at the system programming department of the faculty of Computational Mathematics and Cybernetics


Vartan Andronikovich Padaryan
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation
Candidate of physical and mathematical sciences, leading researcher at the compiler technologies department of ISP RAS; associate professor of the system programming department of the faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University


References

1. Solovev M.A., Bakulin M.G., Gorbachev M.S., Manushin D.V., Padaryan V.A., Panasenko S.S. Next-generation intermediate representations for binary code analysis. Programming and Computer Software, vol. 45, issue 7, 2019, pp. 424-437. DOI: 10.1134/S0361768819070107.

2. Ben Khadra M.A., Stoffel D., Kunz W. Speculative disassembly of binary code. In Proc. of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 2016, Article No. 16.

3. Luk C.K., Cohn R., Muth R., Patil H., Klauser A., Lowney G., Wallace S., Reddi V.J., Hazelwood K. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. ACM SIGPLAN Notices, vol. 40, no. 6, 2005, pp. 190-200.

4. Ren S, Tan L, Li C, Xiao Z, Song W. Samsara: Efficient deterministic replay in multiprocessor environments with hardware virtualization extensions. In Proc. of the USENIX Annual Technical Conference, 2016, pp. 551-564.

5. Weiser M. Program slicing. IEEE Transactions on software engineering, vol. 10, issue 4, 1984, pp. 352-357.

6. Bakulin M, Klimushenkova M, Egorov D. Dynamic Diluted Taint Analysis for Evaluating Detected Policy Violations. Ivannikov ISPRAS Open Conference, 2017, pp. 22-26. DOI: 10.1109/ISPRAS.2017.00011.

7. Bugerya A.B., Kulagin I.I., Padaryan V.A., Solovev M.A., Tikhonov A.Yu. Recovery of High-Level Intermediate Representations of Algorithms from Binary Code. Ivannikov Memorial Workshop (IVMEM), 2019, pp. 57-63. DOI: 10.1109/IVMEM.2019.00015.

8. Федотов А.Н., Падарян В.А., Каушан В.В., Курмангалеев Ш.Ф., Вишняков А.В., Нурмухаметов А.Р. Оценка критичности программных дефектов в рамках работы современных защитных механизмов. Труды ИСП РАН, том 28, вып. 5, 2016, стр. 73-92 / Fedotov A.N., Padaryan V.A., Kaushan V.V., Kurmangaleev Sh.F., Vishnyakov A.V., Nurmukhametov A.R. Software defect severity estimation in presence of modern defense mechanisms. Trudy ISP RAN/Proc. ISP RAN, vol. 28, issue 5, 2016, pp. 73-92 (in Russian). DOI: 10.15514/ISPRAS-2016-28(5)-4.

9. Muchnick S.S. Advanced Compiler Design & Implementation. Morgan Kaufmann Publishers, 1997, 856 p.

10. GNU binutils. URL: https://www.gnu.org/software/binutils/, accessed 25.11.2019.

11. Capstone. URL: http://www.capstone-engine.org/, accessed 25.11.2019.

12. Lattner C., Adve V. LLVM: A compilation framework for lifelong program analysis & transformation. In Proc. of the International Symposium on Code Generation and Optimization: feedback-directed and runtime optimization, 2004, pp. 75-86.

13. Bellard F. QEMU, a fast and portable dynamic translator. In Proc. of the USENIX Annual Technical Conference, 2005, pp. 41-46.

14. Nethercote N., Seward J. Valgrind: a framework for heavyweight dynamic binary instrumentation. ACM SIGPLAN Notices, vol. 42, no. 6, 2007, pp. 89-100.

15. Intel XED. URL: https://intelxed.github.io/, accessed 25.11.2019.

16. Padaryan V.A., Getman A.I., Solovyev M.A., Bakulin M.G., Borzilov A.I., Kaushan V.V., Ledovskikh I.N., Markin Yu.V., Panasensko S.S. Methods and software tools to support combined binary code analysis. Programming and Computer Software, vol. 40, no. 5, 2014, pp. 276-287. DOI: 10.1134/S0361768814050077.

17. Рубанов В.В., Михеев А.С. Интегрированная среда описания системы команд для сигнальных процессоров. Труды ИСП РАН, том 9, 2006, стр. 143-158 / Rubanov V.V, Mikheev A.S. Integrated environment for describing instruction systems of signal processors. Trudy ISP RAN/Proc. ISP RAN, vol. 8, 2006, pp. 143-158 (in Russian).

18. Ghidra. URL: https://www.nsa.gov/resources/everyone/ghidra/, accessed 25.11.2019.

19. Соловьев М.А., Бакулин М.Г., Горбачев М.С., Манушин Д.В., Падарян В.А., Панасенко С.С. О новом поколении промежуточных представлений, применяемом для анализа бинарного кода. Труды ИСП РАН, том 30, вып. 6, 2018, стр. 39-68 / Solovev M.A., Bakulin M.G., Gorbachev M.S., Manushin D.V., Padaryan V.A, Panasenko S.S. Next-generation intermediate representations for binary code analysis. Trudy ISP RAN/Proc. ISP RAN, vol. 30, issue 6, 2018, pp. 39-68. DOI: 10.15514/ISPRAS-2018-30(6)-3.

20. Brummayer R, Biere A, Lonsing F. BTOR: bit-precise modelling of word-level problems for model checking. In Proc. of the Joint Workshops of the 6th International Workshop on Satisfiability Modulo Theories and 1st International Workshop on Bit-Precise Reasoning, 2008, pp. 33-38.

21. Ford B. Parsing expression grammars: a recognition-based syntactic foundation. In Proc. of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2004, pp. 111-122.

22. Debian Linux RISC-V OS image. URL: https://people.debian.org/~mafm/debian-riscv64-tarball-20180418.tar.xz, accessed 25.11.2019.

23. Debian Linux PowerPC OS image. URL: https://cdimage.debian.org/debian-cd/current/ppc64el/iso-cd/debian-10.2.0-ppc64el-netinst.iso, accessed 25.11.2019.

24. RV8. URL: https://rv8.io/, accessed 25.11.2019.

25. Cousot P, Cousot R. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1977, pp. 238-252.

26. Tarski A. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, vol. 5, issue 2, 1955, pp. 285-309.


Review

For citations:


Solovev M.A., Bakulin M.G., Makarov S.S., Manushin D.V., Padaryan V.A. Decoding of machine instructions for abstract interpretation of binary code. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2019;31(6):65-88. (In Russ.) https://doi.org/10.15514/ISPRAS-2019-31(6)-4



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)