A software complex for revealing malicious behavior in untrusted binary code
https://doi.org/10.15514/ISPRAS-2019-31(6)-3
Abstract
One of the main problems in binary code security analysis is revealing malicious behavior in an untrusted program. This task is hard to automate and requires the participation of a cybersecurity expert. Existing solutions are aimed at supporting the analyst's manual work; the automation they provide is not systematic. When the necessary analysis tools are absent, the analyst loses proper support and is forced to develop tools independently, which greatly delays practical results. The paper presents a software complex that addresses the problem of revealing malicious behavior as a whole: from creating a controlled execution environment to the analyst-guided preparation of a high-level description of the analyzed algorithm. A QEMU Developer Toolkit (QDT) is introduced, offering support for the domain-specific development life cycle. QDT is especially suited for QEMU virtual machine development, including specialized testing and debugging technologies and tools. A high-level hierarchical flowchart-based representation of a program's algorithm is presented, together with an algorithm for constructing it. The representation is based on a hypergraph and supports both automatic and manual data flow analysis at various levels of detail. It is also suitable for implementing automatic analysis algorithms. An approach to improving the quality of the resulting representation is proposed: it combines individual data flows into one that links separate logical modules of the algorithm. A test set based on real programs and model examples has been developed to evaluate the constructed high-level algorithm representation.
About the Authors
Alexander Borisovich Bugerya
Russian Federation
Senior Researcher at the Keldysh Institute of Applied Mathematics; Programmer at ISP RAS
Vasilii Yur'evich Efimov
Russian Federation
Junior Researcher
Ivan Ivanovich Kulagin
Russian Federation
Candidate of Technical Sciences, Software Engineer
Vartan Andronikovich Padaryan
Russian Federation
Candidate of Physical and Mathematical Sciences, Leading Researcher at ISP RAS, Associate Professor at the Department of System Programming, CMC Faculty, Lomonosov Moscow State University
Mikhail Aleksandrovich Solovev
Russian Federation
Candidate of Physical and Mathematical Sciences, Senior Researcher at ISP RAS, Senior Lecturer at the Department of System Programming, CMC Faculty, Lomonosov Moscow State University
Andrei Yur'evich Tikhonov
Russian Federation
Lecturer
For citations:
Bugerya A.B., Efimov V.Yu., Kulagin I.I., Padaryan V.A., Solovev M.A., Tikhonov A.Yu. A software complex for revealing malicious behavior in untrusted binary code. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2019;31(6):33-64. (In Russ.) https://doi.org/10.15514/ISPRAS-2019-31(6)-3