Framework for Machine Instruction Usage Analysis
https://doi.org/10.15514/ISPRAS-2023-35(3)-12
Abstract
Migrating software to new hardware architectures, including developing optimizing compilers for new platforms, requires statistical analysis of how often different machine instructions, or groups of instructions, occur in the machine code of programs. This paper describes a new extensible framework for statistical research on machine opcodes, together with a dataset that other researchers can use. We automatically collect data from different GNU/Linux distributions and architectures and provide facilities for its statistical analysis.
About the Authors
Danila Evgenevich PECHENEV
Russian Federation
Student and researcher at St. Petersburg State University
Iakov Aleksandrovich KIRILENKO
Russian Federation
Head of the Infrastructure Solutions Programming Technologies Laboratory at St. Petersburg State University
Olga Andreevna AFONINA
Russian Federation
Student and researcher at St. Petersburg State University
Review
For citations:
PECHENEV D.E., KIRILENKO I.A., AFONINA O.A. Framework for Machine Instruction Usage Analysis. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(3):163-170. https://doi.org/10.15514/ISPRAS-2023-35(3)-12