Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Platform-independent and scalable tool for binary code clone detection

https://doi.org/10.15514/ISPRAS-2016-28(5)-13

Abstract

During software development, developers often copy and paste code fragments to achieve the desired result. Copying code can introduce a variety of errors and can increase the size of both the source and the binary code. Because the source code of many programs is unavailable, finding semantically similar code fragments (clones) directly in binary code has become an important problem. The first part of the article analyzes existing methods for detecting code clones in binary code. The second part presents a newly developed tool for this task. The tool works in three main stages. The first stage is based on the Binnavi [1] framework, which generates program dependence graphs (PDGs). The graphs are built from REIL (Reverse Engineering Intermediate Language) code. Using REIL makes it possible to generate graphs for multiple architectures (x86, x86-64, ARM, MIPS, PPC), which makes the tool independent of the target architecture. In the second stage, code clones are detected from the generated graphs: a maximum common subgraph is built for each pair of graphs, and clones are identified from it. In the third stage, the detected clones are visualized for convenient analysis of the results.
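
The second stage can be illustrated with a small sketch. It is not the tool's implementation: the graph encoding (a dict of node labels plus a set of directed edges), the function names, the exhaustive search, and the similarity threshold are all assumptions made for illustration. The search is exponential and only practical for very small graphs; a scalable tool would need a faster, likely approximate, matching strategy.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable
# Hypothetical PDG encoding: (node -> label, set of directed dependence edges).
Graph = Tuple[Dict[Node, str], Set[Tuple[Node, Node]]]


def max_common_subgraph(g1: Graph, g2: Graph) -> Dict[Node, Node]:
    """Exhaustive backtracking search for a maximum common induced subgraph.

    Returns a mapping from nodes of g1 to nodes of g2 that preserves labels
    and the presence or absence of every edge between mapped nodes.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    nodes1 = list(labels1)
    best: Dict[Node, Node] = {}

    def compatible(u: Node, v: Node, mapping: Dict[Node, Node]) -> bool:
        if labels1[u] != labels2[v]:
            return False
        if ((u, u) in edges1) != ((v, v) in edges2):
            return False
        for a, b in mapping.items():
            if ((a, u) in edges1) != ((b, v) in edges2):
                return False
            if ((u, a) in edges1) != ((v, b) in edges2):
                return False
        return True

    def extend(i: int, mapping: Dict[Node, Node], used: Set[Node]) -> None:
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        if i == len(nodes1):
            return
        # Prune: mapping every remaining node still could not beat the best.
        if len(mapping) + (len(nodes1) - i) <= len(best):
            return
        u = nodes1[i]
        for v in labels2:
            if v not in used and compatible(u, v, mapping):
                mapping[u] = v
                used.add(v)
                extend(i + 1, mapping, used)
                del mapping[u]
                used.remove(v)
        extend(i + 1, mapping, used)  # also try leaving u unmapped

    extend(0, {}, set())
    return best


def similarity(g1: Graph, g2: Graph) -> float:
    """Size of the common subgraph relative to the larger of the two graphs."""
    larger = max(len(g1[0]), len(g2[0]))
    if larger == 0:
        return 0.0
    return len(max_common_subgraph(g1, g2)) / larger


def find_clone_pairs(graphs: Dict[str, Graph], threshold: float) -> List[Tuple[str, str]]:
    """Compare every pair of graphs; report pairs at or above the threshold."""
    return [
        (name1, name2)
        for (name1, g1), (name2, g2) in combinations(graphs.items(), 2)
        if similarity(g1, g2) >= threshold
    ]
```

For example, a three-node chain labeled load, add, store and a four-node chain labeled load, add, store, call share a three-node common subgraph, so `similarity` gives 3/4 for that pair, and `find_clone_pairs` reports it for any threshold up to 0.75.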

About the Authors

H. K. Aslanyan
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


S. F. Kurmangaleev
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


V. G. Vardanyan
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


M. S. Arutunian
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


S. S. Sargsyan
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


References

1. https://www.zynamics.com/binnavi.html

2. Ducasse S., Rieger M., Demeyer S., A language independent approach for detecting duplicated code, in: Proceedings of the 15th International Conference on Software Maintenance, 1999, pp. 109-119, DOI: 10.1109/ICSM.1999.792593.

3. Kamiya T., Kusumoto S., Inoue K., CCFinder: A multilinguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering, 2002, vol. 28, no. 7, pp. 654-670, DOI: 10.1109/TSE.2002.1019480.

4. Baxter I., Yahin A., Moura L., Anna M., Clone detection using abstract syntax trees, in: Proceedings of the 14th IEEE International Conference on Software Maintenance, IEEE Computer Society, 1998, pp. 368-377, DOI: 10.1109/ICSM.1998.738528.

5. Tairas R., Gray J., Phoenix-based clone detection using suffix trees, in: Proceedings of the 44th Annual Southeast Regional Conference, 2006, pp. 679-684, DOI: 10.1145/1185448.1185597.

6. Jiang L., Misherghi G., Su Z., Glondu S., DECKARD: Scalable and accurate tree-based detection of code clones, in: Proceedings of the 29th International Conference on Software Engineering, IEEE Computer Society, 2007, pp. 96-105, DOI: 10.1109/ICSE.2007.30.

7. Komondoor R., Horwitz S., Using slicing to identify duplication in source code, in: Proceedings of the 8th International Symposium on Static Analysis, 2001, pp. 40-56, DOI: 10.1007/3-540-47764-0_3.

8. Krinke J., Identifying similar code with program dependence graphs, in: Proceedings of the 8th Working Conference on Reverse Engineering, 2001, pp. 301-309, DOI: 10.1109/WCRE.2001.957835.

9. Gabel M., Jiang L., Su Z., Scalable detection of semantic clones, in: Proceedings of 30th International Conference on Software Engineering, 2008, pp. 321-330, DOI: 10.1145/1368088.1368132.

10. Sargsyan S., Kurmangaleev S., Baloian A., Aslanyan H., Scalable and Accurate Clones Detection Based on Metrics for Dependence Graph, Mathematical Problems of Computer Science, Volume 42, 2014, pp. 54-62.

11. Avetisyan A., Kurmangaleev S., Sargsyan S., Arutunian M., Belevantsev A. LLVM-Based Code Clone Detection Framework. 10th International Conference on Computer Science and Information Technologies, 2015, pp. 178-182.

12. Sargsyan S., Kurmangaleev S., Belevantsev A., Avetisyan A. Scalable and accurate detection of code clones. Programming and Computer Software, 2016, issue 1. pp. 27-33. DOI: 10.1134/S0361768816010072

13. Sargsyan S., Kurmangaleev S., Belevantsev A., Aslanyan H., Baloian A. [Scalable tool for code clone detection based on semantic analysis of program]. Trudy ISP RAN/Proc. ISP RAS, vol. 27, issue 1, pp. 39-50 (in Russian). DOI: 10.15514/ISPRAS-2015-27(1)-3

14. J. Jang and D. Brumley. Bitshred: Fast, scalable code reuse detection in binary code (cmu-cylab-10-006). CyLab, 2009 page 28.

15. A. Schulman. Finding binary clones with opstrings function digests: Part III. Dr. Dobb’s Journal, 30(9):64, 2005.

16. D. Bruschi, L. Martignoni, and M. Monga. Code normalization for self-mutating malware. IEEE Security & Privacy, 5(2):46–54, 2007.

17. A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su. Detecting code clones in binary executables. In Proceedings of the 18th International Symposium on Software Testing and Analysis, ACM, 2009, pp. 117–128.

18. M. R. Farhadi, B. C. M. Fung, P. Charland and M. Debbabi, "BinClone: Detecting Code Clones in Malware," Software Security and Reliability (SERE), 2014 Eighth International Conference on, San Francisco, CA, 2014, pp. 78-87. doi: 10.1109/SERE.2014.21.

19. Thomas Dullien, Ero Carrera, Soeren-Meyer Eppler, Sebastian Porst. Automated attacker correlation for malicious code. Technical report, DTIC Document, 2010.

20. https://www.hex-rays.com/products/ida/


For citations:


Aslanyan H.K., Kurmangaleev S.F., Vardanyan V.G., Arutunian M.S., Sargsyan S.S. Platform-independent and scalable tool for binary code clone detection. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(5):215-226. (In Russ.) https://doi.org/10.15514/ISPRAS-2016-28(5)-13



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)