
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Scalable code clone detection tool based on semantic analysis

https://doi.org/10.15514/ISPRAS-2015-27(1)-3

Abstract

This article describes methods for detecting code clones. Based on an analysis of existing methods, a new approach to code clone detection is proposed for the C/C++ languages. The method is based on semantic analysis of the project, which allows code clones to be detected with high accuracy. It is implemented as part of the LLVM compiler, which allows it to outperform existing methods. The tool consists of three basic parts. The first part generates and serializes Program Dependence Graphs (PDGs). The PDGs are constructed at compile time from LLVM's intermediate representation; several simple optimizations are applied to these graphs, and they are then serialized to a file. The second part analyzes the stored PDGs. The PDGs are loaded from files and split into subgraphs, and every subgraph is considered a clone candidate. A new splitting method is proposed, which increases the number of detected clones. Two types of algorithms are used for clone detection. Algorithms of the first type try to prove that a pair of PDGs cannot be clones. These algorithms have linear complexity, which allows a huge number of PDG pairs to be processed. If they fail to do so, graph isomorphism algorithms are applied to detect similar subgraphs. The last part is an integrated system for automatically testing the accuracy of the algorithms: for a given project, a set of clones is generated automatically, and the clone detection algorithms are then applied to both the original and the generated source code.
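The two-tier comparison described in the abstract (a cheap linear-time test that tries to prove a pair cannot be clones, followed by an expensive isomorphism check only when that test fails) can be illustrated with a small Python sketch. This is not the authors' implementation: the `PDG` class, the histogram-based filter `cannot_be_clones`, and the backtracking `isomorphic` function are hypothetical stand-ins chosen for illustration. The abstract does not say which linear checks the tool actually uses, and the tool searches for similar subgraphs, whereas this simplified version tests two whole graphs for exact labeled isomorphism.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PDG:
    """Hypothetical minimal PDG: node labels (e.g. instruction opcodes) and
    labeled directed edges (src, dst, kind), where kind is e.g. "data" or "control"."""
    labels: dict[int, str]
    edges: set[tuple[int, int, str]] = field(default_factory=set)


def cannot_be_clones(a: PDG, b: PDG) -> bool:
    """Linear-time filter (an assumed example of such a check): if the multisets
    of node labels or of labeled edges differ, no label-preserving isomorphism
    can exist, so the pair is rejected without any search."""
    if Counter(a.labels.values()) != Counter(b.labels.values()):
        return True
    ea = Counter((a.labels[s], a.labels[d], k) for s, d, k in a.edges)
    eb = Counter((b.labels[s], b.labels[d], k) for s, d, k in b.edges)
    return ea != eb


def isomorphic(a: PDG, b: PDG) -> bool:
    """Exact labeled-graph isomorphism by backtracking (exponential worst case),
    run only for pairs that the cheap filter could not reject."""
    if cannot_be_clones(a, b):
        return False
    kinds = {k for _, _, k in a.edges}  # equal to b's kinds once the filter passed
    nodes_a = list(a.labels)
    mapping: dict[int, int] = {}
    used: set[int] = set()

    def consistent(u: int, v: int) -> bool:
        # Self-loops must match, and so must edges to every already-mapped node.
        for k in kinds:
            if ((u, u, k) in a.edges) != ((v, v, k) in b.edges):
                return False
            for w, x in mapping.items():
                if ((u, w, k) in a.edges) != ((v, x, k) in b.edges):
                    return False
                if ((w, u, k) in a.edges) != ((x, v, k) in b.edges):
                    return False
        return True

    def extend(i: int) -> bool:
        if i == len(nodes_a):
            return True  # every node mapped with all edges preserved
        u = nodes_a[i]
        for v in b.labels:
            if v not in used and a.labels[u] == b.labels[v] and consistent(u, v):
                mapping[u] = v
                used.add(v)
                if extend(i + 1):
                    return True
                del mapping[u]  # undo and try the next candidate
                used.discard(v)
        return False

    return extend(0)


if __name__ == "__main__":
    # Two hypothetical PDGs for "x = a + b" with different node numbering.
    g1 = PDG({0: "load", 1: "load", 2: "add", 3: "store"},
             {(0, 2, "data"), (1, 2, "data"), (2, 3, "data")})
    g2 = PDG({10: "load", 11: "add", 12: "load", 13: "store"},
             {(10, 11, "data"), (12, 11, "data"), (11, 13, "data")})
    print(isomorphic(g1, g2))  # True
```

The filter is sound (it never rejects a truly isomorphic pair) but not complete: two graphs can share all label and edge-label histograms and still be wired differently, so pairs that pass it must still go through the exact check. Because the histograms are built in time linear in the graph size, most non-matching pairs are discarded before the exponential step is ever reached, which is the property that makes the approach scale to a large number of PDG pairs.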

About the Authors

Sevak Sargsyan
Institute for System Programming of the Russian Academy of Sciences, Moscow
Russian Federation

Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn st., Moscow, Russia, 109004.



Shamil Kurmangaleev
Institute for System Programming of the Russian Academy of Sciences, Moscow
Russian Federation
Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn st., Moscow, Russia, 109004.


Andrey Belevantsev
Institute for System Programming of the Russian Academy of Sciences, Moscow
Russian Federation
Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn st., Moscow, Russia, 109004.


Hayk Aslanyan
Institute for System Programming of the Russian Academy of Sciences, Moscow
Russian Federation
Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn st., Moscow, Russia, 109004.


Artiom Baloian
Institute for System Programming of the Russian Academy of Sciences, Moscow
Russian Federation
Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn st., Moscow, Russia, 109004.




For citations:


Sargsyan S., Kurmangaleev Sh., Belevantsev A., Aslanyan H., Baloian A. Scalable code clone detection tool based on semantic analysis. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(1):39-50. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(1)-3



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)