
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Copy-paste semantic errors detection

https://doi.org/10.15514/ISPRAS-2015-27(2)-6

Abstract

The paper describes a method for detecting semantic errors that arise when a developer copies and pastes code incorrectly. The method consists of two stages. The first stage detects code clones by lexical analysis of the program. A token sequence is built with the LLVM lexer, and all pairs of maximal, non-intersecting matching token subsequences are found. Each pair of identical subsequences is then partially parsed to keep only the constructs allowed by the programming language and to discard incomplete sequences. If the remaining subsequences are large enough, the second stage is applied to them. A Program Dependence Graph (PDG) is constructed for the code of the corresponding function, and the subgraphs that correspond to the identical subsequences are examined. If two subgraphs share vertices, the outgoing edges of those vertices are analyzed. This allows semantic errors to be detected with high accuracy. The method is implemented in the LLVM/Clang compiler. As a result, semantic errors are detected at compile time, so no separate lexical and semantic analysis of the program is needed. A number of widely used open source libraries and software systems were analyzed. The paper lists the semantic errors detected in Linux kernel 2.6 and Android 4.3. For these systems, the true positive rate of the approach is above 65%.
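The first stage described in the abstract (finding pairs of matching, non-intersecting token runs) can be illustrated with a small sketch. This is not the paper's implementation: the regex lexer stands in for the LLVM lexer, the identifier normalization in `normalize` is an assumption made so that renamed copies still match, and the names `tokenize`, `normalize`, `clone_pairs` and `min_len` are hypothetical. The search is a quadratic brute force meant only to show the idea; the partial parsing step and the PDG-based second stage are not covered.

```python
import re

# Toy lexer: identifiers, integer literals, and single punctuation characters.
# Assumption: this regex only stands in for a real compiler lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def tokenize(source):
    """Split source text into a flat list of token strings."""
    return TOKEN_RE.findall(source)


def normalize(token):
    """Assumption: map every non-keyword identifier to one placeholder."""
    if re.match(r"[A-Za-z_]", token) and token not in KEYWORDS:
        return "ID"
    return token


def clone_pairs(tokens, min_len):
    """Return (i, j, length) triples for matching, non-intersecting runs.

    tokens[i:i+length] equals tokens[j:j+length], i < j, and the first run
    ends before the second one begins.
    """
    n = len(tokens)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip starts that only continue a match that began one token earlier.
            if i > 0 and tokens[i - 1] == tokens[j - 1]:
                continue
            k = 0
            # Extend the match while tokens agree and the runs do not intersect.
            while j + k < n and i + k < j and tokens[i + k] == tokens[j + k]:
                k += 1
            if k >= min_len:
                pairs.append((i, j, k))
    return pairs


source = "a = b + c ; x = 1 ; d = e + f ;"
tokens = [normalize(t) for t in tokenize(source)]
print(clone_pairs(tokens, min_len=6))
```

In this toy input the first and third statements have the same token shape, so they are reported as one clone pair, while the shorter matches stay below `min_len`. A real detector would use a more efficient matching structure than this nested loop.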

About the Author

Sevak Sargsyan
ISP RAS
Russian Federation





For citations:


Sargsyan S. Copy-paste semantic errors detection. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(2):93-104. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(2)-6



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)