Machine Learning-Based Validation of Warnings in an Industrial Static Code Analyzer
https://doi.org/10.15514/ISPRAS-2025-37(6)-6
Abstract
This paper describes a mechanism for the automatic classification of static analysis warnings using machine learning methods. Static analysis is a technique for detecting potential vulnerabilities and bugs in source code. However, static analyzers often generate a large number of warnings, which include both true and false positives, and manually reviewing every defect reported by the analyzer is labor-intensive and time-consuming. On a set of warnings generated by the industrial static analysis tool Svace when analyzing real-world projects, the developed classification mechanism achieved a precision of more than 93% with a recall of about 96%. The dataset for the machine learning model is built from the warnings and the source code metrics obtained during static analysis of the project. The paper explores various approaches to selecting and processing features for the classifier, taking into account the characteristics of different machine learning algorithms. The mechanism's efficiency and its independence from the programming language allowed it to be integrated into Svace. Several integration approaches were considered with regard to the specifics of the static analyzer, and the most convenient one was selected.
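To make the reported metrics concrete: precision is the share of warnings the classifier labels as real defects that actually are real defects, and recall is the share of actual real defects that the classifier labels as such. The sketch below is illustrative only and assumes that "true positive warning" is the positive class and that labels are plain booleans. The function name `precision_recall` and the toy labels are hypothetical and are not taken from the paper or from Svace.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Compute precision and recall for warning classification.

    Assumption: True means "the warning is a real defect" (positive class).
    Hypothetical helper for illustration; not part of Svace.
    """
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    # Count the confusion-matrix cells needed for precision and recall.
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    # Guard against division by zero when a class is absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up labels for five warnings (not results from the paper).
actual = [True, True, False, True, False]
predicted = [True, False, False, True, True]
print(precision_recall(actual, predicted))
```

With these toy labels, two of the three warnings flagged as real defects are real (precision 2/3), and two of the three real defects are flagged (recall 2/3).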
About the Authors
Uljana Vladimirovna TSIAZHKAROB
Russian Federation
Postgraduate student of the Phystech School of Radio Engineering and Computer Technologies of MIPT, employee of the ISP RAS. Research interests: compiler technologies, static program analysis, machine learning.
Mikchail Vladimirovich BELYAEV
Russian Federation
Researcher at ISP RAS. Research interests: compiler technologies, static program analysis.
Andrey Andreevich BELEVANTSEV
Russian Federation
Dr. Sci. (Phys.-Math.), Prof., leading researcher at ISP RAS, Professor at Moscow State University. Research interests: static analysis, program optimization, parallel programming.
Valery Nikolayevich IGNATYEV
Russian Federation
Cand. Sci. (Phys.-Math.) in computer science, senior researcher at the Ivannikov Institute for System Programming of the RAS and associate professor at the System Programming Division of the CMC Faculty of Lomonosov Moscow State University. His research interests include program analysis techniques for detecting errors in source code using classical static analysis and machine learning.
For citations:
TSIAZHKAROB U.V., BELYAEV M.V., BELEVANTSEV A.A., IGNATYEV V.N. Machine Learning-Based Validation of Warnings in an Industrial Static Code Analyzer. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):101-120. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(6)-6