Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Increasing Precision of Static Code Analysis Using Large Language Models

https://doi.org/10.15514/ISPRAS-2025-37(6)-5

Abstract

This paper describes an approach that uses large language models (LLMs) to verify the results of static code analysis and filter out false-positive warnings. To construct the prompt for the LLM, the approach retains information collected by the analyzer, such as the abstract syntax trees of files, symbol tables, and type and function summaries. This information can either be included directly in the prompt or used to identify precisely the code fragments needed to verify a warning. The approach was implemented in SharpChecker, an industrial static analyzer for the C# language. Testing on real-world code showed an improvement in precision of up to 10 percentage points while maintaining high recall (0.8 to 0.97) for context-sensitive and interprocedural path-sensitive detectors of resource leaks, null dereferences, and integer overflows. For the unreachable-code detector, using information from the static analyzer improved recall by 11–27 percentage points compared with an approach that includes only the program's source code in the prompt.
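The filtering step and the two reported metrics can be illustrated with a minimal Python sketch. It is not the SharpChecker implementation: the `AnalyzerWarning` type, the prompt wording, the detector names, the `stub_llm` stand-in for a real model, and the toy warnings are all hypothetical. It also assumes that recall is measured against the true positives in the analyzer's original output, which the abstract does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AnalyzerWarning:
    detector: str   # hypothetical detector name
    code: str       # code fragment selected for the warning
    context: str    # analyzer-derived facts, e.g. a function summary
    is_real: bool   # ground-truth label from manual review (toy data)


def build_prompt(w: AnalyzerWarning) -> str:
    """Combine the code fragment with analyzer-derived context."""
    return (
        f"Detector: {w.detector}\n"
        f"Analyzer context: {w.context}\n"
        f"Code:\n{w.code}\n"
        "Is this warning a true positive? Answer YES or NO."
    )


def filter_warnings(
    warnings: List[AnalyzerWarning], ask_llm: Callable[[str], str]
) -> List[AnalyzerWarning]:
    """Keep only the warnings that the model confirms."""
    return [
        w for w in warnings
        if ask_llm(build_prompt(w)).strip().upper().startswith("YES")
    ]


def precision(reported: List[AnalyzerWarning]) -> float:
    """Share of reported warnings that are real defects."""
    return sum(w.is_real for w in reported) / len(reported) if reported else 0.0


def recall(kept: List[AnalyzerWarning], original: List[AnalyzerWarning]) -> float:
    """Share of the analyzer's real defects that survive filtering."""
    total_real = sum(w.is_real for w in original)
    return sum(w.is_real for w in kept) / total_real if total_real else 0.0


def stub_llm(prompt: str) -> str:
    """Stand-in for a model call: rejects warnings whose context rules them out."""
    return "NO" if "never null" in prompt else "YES"


# Toy data, invented for illustration only.
original = [
    AnalyzerWarning("NULL_DEREFERENCE", "a.Run();", "a may be null on this path", True),
    AnalyzerWarning("NULL_DEREFERENCE", "b.Run();", "summary: Create() is never null", False),
    AnalyzerWarning("RESOURCE_LEAK", "var s = Open();", "s is not disposed on throw", True),
    AnalyzerWarning("RESOURCE_LEAK", "var t = Open();", "t is returned to the caller", False),
    AnalyzerWarning("INTEGER_OVERFLOW", "int n = x * y;", "x and y are unbounded", True),
]
kept = filter_warnings(original, stub_llm)

if __name__ == "__main__":
    print("precision before:", precision(original))
    print("precision after: ", precision(kept))
    print("recall after:    ", recall(kept, original))
```

In this toy run the stub rejects one of the two false positives and keeps every real defect, so precision rises while recall is unchanged. A real model can also reject true positives, which is why recall has to be tracked alongside precision.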

About the Authors

Danila Dmitrievich PANOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Senior laboratory assistant at ISP RAS and a student at the CMC faculty of Lomonosov Moscow State University. His research interests include static analysis of programs and large language models.



Nikita Vladimirovich SHIMCHIK
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Cand. Sci. (Tech.), researcher at ISP RAS. His research interests include static analysis of programs and large language models.



Dmitrii Aleksandrovich CHIBISOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Postgraduate student at ISP RAS. His research interests include static program analysis and large language models.



Andrey Andreevich BELEVANTSEV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Dr. Sci. (Phys.-Math.), Prof., Corresponding Member of the RAS, leading researcher at ISP RAS, Professor at Moscow State University. His research interests include static analysis, program optimization, and parallel programming.



Valery Nikolayevich IGNATYEV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Cand. Sci. (Phys.-Math.) in computer science, senior researcher at the Ivannikov Institute for System Programming of the RAS and associate professor at the system programming division of the CMC faculty of Lomonosov Moscow State University. His research interests include program analysis techniques for detecting errors in program source code using classical static analysis and machine learning.



For citations:


PANOV D.D., SHIMCHIK N.V., CHIBISOV D.A., BELEVANTSEV A.A., IGNATYEV V.N. Increasing Precision of Static Code Analysis Using Large Language Models. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):83-100. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(6)-5



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)