Static Analysis Based on the Unified Abstract Syntax Tree
https://doi.org/10.15514/ISPRAS-2023-35(6)-6
Abstract
The paper describes a unified representation for an abstract syntax tree (AST) suitable for static analysis of several programming languages. The proposed analysis scheme consists of saving an intermediate representation in the form of a unified AST from compilers of the corresponding languages and subsequent analysis of the saved trees. We have implemented this described representation for Java, Kotlin and Python. The unified AST analyzer has 27 checkers. In the paper we present structure and entities of our unified AST, provide more details regarding language specifics that have to be reflected in the UAST representation. We give extensive experimental results that show UAST generation and analysis speed, analysis quality, and comparison with the old scheme of analyzing compiler ASTs where applicable. As a result, we see that we observe some degradation of analysis speed, but we pay it for the separation of AST construction and checkers’ implementation. This separation allows easier support of many languages in the analyzer, where one can just generate UAST and support the required checker once within the UAST infrastructure instead of implementing a checker once per language.
About the Authors
Vitaly Olegovich AFANASYEVRussian Federation
master’s student at the Faculty of Computer Science, NRU HSE, employee of ISP RAS. Research interests: compiler technologies, static analysis, JVM languages.
Alexey Evgenevich BORODIN
Russian Federation
Cand. Sci. (Phys.-Math.) - researcher. Research interests: static analysis for finding errors in source code.
Konstantin Igorevich VIHLIANCEV
Russian Federation
PSRECT MPTI student, ISP RAS researcher. Research interests: compiler technologies, static analysis, Python.
Andrey Andreevich BELEVANTSEV
Russian Federation
Dr. Sci (Phys.-Math.), Leading Researcher at ISP RAS, Professor at MSU. Research interests: static analysis, program optimization, parallel programming.
References
1. В. П. Иванников, А. А. Белеванцев, А. Е. Бородин, В. Н. Игнатьев, Д. М. Журихин, А. И. Аветисян и М. И. Леонов. Статический анализатор Svace для поиска дефектов в исходном коде программ. Труды ИСП РАН , 26(1):231—250, 2014. DOI: 10.15514/ISPRAS-2014-26(1)-7.
2. A. Belevantsev, A. Borodin, I. Dudina, V. Ignatiev, A. Izbyshev, S. Polyakov, E. Velesevich and D. Zhurikhin. Design and development of Svace static analyzers. In 2018 Ivannikov Memorial Workshop (IVMEM), pp.3—9. IEEE, 2018.
3. A.E. Borodin and A. A. Belevantsev. A static analysis tool Svace as a collection of analyzers with various complexity levels. Proceedings of the Institute for System Programming of the RAS, 27(6):111—134, 2015.
4. Clang static analyzer. URL: https://clang-analyzer.llvm.org (дата обращения 02.10.2023).
5. V. Afanasyev, S. Polyakov, A. Borodin and A. Belevantsev. Kotlin from the point of view of static analysis developer. Programming and Computer Software, 49(7):549—558, 2023. DOI: 10.1134/S0361768823070022.
6. Compiler tree api. Английский. URL: https://docs.oracle.com/javase/8/docs/jdk/api/javac/tree/index.html (дата обращения 18.09.2023).
7. Intellij platform plugin sdk. Psi elements. Английский. URL: https://plugins.jetbrains.com/docs/intellij/psi-elements.html (дата обращения 18.09.2023).
8. Ast — abstract syntax trees. Английский. URL: https://docs.python.org/3/library/ast.html (дата обращения 18.09.2023).
9. A. Belevantsev, A. Izbyshev and D. Zhurikhin. Monitoring program builds for svace static analyzer. System administrator, (7-8):135—139, 2017.
10. Treesitter. Introduction. Английский. URL: https://tree-sitter.github.io/tree-sitter/ (дата обращения 18.12.2023).
11. Python developer’s guide. Compiler design. Английский. URL: https://devguide.python.org/internals/compiler/ (дата обращения 28.09.2023).
12. Execution model. Английский. URL: https://docs.python.org/3.8/reference/executionmodel.html#execution-model (дата обращения 13.10.2023).
13. The global statement. Английский. URL: https://docs.python.org/3.8/reference/simple_stmts.html#the-global-statement (дата обращения 27.10.2023).
14. The nonlocal statement. Английский. URL: https://docs.python.org/3.8/reference/simple_stmts.html#the-nonlocal-statement (дата обращения 27.10.2023).
15. D. Strein, H. Kratz и W. Lowe. Cross-language program analysis and refactoring. In 2006 Sixth IEEE International Workshop on Source Code Analysis and Manipulation, pp 207—216. IEEE, 2006.
16. R.-H. Pfeiffer и A. Wasowski. Texmo: a multi-language development environment. In European Conference on Modelling Foundations and Applications, pp 178—193. Springer, 2012.
17. Uast – unified abstract syntax tree. Intellij platform plugin sdk. Английский. URL: https://plugins.jetbrains.com/docs/intellij/uast.html (дата обращения 03.10.2023).
18. I. D. Baxter. Dms: program transformations for practical scalable software evolution. In Proceedings of the International Workshop on Principles of Software Evolution, pp 48—51, 2002.
19. A. Raza, G. Vogel и E. Plodereder. Bauhaus–a tool suite for program analysis and reverse engineering. In Reliable Software Technologies–Ada-Europe 2006: 11th Ada-Europe International Conference on Reliable Software Technologies, Porto, Portugal, June 5-9, 2006. Proceedings 11, pp 71—82. Springer, 2006.
20. М. В. Зубов, А. Н. Пустыгин и Е. В. Старцев. Применение универсальных промежуточных представлений для статического анализа исходного программного кода. Доклады Томского государственного университета систем управления и радиоэлектроники , (1 (27)):64—68, 2013.
21. С. В. Сыромятников. Декларативный интерфейс поиска дефектов по синтаксическим деревьям: язык kast. Труды Института системного программирования РАН , 20:51—68, 2011.
22. Н. Л. Луговской и С. В. Сыромятников. Применение языка kast для преобразования исходного кода и автоматического исправления дефектов. Труды Института системного программирования РАН , 25:51—66, 2013.
23. D. Harmim, V. Marcin и O. Pavela. Scalable static analysis using Facebook Infer. I, VI-B, 2019.
24. D. Harmim. Advanced static analysis of atomicity in concurrent programs through Facebook Infer. Proceedings of the Excel@ FIT’21.
25. Ast language (al). Английский. URL: https://fbinfer.com/docs/checker-linters/ (дата обращения 02.10.2023.
Review
For citations:
AFANASYEV V.O., BORODIN A.E., VIHLIANCEV K.I., BELEVANTSEV A.A. Static Analysis Based on the Unified Abstract Syntax Tree. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(6):103-120. (In Russ.) https://doi.org/10.15514/ISPRAS-2023-35(6)-6