Preview

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Advanced search

Static Analysis on Abstract Syntax Trees Based on Finite Automata

https://doi.org/10.15514/ISPRAS-2025-37(1)-2

Abstract

The paper describes a way of organizing static analysis on abstract syntax trees (AST) widely used for finding coding errors as well as errors that do not require deep understanding of program semantics. For most known types of errors, a formalization is proposed that is based on finite automata over trees (FATs), similar to NFAs and DFAs for regular languages over an alphabet. In contrast to the theory of FATs developed for alphabets with bounded arity, which in practice fixes the number of children for each node type, the case of a finite number of children without a priori bound is considered. Nondeterministic and deterministic FAT are described, their equivalence is shown, the closure of regular languages over trees with respect to union and intersection is shown, and the linear complexity of the FAT tree recognition problem is proven. For the AST analysis algorithms not covered by regular languages over trees, context-free languages are considered.

About the Author

Valery Nikolayevich IGNATYEV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Cand. Sci. (Phys.-Math.) in computer sciences, senior researcher at Ivannikov Institute for System Programming RAS and associate professor at system programming division of CMC faculty of Lomonosov Moscow State University. His research interests include program analysis techniques for error detection in program source code using classical static analysis and machine learning.



References

1. Marpons, G., Mariño, J., Carro, M., Herranz, Á., Moreno-Navarro, J.J., Fredlund, LÅ. (2007). Automatic Coding Rule Conformance Checking Using Logic Programming. In: Hudak, P., Warren, D.S. (eds) Practical Aspects of Declarative Languages. PADL 2008. Lecture Notes in Computer Science, vol 4902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77442-6_3.

2. С.В. Сыромятников. Декларативный интерфейс поиска дефектов по синтаксическим деревьям: язык KAST. Труды Института системного программирования РАН, т. 20, 2011, с. 51-68.

3. Tomoko Matsumura, Akito Monden, and Ken-ichi Matsumoto. 2002. A method for detecting faulty code violating implicit coding rules. In Proceedings of the International Workshop on Principles of Software Evolution (IWPSE '02). Association for Computing Machinery, New York, NY, USA, 15–21. https://doi.org/10.1145/512035.512040.

4. Stan Jarzabek. – «Design of flexible static program analyzers with PQL». – В: IEEE Transactions on software engineering 24.3 (1998), pp. 197– 215.

5. А. А. Белеванцев. Многоуровневый статический анализ исходного кода программ для обеспечения качества программ. Программирование, 2017, т. 43, № 6, с. 3-26.

6. V.K. Koshelev, V.N. Ignatiev, A.I. Borzilov и A.A. Belevantsev. – «SharpChecker: Static analysis tool for C# programs». – В: Programming and Computer Software 43 (2017), pp. 268– 276.

7. Comon H. et al. Tree automata techniques and applications. – 2008.

8. Gécseg, Ferenc, and Magnus Steinby. «Tree automata» --- 1984, 2015.

9. Neven F., Schwentick T. Query automata over finite trees //Theoretical Computer Science. – 2002. – Т. 275. – №. 1-2. – pp. 633-674.

10. M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Transactions on Internet Technology, 5(4):660–704, 2005.

11. G Shobha et al. – «Code clone detection– a systematic review». – В: Emerging Technologies in Data Mining and Information Security: Proceedings of IEMIS 2020, Volume 2 (2021), pp. 645– 655.

12. D.A. Koryabkin, and V.N. Ignatyev. – «Automatic Mining of Code Fix Patterns from Code Repositories». – В: 2022 Ivannikov Memorial Workshop (IVMEM). – IEEE. 2022, – pp. 27– 34.

13. U.V. Tsiazhkorob и V.N. Ignatyev. – «Classification of Static Analyzer Warnings using Machine Learning Methods». – В: 2024 Ivannikov Memorial Workshop (IVMEM). – IEEE. 2024, – pp. 69– 74.

14. N.V. Shimchik, V.N. Ignatyev, and A.A. Belevantsev. – «Improving accuracy and completeness of source code static taint analysis». – В: 2021 Ivannikov ISPRAS OPEN Conference (ISPRAS). – IEEE. 2021, – pp. 61– 68.


Review

For citations:


IGNATYEV V.N. Static Analysis on Abstract Syntax Trees Based on Finite Automata. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(1):41-54. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(1)-2



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)