Static analyzer debugging and quality assurance approaches
https://doi.org/10.15514/ISPRAS-2020-32(3)-3
Abstract
Writing static analyzers is hard because of the many equivalent transformations between program source code, intermediate representation and large formulas in Satisfiability Modulo Theories (SMT) format. Traditional methods such as debugger use, instrumentation and logging force developers to concentrate on specific minor issues, while each analyzer architecture imposes its own view of how the intermediate results needed for debugging should be represented. Error debugging therefore remains a concern for every static analysis researcher. This paper presents our experience debugging a work-in-progress industrial static analyzer. We describe the most effective techniques from the constructive (code generation), testing (random test case generation) and logging (log fusion and visual representation) groups. Code generation helps avoid issues with copied code; we enhance it with verification of how the generated code is used. Goal-driven random test case generation reduces the risk of developing a tool biased towards specific syntactic constructions by producing verifiable test programs with assertions. Log fusion merges module logs and sets up cross-references between them. The visual representation module shows the combined log, presents the major data structures, and provides health and performance reports in the form of log fingerprints. These methods are implemented on the basis of Equid, a static analysis framework for industrial applications, and are used internally for development purposes; in the paper they are presented, studied and evaluated. The main contributions include a study of failure reasons in the author's project, a set of methods, their implementations, testing results and two case studies demonstrating the usefulness of the methods.
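As an illustration of the log-fusion idea, the following is a minimal sketch only: the abstract describes log fusion at the level of "merge module logs and set up cross-references between them", so the entry layout, the module names and the use of timestamps as the merge key below are assumptions for illustration, not Equid's actual design.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// One parsed entry of a module log. The fields are hypothetical:
// the paper does not document Equid's real log format.
struct LogEntry {
    uint64_t timestamp;   // event time, used as the merge key
    std::string module;   // e.g. "frontend", "ir", "smt"
    std::string objectId; // the analyzer object the entry refers to
    std::string message;
};

// Fuse several module logs into one time-ordered stream and set up
// cross-references: each fused entry points back to the previous
// entry (from any module) that mentioned the same object.
std::vector<std::string> fuseLogs(std::vector<std::vector<LogEntry>> logs) {
    std::vector<LogEntry> all;
    for (const auto& log : logs)
        all.insert(all.end(), log.begin(), log.end());
    std::stable_sort(all.begin(), all.end(),
                     [](const LogEntry& a, const LogEntry& b) {
                         return a.timestamp < b.timestamp;
                     });
    std::unordered_map<std::string, size_t> lastSeen; // objectId -> fused index
    std::vector<std::string> fused;
    for (size_t i = 0; i < all.size(); ++i) {
        std::string line = "#" + std::to_string(i) + " [" + all[i].module +
                           "] " + all[i].message;
        auto it = lastSeen.find(all[i].objectId);
        if (it != lastSeen.end())
            line += " (see #" + std::to_string(it->second) + ")";
        lastSeen[all[i].objectId] = i;
        fused.push_back(line);
    }
    return fused;
}

int main() {
    // Two tiny hypothetical module logs concerning the same function "f1".
    std::vector<LogEntry> frontend = {
        {1, "frontend", "f1", "parsed function f1"},
        {4, "frontend", "f2", "parsed function f2"}};
    std::vector<LogEntry> smt = {
        {2, "smt", "f1", "built formula for f1"},
        {3, "smt", "f1", "solver returned sat for f1"}};
    for (const auto& line : fuseLogs({frontend, smt}))
        std::cout << line << '\n';
}

The cross-references are what make the fused log useful for debugging: an unexpected SMT result can be traced back through the same object's earlier frontend and intermediate-representation entries instead of being read in isolation.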
About the Author
Maxim Alexandrovich MENSHIKOV, Russian Federation
PhD student, Department of System Programming
References
1. L. Nguyen Quang Do, S. Krüger, P. Hill, K. Ali, E. Bodden. Debugging static analysis. IEEE Transactions on Software Engineering, 2018.
2. M. Menshikov. Equid – a static analysis framework for industrial applications. Lecture Notes in Computer Science, vol. 11619, 2019, pp. 677–692.
3. GDB: The GNU Project Debugger. Available at: https://www.gnu.org/software/gdb/.
4. The LLDB Debugger. Available at: https://lldb.llvm.org.
5. The interactive reverse debugger for Linux-based applications. Available at: https://undo.io/solutions/products/undodb-reverse-debugger/.
6. R. O’Callahan, C. Jones, N. Froyd, K. Huey, A. Noll, and N. Partush. Engineering record and replay for deployability. In Proc. of the 2017 USENIX Annual Technical Conference (USENIX ATC’17), 2017, pp. 377–389.
7. J. Engblom. A review of reverse debugging. In Proc. of the 2012 System, Software, SoC and Silicon Debug Conference, 2012, pp. 1–6.
8. E. Eide and J. Regehr. Volatiles are miscompiled, and what to do about it. In Proc. of the 8th ACM International Conference on Embedded Software, 2008, pp. 255–264.
9. X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proc. of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011, pp. 283–294.
10. G. Barany. Liveness-driven random program generation. Lecture Notes in Computer Science, vol. 10855, 2018, pp. 112–127.
11. V.Yu. Livinskij and D.Yu. Babokin. Automation of search for optimization errors in C/C++ language compilers using the Yet Another Random Program Generator. In Proc. of the 60th All-Russian Scientific Conference of MIPT, Radio Engineering and Computer Technology, 2017, pp. 40–42 (in Russian).
12. S. Takakura, M. Iwatsuji, and N. Ishiura. Extending equivalence transformation based program generator for random testing of c compilers. In Proc. of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, 2018, pp. 9–15.
13. M. Chupilko, A. Kamkin, A. Kotsynyak, and A. Tatarnikov. Microtesk: Specification-based tool for constructing test program generators. Lecture Notes in Computer Science, vol. 10629, 2017, pp. 217–220.
14. D. Binkley, M. Harman, and J. Krinke. Characterising, explaining, and exploiting the approximate nature of static analysis through animation. In Proc. of the 2006 Sixth IEEE International Workshop on Source Code Analysis and Manipulation, 2006, pp. 43–52.
15. Sourcetrail – documentation. Available at: https://www.sourcetrail.com/documentation.
16. L. Voinea, A. Telea, and J. J. Van Wijk. Cvsscan: visualization of code evolution. In Proc. of the 2005 ACM symposium on Software visualization, 2005, pp. 47–56.
17. C. Collberg, S. Kobourov, J. Nagra, J. Pitts, and K. Wampler. A system for graph-based visualization of the evolution of software. In Proc. of the 2003 ACM Symposium on Software Visualization, 2003, pp. 77–86.
18. J.P.S. Alcocer, F. Beck, and A. Bergel. Performance evolution matrix: Visualizing performance variations along software versions. In Proc. of the 2019 Working Conference on Software Visualization (VISSOFT), 2019, pp. 1–11.
19. D. Yuan, S. Park, and Y. Zhou. Characterizing logging practices in open-source software. In Proc. of the 2012 34th International Conference on Software Engineering (ICSE), 2012, pp. 102–112.
20. Q. Fu, J. Zhu, W. Hu, J.-G. Lou, R. Ding, Q. Lin, D. Zhang, and T. Xie. Where do developers log? An empirical study on logging practices in industry. In Companion Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 24–33.
21. D. Jurafsky, J. Martin, P. Norvig, and S. Russell. Speech and Language Processing. Pearson Education, 2014, 1032 p.
22. M. Du, F. Li, G. Zheng, and V. Srikumar. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In Proc. of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1285–1298.
23. A. Nandi, A. Mandal, S. Atreja, G. B. Dasgupta, and S. Bhattacharya. Anomaly detection using program control flow graph mining from execution logs. In Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 215–224.
24. O.I. Sheluhin, V.S. Rjabinin, and M.A. Farmakovskij. Anomaly detection in computer systems by intellectual analysis of system logs. Voprosy kiberbezopasnosti, vol. 26, no. 2, 2018, pp. 33–43 (in Russian).
25. J.B. Smith and S.F. Weiss. Hypertext. Communications of the ACM, vol. 31, no. 7, 1988, pp. 816–819.
26. R. Stallman, R. Pesch, and S. Shebs. Debugging with GDB: The GNU Source-Level Debugger. 12th Media Services, 2018, 826 p.
For citations:
MENSHIKOV M.A. Static analyzer debugging and quality assurance approaches. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(3):33-47. https://doi.org/10.15514/ISPRAS-2020-32(3)-3