Preview

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Advanced search

Tolerant parsing with a special kind of «Any» symbol: the algorithm and practical application

https://doi.org/10.15514/ISPRAS-2018-30(4)-1

Abstract

Tolerant parsing is a form of syntax analysis aimed at capturing the structure of certain points of interest presented in a source code. While these points should be well-described in the corresponding language grammar, other parts of the program are allowed to be not presented in the grammar or to be described coarse-grained, thereby parser remains tolerant to the possible inconsistencies in the irrelevant area. Island grammars are one of the basic tolerant parsing techniques. “Island” is used as the relevant code alias, while the irrelevant code is called “water”. In the paper, a modified LL(1) parsing algorithm with built-in “Any” symbol processing is described. The “Any” symbol matches implicitly defined token sequences. The use of the algorithm for island grammars allows one to reduce irrelevant code description as well as to simplify patterns for relevant code matching. Our “Any” implementation is more accurate and less restrictive in comparison with the closest analogues implemented in Coco/R and LightParse parser generators. It also has potentially lower overhead than the “bounded seas” concept implemented in PetitParser. As shown in the experimental section, the tolerant parser generated by the C# island grammar is proven to be applicable for large-scale software projects analysis.

About the Authors

A. V. Goloveshkin
I.I. Vorovich Institute for Mathematics, Mechanics and Computer Science, Southern Federal University
Russian Federation


S. S. Mikhalkovich
I.I. Vorovich Institute for Mathematics, Mechanics and Computer Science, Southern Federal University
Russian Federation


References

1. Goloveshkin A.V. Searching and analysing crosscutting concerns in marked up programming language grammar. Izvestija vuzov. Severo-Kavkazskij region. Tehnicheskie nauki [University News. North-Caucasian Region. Technical Sciences Series], 2017, issue 3, pp. 29–34 (in Russian). DOI: 10.17213/0321-2653-2017-3-29-34.

2. Afroozeh A., Bach J.-C., van den Brand M., Johnstone A., Manders M., Moreau P.-E., Scott E. Island grammar-based parsing using GLL and Tom. Software Language Engineering: 5th International Conference, SLE 2012, Dresden, Germany, September 26-28, 2012, Revised Selected Papers. Springer Berlin Heidelberg, 2013, pp. 224–243.

3. Van den Brand M., Sellink M.P.A., Verhoef C. Obtaining a COBOL grammar from legacy code for reengineering purposes. In Proceedings of the 2nd International Conference on Theory and Practice of Algebraic Specifications. BCS Learning & Development Ltd., 1997, pp. 6–16.

4. Moonen L. Generating robust parsers using island grammars. In Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE’01). IEEE Computer Society, 2001, pp. 13–22.

5. Moonen L. Lightweight impact analysis using island grammars. In Proceedings of the 10th International Workshop on Program Comprehension (IWPC). IEEE Computer Society, 2002, pp. 219–228.

6. Graham S.L., Haley C.B., Joy W.N. Practical LR error recovery. SIGPLAN Notes, vol. 14, issue 8, 1979, pp. 168–175.

7. Burke M.G., Fisher G.A. A practical method for LR and LL syntactic error diagnosis and recovery. ACM Trans. Program. Lang. Syst., vol. 9, issue 2, 1987, pp. 164–197.

8. De Jonge M., Nilsson-Nyman E., Kats L.C.L., Visser E. Natural and flexible error recovery for generated parsers. Software Language Engineering: Second International Conference, SLE 2009, Denver, CO, USA, October 5-6, 2009, Revised Selected Papers. Springer Berlin Heidelberg, 2010, pp. 204–223.

9. Nilsson-Nyman E., Ekman T., Hedin G. Practical scope recovery using bridge parsing. Software Language Engineering: First International Conference, SLE 2008, Toulouse, France, September 29-30, 2008. Revised Selected Papers. Springer Berlin Heidelberg, 2009, pp. 95–113.

10. Koppler R. A systematic approach to fuzzy parsing. Software: Practice and Experience, vol. 27, issue 6, 1997, pp. 637–649.

11. Carvalho P., Oliveira N., Henriques P.R. Unfuzzying fuzzy parsing. 3rd Symposium on Languages, Applications and Technologies, ser. OpenAccess Series in Informatics (OASIcs), vol. 38, 2014, pp. 101–108

12. S. Klusener and R. Lämmel, Deriving tolerant grammars from a base-line grammar. In Proceedings of the International Conference on Software Maintenance. IEEE Computer Society, 2003, pp. 179–188.

13. Aycock J., Horspool R.N., Schrödinger’s token. Software: Practice and Experience, vol. 31, issue 8, 2001, pp. 803–814.

14. Grune D., Jacobs C.J. Parsing Techniques: A Practical Guide (2nd Edition). Springer-Verlag, New York, 2008, 662 p.

15. Scott E., Johnstone A. GLL parsing. Electron. Notes Theor. Comput. Sci., vol. 253, issue 7, 2010, pp. 177–189.

16. Mössenböck H. (2014) The compiler generator Coco/R. Available at: http://ssw.jku.at/Coco/Doc/UserManual.pdf, accessed 02.03.2018.

17. Malevannyy M. Lightweight parsing and its application in development environment. Informatizatsiya i svyaz [Informatization and communication], 2015, vol. 3, pp. 89–94 (in Russian).

18. Malevannyy M.S., Mikhalkovich S.S. Context-based model for concern markup of a source code. Trudy ISP RAN/Proc. ISP RAS, 2016, vol. 28, issue 2, pp. 63–78. DOI: 10.15514/ISPRAS-2016-28(2)-4.

19. Kurš J., Lungu M., Iyadurai R., Nierstrasz O. Bounded seas. Comput. Lang. Syst. Struct., 2015, vol. 44, pp. 114–140

20. Ford B. Packrat parsing: Simple, powerful, lazy, linear time. Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming, ser. ICFP ’02. ACM, 2002, pp. 36–47.

21. Aho A.V., Lam M.S., Sethi R., Ullman J.D. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., 2006, 1000 p.

22. Parr T., Harwell S., Fisher K. Adaptive LL(*) parsing: The power of dynamic analysis. SIGPLAN Notes, vol. 49, issue 10, 2014, pp. 579–598.


Review

For citations:


Goloveshkin A.V., Mikhalkovich S.S. Tolerant parsing with a special kind of «Any» symbol: the algorithm and practical application. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2018;30(4):7-28. https://doi.org/10.15514/ISPRAS-2018-30(4)-1



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)