Modification of Valiant’s algorithm for the string-matching problem
https://doi.org/10.15514/ISPRAS-2020-32(2)-11
Abstract
The theory of formal languages and, particularly, context-free grammars has been extensively studied and applied in different areas. For example, several approaches to the recognition and classification problems in bioinformatics are based on searching the genomic subsequences possessing some specific features which can be described by a context-free grammar. Therefore, the string-matching problem can be reduced to parsing – verification if some subsequence can be derived in this grammar. Such field of application as bioinformatics requires working with a large amount of data, so it is necessary to improve the existing parsing techniques. The most asymptotically efficient parsing algorithm that can be applied to any context-free grammar is a matrix-based algorithm proposed by Valiant. This paper aims to present Valiant’s algorithm modification, which main advantage is the possibility to divide the parsing table into successively computed layers of disjoint submatrices where each submatrix of the layer can be processed independently. Moreover, our approach is easily adapted for the string-matching problem. Our evaluation shows that the proposed modification retains all benefits of Valiant’s algorithm, especially its high performance achieved by using fast matrix multiplication methods. Also, the modified version decreases a large amount of excessive computations and accelerates the substrings searching.
About the Authors
Yuliya Alekseevna SUSANINARussian Federation
Graduate student
Anna Nikitichna YAVEYN
Russian Federation
Graduate student
Semyon Vyacheslavovich GRIGOREV
Russian Federation
PhD, Lecturer
References
1. Okhotin A. Conjunctive grammars. Journal of Automata, Languages and Combinatorics, vol. 6, no. 4, 2001, pp. 519-535.
2. Chomsky N. On certain formal properties of grammars. Information and control, vol. 2, no. 2, 1959, pp. 137-167.
3. Kasami T. An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Laboratory, 1965.
4. Younger D.H. Recognition and parsing of context-free languages in time n3. Information and control, vol. 10, no. 2, 1967, pp. 189-208.
5. Valiant L.G. General context-free recognition in less than cubic time. Journal of computer and system sciences, vol. 10, no. 2, 1975, pp. 308-315.
6. Okhotin A. Conjunctive and Boolean grammars: the true general case of the context-free grammars. Computer Science Review, vol. 9, 2013, pp. 27-59.
7. Okhotin A. Boolean grammars. Information and Computation, vol. 194, issue 1, 2004, pp. 19-48.
8. Okhotin A. Parsing by matrix multiplication generalized to Boolean grammars. Theoretical Computer Science, vol. 516, 2014, pp. 101-120.
9. Albrecht M., Bard G. The M4RI Library. The M4RI Team. 2019. Available at: https://bitbucket.org/malb/m4ri, accessed 20.01.2010.
10. Earley J. An effcient context-free parsing algorithm. Communications of the ACM, vol. 13, no. 2, 1970, pp.94-102
11. Bernardy J.P., Claessen K. Effcient divide-and-conquer parsing of practical context-free languages. ACM SIGPLAN Notice, vol. 48, no. 9, 2013, pp.111-122
12. Grigorev S., Lunina P. The Composition of Dense Neural Networks and Formal Grammars for Secondary Structure Analysis. In Proc. of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies, vol. 3, 2019, pp. 234-241.
13. Rivas E., Eddy S.R. The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics, vol. 16, no. 4, 2000, pp. 334-340.
14. Knudsen B. and Hein J. Rna secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics, vol. 15, no. 6, 1999, pp. 446-454.
15. Dowell R.D., Eddy S.R. Evaluation of several lightweight stochastic context-free grammars for rna secondary structure prediction. BMC bioinformatics, vol. 5, no. 1, 2004, Article number: 71.
16. Durbin R., Eddy S., Krogh A., and Mitchison G. Biological sequence analysis. Cambridge University Press, 1996, 356 p.
17. Hopcroft J.E., Ullman J.D. Formal languages and their relation to automata. Addison-Wesley, 1969, 242 p.
Review
For citations:
SUSANINA Yu.A., YAVEYN A.N., GRIGOREV S.V. Modification of Valiant’s algorithm for the string-matching problem. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(2):135-148. (In Russ.) https://doi.org/10.15514/ISPRAS-2020-32(2)-11