Preview

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Advanced search

Spectral analytical method of recognition of inexact repeats in character sequences

https://doi.org/10.15514/ISPRAS-2015-27(6)-21

Abstract

Proposed are theoretical basis and algorithmic implementation of spectral-analytical method of recognition of repeats in character sequences. The theoretical justification is based on the theorem on equivalent representation of the character sequence by the vector of continuous characteristic functions. Comparison of fragments of characteristic functions is performed in the standard metric in Euclidean space of expansion coefficients of the Fourier series of orthogonal polynomials. An essential feature of this approach is the ability to evaluate repeats at different scales. Another important feature is the possibility of efficient parallelization of data. In the development of algorithms we preferred scheme of computing with a minimal amount of references to memory, implying repetitive calculations and evaluations on demand. In this paradigm, proposed is an algorithm for calculating the coefficients of expansions in the orthogonal polynomials through the use of recurrence relations. It is shown that the algorithm for calculating the coefficients of expansions in the orthogonal polynomials can be effectively vectorized by computing with a fixed vector length. Parallelization and vectorization implemented using the OpenMP standard and extension Cilk Plus of language C/C++. The developed method effectively scales, depending on the parameters of the problem and the number of processor cores on systems with shared memory.

About the Authors

A. N. Pankratov
IMPB RAS
Russian Federation


R. K. Tetuev
IMPB RAS
Russian Federation


M. I. Pyatkov
IMPB RAS
Russian Federation


V. P. Toigildin
Lomonosov Moscow State University, 2nd Education Building
Russian Federation


N. N. Popova
Lomonosov Moscow State University, 2nd Education Building
Russian Federation


References

1. Dedus F.F., Kulikova L.I., Makhortykh S.A., Nazipova N.N., Pankratov A.N. and Tetuev R.K. Analytical Recognition Methods for Repeated Structures in Genomes. Doklady Mathematics, 2006, Vol. 74, №3, pp. 926-929, doi: 10.1134/S1064562406060354.

2. Dedus F.F., Kulikova L.I., Makhortykh S.A., Nazipova N.N., Pankratov A.N., and Tetuev R.K. Recognition of the Structural–Functional Organization of Genetic Sequences. Moscow University Computational Mathematics and Cybernetics, 2007, Vol. 31, No. 2, pp.49–53, doi: 10.3103/S0278641907020021.

3. Pankratov A.N., Gorchakov M.A., Dedus F.F., Dolotova N.S., Kulikova L.I., Makhortykh S.A., Nazipova N.N., Novikova D.A., Olshevets M.M., Pyatkov M.I., Rudnev V.R., Tetuev R.K., and Filippov V.V. Spectral Analysis for Identification and Visualization of Repeats in Genetic Sequences. Pattern Recognition and Image Analysis, 2009, Vol. 19, №4, pp. 687–692.

4. Tetuev R.K., Nazipova N.N., Pankratov A.N., Dedus F.F. Search for Megasatellite Tandem Repeats in Eukaryotic Genomes by Estimation of GC-content Curve Oscillations. Math. Biol. Bioinf. 2010, 5(1):30-42, doi: 10.17537/2010.5.30.

5. Pankratov A.N., Pyatkov M.I., Tetuev R.K., Nazipova N.N., Dedus F.F. Search for Extended Repeats in Genomes Based on the Spectral-Analytical Method. Math. Biol. Bioinf. 2012;7(2):476-492, doi: 10.17537/2012.7.476.

6. Pyatkov M.I., Pankratov A.N. SBARS: fast creation of dotplots for DNA sequences on different scales using GA-, GC-content. Bioinformatics, Vol. 30, №12, 2014, pp. 1765–1766, doi: 10.1093/bioinformatics/btu095.

7. W.H.Press, S.A.Teukolsky, W.T.Vetterling, B.P.Flannery Numerical Recipes. The Art of Scientific Computing. Third Edition. Cambridge University Press, 2007, 1256 pp.


Review

For citations:


Pankratov A.N., Tetuev R.K., Pyatkov M.I., Toigildin V.P., Popova N.N. Spectral analytical method of recognition of inexact repeats in character sequences. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(6):335-344. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(6)-21



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)