Modification of the Smith-Waterman Algorithm for Local Alignment of Genetic Sequences Based on the Window Method
https://doi.org/10.15514/ISPRAS-2025-37(5)-14
Abstract
The paper presents a modified algorithm for local alignment of genetic sequences based on the Smith-Waterman algorithm, using window method and run-length encoding. an experimental comparison of the performance of the proposed approach with the classical algorithm by such metrics as execution time, average and peak memory usage is carried out. The results demonstrate the effectiveness of the modification while preserving the quality of alignment, especially in resource-constrained environments. The work has practical implications for bioinformatics tasks involving genome analysis, gene annotation and homologous site search.
Keywords
About the Authors
Ekaterina Sergeevna BEZUGLOVARussian Federation
Graduated with a master's degree in Mathematics and Computer Science in 2023 from the North Caucasus Federal University. She is currently a postgraduate student and a senior mid-level employee at the Research Laboratory of Biological and Medical Informatics at the Faculty of Medicine and Biology at the North Caucasus Federal University. Her research interests include bioinformatics, data analysis, machine learning, neural networks, and cloud computing.
Egor Mikhailovich SHIRIAEV
Russian Federation
Graduated from the master's program "Applied Mathematics and Computer Science" in 2022, at the North Caucasus Federal University. Currently, he is a postgraduate student, a research engineer at the Department of Science, and an assistant at the Department of Computational Mathematics and Cybernetics at the North Caucasus Federal University. His research interests are in homomorphic encryption methods, privacy-preserving neural networks, cloud computing and cybersecurity.
Nikolay Nikolaevich KUCHEROV
Russian Federation
Cand. Sci. (Tech.) since 2018. He graduated from North-Caucasus Federal University, Stavropol, Russia, in 2012, and has been working as an Assistant Professor with the Department of Mathematical Analysis, Algebra and Geometry, North-Caucasus Federal University, Stavropol, Russia, since 2020. His research interests include cloud computing, high-performance computing, residue number systems, neural networks, and cybersecurity.
Mikhail Grigoryevich BABENKO
Russian Federation
Dr. Sci. (Phys.-Math.), Head of the Department of Computational Mathematics and Cybernetics, Faculty of Mathematics and Computer Science named after Professor N.I. Chervyakov, North Caucasus Federal University. His research interests include cloud computing, high-performance computing, residue number systems, neural networks, cryptography
References
1. Baxevanis A.D., Bader G.D., Wishart D.S. Bioinformatics. Hoboken, NJ: John Wiley & Sons, 2020, 656 p.
2. Olson M. V. The human genome project. Proceedings of the National Academy of Sciences, 1993. Vol. 90, no. 10, pp. 4338-4344.
3. Smith T. F., Waterman M. S. Identification of common molecular subsequences. Journal of Molecular Biology, 1981, vol. 147, no. 1, pp. 195-197.
4. Flynn M. J. Some Computer Organizations and Their Effectiveness. IEEE Transactions on Computers, 1972, vol. C-21, no. 9, pp. 948–960, DOI: 10.1109/TC.1972.5009071.
5. Farrar M. Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics, 2007, vol. 23, no. 2, pp. 156-161, DOI: 10.1093/bioinformatics/btl582.
6. Barron E. T., Glorioso R. M. A micro controlled peripheral processor. Conference record of the 6th annual workshop on Microprogramming, in MICRO 6. New York, NY, USA: Association for Computing Machinery, 1973, pp. 122-128, DOI: 10.1145/800203.806247.
7. Liu Y., Wirawan A., Schmidt B. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics, 2013, vol. 14, no. 1, p. 117, DOI: 10.1186/1471-2105-14-117.
8. Rognes T. Faster Smith-Waterman database searches with inter-sequence SIMD parallelization. BMC Bioinformatics, 2011, vol. 12, no. 1, p. 221, DOI: 10.1186/1471-2105-12-221.
9. Robinson A. H., Cherry C. Results of a prototype television bandwidth compression scheme. Proceedings of the IEEE, 1967, vol. 55, no. 3, pp. 356–364, DOI: 10.1109/PROC.1967.5493.
10. Collett D. Modelling Binary Data. 2nd ed. New York: Chapman and Hall/CRC, 2002, p. 367, DOI: 10.1201/b16654
11. Lavrov I., Maksimova L. Problems in Set Theory, Mathematical Logic and the Theory of Algorithms. Springer Science & Business Media, 2003, p. 275.
12. Sayood K. Introduction to data compression. Morgan Kaufmann, 2017, p. 735.
13. Salomon D. A concise introduction to data compression. Springer Science & Business Media, 2007, p. 305.
14. Cormen T. H., Leiserson C. E., Rivest R. L. Introduction to algorithms. MIT press, 2022, p. 1251.
15. Rissanen J., Langdon G. G. Arithmetic Coding. IBM Journal of Research and Development, 1979, vol. 23, no. 2, pp. 149-162, DOI: 10.1147/rd.232.0149.
16. Chervyakov N., Babenko M., Tchenykh A., Dvoryaninova I., Kucherov N. Towards reliable low cost distributed storage in multi-clouds. 2017 International Siberian Conference on Control and Communications (SIBCON), 2017, pp. 1-6. DOI: 10.1109/SIBCON.2017.7998476.
17. Huffman D. A. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, 1952, vol. 40, no. 9, pp. 1098-1101, DOI: 10.1109/JRPROC.1952.273898.
18. Witten I. H., Neal R. M., Cleary J. G. Arithmetic coding for data compression. Commun. ACM, 1987, vol. 30, no. 6, pp. 520-540, DOI: 10.1145/214762.214771.
19. Ziv J., Lempel A. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 1977, vol. 23, no. 3, pp. 337-343, DOI: 10.1109/TIT.1977.1055714.
20. Winters K. D., Owsley P. A., French C. A., Bode R. M., Feeley P. S. Adaptive data compression system with systolic string-matching logic, US5532693A, 1996.
21. Welch T. A. A Technique for High-Performance Data Compression. Computer, 1984, vol. 17, no. 06, pp. 8-19, DOI: 10.1109/MC.1984.1659158.
22. Nelson M., Gailly J. L. The data compression book 2nd edition. M & T Books, New York, NY., 1995, p. 576.
23. Gusfield D. Algorithms on Stings, Trees, and Sequences: Computer Science and Computational Biology. SIGACT News, 1997, vol. 28, no. 4, pp. 41-60, DOI: 10.1145/270563.571472.
24. Rana Y. Python: simple though an important programming language. International Research Journal of Engineering and Technology (IRJET), 2019, vol. 6. no. 2, pp. 1856-1858.
Review
For citations:
BEZUGLOVA E.S., SHIRIAEV E.M., KUCHEROV N.N., BABENKO M.G. Modification of the Smith-Waterman Algorithm for Local Alignment of Genetic Sequences Based on the Window Method. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(5):183-194. https://doi.org/10.15514/ISPRAS-2025-37(5)-14






