Optimization of Short Reads Alignment with Indels in Whole-Genome Sequencing
https://doi.org/10.15514/ISPRAS-2025-37(6)-30
Abstract
We present a novel method for aligning reads in whole-genome sequencing (WGS), aimed at improving alignment accuracy and the practical efficiency of this stage of genomic analysis. Unlike graph-based approaches, the proposed algorithm directly integrates knowledge of known genetic variants into the alignment process, enabling more accurate mapping of reads to the reference genome without constructing complex graph structures. The method has demonstrated high effectiveness on real sequencing data: we observed a consistent improvement in read alignment quality in highly variable and difficult-to-map regions of the genome. In particular, using variant information allows more precise alignment of reads that contain alternative alleles, reducing the number of mapping errors in these regions. At the same time, the required computational resources remain at an acceptable level, making this solution applicable in standard WGS pipelines without a significant increase in workload. The alignment speed of the algorithm is comparable to traditional solutions, which facilitates its integration into existing analytical pipelines.
The practical value of the method lies in the improved alignment accuracy, which directly affects the quality of downstream variant calling and other analyses. The proposed approach can serve as an effective alternative to current graph-based alignment methods, providing comparable improvements in alignment quality with lower complexity of implementation. Future work will include optimizing the algorithm’s performance, expanding the set of genetic variants accounted for, and conducting in-depth comparisons with other tools. These steps are intended to further increase the method’s efficiency and reliability, reinforcing its significance for practical use in genomics.
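The idea of using known variants during alignment without building a graph can be illustrated with a toy sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is a minimal example under stated assumptions: each known variant is applied to the reference on its own to form an alternative haplotype, the read is scored against every candidate with a plain edit-distance fit, and the lowest score wins. The names `Variant`, `apply_variant`, `best_fit_distance`, and `align_with_known_variants` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A known variant in VCF-like form: 0-based position, REF and ALT alleles."""
    pos: int
    ref: str
    alt: str


def apply_variant(reference: str, variant: Variant) -> str:
    """Return the reference with the variant's ALT allele substituted in."""
    end = variant.pos + len(variant.ref)
    if reference[variant.pos:end] != variant.ref:
        raise ValueError("REF allele does not match the reference sequence")
    return reference[:variant.pos] + variant.alt + reference[end:]


def best_fit_distance(read: str, target: str) -> int:
    """Minimum edit distance between the whole read and any substring of target."""
    # Row 0 is all zeros: the read may start at any target position for free.
    prev = [0] * (len(target) + 1)
    for i, base in enumerate(read, start=1):
        curr = [i] + [0] * len(target)
        for j, tbase in enumerate(target, start=1):
            curr[j] = min(
                prev[j - 1] + (base != tbase),  # match or mismatch
                prev[j] + 1,                    # read base with no target base
                curr[j - 1] + 1,                # target base skipped inside the read
            )
        prev = curr
    # The read may also end at any target position.
    return min(prev)


def align_with_known_variants(read, reference, variants):
    """Score the read against the reference and each single-variant haplotype.

    Returns (distance, variant or None) for the best-scoring candidate;
    ties are resolved in favour of the plain reference.
    """
    best = (best_fit_distance(read, reference), None)
    for variant in variants:
        distance = best_fit_distance(read, apply_variant(reference, variant))
        if distance < best[0]:
            best = (distance, variant)
    return best


# Toy data (assumed for illustration): a 3-base deletion with a VCF-style anchor base.
reference = "GATTACAGGCTTAACCGT"
deletion = Variant(pos=7, ref="GGCT", alt="G")
read = "TACAGTAACC"  # sampled from the haplotype that carries the deletion
print(align_with_known_variants(read, reference, [deletion]))
```

In this toy case the read matches the deletion-carrying haplotype exactly, while against the plain reference it can only be placed with edits. A practical aligner would restrict the comparison to a seeded window, use affine gap penalties, and report the result in reference coordinates; none of that is modelled here.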
About the Authors
Nikita Artemovich KOLTUNOV
Russian Federation
Laboratory assistant at the Ivannikov Institute for System Programming of the Russian Academy of Sciences, a specialist in bioinformatics.
Egor Pavlovich GUGUCHKIN
Russian Federation
Postgraduate student at the Ivannikov Institute for System Programming of the Russian Academy of Sciences, a specialist in bioinformatics.
Evgeny Andreevich KARPULEVICH
Russian Federation
Cand. Sci. (Phys.-Math.), Ivannikov Institute for System Programming of the Russian Academy of Sciences.
For citations:
KOLTUNOV N.A., GUGUCHKIN E.P., KARPULEVICH E.A. Optimization of Short Reads Alignment with Indels in Whole-Genome Sequencing. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):211-222. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(6)-30






