
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Modification of the Method for Calculating Polygenic Risks With Variation Graph

https://doi.org/10.15514/ISPRAS-2022-34(2)-15

Abstract

A DNA sequence can be represented in several ways. The variation graph is one of the most accurate representations: it makes it possible to work with atypical regions and to account for their full diversity. Based on this data structure and the polygenic risk scoring method, a DNA interpretation system was built. It yields a correlation coefficient between the path in the graph that corresponds to a specific DNA sequence and the trait. This coefficient was then compared with one obtained by a similar method that represents sequences against a reference genome. The comparison made it possible to evaluate the effectiveness of the graph representation. Next, a modified method for calculating the polygenic score from the alignment data of the vg tool was developed and compared with existing methods. The modified method improved prediction of the trait.
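As an illustration of the two quantities the abstract refers to, the following is a minimal sketch of a conventional additive polygenic score and of its Pearson correlation with a trait. It is not the authors' modified method and does not use vg alignment data; the function names, effect sizes, genotype dosages, and trait values are all hypothetical and chosen only for demonstration.

```python
import math


def polygenic_score(dosages, effect_sizes):
    """Additive score: sum of allele dosages (0, 1 or 2) weighted by effect size."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: three variants, four individuals.
effect_sizes = [0.5, -0.2, 0.1]   # assumed per-variant effect sizes
genotypes = [                     # assumed allele dosages per individual
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
    [2, 2, 2],
]
trait = [0.2, 1.0, 0.5, 0.7]      # assumed measured trait values

scores = [polygenic_score(g, effect_sizes) for g in genotypes]
r = pearson(scores, trait)
print("scores:", scores)
print("correlation with trait:", r)
```

In a comparison like the one described above, such a coefficient would be computed once for scores derived from the graph-based representation and once for scores derived from the reference-genome representation, and the two values compared.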

About the Authors

Olesia Anatolevna Kondrateva
Ivannikov Institute for System Programming of the RAS, Lomonosov Moscow State University
Russian Federation

Master's student at the Department of System Programming, MSU; also works at ISP RAS



Evgeny Andreevich Karpulevich
Ivannikov Institute for System Programming of the RAS
Russian Federation

Specialist, Information Systems Department





For citations:


Kondrateva O.A., Karpulevich E.A. Modification of the Method for Calculating Polygenic Risks With Variation Graph. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(2):191-200. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(2)-15



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)