
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Developing scalable software infrastructure for data storage and processing for computational biology problems

https://doi.org/10.15514/ISPRAS-2014-26(4)-4

Abstract

This article gives an overview of a scalable infrastructure for storing and processing genome data in genetics problems. It describes the technologies used, the organization of unified API access to genome processing across different underlying services, and the methods that support scalability and cloud computing. The first service of a virtual genome processing laboratory is presented: it solves the problem of predicting transcription factor binding sites. The article states the main principles of service construction and the basic requirements for underlying computation software in virtual laboratory environments, and describes the implemented web service (https://api.ispras.ru/demo/gen) for transcription factor binding site prediction. The solution is built from three components: the ISPRAS API project, which serves as API gateway and load balancer; a middleware task manager, which maintains a pool of workers and communicates with the OpenStack infrastructure; and OpenZFS, which serves as intermediate storage with transparent compression. The solution is easy to extend with new services that meet the basic requirements.
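
The abstract does not state which model the service uses to predict binding sites. The reference list cites position weight matrices [1] and the HOCOMOCO collection [5], so the following is only an illustrative sketch of PWM-based scanning under that assumption, not the authors' implementation. The matrix `TOY_PFM`, the uniform background, the threshold, and the function names `to_pwm` and `scan` are all hypothetical.

```python
import math

# Hypothetical toy position frequency matrix for a 4-bp motif.
# It is NOT taken from the article or from HOCOMOCO.
# Each entry is one motif position with probabilities for A, C, G, T.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Assumed uniform background nucleotide frequency.
BACKGROUND = 0.25


def to_pwm(pfm, background=BACKGROUND):
    """Convert frequencies to log-odds weights: log2(p / background)."""
    return [
        {base: math.log2(p / background) for base, p in column.items()}
        for column in pfm
    ]


def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence and return windows scoring >= threshold.

    Returns a list of (start, window, score) tuples with 0-based start positions.
    Windows containing characters other than A, C, G, T are skipped.
    """
    hits = []
    width = len(pwm)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    for start, window, score in scan("TTAGCTAA", to_pwm(TOY_PFM), threshold=4.0):
        print(start, window, round(score, 2))
```

Each window is scored independently of the others, so a scan of this kind can be split by sequence or by motif across a pool of workers, which is the sort of workload a task manager with a worker pool is suited to.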

About the Authors

O. Borisenko
ISP RAS
Russian Federation


A. Laguta
ISP RAS
Russian Federation


D. Turdakov
ISP RAS
Russian Federation


S. Kuznetsov
ISP RAS
Russian Federation


References

1. Zhang, Xiujun, Position Weight Matrices, Encyclopedia of Systems Biology. Springer New York, 2013, pp. 1721-1722.

2. 1000 Genomes project web page — http://www.1000genomes.org/about

3. University of California, Santa Cruz genome project — http://genome.ucsc.edu/

4. Google Genomics web page — https://cloud.google.com/genomics/

5. Ivan V. Kulakovskiy, Yulia A. Medvedeva, Ulf Schaefer, Artem S. Kasianov, Ilya E. Vorontsov, Vladimir B. Bajic and Vsevolod J. Makeev, HOCOMOCO: a comprehensive collection of human transcription factor binding sites models, Nucleic Acids Research 2012. doi: 10.1093/nar/gks1089

6. M. Ahrens, OpenZFS: a Community of Open Source ZFS Developers, AsiaBSDCon 2014, pp. 27-32.

7. NFS project web page — http://nfs.sourceforge.net/

8. Duan, Zhi Ying, and Yi Zhen Cao, The Implementation of Cloud Storage System Based on OpenStack Swift, Applied Mechanics and Materials. Vol. 644. 2014, pp. 2981-2984.

9. Beernaert, L., Matos, M., Vilaça, R., & Oliveira, R., Automatic elasticity in Openstack, In Proceedings of the Workshop on Secure and Dependable Middleware for Cloud Monitoring and Management, ACM, p. 2.

10. Zhou, Qing, and Jun S. Liu, Modeling within-motif dependence for transcription factor binding site predictions, Bioinformatics 20.6, 2004, pp. 909-916.

11. Bintu, Lacramioara, et al., Transcriptional regulation by the numbers: applications, Current Opinion in Genetics & Development 15.2, 2005, pp. 125-135.

12. Statistical mechanics of transcription factor binding sites — http://www.bio-physics.at/wiki/index.php?title=Statistical_Mechanics_of_Binding

13. FASTA format description — http://genetics.bwh.harvard.edu/pph/FASTA.html

14. Zhang, Xiujun, Position Weight Matrices, Encyclopedia of Systems Biology. Springer New York, 2013, pp. 1721-1722.



For citations:


Borisenko O., Laguta A., Turdakov D., Kuznetsov S. Developing scalable software infrastructure for data storage and processing for computational biology problems. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(4):45-54. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(4)-4



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)