Large-scale scientific data and long-term data storage function in a computing center
https://doi.org/10.15514/ISPRAS-2022-34(4)-9
Abstract
Long-term data storage is an important task for many modern scientific laboratories and datacenters. To reduce the cost of owning digital information, some solutions use magnetic tape technology together with specialized software that manages the media and the data. Given the specifics of their on-site infrastructure and their well-established data processing workflows, these organizations build and support such systems mainly through their own efforts, which is also an important step toward technological sovereignty. This paper describes long-term data storage issues in the computing center of the Zababakhin All-Russia Research Institute of Technical Physics, where mathematical modeling computations generate vast amounts of scientific data. The architecture and functional composition of the developed Archive Data Storage System are presented, along with its internal data model, the chunk grouping rules, and the low-level tape format used. The measures taken to ensure the consistency of archived data, the methods of storage media management, and the issues of maintaining the archival fund are also considered. Finally, a scheme is given for calculating the hardware configuration of a typical archive system site that is sufficient to process the archiving data flows existing in the datacenter.
About the Author
Dmitry Vladimirovich IVANKOV
Russian Federation
Head of the laboratory
For citations:
IVANKOV D.V. Large-scale scientific data and long-term data storage function in a computing center. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(4):117-134. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(4)-9