
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

The Reliability Model of a Distributed Data Storage in Case of Explicit and Latent Disk Faults

https://doi.org/10.15514/ISPRAS-2015-27(6)-16

Abstract

This work examines an approach to estimating data storage reliability that accounts for both explicit disk faults and latent bit errors, as well as the procedures used to detect them. A new analytical mathematical model of failure and recovery events in a distributed data storage is proposed for calculating reliability. The model describes the dynamics of data loss and recovery using Markov chains that correspond to different redundant encoding schemes. The advantages of the developed model over classical models for traditional RAID arrays are discussed. The influence of latent HDD errors is considered, while bit faults occurring in other hardware components of the machine are omitted. Reliability is estimated using new analytical formulas for the mean time to failure, defined as the point at which data loss exceeds the recoverability threshold set by the redundant encoding parameters. New analytical dependencies between the average storage lifetime until data loss and the mean time for complete verification of the stored data are given.
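As a rough illustration of how a Markov chain yields a mean time to data loss, the sketch below treats a much simpler case than the one the abstract describes. It is not the authors' model: it covers explicit disk faults only, ignores latent errors and verification, and assumes a birth-death chain whose state is the number of failed disks, with independent exponential failures and one repair in progress at a time. The function name and all four parameters are hypothetical.

```python
def mean_time_to_data_loss(n_disks, tolerated_failures, failure_rate, repair_rate):
    """Expected time until more disks have failed than the encoding tolerates.

    Assumed model (illustrative only): state k is the number of failed disks;
    from state k the chain moves to k + 1 at rate (n_disks - k) * failure_rate
    and, for k > 0, back to k - 1 at rate repair_rate. Data is lost on
    reaching state tolerated_failures + 1.
    """
    if not 0 <= tolerated_failures < n_disks:
        raise ValueError("tolerated_failures must be in [0, n_disks)")
    if failure_rate <= 0 or repair_rate < 0:
        raise ValueError("failure_rate must be positive, repair_rate non-negative")

    total = 0.0    # expected time from state 0 to the data-loss state
    passage = 0.0  # expected time to first move from state k - 1 to state k
    for k in range(tolerated_failures + 1):
        up = (n_disks - k) * failure_rate
        down = repair_rate if k > 0 else 0.0
        # First-passage recurrence for a birth-death chain:
        # h_k = 1 / up_k + (down_k / up_k) * h_{k-1}
        passage = 1.0 / up + (down / up) * passage
        total += passage
    return total


if __name__ == "__main__":
    # Two-disk mirror that tolerates one failure; rates are per hour (made-up values).
    print(mean_time_to_data_loss(2, 1, failure_rate=1e-5, repair_rate=0.1))
```

For a two-disk mirror this recurrence reduces to the closed form (3λ + μ) / (2λ²), where λ is the per-disk failure rate and μ the repair rate. Accounting for latent errors and data verification, as the paper does, would require additional states and rates that this sketch does not attempt to reproduce.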

About the Authors

L. Ivanichkina
OOO Proekt IKS
Russian Federation


A. Neporada
OOO Acronis
Russian Federation



For citations:


Ivanichkina L., Neporada A. The Reliability Model of a Distributed Data Storage in Case of Explicit and Latent Disk Faults. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(6):253-274. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(6)-16



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)