Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Automated Error Detection and Analysis in Hyperconverged Systems

https://doi.org/10.15514/ISPRAS-2019-31(4)-2

Abstract

This paper addresses early error detection and analysis in hyperconverged systems. One way to build such a system is to install on each physical server a separate operating system (OS) instance that includes virtualization tools as well as tools for administering and using distributed data storage. Errors can occur both within a single OS instance and at the level of the entire cluster. For example, incorrect commands issued by a control component on one infrastructure node can cause a software failure on another node. In addition, errors in cluster subsystems can provoke abnormal situations inside virtual machines. The architectural complexity of hyperconverged systems makes the errors that occur in them difficult to analyze. To simplify this analysis and make it more effective, problem detection and the collection of the data needed to study and fix each problem must be automated. The paper describes existing approaches to automated error detection and proposes improvements that adapt them to systems that rely heavily on distributed storage and virtualization. The improvements include collecting logs from the whole cluster immediately after an error occurs, additional analysis of guest operating system behavior inside virtual machines, and the use of a knowledge base for automated crash recovery and duplicate detection. Finally, a real-life error-handling scenario in Virtuozzo products is described, from error detection to fix deployment.
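
As a rough illustration of the knowledge-base idea mentioned in the abstract, the sketch below reduces a crash backtrace to a signature and looks that signature up among already known problems. It is only a minimal sketch under stated assumptions: the names `crash_signature` and `KnowledgeBase`, the backtrace format, and the issue identifier are all hypothetical and are not taken from the paper or from Virtuozzo tooling, and the method actually used there may differ.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def crash_signature(frames: List[str], depth: int = 3) -> str:
    """Hash the top `depth` frames of a backtrace, ignoring addresses.

    The frame format ("0xADDR function+0xOFFSET") is an assumption made
    for this example only.
    """
    names = []
    for frame in frames[:depth]:
        # Strip hex addresses (0x7f3a12) and offsets (+0x1c), which vary
        # between runs even when the underlying bug is the same.
        names.append(re.sub(r"\+?0x[0-9a-fA-F]+", "", frame).strip())
    return hashlib.sha256("|".join(names).encode("utf-8")).hexdigest()


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory store mapping crash signatures to issue IDs."""

    known: Dict[str, str] = field(default_factory=dict)

    def lookup(self, frames: List[str]) -> Optional[str]:
        # Returns the issue ID of a previously registered crash, if any.
        return self.known.get(crash_signature(frames))

    def register(self, frames: List[str], issue_id: str) -> None:
        self.known[crash_signature(frames)] = issue_id


kb = KnowledgeBase()
first = ["0x7f3a12 blk_read+0x1c", "0x7f3b00 io_submit+0x40", "0x7f0001 main+0x10"]
second = ["0x55aa01 blk_read+0x2c", "0x55bb02 io_submit+0x08", "0x55cc03 main+0x99"]

kb.register(first, "ISSUE-1")  # first report: filed as a new problem
duplicate_of = kb.lookup(second)  # same functions, different addresses
```

Hashing only the top few function names is a deliberately coarse choice: it tolerates address randomization between nodes but can merge distinct bugs that crash in the same place, so a real system would likely combine it with further evidence such as log context.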

About the Author

Denis Vladimirovich Silakov
Virtuozzo
Russian Federation
Ph.D. in Physical and Mathematical Sciences, Senior system architect



For citations:


Silakov D.V. Automated Error Detection and Analysis in Hyperconverged Systems. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2019;31(4):29-38. (In Russ.) https://doi.org/10.15514/ISPRAS-2019-31(4)-2



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)