Hard drives monitoring automation approach for Kubernetes container orchestration system

Automating the monitoring of hard drives in a cluster infrastructure managed by the Kubernetes container orchestration system remains a laborious and non-trivial task. The paper discusses existing approaches to monitoring hard drives in Kubernetes and provides a comparative analysis of them. Based on this analysis, a conclusion is drawn on the need to improve and automate these approaches. The paper proposes automating the collection of metrics from hard drives by implementing a Kubernetes “operator” for a tool that obtains information about the state of hard drives in the system. The results compare the time needed to collect disk information using the existing approaches and the proposed one. Numerical results and graphs showing the gain of the proposed approach are presented.


Introduction
Container orchestration systems for the automatic deployment of applications in a cluster infrastructure are gaining popularity. These systems include Kubernetes [1], Docker Swarm [2] and Mesos [3]. At the same time, many applications need to store their data on hard drives, which in turn raises the reliability requirements for those drives. For this reason, special attention is paid to monitoring the state of hard drives in a cluster infrastructure to prevent drive failure and data loss. Many companies, such as Dell EMC, encounter the problem of disk monitoring. System administrators and engineers who monitor hard drives in a cluster use different methods and approaches for obtaining disk information, which are discussed in Section 2. Section 3 provides the theoretical basis and defines container orchestration systems using Kubernetes as an example. Section 4 describes the operator pattern for Kubernetes. Section 5 describes the advantages of the proposed approach for automating disk monitoring in a Kubernetes cluster. Section 6 covers the implementation of the proposed approach using an "operator" for a disk monitoring tool. Section 7 presents the results: the time needed to collect disk information using the existing approaches and the proposed one. Charts show the time gained by using the proposed approach.

Approaches for evaluation of the hard drives statuses
One way to get the state of a hard drive is to use utilities available in the Linux operating system, such as lsblk and fdisk. The lsblk utility [4] displays information about all available or specified block devices. It reads information from the sysfs file system and the udev database. The problem with this approach is that lsblk does not provide information about the current health and availability of the disk. Another disk health evaluation utility is smartctl. This Linux utility retrieves S.M.A.R.T. disk information. S.M.A.R.T. is a technology for analyzing and predicting the state of hard drives [5]. It monitors the main characteristics of the drive: each characteristic is measured and converted into a numeric value, and comparing that value with its reference value gives an estimate of the state of the disk. There are over 150 S.M.A.R.T. indicators, each with its own reference value, so manual analysis of this information is quite time-consuming. To reduce the complexity of analyzing S.M.A.R.T. indicators, neural networks can be used, as described in [6,7]. There is also the Intelligent Platform Management Interface (IPMI), designed for remote monitoring and control of a computer system [8]. Using IPMI, it is possible to connect remotely to a server and manage its operation. The IPMI specification standardizes the interface, and various companies provide implementations of it: IDRAC (DELL), Cisco IMC (Cisco), ILO (HP). This approach is more efficient than using the lsblk and smartctl utilities, because specific IPMI implementations (IDRAC, IMC, ILO) provide a graphical interface and a wide range of server functionality in addition to monitoring hard drives. The disadvantage of this method is the lack of free licenses for products that implement the IPMI interface.
The last of the considered approaches is the Linux ipmitool command [9], which implements the IPMI interface, but its functionality is limited to providing information about FRUs, LAN configuration and sensor readings, and to remote power management. The hardware must also have a dedicated BMC port in order to use IPMI, so this approach is not universal. In view of these approaches, this paper proposes automating the analysis of hard drive state by implementing a Kubernetes «operator» for a disk monitoring tool. A more detailed definition of a Kubernetes «operator» is given in Section 4. The proposed approach improves on the others because it provides information on the current state of disks efficiently and automatically, without creating additional load on the Kubernetes cluster.
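The S.M.A.R.T. evaluation described in this section, where each indicator is compared with its reference value, can be illustrated with a short Go sketch. It is a simplified model and not the output format of smartctl: the Attribute type, the Failing and Status functions and the sample values are assumptions made for illustration. The rule it encodes is the usual one for normalized attributes, where an attribute is considered failed once its value drops to or below a non-zero threshold.

```go
package main

import "fmt"

// Attribute is a simplified S.M.A.R.T. attribute.
// The type and its field names are hypothetical, chosen for this sketch.
type Attribute struct {
	ID        int    // attribute identifier
	Name      string // human-readable attribute name
	Value     int    // normalized current value reported by the drive
	Threshold int    // reference (failure) threshold for this attribute
}

// Failing returns the attributes whose normalized value has dropped to or
// below the threshold. Attributes with a zero threshold are skipped, since
// they are informational and cannot trip a failure.
func Failing(attrs []Attribute) []Attribute {
	var out []Attribute
	for _, a := range attrs {
		if a.Threshold > 0 && a.Value <= a.Threshold {
			out = append(out, a)
		}
	}
	return out
}

// Status condenses the whole attribute list into a single label.
func Status(attrs []Attribute) string {
	if len(Failing(attrs)) > 0 {
		return "FAILING"
	}
	return "OK"
}

func main() {
	// Illustrative values only, not readings from a real drive.
	attrs := []Attribute{
		{ID: 5, Name: "Reallocated_Sector_Ct", Value: 100, Threshold: 36},
		{ID: 10, Name: "Spin_Retry_Count", Value: 90, Threshold: 97},
	}
	fmt.Println("disk status:", Status(attrs))
	for _, a := range Failing(attrs) {
		fmt.Printf("attribute %d (%s): value=%d threshold=%d\n",
			a.ID, a.Name, a.Value, a.Threshold)
	}
}
```

Reducing many indicators to one status label is the kind of condensation that makes per-disk state readable at a glance, at the cost of hiding the detailed values.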

Container orchestration systems
Containerization is an approach in software development in which an application or service, its dependencies and its configuration (abstracted as deployment manifest files) are packaged together into a container image [10]. In effect, this is virtualization at the operating system level. Containers greatly simplify and automate application deployment regardless of the environment. In turn, the Docker tool [11] simplifies the creation and launch of containers. As modern applications include more and more containers, and distributed servers require complex management and deployment of applications, there is a need for container orchestration systems. Container orchestration determines how to select, deploy, track and dynamically manage the configuration of multi-container packaged applications [12]. It concerns not only the initial deployment of multi-container applications but also their management, for example scaling a multi-container application as a single object. The most popular container orchestration system is Kubernetes [1].
Kubernetes is open source software for deploying and managing a cluster of Docker containers. A Kubernetes deployment is known as a cluster. A Kubernetes cluster can be viewed as two parts: a management layer, which consists of the master node(s), and the worker nodes. Pods, which consist of containers, run on the worker nodes. Each node has its own Linux environment and can be either a physical or a virtual machine.

Pattern «operator»
An «operator» is a Kubernetes extension that uses custom resources to manage applications and their components [13]. A resource in Kubernetes is an API endpoint that stores a collection of API objects of a specific type. Kubernetes objects are persistent entities in the Kubernetes system. Each resource is available at a unique HTTP address; the Kubernetes API server processes requests to that address and returns information about the resource. Kubernetes uses these entities to represent the state of a cluster. For example, a Kubernetes Deployment is an object that can represent an application running in a cluster. When an application is deployed, a specification describes its desired final configuration. For example, when the specification requests three replicas of an application, Kubernetes reads the specification and starts three instances of the application, updating the state to match the configuration. Pods are an example of a resource: it is a built-in resource that contains a collection of Pod objects. A custom resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. An «operator» combines custom resources and custom controllers. Together they monitor the specified resources and maintain their state in accordance with the values the user has set. When a relevant event occurs on a resource, the «operator» reacts and performs a specific action.
Fig. 1 shows the scheme of the approach to automating disk information gathering in the Kubernetes cluster, together with user interaction.
Fig. 1. Scheme of the approach to automating disk information gathering in the Kubernetes cluster
The approach introduces an «operator» into the Kubernetes container orchestration system to automate the monitoring of hard drives on nodes. The «operator» is an application deployed on a Kubernetes cluster. It works with the Kubernetes API and monitors events within the cluster. Its task is to collect information about hard drives from the cluster nodes, create disk objects on the cluster and monitor them.
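The idea of maintaining a resource in accordance with a user-supplied specification, such as the three-replica example in this section, can be sketched in a few lines of Go. This is a toy model using only the standard library, not the Kubernetes controller API: the Spec and State types and the Reconcile and Apply functions are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Spec is the desired state declared by the user (hypothetical type).
type Spec struct{ Replicas int }

// State is the state currently observed in the cluster (hypothetical type).
type State struct{ Running int }

// Reconcile compares the desired and the observed state and returns how many
// instances must be started (positive) or stopped (negative).
func Reconcile(spec Spec, st State) int {
	return spec.Replicas - st.Running
}

// Apply performs the change computed by Reconcile on the observed state.
func Apply(st State, delta int) State {
	return State{Running: st.Running + delta}
}

func main() {
	spec := Spec{Replicas: 3}
	st := State{Running: 0}

	// Each event triggers a reconciliation, regardless of what caused it.
	for _, event := range []string{"resource created", "instance crashed"} {
		if event == "instance crashed" {
			st.Running-- // simulate the loss of one instance
		}
		delta := Reconcile(spec, st)
		st = Apply(st, delta)
		fmt.Printf("event=%q delta=%d running=%d\n", event, delta, st.Running)
	}
}
```

The point of the sketch is that the reaction depends only on the difference between the desired and observed state, so the same function handles both initial creation and later failures.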

Fig. 2. Command for generating code template
Once objects are created, Kubernetes can provide information about them through a client application. Using the command-line utility, the user can access the Kubernetes cluster to obtain information about the specified resources. The use of the «operator» has several advantages:
• The «operator» makes it possible to take advantage of Kubernetes features, e.g. working with custom resources.
• The «operator» can process the events of the Kubernetes cluster API (GET, CREATE, DELETE, PATCH, etc.), that is, the programmer decides what should happen to the custom resource when an event occurs.
• Since a resource is created in the system, the Kubernetes command-line interface can be used. The user can get information about all resources with a single command (Fig. 2).

• The information that Kubernetes shows about resources can also be restricted. For example, if only the status of disks is needed, the command line can show only the requested information. This filtering reduces the time a person spends on analysis.
• Kubernetes provides data in the form of a table, which simplifies the analysis of information by a person.
• Kubernetes administrators can create volumes knowing which drives can be used for them.
Thus, the use of the «operator» not only automates the monitoring of hard drives but also reduces the time needed to analyze disk information. Working with the Kubernetes API makes the Kubernetes advantages described above available.
The «operator» is a software tool for Kubernetes that implements two controllers: a monitoring tool controller and a disk resource controller. The monitoring tool controller deploys a server on each node of the cluster; on request, this server collects information about the hard drives of its node. The disk resource controller requests this information and sends a request to create a resource in Kubernetes. Based on the received information, the Kubernetes API server creates custom resources (CRs) for disks. A disk CR stores information about a specific drive on a node, for example its serial number and size.
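Two of the advantages listed above, filtering the displayed information and presenting it as a table, can be illustrated with a standard-library Go sketch. It does not reproduce the actual output of the Kubernetes command-line interface: the Disk type, its field names, the status values and the sample serial numbers are assumptions made for illustration, with only the serial number and size taken from the description of the disk CR.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
	"text/tabwriter"
)

// Disk mirrors the kind of data a disk custom resource might hold.
// Field names and status values are assumptions made for this sketch.
type Disk struct {
	Node         string
	SerialNumber string
	Size         string
	Status       string
}

// FilterByStatus keeps only the disks whose status matches the given value.
func FilterByStatus(disks []Disk, status string) []Disk {
	var out []Disk
	for _, d := range disks {
		if d.Status == status {
			out = append(out, d)
		}
	}
	return out
}

// RenderTable writes the disks as an aligned text table with a header row.
func RenderTable(w io.Writer, disks []Disk) error {
	tw := tabwriter.NewWriter(w, 0, 8, 2, ' ', 0)
	fmt.Fprintln(tw, "NODE\tSERIAL\tSIZE\tSTATUS")
	for _, d := range disks {
		fmt.Fprintf(tw, "%s\t%s\t%s\t%s\n", d.Node, d.SerialNumber, d.Size, d.Status)
	}
	return tw.Flush()
}

// TableString is a convenience wrapper that returns the table as a string.
func TableString(disks []Disk) string {
	var sb strings.Builder
	_ = RenderTable(&sb, disks) // writes to a strings.Builder do not fail
	return sb.String()
}

func main() {
	// Placeholder entries; serial numbers and statuses are invented.
	disks := []Disk{
		{Node: "node-1", SerialNumber: "SN-EXAMPLE-0001", Size: "112G", Status: "ONLINE"},
		{Node: "node-1", SerialNumber: "SN-EXAMPLE-0002", Size: "8T", Status: "OFFLINE"},
		{Node: "node-2", SerialNumber: "SN-EXAMPLE-0003", Size: "8T", Status: "ONLINE"},
	}
	// Show only the disks that need attention.
	if err := RenderTable(os.Stdout, FilterByStatus(disks, "OFFLINE")); err != nil {
		fmt.Fprintln(os.Stderr, "render:", err)
	}
}
```

Filtering before rendering keeps the table short, which is where the reduction in human analysis time comes from.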

Technical implementation features
Since all Kubernetes «operators» have the same structure, it is possible to generate the common code and then fill it with custom logic. Tools such as Operator-SDK [14], Kubebuilder [15] and Code-Generator [16] are usually used for generation. The Golang programming language is usually used for «operator» development. The Operator-SDK tool was used in this work. It generates a resource structure template for Kubernetes; in the case of the «operator» for disk monitoring, these are the Disk Custom Resource Definition (CRD) and a resource that represents the disk monitoring tool on each machine in the Kubernetes cluster. The monitoring tool runs as a gRPC server. Operator-SDK provides commands for generating the code template (fig. 3). Controllers have also been created to manage these resources. Each controller has a synchronization function (the Reconcile loop), which is called whenever an event occurs on a custom resource.
• The third command generates the code template for the object controller with the Reconcile loop.
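The text does not state how the node-side monitoring server gathers its disk data. As one possible illustration, and purely as an assumption, the following Go sketch reads the JSON output of the lsblk utility mentioned in Section 2 and keeps only whole disks. The Device type and the ParseDisks function are hypothetical names; the gRPC layer is omitted, and the size field is treated as the human-readable string that lsblk prints when byte output is not requested.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// Device is one entry of lsblk's JSON output, limited to the requested columns.
type Device struct {
	Name   string `json:"name"`
	Size   string `json:"size"`   // human-readable size, e.g. "111.8G"
	Type   string `json:"type"`   // "disk", "part", "rom", ...
	Serial string `json:"serial"` // stays empty when lsblk reports null
}

// lsblkOutput is the top-level JSON document printed by `lsblk --json`.
type lsblkOutput struct {
	BlockDevices []Device `json:"blockdevices"`
}

// ParseDisks decodes lsblk JSON and returns only entries of type "disk".
func ParseDisks(data []byte) ([]Device, error) {
	var out lsblkOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, fmt.Errorf("decode lsblk output: %w", err)
	}
	var disks []Device
	for _, d := range out.BlockDevices {
		if d.Type == "disk" {
			disks = append(disks, d)
		}
	}
	return disks, nil
}

func main() {
	// --nodeps lists top-level devices only, without partitions.
	cmd := exec.Command("lsblk", "--json", "--nodeps", "--output", "NAME,SIZE,TYPE,SERIAL")
	data, err := cmd.Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "lsblk is not available or failed:", err)
		return
	}
	disks, err := ParseDisks(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, d := range disks {
		fmt.Printf("%s size=%s serial=%s\n", d.Name, d.Size, d.Serial)
	}
}
```

As noted in Section 2, lsblk alone does not report drive health, so a real monitoring tool would need an additional source for the status field.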

Fig. 4. Disk structure in Golang
In the «operator» implementation, the Reconcile loop of the disk monitoring tool controller ensures that the gRPC server is running on each node of the cluster. In turn, the synchronization function of the Disk CRD controller connects to the monitoring servers and requests information about the disks. A request to create resources is then sent using a REST client. Based on the information received, Kubernetes creates a disk object; if such an object already exists, Kubernetes updates it. The synchronization cycle is called every 2 seconds to keep disk information up to date.
The command in fig. 5 can be used to build the Docker image.
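The create-or-update behaviour and the periodic synchronization described above can be sketched with the Go standard library alone. This is a simplified model and not the operator's code: the in-memory Store stands in for the Kubernetes API server, the fetch callback stands in for the gRPC request to the monitoring servers, and all type and function names are hypothetical. Only the 2-second interval is taken from the text.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Disk is a simplified disk object (hypothetical fields).
type Disk struct {
	SerialNumber string
	Node         string
	Status       string
}

// Store stands in for the Kubernetes API server, keyed by serial number.
type Store map[string]Disk

// Upsert creates the object if it is unknown, updates it if it changed,
// and reports which of the three outcomes occurred.
func (s Store) Upsert(d Disk) string {
	old, ok := s[d.SerialNumber]
	switch {
	case !ok:
		s[d.SerialNumber] = d
		return "created"
	case old != d:
		s[d.SerialNumber] = d
		return "updated"
	default:
		return "unchanged"
	}
}

// Sync performs one synchronization pass and counts the outcomes.
func Sync(s Store, fetch func() []Disk) map[string]int {
	counts := map[string]int{}
	for _, d := range fetch() {
		counts[s.Upsert(d)]++
	}
	return counts
}

// RunEvery calls fn immediately and then once per interval until ctx ends.
func RunEvery(ctx context.Context, interval time.Duration, fn func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		fn()
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	store := Store{}
	// Placeholder data source; a real one would query the node servers.
	fetch := func() []Disk {
		return []Disk{{SerialNumber: "SN-EXAMPLE-0001", Node: "node-1", Status: "ONLINE"}}
	}
	// Stop the demonstration after a few cycles.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	RunEvery(ctx, 2*time.Second, func() { fmt.Println(Sync(store, fetch)) })
}
```

Because Upsert is idempotent, repeating the pass every 2 seconds is safe: unchanged disks produce no writes, and only new or modified disks touch the store.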

Fig. 5. Command for building docker image
The Helm tool [17] was used to deploy the «operator» in the Kubernetes cluster. This tool deploys an application with a single command (fig. 6). Special manifests (charts) can be developed for Helm to represent the «operator» application in Kubernetes. These manifests contain all the necessary information about the application, so that Kubernetes deploys it correctly. For example, they can specify how many replicas of the application should be deployed on the Kubernetes cluster.

Fig. 6. Command for deploying "operator" in Kubernetes cluster
Thus, Kubernetes has a resource that can be accessed using the command line or an HTTP request. The output is shown in fig. 7.

Results
Table 1 shows the results of assessing the status of the Kubernetes cluster hard drives in three ways:
• using the smartctl utility;
• using the IDRAC program (an implementation of the IPMI interface);
• using the proposed approach with the Kubernetes "operator".
The characteristics of the tested cluster are:
• the cluster consists of 4 nodes;
• each node has 4 hard drives (system drive: 112 GB SSD; 3 drives: 8 TB HDD);
• each node has a Linux operating system deployed.
The necessary command was entered on each node in order to obtain information. Measurements were performed three times on the cluster. The time to enter the necessary tool commands to obtain information about the disks was measured, as was the time of manual analysis of the command output. The average of the three values of each measurement was taken. In the future, it is planned to automate this process and collect more metrics. Two quantities are reported: the time to enter commands into the terminal, and the total time spent in the tested system, which is the time to enter the command plus the time to analyze the logs. From table 1 and the graph in fig. 8 we can conclude that the gain of the "operator" approach compared to IDRAC was 8 seconds, and that compared to the smartctl utility the total analysis time was halved.
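The quantities used in this comparison can be written out explicitly. The symbols below are assumptions introduced here, since the original notation of the table legend is not available: t_i denotes a single measurement, the bar denotes the average over the three runs, "enter" and "analysis" refer to command entry and manual analysis of the output, and "baseline" refers to an existing approach (smartctl or IDRAC).

```latex
\bar{t} = \frac{1}{3}\sum_{i=1}^{3} t_i, \qquad
T_{\text{total}} = \bar{t}_{\text{enter}} + \bar{t}_{\text{analysis}}, \qquad
\Delta T = T_{\text{total}}^{\text{baseline}} - T_{\text{total}}^{\text{operator}}, \qquad
S = \frac{T_{\text{total}}^{\text{baseline}}}{T_{\text{total}}^{\text{operator}}}
```

In these terms, the reported gain over IDRAC corresponds to a difference of 8 seconds, and the reported result for smartctl corresponds to a ratio S of about 2.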

Conclusion
Thus, an "operator" approach was developed for Kubernetes that provides information about hard drives in a cluster infrastructure. This approach takes half the time required by utilities built into the Linux operating system, such as smartctl. The proposed approach is suitable for different platforms, as it runs in a Kubernetes cluster, which can itself run on different platforms. However, the approach has disadvantages compared to the existing ones. For example, the smartctl utility and the IDRAC implementation of IPMI provide more information about hard drives, while the "operator" provides only their status.
In the future, it is planned to expand the list of indicators that can be analyzed through the kubectl utility using the "operator", and to create a graphical interface and scripts that automate command input, in order to reduce the time spent analyzing disk information.

Information about authors
Anastasia Sergeevna SHEMYAKINSKAYA is a fourth-year student at the Institute of Computer Science and Technologies of Peter the Great St. Petersburg Polytechnic University. Research interests include container orchestration systems, containerization and cloud computing.
Igor Valerevich NIKIFOROV is a candidate of engineering sciences and an associate professor at the Higher School of Software Engineering, Institute of Computer Science and Technologies of Peter the Great St. Petersburg Polytechnic University. His research interests include big data.