Development of automated computer vision methods for cell counting and endometrial gland detection in medical image processing

This work focuses on the processing of medical images obtained through pathomorphological analysis of preparations. In particular, algorithms for processing images of nuclei from light and confocal microscopy and images of tissue from light microscopy are considered. The application of the proposed algorithms and software to the detection of pathologies is justified.


Introduction
The processing of medical images is an important problem in biology and medicine. Pathomorphologists have to process hundreds of images of preparations per day, and much of this work can be automated by computer analysis. Medical image processing can be performed in semi-automatic and automatic modes. The semi-automatic mode relies on manual adjustment of simple, intuitive parameters for evaluating single microphotographs. The automatic mode, in contrast, requires neither direct operator intervention nor initial settings when processing preparations. Modern pathomorphologists have access to a wide range of technologies that allow various measurements to be made depending on the task. A prominent example is the general-purpose ImageJ software [1], which can evaluate the geometry and color gamut of the resulting images. The main idea of this software is that the specialist writes macros, which requires only a minimal understanding of computer technology. This approach makes ImageJ flexible, but not user-friendly. More intuitive tools are commercial products such as VideoTestMorphology [2,3] and ImageProPlus [4]. Various other microscopy solutions exist; the most popular software packages at the moment are compared in Table 1. Thus, there is a lot of modern software designed for morphometry. However, several factors severely limit domain specialists in choosing their tools. Firstly, most of the free automatic and semi-automatic programs are incomplete and require a bioinformatician on the team. Secondly, ready-made semi-automatic programs are in most cases universal, which means that they do not take into account the parameters of the microscope, the type of tissue being examined, or the lighting of a particular picture. These parameters must be set manually, relying on the empirical experience of the specialist responsible for configuring the program. Thirdly, the most convenient and user-friendly programs are expensive and are therefore not available to small laboratories.
The aim of this work is to develop software that partially eliminates the shortcomings of modern non-commercial software for processing digital images: it automatically recognizes objects in series of images and extracts from them the minimum set of basic features that researchers need. Typical operations for this task are initialization, localization, segmentation, shape analysis, modeling, analysis of cell parameters, etc. [9]. Although there are many segmentation methods, precise segmentation remains a complex task, and it plays a significant role in biological imaging studies.

Processing of cell and tissue preparations
Preparations are processed at both the cellular and tissue levels. In both cases, the cells are usually stained through a specific reaction to the «marker» under examination. In this study, the nuclear markers detected were estrogen receptors (ER) and progesterone receptors (PR). Quantitative analysis of nuclei expressing ER and PR is essential, since these receptors are involved in the mechanisms of tumor growth and metastasis. Assessment of ER and PR expression is part of the standard examination of patients with breast cancer, as it makes it possible to determine the sensitivity of the tumor to hormone therapy and to refine the prognosis of the disease. ER and PR assessment is also used in the diagnosis of infertility and endometrial hyperplasia [10]. The pathomorphologist needs to count the cell nuclei highlighted in color on the preparation, which correspond to expression of the markers under study, as well as the total number of nuclei per unit area. Another task of the pathomorphologist is to outline the contours of glands and tissues on the preparation and to determine the number of glands with high total marker expression. Based on statistical data, such as the number of nuclei and glands with and without marker expression, conclusions are drawn about the structure of the tissue, the effectiveness of treatment is assessed, and the diagnosis and prognosis of the disease are refined.

Light and confocal microscopy
Medical preparations can be imaged in various ways. This work studies preparations imaged with light and confocal microscopes. Confocal microscopy has come into use relatively recently and, compared to light microscopy, gives higher-contrast color images, with the nuclei stained by different fluorescent dyes bound to antibodies against the corresponding markers. However, confocal microscopy requires special equipment and high-quality preparations, which makes it impractical for mass application. Therefore, light microscopy is still used for the diagnosis of diseases in most cases [11].

Popular algorithms
Medical imaging algorithms typically consist of the steps described below.

Image pre-processing
Pre-processing creates conditions that increase the efficiency and quality of the isolation and recognition of nuclei in medical preparations. It includes morphological operators and filters, edge detectors, and filters with brightness normalization [12].

Detection of objects of interest
At this stage, the X and Y coordinates of the estimated center of each object of interest (nucleus) are determined. The result of this stage is a set of objects of interest that are probably nuclei.
The algorithm parameters at this step are deliberately set so as to produce a redundant number of objects of interest. In other words, detecting false objects is acceptable, but missing real nuclei is not. The following algorithms are mainly used: the active contour algorithm, the watershed algorithm, image segmentation by known classes, and segmentation with preliminary detection of class boundaries [12].

Selection of characteristics of objects of interest, classification and decision making
The final stage assigns each object of interest to one of the target classes. For the task of classifying nuclei on fluorescence and light microscopy preparations, topological, texture, and color-intensity-based characteristics are used to distinguish the classes. At the classification stage, the classes of objects of interest are determined by the markers used for staining the nuclei and include:
• a nucleus not highlighted with a marker;
• background (stroma);
• a nucleus highlighted with a marker;
• several nuclei.
The case of several nuclei should be considered separately (see subsection 3.4).
Analyzing the intensity and clustering the color histogram makes it possible to determine the number of markers and their colors automatically, which saves pathomorphologists' time.
The following classifiers are used for pathomorphological analysis: support vector machines, the Bayes classifier, Haar cascades, and convolutional neural networks [13].
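As a rough illustration of the color histogram idea mentioned in this subsection, the sketch below bins pixel hues and reports the bins that hold a noticeable share of the colored pixels. It is not the method used in this work: the function name dominant_hues, the number of bins, and the saturation, brightness and share thresholds are all assumptions chosen for the example.

```python
import colorsys
from collections import Counter


def dominant_hues(pixels_rgb, bins=12, min_saturation=0.3, min_value=0.2, min_share=0.1):
    """Return the centres (in degrees) of hue bins holding at least min_share
    of the sufficiently saturated, sufficiently bright pixels."""
    counts = Counter()
    for r, g, b in pixels_rgb:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s >= min_saturation and v >= min_value:   # ignore grey or dark background
            counts[int(h * bins) % bins] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    width = 360 / bins
    return sorted((b + 0.5) * width for b, n in counts.items() if n / total >= min_share)


# Hypothetical pixel sample: two stain colours, grey background, a little noise.
pixels = ([(200, 60, 40)] * 50 + [(40, 60, 200)] * 50
          + [(120, 120, 120)] * 200 + [(40, 200, 60)] * 5)
markers = dominant_hues(pixels)
print(len(markers), markers)
```

Each reported hue would correspond to one candidate marker color; a real preparation would likely need smoothing of the histogram and merging of neighbouring bins.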

Methods for the separation of overlapping nuclei
A specific task is the separation of overlapping nuclei. Overlap can be caused by cell division, by the viewing angle of the camera when the preparation is photographed, and by nuclei lying on top of each other in the depth of the examined tissue. The following approaches can be used to separate fused nuclei and accurately determine their number:
• the method of active contours with preliminary erosion;
• classifiers (convolutional neural networks) trained beforehand on classes that encode the number of cells in an area of a given size;
• watershed algorithms;
• segmentation algorithms focused on topological features of objects [14].

Image dataset
Many images of cell-structure and tissue preparations from light and confocal microscopy were collected and labeled (see Table 2). Material and equipment for imaging were provided by the Ott Institute of Obstetrics, Gynecology and Reproductology. Each class of images was captured at a fixed microscope scale. The collected images cover various types of tissue with different lighting conditions and marker colors. To speed up the labeling of nuclei and glands, software was developed that allows a specialist to place a marker on an object of interest in the image. The coordinates of the centers of these markers (x, y), as well as their length (SizeY) and width (SizeX), were then recorded in a csv file (see Table 3). Thus, numerical data on the location and shape of the investigated structures were obtained. The labeling was carried out by a laboratory employee, a cell biologist at the Ott Research Institute.
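A minimal sketch of how such a label file could be read is given below. The field names x, y, SizeX and SizeY come from the description above; the exact header row, the comma delimiter, the sample values, and the names Annotation and read_annotations are assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Annotation:
    x: float        # centre of the marker, pixels
    y: float
    size_x: float   # width of the marked object, pixels
    size_y: float   # length of the marked object, pixels

    def bounding_box(self):
        """Return (left, top, right, bottom) of the marked object."""
        half_w, half_h = self.size_x / 2, self.size_y / 2
        return (self.x - half_w, self.y - half_h, self.x + half_w, self.y + half_h)


def read_annotations(stream):
    """Parse rows with the columns x, y, SizeX, SizeY (assumed header names)."""
    return [
        Annotation(float(row["x"]), float(row["y"]),
                   float(row["SizeX"]), float(row["SizeY"]))
        for row in csv.DictReader(stream)
    ]


# Hypothetical file contents, inlined so that the example is self-contained.
sample = io.StringIO("x,y,SizeX,SizeY\n120,80,14,18\n301.5,212,20,16\n")
annotations = read_annotations(sample)
print(len(annotations), annotations[0].bounding_box())
```

With a real file, the io.StringIO object would be replaced by open(path, newline="").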

Formulation of the problem
The aim of the work is the research and development of the following algorithms:
• estimation of the number of cell nuclei with and without expression of the studied marker on light microscopy images;
• estimation of the number of cell nuclei with and without expression of the studied marker on confocal microscopy images;
• highlighting of the internal and external borders of glands on confocal microscopy images.
The following requirements are set for all algorithms:
• operation without an operator;
• robustness to changes in the brightness of the preparation;
• robustness to different marker colors;
• the image scale is an input parameter of the algorithms.

Suggested algorithms
The algorithms were developed in the PyCharm environment in Python 3.7 using the opencv-python 4.0.0.21 library. The source code of the developed algorithms is available on github.com [15].

Counting the number of cell nuclei in confocal microscopy images
• Enter the scaling parameter, the number of pixels per 100 nanometers;
• Read the color image in RGB format with a depth of 8 bits per channel;
• Bring the image to a scale of 1.5 nanometers per pixel;
• Convert the image to HSV format and store the component used for analysis as a single-channel working image;
• Apply contrast limited adaptive histogram equalization (CLAHE) with clipLimit = 2 and tileGridSize = 8 to the working image;
• Perform erosion of the working image with an elliptical kernel of size 3;
• Calculate mean as the average pixel value of the working image;
• For each pixel value p: if p > mean + 20, assign p = min(p + 100, 255); if p < mean − 20, assign p = max(p − 100, 0);
• Apply a median filter with a kernel of size 5 to the working image;
• Perform threshold binarization of the working image with a threshold of 127 and store the result as a binary image;
• Perform a contour search on the binary image, keeping contours that do not have nested paths;
• Calculate the center of mass of each contour using formulas (1) and keep only those contours for which … − … < 1;
• The number of remaining contours is the total number of nuclei.

Detection of the internal contours of the glands in confocal microscopy images
• Read the color image in RGB format with a depth of 8 bits per channel;
• Bring the image to a scale of 1.5 nanometers per pixel;
• Convert the image to HSV format and store the component used for analysis as a single-channel working image;
• Apply contrast limited adaptive histogram equalization (CLAHE) with clipLimit = 2 and tileGridSize = 8 to the working image;
• Perform threshold binarization of the working image with a threshold of 127;
• Perform 27 erosion steps on the image with an elliptical kernel of size 3;
• Perform contour detection on the image, keeping paths that do not have nested paths, and write the result to the contours variable;
• For each contour:
o calculate the area;
o count the number of pixels falling inside the contour whose value exceeds 15 and write it to the variable nonZeroPixelsArea;
o consider the contour to be the inner border of a gland if nonZeroPixelsArea exceeds the contour area multiplied by 0.4;
o keep only those contours for which … − … < 1;
• The recognized gland boundaries are in the contours list and on the output image.
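The acceptance test applied to each contour can be illustrated with a small NumPy sketch. It assumes that the region inside a contour is already available as a boolean mask and approximates the contour area by the number of pixels in that mask; which image the value 15 is compared against is not clear from the text, so the image argument is an assumption, as is the function name is_gland_interior.

```python
import numpy as np


def is_gland_interior(image, region_mask, value_threshold=15, min_fill=0.4):
    """Accept a region if enough of its pixels are brighter than value_threshold."""
    area = int(region_mask.sum())             # region size in pixels (stand-in for contour area)
    if area == 0:
        return False
    non_zero_pixels_area = int((image[region_mask] > value_threshold).sum())
    return non_zero_pixels_area > min_fill * area


# Hypothetical 10 x 10 single-channel image with one 4 x 5 candidate region.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:7] = True                         # 20 pixels
bright_half = np.zeros((10, 10), np.uint8)
bright_half[2:4, 2:7] = 200                   # 10 of 20 pixels exceed 15
bright_row = np.zeros((10, 10), np.uint8)
bright_row[2, 2:7] = 200                      # 5 of 20 pixels exceed 15
print(is_gland_interior(bright_half, mask), is_gland_interior(bright_row, mask))
```

In an OpenCV pipeline the mask could be produced by drawing the filled contour, and the area by cv2.contourArea.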

Counting the number of cell nuclei in light microscopy images
The image was scaled to 1.5 nanometers per pixel by manually measuring the length of the scale bar on the image in pixels; automated recognition of the scale bar length is planned. The coefficients of the algorithms were selected by optimizing the criteria with the simplex method and the gradient descent method on the analyzed preparations. The criteria are given in Section 7.
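The arithmetic behind this rescaling can be sketched as follows, assuming the input parameter is the number of pixels covering 100 length units, as in the confocal algorithm. The function names resize_factor and scaled_size and the sample numbers are hypothetical.

```python
import math

TARGET_UNITS_PER_PIXEL = 1.5   # target scale stated in the text


def resize_factor(pixels_per_100_units, target=TARGET_UNITS_PER_PIXEL):
    """Factor by which image width and height are multiplied to reach the target scale."""
    if pixels_per_100_units <= 0:
        raise ValueError("the measured scale bar length must be positive")
    current_units_per_pixel = 100.0 / pixels_per_100_units
    return current_units_per_pixel / target


def scaled_size(width, height, factor):
    """New image size in whole pixels, never smaller than 1 x 1."""
    return max(1, round(width * factor)), max(1, round(height * factor))


# Hypothetical measurement: 100 units of the scale bar span 100 pixels.
factor = resize_factor(100)
print(factor, scaled_size(1200, 900, factor))
```

A factor below 1 means the image is shrunk; the resulting size can then be passed to an image resizing routine.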

The algorithm for counting the number of cell nuclei in confocal microscopy images
The percentage of successful recognition in the studied images was 84%. An example of image recognition is presented in Fig. 1.

Highlighting of the internal contours of the glands on confocal microscopy images
The percentage of successful recognition in the studied images was 70%. An example of image recognition is shown in Fig. 2.

The estimation of the number of cell nuclei in light microscopy images
The percentage of successful recognition in the studied images was 90%. An example of image recognition is presented in Fig. 3.
For the task of counting nuclei on light microscopy preparations, the percentage of successful recognition was calculated by formula (2), in the same way as for confocal microscopy.

Comparison with existing software
The purpose of the last stage of this work was to compare the obtained data with the results of other applications used to evaluate microphotographs. Since most software intended for cytological studies is either expensive or requires lengthy study of the manual and of programming languages, it was decided to compare the performance of the created software with Fiji, an ImageJ distribution bundled with plugins for evaluating microscopy images, which is the most balanced among ImageJ-based tools. The initial task was to estimate the number of cells in 30 microphotographs in jpg format. The comparison between the proposed algorithm and Fiji for the confocal imagery task is presented in Table 4. For the task of counting nuclei on light microscopy preparations, the percentage of successful recognition was calculated by formula (2), in the same way as for confocal microscopy preparations. Thus, the developed algorithm exceeds the accuracy of Fiji by 7.8%. It should be noted that, to achieve maximum accuracy in Fiji, a threshold parameter must be set and the estimated radius of the object of interest must be entered as the lower limit, whereas in the developed code the size of objects is determined automatically, which removes the element of subjectivity from the study and the need for preliminary processing of photographs. For example, the created algorithm does not require the scale bar to be removed from the image before evaluation. Also, to classify nuclei by the expression of markers of different colors in Fiji, the Color Threshold value must be set for each type of marker in each photo separately. This not only significantly increases the time the pathomorphologist spends on the analysis, but also greatly reduces its quality, since nuclei with weak expression will most likely not be included in the corresponding group.
Unlike Fiji, the created algorithm copes equally well with counting the total number of nuclei and with classifying them.

Further research
In the future, it is planned to continue research in this area and improve the reliability of the algorithms, in particular: to use CNN and U-Net networks for better segmentation of nuclei images [16,17]; to detect the contours of the scale bar for automatic scaling of the preparation; and to recognize marker colors automatically (supporting more than two colors).