Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 32, No 5 (2020)
7-20
Abstract

Verification tools can produce various kinds of results while checking programs against requirement specifications. Experts who search for errors and estimate the completeness of verification rely mostly on verdicts, violation witnesses, and code coverage reports. To apply verification tools in practice, when many program configurations and versions are checked against various requirements, they need convenient tools that automate the assessment of verification results. In this paper, we propose new methods for expert evaluation of verification results that cover the problems most significant in our experience of verifying large programs against a large number of requirements specifications. Some ideas are borrowed from the areas of testing and static analysis. The specific methods and technical solutions are nevertheless unique, since the results produced by verification tools either have no counterpart in those areas or carry special semantics. The paper presents our approaches and their implementation in the Klever software verification framework.

21-34
Abstract

The paper describes an extension of summary-based static program analysis that finds deadlocks. Summary-based analysis is a popular approach to bug detection owing to its high performance and scalability. At the same time, implementing a deadlock detector within such an analysis is nontrivial, because during intraprocedural analysis of a function there is no information about the locks held higher in the call stack. To model the semantics of multithreaded programs, a lock graph is built during the main analysis: a modification of the call graph that carries additional information about held locks. Once the lock graph is built, the deadlock detector is launched. Neither the construction of the lock graph nor the deadlock detection algorithm requires significant processor time: in our measurements, the total analysis time increased by 4%. On 8 open-source projects in C/C++/Java with a total size of more than 14 million lines of code, the proposed algorithm showed a high ratio of true positives. The described algorithms were implemented in the Svace tool.
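The details of the Svace implementation are not given in the abstract; the following is a minimal sketch of the core idea only: record which locks are already held when each new lock is acquired, and report a potential deadlock when the resulting lock-order graph contains a cycle. All names and the trace format are illustrative.

```python
# Minimal lock-order-graph deadlock check (illustrative sketch, not Svace code).
from collections import defaultdict

def build_lock_graph(traces):
    """traces: list of per-thread lock-acquisition sequences.
    Adds an edge a -> b whenever lock b is acquired while a is held."""
    graph = defaultdict(set)
    for trace in traces:
        held = []
        for lock in trace:
            for h in held:
                graph[h].add(lock)
            held.append(lock)
    return graph

def has_cycle(graph):
    """A cycle in the lock-order graph signals a potential deadlock."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(node):
        stack = [(node, iter(graph[node]))]
        color[node] = GRAY
        while stack:
            n, it = stack[-1]
            for succ in it:
                if color[succ] == GRAY:      # back edge: cycle found
                    return True
                if color[succ] == WHITE:
                    color[succ] = GRAY
                    stack.append((succ, iter(graph[succ])))
                    break
            else:
                color[n] = BLACK
                stack.pop()
        return False
    return any(color[n] == WHITE and dfs(n) for n in list(graph))

# Thread 1 takes A then B; thread 2 takes B then A -> cyclic lock order.
print(has_cycle(build_lock_graph([["A", "B"], ["B", "A"]])))  # True
print(has_cycle(build_lock_graph([["A", "B"], ["A", "B"]])))  # False
```

The real analysis derives held-lock information from function summaries rather than from dynamic traces, but the cycle check on the resulting graph is the same kind of cheap post-pass the abstract describes.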

35-56
Abstract

Specialized processors programmable in domain-specific languages are increasingly used in modern computing systems. The compiler-in-the-loop approach, based on the joint development of a specialized processor and its compiler, is gaining popularity. At the same time, traditional tools such as GCC and LLVM are insufficient for the agile development of optimizing compilers that generate target code for an exotic, irregular architecture with static parallelism of operations. The article proposes methods from the field of program synthesis for implementing machine-dependent compilation phases. The phases are based on a reduction to an SMT problem, which avoids the heuristic and approximate approaches that require a complex software implementation of a compiler. In particular, the synthesis of machine-dependent optimization rules, instruction selection, and instruction scheduling combined with register allocation are implemented with the help of an SMT solver. Practical applications of the developed methods and algorithms are illustrated by the example of a compiler for a specialized processor whose instruction set accelerates lightweight cryptography algorithms for the Internet of Things. The results of compiling and simulating 8 cryptographic primitives for 3 variants of the specialized processor (CISC-like, VLIW-like, and a variant with a delayed load instruction) show the viability of the proposed approach.
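The abstract does not give the SMT encoding itself, so the following toy sketch only illustrates the constraint view of scheduling: a schedule is any instruction order in which every producer precedes its consumers. Here a pure-Python exhaustive search stands in for the solver; a real implementation would hand the same constraints, plus latencies and register limits, to an SMT solver.

```python
# Toy constraint-based scheduling (illustrative; exhaustive search stands in
# for the SMT solver, and latencies/registers are omitted for brevity).
from itertools import permutations

def schedule(instrs, deps):
    """instrs: instruction names; deps: set of (producer, consumer) pairs.
    Returns an order in which every producer precedes all its consumers."""
    for order in permutations(instrs):
        pos = {name: i for i, name in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in deps):
            return list(order)
    return None  # unsatisfiable (dependency cycle)

# c depends on a and b; d depends on c.
deps = {("a", "c"), ("b", "c"), ("c", "d")}
order = schedule(["d", "c", "b", "a"], deps)
pos = {name: i for i, name in enumerate(order)}
print(all(pos[a] < pos[b] for a, b in deps))  # True
```

Encoding the positions as integer variables and the precedences as inequalities turns this search into exactly the kind of SMT query the article relies on.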

57-66
Abstract

This article describes a family of emulators for IBM mainframe architectures: their development history, functional features and capabilities, and the experience accumulated since 1994 in developing emulators and in their areas of application. Initially, the relatively simple (by modern standards) task of creating a virtual machine in the VSE/ESA operating system was solved in order to transfer legacy platform-dependent applications to this target environment. The problem was solved first for ES EVM computers in Russia, and then for the IBM 9221 in Germany and other Western countries. The transfer was then made to the OS/390 environment and to IBM AIX, quite modern at that time. Virtual execution of any existing IBM mainframe operating system was provided in the main server OS environments: Linux, Windows, AIX, z/OS, and zLinux. A solution was also developed for combining any types of the resulting virtual computing nodes into heterogeneous, geographically distributed computing networks that provide, in particular, multiple mutual redundancy of nodes in the network.

67-80
Abstract

Many modern RDBMS provide procedural extensions of the SQL language that allow users to perform complex server-side computations. Using such extensions improves modularity and code reuse, simplifies programming of application logic, and helps developers avoid network overhead and improve performance. SQL queries and procedural-extension code are mostly interpreted, which incurs significant computational overhead because of indirect function calls and generic run-time checks. Moreover, most RDBMS use different engines for executing SQL queries and procedural-extension code, so additional work is needed to switch between the engines. Thus, interpretation of SQL queries combined with interpretation of procedural-extension code may drastically degrade the performance of an RDBMS. One solution is dynamic compilation. In this paper, we describe a technique for dynamic compilation of the PL/pgSQL procedural language for the PostgreSQL database system using the LLVM compiler infrastructure. The dynamic compiler of PL/pgSQL is developed as part of the dynamic compiler of PostgreSQL queries. The proposed technique removes the computational overhead caused by interpretation. Synthetic performance tests show that the developed solution speeds up the execution of SQL queries by several times.
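The overhead the abstract attributes to interpretation can be seen in miniature (this sketch is illustrative and unrelated to PostgreSQL internals): an AST interpreter re-dispatches on node tags at every call, whereas "compiling" the tree to nested closures resolves that dispatch once, before execution.

```python
# Interpretation vs. compilation of a tiny expression AST (illustrative only).

def interpret(node, env):
    """Walks the tree and dispatches on the node tag at every evaluation."""
    op = node[0]
    if op == "const":
        return node[1]
    if op == "var":
        return env[node[1]]
    if op == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    raise ValueError(op)

def compile_expr(node):
    """Resolves dispatch once, returning a closure specialized to the tree."""
    op = node[0]
    if op == "const":
        c = node[1]
        return lambda env: c
    if op == "var":
        name = node[1]
        return lambda env: env[name]
    if op == "add":
        lhs, rhs = compile_expr(node[1]), compile_expr(node[2])
        return lambda env: lhs(env) + rhs(env)
    raise ValueError(op)

expr = ("add", ("var", "x"), ("const", 1))
fn = compile_expr(expr)              # dispatch decided ahead of time
print(interpret(expr, {"x": 41}))    # 42
print(fn({"x": 41}))                 # 42
```

The paper's compiler goes further, emitting native code through LLVM instead of closures, which additionally removes the boundary between the SQL and PL/pgSQL engines.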

81-94
Abstract

The paper deals with the construction and practical implementation of a machine-learning model for detecting computer attacks. Among the available public datasets, one of the most relevant, CICIDS2017, was chosen. For this dataset, the data preprocessing and sampling procedures were developed in detail. To reduce computation time, only one class of computer attacks (brute force, XSS, SQL injection) was kept in the training set. The procedure of feature-space construction is described step by step; it significantly reduced the dimensionality, from 85 to the 10 most important features. The quality of the ten most common machine-learning models was assessed on the resulting preprocessed dataset. Among the models (algorithms) that demonstrated the best results (k-nearest neighbors, decision tree, random forest, AdaBoost, logistic regression), the choice of the random forest model was justified by its minimum execution time. A quasi-optimal selection of hyperparameters was carried out, which improved the quality of the model in comparison with previously published results. The synthesized attack detection model was tested on real network traffic. The model proved valid only when trained on data collected in the specific target network, since the important features depend on the physical structure of the network and the settings of the equipment used. It was concluded that machine-learning methods can be used to detect computer attacks subject to these limitations.
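The feature-reduction step can be sketched as follows. This is not the paper's procedure (which is not specified in the abstract); it is a minimal pure-Python illustration with toy data, using a simple class-mean-separation score as a stand-in for whatever importance measure the authors applied when going from 85 to 10 features.

```python
# Toy feature ranking by class separation (illustrative stand-in for the
# paper's unspecified importance measure).

def top_k_features(rows, labels, k):
    """rows: list of feature vectors; labels: 0 (benign) / 1 (attack).
    Scores each feature by |mean over attacks - mean over benign| and
    returns the indices of the k highest-scoring features."""
    n_features = len(rows[0])
    def mean(cls, j):
        vals = [r[j] for r, y in zip(rows, labels) if y == cls]
        return sum(vals) / len(vals)
    scores = [abs(mean(1, j) - mean(0, j)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

# Feature 0 separates the classes well, feature 1 weakly, feature 2 not at all.
rows = [[0.0, 5.0, 1.0], [0.1, 6.0, 1.0], [0.9, 5.5, 1.0], [1.0, 5.2, 1.0]]
labels = [0, 0, 1, 1]
print(top_k_features(rows, labels, 2))  # [0, 1]
```

The abstract's caveat follows directly from this picture: whichever features score highest in one network need not score highest in another, so the reduced feature set is network-specific.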

95-110
Abstract

The article presents an approach to marking electronic documents during printing by implementing a virtual XPS printer in the Windows operating systems. The developed approach makes it possible to mark electronic documents at print time regardless of the document presentation format and the requirements imposed on the printing process. During the development and implementation of the approach, a comparative analysis of existing technical solutions for marking electronic documents was carried out, and their advantages and disadvantages were determined. The requirements and limitations imposed on the marking approach are defined, and the choice of virtual printer technology for marking documents in the printing process is substantiated. In the course of implementing the approach on top of virtual printer technology, the structure and interaction of the marking process with the components of the print service of the Windows family of operating systems are described. The architecture of a virtual XPS printer driver has been developed, and the practical embedding of a marker into an electronic document using the developed virtual printer is described. The practical implementation also covers the interaction of the developed print filter with the printing subsystem, the parameters of metadata processing, and the organization of the multithreaded implementation of the marking server. The implementation features of the marking approach in individual operating systems of the Windows family are considered, and limitations and assumptions are determined for each of them. Requirements for the marking process and directions for further research are formulated.

111-120
Abstract

One of the main tasks in the analysis of single-cell RNA sequencing (scRNA-seq) data is the identification of cell types and subtypes, which is usually based on some method of clustering. There is a number of generally accepted approaches to the clustering problem, one of which is implemented in the Seurat package. In addition, the quality of clustering is influenced by preprocessing algorithms such as imputation, dimensionality reduction, and feature selection. In this article, the HDBSCAN hierarchical clustering method is used to cluster scRNA-seq data. Experiments and comparisons were made on two labeled datasets: Zeisel (3005 cells) and Romanov (2881 cells). To compare clustering quality, two external metrics were used: the Adjusted Rand Index and the V-measure. The experiments demonstrated higher clustering quality for the HDBSCAN method on the Zeisel dataset and poorer quality on the Romanov dataset.
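The Adjusted Rand Index used for the comparison has a standard closed form and can be computed directly (the authors presumably used a library implementation; this pure-Python version is for illustration):

```python
# Adjusted Rand Index between a ground-truth labeling and a clustering.
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    n = len(truth)
    contingency = Counter(zip(truth, pred))          # joint label counts
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    a = sum(comb(c, 2) for c in Counter(truth).values())
    b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = a * b / comb(n, 2)                    # chance agreement
    max_index = (a + b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Identical partitions score 1.0 even when cluster labels are permuted;
# unrelated partitions score near (or below) zero.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))        # 1.0
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))
```

Being chance-corrected, ARI complements the V-measure, which instead combines homogeneity and completeness of the clusters.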

121-130
Abstract

A method for non-invasive screening of the population for carbohydrate metabolism disorders (CMD) has been developed and is described. The novelty of the method is that there are no medical standards in endocrinology for non-invasive screening, so the method was based on the results of a clinical study of electrocardiographic abnormalities in patients with CMD, in which CMD was determined non-invasively from a first-lead ECG. During the development of the method, an additional analysis of the ECG sample obtained during the study was performed. The analysis showed that the effectiveness of the method (its sensitivity and specificity) varies only slightly depending on the time of day at which the ECG is taken. This means that, in contrast to the invasive methods (the fasting plasma glucose test and the oral glucose tolerance test), a patient screened with the new non-invasive CMD detection method need not come in the morning or on an empty stomach. To reach the decision «there is a suspicion of CMD / there is no suspicion of CMD», the patient only needs to take up to 2 ECGs.

131-142
Abstract

Monitoring of industrial cyber-physical systems (CPS) is an ongoing process necessary to ensure their security. The effectiveness of information security monitoring depends on the quality and speed of collecting, processing, and analyzing heterogeneous CPS data. Today there are many analysis methods for solving the security problems of distributed industrial CPS. These methods impose different requirements on the characteristics of the input data, but they share common features owing to the subject area. This work is devoted to preliminary data processing for the security monitoring of industrial CPS in modern conditions. The general architecture defines the use of aggregation and normalization methods for data preprocessing. The work proceeds from the requirements for the preprocessing system and the specifics of the subject area to the general architecture and specific methods of multidimensional data aggregation.
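The two preprocessing steps named above can be sketched minimally (illustrative only; the event format, window size, and choice of min-max scaling are assumptions, not the paper's specification): aggregate raw sensor events into fixed time windows, then normalize each feature so that heterogeneous CPS sources become comparable.

```python
# Minimal aggregation + normalization sketch for heterogeneous monitoring data.

def aggregate(events, window):
    """Group (timestamp, value) events into fixed windows, averaging each."""
    buckets = {}
    for ts, value in events:
        buckets.setdefault(ts // window, []).append(value)
    return {w: sum(v) / len(v) for w, v in sorted(buckets.items())}

def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

series = aggregate([(0, 10.0), (1, 30.0), (5, 50.0)], window=5)
print(series)                                    # {0: 20.0, 1: 50.0}
print(min_max_normalize(list(series.values())))  # [0.0, 1.0]
```

Aggregation reduces the data volume a monitoring pipeline must keep up with, while normalization lets one analysis method consume readings from sensors with very different scales.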

143-152
Abstract

The Kurchatov Complex of NBICS Nature-Like Technologies is focused on interdisciplinary research and development in the field of nano-, bio-, information, cognitive, and socio-humanitarian sciences and technologies. The experimental basis of the NBICS complex is its Resource Centers, which operate in a shared-use mode serving various scientific laboratories and contain modern equipment for conducting a wide range of scientific experiments. The resulting experimental data are processed and stored on the supercomputer of the Computing Center, which is likewise shared. This raises the problems of exchanging data between different buildings; organizing their processing, analysis, and orderly storage; and combining heterogeneous experimental data to obtain scientific results of a higher level. To solve these problems, an information and analytical environment was organized on the basis of the distributed modular platform «Digital Laboratory»: a system that combines the scientific equipment of the Resource Centers, the supercomputer of the Computing Center, and the virtual machines and personal computers of scientific laboratories into a single virtual space, while organizing the exchange of data between the buildings and their processing, analysis, and storage. Work with the system is carried out through a web interface. At the request of researchers, each procedure for working with experimental data of a given type is implemented as an autonomous module of the «Digital Laboratory» platform. For example, the «Neuroimaging» module for processing and analyzing fMRI/MRI experimental data of the human brain obtained on the tomograph of the Resource Center has been put into operation and is functioning successfully. This module makes fMRI/MRI data analysis as simple as possible for the user and speeds up data processing many times over by parallelizing the computations on the supercomputer nodes. Besides modules for experimental data, the system supports modules for data of other types; an example is the «Project Activity» module for analyzing the effectiveness of the scientific activities of research laboratories. The system optimizes work with experimental data in the course of scientific research by enabling software implementation of the necessary procedures for data transfer, storage, processing, and analysis.

153-166
Abstract

In the paper, we analyze problems of reliability and security in world practice and in Russia. We consider aspects of building software/hardware systems from service resources and ready-made reusable components while ensuring reliability and security. We present the basic theoretical foundations of the modeling problem and the experience of using modern service technologies (SOA, SCA, SOAP) in software/hardware and Web systems to ensure their reliability and security on the Internet. We note that software/hardware and Web systems are created by the assembly build method in modern environments: IBM WSDK + WebSphere, Apache Axis + Tomcat, Microsoft .NET + IIS, etc. Verification and testing of such systems should search for errors that occur in exceptional situations (cyber-attacks, forbidden access to the database, etc.). We describe methods for analyzing such situations and for applying reliability and security techniques to ensure the stable, trouble-free operation of the service components of software/hardware systems in the Internet information environment.

167-180
Abstract

The registration of a 3D model over an image can be seen as the alignment of visual correspondences extracted from these two data sources. This is a challenging task, all the more so when the two inputs have different modalities. This paper introduces an approach for matching features detected in two different modalities, photographs and 3D models, by using a common 2D representation. Our approach is based on a modification of the Marching Cubes algorithm that removes ambiguous cases without adding further calculations in each cube. We share the idea that splitting the equivalence cases into two classes is of crucial importance. Considering all the possible inside/outside states of the four corners of a cube face, there are indeed only four non-trivial cases once equivalences under rotation are eliminated. The obtained results allow us to validate the feasibility of the proposed methodology.
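The counting claim in the abstract can be checked by direct enumeration: the 16 inside/outside states of a face's four corners collapse, under rotation, into 6 equivalence classes, of which 4 are non-trivial (all-inside and all-outside being the trivial ones). The sketch below verifies this; it is an independent check, not the paper's code.

```python
# Enumerate rotation-equivalence classes of the 16 corner states of a face.

def rotations(state):
    """state: tuple of 4 corner bits in cyclic order; yield the 4 rotations."""
    for k in range(4):
        yield state[k:] + state[:k]

def equivalence_classes():
    """Return one canonical representative per rotation orbit."""
    seen, classes = set(), []
    for code in range(16):
        state = tuple((code >> i) & 1 for i in range(4))
        if state in seen:
            continue
        orbit = set(rotations(state))
        seen |= orbit
        classes.append(min(orbit))
    return classes

classes = equivalence_classes()
nontrivial = [c for c in classes if c not in {(0, 0, 0, 0), (1, 1, 1, 1)}]
print(len(classes), len(nontrivial))  # 6 4
```

The four non-trivial orbits are: one corner inside, three corners inside, two adjacent corners inside, and the ambiguous diagonal pair, which is exactly the case the modified algorithm disambiguates.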

181-198
Abstract

The paper presents the results of numerical simulation of an applied acoustics problem: modeling the gas processes occurring in the measuring chamber of the infrasound pistonphone 3202 at piston oscillation frequencies of 0.1–1000 Hz, characterized by extremely small Mach numbers (9.1·10⁻⁷ to 9.1·10⁻³). The simulation used the quasi-gas-dynamic (QGD) and quasi-hydrodynamic (QHD) equations of a viscous compressible heat-conducting gas with an explicit-in-time difference scheme; all spatial derivatives were approximated by central differences. It is shown that the QGD and QHD models can be applied to problems of applied acoustics and, in particular, to the simulation of an infrasonic pistonphone: the stability limits of the QGD and QHD algorithms in this problem were determined, the dependence of the sound pressure on the tuning parameter α was investigated, and this dependence was shown to be quite small. The sound pressure spectra at the control point calculated by QGD and QHD are given together with their dependence on the tuning parameter α; both models predict the same sound pressure amplitude at the fundamental oscillation frequency. At the end of the article, the sound pressure at the control point at the fundamental frequency obtained with QGD and QHD is compared with values calculated by a semi-empirical formula for the sound pressure in a closed volume in the case of small oscillations, using the polytropic index obtained by Henry Gerber instead of the adiabatic coefficient.
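For small oscillations, the closed-volume pressure relation the abstract refers to is conventionally written as follows (a sketch of the standard textbook form, not the paper's exact formula; the substitution of a polytropic index n for the adiabatic exponent γ is what the abstract attributes to Gerber):

```latex
% Adiabatic small-oscillation pressure amplitude in a closed volume,
% and the semi-empirical variant with the polytropic index n:
p' = \gamma\, p_0\, \frac{\Delta V}{V_0}
\qquad\longrightarrow\qquad
p' = n\, p_0\, \frac{\Delta V}{V_0},
\quad 1 \le n \le \gamma .
```

At infrasonic frequencies heat exchange with the chamber walls matters, so the effective index lies between the isothermal and adiabatic limits, which is why the semi-empirical form is used for comparison.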



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)