Software model checkers enable automatic detection of violations of specified requirements in programs, as well as formal proofs of correctness under certain assumptions. These tools have been actively evolving over the last two decades. They have already been successfully applied to a number of industrial projects, first of all to kernels and drivers of various operating systems. This paper considers the interface of software model checkers, their unique capabilities, as well as the restrictions that prevent their large-scale use in practice.
The high complexity of a modern operating system (OS) requires the use of complex models and high-level specification languages to describe even separate aspects of OS functionality, e.g., security functions. Using such models in conformance verification of the modeled OS requires establishing a rather complex relation between elements of the OS implementation and elements of the model. In this paper we present a method to establish and maintain such a relation, which can be effectively used in testing and runtime verification/monitoring of an OS. The described method was successfully used in testing and monitoring of Linux OS core components for conformance to Event-B models.
A number of tasks related to binary data formats include parsing, generation, and conjoint analysis of code and data. A key element for all of these tasks is a universal data format model. This paper proposes an approach to modeling binary data formats. The described model is expressive enough to specify the most common data formats. A distinctive feature of the model is its flexibility in specifying field locations, as well as the ability to describe external fields, which are not resolved into a detailed structure during parsing. The implemented infrastructure allows creating and modifying a model through application programming interfaces. An algorithm is proposed for parsing binary data according to a model, based on the concept of field computability. The paper also presents a domain-specific language for data format specification. The formats specified so far and potential applications of the model for programmatic analysis of formatted data are indicated.
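To illustrate the notion of field computability (a hypothetical sketch only; the names and data structures below do not reproduce the paper's API), a parser can repeatedly process those fields whose offset and size are already computable from previously parsed values:

```python
# Hypothetical sketch of parsing driven by field computability: a field becomes
# parseable once its offset and size can be computed from already-resolved
# fields (e.g., a length field read earlier).

from dataclasses import dataclass
from typing import Callable, Dict, Optional

Resolved = Dict[str, bytes]

@dataclass
class Field:
    name: str
    offset: Callable[[Resolved], Optional[int]]  # None => not computable yet
    size: Callable[[Resolved], Optional[int]]

def parse(data: bytes, fields: list) -> Resolved:
    resolved: Resolved = {}
    pending = list(fields)
    while pending:
        progress = False
        for f in list(pending):
            off, sz = f.offset(resolved), f.size(resolved)
            if off is None or sz is None:
                continue  # depends on fields that have not been parsed yet
            resolved[f.name] = data[off:off + sz]
            pending.remove(f)
            progress = True
        if not progress:
            raise ValueError("remaining fields are not computable from parsed data")
    return resolved

# A 2-byte little-endian length field followed by a payload of that length.
fmt = [
    Field("length", lambda r: 0, lambda r: 2),
    Field("payload", lambda r: 2,
          lambda r: int.from_bytes(r["length"], "little") if "length" in r else None),
]
print(parse(b"\x03\x00abc", fmt))  # {'length': b'\x03\x00', 'payload': b'abc'}
```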
Dynamic binary analysis, often used for full-system analysis, provides the analyst with a sequence of executed instructions and the contents of RAM and system registers. This data is hard to process, as it is low-level; analyzing it demands a deep understanding of the studied system and a highly skilled professional. To simplify the analysis, it is necessary to bring the input data to a more user-friendly form, i.e., to provide high-level information about the system. One such piece of high-level information is the program's execution flow. To recover the execution flow of a program, it is important to know which procedures it calls. Such a representation can be obtained from the function call stack of a specific thread. Building a call stack without information about the running threads is impossible, since each thread is uniquely associated with one stack, and vice versa. In addition, the very presence of information about threads increases the level of knowledge about the system and makes it possible to profile the object of study more precisely and to conduct a narrowly focused analysis, applying the principles of selective instrumentation. The virtual machine only provides low-level data; thus, there is a need for a method of automatic identification of threads in the system under study based on the available data. In this paper, the existing approaches to obtaining high-level information in full-system analysis are considered, and a method is proposed for recovering thread information during full-system emulation with a low degree of OS dependency. Examples of practical use of this method in the implementation of analysis tools are also given, namely: restoring the call stack, detecting suspicious return operations, and detecting calls to freed memory on the stack. The testing presented in the article shows that the slowdown imposed by the described algorithms still allows working with the system under study, and comparison with reference data confirms the correctness of the results obtained by the algorithms.
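As a rough illustration of OS-independent thread identification (a hypothetical heuristic, not the exact algorithm of the paper), trace entries can be grouped by address space and stack-pointer region, since each thread owns its own stack:

```python
# Hypothetical sketch: grouping execution-trace entries into threads by
# stack-pointer regions. This only illustrates the general idea of
# OS-independent thread identification.

STACK_REGION_BITS = 20  # assume stacks of different threads lie in different 1 MiB regions

def thread_id(address_space: int, stack_pointer: int) -> tuple:
    """A thread is approximated by (address space, stack region)."""
    return (address_space, stack_pointer >> STACK_REGION_BITS)

trace = [
    # (address-space id, e.g. page-table base; current stack pointer)
    (0x1000, 0x7ffd_1000_0100),
    (0x1000, 0x7ffd_1000_00c0),  # same stack region -> same thread
    (0x1000, 0x7ffd_2000_0200),  # different stack -> another thread
    (0x2000, 0x7ffd_1000_0100),  # different address space -> another process
]

threads = {}
for asid, sp in trace:
    threads.setdefault(thread_id(asid, sp), []).append(sp)

for tid, sps in threads.items():
    print(f"thread {tid}: {len(sps)} trace entries")
```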
The paper describes a static analysis for finding defects and computing metrics for programs written in the Kotlin language. The analysis is implemented in the Svace static analyzer developed at ISP RAS. The paper focuses on the problems we encountered during implementation, the approaches we used to solve them, and the experimental results for the tool we have built. The analyzer supports not only Kotlin but is also capable of analyzing mixed projects that use both Java and Kotlin. We hope that the paper might be useful to static analysis developers and language designers.
The main task described in this article is automatically fixing defects found by a static analyzer in the C/C++ code of large software projects. We describe how we solved this task for the Svace static analyzer and discuss the main principles of automatically fixing defects of various types. We pay special attention to fixing null pointer dereferences, since it is the most important and sophisticated defect type among those we have supported. Statistics on the fixes proposed for defects of this type are also provided. We discuss common limitations and other specifics of our task and explain why we cannot use existing automatic fixing tools to solve it. At the end we outline further steps in the development of our tool.
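As a purely illustrative sketch of the simplest fix pattern for a null pointer dereference (a real fixer works on the analyzer's program representation rather than on source text, and this helper is hypothetical), the reported dereference can be guarded by a null check:

```python
# Hypothetical illustration of the simplest fix template for a null pointer
# dereference: guard the dereferencing statement with a null check.

def guard_dereference(stmt: str, pointer: str) -> str:
    indent = stmt[:len(stmt) - len(stmt.lstrip())]
    return (f"{indent}if ({pointer} != NULL) {{\n"
            f"{indent}    {stmt.strip()}\n"
            f"{indent}}}\n")

print(guard_dereference("    p->field = 0;", "p"))
# Prints:
#     if (p != NULL) {
#         p->field = 0;
#     }
```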
This paper is dedicated to a review of the most popular code smells, which are one of the components of technical debt, as well as methods and tools for their detection. We conduct a comparative analysis of several tools, such as DesigniteJava, PMD, and SonarQube. We apply the existing tools to a set of open-source projects to assess the detection precision and the agreement of the selected tools. We highlight the strengths and weaknesses of the approach based on metrics computation and threshold filtering that these tools use. Examination of the code smells detected by the tools shows a low percentage of true positives (10% for god class and 20% for complex method). We review papers that suggest enhancements of the standard approach as well as alternative approaches that do not involve metrics. To assess the potential of alternative methods, we present our prototype for long method detection with a false-positive filtering system based on machine learning.
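The standard metrics-plus-thresholds approach discussed above can be sketched as follows (the metric set and threshold values are illustrative only):

```python
# Hypothetical sketch of the metrics-plus-thresholds detection scheme that the
# reviewed tools implement; a method is flagged when a metric exceeds its threshold.

from dataclasses import dataclass

@dataclass
class MethodMetrics:
    name: str
    lines_of_code: int
    cyclomatic_complexity: int

def is_long_method(m: MethodMetrics,
                   loc_threshold: int = 50,
                   cc_threshold: int = 10) -> bool:
    return (m.lines_of_code > loc_threshold
            or m.cyclomatic_complexity > cc_threshold)

methods = [
    MethodMetrics("parseConfig", 120, 18),
    MethodMetrics("getName", 3, 1),
]
for m in methods:
    if is_long_method(m):
        print(f"possible long method: {m.name}")
```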
This paper reviews open-source tools for the logic synthesis, place-and-route, static timing analysis, and physical layout generation stages of hardware design. The following tools are described: qFlow, OpenLANE, Coriolis, and SymbiFlow. These tools aim to synthesize RTL models into FPGA bitstreams or GDSII physical layouts. A PicoRV32 implementation of a RISC-V microprocessor has been used for the experimental evaluation of these flows. The results show that open-source flows are capable of producing physical layouts for realistic examples. At the same time, commercial CAD tools allow generating more efficient designs in terms of clock frequency.
Online testing is a process of functional verification of microprocessors produced in silicon or of their FPGA prototypes, i.e., post-silicon verification. This type of testing differs both from manufacturing testing, which is aimed at checking the workability of manufactured chips (e.g., the absence of physical defects, the admissibility of physical characteristics), and from simulation-based pre-silicon functional verification of microprocessor models (where internal microprocessor signals are available for observation and the execution process can be controlled). Post-silicon verification makes it possible to rapidly run huge numbers of tests and to detect bugs missed during pre-silicon functional verification. Tests for microprocessors are usually represented by executable programs. Accordingly, the main tasks of online testing are high-performance generation of test programs in the given ISA and creation of a test environment responsible for launching programs, assessing the correctness of their execution by a microprocessor, diagnosing errors, and interacting with the outside world. This paper examines the problems arising in the development of online testing systems (online test program generators), reviews existing solutions in this area, and, based on them, proposes a promising approach to organizing online testing.
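A minimal sketch of such an online testing loop, assuming a random instruction-level generator and a golden ISA-level reference model (the interfaces below are placeholders, not an existing tool's API), might look as follows:

```python
# Hypothetical sketch of an online testing loop: generate a random test program,
# run it on the device under test, and check the result against a reference model.

import random

INSTRUCTIONS = ["add", "sub", "and", "or", "xor"]

def generate_test_program(length: int, seed: int) -> list:
    rng = random.Random(seed)
    return [(rng.choice(INSTRUCTIONS),
             rng.randrange(32), rng.randrange(32), rng.randrange(32))
            for _ in range(length)]

def online_testing(num_tests: int, run_on_dut, reference_model):
    for seed in range(num_tests):
        program = generate_test_program(length=100, seed=seed)
        dut_state = run_on_dut(program)       # execute on silicon / FPGA prototype
        ref_state = reference_model(program)  # golden ISA-level simulator
        if dut_state != ref_state:
            print(f"mismatch detected, seed={seed}")  # keep the seed for diagnostics
```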
Master Data Management (MDM) is an important data management discipline in modern companies. Industrial solutions in this area are actively developing, as evidenced, in particular, by the annual reviews of the consulting company Gartner. However, the problem of effectively customizing standard MDM products for the individual needs of organizations is very acute. A natural solution to this problem is a component architecture of the standard product, as well as its openness. In fact, however, the only industrial open component-based MDM product today is the Unidata platform, developed by the Russian company Unidata LLC. This paper presents the architecture of this platform, as well as an overview of the open component-based developments in the field of MDM available on the market.
This paper presents a system for investigating leaks of confidential text documents, focused on leakage channels involving document printing and screen photographing. An internal intruder may print a confidential document, take the paper copy out of the protected perimeter, scan it, and leak the resulting image anonymously. An intruder may also photograph a printed confidential document, or one displayed on a workstation screen, with a personal mobile phone. These leakage channels are poorly covered by the traditional DLP systems that enterprises usually use to protect confidential information from leaks. Digital watermark (DWM) embedding, which changes the visual representation of the document image, is chosen as the document protection mechanism. In the case of an anonymous leak of a confidential document, the embedded DWM makes it possible to determine the employee who leaked it, whether intentionally or through a violation of the security protocol. The system architecture consists of components of several types. Components on employees' workstations embed DWMs into documents that are sent for printing or displayed on screen. Information about watermark embedding is sent to a remote server that aggregates marking facts and provides them to a security officer during an investigation. Text document marking algorithms are developed that embed DWMs into printed documents and into documents displayed on screen. The screen watermark is embedded into the interline spacing; information is encoded by a sequence of lightened and darkened spaces. DWM embedding into printed documents is implemented by three algorithms: based on horizontal shifts, on vertical shifts, and on changing the brightness of font fragments. A testing methodology for the algorithms was developed with the production environment in mind, which helped to evaluate their application area. In addition, an intruder model was formulated, the security of the system was evaluated, and possible attack vectors were determined.
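The screen-marking idea of encoding bits as slightly lightened or darkened interline spaces can be sketched as follows (the brightness values and the bit mapping are illustrative, not the parameters used in the paper):

```python
# Hypothetical illustration of the screen watermark idea: each bit slightly
# lightens or darkens one interline space of the rendered document.

BASE = 250   # 8-bit grayscale brightness of an unmarked interline space (assumed)
DELTA = 2    # small, visually unobtrusive change

def brightness_for_bits(bits: str) -> list:
    """Brightness assigned to consecutive interline spaces."""
    return [BASE + DELTA if b == "1" else BASE - DELTA for b in bits]

def bits_from_brightness(values: list) -> str:
    """Recover the embedded bits from measured interline-space brightness."""
    return "".join("1" if v > BASE else "0" for v in values)

watermark = "10110001"  # e.g., an identifier of the employee or workstation
encoded = brightness_for_bits(watermark)
assert bits_from_brightness(encoded) == watermark
print(encoded)
```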
Artificial neural networks are widely used in the modern world. Various hardware is used for neural network inference: from CPUs and GPUs to FPGAs and ASICs. An important research area is inference acceleration, and many open-source tools have been proposed for it. This article reviews a range of open-source tools for neural network inference, acceleration, and hardware synthesis. Some of the tools have been selected for evaluation on an FPGA. Five neural network examples have been used as test models. An Intel CPU, an NVIDIA GPU, and a Cyclone V FPGA have been used as evaluation platforms. The results show that the TVM/VTA and LeFlow tools can successfully process neural network models and run them on the FPGA; however, the execution results are mixed.
State-of-the-art supervised word sense disambiguation models require large sense-tagged training sets. However, many low-resource languages, including Russian, lack such a large amount of data. To cope with the knowledge acquisition bottleneck in Russian, we first use a method based on the concept of monosemous relatives to automatically generate a labelled training collection. We then introduce three weakly supervised models trained on this synthetic data. Our work builds upon the bootstrapping approach: relying on this seed of tagged instances, an ensemble of classifiers is used to label samples from unannotated corpora. Along with this method, different techniques are exploited to augment the new training examples. We show that a simple bootstrapping approach based on an ensemble of weakly supervised models can already produce an improvement over the initial word sense disambiguation models.
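The bootstrapping loop can be sketched roughly as follows (a hypothetical self-training skeleton with placeholder classifier and data interfaces, not the paper's exact procedure):

```python
# Hypothetical sketch of bootstrapping (self-training): an ensemble of weak
# classifiers labels unannotated samples, and confidently labelled samples are
# added to the training set for the next round.

def bootstrap(classifiers, labelled, unlabelled, rounds=3, agreement=1.0):
    """labelled: list of (sample, sense) pairs; unlabelled: list of samples."""
    for _ in range(rounds):
        xs, ys = zip(*labelled)
        for clf in classifiers:
            clf.fit(list(xs), list(ys))          # retrain on the current seed set
        still_unlabelled = []
        for sample in unlabelled:
            votes = [clf.predict([sample])[0] for clf in classifiers]
            top = max(set(votes), key=votes.count)
            # accept a sample only if the ensemble agrees strongly enough
            if votes.count(top) / len(votes) >= agreement:
                labelled.append((sample, top))
            else:
                still_unlabelled.append(sample)
        unlabelled = still_unlabelled
    return labelled
```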
In this paper, we investigate the effectiveness of classical active learning approaches for the segmentation of document images, with the aim of reducing the size of the training set. A modified approach to selecting images for labelling and subsequent training is presented. The results obtained through active learning are compared with transfer learning using fully labelled data. We also investigate how the domain of the dataset on which the model is initialized for transfer learning affects its subsequent fine-tuning.
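A classical active-learning selection step of the kind investigated here can be sketched as follows (an uncertainty-sampling example with a placeholder model interface; the paper's actual selection criterion may differ):

```python
# Hypothetical sketch of a classical active-learning step: the images the
# current model is least certain about are selected for manual labelling.

import numpy as np

def select_for_labelling(model, unlabelled_images, batch_size=10):
    scores = []
    for image in unlabelled_images:
        probs = model.predict_proba(image)       # per-pixel class probabilities
        # uncertainty = mean entropy over pixels of the predicted segmentation
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean()
        scores.append(entropy)
    ranked = np.argsort(scores)[::-1]            # most uncertain first
    return [unlabelled_images[i] for i in ranked[:batch_size]]
```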
Aggregating knowledge about drug, disease, and drug reaction entities across a broad range of domains and languages is critical for information extraction (IE) applications. In this work, we present a fine-grained evaluation intended to understand the efficiency of multilingual BERT-based models for biomedical named entity recognition (NER) and multi-label sentence classification tasks. We investigate the role of transfer learning (TL) strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. Sentence labels indicate the presence or absence of health-related issues. Sentences containing such issues are additionally labelled at the expression level to identify fine-grained subtypes such as drug names, drug indications, and drug reactions. Evaluation results demonstrate that BERT trained on raw Russian and English reviews (5M in total) shows the best transfer capabilities for the evaluation of adverse drug reactions on Russian data. Our RuDR-BERT model achieves a macro F1 score of 74.85% on the NER task. For the classification task, our EnRuDR-BERT model achieves a macro F1 score of 70%, a gain of 8.64% over a general-domain BERT model.
Simplified mechanisms of hydrocarbon fuel combustion are considered; on their basis, the capabilities of the OpenFOAM package and of physicochemical libraries applicable to the numerical simulation of processes occurring in methane-air mixtures are extended. A modified mechanism of hydrocarbon fuel combustion is investigated. The choice of this substance is due to the prospects of and current practical interest in this type of fuel. The results obtained with the solvers created at MAI and at ISP RAS are compared. A physical and mathematical model, numerical algorithms, and results of calculations of non-stationary physical and chemical processes occurring in methane-air mixtures are presented. The temperature and species concentration values at constant pressure and enthalpy are compared, and the ignition time and the levels reached at thermodynamic equilibrium are estimated. The flow of a methane-air mixture in a tube with reflection of a shock wave incident on the wall is considered. The unsteady equations of gas dynamics, supplemented by the equations of chemical kinetics, are solved numerically. The effects of viscosity, thermal conductivity, and diffusion are not taken into account. The distributions of the flow parameters behind the reflected shock wave are obtained and analyzed. The propagation of a detonation wave in an oscillatory mode is illustrated. The consistency of the calculation results of the solvers used is shown. Estimates of the possible applications of this reduced combustion mechanism are given.
This work is devoted to the creation of the flagmanFoam software package, developed on the basis of the OpenFOAM v2012 package. A solver has been developed to simulate icing under the inflow of small droplets, with a characteristic droplet size of up to 40 microns, which corresponds to Appendix C of the AP-25 Aviation Regulations. The computational models implemented in the solver are presented: the Euler-Euler approach is used to describe the dynamics of the gas-droplet flow; the Myers thermodynamic model is used to describe the growth of the liquid film and ice; the Coupled Level Set - VoF method is used to track the interface; the immersed boundary method is used to take into account the interaction between the liquid and the body; and the turbulent viscosity is calculated using the k-w SST turbulence model. The results of modeling on test problems and a comparison with experimental data are presented.
Magnetic field generation in galaxies is a significant problem both for cosmic magnetohydrodynamics and for mathematical physics. It is based on the dynamo mechanism, which characterizes the transfer of energy from turbulent motions of the medium to the magnetic field. The evolution of the field is described by the mean-field dynamo equations. For galaxies, the solutions are commonly found using the so-called "no-z" approximation, in which the half-thickness of the galactic disc is considered negligible. In the nonlinear case, these equations admit the formation of contrast structures, predicted by singular perturbation theory, which deals with equations containing a small parameter at the highest derivative. From the astronomical point of view, some authors connect such solutions with the spiral structure of galaxies and the formation of magnetic field reversals (when different parts of a galaxy contain regions with opposite directions of the magnetic field, separated by a thin transition layer). From the numerical point of view, solving the two-dimensional system of equations requires large computational resources, so the use of GPUs and parallel computations is reasonable. One implementation option is computation with OpenCL, which allows increasing the efficiency of the process several times. OpenCL is a promising cross-platform standard for developing applications, particularly those involving GPUs, whose efficiency is rapidly increasing as the drivers evolve. The present work presents basic theoretical assessments of the magnetic field behaviour, which are then confirmed and refined by the computations. It is shown that the formation of the transition layers is described by fundamentally different mechanisms in the radial and azimuthal directions. While radial reversals of the field turn out to be rather stable, all azimuthal structures are rapidly blurred due to the nature of the interstellar medium motions. This also indicates the practical impossibility of non-axisymmetric distributions of the field.
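For orientation, the mean-field dynamo equations mentioned above are obtained from the standard mean-field induction equation (a textbook form, quoted here only to fix notation; the no-z approximation then reduces it to a two-dimensional system for the disc-plane field components):

\[
\frac{\partial \mathbf{B}}{\partial t} = \nabla \times \left( \mathbf{V} \times \mathbf{B} + \alpha \mathbf{B} \right) + \eta \, \nabla^2 \mathbf{B},
\]

where \(\mathbf{B}\) is the mean magnetic field, \(\mathbf{V}\) is the large-scale velocity (differential rotation), \(\alpha\) describes the mean helicity of turbulent motions, and \(\eta\) is the turbulent diffusivity.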
A wide range of issues in the theory and practice of developing large-scale software and hybrid software-intensive systems, including operating systems, is considered. Issues of configuration management, modeling and verification of such systems, and the construction of ontological models of subject domains related to application and system software are touched upon. Such multifaceted consideration is necessary to ensure reliability, security, and flexible evolution during the multi-year period of operation of infrastructural and mission-critical systems. The paper is based on the materials of studies carried out within two RFBR projects. In addition to recent results, the authors pay attention to the history of research in the relevant areas in the Soviet Union.