
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 33, No 4 (2021)
7-18
Abstract

Unified Modeling Language (UML) is a widely used standard for model visualization in the software industry. Hence, the training of IT professionals involves learning the modeling process. Studies of student perception of UML modeling indicate that this process is perceived as quite complex. This paper presents software for validating activity, class, and use-case diagrams via their XMI representation. To achieve this goal, we surveyed existing methods and systems. In addition, we analyzed mistake catalogues and student models from Perm State University to propose a mistake classification and a checklist listing the validations to be performed. This paper focuses on validating each type of diagram separately, without maintaining consistency between different UML models. However, all these validation modules are combined in one system, which makes it possible to check any of the described types of diagrams.
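As a hint of what such a checklist-driven validation can look like at the XMI level, the following sketch (a hypothetical illustration, not the authors' tool) loads an XMI export and reports two simple mistakes: classes without names and associations whose ends reference missing elements. The element and attribute names follow common XMI 2.x conventions and may differ between modeling tools.

```python
# Hypothetical illustration of XMI-level checklist items, not the authors'
# implementation. Element/attribute names (packagedElement, xmi:type,
# memberEnd) follow common XMI 2.x conventions and may vary between tools.
import xml.etree.ElementTree as ET

XMI_NS = "{http://www.omg.org/XMI}"  # assumed XMI namespace

def validate_class_diagram(xmi_path):
    root = ET.parse(xmi_path).getroot()
    known_ids = {el.get(XMI_NS + "id") for el in root.iter() if el.get(XMI_NS + "id")}
    mistakes = []
    for el in root.iter("packagedElement"):
        kind = el.get(XMI_NS + "type", "")
        # Checklist item: every class must have a non-empty name.
        if kind == "uml:Class" and not (el.get("name") or "").strip():
            mistakes.append("class without a name")
        # Checklist item: association ends must reference existing elements.
        if kind == "uml:Association":
            for ref in (el.get("memberEnd") or "").split():
                if ref not in known_ids:
                    mistakes.append(f"association references missing element {ref}")
    return mistakes
```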

19-30
Abstract

Low-code development environments are gaining attention due to their potential as a development paradigm for very large scale adoption in future IT. In this paper, we propose a method to extend the (application) Domain Specific Languages supported by two low-code development environments based on formal models, namely DIME (native Java) and Pyro (native Python), to include functionalities hosted on heterogeneous technologies and platforms. For this we follow the analogy of microservices. After this integration, both environments can leverage communication with pre-existing remote RESTful and enterprise systems' services, in our case Amazon Web Services (AWS) (but this can easily be generalized to other cloud platforms). In this way, developers can use within DIME and Pyro the potential of sophisticated services, potentially the entire Python and AWS ecosystems, as libraries of drag-and-drop components in their model-driven, low-code style. The new DSLs are made available in DIME and Pyro as collections of implemented SIBs and blocks. Due to the specific capabilities and checks underlying the DIME and Pyro platforms, the individual DSL functionalities are automatically validated for semantic and syntactic errors in both environments.
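As an illustration of the kind of native Python functionality such a DSL block could wrap, the sketch below exposes an AWS S3 call behind a plain input/output signature, the way a SIB (DIME) or block (Pyro) wraps a remote service. The function name is hypothetical, and boto3 must be configured with valid AWS credentials.

```python
# Sketch of a Python function that could be exposed as a low-code block:
# it wraps an AWS S3 call behind a simple input/output signature, the way
# a SIB (DIME) or block (Pyro) wraps a remote service. Names are illustrative.
import boto3

def list_bucket_keys(bucket_name: str, prefix: str = "") -> list[str]:
    """Return the object keys under `prefix` in an S3 bucket."""
    s3 = boto3.client("s3")  # credentials/region come from the environment
    response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]
```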

31-48
Abstract

The article addresses the problem of classifying network traffic into three categories: transparent, compressed, and opaque, preferably in real time. It begins with a description of the areas where this problem needs to be solved, then proceeds to existing solutions with their methods, advantages, and limitations. Since most current research separates traffic either into transparent and opaque or into compressed and encrypted, a subset of existing methods must be combined to unite these two problems into one. After describing the main mathematical ideas behind the approaches used in prior research, we select the best-performing of them and use them as features for a random forest classifier that divides the traffic into the three classes. The best-performing features are kept, the optimal tree parameters are chosen, and the initial three-class classifier is split into two sequential ones to save classification time for transparent packets. A new method is then proposed to classify an entire network flow into one of the three classes; its validity is confirmed on several protocols most characteristic of this area (SSH, SSL). The article concludes with directions for continuing this research, mainly optimizing it for real-time classification and obtaining more traffic samples suitable for experiments and demonstrations.
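To make the two-stage scheme concrete, here is a minimal sketch (with scikit-learn) in the spirit of the described approach: the first random forest separates transparent traffic, and a second one distinguishes compressed from opaque payloads. The payload features below (byte entropy and printable-byte ratio) are placeholders, not the article's feature set.

```python
import math
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def payload_features(payload: bytes):
    # Placeholder features: byte entropy, share of printable bytes, length.
    counts = Counter(payload)
    total = len(payload) or 1
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    printable = sum(1 for b in payload if 32 <= b < 127) / total
    return [entropy, printable, float(total)]

class TwoStageTrafficClassifier:
    def __init__(self):
        self.stage1 = RandomForestClassifier(n_estimators=100)  # transparent vs rest
        self.stage2 = RandomForestClassifier(n_estimators=100)  # compressed vs opaque

    def fit(self, payloads, labels):
        # labels: "transparent", "compressed" or "opaque" for each payload
        X = np.array([payload_features(p) for p in payloads])
        y = np.array(labels)
        self.stage1.fit(X, y == "transparent")
        rest = y != "transparent"
        self.stage2.fit(X[rest], y[rest])

    def predict(self, payloads):
        X = np.array([payload_features(p) for p in payloads])
        first = self.stage1.predict(X)  # True -> transparent, skip the second stage
        return ["transparent" if t else self.stage2.predict(X[i:i + 1])[0]
                for i, t in enumerate(first)]
```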

49-68
Abstract

Network stacks currently implemented in operating systems can no longer cope with the packet rates offered by 10 Gbit Ethernet. Thus, frameworks have been developed that claim to offer a faster alternative for this demand. These frameworks enable arbitrary packet processing systems to be built from commodity hardware that handle the traffic rate of several 10 Gbit interfaces, entering a domain previously available only to custom-built hardware. In this paper, we survey various frameworks for high-performance packet I/O and their interaction with modular frameworks and specialized virtual network function software for high-speed packet processing. We introduce a model to estimate and assess the performance of these packet processing frameworks. Moreover, we analyze the performance of the most prominent frameworks based on representative measurements in packet capturing scenarios. Finally, we compare the frameworks and identify their areas of applicability.

69-76
Abstract

This work presents a network processing unit (NPU) based on specialized computational cores that is used for packet processing in network devices (e.g., network switches). Stateful data-plane algorithms are now being developed for software-defined networks. The idea of stateful data-plane algorithms is to move part of the control information from the control plane to the data plane. But these algorithms require hardware support because they need resources for state handling. This work presents modifications of the network processing unit architecture that make it possible to use stateful data-plane algorithms requiring state synchronization between the NPU processing pipelines.

77-86
Abstract

This paper addresses the problem of packet classification within a network processor (NP) architecture that has no separate associative device. By classification, we mean the process of identifying a packet by its header. The classification stage requires the implementation of data structures to store the flow tables. In our work, we consider an NP without associative memory: flow tables are represented by an assembly language program in the NP, and a table translator is used to translate flow tables into assembly language programs. The main reason for implementing data compression algorithms in the flow table translator is that modern flow tables can take up tens of megabytes. In this paper, we describe the following data compression algorithms: optimal rule caching, recursive end-point cutting, and general-purpose data compression algorithms. The implemented data compression algorithms were evaluated on a simulation model of the NP.

87-98
Abstract

Recording and analyzing 12-lead electrocardiograms is the most common procedure for detecting heart disease. Recently, various deep learning methods have been proposed for automatic diagnosis from an electrocardiogram. Such methods can provide a second opinion for the doctor and help detect pathologies at an early stage. The paper proposes various techniques to improve the quality of predicting pathologies from ECG recordings, including adding patient metadata, ECG noise reduction, and self-adaptive learning. The significance of data parameters for training a classification model is also explored: among the considered parameters, the influence of particular ECG leads, the length of the electrocardiogram, and the size of the training sample is studied. The experiments carried out show the relevance of the described approaches and provide an estimate of the optimal input data parameters.
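A minimal sketch of one of the listed techniques, adding patient metadata to the network input, is given below; the architecture, signal length, and number of pathology classes are assumptions for illustration, not the configuration used in the paper.

```python
# Assumed architecture for illustration only: a 12-lead ECG signal branch is
# combined with patient metadata (e.g., age and sex) before classification.
from tensorflow import keras
from tensorflow.keras import layers

signal_in = keras.Input(shape=(5000, 12), name="ecg")   # 10 s at 500 Hz, 12 leads
meta_in = keras.Input(shape=(2,), name="metadata")      # e.g., age, sex

x = layers.Conv1D(32, kernel_size=15, activation="relu")(signal_in)
x = layers.MaxPooling1D(4)(x)
x = layers.Conv1D(64, kernel_size=15, activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)

x = layers.Concatenate()([x, meta_in])                   # inject patient metadata
out = layers.Dense(5, activation="sigmoid")(x)           # multi-label pathologies

model = keras.Model(inputs=[signal_in, meta_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```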

99-116
Abstract

The paper compares three methods for parsing patients' chief complaints extracted from electronic medical records. We propose two methods based on the use of an ontology: either for correcting mistakes made by a parser, or for constructing syntactic dependencies according to the ontology and a limited set of syntactic government rules. As a control, we use existing natural language parsing libraries. The paper demonstrates that such a simple approach can achieve high accuracy, comparable to that of modern parsers.

117-130
Abstract

Morphological analysis of text is one of the most important stages of natural language processing (NLP). Traditional and well-studied problems of morphological analysis include normalization (lemmatization) of a given word form, recognition of its morphological characteristics, and their disambiguation. Morphological analysis also involves the problem of morpheme segmentation of words (i.e., segmentation of words into constituent morphs and their classification), which is relevant in some NLP applications. In recent years, several machine learning models have been developed that increase the accuracy of traditional morphological analysis and morpheme segmentation, but the performance of such models is insufficient for many applied problems. For morpheme segmentation, high-precision models have been built only for lemmas (normalized word forms). This paper describes two new high-accuracy neural network models that perform morpheme segmentation of Russian word forms with sufficiently high performance. The first model is based on convolutional neural networks and shows state-of-the-art quality of morpheme segmentation for Russian word forms. The second model, besides morpheme segmentation of a word form, preliminarily refines its morphological characteristics, thereby disambiguating them. The performance of this joint morphological model is the best among the considered morpheme segmentation models, with comparable segmentation accuracy.
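The sketch below illustrates the general shape of a convolutional character-level tagger for morpheme segmentation; the alphabet size, word length limit, label set, and layer sizes are assumptions, and the actual architecture in the paper may differ.

```python
# Illustrative character-level convolutional tagger: each character of a word
# form receives a BMES-style tag marking its position inside a morph.
from tensorflow import keras
from tensorflow.keras import layers

ALPHABET_SIZE = 50   # assumed number of distinct characters (plus padding)
MAX_LEN = 30         # assumed maximum word-form length
NUM_TAGS = 4         # assumed tag set: B / M / E / S position inside a morph

inputs = keras.Input(shape=(MAX_LEN,), dtype="int32")        # character indices
x = layers.Embedding(ALPHABET_SIZE, 32)(inputs)
# Stacked 1D convolutions let each position see a growing window of neighbours.
for _ in range(3):
    x = layers.Conv1D(192, kernel_size=5, padding="same", activation="relu")(x)
    x = layers.Dropout(0.2)(x)
# One tag distribution per character; morph boundaries are read off the tags.
outputs = layers.Dense(NUM_TAGS, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```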


131-146
Abstract

The article presents an algorithm for marking electronic text documents based on embedding identification information by changing the intervals between words (interword distance shifting). The algorithm is aimed at improving the protection of documents containing text information from leakage through the channel of transferring documents printed on paper, as well as the corresponding electronic copies of paper documents. During the development of the marking algorithm, existing tools for protecting paper documents from leakage were analyzed, practical solutions in the field of protecting text documents were considered, and their advantages and disadvantages were determined. Interword distance shifting is used as the approach to embedding information in electronic documents. The change of interword distances is based on inserting a normalized space into selected areas of text lines and adjusting the remaining interword spacing values by the calculated amounts. To ensure that the embedded marker is invariant to printing and subsequent scanning or photographing, algorithms for forming the embedding regions and the embedding matrix have been developed. When forming the embedding regions from the text lines of the source document, arrays of spaces are formed, consisting of pairs: four and two spaces, or two spaces. The embedded information determines the places in the formed regions where the normalized space is inserted. When embedding a marker, an embedding matrix containing the word displacement values is formed and embedded into the original document during printing. The developed marking algorithm makes it possible to introduce into the structure of the electronic document text a marker that is invariant to the conversion of an electronic document into a paper one and vice versa. In addition, the features and limitations of the developed marking algorithm are presented, and directions for further research are identified.
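The following toy sketch conveys the spacing-shift idea only: one bit is embedded into a pair of adjacent interword gaps by widening one gap and narrowing the other so the line width is preserved. The shift value and the pairing of gaps are illustrative and do not reproduce the article's embedding regions or embedding matrix.

```python
# Toy illustration of interword distance shifting: one bit per pair of gaps,
# with the second gap compensating the first so the line length is unchanged.
SHIFT = 0.6  # shift in points; must stay below what a reader would notice

def embed_bits(gaps, bits):
    """gaps: widths of interword spaces in a line; bits: iterable of 0/1."""
    gaps = list(gaps)
    bit_iter = iter(bits)
    for i in range(0, len(gaps) - 1, 2):
        try:
            bit = next(bit_iter)
        except StopIteration:
            break
        delta = SHIFT if bit else -SHIFT
        gaps[i] += delta        # widen or narrow the first gap of the pair
        gaps[i + 1] -= delta    # compensate so the line length is unchanged
    return gaps

def extract_bits(marked, original):
    """Recover bits by comparing marked gaps with the original ones."""
    return [1 if m > o else 0 for m, o in zip(marked[::2], original[::2])]
```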

147-162
Abstract

One of the most common ways documents leak is by taking a picture of a document displayed on the screen. To investigate such cases, data leakage prevention technologies including screen watermarking are used. The article gives a short review of the screen-shooting watermarking problem and existing research results. A novel approach for watermarking text images displayed on the screen is proposed. The watermark is embedded as slight changes in luminance in the interline spacing of the marked text. The watermark is designed to be invisible to the human eye but still detectable by a digital camera. An algorithm for extracting the watermark from a screen photo is presented. The extraction algorithm does not need the original image of the document for successful extraction. The experimental results show that the approach is robust against screen-cam attacks, which means that the watermark persists after a photo is taken of the document displayed on the screen. A criterion for the accuracy of watermark message extraction without knowledge of the original message is proposed. The criterion represents the probability that the watermark was extracted correctly.
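As a rough illustration of the embedding idea (luminance changes confined to interline spacing), the sketch below brightens or darkens given interline row ranges of a grayscale text image by a small delta; synchronization, the embedding pattern, and blind extraction from a camera photo, which the article addresses, are omitted.

```python
# Toy sketch: embed a bit pattern as slight luminance changes in the interline
# spacing of a rendered text image. Row ranges and the delta are illustrative.
import numpy as np

DELTA = 2  # luminance change, small enough to be invisible on screen

def embed_in_interline(image: np.ndarray, interline_rows, bits) -> np.ndarray:
    """interline_rows: list of (top, bottom) row ranges between text lines."""
    marked = image.astype(np.int16)
    for (top, bottom), bit in zip(interline_rows, bits):
        marked[top:bottom, :] += DELTA if bit else -DELTA
    return np.clip(marked, 0, 255).astype(np.uint8)
```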

163-176
Abstract

Visual modeling is widely used nowadays, but the existing modeling platforms cannot meet all user requirements. Visual languages are usually based on graph models, but the graph types used have significant restrictions. A new graph model, called HP-graph, whose main element is a set of poles, subsets of which are combined into vertices and edges, has previously been presented to solve the problem of insufficient expressiveness of existing graph models. Transformations and many other operations on visual models face the problem of subgraph matching, which slows down their execution. A multilayer approach to subgraph matching can be a solution to this problem if a modeling system is based on the HP-graph. In this case, the search starts on the higher level of the graph model, where vertices and hyperedges are compared without revealing their structures, and only when a candidate is found does it move to the level of poles, where the comparison of the decomposed structures is performed. The idea of the multilayer approach is described, and a backtracking algorithm based on this approach is presented. The Ullmann algorithm and VF2 are adapted to this approach and analyzed for complexity. The proposed approach incrementally narrows the search space of the backtracking algorithm and helps to decrease its overall complexity. The paper shows that the existing subgraph matching algorithms, except those that modify the graph pattern, can be successfully adapted to the proposed approach.
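A simplified sketch of the two-level idea follows: a candidate vertex is first accepted or rejected by a cheap upper-layer check (label and pole count, without revealing the structure), and only then are the pole-level structures compared. The data structures are illustrative, hyperedge consistency checks are omitted, and this is not the HP-graph implementation itself.

```python
# Illustrative two-layer backtracking match over vertices with poles.
def upper_layer_compatible(pattern_vertex, target_vertex):
    # Cheap check: labels match and the target has enough poles.
    return (pattern_vertex["label"] == target_vertex["label"]
            and len(pattern_vertex["poles"]) <= len(target_vertex["poles"]))

def pole_layer_match(pattern_vertex, target_vertex):
    # Expensive check: every pattern pole maps to a distinct target pole
    # of the same type (the decomposed structures are compared here).
    available = list(target_vertex["poles"])
    for pole in pattern_vertex["poles"]:
        found = next((p for p in available if p["type"] == pole["type"]), None)
        if found is None:
            return False
        available.remove(found)
    return True

def match(pattern, target, mapping=None):
    """Backtracking search for an injective mapping of pattern vertices."""
    mapping = mapping or {}
    if len(mapping) == len(pattern):
        return mapping
    pv = next(v for v in pattern if v["id"] not in mapping)
    for tv in target:
        if tv["id"] in mapping.values():
            continue
        # Pole-level comparison is attempted only for upper-layer candidates.
        if upper_layer_compatible(pv, tv) and pole_layer_match(pv, tv):
            result = match(pattern, target, {**mapping, pv["id"]: tv["id"]})
            if result:
                return result
    return None
```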

177-194
Abstract

The process of developing C programs is quite often prone to errors related to the use of pointer arithmetic and operations on memory addresses. This creates a need for tools for automated program verification. One technique frequently employed by such tools is the invocation of appropriate decision procedures implemented within existing SMT solvers. At the same time, both the SMT standard and most existing SMT solvers lack the relevant logics (combinations of logical theories) for directly and precisely modelling the semantics of pointer operations in C. One possible way to support these logics is to implement them in an SMT solver, but this approach can be time-consuming (it requires modifying the solver's source code), inflexible (introducing any changes to the theory's signature or semantics can be unreasonably hard), and limited (every solver has to be supported separately). Another way is to design and implement custom quantifier instantiation strategies, which can then be used to translate formulas in the desired theory combinations into formulas in well-supported decidable logics such as QF_UFLIA. In this paper, we present an instantiation procedure for translating formulas in the theory of bounded pointer arithmetic into the QF_UFLIA logic. We formally proved soundness and completeness of our instantiation procedure in Isabelle/HOL; the paper gives an informal description of this proof. The theory of bounded pointer arithmetic itself was formulated based on known errors regarding the correct use of pointer arithmetic operations in industrial code, as well as the semantics of these operations specified in the C standard. A similar procedure can also be defined for a practically relevant fragment of the theory of bit vectors (monotone propositional combinations of equalities between bitwise expressions). Our approach is sufficient to obtain efficient decision procedures, implemented as Isabelle/HOL proof methods, for several decidable logical theories used in C program verification by relying on the existing capabilities of well-known SMT solvers, such as Z3, and the proof reconstruction capabilities of the Isabelle/HOL proof assistant.
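The snippet below is only a hedged illustration of the target encoding, not the instantiation procedure itself: a pointer is modelled as an object identifier plus an integer offset, the object size is an uninterpreted function, and a simple bounded pointer arithmetic fact is stated as a quantifier-free UFLIA formula and discharged with Z3.

```python
# Hedged illustration of a QF_UFLIA encoding of a bounded pointer arithmetic
# fact; the modelling choices (object id + offset, size as an uninterpreted
# function) are assumptions for this sketch.
from z3 import And, Function, Implies, Int, IntSort, Not, Solver, unsat

size = Function("size", IntSort(), IntSort())   # object id -> object size in bytes
obj, off, k = Int("obj"), Int("off"), Int("k")

# Pointer p = (obj, off); q = p advanced by k elements of 4 bytes each.
valid_p = And(0 <= off, off <= size(obj))
valid_q = And(0 <= off + 4 * k, off + 4 * k <= size(obj))
claim = Implies(And(valid_p, 0 <= k, 4 * k <= size(obj) - off), valid_q)

s = Solver()
s.add(Not(claim))                   # the claim is valid iff its negation is unsat
print("claim holds:", s.check() == unsat)
```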

195-210
Abstract

Aggressive optimization in modern compilers may uncover vulnerabilities in program code that did not lead to bugs prior to optimization. The source of these vulnerabilities is code with undefined behavior. Programmers use such constructs relying on some particular behavior those constructs exhibited before in their experience, but the compiler is not obliged to stick to that behavior and may change it if needed for optimization, since the behavior is undefined by the language standard. This article describes approaches to detecting and eliminating vulnerabilities arising from optimization in the case when source code is available but its modification is undesirable or impossible. The concept of a safe compiler (i.e., a compiler that ensures no vulnerability is added to the program during optimization) is presented, and an implementation of such a compiler on top of GCC is described. The safe compiler's functionality is divided into three security levels, whose applicability is discussed in the article. The feasibility of using the safe compiler on real-world codebases is demonstrated and possible performance losses are estimated.

211-226
Abstract

The digital transformation of society is leading to the creation of a large number of distributed automated information systems in various areas of modern life. The need to meet security and reliability requirements prompts the creation of tools for their automated testing. Fuzzing within the security development lifecycle (SDL) is a strictly required tool for solving this problem. Tools for fuzzing binary-only applications are in demand as well: such tools enable the search for critical defects in already deployed systems, which is especially acute when researching the security of proprietary systems operating over closed protocols. In the course of the research, it was found that for fuzzing network applications in the absence of source code, the use of universal fuzzers is complicated by many factors. These circumstances call for an easy-to-use tool for fuzzing network applications. The paper discusses the specifics of fuzzing such programs and suggests possible solutions to the identified tasks.

227-240
Abstract

We calibrate the k-ε turbulence model for free-surface flows in a channel or on a slope. To calibrate the turbulence model, an experiment is carried out in an inclined rectangular experimental tray. In the experiment, the pressure values in the flow are measured at different distances from the bottom using a Pitot tube; after transforming the data, the flow velocity profile is obtained. The k-ε turbulence model is calibrated against the experimental data using the Nelder-Mead optimization algorithm. The calibrated turbulence model is then used to simulate the outburst of a lake near the Maliy Azau glacier on Elbrus (Central Caucasus).
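A minimal sketch of such a calibration loop is given below, assuming the model constants are fitted to the measured velocity profile by least squares with the Nelder-Mead simplex method; the measurement values and the simulate_profile placeholder (which stands in for an actual CFD run of the k-ε model) are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder measurements (distance from the bottom in m, velocity in m/s);
# real values come from the Pitot-tube pressure readings in the tray experiment.
z_measured = np.array([0.005, 0.01, 0.02, 0.04, 0.08])
u_measured = np.array([0.55, 0.72, 0.91, 1.10, 1.28])

def simulate_profile(z, c_mu, c_eps1, c_eps2):
    # Stand-in for a CFD run of the k-epsilon model with the candidate constants;
    # a simple logarithmic profile keeps this sketch runnable.
    u_star, z0 = 0.05, 0.001
    return (u_star / 0.41) * np.log(z / z0) * (c_mu / 0.09)

def misfit(constants):
    u_model = simulate_profile(z_measured, *constants)
    return np.sum((u_model - u_measured) ** 2)   # least-squares deviation

# Start from the standard k-epsilon constants and refine them with Nelder-Mead.
result = minimize(misfit, x0=[0.09, 1.44, 1.92], method="Nelder-Mead")
print("calibrated constants:", result.x)
```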



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)