
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 30, No 3 (2018)
7-20
Abstract
The article describes a new method for detecting use-after-free bugs based on dynamic program analysis. In memory-unsafe programming languages such as C/C++, this class of bugs mainly occurs when a program tries to access a specific area of dynamically allocated memory that has already been freed. The method combines two basic components. The first component tracks all memory operations through dynamic binary instrumentation and searches for inappropriate memory accesses. It maintains two sets of memory addresses, one for allocation and one for free instructions; using both sets, it checks whether the currently accessed memory is valid or has already been freed. The second component is based on dynamic symbolic execution and a code coverage algorithm and is used to maximize the number of explored execution paths. Starting from an initial input, it symbolically executes the target program and gathers input constraints from conditional statements. New inputs are generated by systematically solving the saved constraints with a constraint solver and are then sorted by the number of basic blocks they cover. The proposed method detects use-after-free bugs by applying the first component each time the second one opens a new execution path. It was tested on our synthetic tests, created from well-known use-after-free bug patterns, as well as on a couple of real projects with bugs injected at different levels of execution.
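A minimal sketch of the bookkeeping behind the first component, assuming hypothetical instrumentation callbacks (a real tool would receive these events from a dynamic binary instrumentation framework; all names here are illustrative, not the paper's):

```python
# Shadow bookkeeping for use-after-free detection: every allocation and free
# is recorded, and each memory access is checked against both address sets.
class UseAfterFreeChecker:
    def __init__(self):
        self.live = {}   # base address -> size of live allocations
        self.freed = {}  # base address -> size of freed allocations

    def on_malloc(self, addr, size):
        self.freed.pop(addr, None)  # the allocator may reuse addresses
        self.live[addr] = size

    def on_free(self, addr):
        size = self.live.pop(addr, None)
        if size is not None:
            self.freed[addr] = size

    def on_access(self, addr):
        if any(b <= addr < b + s for b, s in self.live.items()):
            return  # valid access into a live block
        if any(b <= addr < b + s for b, s in self.freed.items()):
            print(f"use-after-free access at {addr:#x}")

checker = UseAfterFreeChecker()
checker.on_malloc(0x1000, 16)
checker.on_free(0x1000)
checker.on_access(0x1008)  # -> use-after-free access at 0x1008
```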
21-30
Abstract
Over the last few decades, buffer overflow has remained one of the main sources of program errors and vulnerabilities. Among other solutions, several static analysis techniques have been developed to mitigate such defects. We analyzed different approaches and tools that address this issue to discern common practices and the types of errors detected. We also explored popular synthetic test suites (Juliet Test Suite, Toyota ITC benchmark) and a set of buggy code snippets extracted from real applications to define the types of defects a static analyzer is expected to uncover. Both sources are essential for understanding the design goals of a production-quality static analyzer. Test suites expose a set of features to support that is easy to understand, classify, and check; on the other hand, they do not give a realistic picture of production code. Inspecting vulnerabilities is useful but provides an exploitation-biased sample, and it does not include defects eliminated during the development process (possibly with the help of some static analyzer). Our research has shown that interprocedural analysis, path sensitivity, and loop handling are essential. An analysis can also benefit considerably from tracking affine relations between variables and from modeling C-style strings as a very important case of buffers. Our goal is to use this knowledge to enhance our own buffer overrun detector. It currently performs interprocedural, context- and path-sensitive analysis to detect buffer overflows, mainly for static and stack objects, with a true positive ratio of approximately 65%. We consider improving the handling of string manipulations and combining taint analysis with our approaches to be promising directions.
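As a toy illustration of the kind of reasoning such detectors perform (not the authors' actual analysis), here is an interval-domain check of an array access, with the classic off-by-one loop bound as the example:

```python
# Interval-based bound checking: the analyzer tracks a range for each index
# and flags accesses whose range escapes the buffer.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

def check_access(buffer_size, index):
    if index.lo < 0 or index.hi >= buffer_size:
        return f"possible overrun: index in [{index.lo}, {index.hi}], size {buffer_size}"
    return "safe"

# Models: char buf[10]; for (i = 0; i <= 10; i++) buf[i] = 0;
print(check_access(10, Interval(0, 10)))  # possible overrun
print(check_access(10, Interval(0, 9)))   # safe
```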
31-46
Abstract
It may be useful to analyze and reuse components of legacy systems during the development of new systems. Using a model-based approach, it is possible to build an architecture model from the existing source code of the legacy system. The purpose of architecture models is to analyze the system's static and dynamic features during the development process, such as real-time performance, resource consumption, reliability, etc. The architecture models can be used both for system analysis and for reusing components of the legacy system in the new design; in many cases this makes it possible to avoid creating a new system from scratch. Various modeling languages can be used to create architecture models; in the present work, the Architecture Analysis & Design Language (AADL) is used. The paper describes an algorithm for extracting architectural information from the source code of ARINC 653-compatible application software. The ARINC 653 specification defines the requirements for software components of Integrated Modular Avionics (IMA) systems. To access the services of an ARINC 653-based OS, application software uses function calls defined in the APplication/EXecutive (APEX) interface. Architectural information in the source code of ARINC 653-compliant application software includes objects and their attributes, such as the processes in each partition, objects for interpartition and intrapartition communication, and global variables. To collect this information, it is necessary to extract all APEX calls from the application source code; the extracted information can then be used to create architecture models of the system. For source code analysis, an approach based on the Counterexample-Guided Abstraction Refinement (CEGAR) algorithm is used. The CEGAR algorithm explores possible execution paths of the program using its representation in the form of an Abstract Reachability Graph (ARG). In the classical CEGAR algorithm, a path to be explored is called a counterexample and denotes a path to the error state. In the CPAchecker tool, the basic predicate-based CEGAR algorithm has been extended for explicit-value analysis; in this paper, this extended CEGAR algorithm is applied to the task of extracting architectural information from source code. The main contribution of this paper is the application of the ideas of counterexamples and path feasibility checking to this extraction task.
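For a sense of what is being gathered, here is a deliberately simplified extraction pass over APEX creation calls. The paper's actual approach uses CEGAR-based path analysis in CPAchecker to resolve attribute values along feasible paths; this regex scan only illustrates the raw information of interest (the APEX service names are from ARINC 653):

```python
# Naive extraction of APEX object-creation calls and their argument lists.
import re

APEX_CALLS = ("CREATE_PROCESS", "CREATE_SAMPLING_PORT", "CREATE_QUEUING_PORT",
              "CREATE_EVENT", "CREATE_SEMAPHORE")

def extract_apex_calls(source):
    calls = []
    for name in APEX_CALLS:
        for m in re.finditer(name + r"\s*\(([^;]*)\)\s*;", source):
            calls.append((name, m.group(1).strip()))
    return calls

src = "CREATE_PROCESS(&attr, &pid, &ret);"
print(extract_apex_calls(src))  # [('CREATE_PROCESS', '&attr, &pid, &ret')]
```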
47-62
Abstract
A data race occurs in a multithreaded program when several threads simultaneously access the same shared data and at least one of them writes to it. The two main approaches to automatic race detection, static and dynamic, have their pros and cons. Dynamic analysis provides the best precision for a given program execution but introduces enormous runtime overhead. Earlier we introduced an approach that improves the performance of dynamic race detection. The key idea is to define and exclude external trusted parts of code (e.g. libraries) from analysis and replace them with specifications of their behavior in a multithreaded environment. We classified the possible behaviors and developed a corresponding language for describing such contracts. Evaluation on lightweight applications confirmed the performance boost, but further industrial usage of the detector revealed some problems. This article covers those problems, introduces the method and architecture of the contract processing module, and describes some technical features that help apply the proposed approach to high-load production systems.
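A sketch of the contract idea under stated assumptions: the contract format below is invented for illustration and is not the paper's actual specification language; it only shows how trusted functions can be skipped by the detector and replaced by a summary of their synchronization effect:

```python
# Trusted library functions are excluded from instrumentation; only their
# declared effect on the happens-before relation is applied.
CONTRACTS = {
    "malloc":     {"trusted": True, "effect": None},
    "queue_push": {"trusted": True, "effect": ("release", "queue")},
    "queue_pop":  {"trusted": True, "effect": ("acquire", "queue")},
}

def should_instrument(func_name):
    """Skip analysis inside trusted functions; apply their contract instead."""
    return func_name not in CONTRACTS or not CONTRACTS[func_name]["trusted"]

print(should_instrument("queue_push"))  # False: replaced by its contract
print(should_instrument("user_code"))   # True: analyzed as usual
```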
63-86
Abstract
Many common programming tasks, like connecting to a database, drawing an image, or reading from a file, have long been implemented in various frameworks and are available via the corresponding Application Programming Interfaces (APIs). However, to use them, a software engineer must first learn of their existence and then of the correct way to utilize them. Currently, the Internet seems to be the best and most common way to gather such information. Recently, a deep-learning-based solution was proposed in the form of the DeepAPI tool: given an English description of the desired functionality, a sequence of Java function calls is generated. In this paper, we show how to apply this approach to a different programming language (C# instead of Java) that has a smaller open code base; we describe the techniques used to achieve results close to the original, as well as techniques that failed to produce an impact. Finally, we release our dataset, code, and trained model to facilitate further research.
87-92
Abstract
When programs are analyzed for the presence of vulnerabilities and malicious code, high-quality isolation of the analysis tools is needed, for two reasons. First, the program can influence the tool environment; this problem is solved by using an emulator. Second, the tool environment can influence the behavior of the analyzed program, so the analyst may conclude that the program is harmless when in fact it is not. This problem is solved by the mechanism of stealth debugging. The WinDbg debugger can connect to a remote debug service (Kdsrv.exe) in the Windows kernel, so it is possible to connect to a guest system running in the QEMU emulator. Interaction between the WinDbg client and server occurs through packets of the KDCOM protocol. However, kernel debugging is possible only when debugging mode is enabled in the boot settings, and this reveals the debugging process. We developed a special WinDbg module for the QEMU emulator that serves as an alternative to the remote debugging service in the kernel. The debugger client tries to connect to the WinDbg server, but the module intercepts all packets, gathers the necessary information from the QEMU emulator, and sends responses to the client. The module completely simulates the behavior of the server, so the client does not notice the spoofing and interacts with it normally. At the same time, there is no need to enable debugging mode in the kernel, which makes the debugging stealthy. Our module supports all of WinDbg's remote debugging features, in addition to the interception of events and exceptions.
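A conceptual sketch of the interception loop: the module plays the role of Kdsrv.exe and answers the client from the emulator's own state. The packet structure and handlers below are placeholders, not real KDCOM fields, and the VM hooks are hypothetical stand-ins for QEMU introspection:

```python
# Answer debugger requests from emulator state instead of the guest kernel.
def handle_packet(packet, vm):
    if packet["type"] == "read_memory":
        data = vm.read_virtual(packet["addr"], packet["size"])
        return {"type": "read_memory_reply", "data": data}
    if packet["type"] == "get_context":
        return {"type": "context_reply", "regs": vm.cpu_registers()}
    return {"type": "unsupported"}

class FakeVM:  # stand-in for the emulator's introspection interface
    def read_virtual(self, addr, size): return b"\x00" * size
    def cpu_registers(self): return {"rip": 0xFFFFF80000000000}

print(handle_packet({"type": "get_context"}, FakeVM()))
```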
93-98
Abstract
Programmers sometimes face the task of analyzing the behavior of a compiled program, and many tools exist for debugging and tracing programs. One approach is to analyze the application through its system calls. A detailed study of the system call mechanism reveals many nuances that must be dealt with when developing a program analyzer based on system calls. This paper discusses the implementation of a tracer that analyzes programs through their system calls, and describes the problems we faced in its design and development. There are many different operating systems, and each requires its own approach to implementing the debugger; the same problem arises with the processor architecture on which the operating system runs, since the analyzer must adjust its behavior to each architecture. To address this problem, the paper proposes describing a model of the operating system under analysis. The model description is a configuration file that can be adapted to the needs of a particular operating system. When a system call is detected, the plugin collects the information specified in the configuration file. In the configuration file, arguments are expressions, so we implemented a parser that recognizes input expressions and computes their values. After computing the values of all expressions, the tracer formats the collected data and outputs it to the log file.
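A minimal sketch of the configurable-argument idea, assuming an invented config format and register names: each system call argument is described as an expression over CPU registers and evaluated when the call is intercepted.

```python
# Tiny expression evaluator over register names, driven by a config entry.
import ast, operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mul: operator.mul}

def eval_expr(node, regs):
    if isinstance(node, ast.Expression):
        return eval_expr(node.body, regs)
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](eval_expr(node.left, regs),
                                  eval_expr(node.right, regs))
    if isinstance(node, ast.Name):
        return regs[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported expression")

CONFIG = {"NtOpenFile": {"handle_ptr": "rcx", "flags": "rdx + 4"}}

regs = {"rcx": 0x7FFE0000, "rdx": 0x10}
for arg, expr in CONFIG["NtOpenFile"].items():
    print(arg, hex(eval_expr(ast.parse(expr, mode="eval"), regs)))
```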
99-120
Abstract
An analysis of models and methods for evaluating the reliability of hardware and software is carried out. The basic concepts of reliability and safety methods for such systems, and the situations leading to errors, defects, and failures, are defined, and a definition of the reliability and safety of technical and software systems is given. Reliability models are classified into predictive, measurement, and evaluation types, and the evaluation models most used in practice are described. The software life cycle standard (ISO 15288:2002), focused on the development and control of system components for errors starting from the system requirements, is considered. The results of applying reliability models (Musa, Goel-Okumoto, etc.) to small, medium, and large projects are presented, and their comparative assessment is given. The technological module (TM) for reliability evaluation of complex software systems VPK (1989) is described. The quality model of the ISO 9126 (1-4):2002-2004 standard, with indicators of functionality, reliability, efficiency, etc., which are used in determining the maturity and certification of a product, is presented.
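For reference, the Goel-Okumoto model mentioned in the survey predicts the expected number of failures detected by time t as m(t) = a(1 - e^(-bt)), where a is the expected total number of faults and b is the fault detection rate. A small sketch with made-up parameter values:

```python
# Goel-Okumoto NHPP reliability growth model: expected cumulative failures.
import math

def goel_okumoto(t, a, b):
    return a * (1.0 - math.exp(-b * t))

a, b = 120.0, 0.05  # hypothetical fitted parameters
for t in (10, 50, 100):
    print(f"t={t:3}: expected failures = {goel_okumoto(t, a, b):6.1f}")
```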
121-134
Abstract
Device emulation is a common necessity that arises at various steps of the development cycle, hardware migration, or reverse engineering. While implementing the algorithms behind the device may be a nontrivial task by itself, connecting the emulator to an existing environment, such as drivers intended to work with the actual hardware, may be no less complex. Devices relying on memory-mapped input/output are of particular interest because, unlike port-mapped input/output, there is much less chance that the target platform provides a direct interface to intercept the transmissions. A well-known approach used in various virtual machine software is to put the entire operating system under a hypervisor and build the emulator externally. This may not be desirable for reasons such as hypervisor complexity, performance loss, and additional requirements on the host hardware. In this paper we extend this approach to the kernel and explain how the emulator may be built on top of the existing interfaces provided by an operating system. Given the common availability of an MMU and of memory protection mechanisms that allow handling page or segment traps on read or write access, we presume that the suggested technique of intercepting memory-mapped input/output could be implemented on a broad number of target platforms. To illustrate the specifics and show potential issues, we describe ways to simplify the implementation and optimize its speed depending on the target capabilities, the protocol emulated, and the project requirements. As a working proof we created an SMC emulator for an x86 target that makes use of this approach.
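A sketch of the dispatcher behind trapped MMIO, under the assumption that the platform-specific part (protecting the page and delivering the fault) is already in place: the handler decodes the offset within the MMIO window and routes the access to a device model. The register offsets and device below are invented for illustration:

```python
# Route a trapped access at fault_addr to a toy device model.
class SMCModel:
    def __init__(self):
        self.regs = {0x00: 0x1, 0x04: 0x0}  # e.g. status, command

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value

MMIO_BASE, MMIO_SIZE = 0xFED00000, 0x1000

def on_trap(device, fault_addr, is_write, value=None):
    assert MMIO_BASE <= fault_addr < MMIO_BASE + MMIO_SIZE
    offset = fault_addr - MMIO_BASE
    if is_write:
        device.write(offset, value)
        return None
    return device.read(offset)

dev = SMCModel()
print(hex(on_trap(dev, 0xFED00000, is_write=False)))  # read status -> 0x1
```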
135-148
Abstract
Modern real-time operating systems are complex embedded products made by many vendors: the OS vendor, the board support package vendor, device driver developers, etc. These operating systems are designed to run on different hardware, which often has limited memory. An embedded OS contains many features and drivers to support different hardware, but most of the drivers are not needed for correct OS execution on a specific board, so the OS is statically configured to select the drivers and features for each board. Modularity of the OS simplifies both configuration and development: splitting the OS into isolated modules with well-specified interfaces reduces the developers' need to interact during joint development, while the configurator can easily compose isolated components without involving the component developers. We use formal models to specify the components and their composition; a formal model describes the behavior of the components and their interaction. Using formal models has many benefits. The models contain enough information to generate source code in the C language. Our models are executable, which allows the configurator to quickly verify the correctness of component configurations. Moreover, a model contains constraints on its parameters that express internal consistency or external properties; these constraints are translated into asserts in the generated source code, so we can check them both during model simulation and during source code execution. This paper presents our approach to describing such models in the Scala language. We successfully tested the approach in the JetOS RTOS.
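A sketch of the modeling idea (the paper uses Scala; Python is used here only to keep one language across examples, and the component and its constraints are invented): a component declares parameters with constraints that can be checked during model simulation and, after code generation, as asserts.

```python
# A component model whose parameter constraints are checkable at simulation
# time; a generator could emit the same predicates as C asserts.
class Component:
    def __init__(self, name, params, constraints):
        self.name, self.params = name, params
        self.constraints = constraints  # list of (description, predicate)

    def check(self):
        for description, predicate in self.constraints:
            assert predicate(self.params), f"{self.name}: {description}"

uart = Component(
    "uart_driver",
    {"baud_rate": 115200, "fifo_depth": 16},
    [("baud rate must be standard",
      lambda p: p["baud_rate"] in (9600, 19200, 115200)),
     ("fifo depth must be a power of two",
      lambda p: p["fifo_depth"] & (p["fifo_depth"] - 1) == 0)],
)
uart.check()  # raises AssertionError on an invalid configuration
```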
149-164
Abstract
Field of study: blockchain technology, decentralized autonomous organizations, smart contracts, and their resistance to attacks and failures. Theoretical and practical significance: because this form of organization is experimental, participants often face attacks on the organization, the consequences of incorrectly written rules, and fraud. The task of creating decentralized autonomous organizations that are resistant to failures and attacks, and research into the causes of such problems, has become relevant for software developers and architects. Goals and objectives of the work: investigation of attack algorithms and development of methods for ensuring the resilience of decentralized autonomous organizations to attacks, based on the analysis of subprocesses, boundary events, and logs using Process Mining methods. The methods to be developed should promptly identify and prevent inconsistencies between the intended and actual behavior of smart contracts that lead to operational errors such as spam contracts, empty transactions, increased block processing time, etc.
165-182
Abstract
A worthy cryptographic protocol specification has to be human-readable (declarative and concise), executable, and formally verified in a sound model. Keeping these requirements in mind, we present a protocol message definition notation named CMN.1, which is based on an abstraction called the cryptographic stack machine. The paper presents the syntax and semantics of CMN.1 and the principles of implementation of a CMN.1-based executable protocol specification language. The core language library (the engine) performs all the message processing, whereas a specification should only provide the declarative definitions of the messages. If an outgoing message must be formed, the engine takes the CMN.1 definition as input and produces binary data consistent with it. When an incoming message is received, the engine verifies the binary data against the given CMN.1 definition, memorizing all the information needed for further actions. The verification is complete: the engine decrypts the ciphertexts, checks the message authentication codes and signatures, etc. Currently, the author's proof-of-concept implementation of the language (embedded in Haskell) can translate CMN.1-based specifications both to interoperable implementations and to programs for the ProVerif protocol analyzer. Excerpts from the CMN.1-based TLS protocol specification and the corresponding automatically generated ProVerif program are provided as an illustration.
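An illustration of the declarative principle behind CMN.1 (not its actual syntax, and omitting the cryptographic operations): the same message definition drives both the construction of outgoing binary data and the verification of incoming data.

```python
# One declarative definition, two directions: build and verify.
import struct

# A message definition: list of (field name, struct format) pairs.
HELLO = [("version", "H"), ("nonce", "I")]

def build(definition, values):
    fmt = ">" + "".join(f for _, f in definition)
    return struct.pack(fmt, *(values[name] for name, _ in definition))

def verify(definition, data):
    fmt = ">" + "".join(f for _, f in definition)
    fields = struct.unpack(fmt, data)  # raises on malformed input
    return dict(zip((name for name, _ in definition), fields))

wire = build(HELLO, {"version": 0x0303, "nonce": 42})
print(verify(HELLO, wire))  # {'version': 771, 'nonce': 42}
```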
183-194
Abstract
The paper proposes approaches to the functional verification of microprocessor communication controllers based on developing layered UVM (Universal Verification Methodology) test systems. Modern microprocessor systems contain many controllers operating with their own data types. Communication controllers support the transfer and transformation of data between microprocessor units; such transformation must be carried out quickly and without data corruption for the whole system to function correctly. Communication controllers can also carry additional functions, such as transmitting copies of system register values, address translation, and others. The paper gives a brief overview of verification tools and of the benefits of applying standalone simulation-based verification to checking the correctness of communication subsystems. We present approaches to constructing a standalone UVM-based verification environment with the checking module implemented in an external functional reference model. We also propose techniques for checking the correctness of communication subsystems: verifying multiple-clock controllers using a parametrized clock generator and supporting credit exchange mechanisms. The presented approaches were used to verify the communication subsystem (Host-Bridge) of an eight-core SPARC V9 microprocessor developed by MCST. The difficulties discovered while developing the test system and their resolutions are described, and the results of using the presented solutions for verifying communication subsystem controllers, along with further plans for enhancing the test system, are considered.
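A minimal sketch of the parametrized clock generator idea (in Python only for uniformity of examples; a real UVM generator would be SystemVerilog, and all parameters below are invented): each clock domain gets its own period and phase so that clock-domain-crossing logic is exercised at varied ratios.

```python
# Generate interleaved rising edges for two clock domains with a
# parameterized, non-integer period ratio.
def clock_edges(period, phase=0.0, count=5, name="clk"):
    return [(phase + i * period, name) for i in range(count)]

events = sorted(clock_edges(2.0, name="clk_core") +
                clock_edges(3.0, 0.5, name="clk_io"))
for time, name in events:
    print(f"{time:5.1f}: rising edge of {name}")
```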
195-206
Abstract
This article presents an approach used at MCST to verify communication controllers developed for Systems on Chip (SoC). We provide a list of communication controllers developed at MCST and present their characteristics. We describe the principles of communication controller operation on the transaction, data link, and physical layers and highlight their similarities. Then we describe a common method of device verification: the principles of test system design, constrained-random test stimuli generation, and checking of device behavior. Based on the common features of the controllers, we provide a general design for their test system. It includes components to work with the transaction-level interface (the system agent of the on-chip communication protocol) and the physical interface (the physical agent of the protocol for SoC communication on a single board), a configuration agent that determines the device's mode of operation, and a scoreboard. Because the controllers only transform transactions between different representations, the scoreboard checks the correspondence between incoming and outgoing transactions. In addition, we describe specific device features that require adjustments to the common approach, and how verifying those features affected the design of the different test systems. We explain how replacing a physical agent with a second communication controller speeds up the development of test systems, discuss the challenges of link training and status state machine (LTSSM) verification, and provide a way to work with direct memory access (DMA) devices in a system agent. In conclusion, we present a list of found errors and directions for further research.
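A sketch of the scoreboard principle for a transforming controller, under the assumption of an in-order device (the reference model and transaction shapes are invented; a real implementation would be a SystemVerilog UVM scoreboard): expected outputs are derived from observed inputs and matched against actual outputs.

```python
# Scoreboard: expected = reference_model(input); compare with actual output.
from collections import deque

class Scoreboard:
    def __init__(self, reference_model):
        self.expected = deque()
        self.reference_model = reference_model

    def on_input(self, transaction):
        self.expected.append(self.reference_model(transaction))

    def on_output(self, transaction):
        want = self.expected.popleft()
        assert transaction == want, f"mismatch: {transaction} != {want}"

# Hypothetical model: the controller repacks (addr, data) into a flit dict.
sb = Scoreboard(lambda t: {"addr": t[0], "payload": t[1]})
sb.on_input((0x100, b"\xab"))
sb.on_output({"addr": 0x100, "payload": b"\xab"})  # passes
```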
207-220
Abstract
The developed system, AAUI, is aimed at the automated adaptation of interfaces to the needs of users. The system provides pseudo-identification of users, building a database of anonymous users and rules based on their activity in web applications. The distinctive feature of the system is its model for identifying anonymous users of end products, with the subsequent use of a dynamic identifier to automatically adapt the interface to the identified user's needs. An analysis of systems aimed at collecting information about users leads to the conclusion that such software only collects profile information about incoming users and does not provide automated adaptation of interfaces to their needs; no similar software is openly available. Prospects for the further development of AAUI include increasing the number of identification markers to improve the reliability of user identification, integration with content management systems, and optimization of data management.
221-232
Abstract
This article describes routing problems and shows that almost all of them can be transformed into one another. An example of the Mixed Chinese Postman Problem is discussed. The article gives an overview of the various variants of the Chinese Postman Problem; for each problem, a mathematical formulation is given, along with useful real-life applications. The article then provides a table of possible Chinese Postman problems and identifies parameters that can be varied to obtain new problems. Five such parameters have been identified: the presence of a set of edges; the presence of a set of arcs; the presence of edges whose cost depends on the direction of traversal; the presence of a set of required edges; and the presence of a set of required arcs. It is shown that by varying these parameters one can obtain problems that have not been described earlier but arise in real life; four new problems were identified. It is then shown that the Chinese Postman Problem can be solved as other routing problems through graph transformations, and a method for transforming the Chinese Postman Problem into the Generalized Travelling Salesman Problem is given. Finally, the results of solving the above problem with simple algorithms are presented, and their effectiveness is shown. The research is not over yet; testing of other algorithms is planned.
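For context, a sketch of the classical solution of the undirected Chinese Postman Problem (make the graph Eulerian by duplicating cheapest paths between odd-degree vertices, then walk an Eulerian circuit). The paper's transformation to the Generalized TSP is a different, more general route; this sketch assumes the networkx library and a small made-up instance:

```python
# Classical undirected CPP: matching on odd-degree vertices + Euler circuit.
import itertools
import networkx as nx

def chinese_postman(G):
    odd = [v for v in G if G.degree(v) % 2 == 1]
    # Complete graph on odd vertices, weighted by shortest-path distance.
    K = nx.Graph()
    for u, v in itertools.combinations(odd, 2):
        K.add_edge(u, v,
                   weight=nx.shortest_path_length(G, u, v, weight="weight"))
    H = nx.MultiGraph(G)
    for u, v in nx.min_weight_matching(K):
        path = nx.shortest_path(G, u, v, weight="weight")
        H.add_edges_from(zip(path, path[1:]))  # duplicate cheapest u-v path
    return list(nx.eulerian_circuit(H))

G = nx.Graph([(1, 2, {"weight": 1}), (2, 3, {"weight": 1}),
              (3, 1, {"weight": 2}), (3, 4, {"weight": 1})])
print(chinese_postman(G))
```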
233-250
Abstract
The Vehicle Routing Problem (VRP) is one of the most widely known problems in combinatorial optimization. It concerns the optimal design of routes for a fleet of vehicles serving a set of customers. In this study we analyze the Capacitated Vehicle Routing Problem (CVRP), a subcase of VRP where the vehicles have limited capacity. CVRP is mostly aimed at minimizing the global transportation costs. The problem is NP-hard, so heuristic algorithms that provide near-optimal polynomial-time solutions are considered instead of exact ones. The aim of this article is to survey the mathematical formulations of CVRP and the methods for solving each type of this problem. The first part presents general information about the problem and the restrictions of this work. The second part describes the classical mathematical formulations of CVRP. The third part gives a classification of the most popular subcases of CVRP, including descriptions of additional constraints with their mathematical formulations, as well as the most promising methods for solving special types of CVRP. The fourth part contains an important note about the most powerful algorithm, LKH-3. Finally, the last part consists of a table of solving techniques for each subproblem and a scheme of the basic problems of the CVRP class and their interconnections.
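As a concrete taste of the heuristic family such surveys cover, here is a compact Clarke-Wright savings construction for CVRP (simplified: routes are merged only at matching endpoints, without route reversal; the instance data is made up):

```python
# Clarke-Wright savings heuristic: start with one route per customer, then
# merge routes in order of decreasing savings s(i,j) = d(0,i)+d(0,j)-d(i,j).
import itertools

def clarke_wright(dist, demand, capacity):
    """dist: distance matrix with depot = index 0; returns customer routes."""
    n = len(dist)
    routes = {i: [i] for i in range(1, n)}  # route containing customer i
    load = {i: demand[i] for i in range(1, n)}
    savings = sorted(((dist[0][i] + dist[0][j] - dist[i][j], i, j)
                      for i, j in itertools.combinations(range(1, n), 2)),
                     reverse=True)
    for s, i, j in savings:
        ri, rj = routes[i], routes[j]
        # merge only if i, j are route endpoints and capacity allows
        if ri is not rj and ri[-1] == i and rj[0] == j \
                and load[i] + load[j] <= capacity:
            merged, total = ri + rj, load[i] + load[j]
            for c in merged:
                routes[c], load[c] = merged, total
    return {id(r): r for r in routes.values()}.values()

dist = [[0, 4, 4, 6], [4, 0, 2, 6], [4, 2, 0, 5], [6, 6, 5, 0]]
demand = [0, 3, 4, 5]
print(list(clarke_wright(dist, demand, capacity=8)))  # [[1, 2], [3]]
```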
251-270
Abstract
This article presents the results of applying various methods of system analysis (CATWOE, Rich Picture, AHP, Fuzzy AHP) to the evaluation of teaching assistants; both soft and hard methods were applied. The methods are considered using the example of the "Teaching Assistant" program at the Higher School of Economics (HSE). The article shows the process of interaction of teaching assistants with students and faculty in the form of a Rich Picture. The selection and analysis of criteria for evaluating teaching assistants are carried out: three groups of criteria were defined (professional skills, communication skills, personal qualities), each with subcriteria defined in a brainstorming session. A screening method of our own was also devised that allows some assistants to be dropped immediately. In addition, the application of AHP and type-2 Fuzzy AHP to the evaluation of teaching assistants is considered, and the strengths and weaknesses of each method are revealed. It is also shown that, despite the power of system analysis methods, it is necessary to use common sense and logic rather than relying only on the numbers these methods produce. In the course of the work, the best teaching assistant is selected and the group of the best teaching assistants is defined.
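The core AHP step is standard: a pairwise comparison matrix over the criteria is reduced to priority weights via its principal eigenvector, with a consistency ratio guarding against contradictory judgments. A sketch with made-up comparison values for the three criterion groups:

```python
# AHP priority weights from a pairwise comparison matrix, plus the
# consistency ratio CR = ((lambda_max - n) / (n - 1)) / RI (should be < 0.1).
import numpy as np

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}  # random consistency index

def ahp_weights(A):
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    n = A.shape[0]
    ci = (eigvals[k].real - n) / (n - 1)
    return w, ci / RI[n]

# professional skills vs communication skills vs personal qualities
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
weights, cr = ahp_weights(A)
print(weights.round(3), f"CR={cr:.3f}")
```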
271-284
Abstract
Modern information systems manipulate data models containing millions of items, and the tendency is for these models to become even more complex. One of the most crucial aspects of modern concurrent engineering environments is their reliability. The ACID principles (atomicity, consistency, isolation, durability) are aimed at providing it, but following them directly leads to serious performance drawbacks on large-scale models, since the correctness of every performed transaction must be controlled. In the paper, a method for incremental validation of object-oriented data is presented. Assuming that a submitted transaction is applied to originally consistent data, the final data representation is guaranteed to be consistent provided that only the spot rules are satisfied. To identify the data items subject to spot rule validation, a bipartite data-rule dependency graph is formed; to build the dependency graph automatically, we propose applying static analysis to the model specifications. In the case of complex object-oriented models defining hundreds and thousands of data types and semantic rules, static analysis seems to be the only way to realize incremental validation and to make it possible to manage the data in accordance with the ACID principles.
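A minimal sketch of the selection step, with invented types and rules: the bipartite dependency graph maps each data type to the semantic rules depending on it, so a transaction only re-validates rules reachable from the types of the items it touched.

```python
# Rule -> data types it reads (in the paper, built by static analysis of
# the model specifications; the entries here are illustrative).
DEPENDS_ON = {
    "unique_part_id":   {"Part"},
    "assembly_acyclic": {"Assembly", "Link"},
    "link_endpoints":   {"Link"},
}

def rules_to_check(touched_types):
    return [rule for rule, types in DEPENDS_ON.items()
            if types & touched_types]

# A transaction modified only Link objects:
print(rules_to_check({"Link"}))  # ['assembly_acyclic', 'link_endpoints']
```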
285-302
Abstract
In this paper, we present an approach to modeling and simulating multi-agent systems (MAS) using Petri nets. A MAS is modeled as a set of workflow nets, and the agent-to-agent interactions are described by means of an interface: a logical formula over atomic interaction constraints specifying the order of inner agent actions. Our study considers positive and negative interaction rules; in this work, we study interfaces describing acyclic agent interactions. We propose an algorithm for simulating the MAS with respect to a given interface. The algorithm is implemented as a ProM 6 plug-in that generates sets of event logs. We suggest using our approach to evaluate process discovery techniques with respect to the quality of the obtained models, since this research area is on the rise; in particular, it can be used for process discovery algorithms concerned with the internal agent interactions of a MAS.
303-324
Abstract
Sequential reactive systems represent programs that interact with the environment by receiving signals or requests and react to these requests by performing operations on data. Such systems model various software, like computer drivers, real-time systems, control procedures, online protocols, etc. In this paper, we study the verification problem for programs of this kind. We use finite state transducers over semigroups as formal models of reactive systems and introduce a new specification language, LP-CTL*, to describe the behavior of transducers. This language is based on the well-known temporal logic CTL* and has two distinguishing features: 1) each temporal operator is parameterized with a regular expression over the input alphabet of the transducer, and 2) each atomic proposition is specified by a regular expression over the output alphabet of the transducer. We develop a tabular algorithm for model checking finite state transducers over semigroups against LP-CTL* formulae, prove its correctness, and estimate its complexity. We also consider a special fragment of the LP-CTL* language, where temporal operators are parameterized with regular expressions over a one-letter alphabet, and show that this fragment may be used to specify usual Kripke structures, while being more expressive than usual CTL*.
325-340
Abstract
Finite State Machines (FSMs) are widely used as formal models for solving numerous tasks in software engineering, VLSI design, the development of telecommunication systems, etc. To describe the behavior of a real-time system, one can supply the FSM model with clocks: continuous time parameters with real values. In a Timed FSM (TFSM), inputs and outputs have timestamps, and each transition is equipped with a timed guard and an output delay to indicate the time interval when the transition is active and how much time it takes to produce an output. A variety of algorithms for equivalence checking, minimization, and test generation have been developed for TFSMs in many papers. A distinguishing feature of the TFSMs studied in these papers is that the order in which output letters occur in an output timed word does not depend on their timestamps. We think that such behavior of a TFSM is not realistic from the point of view of an outside observer. In this paper we consider a more advanced and adequate TFSM semantics: in our model, the order in which outputs become visible to an outsider is determined not only by the order of inputs, but also by the delays required for their processing. When the same sequence of transitions is performed by a TFSM modified in such a way, the same outputs may follow in a different order depending on the times when the corresponding inputs become available to the machine. A TFSM is called strictly deterministic if every input timed word activates no more than one sequence of transitions (trace), and for any input timed word which activates this trace the letters in the output words always follow in the same order (but, maybe, with different timestamps). We study the problem of checking whether the behavior of this improved TFSM model is strictly deterministic. To this end, we show how to verify whether an arbitrary given trace in a TFSM is steady, i.e. preserves the same order of output letters for every input timed word that activates it. Further, having the criterion of trace steadiness, we develop an exhaustive algorithm for checking the strict determinacy of TFSMs. Exhaustive search in this case can hardly be avoided: we prove that the determinacy checking problem for our TFSM model is co-NP-hard.
341-362
Abstract
Multi-agent social systems (MASS) are, in general, systems of autonomous interdependent agents, each pursuing its own goals while interacting with other agents and the environment. The dynamics of MASS cannot be adequately modeled by methods borrowed from statistical physics, since these methods do not reflect the main feature of social systems, viz. their ability to perceive, process, and use external information. This important quality of distributed ("swarm") intelligence has to be taken into account directly in a correct theoretical description of social systems. However, discussion of distributed intelligence (DI) in the literature is mostly restricted to distributed tasks, information exchange, and aggregated judgment, i.e. to the 'sum' or 'average' of independent intellectual activities. This approach ignores the empirically well-known phenomenon of a 'collective insight' in a group as a special manifestation of the MASS's DI. In this paper, the state of the art in modeling social systems and in studies of intelligence per se is briefly characterized, and a new modular model of intelligence is suggested. The model makes it possible to reproduce the most important result of intellectual activity, i.e. the creation of new information, which is not reflected in contemporary theoretical schemes (e.g. neural networks). Using the modular approach, a correspondence between individual intelligence and the DI of MASS is discussed, and prospective directions for future studies are suggested. The efficiency of DI was estimated numerically by computer simulations of a simple system of agents with variable kinematic parameters {ki} moving through a pathway with obstacles. Selection of fast agents with 'positive mutation' of parameters gives ca. 20% reduction of the average passing time after 200-300 cycles and creates a swarm movement in which agents follow a leader and cooperatively avoid obstacles.
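A minimal version of the selection experiment described above, with all numbers and the noise model invented for illustration (the paper's setup with obstacles and leader-following is richer): agents with a speed-like parameter pass a track, and the fastest half reproduces with small "positive mutations".

```python
# Selection of fast agents with positive parameter mutation over many cycles.
import random

random.seed(1)
agents = [{"k": random.uniform(0.5, 1.5)} for _ in range(50)]

def passing_time(agent):
    # obstacles modeled as additive noise on the nominal time 100/k
    return 100.0 / agent["k"] + random.gauss(0, 2)

for cycle in range(250):
    agents.sort(key=passing_time)
    survivors = agents[:25]                      # keep the fast half
    agents = survivors + [{"k": a["k"] * (1 + abs(random.gauss(0, 0.01)))}
                          for a in survivors]    # "positive mutation"

mean_time = sum(100.0 / a["k"] for a in agents) / len(agents)
print(f"average passing time after selection: {mean_time:.1f}")
```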


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)