
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 29, No 4 (2017)
7-20
Abstract
The elegant syntax of the Ruby language comes at a price when it comes to finding bugs in large codebases. Static analysis is hindered by specific capabilities of Ruby, such as defining methods dynamically and evaluating string expressions. Even in dynamically typed languages, type information is very useful, as it ensures better type safety and more reliable checking of whether a called method is defined for the object or whether arguments of the correct types are passed to it. One may annotate the code with YARD (a Ruby documentation tool) to declare the input and output types of methods, or even to declare methods that are added dynamically. These annotations improve the capabilities of tooling such as code completion. This paper reports a new approach to generating type annotations. We trace direct method calls while the program is running, evaluate the types of input and output variables, and use this information to derive implicit type contracts. Each method or function is associated with a finite-state automaton consisting of all variants of typed signatures for this method. An effective compression technique is applied to the automaton, reducing storage costs and allowing the collected information to be displayed in a human-readable form. The exhaustiveness of the contract defined by the generated automaton depends on the diversity of the traced method usages. Therefore, it is also important to be able to merge all the automata received from users into one, which is further covered in this paper.
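The tracing idea can be pictured in a few lines. The sketch below (Python rather than the paper's Ruby, with hypothetical names; not the authors' implementation) wraps a function, records the argument and return types observed at every direct call, and accumulates them as a set of typed signatures from which a contract automaton could be built:

    import functools
    from collections import defaultdict

    # Observed typed signatures per function: name -> {(arg_types, return_type)}
    SIGNATURES = defaultdict(set)

    def trace_types(func):
        """Record the (argument types -> return type) contract of each call."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            arg_types = tuple(type(a).__name__ for a in args)
            SIGNATURES[func.__name__].add((arg_types, type(result).__name__))
            return result
        return wrapper

    @trace_types
    def repeat(text, times):
        return text * times

    repeat("ab", 3)
    print(SIGNATURES)  # {'repeat': {(('str', 'int'), 'str')}}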
21-38
Abstract
Process mining offers various tools for studying process-aware information systems. Such systems typically involve several participants (or agents) managing and executing operations on the basis of process models. To reveal the actual behavior of agents, we can use process discovery. However, for large-scale processes it does not yield models that help one understand how agents interact, since the agents are independent and their concurrent execution can lead to very sophisticated behavior. To overcome this problem, we propose interface patterns, which make it possible to obtain models of multi-agent processes with clearly identified agent behavior and an interaction scheme. The correctness of the patterns is ensured via morphisms. We also conduct a preliminary experiment whose results are highly competitive with process discovery without interface patterns.
39-54
Abstract
The verification of many practical systems - in particular, embedded systems - involves processes executing over time, for which it is common to use models based on temporal logic, in either its linear-time (LTL) or branching-time (CTL) variant. Some of today’s most advanced automatic program verifiers, however, rely on non-temporal theories, particularly Hoare-style logic. Can we still take advantage of this sophisticated verification technology for such more challenging systems? As a step towards a positive answer, we have defined a translation scheme from temporal specifications to contract-equipped object-oriented programs, expressed in Eiffel and hence open to processing by the AutoProof program prover. We have applied this scheme to a published CTL model of a widely used realistic example, the “landing gear” system, which has been the subject of numerous competing specifications. An attempt to verify the result in AutoProof failed to prove one temporal property, which on further inspection seemed to be wrong in the original published model, even though the published work claimed to have verified an Abstract State Machine implementation of that model. Correcting the CTL specification to reflect the apparent informal intent, re-translating to contracted Eiffel and re-running the verification leads to success. The temporal-logic-to-contracted-Eiffel process is still ad hoc, and tailored to generate the kind of scheme that the target verification tool (AutoProof) handles best, rather than the simplest or most elegant scheme. Even with this limitation, the results highlight the need for rigor in the verification process and, on the positive side, demonstrate that the highly advanced mechanized proof technology developed over several decades for the verification of traditional programs also has the potential to handle the demanding needs of embedded systems and other demanding contemporary developments.
55-72
Abstract
Gaussian convolution and its discrete analogue, the Gauss transform, have many applications in science and engineering, such as mathematical statistics, thermodynamics and machine learning, and are widely applied to computer vision and image processing tasks. Due to their computational expense (quadratic and exponential complexity with respect to the number of points and the dimensionality, respectively) and the rapid spread of high-quality data (bit depth/dynamic range), accurate approximation has become more important in practice than conventional fast methods, such as recursive or box-kernel methods. In this paper, we propose a novel approximation method for fast Gaussian convolution of two-dimensional uniform point sets, such as 2D images. Our method employs the L1 distance metric for the Gaussian function and a domain-splitting approach to achieve fast computation (linear computational complexity) while preserving high accuracy. Our numerical experiments show advantages over conventional methods in terms of speed and precision. We also introduce a novel and effective joint image filtering approach based on the proposed method, and demonstrate its capability on edge-aware smoothing and detail enhancement. The experiments show that filters based on the proposed L1 Gauss transform give higher-quality results and are faster than the original filters, which use a box kernel to approximate Gaussian convolution.
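The linear-time behaviour can be made concrete for the one-dimensional case: the L1 kernel exp(-|i-j|/s) admits a classical two-pass recursive evaluation, and the 2D L1 kernel is separable, so rows and columns can be processed independently. The following Python sketch illustrates that textbook recursion, not the authors' domain-splitting method:

    import math

    def l1_gauss_1d(x, s):
        """y[i] = sum_j exp(-|i-j|/s) * x[j], in O(n) via two recursive passes."""
        a = math.exp(-1.0 / s)
        n = len(x)
        fwd = [0.0] * n  # causal pass:     sum over j <= i of a**(i-j) * x[j]
        bwd = [0.0] * n  # anticausal pass: sum over j >= i of a**(j-i) * x[j]
        acc = 0.0
        for i in range(n):
            acc = x[i] + a * acc
            fwd[i] = acc
        acc = 0.0
        for i in reversed(range(n)):
            acc = x[i] + a * acc
            bwd[i] = acc
        # both passes include x[i] itself, so subtract one copy
        return [f + b - xi for f, b, xi in zip(fwd, bwd, x)]

    print(l1_gauss_1d([0, 0, 1, 0, 0], s=1.0))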
73-86
Abstract
This article describes our ongoing research on real-time digital video stabilization using MEMS sensors. The authors propose to use the described method for stabilizing video transmitted to a mobile robot operator who controls the vehicle remotely, as well as for increasing the precision of video-based navigation for subminiature autonomous models. The article describes the general mathematical models needed to implement a video stabilization module based on MEMS sensor readings: the camera motion model, the frame transformation model and the rolling-shutter model. Existing approaches to stabilization using sensor data were analyzed and assessed from the point of view of real-time application. The article considers the main problems that came up during the experiments and were not resolved in previous research papers, including calibration of the camera and sensors, synchronization of the camera and sensors, and increasing the accuracy of determining the camera position from sensor data. The authors offer possible solutions to these problems that help improve the quality of existing algorithms, such as a system for parallel synchronized recording of video and sensor data based on the Android operating system. As the main result, the authors present a framework for implementing video stabilization algorithms based on MEMS sensor readings.
87-106
Abstract
In the last few years there has been growing interest in route-building mobile applications that offer navigation and timely arrival notifications. Despite the large body of existing knowledge on navigation services, an important issue remains with positioning accuracy. The paper discusses a possible solution to the comparison problem, which amounts to determining closeness to the destination metro station from the difference between the user’s current coordinates and fixed-point coordinates. To this end, a fuzzy logic approach is used to develop a Routes Recommender System (RRS) that utilizes linguistic variables to express the vague and uncertain term ‘closeness to…’. The paper provides a detailed explanation of each variable considered in the fuzzy inference system (FIS) and the set of fuzzy rules, along with a graphical representation of the system’s output. Based on the Mamdani model, we propose a set of test cases to check the maintainability of the model and describe the obtained results. Later, an Android-based mobile application for public transport route building will be developed, with a notification system based on the implementation of the presented model. It should be emphasized that the paper examines the potential of the modeling approach based on interval type-2 fuzzy sets (IT2FS), which attract much attention these days in various research studies, and of the conventional Mamdani fuzzy inference system (MFIS), as applied to a real and rather topical problem. Such models may be in high demand for appropriately representing factors that are inherently vague and uncertain. Hence, this study may also contribute to future research on similar topics.
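To make the fuzzification of ‘closeness to…’ tangible, here is a toy Python sketch with triangular membership functions; the linguistic terms and distance thresholds are invented for illustration and are not the RRS design from the paper:

    def tri(x, a, b, c):
        """Triangular membership function rising from a, peaking at b, falling to c."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def closeness(distance_m):
        """Degrees of membership of a distance (metres) in hypothetical terms."""
        return {
            "close": tri(distance_m, -1, 0, 400),
            "near":  tri(distance_m, 200, 600, 1000),
            "far":   tri(distance_m, 800, 1500, 10000),
        }

    print(closeness(300))  # {'close': 0.25, 'near': 0.25, 'far': 0.0}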
107-122
Abstract
Routing problems are important in the logistics and transport sphere. Basically, routing problems are related to determining an optimal set of routes in a multigraph. The Chinese postman problem (CPP) is a special case of the routing problem with many potential applications. We propose to solve the MCPP (a special NP-hard case of the CPP defined on a mixed multigraph) by reducing the original problem to the Generalized Travelling Salesman Problem (GTSP). The variants of the CPP are pointed out, and mathematical formulations of some of the problems are presented. An algorithm for reducing the MCPP on a multigraph to the GTSP is shown, and experimental results of solving the MCPP through this reduction are presented.
123-138
Abstract
The Metric Travelling Salesman Problem is a subcase of the Travelling Salesman Problem (TSP) in which the triangle inequality holds. It is a key problem in combinatorial optimization. Solutions of the Metric TSP are generally used for cost minimization tasks in logistics, manufacturing, genetics and other fields. Since this problem is NP-hard, heuristic algorithms providing near-optimal solutions in polynomial time are considered instead of exact ones. The aim of this article is to experimentally find Pareto-optimal heuristics for the Metric TSP under the criteria of error rate and run-time efficiency. Two real-life kinds of inputs are compared: VLSI data sets based on very-large-scale integration schemes, and National TSPs that use geographic coordinates of cities. This paper provides an overview and prior estimates of seventeen heuristic algorithms implemented in C++ and tested on both data sets. The details of the research methodology are provided and the computational scenario is presented. In the course of the computational experiments, comparative figures are obtained, and multi-objective optimization is performed on their basis. Overall, the group of Pareto-optimal algorithms for different N consists of some of the MC, SC, NN, DENN, CI, GRD, CI + 2-Opt, GRD + 2-Opt, CHR and LKH heuristics.
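For a flavour of the simplest heuristics in the comparison, the NN (nearest neighbour) tour construction can be sketched in a few lines; the paper's implementations are in C++, so this Python version is only illustrative:

    import math

    def nearest_neighbour(points, start=0):
        """Greedy NN tour: repeatedly move to the closest unvisited city."""
        unvisited = set(range(len(points))) - {start}
        tour = [start]
        while unvisited:
            last = points[tour[-1]]
            nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
            tour.append(nxt)
            unvisited.remove(nxt)
        return tour

    cities = [(0, 0), (3, 0), (3, 4), (0, 4)]
    print(nearest_neighbour(cities))  # [0, 1, 2, 3]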
139-154
Abstract
Finite State Machines (FSMs) are widely used for the analysis and synthesis of components of control systems. In order to take time aspects into account, timed FSMs are considered. As the complexity of many problems of analysis and synthesis of digital and hybrid systems, including high-quality test derivation, significantly depends on the size of the system (component) specification, in this paper we address the problem of minimizing an FSM with timed guards and input and output timeouts (TFSM). The behavior of a TFSM can be described using a corresponding FSM abstraction, and the proposed approach to minimizing a TFSM is based on this abstraction. We minimize not only the number of states, as is done for classical FSMs, but also the number of timed guards and the timeout durations. We show that for a complete deterministic TFSM there exists a unique minimal (canonical) form, i.e., a unique time- and state-reduced TFSM that has the same behavior as the given TFSM; this minimal form can be used, for example, when deriving tests for checking whether an implementation under test satisfies functional and nonfunctional requirements. The proposed approach to minimizing timed machines can be applied to particular cases of the TFSM, i.e., to FSMs with timeouts and FSMs with timed guards.
155-174
Abstract
In this paper we consider a method for mining so-called “hybrid” UML models, which belongs to software process mining. Models are built from execution traces of information systems with a service-oriented architecture (SOA), given in the form of event logs. While common reverse engineering techniques usually require the source code, which is often unavailable, our approach deals with event logs, which are produced by many information systems, and some heuristic parameters. Since an individual type of UML diagram shows only one perspective of a system’s model, we propose to mine a combination of various types of UML diagrams (namely, sequence and activity diagrams), which are considered together with communication diagrams. This allows us to increase the expressive power of the individual diagrams. Each type of diagram corresponds to one of three levels of abstraction (workflow, interaction and operation) that are commonly used when considering web-service interaction. The proposed algorithm consists of four tasks, including splitting an event log into several parts and building UML sequence, activity and communication diagrams. We also propose to encapsulate insignificant or low-level implementation details (such as internal service operations) into activity diagrams and to connect them with a more general sequence diagram using interaction use semantics. To cope with the immense size of synthesized UML sequence diagrams, we propose an abstraction technique based on regular expressions. The approach is evaluated using a software tool developed as a Windows application in C#. It produces UML models in the form of XML files, which are compatible with the well-known Sparx Enterprise Architect and can be further visualized and utilized by that tool.
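The regular-expression abstraction can be pictured as collapsing runs of repeated interactions in a trace into a compact regex-like notation. A simplified Python sketch of that idea (not the paper's exact algorithm):

    from itertools import groupby

    def abstract_trace(events):
        """Collapse runs of identical events into (event){count} notation."""
        runs = ((key, len(list(grp))) for key, grp in groupby(events))
        return [e if n == 1 else "(%s){%d}" % (e, n) for e, n in runs]

    trace = ["login", "query", "query", "query", "logout"]
    print(abstract_trace(trace))  # ['login', '(query){3}', 'logout']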
175-190
Abstract
Well-structured transition systems (WSTSs) have become a well-known tool in the study of concurrent systems for proving the decidability of properties based on coverability and boundedness. Each year brings new formalisms proven to be WSTSs. Despite the large body of theoretical work on WSTS theory, there has been a notable gap in empirical research on well-structured transition systems. In this paper, a tool for the behavioural analysis of such systems is presented. We suggest an extension of the SETL language for describing WSTSs (WSTSL). It makes the description of new formalisms very close to the formal definition; therefore, it is easy to introduce and modify new formalisms as well as to analyse behavioural properties without much programming effort. This is highly convenient when a new formalism is still under active development. The two most studied algorithms for analysing the behaviour of well-structured transition systems (backward reachability and Finite Reachability Tree analysis) have been implemented, and their performance was measured through runs on such models as Petri nets and lossy channel systems. The developed tool can be useful for incorporating and testing analysis methods for formalisms that turn out to be well-structured transition systems.
191-202
Abstract
In this paper we consider Markov analysis of models of complex software and hardware systems. A Markov analysis tool can be used during the verification of models of avionics systems. In the introduction we enumerate the main advantages and disadvantages of Markov analysis. For example, with Markov analysis, unlike other approaches such as fault tree analysis and dependency diagram analysis, it is possible to analyze models of systems that are able to recover. The main drawback of the approach is the exponential growth of model size with the number of components in the analyzed system, which makes Markov analysis rarely used in practice. Another important problem is to develop a new algorithm for translating a model of a system into a model suitable for Markov analysis (a Markov chain), since existing solutions have significant limitations on the architecture of the analyzed systems. Next we briefly describe the context - the AADL modeling language with the Error Model Annex library and the MASIW framework - and explain the Markov analysis method. In the main section we suggest an algorithm for translating a system model into a Markov chain that partially solves the problem of the chain's exponential growth. A description of further steps follows, together with some heuristics that drastically reduce the running time of the algorithm. In this paper we also consider other Markov analysis tools and their features. As a result, we suggest a Markov analysis tool that can be effectively used in practice.
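The recoverability that sets Markov analysis apart from fault trees is easy to see on the smallest example: a repairable component is a two-state chain with failure rate λ and repair rate μ, whose time-dependent availability has a closed form. This textbook Python sketch (not the MASIW translation algorithm) computes it:

    import math

    def availability(lam, mu, t):
        """P(up at time t) for a 2-state repairable chain started in the up state:
        A(t) = mu/(lam+mu) + (lam/(lam+mu)) * exp(-(lam+mu)*t)."""
        s = lam + mu
        return mu / s + (lam / s) * math.exp(-s * t)

    print(availability(lam=1e-4, mu=1e-2, t=1000.0))  # ~0.9901, the steady state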
203-216
Abstract
Software verification is a type of activity focused on software quality control and the detection of errors in software. Static verification is verification without executing the software source code. Special software - tools for static verification - often works with a program's source code. One of the tools that can be used for static verification is CPAchecker. The problem with the memory model currently used by the tool is that if a function returning a pointer to the program's memory lacks a body, arbitrary assumptions can be made about this function's return value in the process of verification. Although possible, these assumptions are often practically very improbable, and their usage may lead to false alarms. In this paper we give an overview of an approach capable of resolving this issue, along with its formal specification in terms of path formulas based on the uninterpreted functions used by the tool for memory modeling. We also present the results of benchmarking the corresponding implementation against the existing memory model.
217-230
Abstract
The Linux kernel is often used as a real-world case study to demonstrate novel software product line engineering research methods; it is one of the most sophisticated programs today. To make building Linux product line variants as safe as possible, it is necessary to analyse the Kconfig file as well as the source code, which contain tens of thousands of variable statements and options - a lot even by the standards of modern software development. Verification researchers have offered many solutions to this problem, but standard code verification procedures are not applicable here because of execution time and the need to cover all configurations. We propose to check the operating system with a special wrapper for tools that analyse built code and the configuration file, coupled with a coverage metric. Such a bundle provides an efficient tool for calculating all valid configurations for a predetermined set of code and Kconfig files. The metric can be used to improve existing analysis tools as well as to decide on the right configuration. Our main goal is to contribute to a better understanding of possible defects and to offer a fast and safe solution that improves the validity of evaluations based on Linux. The solution is described as a program with instructions for implementing its internal architecture.
231-246
Abstract
This paper introduces a technique for scalable functional verification of cache coherence protocols based on a verification method previously developed by the author. Scalability means that the verification effort does not depend on the model size (that is, on the number of processors in the system under verification). The article presents an approach to developing formal Promela models of cache coherence protocols and shows examples taken from the Elbrus-4C protocol model. The resulting formal models consist of language constructs that directly reflect the way protocol designers describe their developments. The paper describes the development of a tool, written in C++ with the Boost.Spirit library as a parser generator, that automatically performs the syntactical transformations of Promela models required by the verification method. A procedure for refining the transformed models is presented; it is intended to eliminate spurious error messages. Finally, the overall verification technique is described. The technique has been successfully applied to the verification of the MOSI protocol implemented in the Elbrus computer systems. Experimental results show that the computer memory requirements for parameterized verification are negligible and the amount of manual work needed is acceptable.
247-256
Abstract
Hardware testing is a process aimed at detecting manufacturing faults in integrated circuits. To measure test quality, two main metrics are in use: fault detection ability (fault coverage) and test application time (test length). Many algorithms have been suggested for test generation; however, no scalable solution exists. In this paper, we analyze the applicability of functional tests generated from high-level models to low-level manufacturing testing. A particular test generation method is considered, whose input is an HDL description. The key steps of the method are system model construction and coverage model construction; both models are automatically extracted from the given description. The system model is a representation of the design in the form of high-level decision diagrams. The coverage model is a set of LTL formulae defining reachability conditions for the transitions of the extended finite state machine. The models are translated into the input format of a model checker. For each coverage model formula the model checker generates a counterexample, i.e. an execution that violates the formula (makes the corresponding transition fire). The approach is intended to cover all possible execution paths of the input HDL description and to detect dead code. Experimental comparison with existing analogues has shown that the approach produces shorter tests, but they achieve lower stuck-at fault coverage compared with the dedicated approach. An improvement has been proposed to overcome this issue.
257-268
Abstract
This article proposes approaches used to verify 10 Gigabit Ethernet controllers developed by MCST. We present the principles of the devices' operation - they provide a set of memory-mapped registers and use direct memory access - and their characteristics. We describe a set of approaches used to verify such devices: prototype-based verification, system verification and stand-alone verification. We motivate the chosen approach - a combination of system verification with stand-alone verification of a single component. The structure of the test systems that we used to verify the devices and their components is presented. The test system of the controller transmits Ethernet frames to the network and receives frames from it; algorithms for converting a packet to the representation used by the device were implemented. A stand-alone test system was developed for a connector module between internal device buses and the external interface. The test systems were developed using UVM; this methodology and the structure of the test systems allowed components to be reused in different systems. A set of test scenarios used to verify the device is described. Since the examination of the network characteristics of the controller is very important in the verification process, approaches and techniques for throughput measurement, and the device operation modes used for the measurements, are described, and we present the measured throughput in different modes. In conclusion, we provide a list of the errors found and their distribution across the types of functionality they affected.
269-282
Abstract
Market surveillance systems, used for monitoring and analysing all transactions in a financial market, have gained importance since the latest financial crisis. Such systems are designed to detect market abuse behavior and prevent it. The latest approach to the development of such systems is to use machine learning methods, which largely improve the accuracy of market abuse predictions. These intelligent market surveillance systems are based on data mining methods that build their own dependencies between the variables, which makes applying standard user-logic-based testing methodologies difficult. Therefore, in the context of intelligent surveillance systems, we built our own model for classifying transactions. To test it, it is important to be able to create a set of test cases that generate obvious and predictable output. We propose scenarios that allow the model to be tested more thoroughly than standard testing methods do. These scenarios consist of several types of test cases based on the equivalence class methodology; the division into equivalence classes is performed after analysing the real data used by actual surveillance systems. We tested the created model and discovered how this approach allows its weaknesses to be identified. This paper describes our findings from using this method to test a market surveillance system based on machine learning techniques.
283-294
Abstract
Modern embedded operating systems are designed to be used in control solutions in various hardware contexts. Control computers may differ in CPU architecture, the structure of communication channels, supported communication protocols, etc. Embedded OSes are often statically configured to create an OS image intended to be executed on a specific control computer; the system integrator usually performs this configuration. Embedded OSes are often developed by many companies, and joint development and integration are very complex if the OS does not support modularity. Support for modularity and component assembly reduces the need for communication among companies during development and integration. It allows customers to create minimal solutions that are optimally adapted to the particular task and hardware platform. Furthermore, customers may be interested in adding their own low-level components without modifying the OS. In this article, we present an approach to building modular embedded solutions from heterogeneous components based on the RTOS JetOS. The component binding mechanism we have developed makes it possible to unite heterogeneous components from different manufacturers within the same section of the address space. This mechanism allows component developers to develop their components independently, while the system integrator can, independently of the developers, configure which components appear in the OS image and how the components should interact.
295-302
Abstract
In this paper, we present our work on developing a debugger for the multiplatform real-time operating system JetOS designed for civil airborne avionics. This system is being developed at the Institute for System Programming of the Russian Academy of Sciences; it is designed to work within the Integrated Modular Avionics (IMA) architecture and implements the ARINC 653 API specification. JetOS supports different architectures such as PowerPC, MIPS, x86 and ARM. A debugger is an important tool in the software development process, but a debugger for an RTOS is more than the typical debugger used by desktop developers, and all of its specific features must be taken into account. Moreover, debugging must be supported on many platforms, yet debugger code has to be developed for each platform, and we faced the problem of porting our debugger to a different architecture without developing it from scratch. In addition, the debugger must support working within emulators, because this can expand developers’ capabilities and increase their efficiency. In this paper, we present the architecture of the debugger for the JetOS real-time operating system, which makes it possible to port the debugger to a new platform in little to no time, and discuss the challenges imposed by multiplatform support in the OS.
303-314
Abstract
Development of software documentation often involves copy-pasting, which produces a lot of duplicate text. Such duplicates make documentation maintenance difficult and expensive, especially when the software and its documentation have a long life cycle. The situation is further complicated by duplicated information frequently being a near duplicate, i.e., the same information may be presented many times with different levels of detail, in various contexts, etc. There are a number of approaches to dealing with duplicates in software documentation, but most of them use software clone detection techniques, which makes efficient near-duplicate detection difficult: source code algorithms ignore document structure and produce a lot of false positives. In this paper, we present an algorithm that detects near duplicates in software documentation using a natural language processing technique known as the N-gram model. The algorithm has a considerable limitation - it only detects single sentences as near duplicates - but it is very simple and may easily be improved in the future. It is implemented using the Natural Language Toolkit (NLTK). Evaluation results are presented for five real-life documents from various industrial projects. Manual analysis shows 39% false positives among the automatically detected duplicates. The algorithm demonstrates reasonable performance: documents of 0.8-3 MB are processed in 5-22 minutes.
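A minimal Python sketch of sentence-level near-duplicate detection with NLTK n-grams is shown below; the similarity measure (Jaccard) and the threshold are illustrative choices, not necessarily those of the paper:

    from nltk import ngrams, word_tokenize  # requires nltk and its 'punkt' data

    def ngram_set(sentence, n=3):
        return set(ngrams(word_tokenize(sentence.lower()), n))

    def near_duplicates(sentences, n=3, threshold=0.6):
        """Pairs of sentences whose n-gram Jaccard similarity reaches the threshold."""
        sets = [ngram_set(s, n) for s in sentences]
        pairs = []
        for i in range(len(sentences)):
            for j in range(i + 1, len(sentences)):
                union = sets[i] | sets[j]
                if union and len(sets[i] & sets[j]) / len(union) >= threshold:
                    pairs.append((sentences[i], sentences[j]))
        return pairs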
315-324
Abstract
With the popularization of social media, a vast amount of textual content with additional geo-located and time-stamped information is generated directly by humans every day. Both tweet meaning and extended message information can be analyzed to explore variations of public mood within certain time periods. This paper describes the development of a program for public mood monitoring based on sentiment analysis of Twitter content in Russian. Machine learning (a naive Bayes classifier) and natural language processing techniques were used for the implementation. As a result, a client-server program was built: the server-side application collects tweets via the Twitter API and analyses them using a naive Bayes classifier, while the client-side web application visualizes the public mood using the Google Charts libraries. The mood visualization consists of a mood geo chart of Russia, a plot of mood changes through the day, and a plot of mood changes through the week. Cloud computing services were used in two ways. Firstly, the program was deployed on Google App Engine, which completely abstracts away the infrastructure, so server administration is not required. Secondly, the data is stored in Google Cloud Datastore, a highly scalable NoSQL document database fully integrated with Google App Engine.
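The classifier side of such a pipeline can be sketched with NLTK's NaiveBayesClassifier on bag-of-words features; the toy English training data below stands in for the Russian tweet corpus the program actually uses:

    from nltk.classify import NaiveBayesClassifier

    def features(text):
        """Bag-of-words feature set for one tweet."""
        return {word: True for word in text.lower().split()}

    train = [
        (features("what a wonderful sunny day"), "positive"),
        (features("i love this city"), "positive"),
        (features("terrible traffic again"), "negative"),
        (features("so sad and tired"), "negative"),
    ]
    classifier = NaiveBayesClassifier.train(train)
    print(classifier.classify(features("sunny and wonderful")))  # 'positive'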
325-336
Abstract
Nowadays, news portals are forced to seek new methods of engaging the audience due to increasing competition in today’s mass media. Growth in the loyalty of news service consumers may increase popularity and, as a result, bring additional advertising revenue. Therefore, we propose a tool intended for the stylish presentation of facts from a news feed. Its outputs are little poems that contain key facts from different news sources and are based on the texts of Russian classics. The main idea of our algorithm is to use a collection of classical literature or poetry as a dictionary of style: the facts are extracted from news texts with Tomita Parser and then presented in a form similar to a sample from the collection. During our work, we tested several approaches to text generation, such as machine learning (including neural networks) and a template-based method. The last method gave us the best performance, while the texts generated by the neural network still need to be improved. In this article, we present the current state of Narrabat, a prototype news-rephrasing system we are currently working on, give examples of generated poems, and discuss some ideas for future improvement.


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)