
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 24 (2013)
Abstract
There is pronounced interest in cloud computing in the scientific community. However, current cloud computing offerings are rarely suitable for high-performance computing, in large part due to the overhead introduced by the underlying virtualization components. The purpose of this paper is to propose a design and implementation of a cloud system whose overhead is small enough for it to be practically used for a wide range of scientific workloads. First, we describe requirements for the desired system and classify workloads to identify those that are practical to transfer to the cloud. Then, we review related work. Finally, we describe our cloud system, "Virtual Supercomputer", which is based on the OpenStack cloud infrastructure and the KVM/QEMU hypervisor. Most components of the original infrastructure were modified to satisfy the requirements. In particular, we tuned KVM/QEMU and the host operating system, introduced the concept of virtual machine groups, and implemented a topology-aware scheduler to reduce communication overhead between network nodes belonging to the same virtual machine group. We also implemented a proof-of-concept web service on top of our system that makes the OpenFOAM toolbox available in a software-as-a-service manner. The main result of our work is that "Virtual Supercomputer" achieves an overhead of less than 10% on industry-standard benchmarks when using up to 1024 processor cores. We deem this overhead acceptable for practical use.
Abstract
For some compute-intensive applications cloud computing can be a cost-effective alternative or an addition to supercomputers. However, in the case of high-performance computing, overall application performance depends heavily on how processes are mapped to the network nodes. Therefore a cloud scheduler must be topology-aware in order to reduce network congestion. In this paper the Hop-Byte metric is evaluated for the case of a "fat tree" network topology under the assumption that all pairs of processes communicate evenly. We propose a scheduling algorithm that tries to minimize this metric and implement it on top of the OpenStack scheduler. All instances are divided into groups according to the compute-intensive application they belong to. Every time the scheduler receives a request for launching N new instances of the same group, it maps them to the nodes in such a way that the entire group (including already running instances) uses as few lower-level switches of the "fat tree" as possible. We measured the impact of topology-aware scheduling on the performance of the NASA Advanced Supercomputing Parallel Benchmarks. The results of 10 of the 11 benchmarks changed insignificantly. The average time of the Block Tridiagonal test decreased by 14%, the maximum time decreased by 40%, and the difference between the maximum and the minimum time decreased from 42% to 3%, that is, the fluctuation almost disappeared.
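
To illustrate the kind of group placement such a scheduler performs, here is a minimal greedy sketch in Python; the host names, slot counts and switch ids are hypothetical, and this only illustrates the idea, not the OpenStack filter/weigher code from the paper.

```python
# Illustrative greedy placement sketch for a topology-aware scheduler.
def place_group(new_count, group_hosts, free_slots, switch_of):
    """
    new_count:   number of new instances to place for one VM group
    group_hosts: hosts that already run instances of this group
    free_slots:  dict host -> number of free VM slots
    switch_of:   dict host -> id of the lower-level ("edge") switch of the host
    Returns a list of chosen hosts, one entry per new instance.
    """
    used_switches = {switch_of[h] for h in group_hosts}
    placement = []
    while len(placement) < new_count:
        candidates = [h for h in free_slots if free_slots[h] > 0]
        if not candidates:
            raise RuntimeError("not enough capacity for the group")

        def score(host):
            # prefer switches the group already occupies, then the switch with the
            # largest aggregate free capacity, then the host with the most free slots
            same = switch_of[host] in used_switches
            switch_free = sum(free_slots[h] for h in free_slots
                              if switch_of[h] == switch_of[host])
            return (same, switch_free, free_slots[host])

        best = max(candidates, key=score)
        placement.append(best)
        free_slots[best] -= 1
        used_switches.add(switch_of[best])
    return placement

hosts_free = {"h1": 2, "h2": 1, "h3": 4}
attached = {"h1": "sw1", "h2": "sw1", "h3": "sw2"}
print(place_group(3, group_hosts=["h1"], free_slots=hosts_free, switch_of=attached))
# ['h1', 'h1', 'h2'] -- the whole group stays under switch sw1
```

Packing a group under as few edge switches as possible keeps most of its traffic below the upper levels of the fat tree, which is what reduces the Hop-Byte value for the group.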
Abstract
Most of the existing tools for analyzing programs written with various parallel programming libraries (MPI, OpenMP) and languages use low-level approaches to performance analysis. There are many profiling tools and trace visualizers that produce tables and graphs with various statistics of the executed program. In most cases the developer has to look for bottlenecks and opportunities for performance improvement in the produced statistics and graphs manually. The amount of information the developer has to handle manually increases dramatically with the number of cores, the number of processes and the problem size. Therefore new performance analysis methods that fully or partially process this output automatically would be more beneficial. To apply the same analysis tool to different parallel paradigms (MPI applications, UPC programs), paradigm-specific inefficiency patterns have been developed. In this paper code patterns resulting in performance penalties are discussed. Patterns for parallel MPI applications on distributed memory systems as well as for parallel UPC programs on systems with partitioned global address space (PGAS) are considered. A method for automatic detection of inefficiency patterns in parallel MPI applications and UPC programs is proposed; it reduces the time needed to tune a parallel application.
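
As an example of a paradigm-specific inefficiency pattern, consider the classic "late sender" situation in MPI: the receiver blocks in a receive call while the sender is still computing, so the waiting time is pure overhead. A minimal hypothetical illustration using mpi4py (not an example from the paper):

```python
# Run with: mpiexec -n 2 python late_sender.py
# "Late sender" inefficiency pattern: rank 1 blocks in recv() while rank 0 is
# still computing, so the time rank 1 spends waiting is pure parallel overhead.
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    time.sleep(2.0)                        # long local computation before the send
    comm.send("payload", dest=1, tag=0)
elif rank == 1:
    t0 = MPI.Wtime()
    data = comm.recv(source=0, tag=0)      # posted early, blocks until rank 0 sends
    print(f"rank 1 waited {MPI.Wtime() - t0:.2f} s in recv (late sender)")
```

A pattern detector looks for exactly this kind of signature in the trace (a receive event that starts long before the matching send event) instead of leaving the developer to spot it in raw profiles.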
Abstract
A significant part of the computational work in numerical simulations of technical systems, physical phenomena and technological processes is the solution of systems of linear algebraic equations arising from the discretization of the corresponding differential or integro-differential equations. There are several classes of iterative methods for solving systems of linear algebraic equations, which differ in how the next iterative approximation is constructed: methods based on splitting, variational-type methods and projection-type methods. The aim of this study is to develop an approach to analyzing the computational efficiency of iterative methods for solving systems of linear algebraic equations that are difference analogues of continuum mechanics equations, and to choosing a priori a method that solves such systems with high computational efficiency. To choose the optimal numerical method for solving linear systems, in addition to the rate of convergence such characteristics of the linear system and the numerical method as the condition number, the smoothing factor and the cost coefficient should be considered. The smoothing factor and the cost coefficient can be computed through the amplification factors of the modes. The performance of a smoothing method is measured by its smoothing factor, while the cost of a numerical method is measured by its cost coefficient, which shows the difference between the rates at which the amplitudes of smooth and rough modes vanish. A method for computing the amplification factors of the modes using the discrete Fourier transform is proposed. Using the cost coefficient makes it possible to choose optimal parameters of the multigrid preconditioner. Some test problems are considered, and the efficiency of the BiCGStab (BiConjugate Gradient Stabilized) method with Incomplete LU and multigrid preconditioners is investigated for solving the linear systems that follow from discrete forms of the Helmholtz and Poisson equations. These systems arise in the numerical simulation of incompressible viscous flow in a square cavity using the LS-STAG cut-cell immersed boundary method with a level-set function.
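
For reference, local Fourier (mode) analysis commonly expresses the smoothing factor through the amplification factors of the high-frequency modes; the formula below is the standard textbook definition and may differ in detail from the exact definitions of the smoothing factor and cost coefficient used in the paper.

```latex
\mu = \max_{\theta \in \Theta_{\mathrm{high}}} \lvert g(\theta) \rvert,
\qquad
\Theta_{\mathrm{high}} = [-\pi,\pi)^d \setminus \bigl[-\tfrac{\pi}{2},\tfrac{\pi}{2}\bigr)^d,
```

Here g(θ) is the amplification factor of the Fourier mode with frequency θ under one sweep of the smoother; applying the discrete Fourier transform to the action of the iteration operator yields these factors numerically.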
Abstract
The paper presents the authors' experience of using the technological platform UniHUB for numerical simulation of continuously stratified fluid flows based on the open source computational packages OpenFOAM, Salome and ParaView. Special attention is paid to the construction of high-resolution computational grids, the setting of complex boundary conditions with standard and extended OpenFOAM utilities, the development of custom solvers, the processing and visualization of numerical data, and running the codes in parallel on the JSCC RAS cluster. Some physical results are presented on the structure and dynamics of stratified flows around an impermeable sloping plate, a symmetrical wedge, a horizontal disc and a circular cylinder. The obtained numerical results have direct application to natural systems, since the Earth's atmosphere and hydrosphere are mostly stably stratified due to the non-uniform distribution in space and time of dissolved or suspended matter, gas bubbles, temperature, medium compressibility and the effects of external forces. The numerical study of diffusion-induced flows around an impermeable obstacle reveals a system of jet-like flows formed along its sloping boundaries and a complicated structure of circulation cells attached to the surface of the obstacle. The most intensive structures are clearly registered experimentally by Schlieren techniques in the form of horizontally extended high-gradient interfaces attached to the extreme points of the obstacle, i.e. the sharp edges of a plate, the poles of a cylinder, the vertices of a wedge, etc. As typical velocities increase, these structures do not disappear but are transformed into a complicated system of thin interfaces separating different kinds of disturbances, e.g. internal waves and a vortex sheet. The structural elements of extremely slow flows of non-homogeneous fluids form the fine structure of flows in rapidly changing environments. The analytical, numerical and laboratory data are compared with each other, and the conditions for their agreement are discussed together with the possibility of applying them to natural systems.
Abstract
This article describes two types of data transfer via the PCI Express bus involving several FPGAs. The first is a simultaneous DMA data transfer between the system memory and different FPGA chips. The second is a simultaneous direct data transfer between different FPGAs.

The data transfer speed was measured for both cases, with the results being about 99% of the maximum speed of a PCIe x4 Gen 2.0 link for the direct transfer between FPGAs (1603 MB/s for a 128-byte payload and 1740 MB/s for a 256-byte payload). The direct data transfer latency was also measured: 0.7 µs with one intermediate PCIe switch and 1 µs with three intermediate switches.

The effect of simultaneous transfers on data transfer speed was also studied, with the result that, as long as the aggregate transfer speed does not exceed the shared link bandwidth, each transfer is performed at its maximum speed; beyond that point, the shared link utilization reaches 100% and its bandwidth is distributed equally between the individual transfers.
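
For context, a rough estimate of the ceiling these figures are measured against (the per-packet overhead H depends on the platform and link configuration, so the formula is only indicative):

```latex
B_{\mathrm{raw}} = 4\ \text{lanes} \times 5\,\mathrm{GT/s} \times \tfrac{8}{10}
                 = 16\,\mathrm{Gbit/s} = 2000\,\mathrm{MB/s},
\qquad
\eta(P) \approx \frac{P}{P+H},
```

where P is the TLP payload size and H is the per-packet overhead (TLP header, sequence number, LCRC and framing, typically a few tens of bytes). The payload-dependent efficiency η(P) is why the measured throughput grows from 1603 MB/s at a 128-byte payload to 1740 MB/s at 256 bytes.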
Abstract
This paper analyzes approaches to optimizing C/C++ applications in a two-stage compilation system that allows such applications to be distributed in the LLVM (Low Level Virtual Machine) intermediate representation. The on-stack replacement technique implemented in the LLVM just-in-time compiler is described. The paper presents a static instrumentation technique that covers only a subset of control flow edges and reconstructs the full control flow profile from the collected data. The proposed approach yields profile information comparable in quality to the classical approach while significantly reducing the profiling overhead.

A technique for converting a dynamically collected profile into the LLVM profile format for statically compiled applications is also presented. The paper evaluates both existing optimizations modified to use the collected profile and newly developed ones. Profile-based inlining, outlining and speculative devirtualization were implemented. Annotation of the control flow graph with profile weights was also implemented to make profile-based optimizations easier to debug. A technique is proposed that uses prefetch instructions to improve the effectiveness of array-processing code. The paper describes an approach to building a dedicated cloud application storage that helps solve program optimization and protection problems. Finally, we present an approach to reducing compilation and optimization overhead in the cloud infrastructure.
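
One standard way to obtain a full edge profile from incomplete instrumentation, in the spirit of classic spanning-tree edge profiling, is sketched below under the assumption that the uninstrumented edges form a spanning tree of the CFG; it illustrates the idea and is not necessarily the paper's exact scheme.

```python
def recover_edge_counts(nodes, edges, known):
    """
    nodes: CFG node ids; edges: list of (src, dst) pairs, with a virtual exit->entry
    edge so that flow is conserved at every node; known: dict {(src, dst): count}
    holding counters only for the instrumented (non-spanning-tree) edges.
    Returns counts for every edge, recovered by flow conservation:
    at each node, the sum of incoming counts equals the sum of outgoing counts.
    """
    counts = dict(known)
    unknown = [e for e in edges if e not in counts]
    changed = True
    while unknown and changed:
        changed = False
        for v in nodes:
            inc = [e for e in edges if e[1] == v]
            out = [e for e in edges if e[0] == v]
            unk = [e for e in inc + out if e not in counts]
            if len(unk) != 1:
                continue                    # solvable only if exactly one edge is unknown here
            e = unk[0]
            in_known = sum(counts[x] for x in inc if x in counts)
            out_known = sum(counts[x] for x in out if x in counts)
            counts[e] = out_known - in_known if e in inc else in_known - out_known
            unknown.remove(e)
            changed = True
    return counts

# Tiny CFG: A -> {B, C} -> D, plus a virtual edge D -> A; only non-tree edges were counted.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "A")]
print(recover_edge_counts(nodes, edges, {("C", "D"): 3, ("D", "A"): 10}))
# {('C', 'D'): 3, ('D', 'A'): 10, ('A', 'C'): 3, ('B', 'D'): 7, ('A', 'B'): 7}
```

Instrumenting only |E| - |V| + 1 edges keeps the run-time overhead low while still yielding exact counts for every edge.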
Abstract
This paper describes work on the development of deobfuscation software. The main target of the developed software is the analysis of obfuscated malware code. The need for such analysis comes from obfuscation techniques being widely used to protect implementations. A regular disassembler, the tool most commonly used by an analyst, transforms binary code into a human-readable form but does not simplify the result or verify its correctness. Earlier it was enough to apply pattern-matching cleanup of inserted useless garbage code, but obfuscation techniques are getting more complicated and thus require more sophisticated methods of code analysis and simplification.

Since deobfuscation methods require analysis and transformation algorithms similar to those of an optimizing compiler, we have evaluated the LLVM compiler infrastructure as a basis for deobfuscation software. The difference from a compiler is that the deobfuscation algorithms do not have full information about the program being analyzed, but rather only a small part of it. The evaluation results show that using LLVM directly does not remove all the artifacts from the obfuscated code, so to produce cleaner output it is desirable to develop an independent tool. Nevertheless, using LLVM or a similar compiler infrastructure is a feasible approach to developing deobfuscation software.
Abstract
The article presents type inference for programs written in the Python programming language. First, type inference algorithms for parametric polymorphism described in the literature are reviewed. These algorithms evolved from the "basic" algorithm (also known as the Palsberg-Schwartzbach algorithm) to the Cartesian product algorithm (CPA). It is shown that CPA is both precise and efficient; however, it has to be modified to be used on Python code. We then present a new algorithm (a modification of CPA) for Python. It is shown how a type inference module using the new algorithm analyzes various Python language constructs: constants (literals), basic Python collections, variable names, assignments, function and class definitions, attribute and index expressions. It is also shown how the algorithm deals with external (non-Python) functions using special annotations that specify output types depending on the input types of the function. The results of work on the prototype (a module implementing the described type inference algorithm) are then presented. The paper concludes with an overview of possible future work, such as generating a defect trace, i.e. a description that specifies how an expression got its incorrect types.
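
To give a flavor of the idea behind CPA (a simplified sketch, not the modified algorithm from the paper): a call whose arguments may have several concrete types is expanded into the Cartesian product of those types, and each resulting monomorphic template is analyzed once and cached.

```python
from itertools import product

def cpa_call_types(arg_type_sets, infer_template):
    """
    arg_type_sets:  list of sets with the possible concrete types of each argument
    infer_template: maps one tuple of concrete argument types (a monomorphic
                    template) to the inferred return type
    Returns the set of possible return types of the call.
    """
    cache = {}                                  # each template is analyzed only once
    result = set()
    for combo in product(*arg_type_sets):       # Cartesian product of argument types
        if combo not in cache:
            cache[combo] = infer_template(combo)
        result.add(cache[combo])
    return result

# Example: def add(x, y): return x + y, called with x and y each being int or str.
def add_template(combo):
    x_t, y_t = combo
    return x_t if x_t == y_t else "TypeError"   # int+int -> int, str+str -> str, mixed fails

print(sorted(cpa_call_types([{"int", "str"}, {"int", "str"}], add_template)))
# ['TypeError', 'int', 'str']
```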
Abstract
This paper discusses automatic detection of concurrency defects using dynamic analysis methods, with the Android platform as the primary target. Due to the increasing popularity of the Android platform and, more importantly, the high importance of concurrent workflows for all user applications on this platform, automatic defect detection tools are likely to be beneficial for many developers. The paper is organized as follows. Section 1 provides a brief introduction to the discussed subjects (concurrency basics, Android platform basics). Section 2 discusses the theoretical basis of a set of dynamic concurrency defect detection methods. Section 3 contains an overview of several key points of the Android workflow related to native code execution in system utilities and in user applications through the JNI mechanism. This overview is followed by a description of existing tools that implement concurrency defect detection methods for native code and by practical considerations of applying these tools to Android analysis. Section 4 provides information on Java-oriented tools for concurrency defect detection (Java ThreadSanitizer and Coffee Machine), with the major focus on Coffee Machine, developed within the scope of this project. An overview of the Coffee Machine tool's functionality and base methods is given, followed by an evaluation of the practical application of the tool to the Android platform. Section 5 concludes the paper with an evaluation of the key issues for both native and Java dynamic analysis for automatic concurrency defect detection.
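
For background, one classic family of dynamic detection methods is lockset analysis in the style of Eraser; the simplified sketch below illustrates only that general idea and does not describe how ThreadSanitizer or Coffee Machine actually work.

```python
# Lockset-style check, heavily simplified: for every shared variable keep the
# intersection of the lock sets held at its accesses; an empty intersection
# means no single lock protects all accesses, i.e. a potential data race.

candidate = {}   # variable name -> set of locks that protected every access so far

def on_access(var, locks_held, report):
    """Record an access to `var` performed while holding `locks_held` (a set of lock ids)."""
    if var not in candidate:
        candidate[var] = set(locks_held)   # first access initializes the candidate set
    else:
        candidate[var] &= locks_held       # keep only locks held at every access
    if not candidate[var]:
        report(var)

races = []
trace = [                        # (variable, locks held at the moment of access)
    ("balance", {"L1"}),
    ("balance", {"L1", "L2"}),
    ("balance", set()),          # unprotected access empties the candidate set
]
for var, locks in trace:
    on_access(var, locks, races.append)
print(races)                     # ['balance']
```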
Abstract

UniTESK (UNIfied TEsting and Specification toolKit) is a testing technology based on formal models (specifications) of requirements on the behavior of software or hardware components. It was created on the basis of experience gained during the development of a framework for automated testing of a real-time operating system kernel in 1994-2000. Software contracts were the main kind of models in the first version of the technology. Automata models were also implemented to support the generation of complicated test sequences. UniTESK was first used in 2000 in the development of a test suite for an IPv6 implementation. It was the first experience of using contract specifications in testing implementations of telecommunication protocols. This project demonstrated that contract specifications, in combination with the UniTESK technique for testing software systems with asynchronous interfaces, are very effective. Applications of UniTESK to testing Java software components include a project on testing an implementation of the standard Java runtime library and a project on testing the infrastructure of an information system of one of the large mobile operators in Russia. The most significant application of UniTESK took place in 2005-2007 in the Open Linux Verification project, in which formal specifications and tests for the interfaces of the Linux Standard Base were created. These results were then used in the development of a test suite for a Russian avionics real-time operating system. Positive lessons learned while developing and using UniTESK include the effective application of model-based testing methods in large industrial projects, the high level of test coverage achieved by the generated tests, and the applicability of model-based testing to critical systems and applications. Negative lessons include the lack of well-defined and detailed models and specifications, extra development expenses caused by the creation of test-specific models, and problems with the introduction of model-based testing technologies that use bilingual test generation systems and specific notations.

Abstract
The precision, completeness and scalability of static verification tools have dramatically improved over the last decade. In particular, automatic checking of moderate-sized software systems has been made possible by the development of CEGAR (Counter-Example Guided Abstraction Refinement). This approach is used in such tools as SLAM, BLAST, SATABS and CPAchecker. The paper presents an extended review of predicate abstraction-based CEGAR. It provides an introduction to the general principles of CEGAR and describes some implementation details of CEGAR in BLAST and CPAchecker. In particular, the paper concerns deciding the reachability problem for C programs by means of symbolic predicate abstraction. The set of predicates for the abstraction is obtained by Craig interpolation of the logical formulas representing the counterexample traces discovered during the analysis. This technique is explained with two examples analyzed step by step in both an intuitive and a formal manner. The explanation proceeds from greatly simplified programs employing a very restricted subset of C language features (e.g. using only a finite set of integer variables) to more complicated programs of arbitrary size with pointers, heap allocations and bit operations. In terms of the considered abstract domains, the paper describes the simplest fine-grained Cartesian abstraction and a coarse-grained Boolean abstraction with adjustable block encoding. The paper also includes brief discussions of common issues arising in the verification of real industrial C code bases and of the current capabilities of existing decision procedure implementations.
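
A schematic view of the CEGAR loop with interpolation-based refinement; the abstraction, model-checking and interpolation engines are left abstract on purpose, so this is an illustration of the principle rather than of the BLAST or CPAchecker implementation.

```python
def cegar(program, build_abstraction, model_check, is_feasible, interpolate):
    """
    Schematic CEGAR loop.
    build_abstraction(program, predicates) -> abstract reachability model
    model_check(model)  -> abstract counterexample path, or None if the error
                           location is unreachable in the abstraction
    is_feasible(path)   -> True if the path is executable in the concrete program
    interpolate(path)   -> new predicates, e.g. Craig interpolants of the path formula
    """
    predicates = set()
    while True:
        model = build_abstraction(program, predicates)
        path = model_check(model)
        if path is None:
            return "SAFE"                        # no abstract path reaches the error
        if is_feasible(path):
            return ("UNSAFE", path)              # genuine bug: report the concrete trace
        predicates |= set(interpolate(path))     # spurious path: refine the abstraction
```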
Abstract
Nowadays static verification is one of the most promising methods for finding bugs in programs. To successfully apply existing tools to the Linux kernel, one needs to perform componentwise verification. Such verification needs an environment model that reflects the real environment of the components rather accurately. Developing a complete environment model for Linux kernel components is a very time-consuming task, since there are too many programming interfaces in the kernel and they are not stable. This paper suggests a new approach to building programming interface specifications. The approach allows static verification tools to be applied efficiently (with a small number of false alarms but without missed bugs) for checking rules of programming interface usage with an incomplete environment model, provided that component developers obey a specific coding style. Two implementations of the suggested approach were developed. The first one extended the BLAST static verification tool, while the second one utilized the C Instrumentation Framework, an aspect-oriented programming implementation for the C programming language. The latter makes it possible to use various static verification tools, such as BLAST, CPAchecker and CBMC, for checking specifications. In practice that implementation helped to reduce the number of false alarms by more than 70% for some rules of programming interface usage. The paper shows that, to eliminate the remaining false alarms, one needs to develop more precise environment models and to use more accurate static verification tools. Besides the Linux kernel, the suggested approach can be applied to componentwise verification in other projects whose developers obey a specific coding style.
Abstract

Modern IT systems are becoming more and more distributed, both in the literal sense, as business logic spreads across applications running in several different network domains, and in a metaphorical sense, as expert knowledge about a system's inner workings spreads across different outsourcing companies and hundreds of documents. In this paper we discuss regression testing as one of the issues related to this trend. When it comes to testing, the quality of specifications is a major issue. They are either unmanageably large and outdated, or heavily fragmented: each specifies its own module but lacks a perspective on how the system works as a whole. In our practice, it often comes down to reverse engineering to discover what exactly different parts of the system expect from each other, but such ad hoc techniques are not welcome in the domain of testing. A different approach relies on analyzing the system's event trace, which is provided by business activity monitoring tools, transaction logs or simply log files. Many data mining techniques have already been developed; we are mostly interested in process mining as a way of discovering the system's decision model. A key factor in any process mining application is the algorithm for discovering relations between events. It can be either a generic algorithm, such as the Alpha algorithm, or a specific algorithm tailored to a particular set of possible events. In this paper we outline a specific algorithm that we use to discover event relations in a special environment. The environment we use is designed specifically for this goal and plays a central part in the process. It mediates the interactions between the system's modules at run time as a discrete process, thus providing a very straightforward way of determining event causality.
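
For reference, the basic ordering relations that the generic Alpha algorithm extracts from an event log are easy to state (a minimal sketch of the textbook relations, not of the specific algorithm described in the paper):

```python
from itertools import product

def alpha_relations(log):
    """
    log: list of traces, each trace being a list of activity names.
    Returns the classic Alpha-algorithm relations between activity pairs:
    '->' causality, '<-' reverse causality, '||' parallelism, '#' no relation.
    """
    activities = {a for trace in log for a in trace}
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}  # direct succession
    rel = {}
    for a, b in product(activities, activities):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            rel[(a, b)] = "->"
        elif ab and ba:
            rel[(a, b)] = "||"
        elif not ab and not ba:
            rel[(a, b)] = "#"
        else:
            rel[(a, b)] = "<-"
    return rel

log = [["register", "check", "pay"], ["register", "pay", "check"]]
print(alpha_relations(log)[("check", "pay")])   # '||' -- check and pay can run in parallel
```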

Abstract
Many modern applications (such as large-scale web sites, social networks, research projects, business analytics, etc.) have to deal with very large data volumes (also referred to as "big data") and high read/write loads. These applications require the underlying data management systems to scale well in order to accommodate data growth and increasing workloads. High throughput, low latencies and data availability are also very important, as are data consistency guarantees. Traditional SQL-oriented DBMSs, despite their popularity, ACID transactions and rich features, do not scale well and thus are not suitable in certain cases. A number of new data management systems and approaches intended to resolve scalability issues have emerged over the last decade. This paper reviews several classes of such systems and the key problems they are able to solve. The large variety of systems and approaches is due to the general trend toward specialization in the field of data management: every data management system has been adapted to solve a certain class of problems. Thus, the selection of a specific solution is driven by the specific problem to be solved: the expected load, the ratio of read and write intensity, the form of data storage and the query types, the desired level of consistency, reliability requirements, the availability of client libraries for the chosen language, etc.
Abstract
The paper proposes a comprehensive approach to scientific and technical document quality assessment based on various automatically computed document quality characteristics: widely used bibliometric and scientometric characteristics (based on citation indices) as well as new types of characteristics based on semantic text analysis, heuristics and plagiarism detection methods. An integrated indicator of scientific and technical document quality is formed from the obtained basic characteristics using machine learning methods, similarly to the ranking problem in information retrieval. A prototype system based on the proposed approach is presented, and experimental investigations of the developed system aimed at checking the accuracy of document quality assessment are carried out. An analysis of the state of the art in scientific and technical document quality assessment showed that the proposed approach, based on an extended list of basic characteristic groups, has not previously been considered in such a broad formulation and is, on the whole, innovative. The main part of the paper has the following structure. The second section contains an analytical overview of existing approaches to assessing the quality of scientific and technical documents. The third section provides details of the proposed approach. The fourth section describes a prototype system based on the proposed approach. The fifth section discusses the results of experiments.
Abstract

This paper reviews recent methods for indexing multidimensional data and their use for modeling large-scale dynamic scenes. The problem arises in many application domains, such as computer graphics systems, virtual and augmented reality systems, CAD systems, robotics, geographical information systems, project management systems, etc. Two fundamental approaches to the indexing of multidimensional data, namely object aggregation and spatial decomposition, are outlined. In the context of the former approach, balanced search trees, also referred to as object pyramids, are discussed. In particular, generalizations of B-trees such as R-trees, R*-trees and R+-trees are discussed with respect to indexing large data volumes located in external memory and accessed by pages. The conducted analysis shows that object aggregation methods are well suited for static scenes but very limited for dynamic environments. The latter approach assumes recursive decomposition of the scene volume in accordance with cutting rules. Space decomposition methods include octrees, A-trees, bintrees, K-D trees, X-Y trees, treemaps and puzzle-trees. They offer a more reasonable compromise between the performance of spatial queries and the cost of updating the indexing structures and keeping them in a consistent state under permanent changes in the scene. Compared to object pyramids, these methods look more promising for rapidly changing environments. It is concluded that regular dynamic octrees are the most effective for the considered applications of modeling large-scale dynamic scenes.
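
A minimal sketch of point insertion into a regular octree, the structure the survey finds most effective for dynamic scenes (illustrative only; real scene indices also handle objects with spatial extent, deletion and rebalancing):

```python
class OctreeNode:
    """Regular octree over an axis-aligned cube: every internal node has 8 equal children."""

    def __init__(self, center, half_size, capacity=4):
        self.center, self.half = center, half_size
        self.capacity = capacity
        self.points = []         # points stored while the node is a leaf
        self.children = None     # list of 8 child nodes once the leaf is split

    def _child_index(self, p):
        # 3-bit index: bit i is set if the point lies above the center along axis i
        return sum(1 << i for i in range(3) if p[i] >= self.center[i])

    def _split(self):
        h = self.half / 2
        self.children = [
            OctreeNode(tuple(self.center[i] + (h if (idx >> i) & 1 else -h)
                             for i in range(3)), h, self.capacity)
            for idx in range(8)
        ]
        for q in self.points:
            self.children[self._child_index(q)].insert(q)
        self.points = []

    def insert(self, p):
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.capacity and self.half > 1e-6:
                self._split()                       # leaf overflows: subdivide the cube
        else:
            self.children[self._child_index(p)].insert(p)

root = OctreeNode(center=(0.0, 0.0, 0.0), half_size=10.0)
for p in [(1, 2, 3), (-4, 5, -6), (7, -8, 9), (0.5, 0.5, 0.5), (2, 2, 2)]:
    root.insert(p)
```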

Abstract

The paper analyzes the behavior of the SQL standard's table expressions in comparison with their relational equivalents. The concept of a relation in the relational model presumes the absence of duplicate tuples, attributes with the same name, ordering of attributes, ordering of tuples, and untyped NULLs. The paper demonstrates cases where the behavior of constructions of the SQL standard violates the rules of the relational model: these constructions produce tables that are not relations (join expressions, table references, set-theoretic operators, simple tables, and select). The paper also describes practices that allow such cases to be avoided and SQL to be used in a truly relational manner. Relational use of SQL is almost always possible. However, existing implementations of SQL are far from ideal, and sometimes such use leads to poor performance. Non-relational use of SQL is thus a trade-off between performance and conceptual consistency. It is always useful to understand the theoretically perfect variant of use and to have solid reasons for abandoning it. It is also useful to document these reasons, so that one can return to the perfect variant with a new version of an SQL-based DBMS that no longer exhibits the performance problems.

Abstract
Building a control system for non-stationary objects is the next step in developing an approach to controlling objects that are not fully described, using Case-Based Reasoning. This approach was originally proposed in earlier projects of the authors. The Case-Based Reasoning style of controlling complex objects follows a cycle with three elementary steps: estimating the object's state, generating a control action, and estimating the object's state after that action. For non-stationary objects this traditional approach must be substantially improved, because the state estimation can no longer be done unambiguously. While these steps are being performed, the object's state may change spontaneously; in this case the action that was prepared beforehand becomes inapplicable. In this project the authors develop a method intended to decrease the level of ambiguity in estimating the object's state before and after the action. The main idea of the method proposed here is that the object's previous behavior is a source of additional information. The authors have developed a technique of limiting the differential set, which allows all the previous states of the object to be fully represented, starting from the very first step of the control process.
Abstract
Decision support systems in which the results of rule-based deduction are supplemented with Case-Based Reasoning results are another step forward compared to models that support only a single knowledge paradigm. The main deduction instrument in up-to-date hybrid systems is production rules; precedents are used only for processing exceptions. The approach described here was designed and developed for the second release of a system created at the Institute for System Programming of the Russian Academy of Sciences. On the basis of this research system, the medical decision support system "Doctor's Partner" is being developed. Rule-based and precedent-based reasoning now complement each other while the ambiguity of a not fully described case still exists. The absence of a significant feature required by one of the rules often prevents the rule-oriented method from being used at all. At the current project stage a method of two-phase case estimation is used. The first phase uses the Case-Based Reasoning method to obtain some knowledge about the possible class of the case (the so-called differential set). The next phase uses this information to perform reverse logical inference, treating the possible conclusion as a hypothesis and moving toward the facts that can confirm it. The approach briefly described here may decrease the cost of additional rule investigation and significantly limits branch selection while moving along the rule chain.
Abstract
In 1993 Coffman and Shor proposed an on-line strip packing algorithm with expected unpacked (waste) area of order O(n^{2/3}) in the standard probabilistic model, where n is the number of rectangles. In the standard probabilistic model the width and height of each rectangle are independent random variables uniformly distributed in [0,1]. It is well known that an optimal packing has expected waste of order O(n^{1/2}), and there exists an off-line polynomial algorithm that achieves this bound. For more than 10 years the question of whether a similar upper bound can be achieved by on-line packing algorithms remained open. It was also known that in the class of popular so-called shelf algorithms the bound O(n^{2/3}) cannot be improved. In this paper we give a positive answer to this question. We analyze a new packing algorithm recently proposed by the author and prove a new upper bound on the unpacked area of order O(n^{1/2} (log n)^{3/2}), which is optimal up to a logarithmic factor. The only restriction is the following: we must know the number n of rectangles in advance (exactly as in the algorithm of Coffman and Shor). In the popular terminology of on-line algorithms this means that our algorithm is a closed-end on-line algorithm.


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)