
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 27, No 6 (2015)
7-20
Abstract

The paper addresses the problem of debugging an optimizing compiler. A new method of compile-time instrumentation is presented as an efficient approach to improving the reliability of the optimizing transformations implemented in the compiler. The principal feature of the method is that it aims at debugging the transformations themselves rather than the tests, and therefore the correctness of the resulting code can be verified on any input data fed to the executed program: the proposed self-checking instrumentation is orthogonal to particular input data in the sense that it is able to detect bugs on arbitrary data flows the program exhibits during its invocation.

The method can be applied to a wide set of optimizing transformations. Cases are known in which it reveals faults in transformations while the non-instrumented test itself remains fully functional (and other, somewhat similar but more limited, instrumentations also reveal no bugs). Its other features include the compactness of the embedded dynamic checkers, linear code growth, and the absence of any assumptions about the availability of standard libraries (e.g., no functions such as exit() or abort() are used to terminate the program when a checker is triggered). All this ensures that the optimized code is influenced minimally, so the original sequence of optimizing transformations implemented in the compiler remains applicable, as compared to non-instrumented code.

The described method has successfully demonstrated its usefulness for detecting and identifying volatile bugs in the optimizing compilers of the Elbrus microprocessor family when they were used to build software with asynchronous control flow. Moreover, it helped increase the reliability not only of the compiler itself but also of the software being compiled, and it proved useful in the integrated testing of microprocessor prototypes.
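
A minimal sketch of the general idea, assuming a hypothetical checker helper __check_eq that a compiler pass would emit (this is an illustration of self-checking instrumentation in general, not the paper's actual implementation):

```cpp
// Minimal sketch: the compiler keeps a reference computation alongside the
// transformed one and emits a compact checker. On mismatch the program is
// stopped without any standard-library call, matching the constraint above.
#include <cstdint>

// Hypothetical checker the instrumentation pass would emit inline.
static inline void __check_eq(int64_t optimized, int64_t reference) {
    if (optimized != reference) {
        // No exit()/abort(): execute a trapping instruction directly
        // (GCC/Clang builtin), so no libc availability is assumed.
        __builtin_trap();
    }
}

int64_t sum_to(int64_t n) {
    // Reference value: the original loop, kept for checking.
    int64_t ref = 0;
    for (int64_t i = 1; i <= n; ++i) ref += i;

    // Value produced by the transformed code (here: closed-form rewrite).
    int64_t opt = n * (n + 1) / 2;

    __check_eq(opt, ref);  // verifies the transformation on *this* input
    return opt;
}
```

Because the checker is a handful of instructions with no calls into a runtime, the instrumented code stays close enough to the original for the compiler's usual optimization sequence to remain applicable.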

21-32
Abstract
JavaScript is one of the most ubiquitous programming languages. However, the performance of JavaScript engines is not always satisfactory. We have developed approaches that increase the performance of the V8 engine by up to 10% on major benchmark suites.
33-48
Abstract
Since its inception in the mid-1990s, JavaScript has become one of the most popular web development languages. Although initially developed as a browser-agnostic scripting language, in recent years JavaScript has continued its evolution beyond the desktop to areas such as mobile and server-side web applications. Many large applications, such as Gmail and Google Docs, are written in JavaScript. JavaScript is also used in Node.js, a server-side web application development platform. Moreover, JavaScript is the main language for developing applications on some operating systems for mobile and media devices; examples of such systems are Tizen and Firefox OS. Modern JavaScript engines use just-in-time (JIT) compilation to produce binary code. JIT compilers are limited in the complexity of the optimizations they can perform at runtime without delaying execution. To maintain a trade-off between quick startup and sophisticated optimization, JavaScript engines usually use multiple tiers for compiling hot functions. Our work is dedicated to improving the performance of JavaScript programs by adding a new optimizing tier to the JavaScript V8 compiler. This tier uses the LLVM infrastructure to optimize JavaScript functions and generate machine code. The main challenge of adding a new optimizing tier is to support all the technologies (speculative compilation, on-stack replacement, concurrent compilation, etc.) that modern multi-tier JIT compilers use to increase performance and minimize pauses during interactive use. All these technologies are fully supported in our solution. This has resulted in significant performance gains on JavaScript benchmark suites when compiling hot functions.
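
To illustrate the speculative compilation mentioned above, here is a hedged sketch of a type guard with deoptimization (hypothetical names and a simplified value representation, not V8's real API):

```cpp
// Sketch: the optimizing tier compiles a fast path under the assumption that
// an operand is a small integer, and bails out to a lower tier otherwise.
#include <cstdint>
#include <stdexcept>

struct Value { bool is_small_int; int32_t small_int; /* ... boxed repr ... */ };

// Hypothetical slow path: a real engine would reconstruct interpreter frames
// here; we just signal the bailout.
[[noreturn]] static void deoptimize(const Value&) {
    throw std::runtime_error("deopt: speculation failed");
}

int32_t add_one_optimized(const Value& v) {
    if (!v.is_small_int)      // type guard emitted by the optimizing tier
        deoptimize(v);        // speculation failed: fall back to lower tier
    return v.small_int + 1;   // fast path compiled under the assumption
}
```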
49-66
Abstract
The efficiency of the code-optimizing transformations performed by a compiler can be greatly increased by obtaining and using program profile information; such compiler operation is called profile-guided optimization (PGO). After applying transformations that change the control flow graph, it is necessary to adjust the profile information in order to keep it adequate for the subsequent optimizations. This paper considers two ways of representing profile information and the transition between them, and describes the possible causes of additional information about the optimized code's control flow that requires a profile update. The most frequently encountered problems of profile information update are considered, and the following solutions are offered: an algorithm for correcting an acyclic graph profile according to the values of its end nodes; an algorithm for correcting the average number of loop iterations; and a profile correction algorithm for control flow with a "controversial node". It is proved that the suggested algorithms for correcting acyclic region counters and the loop iteration counter change the ratios between the corrected and the original graph counters minimally. Using the proposed algorithms, a mechanism for correcting profile and thin-profile information after loop peeling is described. All described methods are implemented in the optimizing compilers lcc, lccs, lfortran, and lfortrans for the Elbrus and SPARC architectures.
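
A minimal sketch of profile correction after peeling one loop iteration (my own illustration under simple assumptions, not the paper's exact algorithm): the peeled copy inherits one execution per loop entry, and the remaining body keeps the rest.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct LoopProfile {
    uint64_t entries;                 // how many times the loop was entered
    std::vector<uint64_t> body_count; // execution count of each body block
};

void correct_after_peeling(LoopProfile& loop, std::vector<uint64_t>& peeled) {
    peeled.resize(loop.body_count.size());
    for (size_t i = 0; i < loop.body_count.size(); ++i) {
        // The peeled copy runs at most once per entry; clamp for blocks that
        // were executed fewer times than the loop was entered.
        uint64_t once = std::min<uint64_t>(loop.entries, loop.body_count[i]);
        peeled[i] = once;
        loop.body_count[i] -= once;   // remaining iterations stay in the loop
    }
}
```

The clamping keeps the corrected counters consistent with the original ones, in the spirit of the minimal-ratio-change property claimed above.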
67-86
Abstract
Modern JavaScript engines use just-in-time (JIT) compilation to produce binary code. JIT compilers are limited in the complexity of the optimizations they can perform at runtime without delaying execution. Static ahead-of-time (AOT) compilers do not have such limitations, but they are not well suited for compiling dynamic languages such as JavaScript. In the paper, we discuss methods for augmenting a multi-tier JIT compiler with a capability for AOT compilation, and implement some ahead-of-time compilation ideas in two JavaScript engines: JavaScriptCore and V8. In JavaScriptCore (JSC), which is a part of the open-source WebKit library, we have developed and implemented a framework that allows saving JavaScript programs as a binary package containing bytecode and native code. The framework consists of two components. The first is a command-line compiler that compiles a source JavaScript program into a compressed binary package consisting of JSC's internal representation, called bytecode, and native code produced by JSC's non-optimizing JIT compiler. The second component is a patched JSC engine capable of loading and executing binary packages produced by the compiler. In addition, we have implemented saving of optimized internal representation in another JavaScript engine, V8, which is used in Chrome and many other browsers. When running the same JavaScript program, the cached internal representation can be loaded to reduce JIT compilation time and decrease the percentage of unoptimized code run before it becomes hot.
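
A hypothetical on-disk layout for such a package, for illustration only (the real JSC/V8 formats differ): a small header followed by bytecode and native-code sections that the patched engine maps back in at load time.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct PackageHeader {
    char     magic[4];        // e.g. "JSBC": identifies the package
    uint32_t version;         // format version for compatibility checks
    uint64_t bytecode_size;   // length of the bytecode section
    uint64_t native_size;     // length of the cached native-code section
};

bool save_package(const std::string& path,
                  const std::vector<uint8_t>& bytecode,
                  const std::vector<uint8_t>& native) {
    PackageHeader h{{'J','S','B','C'}, 1, bytecode.size(), native.size()};
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&h), sizeof h);
    out.write(reinterpret_cast<const char*>(bytecode.data()), bytecode.size());
    out.write(reinterpret_cast<const char*>(native.data()), native.size());
    return bool(out);        // loading reverses this: read header, then blobs
}
```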
87-96
Abstract
This article discusses a technique for writing code that can save time when writing a particular program: a technique for code instrumentation in C++. The article gives examples of algorithms for writing code for software programs and considers optimizing compilers, which can reduce build time and speed up the program.
97-110
Abstract
The scalability of link-time optimization and static analysis systems is a problem of current importance: despite the growth in performance and memory capacity of modern computers, programs grow in size and complexity just as fast. This is particularly true for such large and complex programs as browsers, operating systems, etc. This paper introduces a memory scalability approach for an LLVM-based link-time optimization system and proposes a technique for applying this approach to static analysis systems. The proposed approach was implemented and tested on the SPEC CPU2000 benchmark suite [2].
111-134
Abstract
The paper describes a practical approach to finding bugs in the source code of programs using static analysis. This approach allows some defects to be missed. The goal is to find as many defects as possible while keeping the false positive rate low and the analysis time acceptable. Various methods of static analysis, including analysis based on the abstract syntax tree and data flow analysis, were considered. All described algorithms have been implemented in the Svace static analysis tool.
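
A toy illustration of an AST-based detector of the kind mentioned above (a sketch over a simplified AST of my own, not Svace's implementation): flag an assignment used directly as an if-condition, the classic "did you mean ==?" defect.

```cpp
#include <iostream>
#include <memory>
#include <vector>

enum class NodeKind { If, Assign, Other };

struct AstNode {
    NodeKind kind = NodeKind::Other;
    std::vector<std::unique_ptr<AstNode>> children; // children[0] of If = condition
    int line = 0;
};

void check_assign_in_condition(const AstNode& n) {
    if (n.kind == NodeKind::If && !n.children.empty() &&
        n.children[0]->kind == NodeKind::Assign) {
        std::cout << "line " << n.children[0]->line
                  << ": assignment used as condition (possible typo)\n";
    }
    for (const auto& c : n.children) check_assign_in_condition(*c); // recurse
}
```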
135-150
Abstract
In this paper, we investigate the efficiency of the software transactional memory implementation in the GCC compiler. The authors propose software instrumentation tools for profiling programs that use software transactional memory, and describe a method for reducing false conflicts. The method tunes the values of transactional memory parameters in the GCC implementation (the libitm runtime library) using the profiling results (profile-guided optimization). Static instrumentation makes it possible to optimize the dynamic properties of the execution of transactional sections. The efficiency of reducing false conflicts is investigated on the STAMP benchmarks.
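
For context, a small example of a GCC transactional section of the kind being profiled (real GCC syntax, built with g++ -fgnu-tm and executed by the libitm runtime; the layout comment is my own illustration of a false-conflict concern):

```cpp
static long counter_a;
static long counter_b;   // separating hot variables can reduce false conflicts

void transfer(long amount) {
    __transaction_atomic {   // libitm tracks reads/writes, aborts on conflict
        counter_a -= amount;
        counter_b += amount;
    }
}
```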
151-158
Abstract
A program representation plays an important role in the static analysis of software. The article discusses options for program representations built at various stages of compilation, and software bug detectors working on these representations.
159-168
Abstract
The paper proposes an approach to the introspection of virtual machines using the application binary interface. The purpose of the method is to get information about the system while having minimal knowledge of its internal structure. Our system is based on the QEMU emulator and has a modular structure. Existing approaches (RTKDSM, DECAF) receive data from the operating system using the kernel structures. Those tools have to store a large number of data profiles, because all addresses and offsets in the kernel structures vary from version to version. We propose using the rarely changing application binary interface instead: calling conventions and the numbers and parameters of system calls. The idea of the method is to intercept system functions and read their parameters and return values. The processor uses a special instruction to perform a system call. We extend QEMU with an instrumentation engine, so we are able to monitor each executed instruction and filter out the desired ones. When a system call occurs, we pass control to the system call detector, which checks the number of the call and decides which plugin the job should be redirected to. In the system call interception mechanism, it is important not only to determine that a call occurred, but also to correctly determine its completion; this is needed to obtain the values of output parameters and return values. To determine the end of a system call there are also special instructions, but the beginning of a call must be correctly matched to its end, and to do so we use the current context. Thus, we have implemented monitoring of file operations and processes, and created a prototype of an API function monitor. We plan to expand the set of analysis and monitoring plugins.
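
A hedged sketch of the interception idea (hypothetical hook API, not QEMU's; register layout per the Linux x86-64 ABI): on the syscall instruction, read the number and arguments from guest registers, then pair the later return with its entry by context.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

struct GuestRegs { uint64_t rax, rdi, rsi, rdx, r10, r8, r9; };

// Keyed by context id so each syscall entry is paired with its own exit.
static std::map<uint64_t, uint64_t> pending;   // context id -> syscall number

void on_syscall_insn(uint64_t ctx, const GuestRegs& r) {
    pending[ctx] = r.rax;                      // rax holds the syscall number
    if (r.rax == 1)                            // 1 == write(2) on x86-64 Linux
        std::printf("write(fd=%llu, len=%llu)\n",
                    (unsigned long long)r.rdi, (unsigned long long)r.rdx);
}

void on_sysret_insn(uint64_t ctx, const GuestRegs& r) {
    auto it = pending.find(ctx);               // collate entry with this exit
    if (it == pending.end()) return;
    std::printf("syscall %llu returned %lld\n",
                (unsigned long long)it->second, (long long)r.rax);
    pending.erase(it);                         // rax now holds the return value
}
```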
169-188
Abstract
The paper gives a brief overview of existing approaches to inheritance in programming languages and suggests an alternative approach that keeps multiple inheritance as the general mechanism for software reuse. The approach is based on overloading and overriding, with conflict resolution at call sites based on conformance instead of full validity of the system's inheritance graph.
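
An illustration of the kind of name conflict multiple inheritance creates: C++ forces explicit qualification at the call site, whereas the approach above would resolve such conflicts at the call site automatically, based on conformance.

```cpp
#include <iostream>

struct Printer  { void describe() { std::cout << "printer\n"; } };
struct Scanner  { void describe() { std::cout << "scanner\n"; } };

struct Copier : Printer, Scanner {};   // inherits two describe() members

int main() {
    Copier c;
    // c.describe();          // error: ambiguous; the conflict appears here,
    c.Printer::describe();    // at the call site, and must be resolved there
}
```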
189-198
Abstract
Inline expansion is very important for high-performance VLIW architectures, especially for microprocessors with static scheduling. Optimizations in optimizing compilers for VLIW duplicate code aggressively and lead to long compile times. Our inlining algorithm is based on a heuristic that takes compile time into account explicitly. This makes the optimization more balanced and significantly reduces code growth and compile time compared to the common inlining approach based on minimizing runtime within constraints. Instead of using hard constraints, we trade run time for compile time in some proportion. Our heuristic predicts the effect of several key optimizations when evaluating runtime and compile time: code scheduling, global copy propagation, dead code elimination, and various loop optimizations. Predicting optimizations reduces the need for profile information, which is rarely available in practice. Our implementation of inlining includes cloning, partial inlining, and inlining across compilation modules in whole-program mode. All these factors have a dramatic impact on performance: our inlining implementation in the Elbrus optimizing compiler boosts SPEC CPU2006 benchmark performance by a factor of 1.41 at the cost of a 12% increase in compile time and a 7.7% increase in code size on average.
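
A minimal sketch of the trade-off just described (my own illustration, not the Elbrus compiler's actual heuristic): instead of a hard code-size budget, an inlining candidate is accepted when the predicted runtime saving outweighs the predicted compile-time cost at a fixed exchange rate.

```cpp
struct CallSite {
    double predicted_runtime_saving;  // est. cycles saved after scheduling,
                                      // copy propagation, DCE, loop opts
    double predicted_compile_cost;    // est. extra compile time from the
                                      // duplicated code being re-optimized
};

// trade_ratio: how much compile time we will pay per unit of runtime saving.
bool should_inline(const CallSite& cs, double trade_ratio) {
    return cs.predicted_runtime_saving > trade_ratio * cs.predicted_compile_cost;
}
```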
199-224
Abstract
In this paper, we address the performance evaluation of multi-tier cloud applications and compare a framework based on Real-Time Calculus with two classical analytical approaches: queueing-theoretic and control-theoretic ones. We focus on the capabilities of these alternatives for estimating the key Quality of Service parameter, the application response time.
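
For reference, the kind of closed-form estimate the classical queueing-theoretic approach provides (a textbook M/M/1 result, not a formula from the paper): for arrival rate $\lambda$ and service rate $\mu$ with $\lambda < \mu$, the mean response time of a single-server tier is

$$ \bar{T} = \frac{1}{\mu - \lambda}. $$

The comparison in the paper is about how well such analytical estimates, Real-Time Calculus bounds, and control-theoretic models capture the actual application response time.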
225-252
Abstract
This paper proposes an analysis of various distributed data storage systems and possible solutions to the basic problems of the field, in particular the issues of system scaling, data consistency, availability, and partition tolerance. The performed analysis has revealed consistent patterns and led to an attempt to classify the systems based on various parameters, in particular the presence or absence of specific functions and mechanisms. The choice of a distributed data storage system can then be based on this analysis and classification.
253-274
Abstract
This work examines an approach to estimating data storage reliability that accounts for both explicit disk faults and latent bit errors, as well as the procedures to detect them. A new analytical mathematical model of failure and recovery events in distributed data storage is proposed for calculating reliability. The model describes the dynamics of data loss and recovery using Markov chains corresponding to different schemes of redundant encoding. Advantages of the developed model compared to classical models for traditional RAIDs are covered. The influence of latent HDD errors is considered, while bit faults occurring in other hardware components of the machine are omitted. Reliability is estimated using new analytical formulas for calculating the mean time to the failure at which data loss exceeds the recoverability threshold defined by the redundant encoding parameters. New analytical dependencies between the storage's average lifetime until data loss and the mean time for complete verification of the stored data are given.
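
For context, the classical baseline such models are compared against (a textbook approximation, not a formula from the paper): the mean time to data loss of an $N$-disk single-parity array with per-disk mean time to failure $\mathrm{MTTF}$ and mean repair time $\mathrm{MTTR}$ is estimated as

$$ \mathrm{MTTDL} \approx \frac{\mathrm{MTTF}^2}{N(N-1)\,\mathrm{MTTR}}, $$

which ignores latent bit errors entirely; accounting for them, and for the verification procedures that detect them, is precisely what the proposed Markov-chain model adds.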
275-284
Abstract
For the efficient use of high-performance computing resources when applying computational science methods to the study of physical, biological, and social problems, one can use the approach of problem-oriented distributed computing environments. They provide users with transparent access to solutions for specific classes of applications based on the available computing resources. To increase the effectiveness of such environments, one must use problem-oriented scheduling methods, which use information about the subject area to predict the performance of computing tasks for optimal task scheduling and allocation. The article presents models of the subject area and of a problem-oriented cloud computing environment focused on supporting the development of new problem-oriented scheduling algorithms. A subject area P is defined as an ordered triple consisting of a set of basic information objects B, a set of information object classes C, and a set of functions defined over C. The problem-oriented cloud computing environment is defined as an ordered quadruple consisting of the set of nodes of a computer system N, a set of network connections E, a set of virtual machine images M, and the basic subject area P. Within a problem-oriented computing environment, the following functions for predicting task execution, depending on the values of the input parameters for each class of problems, should be defined: an estimate of the amount of output data for a given set of input parameters, and an estimate of the task execution time for given input parameters on a machine with a specified vector of performance characteristics. Since it is impossible to estimate the task execution time with perfect accuracy, the runtime estimate should be modeled as a random variable. The provided model allows task execution time and output data volume to be estimated through the collection, storage, and analysis of statistics for all problems executed in the environment.
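
Restating the definitions above in compact notation (the set symbols follow the prose; the estimator names are mine):

$$ P = (B, C, F), \qquad \mathit{Env} = (N, E, M, P), $$

and for each problem class two estimators are required: the output volume $V(\bar{x})$ for input parameters $\bar{x}$, and the execution time $T(\bar{x}, \bar{v})$ on a node with performance vector $\bar{v}$, where $T$ is modeled as a random variable fitted from accumulated execution statistics.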
285-306
Abstract
The pressing problem of developing new approaches to the implementation of service-oriented cloud environments to support large-scale scientific research is reviewed in the article. A review of modern technologies and models of cloud services and cloud computing is carried out. The relevance of using the "Everything as a Service" cloud service model in distributed research environments is shown. The architecture and implementation features of cloud computing based on the NIST reference model are investigated. A structure for a reference model of a cloud computing environment for scientific research is offered. The composition, objectives, and functions of the actors of the cloud infrastructure are described at the system level. The model introduces new actors: the regulator, the crisis manager, and the architect of composite applications. The interrelation between the actors of the cloud computing environment model and the business processes of the IT service management model is established. Examples illustrating basic scenarios of interaction between the actors of the cloud infrastructure are given. On the basis of the proposed reference model, it is planned to develop a set of models, methods, and algorithms that will make it possible to maintain a guaranteed level of IT services in a cloud specializing in solving large-scale scientific problems. The work is supported by RFBR (grant 15-29-07936).
307-314
Abstract
This system is designed for automatic workload distribution in a cluster by analyzing the workload of compute nodes and migrating VMs from overloaded nodes to underloaded ones. Besides workload stabilization, the system also provides an opportunity to reduce power consumption by unloading and suspending underloaded nodes. Workload stabilization in a cluster increases stability and reduces system response time.
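
A sketch of such a threshold policy (illustrative only, with thresholds and structures of my own): pick a VM on an overloaded node and a target underloaded node; fully drained nodes can then be suspended to save power.

```cpp
#include <optional>
#include <string>
#include <vector>

struct Node {
    std::string name;
    double load;                    // normalized CPU load, 0.0 .. 1.0
    std::vector<std::string> vms;
};

struct Migration { std::string vm, from, to; };

// Returns a (vm, source, target) migration plan if any node is overloaded.
std::optional<Migration> plan(std::vector<Node>& nodes,
                              double high = 0.8, double low = 0.3) {
    for (auto& src : nodes) {
        if (src.load <= high || src.vms.empty()) continue;
        for (auto& dst : nodes)
            if (dst.load < low && &dst != &src)
                return Migration{src.vms.back(), src.name, dst.name};
    }
    return std::nullopt;            // cluster is balanced; nothing to do
}
```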
315-334
Abstract
The exponential growth in the volume, and the increased quality, of data in current and upcoming sky surveys open new horizons for astrophysics, but they require new approaches to data processing, especially big data technologies and cloud computing. This work presents a MapReduce-based approach to a major and important computational task in astrophysics: raw astronomical image data processing.
335-344
Abstract
The theoretical basis and an algorithmic implementation of a spectral-analytical method for recognizing repeats in character sequences are proposed. The theoretical justification is based on a theorem on the equivalent representation of a character sequence by a vector of continuous characteristic functions. Fragments of the characteristic functions are compared in the standard metric of the Euclidean space of the coefficients of their Fourier series expansions in orthogonal polynomials. An essential feature of this approach is the ability to evaluate repeats at different scales. Another important feature is the possibility of efficient data parallelization. In developing the algorithms, we preferred a computation scheme with a minimal number of memory accesses, implying repeated calculations and evaluation on demand. Within this paradigm, an algorithm is proposed for calculating the expansion coefficients in orthogonal polynomials through the use of recurrence relations. It is shown that this algorithm can be effectively vectorized by computing with a fixed vector length. Parallelization and vectorization are implemented using the OpenMP standard and the Cilk Plus extension of C/C++. The developed method scales effectively, depending on the parameters of the problem and the number of processor cores, on shared-memory systems.
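
A sketch of the recurrence idea, assuming Legendre polynomials as the orthogonal basis (the paper's basis may differ): expansion coefficients of $f$ over $[-1, 1]$ are accumulated with the three-term recurrence, parallelized over sample points with OpenMP.

```cpp
#include <vector>

// c_n ~= (2n+1)/2 * integral of f(x) P_n(x) dx, via the midpoint rule.
std::vector<double> legendre_coeffs(double (*f)(double), int order, int samples) {
    std::vector<double> c(order + 1, 0.0);
    const double h = 2.0 / samples;
    #pragma omp parallel
    {
        std::vector<double> local(order + 1, 0.0);
        #pragma omp for nowait
        for (int i = 0; i < samples; ++i) {
            double x = -1.0 + (i + 0.5) * h, fx = f(x);
            double pm1 = 1.0, p = x;                  // P_0(x), P_1(x)
            local[0] += fx * pm1;
            if (order > 0) local[1] += fx * p;
            for (int n = 1; n < order; ++n) {         // three-term recurrence:
                double pn1 = ((2*n + 1) * x * p - n * pm1) / (n + 1);
                pm1 = p; p = pn1;                     // (n+1)P_{n+1} =
                local[n + 1] += fx * pn1;             //  (2n+1)xP_n - nP_{n-1}
            }
        }
        #pragma omp critical
        for (int n = 0; n <= order; ++n) c[n] += local[n];
    }
    for (int n = 0; n <= order; ++n) c[n] *= (2*n + 1) * 0.5 * h;
    return c;
}
```

Repeats are then scored by the Euclidean distance between the coefficient vectors of two fragments; the recurrence avoids storing the polynomial values, matching the minimal-memory-access scheme described above.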
345-354
Abstract
The paper describes some basic topics and use cases of cloud computing environments in various scientific laboratories. It describes the Joint Institute for Nuclear Research (JINR) activities aimed at developing the cloud infrastructure deployed in the Laboratory of Information Technologies at JINR, the applied strategies for improving the efficiency, fault tolerance, and reliability of the JINR cloud service, and also its integration with other cloud services.
355-380
Abstract
In this paper, we address power-aware online scheduling of jobs with resource contention. We propose an optimization model and present a new approach to resource allocation with job consolidation that takes the types of applications into account. Heterogeneous workloads include CPU-intensive, disk-I/O-intensive, memory-intensive, network-I/O-intensive, and other applications. When jobs of one type are allocated to the same resource, they may create a bottleneck and resource contention in the CPU, memory, disk, or network. This may result in degraded system performance and increased energy consumption. We focus on the energy characteristics of applications and show that an intelligent allocation strategy can further improve energy consumption compared with traditional approaches. We propose heterogeneous job consolidation algorithms and validate them by conducting a performance evaluation study using the CloudSim toolkit under different scenarios and real data. We analyze several scheduling algorithms depending on the type and amount of information they require.
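
A sketch of the consolidation idea (my own illustration, not the paper's algorithm): place a job on the host where the fewest jobs of the same resource type are already running, so CPU-, memory-, disk-, and network-bound jobs are mixed rather than piled onto one bottleneck.

```cpp
#include <array>
#include <vector>

enum class JobType { Cpu, Memory, Disk, Network, kCount };

struct Host {
    std::array<int, static_cast<size_t>(JobType::kCount)> running{}; // per-type counts
};

size_t pick_host(const std::vector<Host>& hosts, JobType t) {
    size_t best = 0;
    for (size_t i = 1; i < hosts.size(); ++i)
        if (hosts[i].running[static_cast<size_t>(t)] <
            hosts[best].running[static_cast<size_t>(t)])
            best = i;                 // least same-type contention wins
    return best;
}
```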
381-394
Abstract
The paper considers the problems of developing and analyzing cloud database systems. We determine the requirements for encryption to be usable in practical applications. A new notion, non-derandomizable encryption, makes this possible, and we explain the practical value of this notion as well as its links to classical notions of cryptosystem security and to the practical security of the whole cloud computing system. We show examples of simple algebraically homomorphic cryptosystems, both derandomizable and non-derandomizable. The paper concludes about the usability of the considered cryptosystems for practical cloud systems.
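
A textbook example of the algebraic homomorphism in question (unpadded RSA, stated for context; not necessarily one of the paper's examples): with public key $(n, e)$, encryption $E(m) = m^e \bmod n$ is multiplicatively homomorphic, since

$$ E(m_1)\cdot E(m_2) \equiv (m_1 m_2)^e \equiv E(m_1 m_2) \pmod{n}. $$

Being deterministic, this scheme also illustrates why randomization, and whether it can be stripped away (derandomized), matters for the security of homomorphic encryption in the cloud setting.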
395-408
Abstract
The article substantiates the necessity of using distributed computing systems for the mathematical simulation of the formation of a cloud of contaminated air during beyond-design-basis emergencies at chemical-technological facilities. We analyze the dependence of the rate of change of the concentration of harmful impurities at an arbitrary point of space, and a mathematical model of the formation of a cloud of contaminated air during high-temperature releases (explosions, fires) and spills of large quantities of toxic chemicals onto various surfaces with their subsequent evaporation. The feasibility of decomposing the problem of modeling the formation of the cloud of contaminated air is demonstrated.
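
Models of this kind are typically based on the standard advection-diffusion transport equation, stated here for context (the paper's exact model may differ): for impurity concentration $c(\mathbf{x}, t)$, wind field $\mathbf{u}$, turbulent diffusivity $K$, and source term $S$,

$$ \frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = \nabla \cdot (K \nabla c) + S. $$

The spatial domain of such an equation decomposes naturally into subdomains computed on separate nodes, which is what makes distributed computing systems well suited to the problem.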
409-420
Abstract
The paper presents the structure and individual components of a cloud service intended for solving multi-scale nanotechnology problems on supercomputer systems. The motivation for creating the cloud service was the need to integrate: (a) ideas and knowledge in this applied field, (b) specialists in this scientific field and programmers for supercomputer systems, (c) various modeling technologies and a set of applied software packages, and (d) the various computing resources available to the Institute and its partners. The result of this work is a prototype of the cloud environment, implemented as the Multilogin service, together with the applied software available on users' virtual machines. The first applications of the created service are (a) the Flow_and_Particles parallel program for supercomputer calculations of multi-scale gas-dynamic processes in the micro-channels of technical systems and (b) the Flow_and_Particles_View visualizer of distributed computation results. The developed service has increased the efficiency of scientific research in the chosen applied field.
421-440
Abstract
Information-analytical decision support systems in government and business are distributed. There are two substantially different approaches to informational, expert, and analytical support of decision-making. The first is based on using information stored in electronic databases; the second, on analyzing experts' views in real time. We propose a design solution for a cloud framework that integrates heterogeneous information, expert, and analytical tools in the cloud and provides convergence of the decision-making processes. These processes can include network expertise, cognitive modeling, visualization, and Big Data analysis. Examples of practical testing of individual components are shown.
441-450
Abstract
The paper addresses the problem of deriving preset checking experiments for non-observable FSMs. The fault model is considered to be a 'white box' in which all possible implementations are explicitly enumerated. We show that if the specification FSM has a tree structure, then it is possible to derive a complete checking experiment of polynomial length.


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)