Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 21 (2011)
Abstract
In this work we propose automatically generating OpenCL code from dependence-free loops in C/C++/Fortran programs. We use the GCC compiler's GRAPHITE infrastructure, which represents loop nests as polyhedra. We describe our implementation and experimental results; the best results are obtained for computational programs that spend most of their execution time in loop nests.
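For illustration, consider the kind of transformation the abstract describes: a loop whose iterations are independent can be mapped to an OpenCL kernel in which the loop index becomes the global work-item id. The sketch below is hand-written for exposition (the kernel and parameter names are ours), not actual GRAPHITE output:

    /* Source loop with no loop-carried dependencies:
     *     for (i = 0; i < n; i++)
     *         c[i] = a[i] * b[i];
     * A hypothetical OpenCL kernel it could be lowered to; each work-item
     * executes one iteration. Illustrative sketch only, not GCC output. */
    __kernel void vec_mul(__global const float *a,
                          __global const float *b,
                          __global float *c)
    {
        size_t i = get_global_id(0);   /* loop index -> work-item id */
        c[i] = a[i] * b[i];
    }

The host side would enqueue this kernel with a global work size of n, replacing the sequential loop.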
Abstract
Static analysis is a popular way of finding given patterns in source or binary code (e.g., coding style errors, violations of project guidelines on using specific libraries or language features, critical errors, security vulnerabilities, malicious code). In this paper we review the static analysis tool developed at ISP RAS for finding critical errors and security vulnerabilities in C/C++ source code. The tool uses interprocedural unsound dataflow analysis and performs fully automatic analysis with a 40-80% true positive rate, which is on par with the best commercial tools in this area.
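As a concrete illustration of the defect classes such interprocedural analyses target, here is a minimal hand-made C example (ours, not taken from the tool's checker set) where a possibly-NULL pointer flows across a call boundary into an unchecked dereference:

    #include <stdlib.h>
    #include <string.h>

    /* Illustrative defect: malloc() may return NULL, and the unchecked
     * result flows into a callee that dereferences it. An interprocedural
     * dataflow analysis tracks the possible NULL value into fill(). */
    static void fill(char *buf)
    {
        strcpy(buf, "data");    /* possible NULL dereference */
    }

    void make_record(void)
    {
        char *buf = malloc(16); /* may fail and return NULL */
        fill(buf);              /* NULL is not checked before the call */
        free(buf);
    }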
Abstract
Svace, a static analysis tool that finds vulnerabilities and critical errors in the source code of C/C++ programs, is developed at ISP RAS. The purpose of Svace is to find as many errors as possible with a low level of false positives and reasonable use of available resources. Important requirements for this kind of system are scalability and extensibility. The article presents a mechanism for adding detectors of new kinds of errors to Svace that preserves its scalability. The use of the mechanism is illustrated by four detectors developed with it.
Abstract
This article describes an attempt to modify and use the Avalanche tool for dynamic analysis and testing of programs that read input data from network sockets. The technique of received-data substitution is introduced, and its Valgrind-based implementation is described. An overview of the interception and handling of network system calls is provided. The results of analyzing open-source network applications are included, as well as a list of newly discovered defects.
Abstract
Designing a compilation system for general-purpose languages that takes into account the specifics of the target hardware and the most likely usage patterns requires dynamic and adaptive recompilation methods. The LLVM infrastructure is a favorable environment for researching those methods. Unfortunately, LLVM does not currently support dynamic profile collection and recompilation, and it contains only one profile-based optimization. In this work we propose and implement a system for collecting profiles from hardware interrupts, an algorithm for correcting profile estimates, and a number of profile-based optimizations. We integrated dynamic profile collection into the LLVM dynamic compiler, resulting in consistent program quality regardless of hardware architecture.
Abstract
Methods for estimating the execution time of a model of a parallel program on an instrumental computer are discussed. The methods are based on accurate prediction of the execution time of fragments of the parallel program on the target computational platform. The model was developed for SPMD programs that use explicit data exchange via a Java MPI library, and it is part of the ParJava IDE. Certain kinds of loops (homogeneous, reducible) are singled out in the model and then measured on a node of the target computational platform (a high-performance cluster). The technique reduces the prediction error and accelerates model simulation on the instrumental computer.
Abstract
Conditional execution is a hardware feature available in some CPUs that allows assigning a predicate to an instruction so that the instruction executes only if the predicate is true. In this paper we propose a method for supporting conditional execution in the instruction scheduler and discuss the advantages of this approach over a stand-alone optimization pass that precedes instruction scheduling. We have implemented the proposed optimization in the selective scheduler of the GCC compiler. Experimental results show an average improvement of 2% (and up to 16% on selected tests) on the SPECFP part of the SPEC CPU2000 benchmark.
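To illustrate the transformation that conditional execution enables, here is a source-level sketch (our own example, not from the paper): the branchy form has two control paths, while the predicated form computes a predicate and selects a value, leaving a single straight-line block that a scheduler can reorder freely. On a CPU with conditional execution the second form compiles to a predicated move instead of a branch:

    /* Branchy form: control flow limits the instruction scheduler. */
    int clamp_branchy(int x, int hi)
    {
        if (x > hi)
            x = hi;
        return x;
    }

    /* Predicated form: the predicate guards the result, so there is
     * no control-flow join and the block can be scheduled as a whole. */
    int clamp_predicated(int x, int hi)
    {
        int p = (x > hi);   /* the predicate */
        return p ? hi : x;  /* selected by the predicate, no branch */
    }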
Abstract
The paper considers a method for the automatic recovery of variables based on program execution traces. The method uses one of the techniques of dataflow analysis, reaching definitions. The paper also discusses existing approaches to the variable recovery problem.
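For readers unfamiliar with the term, reaching definitions is the classic dataflow problem of determining, for each program point, which assignments (definitions) may still be in effect there. A small example (ours, for exposition):

    /* A definition of x "reaches" a point if some path from the
     * definition to that point contains no other assignment to x. */
    int example(int c)
    {
        int x = 1;     /* d1: defines x */
        if (c)
            x = 2;     /* d2: kills d1 on this path */
        return x;      /* reaching definitions here: {d1, d2} */
    }

In the paper's setting, the analysis is applied to execution traces rather than to source code.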
Abstract
The paper describes a technology for capturing and replaying scenarios of program execution within a virtual machine. This technology enables deterministic debugging of programs and may be used in the future to implement various mechanisms of dynamic program analysis and reverse debugging.
Abstract
Virtual Machine (VM) environments are experiencing a resurgence of interest for diverse uses, including server consolidation and shared hosting. An application's performance in a virtual machine environment can differ markedly from its performance in a non-virtualized environment because of interactions with the underlying virtual machine monitor and other virtual machines.
In this paper, I describe a general approach for estimating the resource requirements of applications when they are transferred to a virtual environment. The core principle of this approach is splitting a complex workload into a combination of smaller tasks and replacing these tasks with synthetic atomic tests. Performance evaluation of the atomic tests on native hardware and in the virtual machine allows us to determine the virtualization overhead for the given platform.
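A minimal sketch of what an atomic test might look like (our own example following the general idea above, not the paper's actual test suite): a synthetic single-resource workload timed with a monotonic clock; running the same binary natively and inside the VM and comparing the elapsed times estimates the virtualization overhead for that resource:

    #include <stdio.h>
    #include <time.h>

    #define N (64 * 1024 * 1024)
    static unsigned char buf[N];

    int main(void)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++)
            buf[i] = (unsigned char)i;  /* memory-bound synthetic task */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec)
                 + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("elapsed: %.6f s\n", s); /* compare native vs. in-VM run */
        return 0;
    }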
Abstract
Many problems in software engineering, such as program refactoring, deobfuscation, and vulnerability detection, require an efficient toolset for detecting pieces of code that have similar behavior. The current state of the art in software clone detection makes it possible to find only those pieces of code that have the same syntactic structure. A more profound analysis of the functional properties of programs runs into the undecidability of the equivalence checking problem for Turing-complete models of computation. To overcome this principal deficiency of modern clone detection techniques, we suggest using equivalence relations on programs that are stronger than functional equivalence. One such equivalence is the so-called logic&term program equivalence introduced by V.E. Itkin in 1972. In this paper we build a new algorithm for checking the logic&term equivalence of programs. The algorithm is based on operations for computing the least upper bound and the greatest lower bound in the lattice of finite substitutions. Using this algorithm we also develop a procedure for solving the unification problem for sequential programs, i.e., the problem of computing the most general common instance (specialization) of a given pair of programs (pieces of code).
Abstract
The paper introduces the main notions and properties of risks of software complexes. Factors and types of such risks are considered. Preparation of input data for the analysis, prediction, and reduction of risks in complex software is discussed. The paper then describes the discovery, identification, and analysis of threats and risks in complex software. Finally, methods for reducing and removing dangerous risks, and for registering and approving the acceptable integral risk of software products, are considered.
Abstract
We suggest interpreting logical expressions in the relational restriction operator as relational expressions. More precisely, the relational restriction (R WHERE b) of a relation R by a logical expression b can be represented as the join (R <AND> B) of the relation R with a relational expression B, which is derived from the original logical expression b by replacing the logical operators AND, OR, NOT with the corresponding relational operators <AND>, <OR>, <NOT>. For some tuple T, we define the value of an attribute A as the relation with a single tuple containing the single value of the attribute in question: RELATION<A> {{a}}. A value of the attribute marked as NULL, meaning an unknown value, we define as the relation whose head includes the attribute in question and whose body includes all possible values of the type of attribute A: RELATION<A> {…}. Equality of attribute values is defined as the equijoin of these attributes represented as relations. The tuple T, which can be presented as the Cartesian product over all its attributes, is also defined as a relation RT. The validity of the tuple T, represented as the relation RT, with respect to the original logical expression b means the truth of the universal quantifier over the tuples of RT on the expression b, which amounts to the equality of the join (RT <AND> B) and RT: (RT <AND> B) = RT.
Abstract
Object-oriented DBMS theory is one of the most promising areas of modern database theory, along with deductive and temporal DBMSs. Nevertheless, the high variety of approaches and the absence of a uniform standard, both in the theoretical OODBMS area (object calculus, data models) and in the practical area (query languages, database APIs for programming languages), seriously hinder the creation of firm OODBMS foundations. The purpose of this article is to analyze the existing concepts of OODBMS structure, surveying object data models and formal mathematical models; finally, a number of key open problems of OODBMS theory are formulated.
Abstract
The Snapshot Isolation (SI) level is extensively used in commercial database systems. We developed a simple SI implementation protocol for distributed DBMSs and implemented it in Apache HBase. This work presents a performance evaluation of the protocol on an HBase cluster under an OLAP-like workload. We measured the performance of a single-node system and validated the model using the obtained results.
Abstract
In this paper we experiment with two major types of query parallelization techniques, intra- and inter-operator parallelism, and their combinations. We evaluate these techniques on a query tree with a number of join operators in a multithreaded environment. In our experiments we vary thread count, buffer size, and workload characteristics, measure the relative performance of several distinct join algorithms, and observe several modes that appear in the series of experiments.
Abstract
Many data sets to be clustered are stored in relational databases. A clustering algorithm implemented in SQL makes clustering inside a relational DBMS easier than performing it outside with alternative tools. In this paper we propose a Fuzzy c-Means clustering algorithm adapted for the PostgreSQL open-source relational DBMS.
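For reference, the standard Fuzzy c-Means iteration (the general textbook form; the paper's SQL adaptation may organize the computation differently) alternates the membership and centroid updates below until convergence, where m > 1 is the fuzziness parameter:

    % Membership of point x_i in cluster j, then the recomputed centroid c_j.
    u_{ij} = \left( \sum_{k=1}^{C}
             \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert}
             \right)^{\frac{2}{m-1}} \right)^{-1},
    \qquad
    c_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}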
Abstract
Web spam is considered one of the greatest threats to modern search engines. Spammers use a wide range of content generation techniques, known as content spam, to fill search results with low-quality pages. We argue that content spam must be tackled using a wide range of content quality features. In this paper we propose a set of content diversity features based on frequency rank distributions for terms and topics. We combine them with a wide range of other content features to produce a content spam classifier that outperforms existing results.
Abstract
Extracting information from tables is an important and rather complex part of information retrieval. For the task of extracting objects from HTML tables we introduce the following methods: determining table orientation, and processing aggregating objects (such as Total rows) and scattered headers (super-row labels, subheaders).
Abstract
Finding relationships between words in a text and articles from Wikipedia is an extremely popular task known as wikification. However, there is still no gold-standard corpus for comparing wikifiers. We present WikifyMe, an online tool for collaborative work on a universal test collection, which allows users to easily prepare tests for the two most difficult problems in wikification: word-sense disambiguation and keyphrase extraction.
Abstract
While many researchers attempt to build various kinds of ontologies by means of Wikipedia, the possibility of deriving a high-quality domain-specific subset of Wikipedia using its own category structure still remains undervalued. In this paper we prove the necessity of such processing and propose an appropriate technique. As a result, the size of the knowledge base for our text processing framework has been reduced by more than an order of magnitude, while the precision of disambiguating musical metadata (ID3 tags) has decreased from 98% to 64%.
Abstract
The paper describes research-in-progress on information systems that store semi-structured data and use heuristics to make hypotheses about the possible data structure. Such systems are intended to be more user-friendly, as data formalization is performed by answering simple questions.
Abstract
The paper describes the architecture and design of PargreSQL, a parallel database management system (DBMS) for distributed-memory multiprocessors. PargreSQL is based on the PostgreSQL open-source DBMS and exploits partitioned parallelism.


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)