
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 28, No 5 (2016)
View or download the full issue PDF (Russian)
9-10
Abstract
This issue of the Proceedings of ISP RAS presents papers on one of the most developed areas of research and development at ISP RAS: technologies for the analysis, modeling, and transformation of programs. All of the articles are based on talks presented at the Open Conference of ISP RAS in 2016.
11-26
Abstract
The method developed and described in this paper automates the detection of uninitialized values during full-system emulation. The method is of immediate interest for low-level software, such as BIOS code or an initial loader, which initializes hardware and loads the operating system. Errors in this kind of software are among the most dangerous and can lead to system shutdown. Such software is difficult to test on real hardware, so emulators of various architectures are used instead. In this work, a new method was developed that uses shadow memory to store and track the states of registers and guest system memory cells. Criteria for detecting the use of uninitialized values and for error reporting were defined; for example, the following situations fall under the criteria: an uninitialized value is used as the address for loading a value from or storing a value to memory, or a conditional jump depends on an uninitialized value or targets an uninitialized memory chunk. The method was implemented and tested on a guest system of the x86 architecture in the full-system emulator QEMU. The test system consists of a few instructions that initialize the processor and transfer control to a user application. Testing was performed on three simple examples for each of the criteria for uninitialized value detection, and the method produced correct results on all of them.
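To make the shadow-memory idea concrete, here is a minimal Python sketch; the class and criteria checks are illustrative stand-ins (the actual method instruments QEMU's emulated registers and guest memory, which is not shown here).

```python
# Minimal sketch of shadow memory for uninitialized-value tracking.
# All names are illustrative, not from the paper's QEMU implementation.

class ShadowState:
    def __init__(self):
        self.reg_init = {}   # register name -> True if initialized
        self.mem_init = {}   # address -> True if initialized

    def write_reg(self, reg, from_initialized):
        self.reg_init[reg] = from_initialized

    def reg_is_init(self, reg):
        return self.reg_init.get(reg, False)

    # Error criteria from the abstract: an uninitialized value is used
    # as a load/store address, or a conditional jump depends on one.
    def check_mem_access(self, addr_reg):
        if not self.reg_is_init(addr_reg):
            print(f"warning: address in {addr_reg} is uninitialized")

    def check_branch(self, cond_reg):
        if not self.reg_is_init(cond_reg):
            print(f"warning: branch condition in {cond_reg} is uninitialized")

shadow = ShadowState()
shadow.write_reg("eax", from_initialized=True)   # e.g. mov eax, 5
shadow.check_mem_access("ebx")                   # ebx never written -> warning
shadow.check_branch("eax")                       # initialized -> no warning
```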
27-54
Abstract
Null pointer dereferencing is a well-known bug in object-oriented programs. It can be avoided by adding special validity rules to the language in which programs are written. Are such rules sufficient to ensure the absence of these exceptions? This work focuses on null safety in an intra-procedural context, where no additional type annotations are needed, and formalizes the rules in the Isabelle/HOL proof assistant. It then proves a null-safety preservation theorem for big-step semantics in a machine-checkable way. Finally, it demonstrates that with these rules the null-safe and null-unsafe semantics are equivalent.
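As a rough illustration of what an intra-procedural validity rule enforces, here is a toy Python check that a variable is definitely non-null at each dereference; it is only a sketch of the idea, not the paper's Isabelle/HOL formalization.

```python
# Toy intra-procedural null-safety rule: a dereference is allowed only
# when the variable is known to be non-null at that point.

def check_null_safety(stmts):
    non_null = set()              # variables currently known non-null
    for stmt in stmts:
        kind = stmt[0]
        if kind == "assign_new":      # x := new T  -> x is non-null
            non_null.add(stmt[1])
        elif kind == "assign_null":   # x := null   -> x may be null
            non_null.discard(stmt[1])
        elif kind == "deref":         # x.f         -> requires x non-null
            if stmt[1] not in non_null:
                return f"possible null dereference of {stmt[1]}"
    return "null-safe"

print(check_null_safety([("assign_new", "x"), ("deref", "x")]))   # null-safe
print(check_null_safety([("assign_null", "y"), ("deref", "y")]))  # violation
```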
55-72
Abstract
The majority of software vulnerabilities originate from buffer overflows. Techniques to eliminate buffer overflows and limit their damage include secure programming, source code audit, binary code audit, and static and dynamic code generation features. Modern compilers implement compile-time and run-time protection schemes that include variable reordering, canary value insertion, and a separate stack for return addresses. Our research targets finding breaches in these compiler protection methods. We tested MSVC, gcc, and clang and found that two of these compilers have flaws that allow exploiting a buffer overwrite under certain conditions.
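For readers unfamiliar with the canary scheme the paper probes, the following toy Python model shows the mechanism: a random value sits between a local buffer and the return address and is checked on function exit. Real compilers implement this in generated machine code; the frame layout below is purely illustrative.

```python
# Toy model of the stack-canary protection scheme.
import os

def call_with_canary(user_input: bytes):
    canary = os.urandom(8)
    # illustrative frame layout: [buffer(16)][canary(8)][retaddr(8)]
    frame = bytearray(16) + bytearray(canary) + bytearray(b"RETADDR!")
    # unsafe copy: no bounds check, like strcpy into the 16-byte buffer
    frame[:len(user_input)] = user_input
    if bytes(frame[16:24]) != canary:
        raise RuntimeError("stack smashing detected")  # like __stack_chk_fail
    return bytes(frame[24:32])                         # "return address"

print(call_with_canary(b"short"))            # canary intact
try:
    call_with_canary(b"A" * 32)              # overwrites canary and retaddr
except RuntimeError as e:
    print(e)
```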
73-92
Abstract
This paper introduces a refined method for automated exploitability evaluation of discovered program bugs. During the security development lifecycle, a significant number of crashes is detected in programs. Because of limited resources, bug fixing is time consuming and needs prioritization; fixing exploitable bugs should have the highest priority. Automated exploit generation is used to solve this problem in practice: a generated exploit confirms the presence of a critical vulnerability. However, state-of-the-art publications omit modern defense mechanisms that prevent exploitation, which lowers evaluation quality. This paper considers modern vulnerability exploitation prevention mechanisms and evaluates their prevalence and efficiency. The method can be applied to program binaries and doesn't require any debug information. It is based on symbolic interpretation of traces obtained by a full-system emulator. Our method can demonstrate real exploitability for a stack buffer overflow vulnerability with a write-what-where condition even when DEP, ASLR, and stack canaries operate together. The capabilities of the implemented method were shown on model examples and real programs.
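A heavily simplified sketch of the exploitability question such a method answers: given symbolic expressions recovered from a trace, can an SMT solver steer the overwritten return address to an attacker-chosen value? The trivial "trace", the target address, and all names below are invented for illustration, and the sketch assumes the Z3 Python bindings.

```python
# Sketch: decide exploitability by asking an SMT solver whether the
# symbolic return address can equal an attacker-chosen value.
from z3 import BitVec, Solver, ZeroExt, sat

inp = [BitVec(f"in_{i}", 8) for i in range(4)]   # symbolic input bytes
# Suppose symbolic interpretation showed the saved return address's low
# 32 bits equal the 4 input bytes (a write-what-where consequence).
ret = sum(ZeroExt(24, inp[i]) << (8 * i) for i in range(4))  # little-endian

target = 0x080491E2   # hypothetical gadget / shellcode address
s = Solver()
s.add(ret == target)
if s.check() == sat:
    m = s.model()
    payload = bytes(m.eval(b, model_completion=True).as_long() for b in inp)
    print("exploitable; payload bytes:", payload.hex())
```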
93-104
Abstract
The development of malware detection techniques drives the evolution of anti-detection techniques. In this paper we discuss the possibility of creating an automatic tool for signature modification and describe our experience in designing and developing such a tool. For signature modification in Linux programs we implemented a tool based on the LLVM compiler infrastructure, and for Windows programs we used Syzygy, a post-link instrumentation and optimization tool. The former approach requires the program's source code, while the latter assumes only the presence of debug information. Diversifying and obfuscating transformations were implemented in both cases with the aim of changing a program's signature so that it no longer matches known patterns. The implemented transformations are bogus code insertion, function permutation, instruction substitution, and encryption of constant buffers. As a result, we demonstrate proof-of-concept examples confirming that it is possible to automatically change a program's signature to avoid detection by signature-based analysis. Furthermore, we explain the drawbacks of this technique and discuss further directions of development.
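The following sketch shows the flavor of two of the listed transformations (instruction substitution and bogus code insertion) on a toy three-address IR; the real passes run inside LLVM and Syzygy, and this standalone code is only illustrative.

```python
# Illustrative instruction-substitution pass over a toy three-address IR.
import random

def substitute(ins):
    op, dst, a, b = ins
    if op == "add":
        # x = a + b  ->  x = a - (-b): same semantics, new byte pattern
        return [("neg", "t0", b, None), ("sub", dst, a, "t0")]
    if op == "xor" and a == b:
        # x = a ^ a  ->  x = 0: equivalent constant, different signature
        return [("mov", dst, 0, None)]
    return [ins]

def obfuscate(code, seed=0):
    random.seed(seed)
    out = []
    for ins in code:
        out.extend(substitute(ins))
        if random.random() < 0.3:                   # bogus code insertion
            out.append(("mov", "t_dead", 0, None))  # dead, never read
    return out

prog = [("add", "x", "a", "b"), ("xor", "y", "a", "a")]
for ins in obfuscate(prog):
    print(ins)
```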
105-118
Abstract
This paper is devoted to formalizing error criteria for program static analysis based on symbolic execution. Using the original error criteria of the symbolic execution approach in static analysis leads to an excessive number of false positives. To solve this problem, we propose an alternative definition of the error criteria that reports an error only if it occurs on a certain set of input values. Examples of such sets are the set of input values for which control passes through a given program point, or the set of values for which control flows along a given path in the control flow graph. The paper discusses various ways to specify such sets of initial values, including an analysis of the resulting error criteria. We present algorithms corresponding to the error criteria and prove their correctness. Finally, we consider practical applications of these error criteria, which include classifying the warnings generated by static analysis tools; taking into account the contracts of unknown functions, especially preconditions; and using the proposed error criteria as formulas for an SMT solver. The latter application makes it possible to obtain a precise solution for a particular error criterion, including the error trace.
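The SMT application can be sketched as follows: an error is reported only if some input value both reaches the program point and triggers the error there. The division-by-zero example is invented, and the sketch assumes the Z3 Python bindings.

```python
# Error criterion as an SMT formula: reachability AND error condition.
from z3 import Int, Solver, And, sat

n = Int("n")                       # symbolic input variable
reach = n > 10                     # path condition to the program point
error = (n - 11) == 0              # e.g. divisor n - 11 becomes zero

s = Solver()
s.add(And(reach, error))
if s.check() == sat:
    print("true positive, witness input:", s.model()[n])  # n = 11
else:
    print("unreachable on the given input set; no warning")
```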
119-134
Abstract
We propose an inter-procedural static analysis tool for buffer overflow detection. It is based on a previously developed intra-procedural algorithm that uses symbolic execution with state merging. The algorithm is path-sensitive and tracks several kinds of value relations, such as arithmetic operations, cast instructions, and binary relations from constraints. In this paper we provide a formal definition of inter-procedural buffer overflow errors and discuss different kinds of such errors. We use function summaries for inter-procedural analysis, which provides a degree of natural path-sensitivity. This approach allowed us to improve the intra-procedural algorithm by tracking inter-procedural value dependencies. Furthermore, we introduce a technique to extract a sufficient condition for buffer overflow in a function; this condition is stored in the function's summary and checked at every call site. The approach was implemented as a new buffer overflow detector for the Svace static analyzer and has shown a 64% true-positive ratio on Android 5.0.2.
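A minimal sketch of the summary mechanism, with invented names: a function's summary stores a sufficient overflow condition over its parameters, and that condition is evaluated at each call site.

```python
# Summary-based inter-procedural checking: each summary is a sufficient
# condition for buffer overflow in terms of the callee's parameters.
# Names are illustrative, not Svace internals.

# copy_n(dst_size, n): overflows iff n > dst_size
summaries = {"copy_n": lambda dst_size, n: n > dst_size}

def check_call(callee, *args):
    cond = summaries.get(callee)
    if cond and cond(*args):
        print(f"buffer overflow: {callee}{args} violates its summary")

check_call("copy_n", 16, 8)    # ok
check_call("copy_n", 16, 32)   # warning: 32 > 16
```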
135-144
Abstract
Buffer overflows are among the most common and dangerous software errors, and their exploitation can lead to arbitrary code execution and system compromise. This paper considers a method for detecting memory violations based on combined (static-dynamic) analysis of binary code. The analysis relies on symbolic interpretation of the machine instructions executed during a single program run. The proposed method also abstracts from buffer sizes and can reveal the sizes that cause buffer overflow errors. The analysis can be applied to program binaries and doesn't require source code. Two techniques are proposed to improve the method's precision: loop analysis and code coverage increase. Loop analysis is one of the cumbersome problems in dynamic analysis: analyzing loop instructions separately leads to excessive constraints on input data, which causes potential false negatives. The proposed technique analyzes loops as a whole and abstracts from the number of loop iterations. One of the drawbacks of single-run analysis is insufficient code coverage, which prevents some errors from being discovered. The technique proposed to increase code coverage is based on dynamic symbolic execution: a minimal set of the discovered code paths is selected to achieve better coverage than a single run provides, and the inputs corresponding to each path in this set are used to analyze several program runs. The proposed techniques were implemented and used to discover both known and previously undisclosed bugs.
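The coverage-increasing step can be sketched as a set-cover selection over discovered paths; the greedy heuristic below is a standard stand-in and may differ from the paper's exact selection strategy.

```python
# Greedily pick a small subset of discovered paths whose union covers
# the most basic blocks; each chosen path's input drives one more run.

paths = {                      # path id -> basic blocks it covers
    "p1": {1, 2, 3},
    "p2": {3, 4},
    "p3": {1, 2, 3, 4, 5},
    "p4": {6},
}

def minimal_path_set(paths):
    uncovered = set().union(*paths.values())
    chosen = []
    while uncovered:
        best = max(paths, key=lambda p: len(paths[p] & uncovered))
        if not paths[best] & uncovered:
            break
        chosen.append(best)
        uncovered -= paths[best]
    return chosen

print(minimal_path_set(paths))   # ['p3', 'p4']
```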
145-158
Abstract
The goal of finding unreachable code is both to report warnings about possible bugs in the source code and to increase the accuracy of other analyses. The paper describes a classification of unreachable code and approaches for finding it in C/C++ programs. We describe three data-flow analysis problems: value interval analysis, excluded value analysis, and predicate analysis. Solutions to these problems are used to detect redundant expressions in conditional statements. We also describe common sources of useless warnings. The algorithms are implemented in the Svace tool developed at ISP RAS. The results are evaluated on the open source projects Android 5.0.2 and Tizen 2.3 and report the number of found warnings and their intersection.
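To illustrate how value interval analysis exposes a redundant condition, here is a toy sketch: if the interval derived for a variable makes a test always false, the guarded branch is unreachable. Svace's actual abstract domains are considerably richer.

```python
# Toy interval analysis: intersect the variable's known interval with
# the branch condition; an empty result means the branch is dead.

def interval_of_branch(x_range, op, const):
    lo, hi = x_range
    if op == "<":
        # sub-interval of x_range where "x < const" holds
        t = (lo, min(hi, const - 1))
        return t if t[0] <= t[1] else None

x_range = (10, 20)                 # derived earlier by the analysis
then_interval = interval_of_branch(x_range, "<", 5)
if then_interval is None:
    print("condition 'x < 5' is always false: unreachable then-branch")
```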
159-174
Abstract
Dynamic symbolic execution is a well-known technique used for various program analysis tasks: input generation for increasing test coverage, generation of inputs of death, exploit generation, etc. However, the huge time cost of dynamic symbolic execution for any real-life program is a well-known problem, caused by path explosion and the need to solve the path constraint of every path with SAT/SMT techniques, which is an NP-complete task in the general case. Brute-force analysis of every path in a program has limited practical value under time constraints; instead, various techniques and heuristics are used to improve analysis performance and reduce the analysis space for the specific needs of the analyst or the specific problem under analysis. We present an approach that combines static analysis of program binary code, based on the binutils library, with a dynamic symbolic execution tool based on Avalanche, an iterative dynamic analysis tool, to perform targeted input data generation for reaching a specific function in the program. As the first step of our algorithm, we extract a reduced program call graph that contains only call chains ending at the function of interest; we then augment this call graph with the control flow graphs of the functions it includes. Using the reduced control flow graph of the program, which contains only the calls and conditional jump directions that lead to the function of interest, we build a metric for the best next analysis direction. This approach allows us to significantly reduce the time needed to reach the function of interest (up to twelve times for some real-world programs) compared to brute-force path analysis that inverts every conditional jump on the execution path that depends on tainted data.
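The first step can be sketched as reverse reachability over the call graph: keep only functions from which the function of interest is reachable. The graph and names below are illustrative.

```python
# Reduce the call graph to functions that can reach the target function.
from collections import deque

call_graph = {                       # caller -> callees
    "main": ["parse", "log"],
    "parse": ["decode"],
    "decode": ["target"],
    "log": [],
    "target": [],
}

def reduced_call_graph(cg, target):
    rev = {f: [] for f in cg}
    for f, callees in cg.items():
        for c in callees:
            rev[c].append(f)
    keep, work = {target}, deque([target])
    while work:                      # BFS over reversed call edges
        for caller in rev[work.popleft()]:
            if caller not in keep:
                keep.add(caller)
                work.append(caller)
    return {f: [c for c in cg[f] if c in keep] for f in keep}

print(reduced_call_graph(call_graph, "target"))
# 'log' is dropped: no call chain through it ends at 'target'
```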
175-198
Abstract
This paper proposes two different approaches to speeding up program builds: making link-time optimization (LTO) work in parallel, and a lightweight optimization approach. The former is achieved by scaling the LTO system; the latter makes the link step faster by using summaries to drive some of the interprocedural optimization passes instead of keeping the full IR code in memory. Horizontal scaling of an LTO system leads to the problem of partitioning a large task into several subtasks that can be performed concurrently. The problem is complicated by the compiler pipeline model: interprocedural optimization passes run sequentially, and each depends on the previously executed ones. This means we can only divide the data on which the passes work, not the passes themselves: we need to divide the IR code into mostly independent parts and run LTO on them in parallel. We use call graph analysis to divide a program into parts; thus, our goal is to partition the call graph, and graph partitioning is an NP-complete problem. Nevertheless, the choice of partitioning algorithm strongly depends on the properties of the graph being divided. The main goal of our investigation is to find a lightweight graph partitioning algorithm that works efficiently on the call graphs of real programs and does not spoil the performance gains of LTO when the code pieces are optimized separately. The paper proposes a new graph partitioning algorithm for program call graphs, presents results comparing it with two other methods on the SPEC CPU2000 benchmark, and describes an implementation of the algorithm in a scalable LLVM-based LTO system. The implementation shows a 31% link speedup with 3% performance degradation for 4 threads. The lightweight optimization shows a 0.5% speedup for a single run in lazy code loading mode.
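The partitioning problem can be sketched as follows: split the call graph into k parts of balanced IR size while keeping cross-part call edges (which block cross-module optimization) low. The greedy heuristic below is only a stand-in for the algorithm proposed in the paper.

```python
# Greedy balanced partition of a call graph; cut edges are counted as a
# proxy for lost inter-module optimization opportunities.

def partition(nodes, edges, k):
    parts = [set() for _ in range(k)]
    sizes = [0] * k
    # place heavy functions first, each into the currently lightest part
    for n, w in sorted(nodes.items(), key=lambda x: -x[1]):
        i = sizes.index(min(sizes))
        parts[i].add(n)
        sizes[i] += w
    cut = sum(1 for a, b in edges
              if not any(a in p and b in p for p in parts))
    return parts, cut

nodes = {"main": 5, "f": 20, "g": 18, "h": 7, "k": 6}   # IR size estimates
edges = [("main", "f"), ("main", "g"), ("f", "h"), ("g", "k")]
parts, cut = partition(nodes, edges, 2)
print(parts, "cut edges:", cut)
```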
199-214
Abstract
Register allocation has a significant impact on the performance of generated code. This paper explores the problem of global register allocation during dynamic binary translation. Since existing algorithms are designed for compilers and are not always suitable for dynamic binary translation, a new global register allocation algorithm was developed. The algorithm decides which variables should reside in which registers before and after each basic block (the pre- and postconditions of the block) and solves the local register allocation problem under these constraints. To keep the pre- and postconditions of different basic blocks consistent, the algorithm must choose them so that for each basic block b' that precedes an arbitrary block b, the postconditions of b' are the same as the preconditions of b. This can be achieved by finding groups of arcs in the control flow graph on which these conditions must remain the same (call them synchronization points) and then choosing conditions for each synchronization point. Synchronization points are the connected components of a graph GE whose vertices are the arcs of the original CFG and whose edges connect arcs that start or end in the same basic block; this gives an efficient way to find them. To choose which variables should reside in registers at each synchronization point, the number of available registers is estimated using the register pressure in the incident basic blocks; the actual variables are then picked based on how often they are used in those blocks. Finally, the local register allocation algorithm is modified to use the precondition and ensure the postcondition of the basic block. An efficient algorithm that converts an existing allocation into the desired postcondition at the end of a basic block is presented, together with a proof that it is optimal in terms of generated spills, reloads, and register moves. The described algorithm showed a 29.6% running time decrease on a synthetic example. The modifications needed for the algorithm to run efficiently on real-life applications are also explored.
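The synchronization-point construction admits a compact sketch: arcs sharing a source block (they all see its postcondition) or a target block (they all feed its precondition) are grouped with union-find, and each resulting component is one synchronization point. Names are illustrative.

```python
# Synchronization points as connected components over CFG arcs.

def sync_points(arcs):
    parent = {a: a for a in arcs}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)
    by_src, by_dst = {}, {}
    for a in arcs:                       # arc = (src_block, dst_block)
        union(a, by_src.setdefault(a[0], a))  # same source block
        union(a, by_dst.setdefault(a[1], a))  # same target block
    groups = {}
    for a in arcs:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())

cfg_arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
for g in sync_points(cfg_arcs):
    print(g)
# (A,B),(A,C) share A's postcondition; (B,D),(C,D) share D's
# precondition; (D,E) forms its own synchronization point.
```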
215-226
Abstract
During software development, developers often copy and paste fragments of code to achieve the desired result. Copying code can lead to a variety of errors and can increase the size of the source and binary code. The problem of finding semantically similar pieces of code (clones) in binary code is made pressing by the unavailability of the source code of many programs. The first part of the article analyzes existing methods for finding code clones in binary code. In the second part we present a newly developed tool for finding code clones in binary code. The tool works in three main stages. The first stage is based on the Binnavi [1] framework, which is responsible for generating program dependence graphs (PDGs). The graphs are generated using REIL (Reverse Engineering Intermediate Language); using REIL makes it possible to generate graphs for multiple architectures (x86, x86-64, ARM, MIPS, PPC), thus making the tool independent of the target architecture. In the second stage, code clones are found based on the previously created graphs: a maximum common subgraph is built for each pair of graphs, and code clones are detected based on it. In the third stage, the detected clones are visualized for convenient analysis of the results.
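A sketch of the matching stage: clones are scored by a common subgraph between two program dependence graphs. Computing a maximum common subgraph is NP-hard, so the greedy label-based matcher below only approximates what the tool computes.

```python
# Greedy approximation of common-subgraph matching between two PDGs.
# g = {node: (label, set_of_successor_nodes)}

def greedy_common_subgraph(g1, g2):
    mapping = {}
    for n1, (lab1, _) in g1.items():
        for n2, (lab2, _) in g2.items():
            if lab2 == lab1 and n2 not in mapping.values():
                mapping[n1] = n2
                break
    # keep only node pairs whose edges agree under the mapping
    common = {(a, b) for a in mapping for b in mapping
              if b in g1[a][1] and mapping[b] in g2[mapping[a]][1]}
    return mapping, common

g1 = {1: ("load", {2}), 2: ("add", {3}), 3: ("store", set())}
g2 = {9: ("load", {8}), 8: ("add", {7}), 7: ("store", set())}
mapping, edges = greedy_common_subgraph(g1, g2)
print(mapping, "matched edges:", len(edges))   # 3 nodes, 2 edges: a clone
```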
227-238
Abstract
Software testing is a time-consuming process. In general, software companies spend about 50% of development time on testing; on the other hand, a lack of testing implies financial and other risks. For systematic testing, a suitable test suite should provide appropriate code coverage, and many code-based test generators have been developed to provide it. Such tools tend to produce large, almost unreadable test suites that are hard to verify manually; this problem is known as the Human Oracle Cost problem. This work introduces a method for optimizing the readability of automatically generated test suites in the context of symbolic execution. It uses a natural language model to optimize each test case. This idea was first applied in the search-based testing paradigm; in contrast to existing search-based tools, the proposed DSE-based tool uses an SMT solver to incrementally improve the readability of a single test for a single program path. To validate the method, a tool was created on top of the LLVM and CVC4 frameworks. Experimental evaluation on 12 string processing routines from the Linux repository shows that the method can improve test inputs gracefully.
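The readability idea can be sketched as follows: among inputs satisfying the same path constraint, prefer the one a character-level language model scores as most natural. The corpus, scorer, and constraint below are toy stand-ins for the paper's natural language model and SMT-driven search.

```python
# Prefer the most "English-like" input among those taking the same path.
from collections import Counter
from itertools import product
import math, string

corpus = "the quick brown fox jumps over the lazy dog"
bigrams = Counter(corpus[i:i+2] for i in range(len(corpus) - 1))

def naturalness(s):
    # higher is more natural; +1 smoothing for unseen bigrams
    return sum(math.log(bigrams[s[i:i+2]] + 1) for i in range(len(s) - 1))

def path_constraint(s):
    return len(s) == 3 and s[0] == "t"   # e.g. branch taken on 't...'

alphabet = string.ascii_lowercase + " "
candidates = ("t" + "".join(p) for p in product(alphabet, repeat=2))
best = max((s for s in candidates if path_constraint(s)), key=naturalness)
print(best)   # 'the' rather than an unreadable string like 'tqz'
```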
239-268
Abstract
The FlexT (Flexible Types) language is intended for specifying binary data formats. The language is declarative and designed to be easily understood by human readers. Its main elements are data type declarations, which look very much like the usual type declarations of imperative programming languages but are more flexible. In the article we first review the capabilities of modern projects oriented toward the specification of binary file formats. We then consider the main features of the FlexT language and, in particular, the features that help describe machine instruction encoding formats. Finally, we briefly describe the developed software, which is based on the FlexT interpreter, and some new information search capabilities that the specifications make possible.
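FlexT syntax is not reproduced in the abstract, so the sketch below only mimics the declarative flavor in Python: a format is a list of named, typed fields that a generic reader interprets over raw bytes. The header layout is invented.

```python
# Declarative-style binary format description, interpreted generically.
import struct

# hypothetical spec of a toy header, not actual FlexT syntax
TOY_HEADER = [
    ("magic",  "2s"),   # 2-byte signature
    ("size",   "<I"),   # little-endian uint32
    ("width",  "<H"),   # little-endian uint16
    ("height", "<H"),
]

def parse(spec, data):
    out, off = {}, 0
    for name, fmt in spec:
        (out[name],) = struct.unpack_from(fmt, data, off)
        off += struct.calcsize(fmt)
    return out

blob = b"BM" + struct.pack("<IHH", 1024, 640, 480)
print(parse(TOY_HEADER, blob))
# {'magic': b'BM', 'size': 1024, 'width': 640, 'height': 480}
```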


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)