Dynamic program analysis is a prominent approach to software quality control that enables automatic profiling, defect detection, and other activities during software development. In this paper we focus on static binary code instrumentation, a technique that automatically modifies a program's executable code in order to extract the data needed for dynamic analysis. We discuss the key features of this technique in the context of dynamic analysis and propose a method for static binary code instrumentation of ELF executable and shared library files specifically targeting the ARM architecture.
We describe the main steps of the proposed method: specification of the instrumentation and parsing of the target code, generation of executable instrumentation code, and finally modification of the target executable file to insert the instrumentation code and ensure that control is transferred from the original code to the instrumentation code and back at runtime.
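As a concrete illustration of the target code parsing step, the following minimal Python sketch (not part of the framework described in the paper) uses the pyelftools library to enumerate the function symbols of an ARM ELF file; their entry addresses could serve as candidate instrumentation points. The file name 'libtarget.so' and the selection criteria are illustrative assumptions.

# Minimal sketch of the target-code parsing step (illustrative only):
# list function symbols of an ARM ELF file as candidate instrumentation points.
# Requires pyelftools; 'libtarget.so' is a placeholder file name.
from elftools.elf.elffile import ELFFile

def find_function_entries(path):
    """Return (name, address, size) for every defined function symbol."""
    entries = []
    with open(path, 'rb') as f:
        elf = ELFFile(f)
        symtab = elf.get_section_by_name('.symtab')
        if symtab is None:
            symtab = elf.get_section_by_name('.dynsym')
        if symtab is None:
            return entries
        for sym in symtab.iter_symbols():
            if sym['st_info']['type'] == 'STT_FUNC' and sym['st_value'] != 0:
                entries.append((sym.name, sym['st_value'], sym['st_size']))
    return entries

if __name__ == '__main__':
    for name, addr, size in find_function_entries('libtarget.so'):
        print('%s entry=0x%08x size=%d' % (name, addr, size))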
Executable file modification is performed within the bounds of the ARM ELF specification and is designed to minimize the changes introduced into the actual executable code blocks. Instrumentation code is appended to target files as a set of separate sections; control transfer to the instrumentation code is implemented through unconditional jump instructions that replace small blocks of original instructions at instrumentation points. To preserve the original functionality, instrumentation code blocks are wrapped with instructions that save and restore program state; in addition, the instructions replaced at instrumentation points are relocated into the instrumentation code blocks. We also describe a set of modifications that introduce the external dependencies of the instrumentation code into the target executable files.
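To make the control-transfer mechanism concrete, the following minimal sketch shows, under simplified assumptions, how an A32 unconditional branch (B) instruction redirecting execution from an instrumentation point to an appended instrumentation section could be encoded and patched into the file image. The addresses, file offset, and helper names are hypothetical and do not reproduce the framework's actual implementation.

import struct

def encode_arm_branch(from_addr, to_addr):
    """Encode an A32 unconditional branch (B) from from_addr to to_addr.

    The 24-bit signed immediate is the word offset relative to PC,
    which on ARM points 8 bytes past the branch instruction."""
    offset = to_addr - (from_addr + 8)
    assert offset % 4 == 0 and -(1 << 25) <= offset < (1 << 25)
    imm24 = (offset >> 2) & 0x00FFFFFF
    return 0xEA000000 | imm24          # cond=AL (0xE), opcode bits for B

def patch_instrumentation_point(image, file_off, point_vaddr, trampoline_vaddr):
    """Replace one original instruction with a branch to the instrumentation code.

    Returns the displaced original instruction word so it can be relocated
    into the instrumentation block, as described above."""
    original = struct.unpack_from('<I', image, file_off)[0]
    struct.pack_into('<I', image, file_off,
                     encode_arm_branch(point_vaddr, trampoline_vaddr))
    return original

# Hypothetical usage: patch a point at virtual address 0x1000 (file offset
# 0x1000 in this toy image) to branch to instrumentation code at 0x20000.
image = bytearray(b'\x00' * 0x2000)
displaced = patch_instrumentation_point(image, 0x1000, 0x1000, 0x20000)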
The proposed method was implemented in an instrumentation framework. We provide a brief overview of practical experiments that use basic block counting and function entry/exit tracing as base instrumentation applications. The results show better performance than the popular dynamic instrumentation framework Valgrind and low overhead for system-wide tracking of native Android libraries.

This paper focuses on dynamic analysis of Java programs. We consider the following limitations: the analysis tool may not have access to the target program's source code, and the program may be interpreted by a non-standard virtual machine whose bytecode format differs from the Java Virtual Machine specification. The paper describes an approach to bytecode instrumentation that is used to perform iterative dynamic analysis for discovering new execution paths. Path discovery is performed through automatic input data generation by tracing tainted data, collecting path conditions, and checking their satisfiability. The proposed approach is based on static bytecode instrumentation. Its main advantages are analysis speedup (instrumentation is performed only once) and explicit access to the statically generated instrumented bytecode, which makes it possible to run the instrumented program on different virtual machines with different bytecode formats. The proposed approaches were implemented in the Coffee Machine tool. The sections dedicated to this tool provide a detailed description of taint data tracing and automatic branch traversal techniques, as well as a set of instrumentation utilities based on Coffee Machine that print executed instructions, dump taint traces, and generate synchronization event traces. Coffee Machine uses BCEL, a bytecode instrumentation library, to perform instrumentation. The paper concludes with an overview of practical restrictions of the discussed methods and possible directions for future work. The main disadvantage of the proposed approach is the inability to access dynamic data at run time and to instrument a set of system class methods; this may be resolved by method simulation and modifications of the execution environment.
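The path-discovery loop sketched above can be illustrated with a small, self-contained example: a run is modelled as a sequence of branch conditions over a symbolic input, the last condition of an explored prefix is negated, and an SMT solver is asked for an input that drives execution down the unexplored side. The sketch below uses the z3 solver's Python bindings purely for illustration; the modelled program, its two branches, and the trace/next_input helpers are hypothetical and are not part of Coffee Machine, whose path conditions are collected by the instrumented bytecode itself.

from z3 import BitVec, Solver, Not, sat

x = BitVec('x', 32)                      # symbolic (tainted) input

def trace(concrete_input):
    """Stand-in for a run of the instrumented program: given a concrete
    input, return the branch conditions taken as z3 expressions.
    The hypothetical program branches on x > 10 and x % 7 == 3."""
    conds = []
    conds.append(x > 10 if concrete_input > 10 else Not(x > 10))
    conds.append(x % 7 == 3 if concrete_input % 7 == 3 else Not(x % 7 == 3))
    return conds

def next_input(path):
    """Negate the last branch of an explored path prefix and ask the
    solver for an input that follows the new prefix."""
    s = Solver()
    for cond in path[:-1]:
        s.add(cond)            # keep the explored prefix
    s.add(Not(path[-1]))       # flip the last branch
    if s.check() == sat:
        return s.model().eval(x, model_completion=True).as_long()
    return None

# Drive exploration from a concrete seed input by flipping each branch in turn.
seed_path = trace(0)
for depth in range(len(seed_path), 0, -1):
    new_input = next_input(seed_path[:depth])
    if new_input is not None:
        print('input reaching a new path:', new_input)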
In 2005, I wrote an article in which I discussed the most important features of the ODMG 3.0 standard (the ODMG object model) and the SQL:2003 standard (the SQL data model) and argued, convincingly as it seemed to me, that the similarity between the object model and the object extensions of SQL is purely superficial, and that syntactic constructions close in spirit hide deep differences at the model level. Examples of such differences include von Neumann-style dereference of ObjectIDs in the ODMG model versus join-style dereference of reference values in the SQL model; separate and independent storage of objects of one and the same object type in the ODMG model versus storage of all rows of a typed table (the SQL analogue of an object type) within that table; storage of ObjectIDs within extents in the ODMG model versus storage of the objects themselves within the extent analogue in the SQL model; and so on. In the years since, I came to understand many things that I had then understood wrongly or insufficiently correctly, and gradually arrived at the following conclusions:
- the differences that seemed deep to me are not in fact deep, and indeed are not differences at the model level;
- the object extensions of SQL provide capabilities no smaller (and in fact larger) than those of the ODMG object model;
- with reasonable (from the standpoint of the database community) use of DBMSes based on the ODMG data model, one ends up creating databases, and tools to manipulate them, similar to those prescribed by the SQL data model.