We describe an approach to two-stage compilation of C/C++ programs using the LLVM compiler infrastructure that allows optimizing programs for the user's profile and target machine features, as well as deploying programs to a cloud store while transparently optimizing them and checking them for defects. The notable features of the approach are its applicability to programs written in general-purpose languages and its use of a common compiler infrastructure at all optimization and deployment stages.
In the first stage (on the host machine), the build process is transparently captured and LLVM bitcode files are generated, with proper support for archive/library files. LTO optimizations are also performed at this time, and the program's dependencies and installation structure (e.g. libraries, installed resource or documentation files) are captured as well. In the second stage (on the target machine), final code generation and installation are performed with transparent path translation. Either static code generation or dynamic JIT-based compilation is supported.
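For concreteness, the two stages can be reproduced by hand with standard LLVM tools. The sketch below is a simplified manual equivalent of the transparent capture described above; the file names, target triple, and the use of a Python driver are illustrative assumptions, not part of the approach itself.

```python
# Simplified manual equivalent of the two-stage flow (hypothetical file names).
# Stage 1 (host): compile sources to LLVM bitcode and link them into one module.
# Stage 2 (target): generate native code for the target triple and link it.
import subprocess

def run(cmd):
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# --- Stage 1: host machine ---
for src in ["main.c", "util.c"]:                           # captured from the build
    run(["clang", "-O2", "-c", "-emit-llvm", src, "-o", src.replace(".c", ".bc")])
run(["llvm-link", "main.bc", "util.bc", "-o", "app.bc"])   # whole-program bitcode
run(["opt", "-O2", "app.bc", "-o", "app.opt.bc"])          # LTO-style optimization

# --- Stage 2: target machine (e.g. an ARM device) ---
run(["llc", "-O2", "-mtriple=armv7-linux-gnueabihf", "app.opt.bc", "-o", "app.s"])
run(["clang", "--target=armv7-linux-gnueabihf", "app.s", "-o", "app"])  # needs a cross sysroot
```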
As mentioned, a cloud-based deployment strategy can also be used, allowing for additional features such as defect and vulnerability checking, since the cloud store has access to the deployed programs in LLVM bitcode form.
While optimizing the LLVM toolchain for ARM embedded devices, we achieved a memory consumption reduction of up to 10% and a 10-20% decrease in compile time.

Modern compilers work on many platforms and implement a large number of optimizations, which are not always tuned well for every target platform. In this paper we present a tool for automatic tuning of compilation parameters developed at ISP RAS. The tool uses a genetic algorithm to evolve the best compiler optimization parameters. It supports cross-compilation, parallel build and execution on remote targets, and a two-tier compilation mode for profile-guided optimizations. It also provides tools for automated analysis of the tuning results that help identify the most important compilation parameters.
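As a rough illustration of this style of search (not the ISP RAS tool itself), the sketch below evolves a set of boolean GCC flags with a simple genetic algorithm; the flag list, the benchmark source "bench.c", and the runtime-based fitness function are placeholder assumptions.

```python
# Toy genetic search over boolean GCC flags (illustration only, not the ISP RAS tool).
# The flag list, benchmark source "bench.c", and runtime-based fitness are placeholders.
import random
import subprocess
import time

FLAGS = ["-ftree-vectorize", "-fprefetch-loop-arrays", "-fif-conversion", "-funroll-loops"]

def fitness(genome):
    """Build the benchmark with the selected flags and return its runtime (lower is better)."""
    flags = [f for f, enabled in zip(FLAGS, genome) if enabled]
    subprocess.check_call(["gcc", "-O2", *flags, "bench.c", "-o", "bench"])
    start = time.perf_counter()
    subprocess.check_call(["./bench"])
    return time.perf_counter() - start

def evolve(pop_size=8, generations=10, mutation=0.1):
    population = [[random.random() < 0.5 for _ in FLAGS] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness)[: pop_size // 2]   # keep the fastest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(FLAGS))
            child = a[:cut] + b[cut:]                                 # one-point crossover
            children.append([not g if random.random() < mutation else g for g in child])
        population = parents + children
    return min(population, key=fitness)

best = evolve()
print("best flag set:", [f for f, enabled in zip(FLAGS, best) if enabled])
```

A real tuner would additionally cache fitness values, distribute benchmark runs across remote targets, and search numeric parameters (GCC --param values) in addition to boolean flags.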
We used our tool to tune GCC parameters for the ARM platform and demonstrated how it can be used to improve the performance of several popular applications. Using the tool's automated analysis of the tuning results, we found several deficiencies in GCC compiler optimizations. The affected optimizations include if-conversion (conversion of conditional branches into predicated form), prefetching (insertion of data prefetching instructions in loops), and autovectorization. We identified the reasons for suboptimal code generation and improved the affected optimizations accordingly. We also describe a few other improvements that we made to GCC based on the results of manual assembly code analysis. The described improvements to GCC yield significant speedups for the chosen test applications on the ARM platform.

QEMU is a generic machine emulator based on a dynamic binary translation approach. This paper evaluates possible improvements to the local register allocation algorithm in QEMU. The local register allocation problem is NP-hard, but there are several approximate heuristic algorithms. Among them, the Furthest-First and Clean-First heuristics are known to produce a near-optimal result in linear time.
The local register allocation algorithm in QEMU scans instructions in order. For each instruction operand that is not yet held in a register, a free register is allocated. If there are no free registers, an arbitrary register is spilled. At basic block ends and helper function calls, all global variables are also spilled.
This paper proposes two improvements to the register allocation algorithm. The first is to improve spill choices by using the heuristic from the Furthest-First algorithm. The second is to aggressively spill global variables that will be spilled anyway because of helper function calls and basic block ends. Both improvements were implemented and tested on practical workloads. Profiling results show that the number of generated spills was reduced by approximately 1%.
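A minimal sketch of the first improvement is given below: when no free register is available, the value whose next use lies furthest ahead in the instruction stream is chosen as the spill victim (the Furthest-First heuristic) instead of an arbitrary one. The data structures are invented for illustration and do not correspond to QEMU's TCG internals.

```python
# Furthest-First spill choice (illustration; the data structures are invented
# and do not correspond to QEMU's TCG internals).
def next_use_distance(value, future_ops):
    """Index of the next instruction that reads `value`, or infinity if it is never read again."""
    for i, op in enumerate(future_ops):
        if value in op.inputs:
            return i
    return float("inf")

def choose_spill_victim(regs, future_ops):
    """regs maps physical registers to the values they currently hold."""
    return max(regs, key=lambda r: next_use_distance(regs[r], future_ops))

def allocate(value, regs, free_regs, future_ops, emit_spill):
    """Place `value` in a register, spilling the furthest-used value if none is free."""
    if free_regs:
        reg = free_regs.pop()
    else:
        reg = choose_spill_victim(regs, future_ops)
        emit_spill(reg, regs[reg])          # store the evicted value back to memory
        del regs[reg]
    regs[reg] = value
    return reg
```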
With these improvements, over 96% of spills are generated due to function calls and basic block ends. This implies that further improvements to this algorithm are not possible without relaxing its existing constraints.

In this paper we evaluate two approaches to full-system deterministic replay. Both of them allow replaying the guest OS and applications without modification. Deterministic replay is intended for debugging and dynamic analysis of the system core, multithreaded and non-deterministic applications, cross-platform applications, and device drivers.
The presented approaches differ in the boundary dividing the non-deterministic "outer world" from the deterministic part of the simulator. All inputs from the "outer world" are written into a log to allow later replay of the deterministic part. Common to both approaches is that the deterministic part includes the CPU, RAM, and video adapter. This allows debugging, tracing, and analysis of the replayed code.
In the first approach, the "outer world" is represented by inputs: keyboard, USB devices, mouse, network adapter, and microphone. All virtual peripheral devices must be deterministic in this approach. In the second approach, all emulator parts except the CPU, RAM, and video adapter are considered external. This means that all interactions between the CPU and virtual peripheral devices (including IO, MMIO, and DMA transactions) are written into the replay log.
The first approach has the following advantages: the whole virtual machine can be replayed for device driver debugging; it requires relatively few changes within the simulator; and the replay log uses little storage. Its drawback is the need to support all external interfaces of the virtual machine. The second approach is the opposite: it requires changes in only a few interfaces, but every virtual device with a DMA interface must be modified. It also generates larger replay logs.
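The record/replay principle shared by both approaches can be sketched as follows: during recording, every input that crosses the deterministic boundary is appended to a log together with the instruction count at which it arrived; during replay, the log is read instead of the real device. The event format below is an illustrative simplification.

```python
# Simplified record/replay of non-deterministic inputs (the event format is invented).
import json

class ReplayLog:
    def __init__(self, path, record):
        self.record = record
        self.f = open(path, "w" if record else "r")

    def input_event(self, icount, kind, read_real_device):
        """On record: read the real device and log the result together with the
        instruction count. On replay: return the previously logged result instead."""
        if self.record:
            data = read_real_device()
            self.f.write(json.dumps({"icount": icount, "kind": kind, "data": data}) + "\n")
            return data
        event = json.loads(self.f.readline())
        assert event["icount"] == icount and event["kind"] == kind, "replay diverged"
        return event["data"]
```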
We evaluated time and space overheads for the first approach. The slowdown is very low: under 1% in a Windows XP loading scenario and about 25% for network operations. The replay log grows at about 50 KB per second during basic guest OS execution.

This paper investigates the process of binary code analysis. To achieve typical goals (such as extracting algorithms and data formats, exploiting vulnerabilities, revealing backdoors and undocumented features), a security analyst needs to explore control and data flow, reconstruct functions and variables, and identify input and output data. Traditionally, disassemblers and other static data flow analysis tools have been used for these purposes. However, since developers take steps to protect their programs from analysis (for example, code being unpacked or decrypted at runtime), static analysis may not yield results.
In such cases we propose using dynamic analysis (analysis of program execution traces) to complement static analysis. The problems that arise in the analysis of binary programs are discussed, and corresponding ways to automate their solution are suggested. The core of the proposed method consists of whole-system tracing and successive representation lifting: OS-aware events, process/thread identification, and fully automated control and data flow reconstruction. The only manual step is searching for anchor instructions in the trace, e.g. I/O operations, which serve as input criteria for another automated step: precise algorithm extraction by trace slicing. The final step of the method constructs static test-case code suitable for further analysis in tools such as IDA Pro.
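The central automated step, extraction of an algorithm by slicing backward from anchor instructions, can be sketched as follows; the trace record format is hypothetical, and a real implementation would also track memory addresses and OS-level context.

```python
# Backward dynamic slicing over an instruction trace (the trace format is hypothetical).
# Each trace entry records the locations it defines and the locations it uses.
def backward_slice(trace, anchors):
    """Return indices of trace entries that the anchor instructions depend on."""
    anchors = set(anchors)
    in_slice = set()
    live = set()                                   # locations whose definitions we still need
    for i in range(len(trace) - 1, -1, -1):
        entry = trace[i]
        if i in anchors or live & set(entry["defs"]):
            in_slice.add(i)
            live -= set(entry["defs"])             # these definitions are now explained
            live |= set(entry["uses"])             # ...but their inputs must be explained too
    return sorted(in_slice)

# Example: slice a tiny trace backward from an anchor "write" at index 3.
trace = [
    {"defs": ["eax"], "uses": ["input"]},          # 0: eax = read(input)
    {"defs": ["ebx"], "uses": []},                 # 1: ebx = 5
    {"defs": ["eax"], "uses": ["eax", "ebx"]},     # 2: eax = eax * ebx
    {"defs": [], "uses": ["eax"]},                 # 3: write(eax)   <- anchor
]
print(backward_slice(trace, [3]))                  # -> [0, 1, 2, 3]
```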
We implemented the proposed approach in an environment for dynamic analysis of binary code and evaluated it against a model example and two real-world examples: a program license manager and a malware program. Our results show that the approach successfully explores algorithms and extracts them from whole-system traces. The required effort and time are significantly reduced compared with a traditional disassembler and interactive debugger.

Control flow obfuscation is one of the widespread methods used to protect application binary code from analysis. Obfuscating transformations dramatically increase the complexity of separating and recognizing the algorithm and data structures. This paper provides a short survey of obfuscating transformations and tools, focusing on control flow obfuscations such as opaque predicates, dispatchers, and code virtualization. Our research is concentrated on de-obfuscation based on dynamic binary code analysis methods. An approach to locating potentially opaque predicates and accompanying obfuscations (such as dead and redundant code insertion) in a program trace is given. The main idea underlying the described method of opaque predicate detection is extraction of the predicate code via dynamic program slicing. Next, we apply heuristics to automatically reduce the predicate list, and our tool then provides the analyst with information about potentially opaque predicates found in the application binary code.
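A much cruder heuristic than the slicing-based method described above, shown here only to illustrate the idea of trace-based predicate screening, is to flag conditional branches that execute many times but always take the same direction:

```python
# Crude opaque-predicate candidate filter over a branch trace (illustration only).
# branch_trace: iterable of (branch_address, taken) pairs collected during execution.
from collections import defaultdict

def opaque_candidates(branch_trace, min_hits=10):
    outcomes = defaultdict(set)
    hits = defaultdict(int)
    for addr, taken in branch_trace:
        outcomes[addr].add(taken)
        hits[addr] += 1
    # A branch executed many times with a single outcome may be an opaque predicate
    # (or simply an invariant check); real tools confirm candidates by slicing the
    # predicate computation and checking that it does not depend on program input.
    return [addr for addr in outcomes if hits[addr] >= min_hits and len(outcomes[addr]) == 1]

print(opaque_candidates([(0x401000, True)] * 50 + [(0x401080, True), (0x401080, False)]))
```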
Another control flow obfuscation handled by the described tool is code virtualization. The algorithm for control flow graph recovery of virtualized code is based on virtual program counter analysis. The corresponding tool was implemented and successfully tested on several applications protected by various virtualization obfuscators (VMProtect, CodeVirtualizer, Safengine, etc.). The experimental results described in this paper demonstrate the practical applicability of the implemented method to de-obfuscating virtualized applications.

We describe the usage of programming language constraints to achieve program security and portability, which are especially important for large projects. The existing collections of such constraints (like MISRA C++, JSF, or HICPP) contain only descriptive natural-language rule definitions, which can be ambiguous or incomplete.
We propose a formal model for constraint definition that allows specifying stylistic, syntactic, and contextual rules. We also give a constraint classification that splits all constraints into 4 disjoint groups determining constraint complexity and optimal checking order. To solve the problem of automatic rule checking, we have developed an analyzer based on the Clang C/C++ compiler maintained as part of the LLVM project. We also describe some specific details of the analyzer implementation: an overview of the basic components; the set of specially developed and well-known static analysis algorithms used for discovering constraint violations; the rule ordering approach; and the use of permanent external storage (an SQLite database) for intermodule analysis and for working with error messages (sorting, history). We also provide analyzer integration with popular build systems so that the source files used in the build process are analyzed automatically.
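As a small illustration of this kind of rule checking (not the analyzer described here, which is built on Clang's C++ libraries), the sketch below uses the libclang Python bindings to report a typical HICPP/MISRA-style rule, "do not use goto":

```python
# Minimal "no goto" rule checker built on the libclang Python bindings
# (an illustration only; the analyzer described above is built on Clang's C++ API).
import sys
import clang.cindex

def check_no_goto(filename, compile_args=()):
    index = clang.cindex.Index.create()
    tu = index.parse(filename, args=list(compile_args))
    violations = []
    for node in tu.cursor.walk_preorder():
        if node.kind == clang.cindex.CursorKind.GOTO_STMT:
            violations.append((node.location.file.name, node.location.line))
    return violations

if __name__ == "__main__":
    for path, line in check_no_goto(sys.argv[1], sys.argv[2:]):
        print(f"{path}:{line}: rule violation: 'goto' statement is not allowed")
```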
The implemented system is able to check approximately 50 different C and C++ constraints and requires only 20% more time than a regular optimized build.

In this paper we explore the prospects of applying virtualization technologies to x64-based high performance systems. The main reasons for performance overhead when running parallel programs inside multiple virtual machines are discussed. We consider the KVM/QEMU and Palacios systems and use the HPC Challenge and NASA Advanced Supercomputing Parallel Benchmarks packages as a test suite. All tests are performed on a high performance cluster with a high-speed InfiniBand interconnect.
The gathered results show the feasibility of applying virtualization to a large class of high performance applications. We use huge memory pages to decrease the number of TLB misses and a NUMA emulation mechanism to expose the memory topology of the real system to the virtual machine. These optimizations decrease virtualization overhead from 10-60% to 1-5% for most tests from the HPC Challenge and NASA Advanced Supercomputing Parallel Benchmarks suites. The main bottlenecks of virtualization systems are decreased memory system performance (critical only for a narrow class of applications), device virtualization overhead, and an increased noise level caused by the host operating system and hypervisor. Noise may affect the performance and scalability of fine-grained applications (those with frequent small communications). As the number of nodes in the system grows, the influence of noise increases substantially.

The paper describes research on formal methods of conformance testing of a target system against requirements given in specifications. Such testing is based on an interaction semantics defining test stimuli and observations of actions and refusals (absence of actions). Unobservable actions and refusals are also possible. Destruction is introduced as a forbidden action that should be avoided during interaction. A notion of safe testing is also introduced, under which no unobservable refusals or destruction occur and no test stimuli are applied in divergence. On this basis, the implementation safety hypothesis and the safe conformance relation are defined, as well as the generation of a complete test suite from the specification.
The most common specification model is the LTS (Labelled Transition System). However, for the described interaction semantics, only traces (sequences of observations) are important, not the LTS states. Therefore, the most natural model is the trace model, defined as the set of LTS traces.
The goal of this paper is to define a subset of specification traces sufficient for generation of the complete test suite. We call such a subset the final trace model of the specification. On the other hand, the LTS model is convenient as a finite representation of regular trace sets. To represent the final trace model, the paper proposes a variation of LTS called final RTS (Refusal Transition System), in which transitions on observable refusals are defined explicitly. Such a model is very convenient for test generation: 1) it is deterministic; 2) a trace of observations is safe iff it ends in a non-terminal state with no destruction; 3) a test stimulus is safe after a trace iff it is safe in the final state of that trace.
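The determinism of the final RTS is what makes these properties easy to exploit during test generation. As a rough analogue (ignoring refusals, divergence, and destruction, which the final RTS treats explicitly), the sketch below shows the usual subset construction that turns a non-deterministic LTS with internal τ-actions into a deterministic transition structure over observations:

```python
# Subset construction for an LTS with internal (tau) actions -- a simplified analogue of
# building a deterministic model; refusals, divergence, and destruction are not modelled.
TAU = "tau"

def tau_closure(states, transitions):
    """All states reachable from `states` via tau transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (src, label, dst) in transitions:
            if src == s and label == TAU and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def determinize(initial, transitions):
    """Deterministic transition map: (macro-state, observable action) -> macro-state."""
    start = tau_closure({initial}, transitions)
    visited, worklist, delta = {start}, [start], {}
    while worklist:
        current = worklist.pop()
        labels = {l for (s, l, _) in transitions if s in current and l != TAU}
        for label in labels:
            targets = {d for (s, l, d) in transitions if s in current and l == label}
            nxt = tau_closure(targets, transitions)
            delta[(current, label)] = nxt
            if nxt not in visited:
                visited.add(nxt)
                worklist.append(nxt)
    return start, delta

# Example: an internal tau step followed by an observable "req" action.
trans = [(0, TAU, 1), (1, "req", 2), (0, "req", 3)]
start, delta = determinize(0, trans)
print(delta[(start, "req")])   # frozenset({2, 3})
```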
The paper proposes algorithms for transforming an LTS model into a final RTS model. Sufficient conditions for constructing the final RTS in finite time are also defined.

Recently, a number of dynamic analysis tools have been developed that perform tainted data flow tracing and use algorithms for solving the CNF SAT problem for input data generation and path alteration. The SAT problem is known to be NP-complete, and many optimizations are required to partially mitigate the exponential growth problem. This problem arose in the Avalanche dynamic analysis project for automatic defect detection. In this paper we propose a modification of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm that uses intermediate result caching.
For path condition solving, Avalanche uses STP, which in turn uses the MiniSAT solver for Boolean formula satisfiability checking. MiniSAT's main mechanism is based on a variation of the DPLL algorithm. We describe special properties of the formula sets generated by dynamic analysis tools: formulas in a generated set frequently have common parts. We introduce a DPLL optimization based on this peculiarity. Formula solving starts from previously cached solver results and falls back to backtracking in case of failure. The described optimization is illustrated with simple examples.
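A toy version of the idea is sketched below: a plain DPLL procedure whose verdicts are memoized on the residual clause set, so that a later query sharing a sub-formula with an earlier one can reuse the cached result. This is only a schematic illustration; the actual modification works inside MiniSAT and caches partial solver state rather than whole clause sets.

```python
# Toy DPLL with memoization of residual clause sets (illustration of result caching;
# the actual optimization works inside MiniSAT on partial solver state).
# A formula is a frozenset of clauses; a clause is a frozenset of non-zero int literals.
cache = {}

def simplify(clauses, literal):
    """Assign `literal` true: drop satisfied clauses, remove the negated literal."""
    out = set()
    for c in clauses:
        if literal in c:
            continue
        reduced = c - {-literal}
        if not reduced:
            return None                      # empty clause -> conflict
        out.add(frozenset(reduced))
    return frozenset(out)

def dpll(clauses):
    if clauses in cache:                     # reuse a verdict computed for a shared sub-formula
        return cache[clauses]
    if not clauses:
        result = True
    else:
        unit = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        if unit is not None:                 # unit propagation
            reduced = simplify(clauses, unit)
            result = reduced is not None and dpll(reduced)
        else:
            lit = next(iter(next(iter(clauses))))   # branch on some literal
            for choice in (lit, -lit):
                reduced = simplify(clauses, choice)
                if reduced is not None and dpll(reduced):
                    result = True
                    break
            else:
                result = False
    cache[clauses] = result
    return result

f1 = frozenset({frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2, 3})})
print(dpll(f1))                              # True; residual sub-formulas are now cached
```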
The described mechanism was evaluated on randomly generated sets of Boolean formulas using an implemented SAT solver prototype. The modified algorithm shows a noteworthy increase in efficiency in comparison with the standard DPLL approach.

This survey considers methods and techniques used in modern static verification tools for C programs. It describes the two main approaches, Counterexample-Guided Abstraction Refinement (CEGAR) and Bounded Model Checking (BMC), and techniques used to implement them efficiently, such as Predicate Abstraction, Abstract Reachability Tree, Lazy Abstraction, Configurable Program Analysis, Explicit Analysis, Interpolation, and Shape Analysis. The paper also discusses current capabilities of the tools, such as supported C programming language constructs, scalability, properties being checked, and trustworthiness of analysis results.
We describe such static verification tools as BLAST, CPAchecker, HSF(C), SATABS, SLAM, WOLVERINE, YOGI, CBMC, ESBMC, LLBMC, FSHELL, and PREDATOR. This description shows the techniques implemented in these tools and their current capabilities. The paper presents results of the 1st International Competition on Software Verification in the DeviceDrivers64 category, which contains verification tasks based on device drivers from Linux kernel 3.0.
Specifics of device driver verification are discussed, and existing driver verification systems are described, including Microsoft SDV for the Windows operating system, and DDVerify, Avinux, and Linux Driver Verification for Linux.
The paper concludes that BMC-based tools work well for programs of limited size and control flow complexity. Regarding verification of device drivers, this means these tools are able to quickly find violations of the checked properties if the paths to these violations are quite short, but they mostly fail to prove correctness and to find complicated bugs. CEGAR-based tools demonstrate better scalability, but they have problems handling address arithmetic and complex memory structures. For future improvements in static verification of C programs and Linux device drivers, we propose a composition of various techniques and modularization of analysis.

The fast evolution of the Linux kernel and its drivers, developed by a large community of programmers distributed all over the world, has led to a situation where there is no common base of rules that completely describes correct interaction between drivers and the kernel. This is an obstacle both for programmers who do not have expert knowledge of all peculiarities of this interaction, and for the development and application of tools that could find the corresponding typical faults automatically. This paper presents a method to detect and classify typical faults and the corresponding rules. The method is based on analysis of changes made to Linux operating system drivers.
The paper gives results of applying the method to stable versions of the Linux kernel from 2.6.35 to 3.0, covering the period from October 26, 2010 to October 26, 2011. We analyzed 1503 unique commits to drivers in total, marked 396 of them (about 27%) as fixes of typical faults, and provided a classification and a distribution by classes for these typical faults. We distinguished 3 classes of typical faults: generic (faults all C programs are subject to), specific (faults related to misuse of the Linux kernel API), and synchronization (faults related to parallel execution). We found that specific faults constitute about 50% of all typical faults. Each typical fault was then ascribed to one of 21 first-level subclasses: 12, 7, and 2 for the specific, generic, and synchronization classes, respectively. We found that more than 50% of typical faults fall into 5 first-level subclasses, and more than 80% into 14 first-level subclasses. The most frequent faults were race conditions during parallel execution, which account for about 17% of all typical faults. Second and third places, with about 9% each, were occupied by leaks of specific resources and null pointer dereferences.

In this paper, we consider the scheduling problem for strictly periodic real-time systems. Strict periodicity means that the release points of all instances of any task must form an arithmetic progression. Classical scheduling algorithms, such as Earliest Deadline First (EDF) and Rate-Monotonic Scheduling (RMS), are not applicable under the strict periodicity constraint. The main problem is to find release points for all tasks; this search is an NP-hard problem.
First, we study some necessary schedulability conditions for both the classical and the strictly periodic schedulability problems. Next, we suggest a scheduling algorithm that consists of two main parts: a heuristic algorithm for release point search, and a scheduling algorithm for given release points. The release point search is based on ideas from number theory that allow iterating not over all possible release points but only over points that yield a correct schedule with high probability. The scheduling algorithm for given release points is based on the ideas of the EDF algorithm.
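The second part, scheduling for fixed release points, can be illustrated by the following sketch, which simulates EDF-style dispatching over one hyperperiod on a single processor and reports whether every job meets its deadline. The task parameters are hypothetical, deadlines are assumed implicit (equal to the next release), and preemption overhead is ignored.

```python
# EDF-style feasibility check for fixed release offsets on one processor (illustration only).
# Each task is (offset, period, wcet); each job's deadline is assumed to be its next release.
from functools import reduce
from math import gcd

def edf_feasible(tasks):
    """Return True iff no job misses its deadline within one hyperperiod
    under a unit-time-slot EDF simulation."""
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b), (p for _, p, _ in tasks))
    jobs = [[o + k * p, o + (k + 1) * p, c]            # [release, deadline, remaining work]
            for o, p, c in tasks for k in range(hyperperiod // p)]
    for t in range(hyperperiod):
        ready = [j for j in jobs if j[0] <= t and j[2] > 0]
        if ready:
            min(ready, key=lambda j: j[1])[2] -= 1     # run the earliest-deadline job
        if any(j[1] <= t + 1 and j[2] > 0 for j in jobs):
            return False                               # some job's deadline has passed
    return True

# Release offsets 0 and 1 make these two strictly periodic tasks schedulable.
print(edf_feasible([(0, 4, 2), (1, 4, 2)]))            # True
print(edf_feasible([(0, 4, 3), (0, 4, 2)]))            # False: demand exceeds capacity
```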
Finally, we present a developed tool set that provides automated scheduling of given tasks, visualization of the generated schedule, a GUI for editing the schedule, and validation of built schedules. This tool set is part of a workbench for the design of modern avionics systems developed at ISP RAS.

In 1993 Coffman and Shor proposed an on-line strip packing algorithm with an expected unpacked (wasted) area of order O(n^{2/3}) in the standard probabilistic model, where n is the number of rectangles. In the standard probabilistic model, the width and height of each rectangle are independent random variables uniformly distributed in [0,1]. It is well known that an optimal packing has an expected waste of order O(n^{1/2}), and there exists an off-line polynomial algorithm that achieves this bound. For more than 10 years the question of whether a similar upper bound can be obtained in the class of on-line strip packing algorithms remained open. It was also known that within the class of popular so-called shelf algorithms the bound O(n^{2/3}) cannot be improved. In this paper we propose a new strip packing algorithm that differs from all known on-line strip packing algorithms. We analyze our algorithm experimentally and show that the upper bound on the expected unpacked area is of order O(n^{1/2}), which is optimal up to a constant factor. Our experiments show that this constant factor is close to 1.5-1.6. The experiments were run with up to 2000000000 random rectangles. The only restriction is that we must know the number n of rectangles in advance; in the common terminology of on-line algorithms this means that our algorithm is a closed-end on-line algorithm.
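The experimental setting can be illustrated as follows: draw n rectangles from the standard probabilistic model, pack them with some on-line strategy, and measure the unpacked area below the resulting packing height. The next-fit shelf packer below is used only to make the waste measurement concrete; it is neither the Coffman-Shor algorithm nor the algorithm proposed in the paper, and its waste is far from the O(n^{1/2}) bound.

```python
# Waste measurement in the standard probabilistic model (illustration only;
# the simple next-fit level packer below is NOT the algorithm proposed in the paper).
import random

def next_fit_level(rects):
    """Next-fit level packing in a strip of width 1: returns the total height used."""
    total_height = 0.0
    shelf_width_used = 0.0
    shelf_height = 0.0
    for w, h in rects:
        if shelf_width_used + w > 1.0:               # rectangle does not fit: open a new shelf
            total_height += shelf_height
            shelf_width_used, shelf_height = 0.0, 0.0
        shelf_width_used += w
        shelf_height = max(shelf_height, h)
    return total_height + shelf_height

def expected_waste(n, trials=5):
    waste = 0.0
    for _ in range(trials):
        rects = [(random.random(), random.random()) for _ in range(n)]
        area = sum(w * h for w, h in rects)
        waste += next_fit_level(rects) * 1.0 - area  # strip width is 1
    return waste / trials

for n in (1000, 10000, 100000):
    print(n, round(expected_waste(n), 1))
```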
The paper thus proposes a new on-line strip packing algorithm that provides a quality of packing significantly higher than the well-known Coffman-Shor algorithm.

A new method for the investigation of ideals in polynomial rings was proposed by B. Buchberger in 1965. He proposed to use a special basis in such rings, named a "Gröbner basis" in honor of his teacher. The proposed approach made it possible to prove the algorithmic reducibility of finding the solutions of a system of algebraic equations in many variables to the problem of solving an algebraic equation in one variable. However, practical use of the method requires enormous computational cost. The first estimate of the computational complexity of this kind was obtained by G. Hermann in 1925, when the concept of a Gröbner basis was not yet known. The obtained upper bound, for an arbitrary basis, on the degrees of the polynomials in the decomposition of an ideal element is doubly exponential in the number of variables and the maximum degree of the input polynomials. As T. W. Dube showed later, this estimate cannot be improved.
In 1996 K. Kühnle and E. W. Mayr proved that for Boolean ideals the upper bound on the required memory is exponential in the input size. For zero-dimensional ideals, A. Hashemi and D. Lazard proved the exponential complexity of finding a Gröbner basis. Here we prove a lower bound on the computation of such bases by giving an example of an ideal whose Gröbner basis has size exponential in the input.
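For context (a schematic restatement of standard facts, not a result of the paper), the degree behaviour of Gröbner bases is doubly exponential in the number of variables in the general case, while the special cases mentioned above admit singly exponential bounds:

```latex
% Schematic shape of the known bounds (d = maximum degree of the input polynomials,
% n = number of variables); constant factors are omitted.
\[
  \text{general ideals:}\qquad \deg(G) \;\le\; d^{\,2^{O(n)}}
  \quad\text{(doubly exponential in } n\text{)},
\]
\[
  \text{Boolean and zero-dimensional ideals:}\qquad 2^{O(\text{input size})}
  \quad\text{(singly exponential)}.
\]
```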