Vol 28, No 1 (2016)

Profile-based optimizations for JavaScript programs

V. . Vardanyan

PDF (Rus)

5-20

Abstract

In recent years, JavaScript has become one of the most popular programming languages on the web. Many massive applications are written using JavaScript, such as Gmail, Google docs, etc. JavaScript is also used in the Node.js - server-side web application developing platform. Moreover, JavaScript is the main language for developing applications on some operating systems for mobile and media devices. Examples of such systems are Tizen and FirefoxOS. Due to increasing popularity of JavaScript many big companies produced and continue to develop their own dynamic compilers for this language. JIT compilation makes it possible to implement many well-known classic optimizations to improve programs performance. To maintain a trade-off between quick startup and doing sophisticated optimizations, JavaScript engines usually use multiple tiers for compiling hot functions: lower tier JITs generate less efficient code, but can start almost immediately (e.g., even with interpretation), while higher tier JITs aim at generating very effective code for hot places, but at the cost of long compilation time. So even highly optimized JavaScript execution engines require some time to "warm-up" before reaching their peak performance. This work is dedicated to the performance improvement of modern dynamic multitier JIT compilers by designing and implementing profile-based optimizations in JavaScript V8 JIT compiler.

C# static analysis framework

V. . Koshelev, V. . Ignatyev, A. . Borzilov

PDF (Rus)

21-40

Abstract

The paper describes static analysis techniques that are used for defect detection in C# programs. The goal of proposed analysis approaches is to catch more defects within an acceptable amount of time. Although the paper contains a description of full analysis cycle, it mainly focuses on special aspects that distinguish C# analysis approaches from well-known Java and C++ techniques. The paper illustrates methods that allow taking into account C# specialties of all analysis stages: call graph and control flow graph construction, data flow analysis, context- and path-sensitive interprocedural analysis. We propose an adaptation of symbolic execution methods inspired by Bounded Model Checking and Saturn Software Analysis Project. The paper also explains the organization of memory model, which is suitable for both a precise intraprocedural analysis and a creation of compact function-bound conditions used for interprocedural analysis. Special attention is paid to optimization of condition size and simplicity during a path sensitive-analysis. The conditions produced by a path-sensitive analysis are supposed to be solved by modern SMT solvers like Microsoft Z3 Prover. Different approaches to external functions modeling are covered. All proposed approaches are implemented in the SharpChecker static analysis tool and, as evaluated on several open source C# projects of varying size (1K - 1.35M lines of code), display good results and scalability.

Summary-based method of implementing arbitrary context-sensitive checks for source-based analysis via symbolic execution

A. . Dergachev, A. . Sidorin

PDF (Rus)

41-62

Abstract

A specific approach to summary-based interprocedural symbolic execution is described. The approach is suitable for analysis of program source code developed with high-level programming languages and allows executing arbitrarily complex checks during symbolic execution, including throwing reports in the callee function about defects that only become certain within the caller context. The structure of the function summary, procedure of applying the summary in a particular context, composition of symbolic values for particular contexts, effect of summary-based analysis on complexity of implementing specific checker modules, procedure for constructing path-sensitive bug reports, and other aspects of the implementation are discussed in detail. A particular implementation of the approach, based on Clang Static Analyzer, is described. The implementation is scalable enough to allow analysis of large-scale software projects in reasonable time, finding bugs faster than the existing implementation of the inlining-based interprocedural analysis, without sacrificing correctness and soundness of the analysis. Particular checker modules, which find various defects, such as integer overflows, modifications of constant-qualified memory, multithreading issues, array bound checks, exception safety checks, and file stream errors, were updated to use the summary-based approach, demonstrating flexibility of the technique proposed. The implementation was tested by running full intra-unit inter-procedural analysis of the Android Open Source Project codebase.

Dynamic loader optimization for ARM

E. A. Kudryashov, D. M. Melnik, A. V. Monakov

PDF (Rus)

63-80

Abstract

The paper discusses an optimization approach for external calls in position-independent code that is based on loading the callee address immediately at the call site from the Global Offset Table (GOT), avoiding the use of the Procedure Linkage Table (PLT). Normally the Linux toolchain creates the PLT both in the main executable (which comprises position-dependent code and has to rely on the PLT mechanism to make external calls) and in shared libraries, where the PLT serves to implement lazy binding of dynamic symbols, but is not required otherwise. However, calls via the PLT have some overhead due to an extra jump instruction and poorer instruction cache locality. On some architectures, binary interface of PLT calls constrains compiler optimization at the call site. It is possible to avoid the overhead of PLT calls by loading the callee address from the GOT at the call site and performing an indirect call, although it prevents lazy symbol resolution and may cause increase in code size. We implement this code generation variant in GCC compiler for x86 and ARM architectures. On ARM, loading the callee address from the GOT at call site normally needs a complex sequence with three load instructions. To improve that, we propose new relocation types that allow to build a PC-relative address of a given GOT slot with a pair of movt, movw instructions, and implement these relocation types in GCC and Binutils (assembler and linker) for both ARM and Thumb-2 modes. Our evaluation results show that proposed optimization yields performance improvements on both x86 (up to 12% improvement with Clang/LLVM built with multiple shared libraries, on big translation units) and ARM (up to 7% improvement with SQLite, average over several tests), even though code size on ARM also grows by 13-15%.

Overlapping communications and computations in GPU-based iterative linear solvers

V. . Platonov, A. . Monakov

PDF (Rus)

81-92

Abstract

Krylov subspace methods such as Conjugate Gradient and Biconjugate Gradient Stabilized methods are well known approaches for solving symmetric and asymmetric systems of linear algebraic equations, such as systems usually arising from partial differential equations in computational mathematics problems, like Navier-Stokes equations in fluid dynamics. With increasing sizes of meshes and numbers of computational nodes, network communications time may become an issue: stalls during global reductions have increasing duration, preventing useful computations. This happens because, in original formulations of methods, computing a dot product requires a global reduce operation, and its value is required on the next step, so each process has to stop until all others reach this point, like in a barrier synchronization. We research alternative formulations of conjugate gradient methods (Preconditioned Conjugate Gradient and BiCGStab) for GPU-based iterative linear system solvers. They allow to have an overlap of parallel computations and communications, at the cost of increased amount of computations and memory requirements. We describe an implementation of our approach for GPU-accelerated hybrid systems in OpenFOAM, an open source framework for computational fluid dynamics. Asynchronous collective communications from MPI-3 parallel programming API are used to avoid full barrier synchronization and reduce latency. Experimental results on 2 and 4 million cases from standard OpenFOAM problems are presented.

Uniformly distributed load of hardware and software core in the UNIX-based systems

E. V. Palchevsky, A. R. Khalikov

PDF (Rus)

93-102

Abstract

In this article we consider the problem of maximizing the capacity of the network stack to the interaction of hardware and software core to ensure the stability of the physical server. The algorithms and program codes are proposed to optimize the load capacity of the CPU by core parallelization. The paper also considers statistics of improved power of distributed attacks affecting the network infrastructure. It proved the impact of any application with access to the external global network to the production of the physical server presented in the form of physical resources. With the help of the developed and implemented the algorithm (in the language of «BASH»), produced by the distribution of the load capacity of the physical server cores, to further reduce the load capacity on the processing power of the CPU is provided. Showcased flowcharts, as well as the final test results of each stage of development, are discussed. Implemented network optimization mode «AF_PACKET», which has given the opportunity to accept external network packets without any locks that, in turn, increases the efficiency of achievement of the set goals (upon request from the server to the client). The possibility of taking up to ten million incoming network packets by software physical server, which allows for stable processing of information for the smooth operation under DDoS-attacks «SYN-flood" who realized the possibility of overload multimillion network packets. A similar number of incoming network packets provides an opportunity to fill the external network channel, with a consequent increase in the load capacity of the network TCP / IP stack that covers the remote control area physical server as soon as possible. Also adversely affect the performance of the working environment.

Testing of automata system

I. B. Burdonov, A. S. Kossatchev

PDF (Rus)

103-130

Abstract

The problem of testing of aggregate systems is considered. The system is described with an oriented graph of links. The nodes correspond to automata of the components and arcs correspond to simplex communication channels. The hypothesis of the links is assumed: the graph of links is static and the link structure is error-free. In each state, the automaton can accept and send multiple messages through incoming and outgoing arcs (at most one message through each arc). The goal of testing is to cover transitions of the automata reachable during the system work. It assumed that during testing it is possible to observe the state changes of automata and the messages on the arcs. A simplified system model with only one message circulating is considered at the beginning. On its example we show that the hypothesis on links allows considerably reduce the number of required testing actions from the multiplication of numbers of the component automata states to the sum of these numbers. If the numbers of states of all automata are equal, it gives exponential reduction of the number of test actions. Then the more general model is considered when the system can simultaneously contain multiple messages, but not more than one on each arc. A composition of the system automata is defined and the restrictions on automata making the system deterministic are described. An algorithm of test generation is proposed basing on test filtration generated for covering all transitions of the deterministic composition system. Test is rejected if it covers only such transitions of the components that are covered by the remaining tests. In conclusion, the directions of future research are described.

Automata system: composition according to graph of links

I. B. Burdonov, A. S. Kossatchev

PDF (Rus)

131-150

Abstract

The problem of modeling and composing of aggregate systems is considered. The system components are described with finite automata with multiple entries and exits. The communication between automata is described with message passing over simplex communication channels. The system is described with a directed graph of links. Each node of the graph corresponds to automaton of a component and an arc corresponds to a communication channel connecting exit of one automaton with entry of another automaton. Automaton of the graph node in each state can accept multiple messages from its entries (at most one message from each entry) and send multiple messages to its exits (at most one message to each exit). Entries (exits) of the automata not connected to exits (entries) of automata are considered to be external and used for communication between the system and its environment. The automata of the system operate synchronously: on each cycle each automaton performs one transition. A transition of an automaton imposes requirements on states of all its entries and exits (messages in them are specified) and explicitly specifies subset of entries and exits through which the messages are received or sent, respectively. Synchronous communication between automata means that for each link the requirements of the automata connected with this link must conform to each other. It makes possible to describe a wider spectrum of automata behavior. For example, a priority of message receiving: if there are multiple message in the automata entries, it can receive messages with the highest priority and discard the rest of the messages. It also makes possible for the automaton to receive messages regardless of ability to simultaneously send some message to some exit. A composition of the automata of the system according to the graph of links is defined and its associativity is proved. In conclusion, the directions of future research are described.

Automata system: determinism conditions and testing

Igor Burdonov, Alexander Kossatchev

PDF (Rus)

151-184

Abstract

The problem of testing of aggregate systems is considered. The system components are described with finite automata with multiple entries and exits. The communication between automata is described with message passing over simplex communication channels. The system is described with an oriented graph of links. Each node of the graph corresponds to automaton of a component and an arc corresponds to a communication channel connecting exit of one automaton with entry of another automaton. The hypothesis of the links is assumed: the graph of links is static and the link structure is error-free. Automaton of the graph node in each state can accept multiple messages from its entries (at most one message from each entry) and send multiple messages to its exits (at most one message to each exit). An associative composition of the automata is defined. The restrictions on the automata and the graph of links are determined, for which their composition, i.e. the system itself, is deterministic. The goal of testing is to cover transitions of the automata reachable during the system work. It assumed that during testing it is possible to observe the states of automata and the messages on the arcs. The complete testing of the automata system without considering the hypothesis of links may require the number of testing actions of order of multiplication of numbers of states of the component automata, while with the consideration of the hypothesis of links - of order of the sum of these numbers. If the numbers of states of all automata are equal, it gives exponential reduction of the number of test actions. If the hypothesis of links is true, an algorithm of test generation for a deterministic system is proposed basing on filtration of tests generated for covering all transitions of the composition. Test is rejected if it covers only such transitions of the components that are covered by the remaining tests. In conclusion, the directions of future research are described.

Numerical modeling of rectangular channel with shallow dumbbell dimples based Code Saturne

A. . Tsynaeva, M. . Nikitin

PDF (Rus)

185-196

Abstract

Numerical study was conducted for rectangular channel with dimples. Developed model was tested for adequacy by simulating of experiment, conducted by Dr. Terekhov, which was found in good agreement. Experimental setup utilized a single spherical dimple which was set at 11 diameters from inlet. Test simulation was conducted for incompressible fluid (water) in accordance with experiment conditions: inlet velocity 0.43 m/s, Reynolds for dimple 20000 and channel length 1.34 m. A 3D computation domain was meshed for 0.8 million elements with six viscous layers totalling 3 mm thick applied to smooth walls. A turbulent flow (Re = 31627) in rectangular channel with shallow dumbbell dimples was modelled with open source Code_Saturne. An ideal gas (ρ = 1.205 kg/m³) was considered as working medium. A 3D computation domain was meshed with open source Salome Meca for 0.77 million elements ranged 0.2...1.0 mm. Six viscous layers totalling 2 mm thick were applied to smooth walls. Unsteady flow simulated with k-w SST model utilizing 2nd order discretization schemes (SOLU) for velocity. 2000 iterations were calculated so far with pseudo time step of 0.1 ms. Additionally, impact of mesh quality regarding elements size on computation results was shown. Obtained results showed a strong dependence of flow velocity from inclination of dumbbell towards flow axis. Adjacent dumbbell dimples cause partial flow laminarization. Developed model shows aerodynamic advantage up to 10 % of dumbbell dimples over spherical ones of the same depth (h = 1.2 mm) and contact patch area (S = 59.76 mm²).

MHD supersonic flow control: OpenFOAM simulation

A. I. Ryakhovskiy, A. A. Schmidt

PDF (Eng)

197-206

Abstract

MHD flow control is a relevant topic in today’s aerospace engineering. An OpenFOAM density-based solver that is capable of handling MHD supersonic flow problems with constant magnetic field is developed. The proposed solver is based on Balbas-Tadmor central difference schemes. This solver can be applied to studying the potential of MHD flow control systems for atmospheric entry vehicles. A supersonic flow around a spherically blunt cone both with and without MHD interaction is studied. Gases with thermodynamic parameters characteristic for Earth’s and Martian atmospheres are considered. The results show visible effect of magnetic field on surface temperature of the body. The differences between shock standoff distances and general shockwave configurations of MHD and non-MHD flow are also apparent. The solution is stable for Stuart number below 0.2. Conditional instability of the solver can be attributed to the MHD term’s contribution to the local speed of sound and can be avoided by taking it into account. The developed application has proven the suitability of the used schemes for resolving steep gradients in MHD supersonic flow problems. The study itself has shown theoretical possibility of studying the MHD flow control using OpenFOAM. Further research may include an effort to stabilize the solver and to enhance the mathematical model of the flow.

High-performance numerical simulation of stratified flows around a wedge in OpenFOAM

N. F. Dimitrieva, Yu. D. Chashechkin

PDF (Rus)

207-220

Abstract

Results of numerical simulation of a continuously stratified fluid are presented. They are characterized by a wide range of values of internal scales that are not in a homogeneous liquid. Mathematical model is based on the fundamental set of differential equations of inhomogeneous multicomponent fluid mechanics. The problem is solved using the finite volume method in an open source package OpenFOAM. To take into account the stratification and diffusion effects a new own solver was developed and tested using the standard and extended libraries of the package. A particular attention is focused at construction of a high quality computational grid which satisfies basic requirements for resolution of all the microscales of the problem in high-gradient regions of the flow. Testing of the proposed numerical model Testing was conducted for continuously stratified fluid flows around a motionless and a moving wedge-shaped body with straight and curved edges. The calculations performed in parallel regime on computational facilities of the web-laboratory UniHUB (www.unihub.ru) demonstrated complex structure of flows. High-gradient layers near the sharp edges of the obstacles have been identified. Formation of an intensive zone of pressure depression in front of the leading vertex of the wedge is responsible for generation of propulsive mechanism that results in a self-motion of the obstacle along its neutral buoyancy horizon in a stably stratified environment.

Realization of parallel computations in the software package «LS-STAG_turb» for viscous incompressible flow simulation on systems with shared memory

V. . Puzikova

PDF (Rus)

221-242

Abstract

Immersed boundary methods have become popular in Computational Fluid Dynamics over recent years for simulating flows through complex solid geometries and in coupled hydroelasctic problems. The advantage of these methods over a method with a body-fitted mesh is their computational efficiency: they do not require regridding when domain shape changes in the simulation process due to hydroelastic body motion. The LS-STAG method for viscous incompressible flows simulation combines the advantages of immersed boundary methods, the marker and cells (MAC) method and level-set method. The LS-STAG method and its modifications for numerical simulation in coupled hydroelastic problems and for turbulence simulation by using RANS, LES and DES approaches are implemented in the software package «LS-STAG_turb». This software allows to simulate viscous incompressible flows around a moving airfoil of arbitrary shape or airfoils system with one or two degrees of freedom. For example, it allows to simulate rotors autorotation and airfoils system wind resonance. These phenomena are characterized by high velocities of airfoils and high values of local Reynolds number. So, extremely small space and time steps are required to obtain accurate quantitative results. It leads to significant increase in computational cost. To decrease it, the «LS-STAG_turb» parallel version is developed. Intel^® Cilk™ Plus, Intel^® TBB and OpenMP parallel programming technologies are used. Also, serial code sections are optimized. The FGMRES method usage for linear algebraic equations systems solving allows to achieve 2-fold computation time reduction in comparison with the BiCGStab method. In addition, the developed software implementation of the FGMRES method is more efficient than the similar solver implemented in Intel® MKL library both for single-core and multi-core computations.

Open-source software for modelling of free surface flows

E. . Davydova, V. . Korchagova

PDF (Rus)

243-258

Abstract

Problems of free-surface flow of viscous incompressible fluid are very useful in different practical cases. There are many specifies and limitations in these problems which are critically important for correct solving. The main goal is the review of existing numerical methods which can apply for modeling of free-surface flows and open-source programs where these methods are realized. Three methods for solving problems of free-surface flow were considered: Volume of Fluid, Smoothed Particle Hydrodynamics, Particle Finite Element Method v.2. They are realized in five open-source packages: OpenFOAM, Gerris, pySPH, DualSPHysics, Kratos. These packages were compared by modeling of two chosen cases: breaking of a dam and droplet impact to the liquid layer. Results of computations were compared with experimental results. There are good coincidence between them. The best results were obtained in OpenFOAM and Gerris. All main tools for modeling of free-surface flow are realized in these packages - the possibility of computations in 2D, 3D and axisymmetric model setup and also correct modeling of surface tension. Gerris can significantly accelerate computations in "big cases" due to dynamically adaptive remeshing. Further, DualSPHysics is the package for modeling of problems of coastal infrastructure where the most number of cases is 3D and the surface tension effect is negligible. The package pySPH was designed for clear demonstration of SPH working. The pySPH source code is on the Python language and not optimized. Kratos is the new package, which is in development now, therefore some tools are not developed in this moment.

On the estimations of efficiency and error of fast algorithm in vortex element method

K. S. Kuzmina, I. K. Marchevsky

PDF (Rus)

259-274

Abstract

The main computational complexity of vortex methods is concentrated in the calculation of the convective and diffusive velocities of all vortex elements. The analogue of the Barnes - Hut algorithm is considered as one of the most efficient ways to acceleration of the velocities computation in vortex element method. This method is based on the tree (hierarchical structure of regions) construction. When calculating the convective velocities this algorithm makes it possible to take into account the influence of vortex elements, which are located far enough from each other, approximately by using Taylor approximation of expression for convective velocity. The influence of vortex elements, which are close to each other is calculated directly using Biot -Savart law. There are two parameters of algorithm that affect the computational complexity of the algorithm and the problem solving accuracy: is the maximum tree level, is the parameter, which determines the radius of a near-field zone. When calculating diffusive velocities the influence of the vortex elements to each other decays exponentially with increasing distance between them. Therefore, constructed algorithm of diffusive velocities calculation allows finding vortex elements from near-field zone using tree structure and calculating diffusive velocities using only vortex elements from this zone. The estimations of computational complexity, which depends on algorithm parameters and number of vortex elements, are obtained for the algorithms for convective and diffusive velocities calculation. Also estimations for the errors of the vortex elements velocities computation, which depend on the algorithm parameters, are constructed. These estimates allow in practice to choice the optimal parameters of the whole algorithm and achieve the maximum possible acceleration of the computations for the given maximum error level.

Application of statistical and spectral methods to computational modeling of internal wave attractors

M. Providukhina, I. Sibgatullin

PDF (Rus)

275-282

Abstract

Direct numerical simulation of internal gravity waves propagation and formation of internal waves attractors in trapezoidal tank, which was filled with stably stratified salt solution of constant buoyancy frequency. The left vertical boundary oscillates with constant time frequency and has the form of semi-cosine of the tank height. The right wall is inclined to the vertical and performs focusing of waves, the other two boundaries are horizontal. The upper wall has stress free boundary conditions, on the other boundaries no-slip condition is imposed. Navier-Stokes equations in Boussinesq approximation and diffusive transport of salt are used as mathematical model. Direct numerical simulation is performed with the help of spectral element approach of 8-th order and modified code nek5000. Hilbert transforms and time-frequency diagrams were applied to the results of direct numerical simulation of internal wave attractors. In particular, with the help of phase reconstruction we obtained wave vectors, corresponding to time frequencies from time-frequency diagrams. To obtain phase images Hilbert transforms with filtration over narrow intervals of frequencies were used. Time-frequency diagrams for moderate forcing amplitudes show appearance of daughter waves of frequencies, corresponding to triadic resonance, which is also demonstrated with the help of bispectra: product of amplitudes on the background of cross product of frequency intervals. The results are close to data of experiments, being carried out at ENS de Lyon.

Username
Password
	Remember me
Not a user? Register with this site Forgot your password?

User

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Cookies policy