
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 29, No 2 (2017)
7-26
Abstract
The paper considers bounds on the memory necessary and sufficient for storage of undirected ordered rooted connected graphs, both numbered and unnumbered. The introduction contains the basic definitions and the problem statement. A graph is rooted if one of its vertices is marked as a root. A graph is ordered if, for each of its vertices, all the incident edges are ordered (numbered). A graph is numbered if all its vertices are numbered with distinct integers (from 0 to n−1, where n is the number of vertices). Two undirected ordered graphs G and G′ are weakly isomorphic if there exists a one-to-one correspondence between their vertices under which corresponding vertices have the same degrees (numbers of incident edges), and any two edges that have corresponding ends and the same numbers at these ends also have their other ends corresponding. An isomorphism of rooted graphs must also map root to root, and an isomorphism of numbered graphs must map each vertex to the vertex with the same number. Graphs are considered up to weak isomorphism. It is shown that the memory necessary and sufficient for storage of any graph has size Θ(m log n) for numbered graphs, Θ(n + (m − n + 1) log n) for unnumbered graphs with n vertices and m edges, and Θ(n² log n) for graphs without multiple edges and loops with n vertices. It is also shown that the memory sufficient for storage of an edge sequence of length O(n), or of a spanning tree, has size O(n log(nD)) or O(n log D), respectively, where D is the maximum vertex degree.
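As a back-of-the-envelope illustration of the Θ(m log n) bound for numbered graphs, one can store a numbered graph as a plain list of m edges, each endpoint occupying ⌈log₂ n⌉ bits. This is a hypothetical fixed-width encoding for illustration only, not the encoding constructed in the paper:

```python
import math

def edge_list_bits(n, m):
    """Bits needed to store m edges as pairs of fixed-width vertex numbers.

    Each endpoint takes ceil(log2(n)) bits, so the whole edge list takes
    2 * m * ceil(log2(n)) bits -- within a constant factor of m * log n.
    """
    width = max(1, math.ceil(math.log2(n)))
    return 2 * m * width

# A numbered graph with n = 1000 vertices and m = 5000 edges:
print(edge_list_bits(1000, 5000))  # 2 * 5000 * 10 = 100000 bits
```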
27-76
Abstract
We propose a general method to solve graph problems with a set of automata (computational agents) located at the vertices of an undirected ordered connected rooted graph and communicating by passing messages along graph edges. The automata are "semi-robots", i.e., their internal memory is sufficient to store values depending on the numbers of vertices and edges of the graph (n and m, respectively) but too small to store the complete graph description. Section 2 presents a classification of graph-based distributed computational models depending on the internal memory size of a vertex automaton, the operation time of a vertex automaton, and the edge capacity (the number of messages that can pass along an edge simultaneously). For further use we choose the model of maximum parallelism, with zero automaton operation time and unbounded edge capacity; it allows obtaining lower complexity bounds for distributed graph problems. Section 3 describes the rules for assessing algorithm complexity. Section 4 presents basic message processing procedures, a classification of messages according to the paths they pass, and methods of message processing by a vertex automaton. We propose to construct algorithms for graph problems on the basis of these procedures, and present examples in the subsequent sections. Sections 5-9 describe distributed algorithms for spanning tree construction, for task termination detection (based on detecting the absence of messages used by the task), for shortest-path tree construction, for enumeration of graph vertices, and for collecting graph structure information in the memory of the root automaton (if it is unbounded). The proposed algorithms have time complexity linear in n and m and use a number of messages linear in n and m. Section 10 considers construction of a maximum-weight path in a weighted graph, which is an NP-hard problem. We propose an algorithm that, at the cost of using an unbounded number of messages, solves this problem in time linear in n and m.
The conclusion summarizes the paper and outlines directions for further research: constructing algorithms for other graph problems and generalizing the approach to directed, non-deterministic, and dynamic graphs.
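The flavor of the spanning-tree construction in Section 5 can be conveyed by a tiny sequential simulation (an illustrative sketch of flood-style message passing, not the authors' algorithm): the root floods a message, and every vertex automaton adopts the sender of the first message it receives as its parent.

```python
from collections import deque

def spanning_tree(adj, root=0):
    """Simulate flood-based spanning-tree construction.

    Each vertex automaton adopts as parent the sender of the first
    message it receives, then forwards the message to its other
    neighbors.  With zero processing time and unbounded edge capacity
    this behaves like a synchronous BFS from the root.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:       # first message wins
                parent[w] = v
                queue.append(w)
    return parent

# A square with one diagonal; vertex 0 is the root.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(spanning_tree(adj))  # {0: None, 1: 0, 2: 0, 3: 1}
```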
77-96
Abstract
Existing research analyzing the evolution of the Linux kernel considers the kernel together with the loadable modules delivered with it, or only some specific subsystems of the kernel. The aim of this paper is to evaluate the evolution of the kernel without loadable modules. It proposes a method for determining the boundary between them and evaluates the evolution of all versions of the Linux kernel released over the past 7.5 years. The paper also presents a classification and a distribution of typical bugs fixed in the kernel, based on an analysis of the changes made to stable branches of the kernel during the last 2 months of 2015. The obtained results can be used to evaluate the applicability of various methods and tools for software quality assurance.
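One plausible way to draw the boundary between built-in kernel code and loadable modules (a simplified illustration; the paper's actual method is not reproduced here) is to inspect Kbuild Makefiles, where objects listed under obj-y are linked into the kernel image while objects under obj-m are built as loadable modules:

```python
import re

def classify_objects(makefile_text):
    """Split object files into built-in vs module according to Kbuild lists.

    In Kbuild Makefiles, `obj-y += foo.o` links foo.o into the kernel
    image, while `obj-m += bar.o` builds bar.o as a loadable module.
    """
    built_in, modules = set(), set()
    for line in makefile_text.splitlines():
        m = re.match(r"\s*obj-([ym])\s*[+:]?=\s*(.+)", line)
        if m:
            target = built_in if m.group(1) == "y" else modules
            target.update(m.group(2).split())
    return built_in, modules

kbuild = """
obj-y += fork.o exec_domain.o
obj-m += loopback.o
"""
built_in, modules = classify_objects(kbuild)
print(sorted(built_in), sorted(modules))
```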
97-116
Abstract
Most modern widely used operating systems have monolithic kernels, since this architecture aims at maximum performance. Monolithic kernels without various extensions like device drivers usually consist of several million lines of code in C/C++ and assembly language. Over time, their source code evolves quite intensively: new functionality is supported, various operations are optimized, bugs are fixed. The high practical value of operating system monolithic kernels imposes strict requirements on their functionality, security, reliability, and performance. Approaches to software quality assurance currently used in practice help to identify and fix quite a number of bugs, but none of them can detect all possible bugs of the kinds sought for. This article shows that different approaches to static verification, which are aimed at solving this task, have significant restrictions when applied to monolithic kernels as a whole, primarily due to the large size and complexity of source code that is constantly evolving. As a first step towards static verification of operating system monolithic kernels, a method is proposed for decomposing kernels into subsystems.
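As a toy illustration of such decomposition (a naive heuristic for illustration only, not the method proposed in the article), kernel source files can be grouped into subsystem candidates by their top-level directory:

```python
from collections import defaultdict

def group_by_subsystem(paths):
    """First-cut decomposition: treat each top-level source directory
    (mm/, fs/, net/, ...) as a subsystem candidate.  This is only an
    illustrative heuristic, not the decomposition method of the paper.
    """
    subsystems = defaultdict(list)
    for p in paths:
        top = p.split("/", 1)[0]
        subsystems[top].append(p)
    return dict(subsystems)

files = ["mm/slab.c", "mm/vmalloc.c", "fs/inode.c", "net/core/dev.c"]
print(group_by_subsystem(files))
# {'mm': ['mm/slab.c', 'mm/vmalloc.c'], 'fs': ['fs/inode.c'], 'net': ['net/core/dev.c']}
```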
117-160
Abstract
In October 2013, the eighth meeting of researchers in the field of databases was held. The first such meeting took place in February 1988, 25 years earlier. After each meeting, a report was published containing an overview of the current state of the field and a research program for the near future, in effect a set of forecasts for the development of research activity. This paper looks at the most interesting forecasts from the reports of these meetings and discusses to what extent they proved to be valid. Among the various problems of database technology considered are the following: the role of specialized hardware in building efficient DBMSs; SQL and database applications; perspectives of object-relational extensions; distributed heterogeneous database systems; databases and the Web; databases and data warehouses, OLAP and data mining; component organization of DBMSs; query optimization criteria; self-tuning and self-management of DBMSs; DBMS architecture and new hardware capabilities (SSDs, non-volatile memory, massively multithreaded processors); specialized DBMSs; data fusion and dataspaces; the Big Data problem and the database community's response to it; architectural shifts in computing.
161-200
Abstract
Clustering of text documents is used in many applications, such as information retrieval, exploratory search, and spam detection. This problem is the subject of many scientific papers, but the specifics of scientific articles with regard to clustering efficiency remain insufficiently studied, in particular when all documents belong to the same domain or when full texts of articles are unavailable. This paper presents an overview and an experimental comparison of text clustering methods applied to scientific articles. We study methods based on bag of words, terminology extraction, topic modeling, and word and document embeddings obtained with artificial neural networks (word2vec, paragraph2vec).
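A minimal sketch of the simplest of these approaches, bag-of-words clustering (illustrative only; the paper's experiments use full-fledged methods such as word2vec and paragraph2vec): documents are represented as term-frequency vectors and each is assigned to the nearest of two seed documents by cosine similarity.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster(docs, seeds):
    """Assign each document to the most similar seed document
    (one assignment step of k-means with fixed centroids)."""
    vecs = [tf_vector(d) for d in docs]
    centroids = [vecs[i] for i in seeds]
    return [max(range(len(seeds)),
                key=lambda k: cosine(v, centroids[k]))
            for v in vecs]

docs = [
    "graph storage memory bounds",
    "memory bounds for graph encoding",
    "neural word embedding for clustering",
    "document clustering with word embedding",
]
print(cluster(docs, seeds=[0, 2]))  # [0, 0, 1, 1]
```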
201-214
Abstract
In recent years, many research groups from different fields have become interested in the study of organizational networks, e.g. studies of the growth of cities following fractal theory. Fractal analysis is a strong tool for analyzing the relation between the growth of cities and the distribution of health centers. The purpose of this paper is to develop a methodology, based on a fractal analysis framework, for Chilean public health networks, the population growth of cities, and the location of health centers, and to produce an empirical approach to analyzing the locations of health centers and their effects. Health has long been related to other sciences, physics and mathematics being clear examples: many studies have used geometry to understand the nature of multiple natural shapes at different scales, an approach known as fractal nature or mathematical fractal modeling. For this reason, we present in this paper some preliminary ideas and initial results on the complex and fractal behavior of urban expansion and its relation to the location of public health centers, obtained through simple counting algorithms and spectral methods using the ImaCal software. The study of health patterns can become a useful tool for understanding human flows and the behavior of health centers, where traditional measurements such as equipment counts have serious problems and limitations.
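The core measurement behind such fractal analysis, box counting, can be sketched as follows (a generic illustration, not the ImaCal implementation): cover the point set with grids of shrinking box size and fit the slope of log N(s) against log s, where N(s) is the number of occupied boxes at grid resolution s.

```python
import math

def box_count_dimension(points, scales=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension of a planar point set by box
    counting: count occupied boxes N(s) on grids of box size 1/s,
    then fit the slope of log N(s) against log s."""
    logs, logn = [], []
    for s in scales:
        boxes = {(int(x * s), int(y * s)) for x, y in points}
        logs.append(math.log(s))
        logn.append(math.log(len(boxes)))
    # least-squares slope of logn against logs
    n = len(scales)
    mx, my = sum(logs) / n, sum(logn) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(logs, logn)) / \
            sum((a - mx) ** 2 for a in logs)
    return slope

# A straight line segment should have dimension close to 1.
line = [(i / 1000, i / 1000) for i in range(1000)]
print(round(box_count_dimension(line), 2))  # 1.0
```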
215-230
Abstract
Local diffusion and the topological structure of vorticity and velocity fields are measured during the transition from a homogeneous linearly stratified fluid to a cellular or layered structure driven by convective cooling and/or heating. Patterns arise by setting up a convective flow generated by an array of thermoelectric devices (Peltier/Seebeck cells), which are controlled so as to generate a buoyant heat flux. The experiments described here investigate high-Prandtl-number mixing, using brine and fresh water to form density interfaces, and low-Prandtl-number mixing with temperature gradients. The set of dimensionless parameters defines the conditions for numerical and small-scale laboratory modeling of environmental flows. Fields of velocity, density, and their gradients were computed and visualized using the open software tools of DigiFlow. When convective heating and cooling takes place at the side wall of a stratified enclosed cell, the combination of internal waves and buoyancy-driven turbulence is much more complicated if the Rayleigh and Reynolds numbers are high. Higher-order moment calculations and intermittency are important for studying mixing in complex flows. Some examples are shown here using the Thermoelectric Convection Didactive Device (TCDD) built by BEROTZA, mainly in a symmetric two-dimensional pattern; many other combinations, using heating-cooling and angles with the vertical, are possible in order to validate more complex numerical experiments.
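The dimensionless parameters mentioned above can be illustrated with a short computation using their textbook definitions (generic formulas with approximate property values for water; not data from these experiments):

```python
def prandtl(nu, kappa):
    """Pr = nu / kappa: ratio of momentum to thermal diffusivity."""
    return nu / kappa

def rayleigh(g, beta, dT, L, nu, kappa):
    """Ra = g * beta * dT * L**3 / (nu * kappa) for thermal convection
    across a temperature difference dT over a length scale L."""
    return g * beta * dT * L ** 3 / (nu * kappa)

# Approximate properties of water at room temperature:
nu = 1.0e-6      # kinematic viscosity, m^2/s
kappa = 1.4e-7   # thermal diffusivity, m^2/s
print(round(prandtl(nu, kappa), 1))  # 7.1

# A 10 K difference across a 10 cm cell:
ra = rayleigh(9.81, 2.1e-4, 10.0, 0.1, nu, kappa)
print(f"{ra:.2e}")  # ~1.5e8, high enough for vigorous convection
```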
231-256
Abstract
The theory of scheduling and project planning is widely applied in diverse scientific and industrial areas. To effectively solve application-specific problems it is necessary to take into account many factors, such as task execution models, precedence relations between tasks, resource limitations, directive deadlines, working calendars, conditions of financial and logistics support of project tasks, specific spatio-temporal requirements, etc. Therefore, in practice there is a need to generalize project scheduling problems and to consider their extended formulations. The paper provides the classical mathematical statement of the RCPSP (Resource-Constrained Project Scheduling Problem) and a short description of the serial dispatching algorithm for its approximate solution. Shortcomings and limitations of the RCPSP statement are discussed and systematized in more detail. The paper presents a mathematical formalization of project scheduling problems in extended definitions that takes into account numerous features of practical problems. The proposed statement of the GCPSP (Generally Constrained Project Scheduling Problem) can be considered an evolution of the RCPSP. This statement implies a mathematically neutral specification of the optimization problem under algebraic constraints of predefined sorts and priorities. It is essential that the constraints are interpreted in terms of local consistency, which allows some violations in the case of overloaded algebraic systems. An effective algorithm for the approximate solution of GCPSP problems is also presented and explained by analogy with the classical serial algorithm. Moreover, the equivalence of the two algorithms is proved for the cases when the solved GCPSP problem reduces to an RCPSP. It is expected that the obtained results will allow developing a general-purpose software library for solving diverse project scheduling problems.
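The classical serial schedule-generation scheme for the RCPSP can be sketched as follows (a minimal single-resource version for illustration; task indices are assumed to be listed in a precedence-feasible order): tasks are taken one by one, and each is started at the earliest time at which all its predecessors have finished and the resource profile stays within capacity.

```python
def serial_sgs(durations, demands, preds, capacity):
    """Serial schedule-generation scheme for a single-resource RCPSP.

    Tasks are scheduled in index order (assumed precedence-feasible);
    each task starts at the earliest time where its predecessors are
    finished and resource usage never exceeds the capacity."""
    horizon = sum(durations)
    usage = [0] * horizon          # resource usage per time unit
    start = {}
    for j in range(len(durations)):
        # earliest start after all predecessors finish
        est = max((start[p] + durations[p] for p in preds[j]), default=0)
        t = est
        while any(usage[u] + demands[j] > capacity
                  for u in range(t, t + durations[j])):
            t += 1                 # shift right until the window fits
        start[j] = t
        for u in range(t, t + durations[j]):
            usage[u] += demands[j]
    return start

durations = [2, 2, 2]
demands = [1, 1, 1]
preds = [[], [0], []]   # task 1 must follow task 0
capacity = 1            # one unit of a single renewable resource
print(serial_sgs(durations, demands, preds, capacity))  # {0: 0, 1: 2, 2: 4}
```

With capacity 1 the tasks cannot overlap, so task 2, although precedence-free, is pushed behind the chain 0 → 1.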


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)