
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 36, No 2 (2024)
View or download the full issue PDF (Russian)
7-20
Abstract

Network Traffic Analysis (NTA) helps identify security threats, monitor network performance, and plan for future capacity. While real-time analysis is ideal, it can be difficult due to high data volume and complexity: large amounts of traffic require parsing, and real-time analysis may miss hidden threats. Post-analysis can address these challenges, but it depends heavily on choosing an effective and appropriate storage solution. A variety of storage systems exist, each employing different approaches and formats to retain data. This article explores the application of various storage systems to NTA results. Three different types of storage systems are considered: Greenplum, NebulaGraph, and OpenSearch. A comparative approach is employed, analyzing the same dataset across the storage systems. This allows us to examine how different database structures and query capabilities influence the efficiency and accuracy of NTA. The resulting insights not only provide valuable guidance for selecting the optimal storage solution for specific NTA tasks, but also serve as a foundation for future research in this area.
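For illustration only, the same NTA question ("which hosts generated the most connections?") might be phrased differently against the three systems; the table, edge, and field names below are hypothetical and schematic, not the schema actually used in the article.

# Hypothetical sketch: one NTA question phrased for three storage systems.
# Table, edge, and field names are illustrative, not the article's schema.

greenplum_sql = """
    SELECT src_ip, COUNT(*) AS flows
    FROM netflow
    GROUP BY src_ip
    ORDER BY flows DESC
    LIMIT 10;
"""

nebula_ngql = """
    GO FROM "10.0.0.1" OVER connected_to
    YIELD dst(edge) AS peer
    | GROUP BY $-.peer YIELD $-.peer, COUNT(*) AS flows;
"""

opensearch_query = {
    "size": 0,
    "aggs": {
        "top_sources": {
            "terms": {"field": "src_ip", "size": 10}
        }
    },
}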

21-32
Abstract

Domain-specific languages power numerous modern applications and libraries, including Wolfram Alpha, Microsoft Excel, and Graphviz. This work aims to share the experience gathered from developing TQL (Talisman Query Language), a domain-specific language used in the Talisman platform. The Talisman platform is a set of tools for automating data processing tasks, developed by the Ivannikov Institute for System Programming of the RAS. The TQL implementation discussed in this article supports error recovery, runs both directly inside a browser and on a server, and includes an interactive playground that visualizes the parse tree as the user types. This article describes several techniques and technologies that were used to achieve these qualities while keeping a single, maintainable codebase.
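The abstract does not show TQL's actual machinery; as a generic illustration of parser error recovery, a hand-written recursive-descent parser can resynchronize at statement boundaries instead of aborting on the first error. The grammar and token names below are hypothetical.

# Generic sketch of panic-mode error recovery in a hand-written parser;
# this illustrates the technique, not TQL's actual implementation.

SYNC_TOKENS = {";", "EOF"}  # points where the parser resynchronizes

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_program(self):
        stmts = []
        while self.peek() != "EOF":
            try:
                stmts.append(self.parse_statement())
            except SyntaxError as e:
                # Record the error and skip ahead to a safe token,
                # so one bad statement does not abort the whole parse.
                self.errors.append(str(e))
                while self.peek() not in SYNC_TOKENS:
                    self.advance()
                if self.peek() == ";":
                    self.advance()
        return stmts

    def parse_statement(self):
        tok = self.advance()
        if not tok.isidentifier():
            raise SyntaxError(f"unexpected token {tok!r}")
        if self.peek() == ";":
            self.advance()
        return ("stmt", tok)

p = Parser(["find", ";", "123", ";", "show", ";"])
print(p.parse_program(), p.errors)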

33-46
Abstract

Cyber-physical systems are a symbiosis of multi-level control systems that take into account the physical aspects of the target objects' operation. Errors in such systems can stem both from incorrect code organization and hardware operation and from a flawed understanding of physical laws and their numerical approximation. Continuing our previous work, we apply techniques for analyzing commits in the git repositories of several well-known cyber-physical systems, followed by classification of the developers' messages. As a result, we discuss the identified strong keywords and generalized fix messages that can reveal the main classes of bugs in these projects. The results of this work can be used in training and consulting on errors and vulnerabilities in complex systems.
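As a rough sketch of this kind of pipeline (the actual keyword lists and bug classes are derived in the paper, not reproduced here), commit subjects can be read from a repository and bucketed by keyword matching:

import subprocess

# Hypothetical keyword classes; the paper derives its own "strong keywords".
BUG_CLASSES = {
    "concurrency": ["race", "deadlock", "lock", "atomic"],
    "numerics":    ["overflow", "precision", "nan", "rounding"],
    "hardware":    ["sensor", "driver", "firmware", "timing"],
}

def classify(message):
    msg = message.lower()
    return [cls for cls, words in BUG_CLASSES.items()
            if any(w in msg for w in words)] or ["other"]

# Read commit subjects from a local git repository.
log = subprocess.run(
    ["git", "log", "--pretty=%s"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

counts = {}
for subject in log:
    for cls in classify(subject):
        counts[cls] = counts.get(cls, 0) + 1
print(counts)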

47-58
Abstract

The article discusses the issues of planning and resource management in testing software systems. The paper presents the ACC (Attributes, Components, Capabilities) analysis method used at Google to optimize the distribution of testing effort across different parts of a system. Extending the method with a fourth characteristic, actors (roles of system users), allows for a more flexible assessment of action requirements and user skill levels. Illustrative examples of system attributes and components help explain the principles of the method. The work proposes a new approach to risk management and process improvement for testing software systems in a multidimensional space. The effectiveness of the enhanced ACC analysis method combined with a risk-oriented approach was demonstrated on a control system for technological operations in the repair of electric motors: attributes, components, and actors were identified, capabilities at their intersections were analyzed, and testing was conducted, which helped improve the system's quality.
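A minimal sketch of the extended structure follows; the attributes, components, actors, and risk scores are hypothetical stand-ins, not the electric motor repair system's actual entries from the paper.

# Hypothetical ACC matrix extended with actors: each (attribute, component,
# actor) cell lists the capabilities to be tested at that intersection.
attributes = ["secure", "fast"]
components = ["work order", "motor diagnostics"]
actors     = ["operator", "supervisor"]

capabilities = {
    ("secure", "work order", "operator"):
        ["operator can open only assigned work orders"],
    ("fast", "motor diagnostics", "supervisor"):
        ["diagnostics report renders in under 2 s"],
}

# Risk-oriented prioritization: rank cells by an assumed risk score
# (here, simply the number of capabilities at the intersection).
risk = {cell: len(caps) for cell, caps in capabilities.items()}
for cell in sorted(risk, key=risk.get, reverse=True):
    print(cell, "->", capabilities[cell])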

59-72
Abstract

This paper considers automated unit test generation for Java programs that use the Spring framework. Although several test generation tools for “pure” Java applications have been developed in recent decades, the features of this framework are mostly not taken into account, even though Spring is used in many industrial Java applications. At the same time, the presence of Spring components in the application under test imposes additional requirements not only on the code analysis approaches, but also on the structure of the generated tests. The main source of information about object types and their properties is the Spring application context. The paper proposes a tool for analyzing the application context that, in some cases, allows generating test scenarios corresponding to real program executions while avoiding excessive mocking. The application context is not fully initialized during this analysis, which makes test generation safe for user data. The proposed tool for analyzing the Spring context has been integrated into the UnitTestBot Java automatic test generation tool. We also provide examples of tests generated for real open-source projects.

73-82
Abstract

The article highlights an innovative approach to risk management in software projects using generative artificial intelligence. It describes a methodology that uses publicly available chatbots to identify, analyze, and prioritize risks. The Crawford method is used as the basis for risk identification. The authors propose specific formulations of requests to chatbots (instructions, prompts) that facilitate obtaining the necessary information. The effectiveness of the methodology has been demonstrated on five small software projects and a dozen economic and organizational projects of significantly different scales, from small to federal, which confirms its applicability and practical value.
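A hypothetical prompt in the spirit of the Crawford method is sketched below; the authors' exact formulations are given in the paper, not here.

# Hypothetical prompt in the spirit of the Crawford slip method; the
# authors' exact prompt formulations are given in the paper.
PROMPT = (
    "Act as a risk workshop facilitator using the Crawford slip method. "
    "For the software project described below, list the 10 most significant "
    "risks, one per line, without repeating earlier answers. Then rate each "
    "risk's probability and impact on a 1-5 scale and rank the risks.\n\n"
    "Project description: {description}"
)
print(PROMPT.format(description="a mobile app for warehouse inventory"))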

83-90
Abstract

The choice of an educational program is momentous in young people's lives. Given the shortage of time after exams, applicants usually cannot analyze all possible educational tracks, a task that requires a thorough study of curricula. This research addresses the problem by proposing an algorithm for data-driven curriculum analysis based on natural language processing of the course names and competences listed in curricula. The architecture of the corresponding intelligent software system is also described. The method is tested on curricula scraped from university websites; a data warehouse has been developed to store the content. At this time, there are few studies on this topic, and the existing ones are either at an early stage of development or short on implementation details. They are briefly discussed in this paper.
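As a sketch of the general idea (the paper's actual algorithm and data are not reproduced here), course names from two curricula can be compared with TF-IDF vectors and cosine similarity; the course names below are invented examples.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical course names scraped from two curricula.
plan_a = ["Calculus I", "Linear Algebra", "Intro to Programming"]
plan_b = ["Mathematical Analysis", "Algebra and Geometry",
          "Programming Fundamentals"]

# Character n-grams tolerate wording differences between universities.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
matrix = vec.fit_transform(plan_a + plan_b)

# Similarity of every course in plan A to every course in plan B.
sim = cosine_similarity(matrix[: len(plan_a)], matrix[len(plan_a):])
for i, name in enumerate(plan_a):
    j = sim[i].argmax()
    print(f"{name!r} ~ {plan_b[j]!r} (score {sim[i, j]:.2f})")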

91-108
Abstract

Recently, there has been a surge of interest in employing neurocomputer interfaces to implement control loops, especially for various Internet of Things (IoT) infrastructures. However, due to the low-level nature of such devices and the related software tools, integrating a neurointerface with a large variety of IoT devices is quite a tedious task, and one that also requires considerable knowledge of neuroscience and signal processing. In this paper, we propose an ontology-driven solution to the challenge of uniformly integrating brain-computer interfaces into IoT ecosystems. We demonstrate an adaptable mechanism for integrating brain-computer interfaces into the Internet of Things infrastructure by introducing an intermediate layer, a smart mediator responsible for communication between the environment and the neurointerface. The mediator's software is generated automatically, and this process is driven by a managing ontology. The proposed formal model and the system's implementation are described. The approach we have developed enables researchers and engineers without a strong background in brain-computer interfaces to automate the integration of neurointerfaces with various Internet of Things infrastructures.
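A minimal sketch of the mediator idea follows, with all names assumed: a declarative mapping stands in for the managing ontology, and the mediator routes classified neurointerface commands to IoT actions. The actual ontology and code generation machinery are described in the paper.

# Minimal sketch of a mediator between a BCI and IoT devices; the mapping
# below stands in for the managing ontology, and all names are hypothetical.
ONTOLOGY_MAPPING = {
    "imagine_left_hand":  ("lamp_livingroom", "toggle"),
    "imagine_right_hand": ("thermostat",      "raise_setpoint"),
    "blink_twice":        ("door_lock",       "unlock"),
}

class Mediator:
    def __init__(self, mapping, transport):
        self.mapping = mapping        # generated from the ontology
        self.transport = transport    # e.g. an MQTT client in a real system

    def on_bci_event(self, classified_command):
        target = self.mapping.get(classified_command)
        if target is None:
            return  # unknown mental command: ignore
        device, action = target
        self.transport(device, action)

mediator = Mediator(ONTOLOGY_MAPPING,
                    transport=lambda d, a: print(f"send {a!r} to {d}"))
mediator.on_bci_event("imagine_left_hand")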

109-126
Abstract

An integral part of creating high-performance computing systems designed to solve problems of numerical modeling of various physical processes is verifying that they meet the characteristics stated during their design. At the same time, there is a problem with evaluating the performance of computing systems on synthetic tests, which are far more primitive in mathematical complexity than real applied problems. The article considers a suite of test programs developed by the authors that allows a more accurate assessment of the real performance of computing systems.

127-140
Abstract

A data visualization method based on a language-oriented approach is proposed. An analysis of data visualization tools and their customizability for subject areas based on user needs has been carried out. It is noted that these tools require highly qualified users to customize the data visualization format (users must have programming skills). It is proposed to adapt visualization tools to the needs of users and the specifics of the tasks being solved by creating domain-specific languages (DSLs). A system architecture based on a multifaceted ontology is described. The ontology includes descriptions of languages and domains, as well as rules for generating new languages and transforming constructed models. The languages are designed to describe different classes of diagrams. The system includes tools for automatically generating new DSLs by mapping a domain ontology onto the base language metamodel according to user-specified rules. Different types of diagrams have been classified and the main components of each diagram type have been identified, which provides the basis for creating an ontology of data visualization languages. A base language for creating diagrams is proposed, and its customizability for specific domains is demonstrated. An example of the created data visualization models is shown.
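A toy sketch of the mapping step, under assumed names (the actual ontology, metamodel, and rules are the paper's): domain concepts are mapped onto base-language diagram elements according to user-specified rules, yielding domain-specific terms.

# Toy sketch: derive DSL terms by mapping a domain ontology onto a base
# visualization metamodel; all names here are hypothetical.
BASE_METAMODEL = {"node", "edge", "container"}   # base diagram language

domain_ontology = {
    "Pump":     {"kind": "equipment"},
    "Pipe":     {"kind": "connection"},
    "Workshop": {"kind": "area"},
}

# User-specified mapping rules: domain kind -> base metamodel element.
rules = {"equipment": "node", "connection": "edge", "area": "container"}

def generate_dsl(ontology, rules):
    dsl = {}
    for concept, props in ontology.items():
        base = rules[props["kind"]]
        assert base in BASE_METAMODEL
        dsl[concept.lower()] = base   # e.g. 'pump' becomes a kind of node
    return dsl

print(generate_dsl(domain_ontology, rules))
# {'pump': 'node', 'pipe': 'edge', 'workshop': 'container'}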

141-168
Abstract

To create modern, competitive, and trusted software, it is necessary to apply knowledge of formal methods. Currently, a huge number of students are studying specialties related to programming. However, while studying at a university, it is difficult to gain the skill of practically applying theoretical knowledge. Short competitions with non-standard, industry-related problems can spark students' interest in the field of formal methods. The article describes the first experience of organizing a competition in formal verification of programs among students of Russian universities. The competition was held in conjunction with a seminar on program semantics, specification and verification (PSSV) in Innopolis in November 2023. The format of the competition was close to that of so-called hackathons. Participants were asked to solve verification problems using predefined model checking and deductive verification tools. We discuss the organization of the event, the proposed tasks, the results of the submitted solutions, and feedback from participants.

169-180
Abstract

Electrovortex flows arise when an electric current of non-uniform density passes through a well-conducting fluid (e.g., an acid or a metal melt). In such a case, the electric current generates a magnetic field, which gives rise to a Lorentz force that drives vortical flows of the medium. There are different methods for the theoretical study of such flows. As a rule, to avoid having to find the pressure as a function of the coordinates, the variables "velocity vector potential – vorticity" ("scalar stream function – vorticity" in the case of axisymmetric flows) are used. It is then quite effective to use self-similar variables, which reduce the dimensionality of the problem. In this case, the solution for the introduced function can be sought as an expansion in the electrovortex flow parameter, which is proportional to the square of the magnetic Reynolds number. The solution can also be obtained numerically, for example, using finite-difference methods. Nowadays, solutions are increasingly investigated by direct numerical simulation, in which no accuracy-reducing self-similar approximations are made. Nevertheless, the amount of computation can then be quite large and may require supercomputer resources. A separate difficulty is presented by the boundary conditions: for example, for the velocity vector potential we obtain a fourth-order equation, which imposes significant restrictions on the time step in the evolution equation. The problem can be avoided by using approximate boundary conditions, but this again reduces the accuracy of the solution. In this paper, using the example of electrovortex flow between planes, the solutions that can be obtained with the various computational approaches mentioned above are examined. The results are compared with one another and with analytical approximations.
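For orientation, a schematic stream function and vorticity formulation of the axisymmetric problem, together with the expansion mentioned above, might be written as follows; the exact operators, scalings, and the definition of the electrovortex parameter S here are assumptions for illustration, not the authors' formulation.

% Schematic stream function psi / azimuthal vorticity omega formulation:
\frac{\partial \omega}{\partial t} + (\mathbf{u}\cdot\nabla)\,\omega
    = \nu\,\nabla^{2}\omega
    + \frac{1}{\rho}\left[\nabla\times(\mathbf{j}\times\mathbf{B})\right]_{\varphi},
\qquad
\nabla^{2}\psi = -\omega .

% Assumed perturbative expansion in the electrovortex parameter S:
\psi = S\,\psi_{1} + S^{2}\,\psi_{2} + \dots,
\qquad
\omega = S\,\omega_{1} + S^{2}\,\omega_{2} + \dots,
\qquad S \propto \mathrm{Re}_{m}^{2}.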

181-192
Abstract

The epidemiological indicators (prevalence, mortality, convalescence) for COVID-19 are investigated as spatio-temporal dependencies. Proper Orthogonal Decomposition (POD) is applied for the first time to this type of data; the main modes and the corresponding coefficients are obtained. Using this method, it is shown that there are modes concentrated in particular regions, which means there are independent factors of disease spread. Additionally, it is shown that Empirical Mode Decomposition can be successfully applied for noise reduction and a better understanding of the time dependencies. The exponential decrease of the decomposition error demonstrates the accuracy of the decomposition. Despite the ability of POD to reveal hidden dependencies, it requires rows of simultaneous data, which in practice may not be available. The correction applied is discussed in the article; however, it may not be sufficient because of errors in the raw data. The method is recommended for use unless the data it is applied to is inaccurate.
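As an illustration of the general technique (not the authors' exact pipeline), a POD of a space-time data matrix can be computed via the singular value decomposition; the matrix shape and the random stand-in data below are assumptions.

import numpy as np

# Rows: spatial locations (regions); columns: time snapshots.
# X[i, j] = epidemiological indicator in region i on day j (hypothetical data).
rng = np.random.default_rng(0)
X = rng.random((50, 300))

# Subtract the temporal mean of each row so modes capture fluctuations.
X_centered = X - X.mean(axis=1, keepdims=True)

# Thin SVD: columns of U are spatial POD modes, rows of Vt (scaled by s)
# are the corresponding time coefficients.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Energy captured by the first k modes; the error of a rank-k
# reconstruction decays with the discarded singular values.
k = 5
energy = (s[:k] ** 2).sum() / (s ** 2).sum()
X_rank_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(f"first {k} modes capture {energy:.1%} of variance")
print("rank-k reconstruction error:",
      np.linalg.norm(X_centered - X_rank_k) / np.linalg.norm(X_centered))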

193-198
Abstract

The aim of the study is to compare the oral speech of Soviet and Russian young people aged 13 to 23. The analysis was carried out according to a single indicator: the abundance of structurally incomplete statements in transcribed oral messages. Speech samples of both Soviet and modern Russian teenagers are taken from the media. The study shows that Soviet schoolchildren and students used structurally incomplete statements in their oral speech seven times less often than modern youth. The study continues a more extensive comparison of the oral speech of schoolchildren and students of the Soviet and Russian eras, as well as of the speech role models broadcast by the Soviet and Russian media. The basis for the choice of research material was the non-trivial task of establishing the influence of the modern media environment on adolescents' speech. Whereas Soviet schoolchildren and students consumed media content featuring the best linguistic role models of their time, modern teenagers are shaped by the unregulated Internet environment. In the course of the study, a method was developed for the objective comparison of the speech of different speakers. Text corpora were compared by the number of occasionalisms, author's syntagmas, phraseological units, professionalisms, clericalisms, vulgarisms, structurally incomplete statements, obscene vocabulary, etc. A series of studies showed that the speech samples of the Soviet era are better both quantitatively and qualitatively than the modern ones. The scientific novelty lies in assessing, with a new comprehensive method of linguistic analysis, the oral speech of the presenters of informational and entertainment media products of one Russian-speaking country in different, though chronologically close, eras. The key factors that influenced the speech approaches of the creators of these media products were rapid social and technological changes.

199-210
Abstract

In this paper we argue that the category of nominal case in the Vakh Khanty dialect needs revisiting in light of recent field data obtained in 2019. This objective is topical because the results of the study will equip researchers with the data necessary to develop a unified notation system for the case markers to be used in an annotated corpus of Vakh Khanty on the LingvoDoc platform. The declension system of the Eastern Khanty dialect under analysis is heterogeneous and relatively vast. The controversy around the synthetic means of expressing semantic relations concerns the terminology of the case markers, their quantity, morphemic status, and functional features. The purpose of this paper is to clarify the functional and semantic aspects of the Vakh Khanty case markers drawing on the recently obtained field material. This objective is achieved by means of the tools available on the LingvoDoc platform. The latest field data on the Vakh Khanty dialect was collected in the village of Korliki in 2019. More than 6,000 words of data are posted on the endangered languages documentation platform LingvoDoc, and part of the material used in the analysis is still being processed and integrated there. The study draws on the seminal work by N.I. Tereshkin, as well as on other prominent researchers of the Khanty language and their descriptions of the declension system of the Eastern dialect. The latest field data contains texts and questionnaires that allow researchers to examine the functioning of the morphological markers and to compare the results with the findings previously presented in the literature. This approach leads to a systematization of the case category in the dialect at its current state of development. The examination of the range of meanings conveyed by the Vakh Khanty case markers has made it possible to divide them into groups of semantic and syntactic case markers. As a result, the study confirmed that the nominal declension system encompasses the abessive, ablative, allative, distributive, comitative, comparative, lative, locative, nominative, oblicative, and translative cases. The recent field data confirms the presence of the distributive case within the case paradigm and the status of the comparative element niŋit as a postposition. Each case marker of the Vakh Khanty dialect is terminologically specified, which is essential for developing a parser for this dialect on the LingvoDoc platform. Such a parser will speed up the processing and analysis of language material and help eliminate errors.



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)