Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 37, No 4: Part 2. July-August
7-16
Abstract

In this paper, we present an architecture for time synchronization in an onboard network. The architecture is specific to the SpaceWire protocol and is based on the broadcast-code mechanism introduced in the ECSS-E-ST-50-12C standard. We examine synchronization of real-time clocks as well as synchronization of ARINC 653 node schedulers, and we discuss the modification of ARINC 653 Interrupt Services required for time synchronization to operate. The achieved synchronization precision is no worse than 5 ms.
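As a minimal illustration of consuming such broadcast codes, the Python sketch below keeps a local clock in step with a 6-bit tick counter of the kind SpaceWire distributes; the class name, tick period, and resynchronization policy are illustrative assumptions, not the paper's architecture.

```python
class TimeCodeClock:
    """Local clock resynchronized by periodic broadcast time-codes,
    in the spirit of SpaceWire (ECSS-E-ST-50-12C) time-codes.
    Only the 6-bit counter comes from the standard; the rest is
    an illustrative sketch."""

    def __init__(self, tick_period_s=0.1):
        self.tick_period_s = tick_period_s
        self.expected = 0        # next expected 6-bit counter value
        self.local_time_s = 0.0

    def on_time_code(self, counter):
        counter &= 0x3F          # time-codes carry a 6-bit counter
        # missed == 0 means the code arrived in sequence; otherwise
        # jump forward over the ticks this node never saw.
        missed = (counter - self.expected) % 64
        self.local_time_s += (missed + 1) * self.tick_period_s
        self.expected = (counter + 1) % 64
```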

17-30
Abstract

In the modern world, processor performance and energy efficiency play a key role in computer system design. Along with CPUs, GPUs are powerful computing devices used for computer graphics, machine learning, and more. Processors are equipped with built-in sensors accessible through specialized tools, and the chip of a modern video card can operate over a fairly wide range of frequencies and power limits (PLs). When solving a computational task or rendering a scene, the video card can often operate closer to its optimum, without wasting excess power, which can significantly reduce energy consumption on labor-intensive tasks. Therefore, for a given set of tasks it is important to find the parameters that maximize the ratio of useful work per watt. After conducting a large number of experiments, one can learn to predict how such a target function depends on the parameters. This paper examines obtaining current GPU parameter values using various tools. We present the results of collecting raw data from NVIDIA GPUs and the subsequent construction of an optimal power consumption model.
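One common way to read such sensors on NVIDIA GPUs is NVML via the pynvml bindings; the sketch below shows the kind of raw data collection the paper describes, not its exact tooling.

```python
# pip install nvidia-ml-py  (the pynvml bindings for NVML)
from pynvml import (NVML_CLOCK_SM, NVML_TEMPERATURE_GPU, nvmlInit,
                    nvmlShutdown, nvmlDeviceGetClockInfo,
                    nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetPowerManagementLimit,
                    nvmlDeviceGetPowerUsage, nvmlDeviceGetTemperature)

nvmlInit()
gpu = nvmlDeviceGetHandleByIndex(0)                # first GPU
power_w = nvmlDeviceGetPowerUsage(gpu) / 1000.0    # NVML reports mW
limit_w = nvmlDeviceGetPowerManagementLimit(gpu) / 1000.0
sm_mhz = nvmlDeviceGetClockInfo(gpu, NVML_CLOCK_SM)
temp_c = nvmlDeviceGetTemperature(gpu, NVML_TEMPERATURE_GPU)
print(f"power={power_w:.1f}/{limit_w:.0f} W  sm={sm_mhz} MHz  t={temp_c} C")
nvmlShutdown()
```

Sampling these values while sweeping the power limit gives the raw data from which a work-per-watt model can be fitted.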

31-46
Abstract

File systems are a crucial component of any modern operating system, whether it is a general-purpose computing system or a specialized data storage system. The cost of a file system error is very high; consequently, there is a need for effective tools for analyzing quality and detecting errors in file systems. This paper presents the DIFFuzzer tool, which is based on grey-box and black-box fuzzing techniques and implements a differential dynamic analysis approach: the behavior of the target file system is compared to that of another file system of known higher quality, which serves as a generator of reference behavior. When comparing behaviors, both the response codes of system calls and the aggregated state of the file systems are analyzed. The toolkit also includes a reducer that minimizes an erroneous trace to a short fragment in which the error still appears. The developed tool has been tested on several POSIX-compliant file systems and has discovered several errors even during a relatively short experiment.
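The differential idea can be sketched in a few lines: replay one workload on both file systems and flag the points where system call results diverge. The trace format and names are illustrative, not DIFFuzzer's interfaces.

```python
def diff_traces(target_trace, reference_trace):
    """Compare syscall traces from the target and reference file
    systems. Each trace is a list of (syscall_name, return_code)
    pairs produced by replaying the same workload on both systems;
    this is an illustrative sketch, not DIFFuzzer's actual format."""
    divergences = []
    for i, ((t_call, t_ret), (r_call, r_ret)) in enumerate(
            zip(target_trace, reference_trace)):
        if t_call != r_call or t_ret != r_ret:
            divergences.append((i, (t_call, t_ret), (r_call, r_ret)))
    return divergences

# e.g. the target rejects a rename the reference accepts:
print(diff_traces([("mkdir", 0), ("rename", -22)],
                  [("mkdir", 0), ("rename", 0)]))
```

A reducer then shrinks the workload while the first divergence persists.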

47-68
Abstract

The structure of a process model discovered from an event log of a multi-agent system often does not reflect the system architecture with respect to agent interactions. Existing conformance checking quality dimensions mainly evaluate the extent to which the behavior of a discovered model corresponds to the event sequences recorded in an event log. These behavioral dimensions might be insufficient to differentiate process models discovered from an event log of the same multi-agent system with respect to the independence of agents and the complexity of their interactions. In this work, we propose a theoretically grounded approach to measuring the structural complexity of a process model representing a multi-agent system with asynchronously interacting agents. We also report the key outcomes of a series of experiments evaluating the sensitivity of the proposed approach to structural modifications in process models.

69-84
Abstract

Data Petri nets can be used to represent a model that includes both data and resource perspectives. In this formalism, each transition has a constraint that includes input and output conditions on variables. To stay within decidability, the conditions must not contain arithmetic operations, so resources are usually represented as separate places. Existing correctness criteria such as easy, relaxed, and lazy soundness can be adapted to resource-oriented Data Petri nets, but deciding them requires solving a reachability problem, which is known to have very high complexity even for classical Petri nets. In this paper, we propose a new correctness notion called relaxed lazy soundness that incorporates the main features of the aforementioned properties and can be decided as a coverability problem, which is known to be less computationally complex than reachability. We provide an algorithm to verify this property, prove its correctness, and implement it in an existing soundness verification toolkit. The performance evaluation results confirm the applicability of the algorithm to process models of moderate size. The algorithm can be used both for verification of resource-oriented models and for preliminary validation of arbitrary process models represented as Data Petri nets.
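To make the coverability reduction concrete, here is a toy breadth-first coverability check for a small Petri net. The token cap is a simplification that forces termination (a real decision procedure would use, e.g., the Karp-Miller abstraction or backward coverability), and all names are illustrative.

```python
from collections import deque

def coverable(places, transitions, m0, target, max_tokens=64):
    """Does some reachable marking cover `target` (componentwise >=)?
    `transitions` is a list of (pre, post) dicts mapping place -> weight.
    Exploration is capped at `max_tokens` total tokens, a simplification
    so the toy search always terminates."""
    start = tuple(m0.get(p, 0) for p in places)
    goal = tuple(target.get(p, 0) for p in places)
    seen, queue = {start}, deque([start])
    while queue:
        m = queue.popleft()
        if all(a >= b for a, b in zip(m, goal)):
            return True
        for pre, post in transitions:
            if all(m[i] >= pre.get(p, 0) for i, p in enumerate(places)):
                m2 = tuple(m[i] - pre.get(p, 0) + post.get(p, 0)
                           for i, p in enumerate(places))
                if sum(m2) <= max_tokens and m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return False

# Tiny producer/consumer net: t1 produces into the buffer, t2 consumes.
places = ["producer", "buffer", "consumer"]
transitions = [
    ({"producer": 1}, {"producer": 1, "buffer": 1}),   # t1
    ({"buffer": 1, "consumer": 1}, {"consumer": 1}),   # t2
]
print(coverable(places, transitions,
                {"producer": 1, "consumer": 1}, {"buffer": 3}))  # True
```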

85-102
Abstract

Viewpoint selection methods for 3D scenes are used in computer vision, computer graphics, and scientific visualization to obtain views that are most suitable for the problem at hand. In this paper, a method for viewpoint selection based on inverse rendering is proposed for material reconstruction. The method solves the problem of selecting arbitrary views (i.e., not from a predefined set) based on various view quality estimates that use geometric characteristics of the target 3D object, and it supports both differentiable rendering-based and gradient-free implementations of inverse rendering. The method was tested on an open dataset for 3D reconstruction; testing showed an increase in reconstruction quality when using the proposed method with various view quality estimates compared to naive viewpoint selection strategies.
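A gradient-free variant of the selection loop can be sketched as sampling candidate cameras and keeping the best-scoring one; `view_quality` stands in for the paper's geometry-based estimates, and the sampling scheme is an illustrative assumption.

```python
import numpy as np

def best_viewpoint(view_quality, n_samples=256, radius=2.0, seed=0):
    """Sample candidate camera positions on a sphere around the object
    and return the one with the highest quality estimate. A sketch of
    gradient-free selection over arbitrary (non-predefined) views."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_samples, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    cameras = radius * directions
    scores = np.array([view_quality(cam) for cam in cameras])
    return cameras[scores.argmax()]

# e.g. a toy estimate that prefers views from above the object:
print(best_viewpoint(lambda cam: cam[2]))
```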

103-116
Abstract

Detecting anomalies or deviations from expected behavior in continuously streaming data is a complex process that necessitates the development of effective models capable of adaptively retraining over time. The human brain serves as a prime example of such a system: it continuously learns throughout life, with past experiences that once seemed erroneous gradually becoming integrated into commonplace knowledge. While modern neural network models have made significant advancements in recognizing text and images, they have diverged considerably from the original neuron models and no longer represent a singular algorithm akin to the one our brains utilize. Networks such as LSTM (Long Short-Term Memory) can account for both distant and immediate past information; however, they exhibit limitations in their retrainability. We align with the theories proposed by Jeff Hawkins, a prominent researcher in the field of bio-inspired intelligence, whose team is developing innovative cortical algorithms that follow current research on the functioning of the brain. In this context, vision and hearing can be conceptualized as sensors, with the data they provide being integrated within the model to generate continuous predictions for each input signal. In this article, we explore contemporary theories on this subject and present a custom implementation of these concepts in the Erlang programming language.
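A toy Python analogue of the prediction-based idea (the paper's implementation follows HTM-style cortical algorithms in Erlang; this sketch only illustrates scoring by prediction error with continual adaptation):

```python
class OnlinePredictor:
    """Streaming anomaly scorer in the spirit of prediction-based
    models: keep an exponentially weighted estimate of the signal,
    score each new value by its prediction error, and keep adapting
    so yesterday's anomaly becomes tomorrow's norm. Illustrative only."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.estimate = None

    def score(self, value):
        if self.estimate is None:
            self.estimate = value
            return 0.0
        error = abs(value - self.estimate)
        # adapt toward the new value: the model retrains continuously
        self.estimate += self.alpha * (value - self.estimate)
        return error
```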

117-132
Abstract

Code review is essential for software quality but labor-intensive in distributed teams. Current automated comment generation systems often rely on evaluation metrics focused on textual similarity. These metrics fail to capture the core goals of code review, such as identifying bugs and security flaws and improving code reliability. Semantically equivalent comments can receive low scores if worded differently, and inaccurate suggestions can confuse developers. This work aims to develop an automated code review generator focused on producing highly relevant and applicable feedback for code changes. The approach leverages Large Language Models, moving beyond basic generation. The core methodology involves the systematic design and incremental application of sophisticated prompt engineering strategies. Key strategies include step-by-step reasoning instructions, providing the model with relevant examples (few-shot learning), enforcing structured output formats, and expanding contextual understanding. Crucially, a dedicated intelligent filtering stage is introduced: an LLM-as-a-Judge technique acts as an evaluator to rank generated comments and filter out irrelevant, redundant, or misleading suggestions before presenting results. The approach was implemented and tested using the Qwen/Qwen2.5-Coder-32B-Instruct model. Evaluation by the original code authors demonstrated significant improvements: the optimal prompt strategy yielded a 2.5-fold increase in the proportion of applicable reviews (reaching 37%) and a 1.6-fold increase in good comments (reaching 61%) compared to a baseline. Providing examples enhanced comment quality, and the evaluator filter proved highly effective in boosting output precision. These results represent a substantial advance towards generating genuinely useful, actionable feedback. By prioritizing relevance and applicability, the approach significantly enhances the practical utility and user experience of automated code review tools for software developers.
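The filtering stage can be sketched as a judge prompt plus a score threshold; the prompt wording, the 1-5 scale, and the `llm` callable are illustrative assumptions, not the paper's exact setup.

```python
JUDGE_PROMPT = """You are reviewing candidate code-review comments.
Rate the following comment for the given diff from 1 (useless) to 5
(clearly applicable). Answer with a single digit.

Diff:
{diff}

Comment:
{comment}
"""

def filter_comments(diff, comments, llm, threshold=4):
    """Rank candidate comments with an LLM judge and keep only those
    scored at or above `threshold`. `llm` is any callable mapping a
    prompt string to a completion string; a sketch of the
    LLM-as-a-Judge filtering idea, not the paper's implementation."""
    kept = []
    for comment in comments:
        reply = llm(JUDGE_PROMPT.format(diff=diff, comment=comment))
        digits = [int(ch) for ch in reply if ch.isdigit()]
        if digits and digits[0] >= threshold:
            kept.append((digits[0], comment))
    # highest-rated suggestions first
    return [comment for _, comment in sorted(kept, reverse=True)]
```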

133-146
Abstract

The robustness of neural networks to adversarial perturbations in black-box settings remains a challenging problem. Most existing attack methods require an excessive number of queries to the target model, limiting their practical applicability. In this work, we propose an approach in which a surrogate student model is iteratively trained on failed attack attempts, gradually learning the local behavior of the black-box model. Experiments show that this method significantly reduces the number of queries required while maintaining a high attack success rate.
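The loop can be sketched as follows: every failed query becomes a training example for the surrogate, whose input gradient then guides the next perturbation. The `fit` and `input_gradient` helpers are assumed for illustration; the paper's training procedure is not reproduced here.

```python
import numpy as np

def surrogate_guided_attack(query_model, surrogate, fit, x, y_true,
                            eps=0.03, rounds=10):
    """Iteratively attack a black-box `query_model`, retraining a
    `surrogate` on every failed attempt so it gradually mimics the
    target's local behavior. `x` is a numpy array scaled to [0, 1];
    `fit(surrogate, X, Y)` trains on the query history, and the
    surrogate's `input_gradient` is an assumed helper used for an
    FGSM-style step. An illustrative sketch, not the paper's method."""
    history_x, history_y = [], []
    x_adv = x.copy()
    for _ in range(rounds):
        y_pred = query_model(x_adv)            # one query per round
        if y_pred != y_true:
            return x_adv                       # attack succeeded
        history_x.append(x_adv.copy())         # failed attempt: learn
        history_y.append(y_pred)
        fit(surrogate, np.array(history_x), np.array(history_y))
        g = surrogate.input_gradient(x_adv, y_true)  # assumed helper
        x_adv = np.clip(x_adv + eps * np.sign(g), 0.0, 1.0)
    return None                                # query budget exhausted
```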

147-174
Abstract

The problem of detecting network attacks is becoming particularly important in the context of the increasing complexity of cyber threats and the limitations of traditional signature-based methods. This paper provides a comprehensive analysis of five machine learning algorithms with a focus on model interpretability and on processing imbalanced simulated network traffic data. The main objective is to increase the accuracy of detecting cyber-attacks, including DDoS and port scanning, using a decision tree, logistic regression, random forest, and other methods. The study was performed in Python 3.13 using the scikit-learn, XGBoost, and TensorFlow libraries. The choice of tools is determined by the specifics of the task: for classical methods (trees, logistic regression) and ensemble approaches (Random Forest, XGBoost), scikit-learn turned out to be optimal, while for neural network experiments (RProp MLP) TensorFlow/Keras provided a user-friendly interface for prototyping. PyTorch was not used because it did not provide advantages for binary classification on structured data, although its use could be justified for analyzing sequences or unstructured logs in future research. The decision tree demonstrated the highest accuracy, 99.4%, with a depth of 5 and the selection of 8 key features out of 18. After tuning, gradient boosting showed a comparable result, 99.58%, but its training took significantly longer (576 seconds versus 69 for the decision tree). The random forest achieved 97.98% accuracy, while logistic regression achieved 96.53%. Naive Bayes proved to be the least effective (86.48%), despite attempts at improvement using PCA. Linear regression transformed into a classifier showed an accuracy of 94.94%, which is lower than the ensemble methods but acceptable for a baseline approach. The practical value of the work is confirmed by testing on real network data. The results obtained can form the basis of hybrid systems combining several algorithms to increase detection reliability; for example, combining a fast decision tree for primary analysis with gradient boosting to refine complex cases makes it possible to balance speed and accuracy. The interpretability of the models deserves separate mention: trees and logistic regression not only showed good results but also allowed us to identify key indicators of attacks, which is critical for integration into existing security systems.
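The depth-5 tree with 8 of 18 selected features maps directly onto scikit-learn; a minimal sketch on synthetic imbalanced data (the paper's dataset and tuning are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for imbalanced traffic data: 18 features,
# roughly 20% attack traffic.
X, y = make_classification(n_samples=5000, n_features=18,
                           n_informative=8, weights=[0.8, 0.2],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = make_pipeline(
    SelectKBest(f_classif, k=8),                 # keep 8 of 18 features
    DecisionTreeClassifier(max_depth=5, random_state=42),
)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```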

175-190
Abstract

The ongoing digitalization of education requires new ways of presenting information and new mechanisms for retaining attention. The aim of the presented work is to propose a solution for embedding a large language model that interactively generates prompts of different types within an e-learning course on programming. The main approaches are the analysis of existing relatively small language models, the TOPSIS method for selecting the most appropriate one, prototyping, and the integration of the proposed software solution with the HEI's educational system. As a result, a service that can be integrated into learning management systems is presented. The paper also presents the results of testing the models that form the basis of the presented solution.
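TOPSIS itself is compact; below is a sketch of the ranking step, with criteria values and weights invented purely for illustration.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS. `matrix` is alternatives x
    criteria, `weights` sum to 1, and `benefit[j]` is True when
    criterion j is to be maximized. Returns closeness scores in
    [0, 1]; higher is better."""
    m = np.asarray(matrix, dtype=float)
    # vector-normalize each criterion, then apply weights
    v = m / np.linalg.norm(m, axis=0) * np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - ideal, axis=1)
    d_worst = np.linalg.norm(v - anti, axis=1)
    return d_worst / (d_best + d_worst)

# e.g. candidate models scored by (accuracy, latency ms, memory GB);
# only accuracy is a benefit criterion:
scores = topsis([[0.81, 120, 7.2], [0.78, 60, 3.9], [0.84, 210, 13.0]],
                weights=[0.5, 0.3, 0.2], benefit=[True, False, False])
print(scores.argmax())   # index of the preferred model
```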

191-206
Abstract

This study introduces an AI-driven assistant prototype that automates the generation of data visualization scripts from natural language queries, eliminating the need for users to have programming skills. The article examines research aimed at developing tools for effective data visualization, compares AI-based data visualization systems, and shows the limitations of the existing tools. The proposed approach to data visualization is based on integrating a knowledge-driven DSM platform (language toolkits) and generative AI tools. The proposed methodology divides data visualization tasks into two distinct types: standard and non-standard. Standard tasks are solved with a code-generation approach based on prompts within a visual environment. Non-standard tasks are handled by extending existing libraries with user-defined packages. The language-oriented approach with DSM tools effectively unifies both categories: for standard tasks, users work with pre-existing DSLs and adjust parameters as necessary, whereas for non-standard tasks, users develop new DSLs with language toolkits that automate visual DSL creation and code generation. The core of the language toolkits is a multifaceted ontology. By integrating a large language model (LLM) with a knowledge-driven framework and a multifaceted ontology, the system enables dynamic, context-aware visualization workflows that ensure semantic traceability and reproducibility. The ontology not only stores descriptions of data visualization tasks but also facilitates the reuse of generated scripts, thereby enhancing the system’s adaptability and fostering collaborative analytical work among user communities. A dataset containing entries and variables from different domains is used to demonstrate the functionality of the prototype. The article provides examples of developing several visualization options, demonstrating the application of the proposed approach. Case studies demonstrate the prototype’s efficacy in creating histograms, scatter plots, and other visualizations while reducing technical barriers for users. Future work will extend the assistant’s functionality by incorporating user-defined visualization packages and additional LLM training to address non-standard tasks and complex visualization scenarios.
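For the "standard task" path, prompt-driven code generation can be pictured as filling a script template; a deliberately simplified sketch in which the parameters an LLM would extract from the user's query are passed in directly (template and names are illustrative, not the prototype's DSLs):

```python
HIST_TEMPLATE = """\
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv({csv!r})
df[{column!r}].plot.hist(bins={bins})
plt.xlabel({column!r})
plt.savefig({out!r})
"""

def generate_hist_script(csv, column, bins=20, out="hist.png"):
    """Instantiate the histogram template; in the full system an LLM
    extracts these parameters from the natural-language query, and the
    generated script is stored in the ontology for reuse."""
    return HIST_TEMPLATE.format(csv=csv, column=column, bins=bins, out=out)

print(generate_hist_script("data.csv", "age", bins=30))
```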

207-218
Abstract

The paper presents the development of the Knowledge-based Intelligence for Sustainability Assessment (KISA) system for the comprehensive assessment of the sustainability of Russian regions, which uses a large language model (LLM) with retrieval-augmented generation (RAG) and Rosstat data. KISA automatically selects relevant indicators based on users’ textual queries, determines their weights, and calculates regional ratings, overcoming the limitations of traditional methods associated with high resource costs, subjectivity, and low adaptability. The system reduces the time required to form a rating to 10 minutes, 140 times faster than existing approaches; financial costs are reduced by a factor of 16 due to the minimization of expert participation. The agreement with expert evaluations is 68%, confirming the validity of the method. KISA provides a web interface with map visualization, enhancing flexibility of analysis; the possibility of improvement through the addition of new sources ensures the continuous incorporation of experts’ experience. The results of the study contribute to the improvement of regional sustainability assessment and can be used in management decision-making.
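The rating step reduces to a weighted aggregation of normalized indicators; a sketch with made-up indicator names and values (in KISA, the LLM selects the indicators and weights from the user's query):

```python
import pandas as pd

def regional_rating(df, weights):
    """Min-max normalize each selected indicator and combine the
    results into one sustainability score per region; higher is
    better. Column names and weights are illustrative."""
    scores = pd.Series(0.0, index=df.index)
    for indicator, weight in weights.items():
        col = df[indicator]
        scores += weight * (col - col.min()) / (col.max() - col.min())
    return scores.sort_values(ascending=False)

df = pd.DataFrame(
    {"gdp_per_capita": [450, 610, 380], "air_quality": [0.7, 0.5, 0.9]},
    index=["Region A", "Region B", "Region C"])
print(regional_rating(df, {"gdp_per_capita": 0.6, "air_quality": 0.4}))
```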

219-234
Abstract

The rapid advancement of AI technologies, particularly Large Language Models (LLMs), has sparked interest in their integration into Multi-Agent Systems (MAS). This holds substantial promise for applications such as smart homes, where it can significantly enhance user experience by optimizing comfort, energy efficiency, and security. Despite the potential benefits, the implementation of LLM-based MAS faces several challenges, including the risk of hallucinations, scalability issues, and concerns about the reliability of these systems in real-world applications. This study explores the development of MAS incorporating LLMs, with a focus on mitigating hallucinations through the integration of formal logical models for knowledge representation and decision-making, along with other machine learning methods. To demonstrate the efficacy of this approach, we conducted experiments with a plant care module within a smart home system. The results show that our approach can significantly reduce hallucinations and enhance the overall reliability of the system. Further research will focus on refining these methods to improve adaptability and scalability, ensuring the system’s functionality in real-world environments.
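A minimal sketch of the guardrail idea: an LLM-proposed action is executed only if a formal rule licenses it under the current sensor readings. The rules, sensors, and names below are illustrative, not the paper's logical models.

```python
def validate_action(action, sensors, rules):
    """Accept an LLM-proposed action only if some formal rule licenses
    it given the current sensor readings; otherwise fall back to a
    safe default, rejecting hallucinated or unjustified actions."""
    for rule_action, condition in rules:
        if rule_action == action and condition(sensors):
            return action
    return "no_op"

# Toy plant-care rules: water only when the soil is actually dry.
rules = [
    ("water_plant", lambda s: s["soil_moisture"] < 0.3),
    ("turn_on_lamp", lambda s: s["light_level"] < 200),
]
print(validate_action("water_plant",
                      {"soil_moisture": 0.6, "light_level": 150},
                      rules))   # -> "no_op": moisture is fine
```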

235-250
Abstract

Hybrid fuzzing and dynamic symbolic execution have become a vital part of the secure software development lifecycle. The proportion of code being developed for the ARM and RISC-V architectures is constantly increasing, making their effective analysis a top priority. This work addresses that task by developing methods for dynamic symbolic execution and hybrid fuzzing for modern RISC architectures: «Baikal-M» (ARM/AArch64) and RISC-V 64. Based on modeling the symbolic semantics of machine instructions, the developed approaches are integrated into the Sydr tool within the Sydr-Fuzz framework and aim to enhance the efficiency of hybrid fuzzing. Key results include algorithms for processing indirect branches with accurate determination of target addresses, and support for the RISC-V integer instruction set in the open-source symbolic execution framework Triton, which provides the community with a ready-made foundation for creating dynamic analysis tools.
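To hint at what the Triton side looks like, here is a minimal sketch that symbolically executes one AArch64 instruction through Triton's Python bindings; the RV64 case is analogous once the RISC-V support described here is in place. The instruction and register choices are illustrative, and this is not Sydr code.

```python
# Requires the Triton dynamic binary analysis framework built with
# its Python bindings.
from triton import ARCH, Instruction, TritonContext

ctx = TritonContext(ARCH.AARCH64)          # «Baikal-M» is AArch64
ctx.symbolizeRegister(ctx.registers.x1)    # treat x1 as symbolic input

inst = Instruction(b"\x20\x00\x02\x8b")    # add x0, x1, x2
inst.setAddress(0x1000)
ctx.processing(inst)                       # build symbolic semantics

# x0 now carries an AST expressed over the symbolic x1
print(ctx.getSymbolicRegister(ctx.registers.x0))
```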

251-270
Abstract

Nowadays, automated dynamic analysis frameworks for continuous testing are in high demand to ensure software safety and satisfy security development lifecycle (SDL) requirements. The security bug hunting efficiency of cutting-edge hybrid fuzzing techniques outperforms widely utilized coverage-guided fuzzing. We propose an enhanced dynamic analysis pipeline to raise the productivity of automated bug detection based on hybrid fuzzing. We implement the proposed pipeline in the continuous fuzzing toolset Sydr-Fuzz, which is powered by a hybrid fuzzing orchestrator integrating our DSE tool Sydr with libFuzzer and AFL++. Sydr-Fuzz also incorporates security predicate checkers, the crash triaging tool Casr, and utilities for corpus minimization and coverage gathering. Benchmarking our hybrid fuzzer against alternative state-of-the-art solutions demonstrates its superiority over coverage-guided fuzzers while remaining on par with advanced hybrid fuzzers. Furthermore, we confirm the relevance of our approach by discovering 85 new real-world software flaws within the OSS-Sydr-Fuzz project. Finally, we open the Casr source code to the community to facilitate examination of existing crashes.
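Schematically, the orchestration can be pictured as the loop below; all callables are placeholders for the roles played by the coverage-guided fuzzer, Sydr, and Casr, not Sydr-Fuzz's actual interfaces.

```python
def hybrid_fuzz(corpus, fuzz_step, dse_step, triage, rounds=100):
    """Skeleton of a hybrid-fuzzing loop of the kind described above.
    `corpus` is a set of inputs; `fuzz_step(corpus)` runs the
    coverage-guided fuzzer and returns (new_inputs, crashes);
    `dse_step(seeds)` runs DSE to invert branch conditions and returns
    new inputs; `triage(crashes)` deduplicates crashes, the role Casr
    plays in the pipeline. An illustrative sketch only."""
    crashes = []
    for _ in range(rounds):
        new_inputs, new_crashes = fuzz_step(corpus)
        corpus |= new_inputs
        crashes += new_crashes
        # hand a few small seeds to DSE to crack branches fuzzing missed
        corpus |= dse_step(sorted(corpus, key=len)[:10])
    return triage(crashes)
```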



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)