The paper studies the length of an adaptive distinguishing sequence for a family of initialized, complete, observable, possibly nondeterministic Finite State Machines (FSMs). An upper bound on the length of such a sequence is established and shown to be attainable. It is also discussed how the obtained results can be applied to constructing adaptive diagnostic test suites based on an FSM model.
The paper presents an approach to implementing JavaBeans components so that they support dynamic component composition: creating user-defined components without compiling them, by manipulating pre-existing components instead. The JavaBeans component model, introduced at the dawn of Java technology, has a limitation that turns out to be significant. A JavaBean component is, by definition, a serializable Java class with a public no-argument constructor; in addition, the JavaBeans design patterns allow instances of JavaBeans components to be used in a dedicated manipulation environment. The goal of the manipulations is to bring the instances, and the behavior of their aggregation as a whole, into the required state, so that the aggregation can later be serialized and deserialized in a similar environment. There is a hidden contradiction: starting from a predefined set of JavaBeans components (classes in a class-based object-oriented environment), we end up with a prototype-based style of using the instance aggregation. To use a user-defined aggregation as a new composed component, we have to change the programming paradigm and generate code in a static-like way of component creation. We propose an evolution of the JavaBeans component model that enables dynamic creation of user-defined composed components without cloning the aggregation.
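For illustration only (this sketch is not code from the paper), the following hypothetical class satisfies the JavaBean definition cited above: it is serializable, has a public no-argument constructor, and exposes its state through getter/setter pairs following the JavaBeans design patterns:

    import java.io.Serializable;

    // A minimal, hypothetical JavaBean: serializable, with a public
    // no-argument constructor and getter/setter pairs for its properties.
    public class GreetingBean implements Serializable {
        private static final long serialVersionUID = 1L;

        private String message = "Hello";

        // Public no-argument constructor required by the JavaBeans model.
        public GreetingBean() {
        }

        // Property "message" exposed via the standard design patterns,
        // so a manipulation environment can discover and set it.
        public String getMessage() {
            return message;
        }

        public void setMessage(String message) {
            this.message = message;
        }
    }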
Dynamic polymorphism is widely used in situations that require identifying and processing alternatives during program execution. It allows programs to be extended flexibly without changing previously written code. In statically typed object-oriented programming languages it is widely used through the combination of inheritance and virtual method dispatch. The Go and Rust programming languages also support dynamic polymorphism, implementing it via static duck typing. Another approach to implementing dynamic polymorphism is offered by the procedural-parametric programming paradigm, which additionally provides direct support for multimethods and flexible evolutionary extension of both the alternative data and the functions that process them. The paper compares the capabilities of the object-oriented and procedural-parametric paradigms for supporting agile software development. The basic techniques that enable extending program functionality are compared, and the implementation of design patterns is considered.
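As a minimal sketch of the object-oriented mechanism described above (illustrative code, not taken from the paper), the following Java example shows how a new alternative is added by defining a new class, without modifying the previously written interface, implementations, or processing code:

    // Existing code: an interface, one implementation, and a processing
    // routine that dispatches dynamically through the interface.
    interface Shape {
        double area();
    }

    class Circle implements Shape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    class AreaReport {
        static double total(Iterable<Shape> shapes) {
            double sum = 0.0;
            for (Shape s : shapes) {
                sum += s.area(); // virtual dispatch picks the right implementation
            }
            return sum;
        }
    }

    // Extension: a new alternative is added without modifying any of the code above.
    class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }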
Static analysis methods determine the properties of a program without executing it; different properties make it possible to solve different tasks. In this paper, we review 34 papers on static analysis of Go (Golang) published since the release of Go 1.0 (2012–2025). Based on our analysis, we identify the main trends and methods used to perform static analysis, as well as the intermediate representations and the features of Go that affect the process. We also examine the challenges faced by developers of static analyzers. This survey will be helpful both for developers of static analyzers and for Go developers, providing a systematic view of current research on static analysis for Go.
This paper describes an approach to verifying the results of static code analysis using large language models (LLMs), which filters warnings to eliminate false positives. To construct the prompt for the LLM, the proposed approach retains information collected by the analyzer, such as abstract syntax trees of files, symbol tables, and type and function summaries. This information can either be included in the prompt directly or used to precisely identify the code fragments required to verify a warning. The approach was implemented in SharpChecker, an industrial static analyzer for the C# language. Testing on real-world code demonstrated an improvement in precision of up to 10 percentage points while maintaining high recall (0.8 to 0.97) for context-sensitive and interprocedural path-sensitive detectors of resource leaks, null dereferences, and integer overflows. For the unreachable code detector, using information from the static analyzer improved recall by 11–27 percentage points compared to an approach that only puts the program's source code into the prompt.
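A minimal sketch of the general idea, assuming hypothetical helper names and a hypothetical prompt layout (SharpChecker's actual data model and prompts are not reproduced here), of how analyzer-collected context might be folded into a verification prompt:

    // Purely illustrative sketch: the record fields and the prompt layout are
    // assumptions, not SharpChecker's actual data model or prompt format.
    record WarningContext(String checkerName,
                          String message,
                          String codeFragment,      // lines around the warning location
                          String calleeSummaries) { // summaries of functions on the path
    }

    class PromptBuilder {
        static String buildVerificationPrompt(WarningContext ctx) {
            return String.join("\n",
                "You are reviewing a static analysis warning.",
                "Checker: " + ctx.checkerName(),
                "Message: " + ctx.message(),
                "Relevant code:",
                ctx.codeFragment(),
                "Summaries of called functions:",
                ctx.calleeSummaries(),
                "Answer TRUE_POSITIVE or FALSE_POSITIVE and explain briefly.");
        }
    }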
This paper describes a mechanism for the automatic classification of static analysis warnings using machine learning methods. Static analysis is a tool for detecting potential vulnerabilities and bugs in source code; however, static analyzers often generate a large number of warnings, including both true and false positives, and manually analyzing all the defects found by the analyzer is a labor-intensive and time-consuming task. The developed automatic classification mechanism demonstrated a precision of more than 93% and a recall of about 96% on a set of warnings generated by the industrial static analysis tool Svace during the analysis of real-world projects. The dataset for the machine learning model is generated from the warnings and the source code metrics obtained during the static analysis of the project. The paper explores various approaches to feature selection and processing for the classifier, taking into account the characteristics of different machine learning algorithms. The mechanism's efficiency and its independence from the programming language allowed it to be integrated into the industrial static analysis tool Svace. Various approaches to integrating the mechanism were considered, accounting for the specifics of the static analyzer, and the most convenient one was selected.
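As a purely hypothetical illustration (the actual feature set used with Svace is not described here), a single training example combining warning attributes with source code metrics might be represented as follows:

    // Hypothetical feature row for one warning; the field names are illustrative
    // assumptions and do not reflect the actual Svace feature set.
    record WarningFeatures(String checkerId,        // which detector produced the warning
                           int warningSeverity,
                           int functionLineCount,   // metrics of the enclosing function
                           int cyclomaticComplexity,
                           int nestingDepth,
                           boolean isInTestCode,
                           boolean markedTruePositive) { // training label
    }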
This research proposes a hybrid approach to implementing performance-oriented compiler intrinsics. Compiler intrinsics are special functions that provide low-level functionality and performance improvements in high-level languages. Current implementations typically use either in-place expansion or call-based methods. In-place expansion can create excessive code size and increase compile time, but it can produce code that is more efficient in terms of execution time. Call-based approaches can lose performance due to call instruction overhead but win in compilation time and code size. We survey intrinsics implementations in modern virtual machine compilers: the HotSpot Java Virtual Machine and the Android Runtime. We implement our hybrid approach in the LLVM-based compiler of Ark VM, an experimental bytecode virtual machine with garbage collection and both dynamic and static compilation. We evaluate our approach against the in-place expansion and call-based approaches using a large set of benchmarks. The results show that the hybrid approach provides considerable performance improvements. For string-related benchmarks, the hybrid approach is 6.8% faster than the no-inlining baseline, while pure in-place expansion achieves only a 0.7% execution-time improvement over the hybrid implementation. We explore two versions of our hybrid approach: the "untouched" version lets LLVM control inlining decisions, while the "heuristic" version was developed after we observed LLVM's tendency to inline code too aggressively. This research helps compiler developers balance execution speed with reasonable code size and compile time when implementing intrinsics.
This paper describes source code annotations for static analysis. C/C++ attributes and JVM annotations are considered. The primary goals of and reasons for annotating source code for static analysis are given. The main aspects of the implementation of annotations in the Svace static analyzer are described.
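As a minimal illustration (a hypothetical annotation, not Svace's actual annotation set), a JVM annotation can tell an analyzer which values may be null so that missing checks can be reported at call sites:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical nullability annotation; real analyzers ship their own sets.
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD})
    @interface MaybeNull {
    }

    class UserRepository {
        // The annotation documents, for both readers and the analyzer,
        // that the result must be checked before use.
        @MaybeNull
        String findEmail(String userId) {
            return userId.isEmpty() ? null : userId + "@example.com";
        }
    }

    class Mailer {
        void send(UserRepository repo, String userId) {
            String email = repo.findEmail(userId);
            // An analyzer aware of @MaybeNull can warn here:
            // email may be null when userId is empty.
            System.out.println(email.toLowerCase());
        }
    }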
Automatic table header recognition remains a challenging task due to the diversity of table layouts, including multi-level headers, merged cells, and non-standard formatting. This paper is the first to propose a methodology for evaluating the performance of large language models on this task using prompt engineering. The study covers eight models and six prompt strategies in zero-shot and few-shot settings on a dataset of 237 tables. The results demonstrate that model size critically affects accuracy: large models (405 billion parameters) achieve F1 ≈ 0.80–0.85, while small ones (7 billion parameters) show F1 ≈ 0.06–0.30. Making prompts more elaborate with step-by-step instructions, search criteria, and examples improves the results only for large models; for small ones it leads to degradation due to context overload. The largest errors occur when processing tables with hierarchical headers and merged cells, where even large models lose recognition accuracy. The practical significance of this paper lies in identifying optimal prompt configurations for different types of models: for example, short instructions are effective for large models, while step-by-step instructions with search criteria are effective for medium-sized ones. This study opens up new possibilities for creating universal tools for the automatic analysis of table headers.
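As an illustrative sketch only (the wording below is an assumption and does not reproduce the six prompt strategies evaluated in the paper), zero-shot and few-shot prompts for header recognition can be assembled as plain strings:

    // Illustrative only: these prompt texts are assumptions, not the strategies
    // evaluated in the study.
    class HeaderPrompts {
        static String zeroShot(String tableAsText) {
            return "Identify the header rows and header columns of the following table.\n"
                 + "Return the header cells as a JSON list.\n\n"
                 + tableAsText;
        }

        static String fewShot(String tableAsText, String exampleTable, String exampleAnswer) {
            return "Identify the header rows and header columns of a table.\n\n"
                 + "Example table:\n" + exampleTable + "\n"
                 + "Example answer:\n" + exampleAnswer + "\n\n"
                 + "Now process this table:\n" + tableAsText;
        }
    }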
Implementing and developing chatbots as e-government services contributes significantly to the modernization and increased efficiency of public services. By leveraging Microsoft technologies, specifically the Azure OpenAI service, it is possible to develop intelligent chatbots rapidly and effectively. When integrated with the e-government portal, such chatbots offer users improved access to information and enable personalized communication between citizens and government institutions. A key issue currently lies in the lack of effective communication channels, which results in longer response times and reduced user satisfaction. The objective of this paper is to develop a chatbot that enhances service quality and brings public services closer to citizens via the e-government portal. The paper analyzes chatbot functionalities such as response generation, relevance checking of user queries, and information filtering. Furthermore, attention is given to legal and ethical considerations, data protection, and continuous model training to maintain data accuracy. The paper also explores and demonstrates how modern artificial intelligence technologies contribute to making public services more accessible to users. The proposed solution to the identified challenges involves implementing a chatbot model integrated with the e-government portal to improve communication. Ultimately, the focus is placed on the digitalization and modernization of public sector services to deliver benefits for society as a whole.
The paper is devoted to the study of plural forms of nouns in the southern, transitional, central, and northern dialects of Selkup. The material comprises corpus data of more than 85,000 tokens hosted on the LingvoDoc platform and kept in personal archives (FieldWorks Language Explorer files), as well as general grammatical and lexical works on the language. The plural of nouns can be expressed by the suffixes -t(V), -la, -i (-ni); by the contaminated marker -lat; by plural suffixes in combination with the marker of mutual connection -sa- and the collective plurality marker -mɨ- (giving -sat, -sala, -mɨt, -mɨtla); as well as by the suffix of mutual connection -mɨ- without additional plurality markers. The research found that the main plural suffix is -la in the southern and transitional language zones, and -t(V) in the central Vasyugan and Tym dialects as well as in the northern dialects. In the central Narym dialect, both markers -t(V) and -la are found, including the contaminated form -lat, with -t(V) more characteristic of the northern part and -la of the southern part. In the northern and central dialects, the plural marker -i (-ni) is used in possessive forms, while in the southern and transitional dialects it has been replaced by the marker -la in the same positions. In the southern, transitional, central, and northern dialects, kinship terms take the suffix -sa- together with -t(V), yielding -sat; more rarely, in the southern, central, and northern dialects, the collective suffix -mɨ- together with -t(V), yielding -mɨt, is observed. In the northern dialects, the marker -mɨ- is used to express collective plurality without additional number suffixes.
The article discusses the names of horse breeds that have lexical parallels in the Turkic languages of the Ural-Volga region and in the Mongolian languages. The research was based on data on the etymology and lexicology of the Turkic and Mongolian languages. An attempt was made to identify the distribution areas of individual lexemes. The search for etymologies and the mapping were carried out using the linguistic platform LingvoDoc. The following features of the Turkic-Mongolian parallels among the color designations in the Turkic languages of the Ural-Volga region have been identified: first, some of the color designations common to the Turkic languages of the Ural-Volga region and the Mongolian languages are genetically related and originate from Proto-Altaic forms; second, the Mongolian languages contain Turkic borrowed color names, and the Turkic languages contain Mongolian ones. It has been established that in the Turkic languages of the Ural-Volga region, Mongolian borrowings among color names are mainly names of horse coat colors, while in the Mongolian languages, Turkic borrowings can refer both to animal coat colors and to colors in general.
This article presents a comparative analysis of the material culture lexicon in the Izhma dialect of the Komi language, utilizing data from M.A. Castrén’s 1844 dictionary (revised 2022 edition) and a 2012 audio dictionary of the Beloyarsk village dialect processed and uploaded by E.V. Kashkin onto the LingvoDoc platform. The study aims to identify key trends in lexical dynamics by examining the interplay of indigenous vocabulary with borrowings, innovations, and archaisms over a period of more than 160 years. The Izhma dialect, shaped by intensive historical contacts with Russian and Nenets languages in a unique multi-ethnic environment, offers a significant case for understanding language vitality and shift. The research employed the LingvoDoc platform for processing and analyzing 127 lexemes from Castrén’s work and 167 from the modern audio dictionary. Lexemes were categorized into five thematic groups: dwelling, utensils and household items, clothing and footwear, tools and crafts, and transport. Each item was compared based on its presence in sources, phonological form, meaning, and etymology, identifying direct correspondences, archaization, innovation, lexical replacement, and phonetic-morphological changes. Results indicate distinct patterns across thematic groups. The "Tools and Crafts" category exhibited the highest proportion of archaisms (50%), reflecting the decline of traditional practices. Conversely, the "Transport" group showed the most significant innovation (73.9%), driven by new terminology and borrowings. The findings underscore that observed differences are not solely lexical transformations but also reflect varying recording completeness and focus between historical and modern sources. Overall, the Izhma dialect's material culture vocabulary reveals areas of both stability and active restructuring, providing valuable insights for reconstructing lexical subsystems, analyzing contact linguistics, and describing language evolution in peripheral Komi linguistic regions.
The paper presents a method for analyzing the layout of PDF documents based on graph neural networks (GNNs), which uses words as graph nodes to overcome the limitations of modern approaches based on text lines or local regions. The proposed WordGLAM model, built on modified graph convolutional layers, demonstrates that hierarchical structures can be constructed through word aggregation, which provides a balance between the accuracy of element detection and their semantic connectivity. Despite lagging behind state-of-the-art models (for example, Vision Grid Transformer) in accuracy metrics, the study reveals systemic problems of the field: data imbalance, ambiguity in word clustering ("chain links" and "bridges" between unrelated regions), as well as debatable criteria for selecting classes in the markup. The key contribution of this work is the formulation of new research tasks, including the optimization of vector representations of words, the use of edge embeddings, and the development of evaluation methods for complex word hierarchies. The results confirm the promise of the approach for creating adaptable models capable of processing documents in multiple formats (scientific articles, legal texts). The paper highlights the need for further research on regularization and on extending the training data, opening up ways to improve the portability of layout analysis methods to new domains. The code and models are published on GitHub (https://github.com/YRL-AIDA/wordGLAM).
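As a purely illustrative sketch (the node features and the edge-construction rule are assumptions, not the actual WordGLAM graph construction), a word-level graph for a page can be built by treating each word with its bounding box as a node and connecting spatially neighboring words:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative word-graph construction; the distance threshold and node
    // features are assumptions, not the actual WordGLAM pipeline.
    record Word(String text, double x, double y, double width, double height) {
        double centerX() { return x + width / 2; }
        double centerY() { return y + height / 2; }
    }

    record Edge(int from, int to) {
    }

    class WordGraphBuilder {
        // Connect two words if their centers are closer than a distance threshold.
        static List<Edge> build(List<Word> words, double maxDistance) {
            List<Edge> edges = new ArrayList<>();
            for (int i = 0; i < words.size(); i++) {
                for (int j = i + 1; j < words.size(); j++) {
                    double dx = words.get(i).centerX() - words.get(j).centerX();
                    double dy = words.get(i).centerY() - words.get(j).centerY();
                    if (Math.hypot(dx, dy) <= maxDistance) {
                        edges.add(new Edge(i, j));
                    }
                }
            }
            return edges;
        }
    }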
The paper studies the interpretability of two popular deep learning architectures, ResNet50 and Vision Transformer (ViT-224), in the context of classifying pathogenic microorganisms in images obtained with a scanning electron microscope after preliminary sample preparation using lanthanide contrast. In addition to standard quality metrics such as precision, recall, and F1 score, a key aspect was the study of the built-in attention maps of the Vision Transformer and the post-hoc interpretation of the trained ResNet50 model using the Grad-CAM method. The experiments were performed on the original dataset as well as on three of its modifications: with the background zeroed by thresholding, with image regions modified using inpainting, and with the background completely removed by zeroing background regions. To evaluate the generality of the attention mechanism in the Vision Transformer, a test was also conducted on the classic MNIST handwritten digit recognition task. The results showed that the Vision Transformer architecture exhibits more localized and biologically grounded attention heatmaps, as well as greater robustness to changes in background noise.





