This research studies the bisimulation relation for memory finite automata (MFAs), which serve as the automata model for extended regular expressions in a series of prior works and capture the expressiveness of named capture groups. We propose an experimental algorithm for checking bisimulation of one-memory MFAs. For multi-memory automata, we show that, in some borderline cases, the bisimulation problem is closely related to the question of whether a parameterized word is always a solution of a given word equation of arbitrary form.
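For intuition, here is a small example of the latter question (ours, for illustration; not taken from the paper):

```latex
% For the word equation in a variable X over the letter a:
X a = a X
% every member of the parameterized family X = a^{n}, n >= 0, is a solution,
% since a^{n} a = a a^{n} = a^{n+1}. Deciding whether such a parameterized
% family solves an equation of arbitrary form is the kind of question the
% bisimulation problem relates to.
```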
The paper discusses issues related to high-performance computing (HPC) systems designed to solve numerical simulation problems. A detailed description is given of the architecture, functional composition, system software, and information processing features of such HPC systems. The authors draw on their own long-term experience in creating HPC systems for Russian scientific and industrial centers. The work may be useful to specialists involved in the development and operation of modern high-performance computing systems intended for scientific research.
Full-system cross-ISA emulation is widely used nowadays but is known to be slow. A major contribution to the slowdown comes from the software MMU that translates guest virtual addresses. In this article we examine an optimization that moves part of this address translation work to the hardware MMU of the host system. To this end, an extra view of the whole guest virtual address space is added to the address space of the emulator process using the mmap system call. Once the mapping is established, the translated binary code can apply a fixed offset correction to a guest virtual address instead of dynamically searching for the needed offset in the software TLB. The additional view of the guest virtual address space is kept coherent with the guest page tables. This approach uses fewer host instructions per guest memory instruction, which leads to notable emulation acceleration, given the large proportion of memory instructions in the guest execution flow. Measurements show speedups of up to 271% on benchmark tests and up to 217% on a real-world program. Ideas are proposed for overcoming some limitations of the described approach.
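The contrast between the two translation paths can be sketched as follows. This is a conceptual Python illustration with hypothetical constants, not the emulator's translated code, which performs the addition in host machine instructions:

```python
# Conceptual sketch of the two guest-to-host translation paths (all names
# and constants are illustrative assumptions, not the emulator's code).

PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1
GUEST_BASE = 0x7F00_0000_0000   # hypothetical host offset of the mmap'ed view

def slow_page_walk(page):
    """Stand-in for a guest page-table walk (expensive, done on a TLB miss)."""
    return page << PAGE_BITS     # identity mapping, just for the sketch

def softmmu_translate(tlb, guest_vaddr):
    """Baseline path: dynamic lookup in a software TLB, refilled on a miss."""
    page = guest_vaddr >> PAGE_BITS
    if page not in tlb:
        tlb[page] = slow_page_walk(page)
    return tlb[page] | (guest_vaddr & PAGE_MASK)

def fixed_offset_translate(guest_vaddr):
    """Optimized path: once mmap has placed a coherent view of the whole
    guest virtual address space at GUEST_BASE, translation is a single
    addition that the host hardware MMU then resolves."""
    return GUEST_BASE + guest_vaddr
```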
The semantic gap is one of the key problems in developing solutions for full-system dynamic analysis. At the hypervisor level, tools have access only to low-level binary data, while analysis requires high-level information about the state of guest operating system objects. Virtual machine introspection approaches solve this problem. Unfortunately, existing implementations of these approaches suffer from performance issues and limited functionality: they require the user to embed special agents into the virtual machine image or to provide debugging symbols for the OS kernel, and they work only for specific systems and processor architectures. The article presents a number of solutions that reduce overhead and increase the versatility of the analysis tool. A distinctive feature of the developed introspection approach is that it requires no additional actions from the user, collecting the information necessary for analysis while the OS boots on the emulator.
This article provides an overview of modern transfer learning methods in network intrusion detection systems (IDS), focusing on the problem of model stability under network data drift, traffic variability, and the emergence of new types of attacks. The main transfer paradigms – parametric, feature-based, and relationship-based – and their adaptation to the tasks of anomaly detection and network intrusion classification are considered. Particular attention is paid to the differences between methods based on the analysis of statistical properties of network flows and methods based on packet analysis. Based on an analysis of existing work, it is demonstrated that transfer learning can significantly improve the robustness of network IDSs to changes in infrastructure and data distributions, but it faces problems of negative transfer, a lack of representative source domains, and architectural complexity. Finally, key directions for further research are formulated, including drift-aware adaptive models, transfer under limited data conditions, and integration with streaming machine learning methods.
Correct design and implementation of concurrent algorithms is a crucial part of modern real-time operating system development. One of the main steps in this process is verification of such algorithms within the programming language memory model. The article describes the integration of ThreadSanitizer, a widely used LLVM tool for data race detection, into the RTOS kernel environment and discusses its advantages and disadvantages compared to other data race detection tools. Among other things, the semantics of context switches and interrupt management within the happens-before synchronization model is considered. In conclusion, the results of the ThreadSanitizer integration are compared with current approaches to concurrency bug detection in the RTOS kernel.
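To make the happens-before treatment of context switches concrete, below is a minimal vector-clock sketch in Python. It illustrates the synchronization model only (ThreadSanitizer implements this machinery inside compiler-rt); the key idea is that a switch acts as a release by the outgoing thread paired with an acquire by the incoming one:

```python
# Minimal vector-clock model of a context switch as a release/acquire pair
# (illustrative assumption about the model, not the kernel integration).

from collections import defaultdict

class VectorClock(dict):
    def tick(self, tid):
        self[tid] = self.get(tid, 0) + 1
    def join(self, other):
        for tid, t in other.items():
            self[tid] = max(self.get(tid, 0), t)

thread_clock = defaultdict(VectorClock)  # logical clock of each thread
sched_clock = VectorClock()              # clock of the scheduler "object"

def context_switch(prev_tid, next_tid):
    thread_clock[prev_tid].tick(prev_tid)
    sched_clock.join(thread_clock[prev_tid])   # release by outgoing thread
    thread_clock[next_tid].join(sched_clock)   # acquire by incoming thread
```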
In this paper we present an approach to static analysis of Python programs based on a low-level intermediate representation and devirtualization, which enables interprocedural and intermodule analysis. This approach can be used to analyze Python programs without type annotations and to find complex defects inaccessible to traditional AST-based analysis tools. Using CPython bytecode as a base, a representation suitable for static analysis is constructed, and call resolution is performed via an interprocedural devirtualization algorithm. We implemented the proposed approach in a static analyzer for finding errors in C, C++, Java, and Go programs and achieved good results on open-source projects with minimal modifications to the existing detectors. The detectors relevant to Python had a true positive rate from 60% up to 96%. This demonstrates that our approach makes it possible to apply techniques used for the analysis of statically typed languages to Python.
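As a small illustration of the low-level view such an approach starts from, CPython's standard dis module exposes the bytecode from which an analysis-friendly representation can be built (the paper's intermediate representation itself is not shown here):

```python
import dis

def greet(name):
    return "Hello, " + name

# Print the raw bytecode instructions that a low-level IR can be built from.
for instr in dis.get_instructions(greet):
    print(instr.offset, instr.opname, instr.argrepr)
```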
Error recovery is a critical component of parsing technology, particularly in applications such as IDEs and compilers, where a single syntax error should not prevent further analysis of the input. This paper presents PereFlex, a tool for extensive experimental evaluation of error recovery in JVM-based parsers. Our evaluation is based on real-world parsers for Java and on erroneous programs written by users. The results demonstrate that while some strategies are fast, they often fail to provide meaningful recovery, whereas advanced methods offer better recovery quality at the cost of increased computational overhead.
For detecting race conditions in multithreaded programs, dynamic analysis methods can be used. Dynamic methods are based on observing program behavior during real executions. Since analyzing all possible execution paths is generally infeasible (due to the combinatorial explosion of possible thread interleavings), dynamic methods can overlook bugs that manifest only under specific conditions or thread interleavings. This limitation applies, for instance, to the approach implemented in the previous version of the RaceHunter tool, which effectively detects race conditions but may still miss certain cases. To address the combinatorial explosion problem, context bounding can be used. Context bounding is a dynamic analysis technique that limits the number of thread switches in each explored execution path, enabling more scalable exploration. This method is able to detect bugs missed by other techniques with a bound of only two preemptive thread switches.
In this work, we present an implementation of context bounding within the RaceHunter tool, which provides a unified framework for describing various dynamic analysis techniques. The evaluation shows that the proposed approach is able to detect race conditions that other methods miss, though at the cost of significantly increased analysis time; as expected, this increase is caused by repeated executions. Still, the implementation is an important foundation for future integration with other race detection techniques, in particular with the approach already implemented in the RaceHunter tool.
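A minimal Python sketch of the bounding idea follows; the scheduler, step representation, and preemption accounting are simplified assumptions for exposition, not RaceHunter's implementation:

```python
# Enumerate interleavings of per-thread step sequences, allowing at most
# `bound` preemptive switches (a switch away from a still-runnable thread).

def bounded_schedules(threads, bound, running=0, preemptions=0, trace=()):
    """threads maps tid -> list of remaining steps; yields full traces."""
    if all(not steps for steps in threads.values()):
        yield trace
        return
    for tid, steps in threads.items():
        if not steps:
            continue
        # switching away from a runnable thread counts as a preemption
        cost = 1 if (tid != running and threads[running]) else 0
        if preemptions + cost > bound:
            continue
        rest = dict(threads)
        rest[tid] = steps[1:]
        yield from bounded_schedules(rest, bound, tid,
                                     preemptions + cost, trace + (steps[0],))

# e.g. two threads, at most two preemptive switches:
for tr in bounded_schedules({0: ["a1", "a2"], 1: ["b1", "b2"]}, bound=2):
    print(tr)
```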
The article explores the potential of artificial intelligence for discovering new etymologies. It consists of two parts: the first describes the structure of the neural network, while the second provides examples of new types of etymologies, including Erzya additions to existing well-known etymologies, separate Finnic-Erzya parallels, and new hypotheses regarding borrowings from Baltic and Germanic languages. The purpose is to demonstrate the kinds of new etymologies that can be proposed within a relatively short time frame for languages with an established etymological tradition through the use of a neural network. The study utilizes a Finnish-Russian dictionary containing 17,212 lexemes and an Erzya-Russian dictionary comprising 8,512 lexemes, both hosted on the LingvoDoc platform. A neural network capable of proposing new etymologies for dictionaries on the lingvodoc.ispras.ru platform has been developed. Using this tool, Finnish and Erzya dictionaries were processed, resulting in the identification of over 100 new etymologies. Among these, 16 etymologies are discussed in the article, pertaining both to native Finno-Ugric vocabulary and borrowings.
In the modern restaurant business, accurate mapping of product nomenclatures between restaurants and suppliers is a critical task: effective inventory management and procurement optimization directly impact business profitability. As the number of suppliers and the variety of products grow, traditional mapping methods become less efficient. This study proposes using large language models (LLMs) to automate and improve the accuracy of product matching. In a pilot project for a restaurant holding, we tested five product groups (shrimp, eel, parmesan cheese, cottage cheese, butter), achieving an average test accuracy of 83.8%. The solution architecture leverages prompt engineering, low-code platforms such as Flowise, and Telegram integration for user-friendly processing. Key challenges, including semantic ambiguity and model hallucinations, were addressed via domain-specific dictionaries and validation. The approach reduces manual effort by approximately 90%, enabling scalable supply chain solutions applicable beyond restaurants to retail and e-commerce.
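As an illustration of the validation step, here is a hedged Python sketch; all names, the dictionary contents, and the prompt are our assumptions, not the pilot's code. An LLM proposes a supplier item for a restaurant item, and a domain-specific dictionary rejects matches outside the product group:

```python
# Illustrative sketch of LLM matching with dictionary-based validation.
# Everything here (names, dictionary, prompt) is an assumption for exposition.

DOMAIN_DICT = {
    "shrimp": {"shrimp", "prawn", "vannamei"},
    "parmesan cheese": {"parmesan", "parmigiano"},
}

def validate(product_group, supplier_name):
    """Accept a proposed match only if the supplier name contains a term the
    domain dictionary associates with the group; filters hallucinations."""
    terms = DOMAIN_DICT.get(product_group, set())
    name = supplier_name.lower()
    return any(term in name for term in terms)

def match(llm, product_group, restaurant_item, supplier_items):
    """`llm` is any callable mapping a prompt string to a completion."""
    prompt = (f"Pick the supplier item that matches '{restaurant_item}' "
              f"from this list: {supplier_items}. Reply with the item only.")
    candidate = llm(prompt).strip()
    return candidate if validate(product_group, candidate) else None
```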
The article presents a model based on a convolutional neural network that maps a text image to an embedding vector encoding information about its font. The model consists of two identical convolutional blocks that aggregate features into a vector, which is then analyzed by linear layers to find differences. A model trained in this way is able to distinguish fonts while ignoring the text content, which makes it universal across various types of documents. The embedding vectors are tested on additional tasks, such as classifying text by weight and slant, demonstrating high accuracy and confirming their usefulness for analyzing stylistic features. Experiments with variable and handwritten fonts show the versatility of the model and its applicability to a variety of data. The results of a comparison with the baseline model confirm the effectiveness of the proposed architecture. However, limitations associated with low-quality data and multilingual texts have been identified. The code and models are published on GitHub (https://github.com/YRL-AIDA/FontEmb).
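A hedged PyTorch sketch of such a siamese arrangement is shown below; the layer sizes and the comparison head are illustrative assumptions, not the published model:

```python
# Siamese font-embedding sketch: two weight-shared convolutional encoders,
# linear head over the embedding difference (layer sizes are assumptions).

import torch
import torch.nn as nn

class FontEncoder(nn.Module):
    """Convolutional block mapping a text image to a font embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):
        return self.proj(self.conv(x).flatten(1))

class SiameseFontModel(nn.Module):
    """Shared encoder applied to both images; head scores font similarity."""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = FontEncoder(dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, img_a, img_b):
        diff = self.encoder(img_a) - self.encoder(img_b)
        return self.head(torch.abs(diff))   # same-font logit
```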
Numerical simulation of wave attractors is a costly problem that requires a precise calculation method as well as careful determination of setup parameters. These two factors make preprocessing methods necessary before CFD simulation. For a selected geometry and stratification, the coherent structure appears only in a certain range of perturbation frequencies, which is typically unknown in advance. To check whether the attractor forms, one can run ray tracing, which represents the propagation of narrow internal wave beams in the inviscid linear approximation of the Navier-Stokes equations. The present article describes an algorithm that can be used for ray tracing on a wide class of problems. It is shown that this method is capable of detecting specific forms of attractors under specific conditions. Additionally, an estimate of a ray convergence measure is proposed.
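A compact Python sketch of such a ray tracer for a simple trapezoidal tank is given below. The geometry, frequencies, and reflection rules are illustrative assumptions, not the paper's algorithm: rays follow straight characteristics of fixed slope and reflect preserving that slope, which produces focusing onto an attractor when one exists.

```python
# Illustrative 2D ray tracer in a trapezoid: bottom z=0, top z=H, vertical
# left wall x=0, sloping right wall x = L + GAMMA*z. Values are assumptions.
import math

H, L, GAMMA = 1.0, 2.0, 0.4
N, OMEGA = 1.0, 0.62                          # buoyancy / forcing frequency
S = math.sqrt(OMEGA**2 / (N**2 - OMEGA**2))   # slope of the characteristics

def trace(x, z, sx, sz, bounces):
    """Follow one ray (direction signs sx, sz = +-1); return hit points."""
    pts = [(x, z)]
    for _ in range(bounces):
        cand = []                              # positive-t wall intersections
        if sz < 0: cand.append((-z / (sz * S), "hor"))
        if sz > 0: cand.append(((H - z) / (sz * S), "hor"))
        if sx < 0: cand.append((-x / sx, "ver"))
        denom = sx - GAMMA * sz * S            # approach rate to sloping wall
        if denom > 0: cand.append(((L + GAMMA * z - x) / denom, "slope"))
        t, wall = min(c for c in cand if c[0] > 1e-12)
        x, z = x + sx * t, z + sz * S * t      # advance along characteristic
        if wall == "hor":                      # floor/ceiling: flip vertical
            sz = -sz
        elif wall == "ver":                    # vertical wall: flip horizontal
            sx = -sx
        elif 1.0 / GAMMA > S:                  # supercritical slope: focusing
            sx = -sx
        else:                                  # subcritical slope
            sz = -sz
        pts.append((x, z))
    return pts

points = trace(0.5, 0.5, 1, 1, 200)  # reflection points approach the attractor
```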
The paper proposes a mathematical model for studying the erosion of the coastal slope of the Pemzenskaya channel (Amur River) in the area of the overflow dam after the formation and expansion of the proran (breach). The proran in the overflow dam was formed due to erosion of the right bank during the floods of 2019-2022. Since the time needed to establish the hydrodynamic parameters of the flow is much shorter than the time over which its discharge changes, the flow in the dam area is described within the quasi-stationary approximation. The algebraic model of Leo C. van Rijn is used to model the turbulent viscosity of the flow. Changes in the bottom and bank elevations of the riverbed are calculated using the analytical model of sediment movement developed in the works of Petrov and Potapov (2019). To prevent siltation of the proran during lateral movement of bottom material from the dry bank, a runoff term is introduced into the equation of bottom deformations. This term regulates the proran depth, which asymptotically tends to its regime depth. An algorithm based on the finite element method has been developed to solve the problem numerically. The calculated coastal deformations were compared with experimental data and showed good qualitative and quantitative agreement. The experimental data were obtained from the open-access Amur information system.
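The runoff term can be viewed as a relaxation source in an Exner-type bottom deformation equation. A schematic form is given below; the notation is ours and only illustrates the mechanism described above, not the paper's exact equation:

```latex
\frac{\partial \zeta}{\partial t}
  + \frac{1}{1-\varepsilon}\,\nabla \cdot \mathbf{q}_s
  = -\beta \left( \zeta - \zeta_{\mathrm{reg}} \right)
```

Here $\zeta$ is the bottom elevation, $\varepsilon$ the sediment porosity, $\mathbf{q}_s$ the sediment flux, and $\zeta_{\mathrm{reg}}$ the regime bottom level; the right-hand side drives the proran depth asymptotically toward its regime value.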