The paper presents modifications of the fastText word embedding model that are based solely on character n-grams and are intended for morphological analysis of texts. fastText is a library for text classification and for learning vector representations of words. The representation of each word is computed as the sum of its individual vector and the vectors of its character n-grams. fastText stores and uses a separate vector for the whole word, but for out-of-vocabulary words no such vector exists, which degrades the quality of the resulting word vector. In addition, because vectors are stored for whole words, fastText models usually require a lot of memory for storage and processing, which becomes especially problematic for morphologically rich languages with their large number of word forms. Unlike the original fastText model, the proposed modifications pretrain and use only vectors for the character n-grams of a word, eliminating the reliance on word-level vectors and at the same time significantly reducing the number of model parameters. Two approaches are used to extract information from a word: internal character n-grams and suffixes. The proposed models are evaluated on morphological analysis and lemmatization of Russian using the SynTagRus corpus and demonstrate results comparable to the original fastText.
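As an illustration of the purely subword-based composition described above, here is a minimal Python sketch (not the authors' implementation; the n-gram range 3–6 and the averaging follow common fastText conventions and are assumptions here):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_vectors, dim=100):
    """Compose a word vector purely from n-gram vectors: no word-level
    vector is stored, so out-of-vocabulary words are handled uniformly."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not vecs:  # none of the word's n-grams are known
        return np.zeros(dim)
    return np.sum(vecs, axis=0) / len(vecs)  # averaged over known n-grams
```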
The quality of software requirements is quite difficult to measure without reviews and the subjective opinions of stakeholders. Automated quality assessment of specifications saves project resources and prevents latent defects in future software. Requirements quality can be evaluated against a huge variety of attributes, but their meaning remains vague without a mapping to specific measurement metrics. Applying the goal-question-metric (GQM) approach in the quality model helps to choose the most important quality attributes and to map them to metrics that can be collected and calculated automatically. The text of software requirements written in natural language can be analyzed by NLP tools to identify weak words and phrases that make statements ambiguous. This work proposes metrics for such quality attributes as ambiguity, singularity, subjectivity, completeness, and readability. The quality model was implemented in a prototype that applies natural language processing techniques to requirements written in Russian, with the support of an external API.
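In spirit, a weak-word metric of this kind could look like the following minimal sketch (the lexicon and the metric definition here are illustrative assumptions, not the metrics proposed in the paper, which also targets Russian text):

```python
import re

# Illustrative lexicon of "weak" words that signal ambiguity or
# subjectivity in a requirement statement.
WEAK_WORDS = {"maybe", "approximately", "fast", "user-friendly", "etc", "some"}

def ambiguity_metric(requirement: str) -> float:
    """Share of weak words among all tokens; 0.0 means no weak signals."""
    tokens = re.findall(r"[a-zA-Z-]+", requirement.lower())
    if not tokens:
        return 0.0
    weak = sum(1 for t in tokens if t in WEAK_WORDS)
    return weak / len(tokens)

print(ambiguity_metric("The system should respond fast, maybe within some seconds."))
```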
The possibility of using machine learning to analyze projects in order to support the decision to participate in a tender for project implementation is substantiated and demonstrated with a specific example. The approaches are described, and the process of solving the binary classification of projects using Python libraries is shown. Attention is paid to the choice of an algorithm for constructing a membership function, to generating and analyzing input data, and to evaluating the accuracy of a solution. It is shown that for the considered problem the best solution is provided by the logistic regression algorithm.
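A minimal sketch of such a binary classifier with scikit-learn (the features and labels below are synthetic stand-ins for the project data used in the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for project features (e.g., budget, duration, team
# size) and a binary label: 1 = participate in the tender, 0 = decline.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```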
The widespread use of cloud technology optimizes the economic costs of maintaining the IT infrastructure of enterprises, but it also increases the likelihood of theft of confidential data. One of the mechanisms for protecting data from theft is cryptography. The classical primitives of symmetric and asymmetric encryption do not allow data to be processed in encrypted form. Homomorphic encryption is used for processing confidential data: it allows arithmetic operations to be performed over ciphertext and yields an encrypted result that corresponds to the result of the same operations performed over plaintext. One promising direction for constructing homomorphic ciphers is schemes based on Learning with Errors. In this paper we examine the cryptographic properties of existing homomorphic ciphers based on Learning with Errors (CKKS, BFV) and compare their technical characteristics: cryptographic strength and data redundancy, data encoding and decoding speed, the speed of arithmetic operations such as addition and multiplication, and the speed of the KeySwitching operation.
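As an illustration of arithmetic over ciphertext, the following sketch uses the CKKS scheme from the TenSEAL library (one of several RLWE-based implementations; the parameters are illustrative, not those benchmarked in the paper):

```python
import tenseal as ts  # pip install tenseal

# CKKS context; the parameters below are illustrative only.
context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40

enc1 = ts.ckks_vector(context, [1.0, 2.0, 3.0])
enc2 = ts.ckks_vector(context, [4.0, 5.0, 6.0])

# Arithmetic is performed over ciphertext; decryption yields
# (approximately) the result of the same operations over plaintext.
print((enc1 + enc2).decrypt())  # ~[5.0, 7.0, 9.0]
print((enc1 * enc2).decrypt())  # ~[4.0, 10.0, 18.0]
```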
The interrupt system is an important part of a microprocessor. Interrupts are widely used for interacting with hardware and responding to stimuli. Modern microprocessor interrupt systems include hardware support for virtualization, which helps to increase the performance of virtual machines; however, the additional functionality may introduce new errors. The paper presents an overview of approaches to the verification of interrupt systems of multi-core microprocessors with virtualization support. Definitions and characteristics of interrupt systems that need to be taken into account during verification are described. The general scheme of a stand-alone verification environment is presented; the Universal Verification Methodology (UVM) was applied to construct the test system. To simplify the development of the checking module, a discrete-event reference model with time accounting was used. Sequences of primary requests and automatically generated secondary requests, produced in special modules named auto-handlers, were used to randomize the test system behavior. We describe some difficulties discovered in the verification process and methods for resolving them. The stages of the generalized test algorithm are presented, and some other techniques for checking the correctness of interrupt systems are reviewed. In conclusion, we provide a case study of applying the suggested approaches to the verification of the interrupt systems of microprocessors with the “Elbrus” and “SPARC-V9” architectures developed by MCST. The results and a further plan for the development of the test system are presented.
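The idea of a discrete-event reference model with time accounting can be sketched as follows (a toy Python illustration only; the actual environment is a UVM test system, and the event attributes here are assumptions):

```python
import heapq

class InterruptReferenceModel:
    """Toy discrete-event sketch: events are (time, priority, irq) tuples;
    the model predicts the order in which the DUT should deliver them."""
    def __init__(self):
        self.queue = []  # min-heap ordered by delivery time, then priority

    def post(self, time, priority, irq):
        heapq.heappush(self.queue, (time, priority, irq))

    def expected_delivery(self):
        """Pop events in time/priority order for comparison with the DUT."""
        while self.queue:
            yield heapq.heappop(self.queue)

model = InterruptReferenceModel()
model.post(time=10, priority=1, irq="timer")
model.post(time=10, priority=0, irq="ipi")   # same cycle: priority 0 wins
model.post(time=5,  priority=2, irq="uart")
for event in model.expected_delivery():
    print(event)
```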
The performance characteristics of any modern microprocessor largely depend on its memory subsystem. Naturally, the memory subsystem software model is an important component of a cycle-accurate simulator, and its validity and quality have a high impact on the overall accuracy of the simulation. In this paper, the cycle-accurate application-level simulator of the Elbrus microprocessor family is introduced and its structure is briefly explained. After that, the software model of the memory subsystem and its integration into the cycle-accurate application-level simulator are described. We evaluate the accuracy of the simulator on the SPEC CPU2006 benchmark and analyze the simulation errors. Finally, a brief comparison of different Elbrus architecture simulators is given.
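For intuition, a toy timing model of one memory subsystem component might look as follows (a generic direct-mapped cache sketch, not the Elbrus model; all sizes and latencies are made up):

```python
class DirectMappedCache:
    """Toy timing model: returns access latency in cycles."""
    def __init__(self, lines=256, line_size=64, hit_cycles=3, miss_cycles=40):
        self.lines, self.line_size = lines, line_size
        self.hit_cycles, self.miss_cycles = hit_cycles, miss_cycles
        self.tags = [None] * lines

    def access(self, addr: int) -> int:
        index = (addr // self.line_size) % self.lines
        tag = addr // (self.line_size * self.lines)
        if self.tags[index] == tag:
            return self.hit_cycles
        self.tags[index] = tag          # fill the line on a miss
        return self.miss_cycles

cache = DirectMappedCache()
# Second address conflicts with address 0 in the same cache line.
total = sum(cache.access(a) for a in [0, 64, 0, 16384, 0])
print("total cycles:", total)
```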
An Instruction Set Architecture (ISA) is the core around which the rest of the CPU is built. Errors or inflexible decisions, once embedded in an instruction set, remain with that generation of processors forever. Therefore, one of the key reasons why the performance growth of modern CPUs has slowed down is that the “source code” of the processors has become corrupted in both the literal and the figurative sense of the word: processors have become so complex inside that their further development is difficult. The development of modern computing devices (CPUs, GPUs, or specialized ones) is in any case an extremely expensive process consisting of a large number of costly items, so the cost of developing a processor is a cornerstone issue. In this work we study existing popular processor instruction sets and draw conclusions about the prospects of RISC-V and other open-source instruction set architectures. We try to answer the following questions: Why is the processor instruction set architecture really important? Why RISC-V, and why is it better than the others? Which opportunities does RISC-V open for developers around the world, and what analogues does it have?
Automating the monitoring of hard drives in a cluster infrastructure managed by the Kubernetes container orchestration system is a laborious and non-trivial task today. The paper discusses existing approaches to monitoring hard drives in Kubernetes and provides their comparative analysis. Based on this analysis, the conclusion is drawn that the approaches need to be improved and automated. The paper proposes an approach to automating the collection of metrics from hard drives by implementing a Kubernetes “operator” for a tool that can effectively obtain information about the state of hard drives in the system. As a result, the timing characteristics of collecting information about disks using the existing approaches and the proposed approach are given, together with numerical results and graphs showing the gain of the proposed approach.
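A minimal sketch of such an operator using the kopf framework (the custom resource diskmonitors.example.com and the use of smartctl are illustrative assumptions, not the paper's implementation):

```python
import subprocess
import kopf  # run with: kopf run thisfile.py

# Hypothetical custom resource diskmonitors.example.com: each object asks
# the operator to check one device with smartctl (smartmontools).
@kopf.on.create('example.com', 'v1', 'diskmonitors')
def on_create(spec, logger, **kwargs):
    device = spec.get('device', '/dev/sda')
    out = subprocess.run(['smartctl', '-H', device],
                         capture_output=True, text=True)
    logger.info(f"SMART health for {device}: {out.stdout.strip()}")
    # The returned dict is stored by kopf in the resource status.
    return {'device': device, 'healthy': 'PASSED' in out.stdout}
```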
SMT solvers are widely applied for deductive verification of C programs using various verification platforms (Why3, Frama-C/WP, F*) and interactive theorem proving systems (Isabelle, HOL4, Coq), as the decision procedures implemented in SMT solvers are complete for some combinations of logical theories (logics), in particular for the QF_UFLIA logic. At the same time, when verifying C programs, it is often necessary to discharge formulas in other logical theories and their combinations that are also decidable but not supported by all SMT solvers. Theories of bounded integers, both with overflow (for unsigned integers in C) and without overflow (for signed integers), as well as the theory of finite interpreted sets (needed to support frame conditions), are good examples of such theories. One possible way to support such theories is to implement them directly in SMT solvers; however, this method is often time-consuming, as well as insufficiently flexible and universal. Another way is to implement custom quantifier instantiation strategies that reduce formulas in unsupported theories to formulas in widespread decidable logics such as QF_UFLIA. In this paper, we present an instantiation procedure for translating formulas in the theory of bounded integers without overflow into the QF_UFLIA logic. We formally proved the soundness and completeness of our instantiation procedure in Isabelle. The paper presents an informal description of this proof as well as some considerations on the efficiency of the proposed procedure. Our approach is sufficient to obtain efficient decision procedures, implemented as Isabelle/HOL proof methods, for several decidable logical theories used in C program verification by relying on the existing capabilities of well-known SMT solvers such as Z3 and the proof reconstruction capabilities of the Isabelle/HOL proof assistant.
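A highly simplified illustration of the underlying idea with the z3 solver: bounded integers are modeled as mathematical integers whose range axioms are instantiated explicitly, staying inside plain linear integer arithmetic rather than relying on a dedicated theory (this sketch is not the authors' Isabelle-verified procedure):

```python
from z3 import Int, Solver, And

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # C signed 32-bit bounds

def bounded_int(name, solver):
    """Declare a bounded integer by instantiating its range axiom."""
    x = Int(name)
    solver.add(And(x >= INT_MIN, x <= INT_MAX))
    return x

s = Solver()
a, b = bounded_int('a', s), bounded_int('b', s)
s.add(a + b == INT_MAX, a > 0, b > 0)  # the sum stays in range: no overflow
print(s.check())   # sat
print(s.model())   # a witness within the bounds
```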
Trace models such as Finite State Machines (FSMs) are widely used in the analysis and synthesis of discrete event systems. FSM minimization is a well-known optimization problem that reduces the number of FSM states by deriving the canonical form, which is unique up to isomorphism. In particular, the canonical form is the basis for FSM-based ‘black-box’ test derivation methods such as the W-method and its modifications. Since the behavior of many software and hardware systems nowadays significantly depends on time aspects, classical FSMs are extended with clock variables, including input and output timeouts. The minimization of a Timed Finite State Machine (TFSM) includes the reduction of both state and time aspects. Existing approaches allow deriving the canonical representation for non-initialized deterministic TFSMs, while for initialized TFSMs, i.e., TFSMs with a designated initial state, several pairwise non-isomorphic minimal forms can exist. In this paper, we determine classes of initialized TFSMs for which the minimal form is unique up to isomorphism.
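For the classical (untimed) case, state minimization by partition refinement can be sketched as follows; TFSM minimization additionally reduces time aspects (the encoding below is an illustrative assumption):

```python
def minimize_fsm(states, inputs, delta, output):
    """Moore-style partition refinement for a deterministic FSM:
    delta[(s, i)] -> next state, output[(s, i)] -> output symbol."""
    def signature(s, blocks):
        return tuple((output[(s, i)], blocks[delta[(s, i)]]) for i in inputs)

    # Initial partition: states with identical output rows.
    blocks = {s: tuple(output[(s, i)] for i in inputs) for s in states}
    while True:
        new_blocks = {s: signature(s, blocks) for s in states}
        if len(set(new_blocks.values())) == len(set(blocks.values())):
            return new_blocks  # stable: blocks are the minimal-form states
        blocks = new_blocks

states, inputs = ['q0', 'q1', 'q2'], ['a']
delta = {('q0', 'a'): 'q1', ('q1', 'a'): 'q0', ('q2', 'a'): 'q1'}
output = {('q0', 'a'): 0, ('q1', 'a'): 1, ('q2', 'a'): 0}
print(minimize_fsm(states, inputs, delta, output))  # q0 and q2 merge
```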
The theory of formal languages and, in particular, of context-free grammars has been extensively studied and applied in different areas. For example, several approaches to recognition and classification problems in bioinformatics are based on searching for genomic subsequences that possess specific features describable by a context-free grammar. The string-matching problem can therefore be reduced to parsing, i.e., verifying whether some subsequence can be derived in this grammar. A field of application such as bioinformatics requires working with large amounts of data, so it is necessary to improve existing parsing techniques. The most asymptotically efficient parsing algorithm applicable to any context-free grammar is the matrix-based algorithm proposed by Valiant. This paper presents a modification of Valiant’s algorithm whose main advantage is the possibility of dividing the parsing table into successively computed layers of disjoint submatrices, where each submatrix of a layer can be processed independently. Moreover, our approach is easily adapted to the string-matching problem. Our evaluation shows that the proposed modification retains all the benefits of Valiant’s algorithm, especially its high performance achieved by using fast matrix multiplication methods. In addition, the modified version eliminates a large amount of excessive computation and accelerates substring search.
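The connection between parsing and matrix operations can be sketched as follows: for a grammar in Chomsky normal form, the parse table satisfies fixpoint equations over Boolean matrix products, and Valiant's algorithm (like the proposed layered modification) organizes these products into fast matrix multiplications. A naive fixpoint version for a toy grammar:

```python
import numpy as np

def parse(s):
    """Recognition via Boolean matrices for the toy CNF grammar
    S -> A B, A -> 'a', B -> 'b'; T[X][i, j] iff X derives s[i:j]."""
    n = len(s)
    T = {X: np.zeros((n + 1, n + 1), dtype=bool) for X in 'SAB'}
    for i, ch in enumerate(s):
        T[{'a': 'A', 'b': 'B'}[ch]][i, i + 1] = True
    changed = True
    while changed:   # naive fixpoint; Valiant arranges these products
        prod = (T['A'].astype(int) @ T['B'].astype(int)) > 0  # rule S -> A B
        new = T['S'] | prod
        changed, T['S'] = bool((new != T['S']).any()), new
    return bool(T['S'][0, n])

print(parse("ab"))  # True
print(parse("ba"))  # False
```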
The language-oriented approach is becoming more and more popular in the development of information systems, but the existing DSM platforms that implement this paradigm have significant limitations, including the insufficient expressive power of the models used to implement visual model editors for complex subject areas and limited abilities to transform visual models. Visual languages are usually based on graph models, but the types of graphs used have certain limitations, such as insufficient expressiveness and the complexity of representing large models and executing operations on them. Creating a tool free of these constraints requires the development of a new formal model, and HP-graphs can be the solution to this problem. They make it possible not only to create new visual languages for diverse domains, but also to develop efficient algorithms for performing different operations on models constructed with these languages. The paper gives the definition of the HP-graph, justifies the expressive power of the proposed model, and describes the main operations on HP-graphs. The chosen graph formalism combines the capabilities of different types of graphs to represent visual models and allows creating a flexible model editor for a DSM platform and implementing efficient algorithms for performing operations, in particular model transformations.
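A speculative minimal encoding of such a structure, in which vertices own poles and a hyperedge may connect any number of poles of different vertices, might look as follows (the field names and the example are assumptions, not the paper's definition):

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    name: str
    poles: list = field(default_factory=list)  # attachment points

@dataclass
class HyperEdge:
    label: str
    poles: list  # (vertex name, pole name) pairs, possibly many,
                 # belonging to different vertices

entity = Vertex("Entity", poles=["in", "out"])
relation = Vertex("Relation", poles=["src", "dst"])
edge = HyperEdge("connects", poles=[("Entity", "out"), ("Relation", "src")])
print(edge)
```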
Randomized testing (fuzzing) is a well-known approach for finding bugs in programs. Fuzzing is typically performed during the final stage of quality assurance in order to check the stability of the target program in the face of malformed or unexpected input data. Modern software more often than not provides an API for extending its functionality by third-party developers; since an API is an entry point to software internals, its functionality and usage scenarios must be tested as well. Thorough API testing must cover a large number of possible scenarios, and it is fairly obvious that fuzzing can be applied to this task by generating usage scenarios in an automatic randomized way, which brings us to the concept of API fuzzing. In this paper we describe an automatic approach to randomized testing of API libraries for Android/desktop Java. The proposed method is able to change the sequence of called API functions in order to discover new execution paths. It consists of two basic stages. In the first stage, the arguments of the currently called API functions are mutated. When argument mutation cannot find a new execution path, the tool switches to the second stage, in which the current sequence of API function calls is mutated: the mutation can add new API function calls or remove some of them. After the call sequence is mutated, the tool switches back to the first stage. Switching between the two stages continues throughout the whole fuzzing process. In our experiments, the developed method of randomized testing found 15 crashes in the SmartThings application developed by Samsung.
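The two-stage loop can be sketched as follows (a toy model in Python: "coverage" is simulated by a callback and arguments are plain floats, whereas the real tool instruments Java API calls):

```python
import random

def fuzz(api_functions, coverage_of, budget=1000):
    """Toy two-stage API fuzzing loop over (function, argument) pairs."""
    sequence = [(random.choice(api_functions), random.random())]
    seen_paths = set()
    for _ in range(budget):
        # Stage 1: mutate arguments of the current call sequence.
        candidate = [(f, random.random()) for f, _ in sequence]
        path = coverage_of(candidate)
        if path not in seen_paths:   # new execution path: keep mutating args
            seen_paths.add(path)
            sequence = candidate
            continue
        # Stage 2: mutate the call sequence itself (remove or add a call).
        if len(sequence) > 1 and random.random() < 0.5:
            sequence.pop(random.randrange(len(sequence)))
        else:
            sequence.append((random.choice(api_functions), random.random()))
    return seen_paths

paths = fuzz(["open", "read", "close"],
             coverage_of=lambda seq: tuple(f for f, _ in seq))
print(len(paths), "distinct paths observed")
```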
The avionics industry in the Russian Federation faces difficulties in organizing reliable tool support for its development processes, and the active state-wide push toward digitalization of the economy does not make the problem easier. The choice of software tools is an important component of success in developing complex certifiable software such as aircraft onboard systems; the same situation can be observed in other industries as well. Nowadays the Russian IT market provides a sufficient amount of different software that can cover, to varying degrees, the development lifecycle processes of complex certifiable software for avionics. This article analyzes the current situation on the Russian software market and the impact of the import substitution policy of the Russian Federation on software developers and consumers, i.e., industrial enterprises. Details of the DO-178C regulatory document for onboard software development are considered to show the importance of a correct choice of the project's tool landscape. Certain types of specialized software tools for automating development processes are considered. The authors identify the basic groups of tool functionality that support the development lifecycle of onboard software. Russian and foreign PLM (Product Lifecycle Management) and PDM (Product Data Management) systems and other software were examined for compliance with the necessary functionality. For the comparative analysis, a method based on an additive, criteria-wise assessment of software was proposed. The research results allowed the authors to draw a conclusion about the current level of Russian software in comparison with worldwide analogues; some prospects for the further evolution of Russian software are also justified on the basis of these results, and recommendations for the directions of software development and completion are given. The analysis presented in the article can be useful for enterprises in avionics and other industries that need to choose software to support development lifecycle processes in new and ongoing projects of complex system development; specialists interested in the current state of the Russian IT industry can also find valuable information here.
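An additive, criteria-wise comparison of this kind reduces to a weighted sum over criteria scores; in the following sketch all criteria, weights, and scores are illustrative placeholders, not the article's data:

```python
# Additive criteria-based comparison: each tool gets a weighted sum.
criteria_weights = {"requirements_mgmt": 0.3, "traceability": 0.3,
                    "config_mgmt": 0.2, "certification_support": 0.2}

tools = {
    "Tool A": {"requirements_mgmt": 4, "traceability": 3,
               "config_mgmt": 5, "certification_support": 2},
    "Tool B": {"requirements_mgmt": 5, "traceability": 4,
               "config_mgmt": 3, "certification_support": 4},
}

for name, scores in tools.items():
    total = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
    print(f"{name}: {total:.2f}")
```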