
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 36, No 1 (2024)
View or download the full issue PDF (Russian)
7-22
Abstract

The paper examines the hypothesis that neural autoencoders are applicable as a vector compression method in the approximate nearest neighbor search pipeline. The evaluation was conducted on several large datasets using various autoencoder architectures and indexes. It is demonstrated that, although no combination of autoencoder and index fully outperforms pure solutions, in some cases such combinations can be useful. Additionally, we have identified empirical relationships between the optimal dimensionality of the hidden layer and the intrinsic dimensionality of the datasets. It is also shown that the loss function is a determining factor for compression quality.
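The compress-then-search pipeline the abstract evaluates can be sketched as follows. This is a minimal illustration, not the paper's method: a trained autoencoder's encoder is replaced by a fixed random linear projection (training is out of scope here), and the index by brute-force search; all dimensions and sizes are illustrative.

```python
import math
import random

random.seed(0)

DIM, CODE_DIM, N = 32, 8, 200

def random_vec(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

# Stand-in for a trained autoencoder's encoder: a fixed random
# linear projection from DIM down to CODE_DIM dimensions.
projection = [random_vec(DIM) for _ in range(CODE_DIM)]

def encode(v):
    return [sum(p * x for p, x in zip(row, v)) for row in projection]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, vectors):
    # Brute-force stand-in for a real ANN index.
    return min(range(len(vectors)), key=lambda i: dist(query, vectors[i]))

data = [random_vec(DIM) for _ in range(N)]
codes = [encode(v) for v in data]          # compressed index

query = data[17]
exact = nearest(query, data)               # search in the original space
approx = nearest(encode(query), codes)     # search in the compressed space
print(exact, approx)
```

In a real pipeline one would measure recall of the compressed search against the exact one over many queries; here the query is a database vector, so both searches find it.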

23-34
Abstract

The purpose of this work is to study the possibility of implementing virtual networks, taking into account various parameters and their adjustments, in software-defined structures modeled by a weighted data plane graph. The work examines parameters of the “resource” and “cost” types. For a resource-type parameter, an edge is augmented with its “capacity,” and the number of paths passing through the edge must not exceed that capacity. For a cost-type parameter, the path weight is the sum of the weights of its edges, and the objective is to minimize the path weight. To implement a virtual network on a weighted graph, an algorithm for adjusting a virtual network with respect to resource-type parameters and two algorithms for constructing a virtual network with respect to cost-type parameters are proposed. In the latter case, one algorithm builds one path from each host to one host of a given subset of target hosts; the other builds a set of paths for each host: one path to one host from each set in a family of sets of target hosts.
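The interplay of the two parameter types can be sketched with a cheapest-path search that respects edge capacities. This is an illustration under our own assumptions, not the paper's algorithms: the toy graph, node names, and the greedy capacity reservation are invented for the example.

```python
import heapq

# Toy weighted data-plane graph: edge -> (cost, capacity).
graph = {
    "h1": {"s1": (1, 2)},
    "s1": {"s2": (2, 1), "s3": (5, 2)},
    "s2": {"t": (1, 1)},
    "s3": {"t": (1, 2)},
    "t": {},
}

def cheapest_path(graph, src, dst, capacity):
    """Dijkstra over edge costs, skipping edges with no remaining capacity."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, (w, _) in graph[node].items():
            if capacity[(node, nxt)] > 0:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return None

capacity = {(u, v): c for u, nbrs in graph.items() for v, (w, c) in nbrs.items()}

cost, path = cheapest_path(graph, "h1", "t", capacity)
for u, v in zip(path, path[1:]):           # reserve the resource on each edge
    capacity[(u, v)] -= 1
print(cost, path)
```

Once the cheapest path h1-s1-s2-t (cost 4) consumes the unit capacity of s1-s2, a second request from h1 is forced onto the more expensive route via s3, which is how resource-type constraints reshape cost-type optimization.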

35-44
Abstract

Federated learning is a technology for privacy-preserving learning in distributed storage systems. It allows a shared forecasting model to be created while each participant keeps all of its data in its own storage. Several devices take part in training the shared model, and each device has its own unique data on which the neural network is trained. The devices interact only to adjust the weights of the shared model, after which the updated model is transmitted to all devices. Training on multiple devices creates many attack opportunities against this type of network. After training on a local device, model data is sent over some communication channel to a central server or global model. Therefore, vulnerabilities in a federated network are possible not only at the stage of training on a separate device, but also at the data exchange stage. Together, this increases the number of possible vulnerabilities of federated neural networks. As is known, not only neural networks but also other models can be used to build federated classifiers, so the types of attacks on the network also depend on the type of model used. Federated neural networks are a rather complex design, different from ordinary neural networks and other classifiers, and can be vulnerable to various types of attacks, because training occurs on different devices and both neural networks and simpler algorithms can be used. In addition, data transfer between devices must be ensured. All attacks come down to several main types that exploit classifier vulnerabilities. Protection against attacks can be implemented by improving the architecture of the classifier itself and by paying attention to data encryption.
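The weight-exchange step described above can be illustrated with a federated-averaging sketch. The device names, sample counts, and flat weight vectors are hypothetical; real systems aggregate full model tensors, but the sample-weighted average is the same idea.

```python
# Each device trains locally and reports updated weights; the server
# aggregates them by sample-weighted averaging (as in FedAvg).
local_updates = {
    "device_a": ([0.10, 0.20, 0.30], 100),   # (weights, number of samples)
    "device_b": ([0.30, 0.10, 0.50], 300),
}

def federated_average(updates):
    total = sum(n for _, n in updates.values())
    dim = len(next(iter(updates.values()))[0])
    avg = [0.0] * dim
    for weights, n in updates.values():
        for i, w in enumerate(weights):
            avg[i] += w * n / total       # each device weighted by its data size
    return avg

global_model = federated_average(local_updates)
print(global_model)
```

The exchange of these weight vectors is exactly the surface where the data-exchange-stage attacks mentioned above apply: a malicious device can poison its update, and an eavesdropper can try to reconstruct data from the weights.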

45-60
Abstract

Today fuzzing (fuzz testing) is the main technique for testing software, systems, and code functions. Fuzzing makes it possible to identify vulnerabilities and software failures. However, this practice may require large resources and high network performance in large organizations, where the number of systems can be large. At the same time, developers and information security specialists are required to meet time-to-market deadlines, the requirements of various regulators, and the recommendations of standards. This paper proposes a new fuzzing method designed to solve the problem above. The proposed approach applies fuzz testing to the whole computing network at once in large organizations that operate with microservices. Polymorphic systems in this paper are understood as systems that consist of various API (Application Programming Interface) functions operating on various types of data, not within a single piece of software but inside subsystems with a set of several microservices. In this case, many different network protocols, data types, and formats can be used. With such a variety of features, there is a problem of detecting errors or vulnerabilities inside systems, because debugging or trace interfaces are not always implemented in microservice software. Therefore, this paper also proposes a method of collecting and analyzing statistics on the time intervals microservices take to process mutated data. For fuzzing tests, it is proposed to use mutated lists of exploit payloads. Analyzing the time between client-server requests and the corresponding responses helps to identify patterns that reveal potentially dangerous vulnerabilities. The paper describes fuzzing of API functions only over the HTTP protocol (Hypertext Transfer Protocol). The approach does not negatively affect development effectiveness or deadlines.
The methods and solution described in the paper are recommended for use in large organizations as an additional or basic information security solution, in order to prevent critical infrastructure failures and financial losses.
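The timing-statistics idea can be sketched as follows. `send_payload` is a hypothetical stand-in for an HTTP request to a microservice API (no real network I/O), and the payload list and the three-sigma threshold are illustrative choices, not the paper's exact procedure.

```python
import statistics

# Hypothetical stand-in for sending a mutated payload to a microservice
# API over HTTP and measuring the round-trip time in milliseconds.
def send_payload(payload):
    base = 10.0 + 0.01 * len(payload)
    # A payload that triggers a slow error path responds much later.
    return base + (500.0 if "' OR 1=1" in payload else 0.0)

# Mutated payload list: benign fillers plus one exploit-style string.
payloads = ["A" * n for n in range(1, 20)] + ["' OR 1=1 --"]
timings = [(p, send_payload(p)) for p in payloads]

# Flag payloads whose response time deviates strongly from the mean.
values = [t for _, t in timings]
mean, stdev = statistics.mean(values), statistics.pstdev(values)
suspicious = [p for p, t in timings if t > mean + 3 * stdev]
print(suspicious)
```

In a real deployment the timing distribution would be collected per endpoint and per payload class, since normal processing time varies across microservices.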

61-72
Abstract

The paper considers the problem of error detection and localization in modular code. The polynomial residue number system represents an input number as a set of polynomials over the finite field GF(2^m), which are the residues from dividing the original polynomial by a set of irreducible polynomials. The introduction of redundant moduli provides the required corrective capability of the noise-tolerant code. The application of entropy to error detection in a polynomial residue number system, whose errors are corrected by the maximum likelihood method, is considered. In the residue number system, a number is represented as residues from division by a set of mutually prime numbers. An approach to error detection through entropy is proposed for the residue number system, which makes it possible to detect errors of higher multiplicity compared to the classical approach. The maximum likelihood and projection methods are considered for error correction. The introduced constraints on the control modulus allowed us to detect not only all single errors on working moduli, but also a number of errors on two moduli. A computational experiment was carried out to investigate the corrective abilities for three sets of moduli: {3, 5, 7, 8}, {3, 5, 7, 37}, and {3, 5, 7, 71}. A reliable distributed storage system is proposed to detect and correct errors that occur when data is retrieved from clouds.
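The residue representation, range-check detection, and projection idea can be sketched for the integer case, using the moduli set {3, 5, 7, 37} from the abstract (the specific number and the injected error are illustrative; the paper's entropy-based detection is not reproduced here).

```python
from math import prod

# Working moduli {3, 5, 7} plus the redundant control modulus 37,
# one of the moduli sets used in the paper.
moduli = [3, 5, 7, 37]
legit_range = 3 * 5 * 7            # legitimate numbers: 0 .. 104

def to_rns(x, moduli):
    return [x % m for m in moduli]

def crt(residues, moduli):
    """Chinese remainder theorem reconstruction."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # modular inverse of Mi modulo m
    return x % M

x = 57
clean = to_rns(x, moduli)
assert crt(clean, moduli) == x                     # error-free: in range

corrupted = list(clean)
corrupted[1] = (corrupted[1] + 2) % 5              # single-digit error
assert crt(corrupted, moduli) >= legit_range       # detected: out of range

# Projection method: dropping the corrupted digit restores x.
print(crt(corrupted[:1] + corrupted[2:], [3, 7, 37]))
```

The redundant modulus enlarges the dynamic range, so a corrupted residue vector reconstructs to a value outside the legitimate range; projections that exclude the faulty digit fall back inside it.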

73-104
Abstract

The principles of quantum mechanics – superposition, entanglement, measurement, and decoherence – form the foundation of quantum computing. Qubits, abstract objects whose mathematical representation implements the rules of quantum physics, are the fundamental building blocks of computation. Software is a key component of quantum computing, along with quantum hardware. Software is made up of algorithms, which are implemented using logic gates and quantum circuits. These qualities make quantum computing a paradigm that non-physicists find difficult to comprehend. It is crucial to incorporate into this new way of creating software a conceptual framework of the principles upon which quantum computing is founded. In this paper, we present a taxonomical view of the fundamental concepts of quantum computing and the derived concepts that make up the emerging discipline of quantum software engineering. Because the review's main goal is to identify the core ideas behind quantum computing and quantum software, we conducted a quasi-systematic mapping as part of the review process. The findings can serve as a starting point for computer science teachers and students addressing the study of this field.
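The superposition and measurement concepts can be made concrete with a two-amplitude qubit sketch in plain Python (no quantum SDK assumed; this is a didactic illustration, not part of the paper).

```python
import math

# A qubit state is a unit vector of two complex amplitudes (alpha, beta).
zero = (1 + 0j, 0 + 0j)                    # the basis state |0>

def hadamard(state):
    """The Hadamard gate, which creates an equal superposition from |0>."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def probabilities(state):
    """Born rule: measurement probabilities are squared amplitude moduli."""
    return tuple(abs(amp) ** 2 for amp in state)

plus = hadamard(zero)                      # (|0> + |1>) / sqrt(2)
print(probabilities(plus))                 # approximately (0.5, 0.5)
```

Applying the gate twice returns the qubit to |0>, which illustrates that quantum gates are reversible unitary operations, unlike classical logic gates.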

105-130
Abstract

Microservices are the most promising direction for developing heterogeneous distributed software systems capable of adapting to dynamic changes in business and technology. In addition to the development of new software systems, the migration from legacy monolithic systems to microservice architectures is also a prominent aspect of microservices use. These trends have resulted in an increasing number of primary and secondary studies on microservices, stressing the need for systematization of research at a higher level. The objective of this study is to comprehensively analyze secondary studies in the field of microservices, with the aims of examining publishing trends, research trends, domains of implementation, and future research directions. The study follows the guidelines for conducting a systematic literature review, and its findings are derived from 44 secondary studies. The findings are structured to address the proposed research objectives. Recommendations for further literature reviews concern improving the quality assessment of selected studies to increase the validity of findings, a more detailed review of human and organizational factors throughout the microservices life cycle, the use of social-science qualitative methods for more detailed analysis of selected studies, and the inclusion of gray literature, which would bring in the real opinions and experiences of experts from industry.

131-142
Abstract

The Requirements Engineering (RE) phase plays a critical role in software development, as any shortcomings during this stage can lead to project failure. Analysts rely on the Requirements Specification (RS) to define a comprehensive list of quality requirements. The process of requirements classification within the RS involves assigning each requirement to its respective class, presenting analysts with the challenge of accurate categorization. This research focuses on enhancing the classification of non-functional requirements (NFR) using a Convolutional Neural Network (CNN). The study also emphasizes the significance of preprocessing techniques, the implementation of sampling strategies, and the incorporation of pre-trained word embeddings such as fastText, GloVe, and word2vec. Evaluation of the proposed approach is performed using metrics such as Recall, Precision, and F1, resulting in an average performance improvement of up to 30% compared to related work. Additionally, the model's use of pre-trained word embeddings is assessed through an ANOVA analysis, providing valuable insights into its effectiveness. This study aims to demonstrate the utility of CNNs and pre-trained word embeddings in the classification of NFRs, offering valuable contributions to the field of Requirements Engineering and enhancing the overall software development process.
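The convolution-over-embeddings step of such a CNN can be sketched in plain Python. Everything here is a stand-in: the tiny vocabulary, the random vectors standing in for pre-trained fastText/GloVe/word2vec embeddings, and the single untrained filter; a real model learns many filters end to end.

```python
import random

random.seed(1)

EMB_DIM, WINDOW = 4, 2

# Stand-in for pre-trained word embeddings: a toy random vector per word.
vocab = ["the", "system", "shall", "respond", "within", "two", "seconds"]
embedding = {w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in vocab}

# One convolution filter spanning WINDOW consecutive word vectors.
filt = [random.uniform(-1, 1) for _ in range(WINDOW * EMB_DIM)]

def conv_feature(tokens):
    """Slide the filter over the token sequence, then max-over-time pool."""
    feats = []
    for i in range(len(tokens) - WINDOW + 1):
        window = [v for w in tokens[i:i + WINDOW] for v in embedding[w]]
        feats.append(sum(a * b for a, b in zip(filt, window)))
    return max(feats)

requirement = "the system shall respond within two seconds".split()
feature = conv_feature(requirement)
print(feature)
```

In the full model, many such pooled filter activations form the feature vector that a dense layer maps to NFR classes such as performance or security.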

143-156
Abstract

User Engagement is a metric that represents a part of the user experience characterized by attributes of reactions, visibility, and user interactivity with others. Quantitative and qualitative analyses were used to establish a new method for calculating User Engagement on Facebook* fan pages focused on the dissemination of scientific content, news, and events. We focused on social media processes based on Spearman correlation coefficients and the categorization of publications by format type and source of content. Variations in Engagement for individual posts were explained by a multiple linear regression model defined using the number of clicks and the reach of posts, with an accuracy of up to 91% (R²). User Engagement increases preferably when content is presented in photo format as an original content creation.

*Facebook (owned by Meta) is prohibited in the Russian Federation
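The regression model described above, engagement predicted from clicks and reach, can be sketched with ordinary least squares via the normal equations. The per-post numbers below are hypothetical and generated to fit the model exactly, so R² comes out near 1; the paper's 91% is from real data.

```python
def fit_linear(xs1, xs2, ys):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via normal equations."""
    n = len(ys)
    cols = [[1.0] * n, xs1, xs2]
    # Build the 3x3 normal-equation system A b = c.
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
    # Gauss-Jordan elimination.
    for i in range(3):
        piv = A[i][i]
        A[i] = [a / piv for a in A[i]]
        c[i] /= piv
        for j in range(3):
            if j != i:
                f = A[j][i]
                A[j] = [a - f * b for a, b in zip(A[j], A[i])]
                c[j] -= f * c[i]
    return c

def r_squared(xs1, xs2, ys, b):
    mean = sum(ys) / len(ys)
    pred = [b[0] + b[1] * u + b[2] * v for u, v in zip(xs1, xs2)]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical per-post data: clicks, reach, and resulting engagement.
clicks = [12, 30, 45, 60, 80, 95]
reach = [200, 350, 500, 640, 800, 990]
engagement = [0.5 + 0.02 * c + 0.001 * r for c, r in zip(clicks, reach)]

b = fit_linear(clicks, reach, engagement)
print([round(v, 3) for v in b], round(r_squared(clicks, reach, engagement, b), 3))
```

With real posts the residual variance reflects everything the two predictors miss, which is why the reported R² stops at about 0.91 rather than 1.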

157-174
Abstract

The aim of this work is to contribute to the personalization of intelligent learning environments by analyzing user-object interaction data to identify On-Task and Off-Task behaviors. This is accomplished by monitoring and analyzing users' interactions while they perform academic activities with a tangible-intangible hybrid system in a university intelligent-environment configuration. With the proposed framework, the Orange Data Mining tool, and the Neural Network, Random Forest, Naive Bayes, and Tree classification models, training and testing were carried out with the user-object interaction records of 13 students (11 for training and two for testing) to identify representative behavior sequences from those records. The two models with the best results, despite the small amount of data, were the Neural Network and Naive Bayes. Although more data is necessary to perform classification adequately, the work exemplifies the process so that it can later be fully incorporated into an intelligent educational system and contribute to building personalized environments.

175-198
Abstract

The field of vision-based human event recognition in smart environments has emerged as a thriving and successful discipline, with extensive efforts in research and development driving notable progress. This progress has not only yielded valuable insights but also practical applications across various domains. Within this context, human actions, activities, interactions, and behaviors are all considered as events of interest in smart environments. However, when focusing on smart classrooms, a lack of unified consensus on the definition of "human event" poses a significant challenge for educators, researchers, and developers. This lack of agreement hinders their ability to precisely identify and classify specific situations that are relevant to the educational context. To address this challenge, the aim of this paper is to conduct a systematic literature review of significant events, with a particular emphasis on their applications in assistive technology. The review encompasses a comprehensive analysis of 227 published documents spanning from 2012 to 2022. It delves into key algorithms, methodologies, and applications of vision-based event recognition in smart environments. As a primary outcome, the review identifies the most significant events, categorizing them according to single-person behavior, multiple-person interactions, or object-person interactions, examining their practical applications within the educational context. The paper concludes with a discussion on the relevance and practicality of vision-based human event recognition in smart classrooms, especially in the post-COVID era.

199-208
Abstract

This paper presents quantitative research on the knowledge, soft and hard skills, and experience acquired by students hired by a University Software Development Company (USDC). Additionally, suggestions are given on how to set up a USDC in an academic environment facing real customers. There have been good and bad experiences; both are presented in this paper. Furthermore, students' perceptions are discussed. To identify students' perceptions, a questionnaire (survey) was applied. Its reliability was calculated through Cronbach's alpha coefficient (α = .89). Additionally, the Pearson correlation coefficient (r) was calculated in order to identify questions that should be deleted to increase the questionnaire's reliability. The outcomes could be useful when a software engineering faculty wishes to set up a USDC.
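Cronbach's alpha, used above to assess the questionnaire's reliability, follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The responses below are hypothetical, not the study's data.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of respondent scores per questionnaire item."""
    k = len(item_scores)
    item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]   # per-respondent sums
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical responses: 3 items rated by 5 students on a 1-5 scale.
items = [
    [4, 5, 3, 4, 5],
    [4, 4, 3, 5, 5],
    [5, 5, 2, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Dropping an item and recomputing alpha is the usual companion check to the Pearson-based item screening the abstract mentions.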

209-224
Abstract

This research examines the state of applied and proposed software bots in software development through a systematic literature review. Spanning from 2003 to 2022 and encompassing 83 primary studies, the study identifies four bot archetypes: chatbots, analysis bots, repair bots, and development bots. The key benefits of utilizing bots include improved software quality, provision of information to developers, and time savings through automation. However, drawbacks such as limited effectiveness and reliance on third-party technologies are also noted. The study highlights the potential of including bots in software development but emphasizes the need for further exploration and research in this area.

225-238
Abstract

Breast cancer is a serious threat to women's health worldwide. Although the exact causes of this disease are still unknown, it is known that the incidence of breast cancer is associated with risk factors. Risk factors in cancer are any genetic, reproductive, hormonal, physical, biological, or lifestyle-related conditions that increase the likelihood of developing breast cancer. This research aims to identify the most relevant risk factors in patients with breast cancer in a dataset by following the Knowledge Discovery in Databases process. To determine the relevance of risk factors, this research implements two feature selection methods, the chi-squared test and mutual information, and uses seven classifiers to validate the results obtained. Our results show that the risk factors identified as the most relevant are related to the age of the patient, her menopausal status, whether she had undergone hormonal therapy, and her type of menopause.
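The two feature-selection scores can be illustrated on a 2x2 contingency table of a binary risk factor against the class label. The counts are hypothetical, not from the paper's dataset.

```python
import math

def chi_squared(table):
    """Chi-squared statistic for a 2x2 contingency table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def mutual_information(table):
    """Mutual information (in bits) between two binary variables."""
    n = sum(sum(r) for r in table)
    row = [sum(r) / n for r in table]
    col = [sum(c) / n for c in zip(*table)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            p = table[i][j] / n
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi

# Hypothetical counts: rows = risk factor absent/present,
# columns = outcome negative/positive.
table = [[40, 10],
         [10, 40]]
print(round(chi_squared(table), 2), round(mutual_information(table), 3))
```

Both scores grow as the observed counts depart from the independence expectation; ranking features by either score is how the most relevant risk factors are selected before the classifiers validate the choice.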

239-250
Abstract

A person who has had a stroke needs rehabilitation to recover from the effects of the incident. A multidisciplinary team of experts performs rehabilitation, offering treatment from many fields, including neurology, nutrition, psychology, and physiotherapy. In the rehabilitation process, physicians interact with medical computing software and devices, and these interactions represent the medical activities that rehabilitation comprises. Nevertheless, how specialists use technology to collaborate on medical tasks is poorly understood, since no dedicated means of communication enable interdisciplinary cooperation for the integral rehabilitation of stroke patients. Therefore, we present a collaborative software architecture to assist and enable the monitoring of medical activities through multimodal human-computer interactions. The architecture has three layers: the first perceives interactions and monitors activities, the second manages information sharing and interdisciplinary access, and the third assesses how well multidisciplinary activities were carried out. Physicians are assisted in their decision-making on the execution of the treatment plan by evaluating how the activities, collected through the proposed architecture, are carried out. As a result, we provide a prototype with a user-centered design that shows how the architecture supports human-computer interactions.

251-258
Abstract

The training of teachers for the inclusive classroom, with attention to children with hearing disabilities, is important for an educational system with equal conditions. The User-Centered Design (UCD) methodology and the System Usability Scale (SUS) provided perception data to support teacher training for the inclusive classroom, especially regarding children with hearing impairment. The SUS test was applied to 12 teachers; the results of the study indicate that the usability of all the tools is above the standard threshold (72.5), equivalent to a very good rating. The tool fostered teachers' acceptance of inclusive classroom training; in addition, a teacher training program is needed in which children with disabilities and learning disorders are cared for.
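The SUS score reported above follows the standard scoring rule: odd items contribute (score - 1), even items contribute (5 - score), and the sum is multiplied by 2.5 to land on a 0-100 scale. The answers below are hypothetical, not a teacher's actual responses.

```python
def sus_score(responses):
    """Standard System Usability Scale scoring for the 10 items (1-5 each)."""
    total = 0
    for i, score in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even ones negatively.
        total += (score - 1) if i % 2 == 1 else (5 - score)
    return total * 2.5

# Hypothetical answers of one teacher to the 10 SUS items.
answers = [4, 2, 5, 1, 4, 2, 5, 2, 4, 2]
print(sus_score(answers))
```

Averaging such per-respondent scores over the 12 teachers yields the study-level figure compared against the 72.5 threshold.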

259-276
Abstract

This study proposes a machine learning approach to automatically detect "appeal to emotion" fallacies. The objective is to establish a set of elements that enable the application of fallacy mining. Our method uses a lexicon of emotions to distinguish valid arguments from fallacies, employing Support Vector Machine and Multilayer Perceptron models. The Multilayer Perceptron obtained an F1 score of 0.60 in identifying fallacies. Based on our analysis, we suggest using lexical dictionaries to effectively identify "appeal to emotion" fallacies.
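A lexicon-based feature of the kind the abstract describes can be sketched as follows. The mini-lexicon is invented for illustration, and the threshold rule merely stands in for the trained SVM/MLP classifiers, which consume such features rather than a single ratio.

```python
# Hypothetical emotion lexicon; the paper uses a full lexicon of emotions
# as the basis for its features.
emotion_lexicon = {"fear", "terrible", "outrage", "disaster", "shame"}

def emotion_ratio(text):
    """Fraction of tokens found in the emotion lexicon."""
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t.strip(".,!?") in emotion_lexicon)
    return hits / len(tokens)

# A naive rule in place of the trained classifier: flag arguments whose
# emotional vocabulary exceeds a threshold.
def looks_like_appeal_to_emotion(text, threshold=0.15):
    return emotion_ratio(text) > threshold

arg = "Reject this policy: it is a terrible disaster and a shame!"
print(emotion_ratio(arg), looks_like_appeal_to_emotion(arg))
```

A valid argument with low emotional density would fall below the threshold, which is the separation the lexicon features give the SVM and MLP models.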



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)