Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Vol 36, No 6 (2024)
View or download the full issue PDF (Russian)
7-18
Abstract

Software cost/effort estimation has been a key research topic for over six decades due to its industry impact. Despite numerous models, regression-based approaches dominate the literature. Persistent challenges include datasets with too few data points and the arbitrary integration of different source databases. This study proposes using the Kruskal-Wallis test to validate the integration of distinct source databases, aiming to avoid mixing unrelated data, increase the number of data points, and enhance estimation models. A case study was conducted with 2020 data from an international company's Mexico office, which provides software development for "Microservices and APIs." The quality of the estimation model improved significantly: MMRE decreased by 25.4 percentage points (from 78.6% to 53.2%), the standard deviation dropped by 97.2 percentage points (from 149.7% to 52.5%), and the Pred(25) indicator rose by 3.2 percentage points. The number of data points increased, and the linear regression constraints were met. The Kruskal-Wallis test thus effectively improved the estimation models by validating database integration.
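To make the abstract's core idea concrete, here is a minimal pure-Python sketch of the Kruskal-Wallis H test applied to deciding whether effort samples from two source databases may be pooled. The sample values and the no-ties simplification are assumptions for illustration, not data from the study.

```python
# Kruskal-Wallis H test sketch: pool two source databases only if H stays
# below the chi-square critical value (no significant difference in ranks).

def kruskal_wallis_h(*groups):
    """H statistic without tie correction (fine when all values are distinct)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based ranks
    n = len(pooled)
    return 12.0 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

# Hypothetical effort samples (person-hours) from two databases
db_a = [12, 19, 25, 31, 40]
db_b = [14, 22, 27, 35, 44]

h = kruskal_wallis_h(db_a, db_b)
CRITICAL_1DF_005 = 3.841  # chi-square critical value, df = k - 1 = 1, alpha = 0.05
poolable = h < CRITICAL_1DF_005  # here H ~ 0.273, so pooling is acceptable
```

With a significant H, the databases would be kept separate rather than merged into one regression dataset.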

19-38
Abstract

Addressing software defects is an ongoing challenge in software development, and effectively managing and resolving defects is vital for ensuring software reliability, which is in turn a crucial quality attribute of any software system. Software defect prediction supported by Machine Learning (ML) methods offers a promising approach to this problem. However, one common challenge in ML-based software defect prediction is data imbalance. In this paper, we present an empirical study assessing the impact of various class balancing methods on class imbalance in software defect prediction. We conducted a set of experiments involving nine distinct class balancing methods across seven different classifiers, using NASA software project datasets from the PROMISE repository. We employed several metrics, including AUC, Accuracy, Precision, Recall, and the F1 measure, to gauge the effectiveness of the different class balancing methods. Furthermore, we applied hypothesis testing to determine whether metric results differ significantly between datasets with balanced and unbalanced classes. Based on our findings, we conclude that balancing the classes in software defect prediction yields significant improvements in overall performance. Therefore, we strongly advocate for including class balancing as a pre-processing step in this domain.
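As a concrete illustration of one method from the class-balancing family the study evaluates, the sketch below implements random oversampling of the minority class in pure Python; the toy defect dataset is invented, and the study's actual methods and datasets may differ.

```python
# Random oversampling sketch: duplicate minority-class samples until the
# two classes are equal in size (a common pre-processing step before
# training a defect-prediction classifier).
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Return a balanced copy of a two-class dataset by duplicating minority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority, minority = sorted(counts, key=counts.get, reverse=True)
    need = counts[majority] - counts[minority]
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    extra = [rng.choice(minority_idx) for _ in range(need)]
    X_bal = X + [X[i] for i in extra]
    y_bal = y + [y[i] for i in extra]
    return X_bal, y_bal

# Toy module metrics: 8 non-defective (0) vs 2 defective (1) modules
X = [[i, i % 3] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)  # both classes now have 8 samples
```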

39-58
Abstract

With the increase in software development complexity, approaches such as Domain-Driven Design (DDD) are needed to tackle contemporary business domains. DDD is already used in various software projects with different architectural styles. Although some studies have explored the decomposition of business domains or legacy monolithic systems into microservices, there is a lack of concrete information regarding the practical implementation of DDD in this architectural style. This paper systematizes findings on the purpose of using DDD, its patterns, and the associated technologies and techniques, to increase clarity about the use of DDD in the development of microservices-based systems. A systematic literature review of 35 articles was conducted, and thematic analysis was employed to identify five higher-order themes and 11 themes. Based on our analysis, we conclude that microservice identification is the primary motivation behind developers' adoption of DDD, though not the only use of DDD reported in the literature. Finally, our analysis found benefits and challenges in the use of DDD in Microservices Architecture, which translate into opportunity areas for future work.
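To illustrate the kind of DDD tactical patterns the review discusses, here is a minimal Python sketch of an Entity and a Repository inside one bounded context; all names are invented and do not come from the reviewed studies.

```python
# DDD tactical-pattern sketch: an Entity and a Repository port for an
# invented "Ordering" bounded context. In a microservices system, each
# bounded context typically maps to a service with its own data store.
from dataclasses import dataclass, field

@dataclass
class Order:  # Entity: identity (order_id) plus mutable state (lines)
    order_id: str
    lines: list = field(default_factory=list)

    def add_line(self, sku, qty):
        self.lines.append((sku, qty))

class OrderRepository:  # Repository port; a service would back it with its DB
    def __init__(self):
        self._store = {}

    def save(self, order):
        self._store[order.order_id] = order

    def get(self, order_id):
        return self._store[order_id]

repo = OrderRepository()
order = Order("o-42")
order.add_line("book-sku", 2)
repo.save(order)
```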

59-82
Abstract

Program Synthesis is the process of automatically generating software from a requirement specification. This paper presents a systematic literature review focused on program synthesis from specifications expressed in natural language. The research problem centers on the complexity of automatically generating accurate and robust code from high-level, ambiguous natural language descriptions – a barrier that limits the broader adoption of automatic code generation in software development. To address this issue, the study systematically examines research published between 2014 and 2024, focusing on works that explore various approaches to program synthesis from natural language inputs. The review follows a rigorous methodology, incorporating search strings tailored to capture relevant studies from five major data sources: IEEE, ACM, Springer, Elsevier, and MDPI. The selection process applied strict inclusion and exclusion criteria, resulting in a final set of 20 high-quality studies. The findings reveal significant advancements in the field, particularly in the integration of large language models (LLMs) with program synthesis techniques. The review also highlights open challenges and concludes by outlining key trends and proposing future research directions aimed at overcoming these challenges and expanding the applicability of program synthesis across various domains.

83-102
Abstract

Project management is a field applied across many areas of knowledge, particularly engineering and software development. For organizations, projects are a central element for generating value: they allow organizations to reach their goals through specific methodologies, tools, and software. One of the most widely recognized tools for process improvement, even in other fields of knowledge, is the maturity model, and such models have already begun to be implemented in project management. Project Management Maturity Models are useful tools to evaluate the management process against a process reference (e.g., PMBOK), which describes the best practices for achieving success in projects. The purpose of this paper is to identify research papers that present maturity models specifically for project management. From the results of the review, we generate a classification useful to project managers applying maturity models in a project management context.

103-114
Abstract

Software development is an intricate and time-consuming process, and resource estimation is one of its most important responsibilities. Functional size is widely used to build estimation models, as it is currently the only broadly accepted metric for this purpose; functional size measurement, however, takes time. The use of artificial intelligence (AI) to automate software development tasks has gained popularity in recent years, and software functional sizing and estimation is one area where AI may be applied. In this study, we investigate how the concepts and rules of the COSMIC method can be applied to measurements using ChatGPT 4o, a large language model (LLM). In evaluating whether ChatGPT can perform COSMIC measurements, we found that it could not reliably produce accurate results. Its primary shortcomings are an inability to accurately extract data movements, data groups, and functional users from the text. As a result, ChatGPT's measurements fall short of two essential requirements for measurement: accuracy and reproducibility.
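For readers unfamiliar with the COSMIC method mentioned above: its size measure counts data movements, one COSMIC Function Point (CFP) per Entry, Exit, Read, or Write. The sketch below tallies an invented functional process; the specific movements are an assumption for illustration.

```python
# COSMIC sizing sketch: functional size = number of data movements,
# each movement (Entry, Exit, Read, Write) contributing 1 CFP.
movements = {
    "Entry": 1,  # e.g., user submits order data
    "Read":  1,  # e.g., read the customer record
    "Write": 1,  # e.g., persist the order
    "Exit":  1,  # e.g., return a confirmation message
}
cfp = sum(movements.values())  # functional size of this process in CFP
```

Identifying these movements (and the functional users and data groups behind them) is exactly the step the study found ChatGPT could not perform reliably.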

115-148
Abstract

Quantum computing, strongly based on quantum mechanics, presents significant challenges for people without a background in quantum physics. Strong technical skills in quantum topics, mathematics, and related fields are essential for job roles in this area. Additionally, the multidisciplinary nature of quantum computing requires soft skills for effective teamwork. This paper reviews literature using systematic mapping to identify the key technical and soft skills needed to prepare students and professionals for the quantum computing field, helping educational institutions design appropriate courses and curricula.

149-160
Abstract

Dealing with work-related stress as a Complex Informal Structured Domain (CISD) involves various social, technical, cultural, and scientific factors, which highlights the challenges posed by organizational decision-making and the need for cognitive solutions that improve understanding of such complex scenarios. The article discusses three empirical-theoretical approaches to conceptualizing and specifying cognitive solutions to real-world problems in a CISD: a literature review of the machine learning algorithms used to develop models for work-stress prevention; the use of cognitive solutions such as ontologies for explicit knowledge representation; and a systemic methodological framework that establishes a structured approach to conceptualization and specification. The exploration emphasizes the need for a methodological model that effectively supports these cognitive solutions, improving organizational decision-making by leveraging systems thinking and knowledge management.

161-178
Abstract

Behavior-driven development (BDD) focuses on specifying system behavior through examples, fostering collaboration, and aligning development with business needs. This research provides a thematic synthesis of BDD, highlighting its challenges, benefits, and implications in software development. By analyzing 23 studies across four academic databases, the study identifies trends and themes in BDD adoption and implementation. The findings emphasize BDD's role in bridging the gap between technical and non-technical stakeholders, aligning software development with business goals. Despite initial adoption challenges, the study reveals significant long-term benefits in software quality and stakeholder satisfaction. Future research should focus on developing efficient training and tools to support BDD adoption in diverse environments.
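As a minimal illustration of BDD's example-driven specification style, the Python sketch below phrases one behavior as a Given/When/Then test; the Cart class is invented, and real BDD tooling (e.g., Gherkin-based frameworks) would express the same example in business-readable text shared with non-technical stakeholders.

```python
# BDD-style sketch: system behavior specified through a concrete example,
# with the Given/When/Then phases marked as comments.

class Cart:
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    @property
    def total(self):
        return sum(price for _, price in self.items)

def test_adding_an_item_updates_the_total():
    # Given an empty shopping cart
    cart = Cart()
    # When the customer adds a book priced at 10
    cart.add("book", 10)
    # Then the cart total is 10
    assert cart.total == 10
```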

179-194
Abstract

The advent of digitalization and Internet of Things (IoT) technologies brings new challenges to the management of electric metering systems. Integrating institutional energy billing systems with government Advanced Metering Infrastructure (AMI) systems is essential for effective management. Blockchain technology is proposed to maintain the integrity of automated energy readings. This study introduces an innovative model designed to enhance public lighting in Mexico by integrating AMI and IoT and by employing LZ4 and IPFS for data compression and storage. This approach aims to optimize the handling of large data volumes, resulting in improved data efficiency, enhanced security, cost reductions, and better energy resource management.
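To illustrate the compress-before-store step the abstract describes, here is a small Python sketch. The study names LZ4, but zlib stands in here because it ships with the standard library, and the meter-reading payload is invented.

```python
# Compress-before-store sketch for meter telemetry: serialize readings,
# compress, and store the smaller blob (zlib as a stand-in for LZ4).
import json
import zlib

reading = {"meter_id": "MX-001", "kwh": 1523.7, "ts": "2024-06-01T00:00:00Z"}
payload = json.dumps(reading).encode() * 100  # repeated readings compress well
compressed = zlib.compress(payload)          # blob that would go to storage
ratio = len(compressed) / len(payload)       # well under 1.0 for this payload
```

LZ4 trades some compression ratio for much faster (de)compression, which is why it suits high-volume telemetry better than heavier codecs.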

195-214
Abstract

The automation of workflow-based computing for solving large resource-intensive problems has undoubtedly increased the productivity of scientific research. In recent years, workflows have become the basis for abstractions covering data processing and high-performance computing using distributed applications. Workflow management systems are powerful tools for the collaborative development and use of distributed scientific applications, and particular attention is currently being paid to supporting service-oriented scientific applications. Within this field of research, there is a large spectrum of problems related to supporting modular scientific applications, standardizing their components and interfaces, using heterogeneous information and computing resources, and organizing interdisciplinary research. Unfortunately, these problems have not been fully solved in known workflow management systems that support the development and use of service-oriented scientific applications. In this context, the paper discusses relevant aspects of organizing service-oriented computing in an environment with heterogeneous resources. It examines technologies for developing and using service-oriented scientific applications in which problem-solving schemes are formed as workflows, and reviews existing standards for describing workflows. A new framework for creating service-oriented scientific applications is proposed; it extends and complements the capabilities of existing systems for such purposes.
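As a minimal sketch of the workflow idea above (a problem-solving scheme expressed as a DAG of service tasks executed in dependency order), the Python snippet below uses the standard-library topological sorter; the task names are invented and stand in for service invocations.

```python
# Workflow sketch: tasks and their predecessors form a DAG; a scheduler
# runs each "service" only after all of its dependencies have finished.
from graphlib import TopologicalSorter

tasks = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyze": {"simulate"},
    "report": {"analyze", "preprocess"},
}

results = {}
for name in TopologicalSorter(tasks).static_order():
    # every dependency must already have produced a result
    assert all(dep in results for dep in tasks[name])
    results[name] = f"{name} done"  # stand-in for invoking the service
```

Real workflow management systems add the parts this sketch omits: distributed execution, heterogeneous resource brokering, and standardized workflow descriptions.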

215-230
Abstract

In the last decade, Learning Analytics (LA) has evolved in a positive way, considering that the term emerged in 2011 through the Society for Learning Analytics Research (SoLAR). This area of data analytics can be seen as a specialization of Educational Data Mining (EDM): LA emphasizes student learning outcomes and a better understanding of student learning behavior and processes, while EDM focuses on helping teachers and students analyze the learning process using popular data mining methods. The purpose of this research is to explore the first decade of work applying Learning Analytics in Higher Education Institutions (HEI) in the context of Tutoring Information Systems (TIS), with the intention of supporting institutions, teachers, and students in decreasing dropout rates. This article presents a systematic literature review (SLR) of 17 primary studies published between 2014 and 2024. The findings reflect the use of LA in improving or optimizing learning using student academic history obtained through Learning Management Systems (LMS), and note the scarcity of works focused on tutoring or academic advising. Ultimately, a gap remains for applying LA in HEIs with information from an Institutional Tutoring Program (PIT) integrated with information from an LMS, in order to contribute to student retention.

231-246
Abstract

The COVID-19 pandemic was the first health crisis of this century to affect the entire world. The captured data revealed a lack of organization and control in health measures, containment, and mitigation policies, as well as a lack of planning and coordination in the use of medical supplies, which motivated the development of models providing predictive information on the evolution of the pandemic. In this work, a time series of accumulated infection cases was generated from official data provided by the Ministry of Health of the Government of Mexico. Six deterministic and stochastic predictive models were applied to this information to compare their efficiency in predicting COVID-19 infection cases, using data from two Mexican regions, Colima and the State of Mexico. The study concludes that the ARIMA and ANN MLP models adapt better to the daily generated data and therefore have better prediction capacity.
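To show the time-series setup concretely, the sketch below applies a naive drift forecast to an invented cumulative case series; the study's ARIMA and ANN MLP models require external libraries and are not reproduced here.

```python
# Forecasting sketch on a cumulative case series: extend the series by its
# average daily increment (random-walk-with-drift), the simplest baseline
# that models like ARIMA are compared against.

cases = [10, 25, 47, 80, 130, 210]  # invented cumulative confirmed cases per day

def drift_forecast(series, horizon):
    """Forecast `horizon` steps ahead using the mean historical increment."""
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + drift * h for h in range(1, horizon + 1)]

forecast = drift_forecast(cases, 3)  # -> [250.0, 290.0, 330.0]
```

Comparing candidate models against such a baseline on held-out days is the usual way to judge the "prediction capacity" the abstract refers to.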



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)