Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Improving Estimation Models by Merging Independent Data Sources

https://doi.org/10.15514/ISPRAS-2024-36(6)-1

Abstract

Software cost/effort estimation has been a key research topic for over six decades because of its impact on industry. Although numerous models have been proposed, regression-based approaches dominate the literature. Common challenges include datasets with too few data points and the arbitrary integration of databases from different sources. This study proposes using the Kruskal-Wallis test to validate the integration of distinct source databases. The aims are to avoid mixing unrelated data, to increase the number of data points, and to improve estimation models. A case study was conducted with data from the Mexico office of an international company that provides software development for "Microservices and APIs"; data from 2020 were analyzed. The quality of the estimation model improved substantially. MMRE decreased by 25.4 percentage points (from 78.6% to 53.2%), the standard deviation dropped by 97.2 percentage points (from 149.7% to 52.5%), and the Pred(25%) indicator rose by 3.2 percentage points. The number of data points increased, and the assumptions of linear regression were satisfied. Validating database integration with the Kruskal-Wallis test therefore proved effective for improving the estimation models.
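The abstract does not describe how the test is applied. As a minimal sketch using only the Python standard library, the code below computes the tie-corrected Kruskal-Wallis H statistic and its chi-square p-value with k - 1 degrees of freedom. It then uses the result as a gate before merging two sources. Several assumptions are not from the paper: the compared variable is a productivity measure (hours per COSMIC Function Point), the significance level is 0.05, and the sample values are hypothetical. The helper names `average_ranks`, `chi2_sf` and `kruskal_wallis` are also illustrative. A non-significant result only means there is no evidence that the sources differ; it does not prove that they are equivalent.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df, using closed forms
    of the regularized upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        return sum(math.exp(i * math.log(y) - y - math.lgamma(i + 1))
                   for i in range(df // 2))
    # Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{i=1}^{m} y^(i-1/2) exp(-y) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, df // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def kruskal_wallis(*groups):
    """Return (H, p): tie-corrected Kruskal-Wallis statistic and its
    chi-square approximation p-value with k - 1 degrees of freedom."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of R_j^2 / n_j, slicing the pooled ranks group by group.
    s, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        s += r * r / len(g)
        start += len(g)
    h = 12.0 * s / (n * (n + 1)) - 3.0 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


if __name__ == "__main__":
    # Hypothetical productivity values (hours per CFP) from two independent
    # sources; illustrative only, not the study's data.
    source_a = [5.1, 6.3, 4.8, 7.0, 5.9, 6.1]
    source_b = [5.5, 6.8, 5.0, 6.6, 7.2, 5.7]
    alpha = 0.05  # assumed significance level
    h, p = kruskal_wallis(source_a, source_b)
    if p >= alpha:
        print(f"H={h:.3f}, p={p:.3f}: no evidence the sources differ; merging is defensible")
    else:
        print(f"H={h:.3f}, p={p:.3f}: the sources differ; keep them separate")
```

The p-value here comes from the chi-square approximation, which is the usual choice for moderate group sizes. For very small groups, an exact or permutation-based p-value would be more reliable.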
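The abstract reports accuracy using MMRE, a standard deviation and Pred(25%). The Python sketch below computes these indicators with their usual definitions in the estimation literature:

- MRE = |actual - estimate| / actual.
- MMRE is the mean MRE.
- Pred(25%) is the fraction of projects whose MRE is at most 0.25.

The sketch assumes that the reported standard deviation is that of the MRE values, and it uses the sample form. The toy project data are invented. The hypothetical `change` helper reproduces the arithmetic behind the abstract. The reported reductions are differences in percentage points, which correspond to relative reductions of about 32.3% for MMRE and 64.9% for the standard deviation.

```python
import statistics


def mre(actual, estimated):
    """Magnitude of relative error of one estimate: |actual - estimated| / actual."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - estimated) / actual


def accuracy_report(actuals, estimates, level=0.25):
    """Return (MMRE, SD of MRE, Pred(level)) as fractions of 1."""
    if len(actuals) != len(estimates) or len(actuals) < 2:
        raise ValueError("need two equally long lists with at least two projects")
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    mmre = statistics.mean(errors)
    sd = statistics.stdev(errors)  # sample SD (assumption: the study may use another form)
    pred = sum(1 for err in errors if err <= level) / len(errors)
    return mmre, sd, pred


def change(before, after):
    """Absolute change in percentage points and relative change in %."""
    return round(before - after, 1), round(100 * (before - after) / before, 1)


if __name__ == "__main__":
    # Toy projects (effort in hours); invented for illustration.
    actual = [100, 200, 50, 120]
    estimate = [80, 260, 50, 150]
    mmre, sd, pred = accuracy_report(actual, estimate)
    print(f"MMRE={mmre:.1%}  SD={sd:.1%}  Pred(25%)={pred:.0%}")
    # Figures from the abstract: (percentage points, relative %).
    print(change(78.6, 53.2))   # MMRE -> (25.4, 32.3)
    print(change(149.7, 52.5))  # standard deviation -> (97.2, 64.9)
```

Reporting the change in percentage points keeps the figures comparable with the Pred(25%) change, which the abstract already states in percentage points.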

About the Authors

Francisco VALDÉS-SOUTO
National Autonomous University of Mexico, Faculty of Sciences
Mexico

He holds a PhD in Software Engineering, specializing in Software Measurement and Estimation, from the École de Technologie Supérieure (ETS) in Canada, and two master's degrees obtained in Mexico and France. He is President of COSMIC and an Associate Professor in the Faculty of Sciences of the National Autonomous University of Mexico (UNAM). He also founded the Mexican Association of Software Metrics (AMMS). He has more than 25 years of experience in critical software development. He has more than 50 publications, including articles in indexed journals, proceedings, books and book chapters. He is the main promoter of formal software metrics in Mexico, where he has promoted COSMIC (ISO/IEC 19761) as a national standard. He is a member of the National System of Researchers (SNI). Research interests: software measurement and estimation applied to software project management, scope management, and productivity and economics in software projects.



Jorge VALERIANO-ASSEM
SPINGERE
Mexico

He holds a Master's degree in Computer Science and Engineering from the National Autonomous University of Mexico. He has been a specialist consultant in formal software measurement and estimation since 2016. Areas of interest: software metrics (COSMIC), software estimation models, software validation models, estimation of functional and non-functional requirements, evaluation of the performance of software development projects aligned with software metrics, and evaluation of the quality of software products.






For citations:


VALDÉS-SOUTO F., VALERIANO-ASSEM J. Improving Estimation Models by Merging Independent Data Sources. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(6):7-18. https://doi.org/10.15514/ISPRAS-2024-36(6)-1



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)