
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Determining Relevant Risk Factors for Breast Cancer

https://doi.org/10.15514/ISPRAS-2024-36(1)-14

Abstract

Breast cancer is a serious threat to women’s health worldwide. Although the exact causes of this disease are still unknown, its incidence is known to be associated with risk factors. Risk factors are any genetic, reproductive, hormonal, physical, biological, or lifestyle-related conditions that increase the likelihood of developing breast cancer. This research aims to identify the most relevant risk factors in patients with breast cancer in a dataset by following the Knowledge Discovery in Databases process. To determine the relevance of risk factors, this research implements two feature selection methods, the Chi-Squared test and Mutual Information, and uses seven classifiers to validate the results obtained. Our results show that the risk factors identified as the most relevant are related to the age of the patient, her menopausal status, whether she had undergone hormonal therapy, and her type of menopause.
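
As an illustration of the two feature selection scores named in the abstract, the following minimal sketch computes the Pearson Chi-Squared statistic and the Mutual Information between a categorical feature and a class label, then ranks features by score. It is not the authors' implementation: the function names, the feature names, and the four-row toy dataset are assumptions made up for this example and do not come from the study's data.

```python
import math
from collections import Counter


def chi_squared(feature, label):
    """Pearson chi-squared statistic between two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, label))   # observed cell counts
    f_counts = Counter(feature)            # row totals
    l_counts = Counter(label)              # column totals
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n         # count expected under independence
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(feature, label):
    """Mutual information (in nats) between two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    f_counts = Counter(feature)
    l_counts = Counter(label)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f, l) * log( p(f, l) / (p(f) * p(l)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (f_counts[f] * l_counts[l]))
    return mi


def rank_features(features, label, score):
    """Return feature names sorted from highest to lowest score."""
    scores = {name: score(values, label) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data (NOT from the paper): one feature that fully
# determines the label and one that is independent of it.
label = [0, 0, 1, 1]
features = {
    "independent_feature": ["a", "b", "a", "b"],
    "dependent_feature": ["a", "a", "b", "b"],
}

chi2_ranking = rank_features(features, label, chi_squared)
mi_ranking = rank_features(features, label, mutual_information)
```

On this toy data both scores place `dependent_feature` first, since the independent feature scores zero under either measure. With real data the two rankings can differ, which is one reason to check the selected features with classifiers, as the abstract describes.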

About the Authors

Zazil Josefina IBARRA-CUEVAS
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Master of Science in Computer Science and software developer working at a private company since 2022. Research interests: data mining, databases, and software engineering.



Jose Ignacio NUNEZ-VARELA
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Doctor of Computer Science, professor, coordinator of the Intelligent Systems Engineering undergraduate program at the Autonomous University of San Luis Potosi since 2017. Research interests: Machine learning, data science, intelligent robotics.



Alberto NUNEZ-VARELA
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Doctor of Computer Science and associate professor at the Autonomous University of San Luis Potosi since 2014. Research interests: Software engineering, grammatical inference, natural language processing, and machine learning.



Francisco Eduardo MARTINEZ-PEREZ
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Doctor of Computer Science, professor, coordinator of the Computer Engineering undergraduate program at the Autonomous University of San Luis Potosi since 2023. Research interests: Image processing, ambient intelligence (AmI), ubiquitous computing, human-computer interaction, and medical informatics.



Sandra E. NAVA-MUÑOZ
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Doctor of Computer Science, professor, coordinator of the Computer Science postgraduate program at the Autonomous University of San Luis Potosi since 2023. Research interests: Software engineering, human-computer interaction, context-aware computing, and medical informatics.



César Augusto RAMÍREZ-GÁMEZ
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Holds a Master of Science in Computer Science and a Ph.D. degree; software developer working at a private company since 2023. Research interests: Computer vision, image processing, and machine learning.



Hector Gerardo PEREZ-GONZALEZ
School of Engineering, Universidad Autónoma de San Luis Potosí
Mexico

Full-time research professor at Universidad Autónoma de San Luis Potosi, Mexico. PhD in Computer Science from the University of Colorado in 2003. Author of research articles and book chapters on Automatic Software Design and Human-Computer Interaction. He has been a speaker at international conferences in the USA, Canada, UK, Portugal, and Singapore. His research areas are software design, computer science education, and quantum software engineering. He is a member of the National Researchers System in Mexico.





For citations:


IBARRA-CUEVAS Z.J., NUNEZ-VARELA J.I., NUNEZ-VARELA A., MARTINEZ-PEREZ F.E., NAVA-MUÑOZ S., RAMÍREZ-GÁMEZ C.A., PEREZ-GONZALEZ H.G. Determining Relevant Risk Factors for Breast Cancer. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(1):225-238. https://doi.org/10.15514/ISPRAS-2024-36(1)-14



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)