Program for Constructing Quite Interpretable Elementary and Non-elementary Quasi-linear Regression Models
https://doi.org/10.15514/ISPRAS-2023-35(4)-7
Abstract
A quite interpretable linear regression satisfies the following conditions: the signs of its coefficients agree with the substantive meaning of the corresponding factors; multicollinearity is negligible; the coefficients are statistically significant; the approximation quality of the model is high. The QInter-1 program was previously developed to construct such models estimated by ordinary least squares. Given the initial parameters, it automatically generates a mixed integer 0-1 linear programming problem whose solution selects the most informative regressors. The mathematical apparatus underlying that program has since been substantially extended: non-elementary linear regressions were developed, linear constraints on the absolute values of intercorrelations were proposed to control multicollinearity, and it was conjectured that not only linear but also quasi-linear regressions can be constructed in this way. This article describes QInter-2, the second version of the program for constructing quite interpretable regressions. Depending on the parameters chosen by the user, QInter-2 automatically formulates, for the LPSolve solver, the mixed integer 0-1 linear programming problem for constructing both elementary and non-elementary quite interpretable quasi-linear regressions. The user can specify up to nine elementary functions and control such parameters as the number of regressors in the model, the number of decimal places in real-valued coefficients, the absolute contributions of variables to the overall determination, the number of occurrences of each explanatory variable in the model, and the magnitudes of intercorrelations. While working with the program, the user can also limit the number of elementary and non-elementarily transformed variables, which affects the speed of solving the mixed integer 0-1 linear programming problem.
The QInter-2 program is universal and can be used to construct quite interpretable mathematical dependencies in various subject areas.
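The core idea described above, formulating regressor selection as a mixed integer 0-1 linear programming problem handed to an external solver, can be illustrated with a deliberately simplified sketch. The code below is not part of QInter-2: the function name, the squared-correlation objective (a crude proxy for a variable's contribution to determination), and the pairwise exclusion constraints are illustrative assumptions. It emits a problem in lp_solve's plain-text LP format, where binary variables z_i mark selected regressors, exactly k of them must enter the model, and any pair with intercorrelation above r_max is forbidden to control multicollinearity.

```python
from itertools import combinations

def make_lp(r_y, r_xx, k, r_max):
    """Emit an lp_solve-format 0-1 LP sketch for subset selection.

    r_y   : correlations of each candidate regressor with the response
    r_xx  : matrix of intercorrelations between candidate regressors
    k     : number of regressors that must enter the model
    r_max : largest tolerated absolute intercorrelation
    """
    m = len(r_y)
    lines = []
    # Objective: maximize a proxy for contribution to determination.
    obj = " + ".join(f"{r_y[i] ** 2:.4f} z{i + 1}" for i in range(m))
    lines.append(f"max: {obj};")
    # Exactly k regressors enter the model.
    lines.append(" + ".join(f"z{i + 1}" for i in range(m)) + f" = {k};")
    # Multicollinearity control: ban strongly intercorrelated pairs.
    for i, j in combinations(range(m), 2):
        if abs(r_xx[i][j]) > r_max:
            lines.append(f"z{i + 1} + z{j + 1} <= 1;")
    # Declare the selection variables as binary.
    lines.append("bin " + ", ".join(f"z{i + 1}" for i in range(m)) + ";")
    return "\n".join(lines)

if __name__ == "__main__":
    print(make_lp([0.9, 0.8, 0.5],
                  [[1, 0.95, 0.1], [0.95, 1, 0.2], [0.1, 0.2, 1]],
                  2, 0.7))
```

The actual formulations in the cited papers are considerably richer (coefficient-sign constraints, significance by Student's t-test, occurrence counts for non-elementary transformations), but they share this structure: linear constraints over 0-1 indicators, solvable by LPSolve.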
About the Author
Mikhail Pavlovich BAZILEVSKIY, Russian Federation
Candidate of Technical Sciences, Associate Professor, Associate Professor of the Department of Mathematics of the Irkutsk State Transport University. Research interests: mathematical modeling, data analysis, optimization, econometrics, machine learning, artificial intelligence.
Supplementary files
1. Untitled (626KB)
2. Untitled (80KB)
3. Untitled (23KB)
For citations:
BAZILEVSKIY M.P. Program for Constructing Quite Interpretable Elementary and Non-elementary Quasi-linear Regression Models. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(4):129-144. (In Russ.) https://doi.org/10.15514/ISPRAS-2023-35(4)-7