Redundancy and Uncertainty-Based Algorithms for Computation Planning
https://doi.org/10.15514/ISPRAS-2022-34(1)-9
Abstract
Developing and using workflow-based applications (distributed applied software packages) is one of the key challenges in preparing and carrying out large-scale scientific experiments in distributed environments with heterogeneous computing resources. The resources of such environments may include clusters of personal computers, supercomputers, and private or public cloud platforms, and they differ in their computational characteristics. Moreover, the composition and characteristics of the resources change dynamically. Computation planning and resource allocation in such environments are therefore important problems. In this regard, we propose new algorithms for computation planning that take into account redundancy and uncertainty in such distributed applied software packages. Unlike other algorithms with a similar purpose, the proposed algorithms use estimates of workflow execution makespan obtained during the continuous integration, delivery, and deployment of applied software. They construct redundant problem-solving schemes that can be adapted to the dynamic characteristics of computational resources and that improve the reliability of distributed computing. The algorithms are based on a theory of conceptual modeling of computational processes. We demonstrate the construction of problem-solving schemes on model examples. In addition, we show the utility of redundancy for increasing distributed computing reliability in comparison with some traditional meta-schedulers.
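To make the reliability claim concrete, the following minimal sketch shows why redundancy can raise the probability that a workflow step completes. It is not the authors' algorithm: it assumes that replicas of a step fail independently and applies the standard formula for parallel redundancy. The function name and the probabilities used are hypothetical and chosen only for illustration.

```python
from math import prod


def redundant_success_probability(replica_probs):
    """Probability that at least one replica of a step succeeds.

    Assumption (not from the paper): replicas fail independently,
    so the step fails only if every replica fails.
    """
    if not replica_probs:
        raise ValueError("at least one replica is required")
    if any(p < 0.0 or p > 1.0 for p in replica_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    # P(success) = 1 - product of the individual failure probabilities
    return 1.0 - prod(1.0 - p for p in replica_probs)


# Hypothetical success probabilities of one step on two different resources
single = redundant_success_probability([0.9])
double = redundant_success_probability([0.9, 0.8])
print(f"one replica:  {single:.2f}")
print(f"two replicas: {double:.2f}")
```

Under the independence assumption, adding a second, less reliable resource still increases the overall success probability, at the cost of extra resource usage. Correlated failures, which are common when replicas share infrastructure, would reduce this gain.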
About the Authors
Alexander Gennadevich FEOKTISTOV
Russian Federation
Ph.D., Associate Professor, Head of the Laboratory of Parallel and Distributed Computing Systems
Roman Olegovich KOSTROMIN
Russian Federation
Ph.D., Junior Researcher
Sergei Alexeevich GORSKY
Russian Federation
Ph.D., Senior Researcher
Igor Vyacheslavovich BYCHKOV
Russian Federation
Academician of RAS, Doctor of Science, Professor, Director
Andrei Nikolaevitch TCHERNYKH
Russian Federation
Doctor of Science, Professor
Olga Yurevna BASHARINA
Russian Federation
Ph.D., Research Officer of ISDCT SB RAS, Associate Professor of the Irkutsk State University
For citations:
FEOKTISTOV A.G., KOSTROMIN R.O., GORSKY S.A., BYCHKOV I.V., TCHERNYKH A.N., BASHARINA O.Yu. Redundancy and Uncertainty-Based Algorithms for Computation Planning. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(1):123-140. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(1)-9