
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Task and resources management function in HPC operation system «SPO Super-EVM»

https://doi.org/10.15514/ISPRAS-2022-34(2)-13

Abstract

This paper describes Slurm-VNIITF, software developed by the Federal State Unitary Enterprise “Russian Federal Nuclear Center - Zababakhin All-Russian Research Institute of Technical Physics”. It covers the software's architecture and its resource and task management capabilities for HPC systems used in numerical simulation. Many years of operating these HPC systems have shown that the basic features of Slurm (Simple Linux Utility for Resource Management) are clearly insufficient for effective use of computing resources in HPC centers. The authors therefore propose an improved task and resource management policy. The paper also describes the Slurm extension modules (plugins) that implement this policy.

About the Authors

Alexey Olegovich IGNATYEV
E. I. Zababakhin All-Russian Scientific Research Institute of Technical Physics
Russian Federation

Head of Laboratory



Alexey Alexeevich KALININ
E. I. Zababakhin All-Russian Scientific Research Institute of Technical Physics
Russian Federation

Head of Research Group



Sergey Yurievich MOKSHIN
E. I. Zababakhin All-Russian Scientific Research Institute of Technical Physics
Russian Federation

Head of Department






For citations:


IGNATYEV A.O., KALININ A.A., MOKSHIN S.Yu. Task and resources management function in HPC operation system «SPO Super-EVM». Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(2):159-178. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(2)-13



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)