Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Overlapping communications and computations in GPU-based iterative linear solvers

https://doi.org/10.15514/ISPRAS-2016-28(1)-5

Abstract

Krylov subspace methods such as the Conjugate Gradient and Biconjugate Gradient Stabilized (BiCGStab) methods are well-known approaches for solving symmetric and nonsymmetric systems of linear algebraic equations, respectively. Such systems commonly arise from partial differential equations in computational mathematics, for example the Navier-Stokes equations in fluid dynamics. As mesh sizes and the number of computational nodes grow, network communication time may become an issue: stalls during global reductions last longer and prevent useful computation. This happens because, in the original formulations of these methods, computing a dot product requires a global reduce operation whose value is needed at the next step, so each process has to wait until all others reach the same point, as in a barrier synchronization. We study alternative formulations of the Preconditioned Conjugate Gradient and BiCGStab methods for GPU-based iterative linear system solvers. These formulations allow parallel computations to overlap with communications, at the cost of additional computation and memory. We describe an implementation of our approach for GPU-accelerated hybrid systems in OpenFOAM, an open-source framework for computational fluid dynamics. Asynchronous collective communications from the MPI-3 parallel programming API are used to avoid full barrier synchronization and reduce latency. Experimental results on 2 and 4 million cases from standard OpenFOAM problems are presented.
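
To make the idea of overlapping a reduction with computation concrete, here is a minimal sketch of one known reformulation from the literature, the unpreconditioned pipelined Conjugate Gradient recurrence in the style of Ghysels and Vanroose. It is not necessarily the variant implemented by the authors, and it runs serially in pure Python on a small dense matrix. The function names `matvec`, `dot`, and `pipelined_cg` are hypothetical. Comments mark where, in a distributed setting, a non-blocking reduction such as MPI-3 `MPI_Iallreduce` could be started and later completed with `MPI_Wait`.

```python
def matvec(A, v):
    """Dense matrix-vector product; stands in for the local sparse product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Local dot product; in a distributed run this needs a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def pipelined_cg(A, b, tol=1e-10, max_iter=100):
    """Pipelined CG for a symmetric positive definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # r = b - A*x with x = 0
    w = matvec(A, r)            # w = A*r is maintained by recurrence
    z = [0.0] * n               # extra vectors: the memory cost of pipelining
    s = [0.0] * n
    p = [0.0] * n
    gamma_old = alpha_old = None

    for it in range(max_iter):
        # Both dot products depend only on r and w, so they can be fused
        # into one reduction (assumption: started here with MPI_Iallreduce).
        gamma = dot(r, r)
        delta = dot(w, r)

        # This product does not depend on gamma or delta, so it could run
        # while the reduction is in flight. It is the extra computation.
        q = matvec(A, w)

        # The reduction result is first needed here (assumption: MPI_Wait).
        if gamma ** 0.5 < tol:
            break
        if it == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)

        z = [qi + beta * zi for qi, zi in zip(q, z)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]
        gamma_old, alpha_old = gamma, alpha

    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(pipelined_cg(A, b))
```

Compared with textbook CG, this variant keeps three extra vectors and performs additional vector updates per iteration, which illustrates the trade-off described above: more arithmetic and memory in exchange for a reduction that no longer blocks the matrix-vector product.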

About the Authors

V. Platonov
ISP RAS
Russian Federation


A. Monakov
ISP RAS
Russian Federation



For citations:


Platonov V., Monakov A. Overlapping communications and computations in GPU-based iterative linear solvers. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(1):81-92. (In Russ.) https://doi.org/10.15514/ISPRAS-2016-28(1)-5



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)