Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Optimizations for linear solvers in OpenFOAM for MPI + CUDA platform

https://doi.org/10.15514/ISPRAS-2014-26(3)-4

Abstract

We describe an implementation of the conjugate gradient method on heterogeneous platforms (multiple nodes with GPU accelerators) for use in OpenFOAM, together with several optimizations. For the conjugate gradient method itself, we suggest using device memory for scalars used only on the GPU and pinned memory for scalars used in MPI reductions. For preconditioning, we choose AINV as a suitable preconditioner for GPUs and describe ways to make it more efficient, such as storing it in single precision, laying out the factors in upper-left triangular form, and computing it on the CPU asynchronously. We describe how multi-GPU computing can be supported together with arbitrary boundary conditions by copying only the boundary coefficients from the accelerator to host memory and then using existing OpenFOAM methods on the CPU. To improve the overlap of computation and communication, we suggest using a pipelined variant of the conjugate gradient method and describe GPU-specific adjustments. In our experimental evaluation, we obtain a 1.75x speedup in the linear solver by using a Tesla K20X accelerator in addition to a 10-core Xeon CPU, but only for sufficiently large problem sizes: below 1 million cells per accelerator, the efficiency of GPU computations diminishes.
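
To illustrate the pipelined variant mentioned in the abstract, the following is a minimal sketch of an unpreconditioned pipelined conjugate gradient iteration in the form commonly attributed to Ghysels and Vanroose. It is not the authors' implementation: it runs on the CPU only, uses a dense matrix as a stand-in for a sparse one, and the names `pipelined_cg`, `matvec`, and `dot` are hypothetical. The point it shows is that the matrix-vector product `q = A w` does not depend on the two dot products computed in the same iteration, so in an MPI setting the global reduction could presumably be overlapped with that product.

```python
def matvec(a, v):
    """Dense matrix-vector product (stand-in for a sparse product)."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    """Dot product; in a distributed solver this would need a global reduction."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pipelined_cg(a, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned pipelined CG for a symmetric positive definite matrix.

    Returns (x, iterations). Assumes a zero initial guess.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)              # r = b - A x with x = 0
    w = matvec(a, r)         # w = A r
    p = [0.0] * n
    s = [0.0] * n            # s tracks A p
    z = [0.0] * n            # z tracks A s
    gamma_prev = alpha_prev = None

    for it in range(max_iter):
        # These two reductions do not depend on q ...
        gamma = dot(r, r)
        delta = dot(w, r)
        # ... and q does not depend on them, so they could overlap.
        q = matvec(a, w)

        if gamma ** 0.5 <= tol:
            return x, it

        if it == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_prev
            alpha = gamma / (delta - beta * gamma / alpha_prev)

        # Recurrences replace the explicit products A p and A s.
        z = [qi + beta * zi for qi, zi in zip(q, z)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]

        gamma_prev, alpha_prev = gamma, alpha

    return x, max_iter
```

Calling `pipelined_cg(a, b)` with a small symmetric positive definite matrix `a` (for example a 1D Laplacian stored as a list of rows) and a right-hand side `b` returns the approximate solution and the iteration count. The extra vectors `s`, `z`, and `w` are the cost of removing the dependency between the reductions and the matrix-vector product.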

About the Authors

A. V. Monakov
ISP RAS
Russian Federation


V. A. Platonov
ISP RAS
Russian Federation



For citations:


Monakov A.V., Platonov V.A. Optimizations for linear solvers in OpenFOAM for MPI + CUDA platform. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(3):91-102. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(3)-4



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)