Optimizations for linear solvers in OpenFOAM for MPI + CUDA platform

A. V. Monakov; V. A. Platonov

doi:10.15514/ISPRAS-2014-26(3)-4

Abstract

We describe an implementation of the conjugate gradient method for heterogeneous platforms (multiple nodes with GPU accelerators), intended for use in OpenFOAM, together with several optimizations. For the conjugate gradient method itself, we suggest keeping scalars that are used only on the GPU in device memory and placing scalars that take part in MPI reductions in pinned memory. For preconditioning, we choose AINV as a preconditioner well suited to GPUs and describe ways to make it more efficient: storing it in single precision, laying out the factors in upper-left triangular form, and computing it asynchronously on the CPU. We describe how multi-GPU computing can be supported together with arbitrary boundary conditions by copying only the boundary coefficients from the accelerator to host memory and then using the existing OpenFOAM methods on the CPU. To improve the overlap of computation and communication, we suggest using a pipelined variant of the conjugate gradient method and describe GPU-specific adjustments to it. In our experimental evaluation, we obtain a 1.75x speedup in the linear solver by using a Tesla K20X accelerator in addition to a 10-core Xeon CPU, but only for sufficiently large problems: below 1 million cells per accelerator, the efficiency of GPU computations diminishes.
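To illustrate the pipelined variant mentioned above, the following is a minimal single-process sketch in Python with NumPy. It assumes the Ghysels and Vanroose formulation of pipelined preconditioned conjugate gradient; it is not the authors' MPI + CUDA implementation. The function name `pipelined_pcg`, the Jacobi (diagonal) preconditioner standing in for AINV, and the small tridiagonal test matrix are all assumptions made for illustration. The point of the reordering is that all dot products of an iteration are computed together, so that in a distributed setting they could be combined into one non-blocking reduction that overlaps with the preconditioner application and the matrix-vector product.

```python
import numpy as np


def pipelined_pcg(A, b, apply_prec, tol=1e-10, max_iter=500):
    """Pipelined preconditioned CG for a symmetric positive definite A.

    apply_prec(v) returns M^{-1} v for the chosen preconditioner M.
    Returns the approximate solution and the number of iterations done.
    """
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    u = apply_prec(r)        # preconditioned residual
    w = A @ u
    z = q = s = p = np.zeros_like(b)
    gamma_old = alpha_old = 1.0   # placeholders, unused in iteration 0
    b_norm = np.linalg.norm(b)

    for i in range(max_iter):
        # All dot products of this iteration are grouped here. With MPI
        # they could form a single non-blocking global reduction.
        gamma = r @ u
        delta = w @ u
        rr = r @ r
        if np.sqrt(rr) <= tol * b_norm:
            return x, i

        # These two operations do not depend on the dot products, so in a
        # distributed run they could overlap with the reduction above.
        m = apply_prec(w)
        n = A @ m

        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)

        # Recurrences replace the extra matrix-vector products.
        z = n + beta * z
        q = m + beta * q
        s = w + beta * s
        p = u + beta * p
        x = x + alpha * p
        r = r - alpha * s
        u = u - alpha * q
        w = w - alpha * z
        gamma_old, alpha_old = gamma, alpha

    return x, max_iter


# Hypothetical test problem: a diagonally dominant tridiagonal SPD matrix.
size = 200
A = 4.0 * np.eye(size) - np.eye(size, k=1) - np.eye(size, k=-1)
b = np.ones(size)
diag = np.diag(A).copy()

# Jacobi preconditioner, used here only as a simple stand-in for AINV.
x, iters = pipelined_pcg(A, b, lambda v: v / diag)
```

Compared with standard conjugate gradient, this form keeps four extra vectors (z, q, s and w) and performs more vector updates per iteration in exchange for removing the synchronization point between the dot products and the matrix-vector product. The residual is updated by recurrence rather than recomputed, so a production solver may need to replace it periodically with the true residual b - Ax.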

Keywords

conjugate gradient method, AINV preconditioning, OpenFOAM, GPU, MPI

About the Authors

A. V. Monakov

ISP RAS
Russian Federation

V. A. Platonov

ISP RAS
Russian Federation

For citations:

Monakov A.V., Platonov V.A. Optimizations for linear solvers in OpenFOAM for MPI + CUDA platform. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(3):91-102. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(3)-4

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)
