Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Optimizing OpenFOAM GPU Solvers

Abstract

The paper presents preliminary research on improving the performance of CFD simulations in OpenFOAM by offloading part of the computation (specifically, the solution of linear systems) to a graphics accelerator (GPU). We give a short review of the OpenFOAM package and describe porting the conjugate gradient method to the GPU using the CUDA programming model. Porting the basic algorithm is straightforward; however, care must be taken to avoid unnecessary copying over the PCI Express bus. We then discuss efficient preconditioning on the GPU. We use approximate inverse (AINV) preconditioning, which can be implemented with good parallelism on the GPU. To amortize the cost of preparing the preconditioner, we allow preconditioners to be reused on the GPU and compute them asynchronously on the CPU in a helper thread. We mention several optimization opportunities: reordering the preconditioner to upper-left triangular form, so that CUDA blocks multiplying by the denser parts of the preconditioner factors are scheduled first; using single-precision storage for the preconditioner to save memory bandwidth; reordering the mesh with the nested dissection method from the Metis library; and using mixed-precision iteration in the conjugate gradient method. Preliminary performance tests of a non-parallel run show an improvement starting from 64,000-cell meshes and reaching 2x for a 1-million-cell mesh. As future work we mention support for parallel runs with MPI, investigation of other solvers such as multigrid, BiCGStab and IDR, and automatic selection of the drop tolerance for the AINV preconditioner.
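
To illustrate why an explicit approximate inverse suits the GPU, the following is a minimal CPU-side sketch in pure Python, not the paper's implementation. It runs preconditioned conjugate gradient with a preconditioner M ≈ A^-1 that is applied by a sparse matrix-vector product, so each iteration consists only of matrix-vector products, dot products and vector updates. Everything here is an assumption made for illustration: the function names (`spmv`, `pcg`, `poisson_1d`, `neumann_approx_inverse`), the 1D Poisson test matrix, and the use of a one-term truncated Neumann series in place of the AINV factorization named in the abstract.

```python
# Hypothetical sketch: PCG with an explicit sparse approximate inverse.
# A truncated Neumann series stands in for the paper's AINV preconditioner.

def spmv(rows, x):
    """Sparse matrix (one {column: value} dict per row) times vector x."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def poisson_1d(n):
    """Assumed test matrix: tridiag(-1, 2, -1), symmetric positive definite."""
    rows = []
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows

def neumann_approx_inverse(n):
    """M = D^-1 + D^-1 (D - A) D^-1 for the matrix above: tridiag(1/4, 1/2, 1/4)."""
    rows = []
    for i in range(n):
        row = {i: 0.5}
        if i > 0:
            row[i - 1] = 0.25
        if i < n - 1:
            row[i + 1] = 0.25
        rows.append(row)
    return rows

def pcg(A, b, M=None, tol=1e-10, max_iter=1000):
    """Solve A x = b; M, if given, is an explicit approximation of A^-1."""
    apply_m = (lambda r: spmv(M, r)) if M is not None else (lambda r: list(r))
    x = [0.0] * len(b)
    r = list(b)                      # residual for the initial guess x = 0
    z = apply_m(r)                   # preconditioning is just a mat-vec
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = spmv(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_m(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    n = 50
    A, b = poisson_1d(n), [1.0] * n
    _, plain_iters = pcg(A, b)
    _, prec_iters = pcg(A, b, neumann_approx_inverse(n))
    print("iterations without / with preconditioner:", plain_iters, prec_iters)
```

In a GPU port along the lines the abstract describes, `spmv`, `dot` and the vector updates would each map to a kernel, with the vectors kept resident in device memory between iterations so that little more than the scalars crosses the PCI Express bus.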

About the Author

Alexander Monakov
ISP RAS
Russian Federation


References

1. SGI, The OpenFOAM Foundation, http://openfoam.org/

2. The OpenFOAM Extend Project, http://www.extend-project.de/

3. M. Benzi, Preconditioning techniques for large linear systems: a survey, Journal of Computational Physics, 182 (2002), 418–477

4. Y. Saad, Iterative methods for sparse linear systems, SIAM, Philadelphia, 2003, 567

5. R. Bridson, W.-P. Tang, Refining an approximate inverse, Journal of Computational and Applied Mathematics, 123 (2000), Numerical Analysis 2000 vol. III: Linear Algebra, 293–306

6. S. Pissanetzky, Sparse Matrix Technology, Academic Press, Waltham, 1984, 312

7. G. Karypis, V. Kumar. METIS: Unstructured graph partitioning and sparse matrix ordering system, version 4.0, http://www.cs.umn.edu/~metis, 2009

8. D. Göddeke, R. Strzodka, S. Turek, Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations, International Journal of Parallel, Emergent and Distributed Systems (IJPEDS), Special issue: Applied parallel computing, 22 (2007), 221–256


For citations:


Monakov A. Optimizing OpenFOAM GPU Solvers. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2012;22. (In Russ.)



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)