The lattice Boltzmann method (LBM) benefits greatly from graphics processing unit (GPU) computing, which makes GPU-based and multi-GPU-based LBM a promising candidate for the study of large-scale fluid flows. However, multi-GPU lattice Boltzmann algorithms have not been studied extensively, especially for simulations of flow in complex geometries. In this paper, we couple the LBM with the message passing interface (MPI) to present a multi-GPU implementation for fluid flow through porous media, together with optimization strategies based on the data structure and layout that noticeably reduce memory access and completely hide the communication time. The performance of the algorithm is tested on a single-node cluster equipped with four Tesla C1060 GPU cards, where up to 1732 MFLUPS is achieved for Poiseuille flow and a nearly linear speedup with the number of GPUs is observed.
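The MFLUPS figure quoted above (million fluid lattice updates per second) is the usual throughput metric for LBM codes: the number of fluid nodes multiplied by the number of time steps, divided by the elapsed wall-clock time in microseconds. The following is a minimal sketch of how that metric and a multi-GPU parallel efficiency could be computed. The function names and all numeric inputs are hypothetical illustrations, not values or code from the paper.

```python
def mflups(fluid_nodes: int, time_steps: int, elapsed_seconds: float) -> float:
    """Million fluid lattice updates per second.

    Only fluid nodes are counted, so solid nodes of a porous medium
    do not inflate the reported throughput.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return fluid_nodes * time_steps / (elapsed_seconds * 1e6)


def parallel_efficiency(multi_gpu_mflups: float,
                        single_gpu_mflups: float,
                        num_gpus: int) -> float:
    """Ratio of measured multi-GPU throughput to ideal linear scaling."""
    if num_gpus <= 0 or single_gpu_mflups <= 0:
        raise ValueError("num_gpus and single_gpu_mflups must be positive")
    return multi_gpu_mflups / (num_gpus * single_gpu_mflups)


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 fluid nodes, 1000 steps, 2.0 s wall time.
    print(mflups(1_000_000, 1000, 2.0))
    # Hypothetical throughputs: 1900 MFLUPS on 4 GPUs vs 500 MFLUPS on 1 GPU.
    print(parallel_efficiency(1900.0, 500.0, 4))
```

An efficiency close to 1.0 corresponds to the nearly linear speedup described in the abstract; values well below 1.0 would suggest that communication is not fully overlapped with computation.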
ISSN: 2075-1354
DOI: https://doi.org/10.4208/aamm.2014.m468
URL: http://global-sci.org/intro/article_detail/aamm/10940.html