An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code

Volume 10, Issue 2

Puya Memarzia and Farshad Khunjush

J. Info. Comput. Sci., 10 (2015), pp. 124-136.

Abstract

In recent years, the rise of GPGPU as a viable solution for high-performance computing has brought fresh challenges for developers, chief among them efficiently harnessing the formidable power of the GPU and locating performance bottlenecks. Because many factors affect a GPU application’s performance, there is a need for performance comparison studies and for ways to analyze programs at a fundamental level. With that in mind, our goal is to present an in-depth performance comparison of the CUDA and OpenCL platforms and to study how PTX code can affect performance. To achieve this goal, we examine the subject from three angles: kernel execution times, data transfers between the host and the device, and the PTX code generated by each platform’s compiler. We carry out our experiments using ten real-world GPU kernels from the digital image processing domain, a range of input data sizes, and two GPUs based on the Nvidia Fermi and Kepler architectures. We show how PTX statistics and analysis can provide further insight into performance discrepancies and bottlenecks. Our results indicate that, in an unbiased comparison such as this one, the OpenCL and CUDA platforms deliver essentially similar performance.
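The abstract mentions using PTX statistics to explain performance differences. As one illustration of what such statistics can look like, the Python sketch below tallies the instruction mix of a PTX listing, keeping the state space (global, shared, local, param, const) for instructions that name one, so that, for example, global-memory loads and stores in the CUDA and OpenCL builds of the same kernel can be compared. This is a minimal sketch under stated assumptions, not the authors' tool: the parsing rules, the function name `ptx_instruction_mix`, and the hand-written PTX fragment are all hypothetical. For CUDA, PTX can be emitted with `nvcc -ptx`; NVIDIA's OpenCL implementation typically returns PTX text as the program binary from `clGetProgramInfo` with `CL_PROGRAM_BINARIES`.

```python
import re
from collections import Counter

# Hypothetical parsing rules for illustration only.
# Captures an optional predicate guard (e.g. "@%p1" or "@!%p1"), the base
# opcode, and its dotted modifiers, e.g. "ld.global.f32" -> ("ld", ".global.f32").
_INSTR = re.compile(r"^(?:@!?%\w+\s+)?([a-z]\w*)((?:\.\w+)*)")
_STATE_SPACES = {"global", "shared", "local", "param", "const"}


def ptx_instruction_mix(ptx_text: str) -> Counter:
    """Count PTX instructions by opcode, keeping the state space if one is named."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop line comments
        # Skip blanks, directives (.reg, .param, ...), labels, and braces.
        if not line or line.startswith(".") or line.endswith(":") or line in ("{", "}"):
            continue
        match = _INSTR.match(line)
        if match is None:
            continue  # e.g. the ")" that closes a kernel parameter list
        opcode = match.group(1)
        modifiers = match.group(2).split(".")[1:]
        space = next((m for m in modifiers if m in _STATE_SPACES), None)
        counts[f"{opcode}.{space}" if space else opcode] += 1
    return counts


# Abbreviated, hand-written PTX fragment (hypothetical, not real compiler output).
SAMPLE_PTX = """
.visible .entry scale(
    .param .u64 scale_param_0
)
{
    .reg .pred %p<2>;
    .reg .f32 %f<3>;
    .reg .b64 %rd<3>;
    ld.param.u64 %rd1, [scale_param_0];
    cvta.to.global.u64 %rd2, %rd1;
    ld.global.f32 %f1, [%rd2];   // load one element
    mul.f32 %f2, %f1, 0f40000000;
    st.global.f32 [%rd2], %f2;
    @%p1 bra $L__BB0_2;
$L__BB0_2:
    ret;
}
"""

if __name__ == "__main__":
    for op, n in sorted(ptx_instruction_mix(SAMPLE_PTX).items()):
        print(f"{op:12s} {n}")
```

Running the PTX that both platforms generate for the same kernel through a counter like this gives a quick, if coarse, view of where the generated code differs. It says nothing about how `ptxas` later optimizes the PTX into machine code (SASS), so such counts are best read as a hint rather than a performance prediction.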

Keywords

GPU, CUDA, OpenCL, PTX, Performance

@Article{JICS-10-124,
  author = {Memarzia, Puya and Khunjush, Farshad},
  title = {An In-depth Study on the Performance Impact of {CUDA}, {OpenCL}, and {PTX} Code},
  journal = {Journal of Information and Computing Science},
  year = {2024},
  volume = {10},
  number = {2},
  pages = {124--136},
  issn = {1746-7659},
  doi = {https://doi.org/},
  url = {http://global-sci.org/intro/article_detail/jics/22555.html}
}

TY  - JOUR
T1  - An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code
AU  - Memarzia, Puya
AU  - Khunjush, Farshad
JO  - Journal of Information and Computing Science
VL  - 10
IS  - 2
SP  - 124
EP  - 136
PY  - 2024
DA  - 2024/01
SN  - 1746-7659
DO  - http://doi.org/
UR  - https://global-sci.org/intro/article_detail/jics/22555.html
KW  - GPU
KW  - CUDA
KW  - OpenCL
KW  - PTX
KW  - Performance
ER  - 

Puya Memarzia and Farshad Khunjush. (2024). An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code. Journal of Information and Computing Science. 10 (2). 124-136. doi:
