Tuesday, 8 January 2019

GPU Programming with Python

In this article, we’ll dive into GPU programming with Python. Using the ease of Python, you can unlock the incredible computing power of your video card’s GPU (graphics processing unit). In this example, we’ll work with NVIDIA’s CUDA library.

Limitations and Benefits of GPU Programming

It’s tempting to think that we can convert any Python program into a GPU-based program, dramatically accelerating its performance. However, the GPU on a video card works considerably differently than a standard CPU in a computer.
CPUs handle a lot of different inputs and outputs and have a wide assortment of instructions for dealing with these situations. They also are responsible for accessing memory, dealing with the system bus, handling protection rings, segmenting, and input/output functionality. They are extreme multitaskers with no specific focus.
GPUs on the other hand are built to process simple functions with blindingly fast speed. To accomplish this, they expect a more uniform state of input and output. By specializing in scalar functions. A scalar function takes one or more inputs but returns only a single output. These values must be types pre-defined by numpy.

Open Notepad++ and copy give code and paste in Notepad++ and save file as gpu.py on Desktop
<<<<< code start >>>>>
import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# This should be a substantially high value. On my test machine, this took
# 33 seconds to run via the CPU and just over 3 seconds on the GPU.
NUM_ELEMENTS = 100000000

# This is the CPU version.
c = np.zeros(NUM_ELEMENTS, dtype=np.float32)
for i in range(NUM_ELEMENTS):
c[i] = a[i] + b[i]
return c

# This is the GPU version. Note the @vectorize decorator. This tells
# numba to turn this into a GPU vectorized function.
@vectorize(["float32(float32, float32)"], target='cuda')
return a + b;

def main():
a_source = np.ones(NUM_ELEMENTS, dtype=np.float32)
b_source = np.ones(NUM_ELEMENTS, dtype=np.float32)

# Time the CPU function
start = timer()

# Time the GPU function
start = timer()

# Report times
print("CPU function took %f seconds." % vector_add_cpu_time)
print("GPU function took %f seconds." % vector_add_gpu_time)

return 0

if __name__ == "__main__":

main()

<<<<< code end >>>>>

After installing Anaconda follow given step :
conda update conda
conda install numba
conda install cudatoolkit

cd Desktop
python gpu.py

NOTE: If you run into issues when running your program, try using “conda install accelerate”.
As you can see, the CPU version runs considerably slower.
If not, then your iterations are too small. Adjust the NUM_ELEMENTS to a larger value (on mine, the breakeven mark seemed to be around 100 million). This is because the setup of the GPU takes a small but noticeable amount of time, so to make the operation worth it, a higher workload is needed. Once you raise it above the threshold for your machine, you’ll notice substantial performance improvements of the GPU version over the CPU version.