What is CUDA and how to set it up?

Sumantha.NTS
6 min read · Apr 8, 2023


To fully grasp the significance of CUDA and its setup, it’s crucial to first understand the role GPUs play in deep learning. With the widespread adoption of GPUs for training in recent years, researchers have been able to train larger and more intricate models, leading to breakthroughs in fields like computer vision and natural language processing. This GPU-driven progress has also made applications such as self-driving cars and voice assistants possible.

In this blog post, we’ll delve into the history of GPUs in deep learning, analyze the pros and cons of utilizing them, and guide you through the process of installing and configuring CUDA, which enables access to GPUs for deep learning purposes.

GPU computation has become increasingly important in machine learning over the past few years, primarily due to the tremendous increase in the size and complexity of the datasets and models used in the field.

Evolution of GPU in Deep Learning

The use of GPUs (Graphics Processing Units) in deep learning has evolved significantly over the years, and here are the main stages of that evolution:

  1. CPUs only: In the early days of deep learning, CPUs (Central Processing Units) were used exclusively to train models. However, this was a slow and inefficient process due to the computational intensity of deep learning algorithms.
  2. Parallelization on CPUs: Researchers then turned to parallelizing computations on multiple CPUs to speed up the training process. This approach involved distributing the workload across multiple cores or machines, but it still wasn’t fast enough to meet the demands of large-scale deep learning models.
  3. Introduction of GPUs: GPUs were originally built to accelerate graphics rendering, but in the mid-2000s researchers realized that their ability to process large amounts of data in parallel also made them an ideal fit for deep learning.
  4. CUDA framework: In 2007, NVIDIA released CUDA, a programming framework that allowed developers to write code for GPUs. This framework made it easier for researchers to use GPUs in their deep learning workflows, and it paved the way for the development of more powerful GPUs.
  5. Specialized hardware: In recent years, several companies have developed specialized hardware for deep learning, such as Google’s TPU (Tensor Processing Unit) and NVIDIA’s Tensor Cores. These specialized chips are designed specifically for deep learning tasks, offering even faster training times and lower power consumption.

Advantages of GPUs in Deep learning

There are several advantages of using GPUs in deep learning:

  1. Speed: GPUs have thousands of cores that can perform multiple operations simultaneously. This parallel processing capability enables them to perform deep learning calculations much faster than traditional CPUs.
  2. Efficiency: GPUs are built for exactly the kinds of calculations deep learning requires, so they are more energy-efficient than CPUs for these workloads. This can result in significant cost savings, especially for large-scale deep learning projects.
  3. Scalability: GPUs can be easily scaled by adding more cards to a computer, making it possible to perform deep learning on large datasets in a reasonable amount of time.
  4. Flexibility: GPUs are compatible with a wide range of deep learning frameworks, making it easier for researchers and developers to experiment with different models and architectures.
  5. Accuracy: GPUs can help improve the accuracy of deep learning models by enabling researchers to train larger and more complex models that can capture more nuanced features in the data.

Limitations of GPUs in Deep Learning

While GPUs have been instrumental in accelerating deep learning, they also have several limitations. Here are some of the key limitations of GPUs in deep learning:

  1. Memory limitations: GPUs have limited memory compared to CPUs, which can be a significant bottleneck when training large models. To overcome this limitation, researchers often use techniques such as model parallelism or data parallelism, which distribute the model or data across multiple GPUs.
  2. Power consumption: GPUs consume a significant amount of power, which can lead to high electricity bills and environmental concerns. To address this limitation, researchers are exploring low-power alternatives, such as FPGAs (Field-Programmable Gate Arrays) and TPUs (Tensor Processing Units).
  3. Limited precision: GPUs are optimized for 32-bit (and lower) floating-point arithmetic, and their 64-bit performance is usually much lower, which can be limiting when training models that require higher precision, such as those used in scientific computing. To overcome this limitation, researchers have developed techniques such as mixed-precision training, which combines 16-bit and 32-bit floating-point numbers.
  4. Limited flexibility: GPUs are designed for specific types of operations, such as matrix multiplications, and may not be optimal for other types of computations. To address this limitation, researchers are exploring alternatives, such as TPUs, which are designed specifically for deep learning workloads.
  5. Cost: Individual GPUs are currently much more expensive than CPUs. Specialized large-scale GPU systems can reach costs of hundreds of thousands of dollars.

What is CUDA?

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface that allows software to use certain types of graphics processing units (GPUs) for general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). It was introduced by NVIDIA in 2006.

GPUs are commonly used in deep learning for training models due to their high parallel processing capabilities.

Deep learning models typically require vast amounts of data and complex calculations, making the training process computationally intensive. A traditional CPU (Central Processing Unit) can only handle a limited number of calculations at once, resulting in slower training times.

However, GPUs are designed to handle a large number of calculations in parallel, making them highly efficient for training deep learning models. This is because GPUs have thousands of cores, which can perform multiple operations simultaneously, allowing for faster processing times.

In addition, GPUs can also be used to accelerate the inference or prediction stage of a deep learning model. During inference, a trained model is used to make predictions on new data. GPUs can perform this task quickly and efficiently, making them useful for applications such as computer vision, natural language processing, and speech recognition.
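Before moving on to installation, it is worth confirming that the machine actually has an NVIDIA GPU for CUDA to target. A quick check, assuming the standard pciutils package (which provides lspci) is installed:

lspci | grep -i nvidia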

Steps to install CUDA in Ubuntu

  1. Update the package index on the server.

sudo apt update

2. Install the C/C++ compiler toolchain and a few common build utilities via the development package build-essential.

sudo apt-get install build-essential cmake unzip pkg-config

3. Install the GNU Compiler Collection (GCC). build-essential already pulls in gcc, so this step simply ensures it is present.

sudo apt-get install gcc
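To confirm the host compiler is available (nvcc compiles host code with gcc, and each CUDA release supports a specific range of GCC versions), you can check the installed version:

gcc --version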

4. Add the graphics drivers repository to the system.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
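Adding the PPA only makes newer driver packages available; it does not install a driver by itself. If you prefer to install the NVIDIA driver from the repository before running the CUDA installer (rather than letting the .run file install one), one possible approach uses Ubuntu’s driver tooling. The ubuntu-drivers utility comes from the ubuntu-drivers-common package, and the driver version it selects depends on your GPU:

ubuntu-drivers devices          # list detected GPUs and the recommended driver
sudo ubuntu-drivers autoinstall # install the recommended driver
sudo reboot                     # reboot so the new driver is loaded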

5. Get the CUDA installer (i.e., the .run file) from https://developer.nvidia.com/cuda-downloads

wget https://developer.download.nvidia.com/compute/cuda/<cuda version>/local_installers/cuda_<cuda version>_<driver version>_linux.run

6. Install the downloaded file with a shell command.

sudo sh cuda_<cuda version>_<driver version>_linux.run
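By default the .run installer is interactive and offers to install both the driver and the toolkit. If a driver is already installed (for example from the PPA above), you can deselect the driver component in the installer menu. The runfile also supports non-interactive flags such as --silent and --toolkit for a toolkit-only install, though the available options can vary between releases, so it is worth checking --help on the installer you downloaded:

sudo sh cuda_<cuda version>_<driver version>_linux.run --silent --toolkit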

7. Add the CUDA installation directory to your system path. Open the .bashrc file in the nano editor and update the path.

nano ~/.bashrc

Add the following lines at the end of the .bashrc file.

# NVIDIA CUDA TOOLKIT
export PATH=/usr/local/cuda-<cuda version>/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-<cuda version>/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Press Ctrl + x, y and Enter to save changes. Next, run the source command to update your current shell environment.

source ~/.bashrc
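To verify that the new entries are picked up, you can check where nvcc now resolves from and inspect the library path:

which nvcc
echo $LD_LIBRARY_PATH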

Check CUDA version

  1. Check the CUDA toolkit version using NVIDIA’s CUDA compiler (nvcc). NVCC is based on the widely used LLVM open-source compiler infrastructure, and developers can create or extend programming languages with support for GPU acceleration using the NVIDIA Compiler SDK.

nvcc -V

2. NVSMI, which stands for NVIDIA System Management Interface, offers monitoring and management functionality for all NVIDIA devices from the Fermi and newer architecture families, including Tesla, Quadro, and GRID. While the information available for GeForce devices is relatively limited, you can still use NVSMI to verify the driver version installed on your system.

nvidia-smi

Debugging Installation

Check the CUDA driver and toolkit version

cat /usr/local/cuda/version.json
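Beyond checking version numbers, a small compile-and-run smoke test confirms that the toolkit and driver actually work together. A minimal sketch follows; it writes a throwaway CUDA source file to /tmp (the file names here are arbitrary) and builds it with nvcc:

cat > /tmp/hello.cu <<'EOF'
#include <cstdio>

// Trivial kernel: each GPU thread prints its index.
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();        // launch 1 block of 4 threads
    cudaDeviceSynchronize();  // wait for the kernel (and its printf output) to finish
    return 0;
}
EOF
nvcc /tmp/hello.cu -o /tmp/hello && /tmp/hello

If the installation is healthy, this prints one line per GPU thread; if nvcc or the driver is misconfigured, the failure message here is usually more informative than the version commands above.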

Uninstall CUDA

Remove CUDA Toolkit

sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"

Remove NVIDIA Drivers

sudo apt-get --purge remove "*nvidia*"

Clean up the uninstall

sudo apt-get autoremove
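Note that the apt-based removal above only targets packages installed through apt. If the toolkit was installed from the .run file as described earlier, it lives under /usr/local and ships its own uninstaller; a sketch, with the exact directory name depending on the version you installed:

sudo /usr/local/cuda-<cuda version>/bin/cuda-uninstaller  # uninstaller shipped with runfile installs (CUDA 10.1 and later)
sudo rm -rf /usr/local/cuda-<cuda version>                # remove any remaining files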

Additional details

  1. CUDA has two primary APIs: the runtime API and the driver API.
  2. The necessary support for the driver API (e.g. libcuda.so on Linux) is installed by the GPU driver installer.
  3. The necessary support for the runtime API (e.g. libcudart.so on Linux, and also nvcc) is installed by the CUDA toolkit installer.
  4. The nvidia-smi tool gets installed by the GPU driver installer, and generally has the GPU driver in view, not anything installed by the CUDA toolkit installer.
  5. nvcc, the CUDA compiler-driver tool that is installed with the CUDA toolkit, will always report the CUDA runtime version that it was built to recognize. It doesn’t know anything about what driver version is installed, or even whether a GPU driver is installed.
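To see the two layers side by side, you can query the driver-reported version with nvidia-smi and the toolkit version with nvcc. The two numbers can legitimately differ; the installed driver just needs to be new enough for the toolkit you are using:

nvidia-smi --query-gpu=driver_version --format=csv,noheader  # driver version (driver API side)
nvcc --version | grep release                                # toolkit version that nvcc was built for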

References:

  1. https://developer.nvidia.com/cuda-downloads
