These are my notes on the High Performance Machine Learning course.

Introduction to Benchmarking

Coding Exercise

We implement microbenchmarks that measure execution time, memory bandwidth, and compute throughput (FLOPS). The results show how efficiently a system moves data and performs arithmetic.
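As a rough illustration of what such microbenchmarks can look like, here is a minimal sketch in pure Python. The function names (`best_time`, `copy_bandwidth_gbs`, `dot_flops`) and the buffer sizes are my own assumptions, not the course's code. Interpreter overhead dominates the dot product, so the FLOP/s figure reflects Python itself rather than the hardware's peak.

```python
import time

def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def copy_bandwidth_gbs(n_bytes=16 * 1024 * 1024, repeats=5):
    """Estimate memory bandwidth in GB/s from a buffer-to-buffer copy."""
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)

    def copy():
        dst[:] = src  # bulk copy: reads n_bytes and writes n_bytes

    seconds = best_time(copy, repeats)
    return 2 * n_bytes / seconds / 1e9

def dot_flops(n=200_000, repeats=3):
    """Estimate FLOP/s from a dot product (n multiplies plus n adds)."""
    x = [1.0] * n
    y = [2.0] * n
    results = []

    def dot():
        total = 0.0
        for a, b in zip(x, y):
            total += a * b
        results.append(total)

    seconds = best_time(dot, repeats)
    return 2 * n / seconds, results[-1]

if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
    flops, _ = dot_flops()
    print(f"dot product:    {flops / 1e6:.1f} MFLOP/s")
```

Taking the best of several runs, rather than the mean, is one common way to reduce noise from other processes and cold caches.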

Code

Introduction to Profiling ML Models

Coding Exercise

We implement a ResNet-18 model and train it on the CIFAR-10 dataset under several training configurations. We then profile the runs to see how the number of data loader workers, the choice of optimizer, and the batch norm layers affect performance.
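One way to see the effect of the data loader is to split each epoch into time spent waiting for the next batch and time spent in the training step. The sketch below is a hypothetical helper (`profile_epoch` and `train_step` are names I made up) that works with any iterable, including a PyTorch `DataLoader`. It is not the course's profiling code.

```python
import time

def profile_epoch(loader, train_step):
    """Split one pass over `loader` into data-wait time and compute time."""
    data_s = 0.0
    compute_s = 0.0
    batches = 0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading / waiting
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)     # forward, backward, optimizer step
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
        batches += 1
    return {"batches": batches, "data_s": data_s, "compute_s": compute_s}
```

If `data_s` shrinks as the worker count grows while `compute_s` stays flat, the run was input-bound. When training on a GPU, `train_step` would need to synchronize (for example with `torch.cuda.synchronize()`) before returning, because CUDA kernels launch asynchronously and the compute time would otherwise be under-counted.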

Code

Introduction to Model Tuning

Coding Exercise

We build a chatbot and train it with hyperparameters found by a Weights & Biases (W&B) sweep. We profile the model with PyTorch Profiler to locate performance bottlenecks, and we create an optimized TorchScript version for efficient deployment.
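For reference, a W&B sweep is described by a configuration with a search `method`, a `metric` to optimize, and a `parameters` section. The sketch below shows that shape. The hyperparameter names and ranges are illustrative assumptions, not the values used in the exercise, and `sample_config` is only a local stand-in for a random draw so the block runs without `wandb` installed.

```python
import random

# Hypothetical sweep configuration; names and ranges are placeholders.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
        "hidden_size": {"values": [256, 512]},
    },
}

def sample_config(config, rng):
    """Local stand-in for one random-search draw (not W&B's implementation)."""
    sample = {}
    for name, spec in config["parameters"].items():
        if "values" in spec:
            sample[name] = rng.choice(spec["values"])
        else:
            sample[name] = rng.uniform(spec["min"], spec["max"])
    return sample

# With wandb installed, the sweep would be registered and run roughly like:
#   sweep_id = wandb.sweep(sweep_config, project="my-project")  # project name is a placeholder
#   wandb.agent(sweep_id, function=train, count=10)             # `train` is your training function

if __name__ == "__main__":
    print(sample_config(sweep_config, random.Random(0)))
```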

Code

Introduction to CUDA Programming

Coding Exercise

We implement CUDA kernels for matrix multiplication and convolution, and we experiment with Unified Memory. We then benchmark GPU performance by measuring execution time, memory throughput, and computational efficiency.
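The arithmetic behind those metrics is simple enough to sketch. Assuming an M×K by K×N matrix multiplication on `float32` data, the helper below (a hypothetical `matmul_metrics`, not the course's code) converts a measured kernel time into GFLOP/s, effective bandwidth, and arithmetic intensity. The byte count assumes each matrix is touched exactly once, which is a lower bound. A naive kernel moves considerably more data.

```python
def matmul_metrics(m, n, k, seconds, bytes_per_element=4, peak_gflops=None):
    """Derive throughput metrics for C[m,n] = A[m,k] @ B[k,n] from a timing."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    # Minimum traffic: read A and B once, write C once.
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    metrics = {
        "gflops": flops / seconds / 1e9,
        "gbs": bytes_moved / seconds / 1e9,
        "intensity": flops / bytes_moved,  # FLOPs per byte
    }
    if peak_gflops is not None:
        # Fraction of the device's peak compute rate actually achieved.
        metrics["fraction_of_peak"] = metrics["gflops"] / peak_gflops
    return metrics

if __name__ == "__main__":
    # Example: a 1024 x 1024 x 1024 multiply measured at 10 ms.
    print(matmul_metrics(1024, 1024, 1024, 0.01))
```

The `seconds` value would come from the GPU timing itself, for example from CUDA events around the kernel launch.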

Code