Course Notes: High Performance Machine Learning
These are my notes for the High Performance Machine Learning course.
Introduction to Benchmarking
Coding Exercise
We implement microbenchmarks that measure execution time, memory bandwidth, and floating-point throughput (FLOPS), giving insight into system efficiency and computational throughput.
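As a concrete illustration, the sketch below times a dense matrix multiply and a large array copy to estimate FLOPS and memory bandwidth. The NumPy implementation, problem sizes, and iteration counts are my own assumptions for illustration, not the exercise's exact code.

```python
# A minimal microbenchmark sketch (assumed sizes and iteration counts).
import time
import numpy as np

def time_it(fn, iters=10):
    """Run fn several times and return the average wall-clock time in seconds."""
    fn()  # warm-up run
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Compute benchmark: an NxN matmul performs roughly 2*N^3 floating-point ops.
N = 1024
a = np.random.rand(N, N).astype(np.float32)
b = np.random.rand(N, N).astype(np.float32)
t = time_it(lambda: a @ b)
print(f"matmul: {t * 1e3:.2f} ms, {2 * N**3 / t / 1e9:.1f} GFLOP/s")

# Memory benchmark: a copy reads and writes the buffer once each (~2x its size).
x = np.random.rand(64 * 1024 * 1024).astype(np.float32)  # 256 MiB buffer
t = time_it(lambda: x.copy())
print(f"copy: {t * 1e3:.2f} ms, {2 * x.nbytes / t / 1e9:.1f} GB/s")
```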
Introduction to Profiling ML Models
Coding Exercise
We implement a ResNet-18 model and train it on the CIFAR-10 dataset under various training configurations. We profile performance, analyzing the impact of the number of data-loader workers, the choice of optimizer, and batch normalization layers.
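The sketch below shows the shape of one such experiment: timing a fixed number of ResNet-18 training steps on CIFAR-10 while varying the number of DataLoader workers. The batch size, learning rate, and step count are assumptions for illustration, not the assignment's exact settings.

```python
# A minimal sketch of a data-loader worker-count experiment (assumed hyperparameters).
import time
import torch
import torchvision
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
transform = transforms.Compose([transforms.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)

for num_workers in (0, 2, 4):
    loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                         shuffle=True, num_workers=num_workers)
    model = torchvision.models.resnet18(num_classes=10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    start = time.perf_counter()
    for step, (images, labels) in enumerate(loader):
        if step == 50:  # time a fixed number of steps rather than a full epoch
            break
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"num_workers={num_workers}: "
          f"{time.perf_counter() - start:.1f} s for 50 steps")
```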
Introduction to Model Tuning
Coding Exercise
We build a chatbot trained with hyperparameters obtained from a Weights & Biases (W&B) sweep. We profile the model using the PyTorch Profiler to analyze performance bottlenecks, and we create an optimized TorchScript version for efficient deployment.
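The sketch below illustrates only the profiling and TorchScript-export steps, using a small stand-in model; the actual chatbot architecture and the W&B sweep configuration are omitted, and the layer sizes are placeholders.

```python
# A minimal sketch of PyTorch Profiler usage and TorchScript export
# (stand-in model; sizes are placeholders, not the chatbot's architecture).
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(          # stand-in for the chatbot model
    torch.nn.Embedding(10_000, 256),
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10_000),
).eval()
tokens = torch.randint(0, 10_000, (1, 32))

# Profile a forward pass to surface the dominant operators.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        model(tokens)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))

# Export an optimized TorchScript version for deployment.
scripted = torch.jit.trace(model, tokens)
scripted = torch.jit.freeze(scripted)
scripted.save("chatbot_scripted.pt")
```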
Introduction to CUDA Programming
Coding Exercise
We implement CUDA kernels for matrix multiplication and convolution, and experiment with Unified Memory. We then benchmark GPU performance, measuring execution time, memory throughput, and computational efficiency.
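The kernels themselves are written in CUDA C++; the sketch below only illustrates the benchmarking side from Python, timing a GPU matrix multiply with CUDA events and deriving FLOP and memory-throughput figures. The matrix size and the use of PyTorch's built-in matmul are assumptions for illustration.

```python
# A minimal sketch of GPU timing with CUDA events (assumed size; uses
# PyTorch's matmul as a stand-in for the hand-written CUDA kernel).
import torch

assert torch.cuda.is_available()
N = 4096
a = torch.randn(N, N, device="cuda")
b = torch.randn(N, N, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
torch.matmul(a, b)            # warm-up
torch.cuda.synchronize()

start.record()
c = torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)                   # elapsed time in milliseconds
flops = 2 * N**3                               # multiply-adds in an NxN matmul
bytes_moved = 3 * N * N * a.element_size()     # read a and b, write c (ideal traffic)
print(f"{ms:.2f} ms, "
      f"{flops / (ms * 1e-3) / 1e12:.1f} TFLOP/s, "
      f"{bytes_moved / (ms * 1e-3) / 1e9:.1f} GB/s")
```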