The Problems With “Big”

In this talk I will discuss the challenges we at Nvidia have experienced running at large scale. I will cover issues with deploying training on large-scale systems, including how we think about I/O and experiment tracking. Most of the talk will focus on the numerical issues we see when running at large scale, and on the techniques and algorithms we use and have developed for large-batch training and for changing how we extract alternate forms of parallelism.

Location: Cumberland Amphitheatre
Date: August 28, 2019
Time: 2:15 pm - 2:45 pm