Archives

Enhancing Driver Awareness at the Edge

With calls, texts, e-mails, and even mobile games competing for attention, distracted pedestrians and drivers are prevalent on roadways today. The National Highway Traffic Safety Administration’s National Center for Statistics and Analysis published a report of statistical analysis related to distracted driving. The report stated that about 10% of vehicle crashes, as well as 10% of fatalities reported […]

Smart Infrastructure, Smart Science

At the height of its success, Cloud Computing is yielding to Edge Computing. Why is this happening? What is the unique value proposition of Edge Computing? As real-world deployments of Edge Computing appear, how will the lives of end users be improved? What new applications and capabilities will they see? Which are the applications that […]

Edgascale Computing? Why Exascale Needs an Edge

Supercomputing used to be simpler. Input files for extreme-scale jobs evolved slowly, and filesystems were sluggish. Those days are gone. Today’s extreme-scale platforms are rushing toward GPUs and heterogeneous architectures, reduced-precision arithmetic, and data-driven machine learning. Furthermore, instead of a handful of large computing centers linked with high-speed networking, we have nearly ubiquitous fast networking; […]

Beyond Moore: An Arm vision for edge to post-Exascale computing

With the end of lithography scaling as a key driver for technology improvement and the end of Dennard scaling, improvements in application performance will increasingly come from customized memory hierarchies and accelerator devices. No matter what form the accelerators take, where they are located (edge or on-chip), or the composition of the memory hierarchies, the […]

Spin: Deploying Edge Services with Docker at NERSC

Scientific research projects now routinely depend on science gateways, workflow managers, databases, API endpoints, and other network services that exist on the “edge” of conventional HPC systems, leveraging the resources inside. Building these services and providing the operational infrastructure and support to make them reliable enough for modern workflows presents challenges for both research teams […]

The Problems With “Big”

In this talk I will discuss the challenges we at Nvidia have experienced with running at large scale. I will talk about issues with deploying training on large-scale systems, including how we think about IO and experiment tracking. For most of the talk I will focus on the numerical issues we see running at […]

Challenges Moving Toward High Performance Machine Learning

Machine learning has in recent years gone from being data-intensive to being (also) computation-intensive. This means that performance is increasingly important. However, many of the use cases and demands of machine learning are quite different than those of traditional scientific computing, and understanding these differences is important for understanding how ideas from scientific computing and […]

Local Distributed SGD: Communication, Convergence and Residual Error

Stochastic Gradient Descent (SGD) is the workhorse of modern machine learning, and is commonly used to train several classifiers, including deep neural networks. This talk will focus on a common variant of SGD implemented in distributed systems known as local, distributed SGD, where the model is updated locally at different distributed processing components […]