Deep Learning algorithms have enjoyed tremendous success in domains such as Computer Vision and Natural Language Processing. The primary reasons for this success are the availability of large labeled datasets (big data) and the compute capacity provided by large-scale HPC systems (big compute). Yet, there is a widening gap between the requirements of the data scientist, who executes the Deep Learning algorithms, and the capabilities provided by these large-scale systems. In this talk, I will present the disparity between the time-to-solution metric (the primary metric of interest to the data scientist) and the compute-efficiency metric (the primary metric of interest to the large-scale system designer). I will provide an overview of solutions that reduce this gap by leveraging the properties of large labeled datasets, and conclude with the need to incorporate these solutions across the systems software associated with Deep Learning algorithms.