If you want to build a high-quality machine learning product, build a large, high-quality training set. At first glance, this seems about as useful as the advice "if you want to be rich, get a lot of money." However, a key idea driving our work is that new theoretical and systems concepts, including weak supervision, automatic data augmentation policies, and more, can enable engineers to build training sets more quickly and cost-effectively. Along with state-of-the-art results on benchmarks, these concepts have allowed our group and collaborators to build a range of state-of-the-art applications, including patient-care monitoring on electronic health records, automatic triage systems for radiologists, and tools that help cardiologists spot rare abnormalities in video MRI, as well as widely used products from Apple and Google.

This talk describes the theoretical and systems challenges that such applications create. On the machine-learning theory side, a key problem is estimating the quality and correlation of various sources of training data, but without ground-truth labels. This problem connects to classical questions about estimating the covariance of latent variable models. We describe our new techniques that solve this problem and can even improve fully supervised methods for estimating the structure of graphical models. On the machine-learning systems side, this theory opens up new ways to build machine-learning systems. Here, we describe our recent work on systems that help engineers build and maintain machine learning products without writing low-level code in frameworks like TensorFlow. These systems draw on recent ideas in machine learning, e.g., zero-code deep learning systems, and twists on classical data management ideas, e.g., schemas that separate the model, the supervision, and downstream serving code. Much of this work is open source and available at http://snorkel.org or my website.
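To make the "estimating source quality without ground truth" idea concrete, here is a minimal sketch of one classical route: for a latent binary label y ∈ {−1, +1} and conditionally independent sources, the agreement moment E[λᵢλⱼ] factors as aᵢaⱼ, where aᵢ = E[λᵢ y], so each source's accuracy can be recovered from pairwise agreement rates alone. This is an illustrative reconstruction under strong assumptions (binary labels, no abstains, conditional independence), not the exact algorithm from the talk; all names and parameters are invented for the example.

```python
# Sketch: recover a source's accuracy from agreement rates only,
# with no access to the ground-truth labels Y at estimation time.
# Assumptions (hypothetical, for illustration): y in {-1,+1}, three
# conditionally independent sources, no abstains.
import math
import random

random.seed(0)
n = 200_000
acc = [0.9, 0.8, 0.7]  # true P(source_i agrees with y); hidden from the estimator

# Simulate: latent label y, then three noisy views of it.
Y = [random.choice((-1, 1)) for _ in range(n)]

def flip(y, p):
    """Report y with probability p, otherwise -y."""
    return y if random.random() < p else -y

L = [[flip(y, p) for p in acc] for y in Y]  # label matrix, n x 3

def M(i, j):
    """Empirical agreement moment E[lam_i * lam_j] (no labels needed)."""
    return sum(row[i] * row[j] for row in L) / n

# Triplet identity: E[lam_i lam_j] = a_i * a_j with a_i = E[lam_i * y],
# so a_0 = sqrt(M01 * M02 / M12), using only the sources' agreements.
a0 = math.sqrt(M(0, 1) * M(0, 2) / M(1, 2))
est_acc0 = (a0 + 1) / 2  # convert a_0 = 2*acc_0 - 1 back to an accuracy
```

With the true accuracies above, a₀ = 2(0.9) − 1 = 0.8, and the estimate `est_acc0` lands near 0.9 despite never touching `Y`. Estimating such parameters when sources are *correlated* is exactly where the covariance-of-latent-variable-models question in the abstract comes in.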
Biography: Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University. He is a member of the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work seeks to understand how software and hardware systems will change as a result of machine learning, along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with products from technology and enterprise companies. He cofounded a company, based on his research into machine learning systems, that was acquired by Apple in 2017. More recently, he cofounded SambaNova Systems based, in part, on his work on accelerating machine learning. He received the SIGMOD Dissertation Award in 2010, an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, the MacArthur Foundation Fellowship in 2015, and an Okawa Research Grant in 2016. His research contributions have spanned database theory, database systems, and machine learning, and his work has won best-paper awards at a premier venue in each area, respectively: PODS 2012, SIGMOD 2014, and ICML 2016.