Jones Seminar: Computer Vision with Big Weakly-Labeled Data

Lorenzo Torresani, Associate Professor of Computer Science, Dartmouth

Friday, May 8, 2015, 3:30–4:30pm

Spanos Auditorium, Cummings Hall

This seminar is part of the Jones Seminars on Science, Technology, and Society series.

Most modern computer vision methods employ a strongly-supervised learning paradigm that requires training on massive collections of richly-labeled images. These rich labels are provided by either human annotators or auxiliary sensors. For example, in order to build a model that automatically localizes and recognizes cars in pictures, it is necessary to train it on thousands of image examples containing manually selected regions specifying the locations of cars in each photo. Similarly, scene depth estimation techniques are customarily trained on large datasets of RGBD images, i.e., photos obtained with depth sensors that capture not only the color information (the RGB channels) but also the depth of the scene (the D channel). Unfortunately, the reliance on time-consuming human labeling or sensory data collection greatly limits the applicability of these methods to new settings or novel domains.

In this talk I will discuss the idea of eliminating or reducing the need for rich labels by leveraging existing large repositories of weakly-labeled images, i.e., photos annotated only with class labels indicating which objects are present but not their location. First, I will discuss self-taught object localization, a method that learns to localize objects without strong human supervision, i.e., without region annotations. Then, I will present an algorithm for depth estimation that achieves accuracy superior to the state of the art while using two orders of magnitude less RGBD training data.

About the Speaker

Lorenzo Torresani is an Associate Professor in the Computer Science Department at Dartmouth. He received a Laurea Degree in Computer Science with summa cum laude honors from the University of Milan (Italy) in 1996, and an M.S. and a Ph.D. in Computer Science from Stanford University in 2001 and 2005, respectively. In the past, he has worked at several industrial research labs, including Microsoft Research Cambridge and Digital Persona. His research interests are in computer vision and machine learning. He is the recipient of several awards, including a CVPR best paper prize, a National Science Foundation CAREER Award, and a Google Faculty Research Award.

For more information, contact Haley Tucker at