ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge

CVPR 2024 Workshop, Seattle, USA

June 18, 2024, Morning



Introduction

Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. The potential applications are broad: augmented and virtual reality, computational photography, interior design, and autonomous mobile robots all require a deep understanding of 3D scenes. We propose the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding, where very complete, high-fidelity ground truth scene data is available. This is enabled by the new ScanNet++ dataset, which offers 1mm resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. In particular, existing view synthesis benchmarks typically rely on data captured from a single continuous trajectory, which makes it impossible to evaluate novel views outside the original capture trajectory. In contrast, our novel view synthesis challenge uses test images captured intentionally outside of the train image trajectory, allowing comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.


Schedule

Welcome and Introduction 8:50am - 9:00am
Invited Talk 1 9:00am - 9:30am
Invited Talk 2 9:30am - 10:00am
Winner Talks 10:00am - 10:30am
Invited Talk 3 10:30am - 11:00am
Invited Talk 4 11:00am - 11:30am
Invited Talk 5 11:30am - 12:00pm
Panel Discussion and Conclusion 12:00pm - 12:30pm


Invited Speakers

Ben Mildenhall works on problems in graphics and 3D computer vision. From 2021 to 2023, he was a research scientist at Google Research. He received his PhD from UC Berkeley in 2020, where he was advised by Ren Ng and supported by a Hertz fellowship. In the summer of 2017, Ben was an intern in Marc Levoy's group in Google Research. In the summer of 2018, he worked with Rodrigo Ortiz-Cayon and Abhishek Kar at Fyusion. He did his undergrad at Stanford University and worked at Pixar Research in the summer of 2014.

Lingjie Liu is the Aravind K. Joshi Assistant Professor in the Department of Computer and Information Science at the University of Pennsylvania, where she leads the Penn Computer Graphics Lab and is also a member of the General Robotics, Automation, Sensing & Perception (GRASP) Lab. Previously, she was a Lise Meitner Postdoctoral Research Fellow at Max Planck Institute for Informatics. She received her Ph.D. degree at the University of Hong Kong in 2019. Her research interests are at the interface of Computer Graphics, Computer Vision, and AI, with a focus on Neural Scene Representations, Neural Rendering, Human Performance Modeling and Capture, and 3D Reconstruction. She is especially excited about exploring a new genre of 3D reconstruction and rendering algorithms for human characters and general scenes, which combine classical computer graphics pipelines with deep learning techniques.

Jon Barron is a senior staff research scientist at Google Research in San Francisco, where he works on computer vision and machine learning. He received a PhD in Computer Science from the University of California, Berkeley in 2013, where he was advised by Jitendra Malik, and he received an Honours BSc in Computer Science from the University of Toronto in 2007. He received a National Science Foundation Graduate Research Fellowship in 2009, the C.V. Ramamoorthy Distinguished Research Award in 2013, and the PAMI Young Researcher Award in 2020. His works have received awards at ECCV 2016, TPAMI 2016, ECCV 2020, ICCV 2021, CVPR 2022, the 2022 Communications of the ACM, and ICLR 2023.

Vincent Lepetit is a professor at ENPC ParisTech, France. Prior to this position, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology (TU Graz), Austria, and before that, a senior researcher at CVLab, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. His current research focuses on 3D scene understanding, especially on reducing the supervision a system needs to learn new 3D objects and new 3D environments. In 2020, he and his colleagues received the Koenderink “test-of-time” award for “Brief: Binary Robust Independent Elementary Features”. He often serves as an area chair of major computer vision conferences (CVPR, ICCV, ECCV) and as an editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). In 2023, he was awarded an ERC Advanced Grant for the 'explorer' project on creating digital twins of large-scale sites.

Federico Tombari is a Senior Staff Research Scientist and Manager at Google, where he leads an applied research team in computer vision and machine learning across North America and Europe. He is also a Lecturer (Privatdozent) at the Technical University of Munich (TUM). He has 250+ peer-reviewed publications in CV/ML with a particular focus on 3D object/scene reconstruction and understanding. He received his PhD from the University of Bologna and his Venia Legendi (Habilitation) from the Technical University of Munich (TUM). In 2018 he was co-founder and managing director of a startup on 3D perception for AR and robotics, later acquired by Google. He regularly serves as Area Chair and Associate Editor for international conferences and journals (IJRR, RA-L, IROS20/21/22, ICRA20/22, 3DV19/20/21/22/24, ECCV22/24, CVPR23/24, NeurIPS23 among others). He was the recipient of two Google Faculty Research Awards, one Amazon Research Award, and 5 Outstanding Reviewer Awards (3xCVPR, ICCV21, NeurIPS21), among others. He has been a research partner of private and academic institutions including Google, Toyota, BMW, Audi, Amazon, Stanford University, ETH and MIT.

Organizers

Angela Dai
Technical University of Munich
Yueh-Cheng Liu
Technical University of Munich
Chandan Yeshwanth
Technical University of Munich
Matthias Niessner
Technical University of Munich