ScanNet++ CVPR 2024 Workshop

Introduction

Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. The potential applications are broad: augmented and virtual reality, computational photography, interior design, and autonomous mobile robots all require a deep understanding of 3D scene spaces. We propose to offer the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding, where very complete, high-fidelity ground truth scene data is available. This is enabled through the new ScanNet++ dataset, which offers 1mm resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. In particular, existing view synthesis benchmarks leverage data captured from a single continuous trajectory, which makes it impossible to evaluate novel views outside of the original capture trajectory. In contrast, our novel view synthesis challenge leverages test images captured intentionally outside of the train image trajectory, allowing for comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.

Schedule

Welcome and Introduction	8:50am - 9:00am
Invited Talk: Federico Tombari, Neural Representations for 3D Scene Reconstruction and Understanding	9:00am - 9:30am
Invited Talk: Jon Barron, Radiance Fields	9:30am - 10:00am
Winner Talks: Point Transformer V3 with Point Prompt Training; Robust Point-based Graphics; Feature Splatting for Better Novel View Synthesis with Low Overlap	10:00am - 10:30am
Invited Talk: Vincent Lepetit, Discrete+Continuous Optimization for Self-Supervised 3D Scene Understanding	10:30am - 11:00am
Invited Talk: Lingjie Liu, Segmentation and Pixel Tracking for Non-Static Scene Reconstruction	11:00am - 11:30am
Invited Talk: Ben Mildenhall	11:30am - 12:00pm
Panel Discussion and Conclusion	12:00pm - 12:30pm

Invited Speakers

Ben Mildenhall works on problems at the intersection of graphics and computer vision, specializing in view synthesis and inverse rendering. From 2021 to 2023, he was a research scientist at Google Research. He completed his PhD in computer science at UC Berkeley in 2020, advised by Ren Ng and supported by a Hertz Fellowship, and received the ACM Doctoral Dissertation Award Honorable Mention and the David J. Sakrison Memorial Prize for his thesis work on neural radiance fields. He has received paper awards at ECCV 2020, ICCV 2021, CVPR 2022, and ICLR 2023.

Lingjie Liu is the Aravind K. Joshi Assistant Professor in the Department of Computer and Information Science at the University of Pennsylvania, where she leads the Penn Computer Graphics Lab and is also a member of the General Robotics, Automation, Sensing & Perception (GRASP) Lab. Previously, she was a Lise Meitner Postdoctoral Research Fellow at Max Planck Institute for Informatics. She received her Ph.D. degree at the University of Hong Kong in 2019. Her research interests are at the interface of Computer Graphics, Computer Vision, and AI, with a focus on Neural Scene Representations, Neural Rendering, Human Performance Modeling and Capture, and 3D Reconstruction. She is especially excited about exploring a new genre of 3D reconstruction and rendering algorithms for human characters and general scenes, which combine classical computer graphics pipelines with deep learning techniques.

Jon Barron is a senior staff research scientist at Google Research in San Francisco, where he works on computer vision and machine learning. He received a PhD in Computer Science from the University of California, Berkeley in 2013, where he was advised by Jitendra Malik, and he received an Honours BSc in Computer Science from the University of Toronto in 2007. He received a National Science Foundation Graduate Research Fellowship in 2009, the C.V. Ramamoorthy Distinguished Research Award in 2013, and the PAMI Young Researcher Award in 2020. His works have received awards at ECCV 2016, TPAMI 2016, ECCV 2020, ICCV 2021, CVPR 2022, the 2022 Communications of the ACM, and ICLR 2023.

Vincent Lepetit is a professor at ENPC ParisTech, France. Prior to this position, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology (TU Graz), Austria, and before that, a senior researcher at CVLab, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. His current research focuses on 3D scene understanding, especially on reducing the supervision a system needs to learn new 3D objects and new 3D environments. In 2020, he and his colleagues received the Koenderink “test-of-time” award for “BRIEF: Binary Robust Independent Elementary Features”. He often serves as an area chair for major computer vision conferences (CVPR, ICCV, ECCV) and as an editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). In 2023, he was awarded an ERC Advanced Grant for the 'explorer' project on creating digital twins of large-scale sites.

Federico Tombari is a Senior Staff Research Scientist and Manager at Google, where he leads an applied research team in computer vision and machine learning across North America and Europe. He is also a Lecturer (PrivatDozent) at the Technical University of Munich (TUM). He has 250+ peer-reviewed publications in CV/ML with a particular focus on 3D object/scene reconstruction and understanding. He received his PhD from the University of Bologna and his Venia Legendi (Habilitation) from the Technical University of Munich (TUM). In 2018 he was co-founder and managing director of a startup on 3D perception for AR and robotics, which was later acquired by Google. He regularly serves as Area Chair and Associate Editor for international conferences and journals (IJRR, RA-L, IROS20/21/22, ICRA20/22, 3DV19/20/21/22/24, ECCV22/24, CVPR23/24, NeurIPS23 among others). He was the recipient of two Google Faculty Research Awards, one Amazon Research Award, and 5 Outstanding Reviewer Awards (3xCVPR, ICCV21, NeurIPS21), among others. He has been a research partner of private and academic institutions including Google, Toyota, BMW, Audi, Amazon, Stanford University, ETH and MIT.