June 12, Morning, Room 211
Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. Application areas such as augmented and virtual reality, computational photography, interior design, and autonomous mobile robotics all require a deep understanding of 3D scene spaces.
We propose to offer the first benchmark challenge for novel view synthesis in large-scale 3D scenes, together with high-fidelity, large-vocabulary 3D semantic scene understanding -- in a setting where complete, high-fidelity ground-truth scene data is available. This is enabled by the new ScanNet++ dataset, which offers 1mm-resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. Existing view synthesis benchmarks use data captured from a single continuous trajectory, making it impossible to evaluate novel views outside the original capture trajectory. In contrast, our novel view synthesis challenge uses test images intentionally captured outside the training trajectory, enabling comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.
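To illustrate the held-out-view evaluation setting, renderings can be scored against the test images with standard image metrics such as PSNR. The sketch below is only an illustration of the idea, with a NumPy-based PSNR and synthetic arrays standing in for decoded images; it is not the official evaluation script of the challenge.

```python
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((rendered.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Score a batch of held-out novel views (random arrays stand in for real images).
rng = np.random.default_rng(0)
rendered = np.clip(rng.random((4, 64, 64, 3)), 0.0, 1.0)
ground_truth = np.clip(rendered + rng.normal(0.0, 0.01, rendered.shape), 0.0, 1.0)
scores = [psnr(r, g) for r, g in zip(rendered, ground_truth)]
print(f"mean PSNR over held-out views: {np.mean(scores):.2f} dB")
```

In practice, challenge submissions would typically render each held-out test pose and report the metric averaged over all test views of a scene.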
Please download the dataset here and submit your results before May 22 to be considered for the challenge.
📢 New this year 📢 ScanNet++ v2 released with 1000+ scenes, more scene types, and improved annotations and poses. Check it out!

Welcome and Introduction | 8:50am - 9:00am
Invited Talk 1: Katja Schwarz - Building Worlds from Noise: Advances and Challenges in 3D Diffusion Models | 9:00am - 9:30am
Invited Talk 2: Cordelia Schmid - LLMs for Scene Generation and 3D Object Placement | 9:30am - 10:00am
Winner Talks: DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation; Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering; Distilling Radiance Fields with Robust Point-based Graphics for Novel View Synthesis | 10:00am - 10:30am
Invited Talk 3: Qianqian Wang - Learning to Perceive the 4D World Online | 10:30am - 11:00am
Invited Talk 4: Gordon Wetzstein - World Models with Memory | 11:00am - 11:30am
Invited Talk 5: Andrea Vedaldi - Feed-forward 3D Reconstruction | 11:30am - 12:00pm
Panel Discussion and Conclusion | 12:00pm - 12:30pm
Cordelia Schmid is a research director at Inria. She holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis on "Local Greyvalue Invariants for Image Matching and Retrieval" received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled "From Image Matching to Learning Visual Models". Dr. Schmid is a member of the German National Academy of Sciences Leopoldina and a fellow of the IEEE and the ELLIS society. She was awarded the Longuet-Higgins prize in 2006, 2014 and 2016, the Koenderink prize in 2018 and the Helmholtz prize in 2023, all for fundamental contributions in computer vision that have withstood the test of time. She received an ERC advanced grant in 2013, the Humboldt research award in 2015, the Inria & French Academy of Science Grand Prix in 2016, the Royal Society Milner award in 2020 and the PAMI distinguished researcher award in 2021. In 2023 she received the Körber European Science Prize. Dr. Schmid has been an Associate Editor for IEEE PAMI (2001-2005) and for IJCV (2004-2012), an editor-in-chief for IJCV (2013-2018), a program chair of IEEE CVPR 2005 and ECCV 2012, as well as a general chair of IEEE CVPR 2015, ECCV 2020 and ICCV 2023. Since 2018 she has held a joint appointment with Google Research.
Andrea Vedaldi is Professor of Computer Vision and Machine Learning at the University of Oxford, where he has co-led the Visual Geometry Group since 2012. He is also a senior research scientist and technical lead at Meta. He researches generative AI in 3D computer vision, applied to the generation of 3D content from text and images and to image understanding. He is the author of more than 200 peer-reviewed publications in computer vision and machine learning. He is the recipient of the IEEE Thomas Huang Memorial Prize, the IEEE Mark Everingham Prize, the ACM Test of Time Award, and the best paper award from the Conference on Computer Vision and Pattern Recognition.
Gordon Wetzstein is an Associate Professor of Electrical Engineering and, by courtesy, of Computer Science at Stanford University. He is the leader of the Stanford Computational Imaging Lab and a faculty co-director of the Stanford Center for Image Systems Engineering. At the intersection of computer graphics and vision, artificial intelligence, computational optics, and applied vision science, Prof. Wetzstein's research has a wide range of applications in next-generation imaging, wearable computing, and neural rendering systems. Prof. Wetzstein is a Fellow of Optica and the recipient of numerous awards, including an IEEE VGTC Virtual Reality Technical Achievement Award, an NSF CAREER Award, an Alfred P. Sloan Fellowship, an ACM SIGGRAPH Significant New Researcher Award, a Presidential Early Career Award for Scientists and Engineers (PECASE), an SPIE Early Career Achievement Award, an Electronic Imaging Scientist of the Year Award, an Alain Fournier Ph.D. Dissertation Award as well as many Best Paper and Demo Awards.
Katja Schwarz worked as a Research Scientist at Meta AI. Prior to this, she did her PhD in the Autonomous Vision Group (AVG) at Tübingen University and the Max Planck Institute for Intelligent Systems, advised by Andreas Geiger. She studied Physics at Heidelberg University, where she received her bachelor's degree in 2016 and her master's degree in 2018. Her research lies at the intersection of computer vision and graphics and focuses on 3D vision. In particular, she is interested in enabling machines to infer 3D representations from sparse observations, such as 2D images. She is passionate about leveraging such representations for generative modeling in 2D and 3D.
Qianqian Wang is a postdoctoral researcher at UC Berkeley, working with Prof. Angjoo Kanazawa and Prof. Alexei A. Efros. She completed her PhD in Computer Science at Cornell Tech, Cornell University with Prof. Noah Snavely and Prof. Bharath Hariharan. Before that she received her bachelor's degree from Zhejiang University, working with Prof. Xiaowei Zhou.