Submitted by Adrian Kruse.

Submission data

Full name: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
Publication title: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
Publication authors: Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe
Publication URL: https://arxiv.org/pdf/2604.19609
Input Data Types: Uses Color, Uses Geometry, Uses 3D
Programming language(s): Python
Hardware: 4× AMD EPYC 7402, 8× A100 40GB, 1 TB RAM
Website: https://yilmazkadir.github.io/Volt/
Source code or download URL: https://github.com/YilmazKadir/Volt
Submission creation date: 18 Feb, 2026
Last edited: 22 Apr, 2026
Last uploaded: 18 Feb, 2026

3D semantic label results

Info: permissive

Class            IoU
avg iou          0.805
bathtub          0.932
bed              0.846
bookshelf        0.801
cabinet          0.775
chair            0.862
counter          0.604
curtain          0.955
desk             0.779
door             0.722
floor            0.980
otherfurniture   0.635
picture          0.352
refrigerator     0.799
shower curtain   0.941
sink             0.887
sofa             0.807
table            0.748
toilet           0.973
wall             0.911
window           0.798
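
The avg iou figure is consistent with an unweighted mean over the 20 per-class IoU scores. The sketch below checks that arithmetic using the values listed above. It assumes the average is a plain mean over classes, and the variable names are illustrative rather than taken from the benchmark or the Volt code base.

```python
# Per-class IoU values copied from the results above (20 classes).
class_iou = {
    "bathtub": 0.932, "bed": 0.846, "bookshelf": 0.801, "cabinet": 0.775,
    "chair": 0.862, "counter": 0.604, "curtain": 0.955, "desk": 0.779,
    "door": 0.722, "floor": 0.980, "otherfurniture": 0.635, "picture": 0.352,
    "refrigerator": 0.799, "shower curtain": 0.941, "sink": 0.887,
    "sofa": 0.807, "table": 0.748, "toilet": 0.973, "wall": 0.911,
    "window": 0.798,
}

# Assumption: avg iou is the unweighted mean of the per-class scores.
mean_iou = sum(class_iou.values()) / len(class_iou)
print(f"{mean_iou:.3f}")  # prints 0.805, matching the reported avg iou

# Lowest- and highest-scoring classes, for a quick read of the spread.
worst = min(class_iou, key=class_iou.get)
best = max(class_iou, key=class_iou.get)
print(worst, class_iou[worst])  # picture 0.352
print(best, class_iou[best])    # floor 0.98
```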