Submitted by Adrian Kruse.

Submission data

Full name: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
Publication title: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
Publication authors: Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe
Publication URL: https://arxiv.org/pdf/2604.19609
Input Data Types: Uses Color, Uses Geometry, Uses 3D
Programming language(s): Python
Hardware: 4× AMD EPYC 7402, 8× A100 40GB, 1 TB RAM
Website: https://yilmazkadir.github.io/Volt/
Source code or download URL: https://github.com/YilmazKadir/Volt
Submission creation date: 4 Mar, 2026
Last edited: 22 Apr, 2026
Last uploaded: 4 Mar, 2026

3D semantic instance results



Info: permissive
avg ap 50%: 0.827
bathtub: 1.000
bed: 0.981
bookshelf: 0.975
cabinet: 0.801
chair: 0.940
counter: 0.426
curtain: 0.693
desk: 0.752
door: 0.762
otherfurniture: 0.800
picture: 0.804
refrigerator: 0.855
shower curtain: 0.959
sink: 0.745
sofa: 0.879
table: 0.806
toilet: 0.997
window: 0.710
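
As a quick sanity check on the numbers above, the following minimal sketch recomputes the average from the 18 per-class scores. It assumes that "avg ap 50%" is the unweighted mean over the classes, which the page itself does not state. The names `ap50`, `mean_ap50`, and `weakest` are illustrative only and are not part of any benchmark tooling.

```python
# Per-class AP at 50% overlap, copied from the results row above.
ap50 = {
    "bathtub": 1.000, "bed": 0.981, "bookshelf": 0.975, "cabinet": 0.801,
    "chair": 0.940, "counter": 0.426, "curtain": 0.693, "desk": 0.752,
    "door": 0.762, "otherfurniture": 0.800, "picture": 0.804,
    "refrigerator": 0.855, "shower curtain": 0.959, "sink": 0.745,
    "sofa": 0.879, "table": 0.806, "toilet": 0.997, "window": 0.710,
}

# Assumption: the reported average is the unweighted mean over all classes.
mean_ap50 = sum(ap50.values()) / len(ap50)

# Class with the lowest score, useful for spotting where the method struggles.
weakest = min(ap50, key=ap50.get)

print(f"mean AP50 over {len(ap50)} classes: {mean_ap50:.3f}")
print(f"weakest class: {weakest} ({ap50[weakest]:.3f})")
```

Under that assumption the recomputed mean rounds to the reported 0.827, and the lowest-scoring class is counter at 0.426.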