Submitted anonymously.

Submission data

Full name: Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding
Description: Our backbone network is based on a 3D Swin Transformer and is designed to perform self-attention efficiently on sparse voxels with linear memory complexity, and to capture the irregularity of point signals through a generalized contextual relative positional embedding. Using this backbone, we pretrained a large Swin3D model on the synthetic Structured3D dataset and fine-tuned the pretrained model on ScanNet.
Input data types: Uses Color, Uses Geometry, Uses 3D
Programming language(s): Python and C++
Hardware: Tesla V100
Submission creation date: 5 Feb, 2023
Last edited: 24 Apr, 2023
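The description names two mechanisms without detailing them: self-attention restricted to occupied voxels, and a contextual relative positional embedding. The NumPy sketch below is a minimal illustration under stated assumptions, not the Swin3D implementation. Voxels are grouped into cubic windows, attention is computed only among the occupied voxels of each window, and the attention logits receive a query-dependent bias from learned per-axis offset tables. The function name `sparse_window_attention`, the `rel_tables` layout, and the projection matrices are hypothetical, and the simple offset bias only stands in for the generalized embedding the description mentions (which also accounts for point signals). Because each window holds at most window³ occupied voxels, the per-window attention matrices together have at most N·window³ entries, so memory grows linearly with the number N of occupied voxels for a fixed window size.

```python
import numpy as np


def sparse_window_attention(coords, feats, window, rel_tables, w_q, w_k, w_v):
    """Single-head self-attention among occupied voxels sharing a cubic window.

    coords:     (N, 3) int array of unique occupied voxel coordinates.
    feats:      (N, C) float array of voxel features.
    window:     window edge length in voxels.
    rel_tables: (3, 2*window - 1, C) per-axis offset embeddings (hypothetical
                simplification of a contextual relative positional embedding).
    w_q, w_k, w_v: (C, C) projection matrices.
    """
    c = feats.shape[1]
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    out = np.zeros_like(v)
    # Assign each occupied voxel to its window; empty space is never stored.
    _, win = np.unique(coords // window, axis=0, return_inverse=True)
    win = win.reshape(-1)
    for w in range(win.max() + 1):
        idx = np.nonzero(win == w)[0]  # at most window**3 voxels
        local = coords[idx] % window
        # Pairwise offsets shifted from [-(window-1), window-1] to table indices.
        rel = local[:, None, :] - local[None, :, :] + (window - 1)
        r = sum(rel_tables[a][rel[..., a]] for a in range(3))  # (m, m, C)
        logits = q[idx] @ k[idx].T / np.sqrt(c)
        # Contextual bias: the query itself is dotted with the offset embedding.
        logits = logits + np.einsum("ic,ijc->ij", q[idx], r)
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)
        out[idx] = attn @ v[idx]
    return out


rng = np.random.default_rng(0)
coords = np.unique(rng.integers(0, 16, size=(300, 3)), axis=0)
feats = rng.normal(size=(len(coords), 8))
window = 4
rel_tables = rng.normal(scale=0.02, size=(3, 2 * window - 1, 8))
w_q, w_k, w_v = (rng.normal(scale=0.3, size=(8, 8)) for _ in range(3))
out = sparse_window_attention(coords, feats, window, rel_tables, w_q, w_k, w_v)
```

The coordinates are deduplicated with `np.unique` so that each window of size 4 holds at most 4³ = 64 voxels; a voxel alone in its window simply returns its own value projection.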

3D semantic label results

Info: permissive

Class              IoU
avg IoU            0.779
bathtub            0.861
bed                0.818
bookshelf          0.836
cabinet            0.790
chair              0.875
counter            0.576
curtain            0.905
desk               0.704
door               0.739
floor              0.969
otherfurniture     0.611
picture            0.349
refrigerator       0.756
shower curtain     0.958
sink               0.702
sofa               0.805
table              0.708
toilet             0.916
wall               0.898
window             0.801
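The avg IoU entry is consistent with an unweighted mean over the 20 class values: they sum to 15.577, and 15.577 / 20 ≈ 0.779. The short Python check below recomputes it from the table above. The `iou` helper shows the standard per-class definition IoU = TP / (TP + FP + FN); that this is exactly the benchmark's metric is an assumption here, not something stated on this page.

```python
from statistics import mean


def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union for one class from point-level counts."""
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")


# Per-class IoU values copied from the table above.
class_iou = {
    "bathtub": 0.861, "bed": 0.818, "bookshelf": 0.836, "cabinet": 0.790,
    "chair": 0.875, "counter": 0.576, "curtain": 0.905, "desk": 0.704,
    "door": 0.739, "floor": 0.969, "otherfurniture": 0.611, "picture": 0.349,
    "refrigerator": 0.756, "shower curtain": 0.958, "sink": 0.702,
    "sofa": 0.805, "table": 0.708, "toilet": 0.916, "wall": 0.898,
    "window": 0.801,
}

print(iou(tp=8, fp=1, fn=1))  # 0.8
print(f"{mean(class_iou.values()):.3f}")  # 0.779
```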