Submitted anonymously.

Submission data

Full name: Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding
Description: Our backbone network is based on a 3D Swin transformer and carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on a synthetic Structured3D dataset and fine-tuned the pretrained model on ScanNet.
Input Data Types: Uses Color, Uses Geometry; Uses 3D
Programming language(s): Python and C++
Hardware: Tesla V100
Submission creation date: 5 Feb, 2023
Last edited: 24 Apr, 2023
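
The description refers to self-attention on sparse voxels. Swin-style transformers generally restrict attention to local windows, so a first step is to group the occupied voxels by window. The following is only a minimal sketch of that grouping step under stated assumptions: the function name `group_voxels_by_window` and the `window_size` parameter are hypothetical, and it is not the submission's implementation. It does not show the attention itself, the positional embedding, or how linear memory is achieved.

```python
import numpy as np

def group_voxels_by_window(coords, window_size):
    """Assign each occupied voxel to a non-overlapping cubic window.

    coords: (N, 3) integer coordinates of occupied voxels only.
    window_size: window edge length in voxels (hypothetical parameter).
    Returns a dict mapping window index (tuple) -> array of voxel row indices.
    """
    coords = np.asarray(coords)
    # Floor division gives the window index of each voxel along every axis.
    window_ids = coords // window_size
    # Only windows that contain at least one voxel appear here.
    unique_ids, inverse = np.unique(window_ids, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat shape across NumPy versions
    groups = {}
    for w, uid in enumerate(unique_ids):
        groups[tuple(int(v) for v in uid)] = np.flatnonzero(inverse == w)
    return groups

# Four occupied voxels, windows of 4 x 4 x 4 voxels.
coords = [[0, 0, 0], [1, 1, 1], [4, 0, 0], [5, 5, 5]]
groups = group_voxels_by_window(coords, window_size=4)
```

Attention would then be computed separately among the voxels of each group, so empty windows cost nothing.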

3D semantic label results

Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
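
For readers unfamiliar with the columns, the sketch below shows how per-class IoU and its average are commonly computed for semantic labels, with IoU = TP / (TP + FP + FN) per class. It assumes the "avg iou" column is the unweighted mean over the 20 class columns; the function names and the toy labels are illustrative and are not taken from the benchmark's evaluation code.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Return an array of IoU values, one per class (NaN if a class is absent)."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))  # predicted c, truly c
        fp = np.sum((pred == c) & (gt != c))  # predicted c, truly something else
        fn = np.sum((pred != c) & (gt == c))  # truly c, predicted something else
        denom = tp + fp + fn
        if denom > 0:
            ious[c] = tp / denom
    return ious

def mean_iou(ious):
    """Unweighted mean over classes, ignoring classes with no samples."""
    return float(np.nanmean(ious))

# Toy example with two classes and four labelled points.
ious = per_class_iou(pred=[0, 0, 1, 1], gt=[0, 1, 1, 1], num_classes=2)
avg = mean_iou(ious)
```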