Full name | Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding |
Description | Our backbone network is based on a 3D Swin transformer. It is designed to perform self-attention on sparse voxels efficiently, with linear memory complexity, and to capture the irregularity of point signals via a generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset and fine-tuned the pretrained model on ScanNet. |
Input Data Types | Uses Color, Uses Geometry, Uses 3D |
Programming language(s) | Python and C++ |
Hardware | Tesla V100 |
Submission creation date | 5 Feb, 2023 |
Last edited | 24 Apr, 2023 |