3D Semantic instance benchmark
The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.
Evaluation and metrics
Our evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP). Note that multiple predictions of the same ground truth instance are penalized as false positives.
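To make the duplicate-penalty rule concrete, here is a minimal Python sketch of average precision for one class at one overlap threshold. It is not the benchmark's official evaluation script: the function names, the representation of instance masks as sets of point indices, the strict `>` comparison against the threshold, and the simple step-wise integration of the precision-recall curve are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two instance masks given as point-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(
    preds: List[Tuple[float, Set[int]]],  # (confidence, mask) pairs; hypothetical format
    gts: List[Set[int]],                  # ground truth masks of the same class
    overlap: float,                       # overlap threshold, e.g. 0.25 or 0.5
) -> float:
    if not gts:
        return float("nan")  # AP is undefined without ground truth instances
    matched = [False] * len(gts)
    tp_flags = []
    # Visit predictions from most to least confident.
    for _score, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            o = iou(mask, gt)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_iou > overlap and not matched[best_j]:
            matched[best_j] = True
            tp_flags.append(True)
        else:
            # Poor overlap, or a second prediction of an already matched
            # ground truth instance: counted as a false positive.
            tp_flags.append(False)
    # Sum precision over each increase in recall (assumed integration scheme).
    ap, tp, prev_recall = 0.0, 0, 0.0
    for k, is_tp in enumerate(tp_flags, start=1):
        if is_tp:
            tp += 1
            recall = tp / len(gts)
            ap += (tp / k) * (recall - prev_recall)
            prev_recall = recall
    return ap


# Two ground truth instances; the second prediction duplicates the first one.
gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
preds = [(0.9, {0, 1, 2, 3}), (0.8, {0, 1, 2}), (0.7, {10, 11, 12, 13})]
ap_50 = average_precision(preds, gts, overlap=0.5)
print(round(ap_50, 3))  # 0.833
```

Without the duplicate prediction the same example would score 1.0, so the extra detection of an already matched instance is what lowers the result. Averaging this value over classes, and over a list of thresholds for the range-based AP, would follow the same pattern.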
This table lists the benchmark results for the 3D semantic instance scenario. Each cell shows the AP 50% score followed by the method's rank in that column.
Method | Info | avg ap 50% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SoftGroup++ | | 0.769 1 | 1.000 1 | 0.803 13 | 0.937 1 | 0.684 3 | 0.865 3 | 0.213 11 | 0.870 2 | 0.664 2 | 0.571 4 | 0.758 1 | 0.702 4 | 0.807 1 | 1.000 1 | 0.653 11 | 0.902 1 | 0.792 2 | 1.000 1 | 0.626 1 |
SoftGroup | | 0.761 2 | 1.000 1 | 0.808 11 | 0.845 5 | 0.716 1 | 0.862 5 | 0.243 8 | 0.824 3 | 0.655 4 | 0.620 2 | 0.734 2 | 0.699 5 | 0.791 3 | 0.981 18 | 0.716 4 | 0.844 4 | 0.769 3 | 1.000 1 | 0.594 5 |
Thang Vu, Kookhoi Kim, Tung M. Luu, Xuan Thanh Nguyen, Chang D. Yoo: SoftGroup for 3D Instance Segmentation on Point Clouds. CVPR 2022 [Oral] | ||||||||||||||||||||
GraphCut | | 0.732 3 | 1.000 1 | 0.788 16 | 0.724 15 | 0.642 5 | 0.859 6 | 0.248 7 | 0.787 8 | 0.618 6 | 0.596 3 | 0.653 3 | 0.722 2 | 0.583 21 | 1.000 1 | 0.766 2 | 0.861 2 | 0.825 1 | 1.000 1 | 0.504 13 |
IPCA-Inst | | 0.731 4 | 1.000 1 | 0.788 17 | 0.884 4 | 0.698 2 | 0.788 19 | 0.252 6 | 0.760 10 | 0.646 5 | 0.511 10 | 0.637 5 | 0.665 6 | 0.804 2 | 1.000 1 | 0.644 12 | 0.778 9 | 0.747 4 | 1.000 1 | 0.561 8 |
DKNet | | 0.718 5 | 1.000 1 | 0.814 8 | 0.782 9 | 0.619 6 | 0.872 2 | 0.224 9 | 0.751 12 | 0.569 8 | 0.677 1 | 0.585 8 | 0.724 1 | 0.633 14 | 0.981 18 | 0.515 20 | 0.819 6 | 0.736 5 | 1.000 1 | 0.617 2 |
HAIS | | 0.699 6 | 1.000 1 | 0.849 3 | 0.820 6 | 0.675 4 | 0.808 13 | 0.279 4 | 0.757 11 | 0.465 13 | 0.517 9 | 0.596 6 | 0.559 8 | 0.600 16 | 1.000 1 | 0.654 10 | 0.767 10 | 0.676 9 | 0.994 23 | 0.560 9 |
Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang: Hierarchical Aggregation for 3D Instance Segmentation. ICCV 2021 | ||||||||||||||||||||
SSTNet | | 0.698 7 | 1.000 1 | 0.697 30 | 0.888 3 | 0.556 14 | 0.803 14 | 0.387 2 | 0.626 19 | 0.417 17 | 0.556 7 | 0.585 9 | 0.702 3 | 0.600 16 | 1.000 1 | 0.824 1 | 0.720 21 | 0.692 7 | 1.000 1 | 0.509 12 |
Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui Jia: Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks. ICCV 2021 | ||||||||||||||||||||
SphereSeg | | 0.680 8 | 1.000 1 | 0.856 2 | 0.744 14 | 0.618 7 | 0.893 1 | 0.151 13 | 0.651 17 | 0.713 1 | 0.537 8 | 0.579 11 | 0.430 18 | 0.651 6 | 1.000 1 | 0.389 29 | 0.744 17 | 0.697 6 | 0.991 24 | 0.601 4 |
MaskVoteNet_Coarse | | 0.677 9 | 1.000 1 | 0.847 4 | 0.771 10 | 0.509 20 | 0.816 9 | 0.277 5 | 0.558 26 | 0.482 10 | 0.562 6 | 0.640 4 | 0.448 14 | 0.700 4 | 1.000 1 | 0.666 6 | 0.852 3 | 0.578 19 | 0.997 19 | 0.488 17 |
OccuSeg+instance | | 0.672 10 | 1.000 1 | 0.758 24 | 0.682 17 | 0.576 12 | 0.842 7 | 0.477 1 | 0.504 29 | 0.524 9 | 0.567 5 | 0.585 10 | 0.451 13 | 0.557 22 | 1.000 1 | 0.751 3 | 0.797 8 | 0.563 22 | 1.000 1 | 0.467 20 |
Lei Han, Tian Zheng, Lan Xu, Lu Fang: OccuSeg: Occupancy-aware 3D Instance Segmentation. CVPR 2020 | ||||||||||||||||||||
Mask-Group | | 0.664 11 | 1.000 1 | 0.822 7 | 0.764 13 | 0.616 8 | 0.815 10 | 0.139 17 | 0.694 15 | 0.597 7 | 0.459 15 | 0.566 12 | 0.599 7 | 0.600 16 | 0.516 35 | 0.715 5 | 0.819 7 | 0.635 13 | 1.000 1 | 0.603 3 |
Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang: MaskGroup: Hierarchical Point Grouping and Masking for 3D Instance Segmentation. ICME 2022 | ||||||||||||||||||||
INS-Conv-instance | | 0.657 12 | 1.000 1 | 0.760 22 | 0.667 19 | 0.581 10 | 0.863 4 | 0.323 3 | 0.655 16 | 0.477 11 | 0.473 13 | 0.549 14 | 0.432 17 | 0.650 7 | 1.000 1 | 0.655 9 | 0.738 18 | 0.585 18 | 0.944 28 | 0.472 19 |
CSC-Pretrained | | 0.648 13 | 1.000 1 | 0.810 9 | 0.768 11 | 0.523 19 | 0.813 11 | 0.143 16 | 0.819 4 | 0.389 18 | 0.422 22 | 0.511 18 | 0.443 15 | 0.650 7 | 1.000 1 | 0.624 14 | 0.732 19 | 0.634 14 | 1.000 1 | 0.375 26 |
PE | | 0.645 14 | 1.000 1 | 0.773 19 | 0.798 8 | 0.538 16 | 0.786 20 | 0.088 24 | 0.799 7 | 0.350 22 | 0.435 21 | 0.547 15 | 0.545 9 | 0.646 13 | 0.933 20 | 0.562 17 | 0.761 13 | 0.556 27 | 0.997 19 | 0.501 15 |
Biao Zhang, Peter Wonka: Point Cloud Instance Segmentation using Probabilistic Embeddings. CVPR 2021 | ||||||||||||||||||||
RPGN | | 0.643 15 | 1.000 1 | 0.758 23 | 0.582 27 | 0.539 15 | 0.826 8 | 0.046 28 | 0.765 9 | 0.372 20 | 0.436 20 | 0.588 7 | 0.539 10 | 0.650 7 | 1.000 1 | 0.577 15 | 0.750 15 | 0.653 12 | 0.997 19 | 0.495 16 |
Dyco3D | | 0.641 16 | 1.000 1 | 0.841 5 | 0.893 2 | 0.531 17 | 0.802 15 | 0.115 21 | 0.588 24 | 0.448 14 | 0.438 18 | 0.537 17 | 0.430 19 | 0.550 23 | 0.857 22 | 0.534 18 | 0.764 12 | 0.657 10 | 0.987 25 | 0.568 6 |
Tong He, Chunhua Shen, Anton van den Hengel: DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution. CVPR 2021 | ||||||||||||||||||||
GICN | | 0.638 17 | 1.000 1 | 0.895 1 | 0.800 7 | 0.480 23 | 0.676 24 | 0.144 15 | 0.737 13 | 0.354 21 | 0.447 16 | 0.400 27 | 0.365 24 | 0.700 4 | 1.000 1 | 0.569 16 | 0.836 5 | 0.599 16 | 1.000 1 | 0.473 18 |
PointGroup | | 0.636 18 | 1.000 1 | 0.765 20 | 0.624 21 | 0.505 22 | 0.797 16 | 0.116 20 | 0.696 14 | 0.384 19 | 0.441 17 | 0.559 13 | 0.476 11 | 0.596 19 | 1.000 1 | 0.666 6 | 0.756 14 | 0.556 26 | 0.997 19 | 0.513 11 |
Li Jiang, Hengshuang Zhao, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia: PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation. CVPR 2020 [oral] | ||||||||||||||||||||
DD-UNet+Group | | 0.635 19 | 0.667 29 | 0.797 15 | 0.714 16 | 0.562 13 | 0.774 21 | 0.146 14 | 0.810 6 | 0.429 16 | 0.476 12 | 0.546 16 | 0.399 21 | 0.633 14 | 1.000 1 | 0.632 13 | 0.722 20 | 0.609 15 | 1.000 1 | 0.514 10 |
H. Liu, R. Liu, K. Yang, J. Zhang, K. Peng, R. Stiefelhagen: HIDA: Towards Holistic Indoor Understanding for the Visually Impaired via Semantic Instance Segmentation with a Wearable Solid-State LiDAR Sensor. ICCVW 2021 | ||||||||||||||||||||
DENet | | 0.629 20 | 1.000 1 | 0.797 14 | 0.608 22 | 0.589 9 | 0.627 28 | 0.219 10 | 0.882 1 | 0.310 24 | 0.402 26 | 0.383 29 | 0.396 22 | 0.650 7 | 1.000 1 | 0.663 8 | 0.543 35 | 0.691 8 | 1.000 1 | 0.568 7 |
3D-MPA | | 0.611 21 | 1.000 1 | 0.833 6 | 0.765 12 | 0.526 18 | 0.756 22 | 0.136 19 | 0.588 24 | 0.470 12 | 0.438 19 | 0.432 25 | 0.358 25 | 0.650 7 | 0.857 22 | 0.429 25 | 0.765 11 | 0.557 25 | 1.000 1 | 0.430 22 |
Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner: 3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation. CVPR 2020 | ||||||||||||||||||||
PCJC | | 0.578 22 | 1.000 1 | 0.810 10 | 0.583 26 | 0.449 26 | 0.813 12 | 0.042 29 | 0.603 22 | 0.341 23 | 0.490 11 | 0.465 21 | 0.410 20 | 0.650 7 | 0.835 28 | 0.264 34 | 0.694 25 | 0.561 23 | 0.889 32 | 0.504 14 |
SSEN | | 0.575 23 | 1.000 1 | 0.761 21 | 0.473 29 | 0.477 24 | 0.795 17 | 0.066 25 | 0.529 27 | 0.658 3 | 0.460 14 | 0.461 22 | 0.380 23 | 0.331 34 | 0.859 21 | 0.401 28 | 0.692 26 | 0.653 11 | 1.000 1 | 0.348 28 |
Dongsu Zhang, Junha Chun, Sang Kyun Cha, Young Min Kim: Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning. arXiv | ||||||||||||||||||||
RWSeg | | 0.567 24 | 0.528 37 | 0.708 29 | 0.626 20 | 0.580 11 | 0.745 23 | 0.063 26 | 0.627 18 | 0.240 28 | 0.400 27 | 0.497 19 | 0.464 12 | 0.515 24 | 1.000 1 | 0.475 22 | 0.745 16 | 0.571 20 | 1.000 1 | 0.429 23 |
MTML | | 0.549 25 | 1.000 1 | 0.807 12 | 0.588 25 | 0.327 30 | 0.647 26 | 0.004 34 | 0.815 5 | 0.180 30 | 0.418 23 | 0.364 30 | 0.182 29 | 0.445 28 | 1.000 1 | 0.442 24 | 0.688 27 | 0.571 21 | 1.000 1 | 0.396 24 |
Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald: 3D Instance Segmentation via Multi-task Metric Learning. ICCV 2019 [oral] | ||||||||||||||||||||
Sparse R-CNN | | 0.515 26 | 1.000 1 | 0.538 37 | 0.282 32 | 0.468 25 | 0.790 18 | 0.173 12 | 0.345 33 | 0.429 15 | 0.413 25 | 0.484 20 | 0.176 30 | 0.595 20 | 0.591 33 | 0.522 19 | 0.668 28 | 0.476 31 | 0.986 26 | 0.327 29 |
Occipital-SCS | | 0.512 27 | 1.000 1 | 0.716 26 | 0.509 28 | 0.506 21 | 0.611 29 | 0.092 23 | 0.602 23 | 0.177 31 | 0.346 30 | 0.383 28 | 0.165 31 | 0.442 29 | 0.850 27 | 0.386 30 | 0.618 31 | 0.543 28 | 0.889 32 | 0.389 25 |
3D-BoNet | | 0.488 28 | 1.000 1 | 0.672 32 | 0.590 24 | 0.301 32 | 0.484 38 | 0.098 22 | 0.620 20 | 0.306 25 | 0.341 31 | 0.259 34 | 0.125 33 | 0.434 31 | 0.796 29 | 0.402 27 | 0.499 37 | 0.513 30 | 0.909 31 | 0.439 21 |
Bo Yang, Jianan Wang, Ronald Clark, Qingyong Hu, Sen Wang, Andrew Markham, Niki Trigoni: Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds. NeurIPS 2019 Spotlight | ||||||||||||||||||||
PanopticFusion-inst | | 0.478 29 | 0.667 29 | 0.712 28 | 0.595 23 | 0.259 34 | 0.550 35 | 0.000 37 | 0.613 21 | 0.175 32 | 0.250 36 | 0.434 23 | 0.437 16 | 0.411 33 | 0.857 22 | 0.485 21 | 0.591 34 | 0.267 40 | 0.944 28 | 0.359 27 |
Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. IROS 2019 (to appear) | ||||||||||||||||||||
SPG_WSIS | | 0.470 30 | 0.667 29 | 0.685 31 | 0.677 18 | 0.372 28 | 0.562 33 | 0.000 37 | 0.482 30 | 0.244 27 | 0.316 33 | 0.298 31 | 0.052 39 | 0.442 30 | 0.857 22 | 0.267 33 | 0.702 22 | 0.559 24 | 1.000 1 | 0.287 31 |
SALoss-ResNet | | 0.459 31 | 1.000 1 | 0.737 25 | 0.159 41 | 0.259 33 | 0.587 31 | 0.138 18 | 0.475 31 | 0.217 29 | 0.416 24 | 0.408 26 | 0.128 32 | 0.315 35 | 0.714 30 | 0.411 26 | 0.536 36 | 0.590 17 | 0.873 35 | 0.304 30 |
Zhidong Liang, Ming Yang, Hao Li, Chunxiang Wang: 3D Instance Embedding Learning With a Structure-Aware Loss Function for Point Cloud Segmentation. IEEE Robotics and Automation Letters (IROS 2020) | ||||||||||||||||||||
MASC | | 0.447 32 | 0.528 37 | 0.555 35 | 0.381 30 | 0.382 27 | 0.633 27 | 0.002 35 | 0.509 28 | 0.260 26 | 0.361 29 | 0.432 24 | 0.327 26 | 0.451 27 | 0.571 34 | 0.367 31 | 0.639 29 | 0.386 32 | 0.980 27 | 0.276 32 |
Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation. | ||||||||||||||||||||
SegGroup_ins | | 0.445 33 | 0.667 29 | 0.773 18 | 0.185 38 | 0.317 31 | 0.656 25 | 0.000 37 | 0.407 32 | 0.134 33 | 0.381 28 | 0.267 33 | 0.217 28 | 0.476 26 | 0.714 30 | 0.452 23 | 0.629 30 | 0.514 29 | 1.000 1 | 0.222 35 |
An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou: SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation. | ||||||||||||||||||||
3D-SIS | | 0.382 34 | 1.000 1 | 0.432 39 | 0.245 34 | 0.190 35 | 0.577 32 | 0.013 32 | 0.263 35 | 0.033 39 | 0.320 32 | 0.240 35 | 0.075 35 | 0.422 32 | 0.857 22 | 0.117 37 | 0.699 23 | 0.271 39 | 0.883 34 | 0.235 34 |
Ji Hou, Angela Dai, Matthias Niessner: 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. CVPR 2019 | ||||||||||||||||||||
Hier3D | | 0.323 35 | 0.667 29 | 0.542 36 | 0.264 33 | 0.157 38 | 0.550 34 | 0.000 37 | 0.205 38 | 0.009 40 | 0.270 35 | 0.218 36 | 0.075 35 | 0.500 25 | 0.688 32 | 0.007 43 | 0.698 24 | 0.301 36 | 0.459 40 | 0.200 36 |
Tan: HCFS3D: Hierarchical Coupled Feature Selection Network for 3D Semantic and Instance Segmentation. | ||||||||||||||||||||
UNet-backbone | | 0.319 36 | 0.667 29 | 0.715 27 | 0.233 35 | 0.189 36 | 0.479 39 | 0.008 33 | 0.218 36 | 0.067 38 | 0.201 37 | 0.173 37 | 0.107 34 | 0.123 40 | 0.438 36 | 0.150 35 | 0.615 32 | 0.355 33 | 0.916 30 | 0.093 42 |
R-PointNet | | 0.306 37 | 0.500 39 | 0.405 40 | 0.311 31 | 0.348 29 | 0.589 30 | 0.054 27 | 0.068 41 | 0.126 34 | 0.283 34 | 0.290 32 | 0.028 40 | 0.219 38 | 0.214 39 | 0.331 32 | 0.396 41 | 0.275 37 | 0.821 37 | 0.245 33 |
SemRegionNet | | 0.250 38 | 0.333 40 | 0.613 33 | 0.229 36 | 0.163 37 | 0.493 36 | 0.000 37 | 0.304 34 | 0.107 35 | 0.147 39 | 0.100 38 | 0.052 38 | 0.231 36 | 0.119 40 | 0.039 39 | 0.445 39 | 0.325 34 | 0.654 38 | 0.141 38 |
3D-BEVIS | | 0.248 39 | 0.667 29 | 0.566 34 | 0.076 42 | 0.035 43 | 0.394 41 | 0.027 31 | 0.035 42 | 0.098 36 | 0.099 41 | 0.030 42 | 0.025 41 | 0.098 41 | 0.375 38 | 0.126 36 | 0.604 33 | 0.181 41 | 0.854 36 | 0.171 37 |
Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation. | ||||||||||||||||||||
Region | | 0.248 39 | 0.667 29 | 0.437 38 | 0.188 37 | 0.153 39 | 0.491 37 | 0.000 37 | 0.208 37 | 0.094 37 | 0.153 38 | 0.099 39 | 0.057 37 | 0.217 39 | 0.119 40 | 0.039 39 | 0.466 38 | 0.302 35 | 0.640 39 | 0.140 39 |
ASIS | | 0.199 41 | 0.333 40 | 0.253 42 | 0.167 40 | 0.140 40 | 0.438 40 | 0.000 37 | 0.177 39 | 0.008 41 | 0.121 40 | 0.069 40 | 0.004 43 | 0.231 37 | 0.429 37 | 0.036 41 | 0.445 40 | 0.273 38 | 0.333 42 | 0.119 41 |
Sgpn_scannet | | 0.143 42 | 0.208 43 | 0.390 41 | 0.169 39 | 0.065 41 | 0.275 42 | 0.029 30 | 0.069 40 | 0.000 42 | 0.087 42 | 0.043 41 | 0.014 42 | 0.027 43 | 0.000 42 | 0.112 38 | 0.351 42 | 0.168 42 | 0.438 41 | 0.138 40 |
MaskRCNN 2d->3d Proj | | 0.058 43 | 0.333 40 | 0.002 43 | 0.000 43 | 0.053 42 | 0.002 43 | 0.002 36 | 0.021 43 | 0.000 42 | 0.045 43 | 0.024 43 | 0.238 27 | 0.065 42 | 0.000 42 | 0.014 42 | 0.107 43 | 0.020 43 | 0.110 43 | 0.006 43 |