This table lists the benchmark results for the 3D semantic label scenario.


Rows are sorted by the otherfurniture column; each score is followed by the method's rank in that column.
Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
SparseConvNet | | 0.725 (1) | 0.647 (5) | 0.821 (1) | 0.846 (1) | 0.721 (1) | 0.869 (1) | 0.533 (1) | 0.754 (4) | 0.603 (2) | 0.614 (2) | 0.955 (1) | 0.572 (1) | 0.325 (1) | 0.710 (1) | 0.870 (2) | 0.724 (1) | 0.823 (1) | 0.628 (2) | 0.934 (1) | 0.865 (1) | 0.683 (1)
MinkowskiNet34 | | 0.679 (3) | 0.811 (2) | 0.734 (4) | 0.739 (4) | 0.641 (3) | 0.804 (4) | 0.413 (6) | 0.759 (3) | 0.696 (1) | 0.545 (4) | 0.938 (7) | 0.518 (2) | 0.141 (15) | 0.623 (2) | 0.757 (3) | 0.680 (3) | 0.723 (4) | 0.684 (1) | 0.896 (3) | 0.821 (3) | 0.651 (2)
KP-FCNN | | 0.694 (2) | 0.849 (1) | 0.770 (3) | 0.810 (2) | 0.685 (2) | 0.813 (3) | 0.438 (3) | 0.791 (2) | 0.566 (3) | 0.616 (1) | 0.944 (3) | 0.500 (3) | 0.216 (6) | 0.559 (4) | 0.880 (1) | 0.690 (2) | 0.758 (3) | 0.627 (3) | 0.922 (2) | 0.832 (2) | 0.613 (3)
joint point-based | | 0.634 (4) | 0.614 (7) | 0.778 (2) | 0.667 (6) | 0.633 (4) | 0.825 (2) | 0.420 (5) | 0.804 (1) | 0.467 (6) | 0.561 (3) | 0.951 (2) | 0.494 (4) | 0.291 (2) | 0.566 (3) | 0.458 (7) | 0.579 (4) | 0.764 (2) | 0.559 (4) | 0.838 (4) | 0.814 (4) | 0.598 (4)
3DMV, FTSDF | | 0.501 (9) | 0.558 (11) | 0.608 (13) | 0.424 (17) | 0.478 (7) | 0.690 (9) | 0.246 (14) | 0.586 (8) | 0.468 (5) | 0.450 (6) | 0.911 (12) | 0.394 (5) | 0.160 (12) | 0.438 (8) | 0.212 (14) | 0.432 (12) | 0.541 (12) | 0.475 (8) | 0.742 (10) | 0.727 (8) | 0.477 (11)
PointConv | | 0.556 (7) | 0.636 (6) | 0.640 (10) | 0.574 (11) | 0.472 (8) | 0.739 (5) | 0.430 (4) | 0.433 (10) | 0.418 (10) | 0.445 (7) | 0.944 (3) | 0.372 (6) | 0.185 (10) | 0.464 (7) | 0.575 (5) | 0.540 (6) | 0.639 (7) | 0.505 (6) | 0.827 (5) | 0.762 (6) | 0.515 (9)
DVVNet | | 0.562 (6) | 0.648 (4) | 0.700 (5) | 0.770 (3) | 0.586 (5) | 0.687 (10) | 0.333 (8) | 0.650 (7) | 0.514 (4) | 0.475 (5) | 0.906 (14) | 0.359 (7) | 0.223 (5) | 0.340 (11) | 0.442 (8) | 0.422 (13) | 0.668 (5) | 0.501 (7) | 0.708 (11) | 0.779 (5) | 0.534 (8)
TextureNet | | 0.566 (5) | 0.672 (3) | 0.664 (7) | 0.671 (5) | 0.494 (6) | 0.719 (6) | 0.445 (2) | 0.678 (6) | 0.411 (11) | 0.396 (8) | 0.935 (8) | 0.356 (8) | 0.225 (4) | 0.412 (9) | 0.535 (6) | 0.565 (5) | 0.636 (8) | 0.464 (9) | 0.794 (8) | 0.680 (12) | 0.568 (5)
PanopticFusion-label | | 0.529 (8) | 0.491 (14) | 0.688 (6) | 0.604 (9) | 0.386 (11) | 0.632 (14) | 0.225 (18) | 0.705 (5) | 0.434 (8) | 0.293 (13) | 0.815 (17) | 0.348 (9) | 0.241 (3) | 0.499 (6) | 0.669 (4) | 0.507 (7) | 0.649 (6) | 0.442 (12) | 0.796 (7) | 0.602 (16) | 0.561 (6)
PCNN | | 0.498 (10) | 0.559 (10) | 0.644 (9) | 0.560 (12) | 0.420 (10) | 0.711 (8) | 0.229 (16) | 0.414 (11) | 0.436 (7) | 0.352 (10) | 0.941 (6) | 0.324 (10) | 0.155 (13) | 0.238 (15) | 0.387 (9) | 0.493 (8) | 0.529 (13) | 0.509 (5) | 0.813 (6) | 0.751 (7) | 0.504 (10)
3DMV | | 0.484 (11) | 0.484 (15) | 0.538 (15) | 0.643 (7) | 0.424 (9) | 0.606 (17) | 0.310 (9) | 0.574 (9) | 0.433 (9) | 0.378 (9) | 0.796 (18) | 0.301 (11) | 0.214 (7) | 0.537 (5) | 0.208 (15) | 0.472 (11) | 0.507 (16) | 0.413 (15) | 0.693 (12) | 0.602 (16) | 0.539 (7)
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
Tangent Convolutions | permissive | 0.438 (15) | 0.437 (17) | 0.646 (8) | 0.474 (14) | 0.369 (12) | 0.645 (13) | 0.353 (7) | 0.258 (17) | 0.282 (18) | 0.279 (14) | 0.918 (11) | 0.298 (12) | 0.147 (14) | 0.283 (12) | 0.294 (10) | 0.487 (9) | 0.562 (10) | 0.427 (14) | 0.619 (15) | 0.633 (14) | 0.352 (16)
Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
PointCNN with RGB | permissive | 0.458 (12) | 0.577 (9) | 0.611 (12) | 0.356 (19) | 0.321 (16) | 0.715 (7) | 0.299 (11) | 0.376 (14) | 0.328 (16) | 0.319 (11) | 0.944 (3) | 0.285 (13) | 0.164 (11) | 0.216 (17) | 0.229 (13) | 0.484 (10) | 0.545 (11) | 0.456 (10) | 0.755 (9) | 0.709 (9) | 0.475 (12)
Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
PNET2 | | 0.442 (13) | 0.548 (12) | 0.548 (14) | 0.597 (10) | 0.363 (13) | 0.628 (15) | 0.300 (10) | 0.292 (15) | 0.374 (13) | 0.307 (12) | 0.881 (16) | 0.268 (14) | 0.186 (9) | 0.238 (15) | 0.204 (16) | 0.407 (14) | 0.506 (17) | 0.449 (11) | 0.667 (13) | 0.620 (15) | 0.462 (13)
SurfaceConvPF | | 0.442 (13) | 0.505 (13) | 0.622 (11) | 0.380 (18) | 0.342 (15) | 0.654 (12) | 0.227 (17) | 0.397 (13) | 0.367 (14) | 0.276 (15) | 0.924 (10) | 0.240 (15) | 0.198 (8) | 0.359 (10) | 0.262 (11) | 0.366 (15) | 0.581 (9) | 0.435 (13) | 0.640 (14) | 0.668 (13) | 0.398 (14)
Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
ScanNet+FTSDF | | 0.383 (17) | 0.297 (19) | 0.491 (17) | 0.432 (16) | 0.358 (14) | 0.612 (16) | 0.274 (12) | 0.116 (19) | 0.411 (11) | 0.265 (16) | 0.904 (15) | 0.229 (16) | 0.079 (18) | 0.250 (13) | 0.185 (17) | 0.320 (18) | 0.510 (14) | 0.385 (16) | 0.548 (17) | 0.597 (18) | 0.394 (15)
SPLAT Net | copyleft | 0.393 (16) | 0.472 (16) | 0.511 (16) | 0.606 (8) | 0.311 (17) | 0.656 (11) | 0.245 (15) | 0.405 (12) | 0.328 (16) | 0.197 (18) | 0.927 (9) | 0.227 (17) | 0.000 (20) | 0.001 (20) | 0.249 (12) | 0.271 (20) | 0.510 (14) | 0.383 (17) | 0.593 (16) | 0.699 (10) | 0.267 (18)
Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
PointNet++ | permissive | 0.339 (18) | 0.584 (8) | 0.478 (18) | 0.458 (15) | 0.256 (19) | 0.360 (20) | 0.250 (13) | 0.247 (18) | 0.278 (19) | 0.261 (17) | 0.677 (20) | 0.183 (18) | 0.117 (16) | 0.212 (18) | 0.145 (19) | 0.364 (16) | 0.346 (20) | 0.232 (20) | 0.548 (17) | 0.523 (19) | 0.252 (19)
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.
SSC-UNet | permissive | 0.308 (19) | 0.353 (18) | 0.290 (20) | 0.278 (20) | 0.166 (20) | 0.553 (18) | 0.169 (20) | 0.286 (16) | 0.147 (20) | 0.148 (20) | 0.908 (13) | 0.182 (19) | 0.064 (19) | 0.023 (19) | 0.018 (20) | 0.354 (17) | 0.363 (18) | 0.345 (18) | 0.546 (19) | 0.685 (11) | 0.278 (17)
ScanNet | permissive | 0.306 (20) | 0.203 (20) | 0.366 (19) | 0.501 (13) | 0.311 (17) | 0.524 (19) | 0.211 (19) | 0.002 (20) | 0.342 (15) | 0.189 (19) | 0.786 (19) | 0.145 (20) | 0.102 (17) | 0.245 (14) | 0.152 (18) | 0.318 (19) | 0.348 (19) | 0.300 (19) | 0.460 (20) | 0.437 (20) | 0.182 (20)
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17

This table lists the benchmark results for the 3D semantic instance scenario.




Rows are sorted by the otherfurniture column; each score is followed by the method's rank in that column.
Method | Info | avg ap 50% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window
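PanopticFusion-inst | | 0.478 (1) | 0.667 (3) | 0.712 (3) | 0.595 (1) | 0.259 (4) | 0.550 (5) | 0.000 (10) | 0.613 (1) | 0.175 (3) | 0.250 (5) | 0.434 (1) | 0.437 (1) | 0.411 (3) | 0.857 (1) | 0.485 (1) | 0.591 (5) | 0.267 (7) | 0.944 (2) | 0.359 (1)
MASC | permissive | 0.447 (3) | 0.528 (7) | 0.555 (6) | 0.381 (2) | 0.382 (1) | 0.633 (1) | 0.002 (8) | 0.509 (2) | 0.260 (1) | 0.361 (2) | 0.432 (2) | 0.327 (2) | 0.451 (1) | 0.571 (4) | 0.367 (3) | 0.639 (2) | 0.386 (2) | 0.980 (1) | 0.276 (3)
Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
ResNet-backbone | | 0.459 (2) | 1.000 (1) | 0.737 (1) | 0.159 (9) | 0.259 (3) | 0.587 (3) | 0.138 (1) | 0.475 (3) | 0.217 (2) | 0.416 (1) | 0.408 (3) | 0.128 (4) | 0.315 (4) | 0.714 (3) | 0.411 (2) | 0.536 (6) | 0.590 (1) | 0.873 (5) | 0.304 (2)
R-PointNet | | 0.306 (6) | 0.500 (8) | 0.405 (8) | 0.311 (4) | 0.348 (2) | 0.589 (2) | 0.054 (2) | 0.068 (9) | 0.126 (4) | 0.283 (4) | 0.290 (4) | 0.028 (7) | 0.219 (5) | 0.214 (8) | 0.331 (4) | 0.396 (8) | 0.275 (5) | 0.821 (7) | 0.245 (4)
3D-SIS | | 0.382 (4) | 1.000 (1) | 0.432 (7) | 0.245 (6) | 0.190 (5) | 0.577 (4) | 0.013 (6) | 0.263 (5) | 0.033 (8) | 0.320 (3) | 0.240 (5) | 0.075 (6) | 0.422 (2) | 0.857 (1) | 0.117 (7) | 0.699 (1) | 0.271 (6) | 0.883 (4) | 0.235 (5)
UNet-backbone | | 0.319 (5) | 0.667 (3) | 0.715 (2) | 0.233 (7) | 0.189 (6) | 0.479 (6) | 0.008 (7) | 0.218 (6) | 0.067 (7) | 0.201 (6) | 0.173 (6) | 0.107 (5) | 0.123 (7) | 0.438 (5) | 0.150 (5) | 0.615 (3) | 0.355 (3) | 0.916 (3) | 0.093 (9)
Seg-Cluster | permissive | 0.215 (8) | 0.370 (9) | 0.337 (10) | 0.285 (5) | 0.105 (7) | 0.325 (9) | 0.025 (5) | 0.282 (4) | 0.085 (6) | 0.105 (7) | 0.107 (7) | 0.007 (10) | 0.079 (9) | 0.317 (7) | 0.114 (8) | 0.309 (10) | 0.304 (4) | 0.587 (8) | 0.123 (8)
Sgpn_scannet | | 0.143 (10) | 0.208 (11) | 0.390 (9) | 0.169 (8) | 0.065 (8) | 0.275 (10) | 0.029 (3) | 0.069 (8) | 0.000 (10) | 0.087 (10) | 0.043 (8) | 0.014 (9) | 0.027 (11) | 0.000 (10) | 0.112 (9) | 0.351 (9) | 0.168 (9) | 0.438 (10) | 0.138 (7)
3D-BEVIS | | 0.248 (7) | 0.667 (3) | 0.566 (5) | 0.076 (10) | 0.035 (10) | 0.394 (7) | 0.027 (4) | 0.035 (10) | 0.098 (5) | 0.099 (9) | 0.030 (9) | 0.025 (8) | 0.098 (8) | 0.375 (6) | 0.126 (6) | 0.604 (4) | 0.181 (8) | 0.854 (6) | 0.171 (6)
MTML | | 0.212 (9) | 0.667 (3) | 0.614 (4) | 0.337 (3) | 0.027 (11) | 0.390 (8) | 0.000 (10) | 0.118 (7) | 0.001 (9) | 0.100 (8) | 0.028 (10) | 0.000 (11) | 0.167 (6) | 0.143 (9) | 0.046 (10) | 0.500 (7) | 0.105 (10) | 0.570 (9) | 0.003 (11)
MaskRCNN 2d->3d Proj | | 0.058 (11) | 0.333 (10) | 0.002 (11) | 0.000 (11) | 0.053 (9) | 0.002 (11) | 0.002 (9) | 0.021 (11) | 0.000 (10) | 0.045 (11) | 0.024 (11) | 0.238 (3) | 0.065 (10) | 0.000 (10) | 0.014 (11) | 0.107 (11) | 0.020 (11) | 0.110 (11) | 0.006 (10)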

This table lists the benchmark results for the 2D semantic label scenario.


Rows are sorted by the otherfurniture column; each score is followed by the method's rank in that column.
Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
SSMA | copyleft | 0.577 (1) | 0.695 (1) | 0.716 (2) | 0.439 (4) | 0.563 (1) | 0.314 (3) | 0.444 (1) | 0.719 (1) | 0.551 (1) | 0.503 (1) | 0.887 (3) | 0.346 (1) | 0.348 (2) | 0.603 (1) | 0.353 (3) | 0.709 (1) | 0.600 (2) | 0.457 (1) | 0.901 (1) | 0.786 (1) | 0.599 (1)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
FuseNet | permissive | 0.521 (2) | 0.591 (3) | 0.682 (3) | 0.220 (7) | 0.488 (3) | 0.279 (4) | 0.344 (5) | 0.610 (3) | 0.461 (3) | 0.475 (2) | 0.910 (1) | 0.293 (2) | 0.447 (1) | 0.512 (3) | 0.397 (2) | 0.618 (2) | 0.567 (4) | 0.452 (2) | 0.734 (5) | 0.782 (2) | 0.566 (2)
Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
ILC-PSPNet | | 0.475 (5) | 0.490 (4) | 0.581 (5) | 0.289 (6) | 0.507 (2) | 0.067 (7) | 0.379 (3) | 0.610 (3) | 0.417 (6) | 0.435 (3) | 0.822 (6) | 0.278 (3) | 0.267 (3) | 0.503 (4) | 0.228 (4) | 0.616 (4) | 0.533 (5) | 0.375 (5) | 0.820 (2) | 0.729 (3) | 0.560 (3)
3DMV (2d proj) | | 0.498 (4) | 0.481 (5) | 0.612 (4) | 0.579 (2) | 0.456 (4) | 0.343 (1) | 0.384 (2) | 0.623 (2) | 0.525 (2) | 0.381 (4) | 0.845 (4) | 0.254 (4) | 0.264 (4) | 0.557 (2) | 0.182 (5) | 0.581 (5) | 0.598 (3) | 0.429 (4) | 0.760 (4) | 0.661 (6) | 0.446 (5)
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
AdapNet++ | copyleft | 0.503 (3) | 0.613 (2) | 0.722 (1) | 0.418 (5) | 0.358 (7) | 0.337 (2) | 0.370 (4) | 0.479 (5) | 0.443 (4) | 0.368 (5) | 0.907 (2) | 0.207 (5) | 0.213 (6) | 0.464 (5) | 0.525 (1) | 0.618 (2) | 0.657 (1) | 0.450 (3) | 0.788 (3) | 0.721 (4) | 0.408 (6)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
Enet (reimpl) | | 0.376 (6) | 0.264 (7) | 0.452 (7) | 0.452 (3) | 0.365 (5) | 0.181 (5) | 0.143 (7) | 0.456 (6) | 0.409 (7) | 0.346 (6) | 0.769 (7) | 0.164 (6) | 0.218 (5) | 0.359 (6) | 0.123 (7) | 0.403 (7) | 0.381 (7) | 0.313 (7) | 0.571 (6) | 0.685 (5) | 0.472 (4)
Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
ScanNet (2d proj) | permissive | 0.330 (7) | 0.293 (6) | 0.521 (6) | 0.657 (1) | 0.361 (6) | 0.161 (6) | 0.250 (6) | 0.004 (7) | 0.440 (5) | 0.183 (7) | 0.836 (5) | 0.125 (7) | 0.060 (7) | 0.319 (7) | 0.132 (6) | 0.417 (6) | 0.412 (6) | 0.344 (6) | 0.541 (7) | 0.427 (7) | 0.109 (7)
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17

This table lists the benchmark results for the 2D semantic instance scenario.




Rows are sorted by the otherfurniture column; each score is followed by the method's rank in that column.
Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window
MaskRCNN_ScanNet | permissive | 0.119 (1) | 0.129 (1) | 0.212 (1) | 0.002 (1) | 0.112 (1) | 0.148 (1) | 0.014 (1) | 0.205 (1) | 0.044 (1) | 0.066 (1) | 0.078 (1) | 0.095 (1) | 0.142 (1) | 0.030 (1) | 0.128 (1) | 0.139 (1) | 0.080 (1) | 0.459 (1) | 0.057 (1)
Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17

This table lists the benchmark results for the scene type classification scenario.




Rows are sorted by the avg recall column; each score is followed by the method's rank in that column.
Method | Info | avg recall | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage
SE-ResNeXt-SSMA | | 0.498 (1) | 0.000 (2) | 0.812 (1) | 0.941 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.429 (2) | 0.500 (1) | 0.667 (1) | 0.500 (1) | 0.625 (1) | 0.000 (1)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
resnet50_scannet | | 0.353 (2) | 0.250 (1) | 0.812 (1) | 0.529 (2) | 0.500 (1) | 0.500 (1) | 0.000 (2) | 0.500 (1) | 0.571 (1) | 0.000 (2) | 0.556 (2) | 0.000 (2) | 0.375 (2) | 0.000 (1)