This table lists the benchmark results for the 3D semantic label scenario.


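Rows are sorted by the desk column. Each cell gives the score followed by the method's rank in that column in parentheses.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MVPNet | | 0.641 (4) | 0.831 (3) | 0.715 (5) | 0.671 (6) | 0.590 (6) | 0.781 (5) | 0.394 (8) | 0.679 (9) | 0.642 (1) | 0.553 (5) | 0.937 (10) | 0.462 (4) | 0.256 (6) | 0.649 (3) | 0.406 (13) | 0.626 (5) | 0.691 (5) | 0.666 (1) | 0.877 (5) | 0.792 (6) | 0.608 (5) |
| MinkowskiNet | permissive | 0.734 (1) | 0.858 (1) | 0.833 (1) | 0.834 (2) | 0.716 (2) | 0.855 (2) | 0.459 (3) | 0.836 (1) | 0.639 (2) | 0.641 (1) | 0.953 (2) | 0.541 (2) | 0.302 (2) | 0.743 (1) | 0.865 (2) | 0.726 (1) | 0.771 (3) | 0.664 (2) | 0.891 (3) | 0.851 (2) | 0.694 (1) |
| KP-FCNN | | 0.684 (3) | 0.847 (2) | 0.758 (4) | 0.784 (3) | 0.647 (3) | 0.814 (4) | 0.473 (2) | 0.772 (3) | 0.605 (3) | 0.594 (3) | 0.935 (11) | 0.450 (5) | 0.181 (15) | 0.587 (4) | 0.805 (3) | 0.690 (3) | 0.785 (2) | 0.614 (4) | 0.882 (4) | 0.819 (3) | 0.632 (3) |
| SparseConvNet | | 0.725 (2) | 0.647 (11) | 0.821 (2) | 0.846 (1) | 0.721 (1) | 0.869 (1) | 0.533 (1) | 0.754 (4) | 0.603 (4) | 0.614 (2) | 0.955 (1) | 0.572 (1) | 0.325 (1) | 0.710 (2) | 0.870 (1) | 0.724 (2) | 0.823 (1) | 0.628 (3) | 0.934 (1) | 0.865 (1) | 0.683 (2) |
| LAP-D | | 0.594 (7) | 0.720 (6) | 0.692 (8) | 0.637 (11) | 0.456 (12) | 0.773 (6) | 0.391 (9) | 0.730 (5) | 0.587 (5) | 0.445 (10) | 0.940 (8) | 0.381 (8) | 0.288 (4) | 0.434 (12) | 0.453 (11) | 0.591 (7) | 0.649 (7) | 0.581 (5) | 0.777 (12) | 0.749 (11) | 0.610 (4) |
| DPC | | 0.592 (8) | 0.720 (6) | 0.700 (6) | 0.602 (14) | 0.480 (9) | 0.762 (8) | 0.380 (11) | 0.713 (6) | 0.585 (6) | 0.437 (11) | 0.940 (8) | 0.369 (9) | 0.288 (4) | 0.434 (12) | 0.509 (9) | 0.590 (8) | 0.639 (11) | 0.567 (7) | 0.772 (13) | 0.755 (9) | 0.592 (9) |
| HPEIN | | 0.618 (6) | 0.729 (5) | 0.668 (10) | 0.647 (9) | 0.597 (5) | 0.766 (7) | 0.414 (6) | 0.680 (8) | 0.520 (7) | 0.525 (6) | 0.946 (5) | 0.432 (6) | 0.215 (11) | 0.493 (9) | 0.599 (6) | 0.638 (4) | 0.617 (13) | 0.570 (6) | 0.897 (2) | 0.806 (5) | 0.605 (6) |
| DVVNet | | 0.562 (11) | 0.648 (10) | 0.700 (6) | 0.770 (4) | 0.586 (7) | 0.687 (15) | 0.333 (13) | 0.650 (11) | 0.514 (8) | 0.475 (8) | 0.906 (20) | 0.359 (10) | 0.223 (10) | 0.340 (16) | 0.442 (12) | 0.422 (19) | 0.668 (6) | 0.501 (11) | 0.708 (17) | 0.779 (7) | 0.534 (13) |
| CCRFNet | | 0.589 (9) | 0.766 (4) | 0.659 (12) | 0.683 (5) | 0.470 (11) | 0.740 (9) | 0.387 (10) | 0.620 (12) | 0.490 (9) | 0.476 (7) | 0.922 (15) | 0.355 (12) | 0.245 (7) | 0.511 (7) | 0.511 (8) | 0.571 (10) | 0.643 (9) | 0.493 (12) | 0.872 (6) | 0.762 (8) | 0.600 (7) |
| FCPN | permissive | 0.447 (18) | 0.679 (8) | 0.604 (18) | 0.578 (16) | 0.380 (16) | 0.682 (16) | 0.291 (17) | 0.106 (25) | 0.483 (10) | 0.258 (23) | 0.920 (16) | 0.258 (20) | 0.025 (25) | 0.231 (22) | 0.325 (15) | 0.480 (16) | 0.560 (16) | 0.463 (15) | 0.725 (16) | 0.666 (18) | 0.231 (25) |
| 3DMV, FTSDF | | 0.501 (14) | 0.558 (17) | 0.608 (17) | 0.424 (23) | 0.478 (10) | 0.690 (14) | 0.246 (20) | 0.586 (13) | 0.468 (11) | 0.450 (9) | 0.911 (18) | 0.394 (7) | 0.160 (18) | 0.438 (11) | 0.212 (20) | 0.432 (18) | 0.541 (18) | 0.475 (13) | 0.742 (15) | 0.727 (12) | 0.477 (15) |
| joint point-based | | 0.634 (5) | 0.614 (13) | 0.778 (3) | 0.667 (8) | 0.633 (4) | 0.825 (3) | 0.420 (5) | 0.804 (2) | 0.467 (12) | 0.561 (4) | 0.951 (3) | 0.494 (3) | 0.291 (3) | 0.566 (5) | 0.458 (10) | 0.579 (9) | 0.764 (4) | 0.559 (8) | 0.838 (8) | 0.814 (4) | 0.598 (8) |
| PointConv_withoutRGB | | 0.540 (12) | 0.623 (12) | 0.535 (21) | 0.543 (18) | 0.321 (21) | 0.735 (10) | 0.409 (7) | 0.533 (15) | 0.453 (13) | 0.381 (13) | 0.949 (4) | 0.312 (15) | 0.174 (16) | 0.482 (10) | 0.627 (5) | 0.616 (6) | 0.640 (10) | 0.517 (9) | 0.849 (7) | 0.655 (19) | 0.440 (18) |
| PCNN | | 0.498 (15) | 0.559 (16) | 0.644 (14) | 0.560 (17) | 0.420 (14) | 0.711 (13) | 0.229 (22) | 0.414 (16) | 0.436 (14) | 0.352 (15) | 0.941 (7) | 0.324 (14) | 0.155 (19) | 0.238 (20) | 0.387 (14) | 0.493 (13) | 0.529 (19) | 0.509 (10) | 0.813 (9) | 0.751 (10) | 0.504 (14) |
| PanopticFusion-label | | 0.529 (13) | 0.491 (20) | 0.688 (9) | 0.604 (13) | 0.386 (15) | 0.632 (20) | 0.225 (24) | 0.705 (7) | 0.434 (15) | 0.293 (18) | 0.815 (23) | 0.348 (13) | 0.241 (8) | 0.499 (8) | 0.669 (4) | 0.507 (12) | 0.649 (7) | 0.442 (18) | 0.796 (10) | 0.602 (22) | 0.561 (11) |
| 3DMV | | 0.484 (16) | 0.484 (21) | 0.538 (20) | 0.643 (10) | 0.424 (13) | 0.606 (23) | 0.310 (14) | 0.574 (14) | 0.433 (16) | 0.378 (14) | 0.796 (24) | 0.301 (16) | 0.214 (12) | 0.537 (6) | 0.208 (21) | 0.472 (17) | 0.507 (22) | 0.413 (21) | 0.693 (18) | 0.602 (22) | 0.539 (12) |
| TextureNet | permissive | 0.566 (10) | 0.672 (9) | 0.664 (11) | 0.671 (6) | 0.494 (8) | 0.719 (11) | 0.445 (4) | 0.678 (10) | 0.411 (17) | 0.396 (12) | 0.935 (11) | 0.356 (11) | 0.225 (9) | 0.412 (14) | 0.535 (7) | 0.565 (11) | 0.636 (12) | 0.464 (14) | 0.794 (11) | 0.680 (16) | 0.568 (10) |
| ScanNet+FTSDF | | 0.383 (23) | 0.297 (25) | 0.491 (23) | 0.432 (22) | 0.358 (19) | 0.612 (22) | 0.274 (18) | 0.116 (24) | 0.411 (17) | 0.265 (21) | 0.904 (21) | 0.229 (22) | 0.079 (23) | 0.250 (18) | 0.185 (23) | 0.320 (24) | 0.510 (20) | 0.385 (22) | 0.548 (23) | 0.597 (24) | 0.394 (20) |
| PNET2 | | 0.442 (19) | 0.548 (18) | 0.548 (19) | 0.597 (15) | 0.363 (18) | 0.628 (21) | 0.300 (15) | 0.292 (20) | 0.374 (19) | 0.307 (17) | 0.881 (22) | 0.268 (19) | 0.186 (14) | 0.238 (20) | 0.204 (22) | 0.407 (20) | 0.506 (23) | 0.449 (17) | 0.667 (19) | 0.620 (21) | 0.462 (17) |
| SurfaceConvPF | | 0.442 (19) | 0.505 (19) | 0.622 (15) | 0.380 (24) | 0.342 (20) | 0.654 (18) | 0.227 (23) | 0.397 (18) | 0.367 (20) | 0.276 (20) | 0.924 (14) | 0.240 (21) | 0.198 (13) | 0.359 (15) | 0.262 (17) | 0.366 (21) | 0.581 (14) | 0.435 (19) | 0.640 (20) | 0.668 (17) | 0.398 (19) |
| ScanNet | permissive | 0.306 (26) | 0.203 (26) | 0.366 (25) | 0.501 (19) | 0.311 (23) | 0.524 (25) | 0.211 (25) | 0.002 (27) | 0.342 (21) | 0.189 (25) | 0.786 (25) | 0.145 (26) | 0.102 (22) | 0.245 (19) | 0.152 (24) | 0.318 (25) | 0.348 (25) | 0.300 (25) | 0.460 (26) | 0.437 (26) | 0.182 (26) |
| PointCNN with RGB | permissive | 0.458 (17) | 0.577 (15) | 0.611 (16) | 0.356 (25) | 0.321 (21) | 0.715 (12) | 0.299 (16) | 0.376 (19) | 0.328 (22) | 0.319 (16) | 0.944 (6) | 0.285 (18) | 0.164 (17) | 0.216 (23) | 0.229 (19) | 0.484 (15) | 0.545 (17) | 0.456 (16) | 0.755 (14) | 0.709 (13) | 0.475 (16) |
| SPLAT Net | copyleft | 0.393 (22) | 0.472 (22) | 0.511 (22) | 0.606 (12) | 0.311 (23) | 0.656 (17) | 0.245 (21) | 0.405 (17) | 0.328 (22) | 0.197 (24) | 0.927 (13) | 0.227 (23) | 0.000 (27) | 0.001 (27) | 0.249 (18) | 0.271 (26) | 0.510 (20) | 0.383 (23) | 0.593 (22) | 0.699 (14) | 0.267 (23) |
| Tangent Convolutions | permissive | 0.438 (21) | 0.437 (23) | 0.646 (13) | 0.474 (20) | 0.369 (17) | 0.645 (19) | 0.353 (12) | 0.258 (22) | 0.282 (24) | 0.279 (19) | 0.918 (17) | 0.298 (17) | 0.147 (20) | 0.283 (17) | 0.294 (16) | 0.487 (14) | 0.562 (15) | 0.427 (20) | 0.619 (21) | 0.633 (20) | 0.352 (21) |
| PointNet++ | permissive | 0.339 (24) | 0.584 (14) | 0.478 (24) | 0.458 (21) | 0.256 (25) | 0.360 (26) | 0.250 (19) | 0.247 (23) | 0.278 (25) | 0.261 (22) | 0.677 (26) | 0.183 (24) | 0.117 (21) | 0.212 (24) | 0.145 (25) | 0.364 (22) | 0.346 (26) | 0.232 (26) | 0.548 (23) | 0.523 (25) | 0.252 (24) |
| SSC-UNet | permissive | 0.308 (25) | 0.353 (24) | 0.290 (26) | 0.278 (26) | 0.166 (26) | 0.553 (24) | 0.169 (26) | 0.286 (21) | 0.147 (26) | 0.148 (26) | 0.908 (19) | 0.182 (25) | 0.064 (24) | 0.023 (26) | 0.018 (27) | 0.354 (23) | 0.363 (24) | 0.345 (24) | 0.546 (25) | 0.685 (15) | 0.278 (22) |
| ERROR | | 0.054 (27) | 0.000 (27) | 0.041 (27) | 0.172 (27) | 0.030 (27) | 0.062 (27) | 0.001 (27) | 0.035 (26) | 0.004 (27) | 0.051 (27) | 0.143 (27) | 0.019 (27) | 0.003 (26) | 0.041 (25) | 0.050 (26) | 0.003 (27) | 0.054 (27) | 0.018 (27) | 0.005 (27) | 0.264 (27) | 0.082 (27) |

Method references:
- MinkowskiNet: C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
- FCPN: Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, Federico Tombari: Fully-Convolutional Point Networks for Large-Scale Point Clouds. ECCV 2018
- PanopticFusion-label: Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
- 3DMV: Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- TextureNet: Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas Guibas: TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes. CVPR
- SurfaceConvPF: Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
- ScanNet: Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17
- PointCNN with RGB: Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
- SPLAT Net: Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
- Tangent Convolutions: Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
- PointNet++: Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.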
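
The avg iou column appears to be consistent with an unweighted mean of the 20 per-class IoU scores. The following is a minimal Python sketch that checks this for the MinkowskiNet row above; the function name `mean_iou` and the variable `minkowski_iou` are illustrative assumptions, not part of any benchmark tooling.

```python
def mean_iou(per_class_iou):
    """Return the unweighted mean of a list of per-class IoU scores."""
    if not per_class_iou:
        raise ValueError("need at least one class score")
    return sum(per_class_iou) / len(per_class_iou)


# Per-class IoU for MinkowskiNet, in column order from bathtub to window
# (20 classes), copied from the 3D semantic label table.
minkowski_iou = [
    0.858, 0.833, 0.834, 0.716, 0.855, 0.459, 0.836, 0.639, 0.641, 0.953,
    0.541, 0.302, 0.743, 0.865, 0.726, 0.771, 0.664, 0.891, 0.851, 0.694,
]

# The table reports avg iou = 0.734 for this method.
print(f"{mean_iou(minkowski_iou):.3f}")
```

The same check can be repeated for any other row by substituting its per-class scores; small differences in the last digit would be expected from rounding of the published values.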

This table lists the benchmark results for the 3D semantic instance scenario.




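Rows are sorted by the desk column. Each cell gives the score followed by the method's rank in that column in parentheses.

| Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MASC | permissive | 0.254 (2) | 0.463 (3) | 0.249 (6) | 0.113 (3) | 0.167 (1) | 0.412 (3) | 0.000 (10) | 0.374 (1) | 0.073 (1) | 0.173 (2) | 0.243 (1) | 0.130 (2) | 0.228 (1) | 0.368 (3) | 0.160 (1) | 0.356 (3) | 0.208 (3) | 0.711 (2) | 0.136 (4) |
| ResNet-backbone | | 0.262 (1) | 0.667 (1) | 0.335 (1) | 0.067 (8) | 0.123 (3) | 0.427 (2) | 0.022 (1) | 0.280 (3) | 0.058 (2) | 0.216 (1) | 0.211 (2) | 0.039 (6) | 0.142 (4) | 0.519 (2) | 0.106 (4) | 0.338 (5) | 0.310 (1) | 0.721 (1) | 0.138 (3) |
| MTML | | 0.247 (3) | 0.435 (4) | 0.316 (3) | 0.102 (5) | 0.094 (5) | 0.498 (1) | 0.000 (9) | 0.254 (4) | 0.047 (3) | 0.154 (3) | 0.122 (5) | 0.039 (5) | 0.196 (2) | 0.655 (1) | 0.056 (5) | 0.414 (1) | 0.259 (2) | 0.629 (4) | 0.170 (1) |
| R-PointNet | | 0.158 (7) | 0.356 (6) | 0.173 (7) | 0.113 (4) | 0.140 (2) | 0.359 (4) | 0.012 (2) | 0.023 (8) | 0.039 (4) | 0.134 (4) | 0.123 (4) | 0.008 (8) | 0.089 (6) | 0.149 (7) | 0.117 (3) | 0.221 (7) | 0.128 (6) | 0.563 (6) | 0.094 (5) |
| 3D-BEVIS | | 0.117 (8) | 0.250 (7) | 0.308 (4) | 0.020 (10) | 0.009 (11) | 0.269 (7) | 0.006 (4) | 0.008 (9) | 0.029 (5) | 0.037 (8) | 0.014 (10) | 0.003 (10) | 0.036 (8) | 0.147 (8) | 0.042 (7) | 0.381 (2) | 0.118 (7) | 0.362 (8) | 0.069 (7) |
| Seg-Cluster | permissive | 0.098 (9) | 0.141 (10) | 0.148 (9) | 0.136 (2) | 0.037 (8) | 0.197 (9) | 0.003 (5) | 0.140 (5) | 0.027 (6) | 0.032 (9) | 0.046 (8) | 0.001 (11) | 0.020 (9) | 0.101 (9) | 0.039 (8) | 0.166 (9) | 0.180 (5) | 0.292 (9) | 0.051 (9) |
| PanopticFusion-inst | | 0.214 (4) | 0.250 (7) | 0.330 (2) | 0.275 (1) | 0.103 (4) | 0.228 (8) | 0.000 (11) | 0.345 (2) | 0.024 (7) | 0.088 (6) | 0.203 (3) | 0.186 (1) | 0.167 (3) | 0.367 (4) | 0.125 (2) | 0.221 (8) | 0.112 (9) | 0.666 (3) | 0.162 (2) |
| UNet-backbone | | 0.161 (5) | 0.519 (2) | 0.259 (5) | 0.084 (6) | 0.059 (6) | 0.325 (6) | 0.002 (6) | 0.093 (7) | 0.009 (8) | 0.077 (7) | 0.064 (7) | 0.045 (4) | 0.044 (7) | 0.161 (6) | 0.045 (6) | 0.331 (6) | 0.180 (4) | 0.566 (5) | 0.033 (10) |
| 3D-SIS | permissive | 0.161 (5) | 0.407 (5) | 0.155 (8) | 0.068 (7) | 0.043 (7) | 0.346 (5) | 0.001 (7) | 0.134 (6) | 0.005 (9) | 0.088 (5) | 0.106 (6) | 0.037 (7) | 0.135 (5) | 0.321 (5) | 0.028 (9) | 0.339 (4) | 0.116 (8) | 0.466 (7) | 0.093 (6) |
| Sgpn_scannet | | 0.049 (10) | 0.023 (11) | 0.134 (10) | 0.031 (9) | 0.013 (10) | 0.144 (10) | 0.006 (3) | 0.008 (10) | 0.000 (10) | 0.028 (10) | 0.017 (9) | 0.003 (9) | 0.009 (11) | 0.000 (10) | 0.021 (10) | 0.122 (10) | 0.095 (10) | 0.175 (10) | 0.054 (8) |
| MaskRCNN 2d->3d Proj | | 0.022 (11) | 0.185 (9) | 0.000 (11) | 0.000 (11) | 0.015 (9) | 0.000 (11) | 0.000 (8) | 0.006 (11) | 0.000 (10) | 0.010 (11) | 0.006 (11) | 0.107 (3) | 0.012 (10) | 0.000 (10) | 0.002 (11) | 0.027 (11) | 0.004 (11) | 0.022 (11) | 0.001 (11) |

Method references:
- MASC: Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
- 3D-BEVIS: Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation.
- PanopticFusion-inst: Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
- 3D-SIS: Ji Hou, Angela Dai, Matthias Niessner: 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. CVPR 2019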

This table lists the benchmark results for the 2D semantic label scenario.


Rows are sorted by the desk column. Each cell gives the score followed by the method's rank in that column in parentheses.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSMA | copyleft | 0.577 (1) | 0.695 (1) | 0.716 (2) | 0.439 (4) | 0.563 (1) | 0.314 (3) | 0.444 (1) | 0.719 (1) | 0.551 (1) | 0.503 (1) | 0.887 (3) | 0.346 (1) | 0.348 (2) | 0.603 (1) | 0.353 (3) | 0.709 (1) | 0.600 (3) | 0.457 (2) | 0.901 (1) | 0.786 (1) | 0.599 (1) |
| 3DMV (2d proj) | | 0.498 (4) | 0.481 (5) | 0.612 (4) | 0.579 (2) | 0.456 (4) | 0.343 (1) | 0.384 (3) | 0.623 (3) | 0.525 (2) | 0.381 (4) | 0.845 (4) | 0.254 (4) | 0.264 (4) | 0.557 (2) | 0.182 (5) | 0.581 (5) | 0.598 (4) | 0.429 (4) | 0.760 (5) | 0.661 (6) | 0.446 (5) |
| FuseNet | permissive | 0.535 (2) | 0.570 (3) | 0.681 (3) | 0.182 (7) | 0.512 (2) | 0.290 (4) | 0.431 (2) | 0.659 (2) | 0.504 (3) | 0.495 (2) | 0.903 (2) | 0.308 (2) | 0.428 (1) | 0.523 (3) | 0.365 (2) | 0.676 (2) | 0.621 (2) | 0.470 (1) | 0.762 (4) | 0.779 (2) | 0.541 (3) |
| AdapNet++ | copyleft | 0.503 (3) | 0.613 (2) | 0.722 (1) | 0.418 (5) | 0.358 (7) | 0.337 (2) | 0.370 (5) | 0.479 (5) | 0.443 (4) | 0.368 (5) | 0.907 (1) | 0.207 (5) | 0.213 (6) | 0.464 (5) | 0.525 (1) | 0.618 (3) | 0.657 (1) | 0.450 (3) | 0.788 (3) | 0.721 (4) | 0.408 (6) |
| ScanNet (2d proj) | permissive | 0.330 (7) | 0.293 (6) | 0.521 (6) | 0.657 (1) | 0.361 (6) | 0.161 (6) | 0.250 (6) | 0.004 (7) | 0.440 (5) | 0.183 (7) | 0.836 (5) | 0.125 (7) | 0.060 (7) | 0.319 (7) | 0.132 (6) | 0.417 (6) | 0.412 (6) | 0.344 (6) | 0.541 (7) | 0.427 (7) | 0.109 (7) |
| ILC-PSPNet | | 0.475 (5) | 0.490 (4) | 0.581 (5) | 0.289 (6) | 0.507 (3) | 0.067 (7) | 0.379 (4) | 0.610 (4) | 0.417 (6) | 0.435 (3) | 0.822 (6) | 0.278 (3) | 0.267 (3) | 0.503 (4) | 0.228 (4) | 0.616 (4) | 0.533 (5) | 0.375 (5) | 0.820 (2) | 0.729 (3) | 0.560 (2) |
| Enet (reimpl) | | 0.376 (6) | 0.264 (7) | 0.452 (7) | 0.452 (3) | 0.365 (5) | 0.181 (5) | 0.143 (7) | 0.456 (6) | 0.409 (7) | 0.346 (6) | 0.769 (7) | 0.164 (6) | 0.218 (5) | 0.359 (6) | 0.123 (7) | 0.403 (7) | 0.381 (7) | 0.313 (7) | 0.571 (6) | 0.685 (5) | 0.472 (4) |

Method references:
- SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- 3DMV (2d proj): Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
- AdapNet++: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17
- Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.

This table lists the benchmark results for the 2D semantic instance scenario.




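Rows are sorted by the desk column. Each cell gives the score followed by the method's rank in that column in parentheses.

| Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MaskRCNN_ScanNet | permissive | 0.119 (1) | 0.129 (1) | 0.212 (1) | 0.002 (1) | 0.112 (1) | 0.148 (1) | 0.014 (1) | 0.205 (1) | 0.044 (1) | 0.066 (1) | 0.078 (1) | 0.095 (1) | 0.142 (1) | 0.030 (1) | 0.128 (1) | 0.139 (1) | 0.080 (1) | 0.459 (1) | 0.057 (1) |

Method references:
- MaskRCNN_ScanNet: Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17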

This table lists the benchmark results for the scene type classification scenario.




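Rows are sorted by the kitchen column. Each cell gives the score followed by the method's rank in that column in parentheses.

| Method | Info | avg recall | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| resnet50_scannet | | 0.353 (2) | 0.250 (1) | 0.812 (1) | 0.529 (2) | 0.500 (1) | 0.500 (1) | 0.000 (2) | 0.500 (1) | 0.571 (1) | 0.000 (2) | 0.556 (2) | 0.000 (2) | 0.375 (2) | 0.000 (1) |
| SE-ResNeXt-SSMA | | 0.498 (1) | 0.000 (2) | 0.812 (1) | 0.941 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.429 (2) | 0.500 (1) | 0.667 (1) | 0.500 (1) | 0.625 (1) | 0.000 (1) |

Method references:
- SE-ResNeXt-SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv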