This table lists the benchmark results for the 3D semantic label scenario.

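| Method | Info | avg IoU | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MVPNet | | 0.641 (4) | 0.831 (3) | 0.715 (5) | 0.671 (6) | 0.590 (6) | 0.781 (5) | 0.394 (8) | 0.679 (9) | 0.642 (1) | 0.553 (5) | 0.937 (10) | 0.462 (4) | 0.256 (6) | 0.649 (3) | 0.406 (13) | 0.626 (5) | 0.691 (5) | 0.666 (1) | 0.877 (5) | 0.792 (6) | 0.608 (5) |
| MinkowskiNet | permissive | 0.734 (1) | 0.858 (1) | 0.833 (1) | 0.834 (2) | 0.716 (2) | 0.855 (2) | 0.459 (3) | 0.836 (1) | 0.639 (2) | 0.641 (1) | 0.953 (2) | 0.541 (2) | 0.302 (2) | 0.743 (1) | 0.865 (2) | 0.726 (1) | 0.771 (3) | 0.664 (2) | 0.891 (3) | 0.851 (2) | 0.694 (1) |
| KP-FCNN | | 0.684 (3) | 0.847 (2) | 0.758 (4) | 0.784 (3) | 0.647 (3) | 0.814 (4) | 0.473 (2) | 0.772 (3) | 0.605 (3) | 0.594 (3) | 0.935 (11) | 0.450 (5) | 0.181 (15) | 0.587 (4) | 0.805 (3) | 0.690 (3) | 0.785 (2) | 0.614 (4) | 0.882 (4) | 0.819 (3) | 0.632 (3) |
| SparseConvNet | | 0.725 (2) | 0.647 (11) | 0.821 (2) | 0.846 (1) | 0.721 (1) | 0.869 (1) | 0.533 (1) | 0.754 (4) | 0.603 (4) | 0.614 (2) | 0.955 (1) | 0.572 (1) | 0.325 (1) | 0.710 (2) | 0.870 (1) | 0.724 (2) | 0.823 (1) | 0.628 (3) | 0.934 (1) | 0.865 (1) | 0.683 (2) |
| LAP-D | | 0.594 (7) | 0.720 (6) | 0.692 (8) | 0.637 (11) | 0.456 (12) | 0.773 (6) | 0.391 (9) | 0.730 (5) | 0.587 (5) | 0.445 (10) | 0.940 (8) | 0.381 (8) | 0.288 (4) | 0.434 (12) | 0.453 (11) | 0.591 (7) | 0.649 (7) | 0.581 (5) | 0.777 (12) | 0.749 (11) | 0.610 (4) |
| DPC | | 0.592 (8) | 0.720 (6) | 0.700 (6) | 0.602 (14) | 0.480 (9) | 0.762 (8) | 0.380 (11) | 0.713 (6) | 0.585 (6) | 0.437 (11) | 0.940 (8) | 0.369 (9) | 0.288 (4) | 0.434 (12) | 0.509 (9) | 0.590 (8) | 0.639 (11) | 0.567 (7) | 0.772 (13) | 0.755 (9) | 0.592 (9) |
| HPEIN | | 0.618 (6) | 0.729 (5) | 0.668 (10) | 0.647 (9) | 0.597 (5) | 0.766 (7) | 0.414 (6) | 0.680 (8) | 0.520 (7) | 0.525 (6) | 0.946 (5) | 0.432 (6) | 0.215 (11) | 0.493 (9) | 0.599 (6) | 0.638 (4) | 0.617 (13) | 0.570 (6) | 0.897 (2) | 0.806 (5) | 0.605 (6) |
| DVVNet | | 0.562 (11) | 0.648 (10) | 0.700 (6) | 0.770 (4) | 0.586 (7) | 0.687 (15) | 0.333 (13) | 0.650 (11) | 0.514 (8) | 0.475 (8) | 0.906 (20) | 0.359 (10) | 0.223 (10) | 0.340 (16) | 0.442 (12) | 0.422 (19) | 0.668 (6) | 0.501 (11) | 0.708 (17) | 0.779 (7) | 0.534 (13) |
| CCRFNet | | 0.589 (9) | 0.766 (4) | 0.659 (12) | 0.683 (5) | 0.470 (11) | 0.740 (9) | 0.387 (10) | 0.620 (12) | 0.490 (9) | 0.476 (7) | 0.922 (15) | 0.355 (12) | 0.245 (7) | 0.511 (7) | 0.511 (8) | 0.571 (10) | 0.643 (9) | 0.493 (12) | 0.872 (6) | 0.762 (8) | 0.600 (7) |
| FCPN | permissive | 0.447 (18) | 0.679 (8) | 0.604 (18) | 0.578 (16) | 0.380 (16) | 0.682 (16) | 0.291 (17) | 0.106 (25) | 0.483 (10) | 0.258 (23) | 0.920 (16) | 0.258 (20) | 0.025 (25) | 0.231 (22) | 0.325 (15) | 0.480 (16) | 0.560 (16) | 0.463 (15) | 0.725 (16) | 0.666 (18) | 0.231 (25) |
| 3DMV, FTSDF | | 0.501 (14) | 0.558 (17) | 0.608 (17) | 0.424 (23) | 0.478 (10) | 0.690 (14) | 0.246 (20) | 0.586 (13) | 0.468 (11) | 0.450 (9) | 0.911 (18) | 0.394 (7) | 0.160 (18) | 0.438 (11) | 0.212 (20) | 0.432 (18) | 0.541 (18) | 0.475 (13) | 0.742 (15) | 0.727 (12) | 0.477 (15) |
| joint point-based | | 0.634 (5) | 0.614 (13) | 0.778 (3) | 0.667 (8) | 0.633 (4) | 0.825 (3) | 0.420 (5) | 0.804 (2) | 0.467 (12) | 0.561 (4) | 0.951 (3) | 0.494 (3) | 0.291 (3) | 0.566 (5) | 0.458 (10) | 0.579 (9) | 0.764 (4) | 0.559 (8) | 0.838 (8) | 0.814 (4) | 0.598 (8) |
| PointConv_withoutRGB | | 0.540 (12) | 0.623 (12) | 0.535 (21) | 0.543 (18) | 0.321 (21) | 0.735 (10) | 0.409 (7) | 0.533 (15) | 0.453 (13) | 0.381 (13) | 0.949 (4) | 0.312 (15) | 0.174 (16) | 0.482 (10) | 0.627 (5) | 0.616 (6) | 0.640 (10) | 0.517 (9) | 0.849 (7) | 0.655 (19) | 0.440 (18) |
| PCNN | | 0.498 (15) | 0.559 (16) | 0.644 (14) | 0.560 (17) | 0.420 (14) | 0.711 (13) | 0.229 (22) | 0.414 (16) | 0.436 (14) | 0.352 (15) | 0.941 (7) | 0.324 (14) | 0.155 (19) | 0.238 (20) | 0.387 (14) | 0.493 (13) | 0.529 (19) | 0.509 (10) | 0.813 (9) | 0.751 (10) | 0.504 (14) |
| PanopticFusion-label | | 0.529 (13) | 0.491 (20) | 0.688 (9) | 0.604 (13) | 0.386 (15) | 0.632 (20) | 0.225 (24) | 0.705 (7) | 0.434 (15) | 0.293 (18) | 0.815 (23) | 0.348 (13) | 0.241 (8) | 0.499 (8) | 0.669 (4) | 0.507 (12) | 0.649 (7) | 0.442 (18) | 0.796 (10) | 0.602 (22) | 0.561 (11) |
| 3DMV | | 0.484 (16) | 0.484 (21) | 0.538 (20) | 0.643 (10) | 0.424 (13) | 0.606 (23) | 0.310 (14) | 0.574 (14) | 0.433 (16) | 0.378 (14) | 0.796 (24) | 0.301 (16) | 0.214 (12) | 0.537 (6) | 0.208 (21) | 0.472 (17) | 0.507 (22) | 0.413 (21) | 0.693 (18) | 0.602 (22) | 0.539 (12) |
| TextureNet | permissive | 0.566 (10) | 0.672 (9) | 0.664 (11) | 0.671 (6) | 0.494 (8) | 0.719 (11) | 0.445 (4) | 0.678 (10) | 0.411 (17) | 0.396 (12) | 0.935 (11) | 0.356 (11) | 0.225 (9) | 0.412 (14) | 0.535 (7) | 0.565 (11) | 0.636 (12) | 0.464 (14) | 0.794 (11) | 0.680 (16) | 0.568 (10) |
| ScanNet+FTSDF | | 0.383 (23) | 0.297 (25) | 0.491 (23) | 0.432 (22) | 0.358 (19) | 0.612 (22) | 0.274 (18) | 0.116 (24) | 0.411 (17) | 0.265 (21) | 0.904 (21) | 0.229 (22) | 0.079 (23) | 0.250 (18) | 0.185 (23) | 0.320 (24) | 0.510 (20) | 0.385 (22) | 0.548 (23) | 0.597 (24) | 0.394 (20) |
| PNET2 | | 0.442 (19) | 0.548 (18) | 0.548 (19) | 0.597 (15) | 0.363 (18) | 0.628 (21) | 0.300 (15) | 0.292 (20) | 0.374 (19) | 0.307 (17) | 0.881 (22) | 0.268 (19) | 0.186 (14) | 0.238 (20) | 0.204 (22) | 0.407 (20) | 0.506 (23) | 0.449 (17) | 0.667 (19) | 0.620 (21) | 0.462 (17) |
| SurfaceConvPF | | 0.442 (19) | 0.505 (19) | 0.622 (15) | 0.380 (24) | 0.342 (20) | 0.654 (18) | 0.227 (23) | 0.397 (18) | 0.367 (20) | 0.276 (20) | 0.924 (14) | 0.240 (21) | 0.198 (13) | 0.359 (15) | 0.262 (17) | 0.366 (21) | 0.581 (14) | 0.435 (19) | 0.640 (20) | 0.668 (17) | 0.398 (19) |
| ScanNet | permissive | 0.306 (26) | 0.203 (26) | 0.366 (25) | 0.501 (19) | 0.311 (23) | 0.524 (25) | 0.211 (25) | 0.002 (27) | 0.342 (21) | 0.189 (25) | 0.786 (25) | 0.145 (26) | 0.102 (22) | 0.245 (19) | 0.152 (24) | 0.318 (25) | 0.348 (25) | 0.300 (25) | 0.460 (26) | 0.437 (26) | 0.182 (26) |
| PointCNN with RGB | permissive | 0.458 (17) | 0.577 (15) | 0.611 (16) | 0.356 (25) | 0.321 (21) | 0.715 (12) | 0.299 (16) | 0.376 (19) | 0.328 (22) | 0.319 (16) | 0.944 (6) | 0.285 (18) | 0.164 (17) | 0.216 (23) | 0.229 (19) | 0.484 (15) | 0.545 (17) | 0.456 (16) | 0.755 (14) | 0.709 (13) | 0.475 (16) |
| SPLAT Net | copyleft | 0.393 (22) | 0.472 (22) | 0.511 (22) | 0.606 (12) | 0.311 (23) | 0.656 (17) | 0.245 (21) | 0.405 (17) | 0.328 (22) | 0.197 (24) | 0.927 (13) | 0.227 (23) | 0.000 (27) | 0.001 (27) | 0.249 (18) | 0.271 (26) | 0.510 (20) | 0.383 (23) | 0.593 (22) | 0.699 (14) | 0.267 (23) |
| Tangent Convolutions | permissive | 0.438 (21) | 0.437 (23) | 0.646 (13) | 0.474 (20) | 0.369 (17) | 0.645 (19) | 0.353 (12) | 0.258 (22) | 0.282 (24) | 0.279 (19) | 0.918 (17) | 0.298 (17) | 0.147 (20) | 0.283 (17) | 0.294 (16) | 0.487 (14) | 0.562 (15) | 0.427 (20) | 0.619 (21) | 0.633 (20) | 0.352 (21) |
| PointNet++ | permissive | 0.339 (24) | 0.584 (14) | 0.478 (24) | 0.458 (21) | 0.256 (25) | 0.360 (26) | 0.250 (19) | 0.247 (23) | 0.278 (25) | 0.261 (22) | 0.677 (26) | 0.183 (24) | 0.117 (21) | 0.212 (24) | 0.145 (25) | 0.364 (22) | 0.346 (26) | 0.232 (26) | 0.548 (23) | 0.523 (25) | 0.252 (24) |
| SSC-UNet | permissive | 0.308 (25) | 0.353 (24) | 0.290 (26) | 0.278 (26) | 0.166 (26) | 0.553 (24) | 0.169 (26) | 0.286 (21) | 0.147 (26) | 0.148 (26) | 0.908 (19) | 0.182 (25) | 0.064 (24) | 0.023 (26) | 0.018 (27) | 0.354 (23) | 0.363 (24) | 0.345 (24) | 0.546 (25) | 0.685 (15) | 0.278 (22) |
| ERROR | | 0.054 (27) | 0.000 (27) | 0.041 (27) | 0.172 (27) | 0.030 (27) | 0.062 (27) | 0.001 (27) | 0.035 (26) | 0.004 (27) | 0.051 (27) | 0.143 (27) | 0.019 (27) | 0.003 (26) | 0.041 (25) | 0.050 (26) | 0.003 (27) | 0.054 (27) | 0.018 (27) | 0.005 (27) | 0.264 (27) | 0.082 (27) |

Each cell gives the score followed by the method's rank in that column, in parentheses. Info shows the license tag (permissive or copyleft) listed for a method on the source page. Rows are ordered by the desk column, as on the source page.

References:
- [MinkowskiNet] C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
- [FCPN] Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, Federico Tombari: Fully-Convolutional Point Networks for Large-Scale Point Clouds. ECCV 2018
- [PanopticFusion-label] Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
- [3DMV] Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- [TextureNet] Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas Guibas: TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes. CVPR
- [SurfaceConvPF] Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
- [ScanNet] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17
- [PointCNN with RGB] Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
- [SPLAT Net] Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
- [Tangent Convolutions] Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
- [PointNet++] Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.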
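The avg IoU column is the unweighted mean of the 20 per-class IoU scores: MVPNet's per-class values sum to 12.826, and 12.826 / 20 = 0.6413, which rounds to the listed 0.641. The sketch below shows how per-class IoU, TP / (TP + FP + FN), and its mean can be computed from a confusion matrix. It is a minimal pure-Python illustration, not the benchmark's evaluation script; the function names are hypothetical, and skipping classes that appear in neither ground truth nor predictions is an assumption of this sketch.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; not the official ScanNet evaluation code.


def per_class_iou(confusion: Sequence[Sequence[int]]) -> List[float]:
    """IoU per class from a square confusion matrix.

    confusion[i][j] counts points with ground-truth class i predicted as class j.
    Returns NaN for a class that appears in neither ground truth nor predictions.
    """
    n = len(confusion)
    ious: List[float] = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class-c points given another label
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other points labeled as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious: Sequence[float]) -> float:
    """Unweighted mean over classes; NaN entries (absent classes) are skipped (assumption)."""
    valid = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)


# Per-class IoUs of MVPNet, copied from the 3D semantic label table (bathtub ... window).
MVPNET_IOU = [0.831, 0.715, 0.671, 0.590, 0.781, 0.394, 0.679, 0.642, 0.553, 0.937,
              0.462, 0.256, 0.649, 0.406, 0.626, 0.691, 0.666, 0.877, 0.792, 0.608]

if __name__ == "__main__":
    toy = [[50, 5, 0],
           [10, 30, 5],
           [0, 5, 45]]
    print([round(v, 3) for v in per_class_iou(toy)])  # [0.769, 0.545, 0.818]
    print(round(mean_iou(MVPNET_IOU), 3))             # 0.641
```

Because every class contributes equally to the mean, a method can gain more avg IoU from a weak, rare class such as picture than from a class it already segments well, such as floor.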

This table lists the benchmark results for the 3D semantic instance scenario.

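| Method | Info | avg AP 50% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MASC | permissive | 0.447 (4) | 0.528 (7) | 0.555 (6) | 0.381 (2) | 0.382 (1) | 0.633 (2) | 0.002 (8) | 0.509 (3) | 0.260 (1) | 0.361 (2) | 0.432 (2) | 0.327 (2) | 0.451 (1) | 0.571 (5) | 0.367 (3) | 0.639 (2) | 0.386 (3) | 0.980 (1) | 0.276 (4) |
| MTML | | 0.481 (1) | 1.000 (1) | 0.666 (4) | 0.377 (3) | 0.272 (3) | 0.709 (1) | 0.001 (10) | 0.579 (2) | 0.254 (2) | 0.361 (3) | 0.318 (4) | 0.095 (6) | 0.432 (2) | 1.000 (1) | 0.184 (5) | 0.601 (5) | 0.487 (2) | 0.938 (3) | 0.384 (1) |
| ResNet-backbone | | 0.459 (3) | 1.000 (1) | 0.737 (1) | 0.159 (9) | 0.259 (4) | 0.587 (4) | 0.138 (1) | 0.475 (4) | 0.217 (3) | 0.416 (1) | 0.408 (3) | 0.128 (4) | 0.315 (5) | 0.714 (4) | 0.411 (2) | 0.536 (7) | 0.590 (1) | 0.873 (6) | 0.304 (3) |
| PanopticFusion-inst | | 0.478 (2) | 0.667 (4) | 0.712 (3) | 0.595 (1) | 0.259 (5) | 0.550 (6) | 0.000 (11) | 0.613 (1) | 0.175 (4) | 0.250 (6) | 0.434 (1) | 0.437 (1) | 0.411 (4) | 0.857 (2) | 0.485 (1) | 0.591 (6) | 0.267 (8) | 0.944 (2) | 0.359 (2) |
| R-PointNet | | 0.306 (7) | 0.500 (8) | 0.405 (8) | 0.311 (4) | 0.348 (2) | 0.589 (3) | 0.054 (2) | 0.068 (9) | 0.126 (5) | 0.283 (5) | 0.290 (5) | 0.028 (8) | 0.219 (6) | 0.214 (9) | 0.331 (4) | 0.396 (8) | 0.275 (6) | 0.821 (8) | 0.245 (5) |
| 3D-BEVIS | | 0.248 (8) | 0.667 (4) | 0.566 (5) | 0.076 (10) | 0.035 (11) | 0.394 (8) | 0.027 (4) | 0.035 (10) | 0.098 (6) | 0.099 (9) | 0.030 (10) | 0.025 (9) | 0.098 (8) | 0.375 (7) | 0.126 (7) | 0.604 (4) | 0.181 (9) | 0.854 (7) | 0.171 (7) |
| Seg-Cluster | permissive | 0.215 (9) | 0.370 (9) | 0.337 (10) | 0.285 (5) | 0.105 (8) | 0.325 (9) | 0.025 (5) | 0.282 (5) | 0.085 (7) | 0.105 (8) | 0.107 (8) | 0.007 (11) | 0.079 (9) | 0.317 (8) | 0.114 (9) | 0.309 (10) | 0.304 (5) | 0.587 (9) | 0.123 (9) |
| UNet-backbone | | 0.319 (6) | 0.667 (4) | 0.715 (2) | 0.233 (7) | 0.189 (7) | 0.479 (7) | 0.008 (7) | 0.218 (7) | 0.067 (8) | 0.201 (7) | 0.173 (7) | 0.107 (5) | 0.123 (7) | 0.438 (6) | 0.150 (6) | 0.615 (3) | 0.355 (4) | 0.916 (4) | 0.093 (10) |
| 3D-SIS | permissive | 0.382 (5) | 1.000 (1) | 0.432 (7) | 0.245 (6) | 0.190 (6) | 0.577 (5) | 0.013 (6) | 0.263 (6) | 0.033 (9) | 0.320 (4) | 0.240 (6) | 0.075 (7) | 0.422 (3) | 0.857 (2) | 0.117 (8) | 0.699 (1) | 0.271 (7) | 0.883 (5) | 0.235 (6) |
| Sgpn_scannet | | 0.143 (10) | 0.208 (11) | 0.390 (9) | 0.169 (8) | 0.065 (9) | 0.275 (10) | 0.029 (3) | 0.069 (8) | 0.000 (10) | 0.087 (10) | 0.043 (9) | 0.014 (10) | 0.027 (11) | 0.000 (10) | 0.112 (10) | 0.351 (9) | 0.168 (10) | 0.438 (10) | 0.138 (8) |
| MaskRCNN 2d->3d Proj | | 0.058 (11) | 0.333 (10) | 0.002 (11) | 0.000 (11) | 0.053 (10) | 0.002 (11) | 0.002 (9) | 0.021 (11) | 0.000 (10) | 0.045 (11) | 0.024 (11) | 0.238 (3) | 0.065 (10) | 0.000 (10) | 0.014 (11) | 0.107 (11) | 0.020 (11) | 0.110 (11) | 0.006 (11) |

Each cell gives the AP at 50% IoU followed by the method's rank in that column, in parentheses. Info shows the license tag listed for a method on the source page. Rows are ordered by the desk column, as on the source page.

References:
- [MASC] Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
- [PanopticFusion-inst] Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
- [3D-BEVIS] Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation.
- [3D-SIS] Ji Hou, Angela Dai, Matthias Niessner: 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. CVPR 2019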
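AP at 50% treats a predicted instance as a true positive when it overlaps a ground-truth instance of the same class with an IoU of at least 0.5; the avg column is the mean of the per-class values (MTML's 18 class scores average to 0.481, its listed avg). The sketch below computes AP for a single class under assumed rules common to detection benchmarks: greedy matching in descending confidence to the best still-unmatched ground-truth instance, then all-point interpolated precision. It is not the official ScanNet evaluation script, whose matching and interpolation details may differ, and representing masks as sets of point indices is also an assumption.

```python
from typing import List, Sequence, Set, Tuple

# Simplified, hypothetical sketch; the official ScanNet script may match and interpolate differently.


def mask_iou(a: Set[int], b: Set[int]) -> float:
    """IoU of two instance masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(
    preds: Sequence[Tuple[float, Set[int]]],
    gts: Sequence[Set[int]],
    iou_thresh: float = 0.5,
) -> float:
    """AP for one class at one IoU threshold.

    preds: (confidence, mask) pairs predicted for the class.
    gts:   ground-truth masks of the same class.
    """
    if not gts:
        return float("nan")  # AP is undefined without ground truth
    matched = [False] * len(gts)
    is_tp: List[bool] = []
    # Greedy matching: most confident prediction first, to the best unmatched ground truth.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        hit = best_j >= 0 and best_iou >= iou_thresh
        if hit:
            matched[best_j] = True
        is_tp.append(hit)

    # Precision and recall after each prediction in ranked order.
    precision: List[float] = []
    recall: List[float] = []
    tp = 0
    for k, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        precision.append(tp / k)
        recall.append(tp / len(gts))

    # All-point interpolation: precision at recall r is the max precision at any recall >= r.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
    preds = [(0.9, {0, 1, 2}),             # IoU 0.75 with the first instance: TP
             (0.8, {20, 21}),              # overlaps nothing: FP
             (0.7, {10, 11, 12, 13, 14})]  # IoU 0.8 with the second instance: TP
    print(round(average_precision(preds, gts), 3))  # 0.833
```

The per-class results would then be averaged over classes to give a single ap 50% figure per method.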

This table lists the benchmark results for the 2D semantic label scenario.

| Method | Info | avg IoU | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSMA | copyleft | 0.577 (1) | 0.695 (1) | 0.716 (2) | 0.439 (4) | 0.563 (1) | 0.314 (3) | 0.444 (1) | 0.719 (1) | 0.551 (1) | 0.503 (1) | 0.887 (3) | 0.346 (1) | 0.348 (2) | 0.603 (1) | 0.353 (3) | 0.709 (1) | 0.600 (3) | 0.457 (2) | 0.901 (1) | 0.786 (1) | 0.599 (1) |
| 3DMV (2d proj) | | 0.498 (4) | 0.481 (5) | 0.612 (4) | 0.579 (2) | 0.456 (4) | 0.343 (1) | 0.384 (3) | 0.623 (3) | 0.525 (2) | 0.381 (4) | 0.845 (4) | 0.254 (4) | 0.264 (4) | 0.557 (2) | 0.182 (5) | 0.581 (5) | 0.598 (4) | 0.429 (4) | 0.760 (5) | 0.661 (6) | 0.446 (5) |
| FuseNet | permissive | 0.535 (2) | 0.570 (3) | 0.681 (3) | 0.182 (7) | 0.512 (2) | 0.290 (4) | 0.431 (2) | 0.659 (2) | 0.504 (3) | 0.495 (2) | 0.903 (2) | 0.308 (2) | 0.428 (1) | 0.523 (3) | 0.365 (2) | 0.676 (2) | 0.621 (2) | 0.470 (1) | 0.762 (4) | 0.779 (2) | 0.541 (3) |
| AdapNet++ | copyleft | 0.503 (3) | 0.613 (2) | 0.722 (1) | 0.418 (5) | 0.358 (7) | 0.337 (2) | 0.370 (5) | 0.479 (5) | 0.443 (4) | 0.368 (5) | 0.907 (1) | 0.207 (5) | 0.213 (6) | 0.464 (5) | 0.525 (1) | 0.618 (3) | 0.657 (1) | 0.450 (3) | 0.788 (3) | 0.721 (4) | 0.408 (6) |
| ScanNet (2d proj) | permissive | 0.330 (7) | 0.293 (6) | 0.521 (6) | 0.657 (1) | 0.361 (6) | 0.161 (6) | 0.250 (6) | 0.004 (7) | 0.440 (5) | 0.183 (7) | 0.836 (5) | 0.125 (7) | 0.060 (7) | 0.319 (7) | 0.132 (6) | 0.417 (6) | 0.412 (6) | 0.344 (6) | 0.541 (7) | 0.427 (7) | 0.109 (7) |
| ILC-PSPNet | | 0.475 (5) | 0.490 (4) | 0.581 (5) | 0.289 (6) | 0.507 (3) | 0.067 (7) | 0.379 (4) | 0.610 (4) | 0.417 (6) | 0.435 (3) | 0.822 (6) | 0.278 (3) | 0.267 (3) | 0.503 (4) | 0.228 (4) | 0.616 (4) | 0.533 (5) | 0.375 (5) | 0.820 (2) | 0.729 (3) | 0.560 (2) |
| Enet (reimpl) | | 0.376 (6) | 0.264 (7) | 0.452 (7) | 0.452 (3) | 0.365 (5) | 0.181 (5) | 0.143 (7) | 0.456 (6) | 0.409 (7) | 0.346 (6) | 0.769 (7) | 0.164 (6) | 0.218 (5) | 0.359 (6) | 0.123 (7) | 0.403 (7) | 0.381 (7) | 0.313 (7) | 0.571 (6) | 0.685 (5) | 0.472 (4) |

Each cell gives the score followed by the method's rank in that column, in parentheses. Info shows the license tag listed for a method on the source page. Rows are ordered by the desk column, as on the source page.

References:
- [SSMA] Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- [3DMV (2d proj)] Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- [FuseNet] Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
- [AdapNet++] Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- [ScanNet (2d proj)] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17
- [Enet (reimpl)] Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
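The number in parentheses after each score is the method's rank in that column. The tables appear to use standard competition ranking, where equal scores share a rank and the following rank is skipped: in the 3D semantic label table, LAP-D and DPC both score 0.720 on bathtub and are both ranked 6, and FCPN follows at 8. Ranks also seem to be computed on unrounded scores, since some equal displayed values carry different ranks (MASC and MTML both show 0.361 on door in the 3D instance table but are ranked 2 and 3). A minimal sketch of that rule, with a hypothetical function name:

```python
from typing import List, Sequence


def competition_rank(scores: Sequence[float]) -> List[int]:
    """Rank scores where higher is better and ties share a rank ("1224" ranking).

    A score's rank is 1 plus the number of scores strictly greater than it.
    Feed full-precision scores; rounding first could create ties the page does not show.
    """
    return [1 + sum(other > s for other in scores) for s in scores]


if __name__ == "__main__":
    # Top eight bathtub IoUs from the 3D semantic label table.
    bathtub = [0.858, 0.847, 0.831, 0.766, 0.729, 0.720, 0.720, 0.679]
    print(competition_rank(bathtub))  # [1, 2, 3, 4, 5, 6, 6, 8]
```

The quadratic loop maps directly onto the definition; for leaderboards of a few dozen entries there is no need for a sort-based O(n log n) version.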

This table lists the benchmark results for the 2D semantic instance scenario.

| Method | Info | avg AP | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MaskRCNN_ScanNet | permissive | 0.119 (1) | 0.129 (1) | 0.212 (1) | 0.002 (1) | 0.112 (1) | 0.148 (1) | 0.014 (1) | 0.205 (1) | 0.044 (1) | 0.066 (1) | 0.078 (1) | 0.095 (1) | 0.142 (1) | 0.030 (1) | 0.128 (1) | 0.139 (1) | 0.080 (1) | 0.459 (1) | 0.057 (1) |

Each cell gives the score followed by its rank in parentheses. Info shows the license tag listed on the source page.

References:
- [MaskRCNN_ScanNet] Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17
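This table reports avg AP rather than AP at a single 50% overlap. Assuming it follows the common convention of averaging AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (an assumption; the page does not say), the pieces beyond single-threshold AP are a pixel-mask IoU and the threshold sweep, sketched below. The `ap_at` callable stands in for any single-threshold AP routine and, like the other names here, is hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical sketch; the 0.50:0.05:0.95 threshold sweep is an assumed convention.


def pixel_mask_iou(a: Sequence[Sequence[bool]], b: Sequence[Sequence[bool]]) -> float:
    """IoU of two same-sized binary 2D masks given as rows of booleans."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                inter += 1
            if pa or pb:
                union += 1
    return inter / union if union else 0.0


def iou_thresholds(start: float = 0.50, stop: float = 0.95, step: float = 0.05) -> List[float]:
    """Evenly spaced IoU thresholds, both ends included (0.50, 0.55, ..., 0.95 by default)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def average_ap(ap_at: Callable[[float], float]) -> float:
    """Mean of a single-threshold AP function over the assumed threshold sweep."""
    thresholds = iou_thresholds()
    return sum(ap_at(t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    a = [[True, True], [False, False]]
    b = [[True, False], [True, False]]
    print(round(pixel_mask_iou(a, b), 3))  # 0.333
    # A toy AP function that is perfect up to IoU 0.70 and fails above it.
    print(average_ap(lambda t: 1.0 if t <= 0.7 else 0.0))  # 0.5
```

Averaging over stricter thresholds rewards tight mask boundaries, which is one reason such scores run well below AP at 50% for the same predictions.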

This table lists the benchmark results for the scene type classification scenario.

| Method | avg recall | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| resnet50_scannet | 0.353 (2) | 0.250 (1) | 0.812 (1) | 0.529 (2) | 0.500 (1) | 0.500 (1) | 0.000 (2) | 0.500 (1) | 0.571 (1) | 0.000 (2) | 0.556 (2) | 0.000 (2) | 0.375 (2) | 0.000 (1) |
| SE-ResNeXt-SSMA | 0.498 (1) | 0.000 (2) | 0.812 (1) | 0.941 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.429 (2) | 0.500 (1) | 0.667 (1) | 0.500 (1) | 0.625 (1) | 0.000 (1) |

Each cell gives the recall followed by the method's rank in that column, in parentheses.

References:
- [SE-ResNeXt-SSMA] Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
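The avg recall column matches the unweighted mean of the 13 per-class recalls: resnet50_scannet's per-class values sum to 4.593, and 4.593 / 13 ≈ 0.353; SE-ResNeXt-SSMA's sum to 6.474, giving 0.498. A minimal sketch of computing per-class recall from scene labels and averaging it, with hypothetical function names:

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def per_class_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Recall for each ground-truth class: correctly predicted samples / samples of that class."""
    total: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {c: correct[c] / total[c] for c in total}


def mean_recall(recalls: Sequence[float]) -> float:
    """Unweighted average of per-class recalls (every class counts equally)."""
    return sum(recalls) / len(recalls)


# Per-class recalls of resnet50_scannet, in the table's column order (apartment ... storage).
RESNET50_RECALL = [0.250, 0.812, 0.529, 0.500, 0.500, 0.000, 0.500,
                   0.571, 0.000, 0.556, 0.000, 0.375, 0.000]

if __name__ == "__main__":
    print(per_class_recall(["kitchen", "kitchen", "office"],
                           ["kitchen", "office", "office"]))  # {'kitchen': 0.5, 'office': 1.0}
    print(round(mean_recall(RESNET50_RECALL), 3))           # 0.353
```

Averaging per class means a scene type with few test scans weighs as much as a common one, unlike overall accuracy, so a single missed class (such as apartment for SE-ResNeXt-SSMA) pulls the average down by a full 1/13 of its recall.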