This table lists the benchmark results for the 3D semantic label scenario.


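Each cell gives the score followed by the method's rank for that column in parentheses. Rows are sorted by avg iou.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SparseConvNet | | 0.725 (1) | 0.647 (8) | 0.821 (1) | 0.846 (1) | 0.721 (1) | 0.869 (1) | 0.533 (1) | 0.754 (4) | 0.603 (4) | 0.614 (1) | 0.955 (1) | 0.572 (1) | 0.325 (1) | 0.710 (2) | 0.870 (2) | 0.724 (1) | 0.823 (1) | 0.628 (3) | 0.934 (1) | 0.865 (1) | 0.683 (1) |
| MinkowskiNet | | 0.721 (2) | 0.837 (2) | 0.804 (2) | 0.800 (2) | 0.721 (1) | 0.843 (2) | 0.460 (3) | 0.835 (1) | 0.647 (1) | 0.597 (2) | 0.953 (2) | 0.542 (2) | 0.214 (8) | 0.746 (1) | 0.912 (1) | 0.705 (2) | 0.771 (3) | 0.640 (2) | 0.876 (5) | 0.842 (2) | 0.672 (2) |
| KP-FCNN | | 0.684 (3) | 0.847 (1) | 0.758 (4) | 0.784 (3) | 0.647 (3) | 0.814 (4) | 0.473 (2) | 0.772 (3) | 0.605 (3) | 0.594 (3) | 0.935 (9) | 0.450 (5) | 0.181 (13) | 0.587 (4) | 0.805 (3) | 0.690 (3) | 0.785 (2) | 0.614 (4) | 0.882 (3) | 0.819 (3) | 0.632 (3) |
| MVPNet | | 0.641 (4) | 0.831 (3) | 0.715 (5) | 0.671 (5) | 0.590 (6) | 0.781 (5) | 0.394 (8) | 0.679 (7) | 0.642 (2) | 0.553 (5) | 0.937 (8) | 0.462 (4) | 0.256 (3) | 0.649 (3) | 0.406 (10) | 0.626 (5) | 0.691 (5) | 0.666 (1) | 0.877 (4) | 0.792 (6) | 0.608 (4) |
| joint point-based | | 0.634 (5) | 0.614 (10) | 0.778 (3) | 0.667 (7) | 0.633 (4) | 0.825 (3) | 0.420 (6) | 0.804 (2) | 0.467 (10) | 0.561 (4) | 0.951 (3) | 0.494 (3) | 0.291 (2) | 0.566 (5) | 0.458 (8) | 0.579 (6) | 0.764 (4) | 0.559 (6) | 0.838 (6) | 0.814 (4) | 0.598 (6) |
| HPEIN | | 0.618 (6) | 0.729 (4) | 0.668 (9) | 0.647 (8) | 0.597 (5) | 0.766 (6) | 0.414 (7) | 0.680 (6) | 0.520 (5) | 0.525 (6) | 0.946 (4) | 0.432 (6) | 0.215 (7) | 0.493 (8) | 0.599 (5) | 0.638 (4) | 0.617 (11) | 0.570 (5) | 0.897 (2) | 0.806 (5) | 0.605 (5) |
| TextureNet | | 0.566 (7) | 0.672 (6) | 0.664 (10) | 0.671 (5) | 0.494 (8) | 0.719 (8) | 0.445 (4) | 0.678 (8) | 0.411 (15) | 0.396 (11) | 0.935 (9) | 0.356 (10) | 0.225 (5) | 0.412 (11) | 0.535 (7) | 0.565 (8) | 0.636 (9) | 0.464 (12) | 0.794 (10) | 0.680 (15) | 0.568 (7) |
| DVVNet | | 0.562 (8) | 0.648 (7) | 0.700 (6) | 0.770 (4) | 0.586 (7) | 0.687 (12) | 0.333 (10) | 0.650 (9) | 0.514 (6) | 0.475 (7) | 0.906 (18) | 0.359 (9) | 0.223 (6) | 0.340 (13) | 0.442 (9) | 0.422 (17) | 0.668 (6) | 0.501 (9) | 0.708 (15) | 0.779 (7) | 0.534 (10) |
| PointConv | | 0.556 (9) | 0.636 (9) | 0.640 (13) | 0.574 (15) | 0.472 (10) | 0.739 (7) | 0.430 (5) | 0.433 (12) | 0.418 (14) | 0.445 (9) | 0.944 (5) | 0.372 (8) | 0.185 (12) | 0.464 (9) | 0.575 (6) | 0.540 (9) | 0.639 (8) | 0.505 (8) | 0.827 (7) | 0.762 (8) | 0.515 (11) |
| PanopticFusion-label | | 0.529 (10) | 0.491 (18) | 0.688 (7) | 0.604 (12) | 0.386 (14) | 0.632 (18) | 0.225 (22) | 0.705 (5) | 0.434 (12) | 0.293 (16) | 0.815 (21) | 0.348 (11) | 0.241 (4) | 0.499 (7) | 0.669 (4) | 0.507 (10) | 0.649 (7) | 0.442 (16) | 0.796 (9) | 0.602 (20) | 0.561 (8) |
| LAP-D | | 0.504 (11) | 0.604 (11) | 0.679 (8) | 0.608 (10) | 0.464 (11) | 0.678 (14) | 0.308 (12) | 0.386 (16) | 0.500 (7) | 0.397 (10) | 0.935 (9) | 0.332 (12) | 0.086 (20) | 0.212 (21) | 0.228 (17) | 0.579 (6) | 0.628 (10) | 0.499 (10) | 0.769 (11) | 0.730 (10) | 0.452 (16) |
| 3DMV, FTSDF | | 0.501 (12) | 0.558 (15) | 0.608 (16) | 0.424 (21) | 0.478 (9) | 0.690 (11) | 0.246 (18) | 0.586 (10) | 0.468 (9) | 0.450 (8) | 0.911 (16) | 0.394 (7) | 0.160 (15) | 0.438 (10) | 0.212 (18) | 0.432 (16) | 0.541 (16) | 0.475 (11) | 0.742 (13) | 0.727 (11) | 0.477 (13) |
| PCNN | | 0.498 (13) | 0.559 (14) | 0.644 (12) | 0.560 (16) | 0.420 (13) | 0.711 (10) | 0.229 (20) | 0.414 (13) | 0.436 (11) | 0.352 (13) | 0.941 (7) | 0.324 (13) | 0.155 (16) | 0.238 (17) | 0.387 (11) | 0.493 (11) | 0.529 (17) | 0.509 (7) | 0.813 (8) | 0.751 (9) | 0.504 (12) |
| 3DMV | | 0.484 (14) | 0.484 (19) | 0.538 (19) | 0.643 (9) | 0.424 (12) | 0.606 (21) | 0.310 (11) | 0.574 (11) | 0.433 (13) | 0.378 (12) | 0.796 (22) | 0.301 (14) | 0.214 (8) | 0.537 (6) | 0.208 (19) | 0.472 (15) | 0.507 (20) | 0.413 (19) | 0.693 (16) | 0.602 (20) | 0.539 (9) |
| PointCNN with RGB | permissive | 0.458 (15) | 0.577 (13) | 0.611 (15) | 0.356 (23) | 0.321 (20) | 0.715 (9) | 0.299 (14) | 0.376 (17) | 0.328 (20) | 0.319 (14) | 0.944 (5) | 0.285 (16) | 0.164 (14) | 0.216 (20) | 0.229 (16) | 0.484 (13) | 0.545 (15) | 0.456 (14) | 0.755 (12) | 0.709 (12) | 0.475 (14) |
| FCPN | permissive | 0.447 (16) | 0.679 (5) | 0.604 (17) | 0.578 (14) | 0.380 (15) | 0.682 (13) | 0.291 (15) | 0.106 (23) | 0.483 (8) | 0.258 (21) | 0.920 (14) | 0.258 (18) | 0.025 (23) | 0.231 (19) | 0.325 (12) | 0.480 (14) | 0.560 (14) | 0.463 (13) | 0.725 (14) | 0.666 (17) | 0.231 (23) |
| SurfaceConvPF | | 0.442 (17) | 0.505 (17) | 0.622 (14) | 0.380 (22) | 0.342 (19) | 0.654 (16) | 0.227 (21) | 0.397 (15) | 0.367 (18) | 0.276 (18) | 0.924 (13) | 0.240 (19) | 0.198 (10) | 0.359 (12) | 0.262 (14) | 0.366 (19) | 0.581 (12) | 0.435 (17) | 0.640 (18) | 0.668 (16) | 0.398 (17) |
| PNET2 | | 0.442 (17) | 0.548 (16) | 0.548 (18) | 0.597 (13) | 0.363 (17) | 0.628 (19) | 0.300 (13) | 0.292 (18) | 0.374 (17) | 0.307 (15) | 0.881 (20) | 0.268 (17) | 0.186 (11) | 0.238 (17) | 0.204 (20) | 0.407 (18) | 0.506 (21) | 0.449 (15) | 0.667 (17) | 0.620 (19) | 0.462 (15) |
| Tangent Convolutions | permissive | 0.438 (19) | 0.437 (21) | 0.646 (11) | 0.474 (18) | 0.369 (16) | 0.645 (17) | 0.353 (9) | 0.258 (20) | 0.282 (22) | 0.279 (17) | 0.918 (15) | 0.298 (15) | 0.147 (17) | 0.283 (14) | 0.294 (13) | 0.487 (12) | 0.562 (13) | 0.427 (18) | 0.619 (19) | 0.633 (18) | 0.352 (19) |
| SPLAT Net | copyleft | 0.393 (20) | 0.472 (20) | 0.511 (20) | 0.606 (11) | 0.311 (21) | 0.656 (15) | 0.245 (19) | 0.405 (14) | 0.328 (20) | 0.197 (22) | 0.927 (12) | 0.227 (21) | 0.000 (24) | 0.001 (24) | 0.249 (15) | 0.271 (24) | 0.510 (18) | 0.383 (21) | 0.593 (20) | 0.699 (13) | 0.267 (21) |
| ScanNet+FTSDF | | 0.383 (21) | 0.297 (23) | 0.491 (21) | 0.432 (20) | 0.358 (18) | 0.612 (20) | 0.274 (16) | 0.116 (22) | 0.411 (15) | 0.265 (19) | 0.904 (19) | 0.229 (20) | 0.079 (21) | 0.250 (15) | 0.185 (21) | 0.320 (22) | 0.510 (18) | 0.385 (20) | 0.548 (21) | 0.597 (22) | 0.394 (18) |
| PointNet++ | permissive | 0.339 (22) | 0.584 (12) | 0.478 (22) | 0.458 (19) | 0.256 (23) | 0.360 (24) | 0.250 (17) | 0.247 (21) | 0.278 (23) | 0.261 (20) | 0.677 (24) | 0.183 (22) | 0.117 (18) | 0.212 (21) | 0.145 (23) | 0.364 (20) | 0.346 (24) | 0.232 (24) | 0.548 (21) | 0.523 (23) | 0.252 (22) |
| SSC-UNet | permissive | 0.308 (23) | 0.353 (22) | 0.290 (24) | 0.278 (24) | 0.166 (24) | 0.553 (22) | 0.169 (24) | 0.286 (19) | 0.147 (24) | 0.148 (24) | 0.908 (17) | 0.182 (23) | 0.064 (22) | 0.023 (23) | 0.018 (24) | 0.354 (21) | 0.363 (22) | 0.345 (22) | 0.546 (23) | 0.685 (14) | 0.278 (20) |
| ScanNet | permissive | 0.306 (24) | 0.203 (24) | 0.366 (23) | 0.501 (17) | 0.311 (21) | 0.524 (23) | 0.211 (23) | 0.002 (24) | 0.342 (19) | 0.189 (23) | 0.786 (23) | 0.145 (24) | 0.102 (19) | 0.245 (16) | 0.152 (22) | 0.318 (23) | 0.348 (23) | 0.300 (23) | 0.460 (24) | 0.437 (24) | 0.182 (24) |

MinkowskiNet: C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
PanopticFusion-label: Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
3DMV: Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
PointCNN with RGB: Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
FCPN: Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, Federico Tombari: Fully-Convolutional Point Networks for Large-Scale Point Clouds. ECCV 2018
SurfaceConvPF: Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
Tangent Convolutions: Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
SPLAT Net: Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
PointNet++: Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.
ScanNet: Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17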
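
The avg iou column appears consistent with an unweighted mean of the 20 per-class IoU values. The following is a minimal Python sketch of that check for the SparseConvNet row; the helper name `mean_iou` is illustrative only and is not part of any benchmark tooling, and the averaging rule is an assumption inferred from the numbers, not a statement of the official evaluation procedure.

```python
# Per-class IoU values copied from the SparseConvNet row
# (bathtub, bed, bookshelf, ..., wall, window: 20 classes).
sparseconvnet_iou = [
    0.647, 0.821, 0.846, 0.721, 0.869, 0.533, 0.754, 0.603, 0.614, 0.955,
    0.572, 0.325, 0.710, 0.870, 0.724, 0.823, 0.628, 0.934, 0.865, 0.683,
]


def mean_iou(per_class):
    """Unweighted mean over classes (hypothetical helper, assumed rule)."""
    return sum(per_class) / len(per_class)


# Rounded to three decimals, as in the table.
print(round(mean_iou(sparseconvnet_iou), 3))  # 0.725, the reported avg iou
```

The same arithmetic applied to the MTML row of the 3D semantic instance table (18 classes) gives 0.481, which also matches its reported average.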

This table lists the benchmark results for the 3D semantic instance scenario.




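Each cell gives the score followed by the method's rank for that column in parentheses. Rows are sorted by avg ap 50%.

| Method | Info | avg ap 50% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MTML | | 0.481 (1) | 1.000 (1) | 0.666 (4) | 0.377 (3) | 0.272 (3) | 0.709 (1) | 0.001 (10) | 0.579 (2) | 0.254 (2) | 0.361 (3) | 0.318 (4) | 0.095 (6) | 0.432 (2) | 1.000 (1) | 0.184 (5) | 0.601 (5) | 0.487 (2) | 0.938 (3) | 0.384 (1) |
| PanopticFusion-inst | | 0.478 (2) | 0.667 (4) | 0.712 (3) | 0.595 (1) | 0.259 (5) | 0.550 (6) | 0.000 (11) | 0.613 (1) | 0.175 (4) | 0.250 (6) | 0.434 (1) | 0.437 (1) | 0.411 (4) | 0.857 (2) | 0.485 (1) | 0.591 (6) | 0.267 (8) | 0.944 (2) | 0.359 (2) |
| ResNet-backbone | | 0.459 (3) | 1.000 (1) | 0.737 (1) | 0.159 (9) | 0.259 (4) | 0.587 (4) | 0.138 (1) | 0.475 (4) | 0.217 (3) | 0.416 (1) | 0.408 (3) | 0.128 (4) | 0.315 (5) | 0.714 (4) | 0.411 (2) | 0.536 (7) | 0.590 (1) | 0.873 (6) | 0.304 (3) |
| MASC | permissive | 0.447 (4) | 0.528 (7) | 0.555 (6) | 0.381 (2) | 0.382 (1) | 0.633 (2) | 0.002 (8) | 0.509 (3) | 0.260 (1) | 0.361 (2) | 0.432 (2) | 0.327 (2) | 0.451 (1) | 0.571 (5) | 0.367 (3) | 0.639 (2) | 0.386 (3) | 0.980 (1) | 0.276 (4) |
| 3D-SIS | | 0.382 (5) | 1.000 (1) | 0.432 (7) | 0.245 (6) | 0.190 (6) | 0.577 (5) | 0.013 (6) | 0.263 (6) | 0.033 (9) | 0.320 (4) | 0.240 (6) | 0.075 (7) | 0.422 (3) | 0.857 (2) | 0.117 (8) | 0.699 (1) | 0.271 (7) | 0.883 (5) | 0.235 (6) |
| UNet-backbone | | 0.319 (6) | 0.667 (4) | 0.715 (2) | 0.233 (7) | 0.189 (7) | 0.479 (7) | 0.008 (7) | 0.218 (7) | 0.067 (8) | 0.201 (7) | 0.173 (7) | 0.107 (5) | 0.123 (7) | 0.438 (6) | 0.150 (6) | 0.615 (3) | 0.355 (4) | 0.916 (4) | 0.093 (10) |
| R-PointNet | | 0.306 (7) | 0.500 (8) | 0.405 (8) | 0.311 (4) | 0.348 (2) | 0.589 (3) | 0.054 (2) | 0.068 (9) | 0.126 (5) | 0.283 (5) | 0.290 (5) | 0.028 (8) | 0.219 (6) | 0.214 (9) | 0.331 (4) | 0.396 (8) | 0.275 (6) | 0.821 (8) | 0.245 (5) |
| 3D-BEVIS | | 0.248 (8) | 0.667 (4) | 0.566 (5) | 0.076 (10) | 0.035 (11) | 0.394 (8) | 0.027 (4) | 0.035 (10) | 0.098 (6) | 0.099 (9) | 0.030 (10) | 0.025 (9) | 0.098 (8) | 0.375 (7) | 0.126 (7) | 0.604 (4) | 0.181 (9) | 0.854 (7) | 0.171 (7) |
| Seg-Cluster | permissive | 0.215 (9) | 0.370 (9) | 0.337 (10) | 0.285 (5) | 0.105 (8) | 0.325 (9) | 0.025 (5) | 0.282 (5) | 0.085 (7) | 0.105 (8) | 0.107 (8) | 0.007 (11) | 0.079 (9) | 0.317 (8) | 0.114 (9) | 0.309 (10) | 0.304 (5) | 0.587 (9) | 0.123 (9) |
| Sgpn_scannet | | 0.143 (10) | 0.208 (11) | 0.390 (9) | 0.169 (8) | 0.065 (9) | 0.275 (10) | 0.029 (3) | 0.069 (8) | 0.000 (10) | 0.087 (10) | 0.043 (9) | 0.014 (10) | 0.027 (11) | 0.000 (10) | 0.112 (10) | 0.351 (9) | 0.168 (10) | 0.438 (10) | 0.138 (8) |
| MaskRCNN 2d->3d Proj | | 0.058 (11) | 0.333 (10) | 0.002 (11) | 0.000 (11) | 0.053 (10) | 0.002 (11) | 0.002 (9) | 0.021 (11) | 0.000 (10) | 0.045 (11) | 0.024 (11) | 0.238 (3) | 0.065 (10) | 0.000 (10) | 0.014 (11) | 0.107 (11) | 0.020 (11) | 0.110 (11) | 0.006 (11) |

PanopticFusion-inst: Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
MASC: Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
3D-BEVIS: Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation.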

This table lists the benchmark results for the 2D semantic label scenario.


Each cell gives the score followed by the method's rank for that column in parentheses. Rows are sorted by avg iou.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSMA | copyleft | 0.577 (1) | 0.695 (1) | 0.716 (2) | 0.439 (4) | 0.563 (1) | 0.314 (3) | 0.444 (1) | 0.719 (1) | 0.551 (1) | 0.503 (1) | 0.887 (3) | 0.346 (1) | 0.348 (2) | 0.603 (1) | 0.353 (3) | 0.709 (1) | 0.600 (2) | 0.457 (1) | 0.901 (1) | 0.786 (1) | 0.599 (1) |
| FuseNet | permissive | 0.521 (2) | 0.591 (3) | 0.682 (3) | 0.220 (7) | 0.488 (3) | 0.279 (4) | 0.344 (5) | 0.610 (3) | 0.461 (3) | 0.475 (2) | 0.910 (1) | 0.293 (2) | 0.447 (1) | 0.512 (3) | 0.397 (2) | 0.618 (2) | 0.567 (4) | 0.452 (2) | 0.734 (5) | 0.782 (2) | 0.566 (2) |
| AdapNet++ | copyleft | 0.503 (3) | 0.613 (2) | 0.722 (1) | 0.418 (5) | 0.358 (7) | 0.337 (2) | 0.370 (4) | 0.479 (5) | 0.443 (4) | 0.368 (5) | 0.907 (2) | 0.207 (5) | 0.213 (6) | 0.464 (5) | 0.525 (1) | 0.618 (2) | 0.657 (1) | 0.450 (3) | 0.788 (3) | 0.721 (4) | 0.408 (6) |
| 3DMV (2d proj) | | 0.498 (4) | 0.481 (5) | 0.612 (4) | 0.579 (2) | 0.456 (4) | 0.343 (1) | 0.384 (2) | 0.623 (2) | 0.525 (2) | 0.381 (4) | 0.845 (4) | 0.254 (4) | 0.264 (4) | 0.557 (2) | 0.182 (5) | 0.581 (5) | 0.598 (3) | 0.429 (4) | 0.760 (4) | 0.661 (6) | 0.446 (5) |
| ILC-PSPNet | | 0.475 (5) | 0.490 (4) | 0.581 (5) | 0.289 (6) | 0.507 (2) | 0.067 (7) | 0.379 (3) | 0.610 (3) | 0.417 (6) | 0.435 (3) | 0.822 (6) | 0.278 (3) | 0.267 (3) | 0.503 (4) | 0.228 (4) | 0.616 (4) | 0.533 (5) | 0.375 (5) | 0.820 (2) | 0.729 (3) | 0.560 (3) |
| Enet (reimpl) | | 0.376 (6) | 0.264 (7) | 0.452 (7) | 0.452 (3) | 0.365 (5) | 0.181 (5) | 0.143 (7) | 0.456 (6) | 0.409 (7) | 0.346 (6) | 0.769 (7) | 0.164 (6) | 0.218 (5) | 0.359 (6) | 0.123 (7) | 0.403 (7) | 0.381 (7) | 0.313 (7) | 0.571 (6) | 0.685 (5) | 0.472 (4) |
| ScanNet (2d proj) | permissive | 0.330 (7) | 0.293 (6) | 0.521 (6) | 0.657 (1) | 0.361 (6) | 0.161 (6) | 0.250 (6) | 0.004 (7) | 0.440 (5) | 0.183 (7) | 0.836 (5) | 0.125 (7) | 0.060 (7) | 0.319 (7) | 0.132 (6) | 0.417 (6) | 0.412 (6) | 0.344 (6) | 0.541 (7) | 0.427 (7) | 0.109 (7) |

SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
AdapNet++: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
3DMV (2d proj): Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17

This table lists the benchmark results for the 2D semantic instance scenario.




Each cell gives the score followed by the method's rank for that column in parentheses. Rows are sorted by avg ap.

| Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MaskRCNN_ScanNet | permissive | 0.119 (1) | 0.129 (1) | 0.212 (1) | 0.002 (1) | 0.112 (1) | 0.148 (1) | 0.014 (1) | 0.205 (1) | 0.044 (1) | 0.066 (1) | 0.078 (1) | 0.095 (1) | 0.142 (1) | 0.030 (1) | 0.128 (1) | 0.139 (1) | 0.080 (1) | 0.459 (1) | 0.057 (1) |

MaskRCNN_ScanNet: Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17

This table lists the benchmark results for the scene type classification scenario.




Each cell gives the score followed by the method's rank for that column in parentheses. Rows are sorted by avg iou.

| Method | Info | avg iou | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SE-ResNeXt-SSMA | | 0.355 (1) | 0.000 (2) | 0.684 (1) | 0.696 (1) | 0.200 (2) | 0.500 (1) | 0.200 (1) | 0.500 (1) | 0.429 (1) | 0.200 (1) | 0.545 (1) | 0.111 (1) | 0.556 (1) | 0.000 (1) |
| resnet50_scannet | | 0.231 (2) | 0.200 (1) | 0.481 (2) | 0.346 (2) | 0.250 (1) | 0.250 (2) | 0.000 (2) | 0.500 (1) | 0.333 (2) | 0.000 (2) | 0.357 (2) | 0.000 (2) | 0.286 (2) | 0.000 (1) |

SE-ResNeXt-SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv