This table lists the benchmark results for the 3D semantic label scenario.


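Each cell gives the score followed by the method's rank for that column in parentheses.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SparseConvNet | | 0.725 (1) | 0.647 (8) | 0.821 (1) | 0.846 (1) | 0.721 (1) | 0.869 (1) | 0.533 (1) | 0.754 (4) | 0.603 (4) | 0.614 (1) | 0.955 (1) | 0.572 (1) | 0.325 (1) | 0.710 (2) | 0.870 (2) | 0.724 (1) | 0.823 (1) | 0.628 (3) | 0.934 (1) | 0.865 (1) | 0.683 (1) |
| MinkowskiNet | | 0.721 (2) | 0.837 (2) | 0.804 (2) | 0.800 (2) | 0.721 (1) | 0.843 (2) | 0.460 (3) | 0.835 (1) | 0.647 (1) | 0.597 (2) | 0.953 (2) | 0.542 (2) | 0.214 (8) | 0.746 (1) | 0.912 (1) | 0.705 (2) | 0.771 (3) | 0.640 (2) | 0.876 (5) | 0.842 (2) | 0.672 (2) |
| KP-FCNN | | 0.684 (3) | 0.847 (1) | 0.758 (4) | 0.784 (3) | 0.647 (3) | 0.814 (4) | 0.473 (2) | 0.772 (3) | 0.605 (3) | 0.594 (3) | 0.935 (9) | 0.450 (5) | 0.181 (13) | 0.587 (4) | 0.805 (3) | 0.690 (3) | 0.785 (2) | 0.614 (4) | 0.882 (3) | 0.819 (3) | 0.632 (3) |
| MVPNet | | 0.641 (4) | 0.831 (3) | 0.715 (5) | 0.671 (5) | 0.590 (6) | 0.781 (5) | 0.394 (8) | 0.679 (7) | 0.642 (2) | 0.553 (5) | 0.937 (8) | 0.462 (4) | 0.256 (3) | 0.649 (3) | 0.406 (10) | 0.626 (5) | 0.691 (5) | 0.666 (1) | 0.877 (4) | 0.792 (6) | 0.608 (4) |
| joint point-based | | 0.634 (5) | 0.614 (10) | 0.778 (3) | 0.667 (7) | 0.633 (4) | 0.825 (3) | 0.420 (6) | 0.804 (2) | 0.467 (10) | 0.561 (4) | 0.951 (3) | 0.494 (3) | 0.291 (2) | 0.566 (5) | 0.458 (8) | 0.579 (6) | 0.764 (4) | 0.559 (6) | 0.838 (6) | 0.814 (4) | 0.598 (6) |
| HPEIN | | 0.618 (6) | 0.729 (4) | 0.668 (9) | 0.647 (8) | 0.597 (5) | 0.766 (6) | 0.414 (7) | 0.680 (6) | 0.520 (5) | 0.525 (6) | 0.946 (4) | 0.432 (6) | 0.215 (7) | 0.493 (8) | 0.599 (5) | 0.638 (4) | 0.617 (11) | 0.570 (5) | 0.897 (2) | 0.806 (5) | 0.605 (5) |
| TextureNet | | 0.566 (7) | 0.672 (6) | 0.664 (10) | 0.671 (5) | 0.494 (8) | 0.719 (8) | 0.445 (4) | 0.678 (8) | 0.411 (15) | 0.396 (11) | 0.935 (9) | 0.356 (10) | 0.225 (5) | 0.412 (11) | 0.535 (7) | 0.565 (8) | 0.636 (9) | 0.464 (12) | 0.794 (10) | 0.680 (15) | 0.568 (7) |
| DVVNet | | 0.562 (8) | 0.648 (7) | 0.700 (6) | 0.770 (4) | 0.586 (7) | 0.687 (12) | 0.333 (10) | 0.650 (9) | 0.514 (6) | 0.475 (7) | 0.906 (18) | 0.359 (9) | 0.223 (6) | 0.340 (13) | 0.442 (9) | 0.422 (17) | 0.668 (6) | 0.501 (9) | 0.708 (15) | 0.779 (7) | 0.534 (10) |
| PointConv | | 0.556 (9) | 0.636 (9) | 0.640 (13) | 0.574 (15) | 0.472 (10) | 0.739 (7) | 0.430 (5) | 0.433 (12) | 0.418 (14) | 0.445 (9) | 0.944 (5) | 0.372 (8) | 0.185 (12) | 0.464 (9) | 0.575 (6) | 0.540 (9) | 0.639 (8) | 0.505 (8) | 0.827 (7) | 0.762 (8) | 0.515 (11) |
| PanopticFusion-label | | 0.529 (10) | 0.491 (18) | 0.688 (7) | 0.604 (12) | 0.386 (14) | 0.632 (18) | 0.225 (22) | 0.705 (5) | 0.434 (12) | 0.293 (16) | 0.815 (21) | 0.348 (11) | 0.241 (4) | 0.499 (7) | 0.669 (4) | 0.507 (10) | 0.649 (7) | 0.442 (16) | 0.796 (9) | 0.602 (20) | 0.561 (8) |
| LAP-D | | 0.504 (11) | 0.604 (11) | 0.679 (8) | 0.608 (10) | 0.464 (11) | 0.678 (14) | 0.308 (12) | 0.386 (16) | 0.500 (7) | 0.397 (10) | 0.935 (9) | 0.332 (12) | 0.086 (20) | 0.212 (21) | 0.228 (17) | 0.579 (6) | 0.628 (10) | 0.499 (10) | 0.769 (11) | 0.730 (10) | 0.452 (16) |
| 3DMV, FTSDF | | 0.501 (12) | 0.558 (15) | 0.608 (16) | 0.424 (21) | 0.478 (9) | 0.690 (11) | 0.246 (18) | 0.586 (10) | 0.468 (9) | 0.450 (8) | 0.911 (16) | 0.394 (7) | 0.160 (15) | 0.438 (10) | 0.212 (18) | 0.432 (16) | 0.541 (16) | 0.475 (11) | 0.742 (13) | 0.727 (11) | 0.477 (13) |
| PCNN | | 0.498 (13) | 0.559 (14) | 0.644 (12) | 0.560 (16) | 0.420 (13) | 0.711 (10) | 0.229 (20) | 0.414 (13) | 0.436 (11) | 0.352 (13) | 0.941 (7) | 0.324 (13) | 0.155 (16) | 0.238 (17) | 0.387 (11) | 0.493 (11) | 0.529 (17) | 0.509 (7) | 0.813 (8) | 0.751 (9) | 0.504 (12) |
| 3DMV | | 0.484 (14) | 0.484 (19) | 0.538 (19) | 0.643 (9) | 0.424 (12) | 0.606 (21) | 0.310 (11) | 0.574 (11) | 0.433 (13) | 0.378 (12) | 0.796 (22) | 0.301 (14) | 0.214 (8) | 0.537 (6) | 0.208 (19) | 0.472 (15) | 0.507 (20) | 0.413 (19) | 0.693 (16) | 0.602 (20) | 0.539 (9) |
| PointCNN with RGB | permissive | 0.458 (15) | 0.577 (13) | 0.611 (15) | 0.356 (23) | 0.321 (20) | 0.715 (9) | 0.299 (14) | 0.376 (17) | 0.328 (20) | 0.319 (14) | 0.944 (5) | 0.285 (16) | 0.164 (14) | 0.216 (20) | 0.229 (16) | 0.484 (13) | 0.545 (15) | 0.456 (14) | 0.755 (12) | 0.709 (12) | 0.475 (14) |
| FCPN | permissive | 0.447 (16) | 0.679 (5) | 0.604 (17) | 0.578 (14) | 0.380 (15) | 0.682 (13) | 0.291 (15) | 0.106 (23) | 0.483 (8) | 0.258 (21) | 0.920 (14) | 0.258 (18) | 0.025 (23) | 0.231 (19) | 0.325 (12) | 0.480 (14) | 0.560 (14) | 0.463 (13) | 0.725 (14) | 0.666 (17) | 0.231 (23) |
| SurfaceConvPF | | 0.442 (17) | 0.505 (17) | 0.622 (14) | 0.380 (22) | 0.342 (19) | 0.654 (16) | 0.227 (21) | 0.397 (15) | 0.367 (18) | 0.276 (18) | 0.924 (13) | 0.240 (19) | 0.198 (10) | 0.359 (12) | 0.262 (14) | 0.366 (19) | 0.581 (12) | 0.435 (17) | 0.640 (18) | 0.668 (16) | 0.398 (17) |
| PNET2 | | 0.442 (17) | 0.548 (16) | 0.548 (18) | 0.597 (13) | 0.363 (17) | 0.628 (19) | 0.300 (13) | 0.292 (18) | 0.374 (17) | 0.307 (15) | 0.881 (20) | 0.268 (17) | 0.186 (11) | 0.238 (17) | 0.204 (20) | 0.407 (18) | 0.506 (21) | 0.449 (15) | 0.667 (17) | 0.620 (19) | 0.462 (15) |
| Tangent Convolutions | permissive | 0.438 (19) | 0.437 (21) | 0.646 (11) | 0.474 (18) | 0.369 (16) | 0.645 (17) | 0.353 (9) | 0.258 (20) | 0.282 (22) | 0.279 (17) | 0.918 (15) | 0.298 (15) | 0.147 (17) | 0.283 (14) | 0.294 (13) | 0.487 (12) | 0.562 (13) | 0.427 (18) | 0.619 (19) | 0.633 (18) | 0.352 (19) |
| SPLAT Net | copyleft | 0.393 (20) | 0.472 (20) | 0.511 (20) | 0.606 (11) | 0.311 (21) | 0.656 (15) | 0.245 (19) | 0.405 (14) | 0.328 (20) | 0.197 (22) | 0.927 (12) | 0.227 (21) | 0.000 (24) | 0.001 (24) | 0.249 (15) | 0.271 (24) | 0.510 (18) | 0.383 (21) | 0.593 (20) | 0.699 (13) | 0.267 (21) |
| ScanNet+FTSDF | | 0.383 (21) | 0.297 (23) | 0.491 (21) | 0.432 (20) | 0.358 (18) | 0.612 (20) | 0.274 (16) | 0.116 (22) | 0.411 (15) | 0.265 (19) | 0.904 (19) | 0.229 (20) | 0.079 (21) | 0.250 (15) | 0.185 (21) | 0.320 (22) | 0.510 (18) | 0.385 (20) | 0.548 (21) | 0.597 (22) | 0.394 (18) |
| PointNet++ | permissive | 0.339 (22) | 0.584 (12) | 0.478 (22) | 0.458 (19) | 0.256 (23) | 0.360 (24) | 0.250 (17) | 0.247 (21) | 0.278 (23) | 0.261 (20) | 0.677 (24) | 0.183 (22) | 0.117 (18) | 0.212 (21) | 0.145 (23) | 0.364 (20) | 0.346 (24) | 0.232 (24) | 0.548 (21) | 0.523 (23) | 0.252 (22) |
| SSC-UNet | permissive | 0.308 (23) | 0.353 (22) | 0.290 (24) | 0.278 (24) | 0.166 (24) | 0.553 (22) | 0.169 (24) | 0.286 (19) | 0.147 (24) | 0.148 (24) | 0.908 (17) | 0.182 (23) | 0.064 (22) | 0.023 (23) | 0.018 (24) | 0.354 (21) | 0.363 (22) | 0.345 (22) | 0.546 (23) | 0.685 (14) | 0.278 (20) |
| ScanNet | permissive | 0.306 (24) | 0.203 (24) | 0.366 (23) | 0.501 (17) | 0.311 (21) | 0.524 (23) | 0.211 (23) | 0.002 (24) | 0.342 (19) | 0.189 (23) | 0.786 (23) | 0.145 (24) | 0.102 (19) | 0.245 (16) | 0.152 (22) | 0.318 (23) | 0.348 (23) | 0.300 (23) | 0.460 (24) | 0.437 (24) | 0.182 (24) |

References:

- MinkowskiNet: C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
- PanopticFusion-label: Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
- 3DMV: Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- PointCNN with RGB: Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
- FCPN: Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, Federico Tombari: Fully-Convolutional Point Networks for Large-Scale Point Clouds. ECCV 2018
- SurfaceConvPF: Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
- Tangent Convolutions: Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
- SPLAT Net: Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
- PointNet++: Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.
- ScanNet: Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17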
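
The page does not state how the "avg iou" column is derived. The listed numbers are consistent with an unweighted mean over the 20 per-class IoU values, and the following minimal Python sketch checks that assumption for the SparseConvNet row. The variable and function names are illustrative only and are not part of any benchmark tooling.

```python
# Per-class IoU values for the SparseConvNet row of the 3D semantic label
# table, in column order (bathtub through window, 20 classes).
sparseconvnet_iou = [
    0.647, 0.821, 0.846, 0.721, 0.869, 0.533, 0.754, 0.603, 0.614, 0.955,
    0.572, 0.325, 0.710, 0.870, 0.724, 0.823, 0.628, 0.934, 0.865, 0.683,
]
reported_avg_iou = 0.725  # value listed in the "avg iou" column


def unweighted_mean(values):
    """Plain arithmetic mean; every class counts equally (an assumption)."""
    return sum(values) / len(values)


# Round to three decimals, the precision used in the table.
recomputed_avg_iou = round(unweighted_mean(sparseconvnet_iou), 3)
print(recomputed_avg_iou, reported_avg_iou)
```

Because the table values are already rounded to three decimals, a recomputed average can differ from the listed one in the last digit for other rows.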

This table lists the benchmark results for the 3D semantic instance scenario.




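| Method | Info | avg ap 25% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet-backbone | | 0.695 (1) | 1.000 (1) | 0.855 (3) | 0.579 (6) | 0.589 (6) | 0.735 (5) | 0.484 (1) | 0.588 (3) | 0.856 (1) | 0.634 (1) | 0.571 (2) | 0.298 (4) | 0.500 (5) | 1.000 (1) | 0.824 (1) | 0.818 (2) | 0.702 (1) | 0.935 (7) | 0.545 (3) |
| PanopticFusion-inst | | 0.693 (2) | 1.000 (1) | 0.852 (4) | 0.655 (4) | 0.616 (3) | 0.788 (2) | 0.334 (5) | 0.763 (1) | 0.771 (2) | 0.457 (7) | 0.555 (3) | 0.652 (1) | 0.518 (4) | 0.857 (3) | 0.765 (2) | 0.732 (7) | 0.631 (3) | 0.944 (4) | 0.577 (2) |
| MTML | | 0.657 (3) | 1.000 (1) | 0.883 (2) | 0.761 (2) | 0.590 (5) | 0.821 (1) | 0.218 (8) | 0.728 (2) | 0.632 (3) | 0.581 (3) | 0.496 (5) | 0.160 (7) | 0.583 (2) | 1.000 (1) | 0.351 (10) | 0.805 (4) | 0.693 (2) | 0.944 (4) | 0.585 (1) |
| MASC | permissive | 0.615 (4) | 0.711 (7) | 0.802 (5) | 0.540 (7) | 0.757 (1) | 0.777 (3) | 0.029 (11) | 0.577 (4) | 0.588 (5) | 0.521 (5) | 0.600 (1) | 0.436 (3) | 0.534 (3) | 0.697 (6) | 0.616 (5) | 0.838 (1) | 0.526 (5) | 0.980 (1) | 0.534 (4) |
| UNet-backbone | | 0.605 (5) | 1.000 (1) | 0.909 (1) | 0.764 (1) | 0.603 (4) | 0.704 (6) | 0.415 (3) | 0.301 (7) | 0.548 (6) | 0.461 (6) | 0.394 (6) | 0.267 (5) | 0.386 (6) | 0.857 (3) | 0.649 (4) | 0.817 (3) | 0.504 (6) | 0.959 (2) | 0.356 (8) |
| 3D-SIS | | 0.558 (6) | 1.000 (1) | 0.773 (6) | 0.614 (5) | 0.503 (7) | 0.691 (7) | 0.200 (9) | 0.412 (6) | 0.498 (7) | 0.546 (4) | 0.311 (7) | 0.103 (9) | 0.600 (1) | 0.857 (3) | 0.382 (8) | 0.799 (5) | 0.445 (7) | 0.938 (6) | 0.371 (6) |
| R-PointNet | | 0.544 (7) | 0.500 (11) | 0.655 (8) | 0.661 (3) | 0.663 (2) | 0.765 (4) | 0.432 (2) | 0.214 (9) | 0.612 (4) | 0.584 (2) | 0.499 (4) | 0.204 (6) | 0.286 (8) | 0.429 (8) | 0.655 (3) | 0.650 (9) | 0.539 (4) | 0.950 (3) | 0.499 (5) |
| 3D-BEVIS | | 0.401 (8) | 0.667 (8) | 0.687 (7) | 0.419 (10) | 0.137 (11) | 0.587 (8) | 0.188 (10) | 0.235 (8) | 0.359 (9) | 0.211 (10) | 0.093 (11) | 0.080 (10) | 0.311 (7) | 0.571 (7) | 0.382 (8) | 0.754 (6) | 0.300 (10) | 0.874 (8) | 0.357 (7) |
| Sgpn_scannet | | 0.390 (9) | 0.556 (10) | 0.636 (9) | 0.493 (8) | 0.353 (8) | 0.539 (9) | 0.271 (7) | 0.160 (10) | 0.450 (8) | 0.359 (8) | 0.178 (9) | 0.146 (8) | 0.250 (10) | 0.143 (10) | 0.347 (11) | 0.698 (8) | 0.436 (8) | 0.667 (9) | 0.331 (10) |
| Seg-Cluster | permissive | 0.380 (10) | 0.625 (9) | 0.420 (10) | 0.456 (9) | 0.296 (9) | 0.473 (10) | 0.390 (4) | 0.433 (5) | 0.293 (10) | 0.322 (9) | 0.247 (8) | 0.066 (11) | 0.264 (9) | 0.325 (9) | 0.388 (7) | 0.486 (10) | 0.401 (9) | 0.614 (10) | 0.341 (9) |
| MaskRCNN 2d->3d Proj | | 0.261 (11) | 0.903 (6) | 0.081 (11) | 0.008 (11) | 0.233 (10) | 0.175 (11) | 0.280 (6) | 0.106 (11) | 0.150 (11) | 0.203 (11) | 0.175 (10) | 0.480 (2) | 0.218 (11) | 0.143 (10) | 0.542 (6) | 0.404 (11) | 0.153 (11) | 0.393 (11) | 0.049 (11) |

References:

- PanopticFusion-inst: Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. arXiv
- MASC: Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
- 3D-BEVIS: Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation.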
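
The rank shown next to each score appears to follow standard competition ranking: tied scores share a rank, and the next lower score skips ahead by the number of tied entries. The sketch below reproduces the ranks of the bathtub column of the 3D semantic instance table under that assumption. The helper `competition_rank` is hypothetical and written for illustration; the page does not document how ranks are actually assigned, and they may be computed from unrounded scores.

```python
# Bathtub AP 25% scores from the 3D semantic instance table, in row order
# (ResNet-backbone first, MaskRCNN 2d->3d Proj last).
bathtub_scores = [
    1.000, 1.000, 1.000, 0.711, 1.000, 1.000,
    0.500, 0.667, 0.556, 0.625, 0.903,
]
# Ranks listed next to those scores in the table.
listed_ranks = [1, 1, 1, 7, 1, 1, 11, 8, 10, 9, 6]


def competition_rank(scores):
    """Rank = 1 + number of strictly higher scores (ties share a rank)."""
    return [1 + sum(other > score for other in scores) for score in scores]


computed_ranks = competition_rank(bathtub_scores)
print(computed_ranks == listed_ranks)
```

Five methods tie at 1.000 and share rank 1, so the next score, 0.903, receives rank 6 rather than rank 2.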

This table lists the benchmark results for the 2D semantic label scenario.


| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSMA | copyleft | 0.577 (1) | 0.695 (1) | 0.716 (2) | 0.439 (4) | 0.563 (1) | 0.314 (3) | 0.444 (1) | 0.719 (1) | 0.551 (1) | 0.503 (1) | 0.887 (3) | 0.346 (1) | 0.348 (2) | 0.603 (1) | 0.353 (3) | 0.709 (1) | 0.600 (2) | 0.457 (1) | 0.901 (1) | 0.786 (1) | 0.599 (1) |
| FuseNet | permissive | 0.521 (2) | 0.591 (3) | 0.682 (3) | 0.220 (7) | 0.488 (3) | 0.279 (4) | 0.344 (5) | 0.610 (3) | 0.461 (3) | 0.475 (2) | 0.910 (1) | 0.293 (2) | 0.447 (1) | 0.512 (3) | 0.397 (2) | 0.618 (2) | 0.567 (4) | 0.452 (2) | 0.734 (5) | 0.782 (2) | 0.566 (2) |
| AdapNet++ | copyleft | 0.503 (3) | 0.613 (2) | 0.722 (1) | 0.418 (5) | 0.358 (7) | 0.337 (2) | 0.370 (4) | 0.479 (5) | 0.443 (4) | 0.368 (5) | 0.907 (2) | 0.207 (5) | 0.213 (6) | 0.464 (5) | 0.525 (1) | 0.618 (2) | 0.657 (1) | 0.450 (3) | 0.788 (3) | 0.721 (4) | 0.408 (6) |
| 3DMV (2d proj) | | 0.498 (4) | 0.481 (5) | 0.612 (4) | 0.579 (2) | 0.456 (4) | 0.343 (1) | 0.384 (2) | 0.623 (2) | 0.525 (2) | 0.381 (4) | 0.845 (4) | 0.254 (4) | 0.264 (4) | 0.557 (2) | 0.182 (5) | 0.581 (5) | 0.598 (3) | 0.429 (4) | 0.760 (4) | 0.661 (6) | 0.446 (5) |
| ILC-PSPNet | | 0.475 (5) | 0.490 (4) | 0.581 (5) | 0.289 (6) | 0.507 (2) | 0.067 (7) | 0.379 (3) | 0.610 (3) | 0.417 (6) | 0.435 (3) | 0.822 (6) | 0.278 (3) | 0.267 (3) | 0.503 (4) | 0.228 (4) | 0.616 (4) | 0.533 (5) | 0.375 (5) | 0.820 (2) | 0.729 (3) | 0.560 (3) |
| Enet (reimpl) | | 0.376 (6) | 0.264 (7) | 0.452 (7) | 0.452 (3) | 0.365 (5) | 0.181 (5) | 0.143 (7) | 0.456 (6) | 0.409 (7) | 0.346 (6) | 0.769 (7) | 0.164 (6) | 0.218 (5) | 0.359 (6) | 0.123 (7) | 0.403 (7) | 0.381 (7) | 0.313 (7) | 0.571 (6) | 0.685 (5) | 0.472 (4) |
| ScanNet (2d proj) | permissive | 0.330 (7) | 0.293 (6) | 0.521 (6) | 0.657 (1) | 0.361 (6) | 0.161 (6) | 0.250 (6) | 0.004 (7) | 0.440 (5) | 0.183 (7) | 0.836 (5) | 0.125 (7) | 0.060 (7) | 0.319 (7) | 0.132 (6) | 0.417 (6) | 0.412 (6) | 0.344 (6) | 0.541 (7) | 0.427 (7) | 0.109 (7) |

References:

- SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
- AdapNet++: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- 3DMV (2d proj): Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
- ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17

This table lists the benchmark results for the 2D semantic instance scenario.




| Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MaskRCNN_ScanNet | permissive | 0.119 (1) | 0.129 (1) | 0.212 (1) | 0.002 (1) | 0.112 (1) | 0.148 (1) | 0.014 (1) | 0.205 (1) | 0.044 (1) | 0.066 (1) | 0.078 (1) | 0.095 (1) | 0.142 (1) | 0.030 (1) | 0.128 (1) | 0.139 (1) | 0.080 (1) | 0.459 (1) | 0.057 (1) |

References:

- MaskRCNN_ScanNet: Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17

This table lists the benchmark results for the scene type classification scenario.




| Method | Info | avg recall | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SE-ResNeXt-SSMA | | 0.498 (1) | 0.000 (2) | 0.812 (1) | 0.941 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.429 (2) | 0.500 (1) | 0.667 (1) | 0.500 (1) | 0.625 (1) | 0.000 (1) |
| resnet50_scannet | | 0.353 (2) | 0.250 (1) | 0.812 (1) | 0.529 (2) | 0.500 (1) | 0.500 (1) | 0.000 (2) | 0.500 (1) | 0.571 (1) | 0.000 (2) | 0.556 (2) | 0.000 (2) | 0.375 (2) | 0.000 (1) |

References:

- SE-ResNeXt-SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv