This table lists the benchmark results for the 3D semantic label scenario.


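Each cell shows the score followed by the method's rank for that column in parentheses.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SparseConvNet | | 0.725 (1) | 0.647 (5) | 0.821 (1) | 0.846 (1) | 0.721 (1) | 0.869 (1) | 0.533 (1) | 0.754 (4) | 0.603 (2) | 0.614 (2) | 0.955 (1) | 0.572 (1) | 0.325 (1) | 0.710 (1) | 0.870 (2) | 0.724 (1) | 0.823 (1) | 0.628 (2) | 0.934 (1) | 0.865 (1) | 0.683 (1) |
| KP-FCNN | | 0.694 (2) | 0.849 (1) | 0.770 (3) | 0.810 (2) | 0.685 (2) | 0.813 (3) | 0.438 (3) | 0.791 (2) | 0.566 (3) | 0.616 (1) | 0.944 (3) | 0.500 (3) | 0.216 (6) | 0.559 (4) | 0.880 (1) | 0.690 (2) | 0.758 (3) | 0.627 (3) | 0.922 (2) | 0.832 (2) | 0.613 (3) |
| MinkowskiNet34 | | 0.679 (3) | 0.811 (2) | 0.734 (4) | 0.739 (4) | 0.641 (3) | 0.804 (4) | 0.413 (6) | 0.759 (3) | 0.696 (1) | 0.545 (4) | 0.938 (7) | 0.518 (2) | 0.141 (15) | 0.623 (2) | 0.757 (3) | 0.680 (3) | 0.723 (4) | 0.684 (1) | 0.896 (3) | 0.821 (3) | 0.651 (2) |
| joint point-based | | 0.634 (4) | 0.614 (7) | 0.778 (2) | 0.667 (6) | 0.633 (4) | 0.825 (2) | 0.420 (5) | 0.804 (1) | 0.467 (6) | 0.561 (3) | 0.951 (2) | 0.494 (4) | 0.291 (2) | 0.566 (3) | 0.458 (7) | 0.579 (4) | 0.764 (2) | 0.559 (4) | 0.838 (4) | 0.814 (4) | 0.598 (4) |
| TextureNet | | 0.566 (5) | 0.672 (3) | 0.664 (7) | 0.671 (5) | 0.494 (6) | 0.719 (6) | 0.445 (2) | 0.678 (6) | 0.411 (11) | 0.396 (8) | 0.935 (8) | 0.356 (8) | 0.225 (4) | 0.412 (9) | 0.535 (6) | 0.565 (5) | 0.636 (8) | 0.464 (9) | 0.794 (8) | 0.680 (12) | 0.568 (5) |
| DVVNet | | 0.562 (6) | 0.648 (4) | 0.700 (5) | 0.770 (3) | 0.586 (5) | 0.687 (10) | 0.333 (8) | 0.650 (7) | 0.514 (4) | 0.475 (5) | 0.906 (14) | 0.359 (7) | 0.223 (5) | 0.340 (11) | 0.442 (8) | 0.422 (13) | 0.668 (5) | 0.501 (7) | 0.708 (11) | 0.779 (5) | 0.534 (8) |
| PointConv | | 0.556 (7) | 0.636 (6) | 0.640 (10) | 0.574 (11) | 0.472 (8) | 0.739 (5) | 0.430 (4) | 0.433 (10) | 0.418 (10) | 0.445 (7) | 0.944 (3) | 0.372 (6) | 0.185 (10) | 0.464 (7) | 0.575 (5) | 0.540 (6) | 0.639 (7) | 0.505 (6) | 0.827 (5) | 0.762 (6) | 0.515 (9) |
| PanopticFusion-label | | 0.529 (8) | 0.491 (14) | 0.688 (6) | 0.604 (9) | 0.386 (11) | 0.632 (14) | 0.225 (18) | 0.705 (5) | 0.434 (8) | 0.293 (13) | 0.815 (17) | 0.348 (9) | 0.241 (3) | 0.499 (6) | 0.669 (4) | 0.507 (7) | 0.649 (6) | 0.442 (12) | 0.796 (7) | 0.602 (16) | 0.561 (6) |
| 3DMV, FTSDF | | 0.501 (9) | 0.558 (11) | 0.608 (13) | 0.424 (17) | 0.478 (7) | 0.690 (9) | 0.246 (14) | 0.586 (8) | 0.468 (5) | 0.450 (6) | 0.911 (12) | 0.394 (5) | 0.160 (12) | 0.438 (8) | 0.212 (14) | 0.432 (12) | 0.541 (12) | 0.475 (8) | 0.742 (10) | 0.727 (8) | 0.477 (11) |
| PCNN | | 0.498 (10) | 0.559 (10) | 0.644 (9) | 0.560 (12) | 0.420 (10) | 0.711 (8) | 0.229 (16) | 0.414 (11) | 0.436 (7) | 0.352 (10) | 0.941 (6) | 0.324 (10) | 0.155 (13) | 0.238 (15) | 0.387 (9) | 0.493 (8) | 0.529 (13) | 0.509 (5) | 0.813 (6) | 0.751 (7) | 0.504 (10) |
| 3DMV | | 0.484 (11) | 0.484 (15) | 0.538 (15) | 0.643 (7) | 0.424 (9) | 0.606 (17) | 0.310 (9) | 0.574 (9) | 0.433 (9) | 0.378 (9) | 0.796 (18) | 0.301 (11) | 0.214 (7) | 0.537 (5) | 0.208 (15) | 0.472 (11) | 0.507 (16) | 0.413 (15) | 0.693 (12) | 0.602 (16) | 0.539 (7) |
| PointCNN with RGB | permissive | 0.458 (12) | 0.577 (9) | 0.611 (12) | 0.356 (19) | 0.321 (16) | 0.715 (7) | 0.299 (11) | 0.376 (14) | 0.328 (16) | 0.319 (11) | 0.944 (3) | 0.285 (13) | 0.164 (11) | 0.216 (17) | 0.229 (13) | 0.484 (10) | 0.545 (11) | 0.456 (10) | 0.755 (9) | 0.709 (9) | 0.475 (12) |
| SurfaceConvPF | | 0.442 (13) | 0.505 (13) | 0.622 (11) | 0.380 (18) | 0.342 (15) | 0.654 (12) | 0.227 (17) | 0.397 (13) | 0.367 (14) | 0.276 (15) | 0.924 (10) | 0.240 (15) | 0.198 (8) | 0.359 (10) | 0.262 (11) | 0.366 (15) | 0.581 (9) | 0.435 (13) | 0.640 (14) | 0.668 (13) | 0.398 (14) |
| PNET2 | | 0.442 (13) | 0.548 (12) | 0.548 (14) | 0.597 (10) | 0.363 (13) | 0.628 (15) | 0.300 (10) | 0.292 (15) | 0.374 (13) | 0.307 (12) | 0.881 (16) | 0.268 (14) | 0.186 (9) | 0.238 (15) | 0.204 (16) | 0.407 (14) | 0.506 (17) | 0.449 (11) | 0.667 (13) | 0.620 (15) | 0.462 (13) |
| Tangent Convolutions | permissive | 0.438 (15) | 0.437 (17) | 0.646 (8) | 0.474 (14) | 0.369 (12) | 0.645 (13) | 0.353 (7) | 0.258 (17) | 0.282 (18) | 0.279 (14) | 0.918 (11) | 0.298 (12) | 0.147 (14) | 0.283 (12) | 0.294 (10) | 0.487 (9) | 0.562 (10) | 0.427 (14) | 0.619 (15) | 0.633 (14) | 0.352 (16) |
| SPLAT Net | copyleft | 0.393 (16) | 0.472 (16) | 0.511 (16) | 0.606 (8) | 0.311 (17) | 0.656 (11) | 0.245 (15) | 0.405 (12) | 0.328 (16) | 0.197 (18) | 0.927 (9) | 0.227 (17) | 0.000 (20) | 0.001 (20) | 0.249 (12) | 0.271 (20) | 0.510 (14) | 0.383 (17) | 0.593 (16) | 0.699 (10) | 0.267 (18) |
| ScanNet+FTSDF | | 0.383 (17) | 0.297 (19) | 0.491 (17) | 0.432 (16) | 0.358 (14) | 0.612 (16) | 0.274 (12) | 0.116 (19) | 0.411 (11) | 0.265 (16) | 0.904 (15) | 0.229 (16) | 0.079 (18) | 0.250 (13) | 0.185 (17) | 0.320 (18) | 0.510 (14) | 0.385 (16) | 0.548 (17) | 0.597 (18) | 0.394 (15) |
| PointNet++ | permissive | 0.339 (18) | 0.584 (8) | 0.478 (18) | 0.458 (15) | 0.256 (19) | 0.360 (20) | 0.250 (13) | 0.247 (18) | 0.278 (19) | 0.261 (17) | 0.677 (20) | 0.183 (18) | 0.117 (16) | 0.212 (18) | 0.145 (19) | 0.364 (16) | 0.346 (20) | 0.232 (20) | 0.548 (17) | 0.523 (19) | 0.252 (19) |
| SSC-UNet | permissive | 0.308 (19) | 0.353 (18) | 0.290 (20) | 0.278 (20) | 0.166 (20) | 0.553 (18) | 0.169 (20) | 0.286 (16) | 0.147 (20) | 0.148 (20) | 0.908 (13) | 0.182 (19) | 0.064 (19) | 0.023 (19) | 0.018 (20) | 0.354 (17) | 0.363 (18) | 0.345 (18) | 0.546 (19) | 0.685 (11) | 0.278 (17) |
| ScanNet | permissive | 0.306 (20) | 0.203 (20) | 0.366 (19) | 0.501 (13) | 0.311 (17) | 0.524 (19) | 0.211 (19) | 0.002 (20) | 0.342 (15) | 0.189 (19) | 0.786 (19) | 0.145 (20) | 0.102 (17) | 0.245 (14) | 0.152 (18) | 0.318 (19) | 0.348 (19) | 0.300 (19) | 0.460 (20) | 0.437 (20) | 0.182 (20) |

Method references:
- 3DMV: Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- PointCNN with RGB: Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
- SurfaceConvPF: Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
- Tangent Convolutions: Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
- SPLAT Net: Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
- PointNet++: Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.
- ScanNet: Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17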
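
The avg iou column appears to be the unweighted mean of the 20 per-class IoU scores. The page does not state this; it is an assumption inferred from the numbers. A minimal Python sketch checking it against the SparseConvNet row (scores copied from the table above; the variable names are illustrative only):

```python
# Per-class IoU scores for SparseConvNet, copied from the 3D semantic label table.
sparseconvnet_iou = {
    "bathtub": 0.647, "bed": 0.821, "bookshelf": 0.846, "cabinet": 0.721,
    "chair": 0.869, "counter": 0.533, "curtain": 0.754, "desk": 0.603,
    "door": 0.614, "floor": 0.955, "otherfurniture": 0.572, "picture": 0.325,
    "refrigerator": 0.710, "shower curtain": 0.870, "sink": 0.724,
    "sofa": 0.823, "table": 0.628, "toilet": 0.934, "wall": 0.865,
    "window": 0.683,
}
reported_avg_iou = 0.725  # value listed in the avg iou column

# Assumption: avg iou is the plain (unweighted) mean over the 20 classes.
mean_iou = sum(sparseconvnet_iou.values()) / len(sparseconvnet_iou)

# The inputs are rounded to three decimals, so compare with a small tolerance
# instead of testing for exact equality.
matches = abs(mean_iou - reported_avg_iou) < 0.0005

print(f"mean of per-class IoU: {mean_iou:.5f}")
print(f"reported avg iou:      {reported_avg_iou}")
print(f"consistent within rounding: {matches}")
```

The same check can be repeated for any other row by swapping in its per-class scores; a mismatch larger than the rounding tolerance would suggest the average is computed differently for that table.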

This table lists the benchmark results for the 3D semantic instance scenario.




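| Method | Info | avg ap 25% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet-backbone | | 0.695 (1) | 1.000 (1) | 0.855 (2) | 0.579 (5) | 0.589 (5) | 0.735 (4) | 0.484 (1) | 0.588 (2) | 0.856 (1) | 0.634 (1) | 0.571 (2) | 0.298 (4) | 0.500 (4) | 1.000 (1) | 0.824 (1) | 0.818 (2) | 0.702 (1) | 0.935 (6) | 0.545 (2) |
| PanopticFusion-inst | | 0.693 (2) | 1.000 (1) | 0.852 (3) | 0.655 (3) | 0.616 (3) | 0.788 (1) | 0.334 (5) | 0.763 (1) | 0.771 (2) | 0.457 (6) | 0.555 (3) | 0.652 (1) | 0.518 (3) | 0.857 (2) | 0.765 (2) | 0.732 (6) | 0.631 (2) | 0.944 (4) | 0.577 (1) |
| MASC | permissive | 0.615 (3) | 0.711 (6) | 0.802 (4) | 0.540 (6) | 0.757 (1) | 0.777 (2) | 0.029 (11) | 0.577 (3) | 0.588 (4) | 0.521 (4) | 0.600 (1) | 0.436 (3) | 0.534 (2) | 0.697 (5) | 0.616 (5) | 0.838 (1) | 0.526 (4) | 0.980 (1) | 0.534 (3) |
| UNet-backbone | | 0.605 (4) | 1.000 (1) | 0.909 (1) | 0.764 (1) | 0.603 (4) | 0.704 (5) | 0.415 (3) | 0.301 (6) | 0.548 (5) | 0.461 (5) | 0.394 (5) | 0.267 (5) | 0.386 (5) | 0.857 (2) | 0.649 (4) | 0.817 (3) | 0.504 (5) | 0.959 (2) | 0.356 (7) |
| 3D-SIS | | 0.558 (5) | 1.000 (1) | 0.773 (5) | 0.614 (4) | 0.503 (6) | 0.691 (6) | 0.200 (8) | 0.412 (5) | 0.498 (6) | 0.546 (3) | 0.311 (6) | 0.103 (8) | 0.600 (1) | 0.857 (2) | 0.382 (8) | 0.799 (4) | 0.445 (6) | 0.938 (5) | 0.371 (5) |
| R-PointNet | | 0.544 (6) | 0.500 (11) | 0.655 (8) | 0.661 (2) | 0.663 (2) | 0.765 (3) | 0.432 (2) | 0.214 (8) | 0.612 (3) | 0.584 (2) | 0.499 (4) | 0.204 (6) | 0.286 (7) | 0.429 (7) | 0.655 (3) | 0.650 (8) | 0.539 (3) | 0.950 (3) | 0.499 (4) |
| 3D-BEVIS | | 0.401 (7) | 0.667 (7) | 0.687 (7) | 0.419 (10) | 0.137 (11) | 0.587 (8) | 0.188 (9) | 0.235 (7) | 0.359 (8) | 0.211 (9) | 0.093 (11) | 0.080 (9) | 0.311 (6) | 0.571 (6) | 0.382 (8) | 0.754 (5) | 0.300 (9) | 0.874 (7) | 0.357 (6) |
| Sgpn_scannet | | 0.390 (8) | 0.556 (10) | 0.636 (9) | 0.493 (7) | 0.353 (7) | 0.539 (9) | 0.271 (7) | 0.160 (10) | 0.450 (7) | 0.359 (7) | 0.178 (8) | 0.146 (7) | 0.250 (10) | 0.143 (9) | 0.347 (11) | 0.698 (7) | 0.436 (7) | 0.667 (8) | 0.331 (9) |
| Seg-Cluster | permissive | 0.380 (9) | 0.625 (9) | 0.420 (10) | 0.456 (9) | 0.296 (8) | 0.473 (10) | 0.390 (4) | 0.433 (4) | 0.293 (10) | 0.322 (8) | 0.247 (7) | 0.066 (10) | 0.264 (8) | 0.325 (8) | 0.388 (7) | 0.486 (10) | 0.401 (8) | 0.614 (9) | 0.341 (8) |
| MTML | | 0.320 (10) | 0.667 (7) | 0.695 (6) | 0.467 (8) | 0.226 (10) | 0.609 (7) | 0.141 (10) | 0.176 (9) | 0.314 (9) | 0.142 (11) | 0.094 (10) | 0.013 (11) | 0.257 (9) | 0.143 (9) | 0.377 (10) | 0.578 (9) | 0.275 (10) | 0.592 (10) | 0.003 (11) |
| MaskRCNN 2d->3d Proj | | 0.261 (11) | 0.903 (5) | 0.081 (11) | 0.008 (11) | 0.233 (9) | 0.175 (11) | 0.280 (6) | 0.106 (11) | 0.150 (11) | 0.203 (10) | 0.175 (9) | 0.480 (2) | 0.218 (11) | 0.143 (9) | 0.542 (6) | 0.404 (11) | 0.153 (11) | 0.393 (11) | 0.049 (10) |

Method references:
- MASC: Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.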

This table lists the benchmark results for the 2D semantic label scenario.


| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSMA | copyleft | 0.577 (1) | 0.695 (1) | 0.716 (2) | 0.439 (4) | 0.563 (1) | 0.314 (3) | 0.444 (1) | 0.719 (1) | 0.551 (1) | 0.503 (1) | 0.887 (3) | 0.346 (1) | 0.348 (2) | 0.603 (1) | 0.353 (3) | 0.709 (1) | 0.600 (2) | 0.457 (1) | 0.901 (1) | 0.786 (1) | 0.599 (1) |
| FuseNet | permissive | 0.521 (2) | 0.591 (3) | 0.682 (3) | 0.220 (7) | 0.488 (3) | 0.279 (4) | 0.344 (5) | 0.610 (3) | 0.461 (3) | 0.475 (2) | 0.910 (1) | 0.293 (2) | 0.447 (1) | 0.512 (3) | 0.397 (2) | 0.618 (2) | 0.567 (4) | 0.452 (2) | 0.734 (5) | 0.782 (2) | 0.566 (2) |
| AdapNet++ | copyleft | 0.503 (3) | 0.613 (2) | 0.722 (1) | 0.418 (5) | 0.358 (7) | 0.337 (2) | 0.370 (4) | 0.479 (5) | 0.443 (4) | 0.368 (5) | 0.907 (2) | 0.207 (5) | 0.213 (6) | 0.464 (5) | 0.525 (1) | 0.618 (2) | 0.657 (1) | 0.450 (3) | 0.788 (3) | 0.721 (4) | 0.408 (6) |
| 3DMV (2d proj) | | 0.498 (4) | 0.481 (5) | 0.612 (4) | 0.579 (2) | 0.456 (4) | 0.343 (1) | 0.384 (2) | 0.623 (2) | 0.525 (2) | 0.381 (4) | 0.845 (4) | 0.254 (4) | 0.264 (4) | 0.557 (2) | 0.182 (5) | 0.581 (5) | 0.598 (3) | 0.429 (4) | 0.760 (4) | 0.661 (6) | 0.446 (5) |
| ILC-PSPNet | | 0.475 (5) | 0.490 (4) | 0.581 (5) | 0.289 (6) | 0.507 (2) | 0.067 (7) | 0.379 (3) | 0.610 (3) | 0.417 (6) | 0.435 (3) | 0.822 (6) | 0.278 (3) | 0.267 (3) | 0.503 (4) | 0.228 (4) | 0.616 (4) | 0.533 (5) | 0.375 (5) | 0.820 (2) | 0.729 (3) | 0.560 (3) |
| Enet (reimpl) | | 0.376 (6) | 0.264 (7) | 0.452 (7) | 0.452 (3) | 0.365 (5) | 0.181 (5) | 0.143 (7) | 0.456 (6) | 0.409 (7) | 0.346 (6) | 0.769 (7) | 0.164 (6) | 0.218 (5) | 0.359 (6) | 0.123 (7) | 0.403 (7) | 0.381 (7) | 0.313 (7) | 0.571 (6) | 0.685 (5) | 0.472 (4) |
| ScanNet (2d proj) | permissive | 0.330 (7) | 0.293 (6) | 0.521 (6) | 0.657 (1) | 0.361 (6) | 0.161 (6) | 0.250 (6) | 0.004 (7) | 0.440 (5) | 0.183 (7) | 0.836 (5) | 0.125 (7) | 0.060 (7) | 0.319 (7) | 0.132 (6) | 0.417 (6) | 0.412 (6) | 0.344 (6) | 0.541 (7) | 0.427 (7) | 0.109 (7) |

Method references:
- SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
- AdapNet++: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
- 3DMV (2d proj): Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
- Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
- ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17

This table lists the benchmark results for the 2D semantic instance scenario.




| Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MaskRCNN_ScanNet | permissive | 0.119 (1) | 0.129 (1) | 0.212 (1) | 0.002 (1) | 0.112 (1) | 0.148 (1) | 0.014 (1) | 0.205 (1) | 0.044 (1) | 0.066 (1) | 0.078 (1) | 0.095 (1) | 0.142 (1) | 0.030 (1) | 0.128 (1) | 0.139 (1) | 0.080 (1) | 0.459 (1) | 0.057 (1) |

Method references:
- MaskRCNN_ScanNet: Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17

This table lists the benchmark results for the scene type classification scenario.




| Method | Info | avg recall | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SE-ResNeXt-SSMA | | 0.498 (1) | 0.000 (2) | 0.812 (1) | 0.941 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.500 (1) | 0.429 (2) | 0.500 (1) | 0.667 (1) | 0.500 (1) | 0.625 (1) | 0.000 (1) |
| resnet50_scannet | | 0.353 (2) | 0.250 (1) | 0.812 (1) | 0.529 (2) | 0.500 (1) | 0.500 (1) | 0.000 (2) | 0.500 (1) | 0.571 (1) | 0.000 (2) | 0.556 (2) | 0.000 (2) | 0.375 (2) | 0.000 (1) |

Method references:
- SE-ResNeXt-SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv