Presenting the ScanNet200 Benchmark

We present the ScanNet200 benchmark, which studies an order of magnitude more class categories than the previous version of ScanNet. The two benchmarks share the same scene geometry, but the surface annotations are parsed differently, which allows for a larger vocabulary and a more realistic setting for in-the-wild 3D understanding methods.

The ScanNet200 benchmark includes both finer-grained categories and a large number of previously unaddressed classes. This makes for a much more challenging setting: it covers the diversity of semantic classes naturally observed in the raw ScanNet RGB-D data, and the data also reflects naturally occurring class imbalances. The difference in category frequencies between ScanNet and ScanNet200 can be seen in the figure above.

ScanNet200 Benchmark

This table lists the benchmark results for the ScanNet200 3D semantic label scenario. Each cell gives a method's score followed by its rank in that column.




Method | Info | avg iou | head iou | common iou | tail iou | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | floor | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wall | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
Voltpermissive0.416 20.619 20.318 40.269 30.528 30.138 40.862 10.000 30.356 10.000 10.380 80.438 70.616 20.952 10.795 70.143 130.891 20.000 30.904 20.227 20.087 20.606 40.237 130.625 20.238 80.188 30.429 50.000 10.000 30.251 40.504 30.791 30.000 10.218 40.000 30.900 50.082 80.735 10.097 10.093 80.754 10.475 10.981 10.000 10.000 20.425 90.653 40.000 100.696 80.988 20.773 30.000 170.265 40.905 50.000 10.000 110.631 20.000 70.493 80.401 10.753 20.499 10.392 90.437 120.000 170.609 40.881 10.000 70.277 70.958 50.142 90.000 10.518 20.000 150.274 40.700 20.752 60.709 30.421 50.431 90.462 110.583 30.000 40.000 10.553 50.020 60.007 170.218 40.631 20.934 20.005 160.614 80.223 40.430 40.884 20.407 10.000 10.652 50.040 180.000 140.000 40.000 10.398 20.855 20.635 110.151 40.624 30.903 20.335 30.686 10.063 110.865 40.000 60.551 10.000 10.000 80.000 30.678 40.000 50.000 90.000 10.696 20.962 10.410 80.679 150.997 10.000 10.542 20.635 30.588 10.909 30.728 10.414 31.000 10.261 60.000 10.834 20.737 40.136 120.066 50.888 10.924 10.000 10.541 120.069 100.000 70.000 10.682 60.000 20.000 10.747 20.639 10.603 40.329 80.778 20.982 20.501 10.725 30.680 30.141 70.719 40.000 10.000 120.893 10.842 60.930 10.000 10.850 40.272 70.898 20.000 10.351 140.576 10.357 30.721 40.324 13
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
OA-CNN-L_ScanNet2000.333 120.558 60.269 100.124 140.448 150.080 100.272 60.000 30.000 80.000 10.342 90.515 40.524 80.713 140.789 100.158 120.384 130.000 30.806 70.125 80.000 100.496 90.332 70.498 150.227 90.024 70.474 30.000 10.003 20.071 100.487 40.000 120.000 10.110 90.000 30.876 80.013 180.703 40.000 80.076 100.473 130.355 120.906 70.000 10.000 20.476 60.706 10.000 100.672 110.835 140.748 100.015 130.223 80.860 120.000 10.000 110.572 80.000 70.509 70.313 80.662 50.398 140.396 80.411 140.276 20.527 50.711 60.000 70.076 140.946 70.166 60.000 10.022 110.160 70.183 140.493 140.699 100.637 70.403 70.330 130.406 140.526 70.024 20.000 10.392 120.000 120.016 160.000 130.196 40.915 60.112 120.557 110.197 70.352 110.877 40.000 130.000 10.592 130.103 110.000 140.067 10.000 10.089 80.735 80.625 120.130 100.568 70.836 80.271 90.534 100.043 140.799 120.001 50.445 60.000 10.000 80.024 20.661 50.000 50.262 30.000 10.591 90.517 140.373 90.788 70.021 90.000 10.455 50.517 100.320 90.823 130.200 170.001 180.150 60.100 130.000 10.736 100.668 110.103 150.052 70.662 50.720 90.000 10.602 60.112 70.002 60.000 10.637 100.000 20.000 10.621 110.569 60.398 100.412 50.234 130.949 70.363 60.492 150.495 120.251 40.665 100.000 10.001 110.805 80.833 70.794 120.000 10.821 60.314 50.843 120.000 10.560 100.245 80.262 70.713 50.370 11
CSC-Pretrainpermissive0.249 180.455 180.171 170.079 180.418 160.059 150.186 110.000 30.000 80.000 10.335 110.250 140.316 170.766 80.697 180.142 140.170 150.003 20.553 150.112 100.097 10.201 170.186 150.476 160.081 170.000 100.216 180.000 10.000 30.001 180.314 180.000 120.000 10.055 160.000 30.832 170.094 30.659 160.002 60.076 100.310 170.293 180.664 150.000 10.000 20.175 180.634 70.130 20.552 180.686 180.700 180.076 70.110 160.770 180.000 10.000 110.430 180.000 70.319 160.166 160.542 180.327 170.205 170.332 150.052 160.375 140.444 180.000 70.012 180.930 180.203 30.000 10.000 130.046 120.175 150.413 170.592 150.471 170.299 160.152 170.340 170.247 180.000 40.000 10.225 160.058 30.037 40.000 130.207 30.862 160.014 140.548 140.033 170.233 170.816 170.000 130.000 10.542 160.123 50.121 10.019 20.000 10.000 120.463 170.454 180.045 180.128 180.557 160.235 150.441 170.063 110.484 180.000 60.308 180.000 10.000 80.000 30.318 180.000 50.000 90.000 10.545 150.543 130.164 150.734 90.000 110.000 10.215 180.371 170.198 150.743 150.205 160.062 160.000 120.079 150.000 10.683 170.547 170.142 90.000 110.441 120.579 160.000 10.464 150.098 90.041 10.000 10.590 150.000 20.000 10.373 140.494 150.174 160.105 170.001 180.895 170.222 170.537 130.307 170.180 50.625 150.000 10.000 120.591 180.609 150.398 160.000 10.766 180.014 170.638 180.000 10.377 130.004 140.206 140.609 180.465 5
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGroundpermissive0.272 160.485 160.184 160.106 160.476 120.077 110.218 90.000 30.000 80.000 10.547 20.295 120.540 60.746 110.745 160.058 170.112 170.005 10.658 120.077 160.000 100.322 150.178 170.512 120.190 140.199 20.277 160.000 10.000 30.173 80.399 130.000 120.000 10.039 170.000 30.858 150.085 70.676 120.002 60.103 60.498 90.323 150.703 130.000 10.000 20.296 160.549 130.216 10.702 60.768 150.718 150.028 100.092 170.786 170.000 10.000 110.453 170.022 50.251 180.252 100.572 160.348 150.321 120.514 70.063 150.279 170.552 160.000 70.019 170.932 160.132 160.000 10.000 130.000 150.156 180.457 160.623 130.518 150.265 170.358 120.381 160.395 150.000 40.000 10.127 180.012 90.051 10.000 130.000 50.886 140.014 140.437 180.179 90.244 160.826 160.000 130.000 10.599 110.136 10.085 30.000 40.000 10.000 120.565 140.612 140.143 60.207 160.566 150.232 160.446 160.127 40.708 160.000 60.384 100.000 10.000 80.000 30.402 150.000 50.059 70.000 10.525 160.566 120.229 130.659 160.000 110.000 10.265 160.446 150.147 170.720 180.597 90.066 150.000 120.187 100.000 10.726 140.467 180.134 130.000 110.413 160.629 130.000 10.363 170.055 110.022 30.000 10.626 120.000 20.000 10.323 160.479 180.154 170.117 160.028 170.901 160.243 160.415 170.295 180.143 60.610 170.000 10.000 120.777 130.397 180.324 170.000 10.778 160.179 90.702 170.000 10.274 170.404 50.233 110.622 160.398 7
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild. arXiv
AWCS0.305 150.508 150.225 150.142 120.463 140.063 140.195 100.000 30.000 80.000 10.467 30.551 30.504 90.773 70.764 150.142 140.029 180.000 30.626 140.100 120.000 100.360 140.179 160.507 140.137 160.006 90.300 130.000 10.000 30.172 90.364 160.512 100.000 10.056 150.000 30.865 140.093 40.634 180.000 80.071 140.396 150.296 170.876 100.000 10.000 20.373 140.436 170.063 90.749 20.877 110.721 130.131 30.124 150.804 160.000 10.000 110.515 130.010 60.452 110.252 100.578 150.417 90.179 180.484 100.171 70.337 150.606 130.000 70.115 110.937 150.142 90.000 10.008 120.000 150.157 170.484 150.402 180.501 160.339 100.553 70.529 30.478 130.000 40.000 10.404 110.001 110.022 130.077 100.000 50.894 130.219 70.628 70.093 160.305 150.886 10.233 100.000 10.603 100.112 60.023 90.000 40.000 10.000 120.741 70.664 80.097 160.253 150.782 110.264 120.523 120.154 20.707 170.000 60.411 90.000 10.000 80.000 30.332 170.000 50.000 90.000 10.602 80.595 110.185 140.656 170.159 70.000 10.355 120.424 160.154 160.729 160.516 110.220 110.620 40.084 140.000 10.707 150.651 140.173 50.014 100.381 180.582 150.000 10.619 30.049 130.000 70.000 10.702 40.000 20.000 10.302 170.489 160.317 140.334 70.392 80.922 150.254 140.533 140.394 140.129 150.613 160.000 10.000 120.820 60.649 120.749 140.000 10.782 150.282 60.863 70.000 10.288 160.006 130.220 120.633 150.542 3
: Long-Tailed 3D Semantic Segmentation with Adaptive Weight Constraint and Sampling. ICRA 2024
CeCo0.340 80.551 100.247 140.181 70.475 130.057 160.142 130.000 30.000 80.000 10.387 60.463 60.499 100.924 30.774 120.213 60.257 140.000 30.546 160.100 120.006 90.615 20.177 180.534 80.246 60.000 100.400 60.000 10.338 10.006 170.484 60.609 60.000 10.083 120.000 30.873 100.089 50.661 150.000 80.048 160.560 50.408 70.892 90.000 10.000 20.586 10.616 90.000 100.692 90.900 90.721 130.162 10.228 70.860 120.000 10.000 110.575 60.083 30.550 40.347 50.624 140.410 110.360 100.740 30.109 130.321 160.660 90.000 70.121 100.939 140.143 80.000 10.400 30.003 130.190 120.564 70.652 110.615 120.421 50.304 140.579 10.547 60.000 40.000 10.296 150.000 120.030 90.096 80.000 50.916 50.037 130.551 130.171 100.376 80.865 80.286 60.000 10.633 60.102 120.027 80.011 30.000 10.000 120.474 150.742 50.133 80.311 140.824 90.242 140.503 150.068 90.828 100.000 60.429 80.000 10.063 50.000 30.781 20.000 50.000 90.000 10.665 30.633 70.450 60.818 20.000 110.000 10.429 60.532 80.226 140.825 120.510 120.377 60.709 30.079 150.000 10.753 60.683 90.102 160.063 60.401 170.620 140.000 10.619 30.000 150.000 70.000 10.595 140.000 20.000 10.345 150.564 70.411 90.603 10.384 90.945 100.266 120.643 60.367 150.304 10.663 110.000 10.010 70.726 160.767 80.898 40.000 10.784 140.435 10.861 80.000 10.447 110.000 160.257 80.656 120.377 10
Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia: Understanding Imbalanced Semantic Segmentation Through Neural Collapse. CVPR 2023
OctFormer ScanNet200permissive0.326 140.539 110.265 110.131 130.499 70.110 50.522 40.000 30.000 80.000 10.318 120.427 80.455 160.743 120.765 140.175 110.842 50.000 30.828 60.204 50.033 70.429 120.335 60.601 30.312 30.000 100.357 110.000 10.000 30.047 120.423 100.000 120.000 10.105 100.000 30.873 100.079 100.670 130.000 80.117 50.471 140.432 40.829 120.000 10.000 20.584 20.417 180.089 60.684 100.837 130.705 170.021 120.178 120.892 70.000 10.028 80.505 140.000 70.457 100.200 150.662 50.412 100.244 160.496 80.000 170.451 90.626 100.000 70.102 120.943 100.138 140.000 10.000 130.149 80.291 30.534 100.722 80.632 80.331 110.253 150.453 120.487 120.000 40.000 10.479 70.000 120.022 130.000 130.000 50.900 110.128 110.684 30.164 110.413 50.854 110.000 130.000 10.512 170.074 150.003 110.000 40.000 10.000 120.469 160.613 130.132 90.529 80.871 40.227 170.582 80.026 180.787 130.000 60.339 160.000 10.000 80.000 30.626 80.000 50.029 80.000 10.587 100.612 90.411 70.724 100.000 110.000 10.407 70.552 60.513 40.849 110.655 50.408 50.000 120.296 20.000 10.686 160.645 150.145 80.022 90.414 150.633 120.000 10.637 20.224 30.000 70.000 10.650 90.000 20.000 10.622 100.535 130.343 130.483 30.230 140.943 110.289 110.618 80.596 60.140 90.679 90.000 10.022 60.783 120.620 130.906 20.000 10.806 90.137 110.865 60.000 10.378 120.000 160.168 160.680 90.227 14
Peng-Shuai Wang: OctFormer: Octree-based Transformers for 3D Point Clouds. SIGGRAPH 2023
PPT-SpUNet-F.T.0.332 130.556 70.270 80.123 150.519 50.091 80.349 50.000 30.000 80.000 10.339 100.383 110.498 110.833 50.807 40.241 40.584 100.000 30.755 80.124 90.000 100.608 30.330 80.530 100.314 20.000 100.374 90.000 10.000 30.197 60.459 80.000 120.000 10.117 70.000 30.876 80.095 20.682 100.000 80.086 90.518 80.433 30.930 50.000 10.000 20.563 30.542 150.077 70.715 40.858 120.756 60.008 160.171 130.874 90.000 10.039 70.550 120.000 70.545 50.256 90.657 90.453 50.351 110.449 110.213 60.392 130.611 120.000 70.037 160.946 70.138 140.000 10.000 130.063 110.308 20.537 90.796 50.673 50.323 120.392 110.400 150.509 80.000 40.000 10.649 10.000 120.023 120.000 130.000 50.914 70.002 170.506 170.163 120.359 90.872 60.000 130.000 10.623 80.112 60.001 120.000 40.000 10.021 100.753 60.565 160.150 50.579 50.806 100.267 100.616 50.042 150.783 140.000 60.374 120.000 10.000 80.000 30.620 90.000 50.000 90.000 10.572 140.634 60.350 100.792 50.000 110.000 10.376 100.535 70.378 70.855 80.672 40.074 140.000 120.185 110.000 10.727 130.660 130.076 180.000 110.432 130.646 110.000 10.594 80.006 140.000 70.000 10.658 80.000 20.000 10.661 50.549 110.300 150.291 90.045 150.942 120.304 90.600 90.572 80.135 130.695 60.000 10.008 90.793 100.942 20.899 30.000 10.816 70.181 80.897 30.000 10.679 40.223 90.264 60.691 60.345 12
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao: Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training. CVPR 2024
L3DETR-ScanNet_2000.336 90.533 120.279 70.155 110.508 60.073 120.101 180.000 30.058 70.000 10.294 150.233 150.548 50.927 20.788 110.264 20.463 120.000 30.638 130.098 140.014 80.411 130.226 140.525 110.225 100.010 80.397 70.000 10.000 30.192 70.380 150.598 70.000 10.117 70.000 30.883 70.082 80.689 50.000 80.032 180.549 70.417 50.910 60.000 10.000 20.448 80.613 100.000 100.697 70.960 40.759 50.158 20.293 30.883 80.000 10.312 30.583 50.079 40.422 120.068 180.660 80.418 80.298 130.430 130.114 110.526 60.776 40.051 30.679 30.946 70.152 70.000 10.183 90.000 150.211 90.511 110.409 170.565 130.355 90.448 80.512 50.557 40.000 40.000 10.420 100.000 120.007 170.104 70.000 50.125 180.330 30.514 160.146 130.321 140.860 90.174 120.000 10.629 70.075 140.000 140.000 40.000 10.002 110.671 90.712 70.141 70.339 130.856 50.261 130.529 110.067 100.835 70.000 60.369 130.000 10.259 20.000 30.629 70.000 50.487 10.000 10.579 120.646 50.107 180.720 110.122 80.000 10.333 150.505 110.303 100.908 40.503 140.565 20.074 90.324 10.000 10.740 90.661 120.109 140.000 110.427 140.563 180.000 10.579 110.108 80.000 70.000 10.664 70.000 20.000 10.641 80.539 120.416 80.515 20.256 120.940 130.312 70.209 180.620 40.138 120.636 120.000 10.000 120.775 140.861 50.765 130.000 10.801 100.119 120.860 90.000 10.687 20.001 150.192 150.679 100.699 1
Yanmin Wu, Qiankun Gao, Renrui Zhang, Jian Zhang: Language-Assisted 3D Scene Understanding. arXiv23.12
IMFSegNet0.334 100.532 140.251 120.179 80.486 100.041 170.139 140.003 10.283 50.000 10.274 160.191 160.457 150.704 150.795 70.197 90.830 70.000 30.710 100.055 170.064 50.518 70.305 100.458 180.216 130.027 60.284 140.000 10.000 30.044 130.406 110.561 80.000 10.080 130.000 30.873 100.021 160.683 90.000 80.076 100.494 110.363 100.648 170.000 10.000 20.425 90.649 50.000 100.668 130.908 80.740 120.010 140.206 90.862 110.000 10.000 110.560 100.000 70.359 140.237 120.631 130.408 120.411 40.322 160.246 40.439 110.599 140.047 40.213 80.940 110.139 120.000 10.369 60.124 100.188 130.495 120.624 120.626 90.320 150.595 40.495 80.496 110.000 40.000 10.340 130.014 70.032 70.135 60.000 50.903 90.277 60.612 90.196 80.344 130.848 140.260 70.000 10.574 140.073 160.062 40.000 40.000 10.091 70.839 40.776 30.123 130.392 100.756 130.274 60.518 130.029 170.842 50.000 60.357 140.000 10.035 70.000 30.444 130.793 20.245 50.000 10.512 170.512 160.159 160.713 130.000 110.000 10.336 140.484 130.569 30.852 100.615 70.120 130.068 110.228 90.000 10.733 110.773 20.190 40.000 110.608 70.792 50.000 10.597 70.000 150.025 20.000 10.573 180.000 20.000 10.508 120.555 90.363 110.139 130.610 30.947 90.305 80.594 100.527 100.009 180.633 140.000 10.060 30.820 60.604 160.799 100.000 10.799 120.034 150.784 140.000 10.618 60.424 30.134 170.646 140.214 15
PTv3 ScanNet2000.393 40.592 40.330 20.216 40.520 40.109 60.108 170.000 30.337 20.000 10.310 130.394 100.494 120.753 100.848 20.256 30.717 90.000 30.842 50.192 60.065 40.449 110.346 40.546 70.190 140.000 100.384 80.000 10.000 30.218 50.505 20.791 30.000 10.136 50.000 30.903 20.073 130.687 70.000 80.168 20.551 60.387 80.941 40.000 10.000 20.397 130.654 30.000 100.714 50.759 160.752 80.118 40.264 50.926 30.000 10.048 60.575 60.000 70.597 20.366 30.755 10.469 30.474 30.798 20.140 100.617 30.692 80.000 70.592 40.971 20.188 40.000 10.133 100.593 20.349 10.650 40.717 90.699 40.455 20.790 20.523 40.636 10.301 10.000 10.622 20.000 120.017 150.259 30.000 50.921 40.337 10.733 20.210 50.514 20.860 90.407 10.000 10.688 20.109 80.000 140.000 40.000 10.151 60.671 90.782 20.115 140.641 20.903 20.349 10.616 50.088 70.832 90.000 60.480 30.000 10.428 10.000 30.497 110.000 50.000 90.000 10.662 40.690 30.612 10.828 10.575 20.000 10.404 80.644 20.325 80.887 50.728 10.009 170.134 80.026 180.000 10.761 40.731 50.172 60.077 40.528 90.727 80.000 10.603 50.220 50.022 30.000 10.740 10.000 20.000 10.661 50.586 30.566 50.436 40.531 60.978 40.457 30.708 40.583 70.141 70.748 30.000 10.026 50.822 40.871 40.879 60.000 10.851 20.405 20.914 10.000 10.682 30.000 160.281 50.738 30.463 6
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao: Point Transformer V3: Simpler, Faster, Stronger. CVPR 2024 (Oral)
PonderV2 ScanNet2000.346 70.552 90.270 90.175 100.497 80.070 130.239 80.000 30.000 80.000 10.232 180.412 90.584 30.842 40.804 50.212 70.540 110.000 30.433 170.106 110.000 100.590 60.290 120.548 60.243 70.000 100.356 120.000 10.000 30.062 110.398 140.441 110.000 10.104 110.000 30.888 60.076 120.682 100.030 40.094 70.491 120.351 130.869 110.000 10.063 10.403 120.700 20.000 100.660 140.881 100.761 40.050 80.186 110.852 140.000 10.007 90.570 90.100 20.565 30.326 70.641 110.431 70.290 150.621 60.259 30.408 120.622 110.125 20.082 130.950 60.179 50.000 10.263 70.424 50.193 100.558 80.880 40.545 140.375 80.727 30.445 130.499 90.000 40.000 10.475 80.002 100.034 60.083 90.000 50.924 30.290 40.636 60.115 150.400 60.874 50.186 110.000 10.611 90.128 30.113 20.000 40.000 10.000 120.584 130.636 100.103 150.385 110.843 70.283 50.603 70.080 80.825 110.000 60.377 110.000 10.000 80.000 30.457 120.000 50.000 90.000 10.574 130.608 100.481 40.792 50.394 60.000 10.357 110.503 120.261 110.817 140.504 130.304 80.472 50.115 120.000 10.750 80.677 100.202 20.000 110.509 100.729 70.000 10.519 130.000 150.000 70.000 10.620 130.000 20.000 10.660 70.560 80.486 70.384 60.346 110.952 60.247 150.667 50.436 130.269 30.691 70.000 10.010 70.787 110.889 30.880 50.000 10.810 80.336 40.860 90.000 10.606 80.009 120.248 100.681 80.392 9
Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang: PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm.
ODIN - Sem200permissive0.368 50.562 50.297 50.207 50.380 180.196 10.828 30.000 30.321 30.000 10.400 50.775 10.460 140.501 180.769 130.065 160.870 40.000 30.913 10.213 40.000 100.000 180.389 20.554 50.312 30.000 100.591 10.000 10.000 30.491 10.487 40.894 20.000 10.378 20.303 10.796 180.088 60.669 140.081 20.216 10.256 180.334 140.898 80.000 10.000 20.370 150.599 110.000 100.581 170.988 20.749 90.090 60.242 60.921 40.000 10.202 50.609 30.000 70.655 10.214 140.654 100.346 160.408 70.485 90.169 80.631 20.704 70.000 70.814 10.940 110.127 170.000 10.000 130.462 40.227 70.641 50.885 30.657 60.434 30.000 180.550 20.393 160.000 40.000 10.590 40.000 120.048 20.077 100.000 50.784 170.131 100.557 110.316 20.359 90.833 150.373 30.000 10.661 40.108 90.001 120.000 40.000 10.301 40.612 120.565 160.129 110.482 90.468 170.274 60.561 90.376 10.912 20.181 10.440 70.000 10.166 40.000 30.641 60.000 50.426 20.000 10.642 60.626 80.259 120.787 80.429 50.000 10.589 10.523 90.246 120.857 70.000 180.228 100.000 120.265 40.000 10.752 70.832 10.090 170.157 10.791 20.578 170.000 10.373 160.539 10.000 70.000 10.685 50.000 20.000 10.632 90.575 40.663 10.152 120.358 100.926 140.397 40.454 160.610 50.119 160.685 80.000 10.000 120.803 90.740 100.441 150.000 10.800 110.000 180.871 40.000 10.220 180.487 20.862 10.682 70.054 18
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki: ODIN: A Single Model for 2D and 3D Segmentation. CVPR 2024
ALS-MinkowskiNetcopyleft0.414 30.610 30.322 30.271 20.542 20.153 30.159 120.000 30.000 80.000 10.404 40.503 50.532 70.672 170.804 50.285 10.888 30.000 30.900 30.226 30.087 20.598 50.342 50.671 10.217 110.087 40.449 40.000 10.000 30.253 30.477 71.000 10.000 10.118 60.000 30.905 10.071 140.710 30.076 30.047 170.665 20.376 90.981 10.000 10.000 20.466 70.632 80.113 40.769 10.956 50.795 20.031 90.314 10.936 10.000 10.390 20.601 40.000 70.458 90.366 30.719 40.440 60.564 10.699 40.314 10.464 80.784 30.200 10.283 60.973 10.142 90.000 10.250 80.285 60.220 80.718 10.752 60.723 20.460 10.248 160.475 100.463 140.000 40.000 10.446 90.021 50.025 110.285 10.000 50.972 10.149 80.769 10.230 30.535 10.879 30.252 90.000 10.693 10.129 20.000 140.000 40.000 10.447 10.958 10.662 90.159 20.598 40.780 120.344 20.646 40.106 60.893 30.135 30.455 40.000 10.194 30.259 10.726 30.475 40.000 90.000 10.741 10.865 20.571 20.817 30.445 40.000 10.506 30.630 40.230 130.916 20.728 10.635 11.000 10.252 70.000 10.804 30.697 80.137 110.043 80.717 30.807 40.000 10.510 140.245 20.000 70.000 10.709 30.000 20.000 10.703 30.572 50.646 20.223 110.531 60.984 10.397 40.813 10.798 10.135 130.800 10.000 10.097 20.832 30.752 90.842 80.000 10.852 10.149 100.846 110.000 10.666 50.359 60.252 90.777 10.690 2
Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum: ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding. CVPR 2025
BFANet ScanNet200permissive0.360 60.553 80.293 60.193 60.483 110.096 70.266 70.000 30.000 80.000 10.298 140.255 130.661 10.810 60.810 30.194 100.785 80.000 30.000 180.161 70.000 100.494 100.382 30.574 40.258 50.000 100.372 100.000 10.000 30.043 150.436 90.000 120.000 10.239 30.000 30.901 30.105 10.689 50.025 50.128 40.614 30.436 20.493 180.000 10.000 20.526 40.546 140.109 50.651 150.953 60.753 70.101 50.143 140.897 60.000 10.431 10.469 160.000 70.522 60.337 60.661 70.459 40.409 60.666 50.102 140.508 70.757 50.000 70.060 150.970 30.497 10.000 10.376 40.511 30.262 50.688 30.921 20.617 110.321 130.590 60.491 90.556 50.000 40.000 10.481 60.093 10.043 30.284 20.000 50.875 150.135 90.669 40.124 140.394 70.849 120.298 50.000 10.476 180.088 130.042 70.000 40.000 10.254 50.653 110.741 60.215 10.573 60.852 60.266 110.654 30.056 130.835 70.000 60.492 20.000 10.000 80.000 30.612 100.000 50.000 90.000 10.616 70.469 180.460 50.698 140.516 30.000 10.378 90.563 50.476 50.863 60.574 100.330 70.000 120.282 30.000 10.760 50.710 60.233 10.000 110.641 60.814 30.000 10.585 100.053 120.000 70.000 10.629 110.000 20.000 10.678 40.528 140.534 60.129 150.596 50.973 50.264 130.772 20.526 110.139 100.707 50.000 10.000 120.764 150.591 170.848 70.000 10.827 50.338 30.806 130.000 10.568 90.151 110.358 20.659 110.510 4
Weiguang Zhao, Rui Zhang, Qiufeng Wang, Guangliang Cheng, Kaizhu Huang: BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis. CVPR 2025
DITR0.449 10.629 10.392 10.289 10.650 10.168 20.862 10.000 30.313 40.000 10.580 10.568 20.564 40.766 80.867 10.238 50.949 10.000 30.866 40.300 10.000 100.664 10.482 10.508 130.317 10.420 10.551 20.000 10.000 30.486 20.519 10.662 50.000 10.385 10.000 30.901 30.079 100.727 20.000 80.160 30.606 40.417 50.967 30.000 10.000 20.498 50.596 120.130 20.728 30.998 10.805 10.000 170.314 10.934 20.000 10.278 40.636 10.000 70.403 130.367 20.741 30.484 20.500 21.000 10.113 120.828 10.815 20.000 70.733 20.969 40.374 20.000 10.579 11.000 10.230 60.617 60.983 10.729 10.423 40.855 10.508 60.622 20.018 30.000 10.591 30.034 40.028 100.066 120.869 10.904 80.334 20.651 50.716 10.514 20.871 70.315 40.000 10.664 30.128 30.014 100.000 40.000 10.392 30.851 30.817 10.153 30.823 10.991 10.318 40.680 20.134 30.913 10.157 20.448 50.000 10.000 80.000 30.826 10.978 10.091 60.000 10.660 50.647 40.571 20.804 40.001 100.000 10.480 40.700 10.421 60.947 10.433 150.411 40.148 70.262 50.000 10.849 10.709 70.138 100.150 20.714 40.889 20.000 10.698 10.222 40.000 70.000 10.720 20.000 20.000 10.805 10.600 20.642 30.268 100.904 10.982 20.477 20.632 70.718 20.139 100.776 20.000 10.178 10.886 20.962 10.839 90.000 10.851 20.043 130.869 50.000 10.710 10.315 70.348 40.753 20.397 8
Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe: DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation. 3DV 2026
GSTran0.334 110.533 130.250 130.179 90.487 90.041 170.139 140.003 10.273 60.000 10.273 170.189 170.465 130.704 150.794 90.198 80.831 60.000 30.712 90.055 170.063 60.518 70.306 90.459 170.217 110.028 50.282 150.000 10.000 30.044 130.405 120.558 90.000 10.080 130.000 30.873 100.020 170.684 80.000 80.075 130.496 100.363 100.651 160.000 10.000 20.425 90.648 60.000 100.669 120.914 70.741 110.009 150.200 100.864 100.000 10.000 110.560 100.000 70.357 150.233 130.633 120.408 120.411 40.320 170.242 50.440 100.598 150.047 40.205 90.940 110.139 120.000 10.372 50.138 90.191 110.495 120.618 140.624 100.321 130.595 40.496 70.499 90.000 40.000 10.340 130.014 70.032 70.136 50.000 50.903 90.279 50.601 100.198 60.345 120.849 120.260 70.000 10.573 150.072 170.060 50.000 40.000 10.089 80.838 50.775 40.125 120.381 120.752 140.274 60.517 140.032 160.841 60.000 60.354 150.000 10.047 60.000 30.439 140.787 30.252 40.000 10.512 170.507 170.158 170.717 120.000 110.000 10.337 130.483 140.570 20.853 90.614 80.121 120.070 100.229 80.000 10.732 120.773 20.193 30.000 110.606 80.791 60.000 10.593 90.000 150.010 50.000 10.574 170.000 20.000 10.507 130.554 100.361 120.136 140.608 40.948 80.304 90.593 110.533 90.011 170.634 130.000 10.060 30.821 50.613 140.797 110.000 10.799 120.036 140.782 150.000 10.609 70.423 40.133 180.647 130.213 16
Minkowski 34Dpermissive0.253 170.463 170.154 180.102 170.381 170.084 90.134 160.000 30.000 80.000 10.386 70.141 180.279 180.737 130.703 170.014 180.164 160.000 30.663 110.092 150.000 100.224 160.291 110.531 90.056 180.000 100.242 170.000 10.000 30.013 160.331 170.000 120.000 10.035 180.001 20.858 150.059 150.650 170.000 80.056 150.353 160.299 160.670 140.000 10.000 20.284 170.484 160.071 80.594 160.720 170.710 160.027 110.068 180.813 150.000 10.005 100.492 150.164 10.274 170.111 170.571 170.307 180.293 140.307 180.150 90.163 180.531 170.002 60.545 50.932 160.093 180.000 10.000 130.002 140.159 160.368 180.581 160.440 180.228 180.406 100.282 180.294 170.000 40.000 10.189 170.060 20.036 50.000 130.000 50.897 120.000 180.525 150.025 180.205 180.771 180.000 130.000 10.593 120.108 90.044 60.000 40.000 10.000 120.282 180.589 150.094 170.169 170.466 180.227 170.419 180.125 50.757 150.002 40.334 170.000 10.000 80.000 30.357 160.000 50.000 90.000 10.582 110.513 150.337 110.612 180.000 110.000 10.250 170.352 180.136 180.724 170.655 50.280 90.000 120.046 170.000 10.606 180.559 160.159 70.102 30.445 110.655 100.000 10.310 180.117 60.000 70.000 10.581 160.026 10.000 10.265 180.483 170.084 180.097 180.044 160.865 180.142 180.588 120.351 160.272 20.596 180.000 10.003 100.622 170.720 110.096 180.000 10.771 170.016 160.772 160.000 10.302 150.194 100.214 130.621 170.197 17
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
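
The avg, head, common, and tail IoU columns above summarize per-class intersection-over-union. The following is a minimal sketch of how such summaries can be computed from a confusion matrix. It assumes that each summary is an unweighted mean over the classes in its group and that a class with no ground-truth or predicted points scores 0. The function names, the three-class toy matrix, and the group assignment are hypothetical and are not taken from the benchmark's evaluation code.

```python
from typing import Dict, List, Sequence


def per_class_iou(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Per-class IoU = TP / (TP + FP + FN).

    confusion[g][p] counts points whose ground-truth class is g
    and whose predicted class is p.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed points of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # points wrongly labeled c
        denom = tp + fp + fn
        # Assumption: a class that never appears scores 0.0.
        ious.append(tp / denom if denom else 0.0)
    return ious


def group_mean_iou(ious: Sequence[float],
                   groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Unweighted mean IoU per class group, plus the mean over all classes."""
    result = {name: sum(ious[i] for i in idx) / len(idx)
              for name, idx in groups.items()}
    result["avg"] = sum(ious) / len(ious)
    return result


# Toy example with three classes (hypothetical data).
confusion = [
    [8, 2, 0],
    [1, 6, 1],
    [0, 3, 1],
]
# Hypothetical split: one class per frequency group.
groups = {"head": [0], "common": [1], "tail": [2]}

ious = per_class_iou(confusion)
scores = group_mean_iou(ious, groups)
print(scores)
```

Because every class carries equal weight in these means, rare tail classes affect the average as much as frequent head classes do, which is why the class imbalance described earlier makes the benchmark harder.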


This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives a method's score followed by its rank in that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
TD3D Scannet200permissive0.379 60.603 60.306 60.190 60.635 50.073 50.500 10.000 20.000 40.000 10.495 60.735 40.275 91.000 10.979 50.590 30.000 80.021 40.000 60.146 70.000 40.356 40.173 90.795 20.226 60.000 40.173 60.000 10.000 20.226 60.390 50.000 60.000 10.250 40.000 30.706 60.061 70.885 30.093 40.186 50.259 80.200 50.667 30.000 50.000 10.667 50.825 20.250 60.834 81.000 10.958 10.553 10.111 70.748 50.220 20.051 60.866 50.792 10.390 90.045 90.800 60.302 90.517 50.533 40.113 60.427 40.843 40.000 30.458 40.600 10.000 20.101 50.000 50.259 40.717 60.500 60.615 60.520 50.526 40.457 50.270 80.000 20.000 10.400 40.088 40.294 40.181 30.000 31.000 10.400 20.710 90.103 70.477 90.905 50.061 40.000 10.906 40.102 50.232 20.125 60.000 40.003 60.792 71.000 10.000 60.102 70.125 80.559 90.523 70.075 60.715 40.000 60.424 90.000 10.396 40.250 10.638 50.000 30.000 60.000 30.622 90.833 50.221 30.970 10.250 40.038 10.260 60.415 50.125 51.000 11.000 10.857 30.000 40.908 10.012 10.869 70.836 40.635 10.111 30.625 31.000 10.020 30.510 40.003 60.009 31.000 10.778 20.000 20.000 10.370 70.755 30.288 60.333 60.274 51.000 10.557 40.731 50.456 60.433 30.769 90.000 10.000 50.621 81.000 10.458 80.000 10.196 30.817 20.000 10.472 30.222 60.205 90.689 50.274 7
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Mask3D Scannet2000.445 50.653 40.392 50.254 50.648 40.097 30.125 90.000 20.000 40.000 10.657 10.971 10.451 41.000 11.000 10.640 20.500 30.045 31.000 10.241 50.409 30.363 30.440 40.686 60.300 40.000 40.201 50.000 10.009 10.290 50.556 41.000 10.000 10.063 70.000 30.830 40.573 20.844 40.333 30.204 40.058 90.158 90.552 60.056 30.000 11.000 10.725 60.750 10.927 11.000 10.888 60.042 70.120 60.615 80.226 10.250 40.890 20.792 10.677 50.510 50.818 50.699 50.512 60.167 90.125 40.315 60.943 10.309 10.017 70.200 40.000 20.188 40.000 50.183 70.815 51.000 10.827 30.741 40.442 50.414 80.600 20.000 20.000 10.458 30.049 50.321 30.381 10.000 30.908 40.400 20.841 40.260 50.710 40.966 30.265 30.000 10.924 20.152 20.025 30.500 20.027 20.028 41.000 10.556 90.016 50.080 90.500 20.694 60.608 50.084 50.604 60.194 40.538 60.000 10.500 10.000 30.354 80.000 31.000 10.000 30.761 60.930 30.053 80.890 51.000 10.008 20.262 50.358 61.000 11.000 10.792 70.966 21.000 10.765 50.004 20.930 30.780 60.330 30.027 40.625 30.974 40.050 10.412 90.021 40.000 40.000 20.778 20.000 20.000 10.493 50.746 40.454 50.335 50.396 30.930 90.551 51.000 10.552 50.606 10.853 30.000 10.004 40.806 41.000 10.727 50.000 10.042 60.745 50.000 10.399 70.391 30.630 40.721 40.619 2
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
Volt-SPFormerpermissive0.527 10.731 10.475 20.342 10.789 20.076 40.500 10.000 20.125 10.000 10.391 80.508 80.753 11.000 10.994 40.400 50.500 30.020 51.000 10.413 20.850 20.547 10.510 30.810 10.433 30.250 20.346 30.000 10.000 20.519 40.594 31.000 10.000 10.331 30.000 30.937 10.638 10.826 50.056 60.214 20.850 20.262 30.667 30.028 40.000 10.817 20.825 10.250 60.880 31.000 10.950 20.279 20.309 30.856 10.000 40.304 20.867 40.000 60.750 10.542 10.942 10.818 20.901 10.458 60.329 10.750 10.855 30.000 30.510 30.200 40.000 20.677 20.500 10.397 10.903 21.000 10.843 20.773 21.000 10.799 20.449 40.250 10.000 10.600 10.027 60.372 20.000 41.000 10.833 50.400 20.878 30.656 30.843 20.973 20.000 50.000 10.921 30.103 40.008 40.500 20.057 10.278 21.000 10.802 40.557 20.700 11.000 10.874 10.767 20.279 20.801 30.047 50.714 10.000 10.500 10.250 10.907 20.000 31.000 10.011 20.875 10.944 20.255 20.923 21.000 10.002 30.321 30.579 31.000 11.000 11.000 11.000 11.000 10.737 60.000 30.926 40.857 20.343 20.000 60.741 10.629 90.025 20.500 50.000 70.000 40.000 20.725 60.000 20.000 10.715 10.803 20.738 31.000 10.500 21.000 10.565 30.884 20.812 20.167 70.937 10.000 10.019 20.923 11.000 11.000 10.000 10.099 51.000 10.000 10.472 30.764 10.614 50.815 20.681 1
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
CompetitorFormer-2000.469 30.676 30.401 40.296 30.692 30.057 60.500 10.083 10.000 40.000 10.534 40.701 50.410 60.903 80.998 30.878 10.500 30.068 10.250 50.424 11.000 10.244 50.556 10.696 40.270 51.000 10.240 40.000 10.000 20.587 30.380 61.000 10.000 10.500 10.000 30.900 20.257 40.901 10.085 50.207 30.863 10.224 41.000 10.109 20.000 10.724 40.806 30.500 20.869 41.000 10.829 70.247 30.474 10.759 40.021 30.269 30.873 30.125 50.467 80.542 10.885 30.829 10.711 30.285 80.118 50.482 30.770 80.025 20.018 60.400 20.000 20.677 20.500 10.222 50.916 11.000 10.818 40.827 10.342 70.650 30.452 30.000 20.000 10.330 50.173 20.278 50.000 40.083 21.000 10.336 50.748 60.508 40.698 50.989 10.286 20.000 10.933 10.175 10.400 10.663 10.015 30.103 31.000 10.829 20.125 30.293 40.500 20.847 30.711 30.295 10.543 80.385 30.581 50.000 10.500 10.000 30.747 30.050 21.000 10.013 10.850 30.886 40.214 60.918 30.125 50.000 40.320 40.610 20.025 90.933 71.000 10.820 50.250 30.901 20.000 30.980 10.878 10.325 40.160 20.574 50.703 70.009 40.540 30.011 50.000 40.000 20.700 70.056 10.000 10.491 60.729 50.617 40.489 40.565 11.000 10.410 70.750 40.629 40.292 40.839 50.000 10.157 10.839 21.000 10.834 40.000 10.131 40.794 30.000 10.667 10.144 70.664 20.854 10.500 4
DINO3D-Scannet200copyleft0.511 20.685 20.484 10.331 20.864 10.220 10.500 10.000 20.042 30.000 10.576 30.746 30.744 21.000 11.000 10.355 91.000 10.048 20.000 60.327 30.000 40.494 20.532 20.596 80.496 20.250 20.481 10.000 10.000 20.714 10.629 11.000 10.000 10.250 40.663 10.861 30.436 30.892 20.667 10.244 10.385 60.421 11.000 10.000 50.000 10.764 30.719 80.500 20.889 21.000 10.907 40.111 50.378 20.778 20.000 40.595 10.905 10.708 30.750 10.542 10.890 20.754 40.761 20.798 10.220 20.683 20.817 50.000 30.600 20.200 40.500 10.944 10.125 40.334 30.856 40.792 50.873 10.756 30.777 20.803 10.675 10.000 20.000 10.200 60.298 10.412 10.000 40.000 30.719 80.800 10.923 10.750 10.798 30.960 40.000 50.000 10.856 60.142 30.001 60.417 50.000 40.014 51.000 10.824 30.559 10.700 10.500 20.863 20.816 10.163 40.944 10.764 10.714 10.000 10.250 50.000 31.000 10.063 11.000 10.000 30.789 50.974 10.079 70.851 70.000 60.000 40.468 10.702 10.167 31.000 11.000 10.857 30.000 40.867 30.000 30.968 20.845 30.264 60.419 10.500 60.667 80.000 50.677 10.028 30.194 20.000 20.857 10.000 20.000 10.699 20.821 10.930 10.850 20.346 40.944 70.579 10.866 30.850 10.221 60.911 20.000 10.011 30.806 50.764 90.860 30.000 10.472 10.794 30.000 10.667 10.655 20.655 30.811 30.528 3
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
ODIN - Ins200permissive0.451 40.637 50.407 30.277 40.583 80.116 20.500 10.000 20.125 10.000 10.599 20.823 20.407 70.667 90.941 60.542 41.000 10.000 61.000 10.162 60.000 40.028 80.357 50.695 50.550 10.000 40.475 20.000 10.000 20.714 10.626 21.000 10.000 10.500 10.125 20.749 50.080 50.742 90.528 20.078 60.500 40.334 20.667 30.333 10.000 10.278 90.723 70.250 60.859 71.000 10.826 90.108 60.221 40.763 30.000 40.250 40.742 60.500 40.750 10.400 60.855 40.769 30.701 40.469 50.203 30.406 50.870 20.000 30.963 10.200 40.000 20.000 60.500 10.370 20.886 31.000 10.782 50.504 60.429 60.494 40.337 60.000 20.000 10.600 10.000 70.215 60.226 20.000 30.944 30.200 60.887 20.750 10.874 10.877 60.438 10.000 10.867 50.089 60.003 50.500 20.000 40.333 11.000 10.742 50.125 30.671 30.417 70.616 80.637 40.238 30.873 20.528 20.494 80.000 10.250 50.000 30.688 40.000 31.000 10.000 30.872 20.833 50.275 10.779 81.000 10.000 40.441 20.577 40.167 31.000 10.500 80.777 60.000 40.778 40.000 30.910 50.800 50.232 70.019 50.717 20.833 50.000 50.638 20.284 10.000 40.000 20.778 20.000 20.000 10.597 30.699 60.850 20.333 60.250 60.944 70.571 20.677 60.795 30.264 50.852 40.000 10.000 50.824 31.000 10.668 60.000 10.000 70.667 60.000 10.333 80.333 40.760 10.679 60.404 5
Minkowski 34D Inst.permissive0.280 80.488 80.192 90.124 80.593 70.010 80.500 10.000 20.000 40.000 10.447 70.535 70.445 51.000 10.861 80.400 50.225 60.000 60.000 60.142 80.000 40.074 70.342 70.467 90.067 70.000 40.119 90.000 10.000 20.000 80.337 90.000 60.000 10.000 80.000 30.506 90.070 60.804 70.000 70.000 80.333 70.172 70.150 90.000 50.000 10.479 80.745 50.000 90.830 91.000 10.904 50.167 40.090 80.732 60.000 40.000 70.443 80.000 60.500 60.542 10.772 90.396 80.077 90.385 70.044 80.118 90.777 70.000 30.000 80.200 40.000 20.000 60.000 50.148 80.502 80.500 60.419 80.159 90.281 80.404 90.317 70.000 20.000 10.200 60.000 70.077 70.000 40.000 30.750 60.200 60.715 80.021 80.551 60.828 90.000 50.000 10.743 80.059 90.000 70.000 70.000 40.000 70.125 90.648 70.000 60.191 60.500 20.669 70.502 80.000 90.568 70.000 60.516 70.000 10.000 70.000 30.305 90.000 30.000 60.000 30.825 40.833 50.021 90.918 30.000 60.000 40.191 80.346 80.100 70.981 61.000 10.286 80.000 40.000 90.000 30.868 80.648 90.292 50.000 60.375 71.000 10.000 50.500 50.000 70.333 10.000 20.538 90.000 20.000 10.213 90.518 80.098 80.528 30.250 60.997 50.284 90.677 60.398 70.167 70.790 80.000 10.000 50.618 90.903 80.200 90.000 10.333 20.333 80.000 10.442 60.083 80.213 80.587 80.131 9
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst.permissive0.275 90.466 90.218 80.110 90.625 60.007 90.500 10.000 20.000 40.000 10.000 90.222 90.377 81.000 10.661 90.400 50.000 80.000 60.000 60.119 90.000 40.000 90.277 80.685 70.067 70.000 40.132 70.000 10.000 20.000 80.367 80.000 60.000 10.000 80.000 30.591 70.055 80.783 80.000 70.014 70.500 40.161 80.278 70.000 50.000 10.667 50.768 40.500 20.866 51.000 10.829 80.000 80.019 90.555 90.000 40.000 70.305 90.000 60.750 10.200 80.783 80.429 70.395 70.677 30.020 90.286 70.584 90.000 30.000 80.115 90.000 20.000 60.000 50.145 90.423 90.500 60.364 90.369 80.571 30.448 70.206 90.000 20.000 10.200 60.106 30.065 90.000 40.000 30.750 60.200 60.774 50.000 90.501 70.841 80.000 50.000 10.692 90.063 80.000 70.000 70.000 40.000 70.500 80.649 60.000 60.084 80.125 80.719 40.413 90.004 80.450 90.000 60.638 30.000 10.000 70.000 30.505 70.000 30.000 60.000 30.727 70.833 50.221 40.779 80.000 60.000 40.168 90.311 90.125 50.571 80.500 80.143 90.000 40.250 80.000 30.869 60.667 80.162 90.000 60.250 81.000 10.000 50.500 50.000 70.000 40.000 20.689 80.000 20.000 10.312 80.383 90.114 70.333 60.000 80.997 50.420 60.613 80.212 90.500 20.819 60.000 10.000 50.768 61.000 10.918 20.000 10.000 70.278 90.000 10.333 80.000 90.353 60.546 90.258 8
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.permissive0.314 70.529 70.225 70.155 70.578 90.010 70.500 10.000 20.000 40.000 10.515 50.556 60.696 31.000 10.927 70.400 50.083 70.000 61.000 10.252 40.000 40.167 60.350 60.731 30.067 70.000 40.123 80.000 10.000 20.036 70.372 70.000 60.000 10.250 40.000 30.569 80.031 90.810 60.000 70.000 80.630 30.183 60.278 70.000 50.000 10.582 70.589 90.500 20.863 61.000 10.940 30.000 80.144 50.716 70.000 40.000 70.484 70.000 60.500 60.400 60.798 70.500 60.278 80.750 20.093 70.166 80.783 60.000 30.200 50.400 20.000 20.000 60.000 50.219 60.539 70.500 60.578 70.413 70.181 90.457 60.375 50.000 20.000 10.050 90.000 70.077 80.000 40.000 30.500 90.000 90.743 70.250 60.488 80.846 70.000 50.000 10.800 70.069 70.000 70.000 70.000 40.000 71.000 10.607 80.000 60.200 50.500 20.694 50.528 60.063 70.659 50.000 60.594 40.000 10.000 70.000 30.571 60.000 30.000 60.000 30.716 80.647 90.221 40.857 60.000 60.000 40.217 70.346 70.071 80.530 91.000 10.429 70.000 40.286 70.000 30.826 90.706 70.208 80.000 60.250 80.744 60.000 50.500 50.042 20.000 40.000 20.746 50.000 20.000 10.517 40.625 70.085 90.333 60.000 81.000 10.378 80.533 90.376 80.042 90.814 70.000 10.000 50.765 71.000 10.600 70.000 10.000 70.667 60.000 10.472 30.333 40.337 70.605 70.305 6
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.


ScanNet Benchmark

This table lists the benchmark results for the 3D semantic label scenario.


Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
Volt ScanNetpermissive0.805 10.932 50.846 30.801 490.775 100.862 110.604 10.955 10.779 10.722 40.980 10.635 10.352 120.799 30.941 40.887 10.807 200.748 20.973 30.911 10.798 6
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
PTv3-PPT-ALCcopyleft0.798 20.911 120.812 240.854 80.770 130.856 160.555 180.943 20.660 270.735 20.979 20.606 80.492 10.792 50.934 50.841 30.819 60.716 100.947 110.906 20.822 1
Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum: ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding. CVPR 2025
DITR ScanNet0.797 30.727 780.869 10.882 10.785 60.868 70.578 60.943 20.744 20.727 30.979 20.627 30.364 90.824 10.949 20.779 160.844 10.757 10.982 10.905 30.802 3
Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe: DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation. 3DV 2026
PTv3 ScanNet0.794 40.941 30.813 230.851 110.782 70.890 20.597 20.916 70.696 120.713 60.979 20.635 10.384 30.793 40.907 110.821 60.790 380.696 150.967 50.903 40.805 2
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao: Point Transformer V3: Simpler, Faster, Stronger. CVPR 2024 (Oral)
PonderV20.785 50.978 10.800 320.833 300.788 40.853 210.545 220.910 100.713 40.705 70.979 20.596 100.390 20.769 160.832 460.821 60.792 370.730 30.975 20.897 70.785 8
Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang: PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm.
Mix3Dpermissive0.781 60.964 20.855 20.843 200.781 80.858 140.575 90.831 410.685 180.714 50.979 20.594 110.310 320.801 20.892 200.841 30.819 60.723 70.940 160.887 90.725 30
Alexey Nekrasov, Jonas Schult, Or Litany, Bastian Leibe, Francis Engelmann: Mix3D: Out-of-Context Data Augmentation for 3D Scenes. 3DV 2021 (Oral)
Swin3Dpermissive0.779 70.861 250.818 180.836 270.790 30.875 40.576 80.905 110.704 80.739 10.969 130.611 40.349 130.756 260.958 10.702 530.805 210.708 110.916 400.898 60.801 4
TTT-KD0.773 80.646 990.818 180.809 420.774 110.878 30.581 40.943 20.687 160.704 80.978 70.607 70.336 210.775 120.912 90.838 50.823 40.694 160.967 50.899 50.794 7
Lisa Weijler, Muhammad Jehanzeb Mirza, Leon Sick, Can Ekkazan, Pedro Hermosilla: TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models.
ResLFE_HDS0.772 90.939 40.824 80.854 80.771 120.840 360.564 140.900 130.686 170.677 150.961 190.537 370.348 140.769 160.903 130.785 140.815 90.676 270.939 170.880 140.772 12
OctFormerpermissive0.766 100.925 80.808 280.849 130.786 50.846 310.566 130.876 200.690 140.674 180.960 200.576 230.226 750.753 280.904 120.777 170.815 90.722 80.923 320.877 180.776 11
Peng-Shuai Wang: OctFormer: Octree-based Transformers for 3D Point Clouds. SIGGRAPH 2023
PPT-SpUNet-Joint0.766 100.932 50.794 380.829 320.751 270.854 190.540 260.903 120.630 400.672 190.963 170.565 270.357 100.788 60.900 150.737 320.802 220.685 210.950 90.887 90.780 9
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao: Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training. CVPR 2024
CU-Hybrid Net0.764 120.924 90.819 150.840 230.757 220.853 210.580 50.848 330.709 60.643 290.958 250.587 170.295 400.753 280.884 240.758 240.815 90.725 60.927 280.867 290.743 21
OccuSeg+Semantic0.764 120.758 630.796 360.839 240.746 310.907 10.562 150.850 320.680 200.672 190.978 70.610 50.335 230.777 100.819 500.847 20.830 30.691 180.972 40.885 110.727 28
O-CNNpermissive0.762 140.924 90.823 90.844 190.770 130.852 230.577 70.847 350.711 50.640 330.958 250.592 120.217 810.762 210.888 210.758 240.813 130.726 50.932 260.868 280.744 20
Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, Xin Tong: O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. SIGGRAPH 2017
DiffSegNet0.758 150.725 800.789 430.843 200.762 180.856 160.562 150.920 50.657 300.658 230.958 250.589 150.337 200.782 70.879 250.787 120.779 430.678 230.926 300.880 140.799 5
DTC0.757 160.843 310.820 130.847 160.791 20.862 110.511 400.870 240.707 70.652 250.954 420.604 90.279 510.760 220.942 30.734 330.766 520.701 140.884 630.874 240.736 22
OA-CNN-L_ScanNet200.756 170.783 490.826 70.858 60.776 90.837 410.548 210.896 160.649 320.675 170.962 180.586 180.335 230.771 150.802 550.770 200.787 400.691 180.936 210.880 140.761 15
PNE0.755 180.786 470.835 60.834 290.758 200.849 260.570 110.836 400.648 330.668 210.978 70.581 210.367 70.683 410.856 340.804 90.801 260.678 230.961 70.889 80.716 37
P. Hermosilla: Point Neighborhood Embeddings.
LSK3DNetpermissive0.755 180.899 180.823 90.843 200.764 170.838 390.584 30.845 360.717 30.638 350.956 320.580 220.229 740.640 510.900 150.750 270.813 130.729 40.920 360.872 260.757 16
Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang: LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels. CVPR 2024
ConDaFormer0.755 180.927 70.822 110.836 270.801 10.849 260.516 370.864 290.651 310.680 140.958 250.584 200.282 480.759 240.855 360.728 350.802 220.678 230.880 680.873 250.756 18
Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Guisong Xia, Dacheng Tao: ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding. NeurIPS 2023
DMF-Net0.752 210.906 160.793 400.802 480.689 480.825 540.556 170.867 250.681 190.602 520.960 200.555 330.365 80.779 90.859 310.747 280.795 340.717 90.917 390.856 370.764 14
C.Yang, Y.Yan, W.Zhao, J.Ye, X.Yang, A.Hussain, B.Dong, K.Huang: Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation. ICONIP 2023
PointTransformerV20.752 210.742 700.809 270.872 20.758 200.860 130.552 190.891 180.610 470.687 90.960 200.559 310.304 350.766 190.926 70.767 210.797 300.644 400.942 140.876 210.722 33
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao: Point Transformer V2: Grouped Vector Attention and Partition-based Pooling. NeurIPS 2022
PointConvFormer0.749 230.793 450.790 410.807 440.750 290.856 160.524 330.881 190.588 600.642 320.977 110.591 130.274 540.781 80.929 60.804 90.796 310.642 410.947 110.885 110.715 38
Wenxuan Wu, Qi Shan, Li Fuxin: PointConvFormer: Revenge of the Point-based Convolution.
BPNetcopyleft0.749 230.909 140.818 180.811 400.752 250.839 380.485 550.842 370.673 220.644 280.957 300.528 440.305 340.773 130.859 310.788 110.818 80.693 170.916 400.856 370.723 32
Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
MSP0.748 250.623 1020.804 300.859 50.745 320.824 560.501 440.912 90.690 140.685 110.956 320.567 260.320 290.768 180.918 80.720 400.802 220.676 270.921 340.881 130.779 10
StratifiedFormerpermissive0.747 260.901 170.803 310.845 180.757 220.846 310.512 390.825 440.696 120.645 270.956 320.576 230.262 650.744 340.861 300.742 300.770 500.705 120.899 520.860 340.734 23
Xin Lai*, Jianhui Liu*, Li Jiang, Liwei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia: Stratified Transformer for 3D Point Cloud Segmentation. CVPR 2022
VMNetpermissive0.746 270.870 230.838 40.858 60.729 370.850 250.501 440.874 210.587 610.658 230.956 320.564 280.299 370.765 200.900 150.716 430.812 150.631 460.939 170.858 350.709 39
Zeyu Hu, Xuyang Bai, Jiaxiang Shang, Runze Zhang, Jiayu Dong, Xin Wang, Guangyuan Sun, Hongbo Fu, Chiew-Lan Tai: VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation. ICCV 2021 (Oral)
Virtual MVFusion0.746 270.771 570.819 150.848 150.702 440.865 100.397 930.899 140.699 100.664 220.948 640.588 160.330 250.746 330.851 400.764 220.796 310.704 130.935 220.866 300.728 26
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
DiffSeg3D20.745 290.725 800.814 220.837 250.751 270.831 480.514 380.896 160.674 210.684 120.960 200.564 280.303 360.773 130.820 490.713 460.798 290.690 200.923 320.875 220.757 16
ODINpermissive0.744 300.658 950.752 660.870 30.714 410.843 340.569 120.919 60.703 90.622 420.949 610.591 130.343 160.736 350.784 570.816 80.838 20.672 320.918 380.854 410.725 30
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki: ODIN: A Single Model for 2D and 3D Segmentation. CVPR 2024
Retro-FPN0.744 300.842 320.800 320.767 630.740 330.836 430.541 240.914 80.672 230.626 390.958 250.552 340.272 560.777 100.886 230.696 540.801 260.674 300.941 150.858 350.717 35
Peng Xiang*, Xin Wen*, Yu-Shen Liu, Hui Zhang, Yi Fang, Zhizhong Han: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation. ICCV 2023
EQ-Net0.743 320.620 1030.799 350.849 130.730 360.822 580.493 520.897 150.664 240.681 130.955 360.562 300.378 40.760 220.903 130.738 310.801 260.673 310.907 440.877 180.745 19
Zetong Yang*, Li Jiang*, Yanan Sun, Bernt Schiele, Jiaya Jia: A Unified Query-based Paradigm for Point Cloud Understanding. CVPR 2022
SAT0.742 330.860 260.765 570.819 350.769 150.848 280.533 280.829 420.663 250.631 380.955 360.586 180.274 540.753 280.896 180.729 340.760 580.666 340.921 340.855 390.733 24
LRPNet0.742 330.816 400.806 290.807 440.752 250.828 520.575 90.839 390.699 100.637 360.954 420.520 480.320 290.755 270.834 440.760 230.772 470.676 270.915 420.862 320.717 35
LargeKernel3D0.739 350.909 140.820 130.806 460.740 330.852 230.545 220.826 430.594 590.643 290.955 360.541 360.263 640.723 390.858 330.775 190.767 510.678 230.933 240.848 450.694 44
Yukang Chen*, Jianhui Liu*, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia: LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs. CVPR 2023
RPN0.736 360.776 530.790 410.851 110.754 240.854 190.491 540.866 270.596 580.686 100.955 360.536 380.342 170.624 580.869 270.787 120.802 220.628 470.927 280.875 220.704 41
MinkowskiNetpermissive0.736 360.859 270.818 180.832 310.709 420.840 360.521 350.853 310.660 270.643 290.951 530.544 350.286 460.731 370.893 190.675 630.772 470.683 220.874 750.852 430.727 28
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
IPCA0.731 380.890 190.837 50.864 40.726 380.873 50.530 320.824 450.489 950.647 260.978 70.609 60.336 210.624 580.733 650.758 240.776 450.570 730.949 100.877 180.728 26
MS-SFA-net0.730 390.910 130.819 150.837 250.698 450.838 390.532 300.872 220.605 510.676 160.959 240.535 400.341 180.649 470.598 890.708 480.810 160.664 360.895 550.879 170.771 13
online3d0.727 400.715 850.777 500.854 80.748 300.858 140.497 490.872 220.572 680.639 340.957 300.523 450.297 390.750 310.803 540.744 290.810 160.587 690.938 190.871 270.719 34
SparseConvNet0.725 410.647 980.821 120.846 170.721 390.869 60.533 280.754 660.603 540.614 440.955 360.572 250.325 270.710 400.870 260.724 380.823 40.628 470.934 230.865 310.683 47
PointTransformer++0.725 410.727 780.811 260.819 350.765 160.841 350.502 430.814 500.621 430.623 410.955 360.556 320.284 470.620 600.866 280.781 150.757 620.648 380.932 260.862 320.709 39
MatchingNet0.724 430.812 420.812 240.810 410.735 350.834 450.495 510.860 300.572 680.602 520.954 420.512 500.280 500.757 250.845 420.725 370.780 420.606 570.937 200.851 440.700 43
INS-Conv-semantic0.717 440.751 660.759 600.812 390.704 430.868 70.537 270.842 370.609 490.608 480.953 460.534 410.293 410.616 610.864 290.719 420.793 350.640 420.933 240.845 490.663 53
PointMetaBase0.714 450.835 330.785 450.821 330.684 500.846 310.531 310.865 280.614 440.596 560.953 460.500 530.246 700.674 420.888 210.692 550.764 540.624 490.849 900.844 500.675 49
contrastBoundarypermissive0.705 460.769 600.775 510.809 420.687 490.820 610.439 810.812 510.661 260.591 580.945 720.515 490.171 1000.633 550.856 340.720 400.796 310.668 330.889 600.847 460.689 45
Liyao Tang, Yibing Zhan, Zhe Chen, Baosheng Yu, Dacheng Tao: Contrastive Boundary Learning for Point Cloud Segmentation. CVPR 2022
ClickSeg_Semantic0.703 470.774 550.800 320.793 540.760 190.847 300.471 590.802 540.463 1020.634 370.968 150.491 560.271 580.726 380.910 100.706 490.815 90.551 850.878 690.833 510.570 85
RFCR0.702 480.889 200.745 720.813 380.672 530.818 650.493 520.815 490.623 410.610 460.947 660.470 650.249 690.594 650.848 410.705 500.779 430.646 390.892 580.823 570.611 68
Jingyu Gong, Jiachen Xu, Xin Tan, Haichuan Song, Yanyun Qu, Yuan Xie, Lizhuang Ma: Omni-Supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning. CVPR 2021
One Thing One Click0.701 490.825 370.796 360.723 700.716 400.832 470.433 830.816 470.634 380.609 470.969 130.418 910.344 150.559 770.833 450.715 440.808 190.560 790.902 490.847 460.680 48
JSENetpermissive0.699 500.881 220.762 580.821 330.667 540.800 780.522 340.792 570.613 450.607 490.935 920.492 550.205 870.576 700.853 380.691 570.758 600.652 370.872 780.828 540.649 57
Zeyu Hu, Mingmin Zhen, Xuyang Bai, Hongbo Fu, Chiew-Lan Tai: JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds. ECCV 2020
One-Thing-One-Click0.693 510.743 690.794 380.655 930.684 500.822 580.497 490.719 760.622 420.617 430.977 110.447 780.339 190.750 310.664 820.703 520.790 380.596 620.946 130.855 390.647 58
Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu: One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation. CVPR 2021
PicassoNet-IIpermissive0.692 520.732 740.772 520.786 550.677 520.866 90.517 360.848 330.509 880.626 390.952 510.536 380.225 770.545 830.704 720.689 600.810 160.564 780.903 480.854 410.729 25
Huan Lei, Naveed Akhtar, Mubarak Shah, and Ajmal Mian: Geometric feature learning for 3D meshes.
Feature_GeometricNetpermissive0.690 530.884 210.754 640.795 520.647 610.818 650.422 850.802 540.612 460.604 500.945 720.462 680.189 950.563 760.853 380.726 360.765 530.632 450.904 460.821 600.606 72
Kangcheng Liu, Ben M. Chen: https://arxiv.org/abs/2012.09439. arXiv Preprint
FusionNet0.688 540.704 870.741 760.754 670.656 560.829 500.501 440.741 710.609 490.548 660.950 570.522 470.371 50.633 550.756 600.715 440.771 490.623 500.861 860.814 630.658 54
Feihu Zhang, Jin Fang, Benjamin Wah, Philip Torr: Deep FusionNet for Point Cloud Semantic Segmentation. ECCV 2020
Feature-Geometry Netpermissive0.685 550.866 240.748 690.819 350.645 630.794 810.450 710.802 540.587 610.604 500.945 720.464 670.201 900.554 790.840 430.723 390.732 730.602 600.907 440.822 590.603 75
KP-FCNN0.684 560.847 300.758 620.784 570.647 610.814 680.473 580.772 600.605 510.594 570.935 920.450 760.181 980.587 660.805 530.690 580.785 410.614 530.882 650.819 610.632 64
H. Thomas, C. Qi, J. Deschaud, B. Marcotegui, F. Goulette, L. Guibas: KPConv: Flexible and Deformable Convolution for Point Clouds. ICCV 2019
VACNN++0.684 560.728 770.757 630.776 600.690 460.804 760.464 640.816 470.577 670.587 590.945 720.508 520.276 530.671 430.710 700.663 680.750 660.589 670.881 660.832 530.653 56
DGNet0.684 560.712 860.784 460.782 590.658 550.835 440.499 480.823 460.641 350.597 550.950 570.487 580.281 490.575 710.619 860.647 760.764 540.620 520.871 810.846 480.688 46
Superpoint Network0.683 590.851 290.728 800.800 510.653 580.806 740.468 610.804 520.572 680.602 520.946 690.453 750.239 730.519 880.822 470.689 600.762 570.595 640.895 550.827 550.630 65
PointContrast_LA_SEM0.683 590.757 640.784 460.786 550.639 650.824 560.408 880.775 590.604 530.541 680.934 960.532 420.269 600.552 800.777 580.645 790.793 350.640 420.913 430.824 560.671 50
VI-PointConv0.676 610.770 590.754 640.783 580.621 690.814 680.552 190.758 640.571 710.557 640.954 420.529 430.268 620.530 860.682 760.675 630.719 760.603 590.888 610.833 510.665 52
Xingyi Li, Wenxuan Wu, Xiaoli Z. Fern, Li Fuxin: The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions.
ROSMRF3D0.673 620.789 460.748 690.763 650.635 670.814 680.407 900.747 680.581 650.573 610.950 570.484 590.271 580.607 620.754 610.649 730.774 460.596 620.883 640.823 570.606 72
SALANet0.670 630.816 400.770 550.768 620.652 590.807 730.451 680.747 680.659 290.545 670.924 1020.473 640.149 1100.571 730.811 520.635 830.746 670.623 500.892 580.794 770.570 85
O3DSeg0.668 640.822 380.771 540.496 1140.651 600.833 460.541 240.761 630.555 770.611 450.966 160.489 570.370 60.388 1070.580 900.776 180.751 640.570 730.956 80.817 620.646 59
PointConvpermissive0.666 650.781 500.759 600.699 780.644 640.822 580.475 570.779 580.564 740.504 850.953 460.428 850.203 890.586 680.754 610.661 690.753 630.588 680.902 490.813 650.642 60
Wenxuan Wu, Zhongang Qi, Li Fuxin: PointConv: Deep Convolutional Networks on 3D Point Clouds. CVPR 2019
PointASNLpermissive0.666 650.703 880.781 480.751 690.655 570.830 490.471 590.769 610.474 980.537 700.951 530.475 630.279 510.635 530.698 750.675 630.751 640.553 840.816 970.806 670.703 42
Xu Yan, Chaoda Zheng, Zhen Li, Sheng Wang, Shuguang Cui: PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling. CVPR 2020
PPCNN++permissive0.663 670.746 670.708 830.722 710.638 660.820 610.451 680.566 1040.599 560.541 680.950 570.510 510.313 310.648 490.819 500.616 880.682 910.590 660.869 820.810 660.656 55
Pyunghwan Ahn, Juyoung Yang, Eojindl Yi, Chanho Lee, Junmo Kim: Projection-based Point Convolution for Efficient Point Cloud Segmentation. IEEE Access
DCM-Net0.658 680.778 510.702 860.806 460.619 700.813 710.468 610.693 840.494 910.524 760.941 840.449 770.298 380.510 900.821 480.675 630.727 750.568 760.826 950.803 700.637 62
Jonas Schult*, Francis Engelmann*, Theodora Kontogianni, Bastian Leibe: DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes. CVPR 2020 (Oral)
MVF-GNN0.658 680.558 1100.751 670.655 930.690 460.722 1030.453 670.867 250.579 660.576 600.893 1140.523 450.293 410.733 360.571 920.692 550.659 980.606 570.875 720.804 690.668 51
HPGCNN0.656 700.698 900.743 740.650 950.564 870.820 610.505 420.758 640.631 390.479 890.945 720.480 610.226 750.572 720.774 590.690 580.735 710.614 530.853 890.776 920.597 78
Jisheng Dang, Qingyong Hu, Yulan Guo, Jun Yang: HPGCNN.
SAFNet-segpermissive0.654 710.752 650.734 780.664 910.583 820.815 670.399 920.754 660.639 360.535 720.942 820.470 650.309 330.665 440.539 940.650 720.708 810.635 440.857 880.793 790.642 60
Linqing Zhao, Jiwen Lu, Jie Zhou: Similarity-Aware Fusion Network for 3D Semantic Segmentation. IROS 2021
RandLA-Netpermissive0.645 720.778 510.731 790.699 780.577 830.829 500.446 730.736 720.477 970.523 780.945 720.454 720.269 600.484 970.749 640.618 860.738 690.599 610.827 940.792 820.621 67
PointConv-SFPN0.641 730.776 530.703 850.721 720.557 900.826 530.451 680.672 890.563 750.483 880.943 810.425 880.162 1050.644 500.726 660.659 700.709 800.572 720.875 720.786 870.559 91
MVPNetpermissive0.641 730.831 340.715 810.671 880.590 780.781 870.394 940.679 860.642 340.553 650.937 890.462 680.256 660.649 470.406 1070.626 840.691 880.666 340.877 700.792 820.608 71
Maximilian Jaritz, Jiayuan Gu, Hao Su: Multi-view PointNet for 3D Scene Understanding. GMDL Workshop, ICCV 2019
PointMRNet0.640 750.717 840.701 870.692 810.576 840.801 770.467 630.716 770.563 750.459 950.953 460.429 840.169 1020.581 690.854 370.605 890.710 780.550 860.894 570.793 790.575 83
FPConvpermissive0.639 760.785 480.760 590.713 760.603 730.798 790.392 960.534 1090.603 540.524 760.948 640.457 700.250 680.538 840.723 680.598 930.696 860.614 530.872 780.799 720.567 88
Yiqun Lin, Zizheng Yan, Haibin Huang, Dong Du, Ligang Liu, Shuguang Cui, Xiaoguang Han: FPConv: Learning Local Flattening for Point Convolution. CVPR 2020
PD-Net0.638 770.797 440.769 560.641 1000.590 780.820 610.461 650.537 1080.637 370.536 710.947 660.388 980.206 860.656 450.668 800.647 760.732 730.585 700.868 830.793 790.473 111
PointSPNet0.637 780.734 730.692 940.714 750.576 840.797 800.446 730.743 700.598 570.437 1000.942 820.403 940.150 1090.626 570.800 560.649 730.697 850.557 820.846 910.777 910.563 89
SConv0.636 790.830 350.697 900.752 680.572 860.780 890.445 750.716 770.529 810.530 730.951 530.446 790.170 1010.507 920.666 810.636 820.682 910.541 920.886 620.799 720.594 79
Supervoxel-CNN0.635 800.656 960.711 820.719 730.613 710.757 980.444 780.765 620.534 800.566 620.928 1000.478 620.272 560.636 520.531 960.664 670.645 1020.508 1000.864 850.792 820.611 68
joint point-basedpermissive0.634 810.614 1040.778 490.667 900.633 680.825 540.420 860.804 520.467 1000.561 630.951 530.494 540.291 430.566 740.458 1020.579 990.764 540.559 810.838 920.814 630.598 77
Hung-Yueh Chiang, Yen-Liang Lin, Yueh-Cheng Liu, Winston H. Hsu: A Unified Point-Based Framework for 3D Segmentation. 3DV 2019
PointMTL0.632 820.731 750.688 970.675 850.591 770.784 860.444 780.565 1050.610 470.492 860.949 610.456 710.254 670.587 660.706 710.599 920.665 970.612 560.868 830.791 850.579 82
3DSM_DMMF0.631 830.626 1010.745 720.801 490.607 720.751 990.506 410.729 750.565 730.491 870.866 1170.434 800.197 930.595 640.630 850.709 470.705 830.560 790.875 720.740 1020.491 106
APCF-Net0.631 830.742 700.687 990.672 860.557 900.792 840.408 880.665 910.545 780.508 820.952 510.428 850.186 960.634 540.702 730.620 850.706 820.555 830.873 760.798 740.581 81
Haojia Lin: Adaptive Pyramid Context Fusion for Point Cloud Perception. GRSL
PointNet2-SFPN0.631 830.771 570.692 940.672 860.524 960.837 410.440 800.706 820.538 790.446 970.944 780.421 900.219 800.552 800.751 630.591 950.737 700.543 910.901 510.768 940.557 92
FusionAwareConv0.630 860.604 1060.741 760.766 640.590 780.747 1000.501 440.734 730.503 900.527 740.919 1060.454 720.323 280.550 820.420 1060.678 620.688 890.544 890.896 540.795 760.627 66
Jiazhao Zhang, Chenyang Zhu, Lintao Zheng, Kai Xu: Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation. CVPR 2020
DenSeR0.628 870.800 430.625 1090.719 730.545 930.806 740.445 750.597 990.448 1050.519 800.938 880.481 600.328 260.489 960.499 1010.657 710.759 590.592 650.881 660.797 750.634 63
SegGroup_sempermissive0.627 880.818 390.747 710.701 770.602 740.764 950.385 1000.629 960.490 930.508 820.931 990.409 930.201 900.564 750.725 670.618 860.692 870.539 930.873 760.794 770.548 95
An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou: SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation. TIP 2022
dtc_net0.625 890.703 880.751 670.794 530.535 940.848 280.480 560.676 880.528 820.469 920.944 780.454 720.004 1220.464 990.636 840.704 510.758 600.548 880.924 310.787 860.492 105
SIConv0.625 890.830 350.694 920.757 660.563 880.772 930.448 720.647 940.520 840.509 810.949 610.431 830.191 940.496 940.614 870.647 760.672 950.535 960.876 710.783 880.571 84
Weakly-Openseg v30.625 890.924 90.787 440.620 1020.555 920.811 720.393 950.666 900.382 1130.520 790.953 460.250 1170.208 840.604 630.670 780.644 800.742 680.538 940.919 370.803 700.513 103
HPEIN0.618 920.729 760.668 1000.647 970.597 760.766 940.414 870.680 850.520 840.525 750.946 690.432 810.215 820.493 950.599 880.638 810.617 1070.570 730.897 530.806 670.605 74
Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, Chi-Wing Fu, Jiaya Jia: Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation. ICCV 2019
SPH3D-GCNpermissive0.610 930.858 280.772 520.489 1150.532 950.792 840.404 910.643 950.570 720.507 840.935 920.414 920.046 1190.510 900.702 730.602 910.705 830.549 870.859 870.773 930.534 98
Huan Lei, Naveed Akhtar, Ajmal Mian: Spherical Kernel for Efficient Graph Convolution on 3D Point Clouds. TPAMI 2020
AttAN0.609 940.760 620.667 1010.649 960.521 970.793 820.457 660.648 930.528 820.434 1020.947 660.401 950.153 1080.454 1000.721 690.648 750.717 770.536 950.904 460.765 950.485 107
Gege Zhang, Qinghua Ma, Licheng Jiao, Fang Liu, Qigong Sun: AttAN: Attention Adversarial Networks for 3D Point Cloud Semantic Segmentation. IJCAI 2020
wsss-transformer0.600 950.634 1000.743 740.697 800.601 750.781 870.437 820.585 1020.493 920.446 970.933 970.394 960.011 1210.654 460.661 830.603 900.733 720.526 970.832 930.761 970.480 108
LAP-D0.594 960.720 820.692 940.637 1010.456 1060.773 920.391 980.730 740.587 610.445 990.940 860.381 990.288 440.434 1030.453 1040.591 950.649 1000.581 710.777 1010.749 1010.610 70
DPC0.592 970.720 820.700 880.602 1060.480 1020.762 970.380 1010.713 800.585 640.437 1000.940 860.369 1010.288 440.434 1030.509 1000.590 970.639 1050.567 770.772 1020.755 990.592 80
Francis Engelmann, Theodora Kontogianni, Bastian Leibe: Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds. ICRA 2020
CCRFNet0.589 980.766 610.659 1040.683 830.470 1050.740 1020.387 990.620 980.490 930.476 900.922 1040.355 1040.245 710.511 890.511 990.571 1000.643 1030.493 1040.872 780.762 960.600 76
ROSMRF0.580 990.772 560.707 840.681 840.563 880.764 950.362 1030.515 1100.465 1010.465 940.936 910.427 870.207 850.438 1010.577 910.536 1030.675 940.486 1050.723 1080.779 890.524 100
SD-DETR0.576 1000.746 670.609 1130.445 1190.517 980.643 1140.366 1020.714 790.456 1030.468 930.870 1160.432 810.264 630.558 780.674 770.586 980.688 890.482 1060.739 1060.733 1040.537 97
SQN_0.1%0.569 1010.676 920.696 910.657 920.497 990.779 900.424 840.548 1060.515 860.376 1070.902 1130.422 890.357 100.379 1080.456 1030.596 940.659 980.544 890.685 1110.665 1150.556 93
TextureNetpermissive0.566 1020.672 940.664 1020.671 880.494 1000.719 1040.445 750.678 870.411 1110.396 1050.935 920.356 1030.225 770.412 1050.535 950.565 1010.636 1060.464 1080.794 1000.680 1120.568 87
Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas Guibas: TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes. CVPR
DVVNet0.562 1030.648 970.700 880.770 610.586 810.687 1080.333 1070.650 920.514 870.475 910.906 1100.359 1020.223 790.340 1100.442 1050.422 1140.668 960.501 1010.708 1090.779 890.534 98
Pointnet++ & Featurepermissive0.557 1040.735 720.661 1030.686 820.491 1010.744 1010.392 960.539 1070.451 1040.375 1080.946 690.376 1000.205 870.403 1060.356 1100.553 1020.643 1030.497 1020.824 960.756 980.515 101
GMLPs0.538 1050.495 1150.693 930.647 970.471 1040.793 820.300 1100.477 1110.505 890.358 1090.903 1120.327 1070.081 1160.472 980.529 970.448 1120.710 780.509 980.746 1040.737 1030.554 94
PanopticFusion-label0.529 1060.491 1160.688 970.604 1050.386 1110.632 1150.225 1210.705 830.434 1080.293 1150.815 1190.348 1050.241 720.499 930.669 790.507 1050.649 1000.442 1140.796 990.602 1190.561 90
Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. IROS 2019 (to appear)
subcloud_weak0.516 1070.676 920.591 1160.609 1030.442 1070.774 910.335 1060.597 990.422 1100.357 1100.932 980.341 1060.094 1150.298 1120.528 980.473 1100.676 930.495 1030.602 1170.721 1070.349 119
Online SegFusion0.515 1080.607 1050.644 1070.579 1080.434 1080.630 1160.353 1040.628 970.440 1060.410 1030.762 1220.307 1090.167 1030.520 870.403 1080.516 1040.565 1100.447 1120.678 1120.701 1090.514 102
Davide Menini, Suryansh Kumar, Martin R. Oswald, Erik Sandstroem, Cristian Sminchisescu, Luc van Gool: A Real-Time Learning Framework for Joint 3D Reconstruction and Semantic Segmentation. Robotics and Automation Letters Submission
3DMV, FTSDF0.501 1090.558 1100.608 1140.424 1210.478 1030.690 1070.246 1170.586 1010.468 990.450 960.911 1080.394 960.160 1060.438 1010.212 1170.432 1130.541 1150.475 1070.742 1050.727 1050.477 109
PCNN0.498 1100.559 1090.644 1070.560 1100.420 1100.711 1060.229 1190.414 1120.436 1070.352 1110.941 840.324 1080.155 1070.238 1170.387 1090.493 1060.529 1160.509 980.813 980.751 1000.504 104
3DMV0.484 1110.484 1170.538 1190.643 990.424 1090.606 1190.310 1080.574 1030.433 1090.378 1060.796 1200.301 1100.214 830.537 850.208 1180.472 1110.507 1190.413 1170.693 1100.602 1190.539 96
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
PointCNN with RGBpermissive0.458 1120.577 1080.611 1120.356 1230.321 1190.715 1050.299 1120.376 1160.328 1190.319 1130.944 780.285 1120.164 1040.216 1200.229 1150.484 1080.545 1140.456 1100.755 1030.709 1080.475 110
Yangyan Li, Rui Bu, Mingchao Sun, Baoquan Chen: PointCNN. NeurIPS 2018
FCPNpermissive0.447 1130.679 910.604 1150.578 1090.380 1120.682 1090.291 1130.106 1230.483 960.258 1210.920 1050.258 1160.025 1200.231 1190.325 1110.480 1090.560 1120.463 1090.725 1070.666 1140.231 123
Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, Federico Tombari: Fully-Convolutional Point Networks for Large-Scale Point Clouds. ECCV 2018
DGCNN_reproducecopyleft0.446 1140.474 1180.623 1100.463 1170.366 1140.651 1120.310 1080.389 1150.349 1170.330 1120.937 890.271 1140.126 1120.285 1130.224 1160.350 1190.577 1090.445 1130.625 1150.723 1060.394 115
Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon: Dynamic Graph CNN for Learning on Point Clouds. TOG 2019
PNET20.442 1150.548 1120.548 1180.597 1070.363 1150.628 1170.300 1100.292 1180.374 1140.307 1140.881 1150.268 1150.186 960.238 1170.204 1190.407 1150.506 1200.449 1110.667 1130.620 1180.462 113
SurfaceConvPF0.442 1150.505 1140.622 1110.380 1220.342 1170.654 1110.227 1200.397 1140.367 1150.276 1170.924 1020.240 1180.198 920.359 1090.262 1130.366 1160.581 1080.435 1150.640 1140.668 1130.398 114
Hao Pan, Shilin Liu, Yang Liu, Xin Tong: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames.
Tangent Convolutionspermissive0.438 1170.437 1200.646 1060.474 1160.369 1130.645 1130.353 1040.258 1200.282 1220.279 1160.918 1070.298 1110.147 1110.283 1140.294 1120.487 1070.562 1110.427 1160.619 1160.633 1170.352 118
Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou: Tangent Convolutions for Dense Prediction in 3D. CVPR 2018
3DWSSS0.425 1180.525 1130.647 1050.522 1110.324 1180.488 1230.077 1240.712 810.353 1160.401 1040.636 1240.281 1130.176 990.340 1100.565 930.175 1230.551 1130.398 1180.370 1240.602 1190.361 117
SPLAT Netcopyleft0.393 1190.472 1190.511 1200.606 1040.311 1200.656 1100.245 1180.405 1130.328 1190.197 1220.927 1010.227 1200.000 1240.001 1250.249 1140.271 1220.510 1170.383 1200.593 1180.699 1100.267 121
Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz: SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018
ScanNet+FTSDF0.383 1200.297 1220.491 1210.432 1200.358 1160.612 1180.274 1150.116 1220.411 1110.265 1180.904 1110.229 1190.079 1170.250 1150.185 1200.320 1200.510 1170.385 1190.548 1190.597 1220.394 115
PointNet++permissive0.339 1210.584 1070.478 1220.458 1180.256 1220.360 1240.250 1160.247 1210.278 1230.261 1200.677 1230.183 1210.117 1130.212 1210.145 1220.364 1170.346 1240.232 1240.548 1190.523 1230.252 122
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.
GrowSP++0.323 1220.114 1240.589 1170.499 1130.147 1240.555 1200.290 1140.336 1170.290 1210.262 1190.865 1180.102 1240.000 1240.037 1230.000 1250.000 1250.462 1210.381 1210.389 1230.664 1160.473 111
SSC-UNetpermissive0.308 1230.353 1210.290 1240.278 1240.166 1230.553 1210.169 1230.286 1190.147 1240.148 1240.908 1090.182 1220.064 1180.023 1240.018 1240.354 1180.363 1220.345 1220.546 1210.685 1110.278 120
ScanNetpermissive0.306 1240.203 1230.366 1230.501 1120.311 1200.524 1220.211 1220.002 1250.342 1180.189 1230.786 1210.145 1230.102 1140.245 1160.152 1210.318 1210.348 1230.300 1230.460 1220.437 1240.182 124
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17
ERROR0.054 1250.000 1250.041 1250.172 1250.030 1250.062 1250.001 1250.035 1240.004 1250.051 1250.143 1250.019 1250.003 1230.041 1220.050 1230.003 1240.054 1250.018 1250.005 1250.264 1250.082 125
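
The avg iou column in the table above summarizes per-class intersection-over-union. As a rough illustration, assuming the usual per-class definition IoU = TP / (TP + FP + FN) and an unweighted mean over classes, the computation can be sketched in Python. The function names and the toy labels are hypothetical and are not taken from the benchmark's evaluation code.

```python
from collections import Counter


def per_class_iou(gt_labels, pred_labels, num_classes):
    """Return IoU per class as TP / (TP + FP + FN).

    A class that appears in neither the ground truth nor the
    predictions gets None so it can be skipped when averaging.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gt, pred in zip(gt_labels, pred_labels):
        if gt == pred:
            tp[gt] += 1
        else:
            fn[gt] += 1    # the true class was missed
            fp[pred] += 1  # the predicted class was wrong
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else None)
    return ious


def mean_iou(ious):
    """Unweighted mean over the classes that have a defined IoU."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


# Toy example: six points, three classes (hypothetical data).
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(gt, pred, num_classes=3)
miou = mean_iou(ious)
print(ious, miou)
```

In this toy case the per-class values are 1/3, 2/3 and 1/2, so the mean is 0.5. The official evaluation may treat ignored or unlabeled points differently.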


This table lists the benchmark results for the 3D semantic instance scenario.




Method | Info | avg ap 25% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window
Volt-SPFormerScanNetpermissive0.908 11.000 10.981 220.975 10.885 10.964 60.744 230.845 180.906 160.916 20.842 10.820 20.879 10.959 520.955 90.944 10.872 180.999 410.869 3
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
PointRel0.901 21.000 10.978 260.928 40.879 20.962 80.882 50.749 410.947 30.912 30.802 40.753 220.820 31.000 10.984 40.919 70.894 31.000 10.815 17
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation. CVPR 2025
PointComp0.897 31.000 10.998 60.864 200.869 40.969 40.830 80.783 330.905 170.894 110.791 50.834 10.769 141.000 10.982 50.920 60.868 211.000 10.872 2
Competitor-MAFT0.896 41.000 11.000 10.872 180.847 120.967 50.955 10.778 350.901 190.919 10.784 80.812 40.770 131.000 10.949 100.865 380.868 201.000 10.840 7
OneFormer3Dcopyleft0.896 41.000 11.000 10.913 70.858 70.951 140.786 160.837 200.916 140.908 50.778 110.803 80.750 161.000 10.976 70.926 50.882 70.995 510.849 4
Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: OneFormer3D: One Transformer for Unified Point Cloud Segmentation.
MG-Former0.887 61.000 10.991 160.837 280.801 280.935 230.887 40.857 110.946 40.891 120.748 210.805 70.739 181.000 10.993 20.809 620.876 151.000 10.842 6
DCD0.885 71.000 10.933 440.856 240.832 170.959 100.930 20.858 100.802 410.859 200.767 120.796 120.709 231.000 10.971 80.871 320.904 11.000 10.874 1
KmaxOneFormerNetpermissive0.883 81.000 11.000 10.798 430.848 110.971 20.853 70.903 30.827 350.910 40.748 200.809 60.724 201.000 10.980 60.855 440.844 271.000 10.832 8
InsSSM0.883 81.000 10.996 80.800 420.865 50.960 90.808 130.852 160.940 70.899 100.785 70.810 50.700 251.000 10.912 230.851 470.895 20.997 440.827 10
Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau: SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation. TCSVT, 2024
Competitor-SPFormer0.881 101.000 11.000 10.845 260.854 80.962 70.714 260.857 120.904 180.902 80.782 100.789 150.662 311.000 10.988 30.874 290.886 60.997 440.847 5
VDG-Uni3DSeg0.880 111.000 10.990 180.889 110.823 210.952 130.764 180.893 60.941 60.907 60.756 170.781 170.628 491.000 10.918 210.903 100.872 190.999 410.821 14
TST3D0.879 121.000 10.994 110.921 60.807 270.939 200.771 170.887 70.923 120.862 190.722 260.768 190.756 151.000 10.910 340.904 90.836 300.999 410.824 12
Duc Tran Dang Trung, Byeongkeun Kang, Yeejin Lee: MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation. ACM Multimedia 2024
SIM3D0.878 131.000 10.972 280.863 210.817 250.952 120.821 110.783 310.890 220.902 90.735 240.797 100.799 101.000 10.931 180.893 160.853 251.000 10.792 21
EV3D0.877 141.000 10.996 100.873 160.854 90.950 150.691 300.783 320.926 90.889 150.754 180.794 140.820 31.000 10.912 230.900 120.860 231.000 10.779 24
TD3Dpermissive0.875 151.000 10.976 270.877 140.783 340.970 30.889 30.828 210.945 50.803 270.713 280.720 290.709 221.000 10.936 160.934 40.873 161.000 10.791 22
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Spherical Mask(CtoF)0.875 151.000 10.991 170.873 160.850 100.946 170.691 300.752 400.926 90.889 140.759 150.794 130.820 31.000 10.912 230.900 120.878 121.000 10.769 26
Queryformer0.874 171.000 10.978 250.809 400.876 30.936 220.702 270.716 460.920 130.875 180.766 130.772 180.818 71.000 10.995 10.916 80.892 41.000 10.767 27
SoftGroup++0.874 171.000 10.972 290.947 20.839 150.898 310.556 450.913 20.881 250.756 290.828 30.748 240.821 21.000 10.937 150.937 20.887 51.000 10.821 13
UniPerception0.870 191.000 10.998 60.770 530.835 160.972 10.762 190.754 390.928 80.845 210.790 60.819 30.717 210.981 480.915 220.890 180.878 101.000 10.809 18
Mask3D0.870 191.000 10.985 200.782 500.818 240.938 210.760 200.749 410.923 110.877 170.760 140.785 160.820 31.000 10.912 230.864 400.878 120.983 570.825 11
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
ExtMask3D0.867 211.000 11.000 10.756 580.816 260.940 190.795 140.760 380.862 270.888 160.739 220.763 200.774 111.000 10.929 190.878 260.879 91.000 10.819 16
SoftGrouppermissive0.865 221.000 10.969 300.860 220.860 60.913 270.558 420.899 40.911 150.760 280.828 20.736 260.802 90.981 480.919 200.875 270.877 141.000 10.820 15
Thang Vu, Kookhoi Kim, Tung M. Luu, Xuan Thanh Nguyen, Chang D. Yoo: SoftGroup for 3D Instance Segmentation on Point Clouds. CVPR 2022 [Oral]
MAFT0.860 231.000 10.990 180.810 390.829 180.949 160.809 120.688 520.836 320.904 70.751 190.796 110.741 171.000 10.864 440.848 490.837 281.000 10.828 9
IPCA-Inst0.851 241.000 10.968 310.884 130.842 140.862 440.693 290.812 260.888 240.677 410.783 90.698 300.807 81.000 10.911 310.865 390.865 221.000 10.757 30
SPFormerpermissive0.851 241.000 10.994 120.806 410.774 360.942 180.637 340.849 170.859 290.889 130.720 270.730 270.665 301.000 10.911 310.868 370.873 171.000 10.796 20
Sun Jiahao, Qing Chunmei, Tan Junpeng, Xu Xiangmin: Superpoint Transformer for 3D Scene Instance Segmentation. AAAI 2023 [Oral]
ODIN - Inspermissive0.847 261.000 10.951 370.834 330.828 190.875 360.871 60.767 360.821 370.816 240.690 350.800 90.771 121.000 10.912 230.891 170.821 310.886 730.713 37
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki: ODIN: A Single Model for 2D and 3D Segmentation. CVPR 2024
Mask3D_evaluation0.843 271.000 10.955 360.847 250.795 300.932 240.750 220.780 340.891 210.818 230.737 230.633 390.703 241.000 10.902 360.870 330.820 320.941 650.805 19
SphereSeg0.835 281.000 10.963 340.891 100.794 310.954 110.822 100.710 470.961 20.721 330.693 340.530 520.653 331.000 10.867 430.857 430.859 240.991 540.771 25
ISBNetpermissive0.835 281.000 10.950 380.731 600.819 220.918 250.790 150.740 430.851 310.831 220.661 370.742 250.650 341.000 10.937 140.814 610.836 291.000 10.765 28
Tuan Duc Ngo, Binh-Son Hua, Khoi Nguyen: ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution. CVPR 2023
GraphCut0.832 301.000 10.922 530.724 620.798 290.902 300.701 280.856 140.859 280.715 340.706 290.748 230.640 451.000 10.934 170.862 410.880 81.000 10.729 33
TopoSeg0.832 301.000 10.981 230.933 30.819 230.826 530.524 510.841 190.811 380.681 400.759 160.687 310.727 190.981 480.911 310.883 220.853 261.000 10.756 31
PBNetpermissive0.825 321.000 10.963 330.837 300.843 130.865 390.822 90.647 550.878 260.733 310.639 440.683 320.650 341.000 10.853 450.870 340.820 331.000 10.744 32
Weiguang Zhao, Yuyao Yan, Chaolong Yang, Jianan Ye, Xi Yang, Kaizhu Huang: Divide and Conquer: 3D Instance Segmentation With Point-Wise Binarization. ICCV 2023
SSEC0.820 331.000 10.983 210.924 50.826 200.817 560.415 600.899 50.793 430.673 420.731 250.636 370.653 321.000 10.939 130.804 640.878 111.000 10.780 23
DKNet0.815 341.000 10.930 450.844 270.765 400.915 260.534 490.805 280.805 400.807 260.654 380.763 210.650 341.000 10.794 570.881 230.766 371.000 10.758 29
Yizheng Wu, Min Shi, Shuaiyuan Du, Hao Lu, Zhiguo Cao, Weicai Zhong: 3D Instances as 1D Kernels. ECCV 2022
RPGN0.806 351.000 10.992 140.789 450.723 530.891 320.650 330.810 270.832 330.665 440.699 320.658 330.700 251.000 10.881 380.832 530.774 350.997 440.613 54
Shichao Dong, Guosheng Lin, Tzu-Yi Hung: Learning Regional Purity for Instance Segmentation on 3D Point Clouds. ECCV 2022
Box2Mask0.803 361.000 10.962 350.874 150.707 570.887 350.686 320.598 600.961 10.715 350.694 330.469 570.700 251.000 10.912 230.902 110.753 420.997 440.637 48
Julian Chibane, Francis Engelmann, Tuan Anh Tran, Gerard Pons-Moll: Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes. ECCV 2022
HAISpermissive0.803 361.000 10.994 120.820 350.759 410.855 450.554 460.882 80.827 360.615 500.676 360.638 360.646 431.000 10.912 230.797 670.767 360.994 520.726 34
Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang: Hierarchical Aggregation for 3D Instance Segmentation. ICCV 2021
Mask-Group0.792 381.000 10.968 320.812 360.766 390.864 400.460 540.815 250.888 230.598 540.651 410.639 350.600 520.918 550.941 110.896 150.721 491.000 10.723 35
Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang: MaskGroup: Hierarchical Point Grouping and Masking for 3D Instance Segmentation. ICME 2022
CSC-Pretrained0.791 391.000 10.996 80.829 340.767 380.889 340.600 370.819 240.770 480.594 550.620 480.541 490.700 251.000 10.941 110.889 200.763 381.000 10.526 64
SSTNetpermissive0.789 401.000 10.840 670.888 120.717 540.835 490.717 250.684 530.627 630.724 320.652 400.727 280.600 521.000 10.912 230.822 560.757 411.000 10.691 42
Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui Jia: Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks. ICCV 2021
GICN0.788 411.000 10.978 240.867 190.781 350.833 500.527 500.824 220.806 390.549 630.596 510.551 450.700 251.000 10.853 450.935 30.733 461.000 10.651 45
DANCENET0.786 421.000 10.936 410.783 480.737 500.852 470.742 240.647 550.765 500.811 250.624 470.579 420.632 481.000 10.909 350.898 140.696 540.944 610.601 57
DENet0.786 421.000 10.929 460.736 590.750 470.720 690.755 210.934 10.794 420.590 560.561 570.537 500.650 341.000 10.882 370.804 650.789 341.000 10.719 36
DualGroup0.782 441.000 10.927 470.811 370.772 370.853 460.631 360.805 280.773 450.613 510.611 490.610 400.650 340.835 660.881 380.879 250.750 441.000 10.675 43
PointGroup0.778 451.000 10.900 570.798 440.715 550.863 410.493 520.706 480.895 200.569 610.701 300.576 430.639 461.000 10.880 400.851 460.719 500.997 440.709 39
Li Jiang, Hengshuang Zhao, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia: PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation. CVPR 2020 [oral]
PE0.776 461.000 10.900 580.860 220.728 520.869 370.400 610.857 130.774 440.568 620.701 310.602 410.646 430.933 540.843 480.890 190.691 580.997 440.709 38
Biao Zhang, Peter Wonka: Point Cloud Instance Segmentation using Probabilistic Embeddings. CVPR 2021
AOIA0.767 471.000 10.937 400.810 380.740 490.906 280.550 470.800 300.706 550.577 600.624 460.544 480.596 570.857 580.879 420.880 240.750 430.992 530.658 44
DD-UNet+Group0.764 481.000 10.897 600.837 290.753 440.830 520.459 560.824 220.699 570.629 480.653 390.438 600.650 341.000 10.880 400.858 420.690 591.000 10.650 46
H. Liu, R. Liu, K. Yang, J. Zhang, K. Peng, R. Stiefelhagen: HIDA: Towards Holistic Indoor Understanding for the Visually Impaired via Semantic Instance Segmentation with a Wearable Solid-State LiDAR Sensor. ICCVW 2021
INS-Conv-instance0.762 491.000 10.923 500.765 540.785 330.905 290.600 370.655 540.646 620.683 390.647 420.530 510.650 341.000 10.824 500.830 540.693 570.944 610.644 47
Dyco3Dcopyleft0.761 501.000 10.935 420.893 90.752 460.863 420.600 370.588 610.742 520.641 460.633 450.546 470.550 590.857 580.789 590.853 450.762 390.987 550.699 40
Tong He, Chunhua Shen, Anton van den Hengel: DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution. CVPR 2021
OccuSeg+instance0.742 511.000 10.923 500.785 460.745 480.867 380.557 430.578 640.729 530.670 430.644 430.488 550.577 581.000 10.794 570.830 540.620 671.000 10.550 60
Lei Han, Tian Zheng, Lan Xu, Lu Fang: OccuSeg: Occupancy-aware 3D Instance Segmentation. CVPR 2020
RWSeg0.739 521.000 10.899 590.759 560.753 450.823 540.282 660.691 510.658 600.582 590.594 520.547 460.628 491.000 10.795 560.868 360.728 481.000 10.692 41
3D-MPA0.737 531.000 10.933 430.785 460.794 320.831 510.279 680.588 610.695 580.616 490.559 580.556 440.650 341.000 10.809 540.875 280.696 551.000 10.608 56
Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner: 3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation. CVPR 2020
MTML0.731 541.000 10.992 140.779 520.609 660.746 640.308 650.867 90.601 660.607 520.539 610.519 530.550 591.000 10.824 500.869 350.729 471.000 10.616 52
Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald: 3D Instance Segmentation via Multi-task Metric Learning. ICCV 2019 [oral]
OSIS0.725 551.000 10.885 630.653 680.657 630.801 570.576 410.695 500.828 340.698 370.534 620.457 590.500 660.857 580.831 490.841 510.627 651.000 10.619 51
SSEN0.724 561.000 10.926 480.781 510.661 610.845 480.596 400.529 670.764 510.653 450.489 680.461 580.500 660.859 570.765 600.872 310.761 401.000 10.577 58
Dongsu Zhang, Junha Chun, Sang Kyun Cha, Young Min Kim: Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning. arXiv
NeuralBF0.718 571.000 10.945 390.901 80.754 430.817 550.460 540.700 490.772 460.688 380.568 560.000 790.500 660.981 480.606 700.872 300.740 451.000 10.614 53
Weiwei Sun, Daniel Rebain, Renjie Liao, Vladimir Tankovich, Soroosh Yazdani, Kwang Moo Yi, Andrea Tagliasacchi: NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds. WACV 2023
Sparse R-CNN0.714 581.000 10.926 490.694 630.699 590.890 330.636 350.516 680.693 590.743 300.588 530.369 640.601 510.594 720.800 550.886 210.676 600.986 560.546 61
SALoss-ResNet0.695 591.000 10.855 650.579 730.589 680.735 670.484 530.588 610.856 300.634 470.571 550.298 650.500 661.000 10.824 500.818 570.702 530.935 680.545 62
Zhidong Liang, Ming Yang, Hao Li, Chunxiang Wang: 3D Instance Embedding Learning With a Structure-Aware Loss Function for Point Cloud Segmentation. IEEE Robotics and Automation Letters (IROS 2020)
PanopticFusion-inst0.693 601.000 10.852 660.655 670.616 650.788 590.334 630.763 370.771 470.457 730.555 590.652 340.518 630.857 580.765 600.732 730.631 630.944 610.577 59
Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. IROS 2019 (to appear)
Occipital-SCS0.688 611.000 10.913 540.730 610.737 510.743 660.442 570.855 150.655 610.546 640.546 600.263 670.508 650.889 560.568 710.771 700.705 520.889 710.625 50
3D-BoNet0.687 621.000 10.887 620.836 310.587 690.643 760.550 470.620 570.724 540.522 680.501 660.243 680.512 641.000 10.751 620.807 630.661 620.909 700.612 55
Bo Yang, Jianan Wang, Ronald Clark, Qingyong Hu, Sen Wang, Andrew Markham, Niki Trigoni: Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds. NeurIPS 2019 Spotlight
ClickSeg_Instance0.685 631.000 10.818 690.600 710.715 560.795 580.557 430.533 660.591 680.601 530.519 640.429 620.638 470.938 530.706 650.817 590.624 660.944 610.502 66
PCJC0.684 641.000 10.895 610.757 570.659 620.862 430.189 750.739 440.606 650.712 360.581 540.515 540.650 340.857 580.357 760.785 680.631 640.889 710.635 49
SPG_WSIS0.678 651.000 10.880 640.836 310.701 580.727 680.273 700.607 590.706 560.541 660.515 650.174 710.600 520.857 580.716 640.846 500.711 511.000 10.506 65
One_Thing_One_Clickpermissive0.675 661.000 10.823 680.782 490.621 640.766 610.211 720.736 450.560 700.586 570.522 630.636 380.453 700.641 700.853 450.850 480.694 560.997 440.411 71
Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu: One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation. CVPR 2021
SegGroup_inspermissive0.637 671.000 10.923 520.593 720.561 700.746 650.143 770.504 690.766 490.485 710.442 690.372 630.530 620.714 670.815 530.775 690.673 611.000 10.431 70
An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou: SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation. TIP 2022
MASCpermissive0.615 680.711 750.802 700.540 740.757 420.777 600.029 780.577 650.588 690.521 690.600 500.436 610.534 610.697 680.616 690.838 520.526 690.980 580.534 63
Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
UNet-backbone0.605 691.000 10.909 550.764 550.603 670.704 700.415 590.301 740.548 710.461 720.394 700.267 660.386 720.857 580.649 680.817 580.504 710.959 590.356 74
3D-SISpermissive0.558 701.000 10.773 710.614 700.503 730.691 720.200 730.412 700.498 740.546 650.311 750.103 750.600 520.857 580.382 730.799 660.445 770.938 670.371 72
Ji Hou, Angela Dai, Matthias Niessner: 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. CVPR 2019
R-PointNet0.544 710.500 780.655 770.661 660.663 600.765 620.432 580.214 770.612 640.584 580.499 670.204 700.286 760.429 750.655 670.650 780.539 680.950 600.499 67
Hier3Dcopyleft0.540 721.000 10.727 720.626 690.467 760.693 710.200 730.412 700.480 750.528 670.318 740.077 780.600 520.688 690.382 730.768 710.472 730.941 650.350 75
Tan: HCFS3D: Hierarchical Coupled Feature Selection Network for 3D Semantic and Instance Segmentation.
Region-18class0.497 730.250 800.902 560.689 640.540 710.747 630.276 690.610 580.268 790.489 700.348 710.000 790.243 790.220 780.663 660.814 600.459 750.928 690.496 68
Sem_Recon_ins0.484 740.764 740.608 790.470 760.521 720.637 770.311 640.218 760.348 780.365 770.223 760.222 690.258 770.629 710.734 630.596 790.509 700.858 750.444 69
tmp0.474 751.000 10.727 720.433 780.481 750.673 740.022 800.380 720.517 730.436 750.338 730.128 730.343 740.429 750.291 780.728 740.473 720.833 760.300 77
SemRegionNet-20cls0.470 761.000 10.727 720.447 770.481 740.678 730.024 790.380 720.518 720.440 740.339 720.128 730.350 730.429 750.212 790.711 750.465 740.833 760.290 78
ASIS0.422 770.333 790.707 750.676 650.401 770.650 750.350 620.177 780.594 670.376 760.202 770.077 770.404 710.571 730.197 800.674 770.447 760.500 790.260 79
3D-BEVIS0.401 780.667 760.687 760.419 790.137 800.587 780.188 760.235 750.359 770.211 790.093 800.080 760.311 750.571 730.382 730.754 720.300 790.874 740.357 73
Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation.
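Sgpn_scannet | | 0.390 (79) | 0.556 (77) | 0.636 (78) | 0.493 (75) | 0.353 (78) | 0.539 (79) | 0.271 (71) | 0.160 (79) | 0.450 (76) | 0.359 (78) | 0.178 (78) | 0.146 (72) | 0.250 (78) | 0.143 (79) | 0.347 (77) | 0.698 (76) | 0.436 (78) | 0.667 (78) | 0.331 (76)
MaskRCNN 2d->3d Proj | | 0.261 (80) | 0.903 (73) | 0.081 (80) | 0.008 (80) | 0.233 (79) | 0.175 (80) | 0.280 (67) | 0.106 (80) | 0.150 (80) | 0.203 (80) | 0.175 (79) | 0.480 (56) | 0.218 (80) | 0.143 (79) | 0.542 (72) | 0.404 (80) | 0.153 (80) | 0.393 (80) | 0.049 (80)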


This table lists the benchmark results for the 2D semantic label scenario.


Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
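Virtual MVFusion (R) | | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (2) | 0.512 (1) | 0.422 (19) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1)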
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
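BPNet_2D | copyleft | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (3) | 0.481 (2) | 0.451 (15) | 0.769 (5) | 0.656 (3) | 0.567 (4) | 0.931 (3) | 0.395 (6) | 0.390 (6) | 0.700 (4) | 0.534 (4) | 0.689 (11) | 0.770 (2) | 0.574 (3) | 0.865 (11) | 0.831 (3) | 0.675 (6)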
Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia and Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
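MVF-GNN(2D) | | 0.636 (3) | 0.606 (16) | 0.794 (4) | 0.434 (17) | 0.688 (1) | 0.337 (8) | 0.464 (14) | 0.798 (4) | 0.632 (5) | 0.589 (3) | 0.908 (9) | 0.420 (2) | 0.329 (14) | 0.743 (2) | 0.594 (2) | 0.738 (2) | 0.676 (5) | 0.527 (4) | 0.906 (2) | 0.818 (6) | 0.715 (3)
CU-Hybrid-2D Net | | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (25) | 0.648 (4) | 0.463 (3) | 0.549 (2) | 0.742 (9) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (7) | 0.684 (8) | 0.381 (20) | 0.732 (3) | 0.723 (3) | 0.599 (2) | 0.827 (18) | 0.851 (2) | 0.634 (9)
DVEFormer | | 0.626 (5) | 0.616 (12) | 0.764 (6) | 0.690 (5) | 0.583 (11) | 0.322 (14) | 0.540 (3) | 0.809 (3) | 0.593 (7) | 0.502 (12) | 0.900 (14) | 0.374 (9) | 0.433 (3) | 0.660 (9) | 0.528 (5) | 0.665 (19) | 0.663 (6) | 0.491 (9) | 0.871 (10) | 0.810 (9) | 0.705 (4)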
Fischedick, S., Seichter, D., Stephan, B., Schmidt, R., Gross, H.-M.: DVEFormer: Efficient Prediction of Dense Visual Embeddings via Distillation and RGB-D Transformers. IROS 2025
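CMX | | 0.613 (6) | 0.681 (9) | 0.725 (13) | 0.502 (13) | 0.634 (6) | 0.297 (19) | 0.478 (12) | 0.830 (2) | 0.651 (4) | 0.537 (7) | 0.924 (4) | 0.375 (7) | 0.315 (16) | 0.686 (7) | 0.451 (15) | 0.714 (5) | 0.543 (23) | 0.504 (6) | 0.894 (7) | 0.823 (5) | 0.688 (5)
DMMF_3d | | 0.605 (7) | 0.651 (10) | 0.744 (11) | 0.782 (3) | 0.637 (5) | 0.387 (4) | 0.536 (5) | 0.732 (10) | 0.590 (8) | 0.540 (6) | 0.856 (23) | 0.359 (12) | 0.306 (17) | 0.596 (16) | 0.539 (3) | 0.627 (22) | 0.706 (4) | 0.497 (8) | 0.785 (23) | 0.757 (21) | 0.476 (24)
EMSANet | | 0.600 (8) | 0.716 (4) | 0.746 (10) | 0.395 (20) | 0.614 (9) | 0.382 (5) | 0.523 (6) | 0.713 (13) | 0.571 (12) | 0.503 (10) | 0.922 (7) | 0.404 (5) | 0.397 (5) | 0.655 (10) | 0.400 (17) | 0.626 (23) | 0.663 (6) | 0.469 (14) | 0.900 (4) | 0.827 (4) | 0.577 (16)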
Daniel Seichter, Söhnke Fischedick, Mona Köhler, Horst-Michael Gross: EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments. IJCNN 2022
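MCA-Net | | 0.595 (9) | 0.533 (22) | 0.756 (9) | 0.746 (4) | 0.590 (10) | 0.334 (10) | 0.506 (9) | 0.670 (17) | 0.587 (9) | 0.500 (13) | 0.905 (11) | 0.366 (11) | 0.352 (10) | 0.601 (15) | 0.506 (9) | 0.669 (17) | 0.648 (10) | 0.501 (7) | 0.839 (17) | 0.769 (17) | 0.516 (23)
RFBNet | | 0.592 (10) | 0.616 (12) | 0.758 (8) | 0.659 (6) | 0.581 (12) | 0.330 (11) | 0.469 (13) | 0.655 (20) | 0.543 (15) | 0.524 (8) | 0.924 (4) | 0.355 (14) | 0.336 (12) | 0.572 (19) | 0.479 (11) | 0.671 (15) | 0.648 (10) | 0.480 (11) | 0.814 (21) | 0.814 (7) | 0.614 (12)
FAN_NV_RVC | | 0.586 (11) | 0.510 (23) | 0.764 (6) | 0.079 (28) | 0.620 (8) | 0.330 (11) | 0.494 (10) | 0.753 (7) | 0.573 (10) | 0.556 (5) | 0.884 (18) | 0.405 (4) | 0.303 (18) | 0.718 (3) | 0.452 (14) | 0.672 (14) | 0.658 (8) | 0.509 (5) | 0.898 (5) | 0.813 (8) | 0.727 (2)
WSGFormer | | 0.585 (12) | 0.706 (5) | 0.708 (18) | 0.434 (17) | 0.574 (14) | 0.283 (22) | 0.538 (4) | 0.759 (6) | 0.542 (17) | 0.482 (17) | 0.924 (4) | 0.351 (16) | 0.333 (13) | 0.614 (12) | 0.393 (18) | 0.692 (10) | 0.551 (22) | 0.461 (15) | 0.874 (9) | 0.809 (10) | 0.673 (7)
DCRedNet | | 0.583 (13) | 0.682 (8) | 0.723 (14) | 0.542 (12) | 0.510 (22) | 0.310 (16) | 0.451 (15) | 0.668 (18) | 0.549 (14) | 0.520 (9) | 0.920 (8) | 0.375 (7) | 0.446 (2) | 0.528 (22) | 0.417 (16) | 0.670 (16) | 0.577 (19) | 0.478 (12) | 0.862 (12) | 0.806 (11) | 0.628 (11)
MIX6D_RVC | | 0.582 (14) | 0.695 (6) | 0.687 (19) | 0.225 (23) | 0.632 (7) | 0.328 (13) | 0.550 (1) | 0.748 (8) | 0.623 (6) | 0.494 (16) | 0.890 (16) | 0.350 (17) | 0.254 (25) | 0.688 (6) | 0.454 (13) | 0.716 (4) | 0.597 (18) | 0.489 (10) | 0.881 (8) | 0.768 (18) | 0.575 (17)
SSMA | copyleft | 0.577 (15) | 0.695 (6) | 0.716 (16) | 0.439 (15) | 0.563 (16) | 0.314 (15) | 0.444 (17) | 0.719 (11) | 0.551 (13) | 0.503 (10) | 0.887 (17) | 0.346 (18) | 0.348 (11) | 0.603 (14) | 0.353 (22) | 0.709 (6) | 0.600 (16) | 0.457 (16) | 0.901 (3) | 0.786 (13) | 0.599 (15)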
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
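DMMF | | 0.567 (16) | 0.623 (11) | 0.767 (5) | 0.238 (22) | 0.571 (15) | 0.347 (6) | 0.413 (21) | 0.719 (11) | 0.472 (22) | 0.418 (24) | 0.895 (15) | 0.357 (13) | 0.260 (24) | 0.696 (5) | 0.523 (8) | 0.666 (18) | 0.642 (12) | 0.437 (20) | 0.895 (6) | 0.793 (12) | 0.603 (14)
UNIV_CNP_RVC_UE | | 0.566 (17) | 0.569 (21) | 0.686 (21) | 0.435 (16) | 0.524 (19) | 0.294 (20) | 0.421 (20) | 0.712 (14) | 0.543 (15) | 0.463 (19) | 0.872 (19) | 0.320 (19) | 0.363 (9) | 0.611 (13) | 0.477 (12) | 0.686 (12) | 0.627 (13) | 0.443 (19) | 0.862 (12) | 0.775 (16) | 0.639 (8)
EMSAFormer | | 0.564 (18) | 0.581 (18) | 0.736 (12) | 0.564 (11) | 0.546 (18) | 0.219 (25) | 0.517 (7) | 0.675 (16) | 0.486 (21) | 0.427 (23) | 0.904 (12) | 0.352 (15) | 0.320 (15) | 0.589 (17) | 0.528 (5) | 0.708 (7) | 0.464 (26) | 0.413 (24) | 0.847 (16) | 0.786 (13) | 0.611 (13)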
Söhnke Benedikt Fischedick, Daniel Seichter, Robin Schmidt, Leonard Rabes, and Horst-Michael Gross: Efficient Multi-Task Scene Analysis with RGB-D Transformers. IJCNN 2023
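SN_RN152pyrx8_RVC | copyleft | 0.546 (19) | 0.572 (19) | 0.663 (23) | 0.638 (8) | 0.518 (20) | 0.298 (18) | 0.366 (26) | 0.633 (23) | 0.510 (19) | 0.446 (21) | 0.864 (21) | 0.296 (22) | 0.267 (21) | 0.542 (21) | 0.346 (23) | 0.704 (8) | 0.575 (20) | 0.431 (21) | 0.853 (15) | 0.766 (19) | 0.630 (10)
UDSSEG_RVC | | 0.545 (20) | 0.610 (15) | 0.661 (24) | 0.588 (9) | 0.556 (17) | 0.268 (23) | 0.482 (11) | 0.642 (22) | 0.572 (11) | 0.475 (18) | 0.836 (25) | 0.312 (20) | 0.367 (8) | 0.630 (11) | 0.189 (25) | 0.639 (21) | 0.495 (25) | 0.452 (17) | 0.826 (19) | 0.756 (22) | 0.541 (19)
segfomer with 6d | | 0.542 (21) | 0.594 (17) | 0.687 (19) | 0.146 (26) | 0.579 (13) | 0.308 (17) | 0.515 (8) | 0.703 (15) | 0.472 (22) | 0.498 (14) | 0.868 (20) | 0.369 (10) | 0.282 (19) | 0.589 (17) | 0.390 (19) | 0.701 (9) | 0.556 (21) | 0.416 (23) | 0.860 (14) | 0.759 (20) | 0.539 (21)
FuseNet | permissive | 0.535 (22) | 0.570 (20) | 0.681 (22) | 0.182 (24) | 0.512 (21) | 0.290 (21) | 0.431 (18) | 0.659 (19) | 0.504 (20) | 0.495 (15) | 0.903 (13) | 0.308 (21) | 0.428 (4) | 0.523 (23) | 0.365 (21) | 0.676 (13) | 0.621 (15) | 0.470 (13) | 0.762 (24) | 0.779 (15) | 0.541 (19)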
Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
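AdapNet++ | copyleft | 0.503 (23) | 0.613 (14) | 0.722 (15) | 0.418 (19) | 0.358 (28) | 0.337 (8) | 0.370 (25) | 0.479 (26) | 0.443 (24) | 0.368 (26) | 0.907 (10) | 0.207 (25) | 0.213 (27) | 0.464 (26) | 0.525 (7) | 0.618 (24) | 0.657 (9) | 0.450 (18) | 0.788 (22) | 0.721 (25) | 0.408 (27)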
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
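3DMV (2d proj) | | 0.498 (24) | 0.481 (26) | 0.612 (25) | 0.579 (10) | 0.456 (24) | 0.343 (7) | 0.384 (23) | 0.623 (24) | 0.525 (18) | 0.381 (25) | 0.845 (24) | 0.254 (24) | 0.264 (23) | 0.557 (20) | 0.182 (26) | 0.581 (26) | 0.598 (17) | 0.429 (22) | 0.760 (25) | 0.661 (27) | 0.446 (26)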
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
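MSeg1080_RVC | permissive | 0.485 (25) | 0.505 (24) | 0.709 (17) | 0.092 (27) | 0.427 (25) | 0.241 (24) | 0.411 (22) | 0.654 (21) | 0.385 (28) | 0.457 (20) | 0.861 (22) | 0.053 (28) | 0.279 (20) | 0.503 (24) | 0.481 (10) | 0.645 (20) | 0.626 (14) | 0.365 (26) | 0.748 (26) | 0.725 (24) | 0.529 (22)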
John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
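ILC-PSPNet | | 0.475 (26) | 0.490 (25) | 0.581 (26) | 0.289 (21) | 0.507 (23) | 0.067 (28) | 0.379 (24) | 0.610 (25) | 0.417 (26) | 0.435 (22) | 0.822 (27) | 0.278 (23) | 0.267 (21) | 0.503 (24) | 0.228 (24) | 0.616 (25) | 0.533 (24) | 0.375 (25) | 0.820 (20) | 0.729 (23) | 0.560 (18)
Enet (reimpl) | | 0.376 (27) | 0.264 (28) | 0.452 (28) | 0.452 (14) | 0.365 (26) | 0.181 (26) | 0.143 (28) | 0.456 (27) | 0.409 (27) | 0.346 (27) | 0.769 (28) | 0.164 (26) | 0.218 (26) | 0.359 (27) | 0.123 (28) | 0.403 (28) | 0.381 (28) | 0.313 (28) | 0.571 (27) | 0.685 (26) | 0.472 (25)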
Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
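ScanNet (2d proj) | permissive | 0.330 (28) | 0.293 (27) | 0.521 (27) | 0.657 (7) | 0.361 (27) | 0.161 (27) | 0.250 (27) | 0.004 (28) | 0.440 (25) | 0.183 (28) | 0.836 (25) | 0.125 (27) | 0.060 (28) | 0.319 (28) | 0.132 (27) | 0.417 (27) | 0.412 (27) | 0.344 (27) | 0.541 (28) | 0.427 (28) | 0.109 (28)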
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17


This table lists the benchmark results for the 2D semantic instance scenario.




Method | Info | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window
EMSANet (Instance) | | 0.241 (1) | 0.401 (1) | 0.439 (1) | 0.085 (1) | 0.242 (1) | 0.220 (1) | 0.081 (1) | 0.289 (2) | 0.117 (2) | 0.121 (1) | 0.182 (1) | 0.126 (1) | 0.346 (1) | 0.181 (2) | 0.181 (2) | 0.358 (1) | 0.156 (1) | 0.675 (2) | 0.131 (1)
Daniel Seichter, Söhnke Fischedick, Mona Köhler, Horst-Michael Gross: EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments. IJCNN 2022
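UniDet_RVC | | 0.205 (2) | 0.381 (2) | 0.323 (3) | 0.037 (3) | 0.226 (3) | 0.177 (3) | 0.063 (2) | 0.277 (3) | 0.120 (1) | 0.067 (3) | 0.131 (3) | 0.074 (3) | 0.317 (2) | 0.080 (3) | 0.235 (1) | 0.289 (3) | 0.141 (3) | 0.678 (1) | 0.080 (3)
FKNet | | 0.204 (3) | 0.334 (3) | 0.358 (2) | 0.038 (2) | 0.234 (2) | 0.184 (2) | 0.025 (3) | 0.318 (1) | 0.042 (4) | 0.088 (2) | 0.141 (2) | 0.053 (4) | 0.300 (3) | 0.207 (1) | 0.171 (3) | 0.292 (2) | 0.149 (2) | 0.636 (3) | 0.109 (2)
MaskRCNN_ScanNet | permissive | 0.119 (4) | 0.129 (4) | 0.212 (4) | 0.002 (4) | 0.112 (4) | 0.148 (4) | 0.014 (4) | 0.205 (4) | 0.044 (3) | 0.066 (4) | 0.078 (4) | 0.095 (2) | 0.142 (4) | 0.030 (4) | 0.128 (4) | 0.139 (4) | 0.080 (4) | 0.459 (4) | 0.057 (4)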
Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17


This table lists the benchmark results for the scene type classification scenario.




Method | Info | avg recall | apartment | bathroom | bedroom / hotel | bookstore / library | conference room | copy/mail room | hallway | kitchen | laundry room | living room / lounge | misc | office | storage / basement / garage
LAST-PCL-type | | 0.780 (1) | 0.250 (3) | 1.000 (1) | 1.000 (1) | 1.000 (1) | 1.000 (1) | 1.000 (1) | 0.500 (2) | 1.000 (1) | 0.500 (2) | 0.889 (1) | 0.000 (2) | 1.000 (1) | 1.000 (1)
Yanmin Wu, Qiankun Gao, Renrui Zhang, and Jian Zhang: Language-Assisted 3D Scene Understanding. arxiv23.12
multi-task | permissive | 0.700 (2) | 0.500 (1) | 1.000 (1) | 0.882 (3) | 0.500 (3) | 1.000 (1) | 1.000 (1) | 0.500 (2) | 1.000 (1) | 1.000 (1) | 0.778 (2) | 0.000 (2) | 0.938 (2) | 0.000 (3)
Shengyu Huang, Mikhail Usvyatsov, Konrad Schindler: Indoor Scene Recognition in 3D. IROS 2020
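3DASPP-SCE | | 0.691 (3) | 0.500 (1) | 0.938 (3) | 0.824 (4) | 1.000 (1) | 1.000 (1) | 0.500 (3) | 1.000 (1) | 0.857 (3) | 0.500 (2) | 0.556 (4) | 0.000 (2) | 0.812 (3) | 0.500 (2)
SE-ResNeXt-SSMA | | 0.498 (4) | 0.000 (5) | 0.812 (4) | 0.941 (2) | 0.500 (3) | 0.500 (4) | 0.500 (3) | 0.500 (2) | 0.429 (5) | 0.500 (2) | 0.667 (3) | 0.500 (1) | 0.625 (4) | 0.000 (3)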
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. arXiv
resnet50_scannet | | 0.353 (5) | 0.250 (3) | 0.812 (4) | 0.529 (5) | 0.500 (3) | 0.500 (4) | 0.000 (5) | 0.500 (2) | 0.571 (4) | 0.000 (5) | 0.556 (4) | 0.000 (2) | 0.375 (5) | 0.000 (3)
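
The page does not say how the avg recall column is computed. The reported values look consistent with an unweighted mean of the 13 per-class recalls, and the sketch below checks that assumption for two rows of the scene type table. The names `per_class_recall`, `reported_avg`, and `average_recall` are illustrative only and are not part of any ScanNet tooling.

```python
from statistics import mean

# Per-class recalls copied from the scene type table, in column order
# (apartment, bathroom, ..., storage / basement / garage).
per_class_recall = {
    "LAST-PCL-type": [0.250, 1.000, 1.000, 1.000, 1.000, 1.000, 0.500,
                      1.000, 0.500, 0.889, 0.000, 1.000, 1.000],
    "3DASPP-SCE": [0.500, 0.938, 0.824, 1.000, 1.000, 0.500, 1.000,
                   0.857, 0.500, 0.556, 0.000, 0.812, 0.500],
}

# Values listed in the avg recall column for the same methods.
reported_avg = {"LAST-PCL-type": 0.780, "3DASPP-SCE": 0.691}


def average_recall(recalls):
    """Unweighted mean over classes, rounded to three decimals.

    Assumption: every class counts equally, regardless of how many
    test scenes it has. The benchmark page does not confirm this.
    """
    return round(mean(recalls), 3)


for method, recalls in per_class_recall.items():
    print(method, average_recall(recalls), reported_avg[method])
```

If the assumption holds, a class with very few test scenes (for example misc, where most methods score 0.000) moves the average as much as a frequent class such as office.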