The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
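The metric described above can be illustrated with a minimal sketch. It is not the official evaluation script: the function names, the input layout (one `(confidence, matched_gt_index, overlap)` tuple per predicted instance), the `>=` comparison against the threshold, and the interpolated area-under-curve computation are all assumptions made for illustration. It shows how a second prediction of an already matched ground truth instance is counted as a false positive, and how per-threshold AP values can be averaged.

```python
from typing import List, Sequence, Tuple

# Hypothetical input layout: (confidence, matched ground-truth index or -1, overlap)
Prediction = Tuple[float, int, float]


def average_precision(predictions: List[Prediction], num_gt: int, threshold: float) -> float:
    """AP for one class at one overlap threshold (illustrative sketch)."""
    if num_gt == 0:
        return float("nan")

    claimed = set()  # ground-truth instances already matched by a prediction
    tp_flags = []
    # Visit predictions from most to least confident.
    for _, gt_idx, overlap in sorted(predictions, key=lambda p: -p[0]):
        if gt_idx >= 0 and overlap >= threshold and gt_idx not in claimed:
            claimed.add(gt_idx)
            tp_flags.append(True)
        else:
            # Too little overlap, or a duplicate of a matched instance:
            # both count as false positives.
            tp_flags.append(False)

    # Precision and recall after each prediction.
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)

    # Make precision non-increasing as recall grows (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def mean_ap(predictions: List[Prediction], num_gt: int, thresholds: Sequence[float]) -> float:
    """Average the per-threshold AP values over a list of overlap thresholds."""
    return sum(average_precision(predictions, num_gt, t) for t in thresholds) / len(thresholds)


# Two ground-truth instances; the second prediction duplicates instance 0.
preds = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
ap_50 = average_precision(preds, num_gt=2, threshold=0.5)
ap_no_duplicate = average_precision([preds[0], preds[2]], num_gt=2, threshold=0.5)

# Whether the range [0.5:0.95:0.05] includes 0.95 is an assumption here.
thresholds = [0.5 + 0.05 * i for i in range(10)]
ap_range = mean_ap(preds, num_gt=2, thresholds=thresholds)
```

In this toy case the duplicate prediction lowers the AP at overlap 0.5 compared with the same predictions without the duplicate, which is the penalty the note above refers to.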



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives the score followed by the method's rank in that column.




Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
Volt-SPFormer | permissive | 0.475 1 | 0.630 1 | 0.451 2 | 0.314 1 | 0.772 2 | 0.068 4 | 0.500 1 | 0.000 2 | 0.125 1 | 0.000 1 | 0.107 4 | 0.524 5 | 0.742 1 | 1.000 1 | 0.994 1 | 0.400 5 | 0.500 3 | 0.019 4 | 1.000 1 | 0.410 1 | 0.667 2 | 0.500 1 | 0.423 2 | 0.811 1 | 0.412 3 | 0.250 2 | 0.281 2 | 0.000 1 | 0.000 2 | 0.519 4 | 0.541 3 | 1.000 1 | 0.000 1 | 0.331 3 | 0.000 3 | 0.841 2 | 0.638 1 | 0.806 5 | 0.000 3 | 0.014 4 | 0.241 3 | 0.245 2 | 0.333 4 | 0.028 3 | 0.000 1 | 0.817 2 | 0.825 1 | 0.250 5 | 0.799 6 | 1.000 1 | 0.847 4 | 0.129 1 | 0.294 3 | 0.702 2 | 0.000 4 | 0.304 2 | 0.755 1 | 0.000 6 | 0.750 1 | 0.400 1 | 0.923 1 | 0.482 2 | 0.900 1 | 0.208 8 | 0.319 1 | 0.750 1 | 0.823 2 | 0.000 3 | 0.510 2 | 0.200 3 | 0.000 2 | 0.500 3 | 0.500 1 | 0.300 3 | 0.903 1 | 1.000 1 | 0.564 4 | 0.372 4 | 1.000 1 | 0.787 2 | 0.449 4 | 0.250 1 | 0.000 1 | 0.600 1 | 0.026 3 | 0.375 1 | 0.000 4 | 1.000 1 | 0.455 3 | 0.400 2 | 0.878 2 | 0.641 3 | 0.612 1 | 0.894 4 | 0.000 4 | 0.000 1 | 0.800 4 | 0.078 4 | 0.008 4 | 0.500 2 | 0.056 1 | 0.278 2 | 1.000 1 | 0.797 3 | 0.500 2 | 0.585 2 | 1.000 1 | 0.869 1 | 0.735 2 | 0.056 4 | 0.768 2 | 0.043 5 | 0.714 1 | 0.000 1 | 0.500 1 | 0.250 1 | 0.683 4 | 0.000 3 | 1.000 1 | 0.000 1 | 0.853 1 | 0.944 2 | 0.255 2 | 0.923 2 | 1.000 1 | 0.002 3 | 0.224 3 | 0.499 3 | 0.250 2 | 1.000 1 | 1.000 1 | 0.857 2 | 1.000 1 | 0.613 1 | 0.000 3 | 0.818 1 | 0.857 2 | 0.343 2 | 0.000 5 | 0.209 3 | 0.629 8 | 0.025 2 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.725 3 | 0.000 2 | 0.000 1 | 0.716 1 | 0.666 4 | 0.651 3 | 1.000 1 | 0.500 2 | 0.990 6 | 0.565 2 | 0.750 3 | 0.699 2 | 0.167 7 | 0.930 1 | 0.000 1 | 0.019 2 | 0.784 2 | 1.000 1 | 1.000 1 | 0.000 1 | 0.099 4 | 1.000 1 | 0.000 1 | 0.472 3 | 0.764 1 | 0.546 2 | 0.621 2 | 0.452 3
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
CompetitorFormer-200 | | 0.415 3 | 0.574 3 | 0.370 4 | 0.274 3 | 0.632 3 | 0.054 6 | 0.500 1 | 0.083 1 | 0.000 4 | 0.000 1 | 0.260 2 | 0.541 4 | 0.410 4 | 0.903 6 | 0.987 2 | 0.885 1 | 0.500 3 | 0.064 1 | 0.250 4 | 0.378 2 | 1.000 1 | 0.243 5 | 0.428 1 | 0.599 6 | 0.218 5 | 1.000 1 | 0.196 4 | 0.000 1 | 0.000 2 | 0.587 3 | 0.318 6 | 1.000 1 | 0.000 1 | 0.500 1 | 0.000 3 | 0.845 1 | 0.232 4 | 0.885 1 | 0.005 2 | 0.015 3 | 0.216 4 | 0.183 5 | 0.399 3 | 0.082 1 | 0.000 1 | 0.724 4 | 0.806 3 | 0.500 2 | 0.869 2 | 1.000 1 | 0.779 7 | 0.095 3 | 0.443 1 | 0.685 4 | 0.021 3 | 0.269 3 | 0.704 3 | 0.083 5 | 0.467 7 | 0.400 1 | 0.846 2 | 0.551 1 | 0.663 3 | 0.261 7 | 0.103 5 | 0.482 3 | 0.758 3 | 0.025 2 | 0.018 6 | 0.400 2 | 0.000 2 | 0.677 2 | 0.500 1 | 0.207 5 | 0.881 2 | 1.000 1 | 0.600 3 | 0.648 2 | 0.144 8 | 0.641 3 | 0.452 3 | 0.000 2 | 0.000 1 | 0.327 5 | 0.142 2 | 0.209 5 | 0.000 4 | 0.083 2 | 0.215 6 | 0.317 5 | 0.748 5 | 0.508 4 | 0.484 3 | 0.957 1 | 0.000 4 | 0.000 1 | 0.833 3 | 0.132 1 | 0.400 1 | 0.663 1 | 0.015 3 | 0.103 3 | 1.000 1 | 0.759 4 | 0.125 3 | 0.286 4 | 0.500 2 | 0.830 2 | 0.651 3 | 0.089 2 | 0.540 7 | 0.380 3 | 0.581 2 | 0.000 1 | 0.500 1 | 0.000 3 | 0.745 2 | 0.050 2 | 1.000 1 | 0.000 1 | 0.622 6 | 0.694 5 | 0.213 4 | 0.870 5 | 0.125 5 | 0.000 4 | 0.205 4 | 0.562 2 | 0.000 7 | 0.933 7 | 1.000 1 | 0.820 5 | 0.250 3 | 0.347 6 | 0.000 3 | 0.731 4 | 0.877 1 | 0.289 3 | 0.160 2 | 0.186 4 | 0.684 5 | 0.008 4 | 0.538 3 | 0.000 2 | 0.000 3 | 0.000 2 | 0.700 4 | 0.056 1 | 0.000 1 | 0.491 4 | 0.584 6 | 0.602 4 | 0.489 3 | 0.565 1 | 1.000 1 | 0.311 7 | 0.750 3 | 0.583 4 | 0.292 4 | 0.832 5 | 0.000 1 | 0.157 1 | 0.780 3 | 1.000 1 | 0.625 4 | 0.000 1 | 0.131 3 | 0.794 3 | 0.000 1 | 0.667 1 | 0.071 8 | 0.545 3 | 0.682 1 | 0.462 2
DINO3D-Scannet200 | copyleft | 0.454 2 | 0.587 2 | 0.453 1 | 0.296 2 | 0.851 1 | 0.200 1 | 0.500 1 | 0.000 2 | 0.042 3 | 0.000 1 | 0.378 1 | 0.545 3 | 0.729 2 | 1.000 1 | 0.981 3 | 0.355 9 | 1.000 1 | 0.046 2 | 0.000 6 | 0.248 3 | 0.000 4 | 0.494 2 | 0.381 3 | 0.586 8 | 0.496 2 | 0.250 2 | 0.409 1 | 0.000 1 | 0.000 2 | 0.714 1 | 0.572 1 | 1.000 1 | 0.000 1 | 0.250 4 | 0.050 2 | 0.793 3 | 0.436 3 | 0.871 2 | 0.000 3 | 0.216 1 | 0.284 1 | 0.290 1 | 0.083 6 | 0.000 4 | 0.000 1 | 0.764 3 | 0.716 8 | 0.500 2 | 0.842 3 | 1.000 1 | 0.891 2 | 0.096 2 | 0.361 2 | 0.690 3 | 0.000 4 | 0.595 1 | 0.753 2 | 0.708 3 | 0.750 1 | 0.400 1 | 0.845 3 | 0.475 3 | 0.728 2 | 0.750 1 | 0.214 2 | 0.683 2 | 0.743 4 | 0.000 3 | 0.400 4 | 0.200 3 | 0.500 1 | 0.944 1 | 0.125 4 | 0.327 1 | 0.823 3 | 0.792 5 | 0.602 2 | 0.662 1 | 0.777 2 | 0.803 1 | 0.675 1 | 0.000 2 | 0.000 1 | 0.200 6 | 0.298 1 | 0.324 2 | 0.000 4 | 0.000 3 | 0.000 8 | 0.800 1 | 0.824 4 | 0.750 1 | 0.507 2 | 0.937 2 | 0.000 4 | 0.000 1 | 0.779 6 | 0.116 2 | 0.001 6 | 0.417 5 | 0.000 4 | 0.014 5 | 1.000 1 | 0.816 2 | 0.548 1 | 0.600 1 | 0.500 2 | 0.771 3 | 0.773 1 | 0.117 1 | 0.944 1 | 0.764 1 | 0.571 3 | 0.000 1 | 0.250 5 | 0.000 3 | 1.000 1 | 0.063 1 | 1.000 1 | 0.000 1 | 0.720 3 | 0.974 1 | 0.079 7 | 0.918 3 | 0.000 6 | 0.000 4 | 0.312 2 | 0.616 1 | 0.125 3 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 4 | 0.594 2 | 0.000 3 | 0.767 2 | 0.845 3 | 0.264 5 | 0.419 1 | 0.177 5 | 0.667 6 | 0.000 5 | 0.677 1 | 0.000 2 | 0.194 1 | 0.000 2 | 0.857 1 | 0.000 2 | 0.000 1 | 0.563 3 | 0.703 1 | 0.835 2 | 0.850 2 | 0.346 3 | 0.944 7 | 0.499 5 | 0.866 2 | 0.777 1 | 0.221 6 | 0.911 2 | 0.000 1 | 0.011 3 | 0.721 4 | 0.764 8 | 0.520 6 | 0.000 1 | 0.442 1 | 0.405 7 | 0.000 1 | 0.667 1 | 0.655 2 | 0.473 5 | 0.614 3 | 0.437 4
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
ODIN - Ins200 | permissive | 0.381 5 | 0.507 5 | 0.375 3 | 0.237 4 | 0.484 7 | 0.108 2 | 0.500 1 | 0.000 2 | 0.125 1 | 0.000 1 | 0.058 7 | 0.647 2 | 0.385 5 | 0.667 7 | 0.853 5 | 0.542 4 | 1.000 1 | 0.000 6 | 1.000 1 | 0.093 5 | 0.000 4 | 0.028 8 | 0.274 5 | 0.682 3 | 0.550 1 | 0.000 4 | 0.269 3 | 0.000 1 | 0.000 2 | 0.714 1 | 0.566 2 | 1.000 1 | 0.000 1 | 0.500 1 | 0.125 1 | 0.585 6 | 0.066 6 | 0.653 9 | 0.083 1 | 0.049 2 | 0.264 2 | 0.227 3 | 0.667 1 | 0.000 4 | 0.000 1 | 0.278 9 | 0.723 7 | 0.250 5 | 0.786 8 | 1.000 1 | 0.744 9 | 0.039 5 | 0.209 4 | 0.494 8 | 0.000 4 | 0.250 4 | 0.446 6 | 0.500 4 | 0.750 1 | 0.200 6 | 0.780 4 | 0.333 4 | 0.602 4 | 0.469 4 | 0.163 3 | 0.406 5 | 0.530 7 | 0.000 3 | 0.668 1 | 0.200 3 | 0.000 2 | 0.000 6 | 0.500 1 | 0.313 2 | 0.769 4 | 1.000 1 | 0.511 5 | 0.196 5 | 0.286 5 | 0.393 8 | 0.337 5 | 0.000 2 | 0.000 1 | 0.600 1 | 0.000 6 | 0.174 6 | 0.226 2 | 0.000 3 | 0.579 2 | 0.200 6 | 0.887 1 | 0.750 1 | 0.428 5 | 0.782 6 | 0.438 1 | 0.000 1 | 0.795 5 | 0.063 6 | 0.003 5 | 0.500 2 | 0.000 4 | 0.333 1 | 1.000 1 | 0.742 5 | 0.083 4 | 0.585 2 | 0.417 7 | 0.448 9 | 0.496 5 | 0.055 5 | 0.734 3 | 0.472 2 | 0.174 8 | 0.000 1 | 0.250 5 | 0.000 3 | 0.688 3 | 0.000 3 | 1.000 1 | 0.000 1 | 0.631 5 | 0.667 6 | 0.275 1 | 0.694 9 | 1.000 1 | 0.000 4 | 0.328 1 | 0.422 4 | 0.000 7 | 1.000 1 | 0.500 7 | 0.638 6 | 0.000 4 | 0.391 5 | 0.000 3 | 0.582 6 | 0.800 4 | 0.208 8 | 0.000 5 | 0.246 2 | 0.667 6 | 0.000 5 | 0.638 2 | 0.167 1 | 0.000 3 | 0.000 2 | 0.778 2 | 0.000 2 | 0.000 1 | 0.563 2 | 0.614 5 | 0.841 1 | 0.333 5 | 0.250 5 | 0.938 8 | 0.569 1 | 0.500 7 | 0.695 3 | 0.264 5 | 0.863 3 | 0.000 1 | 0.000 5 | 0.550 8 | 1.000 1 | 0.668 3 | 0.000 1 | 0.000 6 | 0.667 5 | 0.000 1 | 0.333 7 | 0.333 4 | 0.665 1 | 0.434 6 | 0.264 5
TD3D Scannet200 | permissive | 0.320 6 | 0.501 6 | 0.264 6 | 0.164 6 | 0.506 6 | 0.062 5 | 0.500 1 | 0.000 2 | 0.000 4 | 0.000 1 | 0.208 3 | 0.431 6 | 0.252 7 | 1.000 1 | 0.733 7 | 0.587 3 | 0.000 6 | 0.008 5 | 0.000 6 | 0.106 4 | 0.000 4 | 0.356 3 | 0.123 8 | 0.686 2 | 0.101 6 | 0.000 4 | 0.152 6 | 0.000 1 | 0.000 2 | 0.226 5 | 0.280 7 | 0.000 6 | 0.000 1 | 0.250 4 | 0.000 3 | 0.619 5 | 0.061 7 | 0.841 3 | 0.000 3 | 0.000 6 | 0.167 5 | 0.194 4 | 0.333 4 | 0.000 4 | 0.000 1 | 0.667 5 | 0.820 2 | 0.250 5 | 0.790 7 | 1.000 1 | 0.879 3 | 0.077 4 | 0.094 7 | 0.708 1 | 0.217 2 | 0.049 6 | 0.634 4 | 0.792 1 | 0.331 8 | 0.033 9 | 0.716 6 | 0.159 6 | 0.396 6 | 0.331 6 | 0.099 6 | 0.415 4 | 0.842 1 | 0.000 3 | 0.458 3 | 0.542 1 | 0.000 2 | 0.101 5 | 0.000 5 | 0.218 4 | 0.513 6 | 0.500 6 | 0.458 6 | 0.104 6 | 0.516 3 | 0.456 4 | 0.268 8 | 0.000 2 | 0.000 1 | 0.400 3 | 0.022 4 | 0.233 4 | 0.143 3 | 0.000 3 | 0.677 1 | 0.400 2 | 0.504 9 | 0.095 7 | 0.083 9 | 0.890 5 | 0.061 3 | 0.000 1 | 0.906 1 | 0.076 5 | 0.231 2 | 0.125 6 | 0.000 4 | 0.003 6 | 0.792 7 | 0.881 1 | 0.000 6 | 0.098 7 | 0.125 8 | 0.498 8 | 0.459 6 | 0.063 3 | 0.715 4 | 0.000 6 | 0.241 7 | 0.000 1 | 0.396 4 | 0.063 2 | 0.605 5 | 0.000 3 | 0.000 6 | 0.000 1 | 0.448 9 | 0.629 7 | 0.202 5 | 0.967 1 | 0.250 4 | 0.038 1 | 0.192 5 | 0.185 6 | 0.083 6 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 4 | 0.470 4 | 0.012 1 | 0.565 7 | 0.798 5 | 0.621 1 | 0.111 3 | 0.500 1 | 1.000 1 | 0.017 3 | 0.509 4 | 0.000 2 | 0.008 2 | 1.000 1 | 0.525 6 | 0.000 2 | 0.000 1 | 0.332 7 | 0.679 2 | 0.264 6 | 0.333 5 | 0.267 4 | 1.000 1 | 0.549 3 | 0.299 9 | 0.387 6 | 0.328 3 | 0.744 8 | 0.000 1 | 0.000 5 | 0.435 9 | 1.000 1 | 0.283 8 | 0.000 1 | 0.196 2 | 0.817 2 | 0.000 1 | 0.472 3 | 0.222 6 | 0.123 8 | 0.560 5 | 0.156 6
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Mask3D Scannet200 | | 0.388 4 | 0.542 4 | 0.357 5 | 0.237 5 | 0.610 4 | 0.091 3 | 0.125 9 | 0.000 2 | 0.000 4 | 0.000 1 | 0.065 6 | 0.668 1 | 0.451 3 | 1.000 1 | 0.955 4 | 0.640 2 | 0.500 3 | 0.039 3 | 0.125 5 | 0.063 6 | 0.409 3 | 0.311 4 | 0.291 4 | 0.609 5 | 0.266 4 | 0.000 4 | 0.163 5 | 0.000 1 | 0.008 1 | 0.044 6 | 0.496 4 | 1.000 1 | 0.000 1 | 0.018 6 | 0.000 3 | 0.756 4 | 0.573 2 | 0.808 4 | 0.000 3 | 0.010 5 | 0.042 7 | 0.130 7 | 0.552 2 | 0.042 2 | 0.000 1 | 1.000 1 | 0.725 6 | 0.750 1 | 0.883 1 | 1.000 1 | 0.832 6 | 0.024 6 | 0.107 5 | 0.614 6 | 0.226 1 | 0.250 4 | 0.628 5 | 0.792 1 | 0.677 5 | 0.400 1 | 0.741 5 | 0.278 5 | 0.511 5 | 0.077 9 | 0.111 4 | 0.313 6 | 0.715 5 | 0.302 1 | 0.017 7 | 0.200 3 | 0.000 2 | 0.188 4 | 0.000 5 | 0.178 6 | 0.736 5 | 1.000 1 | 0.615 1 | 0.514 3 | 0.409 4 | 0.380 9 | 0.600 2 | 0.000 2 | 0.000 1 | 0.400 3 | 0.013 5 | 0.254 3 | 0.381 1 | 0.000 3 | 0.123 7 | 0.400 2 | 0.839 3 | 0.258 5 | 0.463 4 | 0.926 3 | 0.265 2 | 0.000 1 | 0.857 2 | 0.099 3 | 0.021 3 | 0.500 2 | 0.027 2 | 0.028 4 | 1.000 1 | 0.502 9 | 0.016 5 | 0.076 8 | 0.500 2 | 0.612 4 | 0.578 4 | 0.005 6 | 0.597 5 | 0.194 4 | 0.497 4 | 0.000 1 | 0.500 1 | 0.000 3 | 0.323 8 | 0.000 3 | 1.000 1 | 0.000 1 | 0.748 2 | 0.708 4 | 0.050 8 | 0.890 4 | 1.000 1 | 0.008 2 | 0.151 7 | 0.301 5 | 1.000 1 | 1.000 1 | 0.792 6 | 0.945 1 | 1.000 1 | 0.511 3 | 0.004 2 | 0.753 3 | 0.776 6 | 0.287 4 | 0.020 4 | 0.003 8 | 0.974 3 | 0.033 1 | 0.412 9 | 0.000 2 | 0.000 3 | 0.000 2 | 0.667 5 | 0.000 2 | 0.000 1 | 0.491 5 | 0.676 3 | 0.352 5 | 0.335 4 | 0.060 6 | 0.822 9 | 0.527 4 | 1.000 1 | 0.517 5 | 0.606 1 | 0.853 4 | 0.000 1 | 0.004 4 | 0.806 1 | 1.000 1 | 0.727 2 | 0.000 1 | 0.042 5 | 0.739 4 | 0.000 1 | 0.399 6 | 0.391 3 | 0.504 4 | 0.591 4 | 0.571 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
Minkowski 34D Inst. | permissive | 0.203 9 | 0.369 8 | 0.134 9 | 0.078 9 | 0.479 8 | 0.003 8 | 0.500 1 | 0.000 2 | 0.000 4 | 0.000 1 | 0.100 5 | 0.371 7 | 0.300 6 | 0.667 7 | 0.746 6 | 0.400 5 | 0.000 6 | 0.000 6 | 0.000 6 | 0.031 7 | 0.000 4 | 0.074 7 | 0.165 7 | 0.413 9 | 0.000 8 | 0.000 4 | 0.070 8 | 0.000 1 | 0.000 2 | 0.000 7 | 0.221 9 | 0.000 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.372 9 | 0.070 5 | 0.706 7 | 0.000 3 | 0.000 6 | 0.000 9 | 0.123 8 | 0.033 9 | 0.000 4 | 0.000 1 | 0.422 8 | 0.732 5 | 0.000 8 | 0.778 9 | 1.000 1 | 0.845 5 | 0.000 7 | 0.090 8 | 0.636 5 | 0.000 4 | 0.000 7 | 0.158 8 | 0.000 6 | 0.250 9 | 0.050 8 | 0.693 7 | 0.123 8 | 0.051 9 | 0.385 5 | 0.009 8 | 0.118 9 | 0.406 9 | 0.000 3 | 0.000 8 | 0.200 3 | 0.000 2 | 0.000 6 | 0.000 5 | 0.133 8 | 0.307 9 | 0.500 6 | 0.251 8 | 0.000 8 | 0.281 6 | 0.402 7 | 0.317 6 | 0.000 2 | 0.000 1 | 0.000 7 | 0.000 6 | 0.060 8 | 0.000 4 | 0.000 3 | 0.396 4 | 0.200 6 | 0.669 6 | 0.021 8 | 0.218 8 | 0.720 9 | 0.000 4 | 0.000 1 | 0.696 7 | 0.025 8 | 0.000 7 | 0.000 7 | 0.000 4 | 0.000 7 | 0.125 9 | 0.596 6 | 0.000 6 | 0.191 5 | 0.500 2 | 0.595 5 | 0.369 8 | 0.000 7 | 0.500 8 | 0.000 6 | 0.143 9 | 0.000 1 | 0.000 7 | 0.000 3 | 0.226 9 | 0.000 3 | 0.000 6 | 0.000 1 | 0.701 4 | 0.511 8 | 0.000 9 | 0.851 7 | 0.000 6 | 0.000 4 | 0.150 8 | 0.052 9 | 0.100 5 | 0.981 6 | 0.500 7 | 0.286 7 | 0.000 4 | 0.000 9 | 0.000 3 | 0.545 8 | 0.522 9 | 0.250 6 | 0.000 5 | 0.000 9 | 0.522 9 | 0.000 5 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.282 9 | 0.000 2 | 0.000 1 | 0.178 9 | 0.382 8 | 0.018 9 | 0.056 8 | 0.000 7 | 0.997 4 | 0.107 9 | 0.677 5 | 0.313 8 | 0.000 8 | 0.726 9 | 0.000 1 | 0.000 5 | 0.583 7 | 0.903 7 | 0.200 9 | 0.000 1 | 0.000 6 | 0.333 8 | 0.000 1 | 0.442 5 | 0.083 7 | 0.109 9 | 0.387 8 | 0.000 9
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.209 8 | 0.361 9 | 0.157 8 | 0.085 8 | 0.506 5 | 0.007 7 | 0.500 1 | 0.000 2 | 0.000 4 | 0.000 1 | 0.000 9 | 0.093 9 | 0.221 8 | 0.667 7 | 0.524 9 | 0.400 5 | 0.000 6 | 0.000 6 | 0.000 6 | 0.004 8 | 0.000 4 | 0.000 9 | 0.109 9 | 0.589 7 | 0.000 8 | 0.000 4 | 0.059 9 | 0.000 1 | 0.000 2 | 0.000 7 | 0.322 5 | 0.000 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.405 7 | 0.055 8 | 0.700 8 | 0.000 3 | 0.000 6 | 0.028 8 | 0.091 9 | 0.083 6 | 0.000 4 | 0.000 1 | 0.667 5 | 0.768 4 | 0.000 8 | 0.807 5 | 1.000 1 | 0.776 8 | 0.000 7 | 0.000 9 | 0.340 9 | 0.000 4 | 0.000 7 | 0.103 9 | 0.000 6 | 0.750 1 | 0.200 6 | 0.634 9 | 0.053 9 | 0.246 7 | 0.677 3 | 0.006 9 | 0.198 7 | 0.432 8 | 0.000 3 | 0.000 8 | 0.050 8 | 0.000 2 | 0.000 6 | 0.000 5 | 0.111 9 | 0.356 8 | 0.500 6 | 0.188 9 | 0.000 8 | 0.220 7 | 0.448 5 | 0.050 9 | 0.000 2 | 0.000 1 | 0.000 7 | 0.000 6 | 0.032 9 | 0.000 4 | 0.000 3 | 0.396 4 | 0.000 8 | 0.573 8 | 0.000 9 | 0.228 7 | 0.747 8 | 0.000 4 | 0.000 1 | 0.573 9 | 0.021 9 | 0.000 7 | 0.000 7 | 0.000 4 | 0.000 7 | 0.500 8 | 0.573 7 | 0.000 6 | 0.000 9 | 0.125 8 | 0.592 6 | 0.364 9 | 0.000 7 | 0.450 9 | 0.000 6 | 0.364 5 | 0.000 1 | 0.000 7 | 0.000 3 | 0.340 7 | 0.000 3 | 0.000 6 | 0.000 1 | 0.610 7 | 0.833 3 | 0.221 3 | 0.702 8 | 0.000 6 | 0.000 4 | 0.135 9 | 0.094 8 | 0.125 3 | 0.571 8 | 0.500 7 | 0.143 9 | 0.000 4 | 0.125 7 | 0.000 3 | 0.618 5 | 0.667 8 | 0.115 9 | 0.000 5 | 0.125 6 | 1.000 1 | 0.000 5 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.502 8 | 0.000 2 | 0.000 1 | 0.312 8 | 0.248 9 | 0.050 8 | 0.000 9 | 0.000 7 | 0.997 4 | 0.420 6 | 0.500 7 | 0.149 9 | 0.451 2 | 0.748 6 | 0.000 1 | 0.000 5 | 0.636 6 | 0.667 9 | 0.600 5 | 0.000 1 | 0.000 6 | 0.278 9 | 0.000 1 | 0.333 7 | 0.000 9 | 0.294 6 | 0.381 9 | 0.110 7
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.246 7 | 0.413 7 | 0.170 7 | 0.130 7 | 0.455 9 | 0.003 9 | 0.500 1 | 0.000 2 | 0.000 4 | 0.000 1 | 0.017 8 | 0.333 8 | 0.111 9 | 1.000 1 | 0.681 8 | 0.400 5 | 0.000 6 | 0.000 6 | 1.000 1 | 0.003 9 | 0.000 4 | 0.167 6 | 0.190 6 | 0.637 4 | 0.067 7 | 0.000 4 | 0.081 7 | 0.000 1 | 0.000 2 | 0.000 7 | 0.264 8 | 0.000 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.387 8 | 0.031 9 | 0.754 6 | 0.000 3 | 0.000 6 | 0.151 6 | 0.135 6 | 0.056 8 | 0.000 4 | 0.000 1 | 0.582 7 | 0.589 9 | 0.500 2 | 0.815 4 | 1.000 1 | 0.903 1 | 0.000 7 | 0.097 6 | 0.588 7 | 0.000 4 | 0.000 7 | 0.234 7 | 0.000 6 | 0.500 6 | 0.400 1 | 0.682 8 | 0.156 7 | 0.159 8 | 0.750 1 | 0.046 7 | 0.125 8 | 0.660 6 | 0.000 3 | 0.200 5 | 0.000 9 | 0.000 2 | 0.000 6 | 0.000 5 | 0.164 7 | 0.402 7 | 0.500 6 | 0.373 7 | 0.025 7 | 0.143 9 | 0.426 6 | 0.317 6 | 0.000 2 | 0.000 1 | 0.000 7 | 0.000 6 | 0.063 7 | 0.000 4 | 0.000 3 | 0.000 8 | 0.000 8 | 0.575 7 | 0.250 6 | 0.241 6 | 0.772 7 | 0.000 4 | 0.000 1 | 0.653 8 | 0.034 7 | 0.000 7 | 0.000 7 | 0.000 4 | 0.000 7 | 1.000 1 | 0.561 8 | 0.000 6 | 0.100 6 | 0.500 2 | 0.541 7 | 0.452 7 | 0.000 7 | 0.581 6 | 0.000 6 | 0.364 5 | 0.000 1 | 0.000 7 | 0.000 3 | 0.571 6 | 0.000 3 | 0.000 6 | 0.000 1 | 0.568 8 | 0.511 8 | 0.167 6 | 0.857 6 | 0.000 6 | 0.000 4 | 0.164 6 | 0.112 7 | 0.000 7 | 0.530 9 | 1.000 1 | 0.286 7 | 0.000 4 | 0.125 7 | 0.000 3 | 0.464 9 | 0.706 7 | 0.208 7 | 0.000 5 | 0.125 6 | 0.744 4 | 0.000 5 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.511 7 | 0.000 2 | 0.000 1 | 0.344 6 | 0.541 7 | 0.068 7 | 0.333 5 | 0.000 7 | 1.000 1 | 0.196 8 | 0.533 6 | 0.318 7 | 0.000 8 | 0.748 7 | 0.000 1 | 0.000 5 | 0.690 5 | 1.000 1 | 0.400 7 | 0.000 1 | 0.000 6 | 0.667 5 | 0.000 1 | 0.333 7 | 0.333 4 | 0.270 7 | 0.399 7 | 0.083 8
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.