The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
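The metric described above can be illustrated with a minimal sketch. This is not the benchmark's official evaluation script: the function names, the greedy matching of each prediction to its best-overlapping ground truth instance, the strict `>` comparison against the overlap threshold, the step-wise precision-recall integration, and the inclusion of the 0.95 endpoint in the threshold range are all assumptions made for illustration. It shows how a second prediction of an already matched instance is counted as a false positive.

```python
def average_precision(scores, ious, iou_threshold):
    """AP of one class at one overlap threshold (illustrative sketch).

    scores: confidence of each predicted instance.
    ious:   ious[p][g] is the overlap (IoU) of prediction p with
            ground truth instance g. Assumes at least one prediction.
    """
    num_gt = len(ious[0]) if ious else 0
    if num_gt == 0:
        return float("nan")  # no ground truth instances (or no predictions) to score
    # Visit predictions from most to least confident.
    order = sorted(range(len(scores)), key=lambda p: -scores[p])
    matched = [False] * num_gt
    ap, true_pos, prev_recall = 0.0, 0, 0.0
    for seen, p in enumerate(order, start=1):
        # Candidate match: the ground truth instance with the highest overlap.
        g = max(range(num_gt), key=lambda j: ious[p][j])
        if ious[p][g] > iou_threshold and not matched[g]:
            matched[g] = True
            true_pos += 1
        # Otherwise the prediction is a false positive; this includes a
        # duplicate prediction of an instance that is already matched.
        recall = true_pos / num_gt
        ap += (recall - prev_recall) * (true_pos / seen)
        prev_recall = recall
    return ap


def mean_ap_over_thresholds(scores, ious, thresholds):
    """Average the per-threshold AP values over a list of overlap thresholds."""
    values = [average_precision(scores, ious, t) for t in thresholds]
    return sum(values) / len(values)


# Hypothetical example: three predictions, two ground truth instances.
# The second prediction overlaps the same instance as the first one.
scores = [0.9, 0.8, 0.7]
ious = [[0.8, 0.0],
        [0.6, 0.1],
        [0.0, 0.55]]
ap_50 = average_precision(scores, ious, 0.5)
ap_25 = average_precision(scores, ious, 0.25)
# Assumed reading of [0.5:0.95:0.05]: 0.50, 0.55, ..., 0.95.
thresholds = [0.5 + 0.05 * i for i in range(10)]
ap = mean_ap_over_thresholds(scores, ious, thresholds)
```

In this example the duplicate lowers AP 50% below 1.0 even though both ground truth instances are eventually found; dropping the second prediction would restore a perfect score.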



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives a method's score followed by its rank in that column.




Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
DINO3D-Scannet200 | copyleft | 0.454 1 | 0.587 1 | 0.453 1 | 0.296 1 | 0.851 1 | 0.200 1 | 0.500 1 | 0.000 2 | 0.042 2 | 0.000 1 | 0.378 1 | 0.545 3 | 0.729 1 | 1.000 1 | 0.981 2 | 0.355 8 | 1.000 1 | 0.046 2 | 0.000 5 | 0.248 2 | 0.000 3 | 0.494 1 | 0.381 2 | 0.586 7 | 0.496 2 | 0.250 2 | 0.409 1 | 0.000 1 | 0.000 2 | 0.714 1 | 0.572 1 | 1.000 1 | 0.000 1 | 0.250 3 | 0.050 2 | 0.793 2 | 0.436 2 | 0.871 2 | 0.000 3 | 0.216 1 | 0.284 1 | 0.290 1 | 0.083 5 | 0.000 3 | 0.000 1 | 0.764 2 | 0.716 7 | 0.500 2 | 0.842 3 | 1.000 1 | 0.891 2 | 0.096 1 | 0.361 2 | 0.690 2 | 0.000 4 | 0.595 1 | 0.753 1 | 0.708 3 | 0.750 1 | 0.400 1 | 0.845 2 | 0.475 2 | 0.728 1 | 0.750 1 | 0.214 1 | 0.683 1 | 0.743 3 | 0.000 3 | 0.400 3 | 0.200 3 | 0.500 1 | 0.944 1 | 0.125 3 | 0.327 1 | 0.823 2 | 0.792 4 | 0.602 2 | 0.662 1 | 0.777 1 | 0.803 1 | 0.675 1 | 0.000 1 | 0.000 1 | 0.200 5 | 0.298 1 | 0.324 1 | 0.000 4 | 0.000 2 | 0.000 7 | 0.800 1 | 0.824 3 | 0.750 1 | 0.507 1 | 0.937 2 | 0.000 4 | 0.000 1 | 0.779 5 | 0.116 2 | 0.001 5 | 0.417 4 | 0.000 3 | 0.014 4 | 1.000 1 | 0.816 2 | 0.548 1 | 0.600 1 | 0.500 1 | 0.771 2 | 0.773 1 | 0.117 1 | 0.944 1 | 0.764 1 | 0.571 2 | 0.000 1 | 0.250 4 | 0.000 2 | 1.000 1 | 0.063 1 | 1.000 1 | 0.000 1 | 0.720 2 | 0.974 1 | 0.079 6 | 0.918 2 | 0.000 5 | 0.000 3 | 0.312 2 | 0.616 1 | 0.125 2 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 3 | 0.594 1 | 0.000 3 | 0.767 1 | 0.845 2 | 0.264 4 | 0.419 1 | 0.177 4 | 0.667 6 | 0.000 4 | 0.677 1 | 0.000 2 | 0.194 1 | 0.000 2 | 0.857 1 | 0.000 2 | 0.000 1 | 0.563 2 | 0.703 1 | 0.835 2 | 0.850 1 | 0.346 2 | 0.944 6 | 0.499 4 | 0.866 2 | 0.777 1 | 0.221 6 | 0.911 1 | 0.000 1 | 0.011 2 | 0.721 3 | 0.764 7 | 0.520 5 | 0.000 1 | 0.442 1 | 0.405 6 | 0.000 1 | 0.667 1 | 0.655 1 | 0.473 4 | 0.614 2 | 0.437 3
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
CompetitorFormer-200 | | 0.415 2 | 0.574 2 | 0.370 3 | 0.274 2 | 0.632 2 | 0.054 5 | 0.500 1 | 0.083 1 | 0.000 3 | 0.000 1 | 0.260 2 | 0.541 4 | 0.410 3 | 0.903 5 | 0.987 1 | 0.885 1 | 0.500 3 | 0.064 1 | 0.250 3 | 0.378 1 | 1.000 1 | 0.243 4 | 0.428 1 | 0.599 5 | 0.218 4 | 1.000 1 | 0.196 3 | 0.000 1 | 0.000 2 | 0.587 3 | 0.318 5 | 1.000 1 | 0.000 1 | 0.500 1 | 0.000 3 | 0.845 1 | 0.232 3 | 0.885 1 | 0.005 2 | 0.015 3 | 0.216 3 | 0.183 4 | 0.399 3 | 0.082 1 | 0.000 1 | 0.724 3 | 0.806 2 | 0.500 2 | 0.869 2 | 1.000 1 | 0.779 6 | 0.095 2 | 0.443 1 | 0.685 3 | 0.021 3 | 0.269 2 | 0.704 2 | 0.083 5 | 0.467 6 | 0.400 1 | 0.846 1 | 0.551 1 | 0.663 2 | 0.261 7 | 0.103 4 | 0.482 2 | 0.758 2 | 0.025 2 | 0.018 5 | 0.400 2 | 0.000 2 | 0.677 2 | 0.500 1 | 0.207 4 | 0.881 1 | 1.000 1 | 0.600 3 | 0.648 2 | 0.144 7 | 0.641 2 | 0.452 3 | 0.000 1 | 0.000 1 | 0.327 4 | 0.142 2 | 0.209 4 | 0.000 4 | 0.083 1 | 0.215 5 | 0.317 4 | 0.748 4 | 0.508 3 | 0.484 2 | 0.957 1 | 0.000 4 | 0.000 1 | 0.833 3 | 0.132 1 | 0.400 1 | 0.663 1 | 0.015 2 | 0.103 2 | 1.000 1 | 0.759 3 | 0.125 2 | 0.286 3 | 0.500 1 | 0.830 1 | 0.651 2 | 0.089 2 | 0.540 6 | 0.380 3 | 0.581 1 | 0.000 1 | 0.500 1 | 0.000 2 | 0.745 2 | 0.050 2 | 1.000 1 | 0.000 1 | 0.622 5 | 0.694 4 | 0.213 3 | 0.870 4 | 0.125 4 | 0.000 3 | 0.205 3 | 0.562 2 | 0.000 6 | 0.933 6 | 1.000 1 | 0.820 4 | 0.250 2 | 0.347 5 | 0.000 3 | 0.731 3 | 0.877 1 | 0.289 2 | 0.160 2 | 0.186 3 | 0.684 5 | 0.008 3 | 0.538 3 | 0.000 2 | 0.000 3 | 0.000 2 | 0.700 3 | 0.056 1 | 0.000 1 | 0.491 3 | 0.584 5 | 0.602 3 | 0.489 2 | 0.565 1 | 1.000 1 | 0.311 6 | 0.750 3 | 0.583 3 | 0.292 4 | 0.832 4 | 0.000 1 | 0.157 1 | 0.780 2 | 1.000 1 | 0.625 3 | 0.000 1 | 0.131 3 | 0.794 2 | 0.000 1 | 0.667 1 | 0.071 7 | 0.545 2 | 0.682 1 | 0.462 2
ODIN - Ins200 | permissive | 0.381 4 | 0.507 4 | 0.375 2 | 0.237 3 | 0.484 6 | 0.108 2 | 0.500 1 | 0.000 2 | 0.125 1 | 0.000 1 | 0.058 6 | 0.647 2 | 0.385 4 | 0.667 6 | 0.853 4 | 0.542 4 | 1.000 1 | 0.000 5 | 1.000 1 | 0.093 4 | 0.000 3 | 0.028 7 | 0.274 4 | 0.682 2 | 0.550 1 | 0.000 3 | 0.269 2 | 0.000 1 | 0.000 2 | 0.714 1 | 0.566 2 | 1.000 1 | 0.000 1 | 0.500 1 | 0.125 1 | 0.585 5 | 0.066 5 | 0.653 8 | 0.083 1 | 0.049 2 | 0.264 2 | 0.227 2 | 0.667 1 | 0.000 3 | 0.000 1 | 0.278 8 | 0.723 6 | 0.250 5 | 0.786 7 | 1.000 1 | 0.744 8 | 0.039 4 | 0.209 3 | 0.494 7 | 0.000 4 | 0.250 3 | 0.446 5 | 0.500 4 | 0.750 1 | 0.200 5 | 0.780 3 | 0.333 3 | 0.602 3 | 0.469 4 | 0.163 2 | 0.406 4 | 0.530 6 | 0.000 3 | 0.668 1 | 0.200 3 | 0.000 2 | 0.000 5 | 0.500 1 | 0.313 2 | 0.769 3 | 1.000 1 | 0.511 4 | 0.196 4 | 0.286 4 | 0.393 7 | 0.337 4 | 0.000 1 | 0.000 1 | 0.600 1 | 0.000 5 | 0.174 5 | 0.226 2 | 0.000 2 | 0.579 2 | 0.200 5 | 0.887 1 | 0.750 1 | 0.428 4 | 0.782 5 | 0.438 1 | 0.000 1 | 0.795 4 | 0.063 5 | 0.003 4 | 0.500 2 | 0.000 3 | 0.333 1 | 1.000 1 | 0.742 4 | 0.083 3 | 0.585 2 | 0.417 6 | 0.448 8 | 0.496 4 | 0.055 4 | 0.734 2 | 0.472 2 | 0.174 7 | 0.000 1 | 0.250 4 | 0.000 2 | 0.688 3 | 0.000 3 | 1.000 1 | 0.000 1 | 0.631 4 | 0.667 5 | 0.275 1 | 0.694 8 | 1.000 1 | 0.000 3 | 0.328 1 | 0.422 3 | 0.000 6 | 1.000 1 | 0.500 6 | 0.638 5 | 0.000 3 | 0.391 4 | 0.000 3 | 0.582 5 | 0.800 3 | 0.208 7 | 0.000 5 | 0.246 2 | 0.667 6 | 0.000 4 | 0.638 2 | 0.167 1 | 0.000 3 | 0.000 2 | 0.778 2 | 0.000 2 | 0.000 1 | 0.563 1 | 0.614 4 | 0.841 1 | 0.333 4 | 0.250 4 | 0.938 7 | 0.569 1 | 0.500 6 | 0.695 2 | 0.264 5 | 0.863 2 | 0.000 1 | 0.000 4 | 0.550 7 | 1.000 1 | 0.668 2 | 0.000 1 | 0.000 5 | 0.667 4 | 0.000 1 | 0.333 6 | 0.333 3 | 0.665 1 | 0.434 5 | 0.264 4
Mask3D Scannet200 | | 0.388 3 | 0.542 3 | 0.357 4 | 0.237 4 | 0.610 3 | 0.091 3 | 0.125 8 | 0.000 2 | 0.000 3 | 0.000 1 | 0.065 5 | 0.668 1 | 0.451 2 | 1.000 1 | 0.955 3 | 0.640 2 | 0.500 3 | 0.039 3 | 0.125 4 | 0.063 5 | 0.409 2 | 0.311 3 | 0.291 3 | 0.609 4 | 0.266 3 | 0.000 3 | 0.163 4 | 0.000 1 | 0.008 1 | 0.044 5 | 0.496 3 | 1.000 1 | 0.000 1 | 0.018 5 | 0.000 3 | 0.756 3 | 0.573 1 | 0.808 4 | 0.000 3 | 0.010 4 | 0.042 6 | 0.130 6 | 0.552 2 | 0.042 2 | 0.000 1 | 1.000 1 | 0.725 5 | 0.750 1 | 0.883 1 | 1.000 1 | 0.832 5 | 0.024 5 | 0.107 4 | 0.614 5 | 0.226 1 | 0.250 3 | 0.628 4 | 0.792 1 | 0.677 4 | 0.400 1 | 0.741 4 | 0.278 4 | 0.511 4 | 0.077 8 | 0.111 3 | 0.313 5 | 0.715 4 | 0.302 1 | 0.017 6 | 0.200 3 | 0.000 2 | 0.188 3 | 0.000 4 | 0.178 5 | 0.736 4 | 1.000 1 | 0.615 1 | 0.514 3 | 0.409 3 | 0.380 8 | 0.600 2 | 0.000 1 | 0.000 1 | 0.400 2 | 0.013 4 | 0.254 2 | 0.381 1 | 0.000 2 | 0.123 6 | 0.400 2 | 0.839 2 | 0.258 4 | 0.463 3 | 0.926 3 | 0.265 2 | 0.000 1 | 0.857 2 | 0.099 3 | 0.021 3 | 0.500 2 | 0.027 1 | 0.028 3 | 1.000 1 | 0.502 8 | 0.016 4 | 0.076 7 | 0.500 1 | 0.612 3 | 0.578 3 | 0.005 5 | 0.597 4 | 0.194 4 | 0.497 3 | 0.000 1 | 0.500 1 | 0.000 2 | 0.323 7 | 0.000 3 | 1.000 1 | 0.000 1 | 0.748 1 | 0.708 3 | 0.050 7 | 0.890 3 | 1.000 1 | 0.008 2 | 0.151 6 | 0.301 4 | 1.000 1 | 1.000 1 | 0.792 5 | 0.945 1 | 1.000 1 | 0.511 2 | 0.004 2 | 0.753 2 | 0.776 5 | 0.287 3 | 0.020 4 | 0.003 7 | 0.974 3 | 0.033 1 | 0.412 8 | 0.000 2 | 0.000 3 | 0.000 2 | 0.667 4 | 0.000 2 | 0.000 1 | 0.491 4 | 0.676 3 | 0.352 4 | 0.335 3 | 0.060 5 | 0.822 8 | 0.527 3 | 1.000 1 | 0.517 4 | 0.606 1 | 0.853 3 | 0.000 1 | 0.004 3 | 0.806 1 | 1.000 1 | 0.727 1 | 0.000 1 | 0.042 4 | 0.739 3 | 0.000 1 | 0.399 5 | 0.391 2 | 0.504 3 | 0.591 3 | 0.571 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
TD3D Scannet200 | permissive | 0.320 5 | 0.501 5 | 0.264 5 | 0.164 5 | 0.506 5 | 0.062 4 | 0.500 1 | 0.000 2 | 0.000 3 | 0.000 1 | 0.208 3 | 0.431 5 | 0.252 6 | 1.000 1 | 0.733 6 | 0.587 3 | 0.000 5 | 0.008 4 | 0.000 5 | 0.106 3 | 0.000 3 | 0.356 2 | 0.123 7 | 0.686 1 | 0.101 5 | 0.000 3 | 0.152 5 | 0.000 1 | 0.000 2 | 0.226 4 | 0.280 6 | 0.000 5 | 0.000 1 | 0.250 3 | 0.000 3 | 0.619 4 | 0.061 6 | 0.841 3 | 0.000 3 | 0.000 5 | 0.167 4 | 0.194 3 | 0.333 4 | 0.000 3 | 0.000 1 | 0.667 4 | 0.820 1 | 0.250 5 | 0.790 6 | 1.000 1 | 0.879 3 | 0.077 3 | 0.094 6 | 0.708 1 | 0.217 2 | 0.049 5 | 0.634 3 | 0.792 1 | 0.331 7 | 0.033 8 | 0.716 5 | 0.159 5 | 0.396 5 | 0.331 6 | 0.099 5 | 0.415 3 | 0.842 1 | 0.000 3 | 0.458 2 | 0.542 1 | 0.000 2 | 0.101 4 | 0.000 4 | 0.218 3 | 0.513 5 | 0.500 5 | 0.458 5 | 0.104 5 | 0.516 2 | 0.456 3 | 0.268 7 | 0.000 1 | 0.000 1 | 0.400 2 | 0.022 3 | 0.233 3 | 0.143 3 | 0.000 2 | 0.677 1 | 0.400 2 | 0.504 8 | 0.095 6 | 0.083 8 | 0.890 4 | 0.061 3 | 0.000 1 | 0.906 1 | 0.076 4 | 0.231 2 | 0.125 5 | 0.000 3 | 0.003 5 | 0.792 6 | 0.881 1 | 0.000 5 | 0.098 6 | 0.125 7 | 0.498 7 | 0.459 5 | 0.063 3 | 0.715 3 | 0.000 5 | 0.241 6 | 0.000 1 | 0.396 3 | 0.063 1 | 0.605 4 | 0.000 3 | 0.000 5 | 0.000 1 | 0.448 8 | 0.629 6 | 0.202 4 | 0.967 1 | 0.250 3 | 0.038 1 | 0.192 4 | 0.185 5 | 0.083 5 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 3 | 0.470 3 | 0.012 1 | 0.565 6 | 0.798 4 | 0.621 1 | 0.111 3 | 0.500 1 | 1.000 1 | 0.017 2 | 0.509 4 | 0.000 2 | 0.008 2 | 1.000 1 | 0.525 5 | 0.000 2 | 0.000 1 | 0.332 6 | 0.679 2 | 0.264 5 | 0.333 4 | 0.267 3 | 1.000 1 | 0.549 2 | 0.299 8 | 0.387 5 | 0.328 3 | 0.744 7 | 0.000 1 | 0.000 4 | 0.435 8 | 1.000 1 | 0.283 7 | 0.000 1 | 0.196 2 | 0.817 1 | 0.000 1 | 0.472 3 | 0.222 5 | 0.123 7 | 0.560 4 | 0.156 5
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Minkowski 34D Inst. | permissive | 0.203 8 | 0.369 7 | 0.134 8 | 0.078 8 | 0.479 7 | 0.003 7 | 0.500 1 | 0.000 2 | 0.000 3 | 0.000 1 | 0.100 4 | 0.371 6 | 0.300 5 | 0.667 6 | 0.746 5 | 0.400 5 | 0.000 5 | 0.000 5 | 0.000 5 | 0.031 6 | 0.000 3 | 0.074 6 | 0.165 6 | 0.413 8 | 0.000 7 | 0.000 3 | 0.070 7 | 0.000 1 | 0.000 2 | 0.000 6 | 0.221 8 | 0.000 5 | 0.000 1 | 0.000 6 | 0.000 3 | 0.372 8 | 0.070 4 | 0.706 6 | 0.000 3 | 0.000 5 | 0.000 8 | 0.123 7 | 0.033 8 | 0.000 3 | 0.000 1 | 0.422 7 | 0.732 4 | 0.000 7 | 0.778 8 | 1.000 1 | 0.845 4 | 0.000 6 | 0.090 7 | 0.636 4 | 0.000 4 | 0.000 6 | 0.158 7 | 0.000 6 | 0.250 8 | 0.050 7 | 0.693 6 | 0.123 7 | 0.051 8 | 0.385 5 | 0.009 7 | 0.118 8 | 0.406 8 | 0.000 3 | 0.000 7 | 0.200 3 | 0.000 2 | 0.000 5 | 0.000 4 | 0.133 7 | 0.307 8 | 0.500 5 | 0.251 7 | 0.000 7 | 0.281 5 | 0.402 6 | 0.317 5 | 0.000 1 | 0.000 1 | 0.000 6 | 0.000 5 | 0.060 7 | 0.000 4 | 0.000 2 | 0.396 3 | 0.200 5 | 0.669 5 | 0.021 7 | 0.218 7 | 0.720 8 | 0.000 4 | 0.000 1 | 0.696 6 | 0.025 7 | 0.000 6 | 0.000 6 | 0.000 3 | 0.000 6 | 0.125 8 | 0.596 5 | 0.000 5 | 0.191 4 | 0.500 1 | 0.595 4 | 0.369 7 | 0.000 6 | 0.500 7 | 0.000 5 | 0.143 8 | 0.000 1 | 0.000 6 | 0.000 2 | 0.226 8 | 0.000 3 | 0.000 5 | 0.000 1 | 0.701 3 | 0.511 7 | 0.000 8 | 0.851 6 | 0.000 5 | 0.000 3 | 0.150 7 | 0.052 8 | 0.100 4 | 0.981 5 | 0.500 6 | 0.286 6 | 0.000 3 | 0.000 8 | 0.000 3 | 0.545 7 | 0.522 8 | 0.250 5 | 0.000 5 | 0.000 8 | 0.522 8 | 0.000 4 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.282 8 | 0.000 2 | 0.000 1 | 0.178 8 | 0.382 7 | 0.018 8 | 0.056 7 | 0.000 6 | 0.997 4 | 0.107 8 | 0.677 4 | 0.313 7 | 0.000 7 | 0.726 8 | 0.000 1 | 0.000 4 | 0.583 6 | 0.903 6 | 0.200 8 | 0.000 1 | 0.000 5 | 0.333 7 | 0.000 1 | 0.442 4 | 0.083 6 | 0.109 8 | 0.387 7 | 0.000 8
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.209 7 | 0.361 8 | 0.157 7 | 0.085 7 | 0.506 4 | 0.007 6 | 0.500 1 | 0.000 2 | 0.000 3 | 0.000 1 | 0.000 8 | 0.093 8 | 0.221 7 | 0.667 6 | 0.524 8 | 0.400 5 | 0.000 5 | 0.000 5 | 0.000 5 | 0.004 7 | 0.000 3 | 0.000 8 | 0.109 8 | 0.589 6 | 0.000 7 | 0.000 3 | 0.059 8 | 0.000 1 | 0.000 2 | 0.000 6 | 0.322 4 | 0.000 5 | 0.000 1 | 0.000 6 | 0.000 3 | 0.405 6 | 0.055 7 | 0.700 7 | 0.000 3 | 0.000 5 | 0.028 7 | 0.091 8 | 0.083 5 | 0.000 3 | 0.000 1 | 0.667 4 | 0.768 3 | 0.000 7 | 0.807 5 | 1.000 1 | 0.776 7 | 0.000 6 | 0.000 8 | 0.340 8 | 0.000 4 | 0.000 6 | 0.103 8 | 0.000 6 | 0.750 1 | 0.200 5 | 0.634 8 | 0.053 8 | 0.246 6 | 0.677 3 | 0.006 8 | 0.198 6 | 0.432 7 | 0.000 3 | 0.000 7 | 0.050 7 | 0.000 2 | 0.000 5 | 0.000 4 | 0.111 8 | 0.356 7 | 0.500 5 | 0.188 8 | 0.000 7 | 0.220 6 | 0.448 4 | 0.050 8 | 0.000 1 | 0.000 1 | 0.000 6 | 0.000 5 | 0.032 8 | 0.000 4 | 0.000 2 | 0.396 3 | 0.000 7 | 0.573 7 | 0.000 8 | 0.228 6 | 0.747 7 | 0.000 4 | 0.000 1 | 0.573 8 | 0.021 8 | 0.000 6 | 0.000 6 | 0.000 3 | 0.000 6 | 0.500 7 | 0.573 6 | 0.000 5 | 0.000 8 | 0.125 7 | 0.592 5 | 0.364 8 | 0.000 6 | 0.450 8 | 0.000 5 | 0.364 4 | 0.000 1 | 0.000 6 | 0.000 2 | 0.340 6 | 0.000 3 | 0.000 5 | 0.000 1 | 0.610 6 | 0.833 2 | 0.221 2 | 0.702 7 | 0.000 5 | 0.000 3 | 0.135 8 | 0.094 7 | 0.125 2 | 0.571 7 | 0.500 6 | 0.143 8 | 0.000 3 | 0.125 6 | 0.000 3 | 0.618 4 | 0.667 7 | 0.115 8 | 0.000 5 | 0.125 5 | 1.000 1 | 0.000 4 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.502 7 | 0.000 2 | 0.000 1 | 0.312 7 | 0.248 8 | 0.050 7 | 0.000 8 | 0.000 6 | 0.997 4 | 0.420 5 | 0.500 6 | 0.149 8 | 0.451 2 | 0.748 5 | 0.000 1 | 0.000 4 | 0.636 5 | 0.667 8 | 0.600 4 | 0.000 1 | 0.000 5 | 0.278 8 | 0.000 1 | 0.333 6 | 0.000 8 | 0.294 5 | 0.381 8 | 0.110 6
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.246 6 | 0.413 6 | 0.170 6 | 0.130 6 | 0.455 8 | 0.003 8 | 0.500 1 | 0.000 2 | 0.000 3 | 0.000 1 | 0.017 7 | 0.333 7 | 0.111 8 | 1.000 1 | 0.681 7 | 0.400 5 | 0.000 5 | 0.000 5 | 1.000 1 | 0.003 8 | 0.000 3 | 0.167 5 | 0.190 5 | 0.637 3 | 0.067 6 | 0.000 3 | 0.081 6 | 0.000 1 | 0.000 2 | 0.000 6 | 0.264 7 | 0.000 5 | 0.000 1 | 0.000 6 | 0.000 3 | 0.387 7 | 0.031 8 | 0.754 5 | 0.000 3 | 0.000 5 | 0.151 5 | 0.135 5 | 0.056 7 | 0.000 3 | 0.000 1 | 0.582 6 | 0.589 8 | 0.500 2 | 0.815 4 | 1.000 1 | 0.903 1 | 0.000 6 | 0.097 5 | 0.588 6 | 0.000 4 | 0.000 6 | 0.234 6 | 0.000 6 | 0.500 5 | 0.400 1 | 0.682 7 | 0.156 6 | 0.159 7 | 0.750 1 | 0.046 6 | 0.125 7 | 0.660 5 | 0.000 3 | 0.200 4 | 0.000 8 | 0.000 2 | 0.000 5 | 0.000 4 | 0.164 6 | 0.402 6 | 0.500 5 | 0.373 6 | 0.025 6 | 0.143 8 | 0.426 5 | 0.317 5 | 0.000 1 | 0.000 1 | 0.000 6 | 0.000 5 | 0.063 6 | 0.000 4 | 0.000 2 | 0.000 7 | 0.000 7 | 0.575 6 | 0.250 5 | 0.241 5 | 0.772 6 | 0.000 4 | 0.000 1 | 0.653 7 | 0.034 6 | 0.000 6 | 0.000 6 | 0.000 3 | 0.000 6 | 1.000 1 | 0.561 7 | 0.000 5 | 0.100 5 | 0.500 1 | 0.541 6 | 0.452 6 | 0.000 6 | 0.581 5 | 0.000 5 | 0.364 4 | 0.000 1 | 0.000 6 | 0.000 2 | 0.571 5 | 0.000 3 | 0.000 5 | 0.000 1 | 0.568 7 | 0.511 7 | 0.167 5 | 0.857 5 | 0.000 5 | 0.000 3 | 0.164 5 | 0.112 6 | 0.000 6 | 0.530 8 | 1.000 1 | 0.286 6 | 0.000 3 | 0.125 6 | 0.000 3 | 0.464 8 | 0.706 6 | 0.208 6 | 0.000 5 | 0.125 5 | 0.744 4 | 0.000 4 | 0.500 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.511 6 | 0.000 2 | 0.000 1 | 0.344 5 | 0.541 6 | 0.068 6 | 0.333 4 | 0.000 6 | 1.000 1 | 0.196 7 | 0.533 5 | 0.318 6 | 0.000 7 | 0.748 6 | 0.000 1 | 0.000 4 | 0.690 4 | 1.000 1 | 0.400 6 | 0.000 1 | 0.000 5 | 0.667 4 | 0.000 1 | 0.333 6 | 0.333 3 | 0.270 6 | 0.399 6 | 0.083 7
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.