The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
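The metric described above can be illustrated with a minimal sketch for a single class. This is not the official evaluation script: the function names, the greedy matching by confidence, the strict `>` comparison against the overlap threshold, the step-wise area under the precision-recall curve, and the reading of [0.5:0.95:0.05] as start:stop:step with the stop value excluded are all assumptions made for illustration.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean per-vertex masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, overlap):
    """AP for one class at one overlap threshold (illustrative sketch).

    preds: list of (confidence, boolean mask) tuples
    gts:   list of boolean masks, one per ground truth instance
    """
    if not gts:
        return float("nan")
    # Visit predictions from most to least confident.
    order = sorted(range(len(preds)), key=lambda i: -preds[i][0])
    matched = set()
    tp = []
    for i in order:
        ious = [mask_iou(preds[i][1], g) for g in gts]
        j = int(np.argmax(ious))
        if ious[j] > overlap and j not in matched:
            matched.add(j)
            tp.append(1.0)
        else:
            # Low overlap, or a duplicate of an already matched
            # ground truth instance: counted as a false positive.
            tp.append(0.0)
    if not tp:
        return 0.0
    tp = np.array(tp)
    precision = np.cumsum(tp) / np.arange(1, len(tp) + 1)
    # Sum precision at each true positive, normalised by the number of
    # ground truth instances (assumed step-wise PR integration).
    return float((precision * tp).sum() / len(gts))

def ap_over_overlaps(preds, gts, overlaps):
    """Mean of the per-threshold APs."""
    return float(np.mean([average_precision(preds, gts, t) for t in overlaps]))

# Toy example: two ground truth instances, one duplicated prediction.
g1 = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
g2 = np.array([0, 0, 0, 1, 1, 1], dtype=bool)
toy_preds = [(0.9, g1), (0.8, g1), (0.7, g2)]
print(round(average_precision(toy_preds, [g1, g2], 0.5), 3))  # 0.833
```

In the toy example the second prediction repeats an already matched instance, so it lowers precision and the AP drops from 1.0 to 5/6. Calling `ap_over_overlaps` with `np.arange(0.5, 0.95, 0.05)` gives the averaged AP under the assumed reading of the range.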



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each entry gives a method's AP 50% score followed by its rank in that column.




Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
DINO3D-Scannet200 | copyleft | 0.454 1 | 0.587 1 | 0.453 1 | 0.296 1 | 0.851 1 | 0.200 1 | 0.500 1 | 0.000 1 | 0.042 2 | 0.000 1 | 0.378 1 | 0.545 3 | 0.729 1 | 1.000 1 | 0.981 1 | 0.355 7 | 1.000 1 | 0.046 1 | 0.000 4 | 0.248 1 | 0.000 2 | 0.494 1 | 0.381 1 | 0.586 6 | 0.496 2 | 0.250 1 | 0.409 1 | 0.000 1 | 0.000 2 | 0.714 1 | 0.572 1 | 1.000 1 | 0.000 1 | 0.250 2 | 0.050 2 | 0.793 1 | 0.436 2 | 0.871 1 | 0.000 2 | 0.216 1 | 0.284 1 | 0.290 1 | 0.083 4 | 0.000 2 | 0.000 1 | 0.764 2 | 0.716 6 | 0.500 2 | 0.842 2 | 1.000 1 | 0.891 2 | 0.096 1 | 0.361 1 | 0.690 2 | 0.000 3 | 0.595 1 | 0.753 1 | 0.708 3 | 0.750 1 | 0.400 1 | 0.845 1 | 0.475 1 | 0.728 1 | 0.750 1 | 0.214 1 | 0.683 1 | 0.743 2 | 0.000 2 | 0.400 3 | 0.200 2 | 0.500 1 | 0.944 1 | 0.125 2 | 0.327 1 | 0.823 1 | 0.792 3 | 0.602 2 | 0.662 1 | 0.777 1 | 0.803 1 | 0.675 1 | 0.000 1 | 0.000 1 | 0.200 4 | 0.298 1 | 0.324 1 | 0.000 4 | 0.000 1 | 0.000 6 | 0.800 1 | 0.824 3 | 0.750 1 | 0.507 1 | 0.937 1 | 0.000 4 | 0.000 1 | 0.779 4 | 0.116 1 | 0.001 4 | 0.417 3 | 0.000 2 | 0.014 3 | 1.000 1 | 0.816 2 | 0.548 1 | 0.600 1 | 0.500 1 | 0.771 1 | 0.773 1 | 0.117 1 | 0.944 1 | 0.764 1 | 0.571 1 | 0.000 1 | 0.250 3 | 0.000 2 | 1.000 1 | 0.063 1 | 1.000 1 | 0.000 1 | 0.720 2 | 0.974 1 | 0.079 5 | 0.918 2 | 0.000 4 | 0.000 3 | 0.312 2 | 0.616 1 | 0.125 2 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 2 | 0.594 1 | 0.000 3 | 0.767 1 | 0.845 1 | 0.264 3 | 0.419 1 | 0.177 3 | 0.667 5 | 0.000 3 | 0.677 1 | 0.000 2 | 0.194 1 | 0.000 2 | 0.857 1 | 0.000 1 | 0.000 1 | 0.563 2 | 0.703 1 | 0.835 2 | 0.850 1 | 0.346 1 | 0.944 5 | 0.499 4 | 0.866 2 | 0.777 1 | 0.221 5 | 0.911 1 | 0.000 1 | 0.011 1 | 0.721 2 | 0.764 6 | 0.520 4 | 0.000 1 | 0.442 1 | 0.405 5 | 0.000 1 | 0.667 1 | 0.655 1 | 0.473 3 | 0.614 1 | 0.437 2
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
ODIN - Ins200 | permissive | 0.381 3 | 0.507 3 | 0.375 2 | 0.237 2 | 0.484 5 | 0.108 2 | 0.500 1 | 0.000 1 | 0.125 1 | 0.000 1 | 0.058 5 | 0.647 2 | 0.385 3 | 0.667 5 | 0.853 3 | 0.542 3 | 1.000 1 | 0.000 4 | 1.000 1 | 0.093 3 | 0.000 2 | 0.028 6 | 0.274 3 | 0.682 2 | 0.550 1 | 0.000 2 | 0.269 2 | 0.000 1 | 0.000 2 | 0.714 1 | 0.566 2 | 1.000 1 | 0.000 1 | 0.500 1 | 0.125 1 | 0.585 4 | 0.066 4 | 0.653 7 | 0.083 1 | 0.049 2 | 0.264 2 | 0.227 2 | 0.667 1 | 0.000 2 | 0.000 1 | 0.278 7 | 0.723 5 | 0.250 4 | 0.786 6 | 1.000 1 | 0.744 7 | 0.039 3 | 0.209 2 | 0.494 6 | 0.000 3 | 0.250 2 | 0.446 4 | 0.500 4 | 0.750 1 | 0.200 4 | 0.780 2 | 0.333 2 | 0.602 2 | 0.469 4 | 0.163 2 | 0.406 3 | 0.530 5 | 0.000 2 | 0.668 1 | 0.200 2 | 0.000 2 | 0.000 4 | 0.500 1 | 0.313 2 | 0.769 2 | 1.000 1 | 0.511 3 | 0.196 3 | 0.286 4 | 0.393 6 | 0.337 3 | 0.000 1 | 0.000 1 | 0.600 1 | 0.000 4 | 0.174 4 | 0.226 2 | 0.000 1 | 0.579 2 | 0.200 4 | 0.887 1 | 0.750 1 | 0.428 3 | 0.782 4 | 0.438 1 | 0.000 1 | 0.795 3 | 0.063 4 | 0.003 3 | 0.500 1 | 0.000 2 | 0.333 1 | 1.000 1 | 0.742 3 | 0.083 2 | 0.585 2 | 0.417 5 | 0.448 7 | 0.496 3 | 0.055 3 | 0.734 2 | 0.472 2 | 0.174 6 | 0.000 1 | 0.250 3 | 0.000 2 | 0.688 2 | 0.000 2 | 1.000 1 | 0.000 1 | 0.631 4 | 0.667 4 | 0.275 1 | 0.694 7 | 1.000 1 | 0.000 3 | 0.328 1 | 0.422 2 | 0.000 6 | 1.000 1 | 0.500 5 | 0.638 4 | 0.000 2 | 0.391 4 | 0.000 3 | 0.582 4 | 0.800 2 | 0.208 6 | 0.000 4 | 0.246 2 | 0.667 5 | 0.000 3 | 0.638 2 | 0.167 1 | 0.000 3 | 0.000 2 | 0.778 2 | 0.000 1 | 0.000 1 | 0.563 1 | 0.614 4 | 0.841 1 | 0.333 3 | 0.250 3 | 0.938 6 | 0.569 1 | 0.500 5 | 0.695 2 | 0.264 4 | 0.863 2 | 0.000 1 | 0.000 3 | 0.550 6 | 1.000 1 | 0.668 2 | 0.000 1 | 0.000 4 | 0.667 3 | 0.000 1 | 0.333 5 | 0.333 3 | 0.665 1 | 0.434 4 | 0.264 3
Mask3D Scannet200 | | 0.388 2 | 0.542 2 | 0.357 3 | 0.237 3 | 0.610 2 | 0.091 3 | 0.125 7 | 0.000 1 | 0.000 3 | 0.000 1 | 0.065 4 | 0.668 1 | 0.451 2 | 1.000 1 | 0.955 2 | 0.640 1 | 0.500 3 | 0.039 2 | 0.125 3 | 0.063 4 | 0.409 1 | 0.311 3 | 0.291 2 | 0.609 4 | 0.266 3 | 0.000 2 | 0.163 3 | 0.000 1 | 0.008 1 | 0.044 4 | 0.496 3 | 1.000 1 | 0.000 1 | 0.018 4 | 0.000 3 | 0.756 2 | 0.573 1 | 0.808 3 | 0.000 2 | 0.010 3 | 0.042 5 | 0.130 5 | 0.552 2 | 0.042 1 | 0.000 1 | 1.000 1 | 0.725 4 | 0.750 1 | 0.883 1 | 1.000 1 | 0.832 5 | 0.024 4 | 0.107 3 | 0.614 4 | 0.226 1 | 0.250 2 | 0.628 3 | 0.792 1 | 0.677 4 | 0.400 1 | 0.741 3 | 0.278 3 | 0.511 3 | 0.077 7 | 0.111 3 | 0.313 4 | 0.715 3 | 0.302 1 | 0.017 5 | 0.200 2 | 0.000 2 | 0.188 2 | 0.000 3 | 0.178 4 | 0.736 3 | 1.000 1 | 0.615 1 | 0.514 2 | 0.409 3 | 0.380 7 | 0.600 2 | 0.000 1 | 0.000 1 | 0.400 2 | 0.013 3 | 0.254 2 | 0.381 1 | 0.000 1 | 0.123 5 | 0.400 2 | 0.839 2 | 0.258 3 | 0.463 2 | 0.926 2 | 0.265 2 | 0.000 1 | 0.857 2 | 0.099 2 | 0.021 2 | 0.500 1 | 0.027 1 | 0.028 2 | 1.000 1 | 0.502 7 | 0.016 3 | 0.076 6 | 0.500 1 | 0.612 2 | 0.578 2 | 0.005 4 | 0.597 4 | 0.194 3 | 0.497 2 | 0.000 1 | 0.500 1 | 0.000 2 | 0.323 6 | 0.000 2 | 1.000 1 | 0.000 1 | 0.748 1 | 0.708 3 | 0.050 6 | 0.890 3 | 1.000 1 | 0.008 2 | 0.151 5 | 0.301 3 | 1.000 1 | 1.000 1 | 0.792 4 | 0.945 1 | 1.000 1 | 0.511 2 | 0.004 2 | 0.753 2 | 0.776 4 | 0.287 2 | 0.020 3 | 0.003 6 | 0.974 3 | 0.033 1 | 0.412 7 | 0.000 2 | 0.000 3 | 0.000 2 | 0.667 3 | 0.000 1 | 0.000 1 | 0.491 3 | 0.676 3 | 0.352 3 | 0.335 2 | 0.060 4 | 0.822 7 | 0.527 3 | 1.000 1 | 0.517 3 | 0.606 1 | 0.853 3 | 0.000 1 | 0.004 2 | 0.806 1 | 1.000 1 | 0.727 1 | 0.000 1 | 0.042 3 | 0.739 2 | 0.000 1 | 0.399 4 | 0.391 2 | 0.504 2 | 0.591 2 | 0.571 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
TD3D Scannet200 | permissive | 0.320 4 | 0.501 4 | 0.264 4 | 0.164 4 | 0.506 4 | 0.062 4 | 0.500 1 | 0.000 1 | 0.000 3 | 0.000 1 | 0.208 2 | 0.431 4 | 0.252 5 | 1.000 1 | 0.733 5 | 0.587 2 | 0.000 4 | 0.008 3 | 0.000 4 | 0.106 2 | 0.000 2 | 0.356 2 | 0.123 6 | 0.686 1 | 0.101 4 | 0.000 2 | 0.152 4 | 0.000 1 | 0.000 2 | 0.226 3 | 0.280 5 | 0.000 4 | 0.000 1 | 0.250 2 | 0.000 3 | 0.619 3 | 0.061 5 | 0.841 2 | 0.000 2 | 0.000 4 | 0.167 3 | 0.194 3 | 0.333 3 | 0.000 2 | 0.000 1 | 0.667 3 | 0.820 1 | 0.250 4 | 0.790 5 | 1.000 1 | 0.879 3 | 0.077 2 | 0.094 5 | 0.708 1 | 0.217 2 | 0.049 4 | 0.634 2 | 0.792 1 | 0.331 6 | 0.033 7 | 0.716 4 | 0.159 4 | 0.396 4 | 0.331 6 | 0.099 4 | 0.415 2 | 0.842 1 | 0.000 2 | 0.458 2 | 0.542 1 | 0.000 2 | 0.101 3 | 0.000 3 | 0.218 3 | 0.513 4 | 0.500 4 | 0.458 4 | 0.104 4 | 0.516 2 | 0.456 2 | 0.268 6 | 0.000 1 | 0.000 1 | 0.400 2 | 0.022 2 | 0.233 3 | 0.143 3 | 0.000 1 | 0.677 1 | 0.400 2 | 0.504 7 | 0.095 5 | 0.083 7 | 0.890 3 | 0.061 3 | 0.000 1 | 0.906 1 | 0.076 3 | 0.231 1 | 0.125 4 | 0.000 2 | 0.003 4 | 0.792 5 | 0.881 1 | 0.000 4 | 0.098 5 | 0.125 6 | 0.498 6 | 0.459 4 | 0.063 2 | 0.715 3 | 0.000 4 | 0.241 5 | 0.000 1 | 0.396 2 | 0.063 1 | 0.605 3 | 0.000 2 | 0.000 4 | 0.000 1 | 0.448 7 | 0.629 5 | 0.202 3 | 0.967 1 | 0.250 3 | 0.038 1 | 0.192 3 | 0.185 4 | 0.083 5 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 2 | 0.470 3 | 0.012 1 | 0.565 5 | 0.798 3 | 0.621 1 | 0.111 2 | 0.500 1 | 1.000 1 | 0.017 2 | 0.509 3 | 0.000 2 | 0.008 2 | 1.000 1 | 0.525 4 | 0.000 1 | 0.000 1 | 0.332 5 | 0.679 2 | 0.264 4 | 0.333 3 | 0.267 2 | 1.000 1 | 0.549 2 | 0.299 7 | 0.387 4 | 0.328 3 | 0.744 6 | 0.000 1 | 0.000 3 | 0.435 7 | 1.000 1 | 0.283 6 | 0.000 1 | 0.196 2 | 0.817 1 | 0.000 1 | 0.472 2 | 0.222 5 | 0.123 6 | 0.560 3 | 0.156 4
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Minkowski 34D Inst. | permissive | 0.203 7 | 0.369 6 | 0.134 7 | 0.078 7 | 0.479 6 | 0.003 6 | 0.500 1 | 0.000 1 | 0.000 3 | 0.000 1 | 0.100 3 | 0.371 5 | 0.300 4 | 0.667 5 | 0.746 4 | 0.400 4 | 0.000 4 | 0.000 4 | 0.000 4 | 0.031 5 | 0.000 2 | 0.074 5 | 0.165 5 | 0.413 7 | 0.000 6 | 0.000 2 | 0.070 6 | 0.000 1 | 0.000 2 | 0.000 5 | 0.221 7 | 0.000 4 | 0.000 1 | 0.000 5 | 0.000 3 | 0.372 7 | 0.070 3 | 0.706 5 | 0.000 2 | 0.000 4 | 0.000 7 | 0.123 6 | 0.033 7 | 0.000 2 | 0.000 1 | 0.422 6 | 0.732 3 | 0.000 6 | 0.778 7 | 1.000 1 | 0.845 4 | 0.000 5 | 0.090 6 | 0.636 3 | 0.000 3 | 0.000 5 | 0.158 6 | 0.000 5 | 0.250 7 | 0.050 6 | 0.693 5 | 0.123 6 | 0.051 7 | 0.385 5 | 0.009 6 | 0.118 7 | 0.406 7 | 0.000 2 | 0.000 6 | 0.200 2 | 0.000 2 | 0.000 4 | 0.000 3 | 0.133 6 | 0.307 7 | 0.500 4 | 0.251 6 | 0.000 6 | 0.281 5 | 0.402 5 | 0.317 4 | 0.000 1 | 0.000 1 | 0.000 5 | 0.000 4 | 0.060 6 | 0.000 4 | 0.000 1 | 0.396 3 | 0.200 4 | 0.669 4 | 0.021 6 | 0.218 6 | 0.720 7 | 0.000 4 | 0.000 1 | 0.696 5 | 0.025 6 | 0.000 5 | 0.000 5 | 0.000 2 | 0.000 5 | 0.125 7 | 0.596 4 | 0.000 4 | 0.191 3 | 0.500 1 | 0.595 3 | 0.369 6 | 0.000 5 | 0.500 6 | 0.000 4 | 0.143 7 | 0.000 1 | 0.000 5 | 0.000 2 | 0.226 7 | 0.000 2 | 0.000 4 | 0.000 1 | 0.701 3 | 0.511 6 | 0.000 7 | 0.851 5 | 0.000 4 | 0.000 3 | 0.150 6 | 0.052 7 | 0.100 4 | 0.981 5 | 0.500 5 | 0.286 5 | 0.000 2 | 0.000 7 | 0.000 3 | 0.545 6 | 0.522 7 | 0.250 4 | 0.000 4 | 0.000 7 | 0.522 7 | 0.000 3 | 0.500 4 | 0.000 2 | 0.000 3 | 0.000 2 | 0.282 7 | 0.000 1 | 0.000 1 | 0.178 7 | 0.382 6 | 0.018 7 | 0.056 6 | 0.000 5 | 0.997 3 | 0.107 7 | 0.677 3 | 0.313 6 | 0.000 6 | 0.726 7 | 0.000 1 | 0.000 3 | 0.583 5 | 0.903 5 | 0.200 7 | 0.000 1 | 0.000 4 | 0.333 6 | 0.000 1 | 0.442 3 | 0.083 6 | 0.109 7 | 0.387 6 | 0.000 7
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.209 6 | 0.361 7 | 0.157 6 | 0.085 6 | 0.506 3 | 0.007 5 | 0.500 1 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 7 | 0.093 7 | 0.221 6 | 0.667 5 | 0.524 7 | 0.400 4 | 0.000 4 | 0.000 4 | 0.000 4 | 0.004 6 | 0.000 2 | 0.000 7 | 0.109 7 | 0.589 5 | 0.000 6 | 0.000 2 | 0.059 7 | 0.000 1 | 0.000 2 | 0.000 5 | 0.322 4 | 0.000 4 | 0.000 1 | 0.000 5 | 0.000 3 | 0.405 5 | 0.055 6 | 0.700 6 | 0.000 2 | 0.000 4 | 0.028 6 | 0.091 7 | 0.083 4 | 0.000 2 | 0.000 1 | 0.667 3 | 0.768 2 | 0.000 6 | 0.807 4 | 1.000 1 | 0.776 6 | 0.000 5 | 0.000 7 | 0.340 7 | 0.000 3 | 0.000 5 | 0.103 7 | 0.000 5 | 0.750 1 | 0.200 4 | 0.634 7 | 0.053 7 | 0.246 5 | 0.677 3 | 0.006 7 | 0.198 5 | 0.432 6 | 0.000 2 | 0.000 6 | 0.050 6 | 0.000 2 | 0.000 4 | 0.000 3 | 0.111 7 | 0.356 6 | 0.500 4 | 0.188 7 | 0.000 6 | 0.220 6 | 0.448 3 | 0.050 7 | 0.000 1 | 0.000 1 | 0.000 5 | 0.000 4 | 0.032 7 | 0.000 4 | 0.000 1 | 0.396 3 | 0.000 6 | 0.573 6 | 0.000 7 | 0.228 5 | 0.747 6 | 0.000 4 | 0.000 1 | 0.573 7 | 0.021 7 | 0.000 5 | 0.000 5 | 0.000 2 | 0.000 5 | 0.500 6 | 0.573 5 | 0.000 4 | 0.000 7 | 0.125 6 | 0.592 4 | 0.364 7 | 0.000 5 | 0.450 7 | 0.000 4 | 0.364 3 | 0.000 1 | 0.000 5 | 0.000 2 | 0.340 5 | 0.000 2 | 0.000 4 | 0.000 1 | 0.610 5 | 0.833 2 | 0.221 2 | 0.702 6 | 0.000 4 | 0.000 3 | 0.135 7 | 0.094 6 | 0.125 2 | 0.571 6 | 0.500 5 | 0.143 7 | 0.000 2 | 0.125 5 | 0.000 3 | 0.618 3 | 0.667 6 | 0.115 7 | 0.000 4 | 0.125 4 | 1.000 1 | 0.000 3 | 0.500 4 | 0.000 2 | 0.000 3 | 0.000 2 | 0.502 6 | 0.000 1 | 0.000 1 | 0.312 6 | 0.248 7 | 0.050 6 | 0.000 7 | 0.000 5 | 0.997 3 | 0.420 5 | 0.500 5 | 0.149 7 | 0.451 2 | 0.748 4 | 0.000 1 | 0.000 3 | 0.636 4 | 0.667 7 | 0.600 3 | 0.000 1 | 0.000 4 | 0.278 7 | 0.000 1 | 0.333 5 | 0.000 7 | 0.294 4 | 0.381 7 | 0.110 5
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.246 5 | 0.413 5 | 0.170 5 | 0.130 5 | 0.455 7 | 0.003 7 | 0.500 1 | 0.000 1 | 0.000 3 | 0.000 1 | 0.017 6 | 0.333 6 | 0.111 7 | 1.000 1 | 0.681 6 | 0.400 4 | 0.000 4 | 0.000 4 | 1.000 1 | 0.003 7 | 0.000 2 | 0.167 4 | 0.190 4 | 0.637 3 | 0.067 5 | 0.000 2 | 0.081 5 | 0.000 1 | 0.000 2 | 0.000 5 | 0.264 6 | 0.000 4 | 0.000 1 | 0.000 5 | 0.000 3 | 0.387 6 | 0.031 7 | 0.754 4 | 0.000 2 | 0.000 4 | 0.151 4 | 0.135 4 | 0.056 6 | 0.000 2 | 0.000 1 | 0.582 5 | 0.589 7 | 0.500 2 | 0.815 3 | 1.000 1 | 0.903 1 | 0.000 5 | 0.097 4 | 0.588 5 | 0.000 3 | 0.000 5 | 0.234 5 | 0.000 5 | 0.500 5 | 0.400 1 | 0.682 6 | 0.156 5 | 0.159 6 | 0.750 1 | 0.046 5 | 0.125 6 | 0.660 4 | 0.000 2 | 0.200 4 | 0.000 7 | 0.000 2 | 0.000 4 | 0.000 3 | 0.164 5 | 0.402 5 | 0.500 4 | 0.373 5 | 0.025 5 | 0.143 7 | 0.426 4 | 0.317 4 | 0.000 1 | 0.000 1 | 0.000 5 | 0.000 4 | 0.063 5 | 0.000 4 | 0.000 1 | 0.000 6 | 0.000 6 | 0.575 5 | 0.250 4 | 0.241 4 | 0.772 5 | 0.000 4 | 0.000 1 | 0.653 6 | 0.034 5 | 0.000 5 | 0.000 5 | 0.000 2 | 0.000 5 | 1.000 1 | 0.561 6 | 0.000 4 | 0.100 4 | 0.500 1 | 0.541 5 | 0.452 5 | 0.000 5 | 0.581 5 | 0.000 4 | 0.364 3 | 0.000 1 | 0.000 5 | 0.000 2 | 0.571 4 | 0.000 2 | 0.000 4 | 0.000 1 | 0.568 6 | 0.511 6 | 0.167 4 | 0.857 4 | 0.000 4 | 0.000 3 | 0.164 4 | 0.112 5 | 0.000 6 | 0.530 7 | 1.000 1 | 0.286 5 | 0.000 2 | 0.125 5 | 0.000 3 | 0.464 7 | 0.706 5 | 0.208 5 | 0.000 4 | 0.125 4 | 0.744 4 | 0.000 3 | 0.500 4 | 0.000 2 | 0.000 3 | 0.000 2 | 0.511 5 | 0.000 1 | 0.000 1 | 0.344 4 | 0.541 5 | 0.068 5 | 0.333 3 | 0.000 5 | 1.000 1 | 0.196 6 | 0.533 4 | 0.318 5 | 0.000 6 | 0.748 5 | 0.000 1 | 0.000 3 | 0.690 3 | 1.000 1 | 0.400 5 | 0.000 1 | 0.000 4 | 0.667 3 | 0.000 1 | 0.333 5 | 0.333 3 | 0.270 5 | 0.399 5 | 0.083 6
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.