The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
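The metric described above can be illustrated with a minimal Python sketch. It is not the benchmark's official evaluation script: the function names, the `(score, gt_id, iou)` tuple layout, the `>=` overlap test, the all-point interpolation of the precision-recall curve, and the inclusive reading of the range [0.5:0.95:0.05] are all assumptions made for illustration. It shows per-class AP at a single overlap threshold, with duplicate predictions of an already matched ground truth instance counted as false positives.

```python
def average_precision(predictions, num_gt, iou_threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    predictions: list of (score, gt_id, iou) tuples, where gt_id is the
    ground truth instance the prediction overlaps most and iou is that overlap.
    num_gt: number of ground truth instances of this class.
    """
    if num_gt == 0:
        return float("nan")
    matched = set()  # ground truth instances already claimed by a prediction
    tp = fp = 0
    recalls, precisions = [], []
    # Visit predictions from most to least confident.
    for score, gt_id, iou in sorted(predictions, key=lambda p: -p[0]):
        if iou >= iou_threshold and gt_id not in matched:
            matched.add(gt_id)
            tp += 1
        else:
            # Too little overlap, or a duplicate of a matched instance.
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_over_range(predictions, num_gt, start=0.5, stop=0.95, step=0.05):
    """Mean AP over thresholds start, start+step, ..., stop (assumed inclusive)."""
    n = int(round((stop - start) / step)) + 1
    thresholds = [round(start + i * step, 2) for i in range(n)]
    return sum(average_precision(predictions, num_gt, t) for t in thresholds) / n


# Two ground truth instances (ids 0 and 1); the second prediction duplicates id 0.
preds = [(0.9, 0, 0.82), (0.8, 0, 0.71), (0.7, 1, 0.63)]
print(round(average_precision(preds, 2, 0.5), 3))   # 0.833
print(round(average_precision(preds, 2, 0.25), 3))  # 0.833
print(round(ap_over_range(preds, 2), 3))            # 0.45
```

In this toy example the duplicate prediction lowers AP 50% from 1.0 to about 0.833, which is the penalty the note above refers to. The per-class values would then be averaged over categories to obtain the reported means.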



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives the AP 50% score followed by the method's rank in that column.




Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
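ODIN - Ins200 | permissive | 0.381 (4) | 0.507 (4) | 0.375 (2) | 0.237 (3) | 0.653 (8) | 0.614 (4) | 0.780 (3) | 0.744 (8) | 0.566 (2) | 0.328 (1) | 0.446 (5) | 0.003 (4) | 0.853 (4) | 0.496 (4) | 0.582 (5) | 0.448 (8) | 0.434 (5) | 0.938 (7) | 0.682 (2) | 0.782 (5) | 0.494 (7) | 0.274 (4) | 0.723 (6) | 0.269 (2) | 0.694 (8) | 0.393 (7) | 0.511 (4) | 0.695 (2) | 0.227 (2) | 0.550 (7) | 0.795 (4) | 0.039 (4) | 0.602 (3) | 0.638 (2) | 0.000 (4) | 0.734 (2) | 0.585 (5) | 0.667 (6) | 0.163 (2) | 0.500 (4) | 0.769 (3) | 0.108 (2) | 0.484 (6) | 0.569 (1) | 0.688 (3) | 1.000 (1) | 0.665 (1) | 0.093 (4) | 1.000 (1) | 0.863 (2) | 0.049 (2) | 0.667 (6) | 0.887 (1) | 0.778 (2) | 0.422 (3) | 0.786 (7) | 0.550 (1) | 0.000 (5) | 0.542 (4) | 0.028 (7) | 0.667 (4) | 0.428 (4) | 1.000 (1) | 0.125 (1) | 0.208 (7) | 0.530 (6) | 0.406 (4) | 0.337 (4) | 0.000 (6) | 0.000 (3) | 0.585 (2) | 0.742 (4) | 0.500 (2) | 0.000 (3) | 0.000 (1) | 0.472 (2) | 1.000 (1) | 0.417 (6) | 0.563 (1) | 0.631 (4) | 0.275 (1) | 0.000 (5) | 0.800 (3) | 0.841 (1) | 0.000 (2) | 0.083 (1) | 0.000 (4) | 0.174 (5) | 0.000 (1) | 0.055 (4) | 0.667 (1) | 0.000 (3) | 0.000 (5) | 0.250 (5) | 1.000 (1) | 0.286 (4) | 0.058 (6) | 0.391 (4) | 0.209 (3) | 0.313 (2) | 0.167 (1) | 0.278 (8) | 0.200 (5) | 0.083 (3) | 0.000 (2) | 0.200 (5) | 0.264 (4) | 0.000 (2) | 0.250 (4) | 0.714 (1) | 0.500 (1) | 0.196 (4) | 0.333 (3) | 0.500 (6) | 0.750 (1) | 0.668 (1) | 0.500 (1) | 0.000 (2) | 0.500 (6) | 0.333 (6) | 1.000 (1) | 0.000 (1) | 0.000 (5) | 0.438 (1) | 0.500 (1) | 0.000 (3) | 1.000 (1) | 0.333 (4) | 0.226 (2) | 0.250 (4) | 0.250 (3) | 0.000 (3) | 0.000 (3) | 0.668 (2) | 0.000 (1) | 0.174 (7) | 0.000 (1) | 0.000 (5) | 0.750 (1) | 0.000 (1) | 0.667 (5) | 0.000 (1) | 0.000 (1) | 0.638 (5) | 0.333 (3) | 0.579 (2) | 0.000 (3) | 0.333 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.063 (5) | 0.385 (4) | 0.600 (1) | 0.647 (2) | 0.066 (5) | 0.264 (5) | 0.469 (4) | 0.246 (2) | 0.000 (4) | 0.000 (1) | 0.264 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 1.000 (1) | 0.125 (1) | 0.000 (3) | 0.000 (2) | 0.200 (3) | 0.000 (3) | 0.000 (2) | 1.000 (1) | 0.000 (1)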
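TD3D Scannet200 | permissive | 0.320 (5) | 0.501 (5) | 0.264 (5) | 0.164 (5) | 0.841 (3) | 0.679 (2) | 0.716 (5) | 0.879 (3) | 0.280 (6) | 0.192 (4) | 0.634 (3) | 0.231 (2) | 0.733 (6) | 0.459 (5) | 0.565 (6) | 0.498 (7) | 0.560 (4) | 1.000 (1) | 0.686 (1) | 0.890 (4) | 0.708 (1) | 0.123 (7) | 0.820 (1) | 0.152 (5) | 0.967 (1) | 0.456 (3) | 0.458 (5) | 0.387 (5) | 0.194 (3) | 0.435 (8) | 0.906 (1) | 0.077 (3) | 0.396 (5) | 0.509 (4) | 0.217 (2) | 0.715 (3) | 0.619 (4) | 1.000 (1) | 0.099 (5) | 0.792 (1) | 0.513 (5) | 0.062 (4) | 0.506 (5) | 0.549 (2) | 0.605 (4) | 1.000 (1) | 0.123 (7) | 0.106 (3) | 1.000 (1) | 0.744 (7) | 0.000 (5) | 1.000 (1) | 0.504 (8) | 0.525 (5) | 0.185 (5) | 0.790 (6) | 0.101 (5) | 0.008 (4) | 0.587 (3) | 0.356 (2) | 0.817 (1) | 0.083 (8) | 1.000 (1) | 0.000 (3) | 0.621 (1) | 0.842 (1) | 0.415 (3) | 0.268 (7) | 0.083 (5) | 0.000 (3) | 0.098 (6) | 0.881 (1) | 0.125 (5) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.125 (7) | 0.332 (6) | 0.448 (8) | 0.202 (4) | 0.196 (2) | 0.798 (4) | 0.264 (5) | 0.000 (2) | 0.000 (3) | 0.017 (2) | 0.233 (3) | 0.000 (1) | 0.063 (3) | 0.333 (4) | 0.038 (1) | 0.111 (3) | 0.250 (5) | 0.000 (5) | 0.516 (2) | 0.208 (3) | 0.470 (3) | 0.094 (6) | 0.218 (3) | 0.000 (2) | 0.667 (4) | 0.033 (8) | 0.000 (5) | 0.000 (2) | 0.400 (2) | 0.156 (5) | 0.000 (2) | 0.267 (3) | 0.226 (4) | 0.000 (4) | 0.104 (5) | 0.159 (5) | 0.299 (8) | 0.095 (6) | 0.458 (2) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.472 (3) | 0.792 (6) | 0.000 (1) | 0.022 (3) | 0.061 (3) | 0.250 (3) | 0.008 (2) | 0.250 (3) | 0.333 (4) | 0.143 (3) | 0.396 (3) | 0.049 (5) | 0.012 (1) | 0.000 (3) | 0.283 (7) | 0.000 (1) | 0.241 (6) | 0.000 (1) | 0.101 (4) | 0.331 (7) | 0.000 (1) | 0.629 (6) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.222 (5) | 0.677 (1) | 0.000 (3) | 0.003 (5) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.076 (4) | 0.252 (6) | 0.400 (2) | 0.431 (5) | 0.061 (6) | 0.328 (3) | 0.331 (6) | 0.500 (1) | 0.000 (4) | 0.000 (1) | 0.167 (4) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (3) | 1.000 (1) | 0.542 (1) | 0.000 (3) | 0.063 (1) | 0.000 (5) | 0.000 (1)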
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
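Mask3D Scannet200 | | 0.388 (3) | 0.542 (3) | 0.357 (4) | 0.237 (4) | 0.808 (4) | 0.676 (3) | 0.741 (4) | 0.832 (5) | 0.496 (3) | 0.151 (6) | 0.628 (4) | 0.021 (3) | 0.955 (3) | 0.578 (3) | 0.753 (2) | 0.612 (3) | 0.591 (3) | 0.822 (8) | 0.609 (4) | 0.926 (3) | 0.614 (5) | 0.291 (3) | 0.725 (5) | 0.163 (4) | 0.890 (3) | 0.380 (8) | 0.615 (1) | 0.517 (4) | 0.130 (6) | 0.806 (1) | 0.857 (2) | 0.024 (5) | 0.511 (4) | 0.412 (8) | 0.226 (1) | 0.597 (4) | 0.756 (3) | 1.000 (1) | 0.111 (3) | 0.792 (1) | 0.736 (4) | 0.091 (3) | 0.610 (3) | 0.527 (3) | 0.323 (7) | 1.000 (1) | 0.504 (3) | 0.063 (5) | 1.000 (1) | 0.853 (3) | 0.010 (4) | 0.974 (3) | 0.839 (2) | 0.667 (4) | 0.301 (4) | 0.883 (1) | 0.266 (3) | 0.039 (3) | 0.640 (2) | 0.311 (3) | 0.739 (3) | 0.463 (3) | 1.000 (1) | 0.000 (3) | 0.287 (3) | 0.715 (4) | 0.313 (5) | 0.600 (2) | 1.000 (1) | 0.027 (1) | 0.076 (7) | 0.502 (8) | 0.500 (2) | 0.409 (2) | 0.000 (1) | 0.194 (4) | 0.125 (4) | 0.500 (1) | 0.491 (4) | 0.748 (1) | 0.050 (7) | 0.042 (4) | 0.776 (5) | 0.352 (4) | 0.008 (1) | 0.000 (3) | 0.033 (1) | 0.254 (2) | 0.000 (1) | 0.005 (5) | 0.552 (2) | 0.008 (2) | 0.020 (4) | 0.750 (1) | 0.500 (3) | 0.409 (3) | 0.065 (5) | 0.511 (2) | 0.107 (4) | 0.178 (5) | 0.000 (2) | 1.000 (1) | 0.400 (1) | 0.016 (4) | 0.000 (2) | 0.400 (2) | 0.571 (1) | 0.000 (2) | 0.060 (5) | 0.044 (5) | 0.000 (4) | 0.514 (3) | 0.278 (4) | 1.000 (1) | 0.258 (4) | 0.017 (6) | 0.125 (8) | 0.000 (2) | 0.792 (5) | 0.399 (5) | 1.000 (1) | 0.000 (1) | 0.013 (4) | 0.265 (2) | 0.018 (5) | 0.000 (3) | 1.000 (1) | 0.335 (3) | 0.381 (1) | 0.500 (1) | 0.250 (3) | 0.004 (2) | 0.000 (3) | 0.727 (1) | 0.000 (1) | 0.497 (3) | 0.000 (1) | 0.188 (3) | 0.677 (4) | 0.000 (1) | 0.708 (3) | 0.000 (1) | 0.000 (1) | 0.945 (1) | 0.391 (2) | 0.123 (6) | 0.000 (3) | 0.028 (3) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.099 (3) | 0.451 (2) | 0.400 (2) | 0.668 (1) | 0.573 (1) | 0.606 (1) | 0.077 (8) | 0.003 (7) | 0.004 (3) | 0.000 (1) | 0.042 (6) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 1.000 (1) | 0.000 (3) | 0.042 (2) | 0.000 (2) | 0.200 (3) | 0.302 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)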
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
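DINO3D-Scannet200 | copyleft | 0.454 (1) | 0.587 (1) | 0.453 (1) | 0.296 (1) | 0.871 (2) | 0.703 (1) | 0.845 (2) | 0.891 (2) | 0.572 (1) | 0.312 (2) | 0.753 (1) | 0.001 (5) | 0.981 (2) | 0.773 (1) | 0.767 (1) | 0.771 (2) | 0.614 (2) | 0.944 (6) | 0.586 (7) | 0.937 (2) | 0.690 (2) | 0.381 (2) | 0.716 (7) | 0.409 (1) | 0.918 (2) | 0.803 (1) | 0.602 (2) | 0.777 (1) | 0.290 (1) | 0.721 (3) | 0.779 (5) | 0.096 (1) | 0.728 (1) | 0.677 (1) | 0.000 (4) | 0.944 (1) | 0.793 (2) | 1.000 (1) | 0.214 (1) | 0.708 (3) | 0.823 (2) | 0.200 (1) | 0.851 (1) | 0.499 (4) | 1.000 (1) | 0.764 (7) | 0.473 (4) | 0.248 (2) | 1.000 (1) | 0.911 (1) | 0.216 (1) | 0.667 (6) | 0.824 (3) | 0.857 (1) | 0.616 (1) | 0.842 (3) | 0.496 (2) | 0.046 (2) | 0.355 (8) | 0.494 (1) | 0.405 (6) | 0.507 (1) | 1.000 (1) | 0.042 (2) | 0.264 (4) | 0.743 (3) | 0.683 (1) | 0.675 (1) | 0.125 (2) | 0.000 (3) | 0.600 (1) | 0.816 (2) | 0.417 (4) | 0.000 (3) | 0.000 (1) | 0.764 (1) | 0.000 (5) | 0.500 (1) | 0.563 (2) | 0.720 (2) | 0.079 (6) | 0.442 (1) | 0.845 (2) | 0.835 (2) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.324 (1) | 0.000 (1) | 0.117 (1) | 0.083 (5) | 0.000 (3) | 0.419 (1) | 0.500 (2) | 1.000 (1) | 0.777 (1) | 0.378 (1) | 0.594 (1) | 0.361 (2) | 0.327 (1) | 0.000 (2) | 0.764 (2) | 0.400 (1) | 0.548 (1) | 0.000 (2) | 0.800 (1) | 0.437 (3) | 0.000 (2) | 0.346 (2) | 0.714 (1) | 0.125 (3) | 0.662 (1) | 0.475 (2) | 0.866 (2) | 0.750 (1) | 0.400 (3) | 0.500 (1) | 0.500 (1) | 1.000 (1) | 0.667 (1) | 1.000 (1) | 0.000 (1) | 0.298 (1) | 0.000 (4) | 0.250 (3) | 0.194 (1) | 0.000 (5) | 0.850 (1) | 0.000 (4) | 0.250 (4) | 0.595 (1) | 0.000 (3) | 0.063 (1) | 0.520 (5) | 0.000 (1) | 0.571 (2) | 0.000 (1) | 0.944 (1) | 0.750 (1) | 0.000 (1) | 0.974 (1) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.655 (1) | 0.000 (7) | 0.250 (2) | 0.014 (4) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.116 (2) | 0.729 (1) | 0.200 (5) | 0.545 (3) | 0.436 (2) | 0.221 (6) | 0.750 (1) | 0.177 (4) | 0.011 (2) | 0.000 (1) | 0.284 (1) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.792 (4) | 0.050 (2) | 0.000 (3) | 0.000 (2) | 0.200 (3) | 0.000 (3) | 0.000 (2) | 1.000 (1) | 0.000 (1)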
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
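CSC-Pretrain Inst. | permissive | 0.209 (7) | 0.361 (8) | 0.157 (7) | 0.085 (7) | 0.700 (7) | 0.248 (8) | 0.634 (8) | 0.776 (7) | 0.322 (4) | 0.135 (8) | 0.103 (8) | 0.000 (6) | 0.524 (8) | 0.364 (8) | 0.618 (4) | 0.592 (5) | 0.381 (8) | 0.997 (4) | 0.589 (6) | 0.747 (7) | 0.340 (8) | 0.109 (8) | 0.768 (3) | 0.059 (8) | 0.702 (7) | 0.448 (4) | 0.188 (8) | 0.149 (8) | 0.091 (8) | 0.636 (5) | 0.573 (8) | 0.000 (6) | 0.246 (6) | 0.500 (5) | 0.000 (4) | 0.450 (8) | 0.405 (6) | 0.667 (6) | 0.006 (8) | 0.000 (6) | 0.356 (7) | 0.007 (6) | 0.506 (4) | 0.420 (5) | 0.340 (6) | 0.667 (8) | 0.294 (5) | 0.004 (7) | 0.571 (7) | 0.748 (5) | 0.000 (5) | 1.000 (1) | 0.573 (7) | 0.502 (7) | 0.094 (7) | 0.807 (5) | 0.000 (7) | 0.000 (5) | 0.400 (5) | 0.000 (8) | 0.278 (8) | 0.228 (6) | 1.000 (1) | 0.000 (3) | 0.115 (8) | 0.432 (7) | 0.198 (6) | 0.050 (8) | 0.125 (2) | 0.000 (3) | 0.000 (8) | 0.573 (6) | 0.000 (6) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.125 (7) | 0.312 (7) | 0.610 (6) | 0.221 (2) | 0.000 (5) | 0.667 (7) | 0.050 (7) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.032 (8) | 0.000 (1) | 0.000 (6) | 0.083 (5) | 0.000 (3) | 0.000 (5) | 0.000 (7) | 0.000 (5) | 0.220 (6) | 0.000 (8) | 0.125 (6) | 0.000 (8) | 0.111 (8) | 0.000 (2) | 0.667 (4) | 0.200 (5) | 0.000 (5) | 0.000 (2) | 0.000 (7) | 0.110 (6) | 0.000 (2) | 0.000 (6) | 0.000 (6) | 0.000 (4) | 0.000 (7) | 0.053 (8) | 0.500 (6) | 0.000 (8) | 0.000 (7) | 0.500 (1) | 0.000 (2) | 0.500 (6) | 0.333 (6) | 0.500 (7) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.000 (3) | 0.000 (5) | 0.000 (8) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.600 (4) | 0.000 (1) | 0.364 (4) | 0.000 (1) | 0.000 (5) | 0.750 (1) | 0.000 (1) | 0.833 (2) | 0.000 (1) | 0.000 (1) | 0.143 (8) | 0.000 (8) | 0.396 (3) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.021 (8) | 0.221 (7) | 0.000 (6) | 0.093 (8) | 0.055 (7) | 0.451 (2) | 0.677 (3) | 0.125 (5) | 0.000 (4) | 0.000 (1) | 0.028 (7) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.050 (7) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)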
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
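CompetitorFormer-200 | | 0.415 (2) | 0.574 (2) | 0.370 (3) | 0.274 (2) | 0.885 (1) | 0.584 (5) | 0.846 (1) | 0.779 (6) | 0.318 (5) | 0.205 (3) | 0.704 (2) | 0.400 (1) | 0.987 (1) | 0.651 (2) | 0.731 (3) | 0.830 (1) | 0.682 (1) | 1.000 (1) | 0.599 (5) | 0.957 (1) | 0.685 (3) | 0.428 (1) | 0.806 (2) | 0.196 (3) | 0.870 (4) | 0.641 (2) | 0.600 (3) | 0.583 (3) | 0.183 (4) | 0.780 (2) | 0.833 (3) | 0.095 (2) | 0.663 (2) | 0.538 (3) | 0.021 (3) | 0.540 (6) | 0.845 (1) | 0.903 (5) | 0.103 (4) | 0.083 (5) | 0.881 (1) | 0.054 (5) | 0.632 (2) | 0.311 (6) | 0.745 (2) | 1.000 (1) | 0.545 (2) | 0.378 (1) | 0.933 (6) | 0.832 (4) | 0.015 (3) | 0.684 (5) | 0.748 (4) | 0.700 (3) | 0.562 (2) | 0.869 (2) | 0.218 (4) | 0.064 (1) | 0.885 (1) | 0.243 (4) | 0.794 (2) | 0.484 (2) | 1.000 (1) | 0.000 (3) | 0.289 (2) | 0.758 (2) | 0.482 (2) | 0.452 (3) | 0.000 (6) | 0.015 (2) | 0.286 (3) | 0.759 (3) | 0.663 (1) | 1.000 (1) | 0.000 (1) | 0.380 (3) | 0.250 (3) | 0.500 (1) | 0.491 (3) | 0.622 (5) | 0.213 (3) | 0.131 (3) | 0.877 (1) | 0.602 (3) | 0.000 (2) | 0.005 (2) | 0.008 (3) | 0.209 (4) | 0.000 (1) | 0.089 (2) | 0.399 (3) | 0.000 (3) | 0.160 (2) | 0.500 (2) | 0.500 (3) | 0.144 (7) | 0.260 (2) | 0.347 (5) | 0.443 (1) | 0.207 (4) | 0.000 (2) | 0.724 (3) | 0.400 (1) | 0.125 (2) | 0.083 (1) | 0.317 (4) | 0.462 (2) | 0.083 (1) | 0.565 (1) | 0.587 (3) | 0.500 (1) | 0.648 (2) | 0.551 (1) | 0.750 (3) | 0.508 (3) | 0.018 (5) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.667 (1) | 1.000 (1) | 0.000 (1) | 0.142 (2) | 0.000 (4) | 0.500 (1) | 0.000 (3) | 0.125 (4) | 0.489 (2) | 0.000 (4) | 0.500 (1) | 0.269 (2) | 0.000 (3) | 0.050 (2) | 0.625 (3) | 0.000 (1) | 0.581 (1) | 0.000 (1) | 0.677 (2) | 0.467 (6) | 0.000 (1) | 0.694 (4) | 0.000 (1) | 0.000 (1) | 0.820 (4) | 0.071 (7) | 0.215 (5) | 1.000 (1) | 0.103 (2) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.132 (1) | 0.410 (3) | 0.327 (4) | 0.541 (4) | 0.232 (3) | 0.292 (4) | 0.261 (7) | 0.186 (3) | 0.157 (1) | 0.000 (1) | 0.216 (3) | 0.000 (1) | 0.056 (1) | 0.250 (2) | 1.000 (1) | 0.000 (3) | 0.082 (1) | 0.000 (2) | 0.400 (2) | 0.025 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)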
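LGround Inst. | permissive | 0.246 (6) | 0.413 (6) | 0.170 (6) | 0.130 (6) | 0.754 (5) | 0.541 (6) | 0.682 (7) | 0.903 (1) | 0.264 (7) | 0.164 (5) | 0.234 (6) | 0.000 (6) | 0.681 (7) | 0.452 (6) | 0.464 (8) | 0.541 (6) | 0.399 (6) | 1.000 (1) | 0.637 (3) | 0.772 (6) | 0.588 (6) | 0.190 (5) | 0.589 (8) | 0.081 (6) | 0.857 (5) | 0.426 (5) | 0.373 (6) | 0.318 (6) | 0.135 (5) | 0.690 (4) | 0.653 (7) | 0.000 (6) | 0.159 (7) | 0.500 (5) | 0.000 (4) | 0.581 (5) | 0.387 (7) | 1.000 (1) | 0.046 (6) | 0.000 (6) | 0.402 (6) | 0.003 (8) | 0.455 (8) | 0.196 (7) | 0.571 (5) | 1.000 (1) | 0.270 (6) | 0.003 (8) | 0.530 (8) | 0.748 (6) | 0.000 (5) | 0.744 (4) | 0.575 (6) | 0.511 (6) | 0.112 (6) | 0.815 (4) | 0.067 (6) | 0.000 (5) | 0.400 (5) | 0.167 (5) | 0.667 (4) | 0.241 (5) | 1.000 (1) | 0.000 (3) | 0.208 (6) | 0.660 (5) | 0.125 (7) | 0.317 (5) | 0.000 (6) | 0.000 (3) | 0.100 (5) | 0.561 (7) | 0.000 (6) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 1.000 (1) | 0.500 (1) | 0.344 (5) | 0.568 (7) | 0.167 (5) | 0.000 (5) | 0.706 (6) | 0.068 (6) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.063 (6) | 0.000 (1) | 0.000 (6) | 0.056 (7) | 0.000 (3) | 0.000 (5) | 0.500 (2) | 0.000 (5) | 0.143 (8) | 0.017 (7) | 0.125 (6) | 0.097 (5) | 0.164 (6) | 0.000 (2) | 0.582 (6) | 0.400 (1) | 0.000 (5) | 0.000 (2) | 0.000 (7) | 0.083 (7) | 0.000 (2) | 0.000 (6) | 0.000 (6) | 0.000 (4) | 0.025 (6) | 0.156 (6) | 0.533 (5) | 0.250 (5) | 0.200 (4) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.333 (6) | 1.000 (1) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.000 (3) | 0.000 (5) | 0.333 (4) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.400 (6) | 0.000 (1) | 0.364 (4) | 0.000 (1) | 0.000 (5) | 0.500 (5) | 0.000 (1) | 0.511 (7) | 0.000 (1) | 0.000 (1) | 0.286 (6) | 0.333 (3) | 0.000 (7) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.034 (6) | 0.111 (8) | 0.000 (6) | 0.333 (7) | 0.031 (8) | 0.000 (7) | 0.750 (1) | 0.125 (5) | 0.000 (4) | 0.000 (1) | 0.151 (5) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.000 (8) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)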
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
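Minkowski 34D Inst. | permissive | 0.203 (8) | 0.369 (7) | 0.134 (8) | 0.078 (8) | 0.706 (6) | 0.382 (7) | 0.693 (6) | 0.845 (4) | 0.221 (8) | 0.150 (7) | 0.158 (7) | 0.000 (6) | 0.746 (5) | 0.369 (7) | 0.545 (7) | 0.595 (4) | 0.387 (7) | 0.997 (4) | 0.413 (8) | 0.720 (8) | 0.636 (4) | 0.165 (6) | 0.732 (4) | 0.070 (7) | 0.851 (6) | 0.402 (6) | 0.251 (7) | 0.313 (7) | 0.123 (7) | 0.583 (6) | 0.696 (6) | 0.000 (6) | 0.051 (8) | 0.500 (5) | 0.000 (4) | 0.500 (7) | 0.372 (8) | 0.667 (6) | 0.009 (7) | 0.000 (6) | 0.307 (8) | 0.003 (7) | 0.479 (7) | 0.107 (8) | 0.226 (8) | 0.903 (6) | 0.109 (8) | 0.031 (6) | 0.981 (5) | 0.726 (8) | 0.000 (5) | 0.522 (8) | 0.669 (5) | 0.282 (8) | 0.052 (8) | 0.778 (8) | 0.000 (7) | 0.000 (5) | 0.400 (5) | 0.074 (6) | 0.333 (7) | 0.218 (7) | 1.000 (1) | 0.000 (3) | 0.250 (5) | 0.406 (8) | 0.118 (8) | 0.317 (5) | 0.100 (4) | 0.000 (3) | 0.191 (4) | 0.596 (5) | 0.000 (6) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.500 (1) | 0.178 (8) | 0.701 (3) | 0.000 (8) | 0.000 (5) | 0.522 (8) | 0.018 (8) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.060 (7) | 0.000 (1) | 0.000 (6) | 0.033 (8) | 0.000 (3) | 0.000 (5) | 0.000 (7) | 0.000 (5) | 0.281 (5) | 0.100 (4) | 0.000 (8) | 0.090 (7) | 0.133 (7) | 0.000 (2) | 0.422 (7) | 0.050 (7) | 0.000 (5) | 0.000 (2) | 0.200 (5) | 0.000 (8) | 0.000 (2) | 0.000 (6) | 0.000 (6) | 0.000 (4) | 0.000 (7) | 0.123 (7) | 0.677 (4) | 0.021 (7) | 0.000 (7) | 0.500 (1) | 0.000 (2) | 0.500 (6) | 0.442 (4) | 0.125 (8) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.000 (3) | 0.000 (5) | 0.056 (7) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.200 (8) | 0.000 (1) | 0.143 (8) | 0.000 (1) | 0.000 (5) | 0.250 (8) | 0.000 (1) | 0.511 (7) | 0.000 (1) | 0.000 (1) | 0.286 (6) | 0.083 (6) | 0.396 (3) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.025 (7) | 0.300 (5) | 0.000 (6) | 0.371 (6) | 0.070 (4) | 0.000 (7) | 0.385 (5) | 0.000 (8) | 0.000 (4) | 0.000 (1) | 0.000 (8) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.200 (3) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)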
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019