The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP), for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
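The matching rule above can be illustrated with a minimal sketch. It is not the benchmark's evaluation script: the `Prediction` record, the function names, the `>=` comparison against the overlap threshold, and the all-point interpolation of the precision-recall curve are assumptions made for illustration. Overlap is taken to be intersection over union (IoU) of per-vertex masks.

```python
from typing import List, Optional, Tuple

import numpy as np

# Hypothetical record for one predicted instance of a given class:
# (confidence, index of the ground-truth instance it overlaps most or None, IoU with it).
Prediction = Tuple[float, Optional[int], float]


def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Overlap (intersection over union) of two boolean per-vertex masks."""
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)


def average_precision(preds: List[Prediction], num_gt: int, overlap: float) -> float:
    """AP of one class at one overlap threshold (assumed all-point interpolation)."""
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth
    matched = set()
    is_tp = []
    # Visit predictions from most to least confident.
    for _, gt_idx, iou in sorted(preds, key=lambda p: p[0], reverse=True):
        hit = gt_idx is not None and iou >= overlap and gt_idx not in matched
        if hit:
            matched.add(gt_idx)
        # A repeated prediction of an already matched instance is a false positive.
        is_tp.append(1.0 if hit else 0.0)
    flags = np.array(is_tp, dtype=float)
    tp = np.cumsum(flags)
    fp = np.cumsum(1.0 - flags)
    recall = np.concatenate(([0.0], tp / num_gt, [1.0]))
    precision = np.concatenate(([0.0], tp / (tp + fp), [0.0]))
    # Precision envelope: running maximum taken from right to left.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    steps = np.nonzero(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[steps + 1] - recall[steps]) * precision[steps + 1]))


# Toy class with two ground-truth instances; the second prediction duplicates the first.
preds = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
ap50 = average_precision(preds, num_gt=2, overlap=0.5)
ap75 = average_precision(preds, num_gt=2, overlap=0.75)
```

In this toy case the duplicate prediction lowers the AP at overlap 0.5 from 1.0 to about 0.833, and only the first prediction survives the 0.75 threshold, giving 0.5. Averaging such per-class values over classes, and over the overlap thresholds for the AP column, would give the reported means under the same assumptions.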



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
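DINO3D-Scannet200 | copyleft | 0.454 (1) | 0.587 (1) | 0.453 (1) | 0.296 (1) | 0.871 (1) | 0.703 (1) | 0.845 (1) | 0.891 (2) | 0.572 (1) | 0.312 (2) | 0.753 (1) | 0.001 (4) | 0.981 (1) | 0.773 (1) | 0.767 (1) | 0.771 (1) | 0.614 (1) | 0.944 (5) | 0.586 (6) | 0.937 (1) | 0.690 (2) | 0.381 (1) | 0.716 (6) | 0.409 (1) | 0.918 (2) | 0.803 (1) | 0.602 (2) | 0.777 (1) | 0.290 (1) | 0.721 (2) | 0.779 (4) | 0.096 (1) | 0.728 (1) | 0.677 (1) | 0.000 (3) | 0.944 (1) | 0.793 (1) | 1.000 (1) | 0.214 (1) | 0.708 (3) | 0.823 (1) | 0.200 (1) | 0.851 (1) | 0.499 (4) | 1.000 (1) | 0.764 (6) | 0.473 (3) | 0.248 (1) | 1.000 (1) | 0.911 (1) | 0.216 (1) | 0.667 (5) | 0.824 (3) | 0.857 (1) | 0.616 (1) | 0.842 (2) | 0.496 (2) | 0.046 (1) | 0.355 (7) | 0.494 (1) | 0.405 (5) | 0.507 (1) | 1.000 (1) | 0.042 (2) | 0.264 (3) | 0.743 (2) | 0.683 (1) | 0.675 (1) | 0.125 (2) | 0.000 (2) | 0.600 (1) | 0.816 (2) | 0.417 (3) | 0.000 (2) | 0.000 (1) | 0.764 (1) | 0.000 (4) | 0.500 (1) | 0.563 (2) | 0.720 (2) | 0.079 (5) | 0.442 (1) | 0.845 (1) | 0.835 (2) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.324 (1) | 0.000 (1) | 0.117 (1) | 0.083 (4) | 0.000 (3) | 0.419 (1) | 0.500 (2) | 1.000 (1) | 0.777 (1) | 0.378 (1) | 0.594 (1) | 0.361 (1) | 0.327 (1) | 0.000 (2) | 0.764 (2) | 0.400 (1) | 0.548 (1) | 0.000 (1) | 0.800 (1) | 0.437 (2) | 0.000 (1) | 0.346 (1) | 0.714 (1) | 0.125 (2) | 0.662 (1) | 0.475 (1) | 0.866 (2) | 0.750 (1) | 0.400 (3) | 0.500 (1) | 0.500 (1) | 1.000 (1) | 0.667 (1) | 1.000 (1) | 0.000 (1) | 0.298 (1) | 0.000 (4) | 0.250 (2) | 0.194 (1) | 0.000 (4) | 0.850 (1) | 0.000 (4) | 0.250 (3) | 0.595 (1) | 0.000 (3) | 0.063 (1) | 0.520 (4) | 0.000 (1) | 0.571 (1) | 0.000 (1) | 0.944 (1) | 0.750 (1) | 0.000 (1) | 0.974 (1) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.655 (1) | 0.000 (6) | 0.250 (1) | 0.014 (3) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.116 (1) | 0.729 (1) | 0.200 (4) | 0.545 (3) | 0.436 (2) | 0.221 (5) | 0.750 (1) | 0.177 (3) | 0.011 (1) | 0.000 (1) | 0.284 (1) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.792 (3) | 0.050 (2) | 0.000 (2) | 0.000 (2) | 0.200 (2) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)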
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
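ODIN - Ins200 | permissive | 0.381 (3) | 0.507 (3) | 0.375 (2) | 0.237 (2) | 0.653 (7) | 0.614 (4) | 0.780 (2) | 0.744 (7) | 0.566 (2) | 0.328 (1) | 0.446 (4) | 0.003 (3) | 0.853 (3) | 0.496 (3) | 0.582 (4) | 0.448 (7) | 0.434 (4) | 0.938 (6) | 0.682 (2) | 0.782 (4) | 0.494 (6) | 0.274 (3) | 0.723 (5) | 0.269 (2) | 0.694 (7) | 0.393 (6) | 0.511 (3) | 0.695 (2) | 0.227 (2) | 0.550 (6) | 0.795 (3) | 0.039 (3) | 0.602 (2) | 0.638 (2) | 0.000 (3) | 0.734 (2) | 0.585 (4) | 0.667 (5) | 0.163 (2) | 0.500 (4) | 0.769 (2) | 0.108 (2) | 0.484 (5) | 0.569 (1) | 0.688 (2) | 1.000 (1) | 0.665 (1) | 0.093 (3) | 1.000 (1) | 0.863 (2) | 0.049 (2) | 0.667 (5) | 0.887 (1) | 0.778 (2) | 0.422 (2) | 0.786 (6) | 0.550 (1) | 0.000 (4) | 0.542 (3) | 0.028 (6) | 0.667 (3) | 0.428 (3) | 1.000 (1) | 0.125 (1) | 0.208 (6) | 0.530 (5) | 0.406 (3) | 0.337 (3) | 0.000 (6) | 0.000 (2) | 0.585 (2) | 0.742 (3) | 0.500 (1) | 0.000 (2) | 0.000 (1) | 0.472 (2) | 1.000 (1) | 0.417 (5) | 0.563 (1) | 0.631 (4) | 0.275 (1) | 0.000 (4) | 0.800 (2) | 0.841 (1) | 0.000 (2) | 0.083 (1) | 0.000 (3) | 0.174 (4) | 0.000 (1) | 0.055 (3) | 0.667 (1) | 0.000 (3) | 0.000 (4) | 0.250 (4) | 1.000 (1) | 0.286 (4) | 0.058 (5) | 0.391 (4) | 0.209 (2) | 0.313 (2) | 0.167 (1) | 0.278 (7) | 0.200 (4) | 0.083 (2) | 0.000 (1) | 0.200 (4) | 0.264 (3) | 0.000 (1) | 0.250 (3) | 0.714 (1) | 0.500 (1) | 0.196 (3) | 0.333 (2) | 0.500 (5) | 0.750 (1) | 0.668 (1) | 0.500 (1) | 0.000 (2) | 0.500 (5) | 0.333 (5) | 1.000 (1) | 0.000 (1) | 0.000 (4) | 0.438 (1) | 0.500 (1) | 0.000 (3) | 1.000 (1) | 0.333 (3) | 0.226 (2) | 0.250 (3) | 0.250 (2) | 0.000 (3) | 0.000 (2) | 0.668 (2) | 0.000 (1) | 0.174 (6) | 0.000 (1) | 0.000 (4) | 0.750 (1) | 0.000 (1) | 0.667 (4) | 0.000 (1) | 0.000 (1) | 0.638 (4) | 0.333 (3) | 0.579 (2) | 0.000 (2) | 0.333 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.063 (4) | 0.385 (3) | 0.600 (1) | 0.647 (2) | 0.066 (4) | 0.264 (4) | 0.469 (4) | 0.246 (2) | 0.000 (3) | 0.000 (1) | 0.264 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 0.125 (1) | 0.000 (2) | 0.000 (2) | 0.200 (2) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)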
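Mask3D Scannet200 | | 0.388 (2) | 0.542 (2) | 0.357 (3) | 0.237 (3) | 0.808 (3) | 0.676 (3) | 0.741 (3) | 0.832 (5) | 0.496 (3) | 0.151 (5) | 0.628 (3) | 0.021 (2) | 0.955 (2) | 0.578 (2) | 0.753 (2) | 0.612 (2) | 0.591 (2) | 0.822 (7) | 0.609 (4) | 0.926 (2) | 0.614 (4) | 0.291 (2) | 0.725 (4) | 0.163 (3) | 0.890 (3) | 0.380 (7) | 0.615 (1) | 0.517 (3) | 0.130 (5) | 0.806 (1) | 0.857 (2) | 0.024 (4) | 0.511 (3) | 0.412 (7) | 0.226 (1) | 0.597 (4) | 0.756 (2) | 1.000 (1) | 0.111 (3) | 0.792 (1) | 0.736 (3) | 0.091 (3) | 0.610 (2) | 0.527 (3) | 0.323 (6) | 1.000 (1) | 0.504 (2) | 0.063 (4) | 1.000 (1) | 0.853 (3) | 0.010 (3) | 0.974 (3) | 0.839 (2) | 0.667 (3) | 0.301 (3) | 0.883 (1) | 0.266 (3) | 0.039 (2) | 0.640 (1) | 0.311 (3) | 0.739 (2) | 0.463 (2) | 1.000 (1) | 0.000 (3) | 0.287 (2) | 0.715 (3) | 0.313 (4) | 0.600 (2) | 1.000 (1) | 0.027 (1) | 0.076 (6) | 0.502 (7) | 0.500 (1) | 0.409 (1) | 0.000 (1) | 0.194 (3) | 0.125 (3) | 0.500 (1) | 0.491 (3) | 0.748 (1) | 0.050 (6) | 0.042 (3) | 0.776 (4) | 0.352 (3) | 0.008 (1) | 0.000 (2) | 0.033 (1) | 0.254 (2) | 0.000 (1) | 0.005 (4) | 0.552 (2) | 0.008 (2) | 0.020 (3) | 0.750 (1) | 0.500 (3) | 0.409 (3) | 0.065 (4) | 0.511 (2) | 0.107 (3) | 0.178 (4) | 0.000 (2) | 1.000 (1) | 0.400 (1) | 0.016 (3) | 0.000 (1) | 0.400 (2) | 0.571 (1) | 0.000 (1) | 0.060 (4) | 0.044 (4) | 0.000 (3) | 0.514 (2) | 0.278 (3) | 1.000 (1) | 0.258 (3) | 0.017 (5) | 0.125 (7) | 0.000 (2) | 0.792 (4) | 0.399 (4) | 1.000 (1) | 0.000 (1) | 0.013 (3) | 0.265 (2) | 0.018 (4) | 0.000 (3) | 1.000 (1) | 0.335 (2) | 0.381 (1) | 0.500 (1) | 0.250 (2) | 0.004 (2) | 0.000 (2) | 0.727 (1) | 0.000 (1) | 0.497 (2) | 0.000 (1) | 0.188 (2) | 0.677 (4) | 0.000 (1) | 0.708 (3) | 0.000 (1) | 0.000 (1) | 0.945 (1) | 0.391 (2) | 0.123 (5) | 0.000 (2) | 0.028 (2) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.099 (2) | 0.451 (2) | 0.400 (2) | 0.668 (1) | 0.573 (1) | 0.606 (1) | 0.077 (7) | 0.003 (6) | 0.004 (2) | 0.000 (1) | 0.042 (5) | 0.000 (1) | 0.000 (1) | 1.000 (1) | 1.000 (1) | 0.000 (3) | 0.042 (1) | 0.000 (2) | 0.200 (2) | 0.302 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)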
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
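CSC-Pretrain Inst. | permissive | 0.209 (6) | 0.361 (7) | 0.157 (6) | 0.085 (6) | 0.700 (6) | 0.248 (7) | 0.634 (7) | 0.776 (6) | 0.322 (4) | 0.135 (7) | 0.103 (7) | 0.000 (5) | 0.524 (7) | 0.364 (7) | 0.618 (3) | 0.592 (4) | 0.381 (7) | 0.997 (3) | 0.589 (5) | 0.747 (6) | 0.340 (7) | 0.109 (7) | 0.768 (2) | 0.059 (7) | 0.702 (6) | 0.448 (3) | 0.188 (7) | 0.149 (7) | 0.091 (7) | 0.636 (4) | 0.573 (7) | 0.000 (5) | 0.246 (5) | 0.500 (4) | 0.000 (3) | 0.450 (7) | 0.405 (5) | 0.667 (5) | 0.006 (7) | 0.000 (5) | 0.356 (6) | 0.007 (5) | 0.506 (3) | 0.420 (5) | 0.340 (5) | 0.667 (7) | 0.294 (4) | 0.004 (6) | 0.571 (6) | 0.748 (4) | 0.000 (4) | 1.000 (1) | 0.573 (6) | 0.502 (6) | 0.094 (6) | 0.807 (4) | 0.000 (6) | 0.000 (4) | 0.400 (4) | 0.000 (7) | 0.278 (7) | 0.228 (5) | 1.000 (1) | 0.000 (3) | 0.115 (7) | 0.432 (6) | 0.198 (5) | 0.050 (7) | 0.125 (2) | 0.000 (2) | 0.000 (7) | 0.573 (5) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.125 (6) | 0.312 (6) | 0.610 (5) | 0.221 (2) | 0.000 (4) | 0.667 (6) | 0.050 (6) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.032 (7) | 0.000 (1) | 0.000 (5) | 0.083 (4) | 0.000 (3) | 0.000 (4) | 0.000 (6) | 0.000 (4) | 0.220 (6) | 0.000 (7) | 0.125 (5) | 0.000 (7) | 0.111 (7) | 0.000 (2) | 0.667 (3) | 0.200 (4) | 0.000 (4) | 0.000 (1) | 0.000 (6) | 0.110 (5) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (6) | 0.053 (7) | 0.500 (5) | 0.000 (7) | 0.000 (6) | 0.500 (1) | 0.000 (2) | 0.500 (5) | 0.333 (5) | 0.500 (6) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (5) | 0.000 (3) | 0.000 (4) | 0.000 (7) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.600 (3) | 0.000 (1) | 0.364 (3) | 0.000 (1) | 0.000 (4) | 0.750 (1) | 0.000 (1) | 0.833 (2) | 0.000 (1) | 0.000 (1) | 0.143 (7) | 0.000 (7) | 0.396 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.021 (7) | 0.221 (6) | 0.000 (5) | 0.093 (7) | 0.055 (6) | 0.451 (2) | 0.677 (3) | 0.125 (4) | 0.000 (3) | 0.000 (1) | 0.028 (6) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.050 (6) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)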
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
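LGround Inst. | permissive | 0.246 (5) | 0.413 (5) | 0.170 (5) | 0.130 (5) | 0.754 (4) | 0.541 (5) | 0.682 (6) | 0.903 (1) | 0.264 (6) | 0.164 (4) | 0.234 (5) | 0.000 (5) | 0.681 (6) | 0.452 (5) | 0.464 (7) | 0.541 (5) | 0.399 (5) | 1.000 (1) | 0.637 (3) | 0.772 (5) | 0.588 (5) | 0.190 (4) | 0.589 (7) | 0.081 (5) | 0.857 (4) | 0.426 (4) | 0.373 (5) | 0.318 (5) | 0.135 (4) | 0.690 (3) | 0.653 (6) | 0.000 (5) | 0.159 (6) | 0.500 (4) | 0.000 (3) | 0.581 (5) | 0.387 (6) | 1.000 (1) | 0.046 (5) | 0.000 (5) | 0.402 (5) | 0.003 (7) | 0.455 (7) | 0.196 (6) | 0.571 (4) | 1.000 (1) | 0.270 (5) | 0.003 (7) | 0.530 (7) | 0.748 (5) | 0.000 (4) | 0.744 (4) | 0.575 (5) | 0.511 (5) | 0.112 (5) | 0.815 (3) | 0.067 (5) | 0.000 (4) | 0.400 (4) | 0.167 (4) | 0.667 (3) | 0.241 (4) | 1.000 (1) | 0.000 (3) | 0.208 (5) | 0.660 (4) | 0.125 (6) | 0.317 (4) | 0.000 (6) | 0.000 (2) | 0.100 (4) | 0.561 (6) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 1.000 (1) | 0.500 (1) | 0.344 (4) | 0.568 (6) | 0.167 (4) | 0.000 (4) | 0.706 (5) | 0.068 (5) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.063 (5) | 0.000 (1) | 0.000 (5) | 0.056 (6) | 0.000 (3) | 0.000 (4) | 0.500 (2) | 0.000 (4) | 0.143 (7) | 0.017 (6) | 0.125 (5) | 0.097 (4) | 0.164 (5) | 0.000 (2) | 0.582 (5) | 0.400 (1) | 0.000 (4) | 0.000 (1) | 0.000 (6) | 0.083 (6) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.025 (5) | 0.156 (5) | 0.533 (4) | 0.250 (4) | 0.200 (4) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.333 (5) | 1.000 (1) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (5) | 0.000 (3) | 0.000 (4) | 0.333 (3) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.400 (5) | 0.000 (1) | 0.364 (3) | 0.000 (1) | 0.000 (4) | 0.500 (5) | 0.000 (1) | 0.511 (6) | 0.000 (1) | 0.000 (1) | 0.286 (5) | 0.333 (3) | 0.000 (6) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.034 (5) | 0.111 (7) | 0.000 (5) | 0.333 (6) | 0.031 (7) | 0.000 (6) | 0.750 (1) | 0.125 (4) | 0.000 (3) | 0.000 (1) | 0.151 (4) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.000 (7) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)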
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
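TD3D Scannet200 | permissive | 0.320 (4) | 0.501 (4) | 0.264 (4) | 0.164 (4) | 0.841 (2) | 0.679 (2) | 0.716 (4) | 0.879 (3) | 0.280 (5) | 0.192 (3) | 0.634 (2) | 0.231 (1) | 0.733 (5) | 0.459 (4) | 0.565 (5) | 0.498 (6) | 0.560 (3) | 1.000 (1) | 0.686 (1) | 0.890 (3) | 0.708 (1) | 0.123 (6) | 0.820 (1) | 0.152 (4) | 0.967 (1) | 0.456 (2) | 0.458 (4) | 0.387 (4) | 0.194 (3) | 0.435 (7) | 0.906 (1) | 0.077 (2) | 0.396 (4) | 0.509 (3) | 0.217 (2) | 0.715 (3) | 0.619 (3) | 1.000 (1) | 0.099 (4) | 0.792 (1) | 0.513 (4) | 0.062 (4) | 0.506 (4) | 0.549 (2) | 0.605 (3) | 1.000 (1) | 0.123 (6) | 0.106 (2) | 1.000 (1) | 0.744 (6) | 0.000 (4) | 1.000 (1) | 0.504 (7) | 0.525 (4) | 0.185 (4) | 0.790 (5) | 0.101 (4) | 0.008 (3) | 0.587 (2) | 0.356 (2) | 0.817 (1) | 0.083 (7) | 1.000 (1) | 0.000 (3) | 0.621 (1) | 0.842 (1) | 0.415 (2) | 0.268 (6) | 0.083 (5) | 0.000 (2) | 0.098 (5) | 0.881 (1) | 0.125 (4) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.125 (6) | 0.332 (5) | 0.448 (7) | 0.202 (3) | 0.196 (2) | 0.798 (3) | 0.264 (4) | 0.000 (2) | 0.000 (2) | 0.017 (2) | 0.233 (3) | 0.000 (1) | 0.063 (2) | 0.333 (3) | 0.038 (1) | 0.111 (2) | 0.250 (4) | 0.000 (4) | 0.516 (2) | 0.208 (2) | 0.470 (3) | 0.094 (5) | 0.218 (3) | 0.000 (2) | 0.667 (3) | 0.033 (7) | 0.000 (4) | 0.000 (1) | 0.400 (2) | 0.156 (4) | 0.000 (1) | 0.267 (2) | 0.226 (3) | 0.000 (3) | 0.104 (4) | 0.159 (4) | 0.299 (7) | 0.095 (5) | 0.458 (2) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.472 (2) | 0.792 (5) | 0.000 (1) | 0.022 (2) | 0.061 (3) | 0.250 (2) | 0.008 (2) | 0.250 (3) | 0.333 (3) | 0.143 (3) | 0.396 (2) | 0.049 (4) | 0.012 (1) | 0.000 (2) | 0.283 (6) | 0.000 (1) | 0.241 (5) | 0.000 (1) | 0.101 (3) | 0.331 (6) | 0.000 (1) | 0.629 (5) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.222 (5) | 0.677 (1) | 0.000 (2) | 0.003 (4) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.076 (3) | 0.252 (5) | 0.400 (2) | 0.431 (4) | 0.061 (5) | 0.328 (3) | 0.331 (6) | 0.500 (1) | 0.000 (3) | 0.000 (1) | 0.167 (3) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (2) | 1.000 (1) | 0.542 (1) | 0.000 (2) | 0.063 (1) | 0.000 (4) | 0.000 (1)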
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
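Minkowski 34D Inst. | permissive | 0.203 (7) | 0.369 (6) | 0.134 (7) | 0.078 (7) | 0.706 (5) | 0.382 (6) | 0.693 (5) | 0.845 (4) | 0.221 (7) | 0.150 (6) | 0.158 (6) | 0.000 (5) | 0.746 (4) | 0.369 (6) | 0.545 (6) | 0.595 (3) | 0.387 (6) | 0.997 (3) | 0.413 (7) | 0.720 (7) | 0.636 (3) | 0.165 (5) | 0.732 (3) | 0.070 (6) | 0.851 (5) | 0.402 (5) | 0.251 (6) | 0.313 (6) | 0.123 (6) | 0.583 (5) | 0.696 (5) | 0.000 (5) | 0.051 (7) | 0.500 (4) | 0.000 (3) | 0.500 (6) | 0.372 (7) | 0.667 (5) | 0.009 (6) | 0.000 (5) | 0.307 (7) | 0.003 (6) | 0.479 (6) | 0.107 (7) | 0.226 (7) | 0.903 (5) | 0.109 (7) | 0.031 (5) | 0.981 (5) | 0.726 (7) | 0.000 (4) | 0.522 (7) | 0.669 (4) | 0.282 (7) | 0.052 (7) | 0.778 (7) | 0.000 (6) | 0.000 (4) | 0.400 (4) | 0.074 (5) | 0.333 (6) | 0.218 (6) | 1.000 (1) | 0.000 (3) | 0.250 (4) | 0.406 (7) | 0.118 (7) | 0.317 (4) | 0.100 (4) | 0.000 (2) | 0.191 (3) | 0.596 (4) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.500 (1) | 0.178 (7) | 0.701 (3) | 0.000 (7) | 0.000 (4) | 0.522 (7) | 0.018 (7) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.060 (6) | 0.000 (1) | 0.000 (5) | 0.033 (7) | 0.000 (3) | 0.000 (4) | 0.000 (6) | 0.000 (4) | 0.281 (5) | 0.100 (3) | 0.000 (7) | 0.090 (6) | 0.133 (6) | 0.000 (2) | 0.422 (6) | 0.050 (6) | 0.000 (4) | 0.000 (1) | 0.200 (4) | 0.000 (7) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (6) | 0.123 (6) | 0.677 (3) | 0.021 (6) | 0.000 (6) | 0.500 (1) | 0.000 (2) | 0.500 (5) | 0.442 (3) | 0.125 (7) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (5) | 0.000 (3) | 0.000 (4) | 0.056 (6) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.200 (7) | 0.000 (1) | 0.143 (7) | 0.000 (1) | 0.000 (4) | 0.250 (7) | 0.000 (1) | 0.511 (6) | 0.000 (1) | 0.000 (1) | 0.286 (5) | 0.083 (6) | 0.396 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.025 (6) | 0.300 (4) | 0.000 (5) | 0.371 (5) | 0.070 (3) | 0.000 (6) | 0.385 (5) | 0.000 (7) | 0.000 (3) | 0.000 (1) | 0.000 (7) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.200 (2) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)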
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
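
In the table above, the small number next to each score appears to be the method's rank within that column, with tied scores sharing the best rank (for example, three methods at 1.000 all rank 1 and the remaining four at 0.000 all rank 4). A minimal sketch of that ranking, assuming this convention as inferred from the table rather than taken from the benchmark's code (the function name `competition_rank` is hypothetical):

```python
from typing import Dict


def competition_rank(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank methods by descending score; tied scores share the best rank."""
    return {
        name: 1 + sum(other > score for other in scores.values())
        for name, score in scores.items()
    }


# "avg ap 50%" scores copied from the table above.
avg_ap50 = {
    "DINO3D-Scannet200": 0.454,
    "ODIN - Ins200": 0.381,
    "Mask3D Scannet200": 0.388,
    "CSC-Pretrain Inst.": 0.209,
    "LGround Inst.": 0.246,
    "TD3D Scannet200": 0.320,
    "Minkowski 34D Inst.": 0.203,
}
ranks = competition_rank(avg_ap50)
```

Applied to the "avg ap 50%" column, this reproduces the ranks listed in the table: 1, 3, 2, 6, 5, 4 and 7 in row order.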