The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision of each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
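
The metrics above can be illustrated with a minimal sketch for a single class. This is not the official evaluation script: the function names `average_precision` and `summarize`, the precomputed IoU matrix input, the strict `IoU > threshold` test, and the all-point interpolation of the precision-recall curve are all assumptions made for illustration. The benchmark value for a method would then be the mean of such per-class numbers over the evaluated categories.

```python
import numpy as np


def average_precision(ious, scores, iou_threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    ious:   (num_pred, num_gt) mask IoU between each prediction and each
            ground truth instance (assumed to be precomputed).
    scores: (num_pred,) confidence of each prediction.
    """
    ious = np.asarray(ious, dtype=float)
    scores = np.asarray(scores, dtype=float)
    num_pred, num_gt = ious.shape
    if num_gt == 0:
        return float("nan")

    order = np.argsort(-scores, kind="stable")  # most confident first
    gt_taken = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(num_pred)
    for rank, p in enumerate(order):
        # Only ground truth instances that are still unmatched can be hit.
        candidates = np.flatnonzero((ious[p] > iou_threshold) & ~gt_taken)
        if candidates.size:
            best = candidates[np.argmax(ious[p, candidates])]
            gt_taken[best] = True
            tp[rank] = 1.0
        # Otherwise the prediction counts as a false positive; this covers
        # duplicates of an instance that was already matched.

    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, num_pred + 1)
    recall = cum_tp / num_gt

    # Area under the monotone precision envelope (all-point interpolation).
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    step = np.flatnonzero(mrec[1:] != mrec[:-1])
    return float(np.sum((mrec[step + 1] - mrec[step]) * mpre[step + 1]))


def summarize(ious, scores):
    """Return the three reported numbers for one class."""
    thresholds = np.linspace(0.5, 0.95, 10)  # the range [0.5:0.95:0.05]
    per_threshold = [average_precision(ious, scores, t) for t in thresholds]
    return {
        "AP": float(np.mean(per_threshold)),
        "AP 50%": average_precision(ious, scores, 0.5),
        "AP 25%": average_precision(ious, scores, 0.25),
    }
```

For example, with two ground truth instances and three predictions where the second prediction is a lower-scored duplicate of the first, `summarize([[0.83, 0.0], [0.62, 0.0], [0.0, 0.57]], [0.9, 0.8, 0.7])` gives an AP 50% of about 0.83 under these assumptions, whereas dropping the duplicate gives 1.0.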



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. In each row, every score is followed by the method's rank in that column.




Method | Info | avg AP 50% | head AP 50% | common AP 50% | tail AP 50% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
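ODIN - Ins200 | permissive | 0.381 (2) | 0.507 (2) | 0.375 (1) | 0.237 (1) | 0.653 (6) | 0.614 (3) | 0.780 (1) | 0.744 (6) | 0.566 (1) | 0.328 (1) | 0.446 (3) | 0.003 (3) | 0.853 (2) | 0.496 (2) | 0.582 (3) | 0.448 (6) | 0.434 (3) | 0.938 (5) | 0.682 (2) | 0.782 (3) | 0.494 (5) | 0.274 (2) | 0.723 (5) | 0.269 (1) | 0.694 (6) | 0.393 (5) | 0.511 (2) | 0.695 (1) | 0.227 (1) | 0.550 (5) | 0.795 (3) | 0.039 (2) | 0.602 (1) | 0.638 (1) | 0.000 (3) | 0.734 (1) | 0.585 (3) | 0.667 (4) | 0.163 (1) | 0.500 (3) | 0.769 (1) | 0.108 (1) | 0.484 (4) | 0.569 (1) | 0.688 (1) | 1.000 (1) | 0.665 (1) | 0.093 (2) | 1.000 (1) | 0.863 (1) | 0.049 (1) | 0.667 (5) | 0.887 (1) | 0.778 (1) | 0.422 (1) | 0.786 (5) | 0.550 (1) | 0.000 (3) | 0.542 (3) | 0.028 (5) | 0.667 (3) | 0.428 (2) | 1.000 (1) | 0.125 (1) | 0.208 (5) | 0.530 (4) | 0.406 (2) | 0.337 (2) | 0.000 (5) | 0.000 (2) | 0.585 (1) | 0.742 (2) | 0.500 (1) | 0.000 (2) | 0.000 (1) | 0.472 (1) | 1.000 (1) | 0.417 (4) | 0.563 (1) | 0.631 (3) | 0.275 (1) | 0.000 (3) | 0.800 (1) | 0.841 (1) | 0.000 (2) | 0.083 (1) | 0.000 (3) | 0.174 (3) | 0.000 (1) | 0.055 (2) | 0.667 (1) | 0.000 (3) | 0.000 (3) | 0.250 (3) | 1.000 (1) | 0.286 (3) | 0.058 (4) | 0.391 (3) | 0.209 (1) | 0.313 (1) | 0.167 (1) | 0.278 (6) | 0.200 (3) | 0.083 (1) | 0.000 (1) | 0.200 (3) | 0.264 (2) | 0.000 (1) | 0.250 (2) | 0.714 (1) | 0.500 (1) | 0.196 (2) | 0.333 (1) | 0.500 (4) | 0.750 (1) | 0.668 (1) | 0.500 (1) | 0.000 (1) | 0.500 (4) | 0.333 (4) | 1.000 (1) | 0.000 (1) | 0.000 (3) | 0.438 (1) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.333 (2) | 0.226 (2) | 0.250 (3) | 0.250 (1) | 0.000 (3) | 0.000 (1) | 0.668 (2) | 0.000 (1) | 0.174 (5) | 0.000 (1) | 0.000 (3) | 0.750 (1) | 0.000 (1) | 0.667 (3) | 0.000 (1) | 0.000 (1) | 0.638 (3) | 0.333 (2) | 0.579 (2) | 0.000 (1) | 0.333 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.063 (3) | 0.385 (2) | 0.600 (1) | 0.647 (2) | 0.066 (3) | 0.264 (4) | 0.469 (3) | 0.246 (2) | 0.000 (2) | 0.000 (1) | 0.264 (1) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 0.125 (1) | 0.000 (2) | 0.000 (2) | 0.200 (2) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)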
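Mask3D Scannet200 | | 0.388 (1) | 0.542 (1) | 0.357 (2) | 0.237 (2) | 0.808 (2) | 0.676 (2) | 0.741 (2) | 0.832 (4) | 0.496 (2) | 0.151 (4) | 0.628 (2) | 0.021 (2) | 0.955 (1) | 0.578 (1) | 0.753 (1) | 0.612 (1) | 0.591 (1) | 0.822 (6) | 0.609 (4) | 0.926 (1) | 0.614 (3) | 0.291 (1) | 0.725 (4) | 0.163 (2) | 0.890 (2) | 0.380 (6) | 0.615 (1) | 0.517 (2) | 0.130 (4) | 0.806 (1) | 0.857 (2) | 0.024 (3) | 0.511 (2) | 0.412 (6) | 0.226 (1) | 0.597 (3) | 0.756 (1) | 1.000 (1) | 0.111 (2) | 0.792 (1) | 0.736 (2) | 0.091 (2) | 0.610 (1) | 0.527 (3) | 0.323 (5) | 1.000 (1) | 0.504 (2) | 0.063 (3) | 1.000 (1) | 0.853 (2) | 0.010 (2) | 0.974 (3) | 0.839 (2) | 0.667 (2) | 0.301 (2) | 0.883 (1) | 0.266 (2) | 0.039 (1) | 0.640 (1) | 0.311 (2) | 0.739 (2) | 0.463 (1) | 1.000 (1) | 0.000 (2) | 0.287 (2) | 0.715 (2) | 0.313 (3) | 0.600 (1) | 1.000 (1) | 0.027 (1) | 0.076 (5) | 0.502 (6) | 0.500 (1) | 0.409 (1) | 0.000 (1) | 0.194 (2) | 0.125 (3) | 0.500 (1) | 0.491 (2) | 0.748 (1) | 0.050 (5) | 0.042 (2) | 0.776 (3) | 0.352 (2) | 0.008 (1) | 0.000 (2) | 0.033 (1) | 0.254 (1) | 0.000 (1) | 0.005 (3) | 0.552 (2) | 0.008 (2) | 0.020 (2) | 0.750 (1) | 0.500 (2) | 0.409 (2) | 0.065 (3) | 0.511 (1) | 0.107 (2) | 0.178 (3) | 0.000 (2) | 1.000 (1) | 0.400 (1) | 0.016 (2) | 0.000 (1) | 0.400 (1) | 0.571 (1) | 0.000 (1) | 0.060 (3) | 0.044 (3) | 0.000 (2) | 0.514 (1) | 0.278 (2) | 1.000 (1) | 0.258 (2) | 0.017 (4) | 0.125 (6) | 0.000 (1) | 0.792 (3) | 0.399 (3) | 1.000 (1) | 0.000 (1) | 0.013 (2) | 0.265 (2) | 0.018 (3) | 0.000 (2) | 1.000 (1) | 0.335 (1) | 0.381 (1) | 0.500 (1) | 0.250 (1) | 0.004 (2) | 0.000 (1) | 0.727 (1) | 0.000 (1) | 0.497 (1) | 0.000 (1) | 0.188 (1) | 0.677 (3) | 0.000 (1) | 0.708 (2) | 0.000 (1) | 0.000 (1) | 0.945 (1) | 0.391 (1) | 0.123 (5) | 0.000 (1) | 0.028 (2) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.099 (1) | 0.451 (1) | 0.400 (2) | 0.668 (1) | 0.573 (1) | 0.606 (1) | 0.077 (6) | 0.003 (5) | 0.004 (1) | 0.000 (1) | 0.042 (4) | 0.000 (1) | 0.000 (1) | 1.000 (1) | 1.000 (1) | 0.000 (2) | 0.042 (1) | 0.000 (2) | 0.200 (2) | 0.302 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)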
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
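TD3D Scannet200 | permissive | 0.320 (3) | 0.501 (3) | 0.264 (3) | 0.164 (3) | 0.841 (1) | 0.679 (1) | 0.716 (3) | 0.879 (2) | 0.280 (4) | 0.192 (2) | 0.634 (1) | 0.231 (1) | 0.733 (4) | 0.459 (3) | 0.565 (4) | 0.498 (5) | 0.560 (2) | 1.000 (1) | 0.686 (1) | 0.890 (2) | 0.708 (1) | 0.123 (5) | 0.820 (1) | 0.152 (3) | 0.967 (1) | 0.456 (1) | 0.458 (3) | 0.387 (3) | 0.194 (2) | 0.435 (6) | 0.906 (1) | 0.077 (1) | 0.396 (3) | 0.509 (2) | 0.217 (2) | 0.715 (2) | 0.619 (2) | 1.000 (1) | 0.099 (3) | 0.792 (1) | 0.513 (3) | 0.062 (3) | 0.506 (3) | 0.549 (2) | 0.605 (2) | 1.000 (1) | 0.123 (5) | 0.106 (1) | 1.000 (1) | 0.744 (5) | 0.000 (3) | 1.000 (1) | 0.504 (6) | 0.525 (3) | 0.185 (3) | 0.790 (4) | 0.101 (3) | 0.008 (2) | 0.587 (2) | 0.356 (1) | 0.817 (1) | 0.083 (6) | 1.000 (1) | 0.000 (2) | 0.621 (1) | 0.842 (1) | 0.415 (1) | 0.268 (5) | 0.083 (4) | 0.000 (2) | 0.098 (4) | 0.881 (1) | 0.125 (3) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 0.000 (4) | 0.125 (5) | 0.332 (4) | 0.448 (6) | 0.202 (3) | 0.196 (1) | 0.798 (2) | 0.264 (3) | 0.000 (2) | 0.000 (2) | 0.017 (2) | 0.233 (2) | 0.000 (1) | 0.063 (1) | 0.333 (3) | 0.038 (1) | 0.111 (1) | 0.250 (3) | 0.000 (3) | 0.516 (1) | 0.208 (1) | 0.470 (2) | 0.094 (4) | 0.218 (2) | 0.000 (2) | 0.667 (2) | 0.033 (6) | 0.000 (3) | 0.000 (1) | 0.400 (1) | 0.156 (3) | 0.000 (1) | 0.267 (1) | 0.226 (2) | 0.000 (2) | 0.104 (3) | 0.159 (3) | 0.299 (6) | 0.095 (4) | 0.458 (2) | 0.500 (1) | 0.000 (1) | 1.000 (1) | 0.472 (1) | 0.792 (4) | 0.000 (1) | 0.022 (1) | 0.061 (3) | 0.250 (2) | 0.008 (1) | 0.250 (3) | 0.333 (2) | 0.143 (3) | 0.396 (2) | 0.049 (3) | 0.012 (1) | 0.000 (1) | 0.283 (5) | 0.000 (1) | 0.241 (4) | 0.000 (1) | 0.101 (2) | 0.331 (5) | 0.000 (1) | 0.629 (4) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.222 (4) | 0.677 (1) | 0.000 (1) | 0.003 (3) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.076 (2) | 0.252 (4) | 0.400 (2) | 0.431 (3) | 0.061 (4) | 0.328 (3) | 0.331 (5) | 0.500 (1) | 0.000 (2) | 0.000 (1) | 0.167 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.542 (1) | 0.000 (2) | 0.063 (1) | 0.000 (3) | 0.000 (1)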
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
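Minkowski 34D Inst. | permissive | 0.203 (6) | 0.369 (5) | 0.134 (6) | 0.078 (6) | 0.706 (4) | 0.382 (5) | 0.693 (4) | 0.845 (3) | 0.221 (6) | 0.150 (5) | 0.158 (5) | 0.000 (4) | 0.746 (3) | 0.369 (5) | 0.545 (5) | 0.595 (2) | 0.387 (5) | 0.997 (3) | 0.413 (6) | 0.720 (6) | 0.636 (2) | 0.165 (4) | 0.732 (3) | 0.070 (5) | 0.851 (4) | 0.402 (4) | 0.251 (5) | 0.313 (5) | 0.123 (5) | 0.583 (4) | 0.696 (4) | 0.000 (4) | 0.051 (6) | 0.500 (3) | 0.000 (3) | 0.500 (5) | 0.372 (6) | 0.667 (4) | 0.009 (5) | 0.000 (4) | 0.307 (6) | 0.003 (5) | 0.479 (5) | 0.107 (6) | 0.226 (6) | 0.903 (5) | 0.109 (6) | 0.031 (4) | 0.981 (4) | 0.726 (6) | 0.000 (3) | 0.522 (6) | 0.669 (3) | 0.282 (6) | 0.052 (6) | 0.778 (6) | 0.000 (5) | 0.000 (3) | 0.400 (4) | 0.074 (4) | 0.333 (5) | 0.218 (5) | 1.000 (1) | 0.000 (2) | 0.250 (3) | 0.406 (6) | 0.118 (6) | 0.317 (3) | 0.100 (3) | 0.000 (2) | 0.191 (2) | 0.596 (3) | 0.000 (4) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 0.000 (4) | 0.500 (1) | 0.178 (6) | 0.701 (2) | 0.000 (6) | 0.000 (3) | 0.522 (6) | 0.018 (6) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.060 (5) | 0.000 (1) | 0.000 (4) | 0.033 (6) | 0.000 (3) | 0.000 (3) | 0.000 (5) | 0.000 (3) | 0.281 (4) | 0.100 (2) | 0.000 (6) | 0.090 (5) | 0.133 (5) | 0.000 (2) | 0.422 (5) | 0.050 (5) | 0.000 (3) | 0.000 (1) | 0.200 (3) | 0.000 (6) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (5) | 0.123 (5) | 0.677 (2) | 0.021 (5) | 0.000 (5) | 0.500 (1) | 0.000 (1) | 0.500 (4) | 0.442 (2) | 0.125 (6) | 0.000 (1) | 0.000 (3) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (4) | 0.056 (5) | 0.000 (4) | 0.000 (4) | 0.000 (4) | 0.000 (3) | 0.000 (1) | 0.200 (6) | 0.000 (1) | 0.143 (6) | 0.000 (1) | 0.000 (3) | 0.250 (6) | 0.000 (1) | 0.511 (5) | 0.000 (1) | 0.000 (1) | 0.286 (4) | 0.083 (5) | 0.396 (3) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.025 (5) | 0.300 (3) | 0.000 (4) | 0.371 (4) | 0.070 (2) | 0.000 (5) | 0.385 (4) | 0.000 (6) | 0.000 (2) | 0.000 (1) | 0.000 (6) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.200 (2) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.000 (1)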
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
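CSC-Pretrain Inst. | permissive | 0.209 (5) | 0.361 (6) | 0.157 (5) | 0.085 (5) | 0.700 (5) | 0.248 (6) | 0.634 (6) | 0.776 (5) | 0.322 (3) | 0.135 (6) | 0.103 (6) | 0.000 (4) | 0.524 (6) | 0.364 (6) | 0.618 (2) | 0.592 (3) | 0.381 (6) | 0.997 (3) | 0.589 (5) | 0.747 (5) | 0.340 (6) | 0.109 (6) | 0.768 (2) | 0.059 (6) | 0.702 (5) | 0.448 (2) | 0.188 (6) | 0.149 (6) | 0.091 (6) | 0.636 (3) | 0.573 (6) | 0.000 (4) | 0.246 (4) | 0.500 (3) | 0.000 (3) | 0.450 (6) | 0.405 (4) | 0.667 (4) | 0.006 (6) | 0.000 (4) | 0.356 (5) | 0.007 (4) | 0.506 (2) | 0.420 (4) | 0.340 (4) | 0.667 (6) | 0.294 (3) | 0.004 (5) | 0.571 (5) | 0.748 (3) | 0.000 (3) | 1.000 (1) | 0.573 (5) | 0.502 (5) | 0.094 (5) | 0.807 (3) | 0.000 (5) | 0.000 (3) | 0.400 (4) | 0.000 (6) | 0.278 (6) | 0.228 (4) | 1.000 (1) | 0.000 (2) | 0.115 (6) | 0.432 (5) | 0.198 (4) | 0.050 (6) | 0.125 (2) | 0.000 (2) | 0.000 (6) | 0.573 (4) | 0.000 (4) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 0.000 (4) | 0.125 (5) | 0.312 (5) | 0.610 (4) | 0.221 (2) | 0.000 (3) | 0.667 (5) | 0.050 (5) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.032 (6) | 0.000 (1) | 0.000 (4) | 0.083 (4) | 0.000 (3) | 0.000 (3) | 0.000 (5) | 0.000 (3) | 0.220 (5) | 0.000 (6) | 0.125 (4) | 0.000 (6) | 0.111 (6) | 0.000 (2) | 0.667 (2) | 0.200 (3) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.110 (4) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (5) | 0.053 (6) | 0.500 (4) | 0.000 (6) | 0.000 (5) | 0.500 (1) | 0.000 (1) | 0.500 (4) | 0.333 (4) | 0.500 (5) | 0.000 (1) | 0.000 (3) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (4) | 0.000 (6) | 0.000 (4) | 0.000 (4) | 0.000 (4) | 0.000 (3) | 0.000 (1) | 0.600 (3) | 0.000 (1) | 0.364 (2) | 0.000 (1) | 0.000 (3) | 0.750 (1) | 0.000 (1) | 0.833 (1) | 0.000 (1) | 0.000 (1) | 0.143 (6) | 0.000 (6) | 0.396 (3) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.021 (6) | 0.221 (5) | 0.000 (4) | 0.093 (6) | 0.055 (5) | 0.451 (2) | 0.677 (2) | 0.125 (3) | 0.000 (2) | 0.000 (1) | 0.028 (5) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.050 (5) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.000 (1)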
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
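LGround Inst. | permissive | 0.246 (4) | 0.413 (4) | 0.170 (4) | 0.130 (4) | 0.754 (3) | 0.541 (4) | 0.682 (5) | 0.903 (1) | 0.264 (5) | 0.164 (3) | 0.234 (4) | 0.000 (4) | 0.681 (5) | 0.452 (4) | 0.464 (6) | 0.541 (4) | 0.399 (4) | 1.000 (1) | 0.637 (3) | 0.772 (4) | 0.588 (4) | 0.190 (3) | 0.589 (6) | 0.081 (4) | 0.857 (3) | 0.426 (3) | 0.373 (4) | 0.318 (4) | 0.135 (3) | 0.690 (2) | 0.653 (5) | 0.000 (4) | 0.159 (5) | 0.500 (3) | 0.000 (3) | 0.581 (4) | 0.387 (5) | 1.000 (1) | 0.046 (4) | 0.000 (4) | 0.402 (4) | 0.003 (6) | 0.455 (6) | 0.196 (5) | 0.571 (3) | 1.000 (1) | 0.270 (4) | 0.003 (6) | 0.530 (6) | 0.748 (4) | 0.000 (3) | 0.744 (4) | 0.575 (4) | 0.511 (4) | 0.112 (4) | 0.815 (2) | 0.067 (4) | 0.000 (3) | 0.400 (4) | 0.167 (3) | 0.667 (3) | 0.241 (3) | 1.000 (1) | 0.000 (2) | 0.208 (4) | 0.660 (3) | 0.125 (5) | 0.317 (3) | 0.000 (5) | 0.000 (2) | 0.100 (3) | 0.561 (5) | 0.000 (4) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 1.000 (1) | 0.500 (1) | 0.344 (3) | 0.568 (5) | 0.167 (4) | 0.000 (3) | 0.706 (4) | 0.068 (4) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.063 (4) | 0.000 (1) | 0.000 (4) | 0.056 (5) | 0.000 (3) | 0.000 (3) | 0.500 (2) | 0.000 (3) | 0.143 (6) | 0.017 (5) | 0.125 (4) | 0.097 (3) | 0.164 (4) | 0.000 (2) | 0.582 (4) | 0.400 (1) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.083 (5) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.025 (4) | 0.156 (4) | 0.533 (3) | 0.250 (3) | 0.200 (3) | 0.500 (1) | 0.000 (1) | 1.000 (1) | 0.333 (4) | 1.000 (1) | 0.000 (1) | 0.000 (3) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (4) | 0.333 (2) | 0.000 (4) | 0.000 (4) | 0.000 (4) | 0.000 (3) | 0.000 (1) | 0.400 (4) | 0.000 (1) | 0.364 (2) | 0.000 (1) | 0.000 (3) | 0.500 (4) | 0.000 (1) | 0.511 (5) | 0.000 (1) | 0.000 (1) | 0.286 (4) | 0.333 (2) | 0.000 (6) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.034 (4) | 0.111 (6) | 0.000 (4) | 0.333 (5) | 0.031 (6) | 0.000 (5) | 0.750 (1) | 0.125 (3) | 0.000 (2) | 0.000 (1) | 0.151 (3) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (6) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.000 (1)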
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.