The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

Similarly to the ScanNet benchmark, our ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
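As a rough illustration of the metric described above, the following minimal sketch computes the average precision for one class at a single overlap threshold and then averages it over a range of thresholds. It is not the benchmark's evaluation script. The representation of an instance as a set of vertex ids, the function names, the greedy best-overlap matching, the all-point interpolation of the precision-recall curve, and the choice of thresholds 0.50 to 0.95 inclusive are all assumptions made for this example. What it does show is how a second prediction of an already matched ground truth instance is counted as a false positive.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an instance mask is a set of mesh vertex ids.
Mask = Set[int]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two vertex sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Tuple[float, Mask]],
                      gts: List[Mask],
                      threshold: float) -> float:
    """AP for one class at one overlap threshold (illustrative only)."""
    if not gts:
        return float("nan")
    matched = [False] * len(gts)
    hits = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        overlaps = [iou(mask, gt) for gt in gts]
        best = max(range(len(gts)), key=lambda j: overlaps[j])
        if overlaps[best] > threshold and not matched[best]:
            matched[best] = True
            hits.append(True)
        else:
            # Low overlap, or a duplicate of an already matched instance:
            # both count as false positives.
            hits.append(False)
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing in recall, then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_over_range(preds: List[Tuple[float, Mask]], gts: List[Mask]) -> float:
    """Mean AP over thresholds 0.50, 0.55, ..., 0.95 (assumed inclusive)."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy data: two ground truth instances; the second prediction duplicates the first.
gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
preds = [(0.9, {0, 1, 2, 3}), (0.8, {0, 1, 2}), (0.7, {10, 11, 12, 13})]

print(round(average_precision(preds, gts, 0.5), 3))   # 0.833
print(round(average_precision(preds, gts, 0.25), 3))  # 0.833
print(round(ap_over_range(preds, gts), 3))            # 0.833
```

Removing the duplicate prediction from `preds` raises the AP in this toy example from 0.833 to 1.0, which is the penalty the note above refers to. The class-level scores would then be averaged over categories to obtain the reported mean.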



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each score is followed by the method's rank for that column.




Method | Info | avg AP 50% | head AP 50% | common AP 50% | tail AP 50% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
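Mask3D Scannet200 | | 0.388 (1) | 0.542 (1) | 0.357 (1) | 0.237 (1) | 0.808 (2) | 0.676 (2) | 0.741 (1) | 0.832 (4) | 0.496 (1) | 0.151 (3) | 0.628 (2) | 0.021 (2) | 0.955 (1) | 0.578 (1) | 0.753 (1) | 0.612 (1) | 0.591 (1) | 0.822 (5) | 0.609 (3) | 0.926 (1) | 0.614 (3) | 0.291 (1) | 0.725 (4) | 0.163 (1) | 0.890 (2) | 0.380 (5) | 0.615 (1) | 0.517 (1) | 0.130 (3) | 0.806 (1) | 0.857 (2) | 0.024 (2) | 0.511 (1) | 0.412 (5) | 0.226 (1) | 0.597 (2) | 0.756 (1) | 1.000 (1) | 0.111 (1) | 0.792 (1) | 0.736 (1) | 0.091 (1) | 0.610 (1) | 0.527 (2) | 0.323 (4) | 1.000 (1) | 0.504 (1) | 0.063 (2) | 1.000 (1) | 0.853 (1) | 0.010 (1) | 0.974 (3) | 0.839 (1) | 0.667 (1) | 0.301 (1) | 0.883 (1) | 0.266 (1) | 0.039 (1) | 0.640 (1) | 0.311 (2) | 0.739 (2) | 0.463 (1) | 1.000 (1) | 0.000 (1) | 0.287 (2) | 0.715 (2) | 0.313 (2) | 0.600 (1) | 1.000 (1) | 0.027 (1) | 0.076 (4) | 0.502 (5) | 0.500 (1) | 0.409 (1) | 0.000 (1) | 0.194 (1) | 0.125 (2) | 0.500 (1) | 0.491 (1) | 0.748 (1) | 0.050 (4) | 0.042 (2) | 0.776 (2) | 0.352 (1) | 0.008 (1) | 0.000 (1) | 0.033 (1) | 0.254 (1) | 0.000 (1) | 0.005 (2) | 0.552 (1) | 0.008 (2) | 0.020 (2) | 0.750 (1) | 0.500 (1) | 0.409 (2) | 0.065 (3) | 0.511 (1) | 0.107 (1) | 0.178 (2) | 0.000 (1) | 1.000 (1) | 0.400 (1) | 0.016 (1) | 0.000 (1) | 0.400 (1) | 0.571 (1) | 0.000 (1) | 0.060 (2) | 0.044 (2) | 0.000 (1) | 0.514 (1) | 0.278 (1) | 1.000 (1) | 0.258 (1) | 0.017 (3) | 0.125 (5) | 0.000 (1) | 0.792 (3) | 0.399 (3) | 1.000 (1) | 0.000 (1) | 0.013 (2) | 0.265 (1) | 0.018 (2) | 0.000 (2) | 1.000 (1) | 0.335 (1) | 0.381 (1) | 0.500 (1) | 0.250 (1) | 0.004 (2) | 0.000 (1) | 0.727 (1) | 0.000 (1) | 0.497 (1) | 0.000 (1) | 0.188 (1) | 0.677 (2) | 0.000 (1) | 0.708 (2) | 0.000 (1) | 0.000 (1) | 0.945 (1) | 0.391 (1) | 0.123 (4) | 0.000 (1) | 0.028 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.099 (1) | 0.451 (1) | 0.400 (1) | 0.668 (1) | 0.573 (1) | 0.606 (1) | 0.077 (5) | 0.003 (4) | 0.004 (1) | 0.000 (1) | 0.042 (3) | 0.000 (1) | 0.000 (1) | 1.000 (1) | 1.000 (1) | 0.000 (1) | 0.042 (1) | 0.000 (2) | 0.200 (2) | 0.302 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)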
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
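TD3D Scannet200 | permissive | 0.320 (2) | 0.501 (2) | 0.264 (2) | 0.164 (2) | 0.841 (1) | 0.679 (1) | 0.716 (2) | 0.879 (2) | 0.280 (3) | 0.192 (1) | 0.634 (1) | 0.231 (1) | 0.733 (3) | 0.459 (2) | 0.565 (3) | 0.498 (5) | 0.560 (2) | 1.000 (1) | 0.686 (1) | 0.890 (2) | 0.708 (1) | 0.123 (4) | 0.820 (1) | 0.152 (2) | 0.967 (1) | 0.456 (1) | 0.458 (2) | 0.387 (2) | 0.194 (1) | 0.435 (5) | 0.906 (1) | 0.077 (1) | 0.396 (2) | 0.509 (1) | 0.217 (2) | 0.715 (1) | 0.619 (2) | 1.000 (1) | 0.099 (2) | 0.792 (1) | 0.513 (2) | 0.062 (2) | 0.506 (3) | 0.549 (1) | 0.605 (1) | 1.000 (1) | 0.123 (4) | 0.106 (1) | 1.000 (1) | 0.744 (4) | 0.000 (2) | 1.000 (1) | 0.504 (5) | 0.525 (2) | 0.185 (2) | 0.790 (4) | 0.101 (2) | 0.008 (2) | 0.587 (2) | 0.356 (1) | 0.817 (1) | 0.083 (5) | 1.000 (1) | 0.000 (1) | 0.621 (1) | 0.842 (1) | 0.415 (1) | 0.268 (4) | 0.083 (4) | 0.000 (2) | 0.098 (3) | 0.881 (1) | 0.125 (2) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.125 (4) | 0.332 (3) | 0.448 (5) | 0.202 (2) | 0.196 (1) | 0.798 (1) | 0.264 (2) | 0.000 (2) | 0.000 (1) | 0.017 (2) | 0.233 (2) | 0.000 (1) | 0.063 (1) | 0.333 (2) | 0.038 (1) | 0.111 (1) | 0.250 (3) | 0.000 (2) | 0.516 (1) | 0.208 (1) | 0.470 (2) | 0.094 (3) | 0.218 (1) | 0.000 (1) | 0.667 (2) | 0.033 (5) | 0.000 (2) | 0.000 (1) | 0.400 (1) | 0.156 (2) | 0.000 (1) | 0.267 (1) | 0.226 (1) | 0.000 (1) | 0.104 (2) | 0.159 (2) | 0.299 (5) | 0.095 (3) | 0.458 (1) | 0.500 (1) | 0.000 (1) | 1.000 (1) | 0.472 (1) | 0.792 (3) | 0.000 (1) | 0.022 (1) | 0.061 (2) | 0.250 (1) | 0.008 (1) | 0.250 (2) | 0.333 (2) | 0.143 (2) | 0.396 (2) | 0.049 (2) | 0.012 (1) | 0.000 (1) | 0.283 (4) | 0.000 (1) | 0.241 (4) | 0.000 (1) | 0.101 (2) | 0.331 (4) | 0.000 (1) | 0.629 (3) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.222 (3) | 0.677 (1) | 0.000 (1) | 0.003 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.076 (2) | 0.252 (3) | 0.400 (1) | 0.431 (2) | 0.061 (3) | 0.328 (3) | 0.331 (4) | 0.500 (1) | 0.000 (2) | 0.000 (1) | 0.167 (1) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 0.542 (1) | 0.000 (2) | 0.063 (1) | 0.000 (2) | 0.000 (1)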
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
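Minkowski 34D Inst. | permissive | 0.203 (5) | 0.369 (4) | 0.134 (5) | 0.078 (5) | 0.706 (4) | 0.382 (4) | 0.693 (3) | 0.845 (3) | 0.221 (5) | 0.150 (4) | 0.158 (4) | 0.000 (3) | 0.746 (2) | 0.369 (4) | 0.545 (4) | 0.595 (2) | 0.387 (4) | 0.997 (3) | 0.413 (5) | 0.720 (5) | 0.636 (2) | 0.165 (3) | 0.732 (3) | 0.070 (4) | 0.851 (4) | 0.402 (4) | 0.251 (4) | 0.313 (4) | 0.123 (4) | 0.583 (4) | 0.696 (3) | 0.000 (3) | 0.051 (5) | 0.500 (2) | 0.000 (3) | 0.500 (4) | 0.372 (5) | 0.667 (4) | 0.009 (4) | 0.000 (3) | 0.307 (5) | 0.003 (4) | 0.479 (4) | 0.107 (5) | 0.226 (5) | 0.903 (4) | 0.109 (5) | 0.031 (3) | 0.981 (3) | 0.726 (5) | 0.000 (2) | 0.522 (5) | 0.669 (2) | 0.282 (5) | 0.052 (5) | 0.778 (5) | 0.000 (4) | 0.000 (3) | 0.400 (3) | 0.074 (4) | 0.333 (4) | 0.218 (4) | 1.000 (1) | 0.000 (1) | 0.250 (3) | 0.406 (5) | 0.118 (5) | 0.317 (2) | 0.100 (3) | 0.000 (2) | 0.191 (1) | 0.596 (2) | 0.000 (3) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (1) | 0.178 (5) | 0.701 (2) | 0.000 (5) | 0.000 (3) | 0.522 (5) | 0.018 (5) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 0.060 (4) | 0.000 (1) | 0.000 (3) | 0.033 (5) | 0.000 (3) | 0.000 (3) | 0.000 (4) | 0.000 (2) | 0.281 (3) | 0.100 (2) | 0.000 (5) | 0.090 (4) | 0.133 (4) | 0.000 (1) | 0.422 (5) | 0.050 (4) | 0.000 (2) | 0.000 (1) | 0.200 (3) | 0.000 (5) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.000 (4) | 0.123 (4) | 0.677 (2) | 0.021 (4) | 0.000 (4) | 0.500 (1) | 0.000 (1) | 0.500 (4) | 0.442 (2) | 0.125 (5) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.000 (3) | 0.056 (4) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.200 (5) | 0.000 (1) | 0.143 (5) | 0.000 (1) | 0.000 (3) | 0.250 (5) | 0.000 (1) | 0.511 (4) | 0.000 (1) | 0.000 (1) | 0.286 (3) | 0.083 (4) | 0.396 (2) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.025 (4) | 0.300 (2) | 0.000 (3) | 0.371 (3) | 0.070 (2) | 0.000 (4) | 0.385 (3) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 0.000 (2) | 0.200 (2) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1)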
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
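CSC-Pretrain Inst. | permissive | 0.209 (4) | 0.361 (5) | 0.157 (4) | 0.085 (4) | 0.700 (5) | 0.248 (5) | 0.634 (5) | 0.776 (5) | 0.322 (2) | 0.135 (5) | 0.103 (5) | 0.000 (3) | 0.524 (5) | 0.364 (5) | 0.618 (2) | 0.592 (3) | 0.381 (5) | 0.997 (3) | 0.589 (4) | 0.747 (4) | 0.340 (5) | 0.109 (5) | 0.768 (2) | 0.059 (5) | 0.702 (5) | 0.448 (2) | 0.188 (5) | 0.149 (5) | 0.091 (5) | 0.636 (3) | 0.573 (5) | 0.000 (3) | 0.246 (3) | 0.500 (2) | 0.000 (3) | 0.450 (5) | 0.405 (3) | 0.667 (4) | 0.006 (5) | 0.000 (3) | 0.356 (4) | 0.007 (3) | 0.506 (2) | 0.420 (3) | 0.340 (3) | 0.667 (5) | 0.294 (2) | 0.004 (4) | 0.571 (4) | 0.748 (2) | 0.000 (2) | 1.000 (1) | 0.573 (4) | 0.502 (4) | 0.094 (4) | 0.807 (3) | 0.000 (4) | 0.000 (3) | 0.400 (3) | 0.000 (5) | 0.278 (5) | 0.228 (3) | 1.000 (1) | 0.000 (1) | 0.115 (5) | 0.432 (4) | 0.198 (3) | 0.050 (5) | 0.125 (2) | 0.000 (2) | 0.000 (5) | 0.573 (3) | 0.000 (3) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.125 (4) | 0.312 (4) | 0.610 (3) | 0.221 (1) | 0.000 (3) | 0.667 (4) | 0.050 (4) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 0.032 (5) | 0.000 (1) | 0.000 (3) | 0.083 (3) | 0.000 (3) | 0.000 (3) | 0.000 (4) | 0.000 (2) | 0.220 (4) | 0.000 (5) | 0.125 (3) | 0.000 (5) | 0.111 (5) | 0.000 (1) | 0.667 (2) | 0.200 (3) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.110 (3) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.000 (4) | 0.053 (5) | 0.500 (4) | 0.000 (5) | 0.000 (4) | 0.500 (1) | 0.000 (1) | 0.500 (4) | 0.333 (4) | 0.500 (4) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.000 (3) | 0.000 (5) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.600 (2) | 0.000 (1) | 0.364 (2) | 0.000 (1) | 0.000 (3) | 0.750 (1) | 0.000 (1) | 0.833 (1) | 0.000 (1) | 0.000 (1) | 0.143 (5) | 0.000 (5) | 0.396 (2) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.021 (5) | 0.221 (4) | 0.000 (3) | 0.093 (5) | 0.055 (4) | 0.451 (2) | 0.677 (2) | 0.125 (2) | 0.000 (2) | 0.000 (1) | 0.028 (4) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 0.000 (2) | 0.050 (4) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1)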
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
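LGround Inst. | permissive | 0.246 (3) | 0.413 (3) | 0.170 (3) | 0.130 (3) | 0.754 (3) | 0.541 (3) | 0.682 (4) | 0.903 (1) | 0.264 (4) | 0.164 (2) | 0.234 (3) | 0.000 (3) | 0.681 (4) | 0.452 (3) | 0.464 (5) | 0.541 (4) | 0.399 (3) | 1.000 (1) | 0.637 (2) | 0.772 (3) | 0.588 (4) | 0.190 (2) | 0.589 (5) | 0.081 (3) | 0.857 (3) | 0.426 (3) | 0.373 (3) | 0.318 (3) | 0.135 (2) | 0.690 (2) | 0.653 (4) | 0.000 (3) | 0.159 (4) | 0.500 (2) | 0.000 (3) | 0.581 (3) | 0.387 (4) | 1.000 (1) | 0.046 (3) | 0.000 (3) | 0.402 (3) | 0.003 (5) | 0.455 (5) | 0.196 (4) | 0.571 (2) | 1.000 (1) | 0.270 (3) | 0.003 (5) | 0.530 (5) | 0.748 (3) | 0.000 (2) | 0.744 (4) | 0.575 (3) | 0.511 (3) | 0.112 (3) | 0.815 (2) | 0.067 (3) | 0.000 (3) | 0.400 (3) | 0.167 (3) | 0.667 (3) | 0.241 (2) | 1.000 (1) | 0.000 (1) | 0.208 (4) | 0.660 (3) | 0.125 (4) | 0.317 (2) | 0.000 (5) | 0.000 (2) | 0.100 (2) | 0.561 (4) | 0.000 (3) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 0.500 (1) | 0.344 (2) | 0.568 (4) | 0.167 (3) | 0.000 (3) | 0.706 (3) | 0.068 (3) | 0.000 (2) | 0.000 (1) | 0.000 (3) | 0.063 (3) | 0.000 (1) | 0.000 (3) | 0.056 (4) | 0.000 (3) | 0.000 (3) | 0.500 (2) | 0.000 (2) | 0.143 (5) | 0.017 (4) | 0.125 (3) | 0.097 (2) | 0.164 (3) | 0.000 (1) | 0.582 (4) | 0.400 (1) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.083 (4) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.025 (3) | 0.156 (3) | 0.533 (3) | 0.250 (2) | 0.200 (2) | 0.500 (1) | 0.000 (1) | 1.000 (1) | 0.333 (4) | 1.000 (1) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.000 (3) | 0.333 (2) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.400 (3) | 0.000 (1) | 0.364 (2) | 0.000 (1) | 0.000 (3) | 0.500 (3) | 0.000 (1) | 0.511 (4) | 0.000 (1) | 0.000 (1) | 0.286 (3) | 0.333 (2) | 0.000 (5) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.034 (3) | 0.111 (5) | 0.000 (3) | 0.333 (4) | 0.031 (5) | 0.000 (4) | 0.750 (1) | 0.125 (2) | 0.000 (2) | 0.000 (1) | 0.151 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 0.000 (2) | 0.000 (5) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1)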
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.