The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by their average precision (AP), computed for each class. We report the mean AP at an overlap (intersection over union) of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps from 0.5 to 0.95 in steps of 0.05, written [0.5:0.95:0.05] (AP), for all 200 categories. Note that if several predictions match the same ground-truth instance, only one counts as a true positive and the others are penalized as false positives.
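The matching rule and the threshold range above can be made concrete with a small sketch. It assumes each prediction for one class is given as a confidence plus its overlap (IoU) with every ground-truth instance of that class, uses a simplified greedy matcher rather than the official ScanNet evaluation script, and computes all-point interpolated AP. The function names, the input layout, and the use of `>=` at the threshold are illustrative assumptions, not the benchmark's code.

```python
from typing import List, Sequence, Tuple

# One prediction of a single class: (confidence, IoU with each ground-truth instance of that class).
Prediction = Tuple[float, Sequence[float]]

# Overlap thresholds 0.50, 0.55, ..., 0.95, i.e. the range [0.5:0.95:0.05].
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def match_predictions(preds: List[Prediction], num_gt: int, iou_thresh: float) -> List[bool]:
    """Greedily match predictions to ground truth; returns TP/FP flags in descending-confidence order."""
    matched = [False] * num_gt
    flags = []
    for _, ious in sorted(preds, key=lambda p: p[0], reverse=True):
        best_gt, best_iou = -1, iou_thresh
        for gt_idx, iou in enumerate(ious):
            # Only unmatched instances with IoU >= threshold are eligible (">=" is an assumption),
            # so a second prediction of an already-matched instance becomes a false positive.
            if not matched[gt_idx] and iou >= best_iou:
                best_gt, best_iou = gt_idx, iou
        if best_gt >= 0:
            matched[best_gt] = True
        flags.append(best_gt >= 0)
    return flags


def average_precision(flags: List[bool], num_gt: int) -> float:
    """All-point interpolated AP from TP/FP flags sorted by descending confidence."""
    if num_gt == 0:
        raise ValueError("AP is undefined for a class without ground-truth instances")
    tp = fp = 0
    recalls, precisions = [], []
    for is_tp in flags:
        tp += is_tp
        fp += not is_tp
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (the precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment; false positives add zero-width steps.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_over_thresholds(preds: List[Prediction], num_gt: int) -> float:
    """Mean of the per-threshold AP values over THRESHOLDS (the AP metric described above)."""
    aps = [average_precision(match_predictions(preds, num_gt, t), num_gt) for t in THRESHOLDS]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Two ground-truth instances; the second prediction duplicates the first instance.
    preds = [(0.9, [0.8, 0.0]), (0.8, [0.7, 0.1]), (0.6, [0.0, 0.6])]
    print(match_predictions(preds, 2, 0.5))  # [True, False, True]
    print(round(average_precision(match_predictions(preds, 2, 0.5), 2), 3))  # 0.833
    print(round(ap_over_thresholds(preds, 2), 3))  # 0.45
```

In this toy example the duplicate prediction lowers the AP at overlap 0.5 from 1.0 (which the two non-duplicate predictions alone would reach) to about 0.833. The benchmark figures are then means of such per-class values across categories.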



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Columns: Method, Info, avg AP 50%, head AP 50%, common AP 50%, and tail AP 50%, followed by one column per class, in this order: chair, table, door, couch, cabinet, shelf, desk, office chair, bed, pillow, sink, picture, window, toilet, bookshelf, monitor, curtain, book, armchair, coffee table, box, refrigerator, lamp, kitchen cabinet, towel, clothes, tv, nightstand, counter, dresser, stool, cushion, plant, ceiling, bathtub, end table, dining table, keyboard, bag, backpack, toilet paper, printer, tv stand, whiteboard, blanket, shower curtain, trash can, closet, stairs, microwave, stove, shoe, computer tower, bottle, bin, ottoman, bench, board, washing machine, mirror, copier, basket, sofa chair, file cabinet, fan, laptop, shower, paper, person, paper towel dispenser, oven, blinds, rack, plate, blackboard, piano, suitcase, rail, radiator, recycling bin, container, wardrobe, soap dispenser, telephone, bucket, clock, stand, light, laundry basket, pipe, clothes dryer, guitar, toilet paper holder, seat, speaker, column, bicycle, ladder, bathroom stall, shower wall, cup, jacket, storage bin, coffee maker, dishwasher, paper towel roll, machine, mat, windowsill, bar, toaster, bulletin board, ironing board, fireplace, soap dish, kitchen counter, doorframe, toilet paper dispenser, mini fridge, fire extinguisher, ball, hat, shower curtain rod, water cooler, paper cutter, tray, shower door, pillar, ledge, toaster oven, mouse, toilet seat cover dispenser, furniture, cart, storage container, scale, tissue box, light switch, crate, power outlet, decoration, sign, projector, closet door, vacuum cleaner, candle, plunger, stuffed animal, headphones, dish rack, broom, guitar case, range hood, dustpan, hair dryer, water bottle, handicap bar, purse, vent, shower floor, water pitcher, mailbox, bowl, paper bag, alarm clock, music stand, projector screen, divider, laundry detergent, bathroom counter, object, bathroom vanity, closet wall, laundry hamper, bathroom stall door, ceiling light, trash bin, dumbbell, stair rail, tube, bathroom cabinet, cd case, closet rod, coffee kettle, structure, shower head, keyboard piano, case of water bottles, coat rack, storage organizer, folded chair, fire alarm, power strip, calendar, poster, potted plant, luggage, mattress.
Mask3D Scannet200: avg AP 50% 0.388 (rank 1), head AP 50% 0.542 (rank 1), common AP 50% 0.357 (rank 1), tail AP 50% 0.237 (rank 1). Per-class values in column order: 0.808, 0.676, 0.741, 0.832, 0.496, 0.151, 0.628, 0.021, 0.955, 0.578, 0.753, 0.612, 0.591, 0.822, 0.609, 0.926, 0.614, 0.291, 0.725, 0.163, 0.890, 0.380, 0.615, 0.517, 0.130, 0.806, 0.857, 0.024, 0.511, 0.412, 0.226, 0.597, 0.756, 1.000, 0.111, 0.792, 0.736, 0.091, 0.610, 0.527, 0.323, 1.000, 0.504, 0.063, 1.000, 0.853, 0.010, 0.974, 0.839, 0.667, 0.301, 0.883, 0.266, 0.039, 0.640, 0.311, 0.739, 0.463, 1.000, 0.000, 0.287, 0.715, 0.313, 0.600, 1.000, 0.027, 0.076, 0.502, 0.500, 0.409, 0.000, 0.194, 0.125, 0.500, 0.491, 0.748, 0.050, 0.042, 0.776, 0.352, 0.008, 0.000, 0.033, 0.254, 0.000, 0.005, 0.552, 0.008, 0.020, 0.750, 0.500, 0.409, 0.065, 0.511, 0.107, 0.178, 0.000, 1.000, 0.400, 0.016, 0.000, 0.400, 0.571, 0.000, 0.060, 0.044, 0.000, 0.514, 0.278, 1.000, 0.258, 0.017, 0.125, 0.000, 0.792, 0.399, 1.000, 0.000, 0.013, 0.265, 0.018, 0.000, 1.000, 0.335, 0.381, 0.500, 0.250, 0.004, 0.000, 0.727, 0.000, 0.497, 0.000, 0.188, 0.677, 0.000, 0.708, 0.000, 0.000, 0.945, 0.391, 0.123, 0.000, 0.028, 0.000, 1.000, 0.000, 0.099, 0.451, 0.400, 0.668, 0.573, 0.606, 0.077, 0.003, 0.004, 0.000, 0.042, 0.000, 0.000, 1.000, 1.000, 0.000, 0.042, 0.000, 0.200, 0.302, 0.000, 1.000, 0.000.
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation.
LGround Inst. (Info: permissive): avg AP 50% 0.246 (rank 2), head AP 50% 0.413 (rank 2), common AP 50% 0.170 (rank 2), tail AP 50% 0.130 (rank 2). Per-class values in column order: 0.754, 0.541, 0.682, 0.903, 0.264, 0.164, 0.234, 0.000, 0.681, 0.452, 0.464, 0.541, 0.399, 1.000, 0.637, 0.772, 0.588, 0.190, 0.589, 0.081, 0.857, 0.426, 0.373, 0.318, 0.135, 0.690, 0.653, 0.000, 0.159, 0.500, 0.000, 0.581, 0.387, 1.000, 0.046, 0.000, 0.402, 0.003, 0.455, 0.196, 0.571, 1.000, 0.270, 0.003, 0.530, 0.748, 0.000, 0.744, 0.575, 0.511, 0.112, 0.815, 0.067, 0.000, 0.400, 0.167, 0.667, 0.241, 1.000, 0.000, 0.208, 0.660, 0.125, 0.317, 0.000, 0.000, 0.100, 0.561, 0.000, 0.000, 0.000, 0.000, 1.000, 0.500, 0.344, 0.568, 0.167, 0.000, 0.706, 0.068, 0.000, 0.000, 0.000, 0.063, 0.000, 0.000, 0.056, 0.000, 0.000, 0.500, 0.000, 0.143, 0.017, 0.125, 0.097, 0.164, 0.000, 0.582, 0.400, 0.000, 0.000, 0.000, 0.083, 0.000, 0.000, 0.000, 0.000, 0.025, 0.156, 0.533, 0.250, 0.200, 0.500, 0.000, 1.000, 0.333, 1.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.333, 0.000, 0.000, 0.000, 0.000, 0.000, 0.400, 0.000, 0.364, 0.000, 0.000, 0.500, 0.000, 0.511, 0.000, 0.000, 0.286, 0.333, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.034, 0.111, 0.000, 0.333, 0.031, 0.000, 0.750, 0.125, 0.000, 0.000, 0.151, 0.000, 0.000, 0.000, 0.500, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000.
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
Minkowski 34D Inst. (Info: permissive): avg AP 50% 0.203 (rank 4), head AP 50% 0.369 (rank 3), common AP 50% 0.134 (rank 4), tail AP 50% 0.078 (rank 4). Per-class values in column order: 0.706, 0.382, 0.693, 0.845, 0.221, 0.150, 0.158, 0.000, 0.746, 0.369, 0.545, 0.595, 0.387, 0.997, 0.413, 0.720, 0.636, 0.165, 0.732, 0.070, 0.851, 0.402, 0.251, 0.313, 0.123, 0.583, 0.696, 0.000, 0.051, 0.500, 0.000, 0.500, 0.372, 0.667, 0.009, 0.000, 0.307, 0.003, 0.479, 0.107, 0.226, 0.903, 0.109, 0.031, 0.981, 0.726, 0.000, 0.522, 0.669, 0.282, 0.052, 0.778, 0.000, 0.000, 0.400, 0.074, 0.333, 0.218, 1.000, 0.000, 0.250, 0.406, 0.118, 0.317, 0.100, 0.000, 0.191, 0.596, 0.000, 0.000, 0.000, 0.000, 0.000, 0.500, 0.178, 0.701, 0.000, 0.000, 0.522, 0.018, 0.000, 0.000, 0.000, 0.060, 0.000, 0.000, 0.033, 0.000, 0.000, 0.000, 0.000, 0.281, 0.100, 0.000, 0.090, 0.133, 0.000, 0.422, 0.050, 0.000, 0.000, 0.200, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.123, 0.677, 0.021, 0.000, 0.500, 0.000, 0.500, 0.442, 0.125, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.056, 0.000, 0.000, 0.000, 0.000, 0.000, 0.200, 0.000, 0.143, 0.000, 0.000, 0.250, 0.000, 0.511, 0.000, 0.000, 0.286, 0.083, 0.396, 0.000, 0.000, 0.000, 0.000, 0.000, 0.025, 0.300, 0.000, 0.371, 0.070, 0.000, 0.385, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.500, 0.000, 0.000, 0.000, 0.200, 0.000, 0.000, 0.000, 0.000.
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. (Info: permissive): avg AP 50% 0.209 (rank 3), head AP 50% 0.361 (rank 4), common AP 50% 0.157 (rank 3), tail AP 50% 0.085 (rank 3). Per-class values in column order: 0.700, 0.248, 0.634, 0.776, 0.322, 0.135, 0.103, 0.000, 0.524, 0.364, 0.618, 0.592, 0.381, 0.997, 0.589, 0.747, 0.340, 0.109, 0.768, 0.059, 0.702, 0.448, 0.188, 0.149, 0.091, 0.636, 0.573, 0.000, 0.246, 0.500, 0.000, 0.450, 0.405, 0.667, 0.006, 0.000, 0.356, 0.007, 0.506, 0.420, 0.340, 0.667, 0.294, 0.004, 0.571, 0.748, 0.000, 1.000, 0.573, 0.502, 0.094, 0.807, 0.000, 0.000, 0.400, 0.000, 0.278, 0.228, 1.000, 0.000, 0.115, 0.432, 0.198, 0.050, 0.125, 0.000, 0.000, 0.573, 0.000, 0.000, 0.000, 0.000, 0.000, 0.125, 0.312, 0.610, 0.221, 0.000, 0.667, 0.050, 0.000, 0.000, 0.000, 0.032, 0.000, 0.000, 0.083, 0.000, 0.000, 0.000, 0.000, 0.220, 0.000, 0.125, 0.000, 0.111, 0.000, 0.667, 0.200, 0.000, 0.000, 0.000, 0.110, 0.000, 0.000, 0.000, 0.000, 0.000, 0.053, 0.500, 0.000, 0.000, 0.500, 0.000, 0.500, 0.333, 0.500, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.600, 0.000, 0.364, 0.000, 0.000, 0.750, 0.000, 0.833, 0.000, 0.000, 0.143, 0.000, 0.396, 0.000, 0.000, 0.000, 0.000, 0.000, 0.021, 0.221, 0.000, 0.093, 0.055, 0.451, 0.677, 0.125, 0.000, 0.000, 0.028, 0.000, 0.000, 0.000, 0.500, 0.000, 0.000, 0.000, 0.050, 0.000, 0.000, 0.000, 0.000.
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
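The rank shown next to each score orders the methods within that column. A minimal sketch, assuming the leaderboard uses standard competition ranking (higher scores are better, tied scores share a rank, and the following rank is skipped); `competition_ranks` is a hypothetical helper, not part of any benchmark tooling. Applied to the avg AP 50% column it reproduces the ranks 1, 2, 4 and 3 listed for Mask3D, LGround, Minkowski 34D and CSC-Pretrain.

```python
from typing import List, Sequence


def competition_ranks(scores: Sequence[float]) -> List[int]:
    """Rank scores so that higher is better and ties share the best rank ("1224" style)."""
    # A score's rank is one plus the number of strictly better scores.
    return [1 + sum(1 for other in scores if other > s) for s in scores]


if __name__ == "__main__":
    avg_ap50 = [0.388, 0.246, 0.203, 0.209]  # Mask3D, LGround, Minkowski 34D, CSC-Pretrain
    print(competition_ranks(avg_ap50))  # [1, 2, 4, 3]
    print(competition_ranks([1.0, 1.0, 0.667, 0.667]))  # [1, 1, 3, 3]
```

Counting strictly better scores keeps tied methods on the same rank, which matters for the many per-class columns where several methods score 0.000 or 1.000.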