The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP), for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
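The metric described above can be illustrated with a minimal sketch for a single class. This is not the official evaluation script: the function names, the input format (one `(confidence, gt_id, iou)` tuple per prediction, where `gt_id` is the ground truth instance the prediction overlaps most), the `>=` comparison against the threshold, and the all-point interpolation of the precision-recall curve are all assumptions made for illustration.

```python
import numpy as np

def average_precision(preds, num_gt, iou_threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    preds: list of (confidence, gt_id, iou) tuples (hypothetical format).
    num_gt: number of ground truth instances of this class.
    """
    if num_gt == 0:
        return float("nan")
    matched = set()
    tp = []
    # Visit predictions from most to least confident.
    for conf, gt_id, iou in sorted(preds, key=lambda p: -p[0]):
        # A repeated hit on an already matched instance is a false positive.
        is_tp = iou >= iou_threshold and gt_id not in matched
        if is_tp:
            matched.add(gt_id)
        tp.append(is_tp)
    tp = np.array(tp, dtype=float)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Pad the curve, make precision non-increasing, integrate over recall.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    return float(np.sum(np.diff(recall) * precision[1:]))

def ap_summary(preds, num_gt):
    """The three reported numbers for one class."""
    # Overlaps 0.50, 0.55, ..., 0.95, rounded to avoid float drift.
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return {
        "AP": float(np.mean([average_precision(preds, num_gt, t) for t in thresholds])),
        "AP 50%": average_precision(preds, num_gt, 0.5),
        "AP 25%": average_precision(preds, num_gt, 0.25),
    }

# Two ground truth instances; the second prediction duplicates the first.
example = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
summary = ap_summary(example, num_gt=2)
```

In this toy example, removing the duplicate prediction would raise AP 50% to 1.0, which is the penalty the note above refers to. The benchmark scores would then be obtained by averaging the per-class values over the categories.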



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Method | Info | avg ap | head ap | common ap | tail ap | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
Mask3D Scannet2000.278 10.383 10.263 10.168 10.661 10.465 10.572 10.665 20.391 10.121 30.304 10.015 10.647 10.349 10.474 10.489 10.321 10.816 40.351 20.722 10.402 30.195 10.515 20.082 10.795 10.215 10.396 10.377 10.082 30.724 10.586 10.015 10.277 10.377 40.201 10.475 10.572 10.778 20.089 10.759 10.556 10.068 10.506 10.467 10.323 20.778 10.427 10.027 10.789 10.744 10.003 10.570 10.561 10.337 10.265 10.711 10.258 10.031 10.569 10.311 10.441 10.179 11.000 10.000 10.233 10.411 10.283 10.380 10.667 10.016 10.048 30.418 10.139 10.173 10.000 10.086 10.014 20.500 10.384 10.497 10.044 20.032 10.752 10.287 10.003 10.000 10.007 10.208 10.000 10.001 10.349 10.008 10.014 10.509 10.500 10.323 10.023 10.176 10.107 10.105 20.000 10.605 10.378 10.016 10.000 10.400 10.192 10.000 10.048 10.037 10.000 10.275 10.119 10.810 10.258 10.006 20.083 40.000 10.568 10.377 20.708 10.000 10.005 10.147 10.014 10.000 10.556 10.085 10.325 10.500 10.083 10.004 10.000 10.590 10.000 10.365 10.000 10.116 10.491 10.000 10.626 10.000 10.000 10.579 10.391 10.050 30.000 10.028 10.000 10.222 10.000 10.063 10.302 10.356 10.149 30.573 10.415 10.013 40.002 30.004 10.000 10.005 30.000 10.000 10.444 10.514 10.000 10.028 10.000 10.156 10.267 10.000 11.000 10.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation.
Minkowski 34D Inst.permissive0.130 30.246 30.083 30.043 40.547 40.236 30.415 30.672 10.141 40.133 20.067 30.000 20.521 20.114 40.238 40.289 20.232 30.883 10.182 40.373 40.486 10.076 30.488 30.022 30.529 30.199 40.110 30.217 30.100 10.460 40.319 30.000 20.025 40.472 10.000 20.394 20.210 30.537 30.004 30.000 20.083 40.000 40.299 30.061 40.201 40.761 30.084 40.008 20.720 20.557 40.000 20.317 40.280 20.094 40.020 40.564 40.000 30.000 20.400 20.048 30.259 30.101 31.000 10.000 10.190 20.142 40.094 40.137 30.089 30.000 20.101 10.355 40.000 20.000 20.000 10.000 20.000 30.444 20.082 40.384 20.000 40.000 20.334 40.004 40.000 20.000 10.000 20.041 30.000 10.000 20.026 40.000 20.000 20.000 30.000 20.082 40.022 20.000 40.021 30.088 30.000 10.241 40.033 40.000 20.000 10.067 20.000 40.000 10.000 20.000 20.000 10.000 30.026 30.262 20.016 30.000 30.278 10.000 10.500 30.394 10.028 40.000 10.000 20.000 20.000 20.000 10.000 20.019 30.000 20.000 20.000 20.000 20.000 10.156 40.000 10.032 40.000 10.000 20.194 40.000 10.248 40.000 10.000 10.099 30.019 30.308 10.000 10.000 20.000 10.000 20.000 10.007 30.122 20.000 20.175 20.063 20.000 30.271 10.000 40.000 20.000 10.000 40.000 10.000 10.000 20.278 20.000 10.000 20.000 10.111 20.000 20.000 10.000 20.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst.permissive0.123 40.223 40.082 40.046 30.564 30.152 40.394 40.578 40.235 20.116 40.034 40.000 20.348 40.119 30.297 20.285 30.202 40.838 30.323 30.407 30.184 40.037 40.516 10.013 40.424 40.214 20.093 40.105 40.078 40.542 30.250 40.000 20.064 30.444 20.000 20.224 40.231 20.537 30.001 40.000 20.126 30.004 20.308 20.193 20.244 30.343 40.228 20.000 40.441 30.588 20.000 20.338 30.275 30.189 30.030 30.600 30.000 30.000 20.378 30.000 40.108 40.098 41.000 10.000 10.096 40.172 30.144 20.011 40.125 20.000 20.000 40.376 30.000 20.000 20.000 10.000 20.000 30.042 40.141 30.377 30.051 10.000 20.483 20.017 30.000 20.000 10.000 20.022 40.000 10.000 20.065 20.000 20.000 20.000 30.000 20.094 30.000 40.042 20.000 40.064 40.000 10.259 30.089 30.000 20.000 10.000 30.022 30.000 10.000 20.000 20.000 10.000 30.018 40.111 40.000 40.000 30.278 10.000 10.444 40.333 30.333 30.000 10.000 20.000 20.000 20.000 10.000 20.000 40.000 20.000 20.000 20.000 20.000 10.267 30.000 10.184 30.000 10.000 20.211 30.000 10.378 20.000 10.000 10.063 40.000 40.275 20.000 10.000 20.000 10.000 20.000 10.007 40.105 30.000 20.032 40.045 30.198 20.171 30.028 10.000 20.000 10.006 20.000 10.000 10.000 20.278 20.000 10.000 20.000 10.044 30.000 20.000 10.000 20.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.permissive0.154 20.275 20.108 20.060 20.573 20.381 20.434 20.654 30.190 30.141 10.097 20.000 20.503 30.180 20.252 30.242 40.242 20.881 20.448 10.494 20.429 20.078 20.364 40.024 20.654 20.213 30.222 20.239 20.099 20.616 20.363 20.000 20.092 20.444 20.000 20.383 30.209 40.815 10.030 20.000 20.166 20.002 30.295 40.099 30.364 10.778 10.177 30.001 30.427 40.585 30.000 20.470 20.268 40.205 20.045 20.642 20.007 20.000 20.333 40.148 20.407 20.130 21.000 10.000 10.156 30.189 20.097 30.169 20.000 40.000 20.056 20.400 20.000 20.000 20.000 10.000 20.556 10.278 30.203 20.323 40.019 30.000 20.402 30.026 20.000 20.000 10.000 20.044 20.000 10.000 20.037 30.000 20.000 20.181 20.000 20.127 20.006 30.028 30.023 20.115 10.000 10.327 20.267 20.000 20.000 10.000 30.028 20.000 10.000 20.000 20.000 10.003 20.048 20.135 30.222 20.089 10.278 10.000 10.514 20.333 30.611 20.000 10.000 20.000 20.000 20.000 10.000 20.037 20.000 20.000 20.000 20.000 20.000 10.322 20.000 10.209 20.000 10.000 20.278 20.000 10.302 30.000 10.000 10.143 20.148 20.000 40.000 10.000 20.000 10.000 20.000 10.015 20.064 40.000 20.272 10.031 40.000 30.257 20.028 10.000 20.000 10.041 10.000 10.000 10.000 20.222 40.000 10.000 20.000 10.000 40.000 20.000 10.000 20.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.