The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision (AP) for each class. We report the mean average precision at an overlap threshold of 0.25 (AP 25%), at an overlap threshold of 0.5 (AP 50%), and averaged over overlap thresholds in the range [0.5:0.95:0.05], i.e. from 0.5 to 0.95 in steps of 0.05 (AP), for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
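The metrics described above can be sketched in a few lines of Python. This is a simplified illustration, not the official ScanNet evaluation script: it assumes each prediction of one class has already been reduced to a hypothetical `(confidence, ground truth index, overlap)` triple, it treats a prediction as correct when its overlap is at least the threshold, and it integrates the precision-recall curve with a plain monotone envelope. The function names `average_precision` and `benchmark_metrics` are made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record for one predicted instance of a single class:
# (confidence, index of the best-overlapping ground truth instance or None, overlap)
Prediction = Tuple[float, Optional[int], float]


def average_precision(preds: List[Prediction], num_gt: int, threshold: float) -> float:
    """AP of one class at one overlap threshold (simplified sketch)."""
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth
    matched = set()
    flags = []
    # Visit predictions from most to least confident.
    for _conf, gt_idx, overlap in sorted(preds, key=lambda p: p[0], reverse=True):
        # A second prediction of an already matched instance is a false positive.
        is_tp = gt_idx is not None and overlap >= threshold and gt_idx not in matched
        if is_tp:
            matched.add(gt_idx)
        flags.append(is_tp)
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Make precision non-increasing in recall before integrating.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def benchmark_metrics(preds: List[Prediction], num_gt: int) -> Dict[str, float]:
    """AP 25%, AP 50%, and AP averaged over thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    return {
        "AP 25%": average_precision(preds, num_gt, 0.25),
        "AP 50%": average_precision(preds, num_gt, 0.5),
        "AP": sum(average_precision(preds, num_gt, t) for t in thresholds) / len(thresholds),
    }


# Two ground truth instances; the second prediction duplicates the first match.
example = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
print(benchmark_metrics(example, num_gt=2))
```

In this toy input the duplicate prediction of ground truth instance 0 lowers AP 50% below the value obtained when it is removed, which is the penalty the note above refers to. The per-class values would then be averaged over classes to obtain the mean AP columns of the table.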



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Each cell gives the AP value followed by the method's rank in that column.

Method | Info | avg ap | head ap | common ap | tail ap | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
ODIN - Ins200 | permissive | 0.265 2 | 0.349 2 | 0.268 1 | 0.163 2 | 0.485 6 | 0.366 4 | 0.549 2 | 0.492 6 | 0.421 1 | 0.229 1 | 0.265 3 | 0.003 3 | 0.609 2 | 0.297 2 | 0.320 2 | 0.327 2 | 0.251 3 | 0.848 4 | 0.314 5 | 0.526 3 | 0.324 5 | 0.138 2 | 0.529 2 | 0.178 1 | 0.440 5 | 0.186 6 | 0.306 2 | 0.546 1 | 0.160 1 | 0.494 4 | 0.476 3 | 0.016 2 | 0.231 3 | 0.594 1 | 0.000 3 | 0.615 1 | 0.357 3 | 0.630 4 | 0.141 1 | 0.167 3 | 0.665 1 | 0.054 2 | 0.360 2 | 0.451 2 | 0.610 1 | 0.769 4 | 0.640 1 | 0.032 2 | 0.746 2 | 0.698 2 | 0.040 1 | 0.389 4 | 0.550 2 | 0.371 1 | 0.257 2 | 0.617 4 | 0.310 1 | 0.000 3 | 0.481 3 | 0.022 5 | 0.463 1 | 0.160 2 | 1.000 1 | 0.125 1 | 0.193 3 | 0.267 3 | 0.253 3 | 0.156 3 | 0.000 5 | 0.000 2 | 0.332 1 | 0.606 2 | 0.444 1 | 0.000 2 | 0.000 1 | 0.281 1 | 1.000 1 | 0.417 3 | 0.344 2 | 0.238 6 | 0.218 1 | 0.000 3 | 0.655 3 | 0.506 1 | 0.000 2 | 0.052 1 | 0.000 3 | 0.091 3 | 0.000 1 | 0.035 1 | 0.370 1 | 0.000 3 | 0.000 3 | 0.250 2 | 0.903 1 | 0.037 6 | 0.031 1 | 0.221 1 | 0.197 1 | 0.285 1 | 0.037 1 | 0.191 6 | 0.200 3 | 0.083 1 | 0.000 1 | 0.200 3 | 0.115 2 | 0.000 1 | 0.250 1 | 0.552 1 | 0.278 1 | 0.077 2 | 0.107 2 | 0.389 2 | 0.674 1 | 0.565 1 | 0.278 1 | 0.000 1 | 0.361 6 | 0.333 4 | 0.361 4 | 0.000 1 | 0.000 3 | 0.438 1 | 0.451 1 | 0.000 2 | 1.000 1 | 0.074 2 | 0.204 2 | 0.250 2 | 0.250 1 | 0.000 3 | 0.000 1 | 0.493 2 | 0.000 1 | 0.083 5 | 0.000 1 | 0.000 3 | 0.317 2 | 0.000 1 | 0.481 2 | 0.000 1 | 0.000 1 | 0.188 3 | 0.333 2 | 0.345 2 | 0.000 1 | 0.333 1 | 0.000 1 | 0.333 1 | 0.000 1 | 0.035 3 | 0.266 2 | 0.478 1 | 0.506 1 | 0.054 3 | 0.205 3 | 0.119 5 | 0.067 2 | 0.000 2 | 0.000 1 | 0.210 1 | 0.000 1 | 0.000 1 | 0.000 2 | 0.389 2 | 0.097 1 | 0.000 2 | 0.000 2 | 0.111 3 | 0.000 2 | 0.000 2 | 0.889 2 | 0.000 1
TD3D Scannet200 | permissive | 0.211 3 | 0.332 3 | 0.177 3 | 0.103 3 | 0.662 1 | 0.413 2 | 0.463 3 | 0.705 1 | 0.192 4 | 0.145 2 | 0.266 2 | 0.215 1 | 0.452 5 | 0.209 3 | 0.222 6 | 0.219 6 | 0.315 2 | 0.893 1 | 0.380 2 | 0.617 2 | 0.439 2 | 0.047 5 | 0.646 1 | 0.080 3 | 0.610 3 | 0.253 1 | 0.237 3 | 0.293 3 | 0.135 2 | 0.379 6 | 0.494 2 | 0.048 1 | 0.252 2 | 0.451 3 | 0.184 2 | 0.483 2 | 0.395 2 | 0.852 1 | 0.083 3 | 0.551 2 | 0.278 3 | 0.036 3 | 0.337 3 | 0.266 3 | 0.544 2 | 0.963 1 | 0.079 6 | 0.039 1 | 0.740 3 | 0.604 3 | 0.000 3 | 0.586 1 | 0.283 3 | 0.282 3 | 0.059 3 | 0.633 3 | 0.028 3 | 0.004 2 | 0.559 2 | 0.309 2 | 0.420 3 | 0.028 6 | 1.000 1 | 0.000 2 | 0.456 1 | 0.411 1 | 0.372 1 | 0.060 5 | 0.046 4 | 0.000 2 | 0.040 5 | 0.694 1 | 0.083 3 | 0.000 2 | 0.000 1 | 0.000 3 | 0.000 4 | 0.083 5 | 0.252 3 | 0.260 5 | 0.200 2 | 0.160 1 | 0.669 2 | 0.111 3 | 0.000 2 | 0.000 2 | 0.006 2 | 0.169 2 | 0.000 1 | 0.007 2 | 0.296 3 | 0.032 1 | 0.074 1 | 0.139 4 | 0.000 3 | 0.321 2 | 0.031 2 | 0.108 3 | 0.088 3 | 0.157 2 | 0.000 2 | 0.231 5 | 0.026 6 | 0.000 3 | 0.000 1 | 0.356 2 | 0.052 3 | 0.000 1 | 0.240 2 | 0.147 2 | 0.000 2 | 0.015 3 | 0.046 4 | 0.144 4 | 0.073 4 | 0.414 2 | 0.222 5 | 0.000 1 | 0.806 1 | 0.343 3 | 0.486 3 | 0.000 1 | 0.008 1 | 0.038 3 | 0.083 2 | 0.002 1 | 0.028 3 | 0.074 2 | 0.032 3 | 0.150 3 | 0.039 3 | 0.008 1 | 0.000 1 | 0.250 5 | 0.000 1 | 0.125 4 | 0.000 1 | 0.052 2 | 0.260 4 | 0.000 1 | 0.143 6 | 0.000 1 | 0.000 1 | 0.543 2 | 0.207 3 | 0.404 1 | 0.000 1 | 0.003 3 | 0.000 1 | 0.000 3 | 0.000 1 | 0.037 2 | 0.093 5 | 0.272 3 | 0.342 2 | 0.039 5 | 0.281 2 | 0.249 3 | 0.224 1 | 0.000 2 | 0.000 1 | 0.074 2 | 0.000 1 | 0.000 1 | 0.000 2 | 0.278 3 | 0.000 2 | 0.000 2 | 0.889 1 | 0.323 1 | 0.000 2 | 0.014 1 | 0.000 3 | 0.000 1
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Mask3D Scannet200 | | 0.278 1 | 0.383 1 | 0.263 2 | 0.168 1 | 0.661 2 | 0.465 1 | 0.572 1 | 0.665 3 | 0.391 2 | 0.121 5 | 0.304 1 | 0.015 2 | 0.647 1 | 0.349 1 | 0.474 1 | 0.489 1 | 0.321 1 | 0.816 6 | 0.351 3 | 0.722 1 | 0.402 4 | 0.195 1 | 0.515 4 | 0.082 2 | 0.795 1 | 0.215 2 | 0.396 1 | 0.377 2 | 0.082 5 | 0.724 1 | 0.586 1 | 0.015 3 | 0.277 1 | 0.377 6 | 0.201 1 | 0.475 3 | 0.572 1 | 0.778 3 | 0.089 2 | 0.759 1 | 0.556 2 | 0.068 1 | 0.506 1 | 0.467 1 | 0.323 4 | 0.778 2 | 0.427 2 | 0.027 3 | 0.789 1 | 0.744 1 | 0.003 2 | 0.570 2 | 0.561 1 | 0.337 2 | 0.265 1 | 0.711 1 | 0.258 2 | 0.031 1 | 0.569 1 | 0.311 1 | 0.441 2 | 0.179 1 | 1.000 1 | 0.000 2 | 0.233 2 | 0.411 2 | 0.283 2 | 0.380 1 | 0.667 1 | 0.016 1 | 0.048 4 | 0.418 3 | 0.139 2 | 0.173 1 | 0.000 1 | 0.086 2 | 0.014 3 | 0.500 1 | 0.384 1 | 0.497 1 | 0.044 4 | 0.032 2 | 0.752 1 | 0.287 2 | 0.003 1 | 0.000 2 | 0.007 1 | 0.208 1 | 0.000 1 | 0.001 3 | 0.349 2 | 0.008 2 | 0.014 2 | 0.509 1 | 0.500 2 | 0.323 1 | 0.023 3 | 0.176 2 | 0.107 2 | 0.105 4 | 0.000 2 | 0.605 1 | 0.378 1 | 0.016 2 | 0.000 1 | 0.400 1 | 0.192 1 | 0.000 1 | 0.048 3 | 0.037 3 | 0.000 2 | 0.275 1 | 0.119 1 | 0.810 1 | 0.258 2 | 0.006 4 | 0.083 6 | 0.000 1 | 0.568 2 | 0.377 2 | 0.708 1 | 0.000 1 | 0.005 2 | 0.147 2 | 0.014 3 | 0.000 2 | 0.556 2 | 0.085 1 | 0.325 1 | 0.500 1 | 0.083 2 | 0.004 2 | 0.000 1 | 0.590 1 | 0.000 1 | 0.365 1 | 0.000 1 | 0.116 1 | 0.491 1 | 0.000 1 | 0.626 1 | 0.000 1 | 0.000 1 | 0.579 1 | 0.391 1 | 0.050 5 | 0.000 1 | 0.028 2 | 0.000 1 | 0.222 2 | 0.000 1 | 0.063 1 | 0.302 1 | 0.356 2 | 0.149 5 | 0.573 1 | 0.415 1 | 0.013 6 | 0.002 5 | 0.004 1 | 0.000 1 | 0.005 5 | 0.000 1 | 0.000 1 | 0.444 1 | 0.514 1 | 0.000 2 | 0.028 1 | 0.000 2 | 0.156 2 | 0.267 1 | 0.000 2 | 1.000 1 | 0.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
Minkowski 34D Inst. | permissive | 0.130 5 | 0.246 5 | 0.083 5 | 0.043 6 | 0.547 5 | 0.236 5 | 0.415 5 | 0.672 2 | 0.141 6 | 0.133 4 | 0.067 5 | 0.000 4 | 0.521 3 | 0.114 6 | 0.238 5 | 0.289 3 | 0.232 5 | 0.883 2 | 0.182 6 | 0.373 6 | 0.486 1 | 0.076 4 | 0.488 5 | 0.022 5 | 0.529 4 | 0.199 5 | 0.110 5 | 0.217 5 | 0.100 3 | 0.460 5 | 0.319 5 | 0.000 4 | 0.025 6 | 0.472 2 | 0.000 3 | 0.394 4 | 0.210 5 | 0.537 5 | 0.004 5 | 0.000 4 | 0.083 6 | 0.000 6 | 0.299 5 | 0.061 6 | 0.201 6 | 0.761 5 | 0.084 5 | 0.008 4 | 0.720 4 | 0.557 6 | 0.000 3 | 0.317 6 | 0.280 4 | 0.094 6 | 0.020 6 | 0.564 6 | 0.000 5 | 0.000 3 | 0.400 4 | 0.048 4 | 0.259 5 | 0.101 4 | 1.000 1 | 0.000 2 | 0.190 4 | 0.142 6 | 0.094 6 | 0.137 4 | 0.089 3 | 0.000 2 | 0.101 2 | 0.355 6 | 0.000 4 | 0.000 2 | 0.000 1 | 0.000 3 | 0.000 4 | 0.444 2 | 0.082 6 | 0.384 2 | 0.000 6 | 0.000 3 | 0.334 6 | 0.004 6 | 0.000 2 | 0.000 2 | 0.000 3 | 0.041 5 | 0.000 1 | 0.000 4 | 0.026 6 | 0.000 3 | 0.000 3 | 0.000 5 | 0.000 3 | 0.082 5 | 0.022 4 | 0.000 6 | 0.021 5 | 0.088 5 | 0.000 2 | 0.241 4 | 0.033 5 | 0.000 3 | 0.000 1 | 0.067 4 | 0.000 6 | 0.000 1 | 0.000 4 | 0.000 4 | 0.000 2 | 0.000 5 | 0.026 5 | 0.262 3 | 0.016 5 | 0.000 5 | 0.278 1 | 0.000 1 | 0.500 4 | 0.394 1 | 0.028 6 | 0.000 1 | 0.000 3 | 0.000 4 | 0.000 4 | 0.000 2 | 0.000 4 | 0.019 5 | 0.000 4 | 0.000 4 | 0.000 4 | 0.000 3 | 0.000 1 | 0.156 6 | 0.000 1 | 0.032 6 | 0.000 1 | 0.000 3 | 0.194 6 | 0.000 1 | 0.248 5 | 0.000 1 | 0.000 1 | 0.099 5 | 0.019 5 | 0.308 3 | 0.000 1 | 0.000 4 | 0.000 1 | 0.000 3 | 0.000 1 | 0.007 5 | 0.122 3 | 0.000 4 | 0.175 4 | 0.063 2 | 0.000 5 | 0.271 1 | 0.000 6 | 0.000 2 | 0.000 1 | 0.000 6 | 0.000 1 | 0.000 1 | 0.000 2 | 0.278 3 | 0.000 2 | 0.000 2 | 0.000 2 | 0.111 3 | 0.000 2 | 0.000 2 | 0.000 3 | 0.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.123 6 | 0.223 6 | 0.082 6 | 0.046 5 | 0.564 4 | 0.152 6 | 0.394 6 | 0.578 5 | 0.235 3 | 0.116 6 | 0.034 6 | 0.000 4 | 0.348 6 | 0.119 5 | 0.297 3 | 0.285 4 | 0.202 6 | 0.838 5 | 0.323 4 | 0.407 5 | 0.184 6 | 0.037 6 | 0.516 3 | 0.013 6 | 0.424 6 | 0.214 3 | 0.093 6 | 0.105 6 | 0.078 6 | 0.542 3 | 0.250 6 | 0.000 4 | 0.064 5 | 0.444 4 | 0.000 3 | 0.224 6 | 0.231 4 | 0.537 5 | 0.001 6 | 0.000 4 | 0.126 5 | 0.004 4 | 0.308 4 | 0.193 4 | 0.244 5 | 0.343 6 | 0.228 3 | 0.000 6 | 0.441 5 | 0.588 4 | 0.000 3 | 0.338 5 | 0.275 5 | 0.189 5 | 0.030 5 | 0.600 5 | 0.000 5 | 0.000 3 | 0.378 5 | 0.000 6 | 0.108 6 | 0.098 5 | 1.000 1 | 0.000 2 | 0.096 6 | 0.172 5 | 0.144 4 | 0.011 6 | 0.125 2 | 0.000 2 | 0.000 6 | 0.376 5 | 0.000 4 | 0.000 2 | 0.000 1 | 0.000 3 | 0.000 4 | 0.042 6 | 0.141 5 | 0.377 3 | 0.051 3 | 0.000 3 | 0.483 4 | 0.017 5 | 0.000 2 | 0.000 2 | 0.000 3 | 0.022 6 | 0.000 1 | 0.000 4 | 0.065 4 | 0.000 3 | 0.000 3 | 0.000 5 | 0.000 3 | 0.094 4 | 0.000 6 | 0.042 4 | 0.000 6 | 0.064 6 | 0.000 2 | 0.259 3 | 0.089 4 | 0.000 3 | 0.000 1 | 0.000 5 | 0.022 5 | 0.000 1 | 0.000 4 | 0.000 4 | 0.000 2 | 0.000 5 | 0.018 6 | 0.111 6 | 0.000 6 | 0.000 5 | 0.278 1 | 0.000 1 | 0.444 5 | 0.333 4 | 0.333 5 | 0.000 1 | 0.000 3 | 0.000 4 | 0.000 4 | 0.000 2 | 0.000 4 | 0.000 6 | 0.000 4 | 0.000 4 | 0.000 4 | 0.000 3 | 0.000 1 | 0.267 4 | 0.000 1 | 0.184 3 | 0.000 1 | 0.000 3 | 0.211 5 | 0.000 1 | 0.378 3 | 0.000 1 | 0.000 1 | 0.063 6 | 0.000 6 | 0.275 4 | 0.000 1 | 0.000 4 | 0.000 1 | 0.000 3 | 0.000 1 | 0.007 6 | 0.105 4 | 0.000 4 | 0.032 6 | 0.045 4 | 0.198 4 | 0.171 4 | 0.028 3 | 0.000 2 | 0.000 1 | 0.006 4 | 0.000 1 | 0.000 1 | 0.000 2 | 0.278 3 | 0.000 2 | 0.000 2 | 0.000 2 | 0.044 5 | 0.000 2 | 0.000 2 | 0.000 3 | 0.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.154 4 | 0.275 4 | 0.108 4 | 0.060 4 | 0.573 3 | 0.381 3 | 0.434 4 | 0.654 4 | 0.190 5 | 0.141 3 | 0.097 4 | 0.000 4 | 0.503 4 | 0.180 4 | 0.252 4 | 0.242 5 | 0.242 4 | 0.881 3 | 0.448 1 | 0.494 4 | 0.429 3 | 0.078 3 | 0.364 6 | 0.024 4 | 0.654 2 | 0.213 4 | 0.222 4 | 0.239 4 | 0.099 4 | 0.616 2 | 0.363 4 | 0.000 4 | 0.092 4 | 0.444 4 | 0.000 3 | 0.383 5 | 0.209 6 | 0.815 2 | 0.030 4 | 0.000 4 | 0.166 4 | 0.002 5 | 0.295 6 | 0.099 5 | 0.364 3 | 0.778 2 | 0.177 4 | 0.001 5 | 0.427 6 | 0.585 5 | 0.000 3 | 0.470 3 | 0.268 6 | 0.205 4 | 0.045 4 | 0.642 2 | 0.007 4 | 0.000 3 | 0.333 6 | 0.148 3 | 0.407 4 | 0.130 3 | 1.000 1 | 0.000 2 | 0.156 5 | 0.189 4 | 0.097 5 | 0.169 2 | 0.000 5 | 0.000 2 | 0.056 3 | 0.400 4 | 0.000 4 | 0.000 2 | 0.000 1 | 0.000 3 | 0.556 2 | 0.278 4 | 0.203 4 | 0.323 4 | 0.019 5 | 0.000 3 | 0.402 5 | 0.026 4 | 0.000 2 | 0.000 2 | 0.000 3 | 0.044 4 | 0.000 1 | 0.000 4 | 0.037 5 | 0.000 3 | 0.000 3 | 0.181 3 | 0.000 3 | 0.127 3 | 0.006 5 | 0.028 5 | 0.023 4 | 0.115 3 | 0.000 2 | 0.327 2 | 0.267 2 | 0.000 3 | 0.000 1 | 0.000 5 | 0.028 4 | 0.000 1 | 0.000 4 | 0.000 4 | 0.000 2 | 0.003 4 | 0.048 3 | 0.135 5 | 0.222 3 | 0.089 3 | 0.278 1 | 0.000 1 | 0.514 3 | 0.333 4 | 0.611 2 | 0.000 1 | 0.000 3 | 0.000 4 | 0.000 4 | 0.000 2 | 0.000 4 | 0.037 4 | 0.000 4 | 0.000 4 | 0.000 4 | 0.000 3 | 0.000 1 | 0.322 3 | 0.000 1 | 0.209 2 | 0.000 1 | 0.000 3 | 0.278 3 | 0.000 1 | 0.302 4 | 0.000 1 | 0.000 1 | 0.143 4 | 0.148 4 | 0.000 6 | 0.000 1 | 0.000 4 | 0.000 1 | 0.000 3 | 0.000 1 | 0.015 4 | 0.064 6 | 0.000 4 | 0.272 3 | 0.031 6 | 0.000 5 | 0.257 2 | 0.028 3 | 0.000 2 | 0.000 1 | 0.041 3 | 0.000 1 | 0.000 1 | 0.000 2 | 0.222 6 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 6 | 0.000 2 | 0.000 2 | 0.000 3 | 0.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.