The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision (AP) for each class. We report the mean average precision at overlap (intersection over union) 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over the overlaps in the range [0.5:0.95:0.05], i.e. 0.50 to 0.95 in steps of 0.05 (AP), for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized: only one of them can be matched to that instance, and the others count as false positives.
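To make the matching rule concrete, here is a minimal Python sketch of per-class AP at a given overlap threshold. It assumes that overlap is mask IoU and that a mask is a set of mesh vertex ids (a hypothetical representation). Predictions are matched greedily from most to least confident, each ground-truth instance at most once, so a second prediction of an already-matched instance becomes a false positive. The AP integration (area under the non-increasing precision envelope) and the helper names `iou`, `average_precision`, `mean_ap`, and `report` are illustrative choices, not the official ScanNet evaluation script, whose details may differ.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Sketch only, not the official ScanNet evaluation code.
# Hypothetical data model: a mask is the set of mesh vertex ids it covers,
# and a prediction is a (confidence, mask) pair.
Mask = Set[int]
Prediction = Tuple[float, Mask]

# Overlaps 0.50, 0.55, ..., 0.95 used for the headline AP.
AP_RANGE = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Mask, b: Mask) -> float:
    """Overlap between two masks as intersection over union."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Prediction], gts: List[Mask], thr: float) -> float:
    """AP of one class at one overlap threshold (gts must be non-empty)."""
    matched = [False] * len(gts)
    hits: List[bool] = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(mask, gt)
            # An already-matched instance cannot be matched again, so a
            # duplicate prediction of it ends up as a false positive.
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
        hits.append(best_j >= 0)

    # Precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing, then sum precision * recall increment.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_ap(per_class: Dict[str, Tuple[List[Prediction], List[Mask]]],
            thresholds: Sequence[float]) -> float:
    """Average AP over the thresholds, then over classes that have ground truth."""
    class_aps = [
        sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
        for preds, gts in per_class.values()
        if gts
    ]
    return sum(class_aps) / len(class_aps)


def report(per_class: Dict[str, Tuple[List[Prediction], List[Mask]]]) -> Dict[str, float]:
    """The three summary numbers reported in the table."""
    return {
        "AP": mean_ap(per_class, AP_RANGE),
        "AP 50%": mean_ap(per_class, [0.5]),
        "AP 25%": mean_ap(per_class, [0.25]),
    }


if __name__ == "__main__":
    gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
    preds = [(0.9, {0, 1, 2, 3}), (0.8, {0, 1, 2}), (0.7, {10, 11, 12, 13})]
    print(report({"chair": (preds, gts)}))
```

In the toy example, the 0.8-confidence prediction overlaps the first chair with IoU 0.75, but that chair is already matched, so it counts as a false positive and the class AP drops from 1.0 to 5/6 (about 0.833) at every threshold.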



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each score is followed by the method's rank among the listed methods in that column.




Method | Info | avg AP | AP head | AP common | AP tail | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
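DINO3D-Scannet200 | copyleft | 0.346 (1) | 0.437 (1) | 0.353 (1) | 0.229 (1) | 0.729 (1) | 0.536 (1) | 0.659 (1) | 0.733 (1) | 0.431 (1) | 0.264 (1) | 0.388 (1) | 0.001 (4) | 0.764 (1) | 0.529 (1) | 0.462 (2) | 0.669 (1) | 0.411 (1) | 0.925 (1) | 0.371 (3) | 0.766 (1) | 0.545 (1) | 0.263 (1) | 0.574 (2) | 0.257 (1) | 0.714 (2) | 0.504 (1) | 0.325 (2) | 0.726 (1) | 0.206 (1) | 0.618 (2) | 0.628 (1) | 0.066 (1) | 0.297 (1) | 0.558 (2) | 0.000 (3) | 0.732 (1) | 0.594 (1) | 0.940 (1) | 0.199 (1) | 0.558 (2) | 0.752 (1) | 0.174 (1) | 0.687 (1) | 0.470 (1) | 0.921 (1) | 0.764 (5) | 0.345 (3) | 0.142 (1) | 0.731 (4) | 0.780 (1) | 0.138 (1) | 0.514 (3) | 0.712 (1) | 0.556 (1) | 0.417 (1) | 0.719 (1) | 0.407 (1) | 0.042 (1) | 0.292 (7) | 0.456 (1) | 0.245 (6) | 0.266 (1) | 1.000 (1) | 0.042 (2) | 0.247 (2) | 0.446 (1) | 0.373 (1) | 0.241 (2) | 0.049 (4) | 0.000 (2) | 0.328 (2) | 0.536 (3) | 0.417 (2) | 0.000 (2) | 0.000 (1) | 0.764 (1) | 0.000 (4) | 0.500 (1) | 0.406 (1) | 0.520 (1) | 0.045 (4) | 0.442 (1) | 0.803 (1) | 0.681 (1) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.251 (1) | 0.000 (1) | 0.027 (2) | 0.083 (4) | 0.000 (3) | 0.303 (1) | 0.306 (2) | 0.889 (2) | 0.551 (1) | 0.094 (1) | 0.264 (1) | 0.361 (1) | 0.253 (2) | 0.000 (2) | 0.611 (1) | 0.400 (1) | 0.516 (1) | 0.000 (1) | 0.599 (1) | 0.279 (1) | 0.000 (1) | 0.346 (1) | 0.642 (1) | 0.111 (2) | 0.282 (1) | 0.183 (1) | 0.664 (2) | 0.750 (1) | 0.378 (3) | 0.333 (1) | 0.500 (1) | 0.514 (3) | 0.593 (1) | 0.708 (1) | 0.000 (1) | 0.238 (1) | 0.000 (4) | 0.250 (2) | 0.111 (1) | 0.000 (4) | 0.484 (1) | 0.000 (4) | 0.250 (2) | 0.585 (1) | 0.000 (3) | 0.063 (1) | 0.487 (3) | 0.000 (1) | 0.365 (1) | 0.000 (1) | 0.772 (1) | 0.639 (1) | 0.000 (1) | 0.769 (1) | 0.000 (1) | 0.000 (1) | 0.545 (2) | 0.655 (1) | 0.000 (6) | 0.250 (1) | 0.014 (3) | 0.000 (1) | 0.222 (2) | 0.000 (1) | 0.082 (1) | 0.618 (1) | 0.156 (4) | 0.384 (2) | 0.436 (2) | 0.130 (5) | 0.246 (4) | 0.049 (3) | 0.009 (1) | 0.000 (1) | 0.192 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.477 (2) | 0.028 (2) | 0.000 (2) | 0.000 (2) | 0.156 (2) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)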
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
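ODIN - Ins200 | permissive | 0.265 (3) | 0.349 (3) | 0.268 (2) | 0.163 (3) | 0.485 (7) | 0.366 (5) | 0.549 (3) | 0.492 (7) | 0.421 (2) | 0.229 (2) | 0.265 (4) | 0.003 (3) | 0.609 (3) | 0.297 (3) | 0.320 (3) | 0.327 (3) | 0.251 (4) | 0.848 (5) | 0.314 (6) | 0.526 (4) | 0.324 (6) | 0.138 (3) | 0.529 (3) | 0.178 (2) | 0.440 (6) | 0.186 (7) | 0.306 (3) | 0.546 (2) | 0.160 (2) | 0.494 (5) | 0.476 (4) | 0.016 (3) | 0.231 (4) | 0.594 (1) | 0.000 (3) | 0.615 (2) | 0.357 (4) | 0.630 (5) | 0.141 (2) | 0.167 (4) | 0.665 (2) | 0.054 (3) | 0.360 (3) | 0.451 (3) | 0.610 (2) | 0.769 (4) | 0.640 (1) | 0.032 (3) | 0.746 (2) | 0.698 (3) | 0.040 (2) | 0.389 (5) | 0.550 (3) | 0.371 (2) | 0.257 (3) | 0.617 (5) | 0.310 (2) | 0.000 (4) | 0.481 (3) | 0.022 (6) | 0.463 (1) | 0.160 (3) | 1.000 (1) | 0.125 (1) | 0.193 (4) | 0.267 (4) | 0.253 (4) | 0.156 (4) | 0.000 (6) | 0.000 (2) | 0.332 (1) | 0.606 (2) | 0.444 (1) | 0.000 (2) | 0.000 (1) | 0.281 (2) | 1.000 (1) | 0.417 (4) | 0.344 (3) | 0.238 (7) | 0.218 (1) | 0.000 (4) | 0.655 (4) | 0.506 (2) | 0.000 (2) | 0.052 (1) | 0.000 (3) | 0.091 (4) | 0.000 (1) | 0.035 (1) | 0.370 (1) | 0.000 (3) | 0.000 (4) | 0.250 (3) | 0.903 (1) | 0.037 (7) | 0.031 (2) | 0.221 (2) | 0.197 (2) | 0.285 (1) | 0.037 (1) | 0.191 (7) | 0.200 (4) | 0.083 (2) | 0.000 (1) | 0.200 (4) | 0.115 (3) | 0.000 (1) | 0.250 (2) | 0.552 (2) | 0.278 (1) | 0.077 (3) | 0.107 (3) | 0.389 (3) | 0.674 (2) | 0.565 (1) | 0.278 (2) | 0.000 (2) | 0.361 (7) | 0.333 (5) | 0.361 (5) | 0.000 (1) | 0.000 (4) | 0.438 (1) | 0.451 (1) | 0.000 (3) | 1.000 (1) | 0.074 (3) | 0.204 (2) | 0.250 (2) | 0.250 (2) | 0.000 (3) | 0.000 (2) | 0.493 (2) | 0.000 (1) | 0.083 (6) | 0.000 (1) | 0.000 (4) | 0.317 (3) | 0.000 (1) | 0.481 (3) | 0.000 (1) | 0.000 (1) | 0.188 (4) | 0.333 (3) | 0.345 (2) | 0.000 (2) | 0.333 (1) | 0.000 (1) | 0.333 (1) | 0.000 (1) | 0.035 (4) | 0.266 (3) | 0.478 (1) | 0.506 (1) | 0.054 (4) | 0.205 (3) | 0.119 (6) | 0.067 (2) | 0.000 (3) | 0.000 (1) | 0.210 (1) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.389 (3) | 0.097 (1) | 0.000 (2) | 0.000 (2) | 0.111 (4) | 0.000 (2) | 0.000 (2) | 0.889 (3) | 0.000 (1)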
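TD3D Scannet200 | permissive | 0.211 (4) | 0.332 (4) | 0.177 (4) | 0.103 (4) | 0.662 (2) | 0.413 (3) | 0.463 (4) | 0.705 (2) | 0.192 (5) | 0.145 (3) | 0.266 (3) | 0.215 (1) | 0.452 (6) | 0.209 (4) | 0.222 (7) | 0.219 (7) | 0.315 (3) | 0.893 (2) | 0.380 (2) | 0.617 (3) | 0.439 (3) | 0.047 (6) | 0.646 (1) | 0.080 (4) | 0.610 (4) | 0.253 (2) | 0.237 (4) | 0.293 (4) | 0.135 (3) | 0.379 (7) | 0.494 (3) | 0.048 (2) | 0.252 (3) | 0.451 (4) | 0.184 (2) | 0.483 (3) | 0.395 (3) | 0.852 (2) | 0.083 (4) | 0.551 (3) | 0.278 (4) | 0.036 (4) | 0.337 (4) | 0.266 (4) | 0.544 (3) | 0.963 (1) | 0.079 (7) | 0.039 (2) | 0.740 (3) | 0.604 (4) | 0.000 (4) | 0.586 (1) | 0.283 (4) | 0.282 (4) | 0.059 (4) | 0.633 (4) | 0.028 (4) | 0.004 (3) | 0.559 (2) | 0.309 (3) | 0.420 (3) | 0.028 (7) | 1.000 (1) | 0.000 (3) | 0.456 (1) | 0.411 (2) | 0.372 (2) | 0.060 (6) | 0.046 (5) | 0.000 (2) | 0.040 (6) | 0.694 (1) | 0.083 (4) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.083 (6) | 0.252 (4) | 0.260 (6) | 0.200 (2) | 0.160 (2) | 0.669 (3) | 0.111 (4) | 0.000 (2) | 0.000 (2) | 0.006 (2) | 0.169 (3) | 0.000 (1) | 0.007 (3) | 0.296 (3) | 0.032 (1) | 0.074 (2) | 0.139 (5) | 0.000 (4) | 0.321 (3) | 0.031 (3) | 0.108 (4) | 0.088 (4) | 0.157 (3) | 0.000 (2) | 0.231 (6) | 0.026 (7) | 0.000 (4) | 0.000 (1) | 0.356 (3) | 0.052 (4) | 0.000 (1) | 0.240 (3) | 0.147 (3) | 0.000 (3) | 0.015 (4) | 0.046 (5) | 0.144 (5) | 0.073 (5) | 0.414 (2) | 0.222 (6) | 0.000 (2) | 0.806 (1) | 0.343 (4) | 0.486 (4) | 0.000 (1) | 0.008 (2) | 0.038 (3) | 0.083 (3) | 0.002 (2) | 0.028 (3) | 0.074 (3) | 0.032 (3) | 0.150 (4) | 0.039 (4) | 0.008 (1) | 0.000 (2) | 0.250 (6) | 0.000 (1) | 0.125 (5) | 0.000 (1) | 0.052 (3) | 0.260 (5) | 0.000 (1) | 0.143 (7) | 0.000 (1) | 0.000 (1) | 0.543 (3) | 0.207 (4) | 0.404 (1) | 0.000 (2) | 0.003 (4) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.037 (3) | 0.093 (6) | 0.272 (3) | 0.342 (3) | 0.039 (6) | 0.281 (2) | 0.249 (3) | 0.224 (1) | 0.000 (3) | 0.000 (1) | 0.074 (3) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.278 (4) | 0.000 (3) | 0.000 (2) | 0.889 (1) | 0.323 (1) | 0.000 (2) | 0.014 (1) | 0.000 (4) | 0.000 (1)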
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
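Mask3D Scannet200 | | 0.278 (2) | 0.383 (2) | 0.263 (3) | 0.168 (2) | 0.661 (3) | 0.465 (2) | 0.572 (2) | 0.665 (4) | 0.391 (3) | 0.121 (6) | 0.304 (2) | 0.015 (2) | 0.647 (2) | 0.349 (2) | 0.474 (1) | 0.489 (2) | 0.321 (2) | 0.816 (7) | 0.351 (4) | 0.722 (2) | 0.402 (5) | 0.195 (2) | 0.515 (5) | 0.082 (3) | 0.795 (1) | 0.215 (3) | 0.396 (1) | 0.377 (3) | 0.082 (6) | 0.724 (1) | 0.586 (2) | 0.015 (4) | 0.277 (2) | 0.377 (7) | 0.201 (1) | 0.475 (4) | 0.572 (2) | 0.778 (4) | 0.089 (3) | 0.759 (1) | 0.556 (3) | 0.068 (2) | 0.506 (2) | 0.467 (2) | 0.323 (5) | 0.778 (2) | 0.427 (2) | 0.027 (4) | 0.789 (1) | 0.744 (2) | 0.003 (3) | 0.570 (2) | 0.561 (2) | 0.337 (3) | 0.265 (2) | 0.711 (2) | 0.258 (3) | 0.031 (2) | 0.569 (1) | 0.311 (2) | 0.441 (2) | 0.179 (2) | 1.000 (1) | 0.000 (3) | 0.233 (3) | 0.411 (3) | 0.283 (3) | 0.380 (1) | 0.667 (1) | 0.016 (1) | 0.048 (5) | 0.418 (4) | 0.139 (3) | 0.173 (1) | 0.000 (1) | 0.086 (3) | 0.014 (3) | 0.500 (1) | 0.384 (2) | 0.497 (2) | 0.044 (5) | 0.032 (3) | 0.752 (2) | 0.287 (3) | 0.003 (1) | 0.000 (2) | 0.007 (1) | 0.208 (2) | 0.000 (1) | 0.001 (4) | 0.349 (2) | 0.008 (2) | 0.014 (3) | 0.509 (1) | 0.500 (3) | 0.323 (2) | 0.023 (4) | 0.176 (3) | 0.107 (3) | 0.105 (5) | 0.000 (2) | 0.605 (2) | 0.378 (2) | 0.016 (3) | 0.000 (1) | 0.400 (2) | 0.192 (2) | 0.000 (1) | 0.048 (4) | 0.037 (4) | 0.000 (3) | 0.275 (2) | 0.119 (2) | 0.810 (1) | 0.258 (3) | 0.006 (5) | 0.083 (7) | 0.000 (2) | 0.568 (2) | 0.377 (3) | 0.708 (1) | 0.000 (1) | 0.005 (3) | 0.147 (2) | 0.014 (4) | 0.000 (3) | 0.556 (2) | 0.085 (2) | 0.325 (1) | 0.500 (1) | 0.083 (3) | 0.004 (2) | 0.000 (2) | 0.590 (1) | 0.000 (1) | 0.365 (2) | 0.000 (1) | 0.116 (2) | 0.491 (2) | 0.000 (1) | 0.626 (2) | 0.000 (1) | 0.000 (1) | 0.579 (1) | 0.391 (2) | 0.050 (5) | 0.000 (2) | 0.028 (2) | 0.000 (1) | 0.222 (2) | 0.000 (1) | 0.063 (2) | 0.302 (2) | 0.356 (2) | 0.149 (6) | 0.573 (1) | 0.415 (1) | 0.013 (7) | 0.002 (6) | 0.004 (2) | 0.000 (1) | 0.005 (6) | 0.000 (1) | 0.000 (1) | 0.444 (1) | 0.514 (1) | 0.000 (3) | 0.028 (1) | 0.000 (2) | 0.156 (2) | 0.267 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)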
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
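Minkowski 34D Inst. | permissive | 0.130 (6) | 0.246 (6) | 0.083 (6) | 0.043 (7) | 0.547 (6) | 0.236 (6) | 0.415 (6) | 0.672 (3) | 0.141 (7) | 0.133 (5) | 0.067 (6) | 0.000 (5) | 0.521 (4) | 0.114 (7) | 0.238 (6) | 0.289 (4) | 0.232 (6) | 0.883 (3) | 0.182 (7) | 0.373 (7) | 0.486 (2) | 0.076 (5) | 0.488 (6) | 0.022 (6) | 0.529 (5) | 0.199 (6) | 0.110 (6) | 0.217 (6) | 0.100 (4) | 0.460 (6) | 0.319 (6) | 0.000 (5) | 0.025 (7) | 0.472 (3) | 0.000 (3) | 0.394 (5) | 0.210 (6) | 0.537 (6) | 0.004 (6) | 0.000 (5) | 0.083 (7) | 0.000 (7) | 0.299 (6) | 0.061 (7) | 0.201 (7) | 0.761 (6) | 0.084 (6) | 0.008 (5) | 0.720 (5) | 0.557 (7) | 0.000 (4) | 0.317 (7) | 0.280 (5) | 0.094 (7) | 0.020 (7) | 0.564 (7) | 0.000 (6) | 0.000 (4) | 0.400 (4) | 0.048 (5) | 0.259 (5) | 0.101 (5) | 1.000 (1) | 0.000 (3) | 0.190 (5) | 0.142 (7) | 0.094 (7) | 0.137 (5) | 0.089 (3) | 0.000 (2) | 0.101 (3) | 0.355 (7) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.444 (3) | 0.082 (7) | 0.384 (3) | 0.000 (7) | 0.000 (4) | 0.334 (7) | 0.004 (7) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.041 (6) | 0.000 (1) | 0.000 (5) | 0.026 (7) | 0.000 (3) | 0.000 (4) | 0.000 (6) | 0.000 (4) | 0.082 (6) | 0.022 (5) | 0.000 (7) | 0.021 (6) | 0.088 (6) | 0.000 (2) | 0.241 (5) | 0.033 (6) | 0.000 (4) | 0.000 (1) | 0.067 (5) | 0.000 (7) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (6) | 0.026 (6) | 0.262 (4) | 0.016 (6) | 0.000 (6) | 0.278 (2) | 0.000 (2) | 0.500 (5) | 0.394 (2) | 0.028 (7) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (5) | 0.000 (3) | 0.000 (4) | 0.019 (6) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.156 (7) | 0.000 (1) | 0.032 (7) | 0.000 (1) | 0.000 (4) | 0.194 (7) | 0.000 (1) | 0.248 (6) | 0.000 (1) | 0.000 (1) | 0.099 (6) | 0.019 (6) | 0.308 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.007 (6) | 0.122 (4) | 0.000 (5) | 0.175 (5) | 0.063 (3) | 0.000 (6) | 0.271 (1) | 0.000 (7) | 0.000 (3) | 0.000 (1) | 0.000 (7) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.278 (4) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.111 (4) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)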
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
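CSC-Pretrain Inst. | permissive | 0.123 (7) | 0.223 (7) | 0.082 (7) | 0.046 (6) | 0.564 (5) | 0.152 (7) | 0.394 (7) | 0.578 (6) | 0.235 (4) | 0.116 (7) | 0.034 (7) | 0.000 (5) | 0.348 (7) | 0.119 (6) | 0.297 (4) | 0.285 (5) | 0.202 (7) | 0.838 (6) | 0.323 (5) | 0.407 (6) | 0.184 (7) | 0.037 (7) | 0.516 (4) | 0.013 (7) | 0.424 (7) | 0.214 (4) | 0.093 (7) | 0.105 (7) | 0.078 (7) | 0.542 (4) | 0.250 (7) | 0.000 (5) | 0.064 (6) | 0.444 (5) | 0.000 (3) | 0.224 (7) | 0.231 (5) | 0.537 (6) | 0.001 (7) | 0.000 (5) | 0.126 (6) | 0.004 (5) | 0.308 (5) | 0.193 (5) | 0.244 (6) | 0.343 (7) | 0.228 (4) | 0.000 (7) | 0.441 (6) | 0.588 (5) | 0.000 (4) | 0.338 (6) | 0.275 (6) | 0.189 (6) | 0.030 (6) | 0.600 (6) | 0.000 (6) | 0.000 (4) | 0.378 (5) | 0.000 (7) | 0.108 (7) | 0.098 (6) | 1.000 (1) | 0.000 (3) | 0.096 (7) | 0.172 (6) | 0.144 (5) | 0.011 (7) | 0.125 (2) | 0.000 (2) | 0.000 (7) | 0.376 (6) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.042 (7) | 0.141 (6) | 0.377 (4) | 0.051 (3) | 0.000 (4) | 0.483 (5) | 0.017 (6) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.022 (7) | 0.000 (1) | 0.000 (5) | 0.065 (5) | 0.000 (3) | 0.000 (4) | 0.000 (6) | 0.000 (4) | 0.094 (5) | 0.000 (7) | 0.042 (5) | 0.000 (7) | 0.064 (7) | 0.000 (2) | 0.259 (4) | 0.089 (5) | 0.000 (4) | 0.000 (1) | 0.000 (6) | 0.022 (6) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (6) | 0.018 (7) | 0.111 (7) | 0.000 (7) | 0.000 (6) | 0.278 (2) | 0.000 (2) | 0.444 (6) | 0.333 (5) | 0.333 (6) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (5) | 0.000 (3) | 0.000 (4) | 0.000 (7) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.267 (5) | 0.000 (1) | 0.184 (4) | 0.000 (1) | 0.000 (4) | 0.211 (6) | 0.000 (1) | 0.378 (4) | 0.000 (1) | 0.000 (1) | 0.063 (7) | 0.000 (7) | 0.275 (4) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.007 (7) | 0.105 (5) | 0.000 (5) | 0.032 (7) | 0.045 (5) | 0.198 (4) | 0.171 (5) | 0.028 (4) | 0.000 (3) | 0.000 (1) | 0.006 (5) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.278 (4) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.044 (6) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)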
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
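LGround Inst. | permissive | 0.154 (5) | 0.275 (5) | 0.108 (5) | 0.060 (5) | 0.573 (4) | 0.381 (4) | 0.434 (5) | 0.654 (5) | 0.190 (6) | 0.141 (4) | 0.097 (5) | 0.000 (5) | 0.503 (5) | 0.180 (5) | 0.252 (5) | 0.242 (6) | 0.242 (5) | 0.881 (4) | 0.448 (1) | 0.494 (5) | 0.429 (4) | 0.078 (4) | 0.364 (7) | 0.024 (5) | 0.654 (3) | 0.213 (5) | 0.222 (5) | 0.239 (5) | 0.099 (5) | 0.616 (3) | 0.363 (5) | 0.000 (5) | 0.092 (5) | 0.444 (5) | 0.000 (3) | 0.383 (6) | 0.209 (7) | 0.815 (3) | 0.030 (5) | 0.000 (5) | 0.166 (5) | 0.002 (6) | 0.295 (7) | 0.099 (6) | 0.364 (4) | 0.778 (2) | 0.177 (5) | 0.001 (6) | 0.427 (7) | 0.585 (6) | 0.000 (4) | 0.470 (4) | 0.268 (7) | 0.205 (5) | 0.045 (5) | 0.642 (3) | 0.007 (5) | 0.000 (4) | 0.333 (6) | 0.148 (4) | 0.407 (4) | 0.130 (4) | 1.000 (1) | 0.000 (3) | 0.156 (6) | 0.189 (5) | 0.097 (6) | 0.169 (3) | 0.000 (6) | 0.000 (2) | 0.056 (4) | 0.400 (5) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.556 (2) | 0.278 (5) | 0.203 (5) | 0.323 (5) | 0.019 (6) | 0.000 (4) | 0.402 (6) | 0.026 (5) | 0.000 (2) | 0.000 (2) | 0.000 (3) | 0.044 (5) | 0.000 (1) | 0.000 (5) | 0.037 (6) | 0.000 (3) | 0.000 (4) | 0.181 (4) | 0.000 (4) | 0.127 (4) | 0.006 (6) | 0.028 (6) | 0.023 (5) | 0.115 (4) | 0.000 (2) | 0.327 (3) | 0.267 (3) | 0.000 (4) | 0.000 (1) | 0.000 (6) | 0.028 (5) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.003 (5) | 0.048 (4) | 0.135 (6) | 0.222 (4) | 0.089 (4) | 0.278 (2) | 0.000 (2) | 0.514 (3) | 0.333 (5) | 0.611 (3) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.000 (5) | 0.000 (3) | 0.000 (4) | 0.037 (5) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.322 (4) | 0.000 (1) | 0.209 (3) | 0.000 (1) | 0.000 (4) | 0.278 (4) | 0.000 (1) | 0.302 (5) | 0.000 (1) | 0.000 (1) | 0.143 (5) | 0.148 (5) | 0.000 (6) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.015 (5) | 0.064 (7) | 0.000 (5) | 0.272 (4) | 0.031 (7) | 0.000 (6) | 0.257 (2) | 0.028 (4) | 0.000 (3) | 0.000 (1) | 0.041 (4) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.222 (7) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.000 (7) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)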
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
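The rank shown after each score appears to follow standard competition ranking: rank 1 is the highest score and tied scores share the best available rank, which matches how the many tied 0.000 entries are ranked in the table. The short Python sketch below reproduces the avg AP ranks from the values in the table. The tie rule is inferred from the table rather than documented by the benchmark, and `competition_ranks` is a hypothetical helper.

```python
from typing import Dict

# avg AP column copied from the ScanNet200 instance table.
AVG_AP: Dict[str, float] = {
    "DINO3D-Scannet200": 0.346,
    "ODIN - Ins200": 0.265,
    "TD3D Scannet200": 0.211,
    "Mask3D Scannet200": 0.278,
    "Minkowski 34D Inst.": 0.130,
    "CSC-Pretrain Inst.": 0.123,
    "LGround Inst.": 0.154,
}


def competition_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = best. A method's rank is 1 + the number of strictly higher
    scores, so tied scores share a rank (e.g. 1, 2, 2, 4)."""
    values = list(scores.values())
    return {name: 1 + sum(v > score for v in values) for name, score in scores.items()}


if __name__ == "__main__":
    ranks = competition_ranks(AVG_AP)
    # Print the leaderboard from best to worst avg AP.
    for name in sorted(AVG_AP, key=AVG_AP.get, reverse=True):
        print(f"{ranks[name]}  {AVG_AP[name]:.3f}  {name}")
```

Sorting by avg AP gives the order DINO3D-Scannet200, Mask3D Scannet200, ODIN - Ins200, TD3D Scannet200, LGround Inst., Minkowski 34D Inst., CSC-Pretrain Inst., i.e. ranks 1 to 7 as listed in the table.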