The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
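The metric described above can be illustrated with a minimal Python sketch. It is not the official evaluation script: the function names, the input layout (one best-overlapping ground truth instance per prediction), the use of `>=` for the overlap test, and the choice of nine thresholds from 0.5 to 0.9 (treating 0.95 as an exclusive endpoint) are all assumptions made for illustration. The precision-recall integration is the common all-point interpolation, which may differ in detail from the benchmark's own implementation.

```python
import numpy as np

def average_precision(scores, ious, gt_ids, num_gt, iou_threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    scores: confidence of each prediction
    ious:   overlap of each prediction with its best-matching GT instance
    gt_ids: id of that GT instance, or -1 if there is none
    num_gt: number of ground truth instances of this class
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    matched = set()
    tp = np.zeros(len(order))
    fp = np.zeros(len(order))
    for rank, i in enumerate(order):
        gt = int(gt_ids[i])
        if gt >= 0 and ious[i] >= iou_threshold and gt not in matched:
            matched.add(gt)   # first sufficiently overlapping prediction wins
            tp[rank] = 1
        else:
            fp[rank] = 1      # low overlap, or a duplicate of a matched GT
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # All-point interpolation: monotone precision envelope, then area.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def ap_summary(scores, ious, gt_ids, num_gt):
    """Return (AP, AP 50%, AP 25%) for one class."""
    thresholds = np.linspace(0.5, 0.9, 9)  # assumed: 0.95 is exclusive
    ap = np.mean([average_precision(scores, ious, gt_ids, num_gt, t)
                  for t in thresholds])
    ap50 = average_precision(scores, ious, gt_ids, num_gt, 0.5)
    ap25 = average_precision(scores, ious, gt_ids, num_gt, 0.25)
    return float(ap), ap50, ap25
```

In this sketch a second prediction that overlaps an already matched ground truth instance is counted as a false positive, which mirrors the penalty stated above. The per-class values would then be averaged over classes to obtain the reported means.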



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Each cell gives the score followed by the method's rank for that column.

Method | Info | avg ap | head ap | common ap | tail ap | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
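DINO3D-Scannet200 | copyleft | 0.346 (1) | 0.437 (2) | 0.353 (1) | 0.229 (1) | 0.729 (2) | 0.536 (1) | 0.659 (2) | 0.733 (1) | 0.431 (1) | 0.264 (1) | 0.388 (2) | 0.001 (5) | 0.764 (1) | 0.529 (1) | 0.462 (2) | 0.669 (2) | 0.411 (2) | 0.925 (2) | 0.371 (4) | 0.766 (2) | 0.545 (1) | 0.263 (2) | 0.574 (3) | 0.257 (1) | 0.714 (3) | 0.504 (1) | 0.325 (3) | 0.726 (1) | 0.206 (1) | 0.618 (3) | 0.628 (2) | 0.066 (1) | 0.297 (2) | 0.558 (2) | 0.000 (4) | 0.732 (1) | 0.594 (2) | 0.940 (1) | 0.199 (1) | 0.558 (2) | 0.752 (2) | 0.174 (1) | 0.687 (1) | 0.470 (1) | 0.921 (1) | 0.764 (6) | 0.345 (4) | 0.142 (2) | 0.731 (5) | 0.780 (1) | 0.138 (1) | 0.514 (3) | 0.712 (1) | 0.556 (1) | 0.417 (1) | 0.719 (1) | 0.407 (1) | 0.042 (2) | 0.292 (8) | 0.456 (1) | 0.245 (7) | 0.266 (2) | 1.000 (1) | 0.042 (2) | 0.247 (3) | 0.446 (2) | 0.373 (2) | 0.241 (3) | 0.049 (4) | 0.000 (3) | 0.328 (2) | 0.536 (4) | 0.417 (3) | 0.000 (3) | 0.000 (1) | 0.764 (1) | 0.000 (5) | 0.500 (1) | 0.406 (1) | 0.520 (1) | 0.045 (5) | 0.442 (1) | 0.803 (2) | 0.681 (1) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.251 (1) | 0.000 (1) | 0.027 (2) | 0.083 (5) | 0.000 (3) | 0.303 (1) | 0.306 (3) | 0.889 (2) | 0.551 (1) | 0.094 (2) | 0.264 (1) | 0.361 (2) | 0.253 (2) | 0.000 (2) | 0.611 (2) | 0.400 (1) | 0.516 (1) | 0.000 (2) | 0.599 (1) | 0.279 (2) | 0.000 (2) | 0.346 (1) | 0.642 (1) | 0.111 (3) | 0.282 (2) | 0.183 (2) | 0.664 (2) | 0.750 (1) | 0.378 (3) | 0.333 (1) | 0.500 (1) | 0.514 (4) | 0.593 (1) | 0.708 (2) | 0.000 (1) | 0.238 (1) | 0.000 (4) | 0.250 (3) | 0.111 (1) | 0.000 (5) | 0.484 (1) | 0.000 (4) | 0.250 (3) | 0.585 (1) | 0.000 (3) | 0.063 (1) | 0.487 (3) | 0.000 (1) | 0.365 (2) | 0.000 (1) | 0.772 (1) | 0.639 (1) | 0.000 (1) | 0.769 (1) | 0.000 (1) | 0.000 (1) | 0.545 (3) | 0.655 (1) | 0.000 (7) | 0.250 (2) | 0.014 (4) | 0.000 (1) | 0.222 (3) | 0.000 (1) | 0.082 (2) | 0.618 (1) | 0.156 (5) | 0.384 (3) | 0.436 (2) | 0.130 (6) | 0.246 (4) | 0.049 (4) | 0.009 (2) | 0.000 (1) | 0.192 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.477 (3) | 0.028 (2) | 0.000 (3) | 0.000 (2) | 0.156 (3) | 0.000 (3) | 0.000 (2) | 1.000 (1) | 0.000 (1)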
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
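Mask3D Scannet200 | | 0.278 (3) | 0.383 (3) | 0.263 (4) | 0.168 (3) | 0.661 (4) | 0.465 (2) | 0.572 (3) | 0.665 (5) | 0.391 (3) | 0.121 (7) | 0.304 (3) | 0.015 (3) | 0.647 (3) | 0.349 (3) | 0.474 (1) | 0.489 (3) | 0.321 (3) | 0.816 (8) | 0.351 (5) | 0.722 (3) | 0.402 (6) | 0.195 (3) | 0.515 (6) | 0.082 (4) | 0.795 (1) | 0.215 (4) | 0.396 (1) | 0.377 (4) | 0.082 (7) | 0.724 (1) | 0.586 (3) | 0.015 (5) | 0.277 (3) | 0.377 (8) | 0.201 (1) | 0.475 (4) | 0.572 (3) | 0.778 (5) | 0.089 (3) | 0.759 (1) | 0.556 (4) | 0.068 (2) | 0.506 (3) | 0.467 (2) | 0.323 (6) | 0.778 (3) | 0.427 (3) | 0.027 (5) | 0.789 (2) | 0.744 (2) | 0.003 (4) | 0.570 (2) | 0.561 (3) | 0.337 (4) | 0.265 (3) | 0.711 (2) | 0.258 (3) | 0.031 (3) | 0.569 (2) | 0.311 (2) | 0.441 (2) | 0.179 (3) | 1.000 (1) | 0.000 (3) | 0.233 (4) | 0.411 (4) | 0.283 (4) | 0.380 (1) | 0.667 (1) | 0.016 (1) | 0.048 (6) | 0.418 (5) | 0.139 (4) | 0.173 (2) | 0.000 (1) | 0.086 (4) | 0.014 (4) | 0.500 (1) | 0.384 (2) | 0.497 (2) | 0.044 (6) | 0.032 (4) | 0.752 (3) | 0.287 (4) | 0.003 (1) | 0.000 (3) | 0.007 (1) | 0.208 (2) | 0.000 (1) | 0.001 (5) | 0.349 (3) | 0.008 (2) | 0.014 (4) | 0.509 (1) | 0.500 (3) | 0.323 (2) | 0.023 (5) | 0.176 (3) | 0.107 (4) | 0.105 (6) | 0.000 (2) | 0.605 (3) | 0.378 (3) | 0.016 (4) | 0.000 (2) | 0.400 (2) | 0.192 (3) | 0.000 (2) | 0.048 (5) | 0.037 (5) | 0.000 (4) | 0.275 (3) | 0.119 (3) | 0.810 (1) | 0.258 (4) | 0.006 (6) | 0.083 (8) | 0.000 (2) | 0.568 (3) | 0.377 (4) | 0.708 (2) | 0.000 (1) | 0.005 (4) | 0.147 (2) | 0.014 (5) | 0.000 (3) | 0.556 (2) | 0.085 (3) | 0.325 (1) | 0.500 (1) | 0.083 (4) | 0.004 (2) | 0.000 (3) | 0.590 (1) | 0.000 (1) | 0.365 (3) | 0.000 (1) | 0.116 (3) | 0.491 (2) | 0.000 (1) | 0.626 (2) | 0.000 (1) | 0.000 (1) | 0.579 (2) | 0.391 (2) | 0.050 (6) | 0.000 (3) | 0.028 (3) | 0.000 (1) | 0.222 (3) | 0.000 (1) | 0.063 (3) | 0.302 (2) | 0.356 (2) | 0.149 (7) | 0.573 (1) | 0.415 (1) | 0.013 (8) | 0.002 (7) | 0.004 (3) | 0.000 (1) | 0.005 (7) | 0.000 (1) | 0.000 (2) | 0.444 (1) | 0.514 (2) | 0.000 (3) | 0.028 (2) | 0.000 (2) | 0.156 (3) | 0.267 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)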
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
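ODIN - Ins200 | permissive | 0.265 (4) | 0.349 (4) | 0.268 (3) | 0.163 (4) | 0.485 (8) | 0.366 (6) | 0.549 (4) | 0.492 (8) | 0.421 (2) | 0.229 (2) | 0.265 (5) | 0.003 (4) | 0.609 (4) | 0.297 (4) | 0.320 (4) | 0.327 (4) | 0.251 (5) | 0.848 (6) | 0.314 (7) | 0.526 (5) | 0.324 (7) | 0.138 (4) | 0.529 (4) | 0.178 (2) | 0.440 (7) | 0.186 (8) | 0.306 (4) | 0.546 (2) | 0.160 (2) | 0.494 (6) | 0.476 (5) | 0.016 (4) | 0.231 (5) | 0.594 (1) | 0.000 (4) | 0.615 (2) | 0.357 (5) | 0.630 (6) | 0.141 (2) | 0.167 (4) | 0.665 (3) | 0.054 (3) | 0.360 (4) | 0.451 (3) | 0.610 (3) | 0.769 (5) | 0.640 (1) | 0.032 (4) | 0.746 (3) | 0.698 (4) | 0.040 (2) | 0.389 (6) | 0.550 (4) | 0.371 (3) | 0.257 (4) | 0.617 (6) | 0.310 (2) | 0.000 (5) | 0.481 (4) | 0.022 (7) | 0.463 (1) | 0.160 (4) | 1.000 (1) | 0.125 (1) | 0.193 (5) | 0.267 (5) | 0.253 (5) | 0.156 (5) | 0.000 (6) | 0.000 (3) | 0.332 (1) | 0.606 (2) | 0.444 (1) | 0.000 (3) | 0.000 (1) | 0.281 (3) | 1.000 (1) | 0.417 (5) | 0.344 (4) | 0.238 (8) | 0.218 (1) | 0.000 (5) | 0.655 (5) | 0.506 (3) | 0.000 (2) | 0.052 (1) | 0.000 (4) | 0.091 (5) | 0.000 (1) | 0.035 (1) | 0.370 (1) | 0.000 (3) | 0.000 (5) | 0.250 (4) | 0.903 (1) | 0.037 (8) | 0.031 (3) | 0.221 (2) | 0.197 (3) | 0.285 (1) | 0.037 (1) | 0.191 (8) | 0.200 (5) | 0.083 (3) | 0.000 (2) | 0.200 (5) | 0.115 (4) | 0.000 (2) | 0.250 (3) | 0.552 (2) | 0.278 (2) | 0.077 (4) | 0.107 (4) | 0.389 (4) | 0.674 (2) | 0.565 (1) | 0.278 (3) | 0.000 (2) | 0.361 (8) | 0.333 (6) | 0.361 (6) | 0.000 (1) | 0.000 (5) | 0.438 (1) | 0.451 (2) | 0.000 (3) | 1.000 (1) | 0.074 (4) | 0.204 (2) | 0.250 (3) | 0.250 (3) | 0.000 (3) | 0.000 (3) | 0.493 (2) | 0.000 (1) | 0.083 (7) | 0.000 (1) | 0.000 (5) | 0.317 (4) | 0.000 (1) | 0.481 (4) | 0.000 (1) | 0.000 (1) | 0.188 (5) | 0.333 (3) | 0.345 (2) | 0.000 (3) | 0.333 (1) | 0.000 (1) | 0.333 (2) | 0.000 (1) | 0.035 (5) | 0.266 (4) | 0.478 (1) | 0.506 (1) | 0.054 (5) | 0.205 (4) | 0.119 (7) | 0.067 (3) | 0.000 (4) | 0.000 (1) | 0.210 (1) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.389 (4) | 0.097 (1) | 0.000 (3) | 0.000 (2) | 0.111 (5) | 0.000 (3) | 0.000 (2) | 0.889 (4) | 0.000 (1)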
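CompetitorFormer-200 | | 0.328 (2) | 0.439 (1) | 0.303 (2) | 0.223 (2) | 0.771 (1) | 0.456 (3) | 0.663 (1) | 0.673 (3) | 0.259 (4) | 0.182 (3) | 0.455 (1) | 0.373 (1) | 0.722 (2) | 0.504 (2) | 0.450 (3) | 0.774 (1) | 0.469 (1) | 0.945 (1) | 0.380 (2) | 0.820 (1) | 0.479 (3) | 0.312 (1) | 0.641 (2) | 0.143 (3) | 0.786 (2) | 0.346 (2) | 0.356 (2) | 0.534 (3) | 0.120 (4) | 0.658 (2) | 0.655 (1) | 0.049 (2) | 0.464 (1) | 0.428 (7) | 0.014 (3) | 0.465 (5) | 0.650 (1) | 0.850 (3) | 0.076 (5) | 0.083 (5) | 0.808 (1) | 0.044 (4) | 0.543 (2) | 0.271 (4) | 0.712 (2) | 1.000 (1) | 0.454 (2) | 0.183 (1) | 0.831 (1) | 0.730 (3) | 0.010 (3) | 0.471 (4) | 0.575 (2) | 0.421 (2) | 0.390 (2) | 0.663 (3) | 0.192 (4) | 0.047 (1) | 0.820 (1) | 0.243 (4) | 0.441 (3) | 0.303 (1) | 1.000 (1) | 0.000 (3) | 0.277 (2) | 0.620 (1) | 0.427 (1) | 0.312 (2) | 0.000 (6) | 0.011 (2) | 0.123 (3) | 0.569 (3) | 0.430 (2) | 0.562 (1) | 0.000 (1) | 0.353 (2) | 0.083 (3) | 0.500 (1) | 0.358 (3) | 0.396 (3) | 0.120 (3) | 0.082 (3) | 0.868 (1) | 0.518 (2) | 0.000 (2) | 0.004 (2) | 0.001 (3) | 0.137 (4) | 0.000 (1) | 0.019 (3) | 0.366 (2) | 0.000 (3) | 0.083 (2) | 0.500 (2) | 0.444 (4) | 0.119 (5) | 0.099 (1) | 0.110 (4) | 0.400 (1) | 0.178 (3) | 0.000 (2) | 0.689 (1) | 0.400 (1) | 0.125 (2) | 0.065 (1) | 0.314 (4) | 0.384 (1) | 0.044 (1) | 0.256 (2) | 0.484 (3) | 0.333 (1) | 0.345 (1) | 0.243 (1) | 0.632 (3) | 0.487 (3) | 0.013 (5) | 0.333 (1) | 0.000 (2) | 1.000 (1) | 0.472 (2) | 0.835 (1) | 0.000 (1) | 0.116 (2) | 0.000 (4) | 0.500 (1) | 0.000 (3) | 0.069 (3) | 0.237 (2) | 0.000 (4) | 0.500 (1) | 0.267 (2) | 0.000 (3) | 0.050 (2) | 0.452 (4) | 0.000 (1) | 0.475 (1) | 0.000 (1) | 0.677 (2) | 0.400 (3) | 0.000 (1) | 0.555 (3) | 0.000 (1) | 0.000 (1) | 0.679 (1) | 0.060 (6) | 0.171 (5) | 1.000 (1) | 0.103 (2) | 0.000 (1) | 0.667 (1) | 0.000 (1) | 0.088 (1) | 0.296 (3) | 0.305 (3) | 0.444 (2) | 0.221 (3) | 0.208 (3) | 0.192 (5) | 0.069 (2) | 0.140 (1) | 0.000 (1) | 0.043 (4) | 0.000 (1) | 0.043 (1) | 0.111 (2) | 0.556 (1) | 0.000 (3) | 0.054 (1) | 0.000 (2) | 0.322 (2) | 0.025 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)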
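TD3D Scannet200 | permissive | 0.211 (5) | 0.332 (5) | 0.177 (5) | 0.103 (5) | 0.662 (3) | 0.413 (4) | 0.463 (5) | 0.705 (2) | 0.192 (6) | 0.145 (4) | 0.266 (4) | 0.215 (2) | 0.452 (7) | 0.209 (5) | 0.222 (8) | 0.219 (8) | 0.315 (4) | 0.893 (3) | 0.380 (3) | 0.617 (4) | 0.439 (4) | 0.047 (7) | 0.646 (1) | 0.080 (5) | 0.610 (5) | 0.253 (3) | 0.237 (5) | 0.293 (5) | 0.135 (3) | 0.379 (8) | 0.494 (4) | 0.048 (3) | 0.252 (4) | 0.451 (4) | 0.184 (2) | 0.483 (3) | 0.395 (4) | 0.852 (2) | 0.083 (4) | 0.551 (3) | 0.278 (5) | 0.036 (5) | 0.337 (5) | 0.266 (5) | 0.544 (4) | 0.963 (2) | 0.079 (8) | 0.039 (3) | 0.740 (4) | 0.604 (5) | 0.000 (5) | 0.586 (1) | 0.283 (5) | 0.282 (5) | 0.059 (5) | 0.633 (5) | 0.028 (5) | 0.004 (4) | 0.559 (3) | 0.309 (3) | 0.420 (4) | 0.028 (8) | 1.000 (1) | 0.000 (3) | 0.456 (1) | 0.411 (3) | 0.372 (3) | 0.060 (7) | 0.046 (5) | 0.000 (3) | 0.040 (7) | 0.694 (1) | 0.083 (5) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.083 (7) | 0.252 (5) | 0.260 (7) | 0.200 (2) | 0.160 (2) | 0.669 (4) | 0.111 (5) | 0.000 (2) | 0.000 (3) | 0.006 (2) | 0.169 (3) | 0.000 (1) | 0.007 (4) | 0.296 (4) | 0.032 (1) | 0.074 (3) | 0.139 (6) | 0.000 (5) | 0.321 (3) | 0.031 (4) | 0.108 (5) | 0.088 (5) | 0.157 (4) | 0.000 (2) | 0.231 (7) | 0.026 (8) | 0.000 (5) | 0.000 (2) | 0.356 (3) | 0.052 (5) | 0.000 (2) | 0.240 (4) | 0.147 (4) | 0.000 (4) | 0.015 (5) | 0.046 (6) | 0.144 (6) | 0.073 (6) | 0.414 (2) | 0.222 (7) | 0.000 (2) | 0.806 (2) | 0.343 (5) | 0.486 (5) | 0.000 (1) | 0.008 (3) | 0.038 (3) | 0.083 (4) | 0.002 (2) | 0.028 (4) | 0.074 (4) | 0.032 (3) | 0.150 (5) | 0.039 (5) | 0.008 (1) | 0.000 (3) | 0.250 (7) | 0.000 (1) | 0.125 (6) | 0.000 (1) | 0.052 (4) | 0.260 (6) | 0.000 (1) | 0.143 (8) | 0.000 (1) | 0.000 (1) | 0.543 (4) | 0.207 (4) | 0.404 (1) | 0.000 (3) | 0.003 (5) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.037 (4) | 0.093 (7) | 0.272 (4) | 0.342 (4) | 0.039 (7) | 0.281 (2) | 0.249 (3) | 0.224 (1) | 0.000 (4) | 0.000 (1) | 0.074 (3) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.278 (5) | 0.000 (3) | 0.000 (3) | 0.889 (1) | 0.323 (1) | 0.000 (3) | 0.014 (1) | 0.000 (5) | 0.000 (1)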
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
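CSC-Pretrain Inst. | permissive | 0.123 (8) | 0.223 (8) | 0.082 (8) | 0.046 (7) | 0.564 (6) | 0.152 (8) | 0.394 (8) | 0.578 (7) | 0.235 (5) | 0.116 (8) | 0.034 (8) | 0.000 (6) | 0.348 (8) | 0.119 (7) | 0.297 (5) | 0.285 (6) | 0.202 (8) | 0.838 (7) | 0.323 (6) | 0.407 (7) | 0.184 (8) | 0.037 (8) | 0.516 (5) | 0.013 (8) | 0.424 (8) | 0.214 (5) | 0.093 (8) | 0.105 (8) | 0.078 (8) | 0.542 (5) | 0.250 (8) | 0.000 (6) | 0.064 (7) | 0.444 (5) | 0.000 (4) | 0.224 (8) | 0.231 (6) | 0.537 (7) | 0.001 (8) | 0.000 (6) | 0.126 (7) | 0.004 (6) | 0.308 (6) | 0.193 (6) | 0.244 (7) | 0.343 (8) | 0.228 (5) | 0.000 (8) | 0.441 (7) | 0.588 (6) | 0.000 (5) | 0.338 (7) | 0.275 (7) | 0.189 (7) | 0.030 (7) | 0.600 (7) | 0.000 (7) | 0.000 (5) | 0.378 (6) | 0.000 (8) | 0.108 (8) | 0.098 (7) | 1.000 (1) | 0.000 (3) | 0.096 (8) | 0.172 (7) | 0.144 (6) | 0.011 (8) | 0.125 (2) | 0.000 (3) | 0.000 (8) | 0.376 (7) | 0.000 (6) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.042 (8) | 0.141 (7) | 0.377 (5) | 0.051 (4) | 0.000 (5) | 0.483 (6) | 0.017 (7) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.022 (8) | 0.000 (1) | 0.000 (6) | 0.065 (6) | 0.000 (3) | 0.000 (5) | 0.000 (7) | 0.000 (5) | 0.094 (6) | 0.000 (8) | 0.042 (6) | 0.000 (8) | 0.064 (8) | 0.000 (2) | 0.259 (5) | 0.089 (6) | 0.000 (5) | 0.000 (2) | 0.000 (7) | 0.022 (7) | 0.000 (2) | 0.000 (6) | 0.000 (6) | 0.000 (4) | 0.000 (7) | 0.018 (8) | 0.111 (8) | 0.000 (8) | 0.000 (7) | 0.278 (3) | 0.000 (2) | 0.444 (7) | 0.333 (6) | 0.333 (7) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.000 (3) | 0.000 (5) | 0.000 (8) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.267 (6) | 0.000 (1) | 0.184 (5) | 0.000 (1) | 0.000 (5) | 0.211 (7) | 0.000 (1) | 0.378 (5) | 0.000 (1) | 0.000 (1) | 0.063 (8) | 0.000 (8) | 0.275 (4) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.007 (8) | 0.105 (6) | 0.000 (6) | 0.032 (8) | 0.045 (6) | 0.198 (5) | 0.171 (6) | 0.028 (5) | 0.000 (4) | 0.000 (1) | 0.006 (6) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.278 (5) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.044 (7) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)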
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
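LGround Inst. | permissive | 0.154 (6) | 0.275 (6) | 0.108 (6) | 0.060 (6) | 0.573 (5) | 0.381 (5) | 0.434 (6) | 0.654 (6) | 0.190 (7) | 0.141 (5) | 0.097 (6) | 0.000 (6) | 0.503 (6) | 0.180 (6) | 0.252 (6) | 0.242 (7) | 0.242 (6) | 0.881 (5) | 0.448 (1) | 0.494 (6) | 0.429 (5) | 0.078 (5) | 0.364 (8) | 0.024 (6) | 0.654 (4) | 0.213 (6) | 0.222 (6) | 0.239 (6) | 0.099 (6) | 0.616 (4) | 0.363 (6) | 0.000 (6) | 0.092 (6) | 0.444 (5) | 0.000 (4) | 0.383 (7) | 0.209 (8) | 0.815 (4) | 0.030 (6) | 0.000 (6) | 0.166 (6) | 0.002 (7) | 0.295 (8) | 0.099 (7) | 0.364 (5) | 0.778 (3) | 0.177 (6) | 0.001 (7) | 0.427 (8) | 0.585 (7) | 0.000 (5) | 0.470 (5) | 0.268 (8) | 0.205 (6) | 0.045 (6) | 0.642 (4) | 0.007 (6) | 0.000 (5) | 0.333 (7) | 0.148 (5) | 0.407 (5) | 0.130 (5) | 1.000 (1) | 0.000 (3) | 0.156 (7) | 0.189 (6) | 0.097 (7) | 0.169 (4) | 0.000 (6) | 0.000 (3) | 0.056 (5) | 0.400 (6) | 0.000 (6) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.556 (2) | 0.278 (6) | 0.203 (6) | 0.323 (6) | 0.019 (7) | 0.000 (5) | 0.402 (7) | 0.026 (6) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.044 (6) | 0.000 (1) | 0.000 (6) | 0.037 (7) | 0.000 (3) | 0.000 (5) | 0.181 (5) | 0.000 (5) | 0.127 (4) | 0.006 (7) | 0.028 (7) | 0.023 (6) | 0.115 (5) | 0.000 (2) | 0.327 (4) | 0.267 (4) | 0.000 (5) | 0.000 (2) | 0.000 (7) | 0.028 (6) | 0.000 (2) | 0.000 (6) | 0.000 (6) | 0.000 (4) | 0.003 (6) | 0.048 (5) | 0.135 (7) | 0.222 (5) | 0.089 (4) | 0.278 (3) | 0.000 (2) | 0.514 (4) | 0.333 (6) | 0.611 (4) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.000 (3) | 0.000 (5) | 0.037 (6) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.322 (5) | 0.000 (1) | 0.209 (4) | 0.000 (1) | 0.000 (5) | 0.278 (5) | 0.000 (1) | 0.302 (6) | 0.000 (1) | 0.000 (1) | 0.143 (6) | 0.148 (5) | 0.000 (7) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.015 (6) | 0.064 (8) | 0.000 (6) | 0.272 (5) | 0.031 (8) | 0.000 (7) | 0.257 (2) | 0.028 (5) | 0.000 (4) | 0.000 (1) | 0.041 (5) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.222 (8) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.000 (8) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)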
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
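Minkowski 34D Inst. | permissive | 0.130 (7) | 0.246 (7) | 0.083 (7) | 0.043 (8) | 0.547 (7) | 0.236 (7) | 0.415 (7) | 0.672 (4) | 0.141 (8) | 0.133 (6) | 0.067 (7) | 0.000 (6) | 0.521 (5) | 0.114 (8) | 0.238 (7) | 0.289 (5) | 0.232 (7) | 0.883 (4) | 0.182 (8) | 0.373 (8) | 0.486 (2) | 0.076 (6) | 0.488 (7) | 0.022 (7) | 0.529 (6) | 0.199 (7) | 0.110 (7) | 0.217 (7) | 0.100 (5) | 0.460 (7) | 0.319 (7) | 0.000 (6) | 0.025 (8) | 0.472 (3) | 0.000 (4) | 0.394 (6) | 0.210 (7) | 0.537 (7) | 0.004 (7) | 0.000 (6) | 0.083 (8) | 0.000 (8) | 0.299 (7) | 0.061 (8) | 0.201 (8) | 0.761 (7) | 0.084 (7) | 0.008 (6) | 0.720 (6) | 0.557 (8) | 0.000 (5) | 0.317 (8) | 0.280 (6) | 0.094 (8) | 0.020 (8) | 0.564 (8) | 0.000 (7) | 0.000 (5) | 0.400 (5) | 0.048 (6) | 0.259 (6) | 0.101 (6) | 1.000 (1) | 0.000 (3) | 0.190 (6) | 0.142 (8) | 0.094 (8) | 0.137 (6) | 0.089 (3) | 0.000 (3) | 0.101 (4) | 0.355 (8) | 0.000 (6) | 0.000 (3) | 0.000 (1) | 0.000 (5) | 0.000 (5) | 0.444 (4) | 0.082 (8) | 0.384 (4) | 0.000 (8) | 0.000 (5) | 0.334 (8) | 0.004 (8) | 0.000 (2) | 0.000 (3) | 0.000 (4) | 0.041 (7) | 0.000 (1) | 0.000 (6) | 0.026 (8) | 0.000 (3) | 0.000 (5) | 0.000 (7) | 0.000 (5) | 0.082 (7) | 0.022 (6) | 0.000 (8) | 0.021 (7) | 0.088 (7) | 0.000 (2) | 0.241 (6) | 0.033 (7) | 0.000 (5) | 0.000 (2) | 0.067 (6) | 0.000 (8) | 0.000 (2) | 0.000 (6) | 0.000 (6) | 0.000 (4) | 0.000 (7) | 0.026 (7) | 0.262 (5) | 0.016 (7) | 0.000 (7) | 0.278 (3) | 0.000 (2) | 0.500 (6) | 0.394 (3) | 0.028 (8) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.000 (3) | 0.000 (5) | 0.019 (7) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.156 (8) | 0.000 (1) | 0.032 (8) | 0.000 (1) | 0.000 (5) | 0.194 (8) | 0.000 (1) | 0.248 (7) | 0.000 (1) | 0.000 (1) | 0.099 (7) | 0.019 (7) | 0.308 (3) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.007 (7) | 0.122 (5) | 0.000 (6) | 0.175 (6) | 0.063 (4) | 0.000 (7) | 0.271 (1) | 0.000 (8) | 0.000 (4) | 0.000 (1) | 0.000 (8) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.278 (5) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.111 (5) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)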
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019