The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, our ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
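The metric described above can be illustrated with a minimal sketch for a single class. This is not the official evaluation script. The function names, the greedy matching by best overlap, the use of `>=` at the threshold, the uninterpolated AP formula, and the choice of thresholds 0.50 to 0.90 (leaving out 0.95) are all assumptions made for illustration. Masks are boolean arrays over the vertices of one scan.

```python
import numpy as np

# Assumed threshold set for the averaged AP: 0.50, 0.55, ..., 0.90.
# Whether 0.95 itself is included is an assumption of this sketch.
AP_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(9)]


def mask_iou(a, b):
    """Intersection over union of two boolean vertex masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0


def average_precision(pred_masks, pred_scores, gt_masks, iou_threshold):
    """Uninterpolated AP for one class at one overlap threshold.

    Predictions are visited in order of decreasing confidence. A prediction
    is a true positive only if its best-overlapping ground-truth instance
    reaches the threshold and has not been matched yet; a second prediction
    of an already matched instance therefore counts as a false positive.
    """
    if len(gt_masks) == 0:
        return float("nan")  # class absent from the ground truth
    order = np.argsort(-np.asarray(pred_scores, dtype=float), kind="stable")
    matched = set()
    hits = []
    for i in order:
        ious = [mask_iou(pred_masks[i], gt) for gt in gt_masks]
        j = int(np.argmax(ious))
        if ious[j] >= iou_threshold and j not in matched:
            matched.add(j)
            hits.append(1.0)
        else:
            hits.append(0.0)
    if not hits:
        return 0.0
    hits = np.array(hits)
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Mean of the precision values at each true positive, over all GT instances.
    return float((precision * hits).sum() / len(gt_masks))


def class_metrics(pred_masks, pred_scores, gt_masks):
    """AP 25%, AP 50%, and AP averaged over AP_THRESHOLDS for one class."""
    return {
        "ap25": average_precision(pred_masks, pred_scores, gt_masks, 0.25),
        "ap50": average_precision(pred_masks, pred_scores, gt_masks, 0.5),
        "ap": float(np.mean([
            average_precision(pred_masks, pred_scores, gt_masks, t)
            for t in AP_THRESHOLDS
        ])),
    }


def make_mask(indices, size=10):
    """Toy helper: boolean mask with the given vertex indices set."""
    mask = np.zeros(size, dtype=bool)
    mask[list(indices)] = True
    return mask


# Toy scan with 10 vertices and two ground-truth instances of one class.
toy_gt = [make_mask(range(0, 4)), make_mask(range(5, 9))]
# The second prediction duplicates the first ground-truth instance.
toy_preds = [make_mask(range(0, 4)), make_mask(range(0, 3)), make_mask(range(5, 8))]
toy_scores = [0.9, 0.8, 0.7]
toy_result = class_metrics(toy_preds, toy_scores, toy_gt)
```

In this toy example the duplicate prediction lowers the AP at overlap 0.5 from 1.0 to about 0.83, because it is ranked above the correct prediction of the second instance. The benchmark figures would then be the mean of such per-class values over all evaluated categories.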



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each score is followed by the method's rank in that column.




Method | Info | avg ap | head ap | common ap | tail ap | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
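DINO3D-Scannet200 (copyleft): avg ap 0.346 (1), head ap 0.437 (1), common ap 0.353 (1), tail ap 0.229 (1); per-class ap: 0.687 (1), 0.174 (1), 0.333 (1), 0.000 (1), 0.042 (2), 0.000 (1), 0.094 (1), 0.384 (2), 0.618 (1), 0.940 (1), 0.764 (1), 0.292 (7), 0.889 (2), 0.042 (1), 0.000 (4), 0.142 (1), 0.000 (2), 0.456 (1), 0.263 (1), 0.371 (3), 0.407 (1), 0.250 (1), 0.257 (1), 0.000 (1), 0.000 (2), 0.642 (1), 0.431 (1), 1.000 (1), 0.000 (1), 0.250 (2), 0.028 (2), 0.594 (1), 0.436 (2), 0.729 (1), 0.000 (2), 0.138 (1), 0.192 (2), 0.206 (1), 0.083 (4), 0.000 (2), 0.000 (1), 0.611 (1), 0.574 (2), 0.306 (2), 0.719 (1), 1.000 (1), 0.733 (1), 0.066 (1), 0.361 (1), 0.545 (1), 0.000 (3), 0.585 (1), 0.388 (1), 0.558 (2), 0.639 (1), 0.400 (1), 0.659 (1), 0.183 (1), 0.297 (1), 0.246 (4), 0.199 (1), 0.373 (1), 0.446 (1), 0.000 (2), 0.378 (3), 0.156 (2), 0.500 (1), 0.772 (1), 0.111 (2), 0.253 (2), 0.752 (1), 0.477 (2), 0.325 (2), 0.282 (1), 0.551 (1), 0.504 (1), 0.241 (2), 0.000 (1), 0.000 (1), 0.156 (4), 0.238 (1), 0.251 (1), 0.000 (4), 0.000 (1), 0.000 (6), 0.599 (1), 0.712 (1), 0.750 (1), 0.266 (1), 0.766 (1), 0.000 (4), 0.000 (1), 0.628 (1), 0.082 (1), 0.001 (4), 0.417 (2), 0.000 (2), 0.014 (3), 0.708 (1), 0.536 (3), 0.516 (1), 0.328 (2), 0.500 (1), 0.669 (1), 0.529 (1), 0.027 (2), 0.732 (1), 0.764 (1), 0.365 (1), 0.000 (1), 0.250 (2), 0.000 (2), 0.921 (1), 0.063 (1), 0.222 (2), 0.000 (1), 0.520 (1), 0.769 (1), 0.045 (4), 0.714 (2), 0.000 (4), 0.000 (3), 0.264 (1), 0.417 (1), 0.049 (4), 0.731 (4), 0.514 (3), 0.545 (2), 0.000 (2), 0.264 (1), 0.000 (3), 0.462 (2), 0.803 (1), 0.247 (2), 0.303 (1), 0.049 (3), 0.514 (3), 0.000 (3), 0.558 (2), 0.000 (2), 0.111 (1), 0.000 (2), 0.556 (1), 0.000 (1), 0.000 (1), 0.406 (1), 0.536 (1), 0.681 (1), 0.484 (1), 0.346 (1), 0.925 (1), 0.470 (1), 0.664 (2), 0.726 (1), 0.130 (5), 0.780 (1), 0.000 (1), 0.009 (1), 0.618 (2), 0.764 (5), 0.487 (3), 0.000 (1), 0.442 (1), 0.245 (6), 0.000 (1), 0.593 (1), 0.655 (1), 0.345 (3), 0.411 (1), 0.279 (1)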
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
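Mask3D Scannet200: avg ap 0.278 (2), head ap 0.383 (2), common ap 0.263 (3), tail ap 0.168 (2); per-class ap: 0.506 (2), 0.068 (2), 0.083 (7), 0.000 (1), 0.000 (3), 0.000 (1), 0.023 (4), 0.149 (6), 0.302 (2), 0.778 (4), 0.647 (2), 0.569 (1), 0.500 (3), 0.031 (2), 0.014 (3), 0.027 (4), 0.173 (1), 0.311 (2), 0.195 (2), 0.351 (4), 0.258 (3), 0.000 (2), 0.082 (3), 0.000 (1), 0.003 (1), 0.037 (4), 0.391 (3), 1.000 (1), 0.000 (1), 0.014 (4), 0.000 (3), 0.572 (2), 0.573 (1), 0.661 (3), 0.000 (2), 0.003 (3), 0.005 (6), 0.082 (6), 0.349 (2), 0.028 (1), 0.000 (1), 0.605 (2), 0.515 (5), 0.509 (1), 0.711 (2), 1.000 (1), 0.665 (4), 0.015 (4), 0.107 (3), 0.402 (5), 0.201 (1), 0.083 (3), 0.304 (2), 0.759 (1), 0.491 (2), 0.378 (2), 0.572 (2), 0.119 (2), 0.277 (2), 0.013 (7), 0.089 (3), 0.283 (3), 0.411 (3), 0.267 (1), 0.006 (5), 0.156 (2), 0.000 (2), 0.116 (2), 0.000 (3), 0.105 (5), 0.556 (3), 0.514 (1), 0.396 (1), 0.275 (2), 0.323 (2), 0.215 (3), 0.380 (1), 0.000 (1), 0.000 (1), 0.356 (2), 0.005 (3), 0.208 (2), 0.325 (1), 0.000 (1), 0.050 (5), 0.400 (2), 0.561 (2), 0.258 (3), 0.179 (2), 0.722 (2), 0.147 (2), 0.000 (1), 0.586 (2), 0.063 (2), 0.015 (2), 0.139 (3), 0.016 (1), 0.028 (2), 0.708 (1), 0.418 (4), 0.016 (3), 0.048 (5), 0.500 (1), 0.489 (2), 0.349 (2), 0.001 (4), 0.475 (4), 0.086 (3), 0.365 (2), 0.000 (1), 0.500 (1), 0.000 (2), 0.323 (5), 0.000 (2), 0.222 (2), 0.000 (1), 0.497 (2), 0.626 (2), 0.044 (5), 0.795 (1), 0.556 (2), 0.008 (2), 0.121 (6), 0.265 (2), 0.667 (1), 0.789 (1), 0.568 (2), 0.579 (1), 0.444 (1), 0.176 (3), 0.004 (2), 0.474 (1), 0.752 (2), 0.233 (3), 0.014 (3), 0.002 (6), 0.570 (2), 0.007 (1), 0.377 (7), 0.000 (2), 0.000 (3), 0.000 (2), 0.337 (3), 0.000 (1), 0.000 (1), 0.384 (2), 0.465 (2), 0.287 (3), 0.085 (2), 0.048 (4), 0.816 (7), 0.467 (2), 0.810 (1), 0.377 (3), 0.415 (1), 0.744 (2), 0.000 (1), 0.004 (2), 0.724 (1), 0.778 (2), 0.590 (1), 0.000 (1), 0.032 (3), 0.441 (2), 0.000 (1), 0.377 (3), 0.391 (2), 0.427 (2), 0.321 (2), 0.192 (2)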
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
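TD3D Scannet200 (permissive): avg ap 0.211 (4), head ap 0.332 (4), common ap 0.177 (4), tail ap 0.103 (4); per-class ap: 0.337 (4), 0.036 (4), 0.222 (6), 0.000 (1), 0.000 (3), 0.000 (1), 0.031 (3), 0.342 (3), 0.093 (6), 0.852 (2), 0.452 (6), 0.559 (2), 0.000 (4), 0.004 (3), 0.000 (4), 0.039 (2), 0.000 (2), 0.309 (3), 0.047 (6), 0.380 (2), 0.028 (4), 0.000 (2), 0.080 (4), 0.000 (1), 0.000 (2), 0.147 (3), 0.192 (5), 0.000 (4), 0.000 (1), 0.083 (3), 0.000 (3), 0.395 (3), 0.039 (6), 0.662 (2), 0.000 (2), 0.000 (4), 0.074 (3), 0.135 (3), 0.296 (3), 0.000 (2), 0.000 (1), 0.231 (6), 0.646 (1), 0.139 (5), 0.633 (4), 1.000 (1), 0.705 (2), 0.048 (2), 0.088 (4), 0.439 (3), 0.184 (2), 0.039 (4), 0.266 (3), 0.551 (3), 0.260 (5), 0.026 (7), 0.463 (4), 0.046 (5), 0.252 (3), 0.249 (3), 0.083 (4), 0.372 (2), 0.411 (2), 0.000 (2), 0.414 (2), 0.323 (1), 0.000 (2), 0.052 (3), 0.000 (3), 0.157 (3), 0.278 (4), 0.278 (4), 0.237 (4), 0.015 (4), 0.321 (3), 0.253 (2), 0.060 (6), 0.000 (1), 0.000 (1), 0.272 (3), 0.008 (2), 0.169 (3), 0.032 (3), 0.000 (1), 0.404 (1), 0.356 (3), 0.283 (4), 0.073 (5), 0.028 (7), 0.617 (3), 0.038 (3), 0.000 (1), 0.494 (3), 0.037 (3), 0.215 (1), 0.083 (4), 0.000 (2), 0.003 (4), 0.486 (4), 0.694 (1), 0.000 (4), 0.040 (6), 0.083 (6), 0.219 (7), 0.209 (4), 0.007 (3), 0.483 (3), 0.000 (4), 0.125 (5), 0.000 (1), 0.150 (4), 0.014 (1), 0.544 (3), 0.000 (2), 0.000 (4), 0.000 (1), 0.260 (6), 0.143 (7), 0.200 (2), 0.610 (4), 0.028 (3), 0.032 (1), 0.145 (3), 0.059 (4), 0.046 (5), 0.740 (3), 0.806 (1), 0.543 (3), 0.000 (2), 0.108 (4), 0.008 (1), 0.222 (7), 0.669 (3), 0.456 (1), 0.074 (2), 0.224 (1), 0.586 (1), 0.006 (2), 0.451 (4), 0.000 (2), 0.002 (2), 0.889 (1), 0.282 (4), 0.000 (1), 0.000 (1), 0.252 (4), 0.413 (3), 0.111 (4), 0.074 (3), 0.240 (3), 0.893 (2), 0.266 (4), 0.144 (5), 0.293 (4), 0.281 (2), 0.604 (4), 0.000 (1), 0.000 (3), 0.379 (7), 0.963 (1), 0.250 (6), 0.000 (1), 0.160 (2), 0.420 (3), 0.000 (1), 0.343 (4), 0.207 (4), 0.079 (7), 0.315 (3), 0.052 (4)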
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
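LGround Inst. (permissive): avg ap 0.154 (5), head ap 0.275 (5), common ap 0.108 (5), tail ap 0.060 (5); per-class ap: 0.295 (7), 0.002 (6), 0.278 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.006 (6), 0.272 (4), 0.064 (7), 0.815 (3), 0.503 (5), 0.333 (6), 0.000 (4), 0.000 (4), 0.556 (2), 0.001 (6), 0.000 (2), 0.148 (4), 0.078 (4), 0.448 (1), 0.007 (5), 0.000 (2), 0.024 (5), 0.000 (1), 0.000 (2), 0.000 (5), 0.190 (6), 0.000 (4), 0.000 (1), 0.000 (5), 0.000 (3), 0.209 (7), 0.031 (7), 0.573 (4), 0.000 (2), 0.000 (4), 0.041 (4), 0.099 (5), 0.037 (6), 0.000 (2), 0.000 (1), 0.327 (3), 0.364 (7), 0.181 (4), 0.642 (3), 1.000 (1), 0.654 (5), 0.000 (5), 0.023 (5), 0.429 (4), 0.000 (3), 0.000 (5), 0.097 (5), 0.000 (5), 0.278 (4), 0.267 (3), 0.434 (5), 0.048 (4), 0.092 (5), 0.257 (2), 0.030 (5), 0.097 (6), 0.189 (5), 0.000 (2), 0.089 (4), 0.000 (7), 0.000 (2), 0.000 (4), 0.000 (3), 0.115 (4), 0.166 (5), 0.222 (7), 0.222 (5), 0.003 (5), 0.127 (4), 0.213 (5), 0.169 (3), 0.000 (1), 0.000 (1), 0.000 (5), 0.000 (4), 0.044 (5), 0.000 (4), 0.000 (1), 0.000 (6), 0.000 (6), 0.268 (7), 0.222 (4), 0.130 (4), 0.494 (5), 0.000 (4), 0.000 (1), 0.363 (5), 0.015 (5), 0.000 (5), 0.000 (5), 0.000 (2), 0.000 (5), 0.611 (3), 0.400 (5), 0.000 (4), 0.056 (4), 0.278 (5), 0.242 (6), 0.180 (5), 0.000 (5), 0.383 (6), 0.000 (4), 0.209 (3), 0.000 (1), 0.000 (5), 0.000 (2), 0.364 (4), 0.000 (2), 0.000 (4), 0.000 (1), 0.323 (5), 0.302 (5), 0.019 (6), 0.654 (3), 0.000 (4), 0.000 (3), 0.141 (4), 0.045 (5), 0.000 (6), 0.427 (7), 0.514 (3), 0.143 (5), 0.000 (2), 0.028 (6), 0.000 (3), 0.252 (5), 0.402 (6), 0.156 (6), 0.000 (4), 0.028 (4), 0.470 (4), 0.000 (3), 0.444 (5), 0.000 (2), 0.000 (3), 0.000 (2), 0.205 (5), 0.000 (1), 0.000 (1), 0.203 (5), 0.381 (4), 0.026 (5), 0.037 (5), 0.000 (5), 0.881 (4), 0.099 (6), 0.135 (6), 0.239 (5), 0.000 (6), 0.585 (6), 0.000 (1), 0.000 (3), 0.616 (3), 0.778 (2), 0.322 (4), 0.000 (1), 0.000 (4), 0.407 (4), 0.000 (1), 0.333 (5), 0.148 (5), 0.177 (5), 0.242 (5), 0.028 (5)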
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
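ODIN - Ins200 (permissive): avg ap 0.265 (3), head ap 0.349 (3), common ap 0.268 (2), tail ap 0.163 (3); per-class ap: 0.360 (3), 0.054 (3), 0.278 (2), 0.000 (1), 0.125 (1), 0.000 (1), 0.031 (2), 0.506 (1), 0.266 (3), 0.630 (5), 0.609 (3), 0.481 (3), 0.903 (1), 0.000 (4), 1.000 (1), 0.032 (3), 0.000 (2), 0.022 (6), 0.138 (3), 0.314 (6), 0.310 (2), 0.000 (2), 0.178 (2), 0.000 (1), 0.000 (2), 0.552 (2), 0.421 (2), 0.889 (3), 0.000 (1), 0.451 (1), 0.097 (1), 0.357 (4), 0.054 (4), 0.485 (7), 0.052 (1), 0.040 (2), 0.210 (1), 0.160 (2), 0.370 (1), 0.000 (2), 0.000 (1), 0.191 (7), 0.529 (3), 0.250 (3), 0.617 (5), 1.000 (1), 0.492 (7), 0.016 (3), 0.197 (2), 0.324 (6), 0.000 (3), 0.250 (2), 0.265 (4), 0.167 (4), 0.317 (3), 0.200 (4), 0.549 (3), 0.107 (3), 0.231 (4), 0.119 (6), 0.141 (2), 0.253 (4), 0.267 (4), 0.000 (2), 0.565 (1), 0.111 (4), 0.000 (2), 0.000 (4), 0.278 (1), 0.285 (1), 0.665 (2), 0.389 (3), 0.306 (3), 0.077 (3), 0.037 (7), 0.186 (7), 0.156 (4), 0.000 (1), 0.000 (1), 0.478 (1), 0.000 (4), 0.091 (4), 0.204 (2), 0.000 (1), 0.345 (2), 0.200 (4), 0.550 (3), 0.674 (2), 0.160 (3), 0.526 (4), 0.438 (1), 0.000 (1), 0.476 (4), 0.035 (4), 0.003 (3), 0.444 (1), 0.000 (2), 0.333 (1), 0.361 (5), 0.606 (2), 0.083 (2), 0.332 (1), 0.417 (4), 0.327 (3), 0.297 (3), 0.035 (1), 0.615 (2), 0.281 (2), 0.083 (6), 0.000 (1), 0.250 (2), 0.000 (2), 0.610 (2), 0.000 (2), 0.333 (1), 0.000 (1), 0.238 (7), 0.481 (3), 0.218 (1), 0.440 (6), 1.000 (1), 0.000 (3), 0.229 (2), 0.257 (3), 0.000 (6), 0.746 (2), 0.361 (7), 0.188 (4), 0.000 (2), 0.221 (2), 0.000 (3), 0.320 (3), 0.655 (4), 0.193 (4), 0.000 (4), 0.067 (2), 0.389 (5), 0.000 (3), 0.594 (1), 0.037 (1), 0.000 (3), 0.000 (2), 0.371 (2), 0.000 (1), 0.000 (1), 0.344 (3), 0.366 (5), 0.506 (2), 0.074 (3), 0.250 (2), 0.848 (5), 0.451 (3), 0.389 (3), 0.546 (2), 0.205 (3), 0.698 (3), 0.000 (1), 0.000 (3), 0.494 (5), 0.769 (4), 0.493 (2), 0.000 (1), 0.000 (4), 0.463 (1), 0.000 (1), 0.333 (5), 0.333 (3), 0.640 (1), 0.251 (4), 0.115 (3)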
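Minkowski 34D Inst. (permissive): avg ap 0.130 (6), head ap 0.246 (6), common ap 0.083 (6), tail ap 0.043 (7); per-class ap: 0.299 (6), 0.000 (7), 0.278 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.022 (5), 0.175 (5), 0.122 (4), 0.537 (6), 0.521 (4), 0.400 (4), 0.000 (4), 0.000 (4), 0.000 (4), 0.008 (5), 0.000 (2), 0.048 (5), 0.076 (5), 0.182 (7), 0.000 (6), 0.000 (2), 0.022 (6), 0.000 (1), 0.000 (2), 0.000 (5), 0.141 (7), 0.000 (4), 0.000 (1), 0.000 (5), 0.000 (3), 0.210 (6), 0.063 (3), 0.547 (6), 0.000 (2), 0.000 (4), 0.000 (7), 0.100 (4), 0.026 (7), 0.000 (2), 0.000 (1), 0.241 (5), 0.488 (6), 0.000 (6), 0.564 (7), 1.000 (1), 0.672 (3), 0.000 (5), 0.021 (6), 0.486 (2), 0.000 (3), 0.000 (5), 0.067 (6), 0.000 (5), 0.194 (7), 0.033 (6), 0.415 (6), 0.026 (6), 0.025 (7), 0.271 (1), 0.004 (6), 0.094 (7), 0.142 (7), 0.000 (2), 0.000 (6), 0.111 (4), 0.000 (2), 0.000 (4), 0.000 (3), 0.088 (6), 0.083 (7), 0.278 (4), 0.110 (6), 0.000 (6), 0.082 (6), 0.199 (6), 0.137 (5), 0.000 (1), 0.000 (1), 0.000 (5), 0.000 (4), 0.041 (6), 0.000 (4), 0.000 (1), 0.308 (3), 0.067 (5), 0.280 (5), 0.016 (6), 0.101 (5), 0.373 (7), 0.000 (4), 0.000 (1), 0.319 (6), 0.007 (6), 0.000 (5), 0.000 (5), 0.000 (2), 0.000 (5), 0.028 (7), 0.355 (7), 0.000 (4), 0.101 (3), 0.444 (3), 0.289 (4), 0.114 (7), 0.000 (5), 0.394 (5), 0.000 (4), 0.032 (7), 0.000 (1), 0.000 (5), 0.000 (2), 0.201 (7), 0.000 (2), 0.000 (4), 0.000 (1), 0.384 (3), 0.248 (6), 0.000 (7), 0.529 (5), 0.000 (4), 0.000 (3), 0.133 (5), 0.020 (7), 0.089 (3), 0.720 (5), 0.500 (5), 0.099 (6), 0.000 (2), 0.000 (7), 0.000 (3), 0.238 (6), 0.334 (7), 0.190 (5), 0.000 (4), 0.000 (7), 0.317 (7), 0.000 (3), 0.472 (3), 0.000 (2), 0.000 (3), 0.000 (2), 0.094 (7), 0.000 (1), 0.000 (1), 0.082 (7), 0.236 (6), 0.004 (7), 0.019 (6), 0.000 (5), 0.883 (3), 0.061 (7), 0.262 (4), 0.217 (6), 0.000 (6), 0.557 (7), 0.000 (1), 0.000 (3), 0.460 (6), 0.761 (6), 0.156 (7), 0.000 (1), 0.000 (4), 0.259 (5), 0.000 (1), 0.394 (2), 0.019 (6), 0.084 (6), 0.232 (6), 0.000 (7)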
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
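CSC-Pretrain Inst. (permissive): avg ap 0.123 (7), head ap 0.223 (7), common ap 0.082 (7), tail ap 0.046 (6); per-class ap: 0.308 (5), 0.004 (5), 0.278 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.000 (7), 0.032 (7), 0.105 (5), 0.537 (6), 0.348 (7), 0.378 (5), 0.000 (4), 0.000 (4), 0.000 (4), 0.000 (7), 0.000 (2), 0.000 (7), 0.037 (7), 0.323 (5), 0.000 (6), 0.000 (2), 0.013 (7), 0.000 (1), 0.000 (2), 0.000 (5), 0.235 (4), 0.000 (4), 0.000 (1), 0.000 (5), 0.000 (3), 0.231 (5), 0.045 (5), 0.564 (5), 0.000 (2), 0.000 (4), 0.006 (5), 0.078 (7), 0.065 (5), 0.000 (2), 0.000 (1), 0.259 (4), 0.516 (4), 0.000 (6), 0.600 (6), 1.000 (1), 0.578 (6), 0.000 (5), 0.000 (7), 0.184 (7), 0.000 (3), 0.000 (5), 0.034 (7), 0.000 (5), 0.211 (6), 0.089 (5), 0.394 (7), 0.018 (7), 0.064 (6), 0.171 (5), 0.001 (7), 0.144 (5), 0.172 (6), 0.000 (2), 0.000 (6), 0.044 (6), 0.000 (2), 0.000 (4), 0.000 (3), 0.064 (7), 0.126 (6), 0.278 (4), 0.093 (7), 0.000 (6), 0.094 (5), 0.214 (4), 0.011 (7), 0.000 (1), 0.000 (1), 0.000 (5), 0.000 (4), 0.022 (7), 0.000 (4), 0.000 (1), 0.275 (4), 0.000 (6), 0.275 (6), 0.000 (7), 0.098 (6), 0.407 (6), 0.000 (4), 0.000 (1), 0.250 (7), 0.007 (7), 0.000 (5), 0.000 (5), 0.000 (2), 0.000 (5), 0.333 (6), 0.376 (6), 0.000 (4), 0.000 (7), 0.042 (7), 0.285 (5), 0.119 (6), 0.000 (5), 0.224 (7), 0.000 (4), 0.184 (4), 0.000 (1), 0.000 (5), 0.000 (2), 0.244 (6), 0.000 (2), 0.000 (4), 0.000 (1), 0.377 (4), 0.378 (4), 0.051 (3), 0.424 (7), 0.000 (4), 0.000 (3), 0.116 (7), 0.030 (6), 0.125 (2), 0.441 (6), 0.444 (6), 0.063 (7), 0.000 (2), 0.042 (5), 0.000 (3), 0.297 (4), 0.483 (5), 0.096 (7), 0.000 (4), 0.028 (4), 0.338 (6), 0.000 (3), 0.444 (5), 0.000 (2), 0.000 (3), 0.000 (2), 0.189 (6), 0.000 (1), 0.000 (1), 0.141 (6), 0.152 (7), 0.017 (6), 0.000 (7), 0.000 (5), 0.838 (6), 0.193 (5), 0.111 (7), 0.105 (7), 0.198 (4), 0.588 (5), 0.000 (1), 0.000 (3), 0.542 (4), 0.343 (7), 0.267 (5), 0.000 (1), 0.000 (4), 0.108 (7), 0.000 (1), 0.333 (5), 0.000 (7), 0.228 (4), 0.202 (7), 0.022 (6)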
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021