The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision (AP) for each class. We report the mean AP at an overlap (IoU) of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
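The metric described above can be illustrated with a minimal sketch. This is not the official evaluation script: the function names, the representation of a mask as a set of vertex indices, the strict `>` overlap test, and the greedy matching order are all assumptions made for illustration. It shows how a duplicate prediction of an already matched ground truth instance ends up as a false positive, and how AP is averaged over the overlap thresholds 0.5, 0.55, ..., 0.95.

```python
from typing import List, Set, Tuple

Prediction = Tuple[float, Set[int]]  # (confidence, vertex indices); hypothetical format


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two vertex-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Prediction], gts: List[Set[int]], thresh: float) -> float:
    """AP for one class at one overlap threshold (simplified sketch)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []  # True = true positive, False = false positive
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        overlaps = [iou(mask, gt) for gt in gts]
        best = max(range(len(gts)), key=lambda j: overlaps[j])
        # Assumption: a match needs overlap strictly above the threshold.
        if overlaps[best] > thresh and not matched[best]:
            matched[best] = True
            hits.append(True)
        else:
            # Too little overlap, or a duplicate of a matched instance.
            hits.append(False)
    # Average the precision measured at each true positive.
    total, tp = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            total += tp / rank
    return total / len(gts)


def ap_over_range(preds: List[Prediction], gts: List[Set[int]]) -> float:
    """Mean AP over the overlaps 0.5, 0.55, ..., 0.95."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy data: the second prediction duplicates the first one.
preds = [(0.9, {0, 1, 2, 3}), (0.8, {0, 1, 2, 3}), (0.7, {10, 11})]
gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
ap25 = average_precision(preds, gts, 0.25)
ap50 = average_precision(preds, gts, 0.5)
```

In the toy data, the duplicate lowers the precision measured at the third prediction, so AP 25% falls below 1 even though both ground truth instances are found. The benchmark score would then be the mean of such per-class values over all categories.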



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives the AP value followed by the method's rank in that column.




Method | Info | avg ap | head ap | common ap | tail ap | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
CompetitorFormer-200 | | 0.328 2 | 0.439 1 | 0.303 2 | 0.223 2 | 0.543 2 | 0.044 4 | 0.333 1 | 0.044 1 | 0.000 3 | 0.000 1 | 0.099 1 | 0.444 2 | 0.296 3 | 0.850 3 | 0.722 2 | 0.820 1 | 0.444 4 | 0.047 1 | 0.083 3 | 0.183 1 | 0.562 1 | 0.243 4 | 0.312 1 | 0.380 2 | 0.192 4 | 1.000 1 | 0.143 3 | 0.000 1 | 0.000 2 | 0.484 3 | 0.259 4 | 1.000 1 | 0.000 1 | 0.500 1 | 0.000 3 | 0.650 1 | 0.221 3 | 0.771 1 | 0.004 2 | 0.010 3 | 0.043 4 | 0.120 4 | 0.366 2 | 0.054 1 | 0.000 1 | 0.689 1 | 0.641 2 | 0.500 2 | 0.663 3 | 1.000 1 | 0.673 3 | 0.049 2 | 0.400 1 | 0.479 3 | 0.014 3 | 0.267 2 | 0.455 1 | 0.083 5 | 0.400 3 | 0.400 1 | 0.663 1 | 0.243 1 | 0.464 1 | 0.192 5 | 0.076 5 | 0.427 1 | 0.620 1 | 0.025 2 | 0.013 5 | 0.322 2 | 0.000 2 | 0.677 2 | 0.333 1 | 0.178 3 | 0.808 1 | 0.556 1 | 0.356 2 | 0.345 1 | 0.119 5 | 0.346 2 | 0.312 2 | 0.000 1 | 0.000 1 | 0.305 3 | 0.116 2 | 0.137 4 | 0.000 4 | 0.065 1 | 0.171 5 | 0.314 4 | 0.575 2 | 0.487 3 | 0.303 1 | 0.820 1 | 0.000 4 | 0.000 1 | 0.655 1 | 0.088 1 | 0.373 1 | 0.430 2 | 0.011 2 | 0.103 2 | 0.835 1 | 0.569 3 | 0.125 2 | 0.123 3 | 0.500 1 | 0.774 1 | 0.504 2 | 0.019 3 | 0.465 5 | 0.353 2 | 0.475 1 | 0.000 1 | 0.500 1 | 0.000 2 | 0.712 2 | 0.050 2 | 0.667 1 | 0.000 1 | 0.396 3 | 0.555 3 | 0.120 3 | 0.786 2 | 0.069 3 | 0.000 3 | 0.182 3 | 0.390 2 | 0.000 6 | 0.831 1 | 1.000 1 | 0.679 1 | 0.111 2 | 0.110 4 | 0.000 3 | 0.450 3 | 0.868 1 | 0.277 2 | 0.083 2 | 0.069 2 | 0.471 4 | 0.001 3 | 0.428 7 | 0.000 2 | 0.000 3 | 0.000 2 | 0.421 2 | 0.043 1 | 0.000 1 | 0.358 3 | 0.456 3 | 0.518 2 | 0.237 2 | 0.256 2 | 0.945 1 | 0.271 4 | 0.632 3 | 0.534 3 | 0.208 3 | 0.730 3 | 0.000 1 | 0.140 1 | 0.658 2 | 1.000 1 | 0.452 4 | 0.000 1 | 0.082 3 | 0.441 3 | 0.000 1 | 0.472 2 | 0.060 6 | 0.454 2 | 0.469 1 | 0.384 1
DINO3D-Scannet200 | copyleft | 0.346 1 | 0.437 2 | 0.353 1 | 0.229 1 | 0.687 1 | 0.174 1 | 0.333 1 | 0.000 2 | 0.042 2 | 0.000 1 | 0.094 2 | 0.384 3 | 0.618 1 | 0.940 1 | 0.764 1 | 0.292 8 | 0.889 2 | 0.042 2 | 0.000 5 | 0.142 2 | 0.000 3 | 0.456 1 | 0.263 2 | 0.371 4 | 0.407 1 | 0.250 2 | 0.257 1 | 0.000 1 | 0.000 2 | 0.642 1 | 0.431 1 | 1.000 1 | 0.000 1 | 0.250 3 | 0.028 2 | 0.594 2 | 0.436 2 | 0.729 2 | 0.000 3 | 0.138 1 | 0.192 2 | 0.206 1 | 0.083 5 | 0.000 3 | 0.000 1 | 0.611 2 | 0.574 3 | 0.306 3 | 0.719 1 | 1.000 1 | 0.733 1 | 0.066 1 | 0.361 2 | 0.545 1 | 0.000 4 | 0.585 1 | 0.388 2 | 0.558 2 | 0.639 1 | 0.400 1 | 0.659 2 | 0.183 2 | 0.297 2 | 0.246 4 | 0.199 1 | 0.373 2 | 0.446 2 | 0.000 3 | 0.378 3 | 0.156 3 | 0.500 1 | 0.772 1 | 0.111 3 | 0.253 2 | 0.752 2 | 0.477 3 | 0.325 3 | 0.282 2 | 0.551 1 | 0.504 1 | 0.241 3 | 0.000 1 | 0.000 1 | 0.156 5 | 0.238 1 | 0.251 1 | 0.000 4 | 0.000 2 | 0.000 7 | 0.599 1 | 0.712 1 | 0.750 1 | 0.266 2 | 0.766 2 | 0.000 4 | 0.000 1 | 0.628 2 | 0.082 2 | 0.001 5 | 0.417 3 | 0.000 3 | 0.014 4 | 0.708 2 | 0.536 4 | 0.516 1 | 0.328 2 | 0.500 1 | 0.669 2 | 0.529 1 | 0.027 2 | 0.732 1 | 0.764 1 | 0.365 2 | 0.000 1 | 0.250 3 | 0.000 2 | 0.921 1 | 0.063 1 | 0.222 3 | 0.000 1 | 0.520 1 | 0.769 1 | 0.045 5 | 0.714 3 | 0.000 5 | 0.000 3 | 0.264 1 | 0.417 1 | 0.049 4 | 0.731 5 | 0.514 4 | 0.545 3 | 0.000 3 | 0.264 1 | 0.000 3 | 0.462 2 | 0.803 2 | 0.247 3 | 0.303 1 | 0.049 4 | 0.514 3 | 0.000 4 | 0.558 2 | 0.000 2 | 0.111 1 | 0.000 2 | 0.556 1 | 0.000 2 | 0.000 1 | 0.406 1 | 0.536 1 | 0.681 1 | 0.484 1 | 0.346 1 | 0.925 2 | 0.470 1 | 0.664 2 | 0.726 1 | 0.130 6 | 0.780 1 | 0.000 1 | 0.009 2 | 0.618 3 | 0.764 6 | 0.487 3 | 0.000 1 | 0.442 1 | 0.245 7 | 0.000 1 | 0.593 1 | 0.655 1 | 0.345 4 | 0.411 2 | 0.279 2
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
ODIN - Ins200 | permissive | 0.265 4 | 0.349 4 | 0.268 3 | 0.163 4 | 0.360 4 | 0.054 3 | 0.278 3 | 0.000 2 | 0.125 1 | 0.000 1 | 0.031 3 | 0.506 1 | 0.266 4 | 0.630 6 | 0.609 4 | 0.481 4 | 0.903 1 | 0.000 5 | 1.000 1 | 0.032 4 | 0.000 3 | 0.022 7 | 0.138 4 | 0.314 7 | 0.310 2 | 0.000 3 | 0.178 2 | 0.000 1 | 0.000 2 | 0.552 2 | 0.421 2 | 0.889 4 | 0.000 1 | 0.451 2 | 0.097 1 | 0.357 5 | 0.054 5 | 0.485 8 | 0.052 1 | 0.040 2 | 0.210 1 | 0.160 2 | 0.370 1 | 0.000 3 | 0.000 1 | 0.191 8 | 0.529 4 | 0.250 4 | 0.617 6 | 1.000 1 | 0.492 8 | 0.016 4 | 0.197 3 | 0.324 7 | 0.000 4 | 0.250 3 | 0.265 5 | 0.167 4 | 0.317 4 | 0.200 5 | 0.549 4 | 0.107 4 | 0.231 5 | 0.119 7 | 0.141 2 | 0.253 5 | 0.267 5 | 0.000 3 | 0.565 1 | 0.111 5 | 0.000 2 | 0.000 5 | 0.278 2 | 0.285 1 | 0.665 3 | 0.389 4 | 0.306 4 | 0.077 4 | 0.037 8 | 0.186 8 | 0.156 5 | 0.000 1 | 0.000 1 | 0.478 1 | 0.000 5 | 0.091 5 | 0.204 2 | 0.000 2 | 0.345 2 | 0.200 5 | 0.550 4 | 0.674 2 | 0.160 4 | 0.526 5 | 0.438 1 | 0.000 1 | 0.476 5 | 0.035 5 | 0.003 4 | 0.444 1 | 0.000 3 | 0.333 1 | 0.361 6 | 0.606 2 | 0.083 3 | 0.332 1 | 0.417 5 | 0.327 4 | 0.297 4 | 0.035 1 | 0.615 2 | 0.281 3 | 0.083 7 | 0.000 1 | 0.250 3 | 0.000 2 | 0.610 3 | 0.000 3 | 0.333 2 | 0.000 1 | 0.238 8 | 0.481 4 | 0.218 1 | 0.440 7 | 1.000 1 | 0.000 3 | 0.229 2 | 0.257 4 | 0.000 6 | 0.746 3 | 0.361 8 | 0.188 5 | 0.000 3 | 0.221 2 | 0.000 3 | 0.320 4 | 0.655 5 | 0.193 5 | 0.000 5 | 0.067 3 | 0.389 6 | 0.000 4 | 0.594 1 | 0.037 1 | 0.000 3 | 0.000 2 | 0.371 3 | 0.000 2 | 0.000 1 | 0.344 4 | 0.366 6 | 0.506 3 | 0.074 4 | 0.250 3 | 0.848 6 | 0.451 3 | 0.389 4 | 0.546 2 | 0.205 4 | 0.698 4 | 0.000 1 | 0.000 4 | 0.494 6 | 0.769 5 | 0.493 2 | 0.000 1 | 0.000 5 | 0.463 1 | 0.000 1 | 0.333 6 | 0.333 3 | 0.640 1 | 0.251 5 | 0.115 4
Minkowski 34D Inst. | permissive | 0.130 7 | 0.246 7 | 0.083 7 | 0.043 8 | 0.299 7 | 0.000 8 | 0.278 3 | 0.000 2 | 0.000 3 | 0.000 1 | 0.022 6 | 0.175 6 | 0.122 5 | 0.537 7 | 0.521 5 | 0.400 5 | 0.000 5 | 0.000 5 | 0.000 5 | 0.008 6 | 0.000 3 | 0.048 6 | 0.076 6 | 0.182 8 | 0.000 7 | 0.000 3 | 0.022 7 | 0.000 1 | 0.000 2 | 0.000 6 | 0.141 8 | 0.000 5 | 0.000 1 | 0.000 6 | 0.000 3 | 0.210 7 | 0.063 4 | 0.547 7 | 0.000 3 | 0.000 5 | 0.000 8 | 0.100 5 | 0.026 8 | 0.000 3 | 0.000 1 | 0.241 6 | 0.488 7 | 0.000 7 | 0.564 8 | 1.000 1 | 0.672 4 | 0.000 6 | 0.021 7 | 0.486 2 | 0.000 4 | 0.000 6 | 0.067 7 | 0.000 6 | 0.194 8 | 0.033 7 | 0.415 7 | 0.026 7 | 0.025 8 | 0.271 1 | 0.004 7 | 0.094 8 | 0.142 8 | 0.000 3 | 0.000 7 | 0.111 5 | 0.000 2 | 0.000 5 | 0.000 4 | 0.088 7 | 0.083 8 | 0.278 5 | 0.110 7 | 0.000 7 | 0.082 7 | 0.199 7 | 0.137 6 | 0.000 1 | 0.000 1 | 0.000 6 | 0.000 5 | 0.041 7 | 0.000 4 | 0.000 2 | 0.308 3 | 0.067 6 | 0.280 6 | 0.016 7 | 0.101 6 | 0.373 8 | 0.000 4 | 0.000 1 | 0.319 7 | 0.007 7 | 0.000 6 | 0.000 6 | 0.000 3 | 0.000 6 | 0.028 8 | 0.355 8 | 0.000 5 | 0.101 4 | 0.444 4 | 0.289 5 | 0.114 8 | 0.000 6 | 0.394 6 | 0.000 5 | 0.032 8 | 0.000 1 | 0.000 6 | 0.000 2 | 0.201 8 | 0.000 3 | 0.000 5 | 0.000 1 | 0.384 4 | 0.248 7 | 0.000 8 | 0.529 6 | 0.000 5 | 0.000 3 | 0.133 6 | 0.020 8 | 0.089 3 | 0.720 6 | 0.500 6 | 0.099 7 | 0.000 3 | 0.000 8 | 0.000 3 | 0.238 7 | 0.334 8 | 0.190 6 | 0.000 5 | 0.000 8 | 0.317 8 | 0.000 4 | 0.472 3 | 0.000 2 | 0.000 3 | 0.000 2 | 0.094 8 | 0.000 2 | 0.000 1 | 0.082 8 | 0.236 7 | 0.004 8 | 0.019 7 | 0.000 6 | 0.883 4 | 0.061 8 | 0.262 5 | 0.217 7 | 0.000 7 | 0.557 8 | 0.000 1 | 0.000 4 | 0.460 7 | 0.761 7 | 0.156 8 | 0.000 1 | 0.000 5 | 0.259 6 | 0.000 1 | 0.394 3 | 0.019 7 | 0.084 7 | 0.232 7 | 0.000 8
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.123 8 | 0.223 8 | 0.082 8 | 0.046 7 | 0.308 6 | 0.004 6 | 0.278 3 | 0.000 2 | 0.000 3 | 0.000 1 | 0.000 8 | 0.032 8 | 0.105 6 | 0.537 7 | 0.348 8 | 0.378 6 | 0.000 5 | 0.000 5 | 0.000 5 | 0.000 8 | 0.000 3 | 0.000 8 | 0.037 8 | 0.323 6 | 0.000 7 | 0.000 3 | 0.013 8 | 0.000 1 | 0.000 2 | 0.000 6 | 0.235 5 | 0.000 5 | 0.000 1 | 0.000 6 | 0.000 3 | 0.231 6 | 0.045 6 | 0.564 6 | 0.000 3 | 0.000 5 | 0.006 6 | 0.078 8 | 0.065 6 | 0.000 3 | 0.000 1 | 0.259 5 | 0.516 5 | 0.000 7 | 0.600 7 | 1.000 1 | 0.578 7 | 0.000 6 | 0.000 8 | 0.184 8 | 0.000 4 | 0.000 6 | 0.034 8 | 0.000 6 | 0.211 7 | 0.089 6 | 0.394 8 | 0.018 8 | 0.064 7 | 0.171 6 | 0.001 8 | 0.144 6 | 0.172 7 | 0.000 3 | 0.000 7 | 0.044 7 | 0.000 2 | 0.000 5 | 0.000 4 | 0.064 8 | 0.126 7 | 0.278 5 | 0.093 8 | 0.000 7 | 0.094 6 | 0.214 5 | 0.011 8 | 0.000 1 | 0.000 1 | 0.000 6 | 0.000 5 | 0.022 8 | 0.000 4 | 0.000 2 | 0.275 4 | 0.000 7 | 0.275 7 | 0.000 8 | 0.098 7 | 0.407 7 | 0.000 4 | 0.000 1 | 0.250 8 | 0.007 8 | 0.000 6 | 0.000 6 | 0.000 3 | 0.000 6 | 0.333 7 | 0.376 7 | 0.000 5 | 0.000 8 | 0.042 8 | 0.285 6 | 0.119 7 | 0.000 6 | 0.224 8 | 0.000 5 | 0.184 5 | 0.000 1 | 0.000 6 | 0.000 2 | 0.244 7 | 0.000 3 | 0.000 5 | 0.000 1 | 0.377 5 | 0.378 5 | 0.051 4 | 0.424 8 | 0.000 5 | 0.000 3 | 0.116 8 | 0.030 7 | 0.125 2 | 0.441 7 | 0.444 7 | 0.063 8 | 0.000 3 | 0.042 6 | 0.000 3 | 0.297 5 | 0.483 6 | 0.096 8 | 0.000 5 | 0.028 5 | 0.338 7 | 0.000 4 | 0.444 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.189 7 | 0.000 2 | 0.000 1 | 0.141 7 | 0.152 8 | 0.017 7 | 0.000 8 | 0.000 6 | 0.838 7 | 0.193 6 | 0.111 8 | 0.105 8 | 0.198 5 | 0.588 6 | 0.000 1 | 0.000 4 | 0.542 5 | 0.343 8 | 0.267 6 | 0.000 1 | 0.000 5 | 0.108 8 | 0.000 1 | 0.333 6 | 0.000 8 | 0.228 5 | 0.202 8 | 0.022 7
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.154 6 | 0.275 6 | 0.108 6 | 0.060 6 | 0.295 8 | 0.002 7 | 0.278 3 | 0.000 2 | 0.000 3 | 0.000 1 | 0.006 7 | 0.272 5 | 0.064 8 | 0.815 4 | 0.503 6 | 0.333 7 | 0.000 5 | 0.000 5 | 0.556 2 | 0.001 7 | 0.000 3 | 0.148 5 | 0.078 5 | 0.448 1 | 0.007 6 | 0.000 3 | 0.024 6 | 0.000 1 | 0.000 2 | 0.000 6 | 0.190 7 | 0.000 5 | 0.000 1 | 0.000 6 | 0.000 3 | 0.209 8 | 0.031 8 | 0.573 5 | 0.000 3 | 0.000 5 | 0.041 5 | 0.099 6 | 0.037 7 | 0.000 3 | 0.000 1 | 0.327 4 | 0.364 8 | 0.181 5 | 0.642 4 | 1.000 1 | 0.654 6 | 0.000 6 | 0.023 6 | 0.429 5 | 0.000 4 | 0.000 6 | 0.097 6 | 0.000 6 | 0.278 5 | 0.267 4 | 0.434 6 | 0.048 5 | 0.092 6 | 0.257 2 | 0.030 6 | 0.097 7 | 0.189 6 | 0.000 3 | 0.089 4 | 0.000 8 | 0.000 2 | 0.000 5 | 0.000 4 | 0.115 5 | 0.166 6 | 0.222 8 | 0.222 6 | 0.003 6 | 0.127 4 | 0.213 6 | 0.169 4 | 0.000 1 | 0.000 1 | 0.000 6 | 0.000 5 | 0.044 6 | 0.000 4 | 0.000 2 | 0.000 7 | 0.000 7 | 0.268 8 | 0.222 5 | 0.130 5 | 0.494 6 | 0.000 4 | 0.000 1 | 0.363 6 | 0.015 6 | 0.000 6 | 0.000 6 | 0.000 3 | 0.000 6 | 0.611 4 | 0.400 6 | 0.000 5 | 0.056 5 | 0.278 6 | 0.242 7 | 0.180 6 | 0.000 6 | 0.383 7 | 0.000 5 | 0.209 4 | 0.000 1 | 0.000 6 | 0.000 2 | 0.364 5 | 0.000 3 | 0.000 5 | 0.000 1 | 0.323 6 | 0.302 6 | 0.019 7 | 0.654 4 | 0.000 5 | 0.000 3 | 0.141 5 | 0.045 6 | 0.000 6 | 0.427 8 | 0.514 4 | 0.143 6 | 0.000 3 | 0.028 7 | 0.000 3 | 0.252 6 | 0.402 7 | 0.156 7 | 0.000 5 | 0.028 5 | 0.470 5 | 0.000 4 | 0.444 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.205 6 | 0.000 2 | 0.000 1 | 0.203 6 | 0.381 5 | 0.026 6 | 0.037 6 | 0.000 6 | 0.881 5 | 0.099 7 | 0.135 7 | 0.239 6 | 0.000 7 | 0.585 7 | 0.000 1 | 0.000 4 | 0.616 4 | 0.778 3 | 0.322 5 | 0.000 1 | 0.000 5 | 0.407 5 | 0.000 1 | 0.333 6 | 0.148 5 | 0.177 6 | 0.242 6 | 0.028 6
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
TD3D Scannet200 | permissive | 0.211 5 | 0.332 5 | 0.177 5 | 0.103 5 | 0.337 5 | 0.036 5 | 0.222 7 | 0.000 2 | 0.000 3 | 0.000 1 | 0.031 4 | 0.342 4 | 0.093 7 | 0.852 2 | 0.452 7 | 0.559 3 | 0.000 5 | 0.004 4 | 0.000 5 | 0.039 3 | 0.000 3 | 0.309 3 | 0.047 7 | 0.380 3 | 0.028 5 | 0.000 3 | 0.080 5 | 0.000 1 | 0.000 2 | 0.147 4 | 0.192 6 | 0.000 5 | 0.000 1 | 0.083 4 | 0.000 3 | 0.395 4 | 0.039 7 | 0.662 3 | 0.000 3 | 0.000 5 | 0.074 3 | 0.135 3 | 0.296 4 | 0.000 3 | 0.000 1 | 0.231 7 | 0.646 1 | 0.139 6 | 0.633 5 | 1.000 1 | 0.705 2 | 0.048 3 | 0.088 5 | 0.439 4 | 0.184 2 | 0.039 5 | 0.266 4 | 0.551 3 | 0.260 6 | 0.026 8 | 0.463 5 | 0.046 6 | 0.252 4 | 0.249 3 | 0.083 4 | 0.372 3 | 0.411 3 | 0.000 3 | 0.414 2 | 0.323 1 | 0.000 2 | 0.052 4 | 0.000 4 | 0.157 4 | 0.278 5 | 0.278 5 | 0.237 5 | 0.015 5 | 0.321 3 | 0.253 3 | 0.060 7 | 0.000 1 | 0.000 1 | 0.272 4 | 0.008 3 | 0.169 3 | 0.032 3 | 0.000 2 | 0.404 1 | 0.356 3 | 0.283 5 | 0.073 6 | 0.028 8 | 0.617 4 | 0.038 3 | 0.000 1 | 0.494 4 | 0.037 4 | 0.215 2 | 0.083 5 | 0.000 3 | 0.003 5 | 0.486 5 | 0.694 1 | 0.000 5 | 0.040 7 | 0.083 7 | 0.219 8 | 0.209 5 | 0.007 4 | 0.483 3 | 0.000 5 | 0.125 6 | 0.000 1 | 0.150 5 | 0.014 1 | 0.544 4 | 0.000 3 | 0.000 5 | 0.000 1 | 0.260 7 | 0.143 8 | 0.200 2 | 0.610 5 | 0.028 4 | 0.032 1 | 0.145 4 | 0.059 5 | 0.046 5 | 0.740 4 | 0.806 2 | 0.543 4 | 0.000 3 | 0.108 5 | 0.008 1 | 0.222 8 | 0.669 4 | 0.456 1 | 0.074 3 | 0.224 1 | 0.586 1 | 0.006 2 | 0.451 4 | 0.000 2 | 0.002 2 | 0.889 1 | 0.282 5 | 0.000 2 | 0.000 1 | 0.252 5 | 0.413 4 | 0.111 5 | 0.074 4 | 0.240 4 | 0.893 3 | 0.266 5 | 0.144 6 | 0.293 5 | 0.281 2 | 0.604 5 | 0.000 1 | 0.000 4 | 0.379 8 | 0.963 2 | 0.250 7 | 0.000 1 | 0.160 2 | 0.420 4 | 0.000 1 | 0.343 5 | 0.207 4 | 0.079 8 | 0.315 4 | 0.052 5
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Mask3D Scannet200 | | 0.278 3 | 0.383 3 | 0.263 4 | 0.168 3 | 0.506 3 | 0.068 2 | 0.083 8 | 0.000 2 | 0.000 3 | 0.000 1 | 0.023 5 | 0.149 7 | 0.302 2 | 0.778 5 | 0.647 3 | 0.569 2 | 0.500 3 | 0.031 3 | 0.014 4 | 0.027 5 | 0.173 2 | 0.311 2 | 0.195 3 | 0.351 5 | 0.258 3 | 0.000 3 | 0.082 4 | 0.000 1 | 0.003 1 | 0.037 5 | 0.391 3 | 1.000 1 | 0.000 1 | 0.014 5 | 0.000 3 | 0.572 3 | 0.573 1 | 0.661 4 | 0.000 3 | 0.003 4 | 0.005 7 | 0.082 7 | 0.349 3 | 0.028 2 | 0.000 1 | 0.605 3 | 0.515 6 | 0.509 1 | 0.711 2 | 1.000 1 | 0.665 5 | 0.015 5 | 0.107 4 | 0.402 6 | 0.201 1 | 0.083 4 | 0.304 3 | 0.759 1 | 0.491 2 | 0.378 3 | 0.572 3 | 0.119 3 | 0.277 3 | 0.013 8 | 0.089 3 | 0.283 4 | 0.411 4 | 0.267 1 | 0.006 6 | 0.156 3 | 0.000 2 | 0.116 3 | 0.000 4 | 0.105 6 | 0.556 4 | 0.514 2 | 0.396 1 | 0.275 3 | 0.323 2 | 0.215 4 | 0.380 1 | 0.000 1 | 0.000 1 | 0.356 2 | 0.005 4 | 0.208 2 | 0.325 1 | 0.000 2 | 0.050 6 | 0.400 2 | 0.561 3 | 0.258 4 | 0.179 3 | 0.722 3 | 0.147 2 | 0.000 1 | 0.586 3 | 0.063 3 | 0.015 3 | 0.139 4 | 0.016 1 | 0.028 3 | 0.708 2 | 0.418 5 | 0.016 4 | 0.048 6 | 0.500 1 | 0.489 3 | 0.349 3 | 0.001 5 | 0.475 4 | 0.086 4 | 0.365 3 | 0.000 1 | 0.500 1 | 0.000 2 | 0.323 6 | 0.000 3 | 0.222 3 | 0.000 1 | 0.497 2 | 0.626 2 | 0.044 6 | 0.795 1 | 0.556 2 | 0.008 2 | 0.121 7 | 0.265 3 | 0.667 1 | 0.789 2 | 0.568 3 | 0.579 2 | 0.444 1 | 0.176 3 | 0.004 2 | 0.474 1 | 0.752 3 | 0.233 4 | 0.014 4 | 0.002 7 | 0.570 2 | 0.007 1 | 0.377 8 | 0.000 2 | 0.000 3 | 0.000 2 | 0.337 4 | 0.000 2 | 0.000 1 | 0.384 2 | 0.465 2 | 0.287 4 | 0.085 3 | 0.048 5 | 0.816 8 | 0.467 2 | 0.810 1 | 0.377 4 | 0.415 1 | 0.744 2 | 0.000 1 | 0.004 3 | 0.724 1 | 0.778 3 | 0.590 1 | 0.000 1 | 0.032 4 | 0.441 2 | 0.000 1 | 0.377 4 | 0.391 2 | 0.427 3 | 0.321 3 | 0.192 3
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023