The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

Similar to the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by average precision (AP), computed separately for each class. We report the mean average precision at an overlap (IoU) of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05], i.e. from 0.5 to 0.95 in steps of 0.05 (AP), for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives, since each ground-truth instance can be matched by at most one prediction.
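The AP computation described above can be sketched for a single class as follows. This is a minimal sketch under stated assumptions, not the official ScanNet evaluation script: it assumes an IoU matrix between predicted and ground-truth instance masks has already been computed, matches predictions greedily in descending confidence order (so duplicates become false positives), and integrates an all-point interpolated precision-recall curve. The function names `average_precision` and `ap_over_range` are hypothetical, and the official code may differ in details such as tie-breaking and interpolation.

```python
import math
from typing import List, Sequence


def average_precision(
    scores: Sequence[float],
    ious: Sequence[Sequence[float]],
    num_gt: int,
    iou_threshold: float,
) -> float:
    """AP for one class at one overlap (IoU) threshold (illustrative sketch).

    scores[i]  -- confidence of predicted instance i
    ious[i][j] -- IoU between predicted instance i and ground-truth instance j
    num_gt     -- number of ground-truth instances of this class
    """
    if num_gt == 0:
        return math.nan  # AP is undefined without ground truth
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    matched = [False] * num_gt
    tp_flags: List[int] = []
    for i in order:
        # Pick the best still-unmatched ground truth at or above the threshold.
        best_j, best_iou = -1, iou_threshold
        for j in range(num_gt):
            if not matched[j] and ious[i][j] >= best_iou:
                best_j, best_iou = j, ious[i][j]
        if best_j >= 0:
            matched[best_j] = True
            tp_flags.append(1)
        else:
            # No free match: a duplicate of an already-matched instance
            # (or a spurious prediction) counts as a false positive.
            tp_flags.append(0)

    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Precision envelope: make precision non-increasing along the ranking.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def ap_over_range(
    scores: Sequence[float], ious: Sequence[Sequence[float]], num_gt: int
) -> float:
    """Mean AP over overlaps 0.50, 0.55, ..., 0.95 (the [0.5:0.95:0.05] range)."""
    thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
    return sum(average_precision(scores, ious, num_gt, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    # Toy scene: two ground-truth instances; prediction 1 duplicates instance 0.
    scores = [0.9, 0.8, 0.6]
    ious = [[0.83, 0.0], [0.70, 0.0], [0.0, 0.58]]
    print(f"AP 25%: {average_precision(scores, ious, 2, 0.25):.3f}")  # 0.833
    print(f"AP 50%: {average_precision(scores, ious, 2, 0.50):.3f}")  # 0.833
    print(f"AP:     {ap_over_range(scores, ious, 2):.3f}")            # 0.417
```

In the toy scene the duplicate prediction lowers AP 50% from 1.0 to about 0.833 even though both instances are eventually found, which is exactly the false-positive penalty described above.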



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Method | Info | avg AP | AP head | AP common | AP tail | per-class AP
Per-class categories: alarm clock, armchair, backpack, bag, ball, bar, basket, bathroom cabinet, bathroom counter, bathroom stall, bathroom stall door, bathroom vanity, bathtub, bed, bench, bicycle, bin, blackboard, blanket, blinds, board, book, bookshelf, bottle, bowl, box, broom, bucket, bulletin board, cabinet, calendar, candle, cart, case of water bottles, cd case, ceiling, ceiling light, chair, clock, closet, closet door, closet rod, closet wall, clothes, clothes dryer, coat rack, coffee kettle, coffee maker, coffee table, column, computer tower, container, copier, couch, counter, crate, cup, curtain, cushion, decoration, desk, dining table, dish rack, dishwasher, divider, door, doorframe, dresser, dumbbell, dustpan, end table, fan, file cabinet, fire alarm, fire extinguisher, fireplace, folded chair, furniture, guitar, guitar case, hair dryer, handicap bar, hat, headphones, ironing board, jacket, keyboard, keyboard piano, kitchen cabinet, kitchen counter, ladder, lamp, laptop, laundry basket, laundry detergent, laundry hamper, ledge, light, light switch, luggage, machine, mailbox, mat, mattress, microwave, mini fridge, mirror, monitor, mouse, music stand, nightstand, object, office chair, ottoman, oven, paper, paper bag, paper cutter, paper towel dispenser, paper towel roll, person, piano, picture, pillar, pillow, pipe, plant, plate, plunger, poster, potted plant, power outlet, power strip, printer, projector, projector screen, purse, rack, radiator, rail, range hood, recycling bin, refrigerator, scale, seat, shelf, shoe, shower, shower curtain, shower curtain rod, shower door, shower floor, shower head, shower wall, sign, sink, soap dish, soap dispenser, sofa chair, speaker, stair rail, stairs, stand, stool, storage bin, storage container, storage organizer, stove, structure, stuffed animal, suitcase, table, telephone, tissue box, toaster, toaster oven, toilet, toilet paper, toilet paper dispenser, toilet paper holder, toilet seat cover dispenser, towel, trash bin, trash can, tray, tube, tv, tv stand, vacuum cleaner, vent, wardrobe, washing machine, water bottle, water cooler, water pitcher, whiteboard, window, windowsill.
Each score is an AP value; the number in parentheses is the method's rank among the listed methods.
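Mask3D Scannet200 | | 0.278 (1) | 0.383 (1) | 0.263 (2) | 0.168 (1) | 0.506 (1), 0.068 (1), 0.083 (6), 0.000 (1), 0.000 (2), 0.000 (1), 0.023 (3), 0.149 (5), 0.302 (1), 0.778 (3), 0.647 (1), 0.569 (1), 0.500 (2), 0.031 (1), 0.014 (3), 0.027 (3), 0.173 (1), 0.311 (1), 0.195 (1), 0.351 (3), 0.258 (2), 0.000 (1), 0.082 (2), 0.000 (1), 0.003 (1), 0.037 (3), 0.391 (2), 1.000 (1), 0.000 (1), 0.014 (3), 0.000 (2), 0.572 (1), 0.573 (1), 0.661 (2), 0.000 (2), 0.003 (2), 0.005 (5), 0.082 (5), 0.349 (2), 0.028 (1), 0.000 (1), 0.605 (1), 0.515 (4), 0.509 (1), 0.711 (1), 1.000 (1), 0.665 (3), 0.015 (3), 0.107 (2), 0.402 (4), 0.201 (1), 0.083 (2), 0.304 (1), 0.759 (1), 0.491 (1), 0.378 (1), 0.572 (1), 0.119 (1), 0.277 (1), 0.013 (6), 0.089 (2), 0.283 (2), 0.411 (2), 0.267 (1), 0.006 (4), 0.156 (2), 0.000 (1), 0.116 (1), 0.000 (2), 0.105 (4), 0.556 (2), 0.514 (1), 0.396 (1), 0.275 (1), 0.323 (1), 0.215 (2), 0.380 (1), 0.000 (1), 0.000 (1), 0.356 (2), 0.005 (2), 0.208 (1), 0.325 (1), 0.000 (1), 0.050 (5), 0.400 (1), 0.561 (1), 0.258 (2), 0.179 (1), 0.722 (1), 0.147 (2), 0.000 (1), 0.586 (1), 0.063 (1), 0.015 (2), 0.139 (2), 0.016 (1), 0.028 (2), 0.708 (1), 0.418 (3), 0.016 (2), 0.048 (4), 0.500 (1), 0.489 (1), 0.349 (1), 0.001 (3), 0.475 (3), 0.086 (2), 0.365 (1), 0.000 (1), 0.500 (1), 0.000 (2), 0.323 (4), 0.000 (1), 0.222 (2), 0.000 (1), 0.497 (1), 0.626 (1), 0.044 (4), 0.795 (1), 0.556 (2), 0.008 (2), 0.121 (5), 0.265 (1), 0.667 (1), 0.789 (1), 0.568 (2), 0.579 (1), 0.444 (1), 0.176 (2), 0.004 (2), 0.474 (1), 0.752 (1), 0.233 (2), 0.014 (2), 0.002 (5), 0.570 (2), 0.007 (1), 0.377 (6), 0.000 (2), 0.000 (2), 0.000 (2), 0.337 (2), 0.000 (1), 0.000 (1), 0.384 (1), 0.465 (1), 0.287 (2), 0.085 (1), 0.048 (3), 0.816 (6), 0.467 (1), 0.810 (1), 0.377 (2), 0.415 (1), 0.744 (1), 0.000 (1), 0.004 (1), 0.724 (1), 0.778 (2), 0.590 (1), 0.000 (1), 0.032 (2), 0.441 (2), 0.000 (1), 0.377 (2), 0.391 (1), 0.427 (2), 0.321 (1), 0.192 (1)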
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
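TD3D Scannet200 | permissive | 0.211 (3) | 0.332 (3) | 0.177 (3) | 0.103 (3) | 0.337 (3), 0.036 (3), 0.222 (5), 0.000 (1), 0.000 (2), 0.000 (1), 0.031 (2), 0.342 (2), 0.093 (5), 0.852 (1), 0.452 (5), 0.559 (2), 0.000 (3), 0.004 (2), 0.000 (4), 0.039 (1), 0.000 (2), 0.309 (2), 0.047 (5), 0.380 (2), 0.028 (3), 0.000 (1), 0.080 (3), 0.000 (1), 0.000 (2), 0.147 (2), 0.192 (4), 0.000 (3), 0.000 (1), 0.083 (2), 0.000 (2), 0.395 (2), 0.039 (5), 0.662 (1), 0.000 (2), 0.000 (3), 0.074 (2), 0.135 (2), 0.296 (3), 0.000 (2), 0.000 (1), 0.231 (5), 0.646 (1), 0.139 (4), 0.633 (3), 1.000 (1), 0.705 (1), 0.048 (1), 0.088 (3), 0.439 (2), 0.184 (2), 0.039 (3), 0.266 (2), 0.551 (2), 0.260 (4), 0.026 (6), 0.463 (3), 0.046 (4), 0.252 (2), 0.249 (3), 0.083 (3), 0.372 (1), 0.411 (1), 0.000 (2), 0.414 (2), 0.323 (1), 0.000 (1), 0.052 (2), 0.000 (2), 0.157 (2), 0.278 (3), 0.278 (3), 0.237 (3), 0.015 (3), 0.321 (2), 0.253 (1), 0.060 (5), 0.000 (1), 0.000 (1), 0.272 (3), 0.008 (1), 0.169 (2), 0.032 (3), 0.000 (1), 0.404 (1), 0.356 (2), 0.283 (3), 0.073 (4), 0.028 (6), 0.617 (2), 0.038 (3), 0.000 (1), 0.494 (2), 0.037 (2), 0.215 (1), 0.083 (3), 0.000 (2), 0.003 (3), 0.486 (3), 0.694 (1), 0.000 (3), 0.040 (5), 0.083 (5), 0.219 (6), 0.209 (3), 0.007 (2), 0.483 (2), 0.000 (3), 0.125 (4), 0.000 (1), 0.150 (3), 0.014 (1), 0.544 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.260 (5), 0.143 (6), 0.200 (2), 0.610 (3), 0.028 (3), 0.032 (1), 0.145 (2), 0.059 (3), 0.046 (4), 0.740 (3), 0.806 (1), 0.543 (2), 0.000 (2), 0.108 (3), 0.008 (1), 0.222 (6), 0.669 (2), 0.456 (1), 0.074 (1), 0.224 (1), 0.586 (1), 0.006 (2), 0.451 (3), 0.000 (2), 0.002 (1), 0.889 (1), 0.282 (3), 0.000 (1), 0.000 (1), 0.252 (3), 0.413 (2), 0.111 (3), 0.074 (2), 0.240 (2), 0.893 (1), 0.266 (3), 0.144 (4), 0.293 (3), 0.281 (2), 0.604 (3), 0.000 (1), 0.000 (2), 0.379 (6), 0.963 (1), 0.250 (5), 0.000 (1), 0.160 (1), 0.420 (3), 0.000 (1), 0.343 (3), 0.207 (3), 0.079 (6), 0.315 (2), 0.052 (3)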
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
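ODIN - Ins200 | permissive | 0.265 (2) | 0.349 (2) | 0.268 (1) | 0.163 (2) | 0.360 (2), 0.054 (2), 0.278 (1), 0.000 (1), 0.125 (1), 0.000 (1), 0.031 (1), 0.506 (1), 0.266 (2), 0.630 (4), 0.609 (2), 0.481 (3), 0.903 (1), 0.000 (3), 1.000 (1), 0.032 (2), 0.000 (2), 0.022 (5), 0.138 (2), 0.314 (5), 0.310 (1), 0.000 (1), 0.178 (1), 0.000 (1), 0.000 (2), 0.552 (1), 0.421 (1), 0.889 (2), 0.000 (1), 0.451 (1), 0.097 (1), 0.357 (3), 0.054 (3), 0.485 (6), 0.052 (1), 0.040 (1), 0.210 (1), 0.160 (1), 0.370 (1), 0.000 (2), 0.000 (1), 0.191 (6), 0.529 (2), 0.250 (2), 0.617 (4), 1.000 (1), 0.492 (6), 0.016 (2), 0.197 (1), 0.324 (5), 0.000 (3), 0.250 (1), 0.265 (3), 0.167 (3), 0.317 (2), 0.200 (3), 0.549 (2), 0.107 (2), 0.231 (3), 0.119 (5), 0.141 (1), 0.253 (3), 0.267 (3), 0.000 (2), 0.565 (1), 0.111 (3), 0.000 (1), 0.000 (3), 0.278 (1), 0.285 (1), 0.665 (1), 0.389 (2), 0.306 (2), 0.077 (2), 0.037 (6), 0.186 (6), 0.156 (3), 0.000 (1), 0.000 (1), 0.478 (1), 0.000 (3), 0.091 (3), 0.204 (2), 0.000 (1), 0.345 (2), 0.200 (3), 0.550 (2), 0.674 (1), 0.160 (2), 0.526 (3), 0.438 (1), 0.000 (1), 0.476 (3), 0.035 (3), 0.003 (3), 0.444 (1), 0.000 (2), 0.333 (1), 0.361 (4), 0.606 (2), 0.083 (1), 0.332 (1), 0.417 (3), 0.327 (2), 0.297 (2), 0.035 (1), 0.615 (1), 0.281 (1), 0.083 (5), 0.000 (1), 0.250 (2), 0.000 (2), 0.610 (1), 0.000 (1), 0.333 (1), 0.000 (1), 0.238 (6), 0.481 (2), 0.218 (1), 0.440 (5), 1.000 (1), 0.000 (3), 0.229 (1), 0.257 (2), 0.000 (5), 0.746 (2), 0.361 (6), 0.188 (3), 0.000 (2), 0.221 (1), 0.000 (3), 0.320 (2), 0.655 (3), 0.193 (3), 0.000 (3), 0.067 (2), 0.389 (4), 0.000 (3), 0.594 (1), 0.037 (1), 0.000 (2), 0.000 (2), 0.371 (1), 0.000 (1), 0.000 (1), 0.344 (2), 0.366 (4), 0.506 (1), 0.074 (2), 0.250 (1), 0.848 (4), 0.451 (2), 0.389 (2), 0.546 (1), 0.205 (3), 0.698 (2), 0.000 (1), 0.000 (2), 0.494 (4), 0.769 (4), 0.493 (2), 0.000 (1), 0.000 (3), 0.463 (1), 0.000 (1), 0.333 (4), 0.333 (2), 0.640 (1), 0.251 (3), 0.115 (2)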
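LGround Inst. | permissive | 0.154 (4) | 0.275 (4) | 0.108 (4) | 0.060 (4) | 0.295 (6), 0.002 (5), 0.278 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.006 (5), 0.272 (3), 0.064 (6), 0.815 (2), 0.503 (4), 0.333 (6), 0.000 (3), 0.000 (3), 0.556 (2), 0.001 (5), 0.000 (2), 0.148 (3), 0.078 (3), 0.448 (1), 0.007 (4), 0.000 (1), 0.024 (4), 0.000 (1), 0.000 (2), 0.000 (4), 0.190 (5), 0.000 (3), 0.000 (1), 0.000 (4), 0.000 (2), 0.209 (6), 0.031 (6), 0.573 (3), 0.000 (2), 0.000 (3), 0.041 (3), 0.099 (4), 0.037 (5), 0.000 (2), 0.000 (1), 0.327 (2), 0.364 (6), 0.181 (3), 0.642 (2), 1.000 (1), 0.654 (4), 0.000 (4), 0.023 (4), 0.429 (3), 0.000 (3), 0.000 (4), 0.097 (4), 0.000 (4), 0.278 (3), 0.267 (2), 0.434 (4), 0.048 (3), 0.092 (4), 0.257 (2), 0.030 (4), 0.097 (5), 0.189 (4), 0.000 (2), 0.089 (3), 0.000 (6), 0.000 (1), 0.000 (3), 0.000 (2), 0.115 (3), 0.166 (4), 0.222 (6), 0.222 (4), 0.003 (4), 0.127 (3), 0.213 (4), 0.169 (2), 0.000 (1), 0.000 (1), 0.000 (4), 0.000 (3), 0.044 (4), 0.000 (4), 0.000 (1), 0.000 (6), 0.000 (5), 0.268 (6), 0.222 (3), 0.130 (3), 0.494 (4), 0.000 (4), 0.000 (1), 0.363 (4), 0.015 (4), 0.000 (4), 0.000 (4), 0.000 (2), 0.000 (4), 0.611 (2), 0.400 (4), 0.000 (3), 0.056 (3), 0.278 (4), 0.242 (5), 0.180 (4), 0.000 (4), 0.383 (5), 0.000 (3), 0.209 (2), 0.000 (1), 0.000 (4), 0.000 (2), 0.364 (3), 0.000 (1), 0.000 (3), 0.000 (1), 0.323 (4), 0.302 (4), 0.019 (5), 0.654 (2), 0.000 (4), 0.000 (3), 0.141 (3), 0.045 (4), 0.000 (5), 0.427 (6), 0.514 (3), 0.143 (4), 0.000 (2), 0.028 (5), 0.000 (3), 0.252 (4), 0.402 (5), 0.156 (5), 0.000 (3), 0.028 (3), 0.470 (3), 0.000 (3), 0.444 (4), 0.000 (2), 0.000 (2), 0.000 (2), 0.205 (4), 0.000 (1), 0.000 (1), 0.203 (4), 0.381 (3), 0.026 (4), 0.037 (4), 0.000 (4), 0.881 (3), 0.099 (5), 0.135 (5), 0.239 (4), 0.000 (5), 0.585 (5), 0.000 (1), 0.000 (2), 0.616 (2), 0.778 (2), 0.322 (3), 0.000 (1), 0.000 (3), 0.407 (4), 0.000 (1), 0.333 (4), 0.148 (4), 0.177 (4), 0.242 (4), 0.028 (4)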
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
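Minkowski 34D Inst. | permissive | 0.130 (5) | 0.246 (5) | 0.083 (5) | 0.043 (6) | 0.299 (5), 0.000 (6), 0.278 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.022 (4), 0.175 (4), 0.122 (3), 0.537 (5), 0.521 (3), 0.400 (4), 0.000 (3), 0.000 (3), 0.000 (4), 0.008 (4), 0.000 (2), 0.048 (4), 0.076 (4), 0.182 (6), 0.000 (5), 0.000 (1), 0.022 (5), 0.000 (1), 0.000 (2), 0.000 (4), 0.141 (6), 0.000 (3), 0.000 (1), 0.000 (4), 0.000 (2), 0.210 (5), 0.063 (2), 0.547 (5), 0.000 (2), 0.000 (3), 0.000 (6), 0.100 (3), 0.026 (6), 0.000 (2), 0.000 (1), 0.241 (4), 0.488 (5), 0.000 (5), 0.564 (6), 1.000 (1), 0.672 (2), 0.000 (4), 0.021 (5), 0.486 (1), 0.000 (3), 0.000 (4), 0.067 (5), 0.000 (4), 0.194 (6), 0.033 (5), 0.415 (5), 0.026 (5), 0.025 (6), 0.271 (1), 0.004 (5), 0.094 (6), 0.142 (6), 0.000 (2), 0.000 (5), 0.111 (3), 0.000 (1), 0.000 (3), 0.000 (2), 0.088 (5), 0.083 (6), 0.278 (3), 0.110 (5), 0.000 (5), 0.082 (5), 0.199 (5), 0.137 (4), 0.000 (1), 0.000 (1), 0.000 (4), 0.000 (3), 0.041 (5), 0.000 (4), 0.000 (1), 0.308 (3), 0.067 (4), 0.280 (4), 0.016 (5), 0.101 (4), 0.373 (6), 0.000 (4), 0.000 (1), 0.319 (5), 0.007 (5), 0.000 (4), 0.000 (4), 0.000 (2), 0.000 (4), 0.028 (6), 0.355 (6), 0.000 (3), 0.101 (2), 0.444 (2), 0.289 (3), 0.114 (6), 0.000 (4), 0.394 (4), 0.000 (3), 0.032 (6), 0.000 (1), 0.000 (4), 0.000 (2), 0.201 (6), 0.000 (1), 0.000 (3), 0.000 (1), 0.384 (2), 0.248 (5), 0.000 (6), 0.529 (4), 0.000 (4), 0.000 (3), 0.133 (4), 0.020 (6), 0.089 (3), 0.720 (4), 0.500 (4), 0.099 (5), 0.000 (2), 0.000 (6), 0.000 (3), 0.238 (5), 0.334 (6), 0.190 (4), 0.000 (3), 0.000 (6), 0.317 (6), 0.000 (3), 0.472 (2), 0.000 (2), 0.000 (2), 0.000 (2), 0.094 (6), 0.000 (1), 0.000 (1), 0.082 (6), 0.236 (5), 0.004 (6), 0.019 (5), 0.000 (4), 0.883 (2), 0.061 (6), 0.262 (3), 0.217 (5), 0.000 (5), 0.557 (6), 0.000 (1), 0.000 (2), 0.460 (5), 0.761 (5), 0.156 (6), 0.000 (1), 0.000 (3), 0.259 (5), 0.000 (1), 0.394 (1), 0.019 (5), 0.084 (5), 0.232 (5), 0.000 (6)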
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
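CSC-Pretrain Inst. | permissive | 0.123 (6) | 0.223 (6) | 0.082 (6) | 0.046 (5) | 0.308 (4), 0.004 (4), 0.278 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.000 (6), 0.032 (6), 0.105 (4), 0.537 (5), 0.348 (6), 0.378 (5), 0.000 (3), 0.000 (3), 0.000 (4), 0.000 (6), 0.000 (2), 0.000 (6), 0.037 (6), 0.323 (4), 0.000 (5), 0.000 (1), 0.013 (6), 0.000 (1), 0.000 (2), 0.000 (4), 0.235 (3), 0.000 (3), 0.000 (1), 0.000 (4), 0.000 (2), 0.231 (4), 0.045 (4), 0.564 (4), 0.000 (2), 0.000 (3), 0.006 (4), 0.078 (6), 0.065 (4), 0.000 (2), 0.000 (1), 0.259 (3), 0.516 (3), 0.000 (5), 0.600 (5), 1.000 (1), 0.578 (5), 0.000 (4), 0.000 (6), 0.184 (6), 0.000 (3), 0.000 (4), 0.034 (6), 0.000 (4), 0.211 (5), 0.089 (4), 0.394 (6), 0.018 (6), 0.064 (5), 0.171 (4), 0.001 (6), 0.144 (4), 0.172 (5), 0.000 (2), 0.000 (5), 0.044 (5), 0.000 (1), 0.000 (3), 0.000 (2), 0.064 (6), 0.126 (5), 0.278 (3), 0.093 (6), 0.000 (5), 0.094 (4), 0.214 (3), 0.011 (6), 0.000 (1), 0.000 (1), 0.000 (4), 0.000 (3), 0.022 (6), 0.000 (4), 0.000 (1), 0.275 (4), 0.000 (5), 0.275 (5), 0.000 (6), 0.098 (5), 0.407 (5), 0.000 (4), 0.000 (1), 0.250 (6), 0.007 (6), 0.000 (4), 0.000 (4), 0.000 (2), 0.000 (4), 0.333 (5), 0.376 (5), 0.000 (3), 0.000 (6), 0.042 (6), 0.285 (4), 0.119 (5), 0.000 (4), 0.224 (6), 0.000 (3), 0.184 (3), 0.000 (1), 0.000 (4), 0.000 (2), 0.244 (5), 0.000 (1), 0.000 (3), 0.000 (1), 0.377 (3), 0.378 (3), 0.051 (3), 0.424 (6), 0.000 (4), 0.000 (3), 0.116 (6), 0.030 (5), 0.125 (2), 0.441 (5), 0.444 (5), 0.063 (6), 0.000 (2), 0.042 (4), 0.000 (3), 0.297 (3), 0.483 (4), 0.096 (6), 0.000 (3), 0.028 (3), 0.338 (5), 0.000 (3), 0.444 (4), 0.000 (2), 0.000 (2), 0.000 (2), 0.189 (5), 0.000 (1), 0.000 (1), 0.141 (5), 0.152 (6), 0.017 (5), 0.000 (6), 0.000 (4), 0.838 (5), 0.193 (4), 0.111 (6), 0.105 (6), 0.198 (4), 0.588 (4), 0.000 (1), 0.000 (2), 0.542 (3), 0.343 (6), 0.267 (4), 0.000 (1), 0.000 (3), 0.108 (6), 0.000 (1), 0.333 (4), 0.000 (6), 0.228 (3), 0.202 (6), 0.022 (5)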
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
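The avg AP, AP head, AP common and AP tail columns in the results table condense the per-class scores into four numbers. A minimal sketch of that aggregation, assuming each summary column is an unweighted mean of per-class AP over the corresponding category split, with classes that have no ground truth (marked as NaN) skipped. The helper `split_mean`, the class values and the split membership below are hypothetical toy data, not the official ScanNet200 head/common/tail assignment.

```python
import math
from typing import Dict, Iterable


def split_mean(per_class_ap: Dict[str, float], classes: Iterable[str]) -> float:
    """Unweighted mean AP over `classes`, ignoring missing or NaN (undefined) entries."""
    values = [
        per_class_ap[c]
        for c in classes
        if c in per_class_ap and not math.isnan(per_class_ap[c])
    ]
    return sum(values) / len(values) if values else math.nan


# Hypothetical per-class AP values and frequency splits (toy data only).
per_class_ap = {"chair": 0.8, "table": 0.6, "stool": 0.2, "cd case": 0.0, "piano": math.nan}
splits = {
    "head": ["chair", "table"],
    "tail": ["stool", "cd case", "piano"],
}

if __name__ == "__main__":
    # Iterating over the dict yields every class name, giving the overall average.
    print(f"avg AP: {split_mean(per_class_ap, per_class_ap):.3f}")
    for name, members in splits.items():
        print(f"AP {name}: {split_mean(per_class_ap, members):.3f}")
```

Because the mean is unweighted, a rare tail class counts as much as a frequent head class, which is why the tail column can drag a method down even when its overall instance count is dominated by common objects.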