The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

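Similarly to the ScanNet benchmark, our ScanNet200 evaluation ranks all methods according to the average precision (AP) for each class. We report the mean average precision at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over the overlaps 0.5 to 0.95 in steps of 0.05, i.e. [0.5:0.95:0.05] (AP), for all 200 categories. Here, overlap is the intersection over union (IoU) between a predicted and a ground-truth instance mask. Note that multiple predictions of the same ground-truth instance are penalized: only one of them can be matched to that instance, and the others count as false positives.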
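The metric described above can be sketched in a few lines of Python. This is a simplified illustration, not the official ScanNet evaluation script. It assumes that each instance is a boolean mask over mesh vertices and that "overlap" means mask IoU. It also evaluates one class in one scan, whereas a full evaluation would pool the predictions of a class across all test scans. Its precision-envelope integration may differ from the benchmark's exact interpolation. The helper names (`mask_iou`, `ap_at_threshold`, `class_ap`, `mean_ap`, `vertex_mask`) are hypothetical.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU ("overlap") of two instance masks stored as boolean per-vertex arrays."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def ap_at_threshold(pred_masks, pred_scores, gt_masks, iou_thr):
    """AP of one class at one overlap threshold (simplified; inputs are lists)."""
    if not gt_masks:
        return float("nan")  # no ground truth: leave the class out of the mean
    if not pred_masks:
        return 0.0
    order = np.argsort(-np.asarray(pred_scores, dtype=float))
    gt_used = np.zeros(len(gt_masks), dtype=bool)
    tp = np.zeros(len(order))
    for k, i in enumerate(order):  # most confident prediction first
        # Overlap with every ground-truth instance that is still unmatched.
        ious = [0.0 if gt_used[j] else mask_iou(pred_masks[i], g)
                for j, g in enumerate(gt_masks)]
        j = int(np.argmax(ious))
        if ious[j] >= iou_thr:
            gt_used[j] = True  # a ground-truth instance is matched only once,
            tp[k] = 1.0        # so later duplicates of it count as false positives
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / len(gt_masks)
    precision = tp_cum / (tp_cum + fp_cum)
    # Non-increasing precision envelope, then area under the precision-recall curve.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum(np.diff(recall) * precision))


def class_ap(pred_masks, pred_scores, gt_masks):
    """AP 25%, AP 50% and AP averaged over overlaps 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    ap = np.mean([ap_at_threshold(pred_masks, pred_scores, gt_masks, t)
                  for t in thresholds])
    return {"AP": float(ap),
            "AP50": ap_at_threshold(pred_masks, pred_scores, gt_masks, 0.5),
            "AP25": ap_at_threshold(pred_masks, pred_scores, gt_masks, 0.25)}


def mean_ap(per_class_ap, classes):
    """Mean of per-class AP over a set of categories, skipping classes without GT."""
    return float(np.nanmean([per_class_ap[c] for c in classes]))


def vertex_mask(indices, n=12):
    """Boolean mask over a toy mesh with n vertices."""
    m = np.zeros(n, dtype=bool)
    m[list(indices)] = True
    return m


# Toy scan with two ground-truth instances of one class.
gt = [vertex_mask(range(0, 4)), vertex_mask(range(4, 11))]
preds = [vertex_mask(range(0, 4)),   # exact match of instance 0
         vertex_mask(range(0, 4)),   # duplicate of instance 0: false positive
         vertex_mask(range(4, 9))]   # IoU 5/7 with instance 1
scores = [0.9, 0.8, 0.7]
print(class_ap(preds, scores, gt))
```

In this toy scan the duplicate prediction lowers precision, so AP 50% and AP 25% are 5/6 rather than 1. The second instance is only matched at overlaps up to 0.70 (its IoU is 5/7), which pulls the averaged AP down to 2/3. Calling `mean_ap` on a subset of categories gives split-level numbers analogous to the head, common, and tail columns in the table.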



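This table lists the benchmark results for the ScanNet200 3D semantic instance segmentation scenario. Each entry gives the AP followed by the method's rank in that column (1 is best). The avg AP column averages over all categories; head AP, common AP, and tail AP average over the ScanNet200 head, common, and tail category splits (grouped by how frequent the categories are), and the remaining columns give the AP of individual categories.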




Method | Info | avg AP | head AP | common AP | tail AP | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
Mask3D Scannet2000.278 10.383 10.263 10.168 10.506 10.068 10.083 50.000 10.000 10.000 10.023 20.149 40.302 10.778 30.647 10.569 10.500 10.031 10.014 20.027 20.173 10.311 10.195 10.351 30.258 10.000 10.082 10.000 10.003 10.037 20.391 11.000 10.000 10.014 20.000 10.572 10.573 10.661 20.000 10.003 10.005 40.082 40.349 10.028 10.000 10.605 10.515 30.509 10.711 11.000 10.665 30.015 20.107 10.402 40.201 10.083 10.304 10.759 10.491 10.378 10.572 10.119 10.277 10.013 50.089 10.283 20.411 20.267 10.006 30.156 20.000 10.116 10.000 10.105 30.556 10.514 10.396 10.275 10.323 10.215 20.380 10.000 10.000 10.356 10.005 20.208 10.325 10.000 10.050 40.400 10.561 10.258 10.179 10.722 10.147 10.000 10.586 10.063 10.015 20.139 10.016 10.028 10.708 10.418 20.016 10.048 30.500 10.489 10.349 10.001 20.475 20.086 10.365 10.000 10.500 10.000 20.323 30.000 10.222 10.000 10.497 10.626 10.044 30.795 10.556 10.008 20.121 40.265 10.667 10.789 10.568 20.579 10.444 10.176 10.004 20.474 10.752 10.233 20.014 20.002 40.570 20.007 10.377 50.000 10.000 20.000 20.337 10.000 10.000 10.384 10.465 10.287 10.085 10.048 20.816 50.467 10.810 10.377 10.415 10.744 10.000 10.004 10.724 10.778 20.590 10.000 10.032 20.441 10.000 10.377 20.391 10.427 10.321 10.192 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
TD3D Scannet2000.211 20.332 20.177 20.103 20.337 20.036 20.222 40.000 10.000 10.000 10.031 10.342 10.093 40.852 10.452 40.559 20.000 20.004 20.000 30.039 10.000 20.309 20.047 40.380 20.028 20.000 10.080 20.000 10.000 20.147 10.192 30.000 20.000 10.083 10.000 10.395 20.039 40.662 10.000 10.000 20.074 10.135 10.296 20.000 20.000 10.231 50.646 10.139 30.633 31.000 10.705 10.048 10.088 20.439 20.184 20.039 20.266 20.551 20.260 30.026 50.463 20.046 30.252 20.249 30.083 20.372 10.411 10.000 20.414 10.323 10.000 10.052 20.000 10.157 10.278 20.278 20.237 20.015 20.321 20.253 10.060 40.000 10.000 10.272 20.008 10.169 20.032 20.000 10.404 10.356 20.283 20.073 30.028 50.617 20.038 20.000 10.494 20.037 20.215 10.083 20.000 20.003 20.486 30.694 10.000 20.040 40.083 40.219 50.209 20.007 10.483 10.000 20.125 40.000 10.150 20.014 10.544 10.000 10.000 20.000 10.260 50.143 50.200 10.610 30.028 20.032 10.145 10.059 20.046 40.740 20.806 10.543 20.000 20.108 20.008 10.222 50.669 20.456 10.074 10.224 10.586 10.006 20.451 20.000 10.002 10.889 10.282 20.000 10.000 10.252 20.413 20.111 20.074 20.240 10.893 10.266 20.144 30.293 20.281 20.604 20.000 10.000 20.379 50.963 10.250 40.000 10.160 10.420 20.000 10.343 30.207 20.079 50.315 20.052 2
LGround Inst.permissive0.154 30.275 30.108 30.060 30.295 50.002 40.278 10.000 10.000 10.000 10.006 40.272 20.064 50.815 20.503 30.333 50.000 20.000 30.556 10.001 40.000 20.148 30.078 20.448 10.007 30.000 10.024 30.000 10.000 20.000 30.190 40.000 20.000 10.000 30.000 10.209 50.031 50.573 30.000 10.000 20.041 20.099 30.037 40.000 20.000 10.327 20.364 50.181 20.642 21.000 10.654 40.000 30.023 30.429 30.000 30.000 30.097 30.000 30.278 20.267 20.434 30.048 20.092 30.257 20.030 30.097 40.189 30.000 20.089 20.000 50.000 10.000 30.000 10.115 20.166 30.222 50.222 30.003 30.127 30.213 40.169 20.000 10.000 10.000 30.000 30.044 30.000 30.000 10.000 50.000 40.268 50.222 20.130 20.494 30.000 30.000 10.363 30.015 30.000 30.000 30.000 20.000 30.611 20.400 30.000 20.056 20.278 30.242 40.180 30.000 30.383 40.000 20.209 20.000 10.000 30.000 20.364 20.000 10.000 20.000 10.323 40.302 30.019 40.654 20.000 30.000 30.141 20.045 30.000 50.427 50.514 30.143 30.000 20.028 40.000 30.252 30.402 40.156 40.000 30.028 20.470 30.000 30.444 30.000 10.000 20.000 20.205 30.000 10.000 10.203 30.381 30.026 30.037 30.000 30.881 30.099 40.135 40.239 30.000 40.585 40.000 10.000 20.616 20.778 20.322 20.000 10.000 30.407 30.000 10.333 40.148 30.177 30.242 30.028 3
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
CSC-Pretrain Inst.permissive0.123 50.223 50.082 50.046 40.308 30.004 30.278 10.000 10.000 10.000 10.000 50.032 50.105 30.537 40.348 50.378 40.000 20.000 30.000 30.000 50.000 20.000 50.037 50.323 40.000 40.000 10.013 50.000 10.000 20.000 30.235 20.000 20.000 10.000 30.000 10.231 30.045 30.564 40.000 10.000 20.006 30.078 50.065 30.000 20.000 10.259 30.516 20.000 40.600 41.000 10.578 50.000 30.000 50.184 50.000 30.000 30.034 50.000 30.211 40.089 30.394 50.018 50.064 40.171 40.001 50.144 30.172 40.000 20.000 40.044 40.000 10.000 30.000 10.064 50.126 40.278 20.093 50.000 40.094 40.214 30.011 50.000 10.000 10.000 30.000 30.022 50.000 30.000 10.275 30.000 40.275 40.000 50.098 40.407 40.000 30.000 10.250 50.007 50.000 30.000 30.000 20.000 30.333 40.376 40.000 20.000 50.042 50.285 30.119 40.000 30.224 50.000 20.184 30.000 10.000 30.000 20.244 40.000 10.000 20.000 10.377 30.378 20.051 20.424 50.000 30.000 30.116 50.030 40.125 20.441 40.444 50.063 50.000 20.042 30.000 30.297 20.483 30.096 50.000 30.028 20.338 40.000 30.444 30.000 10.000 20.000 20.189 40.000 10.000 10.141 40.152 50.017 40.000 50.000 30.838 40.193 30.111 50.105 50.198 30.588 30.000 10.000 20.542 30.343 50.267 30.000 10.000 30.108 50.000 10.333 40.000 50.228 20.202 50.022 4
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
Minkowski 34D Inst.permissive0.130 40.246 40.083 40.043 50.299 40.000 50.278 10.000 10.000 10.000 10.022 30.175 30.122 20.537 40.521 20.400 30.000 20.000 30.000 30.008 30.000 20.048 40.076 30.182 50.000 40.000 10.022 40.000 10.000 20.000 30.141 50.000 20.000 10.000 30.000 10.210 40.063 20.547 50.000 10.000 20.000 50.100 20.026 50.000 20.000 10.241 40.488 40.000 40.564 51.000 10.672 20.000 30.021 40.486 10.000 30.000 30.067 40.000 30.194 50.033 40.415 40.026 40.025 50.271 10.004 40.094 50.142 50.000 20.000 40.111 30.000 10.000 30.000 10.088 40.083 50.278 20.110 40.000 40.082 50.199 50.137 30.000 10.000 10.000 30.000 30.041 40.000 30.000 10.308 20.067 30.280 30.016 40.101 30.373 50.000 30.000 10.319 40.007 40.000 30.000 30.000 20.000 30.028 50.355 50.000 20.101 10.444 20.289 20.114 50.000 30.394 30.000 20.032 50.000 10.000 30.000 20.201 50.000 10.000 20.000 10.384 20.248 40.000 50.529 40.000 30.000 30.133 30.020 50.089 30.720 30.500 40.099 40.000 20.000 50.000 30.238 40.334 50.190 30.000 30.000 50.317 50.000 30.472 10.000 10.000 20.000 20.094 50.000 10.000 10.082 50.236 40.004 50.019 40.000 30.883 20.061 50.262 20.217 40.000 40.557 50.000 10.000 20.460 40.761 40.156 50.000 10.000 30.259 40.000 10.394 10.019 40.084 40.232 40.000 5
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
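The rank printed after each value in the table can be reproduced by sorting the methods on that column. The sketch below assumes standard competition ranking, in which tied methods share the best rank and the next method skips ahead. This assumption is consistent with the table. The avg AP values rank Mask3D first through CSC-Pretrain fifth. One per-class column holds 0.083, 0.222, 0.278, 0.278, and 0.278, and its entries are ranked 5, 4, 1, 1, 1. The function name `competition_ranks` is hypothetical.

```python
def competition_ranks(scores):
    """Rank scores so that the best is 1 and tied scores share the best rank."""
    # A method's rank is 1 plus the number of methods with a strictly higher score.
    return [1 + sum(other > s for other in scores) for s in scores]


# avg AP column of the results table
avg_ap = {"Mask3D": 0.278, "TD3D": 0.211, "LGround": 0.154,
          "CSC-Pretrain": 0.123, "Minkowski 34D": 0.130}
ranks = dict(zip(avg_ap, competition_ranks(list(avg_ap.values()))))
print(ranks)
# {'Mask3D': 1, 'TD3D': 2, 'LGround': 3, 'CSC-Pretrain': 5, 'Minkowski 34D': 4}
```

Counting strictly higher scores keeps ties together without any explicit sorting. This costs quadratic time in the number of methods, which is negligible for a leaderboard of this size.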