The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision (AP) for each class. We report the mean average precision at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP), for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
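The following is a minimal sketch of how such a per-class AP could be computed, not the benchmark's official evaluation script. It assumes each prediction has already been paired with its best-overlapping ground-truth instance, and it uses a plain non-interpolated area under the precision-recall curve. The function names, the `(confidence, gt_id, iou)` tuple layout, the strict `>` comparison, and the reading of [0.5:0.95:0.05] as the nine thresholds 0.50, 0.55, ..., 0.90 are all assumptions made for illustration.

```python
def average_precision(preds, num_gt, iou_thresh):
    """AP for one class at one overlap threshold (illustrative sketch).

    preds: list of (confidence, gt_id, iou) tuples, where gt_id is the
           ground-truth instance the prediction overlaps most (or None).
    num_gt: number of ground-truth instances of this class.
    """
    if num_gt == 0:
        return float("nan")
    claimed = set()        # ground-truth instances already matched
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, gt_id, iou) in enumerate(ranked, start=1):
        is_hit = gt_id is not None and iou > iou_thresh and gt_id not in claimed
        if is_hit:
            claimed.add(gt_id)
            true_positives += 1
            # precision measured at the rank of each true positive
            precision_sum += true_positives / rank
        # otherwise the prediction is a false positive; this includes a
        # second prediction of an already matched ground-truth instance
    return precision_sum / num_gt


def ap_over_thresholds(preds, num_gt, thresholds=None):
    """Mean AP over a set of overlap thresholds (assumed 0.50 .. 0.90)."""
    if thresholds is None:
        thresholds = [0.5 + 0.05 * i for i in range(9)]
    values = [average_precision(preds, num_gt, t) for t in thresholds]
    return sum(values) / len(values)


# Two ground-truth instances; the second prediction duplicates instance 0.
example = [(0.9, 0, 0.82), (0.8, 0, 0.71), (0.7, 1, 0.63)]
ap_50 = average_precision(example, num_gt=2, iou_thresh=0.5)
ap_25 = average_precision(example, num_gt=2, iou_thresh=0.25)
ap = ap_over_thresholds(example, num_gt=2)
```

In this toy example the duplicate prediction lowers the precision at the point where the second instance is found, which is the penalty described above. The benchmark-level numbers would then be the mean of such per-class values over all categories.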



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Each result cell gives the AP value followed by the method's rank in that column.

Method | Info | avg ap | head ap | common ap | tail ap | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
Mask3D Scannet200 | | 0.278 4 | 0.383 4 | 0.263 5 | 0.168 4 | 0.506 4 | 0.068 2 | 0.083 9 | 0.000 2 | 0.000 4 | 0.000 1 | 0.023 6 | 0.149 8 | 0.302 3 | 0.778 6 | 0.647 4 | 0.569 2 | 0.500 3 | 0.031 3 | 0.014 5 | 0.027 6 | 0.173 3 | 0.311 2 | 0.195 4 | 0.351 6 | 0.258 4 | 0.000 4 | 0.082 5 | 0.000 1 | 0.003 1 | 0.037 6 | 0.391 4 | 1.000 1 | 0.000 1 | 0.014 6 | 0.000 3 | 0.572 4 | 0.573 2 | 0.661 5 | 0.000 3 | 0.003 5 | 0.005 8 | 0.082 8 | 0.349 3 | 0.028 2 | 0.000 1 | 0.605 4 | 0.515 7 | 0.509 1 | 0.711 2 | 1.000 1 | 0.665 6 | 0.015 6 | 0.107 5 | 0.402 7 | 0.201 1 | 0.083 5 | 0.304 4 | 0.759 1 | 0.491 3 | 0.378 3 | 0.572 4 | 0.119 4 | 0.277 4 | 0.013 9 | 0.089 4 | 0.283 5 | 0.411 5 | 0.267 1 | 0.006 7 | 0.156 3 | 0.000 2 | 0.116 4 | 0.000 5 | 0.105 7 | 0.556 5 | 0.514 3 | 0.396 1 | 0.275 3 | 0.323 3 | 0.215 5 | 0.380 1 | 0.000 2 | 0.000 1 | 0.356 3 | 0.005 5 | 0.208 3 | 0.325 1 | 0.000 3 | 0.050 7 | 0.400 2 | 0.561 4 | 0.258 5 | 0.179 4 | 0.722 3 | 0.147 2 | 0.000 1 | 0.586 3 | 0.063 3 | 0.015 3 | 0.139 5 | 0.016 2 | 0.028 4 | 0.708 3 | 0.418 6 | 0.016 5 | 0.048 7 | 0.500 2 | 0.489 4 | 0.349 4 | 0.001 6 | 0.475 5 | 0.086 4 | 0.365 4 | 0.000 1 | 0.500 1 | 0.000 3 | 0.323 7 | 0.000 3 | 0.222 3 | 0.000 1 | 0.497 3 | 0.626 3 | 0.044 7 | 0.795 2 | 0.556 2 | 0.008 2 | 0.121 8 | 0.265 4 | 0.667 1 | 0.789 3 | 0.568 4 | 0.579 3 | 0.444 1 | 0.176 4 | 0.004 2 | 0.474 2 | 0.752 4 | 0.233 5 | 0.014 4 | 0.002 8 | 0.570 2 | 0.007 2 | 0.377 9 | 0.000 2 | 0.000 3 | 0.000 2 | 0.337 5 | 0.000 2 | 0.000 1 | 0.384 3 | 0.465 3 | 0.287 5 | 0.085 4 | 0.048 6 | 0.816 9 | 0.467 3 | 0.810 1 | 0.377 5 | 0.415 1 | 0.744 3 | 0.000 1 | 0.004 4 | 0.724 1 | 0.778 4 | 0.590 2 | 0.000 1 | 0.032 5 | 0.441 3 | 0.000 1 | 0.377 5 | 0.391 3 | 0.427 4 | 0.321 4 | 0.192 3
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
ODIN - Ins200 | permissive | 0.265 5 | 0.349 5 | 0.268 4 | 0.163 5 | 0.360 5 | 0.054 3 | 0.278 4 | 0.000 2 | 0.125 1 | 0.000 1 | 0.031 3 | 0.506 1 | 0.266 5 | 0.630 7 | 0.609 5 | 0.481 4 | 0.903 1 | 0.000 6 | 1.000 1 | 0.032 5 | 0.000 4 | 0.022 8 | 0.138 5 | 0.314 8 | 0.310 3 | 0.000 4 | 0.178 3 | 0.000 1 | 0.000 2 | 0.552 2 | 0.421 2 | 0.889 5 | 0.000 1 | 0.451 2 | 0.097 1 | 0.357 6 | 0.054 6 | 0.485 9 | 0.052 1 | 0.040 2 | 0.210 1 | 0.160 2 | 0.370 1 | 0.000 4 | 0.000 1 | 0.191 9 | 0.529 5 | 0.250 4 | 0.617 7 | 1.000 1 | 0.492 9 | 0.016 5 | 0.197 4 | 0.324 8 | 0.000 4 | 0.250 4 | 0.265 6 | 0.167 4 | 0.317 5 | 0.200 6 | 0.549 5 | 0.107 5 | 0.231 6 | 0.119 7 | 0.141 3 | 0.253 6 | 0.267 6 | 0.000 3 | 0.565 1 | 0.111 6 | 0.000 2 | 0.000 6 | 0.278 2 | 0.285 1 | 0.665 4 | 0.389 5 | 0.306 5 | 0.077 5 | 0.037 9 | 0.186 9 | 0.156 6 | 0.000 2 | 0.000 1 | 0.478 2 | 0.000 6 | 0.091 6 | 0.204 2 | 0.000 3 | 0.345 2 | 0.200 6 | 0.550 5 | 0.674 2 | 0.160 5 | 0.526 6 | 0.438 1 | 0.000 1 | 0.476 6 | 0.035 6 | 0.003 5 | 0.444 1 | 0.000 4 | 0.333 1 | 0.361 7 | 0.606 3 | 0.083 4 | 0.332 2 | 0.417 6 | 0.327 5 | 0.297 5 | 0.035 1 | 0.615 2 | 0.281 3 | 0.083 8 | 0.000 1 | 0.250 4 | 0.000 3 | 0.610 4 | 0.000 3 | 0.333 2 | 0.000 1 | 0.238 9 | 0.481 5 | 0.218 2 | 0.440 8 | 1.000 1 | 0.000 4 | 0.229 2 | 0.257 5 | 0.000 7 | 0.746 4 | 0.361 9 | 0.188 6 | 0.000 4 | 0.221 3 | 0.000 3 | 0.320 5 | 0.655 6 | 0.193 6 | 0.000 5 | 0.067 4 | 0.389 6 | 0.000 5 | 0.594 1 | 0.037 1 | 0.000 3 | 0.000 2 | 0.371 4 | 0.000 2 | 0.000 1 | 0.344 5 | 0.366 7 | 0.506 4 | 0.074 5 | 0.250 4 | 0.848 7 | 0.451 4 | 0.389 5 | 0.546 3 | 0.205 4 | 0.698 5 | 0.000 1 | 0.000 5 | 0.494 7 | 0.769 6 | 0.493 3 | 0.000 1 | 0.000 6 | 0.463 2 | 0.000 1 | 0.333 7 | 0.333 4 | 0.640 1 | 0.251 6 | 0.115 5
TD3D Scannet200 | permissive | 0.211 6 | 0.332 6 | 0.177 6 | 0.103 6 | 0.337 6 | 0.036 6 | 0.222 8 | 0.000 2 | 0.000 4 | 0.000 1 | 0.031 4 | 0.342 5 | 0.093 8 | 0.852 3 | 0.452 8 | 0.559 3 | 0.000 6 | 0.004 5 | 0.000 6 | 0.039 4 | 0.000 4 | 0.309 3 | 0.047 8 | 0.380 4 | 0.028 6 | 0.000 4 | 0.080 6 | 0.000 1 | 0.000 2 | 0.147 5 | 0.192 7 | 0.000 6 | 0.000 1 | 0.083 5 | 0.000 3 | 0.395 5 | 0.039 8 | 0.662 4 | 0.000 3 | 0.000 6 | 0.074 4 | 0.135 4 | 0.296 5 | 0.000 4 | 0.000 1 | 0.231 8 | 0.646 2 | 0.139 6 | 0.633 6 | 1.000 1 | 0.705 2 | 0.048 4 | 0.088 6 | 0.439 5 | 0.184 2 | 0.039 6 | 0.266 5 | 0.551 3 | 0.260 7 | 0.026 9 | 0.463 6 | 0.046 7 | 0.252 5 | 0.249 3 | 0.083 5 | 0.372 4 | 0.411 4 | 0.000 3 | 0.414 3 | 0.323 1 | 0.000 2 | 0.052 5 | 0.000 5 | 0.157 5 | 0.278 6 | 0.278 6 | 0.237 6 | 0.015 6 | 0.321 4 | 0.253 4 | 0.060 8 | 0.000 2 | 0.000 1 | 0.272 5 | 0.008 4 | 0.169 4 | 0.032 3 | 0.000 3 | 0.404 1 | 0.356 4 | 0.283 6 | 0.073 7 | 0.028 9 | 0.617 5 | 0.038 3 | 0.000 1 | 0.494 5 | 0.037 5 | 0.215 2 | 0.083 6 | 0.000 4 | 0.003 6 | 0.486 6 | 0.694 2 | 0.000 6 | 0.040 8 | 0.083 8 | 0.219 9 | 0.209 6 | 0.007 5 | 0.483 4 | 0.000 6 | 0.125 7 | 0.000 1 | 0.150 6 | 0.014 2 | 0.544 5 | 0.000 3 | 0.000 6 | 0.000 1 | 0.260 8 | 0.143 9 | 0.200 3 | 0.610 6 | 0.028 5 | 0.032 1 | 0.145 5 | 0.059 6 | 0.046 6 | 0.740 5 | 0.806 3 | 0.543 5 | 0.000 4 | 0.108 6 | 0.008 1 | 0.222 9 | 0.669 5 | 0.456 1 | 0.074 3 | 0.224 1 | 0.586 1 | 0.006 3 | 0.451 4 | 0.000 2 | 0.002 2 | 0.889 1 | 0.282 6 | 0.000 2 | 0.000 1 | 0.252 6 | 0.413 5 | 0.111 6 | 0.074 5 | 0.240 5 | 0.893 4 | 0.266 6 | 0.144 7 | 0.293 6 | 0.281 2 | 0.604 6 | 0.000 1 | 0.000 5 | 0.379 9 | 0.963 3 | 0.250 8 | 0.000 1 | 0.160 2 | 0.420 5 | 0.000 1 | 0.343 6 | 0.207 5 | 0.079 9 | 0.315 5 | 0.052 6
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Volt-SPFormer | permissive | 0.367 1 | 0.475 1 | 0.359 1 | 0.248 1 | 0.635 2 | 0.051 4 | 0.333 1 | 0.000 2 | 0.125 1 | 0.000 1 | 0.029 5 | 0.345 4 | 0.528 2 | 1.000 1 | 0.663 3 | 0.400 5 | 0.389 5 | 0.012 4 | 0.556 2 | 0.235 1 | 0.407 2 | 0.240 5 | 0.308 2 | 0.550 1 | 0.380 2 | 0.250 2 | 0.193 2 | 0.000 1 | 0.000 2 | 0.439 4 | 0.416 3 | 1.000 1 | 0.000 1 | 0.254 3 | 0.000 3 | 0.609 2 | 0.638 1 | 0.678 3 | 0.000 3 | 0.004 4 | 0.113 3 | 0.144 3 | 0.333 4 | 0.028 2 | 0.000 1 | 0.719 1 | 0.685 1 | 0.139 6 | 0.682 3 | 1.000 1 | 0.689 3 | 0.052 2 | 0.247 3 | 0.470 4 | 0.000 4 | 0.304 2 | 0.484 1 | 0.000 6 | 0.588 2 | 0.378 3 | 0.736 1 | 0.241 2 | 0.663 1 | 0.066 8 | 0.299 1 | 0.717 1 | 0.660 1 | 0.000 3 | 0.466 2 | 0.156 3 | 0.000 2 | 0.500 3 | 0.278 2 | 0.230 3 | 0.831 1 | 0.556 1 | 0.365 2 | 0.192 4 | 0.822 1 | 0.565 1 | 0.318 2 | 0.111 1 | 0.000 1 | 0.533 1 | 0.013 3 | 0.232 2 | 0.000 4 | 0.778 1 | 0.112 6 | 0.400 2 | 0.693 2 | 0.588 3 | 0.284 2 | 0.684 4 | 0.000 4 | 0.000 1 | 0.556 4 | 0.050 4 | 0.008 4 | 0.333 4 | 0.029 1 | 0.278 2 | 1.000 1 | 0.748 1 | 0.500 2 | 0.340 1 | 1.000 1 | 0.787 1 | 0.575 1 | 0.013 4 | 0.527 3 | 0.017 5 | 0.502 1 | 0.000 1 | 0.500 1 | 0.167 1 | 0.650 3 | 0.000 3 | 0.222 3 | 0.000 1 | 0.549 1 | 0.655 2 | 0.238 1 | 0.799 1 | 0.556 2 | 0.002 3 | 0.170 4 | 0.348 3 | 0.250 2 | 0.873 1 | 1.000 1 | 0.652 2 | 0.444 1 | 0.551 1 | 0.000 3 | 0.524 1 | 0.821 2 | 0.329 2 | 0.000 5 | 0.117 2 | 0.383 7 | 0.014 1 | 0.417 8 | 0.000 2 | 0.000 3 | 0.000 2 | 0.469 2 | 0.000 2 | 0.000 1 | 0.552 1 | 0.494 2 | 0.515 3 | 0.710 1 | 0.388 1 | 0.928 2 | 0.524 1 | 0.537 4 | 0.560 2 | 0.167 6 | 0.817 1 | 0.000 1 | 0.019 2 | 0.682 2 | 1.000 1 | 0.864 1 | 0.000 1 | 0.099 3 | 0.664 1 | 0.000 1 | 0.395 3 | 0.764 1 | 0.442 3 | 0.418 2 | 0.117 4
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
CompetitorFormer-200 | | 0.328 3 | 0.439 2 | 0.303 3 | 0.223 3 | 0.543 3 | 0.044 5 | 0.333 1 | 0.044 1 | 0.000 4 | 0.000 1 | 0.099 1 | 0.444 2 | 0.296 4 | 0.850 4 | 0.722 2 | 0.820 1 | 0.444 4 | 0.047 1 | 0.083 4 | 0.183 2 | 0.562 1 | 0.243 4 | 0.312 1 | 0.380 3 | 0.192 5 | 1.000 1 | 0.143 4 | 0.000 1 | 0.000 2 | 0.484 3 | 0.259 5 | 1.000 1 | 0.000 1 | 0.500 1 | 0.000 3 | 0.650 1 | 0.221 4 | 0.771 1 | 0.004 2 | 0.010 3 | 0.043 5 | 0.120 5 | 0.366 2 | 0.054 1 | 0.000 1 | 0.689 2 | 0.641 3 | 0.500 2 | 0.663 4 | 1.000 1 | 0.673 4 | 0.049 3 | 0.400 1 | 0.479 3 | 0.014 3 | 0.267 3 | 0.455 2 | 0.083 5 | 0.400 4 | 0.400 1 | 0.663 2 | 0.243 1 | 0.464 2 | 0.192 5 | 0.076 6 | 0.427 2 | 0.620 2 | 0.025 2 | 0.013 6 | 0.322 2 | 0.000 2 | 0.677 2 | 0.333 1 | 0.178 4 | 0.808 2 | 0.556 1 | 0.356 3 | 0.345 1 | 0.119 6 | 0.346 3 | 0.312 3 | 0.000 2 | 0.000 1 | 0.305 4 | 0.116 2 | 0.137 5 | 0.000 4 | 0.065 2 | 0.171 5 | 0.314 5 | 0.575 3 | 0.487 4 | 0.303 1 | 0.820 1 | 0.000 4 | 0.000 1 | 0.655 1 | 0.088 1 | 0.373 1 | 0.430 2 | 0.011 3 | 0.103 3 | 0.835 2 | 0.569 4 | 0.125 3 | 0.123 4 | 0.500 2 | 0.774 2 | 0.504 3 | 0.019 3 | 0.465 6 | 0.353 2 | 0.475 2 | 0.000 1 | 0.500 1 | 0.000 3 | 0.712 2 | 0.050 2 | 0.667 1 | 0.000 1 | 0.396 4 | 0.555 4 | 0.120 4 | 0.786 3 | 0.069 4 | 0.000 4 | 0.182 3 | 0.390 2 | 0.000 7 | 0.831 2 | 1.000 1 | 0.679 1 | 0.111 3 | 0.110 5 | 0.000 3 | 0.450 4 | 0.868 1 | 0.277 3 | 0.083 2 | 0.069 3 | 0.471 4 | 0.001 4 | 0.428 7 | 0.000 2 | 0.000 3 | 0.000 2 | 0.421 3 | 0.043 1 | 0.000 1 | 0.358 4 | 0.456 4 | 0.518 2 | 0.237 3 | 0.256 3 | 0.945 1 | 0.271 5 | 0.632 3 | 0.534 4 | 0.208 3 | 0.730 4 | 0.000 1 | 0.140 1 | 0.658 3 | 1.000 1 | 0.452 5 | 0.000 1 | 0.082 4 | 0.441 4 | 0.000 1 | 0.472 2 | 0.060 7 | 0.454 2 | 0.469 1 | 0.384 1
DINO3D-Scannet200 | copyleft | 0.346 2 | 0.437 3 | 0.353 2 | 0.229 2 | 0.687 1 | 0.174 1 | 0.333 1 | 0.000 2 | 0.042 3 | 0.000 1 | 0.094 2 | 0.384 3 | 0.618 1 | 0.940 2 | 0.764 1 | 0.292 9 | 0.889 2 | 0.042 2 | 0.000 6 | 0.142 3 | 0.000 4 | 0.456 1 | 0.263 3 | 0.371 5 | 0.407 1 | 0.250 2 | 0.257 1 | 0.000 1 | 0.000 2 | 0.642 1 | 0.431 1 | 1.000 1 | 0.000 1 | 0.250 4 | 0.028 2 | 0.594 3 | 0.436 3 | 0.729 2 | 0.000 3 | 0.138 1 | 0.192 2 | 0.206 1 | 0.083 6 | 0.000 4 | 0.000 1 | 0.611 3 | 0.574 4 | 0.306 3 | 0.719 1 | 1.000 1 | 0.733 1 | 0.066 1 | 0.361 2 | 0.545 1 | 0.000 4 | 0.585 1 | 0.388 3 | 0.558 2 | 0.639 1 | 0.400 1 | 0.659 3 | 0.183 3 | 0.297 3 | 0.246 4 | 0.199 2 | 0.373 3 | 0.446 3 | 0.000 3 | 0.378 4 | 0.156 3 | 0.500 1 | 0.772 1 | 0.111 4 | 0.253 2 | 0.752 3 | 0.477 4 | 0.325 4 | 0.282 2 | 0.551 2 | 0.504 2 | 0.241 4 | 0.000 2 | 0.000 1 | 0.156 6 | 0.238 1 | 0.251 1 | 0.000 4 | 0.000 3 | 0.000 8 | 0.599 1 | 0.712 1 | 0.750 1 | 0.266 3 | 0.766 2 | 0.000 4 | 0.000 1 | 0.628 2 | 0.082 2 | 0.001 6 | 0.417 3 | 0.000 4 | 0.014 5 | 0.708 3 | 0.536 5 | 0.516 1 | 0.328 3 | 0.500 2 | 0.669 3 | 0.529 2 | 0.027 2 | 0.732 1 | 0.764 1 | 0.365 3 | 0.000 1 | 0.250 4 | 0.000 3 | 0.921 1 | 0.063 1 | 0.222 3 | 0.000 1 | 0.520 2 | 0.769 1 | 0.045 6 | 0.714 4 | 0.000 6 | 0.000 4 | 0.264 1 | 0.417 1 | 0.049 5 | 0.731 6 | 0.514 5 | 0.545 4 | 0.000 4 | 0.264 2 | 0.000 3 | 0.462 3 | 0.803 3 | 0.247 4 | 0.303 1 | 0.049 5 | 0.514 3 | 0.000 5 | 0.558 2 | 0.000 2 | 0.111 1 | 0.000 2 | 0.556 1 | 0.000 2 | 0.000 1 | 0.406 2 | 0.536 1 | 0.681 1 | 0.484 2 | 0.346 2 | 0.925 3 | 0.470 2 | 0.664 2 | 0.726 1 | 0.130 7 | 0.780 2 | 0.000 1 | 0.009 3 | 0.618 4 | 0.764 7 | 0.487 4 | 0.000 1 | 0.442 1 | 0.245 8 | 0.000 1 | 0.593 1 | 0.655 2 | 0.345 5 | 0.411 3 | 0.279 2
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
Minkowski 34D Inst. | permissive | 0.130 8 | 0.246 8 | 0.083 8 | 0.043 9 | 0.299 8 | 0.000 9 | 0.278 4 | 0.000 2 | 0.000 4 | 0.000 1 | 0.022 7 | 0.175 7 | 0.122 6 | 0.537 8 | 0.521 6 | 0.400 5 | 0.000 6 | 0.000 6 | 0.000 6 | 0.008 7 | 0.000 4 | 0.048 7 | 0.076 7 | 0.182 9 | 0.000 8 | 0.000 4 | 0.022 8 | 0.000 1 | 0.000 2 | 0.000 7 | 0.141 9 | 0.000 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.210 8 | 0.063 5 | 0.547 8 | 0.000 3 | 0.000 6 | 0.000 9 | 0.100 6 | 0.026 9 | 0.000 4 | 0.000 1 | 0.241 7 | 0.488 8 | 0.000 8 | 0.564 9 | 1.000 1 | 0.672 5 | 0.000 7 | 0.021 8 | 0.486 2 | 0.000 4 | 0.000 7 | 0.067 8 | 0.000 6 | 0.194 9 | 0.033 8 | 0.415 8 | 0.026 8 | 0.025 9 | 0.271 1 | 0.004 8 | 0.094 9 | 0.142 9 | 0.000 3 | 0.000 8 | 0.111 6 | 0.000 2 | 0.000 6 | 0.000 5 | 0.088 8 | 0.083 9 | 0.278 6 | 0.110 8 | 0.000 8 | 0.082 8 | 0.199 8 | 0.137 7 | 0.000 2 | 0.000 1 | 0.000 7 | 0.000 6 | 0.041 8 | 0.000 4 | 0.000 3 | 0.308 3 | 0.067 7 | 0.280 7 | 0.016 8 | 0.101 7 | 0.373 9 | 0.000 4 | 0.000 1 | 0.319 8 | 0.007 8 | 0.000 7 | 0.000 7 | 0.000 4 | 0.000 7 | 0.028 9 | 0.355 9 | 0.000 6 | 0.101 5 | 0.444 5 | 0.289 6 | 0.114 9 | 0.000 7 | 0.394 7 | 0.000 6 | 0.032 9 | 0.000 1 | 0.000 7 | 0.000 3 | 0.201 9 | 0.000 3 | 0.000 6 | 0.000 1 | 0.384 5 | 0.248 8 | 0.000 9 | 0.529 7 | 0.000 6 | 0.000 4 | 0.133 7 | 0.020 9 | 0.089 4 | 0.720 7 | 0.500 7 | 0.099 8 | 0.000 4 | 0.000 9 | 0.000 3 | 0.238 8 | 0.334 9 | 0.190 7 | 0.000 5 | 0.000 9 | 0.317 9 | 0.000 5 | 0.472 3 | 0.000 2 | 0.000 3 | 0.000 2 | 0.094 9 | 0.000 2 | 0.000 1 | 0.082 9 | 0.236 8 | 0.004 9 | 0.019 8 | 0.000 7 | 0.883 5 | 0.061 9 | 0.262 6 | 0.217 8 | 0.000 8 | 0.557 9 | 0.000 1 | 0.000 5 | 0.460 8 | 0.761 8 | 0.156 9 | 0.000 1 | 0.000 6 | 0.259 7 | 0.000 1 | 0.394 4 | 0.019 8 | 0.084 8 | 0.232 8 | 0.000 9
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.123 9 | 0.223 9 | 0.082 9 | 0.046 8 | 0.308 7 | 0.004 7 | 0.278 4 | 0.000 2 | 0.000 4 | 0.000 1 | 0.000 9 | 0.032 9 | 0.105 7 | 0.537 8 | 0.348 9 | 0.378 7 | 0.000 6 | 0.000 6 | 0.000 6 | 0.000 9 | 0.000 4 | 0.000 9 | 0.037 9 | 0.323 7 | 0.000 8 | 0.000 4 | 0.013 9 | 0.000 1 | 0.000 2 | 0.000 7 | 0.235 6 | 0.000 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.231 7 | 0.045 7 | 0.564 7 | 0.000 3 | 0.000 6 | 0.006 7 | 0.078 9 | 0.065 7 | 0.000 4 | 0.000 1 | 0.259 6 | 0.516 6 | 0.000 8 | 0.600 8 | 1.000 1 | 0.578 8 | 0.000 7 | 0.000 9 | 0.184 9 | 0.000 4 | 0.000 7 | 0.034 9 | 0.000 6 | 0.211 8 | 0.089 7 | 0.394 9 | 0.018 9 | 0.064 8 | 0.171 6 | 0.001 9 | 0.144 7 | 0.172 8 | 0.000 3 | 0.000 8 | 0.044 8 | 0.000 2 | 0.000 6 | 0.000 5 | 0.064 9 | 0.126 8 | 0.278 6 | 0.093 9 | 0.000 8 | 0.094 7 | 0.214 6 | 0.011 9 | 0.000 2 | 0.000 1 | 0.000 7 | 0.000 6 | 0.022 9 | 0.000 4 | 0.000 3 | 0.275 4 | 0.000 8 | 0.275 8 | 0.000 9 | 0.098 8 | 0.407 8 | 0.000 4 | 0.000 1 | 0.250 9 | 0.007 9 | 0.000 7 | 0.000 7 | 0.000 4 | 0.000 7 | 0.333 8 | 0.376 8 | 0.000 6 | 0.000 9 | 0.042 9 | 0.285 7 | 0.119 8 | 0.000 7 | 0.224 9 | 0.000 6 | 0.184 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.244 8 | 0.000 3 | 0.000 6 | 0.000 1 | 0.377 6 | 0.378 6 | 0.051 5 | 0.424 9 | 0.000 6 | 0.000 4 | 0.116 9 | 0.030 8 | 0.125 3 | 0.441 8 | 0.444 8 | 0.063 9 | 0.000 4 | 0.042 7 | 0.000 3 | 0.297 6 | 0.483 7 | 0.096 9 | 0.000 5 | 0.028 6 | 0.338 8 | 0.000 5 | 0.444 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.189 8 | 0.000 2 | 0.000 1 | 0.141 8 | 0.152 9 | 0.017 8 | 0.000 9 | 0.000 7 | 0.838 8 | 0.193 7 | 0.111 9 | 0.105 9 | 0.198 5 | 0.588 7 | 0.000 1 | 0.000 5 | 0.542 6 | 0.343 9 | 0.267 7 | 0.000 1 | 0.000 6 | 0.108 9 | 0.000 1 | 0.333 7 | 0.000 9 | 0.228 6 | 0.202 9 | 0.022 8
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.154 7 | 0.275 7 | 0.108 7 | 0.060 7 | 0.295 9 | 0.002 8 | 0.278 4 | 0.000 2 | 0.000 4 | 0.000 1 | 0.006 8 | 0.272 6 | 0.064 9 | 0.815 5 | 0.503 7 | 0.333 8 | 0.000 6 | 0.000 6 | 0.556 2 | 0.001 8 | 0.000 4 | 0.148 6 | 0.078 6 | 0.448 2 | 0.007 7 | 0.000 4 | 0.024 7 | 0.000 1 | 0.000 2 | 0.000 7 | 0.190 8 | 0.000 6 | 0.000 1 | 0.000 7 | 0.000 3 | 0.209 9 | 0.031 9 | 0.573 6 | 0.000 3 | 0.000 6 | 0.041 6 | 0.099 7 | 0.037 8 | 0.000 4 | 0.000 1 | 0.327 5 | 0.364 9 | 0.181 5 | 0.642 5 | 1.000 1 | 0.654 7 | 0.000 7 | 0.023 7 | 0.429 6 | 0.000 4 | 0.000 7 | 0.097 7 | 0.000 6 | 0.278 6 | 0.267 5 | 0.434 7 | 0.048 6 | 0.092 7 | 0.257 2 | 0.030 7 | 0.097 8 | 0.189 7 | 0.000 3 | 0.089 5 | 0.000 9 | 0.000 2 | 0.000 6 | 0.000 5 | 0.115 6 | 0.166 7 | 0.222 9 | 0.222 7 | 0.003 7 | 0.127 5 | 0.213 7 | 0.169 5 | 0.000 2 | 0.000 1 | 0.000 7 | 0.000 6 | 0.044 7 | 0.000 4 | 0.000 3 | 0.000 8 | 0.000 8 | 0.268 9 | 0.222 6 | 0.130 6 | 0.494 7 | 0.000 4 | 0.000 1 | 0.363 7 | 0.015 7 | 0.000 7 | 0.000 7 | 0.000 4 | 0.000 7 | 0.611 5 | 0.400 7 | 0.000 6 | 0.056 6 | 0.278 7 | 0.242 8 | 0.180 7 | 0.000 7 | 0.383 8 | 0.000 6 | 0.209 5 | 0.000 1 | 0.000 7 | 0.000 3 | 0.364 6 | 0.000 3 | 0.000 6 | 0.000 1 | 0.323 7 | 0.302 7 | 0.019 8 | 0.654 5 | 0.000 6 | 0.000 4 | 0.141 6 | 0.045 7 | 0.000 7 | 0.427 9 | 0.514 5 | 0.143 7 | 0.000 4 | 0.028 8 | 0.000 3 | 0.252 7 | 0.402 8 | 0.156 8 | 0.000 5 | 0.028 6 | 0.470 5 | 0.000 5 | 0.444 5 | 0.000 2 | 0.000 3 | 0.000 2 | 0.205 7 | 0.000 2 | 0.000 1 | 0.203 7 | 0.381 6 | 0.026 7 | 0.037 7 | 0.000 7 | 0.881 6 | 0.099 8 | 0.135 8 | 0.239 7 | 0.000 8 | 0.585 8 | 0.000 1 | 0.000 5 | 0.616 5 | 0.778 4 | 0.322 6 | 0.000 1 | 0.000 6 | 0.407 6 | 0.000 1 | 0.333 7 | 0.148 6 | 0.177 7 | 0.242 7 | 0.028 7
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.