The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP), that is, from 0.5 to 0.95 in steps of 0.05, for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
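
The metric described above can be illustrated with a minimal sketch. It is not the benchmark's official evaluation script. It assumes that "overlap" means intersection over union (IoU), that each prediction has already been paired with the ground truth instance it overlaps most, and that precision is interpolated with a monotone envelope. The names `average_precision` and `ap_over_range` and the `(score, gt_id, overlap)` tuple layout are hypothetical.

```python
def average_precision(preds, num_gt, thr):
    """AP for one class at one overlap threshold.

    preds: list of (score, gt_id, overlap) tuples (hypothetical layout);
           gt_id is None when the prediction overlaps no ground truth.
    num_gt: number of ground truth instances of this class.
    """
    if num_gt == 0:
        return float("nan")
    matched = set()
    tp_flags = []
    # Visit predictions from most to least confident.
    for score, gt_id, overlap in sorted(preds, key=lambda p: -p[0]):
        is_tp = gt_id is not None and overlap >= thr and gt_id not in matched
        if is_tp:
            matched.add(gt_id)
        # A second prediction of an already matched instance is a false positive.
        tp_flags.append(is_tp)
    precisions, recalls, tp = [], [], 0
    for i, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / i)
        recalls.append(tp / num_gt)
    # Monotone envelope: precision at a recall level is the best precision
    # reached at that recall or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_over_range(preds, num_gt):
    """Mean of the per-threshold AP over overlaps 0.5, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, num_gt, t) for t in thresholds) / len(thresholds)


# Two ground truth instances; the second prediction duplicates instance 0.
example = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
ap50 = average_precision(example, num_gt=2, thr=0.5)
ap25 = average_precision(example, num_gt=2, thr=0.25)
ap = ap_over_range(example, num_gt=2)
```

In this toy example the duplicate prediction lowers AP 50% below 1.0 even though both instances are eventually found, which is the penalty the note above refers to. The per-class values would then be averaged over all categories to obtain the reported mean.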



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Columns: Method, Info, avg ap, head ap, common ap, tail ap, followed by the per-class ap for chair, table, door, couch, cabinet, shelf, desk, office chair, bed, pillow, sink, picture, window, toilet, bookshelf, monitor, curtain, book, armchair, coffee table, box, refrigerator, lamp, kitchen cabinet, towel, clothes, tv, nightstand, counter, dresser, stool, cushion, plant, ceiling, bathtub, end table, dining table, keyboard, bag, backpack, toilet paper, printer, tv stand, whiteboard, blanket, shower curtain, trash can, closet, stairs, microwave, stove, shoe, computer tower, bottle, bin, ottoman, bench, board, washing machine, mirror, copier, basket, sofa chair, file cabinet, fan, laptop, shower, paper, person, paper towel dispenser, oven, blinds, rack, plate, blackboard, piano, suitcase, rail, radiator, recycling bin, container, wardrobe, soap dispenser, telephone, bucket, clock, stand, light, laundry basket, pipe, clothes dryer, guitar, toilet paper holder, seat, speaker, column, bicycle, ladder, bathroom stall, shower wall, cup, jacket, storage bin, coffee maker, dishwasher, paper towel roll, machine, mat, windowsill, bar, toaster, bulletin board, ironing board, fireplace, soap dish, kitchen counter, doorframe, toilet paper dispenser, mini fridge, fire extinguisher, ball, hat, shower curtain rod, water cooler, paper cutter, tray, shower door, pillar, ledge, toaster oven, mouse, toilet seat cover dispenser, furniture, cart, storage container, scale, tissue box, light switch, crate, power outlet, decoration, sign, projector, closet door, vacuum cleaner, candle, plunger, stuffed animal, headphones, dish rack, broom, guitar case, range hood, dustpan, hair dryer, water bottle, handicap bar, purse, vent, shower floor, water pitcher, mailbox, bowl, paper bag, alarm clock, music stand, projector screen, divider, laundry detergent, bathroom counter, object, bathroom vanity, closet wall, laundry hamper, bathroom stall door, ceiling light, trash bin, dumbbell, stair rail, tube, bathroom cabinet, cd case, closet rod, coffee kettle, structure, shower head, keyboard piano, case of water bottles, coat rack, storage organizer, folded chair, fire alarm, power strip, calendar, poster, potted plant, luggage, mattress.
In each row below, every score is immediately followed by the method's rank in that column.
Mask3D Scannet2000.278 40.383 40.263 50.168 40.661 50.465 30.572 40.665 60.391 40.121 80.304 40.015 30.647 40.349 40.474 20.489 40.321 40.816 90.351 60.722 30.402 70.195 40.515 70.082 50.795 20.215 50.396 10.377 50.082 80.724 10.586 30.015 60.277 40.377 90.201 10.475 50.572 40.778 60.089 40.759 10.556 50.068 20.506 40.467 30.323 70.778 40.427 40.027 60.789 30.744 30.003 50.570 20.561 40.337 50.265 40.711 20.258 40.031 30.569 20.311 20.441 30.179 41.000 10.000 40.233 50.411 50.283 50.380 10.667 10.016 20.048 70.418 60.139 50.173 30.000 10.086 40.014 50.500 20.384 30.497 30.044 70.032 50.752 40.287 50.003 10.000 30.007 20.208 30.000 20.001 60.349 30.008 20.014 40.509 10.500 30.323 30.023 60.176 40.107 50.105 70.000 20.605 40.378 30.016 50.000 30.400 20.192 30.000 20.048 60.037 60.000 50.275 30.119 40.810 10.258 50.006 70.083 90.000 20.568 40.377 50.708 30.000 10.005 50.147 20.014 60.000 30.556 20.085 40.325 10.500 10.083 50.004 20.000 30.590 20.000 10.365 40.000 10.116 40.491 30.000 10.626 30.000 10.000 10.579 30.391 30.050 70.000 40.028 40.000 10.222 30.000 10.063 30.302 30.356 30.149 80.573 20.415 10.013 90.002 80.004 40.000 10.005 80.000 10.000 20.444 10.514 30.000 30.028 20.000 20.156 30.267 10.000 31.000 10.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
CompetitorFormer-2000.328 30.439 20.303 30.223 30.771 10.456 40.663 20.673 40.259 50.182 30.455 20.373 10.722 20.504 30.450 40.774 20.469 10.945 10.380 30.820 10.479 30.312 10.641 30.143 40.786 30.346 30.356 30.534 40.120 50.658 30.655 10.049 30.464 20.428 70.014 30.465 60.650 10.850 40.076 60.083 50.808 20.044 50.543 30.271 50.712 21.000 10.454 20.183 20.831 20.730 40.010 30.471 40.575 30.421 30.390 20.663 40.192 50.047 10.820 10.243 40.441 40.303 11.000 10.000 40.277 30.620 20.427 20.312 30.000 70.011 30.123 40.569 40.430 20.562 10.000 10.353 20.083 40.500 20.358 40.396 40.120 40.082 40.868 10.518 20.000 20.004 20.001 40.137 50.000 20.019 30.366 20.000 40.083 20.500 20.444 40.119 60.099 10.110 50.400 10.178 40.000 20.689 20.400 10.125 30.065 20.314 50.384 10.044 10.256 30.484 30.333 10.345 10.243 10.632 30.487 40.013 60.333 10.000 21.000 10.472 20.835 20.000 10.116 20.000 40.500 10.000 30.069 40.237 30.000 40.500 10.267 30.000 30.050 20.452 50.000 10.475 20.000 10.677 20.400 40.000 10.555 40.000 10.000 10.679 10.060 70.171 51.000 10.103 30.000 10.667 10.000 10.088 10.296 40.305 40.444 20.221 40.208 30.192 50.069 30.140 10.000 10.043 50.000 10.043 10.111 30.556 10.000 30.054 10.000 20.322 20.025 20.000 31.000 10.000 1
Volt-SPFormerpermissive0.367 10.475 10.359 10.248 10.678 30.494 20.736 10.689 30.416 30.170 40.484 10.008 40.663 30.575 10.524 10.787 10.418 20.928 20.550 10.684 40.470 40.308 20.685 10.193 20.799 10.565 10.365 20.560 20.144 30.682 20.556 40.052 20.663 10.417 80.000 40.527 30.609 21.000 10.299 10.000 60.831 10.051 40.635 20.524 10.650 31.000 10.442 30.235 10.873 10.817 10.004 40.383 70.693 20.469 20.348 30.682 30.380 20.012 40.400 50.240 50.664 10.284 21.000 10.125 10.329 20.660 10.717 10.318 20.250 20.029 10.340 10.748 10.333 40.407 20.000 10.017 50.556 21.000 10.552 10.549 10.238 10.099 30.821 20.515 30.000 20.000 30.014 10.232 20.111 10.013 40.333 40.002 30.000 50.139 60.389 50.822 10.029 50.551 10.247 30.230 30.000 20.719 10.378 30.500 20.778 10.400 20.117 40.000 20.388 10.439 40.278 20.192 40.241 20.537 40.588 30.466 20.333 10.000 21.000 10.395 31.000 10.000 10.013 30.000 40.254 30.000 30.556 20.710 10.000 40.500 10.304 20.000 30.000 30.864 10.000 10.502 10.000 10.500 30.588 20.000 10.655 20.000 10.000 10.652 20.764 10.112 60.250 20.278 20.000 10.222 30.000 10.050 40.528 20.533 10.345 40.638 10.167 60.066 80.117 20.019 20.000 10.113 30.000 10.000 20.444 10.556 10.000 30.028 20.000 20.156 30.000 30.167 11.000 10.000 1
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
DINO3D-Scannet200copyleft0.346 20.437 30.353 20.229 20.729 20.536 10.659 30.733 10.431 10.264 10.388 30.001 60.764 10.529 20.462 30.669 30.411 30.925 30.371 50.766 20.545 10.263 30.574 40.257 10.714 40.504 20.325 40.726 10.206 10.618 40.628 20.066 10.297 30.558 20.000 40.732 10.594 30.940 20.199 20.558 20.752 30.174 10.687 10.470 20.921 10.764 70.345 50.142 30.731 60.780 20.138 10.514 30.712 10.556 10.417 10.719 10.407 10.042 20.292 90.456 10.245 80.266 31.000 10.042 30.247 40.446 30.373 30.241 40.049 50.000 40.328 30.536 50.417 30.000 40.000 10.764 10.000 60.500 20.406 20.520 20.045 60.442 10.803 30.681 10.000 20.000 30.000 50.251 10.000 20.027 20.083 60.000 40.303 10.306 30.889 20.551 20.094 20.264 20.361 20.253 20.000 20.611 30.400 10.516 10.000 30.599 10.279 20.000 20.346 20.642 10.111 40.282 20.183 30.664 20.750 10.378 40.333 10.500 10.514 50.593 10.708 30.000 10.238 10.000 40.250 40.111 10.000 60.484 20.000 40.250 40.585 10.000 30.063 10.487 40.000 10.365 30.000 10.772 10.639 10.000 10.769 10.000 10.000 10.545 40.655 20.000 80.250 20.014 50.000 10.222 30.000 10.082 20.618 10.156 60.384 30.436 30.130 70.246 40.049 50.009 30.000 10.192 20.000 10.000 20.000 40.477 40.028 20.000 40.000 20.156 30.000 30.000 31.000 10.000 1
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
ODIN - Ins200permissive0.265 50.349 50.268 40.163 50.485 90.366 70.549 50.492 90.421 20.229 20.265 60.003 50.609 50.297 50.320 50.327 50.251 60.848 70.314 80.526 60.324 80.138 50.529 50.178 30.440 80.186 90.306 50.546 30.160 20.494 70.476 60.016 50.231 60.594 10.000 40.615 20.357 60.630 70.141 30.167 40.665 40.054 30.360 50.451 40.610 40.769 60.640 10.032 50.746 40.698 50.040 20.389 60.550 50.371 40.257 50.617 70.310 30.000 60.481 40.022 80.463 20.160 51.000 10.125 10.193 60.267 60.253 60.156 60.000 70.000 40.332 20.606 30.444 10.000 40.000 10.281 31.000 10.417 60.344 50.238 90.218 20.000 60.655 60.506 40.000 20.052 10.000 50.091 60.000 20.035 10.370 10.000 40.000 50.250 40.903 10.037 90.031 30.221 30.197 40.285 10.037 10.191 90.200 60.083 40.000 30.200 60.115 50.000 20.250 40.552 20.278 20.077 50.107 50.389 50.674 20.565 10.278 40.000 20.361 90.333 70.361 70.000 10.000 60.438 10.451 20.000 31.000 10.074 50.204 20.250 40.250 40.000 30.000 30.493 30.000 10.083 80.000 10.000 60.317 50.000 10.481 50.000 10.000 10.188 60.333 40.345 20.000 40.333 10.000 10.333 20.000 10.035 60.266 50.478 20.506 10.054 60.205 40.119 70.067 40.000 50.000 10.210 10.000 10.000 20.000 40.389 50.097 10.000 40.000 20.111 60.000 30.000 30.889 50.000 1
TD3D Scannet200permissive0.211 60.332 60.177 60.103 60.662 40.413 50.463 60.705 20.192 70.145 50.266 50.215 20.452 80.209 60.222 90.219 90.315 50.893 40.380 40.617 50.439 50.047 80.646 20.080 60.610 60.253 40.237 60.293 60.135 40.379 90.494 50.048 40.252 50.451 40.184 20.483 40.395 50.852 30.083 50.551 30.278 60.036 60.337 60.266 60.544 50.963 30.079 90.039 40.740 50.604 60.000 60.586 10.283 60.282 60.059 60.633 60.028 60.004 50.559 30.309 30.420 50.028 91.000 10.000 40.456 10.411 40.372 40.060 80.046 60.000 40.040 80.694 20.083 60.000 40.000 10.000 60.000 60.083 80.252 60.260 80.200 30.160 20.669 50.111 60.000 20.000 30.006 30.169 40.000 20.007 50.296 50.032 10.074 30.139 60.000 60.321 40.031 40.108 60.088 60.157 50.000 20.231 80.026 90.000 60.000 30.356 40.052 60.000 20.240 50.147 50.000 50.015 60.046 70.144 70.073 70.414 30.222 80.000 20.806 30.343 60.486 60.000 10.008 40.038 30.083 50.002 20.028 50.074 50.032 30.150 60.039 60.008 10.000 30.250 80.000 10.125 70.000 10.052 50.260 70.000 10.143 90.000 10.000 10.543 50.207 50.404 10.000 40.003 60.000 10.000 60.000 10.037 50.093 80.272 50.342 50.039 80.281 20.249 30.224 10.000 50.000 10.074 40.000 10.000 20.000 40.278 60.000 30.000 40.889 10.323 10.000 30.014 20.000 60.000 1
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Minkowski 34D Inst.permissive0.130 80.246 80.083 80.043 90.547 80.236 80.415 80.672 50.141 90.133 70.067 80.000 70.521 60.114 90.238 80.289 60.232 80.883 50.182 90.373 90.486 20.076 70.488 80.022 80.529 70.199 80.110 80.217 80.100 60.460 80.319 80.000 70.025 90.472 30.000 40.394 70.210 80.537 80.004 80.000 60.083 90.000 90.299 80.061 90.201 90.761 80.084 80.008 70.720 70.557 90.000 60.317 90.280 70.094 90.020 90.564 90.000 80.000 60.400 50.048 70.259 70.101 71.000 10.000 40.190 70.142 90.094 90.137 70.089 40.000 40.101 50.355 90.000 70.000 40.000 10.000 60.000 60.444 50.082 90.384 50.000 90.000 60.334 90.004 90.000 20.000 30.000 50.041 80.000 20.000 70.026 90.000 40.000 50.000 80.000 60.082 80.022 70.000 90.021 80.088 80.000 20.241 70.033 80.000 60.000 30.067 70.000 90.000 20.000 70.000 70.000 50.000 80.026 80.262 60.016 80.000 80.278 40.000 20.500 70.394 40.028 90.000 10.000 60.000 40.000 70.000 30.000 60.019 80.000 40.000 70.000 70.000 30.000 30.156 90.000 10.032 90.000 10.000 60.194 90.000 10.248 80.000 10.000 10.099 80.019 80.308 30.000 40.000 70.000 10.000 60.000 10.007 80.122 60.000 70.175 70.063 50.000 80.271 10.000 90.000 50.000 10.000 90.000 10.000 20.000 40.278 60.000 30.000 40.000 20.111 60.000 30.000 30.000 60.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst.permissive0.123 90.223 90.082 90.046 80.564 70.152 90.394 90.578 80.235 60.116 90.034 90.000 70.348 90.119 80.297 60.285 70.202 90.838 80.323 70.407 80.184 90.037 90.516 60.013 90.424 90.214 60.093 90.105 90.078 90.542 60.250 90.000 70.064 80.444 50.000 40.224 90.231 70.537 80.001 90.000 60.126 80.004 70.308 70.193 70.244 80.343 90.228 60.000 90.441 80.588 70.000 60.338 80.275 80.189 80.030 80.600 80.000 80.000 60.378 70.000 90.108 90.098 81.000 10.000 40.096 90.172 80.144 70.011 90.125 30.000 40.000 90.376 80.000 70.000 40.000 10.000 60.000 60.042 90.141 80.377 60.051 50.000 60.483 70.017 80.000 20.000 30.000 50.022 90.000 20.000 70.065 70.000 40.000 50.000 80.000 60.094 70.000 90.042 70.000 90.064 90.000 20.259 60.089 70.000 60.000 30.000 80.022 80.000 20.000 70.000 70.000 50.000 80.018 90.111 90.000 90.000 80.278 40.000 20.444 80.333 70.333 80.000 10.000 60.000 40.000 70.000 30.000 60.000 90.000 40.000 70.000 70.000 30.000 30.267 70.000 10.184 60.000 10.000 60.211 80.000 10.378 60.000 10.000 10.063 90.000 90.275 40.000 40.000 70.000 10.000 60.000 10.007 90.105 70.000 70.032 90.045 70.198 50.171 60.028 60.000 50.000 10.006 70.000 10.000 20.000 40.278 60.000 30.000 40.000 20.044 80.000 30.000 30.000 60.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.permissive0.154 70.275 70.108 70.060 70.573 60.381 60.434 70.654 70.190 80.141 60.097 70.000 70.503 70.180 70.252 70.242 80.242 70.881 60.448 20.494 70.429 60.078 60.364 90.024 70.654 50.213 70.222 70.239 70.099 70.616 50.363 70.000 70.092 70.444 50.000 40.383 80.209 90.815 50.030 70.000 60.166 70.002 80.295 90.099 80.364 60.778 40.177 70.001 80.427 90.585 80.000 60.470 50.268 90.205 70.045 70.642 50.007 70.000 60.333 80.148 60.407 60.130 61.000 10.000 40.156 80.189 70.097 80.169 50.000 70.000 40.056 60.400 70.000 70.000 40.000 10.000 60.556 20.278 70.203 70.323 70.019 80.000 60.402 80.026 70.000 20.000 30.000 50.044 70.000 20.000 70.037 80.000 40.000 50.181 50.000 60.127 50.006 80.028 80.023 70.115 60.000 20.327 50.267 50.000 60.000 30.000 80.028 70.000 20.000 70.000 70.000 50.003 70.048 60.135 80.222 60.089 50.278 40.000 20.514 50.333 70.611 50.000 10.000 60.000 40.000 70.000 30.000 60.037 70.000 40.000 70.000 70.000 30.000 30.322 60.000 10.209 50.000 10.000 60.278 60.000 10.302 70.000 10.000 10.143 70.148 60.000 80.000 40.000 70.000 10.000 60.000 10.015 70.064 90.000 70.272 60.031 90.000 80.257 20.028 60.000 50.000 10.041 60.000 10.000 20.000 40.222 90.000 30.000 40.000 20.000 90.000 30.000 30.000 60.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
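
Because each row above runs a score straight into its rank and then into the next score (for example `0.278 40.383 4` is score 0.278 at rank 4, then score 0.383 at rank 4), a small parser can help when reading the table programmatically. The following is a sketch only. It assumes every score has exactly three decimals and is followed by a space and an integer rank; the name `parse_cells` is hypothetical.

```python
import re

# Assumption: a cell is "<d.ddd> <rank>", and the next cell (if any) starts
# right after the rank with no separator. The lazy rank plus the lookahead
# stops the rank before the next three-decimal score.
CELL = re.compile(r"(\d\.\d{3}) (\d+?)(?=\d\.\d{3} |$)")


def parse_cells(row):
    """Return the (score, rank) pairs found in one flattened table row."""
    return [(float(m.group(1)), int(m.group(2))) for m in CELL.finditer(row.strip())]


# First four cells of the Mask3D row, with the method name still attached.
cells = parse_cells("Mask3D Scannet2000.278 40.383 40.263 50.168 4")
```

The pairs come back in column order, so they can be zipped with the column names listed in the table header.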