The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
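The per-class computation described above can be sketched as follows. This is a minimal illustration, not the official evaluation script: the function names, the input layout (one `(confidence, gt_index, overlap)` tuple per prediction), the use of `>=` for the overlap test, the all-point interpolation of the precision-recall curve, and the choice to read [0.5:0.95:0.05] as the nine thresholds 0.50 to 0.90 are all assumptions made for the example.

```python
def average_precision(preds, num_gt, overlap_threshold):
    """AP for one class at one overlap threshold.

    preds: list of (confidence, gt_index, overlap), where gt_index is the
    ground-truth instance the prediction overlaps most (None if it
    overlaps none) and overlap is that overlap value. Hypothetical layout.
    """
    if num_gt == 0 or not preds:
        return 0.0  # simplification: no ground truth or no predictions

    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _conf, gt_idx, overlap in sorted(preds, key=lambda p: p[0], reverse=True):
        is_tp = (
            gt_idx is not None
            and overlap >= overlap_threshold
            and gt_idx not in matched  # a second match is a false positive
        )
        if is_tp:
            matched.add(gt_idx)
        hits.append(is_tp)

    # Precision and recall after each prediction.
    precisions, recalls = [], []
    tp = 0
    for rank, is_tp in enumerate(hits, start=1):
        tp += is_tp
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)

    # Make precision non-increasing in recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_summary(preds, num_gt):
    """The three reported numbers for one class."""
    sweep = [round(0.5 + 0.05 * i, 2) for i in range(9)]  # 0.50 ... 0.90 (assumed)
    return {
        "AP 25%": average_precision(preds, num_gt, 0.25),
        "AP 50%": average_precision(preds, num_gt, 0.5),
        "AP": sum(average_precision(preds, num_gt, t) for t in sweep) / len(sweep),
    }
```

Averaging each of these per-class values over the evaluated categories would give the mean figures reported in the table. The `matched` set is what implements the duplicate penalty: once a ground-truth instance has been claimed by a more confident prediction, any further prediction of it counts as a false positive.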



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. In each row, every score is followed by the method's rank in that column.




Columns (left to right): Method, Info, avg ap 50%, head ap 50%, common ap 50%, tail ap 50%, chair, table, door, couch, cabinet, shelf, desk, office chair, bed, pillow, sink, picture, window, toilet, bookshelf, monitor, curtain, book, armchair, coffee table, box, refrigerator, lamp, kitchen cabinet, towel, clothes, tv, nightstand, counter, dresser, stool, cushion, plant, ceiling, bathtub, end table, dining table, keyboard, bag, backpack, toilet paper, printer, tv stand, whiteboard, blanket, shower curtain, trash can, closet, stairs, microwave, stove, shoe, computer tower, bottle, bin, ottoman, bench, board, washing machine, mirror, copier, basket, sofa chair, file cabinet, fan, laptop, shower, paper, person, paper towel dispenser, oven, blinds, rack, plate, blackboard, piano, suitcase, rail, radiator, recycling bin, container, wardrobe, soap dispenser, telephone, bucket, clock, stand, light, laundry basket, pipe, clothes dryer, guitar, toilet paper holder, seat, speaker, column, bicycle, ladder, bathroom stall, shower wall, cup, jacket, storage bin, coffee maker, dishwasher, paper towel roll, machine, mat, windowsill, bar, toaster, bulletin board, ironing board, fireplace, soap dish, kitchen counter, doorframe, toilet paper dispenser, mini fridge, fire extinguisher, ball, hat, shower curtain rod, water cooler, paper cutter, tray, shower door, pillar, ledge, toaster oven, mouse, toilet seat cover dispenser, furniture, cart, storage container, scale, tissue box, light switch, crate, power outlet, decoration, sign, projector, closet door, vacuum cleaner, candle, plunger, stuffed animal, headphones, dish rack, broom, guitar case, range hood, dustpan, hair dryer, water bottle, handicap bar, purse, vent, shower floor, water pitcher, mailbox, bowl, paper bag, alarm clock, music stand, projector screen, divider, laundry detergent, bathroom counter, object, bathroom vanity, closet wall, laundry hamper, bathroom stall door, ceiling light, trash bin, dumbbell, stair rail, tube, bathroom cabinet, cd case, closet rod, coffee kettle, structure, shower head, keyboard piano, case of water bottles, coat rack, storage organizer, folded chair, fire alarm, power strip, calendar, poster, potted plant, luggage, mattress
Volt-SPFormerpermissive0.475 10.630 10.451 20.314 10.806 50.666 40.923 10.847 40.541 30.224 30.755 10.008 40.994 10.735 20.818 10.869 10.621 20.990 60.811 10.894 40.702 20.423 20.825 10.281 20.923 20.787 20.564 40.699 20.245 20.784 20.800 40.129 10.900 10.500 50.000 40.768 20.841 21.000 10.319 10.000 60.903 10.068 40.772 20.565 20.683 41.000 10.546 20.410 11.000 10.930 10.014 40.629 80.878 20.725 30.499 30.799 60.412 30.019 40.400 50.500 11.000 10.612 11.000 10.125 10.343 20.823 20.750 10.449 40.250 20.056 10.585 20.797 30.500 20.667 20.000 10.043 51.000 11.000 10.716 10.853 10.255 20.099 40.857 20.651 30.000 20.000 30.025 20.375 10.250 10.056 40.333 40.002 30.000 50.250 50.500 31.000 10.107 40.613 10.294 30.300 30.000 20.817 20.400 10.500 21.000 10.400 20.452 30.000 20.500 20.519 40.500 10.372 40.482 20.750 30.641 30.510 20.500 10.000 21.000 10.472 31.000 10.000 10.026 30.000 40.331 30.000 31.000 11.000 10.000 40.500 10.304 20.000 30.000 31.000 10.000 10.714 10.000 10.500 30.750 10.000 10.944 20.000 10.000 10.857 20.764 10.455 30.250 20.278 20.000 11.000 10.000 10.078 40.742 10.600 10.524 50.638 10.167 70.208 80.209 30.019 20.000 10.241 30.000 10.000 21.000 11.000 10.000 30.028 30.000 20.200 30.000 30.250 11.000 10.000 1
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
DINO3D-Scannet200copyleft0.454 20.587 20.453 10.296 20.871 20.703 10.845 30.891 20.572 10.312 20.753 20.001 60.981 30.773 10.767 20.771 30.614 30.944 70.586 80.937 20.690 30.381 30.716 80.409 10.918 30.803 10.602 20.777 10.290 10.721 40.779 60.096 20.728 20.677 10.000 40.944 10.793 31.000 10.214 20.708 30.823 30.200 10.851 10.499 51.000 10.764 80.473 50.248 31.000 10.911 20.216 10.667 60.824 40.857 10.616 10.842 30.496 20.046 20.355 90.494 20.405 70.507 21.000 10.042 30.264 50.743 40.683 20.675 10.125 30.000 40.600 10.816 20.417 50.000 40.000 10.764 10.000 60.500 20.563 30.720 30.079 70.442 10.845 30.835 20.000 20.000 30.000 50.324 20.000 20.117 10.083 60.000 40.419 10.500 21.000 10.777 20.378 10.594 20.361 20.327 10.000 20.764 30.400 10.548 10.000 30.800 10.437 40.000 20.346 30.714 10.125 40.662 10.475 30.866 20.750 10.400 40.500 10.500 11.000 10.667 11.000 10.000 10.298 10.000 40.250 40.194 10.000 60.850 20.000 40.250 50.595 10.000 30.063 10.520 60.000 10.571 30.000 10.944 10.750 10.000 10.974 10.000 10.000 10.857 20.655 20.000 80.250 20.014 50.000 11.000 10.000 10.116 20.729 20.200 60.545 30.436 30.221 60.750 10.177 50.011 30.000 10.284 10.000 10.000 20.000 40.792 50.050 20.000 40.000 20.200 30.000 30.000 31.000 10.000 1
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
CompetitorFormer-2000.415 30.574 30.370 40.274 30.885 10.584 60.846 20.779 70.318 60.205 40.704 30.400 10.987 20.651 30.731 40.830 20.682 11.000 10.599 60.957 10.685 40.428 10.806 30.196 40.870 50.641 30.600 30.583 40.183 50.780 30.833 30.095 30.663 30.538 30.021 30.540 70.845 10.903 60.103 50.083 50.881 20.054 60.632 30.311 70.745 21.000 10.545 30.378 20.933 70.832 50.015 30.684 50.748 50.700 40.562 20.869 20.218 50.064 10.885 10.243 50.794 30.484 31.000 10.000 40.289 30.758 30.482 30.452 30.000 70.015 30.286 40.759 40.663 11.000 10.000 10.380 30.250 40.500 20.491 40.622 60.213 40.131 30.877 10.602 40.000 20.005 20.008 40.209 50.000 20.089 20.399 30.000 40.160 20.500 20.500 30.144 80.260 20.347 60.443 10.207 50.000 20.724 40.400 10.125 30.083 20.317 50.462 20.083 10.565 10.587 30.500 10.648 20.551 10.750 30.508 40.018 60.500 10.000 21.000 10.667 11.000 10.000 10.142 20.000 40.500 10.000 30.125 50.489 30.000 40.500 10.269 30.000 30.050 20.625 40.000 10.581 20.000 10.677 20.467 70.000 10.694 50.000 10.000 10.820 50.071 80.215 61.000 10.103 30.000 11.000 10.000 10.132 10.410 40.327 50.541 40.232 40.292 40.261 70.186 40.157 10.000 10.216 40.000 10.056 10.250 31.000 10.000 30.082 10.000 20.400 20.025 20.000 31.000 10.000 1
Mask3D Scannet2000.388 40.542 40.357 50.237 50.808 40.676 30.741 50.832 60.496 40.151 70.628 50.021 30.955 40.578 40.753 30.612 40.591 40.822 90.609 50.926 30.614 60.291 40.725 60.163 50.890 40.380 90.615 10.517 50.130 70.806 10.857 20.024 60.511 50.412 90.226 10.597 50.756 41.000 10.111 40.792 10.736 50.091 30.610 40.527 40.323 81.000 10.504 40.063 61.000 10.853 40.010 50.974 30.839 30.667 50.301 50.883 10.266 40.039 30.640 20.311 40.739 40.463 41.000 10.000 40.287 40.715 50.313 60.600 21.000 10.027 20.076 80.502 90.500 20.409 30.000 10.194 40.125 50.500 20.491 50.748 20.050 80.042 50.776 60.352 50.008 10.000 30.033 10.254 30.000 20.005 60.552 20.008 20.020 40.750 10.500 30.409 40.065 60.511 30.107 50.178 60.000 21.000 10.400 10.016 50.000 30.400 20.571 10.000 20.060 60.044 60.000 50.514 30.278 51.000 10.258 50.017 70.125 90.000 20.792 60.399 61.000 10.000 10.013 50.265 20.018 60.000 31.000 10.335 40.381 10.500 10.250 40.004 20.000 30.727 20.000 10.497 40.000 10.188 40.677 50.000 10.708 40.000 10.000 10.945 10.391 30.123 70.000 40.028 40.000 11.000 10.000 10.099 30.451 30.400 30.668 10.573 20.606 10.077 90.003 80.004 40.000 10.042 70.000 10.000 21.000 11.000 10.000 30.042 20.000 20.200 30.302 10.000 31.000 10.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
ODIN - Ins200permissive0.381 50.507 50.375 30.237 40.653 90.614 50.780 40.744 90.566 20.328 10.446 60.003 50.853 50.496 50.582 60.448 90.434 60.938 80.682 30.782 60.494 80.274 50.723 70.269 30.694 90.393 80.511 50.695 30.227 30.550 80.795 50.039 50.602 40.638 20.000 40.734 30.585 60.667 70.163 30.500 40.769 40.108 20.484 70.569 10.688 31.000 10.665 10.093 51.000 10.863 30.049 20.667 60.887 10.778 20.422 40.786 80.550 10.000 60.542 40.028 80.667 50.428 51.000 10.125 10.208 80.530 70.406 50.337 50.000 70.000 40.585 20.742 50.500 20.000 40.000 10.472 21.000 10.417 70.563 20.631 50.275 10.000 60.800 40.841 10.000 20.083 10.000 50.174 60.000 20.055 50.667 10.000 40.000 50.250 51.000 10.286 50.058 70.391 50.209 40.313 20.167 10.278 90.200 60.083 40.000 30.200 60.264 50.000 20.250 50.714 10.500 10.196 50.333 40.500 70.750 10.668 10.500 10.000 20.500 70.333 71.000 10.000 10.000 60.438 10.500 10.000 31.000 10.333 50.226 20.250 50.250 40.000 30.000 30.668 30.000 10.174 80.000 10.000 60.750 10.000 10.667 60.000 10.000 10.638 60.333 40.579 20.000 40.333 10.000 11.000 10.000 10.063 60.385 50.600 10.647 20.066 60.264 50.469 40.246 20.000 50.000 10.264 20.000 10.000 20.000 41.000 10.125 10.000 40.000 20.200 30.000 30.000 31.000 10.000 1
TD3D Scannet200permissive0.320 60.501 60.264 60.164 60.841 30.679 20.716 60.879 30.280 70.192 50.634 40.231 20.733 70.459 60.565 70.498 80.560 51.000 10.686 20.890 50.708 10.123 80.820 20.152 60.967 10.456 40.458 60.387 60.194 40.435 90.906 10.077 40.396 60.509 40.217 20.715 40.619 51.000 10.099 60.792 10.513 60.062 50.506 60.549 30.605 51.000 10.123 80.106 41.000 10.744 80.000 61.000 10.504 90.525 60.185 60.790 70.101 60.008 50.587 30.356 30.817 20.083 91.000 10.000 40.621 10.842 10.415 40.268 80.083 60.000 40.098 70.881 10.125 60.000 40.000 10.000 60.000 60.125 80.332 70.448 90.202 50.196 20.798 50.264 60.000 20.000 30.017 30.233 40.000 20.063 30.333 40.038 10.111 30.250 50.000 60.516 30.208 30.470 40.094 70.218 40.000 20.667 50.033 90.000 60.000 30.400 20.156 60.000 20.267 40.226 50.000 50.104 60.159 60.299 90.095 70.458 30.500 10.000 21.000 10.472 30.792 70.000 10.022 40.061 30.250 40.008 20.250 40.333 50.143 30.396 40.049 60.012 10.000 30.283 80.000 10.241 70.000 10.101 50.331 80.000 10.629 70.000 10.000 10.857 20.222 60.677 10.000 40.003 60.000 10.000 60.000 10.076 50.252 70.400 30.431 60.061 70.328 30.331 60.500 10.000 50.000 10.167 50.000 10.000 20.000 40.500 60.000 30.000 41.000 10.542 10.000 30.063 20.000 60.000 1
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
LGround Inst.permissive0.246 70.413 70.170 70.130 70.754 60.541 70.682 80.903 10.264 80.164 60.234 70.000 70.681 80.452 70.464 90.541 70.399 71.000 10.637 40.772 70.588 70.190 60.589 90.081 70.857 60.426 60.373 70.318 70.135 60.690 50.653 80.000 70.159 80.500 50.000 40.581 60.387 81.000 10.046 70.000 60.402 70.003 90.455 90.196 80.571 61.000 10.270 70.003 90.530 90.748 70.000 60.744 40.575 70.511 70.112 70.815 40.067 70.000 60.400 50.167 60.667 50.241 61.000 10.000 40.208 70.660 60.125 80.317 60.000 70.000 40.100 60.561 80.000 70.000 40.000 10.000 61.000 10.500 20.344 60.568 80.167 60.000 60.706 70.068 70.000 20.000 30.000 50.063 70.000 20.000 70.056 80.000 40.000 50.500 20.000 60.143 90.017 80.125 70.097 60.164 70.000 20.582 70.400 10.000 60.000 30.000 80.083 80.000 20.000 70.000 70.000 50.025 70.156 70.533 60.250 60.200 50.500 10.000 21.000 10.333 71.000 10.000 10.000 60.000 40.000 70.000 30.000 60.333 50.000 40.000 70.000 70.000 30.000 30.400 70.000 10.364 50.000 10.000 60.500 60.000 10.511 80.000 10.000 10.286 70.333 40.000 80.000 40.000 70.000 10.000 60.000 10.034 70.111 90.000 70.333 80.031 90.000 80.750 10.125 60.000 50.000 10.151 60.000 10.000 20.000 40.500 60.000 30.000 40.000 20.000 90.000 30.000 30.000 60.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
CSC-Pretrain Inst.permissive0.209 80.361 90.157 80.085 80.700 80.248 90.634 90.776 80.322 50.135 90.103 90.000 70.524 90.364 90.618 50.592 60.381 90.997 40.589 70.747 80.340 90.109 90.768 40.059 90.702 80.448 50.188 90.149 90.091 90.636 60.573 90.000 70.246 70.500 50.000 40.450 90.405 70.667 70.006 90.000 60.356 80.007 70.506 50.420 60.340 70.667 90.294 60.004 80.571 80.748 60.000 61.000 10.573 80.502 80.094 80.807 50.000 80.000 60.400 50.000 90.278 90.228 71.000 10.000 40.115 90.432 80.198 70.050 90.125 30.000 40.000 90.573 70.000 70.000 40.000 10.000 60.000 60.125 80.312 80.610 70.221 30.000 60.667 80.050 80.000 20.000 30.000 50.032 90.000 20.000 70.083 60.000 40.000 50.000 80.000 60.220 70.000 90.125 70.000 90.111 90.000 20.667 50.200 60.000 60.000 30.000 80.110 70.000 20.000 70.000 70.000 50.000 80.053 90.500 70.000 90.000 80.500 10.000 20.500 70.333 70.500 80.000 10.000 60.000 40.000 70.000 30.000 60.000 90.000 40.000 70.000 70.000 30.000 30.600 50.000 10.364 50.000 10.000 60.750 10.000 10.833 30.000 10.000 10.143 90.000 90.396 40.000 40.000 70.000 10.000 60.000 10.021 90.221 80.000 70.093 90.055 80.451 20.677 30.125 60.000 50.000 10.028 80.000 10.000 20.000 40.500 60.000 30.000 40.000 20.050 80.000 30.000 30.000 60.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
Minkowski 34D Inst.permissive0.203 90.369 80.134 90.078 90.706 70.382 80.693 70.845 50.221 90.150 80.158 80.000 70.746 60.369 80.545 80.595 50.387 80.997 40.413 90.720 90.636 50.165 70.732 50.070 80.851 70.402 70.251 80.313 80.123 80.583 70.696 70.000 70.051 90.500 50.000 40.500 80.372 90.667 70.009 80.000 60.307 90.003 80.479 80.107 90.226 90.903 70.109 90.031 70.981 60.726 90.000 60.522 90.669 60.282 90.052 90.778 90.000 80.000 60.400 50.074 70.333 80.218 81.000 10.000 40.250 60.406 90.118 90.317 60.100 50.000 40.191 50.596 60.000 70.000 40.000 10.000 60.000 60.500 20.178 90.701 40.000 90.000 60.522 90.018 90.000 20.000 30.000 50.060 80.000 20.000 70.033 90.000 40.000 50.000 80.000 60.281 60.100 50.000 90.090 80.133 80.000 20.422 80.050 80.000 60.000 30.200 60.000 90.000 20.000 70.000 70.000 50.000 80.123 80.677 50.021 80.000 80.500 10.000 20.500 70.442 50.125 90.000 10.000 60.000 40.000 70.000 30.000 60.056 80.000 40.000 70.000 70.000 30.000 30.200 90.000 10.143 90.000 10.000 60.250 90.000 10.511 80.000 10.000 10.286 70.083 70.396 40.000 40.000 70.000 10.000 60.000 10.025 80.300 60.000 70.371 70.070 50.000 80.385 50.000 90.000 50.000 10.000 90.000 10.000 20.000 40.500 60.000 30.000 40.000 20.200 30.000 30.000 30.000 60.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019