The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
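The metric described above can be illustrated with a minimal sketch. This is not the benchmark's evaluation script: the function names, the input format (each prediction already paired with its best-overlapping ground truth instance and that overlap), and the simple interpolated area under the precision-recall curve are all assumptions made for illustration. The threshold list also assumes both endpoints of [0.5:0.95:0.05] are included.

```python
def average_precision(preds, num_gt, overlap_threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    preds: list of (confidence, gt_id or None, overlap) tuples, where gt_id
           is the ground truth instance the prediction overlaps most
           (hypothetical input format, assumed precomputed).
    num_gt: number of ground truth instances of this class.
    """
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth

    matched = set()
    tp_flags = []
    # Visit predictions from most to least confident.
    for _, gt_id, overlap in sorted(preds, key=lambda p: -p[0]):
        is_tp = (
            gt_id is not None
            and overlap >= overlap_threshold
            and gt_id not in matched  # duplicates count as false positives
        )
        if is_tp:
            matched.add(gt_id)
        tp_flags.append(is_tp)

    # Precision and recall after each prediction.
    precisions, recalls = [], []
    tp = 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)

    # Make precision non-increasing in recall (standard interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def ap_over_thresholds(preds, num_gt):
    """Average AP over overlaps 0.50, 0.55, ..., 0.95 (endpoints assumed inclusive)."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, num_gt, t) for t in thresholds) / len(thresholds)


# Two ground truth instances (ids 0 and 1); the second prediction is a
# duplicate detection of instance 0 and is therefore a false positive.
example_preds = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
ap25 = average_precision(example_preds, num_gt=2, overlap_threshold=0.25)
ap50 = average_precision(example_preds, num_gt=2, overlap_threshold=0.5)
ap = ap_over_thresholds(example_preds, num_gt=2)
```

The per-class values would then be averaged over categories to obtain the reported means; how classes without ground truth instances are handled is left open here.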



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each entry gives a method's score followed by its rank in that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
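DINO3D-Scannet200 | copyleft | 0.511 (1) | 0.685 (1) | 0.484 (1) | 0.331 (1) | 0.892 (2) | 0.821 (1) | 0.890 (1) | 0.907 (3) | 0.629 (1) | 0.468 (1) | 0.905 (1) | 0.001 (5) | 1.000 (1) | 0.816 (1) | 0.968 (2) | 0.863 (1) | 0.811 (2) | 0.944 (6) | 0.596 (7) | 0.960 (3) | 0.778 (1) | 0.532 (2) | 0.719 (7) | 0.481 (1) | 0.851 (6) | 0.803 (1) | 0.873 (1) | 0.850 (1) | 0.421 (1) | 0.806 (4) | 0.856 (5) | 0.111 (4) | 0.761 (1) | 0.677 (1) | 0.000 (4) | 0.944 (1) | 0.861 (2) | 1.000 (1) | 0.220 (1) | 0.708 (3) | 0.856 (3) | 0.220 (1) | 0.864 (1) | 0.579 (1) | 1.000 (1) | 0.764 (8) | 0.655 (3) | 0.327 (2) | 1.000 (1) | 0.911 (1) | 0.244 (1) | 0.667 (8) | 0.923 (1) | 0.857 (1) | 0.702 (1) | 0.889 (2) | 0.496 (2) | 0.048 (2) | 0.355 (8) | 0.494 (1) | 0.794 (2) | 0.798 (2) | 1.000 (1) | 0.042 (2) | 0.264 (5) | 0.817 (4) | 0.683 (1) | 0.675 (1) | 0.167 (2) | 0.000 (3) | 0.700 (1) | 0.824 (3) | 0.417 (4) | 0.000 (3) | 0.000 (2) | 0.764 (1) | 0.000 (5) | 0.500 (1) | 0.699 (1) | 0.789 (4) | 0.079 (6) | 0.472 (1) | 0.845 (2) | 0.930 (1) | 0.000 (2) | 0.667 (1) | 0.000 (4) | 0.412 (1) | 0.000 (1) | 0.163 (3) | 1.000 (1) | 0.000 (3) | 0.419 (1) | 0.500 (2) | 1.000 (1) | 0.777 (1) | 0.576 (3) | 0.867 (3) | 0.378 (2) | 0.334 (2) | 0.028 (3) | 0.764 (2) | 0.542 (1) | 0.559 (1) | 0.000 (2) | 0.800 (1) | 0.528 (2) | 0.000 (2) | 0.346 (3) | 0.714 (1) | 0.125 (3) | 0.756 (2) | 0.754 (3) | 0.866 (2) | 0.750 (1) | 0.600 (2) | 0.500 (1) | 0.500 (1) | 1.000 (1) | 0.667 (1) | 1.000 (1) | 0.000 (1) | 0.298 (1) | 0.000 (5) | 0.250 (3) | 0.194 (2) | 0.000 (5) | 0.850 (1) | 0.000 (4) | 0.250 (4) | 0.595 (1) | 0.000 (3) | 0.063 (1) | 0.860 (2) | 0.000 (1) | 0.714 (1) | 0.000 (1) | 0.944 (1) | 0.750 (1) | 0.000 (1) | 0.974 (1) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.655 (1) | 0.719 (7) | 0.250 (2) | 0.014 (4) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.142 (3) | 0.744 (1) | 0.200 (5) | 0.746 (3) | 0.436 (2) | 0.221 (6) | 0.798 (1) | 0.500 (5) | 0.011 (2) | 0.000 (1) | 0.385 (5) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.792 (4) | 0.663 (1) | 0.000 (4) | 0.000 (2) | 0.200 (4) | 0.000 (3) | 0.000 (2) | 1.000 (1) | 0.000 (1)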
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
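CompetitorFormer-200 | | 0.469 (2) | 0.676 (2) | 0.401 (3) | 0.296 (2) | 0.901 (1) | 0.729 (4) | 0.885 (2) | 0.829 (6) | 0.380 (5) | 0.320 (3) | 0.873 (3) | 0.400 (1) | 0.998 (3) | 0.711 (2) | 0.980 (1) | 0.847 (2) | 0.854 (1) | 1.000 (1) | 0.696 (3) | 0.989 (1) | 0.759 (3) | 0.556 (1) | 0.806 (2) | 0.240 (3) | 0.918 (2) | 0.650 (2) | 0.818 (3) | 0.629 (3) | 0.224 (3) | 0.839 (1) | 0.933 (1) | 0.247 (2) | 0.711 (2) | 0.540 (3) | 0.021 (3) | 0.543 (7) | 0.900 (1) | 0.903 (7) | 0.118 (4) | 0.125 (5) | 0.916 (1) | 0.057 (5) | 0.692 (2) | 0.410 (6) | 0.747 (2) | 1.000 (1) | 0.664 (2) | 0.424 (1) | 0.933 (6) | 0.839 (4) | 0.207 (2) | 0.703 (7) | 0.748 (5) | 0.700 (6) | 0.610 (2) | 0.869 (3) | 0.270 (4) | 0.068 (1) | 0.878 (1) | 0.244 (4) | 0.794 (2) | 0.698 (4) | 1.000 (1) | 0.000 (3) | 0.325 (3) | 0.770 (7) | 0.482 (2) | 0.452 (3) | 0.025 (8) | 0.015 (2) | 0.293 (3) | 0.829 (2) | 0.663 (1) | 1.000 (1) | 0.013 (1) | 0.385 (3) | 0.250 (4) | 0.500 (1) | 0.491 (5) | 0.850 (2) | 0.214 (5) | 0.131 (4) | 0.878 (1) | 0.617 (3) | 0.000 (2) | 0.085 (5) | 0.009 (3) | 0.278 (4) | 0.000 (1) | 0.295 (1) | 1.000 (1) | 0.000 (3) | 0.160 (2) | 0.500 (2) | 0.500 (3) | 0.342 (6) | 0.534 (4) | 0.901 (2) | 0.474 (1) | 0.222 (4) | 0.011 (5) | 0.724 (3) | 0.542 (1) | 0.125 (2) | 0.083 (1) | 0.336 (4) | 0.500 (3) | 0.083 (1) | 0.565 (1) | 0.587 (3) | 0.500 (1) | 0.827 (1) | 0.829 (1) | 0.750 (3) | 0.508 (3) | 0.018 (5) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.667 (1) | 1.000 (1) | 0.000 (1) | 0.173 (2) | 0.286 (2) | 0.500 (1) | 0.000 (4) | 0.125 (4) | 0.489 (3) | 0.000 (4) | 0.500 (1) | 0.269 (2) | 0.000 (3) | 0.050 (2) | 0.834 (3) | 0.000 (1) | 0.581 (4) | 0.000 (1) | 0.677 (2) | 0.467 (7) | 0.000 (1) | 0.886 (3) | 0.000 (1) | 0.000 (1) | 0.820 (4) | 0.144 (6) | 1.000 (1) | 1.000 (1) | 0.103 (2) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.175 (1) | 0.410 (5) | 0.330 (4) | 0.701 (5) | 0.257 (3) | 0.292 (4) | 0.285 (7) | 0.574 (4) | 0.157 (1) | 0.000 (1) | 0.863 (1) | 0.000 (1) | 0.056 (1) | 0.250 (2) | 1.000 (1) | 0.000 (3) | 0.109 (2) | 0.000 (2) | 0.400 (2) | 0.025 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)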
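ODIN - Ins200 | permissive | 0.451 (3) | 0.637 (4) | 0.407 (2) | 0.277 (3) | 0.742 (8) | 0.699 (5) | 0.855 (3) | 0.826 (8) | 0.626 (2) | 0.441 (2) | 0.742 (5) | 0.003 (4) | 0.941 (5) | 0.637 (3) | 0.910 (4) | 0.616 (7) | 0.679 (5) | 0.944 (6) | 0.695 (4) | 0.877 (5) | 0.763 (2) | 0.357 (4) | 0.723 (6) | 0.475 (2) | 0.779 (7) | 0.494 (3) | 0.782 (4) | 0.795 (2) | 0.334 (2) | 0.824 (2) | 0.867 (4) | 0.108 (5) | 0.701 (3) | 0.638 (2) | 0.000 (4) | 0.873 (2) | 0.749 (4) | 0.667 (8) | 0.203 (2) | 0.500 (4) | 0.886 (2) | 0.116 (2) | 0.583 (7) | 0.571 (2) | 0.688 (3) | 1.000 (1) | 0.760 (1) | 0.162 (5) | 1.000 (1) | 0.852 (3) | 0.078 (5) | 0.833 (5) | 0.887 (2) | 0.778 (2) | 0.577 (3) | 0.859 (6) | 0.550 (1) | 0.000 (5) | 0.542 (4) | 0.028 (7) | 0.667 (5) | 0.874 (1) | 1.000 (1) | 0.125 (1) | 0.232 (6) | 0.870 (2) | 0.406 (4) | 0.337 (5) | 0.167 (2) | 0.000 (3) | 0.671 (2) | 0.742 (4) | 0.500 (2) | 0.000 (3) | 0.000 (2) | 0.528 (2) | 1.000 (1) | 0.417 (6) | 0.597 (2) | 0.872 (1) | 0.275 (1) | 0.000 (6) | 0.800 (4) | 0.850 (2) | 0.000 (2) | 0.528 (2) | 0.000 (4) | 0.215 (5) | 0.000 (1) | 0.238 (2) | 0.667 (3) | 0.000 (3) | 0.019 (5) | 0.250 (6) | 1.000 (1) | 0.429 (5) | 0.599 (2) | 0.778 (4) | 0.221 (3) | 0.370 (1) | 0.284 (1) | 0.278 (8) | 0.400 (5) | 0.125 (2) | 0.000 (2) | 0.200 (5) | 0.404 (4) | 0.000 (2) | 0.250 (5) | 0.714 (1) | 0.500 (1) | 0.504 (5) | 0.769 (2) | 0.677 (5) | 0.750 (1) | 0.963 (1) | 0.500 (1) | 0.000 (2) | 0.500 (7) | 0.333 (7) | 1.000 (1) | 0.000 (1) | 0.000 (6) | 0.438 (1) | 0.500 (1) | 0.000 (4) | 1.000 (1) | 0.333 (5) | 0.226 (2) | 0.250 (4) | 0.250 (3) | 0.000 (3) | 0.000 (3) | 0.668 (5) | 0.000 (1) | 0.494 (7) | 0.000 (1) | 0.000 (5) | 0.750 (1) | 0.000 (1) | 0.833 (4) | 0.000 (1) | 0.000 (1) | 0.777 (5) | 0.333 (3) | 0.944 (3) | 0.000 (3) | 0.333 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.089 (5) | 0.407 (6) | 0.600 (1) | 0.823 (2) | 0.080 (4) | 0.264 (5) | 0.469 (5) | 0.717 (1) | 0.000 (4) | 0.000 (1) | 0.500 (3) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 1.000 (1) | 0.125 (2) | 0.333 (1) | 0.000 (2) | 0.200 (4) | 0.000 (3) | 0.000 (2) | 1.000 (1) | 0.000 (1)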
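Mask3D Scannet200 | | 0.445 (4) | 0.653 (3) | 0.392 (4) | 0.254 (4) | 0.844 (4) | 0.746 (3) | 0.818 (4) | 0.888 (5) | 0.556 (3) | 0.262 (4) | 0.890 (2) | 0.025 (3) | 1.000 (1) | 0.608 (4) | 0.930 (3) | 0.694 (5) | 0.721 (3) | 0.930 (8) | 0.686 (5) | 0.966 (2) | 0.615 (7) | 0.440 (3) | 0.725 (5) | 0.201 (4) | 0.890 (4) | 0.414 (7) | 0.827 (2) | 0.552 (4) | 0.158 (8) | 0.806 (3) | 0.924 (2) | 0.042 (6) | 0.512 (5) | 0.412 (8) | 0.226 (1) | 0.604 (5) | 0.830 (3) | 1.000 (1) | 0.125 (3) | 0.792 (1) | 0.815 (4) | 0.097 (3) | 0.648 (3) | 0.551 (4) | 0.354 (7) | 1.000 (1) | 0.630 (4) | 0.241 (4) | 1.000 (1) | 0.853 (2) | 0.204 (3) | 0.974 (4) | 0.841 (3) | 0.778 (2) | 0.358 (5) | 0.927 (1) | 0.300 (3) | 0.045 (3) | 0.640 (2) | 0.363 (2) | 0.745 (4) | 0.710 (3) | 1.000 (1) | 0.000 (3) | 0.330 (2) | 0.943 (1) | 0.315 (5) | 0.600 (2) | 1.000 (1) | 0.027 (1) | 0.080 (8) | 0.556 (8) | 0.500 (2) | 0.409 (2) | 0.000 (2) | 0.194 (4) | 1.000 (1) | 0.500 (1) | 0.493 (4) | 0.761 (5) | 0.053 (7) | 0.042 (5) | 0.780 (5) | 0.454 (4) | 0.009 (1) | 0.333 (3) | 0.050 (1) | 0.321 (2) | 0.000 (1) | 0.084 (4) | 0.552 (5) | 0.008 (2) | 0.027 (4) | 0.750 (1) | 0.500 (3) | 0.442 (4) | 0.657 (1) | 0.765 (5) | 0.120 (5) | 0.183 (6) | 0.021 (4) | 1.000 (1) | 0.510 (4) | 0.016 (4) | 0.000 (2) | 0.400 (2) | 0.619 (1) | 0.000 (2) | 0.396 (2) | 0.290 (4) | 0.000 (4) | 0.741 (3) | 0.699 (4) | 1.000 (1) | 0.260 (4) | 0.017 (6) | 0.125 (8) | 0.000 (2) | 0.792 (6) | 0.399 (6) | 1.000 (1) | 0.000 (1) | 0.049 (5) | 0.265 (3) | 0.063 (6) | 0.000 (4) | 1.000 (1) | 0.335 (4) | 0.381 (1) | 0.500 (1) | 0.250 (3) | 0.004 (2) | 0.000 (3) | 0.727 (4) | 0.000 (1) | 0.538 (5) | 0.000 (1) | 0.188 (3) | 0.677 (4) | 0.000 (1) | 0.930 (2) | 0.000 (1) | 0.000 (1) | 0.966 (1) | 0.391 (2) | 0.908 (4) | 0.000 (3) | 0.028 (3) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.152 (2) | 0.451 (3) | 0.458 (2) | 0.971 (1) | 0.573 (1) | 0.606 (1) | 0.167 (8) | 0.625 (2) | 0.004 (3) | 0.000 (1) | 0.058 (8) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 1.000 (1) | 0.000 (3) | 0.056 (3) | 0.000 (2) | 0.200 (4) | 0.309 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)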
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
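TD3D Scannet200 | permissive | 0.379 (5) | 0.603 (5) | 0.306 (5) | 0.190 (5) | 0.885 (3) | 0.755 (2) | 0.800 (5) | 0.958 (1) | 0.390 (4) | 0.260 (5) | 0.866 (4) | 0.232 (2) | 0.979 (4) | 0.523 (6) | 0.869 (6) | 0.559 (8) | 0.689 (4) | 1.000 (1) | 0.795 (1) | 0.905 (4) | 0.748 (4) | 0.173 (8) | 0.825 (1) | 0.173 (5) | 0.970 (1) | 0.457 (4) | 0.615 (5) | 0.456 (5) | 0.200 (4) | 0.621 (7) | 0.906 (3) | 0.553 (1) | 0.517 (4) | 0.510 (4) | 0.220 (2) | 0.715 (3) | 0.706 (5) | 1.000 (1) | 0.113 (5) | 0.792 (1) | 0.717 (5) | 0.073 (4) | 0.635 (4) | 0.557 (3) | 0.638 (4) | 1.000 (1) | 0.205 (8) | 0.146 (6) | 1.000 (1) | 0.769 (8) | 0.186 (4) | 1.000 (1) | 0.710 (8) | 0.778 (2) | 0.415 (4) | 0.834 (7) | 0.226 (5) | 0.021 (4) | 0.590 (3) | 0.356 (3) | 0.817 (1) | 0.477 (8) | 1.000 (1) | 0.000 (3) | 0.635 (1) | 0.843 (3) | 0.427 (3) | 0.270 (7) | 0.125 (4) | 0.000 (3) | 0.102 (6) | 1.000 (1) | 0.125 (5) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (5) | 0.125 (7) | 0.370 (6) | 0.622 (8) | 0.221 (2) | 0.196 (3) | 0.836 (3) | 0.288 (5) | 0.000 (2) | 0.093 (4) | 0.020 (2) | 0.294 (3) | 0.000 (1) | 0.075 (5) | 0.667 (3) | 0.038 (1) | 0.111 (3) | 0.250 (6) | 0.000 (7) | 0.526 (3) | 0.495 (6) | 0.908 (1) | 0.111 (6) | 0.259 (3) | 0.003 (6) | 0.667 (4) | 0.045 (8) | 0.000 (5) | 0.000 (2) | 0.400 (2) | 0.274 (6) | 0.000 (2) | 0.274 (4) | 0.226 (5) | 0.000 (4) | 0.520 (4) | 0.302 (8) | 0.731 (4) | 0.103 (6) | 0.458 (3) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.472 (3) | 0.792 (6) | 0.000 (1) | 0.088 (4) | 0.061 (4) | 0.250 (3) | 0.009 (3) | 0.250 (3) | 0.333 (5) | 0.181 (3) | 0.396 (3) | 0.051 (5) | 0.012 (1) | 0.000 (3) | 0.458 (7) | 0.000 (1) | 0.424 (8) | 0.000 (1) | 0.101 (4) | 0.390 (8) | 0.000 (1) | 0.833 (4) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.222 (5) | 1.000 (1) | 0.000 (3) | 0.003 (5) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.102 (4) | 0.275 (8) | 0.400 (3) | 0.735 (4) | 0.061 (6) | 0.433 (3) | 0.533 (4) | 0.625 (2) | 0.000 (4) | 0.000 (1) | 0.259 (7) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (4) | 1.000 (1) | 0.600 (1) | 0.000 (3) | 0.250 (1) | 0.000 (5) | 0.000 (1)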
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
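Minkowski 34D Inst. | permissive | 0.280 (7) | 0.488 (7) | 0.192 (8) | 0.124 (7) | 0.804 (6) | 0.518 (7) | 0.772 (8) | 0.904 (4) | 0.337 (8) | 0.191 (7) | 0.443 (7) | 0.000 (6) | 0.861 (7) | 0.502 (7) | 0.868 (7) | 0.669 (6) | 0.587 (7) | 0.997 (4) | 0.467 (8) | 0.828 (8) | 0.732 (5) | 0.342 (6) | 0.745 (4) | 0.119 (8) | 0.918 (2) | 0.404 (8) | 0.419 (7) | 0.398 (6) | 0.172 (6) | 0.618 (8) | 0.743 (7) | 0.167 (3) | 0.077 (8) | 0.500 (5) | 0.000 (4) | 0.568 (6) | 0.506 (8) | 1.000 (1) | 0.044 (7) | 0.000 (6) | 0.502 (7) | 0.010 (7) | 0.593 (6) | 0.284 (8) | 0.305 (8) | 0.903 (7) | 0.213 (7) | 0.142 (7) | 0.981 (5) | 0.790 (7) | 0.000 (7) | 1.000 (1) | 0.715 (7) | 0.538 (8) | 0.346 (7) | 0.830 (8) | 0.067 (6) | 0.000 (5) | 0.400 (5) | 0.074 (6) | 0.333 (7) | 0.551 (5) | 1.000 (1) | 0.000 (3) | 0.292 (4) | 0.777 (6) | 0.118 (8) | 0.317 (6) | 0.100 (6) | 0.000 (3) | 0.191 (5) | 0.648 (6) | 0.000 (6) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (5) | 0.500 (1) | 0.213 (8) | 0.825 (3) | 0.021 (8) | 0.333 (2) | 0.648 (8) | 0.098 (7) | 0.000 (2) | 0.000 (6) | 0.000 (4) | 0.077 (6) | 0.000 (1) | 0.000 (8) | 0.150 (8) | 0.000 (3) | 0.000 (6) | 0.000 (8) | 0.225 (5) | 0.281 (7) | 0.447 (7) | 0.000 (8) | 0.090 (7) | 0.148 (7) | 0.000 (7) | 0.479 (7) | 0.542 (1) | 0.000 (5) | 0.000 (2) | 0.200 (5) | 0.131 (8) | 0.000 (2) | 0.250 (5) | 0.000 (7) | 0.000 (4) | 0.159 (8) | 0.396 (7) | 0.677 (5) | 0.021 (7) | 0.000 (7) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.442 (5) | 0.125 (8) | 0.000 (1) | 0.000 (6) | 0.000 (5) | 0.000 (7) | 0.333 (1) | 0.000 (5) | 0.528 (2) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.200 (8) | 0.000 (1) | 0.516 (6) | 0.000 (1) | 0.000 (5) | 0.500 (5) | 0.000 (1) | 0.833 (4) | 0.000 (1) | 0.000 (1) | 0.286 (7) | 0.083 (7) | 0.750 (5) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.059 (8) | 0.445 (4) | 0.200 (5) | 0.535 (7) | 0.070 (5) | 0.167 (7) | 0.385 (6) | 0.375 (6) | 0.000 (4) | 0.000 (1) | 0.333 (6) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (4) | 0.000 (2) | 0.200 (4) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)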
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
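CSC-Pretrain Inst. | permissive | 0.275 (8) | 0.466 (8) | 0.218 (7) | 0.110 (8) | 0.783 (7) | 0.383 (8) | 0.783 (7) | 0.829 (7) | 0.367 (7) | 0.168 (8) | 0.305 (8) | 0.000 (6) | 0.661 (8) | 0.413 (8) | 0.869 (5) | 0.719 (3) | 0.546 (8) | 0.997 (4) | 0.685 (6) | 0.841 (7) | 0.555 (8) | 0.277 (7) | 0.768 (3) | 0.132 (6) | 0.779 (7) | 0.448 (6) | 0.364 (8) | 0.212 (8) | 0.161 (7) | 0.768 (5) | 0.692 (8) | 0.000 (7) | 0.395 (6) | 0.500 (5) | 0.000 (4) | 0.450 (8) | 0.591 (6) | 1.000 (1) | 0.020 (8) | 0.000 (6) | 0.423 (8) | 0.007 (8) | 0.625 (5) | 0.420 (5) | 0.505 (6) | 1.000 (1) | 0.353 (5) | 0.119 (8) | 0.571 (7) | 0.819 (5) | 0.014 (6) | 1.000 (1) | 0.774 (4) | 0.689 (7) | 0.311 (8) | 0.866 (4) | 0.067 (6) | 0.000 (5) | 0.400 (5) | 0.000 (8) | 0.278 (8) | 0.501 (6) | 1.000 (1) | 0.000 (3) | 0.162 (8) | 0.584 (8) | 0.286 (6) | 0.206 (8) | 0.125 (4) | 0.000 (3) | 0.084 (7) | 0.649 (5) | 0.000 (6) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (5) | 0.125 (7) | 0.312 (7) | 0.727 (6) | 0.221 (3) | 0.000 (6) | 0.667 (7) | 0.114 (6) | 0.000 (2) | 0.000 (6) | 0.000 (4) | 0.065 (8) | 0.000 (1) | 0.004 (7) | 0.278 (6) | 0.000 (3) | 0.000 (6) | 0.500 (2) | 0.000 (7) | 0.571 (2) | 0.000 (8) | 0.250 (7) | 0.019 (8) | 0.145 (8) | 0.000 (7) | 0.667 (4) | 0.200 (7) | 0.000 (5) | 0.000 (2) | 0.200 (5) | 0.258 (7) | 0.000 (2) | 0.000 (7) | 0.000 (7) | 0.000 (4) | 0.369 (7) | 0.429 (6) | 0.613 (7) | 0.000 (8) | 0.000 (7) | 0.500 (1) | 0.000 (2) | 0.500 (7) | 0.333 (7) | 0.500 (7) | 0.000 (1) | 0.106 (3) | 0.000 (5) | 0.000 (7) | 0.000 (4) | 0.000 (5) | 0.333 (5) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.918 (1) | 0.000 (1) | 0.638 (2) | 0.000 (1) | 0.000 (5) | 0.750 (1) | 0.000 (1) | 0.833 (4) | 0.000 (1) | 0.000 (1) | 0.143 (8) | 0.000 (8) | 0.750 (5) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.063 (7) | 0.377 (7) | 0.200 (5) | 0.222 (8) | 0.055 (7) | 0.500 (2) | 0.677 (3) | 0.250 (7) | 0.000 (4) | 0.000 (1) | 0.500 (3) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (4) | 0.000 (2) | 0.115 (8) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)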
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
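LGround Inst. | permissive | 0.314 (6) | 0.529 (6) | 0.225 (6) | 0.155 (6) | 0.810 (5) | 0.625 (6) | 0.798 (6) | 0.940 (2) | 0.372 (6) | 0.217 (6) | 0.484 (6) | 0.000 (6) | 0.927 (6) | 0.528 (5) | 0.826 (8) | 0.694 (4) | 0.605 (6) | 1.000 (1) | 0.731 (2) | 0.846 (6) | 0.716 (6) | 0.350 (5) | 0.589 (8) | 0.123 (7) | 0.857 (5) | 0.457 (5) | 0.578 (6) | 0.376 (7) | 0.183 (5) | 0.765 (6) | 0.800 (6) | 0.000 (7) | 0.278 (7) | 0.500 (5) | 0.000 (4) | 0.659 (4) | 0.569 (7) | 1.000 (1) | 0.093 (6) | 0.000 (6) | 0.539 (6) | 0.010 (6) | 0.578 (8) | 0.378 (7) | 0.571 (5) | 1.000 (1) | 0.337 (6) | 0.252 (3) | 0.530 (8) | 0.814 (6) | 0.000 (7) | 0.744 (6) | 0.743 (6) | 0.746 (5) | 0.346 (6) | 0.863 (5) | 0.067 (6) | 0.000 (5) | 0.400 (5) | 0.167 (5) | 0.667 (5) | 0.488 (7) | 1.000 (1) | 0.000 (3) | 0.208 (7) | 0.783 (5) | 0.166 (7) | 0.375 (4) | 0.071 (7) | 0.000 (3) | 0.200 (4) | 0.607 (7) | 0.000 (6) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 1.000 (1) | 0.500 (1) | 0.517 (3) | 0.716 (7) | 0.221 (3) | 0.000 (6) | 0.706 (6) | 0.085 (8) | 0.000 (2) | 0.000 (6) | 0.000 (4) | 0.077 (7) | 0.000 (1) | 0.063 (6) | 0.278 (6) | 0.000 (3) | 0.000 (6) | 0.500 (2) | 0.083 (6) | 0.181 (8) | 0.515 (5) | 0.286 (6) | 0.144 (4) | 0.219 (5) | 0.042 (2) | 0.582 (6) | 0.400 (5) | 0.000 (5) | 0.000 (2) | 0.000 (8) | 0.305 (5) | 0.000 (2) | 0.000 (7) | 0.036 (6) | 0.000 (4) | 0.413 (6) | 0.500 (5) | 0.533 (8) | 0.250 (5) | 0.200 (4) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.472 (3) | 1.000 (1) | 0.000 (1) | 0.000 (6) | 0.000 (5) | 0.250 (3) | 0.000 (4) | 0.000 (5) | 0.333 (5) | 0.000 (4) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.000 (3) | 0.600 (6) | 0.000 (1) | 0.594 (3) | 0.000 (1) | 0.000 (5) | 0.500 (5) | 0.000 (1) | 0.647 (8) | 0.000 (1) | 0.000 (1) | 0.429 (6) | 0.333 (3) | 0.500 (8) | 0.000 (3) | 0.000 (6) | 0.000 (1) | 0.000 (5) | 0.000 (1) | 0.069 (6) | 0.696 (2) | 0.050 (8) | 0.556 (6) | 0.031 (8) | 0.042 (8) | 0.750 (2) | 0.250 (7) | 0.000 (4) | 0.000 (1) | 0.630 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (5) | 0.000 (3) | 0.000 (4) | 0.000 (2) | 0.400 (2) | 0.000 (3) | 0.000 (2) | 0.000 (5) | 0.000 (1)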
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild. ECCV 2022