The 3D semantic instance prediction task involves detecting and segmenting the individual object instances in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision (AP) of each class. We report the mean AP over all 200 categories at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps from 0.5 to 0.95 in steps of 0.05 (AP). Note that multiple predictions of the same ground-truth instance are penalized as false positives.
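The metric can be sketched for a single category in a single scan. The sketch makes these assumptions:

- Instances are boolean per-vertex masks.
- Overlap is taken to be mask IoU.
- Predictions are matched greedily in order of decreasing confidence. A prediction is a true positive only if it overlaps a not-yet-matched ground-truth instance by at least the threshold, so a second prediction of an already-matched instance becomes a false positive.
- AP is computed with the common all-point interpolation.

This is a simplified illustration, not the official ScanNet evaluation script. A full evaluation would pool predictions across all test scans, and its matching and interpolation details may differ. The function names `mask_iou`, `average_precision` and `ap_over_range` are hypothetical.

```python
import numpy as np

# Simplified sketch; not the official ScanNet200 evaluation code.


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Overlap (IoU) between two boolean per-vertex instance masks."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def average_precision(preds, gts, iou_threshold: float) -> float:
    """AP for one category in one scan.

    preds: list of (mask, confidence) pairs; gts: list of ground-truth masks.
    """
    if not gts:
        return float("nan")  # AP is undefined without ground truth
    # Visit predictions from most to least confident.
    order = sorted(range(len(preds)), key=lambda i: preds[i][1], reverse=True)
    matched = [False] * len(gts)
    hits = []
    for i in order:
        mask = preds[i][0]
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue  # each ground-truth instance can be matched only once
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[best_j] = True
            hits.append(1.0)  # true positive
        else:
            hits.append(0.0)  # duplicate or insufficient overlap: false positive
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - np.asarray(hits))
    recall = tp / len(gts)
    precision = tp / np.maximum(tp + fp, 1e-12)
    # All-point interpolation: take the precision envelope, then integrate.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def ap_over_range(preds, gts) -> float:
    """AP averaged over overlaps 0.5, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(preds, gts, t) for t in thresholds]))


g1 = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
g2 = np.array([0, 0, 0, 1, 1, 1], dtype=bool)
# The second prediction duplicates g1 and is scored as a false positive.
preds = [(g1, 0.9), (g1.copy(), 0.8), (g2, 0.7)]
print(average_precision(preds, [g1, g2], 0.5))  # about 0.833 instead of 1.0
print(ap_over_range(preds, [g1, g2]))
```

Without the duplicate, both ground-truth instances are found at full precision and AP is 1.0. The duplicate ranked between them lowers precision at the second recall step. AP 25% corresponds to calling `average_precision` with a threshold of 0.25.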



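This table lists the ScanNet200 benchmark results for 3D semantic instance segmentation. Besides the overall average, it reports AP 25% averaged over the head, common, and tail category groups, followed by per-category AP 25%. The number after each score is the method's rank in that column.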




Method | Info | avg AP 25% | head AP 25% | common AP 25% | tail AP 25% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
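ODIN - Ins200 | permissive | 0.451 (2) | 0.637 (3) | 0.407 (2) | 0.277 (2) | 0.742 (7) | 0.699 (4) | 0.855 (2) | 0.826 (7) | 0.626 (2) | 0.441 (2) | 0.742 (4) | 0.003 (3) | 0.941 (4) | 0.637 (2) | 0.910 (3) | 0.616 (6) | 0.679 (4) | 0.944 (5) | 0.695 (3) | 0.877 (4) | 0.763 (2) | 0.357 (3) | 0.723 (5) | 0.475 (2) | 0.779 (6) | 0.494 (2) | 0.782 (3) | 0.795 (2) | 0.334 (2) | 0.824 (1) | 0.867 (3) | 0.108 (4) | 0.701 (2) | 0.638 (2) | 0.000 (3) | 0.873 (2) | 0.749 (3) | 0.667 (7) | 0.203 (2) | 0.500 (4) | 0.886 (1) | 0.116 (2) | 0.583 (6) | 0.571 (2) | 0.688 (2) | 1.000 (1) | 0.760 (1) | 0.162 (4) | 1.000 (1) | 0.852 (3) | 0.078 (4) | 0.833 (5) | 0.887 (2) | 0.778 (2) | 0.577 (2) | 0.859 (5) | 0.550 (1) | 0.000 (4) | 0.542 (3) | 0.028 (6) | 0.667 (4) | 0.874 (1) | 1.000 (1) | 0.125 (1) | 0.232 (5) | 0.870 (2) | 0.406 (3) | 0.337 (4) | 0.167 (2) | 0.000 (2) | 0.671 (2) | 0.742 (3) | 0.500 (1) | 0.000 (2) | 0.000 (1) | 0.528 (2) | 1.000 (1) | 0.417 (5) | 0.597 (2) | 0.872 (1) | 0.275 (1) | 0.000 (5) | 0.800 (3) | 0.850 (2) | 0.000 (2) | 0.528 (2) | 0.000 (3) | 0.215 (4) | 0.000 (1) | 0.238 (1) | 0.667 (2) | 0.000 (3) | 0.019 (4) | 0.250 (5) | 1.000 (1) | 0.429 (5) | 0.599 (2) | 0.778 (3) | 0.221 (2) | 0.370 (1) | 0.284 (1) | 0.278 (7) | 0.400 (4) | 0.125 (2) | 0.000 (1) | 0.200 (4) | 0.404 (3) | 0.000 (1) | 0.250 (4) | 0.714 (1) | 0.500 (1) | 0.504 (4) | 0.769 (1) | 0.677 (4) | 0.750 (1) | 0.963 (1) | 0.500 (1) | 0.000 (2) | 0.500 (6) | 0.333 (6) | 1.000 (1) | 0.000 (1) | 0.000 (5) | 0.438 (1) | 0.500 (1) | 0.000 (4) | 1.000 (1) | 0.333 (4) | 0.226 (2) | 0.250 (3) | 0.250 (2) | 0.000 (3) | 0.000 (2) | 0.668 (4) | 0.000 (1) | 0.494 (6) | 0.000 (1) | 0.000 (4) | 0.750 (1) | 0.000 (1) | 0.833 (3) | 0.000 (1) | 0.000 (1) | 0.777 (4) | 0.333 (3) | 0.944 (2) | 0.000 (2) | 0.333 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.089 (4) | 0.407 (5) | 0.600 (1) | 0.823 (2) | 0.080 (3) | 0.264 (4) | 0.469 (5) | 0.717 (1) | 0.000 (3) | 0.000 (1) | 0.500 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 0.125 (2) | 0.333 (1) | 0.000 (2) | 0.200 (3) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)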
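Mask3D Scannet200 | | 0.445 (3) | 0.653 (2) | 0.392 (3) | 0.254 (3) | 0.844 (3) | 0.746 (3) | 0.818 (3) | 0.888 (5) | 0.556 (3) | 0.262 (3) | 0.890 (2) | 0.025 (2) | 1.000 (1) | 0.608 (3) | 0.930 (2) | 0.694 (4) | 0.721 (2) | 0.930 (7) | 0.686 (4) | 0.966 (1) | 0.615 (6) | 0.440 (2) | 0.725 (4) | 0.201 (3) | 0.890 (3) | 0.414 (6) | 0.827 (2) | 0.552 (3) | 0.158 (7) | 0.806 (2) | 0.924 (1) | 0.042 (5) | 0.512 (4) | 0.412 (7) | 0.226 (1) | 0.604 (5) | 0.830 (2) | 1.000 (1) | 0.125 (3) | 0.792 (1) | 0.815 (3) | 0.097 (3) | 0.648 (2) | 0.551 (4) | 0.354 (6) | 1.000 (1) | 0.630 (3) | 0.241 (3) | 1.000 (1) | 0.853 (2) | 0.204 (2) | 0.974 (4) | 0.841 (3) | 0.778 (2) | 0.358 (4) | 0.927 (1) | 0.300 (3) | 0.045 (2) | 0.640 (1) | 0.363 (2) | 0.745 (3) | 0.710 (3) | 1.000 (1) | 0.000 (3) | 0.330 (2) | 0.943 (1) | 0.315 (4) | 0.600 (2) | 1.000 (1) | 0.027 (1) | 0.080 (7) | 0.556 (7) | 0.500 (1) | 0.409 (1) | 0.000 (1) | 0.194 (3) | 1.000 (1) | 0.500 (1) | 0.493 (4) | 0.761 (4) | 0.053 (6) | 0.042 (4) | 0.780 (4) | 0.454 (3) | 0.009 (1) | 0.333 (3) | 0.050 (1) | 0.321 (2) | 0.000 (1) | 0.084 (3) | 0.552 (4) | 0.008 (2) | 0.027 (3) | 0.750 (1) | 0.500 (3) | 0.442 (4) | 0.657 (1) | 0.765 (4) | 0.120 (4) | 0.183 (5) | 0.021 (4) | 1.000 (1) | 0.510 (3) | 0.016 (3) | 0.000 (1) | 0.400 (2) | 0.619 (1) | 0.000 (1) | 0.396 (1) | 0.290 (3) | 0.000 (3) | 0.741 (2) | 0.699 (3) | 1.000 (1) | 0.260 (3) | 0.017 (5) | 0.125 (7) | 0.000 (2) | 0.792 (5) | 0.399 (5) | 1.000 (1) | 0.000 (1) | 0.049 (4) | 0.265 (2) | 0.063 (5) | 0.000 (4) | 1.000 (1) | 0.335 (3) | 0.381 (1) | 0.500 (1) | 0.250 (2) | 0.004 (2) | 0.000 (2) | 0.727 (3) | 0.000 (1) | 0.538 (4) | 0.000 (1) | 0.188 (2) | 0.677 (4) | 0.000 (1) | 0.930 (2) | 0.000 (1) | 0.000 (1) | 0.966 (1) | 0.391 (2) | 0.908 (3) | 0.000 (2) | 0.028 (2) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.152 (1) | 0.451 (3) | 0.458 (2) | 0.971 (1) | 0.573 (1) | 0.606 (1) | 0.167 (7) | 0.625 (2) | 0.004 (2) | 0.000 (1) | 0.058 (7) | 0.000 (1) | 0.000 (1) | 1.000 (1) | 1.000 (1) | 0.000 (3) | 0.056 (2) | 0.000 (2) | 0.200 (3) | 0.309 (1) | 0.000 (2) | 1.000 (1) | 0.000 (1)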
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
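DINO3D-Scannet200 | copyleft | 0.511 (1) | 0.685 (1) | 0.484 (1) | 0.331 (1) | 0.892 (1) | 0.821 (1) | 0.890 (1) | 0.907 (3) | 0.629 (1) | 0.468 (1) | 0.905 (1) | 0.001 (4) | 1.000 (1) | 0.816 (1) | 0.968 (1) | 0.863 (1) | 0.811 (1) | 0.944 (5) | 0.596 (6) | 0.960 (2) | 0.778 (1) | 0.532 (1) | 0.719 (6) | 0.481 (1) | 0.851 (5) | 0.803 (1) | 0.873 (1) | 0.850 (1) | 0.421 (1) | 0.806 (3) | 0.856 (4) | 0.111 (3) | 0.761 (1) | 0.677 (1) | 0.000 (3) | 0.944 (1) | 0.861 (1) | 1.000 (1) | 0.220 (1) | 0.708 (3) | 0.856 (2) | 0.220 (1) | 0.864 (1) | 0.579 (1) | 1.000 (1) | 0.764 (7) | 0.655 (2) | 0.327 (1) | 1.000 (1) | 0.911 (1) | 0.244 (1) | 0.667 (7) | 0.923 (1) | 0.857 (1) | 0.702 (1) | 0.889 (2) | 0.496 (2) | 0.048 (1) | 0.355 (7) | 0.494 (1) | 0.794 (2) | 0.798 (2) | 1.000 (1) | 0.042 (2) | 0.264 (4) | 0.817 (4) | 0.683 (1) | 0.675 (1) | 0.167 (2) | 0.000 (2) | 0.700 (1) | 0.824 (2) | 0.417 (3) | 0.000 (2) | 0.000 (1) | 0.764 (1) | 0.000 (4) | 0.500 (1) | 0.699 (1) | 0.789 (3) | 0.079 (5) | 0.472 (1) | 0.845 (1) | 0.930 (1) | 0.000 (2) | 0.667 (1) | 0.000 (3) | 0.412 (1) | 0.000 (1) | 0.163 (2) | 1.000 (1) | 0.000 (3) | 0.419 (1) | 0.500 (2) | 1.000 (1) | 0.777 (1) | 0.576 (3) | 0.867 (2) | 0.378 (1) | 0.334 (2) | 0.028 (3) | 0.764 (2) | 0.542 (1) | 0.559 (1) | 0.000 (1) | 0.800 (1) | 0.528 (2) | 0.000 (1) | 0.346 (2) | 0.714 (1) | 0.125 (2) | 0.756 (1) | 0.754 (2) | 0.866 (2) | 0.750 (1) | 0.600 (2) | 0.500 (1) | 0.500 (1) | 1.000 (1) | 0.667 (1) | 1.000 (1) | 0.000 (1) | 0.298 (1) | 0.000 (4) | 0.250 (2) | 0.194 (2) | 0.000 (4) | 0.850 (1) | 0.000 (4) | 0.250 (3) | 0.595 (1) | 0.000 (3) | 0.063 (1) | 0.860 (2) | 0.000 (1) | 0.714 (1) | 0.000 (1) | 0.944 (1) | 0.750 (1) | 0.000 (1) | 0.974 (1) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.655 (1) | 0.719 (6) | 0.250 (1) | 0.014 (3) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.142 (2) | 0.744 (1) | 0.200 (4) | 0.746 (3) | 0.436 (2) | 0.221 (5) | 0.798 (1) | 0.500 (4) | 0.011 (1) | 0.000 (1) | 0.385 (4) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.792 (3) | 0.663 (1) | 0.000 (3) | 0.000 (2) | 0.200 (3) | 0.000 (2) | 0.000 (2) | 1.000 (1) | 0.000 (1)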
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing, Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
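TD3D Scannet200 | permissive | 0.379 (4) | 0.603 (4) | 0.306 (4) | 0.190 (4) | 0.885 (2) | 0.755 (2) | 0.800 (4) | 0.958 (1) | 0.390 (4) | 0.260 (4) | 0.866 (3) | 0.232 (1) | 0.979 (3) | 0.523 (5) | 0.869 (5) | 0.559 (7) | 0.689 (3) | 1.000 (1) | 0.795 (1) | 0.905 (3) | 0.748 (3) | 0.173 (7) | 0.825 (1) | 0.173 (4) | 0.970 (1) | 0.457 (3) | 0.615 (4) | 0.456 (4) | 0.200 (3) | 0.621 (6) | 0.906 (2) | 0.553 (1) | 0.517 (3) | 0.510 (3) | 0.220 (2) | 0.715 (3) | 0.706 (4) | 1.000 (1) | 0.113 (4) | 0.792 (1) | 0.717 (4) | 0.073 (4) | 0.635 (3) | 0.557 (3) | 0.638 (3) | 1.000 (1) | 0.205 (7) | 0.146 (5) | 1.000 (1) | 0.769 (7) | 0.186 (3) | 1.000 (1) | 0.710 (7) | 0.778 (2) | 0.415 (3) | 0.834 (6) | 0.226 (4) | 0.021 (3) | 0.590 (2) | 0.356 (3) | 0.817 (1) | 0.477 (7) | 1.000 (1) | 0.000 (3) | 0.635 (1) | 0.843 (3) | 0.427 (2) | 0.270 (6) | 0.125 (4) | 0.000 (2) | 0.102 (5) | 1.000 (1) | 0.125 (4) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.125 (6) | 0.370 (5) | 0.622 (7) | 0.221 (2) | 0.196 (3) | 0.836 (2) | 0.288 (4) | 0.000 (2) | 0.093 (4) | 0.020 (2) | 0.294 (3) | 0.000 (1) | 0.075 (4) | 0.667 (2) | 0.038 (1) | 0.111 (2) | 0.250 (5) | 0.000 (6) | 0.526 (3) | 0.495 (5) | 0.908 (1) | 0.111 (5) | 0.259 (3) | 0.003 (5) | 0.667 (3) | 0.045 (7) | 0.000 (4) | 0.000 (1) | 0.400 (2) | 0.274 (5) | 0.000 (1) | 0.274 (3) | 0.226 (4) | 0.000 (3) | 0.520 (3) | 0.302 (7) | 0.731 (3) | 0.103 (5) | 0.458 (3) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.472 (2) | 0.792 (5) | 0.000 (1) | 0.088 (3) | 0.061 (3) | 0.250 (2) | 0.009 (3) | 0.250 (3) | 0.333 (4) | 0.181 (3) | 0.396 (2) | 0.051 (4) | 0.012 (1) | 0.000 (2) | 0.458 (6) | 0.000 (1) | 0.424 (7) | 0.000 (1) | 0.101 (3) | 0.390 (7) | 0.000 (1) | 0.833 (3) | 0.000 (1) | 0.000 (1) | 0.857 (2) | 0.222 (5) | 1.000 (1) | 0.000 (2) | 0.003 (4) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.102 (3) | 0.275 (7) | 0.400 (3) | 0.735 (4) | 0.061 (5) | 0.433 (3) | 0.533 (4) | 0.625 (2) | 0.000 (3) | 0.000 (1) | 0.259 (6) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (3) | 1.000 (1) | 0.600 (1) | 0.000 (2) | 0.250 (1) | 0.000 (4) | 0.000 (1)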
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
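Minkowski 34D Inst. | permissive | 0.280 (6) | 0.488 (6) | 0.192 (7) | 0.124 (6) | 0.804 (5) | 0.518 (6) | 0.772 (7) | 0.904 (4) | 0.337 (7) | 0.191 (6) | 0.443 (6) | 0.000 (5) | 0.861 (6) | 0.502 (6) | 0.868 (6) | 0.669 (5) | 0.587 (6) | 0.997 (3) | 0.467 (7) | 0.828 (7) | 0.732 (4) | 0.342 (5) | 0.745 (3) | 0.119 (7) | 0.918 (2) | 0.404 (7) | 0.419 (6) | 0.398 (5) | 0.172 (5) | 0.618 (7) | 0.743 (6) | 0.167 (2) | 0.077 (7) | 0.500 (4) | 0.000 (3) | 0.568 (6) | 0.506 (7) | 1.000 (1) | 0.044 (6) | 0.000 (5) | 0.502 (6) | 0.010 (6) | 0.593 (5) | 0.284 (7) | 0.305 (7) | 0.903 (6) | 0.213 (6) | 0.142 (6) | 0.981 (5) | 0.790 (6) | 0.000 (6) | 1.000 (1) | 0.715 (6) | 0.538 (7) | 0.346 (6) | 0.830 (7) | 0.067 (5) | 0.000 (4) | 0.400 (4) | 0.074 (5) | 0.333 (6) | 0.551 (4) | 1.000 (1) | 0.000 (3) | 0.292 (3) | 0.777 (6) | 0.118 (7) | 0.317 (5) | 0.100 (6) | 0.000 (2) | 0.191 (4) | 0.648 (5) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.500 (1) | 0.213 (7) | 0.825 (2) | 0.021 (7) | 0.333 (2) | 0.648 (7) | 0.098 (6) | 0.000 (2) | 0.000 (5) | 0.000 (3) | 0.077 (5) | 0.000 (1) | 0.000 (7) | 0.150 (7) | 0.000 (3) | 0.000 (5) | 0.000 (7) | 0.225 (4) | 0.281 (6) | 0.447 (6) | 0.000 (7) | 0.090 (6) | 0.148 (6) | 0.000 (6) | 0.479 (6) | 0.542 (1) | 0.000 (4) | 0.000 (1) | 0.200 (4) | 0.131 (7) | 0.000 (1) | 0.250 (4) | 0.000 (6) | 0.000 (3) | 0.159 (7) | 0.396 (6) | 0.677 (4) | 0.021 (6) | 0.000 (6) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.442 (4) | 0.125 (7) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.000 (6) | 0.333 (1) | 0.000 (4) | 0.528 (2) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.200 (7) | 0.000 (1) | 0.516 (5) | 0.000 (1) | 0.000 (4) | 0.500 (5) | 0.000 (1) | 0.833 (3) | 0.000 (1) | 0.000 (1) | 0.286 (6) | 0.083 (6) | 0.750 (4) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.059 (7) | 0.445 (4) | 0.200 (4) | 0.535 (6) | 0.070 (4) | 0.167 (6) | 0.385 (6) | 0.375 (5) | 0.000 (3) | 0.000 (1) | 0.333 (5) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.200 (3) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)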
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
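CSC-Pretrain Inst. | permissive | 0.275 (7) | 0.466 (7) | 0.218 (6) | 0.110 (7) | 0.783 (6) | 0.383 (7) | 0.783 (6) | 0.829 (6) | 0.367 (6) | 0.168 (7) | 0.305 (7) | 0.000 (5) | 0.661 (7) | 0.413 (7) | 0.869 (4) | 0.719 (2) | 0.546 (7) | 0.997 (3) | 0.685 (5) | 0.841 (6) | 0.555 (7) | 0.277 (6) | 0.768 (2) | 0.132 (5) | 0.779 (6) | 0.448 (5) | 0.364 (7) | 0.212 (7) | 0.161 (6) | 0.768 (4) | 0.692 (7) | 0.000 (6) | 0.395 (5) | 0.500 (4) | 0.000 (3) | 0.450 (7) | 0.591 (5) | 1.000 (1) | 0.020 (7) | 0.000 (5) | 0.423 (7) | 0.007 (7) | 0.625 (4) | 0.420 (5) | 0.505 (5) | 1.000 (1) | 0.353 (4) | 0.119 (7) | 0.571 (6) | 0.819 (4) | 0.014 (5) | 1.000 (1) | 0.774 (4) | 0.689 (6) | 0.311 (7) | 0.866 (3) | 0.067 (5) | 0.000 (4) | 0.400 (4) | 0.000 (7) | 0.278 (7) | 0.501 (5) | 1.000 (1) | 0.000 (3) | 0.162 (7) | 0.584 (7) | 0.286 (5) | 0.206 (7) | 0.125 (4) | 0.000 (2) | 0.084 (6) | 0.649 (4) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.000 (4) | 0.125 (6) | 0.312 (6) | 0.727 (5) | 0.221 (3) | 0.000 (5) | 0.667 (6) | 0.114 (5) | 0.000 (2) | 0.000 (5) | 0.000 (3) | 0.065 (7) | 0.000 (1) | 0.004 (6) | 0.278 (5) | 0.000 (3) | 0.000 (5) | 0.500 (2) | 0.000 (6) | 0.571 (2) | 0.000 (7) | 0.250 (6) | 0.019 (7) | 0.145 (7) | 0.000 (6) | 0.667 (3) | 0.200 (6) | 0.000 (4) | 0.000 (1) | 0.200 (4) | 0.258 (6) | 0.000 (1) | 0.000 (6) | 0.000 (6) | 0.000 (3) | 0.369 (6) | 0.429 (5) | 0.613 (6) | 0.000 (7) | 0.000 (6) | 0.500 (1) | 0.000 (2) | 0.500 (6) | 0.333 (6) | 0.500 (6) | 0.000 (1) | 0.106 (2) | 0.000 (4) | 0.000 (6) | 0.000 (4) | 0.000 (4) | 0.333 (4) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.918 (1) | 0.000 (1) | 0.638 (2) | 0.000 (1) | 0.000 (4) | 0.750 (1) | 0.000 (1) | 0.833 (3) | 0.000 (1) | 0.000 (1) | 0.143 (7) | 0.000 (7) | 0.750 (4) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.063 (6) | 0.377 (6) | 0.200 (4) | 0.222 (7) | 0.055 (6) | 0.500 (2) | 0.677 (3) | 0.250 (6) | 0.000 (3) | 0.000 (1) | 0.500 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.115 (7) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)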
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
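LGround Inst. | permissive | 0.314 (5) | 0.529 (5) | 0.225 (5) | 0.155 (5) | 0.810 (4) | 0.625 (5) | 0.798 (5) | 0.940 (2) | 0.372 (5) | 0.217 (5) | 0.484 (5) | 0.000 (5) | 0.927 (5) | 0.528 (4) | 0.826 (7) | 0.694 (3) | 0.605 (5) | 1.000 (1) | 0.731 (2) | 0.846 (5) | 0.716 (5) | 0.350 (4) | 0.589 (7) | 0.123 (6) | 0.857 (4) | 0.457 (4) | 0.578 (5) | 0.376 (6) | 0.183 (4) | 0.765 (5) | 0.800 (5) | 0.000 (6) | 0.278 (6) | 0.500 (4) | 0.000 (3) | 0.659 (4) | 0.569 (6) | 1.000 (1) | 0.093 (5) | 0.000 (5) | 0.539 (5) | 0.010 (5) | 0.578 (7) | 0.378 (6) | 0.571 (4) | 1.000 (1) | 0.337 (5) | 0.252 (2) | 0.530 (7) | 0.814 (5) | 0.000 (6) | 0.744 (6) | 0.743 (5) | 0.746 (5) | 0.346 (5) | 0.863 (4) | 0.067 (5) | 0.000 (4) | 0.400 (4) | 0.167 (4) | 0.667 (4) | 0.488 (6) | 1.000 (1) | 0.000 (3) | 0.208 (6) | 0.783 (5) | 0.166 (6) | 0.375 (3) | 0.071 (7) | 0.000 (2) | 0.200 (3) | 0.607 (6) | 0.000 (5) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 1.000 (1) | 0.500 (1) | 0.517 (3) | 0.716 (6) | 0.221 (3) | 0.000 (5) | 0.706 (5) | 0.085 (7) | 0.000 (2) | 0.000 (5) | 0.000 (3) | 0.077 (6) | 0.000 (1) | 0.063 (5) | 0.278 (5) | 0.000 (3) | 0.000 (5) | 0.500 (2) | 0.083 (5) | 0.181 (7) | 0.515 (4) | 0.286 (5) | 0.144 (3) | 0.219 (4) | 0.042 (2) | 0.582 (5) | 0.400 (4) | 0.000 (4) | 0.000 (1) | 0.000 (7) | 0.305 (4) | 0.000 (1) | 0.000 (6) | 0.036 (5) | 0.000 (3) | 0.413 (5) | 0.500 (4) | 0.533 (7) | 0.250 (4) | 0.200 (4) | 0.500 (1) | 0.000 (2) | 1.000 (1) | 0.472 (2) | 1.000 (1) | 0.000 (1) | 0.000 (5) | 0.000 (4) | 0.250 (2) | 0.000 (4) | 0.000 (4) | 0.333 (4) | 0.000 (4) | 0.000 (5) | 0.000 (5) | 0.000 (3) | 0.000 (2) | 0.600 (5) | 0.000 (1) | 0.594 (3) | 0.000 (1) | 0.000 (4) | 0.500 (5) | 0.000 (1) | 0.647 (7) | 0.000 (1) | 0.000 (1) | 0.429 (5) | 0.333 (3) | 0.500 (7) | 0.000 (2) | 0.000 (5) | 0.000 (1) | 0.000 (4) | 0.000 (1) | 0.069 (5) | 0.696 (2) | 0.050 (7) | 0.556 (5) | 0.031 (7) | 0.042 (7) | 0.750 (2) | 0.250 (6) | 0.000 (3) | 0.000 (1) | 0.630 (1) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (4) | 0.000 (3) | 0.000 (3) | 0.000 (2) | 0.400 (2) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.000 (1)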
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
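The rank printed next to each score can be recomputed from the scores in its column. The table appears to use standard competition ranking, where tied scores share the best rank and the following rank is skipped. For example, in one column the seven methods score 0.944, 0.930, 0.944, 1.000, 0.997, 0.997, and 1.000 and are ranked 5, 7, 5, 1, 3, 3, and 1.

Below is a minimal Python sketch under that assumption. The helper name `competition_ranks` is hypothetical and not part of any benchmark tooling.

```python
def competition_ranks(scores: list[float]) -> list[int]:
    """Rank scores from best (highest) to worst.

    Assumes standard competition ranking ("1224"): tied scores share the
    best rank, and the following rank is skipped.
    """
    # A score's rank is 1 plus the number of strictly higher scores.
    return [1 + sum(other > s for other in scores) for s in scores]


# Average AP 25% per method, copied from the table above.
avg_ap25 = {
    "ODIN - Ins200": 0.451,
    "Mask3D Scannet200": 0.445,
    "DINO3D-Scannet200": 0.511,
    "TD3D Scannet200": 0.379,
    "Minkowski 34D Inst.": 0.280,
    "CSC-Pretrain Inst.": 0.275,
    "LGround Inst.": 0.314,
}

names = list(avg_ap25)
ranks = competition_ranks([avg_ap25[n] for n in names])
for rank, name in sorted(zip(ranks, names)):
    print(rank, name)
```

The printed order, DINO3D-Scannet200 first and CSC-Pretrain Inst. last, matches the avg AP 25% ranks shown in the table. Counting strictly higher scores is quadratic in the number of methods, which is negligible for a leaderboard of this size.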