The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
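The metric described above can be illustrated with a minimal sketch for a single class. This is not the official evaluation script: the function names, the strict `>` overlap test, the greedy best-overlap matching, the precision-envelope integration, and the inclusion of the upper bound 0.95 in the threshold range are all assumptions made for illustration. It takes prediction confidences and a precomputed IoU matrix between predictions and ground truth instances.

```python
import numpy as np

def average_precision(pred_scores, iou_matrix, iou_threshold):
    """AP for one class at one overlap threshold (illustrative, not the official script).

    pred_scores: shape (P,) confidence of each predicted instance.
    iou_matrix:  shape (P, G) IoU between each prediction and each ground truth instance.
    """
    pred_scores = np.asarray(pred_scores, dtype=float)
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    num_gt = iou_matrix.shape[1]
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth

    # Visit predictions from most to least confident.
    order = np.argsort(-pred_scores, kind="stable")
    gt_taken = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(len(order))
    for rank, p in enumerate(order):
        g = int(np.argmax(iou_matrix[p]))  # best-overlapping ground truth instance
        if iou_matrix[p, g] > iou_threshold and not gt_taken[g]:
            gt_taken[g] = True
            tp[rank] = 1.0
        # Otherwise the prediction is a false positive. This includes a second
        # prediction of an already matched ground truth instance.

    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(order) + 1)
    recall = cum_tp / num_gt
    # Make precision non-increasing in recall, then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(recall_steps * precision))

def mean_ap_over_thresholds(pred_scores, iou_matrix, thresholds):
    """Average the per-threshold AP values, e.g. over the 0.5 to 0.95 range."""
    return float(np.mean([average_precision(pred_scores, iou_matrix, t) for t in thresholds]))

# Assumed reading of [0.5:0.95:0.05] with the endpoint 0.95 included.
AP_THRESHOLDS = np.linspace(0.5, 0.95, 10)

# Two ground truth instances; predictions 0 and 1 both cover instance 0.
scores = [0.9, 0.8, 0.7]
ious = [[0.8, 0.0],
        [0.7, 0.0],
        [0.0, 0.6]]
ap_50 = average_precision(scores, ious, 0.5)
```

In this toy example the duplicate prediction of the first instance lowers AP 50% from 1.0 to about 0.833, which is the penalty the note above refers to. The benchmark's per-class scores would then be averaged across categories to obtain the reported means.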



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. In each row, every score is immediately followed by the method's rank in that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
Volt-SPFormerpermissive0.527 10.731 10.475 20.342 10.826 50.803 20.942 10.950 20.594 30.321 30.867 40.008 40.994 40.767 20.926 40.874 10.815 21.000 10.810 10.973 20.856 10.510 30.825 10.346 30.923 20.799 20.843 20.812 20.262 30.923 10.921 30.279 20.901 10.500 50.000 40.801 30.937 11.000 10.329 10.000 60.903 20.076 40.789 20.565 30.907 21.000 10.614 50.413 21.000 10.937 10.214 20.629 90.878 30.725 60.579 30.880 30.433 30.020 50.400 50.547 11.000 10.843 21.000 10.125 10.343 20.855 30.750 10.449 41.000 10.057 10.700 10.802 40.500 20.850 20.011 20.047 51.000 11.000 10.715 10.875 10.255 20.099 50.857 20.738 30.000 20.056 60.025 20.372 20.250 10.279 20.667 30.002 30.000 60.250 60.500 31.000 10.391 80.737 60.309 30.397 10.000 70.817 20.542 10.557 21.000 10.400 20.681 10.000 20.500 20.519 40.500 10.773 20.818 20.884 20.656 30.510 30.500 10.000 21.000 10.472 31.000 10.000 10.027 60.000 50.331 30.000 41.000 11.000 10.000 40.500 10.304 20.000 30.000 31.000 10.000 10.714 10.000 10.677 20.750 10.000 10.944 20.000 10.000 11.000 10.764 10.833 50.250 20.278 20.000 11.000 10.000 10.103 40.753 10.600 10.508 80.638 10.167 70.458 60.741 10.019 20.000 10.850 20.000 10.000 21.000 11.000 10.000 30.028 40.000 20.200 40.000 30.250 11.000 10.000 1
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
DINO3D-Scannet200copyleft0.511 20.685 20.484 10.331 20.892 20.821 10.890 20.907 40.629 10.468 10.905 10.001 61.000 10.816 10.968 20.863 20.811 30.944 70.596 80.960 40.778 20.532 20.719 80.481 10.851 70.803 10.873 10.850 10.421 10.806 50.856 60.111 50.761 20.677 10.000 40.944 10.861 31.000 10.220 20.708 30.856 40.220 10.864 10.579 11.000 10.764 90.655 30.327 31.000 10.911 20.244 10.667 80.923 10.857 10.702 10.889 20.496 20.048 20.355 90.494 20.794 30.798 31.000 10.042 30.264 60.817 50.683 20.675 10.167 30.000 40.700 10.824 30.417 50.000 40.000 30.764 10.000 60.500 20.699 20.789 50.079 70.472 10.845 30.930 10.000 20.667 10.000 50.412 10.000 20.163 41.000 10.000 40.419 10.500 21.000 10.777 20.576 30.867 30.378 20.334 30.028 30.764 30.542 10.559 10.000 30.800 10.528 30.000 20.346 40.714 10.125 40.756 30.754 40.866 30.750 10.600 20.500 10.500 11.000 10.667 11.000 10.000 10.298 10.000 50.250 40.194 20.000 60.850 20.000 40.250 50.595 10.000 30.063 10.860 30.000 10.714 10.000 10.944 10.750 10.000 10.974 10.000 10.000 10.857 30.655 20.719 80.250 20.014 50.000 11.000 10.000 10.142 30.744 20.200 60.746 30.436 30.221 60.798 10.500 60.011 30.000 10.385 60.000 10.000 20.000 40.792 50.663 10.000 50.000 20.200 40.000 30.000 31.000 10.000 1
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
ODIN - Ins200permissive0.451 40.637 50.407 30.277 40.742 90.699 60.855 40.826 90.626 20.441 20.742 60.003 50.941 60.637 40.910 50.616 80.679 60.944 70.695 50.877 60.763 30.357 50.723 70.475 20.779 80.494 40.782 50.795 30.334 20.824 30.867 50.108 60.701 40.638 20.000 40.873 20.749 50.667 90.203 30.500 40.886 30.116 20.583 80.571 20.688 41.000 10.760 10.162 61.000 10.852 40.078 60.833 50.887 20.778 20.577 40.859 70.550 10.000 60.542 40.028 80.667 60.874 11.000 10.125 10.232 70.870 20.406 50.337 60.167 30.000 40.671 30.742 50.500 20.000 40.000 30.528 21.000 10.417 70.597 30.872 20.275 10.000 70.800 50.850 20.000 20.528 20.000 50.215 60.000 20.238 30.667 30.000 40.019 50.250 61.000 10.429 60.599 20.778 40.221 40.370 20.284 10.278 90.400 60.125 30.000 30.200 60.404 50.000 20.250 60.714 10.500 10.504 60.769 30.677 60.750 10.963 10.500 10.000 20.500 80.333 81.000 10.000 10.000 70.438 10.500 10.000 41.000 10.333 60.226 20.250 50.250 40.000 30.000 30.668 60.000 10.494 80.000 10.000 60.750 10.000 10.833 50.000 10.000 10.777 60.333 40.944 30.000 40.333 10.000 11.000 10.000 10.089 60.407 70.600 10.823 20.080 50.264 50.469 50.717 20.000 50.000 10.500 40.000 10.000 20.000 41.000 10.125 20.333 10.000 20.200 40.000 30.000 31.000 10.000 1
TD3D Scannet200permissive0.379 60.603 60.306 60.190 60.885 30.755 30.800 60.958 10.390 50.260 60.866 50.232 20.979 50.523 70.869 70.559 90.689 51.000 10.795 20.905 50.748 50.173 90.825 20.173 60.970 10.457 50.615 60.456 60.200 50.621 80.906 40.553 10.517 50.510 40.220 20.715 40.706 61.000 10.113 60.792 10.717 60.073 50.635 50.557 40.638 51.000 10.205 90.146 71.000 10.769 90.186 51.000 10.710 90.778 20.415 50.834 80.226 60.021 40.590 30.356 40.817 20.477 91.000 10.000 40.635 10.843 40.427 40.270 80.125 50.000 40.102 71.000 10.125 60.000 40.000 30.000 60.000 60.125 80.370 70.622 90.221 30.196 30.836 40.288 60.000 20.093 40.020 30.294 40.000 20.075 60.667 30.038 10.111 30.250 60.000 80.526 40.495 60.908 10.111 70.259 40.003 60.667 50.045 90.000 60.000 30.400 20.274 70.000 20.274 50.226 60.000 50.520 50.302 90.731 50.103 70.458 40.500 10.000 21.000 10.472 30.792 70.000 10.088 40.061 40.250 40.009 30.250 40.333 60.181 30.396 40.051 60.012 10.000 30.458 80.000 10.424 90.000 10.101 50.390 90.000 10.833 50.000 10.000 10.857 30.222 61.000 10.000 40.003 60.000 10.000 60.000 10.102 50.275 90.400 40.735 40.061 70.433 30.533 40.625 30.000 50.000 10.259 80.000 10.000 20.000 40.500 60.000 30.000 51.000 10.600 10.000 30.250 10.000 60.000 1
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Mask3D Scannet2000.445 50.653 40.392 50.254 50.844 40.746 40.818 50.888 60.556 40.262 50.890 20.025 31.000 10.608 50.930 30.694 60.721 40.930 90.686 60.966 30.615 80.440 40.725 60.201 50.890 50.414 80.827 30.552 50.158 90.806 40.924 20.042 70.512 60.412 90.226 10.604 60.830 41.000 10.125 40.792 10.815 50.097 30.648 40.551 50.354 81.000 10.630 40.241 51.000 10.853 30.204 40.974 40.841 40.778 20.358 60.927 10.300 40.045 30.640 20.363 30.745 50.710 41.000 10.000 40.330 30.943 10.315 60.600 21.000 10.027 20.080 90.556 90.500 20.409 30.000 30.194 41.000 10.500 20.493 50.761 60.053 80.042 60.780 60.454 50.009 10.333 30.050 10.321 30.000 20.084 50.552 60.008 20.027 40.750 10.500 30.442 50.657 10.765 50.120 60.183 70.021 41.000 10.510 50.016 50.000 30.400 20.619 20.000 20.396 30.290 50.000 50.741 40.699 51.000 10.260 50.017 70.125 90.000 20.792 70.399 71.000 10.000 10.049 50.265 30.063 70.000 41.000 10.335 50.381 10.500 10.250 40.004 20.000 30.727 50.000 10.538 60.000 10.188 40.677 50.000 10.930 30.000 10.000 10.966 20.391 30.908 40.000 40.028 40.000 11.000 10.000 10.152 20.451 40.458 30.971 10.573 20.606 10.167 90.625 30.004 40.000 10.058 90.000 10.000 21.000 11.000 10.000 30.056 30.000 20.200 40.309 10.000 31.000 10.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
Minkowski 34D Inst.permissive0.280 80.488 80.192 90.124 80.804 70.518 80.772 90.904 50.337 90.191 80.443 80.000 70.861 80.502 80.868 80.669 70.587 80.997 50.467 90.828 90.732 60.342 70.745 50.119 90.918 30.404 90.419 80.398 70.172 70.618 90.743 80.167 40.077 90.500 50.000 40.568 70.506 91.000 10.044 80.000 60.502 80.010 80.593 70.284 90.305 90.903 80.213 80.142 80.981 60.790 80.000 81.000 10.715 80.538 90.346 80.830 90.067 70.000 60.400 50.074 70.333 80.551 61.000 10.000 40.292 50.777 70.118 90.317 70.100 70.000 40.191 60.648 70.000 70.000 40.000 30.000 60.000 60.500 20.213 90.825 40.021 90.333 20.648 90.098 80.000 20.000 70.000 50.077 70.000 20.000 90.150 90.000 40.000 60.000 90.225 60.281 80.447 70.000 90.090 80.148 80.000 70.479 80.542 10.000 60.000 30.200 60.131 90.000 20.250 60.000 80.000 50.159 90.396 80.677 60.021 80.000 80.500 10.000 21.000 10.442 60.125 90.000 10.000 70.000 50.000 80.333 10.000 60.528 30.000 40.000 70.000 70.000 30.000 30.200 90.000 10.516 70.000 10.000 60.500 60.000 10.833 50.000 10.000 10.286 80.083 80.750 60.000 40.000 70.000 10.000 60.000 10.059 90.445 50.200 60.535 70.070 60.167 70.385 70.375 70.000 50.000 10.333 70.000 10.000 20.000 40.500 60.000 30.000 50.000 20.200 40.000 30.000 30.000 60.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CompetitorFormer-2000.469 30.676 30.401 40.296 30.901 10.729 50.885 30.829 70.380 60.320 40.873 30.400 10.998 30.711 30.980 10.847 30.854 11.000 10.696 40.989 10.759 40.556 10.806 30.240 40.918 30.650 30.818 40.629 40.224 40.839 20.933 10.247 30.711 30.540 30.021 30.543 80.900 20.903 80.118 50.125 50.916 10.057 60.692 30.410 70.747 31.000 10.664 20.424 10.933 70.839 50.207 30.703 70.748 60.700 70.610 20.869 40.270 50.068 10.878 10.244 50.794 30.698 51.000 10.000 40.325 40.770 80.482 30.452 30.025 90.015 30.293 40.829 20.663 11.000 10.013 10.385 30.250 50.500 20.491 60.850 30.214 60.131 40.878 10.617 40.000 20.085 50.009 40.278 50.000 20.295 11.000 10.000 40.160 20.500 20.500 30.342 70.534 40.901 20.474 10.222 50.011 50.724 40.542 10.125 30.083 20.336 50.500 40.083 10.565 10.587 30.500 10.827 10.829 10.750 40.508 40.018 60.500 10.000 21.000 10.667 11.000 10.000 10.173 20.286 20.500 10.000 40.125 50.489 40.000 40.500 10.269 30.000 30.050 20.834 40.000 10.581 50.000 10.677 20.467 80.000 10.886 40.000 10.000 10.820 50.144 71.000 11.000 10.103 30.000 11.000 10.000 10.175 10.410 60.330 50.701 50.257 40.292 40.285 80.574 50.157 10.000 10.863 10.000 10.056 10.250 31.000 10.000 30.109 20.000 20.400 20.025 20.000 31.000 10.000 1
CSC-Pretrain Inst.permissive0.275 90.466 90.218 80.110 90.783 80.383 90.783 80.829 80.367 80.168 90.305 90.000 70.661 90.413 90.869 60.719 40.546 90.997 50.685 70.841 80.555 90.277 80.768 40.132 70.779 80.448 70.364 90.212 90.161 80.768 60.692 90.000 80.395 70.500 50.000 40.450 90.591 71.000 10.020 90.000 60.423 90.007 90.625 60.420 60.505 71.000 10.353 60.119 90.571 80.819 60.014 71.000 10.774 50.689 80.311 90.866 50.067 70.000 60.400 50.000 90.278 90.501 71.000 10.000 40.162 90.584 90.286 70.206 90.125 50.000 40.084 80.649 60.000 70.000 40.000 30.000 60.000 60.125 80.312 80.727 70.221 40.000 70.667 80.114 70.000 20.000 70.000 50.065 90.000 20.004 80.278 70.000 40.000 60.500 20.000 80.571 30.000 90.250 80.019 90.145 90.000 70.667 50.200 80.000 60.000 30.200 60.258 80.000 20.000 80.000 80.000 50.369 80.429 70.613 80.000 90.000 80.500 10.000 20.500 80.333 80.500 80.000 10.106 30.000 50.000 80.000 40.000 60.333 60.000 40.000 70.000 70.000 30.000 30.918 20.000 10.638 30.000 10.000 60.750 10.000 10.833 50.000 10.000 10.143 90.000 90.750 60.000 40.000 70.000 10.000 60.000 10.063 80.377 80.200 60.222 90.055 80.500 20.677 30.250 80.000 50.000 10.500 40.000 10.000 20.000 40.500 60.000 30.000 50.000 20.115 90.000 30.000 30.000 60.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.permissive0.314 70.529 70.225 70.155 70.810 60.625 70.798 70.940 30.372 70.217 70.484 70.000 70.927 70.528 60.826 90.694 50.605 71.000 10.731 30.846 70.716 70.350 60.589 90.123 80.857 60.457 60.578 70.376 80.183 60.765 70.800 70.000 80.278 80.500 50.000 40.659 50.569 81.000 10.093 70.000 60.539 70.010 70.578 90.378 80.571 61.000 10.337 70.252 40.530 90.814 70.000 80.744 60.743 70.746 50.346 70.863 60.067 70.000 60.400 50.167 60.667 60.488 81.000 10.000 40.208 80.783 60.166 80.375 50.071 80.000 40.200 50.607 80.000 70.000 40.000 30.000 61.000 10.500 20.517 40.716 80.221 40.000 70.706 70.085 90.000 20.000 70.000 50.077 80.000 20.063 70.278 70.000 40.000 60.500 20.083 70.181 90.515 50.286 70.144 50.219 60.042 20.582 70.400 60.000 60.000 30.000 90.305 60.000 20.000 80.036 70.000 50.413 70.500 60.533 90.250 60.200 50.500 10.000 21.000 10.472 31.000 10.000 10.000 70.000 50.250 40.000 40.000 60.333 60.000 40.000 70.000 70.000 30.000 30.600 70.000 10.594 40.000 10.000 60.500 60.000 10.647 90.000 10.000 10.429 70.333 40.500 90.000 40.000 70.000 10.000 60.000 10.069 70.696 30.050 90.556 60.031 90.042 90.750 20.250 80.000 50.000 10.630 30.000 10.000 20.000 40.500 60.000 30.000 50.000 20.400 20.000 30.000 30.000 60.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.