The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision at an overlap (IoU) of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP), for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
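The metric described above can be illustrated with a minimal sketch. It is not the official evaluation script: the helper names (`iou`, `average_precision`, `mask`), the boolean per-vertex mask representation, and the non-interpolated AP formula are all assumptions made for illustration. It shows how a second prediction of an already matched ground truth instance is counted as a false positive and lowers the score.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean per-vertex masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def average_precision(pred_masks, scores, gt_masks, threshold):
    """AP for one class at one overlap threshold (illustrative helper)."""
    if len(gt_masks) == 0:
        return float("nan")  # class absent from the ground truth
    order = np.argsort(-np.asarray(scores, dtype=float))  # most confident first
    matched = set()
    tp = []
    for i in order:
        overlaps = [iou(pred_masks[i], g) for g in gt_masks]
        j = int(np.argmax(overlaps))  # best-overlapping ground truth instance
        if overlaps[j] >= threshold and j not in matched:
            matched.add(j)
            tp.append(1)
        else:
            tp.append(0)  # low overlap or duplicate match: false positive
    tp = np.array(tp, dtype=float)
    precision = np.cumsum(tp) / np.arange(1, len(tp) + 1)
    # Non-interpolated AP: precision summed at each true positive,
    # normalized by the number of ground truth instances.
    return float((precision * tp).sum() / len(gt_masks))

def mask(indices, n=10):
    """Boolean mask over n vertices (toy scene, assumption)."""
    m = np.zeros(n, dtype=bool)
    m[list(indices)] = True
    return m

gt = [mask(range(0, 4)), mask(range(5, 9))]
preds = [mask(range(0, 4)), mask(range(0, 3)), mask(range(5, 8))]
scores = [0.9, 0.8, 0.7]  # the second prediction duplicates the first instance

ap50 = average_precision(preds, scores, gt, threshold=0.5)
ap50_no_dup = average_precision([preds[0], preds[2]], [0.9, 0.7], gt, threshold=0.5)
print(ap50, ap50_no_dup)
```

In this toy scene, dropping the duplicate prediction raises the score, which is the penalty the note above describes. The mean average precision would then be the mean of such per-class values, and the AP column would additionally average over the listed range of overlap thresholds.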



The table below lists the benchmark results for the ScanNet200 3D semantic instance scenario at AP 25%. Each score is followed by the method's rank in that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
Mask3D Scannet2000.445 50.653 40.392 50.254 50.648 40.097 30.125 90.000 20.000 40.000 10.657 10.971 10.451 41.000 11.000 10.640 20.500 30.045 31.000 10.241 50.409 30.363 30.440 40.686 60.300 40.000 40.201 50.000 10.009 10.290 50.556 41.000 10.000 10.063 70.000 30.830 40.573 20.844 40.333 30.204 40.058 90.158 90.552 60.056 30.000 11.000 10.725 60.750 10.927 11.000 10.888 60.042 70.120 60.615 80.226 10.250 40.890 20.792 10.677 50.510 50.818 50.699 50.512 60.167 90.125 40.315 60.943 10.309 10.017 70.200 40.000 20.188 40.000 50.183 70.815 51.000 10.827 30.741 40.442 50.414 80.600 20.000 20.000 10.458 30.049 50.321 30.381 10.000 30.908 40.400 20.841 40.260 50.710 40.966 30.265 30.000 10.924 20.152 20.025 30.500 20.027 20.028 41.000 10.556 90.016 50.080 90.500 20.694 60.608 50.084 50.604 60.194 40.538 60.000 10.500 10.000 30.354 80.000 31.000 10.000 30.761 60.930 30.053 80.890 51.000 10.008 20.262 50.358 61.000 11.000 10.792 70.966 21.000 10.765 50.004 20.930 30.780 60.330 30.027 40.625 30.974 40.050 10.412 90.021 40.000 40.000 20.778 20.000 20.000 10.493 50.746 40.454 50.335 50.396 30.930 90.551 51.000 10.552 50.606 10.853 30.000 10.004 40.806 41.000 10.727 50.000 10.042 60.745 50.000 10.399 70.391 30.630 40.721 40.619 2
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
ODIN - Ins200permissive0.451 40.637 50.407 30.277 40.583 80.116 20.500 10.000 20.125 10.000 10.599 20.823 20.407 70.667 90.941 60.542 41.000 10.000 61.000 10.162 60.000 40.028 80.357 50.695 50.550 10.000 40.475 20.000 10.000 20.714 10.626 21.000 10.000 10.500 10.125 20.749 50.080 50.742 90.528 20.078 60.500 40.334 20.667 30.333 10.000 10.278 90.723 70.250 60.859 71.000 10.826 90.108 60.221 40.763 30.000 40.250 40.742 60.500 40.750 10.400 60.855 40.769 30.701 40.469 50.203 30.406 50.870 20.000 30.963 10.200 40.000 20.000 60.500 10.370 20.886 31.000 10.782 50.504 60.429 60.494 40.337 60.000 20.000 10.600 10.000 70.215 60.226 20.000 30.944 30.200 60.887 20.750 10.874 10.877 60.438 10.000 10.867 50.089 60.003 50.500 20.000 40.333 11.000 10.742 50.125 30.671 30.417 70.616 80.637 40.238 30.873 20.528 20.494 80.000 10.250 50.000 30.688 40.000 31.000 10.000 30.872 20.833 50.275 10.779 81.000 10.000 40.441 20.577 40.167 31.000 10.500 80.777 60.000 40.778 40.000 30.910 50.800 50.232 70.019 50.717 20.833 50.000 50.638 20.284 10.000 40.000 20.778 20.000 20.000 10.597 30.699 60.850 20.333 60.250 60.944 70.571 20.677 60.795 30.264 50.852 40.000 10.000 50.824 31.000 10.668 60.000 10.000 70.667 60.000 10.333 80.333 40.760 10.679 60.404 5
Volt-SPFormerpermissive0.527 10.731 10.475 20.342 10.789 20.076 40.500 10.000 20.125 10.000 10.391 80.508 80.753 11.000 10.994 40.400 50.500 30.020 51.000 10.413 20.850 20.547 10.510 30.810 10.433 30.250 20.346 30.000 10.000 20.519 40.594 31.000 10.000 10.331 30.000 30.937 10.638 10.826 50.056 60.214 20.850 20.262 30.667 30.028 40.000 10.817 20.825 10.250 60.880 31.000 10.950 20.279 20.309 30.856 10.000 40.304 20.867 40.000 60.750 10.542 10.942 10.818 20.901 10.458 60.329 10.750 10.855 30.000 30.510 30.200 40.000 20.677 20.500 10.397 10.903 21.000 10.843 20.773 21.000 10.799 20.449 40.250 10.000 10.600 10.027 60.372 20.000 41.000 10.833 50.400 20.878 30.656 30.843 20.973 20.000 50.000 10.921 30.103 40.008 40.500 20.057 10.278 21.000 10.802 40.557 20.700 11.000 10.874 10.767 20.279 20.801 30.047 50.714 10.000 10.500 10.250 10.907 20.000 31.000 10.011 20.875 10.944 20.255 20.923 21.000 10.002 30.321 30.579 31.000 11.000 11.000 11.000 11.000 10.737 60.000 30.926 40.857 20.343 20.000 60.741 10.629 90.025 20.500 50.000 70.000 40.000 20.725 60.000 20.000 10.715 10.803 20.738 31.000 10.500 21.000 10.565 30.884 20.812 20.167 70.937 10.000 10.019 20.923 11.000 11.000 10.000 10.099 51.000 10.000 10.472 30.764 10.614 50.815 20.681 1
Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe: Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding.
TD3D Scannet200permissive0.379 60.603 60.306 60.190 60.635 50.073 50.500 10.000 20.000 40.000 10.495 60.735 40.275 91.000 10.979 50.590 30.000 80.021 40.000 60.146 70.000 40.356 40.173 90.795 20.226 60.000 40.173 60.000 10.000 20.226 60.390 50.000 60.000 10.250 40.000 30.706 60.061 70.885 30.093 40.186 50.259 80.200 50.667 30.000 50.000 10.667 50.825 20.250 60.834 81.000 10.958 10.553 10.111 70.748 50.220 20.051 60.866 50.792 10.390 90.045 90.800 60.302 90.517 50.533 40.113 60.427 40.843 40.000 30.458 40.600 10.000 20.101 50.000 50.259 40.717 60.500 60.615 60.520 50.526 40.457 50.270 80.000 20.000 10.400 40.088 40.294 40.181 30.000 31.000 10.400 20.710 90.103 70.477 90.905 50.061 40.000 10.906 40.102 50.232 20.125 60.000 40.003 60.792 71.000 10.000 60.102 70.125 80.559 90.523 70.075 60.715 40.000 60.424 90.000 10.396 40.250 10.638 50.000 30.000 60.000 30.622 90.833 50.221 30.970 10.250 40.038 10.260 60.415 50.125 51.000 11.000 10.857 30.000 40.908 10.012 10.869 70.836 40.635 10.111 30.625 31.000 10.020 30.510 40.003 60.009 31.000 10.778 20.000 20.000 10.370 70.755 30.288 60.333 60.274 51.000 10.557 40.731 50.456 60.433 30.769 90.000 10.000 50.621 81.000 10.458 80.000 10.196 30.817 20.000 10.472 30.222 60.205 90.689 50.274 7
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
DINO3D-Scannet200copyleft0.511 20.685 20.484 10.331 20.864 10.220 10.500 10.000 20.042 30.000 10.576 30.746 30.744 21.000 11.000 10.355 91.000 10.048 20.000 60.327 30.000 40.494 20.532 20.596 80.496 20.250 20.481 10.000 10.000 20.714 10.629 11.000 10.000 10.250 40.663 10.861 30.436 30.892 20.667 10.244 10.385 60.421 11.000 10.000 50.000 10.764 30.719 80.500 20.889 21.000 10.907 40.111 50.378 20.778 20.000 40.595 10.905 10.708 30.750 10.542 10.890 20.754 40.761 20.798 10.220 20.683 20.817 50.000 30.600 20.200 40.500 10.944 10.125 40.334 30.856 40.792 50.873 10.756 30.777 20.803 10.675 10.000 20.000 10.200 60.298 10.412 10.000 40.000 30.719 80.800 10.923 10.750 10.798 30.960 40.000 50.000 10.856 60.142 30.001 60.417 50.000 40.014 51.000 10.824 30.559 10.700 10.500 20.863 20.816 10.163 40.944 10.764 10.714 10.000 10.250 50.000 31.000 10.063 11.000 10.000 30.789 50.974 10.079 70.851 70.000 60.000 40.468 10.702 10.167 31.000 11.000 10.857 30.000 40.867 30.000 30.968 20.845 30.264 60.419 10.500 60.667 80.000 50.677 10.028 30.194 20.000 20.857 10.000 20.000 10.699 20.821 10.930 10.850 20.346 40.944 70.579 10.866 30.850 10.221 60.911 20.000 10.011 30.806 50.764 90.860 30.000 10.472 10.794 30.000 10.667 10.655 20.655 30.811 30.528 3
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
LGround Inst.permissive0.314 70.529 70.225 70.155 70.578 90.010 70.500 10.000 20.000 40.000 10.515 50.556 60.696 31.000 10.927 70.400 50.083 70.000 61.000 10.252 40.000 40.167 60.350 60.731 30.067 70.000 40.123 80.000 10.000 20.036 70.372 70.000 60.000 10.250 40.000 30.569 80.031 90.810 60.000 70.000 80.630 30.183 60.278 70.000 50.000 10.582 70.589 90.500 20.863 61.000 10.940 30.000 80.144 50.716 70.000 40.000 70.484 70.000 60.500 60.400 60.798 70.500 60.278 80.750 20.093 70.166 80.783 60.000 30.200 50.400 20.000 20.000 60.000 50.219 60.539 70.500 60.578 70.413 70.181 90.457 60.375 50.000 20.000 10.050 90.000 70.077 80.000 40.000 30.500 90.000 90.743 70.250 60.488 80.846 70.000 50.000 10.800 70.069 70.000 70.000 70.000 40.000 71.000 10.607 80.000 60.200 50.500 20.694 50.528 60.063 70.659 50.000 60.594 40.000 10.000 70.000 30.571 60.000 30.000 60.000 30.716 80.647 90.221 40.857 60.000 60.000 40.217 70.346 70.071 80.530 91.000 10.429 70.000 40.286 70.000 30.826 90.706 70.208 80.000 60.250 80.744 60.000 50.500 50.042 20.000 40.000 20.746 50.000 20.000 10.517 40.625 70.085 90.333 60.000 81.000 10.378 80.533 90.376 80.042 90.814 70.000 10.000 50.765 71.000 10.600 70.000 10.000 70.667 60.000 10.472 30.333 40.337 70.605 70.305 6
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
Minkowski 34D Inst.permissive0.280 80.488 80.192 90.124 80.593 70.010 80.500 10.000 20.000 40.000 10.447 70.535 70.445 51.000 10.861 80.400 50.225 60.000 60.000 60.142 80.000 40.074 70.342 70.467 90.067 70.000 40.119 90.000 10.000 20.000 80.337 90.000 60.000 10.000 80.000 30.506 90.070 60.804 70.000 70.000 80.333 70.172 70.150 90.000 50.000 10.479 80.745 50.000 90.830 91.000 10.904 50.167 40.090 80.732 60.000 40.000 70.443 80.000 60.500 60.542 10.772 90.396 80.077 90.385 70.044 80.118 90.777 70.000 30.000 80.200 40.000 20.000 60.000 50.148 80.502 80.500 60.419 80.159 90.281 80.404 90.317 70.000 20.000 10.200 60.000 70.077 70.000 40.000 30.750 60.200 60.715 80.021 80.551 60.828 90.000 50.000 10.743 80.059 90.000 70.000 70.000 40.000 70.125 90.648 70.000 60.191 60.500 20.669 70.502 80.000 90.568 70.000 60.516 70.000 10.000 70.000 30.305 90.000 30.000 60.000 30.825 40.833 50.021 90.918 30.000 60.000 40.191 80.346 80.100 70.981 61.000 10.286 80.000 40.000 90.000 30.868 80.648 90.292 50.000 60.375 71.000 10.000 50.500 50.000 70.333 10.000 20.538 90.000 20.000 10.213 90.518 80.098 80.528 30.250 60.997 50.284 90.677 60.398 70.167 70.790 80.000 10.000 50.618 90.903 80.200 90.000 10.333 20.333 80.000 10.442 60.083 80.213 80.587 80.131 9
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CompetitorFormer-2000.469 30.676 30.401 40.296 30.692 30.057 60.500 10.083 10.000 40.000 10.534 40.701 50.410 60.903 80.998 30.878 10.500 30.068 10.250 50.424 11.000 10.244 50.556 10.696 40.270 51.000 10.240 40.000 10.000 20.587 30.380 61.000 10.000 10.500 10.000 30.900 20.257 40.901 10.085 50.207 30.863 10.224 41.000 10.109 20.000 10.724 40.806 30.500 20.869 41.000 10.829 70.247 30.474 10.759 40.021 30.269 30.873 30.125 50.467 80.542 10.885 30.829 10.711 30.285 80.118 50.482 30.770 80.025 20.018 60.400 20.000 20.677 20.500 10.222 50.916 11.000 10.818 40.827 10.342 70.650 30.452 30.000 20.000 10.330 50.173 20.278 50.000 40.083 21.000 10.336 50.748 60.508 40.698 50.989 10.286 20.000 10.933 10.175 10.400 10.663 10.015 30.103 31.000 10.829 20.125 30.293 40.500 20.847 30.711 30.295 10.543 80.385 30.581 50.000 10.500 10.000 30.747 30.050 21.000 10.013 10.850 30.886 40.214 60.918 30.125 50.000 40.320 40.610 20.025 90.933 71.000 10.820 50.250 30.901 20.000 30.980 10.878 10.325 40.160 20.574 50.703 70.009 40.540 30.011 50.000 40.000 20.700 70.056 10.000 10.491 60.729 50.617 40.489 40.565 11.000 10.410 70.750 40.629 40.292 40.839 50.000 10.157 10.839 21.000 10.834 40.000 10.131 40.794 30.000 10.667 10.144 70.664 20.854 10.500 4
CSC-Pretrain Inst.permissive0.275 90.466 90.218 80.110 90.625 60.007 90.500 10.000 20.000 40.000 10.000 90.222 90.377 81.000 10.661 90.400 50.000 80.000 60.000 60.119 90.000 40.000 90.277 80.685 70.067 70.000 40.132 70.000 10.000 20.000 80.367 80.000 60.000 10.000 80.000 30.591 70.055 80.783 80.000 70.014 70.500 40.161 80.278 70.000 50.000 10.667 50.768 40.500 20.866 51.000 10.829 80.000 80.019 90.555 90.000 40.000 70.305 90.000 60.750 10.200 80.783 80.429 70.395 70.677 30.020 90.286 70.584 90.000 30.000 80.115 90.000 20.000 60.000 50.145 90.423 90.500 60.364 90.369 80.571 30.448 70.206 90.000 20.000 10.200 60.106 30.065 90.000 40.000 30.750 60.200 60.774 50.000 90.501 70.841 80.000 50.000 10.692 90.063 80.000 70.000 70.000 40.000 70.500 80.649 60.000 60.084 80.125 80.719 40.413 90.004 80.450 90.000 60.638 30.000 10.000 70.000 30.505 70.000 30.000 60.000 30.727 70.833 50.221 40.779 80.000 60.000 40.168 90.311 90.125 50.571 80.500 80.143 90.000 40.250 80.000 30.869 60.667 80.162 90.000 60.250 81.000 10.000 50.500 50.000 70.000 40.000 20.689 80.000 20.000 10.312 80.383 90.114 70.333 60.000 80.997 50.420 60.613 80.212 90.500 20.819 60.000 10.000 50.768 61.000 10.918 20.000 10.000 70.278 90.000 10.333 80.000 90.353 60.546 90.258 8
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021