The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by their per-class average precision. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
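The metric described above can be illustrated with a minimal sketch. It is not the official evaluation script. The function names, the greedy matching by descending confidence, the strict `>` comparison against the overlap threshold, and the reading of [0.5:0.95:0.05] as the nine thresholds 0.50, 0.55, ..., 0.90 are all assumptions made for illustration. The sketch computes AP for a single class and shows how a second prediction of an already matched ground truth instance counts as a false positive.

```python
import numpy as np

def average_precision(scores, ious, iou_threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    scores: shape (P,), confidence of each predicted instance.
    ious:   shape (P, G), IoU of each prediction with each ground truth instance.
    """
    scores = np.asarray(scores, dtype=float)
    ious = np.asarray(ious, dtype=float)
    num_pred, num_gt = ious.shape
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth
    if num_pred == 0:
        return 0.0

    order = np.argsort(-scores, kind="stable")  # most confident first
    gt_taken = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(num_pred)
    for rank, p in enumerate(order):
        g = int(np.argmax(ious[p]))
        # A ground truth instance can be matched only once; any further
        # prediction of the same instance stays a false positive.
        if ious[p, g] > iou_threshold and not gt_taken[g]:
            gt_taken[g] = True
            tp[rank] = 1.0

    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, num_pred + 1)
    recall = cum_tp / num_gt
    # Make precision non-increasing in recall, then integrate the curve.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum(np.diff(recall) * precision))

def ap_over_range(scores, ious, thresholds=None):
    """Mean AP over several overlap thresholds (assumed: 0.50 to 0.90, step 0.05)."""
    if thresholds is None:
        thresholds = np.arange(50, 95, 5) / 100.0
    return float(np.mean([average_precision(scores, ious, t) for t in thresholds]))

# Hypothetical example: two ground truth instances, three predictions.
# The second prediction duplicates the first ground truth instance.
example_scores = [0.9, 0.8, 0.7]
example_ious = [[0.83, 0.0], [0.70, 0.0], [0.0, 0.62]]
ap25 = average_precision(example_scores, example_ious, 0.25)
ap50 = average_precision(example_scores, example_ious, 0.50)
ap = ap_over_range(example_scores, example_ious)
```

In this hypothetical example the duplicate prediction lowers AP 50% from 1.0 to about 0.83, because it adds a false positive ahead of the second correct match.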



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives the AP 25% score followed by the method's rank in that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
TD3D Scannet200permissive0.379 40.603 40.306 40.190 40.635 30.073 40.500 10.000 10.000 30.000 10.495 50.735 40.275 71.000 10.979 30.590 20.000 60.021 30.000 40.146 50.000 20.356 30.173 70.795 10.226 40.000 20.173 40.000 10.000 20.226 40.390 40.000 40.000 10.250 20.000 30.706 40.061 50.885 20.093 40.186 30.259 60.200 30.667 20.000 30.000 10.667 30.825 10.250 50.834 61.000 10.958 10.553 10.111 50.748 30.220 20.051 40.866 30.792 10.390 70.045 70.800 40.302 70.517 30.533 40.113 40.427 20.843 30.000 20.458 30.600 10.000 20.101 30.000 30.259 30.717 40.500 40.615 40.520 30.526 30.457 30.270 60.000 10.000 10.400 30.088 30.294 30.181 30.000 11.000 10.400 20.710 70.103 50.477 70.905 30.061 30.000 10.906 20.102 30.232 10.125 40.000 20.003 40.792 51.000 10.000 40.102 50.125 60.559 70.523 50.075 40.715 30.000 40.424 70.000 10.396 20.250 10.638 30.000 20.000 40.000 10.622 70.833 30.221 20.970 10.250 30.038 10.260 40.415 30.125 41.000 11.000 10.857 20.000 20.908 10.012 10.869 50.836 20.635 10.111 20.625 21.000 10.020 20.510 30.003 50.009 31.000 10.778 20.000 10.000 10.370 50.755 20.288 40.333 40.274 31.000 10.557 30.731 30.456 40.433 30.769 70.000 10.000 30.621 61.000 10.458 60.000 10.196 30.817 10.000 10.472 20.222 50.205 70.689 30.274 5
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Mask3D Scannet2000.445 30.653 20.392 30.254 30.648 20.097 30.125 70.000 10.000 30.000 10.657 10.971 10.451 31.000 11.000 10.640 10.500 30.045 21.000 10.241 30.409 10.363 20.440 20.686 40.300 30.000 20.201 30.000 10.009 10.290 30.556 31.000 10.000 10.063 50.000 30.830 20.573 10.844 30.333 30.204 20.058 70.158 70.552 40.056 20.000 11.000 10.725 40.750 10.927 11.000 10.888 50.042 50.120 40.615 60.226 10.250 20.890 20.792 10.677 40.510 30.818 30.699 30.512 40.167 70.125 30.315 40.943 10.309 10.017 50.200 30.000 20.188 20.000 30.183 50.815 31.000 10.827 20.741 20.442 40.414 60.600 20.000 10.000 10.458 20.049 40.321 20.381 10.000 10.908 30.400 20.841 30.260 30.710 30.966 10.265 20.000 10.924 10.152 10.025 20.500 10.027 10.028 21.000 10.556 70.016 30.080 70.500 10.694 40.608 30.084 30.604 50.194 30.538 40.000 10.500 10.000 20.354 60.000 21.000 10.000 10.761 40.930 20.053 60.890 31.000 10.008 20.262 30.358 41.000 11.000 10.792 50.966 11.000 10.765 40.004 20.930 20.780 40.330 20.027 30.625 20.974 40.050 10.412 70.021 40.000 40.000 20.778 20.000 10.000 10.493 40.746 30.454 30.335 30.396 10.930 70.551 41.000 10.552 30.606 10.853 20.000 10.004 20.806 21.000 10.727 30.000 10.042 40.745 30.000 10.399 50.391 20.630 30.721 20.619 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
ODIN - Ins200permissive0.451 20.637 30.407 20.277 20.583 60.116 20.500 10.000 10.125 10.000 10.599 20.823 20.407 50.667 70.941 40.542 31.000 10.000 41.000 10.162 40.000 20.028 60.357 30.695 30.550 10.000 20.475 20.000 10.000 20.714 10.626 21.000 10.000 10.500 10.125 20.749 30.080 30.742 70.528 20.078 40.500 20.334 20.667 20.333 10.000 10.278 70.723 50.250 50.859 51.000 10.826 70.108 40.221 20.763 20.000 30.250 20.742 40.500 40.750 10.400 40.855 20.769 10.701 20.469 50.203 20.406 30.870 20.000 20.963 10.200 30.000 20.000 40.500 10.370 10.886 11.000 10.782 30.504 40.429 50.494 20.337 40.000 10.000 10.600 10.000 50.215 40.226 20.000 10.944 20.200 40.887 20.750 10.874 10.877 40.438 10.000 10.867 30.089 40.003 30.500 10.000 20.333 11.000 10.742 30.125 20.671 20.417 50.616 60.637 20.238 10.873 20.528 20.494 60.000 10.250 30.000 20.688 20.000 21.000 10.000 10.872 10.833 30.275 10.779 61.000 10.000 30.441 20.577 20.167 21.000 10.500 60.777 40.000 20.778 30.000 30.910 30.800 30.232 50.019 40.717 10.833 50.000 30.638 20.284 10.000 40.000 20.778 20.000 10.000 10.597 20.699 40.850 20.333 40.250 40.944 50.571 20.677 40.795 20.264 40.852 30.000 10.000 30.824 11.000 10.668 40.000 10.000 50.667 40.000 10.333 60.333 30.760 10.679 40.404 3
DINO3D-Scannet200copyleft0.511 10.685 10.484 10.331 10.864 10.220 10.500 10.000 10.042 20.000 10.576 30.746 30.744 11.000 11.000 10.355 71.000 10.048 10.000 40.327 10.000 20.494 10.532 10.596 60.496 20.250 10.481 10.000 10.000 20.714 10.629 11.000 10.000 10.250 20.663 10.861 10.436 20.892 10.667 10.244 10.385 40.421 11.000 10.000 30.000 10.764 20.719 60.500 20.889 21.000 10.907 30.111 30.378 10.778 10.000 30.595 10.905 10.708 30.750 10.542 10.890 10.754 20.761 10.798 10.220 10.683 10.817 40.000 20.600 20.200 30.500 10.944 10.125 20.334 20.856 20.792 30.873 10.756 10.777 10.803 10.675 10.000 10.000 10.200 40.298 10.412 10.000 40.000 10.719 60.800 10.923 10.750 10.798 20.960 20.000 40.000 10.856 40.142 20.001 40.417 30.000 20.014 31.000 10.824 20.559 10.700 10.500 10.863 10.816 10.163 20.944 10.764 10.714 10.000 10.250 30.000 21.000 10.063 11.000 10.000 10.789 30.974 10.079 50.851 50.000 40.000 30.468 10.702 10.167 21.000 11.000 10.857 20.000 20.867 20.000 30.968 10.845 10.264 40.419 10.500 40.667 70.000 30.677 10.028 30.194 20.000 20.857 10.000 10.000 10.699 10.821 10.930 10.850 10.346 20.944 50.579 10.866 20.850 10.221 50.911 10.000 10.011 10.806 30.764 70.860 20.000 10.472 10.794 20.000 10.667 10.655 10.655 20.811 10.528 2
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
Minkowski 34D Inst.permissive0.280 60.488 60.192 70.124 60.593 50.010 60.500 10.000 10.000 30.000 10.447 60.535 60.445 41.000 10.861 60.400 40.225 40.000 40.000 40.142 60.000 20.074 50.342 50.467 70.067 50.000 20.119 70.000 10.000 20.000 60.337 70.000 40.000 10.000 60.000 30.506 70.070 40.804 50.000 50.000 60.333 50.172 50.150 70.000 30.000 10.479 60.745 30.000 70.830 71.000 10.904 40.167 20.090 60.732 40.000 30.000 50.443 60.000 50.500 50.542 10.772 70.396 60.077 70.385 60.044 60.118 70.777 60.000 20.000 60.200 30.000 20.000 40.000 30.148 60.502 60.500 40.419 60.159 70.281 60.404 70.317 50.000 10.000 10.200 40.000 50.077 50.000 40.000 10.750 40.200 40.715 60.021 60.551 40.828 70.000 40.000 10.743 60.059 70.000 50.000 50.000 20.000 50.125 70.648 50.000 40.191 40.500 10.669 50.502 60.000 70.568 60.000 40.516 50.000 10.000 50.000 20.305 70.000 20.000 40.000 10.825 20.833 30.021 70.918 20.000 40.000 30.191 60.346 60.100 60.981 51.000 10.286 60.000 20.000 70.000 30.868 60.648 70.292 30.000 50.375 51.000 10.000 30.500 40.000 60.333 10.000 20.538 70.000 10.000 10.213 70.518 60.098 60.528 20.250 40.997 30.284 70.677 40.398 50.167 60.790 60.000 10.000 30.618 70.903 60.200 70.000 10.333 20.333 60.000 10.442 40.083 60.213 60.587 60.131 7
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst.permissive0.275 70.466 70.218 60.110 70.625 40.007 70.500 10.000 10.000 30.000 10.000 70.222 70.377 61.000 10.661 70.400 40.000 60.000 40.000 40.119 70.000 20.000 70.277 60.685 50.067 50.000 20.132 50.000 10.000 20.000 60.367 60.000 40.000 10.000 60.000 30.591 50.055 60.783 60.000 50.014 50.500 20.161 60.278 50.000 30.000 10.667 30.768 20.500 20.866 31.000 10.829 60.000 60.019 70.555 70.000 30.000 50.305 70.000 50.750 10.200 60.783 60.429 50.395 50.677 30.020 70.286 50.584 70.000 20.000 60.115 70.000 20.000 40.000 30.145 70.423 70.500 40.364 70.369 60.571 20.448 50.206 70.000 10.000 10.200 40.106 20.065 70.000 40.000 10.750 40.200 40.774 40.000 70.501 50.841 60.000 40.000 10.692 70.063 60.000 50.000 50.000 20.000 50.500 60.649 40.000 40.084 60.125 60.719 20.413 70.004 60.450 70.000 40.638 20.000 10.000 50.000 20.505 50.000 20.000 40.000 10.727 50.833 30.221 30.779 60.000 40.000 30.168 70.311 70.125 40.571 60.500 60.143 70.000 20.250 60.000 30.869 40.667 60.162 70.000 50.250 61.000 10.000 30.500 40.000 60.000 40.000 20.689 60.000 10.000 10.312 60.383 70.114 50.333 40.000 60.997 30.420 50.613 60.212 70.500 20.819 40.000 10.000 30.768 41.000 10.918 10.000 10.000 50.278 70.000 10.333 60.000 70.353 40.546 70.258 6
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.permissive0.314 50.529 50.225 50.155 50.578 70.010 50.500 10.000 10.000 30.000 10.515 40.556 50.696 21.000 10.927 50.400 40.083 50.000 41.000 10.252 20.000 20.167 40.350 40.731 20.067 50.000 20.123 60.000 10.000 20.036 50.372 50.000 40.000 10.250 20.000 30.569 60.031 70.810 40.000 50.000 60.630 10.183 40.278 50.000 30.000 10.582 50.589 70.500 20.863 41.000 10.940 20.000 60.144 30.716 50.000 30.000 50.484 50.000 50.500 50.400 40.798 50.500 40.278 60.750 20.093 50.166 60.783 50.000 20.200 40.400 20.000 20.000 40.000 30.219 40.539 50.500 40.578 50.413 50.181 70.457 40.375 30.000 10.000 10.050 70.000 50.077 60.000 40.000 10.500 70.000 70.743 50.250 40.488 60.846 50.000 40.000 10.800 50.069 50.000 50.000 50.000 20.000 51.000 10.607 60.000 40.200 30.500 10.694 30.528 40.063 50.659 40.000 40.594 30.000 10.000 50.000 20.571 40.000 20.000 40.000 10.716 60.647 70.221 30.857 40.000 40.000 30.217 50.346 50.071 70.530 71.000 10.429 50.000 20.286 50.000 30.826 70.706 50.208 60.000 50.250 60.744 60.000 30.500 40.042 20.000 40.000 20.746 50.000 10.000 10.517 30.625 50.085 70.333 40.000 61.000 10.378 60.533 70.376 60.042 70.814 50.000 10.000 30.765 51.000 10.600 50.000 10.000 50.667 40.000 10.472 20.333 30.337 50.605 50.305 4
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.