The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
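The metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's official evaluation script: the function names (`mask_iou`, `average_precision`, `mean_ap`), the representation of an instance mask as a set of vertex indices, the all-point area-under-curve integration, the convention of returning 0 for a class without ground truth, and the inclusion of 0.95 in the threshold range are all assumptions made for the example. It does show the key rule from the text: each ground-truth instance can be matched only once, so a second prediction of the same instance counts as a false positive.

```python
from typing import List, Optional, Sequence, Set, Tuple

Prediction = Tuple[float, Set[int]]  # (confidence, set of mesh vertex indices)


def mask_iou(a: Set[int], b: Set[int]) -> float:
    """Overlap (intersection over union) of two instance masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: Sequence[Prediction],
                      gts: Sequence[Set[int]],
                      overlap: float) -> float:
    """AP for one class at one overlap threshold (illustrative sketch)."""
    if not gts:
        return 0.0  # assumed convention for a class without ground truth
    matched = [False] * len(gts)
    tp_flags: List[bool] = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        ious = [mask_iou(mask, gt) for gt in gts]
        best = max(range(len(gts)), key=lambda i: ious[i])
        if ious[best] >= overlap and not matched[best]:
            matched[best] = True
            tp_flags.append(True)
        else:
            # Too little overlap, or a duplicate of an already matched
            # ground-truth instance: counted as a false positive.
            tp_flags.append(False)
    # Precision and recall after each prediction.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Precision envelope: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(preds: Sequence[Prediction],
            gts: Sequence[Set[int]],
            overlaps: Optional[Sequence[float]] = None) -> float:
    """Average of AP over overlaps 0.5, 0.55, ..., 0.95 (endpoint assumed)."""
    if overlaps is None:
        overlaps = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in overlaps) / len(overlaps)
```

Under these assumptions, AP 25% and AP 50% for a class correspond to `average_precision(preds, gts, 0.25)` and `average_precision(preds, gts, 0.5)`, and the reported scores would be the mean of the per-class values over all categories.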



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each entry gives the AP 25% score followed by the method's rank in that column.




Columns: Method; Info; avg AP 25%; head AP 25%; common AP 25%; tail AP 25%; then per-class AP 25% for the classes: alarm clock, armchair, backpack, bag, ball, bar, basket, bathroom cabinet, bathroom counter, bathroom stall, bathroom stall door, bathroom vanity, bathtub, bed, bench, bicycle, bin, blackboard, blanket, blinds, board, book, bookshelf, bottle, bowl, box, broom, bucket, bulletin board, cabinet, calendar, candle, cart, case of water bottles, cd case, ceiling, ceiling light, chair, clock, closet, closet door, closet rod, closet wall, clothes, clothes dryer, coat rack, coffee kettle, coffee maker, coffee table, column, computer tower, container, copier, couch, counter, crate, cup, curtain, cushion, decoration, desk, dining table, dish rack, dishwasher, divider, door, doorframe, dresser, dumbbell, dustpan, end table, fan, file cabinet, fire alarm, fire extinguisher, fireplace, folded chair, furniture, guitar, guitar case, hair dryer, handicap bar, hat, headphones, ironing board, jacket, keyboard, keyboard piano, kitchen cabinet, kitchen counter, ladder, lamp, laptop, laundry basket, laundry detergent, laundry hamper, ledge, light, light switch, luggage, machine, mailbox, mat, mattress, microwave, mini fridge, mirror, monitor, mouse, music stand, nightstand, object, office chair, ottoman, oven, paper, paper bag, paper cutter, paper towel dispenser, paper towel roll, person, piano, picture, pillar, pillow, pipe, plant, plate, plunger, poster, potted plant, power outlet, power strip, printer, projector, projector screen, purse, rack, radiator, rail, range hood, recycling bin, refrigerator, scale, seat, shelf, shoe, shower, shower curtain, shower curtain rod, shower door, shower floor, shower head, shower wall, sign, sink, soap dish, soap dispenser, sofa chair, speaker, stair rail, stairs, stand, stool, storage bin, storage container, storage organizer, stove, structure, stuffed animal, suitcase, table, telephone, tissue box, toaster, toaster oven, toilet, toilet paper, toilet paper dispenser, toilet paper holder, toilet seat cover dispenser, towel, trash bin, trash can, tray, tube, tv, tv stand, vacuum cleaner, vent, wardrobe, washing machine, water bottle, water cooler, water pitcher, whiteboard, window, windowsill
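ODIN - Ins200 (Info: permissive). avg AP 25%: 0.451 (1); head AP 25%: 0.637 (2); common AP 25%: 0.407 (1); tail AP 25%: 0.277 (1). Per-class AP 25% with rank in parentheses: 0.583 (5), 0.116 (1), 0.500 (1), 0.000 (1), 0.125 (1), 0.000 (1), 0.599 (2), 0.823 (2), 0.407 (4), 0.667 (6), 0.941 (3), 0.542 (3), 1.000 (1), 0.000 (3), 1.000 (1), 0.162 (3), 0.000 (2), 0.028 (5), 0.357 (2), 0.695 (3), 0.550 (1), 0.000 (1), 0.475 (1), 0.000 (1), 0.000 (2), 0.714 (1), 0.626 (1), 1.000 (1), 0.000 (1), 0.500 (1), 0.125 (1), 0.749 (2), 0.080 (2), 0.742 (6), 0.528 (1), 0.078 (3), 0.500 (2), 0.334 (1), 0.667 (1), 0.333 (1), 0.000 (1), 0.278 (6), 0.723 (5), 0.250 (4), 0.859 (4), 1.000 (1), 0.826 (6), 0.108 (3), 0.221 (1), 0.763 (1), 0.000 (3), 0.250 (1), 0.742 (3), 0.500 (3), 0.750 (1), 0.400 (3), 0.855 (1), 0.769 (1), 0.701 (1), 0.469 (4), 0.203 (1), 0.406 (2), 0.870 (2), 0.000 (2), 0.963 (1), 0.200 (3), 0.000 (1), 0.000 (3), 0.500 (1), 0.370 (1), 0.886 (1), 1.000 (1), 0.782 (2), 0.504 (3), 0.429 (4), 0.494 (1), 0.337 (3), 0.000 (1), 0.000 (1), 0.600 (1), 0.000 (4), 0.215 (3), 0.226 (2), 0.000 (1), 0.944 (2), 0.200 (3), 0.887 (1), 0.750 (1), 0.874 (1), 0.877 (3), 0.438 (1), 0.000 (1), 0.867 (3), 0.089 (3), 0.003 (3), 0.500 (1), 0.000 (2), 0.333 (1), 1.000 (1), 0.742 (2), 0.125 (1), 0.671 (1), 0.417 (4), 0.616 (5), 0.637 (1), 0.238 (1), 0.873 (1), 0.528 (1), 0.494 (5), 0.000 (1), 0.250 (3), 0.000 (2), 0.688 (1), 0.000 (1), 1.000 (1), 0.000 (1), 0.872 (1), 0.833 (2), 0.275 (1), 0.779 (5), 1.000 (1), 0.000 (3), 0.441 (1), 0.577 (1), 0.167 (2), 1.000 (1), 0.500 (5), 0.777 (3), 0.000 (2), 0.778 (2), 0.000 (3), 0.910 (2), 0.800 (2), 0.232 (4), 0.019 (3), 0.717 (1), 0.833 (5), 0.000 (3), 0.638 (1), 0.284 (1), 0.000 (3), 0.000 (2), 0.778 (1), 0.000 (1), 0.000 (1), 0.597 (1), 0.699 (3), 0.850 (1), 0.333 (3), 0.250 (3), 0.944 (5), 0.571 (1), 0.677 (3), 0.795 (1), 0.264 (4), 0.852 (2), 0.000 (1), 0.000 (2), 0.824 (1), 1.000 (1), 0.668 (3), 0.000 (1), 0.000 (4), 0.667 (3), 0.000 (1), 0.333 (5), 0.333 (2), 0.760 (1), 0.679 (3), 0.404 (2)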
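TD3D Scannet200 (Info: permissive). avg AP 25%: 0.379 (3); head AP 25%: 0.603 (3); common AP 25%: 0.306 (3); tail AP 25%: 0.190 (3). Per-class AP 25% with rank in parentheses: 0.635 (2), 0.073 (3), 0.500 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.495 (4), 0.735 (3), 0.275 (6), 1.000 (1), 0.979 (2), 0.590 (2), 0.000 (5), 0.021 (2), 0.000 (4), 0.146 (4), 0.000 (2), 0.356 (2), 0.173 (6), 0.795 (1), 0.226 (3), 0.000 (1), 0.173 (3), 0.000 (1), 0.000 (2), 0.226 (3), 0.390 (3), 0.000 (3), 0.000 (1), 0.250 (2), 0.000 (2), 0.706 (3), 0.061 (4), 0.885 (1), 0.093 (3), 0.186 (2), 0.259 (5), 0.200 (2), 0.667 (1), 0.000 (3), 0.000 (1), 0.667 (2), 0.825 (1), 0.250 (4), 0.834 (5), 1.000 (1), 0.958 (1), 0.553 (1), 0.111 (4), 0.748 (2), 0.220 (2), 0.051 (3), 0.866 (2), 0.792 (1), 0.390 (6), 0.045 (6), 0.800 (3), 0.302 (6), 0.517 (2), 0.533 (3), 0.113 (3), 0.427 (1), 0.843 (3), 0.000 (2), 0.458 (2), 0.600 (1), 0.000 (1), 0.101 (2), 0.000 (2), 0.259 (2), 0.717 (3), 0.500 (3), 0.615 (3), 0.520 (2), 0.526 (2), 0.457 (2), 0.270 (5), 0.000 (1), 0.000 (1), 0.400 (3), 0.088 (2), 0.294 (2), 0.181 (3), 0.000 (1), 1.000 (1), 0.400 (1), 0.710 (6), 0.103 (4), 0.477 (6), 0.905 (2), 0.061 (3), 0.000 (1), 0.906 (2), 0.102 (2), 0.232 (1), 0.125 (3), 0.000 (2), 0.003 (3), 0.792 (4), 1.000 (1), 0.000 (3), 0.102 (4), 0.125 (5), 0.559 (6), 0.523 (4), 0.075 (3), 0.715 (2), 0.000 (3), 0.424 (6), 0.000 (1), 0.396 (2), 0.250 (1), 0.638 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.622 (6), 0.833 (2), 0.221 (2), 0.970 (1), 0.250 (3), 0.038 (1), 0.260 (3), 0.415 (2), 0.125 (3), 1.000 (1), 1.000 (1), 0.857 (2), 0.000 (2), 0.908 (1), 0.012 (1), 0.869 (4), 0.836 (1), 0.635 (1), 0.111 (1), 0.625 (2), 1.000 (1), 0.020 (2), 0.510 (2), 0.003 (4), 0.009 (2), 1.000 (1), 0.778 (1), 0.000 (1), 0.000 (1), 0.370 (4), 0.755 (1), 0.288 (3), 0.333 (3), 0.274 (2), 1.000 (1), 0.557 (2), 0.731 (2), 0.456 (3), 0.433 (3), 0.769 (6), 0.000 (1), 0.000 (2), 0.621 (5), 1.000 (1), 0.458 (5), 0.000 (1), 0.196 (2), 0.817 (1), 0.000 (1), 0.472 (1), 0.222 (4), 0.205 (6), 0.689 (2), 0.274 (4)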
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
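Mask3D Scannet200. avg AP 25%: 0.445 (2); head AP 25%: 0.653 (1); common AP 25%: 0.392 (2); tail AP 25%: 0.254 (2). Per-class AP 25% with rank in parentheses: 0.648 (1), 0.097 (2), 0.125 (6), 0.000 (1), 0.000 (2), 0.000 (1), 0.657 (1), 0.971 (1), 0.451 (2), 1.000 (1), 1.000 (1), 0.640 (1), 0.500 (2), 0.045 (1), 1.000 (1), 0.241 (2), 0.409 (1), 0.363 (1), 0.440 (1), 0.686 (4), 0.300 (2), 0.000 (1), 0.201 (2), 0.000 (1), 0.009 (1), 0.290 (2), 0.556 (2), 1.000 (1), 0.000 (1), 0.063 (4), 0.000 (2), 0.830 (1), 0.573 (1), 0.844 (2), 0.333 (2), 0.204 (1), 0.058 (6), 0.158 (6), 0.552 (3), 0.056 (2), 0.000 (1), 1.000 (1), 0.725 (4), 0.750 (1), 0.927 (1), 1.000 (1), 0.888 (4), 0.042 (4), 0.120 (3), 0.615 (5), 0.226 (1), 0.250 (1), 0.890 (1), 0.792 (1), 0.677 (3), 0.510 (2), 0.818 (2), 0.699 (2), 0.512 (3), 0.167 (6), 0.125 (2), 0.315 (3), 0.943 (1), 0.309 (1), 0.017 (4), 0.200 (3), 0.000 (1), 0.188 (1), 0.000 (2), 0.183 (4), 0.815 (2), 1.000 (1), 0.827 (1), 0.741 (1), 0.442 (3), 0.414 (5), 0.600 (1), 0.000 (1), 0.000 (1), 0.458 (2), 0.049 (3), 0.321 (1), 0.381 (1), 0.000 (1), 0.908 (3), 0.400 (1), 0.841 (2), 0.260 (2), 0.710 (2), 0.966 (1), 0.265 (2), 0.000 (1), 0.924 (1), 0.152 (1), 0.025 (2), 0.500 (1), 0.027 (1), 0.028 (2), 1.000 (1), 0.556 (6), 0.016 (2), 0.080 (6), 0.500 (1), 0.694 (3), 0.608 (2), 0.084 (2), 0.604 (4), 0.194 (2), 0.538 (3), 0.000 (1), 0.500 (1), 0.000 (2), 0.354 (5), 0.000 (1), 1.000 (1), 0.000 (1), 0.761 (3), 0.930 (1), 0.053 (5), 0.890 (3), 1.000 (1), 0.008 (2), 0.262 (2), 0.358 (3), 1.000 (1), 1.000 (1), 0.792 (4), 0.966 (1), 1.000 (1), 0.765 (3), 0.004 (2), 0.930 (1), 0.780 (3), 0.330 (2), 0.027 (2), 0.625 (2), 0.974 (4), 0.050 (1), 0.412 (6), 0.021 (3), 0.000 (3), 0.000 (2), 0.778 (1), 0.000 (1), 0.000 (1), 0.493 (3), 0.746 (2), 0.454 (2), 0.335 (2), 0.396 (1), 0.930 (6), 0.551 (3), 1.000 (1), 0.552 (2), 0.606 (1), 0.853 (1), 0.000 (1), 0.004 (1), 0.806 (2), 1.000 (1), 0.727 (2), 0.000 (1), 0.042 (3), 0.745 (2), 0.000 (1), 0.399 (4), 0.391 (1), 0.630 (2), 0.721 (1), 0.619 (1)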
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
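Minkowski 34D Inst. (Info: permissive). avg AP 25%: 0.280 (5); head AP 25%: 0.488 (5); common AP 25%: 0.192 (6); tail AP 25%: 0.124 (5). Per-class AP 25% with rank in parentheses: 0.593 (4), 0.010 (5), 0.500 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.447 (5), 0.535 (5), 0.445 (3), 1.000 (1), 0.861 (5), 0.400 (4), 0.225 (3), 0.000 (3), 0.000 (4), 0.142 (5), 0.000 (2), 0.074 (4), 0.342 (4), 0.467 (6), 0.067 (4), 0.000 (1), 0.119 (6), 0.000 (1), 0.000 (2), 0.000 (5), 0.337 (6), 0.000 (3), 0.000 (1), 0.000 (5), 0.000 (2), 0.506 (6), 0.070 (3), 0.804 (4), 0.000 (4), 0.000 (5), 0.333 (4), 0.172 (4), 0.150 (6), 0.000 (3), 0.000 (1), 0.479 (5), 0.745 (3), 0.000 (6), 0.830 (6), 1.000 (1), 0.904 (3), 0.167 (2), 0.090 (5), 0.732 (3), 0.000 (3), 0.000 (4), 0.443 (5), 0.000 (4), 0.500 (4), 0.542 (1), 0.772 (6), 0.396 (5), 0.077 (6), 0.385 (5), 0.044 (5), 0.118 (6), 0.777 (5), 0.000 (2), 0.000 (5), 0.200 (3), 0.000 (1), 0.000 (3), 0.000 (2), 0.148 (5), 0.502 (5), 0.500 (3), 0.419 (5), 0.159 (6), 0.281 (5), 0.404 (6), 0.317 (4), 0.000 (1), 0.000 (1), 0.200 (4), 0.000 (4), 0.077 (4), 0.000 (4), 0.000 (1), 0.750 (4), 0.200 (3), 0.715 (5), 0.021 (5), 0.551 (3), 0.828 (6), 0.000 (4), 0.000 (1), 0.743 (5), 0.059 (6), 0.000 (4), 0.000 (4), 0.000 (2), 0.000 (4), 0.125 (6), 0.648 (4), 0.000 (3), 0.191 (3), 0.500 (1), 0.669 (4), 0.502 (5), 0.000 (6), 0.568 (5), 0.000 (3), 0.516 (4), 0.000 (1), 0.000 (4), 0.000 (2), 0.305 (6), 0.000 (1), 0.000 (3), 0.000 (1), 0.825 (2), 0.833 (2), 0.021 (6), 0.918 (2), 0.000 (4), 0.000 (3), 0.191 (5), 0.346 (5), 0.100 (5), 0.981 (4), 1.000 (1), 0.286 (5), 0.000 (2), 0.000 (6), 0.000 (3), 0.868 (5), 0.648 (6), 0.292 (3), 0.000 (4), 0.375 (4), 1.000 (1), 0.000 (3), 0.500 (3), 0.000 (5), 0.333 (1), 0.000 (2), 0.538 (6), 0.000 (1), 0.000 (1), 0.213 (6), 0.518 (5), 0.098 (5), 0.528 (1), 0.250 (3), 0.997 (3), 0.284 (6), 0.677 (3), 0.398 (4), 0.167 (5), 0.790 (5), 0.000 (1), 0.000 (2), 0.618 (6), 0.903 (6), 0.200 (6), 0.000 (1), 0.333 (1), 0.333 (5), 0.000 (1), 0.442 (3), 0.083 (5), 0.213 (5), 0.587 (5), 0.131 (6)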
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
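CSC-Pretrain Inst. (Info: permissive). avg AP 25%: 0.275 (6); head AP 25%: 0.466 (6); common AP 25%: 0.218 (5); tail AP 25%: 0.110 (6). Per-class AP 25% with rank in parentheses: 0.625 (3), 0.007 (6), 0.500 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.000 (6), 0.222 (6), 0.377 (5), 1.000 (1), 0.661 (6), 0.400 (4), 0.000 (5), 0.000 (3), 0.000 (4), 0.119 (6), 0.000 (2), 0.000 (6), 0.277 (5), 0.685 (5), 0.067 (4), 0.000 (1), 0.132 (4), 0.000 (1), 0.000 (2), 0.000 (5), 0.367 (5), 0.000 (3), 0.000 (1), 0.000 (5), 0.000 (2), 0.591 (4), 0.055 (5), 0.783 (5), 0.000 (4), 0.014 (4), 0.500 (2), 0.161 (5), 0.278 (4), 0.000 (3), 0.000 (1), 0.667 (2), 0.768 (2), 0.500 (2), 0.866 (2), 1.000 (1), 0.829 (5), 0.000 (5), 0.019 (6), 0.555 (6), 0.000 (3), 0.000 (4), 0.305 (6), 0.000 (4), 0.750 (1), 0.200 (5), 0.783 (5), 0.429 (4), 0.395 (4), 0.677 (2), 0.020 (6), 0.286 (4), 0.584 (6), 0.000 (2), 0.000 (5), 0.115 (6), 0.000 (1), 0.000 (3), 0.000 (2), 0.145 (6), 0.423 (6), 0.500 (3), 0.364 (6), 0.369 (5), 0.571 (1), 0.448 (4), 0.206 (6), 0.000 (1), 0.000 (1), 0.200 (4), 0.106 (1), 0.065 (6), 0.000 (4), 0.000 (1), 0.750 (4), 0.200 (3), 0.774 (3), 0.000 (6), 0.501 (4), 0.841 (5), 0.000 (4), 0.000 (1), 0.692 (6), 0.063 (5), 0.000 (4), 0.000 (4), 0.000 (2), 0.000 (4), 0.500 (5), 0.649 (3), 0.000 (3), 0.084 (5), 0.125 (5), 0.719 (1), 0.413 (6), 0.004 (5), 0.450 (6), 0.000 (3), 0.638 (1), 0.000 (1), 0.000 (4), 0.000 (2), 0.505 (4), 0.000 (1), 0.000 (3), 0.000 (1), 0.727 (4), 0.833 (2), 0.221 (3), 0.779 (5), 0.000 (4), 0.000 (3), 0.168 (6), 0.311 (6), 0.125 (3), 0.571 (5), 0.500 (5), 0.143 (6), 0.000 (2), 0.250 (5), 0.000 (3), 0.869 (3), 0.667 (5), 0.162 (6), 0.000 (4), 0.250 (5), 1.000 (1), 0.000 (3), 0.500 (3), 0.000 (5), 0.000 (3), 0.000 (2), 0.689 (5), 0.000 (1), 0.000 (1), 0.312 (5), 0.383 (6), 0.114 (4), 0.333 (3), 0.000 (5), 0.997 (3), 0.420 (4), 0.613 (5), 0.212 (6), 0.500 (2), 0.819 (3), 0.000 (1), 0.000 (2), 0.768 (3), 1.000 (1), 0.918 (1), 0.000 (1), 0.000 (4), 0.278 (6), 0.000 (1), 0.333 (5), 0.000 (6), 0.353 (3), 0.546 (6), 0.258 (5)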
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
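LGround Inst. (Info: permissive). avg AP 25%: 0.314 (4); head AP 25%: 0.529 (4); common AP 25%: 0.225 (4); tail AP 25%: 0.155 (4). Per-class AP 25% with rank in parentheses: 0.578 (6), 0.010 (4), 0.500 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.515 (3), 0.556 (4), 0.696 (1), 1.000 (1), 0.927 (4), 0.400 (4), 0.083 (4), 0.000 (3), 1.000 (1), 0.252 (1), 0.000 (2), 0.167 (3), 0.350 (3), 0.731 (2), 0.067 (4), 0.000 (1), 0.123 (5), 0.000 (1), 0.000 (2), 0.036 (4), 0.372 (4), 0.000 (3), 0.000 (1), 0.250 (2), 0.000 (2), 0.569 (5), 0.031 (6), 0.810 (3), 0.000 (4), 0.000 (5), 0.630 (1), 0.183 (3), 0.278 (4), 0.000 (3), 0.000 (1), 0.582 (4), 0.589 (6), 0.500 (2), 0.863 (3), 1.000 (1), 0.940 (2), 0.000 (5), 0.144 (2), 0.716 (4), 0.000 (3), 0.000 (4), 0.484 (4), 0.000 (4), 0.500 (4), 0.400 (3), 0.798 (4), 0.500 (3), 0.278 (5), 0.750 (1), 0.093 (4), 0.166 (5), 0.783 (4), 0.000 (2), 0.200 (3), 0.400 (2), 0.000 (1), 0.000 (3), 0.000 (2), 0.219 (3), 0.539 (4), 0.500 (3), 0.578 (4), 0.413 (4), 0.181 (6), 0.457 (3), 0.375 (2), 0.000 (1), 0.000 (1), 0.050 (6), 0.000 (4), 0.077 (5), 0.000 (4), 0.000 (1), 0.500 (6), 0.000 (6), 0.743 (4), 0.250 (3), 0.488 (5), 0.846 (4), 0.000 (4), 0.000 (1), 0.800 (4), 0.069 (4), 0.000 (4), 0.000 (4), 0.000 (2), 0.000 (4), 1.000 (1), 0.607 (5), 0.000 (3), 0.200 (2), 0.500 (1), 0.694 (2), 0.528 (3), 0.063 (4), 0.659 (3), 0.000 (3), 0.594 (2), 0.000 (1), 0.000 (4), 0.000 (2), 0.571 (3), 0.000 (1), 0.000 (3), 0.000 (1), 0.716 (5), 0.647 (6), 0.221 (3), 0.857 (4), 0.000 (4), 0.000 (3), 0.217 (4), 0.346 (4), 0.071 (6), 0.530 (6), 1.000 (1), 0.429 (4), 0.000 (2), 0.286 (4), 0.000 (3), 0.826 (6), 0.706 (4), 0.208 (5), 0.000 (4), 0.250 (5), 0.744 (6), 0.000 (3), 0.500 (3), 0.042 (2), 0.000 (3), 0.000 (2), 0.746 (4), 0.000 (1), 0.000 (1), 0.517 (2), 0.625 (4), 0.085 (6), 0.333 (3), 0.000 (5), 1.000 (1), 0.378 (5), 0.533 (6), 0.376 (5), 0.042 (6), 0.814 (4), 0.000 (1), 0.000 (2), 0.765 (4), 1.000 (1), 0.600 (4), 0.000 (1), 0.000 (4), 0.667 (3), 0.000 (1), 0.472 (1), 0.333 (2), 0.337 (4), 0.605 (4), 0.305 (3)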
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.