The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
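The metric described above can be illustrated with a minimal sketch. It is not the benchmark's official evaluation script: the `Prediction` record layout, the function names `average_precision` and `mean_ap`, the strict `>` overlap test, the inclusive 0.95 endpoint of the sweep, and the non-interpolated area under the precision-recall curve are all assumptions made for illustration. What it does show is the rule stated above, namely that a second prediction of an already matched ground-truth instance counts as a false positive.

```python
from math import isnan
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical record: (confidence, best-matching ground-truth id or None, IoU with it).
Prediction = Tuple[float, Optional[int], float]


def average_precision(preds: Sequence[Prediction], num_gt: int, overlap: float) -> float:
    """AP for one class at one overlap threshold (non-interpolated, an assumption)."""
    if num_gt == 0:
        return float("nan")  # class has no ground truth, so it cannot be scored
    matched = set()
    tp = fp = 0
    ap = 0.0
    # Visit predictions from most to least confident.
    for _, gt_id, iou in sorted(preds, key=lambda p: p[0], reverse=True):
        if gt_id is not None and iou > overlap and gt_id not in matched:
            matched.add(gt_id)
            tp += 1
            ap += tp / (tp + fp) / num_gt  # precision at this new recall step
        else:
            fp += 1  # overlap too low, or a duplicate of a matched instance
    return ap


def mean_ap(per_class: Dict[str, Tuple[Sequence[Prediction], int]],
            overlaps: Sequence[float]) -> float:
    """Average AP over the given overlaps, then over classes that have ground truth."""
    scores = []
    for preds, num_gt in per_class.values():
        aps = [average_precision(preds, num_gt, t) for t in overlaps]
        if not isnan(aps[0]):
            scores.append(sum(aps) / len(aps))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data: two chairs and one table in the ground truth.
toy = {
    "chair": ([(0.9, 0, 0.82), (0.8, 0, 0.62), (0.7, 1, 0.57)], 2),
    "table": ([(0.6, 0, 0.92)], 1),
}
sweep = [0.5 + 0.05 * i for i in range(10)]  # assumed to include 0.95

ap25 = mean_ap(toy, [0.25])
ap50 = mean_ap(toy, [0.5])
ap = mean_ap(toy, sweep)
print(f"AP 25%: {ap25:.3f}  AP 50%: {ap50:.3f}  AP: {ap:.3f}")
```

In the toy data the second chair prediction overlaps the same instance as the first, so it is counted as a false positive and lowers the precision seen by the third prediction.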



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each score is followed by the method's rank for that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
Mask3D Scannet200 | | 0.445 1 | 0.653 1 | 0.392 1 | 0.254 1 | 0.844 2 | 0.746 2 | 0.818 1 | 0.888 4 | 0.556 1 | 0.262 1 | 0.890 1 | 0.025 2 | 1.000 1 | 0.608 1 | 0.930 1 | 0.694 3 | 0.721 1 | 0.930 5 | 0.686 3 | 0.966 1 | 0.615 4 | 0.440 1 | 0.725 4 | 0.201 1 | 0.890 3 | 0.414 4 | 0.827 1 | 0.552 1 | 0.158 5 | 0.806 1 | 0.924 1 | 0.042 3 | 0.512 2 | 0.412 5 | 0.226 1 | 0.604 3 | 0.830 1 | 1.000 1 | 0.125 1 | 0.792 1 | 0.815 1 | 0.097 1 | 0.648 1 | 0.551 2 | 0.354 4 | 1.000 1 | 0.630 1 | 0.241 2 | 1.000 1 | 0.853 1 | 0.204 1 | 0.974 4 | 0.841 1 | 0.778 1 | 0.358 2 | 0.927 1 | 0.300 1 | 0.045 1 | 0.640 1 | 0.363 1 | 0.745 2 | 0.710 1 | 1.000 1 | 0.000 1 | 0.330 2 | 0.943 1 | 0.315 2 | 0.600 1 | 1.000 1 | 0.027 1 | 0.080 5 | 0.556 5 | 0.500 1 | 0.409 1 | 0.000 1 | 0.194 1 | 1.000 1 | 0.500 1 | 0.493 2 | 0.761 2 | 0.053 4 | 0.042 3 | 0.780 2 | 0.454 1 | 0.009 1 | 0.333 1 | 0.050 1 | 0.321 1 | 0.000 1 | 0.084 1 | 0.552 2 | 0.008 2 | 0.027 2 | 0.750 1 | 0.500 1 | 0.442 3 | 0.657 1 | 0.765 2 | 0.120 2 | 0.183 3 | 0.021 2 | 1.000 1 | 0.510 2 | 0.016 1 | 0.000 1 | 0.400 1 | 0.619 1 | 0.000 1 | 0.396 1 | 0.290 1 | 0.000 1 | 0.741 1 | 0.699 1 | 1.000 1 | 0.260 1 | 0.017 3 | 0.125 5 | 0.000 1 | 0.792 4 | 0.399 4 | 1.000 1 | 0.000 1 | 0.049 3 | 0.265 1 | 0.063 3 | 0.000 3 | 1.000 1 | 0.335 2 | 0.381 1 | 0.500 1 | 0.250 1 | 0.004 2 | 0.000 1 | 0.727 2 | 0.000 1 | 0.538 3 | 0.000 1 | 0.188 1 | 0.677 2 | 0.000 1 | 0.930 1 | 0.000 1 | 0.000 1 | 0.966 1 | 0.391 1 | 0.908 2 | 0.000 1 | 0.028 1 | 0.000 1 | 1.000 1 | 0.000 1 | 0.152 1 | 0.451 2 | 0.458 1 | 0.971 1 | 0.573 1 | 0.606 1 | 0.167 5 | 0.625 1 | 0.004 1 | 0.000 1 | 0.058 5 | 0.000 1 | 0.000 1 | 1.000 1 | 1.000 1 | 0.000 1 | 0.056 1 | 0.000 2 | 0.200 3 | 0.309 1 | 0.000 2 | 1.000 1 | 0.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
TD3D Scannet200 | permissive | 0.379 2 | 0.603 2 | 0.306 2 | 0.190 2 | 0.885 1 | 0.755 1 | 0.800 2 | 0.958 1 | 0.390 2 | 0.260 2 | 0.866 2 | 0.232 1 | 0.979 2 | 0.523 3 | 0.869 3 | 0.559 5 | 0.689 2 | 1.000 1 | 0.795 1 | 0.905 2 | 0.748 1 | 0.173 5 | 0.825 1 | 0.173 2 | 0.970 1 | 0.457 1 | 0.615 2 | 0.456 2 | 0.200 1 | 0.621 4 | 0.906 2 | 0.553 1 | 0.517 1 | 0.510 1 | 0.220 2 | 0.715 1 | 0.706 2 | 1.000 1 | 0.113 2 | 0.792 1 | 0.717 2 | 0.073 2 | 0.635 2 | 0.557 1 | 0.638 1 | 1.000 1 | 0.205 5 | 0.146 3 | 1.000 1 | 0.769 5 | 0.186 2 | 1.000 1 | 0.710 5 | 0.778 1 | 0.415 1 | 0.834 4 | 0.226 2 | 0.021 2 | 0.590 2 | 0.356 2 | 0.817 1 | 0.477 5 | 1.000 1 | 0.000 1 | 0.635 1 | 0.843 2 | 0.427 1 | 0.270 4 | 0.125 2 | 0.000 2 | 0.102 3 | 1.000 1 | 0.125 2 | 0.000 2 | 0.000 1 | 0.000 2 | 0.000 3 | 0.125 4 | 0.370 3 | 0.622 5 | 0.221 1 | 0.196 2 | 0.836 1 | 0.288 2 | 0.000 2 | 0.093 2 | 0.020 2 | 0.294 2 | 0.000 1 | 0.075 2 | 0.667 1 | 0.038 1 | 0.111 1 | 0.250 4 | 0.000 4 | 0.526 2 | 0.495 3 | 0.908 1 | 0.111 3 | 0.259 1 | 0.003 3 | 0.667 2 | 0.045 5 | 0.000 2 | 0.000 1 | 0.400 1 | 0.274 3 | 0.000 1 | 0.274 2 | 0.226 2 | 0.000 1 | 0.520 2 | 0.302 5 | 0.731 2 | 0.103 3 | 0.458 1 | 0.500 1 | 0.000 1 | 1.000 1 | 0.472 1 | 0.792 3 | 0.000 1 | 0.088 2 | 0.061 2 | 0.250 1 | 0.009 2 | 0.250 2 | 0.333 3 | 0.181 2 | 0.396 2 | 0.051 2 | 0.012 1 | 0.000 1 | 0.458 4 | 0.000 1 | 0.424 5 | 0.000 1 | 0.101 2 | 0.390 5 | 0.000 1 | 0.833 2 | 0.000 1 | 0.000 1 | 0.857 2 | 0.222 3 | 1.000 1 | 0.000 1 | 0.003 2 | 0.000 1 | 0.000 2 | 0.000 1 | 0.102 2 | 0.275 5 | 0.400 2 | 0.735 2 | 0.061 3 | 0.433 3 | 0.533 3 | 0.625 1 | 0.000 2 | 0.000 1 | 0.259 4 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 1.000 1 | 0.600 1 | 0.000 2 | 0.250 1 | 0.000 2 | 0.000 1
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
Minkowski 34D Inst. | permissive | 0.280 4 | 0.488 4 | 0.192 5 | 0.124 4 | 0.804 4 | 0.518 4 | 0.772 5 | 0.904 3 | 0.337 5 | 0.191 4 | 0.443 4 | 0.000 3 | 0.861 4 | 0.502 4 | 0.868 4 | 0.669 4 | 0.587 4 | 0.997 3 | 0.467 5 | 0.828 5 | 0.732 2 | 0.342 3 | 0.745 3 | 0.119 5 | 0.918 2 | 0.404 5 | 0.419 4 | 0.398 3 | 0.172 3 | 0.618 5 | 0.743 4 | 0.167 2 | 0.077 5 | 0.500 2 | 0.000 3 | 0.568 4 | 0.506 5 | 1.000 1 | 0.044 4 | 0.000 3 | 0.502 4 | 0.010 4 | 0.593 4 | 0.284 5 | 0.305 5 | 0.903 5 | 0.213 4 | 0.142 4 | 0.981 3 | 0.790 4 | 0.000 4 | 1.000 1 | 0.715 4 | 0.538 5 | 0.346 4 | 0.830 5 | 0.067 3 | 0.000 3 | 0.400 3 | 0.074 4 | 0.333 4 | 0.551 2 | 1.000 1 | 0.000 1 | 0.292 3 | 0.777 4 | 0.118 5 | 0.317 3 | 0.100 4 | 0.000 2 | 0.191 2 | 0.648 3 | 0.000 3 | 0.000 2 | 0.000 1 | 0.000 2 | 0.000 3 | 0.500 1 | 0.213 5 | 0.825 1 | 0.021 5 | 0.333 1 | 0.648 5 | 0.098 4 | 0.000 2 | 0.000 3 | 0.000 3 | 0.077 3 | 0.000 1 | 0.000 5 | 0.150 5 | 0.000 3 | 0.000 3 | 0.000 5 | 0.225 2 | 0.281 4 | 0.447 4 | 0.000 5 | 0.090 4 | 0.148 4 | 0.000 4 | 0.479 5 | 0.542 1 | 0.000 2 | 0.000 1 | 0.200 3 | 0.131 5 | 0.000 1 | 0.250 3 | 0.000 4 | 0.000 1 | 0.159 5 | 0.396 4 | 0.677 3 | 0.021 4 | 0.000 4 | 0.500 1 | 0.000 1 | 1.000 1 | 0.442 3 | 0.125 5 | 0.000 1 | 0.000 4 | 0.000 3 | 0.000 4 | 0.333 1 | 0.000 3 | 0.528 1 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 1 | 0.200 5 | 0.000 1 | 0.516 4 | 0.000 1 | 0.000 3 | 0.500 3 | 0.000 1 | 0.833 2 | 0.000 1 | 0.000 1 | 0.286 4 | 0.083 4 | 0.750 3 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.059 5 | 0.445 3 | 0.200 3 | 0.535 4 | 0.070 2 | 0.167 4 | 0.385 4 | 0.375 3 | 0.000 2 | 0.000 1 | 0.333 3 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 0.000 2 | 0.200 3 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
CSC-Pretrain Inst. | permissive | 0.275 5 | 0.466 5 | 0.218 4 | 0.110 5 | 0.783 5 | 0.383 5 | 0.783 4 | 0.829 5 | 0.367 4 | 0.168 5 | 0.305 5 | 0.000 3 | 0.661 5 | 0.413 5 | 0.869 2 | 0.719 1 | 0.546 5 | 0.997 3 | 0.685 4 | 0.841 4 | 0.555 5 | 0.277 4 | 0.768 2 | 0.132 3 | 0.779 5 | 0.448 3 | 0.364 5 | 0.212 5 | 0.161 4 | 0.768 2 | 0.692 5 | 0.000 4 | 0.395 3 | 0.500 2 | 0.000 3 | 0.450 5 | 0.591 3 | 1.000 1 | 0.020 5 | 0.000 3 | 0.423 5 | 0.007 5 | 0.625 3 | 0.420 3 | 0.505 3 | 1.000 1 | 0.353 2 | 0.119 5 | 0.571 4 | 0.819 2 | 0.014 3 | 1.000 1 | 0.774 2 | 0.689 4 | 0.311 5 | 0.866 2 | 0.067 3 | 0.000 3 | 0.400 3 | 0.000 5 | 0.278 5 | 0.501 3 | 1.000 1 | 0.000 1 | 0.162 5 | 0.584 5 | 0.286 3 | 0.206 5 | 0.125 2 | 0.000 2 | 0.084 4 | 0.649 2 | 0.000 3 | 0.000 2 | 0.000 1 | 0.000 2 | 0.000 3 | 0.125 4 | 0.312 4 | 0.727 3 | 0.221 2 | 0.000 4 | 0.667 4 | 0.114 3 | 0.000 2 | 0.000 3 | 0.000 3 | 0.065 5 | 0.000 1 | 0.004 4 | 0.278 3 | 0.000 3 | 0.000 3 | 0.500 2 | 0.000 4 | 0.571 1 | 0.000 5 | 0.250 4 | 0.019 5 | 0.145 5 | 0.000 4 | 0.667 2 | 0.200 4 | 0.000 2 | 0.000 1 | 0.200 3 | 0.258 4 | 0.000 1 | 0.000 4 | 0.000 4 | 0.000 1 | 0.369 4 | 0.429 3 | 0.613 4 | 0.000 5 | 0.000 4 | 0.500 1 | 0.000 1 | 0.500 5 | 0.333 5 | 0.500 4 | 0.000 1 | 0.106 1 | 0.000 3 | 0.000 4 | 0.000 3 | 0.000 3 | 0.333 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 1 | 0.918 1 | 0.000 1 | 0.638 1 | 0.000 1 | 0.000 3 | 0.750 1 | 0.000 1 | 0.833 2 | 0.000 1 | 0.000 1 | 0.143 5 | 0.000 5 | 0.750 3 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.063 4 | 0.377 4 | 0.200 3 | 0.222 5 | 0.055 4 | 0.500 2 | 0.677 2 | 0.250 4 | 0.000 2 | 0.000 1 | 0.500 2 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 0.000 2 | 0.115 5 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.314 3 | 0.529 3 | 0.225 3 | 0.155 3 | 0.810 3 | 0.625 3 | 0.798 3 | 0.940 2 | 0.372 3 | 0.217 3 | 0.484 3 | 0.000 3 | 0.927 3 | 0.528 2 | 0.826 5 | 0.694 2 | 0.605 3 | 1.000 1 | 0.731 2 | 0.846 3 | 0.716 3 | 0.350 2 | 0.589 5 | 0.123 4 | 0.857 4 | 0.457 2 | 0.578 3 | 0.376 4 | 0.183 2 | 0.765 3 | 0.800 3 | 0.000 4 | 0.278 4 | 0.500 2 | 0.000 3 | 0.659 2 | 0.569 4 | 1.000 1 | 0.093 3 | 0.000 3 | 0.539 3 | 0.010 3 | 0.578 5 | 0.378 4 | 0.571 2 | 1.000 1 | 0.337 3 | 0.252 1 | 0.530 5 | 0.814 3 | 0.000 4 | 0.744 5 | 0.743 3 | 0.746 3 | 0.346 3 | 0.863 3 | 0.067 3 | 0.000 3 | 0.400 3 | 0.167 3 | 0.667 3 | 0.488 4 | 1.000 1 | 0.000 1 | 0.208 4 | 0.783 3 | 0.166 4 | 0.375 2 | 0.071 5 | 0.000 2 | 0.200 1 | 0.607 4 | 0.000 3 | 0.000 2 | 0.000 1 | 0.000 2 | 1.000 1 | 0.500 1 | 0.517 1 | 0.716 4 | 0.221 2 | 0.000 4 | 0.706 3 | 0.085 5 | 0.000 2 | 0.000 3 | 0.000 3 | 0.077 4 | 0.000 1 | 0.063 3 | 0.278 3 | 0.000 3 | 0.000 3 | 0.500 2 | 0.083 3 | 0.181 5 | 0.515 2 | 0.286 3 | 0.144 1 | 0.219 2 | 0.042 1 | 0.582 4 | 0.400 3 | 0.000 2 | 0.000 1 | 0.000 5 | 0.305 2 | 0.000 1 | 0.000 4 | 0.036 3 | 0.000 1 | 0.413 3 | 0.500 2 | 0.533 5 | 0.250 2 | 0.200 2 | 0.500 1 | 0.000 1 | 1.000 1 | 0.472 1 | 1.000 1 | 0.000 1 | 0.000 4 | 0.000 3 | 0.250 1 | 0.000 3 | 0.000 3 | 0.333 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 1 | 0.600 3 | 0.000 1 | 0.594 2 | 0.000 1 | 0.000 3 | 0.500 3 | 0.000 1 | 0.647 5 | 0.000 1 | 0.000 1 | 0.429 3 | 0.333 2 | 0.500 5 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.069 3 | 0.696 1 | 0.050 5 | 0.556 3 | 0.031 5 | 0.042 5 | 0.750 1 | 0.250 4 | 0.000 2 | 0.000 1 | 0.630 1 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 0.000 2 | 0.400 2 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.