The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
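The metric described above can be illustrated with a minimal sketch for a single class in a single scene. This is not the official evaluation script: the names `iou`, `average_precision`, and `summarize` are hypothetical, instances are represented as sets of vertex indices, AP is computed in its simple non-interpolated form, a match is assumed to require an overlap strictly above the threshold, and the sweep is assumed to include both 0.5 and 0.95.

```python
from typing import Dict, List, Set, Tuple

Mask = Set[int]                  # vertex indices belonging to one instance
Prediction = Tuple[float, Mask]  # (confidence, predicted mask)


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two vertex sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Prediction], gts: List[Mask], thr: float) -> float:
    """Non-interpolated AP for one class at one overlap threshold (assumed form)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    true_positives = 0
    precision_sum = 0.0
    # Visit predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, mask) in enumerate(ranked, start=1):
        # Ground-truth instances that are still unmatched and overlap enough.
        candidates = [
            (iou(mask, gt), j)
            for j, gt in enumerate(gts)
            if not matched[j] and iou(mask, gt) > thr
        ]
        if not candidates:
            # Either no sufficient overlap, or the instance was already
            # claimed by a more confident prediction: a false positive.
            continue
        _, best = max(candidates)
        matched[best] = True
        true_positives += 1
        precision_sum += true_positives / rank
    return precision_sum / len(gts)


def summarize(preds: List[Prediction], gts: List[Mask]) -> Dict[str, float]:
    """AP 25%, AP 50%, and AP averaged over the assumed 0.50..0.95 sweep."""
    sweep = [0.5 + 0.05 * i for i in range(10)]
    return {
        "AP 25%": average_precision(preds, gts, 0.25),
        "AP 50%": average_precision(preds, gts, 0.5),
        "AP": sum(average_precision(preds, gts, t) for t in sweep) / len(sweep),
    }


# Two ground-truth instances; the second prediction duplicates the first.
gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
preds = [(0.9, {0, 1, 2, 3}), (0.8, {0, 1, 2}), (0.7, {10, 11, 12, 13})]
scores = summarize(preds, gts)
```

In this toy example the duplicate prediction counts as a false positive, so AP 50% drops from 1.0 (without the duplicate) to 5/6. The benchmark additionally averages such per-class values over all categories, which this sketch leaves out.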



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each score is followed by the method's rank in that column.




Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress
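Mask3D Scannet200 | | 0.445 (1) | 0.653 (1) | 0.392 (1) | 0.254 (1) | 0.844 (1) | 0.746 (1) | 0.818 (1) | 0.888 (3) | 0.556 (1) | 0.262 (1) | 0.890 (1) | 0.025 (1) | 1.000 (1) | 0.608 (1) | 0.930 (1) | 0.694 (3) | 0.721 (1) | 0.930 (4) | 0.686 (2) | 0.966 (1) | 0.615 (3) | 0.440 (1) | 0.725 (3) | 0.201 (1) | 0.890 (2) | 0.414 (3) | 0.827 (1) | 0.552 (1) | 0.158 (4) | 0.806 (1) | 0.924 (1) | 0.042 (2) | 0.512 (1) | 0.412 (4) | 0.226 (1) | 0.604 (2) | 0.830 (1) | 1.000 (1) | 0.125 (1) | 0.792 (1) | 0.815 (1) | 0.097 (1) | 0.648 (1) | 0.551 (1) | 0.354 (3) | 1.000 (1) | 0.630 (1) | 0.241 (2) | 1.000 (1) | 0.853 (1) | 0.204 (1) | 0.974 (3) | 0.841 (1) | 0.778 (1) | 0.358 (1) | 0.927 (1) | 0.300 (1) | 0.045 (1) | 0.640 (1) | 0.363 (1) | 0.745 (1) | 0.710 (1) | 1.000 (1) | 0.000 (1) | 0.330 (1) | 0.943 (1) | 0.315 (1) | 0.600 (1) | 1.000 (1) | 0.027 (1) | 0.080 (4) | 0.556 (4) | 0.500 (1) | 0.409 (1) | 0.000 (1) | 0.194 (1) | 1.000 (1) | 0.500 (1) | 0.493 (2) | 0.761 (2) | 0.053 (3) | 0.042 (2) | 0.780 (1) | 0.454 (1) | 0.009 (1) | 0.333 (1) | 0.050 (1) | 0.321 (1) | 0.000 (1) | 0.084 (1) | 0.552 (1) | 0.008 (1) | 0.027 (1) | 0.750 (1) | 0.500 (1) | 0.442 (2) | 0.657 (1) | 0.765 (1) | 0.120 (2) | 0.183 (2) | 0.021 (2) | 1.000 (1) | 0.510 (2) | 0.016 (1) | 0.000 (1) | 0.400 (1) | 0.619 (1) | 0.000 (1) | 0.396 (1) | 0.290 (1) | 0.000 (1) | 0.741 (1) | 0.699 (1) | 1.000 (1) | 0.260 (1) | 0.017 (2) | 0.125 (4) | 0.000 (1) | 0.792 (3) | 0.399 (3) | 1.000 (1) | 0.000 (1) | 0.049 (2) | 0.265 (1) | 0.063 (2) | 0.000 (2) | 1.000 (1) | 0.335 (2) | 0.381 (1) | 0.500 (1) | 0.250 (1) | 0.004 (1) | 0.000 (1) | 0.727 (2) | 0.000 (1) | 0.538 (3) | 0.000 (1) | 0.188 (1) | 0.677 (2) | 0.000 (1) | 0.930 (1) | 0.000 (1) | 0.000 (1) | 0.966 (1) | 0.391 (1) | 0.908 (1) | 0.000 (1) | 0.028 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.152 (1) | 0.451 (2) | 0.458 (1) | 0.971 (1) | 0.573 (1) | 0.606 (1) | 0.167 (4) | 0.625 (1) | 0.004 (1) | 0.000 (1) | 0.058 (4) | 0.000 (1) | 0.000 (1) | 1.000 (1) | 1.000 (1) | 0.000 (1) | 0.056 (1) | 0.000 (1) | 0.200 (2) | 0.309 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1)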
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation.
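Minkowski 34D Inst. | permissive | 0.280 (3) | 0.488 (3) | 0.192 (4) | 0.124 (3) | 0.804 (3) | 0.518 (3) | 0.772 (4) | 0.904 (2) | 0.337 (4) | 0.191 (3) | 0.443 (3) | 0.000 (2) | 0.861 (3) | 0.502 (3) | 0.868 (3) | 0.669 (4) | 0.587 (3) | 0.997 (2) | 0.467 (4) | 0.828 (4) | 0.732 (1) | 0.342 (3) | 0.745 (2) | 0.119 (4) | 0.918 (1) | 0.404 (4) | 0.419 (3) | 0.398 (2) | 0.172 (2) | 0.618 (4) | 0.743 (3) | 0.167 (1) | 0.077 (4) | 0.500 (1) | 0.000 (2) | 0.568 (3) | 0.506 (4) | 1.000 (1) | 0.044 (3) | 0.000 (2) | 0.502 (3) | 0.010 (3) | 0.593 (3) | 0.284 (4) | 0.305 (4) | 0.903 (4) | 0.213 (4) | 0.142 (3) | 0.981 (2) | 0.790 (4) | 0.000 (3) | 1.000 (1) | 0.715 (4) | 0.538 (4) | 0.346 (3) | 0.830 (4) | 0.067 (2) | 0.000 (2) | 0.400 (2) | 0.074 (3) | 0.333 (3) | 0.551 (2) | 1.000 (1) | 0.000 (1) | 0.292 (2) | 0.777 (3) | 0.118 (4) | 0.317 (3) | 0.100 (3) | 0.000 (2) | 0.191 (2) | 0.648 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.500 (1) | 0.213 (4) | 0.825 (1) | 0.021 (4) | 0.333 (1) | 0.648 (4) | 0.098 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.077 (2) | 0.000 (1) | 0.000 (4) | 0.150 (4) | 0.000 (2) | 0.000 (2) | 0.000 (4) | 0.225 (2) | 0.281 (3) | 0.447 (3) | 0.000 (4) | 0.090 (3) | 0.148 (3) | 0.000 (3) | 0.479 (4) | 0.542 (1) | 0.000 (2) | 0.000 (1) | 0.200 (2) | 0.131 (4) | 0.000 (1) | 0.250 (2) | 0.000 (3) | 0.000 (1) | 0.159 (4) | 0.396 (4) | 0.677 (2) | 0.021 (3) | 0.000 (3) | 0.500 (1) | 0.000 (1) | 1.000 (1) | 0.442 (2) | 0.125 (4) | 0.000 (1) | 0.000 (3) | 0.000 (2) | 0.000 (3) | 0.333 (1) | 0.000 (2) | 0.528 (1) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1) | 0.200 (4) | 0.000 (1) | 0.516 (4) | 0.000 (1) | 0.000 (2) | 0.500 (3) | 0.000 (1) | 0.833 (2) | 0.000 (1) | 0.000 (1) | 0.286 (3) | 0.083 (3) | 0.750 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.059 (4) | 0.445 (3) | 0.200 (2) | 0.535 (3) | 0.070 (2) | 0.167 (3) | 0.385 (3) | 0.375 (2) | 0.000 (2) | 0.000 (1) | 0.333 (3) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.200 (2) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1)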
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
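CSC-Pretrain Inst. | permissive | 0.275 (4) | 0.466 (4) | 0.218 (3) | 0.110 (4) | 0.783 (4) | 0.383 (4) | 0.783 (3) | 0.829 (4) | 0.367 (3) | 0.168 (4) | 0.305 (4) | 0.000 (2) | 0.661 (4) | 0.413 (4) | 0.869 (2) | 0.719 (1) | 0.546 (4) | 0.997 (2) | 0.685 (3) | 0.841 (3) | 0.555 (4) | 0.277 (4) | 0.768 (1) | 0.132 (2) | 0.779 (4) | 0.448 (2) | 0.364 (4) | 0.212 (4) | 0.161 (3) | 0.768 (2) | 0.692 (4) | 0.000 (3) | 0.395 (2) | 0.500 (1) | 0.000 (2) | 0.450 (4) | 0.591 (2) | 1.000 (1) | 0.020 (4) | 0.000 (2) | 0.423 (4) | 0.007 (4) | 0.625 (2) | 0.420 (2) | 0.505 (2) | 1.000 (1) | 0.353 (2) | 0.119 (4) | 0.571 (3) | 0.819 (2) | 0.014 (2) | 1.000 (1) | 0.774 (2) | 0.689 (3) | 0.311 (4) | 0.866 (2) | 0.067 (2) | 0.000 (2) | 0.400 (2) | 0.000 (4) | 0.278 (4) | 0.501 (3) | 1.000 (1) | 0.000 (1) | 0.162 (4) | 0.584 (4) | 0.286 (2) | 0.206 (4) | 0.125 (2) | 0.000 (2) | 0.084 (3) | 0.649 (1) | 0.000 (2) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (3) | 0.125 (4) | 0.312 (3) | 0.727 (3) | 0.221 (1) | 0.000 (3) | 0.667 (3) | 0.114 (2) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.065 (4) | 0.000 (1) | 0.004 (3) | 0.278 (2) | 0.000 (2) | 0.000 (2) | 0.500 (2) | 0.000 (4) | 0.571 (1) | 0.000 (4) | 0.250 (3) | 0.019 (4) | 0.145 (4) | 0.000 (3) | 0.667 (2) | 0.200 (4) | 0.000 (2) | 0.000 (1) | 0.200 (2) | 0.258 (3) | 0.000 (1) | 0.000 (3) | 0.000 (3) | 0.000 (1) | 0.369 (3) | 0.429 (3) | 0.613 (3) | 0.000 (4) | 0.000 (3) | 0.500 (1) | 0.000 (1) | 0.500 (4) | 0.333 (4) | 0.500 (3) | 0.000 (1) | 0.106 (1) | 0.000 (2) | 0.000 (3) | 0.000 (2) | 0.000 (2) | 0.333 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1) | 0.918 (1) | 0.000 (1) | 0.638 (1) | 0.000 (1) | 0.000 (2) | 0.750 (1) | 0.000 (1) | 0.833 (2) | 0.000 (1) | 0.000 (1) | 0.143 (4) | 0.000 (4) | 0.750 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.063 (3) | 0.377 (4) | 0.200 (2) | 0.222 (4) | 0.055 (3) | 0.500 (2) | 0.677 (2) | 0.250 (3) | 0.000 (2) | 0.000 (1) | 0.500 (2) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.115 (4) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1)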
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
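LGround Inst. | permissive | 0.314 (2) | 0.529 (2) | 0.225 (2) | 0.155 (2) | 0.810 (2) | 0.625 (2) | 0.798 (2) | 0.940 (1) | 0.372 (2) | 0.217 (2) | 0.484 (2) | 0.000 (2) | 0.927 (2) | 0.528 (2) | 0.826 (4) | 0.694 (2) | 0.605 (2) | 1.000 (1) | 0.731 (1) | 0.846 (2) | 0.716 (2) | 0.350 (2) | 0.589 (4) | 0.123 (3) | 0.857 (3) | 0.457 (1) | 0.578 (2) | 0.376 (3) | 0.183 (1) | 0.765 (3) | 0.800 (2) | 0.000 (3) | 0.278 (3) | 0.500 (1) | 0.000 (2) | 0.659 (1) | 0.569 (3) | 1.000 (1) | 0.093 (2) | 0.000 (2) | 0.539 (2) | 0.010 (2) | 0.578 (4) | 0.378 (3) | 0.571 (1) | 1.000 (1) | 0.337 (3) | 0.252 (1) | 0.530 (4) | 0.814 (3) | 0.000 (3) | 0.744 (4) | 0.743 (3) | 0.746 (2) | 0.346 (2) | 0.863 (3) | 0.067 (2) | 0.000 (2) | 0.400 (2) | 0.167 (2) | 0.667 (2) | 0.488 (4) | 1.000 (1) | 0.000 (1) | 0.208 (3) | 0.783 (2) | 0.166 (3) | 0.375 (2) | 0.071 (4) | 0.000 (2) | 0.200 (1) | 0.607 (3) | 0.000 (2) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 1.000 (1) | 0.500 (1) | 0.517 (1) | 0.716 (4) | 0.221 (1) | 0.000 (3) | 0.706 (2) | 0.085 (4) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.077 (3) | 0.000 (1) | 0.063 (2) | 0.278 (2) | 0.000 (2) | 0.000 (2) | 0.500 (2) | 0.083 (3) | 0.181 (4) | 0.515 (2) | 0.286 (2) | 0.144 (1) | 0.219 (1) | 0.042 (1) | 0.582 (3) | 0.400 (3) | 0.000 (2) | 0.000 (1) | 0.000 (4) | 0.305 (2) | 0.000 (1) | 0.000 (3) | 0.036 (2) | 0.000 (1) | 0.413 (2) | 0.500 (2) | 0.533 (4) | 0.250 (2) | 0.200 (1) | 0.500 (1) | 0.000 (1) | 1.000 (1) | 0.472 (1) | 1.000 (1) | 0.000 (1) | 0.000 (3) | 0.000 (2) | 0.250 (1) | 0.000 (2) | 0.000 (2) | 0.333 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.000 (1) | 0.600 (3) | 0.000 (1) | 0.594 (2) | 0.000 (1) | 0.000 (2) | 0.500 (3) | 0.000 (1) | 0.647 (4) | 0.000 (1) | 0.000 (1) | 0.429 (2) | 0.333 (2) | 0.500 (4) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.069 (2) | 0.696 (1) | 0.050 (4) | 0.556 (2) | 0.031 (4) | 0.042 (4) | 0.750 (1) | 0.250 (3) | 0.000 (2) | 0.000 (1) | 0.630 (1) | 0.000 (1) | 0.000 (1) | 0.000 (2) | 0.500 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.400 (1) | 0.000 (2) | 0.000 (1) | 0.000 (2) | 0.000 (1)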
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.