The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

Similar to the ScanNet benchmark, our ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision at an overlap (IoU) of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over the overlaps [0.5:0.95:0.05], i.e. 0.5 to 0.95 in steps of 0.05 (AP), for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized: only one of them can match the instance, and the rest count as false positives.
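The matching and averaging described above can be sketched in Python. This is a minimal illustration, not the official ScanNet evaluation script: it assumes each instance is a set of mesh-vertex indices, matches predictions greedily in order of descending confidence to the unmatched ground-truth instance with the highest IoU, and takes AP as the area under a monotone precision envelope. The names `mask_iou`, `average_precision`, `benchmark_scores`, and `AP_RANGE` are hypothetical, and the benchmark's exact interpolation and tie-breaking may differ.

```python
# Hypothetical sketch of instance-segmentation AP; not the official ScanNet evaluation code.
from typing import Dict, List, Sequence, Set, Tuple

Mask = Set[int]  # assumption: an instance is a set of mesh-vertex indices
Prediction = Tuple[float, Mask]  # (confidence score, vertex mask)

# Overlaps averaged for the "AP" column: 0.50, 0.55, ..., 0.95
AP_RANGE = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two vertex masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: Sequence[Prediction], gts: Sequence[Mask],
                      iou_thresh: float) -> float:
    """AP of one class at one overlap threshold (illustrative sketch)."""
    if not gts:
        return float("nan")  # AP is undefined for a class with no ground truth
    matched = [False] * len(gts)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        # Only still-unmatched ground-truth instances are candidates, so a second
        # prediction of an already-matched instance becomes a false positive.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thresh:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))
    # Make precision non-increasing along the ranking, then sum precision * recall step.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def benchmark_scores(
    per_class: Dict[str, Tuple[Sequence[Prediction], Sequence[Mask]]]
) -> Dict[str, float]:
    """Mean AP over classes at overlap 0.25, at 0.5, and averaged over AP_RANGE."""

    def mean_over_classes(thresh: float) -> float:
        values = [average_precision(p, g, thresh) for p, g in per_class.values()]
        values = [v for v in values if v == v]  # drop NaN (classes without ground truth)
        return sum(values) / len(values) if values else float("nan")

    ap_range = [mean_over_classes(t) for t in AP_RANGE]
    return {
        "AP 25%": mean_over_classes(0.25),
        "AP 50%": mean_over_classes(0.5),
        "AP": sum(ap_range) / len(ap_range),
    }
```

For example, with two ground-truth instances and a duplicate prediction of the first one ranked between the two correct predictions, `average_precision` returns 5/6 (about 0.833) at overlap 0.5; removing the duplicate raises it to 1.0, which is the false-positive penalty at work.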



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.




Scores are AP 25%; the number in parentheses after each score is the method's rank in that column.

Method | Info | avg AP 25% | head AP 25% | common AP 25% | tail AP 25% | per-class AP 25%: alarm clock, armchair, backpack, bag, ball, bar, basket, bathroom cabinet, bathroom counter, bathroom stall, bathroom stall door, bathroom vanity, bathtub, bed, bench, bicycle, bin, blackboard, blanket, blinds, board, book, bookshelf, bottle, bowl, box, broom, bucket, bulletin board, cabinet, calendar, candle, cart, case of water bottles, cd case, ceiling, ceiling light, chair, clock, closet, closet door, closet rod, closet wall, clothes, clothes dryer, coat rack, coffee kettle, coffee maker, coffee table, column, computer tower, container, copier, couch, counter, crate, cup, curtain, cushion, decoration, desk, dining table, dish rack, dishwasher, divider, door, doorframe, dresser, dumbbell, dustpan, end table, fan, file cabinet, fire alarm, fire extinguisher, fireplace, folded chair, furniture, guitar, guitar case, hair dryer, handicap bar, hat, headphones, ironing board, jacket, keyboard, keyboard piano, kitchen cabinet, kitchen counter, ladder, lamp, laptop, laundry basket, laundry detergent, laundry hamper, ledge, light, light switch, luggage, machine, mailbox, mat, mattress, microwave, mini fridge, mirror, monitor, mouse, music stand, nightstand, object, office chair, ottoman, oven, paper, paper bag, paper cutter, paper towel dispenser, paper towel roll, person, piano, picture, pillar, pillow, pipe, plant, plate, plunger, poster, potted plant, power outlet, power strip, printer, projector, projector screen, purse, rack, radiator, rail, range hood, recycling bin, refrigerator, scale, seat, shelf, shoe, shower, shower curtain, shower curtain rod, shower door, shower floor, shower head, shower wall, sign, sink, soap dish, soap dispenser, sofa chair, speaker, stair rail, stairs, stand, stool, storage bin, storage container, storage organizer, stove, structure, stuffed animal, suitcase, table, telephone, tissue box, toaster, toaster oven, toilet, toilet paper, toilet paper dispenser, toilet paper holder, toilet seat cover dispenser, towel, trash bin, trash can, tray, tube, tv, tv stand, vacuum cleaner, vent, wardrobe, washing machine, water bottle, water cooler, water pitcher, whiteboard, window, windowsill
CSC-Pretrain Inst. | permissive | 0.275 (4) | 0.466 (4) | 0.218 (3) | 0.110 (4) | 0.625 (2), 0.007 (4), 0.500 (1), 0.000 (1), 0.000 (1), 0.000 (1), 0.000 (4), 0.222 (4), 0.377 (4), 1.000 (1), 0.661 (4), 0.400 (2), 0.000 (4), 0.000 (2), 0.000 (3), 0.119 (4), 0.000 (2), 0.000 (4), 0.277 (4), 0.685 (3), 0.067 (2), 0.000 (1), 0.132 (2), 0.000 (1), 0.000 (2), 0.000 (3), 0.367 (3), 0.000 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.591 (2), 0.055 (3), 0.783 (4), 0.000 (2), 0.014 (2), 0.500 (2), 0.161 (3), 0.278 (2), 0.000 (2), 0.000 (1), 0.667 (2), 0.768 (1), 0.500 (2), 0.866 (2), 1.000 (1), 0.829 (4), 0.000 (3), 0.019 (4), 0.555 (4), 0.000 (2), 0.000 (2), 0.305 (4), 0.000 (2), 0.750 (1), 0.200 (4), 0.783 (3), 0.429 (3), 0.395 (2), 0.677 (2), 0.020 (4), 0.286 (2), 0.584 (4), 0.000 (2), 0.000 (3), 0.115 (4), 0.000 (1), 0.000 (2), 0.000 (1), 0.145 (4), 0.423 (4), 0.500 (2), 0.364 (4), 0.369 (3), 0.571 (1), 0.448 (2), 0.206 (4), 0.000 (1), 0.000 (1), 0.200 (2), 0.106 (1), 0.065 (4), 0.000 (2), 0.000 (1), 0.750 (2), 0.200 (2), 0.774 (2), 0.000 (4), 0.501 (3), 0.841 (3), 0.000 (2), 0.000 (1), 0.692 (4), 0.063 (3), 0.000 (2), 0.000 (2), 0.000 (2), 0.000 (2), 0.500 (3), 0.649 (1), 0.000 (2), 0.084 (3), 0.125 (4), 0.719 (1), 0.413 (4), 0.004 (3), 0.450 (4), 0.000 (2), 0.638 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.505 (2), 0.000 (1), 0.000 (2), 0.000 (1), 0.727 (3), 0.833 (2), 0.221 (1), 0.779 (4), 0.000 (2), 0.000 (2), 0.168 (4), 0.311 (4), 0.125 (2), 0.571 (3), 0.500 (4), 0.143 (4), 0.000 (2), 0.250 (3), 0.000 (2), 0.869 (2), 0.667 (3), 0.162 (4), 0.000 (2), 0.250 (3), 1.000 (1), 0.000 (2), 0.500 (1), 0.000 (3), 0.000 (2), 0.000 (1), 0.689 (3), 0.000 (1), 0.000 (1), 0.312 (3), 0.383 (4), 0.114 (2), 0.333 (3), 0.000 (3), 0.997 (2), 0.420 (2), 0.613 (3), 0.212 (4), 0.500 (2), 0.819 (2), 0.000 (1), 0.000 (2), 0.768 (2), 1.000 (1), 0.918 (1), 0.000 (1), 0.000 (3), 0.278 (4), 0.000 (1), 0.333 (4), 0.000 (4), 0.353 (2), 0.546 (4), 0.258 (3)
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst. | permissive | 0.314 (2) | 0.529 (2) | 0.225 (2) | 0.155 (2) | 0.578 (4), 0.010 (2), 0.500 (1), 0.000 (1), 0.000 (1), 0.000 (1), 0.515 (2), 0.556 (2), 0.696 (1), 1.000 (1), 0.927 (2), 0.400 (2), 0.083 (3), 0.000 (2), 1.000 (1), 0.252 (1), 0.000 (2), 0.167 (2), 0.350 (2), 0.731 (1), 0.067 (2), 0.000 (1), 0.123 (3), 0.000 (1), 0.000 (2), 0.036 (2), 0.372 (2), 0.000 (2), 0.000 (1), 0.250 (1), 0.000 (1), 0.569 (3), 0.031 (4), 0.810 (2), 0.000 (2), 0.000 (3), 0.630 (1), 0.183 (1), 0.278 (2), 0.000 (2), 0.000 (1), 0.582 (3), 0.589 (4), 0.500 (2), 0.863 (3), 1.000 (1), 0.940 (1), 0.000 (3), 0.144 (1), 0.716 (2), 0.000 (2), 0.000 (2), 0.484 (2), 0.000 (2), 0.500 (3), 0.400 (3), 0.798 (2), 0.500 (2), 0.278 (3), 0.750 (1), 0.093 (2), 0.166 (3), 0.783 (2), 0.000 (2), 0.200 (1), 0.400 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.219 (1), 0.539 (2), 0.500 (2), 0.578 (2), 0.413 (2), 0.181 (4), 0.457 (1), 0.375 (2), 0.000 (1), 0.000 (1), 0.050 (4), 0.000 (3), 0.077 (3), 0.000 (2), 0.000 (1), 0.500 (4), 0.000 (4), 0.743 (3), 0.250 (2), 0.488 (4), 0.846 (2), 0.000 (2), 0.000 (1), 0.800 (2), 0.069 (2), 0.000 (2), 0.000 (2), 0.000 (2), 0.000 (2), 1.000 (1), 0.607 (3), 0.000 (2), 0.200 (1), 0.500 (1), 0.694 (2), 0.528 (2), 0.063 (2), 0.659 (1), 0.000 (2), 0.594 (2), 0.000 (1), 0.000 (2), 0.000 (1), 0.571 (1), 0.000 (1), 0.000 (2), 0.000 (1), 0.716 (4), 0.647 (4), 0.221 (1), 0.857 (3), 0.000 (2), 0.000 (2), 0.217 (2), 0.346 (2), 0.071 (4), 0.530 (4), 1.000 (1), 0.429 (2), 0.000 (2), 0.286 (2), 0.000 (2), 0.826 (4), 0.706 (2), 0.208 (3), 0.000 (2), 0.250 (3), 0.744 (4), 0.000 (2), 0.500 (1), 0.042 (1), 0.000 (2), 0.000 (1), 0.746 (2), 0.000 (1), 0.000 (1), 0.517 (1), 0.625 (2), 0.085 (4), 0.333 (3), 0.000 (3), 1.000 (1), 0.378 (3), 0.533 (4), 0.376 (3), 0.042 (4), 0.814 (3), 0.000 (1), 0.000 (2), 0.765 (3), 1.000 (1), 0.600 (3), 0.000 (1), 0.000 (3), 0.667 (2), 0.000 (1), 0.472 (1), 0.333 (2), 0.337 (3), 0.605 (2), 0.305 (2)
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
Mask3D Scannet200 | | 0.445 (1) | 0.653 (1) | 0.392 (1) | 0.254 (1) | 0.648 (1), 0.097 (1), 0.125 (4), 0.000 (1), 0.000 (1), 0.000 (1), 0.657 (1), 0.971 (1), 0.451 (2), 1.000 (1), 1.000 (1), 0.640 (1), 0.500 (1), 0.045 (1), 1.000 (1), 0.241 (2), 0.409 (1), 0.363 (1), 0.440 (1), 0.686 (2), 0.300 (1), 0.000 (1), 0.201 (1), 0.000 (1), 0.009 (1), 0.290 (1), 0.556 (1), 1.000 (1), 0.000 (1), 0.063 (2), 0.000 (1), 0.830 (1), 0.573 (1), 0.844 (1), 0.333 (1), 0.204 (1), 0.058 (4), 0.158 (4), 0.552 (1), 0.056 (1), 0.000 (1), 1.000 (1), 0.725 (3), 0.750 (1), 0.927 (1), 1.000 (1), 0.888 (3), 0.042 (2), 0.120 (2), 0.615 (3), 0.226 (1), 0.250 (1), 0.890 (1), 0.792 (1), 0.677 (2), 0.510 (2), 0.818 (1), 0.699 (1), 0.512 (1), 0.167 (4), 0.125 (1), 0.315 (1), 0.943 (1), 0.309 (1), 0.017 (2), 0.200 (2), 0.000 (1), 0.188 (1), 0.000 (1), 0.183 (2), 0.815 (1), 1.000 (1), 0.827 (1), 0.741 (1), 0.442 (2), 0.414 (3), 0.600 (1), 0.000 (1), 0.000 (1), 0.458 (1), 0.049 (2), 0.321 (1), 0.381 (1), 0.000 (1), 0.908 (1), 0.400 (1), 0.841 (1), 0.260 (1), 0.710 (1), 0.966 (1), 0.265 (1), 0.000 (1), 0.924 (1), 0.152 (1), 0.025 (1), 0.500 (1), 0.027 (1), 0.028 (1), 1.000 (1), 0.556 (4), 0.016 (1), 0.080 (4), 0.500 (1), 0.694 (3), 0.608 (1), 0.084 (1), 0.604 (2), 0.194 (1), 0.538 (3), 0.000 (1), 0.500 (1), 0.000 (1), 0.354 (3), 0.000 (1), 1.000 (1), 0.000 (1), 0.761 (2), 0.930 (1), 0.053 (3), 0.890 (2), 1.000 (1), 0.008 (1), 0.262 (1), 0.358 (1), 1.000 (1), 1.000 (1), 0.792 (3), 0.966 (1), 1.000 (1), 0.765 (1), 0.004 (1), 0.930 (1), 0.780 (1), 0.330 (1), 0.027 (1), 0.625 (1), 0.974 (3), 0.050 (1), 0.412 (4), 0.021 (2), 0.000 (2), 0.000 (1), 0.778 (1), 0.000 (1), 0.000 (1), 0.493 (2), 0.746 (1), 0.454 (1), 0.335 (2), 0.396 (1), 0.930 (4), 0.551 (1), 1.000 (1), 0.552 (1), 0.606 (1), 0.853 (1), 0.000 (1), 0.004 (1), 0.806 (1), 1.000 (1), 0.727 (2), 0.000 (1), 0.042 (2), 0.745 (1), 0.000 (1), 0.399 (3), 0.391 (1), 0.630 (1), 0.721 (1), 0.619 (1)
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation.
Minkowski 34D Inst. | permissive | 0.280 (3) | 0.488 (3) | 0.192 (4) | 0.124 (3) | 0.593 (3), 0.010 (3), 0.500 (1), 0.000 (1), 0.000 (1), 0.000 (1), 0.447 (3), 0.535 (3), 0.445 (3), 1.000 (1), 0.861 (3), 0.400 (2), 0.225 (2), 0.000 (2), 0.000 (3), 0.142 (3), 0.000 (2), 0.074 (3), 0.342 (3), 0.467 (4), 0.067 (2), 0.000 (1), 0.119 (4), 0.000 (1), 0.000 (2), 0.000 (3), 0.337 (4), 0.000 (2), 0.000 (1), 0.000 (3), 0.000 (1), 0.506 (4), 0.070 (2), 0.804 (3), 0.000 (2), 0.000 (3), 0.333 (3), 0.172 (2), 0.150 (4), 0.000 (2), 0.000 (1), 0.479 (4), 0.745 (2), 0.000 (4), 0.830 (4), 1.000 (1), 0.904 (2), 0.167 (1), 0.090 (3), 0.732 (1), 0.000 (2), 0.000 (2), 0.443 (3), 0.000 (2), 0.500 (3), 0.542 (1), 0.772 (4), 0.396 (4), 0.077 (4), 0.385 (3), 0.044 (3), 0.118 (4), 0.777 (3), 0.000 (2), 0.000 (3), 0.200 (2), 0.000 (1), 0.000 (2), 0.000 (1), 0.148 (3), 0.502 (3), 0.500 (2), 0.419 (3), 0.159 (4), 0.281 (3), 0.404 (4), 0.317 (3), 0.000 (1), 0.000 (1), 0.200 (2), 0.000 (3), 0.077 (2), 0.000 (2), 0.000 (1), 0.750 (2), 0.200 (2), 0.715 (4), 0.021 (3), 0.551 (2), 0.828 (4), 0.000 (2), 0.000 (1), 0.743 (3), 0.059 (4), 0.000 (2), 0.000 (2), 0.000 (2), 0.000 (2), 0.125 (4), 0.648 (2), 0.000 (2), 0.191 (2), 0.500 (1), 0.669 (4), 0.502 (3), 0.000 (4), 0.568 (3), 0.000 (2), 0.516 (4), 0.000 (1), 0.000 (2), 0.000 (1), 0.305 (4), 0.000 (1), 0.000 (2), 0.000 (1), 0.825 (1), 0.833 (2), 0.021 (4), 0.918 (1), 0.000 (2), 0.000 (2), 0.191 (3), 0.346 (3), 0.100 (3), 0.981 (2), 1.000 (1), 0.286 (3), 0.000 (2), 0.000 (4), 0.000 (2), 0.868 (3), 0.648 (4), 0.292 (2), 0.000 (2), 0.375 (2), 1.000 (1), 0.000 (2), 0.500 (1), 0.000 (3), 0.333 (1), 0.000 (1), 0.538 (4), 0.000 (1), 0.000 (1), 0.213 (4), 0.518 (3), 0.098 (3), 0.528 (1), 0.250 (2), 0.997 (2), 0.284 (4), 0.677 (2), 0.398 (2), 0.167 (3), 0.790 (4), 0.000 (1), 0.000 (2), 0.618 (4), 0.903 (4), 0.200 (4), 0.000 (1), 0.333 (1), 0.333 (3), 0.000 (1), 0.442 (2), 0.083 (3), 0.213 (4), 0.587 (3), 0.131 (4)
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
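The rank shown after each score in the table appears to follow standard competition ranking: tied scores share a rank and the next rank is skipped. In the third per-class column, for instance, CSC-Pretrain Inst., LGround Inst., and Minkowski 34D Inst. all score 0.500 with rank 1, and Mask3D's 0.125 is ranked 4 rather than 2. The sketch below reproduces that rule under this assumption; `competition_ranks` is a hypothetical helper, not part of the benchmark's tooling. The page presumably ranks unrounded scores, so two entries that round to the same value (for example, 0.077 for LGround Inst. and Minkowski 34D Inst. in one per-class column) can still receive different ranks.

```python
# Hypothetical helper: reproduces the apparent ranking rule, not the benchmark's own code.
from typing import List, Sequence


def competition_ranks(scores: Sequence[float]) -> List[int]:
    """Rank scores from best (highest) to worst using standard competition ranking.

    A score's rank is 1 plus the number of strictly higher scores, so tied
    scores share a rank and the following rank is skipped ("1, 1, 1, 4").
    """
    return [1 + sum(other > s for other in scores) for s in scores]


if __name__ == "__main__":
    # avg AP 25% column: CSC-Pretrain Inst., LGround Inst., Mask3D, Minkowski 34D Inst.
    print(competition_ranks([0.275, 0.314, 0.445, 0.280]))  # [4, 2, 1, 3]
```

Applied to the avg AP 25% column, it returns [4, 2, 1, 3], matching the ranks listed for the four methods.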