The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground-truth instance are penalized as false positives.
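The following is a minimal sketch of how a per-class AP with this duplicate penalty could be computed; it is not the official evaluation script. It assumes that "overlap" means intersection-over-union (IoU), that a match requires the overlap to strictly exceed the threshold, that the area under the precision-recall curve is integrated step-wise, and that [0.5:0.95:0.05] is a half-open range (0.50, 0.55, ..., 0.90). The function names `average_precision` and `ap_over_overlaps` are hypothetical.

```python
import numpy as np


def average_precision(scores, ious, overlap):
    """AP for one class at one overlap threshold (illustrative sketch).

    scores: length-P sequence of prediction confidences.
    ious:   P x G array, IoU of each prediction with each ground-truth instance.
    """
    scores = np.asarray(scores, dtype=float)
    ious = np.asarray(ious, dtype=float)
    if scores.size == 0:
        return 0.0  # no predictions: nothing is recalled
    num_gt = ious.shape[1]
    if num_gt == 0:
        return float("nan")  # class has no ground-truth instances

    order = np.argsort(-scores, kind="stable")  # most confident first
    matched = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(len(order))
    for rank, p in enumerate(order):
        g = int(np.argmax(ious[p]))
        if ious[p, g] > overlap and not matched[g]:
            matched[g] = True
            tp[rank] = 1.0
        # Otherwise the prediction is a false positive. This includes a
        # second prediction of an already matched ground-truth instance.

    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(order) + 1)
    recall = cum_tp / num_gt
    # Make precision non-increasing in recall, then integrate step-wise.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))


def ap_over_overlaps(scores, ious, overlaps=np.linspace(0.5, 0.9, 9)):
    """Mean of the per-threshold APs (assumed reading of [0.5:0.95:0.05])."""
    return float(np.mean([average_precision(scores, ious, t) for t in overlaps]))


# Toy case: two ground-truth instances, three predictions. The second
# prediction duplicates the first one's instance and counts as a false positive.
scores = [0.9, 0.8, 0.7]
ious = [[0.82, 0.0], [0.72, 0.0], [0.0, 0.62]]
print(average_precision(scores, ious, 0.25))
print(average_precision(scores, ious, 0.5))
print(ap_over_overlaps(scores, ious))
```

In the toy case the duplicate lowers AP 50% below 1.0 even though both ground-truth instances are eventually found, which is the effect the note on false positives describes. The benchmark score for a column would then be the mean of such per-class values over the classes in that column.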



This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives the AP 50% score followed by the method's rank for that column.




Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill
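TD3D Scannet200 | permissive | 0.320 (3) | 0.501 (3) | 0.264 (3) | 0.164 (3) | 0.506 (3) | 0.062 (3) | 0.500 (1) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.208 (1) | 0.431 (3) | 0.252 (4) | 1.000 (1) | 0.733 (4) | 0.587 (2) | 0.000 (3) | 0.008 (2) | 0.000 (4) | 0.106 (1) | 0.000 (2) | 0.356 (1) | 0.123 (5) | 0.686 (1) | 0.101 (3) | 0.000 (1) | 0.152 (3) | 0.000 (1) | 0.000 (2) | 0.226 (2) | 0.280 (4) | 0.000 (3) | 0.000 (1) | 0.250 (2) | 0.000 (2) | 0.619 (2) | 0.061 (4) | 0.841 (1) | 0.000 (2) | 0.000 (3) | 0.167 (2) | 0.194 (2) | 0.333 (3) | 0.000 (2) | 0.000 (1) | 0.667 (2) | 0.820 (1) | 0.250 (3) | 0.790 (4) | 1.000 (1) | 0.879 (2) | 0.077 (1) | 0.094 (4) | 0.708 (1) | 0.217 (2) | 0.049 (3) | 0.634 (1) | 0.792 (1) | 0.331 (5) | 0.033 (6) | 0.716 (3) | 0.159 (3) | 0.396 (3) | 0.331 (5) | 0.099 (3) | 0.415 (1) | 0.842 (1) | 0.000 (2) | 0.458 (2) | 0.542 (1) | 0.000 (1) | 0.101 (2) | 0.000 (2) | 0.218 (2) | 0.513 (3) | 0.500 (3) | 0.458 (3) | 0.104 (3) | 0.516 (1) | 0.456 (1) | 0.268 (5) | 0.000 (1) | 0.000 (1) | 0.400 (2) | 0.022 (1) | 0.233 (2) | 0.143 (3) | 0.000 (1) | 0.677 (1) | 0.400 (1) | 0.504 (6) | 0.095 (4) | 0.083 (6) | 0.890 (2) | 0.061 (3) | 0.000 (1) | 0.906 (1) | 0.076 (2) | 0.231 (1) | 0.125 (3) | 0.000 (2) | 0.003 (3) | 0.792 (4) | 0.881 (1) | 0.000 (3) | 0.098 (4) | 0.125 (5) | 0.498 (5) | 0.459 (3) | 0.063 (1) | 0.715 (2) | 0.000 (3) | 0.241 (4) | 0.000 (1) | 0.396 (2) | 0.063 (1) | 0.605 (2) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.448 (6) | 0.629 (4) | 0.202 (3) | 0.967 (1) | 0.250 (3) | 0.038 (1) | 0.192 (2) | 0.185 (3) | 0.083 (4) | 1.000 (1) | 1.000 (1) | 0.857 (2) | 0.000 (2) | 0.470 (2) | 0.012 (1) | 0.565 (4) | 0.798 (2) | 0.621 (1) | 0.111 (1) | 0.500 (1) | 1.000 (1) | 0.017 (2) | 0.509 (2) | 0.000 (2) | 0.008 (1) | 1.000 (1) | 0.525 (3) | 0.000 (1) | 0.000 (1) | 0.332 (4) | 0.679 (1) | 0.264 (3) | 0.333 (2) | 0.267 (1) | 1.000 (1) | 0.549 (2) | 0.299 (6) | 0.387 (3) | 0.328 (3) | 0.744 (5) | 0.000 (1) | 0.000 (2) | 0.435 (6) | 1.000 (1) | 0.283 (5) | 0.000 (1) | 0.196 (1) | 0.817 (1) | 0.000 (1) | 0.472 (1) | 0.222 (4) | 0.123 (5) | 0.560 (2) | 0.156 (3)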
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
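CSC-Pretrain Inst. | permissive | 0.209 (5) | 0.361 (6) | 0.157 (5) | 0.085 (5) | 0.506 (2) | 0.007 (4) | 0.500 (1) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.000 (6) | 0.093 (6) | 0.221 (5) | 0.667 (4) | 0.524 (6) | 0.400 (4) | 0.000 (3) | 0.000 (3) | 0.000 (4) | 0.004 (5) | 0.000 (2) | 0.000 (6) | 0.109 (6) | 0.589 (5) | 0.000 (5) | 0.000 (1) | 0.059 (6) | 0.000 (1) | 0.000 (2) | 0.000 (4) | 0.322 (3) | 0.000 (3) | 0.000 (1) | 0.000 (4) | 0.000 (2) | 0.405 (4) | 0.055 (5) | 0.700 (5) | 0.000 (2) | 0.000 (3) | 0.028 (5) | 0.091 (6) | 0.083 (4) | 0.000 (2) | 0.000 (1) | 0.667 (2) | 0.768 (2) | 0.000 (5) | 0.807 (3) | 1.000 (1) | 0.776 (5) | 0.000 (4) | 0.000 (6) | 0.340 (6) | 0.000 (3) | 0.000 (4) | 0.103 (6) | 0.000 (4) | 0.750 (1) | 0.200 (3) | 0.634 (6) | 0.053 (6) | 0.246 (4) | 0.677 (2) | 0.006 (6) | 0.198 (4) | 0.432 (5) | 0.000 (2) | 0.000 (5) | 0.050 (5) | 0.000 (1) | 0.000 (3) | 0.000 (2) | 0.111 (6) | 0.356 (5) | 0.500 (3) | 0.188 (6) | 0.000 (5) | 0.220 (5) | 0.448 (2) | 0.050 (6) | 0.000 (1) | 0.000 (1) | 0.000 (4) | 0.000 (3) | 0.032 (6) | 0.000 (4) | 0.000 (1) | 0.396 (3) | 0.000 (5) | 0.573 (5) | 0.000 (6) | 0.228 (4) | 0.747 (5) | 0.000 (4) | 0.000 (1) | 0.573 (6) | 0.021 (6) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (4) | 0.500 (5) | 0.573 (4) | 0.000 (3) | 0.000 (6) | 0.125 (5) | 0.592 (3) | 0.364 (6) | 0.000 (4) | 0.450 (6) | 0.000 (3) | 0.364 (2) | 0.000 (1) | 0.000 (4) | 0.000 (2) | 0.340 (4) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.610 (4) | 0.833 (1) | 0.221 (2) | 0.702 (5) | 0.000 (4) | 0.000 (3) | 0.135 (6) | 0.094 (5) | 0.125 (2) | 0.571 (5) | 0.500 (4) | 0.143 (6) | 0.000 (2) | 0.125 (4) | 0.000 (3) | 0.618 (2) | 0.667 (5) | 0.115 (6) | 0.000 (3) | 0.125 (3) | 1.000 (1) | 0.000 (3) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.502 (5) | 0.000 (1) | 0.000 (1) | 0.312 (5) | 0.248 (6) | 0.050 (5) | 0.000 (6) | 0.000 (4) | 0.997 (3) | 0.420 (4) | 0.500 (4) | 0.149 (6) | 0.451 (2) | 0.748 (3) | 0.000 (1) | 0.000 (2) | 0.636 (3) | 0.667 (6) | 0.600 (3) | 0.000 (1) | 0.000 (3) | 0.278 (6) | 0.000 (1) | 0.333 (4) | 0.000 (6) | 0.294 (3) | 0.381 (6) | 0.110 (4)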
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
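Mask3D Scannet200 |  | 0.388 (1) | 0.542 (1) | 0.357 (2) | 0.237 (2) | 0.610 (1) | 0.091 (2) | 0.125 (6) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.065 (3) | 0.668 (1) | 0.451 (1) | 1.000 (1) | 0.955 (1) | 0.640 (1) | 0.500 (2) | 0.039 (1) | 0.125 (3) | 0.063 (3) | 0.409 (1) | 0.311 (2) | 0.291 (1) | 0.609 (4) | 0.266 (2) | 0.000 (1) | 0.163 (2) | 0.000 (1) | 0.008 (1) | 0.044 (3) | 0.496 (2) | 1.000 (1) | 0.000 (1) | 0.018 (3) | 0.000 (2) | 0.756 (1) | 0.573 (1) | 0.808 (2) | 0.000 (2) | 0.010 (2) | 0.042 (4) | 0.130 (4) | 0.552 (2) | 0.042 (1) | 0.000 (1) | 1.000 (1) | 0.725 (4) | 0.750 (1) | 0.883 (1) | 1.000 (1) | 0.832 (4) | 0.024 (3) | 0.107 (2) | 0.614 (3) | 0.226 (1) | 0.250 (1) | 0.628 (2) | 0.792 (1) | 0.677 (3) | 0.400 (1) | 0.741 (2) | 0.278 (2) | 0.511 (2) | 0.077 (6) | 0.111 (2) | 0.313 (3) | 0.715 (2) | 0.302 (1) | 0.017 (4) | 0.200 (2) | 0.000 (1) | 0.188 (1) | 0.000 (2) | 0.178 (3) | 0.736 (2) | 1.000 (1) | 0.615 (1) | 0.514 (1) | 0.409 (2) | 0.380 (6) | 0.600 (1) | 0.000 (1) | 0.000 (1) | 0.400 (2) | 0.013 (2) | 0.254 (1) | 0.381 (1) | 0.000 (1) | 0.123 (5) | 0.400 (1) | 0.839 (2) | 0.258 (2) | 0.463 (1) | 0.926 (1) | 0.265 (2) | 0.000 (1) | 0.857 (2) | 0.099 (1) | 0.021 (2) | 0.500 (1) | 0.027 (1) | 0.028 (2) | 1.000 (1) | 0.502 (6) | 0.016 (2) | 0.076 (5) | 0.500 (1) | 0.612 (1) | 0.578 (1) | 0.005 (3) | 0.597 (3) | 0.194 (2) | 0.497 (1) | 0.000 (1) | 0.500 (1) | 0.000 (2) | 0.323 (5) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.748 (1) | 0.708 (2) | 0.050 (5) | 0.890 (2) | 1.000 (1) | 0.008 (2) | 0.151 (4) | 0.301 (2) | 1.000 (1) | 1.000 (1) | 0.792 (3) | 0.945 (1) | 1.000 (1) | 0.511 (1) | 0.004 (2) | 0.753 (1) | 0.776 (3) | 0.287 (2) | 0.020 (2) | 0.003 (5) | 0.974 (3) | 0.033 (1) | 0.412 (6) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.667 (2) | 0.000 (1) | 0.000 (1) | 0.491 (2) | 0.676 (2) | 0.352 (2) | 0.335 (1) | 0.060 (3) | 0.822 (6) | 0.527 (3) | 1.000 (1) | 0.517 (2) | 0.606 (1) | 0.853 (2) | 0.000 (1) | 0.004 (1) | 0.806 (1) | 1.000 (1) | 0.727 (1) | 0.000 (1) | 0.042 (2) | 0.739 (2) | 0.000 (1) | 0.399 (3) | 0.391 (1) | 0.504 (2) | 0.591 (1) | 0.571 (1)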
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
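LGround Inst. | permissive | 0.246 (4) | 0.413 (4) | 0.170 (4) | 0.130 (4) | 0.455 (6) | 0.003 (6) | 0.500 (1) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.017 (5) | 0.333 (5) | 0.111 (6) | 1.000 (1) | 0.681 (5) | 0.400 (4) | 0.000 (3) | 0.000 (3) | 1.000 (1) | 0.003 (6) | 0.000 (2) | 0.167 (3) | 0.190 (3) | 0.637 (3) | 0.067 (4) | 0.000 (1) | 0.081 (4) | 0.000 (1) | 0.000 (2) | 0.000 (4) | 0.264 (5) | 0.000 (3) | 0.000 (1) | 0.000 (4) | 0.000 (2) | 0.387 (5) | 0.031 (6) | 0.754 (3) | 0.000 (2) | 0.000 (3) | 0.151 (3) | 0.135 (3) | 0.056 (5) | 0.000 (2) | 0.000 (1) | 0.582 (4) | 0.589 (6) | 0.500 (2) | 0.815 (2) | 1.000 (1) | 0.903 (1) | 0.000 (4) | 0.097 (3) | 0.588 (4) | 0.000 (3) | 0.000 (4) | 0.234 (4) | 0.000 (4) | 0.500 (4) | 0.400 (1) | 0.682 (5) | 0.156 (4) | 0.159 (5) | 0.750 (1) | 0.046 (4) | 0.125 (5) | 0.660 (3) | 0.000 (2) | 0.200 (3) | 0.000 (6) | 0.000 (1) | 0.000 (3) | 0.000 (2) | 0.164 (4) | 0.402 (4) | 0.500 (3) | 0.373 (4) | 0.025 (4) | 0.143 (6) | 0.426 (3) | 0.317 (3) | 0.000 (1) | 0.000 (1) | 0.000 (4) | 0.000 (3) | 0.063 (4) | 0.000 (4) | 0.000 (1) | 0.000 (6) | 0.000 (5) | 0.575 (4) | 0.250 (3) | 0.241 (3) | 0.772 (4) | 0.000 (4) | 0.000 (1) | 0.653 (5) | 0.034 (4) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (4) | 1.000 (1) | 0.561 (5) | 0.000 (3) | 0.100 (3) | 0.500 (1) | 0.541 (4) | 0.452 (4) | 0.000 (4) | 0.581 (4) | 0.000 (3) | 0.364 (2) | 0.000 (1) | 0.000 (4) | 0.000 (2) | 0.571 (3) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.568 (5) | 0.511 (5) | 0.167 (4) | 0.857 (3) | 0.000 (4) | 0.000 (3) | 0.164 (3) | 0.112 (4) | 0.000 (5) | 0.530 (6) | 1.000 (1) | 0.286 (4) | 0.000 (2) | 0.125 (4) | 0.000 (3) | 0.464 (6) | 0.706 (4) | 0.208 (4) | 0.000 (3) | 0.125 (3) | 0.744 (4) | 0.000 (3) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.511 (4) | 0.000 (1) | 0.000 (1) | 0.344 (3) | 0.541 (4) | 0.068 (4) | 0.333 (2) | 0.000 (4) | 1.000 (1) | 0.196 (5) | 0.533 (3) | 0.318 (4) | 0.000 (5) | 0.748 (4) | 0.000 (1) | 0.000 (2) | 0.690 (2) | 1.000 (1) | 0.400 (4) | 0.000 (1) | 0.000 (3) | 0.667 (3) | 0.000 (1) | 0.333 (4) | 0.333 (2) | 0.270 (4) | 0.399 (4) | 0.083 (5)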
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
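ODIN - Ins200 | permissive | 0.381 (2) | 0.507 (2) | 0.375 (1) | 0.237 (1) | 0.484 (4) | 0.108 (1) | 0.500 (1) | 0.000 (1) | 0.125 (1) | 0.000 (1) | 0.058 (4) | 0.647 (2) | 0.385 (2) | 0.667 (4) | 0.853 (2) | 0.542 (3) | 1.000 (1) | 0.000 (3) | 1.000 (1) | 0.093 (2) | 0.000 (2) | 0.028 (5) | 0.274 (2) | 0.682 (2) | 0.550 (1) | 0.000 (1) | 0.269 (1) | 0.000 (1) | 0.000 (2) | 0.714 (1) | 0.566 (1) | 1.000 (1) | 0.000 (1) | 0.500 (1) | 0.125 (1) | 0.585 (3) | 0.066 (3) | 0.653 (6) | 0.083 (1) | 0.049 (1) | 0.264 (1) | 0.227 (1) | 0.667 (1) | 0.000 (2) | 0.000 (1) | 0.278 (6) | 0.723 (5) | 0.250 (3) | 0.786 (5) | 1.000 (1) | 0.744 (6) | 0.039 (2) | 0.209 (1) | 0.494 (5) | 0.000 (3) | 0.250 (1) | 0.446 (3) | 0.500 (3) | 0.750 (1) | 0.200 (3) | 0.780 (1) | 0.333 (1) | 0.602 (1) | 0.469 (3) | 0.163 (1) | 0.406 (2) | 0.530 (4) | 0.000 (2) | 0.668 (1) | 0.200 (2) | 0.000 (1) | 0.000 (3) | 0.500 (1) | 0.313 (1) | 0.769 (1) | 1.000 (1) | 0.511 (2) | 0.196 (2) | 0.286 (3) | 0.393 (5) | 0.337 (2) | 0.000 (1) | 0.000 (1) | 0.600 (1) | 0.000 (3) | 0.174 (3) | 0.226 (2) | 0.000 (1) | 0.579 (2) | 0.200 (3) | 0.887 (1) | 0.750 (1) | 0.428 (2) | 0.782 (3) | 0.438 (1) | 0.000 (1) | 0.795 (3) | 0.063 (3) | 0.003 (3) | 0.500 (1) | 0.000 (2) | 0.333 (1) | 1.000 (1) | 0.742 (2) | 0.083 (1) | 0.585 (1) | 0.417 (4) | 0.448 (6) | 0.496 (2) | 0.055 (2) | 0.734 (1) | 0.472 (1) | 0.174 (5) | 0.000 (1) | 0.250 (3) | 0.000 (2) | 0.688 (1) | 0.000 (1) | 1.000 (1) | 0.000 (1) | 0.631 (3) | 0.667 (3) | 0.275 (1) | 0.694 (6) | 1.000 (1) | 0.000 (3) | 0.328 (1) | 0.422 (1) | 0.000 (5) | 1.000 (1) | 0.500 (4) | 0.638 (3) | 0.000 (2) | 0.391 (3) | 0.000 (3) | 0.582 (3) | 0.800 (1) | 0.208 (5) | 0.000 (3) | 0.246 (2) | 0.667 (5) | 0.000 (3) | 0.638 (1) | 0.167 (1) | 0.000 (2) | 0.000 (2) | 0.778 (1) | 0.000 (1) | 0.000 (1) | 0.563 (1) | 0.614 (3) | 0.841 (1) | 0.333 (2) | 0.250 (2) | 0.938 (5) | 0.569 (1) | 0.500 (4) | 0.695 (1) | 0.264 (4) | 0.863 (1) | 0.000 (1) | 0.000 (2) | 0.550 (5) | 1.000 (1) | 0.668 (2) | 0.000 (1) | 0.000 (3) | 0.667 (3) | 0.000 (1) | 0.333 (4) | 0.333 (2) | 0.665 (1) | 0.434 (3) | 0.264 (2)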
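Minkowski 34D Inst. | permissive | 0.203 (6) | 0.369 (5) | 0.134 (6) | 0.078 (6) | 0.479 (5) | 0.003 (5) | 0.500 (1) | 0.000 (1) | 0.000 (2) | 0.000 (1) | 0.100 (2) | 0.371 (4) | 0.300 (3) | 0.667 (4) | 0.746 (3) | 0.400 (4) | 0.000 (3) | 0.000 (3) | 0.000 (4) | 0.031 (4) | 0.000 (2) | 0.074 (4) | 0.165 (4) | 0.413 (6) | 0.000 (5) | 0.000 (1) | 0.070 (5) | 0.000 (1) | 0.000 (2) | 0.000 (4) | 0.221 (6) | 0.000 (3) | 0.000 (1) | 0.000 (4) | 0.000 (2) | 0.372 (6) | 0.070 (2) | 0.706 (4) | 0.000 (2) | 0.000 (3) | 0.000 (6) | 0.123 (5) | 0.033 (6) | 0.000 (2) | 0.000 (1) | 0.422 (5) | 0.732 (3) | 0.000 (5) | 0.778 (6) | 1.000 (1) | 0.845 (3) | 0.000 (4) | 0.090 (5) | 0.636 (2) | 0.000 (3) | 0.000 (4) | 0.158 (5) | 0.000 (4) | 0.250 (6) | 0.050 (5) | 0.693 (4) | 0.123 (5) | 0.051 (6) | 0.385 (4) | 0.009 (5) | 0.118 (6) | 0.406 (6) | 0.000 (2) | 0.000 (5) | 0.200 (2) | 0.000 (1) | 0.000 (3) | 0.000 (2) | 0.133 (5) | 0.307 (6) | 0.500 (3) | 0.251 (5) | 0.000 (5) | 0.281 (4) | 0.402 (4) | 0.317 (3) | 0.000 (1) | 0.000 (1) | 0.000 (4) | 0.000 (3) | 0.060 (5) | 0.000 (4) | 0.000 (1) | 0.396 (3) | 0.200 (3) | 0.669 (3) | 0.021 (5) | 0.218 (5) | 0.720 (6) | 0.000 (4) | 0.000 (1) | 0.696 (4) | 0.025 (5) | 0.000 (4) | 0.000 (4) | 0.000 (2) | 0.000 (4) | 0.125 (6) | 0.596 (3) | 0.000 (3) | 0.191 (2) | 0.500 (1) | 0.595 (2) | 0.369 (5) | 0.000 (4) | 0.500 (5) | 0.000 (3) | 0.143 (6) | 0.000 (1) | 0.000 (4) | 0.000 (2) | 0.226 (6) | 0.000 (1) | 0.000 (3) | 0.000 (1) | 0.701 (2) | 0.511 (5) | 0.000 (6) | 0.851 (4) | 0.000 (4) | 0.000 (3) | 0.150 (5) | 0.052 (6) | 0.100 (3) | 0.981 (4) | 0.500 (4) | 0.286 (4) | 0.000 (2) | 0.000 (6) | 0.000 (3) | 0.545 (5) | 0.522 (6) | 0.250 (3) | 0.000 (3) | 0.000 (6) | 0.522 (6) | 0.000 (3) | 0.500 (3) | 0.000 (2) | 0.000 (2) | 0.000 (2) | 0.282 (6) | 0.000 (1) | 0.000 (1) | 0.178 (6) | 0.382 (5) | 0.018 (6) | 0.056 (5) | 0.000 (4) | 0.997 (3) | 0.107 (6) | 0.677 (2) | 0.313 (5) | 0.000 (5) | 0.726 (6) | 0.000 (1) | 0.000 (2) | 0.583 (4) | 0.903 (5) | 0.200 (6) | 0.000 (1) | 0.000 (3) | 0.333 (5) | 0.000 (1) | 0.442 (2) | 0.083 (5) | 0.109 (6) | 0.387 (5) | 0.000 (6)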
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019