ScanNet200 3D Semantic Instance Benchmark
The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.
Evaluation and metrics
As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps from 0.5 to 0.95 in steps of 0.05 (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
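As a rough illustration of the metric described above, the following Python sketch computes AP for a single class at one overlap threshold and then averages it over the 0.5 to 0.95 range. It is a simplified sketch, not the official ScanNet evaluation script: the `(confidence, gt_id, iou)` prediction format and the function names are assumptions made for this example, and AP is taken here as the area under the precision envelope.

```python
import numpy as np

def average_precision(predictions, num_gt, iou_threshold):
    """AP for one class at one overlap threshold (simplified sketch).

    predictions: list of (confidence, gt_id, iou) tuples, where gt_id is the
    ground-truth instance a prediction overlaps most. This input format is
    an assumption of this example, not the benchmark's submission format.
    """
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth
    if not predictions:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _conf, gt_id, iou in sorted(predictions, key=lambda p: -p[0]):
        is_tp = iou >= iou_threshold and gt_id not in matched
        if is_tp:
            matched.add(gt_id)
        # A second prediction of an already matched instance counts as a
        # false positive, as does any prediction below the threshold.
        hits.append(is_tp)
    hits = np.array(hits, dtype=bool)
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    recall = tp / num_gt
    precision = tp / np.maximum(tp + fp, 1)
    # Precision envelope: make precision non-increasing in recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))

def ap_over_range(predictions, num_gt):
    """Mean AP over overlaps 0.5, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(predictions, num_gt, t)
                          for t in thresholds]))

def mean_over_classes(per_class_ap):
    """Average per-class AP values, skipping classes with no ground truth."""
    return float(np.nanmean(list(per_class_ap.values())))

# Toy class with 3 ground-truth instances; the second prediction is a
# duplicate detection of instance 0.
toy = [(0.9, 0, 0.92), (0.8, 0, 0.70), (0.7, 1, 0.62), (0.6, 2, 0.30)]
print(round(average_precision(toy, 3, 0.5), 3))  # 0.556
```

In the toy input, dropping the duplicate detection raises AP 50% from 5/9 to 2/3, which shows how repeated predictions of one instance lower the score.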
This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives the score followed by the method's rank in that column.
Method | Info | avg ap 50% | head ap 50% | common ap 50% | tail ap 50% | chair | table | door | couch | cabinet | shelf | desk | office chair | bed | pillow | sink | picture | window | toilet | bookshelf | monitor | curtain | book | armchair | coffee table | box | refrigerator | lamp | kitchen cabinet | towel | clothes | tv | nightstand | counter | dresser | stool | cushion | plant | ceiling | bathtub | end table | dining table | keyboard | bag | backpack | toilet paper | printer | tv stand | whiteboard | blanket | shower curtain | trash can | closet | stairs | microwave | stove | shoe | computer tower | bottle | bin | ottoman | bench | board | washing machine | mirror | copier | basket | sofa chair | file cabinet | fan | laptop | shower | paper | person | paper towel dispenser | oven | blinds | rack | plate | blackboard | piano | suitcase | rail | radiator | recycling bin | container | wardrobe | soap dispenser | telephone | bucket | clock | stand | light | laundry basket | pipe | clothes dryer | guitar | toilet paper holder | seat | speaker | column | bicycle | ladder | bathroom stall | shower wall | cup | jacket | storage bin | coffee maker | dishwasher | paper towel roll | machine | mat | windowsill | bar | toaster | bulletin board | ironing board | fireplace | soap dish | kitchen counter | doorframe | toilet paper dispenser | mini fridge | fire extinguisher | ball | hat | shower curtain rod | water cooler | paper cutter | tray | shower door | pillar | ledge | toaster oven | mouse | toilet seat cover dispenser | furniture | cart | storage container | scale | tissue box | light switch | crate | power outlet | decoration | sign | projector | closet door | vacuum cleaner | candle | plunger | stuffed animal | headphones | dish rack | broom | guitar case | range hood | dustpan | hair dryer | water bottle | handicap bar | purse | vent | shower floor | water pitcher | mailbox | bowl | paper bag | alarm clock | music stand | projector screen | divider | laundry detergent | bathroom counter | object | bathroom vanity | closet wall | laundry hamper | bathroom stall door | ceiling light | trash bin | dumbbell | stair rail | tube | bathroom cabinet | cd case | closet rod | coffee kettle | structure | shower head | keyboard piano | case of water bottles | coat rack | storage organizer | folded chair | fire alarm | power strip | calendar | poster | potted plant | luggage | mattress |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask3D Scannet200 | 0.388 1 | 0.542 1 | 0.357 1 | 0.237 1 | 0.808 2 | 0.676 2 | 0.741 1 | 0.832 4 | 0.496 1 | 0.151 3 | 0.628 2 | 0.021 2 | 0.955 1 | 0.578 1 | 0.753 1 | 0.612 1 | 0.591 1 | 0.822 5 | 0.609 3 | 0.926 1 | 0.614 3 | 0.291 1 | 0.725 4 | 0.163 1 | 0.890 2 | 0.380 5 | 0.615 1 | 0.517 1 | 0.130 3 | 0.806 1 | 0.857 2 | 0.024 2 | 0.511 1 | 0.412 5 | 0.226 1 | 0.597 2 | 0.756 1 | 1.000 1 | 0.111 1 | 0.792 1 | 0.736 1 | 0.091 1 | 0.610 1 | 0.527 2 | 0.323 4 | 1.000 1 | 0.504 1 | 0.063 2 | 1.000 1 | 0.853 1 | 0.010 1 | 0.974 3 | 0.839 1 | 0.667 1 | 0.301 1 | 0.883 1 | 0.266 1 | 0.039 1 | 0.640 1 | 0.311 2 | 0.739 2 | 0.463 1 | 1.000 1 | 0.000 1 | 0.287 2 | 0.715 2 | 0.313 2 | 0.600 1 | 1.000 1 | 0.027 1 | 0.076 4 | 0.502 5 | 0.500 1 | 0.409 1 | 0.000 1 | 0.194 1 | 0.125 2 | 0.500 1 | 0.491 1 | 0.748 1 | 0.050 4 | 0.042 2 | 0.776 2 | 0.352 1 | 0.008 1 | 0.000 1 | 0.033 1 | 0.254 1 | 0.000 1 | 0.005 2 | 0.552 1 | 0.008 2 | 0.020 2 | 0.750 1 | 0.500 1 | 0.409 2 | 0.065 3 | 0.511 1 | 0.107 1 | 0.178 2 | 0.000 1 | 1.000 1 | 0.400 1 | 0.016 1 | 0.000 1 | 0.400 1 | 0.571 1 | 0.000 1 | 0.060 2 | 0.044 2 | 0.000 1 | 0.514 1 | 0.278 1 | 1.000 1 | 0.258 1 | 0.017 3 | 0.125 5 | 0.000 1 | 0.792 3 | 0.399 3 | 1.000 1 | 0.000 1 | 0.013 2 | 0.265 1 | 0.018 2 | 0.000 2 | 1.000 1 | 0.335 1 | 0.381 1 | 0.500 1 | 0.250 1 | 0.004 2 | 0.000 1 | 0.727 1 | 0.000 1 | 0.497 1 | 0.000 1 | 0.188 1 | 0.677 2 | 0.000 1 | 0.708 2 | 0.000 1 | 0.000 1 | 0.945 1 | 0.391 1 | 0.123 4 | 0.000 1 | 0.028 1 | 0.000 1 | 1.000 1 | 0.000 1 | 0.099 1 | 0.451 1 | 0.400 1 | 0.668 1 | 0.573 1 | 0.606 1 | 0.077 5 | 0.003 4 | 0.004 1 | 0.000 1 | 0.042 3 | 0.000 1 | 0.000 1 | 1.000 1 | 1.000 1 | 0.000 1 | 0.042 1 | 0.000 2 | 0.200 2 | 0.302 1 | 0.000 2 | 1.000 1 | 0.000 1 | |||||||||||||||||||||||||||||
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TD3D Scannet200 | 0.320 2 | 0.501 2 | 0.264 2 | 0.164 2 | 0.841 1 | 0.679 1 | 0.716 2 | 0.879 2 | 0.280 3 | 0.192 1 | 0.634 1 | 0.231 1 | 0.733 3 | 0.459 2 | 0.565 3 | 0.498 5 | 0.560 2 | 1.000 1 | 0.686 1 | 0.890 2 | 0.708 1 | 0.123 4 | 0.820 1 | 0.152 2 | 0.967 1 | 0.456 1 | 0.458 2 | 0.387 2 | 0.194 1 | 0.435 5 | 0.906 1 | 0.077 1 | 0.396 2 | 0.509 1 | 0.217 2 | 0.715 1 | 0.619 2 | 1.000 1 | 0.099 2 | 0.792 1 | 0.513 2 | 0.062 2 | 0.506 3 | 0.549 1 | 0.605 1 | 1.000 1 | 0.123 4 | 0.106 1 | 1.000 1 | 0.744 4 | 0.000 2 | 1.000 1 | 0.504 5 | 0.525 2 | 0.185 2 | 0.790 4 | 0.101 2 | 0.008 2 | 0.587 2 | 0.356 1 | 0.817 1 | 0.083 5 | 1.000 1 | 0.000 1 | 0.621 1 | 0.842 1 | 0.415 1 | 0.268 4 | 0.083 4 | 0.000 2 | 0.098 3 | 0.881 1 | 0.125 2 | 0.000 2 | 0.000 1 | 0.000 2 | 0.000 3 | 0.125 4 | 0.332 3 | 0.448 5 | 0.202 2 | 0.196 1 | 0.798 1 | 0.264 2 | 0.000 2 | 0.000 1 | 0.017 2 | 0.233 2 | 0.000 1 | 0.063 1 | 0.333 2 | 0.038 1 | 0.111 1 | 0.250 3 | 0.000 2 | 0.516 1 | 0.208 1 | 0.470 2 | 0.094 3 | 0.218 1 | 0.000 1 | 0.667 2 | 0.033 5 | 0.000 2 | 0.000 1 | 0.400 1 | 0.156 2 | 0.000 1 | 0.267 1 | 0.226 1 | 0.000 1 | 0.104 2 | 0.159 2 | 0.299 5 | 0.095 3 | 0.458 1 | 0.500 1 | 0.000 1 | 1.000 1 | 0.472 1 | 0.792 3 | 0.000 1 | 0.022 1 | 0.061 2 | 0.250 1 | 0.008 1 | 0.250 2 | 0.333 2 | 0.143 2 | 0.396 2 | 0.049 2 | 0.012 1 | 0.000 1 | 0.283 4 | 0.000 1 | 0.241 4 | 0.000 1 | 0.101 2 | 0.331 4 | 0.000 1 | 0.629 3 | 0.000 1 | 0.000 1 | 0.857 2 | 0.222 3 | 0.677 1 | 0.000 1 | 0.003 2 | 0.000 1 | 0.000 2 | 0.000 1 | 0.076 2 | 0.252 3 | 0.400 1 | 0.431 2 | 0.061 3 | 0.328 3 | 0.331 4 | 0.500 1 | 0.000 2 | 0.000 1 | 0.167 1 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 1.000 1 | 0.542 1 | 0.000 2 | 0.063 1 | 0.000 2 | 0.000 1 | |||||||||||||||||||||||||||||
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CSC-Pretrain Inst. | 0.209 4 | 0.361 5 | 0.157 4 | 0.085 4 | 0.700 5 | 0.248 5 | 0.634 5 | 0.776 5 | 0.322 2 | 0.135 5 | 0.103 5 | 0.000 3 | 0.524 5 | 0.364 5 | 0.618 2 | 0.592 3 | 0.381 5 | 0.997 3 | 0.589 4 | 0.747 4 | 0.340 5 | 0.109 5 | 0.768 2 | 0.059 5 | 0.702 5 | 0.448 2 | 0.188 5 | 0.149 5 | 0.091 5 | 0.636 3 | 0.573 5 | 0.000 3 | 0.246 3 | 0.500 2 | 0.000 3 | 0.450 5 | 0.405 3 | 0.667 4 | 0.006 5 | 0.000 3 | 0.356 4 | 0.007 3 | 0.506 2 | 0.420 3 | 0.340 3 | 0.667 5 | 0.294 2 | 0.004 4 | 0.571 4 | 0.748 2 | 0.000 2 | 1.000 1 | 0.573 4 | 0.502 4 | 0.094 4 | 0.807 3 | 0.000 4 | 0.000 3 | 0.400 3 | 0.000 5 | 0.278 5 | 0.228 3 | 1.000 1 | 0.000 1 | 0.115 5 | 0.432 4 | 0.198 3 | 0.050 5 | 0.125 2 | 0.000 2 | 0.000 5 | 0.573 3 | 0.000 3 | 0.000 2 | 0.000 1 | 0.000 2 | 0.000 3 | 0.125 4 | 0.312 4 | 0.610 3 | 0.221 1 | 0.000 3 | 0.667 4 | 0.050 4 | 0.000 2 | 0.000 1 | 0.000 3 | 0.032 5 | 0.000 1 | 0.000 3 | 0.083 3 | 0.000 3 | 0.000 3 | 0.000 4 | 0.000 2 | 0.220 4 | 0.000 5 | 0.125 3 | 0.000 5 | 0.111 5 | 0.000 1 | 0.667 2 | 0.200 3 | 0.000 2 | 0.000 1 | 0.000 4 | 0.110 3 | 0.000 1 | 0.000 3 | 0.000 3 | 0.000 1 | 0.000 4 | 0.053 5 | 0.500 4 | 0.000 5 | 0.000 4 | 0.500 1 | 0.000 1 | 0.500 4 | 0.333 4 | 0.500 4 | 0.000 1 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 2 | 0.000 3 | 0.000 5 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 1 | 0.600 2 | 0.000 1 | 0.364 2 | 0.000 1 | 0.000 3 | 0.750 1 | 0.000 1 | 0.833 1 | 0.000 1 | 0.000 1 | 0.143 5 | 0.000 5 | 0.396 2 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.021 5 | 0.221 4 | 0.000 3 | 0.093 5 | 0.055 4 | 0.451 2 | 0.677 2 | 0.125 2 | 0.000 2 | 0.000 1 | 0.028 4 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 0.000 2 | 0.050 4 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 1 | |||||||||||||||||||||||||||||
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Minkowski 34D Inst. | 0.203 5 | 0.369 4 | 0.134 5 | 0.078 5 | 0.706 4 | 0.382 4 | 0.693 3 | 0.845 3 | 0.221 5 | 0.150 4 | 0.158 4 | 0.000 3 | 0.746 2 | 0.369 4 | 0.545 4 | 0.595 2 | 0.387 4 | 0.997 3 | 0.413 5 | 0.720 5 | 0.636 2 | 0.165 3 | 0.732 3 | 0.070 4 | 0.851 4 | 0.402 4 | 0.251 4 | 0.313 4 | 0.123 4 | 0.583 4 | 0.696 3 | 0.000 3 | 0.051 5 | 0.500 2 | 0.000 3 | 0.500 4 | 0.372 5 | 0.667 4 | 0.009 4 | 0.000 3 | 0.307 5 | 0.003 4 | 0.479 4 | 0.107 5 | 0.226 5 | 0.903 4 | 0.109 5 | 0.031 3 | 0.981 3 | 0.726 5 | 0.000 2 | 0.522 5 | 0.669 2 | 0.282 5 | 0.052 5 | 0.778 5 | 0.000 4 | 0.000 3 | 0.400 3 | 0.074 4 | 0.333 4 | 0.218 4 | 1.000 1 | 0.000 1 | 0.250 3 | 0.406 5 | 0.118 5 | 0.317 2 | 0.100 3 | 0.000 2 | 0.191 1 | 0.596 2 | 0.000 3 | 0.000 2 | 0.000 1 | 0.000 2 | 0.000 3 | 0.500 1 | 0.178 5 | 0.701 2 | 0.000 5 | 0.000 3 | 0.522 5 | 0.018 5 | 0.000 2 | 0.000 1 | 0.000 3 | 0.060 4 | 0.000 1 | 0.000 3 | 0.033 5 | 0.000 3 | 0.000 3 | 0.000 4 | 0.000 2 | 0.281 3 | 0.100 2 | 0.000 5 | 0.090 4 | 0.133 4 | 0.000 1 | 0.422 5 | 0.050 4 | 0.000 2 | 0.000 1 | 0.200 3 | 0.000 5 | 0.000 1 | 0.000 3 | 0.000 3 | 0.000 1 | 0.000 4 | 0.123 4 | 0.677 2 | 0.021 4 | 0.000 4 | 0.500 1 | 0.000 1 | 0.500 4 | 0.442 2 | 0.125 5 | 0.000 1 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 2 | 0.000 3 | 0.056 4 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 1 | 0.200 5 | 0.000 1 | 0.143 5 | 0.000 1 | 0.000 3 | 0.250 5 | 0.000 1 | 0.511 4 | 0.000 1 | 0.000 1 | 0.286 3 | 0.083 4 | 0.396 2 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.025 4 | 0.300 2 | 0.000 3 | 0.371 3 | 0.070 2 | 0.000 4 | 0.385 3 | 0.000 5 | 0.000 2 | 0.000 1 | 0.000 5 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 0.000 2 | 0.200 2 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 1 | |||||||||||||||||||||||||||||
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
LGround Inst. | 0.246 3 | 0.413 3 | 0.170 3 | 0.130 3 | 0.754 3 | 0.541 3 | 0.682 4 | 0.903 1 | 0.264 4 | 0.164 2 | 0.234 3 | 0.000 3 | 0.681 4 | 0.452 3 | 0.464 5 | 0.541 4 | 0.399 3 | 1.000 1 | 0.637 2 | 0.772 3 | 0.588 4 | 0.190 2 | 0.589 5 | 0.081 3 | 0.857 3 | 0.426 3 | 0.373 3 | 0.318 3 | 0.135 2 | 0.690 2 | 0.653 4 | 0.000 3 | 0.159 4 | 0.500 2 | 0.000 3 | 0.581 3 | 0.387 4 | 1.000 1 | 0.046 3 | 0.000 3 | 0.402 3 | 0.003 5 | 0.455 5 | 0.196 4 | 0.571 2 | 1.000 1 | 0.270 3 | 0.003 5 | 0.530 5 | 0.748 3 | 0.000 2 | 0.744 4 | 0.575 3 | 0.511 3 | 0.112 3 | 0.815 2 | 0.067 3 | 0.000 3 | 0.400 3 | 0.167 3 | 0.667 3 | 0.241 2 | 1.000 1 | 0.000 1 | 0.208 4 | 0.660 3 | 0.125 4 | 0.317 2 | 0.000 5 | 0.000 2 | 0.100 2 | 0.561 4 | 0.000 3 | 0.000 2 | 0.000 1 | 0.000 2 | 1.000 1 | 0.500 1 | 0.344 2 | 0.568 4 | 0.167 3 | 0.000 3 | 0.706 3 | 0.068 3 | 0.000 2 | 0.000 1 | 0.000 3 | 0.063 3 | 0.000 1 | 0.000 3 | 0.056 4 | 0.000 3 | 0.000 3 | 0.500 2 | 0.000 2 | 0.143 5 | 0.017 4 | 0.125 3 | 0.097 2 | 0.164 3 | 0.000 1 | 0.582 4 | 0.400 1 | 0.000 2 | 0.000 1 | 0.000 4 | 0.083 4 | 0.000 1 | 0.000 3 | 0.000 3 | 0.000 1 | 0.025 3 | 0.156 3 | 0.533 3 | 0.250 2 | 0.200 2 | 0.500 1 | 0.000 1 | 1.000 1 | 0.333 4 | 1.000 1 | 0.000 1 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 2 | 0.000 3 | 0.333 2 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 3 | 0.000 1 | 0.400 3 | 0.000 1 | 0.364 2 | 0.000 1 | 0.000 3 | 0.500 3 | 0.000 1 | 0.511 4 | 0.000 1 | 0.000 1 | 0.286 3 | 0.333 2 | 0.000 5 | 0.000 1 | 0.000 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.034 3 | 0.111 5 | 0.000 3 | 0.333 4 | 0.031 5 | 0.000 4 | 0.750 1 | 0.125 2 | 0.000 2 | 0.000 1 | 0.151 2 | 0.000 1 | 0.000 1 | 0.000 2 | 0.500 2 | 0.000 1 | 0.000 2 | 0.000 2 | 0.000 5 | 0.000 2 | 0.000 2 | 0.000 2 | 0.000 1 | |||||||||||||||||||||||||||||
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild. |