ScanNet200 3D Semantic Instance Benchmark
The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.
Evaluation and metrics
As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
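The metric described above can be illustrated with a small sketch. This is not the official ScanNet200 evaluation script: the function names `average_precision` and `mean_ap_over_overlaps`, the greedy confidence-ordered matching, and the all-point interpolation of the precision-recall curve are assumptions chosen for brevity. It only shows how a duplicate prediction of an already matched ground truth instance counts as a false positive, and how the AP score averages over the overlap thresholds 0.5 to 0.95 in steps of 0.05.

```python
import numpy as np


def average_precision(scores, ious, threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    scores: (P,) confidence of each predicted instance.
    ious:   (P, G) overlap between each prediction and each ground-truth instance.
    """
    scores = np.asarray(scores, dtype=float)
    ious = np.asarray(ious, dtype=float)
    num_pred, num_gt = ious.shape
    if num_gt == 0:
        return float("nan")  # class has no ground-truth instances
    if num_pred == 0:
        return 0.0

    # Visit predictions from most to least confident.
    order = np.argsort(-scores, kind="stable")
    matched = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(num_pred)
    for rank, p in enumerate(order):
        g = int(np.argmax(ious[p]))
        if ious[p, g] > threshold and not matched[g]:
            matched[g] = True
            tp[rank] = 1.0
        # Otherwise the prediction is a false positive. This includes
        # duplicates of a ground-truth instance that is already matched.

    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, num_pred + 1)
    recall = cum_tp / num_gt
    # Make precision non-increasing in recall, then integrate the curve.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum(np.diff(recall) * precision))


def mean_ap_over_overlaps(scores, ious):
    """Average AP over the overlap thresholds 0.5, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(scores, ious, t) for t in thresholds]))


if __name__ == "__main__":
    # Hypothetical data: three predictions, two ground-truth instances.
    # The second prediction duplicates the first ground-truth instance.
    scores = [0.9, 0.8, 0.7]
    ious = [[0.82, 0.0], [0.72, 0.0], [0.0, 0.62]]
    print("AP 25%:", average_precision(scores, ious, 0.25))
    print("AP 50%:", average_precision(scores, ious, 0.5))
    print("AP:", mean_ap_over_overlaps(scores, ious))
```

In a full evaluation these per-class values would then be averaged over the 200 categories. The benchmark's actual matching and integration rules may differ in detail from this sketch.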
This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.
Method | Info | avg ap 25% | head ap 25% | common ap 25% | tail ap 25% | alarm clock | armchair | backpack | bag | ball | bar | basket | bathroom cabinet | bathroom counter | bathroom stall | bathroom stall door | bathroom vanity | bathtub | bed | bench | bicycle | bin | blackboard | blanket | blinds | board | book | bookshelf | bottle | bowl | box | broom | bucket | bulletin board | cabinet | calendar | candle | cart | case of water bottles | cd case | ceiling | ceiling light | chair | clock | closet | closet door | closet rod | closet wall | clothes | clothes dryer | coat rack | coffee kettle | coffee maker | coffee table | column | computer tower | container | copier | couch | counter | crate | cup | curtain | cushion | decoration | desk | dining table | dish rack | dishwasher | divider | door | doorframe | dresser | dumbbell | dustpan | end table | fan | file cabinet | fire alarm | fire extinguisher | fireplace | folded chair | furniture | guitar | guitar case | hair dryer | handicap bar | hat | headphones | ironing board | jacket | keyboard | keyboard piano | kitchen cabinet | kitchen counter | ladder | lamp | laptop | laundry basket | laundry detergent | laundry hamper | ledge | light | light switch | luggage | machine | mailbox | mat | mattress | microwave | mini fridge | mirror | monitor | mouse | music stand | nightstand | object | office chair | ottoman | oven | paper | paper bag | paper cutter | paper towel dispenser | paper towel roll | person | piano | picture | pillar | pillow | pipe | plant | plate | plunger | poster | potted plant | power outlet | power strip | printer | projector | projector screen | purse | rack | radiator | rail | range hood | recycling bin | refrigerator | scale | seat | shelf | shoe | shower | shower curtain | shower curtain rod | shower door | shower floor | shower head | shower wall | sign | sink | soap dish | soap dispenser | sofa chair | speaker | stair rail | stairs | stand | stool | storage bin | storage container | storage organizer | stove | structure | stuffed animal | suitcase | table | telephone | tissue box | toaster | toaster oven | toilet | toilet paper | toilet paper dispenser | toilet paper holder | toilet seat cover dispenser | towel | trash bin | trash can | tray | tube | tv | tv stand | vacuum cleaner | vent | wardrobe | washing machine | water bottle | water cooler | water pitcher | whiteboard | window | windowsill |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask3D Scannet200 | 0.445 1 | 0.653 1 | 0.392 1 | 0.254 1 | 0.648 1 | 0.097 1 | 0.125 5 | 0.000 1 | 0.000 1 | 0.000 1 | 0.657 1 | 0.971 1 | 0.451 2 | 1.000 1 | 1.000 1 | 0.640 1 | 0.500 1 | 0.045 1 | 1.000 1 | 0.241 2 | 0.409 1 | 0.363 1 | 0.440 1 | 0.686 3 | 0.300 1 | 0.000 1 | 0.201 1 | 0.000 1 | 0.009 1 | 0.290 1 | 0.556 1 | 1.000 1 | 0.000 1 | 0.063 3 | 0.000 1 | 0.830 1 | 0.573 1 | 0.844 2 | 0.333 1 | 0.204 1 | 0.058 5 | 0.158 5 | 0.552 2 | 0.056 1 | 0.000 1 | 1.000 1 | 0.725 4 | 0.750 1 | 0.927 1 | 1.000 1 | 0.888 4 | 0.042 3 | 0.120 2 | 0.615 4 | 0.226 1 | 0.250 1 | 0.890 1 | 0.792 1 | 0.677 2 | 0.510 2 | 0.818 1 | 0.699 1 | 0.512 2 | 0.167 5 | 0.125 1 | 0.315 2 | 0.943 1 | 0.309 1 | 0.017 3 | 0.200 3 | 0.000 1 | 0.188 1 | 0.000 1 | 0.183 3 | 0.815 1 | 1.000 1 | 0.827 1 | 0.741 1 | 0.442 3 | 0.414 4 | 0.600 1 | 0.000 1 | 0.000 1 | 0.458 1 | 0.049 3 | 0.321 1 | 0.381 1 | 0.000 1 | 0.908 2 | 0.400 1 | 0.841 1 | 0.260 1 | 0.710 1 | 0.966 1 | 0.265 1 | 0.000 1 | 0.924 1 | 0.152 1 | 0.025 2 | 0.500 1 | 0.027 1 | 0.028 1 | 1.000 1 | 0.556 5 | 0.016 1 | 0.080 5 | 0.500 1 | 0.694 3 | 0.608 1 | 0.084 1 | 0.604 3 | 0.194 1 | 0.538 3 | 0.000 1 | 0.500 1 | 0.000 2 | 0.354 4 | 0.000 1 | 1.000 1 | 0.000 1 | 0.761 2 | 0.930 1 | 0.053 4 | 0.890 3 | 1.000 1 | 0.008 2 | 0.262 1 | 0.358 2 | 1.000 1 | 1.000 1 | 0.792 4 | 0.966 1 | 1.000 1 | 0.765 2 | 0.004 2 | 0.930 1 | 0.780 2 | 0.330 2 | 0.027 2 | 0.625 1 | 0.974 4 | 0.050 1 | 0.412 5 | 0.021 2 | 0.000 3 | 0.000 2 | 0.778 1 | 0.000 1 | 0.000 1 | 0.493 2 | 0.746 2 | 0.454 1 | 0.335 2 | 0.396 1 | 0.930 5 | 0.551 2 | 1.000 1 | 0.552 1 | 0.606 1 | 0.853 1 | 0.000 1 | 0.004 1 | 0.806 1 | 1.000 1 | 0.727 2 | 0.000 1 | 0.042 3 | 0.745 2 | 0.000 1 | 0.399 4 | 0.391 1 | 0.630 1 | 0.721 1 | 0.619 1 | |||||||||||||||||||||||||||||
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023 |
TD3D Scannet200 | 0.379 2 | 0.603 2 | 0.306 2 | 0.190 2 | 0.635 2 | 0.073 2 | 0.500 1 | 0.000 1 | 0.000 1 | 0.000 1 | 0.495 3 | 0.735 2 | 0.275 5 | 1.000 1 | 0.979 2 | 0.590 2 | 0.000 4 | 0.021 2 | 0.000 3 | 0.146 3 | 0.000 2 | 0.356 2 | 0.173 5 | 0.795 1 | 0.226 2 | 0.000 1 | 0.173 2 | 0.000 1 | 0.000 2 | 0.226 2 | 0.390 2 | 0.000 2 | 0.000 1 | 0.250 1 | 0.000 1 | 0.706 2 | 0.061 3 | 0.885 1 | 0.093 2 | 0.186 2 | 0.259 4 | 0.200 1 | 0.667 1 | 0.000 2 | 0.000 1 | 0.667 2 | 0.825 1 | 0.250 4 | 0.834 4 | 1.000 1 | 0.958 1 | 0.553 1 | 0.111 3 | 0.748 1 | 0.220 2 | 0.051 2 | 0.866 2 | 0.792 1 | 0.390 5 | 0.045 5 | 0.800 2 | 0.302 5 | 0.517 1 | 0.533 3 | 0.113 2 | 0.427 1 | 0.843 2 | 0.000 2 | 0.458 1 | 0.600 1 | 0.000 1 | 0.101 2 | 0.000 1 | 0.259 1 | 0.717 2 | 0.500 2 | 0.615 2 | 0.520 2 | 0.526 2 | 0.457 1 | 0.270 4 | 0.000 1 | 0.000 1 | 0.400 2 | 0.088 2 | 0.294 2 | 0.181 2 | 0.000 1 | 1.000 1 | 0.400 1 | 0.710 5 | 0.103 3 | 0.477 5 | 0.905 2 | 0.061 2 | 0.000 1 | 0.906 2 | 0.102 2 | 0.232 1 | 0.125 2 | 0.000 2 | 0.003 2 | 0.792 3 | 1.000 1 | 0.000 2 | 0.102 3 | 0.125 4 | 0.559 5 | 0.523 3 | 0.075 2 | 0.715 1 | 0.000 2 | 0.424 5 | 0.000 1 | 0.396 2 | 0.250 1 | 0.638 1 | 0.000 1 | 0.000 2 | 0.000 1 | 0.622 5 | 0.833 2 | 0.221 1 | 0.970 1 | 0.250 2 | 0.038 1 | 0.260 2 | 0.415 1 | 0.125 2 | 1.000 1 | 1.000 1 | 0.857 2 | 0.000 2 | 0.908 1 | 0.012 1 | 0.869 3 | 0.836 1 | 0.635 1 | 0.111 1 | 0.625 1 | 1.000 1 | 0.020 2 | 0.510 1 | 0.003 3 | 0.009 2 | 1.000 1 | 0.778 1 | 0.000 1 | 0.000 1 | 0.370 3 | 0.755 1 | 0.288 2 | 0.333 3 | 0.274 2 | 1.000 1 | 0.557 1 | 0.731 2 | 0.456 2 | 0.433 3 | 0.769 5 | 0.000 1 | 0.000 2 | 0.621 4 | 1.000 1 | 0.458 4 | 0.000 1 | 0.196 2 | 0.817 1 | 0.000 1 | 0.472 1 | 0.222 3 | 0.205 5 | 0.689 2 | 0.274 3 | |||||||||||||||||||||||||||||
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024 |
Minkowski 34D Inst. | 0.280 4 | 0.488 4 | 0.192 5 | 0.124 4 | 0.593 4 | 0.010 4 | 0.500 1 | 0.000 1 | 0.000 1 | 0.000 1 | 0.447 4 | 0.535 4 | 0.445 3 | 1.000 1 | 0.861 4 | 0.400 3 | 0.225 2 | 0.000 3 | 0.000 3 | 0.142 4 | 0.000 2 | 0.074 4 | 0.342 3 | 0.467 5 | 0.067 3 | 0.000 1 | 0.119 5 | 0.000 1 | 0.000 2 | 0.000 4 | 0.337 5 | 0.000 2 | 0.000 1 | 0.000 4 | 0.000 1 | 0.506 5 | 0.070 2 | 0.804 4 | 0.000 3 | 0.000 4 | 0.333 3 | 0.172 3 | 0.150 5 | 0.000 2 | 0.000 1 | 0.479 5 | 0.745 3 | 0.000 5 | 0.830 5 | 1.000 1 | 0.904 3 | 0.167 2 | 0.090 4 | 0.732 2 | 0.000 3 | 0.000 3 | 0.443 4 | 0.000 3 | 0.500 3 | 0.542 1 | 0.772 5 | 0.396 4 | 0.077 5 | 0.385 4 | 0.044 4 | 0.118 5 | 0.777 4 | 0.000 2 | 0.000 4 | 0.200 3 | 0.000 1 | 0.000 3 | 0.000 1 | 0.148 4 | 0.502 4 | 0.500 2 | 0.419 4 | 0.159 5 | 0.281 4 | 0.404 5 | 0.317 3 | 0.000 1 | 0.000 1 | 0.200 3 | 0.000 4 | 0.077 3 | 0.000 3 | 0.000 1 | 0.750 3 | 0.200 3 | 0.715 4 | 0.021 4 | 0.551 2 | 0.828 5 | 0.000 3 | 0.000 1 | 0.743 4 | 0.059 5 | 0.000 3 | 0.000 3 | 0.000 2 | 0.000 3 | 0.125 5 | 0.648 3 | 0.000 2 | 0.191 2 | 0.500 1 | 0.669 4 | 0.502 4 | 0.000 5 | 0.568 4 | 0.000 2 | 0.516 4 | 0.000 1 | 0.000 3 | 0.000 2 | 0.305 5 | 0.000 1 | 0.000 2 | 0.000 1 | 0.825 1 | 0.833 2 | 0.021 5 | 0.918 2 | 0.000 3 | 0.000 3 | 0.191 4 | 0.346 4 | 0.100 4 | 0.981 3 | 1.000 1 | 0.286 4 | 0.000 2 | 0.000 5 | 0.000 3 | 0.868 4 | 0.648 5 | 0.292 3 | 0.000 3 | 0.375 3 | 1.000 1 | 0.000 3 | 0.500 2 | 0.000 4 | 0.333 1 | 0.000 2 | 0.538 5 | 0.000 1 | 0.000 1 | 0.213 5 | 0.518 4 | 0.098 4 | 0.528 1 | 0.250 3 | 0.997 3 | 0.284 5 | 0.677 3 | 0.398 3 | 0.167 4 | 0.790 4 | 0.000 1 | 0.000 2 | 0.618 5 | 0.903 5 | 0.200 5 | 0.000 1 | 0.333 1 | 0.333 4 | 0.000 1 | 0.442 3 | 0.083 4 | 0.213 4 | 0.587 4 | 0.131 5 | |||||||||||||||||||||||||||||
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019 |
LGround Inst. | 0.314 3 | 0.529 3 | 0.225 3 | 0.155 3 | 0.578 5 | 0.010 3 | 0.500 1 | 0.000 1 | 0.000 1 | 0.000 1 | 0.515 2 | 0.556 3 | 0.696 1 | 1.000 1 | 0.927 3 | 0.400 3 | 0.083 3 | 0.000 3 | 1.000 1 | 0.252 1 | 0.000 2 | 0.167 3 | 0.350 2 | 0.731 2 | 0.067 3 | 0.000 1 | 0.123 4 | 0.000 1 | 0.000 2 | 0.036 3 | 0.372 3 | 0.000 2 | 0.000 1 | 0.250 1 | 0.000 1 | 0.569 4 | 0.031 5 | 0.810 3 | 0.000 3 | 0.000 4 | 0.630 1 | 0.183 2 | 0.278 3 | 0.000 2 | 0.000 1 | 0.582 4 | 0.589 5 | 0.500 2 | 0.863 3 | 1.000 1 | 0.940 2 | 0.000 4 | 0.144 1 | 0.716 3 | 0.000 3 | 0.000 3 | 0.484 3 | 0.000 3 | 0.500 3 | 0.400 3 | 0.798 3 | 0.500 2 | 0.278 4 | 0.750 1 | 0.093 3 | 0.166 4 | 0.783 3 | 0.000 2 | 0.200 2 | 0.400 2 | 0.000 1 | 0.000 3 | 0.000 1 | 0.219 2 | 0.539 3 | 0.500 2 | 0.578 3 | 0.413 3 | 0.181 5 | 0.457 2 | 0.375 2 | 0.000 1 | 0.000 1 | 0.050 5 | 0.000 4 | 0.077 4 | 0.000 3 | 0.000 1 | 0.500 5 | 0.000 5 | 0.743 3 | 0.250 2 | 0.488 4 | 0.846 3 | 0.000 3 | 0.000 1 | 0.800 3 | 0.069 3 | 0.000 3 | 0.000 3 | 0.000 2 | 0.000 3 | 1.000 1 | 0.607 4 | 0.000 2 | 0.200 1 | 0.500 1 | 0.694 2 | 0.528 2 | 0.063 3 | 0.659 2 | 0.000 2 | 0.594 2 | 0.000 1 | 0.000 3 | 0.000 2 | 0.571 2 | 0.000 1 | 0.000 2 | 0.000 1 | 0.716 4 | 0.647 5 | 0.221 2 | 0.857 4 | 0.000 3 | 0.000 3 | 0.217 3 | 0.346 3 | 0.071 5 | 0.530 5 | 1.000 1 | 0.429 3 | 0.000 2 | 0.286 3 | 0.000 3 | 0.826 5 | 0.706 3 | 0.208 4 | 0.000 3 | 0.250 4 | 0.744 5 | 0.000 3 | 0.500 2 | 0.042 1 | 0.000 3 | 0.000 2 | 0.746 3 | 0.000 1 | 0.000 1 | 0.517 1 | 0.625 3 | 0.085 5 | 0.333 3 | 0.000 4 | 1.000 1 | 0.378 4 | 0.533 5 | 0.376 4 | 0.042 5 | 0.814 3 | 0.000 1 | 0.000 2 | 0.765 3 | 1.000 1 | 0.600 3 | 0.000 1 | 0.000 4 | 0.667 3 | 0.000 1 | 0.472 1 | 0.333 2 | 0.337 3 | 0.605 3 | 0.305 2 | |||||||||||||||||||||||||||||
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild. |
CSC-Pretrain Inst. | 0.275 5 | 0.466 5 | 0.218 4 | 0.110 5 | 0.625 3 | 0.007 5 | 0.500 1 | 0.000 1 | 0.000 1 | 0.000 1 | 0.000 5 | 0.222 5 | 0.377 4 | 1.000 1 | 0.661 5 | 0.400 3 | 0.000 4 | 0.000 3 | 0.000 3 | 0.119 5 | 0.000 2 | 0.000 5 | 0.277 4 | 0.685 4 | 0.067 3 | 0.000 1 | 0.132 3 | 0.000 1 | 0.000 2 | 0.000 4 | 0.367 4 | 0.000 2 | 0.000 1 | 0.000 4 | 0.000 1 | 0.591 3 | 0.055 4 | 0.783 5 | 0.000 3 | 0.014 3 | 0.500 2 | 0.161 4 | 0.278 3 | 0.000 2 | 0.000 1 | 0.667 2 | 0.768 2 | 0.500 2 | 0.866 2 | 1.000 1 | 0.829 5 | 0.000 4 | 0.019 5 | 0.555 5 | 0.000 3 | 0.000 3 | 0.305 5 | 0.000 3 | 0.750 1 | 0.200 4 | 0.783 4 | 0.429 3 | 0.395 3 | 0.677 2 | 0.020 5 | 0.286 3 | 0.584 5 | 0.000 2 | 0.000 4 | 0.115 5 | 0.000 1 | 0.000 3 | 0.000 1 | 0.145 5 | 0.423 5 | 0.500 2 | 0.364 5 | 0.369 4 | 0.571 1 | 0.448 3 | 0.206 5 | 0.000 1 | 0.000 1 | 0.200 3 | 0.106 1 | 0.065 5 | 0.000 3 | 0.000 1 | 0.750 3 | 0.200 3 | 0.774 2 | 0.000 5 | 0.501 3 | 0.841 4 | 0.000 3 | 0.000 1 | 0.692 5 | 0.063 4 | 0.000 3 | 0.000 3 | 0.000 2 | 0.000 3 | 0.500 4 | 0.649 2 | 0.000 2 | 0.084 4 | 0.125 4 | 0.719 1 | 0.413 5 | 0.004 4 | 0.450 5 | 0.000 2 | 0.638 1 | 0.000 1 | 0.000 3 | 0.000 2 | 0.505 3 | 0.000 1 | 0.000 2 | 0.000 1 | 0.727 3 | 0.833 2 | 0.221 2 | 0.779 5 | 0.000 3 | 0.000 3 | 0.168 5 | 0.311 5 | 0.125 2 | 0.571 4 | 0.500 5 | 0.143 5 | 0.000 2 | 0.250 4 | 0.000 3 | 0.869 2 | 0.667 4 | 0.162 5 | 0.000 3 | 0.250 4 | 1.000 1 | 0.000 3 | 0.500 2 | 0.000 4 | 0.000 3 | 0.000 2 | 0.689 4 | 0.000 1 | 0.000 1 | 0.312 4 | 0.383 5 | 0.114 3 | 0.333 3 | 0.000 4 | 0.997 3 | 0.420 3 | 0.613 4 | 0.212 5 | 0.500 2 | 0.819 2 | 0.000 1 | 0.000 2 | 0.768 2 | 1.000 1 | 0.918 1 | 0.000 1 | 0.000 4 | 0.278 5 | 0.000 1 | 0.333 5 | 0.000 5 | 0.353 2 | 0.546 5 | 0.258 4 | |||||||||||||||||||||||||||||
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021 |