3D Semantic Label with Limited Annotations Benchmark
The 3D semantic labeling task involves predicting a semantic labeling of a 3D scan mesh.
Evaluation and metrics
Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative vertices, respectively. Predicted labels are evaluated per-vertex over the respective 3D scan mesh; for 3D approaches that operate on other representations such as grids or points, the predicted labels should be mapped onto the mesh vertices (an example that maps grid predictions to mesh vertices is provided in the evaluation helpers).
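The IoU definition above can be illustrated with a minimal sketch that counts TP, FP, and FN per class over mesh vertices. This is not the benchmark's official evaluation script: the function names, the `ignore_label` value of -1 for unannotated vertices, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def per_class_iou(gt_labels, pred_labels, class_ids, ignore_label=-1):
    """Return {class_id: IoU}, with IoU = TP / (TP + FP + FN) per class.

    gt_labels and pred_labels hold one label per mesh vertex.
    ignore_label is a hypothetical marker for unannotated vertices.
    """
    if len(gt_labels) != len(pred_labels):
        raise ValueError("need exactly one prediction per ground-truth vertex")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gt, pred in zip(gt_labels, pred_labels):
        if gt == ignore_label:
            continue  # assumption: unannotated vertices are not scored
        if gt == pred:
            tp[gt] += 1
        else:
            fn[gt] += 1    # the true class missed this vertex
            fp[pred] += 1  # the predicted class claimed it wrongly

    ious = {}
    for c in class_ids:
        denom = tp[c] + fp[c] + fn[c]
        # A class that never occurs in either labeling has no defined IoU.
        ious[c] = tp[c] / denom if denom else float("nan")
    return ious


def mean_iou(ious):
    """Unweighted mean over the classes that have a defined IoU."""
    valid = [v for v in ious.values() if v == v]  # NaN != NaN
    return sum(valid) / len(valid) if valid else float("nan")


# Toy example: six vertices, three classes, last vertex unannotated.
gt = [0, 0, 1, 1, 2, -1]
pred = [0, 1, 1, 1, 2, 2]
ious = per_class_iou(gt, pred, class_ids=[0, 1, 2])
print(ious, mean_iou(ious))
```

Predictions made on a grid or point representation would first have to be transferred to the mesh vertices so that `pred_labels` has one entry per vertex.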
This table lists the benchmark results for the 3D semantic label with limited annotations scenario. Each cell shows the IoU followed by the method's rank in that column.
Method | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window | Info |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GaIA | 0.685 8 | 0.759 10 | 0.834 1 | 0.759 5 | 0.650 8 | 0.859 3 | 0.427 10 | 0.694 10 | 0.524 10 | 0.575 7 | 0.948 6 | 0.537 1 | 0.304 3 | 0.534 12 | 0.853 2 | 0.678 7 | 0.820 1 | 0.581 10 | 0.914 4 | 0.828 5 | 0.626 8 | |
Min Seok Lee*, Seok Woo Yang*, and Sung Won Han: GaIA: Graphical Information gain based Attention Network for Weakly Supervised 3D Point Cloud Semantic Segmentation. WACV 2023 | ||||||||||||||||||||||
Q2E | 0.743 2 | 0.984 1 | 0.803 4 | 0.770 1 | 0.725 1 | 0.881 1 | 0.572 1 | 0.806 2 | 0.663 2 | 0.665 1 | 0.972 2 | 0.506 3 | 0.305 2 | 0.652 6 | 0.829 4 | 0.761 2 | 0.809 2 | 0.660 1 | 0.951 2 | 0.862 2 | 0.682 2 | |
DE-3DLearner LA | 0.709 3 | 0.877 4 | 0.772 8 | 0.744 9 | 0.694 3 | 0.836 7 | 0.453 6 | 0.787 4 | 0.623 4 | 0.598 4 | 0.953 4 | 0.490 7 | 0.216 11 | 0.682 5 | 0.879 1 | 0.727 3 | 0.802 3 | 0.604 5 | 0.922 3 | 0.845 4 | 0.676 3 | |
Ping-Chung Yu, Cheng Sun, Min Sun: Data Efficient 3D Learner via Knowledge Transferred from 2D Model. ECCV 2022 | ||||||||||||||||||||||
CSC_LA_SEM | 0.665 10 | 0.857 6 | 0.756 9 | 0.763 4 | 0.647 9 | 0.852 4 | 0.432 9 | 0.684 12 | 0.543 8 | 0.514 12 | 0.948 6 | 0.469 8 | 0.179 12 | 0.599 9 | 0.702 11 | 0.620 10 | 0.789 4 | 0.614 4 | 0.911 5 | 0.815 11 | 0.607 11 | |
One-Thing-One-Click | 0.694 4 | 0.760 9 | 0.815 2 | 0.706 13 | 0.684 5 | 0.840 6 | 0.492 4 | 0.701 9 | 0.557 7 | 0.596 5 | 0.972 2 | 0.497 4 | 0.281 4 | 0.709 2 | 0.757 8 | 0.689 6 | 0.789 4 | 0.600 7 | 0.907 7 | 0.864 1 | 0.671 4 | |
Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu: One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation. CVPR 2021 | ||||||||||||||||||||||
ActiveST | 0.748 1 | 0.984 1 | 0.804 3 | 0.759 5 | 0.720 2 | 0.849 5 | 0.516 2 | 0.791 3 | 0.670 1 | 0.654 2 | 0.974 1 | 0.495 5 | 0.382 1 | 0.811 1 | 0.828 5 | 0.787 1 | 0.780 6 | 0.640 2 | 0.952 1 | 0.861 3 | 0.701 1 | |
Gengxin Liu, Oliver van Kaick, Hui Huang, Ruizhen Hu: Active Self-Training for Weakly Supervised 3D Scene Semantic Segmentation. | ||||||||||||||||||||||
VIBUS | 0.691 6 | 0.860 5 | 0.731 12 | 0.738 10 | 0.672 7 | 0.860 2 | 0.470 5 | 0.766 5 | 0.625 3 | 0.547 11 | 0.949 5 | 0.491 6 | 0.255 5 | 0.693 4 | 0.715 10 | 0.712 4 | 0.778 7 | 0.597 8 | 0.911 5 | 0.816 9 | 0.635 7 | |
Beiwen Tian, Liyi Luo, Hao Zhao, Guyue Zhou: VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling. ISPRS Journal of Photogrammetry and Remote Sensing | ||||||||||||||||||||||
WS3D_LA_Sem | 0.694 4 | 0.895 3 | 0.743 10 | 0.767 2 | 0.675 6 | 0.826 10 | 0.496 3 | 0.817 1 | 0.612 5 | 0.613 3 | 0.947 10 | 0.460 9 | 0.254 6 | 0.558 11 | 0.811 7 | 0.710 5 | 0.776 8 | 0.616 3 | 0.874 11 | 0.822 6 | 0.603 12 | |
Kangcheng Liu: WS3D: Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination. ECCV 2022 | ||||||||||||||||||||||
PointContrast_LA_SEM | 0.653 11 | 0.717 12 | 0.775 7 | 0.754 7 | 0.626 11 | 0.804 13 | 0.391 12 | 0.689 11 | 0.485 13 | 0.572 9 | 0.945 12 | 0.448 10 | 0.232 9 | 0.603 8 | 0.813 6 | 0.591 12 | 0.775 9 | 0.537 12 | 0.885 10 | 0.816 9 | 0.608 10 | |
Viewpoint_BN_LA_AIR | 0.669 9 | 0.847 8 | 0.732 11 | 0.724 11 | 0.613 12 | 0.827 9 | 0.443 7 | 0.742 6 | 0.562 6 | 0.551 10 | 0.947 10 | 0.441 12 | 0.218 10 | 0.650 7 | 0.753 9 | 0.621 9 | 0.765 10 | 0.601 6 | 0.905 8 | 0.814 12 | 0.618 9 | |
Liyi Luo, Beiwen Tian, Hao Zhao, Guyue Zhou: Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck. | ||||||||||||||||||||||
LE | 0.688 7 | 0.856 7 | 0.779 6 | 0.754 7 | 0.687 4 | 0.834 8 | 0.438 8 | 0.732 7 | 0.536 9 | 0.577 6 | 0.948 6 | 0.508 2 | 0.248 7 | 0.699 3 | 0.831 3 | 0.636 8 | 0.752 11 | 0.586 9 | 0.895 9 | 0.821 7 | 0.643 6 | |
Scratch_LA_SEM | 0.643 12 | 0.699 13 | 0.793 5 | 0.718 12 | 0.636 10 | 0.816 11 | 0.411 11 | 0.707 8 | 0.490 12 | 0.574 8 | 0.948 6 | 0.448 10 | 0.173 13 | 0.559 10 | 0.689 12 | 0.604 11 | 0.722 12 | 0.556 11 | 0.853 12 | 0.820 8 | 0.651 5 | |
SQN_LA | 0.598 13 | 0.741 11 | 0.681 13 | 0.766 3 | 0.482 13 | 0.805 12 | 0.389 13 | 0.658 13 | 0.499 11 | 0.437 13 | 0.936 13 | 0.386 13 | 0.243 8 | 0.422 13 | 0.663 13 | 0.552 13 | 0.700 13 | 0.519 13 | 0.809 13 | 0.750 13 | 0.515 13 | |
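As a sanity check on how to read the rows, the sketch below parses one table row and compares its avg iou value with the unweighted mean of its 20 class IoUs. The `parse_row` helper is hypothetical, and reading each cell as "IoU, then rank" is an interpretation of the table layout rather than a documented format; the row is the GaIA entry copied from the table.

```python
def parse_row(row):
    """Split a pipe-delimited result row into (method, [(iou, rank), ...])."""
    cells = [c.strip() for c in row.split("|")]
    method = cells[0]
    scores = []
    for cell in cells[1:]:
        parts = cell.split()
        if len(parts) == 2:  # skip empty cells such as the trailing one
            scores.append((float(parts[0]), int(parts[1])))
    return method, scores


row = (
    "GaIA | 0.685 8 | 0.759 10 | 0.834 1 | 0.759 5 | 0.650 8 | 0.859 3 | "
    "0.427 10 | 0.694 10 | 0.524 10 | 0.575 7 | 0.948 6 | 0.537 1 | "
    "0.304 3 | 0.534 12 | 0.853 2 | 0.678 7 | 0.820 1 | 0.581 10 | "
    "0.914 4 | 0.828 5 | 0.626 8 | |"
)

method, scores = parse_row(row)
avg_iou = scores[0][0]                     # first value: avg iou
class_ious = [iou for iou, _ in scores[1:]]  # remaining 20 class IoUs
mean_of_classes = sum(class_ious) / len(class_ious)

print(method, avg_iou, round(mean_of_classes, 3))
```

For this row the two values agree to three decimals, which suggests that avg iou is the plain mean over the 20 classes; the other rows can be checked the same way.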