3D Semantic Label with Limited Annotations Benchmark
The 3D semantic labeling task involves predicting a semantic label for every vertex of a 3D scan mesh.
Evaluation and metrics
Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative vertices, respectively. Predicted labels are evaluated per-vertex over the respective 3D scan mesh; 3D approaches that operate on other representations, such as grids or points, should map their predicted labels onto the mesh vertices (an example of mapping grid predictions to mesh vertices is provided in the evaluation helpers).
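The IoU formula can be illustrated with a minimal sketch that counts TP, FP, and FN per class over per-vertex labels. This is not the benchmark's evaluation script: the function name `per_class_iou`, the toy labels, and the class ids are hypothetical. It does not handle details such as unannotated vertices, which an official evaluation may treat differently.

```python
from collections import Counter


def per_class_iou(gt_labels, pred_labels, class_ids):
    """Return {class_id: IoU} for per-vertex labels (hypothetical helper).

    IoU = TP / (TP + FP + FN). A class that appears in neither the ground
    truth nor the prediction has an undefined IoU and is reported as None.
    """
    if len(gt_labels) != len(pred_labels):
        raise ValueError("ground truth and prediction must cover the same vertices")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gt, pred in zip(gt_labels, pred_labels):
        if gt == pred:
            tp[gt] += 1      # vertex correctly labeled as class gt
        else:
            fn[gt] += 1      # a vertex of class gt was missed
            fp[pred] += 1    # a vertex was wrongly assigned to class pred

    ious = {}
    for c in class_ids:
        denom = tp[c] + fp[c] + fn[c]
        ious[c] = tp[c] / denom if denom else None
    return ious


# Toy example: six mesh vertices and three made-up class ids.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(gt, pred, class_ids=[0, 1, 2])
print(ious)
```

In the toy example, class 0 has one true positive, one false positive, and one false negative, so its IoU is 1/3. Classes 1 and 2 come out at 2/3 and 1/2.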
The table below lists the benchmark results for the 3D semantic label with limited annotations scenario. Each cell shows the IoU followed by the method's rank in that column.
Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Q2E | | 0.741 1 | 0.984 1 | 0.821 2 | 0.757 4 | 0.739 1 | 0.868 2 | 0.600 1 | 0.849 1 | 0.595 6 | 0.659 1 | 0.971 2 | 0.490 2 | 0.299 2 | 0.689 4 | 0.822 3 | 0.749 1 | 0.788 4 | 0.641 1 | 0.935 2 | 0.860 1 | 0.699 2 |
ActiveST | Gengxin Liu, Oliver van Kaick, Hui Huang, Ruizhen Hu: Active Self-Training for Weakly Supervised 3D Scene Semantic Segmentation. | 0.735 2 | 0.983 2 | 0.769 4 | 0.798 1 | 0.701 2 | 0.852 5 | 0.527 2 | 0.801 2 | 0.680 1 | 0.629 2 | 0.973 1 | 0.447 10 | 0.312 1 | 0.757 1 | 0.799 4 | 0.747 2 | 0.795 3 | 0.632 2 | 0.952 1 | 0.855 2 | 0.684 3 |
WS3D_LA_Sem | Kangcheng Liu: WS3D: Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination. European Conference on Computer Vision (ECCV), 2022 | 0.689 4 | 0.879 3 | 0.753 6 | 0.798 1 | 0.648 8 | 0.816 9 | 0.421 10 | 0.796 3 | 0.604 5 | 0.603 3 | 0.945 10 | 0.457 9 | 0.204 9 | 0.559 10 | 0.851 2 | 0.724 3 | 0.760 7 | 0.630 3 | 0.903 7 | 0.821 5 | 0.603 8 |
GaIA | Min Seok Lee*, Seok Woo Yang*, and Sung Won Han: GaIA: Graphical Information gain based Attention Network for Weakly Supervised 3D Point Cloud Semantic Segmentation. WACV 2023 | 0.682 6 | 0.731 11 | 0.846 1 | 0.713 8 | 0.657 6 | 0.869 1 | 0.475 4 | 0.705 9 | 0.452 13 | 0.569 5 | 0.951 5 | 0.563 1 | 0.290 3 | 0.544 11 | 0.799 4 | 0.677 5 | 0.810 1 | 0.618 4 | 0.900 8 | 0.821 5 | 0.642 5 |
VIBUS | Beiwen Tian, Liyi Luo, Hao Zhao, Guyue Zhou: VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling. ISPRS Journal of Photogrammetry and Remote Sensing | 0.684 5 | 0.848 4 | 0.752 7 | 0.708 9 | 0.691 3 | 0.861 3 | 0.474 5 | 0.770 5 | 0.611 4 | 0.538 9 | 0.951 5 | 0.478 6 | 0.275 4 | 0.676 5 | 0.671 11 | 0.649 8 | 0.788 4 | 0.610 5 | 0.869 9 | 0.808 10 | 0.657 4 |
DE-3DLearner LA | Ping-Chung Yu, Cheng Sun, Min Sun: Data Efficient 3D Learner via Knowledge Transferred from 2D Model. ECCV 2022 | 0.704 3 | 0.774 7 | 0.766 5 | 0.764 3 | 0.687 4 | 0.832 7 | 0.413 11 | 0.790 4 | 0.639 2 | 0.599 4 | 0.952 4 | 0.478 6 | 0.222 8 | 0.746 2 | 0.859 1 | 0.678 4 | 0.806 2 | 0.607 6 | 0.915 5 | 0.847 3 | 0.703 1 |
LE | | 0.680 7 | 0.744 9 | 0.731 9 | 0.727 6 | 0.664 5 | 0.859 4 | 0.427 9 | 0.759 6 | 0.562 7 | 0.562 6 | 0.948 7 | 0.480 4 | 0.245 6 | 0.735 3 | 0.765 6 | 0.648 10 | 0.786 6 | 0.591 7 | 0.931 3 | 0.817 7 | 0.624 7 |
Viewpoint_BN_LA_AIR | Liyi Luo, Beiwen Tian, Hao Zhao, Guyue Zhou: Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck. | 0.650 9 | 0.778 6 | 0.731 9 | 0.688 11 | 0.617 11 | 0.812 11 | 0.446 7 | 0.739 8 | 0.618 3 | 0.540 8 | 0.945 10 | 0.415 11 | 0.204 9 | 0.623 7 | 0.676 10 | 0.594 11 | 0.744 10 | 0.576 8 | 0.868 10 | 0.811 8 | 0.582 10 |
PointContrast_LA_SEM | | 0.636 11 | 0.694 12 | 0.738 8 | 0.731 5 | 0.653 7 | 0.817 8 | 0.467 6 | 0.651 12 | 0.517 8 | 0.522 10 | 0.946 8 | 0.479 5 | 0.198 11 | 0.575 9 | 0.526 13 | 0.649 8 | 0.747 8 | 0.569 9 | 0.845 11 | 0.803 11 | 0.600 9 |
CSC_LA_SEM | | 0.644 10 | 0.761 8 | 0.707 12 | 0.703 10 | 0.642 10 | 0.813 10 | 0.436 8 | 0.659 11 | 0.502 9 | 0.516 11 | 0.945 10 | 0.487 3 | 0.238 7 | 0.538 12 | 0.678 9 | 0.659 6 | 0.739 12 | 0.568 10 | 0.915 5 | 0.811 8 | 0.566 12 |
One-Thing-One-Click | Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu: One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation. CVPR 2021 | 0.670 8 | 0.734 10 | 0.815 3 | 0.661 13 | 0.644 9 | 0.841 6 | 0.509 3 | 0.741 7 | 0.479 12 | 0.548 7 | 0.968 3 | 0.461 8 | 0.251 5 | 0.664 6 | 0.754 7 | 0.656 7 | 0.744 10 | 0.541 11 | 0.917 4 | 0.844 4 | 0.625 6 |
Scratch_LA_SEM | | 0.621 12 | 0.802 5 | 0.715 11 | 0.687 12 | 0.570 12 | 0.800 12 | 0.386 12 | 0.703 10 | 0.486 11 | 0.514 12 | 0.946 8 | 0.390 12 | 0.181 12 | 0.620 8 | 0.670 12 | 0.487 13 | 0.746 9 | 0.539 12 | 0.804 12 | 0.798 12 | 0.580 11 |
SQN_LA | | 0.576 13 | 0.674 13 | 0.670 13 | 0.722 7 | 0.454 13 | 0.790 13 | 0.342 13 | 0.622 13 | 0.487 10 | 0.427 13 | 0.933 13 | 0.357 13 | 0.157 13 | 0.452 13 | 0.721 8 | 0.492 12 | 0.696 13 | 0.487 13 | 0.790 13 | 0.748 13 | 0.507 13 |
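The page does not state how the avg iou column is derived, but the numbers appear consistent with an unweighted mean of the 20 per-class IoUs. The short check below uses the Q2E row as a worked example; the variable names are illustrative, and the result is an observation from the table rather than a documented rule.

```python
# Per-class IoUs for the Q2E row, in table column order (bathtub ... window).
q2e_class_ious = [
    0.984, 0.821, 0.757, 0.739, 0.868, 0.600, 0.849, 0.595, 0.659, 0.971,
    0.490, 0.299, 0.689, 0.822, 0.749, 0.788, 0.641, 0.935, 0.860, 0.699,
]
reported_avg_iou = 0.741  # value listed in the avg iou column for Q2E

# Unweighted mean over the 20 classes.
mean_iou = sum(q2e_class_ious) / len(q2e_class_ious)

# The class values are themselves rounded to three decimals, so compare
# with a tolerance instead of expecting exact equality.
matches = abs(mean_iou - reported_avg_iou) < 1e-3
print(f"mean of class IoUs: {mean_iou:.4f}, reported: {reported_avg_iou}, match: {matches}")
```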