3D Semantic Segmentation

This task involves predicting a semantic labeling of a 3D scan mesh, which is reconstructed from the laser scanner point cloud. Submissions must provide a semantic label for each vertex of the 3D mesh.

Evaluation and Metrics

The mesh surface is manually annotated with a long-tail set of labels, from which we pick the top 100 classes for the benchmark.

We rank methods by mean intersection-over-union (mIoU), following the ScanNet benchmark. IoU = TP / (TP + FP + FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative 3D vertices, respectively.
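For reference, a minimal sketch of the per-class IoU computation over per-vertex label arrays (pred and gt are assumed to be integer arrays of equal length; this is illustrative, not the official evaluation script):

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """Return IoU = TP / (TP + FP + FN) per class; NaN where a class is absent."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        if denom > 0:
            ious[c] = tp / denom
    return ious  # mIoU = np.nanmean(ious)
```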

Predicted labels are evaluated per-vertex over the vertices of the 5%-decimated 3D scan mesh (mesh_aligned_0.05.ply); 3D approaches that operate on other representations, such as grids or points, should map their predicted labels onto the mesh vertices, e.g. as sketched below.
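One common way to transfer labels from a predicted point cloud (or voxel centers) to the mesh vertices is a nearest-neighbor lookup; a sketch using scipy (the function and variable names here are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def map_labels_to_vertices(points: np.ndarray, point_labels: np.ndarray,
                           mesh_vertices: np.ndarray) -> np.ndarray:
    """Assign each mesh vertex the label of its nearest predicted point."""
    tree = cKDTree(points)                  # index over the predicted points
    _, nearest = tree.query(mesh_vertices)  # nearest point index per vertex
    return point_labels[nearest]
```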

Evaluation excludes anonymized vertices; these are listed in mesh_aligned_0.05_mask.txt.
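A sketch of applying this mask before scoring (the exact file format is an assumption here; verify it against the dataset documentation):

```python
import numpy as np

def filter_anonymized(pred: np.ndarray, gt: np.ndarray,
                      mask_path: str = "mesh_aligned_0.05_mask.txt"):
    """Drop anonymized vertices before scoring.

    Assumes one value per vertex with nonzero meaning anonymized; if the
    file instead lists anonymized vertex indices, build the boolean mask
    from those indices first.
    """
    mask = np.loadtxt(mask_path).astype(bool)
    return pred[~mask], gt[~mask]
```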

Multilabel Evaluation

The ScanNet++ ground truth contains multilabels: in ambiguous cases, a vertex may have more than one label. Hence, we evaluate both the Top-1 and Top-3 performance of methods.

Submissions may contain either a single prediction per vertex or 3 predictions per vertex.

Top-1 evaluation considers the top prediction for each vertex, counting it as correct if it matches any ground-truth label for that vertex.

Top-3 evaluation considers the top 3 predictions for each vertex, counting a vertex as correct if any of the 3 predictions matches the ground truth. For multilabeled vertices, all ground-truth labels must be present among the top 3 predictions for the vertex to be counted as correct. Submissions with a single prediction per vertex are evaluated as if they had 3 predictions per vertex, with the same prediction repeated 3 times. Both rules are sketched below.
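A compact sketch of these per-vertex correctness rules (preds is assumed to be a (num_vertices, 3) integer array and gt_labels a list of per-vertex label sets; illustrative only):

```python
import numpy as np

def top1_correct(preds: np.ndarray, gt_labels: list[set[int]]) -> np.ndarray:
    """Top-1: the first prediction must match any ground-truth label."""
    return np.array([p[0] in g for p, g in zip(preds, gt_labels)])

def top3_correct(preds: np.ndarray, gt_labels: list[set[int]]) -> np.ndarray:
    """Top-3: all ground-truth labels must appear among the 3 predictions.

    For single-label vertices this reduces to 'any of the 3 matches'.
    """
    return np.array([g.issubset(set(p)) for p, g in zip(preds, gt_labels)])

# A single-prediction submission is scored with the prediction repeated:
# preds = np.repeat(single_preds.reshape(-1, 1), 3, axis=1)
```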

Results

The benchmark is currently evaluated on the v1 (Nov 2023) version of the dataset.

Each row lists the method name, its mIoU, and the per-class IoU in this class order: AIR VENT, BACKPACK, BAG, BASKET, BED, BINDER, BLANKET, BLIND RAIL, BLINDS, BOARD, BOOK, BOOKSHELF, BOTTLE, BOWL, BOX, BUCKET, CABINET, CEILING, CEILING LAMP, CHAIR, CLOCK, CLOTH, CLOTHES HANGER, COAT HANGER, COMPUTER TOWER, CONTAINER, CRATE, CUP, CURTAIN, CUSHION, CUTTING BOARD, DOOR, DOORFRAME, ELECTRICAL DUCT, EXHAUST FAN, FILE FOLDER, FLOOR, HEADPHONES, HEATER, JACKET, JAR, KETTLE, KEYBOARD, KITCHEN CABINET, KITCHEN COUNTER, LAPTOP, LIGHT SWITCH, MARKER, MICROWAVE, MONITOR, MOUSE, OFFICE CHAIR, PAINTING, PAN, PAPER, PAPER BAG, PAPER TOWEL, PICTURE, PILLOW, PIPE, PLANT, PLANT POT, POSTER, POT, POWER STRIP, PRINTER, RACK, REFRIGERATOR, SHELF, SHOE RACK, SHOES, SHOWER WALL, SINK, SLIPPERS, SMOKE DETECTOR, SOAP DISPENSER, SOCKET, SOFA, SPEAKER, SPRAY BOTTLE, STAPLER, STORAGE CABINET, SUITCASE, TABLE, TABLE LAMP, TAP, TELEPHONE, TISSUE BOX, TOILET, TOILET BRUSH, TOILET PAPER, TOWEL, TRASH CAN, TV, WALL, WHITEBOARD, WHITEBOARD ERASER, WINDOW, WINDOW FRAME, WINDOWSILL
PTv3 - PPT 0.464 0.034 0.591 0.427 0.007 0.812 0.000 0.745 0.629 0.876 0.000 0.171 0.494 0.382 0.118 0.507 0.327 0.366 0.908 0.921 0.711 0.732 0.000 0.056 0.083 0.556 0.002 0.145 0.439 0.781 0.020 0.163 0.702 0.396 0.316 0.742 0.002 0.926 0.192 0.801 0.691 0.043 0.414 0.829 0.558 0.218 0.753 0.466 0.000 0.896 0.875 0.749 0.841 0.082 0.302 0.292 0.159 0.263 0.514 0.394 0.729 0.905 0.536 0.148 0.000 0.372 0.447 0.005 0.834 0.254 0.309 0.406 0.212 0.779 0.354 0.423 0.731 0.227 0.819 0.077 0.056 0.058 0.541 0.588 0.775 0.846 0.447 0.781 0.247 0.919 0.678 0.335 0.617 0.779 0.983 0.830 0.810 0.390 0.630 0.420 0.689
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao. Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training. CVPR 2024
PTv3 0.458 0.057 0.613 0.423 0.023 0.710 0.000 0.707 0.626 0.871 0.002 0.239 0.524 0.421 0.204 0.452 0.140 0.383 0.915 0.915 0.749 0.706 0.000 0.052 0.010 0.444 0.047 0.436 0.472 0.772 0.230 0.194 0.710 0.389 0.277 0.757 0.000 0.933 0.155 0.773 0.687 0.106 0.797 0.855 0.615 0.263 0.704 0.461 0.000 0.814 0.878 0.766 0.879 0.000 0.210 0.228 0.138 0.245 0.251 0.421 0.746 0.871 0.545 0.020 0.000 0.426 0.318 0.002 0.729 0.275 0.305 0.397 0.182 0.665 0.335 0.454 0.721 0.205 0.796 0.191 0.051 0.046 0.491 0.550 0.771 0.745 0.467 0.811 0.258 0.904 0.679 0.258 0.542 0.743 0.857 0.825 0.758 0.463 0.618 0.458 0.691
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao. Point Transformer V3: Simpler, Faster, Stronger. CVPR 2024 Oral
PT-Fusion-All 0.450 0.058 0.467 0.301 0.076 0.815 0.000 0.689 0.566 0.821 0.080 0.346 0.578 0.378 0.136 0.468 0.211 0.342 0.925 0.935 0.723 0.596 0.000 0.058 0.123 0.512 0.003 0.712 0.447 0.807 0.203 0.242 0.674 0.417 0.322 0.669 0.000 0.945 0.124 0.803 0.599 0.094 0.410 0.841 0.616 0.274 0.414 0.471 0.000 0.905 0.885 0.724 0.856 0.150 0.176 0.185 0.005 0.000 0.372 0.443 0.764 0.857 0.525 0.038 0.031 0.346 0.186 0.000 0.734 0.242 0.257 0.370 0.142 0.729 0.378 0.471 0.768 0.481 0.785 0.041 0.022 0.056 0.408 0.601 0.781 0.758 0.623 0.778 0.224 0.922 0.668 0.244 0.493 0.767 0.898 0.835 0.764 0.411 0.535 0.445 0.660
PTv2 0.427 0.073 0.463 0.219 0.003 0.679 0.000 0.667 0.597 0.873 0.007 0.187 0.523 0.435 0.295 0.461 0.101 0.369 0.916 0.902 0.712 0.727 0.000 0.053 0.037 0.546 0.090 0.549 0.412 0.791 0.009 0.035 0.671 0.356 0.316 0.660 0.003 0.934 0.050 0.758 0.651 0.058 0.621 0.811 0.596 0.227 0.218 0.428 0.021 0.812 0.853 0.740 0.799 0.000 0.016 0.262 0.000 0.236 0.192 0.346 0.766 0.862 0.390 0.042 0.000 0.259 0.278 0.000 0.605 0.290 0.265 0.399 0.196 0.718 0.541 0.447 0.676 0.232 0.756 0.251 0.046 0.051 0.457 0.541 0.737 0.650 0.488 0.721 0.242 0.880 0.685 0.272 0.496 0.674 0.776 0.823 0.800 0.335 0.609 0.394 0.670
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao. Point Transformer V2: Grouped Vector Attention and Partition-based Pooling. NeurIPS 2022
OA-CNNs-Small 0.392 0.000 0.478 0.137 0.021 0.649 0.000 0.523 0.541 0.879 0.072 0.161 0.523 0.339 0.121 0.374 0.168 0.329 0.895 0.902 0.683 0.625 0.000 0.055 0.035 0.483 0.001 0.140 0.386 0.769 0.093 0.098 0.629 0.378 0.323 0.573 0.011 0.932 0.247 0.757 0.642 0.053 0.401 0.736 0.535 0.164 0.137 0.448 0.015 0.786 0.874 0.703 0.801 0.022 0.193 0.128 0.208 0.217 0.183 0.279 0.721 0.861 0.338 0.125 0.000 0.189 0.239 0.010 0.540 0.244 0.296 0.359 0.066 0.634 0.198 0.416 0.620 0.201 0.811 0.152 0.015 0.047 0.415 0.400 0.739 0.445 0.518 0.634 0.152 0.860 0.572 0.195 0.468 0.577 0.941 0.810 0.747 0.312 0.510 0.330 0.646
Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia. OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation. CVPR 2024
PonderV2-SparseUNet-base 0.386 0.000 0.389 0.110 0.054 0.739 0.000 0.565 0.530 0.822 0.000 0.382 0.548 0.325 0.000 0.361 0.315 0.396 0.917 0.929 0.730 0.547 0.000 0.050 0.000 0.445 0.000 0.339 0.471 0.832 0.232 0.212 0.715 0.457 0.250 0.397 0.000 0.945 0.000 0.775 0.623 0.000 0.550 0.804 0.526 0.199 0.404 0.473 0.000 0.735 0.874 0.759 0.858 0.000 0.003 0.154 0.000 0.118 0.230 0.377 0.770 0.844 0.325 0.037 0.000 0.000 0.139 0.034 0.750 0.289 0.237 0.372 0.107 0.668 0.315 0.404 0.000 0.445 0.738 0.164 0.000 0.037 0.442 0.481 0.765 0.601 0.350 0.634 0.093 0.894 0.000 0.155 0.299 0.743 0.910 0.834 0.775 0.000 0.478 0.342 0.641
Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang. PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm. arXiv 2023
MinkowskiNet 0.292 0.000 0.323 0.116 0.039 0.719 0.000 0.418 0.031 0.726 0.005 0.217 0.481 0.178 0.000 0.212 0.084 0.286 0.879 0.880 0.627 0.211 0.000 0.002 0.278 0.377 0.000 0.207 0.320 0.737 0.001 0.000 0.579 0.252 0.170 0.047 0.000 0.923 0.000 0.603 0.614 0.000 0.149 0.506 0.451 0.240 0.025 0.011 0.000 0.531 0.837 0.369 0.815 0.028 0.000 0.069 0.000 0.130 0.238 0.357 0.721 0.842 0.372 0.086 0.000 0.008 0.033 0.029 0.386 0.315 0.046 0.281 0.132 0.667 0.064 0.348 0.273 0.200 0.608 0.103 0.000 0.016 0.234 0.236 0.704 0.233 0.046 0.390 0.021 0.703 0.363 0.057 0.214 0.539 0.706 0.761 0.724 0.089 0.428 0.321 0.601
Christopher Choy, JunYoung Gwak, Silvio Savarese. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
KPConv 0.265 0.000 0.283 0.111 0.003 0.675 0.000 0.579 0.158 0.682 0.006 0.193 0.553 0.000 0.000 0.288 0.019 0.298 0.904 0.928 0.687 0.000 0.000 0.000 0.000 0.435 0.000 0.271 0.126 0.777 0.004 0.000 0.577 0.165 0.211 0.000 0.000 0.925 0.000 0.645 0.560 0.000 0.000 0.595 0.527 0.200 0.016 0.000 0.000 0.638 0.848 0.000 0.813 0.075 0.000 0.087 0.000 0.000 0.174 0.329 0.814 0.823 0.144 0.009 0.000 0.000 0.002 0.018 0.235 0.326 0.187 0.327 0.007 0.623 0.000 0.000 0.000 0.000 0.705 0.000 0.000 0.000 0.306 0.228 0.689 0.407 0.000 0.375 0.000 0.798 0.000 0.000 0.109 0.558 0.752 0.748 0.660 0.000 0.429 0.299 0.564
Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, Leonidas J. Guibas. KPConv: Flexible and Deformable Convolution for Point Clouds. ICCV 2019
PointNet++ 0.201 0.000 0.128 0.061 0.000 0.426 0.000 0.466 0.161 0.676 0.000 0.154 0.404 0.146 0.000 0.194 0.031 0.221 0.894 0.918 0.515 0.000 0.000 0.000 0.000 0.303 0.000 0.000 0.090 0.396 0.000 0.000 0.292 0.081 0.047 0.000 0.000 0.912 0.000 0.578 0.462 0.000 0.014 0.408 0.269 0.228 0.017 0.000 0.000 0.281 0.775 0.087 0.670 0.001 0.000 0.000 0.000 0.000 0.060 0.282 0.700 0.735 0.000 0.000 0.000 0.000 0.016 0.000 0.000 0.201 0.010 0.254 0.000 0.458 0.000 0.000 0.000 0.105 0.542 0.000 0.000 0.000 0.251 0.075 0.593 0.142 0.000 0.173 0.000 0.467 0.000 0.000 0.062 0.481 0.705 0.698 0.586 0.000 0.446 0.255 0.515
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. NIPS 2017

Please refer to the submission instructions before making a submission.
