The 2D semantic labeling task involves predicting a semantic label for every pixel of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union (IoU) metric, computed separately for each class: IoU = TP / (TP + FP + FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
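The IoU definition can be sketched in plain Python. This is a minimal illustration, not the benchmark's evaluation script. It rests on three assumptions:

- Label maps arrive as flat lists of integer class ids.
- TP, FP, and FN are accumulated over all images before dividing, which is the PASCAL VOC convention.
- Ground-truth pixels carrying a hypothetical "unannotated" id (255 here) are skipped.

The function names `accumulate_counts`, `iou_per_class`, and `mean_iou` are illustrative only.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Counts = Tuple[List[int], List[int], List[int]]


def accumulate_counts(
    pairs: Iterable[Tuple[Sequence[int], Sequence[int]]],
    num_classes: int,
    ignore_label: Optional[int] = None,
) -> Counts:
    """Sum per-class TP, FP and FN pixel counts over (ground_truth, prediction) pairs.

    Each label map is a flat sequence of class ids in [0, num_classes), one per pixel.
    Ground-truth pixels equal to ignore_label (a hypothetical id) are skipped.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for gt, pred in pairs:
        if len(gt) != len(pred):
            raise ValueError("label maps must contain the same number of pixels")
        for g, p in zip(gt, pred):
            if ignore_label is not None and g == ignore_label:
                continue  # unannotated pixel: contributes to no count
            if g == p:
                tp[g] += 1  # correctly labeled pixel of class g
            else:
                fn[g] += 1  # class g was missed at this pixel
                fp[p] += 1  # class p was predicted where it does not belong
    return tp, fp, fn


def iou_per_class(tp: List[int], fp: List[int], fn: List[int]) -> List[Optional[float]]:
    """IoU = TP / (TP + FP + FN) for each class; None if the class never occurs."""
    ious: List[Optional[float]] = []
    for t, f_pos, f_neg in zip(tp, fp, fn):
        denom = t + f_pos + f_neg
        ious.append(t / denom if denom > 0 else None)
    return ious


def mean_iou(ious: List[Optional[float]]) -> float:
    """Unweighted mean over the classes whose IoU is defined."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else float("nan")


# Toy example: 3 classes, one 2x3 image flattened row by row;
# 255 is a hypothetical "unannotated" id.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
tp, fp, fn = accumulate_counts([(gt, pred)], num_classes=3, ignore_label=255)
ious = iou_per_class(tp, fp, fn)
print(tp, fp, fn)  # [1, 2, 1] [0, 1, 0] [1, 0, 0]
print(ious, mean_iou(ious))
```

In the toy example, class 0 gets IoU 1/2, class 1 gets 2/3, and class 2 gets 1/1. Skipping classes with an empty denominator in `mean_iou` is a choice made for this sketch; how the benchmark treats such classes is not stated here.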



This table lists the benchmark results for the 2D semantic labeling task.


Each cell gives the IoU followed by the method's rank for that column in parentheses. The Info column holds the license tag (copyleft or permissive) shown on the benchmark page, where present.
Method | Info | avg IoU | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
Virtual MVFusion (R) |  | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (2) | 0.512 (1) | 0.422 (17) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1)
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
BPNet_2D | copyleft | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (3) | 0.481 (2) | 0.451 (13) | 0.769 (4) | 0.656 (3) | 0.567 (4) | 0.931 (3) | 0.395 (6) | 0.390 (5) | 0.700 (4) | 0.534 (4) | 0.689 (10) | 0.770 (2) | 0.574 (3) | 0.865 (8) | 0.831 (3) | 0.675 (5)
Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
MVF-GNN(2D) |  | 0.636 (3) | 0.606 (13) | 0.794 (4) | 0.434 (16) | 0.688 (1) | 0.337 (7) | 0.464 (12) | 0.798 (3) | 0.632 (5) | 0.589 (3) | 0.908 (8) | 0.420 (2) | 0.329 (12) | 0.743 (2) | 0.594 (2) | 0.738 (2) | 0.676 (5) | 0.527 (4) | 0.906 (2) | 0.818 (6) | 0.715 (3)
CU-Hybrid-2D Net |  | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (22) | 0.648 (4) | 0.463 (3) | 0.549 (2) | 0.742 (7) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (6) | 0.684 (7) | 0.381 (17) | 0.732 (3) | 0.723 (3) | 0.599 (2) | 0.827 (15) | 0.851 (2) | 0.634 (7)
CMX |  | 0.613 (5) | 0.681 (8) | 0.725 (11) | 0.502 (12) | 0.634 (6) | 0.297 (17) | 0.478 (10) | 0.830 (2) | 0.651 (4) | 0.537 (7) | 0.924 (4) | 0.375 (7) | 0.315 (14) | 0.686 (6) | 0.451 (13) | 0.714 (5) | 0.543 (20) | 0.504 (6) | 0.894 (6) | 0.823 (5) | 0.688 (4)
DMMF_3d |  | 0.605 (6) | 0.651 (9) | 0.744 (9) | 0.782 (3) | 0.637 (5) | 0.387 (4) | 0.536 (3) | 0.732 (8) | 0.590 (7) | 0.540 (6) | 0.856 (20) | 0.359 (11) | 0.306 (15) | 0.596 (13) | 0.539 (3) | 0.627 (19) | 0.706 (4) | 0.497 (8) | 0.785 (20) | 0.757 (18) | 0.476 (21)
EMSANet |  | 0.600 (7) | 0.716 (4) | 0.746 (8) | 0.395 (18) | 0.614 (9) | 0.382 (5) | 0.523 (4) | 0.713 (10) | 0.571 (11) | 0.503 (10) | 0.922 (6) | 0.404 (5) | 0.397 (4) | 0.655 (8) | 0.400 (15) | 0.626 (20) | 0.663 (6) | 0.469 (13) | 0.900 (4) | 0.827 (4) | 0.577 (13)
Daniel Seichter, Söhnke Fischedick, Mona Köhler, Horst-Michael Gross: EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments. IJCNN 2022
MCA-Net |  | 0.595 (8) | 0.533 (19) | 0.756 (7) | 0.746 (4) | 0.590 (10) | 0.334 (9) | 0.506 (7) | 0.670 (14) | 0.587 (8) | 0.500 (12) | 0.905 (10) | 0.366 (10) | 0.352 (9) | 0.601 (12) | 0.506 (7) | 0.669 (16) | 0.648 (9) | 0.501 (7) | 0.839 (14) | 0.769 (14) | 0.516 (20)
RFBNet |  | 0.592 (9) | 0.616 (10) | 0.758 (6) | 0.659 (5) | 0.581 (11) | 0.330 (10) | 0.469 (11) | 0.655 (17) | 0.543 (14) | 0.524 (8) | 0.924 (4) | 0.355 (12) | 0.336 (11) | 0.572 (16) | 0.479 (9) | 0.671 (14) | 0.648 (9) | 0.480 (10) | 0.814 (18) | 0.814 (7) | 0.614 (10)
FAN_NV_RVC |  | 0.586 (10) | 0.510 (20) | 0.764 (5) | 0.079 (25) | 0.620 (8) | 0.330 (10) | 0.494 (8) | 0.753 (5) | 0.573 (9) | 0.556 (5) | 0.884 (15) | 0.405 (4) | 0.303 (16) | 0.718 (3) | 0.452 (12) | 0.672 (13) | 0.658 (7) | 0.509 (5) | 0.898 (5) | 0.813 (8) | 0.727 (2)
DCRedNet |  | 0.583 (11) | 0.682 (7) | 0.723 (12) | 0.542 (11) | 0.510 (19) | 0.310 (14) | 0.451 (13) | 0.668 (15) | 0.549 (13) | 0.520 (9) | 0.920 (7) | 0.375 (7) | 0.446 (2) | 0.528 (19) | 0.417 (14) | 0.670 (15) | 0.577 (17) | 0.478 (11) | 0.862 (9) | 0.806 (9) | 0.628 (9)
MIX6D_RVC |  | 0.582 (12) | 0.695 (5) | 0.687 (16) | 0.225 (20) | 0.632 (7) | 0.328 (12) | 0.550 (1) | 0.748 (6) | 0.623 (6) | 0.494 (15) | 0.890 (13) | 0.350 (14) | 0.254 (22) | 0.688 (5) | 0.454 (11) | 0.716 (4) | 0.597 (16) | 0.489 (9) | 0.881 (7) | 0.768 (15) | 0.575 (14)
SSMA | copyleft | 0.577 (13) | 0.695 (5) | 0.716 (14) | 0.439 (14) | 0.563 (13) | 0.314 (13) | 0.444 (15) | 0.719 (9) | 0.551 (12) | 0.503 (10) | 0.887 (14) | 0.346 (15) | 0.348 (10) | 0.603 (11) | 0.353 (19) | 0.709 (6) | 0.600 (14) | 0.457 (14) | 0.901 (3) | 0.786 (10) | 0.599 (12)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
UNIV_CNP_RVC_UE |  | 0.566 (14) | 0.569 (18) | 0.686 (18) | 0.435 (15) | 0.524 (16) | 0.294 (18) | 0.421 (18) | 0.712 (11) | 0.543 (14) | 0.463 (17) | 0.872 (16) | 0.320 (16) | 0.363 (8) | 0.611 (10) | 0.477 (10) | 0.686 (11) | 0.627 (11) | 0.443 (17) | 0.862 (9) | 0.775 (13) | 0.639 (6)
EMSAFormer |  | 0.564 (15) | 0.581 (15) | 0.736 (10) | 0.564 (10) | 0.546 (15) | 0.219 (22) | 0.517 (5) | 0.675 (13) | 0.486 (19) | 0.427 (21) | 0.904 (11) | 0.352 (13) | 0.320 (13) | 0.589 (14) | 0.528 (5) | 0.708 (7) | 0.464 (23) | 0.413 (21) | 0.847 (13) | 0.786 (10) | 0.611 (11)
SN_RN152pyrx8_RVC | copyleft | 0.546 (16) | 0.572 (16) | 0.663 (20) | 0.638 (7) | 0.518 (17) | 0.298 (16) | 0.366 (23) | 0.633 (20) | 0.510 (17) | 0.446 (19) | 0.864 (18) | 0.296 (19) | 0.267 (19) | 0.542 (18) | 0.346 (20) | 0.704 (8) | 0.575 (18) | 0.431 (18) | 0.853 (12) | 0.766 (16) | 0.630 (8)
UDSSEG_RVC |  | 0.545 (17) | 0.610 (12) | 0.661 (21) | 0.588 (8) | 0.556 (14) | 0.268 (20) | 0.482 (9) | 0.642 (19) | 0.572 (10) | 0.475 (16) | 0.836 (22) | 0.312 (17) | 0.367 (7) | 0.630 (9) | 0.189 (22) | 0.639 (18) | 0.495 (22) | 0.452 (15) | 0.826 (16) | 0.756 (19) | 0.541 (16)
segfomer with 6d |  | 0.542 (18) | 0.594 (14) | 0.687 (16) | 0.146 (23) | 0.579 (12) | 0.308 (15) | 0.515 (6) | 0.703 (12) | 0.472 (20) | 0.498 (13) | 0.868 (17) | 0.369 (9) | 0.282 (17) | 0.589 (14) | 0.390 (16) | 0.701 (9) | 0.556 (19) | 0.416 (20) | 0.860 (11) | 0.759 (17) | 0.539 (18)
FuseNet | permissive | 0.535 (19) | 0.570 (17) | 0.681 (19) | 0.182 (21) | 0.512 (18) | 0.290 (19) | 0.431 (16) | 0.659 (16) | 0.504 (18) | 0.495 (14) | 0.903 (12) | 0.308 (18) | 0.428 (3) | 0.523 (20) | 0.365 (18) | 0.676 (12) | 0.621 (13) | 0.470 (12) | 0.762 (21) | 0.779 (12) | 0.541 (16)
Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
AdapNet++ | copyleft | 0.503 (20) | 0.613 (11) | 0.722 (13) | 0.418 (17) | 0.358 (25) | 0.337 (7) | 0.370 (22) | 0.479 (23) | 0.443 (21) | 0.368 (23) | 0.907 (9) | 0.207 (22) | 0.213 (24) | 0.464 (23) | 0.525 (6) | 0.618 (21) | 0.657 (8) | 0.450 (16) | 0.788 (19) | 0.721 (22) | 0.408 (24)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
3DMV (2d proj) |  | 0.498 (21) | 0.481 (23) | 0.612 (22) | 0.579 (9) | 0.456 (21) | 0.343 (6) | 0.384 (20) | 0.623 (21) | 0.525 (16) | 0.381 (22) | 0.845 (21) | 0.254 (21) | 0.264 (21) | 0.557 (17) | 0.182 (23) | 0.581 (23) | 0.598 (15) | 0.429 (19) | 0.760 (22) | 0.661 (24) | 0.446 (23)
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV 2018
MSeg1080_RVC | permissive | 0.485 (22) | 0.505 (21) | 0.709 (15) | 0.092 (24) | 0.427 (22) | 0.241 (21) | 0.411 (19) | 0.654 (18) | 0.385 (25) | 0.457 (18) | 0.861 (19) | 0.053 (25) | 0.279 (18) | 0.503 (21) | 0.481 (8) | 0.645 (17) | 0.626 (12) | 0.365 (23) | 0.748 (23) | 0.725 (21) | 0.529 (19)
John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
ILC-PSPNet |  | 0.475 (23) | 0.490 (22) | 0.581 (23) | 0.289 (19) | 0.507 (20) | 0.067 (25) | 0.379 (21) | 0.610 (22) | 0.417 (23) | 0.435 (20) | 0.822 (24) | 0.278 (20) | 0.267 (19) | 0.503 (21) | 0.228 (21) | 0.616 (22) | 0.533 (21) | 0.375 (22) | 0.820 (17) | 0.729 (20) | 0.560 (15)
Enet (reimpl) |  | 0.376 (24) | 0.264 (25) | 0.452 (25) | 0.452 (13) | 0.365 (23) | 0.181 (23) | 0.143 (25) | 0.456 (24) | 0.409 (24) | 0.346 (24) | 0.769 (25) | 0.164 (23) | 0.218 (23) | 0.359 (24) | 0.123 (25) | 0.403 (25) | 0.381 (25) | 0.313 (25) | 0.571 (24) | 0.685 (23) | 0.472 (22)
Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
ScanNet (2d proj) | permissive | 0.330 (25) | 0.293 (24) | 0.521 (24) | 0.657 (6) | 0.361 (24) | 0.161 (24) | 0.250 (24) | 0.004 (25) | 0.440 (22) | 0.183 (25) | 0.836 (22) | 0.125 (24) | 0.060 (25) | 0.319 (25) | 0.132 (24) | 0.417 (24) | 0.412 (24) | 0.344 (24) | 0.541 (25) | 0.427 (25) | 0.109 (25)
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017
DMMF |  | 0.003 (26) | 0.000 (26) | 0.005 (26) | 0.000 (26) | 0.000 (26) | 0.037 (26) | 0.001 (26) | 0.000 (26) | 0.001 (26) | 0.005 (26) | 0.003 (26) | 0.000 (26) | 0.000 (26) | 0.000 (26) | 0.000 (26) | 0.000 (26) | 0.002 (26) | 0.001 (26) | 0.000 (26) | 0.006 (26) | 0.000 (26)
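The published numbers are consistent with avg IoU being the unweighted mean of the 20 per-class IoUs. Below is a small consistency check on the first two rows, using only values copied from the table. The `ROWS` dictionary and the `class_mean` helper are illustrative names, not part of any benchmark tooling. Exact `Fraction` arithmetic is used so that float rounding does not blur the comparison.

```python
from fractions import Fraction
from typing import Dict, Tuple

CLASSES = (
    "bathtub", "bed", "bookshelf", "cabinet", "chair", "counter", "curtain",
    "desk", "door", "floor", "otherfurniture", "picture", "refrigerator",
    "shower curtain", "sink", "sofa", "table", "toilet", "wall", "window",
)

# (published avg IoU, per-class IoUs in CLASSES order), copied from the table.
ROWS: Dict[str, Tuple[str, str]] = {
    "Virtual MVFusion (R)": (
        "0.745",
        "0.861 0.839 0.881 0.672 0.512 0.422 0.898 0.723 0.714 0.954 "
        "0.454 0.509 0.773 0.895 0.756 0.820 0.653 0.935 0.891 0.728",
    ),
    "BPNet_2D": (
        "0.670",
        "0.822 0.795 0.836 0.659 0.481 0.451 0.769 0.656 0.567 0.931 "
        "0.395 0.390 0.700 0.534 0.689 0.770 0.574 0.865 0.831 0.675",
    ),
}


def class_mean(per_class: str) -> Fraction:
    """Exact unweighted mean of the space-separated per-class IoU values."""
    values = [Fraction(v) for v in per_class.split()]
    if len(values) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} values, got {len(values)}")
    return sum(values, Fraction(0)) / len(values)


# Class IoUs and the average are each rounded to 3 decimals,
# so the two can legitimately differ by up to 0.001.
TOLERANCE = Fraction(1, 1000)

for name, (avg, per_class) in ROWS.items():
    mean = class_mean(per_class)
    consistent = abs(mean - Fraction(avg)) <= TOLERANCE
    print(f"{name}: class mean {float(mean):.4f}, published {avg}, consistent: {consistent}")
```

Both rows give a class mean that lands exactly halfway between two 3-decimal values (0.7445 and 0.6695). Each published average is the half-up rounding of that mean.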