The 2D semantic labeling task involves predicting a semantic label for every pixel of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
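As an illustration of the formula above, here is a minimal sketch of a per-class IoU computation from a predicted and a ground-truth label map. It is not the benchmark's official evaluation script: the function name `per_class_iou`, the `ignore_label` parameter, and the toy label maps are hypothetical, and the sketch assumes integer class ids in the range [0, num_classes).

```python
import numpy as np


def per_class_iou(pred, gt, num_classes, ignore_label=None):
    """Return an array of per-class IoU = TP / (TP + FP + FN).

    Hypothetical helper: assumes `pred` and `gt` hold integer class ids
    in [0, num_classes). Classes absent from both maps get NaN.
    """
    # Cast to int64 so the index arithmetic below cannot overflow uint8 maps.
    pred = np.asarray(pred).ravel().astype(np.int64)
    gt = np.asarray(gt).ravel().astype(np.int64)

    if ignore_label is not None:
        # Drop pixels whose ground truth is unannotated (assumed convention).
        keep = gt != ignore_label
        pred, gt = pred[keep], gt[keep]

    # confusion[i, j] = number of pixels with ground truth i predicted as j.
    confusion = np.bincount(
        gt * num_classes + pred, minlength=num_classes * num_classes
    ).reshape(num_classes, num_classes)

    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as class c, but gt differs
    fn = confusion.sum(axis=1) - tp  # gt is class c, but predicted otherwise

    denom = tp + fp + fn
    iou = np.full(num_classes, np.nan)
    np.divide(tp, denom, out=iou, where=denom > 0)
    return iou


# Toy 2x3 label maps with three classes (made-up data for illustration).
gt = np.array([[0, 0, 1], [1, 2, 2]], dtype=np.uint8)
pred = np.array([[0, 1, 1], [1, 2, 0]], dtype=np.uint8)

iou = per_class_iou(pred, gt, num_classes=3)
mean_iou = np.nanmean(iou)  # averaging over classes is an assumption here
```

For the toy maps, class 0 has TP = 1, FP = 1, FN = 1, so its IoU is 1/3. Classes missing from both maps are reported as NaN rather than 0 so that they do not distort an average.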



This table lists the benchmark results for the 2D semantic label scenario.


Each value is a per-class IoU followed by the method's rank for that column; rows are sorted by shower curtain IoU.
Method | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
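Virtual MVFusion (R) | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (2) | 0.512 (1) | 0.422 (18) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1)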
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
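MVF-GNN(2D) | 0.636 (3) | 0.606 (15) | 0.794 (4) | 0.434 (16) | 0.688 (1) | 0.337 (8) | 0.464 (13) | 0.798 (3) | 0.632 (5) | 0.589 (3) | 0.908 (9) | 0.420 (2) | 0.329 (13) | 0.743 (2) | 0.594 (2) | 0.738 (2) | 0.676 (5) | 0.527 (4) | 0.906 (2) | 0.818 (6) | 0.715 (3)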
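DMMF_3d | 0.605 (6) | 0.651 (10) | 0.744 (10) | 0.782 (3) | 0.637 (5) | 0.387 (4) | 0.536 (4) | 0.732 (9) | 0.590 (7) | 0.540 (6) | 0.856 (22) | 0.359 (11) | 0.306 (16) | 0.596 (15) | 0.539 (3) | 0.627 (21) | 0.706 (4) | 0.497 (8) | 0.785 (22) | 0.757 (20) | 0.476 (23)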
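BPNet_2D [copyleft] | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (3) | 0.481 (2) | 0.451 (14) | 0.769 (4) | 0.656 (3) | 0.567 (4) | 0.931 (3) | 0.395 (6) | 0.390 (5) | 0.700 (4) | 0.534 (4) | 0.689 (11) | 0.770 (2) | 0.574 (3) | 0.865 (10) | 0.831 (3) | 0.675 (5)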
Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia and Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
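EMSAFormer | 0.564 (17) | 0.581 (17) | 0.736 (11) | 0.564 (10) | 0.546 (17) | 0.219 (24) | 0.517 (6) | 0.675 (15) | 0.486 (20) | 0.427 (22) | 0.904 (12) | 0.352 (14) | 0.320 (14) | 0.589 (16) | 0.528 (5) | 0.708 (7) | 0.464 (25) | 0.413 (23) | 0.847 (15) | 0.786 (12) | 0.611 (12)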
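AdapNet++ [copyleft] | 0.503 (22) | 0.613 (13) | 0.722 (14) | 0.418 (18) | 0.358 (27) | 0.337 (8) | 0.370 (24) | 0.479 (25) | 0.443 (23) | 0.368 (25) | 0.907 (10) | 0.207 (24) | 0.213 (26) | 0.464 (25) | 0.525 (6) | 0.618 (23) | 0.657 (8) | 0.450 (17) | 0.788 (21) | 0.721 (24) | 0.408 (26)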
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
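DMMF | 0.567 (15) | 0.623 (11) | 0.767 (5) | 0.238 (21) | 0.571 (14) | 0.347 (6) | 0.413 (20) | 0.719 (10) | 0.472 (21) | 0.418 (23) | 0.895 (14) | 0.357 (12) | 0.260 (23) | 0.696 (5) | 0.523 (7) | 0.666 (18) | 0.642 (11) | 0.437 (19) | 0.895 (6) | 0.793 (11) | 0.603 (13)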
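MCA-Net | 0.595 (8) | 0.533 (21) | 0.756 (8) | 0.746 (4) | 0.590 (10) | 0.334 (10) | 0.506 (8) | 0.670 (16) | 0.587 (8) | 0.500 (12) | 0.905 (11) | 0.366 (10) | 0.352 (9) | 0.601 (14) | 0.506 (8) | 0.669 (17) | 0.648 (9) | 0.501 (7) | 0.839 (16) | 0.769 (16) | 0.516 (22)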
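MSeg1080_RVC [permissive] | 0.485 (24) | 0.505 (23) | 0.709 (16) | 0.092 (26) | 0.427 (24) | 0.241 (23) | 0.411 (21) | 0.654 (20) | 0.385 (27) | 0.457 (19) | 0.861 (21) | 0.053 (27) | 0.279 (19) | 0.503 (23) | 0.481 (9) | 0.645 (19) | 0.626 (13) | 0.365 (25) | 0.748 (25) | 0.725 (23) | 0.529 (21)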
John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
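RFBNet | 0.592 (9) | 0.616 (12) | 0.758 (7) | 0.659 (5) | 0.581 (11) | 0.330 (11) | 0.469 (12) | 0.655 (19) | 0.543 (14) | 0.524 (8) | 0.924 (4) | 0.355 (13) | 0.336 (11) | 0.572 (18) | 0.479 (10) | 0.671 (15) | 0.648 (9) | 0.480 (10) | 0.814 (20) | 0.814 (7) | 0.614 (11)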
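UNIV_CNP_RVC_UE | 0.566 (16) | 0.569 (20) | 0.686 (20) | 0.435 (15) | 0.524 (18) | 0.294 (19) | 0.421 (19) | 0.712 (13) | 0.543 (14) | 0.463 (18) | 0.872 (18) | 0.320 (18) | 0.363 (8) | 0.611 (12) | 0.477 (11) | 0.686 (12) | 0.627 (12) | 0.443 (18) | 0.862 (11) | 0.775 (15) | 0.639 (7)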
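MIX6D_RVC | 0.582 (13) | 0.695 (6) | 0.687 (18) | 0.225 (22) | 0.632 (7) | 0.328 (13) | 0.550 (1) | 0.748 (7) | 0.623 (6) | 0.494 (15) | 0.890 (15) | 0.350 (16) | 0.254 (24) | 0.688 (6) | 0.454 (12) | 0.716 (4) | 0.597 (17) | 0.489 (9) | 0.881 (8) | 0.768 (17) | 0.575 (16)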
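FAN_NV_RVC | 0.586 (10) | 0.510 (22) | 0.764 (6) | 0.079 (27) | 0.620 (8) | 0.330 (11) | 0.494 (9) | 0.753 (6) | 0.573 (9) | 0.556 (5) | 0.884 (17) | 0.405 (4) | 0.303 (17) | 0.718 (3) | 0.452 (13) | 0.672 (14) | 0.658 (7) | 0.509 (5) | 0.898 (5) | 0.813 (8) | 0.727 (2)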
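CMX | 0.613 (5) | 0.681 (9) | 0.725 (12) | 0.502 (12) | 0.634 (6) | 0.297 (18) | 0.478 (11) | 0.830 (2) | 0.651 (4) | 0.537 (7) | 0.924 (4) | 0.375 (7) | 0.315 (15) | 0.686 (7) | 0.451 (14) | 0.714 (5) | 0.543 (22) | 0.504 (6) | 0.894 (7) | 0.823 (5) | 0.688 (4)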
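DCRedNet | 0.583 (12) | 0.682 (8) | 0.723 (13) | 0.542 (11) | 0.510 (21) | 0.310 (15) | 0.451 (14) | 0.668 (17) | 0.549 (13) | 0.520 (9) | 0.920 (8) | 0.375 (7) | 0.446 (2) | 0.528 (21) | 0.417 (15) | 0.670 (16) | 0.577 (18) | 0.478 (11) | 0.862 (11) | 0.806 (10) | 0.628 (10)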
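EMSANet | 0.600 (7) | 0.716 (4) | 0.746 (9) | 0.395 (19) | 0.614 (9) | 0.382 (5) | 0.523 (5) | 0.713 (12) | 0.571 (11) | 0.503 (10) | 0.922 (7) | 0.404 (5) | 0.397 (4) | 0.655 (9) | 0.400 (16) | 0.626 (22) | 0.663 (6) | 0.469 (13) | 0.900 (4) | 0.827 (4) | 0.577 (15)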
Daniel Seichter, Söhnke Fischedick, Mona Köhler, Horst-Michael Gross: EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments. IJCNN 2022
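WSGFormer | 0.585 (11) | 0.706 (5) | 0.708 (17) | 0.434 (16) | 0.574 (13) | 0.283 (21) | 0.538 (3) | 0.759 (5) | 0.542 (16) | 0.482 (16) | 0.924 (4) | 0.351 (15) | 0.333 (12) | 0.614 (11) | 0.393 (17) | 0.692 (10) | 0.551 (21) | 0.461 (14) | 0.874 (9) | 0.809 (9) | 0.673 (6)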
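segfomer with 6d | 0.542 (20) | 0.594 (16) | 0.687 (18) | 0.146 (25) | 0.579 (12) | 0.308 (16) | 0.515 (7) | 0.703 (14) | 0.472 (21) | 0.498 (13) | 0.868 (19) | 0.369 (9) | 0.282 (18) | 0.589 (16) | 0.390 (18) | 0.701 (9) | 0.556 (20) | 0.416 (22) | 0.860 (13) | 0.759 (19) | 0.539 (20)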
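CU-Hybrid-2D Net | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (24) | 0.648 (4) | 0.463 (3) | 0.549 (2) | 0.742 (8) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (6) | 0.684 (8) | 0.381 (19) | 0.732 (3) | 0.723 (3) | 0.599 (2) | 0.827 (17) | 0.851 (2) | 0.634 (8)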
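FuseNet [permissive] | 0.535 (21) | 0.570 (19) | 0.681 (21) | 0.182 (23) | 0.512 (20) | 0.290 (20) | 0.431 (17) | 0.659 (18) | 0.504 (19) | 0.495 (14) | 0.903 (13) | 0.308 (20) | 0.428 (3) | 0.523 (22) | 0.365 (20) | 0.676 (13) | 0.621 (14) | 0.470 (12) | 0.762 (23) | 0.779 (14) | 0.541 (18)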
Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
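SSMA [copyleft] | 0.577 (14) | 0.695 (6) | 0.716 (15) | 0.439 (14) | 0.563 (15) | 0.314 (14) | 0.444 (16) | 0.719 (10) | 0.551 (12) | 0.503 (10) | 0.887 (16) | 0.346 (17) | 0.348 (10) | 0.603 (13) | 0.353 (21) | 0.709 (6) | 0.600 (15) | 0.457 (15) | 0.901 (3) | 0.786 (12) | 0.599 (14)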
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
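SN_RN152pyrx8_RVC [copyleft] | 0.546 (18) | 0.572 (18) | 0.663 (22) | 0.638 (7) | 0.518 (19) | 0.298 (17) | 0.366 (25) | 0.633 (22) | 0.510 (18) | 0.446 (20) | 0.864 (20) | 0.296 (21) | 0.267 (20) | 0.542 (20) | 0.346 (22) | 0.704 (8) | 0.575 (19) | 0.431 (20) | 0.853 (14) | 0.766 (18) | 0.630 (9)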
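ILC-PSPNet | 0.475 (25) | 0.490 (24) | 0.581 (25) | 0.289 (20) | 0.507 (22) | 0.067 (27) | 0.379 (23) | 0.610 (24) | 0.417 (25) | 0.435 (21) | 0.822 (26) | 0.278 (22) | 0.267 (20) | 0.503 (23) | 0.228 (23) | 0.616 (24) | 0.533 (23) | 0.375 (24) | 0.820 (19) | 0.729 (22) | 0.560 (17)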
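UDSSEG_RVC | 0.545 (19) | 0.610 (14) | 0.661 (23) | 0.588 (8) | 0.556 (16) | 0.268 (22) | 0.482 (10) | 0.642 (21) | 0.572 (10) | 0.475 (17) | 0.836 (24) | 0.312 (19) | 0.367 (7) | 0.630 (10) | 0.189 (24) | 0.639 (20) | 0.495 (24) | 0.452 (16) | 0.826 (18) | 0.756 (21) | 0.541 (18)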
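3DMV (2d proj) | 0.498 (23) | 0.481 (25) | 0.612 (24) | 0.579 (9) | 0.456 (23) | 0.343 (7) | 0.384 (22) | 0.623 (23) | 0.525 (17) | 0.381 (24) | 0.845 (23) | 0.254 (23) | 0.264 (22) | 0.557 (19) | 0.182 (25) | 0.581 (25) | 0.598 (16) | 0.429 (21) | 0.760 (24) | 0.661 (26) | 0.446 (25)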
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV 2018
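ScanNet (2d proj) [permissive] | 0.330 (27) | 0.293 (26) | 0.521 (26) | 0.657 (6) | 0.361 (26) | 0.161 (26) | 0.250 (26) | 0.004 (27) | 0.440 (24) | 0.183 (27) | 0.836 (24) | 0.125 (26) | 0.060 (27) | 0.319 (27) | 0.132 (26) | 0.417 (26) | 0.412 (26) | 0.344 (26) | 0.541 (27) | 0.427 (27) | 0.109 (27)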
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017
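Enet (reimpl) | 0.376 (26) | 0.264 (27) | 0.452 (27) | 0.452 (13) | 0.365 (25) | 0.181 (25) | 0.143 (27) | 0.456 (26) | 0.409 (26) | 0.346 (26) | 0.769 (27) | 0.164 (25) | 0.218 (25) | 0.359 (26) | 0.123 (27) | 0.403 (27) | 0.381 (27) | 0.313 (27) | 0.571 (26) | 0.685 (25) | 0.472 (24)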
Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.