The 2D semantic labeling task involves predicting a per-pixel semantic labeling of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union (IoU) metric: IoU = TP / (TP + FP + FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
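As a concrete illustration, the sketch below computes per-class IoU and its mean from a pair of integer label maps. The function name, the use of NumPy, the 20-class count, and the ignore label of 255 are illustrative assumptions for this sketch, not part of the benchmark specification.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes=20, ignore_label=255):
    """IoU = TP / (TP + FP + FN) for each class over all evaluated pixels."""
    # Exclude pixels without a valid ground-truth label (ignore value is an assumption).
    valid = gt != ignore_label
    pred, gt = pred[valid], gt[valid]

    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positive pixels
        fp = np.sum((pred == c) & (gt != c))   # false positive pixels
        fn = np.sum((pred != c) & (gt == c))   # false negative pixels
        denom = tp + fp + fn
        if denom > 0:
            ious[c] = tp / denom
    return ious

# The "avg IoU" column in the table below is, under the usual definition,
# the mean of the per-class IoUs:
# mean_iou = float(np.nanmean(per_class_iou(pred, gt)))
```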



This table lists the benchmark results for the 2D semantic label scenario.


Method | avg IoU | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
Each cell gives the IoU for that class, followed in parentheses by the method's rank in that column. Methods are sorted by avg IoU; where available, the publication for a method appears on the line below its row.
Virtual MVFusion (R) | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (1) | 0.512 (1) | 0.422 (14) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1)
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
BPNet_2D [copyleft] | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (2) | 0.481 (2) | 0.451 (10) | 0.769 (3) | 0.656 (3) | 0.567 (3) | 0.931 (3) | 0.395 (4) | 0.390 (4) | 0.700 (3) | 0.534 (3) | 0.689 (8) | 0.770 (2) | 0.574 (3) | 0.865 (6) | 0.831 (3) | 0.675 (4)
Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia and Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
CU-Hybrid-2D Net | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (19) | 0.648 (3) | 0.463 (3) | 0.549 (2) | 0.742 (6) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (5) | 0.684 (6) | 0.381 (14) | 0.732 (2) | 0.723 (3) | 0.599 (2) | 0.827 (12) | 0.851 (2) | 0.634 (6)
CMX | 0.613 (4) | 0.681 (7) | 0.725 (8) | 0.502 (11) | 0.634 (5) | 0.297 (15) | 0.478 (8) | 0.830 (2) | 0.651 (4) | 0.537 (6) | 0.924 (4) | 0.375 (5) | 0.315 (11) | 0.686 (5) | 0.451 (11) | 0.714 (4) | 0.543 (18) | 0.504 (5) | 0.894 (4) | 0.823 (4) | 0.688 (3)
DMMF_3d | 0.605 (5) | 0.651 (8) | 0.744 (7) | 0.782 (3) | 0.637 (4) | 0.387 (4) | 0.536 (3) | 0.732 (7) | 0.590 (6) | 0.540 (5) | 0.856 (17) | 0.359 (9) | 0.306 (12) | 0.596 (11) | 0.539 (2) | 0.627 (17) | 0.706 (4) | 0.497 (7) | 0.785 (17) | 0.757 (15) | 0.476 (18)
MCA-Net | 0.595 (6) | 0.533 (16) | 0.756 (6) | 0.746 (4) | 0.590 (8) | 0.334 (7) | 0.506 (5) | 0.670 (11) | 0.587 (7) | 0.500 (10) | 0.905 (8) | 0.366 (8) | 0.352 (8) | 0.601 (10) | 0.506 (5) | 0.669 (14) | 0.648 (7) | 0.501 (6) | 0.839 (11) | 0.769 (11) | 0.516 (17)
RFBNet | 0.592 (7) | 0.616 (9) | 0.758 (5) | 0.659 (5) | 0.581 (9) | 0.330 (8) | 0.469 (9) | 0.655 (14) | 0.543 (12) | 0.524 (7) | 0.924 (4) | 0.355 (10) | 0.336 (10) | 0.572 (13) | 0.479 (7) | 0.671 (12) | 0.648 (7) | 0.480 (9) | 0.814 (15) | 0.814 (5) | 0.614 (9)
FAN_NV_RVC | 0.586 (8) | 0.510 (17) | 0.764 (4) | 0.079 (22) | 0.620 (7) | 0.330 (8) | 0.494 (6) | 0.753 (4) | 0.573 (8) | 0.556 (4) | 0.884 (12) | 0.405 (3) | 0.303 (13) | 0.718 (2) | 0.452 (10) | 0.672 (11) | 0.658 (5) | 0.509 (4) | 0.898 (3) | 0.813 (6) | 0.727 (2)
DCRedNet | 0.583 (9) | 0.682 (6) | 0.723 (9) | 0.542 (10) | 0.510 (16) | 0.310 (12) | 0.451 (10) | 0.668 (12) | 0.549 (11) | 0.520 (8) | 0.920 (6) | 0.375 (5) | 0.446 (2) | 0.528 (16) | 0.417 (12) | 0.670 (13) | 0.577 (15) | 0.478 (10) | 0.862 (7) | 0.806 (7) | 0.628 (8)
MIX6D_RVC | 0.582 (10) | 0.695 (4) | 0.687 (13) | 0.225 (17) | 0.632 (6) | 0.328 (10) | 0.550 (1) | 0.748 (5) | 0.623 (5) | 0.494 (13) | 0.890 (10) | 0.350 (11) | 0.254 (19) | 0.688 (4) | 0.454 (9) | 0.716 (3) | 0.597 (14) | 0.489 (8) | 0.881 (5) | 0.768 (12) | 0.575 (11)
SSMA [copyleft] | 0.577 (11) | 0.695 (4) | 0.716 (11) | 0.439 (13) | 0.563 (11) | 0.314 (11) | 0.444 (12) | 0.719 (8) | 0.551 (10) | 0.503 (9) | 0.887 (11) | 0.346 (12) | 0.348 (9) | 0.603 (9) | 0.353 (16) | 0.709 (5) | 0.600 (12) | 0.457 (12) | 0.901 (2) | 0.786 (8) | 0.599 (10)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
UNIV_CNP_RVC_UE | 0.566 (12) | 0.569 (15) | 0.686 (15) | 0.435 (14) | 0.524 (13) | 0.294 (16) | 0.421 (15) | 0.712 (9) | 0.543 (12) | 0.463 (15) | 0.872 (13) | 0.320 (13) | 0.363 (7) | 0.611 (8) | 0.477 (8) | 0.686 (9) | 0.627 (9) | 0.443 (15) | 0.862 (7) | 0.775 (10) | 0.639 (5)
SN_RN152pyrx8_RVC [copyleft] | 0.546 (13) | 0.572 (13) | 0.663 (17) | 0.638 (7) | 0.518 (14) | 0.298 (14) | 0.366 (20) | 0.633 (17) | 0.510 (15) | 0.446 (17) | 0.864 (15) | 0.296 (16) | 0.267 (16) | 0.542 (15) | 0.346 (17) | 0.704 (6) | 0.575 (16) | 0.431 (16) | 0.853 (10) | 0.766 (13) | 0.630 (7)
UDSSEG_RVC | 0.545 (14) | 0.610 (11) | 0.661 (18) | 0.588 (8) | 0.556 (12) | 0.268 (18) | 0.482 (7) | 0.642 (16) | 0.572 (9) | 0.475 (14) | 0.836 (19) | 0.312 (14) | 0.367 (6) | 0.630 (7) | 0.189 (19) | 0.639 (16) | 0.495 (20) | 0.452 (13) | 0.826 (13) | 0.756 (16) | 0.541 (13)
segfomer with 6d | 0.542 (15) | 0.594 (12) | 0.687 (13) | 0.146 (20) | 0.579 (10) | 0.308 (13) | 0.515 (4) | 0.703 (10) | 0.472 (17) | 0.498 (11) | 0.868 (14) | 0.369 (7) | 0.282 (14) | 0.589 (12) | 0.390 (13) | 0.701 (7) | 0.556 (17) | 0.416 (18) | 0.860 (9) | 0.759 (14) | 0.539 (15)
FuseNet [permissive] | 0.535 (16) | 0.570 (14) | 0.681 (16) | 0.182 (18) | 0.512 (15) | 0.290 (17) | 0.431 (13) | 0.659 (13) | 0.504 (16) | 0.495 (12) | 0.903 (9) | 0.308 (15) | 0.428 (3) | 0.523 (17) | 0.365 (15) | 0.676 (10) | 0.621 (11) | 0.470 (11) | 0.762 (18) | 0.779 (9) | 0.541 (13)
Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
AdapNet++ [copyleft] | 0.503 (17) | 0.613 (10) | 0.722 (10) | 0.418 (15) | 0.358 (22) | 0.337 (6) | 0.370 (19) | 0.479 (20) | 0.443 (18) | 0.368 (20) | 0.907 (7) | 0.207 (19) | 0.213 (21) | 0.464 (20) | 0.525 (4) | 0.618 (18) | 0.657 (6) | 0.450 (14) | 0.788 (16) | 0.721 (19) | 0.408 (21)
Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
3DMV (2d proj) | 0.498 (18) | 0.481 (20) | 0.612 (19) | 0.579 (9) | 0.456 (18) | 0.343 (5) | 0.384 (17) | 0.623 (18) | 0.525 (14) | 0.381 (19) | 0.845 (18) | 0.254 (18) | 0.264 (18) | 0.557 (14) | 0.182 (20) | 0.581 (20) | 0.598 (13) | 0.429 (17) | 0.760 (19) | 0.661 (21) | 0.446 (20)
Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV 2018
MSeg1080_RVC [permissive] | 0.485 (19) | 0.505 (18) | 0.709 (12) | 0.092 (21) | 0.427 (19) | 0.241 (19) | 0.411 (16) | 0.654 (15) | 0.385 (22) | 0.457 (16) | 0.861 (16) | 0.053 (22) | 0.279 (15) | 0.503 (18) | 0.481 (6) | 0.645 (15) | 0.626 (10) | 0.365 (20) | 0.748 (20) | 0.725 (18) | 0.529 (16)
John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
ILC-PSPNet | 0.475 (20) | 0.490 (19) | 0.581 (20) | 0.289 (16) | 0.507 (17) | 0.067 (22) | 0.379 (18) | 0.610 (19) | 0.417 (20) | 0.435 (18) | 0.822 (21) | 0.278 (17) | 0.267 (16) | 0.503 (18) | 0.228 (18) | 0.616 (19) | 0.533 (19) | 0.375 (19) | 0.820 (14) | 0.729 (17) | 0.560 (12)
Enet (reimpl) | 0.376 (21) | 0.264 (22) | 0.452 (22) | 0.452 (12) | 0.365 (20) | 0.181 (20) | 0.143 (22) | 0.456 (21) | 0.409 (21) | 0.346 (21) | 0.769 (22) | 0.164 (20) | 0.218 (20) | 0.359 (21) | 0.123 (22) | 0.403 (22) | 0.381 (22) | 0.313 (22) | 0.571 (21) | 0.685 (20) | 0.472 (19)
Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
ScanNet (2d proj) [permissive] | 0.330 (22) | 0.293 (21) | 0.521 (21) | 0.657 (6) | 0.361 (21) | 0.161 (21) | 0.250 (21) | 0.004 (22) | 0.440 (19) | 0.183 (22) | 0.836 (19) | 0.125 (21) | 0.060 (22) | 0.319 (22) | 0.132 (21) | 0.417 (21) | 0.412 (21) | 0.344 (21) | 0.541 (22) | 0.427 (22) | 0.109 (22)
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Niessner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017
DMMF | 0.003 (23) | 0.000 (23) | 0.005 (23) | 0.000 (23) | 0.000 (23) | 0.037 (23) | 0.001 (23) | 0.000 (23) | 0.001 (23) | 0.005 (23) | 0.003 (23) | 0.000 (23) | 0.000 (23) | 0.000 (23) | 0.000 (23) | 0.000 (23) | 0.002 (23) | 0.001 (23) | 0.000 (23) | 0.006 (23) | 0.000 (23)