The 2D semantic labeling task involves predicting a semantic class label for every pixel of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
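As an illustration of the formula above, here is a minimal sketch of a per-class IoU computation on label maps given as nested lists of integer class ids. The function names `per_class_iou` and `mean_iou`, the `ignore_label` parameter, and the choice to skip classes that never occur are assumptions made for this example; they are not taken from the benchmark's evaluation scripts.

```python
def per_class_iou(pred, gt, num_classes, ignore_label=None):
    """Return a list with IoU = TP / (TP + FP + FN) for each class.

    pred and gt are 2D nested lists of integer class ids with the same shape.
    Pixels whose ground-truth label equals ignore_label are skipped (assumption).
    A class with no TP, FP, or FN pixels gets None instead of a score.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if g == ignore_label or not 0 <= g < num_classes:
                continue  # unlabeled or invalid ground-truth pixel
            if p == g:
                tp[g] += 1
            else:
                fn[g] += 1  # the true class g was missed
                if 0 <= p < num_classes:
                    fp[p] += 1  # the predicted class p was wrong
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom > 0 else None)
    return ious


def mean_iou(ious):
    """Average the per-class scores, skipping classes without any pixels."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else None


# Tiny hypothetical example with three classes.
gt = [[0, 0, 1],
      [1, 1, 2]]
pred = [[0, 1, 1],
        [1, 1, 0]]
ious = per_class_iou(pred, gt, num_classes=3)
print(ious, mean_iou(ious))
```

In this toy example class 0 has one true positive, one false positive, and one false negative, so its IoU is 1/3; class 1 has three true positives and one false positive, so its IoU is 0.75; class 2 is never predicted, so its IoU is 0.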



This table lists the benchmark results for the 2D semantic label scenario.


Each cell gives the IoU followed by the method's rank for that column in parentheses. Rows are sorted by the bed column.

| Method | Info | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Virtual MVFusion (R) | | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (1) | 0.512 (1) | 0.422 (11) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1) |
| CU-Hybrid-2D Net | | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (17) | 0.648 (3) | 0.463 (3) | 0.549 (1) | 0.742 (4) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (5) | 0.684 (4) | 0.381 (11) | 0.732 (2) | 0.723 (3) | 0.599 (2) | 0.827 (9) | 0.851 (2) | 0.634 (4) |
| BPNet_2D | copyleft | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (2) | 0.481 (2) | 0.451 (7) | 0.769 (3) | 0.656 (3) | 0.567 (3) | 0.931 (3) | 0.395 (3) | 0.390 (4) | 0.700 (2) | 0.534 (3) | 0.689 (6) | 0.770 (2) | 0.574 (3) | 0.865 (4) | 0.831 (3) | 0.675 (3) |
| RFBNet | | 0.592 (8) | 0.616 (8) | 0.758 (4) | 0.659 (6) | 0.581 (8) | 0.330 (9) | 0.469 (6) | 0.655 (11) | 0.543 (10) | 0.524 (6) | 0.924 (4) | 0.355 (8) | 0.336 (8) | 0.572 (9) | 0.479 (8) | 0.671 (9) | 0.648 (6) | 0.480 (8) | 0.814 (11) | 0.814 (5) | 0.614 (7) |
| MCA-Net | | 0.595 (7) | 0.533 (13) | 0.756 (5) | 0.746 (5) | 0.590 (6) | 0.334 (8) | 0.506 (3) | 0.670 (8) | 0.587 (7) | 0.500 (9) | 0.905 (9) | 0.366 (6) | 0.352 (6) | 0.601 (6) | 0.506 (6) | 0.669 (11) | 0.648 (6) | 0.501 (6) | 0.839 (8) | 0.769 (10) | 0.516 (13) |
| DMMF | | 0.597 (6) | 0.543 (12) | 0.755 (6) | 0.749 (4) | 0.585 (7) | 0.338 (6) | 0.494 (4) | 0.704 (7) | 0.598 (5) | 0.494 (11) | 0.911 (7) | 0.347 (9) | 0.327 (9) | 0.593 (8) | 0.527 (4) | 0.675 (8) | 0.646 (8) | 0.513 (4) | 0.842 (7) | 0.774 (9) | 0.527 (12) |
| DMMF_3d | | 0.605 (5) | 0.651 (7) | 0.744 (7) | 0.782 (3) | 0.637 (4) | 0.387 (4) | 0.536 (2) | 0.732 (5) | 0.590 (6) | 0.540 (4) | 0.856 (14) | 0.359 (7) | 0.306 (11) | 0.596 (7) | 0.539 (2) | 0.627 (13) | 0.706 (4) | 0.497 (7) | 0.785 (13) | 0.757 (12) | 0.476 (14) |
| CMX | | 0.613 (4) | 0.681 (6) | 0.725 (8) | 0.502 (11) | 0.634 (5) | 0.297 (13) | 0.478 (5) | 0.830 (2) | 0.651 (4) | 0.537 (5) | 0.924 (4) | 0.375 (4) | 0.315 (10) | 0.686 (3) | 0.451 (9) | 0.714 (3) | 0.543 (15) | 0.504 (5) | 0.894 (3) | 0.823 (4) | 0.688 (2) |
| DCRedNet | | 0.583 (9) | 0.682 (5) | 0.723 (9) | 0.542 (10) | 0.510 (12) | 0.310 (11) | 0.451 (7) | 0.668 (9) | 0.549 (9) | 0.520 (7) | 0.920 (6) | 0.375 (4) | 0.446 (2) | 0.528 (12) | 0.417 (10) | 0.670 (10) | 0.577 (13) | 0.478 (9) | 0.862 (5) | 0.806 (6) | 0.628 (6) |
| AdapNet++ | copyleft | 0.503 (13) | 0.613 (9) | 0.722 (10) | 0.418 (14) | 0.358 (18) | 0.337 (7) | 0.370 (15) | 0.479 (16) | 0.443 (14) | 0.368 (16) | 0.907 (8) | 0.207 (15) | 0.213 (17) | 0.464 (16) | 0.525 (5) | 0.618 (14) | 0.657 (5) | 0.450 (12) | 0.788 (12) | 0.721 (15) | 0.408 (17) |
| SSMA | copyleft | 0.577 (10) | 0.695 (4) | 0.716 (11) | 0.439 (13) | 0.563 (9) | 0.314 (10) | 0.444 (9) | 0.719 (6) | 0.551 (8) | 0.503 (8) | 0.887 (11) | 0.346 (10) | 0.348 (7) | 0.603 (5) | 0.353 (13) | 0.709 (4) | 0.600 (11) | 0.457 (11) | 0.901 (2) | 0.786 (7) | 0.599 (8) |
| MSeg1080_RVC | permissive | 0.485 (15) | 0.505 (14) | 0.709 (12) | 0.092 (18) | 0.427 (15) | 0.241 (15) | 0.411 (12) | 0.654 (12) | 0.385 (18) | 0.457 (12) | 0.861 (13) | 0.053 (18) | 0.279 (12) | 0.503 (14) | 0.481 (7) | 0.645 (12) | 0.626 (9) | 0.365 (16) | 0.748 (16) | 0.725 (14) | 0.529 (11) |
| FuseNet | permissive | 0.535 (12) | 0.570 (11) | 0.681 (13) | 0.182 (16) | 0.512 (11) | 0.290 (14) | 0.431 (10) | 0.659 (10) | 0.504 (13) | 0.495 (10) | 0.903 (10) | 0.308 (11) | 0.428 (3) | 0.523 (13) | 0.365 (12) | 0.676 (7) | 0.621 (10) | 0.470 (10) | 0.762 (14) | 0.779 (8) | 0.541 (10) |
| SN_RN152pyrx8_RVC | copyleft | 0.546 (11) | 0.572 (10) | 0.663 (14) | 0.638 (8) | 0.518 (10) | 0.298 (12) | 0.366 (16) | 0.633 (13) | 0.510 (12) | 0.446 (13) | 0.864 (12) | 0.296 (12) | 0.267 (13) | 0.542 (11) | 0.346 (14) | 0.704 (5) | 0.575 (14) | 0.431 (13) | 0.853 (6) | 0.766 (11) | 0.630 (5) |
| 3DMV (2d proj) | | 0.498 (14) | 0.481 (16) | 0.612 (15) | 0.579 (9) | 0.456 (14) | 0.343 (5) | 0.384 (13) | 0.623 (14) | 0.525 (11) | 0.381 (15) | 0.845 (15) | 0.254 (14) | 0.264 (15) | 0.557 (10) | 0.182 (16) | 0.581 (16) | 0.598 (12) | 0.429 (14) | 0.760 (15) | 0.661 (17) | 0.446 (16) |
| ILC-PSPNet | | 0.475 (16) | 0.490 (15) | 0.581 (16) | 0.289 (15) | 0.507 (13) | 0.067 (18) | 0.379 (14) | 0.610 (15) | 0.417 (16) | 0.435 (14) | 0.822 (17) | 0.278 (13) | 0.267 (13) | 0.503 (14) | 0.228 (15) | 0.616 (15) | 0.533 (16) | 0.375 (15) | 0.820 (10) | 0.729 (13) | 0.560 (9) |
| ScanNet (2d proj) | permissive | 0.330 (18) | 0.293 (17) | 0.521 (17) | 0.657 (7) | 0.361 (17) | 0.161 (17) | 0.250 (17) | 0.004 (18) | 0.440 (15) | 0.183 (18) | 0.836 (16) | 0.125 (17) | 0.060 (18) | 0.319 (18) | 0.132 (17) | 0.417 (17) | 0.412 (17) | 0.344 (17) | 0.541 (18) | 0.427 (18) | 0.109 (18) |
| Enet (reimpl) | | 0.376 (17) | 0.264 (18) | 0.452 (18) | 0.452 (12) | 0.365 (16) | 0.181 (16) | 0.143 (18) | 0.456 (17) | 0.409 (17) | 0.346 (17) | 0.769 (18) | 0.164 (16) | 0.218 (16) | 0.359 (17) | 0.123 (18) | 0.403 (18) | 0.381 (18) | 0.313 (18) | 0.571 (17) | 0.685 (16) | 0.472 (15) |

Publications for the listed methods:

Virtual MVFusion (R): Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
BPNet_2D: Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia and Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
AdapNet++: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
MSeg1080_RVC: John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
3DMV (2d proj): Angela Dai, Matthias Nießner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV 2018
ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017
Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.