The 2D semantic labeling task involves predicting a semantic label for every pixel of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
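The formula above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the benchmark's official evaluation script. The function name `per_class_iou`, the `ignore_label` parameter, and the choice to average over classes to get a single score are assumptions made for this example. It also assumes that every label is an integer in the range 0 to num_classes - 1.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes, ignore_label=None):
    """Return an array of IoU = TP / (TP + FP + FN), one entry per class.

    pred and gt are integer label maps of the same shape.
    Classes that occur in neither pred nor gt get NaN.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    gt = np.asarray(gt, dtype=np.int64).ravel()

    # Optionally drop pixels without a ground-truth annotation (assumed convention).
    if ignore_label is not None:
        keep = gt != ignore_label
        pred, gt = pred[keep], gt[keep]

    # confusion[i, j] = number of pixels with ground truth i that were predicted as j.
    confusion = np.bincount(
        gt * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(confusion)              # predicted c, truly c
    fp = confusion.sum(axis=0) - tp      # predicted c, truly another class
    fn = confusion.sum(axis=1) - tp      # truly c, predicted another class

    denom = tp + fp + fn
    iou = np.full(num_classes, np.nan)
    valid = denom > 0
    iou[valid] = tp[valid] / denom[valid]
    return iou

# Toy 2x3 image with three classes.
gt = np.array([[0, 0, 1],
               [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])

iou = per_class_iou(pred, gt, num_classes=3)
mean_iou = np.nanmean(iou)   # average over classes (assumed aggregation)
```

For the toy image, class 0 has TP = 1, FP = 1, FN = 1, so its IoU is 1/3. Class 1 has TP = 2, FP = 1, FN = 0, giving 2/3. Class 2 has TP = 1, FP = 0, FN = 1, giving 1/2.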



This table lists the benchmark results for the 2D semantic labeling task.


Each cell gives the IoU followed by the method's rank for that column in parentheses. Methods are sorted by avg iou. A tag in square brackets after a method name is the license tag shown with that entry.

| Method | avg iou | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Virtual MVFusion (R) | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (1) | 0.512 (1) | 0.422 (10) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1) |
| BPNet_2D [copyleft] | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (2) | 0.481 (2) | 0.451 (6) | 0.769 (3) | 0.656 (3) | 0.567 (3) | 0.931 (3) | 0.395 (3) | 0.390 (4) | 0.700 (2) | 0.534 (3) | 0.689 (6) | 0.770 (2) | 0.574 (3) | 0.865 (4) | 0.831 (3) | 0.675 (3) |
| CU-Hybrid-2D Net | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (16) | 0.648 (3) | 0.463 (3) | 0.549 (1) | 0.742 (4) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (5) | 0.684 (4) | 0.381 (10) | 0.732 (2) | 0.723 (3) | 0.599 (2) | 0.827 (8) | 0.851 (2) | 0.634 (4) |
| CMX | 0.613 (4) | 0.681 (6) | 0.725 (7) | 0.502 (10) | 0.634 (5) | 0.297 (12) | 0.478 (4) | 0.830 (2) | 0.651 (4) | 0.537 (5) | 0.924 (4) | 0.375 (4) | 0.315 (9) | 0.686 (3) | 0.451 (8) | 0.714 (3) | 0.543 (14) | 0.504 (4) | 0.894 (3) | 0.823 (4) | 0.688 (2) |
| DMMF_3d | 0.605 (5) | 0.651 (7) | 0.744 (6) | 0.782 (3) | 0.637 (4) | 0.387 (4) | 0.536 (2) | 0.732 (5) | 0.590 (5) | 0.540 (4) | 0.856 (13) | 0.359 (7) | 0.306 (10) | 0.596 (7) | 0.539 (2) | 0.627 (12) | 0.706 (4) | 0.497 (6) | 0.785 (12) | 0.757 (11) | 0.476 (13) |
| MCA-Net | 0.595 (6) | 0.533 (12) | 0.756 (5) | 0.746 (4) | 0.590 (6) | 0.334 (7) | 0.506 (3) | 0.670 (7) | 0.587 (6) | 0.500 (9) | 0.905 (8) | 0.366 (6) | 0.352 (6) | 0.601 (6) | 0.506 (5) | 0.669 (10) | 0.648 (6) | 0.501 (5) | 0.839 (7) | 0.769 (9) | 0.516 (12) |
| RFBNet | 0.592 (7) | 0.616 (8) | 0.758 (4) | 0.659 (5) | 0.581 (7) | 0.330 (8) | 0.469 (5) | 0.655 (10) | 0.543 (9) | 0.524 (6) | 0.924 (4) | 0.355 (8) | 0.336 (8) | 0.572 (8) | 0.479 (7) | 0.671 (8) | 0.648 (6) | 0.480 (7) | 0.814 (10) | 0.814 (5) | 0.614 (7) |
| DCRedNet | 0.583 (8) | 0.682 (5) | 0.723 (8) | 0.542 (9) | 0.510 (11) | 0.310 (10) | 0.451 (6) | 0.668 (8) | 0.549 (8) | 0.520 (7) | 0.920 (6) | 0.375 (4) | 0.446 (2) | 0.528 (11) | 0.417 (9) | 0.670 (9) | 0.577 (12) | 0.478 (8) | 0.862 (5) | 0.806 (6) | 0.628 (6) |
| SSMA [copyleft] | 0.577 (9) | 0.695 (4) | 0.716 (10) | 0.439 (12) | 0.563 (8) | 0.314 (9) | 0.444 (8) | 0.719 (6) | 0.551 (7) | 0.503 (8) | 0.887 (10) | 0.346 (9) | 0.348 (7) | 0.603 (5) | 0.353 (12) | 0.709 (4) | 0.600 (10) | 0.457 (10) | 0.901 (2) | 0.786 (7) | 0.599 (8) |
| SN_RN152pyrx8_RVC [copyleft] | 0.546 (10) | 0.572 (10) | 0.663 (13) | 0.638 (7) | 0.518 (9) | 0.298 (11) | 0.366 (15) | 0.633 (12) | 0.510 (11) | 0.446 (12) | 0.864 (11) | 0.296 (11) | 0.267 (12) | 0.542 (10) | 0.346 (13) | 0.704 (5) | 0.575 (13) | 0.431 (12) | 0.853 (6) | 0.766 (10) | 0.630 (5) |
| FuseNet [permissive] | 0.535 (11) | 0.570 (11) | 0.681 (12) | 0.182 (15) | 0.512 (10) | 0.290 (13) | 0.431 (9) | 0.659 (9) | 0.504 (12) | 0.495 (10) | 0.903 (9) | 0.308 (10) | 0.428 (3) | 0.523 (12) | 0.365 (11) | 0.676 (7) | 0.621 (9) | 0.470 (9) | 0.762 (13) | 0.779 (8) | 0.541 (10) |
| AdapNet++ [copyleft] | 0.503 (12) | 0.613 (9) | 0.722 (9) | 0.418 (13) | 0.358 (17) | 0.337 (6) | 0.370 (14) | 0.479 (15) | 0.443 (13) | 0.368 (15) | 0.907 (7) | 0.207 (14) | 0.213 (16) | 0.464 (15) | 0.525 (4) | 0.618 (13) | 0.657 (5) | 0.450 (11) | 0.788 (11) | 0.721 (14) | 0.408 (16) |
| 3DMV (2d proj) | 0.498 (13) | 0.481 (15) | 0.612 (14) | 0.579 (8) | 0.456 (13) | 0.343 (5) | 0.384 (12) | 0.623 (13) | 0.525 (10) | 0.381 (14) | 0.845 (14) | 0.254 (13) | 0.264 (14) | 0.557 (9) | 0.182 (15) | 0.581 (15) | 0.598 (11) | 0.429 (13) | 0.760 (14) | 0.661 (16) | 0.446 (15) |
| MSeg1080_RVC [permissive] | 0.485 (14) | 0.505 (13) | 0.709 (11) | 0.092 (17) | 0.427 (14) | 0.241 (14) | 0.411 (11) | 0.654 (11) | 0.385 (17) | 0.457 (11) | 0.861 (12) | 0.053 (17) | 0.279 (11) | 0.503 (13) | 0.481 (6) | 0.645 (11) | 0.626 (8) | 0.365 (15) | 0.748 (15) | 0.725 (13) | 0.529 (11) |
| ILC-PSPNet | 0.475 (15) | 0.490 (14) | 0.581 (15) | 0.289 (14) | 0.507 (12) | 0.067 (17) | 0.379 (13) | 0.610 (14) | 0.417 (15) | 0.435 (13) | 0.822 (16) | 0.278 (12) | 0.267 (12) | 0.503 (13) | 0.228 (14) | 0.616 (14) | 0.533 (15) | 0.375 (14) | 0.820 (9) | 0.729 (12) | 0.560 (9) |
| Enet (reimpl) | 0.376 (16) | 0.264 (17) | 0.452 (17) | 0.452 (11) | 0.365 (15) | 0.181 (15) | 0.143 (17) | 0.456 (16) | 0.409 (16) | 0.346 (16) | 0.769 (17) | 0.164 (15) | 0.218 (15) | 0.359 (16) | 0.123 (17) | 0.403 (17) | 0.381 (17) | 0.313 (17) | 0.571 (16) | 0.685 (15) | 0.472 (14) |
| ScanNet (2d proj) [permissive] | 0.330 (17) | 0.293 (16) | 0.521 (16) | 0.657 (6) | 0.361 (16) | 0.161 (16) | 0.250 (16) | 0.004 (17) | 0.440 (14) | 0.183 (17) | 0.836 (15) | 0.125 (16) | 0.060 (17) | 0.319 (17) | 0.132 (16) | 0.417 (16) | 0.412 (16) | 0.344 (16) | 0.541 (17) | 0.427 (17) | 0.109 (17) |
| DMMF | 0.003 (18) | 0.000 (18) | 0.005 (18) | 0.000 (18) | 0.000 (18) | 0.037 (18) | 0.001 (18) | 0.000 (18) | 0.001 (18) | 0.005 (18) | 0.003 (18) | 0.000 (18) | 0.000 (18) | 0.000 (18) | 0.000 (18) | 0.000 (18) | 0.002 (18) | 0.001 (18) | 0.000 (18) | 0.006 (18) | 0.000 (18) |

Method references

[Virtual MVFusion (R)] Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
[BPNet_2D] Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia and Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
[SSMA] Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
[FuseNet] Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
[AdapNet++] Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
[3DMV (2d proj)] Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
[MSeg1080_RVC] John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
[Enet (reimpl)] Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
[ScanNet (2d proj)] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17