The 2D semantic labeling task involves predicting a semantic label for every pixel of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
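As an illustration of the formula above, the following is a minimal sketch of per-class IoU computed from a predicted and a ground-truth label map. It is not the benchmark's evaluation script: the function names `per_class_iou` and `mean_iou`, the `ignore_label` parameter, and the choice to average only over classes that occur are assumptions made for this example.

```python
import numpy as np


def per_class_iou(pred, gt, num_classes, ignore_label=None):
    """Return an array of IoU = TP / (TP + FP + FN), one entry per class.

    pred, gt: integer label maps of identical shape.
    ignore_label: hypothetical label for unannotated pixels, which are
    excluded from all counts (an assumption of this sketch).
    """
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    if ignore_label is not None:
        keep = gt != ignore_label
        pred, gt = pred[keep], gt[keep]

    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))  # true positive pixels
        fp = np.sum((pred == c) & (gt != c))  # false positive pixels
        fn = np.sum((pred != c) & (gt == c))  # false negative pixels
        denom = tp + fp + fn
        if denom > 0:  # class absent from both maps stays NaN
            ious[c] = tp / denom
    return ious


def mean_iou(ious):
    """Average the per-class IoUs, skipping classes that never occur."""
    return float(np.nanmean(ious))


if __name__ == "__main__":
    gt = [[0, 0, 1], [1, 2, 2]]
    pred = [[0, 1, 1], [1, 2, 0]]
    ious = per_class_iou(pred, gt, num_classes=3)
    print(ious, mean_iou(ious))
```

For this toy pair of 2x3 label maps, class 0 has TP = 1, FP = 1, FN = 1, so its IoU is 1/3; classes 1 and 2 give 2/3 and 1/2, and the mean over the three classes is 0.5.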



This table lists the benchmark results for the 2D semantic label scenario.


Each cell gives the IoU followed by the method's rank for that column in parentheses. Rows are ordered by wall IoU. A bracketed tag after a method name (copyleft, permissive) is the tag shown with that entry.

Method | avg IoU | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | floor | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | wall | window
Virtual MVFusion (R) | 0.745 (1) | 0.861 (1) | 0.839 (1) | 0.881 (1) | 0.672 (1) | 0.512 (1) | 0.422 (10) | 0.898 (1) | 0.723 (1) | 0.714 (1) | 0.954 (2) | 0.454 (1) | 0.509 (1) | 0.773 (1) | 0.895 (1) | 0.756 (1) | 0.820 (1) | 0.653 (1) | 0.935 (1) | 0.891 (1) | 0.728 (1)
CU-Hybrid-2D Net | 0.636 (3) | 0.825 (2) | 0.820 (2) | 0.179 (16) | 0.648 (3) | 0.463 (3) | 0.549 (1) | 0.742 (3) | 0.676 (2) | 0.628 (2) | 0.961 (1) | 0.420 (2) | 0.379 (5) | 0.684 (3) | 0.381 (10) | 0.732 (2) | 0.723 (3) | 0.599 (2) | 0.827 (8) | 0.851 (2) | 0.634 (3)
BPNet_2D [copyleft] | 0.670 (2) | 0.822 (3) | 0.795 (3) | 0.836 (2) | 0.659 (2) | 0.481 (2) | 0.451 (6) | 0.769 (2) | 0.656 (3) | 0.567 (3) | 0.931 (3) | 0.395 (3) | 0.390 (4) | 0.700 (2) | 0.534 (3) | 0.689 (5) | 0.770 (2) | 0.574 (3) | 0.865 (3) | 0.831 (3) | 0.675 (2)
RFBNet | 0.592 (7) | 0.616 (7) | 0.758 (4) | 0.659 (6) | 0.581 (7) | 0.330 (9) | 0.469 (5) | 0.655 (10) | 0.543 (9) | 0.524 (5) | 0.924 (4) | 0.355 (7) | 0.336 (8) | 0.572 (8) | 0.479 (8) | 0.671 (8) | 0.648 (6) | 0.480 (7) | 0.814 (10) | 0.814 (4) | 0.614 (6)
DCRedNet | 0.583 (8) | 0.682 (5) | 0.723 (8) | 0.542 (10) | 0.510 (11) | 0.310 (11) | 0.451 (6) | 0.668 (8) | 0.549 (8) | 0.520 (6) | 0.920 (5) | 0.375 (4) | 0.446 (2) | 0.528 (11) | 0.417 (9) | 0.670 (9) | 0.577 (13) | 0.478 (8) | 0.862 (4) | 0.806 (5) | 0.628 (5)
SSMA [copyleft] | 0.577 (9) | 0.695 (4) | 0.716 (10) | 0.439 (12) | 0.563 (8) | 0.314 (10) | 0.444 (8) | 0.719 (5) | 0.551 (7) | 0.503 (7) | 0.887 (10) | 0.346 (9) | 0.348 (7) | 0.603 (4) | 0.353 (12) | 0.709 (3) | 0.600 (11) | 0.457 (10) | 0.901 (2) | 0.786 (6) | 0.599 (7)
FuseNet [permissive] | 0.535 (11) | 0.570 (10) | 0.681 (12) | 0.182 (15) | 0.512 (10) | 0.290 (13) | 0.431 (9) | 0.659 (9) | 0.504 (12) | 0.495 (9) | 0.903 (9) | 0.308 (10) | 0.428 (3) | 0.523 (12) | 0.365 (11) | 0.676 (6) | 0.621 (10) | 0.470 (9) | 0.762 (13) | 0.779 (7) | 0.541 (9)
DMMF | 0.597 (5) | 0.543 (11) | 0.755 (6) | 0.749 (4) | 0.585 (6) | 0.338 (6) | 0.494 (4) | 0.704 (6) | 0.598 (4) | 0.494 (10) | 0.911 (6) | 0.347 (8) | 0.327 (9) | 0.593 (7) | 0.527 (4) | 0.675 (7) | 0.646 (8) | 0.513 (4) | 0.842 (6) | 0.774 (8) | 0.527 (11)
MCA-Net | 0.595 (6) | 0.533 (12) | 0.756 (5) | 0.746 (5) | 0.590 (5) | 0.334 (8) | 0.506 (3) | 0.670 (7) | 0.587 (6) | 0.500 (8) | 0.905 (8) | 0.366 (5) | 0.352 (6) | 0.601 (5) | 0.506 (6) | 0.669 (10) | 0.648 (6) | 0.501 (5) | 0.839 (7) | 0.769 (9) | 0.516 (12)
SN_RN152pyrx8_RVC [copyleft] | 0.546 (10) | 0.572 (9) | 0.663 (13) | 0.638 (8) | 0.518 (9) | 0.298 (12) | 0.366 (15) | 0.633 (12) | 0.510 (11) | 0.446 (12) | 0.864 (11) | 0.296 (11) | 0.267 (12) | 0.542 (10) | 0.346 (13) | 0.704 (4) | 0.575 (14) | 0.431 (12) | 0.853 (5) | 0.766 (10) | 0.630 (4)
DMMF_3d | 0.605 (4) | 0.651 (6) | 0.744 (7) | 0.782 (3) | 0.637 (4) | 0.387 (4) | 0.536 (2) | 0.732 (4) | 0.590 (5) | 0.540 (4) | 0.856 (13) | 0.359 (6) | 0.306 (10) | 0.596 (6) | 0.539 (2) | 0.627 (12) | 0.706 (4) | 0.497 (6) | 0.785 (12) | 0.757 (11) | 0.476 (13)
ILC-PSPNet | 0.475 (15) | 0.490 (14) | 0.581 (15) | 0.289 (14) | 0.507 (12) | 0.067 (17) | 0.379 (13) | 0.610 (14) | 0.417 (15) | 0.435 (13) | 0.822 (16) | 0.278 (12) | 0.267 (12) | 0.503 (13) | 0.228 (14) | 0.616 (14) | 0.533 (15) | 0.375 (14) | 0.820 (9) | 0.729 (12) | 0.560 (8)
MSeg1080_RVC [permissive] | 0.485 (14) | 0.505 (13) | 0.709 (11) | 0.092 (17) | 0.427 (14) | 0.241 (14) | 0.411 (11) | 0.654 (11) | 0.385 (17) | 0.457 (11) | 0.861 (12) | 0.053 (17) | 0.279 (11) | 0.503 (13) | 0.481 (7) | 0.645 (11) | 0.626 (9) | 0.365 (15) | 0.748 (15) | 0.725 (13) | 0.529 (10)
AdapNet++ [copyleft] | 0.503 (12) | 0.613 (8) | 0.722 (9) | 0.418 (13) | 0.358 (17) | 0.337 (7) | 0.370 (14) | 0.479 (15) | 0.443 (13) | 0.368 (15) | 0.907 (7) | 0.207 (14) | 0.213 (16) | 0.464 (15) | 0.525 (5) | 0.618 (13) | 0.657 (5) | 0.450 (11) | 0.788 (11) | 0.721 (14) | 0.408 (16)
Enet (reimpl) | 0.376 (16) | 0.264 (17) | 0.452 (17) | 0.452 (11) | 0.365 (15) | 0.181 (15) | 0.143 (17) | 0.456 (16) | 0.409 (16) | 0.346 (16) | 0.769 (17) | 0.164 (15) | 0.218 (15) | 0.359 (16) | 0.123 (17) | 0.403 (17) | 0.381 (17) | 0.313 (17) | 0.571 (16) | 0.685 (15) | 0.472 (14)
3DMV (2d proj) | 0.498 (13) | 0.481 (15) | 0.612 (14) | 0.579 (9) | 0.456 (13) | 0.343 (5) | 0.384 (12) | 0.623 (13) | 0.525 (10) | 0.381 (14) | 0.845 (14) | 0.254 (13) | 0.264 (14) | 0.557 (9) | 0.182 (15) | 0.581 (15) | 0.598 (12) | 0.429 (13) | 0.760 (14) | 0.661 (16) | 0.446 (15)
ScanNet (2d proj) [permissive] | 0.330 (17) | 0.293 (16) | 0.521 (16) | 0.657 (7) | 0.361 (16) | 0.161 (16) | 0.250 (16) | 0.004 (17) | 0.440 (14) | 0.183 (17) | 0.836 (15) | 0.125 (16) | 0.060 (17) | 0.319 (17) | 0.132 (16) | 0.417 (16) | 0.412 (16) | 0.344 (16) | 0.541 (17) | 0.427 (17) | 0.109 (17)

Method references

Virtual MVFusion (R): Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru: Virtual Multi-view Fusion for 3D Semantic Segmentation. ECCV 2020
BPNet_2D: Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia and Tien-Tsin Wong: Bidirectional Projection Network for Cross Dimension Scene Understanding. CVPR 2021 (Oral)
SSMA: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
MSeg1080_RVC: John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, Vladlen Koltun: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. CVPR 2020
AdapNet++: Abhinav Valada, Rohit Mohan, Wolfram Burgard: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2019
Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
3DMV (2d proj): Angela Dai, Matthias Nießner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV 2018
ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR 2017