The 2D semantic labeling task involves predicting a semantic label for every pixel of an image.

Evaluation and metrics

Our evaluation ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
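The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's evaluation script: the function names `class_iou` and `mean_iou` are hypothetical, labels are assumed to be flat sequences of integer class ids, and unlabeled or ignored pixels are not handled. Averaging the per-class scores without weighting is an assumption, although it is consistent with the avg iou values reported in the results table.

```python
def class_iou(pred, gt, cls):
    """Return IoU = TP / (TP + FP + FN) for one class, or None if undefined."""
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        if p == cls and g == cls:
            tp += 1          # predicted cls, ground truth is cls
        elif p == cls:
            fp += 1          # predicted cls, ground truth is another class
        elif g == cls:
            fn += 1          # ground truth is cls, predicted another class
    denom = tp + fp + fn
    # Assumption: a class absent from both prediction and ground truth
    # has no defined IoU and is skipped when averaging.
    return tp / denom if denom else None


def mean_iou(pred, gt, classes):
    """Unweighted mean of the defined per-class IoU scores."""
    scores = [class_iou(pred, gt, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 0]
print([class_iou(pred, gt, c) for c in (0, 1, 2)])
print(mean_iou(pred, gt, (0, 1, 2)))
```

In the toy example, class 0 has one true positive, one false positive, and one false negative, so its IoU is 1/3.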



This table lists the benchmark results for the 2D semantic label scenario.


Each cell gives the IoU followed by the method's rank for that row in parentheses. Methods are shown as columns, ordered by avg iou.

Method            FuseNet      3DMV (2d proj)    ILC-PSPNet    Enet (reimpl)    ScanNet (2d proj)
Info              permissive   -                 -             -                permissive
avg iou           0.521 (1)    0.498 (2)         0.475 (3)     0.376 (4)        0.330 (5)
bathtub           0.591 (1)    0.481 (3)         0.490 (2)     0.264 (5)        0.293 (4)
bed               0.682 (1)    0.612 (2)         0.581 (3)     0.452 (5)        0.521 (4)
bookshelf         0.220 (5)    0.579 (2)         0.289 (4)     0.452 (3)        0.657 (1)
cabinet           0.488 (2)    0.456 (3)         0.507 (1)     0.365 (4)        0.361 (5)
chair             0.279 (2)    0.343 (1)         0.067 (5)     0.181 (3)        0.161 (4)
counter           0.344 (3)    0.384 (1)         0.379 (2)     0.143 (5)        0.250 (4)
curtain           0.610 (2)    0.623 (1)         0.610 (2)     0.456 (4)        0.004 (5)
desk              0.461 (2)    0.525 (1)         0.417 (4)     0.409 (5)        0.440 (3)
door              0.475 (1)    0.381 (3)         0.435 (2)     0.346 (4)        0.183 (5)
floor             0.910 (1)    0.845 (2)         0.822 (4)     0.769 (5)        0.836 (3)
otherfurniture    0.293 (1)    0.254 (3)         0.278 (2)     0.164 (4)        0.125 (5)
picture           0.447 (1)    0.264 (3)         0.267 (2)     0.218 (4)        0.060 (5)
refrigerator      0.512 (2)    0.557 (1)         0.503 (3)     0.359 (4)        0.319 (5)
shower curtain    0.397 (1)    0.182 (3)         0.228 (2)     0.123 (5)        0.132 (4)
sink              0.618 (1)    0.581 (3)         0.616 (2)     0.403 (5)        0.417 (4)
sofa              0.567 (2)    0.598 (1)         0.533 (3)     0.381 (5)        0.412 (4)
table             0.452 (1)    0.429 (2)         0.375 (3)     0.313 (5)        0.344 (4)
toilet            0.734 (3)    0.760 (2)         0.820 (1)     0.571 (4)        0.541 (5)
wall              0.782 (1)    0.661 (4)         0.729 (2)     0.685 (3)        0.427 (5)
window            0.566 (1)    0.446 (4)         0.560 (2)     0.472 (3)        0.109 (5)

FuseNet: Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers: FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture. ACCV 2016
3DMV (2d proj): Angela Dai, Matthias Niessner: 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. ECCV'18
Enet (reimpl): Re-implementation of Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
ScanNet (2d proj): Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. CVPR'17