The 2D semantic instance prediction task involves detecting and segmenting the objects in an image.

Evaluation and metrics

Our evaluation ranks all methods according to the average precision for each class. We report the mean average precision AP (averaged over overlap thresholds [0.5:0.95:0.05]), as well as AP 50% for an overlap threshold of 50%. Note that multiple predictions of the same ground truth instance are penalized as false positives.
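The metric described above can be illustrated with a minimal sketch. This is not the benchmark's official evaluation script. It assumes that "overlap" means mask intersection over union (IoU), that predictions are matched greedily in order of decreasing confidence, that AP is the non-interpolated mean of the precision at each true positive, and that the threshold range includes its endpoint 0.95. The function names `mask_iou`, `average_precision`, and `mean_ap` are hypothetical.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thresh):
    """AP for one class at one overlap threshold (simplified sketch).

    preds: list of (confidence, mask) tuples; gts: list of masks.
    """
    if not gts:
        return float("nan")
    # Visit predictions from most to least confident.
    order = sorted(range(len(preds)), key=lambda i: -preds[i][0])
    matched = [False] * len(gts)
    tp = []
    for i in order:
        ious = [mask_iou(preds[i][1], g) for g in gts]
        j = int(np.argmax(ious))
        if ious[j] >= iou_thresh and not matched[j]:
            matched[j] = True
            tp.append(1.0)
        else:
            # Too little overlap, or a duplicate prediction of an
            # already matched ground truth instance: false positive.
            tp.append(0.0)
    tp = np.array(tp)
    precision = np.cumsum(tp) / np.arange(1, len(tp) + 1)
    # Mean over ground truth instances of the precision at each true positive.
    return float(np.sum(precision * tp) / len(gts))


def mean_ap(preds, gts):
    """Average AP over thresholds 0.50, 0.55, ..., 0.95 (endpoint assumed)."""
    thresholds = np.round(np.linspace(0.5, 0.95, 10), 2)
    return float(np.mean([average_precision(preds, gts, t) for t in thresholds]))


# Toy example: two ground truth instances on a 4x4 image.
gt_a = np.zeros((4, 4), dtype=bool)
gt_a[:2] = True
gt_b = np.zeros((4, 4), dtype=bool)
gt_b[2:] = True

# The second prediction duplicates the first and counts as a false positive.
preds = [(0.9, gt_a), (0.8, gt_a), (0.7, gt_b)]
ap50 = average_precision(preds, [gt_a, gt_b], 0.5)
ap50_no_dup = average_precision([preds[0], preds[2]], [gt_a, gt_b], 0.5)
print(ap50, ap50_no_dup)
```

In this toy case the duplicate prediction lowers AP 50% from 1.0 to 5/6, which shows how duplicates are penalized.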



This table lists the benchmark results for the 2D semantic instance scenario.




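Each cell gives the AP 50% score followed by the method's rank in that column in parentheses. Rows are listed in descending order of the desk score.

| Method | Info | avg AP 50% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UniDet_RVC | | 0.358 (3) | 0.554 (2) | 0.543 (3) | 0.128 (2) | 0.402 (2) | 0.381 (3) | 0.200 (1) | 0.461 (2) | 0.328 (1) | 0.138 (3) | 0.232 (3) | 0.148 (3) | 0.466 (1) | 0.109 (3) | 0.538 (1) | 0.506 (2) | 0.294 (3) | 0.862 (2) | 0.159 (3) |
| EMSANet (Instance) | | 0.380 (1) | 0.549 (3) | 0.651 (1) | 0.147 (1) | 0.397 (3) | 0.399 (1) | 0.167 (2) | 0.437 (3) | 0.319 (2) | 0.210 (1) | 0.301 (1) | 0.235 (1) | 0.463 (2) | 0.245 (2) | 0.372 (3) | 0.511 (1) | 0.296 (2) | 0.876 (1) | 0.268 (1) |
| FKNet | | 0.368 (2) | 0.588 (1) | 0.618 (2) | 0.099 (3) | 0.466 (1) | 0.395 (2) | 0.108 (3) | 0.548 (1) | 0.157 (3) | 0.175 (2) | 0.268 (2) | 0.096 (4) | 0.439 (3) | 0.343 (1) | 0.420 (2) | 0.500 (3) | 0.317 (1) | 0.855 (3) | 0.234 (2) |
| MaskRCNN_ScanNet | permissive | 0.227 (4) | 0.228 (4) | 0.381 (4) | 0.013 (4) | 0.237 (4) | 0.339 (4) | 0.089 (4) | 0.339 (4) | 0.150 (4) | 0.134 (4) | 0.143 (4) | 0.179 (2) | 0.255 (4) | 0.053 (4) | 0.331 (4) | 0.244 (4) | 0.154 (4) | 0.687 (4) | 0.127 (4) |

EMSANet (Instance): Seichter, Daniel and Fischedick, Söhnke and Köhler, Mona and Gross, Horst-Michael: EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments. IJCNN 2022

MaskRCNN_ScanNet: Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17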
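The avg AP 50% column appears to be the unweighted mean of the 18 per-class scores. This is an observation from the listed numbers, not something the page states. A quick check using the UniDet_RVC row (the variable names are hypothetical):

```python
# Per-class AP 50% values for UniDet_RVC, in column order from
# bathtub through window, copied from the results table.
unidet_per_class_ap50 = [
    0.554, 0.543, 0.128, 0.402, 0.381, 0.200, 0.461, 0.328, 0.138,
    0.232, 0.148, 0.466, 0.109, 0.538, 0.506, 0.294, 0.862, 0.159,
]

# Unweighted mean over the 18 classes.
unidet_mean_ap50 = sum(unidet_per_class_ap50) / len(unidet_per_class_ap50)
print(round(unidet_mean_ap50, 3))  # 0.358, matching the listed avg AP 50%
```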