The 2D semantic instance prediction task involves detecting and segmenting the objects in an image.

Evaluation and metrics

Our evaluation ranks all methods according to the average precision (AP) for each class. We report the mean average precision AP, averaged over the overlap thresholds [0.5:0.95:0.05] (from 0.5 to 0.95 in steps of 0.05), as well as AP 50% for a single overlap threshold of 50%. Note that multiple predictions of the same ground truth instance are penalized as false positives.
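The metric described above can be illustrated with a minimal sketch for a single class. This is not the benchmark's evaluation script. It assumes that "overlap" means intersection over union (IoU), that predictions are matched greedily in order of decreasing confidence to the ground truth instance they overlap most, and that AP is the area under the precision-recall curve after making precision monotonically non-increasing. The function names `average_precision` and `mean_ap` and the precomputed overlap matrix `iou` are hypothetical.

```python
import numpy as np


def average_precision(scores, iou, threshold):
    """AP for one class at one overlap threshold (illustrative sketch).

    scores: (P,) confidence of each predicted instance.
    iou:    (P, G) overlap between each prediction and each ground truth instance.
    """
    scores = np.asarray(scores, dtype=float)
    iou = np.asarray(iou, dtype=float)
    num_pred, num_gt = iou.shape
    if num_gt == 0:
        return float("nan")  # class absent from ground truth: AP undefined

    order = np.argsort(-scores, kind="stable")  # most confident first
    matched = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(num_pred)
    for rank, p in enumerate(order):
        g = int(np.argmax(iou[p]))  # ground truth this prediction overlaps most
        if iou[p, g] >= threshold and not matched[g]:
            matched[g] = True
            tp[rank] = 1.0
        # Otherwise the prediction is a false positive. This includes a second
        # prediction of an already matched ground truth instance.

    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.arange(1, num_pred + 1)

    # Area under the precision-recall curve with a monotone precision envelope.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def mean_ap(scores, iou, thresholds=None):
    """Mean of AP over the overlap thresholds 0.5, 0.55, ..., 0.95."""
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(scores, iou, t) for t in thresholds]))


# Toy example: two ground truth instances, three predictions.
# The second prediction is a duplicate of the first ground truth instance.
example_scores = [0.9, 0.8, 0.7]
example_iou = [[0.82, 0.0],
               [0.72, 0.0],
               [0.0, 0.62]]
ap50 = average_precision(example_scores, example_iou, 0.5)
ap = mean_ap(example_scores, example_iou)
```

In this toy example the duplicate prediction lowers AP 50% to about 0.833 (it would be 1.0 if duplicates were ignored), and the mean over the ten thresholds is about 0.45. Averaging the per-class values into a single score per method is left out of the sketch.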



This table lists the benchmark results for the 2D semantic instance scenario.




Each cell gives the AP value followed by the method's rank for that column in parentheses.

| Method | avg ap | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EMSANet (Instance) | 0.241 (1) | 0.401 (1) | 0.439 (1) | 0.085 (1) | 0.242 (1) | 0.220 (1) | 0.081 (1) | 0.289 (1) | 0.117 (2) | 0.121 (1) | 0.182 (1) | 0.126 (1) | 0.346 (1) | 0.181 (1) | 0.181 (2) | 0.358 (1) | 0.156 (1) | 0.675 (2) | 0.131 (1) |
| UniDet_RVC | 0.205 (2) | 0.381 (2) | 0.323 (2) | 0.037 (2) | 0.226 (2) | 0.177 (2) | 0.063 (2) | 0.277 (2) | 0.120 (1) | 0.067 (2) | 0.131 (2) | 0.074 (3) | 0.317 (2) | 0.080 (2) | 0.235 (1) | 0.289 (2) | 0.141 (2) | 0.678 (1) | 0.080 (2) |
| MaskRCNN_ScanNet (permissive) | 0.119 (3) | 0.129 (3) | 0.212 (3) | 0.002 (3) | 0.112 (3) | 0.148 (3) | 0.014 (3) | 0.205 (3) | 0.044 (3) | 0.066 (3) | 0.078 (3) | 0.095 (2) | 0.142 (3) | 0.030 (3) | 0.128 (3) | 0.139 (3) | 0.080 (3) | 0.459 (3) | 0.057 (3) |

EMSANet (Instance): Daniel Seichter, Söhnke Fischedick, Mona Köhler, Horst-Michael Gross: EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments. IJCNN 2022

MaskRCNN_ScanNet: Re-implementation of Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. ICCV'17