This table lists the benchmark results for the Scan2Cap Dense Captioning Benchmark scenario.


Column groups: Captioning F1-Score (CIDEr, BLEU-4, Rouge-L and METEOR, all at 0.5 IoU), Dense Captioning (DCmAP), and Object Detection (mAP@0.5). The number in parentheses after each score is the method's rank for that metric. Rows are sorted by METEOR@0.5IoU.

| Method | Info | CIDEr@0.5IoU | BLEU-4@0.5IoU | Rouge-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5 |
|---|---|---|---|---|---|---|---|
| Vote2Cap-DETR++ | | 0.3360 (1) | 0.1908 (1) | 0.3012 (1) | 0.1386 (1) | 0.1864 (1) | 0.5090 (1) |
| TMP | | 0.3029 (3) | 0.1728 (3) | 0.2898 (2) | 0.1332 (2) | 0.1801 (3) | 0.4605 (2) |
| vote2cap-detr | permissive | 0.3128 (2) | 0.1778 (2) | 0.2842 (3) | 0.1316 (3) | 0.1825 (2) | 0.4454 (3) |
| CFM | | 0.2360 (4) | 0.1417 (4) | 0.2253 (4) | 0.1034 (4) | 0.1379 (7) | 0.3008 (7) |
| CM3D-Trans+ | | 0.2348 (5) | 0.1383 (5) | 0.2250 (6) | 0.1030 (5) | 0.1398 (6) | 0.2966 (9) |
| Forest-xyz | | 0.2266 (6) | 0.1363 (6) | 0.2250 (5) | 0.1027 (6) | 0.1161 (12) | 0.2825 (12) |
| D3Net - Speaker | permissive | 0.2088 (7) | 0.1335 (8) | 0.2237 (7) | 0.1022 (7) | 0.1481 (5) | 0.4198 (4) |
| 3DJCG(Captioning) | permissive | 0.1918 (8) | 0.1350 (7) | 0.2207 (8) | 0.1013 (8) | 0.1506 (4) | 0.3867 (5) |
| REMAN | | 0.1662 (9) | 0.1070 (9) | 0.1790 (9) | 0.0815 (9) | 0.1235 (10) | 0.2927 (11) |
| NOAH | | 0.1382 (10) | 0.0901 (10) | 0.1598 (10) | 0.0747 (10) | 0.1359 (8) | 0.2977 (8) |
| SpaCap3D | permissive | 0.1359 (11) | 0.0883 (11) | 0.1591 (11) | 0.0738 (11) | 0.1182 (11) | 0.3275 (6) |
| SUN+ | | 0.1148 (14) | 0.0846 (12) | 0.1564 (12) | 0.0711 (12) | 0.1143 (13) | 0.2958 (10) |
| X-Trans2Cap | permissive | 0.1274 (12) | 0.0808 (13) | 0.1392 (13) | 0.0653 (13) | 0.1244 (9) | 0.2795 (13) |
| MORE-xyz | permissive | 0.1239 (13) | 0.0796 (14) | 0.1362 (14) | 0.0631 (14) | 0.1116 (14) | 0.2648 (14) |
| Scan2Cap | permissive | 0.0849 (15) | 0.0576 (15) | 0.1073 (15) | 0.0492 (15) | 0.0970 (15) | 0.2481 (15) |

References for the methods listed above:

- Vote2Cap-DETR++: Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen: Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.
- vote2cap-detr: Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu, Taihao Li: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023
- CM3D-Trans+: Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds.
- D3Net - Speaker: Dave Zhenyu Chen, Qirui Wu, Matthias Niessner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. 17th European Conference on Computer Vision (ECCV), 2022
- 3DJCG(Captioning): Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022 (Oral)
- SpaCap3D: Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. The 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022
- X-Trans2Cap: Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022
- MORE-xyz: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi_ORder RElation Mining for Dense Captioning in 3D Scenes. ECCV 2022
- Scan2Cap: Dave Zhenyu Chen, Ali Gholami, Matthias Nießner and Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021
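
The integer that follows each score in the results above appears to be the method's per-metric rank, with 1 assigned to the highest score. As a minimal sketch under that assumption, the Python snippet below recomputes ranks for a handful of rows copied from the results. The helper name `rank_by` is hypothetical, and ties are not handled specially (tied scores simply keep their input order).

```python
# (method, CIDEr@0.5IoU, METEOR@0.5IoU) copied from the first five result rows.
rows = [
    ("Vote2Cap-DETR++", 0.3360, 0.1386),
    ("TMP", 0.3029, 0.1332),
    ("vote2cap-detr", 0.3128, 0.1316),
    ("CFM", 0.2360, 0.1034),
    ("CM3D-Trans+", 0.2348, 0.1030),
]


def rank_by(rows, column):
    """Return {method: rank} for one score column; rank 1 is the highest score.

    Hypothetical helper for illustration; it assumes higher is better.
    """
    ordered = sorted(rows, key=lambda r: r[column], reverse=True)
    return {r[0]: i for i, r in enumerate(ordered, start=1)}


cider_rank = rank_by(rows, 1)   # rank by CIDEr@0.5IoU
meteor_rank = rank_by(rows, 2)  # rank by METEOR@0.5IoU

for method, _, _ in rows:
    print(f"{method}: CIDEr rank {cider_rank[method]}, METEOR rank {meteor_rank[method]}")
```

On these five rows the recomputed ranks agree with the listed ones: TMP, for instance, is third by CIDEr but second by METEOR, which is why it sits above vote2cap-detr when the results are ordered by METEOR.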