This table lists the benchmark results for the Scan2Cap Dense Captioning Benchmark scenario.


CIDEr, BLEU-4, Rouge-L, and METEOR (all at 0.5 IoU) fall under Captioning F1-Score, DCmAP under Dense Captioning, and mAP@0.5 under Object Detection. Each score is followed by its rank in parentheses. Rows are sorted by Rouge-L@0.5IoU.

| Method | Info | CIDEr@0.5IoU | BLEU-4@0.5IoU | Rouge-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vote2Cap-DETR++ | | 0.3360 (1) | 0.1908 (1) | 0.3012 (1) | 0.1386 (1) | 0.1864 (1) | 0.5090 (1) |
| vote2cap-detr | permissive | 0.3128 (2) | 0.1778 (2) | 0.2842 (2) | 0.1316 (2) | 0.1825 (2) | 0.4454 (2) |
| CFM | | 0.2360 (3) | 0.1417 (3) | 0.2253 (3) | 0.1034 (3) | 0.1379 (6) | 0.3008 (6) |
| Forest-xyz | | 0.2266 (5) | 0.1363 (5) | 0.2250 (4) | 0.1027 (5) | 0.1161 (11) | 0.2825 (11) |
| CM3D-Trans+ | | 0.2348 (4) | 0.1383 (4) | 0.2250 (5) | 0.1030 (4) | 0.1398 (5) | 0.2966 (8) |
| D3Net - Speaker | permissive | 0.2088 (6) | 0.1335 (7) | 0.2237 (6) | 0.1022 (6) | 0.1481 (4) | 0.4198 (3) |
| 3DJCG(Captioning) | permissive | 0.1918 (7) | 0.1350 (6) | 0.2207 (7) | 0.1013 (7) | 0.1506 (3) | 0.3867 (4) |
| REMAN | | 0.1662 (8) | 0.1070 (8) | 0.1790 (8) | 0.0815 (8) | 0.1235 (9) | 0.2927 (10) |
| NOAH | | 0.1382 (9) | 0.0901 (9) | 0.1598 (9) | 0.0747 (9) | 0.1359 (7) | 0.2977 (7) |
| SpaCap3D | permissive | 0.1359 (10) | 0.0883 (10) | 0.1591 (10) | 0.0738 (10) | 0.1182 (10) | 0.3275 (5) |
| SUN+ | | 0.1148 (13) | 0.0846 (11) | 0.1564 (11) | 0.0711 (11) | 0.1143 (12) | 0.2958 (9) |
| X-Trans2Cap | permissive | 0.1274 (11) | 0.0808 (12) | 0.1392 (12) | 0.0653 (12) | 0.1244 (8) | 0.2795 (12) |
| MORE-xyz | permissive | 0.1239 (12) | 0.0796 (13) | 0.1362 (13) | 0.0631 (13) | 0.1116 (13) | 0.2648 (13) |
| Scan2Cap | permissive | 0.0849 (14) | 0.0576 (14) | 0.1073 (14) | 0.0492 (14) | 0.0970 (14) | 0.2481 (14) |

References for the listed methods:

Vote2Cap-DETR++: Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen: Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.
vote2cap-detr: Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu, Taihao Li: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023
CM3D-Trans+: Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds.
D3Net - Speaker: Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. 17th European Conference on Computer Vision (ECCV), 2022
3DJCG(Captioning): Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022 (Oral)
SpaCap3D: Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. The 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022
X-Trans2Cap: Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022
MORE-xyz: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes. ECCV 2022
Scan2Cap: Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021
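
The ranks in the table can be recomputed from the scores. The snippet below is a minimal sketch, not the benchmark's own evaluation code: the `RESULTS` dictionary and the `rank_by` helper are names made up for illustration, and the scores are copied from the table. It assumes that higher is better for every metric and that ties at four decimals (for example the two Rouge-L values of 0.2250) are broken by table order, which may differ from the benchmark's actual tie-breaking.

```python
# Column order of each score tuple (all captioning metrics are at 0.5 IoU).
METRICS = ("CIDEr", "BLEU-4", "Rouge-L", "METEOR", "DCmAP", "mAP@0.5")

# Scores copied from the leaderboard table, in the table's row order.
RESULTS = {
    "Vote2Cap-DETR++":   (0.3360, 0.1908, 0.3012, 0.1386, 0.1864, 0.5090),
    "vote2cap-detr":     (0.3128, 0.1778, 0.2842, 0.1316, 0.1825, 0.4454),
    "CFM":               (0.2360, 0.1417, 0.2253, 0.1034, 0.1379, 0.3008),
    "Forest-xyz":        (0.2266, 0.1363, 0.2250, 0.1027, 0.1161, 0.2825),
    "CM3D-Trans+":       (0.2348, 0.1383, 0.2250, 0.1030, 0.1398, 0.2966),
    "D3Net - Speaker":   (0.2088, 0.1335, 0.2237, 0.1022, 0.1481, 0.4198),
    "3DJCG(Captioning)": (0.1918, 0.1350, 0.2207, 0.1013, 0.1506, 0.3867),
    "REMAN":             (0.1662, 0.1070, 0.1790, 0.0815, 0.1235, 0.2927),
    "NOAH":              (0.1382, 0.0901, 0.1598, 0.0747, 0.1359, 0.2977),
    "SpaCap3D":          (0.1359, 0.0883, 0.1591, 0.0738, 0.1182, 0.3275),
    "SUN+":              (0.1148, 0.0846, 0.1564, 0.0711, 0.1143, 0.2958),
    "X-Trans2Cap":       (0.1274, 0.0808, 0.1392, 0.0653, 0.1244, 0.2795),
    "MORE-xyz":          (0.1239, 0.0796, 0.1362, 0.0631, 0.1116, 0.2648),
    "Scan2Cap":          (0.0849, 0.0576, 0.1073, 0.0492, 0.0970, 0.2481),
}


def rank_by(metric):
    """Return method names ordered from best to worst on one metric.

    sorted() is stable, so methods with equal scores keep their table order.
    """
    col = METRICS.index(metric)
    return sorted(RESULTS, key=lambda name: RESULTS[name][col], reverse=True)


if __name__ == "__main__":
    # Print the leaderboard re-sorted by dense-captioning mAP.
    for position, name in enumerate(rank_by("DCmAP"), start=1):
        print(position, name)
```

Calling `rank_by("Rouge-L")` reproduces the row order of the table, while `rank_by("mAP@0.5")` shows how the ordering shifts when methods are compared on detection quality alone.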