This table lists the benchmark results for the Scan2Cap Dense Captioning Benchmark scenario.


Columns are grouped as Captioning F1-Score (CIDEr, BLEU-4, Rouge-L, and METEOR, each at 0.5 IoU), Dense Captioning (DCmAP), and Object Detection (mAP@0.5). The number in parentheses after each score is the method's rank on that metric.

| Method | Info | CIDEr@0.5IoU | BLEU-4@0.5IoU | Rouge-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5 |
|---|---|---|---|---|---|---|---|
| vote2cap-detr | permissive | 0.3128 (1) | 0.1778 (1) | 0.2842 (1) | 0.1316 (1) | 0.1825 (1) | 0.4454 (1) |
| CFM | | 0.2360 (2) | 0.1417 (2) | 0.2253 (2) | 0.1034 (2) | 0.1379 (5) | 0.3008 (5) |
| CM3D-Trans+ | | 0.2348 (3) | 0.1383 (3) | 0.2250 (4) | 0.1030 (3) | 0.1398 (4) | 0.2966 (7) |
| Forest-xyz | | 0.2266 (4) | 0.1363 (4) | 0.2250 (3) | 0.1027 (4) | 0.1161 (10) | 0.2825 (10) |
| D3Net - Speaker | permissive | 0.2088 (5) | 0.1335 (6) | 0.2237 (5) | 0.1022 (5) | 0.1481 (3) | 0.4198 (2) |
| 3DJCG(Captioning) | permissive | 0.1918 (6) | 0.1350 (5) | 0.2207 (6) | 0.1013 (6) | 0.1506 (2) | 0.3867 (3) |
| REMAN | | 0.1662 (7) | 0.1070 (7) | 0.1790 (7) | 0.0815 (7) | 0.1235 (8) | 0.2927 (9) |
| NOAH | | 0.1382 (8) | 0.0901 (8) | 0.1598 (8) | 0.0747 (8) | 0.1359 (6) | 0.2977 (6) |
| SpaCap3D | permissive | 0.1359 (9) | 0.0883 (9) | 0.1591 (9) | 0.0738 (9) | 0.1182 (9) | 0.3275 (4) |
| X-Trans2Cap | permissive | 0.1274 (10) | 0.0808 (11) | 0.1392 (11) | 0.0653 (11) | 0.1244 (7) | 0.2795 (11) |
| MORE-xyz | permissive | 0.1239 (11) | 0.0796 (12) | 0.1362 (12) | 0.0631 (12) | 0.1116 (12) | 0.2648 (12) |
| SUN+ | | 0.1148 (12) | 0.0846 (10) | 0.1564 (10) | 0.0711 (10) | 0.1143 (11) | 0.2958 (8) |
| Scan2Cap | permissive | 0.0849 (13) | 0.0576 (13) | 0.1073 (13) | 0.0492 (13) | 0.0970 (13) | 0.2481 (13) |

References for the listed methods:

vote2cap-detr: Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023
CM3D-Trans+: Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds.
D3Net - Speaker: Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. ECCV 2022
3DJCG(Captioning): Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022 (Oral)
SpaCap3D: Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. IJCAI 2022
X-Trans2Cap: Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022
MORE-xyz: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes. ECCV 2022
Scan2Cap: Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021
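
The table is ordered by CIDEr@0.5IoU, but the ranking differs noticeably on other metrics. As a minimal sketch (not part of the benchmark's own tooling), the following Python snippet re-sorts a subset of the columns by a chosen metric. The scores are copied from the table above; the names `ROWS`, `COLUMNS`, and `rank_by` are hypothetical and used only for illustration.

```python
# Scores copied from the table above: (method, CIDEr@0.5IoU, DCmAP, mAP@0.5).
ROWS = [
    ("vote2cap-detr", 0.3128, 0.1825, 0.4454),
    ("CFM", 0.2360, 0.1379, 0.3008),
    ("CM3D-Trans+", 0.2348, 0.1398, 0.2966),
    ("Forest-xyz", 0.2266, 0.1161, 0.2825),
    ("D3Net - Speaker", 0.2088, 0.1481, 0.4198),
    ("3DJCG(Captioning)", 0.1918, 0.1506, 0.3867),
    ("REMAN", 0.1662, 0.1235, 0.2927),
    ("NOAH", 0.1382, 0.1359, 0.2977),
    ("SpaCap3D", 0.1359, 0.1182, 0.3275),
    ("X-Trans2Cap", 0.1274, 0.1244, 0.2795),
    ("MORE-xyz", 0.1239, 0.1116, 0.2648),
    ("SUN+", 0.1148, 0.1143, 0.2958),
    ("Scan2Cap", 0.0849, 0.0970, 0.2481),
]

# Position of each metric inside a row tuple (hypothetical helper mapping).
COLUMNS = {"CIDEr@0.5IoU": 1, "DCmAP": 2, "mAP@0.5": 3}


def rank_by(metric):
    """Return (rank, method, score) tuples, best score first (higher is better)."""
    col = COLUMNS[metric]
    ordered = sorted(ROWS, key=lambda row: row[col], reverse=True)
    return [(rank, row[0], row[col]) for rank, row in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, method, score in rank_by("DCmAP"):
        print(f"{rank:2d}  {method:<18} {score:.4f}")
```

Note that scores shown to four decimals can tie (for example, two methods report 0.2250 for Rouge-L@0.5IoU), so ranks recomputed from the displayed values may not always match the listed ranks on such columns.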