This table lists the results of the Scan2Cap Dense Captioning Benchmark.


CIDEr, BLEU-4, Rouge-L and METEOR (all at 0.5 IoU) are the Captioning F1-Score metrics, DCmAP is the Dense Captioning metric, and mAP@0.5 is the Object Detection metric. Each cell gives the score followed by the method's rank on that metric in parentheses. Rows are sorted by DCmAP.

| Method | Info | CIDEr@0.5IoU | BLEU-4@0.5IoU | Rouge-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5 |
|---|---|---|---|---|---|---|---|
| Vote2Cap-DETR++ | | 0.3360 (1) | 0.1908 (1) | 0.3012 (1) | 0.1386 (1) | 0.1864 (1) | 0.5090 (1) |
| vote2cap-detr | permissive | 0.3128 (3) | 0.1778 (2) | 0.2842 (4) | 0.1316 (4) | 0.1825 (2) | 0.4454 (6) |
| TMP | | 0.3029 (4) | 0.1728 (3) | 0.2898 (2) | 0.1332 (3) | 0.1801 (3) | 0.4605 (5) |
| Chat-Scene-all | | 0.1257 (15) | 0.0671 (17) | 0.1150 (17) | 0.0554 (17) | 0.1539 (4) | 0.5076 (2) |
| Chat-Scene-thres0.01 | | 0.2053 (9) | 0.1103 (10) | 0.1884 (10) | 0.0907 (10) | 0.1527 (5) | 0.5076 (2) |
| 3DJCG(Captioning) | permissive | 0.1918 (10) | 0.1350 (8) | 0.2207 (9) | 0.1013 (9) | 0.1506 (6) | 0.3867 (8) |
| D3Net - Speaker | permissive | 0.2088 (8) | 0.1335 (9) | 0.2237 (8) | 0.1022 (8) | 0.1481 (7) | 0.4198 (7) |
| Chat-Scene-thres0.5 | permissive | 0.3128 (2) | 0.1679 (4) | 0.2862 (3) | 0.1376 (2) | 0.1478 (8) | 0.4981 (4) |
| CM3D-Trans+ | | 0.2348 (6) | 0.1383 (6) | 0.2250 (7) | 0.1030 (6) | 0.1398 (9) | 0.2966 (12) |
| CFM | | 0.2360 (5) | 0.1417 (5) | 0.2253 (5) | 0.1034 (5) | 0.1379 (10) | 0.3008 (10) |
| NOAH | | 0.1382 (12) | 0.0901 (12) | 0.1598 (12) | 0.0747 (12) | 0.1359 (11) | 0.2977 (11) |
| X-Trans2Cap | permissive | 0.1274 (14) | 0.0808 (15) | 0.1392 (15) | 0.0653 (15) | 0.1244 (12) | 0.2795 (16) |
| REMAN | | 0.1662 (11) | 0.1070 (11) | 0.1790 (11) | 0.0815 (11) | 0.1235 (13) | 0.2927 (14) |
| SpaCap3D | permissive | 0.1359 (13) | 0.0883 (13) | 0.1591 (13) | 0.0738 (13) | 0.1182 (14) | 0.3275 (9) |
| Forest-xyz | | 0.2266 (7) | 0.1363 (7) | 0.2250 (6) | 0.1027 (7) | 0.1161 (15) | 0.2825 (15) |
| SUN+ | | 0.1148 (17) | 0.0846 (14) | 0.1564 (14) | 0.0711 (14) | 0.1143 (16) | 0.2958 (13) |
| MORE-xyz | permissive | 0.1239 (16) | 0.0796 (16) | 0.1362 (16) | 0.0631 (16) | 0.1116 (17) | 0.2648 (17) |
| Scan2Cap | permissive | 0.0849 (18) | 0.0576 (18) | 0.1073 (18) | 0.0492 (18) | 0.0970 (18) | 0.2481 (18) |

References for the listed methods:

Vote2Cap-DETR++. Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen: Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.
vote2cap-detr. Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu, Taihao Li: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023
3DJCG(Captioning). Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022 (Oral)
D3Net - Speaker. Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. 17th European Conference on Computer Vision (ECCV), 2022
Chat-Scene-thres0.5. Haifeng Huang, Yilun Chen, Zehan Wang, et al.: Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers. NeurIPS 2024
CM3D-Trans+. Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds.
X-Trans2Cap. Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022
SpaCap3D. Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022
MORE-xyz. Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes. ECCV 2022
Scan2Cap. Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021
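
Each score in the table is paired with the method's rank on that metric, and tied scores share a rank (two entries hold rank 2 on mAP@0.5, so the next entry is rank 4). The following is a minimal sketch of how such ranks could be recomputed from the scores. It assumes standard competition ranking, which is consistent with the ranks shown but is not confirmed by the benchmark itself; the function name `competition_ranks` is hypothetical and not part of any benchmark tooling.

```python
def competition_ranks(scores):
    """Rank scores in descending order; tied scores share the best rank (1, 2, 2, 4)."""
    ordered = sorted(scores.values(), reverse=True)
    # list.index() returns the first position of a value, so ties get the same rank
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


# mAP@0.5 scores of the five best-detecting methods, copied from the table
map_at_05 = {
    "Vote2Cap-DETR++": 0.5090,
    "Chat-Scene-all": 0.5076,
    "Chat-Scene-thres0.01": 0.5076,
    "Chat-Scene-thres0.5": 0.4981,
    "TMP": 0.4605,
}

ranks = competition_ranks(map_at_05)
for name, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(f"{rank:>2}  {name}  {map_at_05[name]:.4f}")
```

For these five entries the sketch yields ranks 1, 2, 2, 4 and 5, which agree with the mAP@0.5 ranks listed in the table.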