Scan2Cap Benchmark
This table lists the benchmark results for the Scan2Cap Dense Captioning Benchmark scenario.
Captioning F1-Score | Dense Captioning | Object Detection | |||||
---|---|---|---|---|---|---|---|
Method | Info | CIDEr@0.5IoU | BLEU-4@0.5IoU | Rouge-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5 |
Chat-Scene-thres0.5 | 0.3456 1 | 0.1859 2 | 0.3162 1 | 0.1527 1 | 0.1415 8 | 0.4856 4 | |
Haifeng Huang, Yilun Chen, Zehan Wang, et al.: Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers. NeurIPS 2024 | |||||||
Vote2Cap-DETR++ | 0.3360 2 | 0.1908 1 | 0.3012 2 | 0.1386 2 | 0.1864 1 | 0.5090 1 | |
Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen: Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning. | |||||||
vote2cap-detr | 0.3128 3 | 0.1778 3 | 0.2842 4 | 0.1316 4 | 0.1825 2 | 0.4454 6 | |
Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang YU, Taihao Li: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023 | |||||||
TMP | 0.3029 4 | 0.1728 4 | 0.2898 3 | 0.1332 3 | 0.1801 3 | 0.4605 5 | |
CFM | 0.2360 5 | 0.1417 5 | 0.2253 5 | 0.1034 5 | 0.1379 10 | 0.3008 10 | |
CM3D-Trans+ | 0.2348 6 | 0.1383 6 | 0.2250 7 | 0.1030 6 | 0.1398 9 | 0.2966 12 | |
Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds. | |||||||
Forest-xyz | 0.2266 7 | 0.1363 7 | 0.2250 6 | 0.1027 7 | 0.1161 15 | 0.2825 15 | |
D3Net - Speaker | 0.2088 8 | 0.1335 9 | 0.2237 8 | 0.1022 8 | 0.1481 7 | 0.4198 7 | |
Dave Zhenyu Chen, Qirui Wu, Matthias Niessner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. 17th European Conference on Computer Vision (ECCV), 2022 | |||||||
Chat-Scene-thres0.01 | 0.2053 9 | 0.1103 10 | 0.1884 10 | 0.0907 10 | 0.1527 5 | 0.5076 2 | |
3DJCG(Captioning) | 0.1918 10 | 0.1350 8 | 0.2207 9 | 0.1013 9 | 0.1506 6 | 0.3867 8 | |
Daigang Cai, Lichen Zhao, Jing Zhang†, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR2022 Oral | |||||||
REMAN | 0.1662 11 | 0.1070 11 | 0.1790 11 | 0.0815 11 | 0.1235 13 | 0.2927 14 | |
NOAH | 0.1382 12 | 0.0901 12 | 0.1598 12 | 0.0747 12 | 0.1359 11 | 0.2977 11 | |
SpaCap3D | 0.1359 13 | 0.0883 13 | 0.1591 13 | 0.0738 13 | 0.1182 14 | 0.3275 9 | |
Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. the 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022 | |||||||
X-Trans2Cap | 0.1274 14 | 0.0808 15 | 0.1392 15 | 0.0653 15 | 0.1244 12 | 0.2795 16 | |
Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Guo, Yao and Li, Guanbin and Cui, Shuguang and Li, Zhen: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022 | |||||||
Chat-Scene-all | 0.1257 15 | 0.0671 17 | 0.1150 17 | 0.0554 17 | 0.1539 4 | 0.5076 2 | |
MORE-xyz | 0.1239 16 | 0.0796 16 | 0.1362 16 | 0.0631 16 | 0.1116 17 | 0.2648 17 | |
Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi_ORder RElation Mining for Dense Captioning in 3D Scenes. ECCV 2022 | |||||||
SUN+ | 0.1148 17 | 0.0846 14 | 0.1564 14 | 0.0711 14 | 0.1143 16 | 0.2958 13 | |
Scan2Cap | 0.0849 18 | 0.0576 18 | 0.1073 18 | 0.0492 18 | 0.0970 18 | 0.2481 18 | |
Dave Zhenyu Chen, Ali Gholami, Matthias Nießner and Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021 |