# Scan2Cap Benchmark
This table lists the benchmark results for the Scan2Cap Dense Captioning Benchmark scenario.
The four @0.5IoU columns report dense captioning quality (caption metrics computed only for detections overlapping the ground truth with IoU ≥ 0.5), while DCmAP and mAP@0.5 measure object detection. The per-column rank of each entry is shown in parentheses.

| Method | CIDEr@0.5IoU | BLEU-4@0.5IoU | ROUGE-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5 |
|---|---|---|---|---|---|---|
| Vote2Cap-DETR++ | 0.3360 (1) | 0.1908 (1) | 0.3012 (1) | 0.1386 (1) | 0.1864 (1) | 0.5090 (1) |
| vote2cap-detr | 0.3128 (2) | 0.1778 (2) | 0.2842 (2) | 0.1316 (2) | 0.1825 (2) | 0.4454 (2) |
| CFM | 0.2360 (3) | 0.1417 (3) | 0.2253 (3) | 0.1034 (3) | 0.1379 (6) | 0.3008 (6) |
| Forest-xyz | 0.2266 (5) | 0.1363 (5) | 0.2250 (4) | 0.1027 (5) | 0.1161 (11) | 0.2825 (11) |
| CM3D-Trans+ | 0.2348 (4) | 0.1383 (4) | 0.2250 (5) | 0.1030 (4) | 0.1398 (5) | 0.2966 (8) |
| D3Net - Speaker | 0.2088 (6) | 0.1335 (7) | 0.2237 (6) | 0.1022 (6) | 0.1481 (4) | 0.4198 (3) |
| 3DJCG (Captioning) | 0.1918 (7) | 0.1350 (6) | 0.2207 (7) | 0.1013 (7) | 0.1506 (3) | 0.3867 (4) |
| REMAN | 0.1662 (8) | 0.1070 (8) | 0.1790 (8) | 0.0815 (8) | 0.1235 (9) | 0.2927 (10) |
| NOAH | 0.1382 (9) | 0.0901 (9) | 0.1598 (9) | 0.0747 (9) | 0.1359 (7) | 0.2977 (7) |
| SpaCap3D | 0.1359 (10) | 0.0883 (10) | 0.1591 (10) | 0.0738 (10) | 0.1182 (10) | 0.3275 (5) |
| SUN+ | 0.1148 (13) | 0.0846 (11) | 0.1564 (11) | 0.0711 (11) | 0.1143 (12) | 0.2958 (9) |
| X-Trans2Cap | 0.1274 (11) | 0.0808 (12) | 0.1392 (12) | 0.0653 (12) | 0.1244 (8) | 0.2795 (12) |
| MORE-xyz | 0.1239 (12) | 0.0796 (13) | 0.1362 (13) | 0.0631 (13) | 0.1116 (13) | 0.2648 (13) |
| Scan2Cap | 0.0849 (14) | 0.0576 (14) | 0.1073 (14) | 0.0492 (14) | 0.0970 (14) | 0.2481 (14) |

References:

- **Vote2Cap-DETR++**: Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen: Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.
- **vote2cap-detr**: Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu, Taihao Li: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023.
- **CM3D-Trans+**: Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds.
- **D3Net - Speaker**: Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. 17th European Conference on Computer Vision (ECCV), 2022.
- **3DJCG (Captioning)**: Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022 (Oral).
- **SpaCap3D**: Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022.
- **X-Trans2Cap**: Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022.
- **MORE-xyz**: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi-ORder RElation Mining for Dense Captioning in 3D Scenes. ECCV 2022.
- **Scan2Cap**: Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021.
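The @0.5IoU metrics gate each caption score on localization quality: a prediction's captioning score (CIDEr, BLEU-4, ROUGE-L, or METEOR) counts only if its predicted box overlaps the ground-truth box with IoU ≥ 0.5, and contributes 0 otherwise. Below is a minimal sketch of this convention; it uses a toy unigram-precision score as a stand-in for the real captioning metrics and assumes a one-to-one pairing of predictions with ground truth, which simplifies the benchmark's actual assignment procedure.

```python
def box_volume(box):
    """Volume of an axis-aligned 3D box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)

def toy_caption_score(pred, ref):
    """Unigram precision: a stand-in for CIDEr/BLEU-4/ROUGE-L/METEOR."""
    pred_tokens = pred.lower().split()
    ref_tokens = set(ref.lower().split())
    if not pred_tokens:
        return 0.0
    return sum(t in ref_tokens for t in pred_tokens) / len(pred_tokens)

def metric_at_k_iou(preds, gts, k=0.5):
    """m@kIoU: average caption score, zeroed for boxes below the IoU threshold.

    preds and gts are lists of (caption, box) tuples, paired one-to-one
    (a simplifying assumption; the benchmark matches detections itself).
    """
    scores = [
        toy_caption_score(pc, gc) if iou_3d(pb, gb) >= k else 0.0
        for (pc, pb), (gc, gb) in zip(preds, gts)
    ]
    return sum(scores) / len(scores)
```

A poorly localized box thus drags the average down even when its caption is perfect, which is why methods with strong detectors (high mAP@0.5) tend to rank higher on the captioning columns as well.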