
Scan2Cap Benchmark
This table lists the results for the Scan2Cap Dense Captioning benchmark. The first four metrics evaluate dense captioning quality at a 0.5 box-IoU threshold; DCmAP and mAP@0.5 evaluate object detection. Per-metric ranks are shown in parentheses.
Method | CIDEr@0.5IoU | BLEU-4@0.5IoU | Rouge-L@0.5IoU | METEOR@0.5IoU | DCmAP | mAP@0.5
---|---|---|---|---|---|---
Vote2Cap-DETR | 0.3128 (1) | 0.1778 (1) | 0.2842 (1) | 0.1316 (1) | 0.1825 (1) | 0.4454 (1)
CFM | 0.2360 (2) | 0.1417 (2) | 0.2253 (2) | 0.1034 (2) | 0.1379 (5) | 0.3008 (5)
CM3D-Trans+ | 0.2348 (3) | 0.1383 (3) | 0.2250 (4) | 0.1030 (3) | 0.1398 (4) | 0.2966 (7)
Forest-xyz | 0.2266 (4) | 0.1363 (4) | 0.2250 (3) | 0.1027 (4) | 0.1161 (10) | 0.2825 (10)
D3Net - Speaker | 0.2088 (5) | 0.1335 (6) | 0.2237 (5) | 0.1022 (5) | 0.1481 (3) | 0.4198 (2)
3DJCG (Captioning) | 0.1918 (6) | 0.1350 (5) | 0.2207 (6) | 0.1013 (6) | 0.1506 (2) | 0.3867 (3)
REMAN | 0.1662 (7) | 0.1070 (7) | 0.1790 (7) | 0.0815 (7) | 0.1235 (8) | 0.2927 (9)
NOAH | 0.1382 (8) | 0.0901 (8) | 0.1598 (8) | 0.0747 (8) | 0.1359 (6) | 0.2977 (6)
SpaCap3D | 0.1359 (9) | 0.0883 (9) | 0.1591 (9) | 0.0738 (9) | 0.1182 (9) | 0.3275 (4)
X-Trans2Cap | 0.1274 (10) | 0.0808 (11) | 0.1392 (11) | 0.0653 (11) | 0.1244 (7) | 0.2795 (11)
MORE-xyz | 0.1239 (11) | 0.0796 (12) | 0.1362 (12) | 0.0631 (12) | 0.1116 (12) | 0.2648 (12)
SUN+ | 0.1148 (12) | 0.0846 (10) | 0.1564 (10) | 0.0711 (10) | 0.1143 (11) | 0.2958 (8)
Scan2Cap | 0.0849 (13) | 0.0576 (13) | 0.1073 (13) | 0.0492 (13) | 0.0970 (13) | 0.2481 (13)

References

- Vote2Cap-DETR: Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu: End-to-End 3D Dense Captioning with Vote2Cap-DETR. CVPR 2023.
- CM3D-Trans+: Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma: Contextual Modeling for 3D Dense Captioning on Point Clouds.
- D3Net - Speaker: Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang: D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding. ECCV 2022.
- 3DJCG (Captioning): Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu: 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022 (Oral).
- SpaCap3D: Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai: Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. IJCAI 2022.
- X-Trans2Cap: Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li: X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning. CVPR 2022.
- MORE-xyz: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi-Order Relation Mining for Dense Captioning in 3D Scenes. ECCV 2022.
- Scan2Cap: Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. CVPR 2021.
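The "@0.5IoU" suffix on the captioning columns means a predicted caption only earns credit when its predicted bounding box overlaps the matched ground-truth box with IoU ≥ 0.5; otherwise its score counts as zero, and the sum is averaged over all ground-truth objects. Below is a minimal sketch of that aggregation (function names and the exact matching/averaging conventions are mine, not taken from the official evaluation code):

```python
import numpy as np

def box_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each (xmin, ymin, zmin, xmax, ymax, zmax)."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(a[:3], b[:3])          # lower corner of the intersection
    hi = np.minimum(a[3:], b[3:])          # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    union = np.prod(a[3:] - a[:3]) + np.prod(b[3:] - b[:3]) - inter
    return inter / union if union > 0 else 0.0

def m_at_k_iou(matches, num_gt, k=0.5):
    """IoU-thresholded captioning score (sketch of m@kIoU).

    matches: (caption_score, iou) pairs, one per ground-truth object, where
             caption_score is a per-object metric value (e.g. CIDEr) and iou
             is the overlap between the matched predicted box and the GT box.
    A caption contributes only when iou >= k; the sum is averaged over all
    num_gt ground-truth objects (unmatched objects simply contribute 0).
    """
    return sum(score for score, iou in matches if iou >= k) / num_gt
```

For example, two ground-truth objects whose captions both score 1.0, but where only one predicted box reaches 0.5 IoU, yield `m_at_k_iou([(1.0, 0.6), (1.0, 0.4)], 2) == 0.5`.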