Submitted by Sijin Chen.

Submission data

Full name: Vote2Cap-DETR++
Description: Decoupled feature extraction and task decoding for 3D dense captioning. Set-to-set training, followed by fine-tuning with SCST (CIDEr reward).
Publication title: Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Publication authors: Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
Publication URL: https://arxiv.org/abs/2309.02999
Input Data Types: Uses XYZ coordinates, Uses RGB values, Uses Normal Vectors
Programming language(s): Python
Hardware: RTX 3090
Source code or download URL: https://github.com/ch3cook-fdu/Vote2Cap-DETR
Submission creation date: 16 Feb, 2024
Last edited: 19 Feb, 2024
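
The description mentions fine-tuning with SCST (self-critical sequence training) using a CIDEr reward. As a rough illustration of that idea only, and not the submission's actual training code, the sketch below computes the self-critical loss for one sampled caption: the reward of a greedy-decoded caption serves as the baseline, and the sampled caption's log-probability is weighted by the reward difference. The function name `scst_loss` and its arguments are hypothetical, and the rewards are assumed to be precomputed (for example, CIDEr scores).

```python
def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical loss for a single sampled caption (illustrative sketch).

    sample_logprobs: per-token log-probabilities of the sampled caption.
    sample_reward:   reward of the sampled caption (assumed precomputed, e.g. CIDEr).
    greedy_reward:   reward of the greedy-decoded caption, used as the baseline.
    """
    # Positive advantage: the sampled caption scored better than the greedy one.
    advantage = sample_reward - greedy_reward
    # REINFORCE with a baseline: minimise -(advantage) * log p(sampled caption).
    return -advantage * sum(sample_logprobs)


if __name__ == "__main__":
    # Hypothetical numbers: a two-token sampled caption that beats the greedy baseline.
    print(scst_loss([-0.5, -1.0], sample_reward=0.8, greedy_reward=0.6))
```

When the sampled caption beats the greedy baseline, minimising this loss raises its probability; when it scores worse, the sign flips and its probability is pushed down. Practical implementations typically batch this and may normalise by caption length, which this sketch omits.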

Captioning

Captioning F1-Score
CIDEr@0.5IoU: 0.3360
BLEU-4@0.5IoU: 0.1908
Rouge-L@0.5IoU: 0.3012
METEOR@0.5IoU: 0.1386

Dense Captioning
DCmAP: 0.1864

Object Detection
mAP@0.5: 0.5090
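
The "@0.5IoU" suffix means a caption only earns its score when the predicted box overlaps the ground-truth box with an IoU of at least 0.5. The sketch below illustrates that thresholding under stated assumptions: boxes are axis-aligned and given as (xmin, ymin, zmin, xmax, ymax, zmax), precision averages over all predictions, recall averages over all ground-truth objects, and the F1 score is their harmonic mean. The helper names are hypothetical, and this is not the benchmark's official evaluation script.

```python
def box_iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def thresholded_mean(scores_and_ious, total, threshold=0.5):
    """Sum the caption scores whose box IoU meets the threshold, divided by `total`."""
    if total == 0:
        return 0.0
    kept = sum(score for score, iou in scores_and_ious if iou >= threshold)
    return kept / total


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical matched pairs of (caption score, box IoU).
    pairs = [(0.8, 0.7), (0.6, 0.4), (0.4, 0.9)]
    precision = thresholded_mean(pairs, total=4)  # assume 4 predicted boxes
    recall = thresholded_mean(pairs, total=3)     # assume 3 ground-truth objects
    print(f1_score(precision, recall))
```

In this toy example the second pair contributes nothing because its IoU of 0.4 falls below the threshold, however good its caption score is.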