Submitted by Yang Jiao.

Submission data

Full name: MORE
Description: 3D dense captioning is a recently proposed task in which point clouds contain more geometric information than their 2D counterparts. However, it is also more challenging because point clouds contain more complex and more varied inter-object relations. Existing methods treat such relations only as by-products of object feature learning in graphs, without encoding them explicitly, which leads to sub-optimal results. To improve 3D dense captioning by capturing and utilizing the complex relations in a 3D scene, we propose MORE, a Multi-Order RElation mining model that supports generating more descriptive and comprehensive captions. Technically, MORE encodes object relations progressively, since complex relations can be deduced from a limited number of basic ones. We first devise a novel Spatial Layout Graph Convolution (SLGC), which semantically encodes several first-order relations as edges of a graph constructed over 3D object pro
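The description is cut off before SLGC is fully specified, so the following is only a hypothetical illustration of the general idea of encoding pairwise spatial relations as labelled edges of a graph over object proposals. The relation names, the dominant-axis labelling rule, the k-nearest-neighbour graph, and the scalar per-relation weights are all assumptions, and the aggregation is a generic R-GCN-style step rather than the paper's SLGC.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Proposal:
    center: Tuple[float, float, float]  # (x, y, z) center of a 3D box proposal
    feature: List[float]                # per-proposal feature vector


# Hypothetical first-order relation vocabulary, indexed by axis: (negative, positive).
RELATION_NAMES = (("left", "right"), ("front", "behind"), ("below", "above"))


def relation(a: Proposal, b: Proposal) -> str:
    """Hypothetical label for where b lies relative to a, using the dominant axis of the center offset."""
    d = [bc - ac for ac, bc in zip(a.center, b.center)]
    axis = max(range(3), key=lambda k: abs(d[k]))  # first axis wins on ties
    neg, pos = RELATION_NAMES[axis]
    return pos if d[axis] > 0 else neg


def build_graph(props: List[Proposal], k: int = 2) -> List[Tuple[int, int, str]]:
    """Connect each proposal to its k nearest neighbours (by center distance) with relation-labelled edges."""
    edges = []
    for i, p in enumerate(props):
        others = sorted(
            (j for j in range(len(props)) if j != i),
            key=lambda j: math.dist(p.center, props[j].center),
        )
        for j in others[:k]:
            edges.append((i, j, relation(p, props[j])))
    return edges


def relational_conv(
    props: List[Proposal],
    edges: List[Tuple[int, int, str]],
    w_self: float,
    w_rel: Dict[str, float],
) -> List[List[float]]:
    """One R-GCN-style step: h_i' = w_self * h_i + mean over neighbours j of w_rel[r_ij] * h_j."""
    out = [[w_self * x for x in p.feature] for p in props]
    incoming: Dict[int, List[Tuple[int, str]]] = {}
    for i, j, r in edges:
        incoming.setdefault(i, []).append((j, r))
    for i, nbrs in incoming.items():
        for j, r in nbrs:
            scale = w_rel[r] / len(nbrs)  # relation-specific weight, averaged over neighbours
            out[i] = [o + scale * x for o, x in zip(out[i], props[j].feature)]
    return out
```

Scalar weights keep the sketch dependency-free; a learned layer would normally use one weight matrix per relation type instead.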
Publication title: MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Publication authors: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Publication venue: ECCV 2022
Publication URL: https://arxiv.org/abs/2203.05203
Input data types: uses XYZ coordinates, uses normal vectors
Programming language(s): PyTorch, CUDA
Hardware: RTX 2080
Website: https://github.com/SxJyJay/MORE
Source code or download URL: https://github.com/SxJyJay/MORE
Submission creation date: 11 Sep, 2022
Last edited: 6 Oct, 2022

Captioning

Captioning F1-Score
CIDEr@0.5IoU: 0.1239
BLEU-4@0.5IoU: 0.0796
ROUGE-L@0.5IoU: 0.1362
METEOR@0.5IoU: 0.0631

Dense Captioning
DCmAP: 0.1116

Object Detection
mAP@0.5: 0.2648
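The captioning scores above are reported at a 0.5 IoU threshold. In Scan2Cap, m@kIoU accumulates a caption score m_i (CIDEr, BLEU-4, ROUGE-L or METEOR) only when the matched box reaches IoU of at least k, then normalises by the number of objects. The sketch below assumes the per-pair caption scores and IoUs are already computed. It also assumes that the benchmark's F1-score is the harmonic mean of a precision version (normalised by the number of predicted boxes) and a recall version (normalised by the number of ground-truth objects). The function names are hypothetical, and this is not the benchmark's official evaluation code.

```python
from typing import Sequence, Tuple


def m_at_k_iou(matches: Sequence[Tuple[float, float]], k: float, denom: int) -> float:
    """m@kIoU-style score: sum of caption scores whose box IoU >= k, divided by denom.

    matches: (caption_score, iou) for each matched prediction/ground-truth pair.
    """
    if denom == 0:
        return 0.0
    return sum(score for score, iou in matches if iou >= k) / denom


def caption_f1(
    matches: Sequence[Tuple[float, float]],
    k: float,
    num_predictions: int,
    num_ground_truth: int,
) -> float:
    """Assumed F1 variant: harmonic mean of precision (per prediction) and recall (per GT object)."""
    precision = m_at_k_iou(matches, k, num_predictions)
    recall = m_at_k_iou(matches, k, num_ground_truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with matches [(0.75, 0.6), (0.5, 0.3), (0.25, 0.7)], k = 0.5, four predictions and two ground-truth objects, the precision term is 0.25, the recall term is 0.5, and the F1 value is about 0.333.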