Submitted anonymously.

Submission data

Full name: Vision-Language Pre-training with Object Contrastive Learning
Description: We propose 3DVLP (3D vision-language pre-training with object contrastive learning), a vision-language pre-training framework that transfers flexibly to 3D vision-language downstream tasks. 3DVLP takes visual grounding as the proxy task and introduces an Object-level IoU-guided Detection (OID) loss to obtain high-quality proposals in the scene. Moreover, we design an Object-level Cross-Contrastive alignment (OCC) task and an Object-level Self-Contrastive learning (OSC) task to align the objects with descriptions and to distinguish different objects in the scene, respectively.
Input Data Types: Uses XYZ coordinates, Uses RGB values, Uses Multiview Image Features, Uses Normal Vectors
Programming language(s): Python
Hardware: RTX 4090
Submission creation date: 26 Feb, 2025
Last edited: 26 Feb, 2025

Localization

Subset      acc@0.25IoU   acc@0.5IoU
Unique      0.0038        0.0019
Multiple    0.0049        0.0023
Overall     0.0047        0.0022
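
For readers unfamiliar with the localization metric, acc@kIoU is the fraction of queries whose predicted 3D box overlaps the ground-truth box with an IoU of at least k. The sketch below illustrates the idea under stated assumptions: boxes are axis-aligned and given as (xmin, ymin, zmin, xmax, ymax, zmax), the comparison is inclusive (>=), and the function names box3d_iou and acc_at_iou are hypothetical. It is not the benchmark's own evaluation code.

```python
from typing import Sequence, Tuple

# Assumed box layout: (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned.
Box = Tuple[float, float, float, float, float, float]


def box_volume(b: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    return max(b[3] - b[0], 0.0) * max(b[4] - b[1], 0.0) * max(b[5] - b[2], 0.0)


def box3d_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


def acc_at_iou(preds: Sequence[Box], gts: Sequence[Box], threshold: float) -> float:
    """Fraction of predictions whose IoU with the paired ground truth is >= threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(1 for p, g in zip(preds, gts) if box3d_iou(p, g) >= threshold)
    return hits / len(preds)


# Toy example: IoUs are 1.0, 1/3 and 0.0 respectively.
preds = [(0, 0, 0, 2, 2, 2), (0, 0, 0, 2, 2, 2), (0, 0, 0, 1, 1, 1)]
gts = [(0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2), (5, 5, 5, 6, 6, 6)]
acc_25 = acc_at_iou(preds, gts, 0.25)  # 2 of 3 predictions pass
acc_50 = acc_at_iou(preds, gts, 0.5)   # 1 of 3 predictions passes
```

The "Unique" and "Multiple" rows apply the same computation to subsets of the queries, and "Overall" applies it to all of them.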