Submitted by Alexander Swerdlow.

Submission data

Full name: Unifying 2D and 3D Vision-Language Understanding
Publication title: Unifying 2D and 3D Vision-Language Understanding
Publication authors: Ayush Jain, Alexander Swerdlow, Yuzhou Wang, Alexander Sax, Franziska Meier, Katerina Fragkiadaki
Publication URL: https://arxiv.org/abs/2503.10745
Input data types: Uses XYZ coordinates, Uses RGB values, Uses Multiview Image Features
Programming language(s): Python with CUDA
Hardware: A100/L40S, >=40 GB of VRAM
Website: https://univlg.github.io/
Source code or download URL: https://github.com/facebookresearch/univlg
Submission creation date: 18 Feb, 2025
Last edited: 18 Mar, 2025

Localization

Subset      acc@0.25IoU   acc@0.5IoU
Unique      0.8895        0.8236
Multiple    0.5921        0.5030
Overall     0.6588        0.5749