FLAME-driven Monocular Head Avatar Reconstruction

Given several frontal videos of a person's head with corresponding tracked FLAME meshes, the task is to re-animate the person with unseen FLAME expression codes and render the result from both seen (blue) and unseen (orange) camera viewpoints. This requires reconstructing an animatable 3D head representation (a 3D head avatar). The challenge is conducted on recordings of 5 different individuals. For each individual, 18 short facial performance sequences are provided for training, while the remaining 4 sequences are held out. For the hold-out sequences, only the tracked FLAME meshes and the camera poses are provided.

Please see the NeRSemble Benchmark Toolkit for how to obtain the data needed to participate in the benchmark and how to prepare a submission.

Evaluation and Metrics

We evaluate the similarity between the ground-truth and generated RGB videos. Our image-wise evaluation metrics are peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS). Additionally, we measure temporal consistency with FovVideoVDP (JOD), which is sensitive to flickering, noise, and other temporal artifacts. Finally, we employ two face-specific metrics: Average Keypoint Distance (AKD), measured in pixels with keypoints estimated by PIPNet, and cosine similarity (CSIM) of identity embeddings based on ArcFace. For each pair of generated and ground-truth videos, we compute these six metrics and average the results over the 3 hold-out cameras, the 4 hold-out sequences, and all 5 individuals. To save compute during metric evaluation, the face and image metrics are evaluated on every 3rd frame.
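As a rough illustration of the frame-level metrics described above, the following minimal sketch computes PSNR, AKD, and CSIM with NumPy and averages PSNR over every 3rd frame of a video pair. It is not the official evaluation code: the function names, the [0, 1] image value range, the array shapes, and the choice to start the stride at frame 0 are all assumptions made for this example. The keypoints and identity embeddings are assumed to have been extracted beforehand (by PIPNet and ArcFace in the benchmark); SSIM, LPIPS, and JOD are omitted because they require dedicated libraries.

```python
import numpy as np

FRAME_STRIDE = 3  # image and face metrics are evaluated on every 3rd frame


def psnr(pred, gt, max_val=1.0):
    """PSNR in dB between two images with values in [0, max_val] (assumed range)."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def akd(pred_kpts, gt_kpts):
    """Average keypoint distance in pixels; both inputs have shape (K, 2)."""
    return float(np.mean(np.linalg.norm(pred_kpts - gt_kpts, axis=-1)))


def csim(pred_emb, gt_emb):
    """Cosine similarity between two 1D identity embedding vectors."""
    denom = np.linalg.norm(pred_emb) * np.linalg.norm(gt_emb)
    return float(np.dot(pred_emb, gt_emb) / denom)


def video_psnr(pred_video, gt_video, stride=FRAME_STRIDE):
    """Mean PSNR over every `stride`-th frame of two (T, H, W, 3) videos.

    Starting the stride at frame 0 is an assumption of this sketch.
    """
    scores = [psnr(p, g) for p, g in zip(pred_video[::stride], gt_video[::stride])]
    return float(np.mean(scores))
```

A per-video score computed this way would then be averaged over cameras, sequences, and individuals to obtain a single number per metric.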

Evaluation is carried out on ground-truth images at a resolution of 512x512.

Results

Methods      PSNR    SSIM   LPIPS  JOD    AKD     CSIM
FlashAvatar  16.300  0.731  0.386  4.146  19.179  0.304
Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang. FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding. CVPR 2024

Please refer to the submission instructions before making a submission.
