The NeRSemble 3D Head Avatar Benchmark

Based on the data from

NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads

Technical University of Munich
SIGGRAPH 2023

Introduction

The NeRSemble benchmark aims to make research on photorealistic 3D head avatars more comparable. The benchmark studies distinct phenomena of 3D head avatar creation, such as extreme facial expressions, slow-motion captures of shaking long hair, or complicated light reflection and refraction patterns of glasses. To this end, v1 of the benchmark introduces two tasks: Dynamic Novel View Synthesis on Heads and FLAME-driven Monocular Head Avatar Reconstruction. These tasks assess two core desiderata of 3D avatars: while the novel view synthesis challenge focuses on the best possible rendering quality of complex moving scenes, the avatar animation challenge is concerned with how well a driving signal is translated into an avatar.

Dynamic Novel View Synthesis.
The first task, dynamic novel view synthesis, is particularly interesting for digital humans, where the bar for a free-viewpoint video to be perceived as "real" is exceptionally high due to human sensitivity to faces. Human heads constitute an excellent playground for benchmarking dynamic novel view synthesis methods, as many complex physical phenomena such as topological changes, light reflections and refractions, sub-surface scattering, or fast movements of thin structures can be studied in a very controlled setting. The benchmark provides 13 synchronized video streams for each of 5 challenging human head sequences, from which a high-quality dynamic 3D representation is to be reconstructed.

FLAME-driven Monocular Head Avatar Reconstruction.
On the other hand, the monocular FLAME avatar challenge poses an under-constrained setting, as data from only a single frontal camera is provided, mimicking the use case of a user casually creating an avatar of themselves with their phone. Here, the focus lies on re-animating the avatar with unseen expressions and rendering it from unseen views. In the interest of comparability, we restrict the driving signal to FLAME expression codes, which are currently a popular choice for animating a 3D head avatar. The benchmark dataset provides high-quality FLAME tracking results that were obtained by fitting the 3D face model to accurate 3D point clouds reconstructed from all 16 camera views.
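To illustrate the driving-signal interface described above, the following is a minimal sketch of an avatar re-animated from FLAME expression codes. The `Avatar` class, its `render` signature, the image resolution, and the 100-dimensional expression vector are all illustrative assumptions, not part of the benchmark specification; a real submission would substitute its own learned model here.

```python
import numpy as np

class Avatar:
    """Toy stand-in for a learned head avatar driven by FLAME expression codes.

    This is a hypothetical interface: the benchmark only fixes the driving
    signal (FLAME expression codes), not any particular avatar implementation.
    """

    def __init__(self, n_expr: int = 100):
        # FLAME is commonly used with up to 100 expression coefficients.
        self.n_expr = n_expr

    def render(self, expr: np.ndarray, camera_pose: np.ndarray) -> np.ndarray:
        """Render the avatar for one expression code from one camera pose."""
        assert expr.shape == (self.n_expr,)
        assert camera_pose.shape == (4, 4)  # camera-to-world transform
        # A real method would decode `expr` into geometry/appearance and
        # rasterize or volume-render from `camera_pose`; here we return
        # a dummy black image of an assumed 512x512 resolution.
        return np.zeros((512, 512, 3), dtype=np.uint8)

# Re-animation loop: unseen expression codes drive the avatar frame by frame.
avatar = Avatar()
expressions = [np.zeros(100), 0.5 * np.ones(100)]  # e.g. neutral, then exaggerated
frames = [avatar.render(e, np.eye(4)) for e in expressions]
```

The key design point is that the only per-frame inputs are the expression code and the camera pose, matching the benchmark's restriction of the driving signal.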

For all benchmark tasks, evaluation is performed on hold-out camera viewpoints that are not part of the published benchmark data.
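Hold-out evaluation of this kind typically compares rendered images against ground-truth captures with standard image metrics. As an illustration (the exact metric set used by the benchmark is not specified here), a PSNR computation could look like this:

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a rendered and a ground-truth image.

    Images are assumed to share a shape and a [0, max_val] value range.
    """
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a "rendering" that is the ground truth plus slight noise.
rng = np.random.default_rng(0)
gt = rng.random((64, 64, 3))
pred = np.clip(gt + rng.normal(0.0, 0.01, gt.shape), 0.0, 1.0)
print(f"PSNR: {psnr(pred, gt):.1f} dB")  # small noise yields a high PSNR
```

Since the hold-out camera viewpoints are withheld, such metrics would be computed server-side against the unpublished ground-truth views.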

Download the benchmark data

The benchmark data is based on the NeRSemble dataset.
To download the benchmark data, please follow the steps in our NeRSemble benchmark GitHub repo. You will first have to request access to the NeRSemble dataset via this form. Once your application is approved (typically within 1 day), you can use the convenient download scripts in the repo to obtain the benchmark data.

Benchmark Tasks

Dynamic Novel View Synthesis

FLAME-driven Monocular Head Avatar Reconstruction

Citation

If you use the NeRSemble benchmark data or code, please cite:

@article{kirschstein2023nersemble,
    author = {Kirschstein, Tobias and Qian, Shenhan and Giebenhain, Simon and Walter, Tim and Nie\ss{}ner, Matthias},
    title = {NeRSemble: Multi-View Radiance Field Reconstruction of Human Heads},
    year = {2023},
    issue_date = {August 2023},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    volume = {42},
    number = {4},
    issn = {0730-0301},
    url = {https://doi.org/10.1145/3592455},
    doi = {10.1145/3592455},
    journal = {ACM Trans. Graph.},
    month = {jul},
    articleno = {161},
    numpages = {14},
}

License

The NeRSemble benchmark data is released under the same Terms of Use as the NeRSemble dataset, which you have to agree to before access is granted.