Novel View Synthesis on iPhone Images (Internal Testing)

The novel view synthesis (NVS) task is to render images from novel viewpoints given an RGB capture of a scene. The training images are captured with an iPhone camera (60 FPS, 1920x1440). For all scenes, ARKit poses and IMU information from the iPhone are available, along with COLMAP camera poses for some of the training images. For details, see the iphone/ section of the official ScanNet++ Documentation.

The test frames are captured exclusively with a fisheye DSLR camera in the same scene setting, and are provided as the test images for the corresponding iPhone scene.

Our iPhone scenes are captured with a consumer device (an iPhone) to bridge the gap between the ideal setting assumed by cutting-edge research and a real-world setting. As a result, they exhibit motion blur and inconsistent lighting compared to our DSLR images, which makes this iPhone novel view synthesis task more challenging than the DSLR one.

We provide an evaluation track that renders undistorted perspective images at the given DSLR poses with the iPhone intrinsics (assumed to be a pinhole model by ARKit). The training images are extracted from an RGB video, and an undistorted DSLR test image can be obtained using the given iPhone intrinsics and the ScanNet++ Toolbox.

Evaluation

The complete testing set for iPhone NVS consists of 12 scenes.

Metrics

We evaluate the similarity between the ground truth (GT) and generated RGB images.

Our evaluation metrics are:

peak signal-to-noise ratio (PSNR)
structural similarity index measure (SSIM)
learned perceptual image patch similarity (LPIPS)

For each pair of generated and ground-truth images, we compute these three metrics, and the numbers reported in the table are the average over all the images across all the scenes.
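As an illustration of the per-image computation and the final averaging, here is a minimal sketch for PSNR only. It assumes float images scaled to [0, 1] and reads "the average over all the images across all the scenes" as a pooled mean over images (not a mean of per-scene means). The function names `psnr` and `mean_over_all_images` are hypothetical and are not part of the official evaluation code.

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """PSNR in dB between two images with values in [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mean_over_all_images(per_scene_scores):
    """Pool the per-image scores of every scene and average them.

    per_scene_scores: dict mapping a scene id to a list of per-image scores.
    Assumption: every image has equal weight, so scenes with more test
    images contribute more to the reported number.
    """
    pooled = [s for scores in per_scene_scores.values() for s in scores]
    return float(np.mean(pooled))
```

The same pooling would apply to SSIM and LPIPS once their per-image values are available.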

Evaluation is done on the GT images (fisheye DSLR test images undistorted with the iPhone intrinsics) at a resolution of 1920 x 1440. Submitted images are automatically resized if their resolution differs from this. Because lighting conditions are inconsistent, a color correction (CC) step is applied to the submitted images. The color correction is estimated using POT: Python Optimal Transport. Specifically, we estimate the optimal transport operator between the empirical color distributions of the generated RGB image and the GT image. For more information, please refer to the following reference: POT.
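To give a feel for what transporting one color distribution onto another does, the sketch below uses a deliberately simplified stand-in: it treats each RGB channel independently and solves the 1D optimal transport problem, which for two equal-size sets of samples reduces to matching sorted values. This is not the official procedure, which estimates the transport between full color distributions with POT, and the function name `color_correct_per_channel` is hypothetical.

```python
import numpy as np


def color_correct_per_channel(pred, gt):
    """Simplified color correction via per-channel 1D optimal transport.

    Assumption: pred and gt are HxWxC arrays of the same shape. For each
    channel, the k-th smallest predicted value is replaced by the k-th
    smallest GT value, which is the optimal 1D transport plan between two
    empirical distributions with the same number of samples.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    out = np.empty_like(pred)
    for c in range(pred.shape[-1]):
        src = pred[..., c].ravel()
        tgt = gt[..., c].ravel()
        order = np.argsort(src, kind="stable")  # rank of each predicted pixel
        corrected = np.empty_like(src)
        corrected[order] = np.sort(tgt)  # assign GT values by rank
        out[..., c] = corrected.reshape(pred.shape[:-1])
    return out
```

With this simplification, a rendering that differs from the GT only by a monotonic per-channel change (for example a global brightness or contrast shift) is mapped back onto the GT colors, while structural errors remain.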

Evaluation excludes anonymized pixels, which are specified in resized_anon_masks and original_anon_masks.
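A minimal sketch of how such an exclusion could work for PSNR is shown below. It assumes the mask is an HxW array in which nonzero entries mark anonymized pixels; the actual mask encoding is not stated here, so that polarity is an assumption, and `masked_psnr` is a hypothetical name.

```python
import numpy as np


def masked_psnr(pred, gt, anon_mask, max_val=1.0):
    """PSNR in dB computed only over non-anonymized pixels.

    pred, gt: HxWx3 float images in [0, max_val].
    anon_mask: HxW array, nonzero where a pixel is anonymized (assumed).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = ~np.asarray(anon_mask, dtype=bool)  # pixels that count
    if not valid.any():
        return float("nan")  # nothing left to evaluate
    sq_err = (pred - gt) ** 2
    mse = sq_err[valid].mean()  # averages over valid pixels and channels
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Errors that fall entirely inside the masked region then have no effect on the score.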

Results

The full set is the complete iPhone NVS test set and contains 12 scenes.

Methods	PSNR (before/after CC)	SSIM (before/after CC)	LPIPS (before/after CC)

Please refer to the submission instructions before making a submission.
