Submission Policy

The benchmark is currently evaluated on the v1 (Nov 23, 2023) version of the dataset.

The ScanNet++ training and validation sets are provided for training models and optimizing model parameters. Benchmark results are evaluated on the hidden test sets, which are not publicly available. Users must not attempt to optimize results on the hidden test set through repeated submissions or to access test set data in any way.

A user may create up to 4 submissions for each task. Creating multiple accounts to bypass this limit is prohibited and will result in the user being banned from the submission system.

The interval between two submissions must be at least 24 hours. It may take up to 24 hours after a valid submission for the results to appear on the public leaderboard.

Before submission, users are encouraged to use the public evaluation scripts on the validation set to make sure their submissions are in the correct format. Failed submissions due to incorrect formats will count towards a user's submission quota.

The hidden test sets (and hence the submission system) must not be used for ablation studies. Ablation studies must be reported on the ScanNet++ validation set. Once the algorithm and its parameters are finalized on the validation set, the user may create a single submission through the submission system.

Submission Instructions

Novel View Synthesis on DSLR Images

A submission must include algorithm-generated images (supported formats/extensions: JPG, jpg, jpeg, png) for each filename in the "test" field of train_test_lists.json or in the "test_frames" field of nerfstudio/transforms.json, rendered with the given camera poses. All images from a single scene must be in a directory named <scene_id>, and all such directories must be in a single .zip file. The filenames must exactly match those in train_test_lists.json.

Generated images should preferably be at the resolution specified by the camera intrinsics. If they are not, they will be resized to the correct resolution before evaluation. For more details on evaluation, see the novel view synthesis page.

The NVS benchmark has two tracks: fisheye and undistorted. In the fisheye track, the original fisheye DSLR images are used as ground truth (GT) and results must be generated with the fisheye camera intrinsics; in the undistorted track, undistorted DSLR images are used as GT and results must be submitted as undistorted images.

Undistorted images can be obtained with the ScanNet++ Toolbox, which produces undistorted camera intrinsics and undistorted images while the camera poses (extrinsics) remain unchanged.
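
For reference, the undistortion can be sketched with OpenCV's fisheye module. This is a minimal illustration only, assuming the OPENCV_FISHEYE camera model; the Toolbox remains the authoritative implementation, and the intrinsics and distortion values below are placeholders:

import cv2
import numpy as np

# Placeholder fisheye intrinsics K and distortion coefficients k1..k4;
# use the values provided for the scene instead.
K = np.array([[790.0, 0.0, 880.0],
              [0.0, 790.0, 585.0],
              [0.0, 0.0, 1.0]])
D = np.array([0.05, -0.01, 0.002, -0.0005])

img = cv2.imread("DSC01752.JPG")
h, w = img.shape[:2]
# Identity rotation; reuse K as the new camera matrix for simplicity.
map1, map2 = cv2.fisheye.initUndistortRectifyMap(
    K, D, np.eye(3), K, (w, h), cv2.CV_32FC1)
undistorted = cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)
cv2.imwrite("DSC01752_undistorted.JPG", undistorted)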

The .zip file should be uploaded to the submission system. A submission could look like this (scene IDs and filenames are for illustration purposes only):


unzip_root/
    |-- 56a0ec536c
        |-- DSC01752.JPG
        |-- DSC01753.JPG
            ⋮
    |-- 8b5caf3398
        |-- DSC00299.JPG
        |-- DSC00143.JPG
            ⋮
    |-- 98b4ec142f
        ⋮
    
Important: Unzipping the submission .zip file must not create the unzip_root directory, and must create only the <scene_id> directories directly. The maximum upload size is 2GB.
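
Before zipping, it can help to check that every test frame has a rendered image. A minimal sketch, assuming the predictions and the per-scene train_test_lists.json files are laid out as below (all paths are illustrative):

import json
from pathlib import Path

pred_root = Path("nvs_predictions")   # one directory per <scene_id>
data_root = Path("scannetpp_data")    # dataset download with train_test_lists.json

for scene_dir in sorted(pred_root.iterdir()):
    lists_file = data_root / scene_dir.name / "dslr" / "train_test_lists.json"
    test_frames = json.loads(lists_file.read_text())["test"]
    # Compare by stem, since several image extensions are accepted.
    rendered = {p.stem for p in scene_dir.iterdir()}
    missing = [f for f in test_frames if Path(f).stem not in rendered]
    if missing:
        print(f"{scene_dir.name}: {len(missing)} test frames missing, e.g. {missing[0]}")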

3D Semantic Segmentation

The 3D semantic segmentation task is evaluated on the top 100 semantic classes. Semantic label predictions must be provided for each vertex of the mesh. See the 3D semantic segmentation page for more details.

A submission must contain one .txt file for each test scene, named <scene_id>.txt. The file must contain a 0-indexed semantic label prediction for each vertex of the mesh, according to the labels in top100.txt, with one label per line. A file could look like this:


7
7
36
89
⋮
55
        

All such files must be placed in a single .zip file and uploaded to the submission system. When unzipped, it must have the following structure (scene IDs are for illustration purposes only):


unzip_root/
    |-- 56a0ec536c.txt
    |-- 8b5caf3398.txt
    |-- 41b00feddb.txt
        ⋮
    |-- 98b4ec142f.txt
        

Important: Unzipping the submission .zip file must not create the unzip_root directory, and must create only the .txt files directly.
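
As a sketch, given the per-vertex predictions as an integer array, such a file can be written with numpy (array values, scene ID, and paths are illustrative):

import numpy as np

# One 0-indexed top-100 label id per mesh vertex, in vertex order.
labels = np.array([7, 7, 36, 89, 55])          # placeholder predictions
np.savetxt("56a0ec536c.txt", labels, fmt="%d")  # one label per line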

3D Instance Segmentation

The 3D instance segmentation task is evaluated on a subset of the 100 semantic classes given in top100_instance.txt. A submission must contain the list of predicted 3D instances and their RLE-encoded (run-length encoded) vertex masks for each test scene. See the 3D instance segmentation page for more details.

Results must be provided as one text file per test scan, containing one line per predicted instance. Each line gives the relative path to an RLE encoding of the instance mask in a JSON file, the predicted label id, and the confidence of the prediction.

The result text files must be named after the corresponding test scene, as <scene_id>.txt. The predicted .txt files listing the instances of each scan must be in the root of the unzipped submission, and the predicted instance RLE-mask JSON files must be in a subdirectory of the unzipped submission.

For instance, a submission could look like (scene IDs are for illustration purposes only):


unzip_root/
    |-- 56a0ec536c.txt
    |-- 8b5caf3398.txt
    |-- 41b00feddb.txt
        ⋮
    |-- 98b4ec142f.txt
    |-- predicted_masks/
        |-- 56a0ec536c_000.json
        |-- 56a0ec536c_001.json
        |-- 8b5caf3398_001.json
            ⋮
        |-- 98b4ec142f_035.json
        

Important: Unzipping the submission .zip file must not create the unzip_root directory, and must create only the .txt files and the predicted_masks directory directly.

Each prediction file for a scan must contain one line per predicted instance, with the following space-separated fields: (1) the relative path to the predicted RLE-mask file, (2) the integer class label id, (3) the float confidence score. Consequently, filenames in the prediction files must not contain spaces. For example, 56a0ec536c.txt could look like this:


predicted_masks/56a0ec536c_000.json 29 0.489
predicted_masks/56a0ec536c_001.json 92 0.965
⋮
        
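A minimal sketch of writing such a file, assuming each prediction is already a (relative mask path, label id, confidence) tuple (names and values are illustrative):

# (relative RLE-mask path, integer class label id, float confidence score)
instances = [
    ("predicted_masks/56a0ec536c_000.json", 29, 0.489),
    ("predicted_masks/56a0ec536c_001.json", 92, 0.965),
]
with open("56a0ec536c.txt", "w") as f:
    for mask_path, label_id, confidence in instances:
        f.write(f"{mask_path} {label_id} {confidence:.3f}\n")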

The predicted instance mask JSON file is an RLE encoding of the mask over the vertices of the scan mesh. See the semantic GT preparation script for an example of how to create the JSON file. The length field of the JSON contains the number of vertices in the mask, and the counts field contains the RLE encoding of the mask as pairs of start and length values: start is the 1-indexed position of the first vertex of a contiguous group of mask vertices, and length is the number of vertices in the group.

For example, predicted_masks/56a0ec536c_000.json with RLE encoding of the binary mask [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1] could look like this:

{
    "length": 12,
    "counts": "4 3 10 3"
}
        
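The pairs can be derived directly from a binary per-vertex mask. A minimal numpy sketch that reproduces the example above (the semantic GT preparation script remains the reference implementation):

import json
import numpy as np
from pathlib import Path

def mask_to_rle(mask):
    # Zero-pad so every run of 1s has a detectable start and end.
    mask = np.asarray(mask, dtype=np.uint8)
    diffs = np.diff(np.concatenate([[0], mask, [0]]))
    starts = np.flatnonzero(diffs == 1)    # 0-indexed first vertex of each run
    ends = np.flatnonzero(diffs == -1)     # 0-indexed one past the last vertex
    pairs = [str(v) for s, e in zip(starts, ends) for v in (s + 1, e - s)]
    return {"length": int(mask.size), "counts": " ".join(pairs)}

Path("predicted_masks").mkdir(exist_ok=True)
rle = mask_to_rle([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1])   # counts: "4 3 10 3"
with open("predicted_masks/56a0ec536c_000.json", "w") as f:
    json.dump(rle, f)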

Common FAQ

1. Do I have to use zip?

We support .zip, .tar.gz, and .7z.

2. How do I compress the files without the parent directory?


cd YOUR_PRED_FOLDER; zip -r ../upload.zip .; cd ..
        
or with tar and gzip:

cd YOUR_PRED_FOLDER; tar -czvf ../upload.tar.gz *; cd ..
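
Alternatively, the same flat archive can be produced with Python's zipfile module (a sketch; YOUR_PRED_FOLDER stands for your prediction directory):

import zipfile
from pathlib import Path

pred_root = Path("YOUR_PRED_FOLDER")
with zipfile.ZipFile("upload.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in pred_root.rglob("*"):
        if path.is_file():
            # Store paths relative to the folder so no parent directory is created.
            zf.write(path, arcname=path.relative_to(pred_root))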