Data

Submission policy

Training Data
    Download
    Limited Reconstructions
    Limited Annotations
    Limited Bounding Boxes

Submission format
    Format for 3D Semantic Label Prediction
    Format for 3D Semantic Instance Prediction
    Format for 3D Object Detection


Data

Download: The ScanNet Data Efficient Benchmark uses the ScanNet dataset. If you would like to download the ScanNet data, please fill out an agreement to the ScanNet Terms of Use and send it to us at the scannet group email. For more information regarding the ScanNet dataset, please see our git repo.


Submission policy

We release two configurations in this benchmark for the Semantic Segmentation, Instance Segmentation, and Object Detection tasks: Limited Scene Reconstructions (LR) and Limited Scene Annotations (LA). In LR, you may only train on a limited subset of the 1201 training scenes ({1%, 5%, 10%, 20%} of the scenes for the instance/semantic segmentation tasks, and {10%, 20%, 40%, 80%} for the object detection task). In LA, the benchmark considers four training configurations on ScanNet: {20, 50, 100, 200} labeled points per scene for semantic and instance segmentation, and {1, 2, 4, 7} bounding boxes per scene for object detection.

Parameter tuning is only allowed on the training data. Evaluating on the test data via this evaluation server must be done only once, for the final system. It is not permitted to use the server to train systems, for example by trying out different parameter values and choosing the best; only one version (the one that performed best on the training data) must be evaluated. This is to avoid overfitting on the test data. Results for different parameter settings of an algorithm can therefore only be reported on the training set. To help enforce this policy, we block updates to the test set results of a method for two weeks after a test set submission. You may split the training data into training and validation sets yourself as you wish.

It is not permitted to register on this webpage with multiple e-mail addresses. We will ban users or domains if required.


Training Data


Download

Use the ScanNet download script: python download-scannet.py --data_efficient -o output_folder

Download the limited reconstruction files (for the instance/semantic segmentation tasks here and for the object detection task here), the limited annotation files, and the limited bounding box files.

Limited Reconstructions

We provide a scene list for each configuration limiting which scenes can be used for training. The file structure is as follows: unzip_root/
 |-- 1.txt
 |-- 5.txt
 |-- 10.txt
 |-- 20.txt
For example, 20.txt lists the 20% of all training scene ids that you may use to train your model.
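For illustration, a minimal sketch of reading one of these lists, assuming each file holds one scene id per line and was unzipped to unzip_root/:

# Minimal sketch: read the scene ids allowed for training in the 20% configuration.
with open('unzip_root/20.txt') as f:
    allowed_scenes = {line.strip() for line in f if line.strip()}

# all_scene_ids is a placeholder for your full list of 1201 training scene ids.
all_scene_ids = ['scene0000_00', 'scene0000_01']
train_scenes = [s for s in all_scene_ids if s in allowed_scenes]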


Limited Annotations

We provide the lists of point indices (aligned to the .ply files) that can be used for training under each configuration of the instance segmentation and semantic segmentation tasks. The file structure is as follows: unzip_root/
 |-- points20
 |-- points50
 |-- points100
 |-- points200
For example, points100 denotes the configuration in which only 100 points per scene can be used for training. Each one is a torch-pickled file; you can load it with the following code as an example: # data efficiency by sampling points
import torch  # (phase, DatasetPhase, and PATH_FILE come from your own training code)

if phase == DatasetPhase.Train:
    sampled_inds = torch.load(PATH_FILE)  # e.g. PATH_FILE = 'unzip_root/points100'
sampled_inds will be a Python dictionary where each key is a scene id and the value is a list of indices. The indices tell you which points can be used; they are aligned to the vertices in the order provided by the *_vh_clean_2.ply mesh.
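For illustration, a hedged sketch of how these indices could be used to mask per-vertex supervision (the scene id, label array, and ignore index below are placeholders, not part of the benchmark files):

import numpy as np
import torch

IGNORE_LABEL = -100  # placeholder: whatever ignore index your loss uses

# dict: scene id -> list of vertex indices, aligned to *_vh_clean_2.ply
sampled_inds = torch.load('unzip_root/points100')

scene_id = 'scene0000_00'                 # hypothetical scene id
labels = np.zeros(81369, dtype=np.int64)  # placeholder per-vertex labels for that scene
inds = np.asarray(sampled_inds[scene_id])

# Keep supervision only at the annotated vertices; ignore all others.
masked_labels = np.full(labels.shape, IGNORE_LABEL, dtype=labels.dtype)
masked_labels[inds] = labels[inds]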


Limited Bounding Boxes

For Object Detection we provide the instance ids that can be used for training. The file structure is as follows: unzip_root/
 |-- bbox1
 |-- bbox2
 |-- bbox4
 |-- bbox7
For example, bbox7 denotes the configuration in which only 7 bounding boxes per scene (on average) can be used for training. Each one is a torch-pickled file; you can load it with the following code as an example: import torch  # (split_set and PATH_FILE come from your own training code)

if split_set == 'train':
    sampled_bbox = torch.load(PATH_FILE)  # e.g. PATH_FILE = 'unzip_root/bbox7'
sampled_bbox will be a Python dictionary where each key is a scene id and the value is a list of instance ids. The instance ids tell you which instances can be used for training; they are aligned to the '_ins_label.npy' files generated by VoteNet. Note that we assume the instance mask inside each bounding box is known for training. For the detection task, you can compute the axis-aligned bounding boxes from the instance masks, as sketched below.
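A hedged sketch of that last step (points and ins_labels are placeholders standing in for a scene's point cloud and its per-point '_ins_label.npy' instance labels):

import numpy as np
import torch

sampled_bbox = torch.load('unzip_root/bbox7')  # dict: scene id -> list of instance ids
allowed_ids = sampled_bbox['scene0000_00']     # hypothetical scene id

points = np.random.rand(50000, 3)              # placeholder: per-point xyz in world space
ins_labels = np.random.randint(0, 20, 50000)   # placeholder: per-point instance ids

# Axis-aligned bounding box of each allowed instance, from its point mask.
boxes = []
for ins_id in allowed_ids:
    mask = ins_labels == ins_id
    if mask.any():
        boxes.append((points[mask].min(axis=0), points[mask].max(axis=0)))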

Note that for some methods you may need object centers as ground truth. However, since we assume that only the provided points are annotated, object centers can only be computed from the provided list of points, and may therefore be shifted from the true object centers.


Submission format

For all tasks, you need to upload a zip or .7z file (7z is preferred due to smaller file sizes) containing 4 submissions, one per configuration. Each submission should be a single folder containing .txt files organized according to the task, as described below.


Format for 3D Semantic Label Prediction

There should be a folder for each configuration, i.e. {20, 50, 100, 200} for LA and {1, 5, 10, 20} for LR. A submission under each folder must contain a .txt prediction file for each test scan, named scene%04d_%02d.txt with the corresponding ScanNet scan name. For instance, a submission for LA looks like: unzip_root/
 |-- 20
    |-- scene0707_00.txt
    |-- scene0708_00.txt
    |-- scene0709_00.txt
         ⋮
    |-- scene0806_00.txt
 |-- 50
    |-- scene0707_00.txt
    |-- scene0708_00.txt
    |-- scene0709_00.txt
         ⋮
    |-- scene0806_00.txt
 |-- 100
         ⋮
 |-- 200
         ⋮
In each prediction file, results must be provided as class labels per vertex of the corresponding 3D scan mesh, i.e., for each vertex in the order provided by the *_vh_clean_2.ply mesh. Each prediction file should contain one line per vertex, with each line containing the integer label id of the predicted class. E.g., a prediction file could look like: 10
10
2
2
2
⋮
39
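For illustration, a hedged sketch of writing such a file (pred_labels is a placeholder for your per-vertex predicted label ids, in *_vh_clean_2.ply vertex order):

import numpy as np

pred_labels = np.zeros(81369, dtype=np.int64)  # placeholder per-vertex label ids

# One integer label id per line, in mesh vertex order.
np.savetxt('20/scene0707_00.txt', pred_labels, fmt='%d')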


Format for 3D Semantic Instance Prediction

There should be a folder for each configuration, i.e. {20, 50, 100, 200} for LA and {1, 5, 10, 20} for LR. A submission under each folder must contain a .txt prediction file for each test scan. Each text file should contain a line for each instance, giving the relative path to a binary mask of the instance, the predicted label id, and the confidence of the prediction. The result text files must be named according to the corresponding test scan, as scene%04d_%02d.txt with the corresponding ScanNet scan name. For example, a submission for the LR scenario should look like: unzip_root/
 |-- 1
    |-- scene0707_00.txt
    |-- scene0708_00.txt
    |-- scene0709_00.txt
        ⋮
    |-- scene0806_00.txt
    |-- predicted_masks/
       |-- scene0707_00_000.txt
       |-- scene0707_00_001.txt
            ⋮
 |-- 5
     ⋮
 |-- 10
     ⋮
 |-- 20
     ⋮
Each prediction file for a scan should contain a list of instances, where an instance is: (1) the relative path to the predicted mask file, (2) the integer class label id, (3) the float confidence score. Each line in the prediction file should correspond to one instance, with the three values above separated by spaces. Thus, the filenames in the prediction files must not contain spaces.
The predicted instance mask file should provide a mask over the vertices of the scan mesh, i.e., one value for each vertex in the order provided by the *_vh_clean_2.ply mesh. Each instance mask file should contain one line per vertex, with each line containing an integer value; non-zero values indicate that the vertex belongs to the instance. E.g., scene0707_00.txt should be of the format: predicted_masks/scene0707_00_000.txt 10 0.7234
predicted_masks/scene0707_00_001.txt 36 0.9038
     ⋮
and predicted_masks/scene0707_00_000.txt could look like: 0
0
0
1
1
⋮
0
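For illustration, a hedged sketch of writing one scan's instance predictions (instances is a placeholder list of (mask, label_id, score) tuples, each mask a 0/1 array over the mesh vertices):

import os
import numpy as np

scan = 'scene0707_00'
os.makedirs('1/predicted_masks', exist_ok=True)

# Placeholder predictions: (per-vertex 0/1 mask, class label id, confidence).
instances = [(np.zeros(81369, dtype=np.int64), 10, 0.7234)]

with open(f'1/{scan}.txt', 'w') as f:
    for i, (mask, label_id, score) in enumerate(instances):
        rel_path = f'predicted_masks/{scan}_{i:03d}.txt'  # no spaces in filenames
        np.savetxt(f'1/{rel_path}', mask, fmt='%d')       # one 0/1 value per vertex
        f.write(f'{rel_path} {label_id} {score:.4f}\n')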


Format for 3D Object Detection

There should be a folder for each configuration, i.e. {1, 2, 4, 7} for LA and {10, 20, 40, 80} for LR. A submission under each folder must contain a .txt prediction file for each test scan. The result text files must be named according to the corresponding test scan, as scene%04d_%02d.txt with the corresponding ScanNet scan name. For example, a 3D object detection submission for the LA scenario should look like: unzip_root/
 |-- 1
    |-- scene0707_00.txt
    |-- scene0708_00.txt
    |-- scene0709_00.txt
        ⋮
    |-- scene0806_00.txt
 |-- 2
    |-- scene0707_00.txt
    |-- scene0708_00.txt
    |-- scene0709_00.txt
        ⋮
    |-- scene0806_00.txt
 |-- 4
     ⋮
 |-- 7
     ⋮
Each prediction file for a scan should contain a list of instances, where an instance is: (1) the bounding box min_x, min_y, min_z, max_x, max_y, max_z in world space (the coordinate frame of the .ply mesh), (2) the integer class label id, (3) the float confidence score. Each line in the prediction file should correspond to one instance, with values separated by spaces. E.g., scene0707_00.txt should be of the format: 5.25 1.64 0.09 5.83 2.24 0.92 5 0.75
5.60 5.67 1.19 6.16 6.44 1.44 9 0.43
⋮
-0.00 0.07 0.08 0.34 0.49 1.31 39 0.25
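For illustration, a hedged sketch of writing one scan's detections (boxes is a placeholder list of (min_xyz, max_xyz, label_id, score) tuples in world space):

# Placeholder detections: ((min_x, min_y, min_z), (max_x, max_y, max_z), label id, score).
boxes = [((5.25, 1.64, 0.09), (5.83, 2.24, 0.92), 5, 0.75)]

with open('1/scene0707_00.txt', 'w') as f:
    for mn, mx, label_id, score in boxes:
        coords = ' '.join(f'{v:.2f}' for v in (*mn, *mx))
        f.write(f'{coords} {label_id} {score:.2f}\n')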