ScanNet200 3D Semantic Instance Results

The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
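The metric described above can be illustrated with a minimal Python sketch. It is not the official evaluation script. It assumes each prediction has already been reduced to a hypothetical triple (confidence, best-matching ground truth id, overlap), that a match requires overlap at or above the threshold, and that the range [0.5:0.95:0.05] excludes its upper endpoint. The function names are illustrative only.

```python
from typing import List, Tuple

# One prediction for a single class: (confidence, matched_gt_id, overlap).
# matched_gt_id is -1 when the prediction overlaps no ground truth instance.
Prediction = Tuple[float, int, float]


def average_precision(preds: List[Prediction], num_gt: int, overlap_thresh: float) -> float:
    """Area under the interpolated precision/recall curve at one overlap threshold."""
    if num_gt == 0:
        return float("nan")
    claimed = set()   # ground truth instances already matched by a prediction
    tp_flags = []
    for _conf, gt_id, overlap in sorted(preds, key=lambda p: -p[0]):
        if gt_id >= 0 and overlap >= overlap_thresh and gt_id not in claimed:
            claimed.add(gt_id)
            tp_flags.append(True)
        else:
            # Low overlap, or a duplicate prediction of an already matched
            # instance: both count as false positives.
            tp_flags.append(False)

    precisions, recalls = [], []
    tp = 0
    for i, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / i)
        recalls.append(tp / num_gt)

    # Make precision non-increasing in recall (standard interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_over_range(preds: List[Prediction], num_gt: int) -> float:
    """Mean AP over overlaps 0.50, 0.55, ..., 0.90 (endpoint handling is an assumption)."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(9)]
    return sum(average_precision(preds, num_gt, t) for t in thresholds) / len(thresholds)


# Two ground truth instances; the second prediction duplicates the first match.
example = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
ap50_with_duplicate = average_precision(example, num_gt=2, overlap_thresh=0.5)
ap50_without_duplicate = average_precision([example[0], example[2]], num_gt=2, overlap_thresh=0.5)
```

In this toy example the duplicate prediction lowers AP 50% relative to the same predictions without it, which is the penalty the paragraph above describes. The reported benchmark numbers additionally average the per-class values over categories.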

This table lists the benchmark results for the ScanNet200 3D semantic instance scenario.

Method	avg ap 50%	head ap 50%	common ap 50%	tail ap 50%	backpack	bag	ball	bar	basket	bathroom cabinet	bathroom stall	bathroom stall door	bathroom vanity	bathtub	bed	bench	bicycle	bin	blackboard	blanket	blinds	board	book	bookshelf	bottle	bowl	box	broom	bucket	bulletin board	cabinet	calendar	candle	cart	case of water bottles	ceiling	ceiling light	chair	clock	closet	closet rod	clothes	clothes dryer	coat rack	coffee kettle	coffee maker	coffee table	column	computer tower	copier	couch	counter	cup	curtain	cushion	decoration	desk	dining table	dish rack	dishwasher	door	doorframe	dresser	dumbbell	end table	fan	file cabinet	fire alarm	fire extinguisher	folded chair	hat	headphones	ironing board	jacket	keyboard	keyboard piano	kitchen cabinet	kitchen counter	ladder	lamp	laptop	laundry basket	laundry detergent	laundry hamper	ledge	light	light switch	machine	mailbox	mat	microwave	mini fridge	mirror	monitor	mouse	music stand	nightstand	object	office chair	oven	paper	paper bag	paper cutter	paper towel dispenser	paper towel roll	person	piano	picture	pillow	pipe	plant	plate	plunger	poster	power outlet	power strip	printer	projector	projector screen	rack	radiator	range hood	recycling bin	refrigerator	scale	seat	shelf	shoe	shower	shower curtain	shower curtain rod	shower floor	shower head	shower wall	sign	sink	soap dispenser	sofa chair	speaker	stair rail	stairs	stand	stool	storage bin	storage container	storage organizer	stove	structure	stuffed animal	suitcase	table	telephone	tissue box	toaster	toilet	toilet paper	toilet paper dispenser	towel	trash bin	trash can	tray	tube	tv	tv stand	vacuum cleaner	vent	wardrobe	washing machine	water bottle	water cooler	water pitcher	whiteboard	window	windowsill

Mask3D Scannet200	0.388 2	0.542 2	0.357 3	0.237 3	0.610 2	0.091 3	0.125 7	0.000 1	0.000 3	0.000 1	0.065 4	0.668 1	0.451 2	1.000 1	0.955 2	0.640 1	0.500 3	0.039 2	0.125 3	0.063 4	0.409 1	0.311 3	0.291 2	0.609 4	0.266 3	0.000 2	0.163 3	0.000 1	0.008 1	0.044 4	0.496 3	1.000 1	0.000 1	0.018 4	0.000 3	0.756 2	0.573 1	0.808 3	0.000 2	0.010 3	0.042 5	0.130 5	0.552 2	0.042 1	0.000 1	1.000 1	0.725 4	0.750 1	0.883 1	1.000 1	0.832 5	0.024 4	0.107 3	0.614 4	0.226 1	0.250 2	0.628 3	0.792 1	0.677 4	0.400 1	0.741 3	0.278 3	0.511 3	0.077 7	0.111 3	0.313 4	0.715 3	0.302 1	0.017 5	0.200 2	0.000 2	0.188 2	0.000 3	0.178 4	0.736 3	1.000 1	0.615 1	0.514 2	0.409 3	0.380 7	0.600 2	0.000 1	0.000 1	0.400 2	0.013 3	0.254 2	0.381 1	0.000 1	0.123 5	0.400 2	0.839 2	0.258 3	0.463 2	0.926 2	0.265 2	0.000 1	0.857 2	0.099 2	0.021 2	0.500 1	0.027 1	0.028 2	1.000 1	0.502 7	0.016 3	0.076 6	0.500 1	0.612 2	0.578 2	0.005 4	0.597 4	0.194 3	0.497 2	0.000 1	0.500 1	0.000 2	0.323 6	0.000 2	1.000 1	0.000 1	0.748 1	0.708 3	0.050 6	0.890 3	1.000 1	0.008 2	0.151 5	0.301 3	1.000 1	1.000 1	0.792 4	0.945 1	1.000 1	0.511 2	0.004 2	0.753 2	0.776 4	0.287 2	0.020 3	0.003 6	0.974 3	0.033 1	0.412 7	0.000 2	0.000 3	0.000 2	0.667 3	0.000 1	0.000 1	0.491 3	0.676 3	0.352 3	0.335 2	0.060 4	0.822 7	0.527 3	1.000 1	0.517 3	0.606 1	0.853 3	0.000 1	0.004 2	0.806 1	1.000 1	0.727 1	0.000 1	0.042 3	0.739 2	0.000 1	0.399 4	0.391 2	0.504 2	0.591 2	0.571 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
DINO3D-Scannet200	0.454 1	0.587 1	0.453 1	0.296 1	0.851 1	0.200 1	0.500 1	0.000 1	0.042 2	0.000 1	0.378 1	0.545 3	0.729 1	1.000 1	0.981 1	0.355 7	1.000 1	0.046 1	0.000 4	0.248 1	0.000 2	0.494 1	0.381 1	0.586 6	0.496 2	0.250 1	0.409 1	0.000 1	0.000 2	0.714 1	0.572 1	1.000 1	0.000 1	0.250 2	0.050 2	0.793 1	0.436 2	0.871 1	0.000 2	0.216 1	0.284 1	0.290 1	0.083 4	0.000 2	0.000 1	0.764 2	0.716 6	0.500 2	0.842 2	1.000 1	0.891 2	0.096 1	0.361 1	0.690 2	0.000 3	0.595 1	0.753 1	0.708 3	0.750 1	0.400 1	0.845 1	0.475 1	0.728 1	0.750 1	0.214 1	0.683 1	0.743 2	0.000 2	0.400 3	0.200 2	0.500 1	0.944 1	0.125 2	0.327 1	0.823 1	0.792 3	0.602 2	0.662 1	0.777 1	0.803 1	0.675 1	0.000 1	0.000 1	0.200 4	0.298 1	0.324 1	0.000 4	0.000 1	0.000 6	0.800 1	0.824 3	0.750 1	0.507 1	0.937 1	0.000 4	0.000 1	0.779 4	0.116 1	0.001 4	0.417 3	0.000 2	0.014 3	1.000 1	0.816 2	0.548 1	0.600 1	0.500 1	0.771 1	0.773 1	0.117 1	0.944 1	0.764 1	0.571 1	0.000 1	0.250 3	0.000 2	1.000 1	0.063 1	1.000 1	0.000 1	0.720 2	0.974 1	0.079 5	0.918 2	0.000 4	0.000 3	0.312 2	0.616 1	0.125 2	1.000 1	1.000 1	0.857 2	0.000 2	0.594 1	0.000 3	0.767 1	0.845 1	0.264 3	0.419 1	0.177 3	0.667 5	0.000 3	0.677 1	0.000 2	0.194 1	0.000 2	0.857 1	0.000 1	0.000 1	0.563 2	0.703 1	0.835 2	0.850 1	0.346 1	0.944 5	0.499 4	0.866 2	0.777 1	0.221 5	0.911 1	0.000 1	0.011 1	0.721 2	0.764 6	0.520 4	0.000 1	0.442 1	0.405 5	0.000 1	0.667 1	0.655 1	0.473 3	0.614 1	0.437 2
Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing and Lei Zhang: SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features. AAAI 2026
Minkowski 34D Inst.	0.203 7	0.369 6	0.134 7	0.078 7	0.479 6	0.003 6	0.500 1	0.000 1	0.000 3	0.000 1	0.100 3	0.371 5	0.300 4	0.667 5	0.746 4	0.400 4	0.000 4	0.000 4	0.000 4	0.031 5	0.000 2	0.074 5	0.165 5	0.413 7	0.000 6	0.000 2	0.070 6	0.000 1	0.000 2	0.000 5	0.221 7	0.000 4	0.000 1	0.000 5	0.000 3	0.372 7	0.070 3	0.706 5	0.000 2	0.000 4	0.000 7	0.123 6	0.033 7	0.000 2	0.000 1	0.422 6	0.732 3	0.000 6	0.778 7	1.000 1	0.845 4	0.000 5	0.090 6	0.636 3	0.000 3	0.000 5	0.158 6	0.000 5	0.250 7	0.050 6	0.693 5	0.123 6	0.051 7	0.385 5	0.009 6	0.118 7	0.406 7	0.000 2	0.000 6	0.200 2	0.000 2	0.000 4	0.000 3	0.133 6	0.307 7	0.500 4	0.251 6	0.000 6	0.281 5	0.402 5	0.317 4	0.000 1	0.000 1	0.000 5	0.000 4	0.060 6	0.000 4	0.000 1	0.396 3	0.200 4	0.669 4	0.021 6	0.218 6	0.720 7	0.000 4	0.000 1	0.696 5	0.025 6	0.000 5	0.000 5	0.000 2	0.000 5	0.125 7	0.596 4	0.000 4	0.191 3	0.500 1	0.595 3	0.369 6	0.000 5	0.500 6	0.000 4	0.143 7	0.000 1	0.000 5	0.000 2	0.226 7	0.000 2	0.000 4	0.000 1	0.701 3	0.511 6	0.000 7	0.851 5	0.000 4	0.000 3	0.150 6	0.052 7	0.100 4	0.981 5	0.500 5	0.286 5	0.000 2	0.000 7	0.000 3	0.545 6	0.522 7	0.250 4	0.000 4	0.000 7	0.522 7	0.000 3	0.500 4	0.000 2	0.000 3	0.000 2	0.282 7	0.000 1	0.000 1	0.178 7	0.382 6	0.018 7	0.056 6	0.000 5	0.997 3	0.107 7	0.677 3	0.313 6	0.000 6	0.726 7	0.000 1	0.000 3	0.583 5	0.903 5	0.200 7	0.000 1	0.000 4	0.333 6	0.000 1	0.442 3	0.083 6	0.109 7	0.387 6	0.000 7
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
ODIN - Ins200	0.381 3	0.507 3	0.375 2	0.237 2	0.484 5	0.108 2	0.500 1	0.000 1	0.125 1	0.000 1	0.058 5	0.647 2	0.385 3	0.667 5	0.853 3	0.542 3	1.000 1	0.000 4	1.000 1	0.093 3	0.000 2	0.028 6	0.274 3	0.682 2	0.550 1	0.000 2	0.269 2	0.000 1	0.000 2	0.714 1	0.566 2	1.000 1	0.000 1	0.500 1	0.125 1	0.585 4	0.066 4	0.653 7	0.083 1	0.049 2	0.264 2	0.227 2	0.667 1	0.000 2	0.000 1	0.278 7	0.723 5	0.250 4	0.786 6	1.000 1	0.744 7	0.039 3	0.209 2	0.494 6	0.000 3	0.250 2	0.446 4	0.500 4	0.750 1	0.200 4	0.780 2	0.333 2	0.602 2	0.469 4	0.163 2	0.406 3	0.530 5	0.000 2	0.668 1	0.200 2	0.000 2	0.000 4	0.500 1	0.313 2	0.769 2	1.000 1	0.511 3	0.196 3	0.286 4	0.393 6	0.337 3	0.000 1	0.000 1	0.600 1	0.000 4	0.174 4	0.226 2	0.000 1	0.579 2	0.200 4	0.887 1	0.750 1	0.428 3	0.782 4	0.438 1	0.000 1	0.795 3	0.063 4	0.003 3	0.500 1	0.000 2	0.333 1	1.000 1	0.742 3	0.083 2	0.585 2	0.417 5	0.448 7	0.496 3	0.055 3	0.734 2	0.472 2	0.174 6	0.000 1	0.250 3	0.000 2	0.688 2	0.000 2	1.000 1	0.000 1	0.631 4	0.667 4	0.275 1	0.694 7	1.000 1	0.000 3	0.328 1	0.422 2	0.000 6	1.000 1	0.500 5	0.638 4	0.000 2	0.391 4	0.000 3	0.582 4	0.800 2	0.208 6	0.000 4	0.246 2	0.667 5	0.000 3	0.638 2	0.167 1	0.000 3	0.000 2	0.778 2	0.000 1	0.000 1	0.563 1	0.614 4	0.841 1	0.333 3	0.250 3	0.938 6	0.569 1	0.500 5	0.695 2	0.264 4	0.863 2	0.000 1	0.000 3	0.550 6	1.000 1	0.668 2	0.000 1	0.000 4	0.667 3	0.000 1	0.333 5	0.333 3	0.665 1	0.434 4	0.264 3

CSC-Pretrain Inst.	0.209 6	0.361 7	0.157 6	0.085 6	0.506 3	0.007 5	0.500 1	0.000 1	0.000 3	0.000 1	0.000 7	0.093 7	0.221 6	0.667 5	0.524 7	0.400 4	0.000 4	0.000 4	0.000 4	0.004 6	0.000 2	0.000 7	0.109 7	0.589 5	0.000 6	0.000 2	0.059 7	0.000 1	0.000 2	0.000 5	0.322 4	0.000 4	0.000 1	0.000 5	0.000 3	0.405 5	0.055 6	0.700 6	0.000 2	0.000 4	0.028 6	0.091 7	0.083 4	0.000 2	0.000 1	0.667 3	0.768 2	0.000 6	0.807 4	1.000 1	0.776 6	0.000 5	0.000 7	0.340 7	0.000 3	0.000 5	0.103 7	0.000 5	0.750 1	0.200 4	0.634 7	0.053 7	0.246 5	0.677 3	0.006 7	0.198 5	0.432 6	0.000 2	0.000 6	0.050 6	0.000 2	0.000 4	0.000 3	0.111 7	0.356 6	0.500 4	0.188 7	0.000 6	0.220 6	0.448 3	0.050 7	0.000 1	0.000 1	0.000 5	0.000 4	0.032 7	0.000 4	0.000 1	0.396 3	0.000 6	0.573 6	0.000 7	0.228 5	0.747 6	0.000 4	0.000 1	0.573 7	0.021 7	0.000 5	0.000 5	0.000 2	0.000 5	0.500 6	0.573 5	0.000 4	0.000 7	0.125 6	0.592 4	0.364 7	0.000 5	0.450 7	0.000 4	0.364 3	0.000 1	0.000 5	0.000 2	0.340 5	0.000 2	0.000 4	0.000 1	0.610 5	0.833 2	0.221 2	0.702 6	0.000 4	0.000 3	0.135 7	0.094 6	0.125 2	0.571 6	0.500 5	0.143 7	0.000 2	0.125 5	0.000 3	0.618 3	0.667 6	0.115 7	0.000 4	0.125 4	1.000 1	0.000 3	0.500 4	0.000 2	0.000 3	0.000 2	0.502 6	0.000 1	0.000 1	0.312 6	0.248 7	0.050 6	0.000 7	0.000 5	0.997 3	0.420 5	0.500 5	0.149 7	0.451 2	0.748 4	0.000 1	0.000 3	0.636 4	0.667 7	0.600 3	0.000 1	0.000 4	0.278 7	0.000 1	0.333 5	0.000 7	0.294 4	0.381 7	0.110 5
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.	0.246 5	0.413 5	0.170 5	0.130 5	0.455 7	0.003 7	0.500 1	0.000 1	0.000 3	0.000 1	0.017 6	0.333 6	0.111 7	1.000 1	0.681 6	0.400 4	0.000 4	0.000 4	1.000 1	0.003 7	0.000 2	0.167 4	0.190 4	0.637 3	0.067 5	0.000 2	0.081 5	0.000 1	0.000 2	0.000 5	0.264 6	0.000 4	0.000 1	0.000 5	0.000 3	0.387 6	0.031 7	0.754 4	0.000 2	0.000 4	0.151 4	0.135 4	0.056 6	0.000 2	0.000 1	0.582 5	0.589 7	0.500 2	0.815 3	1.000 1	0.903 1	0.000 5	0.097 4	0.588 5	0.000 3	0.000 5	0.234 5	0.000 5	0.500 5	0.400 1	0.682 6	0.156 5	0.159 6	0.750 1	0.046 5	0.125 6	0.660 4	0.000 2	0.200 4	0.000 7	0.000 2	0.000 4	0.000 3	0.164 5	0.402 5	0.500 4	0.373 5	0.025 5	0.143 7	0.426 4	0.317 4	0.000 1	0.000 1	0.000 5	0.000 4	0.063 5	0.000 4	0.000 1	0.000 6	0.000 6	0.575 5	0.250 4	0.241 4	0.772 5	0.000 4	0.000 1	0.653 6	0.034 5	0.000 5	0.000 5	0.000 2	0.000 5	1.000 1	0.561 6	0.000 4	0.100 4	0.500 1	0.541 5	0.452 5	0.000 5	0.581 5	0.000 4	0.364 3	0.000 1	0.000 5	0.000 2	0.571 4	0.000 2	0.000 4	0.000 1	0.568 6	0.511 6	0.167 4	0.857 4	0.000 4	0.000 3	0.164 4	0.112 5	0.000 6	0.530 7	1.000 1	0.286 5	0.000 2	0.125 5	0.000 3	0.464 7	0.706 5	0.208 5	0.000 4	0.125 4	0.744 4	0.000 3	0.500 4	0.000 2	0.000 3	0.000 2	0.511 5	0.000 1	0.000 1	0.344 4	0.541 5	0.068 5	0.333 3	0.000 5	1.000 1	0.196 6	0.533 4	0.318 5	0.000 6	0.748 5	0.000 1	0.000 3	0.690 3	1.000 1	0.400 5	0.000 1	0.000 4	0.667 3	0.000 1	0.333 5	0.333 3	0.270 5	0.399 5	0.083 6
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild.
TD3D Scannet200	0.320 4	0.501 4	0.264 4	0.164 4	0.506 4	0.062 4	0.500 1	0.000 1	0.000 3	0.000 1	0.208 2	0.431 4	0.252 5	1.000 1	0.733 5	0.587 2	0.000 4	0.008 3	0.000 4	0.106 2	0.000 2	0.356 2	0.123 6	0.686 1	0.101 4	0.000 2	0.152 4	0.000 1	0.000 2	0.226 3	0.280 5	0.000 4	0.000 1	0.250 2	0.000 3	0.619 3	0.061 5	0.841 2	0.000 2	0.000 4	0.167 3	0.194 3	0.333 3	0.000 2	0.000 1	0.667 3	0.820 1	0.250 4	0.790 5	1.000 1	0.879 3	0.077 2	0.094 5	0.708 1	0.217 2	0.049 4	0.634 2	0.792 1	0.331 6	0.033 7	0.716 4	0.159 4	0.396 4	0.331 6	0.099 4	0.415 2	0.842 1	0.000 2	0.458 2	0.542 1	0.000 2	0.101 3	0.000 3	0.218 3	0.513 4	0.500 4	0.458 4	0.104 4	0.516 2	0.456 2	0.268 6	0.000 1	0.000 1	0.400 2	0.022 2	0.233 3	0.143 3	0.000 1	0.677 1	0.400 2	0.504 7	0.095 5	0.083 7	0.890 3	0.061 3	0.000 1	0.906 1	0.076 3	0.231 1	0.125 4	0.000 2	0.003 4	0.792 5	0.881 1	0.000 4	0.098 5	0.125 6	0.498 6	0.459 4	0.063 2	0.715 3	0.000 4	0.241 5	0.000 1	0.396 2	0.063 1	0.605 3	0.000 2	0.000 4	0.000 1	0.448 7	0.629 5	0.202 3	0.967 1	0.250 3	0.038 1	0.192 3	0.185 4	0.083 5	1.000 1	1.000 1	0.857 2	0.000 2	0.470 3	0.012 1	0.565 5	0.798 3	0.621 1	0.111 2	0.500 1	1.000 1	0.017 2	0.509 3	0.000 2	0.008 2	1.000 1	0.525 4	0.000 1	0.000 1	0.332 5	0.679 2	0.264 4	0.333 3	0.267 2	1.000 1	0.549 2	0.299 7	0.387 4	0.328 3	0.744 6	0.000 1	0.000 3	0.435 7	1.000 1	0.283 6	0.000 1	0.196 2	0.817 1	0.000 1	0.472 2	0.222 5	0.123 6	0.560 3	0.156 4
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
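
For readers who want to process the table programmatically, the following is a small sketch of one way to parse a result row. It assumes the cells are tab-separated as in the listing above, and that each cell holds an AP 50% score followed by the method's rank for that column; the rank reading is inferred from the ordering of the scores, not stated on the page. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_header(line: str) -> List[str]:
    """Return the column names after the leading 'Method' column."""
    return [name.strip() for name in line.rstrip("\n").split("\t")][1:]


def parse_result_row(line: str) -> Tuple[str, List[Tuple[float, int]]]:
    """Split a row into the method name and (score, rank) pairs."""
    fields = line.rstrip("\n").split("\t")
    method = fields[0].strip()
    cells = []
    for cell in fields[1:]:
        score_text, rank_text = cell.split()
        cells.append((float(score_text), int(rank_text)))
    return method, cells


# First four columns of the table above, copied from the Mask3D row.
header = "Method\tavg ap 50%\thead ap 50%\tcommon ap 50%\ttail ap 50%"
row = "Mask3D Scannet200\t0.388 2\t0.542 2\t0.357 3\t0.237 3"

columns = parse_header(header)
method, cells = parse_result_row(row)
by_column: Dict[str, Tuple[float, int]] = dict(zip(columns, cells))
```

Citation lines and blank lines between rows do not follow this cell layout, so a caller would need to skip them before parsing.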
