ScanNet200 3D Semantic Instance Results

The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

As in the ScanNet benchmark, the ScanNet200 evaluation ranks all methods by the average precision for each class. We report the mean average precision (AP) at overlap 0.25 (AP 25%), at overlap 0.5 (AP 50%), and averaged over overlaps in the range [0.5:0.95:0.05] (AP) for all 200 categories. Note that multiple predictions of the same ground truth instance are penalized as false positives.
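
The metric described above can be illustrated with a small Python sketch. It is not the official evaluation script: it assumes each prediction has already been paired with the single ground truth instance it overlaps most, and the function names and the (confidence, gt_id, overlap) tuple layout are hypothetical. It shows how a duplicate prediction of an already matched instance counts as a false positive, and how AP is averaged over the overlap thresholds 0.5 to 0.95 in steps of 0.05.

```python
from typing import List, Tuple

# Hypothetical layout: (confidence, id of best-overlapping ground truth, overlap/IoU)
Prediction = Tuple[float, int, float]


def average_precision(preds: List[Prediction], num_gt: int, overlap_th: float) -> float:
    """AP for one class at one overlap threshold (simplified sketch)."""
    if num_gt == 0:
        return float("nan")  # class absent from the ground truth

    matched = set()  # ground truth ids already claimed by a prediction
    precisions, recalls = [], []
    true_positives = 0

    # Visit predictions from most to least confident.
    ordered = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, gt_id, overlap) in enumerate(ordered, start=1):
        if overlap > overlap_th and gt_id not in matched:
            matched.add(gt_id)
            true_positives += 1
        # Otherwise the prediction is a false positive; this includes
        # duplicates of an instance that is already matched.
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)

    # Make precision non-increasing from left to right (interpolated precision).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def ap_over_thresholds(preds: List[Prediction], num_gt: int) -> float:
    """Mean of AP over overlaps 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, num_gt, t) for t in thresholds) / len(thresholds)


# Two ground truth instances; the second prediction duplicates instance 0.
with_duplicate = [(0.9, 0, 0.8), (0.8, 0, 0.7), (0.7, 1, 0.6)]
without_duplicate = [(0.9, 0, 0.8), (0.7, 1, 0.6)]
ap_dup = average_precision(with_duplicate, num_gt=2, overlap_th=0.5)
ap_clean = average_precision(without_duplicate, num_gt=2, overlap_th=0.5)
```

In this toy case the duplicate lowers AP at overlap 0.5 from 1.0 to about 0.83. A per-benchmark score would then be the mean of the per-class values over all categories.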

This table lists the benchmark results for the ScanNet200 3D semantic instance scenario. Each cell gives a method's score followed by its rank in that column.

Method	avg ap 25%	head ap 25%	common ap 25%	tail ap 25%	chair	table	door	couch	cabinet	shelf	desk	office chair	bed	pillow	sink	picture	window	toilet	bookshelf	monitor	curtain	book	coffee table	box	refrigerator	lamp	kitchen cabinet	towel	clothes	tv	nightstand	counter	dresser	stool	cushion	plant	ceiling	bathtub	end table	dining table	keyboard	bag	backpack	toilet paper	printer	tv stand	whiteboard	blanket	shower curtain	trash can	closet	stairs	microwave	stove	shoe	computer tower	bottle	bin	bench	board	washing machine	mirror	copier	basket	sofa chair	file cabinet	fan	laptop	shower	paper	person	paper towel dispenser	oven	blinds	rack	plate	blackboard	piano	suitcase	radiator	recycling bin	wardrobe	soap dispenser	telephone	bucket	clock	stand	light	laundry basket	pipe	clothes dryer	seat	speaker	column	bicycle	ladder	bathroom stall	shower wall	cup	jacket	storage bin	coffee maker	dishwasher	paper towel roll	machine	mat	windowsill	bar	toaster	bulletin board	ironing board	kitchen counter	doorframe	toilet paper dispenser	mini fridge	fire extinguisher	ball	hat	shower curtain rod	water cooler	paper cutter	tray	ledge	mouse	cart	storage container	scale	tissue box	light switch	power outlet	decoration	sign	projector	vacuum cleaner	candle	plunger	stuffed animal	headphones	dish rack	broom	range hood	water bottle	vent	shower floor	water pitcher	mailbox	bowl	paper bag	music stand	projector screen	laundry detergent	object	bathroom vanity	laundry hamper	bathroom stall door	ceiling light	trash bin	dumbbell	stair rail	tube	bathroom cabinet	closet rod	coffee kettle	structure	shower head	keyboard piano	case of water bottles	coat rack	storage organizer	folded chair	fire alarm	power strip	calendar	poster

TD3D Scannet200	0.379 2	0.603 2	0.306 2	0.190 2	0.885 1	0.755 1	0.800 2	0.958 1	0.390 2	0.260 2	0.866 2	0.232 1	0.979 2	0.523 3	0.869 3	0.559 5	0.689 2	1.000 1	0.795 1	0.905 2	0.748 1	0.173 5	0.825 1	0.173 2	0.970 1	0.457 1	0.615 2	0.456 2	0.200 1	0.621 4	0.906 2	0.553 1	0.517 1	0.510 1	0.220 2	0.715 1	0.706 2	1.000 1	0.113 2	0.792 1	0.717 2	0.073 2	0.635 2	0.557 1	0.638 1	1.000 1	0.205 5	0.146 3	1.000 1	0.769 5	0.186 2	1.000 1	0.710 5	0.778 1	0.415 1	0.834 4	0.226 2	0.021 2	0.590 2	0.356 2	0.817 1	0.477 5	1.000 1	0.000 1	0.635 1	0.843 2	0.427 1	0.270 4	0.125 2	0.000 2	0.102 3	1.000 1	0.125 2	0.000 2	0.000 1	0.000 2	0.000 3	0.125 4	0.370 3	0.622 5	0.221 1	0.196 2	0.836 1	0.288 2	0.000 2	0.093 2	0.020 2	0.294 2	0.000 1	0.075 2	0.667 1	0.038 1	0.111 1	0.250 4	0.000 4	0.526 2	0.495 3	0.908 1	0.111 3	0.259 1	0.003 3	0.667 2	0.045 5	0.000 2	0.000 1	0.400 1	0.274 3	0.000 1	0.274 2	0.226 2	0.000 1	0.520 2	0.302 5	0.731 2	0.103 3	0.458 1	0.500 1	0.000 1	1.000 1	0.472 1	0.792 3	0.000 1	0.088 2	0.061 2	0.250 1	0.009 2	0.250 2	0.333 3	0.181 2	0.396 2	0.051 2	0.012 1	0.000 1	0.458 4	0.000 1	0.424 5	0.000 1	0.101 2	0.390 5	0.000 1	0.833 2	0.000 1	0.000 1	0.857 2	0.222 3	1.000 1	0.000 1	0.003 2	0.000 1	0.000 2	0.000 1	0.102 2	0.275 5	0.400 2	0.735 2	0.061 3	0.433 3	0.533 3	0.625 1	0.000 2	0.000 1	0.259 4	0.000 1	0.000 1	0.000 2	0.500 2	0.000 1	0.000 2	1.000 1	0.600 1	0.000 2	0.250 1	0.000 2	0.000 1
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
CSC-Pretrain Inst.	0.275 5	0.466 5	0.218 4	0.110 5	0.783 5	0.383 5	0.783 4	0.829 5	0.367 4	0.168 5	0.305 5	0.000 3	0.661 5	0.413 5	0.869 2	0.719 1	0.546 5	0.997 3	0.685 4	0.841 4	0.555 5	0.277 4	0.768 2	0.132 3	0.779 5	0.448 3	0.364 5	0.212 5	0.161 4	0.768 2	0.692 5	0.000 4	0.395 3	0.500 2	0.000 3	0.450 5	0.591 3	1.000 1	0.020 5	0.000 3	0.423 5	0.007 5	0.625 3	0.420 3	0.505 3	1.000 1	0.353 2	0.119 5	0.571 4	0.819 2	0.014 3	1.000 1	0.774 2	0.689 4	0.311 5	0.866 2	0.067 3	0.000 3	0.400 3	0.000 5	0.278 5	0.501 3	1.000 1	0.000 1	0.162 5	0.584 5	0.286 3	0.206 5	0.125 2	0.000 2	0.084 4	0.649 2	0.000 3	0.000 2	0.000 1	0.000 2	0.000 3	0.125 4	0.312 4	0.727 3	0.221 2	0.000 4	0.667 4	0.114 3	0.000 2	0.000 3	0.000 3	0.065 5	0.000 1	0.004 4	0.278 3	0.000 3	0.000 3	0.500 2	0.000 4	0.571 1	0.000 5	0.250 4	0.019 5	0.145 5	0.000 4	0.667 2	0.200 4	0.000 2	0.000 1	0.200 3	0.258 4	0.000 1	0.000 4	0.000 4	0.000 1	0.369 4	0.429 3	0.613 4	0.000 5	0.000 4	0.500 1	0.000 1	0.500 5	0.333 5	0.500 4	0.000 1	0.106 1	0.000 3	0.000 4	0.000 3	0.000 3	0.333 3	0.000 3	0.000 3	0.000 3	0.000 3	0.000 1	0.918 1	0.000 1	0.638 1	0.000 1	0.000 3	0.750 1	0.000 1	0.833 2	0.000 1	0.000 1	0.143 5	0.000 5	0.750 3	0.000 1	0.000 3	0.000 1	0.000 2	0.000 1	0.063 4	0.377 4	0.200 3	0.222 5	0.055 4	0.500 2	0.677 2	0.250 4	0.000 2	0.000 1	0.500 2	0.000 1	0.000 1	0.000 2	0.500 2	0.000 1	0.000 2	0.000 2	0.115 5	0.000 2	0.000 2	0.000 2	0.000 1
Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. CVPR 2021
LGround Inst.	0.314 3	0.529 3	0.225 3	0.155 3	0.810 3	0.625 3	0.798 3	0.940 2	0.372 3	0.217 3	0.484 3	0.000 3	0.927 3	0.528 2	0.826 5	0.694 2	0.605 3	1.000 1	0.731 2	0.846 3	0.716 3	0.350 2	0.589 5	0.123 4	0.857 4	0.457 2	0.578 3	0.376 4	0.183 2	0.765 3	0.800 3	0.000 4	0.278 4	0.500 2	0.000 3	0.659 2	0.569 4	1.000 1	0.093 3	0.000 3	0.539 3	0.010 3	0.578 5	0.378 4	0.571 2	1.000 1	0.337 3	0.252 1	0.530 5	0.814 3	0.000 4	0.744 5	0.743 3	0.746 3	0.346 3	0.863 3	0.067 3	0.000 3	0.400 3	0.167 3	0.667 3	0.488 4	1.000 1	0.000 1	0.208 4	0.783 3	0.166 4	0.375 2	0.071 5	0.000 2	0.200 1	0.607 4	0.000 3	0.000 2	0.000 1	0.000 2	1.000 1	0.500 1	0.517 1	0.716 4	0.221 2	0.000 4	0.706 3	0.085 5	0.000 2	0.000 3	0.000 3	0.077 4	0.000 1	0.063 3	0.278 3	0.000 3	0.000 3	0.500 2	0.083 3	0.181 5	0.515 2	0.286 3	0.144 1	0.219 2	0.042 1	0.582 4	0.400 3	0.000 2	0.000 1	0.000 5	0.305 2	0.000 1	0.000 4	0.036 3	0.000 1	0.413 3	0.500 2	0.533 5	0.250 2	0.200 2	0.500 1	0.000 1	1.000 1	0.472 1	1.000 1	0.000 1	0.000 4	0.000 3	0.250 1	0.000 3	0.000 3	0.333 3	0.000 3	0.000 3	0.000 3	0.000 3	0.000 1	0.600 3	0.000 1	0.594 2	0.000 1	0.000 3	0.500 3	0.000 1	0.647 5	0.000 1	0.000 1	0.429 3	0.333 2	0.500 5	0.000 1	0.000 3	0.000 1	0.000 2	0.000 1	0.069 3	0.696 1	0.050 5	0.556 3	0.031 5	0.042 5	0.750 1	0.250 4	0.000 2	0.000 1	0.630 1	0.000 1	0.000 1	0.000 2	0.500 2	0.000 1	0.000 2	0.000 2	0.400 2	0.000 2	0.000 2	0.000 2	0.000 1
David Rozenberszki, Or Litany, Angela Dai: Language-Grounded Indoor 3D Semantic Segmentation in the Wild. ECCV 2022
Mask3D Scannet200	0.445 1	0.653 1	0.392 1	0.254 1	0.844 2	0.746 2	0.818 1	0.888 4	0.556 1	0.262 1	0.890 1	0.025 2	1.000 1	0.608 1	0.930 1	0.694 3	0.721 1	0.930 5	0.686 3	0.966 1	0.615 4	0.440 1	0.725 4	0.201 1	0.890 3	0.414 4	0.827 1	0.552 1	0.158 5	0.806 1	0.924 1	0.042 3	0.512 2	0.412 5	0.226 1	0.604 3	0.830 1	1.000 1	0.125 1	0.792 1	0.815 1	0.097 1	0.648 1	0.551 2	0.354 4	1.000 1	0.630 1	0.241 2	1.000 1	0.853 1	0.204 1	0.974 4	0.841 1	0.778 1	0.358 2	0.927 1	0.300 1	0.045 1	0.640 1	0.363 1	0.745 2	0.710 1	1.000 1	0.000 1	0.330 2	0.943 1	0.315 2	0.600 1	1.000 1	0.027 1	0.080 5	0.556 5	0.500 1	0.409 1	0.000 1	0.194 1	1.000 1	0.500 1	0.493 2	0.761 2	0.053 4	0.042 3	0.780 2	0.454 1	0.009 1	0.333 1	0.050 1	0.321 1	0.000 1	0.084 1	0.552 2	0.008 2	0.027 2	0.750 1	0.500 1	0.442 3	0.657 1	0.765 2	0.120 2	0.183 3	0.021 2	1.000 1	0.510 2	0.016 1	0.000 1	0.400 1	0.619 1	0.000 1	0.396 1	0.290 1	0.000 1	0.741 1	0.699 1	1.000 1	0.260 1	0.017 3	0.125 5	0.000 1	0.792 4	0.399 4	1.000 1	0.000 1	0.049 3	0.265 1	0.063 3	0.000 3	1.000 1	0.335 2	0.381 1	0.500 1	0.250 1	0.004 2	0.000 1	0.727 2	0.000 1	0.538 3	0.000 1	0.188 1	0.677 2	0.000 1	0.930 1	0.000 1	0.000 1	0.966 1	0.391 1	0.908 2	0.000 1	0.028 1	0.000 1	1.000 1	0.000 1	0.152 1	0.451 2	0.458 1	0.971 1	0.573 1	0.606 1	0.167 5	0.625 1	0.004 1	0.000 1	0.058 5	0.000 1	0.000 1	1.000 1	1.000 1	0.000 1	0.056 1	0.000 2	0.200 3	0.309 1	0.000 2	1.000 1	0.000 1
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
Minkowski 34D Inst.	0.280 4	0.488 4	0.192 5	0.124 4	0.804 4	0.518 4	0.772 5	0.904 3	0.337 5	0.191 4	0.443 4	0.000 3	0.861 4	0.502 4	0.868 4	0.669 4	0.587 4	0.997 3	0.467 5	0.828 5	0.732 2	0.342 3	0.745 3	0.119 5	0.918 2	0.404 5	0.419 4	0.398 3	0.172 3	0.618 5	0.743 4	0.167 2	0.077 5	0.500 2	0.000 3	0.568 4	0.506 5	1.000 1	0.044 4	0.000 3	0.502 4	0.010 4	0.593 4	0.284 5	0.305 5	0.903 5	0.213 4	0.142 4	0.981 3	0.790 4	0.000 4	1.000 1	0.715 4	0.538 5	0.346 4	0.830 5	0.067 3	0.000 3	0.400 3	0.074 4	0.333 4	0.551 2	1.000 1	0.000 1	0.292 3	0.777 4	0.118 5	0.317 3	0.100 4	0.000 2	0.191 2	0.648 3	0.000 3	0.000 2	0.000 1	0.000 2	0.000 3	0.500 1	0.213 5	0.825 1	0.021 5	0.333 1	0.648 5	0.098 4	0.000 2	0.000 3	0.000 3	0.077 3	0.000 1	0.000 5	0.150 5	0.000 3	0.000 3	0.000 5	0.225 2	0.281 4	0.447 4	0.000 5	0.090 4	0.148 4	0.000 4	0.479 5	0.542 1	0.000 2	0.000 1	0.200 3	0.131 5	0.000 1	0.250 3	0.000 4	0.000 1	0.159 5	0.396 4	0.677 3	0.021 4	0.000 4	0.500 1	0.000 1	1.000 1	0.442 3	0.125 5	0.000 1	0.000 4	0.000 3	0.000 4	0.333 1	0.000 3	0.528 1	0.000 3	0.000 3	0.000 3	0.000 3	0.000 1	0.200 5	0.000 1	0.516 4	0.000 1	0.000 3	0.500 3	0.000 1	0.833 2	0.000 1	0.000 1	0.286 4	0.083 4	0.750 3	0.000 1	0.000 3	0.000 1	0.000 2	0.000 1	0.059 5	0.445 3	0.200 3	0.535 4	0.070 2	0.167 4	0.385 4	0.375 3	0.000 2	0.000 1	0.333 3	0.000 1	0.000 1	0.000 2	0.500 2	0.000 1	0.000 2	0.000 2	0.200 3	0.000 2	0.000 2	0.000 2	0.000 1
C. Choy, J. Gwak, S. Savarese: 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. CVPR 2019
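
For readers who want to work with these numbers programmatically, the following Python sketch parses one result row. It assumes the layout seen in this listing: columns are separated by tabs, the first column is the method name, and every other cell holds a score and a rank separated by a space. The function name `parse_result_row` is hypothetical, and the sample strings are a shortened excerpt of the header and the Mask3D row.

```python
from typing import Dict, Tuple


def parse_result_row(header_line: str, row_line: str) -> Tuple[str, Dict[str, Tuple[float, int]]]:
    """Split a tab-separated result row into (method, {column: (score, rank)})."""
    columns = header_line.rstrip("\n").split("\t")
    cells = row_line.rstrip("\n").split("\t")

    method = cells[0]
    results = {}
    # Skip the leading "Method" column; pair each remaining column with its cell.
    for name, cell in zip(columns[1:], cells[1:]):
        score_text, rank_text = cell.split()
        results[name] = (float(score_text), int(rank_text))
    return method, results


# Shortened excerpt: first three score columns of the header and the Mask3D row.
header = "Method\tavg ap 25%\thead ap 25%\tchair"
row = "Mask3D Scannet200\t0.445 1\t0.653 1\t0.844 2"
method, results = parse_result_row(header, row)
```

Citation lines, which contain no tabs, would need to be skipped before calling the function.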

ScanNet200 3D Semantic Instance Benchmark