The 3D semantic instance prediction task involves detecting and segmenting the objects in a 3D scan mesh.

Evaluation and metrics

Our evaluation ranks all methods according to the average precision for each class. We report the mean average precision (AP) at an overlap of 0.25 (AP 25%), at an overlap of 0.5 (AP 50%), and averaged over overlaps from 0.5 to 0.95 in steps of 0.05, written [0.5:0.95:0.05] (AP). Note that multiple predictions of the same ground truth instance are penalized as false positives.
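The matching rule described above can be illustrated with a minimal sketch. It assumes instances are boolean masks over the same set of points, that overlap means intersection over union, and that AP is computed without interpolation; the function names and the toy masks are hypothetical, and this is not the benchmark's official evaluation script.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two boolean point masks."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)


def average_precision(pred_masks, pred_scores, gt_masks, overlap):
    """AP for one class at one overlap threshold (non-interpolated, an assumption)."""
    if len(gt_masks) == 0:
        return 0.0
    # Visit predictions from most to least confident.
    order = np.argsort(-np.asarray(pred_scores, dtype=float))
    claimed = set()  # ground truth instances already matched
    hits = []
    for i in order:
        overlaps = [iou(pred_masks[i], gt) for gt in gt_masks]
        best = int(np.argmax(overlaps))
        if overlaps[best] >= overlap and best not in claimed:
            claimed.add(best)
            hits.append(1.0)  # true positive
        else:
            hits.append(0.0)  # too little overlap, or a duplicate: false positive
    hits = np.asarray(hits)
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision reached at each true positive over all ground truths.
    return float((precision * hits).sum() / len(gt_masks))


# Toy scene: two ground truth instances over eight points.
points = np.arange(8)
gt_a, gt_b = points < 4, points >= 4
preds = [gt_a, gt_a, gt_b]  # the second prediction duplicates the first
scores = [0.9, 0.8, 0.7]

ap50 = average_precision(preds, scores, [gt_a, gt_b], 0.5)
overlaps = np.arange(0.5, 0.951, 0.05)  # 0.5, 0.55, ..., 0.95
ap = float(np.mean([average_precision(preds, scores, [gt_a, gt_b], t)
                    for t in overlaps]))
```

In this toy scene the duplicate prediction counts as a false positive and lowers the precision at the second true positive, so `ap50` falls below the value obtained when the duplicate is left out.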



This table lists the benchmark results for the 3D semantic instance scenario.




Each cell gives the score followed by the method's rank for that column in parentheses. Rows are sorted by the sink column.

Method | Info | avg AP 50% | bathtub | bed | bookshelf | cabinet | chair | counter | curtain | desk | door | otherfurniture | picture | refrigerator | shower curtain | sink | sofa | table | toilet | window
Competitor-SPFormer | - | 0.800 (9) | 1.000 (1) | 0.986 (3) | 0.845 (18) | 0.705 (14) | 0.915 (12) | 0.532 (16) | 0.733 (20) | 0.757 (12) | 0.733 (12) | 0.708 (11) | 0.698 (21) | 0.648 (39) | 0.981 (41) | 0.890 (1) | 0.830 (21) | 0.796 (10) | 0.997 (42) | 0.644 (6)
KmaxOneFormerNet | permissive | 0.783 (18) | 0.903 (58) | 0.981 (5) | 0.794 (29) | 0.706 (12) | 0.931 (5) | 0.561 (11) | 0.701 (27) | 0.706 (18) | 0.727 (16) | 0.697 (16) | 0.731 (11) | 0.689 (24) | 1.000 (1) | 0.856 (2) | 0.750 (43) | 0.761 (25) | 1.000 (1) | 0.599 (24)
Queryformer | - | 0.787 (16) | 1.000 (1) | 0.933 (13) | 0.601 (54) | 0.754 (1) | 0.886 (20) | 0.558 (12) | 0.661 (39) | 0.767 (11) | 0.665 (22) | 0.716 (8) | 0.639 (29) | 0.808 (5) | 1.000 (1) | 0.844 (3) | 0.897 (6) | 0.804 (8) | 1.000 (1) | 0.624 (13)
Mask3D | - | 0.780 (19) | 1.000 (1) | 0.786 (47) | 0.716 (42) | 0.696 (17) | 0.885 (21) | 0.500 (17) | 0.714 (22) | 0.810 (5) | 0.672 (21) | 0.715 (9) | 0.679 (24) | 0.809 (1) | 1.000 (1) | 0.831 (4) | 0.833 (20) | 0.787 (13) | 1.000 (1) | 0.602 (22)
Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe: Mask3D for 3D Semantic Instance Segmentation. ICRA 2023
EV3D | - | 0.811 (4) | 1.000 (1) | 0.968 (11) | 0.852 (16) | 0.717 (8) | 0.921 (10) | 0.574 (8) | 0.677 (31) | 0.748 (13) | 0.730 (14) | 0.703 (15) | 0.795 (3) | 0.809 (1) | 1.000 (1) | 0.831 (4) | 0.854 (11) | 0.778 (17) | 1.000 (1) | 0.638 (9)
Spherical Mask(CtoF) | - | 0.812 (3) | 1.000 (1) | 0.973 (8) | 0.852 (16) | 0.718 (7) | 0.917 (11) | 0.574 (7) | 0.677 (31) | 0.748 (13) | 0.729 (15) | 0.715 (9) | 0.795 (3) | 0.809 (1) | 1.000 (1) | 0.831 (4) | 0.854 (11) | 0.787 (13) | 1.000 (1) | 0.638 (8)
DCD | - | 0.798 (12) | 1.000 (1) | 0.878 (23) | 0.792 (30) | 0.693 (19) | 0.936 (3) | 0.596 (3) | 0.685 (30) | 0.663 (27) | 0.736 (9) | 0.717 (7) | 0.788 (5) | 0.693 (22) | 1.000 (1) | 0.825 (7) | 0.840 (17) | 0.837 (1) | 1.000 (1) | 0.689 (1)
SSTNet | permissive | 0.698 (32) | 1.000 (1) | 0.697 (64) | 0.888 (8) | 0.556 (43) | 0.803 (40) | 0.387 (25) | 0.626 (47) | 0.417 (47) | 0.556 (32) | 0.585 (37) | 0.702 (16) | 0.600 (45) | 1.000 (1) | 0.824 (8) | 0.720 (51) | 0.692 (32) | 1.000 (1) | 0.509 (41)
Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui Jia: Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks. ICCV 2021
PointRel | - | 0.816 (1) | 1.000 (1) | 0.971 (9) | 0.908 (6) | 0.743 (2) | 0.923 (9) | 0.573 (9) | 0.714 (22) | 0.695 (20) | 0.734 (11) | 0.747 (2) | 0.725 (13) | 0.809 (1) | 1.000 (1) | 0.814 (9) | 0.899 (5) | 0.820 (4) | 1.000 (1) | 0.610 (19)
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation. CVPR 2025
SIM3D | - | 0.803 (7) | 1.000 (1) | 0.967 (12) | 0.863 (15) | 0.692 (20) | 0.924 (8) | 0.552 (13) | 0.732 (21) | 0.667 (25) | 0.732 (13) | 0.662 (19) | 0.796 (2) | 0.789 (9) | 1.000 (1) | 0.803 (10) | 0.864 (8) | 0.766 (23) | 1.000 (1) | 0.643 (7)
Competitor-MAFT | - | 0.816 (1) | 1.000 (1) | 0.983 (4) | 0.872 (11) | 0.718 (6) | 0.941 (2) | 0.588 (5) | 0.652 (41) | 0.819 (3) | 0.776 (3) | 0.720 (6) | 0.780 (6) | 0.769 (12) | 1.000 (1) | 0.797 (11) | 0.813 (31) | 0.798 (9) | 1.000 (1) | 0.659 (5)
TST3D | - | 0.795 (13) | 1.000 (1) | 0.929 (15) | 0.918 (4) | 0.709 (11) | 0.884 (22) | 0.596 (4) | 0.704 (25) | 0.769 (10) | 0.734 (10) | 0.644 (24) | 0.699 (20) | 0.751 (14) | 1.000 (1) | 0.794 (12) | 0.876 (7) | 0.757 (26) | 0.997 (42) | 0.550 (36)
Duc Tran Dang Trung, Byeongkeun Kang, Yeejin Lee: MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation. ACM Multimedia 2024
OneFormer3D | copyleft | 0.801 (8) | 1.000 (1) | 0.973 (7) | 0.909 (5) | 0.698 (16) | 0.928 (6) | 0.582 (6) | 0.668 (37) | 0.685 (21) | 0.780 (2) | 0.687 (17) | 0.698 (22) | 0.702 (16) | 1.000 (1) | 0.794 (13) | 0.900 (4) | 0.784 (15) | 0.986 (55) | 0.635 (10)
Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: OneFormer3D: One Transformer for Unified Point Cloud Segmentation.
InsSSM | - | 0.799 (11) | 1.000 (1) | 0.915 (16) | 0.710 (44) | 0.729 (4) | 0.925 (7) | 0.664 (1) | 0.670 (35) | 0.770 (9) | 0.766 (5) | 0.739 (4) | 0.737 (9) | 0.700 (17) | 1.000 (1) | 0.792 (14) | 0.829 (23) | 0.815 (5) | 0.997 (42) | 0.625 (12)
Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau: SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation. TCSVT, 2024
MG-Former | - | 0.791 (14) | 1.000 (1) | 0.980 (6) | 0.837 (21) | 0.626 (29) | 0.897 (15) | 0.543 (14) | 0.759 (15) | 0.800 (7) | 0.766 (6) | 0.659 (20) | 0.769 (7) | 0.697 (20) | 1.000 (1) | 0.791 (15) | 0.707 (52) | 0.791 (12) | 1.000 (1) | 0.610 (18)
ExtMask3D | - | 0.789 (15) | 1.000 (1) | 0.988 (2) | 0.756 (37) | 0.706 (13) | 0.912 (14) | 0.429 (23) | 0.647 (43) | 0.806 (6) | 0.755 (7) | 0.673 (18) | 0.689 (23) | 0.772 (11) | 1.000 (1) | 0.789 (16) | 0.852 (13) | 0.811 (6) | 1.000 (1) | 0.617 (15)
VDG-Uni3DSeg | - | 0.804 (6) | 1.000 (1) | 0.990 (1) | 0.886 (9) | 0.688 (21) | 0.912 (13) | 0.602 (2) | 0.703 (26) | 0.786 (8) | 0.771 (4) | 0.708 (12) | 0.700 (18) | 0.669 (27) | 0.981 (41) | 0.789 (17) | 0.903 (2) | 0.772 (20) | 1.000 (1) | 0.609 (20)
SPFormer | permissive | 0.770 (20) | 0.903 (58) | 0.903 (18) | 0.806 (26) | 0.609 (36) | 0.886 (19) | 0.568 (10) | 0.815 (6) | 0.705 (19) | 0.711 (17) | 0.655 (21) | 0.652 (28) | 0.685 (25) | 1.000 (1) | 0.789 (18) | 0.809 (32) | 0.776 (19) | 1.000 (1) | 0.583 (28)
Sun Jiahao, Qing Chunmei, Tan Junpeng, Xu Xiangmin: Superpoint Transformer for 3D Scene Instance Segmentation. AAAI 2023 [Oral]
GraphCut | - | 0.732 (26) | 1.000 (1) | 0.788 (45) | 0.724 (41) | 0.642 (28) | 0.859 (28) | 0.248 (35) | 0.787 (11) | 0.618 (32) | 0.596 (27) | 0.653 (23) | 0.722 (15) | 0.583 (51) | 1.000 (1) | 0.766 (19) | 0.861 (9) | 0.825 (3) | 1.000 (1) | 0.504 (42)
UniPerception | - | 0.800 (9) | 1.000 (1) | 0.930 (14) | 0.872 (11) | 0.727 (5) | 0.862 (27) | 0.454 (22) | 0.764 (13) | 0.820 (2) | 0.746 (8) | 0.706 (13) | 0.750 (8) | 0.772 (10) | 0.926 (49) | 0.764 (20) | 0.818 (29) | 0.826 (2) | 0.997 (42) | 0.660 (4)
ISBNet | permissive | 0.757 (23) | 1.000 (1) | 0.904 (17) | 0.731 (40) | 0.678 (23) | 0.895 (16) | 0.458 (20) | 0.644 (45) | 0.670 (24) | 0.710 (18) | 0.620 (29) | 0.732 (10) | 0.650 (29) | 1.000 (1) | 0.756 (21) | 0.778 (35) | 0.779 (16) | 1.000 (1) | 0.614 (16)
Tuan Duc Ngo, Binh-Son Hua, Khoi Nguyen: ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution. CVPR 2023
PointComp | - | 0.811 (4) | 0.850 (60) | 0.969 (10) | 0.864 (14) | 0.739 (3) | 0.946 (1) | 0.539 (15) | 0.671 (34) | 0.835 (1) | 0.700 (19) | 0.742 (3) | 0.817 (1) | 0.766 (13) | 1.000 (1) | 0.755 (22) | 0.909 (1) | 0.808 (7) | 1.000 (1) | 0.687 (2)
OccuSeg+instance | - | 0.672 (38) | 1.000 (1) | 0.758 (56) | 0.682 (46) | 0.576 (41) | 0.842 (30) | 0.477 (19) | 0.504 (61) | 0.524 (38) | 0.567 (30) | 0.585 (38) | 0.451 (43) | 0.557 (53) | 1.000 (1) | 0.751 (23) | 0.797 (33) | 0.563 (54) | 1.000 (1) | 0.467 (50)
Lei Han, Tian Zheng, Lan Xu, Lu Fang: OccuSeg: Occupancy-aware 3D Instance Segmentation. CVPR 2020
DANCENET | - | 0.680 (35) | 1.000 (1) | 0.807 (37) | 0.733 (39) | 0.600 (37) | 0.768 (49) | 0.375 (27) | 0.543 (55) | 0.538 (37) | 0.610 (25) | 0.599 (32) | 0.498 (39) | 0.632 (43) | 0.981 (41) | 0.739 (24) | 0.856 (10) | 0.633 (44) | 0.882 (66) | 0.454 (51)
SoftGroup | permissive | 0.761 (22) | 1.000 (1) | 0.808 (36) | 0.845 (18) | 0.716 (9) | 0.862 (26) | 0.243 (36) | 0.824 (4) | 0.655 (29) | 0.620 (23) | 0.734 (5) | 0.699 (19) | 0.791 (8) | 0.981 (41) | 0.716 (25) | 0.844 (16) | 0.769 (21) | 1.000 (1) | 0.594 (26)
Thang Vu, Kookhoi Kim, Tung M. Luu, Xuan Thanh Nguyen, Chang D. Yoo: SoftGroup for 3D Instance Segmentation on Point Clouds. CVPR 2022 [Oral]
Mask-Group | - | 0.664 (39) | 1.000 (1) | 0.822 (31) | 0.764 (36) | 0.616 (35) | 0.815 (36) | 0.139 (46) | 0.694 (29) | 0.597 (33) | 0.459 (45) | 0.566 (41) | 0.599 (32) | 0.600 (45) | 0.516 (69) | 0.715 (26) | 0.819 (28) | 0.635 (42) | 1.000 (1) | 0.603 (21)
Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang: MaskGroup: Hierarchical Point Grouping and Masking for 3D Instance Segmentation. ICME 2022
ODIN - Ins | permissive | 0.693 (34) | 1.000 (1) | 0.880 (22) | 0.647 (49) | 0.620 (32) | 0.779 (47) | 0.336 (28) | 0.501 (62) | 0.681 (22) | 0.577 (28) | 0.595 (34) | 0.679 (25) | 0.683 (26) | 1.000 (1) | 0.709 (27) | 0.816 (30) | 0.637 (41) | 0.770 (71) | 0.557 (34)
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki: ODIN: A Single Model for 2D and 3D Segmentation. CVPR 2024
MAFT | - | 0.786 (17) | 1.000 (1) | 0.894 (21) | 0.807 (25) | 0.694 (18) | 0.893 (18) | 0.486 (18) | 0.674 (33) | 0.740 (15) | 0.786 (1) | 0.704 (14) | 0.727 (12) | 0.739 (15) | 1.000 (1) | 0.707 (28) | 0.849 (15) | 0.756 (27) | 1.000 (1) | 0.685 (3)
PBNet | permissive | 0.747 (25) | 1.000 (1) | 0.818 (32) | 0.837 (22) | 0.713 (10) | 0.844 (29) | 0.457 (21) | 0.647 (43) | 0.711 (17) | 0.614 (24) | 0.617 (31) | 0.657 (27) | 0.650 (29) | 1.000 (1) | 0.692 (29) | 0.822 (25) | 0.765 (24) | 1.000 (1) | 0.595 (25)
Weiguang Zhao, Yuyao Yan, Chaolong Yang, Jianan Ye, Xi Yang, Kaizhu Huang: Divide and Conquer: 3D Instance Segmentation With Point-Wise Binarization. ICCV 2023
PointGroup | - | 0.636 (46) | 1.000 (1) | 0.765 (51) | 0.624 (51) | 0.505 (54) | 0.797 (42) | 0.116 (50) | 0.696 (28) | 0.384 (51) | 0.441 (47) | 0.559 (42) | 0.476 (41) | 0.596 (48) | 1.000 (1) | 0.666 (30) | 0.756 (42) | 0.556 (58) | 0.997 (42) | 0.513 (40)
Li Jiang, Hengshuang Zhao, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia: PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation. CVPR 2020 [oral]
Box2Mask | - | 0.677 (37) | 1.000 (1) | 0.847 (27) | 0.771 (33) | 0.509 (52) | 0.816 (35) | 0.277 (32) | 0.558 (54) | 0.482 (39) | 0.562 (31) | 0.640 (25) | 0.448 (44) | 0.700 (17) | 1.000 (1) | 0.666 (30) | 0.852 (14) | 0.578 (51) | 0.997 (42) | 0.488 (46)
Julian Chibane, Francis Engelmann, Tuan Anh Tran, Gerard Pons-Moll: Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes. ECCV 2022
DENet | - | 0.629 (49) | 1.000 (1) | 0.797 (43) | 0.608 (52) | 0.589 (38) | 0.627 (62) | 0.219 (38) | 0.882 (1) | 0.310 (56) | 0.402 (59) | 0.383 (62) | 0.396 (52) | 0.650 (29) | 1.000 (1) | 0.663 (32) | 0.543 (70) | 0.691 (33) | 1.000 (1) | 0.568 (31)
INS-Conv-instance | - | 0.657 (40) | 1.000 (1) | 0.760 (54) | 0.667 (48) | 0.581 (39) | 0.863 (25) | 0.323 (29) | 0.655 (40) | 0.477 (40) | 0.473 (43) | 0.549 (43) | 0.432 (47) | 0.650 (29) | 1.000 (1) | 0.655 (33) | 0.738 (47) | 0.585 (50) | 0.944 (58) | 0.472 (49)
HAIS | permissive | 0.699 (31) | 1.000 (1) | 0.849 (26) | 0.820 (23) | 0.675 (24) | 0.808 (39) | 0.279 (31) | 0.757 (16) | 0.465 (42) | 0.517 (36) | 0.596 (33) | 0.559 (33) | 0.600 (45) | 1.000 (1) | 0.654 (34) | 0.767 (38) | 0.676 (34) | 0.994 (51) | 0.560 (33)
Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang: Hierarchical Aggregation for 3D Instance Segmentation. ICCV 2021
SoftGroup++ | - | 0.769 (21) | 1.000 (1) | 0.803 (40) | 0.937 (1) | 0.684 (22) | 0.865 (24) | 0.213 (39) | 0.870 (2) | 0.664 (26) | 0.571 (29) | 0.758 (1) | 0.702 (17) | 0.807 (6) | 1.000 (1) | 0.653 (35) | 0.902 (3) | 0.792 (11) | 1.000 (1) | 0.626 (11)
IPCA-Inst | - | 0.731 (27) | 1.000 (1) | 0.788 (46) | 0.884 (10) | 0.698 (15) | 0.788 (45) | 0.252 (34) | 0.760 (14) | 0.646 (30) | 0.511 (37) | 0.637 (26) | 0.665 (26) | 0.804 (7) | 1.000 (1) | 0.644 (36) | 0.778 (36) | 0.747 (29) | 1.000 (1) | 0.561 (32)
TD3D | permissive | 0.751 (24) | 1.000 (1) | 0.774 (48) | 0.867 (13) | 0.621 (31) | 0.934 (4) | 0.404 (24) | 0.706 (24) | 0.812 (4) | 0.605 (26) | 0.633 (27) | 0.626 (30) | 0.690 (23) | 1.000 (1) | 0.640 (37) | 0.820 (26) | 0.777 (18) | 1.000 (1) | 0.612 (17)
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich: Top-Down Beats Bottom-Up in 3D Instance Segmentation. WACV 2024
DD-UNet+Group | - | 0.635 (47) | 0.667 (62) | 0.797 (44) | 0.714 (43) | 0.562 (42) | 0.774 (48) | 0.146 (43) | 0.810 (8) | 0.429 (46) | 0.476 (42) | 0.546 (45) | 0.399 (51) | 0.633 (41) | 1.000 (1) | 0.632 (38) | 0.722 (50) | 0.609 (45) | 1.000 (1) | 0.514 (39)
H. Liu, R. Liu, K. Yang, J. Zhang, K. Peng, R. Stiefelhagen: HIDA: Towards Holistic Indoor Understanding for the Visually Impaired via Semantic Instance Segmentation with a Wearable Solid-State LiDAR Sensor. ICCVW 2021
CSC-Pretrained | - | 0.648 (41) | 1.000 (1) | 0.810 (34) | 0.768 (34) | 0.523 (50) | 0.813 (37) | 0.143 (45) | 0.819 (5) | 0.389 (50) | 0.422 (54) | 0.511 (47) | 0.443 (45) | 0.650 (29) | 1.000 (1) | 0.624 (39) | 0.732 (48) | 0.634 (43) | 1.000 (1) | 0.375 (58)
OSIS | - | 0.605 (51) | 1.000 (1) | 0.801 (41) | 0.599 (55) | 0.535 (46) | 0.728 (55) | 0.286 (30) | 0.436 (66) | 0.679 (23) | 0.491 (39) | 0.433 (54) | 0.256 (59) | 0.404 (66) | 0.857 (51) | 0.620 (40) | 0.724 (49) | 0.510 (64) | 1.000 (1) | 0.539 (37)
TopoSeg | - | 0.725 (28) | 1.000 (1) | 0.806 (39) | 0.933 (2) | 0.668 (25) | 0.758 (50) | 0.272 (33) | 0.734 (19) | 0.630 (31) | 0.549 (33) | 0.654 (22) | 0.606 (31) | 0.697 (21) | 0.966 (46) | 0.612 (41) | 0.839 (18) | 0.754 (28) | 1.000 (1) | 0.573 (29)
DualGroup | - | 0.694 (33) | 1.000 (1) | 0.799 (42) | 0.811 (24) | 0.622 (30) | 0.817 (34) | 0.376 (26) | 0.805 (9) | 0.590 (34) | 0.487 (41) | 0.568 (40) | 0.525 (38) | 0.650 (29) | 0.835 (59) | 0.600 (42) | 0.829 (22) | 0.655 (37) | 1.000 (1) | 0.526 (38)
RPGN | - | 0.643 (43) | 1.000 (1) | 0.758 (55) | 0.582 (60) | 0.539 (44) | 0.826 (33) | 0.046 (59) | 0.765 (12) | 0.372 (52) | 0.436 (51) | 0.588 (35) | 0.539 (37) | 0.650 (29) | 1.000 (1) | 0.577 (43) | 0.750 (44) | 0.653 (39) | 0.997 (42) | 0.495 (45)
Shichao Dong, Guosheng Lin, Tzu-Yi Hung: Learning Regional Purity for Instance Segmentation on 3D Point Clouds. ECCV 2022
GICN | - | 0.638 (45) | 1.000 (1) | 0.895 (20) | 0.800 (27) | 0.480 (56) | 0.676 (58) | 0.144 (44) | 0.737 (18) | 0.354 (53) | 0.447 (46) | 0.400 (60) | 0.365 (55) | 0.700 (17) | 1.000 (1) | 0.569 (44) | 0.836 (19) | 0.599 (46) | 1.000 (1) | 0.473 (48)
PE | - | 0.645 (42) | 1.000 (1) | 0.773 (50) | 0.798 (28) | 0.538 (45) | 0.786 (46) | 0.088 (54) | 0.799 (10) | 0.350 (54) | 0.435 (52) | 0.547 (44) | 0.545 (35) | 0.646 (40) | 0.933 (48) | 0.562 (45) | 0.761 (41) | 0.556 (59) | 0.997 (42) | 0.501 (44)
Biao Zhang, Peter Wonka: Point Cloud Instance Segmentation using Probabilistic Embeddings. CVPR 2021
SSEC | - | 0.707 (30) | 1.000 (1) | 0.850 (25) | 0.924 (3) | 0.648 (26) | 0.747 (53) | 0.162 (41) | 0.862 (3) | 0.572 (35) | 0.520 (35) | 0.624 (28) | 0.549 (34) | 0.649 (38) | 1.000 (1) | 0.560 (46) | 0.706 (53) | 0.768 (22) | 1.000 (1) | 0.591 (27)
One_Thing_One_Click | permissive | 0.529 (59) | 0.667 (62) | 0.718 (59) | 0.777 (32) | 0.399 (60) | 0.683 (57) | 0.000 (71) | 0.669 (36) | 0.138 (66) | 0.391 (61) | 0.374 (63) | 0.539 (36) | 0.360 (67) | 0.641 (66) | 0.556 (47) | 0.774 (37) | 0.593 (48) | 0.997 (42) | 0.251 (66)
Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu: One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation. CVPR 2021
AOIA | - | 0.601 (52) | 1.000 (1) | 0.761 (53) | 0.687 (45) | 0.485 (55) | 0.828 (32) | 0.008 (66) | 0.663 (38) | 0.405 (49) | 0.405 (58) | 0.425 (57) | 0.490 (40) | 0.596 (48) | 0.714 (62) | 0.553 (48) | 0.779 (34) | 0.597 (47) | 0.992 (52) | 0.424 (55)
Dyco3D | copyleft | 0.641 (44) | 1.000 (1) | 0.841 (28) | 0.893 (7) | 0.531 (47) | 0.802 (41) | 0.115 (51) | 0.588 (52) | 0.448 (44) | 0.438 (49) | 0.537 (46) | 0.430 (49) | 0.550 (54) | 0.857 (51) | 0.534 (49) | 0.764 (40) | 0.657 (36) | 0.987 (54) | 0.568 (30)
Tong He, Chunhua Shen, Anton van den Hengel: DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution. CVPR 2021
Sparse R-CNN | - | 0.515 (60) | 1.000 (1) | 0.538 (72) | 0.282 (66) | 0.468 (58) | 0.790 (44) | 0.173 (40) | 0.345 (68) | 0.429 (45) | 0.413 (57) | 0.484 (49) | 0.176 (63) | 0.595 (50) | 0.591 (67) | 0.522 (50) | 0.668 (61) | 0.476 (65) | 0.986 (56) | 0.327 (62)
DKNet | - | 0.718 (29) | 1.000 (1) | 0.814 (33) | 0.782 (31) | 0.619 (33) | 0.872 (23) | 0.224 (37) | 0.751 (17) | 0.569 (36) | 0.677 (20) | 0.585 (36) | 0.724 (14) | 0.633 (41) | 0.981 (41) | 0.515 (51) | 0.819 (27) | 0.736 (30) | 1.000 (1) | 0.617 (14)
Yizheng Wu, Min Shi, Shuaiyuan Du, Hao Lu, Zhiguo Cao, Weicai Zhong: 3D Instances as 1D Kernels. ECCV 2022
PanopticFusion-inst | - | 0.478 (63) | 0.667 (62) | 0.712 (62) | 0.595 (56) | 0.259 (69) | 0.550 (69) | 0.000 (71) | 0.613 (49) | 0.175 (65) | 0.250 (70) | 0.434 (53) | 0.437 (46) | 0.411 (65) | 0.857 (51) | 0.485 (52) | 0.591 (69) | 0.267 (75) | 0.944 (58) | 0.359 (59)
Gaku Narita, Takashi Seno, Tomoya Ishikawa, Yohsuke Kaji: PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things. IROS 2019 (to appear)
RWSeg | - | 0.567 (55) | 0.528 (72) | 0.708 (63) | 0.626 (50) | 0.580 (40) | 0.745 (54) | 0.063 (57) | 0.627 (46) | 0.240 (60) | 0.400 (60) | 0.497 (48) | 0.464 (42) | 0.515 (55) | 1.000 (1) | 0.475 (53) | 0.745 (45) | 0.571 (52) | 1.000 (1) | 0.429 (54)
ClickSeg_Instance | - | 0.539 (58) | 1.000 (1) | 0.621 (67) | 0.300 (65) | 0.530 (48) | 0.698 (56) | 0.127 (49) | 0.533 (56) | 0.222 (61) | 0.430 (53) | 0.400 (59) | 0.365 (55) | 0.574 (52) | 0.938 (47) | 0.472 (54) | 0.659 (62) | 0.543 (60) | 0.944 (58) | 0.347 (61)
SegGroup_ins | permissive | 0.445 (67) | 0.667 (62) | 0.773 (49) | 0.185 (73) | 0.317 (65) | 0.656 (59) | 0.000 (71) | 0.407 (67) | 0.134 (67) | 0.381 (62) | 0.267 (67) | 0.217 (61) | 0.476 (58) | 0.714 (62) | 0.452 (55) | 0.629 (65) | 0.514 (62) | 1.000 (1) | 0.222 (69)
An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou: SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation. TIP 2022
MTML | - | 0.549 (57) | 1.000 (1) | 0.807 (38) | 0.588 (58) | 0.327 (64) | 0.647 (60) | 0.004 (68) | 0.815 (7) | 0.180 (63) | 0.418 (55) | 0.364 (64) | 0.182 (62) | 0.445 (60) | 1.000 (1) | 0.442 (56) | 0.688 (60) | 0.571 (53) | 1.000 (1) | 0.396 (56)
Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald: 3D Instance Segmentation via Multi-task Metric Learning. ICCV 2019 [oral]
Mask3D_evaluation | - | 0.631 (48) | 1.000 (1) | 0.829 (30) | 0.606 (53) | 0.646 (27) | 0.836 (31) | 0.068 (55) | 0.511 (59) | 0.462 (43) | 0.507 (38) | 0.619 (30) | 0.389 (53) | 0.610 (44) | 1.000 (1) | 0.432 (57) | 0.828 (24) | 0.673 (35) | 0.788 (70) | 0.552 (35)
3D-MPA | - | 0.611 (50) | 1.000 (1) | 0.833 (29) | 0.765 (35) | 0.526 (49) | 0.756 (51) | 0.136 (48) | 0.588 (52) | 0.470 (41) | 0.438 (50) | 0.432 (56) | 0.358 (57) | 0.650 (29) | 0.857 (51) | 0.429 (58) | 0.765 (39) | 0.557 (57) | 1.000 (1) | 0.430 (53)
Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner: 3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation. CVPR 2020
SALoss-ResNet | - | 0.459 (65) | 1.000 (1) | 0.737 (58) | 0.159 (76) | 0.259 (68) | 0.587 (65) | 0.138 (47) | 0.475 (64) | 0.217 (62) | 0.416 (56) | 0.408 (58) | 0.128 (65) | 0.315 (69) | 0.714 (62) | 0.411 (59) | 0.536 (71) | 0.590 (49) | 0.873 (67) | 0.304 (63)
Zhidong Liang, Ming Yang, Hao Li, Chunxiang Wang: 3D Instance Embedding Learning With a Structure-Aware Loss Function for Point Cloud Segmentation. IEEE Robotics and Automation Letters (IROS2020)
3D-BoNet | - | 0.488 (62) | 1.000 (1) | 0.672 (66) | 0.590 (57) | 0.301 (66) | 0.484 (73) | 0.098 (52) | 0.620 (48) | 0.306 (57) | 0.341 (65) | 0.259 (68) | 0.125 (66) | 0.434 (63) | 0.796 (61) | 0.402 (60) | 0.499 (72) | 0.513 (63) | 0.909 (62) | 0.439 (52)
Bo Yang, Jianan Wang, Ronald Clark, Qingyong Hu, Sen Wang, Andrew Markham, Niki Trigoni: Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds. NeurIPS 2019 Spotlight
SSEN | - | 0.575 (54) | 1.000 (1) | 0.761 (52) | 0.473 (62) | 0.477 (57) | 0.795 (43) | 0.066 (56) | 0.529 (57) | 0.658 (28) | 0.460 (44) | 0.461 (52) | 0.380 (54) | 0.331 (68) | 0.859 (50) | 0.401 (61) | 0.692 (59) | 0.653 (38) | 1.000 (1) | 0.348 (60)
Dongsu Zhang, Junha Chun, Sang Kyun Cha, Young Min Kim: Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning. arXiv
SphereSeg | - | 0.680 (35) | 1.000 (1) | 0.856 (24) | 0.744 (38) | 0.618 (34) | 0.893 (17) | 0.151 (42) | 0.651 (42) | 0.713 (16) | 0.537 (34) | 0.579 (39) | 0.430 (48) | 0.651 (28) | 1.000 (1) | 0.389 (62) | 0.744 (46) | 0.697 (31) | 0.991 (53) | 0.601 (23)
Occipital-SCS | - | 0.512 (61) | 1.000 (1) | 0.716 (60) | 0.509 (61) | 0.506 (53) | 0.611 (63) | 0.092 (53) | 0.602 (51) | 0.177 (64) | 0.346 (64) | 0.383 (61) | 0.165 (64) | 0.442 (61) | 0.850 (58) | 0.386 (63) | 0.618 (66) | 0.543 (61) | 0.889 (63) | 0.389 (57)
MASC | permissive | 0.447 (66) | 0.528 (72) | 0.555 (70) | 0.381 (63) | 0.382 (61) | 0.633 (61) | 0.002 (69) | 0.509 (60) | 0.260 (58) | 0.361 (63) | 0.432 (55) | 0.327 (58) | 0.451 (59) | 0.571 (68) | 0.367 (64) | 0.639 (64) | 0.386 (66) | 0.980 (57) | 0.276 (65)
Chen Liu, Yasutaka Furukawa: MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation.
R-PointNet | - | 0.306 (71) | 0.500 (74) | 0.405 (76) | 0.311 (64) | 0.348 (63) | 0.589 (64) | 0.054 (58) | 0.068 (76) | 0.126 (68) | 0.283 (68) | 0.290 (66) | 0.028 (74) | 0.219 (72) | 0.214 (73) | 0.331 (65) | 0.396 (76) | 0.275 (72) | 0.821 (69) | 0.245 (67)
NeuralBF | - | 0.555 (56) | 0.667 (62) | 0.896 (19) | 0.843 (20) | 0.517 (51) | 0.751 (52) | 0.029 (61) | 0.519 (58) | 0.414 (48) | 0.439 (48) | 0.465 (50) | 0.000 (78) | 0.484 (57) | 0.857 (51) | 0.287 (66) | 0.693 (58) | 0.651 (40) | 1.000 (1) | 0.485 (47)
Weiwei Sun, Daniel Rebain, Renjie Liao, Vladimir Tankovich, Soroosh Yazdani, Kwang Moo Yi, Andrea Tagliasacchi: NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds. WACV 2023
SPG_WSIS | - | 0.470 (64) | 0.667 (62) | 0.685 (65) | 0.677 (47) | 0.372 (62) | 0.562 (67) | 0.000 (71) | 0.482 (63) | 0.244 (59) | 0.316 (67) | 0.298 (65) | 0.052 (73) | 0.442 (62) | 0.857 (51) | 0.267 (67) | 0.702 (54) | 0.559 (56) | 1.000 (1) | 0.287 (64)
PCJC | - | 0.578 (53) | 1.000 (1) | 0.810 (35) | 0.583 (59) | 0.449 (59) | 0.813 (38) | 0.042 (60) | 0.603 (50) | 0.341 (55) | 0.490 (40) | 0.465 (51) | 0.410 (50) | 0.650 (29) | 0.835 (59) | 0.264 (68) | 0.694 (57) | 0.561 (55) | 0.889 (63) | 0.504 (43)
Region-18class | - | 0.284 (72) | 0.250 (78) | 0.751 (57) | 0.228 (71) | 0.270 (67) | 0.521 (70) | 0.000 (71) | 0.468 (65) | 0.008 (77) | 0.205 (71) | 0.127 (72) | 0.000 (78) | 0.068 (76) | 0.070 (77) | 0.262 (69) | 0.652 (63) | 0.323 (69) | 0.740 (72) | 0.173 (71)
UNet-backbone | - | 0.319 (70) | 0.667 (62) | 0.715 (61) | 0.233 (69) | 0.189 (71) | 0.479 (74) | 0.008 (66) | 0.218 (71) | 0.067 (72) | 0.201 (72) | 0.173 (71) | 0.107 (67) | 0.123 (74) | 0.438 (70) | 0.150 (70) | 0.615 (67) | 0.355 (67) | 0.916 (61) | 0.093 (78)
Sem_Recon_ins | - | 0.227 (76) | 0.764 (61) | 0.486 (73) | 0.069 (78) | 0.098 (76) | 0.426 (76) | 0.017 (64) | 0.067 (77) | 0.015 (74) | 0.172 (73) | 0.100 (73) | 0.096 (68) | 0.054 (78) | 0.183 (74) | 0.135 (71) | 0.366 (77) | 0.260 (76) | 0.614 (75) | 0.168 (73)
3D-BEVIS | - | 0.248 (74) | 0.667 (62) | 0.566 (69) | 0.076 (77) | 0.035 (79) | 0.394 (77) | 0.027 (63) | 0.035 (78) | 0.098 (70) | 0.099 (77) | 0.030 (78) | 0.025 (75) | 0.098 (75) | 0.375 (72) | 0.126 (72) | 0.604 (68) | 0.181 (77) | 0.854 (68) | 0.171 (72)
Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe: 3D-BEVIS: Birds-Eye-View Instance Segmentation.
3D-SIS | permissive | 0.382 (68) | 1.000 (1) | 0.432 (75) | 0.245 (68) | 0.190 (70) | 0.577 (66) | 0.013 (65) | 0.263 (70) | 0.033 (73) | 0.320 (66) | 0.240 (69) | 0.075 (69) | 0.422 (64) | 0.857 (51) | 0.117 (73) | 0.699 (55) | 0.271 (74) | 0.883 (65) | 0.235 (68)
Ji Hou, Angela Dai, Matthias Niessner: 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. CVPR 2019
Sgpn_scannet | - | 0.143 (78) | 0.208 (79) | 0.390 (77) | 0.169 (74) | 0.065 (77) | 0.275 (78) | 0.029 (62) | 0.069 (75) | 0.000 (78) | 0.087 (78) | 0.043 (77) | 0.014 (76) | 0.027 (79) | 0.000 (78) | 0.112 (74) | 0.351 (78) | 0.168 (78) | 0.438 (77) | 0.138 (76)
SemRegionNet-20cls | - | 0.250 (73) | 0.333 (75) | 0.613 (68) | 0.229 (70) | 0.163 (72) | 0.493 (71) | 0.000 (71) | 0.304 (69) | 0.107 (69) | 0.147 (75) | 0.100 (74) | 0.052 (72) | 0.231 (70) | 0.119 (75) | 0.039 (75) | 0.445 (74) | 0.325 (68) | 0.654 (73) | 0.141 (74)
tmp | - | 0.248 (74) | 0.667 (62) | 0.437 (74) | 0.188 (72) | 0.153 (74) | 0.491 (72) | 0.000 (71) | 0.208 (72) | 0.094 (71) | 0.153 (74) | 0.099 (75) | 0.057 (71) | 0.217 (73) | 0.119 (75) | 0.039 (75) | 0.466 (73) | 0.302 (70) | 0.640 (74) | 0.140 (75)
ASIS | - | 0.199 (77) | 0.333 (75) | 0.253 (78) | 0.167 (75) | 0.140 (75) | 0.438 (75) | 0.000 (71) | 0.177 (74) | 0.008 (76) | 0.121 (76) | 0.069 (76) | 0.004 (77) | 0.231 (71) | 0.429 (71) | 0.036 (77) | 0.445 (75) | 0.273 (73) | 0.333 (78) | 0.119 (77)
MaskRCNN 2d->3d Proj | - | 0.058 (79) | 0.333 (75) | 0.002 (79) | 0.000 (79) | 0.053 (78) | 0.002 (79) | 0.002 (70) | 0.021 (79) | 0.000 (78) | 0.045 (79) | 0.024 (79) | 0.238 (60) | 0.065 (77) | 0.000 (78) | 0.014 (78) | 0.107 (79) | 0.020 (79) | 0.110 (79) | 0.006 (79)
Hier3D | copyleft | 0.323 (69) | 0.667 (62) | 0.542 (71) | 0.264 (67) | 0.157 (73) | 0.550 (68) | 0.000 (71) | 0.205 (73) | 0.009 (75) | 0.270 (69) | 0.218 (70) | 0.075 (69) | 0.500 (56) | 0.688 (65) | 0.007 (79) | 0.698 (56) | 0.301 (71) | 0.459 (76) | 0.200 (70)
Tan: HCFS3D: Hierarchical Coupled Feature Selection Network for 3D Semantic and Instance Segmentation.