GCMA6D: Graph Convolution and Cross-Modality Attention Fusion for 6D Pose Estimation

Shihan Zhang, Ling Cao, Wei Pan, Lei Lu

Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2025)

Summary

GCMA6D is an RGB-D 6D pose estimation network for object recognition and localization in complex scenes. It targets cases where existing pose pipelines struggle: occlusion, low texture, cluttered backgrounds, and weak local geometry.
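For readers new to the task, a 6D pose is a rigid transform with three rotational and three translational degrees of freedom that maps points from the object's model frame into the camera frame. The sketch below only illustrates that definition; the helper names `rotation_z` and `apply_pose` are hypothetical and are not taken from the paper.

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_pose(R, t, points):
    """Map model-frame points into the camera frame: p_cam = R @ p + t."""
    return [
        [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        for p in points
    ]

# Example: rotate a model point 90 degrees about z, then shift it 1 unit along z.
model_points = [[1.0, 0.0, 0.0]]
camera_points = apply_pose(rotation_z(math.pi / 2), [0.0, 0.0, 1.0], model_points)
```

A pose estimator predicts `R` and `t` from the RGB-D input; applying them to the object's model points gives the predicted object placement in the scene.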

This work appears in Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2025).

Problem setting

6D pose estimation is difficult for occluded or low-texture objects. GCMA6D addresses this with an RGB-D network that extracts local geometry through a 3DGCN point-cloud branch, enhances image features with Large Kernel Attention, and fuses RGB and geometric features through Cross-Modality Attention plus Squeeze-and-Excitation reweighting. Experiments on LineMOD and YCB-Video show improved accuracy over DenseFusion-style baselines.
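The fusion step can be pictured with a minimal sketch. It assumes that per-point geometric features act as queries attending over RGB features, that the attended RGB features are concatenated with the geometric ones, and that a Squeeze-and-Excitation gate then rescales each channel. The attention direction, the concatenation, the layer sizes, and all function names here are illustrative assumptions rather than details taken from the paper. Plain Python lists are used so the sketch runs without dependencies; a real network would use a tensor library with learned projections.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def se_reweight(features, w1, w2):
    """Squeeze-and-Excitation: average each channel, pass it through a
    two-layer gate (ReLU then sigmoid), and rescale the channels."""
    n, c = len(features), len(features[0])
    squeezed = [sum(f[j] for f in features) / n for j in range(c)]
    hidden = [max(0.0, sum(squeezed[j] * w1[j][h] for j in range(c)))
              for h in range(len(w1[0]))]
    gate = [1.0 / (1.0 + math.exp(-sum(hidden[h] * w2[h][j]
                                       for h in range(len(hidden)))))
            for j in range(c)]
    return [[f[j] * gate[j] for j in range(c)] for f in features]

def fuse(rgb_feats, geo_feats, w1, w2):
    """Assumed fusion: geometry queries attend to RGB, concatenate, SE-gate."""
    attended = cross_attention(geo_feats, rgb_feats, rgb_feats)
    fused = [g + a for g, a in zip(geo_feats, attended)]  # channel concatenation
    return se_reweight(fused, w1, w2)

# Toy inputs: 3 RGB feature rows, 2 geometric feature rows, 2 channels each.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
geo = [[0.5, 0.5], [1.0, 0.0]]
w1 = [[0.1, 0.1] for _ in range(4)]   # hypothetical gate weights, 4 -> 2
w2 = [[0.1] * 4 for _ in range(2)]    # hypothetical gate weights, 2 -> 4
out = fuse(rgb, geo, w1, w2)          # 2 rows, 4 channels each
```

The gate values lie strictly between 0 and 1, so the Squeeze-and-Excitation step can only attenuate channels relative to one another; it lets the network down-weight a modality's channels when they are unreliable, for example RGB features on a low-texture object.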

The figures below show representative visual results from the paper.

Method and visual evidence

The visuals show RGB-D graph-convolution and cross-modality attention fusion for 6D pose estimation, with benchmark comparisons.

Figure: Method overview.

Figure: Representation and setup.

Figure: Experimental evidence.

Figure: Result comparison.

Figure: Additional visual result.

Results and impact

The paper's evaluation is summarized in the figures above.

Type

Conference paper

Topic

6D Pose & Robotics

Venue

Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2025)

Year

2025