GCMA6D: Graph Convolution and Cross-Modality Attention Fusion for 6D Pose Estimation

Shihan Zhang, Ling Cao, Wei Pan, Lei Lu
Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2025)

Summary

GCMA6D is an RGB-D 6D pose estimation network for object recognition and localization in complex scenes. It targets cases where existing pose pipelines struggle: occlusion, low texture, cluttered backgrounds, and weak local geometry.
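For readers new to the task, a 6D pose is a rigid transform: a 3D rotation R plus a 3D translation t that maps object-frame points into the camera frame. The following is a minimal sketch of that definition only, not code from the paper; the helper names `rotation_z` and `apply_pose` and the sample points are illustrative assumptions.

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, points):
    """Map object-frame points into the camera frame: p_cam = R @ p_obj + t."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

# Hypothetical object model (two points) and pose: 90 degrees about z, 0.5 along z.
model_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
R = rotation_z(math.pi / 2)
t = (0.0, 0.0, 0.5)
camera_points = apply_pose(R, t, model_points)
```

A pose estimator predicts R and t from the RGB-D observation; the three rotational and three translational degrees of freedom are what the "6D" refers to.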

The work connects to practical problems in 3D sensing, computational geometry, and industrial machine vision.

Problem setting

6D pose estimation is difficult for occluded or low-texture objects. GCMA6D addresses this with an RGB-D network that extracts local geometry through a 3DGCN point-cloud branch, enhances image features with Large Kernel Attention, and fuses RGB and geometric features through Cross-Modality Attention plus Squeeze-and-Excitation reweighting. Experiments on LineMOD and YCB-Video show improved accuracy over DenseFusion-style baselines.
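The fusion step can be pictured with a small sketch. This summary does not give the paper's exact layer definitions, so the code below assumes the textbook forms: scaled dot-product attention with RGB features as queries and geometric features as keys and values, followed by a standard Squeeze-and-Excitation gate over the concatenated channels. All function names, shapes, and the random weights are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights

def squeeze_excite(features, W1, W2):
    """SE reweighting of the channels of an (N, C) feature matrix."""
    n = len(features)
    squeezed = [sum(col) / n for col in zip(*features)]        # squeeze: mean per channel
    hidden = [max(0.0, h) for h in matmul([squeezed], W1)[0]]  # bottleneck with ReLU
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matmul([hidden], W2)[0]]  # sigmoid
    return [[x * g for x, g in zip(row, gates)] for row in features], gates

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random stand-in for learned features or weights (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

N, C = 4, 6                       # 4 points/pixels, 6 channels per modality
rgb_feats = rand_matrix(N, C)     # stand-in for image-branch features
geo_feats = rand_matrix(N, C)     # stand-in for point-cloud-branch features

# RGB features query the geometric features, then both are concatenated.
geo_for_rgb, attn = cross_attention(rgb_feats, geo_feats, geo_feats)
fused = [r + g for r, g in zip(rgb_feats, geo_for_rgb)]       # shape (N, 2C)

W1 = rand_matrix(2 * C, 3)        # channel reduction
W2 = rand_matrix(3, 2 * C)        # channel expansion
reweighted, gates = squeeze_excite(fused, W1, W2)
```

Each row of `attn` sums to one, so every RGB feature receives a convex combination of geometric features, and each value in `gates` lies between 0 and 1, so the SE step can only attenuate channels of the fused representation.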

The visual notes below pair the paper’s original figures with a concise reading of the method, experimental setup, and reported results.

Method and visual evidence

The method works on RGB-D observations, pairing a point cloud with the corresponding image features, and uses the fused representation to improve robustness under occlusion, low texture, and background clutter.
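Extracting local geometry from a point cloud usually starts by grouping each point with its nearest neighbours. The sketch below shows that grouping with a brute-force k-nearest-neighbour search and a simple max-pooling aggregation. It is a simplified stand-in chosen for illustration: the graph convolution kernels used in the paper's point-cloud branch are not reproduced here, and the function names and sample data are assumptions.

```python
import math

def knn_indices(points, k):
    """For each point, the indices of its k nearest other points (brute force)."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(order[:k])
    return neighbours

def aggregate(features, neighbours):
    """Max-pool each point's feature vector with those of its neighbours."""
    pooled = []
    for i, nbrs in enumerate(neighbours):
        group = [features[i]] + [features[j] for j in nbrs]
        pooled.append([max(col) for col in zip(*group)])
    return pooled

# Hypothetical cloud: three nearby points and one far-away point.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (5.0, 5.0, 5.0)]
features = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [9.0, 9.0]]

neighbours = knn_indices(points, k=2)
pooled = aggregate(features, neighbours)
```

Because the neighbourhood is defined by spatial distance rather than image position, the pooled feature of the first point only mixes in its two close neighbours and ignores the far-away point.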

The figures below, extracted from the paper, show the method overview, the representation and setup, and the experimental results.

Figure: Method overview (paper PDF, page 3; recomposed for web display).

Figure: Representation and setup (paper PDF, page 3; recomposed for web display).

Figure: Experimental evidence (paper PDF, page 4; recomposed for web display).

Figure: Result comparison (paper PDF, page 4; recomposed for web display).

Figure: Additional visual result (paper PDF, page 10; recomposed for web display).

Results and impact

The paper evaluates GCMA6D on LineMOD and YCB-Video and reports improved accuracy over DenseFusion-style baselines. The figures above accompany the paper’s quantitative and qualitative analysis so the workflow is easier to inspect from the homepage.

Source handling

The figures on this page are compressed WebP images generated from 32 candidate image objects extracted from paper.pdf. The local PDF was also optimized from 2,145,667 bytes to 2,141,399 bytes.

Type: Conference paper
Topic: 6D Pose & Robotics
Venue: Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2025)
Year: 2025