SFRF6D: Selective Fewer-Reference Fusion for 6D Pose Estimation

SFRF6D is a model-free 6D pose estimation framework for settings where CAD models are unavailable and only a few RGB-D reference images can be collected. It is designed for large viewpoint changes, occlusion, and sparse reference coverage.
The paper appears in the Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2025) and addresses practical problems in 3D sensing, computational geometry, and industrial machine vision.
Problem setting
Most 6D pose estimation methods rely on object CAD models or dense reference views. SFRF6D reduces this dependency by using a small set of RGB-D reference images with known poses. It reconstructs a lightweight object representation, renders reference views, and uses geometry-guided sparse cross-view attention to select reliable visible correspondences.
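To make the idea of geometry-guided sparse cross-view attention more concrete, the following is a minimal NumPy sketch, not the paper's network. It assumes that query and reference points already carry feature descriptors and that a boolean visibility mask is available for the reference points (for example from a depth test on a rendered view). The function name `sparse_cross_view_attention`, the mask, and the top-k rule are all illustrative assumptions; the source does not specify how sparsity or visibility are implemented.

```python
import numpy as np

def sparse_cross_view_attention(q_feat, r_feat, visible, k=2):
    """Masked top-k attention from query points to reference points.

    q_feat:  (Nq, D) query descriptors.
    r_feat:  (Nr, D) reference descriptors.
    visible: (Nr,) boolean mask, True where a reference point is
             considered visible (hypothetical geometric prior).
    k:       number of reference candidates kept per query (assumption).

    Returns (weights, best): weights is (Nq, Nr) with at most k nonzero
    entries per row, best is the strongest reference index per query.
    """
    visible = np.asarray(visible, dtype=bool)
    n_visible = int(visible.sum())
    if n_visible == 0:
        raise ValueError("no visible reference points to attend to")
    k = max(1, min(k, n_visible))

    # Scaled dot-product scores; occluded references are masked out.
    scores = q_feat @ r_feat.T / np.sqrt(q_feat.shape[1])
    scores = np.where(visible[None, :], scores, -np.inf)

    # Keep only the k highest-scoring references in each row.
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    sparse = np.full_like(scores, -np.inf)
    np.put_along_axis(sparse, top, np.take_along_axis(scores, top, axis=1), axis=1)

    # Softmax over the surviving entries (exp(-inf) contributes zero).
    sparse = sparse - sparse.max(axis=1, keepdims=True)
    weights = np.exp(sparse)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, weights.argmax(axis=1)
```

In this sketch the mask removes occluded reference points before the softmax, so an occluded point cannot win a match even when its descriptor is similar to the query.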
The visual notes below pair the paper’s original figures with a concise reading of the method, experimental setup, and reported results.
Method and visual evidence
The method works on 3D geometric observations derived from the RGB-D references, such as point clouds, poses, correspondences, or segmented regions, and uses the reconstructed object representation to improve robustness under noise, viewpoint change, and limited observations.
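Once 3D-3D correspondences are available, a 6D pose (rotation plus translation) can be recovered in closed form. The sketch below uses the standard Kabsch least-squares alignment as an illustration. It is an assumption made for this page, since the summary does not state which pose solver the paper uses, and the function name `kabsch_pose` is hypothetical.

```python
import numpy as np

def kabsch_pose(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3 and not
    all collinear. Returns a 3x3 rotation matrix and a 3-vector.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Usage: recover a known 90-degree rotation about z plus a translation.
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
R_est, t_est = kabsch_pose(src, src @ R_true.T + t_true)
```

With noisy or partly wrong correspondences, a solver of this kind is usually wrapped in a robust loop such as RANSAC, which is one reason selecting reliable correspondences first matters.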
The figures below, taken from the paper, show the object representation, the method pipeline, and the reported results.

Method overview (paper, page 1).

Representation and setup (paper, page 1).

Experimental evidence (paper, page 1).

Result comparison (paper, page 3).

Additional visual result (paper, page 4).
Results and impact
The evaluation reported in the ACRA 2025 paper is summarized by the figures above, which cover the experimental evidence, a comparison of results, and an additional visual result. They are paired with the paper’s quantitative and qualitative analysis so that the workflow can be inspected directly from the homepage.
Source handling
The five figures on this page were extracted as candidate image objects from paper.pdf and converted to compressed WebP. The local PDF was also optimized from 560,080 bytes to 530,968 bytes, a reduction of 29,112 bytes (about 5.2%).