Publications / 2026 / Few-Shot 6D Object Pose Estimation via Decoupled Rotation and Translation with Viewpoint Encoding

Few-Shot 6D Object Pose Estimation via Decoupled Rotation and Translation with Viewpoint Encoding

Lei Lu, Peng Cao, Wei Pan, Zhilong Su, Haojun Zhang, Wangxing Zheng, Ge Gao, Peng Li
*Electronics*, 15(3):561
— Summary

This work addresses few-shot monocular 6D object pose estimation by separating rotation and translation into two easier subproblems. Rotation is estimated through a viewpoint encoder and codebook retrieval trained from synthetic CAD-rendered views, while translation is estimated using geometry-aware dense correspondences inspired by GDR-Net.

In the broader publication record, this work sits in Electronics, 15(3):561 and connects to practical problems in 3D sensing, computational geometry, and industrial machine vision.

Algorithm principle

The method decouples 6D pose into rotation estimation and translation estimation. For rotation, a viewpoint encoder maps object crops into a viewpoint embedding. Synthetic object views rendered from ShapeNet-style CAD models are stored in a codebook, allowing the system to retrieve coarse out-of-plane rotation candidates without requiring dense real pose annotations. An in-plane rotation regressor and orientation verification stage then refine the rotation estimate.

For translation, the pipeline predicts geometry-aware dense correspondences and mask-related geometric cues, then regresses the 3D translation so that the object projection agrees with the observed RGB image. This split design is especially useful in few-shot settings because rotation can be learned from synthetic view coverage, while translation is anchored by image geometry from a small amount of real data.

Visual evidence

FS6D overview

FS6D overview. The pipeline detects an object, estimates rotation through viewpoint-encoded retrieval and refinement, and estimates translation from dense geometric correspondences.

Results and impact

The implementation is designed around limited real supervision, using about 600 real images per object together with synthetic view generation. The paper reports evaluation on LINEMOD, LM-O, and YCB-Video with standard 6D pose metrics such as ADD(-S) and AUC. For deployment-oriented robotics and inspection, the main value is that new object categories can be added with much less real pose annotation than fully supervised pose pipelines.

Type
Article Journal
Topic
6D Pose & Robotics
Venue
*Electronics*, 15(3):561
Year
2026
DOI