VeinTr: Robust end-to-end full-hand vein identification with transformer

S Lu, S Fung, Wei Pan, N Wickramasinghe, X Lu
*The Visual Computer*, 40(10):7015–7023
Summary

Hand vein recognition has attracted significant interest as a non-contact, hard-to-spoof biometric modality, but the field has been dominated by methods that extract local vein patterns using hand-crafted filters or convolutional networks with limited receptive fields. Global vein topology (how vein branches interconnect across the entire hand) is a powerful discriminator that these local approaches cannot fully exploit. VeinTr addresses this gap with a transformer-based architecture whose self-attention layers directly model long-range relationships between vein regions anywhere on the hand. The model ingests full-hand near-infrared images and produces an identity embedding in a single forward pass, with no separate vein segmentation or feature extraction step. The learned attention maps align well with anatomically meaningful vein junction points, which lends interpretability to the predictions. On standard full-hand vein databases, the paper (The Visual Computer, 2024) reports a new state of the art in equal error rate and rank-1 identification accuracy, with robustness to variations in hand positioning, skin tone, and illumination.

Problem setting

Vein pattern recognition is a contactless biometric modality with strong security and liveness properties. Existing methods, however, rely on hand-crafted vein feature extractors or CNNs that capture only local patterns, which limits accuracy on full-hand images where global topology carries discriminative information. VeinTr is an end-to-end transformer architecture for full-hand vein identification that uses self-attention to model long-range dependencies between vein branches across the entire hand image. The model is trained on near-infrared full-hand vein images and evaluated against CNN and traditional feature-based baselines.
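To make the idea of long-range self-attention concrete, the following is a minimal sketch and not the VeinTr architecture itself. It assumes a toy 4x4 "image", a hypothetical patch size of 2, a single attention head with identity query/key/value projections, and mean pooling to form the embedding; none of these choices come from the paper. It shows that two patches with similar content attend to each other strongly regardless of how far apart they sit in the image.

```python
import math

def split_into_patches(image, patch):
    """Flatten non-overlapping patch x patch blocks of a 2D list into token vectors."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption for illustration: Q, K and V are the tokens themselves
    (identity projections), so there are no learned weights here.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    outputs = [[sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
               for row in weights]
    return outputs, weights

def embed(image, patch=2):
    """Patchify, attend, mean-pool, and L2-normalise into one embedding vector."""
    tokens = split_into_patches(image, patch)
    attended, weights = self_attention(tokens)
    n = len(attended)
    pooled = [sum(t[j] for t in attended) / n for j in range(len(attended[0]))]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled], weights

# Toy image: the top-right and bottom-left patches have identical content.
toy_image = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
]
embedding, attn = embed(toy_image, patch=2)
```

In this toy input, the top-right patch (token 1) gives the bottom-left patch (token 2) the same attention weight it gives itself, because the weight depends on content similarity and not on spatial distance. A real model would add learned projections, positional information, multiple heads, and stacked layers.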

The paper appears in The Visual Computer, 40(10):7015–7023. The notes below pair the paper’s original figures with a concise reading of the method, experimental setup, and reported results.

Method and visual evidence

The method takes a full-hand near-infrared image, uses self-attention to relate vein regions across the whole hand, and outputs an identity embedding for identification, without a separate vein segmentation or feature extraction step.

The extracted figures below show the data representation, model pipeline, and representative experimental outputs.

Figure: Method overview (from page 3 of the paper PDF).

Figure: Representation and setup (from page 3 of the paper PDF).

Figure: Experimental evidence (from page 7 of the paper PDF).

Figure: Result comparison (from page 7 of the paper PDF).

Figure: Additional visual result (from page 9 of the paper PDF).

Results and impact

The evaluation reported in The Visual Computer, 40(10):7015–7023 compares VeinTr against CNN and traditional feature-based baselines on standard full-hand vein databases, using equal error rate and rank-1 identification accuracy as the headline metrics. The figures above accompany that analysis and show the experimental evidence and result comparisons from the paper.
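For readers unfamiliar with the two metrics named in the summary, the sketch below shows one common way to compute equal error rate and rank-1 identification accuracy from similarity scores. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the threshold sweep over observed scores, and all scores and labels are hypothetical toy values.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over all observed similarity scores.

    A pair is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false accept and false reject rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine pairs rejected
        far = sum(s >= t for s in impostor) / len(impostor)   # impostor pairs accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

def rank1_accuracy(similarity, probe_labels, gallery_labels):
    """Fraction of probes whose most similar gallery entry has the same identity."""
    hits = 0
    for row, label in zip(similarity, probe_labels):
        best = max(range(len(row)), key=row.__getitem__)
        hits += gallery_labels[best] == label
    return hits / len(probe_labels)

# Hypothetical toy scores (higher means more similar).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
eer = equal_error_rate(genuine_scores, impostor_scores)

# Rows are probes, columns are gallery entries.
similarity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.2, 0.6, 0.5],
]
acc = rank1_accuracy(similarity, ["a", "b", "c"], ["a", "b", "c"])
```

A lower equal error rate and a higher rank-1 accuracy both indicate better performance. Production evaluations usually interpolate the ROC curve to locate the crossing point more precisely than this discrete sweep.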

Type
Article Journal
Topic
Biomedical & Biometrics
Venue
*The Visual Computer*, 40(10):7015–7023
Year
2024
DOI