Selective, Structural, Subtle: Trilinear Spatial-Awareness for Few-Shot Fine-Grained Visual Recognition

Heng Wu1
Yifan Zhao1
Jia Li1,2

1State Key Laboratory of Virtual Reality Technology and Systems, SCSE, Beihang University, Beijing, China

2Pengcheng Laboratory, Shenzhen, China

ICME 2021

The framework of S3Net

Abstract

Few-shot learning aims to recognize novel categories from a few examples. However, most existing approaches focus on general image classification and fail to handle subtle differences between images. To alleviate this issue, we propose a trilinear spatial-awareness network for few-shot fine-grained visual recognition, called S3Net, which is composed of a spatial selection module, a structural pyramid descriptor, and a subtle difference mining module. Specifically, we first build global relations to strengthen the features with the spatial selection module. The structural pyramid descriptor then constructs a multi-scale representation that enriches contextual information by exploiting different receptive fields in the same feature layer. Furthermore, a similarity loss based on local descriptors and a global classification loss are designed to help the network learn discriminative capability by handling subtle differences in confusing or near-duplicate samples. Extensive experiments on four few-shot fine-grained benchmarks demonstrate that our proposed approach is effective and outperforms state-of-the-art models by large margins.
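
As a rough illustration of the multi-scale idea described in the abstract, the NumPy sketch below pools a single feature map over grids of several sizes and returns one local descriptor per grid cell. It is not the S3Net implementation: the function name `pyramid_descriptor`, the grid sizes, and the choice of average pooling are all assumptions made for this example.

```python
import numpy as np


def pyramid_descriptor(feature_map, grid_sizes=(1, 2, 4)):
    """Average-pool a (C, H, W) feature map over several grid resolutions.

    Hypothetical helper, not taken from the paper. Returns an array of shape
    (sum(g * g for g in grid_sizes), C): one C-dimensional descriptor per cell.
    Assumes H and W are at least as large as the largest grid size.
    """
    channels, height, width = feature_map.shape
    descriptors = []
    for g in grid_sizes:
        # Cell boundaries; linspace also covers sizes not divisible by g.
        row_edges = np.linspace(0, height, g + 1).astype(int)
        col_edges = np.linspace(0, width, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                cell = feature_map[:,
                                   row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                # Coarser grids give each descriptor a larger receptive field.
                descriptors.append(cell.mean(axis=(1, 2)))
    return np.stack(descriptors)


# Toy input: 2 channels on a 4x4 spatial grid.
features = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
descriptors = pyramid_descriptor(features, grid_sizes=(1, 2))
print(descriptors.shape)
```

With grid sizes (1, 2) the sketch yields five descriptors: one global average plus four quadrant averages, all computed from the same feature layer.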

Performance Comparison

BibTeX Citation

@inproceedings{9428223,
  title={Selective, Structural, Subtle: Trilinear Spatial-Awareness for Few-Shot Fine-Grained Visual Recognition}, 
  author={Wu, Heng and Zhao, Yifan and Li, Jia},
  booktitle={2021 IEEE International Conference on Multimedia and Expo (ICME)}, 
  pages={1-6},
  year={2021},
}