Tag‐inferring and tag‐guided Transformer for image captioning
Abstract: Image captioning is an important task for understanding images. Recently, many studies have used tags to build alignments between image information and language information. However, existing methods ignore the problem that simple semantic tags have difficulty expressing the detailed semant...
| Main Authors: | Yaohua Yi, Yinkai Liang, Dezhu Kong, Ziwei Tang, Jibing Peng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2024-09-01 |
| Series: | IET Computer Vision |
| Subjects: | |
| Online Access: | https://doi.org/10.1049/cvi2.12280 |
Similar Items
- Vision Transformers for Image Classification: A Comparative Survey
  by: Yaoli Wang, et al.
  Published: (2025-01-01)
- Monitoring poultry social dynamics using colored tags: Avian visual perception, behavioral effects, and artificial intelligence precision
  by: Florencia B. Rossi, et al.
  Published: (2025-01-01)
- Pruning‐guided feature distillation for an efficient transformer‐based pose estimation model
  by: Dong‐hwi Kim, et al.
  Published: (2024-09-01)
- Swin‐fisheye: Object detection for fisheye images
  by: Dawei Zhang, et al.
  Published: (2024-11-01)
- 3D landmark‐based face restoration for recognition using variational autoencoder and triplet loss
  by: Sahil Sharma, et al.
  Published: (2021-01-01)