A Generation Algorithm for “Text to Image” Based on Multi-Channel Attention
Research on text-to-image has gained significant attention. However, existing methods primarily rely on upsampling convolution operations for feature extraction during the initial image generation stage. This approach has inherent limitations, often leading to the loss of global information and the inability to capture long-range semantic dependencies. To address these issues, this study proposes a generation algorithm for “text to image” based on multi-channel attention (TTI-MCA). The method integrates a self-supervised module into the initial image generation phase, leveraging attention mechanisms to enable autonomous mapping learning between image features. This facilitates a deep integration of contextual understanding and self-attention learning. Additionally, a feature fusion enhancement module is introduced, which combines low-resolution features from the previous stage with high-resolution features from the current stage. This allows the generation network to fully utilize the rich semantic information of low-level features and the high-resolution details of high-level features, ultimately producing high-quality, realistic images. Experimental results show that TTI-MCA outperforms the baseline algorithm in both Inception Score (IS) and Fréchet Inception Distance (FID), achieving superior performance on the CUB and COCO datasets. This research provides a novel approach to generating high-quality images from text.
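The abstract's feature fusion enhancement module combines low-resolution features from the previous generation stage with high-resolution features from the current one. The record does not give the paper's actual implementation; a minimal NumPy sketch of the general idea (nearest-neighbour upsampling followed by channel-wise concatenation; the function name and shapes are hypothetical) might look like:

```python
import numpy as np

def fuse_features(low_res, high_res):
    """Illustrative fusion: upsample the low-resolution feature map to the
    high-resolution spatial size (nearest neighbour), then concatenate the
    two maps along the channel axis. Shapes are (channels, height, width)."""
    _, h_lo, w_lo = low_res.shape
    _, h_hi, w_hi = high_res.shape
    # Upsample by integer factors (assumes high-res dims are multiples of low-res dims).
    up = low_res.repeat(h_hi // h_lo, axis=1).repeat(w_hi // w_lo, axis=2)
    # Channel-wise concatenation; in a real generator a convolution would follow
    # to mix the semantic (low-level) and detail (high-level) channels.
    return np.concatenate([up, high_res], axis=0)

low = np.ones((64, 8, 8))     # previous-stage (low-resolution) features
high = np.ones((32, 16, 16))  # current-stage (high-resolution) features
fused = fuse_features(low, high)
print(fused.shape)  # (96, 16, 16)
```

In practice such fusion lets later layers draw on both the global semantics carried by the earlier stage and the fine detail of the current stage, which is the motivation the abstract gives.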
| Main Authors: | Yang Yang, Ainuddin Wahid Bin Abdul Wahab, Norisma Binti Idris, Dingguo Yu, Chang Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | AI-generated images; image feature fusion; long-range semantic dependencies; text-to-image |
| Online Access: | https://ieeexplore.ieee.org/document/11119635/ |
| author | Yang Yang; Ainuddin Wahid Bin Abdul Wahab; Norisma Binti Idris; Dingguo Yu; Chang Liu |
|---|---|
| collection | DOAJ |
| institution | Kabale University |
| format | Article |
| id | doaj-art-c6e98687d8eb4e53a834b1f8a9f28a45 |
| issn | 2169-3536 |
| doi | 10.1109/ACCESS.2025.3596894 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| series | IEEE Access |
| volume / pages | 13, 144878–144886 |
| affiliations | Yang Yang, Ainuddin Wahid Bin Abdul Wahab, Norisma Binti Idris: Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia; Dingguo Yu, Chang Liu: College of Media Engineering, Communication University of Zhejiang, Hangzhou, China |
| orcid | Ainuddin Wahid Bin Abdul Wahab: 0000-0003-1062-0329; Norisma Binti Idris: 0000-0002-8006-7496; Dingguo Yu: 0000-0001-6701-6451; Chang Liu: 0000-0003-0846-956X |
| description | Research on text-to-image has gained significant attention. However, existing methods primarily rely on upsampling convolution operations for feature extraction during the initial image generation stage. This approach has inherent limitations, often leading to the loss of global information and the inability to capture long-range semantic dependencies. To address these issues, this study proposes a generation algorithm for “text to image” based on multi-channel attention (TTI-MCA). The method integrates a self-supervised module into the initial image generation phase, leveraging attention mechanisms to enable autonomous mapping learning between image features. This facilitates a deep integration of contextual understanding and self-attention learning. Additionally, a feature fusion enhancement module is introduced, which combines low-resolution features from the previous stage with high-resolution features from the current stage. This allows the generation network to fully utilize the rich semantic information of low-level features and the high-resolution details of high-level features, ultimately producing high-quality, realistic images. Experimental results show that TTI-MCA outperforms the baseline algorithm in both Inception Score (IS) and Fréchet Inception Distance (FID), achieving superior performance on the CUB and COCO datasets. This research provides a novel approach to generating high-quality images from text. |
| title | A Generation Algorithm for “Text to Image” Based on Multi-Channel Attention |
| topic | AI-generated images; image feature fusion; long-range semantic dependencies; text-to-image |
| url | https://ieeexplore.ieee.org/document/11119635/ |