Image synthesis method based on multiple text description


Bibliographic Details
Main Authors: NIE Kaiqin, NI Zhengwei
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2024-05-01
Series:Dianxin kexue
Subjects: text-to-image, generative adversarial network, computer vision, semantic consistency, self-attention
Online Access:http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2024142/
author NIE Kaiqin
NI Zhengwei
author_facet NIE Kaiqin
NI Zhengwei
author_sort NIE Kaiqin
collection DOAJ
description To address the low quality and structural errors of images generated from a single text description, a multi-stage generative adversarial network model was studied, and it was proposed to interpolate different text sequences to enrich the given text descriptions, extracting features from multiple text descriptions to impart greater detail to the generated images. To enhance the correlation between the generated images and the corresponding text, a multi-caption deep attentional multi-modal similarity model that captured attention features was introduced. These features were then integrated with the visual features of the preceding layer and served as input to the subsequent layer, improving the realism of the generated images and strengthening their semantic consistency with the text descriptions. In addition, a self-attention mechanism was incorporated so that the model could coordinate the details at each position, yielding images more aligned with real-world scenes. The optimized model was verified on the CUB and MS-COCO datasets, generating images with intact structures, stronger semantic consistency, and richer visual diversity.
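The self-attention mechanism mentioned in the abstract is not detailed in this record; the snippet below is a minimal sketch, assuming a SAGAN-style self-attention block applied to convolutional feature maps in a GAN generator. The module name SelfAttention2d, the channel-reduction factor, and the usage example are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed SAGAN-style self-attention, not the authors' code):
# attends over all spatial positions of a feature map so that details at one
# position can be coordinated with details elsewhere in the image.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable scale so the block starts out as an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).view(b, -1, n).permute(0, 2, 1)   # (b, n, c/8)
        k = self.key(x).view(b, -1, n)                       # (b, c/8, n)
        v = self.value(x).view(b, -1, n)                     # (b, c, n)
        attn = F.softmax(torch.bmm(q, k), dim=-1)            # (b, n, n)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x


# Usage (hypothetical): refine intermediate generator features before the
# next upsampling stage of a multi-stage GAN.
feats = torch.randn(2, 64, 32, 32)
refined = SelfAttention2d(64)(feats)
print(refined.shape)  # torch.Size([2, 64, 32, 32])
```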
format Article
id doaj-art-630db80294fd4347a8ab879be8a17009
institution Kabale University
issn 1000-0801
language zho
publishDate 2024-05-01
publisher Beijing Xintong Media Co., Ltd
record_format Article
series Dianxin kexue
spellingShingle NIE Kaiqin
NI Zhengwei
Image synthesis method based on multiple text description
Dianxin kexue
text-to-image
generative adversarial network
computer vision
semantic consistency
self-attention
title Image synthesis method based on multiple text description
title_full Image synthesis method based on multiple text description
title_fullStr Image synthesis method based on multiple text description
title_full_unstemmed Image synthesis method based on multiple text description
title_short Image synthesis method based on multiple text description
title_sort image synthesis method based on multiple text description
topic text-to-image
generative adversarial network
computer vision
semantic consistency
self-attention
url http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2024142/
work_keys_str_mv AT niekaiqin imagesynthesismethodbasedonmultipletextdescription
AT nizhengwei imagesynthesismethodbasedonmultipletextdescription