Image synthesis method based on multiple text descriptions
Format: Article
Language: Chinese (zho)
Published: Beijing Xintong Media Co., Ltd, 2024-05-01
Series: Dianxin kexue (Telecommunications Science)
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2024142/
Summary: To address the low quality and structural errors in images generated from a single text description, a multi-stage generative adversarial network model was studied, and interpolating between different text sequences was proposed to enrich the given text descriptions by extracting features from multiple captions and imparting greater detail to the generated images. To strengthen the correlation between the generated images and the corresponding text, a multi-caption deep attentional multi-modal similarity model that captures attention features was introduced. These features were integrated with the visual features from the preceding layer and served as input to the subsequent layer, improving the realism of the generated images and their semantic consistency with the text descriptions. In addition, a self-attention mechanism was incorporated so that the model could effectively coordinate details at each position, yielding images more aligned with real-world scenes. The optimized model was verified on the CUB and MS-COCO datasets, demonstrating the generation of images with intact structures, stronger semantic consistency, and richer visual diversity.
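As a rough illustration of the multi-caption interpolation idea described in the summary (this is not the authors' code; the function name, embedding shapes, and uniform weighting are assumptions), blending sentence embeddings from several captions into a single conditioning vector can be sketched as a convex combination:

```python
import numpy as np

def interpolate_captions(embeddings, weights=None):
    """Blend sentence embeddings from multiple captions into one
    conditioning vector via a convex combination -- a simplified
    stand-in for interpolating between text sequences.
    `embeddings` is assumed to have shape (n_captions, dim)."""
    emb = np.asarray(embeddings, dtype=np.float64)
    if weights is None:
        # Default: weight all captions equally.
        weights = np.full(emb.shape[0], 1.0 / emb.shape[0])
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    return weights @ emb               # shape (dim,)

# Two toy caption embeddings, equal weighting.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
mixed = interpolate_captions([e1, e2])  # -> [0.5, 0.5, 0.0]
```

In the paper's setting the blended vector would condition the first generator stage, so the generated image can draw detail from all of the captions rather than a single one.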
ISSN: 1000-0801