A Customizable Face Generation Method Based on Stable Diffusion Model

Bibliographic Details
Main Authors: Wenlong Xiang, Shuzhen Xu, Cuicui Lv, Shuo Wang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10810380/
Description
Summary: Facial generation technology uses computer algorithms and artificial intelligence techniques to generate realistic facial images. It typically employs deep learning models such as Generative Adversarial Networks (GANs) and diffusion models, learning from large datasets of real facial images to generate new virtual faces. Text-to-image models combine textual and visual information by generating image content from input text descriptions. The development of text-to-image models provides new insights into cross-modal learning and promotes the interaction and fusion of textual and visual information, thereby supporting the advancement of multimodal intelligent systems. This paper applies text-to-image models to design a customizable face generation model. The model improves upon Stable Diffusion by incorporating LoRA (Low-Rank Adaptation) for style constraints. In addition, we modify the structure of the Variational Autoencoder to improve generation efficiency. Once an initial result is obtained, local refinements can be performed without altering the main structure, allowing for localized adjustments. Through pre-generation and post-processing, the model can accurately generate semantically guided face images.
ISSN: 2169-3536
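
The workflow the abstract describes (style-constrained generation with a LoRA adapter on Stable Diffusion, followed by localized refinement of an initial result) can be sketched with the Hugging Face diffusers library. The sketch below is an illustration under stated assumptions, not the authors' implementation: the base checkpoints, the LoRA weight path, the prompts, and the mask coordinates are placeholders, and the paper's modified Variational Autoencoder is not reproduced here.

import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# 1) Pre-generation: text-to-image with a LoRA style adapter.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base checkpoint
    torch_dtype=dtype,
).to(device)
pipe.load_lora_weights("path/to/face_style_lora")  # hypothetical LoRA adapter

prompt = "portrait photo of a young woman, brown eyes, short black hair"
face = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]

# 2) Post-processing: refine one region (here, the eyes) without
#    altering the rest of the face, via mask-based inpainting.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed inpainting checkpoint
    torch_dtype=dtype,
).to(device)

mask = Image.new("L", face.size, 0)  # black pixels = keep unchanged
# White rectangle marks the region to re-synthesize (placeholder coordinates).
ImageDraw.Draw(mask).rectangle((140, 180, 372, 260), fill=255)

refined = inpaint(
    prompt="portrait photo, bright green eyes",
    image=face,
    mask_image=mask,
).images[0]
refined.save("face_refined.png")

Because the inpainting step re-synthesizes only the masked pixels, the main facial structure of the initial generation is preserved, which matches the localized-adjustment behavior the abstract attributes to the proposed model.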