A Customizable Face Generation Method Based on Stable Diffusion Model
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10810380/ |
| Summary: | Facial generation technology uses computer algorithms and artificial intelligence techniques to generate realistic facial images. It typically employs deep learning models such as Generative Adversarial Networks (GANs) and diffusion models, learning from large datasets of real facial images to generate new virtual facial images. Text-to-image models combine textual and visual information by generating image content from input text descriptions. Their development offers new insights into cross-modal learning and promotes the interaction and fusion of textual and visual information, thereby supporting the advancement of multimodal intelligent systems. This paper applies text-to-image models to design a customizable facial generation model. The model improves upon the Stable Diffusion model by incorporating LoRA (Low-Rank Adaptation) for style constraints. In addition, the structure of the Variational Autoencoder is modified to improve generation efficiency. Once an initial result is generated, local refinements can be performed without altering the main structure, allowing for localized adjustments. Through pre-generation and post-processing, the model can accurately generate semantically guided face images. |
| ISSN: | 2169-3536 |
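
This record contains metadata only; the paper's code is not included. As a rough, illustrative sketch of the style-constraint technique the abstract names (LoRA adapters attached to a Stable Diffusion backbone), the snippet below uses the Hugging Face diffusers library. The base-model ID, LoRA weight path, and prompt are placeholder assumptions, not artifacts of the paper.

```python
# Sketch: style-constrained face generation with Stable Diffusion + LoRA.
# Illustrative only -- not the paper's implementation. The model ID and
# LoRA weight path are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder base model
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Attach low-rank adapter weights that encode the target face style.
pipe.load_lora_weights("path/to/face-style-lora")  # hypothetical path

image = pipe(
    prompt="portrait photo of a young woman, short black hair, soft lighting",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("face.png")
```

Because LoRA leaves the frozen base weights untouched, the same pipeline can switch styles simply by loading a different adapter.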
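The abstract also states that the Variational Autoencoder is restructured for generation efficiency, but this record gives no details of that modification. One generic way to trade decoder fidelity for speed in diffusers is to swap in a lightweight VAE such as AutoencoderTiny (TAESD); the sketch below, continuing from the `pipe` created above, shows that stand-in technique, not the authors' actual VAE change.

```python
# Sketch: swapping the pipeline's VAE for a lightweight one (TAESD) to
# speed up latent decoding. A generic efficiency trick, NOT the paper's
# VAE modification. Assumes `pipe` from the previous sketch.
import torch
from diffusers import AutoencoderTiny

pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd",          # community lightweight VAE for SD 1.x
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(prompt="portrait photo of a young woman").images[0]
```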
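Finally, the abstract mentions localized refinement of an initial result without altering the main structure. A common way to realize this with diffusion models is masked inpainting: only the masked region is regenerated while the rest of the image is preserved. The sketch below again uses diffusers with placeholder file names and checkpoint; it illustrates the general idea rather than the paper's post-processing step.

```python
# Sketch: localized refinement via masked inpainting. Illustrative only;
# file names and the inpainting checkpoint are placeholder assumptions.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # placeholder checkpoint
    torch_dtype=torch.float16,
)
pipe.to("cuda")

init_image = Image.open("face.png").convert("RGB").resize((512, 512))
# White pixels in the mask mark the region to regenerate (e.g. the eyes);
# black pixels are kept as-is, preserving the main facial structure.
mask = Image.open("eyes_mask.png").convert("RGB").resize((512, 512))

refined = pipe(
    prompt="portrait photo, bright green eyes",
    image=init_image,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
refined.save("face_refined.png")
```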