ENGDM: Enhanced Non-Isotropic Gaussian Diffusion Model for Progressive Image Editing

Bibliographic Details
Main Authors: Xi Yu, Xiang Gu, Xin Hu, Jian Sun
Format: Article
Language:English
Published: MDPI AG 2025-05-01
Series:Sensors
Online Access:https://www.mdpi.com/1424-8220/25/10/2970
Description
Summary:Diffusion models have made remarkable progress in image generation, leading to advancements in the field of image editing. However, balancing editability with faithfulness remains a significant challenge. Motivated by the fact that more novel content is generated when larger-variance noise is applied to the image, in this paper we propose an Enhanced Non-isotropic Gaussian Diffusion Model (ENGDM) for progressive image editing, which applies independent Gaussian noise with a per-pixel variance determined by each pixel's editing needs. To enable efficient inference without retraining, ENGDM is rectified into an isotropic Gaussian diffusion model (IGDM) by assigning different total diffusion times to different pixels. Furthermore, we introduce reinforced text embeddings, optimized with a novel editing reinforcement loss in the latent space for enhanced editability, and optimized noise variances, obtained with a structural consistency loss that dynamically adjusts the denoising time steps of each pixel for better faithfulness. Experimental results on multiple datasets demonstrate that ENGDM achieves state-of-the-art performance in image-editing tasks, effectively balancing faithfulness to the source image and alignment with the desired editing target.
ISSN:1424-8220
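
The rectification idea summarized above, where pixels needing heavier edits receive larger-variance noise, or equivalently a larger total diffusion time, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the linear beta schedule, the function names, and the linear mapping from an editing-strength map to per-pixel timesteps are all illustrative assumptions.

```python
# Illustrative sketch of per-pixel (non-isotropic) forward diffusion.
# Assumption: a standard DDPM linear beta schedule; the mapping from an
# editing-strength map to per-pixel diffusion times is a simple linear choice,
# not the optimized variances described in the paper.
import torch

def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 2e-2) -> torch.Tensor:
    """Cumulative signal-retention coefficients alpha_bar_t for a linear beta schedule."""
    betas = torch.linspace(beta_start, beta_end, T)
    return torch.cumprod(1.0 - betas, dim=0)  # shape (T,)

def per_pixel_start_steps(edit_map: torch.Tensor, T: int = 1000) -> torch.Tensor:
    """Map an editing-strength map in [0, 1] to a per-pixel total diffusion time.

    Pixels with edit_map near 1 are diffused for almost all T steps and thus get
    the largest-variance noise; pixels with edit_map near 0 are barely perturbed.
    """
    return (edit_map.clamp(0.0, 1.0) * (T - 1)).round().long()  # shape (H, W)

def nonisotropic_forward(x0: torch.Tensor, edit_map: torch.Tensor, T: int = 1000) -> torch.Tensor:
    """Noise an image x0 of shape (C, H, W) with a per-pixel variance set by its timestep."""
    alpha_bar = linear_alpha_bar(T)                # (T,)
    t_map = per_pixel_start_steps(edit_map, T)     # (H, W)
    a = alpha_bar[t_map].unsqueeze(0)              # (1, H, W), broadcast over channels
    noise = torch.randn_like(x0)
    # Standard DDPM forward marginal, applied with a spatially varying timestep:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

if __name__ == "__main__":
    x0 = torch.rand(3, 64, 64)            # toy "source image"
    edit_map = torch.zeros(64, 64)
    edit_map[16:48, 16:48] = 1.0          # region intended for heavy editing
    x_noised = nonisotropic_forward(x0, edit_map)
    print(x_noised.shape)                 # torch.Size([3, 64, 64])
```

Under this toy mapping, denoising from the spatially varying starting step would regenerate mostly novel content inside the heavily noised region while leaving lightly noised pixels close to the source, which is the editability/faithfulness trade-off the abstract describes.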