Advanced Detection of AI-Generated Images Through Vision Transformers

The rapid advancement of Artificial Intelligence (AI) models such as Generative Adversarial Networks (GANs) has been a great success in the field of image synthesis and creation. Artificially generated GAN-based images are widely spread over the Internet along with the development in generation of n...

Bibliographic Details
Main Author: Darshan Lamichhane
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
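Subjects: GAN based images detection; DeepFake images; GAN image classification; detection of AI-generated images; fake AI-generated images detection; vision transformers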
Online Access: https://ieeexplore.ieee.org/document/10815726/
author Darshan Lamichhane
collection DOAJ
description The rapid advancement of Artificial Intelligence (AI) models such as Generative Adversarial Networks (GANs) has driven great progress in image synthesis and creation. As generators produce increasingly natural and photorealistic images, GAN-generated images have spread widely across the Internet. While this could lead to better digital media and content, it also poses a risk to security, legitimacy, and authenticity, and it has raised concern about the misuse of such images to spread misinformation and create deepfakes. Detecting fake or AI-generated images has therefore become an important challenge in maintaining the integrity of digital media. In this research, we explore the application of the Vision Transformer (ViT) model to detecting AI-generated images, using a Kaggle dataset that is a balanced collection of real and AI-generated images. The Vision Transformer treats an image as a sequence of patches and excels at identifying long-range dependencies and complex patterns within images, which makes it well suited to detecting fake images. We fine-tuned the ViT model on the dataset, applying data augmentation and leveraging pretrained weights to boost the model's performance. The findings demonstrate that the ViT model attains high accuracy in differentiating real from AI-generated images, outperforming traditional CNN-based approaches. Beyond performance evaluation, we also conducted an ablation study to examine the impact of the number of attention heads, the patch size, data augmentation, and the depth of layers. The results indicate that the ViT model not only excels in accuracy but also provides a robust framework for detecting AI-generated images across diverse scenarios. Our study shows the strength of transformer-based models in addressing the growing challenge of AI-generated image detection, laying a foundation for future research in this critical area. When the ViT model is fine-tuned with optimal data augmentation techniques, it achieves state-of-the-art performance in AI-generated image detection, underscoring its potential for real-world applications.
format Article
id doaj-art-c732fb5b4d8b4feb87fa129e500c467d
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c732fb5b4d8b4feb87fa129e500c467d | 2025-01-09T00:01:24Z | eng | IEEE | IEEE Access | 2169-3536 | 2025-01-01 | 13 | 3644 | 3652 | 10.1109/ACCESS.2024.3522759 | 10815726 | Advanced Detection of AI-Generated Images Through Vision Transformers | Darshan Lamichhane | https://orcid.org/0009-0007-6448-6205 | Everest English Boarding Secondary School, Butwal, Nepal | https://ieeexplore.ieee.org/document/10815726/ | GAN based images detection | DeepFake images | GAN image classification | detection of AI-generated images | fake AI-generated images detection | vision transformers
title Advanced Detection of AI-Generated Images Through Vision Transformers
topic GAN based images detection
DeepFake images
GAN image classification
detection of AI-generated images
fake AI-generated images detection
vision transformers
url https://ieeexplore.ieee.org/document/10815726/
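
The description above says the Vision Transformer treats an image as a sequence of patches. The following is a minimal sketch of that idea only, not the article's implementation: the function name image_to_patches, the toy 4x4 single-channel image, and the patch size of 2 are all assumptions chosen for illustration, and the record does not state the patch sizes the study used.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image (list of rows of pixel values) into flattened square patches.

    Patches are returned in row-major order (left to right, top to bottom),
    which is the order in which a ViT-style model would read them as a sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size tile into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical toy input: a 4x4 image whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, 2)
print(len(patches))  # 4
print(patches[0])    # [0, 1, 4, 5]
```

In a full model, each flattened patch would then be linearly projected to an embedding and combined with a position embedding before entering the transformer encoder. A smaller patch size yields a longer sequence, which is one reason patch size is a natural parameter for an ablation study.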