Advanced Detection of AI-Generated Images Through Vision Transformers

Bibliographic Details
Main Author:	Darshan Lamichhane
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	GAN based images detection; DeepFake images; GAN image classification; detection of AI-generated images; fake AI-generated images detection; vision transformers
Online Access:	https://ieeexplore.ieee.org/document/10815726/

collection	DOAJ
description	The rapid advancement of Artificial Intelligence (AI) models such as Generative Adversarial Networks (GANs) has brought great success in image synthesis and creation. As the generation of natural, photorealistic images has improved, GAN-generated images have spread widely across the Internet. While this could lead to better digital media and content, it also poses a risk to security, legitimacy, and authenticity. The advancement of AI-generated images, particularly those produced by GANs, has raised concern about their potential misuse in spreading misinformation and creating deepfakes. Detecting such AI-generated images has become an important challenge in maintaining the integrity of digital media. In this research, we explore the application of the Vision Transformer (ViT) model to detecting AI-generated images, using a Kaggle dataset, a balanced collection of real and AI-generated images. The ViT treats images as sequences of patches and excels at identifying long-range dependencies and complex patterns within images, which makes it well suited to detecting fake images. We fine-tuned the ViT model on the dataset, applying data augmentation and leveraging pretrained weights to boost the model’s performance. The findings demonstrate that the ViT model attains high accuracy in differentiating between real and AI-generated images, outperforming traditional CNN-based approaches. Beyond performance evaluation, we also conducted an ablation study to examine the impact of various components of the ViT model, including the number of attention heads, the patch size, data augmentation, and the depth of layers. The results indicate that the ViT model not only excels in accuracy but also provides a robust framework for detecting AI-generated images across diverse scenarios. Our study shows the strength of transformer-based models in addressing the increasing challenge of AI-generated image detection, laying a foundation for future research in this critical area. The experiment highlights that when the ViT model is fine-tuned with optimal data augmentation techniques, it achieves state-of-the-art performance in AI-generated image detection, emphasizing its potential for real-world applications.
id	doaj-art-c732fb5b4d8b4feb87fa129e500c467d
institution	Kabale University
issn	2169-3536
spelling	doaj-art-c732fb5b4d8b4feb87fa129e500c467d | 2025-01-09T00:01:24Z | eng | IEEE | IEEE Access | 2169-3536 | 2025-01-01 | 13 | 3644 | 3652 | 10.1109/ACCESS.2024.3522759 | 10815726 | Advanced Detection of AI-Generated Images Through Vision Transformers | Darshan Lamichhane (https://orcid.org/0009-0007-6448-6205), Everest English Boarding Secondary School, Butwal, Nepal | https://ieeexplore.ieee.org/document/10815726/ | GAN based images detection; DeepFake images; GAN image classification; detection of AI-generated images; fake AI-generated images detection; vision transformers
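
The description in this record notes that the Vision Transformer treats an image as a sequence of patches. As an illustration only, and not the article's code, the following minimal sketch shows that one step in plain Python. The function name image_to_patches, the toy 4x4 image, and the patch size of 2 are assumptions made for this example; the record does not state the patch sizes or preprocessing used in the study.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (a list of rows) into a row-major
    sequence of flattened patch_size x patch_size patches.

    Hypothetical helper for illustration; not taken from the article.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the image patch by patch: top to bottom, then left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch into a single list of pixel values.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel "image" whose pixel values are 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]

# With patch_size=2 the image becomes a sequence of 4 patches of 4 values each.
toy_patches = image_to_patches(toy_image, patch_size=2)
```

In a full ViT, each flattened patch would then be linearly projected to an embedding and combined with a position embedding before entering the transformer encoder; those steps are omitted here.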

Advanced Detection of AI-Generated Images Through Vision Transformers

Similar Items