Using generative adversarial network to improve the accuracy of detecting AI-generated tweets

Abstract This paper provides a novel approach using state-of-the-art generative Artificial Intelligence (AI) models to enhance the accuracy of machine learning methods in detecting AI-generated texts; the underlying generative capabilities are used along with ensemble-based learning methods for the...

Bibliographic Details
Main Author: Yang Hui
Format: Article
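Language: English
Published: Nature Portfolio 2024-11-01
Series: Scientific Reports
Subjects: Artificial Intelligence; AI-generated tweets; Generative adversarial network; Random forest; Text analysis
Online Access: https://doi.org/10.1038/s41598-024-78601-1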
collection DOAJ
description Abstract This paper proposes an approach that uses state-of-the-art generative Artificial Intelligence (AI) models to improve the accuracy of machine learning methods for detecting AI-generated text; the generative capabilities are combined with ensemble-based learning to characterize the attributes of generated text. The proposed methodology has four steps. The first step is preprocessing, which removes irrelevant data through noise removal, text tokenization, stop-word removal, word normalization, and handling of uncommon words. The second step is feature engineering and text representation, in which every preprocessed text is represented by a square matrix that encodes word correlations, co-occurrence, and word weights. The third step is Generative Adversarial Network (GAN)-based feature extraction: a GAN model is used to extract features that are effective for classifying texts by creator type, after which its discriminator is turned into a strong feature extraction model. The fourth step is weighted Random Forest (RF)-based detection, in which the features extracted by the GAN discriminator serve as input to the RF-based detection model. The approach captures the differences between human-written and AI-generated texts and achieves an average accuracy of 99.60%, a 1.5% improvement over the comparative methods.
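The first two steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the stop-word list, the noise rules, the `<unk>` placeholder for uncommon words, the window size, and the function names `preprocess` and `cooccurrence_matrix` are all hypothetical. The sketch covers only co-occurrence counts; the word correlations and word weights that the abstract also mentions are not modelled.

```python
import re

# Hypothetical stop-word list; the abstract does not say which list is used.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for"}


def preprocess(text, known_words=None):
    """Illustrative preprocessing: noise removal, normalization,
    tokenization, stop-word removal, and uncommon-word handling."""
    # Noise removal (assumed rule): drop URLs, @mentions, and #hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    # Normalization and tokenization: lowercase, keep alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Stop-word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Uncommon-word handling (assumed rule): map unknown words to "<unk>".
    if known_words is not None:
        tokens = [t if t in known_words else "<unk>" for t in tokens]
    return tokens


def cooccurrence_matrix(tokens, vocab, window=2):
    """Build a square, symmetric matrix of co-occurrence counts.

    Entry [i][j] counts how often vocab[i] and vocab[j] appear within
    `window` positions of each other in `tokens`.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left in index and right in index and left != right:
                a, b = index[left], index[right]
                matrix[a][b] += 1
                matrix[b][a] += 1
    return matrix


tweet = "The model is great and the model writes tweets! https://example.com @bot"
tokens = preprocess(tweet)
vocab = ["model", "great", "writes", "tweets"]
matrix = cooccurrence_matrix(tokens, vocab)
for word, row in zip(vocab, row_list := matrix):
    print(word, row)
```

In a fuller pipeline, a matrix of this kind would be the per-text input to the GAN discriminator, whose learned features would then be passed to the weighted Random Forest; those later steps are omitted here because the abstract gives no architectural details.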
id doaj-art-51237da9908e49e9a735001d16a219b1
institution Kabale University
issn 2045-2322
spelling Yang Hui, School of Humanities and Law, Zhengzhou Shengda University. Scientific Reports, vol. 14, no. 1, pp. 1-16, 2024-11-01. Record updated 2024-12-01T12:26:59Z.
topic Artificial Intelligence
AI-generated tweets
Generative adversarial network
Random forest
Text analysis