CIPHER: Cybersecurity Intelligent Penetration-Testing Helper for Ethical Researcher

Penetration testing, a critical component of cybersecurity, typically requires extensive time and effort to find vulnerabilities. Beginners in this field often benefit from collaborative approaches with the community or experts. To address this, we develop Cybersecurity Intelligent Penetration-testi...

Bibliographic Details
Main Authors: Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, Howon Kim
Format: Article
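Language: English
Published: MDPI AG 2024-10-01
Series: Sensors
Subjects: penetration testing; large language model; pentesting LLM; AI penetration testing assistant; domain specific LLM; LLM evaluation
Online Access: https://www.mdpi.com/1424-8220/24/21/6878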
author Derry Pratama
Naufal Suryanto
Andro Aprila Adiputra
Thi-Thu-Huong Le
Ahmada Yusril Kadiptya
Muhammad Iqbal
Howon Kim
collection DOAJ
description Penetration testing, a critical component of cybersecurity, typically requires extensive time and effort to find vulnerabilities. Beginners in this field often benefit from collaborating with the community or with experts. To address this, we developed the Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers (CIPHER), a large language model (LLM) trained specifically to assist with penetration testing tasks as a chatbot. Unlike software development, penetration testing involves domain-specific knowledge that is not widely documented or easily accessible, which necessitates a specialized training approach for AI language models. CIPHER was trained on over 300 high-quality write-ups of vulnerable machines, hacking techniques, and documentation of open-source penetration testing tools, augmented into an expert response structure. Additionally, we introduced Findings, Action, Reasoning, and Results (FARR) Flow augmentation, a novel method that augments penetration testing write-ups to establish a fully automated pentesting simulation benchmark tailored for LLMs. This approach fills a significant gap in traditional cybersecurity Q&A benchmarks and provides a realistic and rigorous standard for evaluating an LLM's technical knowledge, reasoning capabilities, and practical utility in dynamic penetration testing scenarios. In our assessments, CIPHER achieved the best overall performance in providing accurate suggestion responses compared with other open-source penetration testing models of similar size and even larger state-of-the-art models such as Llama 3 70B and Qwen1.5 72B Chat, particularly on insane-difficulty machine setups. This demonstrates that the current capabilities of general-purpose LLMs are insufficient for effectively guiding users through the penetration testing process. We also discuss the potential for improvement through scaling and the development of better benchmarks using FARR Flow augmentation results.
format Article
id doaj-art-1cdfe13c0a4b4c7799ca564c992d86c5
institution Kabale University
issn 1424-8220
language English
publishDate 2024-10-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-1cdfe13c0a4b4c7799ca564c992d86c5 2024-11-08T14:41:15Z eng MDPI AG Sensors 1424-8220 2024-10-01 vol. 24, no. 21, art. 6878 doi:10.3390/s24216878
CIPHER: Cybersecurity Intelligent Penetration-Testing Helper for Ethical Researcher
Derry Pratama (0): School of Computer Science and Engineering, Pusan National University, Busan 46241, Republic of Korea
Naufal Suryanto (1): IoT Research Center, Pusan National University, Busan 46241, Republic of Korea
Andro Aprila Adiputra (2): School of Computer Science and Engineering, Pusan National University, Busan 46241, Republic of Korea
Thi-Thu-Huong Le (3): Blockchain Platform Research Center, Pusan National University, Busan 46241, Republic of Korea
Ahmada Yusril Kadiptya (4): School of Computer Science and Engineering, Pusan National University, Busan 46241, Republic of Korea
Muhammad Iqbal (5): School of Computer Science and Engineering, Pusan National University, Busan 46241, Republic of Korea
Howon Kim (6): School of Computer Science and Engineering, Pusan National University, Busan 46241, Republic of Korea
https://www.mdpi.com/1424-8220/24/21/6878
title CIPHER: Cybersecurity Intelligent Penetration-Testing Helper for Ethical Researcher
topic penetration testing
large language model
pentesting LLM
AI penetration testing assistant
domain specific LLM
LLM evaluation
url https://www.mdpi.com/1424-8220/24/21/6878