Quantization-Based Jailbreaking Vulnerability Analysis: A Study on Performance and Safety of the Llama3-8B-Instruct Model
| Format: | Article |
|---|---|
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/11105403/ |
| Summary: | This study systematically investigates how quantization, a key technique for the efficient deployment of large language models (LLMs), affects model safety. We focus on jailbreaking vulnerabilities that emerge when models are quantized, particularly in multilingual and tense-shifted scenarios. Using Llama3-8B-Instruct as a representative model, we evaluate 23 quantization levels across two languages and three tenses. Our experimental results reveal a critical trade-off: lower-bit quantization degrades the model’s core reasoning abilities, which directly correlates with a higher Attack Success Rate (ASR). For the model tested, 4-bit quantization emerges as a practical “sweet spot,” maintaining near-baseline performance while significantly reducing computational costs. Even at this level, however, substantial vulnerabilities persist: Korean prompts exhibit attack success rates 25.5 percentage points higher than English prompts, and past-tense transformations increase vulnerability by 39.3 percentage points. These findings show that safety mechanisms are often compromised by quantization-induced performance degradation and are biased toward English, present-tense prompts. Although this study has clear limitations, it provides the first quantitative analysis of these combined vulnerabilities, underscoring the need for more comprehensive safety evaluations when deploying quantized LLMs. |
| ISSN: | 2169-3536 |
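
To make the summary’s two central quantities concrete, below is a minimal, hypothetical sketch, not taken from the paper: it shows how a model such as Llama3-8B-Instruct can be loaded with 4-bit weight quantization via Hugging Face `transformers` and `bitsandbytes`, and how an Attack Success Rate can be computed as the fraction of jailbreak prompts judged to elicit harmful completions. The NF4 quantization scheme, the boolean judge list, and the percentage scaling are assumptions; the paper’s exact quantization methods and evaluation pipeline are not specified here.

```python
# Hypothetical sketch (assumptions labeled): 4-bit quantized loading and a
# simple Attack Success Rate (ASR) metric. Requires `transformers`,
# `bitsandbytes`, and `torch`; the paper's actual setup may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit quantization config: the level the abstract identifies as a practical
# "sweet spot" (near-baseline performance at reduced computational cost).
# NF4 with bf16 compute is an assumed, common choice, not the paper's stated one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

def attack_success_rate(judgments: list[bool]) -> float:
    """ASR in percent: share of jailbreak prompts judged to have succeeded.

    `judgments[i]` is True when an external judge labels the model's response
    to prompt i as harmful (the judging step itself is out of scope here).
    """
    return 100.0 * sum(judgments) / len(judgments)
```

Because ASR is a percentage, the cross-condition gaps the abstract reports (e.g., Korean vs. English prompts, past- vs. present-tense phrasings) are simple differences of two such values, expressed in percentage points.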