Quantization-Based Jailbreaking Vulnerability Analysis: A Study on Performance and Safety of the Llama3-8B-Instruct Model
| Format: | Article |
|---|---|
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/11105403/ |
| Summary: | This study systematically investigates how quantization, a key technique for the efficient deployment of large language models (LLMs), affects model safety. We focus on jailbreaking vulnerabilities that emerge when models are quantized, particularly in multilingual and tense-shifted scenarios. Using Llama3-8B-Instruct as a representative model, we evaluate 23 quantization levels across two languages and three tenses. Our experimental results reveal a critical trade-off: lower-bit quantization degrades the model’s core reasoning abilities, which directly correlates with a higher Attack Success Rate (ASR). For the model tested, 4-bit quantization emerges as a practical “sweet spot,” maintaining near-baseline performance while significantly reducing computational costs. Even at this level, however, substantial vulnerabilities persist: Korean prompts exhibit attack success rates 25.5 percentage points higher than English prompts, and past-tense transformations increase vulnerability by 39.3 percentage points. These findings show that safety mechanisms are often compromised by quantization-induced performance degradation and are biased toward English, present-tense prompts. Although this study has clear limitations, it provides the first quantitative analysis of these combined vulnerabilities, underscoring the need for more comprehensive safety evaluations when deploying quantized LLMs. |
| ISSN: | 2169-3536 |
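
To make the summary’s two central quantities concrete, below is a minimal, hypothetical sketch, not taken from the paper: it shows how a model such as Llama3-8B-Instruct can be loaded with 4-bit weight quantization via Hugging Face `transformers` and `bitsandbytes`, and how an Attack Success Rate can be computed as the fraction of jailbreak prompts judged to elicit harmful completions. The NF4 quantization scheme, the boolean judge list, and the percentage scaling are assumptions; the paper’s exact quantization methods and evaluation pipeline are not specified here.

```python
# Hypothetical sketch (assumptions labeled): 4-bit quantized loading and a
# simple Attack Success Rate (ASR) metric. Requires `transformers`,
# `bitsandbytes`, and `torch`; the paper's actual setup may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit quantization config: the level the abstract identifies as a practical
# "sweet spot" (near-baseline performance at reduced computational cost).
# NF4 with bf16 compute is an assumed, common choice, not the paper's stated one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

def attack_success_rate(judgments: list[bool]) -> float:
    """ASR in percent: share of jailbreak prompts judged to have succeeded.

    `judgments[i]` is True when an external judge labels the model's response
    to prompt i as harmful (the judging step itself is out of scope here).
    """
    return 100.0 * sum(judgments) / len(judgments)
```

Because ASR is a percentage, the cross-condition gaps the abstract reports (e.g., Korean vs. English prompts, past- vs. present-tense phrasings) are simple differences of two such values, expressed in percentage points.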