Published 2024-02-01
“…
Password-based authentication has been widely used as the primary authentication mechanism. However, occasional large-scale password leaks have highlighted the vulnerability of passwords to risks such as guessing or theft. In recent years, research on password analysis using natural language processing techniques has progressed, treating passwords as a special form of natural language. Nevertheless, limited studies have investigated the impact of password text segmentation granularity on the effectiveness of password analysis with large language models. A multi-granularity password-analyzing framework was proposed based on a large language model, which follows the pre-training paradigm and autonomously learns prior knowledge of password distribution from large unlabelled datasets. The framework comprised three modules: the synchronization network, backbone network, and tail network. The synchronization network module implemented char-level, template-level, and chunk-level password segmentation, extracting knowledge on character distribution, structure, word chunk composition, and other password features. The backbone network module constructed a generic password model to learn the rules governing password composition. The tail network module generated candidate passwords for guessing and analyzing target databases. Experimental evaluations were conducted on eight password databases including Tianya and Twitter, analyzing and summarizing the effectiveness of the proposed framework under different language environments and word segmentation granularities. The results indicate that in Chinese user scenarios, the performance of the password-analyzing framework based on char-level and chunk-level segmentation is comparable, and significantly superior to the framework based on template-level segmentation. In English user scenarios, the framework based on chunk-level segmentation demonstrates the best password-analyzing performance.…”
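
The three segmentation granularities named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithms, so everything here is an assumption made for illustration: the function names, the letter/digit/symbol template notation (`L`, `D`, `S` followed by run length), and the greedy longest-match chunker with a tiny hand-made vocabulary are hypothetical stand-ins, not the framework's actual synchronization network.

```python
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse class: L(etter), D(igit) or S(ymbol)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def char_level(password):
    """Char-level: every character is its own token."""
    return list(password)


def template_level(password):
    """Template-level (assumed notation): one token per run of same-class
    characters, written as class letter plus run length, e.g. 'L4'."""
    return [f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)]


def chunk_level(password, vocab):
    """Chunk-level (assumed method): greedy longest match against a
    vocabulary of known chunks, falling back to single characters."""
    max_len = max((len(w) for w in vocab), default=1)
    chunks = []
    i = 0
    while i < len(password):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(password) - i), 0, -1):
            piece = password[i:i + size]
            if size == 1 or piece in vocab:
                chunks.append(piece)
                i += size
                break
    return chunks


# Hypothetical example password and vocabulary, not taken from any dataset.
example = "love2024!"
vocab = {"love", "2024"}
print(char_level(example))
print(template_level(example))
print(chunk_level(example, vocab))
```

In this sketch the same password yields nine tokens at char level, three structural tokens at template level, and three word-like tokens at chunk level, which is the kind of granularity difference the abstract's comparison concerns.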
Article