Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation
Main Authors: | Михаил Тихомиров, Даниил Чернышев |
---|---|
Format: | Article |
Language: | English |
Published: | National Research University Higher School of Economics, 2024-12-01 |
Series: | Journal of Language and Education |
Subjects: | large language model; language adaptation; natural language generation; llama |
Online Access: | https://jle.hse.ru/article/view/22224 |
_version_ | 1841556008958164992 |
---|---|
author | Михаил Тихомиров Даниил Чернышев |
author_facet | Михаил Тихомиров Даниил Чернышев |
author_sort | Михаил Тихомиров |
collection | DOAJ |
description |
Background: Recent advancements in large language model (LLM) technologies have introduced powerful open-source instruction-tuned LLMs that match the text generation quality of leading models such as GPT-4. Although these models accelerate LLM adoption in sensitive-information environments, the lack of disclosed training data hinders replication and keeps such achievements exclusive to specific models.
Purpose: Given the multilingual nature of the latest generation of open-source LLMs, the benefits of training language-specific LLMs diminish, leaving computational efficiency as the sole guaranteed advantage of this computationally expensive procedure. This work aims to address the language-adaptation limitations posed by restricted access to high-quality instruction-tuning data and to offer a more cost-effective pipeline.
Method: To tackle language-adaptation challenges, we introduce Learned Embedding Propagation (LEP), a novel method with lower training data requirements and minimal disruption of existing LLM knowledge. LEP employs an innovative embedding propagation technique that bypasses the need for instruction-tuning and directly integrates new language knowledge into any instruction-tuned LLM variant (a minimal illustrative sketch appears after the record below). Additionally, we developed Darumeru, a new benchmark for evaluating text generation robustness during training, specifically tailored for Russian adaptation.
Results: We applied the LEP method to adapt LLaMa-3-8B and Mistral-7B for Russian, testing four different vocabulary adaptation scenarios. Evaluation demonstrates that LEP achieves competitive performance levels, comparable to OpenChat 3.5 and LLaMa-3-8B-Instruct. Further improvements were observed through self-calibration and additional instruction-tuning steps, enhancing task-solving capabilities beyond the original models.
Conclusion: LEP offers a viable and efficient alternative to traditional language-specific instruction-tuning, significantly reducing the costs associated with language adaptation while maintaining or surpassing the performance benchmarks set by contemporary LLMs.
|
format | Article |
id | doaj-art-d24352c034684912ada5218cd1146c81 |
institution | Kabale University |
issn | 2411-7390 |
language | English |
publishDate | 2024-12-01 |
publisher | National Research University Higher School of Economics |
record_format | Article |
series | Journal of Language and Education |
spelling | doaj-art-d24352c034684912ada5218cd1146c81 (indexed 2025-01-07T16:17:17Z). Journal of Language and Education (ISSN 2411-7390), National Research University Higher School of Economics, 2024-12-01, vol. 10, no. 4, DOI 10.17323/jle.2024.22224. Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation. Михаил Тихомиров, Даниил Чернышев (Lomonosov Moscow State University, Moscow, Russia). Keywords: large language model; language adaptation; natural language generation; llama. https://jle.hse.ru/article/view/22224 |
spellingShingle | Михаил Тихомиров; Даниил Чернышев; Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation; Journal of Language and Education; large language model; language adaptation; natural language generation; llama |
title | Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation |
title_full | Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation |
title_fullStr | Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation |
title_full_unstemmed | Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation |
title_short | Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation |
title_sort | facilitating large language model russian adaptation with learned embedding propagation |
topic | large language model; language adaptation; natural language generation; llama |
url | https://jle.hse.ru/article/view/22224 |
work_keys_str_mv | AT mihailtihomirov facilitatinglargelanguagemodelrussianadaptationwithlearnedembeddingpropagation AT daniilčernyšev facilitatinglargelanguagemodelrussianadaptationwithlearnedembeddingpropagation |
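To make the embedding-propagation idea from the abstract concrete, here is a minimal sketch. It is not the authors' released code: both model identifiers, the plain weight-copy rule, and the output path are assumptions for illustration only (the paper's LEP method may use a learned transformation plus a self-calibration step rather than a direct copy).

```python
# Minimal sketch of embedding propagation (NOT the authors' code).
# Hypothetical assumptions: both model names, the direct weight-copy
# rule, and the output directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ADAPTED = "my-org/llama-3-8b-base-ru"        # hypothetical: base model after Russian vocabulary/embedding adaptation
INSTRUCT = "meta-llama/Meta-Llama-3-8B-Instruct"  # original instruction-tuned sibling

donor = AutoModelForCausalLM.from_pretrained(BASE_ADAPTED)
target = AutoModelForCausalLM.from_pretrained(INSTRUCT)

# Match vocabulary size first: the adapted model may carry new Russian tokens.
new_vocab = donor.get_input_embeddings().num_embeddings
target.resize_token_embeddings(new_vocab)

with torch.no_grad():
    # Propagate the learned input embeddings and the LM head from the
    # adapted base model into the instruction-tuned model, leaving all
    # transformer blocks (and the instruction-following behaviour) untouched.
    target.get_input_embeddings().weight.copy_(donor.get_input_embeddings().weight)
    target.get_output_embeddings().weight.copy_(donor.get_output_embeddings().weight)

out_dir = "llama-3-8b-instruct-ru-lep"  # hypothetical
target.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(BASE_ADAPTED).save_pretrained(out_dir)
```

The point of the sketch is the division of labour: language-specific knowledge enters only through the embedding and unembedding matrices, so the expensive instruction-tuning stage is skipped entirely for the new language.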