Research on self-training neural machine translation based on monolingual priority sampling

To enhance the performance of neural machine translation (NMT) and mitigate the detrimental impact of highly uncertain monolingual data during self-training, a self-training NMT model based on priority sampling was proposed. First, syntactic dependency trees were constructed and the importance of monolingual tokens was assessed through grammatical dependency analysis. Next, a monolingual lexicon was built, and priority was defined in terms of token importance and uncertainty. Finally, monolingual priorities were computed and sentences were sampled according to these priorities, yielding a synthetic parallel dataset for training the student NMT model. Experimental results on a large-scale subset of the WMT English-to-German dataset demonstrate that the proposed model effectively improves translation quality and mitigates the impact of high uncertainty on the model.
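The sampling pipeline the abstract describes can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the importance proxy (counting dependents in the dependency tree), the uncertainty measure (mean negative log-probability of tokens under the teacher model), the mixing weight `alpha`, and all function names are assumptions introduced here for illustration.

```python
import math
import random


def token_importance(tokens, head_indices):
    """Toy importance score: a token with more dependents in the
    dependency tree is treated as more important (a hypothetical
    proxy for the paper's grammar-dependency analysis).
    head_indices[i] is the index of token i's head; -1 marks the root."""
    counts = [0] * len(tokens)
    for head in head_indices:
        if head >= 0:
            counts[head] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def sentence_uncertainty(token_probs):
    """Mean negative log-probability of the teacher model's token
    predictions: higher means the model is less certain."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def priority(importance_scores, token_probs, alpha=0.5):
    """Hypothetical priority: favor sentences that are important yet
    confidently translated (high importance, low uncertainty)."""
    return alpha * sum(importance_scores) - (1 - alpha) * sentence_uncertainty(token_probs)


def priority_sample(corpus, k, seed=0):
    """Draw k monolingual sentences, weighted by a softmax over priorities;
    the selected sentences would then be forward-translated by the teacher
    to build the synthetic parallel set for the student."""
    rng = random.Random(seed)
    scores = [priority(s["importance"], s["probs"]) for s in corpus]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scores]
    return rng.choices(corpus, weights=weights, k=k)
```

Under this sketch, a sentence whose tokens the teacher predicts with high probability receives a higher priority than an equally important but uncertain one, so uncertain monolingual sentences are sampled less often.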

Bibliographic Details
Main Authors: ZHANG Xiaoyan, PANG Lei, DU Xiaofeng, LU Tianbo, XIA Yamei
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications, 2024-04-01
Series: Tongxin xuebao (Journal on Communications)
Subjects: machine translation; data augmentation; self-training; uncertainty; syntactic dependency
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2024066/
ISSN: 1000-436X