Chinese named entity recognition with multi-network fusion of multi-scale lexical information

Named entity recognition (NER) is an important part of knowledge extraction and one of the main tasks in constructing knowledge graphs. In current Chinese named entity recognition (CNER) tasks, the BERT-BiLSTM-CRF model is widely used and often yields notable results. However, recognizing each e...

Bibliographic Details
Main Authors: Yan Guo, Hong-Chen Liu, Fu-Jiang Liu, Wei-Hua Lin, Quan-Sen Shao, Jun-Shun Su
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2024-12-01
Series: Journal of Electronic Science and Technology
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S1674862X24000557
collection DOAJ
description Named entity recognition (NER) is an important part of knowledge extraction and one of the main tasks in constructing knowledge graphs. In current Chinese named entity recognition (CNER) tasks, the BERT-BiLSTM-CRF model is widely used and often yields notable results. However, recognizing each entity with high accuracy remains challenging. Many entities do not appear as single words but as part of complex phrases, so accurate recognition is difficult to achieve using word embedding information alone: the intricate lexical structure often degrades performance. To address this issue, we propose an improved Bidirectional Encoder Representations from Transformers (BERT) character word conditional random field (CRF) (BCWC) model. It incorporates a word embedding model pre-trained with the skip-gram with negative sampling (SGNS) method, alongside traditional BERT embeddings. By comparing datasets segmented with different word segmentation tools, we obtain enhanced word embedding features for segmented data. These features are then processed using multi-scale convolution and iterated dilated convolutional neural networks (IDCNNs) with varying expansion rates to capture features at multiple scales and extract diverse contextual information. Additionally, a multi-attention mechanism is employed to fuse word and character embeddings. Finally, CRFs are applied to learn sequence constraints and optimize entity label annotations. A series of experiments is conducted on three public datasets, demonstrating that the proposed method outperforms recent advanced baselines. BCWC addresses the challenge of recognizing complex entities by combining character-level and word-level embedding information, thereby improving the accuracy of CNER. Such a model has potential for applications that require more precise knowledge extraction, such as knowledge graph construction and information retrieval, particularly in domain-specific natural language processing tasks that require high entity recognition precision.
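The description mentions iterated dilated convolutions with varying expansion rates as the way features are captured at multiple scales. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names, the all-ones kernel, and the dilation rates 1, 2, 4 are assumptions chosen for illustration. It shows how stacking dilated 1-D convolutions widens the span of input positions that influence each output.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution with zero padding at the borders."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):
                total += w * x[idx]
        out.append(total)
    return out


def iterated_dilated_block(x: List[float], kernel: List[float],
                           dilations: List[int]) -> List[float]:
    """Apply the same kernel repeatedly, once per dilation rate."""
    for d in dilations:
        x = dilated_conv1d(x, kernel, d)
    return x


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Input positions that can influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings, not taken from the article.
DILATIONS = [1, 2, 4]
KERNEL = [1.0, 1.0, 1.0]

# Feed a single impulse and count how far it spreads after the stack.
impulse = [0.0] * 15
impulse[7] = 1.0
response = iterated_dilated_block(impulse, KERNEL, DILATIONS)
span = sum(1 for v in response if v != 0.0)
```

With these assumed settings, three layers of width-3 kernels cover 15 input positions, whereas three undilated layers would cover only 7. This is the sense in which varying the rates gives access to context at several scales without adding parameters.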
format Article
id doaj-art-4d86d5a9bb8444d98a2c8bee6b723f02
institution Kabale University
issn 2666-223X
language English
publishDate 2024-12-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Journal of Electronic Science and Technology
spelling doaj-art-4d86d5a9bb8444d98a2c8bee6b723f02 | 2025-01-02T04:11:30Z | eng | KeAi Communications Co., Ltd. | Journal of Electronic Science and Technology | 2666-223X | 2024-12-01 | 22 | 4 | 100287
Chinese named entity recognition with multi-network fusion of multi-scale lexical information
Yan Guo: School of Computer Science, China University of Geosciences, Wuhan, 430078, China
Hong-Chen Liu: School of Computer Science, China University of Geosciences, Wuhan, 430078, China
Fu-Jiang Liu (corresponding author): School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430078, China
Wei-Hua Lin: School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430078, China
Quan-Sen Shao: School of Computer Science, China University of Geosciences, Wuhan, 430078, China
Jun-Shun Su: Xining Comprehensive Natural Resources Survey Centre, China Geological Survey (CGS), Xining, 810000, China
topic Bi-directional long short-term memory (BiLSTM)
Chinese named entity recognition (CNER)
Iterated dilated convolutional neural network (IDCNN)
Multi-network integration
Multi-scale lexical features
url http://www.sciencedirect.com/science/article/pii/S1674862X24000557