RB-TRNet: a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text

Bibliographic Details
Main Authors: Haigang Sui, Jindi Wang, Xining Zhang, Huihan Ning, Wentao Wang, Lieyun Hu
Format: Article
Language: English
Published: Taylor & Francis Group 2024-12-01
Series: Geo-spatial Information Science
Online Access: https://www.tandfonline.com/doi/10.1080/10095020.2024.2440079
collection DOAJ
description Extracting geographic information from texts contributes to both geographic information science research and various practical applications, but extracting fine-grained and complex location descriptions from Chinese text remains challenging because of the flexible word construction and the lack of clear boundaries in Chinese place names. In this paper, we propose a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text (RB-TRNet), which recognizes complex place names by learning the internal compositional patterns of various place name constructions and automatically perceiving the boundaries and types of Chinese place name entities. First, RoBERTa is used to represent the input text containing Chinese place names. Then, the text representation sequences are fed into two BiLSTM layers: one processed sequence enters the toponym regularity-guided module to obtain the composition patterns of Chinese place name entities, and the other enters the toponym regularity-discriminant module to soften an excessive reliance on contextual information when recognizing patterns of Chinese place name entities. Additionally, an orthogonal space is established after the BiLSTM network to facilitate the learning of different rule features by the two modules. Finally, after joint optimization training of the three modules, the toponym regularity perception module is used to predict the Chinese place name entities. To validate the results, we established a new complex Chinese place name text (CCPNT) dataset for complex Chinese place name recognition. The CCPNT dataset, along with three other public datasets, was used for performance evaluation; compared with eight baseline models, RB-TRNet exhibited state-of-the-art performance in recognizing complex Chinese place names.
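The abstract does not say how the orthogonal space between the two BiLSTM branches is enforced. As a rough illustration only, one common way to encourage two feature sequences to encode different information is to penalize the squared Frobenius norm of their cross-product. The sketch below assumes that formulation; the function name `orthogonality_penalty` and the list-of-lists representation are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = feature dimensions


def orthogonality_penalty(h_a: Matrix, h_b: Matrix) -> float:
    """Return ||h_a^T h_b||_F^2 for two token-aligned feature matrices.

    Assumption: this is one generic orthogonality constraint, not
    necessarily the one used in RB-TRNet. The value is 0 when every
    feature column of h_a is orthogonal to every feature column of h_b.
    """
    if len(h_a) != len(h_b):
        raise ValueError("both branches must cover the same number of tokens")
    if not h_a:
        return 0.0

    n_tokens = len(h_a)
    dim_a, dim_b = len(h_a[0]), len(h_b[0])

    total = 0.0
    for i in range(dim_a):
        for j in range(dim_b):
            # Entry (i, j) of h_a^T h_b: dot product of column i and column j.
            dot = sum(h_a[t][i] * h_b[t][j] for t in range(n_tokens))
            total += dot * dot
    return total


# Branch outputs that are active on different tokens do not overlap.
disjoint_a = [[1.0, 0.0], [0.0, 0.0]]
disjoint_b = [[0.0, 0.0], [0.0, 1.0]]
# Identical branch outputs overlap fully.
identity = [[1.0, 0.0], [0.0, 1.0]]

print(orthogonality_penalty(disjoint_a, disjoint_b))
print(orthogonality_penalty(identity, identity))
```

In a joint training setup, a term like this would typically be added, with a weighting factor, to the task losses of the individual modules; the weighting and the exact combination are left open here because the record does not specify them.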
id doaj-art-b583d85f4f85499f887bf486631972c0
institution Kabale University
issn 1009-5020
1993-5153
Author affiliations:
Haigang Sui: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Jindi Wang: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Xining Zhang: School of Computer Science, Wuhan University, Wuhan, China
Huihan Ning: School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Wentao Wang: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Lieyun Hu: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
topic Natural language processing (NLP)
toponym recognition
lexical enhancement
toponym regularity
deep learning