RB-TRNet: a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text
Extracting geographic information from texts contributes to both geographic information science research and various practical applications, but extracting fine-grained and complex location descriptions from Chinese text is still challenging, due to flexible word construction and lack of clear bound...
Main Authors: | Haigang Sui, Jindi Wang, Xining Zhang, Huihan Ning, Wentao Wang, Lieyun Hu |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2024-12-01 |
Series: | Geo-spatial Information Science |
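Subjects: | Natural language processing (NLP), toponym recognition, lexical enhancement, toponym regularity, deep learning |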
Online Access: | https://www.tandfonline.com/doi/10.1080/10095020.2024.2440079 |
_version_ | 1841525405824057344 |
---|---|
author | Haigang Sui Jindi Wang Xining Zhang Huihan Ning Wentao Wang Lieyun Hu |
collection | DOAJ |
description | Extracting geographic information from texts contributes to both geographic information science research and various practical applications, but extracting fine-grained and complex location descriptions from Chinese text is still challenging, due to flexible word construction and lack of clear boundaries in Chinese place names. In this paper, we propose a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text (RB-TRNet), achieving complex place name recognition by learning the internal compositional patterns of various place name constructions and automatically perceiving the boundaries and types of Chinese place name entities. First, RoBERTa is used to represent the input text containing Chinese place names. Then, two BiLSTM layers are fed with text representation sequences, with one processed sequence entering the toponym regularity-guided module to obtain the composition patterns of Chinese place name entities and the other sequence entering the toponym regularity-discriminant module to soften an excessive reliance on contextual information for recognizing patterns of Chinese place name entities. Additionally, an orthogonal space is established after the BiLSTM network to facilitate the learning of different rule features by the two modules. Finally, after joint optimization training of the three modules, the toponym regularity perception module is used to predict the Chinese place name entities. To validate the results, we established a new complex Chinese place name text (CCPNT) dataset for complex Chinese place name recognition. The CCPNT dataset, along with three other public datasets, were used for performance evaluation, and compared to eight baseline models, RB-TRNet exhibited state-of-the-art performance in recognizing complex Chinese place names. |
format | Article |
id | doaj-art-b583d85f4f85499f887bf486631972c0 |
institution | Kabale University |
issn | 1009-5020 1993-5153 |
language | English |
publishDate | 2024-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Geo-spatial Information Science |
spelling | doaj-art-b583d85f4f85499f887bf486631972c0; 2025-01-17T13:54:47Z; eng; Taylor & Francis Group; Geo-spatial Information Science; 1009-5020; 1993-5153; 2024-12-01; 115; 10.1080/10095020.2024.2440079; RB-TRNet: a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text; Haigang Sui [0]; Jindi Wang [1]; Xining Zhang [2]; Huihan Ning [3]; Wentao Wang [4]; Lieyun Hu [5]; [0] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; [1] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; [2] School of Computer Science, Wuhan University, Wuhan, China; [3] School of Cyber Science and Engineering, Wuhan University, Wuhan, China; [4] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; [5] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; Extracting geographic information from texts contributes to both geographic information science research and various practical applications, but extracting fine-grained and complex location descriptions from Chinese text is still challenging, due to flexible word construction and lack of clear boundaries in Chinese place names. In this paper, we propose a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text (RB-TRNet), achieving complex place name recognition by learning the internal compositional patterns of various place name constructions and automatically perceiving the boundaries and types of Chinese place name entities. First, RoBERTa is used to represent the input text containing Chinese place names. Then, two BiLSTM layers are fed with text representation sequences, with one processed sequence entering the toponym regularity-guided module to obtain the composition patterns of Chinese place name entities and the other sequence entering the toponym regularity-discriminant module to soften an excessive reliance on contextual information for recognizing patterns of Chinese place name entities. Additionally, an orthogonal space is established after the BiLSTM network to facilitate the learning of different rule features by the two modules. Finally, after joint optimization training of the three modules, the toponym regularity perception module is used to predict the Chinese place name entities. To validate the results, we established a new complex Chinese place name text (CCPNT) dataset for complex Chinese place name recognition. The CCPNT dataset, along with three other public datasets, were used for performance evaluation, and compared to eight baseline models, RB-TRNet exhibited state-of-the-art performance in recognizing complex Chinese place names.; https://www.tandfonline.com/doi/10.1080/10095020.2024.2440079; Natural language processing (NLP); toponym recognition; lexical enhancement; toponym regularity; deep learning |
title | RB-TRNet: a regularity-guided and boundary-aware architecture for toponym recognition from Chinese text |
topic | Natural language processing (NLP) toponym recognition lexical enhancement toponym regularity deep learning |
url | https://www.tandfonline.com/doi/10.1080/10095020.2024.2440079 |
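
The description field in this record mentions an orthogonal space placed after the BiLSTM network so that the two modules learn different features. The record does not say how that constraint is computed. One common way to encourage two feature sets to differ is to penalize the squared Frobenius norm of the product of one matrix transposed with the other; the sketch below illustrates that generic penalty only and should not be read as the authors' formulation. The function name `orthogonality_penalty` and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = feature dimensions


def orthogonality_penalty(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A^T B (assumed form, not taken from the paper).

    The value is 0.0 when every feature column of `a` is orthogonal to every
    feature column of `b`, and grows as the two feature sets overlap.
    """
    if not a or not b or len(a) != len(b):
        raise ValueError("both matrices must be non-empty and cover the same tokens")
    dim_a, dim_b = len(a[0]), len(b[0])
    penalty = 0.0
    for i in range(dim_a):
        for j in range(dim_b):
            # Dot product of column i of `a` with column j of `b` over all tokens.
            dot = sum(a[t][i] * b[t][j] for t in range(len(a)))
            penalty += dot * dot
    return penalty


# Toy features for two tokens; the values are illustrative only.
guided = [[1.0, 0.0], [0.0, 0.0]]
discriminant = [[0.0, 0.0], [0.0, 1.0]]

print(orthogonality_penalty(guided, discriminant))  # non-overlapping features
print(orthogonality_penalty(guided, guided))        # fully overlapping features
```

In a joint training setup, a term like this would typically be added to the task losses with a weighting factor, which is likewise an assumption here rather than something the record states.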