Contextual semantics graph attention network model for entity resolution

Bibliographic Details
Main Authors: Xiaojun Li, Shuai Fan, Junping Yao, Haifeng Sun
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-11932-9
Description
Summary: Entity resolution is the process of determining whether records from different knowledge bases refer to the same real-world entity. Existing research takes entity pairs as input and makes judgments based on the characteristics of those pairs. However, existing methods make insufficient use of contextual semantics: they fail to effectively model the token-attribute associations within data sources and the cross-attribute semantic hierarchical relationships, which weakens the discriminative power of key attributes. What is more, they fail to handle polysemous ambiguities, because conventional graph neural networks adopt rigid node representations that cannot dynamically adjust word meanings according to attribute-specific contexts. To address these issues, this paper proposes the Contextual Semantics Graph Attention Network (CSGAT), which extracts contextual information at the token and attribute levels to generate semantically fused embeddings. The advantages of CSGAT are: 1) leveraging the Transformer self-attention mechanism to extract feature vectors of words, model sequence relationships, and calculate each word's degree of relevance to other words; 2) applying an attention mechanism to attribute-level contextual information to extract semantic embeddings that enrich the attribute embeddings, making them more discriminative; 3) utilizing a graph attention network to generate residual vectors for the final entity resolution decision. Experiments on the Amazon-Google and BeerAdvo-RateBeer datasets show that, compared with competing methods, CSGAT achieves significantly improved F1-scores with good Precision and Recall. Code is available at https://github.com/xhtech2024/csgat.
ISSN: 2045-2322
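
Illustrative note: the summary's first listed advantage relies on Transformer self-attention to compute how relevant each word is to the others. The sketch below shows generic scaled dot-product self-attention only; it is not taken from the CSGAT code. It assumes identity query/key/value projections (a real Transformer learns these), and the function names and toy two-dimensional embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption).

    tokens: list of equal-length float vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Relevance of every token (as key) to the current query token.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted average of all token vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical toy embeddings for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the input token vectors. In a trained model the learned projections, rather than raw dot products, decide which tokens count as relevant.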