Adaptive Hierarchical Text Classification Using ERNIE and Dynamic Threshold Pruning

Bibliographic Details
Main Authors: Han Chen, Yangsen Zhang, Yuru Jiang, Ruixue Duan
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10807255/
Description
Summary: Hierarchical Text Classification (HTC) is a challenging task in which labels are structured as a tree or Directed Acyclic Graph (DAG). Current approaches often struggle with data imbalance and fail to fully capture rich semantic information. This paper proposes an Adaptive Hierarchical Text Classification method, EDTPA (ERNIE and Dynamic Threshold Pruning-based Adaptive classification), which leverages Large Language Models (LLMs) for data augmentation to mitigate dataset imbalance. The model first uses Graph Attention Networks (GAT) to capture hierarchical dependencies among labels, effectively modeling their structured relationships. ERNIE enhances the semantic representation of both the text and the hierarchical labels, strengthening the model’s ability to process Chinese text. An attention mechanism improves the alignment between text and labels, increasing accuracy. The model combines global and local information flows, while dynamic threshold pruning removes low-probability branches, improving interpretability. Results on the Chinese Scientific Literature (CSL) dataset show that EDTPA significantly outperforms baseline models in both Micro-F1 and Macro-F1 scores, effectively addressing data imbalance and improving classification performance.
ISSN: 2169-3536
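
Note: The summary describes dynamic threshold pruning over the label hierarchy only at a high level. The sketch below illustrates the general idea of pruning low-probability branches of a label tree top-down with depth-dependent thresholds; the label tree, probabilities, and threshold schedule are illustrative assumptions, not EDTPA's actual implementation.

```python
# Minimal sketch of dynamic threshold pruning over a label hierarchy.
# The tree, probabilities, and per-level thresholds below are assumed
# for illustration; the paper's exact pruning rule is not reproduced here.

def prune_hierarchy(probs, children, root, level_thresholds, default_threshold=0.5):
    """Walk the label tree top-down; keep a label only if its predicted
    probability clears the threshold for its depth and its parent was kept,
    so low-probability branches are cut off early."""
    kept = []
    stack = [(root, 0)]
    while stack:
        label, depth = stack.pop()
        threshold = level_thresholds.get(depth, default_threshold)
        if probs.get(label, 0.0) < threshold:
            continue  # prune this label and, implicitly, its whole subtree
        kept.append(label)
        for child in children.get(label, []):
            stack.append((child, depth + 1))
    return kept

# Toy 3-level hierarchy with thresholds that vary by depth (one plausible
# "dynamic" schedule, assumed for illustration).
children = {
    "root": ["science", "arts"],
    "science": ["physics", "biology"],
    "arts": ["music"],
}
probs = {"root": 1.0, "science": 0.9, "physics": 0.7, "biology": 0.2,
         "arts": 0.3, "music": 0.8}
level_thresholds = {0: 0.0, 1: 0.5, 2: 0.4}

print(prune_hierarchy(probs, children, "root", level_thresholds))
# -> ['root', 'science', 'physics']
# The "arts" branch falls below its level threshold, so "music" is never
# considered even though its own score is high.
```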