Adaptive Hierarchical Text Classification Using ERNIE and Dynamic Threshold Pruning

Hierarchical Text Classification (HTC) is a challenging task where labels are structured in a tree or Directed Acyclic Graph (DAG) format. Current approaches often struggle with data imbalance and fail to fully capture rich semantic information. This paper proposes an Adaptive Hierarchical Text Classification method, EDTPA (ERNIE and Dynamic Threshold Pruning-based Adaptive classification), which leverages Large Language Models (LLMs) for data augmentation to mitigate imbalanced datasets. The model first uses Graph Attention Networks (GAT) to capture hierarchical dependencies among labels, effectively modeling structured relationships. ERNIE enhances the semantic representation of both the text and hierarchical labels, optimizing the model's ability to process Chinese text. An attention mechanism strengthens the alignment between text and labels, improving accuracy. The model combines global and local information flows, while dynamic threshold pruning prunes low-probability branches, improving interpretability. Results on the Chinese Scientific Literature (CSL) dataset show EDTPA significantly outperforms baseline models in both Micro-F1 and Macro-F1 scores, effectively addressing data imbalance and improving classification performance.
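The abstract's dynamic threshold pruning step can be pictured as a top-down walk of the label hierarchy that discards low-probability branches before their children are ever scored. The paper's exact pruning rule is not given in this record; the sketch below is a minimal illustration of threshold-based pruning over a label tree, where the tree, the label probabilities, and the fixed threshold are all invented for demonstration.

```python
# Illustrative sketch of threshold-based pruning over a label hierarchy.
# The actual EDTPA rule is not specified in this record; the tree, the
# probabilities, and the threshold below are hypothetical.

def prune_labels(tree, probs, threshold=0.5, root="ROOT"):
    """Top-down traversal: a label is kept only if its probability clears
    the threshold and its parent was kept; pruned branches are never
    expanded, so their descendants are skipped entirely."""
    kept = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node != root and probs.get(node, 0.0) < threshold:
            continue  # prune this branch: children are not pushed
        if node != root:
            kept.append(node)
        stack.extend(tree.get(node, []))
    return kept

# Hypothetical two-level hierarchy and model scores
tree = {"ROOT": ["Science", "Arts"],
        "Science": ["Physics", "Biology"],
        "Arts": ["Music"]}
probs = {"Science": 0.9, "Physics": 0.8, "Biology": 0.3,
         "Arts": 0.2, "Music": 0.95}

print(sorted(prune_labels(tree, probs, threshold=0.5)))
# → ['Physics', 'Science']
```

Note that "Music" scores 0.95 but is never considered, because its parent "Arts" (0.2) is pruned first; this parent-gated behavior is what distinguishes hierarchical pruning from flat per-label thresholding.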

Bibliographic Details
Main Authors: Han Chen, Yangsen Zhang, Yuru Jiang, Ruixue Duan
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Subjects: Hierarchical text classification; data augmentation; large language model; graph attention network; ERNIE
Online Access: https://ieeexplore.ieee.org/document/10807255/
collection DOAJ
format Article
id doaj-art-60f18b9d756b4fa0a666b98e4b02362f
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-60f18b9d756b4fa0a666b98e4b02362f (record updated 2025-01-16T00:02:02Z)
Han Chen, Yangsen Zhang, Yuru Jiang, and Ruixue Duan, "Adaptive Hierarchical Text Classification Using ERNIE and Dynamic Threshold Pruning," IEEE Access, vol. 12, pp. 193641-193652, 2024-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2024.3519954 (IEEE document 10807255).
ORCID: Han Chen https://orcid.org/0009-0008-9408-5979; Yuru Jiang https://orcid.org/0000-0002-0947-2640; Ruixue Duan https://orcid.org/0000-0002-4478-1692
Affiliation (all authors): Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing, China
Online Access: https://ieeexplore.ieee.org/document/10807255/
Keywords: Hierarchical text classification; data augmentation; large language model; graph attention network; ERNIE
title Adaptive Hierarchical Text Classification Using ERNIE and Dynamic Threshold Pruning
topic Hierarchical text classification
data augmentation
large language model
graph attention network
ERNIE
url https://ieeexplore.ieee.org/document/10807255/