Adaptive Hierarchical Text Classification Using ERNIE and Dynamic Threshold Pruning
Hierarchical Text Classification (HTC) is a challenging task where labels are structured in a tree or Directed Acyclic Graph (DAG) format. Current approaches often struggle with data imbalance and fail to fully capture rich semantic information. This paper proposes an Adaptive Hierarchical Text Classification method, EDTPA (ERNIE and Dynamic Threshold Pruning-based Adaptive classification), which leverages Large Language Models (LLMs) for data augmentation to mitigate imbalanced datasets. The model first uses Graph Attention Networks (GAT) to capture hierarchical dependencies among labels, effectively modeling structured relationships. ERNIE enhances the semantic representation of both the text and hierarchical labels, optimizing the model's ability to process Chinese text. An attention mechanism strengthens the alignment between text and labels, improving accuracy. The model combines global and local information flows, while dynamic threshold pruning removes low-probability branches, improving interpretability. Results on the Chinese Scientific Literature (CSL) dataset show EDTPA significantly outperforms baseline models in both Micro-F1 and Macro-F1 scores, effectively addressing data imbalance and improving classification performance.
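The dynamic threshold pruning described in the abstract can be illustrated with a minimal sketch. The tree layout, the threshold rule (a per-level decay scaled from a base threshold), and all names here are illustrative assumptions, not the paper's exact formulation: the idea shown is that a child label survives only if its predicted probability clears its level's threshold, and pruning a node discards its entire subtree.

```python
def dynamic_threshold_prune(tree, probs, base_threshold=0.5, decay=0.8):
    """Return the set of labels that survive top-down pruning.

    tree:  dict mapping each label (and the sentinel "ROOT") to child labels
    probs: dict mapping each label to its predicted probability
    The threshold relaxes by `decay` at each level so deeper, fine-grained
    labels are not cut off too aggressively (an assumed design choice).
    """
    kept = set()
    stack = [(root, base_threshold) for root in tree.get("ROOT", [])]
    while stack:
        label, threshold = stack.pop()
        if probs.get(label, 0.0) < threshold:
            continue  # prune this branch: its children are never visited
        kept.add(label)
        for child in tree.get(label, []):
            stack.append((child, threshold * decay))
    return kept

# Toy two-level hierarchy: "football" is unreachable once "sports" is pruned.
tree = {"ROOT": ["science", "sports"],
        "science": ["physics", "chemistry"],
        "sports": ["football"]}
probs = {"science": 0.9, "physics": 0.7, "chemistry": 0.1,
         "sports": 0.2, "football": 0.95}
print(sorted(dynamic_threshold_prune(tree, probs)))  # → ['physics', 'science']
```

Note how "football" is dropped despite its high score: because its parent "sports" falls below the root-level threshold, the whole subtree is pruned, which is what gives hierarchical pruning its interpretability.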
Main Authors: Han Chen, Yangsen Zhang, Yuru Jiang, Ruixue Duan
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Subjects: Hierarchical text classification; data augmentation; large language model; graph attention network; ERNIE
Online Access: https://ieeexplore.ieee.org/document/10807255/
_version_: 1841533410524266496
author: Han Chen, Yangsen Zhang, Yuru Jiang, Ruixue Duan
author_sort: Han Chen
collection: DOAJ
description: Hierarchical Text Classification (HTC) is a challenging task where labels are structured in a tree or Directed Acyclic Graph (DAG) format. Current approaches often struggle with data imbalance and fail to fully capture rich semantic information. This paper proposes an Adaptive Hierarchical Text Classification method, EDTPA (ERNIE and Dynamic Threshold Pruning-based Adaptive classification), which leverages Large Language Models (LLMs) for data augmentation to mitigate imbalanced datasets. The model first uses Graph Attention Networks (GAT) to capture hierarchical dependencies among labels, effectively modeling structured relationships. ERNIE enhances the semantic representation of both the text and hierarchical labels, optimizing the model's ability to process Chinese text. An attention mechanism strengthens the alignment between text and labels, improving accuracy. The model combines global and local information flows, while dynamic threshold pruning removes low-probability branches, improving interpretability. Results on the Chinese Scientific Literature (CSL) dataset show EDTPA significantly outperforms baseline models in both Micro-F1 and Macro-F1 scores, effectively addressing data imbalance and improving classification performance.
format: Article
id: doaj-art-60f18b9d756b4fa0a666b98e4b02362f
institution: Kabale University
issn: 2169-3536
language: English
publishDate: 2024-01-01
publisher: IEEE
record_format: Article
series: IEEE Access
spelling: doaj-art-60f18b9d756b4fa0a666b98e4b02362f; harvested 2025-01-16T00:02:02Z; English; IEEE; IEEE Access; ISSN 2169-3536; published 2024-01-01; vol. 12, pp. 193641-193652; doi: 10.1109/ACCESS.2024.3519954; IEEE Xplore document 10807255; https://ieeexplore.ieee.org/document/10807255/
Authors: Han Chen (ORCID 0009-0008-9408-5979), Yangsen Zhang, Yuru Jiang (ORCID 0000-0002-0947-2640), Ruixue Duan (ORCID 0000-0002-4478-1692), all with the Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing, China
title: Adaptive Hierarchical Text Classification Using ERNIE and Dynamic Threshold Pruning
topic: Hierarchical text classification; data augmentation; large language model; graph attention network; ERNIE
url: https://ieeexplore.ieee.org/document/10807255/