CoTHSSum: Structured long-document summarization via chain-of-thought reasoning and hierarchical segmentation
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44443-025-00041-2 |
| Summary: | Long-document summarization remains a challenging task for large language models (LLMs), which often suffer from input length constraints, semantic incoherence, and factual hallucinations when processing extensive and complex texts. In this paper, we propose a novel summarization framework that integrates hierarchical input segmentation with Chain-of-Thought (CoT) prompting to guide LLMs through structured, interpretable reasoning. Our method decomposes long documents into semantically coherent segments, applies CoT-based prompting for intermediate summary reasoning, and employs structure-guided decoding to compose high-quality final summaries. We evaluate our approach across five diverse datasets, including scientific, biomedical, governmental, literary, and legal domains, using strong LLM backbones such as Qwen, LLaMA, and Phi. Experimental results demonstrate that our method consistently outperforms state-of-the-art baselines across ROUGE, BLEU, BERTScore, and factual consistency metrics. Ablation and human evaluation further confirm the complementary benefits of CoT reasoning and hierarchical structure, offering a reliable and scalable solution for summarizing complex long-form content. |
| ISSN: | 1319-1578, 2213-1248 |
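The abstract describes a three-stage pipeline: segment the document into coherent chunks, apply CoT prompting to each segment, and compose the partial summaries into a final one. A minimal sketch of that flow is below; it is illustrative only, not the paper's implementation. The `call_llm` parameter, the greedy character-budget segmenter, and the prompt wording are all assumptions standing in for the paper's semantic segmentation, CoT templates, and structure-guided decoding.

```python
from typing import Callable, List

def segment_document(text: str, max_chars: int = 500) -> List[str]:
    """Greedy paragraph packing: a crude stand-in for semantic segmentation."""
    segments, current = [], ""
    for para in text.split("\n\n"):
        # Start a new segment once the character budget would be exceeded.
        if current and len(current) + len(para) > max_chars:
            segments.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        segments.append(current.strip())
    return segments

# Hypothetical CoT template: ask for step-by-step reasoning before summarizing.
COT_TEMPLATE = (
    "Read the passage, list its key claims step by step, "
    "then write a one-sentence summary.\n\nPassage:\n{segment}"
)

def summarize(text: str, call_llm: Callable[[str], str]) -> str:
    """Segment, reason over each segment with a CoT prompt, then compose."""
    partials = [
        call_llm(COT_TEMPLATE.format(segment=seg))
        for seg in segment_document(text)
    ]
    compose_prompt = (
        "Merge these partial summaries into one coherent summary:\n"
        + "\n".join(partials)
    )
    return call_llm(compose_prompt)
```

Any backbone (e.g. Qwen, LLaMA, or Phi, as evaluated in the paper) could be plugged in as `call_llm`; the pipeline itself is model-agnostic.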