CoTHSSum: Structured long-document summarization via chain-of-thought reasoning and hierarchical segmentation

Bibliographic Details
Main Authors: Xiaoyong Chen, Zhiqiang Chen, Shi Cheng
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: https://doi.org/10.1007/s44443-025-00041-2
Description
Summary: Long-document summarization remains a challenging task for large language models (LLMs), which often suffer from input length constraints, semantic incoherence, and factual hallucinations when processing extensive and complex texts. In this paper, we propose a novel summarization framework that integrates hierarchical input segmentation with Chain-of-Thought (CoT) prompting to guide LLMs through structured, interpretable reasoning. Our method decomposes long documents into semantically coherent segments, applies CoT-based prompting for intermediate summary reasoning, and employs structure-guided decoding to compose high-quality final summaries. We evaluate our approach on five diverse datasets spanning the scientific, biomedical, governmental, literary, and legal domains, using strong LLM backbones such as Qwen, LLaMA, and Phi. Experimental results demonstrate that our method consistently outperforms state-of-the-art baselines on ROUGE, BLEU, BERTScore, and factual consistency metrics. Ablation studies and human evaluation further confirm the complementary benefits of CoT reasoning and hierarchical structure, offering a reliable and scalable solution for summarizing complex long-form content.
ISSN: 1319-1578, 2213-1248
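
The abstract describes a three-stage pipeline: hierarchical segmentation of the input, CoT-prompted intermediate summaries per segment, and composition of a final summary. The following is a minimal sketch of that flow, assuming a generic text-in/text-out LLM callable; the function names, prompt wording, paragraph-based segmentation heuristic, and the simple fusion step standing in for structure-guided decoding are all illustrative assumptions, not the authors' released implementation.

# Illustrative sketch of the pipeline described in the abstract.
# All names, prompts, and heuristics here are assumptions; the paper's
# actual segmentation method, CoT prompts, and structure-guided decoding
# are not specified in this record.

from typing import Callable, List

def segment_document(text: str, max_chars: int = 4000) -> List[str]:
    """Split a long document into roughly coherent segments.

    Assumption: paragraph boundaries approximate semantic coherence;
    the paper likely uses a more sophisticated hierarchical segmenter.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    segments, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            segments.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}".strip()
    if current:
        segments.append(current)
    return segments

def cot_summarize_segment(segment: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM to reason step by step before summarizing one segment."""
    prompt = (
        "Read the following passage. First, list its key claims and how they "
        "relate to each other (your reasoning). Then write a 2-3 sentence "
        "summary grounded only in the passage.\n\n"
        f"Passage:\n{segment}\n\nReasoning and summary:"
    )
    return llm(prompt)

def compose_final_summary(segment_summaries: List[str], llm: Callable[[str], str]) -> str:
    """Fuse intermediate summaries into one coherent final summary."""
    joined = "\n".join(f"- {s}" for s in segment_summaries)
    prompt = (
        "Combine the following section summaries into a single coherent, "
        "non-redundant summary that preserves the document's structure:\n"
        f"{joined}\n\nFinal summary:"
    )
    return llm(prompt)

def summarize_long_document(text: str, llm: Callable[[str], str]) -> str:
    segments = segment_document(text)
    intermediate = [cot_summarize_segment(s, llm) for s in segments]
    return compose_final_summary(intermediate, llm)

Here, llm is any function mapping a prompt string to generated text, e.g. a thin wrapper around a Qwen, LLaMA, or Phi chat endpoint, which keeps the sketch independent of any particular backbone.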