An history of relevance in unsupervised summarization

Carichon, Florian; Caporossi, Gilles

Automatic document summarization aims at creating a shorter version of one or more documents to help users digest large amounts of information more easily by highlighting the most relevant material. Unsupervised methods are among the most suitable for this task since they do not require prior human intervention for condensing information. Therefore, this article provides a detailed analysis of the progress of unsupervised methods applied to document summarization, offering a better comprehension of the underlying fundamental principles that drive these approaches. With this goal in mind, this review addresses several important aspects. First, it gives an overview of the field, with the related concepts, methods, and scenarios that allow an understanding of their context of application, their evaluation, and their evolution over time. It also provides a new typology of analysis, clarifying the link between the task of document summarization and the definition of relevance, as seen in information theory, and how it influences the very construction of various systems. In-depth reflections are made on the relationship between certain contextual factors, such as purpose and audience, and how they not only affect relevance, and thus the way important material is selected, but also ultimately influence the summarization task and its evaluation. Finally, we provide recommendations and research directions for integrating these insights into the field of automatic document summarization.

Published November 2023, 37 pages

Research Axis

Axis 1: Data valuation for decision making

Research application

Marketing (business intelligence, revenue management, recommendation systems)

Document

G2353.pdf (400 KB)

GERAD

G-2023-53
