From extracts to abstracts: Human summary production operations for computer-aided summarisation

Laura Hasler (2007) From extracts to abstracts: Human summary production operations for computer-aided summarisation. PhD Thesis, University of Wolverhampton, UK

Abstract

This thesis is concerned with the field of computer-aided summarisation (CAS), which has emerged at the confluence of the separate but related fields of human and automatic summarisation. Because automatically produced extracts tend to be poor in readability and coherence, CAS is a viable working alternative to fully automatic summarisation. CAS allows a human summariser to post-edit automatically produced extracts to improve their readability and coherence. To make the best use of computer-aided summarisation, reliable ways of improving the coherence and readability of extracts when transforming them into abstracts must be established.

To achieve this, a corpus-based analysis of the operations a human summariser applies to extracts to transform them into abstracts is presented. The corpus developed here consists of pairs of news texts annotated for important information (i.e., human-produced extracts) and the human-produced abstracts corresponding to these extracts. The creation of this corpus simulates the computer-aided summarisation process, enabling a reliable investigation into the operations used. A detailed classification of human summary production operations is proposed, with examples that highlight the common linguistic realisations and functions of the operations identified in the corpus. The classification then serves as the basis for guidelines that can be given to users of computer-aided summarisation systems to ensure that the summaries they produce are of consistently high quality.

To evaluate the human summary production operations, they are applied to extracts following the guidelines. Evaluation is performed using a metric developed for Centering Theory, a discourse theory of local coherence and salience; using this metric for summary evaluation constitutes a new evaluation method, which is appropriate because existing methods of evaluating summaries are unsuitable. A set of both automatically and human-produced extracts and their corresponding abstracts is evaluated, and the results are compared with evaluations given by a human judge. The evaluation shows that when the operations are applied to extracts using the guidelines, the readability and coherence of the resulting abstracts improve.

BibTeX
    @PhdThesis{hasler-phd,
      author =   {Laura Hasler},
      title =    {From extracts to abstracts: Human summary production 
                  operations for computer-aided summarisation},
      school =   {School of Humanities, Languages and Social Sciences, 
                  University of Wolverhampton},
      year =     {2007},
      address =  {Wolverhampton, UK},
      month =    {June},
      URL =      {http://clg.wlv.ac.uk/papers/hasler-thesis.pdf}
    }