Temporal Processing of News: Annotation of Temporal Expressions, Verbal Events and Temporal Relations

Georgiana Marsic (2011) Temporal Processing of News: Annotation of Temporal Expressions, Verbal Events and Temporal Relations. PhD Thesis, University of Wolverhampton, UK.

Abstract

The ability to capture the temporal dimension of a natural language text is essential to many natural language processing applications, such as Question Answering, Automatic Summarisation, and Information Retrieval. Temporal processing is a field of Computational Linguistics which aims to access this dimension and derive a precise temporal representation of a natural language text by extracting time expressions, events and temporal relations, and then representing them according to a chosen knowledge framework.

This thesis focuses on the investigation and understanding of the different ways time is expressed in natural language, on the implementation of a temporal processing system in accordance with the results of this investigation, on the evaluation of the system, and on the extensive analysis of the errors and challenges that appear during system development. The ultimate goal of this research is to develop the ability to automatically annotate temporal expressions, verbal events and temporal relations in a natural language text.

Temporal expression annotation involves two stages: temporal expression identification, which determines the textual extent of a temporal expression, and temporal expression normalisation, which finds the value that the temporal expression designates and represents it using an annotation standard. The research presented in this thesis approaches these tasks with a knowledge-based methodology that tackles temporal expressions according to their semantic classification. Several knowledge sources and normalisation models are tested to allow an analysis of their impact on system performance.
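The two stages can be illustrated with a minimal sketch. It is not the system described in the thesis: the pattern, the function names (`identify`, `normalise`, `annotate`) and the choice of ISO 8601 date strings as the output value are all assumptions made for illustration, and only two kinds of expression are covered (explicit dates and three deictic words).

```python
import re
from datetime import date, timedelta

# Hypothetical toy knowledge source: month names and deictic day offsets.
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Matches either "12 December 2011" or one of the deictic words.
PATTERN = re.compile(
    r"\b(?:(?P<day>\d{1,2}) (?P<month>" + "|".join(MONTHS) + r") (?P<year>\d{4})"
    r"|(?P<deictic>" + "|".join(OFFSETS) + r"))\b"
)

def identify(text):
    """Stage 1: find the textual extent of each temporal expression."""
    return [(m.start(), m.end(), m) for m in PATTERN.finditer(text)]

def normalise(match, reference):
    """Stage 2: map an identified expression to an ISO 8601 date string.

    Deictic expressions are resolved against `reference` (for news text,
    typically the document creation date). An impossible calendar date
    such as "31 February 2011" raises ValueError in this sketch.
    """
    if match.group("deictic"):
        shifted = reference + timedelta(days=OFFSETS[match.group("deictic")])
        return shifted.isoformat()
    return date(int(match.group("year")),
                MONTHS[match.group("month")],
                int(match.group("day"))).isoformat()

def annotate(text, reference):
    """Run both stages and return (expression text, normalised value) pairs."""
    return [(text[s:e], normalise(m, reference)) for s, e, m in identify(text)]
```

Calling `annotate("The talks began on 12 December 2011 and ended yesterday.", date(2011, 12, 15))` yields the two expressions with the values `2011-12-12` and `2011-12-14`. Keeping identification and normalisation as separate functions mirrors the two-stage description and lets each stage be evaluated on its own.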

The annotation of events expressed using either finite or non-finite verbs is addressed with a method that overcomes a drawback of existing methods, which associate an event with the class most frequently assigned to it in a corpus and are therefore limited in coverage by the small number of events present in that corpus. This research overcomes the limitation by annotating each WordNet verb with the event class that best characterises it.
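The coverage problem can be seen in a small sketch. Everything in it is assumed for illustration: the class labels follow TimeML conventions, which the abstract does not name, the toy corpus and the hand-written `VERB_LEXICON` are invented, and the real resource described above covers WordNet verbs rather than four lemmas.

```python
from collections import Counter, defaultdict

def most_frequent_class_baseline(annotated_corpus):
    """Corpus-based approach: map each verb lemma to the event class
    most frequently assigned to it in the annotated corpus."""
    counts = defaultdict(Counter)
    for lemma, event_class in annotated_corpus:
        counts[lemma][event_class] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}

# Hypothetical annotated corpus of (verb lemma, event class) pairs.
TOY_CORPUS = [
    ("say", "REPORTING"),
    ("say", "REPORTING"),
    ("say", "OCCURRENCE"),
    ("begin", "ASPECTUAL"),
]

# Hypothetical lexicon assigning one class per verb, independent of
# whether the verb occurs in the corpus.
VERB_LEXICON = {
    "say": "REPORTING",
    "begin": "ASPECTUAL",
    "negotiate": "OCCURRENCE",
    "believe": "I_STATE",
}

def classify(lemma, table):
    """Look up the event class of a verb lemma; None if it is not covered."""
    return table.get(lemma)

BASELINE = most_frequent_class_baseline(TOY_CORPUS)
```

Here `classify("negotiate", BASELINE)` returns `None` because the verb never occurs in the toy corpus, while `classify("negotiate", VERB_LEXICON)` returns a class. The lexicon's coverage depends on the verb inventory rather than on which events happen to be annotated.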

This thesis also describes an original methodology for the identification of temporal relations that hold among events and temporal expressions. The method relies on sentence-level syntactic trees and on the propagation of temporal relations between syntactic constituents, by analysing syntactic and lexical properties of the constituents and of the relations between them.

The detailed evaluation and error analysis of the methods proposed for solving different temporal processing tasks form an important part of this research. Various corpora widely used by researchers studying different temporal phenomena are employed in the evaluation, thus enabling comparison with the state of the art in the field. The detailed error analysis targeting each temporal processing task helps identify not only problems with the implemented methods, but also reliability problems with the annotated resources, and encourages a re-examination of some temporal processing tasks.

BibTeX
    @PhdThesis{marsic-phd,
      author =   {Georgiana Mar\c{s}ic},
      title =    {Temporal Processing of News: Annotation of Temporal 
                  Expressions, Verbal Events and Temporal Relations},
      year =     {2011},
      school =   {University of Wolverhampton},
      address =  {Wolverhampton, UK},
      month =    {December},
      URL =      {http://clg.wlv.ac.uk/papers/marsic-thesis.pdf}
    }