System developed by PhD student performs best at international competition

In April, Georgiana Puscasu, a member of the Research Group in Computational Linguistics, participated in the TempEval competition organised as part of the SemEval-2007 evaluation exercise. The competition put systems from world-leading NLP research centres to the test in a series of three tasks designed to evaluate their ability to identify several types of temporal relations in natural language texts. Georgiana's system TICTAC (Syntactico-Semantic Temporal Annotation Cluster) achieved the best scores for the majority of the TempEval tasks.

To gain an insight into the nature of the competition, we should start by defining the field in which it operates. Natural language processing (NLP) is a subfield of artificial intelligence and linguistics, with the goal of designing and building software that analyses, understands, and generates the languages that humans use naturally, so that eventually people will be able to address computers as though they were addressing another person. This goal is not easy to reach. "Understanding" language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It is ironic that natural language, the symbol system that is easiest for humans to learn and use, is the hardest for a computer to master.

What counts as understanding natural language might vary from one NLP area to another. In the case of temporal processing, the area Georgiana's research focuses on, understanding a text means gaining access to its temporal structure. This involves the capability to identify the events and temporal expressions described in a text, to specify their temporal location and to order them with respect to a certain point in time or to other events. This capability is crucial to a wide range of NLP applications, from document summarisation and question answering to machine translation. Consider a newspaper article dated Monday, 23 May 2005 that states: "Australian pop star Kylie Minogue announced Tuesday that she has breast cancer. She underwent surgery three days later at a Melbourne hospital." As an English speaker, you effortlessly understand that the announcement was made on 17 May 2005, that the surgery took place on 20 May 2005, and that the surgery took place after the announcement and before the date of the article. Yet the derivation of these temporal relations presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures.
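The date arithmetic behind the Kylie Minogue example can be written out as a minimal Python sketch. It only illustrates the reasoning a reader performs and is not a description of how TICTAC or any TempEval system works. The helper name `previous_weekday` is hypothetical, and the sketch assumes that a bare weekday name in a past-tense sentence refers to the most recent such day before the article date.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def previous_weekday(anchor: date, weekday_name: str) -> date:
    """Return the most recent date strictly before `anchor` that falls
    on `weekday_name` (assumption: past-tense reading of the weekday)."""
    target = WEEKDAYS.index(weekday_name.lower())
    # date.weekday() returns 0 for Monday ... 6 for Sunday.
    # If the anchor itself falls on the target day, go back a full week.
    days_back = (anchor.weekday() - target) % 7 or 7
    return anchor - timedelta(days=days_back)


# Document creation time taken from the article's dateline.
article_date = date(2005, 5, 23)

# "announced Tuesday" is resolved relative to the article date.
announcement = previous_weekday(article_date, "Tuesday")

# "three days later" is resolved relative to the announcement.
surgery = announcement + timedelta(days=3)

print(announcement.isoformat())  # 2005-05-17
print(surgery.isoformat())       # 2005-05-20

# Temporal ordering: announcement BEFORE surgery BEFORE article date.
print(announcement < surgery < article_date)  # True
```

The arithmetic itself is trivial. The hard part for a program is everything the sketch takes for granted: recognising that "Tuesday" and "three days later" are temporal expressions, choosing the right anchor for each, and deciding that the past tense points backwards from the article date.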

TempEval evaluated the capabilities of computer systems to perform this type of temporal reasoning and to discover several types of temporal relations among different temporal entities in natural language texts. It was organised within SemEval-2007 (the successor to the Senseval campaigns), an international workshop covering evaluation exercises for the semantic analysis of text that gives participants the chance to apply their research to real-world scenarios and to comparatively evaluate their systems. The goal of these evaluation exercises is to measure one or more "qualities" of an algorithm or a system, in order to determine whether (or to what extent) the system meets the goals of its designers or the needs of its users. The nature of the TempEval tasks gave Georgiana the opportunity to demonstrate the practical applicability and international appeal of her PhD research. Georgiana's top placement in this competition is not only a great accomplishment, but also a reliable validation of her research.

(c) 2006 - 2011 Research Group in Computational Linguistics
Last modified: May 29 2007