Research questions and problems to be addressed
This project will investigate the extent to which methods from computational linguistics can be
used to automatically compile lists of resources. In addition to the system to be developed, the
project will provide new insights into the discourse structure of email messages and web pages,
and will create a corpus of emails and web pages containing information about resources. This
corpus will be
annotated for inter- and intra-document coreference, and for important notions such as names.
Emerging areas of computational linguistics, such as cross-document coreference and
multi-document summarisation, will be investigated. A new evaluation methodology will also be
developed. Templates
will be used to encode information about resources. Templates are normally built by experts,
which makes them expensive, so the project will seek a semi-automatic, corpus-based template
acquisition process. All the modules developed in this project will be tuned to process web
pages, which will benefit other researchers processing similar texts.