  • Title: Computer Assisted Patient Note Scoring
    Funding organisation: National Board of Medical Examiners
    Ref: CAPTNS
    Period: February 2010 - June 2011; Ongoing since February 2015
    Representative publication: Brief reference in NBME Annual Report (2012)

This project is concerned with the automatic assessment of student responses to the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills tests. These responses are patient notes written by participants after interacting with actors playing the role of patients. The automatic scoring of these notes builds on know-how gained in the CAID project, applied here to noisy sources with a high level of linguistic variation in spelling, abbreviation, and clinical "shorthand". This work was presented at the 2012 meeting of the National Council on Measurement in Education, held in Vancouver, Canada.


The goal of this project was to develop language technology to convert input documents into a more accessible form for readers with autistic spectrum disorder. This is done by removing obstacles to reading comprehension such as structural complexity (including long and complex sentences) and semantic ambiguity (including anaphora and figurative language). The text of the converted documents is supplemented with illustrative images, indicative summaries, document navigation aids, and pre-reading tasks aimed at improving reading comprehension. The system is personalisable to the needs of different users.

The software developed in the FIRST project is called OpenBook.

Machine Learning for the Study of Language Change

  • Title: Machine learning for the study of language change
    Funding organisation: University of Wolverhampton
    Ref: NA
    Period: December 2012 - March 2013
    Representative publication: doi: 10.1007/978-3-642-39593-2_24

In this work, a machine learning approach was used to identify the linguistic features that contribute most to the success of a classifier that assigns texts to categories representing different historical periods. The features that best discriminate between the categories are inferred to be salient for studies of language change. Experiments were conducted on the British portion of the ‘Brown family’ of corpora, using 30 different stylistic features. The classifier's performance was evaluated with feature selection based on the Mann-Whitney U test and on the CfsSubsetEval attribute selection algorithm.
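
As a rough illustration of how a Mann-Whitney U statistic can rank features by how well they separate two periods, here is a minimal Python sketch. It is not the project's implementation: the feature names, the toy values, and the helper functions `mann_whitney_u`, `separation`, and `rank_features` are all hypothetical, and the sketch ranks by effect size rather than by the significance test or the CfsSubsetEval step used in the experiments.

```python
from itertools import product


def mann_whitney_u(xs, ys):
    """U statistic for xs against ys: count of (x, y) pairs with x > y; ties count 0.5."""
    u = 0.0
    for x, y in product(xs, ys):
        if x > y:
            u += 1.0
        elif x == y:
            u += 0.5
    return u


def separation(xs, ys):
    """Rescale U to [0, 1]: 0 means no tendency either way, 1 means complete separation."""
    u = mann_whitney_u(xs, ys)
    return abs(2.0 * u / (len(xs) * len(ys)) - 1.0)


def rank_features(period_a, period_b):
    """Order feature names from most to least discriminating between the two periods.

    period_a and period_b map a feature name to its per-text values.
    """
    scores = {name: separation(period_a[name], period_b[name]) for name in period_a}
    return sorted(scores, key=scores.get, reverse=True)


# Toy values invented for illustration only (not taken from any corpus).
early = {
    "mean_sentence_length": [24.0, 26.5, 25.1, 27.3],
    "type_token_ratio": [0.41, 0.45, 0.43, 0.44],
}
late = {
    "mean_sentence_length": [19.2, 21.0, 20.4, 18.7],
    "type_token_ratio": [0.42, 0.44, 0.40, 0.46],
}

ranking = rank_features(early, late)
print(ranking)
```

In this toy data every early text has longer sentences than every late text, so that feature ranks first; the type/token ratios overlap almost completely and rank last.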


  • Title: Computer Aided Item Development
    Funding organisation: National Board of Medical Examiners
    Ref: CAID
    Period: March 2007 - March 2011
    Representative publication: doi: 10.1093/llc/fqr034

My involvement in this project concerned information extraction from clinical assessment items. The goal was to populate a database with information about clinical findings and the symptoms, anatomical locations, underlying body systems, and qualifying information associated with them. This research motivated my development of a systematic approach to text simplification.


This project involved information extraction from email messages and specialised websites. The goal was to populate a database with information about employment vacancies, forthcoming conferences, and software and resources relevant to the field of computational linguistics. The experience of working on this project motivated my development of a method for named entity recognition in the open domain.


  • Title: Named Entity Recognition in the Open Domain
    Funding organisation: University of Wolverhampton
    Ref: NA
    Period: August 1999 - July 2003
    Representative publication: here

Named entity recognition often exploits specific resources for the detection of particular types of named entity. This makes such systems ineffective in settings where the concepts/entities of interest are not known a priori. In this project, patterns proposed by Hearst (1992) are submitted as Google queries to identify the hypernyms of all named entities occurring in a document. The hypernyms are clustered by their taxonomic similarity into general classes which correspond to particular concepts/types of entity. As a result, the system is able to identify the specific concepts most likely to be relevant in any document. The hypernym collection patterns, together with elements of the clusters, can then be used to tag the identified named entities accordingly.
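
The hypernym collection step can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not the system's code: the functions `build_queries` and `extract_hypernyms` are hypothetical, the snippets are invented stand-ins for search results (no web request is made), only a handful of Hearst-style patterns are covered, and the hypernym is naively taken to be a single word. The clustering by taxonomic similarity is not shown.

```python
import re
from collections import Counter


def build_queries(entity):
    """Phrase queries instantiating a few Hearst-style patterns for one entity."""
    return [
        f'"such as {entity}"',
        f'"including {entity}"',
        f'"especially {entity}"',
        f'"{entity} and other"',
        f'"{entity} or other"',
    ]


def extract_hypernyms(entity, snippets):
    """Count candidate hypernyms found next to the entity in text snippets.

    Simplification: the hypernym is assumed to be the single word adjacent
    to the pattern, not a full noun phrase.
    """
    e = re.escape(entity)
    # "<hypernym> such as <entity>", "<hypernym>, including <entity>", ...
    before = re.compile(rf"(\w+),? (?:such as|including|especially) {e}\b", re.IGNORECASE)
    # "<entity> and other <hypernym>", "<entity> or other <hypernym>"
    after = re.compile(rf"\b{e},? (?:and|or) other (\w+)", re.IGNORECASE)
    counts = Counter()
    for snippet in snippets:
        for pattern in (before, after):
            for match in pattern.finditer(snippet):
                counts[match.group(1).lower()] += 1
    return counts


# Invented snippets standing in for search results.
snippets = [
    "He visited cities such as Wolverhampton last year.",
    "Wolverhampton and other cities in the West Midlands",
    "Several towns, including Wolverhampton, grew quickly.",
]

print(build_queries("Wolverhampton"))
print(extract_hypernyms("Wolverhampton", snippets).most_common())
```

Counting how often each candidate appears gives a simple way to prefer hypernyms that recur across several patterns before any clustering is attempted.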


  • Title: Mitkov's Anaphora Resolution System
    Funding organisation: University of Wolverhampton
    Ref: NA
    Period: October 1998 - ongoing
    Representative publication: doi:10.1007/3-540-45715-1_15

This project concerned the implementation, improvement, optimisation, and evaluation of Mitkov's (1998) knowledge-poor approach to anaphora resolution. This research motivated the development of systems to classify the function of the pronoun it (Evans, 2001) and to detect the animacy of noun phrases in English (Orasan and Evans, 2007).
