CAST: Term-based summariser v1.0

CAST: Term-based summariser is a reduced, command-line version of the summarisation environment developed in the CAST project. It implements several term-weighting methods that can be used to produce summaries. In addition, the program offers a wide range of options that influence the way a summary is produced. If you want to try the program without installing it on your computer, you can check out our online web demo, which offers very similar functionality.

Installation

To install the program, download the archive and unpack it in the directory (folder, for Windows users :-) from which you want to run it.

Requirements

The program requires J2SE Runtime Environment (JRE) 1.4.2 or higher, which can be freely downloaded from http://java.sun.com. It will also run if you have installed the J2SE Development Kit (JDK). It will probably run with other versions of Java as well. The program was successfully tested using JDK 1.4.2 and JDK 1.5 under Linux (Gentoo). It should also run on any operating system for which a suitable Java Virtual Machine is available.
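A quick way to check this requirement before running the program is to ask the shell whether a java launcher is on the PATH and which version it reports. The sketch below assumes a POSIX shell; the helper name have_cmd is made up for illustration and is not part of the distribution.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command in this shell.
# (Hypothetical helper, not part of the CAST distribution.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner on standard error, so redirect it
    java -version 2>&1 | head -n 1
else
    echo "java not found on PATH; install a JRE (1.4.2 or later) first" >&2
fi
```

If the version line reports something older than 1.4.2, install a newer JRE before continuing.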

Running the program

In order to run the program you need to type a command similar to the following:

java -cp CAST.jar:jwnl/jwnl.jar:jwnl/commons-logging.jar CAST.cmd.TermWeightingSummariser <arguments>
or, if you are running Windows:
java -cp CAST.jar;jwnl/jwnl.jar;jwnl/commons-logging.jar CAST.cmd.TermWeightingSummariser <arguments>
You might also need to add java to your PATH or give the full path to the java executable. The program accepts several arguments. If started without any arguments, it displays the available ones.
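The only difference between the two commands above is the classpath separator (":" on Unix-like systems, ";" on Windows). A small wrapper can hide that difference. The following is a minimal sketch, assuming a POSIX shell and that it is run from the directory where the archive was unpacked; the function names cast_classpath and run_cast are hypothetical and not shipped with the program.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with CAST): pick the classpath separator
# from the OS name and pass all arguments through to the summariser.

# cast_classpath OSNAME: print the classpath for the given `uname -s` value.
cast_classpath() {
    case "$1" in
        CYGWIN*|MINGW*|MSYS*) sep=';' ;;   # Unix-style shells on Windows: Java expects ';'
        *)                    sep=':' ;;   # Linux and other Unix-like systems
    esac
    echo "CAST.jar${sep}jwnl/jwnl.jar${sep}jwnl/commons-logging.jar"
}

# run_cast ARGS...: launch the summariser with the arguments given.
run_cast() {
    java -cp "$(cast_classpath "$(uname -s)")" CAST.cmd.TermWeightingSummariser "$@"
}
```

With these definitions loaded, `run_cast <arguments>` is equivalent to the full command for the current platform, and `run_cast` on its own lists the available arguments.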

Example:

java -cp CAST.jar:jwnl/jwnl.jar:jwnl/commons-logging.jar CAST.cmd.TermWeightingSummariser --weighting=tfridf --collection-info=data/reuters-mi-10-word.txt --document-frequency=data/reuters-df-10-word.txt --comp-rate=15 --output=txt --token=word --display-terms=10 --evaluate-with=pr 476820newsML-to-ann-done.xml
This produces a plain-text 15% summary of the file 476820newsML-to-ann-done.xml using the tfridf weighting method. The collection information is read from data/reuters-mi-10-word.txt and the document frequencies from data/reuters-df-10-word.txt. Provided that the input file contains the necessary annotation, the quality of the summary is evaluated using precision and recall. In addition, the top 10 terms are displayed. Simple, isn't it? :)
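For readers unfamiliar with the evaluation measures: precision and recall are conventionally computed from the overlap between the units selected for the summary and the units marked as important in the annotation. The sketch below shows only this standard arithmetic. Which unit the program counts and how it reports the scores is not stated here, so the function name, its inputs, and the output format are illustrative assumptions.

```shell
#!/bin/sh
# precision_recall CORRECT SELECTED ANNOTATED  (hypothetical helper)
#   CORRECT   = selected units that are also marked in the annotation
#   SELECTED  = units included in the summary
#   ANNOTATED = units marked as important in the annotation
precision_recall() {
    awk -v c="$1" -v s="$2" -v a="$3" 'BEGIN {
        p = (s > 0) ? c / s : 0    # precision: share of the summary that is correct
        r = (a > 0) ? c / a : 0    # recall: share of the annotation that was found
        printf "precision=%.2f recall=%.2f\n", p, r
    }'
}

# Example: 4 units selected, 6 annotated, 3 in common
precision_recall 3 4 6
```

For these made-up counts the function prints precision=0.75 recall=0.50.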

Command line arguments

The arguments control the behaviour of the program. To obtain a list of valid arguments, run the program without any arguments. The complete list of arguments can be found here.

Legal information

CAST: Term-based summariser is distributed without any warranty, either expressed or implied. The program can be freely used for research purposes. If you find the program useful please acknowledge its use in your research.

Contact Information

Send bug reports, questions, feature requests etc. to Constantin Orasan (email: C.Orasan@wlv.ac.uk).



Last changed: 20 Apr 2005