CAST: Term-based summariser v1.0

The CAST: Term-based summariser is a reduced, command-line version of the summarisation environment developed in the CAST project. It implements several term-weighting methods that can be used to produce summaries. In addition, the program offers a wide range of options that influence the way a summary is produced. If you want to try the program without installing it on your computer, you can check out our online web demo, which offers very similar functionality.

Installation

To install this program, all you need to do is download the archive and unpack it in the directory (folder for Windows users :-) where you want to run the program.

Requirements

The program requires J2SE Runtime Environment (JRE) 1.4.2 or higher, which can be freely downloaded from http://java.sun.com. It also runs if you have the J2SE Development Kit (JDK) installed, and it will probably run with other versions of Java as well. The program was successfully tested with JDK 1.4.2 and JDK 1.5 under Linux (Gentoo). It should also run on any operating system for which a suitable Java Virtual Machine is available.

Running the program

To run the program, type a command similar to the following:
java -cp CAST.jar:jwnl/jwnl.jar:jwnl/commons-logging.jar CAST.cmd.TermWeightingSummariser <arguments>
or, on Windows, where the classpath entries are separated by semicolons instead of colons:
java -cp CAST.jar;jwnl/jwnl.jar;jwnl/commons-logging.jar CAST.cmd.TermWeightingSummariser <arguments>
You might also need to add java to your PATH or give the full path
to the java executable. The program accepts several arguments. If it
is started without any arguments, it displays the available ones.
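If you want to start the summariser from another Java program rather than from a shell, the only platform-specific detail is the classpath separator, which Java exposes as `File.pathSeparator` (":" on Linux/Unix, ";" on Windows). The following is a minimal sketch, not part of the CAST distribution: the class name `RunCastSummariser` is hypothetical, it assumes `java` is on the PATH and that it is run from the directory where the archive was unpacked, and it needs Java 8 or later (unlike CAST itself, which runs on 1.4.2).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical launcher for the CAST summariser (not shipped with CAST).
 * Builds the same command line as shown above, using the platform's
 * classpath separator, and runs it as a child process.
 */
public class RunCastSummariser {

    // Jar paths relative to the unpack directory, taken from the commands above.
    static final String[] CLASSPATH = {"CAST.jar", "jwnl/jwnl.jar", "jwnl/commons-logging.jar"};
    static final String MAIN_CLASS = "CAST.cmd.TermWeightingSummariser";

    /** Returns the argument vector: java -cp <classpath> <main class> <arguments...> */
    static List<String> buildCommand(String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        // ":" on Linux/Unix, ";" on Windows
        cmd.add(String.join(File.pathSeparator, CLASSPATH));
        cmd.add(MAIN_CLASS);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // The summariser's output and error messages go straight to this console.
        Process p = new ProcessBuilder(buildCommand(args)).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

After compiling it, `java RunCastSummariser --comp-rate=15 --output=txt 476820newsML-to-ann-done.xml` would pass those arguments through unchanged; run with no arguments, it simply lets the summariser print its list of available arguments.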
Example:
java -cp CAST.jar:jwnl/jwnl.jar:jwnl/commons-logging.jar CAST.cmd.TermWeightingSummariser --weighting=tfridf --collection-info=data/reuters-mi-10-word.txt --document-frequency=data/reuters-df-10-word.txt --comp-rate=15 --output=txt --token=word --display-terms=10 --evaluate-with=pr 476820newsML-to-ann-done.xml
This produces a plain-text 15% summary of the file
476820newsML-to-ann-done.xml using the tfridf weighting method. The
collection information is taken from data/reuters-mi-10-word.txt and
the document frequencies from data/reuters-df-10-word.txt. Provided
that the input file contains the necessary annotation, the quality of
the summary is evaluated using precision and recall. In addition, the
top 10 terms are displayed. Simple, isn't it? :)
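The example above combines four ideas: weighting the terms of a document, keeping the best-scoring sentences up to a compression rate, listing the top-weighted terms, and evaluating the result with precision and recall. The following Java sketch shows these ideas in their simplest generic form; it is not CAST's code. Assumptions, all of them ours: plain tf-idf, computed as tf × log((N+1)/(df+1)), stands in for tfridf and the other weighting methods, whose exact definitions are not given here; sentences are already tokenised; a sentence's score is the sum of its term weights; the compression rate is the percentage of sentences kept; and the gold standard is a set of sentence indices. The class and method names are hypothetical, and the code needs Java 8 or later.

```java
import java.util.*;

/**
 * Illustrative term-based extractive summariser. NOT the CAST implementation:
 * plain tf-idf replaces CAST's weighting methods (e.g. tfridf).
 */
public class TfIdfSummarySketch {

    /** Weight of each term in the document: tf * log((N + 1) / (df + 1)); unseen terms get df = 0. */
    static Map<String, Double> weights(List<List<String>> sentences,
                                       Map<String, Integer> docFreq, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (List<String> s : sentences)
            for (String t : s) tf.merge(t, 1, Integer::sum);  // term frequency in this document
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            w.put(e.getKey(), e.getValue() * Math.log((numDocs + 1.0) / (df + 1.0)));
        }
        return w;
    }

    /** Indices of the sentences kept at compRate percent, returned in document order. */
    static List<Integer> summarise(List<List<String>> sentences,
                                   Map<String, Double> w, int compRate) {
        int n = sentences.size();
        int keep = Math.max(1, (int) Math.round(n * compRate / 100.0));
        double[] score = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
            for (String t : sentences.get(i)) score[i] += w.getOrDefault(t, 0.0);
        }
        order.sort((a, b) -> Double.compare(score[b], score[a]));  // best first; stable on ties
        List<Integer> chosen = new ArrayList<>(order.subList(0, Math.min(keep, n)));
        Collections.sort(chosen);  // present the extract in the original sentence order
        return chosen;
    }

    /** The k highest-weighted terms (ties broken alphabetically). */
    static List<String> topTerms(Map<String, Double> w, int k) {
        List<String> terms = new ArrayList<>(new TreeSet<>(w.keySet()));  // alphabetical start
        terms.sort((a, b) -> Double.compare(w.get(b), w.get(a)));        // stable: keeps ties alphabetical
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    /** {precision, recall} of the selected sentences against the gold-standard ones. */
    static double[] precisionRecall(Collection<Integer> selected, Collection<Integer> gold) {
        Set<Integer> correct = new HashSet<>(selected);
        correct.retainAll(new HashSet<>(gold));
        double p = selected.isEmpty() ? 0.0 : (double) correct.size() / selected.size();
        double r = gold.isEmpty() ? 0.0 : (double) correct.size() / gold.size();
        return new double[] {p, r};
    }

    public static void main(String[] args) {
        // Toy, already tokenised document and made-up document frequencies (N = 100).
        List<List<String>> doc = Arrays.asList(
            Arrays.asList("oil", "prices", "rose"),
            Arrays.asList("the", "market", "was", "quiet"),
            Arrays.asList("oil", "exports", "fell"),
            Arrays.asList("the", "weather", "was", "fine"));
        String[] terms = {"the", "was", "oil", "prices", "rose", "market",
                          "quiet", "exports", "fell", "weather", "fine"};
        int[] freqs = {100, 100, 5, 20, 30, 40, 50, 10, 30, 60, 60};
        Map<String, Integer> df = new HashMap<>();
        for (int i = 0; i < terms.length; i++) df.put(terms[i], freqs[i]);

        Map<String, Double> w = weights(doc, df, 100);
        List<Integer> summary = summarise(doc, w, 50);
        System.out.println("Summary sentences: " + summary);
        System.out.println("Top terms: " + topTerms(w, 2));
        double[] pr = precisionRecall(summary, Arrays.asList(0, 1));
        System.out.println("P = " + pr[0] + ", R = " + pr[1]);
    }
}
```

With the toy data, a 50% summary keeps sentences 0 and 2 (the two mentioning "oil"), the top two terms are "oil" and "exports", and against a gold standard of sentences 0 and 1 both precision and recall are 0.5. A real run would read document frequencies from a file such as data/reuters-df-10-word.txt. Because its format is not documented here, the sketch uses an in-memory map instead.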
Command line arguments

The arguments control the behaviour of the program. To obtain a list of valid arguments, run the program without any parameters. The complete list of these parameters can be found here.

Legal information

CAST: Term-based summariser is distributed without any warranty, either expressed or implied. The program can be freely used for research purposes. If you find the program useful, please acknowledge its use in your research.

Contact information

Send bug reports, questions, feature requests, etc. to Constantin Orasan (email: C.Orasan@wlv.ac.uk).
Last changed: 20 Apr 2005