SummaryVisualisation is a program for viewing and comparing files annotated for the CAST project. Note that it is not an annotation tool; if you need to annotate files, try PALinkA. This program only visualises files annotated in our corpus. If you think it could be useful to you but your tag set is different, please contact me and I may be able to help.



Installation

The program is written in Java, so you need to have Java installed on your computer. If you don't intend to develop programs in Java, the runtime environment is enough. The program was developed with JDK 1.4, so you need that version or a newer one. Java is available from several sources, one of which is Sun. Because the program is written in Java, it should in theory run on any computer with a Java Virtual Machine. We tested the program on several versions of Linux and on Windows 95.

Another resource you need in order to run this program is The Xerces Java Parser 1.4.4. Please note the program will not work with Xerces2 Java Parser. For your convenience you can download the parser from here.

Once you have Java and The Xerces Java Parser 1.4.4 installed on your computer, download the SummaryVisualisation.jar file to your computer.


Running the program

In order to run the program you need to make sure that both xerces.jar and SummaryVisualisation.jar are in your CLASSPATH. If you saved the files in the same folder the easiest way to run the program is

java -cp SummaryVisualisation.jar:xerces.jar gui.VisualisationTool (unix)
java -cp SummaryVisualisation.jar;xerces.jar gui.VisualisationTool (windows)

Other options are:

java -cp <path to summary visualisation>SummaryVisualisation.jar:<path to xerces>xerces.jar gui.VisualisationTool (unix)
java -cp <path to summary visualisation>SummaryVisualisation.jar;<path to xerces>xerces.jar gui.VisualisationTool (windows)
where <path to summary visualisation> is the path to the SummaryVisualisation.jar file, and <path to xerces> is the path to the xerces.jar file.

You can also set the CLASSPATH:

export CLASSPATH=$CLASSPATH:<path to summary visualisation>SummaryVisualisation.jar:<path to xerces>xerces.jar (bash)
set CLASSPATH=%CLASSPATH%;<path to summary visualisation>SummaryVisualisation.jar;<path to xerces>xerces.jar (Windows batch file)
and then run java gui.VisualisationTool
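
If the program fails to start, it can help to check whether both jar files are really visible on the classpath. The following is only a minimal sketch, not part of SummaryVisualisation: the class name ClasspathCheck is made up for illustration, and org.apache.xerces.parsers.DOMParser is used on the assumption that it is a class shipped in xerces.jar. It is written for a current JDK rather than JDK 1.4, and it only checks that the classes can be found, not which version of Xerces is installed.

```java
class ClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means the class is located but its static initialisers are not run.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "gui.VisualisationTool",               // main class named in the commands above
            "org.apache.xerces.parsers.DOMParser"  // assumption: a class provided by xerces.jar
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": NOT found"));
        }
    }
}
```

Compile it with javac and run it with the same -cp value (or the same CLASSPATH setting) you use for the tool, adding the folder that contains ClasspathCheck.class.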

How to use the program

As already mentioned, this program is for visualising and comparing texts annotated in our corpus. Figure 1 presents a screen shot of the tool. Please click on the image to see all the details.

Figure 1: The main screen of the tool

As can be seen, the screen is divided in two parts. In each part, it is possible to load a file. Each sentence is displayed on one line, preceded by its ID. If the sentence is marked in any way, the ID is coloured accordingly: RED for ESSENTIAL sentences, BLUE for IMPORTANT sentences and GREEN for REFERRED sentences (for an explanation see the guidelines). The removed parts of the sentences are shown on the screen in strikethrough font.

If comments were added by annotators, they are printed after the sentence. If the sentence is linked to another one, this fact is also indicated at the end of the sentence.


The menus

The File menu allows you to load an annotated file and to exit the program. Open left loads a text into the left box and, as expected, Open right loads one into the right box. To exit the program, select Exit.
The Operations menu allows you to compare the two loaded texts. If only one text is loaded, the comparison cannot be performed (obviously!). If you load two different files rather than two different annotations of the same file, the results are unpredictable (I don't think there is any danger of wiping your HDD, but you never know, so you've been warned!)

The Compare option performs the comparison. As a result of the comparison the colours of the text change. Sentences on which the annotators disagree are marked with a RED background. If the annotators agree, the sentence is marked with a light blue background. Please note that if none of the annotators marks a sentence, that sentence is not marked in any way.
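
The rule described above can be sketched roughly as follows. This is an illustration only, not the tool's actual code: the class and method names are hypothetical, and it assumes that "agreement" means both annotators gave the sentence the same label and that a sentence marked by only one annotator counts as a disagreement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompareSketch {
    // Background a sentence could receive after Compare.
    enum Background { NONE, LIGHT_BLUE, RED }

    // A label is null when the annotator left the sentence unmarked,
    // otherwise a string such as "ESSENTIAL", "IMPORTANT" or "REFERRED".
    static Background compare(String left, String right) {
        if (left == null && right == null) {
            return Background.NONE;        // nobody marked the sentence
        }
        if (left != null && left.equals(right)) {
            return Background.LIGHT_BLUE;  // assumption: same label means agreement
        }
        return Background.RED;             // different labels, or only one annotator marked it
    }

    // Compares two annotations of the same file, sentence by sentence.
    static List<Background> compareAll(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("annotations must cover the same sentences");
        }
        List<Background> result = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            result.add(compare(left.get(i), right.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("ESSENTIAL", null, "IMPORTANT", null);
        List<String> second = Arrays.asList("ESSENTIAL", null, "ESSENTIAL", "IMPORTANT");
        System.out.println(compareAll(first, second));
    }
}
```

The size check reflects the warning above: the comparison only makes sense for two annotations of the same file.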

Select the Show only marked sentences option if you want only the marked sentences to be displayed. After you select this option you need to click Refresh in order to see the change.

In addition to comparing the labelling of the sentences, the program also computes the cosine distance between the two texts. To compute it, you first need to load the document frequency file by selecting Load document frequency. This needs to be done only once; after that you can compare as many texts as you want, as long as you don't close the program. If the document frequency file is loaded, the program prints two values to standard error whenever you select Compare. The first is the similarity between the extracts that contain both ESSENTIAL and IMPORTANT sentences, whereas the second covers only ESSENTIAL sentences.
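
For readers unfamiliar with the measure, here is a minimal sketch of how a cosine similarity weighted by document frequencies is commonly computed. It is not the tool's own implementation: the tokenisation, the tf * log(N/df) weighting, the handling of unseen terms and all names below are assumptions, and the format of the document frequency file is not covered (the frequencies are simply passed in as a map). If the tool reports a distance rather than a similarity, it may be one minus the value computed here.

```java
import java.util.HashMap;
import java.util.Map;

class CosineSketch {
    // Counts how often each lower-cased word occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Weights each term by tf * log(numDocs / df). Assumption: a term missing
    // from the document frequency map is treated as occurring in one document.
    static Map<String, Double> weigh(Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 1);
            weights.put(e.getKey(), e.getValue() * Math.log((double) numDocs / docFreq));
        }
        return weights;
    }

    // Cosine of the angle between two sparse vectors; 0.0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Convenience wrapper: similarity of two extracts given as plain text.
    static double similarity(String a, String b, Map<String, Integer> df, int numDocs) {
        return cosine(weigh(termFrequencies(a), df, numDocs),
                      weigh(termFrequencies(b), df, numDocs));
    }
}
```

In this sketch the two values mentioned above would correspond to two calls to similarity: one on the extracts built from the ESSENTIAL and IMPORTANT sentences, and one on the extracts built from the ESSENTIAL sentences only.
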
The Help menu provides information about the program and lets you check whether a new version is available. The How to use the program option contains more or less the same information as the How to use the program section of this document. The Check for updates option checks whether there is a new version of the program. No information from your computer is sent during this check. Because the program needs to retrieve a file from the Internet, you need to be connected to the Internet. If an update is found, it is not downloaded automatically; you will have to download and install it yourself.

Legal issues

The program is free for research purposes. Please note that the program is distributed AS IS and no responsibility is accepted for any damage, harm, loss of data or any injury caused directly or indirectly by this program.


Contact me

If you have comments about the program or this document, please contact me on C.Orasan@wlv.ac.uk.

Last changed: 06 Sep 2004