Algraeph User Manual

Erwin Marsi
Tilburg centre for Cognition and Communication
Tilburg University
The Netherlands
e.c.marsi@uvt.nl

(Version $Id: algraeph-user-manual.htm 409 2008-02-28 08:39:35Z emarsi $)
Even though this text is work in progress, it contains all essential information to work with Algraeph.

Contents

  1. Contents
  2. Summary
  3. Quick start
  4. Introduction
  5. Using Algraeph
  6. Creating your own graphbanks and parallel graph corpora
  7. Reference

Summary

Algraeph is a tool for manually aligning linguistic graphs, such as phrase structure trees or dependency graphs, where each node corresponds to a subsequence of the analyzed input sentence. It allows you to express the similarity between two graphs by aligning their nodes and attaching relation labels to these alignments.

Graphs are read from one or more graphbanks (or treebanks). Algraeph currently supports graphs in the general GraphML format and in the Alpino format (for Dutch). Alignment relations are user-defined. The alignments are stored in a simple XML format, which can be used for further processing. The result - a parallel graph corpus - is a useful data set for many tasks in computational linguistics and natural language processing such as automatic summarization, automatic translation, paraphrase extraction, recognizing textual entailment, etc. The DAESO library provides support for creating, processing and exploiting such aligned parallel corpora.

Algraeph is implemented in the Python programming language using the wxPython GUI toolkit. It has been tested on Mac OS X, GNU Linux and MS Windows, but should run on any platform which is supported by Python, wxPython and Graphviz.

Quick start

So you managed to install Algraeph and now you are in a hurry to see if this is of any use to you? Sounds familiar :-) Try this:

Introduction

...

Using Algraeph

This section explains how to use Algraeph for manually aligning graphs.

...

Folding: When folding a node, all successor nodes are hidden, the folded node is marked by a black rectangle, and the menu item becomes checked. When unfolding a node, all successor nodes are revealed, the black rectangle mark is removed, and the menu item becomes unchecked.

Creating your own graphbanks and parallel graph corpora

This section explains how to create your own input for Algraeph in the form of graphbanks and a parallel graph corpus. You can create your own XML files, but we recommend using the Daeso Framework Python library, which provides a convenient API for creating, reading, manipulating and writing parallel graph corpora.

Your own graphbanks

Algraeph supports the general GraphML format. We suggest having a look at the GraphML Primer and at the examples in the included "data/graphml" directory.

Notice however that there are a few additional requirements on top of those defined by the GraphML Schema/DTD:

  1. Every node must have a "tokens" attribute which defines the token subsequences displayed in the text boxes at the bottom when a node is selected. In GraphML format, "tokens" is not a direct XML attribute of the "node" element, but character data contained in an embedded "data" element.
  2. Similarly, every graph must have a "tokens" attribute which defines the tokens displayed in the text boxes at the top of the window.
  3. Every graph must have a "root" attribute which defines a unique root node.
  4. Every node must have a "label" attribute.
  5. Edges may have a "label" attribute.
  6. These GraphML attributes must be appropriately defined by means of a "key" element.

A minimal example is presented below:

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">

  <key id="label" for="node" attr.name="label" attr.type="string"/>
  <key id="tokens" attr.name="tokens" attr.type="string"/>
  <key id="root" for="graph" attr.name="root" attr.type="string"/>
  
  <graph id="g0" edgedefault="directed">  
    <data key="tokens">spam and eggs</data>
    <data key="root">n0</data>
    
    <node id="n0">
      <data key="label">and</data>
      <data key="tokens">spam and eggs</data>
    </node>
    <node id="n1">
      <data key="label">spam</data>
      <data key="tokens">spam</data>
    </node>
    <node id="n2">
      <data key="label">eggs</data>
      <data key="tokens">eggs</data>
    </node>
    
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
    
  </graph>
</graphml>
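
The additional requirements listed above can be checked mechanically before loading a graphbank into Algraeph. The following is a minimal sketch using only the Python standard library; it is not part of Algraeph or the Daeso library. The function name check_graphbank and the shortened SAMPLE graph are hypothetical, and the sketch assumes that each "key" element uses the attribute name as its id, as in the example above.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this namespace (taken from the example above).
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def data_values(element):
    """Map each <data key="..."> child of an element to its text."""
    return {d.get("key"): (d.text or "") for d in element.findall("g:data", NS)}


def check_graphbank(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    # Requirement 6: the attributes must be declared by "key" elements.
    # Assumption: the key id equals the attribute name.
    declared = {k.get("id") for k in root.findall("g:key", NS)}
    for name in ("label", "tokens", "root"):
        if name not in declared:
            problems.append(f"missing <key> declaration for '{name}'")
    for graph in root.findall("g:graph", NS):
        gid = graph.get("id")
        gdata = data_values(graph)
        node_ids = {n.get("id") for n in graph.findall("g:node", NS)}
        # Requirement 2: every graph has "tokens".
        if "tokens" not in gdata:
            problems.append(f"graph {gid}: no 'tokens'")
        # Requirement 3: every graph has a "root" that names one of its nodes.
        if gdata.get("root") not in node_ids:
            problems.append(f"graph {gid}: 'root' is missing or not a node id")
        # Requirements 1 and 4: every node has "tokens" and "label".
        for node in graph.findall("g:node", NS):
            ndata = data_values(node)
            for name in ("label", "tokens"):
                if name not in ndata:
                    problems.append(f"node {node.get('id')}: no '{name}'")
    return problems


# Hypothetical, shortened version of the minimal example.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="label" for="node" attr.name="label" attr.type="string"/>
  <key id="tokens" attr.name="tokens" attr.type="string"/>
  <key id="root" for="graph" attr.name="root" attr.type="string"/>
  <graph id="g0" edgedefault="directed">
    <data key="tokens">spam and eggs</data>
    <data key="root">n0</data>
    <node id="n0">
      <data key="label">and</data>
      <data key="tokens">spam and eggs</data>
    </node>
    <node id="n1">
      <data key="label">spam</data>
      <data key="tokens">spam</data>
    </node>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>"""

print(check_graphbank(SAMPLE))  # prints []
```

The sketch only covers the requirements stated in the list; it does not validate the file against the GraphML Schema itself.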

Your own parallel graph corpora

Algraeph represents a parallel graph corpus in a straightforward XML format. If you want to use Algraeph to annotate your own data, you will have to create your own parallel graph corpus in this format, either by hand or by program. The following is a minimal example of a parallel graph corpus:

<?xml version="1.0" encoding="utf-8"?>
<parallel_graph_corpus>
  <corpus_meta_data>
    <comment>simple alignment example</comment>
  </corpus_meta_data>
  <graphbanks>
    <file id="1" format="graphml">simple-english-graphbank.xml</file>
    <file id="2" format="graphml">simple-dutch-graphbank.xml</file>
  </graphbanks>
  <node_relations>
    <relation>equals</relation>
  </node_relations>
  <aligned_graphs>
    <graph_pair from_bank_id="1" from_graph_id="g0" to_bank_id="2" to_graph_id="g0">
      <aligned_nodes>
        <node_pair from_node_id="n0" relation="equals" to_node_id="n0"/>
        <node_pair from_node_id="n1" relation="equals" to_node_id="n1"/>
        <node_pair from_node_id="n2" relation="equals" to_node_id="n2"/>
      </aligned_nodes>
      <graph_meta_data>
        <comment>From the "Spam" sketch by Monty Python.</comment>
      </graph_meta_data>
    </graph_pair>
  </aligned_graphs>
</parallel_graph_corpus>
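
If you prefer to create such a file by program without the Daeso library, the structure above can be produced with the Python standard library. The following is a minimal sketch, not the Daeso API: the function build_corpus and its argument layout are hypothetical, and the element and attribute names are copied from the minimal example above.

```python
import xml.etree.ElementTree as ET


def build_corpus(banks, relations, graph_pairs, comment=None):
    """Build a <parallel_graph_corpus> element (hypothetical helper).

    banks:       {bank_id: (format, filename)}
    relations:   list of relation labels
    graph_pairs: list of (from_bank_id, from_graph_id, to_bank_id,
                 to_graph_id, node_pairs), where node_pairs is a list of
                 (from_node_id, relation, to_node_id)
    """
    root = ET.Element("parallel_graph_corpus")
    if comment is not None:
        meta = ET.SubElement(root, "corpus_meta_data")
        ET.SubElement(meta, "comment").text = comment
    banks_el = ET.SubElement(root, "graphbanks")
    for bank_id, (fmt, filename) in banks.items():
        file_el = ET.SubElement(banks_el, "file", {"id": bank_id, "format": fmt})
        file_el.text = filename
    rels_el = ET.SubElement(root, "node_relations")
    for label in relations:
        ET.SubElement(rels_el, "relation").text = label
    graphs_el = ET.SubElement(root, "aligned_graphs")
    for from_bank, from_graph, to_bank, to_graph, node_pairs in graph_pairs:
        pair_el = ET.SubElement(graphs_el, "graph_pair", {
            "from_bank_id": from_bank, "from_graph_id": from_graph,
            "to_bank_id": to_bank, "to_graph_id": to_graph})
        nodes_el = ET.SubElement(pair_el, "aligned_nodes")
        for from_node, relation, to_node in node_pairs:
            ET.SubElement(nodes_el, "node_pair", {
                "from_node_id": from_node, "relation": relation,
                "to_node_id": to_node})
    return root


# Rebuild the minimal example (without the optional graph meta data).
corpus = build_corpus(
    banks={"1": ("graphml", "simple-english-graphbank.xml"),
           "2": ("graphml", "simple-dutch-graphbank.xml")},
    relations=["equals"],
    graph_pairs=[("1", "g0", "2", "g0",
                  [("n0", "equals", "n0"),
                   ("n1", "equals", "n1"),
                   ("n2", "equals", "n2")])],
    comment="simple alignment example",
)
xml_text = ET.tostring(corpus, encoding="unicode")
```

To save the result including the XML declaration, ET.ElementTree(corpus).write(filename, encoding="utf-8", xml_declaration=True) can be used, with a filename of your choice. A corpus may also start with an empty "aligned_nodes" list for each graph pair, leaving the actual alignment to be done in Algraeph.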

The XML elements are explained in the following table:

Element: Function:
parallel_graph_corpus: The root container for a parallel graph corpus.
corpus_meta_data: Optional element for specifying arbitrary corpus meta data. Algraeph will not change its content.
graphbanks: Defines the graphbanks as one or more "file" elements.
file: Specifies the filename for loading a graphbank, which may be a base name, a path relative to the location of the corpus file, or an absolute path (all in Unix format with forward slashes). The mandatory "id" attribute assigns a unique id to every graphbank. The mandatory "format" attribute specifies the format of the graphbank, which can currently be either "graphml" or "alpino".
node_relations: Defines the labels that can be assigned to node relations as one or more "relation" elements.
relation: Defines a single node relation.
aligned_graphs: Defines the aligned pairs of graphs as a list of zero or more "graph_pair" elements.
graph_pair: Defines a pair of aligned graphs in terms of four mandatory attributes. The "from_bank_id" is the id of the graphbank (cf. the "file" element) which contains the source graph. The "from_graph_id" is the id of a graph within this graphbank. Similarly, "to_bank_id" and "to_graph_id" identify a unique target graph.
aligned_nodes: Defines the aligned nodes as a list of zero or more "node_pair" elements.
node_pair: Defines a pair of aligned nodes in terms of three mandatory attributes. The "from_node_id" is the id of a node in the source graph. The "to_node_id" is the id of a node in the target graph. The "relation" is the relation that holds between these nodes (cf. the "node_relations" element).
graph_meta_data: Optional element for specifying arbitrary graph meta data. Except for the "comment" element, Algraeph will not change its contents.
comment: An optional element for free-form comments regarding the graph alignment. Its content corresponds to the Comments text box in Algraeph.
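
For further processing outside Algraeph, the alignments can be read back with the Python standard library. The following is a minimal sketch, not the Daeso API. The function read_alignments, the corpus location "/data/corpus.xml" and the absolute path "/banks/..." are hypothetical. The attribute names follow the minimal example above, and the sketch assumes that a base name is resolved against the directory of the corpus file, just like a relative path.

```python
import posixpath
import xml.etree.ElementTree as ET


def read_alignments(xml_text, corpus_path):
    """Return one (source, from_node, relation, target, to_node) tuple per
    node pair, where source and target are (graphbank path, graph id)."""
    root = ET.fromstring(xml_text)
    corpus_dir = posixpath.dirname(corpus_path)
    # Filenames use forward slashes; posixpath.join leaves absolute paths
    # unchanged and resolves everything else against the corpus directory.
    banks = {f.get("id"): posixpath.join(corpus_dir, f.text.strip())
             for f in root.findall("graphbanks/file")}
    relations = {r.text for r in root.findall("node_relations/relation")}
    rows = []
    for pair in root.findall("aligned_graphs/graph_pair"):
        source = (banks[pair.get("from_bank_id")], pair.get("from_graph_id"))
        target = (banks[pair.get("to_bank_id")], pair.get("to_graph_id"))
        for node_pair in pair.findall("aligned_nodes/node_pair"):
            relation = node_pair.get("relation")
            # Every relation should be declared under "node_relations".
            if relation not in relations:
                raise ValueError(f"undeclared relation: {relation!r}")
            rows.append((source, node_pair.get("from_node_id"), relation,
                         target, node_pair.get("to_node_id")))
    return rows


# Hypothetical corpus: one base name and one absolute graphbank path.
CORPUS = """<parallel_graph_corpus>
  <graphbanks>
    <file id="1" format="graphml">simple-english-graphbank.xml</file>
    <file id="2" format="graphml">/banks/simple-dutch-graphbank.xml</file>
  </graphbanks>
  <node_relations>
    <relation>equals</relation>
  </node_relations>
  <aligned_graphs>
    <graph_pair from_bank_id="1" from_graph_id="g0" to_bank_id="2" to_graph_id="g0">
      <aligned_nodes>
        <node_pair from_node_id="n0" relation="equals" to_node_id="n0"/>
        <node_pair from_node_id="n1" relation="equals" to_node_id="n1"/>
      </aligned_nodes>
    </graph_pair>
  </aligned_graphs>
</parallel_graph_corpus>"""

rows = read_alignments(CORPUS, "/data/corpus.xml")
for row in rows:
    print(row)
```

The sketch raises an error for a relation that is not declared; whether Algraeph itself rejects such a file is not specified here.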

Reference

This section contains reference information on all available menus and keyboard shortcuts.

View Menu

Item: Shortcut: Function:
Fold Node: Shift + Left Mouse Button: Hides or reveals successor nodes. This function is enabled in the pop-up menu when right-clicking on a node.
Unfold All Node: Ctrl + U: All folded nodes become unfolded.
Auto Fold Equals: (none): Automatically folds all nodes that are aligned with an "equals" relation.
Mark Aligned Nodes: (none): Already aligned nodes are marked by rendering them in gray, leaving unaligned nodes rendered in blue, red or black.
Mark Selected Nodes: Ctrl + M: Selected nodes are marked by a yellow background.
Co-select Aligned Node: Ctrl + K: When selecting a node in the left graph, the aligned node in the right graph (if any) is automatically selected as well.
Order Nodes: (none): Enforces strict ordering of nodes (Alpino format only).
Label Edges: (none): Shows edge labels.
Mark Selected Alignments: Ctrl + A: The alignments of the currently selected nodes (if any) are rendered in yellow.
Hide Alignments: Ctrl + H: Hides all alignments, except those resulting from the Mark Selected Alignments option.
Save image: (none): Saves the current view to a file (the default image format is PNG).