Algraeph is a tool to manually align linguistic graphs, such as phrase structure trees or dependency graphs, where each node corresponds to a subsequence of the analyzed input sentence. It allows you to express the similarity between two graphs by aligning their nodes and attaching relation labels to these alignments.
Graphs are read from one or more graphbanks (or treebanks). Algraeph currently supports graphs in the general GraphML format and in the Alpino format (for Dutch). Alignment relations are user-defined. The alignments are stored in a simple XML format, which can be used for further processing. The result, a parallel graph corpus, is a useful data set for many tasks in computational linguistics and natural language processing, such as automatic summarization, automatic translation, paraphrase extraction, and recognizing textual entailment. The DAESO library provides support for creating, processing and exploiting such aligned parallel corpora.
Algraeph is implemented in the Python programming language using the wxPython GUI toolkit. It has been tested on Mac OS X, GNU/Linux and MS Windows, but should run on any platform supported by Python, wxPython and Graphviz.
So you managed to install Algraeph and now you are in a hurry to see if this is of any use to you? Sounds familiar :-) Try this:
...
This section explains how to use Algraeph for manually aligning graphs.
...
Folding: When folding a node, all successor nodes are hidden, the folded node is marked by a black rectangle, and the menu item becomes checked. When unfolding a node, all successor nodes are revealed, the black rectangle mark is removed, and the menu item becomes unchecked.
This section explains how to create your own input for Algraeph in the form of graphbanks and a parallel graph corpus. You can create your own XML files, but we recommend using the Daeso Framework Python library, which provides a convenient API for creating, reading, manipulating and writing parallel graph corpora.
Algraeph supports the general GraphML format. We suggest having a look at the GraphML Primer and at the examples in the included "data/graphml" directory.
Notice however that there are a few additional requirements on top of those defined by the GraphML Schema/DTD:
A minimal example is presented below:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="label" for="node" attr.name="label" attr.type="string"/>
  <key id="tokens" attr.name="tokens" attr.type="string"/>
  <key id="root" for="graph" attr.name="root" attr.type="string"/>
  <graph id="g0" edgedefault="directed">
    <data key="tokens">spam and eggs</data>
    <data key="root">n0</data>
    <node id="n0">
      <data key="label">and</data>
      <data key="tokens">spam and eggs</data>
    </node>
    <node id="n1">
      <data key="label">spam</data>
      <data key="tokens">spam</data>
    </node>
    <node id="n2">
      <data key="label">eggs</data>
      <data key="tokens">eggs</data>
    </node>
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
  </graph>
</graphml>
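To check that a hand-written graphbank has the shape shown in the minimal example, it can help to read it back by program. The following is a minimal sketch that uses only Python's standard xml.etree.ElementTree module; it is not the Daeso Framework API, and the helper names data_value and read_graphs are made up for this illustration. The embedded sample repeats the graph above, with the XML declaration and schema location left out for brevity.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this namespace, so every lookup must be qualified.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# The minimal graphbank from the example (declaration and schema location omitted).
SAMPLE = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="label" for="node" attr.name="label" attr.type="string"/>
  <key id="tokens" attr.name="tokens" attr.type="string"/>
  <key id="root" for="graph" attr.name="root" attr.type="string"/>
  <graph id="g0" edgedefault="directed">
    <data key="tokens">spam and eggs</data>
    <data key="root">n0</data>
    <node id="n0"><data key="label">and</data><data key="tokens">spam and eggs</data></node>
    <node id="n1"><data key="label">spam</data><data key="tokens">spam</data></node>
    <node id="n2"><data key="label">eggs</data><data key="tokens">eggs</data></node>
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
  </graph>
</graphml>
"""


def data_value(elem, key):
    """Return the text of the direct <data key=...> child of elem, or None."""
    for data in elem.findall("g:data", NS):
        if data.get("key") == key:
            return data.text
    return None


def read_graphs(graphml_text):
    """Map each graph id to its root id, tokens, nodes and edges."""
    graphs = {}
    for graph in ET.fromstring(graphml_text).findall("g:graph", NS):
        graphs[graph.get("id")] = {
            "root": data_value(graph, "root"),
            "tokens": data_value(graph, "tokens"),
            "nodes": {
                node.get("id"): {
                    "label": data_value(node, "label"),
                    "tokens": data_value(node, "tokens"),
                }
                for node in graph.findall("g:node", NS)
            },
            "edges": [
                (edge.get("source"), edge.get("target"))
                for edge in graph.findall("g:edge", NS)
            ],
        }
    return graphs


graphs = read_graphs(SAMPLE.strip())
print(graphs["g0"]["root"], graphs["g0"]["nodes"]["n1"])
```

Reading the "root" and "tokens" values back like this is a quick way to spot a missing data element before opening the file in Algraeph.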
Algraeph represents a parallel graph corpus in a straightforward XML format. If you want to use Algraeph to annotate your own data, you will have to create your own parallel graph corpus in this format, either by hand or by program. The following is a minimal example of a parallel graph corpus:
<?xml version="1.0" encoding="utf-8"?>
<parallel_graph_corpus>
  <corpus_meta_data>
    <comment>simple alignment example</comment>
  </corpus_meta_data>
  <graphbanks>
    <file id="1" format="graphml">simple-english-graphbank.xml</file>
    <file id="2" format="graphml">simple-dutch-graphbank.xml</file>
  </graphbanks>
  <node_relations>
    <relation>equals</relation>
  </node_relations>
  <aligned_graphs>
    <graph_pair from_bank_id="1" from_graph_id="g0" to_bank_id="2" to_graph_id="g0">
      <aligned_nodes>
        <node_pair from_node_id="n0" relation="equals" to_node_id="n0"/>
        <node_pair from_node_id="n1" relation="equals" to_node_id="n1"/>
        <node_pair from_node_id="n2" relation="equals" to_node_id="n2"/>
      </aligned_nodes>
      <graph_meta_data>
        <comment>From the "Spam" sketch by Monty Python.</comment>
      </graph_meta_data>
    </graph_pair>
  </aligned_graphs>
</parallel_graph_corpus>
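Creating such a corpus "by program" can be as simple as assembling the elements with Python's standard xml.etree.ElementTree module. The sketch below is one possible way to do it, not the Daeso Framework API; the function build_corpus and its argument layout are assumptions made for this illustration. The element and attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def build_corpus(banks, relations, graph_pairs, comment=None):
    """Build a <parallel_graph_corpus> element (hypothetical helper).

    banks:       list of (bank_id, format, filename)
    relations:   list of relation labels
    graph_pairs: list of (from_bank_id, from_graph_id, to_bank_id,
                 to_graph_id, node_pairs), where node_pairs is a list of
                 (from_node_id, relation, to_node_id)
    """
    corpus = ET.Element("parallel_graph_corpus")

    # The corpus meta data element is optional, so only emit it when needed.
    if comment is not None:
        meta = ET.SubElement(corpus, "corpus_meta_data")
        ET.SubElement(meta, "comment").text = comment

    graphbanks = ET.SubElement(corpus, "graphbanks")
    for bank_id, fmt, filename in banks:
        file_elem = ET.SubElement(graphbanks, "file", {"id": bank_id, "format": fmt})
        file_elem.text = filename

    node_relations = ET.SubElement(corpus, "node_relations")
    for label in relations:
        ET.SubElement(node_relations, "relation").text = label

    aligned_graphs = ET.SubElement(corpus, "aligned_graphs")
    for from_bank, from_graph, to_bank, to_graph, node_pairs in graph_pairs:
        graph_pair = ET.SubElement(aligned_graphs, "graph_pair", {
            "from_bank_id": from_bank,
            "from_graph_id": from_graph,
            "to_bank_id": to_bank,
            "to_graph_id": to_graph,
        })
        aligned_nodes = ET.SubElement(graph_pair, "aligned_nodes")
        for from_node, relation, to_node in node_pairs:
            ET.SubElement(aligned_nodes, "node_pair", {
                "from_node_id": from_node,
                "relation": relation,
                "to_node_id": to_node,
            })
    return corpus


corpus = build_corpus(
    banks=[("1", "graphml", "simple-english-graphbank.xml"),
           ("2", "graphml", "simple-dutch-graphbank.xml")],
    relations=["equals"],
    graph_pairs=[("1", "g0", "2", "g0",
                  [("n0", "equals", "n0"),
                   ("n1", "equals", "n1"),
                   ("n2", "equals", "n2")])],
    comment="simple alignment example",
)
xml_text = ET.tostring(corpus, encoding="unicode")
print(xml_text)
```

To start annotating from scratch, pass an empty node_pairs list for each graph pair; the alignments can then be added in Algraeph.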
The XML elements are explained in the following table:
Element: | Function: |
parallel_graph_corpus | The root container for a parallel graph corpus. |
corpus_meta_data | Optional element for specifying arbitrary corpus meta data. Algraeph will not change its content. |
graphbanks | Defines the graphbanks as one or more "file" elements. |
file | Specifies the filename for loading a graphbank, which may be a base name, a path relative to the location of the corpus file, or an absolute path (all in Unix format with forward slashes). The mandatory "id" attribute assigns a unique id to every graphbank. The mandatory "format" attribute specifies the format of the graphbank, which can currently be either "graphml" or "alpino". |
node_relations | Defines the labels that can be assigned to node relations as one or more "relation" elements. |
relation | Defines a single node relation. |
aligned_graphs | Defines the aligned pairs of graphs as a list of zero or more "graph_pair" elements. |
graph_pair | Defines a pair of aligned graphs in terms of four mandatory attributes. The "from_bank_id" is the id of the graphbank (cf. the "file" element) which contains the source graph. The "from_graph_id" is the id of a graph within this graphbank. Similarly, "to_bank_id" and "to_graph_id" identify a unique target graph. |
aligned_nodes | Defines the aligned nodes as a list of zero or more "node_pair" elements. |
node_pair | Defines a pair of aligned nodes in terms of three mandatory attributes. The "from_node_id" is the id of a node in the source graph. The "to_node_id" is the id of a node in the target graph. The "relation" is the relation that holds between these nodes (cf. the "node_relations" element). |
graph_meta_data | Optional element for specifying arbitrary graph meta data. Except for the "comment" element, Algraeph will not change its contents. |
comment | An optional element for free-form comments regarding the graph alignment. Content corresponds to the Comments text box in Algraeph. |
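The table implies a few cross-references that are easy to get wrong in a hand-written corpus: graphbank ids used by "graph_pair" must match a "file" element, relation labels should be declared under "node_relations", and relative filenames are resolved against the location of the corpus file. The following sketch checks these with Python's standard library only. Whether Algraeph itself performs the same checks is not stated here, so treat it as a sanity check to run before loading a file. The helper names, the corpus path, the absolute graphbank path and the undeclared "restates" relation are all made up for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

# A small corpus with one deliberately undeclared relation ("restates").
CORPUS = """
<parallel_graph_corpus>
  <graphbanks>
    <file id="1" format="graphml">simple-english-graphbank.xml</file>
    <file id="2" format="graphml">/data/simple-dutch-graphbank.xml</file>
  </graphbanks>
  <node_relations>
    <relation>equals</relation>
  </node_relations>
  <aligned_graphs>
    <graph_pair from_bank_id="1" from_graph_id="g0" to_bank_id="2" to_graph_id="g0">
      <aligned_nodes>
        <node_pair from_node_id="n0" relation="equals" to_node_id="n0"/>
        <node_pair from_node_id="n1" relation="restates" to_node_id="n1"/>
      </aligned_nodes>
    </graph_pair>
  </aligned_graphs>
</parallel_graph_corpus>
"""


def bank_paths(corpus, corpus_path):
    """Map graphbank ids to paths, resolving relative names against the corpus file."""
    paths = {}
    for file_elem in corpus.findall("graphbanks/file"):
        name = file_elem.text.strip()
        # Filenames use forward slashes, so posixpath is used on every platform.
        if not posixpath.isabs(name):
            name = posixpath.join(posixpath.dirname(corpus_path), name)
        paths[file_elem.get("id")] = name
    return paths


def check_corpus(corpus):
    """Return a list of problems found; the list is empty if none were found."""
    bank_ids = {f.get("id") for f in corpus.findall("graphbanks/file")}
    declared = {r.text for r in corpus.findall("node_relations/relation")}
    problems = []
    for graph_pair in corpus.findall("aligned_graphs/graph_pair"):
        for attr in ("from_bank_id", "to_bank_id"):
            if graph_pair.get(attr) not in bank_ids:
                problems.append(f"unknown graphbank id {graph_pair.get(attr)!r} in {attr}")
        for pair in graph_pair.findall("aligned_nodes/node_pair"):
            if pair.get("relation") not in declared:
                problems.append(f"undeclared relation {pair.get('relation')!r}")
    return problems


corpus = ET.fromstring(CORPUS.strip())
paths = bank_paths(corpus, "/home/user/corpora/simple.xml")
problems = check_corpus(corpus)
print(paths)
print(problems)
```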
This section contains reference information on all available menus and keyboard shortcuts.
Item: | Shortcut: | Function: |
Fold Node | Shift + Left Mouse Button | Hides or reveals successor nodes. This function is enabled in the pop-up menu when right-clicking on a node. |
Unfold All Node | Ctrl + U | Unfolds all folded nodes. |
Auto Fold Equals | | Automatically fold all nodes that are aligned with an "equals" relation. |
Mark Aligned Nodes | | Already aligned nodes are marked by rendering them in gray, leaving unaligned nodes rendered in blue, red or black. |
Mark Selected Nodes | Ctrl + M | Selected nodes are marked by a yellow background. |
Co-select Aligned Node | Ctrl + K | When selecting a node in the left graph, the aligned node in the right graph (if any) is automatically selected as well. |
Order Nodes | | Enforce strict ordering of nodes (Alpino format only). |
Label Edges | | Show edge labels. |
Mark Selected Alignments | Ctrl + A | The alignments of the currently selected nodes (if any) are rendered in yellow. |
Hide Alignments | Ctrl + H | Hide all alignments, except those resulting from the Mark Selected Alignments option. |
Save image | | Save the current view to a file (the default image format is PNG). |