Imogen, a project closely related to Daeso
The well-known fact that similar information can be expressed in many different ways is one of the major challenges in building robust NLP applications. It is commonly assumed that such applications can be improved with knowledge of how natural language expressions relate to each other, for instance in terms of paraphrases (same semantic content, different wording) or entailments (one expression implied by the other). DAESO investigates the detection of semantic overlap between Dutch sentences and the exploitation of this knowledge in a range of NLP applications. For this purpose, tools will be developed for the automatic alignment and classification of semantic relations (between words, phrases and sentences) in Dutch. A Dutch text-to-text generation application will also be developed that fuses related sentences into a single grammatical sentence, which may be a generalization, a specification or a reformulation of the input sentences. To facilitate development and testing of these tools, an annotated monolingual Dutch parallel/comparable corpus of 1M words will be built, consisting of pairs of texts that express comparable information. The utility of the resources and tools will be demonstrated in three applications: (1) question answering (improved recall, more complete answers), (2) information extraction (improved recall), and (3) summarization (going beyond extraction: sentence compression, sentence fusion, anaphora resolution).
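To make the alignment step more concrete, here is a minimal sketch of a shallow word-overlap baseline for pairing sentences across two comparable texts. This is not the DAESO alignment tool. The function names `tokens`, `cosine` and `align` and the 0.3 threshold are hypothetical choices for illustration. The sketch only finds candidate pairs; classifying the relation within each pair (paraphrase, entailment, and so on) is a separate step that it does not attempt.

```python
import math
import re
from collections import Counter


def tokens(sentence: str) -> Counter:
    # Bag of lowercased word tokens; \w also matches accented Dutch letters.
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align(source: list[str], target: list[str], threshold: float = 0.3) -> list[tuple[int, int, float]]:
    """Pair each source sentence with its most similar target sentence.

    Hypothetical baseline: a pair is kept only if its cosine score
    reaches `threshold`. Returns (source_index, target_index, score).
    """
    target_vecs = [tokens(t) for t in target]
    pairs = []
    for i, sentence in enumerate(source):
        vec = tokens(sentence)
        scores = [cosine(vec, t_vec) for t_vec in target_vecs]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Usage: two short comparable Dutch "texts".
source = [
    "De minister opende gisteren het nieuwe station.",
    "Het regende de hele dag.",
]
target = [
    "Het nieuwe station werd gisteren door de minister geopend.",
    "Er waren veel bezoekers.",
]
pairs = align(source, target)
print(pairs)  # only the first source sentence is aligned (to target sentence 0)
```

Pure word overlap misses paraphrases that share few words, such as "opende" versus "geopend" above. Richer features (stemming, lexical similarity, syntactic matching) would be needed to catch such cases.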
For more information, see the full Project Proposal (PDF).
Research funded by the Dutch-Flemish Research Programme for Dutch Language and Speech Technology STEVIN (Spraak- en Taaltechnologische Essentiële Voorzieningen In het Nederlands, i.e. essential speech and language technology resources for Dutch).
Three years, from October 2006 until October 2009.
Daeso paper Automatic analysis of semantic similarity in comparable text through syntactic tree matching accepted for COLING 2010 in Beijing.
Release of Algraeph version 1.0
Source release of Hitaext version 1.0
Publication of the article Samengevat door de computer ("Summarized by the computer") in the Dutch newspaper Spits
Release of Dutch sentence fusion data
Pycornetto has moved to its new home at Google Code!
New release of Pycornetto (version 0.6) for compatibility with the latest NetworkX 1.0
Added slides from DAESO presentation at CLIN 2010
Released Pycornetto version 0.5
Added paper Reducing Redundancy in Multi-document Summarization Using Lexical Semantic Similarity, presented at the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009), Singapore.
Added slides from presentations at the Daeso site visit
New online sentence compression demo
Added slides from presentations at ENLG09.
Minor update of Pycornetto (version 0.4.3) fixes a bug that occurs with the latest 0.99 release of NetworkX
Minor update of Pycornetto (version 0.4.2) fixes a bug in the word similarity functions when used with the latest 0.99 release of NetworkX
Two Daeso papers accepted for the ENLG 2009 workshop at EACL: Is sentence compression an NLG task? and Clustering and Matching Headlines for Automatic Paraphrase Acquisition.
New online word similarity demo for Dutch.
First public release of Pycornetto (version 0.4), a Python interface to the Cornetto database, including corpus-based word similarity measures
The Dutch Multi-Document Summarization Webdemo is online. You can try it out here.
Added recent presentation Sentence alignment in comparable text using shallow features to publications
Daeso paper Query-based sentence fusion is better defined and leads to more preferred results than generic sentence fusion, accepted for The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, Ohio, June 15-20, 2008, added to publications.
First public release of Algraeph (version 0.6.0), a tool for manual alignment of linguistic graphs
Added two new Daeso presentations to the publications page: Detecting semantic overlap: Announcing a Parallel Monolingual Treebank for Dutch, and Annotating a Parallel Monolingual Treebank with Semantic Similarity Relations.
Added recent Daeso presentations to the publications page. New results are presented in Question-Driven Sentence Fusion is a Well-Defined Task. But the Real Issue is: Does it matter? and in Shallow approaches to sentence alignment in comparable text.
Added the Daeso paper Annotating a parallel monolingual treebank with semantic similarity relations, accepted for oral presentation at The Sixth International Workshop on Treebanks and Linguistic Theories (TLT'07), Bergen, Norway, December 7-8, 2007, to publications
First public release of Hitaext, a graphical tool for manually aligning pairs of text documents with XML markup.
Added the Daeso poster Detecting Semantic Overlap: Annotating a parallel monolingual treebank with semantic similarity relations, to be presented at the 2nd STEVIN Program Meeting, to publications
Official start of the Daeso project