Detecting And Exploiting Semantic Overlap






Internal Links


Project Proposal (PDF)





Multi-Document Summarization Demo

Sentence Compression Demo

Word Similarity Demo

Dutch Sentence Fusion Data

Nederlands (Dutch)

External Links

Stevin programme

Project description at KNAW Onderzoek Informatie

Imogen, a project closely related to Daeso



DAESO: Detecting And Exploiting Semantic Overlap

Project Summary

The well-known fact that similar information can be expressed in many different ways is one of the major challenges in building robust NLP applications. It is commonly assumed that such applications can be improved with knowledge of how natural language expressions relate to each other, for instance in terms of paraphrases (same semantic content, different wording) or entailments (one expression implied by the other). DAESO investigates the detection of semantic overlap between Dutch sentences and the exploitation of this knowledge in a range of NLP applications. For this purpose, tools will be developed for the automatic alignment and classification of semantic relations (between words, phrases and sentences) for Dutch, as well as for a Dutch text-to-text generation application which fuses related sentences into a single grammatical sentence, which may be a generalization, a specification or a reformulation of the input sentences. To facilitate development and testing of these tools, an annotated monolingual Dutch parallel/comparable corpus of 1M words will be developed, consisting of pairs of texts that express comparable information. The utility of the resources and tools will be demonstrated in the context of three applications: (1) question-answering systems (improved recall, more complete answers), (2) information extraction (improved recall), and (3) summarization (beyond extraction: sentence compression, sentence fusion, anaphora resolution).

For more information, see the full Project Proposal (PDF).



This research is funded by STEVIN (Spraak- en Taaltechnologische Essentiële Voorzieningen In het Nederlands), the Dutch-Flemish research programme for Dutch language and speech technology.


Project duration: three years, from October 2006 until October 2009.