Detecting And Exploiting Semantic Overlap

Word Similarity Demo

Compute similarity measures for a pair of Dutch words.

This demo computes three corpus-based word similarity measures for a given pair of Dutch words.

The words should in fact be lemmas. That is, "kat" (cat) and "eten" (eat) will be recognized, but inflected forms like "katten" (cats) or "eet" (eats) are not. You can use plain words like "kat" (cat) and "hond" (dog), but you can also add a part-of-speech (verb, noun or adj) as in "varen:noun" or "varen:verb", and even a particular sense as in "hond:noun:1".
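The input format described above (lemma, optional part-of-speech, optional sense number, separated by colons) can be illustrated with a small parser. This is only a sketch of the format as described here, not the code used by the demo or by Pycornetto; the names `WordSpec` and `parse_word_spec` are hypothetical.

```python
from typing import NamedTuple, Optional

# Part-of-speech labels mentioned in the text above.
VALID_POS = {"noun", "verb", "adj"}


class WordSpec(NamedTuple):
    """Hypothetical container for a parsed word specification."""
    lemma: str
    pos: Optional[str] = None
    sense: Optional[int] = None


def parse_word_spec(spec: str) -> WordSpec:
    """Split 'lemma', 'lemma:pos' or 'lemma:pos:sense' into its parts."""
    parts = spec.strip().split(":")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"malformed word specification: {spec!r}")

    lemma = parts[0]
    pos = parts[1] if len(parts) > 1 else None
    if pos is not None and pos not in VALID_POS:
        raise ValueError(f"unknown part-of-speech: {pos!r}")

    # int() raises ValueError if the sense is not a number.
    sense = int(parts[2]) if len(parts) > 2 else None
    return WordSpec(lemma, pos, sense)


print(parse_word_spec("kat"))
print(parse_word_spec("varen:verb"))
print(parse_word_spec("hond:noun:1"))
```

Note that the parser only checks the shape of the input. Whether a lemma such as "katten" is actually known is a question for the Cornetto database, not for this sketch.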

If a word is unknown (i.e. not present in the Cornetto database) or has a count of zero, the similarity measure may be undefined, in which case the value "None" is returned. For the technical details of the implementation, see the relevant part of the Pycornetto API documentation.
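To see why a zero count can leave a corpus-based measure undefined, consider a quantity built on the logarithm of a relative frequency. The sketch below assumes such a log-probability quantity purely for illustration; it is not taken from the Pycornetto implementation, and the function name and the toy counts are invented.

```python
import math
from typing import Dict, Optional


def information_content(key: str, counts: Dict[str, int],
                        total: int) -> Optional[float]:
    """Return -log(count / total), or None if it is undefined.

    Illustrative assumption: the measure depends on the log of a
    relative frequency, which does not exist for a count of zero.
    """
    count = counts.get(key, 0)  # unknown words are treated as count 0
    if count <= 0 or total <= 0:
        return None
    return -math.log(count / total)


# Toy counts, invented for this example only.
toy_counts = {"kat:noun": 50, "hond:noun": 100, "eenhoorn:noun": 0}
toy_total = 1000

for key in ("kat:noun", "eenhoorn:noun", "onbekend:noun"):
    print(key, information_content(key, toy_counts, toy_total))
```

Returning None rather than raising an error lets a caller handle known and unknown words through the same code path, which matches the behaviour described above.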

The implementation relies on three resources:

  • The Cornetto database: a lexical-semantic database for Dutch which combines EuroWordnet for Dutch and Referentiebestand Nederlands (RBN).
  • Word counts: a list of lemma plus part-of-speech tag combinations and their counts over more than 500M words as produced by Antal van den Bosch and Walter Daelemans.
  • Pycornetto: a Python interface to the Cornetto database, which contains additional functionality such as these word similarity measures.

Please don't abuse this demo. If you want to compute similarity for a substantial number of words, get a legal copy of the Cornetto database from the TST Centrale and use it with our free Pycornetto software to process your own data.