Package cornetto :: Module cornet :: Class Cornet

Class Cornet

object --+
         |
        Cornet
Known Subclasses:

The Cornet class exposes the Cornetto xml database

Most public methods require input in the form of a shorthand for specifying lexical units and relations, as described below.

Lexical unit specifications

A specification of lexical units consists of three parts, separated by a single colon (':') character:

  1. Spelling form (i.e. a word)

    This can be any string without white space.

  2. Syntactic category (optional)

    This can be any of 'noun', 'verb' or 'adj'.

  3. A sense (optional)

    This is a number which distinguishes the particular word sense.

Examples of valid lexical unit specifications are:
  • lamp
  • varen:noun
  • varen:verb:3
  • slang::1
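
To make the three-part shorthand concrete, here is a minimal parsing sketch. The helper parse_lu_spec is hypothetical and not part of the cornetto API. It assumes that an omitted part is simply left empty (as in 'slang::1') and treats the spelling form as required, since it is not marked optional above.

```python
# Hypothetical helper (not part of the cornetto API): split a lexical unit
# specification "form[:cat[:sense]]" into its three parts.
VALID_CATEGORIES = ("noun", "verb", "adj")


def parse_lu_spec(spec):
    """Return (form, cat, sense); parts that are left out become None."""
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError("too many ':' separators in %r" % spec)
    # Pad with empty strings so that "lamp" and "lamp:noun" unpack as well.
    form, cat, sense = (parts + ["", ""])[:3]
    if not form or any(ch.isspace() for ch in form):
        raise ValueError("the spelling form must be a single word: %r" % spec)
    if cat and cat not in VALID_CATEGORIES:
        raise ValueError("unknown syntactic category: %r" % cat)
    if sense and not sense.isdigit():
        raise ValueError("the sense must be a number: %r" % sense)
    return form, (cat or None), (int(sense) if sense else None)
```

For example, parse_lu_spec("varen:verb:3") gives ("varen", "verb", 3), while "lamp:adverb" is rejected because the category is not one of the three listed.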

Relation specifications

A specification of a relation consists of two parts:

  1. Relation name (optional)

    The name of a Wordnet relation between two synsets. See the Cornetto documentation for the available relations. If not given, all relations are tried. The special relation "SYNONYM" holds between all members of the same synset. The relation name is not case-sensitive; you can use lower case.

  2. Depth (optional)

    A digit ('0' to '9') or the plus sign ('+'). This is the depth of the relations considered during search, in other words, the maximal number of links allowed. If not given, a default value of 1 is used. The plus sign represents the system maximum (currently 9).

A relation specification must have a name, a depth or both. Valid relation specifications include:
  • SYNONYM
  • HAS_HYPONYM2
  • 1
  • +
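
A relation specification can be decoded in a similar way. The sketch below is hypothetical: parse_rel_spec and its regular expression are not taken from cornetto. It assumes the depth character directly follows the name, as in 'HAS_HYPONYM2' from the test_lex_units_relation example, and that relation names consist only of letters and underscores.

```python
import re

# Hypothetical helper (not part of the cornetto API). Assumes the optional
# depth character directly follows the optional name, e.g. "HAS_HYPONYM2".
SYSTEM_MAX_DEPTH = 9
DEFAULT_DEPTH = 1
_REL_SPEC = re.compile(r"^([A-Za-z_]*)([0-9+]?)$")


def parse_rel_spec(spec):
    """Return (name, depth); name is None when all relations are allowed."""
    match = _REL_SPEC.match(spec)
    # The pattern also matches "", but a spec needs a name, a depth or both.
    if match is None or spec == "":
        raise ValueError("invalid relation specification: %r" % spec)
    name, depth = match.groups()
    if depth == "+":
        depth = SYSTEM_MAX_DEPTH
    elif depth:
        depth = int(depth)
    else:
        depth = DEFAULT_DEPTH
    # Relation names are not case-sensitive, so normalise to upper case.
    return (name.upper() or None), depth
```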

Instance Methods
 
__init__(self, cdb_lu=None, cdb_syn=None, output_format='spec', max_depth=9)
    Create a new Cornet instance

open(self, cdb_lu, cdb_syn, verbose=False)
    Open and parse Cornetto database files

ask(self, query, format=None)
    Pose a query about lexical units to the Cornetto database

get_lex_units(self, spec, format=None) -> list
    Get all lexical units which satisfy this specification

get_related_lex_units(self, lu_spec, rel_spec, format=None) -> dict
    For all specified lexical units, find all lexical units related by the specified relation.

test_lex_units_relation(self, from_lu_spec, rel_spec, to_lu_spec, format=None) -> list
    Test if certain relation(s) hold between certain lexical units by searching for a path from any of the source lexical units to any of the target lexical units along one or more of the specified relation(s)

get_synsets(self, spec, format=None) -> list
    Get all synsets containing lexical units which satisfy a certain specification.

get_related_synsets(self, lu_spec, rel_name=None, format=None) -> dict
    For all synsets containing lexical units satisfying this specification, find the related synsets along this relation.

get_lex_unit_by_id(self, c_lu_id, format=None) -> string or None
    Get lexical unit by id

get_synset_by_id(self, c_sy_id, format=None) -> list or None
    Get synset by id

all_common_subsumers(self, lu_spec1, lu_spec2, rel_name='HAS_HYPERONYM', format=None) -> dict
    Finds all common subsumers of two lexical units over the given relation.

least_common_subsumers(self, lu_spec1, lu_spec2, rel_name='HAS_HYPERONYM', format=None) -> list
    Finds the least common subsumers of two lexical units over the given relation, that is, those common subsumers for which the length of the path (in edges) from the first lexical unit to the subsumer to the second lexical unit is minimal.

set_output_format(self, format='spec')
    Change the default output format

set_max_depth(self, max_depth=9)
    Sets a limit on the maximal depth of searches for related lexical units where no relation name is specified.
 
_split_query(self, query)
_split_unit_spec(self, spec)
_split_rel_spec(self, spec)
_transitive_closure(self, lus, rel_name)
    Computes the transitive closure of a set of lexical units over a certain relation.
_bidirectional_shortest_path(self, from_lus, to_lus, rel_name, depth)
_reconstruct_path(self, pred, common_lu, succ, format=None)
_search_related_lex_units(self, from_lu, rel_name, depth, lu_formatter, rel_formatter, path=[])
_get_lex_unit_formatter(self, format=None)
_lu_to_spec(self, lu)
_get_relation_formatter(self, format=None)
_rel_to_spec(self, edge)
_rel_to_xml(self, edge)
_get_synset_formatter(self, format=None)
_synset_to_specs(self, synset)
_get_lu_form(self, lu)
_get_lu_cat(self, lu)
_get_lu_sense(self, lu)
_lu_has_cat(self, lu, cat)
_lu_has_sense(self, lu, sense)
_get_rel_name(self, edge)
_rel_has_name(self, edge, name)

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  _unit_separator = ':'
  _handled_output_formats = ('spec', 'xml', 'raw')
  _default_output_format = 'spec'
  _default_max_depth = 9
Properties

Inherited from object: __class__

Method Details

__init__(self, cdb_lu=None, cdb_syn=None, output_format='spec', max_depth=9)
(Constructor)

Create a new Cornet instance

Parameters:
  • cdb_lu - an xml file(name) to read the lexical units from
  • cdb_syn - an xml file(name) to read the synsets from
  • output_format (string ('spec', 'xml', 'raw')) - default output format
  • max_depth (int) - a maximal depth between 1 and 9
Overrides: object.__init__

open(self, cdb_lu, cdb_syn, verbose=False)

Open and parse Cornetto database files

Parameters:
  • cdb_lu (file or filename) - xml definition of the lexical units
  • cdb_syn (file or filename) - xml definition of the synsets
  • verbose - verbose output during parsing

ask(self, query, format=None)

Pose a query about lexical units to the Cornetto database

This supports three different types of queries:

  1. Getting lexical units

    If the query consists of only a lexical unit specification, the answer lists all lexical units which satisfy this specification. See also get_lex_units.

  2. Getting related lexical units

    If the query consists of a lexical unit specification plus a relation specification, the answer consists of all lexical units related by the specified relation(s). See also get_related_lex_units.

  3. Testing relations between lexical units

    If the query consists of a lexical unit specification, plus a relation specification, plus another lexical unit specification, the answer is a path from the first to the second lexical unit(s) along the specified relation(s). See also test_lex_units_relation.

Parameters:
  • query (string) - a specification
  • format ('spec', 'xml', 'raw') - output format
Returns:
depends on the type of query and the output format
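
The three query types can be told apart by counting the parts of the query. This sketch assumes, without confirmation from this page, that the parts are separated by white space (spelling forms cannot contain white space). classify_query is a hypothetical helper that only names the public method a query corresponds to.

```python
# Hypothetical sketch (not cornetto's _split_query): route a query by the
# number of white-space separated parts it contains.
def classify_query(query):
    parts = query.split()
    if len(parts) == 1:
        return "get_lex_units", parts
    if len(parts) == 2:
        return "get_related_lex_units", parts
    if len(parts) == 3:
        return "test_lex_units_relation", parts
    raise ValueError("a query has one, two or three parts: %r" % query)
```

With an opened instance, getattr(inst, name)(*parts) would then call the matching method, since the parts line up with the positional parameters of those methods; whether ask does exactly this internally is not documented here.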

get_lex_units(self, spec, format=None)

Get all lexical units which satisfy this specification

>>> inst.get_lex_units("lamp")
['lamp:noun:3', 'lamp:noun:4', 'lamp:noun:1', 'lamp:noun:2']
>>> inst.get_lex_units("varen")
['varen:verb:3', 'varen:noun:1', 'varen:verb:1', 'varen:verb:2']
>>> inst.get_lex_units("varen:noun")
['varen:noun:1']
>>> inst.get_lex_units("varen:verb:3")
['varen:verb:3']
>>> inst.get_lex_units("varen:noun:3")
[]
Parameters:
  • spec - lexical unit specification
  • format ('spec', 'xml', 'raw') - output format
Returns: list
list of lexical units in requested output format
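
The filtering in these examples can be sketched as follows: every part given in the specification must match, and an omitted part matches anything. This is a hypothetical in-memory stand-in (matches, get_lex_units over a flat list of "form:cat:sense" strings); the real class works on the parsed Cornetto XML.

```python
# Hypothetical stand-in: lexical units are kept as "form:cat:sense" strings.
def matches(lu, spec):
    form, cat, sense = lu.split(":")
    want = (spec.split(":") + ["", ""])[:3]
    # An empty (omitted) part of the specification matches anything.
    return all(w == "" or w == have for w, have in zip(want, (form, cat, sense)))


def get_lex_units(units, spec):
    # Keep the original order of the units, as in the examples above.
    return [lu for lu in units if matches(lu, spec)]
```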

get_related_lex_units(self, lu_spec, rel_spec, format=None)

For all specified lexical units, find all lexical units related by the specified relation.

The search may be constrained by the setting of the maximum search depth; see set_max_depth.

>>> pprint(inst.get_related_lex_units("slang", "SYNONYM"))
{'slang:noun:1': {'SYNONYM': {'serpent:noun:2': {}}},
 'slang:noun:2': {},
 'slang:noun:3': {'SYNONYM': {'pin:noun:2': {}, 'tang:noun:2': {}}},
 'slang:noun:4': {'SYNONYM': {'groepstaal:noun:1': {},
                              'jargon:noun:1': {},
                              'kringtaal:noun:1': {}}},
 'slang:noun:5': {'SYNONYM': {'muntslang:noun:1': {}}},
 'slang:noun:6': {'SYNONYM': {'Slang:noun:1': {}}}}
>>> pprint(inst.get_related_lex_units("slang::1", "1"))
{'slang:noun:1': {'HAS_HOLO_MEMBER': {'slangegebroed:noun:1': {},
                                      'slangengebroed:noun:2': {}},
                  'HAS_HYPERONYM': {'reptiel:noun:1': {}},
                  'HAS_HYPONYM': {'cobra:noun:1': {},
                                  'gifslang:noun:1': {},
                                  'hoedslang:noun:1': {},
                                  'hydra:noun:2': {},
                                  'lansslang:noun:1': {},
                                  'lepelslang:noun:1': {},
                                  'python:noun:2': {},
                                  'ratelslang:noun:1': {},
                                  'ringslang:noun:1': {},
                                  'rolslang:noun:1': {},
                                  'waterslang:noun:3': {},
                                  'wurgslang:noun:1': {},
                                  'zeeslang:noun:1': {}},
                  'HAS_MERO_PART': {'slangekop:noun:1': {},
                                    'slangenkop:noun:1': {}},
                  'SYNONYM': {'serpent:noun:2': {}}}}
Parameters:
  • lu_spec - lexical unit(s) specification of source
  • rel_spec - relation(s) specification
  • format ('spec', 'xml', 'raw') - output format
Returns: dict
a hierarchical dict structure with lexical units and relations as keys
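
The nested result can be read as a depth-limited search in which each level of the dict adds one more link. In this sketch the graph is a hypothetical dict mapping a lexical unit to {relation name: [related lexical units]}, not cornetto's data model, and units already on the current path are skipped to avoid cycles.

```python
# Hypothetical graph format: {lu: {relation_name: [lu, ...]}}.
def related(graph, lu, rel_name, depth, path=()):
    """Nested {relation: {lu: {...}}} dict, at most `depth` links deep."""
    result = {}
    if depth == 0:
        return result
    for rel, targets in graph.get(lu, {}).items():
        # rel_name None means: try all relations.
        if rel_name is not None and rel != rel_name:
            continue
        for target in targets:
            if target in path or target == lu:
                continue  # skip cycles back to units already on this path
            result.setdefault(rel, {})[target] = related(
                graph, target, rel_name, depth - 1, path + (lu,))
    return result
```

The path is passed as an immutable tuple, so recursive calls never share a mutable default argument.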

test_lex_units_relation(self, from_lu_spec, rel_spec, to_lu_spec, format=None)

Test if certain relation(s) hold between certain lexical units by searching for a path from any of the source lexical units to any of the target lexical units along one or more of the specified relation(s)

>>> inst.test_lex_units_relation("lamp", "HAS_HYPONYM", "gloeilamp")
['lamp:noun:2', 'HAS_HYPONYM', 'gloeilamp:noun:1']
>>> inst.test_lex_units_relation("lamp", "HAS_HYPONYM2", "fotolamp")
['lamp:noun:2', 'HAS_HYPONYM', 'gloeilamp:noun:1', 'HAS_HYPONYM', 'fotolamp:noun:1']
>>> inst.test_lex_units_relation("lamp", "HAS_HYPONYM", "fotolamp")
[]
Parameters:
  • from_lu_spec - lexical unit specification of the source(s)
  • rel_spec - relation(s) specification
  • to_lu_spec - lexical unit specification of the target(s)
  • format ('spec', 'xml', 'raw') - output format
Returns: list
list of lexical units and relations in requested output format, possibly empty

Warning: The result may not be the only shortest path.
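
One way to find such a path is a breadth-first search started from all source units at once: the first target it reaches lies at the minimal number of links, and which of several equally short paths it returns depends on the visiting order. Note that this sketch swaps in a plain one-directional search, whereas the class lists a private _bidirectional_shortest_path method. The graph is a hypothetical {lexical unit: {relation: [lexical units]}} dict.

```python
from collections import deque


# Plain BFS sketch (not the class's bidirectional search).
def shortest_path(graph, sources, rel_name, depth, targets):
    """Return [lu, rel, lu, ...] from a source to a target, or []."""
    targets = set(targets)
    queue = deque((lu, [lu]) for lu in sources)
    seen = set(sources)
    while queue:
        lu, path = queue.popleft()
        if lu in targets:
            return path
        if len(path) // 2 >= depth:
            continue  # this path already uses the maximal number of links
        for rel, nexts in graph.get(lu, {}).items():
            if rel_name is not None and rel != rel_name:
                continue
            for nxt in nexts:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [rel, nxt]))
    return []
```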

get_synsets(self, spec, format=None)

Get all synsets containing lexical units which satisfy a certain specification.

>>> pprint(inst.get_synsets("slang"))
[['Slang:noun:1', 'slang:noun:6'],
 ['slang:noun:5', 'muntslang:noun:1'],
 ['slang:noun:1', 'serpent:noun:2'],
 ['slang:noun:2'],
 ['tang:noun:2', 'pin:noun:2', 'slang:noun:3'],
 ['jargon:noun:1', 'groepstaal:noun:1', 'kringtaal:noun:1', 'slang:noun:4']]
>>> pprint(inst.get_synsets("slang:noun:5"))
[['slang:noun:5', 'muntslang:noun:1']]
>>> pprint(inst.get_synsets("slang:noun:7"))
[]
Parameters:
  • spec - lexical unit specification
  • format ('spec', 'xml', 'raw') - output format
Returns: list
list of synsets (lists) with lexical units in requested output format

get_related_synsets(self, lu_spec, rel_name=None, format=None)

For all synsets containing lexical units satisfying this specification, find the related synsets along this relation. If no relation is given, all relations are considered.

>>> pprint(inst.get_related_synsets("lamp", "HAS_HYPERONYM"))
{'HAS_HYPERONYM': [['armatuur:noun:1', 'verlichtingsarmatuur:noun:1'],
                   ['lamp:noun:2', 'licht:noun:13', 'lichtje:noun:1'],
                   ['lichtbron:noun:1'],
                   ['voorwerp:noun:1', 'ding:noun:1']]}
>>> pprint(inst.get_related_synsets("slang::1"))
{'HAS_HOLO_MEMBER': [['slangegebroed:noun:1', 'slangengebroed:noun:2']],
 'HAS_HYPERONYM': [['reptiel:noun:1']],
 'HAS_MERO_PART': [['slangekop:noun:1', 'slangenkop:noun:1']]}
Parameters:
  • lu_spec - lexical unit(s) specification of source
  • rel_name - relation name
  • format ('spec', 'xml', 'raw') - output format
Returns: dict
a dict with relations as keys and lists of synsets as values

Note: Parameter rel_name is a relation name, not a relation specification. Search is thus not transitive.
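
Because rel_name is a single relation name, only direct links between synsets are followed. A hypothetical in-memory model illustrates this: synsets maps an invented synset id to its member lexical units, relations is a list of (from_id, relation, to_id) edges, and for simplicity the source is one exact lexical unit rather than a specification.

```python
# Hypothetical model: synsets = {synset_id: [lu, ...]},
# relations = [(from_id, relation_name, to_id), ...].
def related_synsets(synsets, relations, lu, rel_name=None):
    # Synsets that contain the given lexical unit.
    sources = {sid for sid, members in synsets.items() if lu in members}
    result = {}
    for from_id, name, to_id in relations:
        # Follow one edge only: the search is not transitive.
        if from_id in sources and (rel_name is None or name == rel_name):
            result.setdefault(name, []).append(synsets[to_id])
    return result
```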

get_lex_unit_by_id(self, c_lu_id, format=None)

Get lexical unit by id

Parameters:
  • c_lu_id - value of the c_lu_id attribute of the <cdb_lu> element
  • format ('spec', 'xml', 'raw') - output format
Returns: string or None
lexical unit in the requested output format or None
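
Lookup by id amounts to indexing XML elements on their id attribute. The sketch uses Python's standard xml.etree.ElementTree on a made-up fragment: only the <cdb_lu> element and its c_lu_id attribute come from this page, while the root element, the ids and the form attribute are hypothetical. The same function could index <cdb_synset> elements on c_sy_id.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily simplified fragment; real Cornetto files contain more.
SAMPLE = """
<cdb_lu_root>
  <cdb_lu c_lu_id="lu-1" form="lamp"/>
  <cdb_lu c_lu_id="lu-2" form="varen"/>
</cdb_lu_root>
"""


def index_by_id(xml_text, tag, id_attr):
    """Map the id attribute of every `tag` element to that element."""
    root = ET.fromstring(xml_text.strip())
    return {el.get(id_attr): el for el in root.iter(tag)}


index = index_by_id(SAMPLE, "cdb_lu", "c_lu_id")
```

index.get(some_id) returns None for an unknown id, which mirrors the "or None" in the return type.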

get_synset_by_id(self, c_sy_id, format=None)

Get synset by id

Parameters:
  • c_sy_id - value of the c_sy_id attribute of the <cdb_synset> element
  • format ('spec', 'xml', 'raw') - output format
Returns: list or None
set (list) of lexical units in the requested output format, or None

all_common_subsumers(self, lu_spec1, lu_spec2, rel_name='HAS_HYPERONYM', format=None)

Finds all common subsumers of two lexical units over the given relation. The common subsumers are grouped according to the length of the path (in edges) from the first lexical unit to the subsumer to the second lexical unit.

>>> pprint(c.all_common_subsumers("kat", "hond"))
{2: ['huisdier:noun:1', 'zoogdier:noun:1'],
 4: ['beest:noun:2', 'gedierte:noun:2', 'dier:noun:1'],
 5: ['ziel:noun:3',
     'homo sapiens:noun:1',
     'sterveling:noun:1',
     'mens:noun:1',
     'mensenkind:noun:1'],
 6: ['organisme:noun:2'],
 8: ['wezen:noun:1', 'schepsel:noun:1', 'creatuur:noun:2'],
 9: ['iets:noun:2'],
 10: ['object:noun:3']}
Parameters:
  • lu_spec1 - first lexical unit(s) specification
  • lu_spec2 - second lexical unit(s) specification
  • rel_name - relation name (not a specification)
  • format ('spec', 'xml', 'raw') - output format
Returns: dict
a dict with path lengths as keys and lists of common subsumers as values, possibly empty

Note: This method only makes sense for some relations (typically HAS_HYPERONYM), not for others (e.g. SYNONYM).
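
The grouping by path length follows from computing, for each lexical unit, the distance to every unit reachable over the relation, and then summing the two distances for each unit reachable from both. The hierarchy in the usage is a hypothetical toy (not Cornetto data), the inputs are single lexical units instead of specifications, and least_common_subsumers simply takes the group with the smallest key.

```python
from collections import deque


def distances(graph, start, rel_name):
    """BFS: map every unit reachable from start to its distance in edges."""
    dist = {}
    queue = deque([(start, 0)])
    while queue:
        lu, d = queue.popleft()
        for nxt in graph.get(lu, {}).get(rel_name, []):
            if nxt not in dist and nxt != start:
                dist[nxt] = d + 1
                queue.append((nxt, d + 1))
    return dist


def all_common_subsumers(graph, lu1, lu2, rel_name="HAS_HYPERONYM"):
    up1 = distances(graph, lu1, rel_name)
    up2 = distances(graph, lu2, rel_name)
    groups = {}
    for subsumer in set(up1) & set(up2):
        # Path length: lu1 -> subsumer plus subsumer -> lu2.
        groups.setdefault(up1[subsumer] + up2[subsumer], []).append(subsumer)
    return groups


def least_common_subsumers(graph, lu1, lu2, rel_name="HAS_HYPERONYM"):
    groups = all_common_subsumers(graph, lu1, lu2, rel_name)
    return groups[min(groups)] if groups else []
```

Since the subsumers are collected from a set intersection, the order within each group is arbitrary.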

least_common_subsumers(self, lu_spec1, lu_spec2, rel_name='HAS_HYPERONYM', format=None)

Finds the least common subsumers of two lexical units over the given relation, that is, those common subsumers for which the length of the path (in edges) from the first lexical unit to the subsumer to the second lexical unit is minimal.

>>> c.least_common_subsumers("kat", "hond")
['huisdier:noun:1', 'zoogdier:noun:1']
Parameters:
  • lu_spec1 - first lexical unit(s) specification
  • lu_spec2 - second lexical unit(s) specification
  • rel_name - relation name (not a specification)
  • format ('spec', 'xml', 'raw') - output format
Returns: list
a list of the least common subsumers, possibly empty

Note: This method only makes sense for some relations (typically HAS_HYPERONYM), not for others (e.g. SYNONYM).

set_output_format(self, format='spec')

Change the default output format

Parameters:
  • format ('spec', 'xml', 'raw') - output format

set_max_depth(self, max_depth=9)

Sets a limit on the maximal depth of searches for related lexical units where no relation name is specified.

Parameters:
  • max_depth (int) - a maximal depth between 1 and 9

Note: The limit is only enforced on the public methods, i.e. ask and get_related_lex_units, and not on the private methods. Also note that it does not affect test_lex_units_relation.

_transitive_closure(self, lus, rel_name)

Computes the transitive closure of a set of lexical units over a certain relation. Returns a dict with the successors as keys and their distances (in edges) from the original lexical units as values.
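
A breadth-first traversal yields both the closure and the shortest distances in one pass: all starting units enter the queue at distance 0, so each successor is recorded with its smallest distance from any of them. This is a sketch over a hypothetical {unit: {relation: [unit, ...]}} graph, not cornetto's internal representation.

```python
from collections import deque


# Multi-source BFS sketch of a transitive closure over one relation.
def transitive_closure(graph, lus, rel_name):
    """Map each successor to its shortest distance (in edges) from lus."""
    dist = {}
    queue = deque((lu, 0) for lu in lus)
    while queue:
        lu, d = queue.popleft()
        for succ in graph.get(lu, {}).get(rel_name, []):
            if succ not in dist:  # first visit is the shortest distance
                dist[succ] = d + 1
                queue.append((succ, d + 1))
    return dist
```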