This is an existing bug that I stumbled across while using the German WordNet from the EOMW via my custom-WordNet loading code in my unmerged PR at #1621. (It's dramatically more serious for German, since all nouns in German are capitalised and so a huge fraction of the language doesn't work, but it also affects the existing OMW WordNets with support built into NLTK.)
Consider the synset representing London, England. While the synset name is in lowercase, its lemmas are capitalised in both the English WordNet and the French WordNet (where the lemma is Londres). In English:
>>> from nltk.corpus import wordnet as wn
>>> london_synset = wn.synset('london.n.01')
>>> london_synset.definition()
'the capital and largest city of England; located on the Thames in southeastern England; financial and industrial and cultural center'
>>> london_synset.lemmas()
[Lemma('london.n.01.London'), Lemma('london.n.01.Greater_London'), Lemma('london.n.01.British_capital'), Lemma('london.n.01.capital_of_the_United_Kingdom')]
But when using the English WordNet, I can look up the synset (or an individual Lemma) by lemma by passing in 'London' in whatever capitalisation I like.
In non-English WordNets, on the other hand, it is impossible to look up this synset by lemma, because the first line of wn.synsets() coerces the lemma passed in to lowercase, and that lowercased lemma is then used as a key to look up the synset in a lemma-to-synset dictionary in which Londres is capitalised.
(Contrast this with lemmas that are lowercased in the French WordNet's tab file; they can be looked up regardless of how the lemma passed to synsets() is capitalised.)
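The failure mode described above can be reproduced without NLTK. The sketch below is only an illustration: the dictionary contents and the name `lookup_lowercasing` are hypothetical stand-ins, not NLTK's actual data structures or code.

```python
# Hypothetical stand-in for a non-English lemma-to-synset mapping.
# "Londres" is stored capitalised and "chien" lowercased; the entries
# are illustrative, not copied from the real French tab file.
lemma_to_synsets = {
    "Londres": ["london.n.01"],
    "chien": ["dog.n.01"],
}


def lookup_lowercasing(lemma):
    """Mimic the described behaviour: lowercase the query, then do an
    exact-match lookup against keys stored in their original case."""
    return lemma_to_synsets.get(lemma.lower(), [])


# A capitalised key can never match a lowercased query.
print(lookup_lowercasing("Londres"))
# A lowercased key matches however the query is capitalised.
print(lookup_lowercasing("CHIEN"))
```

Under these assumptions the first call finds nothing and the second finds the synset, which is the asymmetry reported here.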
To match the English behaviour, synsets() should be adjusted for non-English WordNets so that the lookup is properly case-insensitive. This was probably the intent of coercing the given lemma to lowercase before doing the lookup, but it fails if the lemma to be looked up is spelt with a capital letter in the actual WordNet data.
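One possible way to get a properly case-insensitive lookup is to normalise the stored keys as well as the query. This is a minimal sketch under that assumption, not a patch against NLTK: `build_case_insensitive_index` and `lookup` are hypothetical names, and the sample data is illustrative.

```python
from collections import defaultdict


def build_case_insensitive_index(lemma_to_synsets):
    """Return a new mapping keyed by lowercased lemma.

    Lemmas that differ only in case are merged under one key, keeping
    each synset once and preserving first-seen order.
    """
    index = defaultdict(list)
    for lemma, synsets in lemma_to_synsets.items():
        bucket = index[lemma.lower()]
        for synset in synsets:
            if synset not in bucket:
                bucket.append(synset)
    return dict(index)


def lookup(index, lemma):
    """Lowercase the query so it matches the lowercased keys."""
    return index.get(lemma.lower(), [])


# Illustrative data only: one capitalised and one lowercased lemma.
sample = {"Londres": ["london.n.01"], "chien": ["dog.n.01"]}
index = build_case_insensitive_index(sample)

print(lookup(index, "londres"))
print(lookup(index, "LONDRES"))
print(lookup(index, "Chien"))
```

Because both sides are normalised the same way, the capitalisation in the data no longer matters. `str.casefold()` could be used in place of `str.lower()` for more aggressive Unicode matching, at the cost of conflating a few more spellings.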