Unit 3 NLP
English WordNet
English WordNet is a comprehensive lexical database for the English
language developed at Princeton University. It categorizes English
words into sets of synonyms, known as "synsets," and defines
semantic relationships between these synsets. These relationships
include hypernymy/hyponymy (is-a relationships), meronymy (part-
whole relationships), antonymy (opposite meanings), and more.
English WordNet organizes synsets into a hierarchy of concepts, mainly
through hypernym (is-a) links. It also supports word sense disambiguation
by listing the distinct senses (synsets) of polysemous words. Each word in
WordNet is associated with a lemma (the base form of the word) and a
part-of-speech tag.
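To make the synset idea concrete, here is a minimal toy model in Python. It is not the real WordNet data: libraries such as NLTK (`nltk.corpus.wordnet`) give access to Princeton WordNet, but this sketch avoids that dependency and uses a few hand-made synsets whose names (`car.n.01`, `bank.n.01`, ...) only imitate WordNet's naming style. It shows synsets grouping lemmas under one part-of-speech tag, hypernym links forming the is-a hierarchy, and a polysemous word mapping to several synsets.

```python
# Toy model of WordNet-style synsets; all names and data are hand-made for illustration.
from dataclasses import dataclass, field


@dataclass
class Synset:
    name: str                 # hypothetical identifier in the style "car.n.01"
    pos: str                  # part-of-speech tag: "n", "v", "a" or "r"
    lemmas: list[str]         # synonymous lemmas (base forms) sharing this sense
    hypernyms: list["Synset"] = field(default_factory=list)  # is-a parents


def hypernym_path(s: Synset) -> list[str]:
    """Walk up the first hypernym link of each synset until a root is reached."""
    path = [s.name]
    while s.hypernyms:
        s = s.hypernyms[0]
        path.append(s.name)
    return path


entity = Synset("entity.n.01", "n", ["entity"])
vehicle = Synset("vehicle.n.01", "n", ["vehicle"], [entity])
car = Synset("car.n.01", "n", ["car", "auto", "automobile"], [vehicle])

# A polysemous word maps to several synsets, one per sense.
senses = {
    "bank": [
        Synset("bank.n.01", "n", ["bank"]),  # sloping land beside a river
        Synset("bank.n.02", "n", ["bank"]),  # financial institution
    ]
}

print(hypernym_path(car))   # ['car.n.01', 'vehicle.n.01', 'entity.n.01']
print(len(senses["bank"]))  # 2
```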
Hindi WordNet
Hindi WordNet is a system that brings together the lexical and semantic
relations between Hindi words. It organizes lexical information in terms of
word meanings and can be regarded as a lexicon based on psycholinguistic
principles. Hindi WordNet is widely used in NLP applications. Each lexical
concept is represented by a synset, and synsets are the basic building blocks
of Hindi WordNet. The lexicon covers content words (the open-class
categories), so Hindi WordNet contains nouns, verbs, adjectives, and adverbs.
Each entry in Hindi WordNet consists of a synset, a gloss (a description of
the concept), and the concept's position in the ontology.
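A Hindi WordNet entry can be sketched as a small record holding the three parts listed above. The class name `HindiWordNetEntry`, the gloss text, and the ontology nodes (`Noun`, `Inanimate`, `Place`) are hypothetical placeholders chosen for illustration, not taken from the actual Hindi WordNet database; the check on `pos` mirrors the restriction to open-class words.

```python
from dataclasses import dataclass

# Open-class categories covered by Hindi WordNet.
OPEN_CLASS = {"noun", "verb", "adjective", "adverb"}


@dataclass(frozen=True)
class HindiWordNetEntry:
    synset: tuple[str, ...]         # synonymous words for one lexical concept
    gloss: str                      # description of the concept
    ontology_path: tuple[str, ...]  # position in the ontology, root first
    pos: str                        # one of OPEN_CLASS

    def __post_init__(self) -> None:
        if self.pos not in OPEN_CLASS:
            raise ValueError(f"Hindi WordNet covers content words only, got {self.pos!r}")


entry = HindiWordNetEntry(
    synset=("vidyalaya", "patshala", "skula", "pathalaya"),
    gloss="a place where students are taught",    # illustrative gloss
    ontology_path=("Noun", "Inanimate", "Place"),  # hypothetical ontology nodes
    pos="noun",
)
print(entry.synset[0], "->", " / ".join(entry.ontology_path))
# vidyalaya -> Noun / Inanimate / Place
```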
The main obstacle to high-performance NLP applications is the knowledge
acquisition bottleneck.
Some of the prominent causes of this bottleneck in Hindi WordNet are:
i. Absence of proper expressiveness: approximate relations need to be
represented for better understanding. For example, "vidyalaya" (school) and
"samsthana" (institution) are approximately similar but are not synonyms.
ii. Missing composition of semantic relationships: Hindi WordNet lacks
defined compositions of similar and dissimilar relations. For example,
"vahana" (vehicle) and "kara" (car) are related by hypernymy/hyponymy, and
"kara" and "pahiya" (wheel) are related by meronymy/holonymy, but the
relation between "vahana" and "pahiya" is not defined in Hindi WordNet.
Fuzzy Hindi WordNet
Fuzzy Hindi WordNet is a word sense network. Each node in this network is a
synset, which is the basic object in Fuzzy Hindi WordNet.
Each synset in Fuzzy Hindi WordNet is linked to other synsets through well-
known lexical and semantic relations such as fuzzy hypernymy, fuzzy hyponymy,
fuzzy meronymy, fuzzy troponymy, fuzzy antonymy, and fuzzy entailment.
Semantic relations are between synsets, and lexical relations are between
words. These relations serve to organize the lexical knowledge base.
We render Fuzzy Hindi WordNet as a fuzzy graph where nodes represent
concepts (synsets) and edges represent fuzzy relations between concepts. The
weight of an edge represents the strength of the relation between the two
concepts (synsets); this strength lies between 0 and 1.
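The fuzzy graph view can be sketched as an adjacency list whose edges carry a relation name and a strength. This is a minimal illustration, not the Fuzzy Hindi WordNet implementation: the `FuzzyGraph` class is hypothetical, each synset is identified by its first word for brevity, and a missing edge is read as strength 0 (no relation), which is the usual fuzzy-set convention rather than something stated in the text.

```python
from collections import defaultdict


class FuzzyGraph:
    """Nodes are synsets; each edge carries a relation name and a strength in [0, 1]."""

    def __init__(self) -> None:
        # adjacency: synset -> list of (neighbour synset, relation, strength)
        self.adj: dict[str, list[tuple[str, str, float]]] = defaultdict(list)

    def add_edge(self, src: str, dst: str, relation: str, strength: float) -> None:
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"strength must lie in [0, 1], got {strength}")
        self.adj[src].append((dst, relation, strength))

    def strength(self, src: str, dst: str, relation: str) -> float:
        """Return the edge weight, or 0.0 when no such edge exists (no relation)."""
        for d, r, w in self.adj.get(src, []):
            if d == dst and r == relation:
                return w
        return 0.0


g = FuzzyGraph()
g.add_edge("vidyalaya", "samsthana", "association", 0.8)
g.add_edge("mattha", "dahi", "hypernymy", 0.8)
print(g.strength("vidyalaya", "samsthana", "association"))  # 0.8
print(g.strength("dahi", "mattha", "hypernymy"))             # 0.0
```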
1. Fuzzy Association
In Fuzzy Hindi WordNet, words often have approximately similar meanings,
which can be represented through a fuzzy association relation. This semantic
relation between two synsets signifies that their concepts have partially
similar meanings. The relation is denoted as (w1, w2, μas), where w1 and w2
are approximate synonyms and μas is the strength of the relation.
Example: (vidyalaya, patshala, skula, pathalaya) (school) → (samsthana,
adhishtana, pratishtana, istitayutaka) (institution) with a strength of 0.8.
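One way to use association strengths is to keep only the pairs whose strength reaches a chosen threshold α (an α-cut, a standard fuzzy-set operation). The sketch below stores the example above as a `(w1, w2, μas)` record; the `associated` helper and the assumption that association is symmetric (so both directions are checked) are illustrative choices, not part of the source.

```python
from typing import NamedTuple


class FuzzyAssociation(NamedTuple):
    w1: tuple[str, ...]   # first synset
    w2: tuple[str, ...]   # second, approximately synonymous synset
    mu_as: float          # strength of the association, in [0, 1]


school = ("vidyalaya", "patshala", "skula", "pathalaya")
institution = ("samsthana", "adhishtana", "pratishtana", "istitayutaka")
relations = [FuzzyAssociation(school, institution, 0.8)]


def associated(word: str, alpha: float) -> list[tuple[str, ...]]:
    """Synsets associated with `word` at strength >= alpha (an alpha-cut).
    Assumes association is symmetric, so a match on either side counts."""
    hits = []
    for r in relations:
        if r.mu_as >= alpha:
            if word in r.w1:
                hits.append(r.w2)
            elif word in r.w2:
                hits.append(r.w1)
    return hits


print(associated("skula", 0.7))  # [('samsthana', 'adhishtana', 'pratishtana', 'istitayutaka')]
print(associated("skula", 0.9))  # []
```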
2. Fuzzy Hypernymy and Fuzzy Hyponymy
These relations capture approximate superset/subset relationships between
synsets. Fuzzy hypernymy links a synset to an approximate superset concept
(its approximate hypernym), while fuzzy hyponymy signifies the reverse
relationship. These are denoted as (w1, w2, μhr/μhp), where μhr/μhp
represents the degree of the relationship.
Example: (mattha) (buttermilk) → (dahi) (curd) with a strength of 0.8.
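Since fuzzy hyponymy is described as the reverse of fuzzy hypernymy, hyponymy edges can be derived by inverting hypernymy edges. The sketch uses the standard inverse of a fuzzy relation, R⁻¹(y, x) = R(x, y), so the strength is carried over unchanged; that Fuzzy Hindi WordNet assigns identical strengths in both directions is an assumption here, and `invert` is a hypothetical helper.

```python
# (w1, w2, strength): w2 is an approximate hypernym (broader concept) of w1.
Edge = tuple[str, str, float]
hypernymy: list[Edge] = [("mattha", "dahi", 0.8)]


def invert(edges: list[Edge]) -> list[Edge]:
    """Fuzzy inverse relation R^-1(y, x) = R(x, y): swap the ends, keep the strength."""
    return [(w2, w1, mu) for (w1, w2, mu) in edges]


hyponymy = invert(hypernymy)
print(hyponymy)  # [('dahi', 'mattha', 0.8)]
```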
4. Fuzzy Antonymy
Fuzzy antonymy represents the relation between two words expressing
approximately opposite meanings. This relation is denoted as (w1, w2, μan),
where μan signifies the strength of the relation.
Example: (pareshani) (distress) → (khushi) (happiness) with a strength of 0.8.
5. Fuzzy Entailment
Fuzzy entailment denotes the logical relationship between two verb synsets
in which the truth of the first implies the truth of the second. It is a
one-way relation and is represented as (v1 μe → v2), where μe represents the
strength of the entailment.
Example: (sona) (to sleep) → (letana) (to lie down) with a strength of 0.9.
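Because entailment is one-way, a lookup must respect direction. In this minimal sketch (the `ENTAILS` table and `entailment_strength` function are hypothetical names), the pair is stored only as v1 → v2, and the reverse query returns 0, assuming a missing edge means no entailment.

```python
# Directed store: ENTAILS[(v1, v2)] = mu_e means "v1 entails v2" with strength mu_e.
ENTAILS = {("sona", "letana"): 0.9}


def entailment_strength(v1: str, v2: str) -> float:
    """Look up v1 -> v2 only; the reverse direction is NOT implied."""
    return ENTAILS.get((v1, v2), 0.0)


print(entailment_strength("sona", "letana"))  # 0.9
print(entailment_strength("letana", "sona"))  # 0.0
```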
6. Fuzzy Troponymy
Fuzzy troponymy captures the relation between verb synsets in which one verb
denotes a particular manner of carrying out the other. This relation is
represented as (v1, v2, μt), where μt indicates the strength of the relationship.
Example: (padhana, parhai karana) (to study) → (sikhaana) with a strength of 0.8.
7. Fuzzy Gradation
Fuzzy gradation represents intermediate concepts between fuzzy antonyms.
The relation between synsets is represented as (w1, w2, μg), where μg
signifies the strength of the gradation.
Example: (hesana, samanya manodasa, karahana) (laughing, normal state of mind,
groaning) with a strength of 0.8.
8. Fuzzy Causative
The fuzzy causative relation links causative verbs and signifies the
interdependency between different morphological forms of a verb. It is a
lexical relation whose strength is always 1 (unity).
Example: (khana, khilana) (to eat, to feed) with a strength of 1.0.
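The example relations given above for antonymy, entailment, troponymy, and causative can be collected as weighted edges with their stated strengths. The `FuzzyEdge` record and `check` function are hypothetical; each synset is abbreviated to its first word, and the check simply enforces two constraints stated in the text: strengths lie in [0, 1], and a causative edge has strength 1.

```python
from typing import NamedTuple


class FuzzyEdge(NamedTuple):
    relation: str
    w1: str        # first word standing in for the whole synset
    w2: str
    strength: float


# Example relations from the text, with the strengths given there.
EDGES = [
    FuzzyEdge("antonymy", "pareshani", "khushi", 0.8),
    FuzzyEdge("entailment", "sona", "letana", 0.9),
    FuzzyEdge("troponymy", "padhana", "sikhaana", 0.8),
    FuzzyEdge("causative", "khana", "khilana", 1.0),
]


def check(edge: FuzzyEdge) -> None:
    """Validate an edge; causative is a lexical relation whose strength is always 1."""
    if not 0.0 <= edge.strength <= 1.0:
        raise ValueError(f"{edge}: strength outside [0, 1]")
    if edge.relation == "causative" and edge.strength != 1.0:
        raise ValueError(f"{edge}: causative strength must be 1.0")


for e in EDGES:
    check(e)
print("all", len(EDGES), "edges valid")  # all 4 edges valid
```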
12. Fuzzy Attribute
This relation represents the partial degree to which an adjective expresses a
property (attribute) of a noun; the last example below shows an adverb
modifying a verb in the same way.
Eg.: ((jamina), (upajau), μat), ((candi), (camakadara), μat), i.e. land →
fertile and silver → shiny.
Example: (सुपात्र, सत्पात्र, अच्छा पात्र) (worthy person) modifies (व्यक्ति, मानव,
साक्षर, सख्स, जन, बंदा) (person).
Example: (तेज, तेज़, तेजी से, तेजी, रफ्तार से, रफ़तार से, तेज गति से) (fast, quickly)
modifies (दौड़ना, भागना, धनना) (to run).
• Composition of fuzzy relations: let a, b, and c be concepts and let Xi, Yj,
and M(i, j) be relations between concepts. If relation Xi holds between a
and b, and relation Yj holds between b and c, then relation M(i, j) holds
between a and c.
• The strength of the composed relation is the corresponding entry S(i, j)
of Table II.
Example:
Using composition, the relation between (वाहन) (vehicle) and (रेडियो) (radio)
is fuzzy meronymy (M(2,4)) with a moderate strength.
This shows that (वाहन) and (रेडियो) are moderately related, meaning that a
"रेडियो" is a part of a "वाहन" in a moderate number of cases.
The composition of fuzzy relations can help process sentences that are
challenging for standard WordNet, as it considers indirect connections between
concepts.
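The composition rule can be sketched as a table lookup over relation pairs. Because Table II is not reproduced in this section, the `COMPOSE` table below holds only one hypothetical entry, mirroring the worked example (a hypernymy link followed by a meronymy link yields meronymy), and the edge strengths are made-up values for the vahana/kara/pahiya example from the discussion of Hindi WordNet's limitations. Instead of reading S(i, j) from Table II, the sketch combines the two edge strengths with a t-norm (min by default, the T3 discussed below); that substitution is an assumption.

```python
from typing import Callable, Optional

# Hypothetical composition table (X_i, Y_j) -> M(i, j). Only the entry matching the
# worked example (hypernymy followed by meronymy gives meronymy) is filled in;
# the actual Table II is not reproduced here.
COMPOSE = {("hypernymy", "meronymy"): "meronymy"}

# (a, b) -> (relation, strength); the strengths are made-up illustrative values.
EDGES = {
    ("vahana", "kara"): ("hypernymy", 0.9),  # vahana is a broader concept than kara
    ("kara", "pahiya"): ("meronymy", 0.7),   # pahiya is a part of kara
}


def compose(a: str, b: str, c: str,
            tnorm: Callable[[float, float], float] = min) -> Optional[tuple[str, float]]:
    """Infer the relation a -> c via b; combine the two strengths with a t-norm."""
    if (a, b) not in EDGES or (b, c) not in EDGES:
        return None
    (x, sx), (y, sy) = EDGES[(a, b)], EDGES[(b, c)]
    m = COMPOSE.get((x, y))
    if m is None:
        return None  # no table entry for this pair of relations
    return m, tnorm(sx, sy)


print(compose("vahana", "kara", "pahiya"))  # ('meronymy', 0.7)
print(compose("vahana", "kara", "pahiya", lambda s, t: s * t))
```

Passing a different `tnorm` changes only the strength of the inferred relation, not its type.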
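To compute the strength of a composed fuzzy relation, t-norms are used. A
t-norm is a function that takes two values in the range [0, 1] and returns a
value in the same range; it is also commutative, associative, non-decreasing
in each argument, and has 1 as its identity. Three t-norms are proposed: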
T1(x, y) = max(0, x + y - 1)
T2(x, y) = xy
T3(x, y) = min(x, y)
The strength of the composed fuzzy relation depends on the t-norm used. For
example, given x = 0.8 and y = 0.5, T1 gives 0.3, T2 gives 0.4, and T3 gives
0.5. T1 is pessimistic, T2 is moderate, and T3 is optimistic; in general
T1(x, y) ≤ T2(x, y) ≤ T3(x, y) for all x, y in [0, 1], so the choice of t-norm
directly affects the strength of the composed fuzzy relation.
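The three t-norms are the Łukasiewicz (T1), product (T2), and minimum (T3) t-norms, and can be written directly in Python; the sketch below reproduces the worked example above. The `chain_strength` helper is an illustrative addition: because t-norms are associative and have 1 as identity, folding one over a path's edge strengths gives the strength of a longer composition chain.

```python
from functools import reduce
from typing import Callable


def t1(x: float, y: float) -> float:
    """Lukasiewicz t-norm: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def t2(x: float, y: float) -> float:
    """Product t-norm: x * y."""
    return x * y


def t3(x: float, y: float) -> float:
    """Minimum (Goedel) t-norm: min(x, y)."""
    return min(x, y)


x, y = 0.8, 0.5
for name, t in (("T1", t1), ("T2", t2), ("T3", t3)):
    print(name, round(t(x, y), 2))  # prints T1 0.3, then T2 0.4, then T3 0.5


def chain_strength(strengths: list[float], t: Callable[[float, float], float]) -> float:
    """Strength of a chain a -> b -> c -> ...; valid because t-norms are associative."""
    return reduce(t, strengths, 1.0)  # 1.0 is the identity of every t-norm
```

For example, `chain_strength([0.9, 0.8, 0.7], t3)` returns 0.7, the weakest link, while T1 and T2 give smaller values for the same chain.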