1-Lecture One - (Chapter One-Introduction (NLP) )
Introduction to NLP
What is NLP?
Aspects of Language Processing
Goal of NLP
History of NLP
Application of NLP
Open Problems
Knowledge Sources
Computational Morphology
01/02/23 2
What is Natural Language Processing?
NLP is a field of computer science, artificial intelligence and
computational linguistics concerned with the interactions
between computers and human (natural) languages and, in
particular, with programming computers to fruitfully
process large natural language corpora (collections of text).
“Natural” languages:
Geez, Amharic, Oromifa, Tigrigna, English, Mandarin, French,
Swahili, Arabic, …
Aspects of Language Processing
A finer-grained decomposition of the process is useful when one
takes into account the current state of the art together with the
need to deal with real language data, as reflected in the Figure.
Syntax:
Sentence structure, phrases, grammar, …
Semantics:
Meaning
Executing commands
Discourse analysis:
Meaning of a text
Relationships between sentences (e.g. anaphora)
Syntax
Lemmatization:
Lemmatization usually refers to doing things properly with the use of a
vocabulary and morphological analysis of words, normally aiming to
remove inflectional endings only and to return the base or dictionary form
of a word, which is known as the lemma.
Morphological segmentation:
Separate words into individual morphemes and identify the class of the
morphemes.
The difficulty of this task depends greatly on the complexity of the
morphology (i.e. the structure of words) of the language being considered.
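The two tasks on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real analyzer: `LEMMA_LEXICON`, `PREFIXES`, `SUFFIXES`, `lemmatize` and `segment` are hypothetical names, and the tiny word lists stand in for a full vocabulary. Lemmatization is shown as dictionary lookup, and morphological segmentation as greedy affix stripping with a class label on each morpheme.

```python
# Hypothetical toy lexicon: inflected form -> lemma.
# A real lemmatizer uses a full vocabulary plus morphological rules.
LEMMA_LEXICON = {
    "saw": "see",
    "mice": "mouse",
    "better": "good",
    "running": "run",
}

# Hypothetical affix inventories, each affix tagged with its class.
PREFIXES = {"un": "NEG", "re": "AGAIN"}
SUFFIXES = {"ness": "NOUN-FORMING", "ed": "PAST", "ly": "ADV-FORMING", "s": "PLURAL"}


def lemmatize(word: str) -> str:
    """Return the dictionary form if the lexicon knows the word, else the lowercased word."""
    return LEMMA_LEXICON.get(word.lower(), word.lower())


def segment(word: str) -> list[tuple[str, str]]:
    """Greedily strip at most one prefix and any number of suffixes, labelling each morpheme."""
    rest = word.lower()
    prefixes: list[tuple[str, str]] = []
    suffixes: list[tuple[str, str]] = []
    # Strip one known prefix, keeping a root of at least three letters.
    for prefix, cls in PREFIXES.items():
        if rest.startswith(prefix) and len(rest) - len(prefix) >= 3:
            prefixes.append((prefix, cls))
            rest = rest[len(prefix):]
            break
    # Repeatedly strip the longest matching suffix under the same length guard.
    ordered = sorted(SUFFIXES.items(), key=lambda kv: len(kv[0]), reverse=True)
    stripped = True
    while stripped:
        stripped = False
        for suffix, cls in ordered:
            if rest.endswith(suffix) and len(rest) - len(suffix) >= 3:
                suffixes.insert(0, (suffix, cls))
                rest = rest[: -len(suffix)]
                stripped = True
                break
    return prefixes + [(rest, "ROOT")] + suffixes
```

For example, `segment("unkindness")` gives un (NEG) + kind (ROOT) + ness (NOUN-FORMING). A real lemmatizer would also need the part of speech ("saw" as a noun keeps the lemma "saw"), and this greedy segmenter over-segments "reads" into "re" + "ads", which illustrates why the task gets harder as the morphology gets richer.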
Part-of-speech tagging:
Example, "book" can be a noun ("the book on the table") or verb ("to
book a flight")
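As a sketch of how context can decide the tag of "book", the toy rule below tags "book" as a verb when it follows "to" or a modal and as a noun otherwise. This is a hypothetical heuristic, not a trained tagger; `tag_book`, `VERB_CONTEXT` and `DETERMINERS` are illustrative names.

```python
# Hypothetical context words after which "book" is read as a verb.
VERB_CONTEXT = {"to", "will", "can", "could", "please"}
DETERMINERS = {"the", "a", "an", "this", "that"}


def tag_book(tokens: list[str]) -> list[tuple[str, str]]:
    """Tag tokens with a toy rule; only 'book' is disambiguated, using its left neighbour."""
    tagged = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word == "book":
            previous = tokens[i - 1].lower() if i > 0 else ""
            tag = "VERB" if previous in VERB_CONTEXT else "NOUN"
        elif word in DETERMINERS:
            tag = "DET"
        else:
            tag = "OTHER"  # everything else is left untagged in this sketch
        tagged.append((token, tag))
    return tagged
```

With this rule, "the book on the table" tags "book" as NOUN and "to book a flight" tags it as VERB. Statistical taggers learn such contextual cues from annotated corpora instead of hand-written lists.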
Stemming
Stemming usually refers to a crude heuristic process that chops off the ends of
words in the hope of reducing related word forms to a common base form correctly
most of the time, and often includes the removal of derivational affixes.
Word segmentation
Separate a chunk of continuous text into separate words.
For a language like English, this is fairly trivial, since words are usually separated
by spaces.
However, some written languages like Chinese, Japanese and Thai do not mark
word boundaries in such a fashion, and in those languages text segmentation is a
significant task requiring knowledge of the vocabulary and morphology of words
in the language.
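Both techniques on this slide, stemming and word segmentation, can be sketched under simple assumptions: `crude_stem` strips the first matching suffix from a hand-picked list (a simplification, not the Porter stemmer), and `max_match` performs greedy maximum-matching segmentation, a common baseline for text written without spaces. The suffix list, the function names and the small Chinese lexicon are hypothetical.

```python
# Hand-picked suffixes, longest first so that "ions" is tried before "s".
SUFFIXES = ["ations", "ation", "ions", "ion", "ing", "ed", "ly", "es", "s"]


def crude_stem(word: str) -> str:
    """Chop off the first listed suffix that leaves at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy maximum matching: at each position take the longest lexicon word."""
    longest = max(len(w) for w in lexicon)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try candidates from longest to shortest.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words


# Hypothetical mini-lexicon of Chinese words.
CHINESE_LEXICON = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
print(max_match("我喜欢自然语言处理", CHINESE_LEXICON))  # ['我', '喜欢', '自然语言', '处理']
```

Here "connected", "connecting", "connection" and "connections" all reduce to "connect", but stripping "s" also turns "news" into "new", the kind of error a crude stemmer makes. The segmenter yields "I / like / natural language / processing"; greedy matching can also fail, e.g. with an English lexicon containing "theta" and "bled", "thetabledownthere" comes out as "theta bled own there" rather than "the table down there".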
Discourse:
Automatic summarization
Coreference resolution
Given a sentence or larger chunk of text, determine which words ("mentions") refer to the same
objects ("entities"). Anaphora resolution is a specific example of this task, and is specifically
concerned with matching up pronouns with the nouns or names to which they refer.
The more general task of coreference resolution also includes identifying so-called "bridging
relationships" involving referring expressions.
For example, in a sentence such as "He entered John's house through the front door", "the front
door" is a referring expression and the bridging relationship to be identified is the fact that the
door being referred to is the front door of John's house (rather than of some other structure that
might also be referred to).
Discourse analysis:
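Anaphora resolution, described above as a special case of coreference resolution, can be illustrated with a deliberately naive recency heuristic: link each pronoun to the closest preceding name whose gender matches. The name and pronoun tables and `resolve_pronouns` are hypothetical; real systems combine syntax, semantics and learned models.

```python
# Hypothetical gender tables; a real system would not rely on a fixed name list.
NAME_GENDER = {"John": "m", "Peter": "m", "Mary": "f"}
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f"}


def resolve_pronouns(tokens: list[str]) -> dict[int, str]:
    """Map each pronoun's index to the nearest preceding name of the same gender."""
    links: dict[int, str] = {}
    for i, token in enumerate(tokens):
        gender = PRONOUN_GENDER.get(token.lower())
        if gender is None:
            continue  # not a pronoun we handle
        # Scan leftwards so the most recent matching name wins.
        for j in range(i - 1, -1, -1):
            if NAME_GENDER.get(tokens[j]) == gender:
                links[i] = tokens[j]
                break
    return links


print(resolve_pronouns("John met Mary and she thanked him".split()))  # {4: 'Mary', 6: 'John'}
```

Recency plus agreement handles this sentence but fails as soon as two candidates agree in gender, which is why coreference resolution remains a hard discourse-level task.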
Speech Processing
Speech recognition
Speech segmentation
Given a sound clip of a person or people speaking, separate
it into words.
A subtask of speech recognition and typically grouped with
it.
Goal of Natural Language Processing
History of Natural Language Processing
1950s
Early MT: word translation + re-ordering.
Chomsky’s generative grammar.
Bar-Hillel’s argument (that fully automatic, high-quality MT is not feasible).
1960-80s
Applications:
BASEBALL: NL interface for querying a database of baseball games.
LUNAR: NL interface for querying data on the lunar rock samples from the Apollo missions.
ELIZA: simulation of a conversation with a (Rogerian) psychotherapist.
SHRDLU: use of NL to manipulate a blocks world.
Message understanding: understand a newspaper article on terrorism.
Machine translation.
1960-80s
Methods
ATN (augmented transition networks): extended context-free
grammar
Case grammar (agent, object, etc.)
DCG – Definite Clause Grammar
Dependency grammar: an element depends on another
1990s-now
Statistical methods
Speech recognition
MT systems
Question-answering
etc…
Classical symbolic methods:
Morphological analyzer
Discourse analysis
Pragmatic analysis
Empirical and Statistical Approaches
Corpus Creation
Treebank Annotation
Part-of-Speech Tagging
Statistical Parsing
Etc…
NLP Applications
Speech Synthesis
Text to Speech:
Open Problems in NLP
Ambiguity:
(ADJ, ADV) …
Syntactic: Helicopter powered by human flies. (Is "flies" the main verb, or a noun modified by "human"?)
Discourse: anaphora, …
Classical solution:
Using a later analysis step to resolve an ambiguity left by an earlier step.
E.g., He gives him the change:
("change" as a verb does not work for parsing)
He changes the place:
("changes" as a noun does not work for parsing)
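The classical solution can be sketched as "generate, then filter": enumerate every tag sequence the lexicon allows and keep only those that a later parsing step accepts. `LEXICON`, `ACCEPTED` and `disambiguate` are hypothetical, and the two accepted tag sequences are a stand-in for a real grammar.

```python
from itertools import product

# Hypothetical lexicon: each word lists every part of speech it can take.
LEXICON = {
    "he": ["PRON"], "him": ["PRON"], "the": ["DET"], "gives": ["V"],
    "change": ["N", "V"], "changes": ["N", "V"], "place": ["N", "V"],
}

# Hypothetical stand-in for a grammar: tag sequences the toy parser accepts.
ACCEPTED = {
    ("PRON", "V", "PRON", "DET", "N"),  # e.g. He gives him the change
    ("PRON", "V", "DET", "N"),          # e.g. He changes the place
}


def disambiguate(sentence: str) -> list[tuple[str, ...]]:
    """Enumerate all tag sequences and keep only those the later 'parsing' step accepts."""
    words = sentence.lower().split()
    candidates = product(*(LEXICON[w] for w in words))
    return [tags for tags in candidates if tags in ACCEPTED]


print(disambiguate("He gives him the change"))  # [('PRON', 'V', 'PRON', 'DET', 'N')]
print(disambiguate("He changes the place"))     # [('PRON', 'V', 'DET', 'N')]
```

Enumerating all combinations grows exponentially with the number of ambiguous words, which is one reason later systems moved to chart parsing and statistical tagging.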
Knowledge Sources
When using NLP for a new domain, one also needs to decide
which text source should be used for extracting content.
Of course, not every text source is suitable.
In order to qualify as a source, the text type needs to meet the
following two criteria:
Firstly, the text type needs to contain sufficient domain
knowledge.
In other words, if we choose a text type that only rarely
contains content about a given domain, then we are unlikely
to extract any significant amount of knowledge.
In the past, most research in NLP was carried out on news
corpora. The topics that predominate in this text type mostly lie
outside the target domain. Consequently, this text type would be of
little value for knowledge extraction.
Computational Morphology
What is it?
Morphology: the study/knowledge of structure/form.
• In this case: of words,
• How words are created, structured, analyzed
• Morpheme: basic meaningful unit of language.
Computational applications:
Analysis: parse/break a word into its constituent morphemes.
Morphological processes:
Affixation: prefix, suffix, infix
Interleaving (KaTaB, uKTaB)
Cliticization (isn’t, s’appelle)
Internal change: (sing/sang, goose/geese)
Suppletion (irregularity): (aller/ir, be/am)
Stress placement: implant, import, contest
Tone placement: dà vs. dá (will spank vs. spanked)
Reduplication
Full: iji/ijiiji
Partial: lakad/lalakad
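Three of the processes listed above (infixation, interleaving and reduplication) can be sketched as simple string operations. This is illustrative only: the function names are hypothetical, `interleave` assumes the pattern marks root-consonant slots with "C", and `infix` and `partial_reduplication` assume a word that starts with a consonant followed by a vowel, as in Tagalog "lakad". The Tagalog infix "-um-" (sulat to sumulat) is an added example, not from the slide.

```python
VOWELS = set("aeiou")


def interleave(root: str, pattern: str) -> str:
    """Root-and-pattern morphology: fill each 'C' slot with the next root consonant."""
    if pattern.count("C") != len(root):
        raise ValueError("pattern must have one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


def infix(word: str, affix: str) -> str:
    """Insert the affix after the word-initial consonant (Tagalog-style -um-)."""
    return word[0] + affix + word[1:]


def full_reduplication(word: str) -> str:
    """Copy the whole word."""
    return word + word


def partial_reduplication(word: str) -> str:
    """Copy the initial consonant(s) plus the first vowel onto the front of the word."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[: i + 1] + word


print(interleave("KTB", "CaCaC"))      # KaTaB
print(interleave("KTB", "uCCaC"))      # uKTaB
print(infix("sulat", "um"))            # sumulat
print(full_reduplication("iji"))       # ijiiji
print(partial_reduplication("lakad"))  # lalakad
```

Processes such as stress and tone placement or suppletion do not reduce to concatenation this way; suppletive forms (be/am) are usually listed in the lexicon.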
Inflectional
dog+s, sneez+ed
Compounding
overkill, BYU intramural track star
Cliticization
I’m, she’ll, they’ve, o’clock
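Generating the inflected forms above and expanding the listed clitics can be sketched as follows. The single spelling rule (drop a stem-final "e" before a suffix beginning with "e", so sneeze + ed gives "sneezed") and the clitic table are assumptions for illustration; `inflect`, `expand_clitics` and `CLITICS` are hypothetical names.

```python
# Hypothetical table of English contraction clitics and their full forms.
CLITICS = {"'m": " am", "'ll": " will", "'ve": " have"}


def inflect(stem: str, suffix: str) -> str:
    """Attach an inflectional suffix, dropping a stem-final 'e' before a suffix starting with 'e'."""
    if stem.endswith("e") and suffix.startswith("e"):
        stem = stem[:-1]
    return stem + suffix


def expand_clitics(token: str) -> str:
    """Expand a listed clitic after normalising curly apostrophes; other tokens are returned unchanged."""
    normalized = token.replace("\u2019", "'")
    for clitic, full in CLITICS.items():
        if normalized.lower().endswith(clitic):
            return normalized[: -len(clitic)] + full
    return token


print(inflect("dog", "s"), inflect("sneeze", "ed"))  # dogs sneezed
print([expand_clitics(t) for t in ["I’m", "she’ll", "they’ve", "o’clock"]])
# ['I am', 'she will', 'they have', 'o’clock']
```

"o’clock" is left unchanged because it is not in the table, a reminder that a fixed list does not cover every clitic.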
Computational morphology
Processing morphological structure via computer (parsing,
generation)
Traditional approach:
ad-hoc methods,
Cut-and-paste algorithms,
Dictionary lookup,
Inadequate for highly inflected languages.
Question & Answer
Thank You !!!