AI Unit 5

Artificial Intelligence

Unit 5

Natural Language Processing

NLP stands for Natural Language Processing, a field that combines computer science, the
study of human language, and artificial intelligence. It is the technology machines use to
understand, analyse, manipulate, and interpret human languages. It helps developers
organize knowledge for tasks such as translation, automatic summarization, Named Entity
Recognition (NER), speech recognition, relationship extraction, and topic segmentation.

Until the 1980s, natural language processing systems were based on complex sets of
hand-written rules. From the 1980s onward, NLP adopted machine learning algorithms for
language processing. Modern NLP covers a range of applications, such as speech
recognition, machine translation, and machine text reading. Combined, these applications
allow an artificial intelligence system to gain knowledge of the world.

Advantages of NLP

• NLP lets users ask questions about any subject and get a direct response within
seconds.
• NLP aims to give exact answers to a question, without unnecessary or unwanted
information.
• NLP helps computers communicate with humans in their own languages.
• It is very time efficient.
• Many companies use NLP to improve the efficiency and accuracy of documentation
processes and to identify information in large databases.

Disadvantages of NLP

• NLP may not show context.
• NLP can be unpredictable.
• NLP may require more keystrokes.
• An NLP system often cannot adapt to a new domain and has limited functionality, which
is why such systems are typically built for a single, specific task.

Applications of NLP

The main applications of NLP are as follows:

1. Question Answering

Question Answering focuses on building systems that automatically answer the questions
asked by humans in a natural language.

2. Spam Detection

Spam detection identifies unwanted e-mails before they reach a user's inbox.


3. Sentiment Analysis

Sentiment analysis is also known as opinion mining. It is used on the web to analyse the
attitude, behaviour, and emotional state of the sender. It is implemented through a
combination of NLP and statistics: values are assigned to the text (positive, negative, or
neutral) in order to identify the mood of the context (happy, sad, angry, etc.).
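
The idea of assigning values to text can be sketched with a tiny word-list scorer. This is
only an illustration: the word lists and the function name `sentiment` are assumptions made
for the example, and practical systems rely on much larger lexicons or trained statistical
models.

```python
# Toy word lists: assumptions for illustration, not a real sentiment lexicon.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "angry", "hate", "terrible"}


def sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase each word and strip surrounding punctuation before the lookup.
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great!"))  # positive
print(sentiment("The delivery was terrible."))               # negative
print(sentiment("The parcel arrived on Monday."))            # neutral
```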

4. Machine Translation

Machine translation is used to translate text or speech from one natural language to
another natural language.
Example: Google Translate

5. Spelling correction

Word processors and presentation software, such as Microsoft Word and PowerPoint, use NLP
to provide spelling correction.

6. Speech Recognition

Speech recognition is used for converting spoken words into text. It is used in applications,
such as mobile, home automation, video recovery, dictating to Microsoft Word, voice
biometrics, voice user interface, and so on.

7. Chatbot

Implementing chatbots is one of the important applications of NLP. Many companies use
them to provide customer chat services.

8. Information extraction

Information extraction is one of the most important applications of NLP. It is used for
extracting structured information from unstructured or semi-structured machine-
readable documents.
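
As a minimal sketch of this idea, the snippet below pulls two kinds of structured fields
out of free text with regular expressions. The patterns, the sample sentence, and the
function name `extract` are assumptions chosen for illustration; the patterns are
deliberately simple and would miss many real-world formats.

```python
import re

# Simplified patterns (assumptions): a basic e-mail shape and ISO-style dates.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract(text: str) -> dict:
    """Turn unstructured text into a small structured record."""
    return {"emails": EMAIL.findall(text), "dates": DATE.findall(text)}


note = "Contact alice@example.com before 2024-03-15 or bob@example.org after 2024-04-01."
print(extract(note))
```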

Components of NLP

NLP has the following two components:

1. Natural Language Understanding (NLU)

Natural Language Understanding (NLU) helps the machine to understand and analyse
human language by extracting the metadata from content such as concepts, entities,
keywords, emotion, relations, and semantic roles.

NLU is mainly used in business applications to understand the customer's problem in both
spoken and written language.

NLU involves the following tasks:

• Mapping the given input into a useful representation.
• Analyzing different aspects of the language.

2. Natural Language Generation (NLG)

Natural Language Generation (NLG) acts as a translator that converts computerized data
into a natural language representation. It mainly involves text planning, sentence
planning, and text realization.

Parsing

The word ‘parsing’ comes from the Latin word ‘pars’ (meaning ‘part’). Parsing is used to
draw the exact, or dictionary, meaning from the text. It is also called syntactic analysis
or syntax analysis. Syntax analysis checks the text for meaningfulness by comparing it
against the rules of a formal grammar. A sentence like “Give me hot ice-cream”, for
example, would be rejected by the parser or syntactic analyzer.

In this sense, parsing (also called syntactic analysis or syntax analysis) can be defined
as follows:

The process of analyzing strings of symbols in natural language for conformance to the
rules of a formal grammar.

The following points show the relevance of parsing in NLP:

• The parser reports any syntax errors.
• It helps to recover from commonly occurring errors so that processing of the
remainder of the input can continue.
• A parse tree is created with the help of the parser.
• The parser is used to create a symbol table, which plays an important role in NLP.
• The parser is also used to produce intermediate representations (IR).

Deep Vs Shallow Parsing

Deep Parsing                                | Shallow Parsing
--------------------------------------------+--------------------------------------------
The search strategy gives a complete        | Parses only a limited part of the
syntactic structure to a sentence.          | syntactic information in the given text.
--------------------------------------------+--------------------------------------------
Suitable for complex NLP applications.      | Suitable for less complex NLP applications.
--------------------------------------------+--------------------------------------------
Used in NLP applications such as dialogue   | Used in NLP applications such as
systems and summarization.                  | information extraction and text mining.
--------------------------------------------+--------------------------------------------
Also called full parsing.                   | Also called chunking.
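
To make shallow parsing (chunking) concrete, here is a minimal sketch that groups
already-tagged words into noun-phrase chunks without building a full parse tree. The tag
names (`DET`, `ADJ`, `NOUN`, `VERB`), the chunk pattern, and the function name `np_chunks`
are assumptions for illustration; the input is assumed to be tagged by some earlier step.

```python
def np_chunks(tagged):
    """Return noun-phrase chunks: optional DET, any number of ADJ, one or more NOUN."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":              # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":  # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:                               # at least one noun: emit a chunk
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                                   # no chunk starts here: move on
            i += 1
    return chunks


sentence = [("the", "DET"), ("little", "ADJ"), ("dog", "NOUN"),
            ("chased", "VERB"), ("a", "DET"), ("cat", "NOUN")]
print(np_chunks(sentence))  # ['the little dog', 'a cat']
```

The verb is simply skipped: only part of the syntactic information is recovered, which is
the sense in which the analysis is shallow.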

Various types of parsers

As discussed, a parser is basically a procedural interpretation of a grammar. It finds an
optimal tree for the given sentence after searching through the space of possible trees.
Some of the available parsers are described below.

Recursive descent parser

Recursive descent parsing is one of the most straightforward forms of parsing. Some
important points about the recursive descent parser:

• It follows a top-down process.
• It attempts to verify whether the syntax of the input stream is correct.
• It reads the input sentence from left to right.
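
The points above can be sketched as a small recursive descent parser with one method per
grammar rule. The toy grammar, the lexicon, and all names (`Parser`, `parse`, `LEXICON`)
are assumptions made for this illustration, and the grammar is kept small enough that no
backtracking is needed.

```python
# Toy grammar and lexicon (assumptions for illustration):
#   S  -> NP VP
#   NP -> Det N
#   VP -> V NP | V
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "saw": "V", "slept": "V"}


class Parser:
    def __init__(self, words):
        self.words = words
        self.pos = 0  # index of the next unread word (input is read left to right)

    def word(self, category):
        """Consume one word of the given category or raise SyntaxError."""
        if self.pos < len(self.words) and LEXICON.get(self.words[self.pos]) == category:
            self.pos += 1
            return (category, self.words[self.pos - 1])
        raise SyntaxError(f"expected {category} at position {self.pos}")

    def sentence(self):  # S -> NP VP (top-down: start from the start symbol)
        tree = ("S", self.noun_phrase(), self.verb_phrase())
        if self.pos != len(self.words):
            raise SyntaxError(f"unexpected word at position {self.pos}")
        return tree

    def noun_phrase(self):  # NP -> Det N
        return ("NP", self.word("Det"), self.word("N"))

    def verb_phrase(self):  # VP -> V NP | V
        verb = self.word("V")
        if self.pos < len(self.words):  # more input left: take the longer alternative
            return ("VP", verb, self.noun_phrase())
        return ("VP", verb)


def parse(text):
    return Parser(text.split()).sentence()


print(parse("the dog saw a cat"))
```

A sentence that does not fit the grammar, such as "dog the saw", raises `SyntaxError`,
which corresponds to the parser verifying whether the syntax of the input is correct.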

Shift-reduce parser

Some important points about the shift-reduce parser:

• It follows a simple bottom-up process.
• It tries to find a sequence of words and phrases that corresponds to the right-hand
side of a grammar production and replaces it with the left-hand side of the
production.
• This search for a matching sequence continues until the whole sentence is reduced.
• In other words, a shift-reduce parser starts with the input symbols and tries to
construct the parse tree up to the start symbol.
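
A minimal sketch of this bottom-up process is shown below. It is a greedy recognizer that
always reduces when it can and never backtracks, which is enough for the toy grammar used
here but not for grammars with shift/reduce conflicts. The grammar and the names `RULES`
and `shift_reduce` are assumptions for illustration.

```python
# Toy grammar (assumption): each rule is (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)), ("Det", ("a",)),
    ("N", ("dog",)), ("N", ("cat",)),
    ("V", ("saw",)),
]


def shift_reduce(words):
    """Return the list of actions if the words reduce to S, else None."""
    stack, queue, actions = [], list(words), []
    while True:
        for lhs, rhs in RULES:
            # Reduce: the top of the stack matches a right-hand side.
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {' '.join(rhs)} -> {lhs}")
                break
        else:
            # No reduction applies: shift the next word, or stop if none is left.
            if not queue:
                break
            stack.append(queue.pop(0))
            actions.append(f"shift {stack[-1]}")
    return actions if stack == ["S"] else None


for action in shift_reduce("the dog saw a cat".split()):
    print(action)
```

The last action printed is the reduction to the start symbol `S`, at which point the whole
sentence has been reduced.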

Chart parser

Some important points about the chart parser:

• It is mainly suitable for ambiguous grammars, including the grammars of natural
languages.
• It applies dynamic programming to parsing problems.
• Because of dynamic programming, partial hypothesized results are stored in a
structure called a ‘chart’.
• The ‘chart’ can also be re-used.
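
One way to see the chart at work is the CYK algorithm, a well-known chart-based recognizer
chosen here as a simple instance (the text above does not name a specific algorithm). The
sketch assumes a toy grammar in Chomsky normal form; the grammar and the names `BINARY`,
`LEXICAL`, `cyk`, and `recognizes` are assumptions for illustration.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (assumption for illustration).
BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICAL = {"the": {"Det"}, "a": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}


def cyk(words):
    """Fill the chart bottom-up; chart[(i, j)] holds the categories spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= LEXICAL.get(w, set())
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point between two sub-spans
                for left in chart[(i, k)]:      # stored partial results are re-used
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((left, right), set())
    return chart


def recognizes(words):
    return "S" in cyk(words)[(0, len(words))]


print(recognizes("the dog saw a cat".split()))  # True
print(recognizes("dog the saw".split()))        # False
```

Each chart cell is computed once and then looked up by every larger span that contains it,
which is the dynamic programming aspect described above.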

Grammar

Grammar is defined as the rules for forming well-structured sentences. Grammar also
plays an essential role in describing the syntactic structure of well-formed programs, much
as it denotes the syntactic rules used for conversation in natural languages.

In the theory of formal languages, grammar is also applicable in computer science, mainly
in programming languages and data structures. For example, in the C programming
language, precise grammar rules state how functions are built from lists and statements.

Context Free Grammar

Context-free grammar consists of a set of rules expressing how symbols of the language
can be grouped and ordered together and a lexicon of words and symbols.

For example, an NP (noun phrase) can be composed of either a ProperNoun or a determiner
(Det) followed by a Nominal, and a Nominal can in turn consist of one or more Nouns:

NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun

Context-free rules can also be hierarchically embedded, so the previous rules can be
combined with others, like the following, that express facts about the lexicon:

Det → a
Det → the
Noun → flight
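
As a rough sketch, the NP rules and lexicon can be written down as data and checked
directly. The lexicon entries "morning" and "Houston" are assumptions added so the example
has more than one noun and a proper noun to test; the function names are also assumptions.
The left-recursive rule for Nominal is implemented as its equivalent "one or more Nouns".

```python
# Lexicon from the rules above; the extra entries are marked as assumptions.
LEXICON = {
    "Det": {"a", "the"},
    "Noun": {"flight", "morning"},   # "morning" is an added assumption
    "ProperNoun": {"Houston"},       # assumed example; not given in the rules
}


def is_nominal(words):
    """Nominal -> Noun | Nominal Noun, i.e. one or more Nouns."""
    return len(words) >= 1 and all(w in LEXICON["Noun"] for w in words)


def is_np(words):
    """NP -> Det Nominal | ProperNoun."""
    if len(words) == 1 and words[0] in LEXICON["ProperNoun"]:
        return True
    return len(words) >= 2 and words[0] in LEXICON["Det"] and is_nominal(words[1:])


print(is_np(["the", "flight"]))           # True
print(is_np(["a", "morning", "flight"]))  # True
print(is_np(["flight"]))                  # False: a Nominal alone is not an NP here
```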

Context-free grammar is a formalism powerful enough to represent complex relations, and
it can be implemented efficiently. Context-free grammar is integrated into many language
applications.

A context-free grammar consists of a set of rules or productions, each expressing the ways
the symbols of the language can be grouped, and a lexicon of words.

A context-free grammar (CFG) can also be seen as the list of rules that define the set of
all well-formed sentences in a language. Each rule has a left-hand side that identifies a
syntactic category and a right-hand side that defines its alternative parts, reading from
left to right. Example: the rule S → NP VP means that "a sentence is defined as a noun
phrase followed by a verb phrase."

Formalism in rules for context-free grammar: A sentence in the language defined by a CFG
is a series of words that can be derived by systematically applying the rules, beginning
with a rule that has S on its left-hand side.

Use of parse tree in context-free grammar: A convenient way to describe a parse is to show
its parse tree, which is simply a graphical display of the parse.

A parse of the sentence is a series of rule applications in which a syntactic category is
replaced by the right-hand side of a rule that has that category on its left-hand side, and
the final rule application yields the sentence itself.

A context-free grammar G can be defined as a 4-tuple:

G = (V, T, P, S)

where

V is a finite set of non-terminal symbols,
T is a finite set of terminal symbols,
P is a set of production rules, and
S is the start symbol.

In a CFG, the start symbol is used to derive the string. You derive the string by
repeatedly replacing a non-terminal with the right-hand side of one of its productions,
until all non-terminals have been replaced by terminal symbols.
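
The derivation process can be sketched in a few lines. The grammar below is a toy example
invented for illustration (it is not taken from the text), and the function name
`leftmost_derivation` is an assumption. Each non-terminal has exactly one production here,
so the sketch always applies the first one; a real grammar would need a way to choose.

```python
# A toy grammar G = (V, T, P, S); all symbols and words are assumptions.
V = {"S", "NP", "VP", "Det", "N", "Verb"}   # non-terminal symbols
T = {"the", "dog", "barks"}                 # terminal symbols
P = {                                       # production rules
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Verb"]],
    "Det": [["the"]],
    "N": [["dog"]],
    "Verb": [["barks"]],
}
S = "S"                                     # start symbol


def leftmost_derivation(start, productions, nonterminals):
    """Yield each sentential form, always expanding the leftmost non-terminal
    with its first production, until only terminals remain."""
    form = [start]
    yield form
    while any(sym in nonterminals for sym in form):
        i = next(k for k, sym in enumerate(form) if sym in nonterminals)
        form = form[:i] + productions[form[i]][0] + form[i + 1:]
        yield form


for step in leftmost_derivation(S, P, V):
    print(" ".join(step))
```

The first line printed is the start symbol and the last is the derived string, in which
every symbol belongs to T.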
