
NLP


Natural Language Processing is a field of artificial intelligence that focuses on the interaction
between computers and humans using natural language. The goal of NLP is to enable
computers to understand, interpret, and generate human language in a way that is both
meaningful and contextually relevant. NLP involves a range of tasks, including but not
limited to, text and speech recognition, language translation, sentiment analysis, and language
generation. It plays a crucial role in applications such as chatbots, language translation
services, voice assistants, and other language-related technologies.

Advantages of Natural Language Processing (NLP):


1. Improved Human-Computer Interaction: NLP enhances the interaction between
humans and computers by allowing systems to understand and respond to natural
language, making interfaces more user-friendly.
2. Efficient Information Retrieval: NLP helps in extracting relevant information from
large volumes of text, making it easier to find specific data or answers to queries.
3. Automation of Repetitive Tasks: NLP enables the automation of tasks like data
extraction, summarization, and categorization, saving time and resources.
4. Multilingual Communication: Machine translation and language processing in NLP
facilitate communication across different languages, breaking down language barriers.
5. Personal Assistants and Chatbots: NLP powers virtual assistants and chatbots,
enabling them to understand and respond to user queries, improving customer service
and user experience.
6. Data Analysis and Insights: NLP can analyze and extract valuable insights from
unstructured data, providing businesses with actionable information.
Disadvantages of Natural Language Processing (NLP):
1. Ambiguity and Contextual Challenges: Natural language is often ambiguous, and the
context can significantly affect the meaning of words or phrases, posing challenges
for accurate interpretation by NLP systems.
2. Complexity of Language: Languages are complex, and variations in grammar, syntax,
and expressions can make it difficult for NLP models to generalize effectively across
different contexts.
3. Lack of Common Sense Understanding: NLP systems may struggle with common
sense reasoning and understanding, making it challenging to handle nuanced or
complex situations.
4. Bias and Fairness Issues: NLP models can inherit and perpetuate biases present in the
training data, leading to unfair or biased outcomes, especially when dealing with
sensitive topics or diverse populations.
5. High Resource Requirements: Training and maintaining sophisticated NLP models
require significant computational resources, making them resource-intensive in terms
of both time and hardware.
6. Privacy Concerns: Analyzing and processing natural language data raises privacy
concerns, especially when dealing with sensitive information in areas like healthcare
or finance.
7. Constant Evolution of Language: Language is dynamic and evolves over time.
Keeping NLP models up-to-date with the latest linguistic trends and changes can be
challenging.
Phases of NLP

NLP has the following five phases:

1. Lexical and Morphological Analysis

The first phase of NLP is lexical analysis. This phase scans the input text as a stream
of characters and converts it into meaningful lexemes. It divides the whole text into paragraphs,
sentences, and words.
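
The division into paragraphs, sentences, and words described above can be sketched in a few lines of Python. This is only a minimal illustration, not a full lexical analyzer: the function name lexical_analysis and the splitting rules (blank lines end paragraphs; ".", "!" or "?" followed by whitespace ends a sentence) are assumptions made for the example.

```python
import re

def lexical_analysis(text):
    """Split raw text into paragraphs, then sentences, then word tokens."""
    result = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            # A word is taken to be a run of letters, digits, or apostrophes.
            words = re.findall(r"[A-Za-z0-9']+", sentence)
            if words:
                sentences.append(words)
        result.append(sentences)
    return result

sample = "NLP is fun. It has five phases.\n\nLexical analysis comes first."
structure = lexical_analysis(sample)
for number, paragraph in enumerate(structure, start=1):
    print("Paragraph", number, paragraph)
```

Real text needs more careful rules (abbreviations such as "Dr." would be split incorrectly here), which is why practical systems rely on trained tokenizers.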

2. Syntactic Analysis (Parsing)

Syntactic analysis checks grammar and word arrangement, and shows the relationships
among the words.

Example: "Agra goes to the Poonam"

In the real world, the sentence "Agra goes to the Poonam" does not make any sense, so it is
rejected by the syntactic analyzer.

3. Semantic Analysis

Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.

4. Discourse Integration
In discourse integration, the meaning of a sentence depends on the sentences that precede it and
also influences the meaning of the sentences that follow it.

5. Pragmatic Analysis

Pragmatic analysis is the fifth and last phase of NLP. It helps you discover the intended effect by
applying a set of rules that characterize cooperative dialogues.

For example, "Open the door" is interpreted as a request instead of an order.

Levels of NLP
There are seven independent levels at which meaning is understood and extracted from text or
spoken words. To understand natural languages, it is important to differentiate between them.

1. Phonology level: This level deals with pronunciation. Because English
spelling is only partially phonemic, the written sentence "John inputs the data" does not
show the pronunciation clearly; for example, the h in "John" is silent and the two a's in
"data" represent very different sounds.

2. Morphological level: Morphology deals with the smallest parts of words that convey
meaning, including suffixes and prefixes. It studies how words are
built from these smaller units of meaning, called morphemes. For example, the word 'dog' has a
single morpheme, while the word 'rats' has two morphemes: 'rat' and 's', where 's' marks the
plural.

3. Lexical level: The lexical level deals with the study of words with respect
to their lexical meaning and Part-Of-Speech (POS). This level uses a lexicon, which is a
collection of individual lexemes. A lexeme is a basic unit of lexical meaning: an
abstract unit of morphological analysis that represents the set of forms or "senses"
taken by a single morpheme. For example, "duck" can take the form of a noun or a
verb, but its POS and lexical meaning can only be derived in context with the other words
used in the phrase or sentence.

4. Syntactic level: The syntactic level deals with the grammar and structure of sentences. It
studies the proper relationships between words. The POS tagging output of the lexical
analysis can be used at the syntactic level to group words into phrase and
clause brackets. Syntactic analysis, also referred to as "parsing", allows the extraction
of phrases that convey more meaning than the individual words by themselves,
such as a noun phrase.
5. Semantic level: This level deals with the meaning of words and sentences. There are
two approaches to semantic analysis: 1. Syntax-driven semantic analysis 2. Semantic
grammar. It is the study of the meaning of words as associated with grammatical
structure. For example, from the statement "John inputs the data" we can understand
that John is an agent.
6. Discourse level: This level deals with the structure of different kinds of text. There are
two types of discourse processing: 1. Anaphora resolution 2. Discourse / text structure
recognition. Anaphora resolution replaces words such as pronouns with the entities they refer to.
Discourse structure recognition determines the purpose of sentences in the text, which
adds to a meaningful representation of the text.

7. Pragmatic level: This level deals with the use of real-world knowledge and an
understanding of how this influences the meaning of what is being communicated. By
analyzing documents and queries, a more detailed representation is derived.
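
As a rough illustration of the morphological level (level 2 above), the following Python sketch splits a word into a stem and a suffix morpheme, reproducing the 'dog' and 'rats' examples. The suffix table, the stem list, and the function name split_morphemes are hypothetical and deliberately tiny; real morphological analyzers use full lexicons and handle irregular forms.

```python
# Hypothetical suffix table: each suffix maps to the meaning it adds.
SUFFIXES = {"s": "plural", "ed": "past tense", "ing": "progressive"}

# Hypothetical list of known stems; a real system would use a full lexicon.
STEMS = {"dog", "rat", "input", "walk"}

def split_morphemes(word):
    """Return the morphemes of a word as [stem] or [stem, suffix]."""
    word = word.lower()
    if word in STEMS:
        return [word]
    # Try longer suffixes first so that "ing" is tested before "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            return [stem, suffix]
    # Unknown word: treat it as a single morpheme.
    return [word]

for word in ["dog", "rats", "inputs", "walking"]:
    print(word, "->", split_morphemes(word))
```

A word that merely ends in a listed suffix (for example "bus") is left whole because its remainder is not a known stem.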

Language, Knowledge and Grammar in Language Processing


1. Language is used for communication and knowledge is interpreted in it. Here, we
consider text as language and content as knowledge.
2. Language is a medium used for expression. It is the outer form of the content it
expresses; the same content can be expressed in various languages.
3. A long-standing question is whether language can be separated from its content. If so, how
can the content itself be represented? Usually, the meaning of one language is written
in the same language but by using a different set of words. Therefore, to process a
language means to process its content.
4. Computers are not able to understand natural language, so methods are developed for
mapping its content into a formal language. The knowledge representation tool
represents the whole body of knowledge, which may be modified, for example through the
generation of new words, to include new ideas and situations.
5. Language processing has different levels, and each level of processing involves a
different type of knowledge. The levels of processing and the types of
knowledge are as follows:

• Phonetic and Phonological Knowledge: Phonetics deals with sounds, while phonology is
the study of the combination of sounds into organized units of speech, the formation of
syllables and larger units. Phonetic and phonological knowledge is essential for
speech-based systems, as it deals with how words are related to the sounds that
realize them.
• Morphological Knowledge: Morphology concerns word formation. It is the study of the patterns of
formation of words by the combination of sounds into minimal distinctive units of
meaning called morphemes. Morphological knowledge concerns how words are
constructed from morphemes.
• Syntactic Knowledge: Syntax deals with how words combine to form phrases, phrases
combine to form clauses and clauses join to make sentences. Syntactic analysis
concerns sentence formation. It deals with how words can be put together to form
correct sentences.
• Semantic Knowledge: It concerns the meanings of words and sentences. This is the
study of context-independent meaning, that is, the meaning a sentence has no matter in
which context it is used. Defining the meaning of a sentence is very difficult due to
the ambiguities involved.
• Pragmatic Knowledge: Pragmatics is an extension of semantics.
Pragmatics deals with the contextual aspects of meaning in particular situations. It
concerns how sentences are used in different situations and how use affects the
interpretation of the sentence. Defining the meaning of a sentence is very difficult due
to the ambiguities involved.
• Discourse Knowledge: Discourse concerns connected sentences. It is the study of
chunks of language that are bigger than a single sentence. Discourse knowledge
concerns inter-sentential links, that is, how the immediately preceding sentences affect
the interpretation of the next sentence.
• World Knowledge: World knowledge is the everyday knowledge that all
speakers share about the world. It includes general knowledge about the structure
of the world and what each language user must know about the other user's beliefs
and goals. This is essential for better language understanding.

Ambiguity and its Types in English and Indian Regional Language


Natural language has a very rich form and structure, and it is highly ambiguous. Ambiguity means
that an expression does not have a single well-defined interpretation. Any sentence in a language
with a large enough grammar can have more than one interpretation. The main forms of ambiguity in
natural language are:
1. Lexical Ambiguity
2. Syntactic Ambiguity
3. Semantic Ambiguity
4. Metonymy Ambiguity

Lexical Ambiguity: When a word has multiple meanings or parts of speech, it is known as lexical
ambiguity. For example, the word "back" can be a noun or an adjective. Noun: "the back of the stage";
adjective: "back door".
Syntactic Ambiguity: Syntactic ambiguity means that a sentence can be parsed in more than one
way, giving multiple syntactic structures.
Semantic Ambiguity: Semantic ambiguity is related to the interpretation of a sentence.
Metonymy Ambiguity: Metonymy is the most difficult form of ambiguity. It deals with phrases in which
the literal meaning is different from the figurative meaning.
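
Lexical ambiguity, the first type above, can be illustrated with a toy lookup: a word is flagged as ambiguous when the lexicon lists more than one part of speech for it. The lexicon contents and the function name ambiguous_words below are assumptions made for this sketch; resolving which reading applies would still require the surrounding context.

```python
# Hypothetical mini-lexicon: each word maps to its possible parts of speech.
LEXICON = {
    "back": {"noun", "adjective"},
    "duck": {"noun", "verb"},
    "door": {"noun"},
    "the": {"determiner"},
}

def ambiguous_words(sentence):
    """Return the words of a sentence that have more than one possible POS."""
    found = {}
    for word in sentence.lower().split():
        tags = LEXICON.get(word, set())  # unknown words get no tags
        if len(tags) > 1:
            found[word] = sorted(tags)
    return found

print(ambiguous_words("The duck is at the back door"))
```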
Applications of NLP
1. Machine Translation: In machine translation, text is translated automatically from one
human language to another. To perform the
translation, it is important to have knowledge of the words and
phrases, the grammar of the two languages involved, the semantics of the
languages, and knowledge of the world.
2. Speech recognition: Speech recognition is the process in which acoustic speech
signals are mapped to a set of words. It is used for converting
spoken words into text, in applications such as mobile devices, home automation,
video recovery, dictation to Microsoft Word, voice biometrics, voice user interfaces,
and so on.
3. Speech synthesis: Automatic production of speech is known as speech synthesis. It
means speaking a sentence in natural language.
4. Information Retrieval: It refers to the human-computer interaction (HCI) that happens
when we use a machine to search a body of information for information objects
(content) that match our search query. A person's query is matched against a set of
documents to find a subset of 'relevant' documents. Examples: Google, Yahoo,
Altavista, etc.
5. Text Categorization: Text categorization (also known as text classification or topic
spotting) is the task of automatically sorting a set of documents into categories
(clusters).
6. Sentiment Analysis: Sentiment analysis is also known as opinion mining. It is mainly
used on the web to analyse the behaviour, attitude, and emotional state of the sender.
This application is implemented through a combination of NLP and statistics by
assigning values to the text (neutral, positive, or negative) and identifying the mood of the
context (sad, happy, angry, etc.).
7. Question-Answering systems: Question answering focuses on constructing systems
that automatically answer questions asked by humans in a natural language. Such a system
presents only the requested information instead of returning full documents as a
search engine does. The basic idea behind a QA system is that users just have to ask
a question and the system will retrieve the most appropriate and correct answer for
that question.

8. Spam Detection: Spam detection is used to identify unwanted e-mails before they reach a
user's inbox.
9. Chatbot: The chatbot is one of the most important applications of NLP. It is used by many
companies to provide customer chat services.
10. Text summarization: This task aims to create short summaries of longer documents
while retaining the core content and preserving the overall meaning of the text.
11. Information Extraction: This task identifies specific pieces of information in unstructured or
semi-structured textual documents. It transforms unstructured information in a corpus of
documents or web pages into a structured database.
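
The idea behind sentiment analysis (application 6 above), assigning a positive, negative, or neutral value to a text, can be sketched with a simple word-counting approach. The word lists and the function name sentiment are hypothetical; this lexicon-based method is only one simple option and ignores negation, sarcasm, and context, which statistical models are meant to handle.

```python
import re

# Hypothetical opinion word lists; real systems use much larger lexicons
# or models trained on labelled data.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "poor", "sad", "hate", "terrible"}

def sentiment(text):
    """Label text as 'positive', 'negative', or 'neutral' by counting opinion words."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1 to the score, each negative word subtracts 1.
    score = sum((word in POSITIVE) - (word in NEGATIVE) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["I love this great phone",
               "The service was terrible and the food was bad",
               "The package arrived on Monday"]:
    print(review, "->", sentiment(review))
```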
