Regular Expression

Regular Expression
The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme that
belong to the language in hand. It searches for the pattern defined by the language rules.
Regular expressions have the capability to express finite languages by defining a pattern for
finite strings of symbols. The grammar defined by regular expressions is known as regular
grammar. The language defined by regular grammar is known as regular language.
Regular expression is an important notation for specifying patterns. Each pattern matches a set of
strings, so regular expressions serve as names for a set of strings. Programming language tokens
can be described by regular languages. The specification of regular expressions is an example of
a recursive definition. Regular languages are easy to understand and have efficient
implementation.
There are a number of algebraic laws that are obeyed by regular expressions, which can be used
to manipulate regular expressions into equivalent forms.
Operations
The various operations on languages are:
 Union of two languages L and M is written as

L U M = {s | s is in L or s is in M}
 Concatenation of two languages L and M is written as
LM = {st | s is in L and t is in M}
 The Kleene Closure of a language L is written as
L* = Zero or more occurrence of language L.
Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
 Union : (r)|(s) is a regular expression denoting L(r) U L(s)

 Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
 Kleene closure : (r)* is a regular expression denoting (L(r))*
 (r) is a regular expression denoting L(r)
Precedence and Associativity
 *, concatenation (.), and | (pipe sign) are left associative

 * has the highest precedence
 Concatenation (.) has the second highest precedence.
1
 | (pipe sign) has the lowest precedence of all.

Page
Representing valid tokens of a language in regular expression
Swarupa Arjya
Asst. Prof CSE
If x is a regular expression, then:
 x* means zero or more occurrence of x.

i.e., it can generate { e, x, xx, xxx, xxxx, … }
 x+ means one or more occurrence of x.
i.e., it can generate { x, xx, xxx, xxxx … } or x.x*
 x? means at most one occurrence of x
i.e., it can generate either {x} or {e}.
[a-z] is all lower-case alphabets of English language.
[A-Z] is all upper-case alphabets of English language.
[0-9] is all natural digits used in mathematics.
Representing occurrence of symbols using regular expressions
letter = [a – z] or [A – Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = [ + | - ]
Representing language tokens using regular expressions
Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
The only problem left with the lexical analyzer is how to verify the validity of a regular
expression used in specifying the patterns of keywords of a language. A well-accepted solution is
to use finite automata for verification.
Finite Automata(FA)
Finite automata is a state machine that takes a string of symbols as input and changes its state
accordingly. Finite automata is a recognizer for regular expressions. When a regular expression
string is fed into finite automata, it changes its state for each literal. If the input string is
successfully processed and the automata reaches its final state, it is accepted, i.e., the string just
fed was said to be a valid token of the language in hand.
The mathematical model of finite automata consists of:
 Finite set of states (Q)

 Finite set of input symbols (Σ)
 One Start state (q0)
 Set of final states (qf)
2
Page
 Transition function (δ)
Swarupa Arjya
Asst. Prof CSE
The transition function (δ) maps the finite set of state (Q) to a finite set of input symbols (Σ), Q ×
Σ➔Q
Finite Automata Construction
Let L(r) be a regular language recognized by some finite automata (FA).
 States : States of FA are represented by circles. State names are written inside circles.
 Start state : The state from where the automata starts, is known as the start state. Start
state has an arrow pointed towards it.
 Intermediate states : All intermediate states have at least two arrows; one pointing to
and another pointing out from them.
 Final state : If the input string is successfully parsed, the automata is expected to be in
this state. Final state is represented by double circles. It may have any odd number of
arrows pointing to it and even number of arrows pointing out from it. The number of odd
arrows are one greater than even, i.e. odd = even+1.
 Transition : The transition from one state to another state happens when a desired
symbol in the input is found. Upon transition, automata can either move to the next state
or stay in the same state. Movement from one state to another is shown as a directed
arrow, where the arrows points to the destination state. If automata stays on the same
state, an arrow pointing from a state to itself is drawn.
Example : We assume FA accepts any three digit binary value ending in digit 1. FA = {Q(q0, qf),
Σ(0,1), q0, qf, δ}
NFA (Non-Deterministic finite automata)

o NFA stands for non-deterministic finite automata. It is easy to construct an NFA than
DFA for a given regular language.
o The finite automata are called NFA when there exist many paths for specific input from
3
the current state to the next state.

Page
o Every NFA is not DFA, but each NFA can be translated into DFA.
Swarupa Arjya
Asst. Prof CSE
o NFA is defined in the same way as DFA but with the following two exceptions, it
contains multiple next states, and it contains ε transition.
In the following image, we can see that from state q0 for input a, there are two next states q1 and
q2, similarly, from q0 for input b, the next states are q0 and q1. Thus it is not fixed or determined
that with a particular input where to go next. Hence this FA is called non-deterministic finite
automata.
 NFA also has five states same as DFA, but with different transition function, as shown
follows:
δ: Q x ∑ →2Q
where,
1. Q: finite set of states

2. ∑: finite set of the input symbol
3. q0: initial state
4. F: final state
5. δ: Transition function
Graphical Representation of an NFA
An NFA can be represented by digraphs called state diagram. In which:
The state is represented by vertices.
The arc labeled with an input character show the transitions.
The initial state is marked with an arrow.
The final state is denoted by the double circle.
Example 1:
Q = {q0, q1, q2}
∑ = {0, 1}
q0 = {q0}
4
F = {q2}
Page
Solution:
Swarupa Arjya
Asst. Prof CSE
Transition diagram:
Transition Table:
Present State Next state for Input 0 Next State of Input 1
→q0 q0, q1 q1
q1 q2 q0
*q2 q2 q1, q2
In the above diagram, we can see that when the current state is q0, on input 0, the next state will
be q0 or q1, and on 1 input the next state will be q1. When the current state is q1, on input 0 the
next state will be q2 and on 1 input, the next state will be q0. When the current state is q2, on 0
input the next state is q2, and on 1 input the next state will be q1 or q2.
Example 2:
NFA with ∑ = {0, 1} accepts all strings with 01.
Solution:
Transition Table:
Present State Next state for Input 0 Next State of Input 1
→q0 q1 ε
q1 ε q2
*q2 q2 q2
5
Page
Swarupa Arjya
Asst. Prof CSE
DFA (Deterministic finite automata)
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the
computation. The finite automata are called deterministic finite automata if the machine is read
an input string one symbol at a time.
In DFA, there is only one path for specific input from the current state to the next state.
DFA does not accept the null move, i.e., the DFA cannot change state without any input
character.
DFA can contain multiple final states. It is used in Lexical Analysis in Compiler.
In the following diagram, we can see that from state q0 for input a, there is only one path which
is going to q1. Similarly, from q0, there is only one path for input b going to q2.
Formal Definition of DFA
A DFA is a collection of 5-tuples same as we described in the definition of FA.
Q: finite set of states
∑: finite set of the input symbol
q0: initial state
F: final state
δ: Transition function
Transition function can be defined as:
δ: Q x ∑→Q
6
Page
Swarupa Arjya
Asst. Prof CSE

Regular Expression

Uploaded by

Copyright:

Available Formats

Regular Expression

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Regular Expression

Uploaded by

Copyright:

Available Formats

Regular Expression

The various operations on languages are:

 Union of two languages L and M is written as

 Union : (r)|(s) is a regular expression denoting L(r) U L(s)

Precedence and Associativity

 *, concatenation (.), and | (pipe sign) are left associative

 | (pipe sign) has the lowest precedence of all.

Representing valid tokens of a language in regular expression

 x* means zero or more occurrence of x.

Representing occurrence of symbols using regular expressions

Representing language tokens using regular expressions

Identifier = (letter)(letter | digit)*

The mathematical model of finite automata consists of:

 Finite set of states (Q)

 Transition function (δ)

Finite Automata Construction

Let L(r) be a regular language recognized by some finite automata (FA).

Σ(0,1), q0, qf, δ}

NFA (Non-Deterministic finite automata)

the current state to the next state.

1. Q: finite set of states

Graphical Representation of an NFA

An NFA can be represented by digraphs called state diagram. In which:

The state is represented by vertices.

The arc labeled with an input character show the transitions.

The initial state is marked with an arrow.

The final state is denoted by the double circle.

Q = {q0, q1, q2}

Present State Next state for Input 0 Next State of Input 1

NFA with ∑ = {0, 1} accepts all strings with 01.

Present State Next state for Input 0 Next State of Input 1

Formal Definition of DFA

A DFA is a collection of 5-tuples same as we described in the definition of FA.

Q: finite set of states

∑: finite set of the input symbol

q0: initial state

Transition function can be defined as:

You might also like