Regular Expression
Regular Expression
Regular Expression
The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme that
belong to the language in hand. It searches for the pattern defined by the language rules.
Regular expressions have the capability to express finite languages by defining a pattern for
finite strings of symbols. The grammar defined by regular expressions is known as regular
grammar. The language defined by regular grammar is known as regular language.
Regular expression is an important notation for specifying patterns. Each pattern matches a set of
strings, so regular expressions serve as names for a set of strings. Programming language tokens
can be described by regular languages. The specification of regular expressions is an example of
a recursive definition. Regular languages are easy to understand and have efficient
implementation.
There are a number of algebraic laws that are obeyed by regular expressions, which can be used
to manipulate regular expressions into equivalent forms.
Operations
Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
Swarupa Arjya
Asst. Prof CSE
If x is a regular expression, then:
letter = [a – z] or [A – Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = [ + | - ]
Decimal = (sign)?(digit)+
The only problem left with the lexical analyzer is how to verify the validity of a regular
expression used in specifying the patterns of keywords of a language. A well-accepted solution is
to use finite automata for verification.
Finite Automata(FA)
Finite automata is a state machine that takes a string of symbols as input and changes its state
accordingly. Finite automata is a recognizer for regular expressions. When a regular expression
string is fed into finite automata, it changes its state for each literal. If the input string is
successfully processed and the automata reaches its final state, it is accepted, i.e., the string just
fed was said to be a valid token of the language in hand.
Swarupa Arjya
Asst. Prof CSE
The transition function (δ) maps the finite set of state (Q) to a finite set of input symbols (Σ), Q ×
Σ➔Q
States : States of FA are represented by circles. State names are written inside circles.
Start state : The state from where the automata starts, is known as the start state. Start
state has an arrow pointed towards it.
Intermediate states : All intermediate states have at least two arrows; one pointing to
and another pointing out from them.
Final state : If the input string is successfully parsed, the automata is expected to be in
this state. Final state is represented by double circles. It may have any odd number of
arrows pointing to it and even number of arrows pointing out from it. The number of odd
arrows are one greater than even, i.e. odd = even+1.
Transition : The transition from one state to another state happens when a desired
symbol in the input is found. Upon transition, automata can either move to the next state
or stay in the same state. Movement from one state to another is shown as a directed
arrow, where the arrows points to the destination state. If automata stays on the same
state, an arrow pointing from a state to itself is drawn.
Example : We assume FA accepts any three digit binary value ending in digit 1. FA = {Q(q0, qf),
o Every NFA is not DFA, but each NFA can be translated into DFA.
Swarupa Arjya
Asst. Prof CSE
o NFA is defined in the same way as DFA but with the following two exceptions, it
contains multiple next states, and it contains ε transition.
In the following image, we can see that from state q0 for input a, there are two next states q1 and
q2, similarly, from q0 for input b, the next states are q0 and q1. Thus it is not fixed or determined
that with a particular input where to go next. Hence this FA is called non-deterministic finite
automata.
NFA also has five states same as DFA, but with different transition function, as shown
follows:
δ: Q x ∑ →2Q
where,
Example 1:
∑ = {0, 1}
q0 = {q0}
4
F = {q2}
Page
Solution:
Swarupa Arjya
Asst. Prof CSE
Transition diagram:
Transition Table:
→q0 q0, q1 q1
q1 q2 q0
*q2 q2 q1, q2
In the above diagram, we can see that when the current state is q0, on input 0, the next state will
be q0 or q1, and on 1 input the next state will be q1. When the current state is q1, on input 0 the
next state will be q2 and on 1 input, the next state will be q0. When the current state is q2, on 0
input the next state is q2, and on 1 input the next state will be q1 or q2.
Example 2:
Solution:
Transition Table:
→q0 q1 ε
q1 ε q2
*q2 q2 q2
5
Page
Swarupa Arjya
Asst. Prof CSE
DFA (Deterministic finite automata)
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the
computation. The finite automata are called deterministic finite automata if the machine is read
an input string one symbol at a time.
In DFA, there is only one path for specific input from the current state to the next state.
DFA does not accept the null move, i.e., the DFA cannot change state without any input
character.
DFA can contain multiple final states. It is used in Lexical Analysis in Compiler.
In the following diagram, we can see that from state q0 for input a, there is only one path which
is going to q1. Similarly, from q0, there is only one path for input b going to q2.
F: final state
δ: Transition function
δ: Q x ∑→Q
6
Page
Swarupa Arjya
Asst. Prof CSE