Finite Automata Formal Languages: Operations On Sentences
Operations on sentences:
Substring: if w = abb, then its substrings are: λ, a, b, ab, bb, abb
If u, v, and w are sentences, and w = uv, then u is a prefix of w, and v is a suffix of w
Deterministic Finite Automaton
• A string w ∈ Σ* is accepted by M if
δ*(q0, w) ∈ F
• The language accepted (or recognized) by M is the set of all strings on Σ that are
accepted by M
• Formally:
L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}
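The acceptance condition δ*(q0, w) ∈ F can be sketched directly: iterate the transition function over the string and check whether the final state is accepting. The machine below (states and alphabet chosen here for illustration, not from the text) accepts strings over {a, b} with an even number of a's.

```python
def dfa_accepts(delta, start, finals, w):
    """Compute the extended transition function delta*(start, w)
    and report whether the result lies in the set of final states."""
    q = start
    for symbol in w:
        q = delta[(q, symbol)]
    return q in finals

# Example DFA: accepts strings over {a, b} with an even number of a's.
delta = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd", "a"): "even",  ("odd", "b"): "odd",
}

print(dfa_accepts(delta, "even", {"even"}, "abab"))  # True: two a's
print(dfa_accepts(delta, "even", {"even"}, "ab"))    # False: one a
```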
(1) in an NFA, the range of δ is in the powerset of Q (instead of just Q), so that from the
current state, upon reading a symbol:
(a) more than one state might be the next state of the NFA, or
(b) no state may be defined as the next state of the NFA, and
(2) λ-moves are possible; that is, a transition from one state to another may occur without
reading a symbol from the input string.
DFA:
transition function examples
δ : Q × Σ → Q
δ(q0, a) = q1
NFA:
transition function examples
δ : Q × (Σ ∪ {λ}) → 2^Q
δ(q1, a) = {q1, q2}
δ(q1, b) = {}
δ(q1, λ) = {q2}
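The set-valued transition function, including λ-moves, can be simulated by tracking a set of current states and closing it under λ-transitions. The sketch below encodes the example transitions from the text, with None standing for λ.

```python
# delta : Q x (Sigma ∪ {λ}) -> 2^Q, mirroring the examples above:
#   delta(q1, a) = {q1, q2}, delta(q1, b) = {}, delta(q1, λ) = {q2}.
delta = {
    ("q1", "a"): {"q1", "q2"},
    ("q1", "b"): set(),
    ("q1", None): {"q2"},     # None stands for a λ-move
}

def lambda_closure(states):
    """All states reachable from `states` via λ-moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, None), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def step(states, symbol):
    """One NFA step: λ-closure, read one input symbol, λ-closure again."""
    nxt = set()
    for q in lambda_closure(states):
        nxt |= delta.get((q, symbol), set())
    return lambda_closure(nxt)

print(sorted(step({"q1"}, "a")))  # ['q1', 'q2']
print(sorted(step({"q1"}, "b")))  # []: no next state is defined
```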
Advantages of nondeterminism
• an NFA can be smaller, easier to construct and easier to understand than a DFA
that accepts the same language
• Validation
– checking that an input string is in valid format
– example 1: checking format of email address on
WWW entry form
– example 2: UNIX regex command
• Search and selection
– looking for strings that match a certain pattern
– example: UNIX grep command
• Tokenization
– converting sequence of characters (a string) into sequence of tokens (e.g.,
keywords, identifiers)
– used in lexical analysis phase of compiler
Theorem: The class of regular languages is closed under union, concatenation, and
Kleene’s star.
Proof:
Let L1 and L2 be regular languages over Σ.
Then there are regular expressions r1 and r2 corresponding to L1 and L2.
By the definition of regular expressions and regular languages, r1 + r2, r1·r2, and r1* are
regular expressions corresponding to L1 ∪ L2, L1·L2, and L1*.
Thus, the class of regular languages is closed under union, concatenation, and Kleene’s
star.
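The three closure operations can be mirrored with Python's re module: given patterns r1 and r2, the composed patterns below denote L1 ∪ L2, L1·L2, and L1* (the concrete patterns are chosen here for illustration).

```python
import re

r1, r2 = "ab", "ba"                 # example regular expressions

union  = f"(?:{r1})|(?:{r2})"       # denotes L1 ∪ L2
concat = f"(?:{r1})(?:{r2})"        # denotes L1·L2
star   = f"(?:{r1})*"               # denotes L1*

print(bool(re.fullmatch(union, "ba")))     # True
print(bool(re.fullmatch(concat, "abba")))  # True
print(bool(re.fullmatch(star, "ababab")))  # True
print(bool(re.fullmatch(star, "")))        # True: λ is in L1*
```

The non-capturing groups `(?:...)` play the role of the parentheses in the formal construction, keeping precedence correct when patterns are combined.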
Theorem: The class of regular languages is closed under complementation.
Proof:
Let L be any regular language. By definition there must be some DFA
M = (Q, Σ, δ, q0, F) with L(M) = L. Define a new DFA M' = (Q, Σ, δ, q0, Q − F).
This has the same transition function δ as M, but for any string x ∈ Σ* it accepts x if
and only if M rejects x. Thus L(M') is the complement of L. Because there is a DFA
for it, we conclude that the complement of L is regular.
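The construction M' = (Q, Σ, δ, q0, Q − F) amounts to swapping accepting and non-accepting states. A minimal sketch (the DFA itself is invented here for illustration):

```python
def dfa_accepts(delta, start, finals, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# M accepts strings over {a, b} with an even number of a's.
delta = {("e", "a"): "o", ("e", "b"): "e",
         ("o", "a"): "e", ("o", "b"): "o"}
Q, F = {"e", "o"}, {"e"}

complement_finals = Q - F      # M' = (Q, Σ, δ, q0, Q − F)

w = "ab"
print(dfa_accepts(delta, "e", F, w))                  # False: one 'a'
print(dfa_accepts(delta, "e", complement_finals, w))  # True: M' accepts what M rejects
```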
Theorem: The class of regular languages is closed under intersection.
Proof 1:
• Let L1 and L2 be any regular languages
• By definition there must be DFAs for them:
– M1 = (Q, Σ, δ1, q0, F1) with L(M1) = L1
– M2 = (R, Σ, δ2, r0, F2) with L(M2) = L2
• Define a new DFA M3 = (Q × R, Σ, δ, (q0, r0), F1 × F2)
• For δ, define it so that for all q ∈ Q, r ∈ R, and a ∈ Σ, we have δ((q, r), a) =
(δ1(q, a), δ2(r, a))
• M3 accepts if and only if both M1 and M2 accept
• So L(M3) = L1 ∩ L2, so that intersection is regular
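The product construction can be sketched by running both machines in lockstep on a pair of states. The two component DFAs below are invented for illustration: M1 accepts strings with an even number of a's, M2 accepts strings ending in b.

```python
# Component transition functions δ1 and δ2.
d1 = {("e", "a"): "o", ("e", "b"): "e",
      ("o", "a"): "e", ("o", "b"): "o"}
d2 = {("x", "a"): "x", ("x", "b"): "y",
      ("y", "a"): "x", ("y", "b"): "y"}

def product_accepts(w):
    q, r = "e", "x"                    # start state (q0, r0)
    for c in w:
        # δ((q, r), c) = (δ1(q, c), δ2(r, c))
        q, r = d1[(q, c)], d2[(r, c)]
    return q == "e" and r == "y"       # accept iff (q, r) ∈ F1 × F2

print(product_accepts("aab"))  # True: two a's, ends in b
print(product_accepts("ab"))   # False: only one a
```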
Proof 2:
De Morgan’s Law: L1 ∩ L2 = (L1ᶜ ∪ L2ᶜ)ᶜ, where Lᶜ denotes the complement of L.
L1 and L2 are regular.
So L1ᶜ and L2ᶜ are regular (closure under complementation).
So L1ᶜ ∪ L2ᶜ is regular (closure under union).
So (L1ᶜ ∪ L2ᶜ)ᶜ is regular (closure under complementation).
So L1 ∩ L2 is regular.
Closure Under Reversal :
Theorem: If L1 and L2 are any regular languages, the reversals of these languages are
also regular.
Homomorphism:
Inverse homomorphism:
Let h be a homomorphism from some alphabet Σ to strings over another alphabet T. Let L
be a regular language over T. Then h⁻¹(L) is the set of strings w such that h(w) is in L.
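A small sketch of membership in h⁻¹(L): apply h symbol by symbol and test whether the image lands in L. The homomorphism and the language below are chosen here for illustration.

```python
import re

h = {"0": "ab", "1": "ba"}            # h : {0, 1} -> {a, b}*

def h_of(w):
    """Apply the homomorphism symbol by symbol."""
    return "".join(h[c] for c in w)

L = re.compile("(ab)*")               # L = (ab)* over {a, b}

def in_h_inverse_L(w):
    """w ∈ h⁻¹(L) iff h(w) ∈ L."""
    return bool(L.fullmatch(h_of(w)))

print(in_h_inverse_L("00"))  # True: h(00) = abab ∈ (ab)*
print(in_h_inverse_L("01"))  # False: h(01) = abba
```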
Decision Properties:
– Assume
• #symbols = constant
• #states = n
– Conversion from an ε-NFA to a DFA --- requires O(n³·2ⁿ) time in the worst case
– Conversion from a DFA to an NFA --- requires O(n) time
– From an automaton (DFA) to an RE --- requires O(n³·4ⁿ) time
– From an RE to an automaton (ε-NFA) --- requires time linear in the size of the RE
– The table-filling algorithm requires O(n²) time, where n = #states
Context-free grammar
A context-free grammar (CFG) is a 4-tuple G = (V, T, S, P), where V and T are
disjoint sets, S ∈ V, and P is a finite set of rules of the form A → x, where A ∈ V and x
∈ (V ∪ T)*.
V = nonterminals or variables
T = terminals
S = start symbol
P = productions or grammar rules
Derivation:
In a leftmost derivation, the leftmost nonterminal is replaced at each step.
In a rightmost derivation, the rightmost nonterminal is replaced at each step.
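A leftmost derivation can be sketched mechanically: scan the sentential form for the leftmost nonterminal and replace it by the right-hand side of a chosen rule. The toy grammar below (S → aSb | λ, generating {aⁿbⁿ}) is an assumption for illustration, not from the text.

```python
rules = {"S": ["aSb", ""]}   # "" plays the role of λ

def leftmost_step(sentential, choice):
    """Replace the leftmost nonterminal using rule number `choice`."""
    for i, sym in enumerate(sentential):
        if sym in rules:                  # nonterminals are the keys of `rules`
            return sentential[:i] + rules[sym][choice] + sentential[i + 1:]
    return sentential                     # no nonterminal left: derivation done

s = "S"
for choice in (0, 0, 1):                  # S => aSb => aaSbb => aabb
    s = leftmost_step(s, choice)
print(s)  # aabb
```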
Parse Tree
A partial derivation tree is one in which property 1 does not necessarily hold and in
which property 2 is replaced by:
2a. Every leaf has a label from V ∪ T ∪ {λ}
The yield of the tree is the string of symbols in the order they are encountered when the
tree is traversed in a depth-first manner, always taking the leftmost unexplored branch.
Ambiguity
A grammar is ambiguous if it can generate some string with two or more distinct parse trees.
A string has more than one parse tree if and only if it has more than one leftmost
derivation.
Ambiguous grammar
Unambiguous grammar
<st1> := IF <expression> THEN <st1> ELSE <st1> |
<otherstatement>
<st2> := IF <expression> THEN <statement> |
IF <expression> THEN <st1> ELSE <st2>
λ -productions
Unit Production
Useless Productions
Chomsky Normal Form (CNF): every production has the form A → BC or A → a
Greibach Normal Form (GNF)
Pushdown Automata
Instantaneous description
This means that we start in the start state, with z on the stack, and after processing the
string w, we end up in an accepting state, with no more characters left to process in the
original string. We don’t care what is left on the stack.
This is called acceptance by final state.
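The stack discipline behind acceptance can be sketched without the full PDA formalism. The simplified checker below (an illustration, not a formal PDA) recognizes {aⁿbⁿ : n ≥ 0}: it pushes a marker for each a and pops one for each b, crashing when no move is defined.

```python
def pda_accepts(w):
    stack = ["z"]                 # z is the initial stack symbol
    phase = "push"
    for c in w:
        if c == "a" and phase == "push":
            stack.append("A")     # push a marker for each 'a'
        elif c == "b" and stack[-1] == "A":
            phase = "pop"         # once popping starts, no more pushes
            stack.pop()
        else:
            return False          # crash: no transition defined
    return stack == ["z"]         # every pushed A was matched by a b

print(pda_accepts("aaabbb"))  # True
print(pda_accepts("aab"))     # False: unmatched a remains
```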
Proof by construction:
Let
G1 = (V1, T1, S1, P1) and
G2 = (V2, T2, S2, P2)
with
L1 = L(G1) and
L2 = L(G2)
Union
Construction completed.
Concatenation
Construction completed.
Kleene Closure
Construction completed.
A Finite Automaton has only its states to serve as memory - very limited in what
languages it could recognize.
In a Pushdown Automaton we added a stack to a Finite Automaton - this gave it better
memory, and allowed it to recognize more languages. But it was still limited by the
nature of the stack.
We could add another stack to a Pushdown Automaton. This would allow it to accept
some non-context-free languages (e.g., aⁿbⁿcⁿ).
Or we could change the stack to a queue. This would allow other languages to be
accepted (e.g., ww).
Definition
A Turing machine (TM) is a 7-tuple
T = (Q, Σ, Γ, δ, q0, Δ, F), where
Q = a finite set of states not containing h (the halt state)
Σ = a finite set of symbols constituting the input alphabet;
Σ ⊆ Γ − {Δ}
Γ = a finite set of symbols constituting the tape alphabet
δ = the transition function: δ : Q × Γ → Q × Γ × {L, R}
q0 = the start state (q0 ∈ Q)
Δ is a special symbol, the blank symbol
F ⊆ Q is the set of final states
Turing machines can be thought of as a finite state automaton with slightly different
labels on the arcs, plus a tape as auxiliary memory, and a tape head (or read-write head).
Example tape contents (blanks surrounding the input 1001):
Δ 1 0 0 1 Δ Δ
The Turing machine has a tape head which reads from and writes to the tape during each
move. The tape head can move right or left, or may stay in the same position. The tape
head can read a character from the cell under which it is positioned, or it may write a
character to the cell.
The standard Turing machine described by our textbook has several features:
The TM has a tape that is unbounded in both directions.
The TM is deterministic in the sense that δ defines at most one move per configuration
There is no special input file; the input is copied onto the tape.
There is no special output device; the tape contains the output, if any.
A TM that accepts {a, b}* {aba} {a, b}*
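One way such a TM can work (the transition table below is a sketch written from scratch, not the textbook's diagram): scan right, tracking in the state how much of the pattern "aba" has been matched, and halt in the accepting state once the whole pattern is seen.

```python
BLANK = "Δ"

# State meaning: q0 = nothing matched, q1 = saw "a", q2 = saw "ab".
delta = {
    ("q0", "a"): ("q1", "a", "R"),
    ("q0", "b"): ("q0", "b", "R"),
    ("q1", "a"): ("q1", "a", "R"),
    ("q1", "b"): ("q2", "b", "R"),
    ("q2", "a"): ("qacc", "a", "R"),
    ("q2", "b"): ("q0", "b", "R"),
}

def tm_accepts(w, max_steps=1000):
    tape = dict(enumerate(w))          # sparse tape; missing cells are blank
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == "qacc":
            return True
        sym = tape.get(head, BLANK)
        if (state, sym) not in delta:
            return False               # crash: reject (e.g. blank reached)
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return False

print(tm_accepts("bbabab"))  # True: contains "aba"
print(tm_accepts("bbaabb"))  # False
```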
A TM that accepts strings of L = {ss : s ∈ {a, b}*}:
Variations on TMs
Multitape TM
• Consider a TM with k tapes and a separate head for each tape. It reads and writes
on these tapes in parallel.
• We can show that this does not increase the computational power of a TM by
showing that any multi-tape TM can be simulated by a standard, single-tape TM.
• The details of the simulation involve dividing a single tape into multiple tracks --
using an alphabet consisting of tuples, with one element for each track.
Decidability
Enumerability
– If a language is decidable, then its complement is decidable.
Remember that the strings that a TM accepts constitute the language of the Turing
machine. We represent this as L(T).
A Turing machine always accepts the words of its language by stopping in the halting
state.
However, it is allowed to reject strings that don’t belong to its language either by
crashing (in a finite number of steps), or by looping forever.
Infinite loops are bad for us, because if we are trying to decide whether a string belongs
to the language of a TM or not, we can’t tell after waiting a finite amount of time whether
the TM is going to halt on the very next step, or whether it is going to go on forever. We
would prefer to have our TMs crash to reject a string.
It turns out that these distinctions exactly correspond to the last two major classes of
languages that we want to discuss in this course:
Recursively enumerable = accepted by a TM that may loop (or may crash) to
reject
Recursive = accepted by a TM that always crashes to reject
Definition
A language L is recursively enumerable if there is a TM that accepts L.
A language L is recursive if there is a TM that accepts L and halts on every input; this
means that a language is recursive iff there exists a membership algorithm for it.
If some TM accepts L but no membership algorithm exists, the language is recursively
enumerable but not recursive.
The last machine model of computation which we shall examine is the linear bounded
automaton or lba. These were originally developed as models for actual computers rather
than models for the computational process. They have become important in the theory of
computation even though they have not emerged in applications to the extent which
pushdown automata enjoy.
They do not have unending amounts of storage like Turing machines. Thus any actual
computation done on a computer is not as extensive as that which could be completed on
a Turing machine.
Definition. A linear bounded automaton (lba) is a multi-track Turing machine which has
only one tape, and this tape is exactly the same length as the input.
We allow the computing device to use just the storage it was given at the beginning of its
computation. As a safety feature, we shall employ endmarkers (* on the left and # on the
right) on our lba tapes and never allow the machine to go past them. This will ensure that
the storage bounds are maintained and help keep our machines from leaving their tapes.
Let's have linear bounded automata accept just like Turing machines. Thus for lba
halting means accepting. For these new machines computation is restricted to an area
bounded by a constant (the number of tracks) times the length of the input.
Halting problem
The determination of whether a Turing machine will come to a halt given a particular
input program. The halting problem is solvable for machines with fewer than four states.
However, the four-state case is open, and the five-state case is almost certainly
unsolvable, due to the fact that it includes machines iterating Collatz-like congruential
functions, and such specific problems are currently open. The problem of whether a
general Turing machine halts is undecidable, as first proved by Turing, who showed that
the answer is no. Since many questions can be recast as this problem, we can conclude
that some desired programs are impossible to write, although heuristic or partial solutions
are possible.
In this abstract framework, there are no resource limitations of memory or time on the
program's execution; it can take arbitrarily long, and use arbitrarily much storage space,
before halting. The question is simply whether the given program will ever halt on a
particular input.
For example, a program that enters an unconditional infinite loop does not halt; rather, it
goes on forever. On the other hand, a program with no loops at all always halts.
The halting problem is famous because it was one of the first problems proven
algorithmically undecidable. This means there is no algorithm which can be applied to
any arbitrary program and input to decide whether the program stops when run with that
input.
Representing the halting problem as a set
The conventional representation of decision problems is the set of objects possessing the
property in question. The halting set
K := { (i, x) : program i will eventually halt if run with input x }
represents the halting problem.
This set is recursively enumerable, which means there is a computable function that lists
all of the pairs (i,x) it contains. However, the complement of this set is not recursively
enumerable.
There are many equivalent formulations of the halting problem; any set whose Turing
degree equals that of the halting problem is such a formulation. Examples of such sets
include:
{ i : there is an input x such that program i eventually halts when run with input x }.
Post Correspondence Problem (PCP): given two lists of strings α1, …, αn and β1, …, βn
over the same alphabet, a solution is a nonempty sequence of indices i1, …, ik such that
αi1 ⋯ αik = βi1 ⋯ βik. The decision problem then is to decide whether such a solution
exists or not.
Example 1
i    αi     βi
1    a      baa
2    ab     aa
3    bba    bb
However, if the two lists had consisted of only α2,α3 and β2,β3, then there would have
been no solution (because then no matching pair would have the same last letter, as must
occur at the end of a solution). A convenient way to view an instance of a Post
correspondence problem is as a collection of blocks of the form
αi
βi
there being an unlimited supply of each type of block. Thus the above example is viewed
as

a      ab     bba
baa    aa     bb

where the solver has an endless supply of each of these three block types. A solution
corresponds to some way of laying blocks next to each other so that the string in the top
cells corresponds to the string in the bottom cells. Then the solution to the above example
corresponds to the index sequence (3, 2, 3, 1):

bba    ab    bba    a
bb     aa    bb     baa
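Although PCP is undecidable in general, short solutions of a particular instance can be found by exhaustive search up to a fixed sequence length. The sketch below searches the instance above (α = a, ab, bba; β = baa, aa, bb).

```python
from itertools import product

A = ["a", "ab", "bba"]    # the α list
B = ["baa", "aa", "bb"]   # the β list

def find_solution(max_len=5):
    """Try every index sequence up to max_len; return the first match
    (1-based indices), or None. This is only a bounded search: no bound
    works for every PCP instance, since the problem is undecidable."""
    for k in range(1, max_len + 1):
        for seq in product(range(len(A)), repeat=k):
            top = "".join(A[i] for i in seq)
            bottom = "".join(B[i] for i in seq)
            if top == bottom:
                return [i + 1 for i in seq]
    return None

print(find_solution())  # [3, 2, 3, 1]: bba+ab+bba+a = bb+aa+bb+baa
```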
Again using blocks to represent an instance of the problem, the following is an example
that has infinitely many solutions in addition to the kind obtained by merely "repeating" a
solution. For readability, just the block number is shown below each block:
bb    ab    c
b     ba    bc
1     2     3

In this instance, every sequence of the form (1, 2, 2, ..., 2, 3) is a solution (in addition to
all their repetitions):

bb    ab    ab    ...    ab    c
b     ba    ba    ...    ba    bc
1     2     2     ...    2     3