Nothing Special   »   [go: up one dir, main page]

0% found this document useful (0 votes)
11 views24 pages

DSA Chap 1 PDF

Download as pdf or txt
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 24

Subject : Data Structure & Algorithms

Class : SYBCA – Semester IV


Lecturer : Ms. Archana R. Oza
Unit I : Introduction and Overview
 Introduction
 Basic Terminology , Elementary Data Organization
 Data Structure Operations
 Concept and Notations of Algorithm
 Algorithm Complexity , Time Space Trade Off
 Introduction to Array
 Representation of linear array in memory
 Traversing Linear Array
 Inserting and Deleting
 Searching methods ( Binary and Linear Search )

 Data Structure :
A data structure is a particular way of organizing data in a computer so that it can be
used efficiently. It is a way of collecting and organizing data in such a way that we can
perform operations on that data in an effective way.
For example, we can store a list of items having the same data-type using the array data
structure.
It represents the knowledge of data to be organized in memory. It should be designed &
implemented in such a way that it reduces the complexity and increases the efficiency.
Data structures are often classified by their characteristics. Some of those characteristics
are:

 Linear or non-linear: This characteristic describes whether the data items are
arranged in chronological sequence, such as with an array, or in an unordered
sequence, such as with a graph.

 Homogeneous or non-homogeneous: This characteristic describes whether all


data items in a given repository are of the same type or of various types.

 Static or dynamic: This characteristic describes how the data structures are
compiled. Static data structures have fixed sizes, structures and memory locations
1
at compile time. Dynamic data structures have sizes, structures and memory
locations that can shrink or expand depending on the use.

 Basic Terminologies :
Explaining or defining the basic terms used in data structures is called basic terminology.
The basic terms are described in brief as below :
1) Data : Data can be defined as value or collection of values, facts or figures. Data is
a raw and unorganized fact that is required to be processed to make it
meaningful.
For e.g.Marks of a student can be data.

2) Information :When data is processed, organized, structured or presented in


a given context so as to make it useful, it is called information. In short, Processed
data is Information.
For e.g. Average Marks of the whole class is information.

3) Data Item :A Data Item refers to a single unit of values. A data item that does not
have subordinate data items i.e. that cannot be further sub-divided is called an
elementary item.
For e.g. Roll Number, Age etc.

4) Group Item :A data item that is composed of one or more subordinate data
items is called a group item, i.e. data item that can be further sub-divided is called
a group item.
For e.g. Name – First_Name, Middle_name, Last_name.

5) Entity :An entity is a real-world object that has certain attributes or properties
which may be assigned values.
For e.g. Student, Employee, Chalk, etc.

6) Entity-set :Entities with similar attributes form an entity set.


For e.g.All students in a class.

7) Field :A field is a single unit of information representing an attribute of an entity.


The term fields refers to columns in a table.
For e.g. Roll_no, Name, Class,etc.

2
8) Record :A record is a collection of field values of a given entity i.e. A single row in a
table is called a record. It is also called a complete set of information of an entity.
For e.g. A student’s Roll_no, Name, Class.

9) File :File is the collection of records of the entities in a given entity set.A collection
of data or information that has a name, called the filename.

10) Primary Key :A primary key is a field in a table which uniquely identifies each
row/record in a database table. Primary keys must contain unique values.

For e.g.Roll_No,etc.

3
 Elementary Data Organization :
Organizing each data element at proper memory locations for efficient storage and
retrieval is called Elementary Data Organization. To organize data in an efficient way,
some data structures are used.
The different types of data structures used in a computer system are :

Fig. Types of Data Structure


Data Structures are divided into two types as Primitive and Non-Primitive, as :

1) Primitive Data Structures :


 The data structures for which a language has built-in support are known as
Built-in Data types or Primitive data structures.
 Primitive data are only single values.
 For e.g. Integer, Float values, Boolean, Character, Pointer.
 The size and type of variable values are pre-specified, and it has no additional
methods.

2) Non-Primitive Data Structures :


 The data structures that are derived from primary data types are known as non-
primitive.
 The non-primitive data structures are used to store the group of values. Non-
Primitive Data Structures are usually multiple values with a particular structure.
 For e.g. Array, structure, union, link list, stacks, queue etc.

4
 The size and type of variable values are not pre-specified, and it has additional
methods to perform certain operations.
 Non-Primitive Data Structures are divided into two categories as Linear and
Non-Linear Data Structures, as :

i. Linear Data Structures :Linear data structure arranges the data into a
sequence and follow some sort of order.
In a linear data structure, data elements are arranged in a linear order
where each member element is connected to its previous and next
element.
Such data structures are easy to implement as computer memory is also
sequential. In a linear data structure, memory is not utilized in an efficient
way.
Linear Data Structures are further sub-divided into two types as Static
and Dynamic, as :

a) Static :A static data structure is an organization or collection


of data in memory that is fixed in size. Memory size once allocated
cannot be reallocated later during run-time.
Static data structures are ideal for storing a fixed number of data
items.
For e.g. Arrays.

b) Dynamic :A dynamic data structure refers to an organization or


collection of data in memory that has the flexibility to grow or
shrink in size, enabling a programmer to control exactly how much
memory is utilized.
For e.g. Linked List, Stack, Queue.

ii. Non-Linear Data Structures :A non-linear data structure does not


have a fixed sequence to connect all its elements.
Each element can have multiple paths to connect to other
elements. Non-linear data structures uses memory very efficiently.
Non-linear data structures are not easy to traverse and needs multiple
runs to be traversed completely.
In non-linear data structure, data elements can be hierarchically
connected or are present at various levels.
For e.g. Tree, Graph.
5
Let us describe all types of data structures in brief as below :

1. Array :An array is a collection of items stored at contiguous memory


locations.Array is a container which can hold a fix number of items and these items
should be of the same type.
The basic terms used in array are :
 Element : Every item stored in an array is termed as an element.
 Index : Each memory location of an element in an array is denoted by a
numerical index which is used for identifying the element.
An array can be declared as :

Graphicalrepresentation :

2. Linked List :A linked list is a linear data structure, in which the elements are not
stored at contiguous memory locations. The elements in a linked list are linked
using pointers.
Linked list can be visualized as a chain of nodes, where every node points to the
next node
Each node is divided into two parts :
First part contains the data element i.e. Information/Data field and second part
contains the address of next node i.e. Next Pointer Field. The node that contains
the Null Pointer indicates the end of list, as :

6
The entry point into a linked list is called the HEAD node or START node of the list.
It is the reference (pointer) to the first node.

3. Stack :Stack is an ordered list of similar data type. It is a linear list in which
Insertion and Deletion takes place only at one end, called TOP.
Stack is also called as LIFO [Last In First Out] or FILO [First In Last Out] data
structure.
For e.g. Stack of Dishes, Books, Cards, etc.
It is a simple data structure that allows adding and removing elements in a
particular order. Every time an element is added, it goes on the top of the stack
and the only element that can be removed is the element that is at the top of the
stack, just like a pile of objects.

Two pre-defined functions are used to insert and remove items from the stack, as:
push() is used to insert a new element in the stack, pop() is used to delete an
existing element from the list.
The simplest application of a stack is to reverse a word. You push a given word to
stack - letter by letter - and then pop letters from the stack.

4. QUEUE :A queue is a linear collection of objects that are inserted and removed
according to the First In First Out (FIFO) principle.
7
Queue is a linear data structure where the first element is inserted from one end
called REAR and deleted from the other end called FRONT.
According to its FIFO structure, element inserted first will also be removed first.
The pre-defined functions to add an element into queue is called enqueue() and to
remove an element from queue is called dequeue().

Front pointer is used to get the first data item from a queue.Rear pointer is used
to get the last item from a queue.

5. Tree :A tree is a nonlinear hierarchical data structure that consists of nodes


connected by edges.Tree is a collection of elements called Nodes, where each
node can have single or multiple number of children.
The topmost node in a tree is called the root node. It does not have a parent.It
represents the parent & child
nodes connected by edges.

The above figure represents structure of a tree. This tree has 2 subtrees, as :
A is a parent of B and C.
B is called a child of A and also parent of D, E, F.

8
6. GRAPH :A Graph is a non-linear data structure consisting of nodes and edges. A
graph G can be defined as an ordered set G(V, E) where, G = Graph, V = set of
Vertices[Nodes] & E = set of Edges[To connect nodes].

In the above graph,


V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
Graphs are used to solve many real-life problems. Graphs are used to represent
networks, as telephone network or circuit network. Graphs are also used in social
networks like linkedIn, Facebook, etc.

9
 Data Structure Operations :
The data in the data structures are processed by certain operations. The basic operations
that are performed on data structures are as follows:

1. Traversing :Traversing is a process where each& every element of a particular


data structure is accessed atleast once for processing.
Accessing means visiting every element at least once, just to display them to the
user or perform an operation on all the elements.
For e.g.Printing all the array elements one by one.
Traversal is the most basic of the operations that can be performed on any data
structures.

2. Searching :Searching in data structure refers to the process of finding location


LOC of an element in a list. Searching is an operation or a technique that helps
finds the place of a given element or value in the list. Any search is said to be
successful or unsuccessful depending upon whether the element that is being
searched is found or not. Searching can be Linear or binary.

3. Inserting :Inserting is the process to insert one or more data elements into adata
structure. Based on the requirement, new element can be added at the beginning,
end or any given index.

4. Deleting :Deleting refers to removing an existing element from adata structure


and re-organizing all the elements accordingly.

5. Sorting :Sorting refers to the operation or technique of arranging and


rearranging sets of data in some specific order. Sorting is performed according to
some key value of each record. Sorting is one of the most
important operations performed by computers.

6. Merging :The process of combining elements of two data structures is


called merging. Merging operation is used to combine the data items of two
sorted files into a single file in sorted order.

10
 Concept and Notations of Algorithm :
Algorithm is a list of well defined finite steps to solve a particular problem.
Following are some of the conventions used in algorithms :
1.) Identifying the number :
Each algorithm is assigned a number such as ‘Algorithm 1.1’ that
indicates first algorithm from chapter 1.

2.) Steps, Control and Exit :


The steps of algorithm are executed one after the other. Control may
be transferred with the help of ‘Goto’ statement. The algorithm is
terminated when the ‘Exit’ statement is encountered.
3.) Comments :
Each step may contain a comment in brackets which indicates the
purpose of the step.

4.) Variable Names :


Variable names will use capital letters such as MAX, LOC.

5.) Assignment Statements :


Assignment statements will use ‘dot-equals’ notation.
E.g.: MAX:=DATA[1] assigns the value of DATA[1] to the variable MAX.

6.) Input and Output :


Data may be input and is assigned to the variables using ‘read’ statement as
read : Variable names
Messages and data in variables may be output using ‘write’ statement as
write messages / Variable name

11
 Algorithm Complexity :
Complexity of an algorithm is a measure of the amount of time and/or space required by
an algorithm for an input of a given size (n).
Algorithm complexity is a rough approximation of the number of steps, which will be
executed depending on the size of the input data. Complexity gives the order of steps
count, not their exact count.
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
 Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
 Space Factor − Space is measured by counting the maximum memory space
required by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data.
1. Time Complexity : Time complexity of an algorithm represents the amount of
time required by the algorithm to run to completion. Time requirements can be
defined as a numerical function T(n), where T(n) can be measured as the number of
steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the
total computational time is T(n) = c ∗ n, where c is the time taken for the addition of
two bits. Here, we observe that T(n) grows linearly as the input size increases.
2. Space Complexity : Space complexity of an algorithm represents the amount of
memory space required by the algorithm in its life cycle. The space required by an
algorithm is equal to the sum of the following two components −
 A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and
constants used, program size, etc.
 A variable part is a space required by variables, whose size depends on the size of
the problem. For example, dynamic memory allocation, recursion stack space,
etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the
fixed part and S(I) is the variable part of the algorithm, which depends on instance
characteristic I. Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
12
Step 2 - C← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. Now,
space depends on data types of given variables and constant types and it will be
multiplied accordingly.

 Time Space Trade Off :


An algorithm is a well defined list of steps for solving a particular problem. The
time and space it uses are the tow measures of efficiency of an algorithm. We may not
always be able to use the most efficient algorithm because sometimes the choice of the
data structure involves time space trade off, i.e., By increasing the amount of space for
storing the data, one may be able to reduce the time needed for processing the data or
vice versa.
E.g. : Suppose a file contains fields : Name, Social Security Number (SSN) and Extra data.
Consider the file is sorted according to the name and the problem is to find record of a
particular person if SSN is given. In such a situation, one has to perform linear search, bu
it is very time consuming.
Instead of that, if one creates two files, one containing Name and Pointer field
which is sorted according to the Name and the other containing SSN, Name and Extra
data that is sorted according to the SSN.
Finding a particular record with given information becomes easy and required time
is also very less because we can use binary search. But, the extra space for storing two
columns is necessary. This is called trading off space for time i.e., by increasing the
amount of space, required time gets reduced.

13
 Linear Array and its representation in Memory :
A linear array, is a list of finite homogeneous data elements such that –
i. Elements of the array are referenced by consecutive index numbers.
ii. Elements of the array stored respectively at continuous memory locations.
The total number of elements in the array is the length of an array. It can be calculated
using the formula :
Length = UB – LB + 1
Where, UB = Upper Bound(Largest index number of array) & LB = Lower Bound(Smallest
index number of array).
An array can be declared as :
int x[5] = {10,20,30,40,50}; (Declaration & Initialization)
OR
int x[5]; (Declaration)

Such that
X [0] = 10 X [1] = 20 X [2] = 30 X [3] = 40 X [4] = 50

For the given array, Length can be calculated as :


Length = UB – LB + 1
Length(x) = 4 – 0 + 1
Length(x) = 5

14
- Linear Array representation in Memory

Let ‘LA’ be the linear array in the memory of a computer as shown below :

LA [0] LA [1] LA[2] LA[3] LA[4] LA[5] LA[6]


100 200 300 400 500 600 700
1000 1002 1004 1006 1008 1010 1012

The elements of ‘LA’ are stored at successive memory cells. A Computer does not keep
track of every element in the array, but needs to keep track of address of only first
element called the base address denoted by BASE(LA).
By using this base address, computer calculates the address of any element of array
using the following formula :

LOC(LA(K)) = BASE(LA) + W (K - LB)


Where , LOC(LA(K)) = The location of Kth element of LA
BASE(LA) = The base address of LA
W = The number of bytes taken by one element
K = The Kth element
LB = The Lower Bound

Suppose we want to find out LOC ( LA [3] ),


We have BASE(LA) = 1000, W = 2 Bytes ( Because integer takes 2 bytes in memory ),
K = 3 , LB = 0
Thus,
LOC( LA ( K ) ) = BASE( LA ) + W ( K – LB )

LOC( LA ( 3 ) ) = 1000 + 2 ( 3 – 0 )
= 1000 + 2 ( 3 )
= 1000 + 6
= 1006
Therefore, LOC( LA ( 3 ) ) = 1006

15
 Traversing in Linear Array :

Traversing refers to visiting or processing each element at least once. Traversal is done
by starting with the first element of the array and reaching to the last.
For e.g. To print all the elements of an array, To count the number of elements of the
array.
Algorithm : Traverse LA(LA, LB, UB)
Description : Here, Consider ‘LA’ is the Linear array with Lower Bound ‘LB’ and Upper
Bound ‘UB’. This algorithm traverses array LA applying an operation ‘PROCESS’ to each
element of ‘LA’ .

Step 1 : [Initialize Counter] Set k := LB


Step 2 : Repeat Steps 3 & 4 while k<=UB
Step 3 : [Visit Element] Apply PROCESS to LA[k]
Step 4 : [Increase Counter] Set k := k+1
Step 5 : Exit
We also can use alternative form of algorithm which uses a repeat-for loop instead of
the repeat-while loop.

OR
Step 1 : Repeat for K= LB to UB
Step 2 : Apply PROCESS to LA[K]
[End of Loop].
Step 3 : Exit

16
Write a program to traverse & print all the elements of a linear array.
#include<stdio.h>
#include<conio.h>
void main()
{
int LA[5];
clrscr();
for(int k=0;k<=4;k++)
{
printf(“%d”,LA[k]);
}
getch();
}

 Insertion in Linear Array :


Insertion is the operation in which a new value is added at a particular place in an array.
Inserting element into the array is the process of adding an element to the existing
elements of the array.
Insertion in Linear Array involves three cases, as :
1. At the beginning
2. At a given location
3. At the end
Insertion involves movement of all the existing elements one position backwards &
empty the given location to insert new element.
Algorithm : InsertLA (DATA, N, ITEM, LOC)
Description : This algorithm inserts new element ITEM in linear array DATA with N
elements.
When LOC=1/0 it means the element has to be inserted at the beginning.
When LOC = J, it means the elements have to be inserted at J-th (given) Location.
When LOC =N+1 it means the element have to be inserted at the end.

17
Step 1 : [Initialize counter k with index of last element] Set k :=N-1
Step 2 : While k>=LOC repeat steps 3 and 4
Step 3 : [Move the current element one position backwards]
DATA[k+1] :=DATA[k]
Step 4 : [Decrement counter k] Set k :=k-1
Step 5 : [Insert new element at the Location] DATA[LOC] :=ITEM
Step 6 : [Update total number of array elements] Set N:=N+1

For e.g. Consider an array with 5 elements.

18
 Deletion in Linear Array :
Deletion refers to removing an existing element from the array and re-organizing all
elements of an array.
Deletion in Linear Array involves three cases, as :
1. At the beginning
2. At a given location
3. At the end
Delete the element and shift the rest of the elements to left by one position.
Algorithm : DeleteLA (DATA, N, ITEM, LOC)
Description : This algorithm deletes an element at given position in a linear array DATA
with N elements and stores in ITEM.
When LOC=1/0 it means the element to be deleted is at the beginning
When LOC =N it means the element to be deleted is at the end
When LOC = N-1 it means the element to be deleted is at J-th (given) location.

Step 1 : [Initialize counter k with index of element to be deleted] Set k:=LOC


Step 2 : [Store the element to be deleted in ITEM]
ITEM:=DATA[LOC]
Step 3 : While k<N repeat steps 4 and 5
Step 4 : [Move the current element one position forward]
DATA[k]=DATA[k+1]
Step 5 : [Increment counter k] Set k:=k+1
Step 6 : [Update total number of array elements] Set N:=N-1

19
 Searching Methods - Linear & Binary Search :
Searching in data structure refers to the process of finding location LOC of an element in
a list.
It is the process of finding a given value position in a list of values.
Any search is said to be successful or unsuccessful depending upon whether the element
that is being searched is found or not.
To search an element in a given array, two popular algorithms are available:

1. Linear Search [Sequential Search] :


Linear search is the simplest searching algorithm that searches for an element in a list in
sequential order. We start at one end and check every element until the desired element
is not found.
It compares the element to be searched with all the elements present in the array. When
the element is matched successfully, it returns the index of the element in the array, else
it return -1 or NULL.
It can be applied on both Sorted or Unsorted array, yet highly used for unsorted array.
The time complexity of the linear search is O(N) because each element in an array is
compared only once.
20
For e.g. : Consider x=20

If we need to search the element 20, it will go step by step in a sequential order, thus it is
also called sequential search.

Algorithm : LinearSearch (DATA, LB, UB, ITEM)


Description : This algorithm traverses & searches for the given element ITEM in the
linear array DATA with Lower Bound LB & Upper Bound UB.
Step 1 : [Initialize Counter k to the first index] Set k:=LB
Step 2 : Repeat Step 3 While k<=UB
Step 3 : [Check if the ITEM is equal with the array element]
IF DATA[k] = ITEM THEN
Set LOC := k
ELSE
Set k := k+1
Step 4 : If LOC != k then Set LOC :=NULL
Step 5 : Exit

2. Binary Search In Linear Array :


When the given data structure has large no. of data elements, the efficiency of linear
search decreases. Thus, another searching technique can be used as Binary Search.
Binary search follows divide and conquer approach in which, the list is divided into two
halves. The element to be searched may be either in the first half or the second half. The
algorithm goes on dividing each part into its two sub-parts for searching.
Due to this, the no. of comparisons to be performed are automatically reduced. For
binary search the data structure should be in the sorted form.

21
In Binary Search, the Smallest index is called BEGINNING and the Largest index is called
END, while the MIDDLE element index is called MID.
To divide the array into two halves, i.e. to get the middle index value of array, a formula
is used as :
MID = INT((BEG + END)/2)
 IF the ITEM to be searched is equal to the value of MID, LOC is set to MID, as :
If ITEM=DATA[MID] then
LOC=MID

 IF the ITEM to be searched is greater than the value of MID, Set BEG = MID + 1, as :
If ITEM>DATA[MID] then
BEG = MID + 1

 IF the ITEM to be searched is less than the value of MID, Set END=MID – 1, as :
If ITEM<DATA[MID] then
END=MID – 1
For e.g. Let DATA be the Linear Array with 10 elements.
DATA[10] = { 11,22,30,33,40,44,50,60,66,77 }
Find LOC of 44 using Binary Search?
DATA[10] = { 11,22,30,33,40,44,50,60,66,77 }
Solution : Here ITEM = 44,
Set BEG=1, End=10
MID = INT((BEG + END)/2)
= INT((1+10)/2)
= INT(11/2)
= INT(5.5)
MID = 5, thus DATA[5] = 40
Since ITEM > DATA[MID]
i.e. 44 > 40
Thus, BEG = MID + 1
= 5+1
BEG = 6
22
Now,
MID = INT((BEG + END)/2)
= INT(( 6+10 )/2)
= INT (16 / 2)
= INT(8)
MID = 8, thus DATA[8] = 60
Since ITEM < DATA[MID]
i.e. 44 < 60
Thus, END = MID - 1
= 8-1
END = 7

Now,
MID = INT((BEG + END)/2)
= INT(( 6+7 )/2)
= INT (13 / 2)
= INT(6.5)
MID = 6, thus DATA[6] = 44
Since ITEM = DATA[MID]
i.e. 44 = 44
Thus, LOC = MID
LOC = 6
The search is successful.

23
Algorithm : BinarySearch (DATA, LB, UB, ITEM, BEG, END, MID)
Description : This algorithm uses binary search method to search a given element ITEM
in the sorted linear array DATA by dividing the array into two parts until the element is
found.

Here, DATA is sorted array with lower bound LB & upper bound UB. The variables BEG,
END & MID denotes the Beginning, End & Middle locations of the array.
This algorithm finds the location of LOC or sets LOC=NULL if the search is unsuccessful.

Step 1 : [Initialize Segment Variables] Set BEG:=LB, END:=UB &

MID:=INT((BEG+END)/2)

Step 2 : Repeat Steps 3 & 4 while BEG<=END & DATA[MID]!=ITEM


Step 3 : If ITEM < DATA[MID] then
Set END := MID-1
Else
Set BEG := MID+1
Step 4 : Set MID:=INT((BEG+END)/2)
Step 5 : If DATA[MID]=ITEM then
Set LOC:=MID [Search Successful]
Else
Set LOC:=NULL [Search Unsuccessful]

24

You might also like