Nothing Special   »   [go: up one dir, main page]

Unit 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

DATA STRUCTURE

UNIT-1
Basic Terminology used in Data Structures:

 Data: Data refers to a value or set of values. Data item refers to a single or group of values
within the data. e.g. Marks obtained by the students.
 Entity: An entity is a thing that has some properties which can take values.
 Field: A field is part of a record and contains a single piece of data for the subject of the
record
 Record: A record is a collection of fields, possibly of different data types, typically in fixed
number and sequence.
 File:A file is a collection of data stored on mass storage
 Data Organization:
Each record in a file may contain many field items, but the value in a certain field may uniquely
determine the record in the file. Such a field K is called a Primary Key, and the values k1,k2…. In
such a field are called keys or key values.

Data structure def:


 “The logical or mathematical model of a particular organization of data & the relationship exists
between individual data items is called Data Structure”

 Data Structure is a way to organize the data in some way so that we can do certain operations(
insert, delete, search, sort) on these data in an effective way.

 Data Structures are the building blocks of a program.

Algorithm + Data Structure = Program

Example of Data structure:-

 Array, Structure, Stack, Queue, Tree, Graph, Linked List.

The choice of a Data Structure depends on two factors:-

1.It must reflect the actual relationship of data in the real world.

2.It should be simple enough to effectively process data.

Categories of Data Structure


We can classify Data Structure into many categories they are:-

 Homogeneous and Non homogenous d.s.

 Linear and Nonlinear d.s.


 Primitive and Non primitive d.s.

 Static and Dynamic d.s.

 Homogeneous D.S: All the elements of data are same type

 Ex:-Array , string

 Heterogeneous D.S :- All the data elements may not be of same type.

 Ex:- Structure, Union

 Linear D.S :- Data items are arranged in a Linear sequence & forms a linear ordering in memory.
so data can be processed one by one sequentially.

 Ex:- array, stacks, queues, linked lists,

 Non linear D.S:- Data items are arranged in a scattered manner or not in sequence.

 Ex:- tree , graph

 Static D.S.:- In these sizes and structures associated with memory location are fixed at compile
time.

 Ex:- int p[4];

 Dynamic D.S. :- Which expand or shrink as required during program execution and their
associated memory locations changes.

 Ex:- linked list.

Classification of Data Structure

Primitive Data Structures

These are basic data structures & are directly operated upon by the machine instructions.

They have different representation on different computers.

Ex:- integers, characters, floats, pointers , character constant, string constant etc.
1.integer:-These are whole numbers without fractional part. Integer takes 2 bytes in memory. it may be
signed , unsigned, long

Ex:- 2,4,60,-23 etc.

2.float :- It is used store fractional numbers .It takes 4 bytes.

Ex:- 2.5, 245.47

3.character:-We can store any character, its size is 1 byte. It is represented in computer memory by their
ASCII value. The ASCII value of A is 65.but space is 32.

Ex:- x , r , A , z, 4,*, ( ,+

4.Pointer:- Pointer is a variable which store the address . Pointers are used to access and store data in
memory.

declaration:- data type * pointer name ;

ex- int *ptr;

float * p;

char *p;

Nonprimitive Data Structure

These are more sophisticated data structures and derived from primitive data structures.

They are composed of same or different data structures.so they may be Homogeneous or
Heterogeneous.

1.Array:- Write from C Note defination, diagram, declar, types

2.Stacks:- It is a Linear list in which insertion and deletions can take place only at one end called Top. so
it is called LIFO kind of Data Structure. It is similar to stack of dishes. Operations like push, pop is done in
Stacks.

Insertion of element into Stack is called Push and deletion operation is called Pop.

3.Queue:- It is a linear list where deletions can take place only at one end of the list called FRONT and
insertion can take place only at other end called REAR.
It is a FIFO kind of data structure. It operates just like people waiting at bus stop.

4.Tree:- It is used to represent data containing a hierarchical relationship between elements.

The tree contains elements as its node. Each node can be deleted or added to it.

Tree is a Nonlinear data structure where data items are stored in sorted Sequence.

A node is present at the Top of the Tree is called ROOT. The remaining nodes are partitioned into no. of
Sub Trees.

5.LINKED LIST

A link list is a linear collection of data elements called nodes where the linear order is maintained by
pointers.

The nodes are non sequential . Each node is implemented in computer by a self referential STRUCTURE.

Each node is divided into two parts:-

• 1st part contains the information of the element (INFO)

• 2nd part called link field or next pointer field contains the address of next node (LINK)

TYPES OF LINKED LIST

1. Single linked list 2.Double linked list


3.Circular linked list (single circular and double circular )

6.Graph:-A graph G is an Ordered set(V,E) where V represents the set of elements, called Vertices and E
represents the Edge between these elements.

An edge connects a pair of vertices. vertices on graph are shown as circles. so an edge can be
represented as E=(V,W) Graphs may be different types :-

directed , undirected, connected, simple, multigraph.

Basic Operations on Data Structure

1.Searching-Finding the location of a given element in a list.

2.Sorting-Arranging elements in Ascending or descending order

3.Insertion-Adding a new element in to the list.

4.Traversal-Accessing each element exactly once .This operation is called Visiting the elements.

5.Deletion-Removing an existing element from the list.

6.Merging-Combining the elements of two similar sorted list of data into a single list.

SPACE COMPLEXITY

Space complexity is a measure of the amount of working storage an algorithm needs. That means how
much memory, in the worst case, is needed at any point in the algorithm. As with time complexity, we're
mostly concerned with how the space needs grow, in big-Oh terms, as the size N of the input problem
grows.

int sum(int x, int y, int z) {


int r = x + y + z;
return r;
}

requires 3 units of space for the parameters and 1 for the local variable, and this never changes, so this
is O(1).

int sum(int a[], int n) {


int r = 0;
for (int i = 0; i < n; ++i) {
r += a[i];
}
return r;
}
requires N units for a, plus space for n, r and i, so it's O(N). What are the space complexities of these
next two functions?

void matrixAdd(int a[], int b[], int c[],


int n) {
for (int i = 0; i < n; ++i) {
c[i] = a[i] + b[j]
}
}
void matrixMultiply(int a[], int b[], int
c[][], int n) { // not legal C++
for (int i = 0; i < n; ++i) {
for (int j = 0; j < n; ++j) {
c[i] = a[i] + b[j];
}
}
}

If a function A uses M units of its own space (local variables and parameters), and it calls a function B
that needs N units of local space, then A overall needs M + N units of temporary workspace. What if A
calls B 3 times? When a function finishes, its space can be reused, so if A calls B 3 times, it still only
needs M + N units of workspace.

What if A calls itself recursively N times? Then its space can't be reused because every call is still in
progress, so it needs O(N2) units of workspace.

Time Complexity
Time Complexity of an algorithm is the representation of the amount of time required by the algorithm
to execute to completion. Time requirements can be denoted or defined as a numerical function t(N),
where t(N) can be measured as the number of steps, provided each step takes constant time.

Big O Notation expresses the run time of an algorithm in terms of how quickly it grows relative to the
input ‘n’ by defining the N number of operations that are done on it. Thus, the time complexity of an
algorithm is denoted by the combination of all O[n] assigned for each line of function.

Time complexity of an algorithm signifies the total time required by the program to run till its
completion.

The time complexity of algorithms is most commonly expressed using the big O notation. It's an
asymptotic notation to represent the time complexity. We will study about it in detail in the next
tutorial.

Time Complexity is most commonly estimated by counting the number of elementary steps performed
by any algorithm to finish execution. Like in the example above, for the first code the loop will
run n number of times, so the time complexity will be n at least and as the value of n will increase the
time taken will also increase. While for the second code, time complexity is constant, because it will
never be dependent on the value of n, it will always give the result in 1 step.
Order of growth is how the time of execution depends on the length of the input. In the above example,
we can clearly see that the time of execution is linearly depends on the length of the array. Order of
growth will help us to compute the running time with ease. We will ignore the lower order terms, since
the lower order terms are relatively insignificant for large input. We use different notation to describe
limiting behavior of a function.
O-notation:
To denote asymptotic upper bound, we use O-notation. For a given function g(n), we denote
by O(g(n)) (pronounced “big-oh of g of n”) the set of functions:
O(g(n))= { f(n) : there exist positive constants c and n0 such that 0≤f(n)≤c∗g(n) for all n≥n0 }

Ω-notation:
To denote asymptotic lower bound, we use Ω-notation. For a given function g(n), we denote
by Ω(g(n)) (pronounced “big-omega of g of n”) the set of functions:
Ω(g(n))= { f(n) : there exist positive constants c and n0 such that 0≤c∗g(n)≤f(n) for all n≥n0 }

Θ-notation:
To denote asymptotic tight bound, we use Θ-notation. For a given function g(n), we denote
by Θ(g(n)) (pronounced “big-theta of g of n”) the set of functions:
Θ(g(n))= { f(n) : there exist positive constants c1,c2 and n0 such that 0≤c1∗g(n)≤f(n)≤c2∗g(n) for
all n>n0 }

Time complexity notations

Review of Array, Structures, Pointers.

Check from C note given in 1st semester.

DYNAMIC MEMORY ALLOCATION

The concept of dynamic memory allocation in c language enables the C programmer to allocate
memory at runtime. Dynamic memory allocation in c language is possible by 4 functions of
stdlib.h/alloc.h header file.

1. malloc()
2. calloc()
3. realloc()
4. free()

1. malloc( )
The malloc() function allocates single block of requested memory.

It doesn't initialize memory at execution time, so it has garbage value initially.

It returns NULL if memory is not sufficient.

The syntax of malloc() function is given below:

ptr=(cast-type*)malloc(byte-size)
program :-

2.calloc( )

The calloc() function allocates multiple block of requested memory.

It initially initialize all bytes to zero.

It returns NULL if memory is not sufficient.

The syntax of calloc() function is given below:

ptr=(cast-type*)calloc(number, byte-size); do a program

3.realloc( )

If memory is not sufficient for malloc() or calloc(), you can reallocate the memory by realloc() function.
In short, it changes the memory size.Let's see the syntax of realloc() function.

ptr=realloc(ptr, new-size) ; do a program

4.free( )

The memory occupied by malloc() or calloc() functions must be released by calling free() function.
Otherwise, it will consume memory until program exit.Let's see the syntax of free() function.

free(pointer name) ;

LINKED LIST
It is a linear collection of data elements called Nodes, that are stored in different memory locations
connected by pointers.

Advantages

1. Dynamic data structure that can Grow and Shrink.

2. Efficient Memory Utilization (Exact amount of data storage)

3. Insertion ,deletion , updation are easy and efficient.

4. Data stored in RAM but not sequentially.

Disadvantages

1.More memory space is needed , if number of fields are more.


2.Access to arbitrary node is cumbersome & time consuming.

3.Logical and Physical ordering of Nodes may not be same.

4. Searching is Slow.

5. Difficult to program because pointer manipulation required.

TYPES OF LINKED LIST

1. Linear linked list or One way linked list or Single linked list

2. Double Linked list or Two way linked list

3. Circular Linked List

1. Single Circular Linked list

2. Double Circular Linked List

OPERATIONS ON LINKED LIST

We can do following Operations on linked lists:-

1.Insert (Beginning ,End , at any Position)

2.Delete(Beginning ,End , at any Position)

3.Traversing or Displaying of Each element.

4.Searching of an Element.

5.Sorting of elements.

6.Merging of lists(Join or Concatenation ).

ONE WAY LIST OR SINGLE LINKED LIST

It is a One way collection of Nodes where the linear order is maintained by pointers. The nodes are
not in sequence. Each node is implemented in computer by a self referential STRUCTURE.

Each node is divided into two parts:-

• 1st part contains the information of the elements INFO

• 2nd part is link field contains the address of next node of the list (LINK)
1.An External pointer START contains the address of the 1st node. So an arrow drawn from START to
the 1st node of the list.

2.Last Node contains NULL . It is a NULL pointer indicates End of the list.

3.In C Linked List is created using structures, pointers & malloc.

4.The structure of a Node is:-

struct node

int info ;

struct node * link ;

};

6.The pointer variable START is declared as :-

struct node * START ;

These are called Self Referential structures because a structure contains pointers to structure of its own
type.

The new node is created & the address of the new node is assigned to start as :-

START= (struct node *) malloc(sizeof(struct node)) ;

CIRCULAR LINKED LIST OR SINGLE CIRCULAR LINKED LIST

 Single linked list where link field of last node contains address of the first node .

 Link field of last node doesn’t contain NULL .

 We can assign two External Pointer START AND LAST to Circular linked list OR we can assign
ONLY One External Pointer START to Circular Linked List.

We can declare the structure for the Circular linked list as:-

Struct Node {

int info ;

Struct node *link ;

};

Struct node *start,*last ;


DOUBLE LINKED LIST OR TWO WAY LIST

Double linked list is a linear collection of nodes where each node has two link fields :-

1.Predecessor or Previous 2. Successor or Next

3. INFO ( One Or More information fields)

The previous and Next Pointers are used to point the successor node and Predecessor node
respectively.

The structure for double linked list is:-

Struct node

int info ;

Struct node *prev,*Next ;

};

Struct node *start ;


DOUBLE CIRCULAR LINKED LIST

 The last node contains the address of the first node in Next part instead of NULL & first node
contains address of the last node in the PREV. part.

 So it is a double linked list having circular structure.

 We can assign two pointers START and LAST or one start pointer to the linked list

The structure of the DCLL is :-

Struct node {

int info ;

struct node *prev, *Next ; };

Struct node *START, *LAST ;

Questions:-

write the difference between single and single circular or circular linked list ?

Write the difference between double and circular double linked list?

SPARSE MATRIX

A matrix can be defined as a two-dimensional array having 'm' columns and 'n' rows representing m*n
matrix. Sparse matrices are those matrices that have the majority of their elements equal to zero. In other
words, the sparse matrix can be defined as the matrix that has a greater number of zero elements than
the non-zero elements.

Why do we need to use a sparse matrix instead of a simple matrix?

We can also use the simple matrix to store the elements in the memory; then why do we need to use the
sparse matrix. The following are the advantages of using a sparse matrix:
o Storage: As we know, a sparse matrix that contains lesser non-zero elements than zero so less
memory can be used to store elements. It evaluates only the non-zero elements.
o Computing time: In the case of searching n sparse matrix, we need to traverse only the non-zero
elements rather than traversing all the sparse matrix elements. It saves computing time by
logically designing a data structure traversing non-zero elements.

Representing a sparse matrix by a 2D array leads to the wastage of lots of memory. The zeroes in the
matrix are of no use to store zeroes with non-zero elements. To avoid such wastage, we can store only
non-zero elements. If we store only non-zero elements, it reduces the traversal time and the storage
space.

Sparse Matrix Representation

The non-zero elements can be stored with triples, i.e., rows, columns, and value. The sparse matrix can be
represented in the following ways:

o Array representation
o Linked list representation

Array Representation

The 2d array can be used to represent a sparse matrix in which there are three rows named as:

1. Row: It is an index of a row where a non-zero element is located.


2. Column: It is an index of the column where a non-zero element is located.
3. Value: The value of the non-zero element is located at the index (row, column).

Let's understand the sparse matrix using array representation through an example.

As we can observe above, that sparse matrix is represented using triplets, i.e., row, column, and value. In
the above sparse matrix, there are 13 zero elements and 7 non-zero elements. This sparse matrix occupies
5*4 = 20 memory space. If the size of the sparse matrix is increased, then the wastage of memory space
will also be increased. The above sparse matrix can be represented in the tabular form shown as below:
PROGRAM ON ARRAY IMPLEMENTATION OF SPARSE MATRIX IN C:

Linked List Representation

In linked list representation, linked list data structure is used to represent a sparse matrix. In linked list
representation, each node consists of four fields whereas, in array representation, there are three fields,
i.e., row, column, and value. The following are the fields in the linked list:

o Row: It is an index of row where a non-zero element is located.


o Column: It is an index of column where a non-zero element is located.
o Value: It is the value of the non-zero element which is located at the index (row, column).
o Next node: It stores the address of the next node.

Let's understand the sparse matrix using linked list representation through an example.

In the above figure, sparse represented in the linked list form. In the node, first field represents the index
of row, second field represents the index of column, third field represents the value and fourth field
contains the address of the next node.

*************

You might also like