Nothing Special   »   [go: up one dir, main page]

0% found this document useful (0 votes)
2 views60 pages

unit 1

Download as docx, pdf, or txt
Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1/ 60

DATA –

Data is a raw and unorganized fact that is required to be processed to make it meaningful.
It can be considered as facts and statistics collected together for reference or analysis.

Data are individual units of information. In analytical processes, data are represented by
variables. Data is always interpreted, by a human or machine, to derive meaning. So, data
is meaningless. Data contains numbers, statements, and characters in a raw form.

Let’s understand it with an example, see it is a fact (or data) that an apple a day can keep
the doctor away. But why it is not told, means this doesn’t tell us about what makes apple
healthy but if it tells that apple provides different vitamins, minerals, and fiber, which
keeps us healthy and are needed by our body; then it is information as it conveys the full
meaning of the fact.

ENTITY –
An entity is an object that exists. It doesn't have to do anything; it just has to exist.

An entity can be a single thing, person, place, or object. Data can be stored about such
entities.

The attributes of an entity further define the information being stored. For database
effectiveness, some attributes become entities. Entities are also joined together in
relationships.

INFORMATION –
Information is a set of data which is processed in a meaningful way according to the
given requirement. Information is processed, structured, or presented in a given
context to make it meaningful and useful. It is processed data which includes data that
possess context, relevance, and purpose.

z
DATA vs INFORMATION
Data is a collection of facts, while information puts those facts into context. While data is raw
and unorganized, information is organized.

EXAMPLE –

1. At a restaurant, a single customer’s bill amount is data. However, when the restaurant
owners collect and interpret multiple bills over a range of time, they can produce valuable
information, such as what menu items are most popular and whether the prices are
sufficient to cover supplies, overhead, and wages.
2. A customer’s response to an individual customer service survey is a point of data. But when
you compile that customer’s responses over time—and, on a grander scheme, multiple
customers responses over time—you can develop insights around areas for improvement
within your customer service team.
3. The number of likes on a social media post is a single element of data. When that’s
combined with other social media engagement statistics, like followers, comments, and
shares, a company can intuit which social media platforms perform the best and which
platforms they should focus on to more effectively engage their audience.

DIFFERENCE BETWEEN DATA AND INFORMATION

DATA INFORMATION

Data are the variables that help to develop Information is meaningful data.
ideas/conclusions.

Data are text and numerical values. Information is refined form of actual data.

Data doesn’t rely on Information. While Information relies on Data.

Data does not have any specific purpose. Information carries a meaning that has been assigned
by interpreting data.

Data does not directly helps in decision making. Information directly helps in decision making.

Data is collection of facts, which it self have no Information puts those facts into context.
meaning.

Example of data is student test score. Example of information is average score of class that
is derived from given data.
z
DATA TYPE
Data Type is the set of quantities that belongs together and are of a similar category. Data type is
used so that the compiler or interpreter of a programming language can be told about the data
which is to be used. The compiler also allocates the required amount of memory storage as per
the data type defined thus saving space.

Real-life Example of Different Data Types Used –


A student login system might require attributes such as Student Name, Age, Birth Date, Roll
Number, Address, Class, Division, Time of Login, Mobile Number, etc. Each variable needs to be
stored and segregated according to their Data Types during coding to inform the compiler about
the expected input data. This will be helpful to classify what type of data does the system contains
and it will be easier to work on that.

BUILT-IN DATA TYPE


Built-in Data types are those data types that are pre-defined by the programming language. Most
languages have native data types that are created for ease of execution of data. Such Data types
are called Built-in Data Types. These data types can be used directly in the program without the
hassle of creating them. Every programming language has its own specific set of built-in data
types.

They are also called Primary or Primitive Data Types. These Data types are believed to be one of
the fastest modes to execute operations on Data. The syntax used for defining these data types is
different for each programming language.

Real-Time Example of Built-in Data type –

Suppose we want to create an automatic exam grading system for the students of a school. The
data which we require will be Names, Roll numbers, Class, Division, Subjects, Marks, Grades. This
data consists of various categories of data like Alphabets, Strings, Characters, Integers, Decimal
numbers, Symbols, etc. For this purpose, we can use the built-in data types which are provided by
the programming languages. This data can be fed into the program by using the built-in data
types such that the system finds it easy to accept, calculate, interpret and give results to the user.

z
ABSTRACT DATA TYPE
An abstract data type is an abstraction of a data structure that provides only the interface to
which the data structure must adhere. The interface does not give any specific details about
something should be implemented or in what programming language.

In other words, we can say that abstract data types are the entities that are definitions of data
and operations but do not have implementation details. In this case, we know the data that
we are storing and the operations that can be performed on the data, but we don't know
about the implementation details. The reason for not having implementation details is that
every programming language has a different implementation strategy for example; a C data
structure is implemented using structures while a C++ data structure is implemented using
objects and classes.

For example, a List is an abstract data type that is implemented using a dynamic array and
linked list. A queue is implemented using linked list-based queue, array-based queue, and
stack-based queue. A Map is implemented using Tree map, hash map, or hash table.

z
DEFINITION OF DATA STRUCTURES
Data Structure is a branch of Computer Science. The study of data structure allows us to
understand the organization of data and the management of the data flow in order to increase
the efficiency of any process or program.

Data Structure is a particular way of storing and organizing data in the memory of the computer
so that these data can easily be retrieved and efficiently utilized in the future when required. The
data can be managed in various ways, like the logical or mathematical model for a specific
organization of data is known as a data structure.

Some examples of Data Structures are Arrays, Linked Lists, Stack, Queue, Trees, etc.

Data Structures are the main part of many Computer Science Algorithms as they allow the
programmers to manage the data in an effective way.

CLASSIFICATION OF DATA STRUCTURES


We can classify Data Structures into two categories:

1. Primitive Data Structure


2. Non-Primitive Data Structure

z
Primitive Data Structures :
1. Primitive Data Structures are the data structures consisting of the numbers and the
characters that come in-built into programs.
2. These data structures can be manipulated or operated directly by machine-level
instructions.
3. Basic data types like Integer, Float, Character, and Boolean come under the Primitive Data
Structures.
4. These data types are also called Simple data types, as they contain characters that can't be
divided further.

Non-Primitive Data Structures :


1. Non-Primitive Data Structures are those data structures derived from Primitive Data
Structures.
2. These data structures can't be manipulated or operated directly by machine-level
instructions.
3. The focus of these data structures is on forming a set of data elements that is
either homogeneous (same data type) or heterogeneous (different data types).
4. Based on the structure and arrangement of data, we can divide these data structures into
two sub-categories -
a) Linear Data Structures
b) Non-Linear Data Structures

LINEAR DATA STRUCTURE -


A data structure that preserves a linear connection among its data elements is known as a Linear
Data Structure. The arrangement of the data is done linearly, where each element consists of
the successors and predecessors except the first and the last data element.

The following is the list of Linear Data Structures that we generally use:

1. Arrays
2. Linked Lists
3. Stacks
4. Queues

z
ARRAYS :
It is defined as the collection of similar type of data items stored at contiguous memory
locations.

LINKED LISTS :
• A Linked List is a data structure that consists of nodes.
• Every node contains at least two fields.
• First field contains the information and second field contains address of next node.

z
STACKS :
• Stack is a data structure where elements can be inserted and deleted from one end, only
known as 'Top' of the Stack.
• Insertion in Stack is given a standard name Push and deletion is given a standard name Pop.

QUEUES :
• A Queue is an ordered collection of items into which items may be inserted at one
end called Rear and removed from another end called Front.

• Ordered means First in First Out (FIFO) or First Come First Serve (FCFS).

z
NON-LINEAR DATA STRUCTURE :
Non-Linear Data Structures are data structures where the data elements are not arranged in
sequential order. Here, the insertion and removal of data are not feasible in a linear manner.
There exists a hierarchical relationship between the individual data items.

The following is the list of Non-Linear Data Structures that we generally use:

1. Trees
2. Graphs

TREES :
A Tree is a non-linear hierarchical data structure that follows a Parent-Child relationship. It is a
collection of nodes where one node is defined as ROOT node and this root can have zero or
more children.

GRAPHS :
• A non-linear data structure consisting of vertices/nodes and edges/lines.
• A Graph can also be termed a mathematical model used to define pair-wise relations
between objects.
• A Graph is also defined as an ordered pair G = {V, E}
where V= a set of vertices and E = a set of edges.

z
DIFFERENCE BETWEEN LINEAR AND NON-LINEAR DATA STRUCTURE

Linear Data Structure Non-Linear Data Structure

Every item is related to its previous and Every item is attached with many other
next item. items.

Data is arranged in linear sequence. Data is not arranged in sequence.

Data items can be traversed in a single run. Data cannot be traversed in a single
run.

Eg. Array, Stacks, linked list, queue. Eg. tree, graph.

Implementation is easy. Implementation is difficult.

z
INTRODUCTION TO ALGORITHMS
Definition –

An algorithm is a process or a set of rules required to perform calculations or some other


problem-solving operations especially by a computer. The formal definition of an algorithm is that
it contains the finite set of instructions which are being carried in a specific order to perform the
specific task. It is not the complete program or code; it is just a solution (logic) of a problem,
which can be represented either as an informal description using a Flowchart or Pseudocode .

CHARACTERISTICS OF ALGORITHMS
1. Well-Defined Inputs: If an algorithm says to take inputs, it should be well-defined inputs.
2. Well-Defined Outputs: The algorithm must clearly define what output will be yielded
and it should be well-defined as well.
3. Clear and Unambiguous: Algorithm should be clear and unambiguous. Each of its steps
should be clear in all aspects and must lead to only one meaning.
4. Finite-ness: The algorithm must be finite, i.e. it should not end up in an infinite loops or
similar.
5. Language Independent: The Algorithm designed must be language-independent, i.e. it
must be just plain instructions that can be implemented in any language, and yet the
output will be same, as expected.
6. Feasible: The algorithm must be simple, generic and practical, such that it can be
executed upon with the available resources. It must not contain some future technology,
or anything.

z
DATAFLOW OF AN ALGORITHM
Problem: A problem can be a real-world problem or any instance from the real-world problem
for which we need to create a program or the set of instructions. The set of instructions is
known as an algorithm.

Algorithm: An algorithm will be designed for a problem which is a step-by-step procedure.

Input: After designing an algorithm, the required and the desired inputs are provided to the
algorithm.

Processing unit: The input will be given to the processing unit, and the processing unit will
produce the desired output.

Output: The output is the outcome or the result of the program.

ALGORITHM DESIGN TECHNIQUES


Following are some of the main algorithm design techniques:

1. Brute force algorithm: The general logic structure is applied to design an algorithm. It is
also known as an exhaustive search algorithm that searches all the possibilities to
provide the required solution. Such algorithms are of two types:
a) Optimizing: Finding all the solutions of a problem and then take out the best
solution or if the value of the best solution is known then it will terminate if the
best solution is known.
b) Sacrificing: As soon as the best solution is found, then it will stop.
2. Divide and conquer: It is a very implementation of an algorithm. It allows you to design
an algorithm in a step-by-step variation. It breaks down the algorithm to solve the
problem in different methods. It allows you to break down the problem into different
methods, and valid output is produced for the valid input. This valid output is passed to
some other function.
3. Greedy algorithm: It is an algorithm paradigm that makes an optimal choice on each
iteration with the hope of getting the best solution. It is easy to implement and has a
faster execution time. But there are very rare cases in which it provides the optimal
solution.

z
4. Dynamic programming: It makes the algorithm more efficient by storing the intermediate
results. It follows five different steps to find the optimal solution for the problem:

a) It breaks down the problem into a sub-problem to find the optimal solution.
b) After breaking down the problem, it finds the optimal solution out of these sub-
problems.
c) Stores the result of the sub-problems is known as memorization.
d) Reuse the result so that it cannot be recomputed for the same sub-problems.
e) Finally, it computes the result of the complex program.
5. Branch and Bound Algorithm: The branch and bound algorithm can be applied to only
integer programming problems. This approach divides all the sets of feasible solutions
into smaller subsets. These subsets are further evaluated to find the best solution.
6. Randomized Algorithm: As we have seen in a regular algorithm, we have predefined
input and required output. Those algorithms that have some defined set of inputs and
required output, and follow some described steps are known as deterministic
algorithms. What happens that when the random variable is introduced in the
randomized algorithm?. In a randomized algorithm, some random bits are introduced by
the algorithm and added in the input to produce the output, which is random in nature.
Randomized algorithms are simpler and efficient than the deterministic algorithm.
7. Backtracking: Backtracking is an algorithmic technique that solves the problem
recursively and removes the solution if it does not satisfy the constraints of a problem.

z
ANALYSIS OF ALGORITHMS
The algorithm can be analyzed in two levels, i.e., first is before creating the algorithm, and
second is after creating the algorithm. The following are the two analysis of an algorithm:

1. Priori Analysis: Here, priori analysis is the theoretical analysis of an algorithm which is
done before implementing the algorithm. Various factors can be considered before
implementing the algorithm like processor speed, which has no effect on the
implementation part.
2. Posterior Analysis: Here, posterior analysis is a practical analysis of an algorithm. The
practical analysis is achieved by implementing the algorithm using any programming
language. This analysis basically evaluate that how much running time and space taken
by the algorithm.

PERFORMANCE ANALYSIS OF ALGORITHM


The performance of the algorithm can be measured in two factors:

1) Time complexity: The time complexity of an algorithm is the amount of time required to
complete the execution. The time complexity of an algorithm is denoted by the big O
notation. Here, big O notation is the asymptotic notation to represent the time
complexity. The time complexity is mainly calculated by counting the number of steps to
finish the execution.
Let's understand the time complexity through an example.

sum=0;

// Suppose we have to calculate the sum of n numbers.

for i=1 to n

sum=sum+i;

// when the loop ends then sum holds the sum of the n numbers.

return sum;

In the above code, the time complexity of the loop statement will be atleast n, and if the
value of n increases, then the time complexity also increases. While the complexity of
the code, i.e., return sum will be constant as its value is not dependent on the value of n
and will provide the result in one step only. We generally consider the worst-time
complexity as it is the maximum time taken for any given input size.

z
2) Space complexity: An algorithm's space complexity is the amount of space required to
solve a problem and produce an output. Similar to the time complexity, space
complexity is also expressed in big O notation.
For an algorithm, the space is required for the following purposes:

i. To store program instructions


ii. To store constant values
iii. To store variable values
iv. To track the function calls, jumping statements, etc.
Auxiliary space: The extra space required by the algorithm, excluding the input size, is
known as an auxiliary space. The space complexity considers both the spaces, i.e.,
auxiliary space, and space used by the input.

So, Space complexity = Auxiliary space + Input size.

TIME COMPLEXITY / ORDER OF GROWTH


It defines the amount of time taken by any program with respect to the size of the input. Time
Complexity specifies how the program would behave as the order of size of input is increased.

Example : Imagine a classroom of 100 students in which you gave your pen to one person.
Now, you want that pen. Here are some ways to find the pen and what the O order is.

2
O(n ): You go and ask the first person of the class, if he has the pen. Also, you ask this
person about other 99 people in the classroom if they have that pen and so on,
2
This is what we call O(n ).

O(n): Going and asking each student individually is O(n).

O(log n): Now divide the class into two groups, then ask: “Is it on the left side, or the
right side of the classroom?” Then we take that group and divide it into two and ask
again, and so on. Repeat the process till you are left with one student who has your
pen. This is what you mean by O(log n).

z
EFFICIENCY OF AN ALGORITHM
The efficiency of an algorithm depends on the amount of time, storage and other resources
required to execute the algorithm. The efficiency is measured with the help of asymptotic
notations.

An algorithm may not have the same performance for different types of inputs. With the
increase in the input size, the performance will change.

The study of change in performance of the algorithm with the change in the order of the
input size is defined as asymptotic analysis.

z
ASYMPTOTIC NOTATIONS
Asymptotic notations are the mathematical notations used to describe the running time of
an algorithm when the input tends towards a particular value or a limiting value.

For example:

In bubble sort, when the input array is already sorted, the time taken by the
algorithm is linear i.e. the best case.

But, when the input array is in reverse condition, the algorithm takes the maximum
time (quadratic) to sort the elements i.e. the worst case.

When the input array is neither sorted nor in reverse order, then it takes average
time.

These durations are denoted using asymptotic notations.

There are mainly three asymptotic notations:

• Big-O notation
• Omega notation
• Theta notation
Big-O Notation (O-notation) –

Big-O notation represents the upper bound of the running time of an algorithm. Thus,
it gives the worst-case complexity of an algorithm.

Mathematically, if f(n) describes the running time of an algorithm; f(n) is O(g(n)) if and
only if there exist positive constants c and n° such that:

0 ≤ f(n) ≤ c g(n) for all n ≥ n°.

Here, n is the input size, and g(n) is any complexity function, for, e.g. n, n 2, etc. (It is
used to give upper bound on a function)

z
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
Omega Notation (Ω-notation) –

Omega notation represents the lower bound of the running time of an algorithm.
Thus, it provides the best case complexity of an algorithm.

Let f(n) define the running time of an algorithm; f(n) is said to be Ω (g(n)) if and only
if there exist positive constants c and n° such that:

0 ≤ c g(n) ≤ f(n) for all n ≥ n°.

Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0

Theta Notation (Θ-notation) –

Theta notation encloses the function from above and below. Since it represents the
upper and the lower bound of the running time of an algorithm, it is used for
analyzing the average-case complexity of an algorithm.

For a function g(n), Θ(g(n)) is given by the relation:

Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0

such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0 }

The above expression can be described as a function f(n) belongs to the set Θ(g(n))
if there exist positive constants c1 and c2 such that it can be sandwiched between
c1g(n) and c2g(n), for sufficiently large n.

If a function f(n) lies anywhere in between c1g(n) and c2g(n) for all n ≥ n0, then f(n)
is said to be asymptotically tight bound.

z
Increasing order of common runtimes

z
ARRAYS
Arrays are defined as the collection of similar types of data items stored at contiguous
memory locations. It is one of the simplest data structures where each data element can
be randomly accessed by using its index number.

In C programming, they are the derived data types that can store the primitive type of
data such as int, char, double, float, etc. For example, if we want to store the age of 5
students, then we don't need to define a different variable for the age of different
student. Instead, we can define an array that can store the age of each student at the
contiguous memory locations.

PROPERTIES OF ARRAY
1. Arrays have a fixed size, where the size of the array is defined when the array is declared.
In the below given figure, the size of the array is fixed i.e., the size 5 is fixed and we
cannot add one more element in the array.

2. Ordered Set means every number will be in Sequence and will be denoted by numbers
called Index (“indices” in plural) or subscript.

z
3. Homogenous elements means Data type of all the elements will be same.

4. Elements of the array are stored at contiguous memory locations where the first
element is stored at the smallest memory location.

5. Elements of the array can be randomly accessed since we can calculate the address
of each element of the array with the given base address and the size of the data
element.
The general syntax to access an element is: <name_of_array>[<index>].

There are 3 types of indexing provided by different languages to access the array.

1) (zero-based indexing): The first element of the array is indexed by a subscript 0.


The index of nth element is “n-1” in this case. (C, C++, etc.).

2) (one-based indexing): The first element of the array is indexed by the


subscript 1. (Basic, MATLAB, R, Mathematica).

3) (n-based indexing): The base index of an array can be freely chosen. Usually,
programming languages allowing n-based indexing also to allow negative index
values. (Fortran, Pascal, ALGOL, etc.).
z
TYPES OF ARRAYS
1) One-Dimensional array
2) Two-Dimensional array
3) Multi-Dimensional array

One-Dimensional Array -

A one-dimensional array (or single dimension array) is an array with one subscript
only.

Declaration of one-dimensional array :

Syntax:

<Data Type> <Arrayname> [<Array Size>]

Example:

Declaration of one-dimensional array "A" having "int" data type and size 10 elements
in C.

int A[10]

Accessing the elements in one-dimensional array :

Accessing its elements involves a single subscript.

Syntax :

Array_name[index]

Example :

To access 2nd element in the array A, we write A[1]

To access 9th element in the array A, we write A[8]

(Here, we that assume the first index is 0).

z
Memory Representation of One-Dimensional Array :

The memory representation of one-dimensional array is shown in below diagram.

In the above diagram A[0], A[1], A[2],. . . , A[9] are the array elements. The address
mentioned for these elements represent the physical location of data elements in the
main memory. It is assumed that each element requires 4 bytes for storage in this
scenario.

Two-Dimensional Array –
A two-dimensional array (2-D array) has two subscripts. The elements are stored
in the form of rows and columns. It can also be termed as an array of one-
dimensional arrays.

Matrix is an example of two-dimensional array.

z
Declaration the two-dimensional array:

Syntax:

<Data Type> <Arrayname> [m][n]

Where,

m = Number of rows

n = Number of columns

Example:

Declaration of two-dimensional array “A” having “int” datatype and row size 6
and column size 5.

int A[6][5]

Accessing the element in two-dimensional array :

Accessing its elements involves two subscripts: one for row and second for
column.

Syntax:

Arrayname[ row_index][column_index]

Example:

To access 2nd element of 1st row in the array A, we write A[0][1]

z
Memory Representation of Two-Dimensional Array :

There are two-ways by which the 2D array elements can be represented in Memory.

1) Row Major

2) Column Major

Row Major Representation :

In row-major order, storage of the elements in the memory is row-wise i.e.


storage of elements of first row is followed by the storage of elements of
second row and so on so forth.

Column Major Representation:

In column-major order, elements are stored column wise i.e., storage of


elements of the first column followed by storage of second column elements
and so on so forth.

z
Derivation of Index Formula for One-Dimensional Array :

Suppose there is a one-dimensional array A[L : U].

Number of elements or lengths of the array can be found by the formula


N=U–L+1

Where,

U = Upper Bound of the array (Last index)

L = Lower Bound of the array (First index)

A[0:9] will contain 9-0+1 =10 elements.

Index Formula for One-Dimensional Array :

Address of A[i] = Base Address +n*(i – Lower Bound)

Question 1:

Given A [-1:10], bytes per cell = 4, base address = 2000. Find the address of
A [7].

Question 2:

Given A [1:15], bytes per cell = 3, base address = 1000. Find the address of
A [9].

z
Derivation of Index Formula for Two-Dimensional Array :

ROW MAJOR FORMULA

Address of A[I][J] = B + W * ((I – LR) * N + (J – LC))

I = Row Subset of an element whose address to be found,

J = Column Subset of an element whose address to be found,

B = Base address,

W = Storage size of one element store in an array(in byte),

LR = Lower Limit of row/start row index of the matrix(If not given assume
it as zero),

LC = Lower Limit of column/start column index of the matrix(If not given


assume it as zero),

N = Number of column given in the matrix.

COLUMN MAJOR FORMULA

Address of A[I][J] = B + W * ((J – LC) * M + (I– LR))

I = Row Subset of an element whose address to be found,

J = Column Subset of an element whose address to be found,

B = Base address,

W = Storage size of one element store in an array(in byte),

LR = Lower Limit of row/start row index of the matrix(If not given assume it
as zero),

LC = Lower Limit of column/start column index of the matrix(If not given


assume it as zero),

M = Number of rows given in the matrix.

z
QUESTIONS :

1. Suppose a 2D array A is declared as A[–2:2 , 2:6], words per cell = 4, base

address = 200. Consider Row major order arrangement.

⮚ A) Find out length of each dimension and the number of elements in array.
⮚ B) Find the location of A[1,2].
2. Suppose a 2D array A is declared as A[4….7 , -1…3], words per cell = 2, base
address = 100. Consider Row major order arrangement.

⮚ A) Find out length of each dimension and the number of elements in array.
⮚ B) Find the location of A[6,2].

3. Suppose a 2D array A is declared as A[–2:2, 2:6], words per cell = 4,


base address = 1024. Consider Column Major order arrangement.
⮚ A) Find the length of each dimension and number of elements in array.
⮚ B) Find the location of A[2,5].

z
BASIC OPERATIONS IN ARRAY :
1) Traversing
2) Insertion
3) Deletion

TRAVERSING IN AN ARRAY :
1) Explore the array elements one by one in sequential order..
2) Traversing elements atleast once.
3) Also called the visiting of an array.

Algorithm for Traversing an Array :


Traversal (data,LB,UB)

Here data is array name and LB is Lower Bound (start) index of the first
element of an array. UB is Upper Bound (End) is the index of the last element.

Step 1: Start

Step 2: [Initialize variable. ] Set i = LB

Step 3: Repeat steps 4 and 5 While i <= UB

Step 4: Apply process (Print ) to data[i].

Step 5: Increment i = i+1

Step 6: End of loop

Step 7: Stop

z
⮚ Time Complexity : O(N)
The above algorithm requires execution of for loop N times. Hence, the number of
statements to be executed is N.

INSERTION IN ARRAY :
⮚ To insert a data element in an array.
⮚ A new element can be added at the beginning, end, or at any given index
based on the requirement.

z
Algorithm to Insert an element in an Array :
Here size is the array size. Position (pos) is location where element to be
inserted and Item is new value in the array

Step 1: Start

Step 2: [Initialize variable ] Set i = size - 1

Step 3: Repeat Step 4 and 5 While i >= pos-1

th
Step 4: [Move i element forward. ] set arr[i+1] = arr[i]

Step 5: [Decrease counter. ] Set i = i - 1

Step 6: [End of step 3 loop. ]

Step 7: [Insert element. ] Set arr[pos-1] = item

Step 8: Stop

Time Complexity:
Worst Case: O(N)

When the element is to be inserted at the beginning, N number of


shifting will be required and two statements to assign the value and
increase the value of N. Hence, N+2 statements will be executed.
Average case complexity is same as worst case complexity.

Best Case : Ω(1)

When the element is to be inserted at the end, no shifting is


required. Therefore, only two statements will be executed.

z
DELETION IN ARRAY :
⮚ To delete an element from the given index in the array and re-organizes the array
elements with shifting.

Algorithm to Delete an element from an Array:


Step 1: Start
Step 2: [Initialize variable] Set i = pos - 1
Step 3: Repeat Step 4 and 5 for i = pos - 1 to i < size
th
Step 4: [Move i element left side ] set a[i] = a[i+1]
Step 5: [Increment ] Set i = i + 1
Step 6: [End of step 03 loop]
Step 7: [Update Array Size] set size = size - 1
Step 8: Stop

z
Time Complexity:
Worst Case: O(N)

When the element is to be deleted from the beginning, N number of


shifting will be required and two statements to assign the value and
decrease the value of N. Average case complexity is same as worst case
complexity.

Best Case : Ω(1)

When the element is to be deleted from the end, no shifting is


required.

APPLICATION OF ARRAYS
Below are some applications of arrays.

1) Storing and accessing data: Arrays are used to store and retrieve data in a
specific order. For example, an array can be used to store the scores of a group
of students, or the temperatures recorded by a weather station.
2) Sorting: Arrays can be used to sort data in ascending or descending order.
Sorting algorithms such as bubble sort, merge sort, and quick sort rely heavily
on arrays.
3) Searching: Arrays can be searched for specific elements using algorithms such
as linear search and binary search.
4) Matrices: Arrays are used to represent matrices in mathematical computations
such as matrix multiplication, linear algebra, and image processing.
5) Stacks and queues: Arrays are used as the underlying data structure for
implementing stacks and queues, which are commonly used in algorithms and
data structures.
6) Graphs: Arrays can be used to represent graphs in computer science. Each
element in the array represents a node in the graph, and the relationships
between the nodes are represented by the values stored in the array.
7) Dynamic programming: Dynamic programming algorithms often use arrays to
store intermediate results of sub problems in order to solve a larger problem.

z
SPARSE MATRIX
A matrix is a two-dimensional data object made of m rows and n columns, therefore
having total m x n values. If most of the elements of the matrix have 0 value, then it is
called a sparse matrix.

Why to use Sparse Matrix instead of simple matrix ?

Storage: There are lesser non-zero elements than zeros and thus lesser memory can be
used to store only those elements.

Computing time: Computing time can be saved by logically designing a data structure
traversing only non-zero elements.

TYPES OF SPARSE MATRIX :


Sparse matrix can be categorized into two :
• Sparse Matrix with pattern
• Sparse Matrix without a pattern

Sparse Matrix with Pattern –


The following matrices exhibit some pattern of non-zero elements.
1. Identity Matrix:- where only diagonal elements store 1. The rest of the elements
is zero.
2. Diagonal Matrix:- where only diagonal elements store non zero value. Rest of
the elements is zero.
3. Lower Triangular Matrix:- In this kind of matrix the diagonal elements and
elements below contain non zero elements. The rest of the elements is zero.
4. Upper Triangular Matrix:- In this kind of matrix the diagonal elements and
elements above contain non-zero elements. The rest of the elements is zero.
5. Tri-diagonal Matrix:- The tri-diagonal regular sparse matrix where all non-zero
elements lie on one of the three diagonals, the main diagonal above and below.
In a tri-diagonal regular sparse matrix, all the non-zero elements are stored in a 1-
dimensional array row by row.
z
Diagonal Sparse Matrix Triangular Sparse Matrix Tri-diagonal Sparse Matrix

IDENTITY SPARSE MATRIX :


A unit matrix, or identity matrix, is square matrix whose principal diagonal
elements are ones, and the rest of the elements of the matrix are zeros. An
identity matrix is always a square matrix and is expressed as “I”.

For example, “In” is the identity matrix of order n, i.e., it has “n” rows and
columns.

A square matrix P = [aij] is said to be an identity matrix when a ij = 1 for i = j and


aij = 0 for i ≠ j.

Unit matrix of order n × n

z
DIAGONAL SPARSE MATRIX :
Consider a sparse diagonal matrix given below:
th th th
i row and i column element of matrix can be mapped to i index in 1-D
Array (assuming the indexes starting with 1).

To save memory, we can use a one-dimensional array. Array index for <i, j> element
of the
Matrix will be stored at I index in the one-dimensional array.
e.g. Index of M[1][1] in the one dimensional array will be 1
Index of M[2][2] in the one dimensional array will be 2
Index of M[3][3] in the one dimensional array will be 3, and so on so forth

z
TRIANGULAR SPARSE MATRIX :
Consider a sparse diagonal matrix given below :

To save memory, we can use the one-dimensional array. Array index for <i, j>
element of the Matrix can be found with the following method.

th
There are i elements in the i row of the matrix. Thus, before storing <i, j> element
in the one- dimensional array, elements upto i –1 rows would already have been
placed.

No of elements upto i – 1 rows = 1+2+3+ … +i–1

= i*(i –1)/2

th
Before <i, j> element in an i column, element in j–1 columns would already have
been placed.

Total Elements before <i, j> = i*(i–1)/2 + (j–1)

For storage of <i, j> location in one dimensional array = i*(i–1)/2 + (j–1) + 1

e.g. Index of M[2][2] in the one dimensional array will be 2*(2-1)+1 = 3

Index of M[3][4] in the one dimensional array will be 3*(3-1)+3 = 7

Index of M[5][3] in the one dimensional array will be 5*(5-1)+2 = 22, and so on
so forth.

z
TRIDIAGONAL SPARSE MATRIX :
Consider a tri-diagonal sparse matrix given below.

To save memory, we can use a one-dimensional array. Array index for <i, j> element
of the Matrix can be found with the following method

There are three elements in each row except the first and last row;

No of elements upto i –1 rows = 2+(3+3+ … +i–2 times) = 2+3*(i–2)

In each row there are j-i+1 element before<i,j> entry

For storage of <i, j> element, location in one dimensional array

=2+3*(i–2) + j-i+1+1

= 2+3i–6+j-i+2= 2i+j-2

e.g. Index of M[2][2] in the one dimensional array will be 2*2+2-2 = 4

Index of M[3][4] in the one dimensional array will be 3*2+4-2 = 8

Index of M[5][4] in the one dimensional array will be 5*2+4-2 = 12, and so on so
forth.

z
REPRESENTATION OF SPARSE MATRIX :
The non-zero elements in the sparse matrix can be stored using triplets that are rows,
columns, and values. There are two ways to represent the sparse matrix that are listed
as follows -
 Array representation
 Linked list representation
Array representation of the sparse matrix –
Representing a sparse matrix by a 2D array leads to the wastage of lots of memory. This
is because zeroes in the matrix are of no use, so storing zeroes with non-zero elements
is wastage of memory. To avoid such wastage, we can store only non-zero elements. If
we store only non-zero elements, it reduces the traversal time and the storage space.
In 2D array representation of sparse matrix, there are three fields used that are named
as –

1) Row - It is the index of a row where a non-zero element is located in the matrix.
2) Column - It is the index of the column where a non-zero element is located in
the matrix.
3) Value - It is the value of the non-zero element that is located at the index (row,
column).

Example -

Let's understand the array representation of sparse matrix with the help of the
example given below -
Consider the sparse matrix –

Tabular Representation

z
Linked List representation of the sparse matrix –
In a linked list representation, the linked list data structure is used to represent the
sparse matrix. The advantage of using a linked list to represent the sparse matrix is
that the complexity of inserting or deleting a node in a linked list is lesser than the
array.
Unlike the array representation, a node in the linked list representation consists of
four fields. The four fields of the linked list are given as follows –
1) Row - It represents the index of the row where the non-zero element is located.
2) Column - It represents the index of the column where the non-zero element is
located.
3) Value - It is the value of the non-zero element that is located at the index (row,
column).
4) Next node - It stores the address of the next node.
The node structure of the linked list representation of the sparse matrix is shown in
the below image –

Consider the sparse matrix -

In the above figure, we can observe a 4x4 sparse matrix containing 5 non-zero
elements and 11 zero elements. Above matrix occupies 4x4 = 16 memory space.

Increasing the size of matrix will increase the wastage space.

The linked list representation of the above matrix is given below –

z
Problem
Check whether a matrix is a sparse matrix or not.
Solution
• Let us assume ZERO in the matrix is greater than (row * column)/2.
• Then, the matrix is a sparse matrix otherwise not.

z
LINKED LIST
A linked list or a one-way list, is a linear collection of data elements, called nodes, where
the linear order is given by means of pointers. That is each node is divided in to two
parts:

• the first part contains the information of the element,


• and the second part, called the link field or next pointer field contains the address
of the next node in the list.

Why Linked List?


Arrays can be used to store linear data of similar types, but arrays have the following
limitations.

1) The size of the arrays is fixed: So we must know the upper limit on the number of
elements in advance. Also, generally, the allocated memory is equal to the upper limit
irrespective of the usage.

2) Inserting a new element in an array of elements is expensive because the room has
to be created for the new elements and to create room existing elements have to be
shifted but in Linked list if we have the head node then we can traverse to any node
through it and insert new node at the required position.
For example, in a system, if we maintain a sorted list of IDs in an array id[].
id[] = [1000, 1010, 1050, 2000, 2040].
And if we want to insert a new ID 1005, then to maintain the sorted order, we have to
move all the elements after 1000 (excluding 1000).

3) Deletion is also expensive with arrays until unless some special techniques are
used. For example, to delete 1010 in id[], everything after 1010 has to be moved due
to this so much work is being done which affects the efficiency of the code.

z
Properties
• A Linked List is identified through the address of the first node
• To access a node, we need to reach to that node

Advantages over arrays :


1) Dynamic size
2) Ease of insertion/deletion

Drawbacks :
1) Random access is not allowed. We have to access elements sequentially
starting from the first node (head node). So we cannot do binary search with
linked lists efficiently with its default implementation.
2) Extra memory space for a pointer is required with each element of the list.
3) Not cache friendly. Since array elements are contiguous locations, there is
locality of reference which is not there in case of linked lists.

z
Representation of Linked List in Memory
A linked list is represented by a pointer to the first node of the linked list.

The first node is called the head.


If the linked list is empty, then the value of the head points to NULL.

Each node in a list consists of at least two parts:


1) data (we can store integer, strings or any type of data).
2) Pointer (Or Reference) to the next node (connects one node to another).
In C, we can represent a node using structures.

z
Memory Allocation-
Whenever a new node is created, memory is allocated by the system. This memory is
taken from list of those memory locations which are free i.e. not allocated. This list is
called AVAIL List. Similarly, whenever a node is deleted, the deleted space becomes
reusable and is added to the list of unused space i.e. to AVAIL List. This unused space
can be used in future for memory allocation.
Memory allocation is of two types-
1. Static Memory Allocation
2. Dynamic Memory Allocation

Static Memory Allocation:


When memory is allocated during compilation time, it is called ‘Static Memory
Allocation’. This memory is fixed and cannot be increased or decreased after
allocation. If more memory is allocated than requirement, then memory is wasted. If
less memory is allocated than requirement, then program will not run successfully. So
exact memory requirements must be known in advance.

Dynamic Memory Allocation:


When memory is allocated during run/execution time, it is called ‘Dynamic Memory
Allocation’. This memory is not fixed and is allocated according to our requirements.
Thus, in it there is no wastage of memory. So, there is no need to know exact memory
requirements in advance.
Garbage Collection-
• Whenever a node is deleted, some memory space becomes reusable. This
memory space should be available for future use.
• One way to do this is to immediately insert the free space into availability list.
But this method may be time consuming for the operating system. So another
method is used which is called ‘Garbage Collection’.
• In this method the OS collects the deleted space time to time onto the
availability list.
• This process happens in two steps. In first step, the OS goes through all the
lists and tags all those cells which are currently being used. In the second step,
the OS goes through all the lists again and collects untagged space and adds
this collected space to availability list.
• The garbage collection may occur when small amount of free space is left in
the system or no free space is left in the system or when CPU is idle and has
time to do the garbage collection.

z
Overflow & Underflow -
• Overflow happens at the time of insertion. If we have to insert new space into
the data structure, but there is no free space i.e. availability list is empty, then
this situation is called ‘Overflow’. The programmer can handle this situation by
printing the message of OVERFLOW.
• Underflow happens at the time of deletion. If we have to delete data from the
data structure, but there is no data in the data structure i.e. data structure is
empty, then this situation is called ‘Underflow’. The programmer can handle this
situation by printing the message of UNDERFLOW.

Notations used in Linked List –


For a node having address P –

The fields are accessed as:


P→ Info
P→Next

• GetNode( ) is used for allocation of memory for new node


• START is used for keeping the address of First node. In case the Linked List is
empty, START keeps a NULL.
• Item is used to refer to the element to be inserted.
• When the nodes are deleted, their memory is freed by calling the function
FreeNode( )
• For the new node, P is used to keep the address of the newly created node.

z
z
Types of Linked List
Following are the various types of linked list.

• Simple Linked List − Item navigation is forward only.


• Doubly Linked List − Items can be navigated forward and backward.
• Circular Linked List − Last item contains link of the first element as next and

the first element has a link to the last element as previous.

Linear/Simple Linked List: The diagram below shows linear Linked List where each
node contains information and the address of the next node. The address field of
last node contains no address (NULL).

Doubly Linked List: Each node in this type of the Linked List contains at 2 address
fields (One for the address of previous node and other one for the address of the
next node).

Circular Linked List: Circular Linked List is more like Linear Linked List except that the
address field of the last node contains the address of the first node. START keeps the
address of the last node in the circular Linked List.

z
z
z
z
z
z
z
z
z
z
z
z
z

You might also like