Data Science

Data:
Data is a collection of information gathered through observation, measurement, research or analysis.
It may consist of facts, numbers, names, figures or even descriptions of things.
Data is often organized in the form of graphs, charts or tables.
Data scientists mine this data and use it to analyse the world around us.
Information:
Information is data that has been processed, organized, or structured in a way that makes it
meaningful, valuable and useful.
It is data that has been given context, relevance and purpose.
It provides knowledge, understanding and insights that can be used for decision-making,
problem-solving, communication and various other purposes.
Why is data important?

 Data helps us make better decisions.
 Data helps solve problems by finding the reasons for underperformance.
 Data helps one evaluate performance.
 Data helps one improve processes.
 Data helps one understand consumers and the market.
Categories of Data
Data can be categorized into two main types:
 Structured Data: This type of data is organized into a specific format, making it easy to
search, analyse and process. Structured data is typically found in relational databases and includes
information like numbers, dates and categories.
 Unstructured Data: Unstructured data does not conform to a specific structure or format. It
may include text documents, images, videos, and other data that is not easily
organized or analysed without additional processing.
Types of Data
Generally, data can be classified as categorical or numerical (items 1 and 2) and measured on one
of four scales (items 3 to 6):
1. Categorical Data: Categorical data takes its values from a defined set of categories, for
example:
 Marital Status
 Political Party
 Eye colour
2. Numerical Data: Numerical data can further be classified into two categories:
 Discrete Data: Discrete data contains the data which have discrete numerical values
for example Number of Children, Defects per Hour etc.
 Continuous Data: Continuous data contains the data which have continuous
numerical values for example Weight, Voltage etc.
3. Nominal scale: A nominal scale classifies data into distinct categories in which no
ranking is implied. For example: Gender, Marital Status.
4. Ordinal scale: An ordinal scale classifies data into distinct categories in which ranking is
implied. For example:
 Faculty rank: Professor, Associate Professor, Assistant Professor
 Student grades: A, B, C, D, E, F
5. Interval scale: An interval scale is an ordered scale in which the difference
between measurements is a meaningful quantity but the measurements do not have a true
zero point. For example:
 Temperature in Fahrenheit and Celsius
 Years
6. Ratio scale: A ratio scale is an ordered scale in which the difference between
measurements is a meaningful quantity and the measurements have a true zero
point. Hence ratios are meaningful and we can perform all arithmetic operations on ratio
scale data. For example: Weight, Age, Salary etc.
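The difference between an interval scale and a ratio scale can be checked with a little arithmetic. The sketch below is only an illustration; the helper names celsius_to_fahrenheit and kg_to_lb and the sample values are assumptions, not part of the notes. It shows that a ratio of two temperatures changes when the unit changes, while a ratio of two weights does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert a weight from kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462


# Interval scale: 20 C looks like "twice" 10 C, but the ratio depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio scale: 20 kg is twice 10 kg in any unit, because zero means "no weight".
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(ratio_celsius, ratio_fahrenheit)
print(ratio_kg, ratio_lb)
```

Because 0 degrees Celsius is not "no temperature", only differences between temperatures carry meaning, which is exactly what the interval scale definition says.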
Data Science:
Data science is the study of data to extract meaningful insights for business.
It is a multidisciplinary approach that combines principles and practices from the fields of
mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts
of data.
This analysis helps data scientists to ask and answer questions like what happened, why it
happened, what will happen, and what can be done with the result.
Why is data science important?

1. Data science is important because it combines tools, methods, and technology to
generate meaning from data.
2. Modern organizations are inundated with data; there is a proliferation of devices that can
automatically collect and store information.
3. Online systems and payment portals capture more data in the fields of e-commerce,
medicine, finance, and every other aspect of human life.
4. We have text, audio, video, and image data available in vast quantities.
Future of data science

Artificial intelligence and machine learning innovations have made data processing faster and
more efficient. Industry demand has created an ecosystem of courses, degrees, and job
positions within the field of data science. Because of the cross-functional skillset and expertise
required, data science shows strong projected growth over the coming decades.
What is data science used for?
Data science is used to study data in four main ways:
1. Descriptive analysis
Descriptive analysis examines data to gain insights into what happened or what is happening in
the data environment. It is characterized by data visualizations such as pie charts, bar charts,
line graphs, tables, or generated narratives. For example, a flight booking service may record
data like the number of tickets booked each day. Descriptive analysis will reveal booking spikes,
booking slumps, and high-performing months for this service.
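As a rough sketch of the flight booking example, the code below totals tickets per month and reports the strongest and weakest months. The booking figures are invented for illustration, and the names daily_bookings and monthly_totals are assumptions.

```python
from collections import defaultdict

# Hypothetical daily booking counts: (ISO date, tickets booked that day)
daily_bookings = [
    ("2023-01-05", 120),
    ("2023-01-19", 95),
    ("2023-02-02", 80),
    ("2023-02-16", 70),
    ("2023-03-09", 210),
    ("2023-03-23", 190),
]

# Sum the tickets per month; the month is the "YYYY-MM" prefix of the date.
monthly_totals = defaultdict(int)
for date, tickets in daily_bookings:
    monthly_totals[date[:7]] += tickets

best_month = max(monthly_totals, key=monthly_totals.get)
worst_month = min(monthly_totals, key=monthly_totals.get)

for month in sorted(monthly_totals):
    print(month, monthly_totals[month])
print("Highest:", best_month, "Lowest:", worst_month)
```

In practice the same summary would usually be drawn as a bar chart or line graph rather than printed.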
2. Diagnostic analysis
Diagnostic analysis is a deep-dive or detailed data examination to understand why something
happened. It is characterized by techniques such as drill-down, data discovery, data mining, and
correlations. Multiple data operations and transformations may be performed on a given data set
to discover unique patterns in each of these techniques. For example, the flight service might drill
down on a particularly high-performing month to better understand the booking spike. This may
lead to the discovery that many customers visit a particular city to attend a monthly sporting
event.
3. Predictive analysis
Predictive analysis uses historical data to make accurate forecasts about data patterns that may
occur in the future. It is characterized by techniques such as machine learning, forecasting,
pattern matching, and predictive modeling. In each of these techniques, computers are trained to
reverse engineer causality connections in the data. For example, the flight service team might
use data science to predict flight booking patterns for the coming year at the start of each year.
The computer program or algorithm may look at past data and predict booking spikes for certain
destinations in May. Having anticipated its customers’ future travel requirements, the company
could start targeted advertising for those cities from February.
4. Prescriptive analysis
Prescriptive analysis takes predictive data to the next level. It not only predicts what is likely to
happen but also suggests an optimum response to that outcome. It can analyze the potential
implications of different choices and recommend the best course of action. It uses graph
analysis, simulation, complex event processing, neural networks, and recommendation engines
from machine learning.
What are the data science techniques?

Data science professionals use computing systems to follow the data science process. The top
techniques used by data scientists are:
Classification
Classification is the sorting of data into specific groups or categories. Computers are trained to
identify and sort data. Known data sets are used to build decision algorithms in a computer that
quickly processes and categorizes the data. For example:
 Sort products as popular or not popular.
 Sort insurance applications as high risk or low risk.
 Sort social media comments into positive, negative, or neutral.
Regression
Regression is the method of finding a relationship between two seemingly unrelated data points.
The connection is usually modelled around a mathematical formula and represented as a graph
or curves. When the value of one data point is known, regression is used to predict the other
data point. For example:
 The rate of spread of air-borne diseases.
 The relationship between customer satisfaction and the number of employees.
 The relationship between the number of fire stations and the number of injuries due to
fire in a particular location.
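A minimal sketch of the idea, assuming the simplest case of a straight-line relationship fitted by ordinary least squares. The helper fit_line and the employee and satisfaction figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of employees vs. customer satisfaction score
employees = [2, 4, 6, 8, 10]
satisfaction = [55, 60, 65, 70, 75]

slope, intercept = fit_line(employees, satisfaction)
predicted = slope * 12 + intercept  # predicted satisfaction for 12 employees
print(slope, intercept, predicted)
```

Once the line is fitted, knowing one value (the number of employees) gives a prediction for the other (the satisfaction score).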
Clustering
Clustering is the method of grouping closely related data together to look for patterns and
anomalies. Clustering is different from sorting because the data cannot be accurately classified
into fixed categories. Hence the data is grouped into most likely relationships. New patterns and
relationships can be discovered with clustering. For example:
 Group customers with similar purchase behaviour for improved customer service.
 Group network traffic to identify daily usage patterns and identify a network attack
faster.
 Cluster articles into multiple different news categories and use this information to find
fake news content.
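One common clustering method is k-means, which the notes do not name; it is used here only as an example. The sketch below is a deliberately small one-dimensional version, and the function kmeans_1d, the starting centers and the spend figures are all assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: returns the final centers and the clusters."""
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer
spend = [12, 15, 14, 90, 95, 100, 11, 92]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers)
print(clusters)
```

No labels are supplied here: the two groups of customers emerge from the data itself, which is what separates clustering from classification.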
The basic principle behind data science techniques
While the details vary, the underlying principles behind these techniques are:
 Teach a machine how to sort data based on a known data set. For example, sample
keywords are given to the computer with their sort value. “Happy” is positive, while
“Hate” is negative.
 Give unknown data to the machine and allow the device to sort the dataset
independently.
 Allow for result inaccuracies and handle the probability factor of the result.
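The three principles above can be sketched with a toy keyword classifier. This is only an illustration under simple assumptions: "happy" and "hate" come from the text, the other keywords are made up, and the function classify and its crude confidence value are hypothetical rather than a standard method.

```python
# Step 1: a known (labelled) data set of keywords and their sort values.
training = {"happy": 1, "love": 1, "great": 1, "hate": -1, "awful": -1}


def classify(comment):
    """Return (label, confidence) for a comment based on known keywords."""
    words = comment.lower().split()
    scores = [training[w] for w in words if w in training]
    if not scores:
        return "neutral", 0.0  # no known keyword: report zero confidence
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    # Step 3: a crude confidence, the net score divided by the number of known keywords.
    confidence = abs(total) / len(scores)
    return label, confidence


# Step 2: give unseen data to the function and let it sort the comments.
print(classify("I love this great product"))
print(classify("I hate the awful delay but love the price"))
print(classify("The parcel arrived on Tuesday"))
```

Real classifiers learn their weights from much larger data sets, but the flow is the same: learn from labelled data, apply to new data, and report how certain the result is.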
What are different data science technologies?

Data science practitioners work with complex technologies such as:
1. Artificial intelligence: Machine learning models and related software are used for
predictive and prescriptive analysis.
2. Cloud computing: Cloud technologies have given data scientists the flexibility and
processing power required for advanced data analytics.
3. Internet of things: IoT refers to various devices that can automatically connect to the
internet. These devices collect data for data science initiatives. They generate massive
data which can be used for data mining and data extraction.
4. Quantum computing: Quantum computers can perform complex calculations at high
speed. Skilled data scientists use them for building complex quantitative algorithms.
What is the difference between data science and data analytics?
While the terms may be used interchangeably, data analytics is a subset of data science. Data
science is an umbrella term for all aspects of data processing—from the collection to modelling to
insights. On the other hand, data analytics is mainly concerned with statistics, mathematics, and
statistical analysis. It focuses on only data analysis, while data science is related to the bigger
picture around organizational data. In most workplaces, data scientists and data analysts work
together towards common business goals. A data analyst may spend more time on routine
analysis, providing regular reports. A data scientist may design the way data is stored,
manipulated, and analyzed. Simply put, a data analyst makes sense out of existing data, whereas
a data scientist creates new methods and tools to process data for use by analysts.
What is the difference between data science and machine learning?
Machine learning is the science of training machines to analyze and learn from data the way
humans do. It is one of the methods used in data science projects to gain automated insights
from data. Machine learning engineers specialize in computing, algorithms, and coding skills
specific to machine learning methods. Data scientists might use machine learning methods as a
tool or work closely with other machine learning engineers to process data.
For example, the tuple (0, 1) indicates that the data scientist with id 0 (Hero) and the data scientist
with id 1 (Dunn) are friends. The network is illustrated in Figure 1-1.
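A minimal sketch of how such friendship tuples might be stored and used. Only Hero (id 0) and Dunn (id 1) come from the text; the third user "Sue", the second pair (0, 2) and the variable names are assumptions added for illustration.

```python
# Users as dicts with an id and a name. Hero and Dunn come from the text;
# "Sue" is a hypothetical third user.
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
]

# Each tuple (i, j) means that users i and j are friends.
friendship_pairs = [(0, 1), (0, 2)]

# Build a lookup from each user id to the ids of that user's friends.
friendships = {user["id"]: [] for user in users}
for i, j in friendship_pairs:
    friendships[i].append(j)  # j is a friend of i
    friendships[j].append(i)  # i is a friend of j

print(friendships)
```

Storing the pairs in a dictionary keyed by id makes it quick to look up all the friends of one user.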
Zen Of Python:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.

Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Python:
Python is an easy to learn, powerful programming language.
It has efficient high-level data structures and a simple but effective approach to object-oriented
programming.
Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas on most platforms.
Python is a programming language widely used by Data Scientists.
Python has in-built mathematical libraries and functions, making it easier to calculate mathematical
problems and to perform data analysis.
Python Libraries
Python has libraries with large collections of mathematical functions and analytical tools.
In this course, we will use the following libraries:
 Pandas - This library is used for structured data operations, such as importing CSV files, creating
dataframes, and preparing data.
 NumPy - This is a mathematical library. It provides a powerful N-dimensional array object, linear
algebra, Fourier transforms, etc.
 Matplotlib - This library is used for visualization of data.
 SciPy - This library provides scientific computing routines, including linear algebra modules.

What is a Virtual Environment?
A virtual environment is a tool that helps to keep dependencies
required by different projects separate by creating
isolated Python virtual environments for them. This is one of the
most important tools that most Python developers use.
Why do we need a virtual environment?
Imagine a scenario where you are working on two web-based
Python projects: one of them uses Django 4.0 and the other uses
Django 4.1. In such a situation, creating a separate virtual
environment for each project lets you maintain the
dependencies of both projects.
When and where to use a virtual environment?
By default, every project on your system uses the same
directories to store and retrieve site-packages (third-party
libraries).
How does this matter? In the above example of two
projects, you have two versions of Django. This is a real problem
for Python since it can’t differentiate between versions in the
“site-packages” directory. Both versions would have to reside
in the same directory with the same name.
This is where virtual environments come into play. To solve this
problem, we just need to create two separate virtual
environments for both projects.
The great thing about this is that there are no limits to the
number of environments you can have since they’re just
directories containing a few scripts.
A virtual environment should be used whenever you work on any
Python-based project. It is generally good to have one new
virtual environment for every Python-based project you work on,
so that the dependencies of every project are isolated from the
system and each other.
Create Virtual Environment in Python
We use a tool named virtualenv to create
virtual environments in Python, isolated from the system
Python environment.
virtualenv creates a folder that contains all the necessary
executables to use the packages that a Python project would
need.
Installing virtualenv
$ pip install virtualenv
Test your installation:
$ virtualenv --version
Create a new Virtual Environment
You can create a virtualenv using the following command:
$ virtualenv my_env
After running this command, a directory named my_env will be
created. This is the directory that contains all the necessary
executables to use the packages that a Python project would
need.
This is where Python packages will be installed. If you want to
specify the Python interpreter of your choice, for example,
Python 3, it can be done using the following command:
$ virtualenv -p /usr/bin/python3 virtualenv_name
Activating a Virtual Environment in Python
Now after creating a virtual environment, you need to activate it.
Remember to activate the relevant virtual environment every
time you work on the project. The command depends on your
operating system.
Activate a Virtual Environment on Windows
To activate a virtual environment from the Windows command
prompt, change directory to your virtual environment and then
run the activate script:
$ cd <envname>
$ Scripts\activate
Note: source is a shell command available on Linux and other
POSIX systems; it is not available in the Windows command prompt.
Activate a virtual environment on Linux
$ source virtualenv_name/bin/activate
Once the virtual environment is activated, the name of your
virtual environment will appear on the left side of the terminal.
This will let you know that the virtual environment is currently
active.
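Besides looking at the prompt, the state can be checked from inside Python. The sketch below assumes Python 3.3 or newer and an environment created by a recent virtualenv or by the standard venv module; the function name in_virtual_environment is an assumption.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter prefix:", sys.prefix)
print("Virtual environment active:", in_virtual_environment())
```

Running this once before and once after activation should show the prefix changing to the environment's folder.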
Installing Dependencies in a Python Virtual Environment
With the virtual environment active (its name is shown in the
prompt), you can install the dependencies related to the project
in this virtual environment.
For example, if you are using Django 1.9 for a project, you can
install it like you install other packages.
(virtualenv_name)$ pip install Django==1.9
The Django 1.9 package will be placed in virtualenv_name folder
and will be isolated from the complete system.
Deactivate Python Virtual Environment
Once you are done with the work, you can deactivate the virtual
environment by the following command:
(virtualenv_name)$ deactivate
Now you will be back to the system’s default Python installation.
Anaconda
Anaconda is an open-source distribution that bundles tools such as
Jupyter and Spyder, which are used for large-scale data processing, data
analytics and heavy scientific computing. Anaconda supports the R and
Python programming languages. Package versions are managed
by the package management system conda.
Installing Anaconda:
Head over to anaconda.com and install the latest version of
Anaconda. Make sure to download the Python 3 installer for
the appropriate architecture.
Let’s go through the steps of creating a virtual environment using conda interface:
Step 1: Check if conda is installed in your path.
 Open up the Anaconda command prompt.
 Type conda -V and press Enter.
 If conda is successfully installed on your system, the command prints the installed conda version.
conda -V
Step 2: Update the conda environment
 Enter the following in the anaconda prompt.
conda update conda
Step 3: Set up the virtual environment
 Type conda search "^python$" to see the list of available Python versions.
 Now replace envname with the name you want to give to your virtual environment and
replace x.x with the Python version you want to use.
conda create -n envname python=x.x anaconda
Let’s create a virtual environment named Geeks for Python 3.6:
conda create -n Geeks python=3.6 anaconda
Step 4: Activating the virtual environment
 To see the list of all the available environments, use the command conda info -e
 To activate the virtual environment, enter the given command and replace envname with
your environment name.
conda activate envname
When a conda environment is activated, it modifies PATH and other shell variables so that they point
to the isolated Python setup you created.
Step 5: Installation of required packages to the virtual environment
 Type the following command to install additional packages into the environment, replacing
envname with the name of your environment and package with the name of the package.
conda install -n envname package
Step 6: Deactivating the virtual environment
 To leave the current environment, type the following command. The settings of the
environment will remain as they are.
conda deactivate
Step 7: Deletion of virtual environment
 If you no longer require a virtual environment, delete it using the following command,
replacing envname with your environment name.
conda remove -n envname --all

Python:
Python is a popular programming language. It was created by Guido van Rossum, and released in
1991.
It is used for:
 web development (server-side)
 software development
 mathematics
 system scripting
 data science
 machine learning
 artificial intelligence
The Python interpreter and the extensive standard library are freely available in source or binary
form for all major platforms from the Python web site, https://www.python.org/, and may be freely
distributed.
The same site also contains distributions of and pointers to many free third-party Python modules,
programs and tools, and additional documentation.
The Python interpreter is easily extended with new functions and data types implemented in C or
C++.
Python is also suitable as an extension language for customizable applications.
What can Python do?

 Python can be used on a server to create web applications.
 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready software development
Why Python?
 Python works on different platforms (Windows, Mac, Linux, Raspberry
Pi, etc).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer
lines than some other programming languages.
 Python runs on an interpreter system, meaning that code can be
executed as soon as it is written. This means that prototyping can be
very quick.
 Python can be treated in a procedural way, an object-oriented way or a
functional way.
Python Indentation
Indentation refers to the spaces at the beginning of a code line.
Where in other programming languages the indentation in code is for
readability only, the indentation in Python is very important.
Python uses indentation to indicate a block of code.
if 5 > 2:
    print("Five is greater than two!")
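To see that indentation is part of the syntax, the sketch below asks Python to compile the same two lines with and without indentation. The helper name compiles and the two source strings are assumptions made for this illustration.

```python
good_source = 'if 5 > 2:\n    print("Five is greater than two!")\n'
bad_source = 'if 5 > 2:\nprint("Five is greater than two!")\n'


def compiles(source):
    """Return True if the source compiles, False if its indentation is invalid."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(good_source))
print(compiles(bad_source))
```

The second string is rejected before any code runs, because the line after the colon must be indented.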
Python Variables
In Python, a variable is created when you assign a value to it:
x = 5
y = "Hello, World!"
Comments
Python supports comments for the purpose of in-code documentation.
Comments start with a #, and Python ignores the rest of the line:
#This is a comment.
print("Hello, World!")
Functions:
A function is a rule for taking zero or more inputs and returning a corresponding output.
In Python, we typically define functions using def:
def double(x):
    """This is where you put an optional docstring that explains what the function
    does. For example, this function multiplies its input by 2."""
    return x * 2
Python functions are first-class, which means that we can assign them to variables and pass them
into functions just like any other arguments:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)
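A short usage sketch of the two functions above, repeated here so that it runs on its own. The variable names my_double, x and y are assumptions.

```python
def double(x):
    """Multiply the input by 2."""
    return x * 2


def apply_to_one(f):
    """Call the function f with 1 as its argument."""
    return f(1)


my_double = double                 # the function object is assigned to a variable
x = apply_to_one(my_double)        # calls double(1)
y = apply_to_one(lambda n: n + 4)  # an anonymous function can be passed as well
print(x, y)
```

Here x is the result of double(1) and y is the result of adding 4 to 1.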
A Python function is a block of statements that performs a
specific task. The idea is to put some commonly or repeatedly
done tasks together and make a function so that, instead of
writing the same code again and again for different inputs, we
can call the function to reuse the code contained in it over and
over again.
Some Benefits of Using Functions
 Increase Code Readability
 Increase Code Reusability
Python Function Declaration
The syntax to declare a function is:
def function_name(parameters):
    # body of the function
    return expression
Types of Functions in Python

Below are the different types of functions in Python:
 Built-in library function: These are Standard
functions in Python that are available to use.
 User-defined function: We can create our own
functions based on our requirements.
Creating a Function in Python
We can define a function in Python using the def keyword and
give it whatever functionality we require. The following example
shows how to write a function in Python.
Python3
# A simple Python function
def fun():
    print("Welcome to GFG")
Calling a Function in Python

After creating a function in Python, we can call it by using the
name of the function followed by parentheses containing the
arguments of that particular function. Below is an example of
calling a function.
Python3
# A simple Python function
def fun():
    print("Welcome to GFG")

# Driver code to call a function
fun()
Output:
Welcome to GFG
Python Function with Parameters
If you have experience in C/C++ or Java, then you may be
wondering about the return type of the function and the data types of
its arguments. Python lets you declare these as optional type hints
(specifically in Python 3.5 and above); the interpreter does not
enforce them at run time.
Python Function Syntax with Parameters
def function_name(parameter: data_type) -> return_type:
    """Docstring"""
    # body of the function
    return expression
The following example uses arguments and parameters, which are
covered later in these notes, so you can come back to it if
anything is unclear.
Python3
def add(num1: int, num2: int) -> int:
    """Add two numbers"""
    num3 = num1 + num2
    return num3

# Driver code
num1, num2 = 5, 15
ans = add(num1, num2)
print(f"The addition of {num1} and {num2} results {ans}.")
Output:
The addition of 5 and 15 results 20.
Note: The following examples are defined using syntax 1 (without type hints); try to
convert them to syntax 2 (with type hints) for practice.
Python3
# some more functions
def is_prime(n):
    if n in [2, 3]:
        return True
    # numbers below 2 and even numbers are not prime
    if (n < 2) or (n % 2 == 0):
        return False
    r = 3
    while r * r <= n:
        if n % r == 0:
            return False
        r += 2
    return True

print(is_prime(78), is_prime(79))
Output:
False True
Python Function Arguments
Arguments are the values passed inside the parentheses of the
function. A function can have any number of arguments
separated by commas.
In this example, we will create a simple function in Python to
check whether the number passed as an argument to the
function is even or odd.
Python3
# A simple Python function to check
# whether x is even or odd
def evenOdd(x):
    if (x % 2 == 0):
        print("even")
    else:
        print("odd")

# Driver code to call the function
evenOdd(2)
evenOdd(3)
Output:
even
odd
Types of Python Function Arguments
Python supports various types of arguments that can be passed
at the time of the function call. The function argument types
in Python are:
 Default arguments
 Keyword arguments (named arguments)
 Positional arguments
 Arbitrary arguments (variable-length arguments *args
and **kwargs)
Let’s discuss each type in detail.
Default Arguments
A default argument is a parameter that assumes a default value
if a value is not provided in the function call for that argument.
The following example illustrates Default arguments to write
functions in Python.
Python3
# Python program to demonstrate
# default arguments
def myFun(x, y=50):
    print("x:", x)
    print("y:", y)

# Driver code (we call myFun() with only
# one argument)
myFun(10)
Output:
x: 10
y: 50
Like C++ default arguments, any number of arguments in a
function can have a default value. But once we have a default
argument, all the arguments to its right must also have default
values.
Keyword Arguments
The idea is to allow the caller to specify the argument name with
values so that the caller does not need to remember the order of
parameters.
Python3
# Python program to demonstrate Keyword Arguments
def student(firstname, lastname):
    print(firstname, lastname)

# Keyword arguments
student(firstname='Geeks', lastname='Practice')
student(lastname='Practice', firstname='Geeks')
Output:
Geeks Practice
Geeks Practice
Positional Arguments
With positional arguments, values are assigned to parameters by
their position in the function call: the first argument is assigned
to name and the second to age. If you change the positions,
or forget the order, the values can end up
in the wrong places, as shown in the Case-2 example below,
where 27 is assigned to name and Suraj is assigned to
age.
Python3
def nameAge(name, age):
    print("Hi, I am", name)
    print("My age is", age)

# You will get the correct output because
# the arguments are given in order
print("Case-1:")
nameAge("Suraj", 27)

# You will get incorrect output because
# the arguments are not in order
print("\nCase-2:")
nameAge(27, "Suraj")
Output:
Case-1:
Hi, I am Suraj
My age is 27
Case-2:
Hi, I am 27
My age is Suraj
Arbitrary Arguments
In Python, *args and **kwargs let a function accept a variable
number of arguments using special
symbols. There are two special symbols:
 *args in Python (non-keyword arguments)
 **kwargs in Python (keyword arguments)
Example 1: Variable-length non-keyword arguments
Python3
# Python program to illustrate
# *args for variable number of arguments
def myFun(*argv):
    for arg in argv:
        print(arg)

myFun('Hello', 'Welcome', 'to', 'GeeksforGeeks')
Output:
Hello
Welcome
to
GeeksforGeeks
Example 2: Variable-length keyword arguments
Python3
# Python program to illustrate
# **kwargs for variable number of keyword arguments
def myFun(**kwargs):
    for key, value in kwargs.items():
        print("%s == %s" % (key, value))

# Driver code
myFun(first='Geeks', mid='for', last='Geeks')
Output:
first == Geeks
mid == for
last == Geeks
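The two symbols can also be combined in one function, and the same symbols unpack a sequence or dictionary at the call site. The sketch below is an illustration only; the function summarize and its arguments are hypothetical.

```python
def summarize(title, *args, **kwargs):
    """Return a one-line summary of everything the function received."""
    # args is a tuple of the extra positional arguments;
    # kwargs is a dict of the extra keyword arguments.
    parts = [title]
    parts.extend(str(a) for a in args)
    parts.extend(f"{key}={value}" for key, value in kwargs.items())
    return " | ".join(parts)


print(summarize("report", 1, 2, year=2024, draft=True))

# The same symbols unpack a tuple and a dict when calling the function.
numbers = (1, 2)
options = {"year": 2024, "draft": True}
print(summarize("report", *numbers, **options))
```

Both calls pass the same values, so both print the same summary line.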
Docstring
The first string after the function header is called the documentation
string, or docstring for short. It is used to describe the functionality of
the function. The use of a docstring in functions is optional but it is
considered good practice.
The below syntax can be used to print out the docstring of a
function.
Syntax: print(function_name.__doc__)
Example: Adding Docstring to the function
Python3
# A simple Python function to check
# whether x is even or odd
def evenOdd(x):
    """Function to check if the number is even or odd"""
    if (x % 2 == 0):
        print("even")
    else:
        print("odd")

# Driver code to print the docstring
print(evenOdd.__doc__)
Output:
Function to check if the number is even or odd
Python Function within Functions
A function that is defined inside another function is known as
the inner function or nested function. Nested functions can
access variables of the enclosing scope. Inner functions are used
so that they can be protected from everything happening outside
the function.
Python3
# Python program to
# demonstrate accessing of
# variables of nested functions
def f1():
    s = 'I love GeeksforGeeks'

    def f2():
        print(s)

    f2()

# Driver's code
f1()
Output:
I love GeeksforGeeks
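A common use of nested functions is to return the inner function so that it keeps access to the enclosing variables. The sketch below is one such illustration; the names make_multiplier, multiply, double and triple are assumptions.

```python
def make_multiplier(factor):
    """Return a function that multiplies its input by factor."""
    def multiply(x):
        # multiply reads factor from the enclosing scope,
        # even after make_multiplier has returned.
        return x * factor
    return multiply


double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5), triple(5))
```

Each returned function remembers its own factor, and nothing outside can reach multiply except through the returned object.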
Anonymous Functions in Python
In Python, an anonymous function means that a function is
without a name. As we already know the def keyword is used to
define the normal functions and the lambda keyword is used to
create anonymous functions.
Python3
# Python code to illustrate the cube of a number
# using lambda function
def cube(x): return x*x*x
cube_v2 = lambda x : x*x*x
print(cube(7))
print(cube_v2(7))
Output:
343
343
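Lambda functions are most often passed directly to another function rather than assigned to a name. The sketch below shows this with sorted, filter and map; the list of people is invented for illustration (only the name Suraj appears elsewhere in these notes).

```python
# Hypothetical (name, age) records
people = [("Suraj", 27), ("Asha", 31), ("Ravi", 22)]

# Sort by age: the lambda picks the second element of each tuple as the key.
by_age = sorted(people, key=lambda person: person[1])

# Keep only the names of people older than 25.
older = list(map(lambda person: person[0],
                 filter(lambda person: person[1] > 25, people)))

print(by_age)
print(older)
```

For anything longer than a single expression, a normal def function is usually easier to read.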
Recursive Functions in Python
Recursion in Python refers to when a function calls itself. There
are many instances when you have to build a recursive function
to solve Mathematical and Recursive Problems.
Using a recursive function should be done with caution, as a
recursive function can become like a non-terminating loop. It is
better to check your exit statement while creating a recursive
function.
Python3
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial(4))
Output
24
Here we have created a recursive function to calculate the
factorial of a number. The exit condition (base case) for this
function is reached when n is equal to 0.
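The caution about non-terminating recursion applies to this function too: a negative input never reaches n == 0. One possible guard, shown as a sketch and not as the only way to handle it, is to reject such inputs before recursing.

```python
def factorial(n):
    """Recursive factorial that rejects inputs which would never reach n == 0."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:  # base case: stops the recursion
        return 1
    return n * factorial(n - 1)


print(factorial(4))

try:
    factorial(-1)
except ValueError as error:
    print("Rejected:", error)
```

Without the guard, factorial(-1) would keep calling itself until Python stops it with a RecursionError.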
Return Statement in Python Function
The return statement is used to exit from a function, go back
to the function caller, and hand the specified value or data
item to the caller. The syntax for the return statement
is:
return [expression_list]
The return statement can consist of a variable, an expression, or
a constant, which is returned at the end of the function
execution. If none of the above is present with the return
statement, a None object is returned.
Example: Python Function Return Statement
Python3
def square_value(num):
    """This function returns the square
    value of the entered number"""
    return num**2

print(square_value(2))
print(square_value(-4))
Output:
4
16
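Two further behaviours of return are worth seeing in code: several values can be returned at once as a tuple, and a function without a return statement returns None. The function names min_max and log_only below are hypothetical.

```python
def min_max(values):
    # The comma builds a tuple, so two values are returned together
    return min(values), max(values)


def log_only(message):
    # No return statement: the caller receives None
    print(message)


low, high = min_max([3, 9, 1, 7])  # tuple unpacking at the call site
result = log_only("no return statement here")

print(low, high)
print(result)
```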
Pass by Reference and Pass by Value
One important thing to note is that in Python every variable name is
a reference. When we pass a variable to a function, a
new reference to the same object is created. Parameter passing in
Python is the same as reference passing in Java.
Python3
# Here x is a new reference to the same list lst
def myFun(x):
    x[0] = 20

# Driver code (note that lst is modified
# after the function call)
lst = [10, 11, 12, 13, 14, 15]
myFun(lst)
print(lst)
Output:
[20, 11, 12, 13, 14, 15]
When we pass a reference and change the received reference to
something else, the connection between the passed and
received parameters is broken. For example, consider the below
program as follows:
Python3
def myFun(x):
    # After the line below, the link of x with the previous
    # object is broken. A new object is assigned to x.
    x = [20, 30, 40]

# Driver code (note that lst is not modified
# after the function call)
lst = [10, 11, 12, 13, 14, 15]
myFun(lst)
print(lst)
Output:
[10, 11, 12, 13, 14, 15]
Another example demonstrates that the reference link is broken
if we assign a new value (inside the function).
Python3
def myFun(x):
    # After the line below, the link of x with the previous
    # object is broken. A new object is assigned to x.
    x = 20

# Driver code (note that x is not modified
# after the function call)
x = 10
myFun(x)
print(x)
Output:
10
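The difference between modifying the received object and rebinding the parameter name can be put side by side. This is a small sketch with hypothetical function names; it copies the list when the caller's data must stay untouched.

```python
def append_to_copy(items):
    items = list(items)  # rebinding: items now names a new list
    items.append(99)     # only the copy is changed
    return items


def append_in_place(items):
    items.append(99)     # mutation: the caller's list is changed


original = [1, 2, 3]

copied = append_to_copy(original)
unchanged_after_copy = list(original)  # snapshot taken before any mutation

append_in_place(original)

print(copied)
print(unchanged_after_copy)
print(original)
```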
Exercise: Try to guess the output of the following code.
Python3
def swap(x, y):
    temp = x
    x = y
    y = temp

# Driver code
x = 2
y = 3
swap(x, y)
print(x)
print(y)
Output:
2
3
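The values are not swapped because the function only rebinds its own local names. One common way to make the swap visible to the caller, sketched here, is to return the values in the new order or to use tuple assignment directly.

```python
def swap(x, y):
    # Return the values in swapped order; the caller rebinds its own names
    return y, x


x, y = 2, 3
x, y = swap(x, y)
print(x, y)

# Tuple assignment swaps two names without a helper function
a, b = 2, 3
a, b = b, a
print(a, b)
```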
Python has a set of built-in functions.
Function Description
abs() Returns the absolute value of a number
all() Returns True if all items in an iterable object are true
any() Returns True if any item in an iterable object is true
ascii() Returns a readable version of an object. Replaces non-ASCII characters with escape characters
bin() Returns the binary version of a number

bool() Returns the boolean value of the specified object
bytearray() Returns an array of bytes
bytes() Returns a bytes object
callable() Returns True if the specified object is callable, otherwise False
chr() Returns a character from the specified Unicode code.
classmethod() Converts a method into a class method
compile() Returns the specified source as an object, ready to be executed
complex() Returns a complex number
delattr() Deletes the specified attribute (property or method) from the specified object
dict() Returns a dictionary
dir() Returns a list of the specified object's properties and methods

divmod() Returns the quotient and the remainder when argument1 is divided by argument2
enumerate() Takes a collection (e.g. a tuple) and returns it as an enumerate object
eval() Evaluates and executes an expression
exec() Executes the specified code (or object)
filter() Use a filter function to exclude items in an iterable object
float() Returns a floating point number
format() Formats a specified value
frozenset() Returns a frozenset object
getattr() Returns the value of the specified attribute (property or method)
globals() Returns the current global symbol table as a dictionary
hasattr() Returns True if the specified object has the specified attribute (property/method)
hash() Returns the hash value of a specified object

help() Executes the built-in help system
hex() Converts a number into a hexadecimal value
id() Returns the id of an object
input() Reads a line of user input
int() Returns an integer number
isinstance() Returns True if a specified object is an instance of a specified class
issubclass() Returns True if a specified class is a subclass of a specified class
iter() Returns an iterator object
len() Returns the length of an object
list() Returns a list
locals() Returns an updated dictionary of the current local symbol table
map() Returns the specified iterator with the specified function applied to each item
max() Returns the largest item in an iterable
memoryview() Returns a memory view object
min() Returns the smallest item in an iterable
next() Returns the next item from an iterator
object() Returns a new object
oct() Converts a number into an octal
open() Opens a file and returns a file object
ord() Returns the integer Unicode code of the specified character
pow() Returns the value of x to the power of y
print() Prints to the standard output device
property() Gets, sets, deletes a property
range() Returns a sequence of numbers, starting from 0 and increments by 1 (by default)
repr() Returns a readable version of an object
reversed() Returns a reversed iterator
round() Rounds a number
set() Returns a new set object
setattr() Sets an attribute (property/method) of an object
slice() Returns a slice object
sorted() Returns a sorted list
staticmethod() Converts a method into a static method
str() Returns a string object
sum() Sums the items of an iterable
super() Returns an object that represents the parent class
tuple() Returns a tuple

type() Returns the type of an object
vars() Returns the __dict__ property of an object
zip() Returns an iterator of tuples built from two or more iterables
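A few of the built-in functions listed above are shown together in the short sketch below. The names and scores are made-up sample data.

```python
names = ["Asha", "Ben", "Chen", "Dev"]  # hypothetical sample data
scores = [72, 88, 95, 61]

total = sum(scores)
highest = max(scores)

# zip() pairs each name with its score; sorted() orders the pairs by score
ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# enumerate() attaches a running number to each item
numbered = list(enumerate(names, start=1))

all_passed = all(s >= 50 for s in scores)
any_top = any(s >= 90 for s in scores)

# divmod() returns the quotient and the remainder in one call
quotient, remainder = divmod(total, len(scores))

print(total, highest, quotient, remainder)
print(ranked)
print(numbered)
print(all_passed, any_top)
```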

What is Data Visualization?
Data visualization translates complex data sets into visual formats that are easier for the human brain
to comprehend. This can include a variety of visual tools such as:
 Charts: Bar charts, line charts, pie charts, etc.
 Graphs: Scatter plots, histograms, etc.
 Maps: Geographic maps, heat maps, etc.
 Dashboards: Interactive platforms that combine multiple visualizations.
The primary goal of data visualization is to make data more accessible and easier to interpret,
allowing users to identify patterns, trends, and outliers quickly. This is particularly important in the
context of big data, where the sheer volume of information can be overwhelming without
effective visualization techniques.
Types of Data for Visualization

Accurate data visualization is critical in market research, where both numerical and categorical
data can be visualized. It increases the impact of insights and helps reduce the risk of analysis
paralysis. Data for visualization falls into the following
categories:
 Numerical Data
 Categorical Data
Let’s understand the visualization of data via a diagram with all its categories.
Why is Data Visualization Important?

Let’s take an example. Suppose you compile a company’s profits from 2013 to
2023 and plot them as a line chart. It would be very easy to see the line rising steadily with a drop in
just 2018. So you can observe in a second that the company has had continuous profits in all the
years except for a loss in 2018.
It would not be that easy to get this information so fast from a data table. This is just one
demonstration of the usefulness of data visualization. Let’s see some more reasons why visualization
of data is so important.
1. Data Visualization Discovers the Trends in Data
2. Data Visualization Provides a Perspective on the Data
3. Data Visualization Puts the Data into the Correct Context
4. Data Visualization Saves Time
5. Data Visualization Tells a Data Story
Types of Data Visualization Techniques
Various types of visualizations cater to diverse data sets and analytical goals.
1. Bar Charts: Ideal for comparing categorical data or displaying frequencies, bar charts offer a
clear visual representation of values.
2. Line Charts: Perfect for illustrating trends over time, line charts connect data points to reveal
patterns and fluctuations.
3. Pie Charts: Efficient for displaying parts of a whole, pie charts offer a simple way to
understand proportions and percentages.
4. Scatter Plots: Showcase relationships between two variables, identifying patterns and
outliers through scattered data points.
5. Histograms: Depict the distribution of a continuous variable, providing insights into the
underlying data patterns.
6. Heatmaps: Visualize complex data sets through color-coding, emphasizing variations and
correlations in a matrix.
7. Box Plots: Unveil statistical summaries such as median, quartiles, and outliers, aiding in data
distribution analysis.
8. Area Charts: Similar to line charts but with the area under the line filled, these charts
accentuate cumulative data patterns.
9. Bubble Charts: Enhance scatter plots by introducing a third dimension through varying
bubble sizes, revealing additional insights.
10. Treemaps: Efficiently represent hierarchical data structures, breaking down categories into
nested rectangles.
11. Violin Plots: Violin plots combine aspects of box plots and kernel density plots, providing a
detailed representation of the distribution of data.
12. Word Clouds: Word clouds are visual representations of text data where words are sized
based on their frequency.
13. 3D Surface Plots: 3D surface plots visualize three-dimensional data, illustrating how a
response variable changes in relation to two predictor variables.
14. Network Graphs: Network graphs represent relationships between entities using nodes and
edges. They are useful for visualizing connections in complex systems, such as social
networks, transportation networks, or organizational structures.
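The statistical summary behind a box plot (item 7 above) can be computed before anything is drawn. The sketch below uses only the standard library and made-up sample values. It assumes the common convention that points beyond 1.5 times the interquartile range from the quartiles are treated as outliers, and it relies on the default method of statistics.quantiles; other quartile definitions give slightly different numbers.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 11, 40]  # hypothetical sample values

# Quartiles: n=4 splits the data into four groups and returns three cut points
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: whisker fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("outliers:", outliers)
```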
Tools for Visualization of Data
The following are ten widely used data visualization tools:
1. Tableau
2. Looker
3. Zoho Analytics
4. Sisense
5. IBM Cognos Analytics
6. Qlik Sense
7. Domo
8. Microsoft Power BI
9. Klipfolio
10. SAP Analytics Cloud
Use-Cases and Applications of Data Visualization

1. Business Intelligence and Reporting
In the realm of Business Intelligence and Reporting,
organizations leverage sophisticated tools to enhance decision-
making processes. This involves the implementation of
comprehensive dashboards designed for tracking key
performance indicators (KPIs) and essential business metrics.
Additionally, businesses engage in thorough trend analysis to
discern patterns and anomalies within sales, revenue, and other
critical datasets. These visual insights play a pivotal role in
facilitating strategic decision-making, empowering stakeholders
to respond promptly to market dynamics.
2. Financial Analysis
Financial Analysis in the corporate landscape involves the
utilization of visual representations to aid in investment decision-
making. Visualizing stock prices and market trends provides
valuable insights for investors. Furthermore, organizations
conduct comparative analyses of budgeted versus actual
expenditures, gaining a comprehensive understanding of
financial performance. Visualizations of cash flow and financial
statements contribute to a clearer assessment of overall
financial health, aiding in the formulation of robust financial
strategies.
3. Healthcare
Within the Healthcare sector, the adoption of visualizations is
instrumental in conveying complex information. Visual
representations are employed to communicate patient outcomes
and assess treatment efficacy, fostering a more accessible
understanding for healthcare professionals and stakeholders.
Moreover, visual depictions of disease spread and
epidemiological data are critical in supporting public health
efforts. Through visual analytics, healthcare organizations
achieve efficient allocation and utilization of resources, ensuring
optimal delivery of healthcare services.
4. Marketing and Sales
In the domain of Marketing and Sales, data visualization
becomes a powerful tool for understanding customer behavior.
Segmentation and behavior analysis are facilitated through
visually intuitive charts, providing insights that inform targeted
marketing strategies. Conversion funnel visualizations offer a
comprehensive view of the customer journey, enabling
organizations to optimize their sales processes. Visual analytics
of social media engagement and campaign performance further
enhance marketing strategies, allowing for more effective and
targeted outreach.
What is Matplotlib?
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it freely.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for
platform compatibility.
Installation of Matplotlib
If you have Python and PIP already installed on a system, then installation of Matplotlib is very easy.
Install it using this command:
C:\Users\Your Name>pip install matplotlib
If this command fails, then use a Python distribution that already has Matplotlib installed, like
Anaconda, Spyder, etc.
Import Matplotlib
Once Matplotlib is installed, import it in your applications by adding the import module statement:
import matplotlib
Checking Matplotlib Version

The version string is stored under the __version__ attribute:
import matplotlib
print(matplotlib.__version__)
Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under
the plt alias:
Draw a line in a diagram from position (0,0) to position (6,250):
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()
Result:
Plotting Without Line
To plot only the markers, you can use shortcut string notation parameter 'o', which means 'rings'.
Example
Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, 'o')
plt.show()
Result:
Markers
You can use the keyword argument marker to emphasize each point with a specified marker:
Example
Mark each point with a circle:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')
plt.show()
Result:
5. Human Resources
Human Resources departments leverage
data visualization to
K-Nearest Neighbour (K-NN) Algorithm
o K-Nearest Neighbour is one of the simplest
Machine Learning algorithms based on Supervised Learning.
o The K-NN algorithm assumes similarity between the new case/data
and the available cases and puts the new case into the category that is
most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new
data point based on similarity. This means that when new data
appears, it can be easily classified into a well-suited category by
using the K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as for
Classification, but it is mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not
make any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn
from the training set immediately; instead it stores the dataset and,
at the time of classification, performs an action on the dataset.
o At the training phase the KNN algorithm just stores the dataset, and
when it gets new data, it classifies that data into the category
that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks
similar to both a cat and a dog, and we want to know whether it is a cat or a dog.
For this identification we can use the KNN algorithm, as it works
on a similarity measure. Our KNN model will compare the features
of the new image with those of the cat and dog images and, based on the
most similar features, put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, i.e., Category A and Category B, and
we have a new data point x1. We need to decide in which of these
categories this data point lies. To solve this type of problem, we need a K-NN algorithm. With
the help of K-NN, we can easily identify the category or class of a
particular data point. Consider the below diagram:
How does K-NN work?

The working of K-NN can be explained on the basis of the below algorithm:
o Step-1: Select the number K of the neighbors.
o Step-2: Calculate the Euclidean distance from the new data point
to each point in the training data.
o Step-3: Take the K nearest neighbors as per the calculated
Euclidean distance.
o Step-4: Among these K neighbors, count the number of the data
points in each category.
o Step-5: Assign the new data point to the category for which the
number of neighbors is maximum.
o Step-6: Our model is ready.
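The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a tuned implementation: the function names euclidean and knn_predict and the sample points are assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(p, q):
    # Step-2: straight-line distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(training, new_point, k=5):
    # training is a list of (point, label) pairs
    # Step-3: sort by distance to the new point and keep the K nearest
    nearest = sorted(training, key=lambda item: euclidean(item[0], new_point))[:k]
    # Step-4: count how many of the K neighbors fall in each category
    votes = Counter(label for _, label in nearest)
    # Step-5: the category with the most neighbors wins
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated categories
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((2.5, 2.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"), ((5.5, 5.0), "B"),
]

print(knn_predict(training, (2.0, 2.0), k=5))
print(knn_predict(training, (6.0, 6.0), k=5))
```

With K=5, four of the five nearest neighbors of (2.0, 2.0) belong to category A, so the point is assigned to A.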
Suppose we have a new data point and we need to put it in the required
category. Consider the below image:
o Firstly, we will choose the number of neighbors, so we will choose
k=5.
o Next, we will calculate the Euclidean distance between the data
points. The Euclidean distance is the distance between two points,
which we have already studied in geometry. For two points (x1, y1)
and (x2, y2) it can be calculated as:
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
o By calculating the Euclidean distance we get the nearest neighbors:
three nearest neighbors in category A and two nearest neighbors
in category B. Consider the below image:
o As we can see, 3 of the 5 nearest neighbors are from category A, hence
this new data point must belong to category A.
How to select the value of K in the K-NN Algorithm?

Below are some points to remember while selecting the value of K in the
K-NN algorithm:
o There is no particular way to determine the best value for "K", so we
need to try some values to find the best among them. A commonly used
starting value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and makes
the model sensitive to outliers.
o Larger values of K smooth out noise, but too large a value can pull in
neighbors from other categories and blur the boundaries between them.
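One simple way to "try some values" is to hold out each point in turn, classify it from the remaining points, and compare the accuracy for several values of K. The sketch below assumes this leave-one-out scheme, which is only one of several possible choices; the helper names and the sample data are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(training, new_point, k):
    nearest = sorted(training, key=lambda item: euclidean(item[0], new_point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    # Classify every point using all the other points as training data
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, point, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical data: four points per category
data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((2.5, 2.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"), ((5.5, 5.0), "B"),
]

for k in (1, 3, 5, 7):
    print(k, leave_one_out_accuracy(data, k))
```

On this tiny data set K=7 uses every remaining point, so the vote is always won by the other category, which has one more member once a point is held out. This illustrates why a K that is too large for the data can fail.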
Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to the noisy training data
o It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
o The value of K always needs to be determined, which may be complex
at times.
o The computation cost is high because of calculating the distance
between the data points for all the training samples.
streamline processes and enhance workforce management.
The development of employee performance dashboards
facilitates efficient
The following table summarizes the
differences between Functional Programming (FP) and
Object-Oriented Programming (OOP):
| **Aspect** | **Functional Programming (FP)** | **Object-Oriented Programming (OOP)** |
|---|---|---|
| **Core Concept** | Functions are the primary building blocks. | Objects (instances of classes) are the primary building blocks. |
| **Data Handling** | Data is immutable; new data structures are created instead of modifying existing ones. | Data is mutable; objects can change state over time. |
| **Functions/Methods** | Focuses on pure functions with no side effects. | Methods can modify the state of the object (side effects are common). |
| **Modularity** | Achieved through higher-order functions and function composition. | Achieved through classes and objects. |
| **State Management** | No shared state; state is passed explicitly between functions. | State is encapsulated within objects and can be modified through methods. |
| **Code Reusability** | Achieved through function reuse and higher-order functions. | Achieved through inheritance and polymorphism. |
| **Programming Style** | Declarative; focuses on what to do rather than how to do it. | Imperative; focuses on how to achieve the result step-by-step. |
| **Control Structures** | Prefers recursion over loops. | Uses loops and iterative constructs. |
| **Side Effects** | Avoids side effects; functions should not alter external states. | Side effects are common; methods often modify the object’s state. |
| **First-Class Citizens** | Functions are first-class citizens (can be passed around and returned). | Objects are the primary focus; methods operate on objects. |
| **Data Abstraction** | Abstracts data by using higher-order functions and function composition. | Abstracts data using classes and objects. |
| **Use Cases** | Best for tasks involving data transformations, mathematical computations, and parallel processing. | Best for modeling real-world entities and relationships, especially in complex systems. |
| **Languages** | Haskell, Lisp, Erlang, Scala, F#, and supported in Python, JavaScript. | Java, C++, Python, C#, Ruby. |
| **Error Handling** | Often handled using monads or similar constructs. | Typically handled using exceptions and try/catch blocks. |
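Since Python supports both styles, the contrast in data handling and state management can be sketched with the same small task written twice. The shopping-cart example and all names in it are hypothetical.

```python
from functools import reduce


# Functional style: pure functions, data is never modified in place
def add_item(cart, price):
    return cart + (price,)  # builds and returns a new tuple


def total(cart):
    return reduce(lambda acc, price: acc + price, cart, 0)


# Object-oriented style: state lives inside the object and is changed by methods
class Cart:
    def __init__(self):
        self.prices = []

    def add_item(self, price):
        self.prices.append(price)  # side effect: the object's state changes

    def total(self):
        return sum(self.prices)


empty = ()
fp_cart = add_item(add_item(empty, 10), 5)

oop_cart = Cart()
oop_cart.add_item(10)
oop_cart.add_item(5)

print(empty, fp_cart, total(fp_cart))
print(oop_cart.prices, oop_cart.total())
```

In the functional version the original empty tuple is still empty after the calls, while the Cart object has been changed in place.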
This table captures the key differences between FP and OOP, highlighting how they approach
programming from different angles.
HR operations. Workforce demographics and diversity metrics are visually represented,
supporting inclusive practices within organizations. Additionally, analytics for recruitment and
retention strategies are enhanced through visual insights, contributing to more effective talent ma
