
Summer Training Report - Ishan Patwal


SUMMER TRAINING REPORT

ON
MACHINE LEARNING
(USING PYTHON)

Submitted in the partial fulfilment of


the requirements for the award of degree

BACHELORS IN COMPUTER APPLICATIONS


FROM
TRINITY INSTITUTE OF PROFESSIONAL STUDIES

Submitted By:                              Submitted to:

Name of Student: Ishan Patwal              Name of Faculty: Ms. Sweety
Enrolment Number: 02224002020              Designation: Assistant Professor
Course: BCA
Semester: 5th
Shift: 1st
Year: 2020 – 2023
DECLARATION

I, Ishan Patwal, Enrollment No. 02224002020, student of Bachelors in Computer Applications, class of 2020–23, Trinity Institute of Professional Studies, Dwarka, hereby declare that the Summer Training project report entitled Machine Learning (Using Python) is an original work and that the same has not been submitted to any other institute for the award of any other degree.

Ishan Patwal
02224002020
Bachelors in Computer Applications
Acknowledgement

First and foremost, I would like to express my profound gratitude to Mr. P Naveen, Academic Head, Smartknower, Delhi, for giving us the opportunity to carry out our project at Smartknower. I take great pleasure in expressing my unfeigned thanks to our trainer Mr. P Naveen for his invaluable guidance, support and useful suggestions at every stage of this project work.

No words can express our deep sense of gratitude to Mr. Naveen, without whom this project would not have turned out this way. My heartfelt thanks to him for his immense help and support, useful discussions and valuable recommendations throughout the course of my project work.

I wish to thank our respected faculty Ms. Sweety and our classmates for their support.

Ishan Patwal
Chapter 1
About The Company

Smartknower is an organization committed to the Ed-Tech space. It aims to empower students based in Tier-II and Tier-III towns by upskilling them in modern technological and entrepreneurial domains, so that they can benefit from these programs and become a productive and proactive part of the corporate space. Through rigorous and immersive training, along with simulated real-life technical exposure through live projects, Smartknower trains candidates to be among the most fulfilled and develops their all-round capability as well. With AI-based software at its core, Smartknower offers a connected ecosystem accessible from anywhere and by anyone.


PREFACE

During the 60-day summer training we studied several languages and chose to learn Machine Learning (with Python) because Python is easy to manage, is object-oriented, and has good debugging tools available. We then searched for the best institute to give us summer training in Python, and found that Smartknower is a company that deals extensively in Python. So we started our 60-day summer training at Smartknower. First we learned how to write basic programs in Python; then we started on Machine Learning concepts with Python. Machine Learning is a field of Artificial Intelligence that uses statistical techniques to give computer systems the ability to learn from a given dataset without being explicitly programmed. After the 60-day training we are able to develop applications in Python. During the training we applied this technology to an automation system for a house loan predictor.

Keywords: Python, Machine Learning, House price predictor.


CHAPTER 2
LITERATURE REVIEW
2.1 Python:-

Python is an interpreted high-level programming language for general-purpose


programming. Created by Guido van Rossum and first released in 1991,
Python has a design philosophy that emphasizes code readability, notably
using significant whitespace. It provides constructs that enable clear
programming on both small and large scales. In July 2018, Van Rossum
stepped down as the leader in the language community after 30 years. Python
features a dynamic type system and automatic memory management. It
supports multiple programming paradigms, including object-oriented,
imperative, functional and procedural, and has a large and comprehensive
standard library. Python interpreters are available for many operating systems.
CPython, the reference implementation of Python, is open-source software
and has a community-based development model, as do nearly all of Python's
other implementations. Python and CPython are managed by the non-profit
Python Software Foundation. Python has a simple, easy-to-learn syntax that
emphasizes readability and hence reduces the cost of program maintenance.
Also, Python supports modules and packages, which encourages program
modularity and code reuse.
2.1.1 Advantages of using PYTHON
The diverse application of the Python language is a result of the
combination of features which give this language an edge over
others. Some of the benefits of programming in Python include:

1. Presence of Third-Party Modules: The Python Package Index (PyPI)
contains numerous third-party modules that enable Python to interact
with most other languages and platforms.

2. Extensive Support Libraries: Python provides a large standard


library which includes areas like internet protocols, string operations,
web services tools and operating system interfaces. Many high use
programming tasks have already been scripted into the standard
library which reduces length of code to be written significantly.

3. Open Source and Community Development: Python language is


developed under an OSI-approved open source license, which makes
it free to use and distribute, including for commercial purposes.
Further, its development is driven by the community which
collaborates for its code through hosting conferences and mailing
lists, and provides for its numerous modules.

4. Learning Ease and Support Available: Python offers excellent


readability and uncluttered simple-to-learn syntax which helps
beginners to utilize this programming language. The code style
guidelines, PEP 8, provide a set of rules to facilitate the formatting of
code. Additionally, the wide base of users and active developers has
resulted in a rich internet resource bank to encourage development
and the continued adoption of the language.
5. User-friendly Data Structures: Python has built-in list and
dictionary data structures which can be used to construct fast
runtime data structures. Further, Python also provides the option of
dynamic high-level data typing which reduces the length of support
code that is needed.

6. Productivity and Speed: Python has clean object-oriented design,


provides enhanced process control capabilities, and possesses strong
integration and text processing capabilities and its own unit testing
framework, all of which contribute to the increase in its speed and
productivity. Python is considered a viable option for building
complex multi-protocol network applications.
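As a small illustration of point 5 above, the sketch below (with invented marks data) shows the built-in list and dict structures at work:

```python
# Illustrative marks data (invented for this example).
scores = {"math": 91, "science": 84, "english": 77}

# List comprehension: collect the subjects with marks of 80 or more.
passed = [subject for subject, mark in scores.items() if mark >= 80]

# Dictionaries give direct key-based lookups; max() with a key function
# finds the subject with the highest mark.
best = max(scores, key=scores.get)

print(passed)  # ['math', 'science']
print(best)    # math
```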
2.2 DATA SCIENCE:-

“Data science” is just about as broad of a term as they come. It may be easiest to describe
what it is by listing its more concrete components:

1) Data exploration & analysis:-

 Included here: Pandas; NumPy; SciPy; a helping hand from Python’s


Standard Library.

2) Data visualization:- A pretty self-explanatory name. Taking data and turning it


into something colorful.

 Included here: Matplotlib; Seaborn; Datashader;

3) Classical machine learning:- Conceptually, we could define this as any


supervised or unsupervised learning task that is not deep learning (see below).
Scikit-learn is far-and-away the go-to tool for implementing classification, regression,
clustering, and dimensionality reduction, while StatsModels is less actively
developed but still has a number of useful features.

 Included here: Scikit-Learn, StatsModels.

4) Deep learning:- This is a subset of machine learning that is seeing a


renaissance, and is commonly implemented with Keras, among other libraries. It has
seen monumental improvements over the last ~5 years, such as AlexNet in 2012,
which was the first design to incorporate consecutive convolutional layers.

 Included here: Keras, TensorFlow, and a whole host of others.


5) Data storage and big data frameworks:- Big data is best defined as data that is
either literally too large to reside on a single machine, or can’t be processed in the
absence of a distributed environment. The Python bindings to Apache technologies
play heavily here.

 Apache Spark; Apache Hadoop; HDFS; Dask; h5py/pytables.

6) Odds and ends. Includes subtopics such as natural language processing, and
image manipulation with libraries such as OpenCV.

 Included here: nltk; Spacy; OpenCV/cv2; scikit-image; Cython.

2.2.1 Practical Implementation of Data Science:-

Problem Statement: You are given a dataset which comprises
comprehensive statistics on a range of aspects like the distribution
and nature of prison institutions, overcrowding in prisons, types of
prison inmates, etc. You have to use this dataset to perform
descriptive statistics and derive useful insights from the data.
Below are a few tasks:

1. Data loading: Load a dataset “prisoners.csv” using pandas


and display the first and last five rows in the dataset. Then find
out the number of columns using the describe method in Pandas.

2. Data Manipulation: Create a new column -“total benefitted”,


which is the sum of inmates benefitted through all modes.

3. Data Visualization: Create a bar plot with each state name


on the x-axis and their total benefitted inmates as their bar
heights.
Solution:
For data loading, write the code below.

To use the describe method in Pandas, type the statement below.

Next, let us perform some data manipulation.

And finally, let us perform some visualization in Python. Refer to the code below:

Output:-
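The three steps above can be sketched in pandas as follows. Because the actual prisoners.csv file is not included here, a small stand-in DataFrame with invented column names and values is used in its place; only the structure of the solution is meant to carry over:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render plots without a display
import matplotlib.pyplot as plt

# 1. Data loading -- with the real file this would simply be:
#    prisoners = pd.read_csv("prisoners.csv")
# The columns below are illustrative stand-ins, not the real schema.
prisoners = pd.DataFrame({
    "STATE/UT": ["Delhi", "Goa", "Kerala"],
    "Benefitted by Elementary Education": [300, 40, 120],
    "Benefitted by Adult Education": [200, 10, 80],
})
print(prisoners.head())                   # first five rows
print(prisoners.tail())                   # last five rows
print(len(prisoners.describe().columns))  # number of numeric columns

# 2. Data manipulation: "total benefitted" = sum of all benefit modes.
benefit_cols = [c for c in prisoners.columns if "Benefitted" in c]
prisoners["total benefitted"] = prisoners[benefit_cols].sum(axis=1)

# 3. Data visualization: bar plot of total benefitted inmates per state.
ax = prisoners.plot.bar(x="STATE/UT", y="total benefitted", legend=False)
ax.set_ylabel("Total benefitted inmates")
plt.tight_layout()
```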
2.3 MACHINE LEARNING:-

 Machine learning is a subset of artificial intelligence in the field of computer


science that often uses statistical techniques to give computers the ability to
"learn" (i.e., progressively improve performance on a specific task) with data,
without being explicitly programmed.

 Machine learning is closely related to (and often overlaps with) computational


statistics, which also focuses on prediction-making through the use of
computers. It has strong ties to mathematical optimization, which delivers
methods, theory and application domains to the field.

 Machine learning (ML) is a category of algorithm that allows software


applications to become more accurate in predicting outcomes without being
explicitly programmed. The basic premise of machine learning is to build
algorithms that can receive input data and use statistical analysis to predict an
output while updating outputs as new data becomes available.[3]

2.3.1 How Machine Learning works?

 Machine learning algorithms are often categorized as supervised or


unsupervised. Supervised algorithms require a data scientist or data analyst
with machine learning skills to provide both input and desired output, in
addition to furnishing feedback about the accuracy of predictions during
algorithm training. Data scientists determine which variables, or features, the
model should analyze and use to develop predictions. Once training is
complete, the algorithm will apply what was learned to new data.

 Unsupervised algorithms do not need to be trained with desired outcome

data. Instead, they use iterative approaches, including deep learning, to review
data and arrive at conclusions. Deep learning systems, which are built on
neural networks, are used for more complex processing tasks than supervised
learning systems, including image recognition, speech-to-text and
natural language generation. These neural networks work by combing through
millions of examples of training data and automatically identifying often subtle
correlations between many variables. Once trained, the algorithm can use its
bank of associations to interpret new data. These algorithms have only
become feasible in the age of big data, as they require massive amounts of
training data.
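The supervised/unsupervised contrast above can be sketched with scikit-learn (a library covered later in this report); the tiny one-feature dataset below is invented purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])
y = np.array([0, 0, 0, 1, 1, 1])  # desired outputs provided by the analyst

# Supervised: the model is trained with both inputs and desired outputs.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.1], [8.1]]))  # -> [0 1]

# Unsupervised: no labels; the algorithm groups the data on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # two clusters discovered without ever seeing y
```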

2.3.2Advantages of Machine Learning

1. Trends and Patterns Are Identified With Ease


Machine Learning is adept at reviewing large volumes of data and identifying
patterns and trends that might not be apparent to a human. For instance, a machine
learning program may successfully pinpoint a causal relationship between two
events. This makes the technology highly effective at data mining, particularly on a
continual, ongoing basis.

2. Machine Learning Improves Over Time


Machine Learning technology typically improves efficiency and accuracy over time
thanks to the ever-increasing amounts of data that are processed. This gives the
algorithm or program more “experience,” which can, in turn, be used to make better
decisions or predictions.
A great example of this improvement over time involves weather prediction models.
Predictions are made by looking at past weather patterns and events; this data is
then used to determine what’s most likely to occur in a particular scenario. The more
data you have in your data set, the greater the accuracy of a given forecast. The
same basic concept is also true for algorithms that are used to make decisions or
recommendations.

3. Machine Learning Lets You Adapt Without Human Intervention


Machine Learning allows for instantaneous adaptation, without the need for human
intervention. An excellent example of this can be found in security and anti-virus
software programs, which leverage machine learning and AI technology to
implement filters and other safeguards in response to new threats.

4. Automation
Machine Learning is a key component in technologies such as predictive analytics
and artificial intelligence. The automated nature of Data Science means it can save
time and money, as developers and analysts are freed up to perform high-level tasks
that a computer simply cannot handle.

On the flip side, you have a computer running the show and that’s something that is
certain to make any developer squirm with discomfort. For now, technology is
imperfect. Still, there are workarounds. For instance, if you’re employing Data
Science technology in order to develop an algorithm, you might program the Data
Science interface so it just suggests improvements or changes that must be
implemented by a human.

This workaround adds a human gatekeeper to the equation, thereby eliminating the
potential for problems that can arise when a computer is in charge. After all, an
algorithm update that looks good on paper may not work effectively when it's put
into practice.
Various Python libraries used in the project:

2.4 NumPy

NumPy is the fundamental package for scientific computing with Python. It


contains among other things:

 a powerful N-dimensional array object


 sophisticated (broadcasting) functions
 tools for integrating C/C++ and Fortran code
 useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data-types can be defined. This
allows NumPy to seamlessly and speedily integrate with a wide variety of
databases.

NumPy is licensed under the BSD license, enabling reuse with few restrictions. The
core functionality of NumPy is its "ndarray" (n-dimensional array) data
structure. These arrays are strided views on memory. In contrast to Python's built-in
list data structure (which, despite the name, is a dynamic array), these arrays are
homogeneously typed: all elements of a single array must be of the same type.
NumPy has built-in support for memory-mapped arrays.

Here are some functions defined in the NumPy library:

1. zeros(shape[, dtype, order]) - Return a new array of given shape and type,
filled with zeros.
2. array(object[, dtype, copy, order, subok, ndmin]) - Create an array.
3. asarray(a[, dtype, order]) - Convert the input to an array.
4. arange([start,] stop[, step,][, dtype]) - Return evenly spaced values
within a given interval.
5. linspace(start, stop[, num, endpoint, ...]) - Return evenly spaced
numbers over a specified interval.
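A quick sketch of these five functions in use:

```python
import numpy as np

z = np.zeros((2, 3))               # 2x3 array filled with zeros
a = np.array([[1, 2], [3, 4]])     # array created from a nested list
b = np.asarray([5, 6, 7])          # convert the input to an array
r = np.arange(0, 10, 2)            # evenly spaced values in [0, 10)
l = np.linspace(0.0, 1.0, num=5)   # 5 evenly spaced numbers from 0 to 1

print(z.shape)  # (2, 3)
print(a)        # a 2x2 homogeneously typed array
print(b)        # [5 6 7]
print(r)        # [0 2 4 6 8]
print(l)        # [0.   0.25 0.5  0.75 1.  ]
```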

2.5 Pandas

Pandas is an open-source, BSD-licensed Python library providing high-performance,


easy-to- use data structures and data analysis tools for the Python programming
language. Python with Pandas is used in a wide range of fields including academic
and commercial domains including finance, economics, Statistics, analytics, etc. In
this tutorial, we will learn the various features of Python Pandas and how to use
them in practice.

The name Pandas is derived from "panel data", an econometrics term for
multidimensional data.

In 2008, developer Wes McKinney started developing pandas when in need of a
high-performance, flexible tool for the analysis of data.
Prior to Pandas, Python was majorly used for data munging and preparation. It had
very little contribution towards data analysis. Pandas solved this problem. Using
Pandas, we can accomplish five typical steps in the processing and analysis of data,
regardless of the origin of data — load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of fields including academic and
commercial domains including finance, economics, Statistics, analytics, etc.

Key Features of Pandas

 Fast and efficient DataFrame object with default and customized indexing.

 Tools for loading data into in-memory data objects from different file formats.
 Data alignment and integrated handling of missing data.
 Reshaping and pivoting of data sets.
 Label-based slicing, indexing and subsetting of large data sets.
 Columns from a data structure can be deleted or inserted.
 Group by data for aggregation and transformations.
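A short sketch of a few of these features (missing-data handling, label-based subsetting, and group-by aggregation), using invented sales figures:

```python
import numpy as np
import pandas as pd

# Invented sales figures for illustration.
df = pd.DataFrame({
    "city":  ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "year":  [2021, 2022, 2021, 2022],
    "sales": [100.0, np.nan, 150.0, 170.0],
})

# Integrated handling of missing data: fill the NaN with the column mean.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Label-based slicing and subsetting with .loc.
delhi = df.loc[df["city"] == "Delhi", ["year", "sales"]]

# Group-by for aggregation and transformations.
totals = df.groupby("city")["sales"].sum()
print(totals)  # Delhi: 240.0, Mumbai: 320.0
```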

2.6 Matplotlib

Matplotlib is a Python 2D plotting library which produces publication quality


figures in a variety of hardcopy formats and interactive environments across
platforms. Matplotlib can be used in Python scripts, the Python and IPython
shells, the Jupyter notebook, web application servers, and four graphical user
interface toolkits.

Matplotlib tries to make easy things easy and hard things possible. You can generate plots,

histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
For examples, see the sample plots and thumbnail gallery.

For simple plotting the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font properties,
axes properties, etc, via an object-oriented interface or via a set of functions familiar to MATLAB
users.
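A minimal sketch of the object-oriented interface described above, producing a line plot and a histogram side by side (the Agg backend is chosen here so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Object-oriented interface: one figure, two axes.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(x, np.sin(x))
ax1.set_title("Line plot")
ax2.hist(np.random.randn(1000), bins=20)
ax2.set_title("Histogram")

fig.savefig("demo.png")  # one of many supported hardcopy formats
```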

2.7 Scikit – Learn

Scikit-learn (formerly scikits.learn) is a free software machine learning library for
the Python programming language. It features various classification, regression
and clustering algorithms including support vector machines, random forests,
gradient boosting, k-means and DBSCAN, and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.

The scikit-learn project started as scikits.learn, a Google Summer of Code project


by David Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy
Toolkit), a separately-developed and distributed third-party extension to SciPy.
The original codebase was later rewritten by other developers. In 2010 Fabian
Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel, all from
INRIA, took leadership of the project and made the first public release on
February 1st, 2010. Of the various scikits, scikit-learn as well as scikit-image
were described as "well-maintained and popular" in November 2012.

As of 2018, scikit-learn is under active development.

Scikit-learn is largely written in Python, with some core algorithms written in


Cython to achieve performance. Support vector machines are implemented by a
Cython wrapper around LIBSVM; logistic regression and linear support vector
machines by a similar wrapper around LIBLINEAR. [10]

2.7.1 Advantages of using Scikit – Learn:


 Scikit-learn provides a clean and consistent interface to tons of different models.
 It provides you with many options for each model, but also chooses sensible
defaults.
 Its documentation is exceptional, and it helps you to understand the models as
well as how to use them properly.
 It is also actively being developed.
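A minimal sketch of this clean, consistent interface, fitting a random forest on the library's bundled iris dataset with the sensible defaults mentioned above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# The same fit/predict interface applies across scikit-learn estimators.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = RandomForestClassifier(random_state=42)  # sensible defaults out of the box
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # near-perfect on iris
```

Swapping in another estimator (say, LogisticRegression) requires changing only the constructor line; the fit/predict calls stay the same.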

CHAPTER 3

SYSTEM REQUIREMENT SPECIFICATION


To be used efficiently, all computer software needs certain hardware components or
other software resources to be present on a computer. These prerequisites are
known as (computer) system requirements and are often used as a guideline as
opposed to an absolute rule. Most software defines two sets of system requirements:
minimum and recommended. Software requirements specification establishes the
basis for an agreement between customers and contractors or suppliers on how the
software product should function.

3.1 Non-functional requirements


Non-functional requirements describe how the system performs rather than what
functions it offers. They include time constraints and constraints on the
development process and standards. The non-functional requirements are as follows:

 Speed: The system should process the given input into output within an
appropriate time.

 Ease of use: The software should be user friendly, so that customers can use
it easily and it does not require much training time.

 Reliability: The rate of failures should be low; only then is the system reliable.
