Summer Training Report - Ishan Patwal
ON
MACHINE LEARNING
(USING PYTHON)
same has not been submitted to any other Institute for the
Ishan Patwal
02224002020
Bachelors in Computer Applications
Acknowledgement
Ishan Patwal
Chapter 1
About The Company
“Data science” is just about as broad of a term as they come. It may be easiest to describe
what it is by listing its more concrete components:
6) Odds and ends. Includes subtopics such as natural language processing and
image manipulation with libraries such as OpenCV.
To use the describe method in Pandas, type the statement below:
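A minimal sketch of such a call, assuming a small hypothetical DataFrame (the column names and values are illustrative and do not come from the report's dataset):

```python
import pandas as pd

# Hypothetical sample data; the report's own dataset is not reproduced here.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
})

# describe() returns summary statistics (count, mean, std, min,
# quartiles, max) for each numeric column.
summary = df.describe()
print(summary)
```

The result is itself a DataFrame, so a single statistic can be read with, for example, summary.loc["mean", "height_cm"].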
Next, let us perform data manipulation.
Finally, let us perform some visualization in Python. Refer to
the code below:
Output:-
2.3 MACHINE LEARNING:-
4. Automation
Machine Learning is a key component in technologies such as predictive analytics
and artificial intelligence. The automated nature of Data Science means it can save
time and money, as developers and analysts are freed up to perform high-level tasks
that a computer simply cannot handle.
On the flip side, you have a computer running the show and that’s something that is
certain to make any developer squirm with discomfort. For now, technology is
imperfect. Still, there are workarounds. For instance, if you’re employing Data
Science technology in order to develop an algorithm, you might program the Data
Science interface so it just suggests improvements or changes that must be
implemented by a human.
This workaround adds a human gatekeeper to the equation, thereby reducing the
potential for problems that can arise when a computer is in charge. After all, an
algorithm update that looks good on paper may not work effectively when it is put
into practice.
Various Python libraries used in the project:
2.4 NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data-types can be defined. This
allows NumPy to seamlessly and speedily integrate with a wide variety of
databases.
NumPy is licensed under the BSD license, enabling reuse with few restrictions. The
core functionality of NumPy is its "ndarray" (n-dimensional array) data
structure. These arrays are strided views on memory. In contrast to Python's built-in
list data structure (which, despite the name, is a dynamic array), these arrays are
homogeneously typed: all elements of a single array must be of the same type.
NumPy has built-in support for memory-mapped arrays.
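The homogeneous typing and strided layout described above can be illustrated with a short sketch. The array contents are arbitrary and chosen only for illustration:

```python
import numpy as np

# A 2 x 3 array in which every element has the same type (int64).
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
print(a.dtype)    # int64
print(a.strides)  # (24, 8): bytes to step along each axis

# Transposing does not copy the data; it returns a view on the
# same memory with the strides swapped.
t = a.T
print(t.strides)               # (8, 24)
print(np.shares_memory(a, t))  # True

# Mixed inputs are converted to one common type (here float64).
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```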
1. zeros(shape[, dtype, order]) - Return a new array of given shape and type,
filled with zeros.
2. array(object[, dtype, copy, order, subok, ndmin]) - Create an array.
3. asarray(a[, dtype, order]) - Convert the input to an array.
4. arange([start,] stop[, step][, dtype]) - Return evenly spaced values
within a given interval.
5. linspace(start, stop[, num, endpoint, ...]) - Return evenly spaced
numbers over a specified interval.
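The five routines listed above can be tried out as follows. This is a minimal sketch; the shapes and values are arbitrary:

```python
import numpy as np

z = np.zeros((2, 3))                  # 2 x 3 array filled with 0.0
a = np.array([1, 2, 3], dtype=float)  # build an array from a list
b = np.asarray(a)                     # already an ndarray, so no copy is made
r = np.arange(0, 10, 2)               # start 0, stop before 10, step 2
s = np.linspace(0.0, 1.0, num=5)      # 5 points, both endpoints included

print(z.shape)  # (2, 3)
print(b is a)   # True
print(r)        # [0 2 4 6 8]
print(s)        # [0.   0.25 0.5  0.75 1.  ]
```

Note that arange excludes the stop value, while linspace includes it by default.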
2.5 Pandas
The name Pandas is derived from the term "panel data", an econometrics term for
multidimensional data sets.
In 2008, developer Wes McKinney started developing pandas when he needed a
high-performance, flexible tool for data analysis.
Prior to Pandas, Python was mainly used for data munging and preparation. It
contributed very little to data analysis. Pandas solved this problem. Using
Pandas, we can accomplish five typical steps in the processing and analysis of data,
regardless of the origin of the data: load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and commercial
fields, including finance, economics, statistics, and analytics.
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
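Several of the features listed above can be sketched in a few lines. The DataFrame below is hypothetical (the city names and sales figures are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a custom index and one missing value.
df = pd.DataFrame(
    {
        "city": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
        "sales": [100.0, np.nan, 300.0, 500.0],
    },
    index=["a", "b", "c", "d"],
)

# Integrated handling of missing data: replace NaN with 0.
df["sales"] = df["sales"].fillna(0.0)

# Label-based slicing: both end labels are included.
subset = df.loc["b":"c"]

# Columns can be inserted and deleted.
df["sales_k"] = df["sales"] / 1000
df = df.drop(columns=["sales_k"])

# Group by a column and aggregate.
totals = df.groupby("city")["sales"].sum()
print(totals)
```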
2.6 Matplotlib
Matplotlib tries to make easy things easy and hard things possible. You can generate plots,
histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code.
For examples, see the sample plots and thumbnail gallery.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font properties,
axes properties, etc, via an object-oriented interface or via a set of functions familiar to MATLAB
users.
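As a minimal sketch of the object-oriented interface mentioned above, the following draws a sine wave with an explicit line style and colour. The data and labels are illustrative, and the non-interactive "Agg" backend is an assumption made so that the example runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# Object-oriented interface: work with Figure and Axes objects directly.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), linestyle="--", color="tab:red", label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("Sine wave")
ax.legend()

# Write the figure as PNG into an in-memory buffer instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an interactive session, plt.show() would display the figure instead of saving it.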
Scikit-learn (formerly scikits.learn) is a free software machine learning library for
the Python programming language. It features various classification, regression
and clustering algorithms including support vector machines, random forests,
gradient boosting, k-means and DBSCAN, and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.
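As an illustration of one of the algorithms named above, the sketch below runs k-means clustering on a NumPy array. The six points are made up and form two well-separated groups; the parameter values are assumptions chosen for reproducibility:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of three points each.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# Fit k-means with two clusters; random_state fixes the initialisation.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(labels)                  # one cluster label per input point
print(model.cluster_centers_)  # coordinates of the two centres
```

Which group receives label 0 and which receives label 1 is arbitrary; only the grouping itself is meaningful.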
CHAPTER 3
Speed: The system should process the given input into output within an appropriate time.
Ease of use: The software should be user friendly, so that
customers can use it easily.