DAIOT UNIT 5

Machine Learning Tutorial

This Machine Learning tutorial covers both basic and advanced concepts of machine
learning and is designed for students and working professionals.

Machine learning is a growing technology that enables computers to learn
automatically from past data. It uses various algorithms to build mathematical
models and make predictions from historical data or information. Currently, it is
used for tasks such as image recognition, speech recognition, email filtering,
Facebook auto-tagging, recommender systems, and many more.

This machine learning tutorial gives you an introduction to machine learning along
with a wide range of machine learning techniques such as Supervised, Unsupervised,
and Reinforcement learning. You will learn about regression and classification
models, clustering methods, hidden Markov models, and various sequential models.

What is Machine Learning?


In the real world, we are surrounded by humans who can learn from their
experiences, and we have computers or machines that simply follow our
instructions. But can a machine also learn from experience or past data the way a
human does? This is where Machine Learning comes in.

Machine Learning is a subset of artificial intelligence that is mainly concerned
with the development of algorithms that allow a computer to learn from data and
past experience on its own. The term machine learning was first introduced by
Arthur Samuel in 1959. We can define it in a summarized way as:

Machine learning enables a machine to automatically learn from data, improve
performance from experience, and predict things without being explicitly programmed.

With the help of sample historical data, known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or
decisions without being explicitly programmed. Machine learning brings computer
science and statistics together to create predictive models. It constructs or uses
algorithms that learn from historical data: the more information we provide, the
better the performance.

A machine has the ability to learn if it can improve its performance by gaining
more data.

How does Machine Learning work?


A Machine Learning system learns from historical data, builds prediction models,
and, whenever it receives new data, predicts the output for it. The accuracy of the
predicted output depends on the amount of data: a larger amount of data helps
build a better model, which predicts the output more accurately.

Suppose we have a complex problem that requires some predictions. Instead of
writing code for it directly, we can feed the data to generic algorithms; the machine
builds the logic from the data and predicts the output. Machine learning has
changed the way we think about such problems: training data goes into a Machine
Learning algorithm, which builds a model that is then used to predict outputs for
new data.
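
As a minimal, hedged sketch of this train-then-predict cycle (the dataset, model
choice, and scikit-learn usage here are illustrative, not part of the original tutorial):

```python
# Learn a model from historical (labeled) data, then predict on new data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                   # historical data
X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)           # a generic algorithm
model.fit(X_train, y_train)                         # build the model from past data
print(model.predict(X_new[:5]))                     # predict outputs for new data
print(model.score(X_new, y_new))                    # accuracy depends on the data
```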

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as both deal with huge amounts of
data.

Need for Machine Learning


The need for machine learning is increasing day by day. The reason is that it is
capable of doing tasks that are too complex for a person to implement directly. As
humans, we have limitations: we cannot manually process huge amounts of data.
For this we need computer systems, and this is where machine learning makes
things easy for us.

We can train machine learning algorithms by providing them with huge amounts of
data and letting them explore the data, construct models, and predict the required
output automatically. The performance of a machine learning algorithm depends
on the amount of data, and it can be measured by a cost function. With the help of
machine learning, we can save both time and money.

The importance of machine learning can be easily understood by its use cases.
Currently, machine learning is used in self-driving cars, cyber fraud detection, face
recognition, friend suggestions on Facebook, and so on. Top companies such
as Netflix and Amazon have built machine learning models that use vast amounts
of data to analyze user interests and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:

o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it
predicts the output.

The system creates a model using labeled data to understand the datasets and
learn about each one. Once training and processing are done, we test the model by
providing sample data to check whether it predicts the correct output. The goal of
supervised learning is to map input data to output data. Supervised learning is
based on supervision, just as a student learns under the supervision of a teacher.
An example of supervised learning is spam filtering.

Supervised learning can be grouped further in two categories of algorithms:

o Classification
o Regression
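
As a hedged illustration of the regression category (the data and model here are
made up for demonstration):

```python
# Supervised regression: labeled (input, output) pairs are used to fit
# a model that maps inputs to outputs.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # inputs (features)
y = np.array([2.1, 3.9, 6.2, 8.1])           # labeled outputs (targets)

model = LinearRegression().fit(X, y)         # learn the input-to-output mapping
print(model.predict([[5.0]]))                # predict for an unseen input
```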

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries
to find useful insights from the huge amount of data. It can be further classified
into two categories of algorithms:

o Clustering
o Association
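
A minimal clustering sketch (k-means and the toy points are illustrative choices,
not prescribed by the text):

```python
# Group unlabeled points by similarity; no labels are ever provided.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],        # unlabeled data
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                        # discovered group for each point
print(kmeans.cluster_centers_)               # centers of the discovered groups
```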

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning
agent gets a reward for each right action and a penalty for each wrong action. The
agent learns automatically from this feedback and improves its performance. In
reinforcement learning, the agent interacts with the environment and explores it.
The goal of the agent is to collect the most reward points, and in doing so it
improves its performance.

A robotic dog that automatically learns the movement of its arms is an example of
Reinforcement learning.
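
A tiny, hedged sketch of this reward-and-penalty loop using tabular Q-learning
(the 5-state chain environment and all constants are assumptions made up for
illustration):

```python
# Tabular Q-learning on a 5-state chain: the agent earns a reward for
# reaching the rightmost state and learns from that feedback alone.
import numpy as np

n_states, n_actions = 5, 2                   # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.3
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != n_states - 1:                 # episode ends at the goal state
        if rng.random() < epsilon:           # explore...
            a = int(rng.integers(n_actions))
        else:                                # ...or exploit what was learned
            a = int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))                      # learned policy: 1 ("right") in every non-goal state
```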

Feature engineering with IoT data


Beware of the siren call of the automated ML software tool, which takes all your
data, determines what is important, and automatically builds the best model from
it, all at the click of a button. Often, the raw data that you have is not in a form that
ML models can be successful with. Using the data as it is can be a rocky
proposition. Many an unaware ship has been wrecked on those rocks, lured by the
lovely sound of automation.

One of the best ways to dramatically improve the predictive ability of your ML
models is not in the algorithms themselves, but in how the data they are grown
from is presented to them. The transformation of data, the addition of constructed
new fields, and the removal of distracting fields are all done with knowledge of how
the representation model operates. This process is called feature engineering.
Data fields are commonly referred to as features in ML. We will adopt that
terminology for the rest of the chapter.

The goal of feature engineering is to make it as easy as possible for your ML model
to have good performance. Different representations have different requirements for
what works well, so you will find yourself creating different versions of the same raw
dataset geared specifically to the ML representation. Get to know each ML
representation you are using to make sure you are giving it the best possible chance
to perform.

Feature engineering is an art and it is hard. But, it can add a lot of value and greatly
increase your probability of success. We will introduce a few key concepts, but there
is much, much more to learn.
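
As a hedged sketch of what this looks like in practice for IoT sensor data (the
column names, windows, and pandas usage are illustrative assumptions):

```python
# Construct new fields from a raw temperature stream: rolling statistics,
# rate of change, and time-of-day context for an ML model to use.
import numpy as np
import pandas as pd

rng = pd.date_range("2024-01-01", periods=96, freq="15min")
df = pd.DataFrame({"timestamp": rng,
                   "temperature": 20 + np.random.randn(96)})

df["temp_roll_mean_1h"] = df["temperature"].rolling(4).mean()  # smoothed trend
df["temp_roll_std_1h"] = df["temperature"].rolling(4).std()    # local volatility
df["temp_delta"] = df["temperature"].diff()                    # rate of change
df["hour_of_day"] = df["timestamp"].dt.hour                    # daily cycle context

print(df.dropna().head())                    # engineered features, ready for a model
```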

Analytical Method Validation

Analytical method validation is a process used to ensure that the analytical
method used for a particular test is suitable for its intended use. Method validation
results are used to ensure the quality, reliability, and consistency of analytical
results; this is an integral part of any good analytical practice. Validation should be
done according to a validation protocol, which should contain procedures and
acceptance criteria for all parameters. The results should be recorded in a
validation report, and justification should be provided when non-pharmacopoeial
methods are used. The typical validation parameters are:

- Accuracy
- Precision
- Robustness
- Linearity
- Range
- Specificity
- Limit of detection
- Limit of quantitation

1. Accuracy
Accuracy must be established across the specified range of the analytical procedure.
2. Precision
It is the degree of agreement between individual results.
3. Robustness
This should be considered at the development stage; it shows the reliability of the
analysis when deliberate variations are made in the method parameters.
4. Linearity
It refers to the ability to produce results that are directly proportional to the
concentration of the analyte in the samples.
5. Range
It is an expression of the lowest and highest levels of analyte determinable for the
product. The specified range is usually derived from linearity studies.
6. Specificity
It is the ability to unambiguously measure the desired analyte in the presence of
components such as excipients and impurities.
7. Limit of detection
It is the smallest quantity of analyte that can be detected, but not necessarily
quantified, by the method.
8. Limit of quantitation
It is the lowest concentration of an analyte in a sample that can be determined
with acceptable precision and accuracy.

What is bias in machine learning?


Bias is a phenomenon that skews the result of an algorithm in favor of or against
an idea.

Bias is considered a systematic error that occurs in the machine learning model
itself due to incorrect assumptions in the ML process.

Technically, we can define bias as the error between the average model prediction
and the ground truth. Moreover, it describes how well the model matches the
training data set:

• A model with a higher bias would not match the data set
closely.
• A low bias model will closely match the training data set.

Characteristics of a high bias model include:

• Failure to capture proper data trends
• Potential towards underfitting
• More generalized/overly simplified
• High error rate

What is variance in machine learning?

Variance refers to the changes in the model when using different portions of the
training data set.

Simply stated, variance is the variability in the model prediction: how much the ML
function can adjust depending on the given data set. Variance comes from highly
complex models with a large number of features.

• Models with high bias will have low variance.
• Models with high variance will have low bias.

Both contribute to the flexibility of the model. For instance, a high-bias model that
does not match the data set well will be an inflexible, low-variance model, which
results in a suboptimal machine learning model.

Characteristics of a high variance model include:

• Noise in the data set
• Potential towards overfitting
• Complex models
• Trying to fit all data points as close as possible
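
A hedged sketch of measuring this directly: fit the same flexible model on different
random portions of the training data and watch how much its prediction for one
fixed point changes (the data, model, and sizes are all illustrative assumptions):

```python
# Estimate bias and variance empirically: the spread of predictions across
# data portions is the variance; the gap between the average prediction
# and the ground truth reflects the bias.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)   # noisy observations

x_test = np.array([[5.0]])
preds = []
for _ in range(100):
    idx = rng.choice(500, size=200, replace=False)      # a different data portion
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])  # complex, high-variance model
    preds.append(tree.predict(x_test)[0])

print("average prediction:", np.mean(preds))         # near sin(5): low bias
print("variance across portions:", np.var(preds))    # large spread: high variance
```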

Underfitting & overfitting


The terms underfitting and overfitting refer to how the model fails to
match the data. The fitting of a model directly correlates to whether it
will return accurate predictions from a given data set.

• Underfitting occurs when the model is unable to match the input data to the
target data. This happens when the model is not complex enough to match all
the available data, and it performs poorly even on the training dataset.
• Overfitting occurs when the model tries to match noise rather than the real
pattern. This happens with highly complex models that match almost all the
given data points and perform well on the training dataset, but fail to
generalize to the test data set and predict outcomes accurately, as shown in
the sketch below.
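
A hedged sketch contrasting the two (the polynomial degrees, data, and noise level
are illustrative assumptions):

```python
# Degree 1 underfits (high error on train and test); degree 15 overfits
# (low training error, high test error); degree 4 sits in between.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),  # training error
          mean_squared_error(y_te, model.predict(X_te)))  # test error
```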

Bias-Variance Trade-Off
While building the machine learning model, it is really important to take care of bias
and variance in order to avoid overfitting and underfitting in the model. If the model
is very simple with fewer parameters, it may have low variance and high bias. Whereas,
if the model has a large number of parameters, it will have high variance and low bias.
So, it is required to make a balance between bias and variance errors, and this balance
between the bias error and variance error is known as the Bias-Variance trade-off.
For an accurate prediction of the model, algorithms need a low variance and low bias.
But this is not possible because bias and variance are related to each other:

o If we decrease the variance, it will increase the bias.
o If we decrease the bias, it will increase the variance.

The Bias-Variance trade-off is a central issue in supervised learning. Ideally, we
need a model that accurately captures the regularities in its training data and
simultaneously generalizes well to unseen data. Unfortunately, it is not possible to
do both at once: a high-variance algorithm may perform well on training data, but
it may overfit to noisy data, whereas a high-bias algorithm produces a much simpler
model that may not capture important regularities in the data. So, we need to find
a sweet spot between bias and variance to make an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot to make a
balance between bias and variance errors.
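
For squared-error loss this trade-off can be written down exactly; the standard
decomposition (a well-known result, not derived in this text) is:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \mathrm{Bias}\big[\hat{f}(x)\big]^2
  + \mathrm{Var}\big[\hat{f}(x)\big]
  + \sigma^2
$$

Minimizing the total error means balancing the first two terms; the irreducible
noise term $\sigma^2$ cannot be reduced by any model.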

Different Combinations of Bias-Variance


There are four possible combinations of bias and variance:

1. Low-Bias, Low-Variance: The combination of low bias and low variance is the
ideal machine learning model. However, it is not practically possible.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions
are inconsistent but accurate on average. This case occurs when the model learns
a large number of parameters, and it leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are
consistent but inaccurate on average. This case occurs when a model does not
learn well from the training dataset or uses too few parameters, and it leads to
underfitting.
4. High-Bias, High-Variance: With high bias and high variance, predictions are
inconsistent and also inaccurate on average.

How to identify High variance or High Bias?


High variance can be identified if the model has:

o Low training error and high test error.

High bias can be identified if the model has:

o High training error, with test error almost equal to the training error.

Use cases for deep learning with lots of data

Deep learning can do wonders for complex data, with thousands to millions of
features and a large history of labeled examples to use as training sets. Rapid
advances in image recognition have as much to do with the vast trove of
recognized images that Google and others have amassed over the years as they do
with advances in the deep learning algorithms used.

For IoT data, this limits the usefulness of deep learning techniques. Most IoT data
is relatively new, without a long history of labeled examples. Most IoT devices only
have a few sensors, so the feature set is not that complex. In these situations, many
of the previously discussed ML techniques can perform a predictive job as well as,
if not better than, deep learning techniques. Deep learning is also computationally
expensive, both in terms of time and computational power (i.e., high cost).
However, when the IoT data flow consists of a large number of features and
hundreds of thousands to millions of labeled training examples are available, deep
learning methods can provide a significant boost in predictive power. This is clear
in autonomous vehicle development. Deep learning can also work wonders when
pictures taken by the device (static or video) are part of the activity.

Deep learning packages interact better with Python than with R. They are also
relatively new, so expect documentation and tutorials to be limited. This makes
them a little trickier to work with than the previous examples in this chapter. You
need more time and expertise to develop deep learning models than established
ML models like Random Forest.
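
As a hedged sketch of what a small deep learning model looks like in code (Keras/
TensorFlow usage; the architecture, data shapes, and random data are illustrative
assumptions, not a recommendation for any particular IoT problem):

```python
# A small feed-forward network for binary classification over many features.
# Training cost grows quickly with model size and data volume.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 64)                  # many features per example
y = np.random.randint(0, 2, size=1000)        # labeled examples

model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))        # [loss, accuracy]
```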

Use whichever method makes the most sense for the individual use case and available
training data. Consider the impact requirements and see if using a deep learning
modeling technique provides enough accuracy to warrant the cost.
