CV UNIT 4
For example, in the graph given below, we can clearly see that there are 3
circular clusters forming on the basis of distance.
K-Medoids and K-Means are two types of clustering mechanisms in Partition Clustering.
The algorithm takes the unlabeled dataset as input, divides the dataset into
k clusters, and repeats the process until it finds the best clusters. The value
of k must be specified in advance in this algorithm.
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the image and resize it
img = cv2.imread('RedRibbon.jpg')
image = cv2.resize(img, (1000, 1500))

# Flatten the image into a list of pixels (one row per pixel, 3 colour channels)
Z = image.reshape((-1, 3))
Z = np.float32(Z)

# Run k-means with k = 5, stopping after 10 iterations or epsilon 1.0, best of 6 attempts
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(Z, 5, None, criteria, 6, cv2.KMEANS_RANDOM_CENTERS)

# Replace each pixel with its cluster centre to obtain the segmented image
segmented_image = centers[labels.flatten()].reshape(image.shape).astype(np.uint8)

# Show the original and segmented images side by side
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.title('Original Image')
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(segmented_image, cv2.COLOR_BGR2RGB))
plt.title('Segmented Image')
plt.show()
Output:
K-medoids
K-medoids, also known as partitioning around medoids (PAM), is a
popular clustering algorithm that groups k data points into clusters by
selecting k representative objects within a dataset. Clustering is a robust
unsupervised machine-learning algorithm that establishes patterns by
identifying clusters or groups of data points with similar characteristics
within a specific dataset.
Medoids
PAM is the most powerful of the three k-medoid algorithms but has the disadvantage of high
time complexity.
Manhattan distance
The distance between each data point from both medoids is calculated
using the Manhattan distance formula. It is also known as the cost.
Distance = |x2 − x1| + |y2 − y1|
After the algorithm completes, we will have k medoid points with their
clusters.
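As an illustration, below is a minimal Python/NumPy sketch of this idea (not the full PAM swap search; the toy data points and function names are made up for the example):

import numpy as np

def manhattan(a, b):
    # Manhattan (L1) distance, i.e. the "cost" described above
    return np.abs(a - b).sum()

def k_medoids(points, k, iters=100):
    # Pick k random data points as the initial medoids
    rng = np.random.default_rng(0)
    medoids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest medoid
        labels = np.array([np.argmin([manhattan(p, m) for m in medoids]) for p in points])
        new_medoids = []
        for c in range(k):
            cluster = points[labels == c]
            # The new medoid is the cluster member with the lowest total cost to the others
            costs = [sum(manhattan(p, q) for q in cluster) for p in cluster]
            new_medoids.append(cluster[int(np.argmin(costs))])
        new_medoids = np.array(new_medoids)
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, labels

# Example: two well-separated groups of 2-D points
data = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
medoids, labels = k_medoids(data, k=2)
print(medoids, labels)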
Advantages:
1. It is simple to understand and easy to implement.
2. K-Medoid Algorithm is fast and converges in a fixed number of steps.
3. PAM is less sensitive to outliers than other partitioning algorithms.
Disadvantages:
1. The main disadvantage of K-Medoid algorithms is that they are not suitable
for clustering non-spherical (arbitrarily shaped) groups of objects. This
is because they rely on minimizing the distances between the non-medoid
objects and the medoid (the cluster center); briefly, they use compactness
as the clustering criterion instead of connectivity.
2. It may obtain different results for different runs on the same dataset
because the first k medoids are chosen randomly.
Classification:
What is Classification in Machine Learning?
Classification is a supervised machine learning method where the model
tries to predict the correct label of a given input data. In classification, the
model is fully trained using the training data, and then it is evaluated on test
data before being used to perform prediction on new unseen data.
Discriminant Function
A Discriminant Function in machine learning is a mathematical function used to
separate data into different classes. It is primarily employed in classification tasks,
where the goal is to predict the category or label of a given input based on its
features. The discriminant function assigns a value to each data point, and these
values are used to determine the class to which the data point belongs.
There are two main types of discriminant functions: the linear discriminant function,
used in Linear Discriminant Analysis (LDA), and the quadratic discriminant function,
used in Quadratic Discriminant Analysis (QDA). In LDA, the function assumes that the
data from each class follow a Gaussian distribution with the same covariance matrix,
leading to a linear decision boundary. On the other hand, QDA allows different
covariance matrices for each class, resulting in quadratic decision boundaries.
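As a small sketch (assuming scikit-learn is available; the toy Gaussian data below is made up for illustration), both classifiers can be fitted as follows:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

# Two classes of 2-D points drawn from Gaussians with different means
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([3, 3], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)     # assumes shared covariance -> linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariance -> quadratic boundary

print(lda.predict([[1.5, 1.5]]), qda.predict([[1.5, 1.5]]))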
Applications
Discriminant functions are used in pattern recognition and image retrieval. They are
often chosen because they are simple, adhere to AAMI recommendations, and can
overcome training-set imbalances.
Machine learning algorithms are generally categorized based on how they learn
from data. The three main types of learning are supervised learning,
unsupervised learning, and semi-supervised learning. These categories differ in
terms of the data available during the training process and the way the algorithm
utilizes this data to make predictions or discover patterns.
Supervised learning is ideal when you have labeled data and a clear objective,
such as classification or regression.
1. Supervised Learning
How It Works:
Examples:
Supervised learning is powerful when you have access to a large labeled dataset.
However, labeling data can be expensive or time-consuming, which leads to the
next type of learning.
2. Unsupervised Learning
How It Works:
The algorithm analyzes the input data and attempts to find commonalities
or natural groupings without being explicitly told what to look for.
The output might include clusters of similar data points, lower-dimensional
representations of the data, or relationships between features.
Examples:
Unsupervised learning is useful when you have large amounts of unlabeled data
and want to extract meaningful insights without needing explicit labels. However,
evaluating the performance of unsupervised algorithms can be challenging because
there are no ground-truth labels to compare against.
3. Semi-Supervised Learning
How It Works:
The model is trained on a small set of labeled data along with a large set of
unlabeled data.
The idea is that the unlabeled data can provide additional information that
helps the model generalize better, even though it doesn't have explicit
labels.
Semi-supervised algorithms typically start by using the labeled data to learn
initial patterns and then use the unlabeled data to refine or enhance the
model.
Examples:
Image classification: Suppose you have a small set of labeled images (e.g.,
100 labeled images of dogs and cats) but a large set of unlabeled images.
Semi-supervised learning can use the small labeled set to guide the learning
process and leverage the larger unlabeled set to improve the model's
performance.
Speech recognition: You may have a small dataset of labeled transcribed
audio clips, but a large amount of unlabeled speech data. Semi-supervised
learning can help create more accurate transcription models.
Algorithms:
Bayes classification
Bayes classification is a probabilistic approach used in machine learning to predict
the class of a given data point based on prior knowledge and observed data. It is
grounded in Bayes' Theorem, a fundamental concept in probability theory that
describes how to update the probability of a hypothesis (in this case, a class label)
based on new evidence. Bayes classification is particularly useful when dealing
with uncertainty and can be applied to both binary and multiclass classification
tasks.
Bayes’ Theorem provides a way to update our beliefs about the probability of an
event or class, given some observed data. Mathematically, Bayes’ Theorem is
expressed as:
P(C|X) = P(X|C) · P(C) / P(X)
Where:
o P(C|X) is the posterior probability of class C given the observed data X.
o P(X|C) is the likelihood of observing the data X given class C.
o P(C) is the prior probability of class C.
o P(X) is the evidence, i.e. the overall probability of observing X.
1. Calculate Prior Probability (P(C)): This is the initial belief about the
distribution of the classes in the dataset. For instance, if there are two
classes, “Spam” and “Not Spam,” the prior might indicate that 40% of
emails are spam and 60% are not.
2. Calculate Likelihood (P(X | C)): This refers to the probability of observing
the given data (features) given a specific class. For example, the likelihood
of a particular word appearing in a spam email is calculated based on
historical data.
3. Compute Posterior Probability: The posterior probability P(C∣X) is
computed by combining the prior probability and the likelihood. This is
done for each possible class.
4. Select the Class with Maximum Posterior: The class that gives the highest
posterior probability is chosen as the predicted class for the data point.
One of the most widely used applications of Bayes classification is the Naive
Bayes classifier, which simplifies the computation by making a naive assumption:
the features (or attributes) are conditionally independent given the class. This
assumption significantly reduces the computational complexity and is especially
useful in high-dimensional datasets.
For example, in text classification (like spam detection), the Naive Bayes classifier
assumes that each word in an email is independently associated with the class label
(spam or not spam), which simplifies the calculation of the likelihood. Despite its
simplicity and the "naive" independence assumption, Naive Bayes often performs
surprisingly well in many real-world applications.
Naive Bayes Formula:
For a given class C and a feature vector X = (x1, x2, ..., xn), the Naive Bayes classifier
computes the posterior probability as:
P(C|X) ∝ P(C) · Π (i = 1 to n) P(xi|C)
Where:
o P(xi|C) is the likelihood of the i-th feature value given class C, and the product runs over all n features.
o P(C) is the prior probability of class C.
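As a small sketch of how this is used in practice (assuming scikit-learn is available; the tiny corpus below is made up), a Naive Bayes spam classifier can be trained as follows:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus: 1 = spam, 0 = not spam
texts = ["win money now", "lowest price offer", "meeting at noon", "see you tomorrow"]
labels = [1, 1, 0, 0]

# Word counts as features; the classifier treats each word independently given the class
vec = CountVectorizer().fit(texts)
model = MultinomialNB().fit(vec.transform(texts), labels)

print(model.predict(vec.transform(["win a free offer"])))  # expected: [1] (spam)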
Advantages of KNN Algorithm:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
o The value of K always needs to be determined, which may sometimes be complex.
o The computation cost is high because the distance to every training sample must be
calculated for each data point.
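Despite these drawbacks, KNN is straightforward to use in practice. A minimal sketch (assuming scikit-learn is available; the toy points are made up):

from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points and their class labels
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# K = 3: each new point is labelled by a majority vote of its 3 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [9, 9]]))  # expected: [0 1]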
Dimensionality Reduction
Let's walk through a simple example to understand how
Linear Discriminant Analysis (LDA) works:
Steps:
In our iris flower example, LDA would find the best linear
combination of sepal length, sepal width, petal length, and
petal width that maximizes the separability between the
Setosa, Versicolor, and Virginica species. The reduced-
dimensional space could potentially help in better classifying
new iris samples.
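A short sketch of this (assuming scikit-learn is available), reducing the four iris measurements to two discriminant components:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
# Project the 4 measurements onto at most (classes - 1) = 2 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(iris.data, iris.target)

print(iris.data.shape, X_reduced.shape)  # (150, 4) -> (150, 2)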
Pros:
Cons:
Assumptions in ICA
1. The first assumption asserts that the source signals (original signals)
are statistically independent of each other.
2. The second assumption is that each source signal exhibits a non-Gaussian
distribution.
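Under these assumptions, ICA can separate mixed signals back into their sources. A brief sketch using scikit-learn's FastICA (assuming scikit-learn is available; the mixed signals below are synthetic):

import numpy as np
from sklearn.decomposition import FastICA

# Two independent, non-Gaussian source signals
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))   # square wave
s2 = np.sin(2 * t)            # sine wave
S = np.c_[s1, s2]

# Mix the sources with an arbitrary mixing matrix
A = np.array([[1.0, 0.5], [0.5, 2.0]])
X = S @ A.T

# Recover the independent components from the mixtures
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)  # (2000, 2): the estimated source signals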
OpenCV, one of the most popular computer vision libraries, provides robust tools for performing
background subtraction, making it easier for developers to implement real-time video analysis
and tracking systems. OpenCV offers several algorithms and models for background subtraction,
such as the MOG2 and KNN methods, which are often the go-to choices for this task.
1. Background Model: The core idea is to create and maintain a model of the background
in a scene, and then compare each new frame to this model. The goal is to identify
regions that differ significantly from the background model, which are considered to be
the foreground (moving objects).
# Fragment: keep the current grey frame for the next comparison and display the results
prev = gray.copy()
cv2.imshow('original', frame)
cv2.imshow('foregroundMask', thres)
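The lines above are only a fragment; a minimal runnable frame-differencing sketch along the same lines (assuming a webcam at index 0) might look like this:

import cv2

cap = cv2.VideoCapture(0)             # webcam (assumed at index 0)
ret, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Difference between the current and previous frame, thresholded to a binary mask
    diff = cv2.absdiff(gray, prev)
    _, thres = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    prev = gray.copy()
    cv2.imshow('original', frame)
    cv2.imshow('foregroundMask', thres)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()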
Popular Background Subtraction Algorithms in OpenCV:
Like MOG2, KNN can also handle shadows, but it does not model the
background probabilistically in the way MOG2 does.
Simple Background Subtraction (Frame Differencing):
BackgroundSubtractorCNT
BackgroundSubtractorGMG
BackgroundSubtractorGSOC
BackgroundSubtractorLSBP
createBackgroundSubtractorMOG2
createBackgroundSubtractorKNN
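For instance, the MOG2 subtractor replaces the manual differencing above with a maintained background model in just a few lines (again a sketch, assuming a webcam at index 0):

import cv2

cap = cv2.VideoCapture(0)
backsub = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    fgmask = backsub.apply(frame)   # per-pixel Gaussian-mixture background model
    cv2.imshow('frame', frame)
    cv2.imshow('foreground mask', fgmask)
    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()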
Applications of Background Subtraction:
Challenges:
Moving Background: If the background itself moves (e.g., trees swaying in the
wind), detecting moving objects can become challenging.
Noise and Small Objects: Small moving objects or noise can be incorrectly
classified as foreground. Post-processing steps such as morphological filtering
can help to address this.
Modeling
Machine Learning models can be understood as programs that have been
trained to find patterns within new data and make predictions. These
models are represented as a mathematical function that takes input data,
makes predictions on that data, and then provides an output in response.
First, these models are trained on a set of data; an algorithm then lets
them reason over the data, extract patterns from it, and learn from it.
Once trained, the models can be used to make predictions on unseen data.
Based on the type of learning, machine learning models are divided into the below categories:
o Supervised Learning
o Unsupervised Learning
o Reinforcement Learning
Supervised Learning is further divided into the below categories:
o Classification
o Regression
Unsupervised Learning is also divided into the below categories:
o Clustering
o Association Rule
o Dimensionality Reduction
1. Supervised Machine Learning Models
Supervised Learning is the simplest machine learning model to understand,
in which the input data is called training data and has a known label or result as
the output. So, it works on the principle of input-output pairs. It requires
creating a function that is trained using a training dataset and then applied
to unknown data to make predictions.
Supervised learning is task-based and is tested on labeled datasets.
Regression
In regression problems, the output is a continuous variable. Some
commonly used Regression models are as follows:
a) Linear Regression
Linear regression is the simplest machine learning model, in which we try to
predict one output variable using one or more input variables. The
representation of linear regression is a linear equation that combines a
set of input values (x) and the predicted output (y) for those input
values. It is represented in the form of a line:
y = bx + c
where b is the slope and c is the intercept. The main aim of the linear
regression model is to find the line that best fits the data points.
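A quick sketch of fitting such a line (assuming scikit-learn is available; the data points are made up to roughly follow y = 2x + 1):

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points that roughly follow y = 2x + 1
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # slope b and intercept c of the best-fit line
print(model.predict([[6]]))            # prediction for a new input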
b) Decision Tree
Decision trees are the popular machine learning models that can be used
for both regression and classification problems.
The advantage of decision trees is that they are intuitive and easy to
implement, but a single tree often lacks accuracy.
c) Random Forest
A random forest is an ensemble of many decision trees. For the classification task,
the outcome of the random forest is taken from the majority of votes, whereas in the
regression task the outcome is taken from the mean or average of the predictions
generated by each tree.
d) Neural Networks
Neural networks are the subset of machine learning and are also known as
artificial neural networks. Neural networks are made up of artificial neurons
and designed in a way that resembles the human brain structure and
working. Each artificial neuron connects with many other neurons in a
neural network, and such millions of connected neurons create a
sophisticated cognitive structure.
Classification
Classification models are the second type of Supervised Learning
techniques, which are used to generate conclusions from observed values
in the categorical form. For example, the classification model can identify if
the email is spam or not; a buyer will purchase the product or not, etc.
Classification algorithms are used to predict discrete classes and categorize the
output into different groups.
a) Logistic Regression
c) Naïve Bayes
Each naïve Bayes classifier assumes that the value of a specific variable is
independent of any other variable/feature. For example, if a fruit needs to
be classified based on color, shape, and taste, then a yellow, oval, and sweet
fruit will be recognized as a mango. Here each feature is independent of the
other features.
2. Unsupervised Machine learning models
Unsupervised Machine learning models implement the learning process
opposite to supervised learning, which means it enables the model to learn
from the unlabeled training dataset. Based on the unlabeled dataset, the
model predicts the output. Using unsupervised learning, the model learns
hidden patterns from the dataset by itself without any supervision.
o Clustering
Clustering is an unsupervised learning technique that involves
grouping the data points into different clusters based on
similarities and differences. The objects with the most similarities
remain in the same group and have few or no similarities with
objects in other groups.
Clustering algorithms can be widely used in different tasks such
as Image segmentation, Statistical data analysis, Market
segmentation, etc.
Some commonly used Clustering algorithms are K-means Clustering,
hierarchical Clustering, DBSCAN, etc.
Spatio-temporal analysis
Dynamic stereo
Dynamic stereo refers to the process of capturing and analyzing three-dimensional (3D) depth
information from dynamic (moving) scenes, using stereo vision techniques. Unlike static stereo
vision, where depth is inferred from two images of a stationary scene, dynamic stereo deals with
sequences of images captured over time. This makes it a crucial tool in applications such as
robotics, autonomous driving, augmented reality (AR), and 3D scene reconstruction.
Machine learning (ML) plays a pivotal role in enhancing dynamic stereo systems by improving
the accuracy of depth estimation, handling complex motion patterns, and adapting to real-world
challenges like occlusions and lighting variations. Here's an overview of how dynamic stereo
works, the role of machine learning in it, and its applications.
In traditional stereo vision, two cameras (placed at different viewpoints) capture images of a
scene simultaneously. By comparing the two images, depth information can be extracted based
on the differences (disparities) in the images. The depth of a point in space is inversely
proportional to the disparity between the corresponding points in the two images.
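As a rough sketch of that relationship (assuming a rectified stereo pair with focal length f, in pixels, and baseline B between the two cameras):
Depth Z = (f × B) / d
where d is the disparity. For example, with f = 700 pixels, B = 0.1 m and d = 20 pixels, the estimated depth is (700 × 0.1) / 20 = 3.5 m.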
However, in dynamic stereo, the scene is moving over time, which introduces additional
challenges:
Temporal Changes: The objects in the scene may move, causing their appearance in each frame
to change.
Dynamic Backgrounds: Unlike static scenes, backgrounds might change, making it more difficult
to isolate objects of interest.
Occlusions: Objects might be partially or fully obscured at different times, adding complexity to
the matching process.
To handle these challenges, dynamic stereo systems require robust algorithms that can track
objects and estimate their 3D structure over time.
Machine learning enhances dynamic stereo by improving several aspects of the process:
Dynamic scenes involve moving objects, so it's important to estimate how these objects move
across frames. Traditional stereo vision relies on finding correspondences between static points
in two images. However, in dynamic scenes, machine learning algorithms, particularly optical
flow and deep learning-based tracking, can be used to track motion more effectively.
Optical Flow: Machine learning models, especially convolutional neural networks (CNNs), are
used to estimate the motion (displacement) of objects in consecutive frames. This helps in
predicting the future positions of objects and understanding their motion dynamics.
Tracking Algorithms: Algorithms like Kalman filters or more advanced models like Recurrent
Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks are used to predict
object positions over time and handle occlusions.
Stereo Matching: Deep learning can improve stereo matching, which is the process of
identifying corresponding points in two or more images to compute depth. CNNs can learn to
match points more robustly in dynamic scenes, compensating for challenges like object motion
or changes in lighting.
End-to-End Depth Estimation: Deep networks, such as U-Net or Monodepth, can be trained to
estimate depth maps directly from image sequences, using a combination of stereo vision and
temporal information.
Occlusion detection and handling are crucial in dynamic stereo. Objects may obscure each other,
causing regions of the scene to become hidden in one or both camera views.
Occlusion Prediction: Machine learning models can predict when and where occlusions are
likely to occur based on prior data or learned patterns of motion. These models can then fill in
missing depth information using context from the surrounding pixels.
Disparity Networks: Using networks like StereoNet or DeepStereo, machine learning can
predict disparities even in occluded or uncertain areas by leveraging context from both spatial
and temporal data.
In dynamic stereo, the system needs to maintain temporal consistency in depth estimation over
time. This is particularly important for applications like 3D reconstruction, where the model
needs to ensure that depth information does not fluctuate unrealistically across frames.
e. Reconstruction of 3D Scenes
Once dynamic stereo algorithms estimate depth over time, the next step is to reconstruct the 3D
scene. This involves transforming the 2D images and depth maps into a 3D model, which can be
used for applications like navigation in autonomous vehicles or AR.
Point Cloud Generation: Depth data can be used to create point clouds, which are collections of
points representing the 3D structure of the scene. Machine learning models, such as PointNet,
can then be used to classify and analyze these 3D structures.
3. Applications of Dynamic Stereo in Machine Learning
Dynamic stereo, enhanced by machine learning, has broad applications across several fields:
a. Autonomous Vehicles
b. Robotics
For robots operating in dynamic environments, depth estimation from stereo vision helps in
object manipulation, path planning, and environmental interaction. In dynamic scenarios, robots
can use dynamic stereo to estimate the distance to moving objects and adjust their behavior
accordingly.
c. Surveillance and Security
Dynamic stereo is used in surveillance systems to monitor 3D space in real time. For example, in
tracking moving people or objects, machine learning models can infer depth and predict future
locations, aiding in automated tracking and event detection.
d. Human-Computer Interaction (HCI)
In HCI, dynamic stereo can be used to track user movements and gestures in 3D space. This is
essential for applications like gesture recognition, sign language interpretation, and immersive
virtual environments.
Machine learning and traditional computer vision methods are both employed to estimate motion
parameters, with deep learning increasingly playing a central role due to its ability to handle
complex, high-dimensional data and dynamic environments.
Before delving into the methods for motion parameter estimation, it’s important to understand
the two primary types of motion in vision systems:
Rigid Motion: This involves objects or cameras moving without changing their shape.
For example, a car moving along a road or a camera rotating on a tripod. In rigid motion,
all points of the object or camera move consistently in space.
Non-Rigid Motion: This refers to objects that deform or change shape over time. For
example, a person waving their hand, cloth moving in the wind, or facial expressions
changing. Non-rigid motion is more complex to track and requires sophisticated
algorithms to estimate motion accurately.
There are several techniques and algorithms for motion parameter estimation in computer vision,
ranging from traditional optical flow methods to modern deep learning approaches.
a. Optical Flow
Optical flow is a classic method for estimating motion by tracking pixel intensities between
consecutive frames. It is based on the assumption that the intensity of a point in the image
remains constant over time, so the apparent motion of the pixel is caused by the object's or
camera's movement.
Lucas-Kanade Method: This is a differential method that assumes the motion is constant within
a small window around each pixel. It calculates the optical flow by solving a system of equations
derived from the spatial and temporal derivatives of the image intensity.
Horn-Schunck Method: This is another classical method that computes optical flow by imposing
smoothness constraints on the flow field. It assumes that flow across neighboring pixels should
vary gradually, except at object boundaries.
Optical flow is effective for estimating motion in relatively small, smooth regions of an image.
However, it struggles in regions with large motions, occlusions, or highly dynamic
environments.
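A short OpenCV sketch of sparse Lucas-Kanade optical flow between two consecutive frames ('frame1.jpg' and 'frame2.jpg' are placeholder file names):

import cv2
import numpy as np

# Two consecutive grayscale frames (placeholder file names)
prev = cv2.imread('frame1.jpg', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame2.jpg', cv2.IMREAD_GRAYSCALE)

# Detect good corner features to track in the first frame
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=100, qualityLevel=0.3, minDistance=7)

# Lucas-Kanade: estimate where each feature moved in the next frame
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(15, 15), maxLevel=2)

# Keep only the points that were successfully tracked
good_new = p1[status.flatten() == 1]
good_old = p0[status.flatten() == 1]
print(np.mean(good_new - good_old, axis=0))  # average displacement (motion estimate)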
In feature-based approaches, distinct features (such as corners, edges, or blobs) are tracked
across frames to estimate motion. This is done by matching the location of features in subsequent
frames.
Feature tracking is effective for rigid motion but can become unreliable when objects are
occluded, undergo non-rigid motion, or if the scene has very few distinctive features.
When the scene contains mostly planar surfaces (such as a flat ground or walls), motion can be
estimated using a homography. A homography is a transformation that relates two images of a
planar surface taken from different viewpoints (or different times).
Planar Motion: If the motion is purely rigid and the scene consists of a flat plane, the
transformation between the image planes can be described by a homography matrix. By
computing the homography between two frames, motion parameters (such as rotation and
translation) can be extracted.
This method works well for flat scenes but becomes less accurate in more complex, 3D
environments with non-planar surfaces.
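A brief sketch of estimating such a homography from matched ORB features with OpenCV (the frame file names are placeholders):

import cv2
import numpy as np

img1 = cv2.imread('frame1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame2.jpg', cv2.IMREAD_GRAYSCALE)

# Detect and describe ORB features in both frames
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between the two frames
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Estimate the homography relating the planar scene in the two frames (RANSAC rejects outliers)
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(H)  # 3x3 homography matrix describing the inter-frame motion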
Visual odometry refers to the process of estimating the position and orientation of a camera by
analyzing its motion across consecutive frames. This is important for applications like robotics
and autonomous vehicles, where the camera provides the primary source of information for
localization and mapping.
Direct Methods: These methods directly use pixel intensities from consecutive frames to
estimate camera motion. They include approaches like direct sparse odometry (DSO), which
minimize photometric errors to track the camera's motion in real time.
Feature-Based Methods: Feature-based visual odometry techniques extract and track feature
points across images. These features are then used to estimate the camera’s motion using
triangulation and bundle adjustment techniques. ORB-SLAM is a well-known example that
combines feature extraction (using ORB features) with tracking and mapping to provide real-
time camera motion estimation.
Visual odometry can work in real-time and does not rely on external sensors like GPS or LiDAR.
However, it can drift over time (accumulation of errors), and its accuracy depends on the quality
of feature tracking and the scene's structure.
Robotics: Estimating robot motion is essential for navigation, obstacle avoidance, and mapping.
Motion parameters help robots localize themselves and avoid collisions in dynamic
environments.
Augmented and Virtual Reality (AR/VR): AR systems rely on motion estimation to track the
position and orientation of a user’s head, hands, or other objects, ensuring that virtual content is
accurately aligned with the real world.
Surveillance and Security: Motion parameter estimation allows surveillance systems to track
people or objects across video streams, providing insights into behavior, movement patterns, and
potential security threats.
Medical Imaging: In medical imaging, motion estimation techniques are applied to track and
reconstruct organs or tissues over time, especially in dynamic imaging like MRI or ultrasound.