Submitted by
in partial fulfilment of the requirements for the award of the degree
of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
TRICHY
ANNA UNIVERSITY: CHENNAI 600
MAY 2024
SIGNATURE
Prof. T. Sugashini
HEAD OF THE DEPARTMENT
Assistant Professor,
Department of Computer Science and Engineering,
Indra Ganesan College of Engineering,
Manikandam, Trichy – 620012

SIGNATURE
Dr. G. Balakrishnan
SUPERVISOR
Principal,
Department of Computer Science and Engineering,
Indra Ganesan College of Engineering,
Manikandam, Trichy – 620012
ACKNOWLEDGEMENT
We praise God for His blessings in all aspects of our project work and for guiding
us in the path of His light. First of all, we thank our beloved parents from the
depths of our hearts.
We would like to thank our dynamic Director Dr. G. BALAKRISHNAN for his
unwavering support during the entire course of this project work. We are grateful
to our Principal Dr. G. BALAKRISHNAN for providing us an excellent
environment to carry out our course successfully.
ABSTRACT
Stereo vision, enabled by stereo matching algorithms, has emerged as a powerful technique
for depth estimation in computer vision applications. With the widespread availability of
dual-camera setups in modern smart phones, stereo vision has become increasingly feasible
for real-time depth calculation of obstacles in various scenarios. Stereo matching algorithms
leverage the disparity between corresponding points in stereo image pairs to infer depth
information. Traditional algorithms like block matching, Semi-Global Block Matching
(SGBM), and graph-based methods have been adapted and optimized for smart phone
platforms to enable real-time depth calculation. These algorithms exploit the spatial and
intensity information captured by the dual cameras to generate dense depth maps, providing
valuable insights into the 3D structure of the scene. For stereo image acquisition, dual-camera
smart phones capture stereo image pairs simultaneously, providing the necessary input for
stereo matching algorithms. Images acquired by the left and right cameras undergo
preprocessing steps, including rectification and color correction, to ensure alignment and
consistency between the stereo images. The heart of the depth calculation process lies in the
stereo matching algorithm. Camera implementations leverage efficient versions of block
matching, SGBM, or deep learning-based methods to compute the disparity map, which
represents the depth information of the scene. The implementation of stereo matching
algorithms on smart phones offers numerous advantages, including portability, accessibility,
and integration with existing smart phone functionalities. By harnessing the computational
power of modern smart phones and leveraging the capabilities of dual-camera setups, stereo
vision becomes a practical and versatile tool for depth calculation and obstacle detection in
everyday scenarios. In conclusion, the implementation of stereo matching algorithms on
smart phones facilitates real-time depth calculation and detection of obstacles.
TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE NO.
ABSTRACT iv
1 INTRODUCTION 1
1.1 INTRODUCTION 1
2 LITERATURE REVIEW 3
3 SYSTEM ANALYSIS 6
4 SYSTEM SPECIFICATION 8
5 SYSTEM DESIGN 17
10 APPENDICES 38
10.2 SCREENSHOTS 48
REFERENCES 49
LIST OF FIGURES
3.2 DISTANCE 7
6.1.1.5 RECTIFICATION 27
LIST OF ABBREVIATIONS
ABBREVIATION EXPLANATION
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
The integration of stereo matching algorithms into smart phones brings stereo
vision capabilities to a wide range of users, democratizing access to depth
calculation and obstacle detection. By harnessing the computational power of
modern smart phones and leveraging dual-camera setups, stereo vision becomes
accessible and practical for everyday use.
1.2 SCOPE OF THE PROJECT
The project considers the hardware and software components necessary for stereo vision
on smart phones, including dual-camera setups, camera calibration, image
processing capabilities, and computational resources.
CHAPTER 2
LITERATURE REVIEW
estimation method, reducing the pose-estimation latency and error by up to 57.1%
and 29.5%, respectively.
This paper proposes a stereo matching method enhanced by object detection and
instance segmentation results obtained through the use of a deep convolutional
neural network. Then, this method is applied to generate a picking plan to solve bin
picking problems, that is, to automatically pick up objects with random poses in a
stack using a robotic arm. The system configuration and bin picking process flow
are suggested using the proposed method, and it is applied to bin picking problems,
especially those involving tiny cubic workpieces. The picking plan is generated by
applying the Harris corner detection algorithm to the point cloud in the generated
three-dimensional map. In the experiments, two kinds of stacks consisting of cubic
workpieces with an edge length of 10 mm or 5 mm are tested for bin picking. In the
first bin picking problem, all workpieces are successfully picked up, whereas in the
second, the depths of the workpieces are obtained, but the instance segmentation
process is not completed. In future work, not only cubic workpieces but also other
arbitrarily shaped workpieces should be recognized in various types of bin picking
problems.
Accurate stereo depth estimation plays a critical role in various 3D tasks in both
indoor and outdoor environments. Recently, learning-based multi-view stereo
methods have demonstrated competitive performance with a limited number of
views. However, in challenging scenarios, especially when building cross-view
correspondences is hard, these methods still cannot produce satisfying results. In
this paper, we study how to enforce the consistency between surface normal and
depth at training time to improve the performance. We couple the learning of a
multi-view normal estimation module and a multi-view depth estimation module.
In addition, we propose a novel consistency loss to train an independent
consistency module that refines the depths from depth/normal pairs. We find that
the joint learning can improve both the prediction of normal and depth, and the
accuracy and smoothness can be further improved by enforcing the consistency.
Experiments on MVS, SUN3D, RGBD and Scenes11 demonstrate the effectiveness
of our method and state-of-the-art performance.
CHAPTER 3
SYSTEM ANALYSIS
In the proposed system, the utilization of portrait mode entails capturing images
with a focus on the subject, often resulting in a blurred
background. This mode, commonly available in cameras, enhances the depth
perception of the captured scene by creating a distinct separation between the
subject and its surroundings. By leveraging the portrait mode, the system ensures
that the object of interest, typically an obstacle, stands out prominently in the
image, facilitating accurate depth calculation through stereo matching algorithms.
Fig. 3.1 Portrait Image
The process begins with the acquisition of images from the smart phone camera in
portrait mode. This mode utilizes depth-sensing technologies, such as dual-camera
setups or depth sensors, to capture images with a shallow depth of field,
emphasizing the subject while blurring the background. The resultant images
exhibit a clear distinction between foreground and background elements, enabling
effective depth mapping for obstacle detection. Once the portrait image is obtained,
the system proceeds to calculate the depth information using stereo matching
algorithms. Stereo matching relies on the principle of triangulation, wherein
corresponding points in stereo image pairs are identified and used to estimate the
distance or depth of objects in the scene. By analyzing the disparities between
corresponding points in the left and right images, the system can infer the relative
distances of objects from the camera.
Fig. 3.2 Distance
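As a concrete illustration of this triangulation principle, the short Python sketch below
converts a disparity value into a depth estimate; the focal length, baseline, and disparity
values shown are hypothetical placeholders, since the real values come from camera
calibration.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    # Z = f * B / d for a rectified stereo rig; a non-positive disparity means no valid match.
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px

# Example with assumed values: f = 700 px, B = 0.012 m, d = 42 px gives Z = 0.2 m.
print(depth_from_disparity(42, 700, 0.012))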
CHAPTER 4
SYSTEM SPECIFICATION
One of Python's standout features is its interpretive nature, facilitating rapid
prototyping and interactive development. Developers can swiftly experiment with
code snippets or debug in real-time using Python's interactive interpreter,
enhancing productivity and fostering a seamless development experience.
Furthermore, Python's cross-platform compatibility ensures that code written on
one operating system can effortlessly run on others, offering unparalleled flexibility
for developers across different environments.
The Python ecosystem thrives on collaboration and innovation, with a vibrant
community continuously enriching it with an array of third-party libraries and
frameworks. From web frameworks like Django and Flask to data science tools
such as NumPy and pandas, Python empowers developers to tackle a myriad of
tasks with ease. Its applications span diverse domains, including web development,
data analysis, scientific computing, machine learning, automation, and more.
In terms of syntax, Python's simplicity shines through, with intuitive constructs for
variables, control flow, functions, classes, and file I/O. This simplicity not only
enhances readability but also accelerates the learning curve for beginners
transitioning into programming. Additionally, Python's support for object-oriented,
procedural, and functional programming paradigms ensures adaptability to varying
project requirements and coding styles.
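As a brief illustration of these constructs (variables, control flow, a function definition,
and file I/O), consider the following minimal sketch; the file name used is hypothetical.

def count_long_lines(path, threshold=80):
    # Count lines longer than `threshold` characters in a text file.
    count = 0
    with open(path) as handle:                       # file I/O via a context manager
        for line in handle:                          # loop (control flow)
            if len(line.rstrip("\n")) > threshold:   # conditional
                count += 1
    return count

print(count_long_lines("report.txt"))                # hypothetical input file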
Python's journey from a humble scripting language to a powerhouse in the tech
industry underscores its enduring appeal and relevance. Its user-friendly design,
coupled with an extensive ecosystem and strong community support, cements
Python's position as a top choice for developers worldwide. Whether you're a
seasoned programmer or a newcomer embarking on your coding journey, Python's
charm and utility make it an indispensable tool for turning ideas into reality.
Web frameworks like Django and Flask empower developers to build robust and scalable web
applications and APIs with ease. Meanwhile, Python's prowess in data analysis and
visualization is evident through libraries such as NumPy, pandas, and Matplotlib,
enabling data scientists to manipulate and visualize data effortlessly. For desktop
GUI applications, Python offers Tkinter, PyQt, and wxPython, providing
developers with tools to create intuitive user interfaces. In the realm of gaming,
Pygame facilitates the development of 2D games, while frameworks like Panda3D
and Godot Engine support 3D game development. Python's scripting capabilities
make it indispensable for automation tasks, from file manipulation to system
administration. Moreover, Python finds application in mobile app development
through frameworks like Kivy and BeeWare, enabling developers to write code
once and deploy it across multiple platforms. In the IoT space, Python's simplicity
and flexibility shine, with libraries like MicroPython and CircuitPython tailored for
microcontroller programming. Scientific computing and engineering benefit from
Python's rich ecosystem of libraries such as SciPy, SymPy, and OpenCV,
facilitating simulations, modeling, and analysis tasks. Finally, Python's readability
and simplicity make it an ideal choice for educational purposes, with many
institutions using it to teach programming fundamentals to beginners. With its
extensive ecosystem and active community support, Python continues to be a top
choice for developers across diverse application domains.
Python’s native libraries and third-party web frameworks provide fast and
convenient ways to create everything from simple REST APIs in a few lines of
code to full-blown, data-driven sites. Python’s latest versions have strong support
for asynchronous operations, letting sites handle tens of thousands of requests per
second with the right libraries.
Python 3 adoption was slowed for the longest time by the relative lack of
third-party library support. Many Python libraries supported only Python 2, making
it difficult to switch.
But over the last couple of years, the number of libraries supporting only Python 2
has dwindled; all of the most popular libraries are now compatible with both
Python 2 and Python 3. Today, Python 3 is the best choice for new projects; there
is no reason to pick Python 2 unless you have no choice.
Python’s libraries
The success of Python rests on a rich ecosystem of first- and third-party software.
Python benefits from both a strong standard library and a generous assortment of
easily obtained and readily used libraries from third-party developers. Python has
been enriched by decades of expansion and contribution.
Python’s standard library provides modules for common programming tasks:
math, string handling, file and directory access, networking, asynchronous
operations, threading, multiprocessing, and so on. But it also includes
modules that manage common, high-level programming tasks needed by modern
applications: reading and writing structured file formats like JSON and XML,
manipulating compressed files, working with internet protocols and data formats
(web pages, URLs, email). Almost any external code that exposes a C-
compatible foreign function interface can be accessed with Python’s ctypes module.
The default Python distribution also provides a rudimentary, but useful, cross-
platform GUI library via Tkinter, and an embedded copy of the SQLite 3 database.
The thousands of third-party libraries, available through the Python Package
Index (PyPI), constitute the strongest showcase for Python’s popularity and
versatility.
For example:
The Beautiful Soup library provides an all-in-one toolbox for scraping HTML,
even tricky, broken HTML, and extracting data from it.
Requests makes working with HTTP requests at scale painless and simple.
Frameworks like Flask and Django allow rapid development of web
services that encompass both simple and advanced use cases.
Like C#, Java, and Go, Python has garbage-collected memory management,
meaning the programmer doesn’t have to implement code to track and release
objects. Normally, garbage collection happens automatically in the background, but
if that poses a performance problem, you can trigger it manually or disable it
entirely, or declare whole regions of objects exempt from garbage collection as a
performance enhancement.
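For example, the standard library's gc module exposes this control directly; the snippet
below is a minimal sketch of manual collection around a latency-sensitive section.

import gc

gc.disable()    # turn off automatic collection for a latency-sensitive section
# ... allocate and process frames here ...
gc.collect()    # trigger a collection manually at a convenient point
gc.enable()     # restore automatic collection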
An important aspect of Python is its dynamism. Everything in the language,
including functions and modules themselves, is handled as an object. This comes at
the expense of speed (more on that later), but makes it far easier to write high-level
code.
Developers can perform complex object manipulations with only a few
instructions, and even treat parts of an application as abstractions that can be
altered if needed.
Python’s use of significant whitespace has been cited as both one of
Python’s best and worst attributes. The indentation on the second line below isn’t
just for readability; it is part of Python’s syntax.
Python interpreters will reject programs that don’t use proper indentation to
indicate control flow.
with open('myfile.txt') as my_file:
    file_lines = [x.strip('\n') for x in my_file]
Syntactical white space might cause noses to wrinkle, and some people do
reject Python for this reason. But strict indentation rules are far less obtrusive in
practice than they might seem in theory, even with the most minimal of code
editors, and the result is code that is cleaner and more readable.
Another potential turnoff, especially for those coming from languages like C
or Java, is how Python handles variable typing. By default, Python uses dynamic or
"duck" typing, which is great for quick coding but potentially problematic in large
code bases. That said, Python has added support for optional type
hints, so projects that might benefit from static type checking can use them.
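A short, hypothetical sketch of optional type hints; tools such as mypy can check these
annotations without affecting runtime behaviour.

def disparity_to_depth(disparity: float, focal_px: float, baseline_m: float) -> float:
    # Convert a disparity (pixels) to depth (metres) for a rectified stereo rig.
    return focal_px * baseline_m / disparity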
Using an IDE
As good as dedicated program editors can be for your programming productivity,
their utility pales in comparison to Integrated Development
Environments (IDEs), which offer many additional features such as in-editor
debugging and program testing, as well as function descriptions and much
more.
Web Framework
A web application framework, or simply web framework, represents
a collection of libraries and modules that enables a web application
developer to write applications without having to bother about low-level details
such as protocols, thread management, etc.
CSS (Cascading Style Sheets) brings presentation to web content, transforming plain markup into visually compelling
and aesthetically pleasing designs. By defining stylesheets that dictate the
appearance of HTML elements, developers can control aspects such as typography,
color schemes, layout, and responsive behavior. CSS offers a powerful set of tools
for achieving pixel-perfect designs, enabling developers to create visually stunning
websites that resonate with users across various devices and screen sizes.
JavaScript elevates web development to new heights by introducing interactivity,
dynamic behavior, and client-side scripting capabilities to web pages. As a versatile
programming language, JavaScript empowers developers to create interactive
features, handle user input, manipulate the DOM (Document Object Model), and
communicate with web servers asynchronously. With JavaScript, developers can
build sophisticated web applications that respond to user actions in real-time,
deliver personalized experiences, and seamlessly integrate with backend services
and APIs.
Together, HTML, CSS, and JavaScript form a powerful trio that enables
developers to design, build, and deploy web applications that captivate audiences
and deliver exceptional user experiences. By mastering these technologies and
leveraging their unique strengths, developers can unleash their creativity and bring
their web development visions to life in a dynamic and ever-evolving digital
landscape.
4.3.3 FLASK
Flask's simplicity extends beyond its minimalist design to its ease of use and
rapid development capabilities. With Flask, developers can quickly set up a
development environment and start coding without the overhead of complex
configuration. Its lightweight nature and modular architecture enable developers to
focus on building the core functionality of their applications without being bogged
down by unnecessary features or constraints.
One of Flask's strengths lies in its adaptability to various project requirements and
development workflows. Whether building a small prototype, a RESTful API, or a
full-fledged web application, Flask provides the flexibility to tailor the
development process to specific needs. Developers can choose from a wide range
of extensions to integrate additional features seamlessly, ensuring scalability and
extensibility as projects evolve.
Flask's minimalist approach also extends to its learning curve, making it an
excellent choice for developers of all skill levels. Beginners can quickly grasp
Flask's concepts and start building simple web applications, while experienced
developers can leverage its flexibility to tackle more complex projects. The
abundance of tutorials, guides, and community resources further facilitates the
learning process, empowering developers to master Flask and unlock its full
potential.
Moreover, Flask's focus on simplicity does not come at the expense of performance
or reliability. Despite its lightweight footprint, Flask is robust and capable of
handling high-traffic applications with ease. Its minimal overhead and efficient
request handling ensure optimal performance, while its built-in support for testing
enables developers to write reliable and maintainable code.
In summary, Flask's minimalist design, flexibility, ease of use, and performance
make it a compelling choice for web development projects of all sizes and
complexities. Whether building a simple blog, a RESTful API for a mobile app, or
a sophisticated web application, Flask provides the tools and resources needed to
bring ideas to life quickly and efficiently, making it a favorite among developers
worldwide.
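As a minimal sketch of this low-overhead style (the route, parameters, and calibration
values below are hypothetical illustrations, not this project's actual code):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/depth", methods=["POST"])
def depth():
    # Hypothetical endpoint: accept a disparity value and return an estimated depth.
    data = request.get_json(force=True)
    disparity = float(data.get("disparity", 0))
    focal_px, baseline_m = 700.0, 0.012      # assumed calibration values
    depth_m = focal_px * baseline_m / disparity if disparity > 0 else None
    return jsonify({"depth_m": depth_m})

if __name__ == "__main__":
    app.run(debug=True)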
CHAPTER 5
SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE
The system architecture, depicted in Fig. 5.1, encompasses the following
components and functionalities: Real-time images are acquired from one or more
cameras, serving as the input source for the system. The system utilizes the smart
phone as the central hub for data capture and processing.
The Image Capture App is responsible for acquiring images through the camera,
which are then passed to the Stereo Cameras for stereoscopic vision. The Stereo
Image Processing stage prepares these images for further analysis, leading to the
application of the Stereo Matching Algorithm. This algorithm computes a Disparity
Map, providing depth information from the stereo image pairs. Concurrently, the
system detects Portrait Mode, determining the orientation of the captured image.
The Disparity & Portrait Mode Comparison stage juxtaposes these two pieces of
information to assess if any obstacles are present. Once obstacles are detected, the
Distance Calculation for Obstacle computes the distance to these obstacles using
the disparity map. Finally, the Stop Decision Logic evaluates the calculated
distance and makes a decision whether to halt or proceed based on the presence and
proximity of obstacles. This structured flow ensures a systematic approach to
obstacle detection and decision-making in real-time scenarios, contributing to
enhanced safety and efficiency.
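The following Python sketch summarizes this flow in code form; the function name,
thresholds, and calibration constants are illustrative assumptions (OpenCV's StereoSGBM
is assumed for the matching step), not the project's actual implementation.

import cv2

FOCAL_PX, BASELINE_M = 700.0, 0.012   # assumed calibration values
STOP_DISTANCE_M = 1.0                 # assumed safety threshold

def process_stereo_pair(left_gray, right_gray):
    # Stereo matching -> disparity map -> obstacle distance -> stop decision.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = matcher.compute(left_gray, right_gray).astype("float32") / 16.0

    nearest_disparity = float(disparity.max())   # largest disparity = nearest point
    if nearest_disparity <= 0:
        return disparity, None, False            # no reliable match found

    obstacle_distance = FOCAL_PX * BASELINE_M / nearest_disparity
    stop = obstacle_distance < STOP_DISTANCE_M   # stop-decision logic
    return disparity, obstacle_distance, stop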
5.2 DATA FLOW DIAGRAM
A data store: a place where data is held between processes.
A data flow.
5.3 CLASS DIAGRAM
The central class is the 'Stereo Matching Algorithm', which encapsulates the
attribute 'Video Capture'. This attribute represents the camera and the module
responsible for detecting obstacles. It also contains a method '#Get_frame()' to
extract one frame from the video. The 'Disparity Mapping to Depth' class is
characterized by attributes such as 'depth', defining the distance of the captured
image, 'focal length' and 'baseline'. It also contains a method
'#Get_DisparityMap()' to acquire the disparity map of the image and '#Get_Portrait()' to
acquire the portrait of the image. This class hierarchy facilitates the implementation
of stereo matching and obstacle detection functionalities on the camera, enabling
effective obstacle detection for various applications.
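The following Python sketch mirrors the two classes described above; the attribute and
method names follow the diagram, while the bodies are illustrative assumptions rather
than the project's actual code.

import cv2

class StereoMatchingAlgorithm:
    # Wraps video capture and frame extraction for the stereo pipeline.
    def __init__(self, camera_index=0):
        self.video_capture = cv2.VideoCapture(camera_index)

    def get_frame(self):
        ok, frame = self.video_capture.read()
        return frame if ok else None

class DisparityMappingToDepth:
    # Holds calibration values and converts disparity maps to depth.
    def __init__(self, focal_length_px, baseline_m):
        self.focal_length = focal_length_px
        self.baseline = baseline_m
        self.depth = None

    def get_disparity_map(self, left_gray, right_gray):
        matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
        return matcher.compute(left_gray, right_gray).astype("float32") / 16.0

    def get_portrait(self, image):
        # Portrait orientation check: the frame is taller than it is wide.
        height, width = image.shape[:2]
        return height > width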
5.4 ACTIVITY DIAGRAM
The activity flow proceeds to 'Disparity Map Generation', where the disparity map is constructed, providing
depth information. Next, the system proceeds to 'Detect Portrait Mode' to identify
whether the captured image is in portrait mode. Upon completion, the system
engages in 'Compare Disparity and Portrait', where the disparity map and portrait
mode information are juxtaposed to detect obstacles. Subsequently, the 'Calculate
Distance for Obstacle' activity calculates the distance to the detected obstacles
based on the disparity map. Finally, the system executes the 'Stop Decision Logic'
activity to determine whether to halt or continue based on the calculated obstacle
distances. The sequence concludes with the 'Stop' node, representing the
termination of the system's operation. This Activity Diagram provides a clear
visualization of the system's operational flow, facilitating understanding and
communication of the sequential activities involved in obstacle detection and
decision-making.
5.5 SEQUENCE DIAGRAM
This Sequence Diagram provides a comprehensive view of the sequential
interactions between the camera and the Stereomatching. It begins with the 'Start'
message, indicating the initialization of the system on the camera. Upon system
initialization, the camera sends a 'Capture Image' message to the Stereomatching,
initiating the capture of an image. Once the image is captured, the process proceeds
by converting it into a disparity map and extracting the portrait mode of the captured
image, followed by the 'Compare Disparity and Portrait' step, where the disparity map
and portrait mode information are evaluated to potentially identify obstacles.
Subsequently, the module calculates the distance to any detected obstacles
('Calculate Distance for Obstacle') and makes a decision to stop or proceed based
on obstacle detection ('Stop Decision'). Finally, the 'Stop' message is sent back to
the camera, marking the conclusion of the system's operation. This Sequence
Diagram offers a clear depiction of the sequential interactions and message passing
between the camera and the stereomatching, facilitating understanding of the
system's operational flow.
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 STEREO MATCHING ALGORITHM
Stereo matching is a computer vision technique used to extract depth
information from a pair of stereo images captured by cameras. It involves
identifying corresponding points in the left and right images and then computing
the disparity between these points, which represents the perceived shift in position
due to parallax. The stereo matching algorithm aims to find the best
correspondence between points in the left and right images, typically by
minimizing a cost function that measures the similarity between image patches or
pixels. In this project, the Semi-Global Block Matching (SGBM) algorithm is used for this purpose.
6.1.1 SEMI-GLOBAL BLOCK MATCHING (SGBM)
Semi-Global Block Matching (SGBM) is a stereo matching algorithm that
combines the strengths of both block matching and global optimization techniques.
It aims to find the disparity map between a pair of stereo images by considering the
local similarities between image patches while also incorporating global constraints
to ensure consistency and accuracy.
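A minimal sketch of disparity computation with OpenCV's StereoSGBM implementation; the
file names are placeholders and the parameter values are illustrative defaults rather
than tuned values from this project.

import cv2

# Load a rectified stereo pair in grayscale (file names are hypothetical).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

block_size = 5
matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,          # must be a multiple of 16
    blockSize=block_size,
    P1=8 * block_size ** 2,     # smoothness penalty for small disparity changes
    P2=32 * block_size ** 2,    # stronger penalty for large disparity jumps
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# compute() returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype("float32") / 16.0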
Calibration:
Calibration is the process of determining the intrinsic and extrinsic parameters of the
camera system. This process identifies the focal length and principal point as
intrinsic parameters, and the translation vector and rotation matrix as extrinsic
parameters.
Extrinsic calibration converts World Coordinates to Camera Coordinates. The
extrinsic parameters are called R (rotation matrix) and T (translation vector).
Intrinsic calibration converts Camera Coordinates to Pixel Coordinates. It requires
inner values of the camera, such as the focal length and optical center. The intrinsic
parameter is a matrix we call K.
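For reference, K has the standard pinhole-camera form, with f_x and f_y the focal lengths
in pixels and (c_x, c_y) the principal point; together with [R | T] it maps a world point
to pixel coordinates:

K = [ f_x   0    c_x ]
    [ 0     f_y  c_y ]
    [ 0     0    1   ]

s · [u, v, 1]^T = K · [R | T] · [X, Y, Z, 1]^T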
Calibration procedure
Step 1: Checkerboard pattern
This is the first step in the calibration procedure and is the most commonly used
calibration pattern for camera calibration. The control points of the image for this
pattern are the corners that lie inside the checkerboard. Because corners are
extremely small, they are often invariant to perspective and lens distortion.
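A short OpenCV sketch of this corner-detection step; the image file name and board
dimensions are hypothetical.

import cv2

img = cv2.imread("checkerboard.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical calibration image
pattern_size = (9, 6)           # inner corners per row and column (assumed board)

found, corners = cv2.findChessboardCorners(img, pattern_size)
if found:
    # Refine the detected corner locations to sub-pixel accuracy.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    corners = cv2.cornerSubPix(img, corners, (11, 11), (-1, -1), criteria)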
features in both images and computing the disparity between them. The greater the
disparity, the closer the object is to the cameras.
Fig 6.1.1.4 Corner Detection
Step 4: Calibration
Calibration in stereo matching on camera is essential for correcting lens distortions,
estimating intrinsic and extrinsic camera parameters, and rectifying stereo image
pairs. Lens distortion correction ensures accurate depth calculation by mitigating
errors caused by lens imperfections. Intrinsic parameter estimation, including focal
length and principal point, aids in accurately aligning images and computing
disparities. Extrinsic parameter estimation determines the relative orientation and
position of the cameras, crucial for precise depth calculation. Rectification
simplifies stereo matching by aligning corresponding epipolar lines, facilitating
accurate correspondence between image pairs. Calibration enhances accuracy and
precision in depth estimation, vital for reliable obstacle detection on camera. It
minimizes errors from camera imperfections and misalignments, ensuring
dependable depth calculation results. Overall, calibration optimizes stereo vision
systems for accurate depth perception, critical for various applications, including
obstacle avoidance and augmented reality on camera.
Step 5: Rectification
This is the last step in the calibration procedure. It aligns stereo image pairs,
making corresponding epipolar lines parallel. This simplifies feature matching,
enhancing depth estimation accuracy.
Rectification corrects perspective distortion, ensuring disparities accurately
represent depth differences. It eliminates vertical disparities, reducing
computational complexity in matching. Rectification improves algorithm
robustness by reducing sensitivity to variations. By aligning images, rectification
enhances reliability for obstacle detection. It optimizes stereo vision systems for
accurate depth perception on camera. This process enhances real-time performance
crucial for dynamic environments. Rectification simplifies stereo matching, aiding
efficient correspondence between images. It mitigates errors caused by
misalignments and perspective variations. Ultimately, rectification in calibration
ensures precise depth estimation for obstacle detection.
Fig 6.1.1.5 Rectification
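A condensed OpenCV sketch of calibration followed by rectification; the function below is
an illustrative wrapper, with its inputs (checkerboard point lists, initial intrinsics,
image size, raw frames) assumed to come from the earlier steps.

import cv2

def rectify_stereo_pair(object_points, left_points, right_points,
                        K1, D1, K2, D2, image_size, left_img, right_img):
    # Estimate the relative rotation R and translation T between the two cameras.
    _, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        object_points, left_points, right_points,
        K1, D1, K2, D2, image_size, flags=cv2.CALIB_FIX_INTRINSIC)

    # Compute rectification transforms so corresponding epipolar lines become horizontal.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)

    # Build remapping tables and warp the raw images into the rectified frame.
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
    left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
    return left_rect, right_rect, Q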
Sum of Absolute Differences (SAD):
SAD = Σ_(i,j) | I_left(i, j) − I_right(i + d, j) |
Sum of Squared Differences (SSD):
SSD = Σ_(i,j) ( I_left(i, j) − I_right(i + d, j) )²
Normalized Cross-Correlation (NCC):
NCC = Σ_(i,j) ( I_left(i, j) − Ī_left )( I_right(i + d, j) − Ī_right )
      / √[ Σ_(i,j) ( I_left(i, j) − Ī_left )² · Σ_(i,j) ( I_right(i + d, j) − Ī_right )² ]
where Ī_left and Ī_right are the mean intensities of the left and right
image patches, respectively.
3. Matching Cost Calculation: The matching cost is computed for each
candidate pixel by applying the chosen metric to compare its intensity value
with that of the pixel in the left image. The lower the matching cost, the
better the match between the pixel pairs.
4. Cost Volume Construction: These matching costs are typically organized
into a cost volume, where each pixel in the left image corresponds to a
disparity range along the horizontal axis, and the value at each disparity
represents the matching cost between the left pixel and the candidate pixel in
the right image at that particular disparity.
5. Disparity Estimation: The disparity (horizontal offset) corresponding to the
minimum matching cost is determined for each pixel in the left image. This
disparity value represents the perceived shift between corresponding points
in the left and right images and is used to calculate the depth of the scene.
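A compact NumPy sketch of steps 3 to 5 for a single rectified scanline, using SAD as the
matching cost; the window size and disparity range are illustrative choices.

import numpy as np

def disparity_scanline(left_row, right_row, max_disp=64, half_win=3):
    # Brute-force SAD block matching along one rectified scanline (illustrative only).
    width = left_row.shape[0]
    cost_volume = np.full((width, max_disp), np.inf)           # step 4: cost volume
    for x in range(half_win, width - half_win):
        left_patch = left_row[x - half_win: x + half_win + 1].astype(np.float32)
        for d in range(min(max_disp, x - half_win + 1)):
            right_patch = right_row[x - d - half_win: x - d + half_win + 1].astype(np.float32)
            cost_volume[x, d] = np.abs(left_patch - right_patch).sum()   # step 3: SAD cost
    return np.argmin(cost_volume, axis=1)                      # step 5: winner-takes-all disparity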
6.1.3 DISPARITY MAPPING AND PORTRAIT
DISPARITY MAP
A Disparity map is a visual representation of the perceived depth in a scene.
Computed from stereo image pairs captured by the cameras, the disparity map
illustrates the horizontal shift or disparity between corresponding points in the left
and right images. Each pixel in the disparity map corresponds to a pixel in the left
image, with its intensity value indicating the calculated disparity. Brighter regions
denote closer objects or higher disparities, while darker regions represent farther
objects or lower disparities. Depth estimation can be derived from the disparity
values, enabling accurate distance calculation to objects in the scene.
Fig 6.1.3.1 Difference of Normal and Disparity map
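For display, the floating-point disparities can be scaled to an 8-bit image, for example
with OpenCV as in the brief sketch below.

import cv2
import numpy as np

def disparity_to_image(disparity):
    # Scale a float disparity map to 0-255 so closer (high-disparity) pixels appear brighter.
    vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
    return vis.astype(np.uint8)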
PORTRAIT IMAGE
Portrait images can provide valuable depth information for obstacle detection and
scene understanding. The stereo matching algorithm processes the stereo image
pair captured in portrait orientation to calculate depth maps, which represent the
spatial distribution of obstacles in the scene. These depth maps are then used to
infer distances to objects, aiding in obstacle detection and avoidance tasks. Portrait
images offer unique challenges and opportunities for stereo matching algorithms
due to their vertical orientation, requiring specialized processing techniques to
accurately estimate depth and calculate obstacle distances effectively.
Fig 6.1.3.2 Portrait image
CHAPTER 7
SYSTEM TESTING
System testing is a crucial phase in software development where the entire system
is tested as a whole. It aims to validate that the integrated software and hardware
components function correctly and meet the specified requirements.
The primary objectives of the system testing are to ensure that the system operates
according to the specified requirements, performs all intended functions
accurately, handles expected and unexpected inputs appropriately, and meets
quality standards.
The most common types of testing involved in the development process are:
Functionality Test
Non Functionality Test
The test cases were designed to cover a broad spectrum of real-world scenarios.
During the testing process, we diligently recorded performance metrics such as
execution times and throughput, ensuring a thorough analysis of the algorithm's
behaviour.
The performance testing offers insights into the stereo matching algorithm's
performance, ensuring its effectiveness and reliability in real-world environments.
Usability testing focused on the stereo matching algorithm. Our goal was to ensure that users, including developers
and end-users, can interact with the algorithm effectively and efficiently.
During usability testing, participants were provided with access to the algorithm's
user interface or configuration interface, depending on the context of usage. They
were given tasks to perform, such as adjusting algorithm parameters, inputting
stereo image pairs, and interpreting the algorithm's output, representing typical
usage scenarios.
Participants' interactions with the algorithm were observed and recorded, along
with any difficulties or challenges encountered during the testing process. The
usability testing process identified areas for improvement in the algorithm's user
interface, configuration options, and documentation.
By conducting thorough reliability testing, we ensure that the stereo matching
algorithm consistently delivers accurate results, providing users with confidence
in its performance and reliability in real-world applications.
CHAPTER 8
EXPERIMENTAL RESULT
The evaluated approaches were Block Matching, Semi-Global Matching (SGM),
Semi-Global Block Matching (SGBM), and deep learning-based matching.
CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT
9.1 CONCLUSION
In conclusion, the implementation of stereo matching algorithms for depth
calculation of obstacles represents a significant advancement in computer vision
technology. By leveraging stereo image pairs captured by cameras, these
algorithms enable accurate depth estimation and obstacle detection in real-time
scenarios. Through processes such as cost calculation, disparity mapping, and
depth estimation, stereo matching algorithms provide valuable spatial
understanding of the environment, crucial for applications ranging from augmented
reality to navigation assistance. Furthermore, the utilization of portrait images
enhances depth perception and obstacle detection capabilities, particularly in
human-centric scenes and environments with vertical structures.
9.2 FUTURE ENHANCEMENT
Future enhancements could integrate learning-based refinement into the present
stereo vision pipeline, improving depth estimation accuracy, especially in challenging
environments.
CHAPTER 10
APPENDICES
10.1 SOURCE CODE
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
body {
  margin: 0;
  padding: 0;
  overflow: hidden;
}
#container {
  position: relative;
  width: 100%;
  height: auto;
  text-align: center;
  display: block;
  margin: auto;
}
#video {
  display: none;   /* the raw camera feed is hidden; frames are drawn onto the canvas */
}
#output {
  width: 100%;
}
#depthInfo {
  font-size: 18px;
  font-weight: bold;
  color: blue;
  margin-top: 10px;
}
@media (max-width: 600px) {
  #depthInfo {
    font-size: 16px;
  }
}
</style>
</head>
<body>
<div id="container">
  <video id="video" autoplay playsinline></video>
  <canvas id="output"></canvas>
  <p id="depthInfo"></p>
</div>
<script>
let classifier;
let prevFrame = null;

const video = document.getElementById('video');
const canvas = document.getElementById('output');
const ctx = canvas.getContext('2d');
const depthInfo = document.getElementById('depthInfo');

// Request camera access and resolve once the video metadata is available.
async function setupCamera() {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
    return new Promise((resolve) => {
      video.onloadedmetadata = () => {
        resolve(video);
      };
    });
  } catch (err) {
    alert('Error accessing the camera. Please make sure you have granted camera permission.');
    return null;
  }
}

// Block matching (simplest version): per-pixel absolute difference of the red
// channel between consecutive frames, clamped to 255 to prevent overflow.
function computeDepthMap(currentFrame, previousFrame) {
  const depthMap = new Uint8ClampedArray(currentFrame.data.length / 4);
  for (let idx = 0; idx < depthMap.length; idx++) {
    const diff = Math.abs(currentFrame.data[idx * 4] - previousFrame.data[idx * 4]);
    depthMap[idx] = diff;
  }
  return depthMap;
}

// Draw the depth map as a grayscale image and report the average value.
function renderDepth(depthMap) {
  const imageData = ctx.createImageData(canvas.width, canvas.height);
  let sum = 0;
  for (let i = 0; i < depthMap.length; i++) {
    const pixelValue = depthMap[i];
    const normalizedValue = Math.min(255, pixelValue); // Ensure pixel value stays in range
    imageData.data[i * 4] = normalizedValue;
    imageData.data[i * 4 + 1] = normalizedValue;
    imageData.data[i * 4 + 2] = normalizedValue;
    imageData.data[i * 4 + 3] = 255;
    sum += normalizedValue;
  }
  ctx.putImageData(imageData, 0, 0);
  depthInfo.textContent = 'Average depth value: ' + (sum / depthMap.length).toFixed(2);
}

// Outline each detected object with a red rectangle.
function drawBoundingBoxes(objects) {
  ctx.strokeStyle = 'red';
  ctx.lineWidth = 2;
  objects.forEach((obj) => {
    const rect = { x: obj.bbox[0], y: obj.bbox[1], width: obj.bbox[2], height: obj.bbox[3] };
    const x = rect.x;
    const y = rect.y;
    const w = rect.width;
    const h = rect.height;
    ctx.strokeRect(x, y, w, h);
  });
}

// Main loop: grab a frame, compute and render the depth map, then run object detection.
function processFrame() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const currentFrame = ctx.getImageData(0, 0, canvas.width, canvas.height);
  if (prevFrame) {
    const depthMap = computeDepthMap(currentFrame, prevFrame);
    if (depthMap) {
      renderDepth(depthMap);
    }
  }
  prevFrame = currentFrame;

  // Object detection
  if (classifier) {
    classifier.detect(canvas).then(objects => {
      drawBoundingBoxes(objects);
    });
  }
  requestAnimationFrame(processFrame);
}

// Load the COCO-SSD model, then start the camera and the processing loop.
function loadModel() {
  cocoSsd.load().then(model => {
    classifier = model;
    start();
  });
}

async function start() {
  const videoElement = await setupCamera();
  if (videoElement) {
    videoElement.play();
    canvas.width = videoElement.videoWidth;
    canvas.height = videoElement.videoHeight;
    requestAnimationFrame(processFrame);
  }
}

// Wait for the CDN scripts below to load before requesting the model.
window.addEventListener('load', loadModel);
</script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>
</body>
</html>
10.2 SCREENSHOTS
REFERENCES
[1] J. Zhang, H. Yang, J. Ren, D. Zhang, B. He, T. Cao, Y. Li, Y. Zhang, and Y. Liu,
"MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras," 2022.
[2] M. Yoshizawa, K. Motegi, and Y. Shiraishi, "A Deep Learning-Enhanced Stereo Matching
Method and Its Application to Bin Picking Problems Involving Tiny Cubic Workpieces," 2023.
[4] A. Trif, F. Oniga, and S. Nedevschi, "Stereovision on Mobile Devices for Obstacle
Detection in Low Speed Traffic Scenarios," published in International Conference on…,
24 October 2013.
[5] "Stereo Vision Based Sensory Substitution for the Visually Impaired," Sensors (Basel),
19(12): 2771, published online 20 June 2019, doi: 10.3390/s19122771.
[6] T. Sun, W. Pan, Y. Wang, and Y. Liu, "Region of Interest Constrained Negative Obstacle
Detection and Tracking with a Stereo Camera."
[7] M. S. Hamid, N. F. Abd Manap, R. A. Hamzah, and A. F. Kadmin, "Stereo Matching
Algorithm Based on Deep Learning: A Survey."
[8] S. Caraiman, O. Zvoristeanu, A. Burlacu, and P. Herghelegiu, "Stereo Vision Based
Sensory Substitution for the Visually Impaired," Sensors, 19(12), 2771, 2019.
[9] J. Chen and Z. Zhu, "Real-Time 3D Object Detection and Recognition Using a Smartphone,"
Proceedings of the 2nd International Conference on Image Processing and Vision
Engineering (IMPROVE), 2022.