
A HoloLens Framework for
Augmented Reality Applications in
Breast Cancer Surgery
by

George Poothicottu Jacob

A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2018

© George Poothicottu Jacob 2018


I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis,
including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Abstract

This project aims to support oncologic breast-conserving surgery by creating a platform for better surgical planning: a framework capable of displaying a virtual model of the tumour(s) requiring surgery on a patient's breast. Breast-conserving surgery is the first clear option when it comes to tackling cases of breast cancer, but the surgery comes with risks. The surgeon must maintain clean margins while performing the procedure so that the disease does not resurface. This underscores the importance of surgical planning, during which the surgeon consults with radiologists and relies on pre-surgical imaging such as Magnetic Resonance Imaging (MRI). The MRI prior to the surgical procedure, however, is acquired with the patient in the prone position (face-down), while the surgery happens in the supine position (face-up). Mapping the location of the tumour(s) from the MRI to the corresponding anatomical position is thus a tedious task that requires a large amount of expertise and time, given that the organ is soft and flexible.
For this project, the tumour is visualized in the corresponding anatomical position to assist in surgical planning. Augmented Reality is well suited to this problem, which in turn led to an investigation of the application capability of the Microsoft HoloLens to solve it. Given its multitude of sensors and its display resolution, the device is a fine candidate for this purpose. However, the HoloLens is still under development, with a large number of limitations in its use. This work tries to compensate for these limitations using the existing hardware and software in the device's arsenal.
Within this master's thesis, the principal questions answered relate to acquiring data from breast-mimicking objects at acceptable resolutions, discriminating among the acquired information based on photometry, offloading the data to a computer for post-processing to create a correspondence between the MRI data and the acquired data, and finally retrieving the processed information so that the MRI information can be used to visualize the tumour in the anatomically precise position. Unfortunately, the time limitations of this project led to a system that is not yet completely synchronized; however, our work has solidified the grounds for the software aspects of the final goals, so that extensive further exploration is needed only on the imaging side of this problem.

Acknowledgements

After a process spanning 16 months, today I am writing the final piece of my master's thesis. The past months have been an intense learning experience for me on an educational and personal level. Knowing full well that no number of words would be enough to describe how grateful I am to the people who have continuously supported me and helped me make this happen, I hope that this page is a start.
First and foremost, I would like to thank my Supervisor Prof. Oleg Michailovich and
Co-Supervisor Prof. William D. Bishop without whom this thesis would not have been
possible. During my time at the University of Waterloo, they have helped me expand and
deepen my knowledge through the process of my research, all the while being incredibly
supportive and motivational. José, Amir, Hossein, Rui, Rinat and Ming, sharing an office
with every single one of you has been an amazing experience.
To Robin, Shalabha, Vimala, Bharat, Mani, JJ, Hareesh, Sara, Vikky, Mahesh, Nirmal
and Ranga, thank you for making Waterloo a home away from home. The past two years
in this city were amazing because of all of you. The experiences we shared are something
I will always cherish in the years ahead.
A special shout-out to my buddies back in India - Akhila, Hemanth, Aby, Jacob and
Thareeq, who, despite the distance, always made me feel like they lived right next door.
Finally, I thank my family, for all the love, care and belief I’ve been lucky enough to
receive.
From the bottom of my heart, I thank every single one of you; without you, this thesis
would never have been completed.

Dedication

To Papa, Amma and Ichuppi.

Table of Contents

List of Tables ix

List of Figures x

Abbreviations xii

1 Introduction 1
1.1 Augmented Reality and its Domain . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Augmented Reality and Medical Science . . . . . . . . . . . . . . . . . . . 4
1.3 Moving Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Technical Background 8
2.1 An Overview of the HoloLens Hardware . . . . . . . . . . . . . . . . . . . 8
2.1.1 Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Optics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Basics for HoloLens Application Development . . . . . . . . . . . . . . . . 10
2.2.1 Creating a Basic HoloLens Application . . . . . . . . . . . . . . . . 10
2.2.2 Unity Engine Environment Definitions . . . . . . . . . . . . . . . . 12
2.2.3 Object Representation in the HoloLens Environment . . . . . . . . 14
2.2.4 Augmentation Basics and Real-time Mesh Generation . . . . . . . . 15

2.2.5 SDK and Other Software Tools . . . . . . . . . . . . . . . . . . . . 17
2.2.6 Mesh Storage Medium for Post Processing . . . . . . . . . . . . . . 19
2.2.7 Spatial Mapping and Spatial Understanding within the Unity UI . . 20
2.2.8 Accessing Spatial Understanding and Spatial Mapping Methods from
External Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.9 Viewing a Virtual Object within a Real-World Object . . . . . . . . 24
2.3 Limitations for the HoloLens in terms of the Project . . . . . . . . . . . . . 27
2.4 Basics of Surface Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 An Overview of Surface Reconstruction . . . . . . . . . . . . . . . . 27
2.4.2 Surface Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3 Main Contributions 33
3.1 Proposed Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.2 Post-Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.3 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Acquiring Depth and Color Information . . . . . . . . . . . . . . . . . . . . 35
3.2.1 Creating an Environment for File Saving in the HoloLens . . . . . . 35
3.2.2 Obtaining Color Information and Acquiring Color of Vertices in a
Mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.3 Collecting Mesh Information via the HoloLens SDK and MRTK . . 39
3.2.4 RGBD Module and Incremental Super Resolution . . . . . . . . . . 42
3.2.5 Pitfalls while Developing for the HoloLens . . . . . . . . . . . . . . 45
3.3 Surface Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.1 Proposed Technique for Surface Reconstruction . . . . . . . . . . . 48
3.4 Surface Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4 ISR Experimentation 55
4.1 Setup for Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Data Acquisition Program Flow . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Filtering Noise Using Color . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4 Surface Reconstruction - Setup and Results . . . . . . . . . . . . . . . . . 60
4.5 Surface Registration - Setup and Results . . . . . . . . . . . . . . . . . . . 62
4.6 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5 Conclusion and Future Work 69

References 71

APPENDICES 77

A Semantic and Syntactical References 78


A.1 Building Objects in an OBJ File . . . . . . . . . . . . . . . . . . . . . . . . 78
A.2 Creating an OBJ File Exporter . . . . . . . . . . . . . . . . . . . . . . . . 79
A.2.1 Writing to a File and Code Compile Segregation . . . . . . . . . . . 80
A.2.2 Converting C# Mesh Objects to String . . . . . . . . . . . . . . . . 81
A.2.3 Completing the OBJ Exporter Class . . . . . . . . . . . . . . . . . 84
A.3 Initializing and Using Photo Mode . . . . . . . . . . . . . . . . . . . . . . 85
A.4 A Function to Map Points from World Space to Pixel Space . . . . . . . . 89
A.5 Implementing Z-filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

List of Tables

2.1 HoloLens Built-In Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . 10


2.2 Key-Value Pairs in OBJ Files . . . . . . . . . . . . . . . . . . . . . . . . . 20

List of Figures

1.1 MRI Screening in the Prone Position . . . . . . . . . . . . . . . . . . . . . 2


1.2 Microsoft HoloLens and its Augmentation Capability . . . . . . . . . . . . 3

2.1 The Microsoft HoloLens and its Modules . . . . . . . . . . . . . . . . . . . 9


2.2 Unity ‘New Project’ Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Applying MRTK Settings to the Current Project . . . . . . . . . . . . . . 13
2.4 Unity Build Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Triangulated Surface Representation . . . . . . . . . . . . . . . . . . . . . 15
2.6 HoloLens Mesh Scan of an Office Space . . . . . . . . . . . . . . . . . . . . 16
2.7 Unity Application Capabilities List . . . . . . . . . . . . . . . . . . . . . . 18
2.8 Spatial Mapping Prefab in the Unity Inspector Panel . . . . . . . . . . . . 22
2.9 Spatial Understanding Prefab in the Unity Inspector Panel . . . . . . . . . 23
2.10 Spatial Understanding Mesh Overlay for an Object in the HoloLens View . 24
2.11 View of a Virtual Object Embedded in a Real Object . . . . . . . . . . . . 26

3.1 Diagram Describing Proposed Methodology . . . . . . . . . . . . . . . . . 34

4.1 Sphere Scanning - Experimental Setting . . . . . . . . . . . . . . . . . . . 56


4.2 Mesh Overlay on the Target Object During Scanning . . . . . . . . . . . . 57
4.3 MATLAB Plot of Points Before and After Filtering . . . . . . . . . . . . . 59
4.4 Convex Hull of the Filtered Point Cloud . . . . . . . . . . . . . . . . . . . 60
4.5 Relevant Points in the Grid from the Mask . . . . . . . . . . . . . . . . . . 61

4.6 The Zero Level Set of φ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.7 Simulated Surface of a Hemisphere . . . . . . . . . . . . . . . . . . . . . . 63
4.8 Cross-Sectional View of Registered Models . . . . . . . . . . . . . . . . . . 65
4.9 Tumor Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.10 Top and Side View of Tumor Visualization . . . . . . . . . . . . . . . . . . 67

A.1 OBJ File Viewed through Microsoft’s 3D Builder . . . . . . . . . . . . . . 79

Abbreviations

AR Augmented Reality 2–7, 31, 37, 54

DLL Dynamic-Link Library 18

ISR Incremental Super Resolution 33, 43, 46–48, 55, 56, 62, 78

MRI Magnetic Resonance Imaging iii, 1, 2, 5, 7, 15, 19, 24, 33–35, 48, 53, 69, 70

MRTK Mixed Reality Toolkit 11, 17, 18, 20, 37, 39, 40

SDK Software Development Kit 6, 11, 17, 18, 24, 27, 35, 37, 39, 41

Chapter 1

Introduction

Breast cancer is a widespread disease that statistically affects one in every eight women in Canada, with the disease estimated to be diagnosed in approximately 26,000 women in the year 2017 [5]. The recorded mortality rate for breast cancer has declined, especially with the introduction of mammography screening, yet some cancers still remain occult to this process. This is especially the case for denser breast tissue or for cancers that have spread to multiple foci. Faced with this problem, patients, especially those with a family history of the disease, are referred to MRI for diagnosis. MRI itself is still far from being used as a screening tool, but it is standard procedure when it comes to surgical planning.
When tackling breast cancer through surgery, it is possible to perform a mastectomy, which results in the complete removal of the organ, thus preventing any relapse. However, from a patient's perspective, breast-conserving surgery is always preferred over the complete removal of the organ. This, however, puts the surgeon in a bind. On one hand, there is an intention to preserve as much breast tissue as possible; on the other, the surgeon has to make sure that there are clean margins of excision around the tumour(s), so that the risk of a relapse, with its additional expenditure and emotional burden for the patient, is avoided.
For the surgical planning of breast cancer, the surgeon consults the radiology team. An MRI is preferred in this scenario, especially for cancers that are non-palpable or occult to mammography [59]. During the MRI data collection, the patient is placed in a prone position, as seen in Figure 1.1, while the actual surgery takes place with the patient in a supine position. Due to the significant softness of the organ in question, the deformation with this position change is substantial, adding another

Figure 1.1: MRI Screening in the Prone Position [13]

layer of complexity to the surgical procedure, with the cancer occupying a different position than it does in the MRI. Moreover, the information acquired is 3D but is conventionally presented to the surgeon as 2D slices. It is therefore important to show the tumour in an anatomically accurate location during surgery. This creates the requirement for an AR display that performs this operation without hindering the surgeon's workflow. An AR device provides a mixed-reality interface to the user, with virtual objects embedded in the real world. In contrast, virtual-reality devices close off the real world to the user so that only virtual information is visible. Ideally, a display meeting our requirements should be head-mounted and contain sufficient computational power, which brings us to the Microsoft HoloLens. Figure 1.2 shows the capability of the HoloLens as used for educational purposes, especially for understanding human anatomy [2].

1.1 Augmented Reality and its Domain


Augmented Reality (AR) is a technology for creating a reality-based interface, and it is moving into a staggering number of industries and consumer products [60]. Since the late 90s, this technology has been its own distinct field of research. International workshops and symposiums

Figure 1.2: Microsoft HoloLens and its Augmentation Capability [2]

for mixed reality have since steadily appeared in the technological scene [60]. According to Ronald T. Azuma [23], in his 1997 survey on AR, any device employing it must be capable of the following:

1. It combines the virtual world with the real world

2. It is interactive in real time

3. The virtual world is registered to the real world in three dimensions

The capability of an AR device to blend the real world with the virtual makes its application domain vast and varied, spanning areas such as gaming, education, mechanical assembly and medical science. Narzt et al. base their paper on ‘Augmented Reality Navigation Systems’ on the basic principle that any AR equipment must be seamlessly integrated into the user's natural environment [50]. The paper discusses a system that uses AR for navigation by treating a car as the AR equipment itself. There has also been considerable use of AR in the education domain, from conducting trial studies of surgery through the use of haptic feedback in medicine [64] to the study of geometry through applications such as Construct3D [39]. AR can also assist in the design, assembly and maintenance of industrial and military projects [60]. In the area of
design, AR allows instant visualization and modification of models [60]. Fiorentino et al. describe SpaceDesign, a mixed-reality workspace that assists the design work-flow through the use of semi-transparent stereo glasses that augment prototype models [34]. The applications possible for this technology in the field of medical science are numerous in and of themselves. Kuhlemann et al. discuss the use of AR to avoid intermittent X-ray imaging during endovascular interventions [45]. Cui et al. describe a system that uses the Microsoft HoloLens for AR in near-infrared fluorescence-based surgery. There are multiple works in the field of medicine on the use of augmentation to assist laparoscopic surgery [35, 29, 53].
Additionally, Van et al. [60] and Azuma et al. [23] assess in their respective surveys
the applications for AR in industries such as personal assistance, advertisement, touring,
combat simulations, maintenance and entertainment. Thus it is safe to say that AR extends its reach into most mainstream industries today and is likely to see considerable growth in popularity in the future.

1.2 Augmented Reality and Medical Science


Augmented Reality has been in use for surgery since 1986 [42]. Given its ability to create a mixed reality, AR can be crucial in improving medical procedures by providing surgeons with the kind of X-ray vision once envisioned only in comic books, allowing them to view the anatomical parts of the human body in need of surgery. Over the years, several papers have been written on work in this area, including the application of this technology in real-life procedures and medical training. Wearable AR technology is especially significant because it is hands-free, requiring only that the device be worn, without even being plugged into power (the Microsoft HoloLens exhibits this characteristic).
One of the initial attempts in this field, especially involving wearable technology, was by Fuchs et al. [35], through the design and initial prototyping of a system that would assist
with laparoscopic surgical procedures. The aim was to provide a view into the internal
spaces of a patient with only minimal incisions on the skin and body wall. The design
was also intended to solve issues in laparoscopy involving the lack of depth in imaging and
limited field of view. The system used a 3D laparoscope in detecting depth to generate a
real-time 3-dimensional mesh. However, depth extraction was slow and registration was
faulty due to poor depth calibration.
Vogt et al. in 2006 conducted a study concerning the exploration and realization
of the practical benefits of in-situ visualization [61] for image-guided procedures. They

proposed the term in-situ visualization for the display of anatomical structures in their original location on the patient's body. The system design used a video-see-through approach, citing the precision and reliability it offers compared with projection onto a semi-transparent display. The design also included a custom single-camera marker tracking system whose results were used both for calibrating the system and for tracking. The primary purpose was in-situ visualization based on CT or MRI data. Registration was done using MRI-compatible markers during the scanning procedure, with the marker positions tracked later when augmentation was needed. However, human trials were not conducted; results were instead collected on virtual models.
An advanced laparoscopic liver surgery using AR on a human subject was reported in a paper by Conrad et al. [29]. Its goal was to overcome the manual updating of the alignment between the pre-operative 3D reconstruction and the moving laparoscopic image of the liver (under displacement), thus taking control of the overlay accuracy out of the hands of the operator. The evaluation of the AR system was conducted through the first reported case
of a laparoscopic rescue of failed portal embolization. The registration in this aspect was
done through the tracking of the laparoscopic imaging and instrument while the alignment
between the 3D reconstruction and the liver was achieved through the use of 4 anatomical
markers. The view generated during this procedure merged a semi-transparent view of the 3D model over the laparoscopic view, and the accuracy was evaluated through the correct overlay of surface tumours. The accuracy showed a deviation of approximately 5 mm, which was also cited as a limitation, since such a deviation would be critical for optimal clinical application due to the proximity of critical structures [29].
AR systems that assist in the surgical excision of tumours are especially relevant. Cui et al. [30] mention the heavy dependence of surgeons on traditional techniques for identifying cancerous tissue among healthy tissue, which often leads to secondary surgeries due to the incomplete identification and removal of afflicted cells. The article also mentions the added advantage that the radioactive sulfur colloid used to visualize cancerous areas can be avoided, thus preventing harmful ionizing radiation from affecting the patient. To solve
this issue a near-infrared fluorescence (NIR) based image guided surgery was considered
which uses tumour-targeted agents that bind to the cancerous tissue to be injected into
the patient. The radiation emitted here is detected using a silicon-based camera. Further,
a HoloLens was used to create a 2D AR environment that overlays a target area with the
previously mentioned NIR visualization, allowing the surgeon to use their own perception
overlaid with virtual information for surgery. Image acquisition and processing for this
operation was done through a custom sensor and processing circuit board. Calibration
for this procedure is done through the placement of a Hologram on top of a calibration

plane. Once the hologram has been perfectly aligned, further transforms are calculated
with respect to the calibration plane location and the coordinate system of the HoloLens.
Kuhlemann et al. [45] face the challenge of visualizing the position and orientation of the catheter being inserted during endovascular interventions. In this type of surgery, the surgeon traditionally needs to mentally overlay the 3D vascular tree onto the 2D angiographic scene, with contrast agents and X-ray being the primary means of imaging. The paper proposes the use of the Microsoft HoloLens to display the vascular structures in the field of view, using as inputs a landmark-based surface registration of a computed tomography (CT) scan in tandem with a segmentation of the vessel tree obtained through the Marching Cubes algorithm, thus eliminating the need for X-ray imaging in this respect. Electromagnetic markers are used to stream data into the AR device in near real time, helping with registration and position tracking. The calibrations necessary for this process are extrinsic and landmark-based.
In 1998, Sato et al. [55] proposed a method for the image guidance of breast cancer surgery using 3D ultrasound images and AR, aimed at breast-conserving surgery.
The reconstruction of the ultrasound images is done through the use of an optical three-
dimensional position sensor. The augmentation was achieved through the superposition of
the reconstructed ultrasound images with live video images of the patient breast. However,
the tumour segmentation and visualization were said to still be insufficient in efficiency and reliability, with reconstruction connectivity analysis and threshold selection needing to be conducted multiple times. The paper also mentions that a real-time volume renderer could alleviate these issues in the future.
Finally, Kotranza et al. [44] mention the use of a virtual human and a tangible interface, in the context of a virtual breast-exam patient, to effectively simulate interpersonal scenarios, thereby demonstrating the degree to which the virtualization created by a head-mounted display can substitute for reality.

1.3 Moving Forward


Despite the vast domain and capabilities of AR and the HoloLens in the field of medicine, as discussed in the previous sections, the device is not without its limitations. The depth-perception resolution of the device is limited to make way for a system that consumes less power, and the Software Development Kit (SDK) is in its infancy, with changes continuously being released. This thesis investigates the capabilities of advanced graphical visualization for pre-surgical planning. The principal work, performed and explained in the chapters that follow, deals with overcoming the limitations of the HoloLens so that the device fits the objective. Primarily, the work accomplished the following:

1. Data Acquisition: Spatial information was collected over time for a breast mimick-
ing surface such that the same can be used for correspondence with MRI generated
information. This included the discrimination between the surface in question and
the rest of the real world through photometry, as well as storing the acquired infor-
mation for post-processing.

2. Surface Reconstruction and Registration: The data collected over the acquisi-
tion stage was offloaded into a computer which also possesses the MRI information.
The collected data is then used by a surface reconstruction algorithm and further by
a registration method that builds a correspondence with MRI such that the tumour
location in the data acquisition coordinate system can be identified.

3. Visualization: The processed information in the previous step is then used to locate
the tumour in the observer perspective and is sent to the HoloLens for display.

1.4 Summary
This chapter details the motivation for developing an AR framework to assist in breast-conserving surgery. Sections 1.1 and 1.2 give a brief overview of the domain of AR and its history with medical science, demonstrating that this area of research is in a constant state of growth and that similar work involving the use of AR for surgical assistance has been gradually increasing. Additionally, an outline of the kinds of problems this thesis tackles is given in Section 1.3. The next chapter details the technical background to our work, including the hardware and software capabilities of the Microsoft HoloLens, its limitations and, finally, the background for our surface fitting and registration algorithms.

Chapter 2

Technical Background

This chapter discusses the technical background that needs to be considered in building a framework for surgical planning in breast-conserving surgery. The following contains basic definitions, the types of inputs expected by the system and other relevant information, which together form the basis of the contributions to the problem at hand.

2.1 An Overview of the HoloLens Hardware


The following is a look into the hardware capabilities of the HoloLens, primarily the types of sensors and optics available within the device, together with a brief look at the software development kit and tools available for leveraging its hardware.
The HoloLens has a large array of sensors that can be put to work on various tasks. Figure 2.1 displays the HoloLens, the sensor bar containing all of the HoloLens sensors, and the display module used to project holograms. The following subsections detail the sensors and optics of the Microsoft HoloLens as described on the Microsoft Developer website [7].

2.1.1 Sensors
Table 2.1 summarizes the built-in sensors in the HoloLens device.

(a) The Microsoft HoloLens

(b) HoloLens Display Module

(c) HoloLens Sensor Bar

Figure 2.1: The Microsoft HoloLens and its Modules [7]
Table 2.1: HoloLens Built-In Sensors

Quantity   Sensor
1          Inertial Measurement Unit (IMU)
4          Environment Understanding Cameras
1          Depth Detection Camera
1          2 MP Photo / HD Video Camera
1          Mixed Reality Capture
4          Microphones
1          Ambient Light Sensor

2.1.2 Optics
The optics of the HoloLens include see-through holographic lenses with two high-definition 16:9 aspect-ratio light engines. The device also performs automatic pupillary-distance calibration to provide better focus, and it offers a holographic resolution of 2.3 M light points. The holographic density is quantified as being greater than 2.5 k radiants (light points per radian).

2.2 Basics for HoloLens Application Development


The following section explains the basics involved with programming in an application
development environment for the Microsoft HoloLens, its software capabilities in leveraging
its hardware as well as the types of information it handles.

2.2.1 Creating a Basic HoloLens Application


This section outlines the steps through which a basic holographic application can be created for the HoloLens.

Requirements

The basic requirements for developing applications for the HoloLens are the installation of Unity and Visual Studio. For up-to-date information, refer to the "Install the Tools" page on the Microsoft Developer website [8].

Creating and Building a New Project in Unity for the Microsoft HoloLens

1. Open Unity and start a new project. Provide a name for the project and set it as
a 3D application and click the ‘Create Project’ button. The opening dialogue is as
seen in Figure 2.2.

2. To install the Mixed Reality Toolkit (MRTK), go to https://github.com/Microsoft/MixedRealityToolkit-Unity/releases/, download the current release, unzip it and copy the contents of its "Assets" folder into the auto-created "Assets" folder of your project.

3. Delete the default ‘Main Camera’ object in your project hierarchy panel. Go to
”Assets/HoloToolkit/Input/Prefabs” folder (The name HoloToolkit may change in
future versions of the MRTK), select the ‘HoloLensCamera’ prefab and drag and
drop it into the hierarchy.

4. Create a new folder in the project ”Assets” directory and name it ”Scene”. Go to
the File menu and select ‘Save Scene As’ and give your current scene a name you
desire.

5. Before we can build the project, slight changes must be made to the Unity develop-
ment environment in terms of settings for HoloLens development purposes. From the
‘Mixed Reality ToolKit’ menu, select ‘Apply Mixed Reality Project Settings’ from
the ‘Configure’ section as seen in Figure 2.3. This sets the current project Quality
Setting to ‘fastest’ and build settings to Windows Store App, SDK Universal 10,
Build Type D3D.

6. Select the ‘Build Settings’ menu from the ‘File’ menu. Click on ‘Add Open Scenes’
in the dialog box that pops up and see that the scene you previously saved shows
up on the ‘Scenes on Build’ list in the dialog. Check the ‘Unity C# project’ and
‘Development Build’ check-boxes and finalize by clicking the ‘Build’ button. Create
a folder in your main project directory called ”App” and select the folder to start
the build process. The Build dialog is as seen in Figure 2.4.

7. Once the build process is complete, open the Visual Studio (VS) solution file. In
the VS toolbar, set the solution configuration to ‘Release’, solution platform to ‘x86’
and build target to ‘Device’ (Assuming that the HoloLens is plugged in and paired
with your device). Now in the ‘Debug’ menu within VS, select ‘Start without Debug-
ging’ or ‘Start Debugging’ based on your need to get the application going in your
HoloLens.

Note: While working on a project, we usually create a ‘Scripts’ folder within the project's ‘Assets’ folder to store custom-made scripts meant to be linked to game objects. These scripts are built during the build process and placed, together with the build solution, within the sub-folders of the ‘App’ folder.

Figure 2.2: Unity ‘New Project’ Dialog

2.2.2 Unity Engine Environment Definitions


The following are some definitions of terms that will be encountered while working on the Unity side of the HoloLens application development environment:

Materials: These are definitions of how a surface must be rendered with reference to
information such as textures and colour tints
Shaders: Script files containing algorithms and mathematical calculations for finding the
colour of the pixel to be rendered in an object based on the material configurations
and lighting parameters

Figure 2.3: Applying MRTK Settings to the Current Project

Figure 2.4: Unity Build Settings

Sub-shader: Every shader consists of one or more of these. Sub-shaders help in creating
compatibility between multiple graphic cards with varying capability, thus making
it possible to provide different options for rendering in different graphic cards. If
there is only one sub-shader, it is declared as generic across all graphic cards unless otherwise specified. This is also where properties such as ‘ZTest’, which can control object transparency, are specified. The Unity Manual provides rich documentation regarding these [16].
Textures: Bitmap images that are referenced by a material as input to its shader when calculating an object's surface colour. Textures can also represent aspects such as the reflectivity and roughness of a surface material.
Ray Casting: A method available in the Unity API as well as increasingly used in
the field of computer graphics. This functionality in Unity casts a ray from a given
position towards a specified direction and records the journey of the ray including
where it collides.
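As a brief illustration (a sketch using the standard Unity Physics API, not code from the thesis), the following script casts a ray from the main camera along its viewing direction and logs where it hits a collider:

using UnityEngine;

// Minimal ray-casting illustration using the standard Unity API (not part of the thesis code).
public class RaycastExample : MonoBehaviour
{
    void Update()
    {
        // Cast a ray from the main camera along its forward (viewing) direction.
        Ray ray = new Ray(Camera.main.transform.position, Camera.main.transform.forward);
        RaycastHit hit;

        // Record where the ray collides, if it hits any collider within 10 meters.
        if (Physics.Raycast(ray, out hit, 10.0f))
        {
            Debug.Log("Hit " + hit.collider.name + " at " + hit.point);
        }
    }
}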

2.2.3 Object Representation in the HoloLens Environment


The representation of the topology of objects in 3D geometry for the HoloLens is discretized through the use of triangulated surfaces. This representation is referred to as a mesh and is defined as a set of inter-connected vertices/nodes which form triangles/faces. These triangles form the building blocks of the representation of the object's topology. We can represent a node N as follows:

N = node = (x, y, z) ∈ ℝ³    (2.1)

Here, node represents the coordinates of a point on the mesh, ℝ denotes the set of all real numbers and ℝ³ the set of all triplets of real numbers. Given the list of nodes, indexed by {0, 1, ..., L}, we can represent a face as a tuple of size three whose members are distinct indices from this set. We can now define a face as follows:

F = face = {(l1, l2, l3) | l1, l2, l3 ∈ {0, 1, ..., L} and l1 ≠ l2 ≠ l3}    (2.2)


Triangles formed based on the equation above are referred to as the faces of the represen-
tation. For example a face written down as (1, 2, 3) represents the connection of vertices
whose indexes are 1, 2 and 3 in the list of vertices. A triangulated representation for a
surface is as seen in Figure 2.5.

Figure 2.5: Triangulated Surface Representation [28]

2.2.4 Augmentation Basics and Real-time Mesh Generation


In terms of the end goal of the project, it is necessary to view a 3D model of a breast
generated from an MRI scan in mixed reality. Even though a predefined model can be
viewed in mixed reality by importing it into the Unity interface, the objective of the
project demands that the model is generated during runtime based on the individual patient
information.
To generate a model at runtime, a mesh must first be defined. This is the base form of a model, consisting of nodes and faces. Figure 2.6 displays the 3D mesh of an office space generated by the Microsoft HoloLens. The structures in the image can be seen to be constructed from elementary geometric shapes (in this case, triangles). As mentioned in the previous sections, a mesh structure is also the format in which we define the 3D model generated from the MRI.
Mesh mesh = new Mesh();
Listing 2.1: Initiates a fresh mesh object

Listing 2.1 states the code for a new mesh object in C# under the HoloLens-Unity
development environment. Every script that needs to be updated over a period of time in

Figure 2.6: 3D Mesh of an Office Space as Scanned by the Microsoft HoloLens

the program derives from a base class called MonoBehaviour, which provides methods such as the ones listed in Listing 2.2. These methods provide control over the activity cycle of an active object, referred to as a Game Object, in Unity. The MonoBehaviour base class also provides certain inherited members that define the game object, such as direct references to the ‘gameObject’ and ‘transform’ objects. The game object attached to the mesh-generating script can be linked with the newly generated mesh. The mesh for a game object is stored within a ‘MeshFilter’. The MeshFilter class inherits from the ‘Component’ class, which is the base class for everything attached to a game object, and can be referenced as shown in Listing 2.3, which also shows how the newly instantiated mesh can be attached to this component.
void Awake() {...}
void Start() {...}
void Update() {...}
void FixedUpdate() {...}
void LateUpdate() {...}
void OnGUI() {...}
void OnDisable() {...}
void OnEnable() {...}
Listing 2.2: MonoBehaviour Derivatives

MeshFilter meshFilter = transform.GetComponent<MeshFilter>();
meshFilter.mesh = newMesh;
Listing 2.3: Referencing the MeshFilter and Attaching the New Mesh

The mesh component within the mesh filter object has a number of fields. The most
important to specify in this case are the ‘vertices’, ‘triangles’ and ‘uv’ fields. The vertices
are represented as a Vector3 structure array, triangles are an integer array and finally, uv
is stored as an array of Vector2 structures.
Each element of the vertices array is a representation of a point in the world space or
the space in relation to the game object associated with the mesh. Each set of 3 elements
in the triangles integer array is a face, therefore each triangular face definition starts at
array indices 0, 3, 6, ... . The code in Listing 2.4 creates a model of a square with vertices at
a = (−1, −1, 0.1), b = (1, −1, 0.1), c = (1, 1, 0.1), d = (−1, 1, 0.1). The faces for the square
can be understood as (a, b, c) and (a, c, d). Here, ‘uv’ is a 2-dimensional vector representation of the texture coordinates of the mesh. Before explaining the application of ‘uv’ further, recall the key Unity terms introduced in Section 2.2.2 [19]. The ‘uv’ array acts as a map from each point in the mesh to a point in the corresponding texture. A texture can usually be assumed to be a unit square with its upper-left corner at coordinate (0, 0) and its lower-right corner at (1, 1).
Vector3 a = new Vector3(-1, -1, 0.1f);
Vector3 b = new Vector3(1, -1, 0.1f);
Vector3 c = new Vector3(1, 1, 0.1f);
Vector3 d = new Vector3(-1, 1, 0.1f);
Vector3[] vertices = new Vector3[] { a, b, c, d };
int[] triangles = new int[] { 0, 1, 2, 0, 2, 3 };
Vector2[] uv = new Vector2[] { new Vector2(0, 0), new Vector2(0, 1), new Vector2(1, 1), new Vector2(1, 0) };
mesh.vertices = vertices;
mesh.triangles = triangles;
mesh.uv = uv;
Listing 2.4: Defining a Mesh Through Script

2.2.5 SDK and Other Software Tools


For the purpose of this project, we need the environment depth detected in real time and then processed into a surface. The depth sensors of the device, working alongside the Software Development Kit (SDK), are known to provide a maximum spatial resolution in the range of 1500 to 2000 triangles per cubic meter. The Mixed Reality ToolKit (MRTK) is a collection of scripts and components intended to accelerate development of applications

Figure 2.7: Unity Application Capabilities List

targeting the Microsoft HoloLens. It provides a custom camera object to be used within Unity, as well as several other prefabs (custom objects within Unity designed for specific purposes, such as the cursor or Gaze controls) and functions for application development. The toolkit is under active development and currently rolls out stable changes roughly every three months.

MRTK - Spatial Mapping

The MRTK development packages contain a Spatial Mapping module for Unity3D which
can be used to obtain and process the mesh data from the HoloLens. The functions
contained in this module use the SurfaceObserver object that is part of the HoloLens SDK
to obtain mesh information. Since this module uses a SurfaceObserver object, the Spatial
Perception capability of the HoloLens (Figure 2.7) would need to be enabled. However,
the Spatial Mapping output has drawbacks in terms of being coarse and non-continuous
in nature.

MRTK - Spatial Understanding

The Mixed Reality Toolkit contains a better continuous meshing prefab called the ‘Spatial
Understanding’ module. The Spatial Understanding module utilizes the mesh generated
by the surface observer. However, the way the map is stored within the module through
the use of the understanding DLL sets it apart. The mesh is continuous and is finer

in comparison to the mesh generated by the Spatial Mapping module, but it runs the risk of oversimplifying surfaces in order to detect flatter surfaces. The understanding module passes through three stages, listed below and sketched in code after the list:

1. Initiate Scan: The module starts using the input from the Spatial Mapping module
to generate its own mesh.

2. Updating: The module updates itself based on the previous meshes received from
the Mapping module as well as the new ones.

3. Finishing Scan: The module finalizes the mesh by filling all holes (areas for which the
module did not receive enough spatial data) in the current mesh and stops the
scanning procedure. The finalized mesh does not change and is permanent
for that instance.
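A minimal sketch of driving these three stages from a script is shown below. It assumes the HoloToolkit SpatialUnderstanding API (RequestBeginScanning, RequestFinishScan and the ScanState property); the exact names and namespaces should be verified against the MRTK version in use, and the fixed scan duration is purely illustrative.

using HoloToolkit.Unity;
using UnityEngine;

// Sketch of the Spatial Understanding scan life cycle (assumed HoloToolkit API).
public class ScanController : MonoBehaviour
{
    public float scanSeconds = 30.0f; // illustrative fixed scan duration
    private float elapsed;

    void Start()
    {
        // Stage 1: initiate the scan so the module starts consuming Spatial Mapping input.
        SpatialUnderstanding.Instance.RequestBeginScanning();
    }

    void Update()
    {
        elapsed += Time.deltaTime;

        // Stage 2 happens inside the module: it keeps updating its mesh while scanning.
        // Stage 3: after the chosen duration, ask the module to finalize the mesh
        // (fill holes and stop the scanning procedure).
        if (elapsed > scanSeconds &&
            SpatialUnderstanding.Instance.ScanState == SpatialUnderstanding.ScanStates.Scanning)
        {
            SpatialUnderstanding.Instance.RequestFinishScan();
        }
    }
}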

2.2.6 Mesh Storage Medium for Post Processing


During the course of the project, it is necessary to obtain the meshes generated by the HoloLens during real-time scanning and store them for later analysis, or to send them to another device on the network to perform functions such as surface reconstruction or registration. Thus there is a need to store a given mesh in a logical file format that can be easily written and read by the HoloLens and external systems alike. Additionally, the format must be easily translatable to and from a ‘mesh’ object in Unity. Ideally, the format could also be used to store the 3D object generated from an MRI, which would avoid conflicting files and unnecessary conversion from one file structure to another.

The Wavefront OBJ file format

This file format was developed by Wavefront Technologies and has been adopted by several other 3D graphics applications [47]. Wavefront OBJ files are a supported format in the Unity development environment and can be directly imported into virtual space as objects during development. The file can be written in ASCII or binary format and can be used to declare multiple objects, specifying parameters such as coordinates, triangular faces, vertex normals and texture. The format is structured so that each element of a model is represented as a set of key-value pairs. Each key letter specifies the information that follows on any given line.

Table 2.2: Key-Value Pairs in OBJ Files with Short Examples

Description            Key   Value                Example
Comment                #     -                    # Comment on file format
Geometric Vertex       v     (vx, vy, vz [, w])   v 0.123 0.456 -0.354 1.0
                                                  v 0.1 0.2 3.0
Texture Coordinates    vt    (tu, tv)             vt 0.2 1
                                                  vt 0.1 0.1
Vertex Normal          vn    (nx, ny, nz)         vn -0.71 0.71 0
Face                   f     (i, j, k)            f 1 2 3

Table 2.2 gives a brief introduction to the key-value pairs in an OBJ file. As seen here, the vertices are coordinates in 3D space: vx, vy and vz are standard Cartesian coordinates and w is an optional component, for viewers that support colour, ranging from 0 to 1, with 1 being the default value. Vertex normals are represented by three coordinates, with each normal represented as in the example column of Table 2.2. While reading the OBJ file, the order in which each v, vn and vt appears is indexed separately for each type, and these indices are used to define the faces. As seen from the face example in Table 2.2, each space-separated element after the key f is a connection element of the face. An example of building simple objects with OBJ files is given in Appendix A.1.
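As an illustration (hand-written here, not taken from the thesis), the square defined in Listing 2.4 could be stored as the following OBJ fragment; note that face indices in OBJ files are 1-based:

# A square built from two triangular faces
v -1.0 -1.0 0.1
v  1.0 -1.0 0.1
v  1.0  1.0 0.1
v -1.0  1.0 0.1
vt 0.0 0.0
vt 0.0 1.0
vt 1.0 1.0
vt 1.0 0.0
f 1/1 2/2 3/3
f 1/1 3/3 4/4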

2.2.7 Spatial Mapping and Spatial Understanding within the Unity UI
The HoloLens depth sensor does not directly provide raw depth data through its API.
This is mediated by an object called the Surface Observer. Even though this object can be
directly accessed to receive mesh information, the MRTK has a module that implements
this object and its functionality to provide a mesh through a Spatial Mapping prefab in
a very robust manner (A prefab is a prefabricated game object that can be directly im-
ported into the Unity environment). The Spatial Mapping prefab contains two components
as scripts, the Spatial Mapping Observer and the Spatial Mapping Manager as seen in Fig-
ure 2.8. There also exists a Spatial Understanding prefab in the MRTK which provides a
smoother mesh based on accumulated information acquired from the Spatial Mapping pre-
fab. The Spatial Understanding prefab consists of 3 custom components within it as scripts

called the Spatial Understanding Custom Mesh, Spatial Understanding Source Mesh and
Spatial Understanding.

Options in Unity UI for Spatial Prefabs

The spatial prefabs mentioned in Section 2.2.7 have multiple options that can be set as defaults in the Unity User Interface (Unity UI). These options can be leveraged to perform actions such as automatically initializing spatial scanning and drawing visual meshes.
Spatial Mapping Prefab: The Spatial Mapping Observer in the Unity UI provides options to adjust the detail of the mapping in terms of the number of triangles per cubic meter (with an upper limit of 1500 - 2000 triangles per cubic meter), the time allocated between subsequent mesh updates, the observer volume (the volume around a specific point for which spatial data needs to be collected) and the orientation of the observer volume. The Spatial Mapping Manager in the Unity UI allows control over the kind of surface material used to construct a mesh, the physics layer of the application to which the mesh is attached (which can be leveraged to isolate the mesh from other objects in the game environment that exist in different layers) and check-boxes that enable the auto-initiation of the spatial observer, the drawing of visual meshes in the game environment and the casting of shadows by the mesh.
Spatial Understanding Prefab: The Spatial Understanding Source Mesh is the script that accumulates information from the observer within the Spatial Mapping prefab. The Spatial Understanding Custom Mesh is the mesh that is built from the information acquired from the source mesh. This script provides options in the Unity UI to adjust the import period of new meshes from the source mesh, the mesh material used to construct the understanding mesh, and the maximum time per frame to be spent processing the accumulated meshes, along with a check-box that enables mesh colliders so that, if enabled, other objects within the game can treat the custom mesh as a game object and interact with it, for example for collision detection. The Spatial Understanding script controls the previously mentioned source and custom mesh scripts. It provides options within the UI to automatically begin the scanning procedure for Spatial Understanding, as well as fields to specify how much time the Spatial Mapping Observer from the Spatial Mapping prefab should spend providing updates, both during the Spatial Understanding scanning process and after the scan is complete.

Figure 2.8: View of the Spatial Mapping Prefab in the Inspector Panel of Unity

Figure 2.9: View of the Spatial Understanding Prefab in the Inspector Panel of Unity

2.2.8 Accessing Spatial Understanding and Spatial Mapping Methods from External Scripts
Once Spatial Understanding or Spatial Mapping has been attached to Unity objects, it is possible to access the information acquired by these components externally through scripts. This is done by referencing the instantiated versions of the components in the application, as demonstrated in Listing 2.5. Doing so provides direct access to the game objects these components are attached to, as well as to their other components such as the renderer and transform.
SpatialMappingManager sm = SpatialMappingManager.Instance;
SpatialUnderstanding su = SpatialUnderstanding.Instance;
Listing 2.5: Accessing the Spatial Mapping and Spatial Understanding Instances from an external script
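As a usage sketch (assuming the HoloToolkit API of the time, where SpatialMappingManager exposes StartObserver, DrawVisualMeshes and GetMeshFilters; these names should be verified against the installed toolkit version), the instances can then be used, for example, to start the observer and inspect the meshes collected so far:

using System.Collections.Generic;
using HoloToolkit.Unity;
using HoloToolkit.Unity.SpatialMapping; // namespace may differ between MRTK versions
using UnityEngine;

// Example use of the singleton instances from an external script (assumed HoloToolkit API).
public class MeshInspector : MonoBehaviour
{
    void Start()
    {
        // Begin collecting spatial data and hide the visual mesh overlay.
        SpatialMappingManager.Instance.StartObserver();
        SpatialMappingManager.Instance.DrawVisualMeshes = false;
    }

    public void LogMeshSizes()
    {
        // Retrieve the mesh filters gathered by the Spatial Mapping component.
        List<MeshFilter> filters = SpatialMappingManager.Instance.GetMeshFilters();
        foreach (MeshFilter filter in filters)
        {
            Debug.Log("Mesh with " + filter.sharedMesh.vertexCount + " vertices");
        }
    }
}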

2.2.9 Viewing a Virtual Object within a Real-World Object
Generally, meshes and objects within the HoloLens SDK possess a property that occludes objects lying behind them from the point of view. This adds a sense of realism when mixing virtual objects with real objects. Since the surgeon must be able to view the tumour(s) within the patient's breast during surgery, it is important that a real-time meshing procedure be conducted so that we detect a surface that can be used for registration with the MRI information (from which we acquire the location of the tumour(s)). It is possible, after the scanning process, to stop drawing the mesh so that the viewing of occluded objects in the scene no longer needs to be handled, but this causes a problem for any subsequent depth sensing intended to account for other factors such as the movement of the breast during surgery. In this scenario, we assume that a mesh generated in real time (or a mesh processed from it) and the MRI are registered over the patient's breast. This means that the tumour is occluded from view because a mesh surrounds its position. Figure 2.10 shows how the Spatial Understanding mesh (seen in the image as a green triangulated mesh) overlays a sphere.

Figure 2.10: Spatial Understanding Mesh Overlaying an Object in the HoloLens view

The property of being occluded from view belongs to the material aspect of an object. Thus, making an object see-through comes down to modifying the material's
shader script, since the shader handles the mathematical calculations pertaining to the pixels generated for a material, as mentioned in Section 2.2.4, and therefore also handles how pixels are occluded. Every Unity application conducts a depth test while drawing objects in a scene. This ensures that only the closest surface objects are drawn within the scene. Our problem can leverage the ‘ZTest’ aspect of the depth test, which controls how depth testing is conducted. The default value for this property is ‘LEqual’, which hides objects behind other objects in view.
Specifying ‘ZTest’ as ‘Greater’ allows the shader to specify that the object is visible even when occluded by another object. For ‘ZTest’ to make this specification, the property called ‘ZWrite’ must be set to ‘Off’. ‘ZWrite’ controls whether the pixels for an object are written into the depth buffer for display. Turning it off allows the use of semi-transparent/transparent objects, directs the system to ignore the object in the depth buffer and passes control over to ‘ZTest’ for drawing the object in the scene. More information regarding these properties is available in the Unity Manual [15]. Listing 2.6 demonstrates this action; attaching it to an object produces the results seen in Figure 2.11, where the virtual sphere, seen in purple, appears to be inside the bigger real sphere and is not occluded by the Spatial Understanding mesh that surrounds the real sphere. The torus-like blue circle in the image is merely the cursor of the application.
Shader "Custom/OcclusionGrid"
{
    Properties
    {
        // Property specifications such as color go here
    }
    SubShader
    {
        Tags { "Queue" = "Transparent" "RenderType" = "Transparent" }
        ZWrite Off    // Ignore the depth buffer and pass control to ZTest to determine if the object has to be rendered
        ZTest Greater // 'Greater' renders pixels that are occluded
    }
}
Listing 2.6: Manipulating the depth test within a Shader
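For completeness, such a shader is applied to the virtual object through its material. The following is a minimal sketch assuming a shader named "Custom/OcclusionGrid" (as in Listing 2.6) is included in the build:

using UnityEngine;

// Switches this object's material to the occlusion-ignoring shader at runtime.
// Assumes the "Custom/OcclusionGrid" shader from Listing 2.6 is included in the build.
public class ApplyOcclusionShader : MonoBehaviour
{
    void Start()
    {
        Renderer rend = GetComponent<Renderer>();
        Shader occlusionShader = Shader.Find("Custom/OcclusionGrid");

        if (rend != null && occlusionShader != null)
        {
            rend.material.shader = occlusionShader;
        }
    }
}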

(a) Front-view

(b) Side-view

Figure 2.11: View from the HoloLens of a Virtual Sphere within a Real Object
2.3 Limitations for the HoloLens in terms of the Project
The HoloLens limitations in regard to the project are as follows:

1. The meshes generated by the surface observer have a very limited resolution in terms
of triangles per cubic meter (1500 to 2000).

2. The Microsoft HoloLens SDK does not provide access to raw depth sensor data or to its level of detail.

3. There is no pre-implemented functionality in the HoloLens SDK that combines the sensor information from the RGB camera and the depth sensor to provide an RGBD data set.

4. The HoloLens sensors require ambient lighting conditions and considerable open space to function according to their specifications. Low light and cramped spaces seem to hinder the Spatial Understanding of the HoloLens in general, causing it to create multiple versions of the same space and store them in memory.

2.4 Basics of Surface Reconstruction


The following contains a brief overview of surface reconstruction techniques as well as
important definitions and formulas that form the foundation of our proposed methodology.

2.4.1 An Overview of Surface Reconstruction


Fitting techniques are effective when we have information acquired from a surface through depth sensors in the form of point clouds and we need to generate, or reconstruct, a surface from them. A point cloud is defined as a set of unstructured points with no specific ordering or connectivity. In two and three dimensions, the points are represented by (x, y) and (x, y, z) coordinates respectively [46].
Liang et al. mention that surface reconstruction approaches can be divided based either on the characteristics of the data being used or on the representation of the surface in question [46]. In terms of the input, a reconstruction can be considered easy if there is little noise and good sampling, while a reconstruction process is considered difficult if the data exhibits characteristics such as noise, non-uniform sampling and complicated topology.

Note: Noise refers to the false positives that can occur, especially when acquiring data from 3D scanners or Light Detection and Ranging (LIDAR) measurements. A data set is considered well sampled if there are no holes in the information acquired or, in other words, if there is no abrupt discontinuity in the data set.

Based on the representation of the surface employed by different techniques, we can divide surface reconstruction into implicit and explicit techniques. Explicit techniques represent the surface geometry in an explicit manner: the surface is constructed through methods such as Voronoi diagrams or Delaunay triangulation (much like a mesh) being used to determine connections among the data points, which are subsequently used to construct a triangulated surface by connecting adjacent points [46].

Voronoi Diagrams: For a given number of points on a plane, a Voronoi diagram divides
the plane based on the nearest-neighbour rule with each point being associated with
the region having the highest proximity [22].

Delaunay Triangulation: This triangulation technique maximizes the minimum angle of its triangles and minimizes their maximum circumradius [31].

An explicit surface formulation can be very precise when using a dataset without considerable noise or holes; however, triangulated surfaces usually have problems when dealing with such data, since the reconstruction process is based on the proximity of points to one another. A more compact explicit representation can be found in parametric surfaces, but it lacks a global parametrization, making the method less robust and less flexible in dealing with complicated topology owing to its lack of consideration of volumetric information. A global implicit surface reconstruction technique that considers volumetric information, however, is able to adapt to non-uniform and noisy data [46].
Implicit techniques typically represent surfaces by means of a level set of a continuous
scalar-valued function. The function is typically based on a grid and utilizes a level
set function such as the signed distance function. There also exist grid-free representa-
tions, obtained by combining an implicit surface with mesh-free approximation techniques
such as radial basis interpolation functions [25, 46].
Hoppe et al. [37] in 1992 developed an algorithm based on the determination of the zero
set of a signed distance function. The proposed method was capable of inferring the
topological type of the surface, including the presence of boundary curves; however, the
algorithm lacked geometric accuracy. In 1994, another paper on surface reconstruction
by Hoppe et al. [36] described a surface reconstruction procedure that provides accurate
surface models for unorganized point data by determining the topological type of the
surface and the presence and location of sharp features, but the paper does not contain
experiments which consider sparse and non-uniform data.
Tang et al. [58], in a review and comparison paper from 2013, mention the same weakness
to noise acquired through real-time scanning for the techniques proposed by Hoppe et
al. [36, 37]. Additionally, they mention that Delaunay/Voronoi based algorithms such as
Alpha Shapes [33], Power Crust [21] and Cocone [20, 32] provide theoretical guarantees,
but only under certain conditions which are difficult to obtain in real-time scanning
scenarios [58].
Kazhdan et al. in 2005 proposed a method called Poisson Surface Reconstruction
[41], observing that the normal field of the boundary of a solid can be understood as the
gradient of a surface indicator function, which can be used for an implicit description of
the surface we attempt to reconstruct [40]. The method consists of three primary steps,
described as follows:

1. Transforming the oriented points (point normal information in this scenario) into a
three-dimensional continuous vector field.

2. Finding a scalar function whose gradient would best describe the vector field.

3. Extracting the appropriate isosurface [40] from the scalar function by accessing its
zero level set.

In 2013, Kazhdan et al. [40] proposed further improvements to this method to adjust
for the sparse set of points, all the while including several algorithmic improvements that
further reduce the time complexity of the solver used in finding the scalar function, thus
enabling faster, high-quality surface reconstruction.
Liang et al. [46], in the paper titled Robust and Efficient Implicit Surface Reconstruction
for Point Clouds Based on Convexified Image Segmentation, proposed a method which
exploits the underlying geometric structure of the point cloud data combined with a global
convexified image segmentation formulation. The method shows promise in its ability
to deal with challenging point data exhibiting attributes such as complicated geometry
and topology, as well as holes and non-uniformity. However, the algorithm does require
prior information in the form of point normal data to achieve its stated accuracy.

2.4.2 Surface Fitting
Given our region of interest Ωin ⊂ R³, where Ω is the entire region under consideration
such that Ωin ⊂ Ω ⊆ R³, we can describe the boundary of our region of interest as Γ = ∂Ω.
Γ is the interface between our region of interest and its surroundings. The parametric way
of representing Γ is as follows:

$$\Gamma : D \subseteq \mathbb{R}^2 \to \mathbb{R}^3. \tag{2.3}$$

Here Γ = (x(u, v), y(u, v), z(u, v)), where (u, v) ∈ D. We can represent this interface
based on a level set as well, that is,

$$\phi : \mathbb{R}^3 \to \mathbb{R}. \tag{2.4}$$

This is defined such that Γ may be represented as follows:

$$\Gamma = \{(x, y, z) \in \mathbb{R}^3 \mid \phi(x, y, z) = 0\}. \tag{2.5}$$

This level set representation is the approach that we use moving forward. Suppose we
want to fit a surface to measured data: we need to find Γ*, or equivalently, as we follow the
level set method, φ*, so that it agrees with our data and our prior beliefs. Usually, we
define a functional E, which is a scalar-valued function of φ, such that φ* is obtained as a
result of the minimization of the functional E. That is,

$$\phi^* = \arg\min_{\phi} E(\phi), \tag{2.6}$$

where φ* is a minimizer of E. Typically E has two parts. Eext measures how well our
solution agrees with our input data, while Eint represents our prior beliefs of what the zero
level set of the function φ should look like. For example, if Eint represents the surface area,
it provides an incentive to return smooth level sets. We can represent this combination of
energies as:

$$E(\phi) = E_{ext}(\phi) + E_{int}(\phi). \tag{2.7}$$

According to the Euler-Lagrange equation [63], the minimizer of E has to satisfy the
following:

$$\left.\frac{\delta E(\phi)}{\delta \phi}\right|_{\phi=\phi^*} = 0. \tag{2.8}$$

Here $\frac{\delta E(\phi)}{\delta \phi}$ is the first variational derivative of E(φ). Given an infinitesimal perturbation
δφ of φ, the first variation can be written as:

$$\delta E(\phi) = E(\phi + \epsilon \cdot \delta\phi) - E(\phi), \tag{2.9}$$

where 0 < ε ≪ 1. Based on the above, we can write:

$$\delta E(\phi) = \left.\frac{dE(\phi + \epsilon \cdot \delta\phi)}{d\epsilon}\right|_{\epsilon=0} = \int \frac{\delta E(\phi(r))}{\delta \phi(r)} \cdot \delta\phi(r)\, dr, \tag{2.10}$$

where r ∈ R³. Solving the Euler-Lagrange condition directly is normally a difficult process;
thus, instead, we start with an initial guess φ₀ and then update φ in the direction of a local
decrease in E(φ), which is indicated by the value of $-\frac{\delta E(\phi)}{\delta \phi}$. This brings us to the following
initial value problem (IVP):

$$\frac{\partial \phi(r, t)}{\partial t} = -\frac{\delta E(\phi)}{\delta \phi}, \quad \forall (r, t) \in \Omega \times [0, \infty), \tag{2.11}$$

$$\phi(r, 0) = \phi_0, \quad \forall r \in \Omega,\ t = 0, \tag{2.12}$$

where t is not a physical quantity but rather a representation of algorithmic time. For
example, if $E_{ext} = 0$ and $E_{int}$ is taken to be the surface area of the zero level set, the flow
becomes

$$\frac{\partial \phi}{\partial t} = \operatorname{div}\!\left(\frac{\nabla \phi}{\|\nabla \phi\|}\right) \cdot \|\nabla \phi\|, \tag{2.13}$$

which holds for all r, is referred to as Mean Curvature Flow (MCF), and attempts to reduce
the mean curvature of the surface.

2.5 Summary
Through the sections within this chapter, we looked into the hardware capabilities of the
Microsoft HoloLens and subsequently the fundamentals behind developing an AR applica-
tion on the device. This involved descriptions and definitions related to object
representation in AR, mesh storage mediums for the device and the various components of
the SDK that can be leveraged to acquire spatial information. We also discussed the limita-
tions associated with the device for this project, as mentioned in Section 2.3. Additionally,
the chapter provided an overview of the literature on surface reconstruction and
the theoretical background necessary for performing this process. The next chapter presents
the principal contributions of the thesis, including the proposed methodology
for solving our problem and the various components we have contributed toward achieving this
goal.

Chapter 3

Main Contributions

The previous chapter provided the technical background for developing a framework to
assist in breast-conserving surgery. This chapter discusses our contributions, which extend
in a number of directions. We first propose a methodology in Section 3.1, termed
Incremental Super Resolution (ISR), for achieving our final goal, which includes overcoming
the hardware limitations of the HoloLens and completing our intended framework. To
actualize the proposed methodology, we created multiple components, described in detail
within the subsequent sections.

3.1 Proposed Methodology


During the surgical procedure, the breast needs to be scanned in real time so that a surface
may be generated and registered with the information acquired from the MRI (the regis-
tration procedure is required so that we can translate the tumour information as defined
in the MRI-based surface representation space to the coordinate space that the HoloLens
is using for our application). However, the HoloLens, as mentioned in Section 2.3, lacks
depth resolution. We therefore propose a method called "Incremental Super Resolution
(ISR)", which aims at compensating for this limitation of the Microsoft HoloLens.
Sections 3.1.1 to 3.1.3 detail the three stages that are continuously cycled throughout the
ISR process. Figure 3.1 shows a diagram that visually represents this methodology.

Figure 3.1: Diagram Describing Proposed Methodology

3.1.1 Data Acquisition


The surface alignment procedure needs two meshes: one provided through the MRI scans
(and the associated transformation procedure), and another generated in real time during
the viewing procedure. A depth sensor needs to be used to create a virtual surface of the
breast for this process. The higher the quality of the meshing offered, the better the
matching that can happen during the surface registration phase of the project. Once a
satisfactory mesh has been created in terms of real-world coordinates, it needs to be
converted into the mesh representation format mentioned in Equations (2.1) and (2.2).
The data acquisition phase consists of this collection of depth information from the
HoloLens depth sensors over time. The vertex information of the mesh generated within
the area being scanned is continuously collected over a period of time, paired with colour
information from the Locatable Camera of the HoloLens, and sent over to a secondary
device (such as a computer) for post-processing.

3.1.2 Post-Processing
The secondary device acquires the information flowing in from the HoloLens. The device
is also in possession of the MRI data, which is processed using finite element techniques such
that we have a surface representing the topology of the breast in the supine position. The
data which is continuously acquired is then cleaned using colour information to isolate the
object under consideration (in this case the breast anatomy).
The continuous inflow of points enables us to use a surface fitting/reconstruction algo-
rithm to generate a surface. This surface is then registered with the MRI information such
that we get a transform which can be used to move the MRI surface into the HoloLens
application space where the surgery is happening. Given this transform and the location
of the tumours with respect to the MRI, we can use the former to acquire the latter's
coordinates in surgical space. The surface generated from the inflow of information from
the HoloLens keeps improving over time as the amount of data acquired increases; thus
the transform yields a progressively more accurate localization of the tumour.

3.1.3 Visualization
Through the process of surface registration, we acquire a transform function T that maps
the nodes in MRI space, nodesMRI, to the surgical space coordinates, nodessurgical. This
transformation can then be used to map the position of the tumour from MRI space to
surgical space. The mapped position of the tumour is then sent over to the HoloLens for
visualization such that it appears in the anatomically accurate position inside the patient's
breast.
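
A minimal Unity C# sketch of this visualization step is given below. It assumes the transform T has already been received from the post-processing device as a Matrix4x4 and that the tumour centre in MRI space is known; the field names and values are illustrative only.

using UnityEngine;

public class TumourVisualizer : MonoBehaviour
{
    // Hypothetical values: in practice these are received from the
    // post-processing computer after surface registration.
    public Matrix4x4 mriToSurgical = Matrix4x4.identity;
    public Vector3 tumourCentreMRI = new Vector3(0.03f, 0.03f, 0.03f);
    public float tumourDiameter = 0.02f;

    public void ShowTumour()
    {
        // Map the tumour centre from MRI space into the HoloLens world space.
        Vector3 centreWorld = mriToSurgical.MultiplyPoint3x4(tumourCentreMRI);

        // Render a simple sphere primitive at the mapped position.
        GameObject marker = GameObject.CreatePrimitive(PrimitiveType.Sphere);
        marker.transform.position = centreWorld;
        marker.transform.localScale = Vector3.one * tumourDiameter;
    }
}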

3.2 Acquiring Depth and Color Information


The HoloLens does not provide direct access to the depth information it acquires through its
sensors but provides access to processed mesh information. The following section details our
contributions in acquiring data from the device including the access to depth information
and creating a correspondence between depth information and colour.

3.2.1 Creating an Environment for File Saving in the HoloLens


As mentioned earlier, a mesh object stores mesh information within the HoloLens SDK.
This means that when we want to store a mesh in view, the information would be processed
as a mesh object. In the case of selectively saving attributes of a mesh, this object could
be converted into an array or list of Vector3 objects.

While developing applications for the Microsoft HoloLens, it is important to note that
not all ‘System’ and ‘Windows’ namespaces are available while in the Unity Editor en-
vironment but could subsequently be available in the build code. Thus we need to use
#if directives while importing from certain namespaces such that the Unity Editor version
of the code and the build code would be error free. When a C# compiler meets an #if
followed eventually by an #endif, the code in-between is only considered if the specified
symbol is defined. Listing 3.1 is an example for such a use case where the namespace men-
tioned within the #if and #endif are only imported if the environment where the script
exists is currently not in the Unity Editor.
1 # if ! UNITY_EDITOR
2 using Windows . Storage ;
3 # endif
Listing 3.1: An example for using #if

First, we need a file-writing system that saves the file using an asynchronous process. An
asynchronous process, unlike its synchronous counterpart, works in parallel to other
processes and does not hold up the system. This is important in the context of the
HoloLens as it works in a frame-by-frame fashion: any delay in the saving process can
freeze the Holograms in view and create an irregular experience for the user. Typically,
an asynchronous process is invoked through the use of the ‘System.Threading.Tasks’
namespace. However, using it without excluding it from the Unity Editor environment
would throw an error such as:

error CS1644: Feature ‘asynchronous functions’ cannot be used because it is not part of
the C# 4.0 language specification

The best way to circumvent this situation is to instruct the Unity solution builder to
ignore the code in the Unity Editor environment until we build and deploy to the
HoloLens from VS. The Mixed Reality Toolkit contains a utility called ‘MeshSaver’ that
implements a few lines of code based on this idea for saving information into a file during
runtime on the HoloLens. The creation of a class that performs this operation with code
segregation is described in Appendix A.2.1. The file saving function is then given a string
representation of the mesh object such that it can be saved for post-processing (refer to
Appendices A.2.1 and A.2.3 for more code-related details).
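
The following is a minimal sketch of such an asynchronous file writer, assuming the UWP ‘Windows.Storage’ API on the device side; it is not the MRTK MeshSaver itself, and the class, method and file names are illustrative.

using System;
using System.Threading.Tasks;
#if !UNITY_EDITOR
using Windows.Storage;
#endif

public static class MeshFileWriter
{
#if !UNITY_EDITOR
    // Writes the OBJ-formatted string to the application's local folder
    // without blocking the frame loop.
    public static async Task SaveAsync(string fileName, string objContents)
    {
        StorageFolder folder = ApplicationData.Current.LocalFolder;
        StorageFile file = await folder.CreateFileAsync(
            fileName, CreationCollisionOption.ReplaceExisting);
        await FileIO.WriteTextAsync(file, objContents);
    }
#else
    // Editor fallback so the script still compiles inside Unity.
    public static Task SaveAsync(string fileName, string objContents)
    {
        System.IO.File.WriteAllText(fileName, objContents);
        return Task.CompletedTask;
    }
#endif
}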

3.2.2 Obtaining Color Information and Acquiring Color of Ver-
tices in a Mesh
When working out the problem of obtaining the mesh for an object from the depth sensors
in the HoloLens, it is necessary to filter out unnecessary information, i.e. to reduce noise
in the form of other objects in view or inaccuracies in mesh formation. During the process
of scanning, meshes would be generated for everything in view, or at least for everything
covered by the mesh filters contained in the observer volume. This means that even a
general reduction of the observer volume would still encompass mesh data spanning a
minimum of 1 meter along each of the three coordinate axes. To combat this problem we can
filter out the vertices based on colour correlations. Thus it is necessary to use an RGBD
(Red Green Blue + Depth) module which, in theory, combines the colour data that
the device observes through its ‘Locatable Camera’ and the spatial information it receives
from its depth sensors. Unfortunately, neither the HoloLens SDK nor the MRTK has a
component that enables this process, thus creating the need to solve this problem on the
HoloLens.

Obtaining an Image from the HoloLens

The first part of solving this problem is the acquisition of an image during the process
of scanning. This image can then be used to map the colour data to the mesh data we
acquire at the same instant. The Locatable Camera of the HoloLens needs to be enabled
for attaining this goal. This camera is mounted on the front of the device and enables
applications to see what the user sees in real time. To use it in an application,
one must first enable this capability in the capabilities section inside the player settings of
Unity. The camera not only provides the colour information but also passes information
concerning its position with respect to the scene by providing a transform. The processes
necessary for photo capture are neatly wrapped into the HoloLens SDK, with the method
running asynchronously alongside other functions in the application. The C# code-related
aspects of performing this operation are discussed in Appendix A.3; a simplified sketch is
given below.
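
The sketch below illustrates this capture routine; it assumes the ‘UnityEngine.XR.WSA.WebCam’ namespace of the Unity version used during development (the namespace name differs across Unity releases), and the class name is illustrative.

using System.Linq;
using UnityEngine;
using UnityEngine.XR.WSA.WebCam;   // Namespace name varies across Unity versions.

public class RGBCaptureSketch : MonoBehaviour
{
    private PhotoCapture photoCapture;
    public Texture2D lastImage;
    public Matrix4x4 cameraToWorld;
    public Matrix4x4 projection;

    public void TakePhoto()
    {
        PhotoCapture.CreateAsync(false, captureObject =>
        {
            photoCapture = captureObject;
            Resolution res = PhotoCapture.SupportedResolutions
                .OrderByDescending(r => r.width * r.height).First();

            CameraParameters parameters = new CameraParameters
            {
                hologramOpacity = 0.0f,
                cameraResolutionWidth = res.width,
                cameraResolutionHeight = res.height,
                pixelFormat = CapturePixelFormat.BGRA32
            };

            photoCapture.StartPhotoModeAsync(parameters, result =>
            {
                photoCapture.TakePhotoAsync(OnCapturedToMemory);
            });
        });
    }

    private void OnCapturedToMemory(PhotoCapture.PhotoCaptureResult result,
                                    PhotoCaptureFrame frame)
    {
        // Copy the pixels into a texture and keep the matrices needed for
        // mapping world-space vertices back to pixel positions.
        lastImage = new Texture2D(2, 2, TextureFormat.BGRA32, false);
        frame.UploadImageDataToTexture(lastImage);
        frame.TryGetCameraToWorldMatrix(out cameraToWorld);
        frame.TryGetProjectionMatrix(out projection);

        photoCapture.StopPhotoModeAsync(r => photoCapture.Dispose());
    }
}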

Mapping Mesh Coordinates to Color

The camera that captures RGB information for the HoloLens and the meshes generated in
AR exist in two different coordinate systems which can be termed as ‘camera space’ and
‘world space’. The ‘PhotoCapture’ module provides a transformation matrix to bridge this
difference, and in addition, provides a projection matrix in locating the pixel in an image

that corresponds to any point in camera space. The following details the working of both
matrices and an algorithm in obtaining the colour of any point in world space which is
captured in an image.
Camera to World Matrix: The camera-to-world matrix mentioned in the previous
section is a transformation matrix. Consider a point that is represented as (x, y, z) in
camera space; this is represented as a matrix

$$C = \begin{bmatrix} x & y & z & 1 \end{bmatrix}. \tag{3.1}$$
If the world space coordinates of C are represented as

$$W = \begin{bmatrix} a & b & c & 1 \end{bmatrix}, \tag{3.2}$$

for coordinates (a, b, c), then a transformation matrix that works in three dimensions is a
4 × 4 matrix of the form

$$T_{CamToWorld} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & 0 \\ m_{21} & m_{22} & m_{23} & 0 \\ m_{31} & m_{32} & m_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{3.3}$$

The relation between the camera space, the world space and the transformation matrix
can be written as

$$C \times T_{CamToWorld} = W. \tag{3.4}$$
Given this information we can also write

$$C \times T_{CamToWorld} \times T_{CamToWorld}^{-1} = W \times T_{CamToWorld}^{-1}, \tag{3.5}$$

which on further simplification becomes

$$C \times I = W \times T_{CamToWorld}^{-1}. \tag{3.6}$$

Here I is the identity matrix

$$I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{3.7}$$

Thus, we can write

$$C = W \times T_{CamToWorld}^{-1}, \tag{3.8}$$

which means that the inverse of the transformation matrix can be used to map coordinates
from world space to camera space as well. If the inverse of the $T_{CamToWorld}$ matrix is
referred to as $T_{WorldToCam}$,¹ we can also write:

$$W \times T_{WorldToCam} = C. \tag{3.9}$$

Projection Matrix: The projection matrix is a 4 × 4 matrix that helps in transforming
coordinates from the camera coordinate space to image space, that is, from a 3D space
to a 2D space. The matrix consists of parameters such as the focal length, skew and
centre of projection of the camera to assist in its transformational function. Multiplying
a matrix consisting of a camera coordinate (a, b, c), such as $C = \begin{bmatrix} a & b & c & 1 \end{bmatrix}$, by the
projection matrix yields a result $P = \begin{bmatrix} x & y & 1 \end{bmatrix}$ such that x and y are the pixel coordinates
represented in the range [−1, 1]. If w and h represent the width and height of the image,
then we can obtain the actual pixel location $P_{actual}$ from the matrix $P_{minimal} = \begin{bmatrix} x & y \end{bmatrix}$
by the following:

$$P_{actual} = \frac{1}{2} P_{minimal} + \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}. \tag{3.10}$$

These normalized coordinates lie in [0, 1] and can then be scaled by w and h to obtain the
pixel indices in the image.

Creating a World Space to Pixel Position Function: Given the information
above, it is now possible to design a function that takes as input a three-dimensional
coordinate and provides as output the pixel position associated with this point.
Appendix A.4 describes the implementation of such a method in the Unity-C# HoloLens
development environment.
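
A minimal sketch of such a function, following Equations (3.1) to (3.10), is given below; it assumes the camera-to-world and projection matrices have been captured together with the image, and that the image width and height are known. The class and method names are illustrative.

using UnityEngine;

public static class PixelMapper
{
    // Maps a world-space point to a pixel coordinate in the captured image.
    // Returns false if the point projects outside the image.
    public static bool WorldToPixel(Vector3 worldPos,
                                    Matrix4x4 cameraToWorld,
                                    Matrix4x4 projection,
                                    int width, int height,
                                    out Vector2 pixel)
    {
        // World space -> camera space, using the inverse of the
        // camera-to-world transform (Equation (3.9)).
        Matrix4x4 worldToCamera = cameraToWorld.inverse;
        Vector3 camSpace = worldToCamera.MultiplyPoint(worldPos);

        // Camera space -> normalized image coordinates in [-1, 1]
        // via the projection matrix.
        Vector3 projected = projection.MultiplyPoint(camSpace);

        // Normalized coordinates -> pixel indices (Equation (3.10)).
        float u = (projected.x * 0.5f + 0.5f) * width;
        float v = (projected.y * 0.5f + 0.5f) * height;
        pixel = new Vector2(u, v);

        return u >= 0 && u < width && v >= 0 && v < height;
    }
}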

3.2.3 Collecting Mesh Information via the HoloLens SDK and MRTK

While the HoloLens SDK does offer a base class for accessing meshes, the Spatial
Mapping and Spatial Understanding classes from the MRTK are well built and more robust
in their approach to acquiring and keeping mesh information. Thus, accessing depth
information through these modules is more reliable. The following is a description of the
optimal method of achieving this goal.
¹ The transformations obtained from the Locatable Camera are affine in nature. The inverses of the
matrices representing these transformations exist as long as the matrix determinant is non-zero. An example
of a situation where inverting the transformation matrix would throw an error is attempting to invert a
matrix that scales by 0. However, this will not occur when mapping a point from world space to camera
space, given that the point in question is present in both spaces.

Acquiring Mesh Filters directly and Drawbacks

In Section 2.2.8 we mentioned how to access the Spatial Understanding and Spatial
Mapping modules externally from a script. Now we intend to access the meshes created by
these instances in our script so that we may use them for identification and colouring, or
even store them directly in OBJ files. The direct approach to accessing either mesh is to
request the mesh filters from the respective instances as follows.
List<MeshFilter> SMF = SpatialMappingManager.Instance.GetMeshFilters();
List<MeshFilter> SUF = SpatialUnderstanding.Instance.UnderstandingCustomMesh.GetMeshFilters();

Though this method is perfectly reasonable for the Spatial Understanding instance, it is
unwise to use the same approach for Spatial Mapping. The Understanding custom mesh
generated by the module is aligned in world space and can be sent directly to storage or
processing on that basis. However, the Mapping Manager filters are not stored in relation
to one another; that is, their inner meshes are stored in a different coordinate space and
do not reflect their placement in world space.

Accessing Meshes via Surface Objects

It is first important to understand that the lists of mesh filters stored by both the mapping
and understanding components are based on a system of surface objects. Thus, the call
for mesh filters in the above function retrieves the meshes from a ‘SurfaceObject’ variable
within each instance. To clarify, in this context a ‘SurfaceObject’ is a struct defined as
follows in the MRTK.
1 public struct SurfaceObject
2 {
3 public int ID ;
4 public GameObject Object ;
5 public MeshRenderer Renderer ;
6 public MeshFilter Filter ;
7 public MeshCollider Collider ;
8 }

This structure is defined in the ‘SpatialMappingSource’ class and we can use the ‘Sur-
faceObject’ structure to map out the mesh based on its location in world space coordinates.

For this, we acquire the surface objects as follows:

SpatialMappingManager.Instance.GetSurfaceObjects();                      // For the Spatial Mapping mesh
SpatialUnderstanding.Instance.UnderstandingCustomMesh.SurfaceObjects;    // For the Spatial Understanding mesh

Acquiring meshes this way would help create a method that would work for both modules
and is consistent with the results we desire. Note that the Spatial Mapping Manager has a
function that returns the surface objects, while the understanding mesh returns the same
using a public variable. Both calls return a

ReadOnlyCollection<SpatialMappingSource.SurfaceObject>

type that contains the surface objects in the scene, and is additionally a construct of the
results obtained from the ‘SurfaceObserver’ base class of the HoloLens SDK. This is a list
variable that stores the specific structure mentioned in the angle brackets. The world
space coordinates are sometimes not available in the mesh filters (as these coordinates may
also be stored relative to the position of the parent game object). However, the game
objects that these components are attached to are placed in the real world. As such,
every game object possesses a ‘Transform’ component with a ‘TransformPoint’ method,
which can take a point stored within it and transform it into the space that the said
object is occupying. Thus we can take each vertex of a mesh in each surface object and
apply the transform component to convert the vertex into a world space coordinate. For
any given surface object so, if soTransform is its transform component and soFilter its
mesh filter, we can acquire the same and implement this idea as follows:
1 MeshFilter soFilter = so . Filter ;
2 Transform soTransform = so . Object . transform ;
3 List < Vector3 > worldVertice = new List < Vector3 >() ;
4 Vector3 [] tvertices = soFilter . mesh . vertices ;
5 for ( int i = 0; i < tvertices . Length ; i ++)
6 {
7 Vector3 temp = soTransform . TransformPoint ( tvertices [ i ]) ;
8 worldVertice . Add ( temp ) ;
9 }
Listing 3.2: Applying transforms to coordinates in bringing them into the Application’s
World Space

Here worldVertice is a list consisting of all the vertices of the surface object in world space
coordinates. Since the list that stores the faces of a mesh is just the ordering of the vertices
that are connected to one another, we do not need to do any transformation while importing
these values from the surface objects. However, in the case of normals, a transformation
has to be done to adjust the direction such that it is compatible with the new space the
vertices of the mesh occupy. The ‘Transform’ component has a ‘TransformVector’ method
to do just this, and it can be used for any given normal vector expressed as a ‘Vector3’
object. If vNormal is a ‘Vector3’ object that represents the normal of a vertex in a given
surface object, with variables the same as in Listing 3.2, then we can transform it as follows:
1 Vector3 worldNormal = soTransform . TransformVector ( vNormal ) ;

Z-Filtering

During the process of collecting vertex information, the surface observer returns meshes
throughout the area of observation, irrespective of whether they are visible during the
sampling step or not. This requires an additional layer of filtering to avoid taking in
vertices that are blocked from view, especially by the object under observation itself. For
example, in Figure 2.10, we can see the mesh overlay on the sphere, while there also exist
meshes behind the visible surface of the sphere which would be acquired through the
data acquisition steps we mentioned earlier. Thus filtering based on the field of view is
required and can be implemented as follows:

1. For each sample, while capturing the image, acquire the camera position
2. Do a ray cast from the camera position to each vertex and record the position of
collision on the mesh
3. If the position of collision on the mesh in view matches the vertex to which the
ray cast was intended, the vertex is safe to record; otherwise, drop the vertex

An implementation of the Z-filtering method is provided in Appendix A.5; a simplified sketch of the visibility test follows.
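
The sketch below illustrates the visibility test of steps 2 and 3, assuming the spatial mapping surfaces carry mesh colliders so that Unity's ray casting can hit them; the class name and tolerance value are illustrative.

using UnityEngine;

public static class ZFilter
{
    // Keeps a vertex only if the first surface hit by a ray from the camera
    // towards that vertex is (approximately) the vertex itself.
    public static bool IsVisible(Vector3 cameraPosition, Vector3 vertex,
                                 float tolerance = 0.01f)
    {
        Vector3 direction = vertex - cameraPosition;
        RaycastHit hit;

        if (Physics.Raycast(cameraPosition, direction.normalized,
                            out hit, direction.magnitude + tolerance))
        {
            // Accept the vertex if the ray lands within the tolerance of it,
            // i.e. nothing else occludes it.
            return Vector3.Distance(hit.point, vertex) <= tolerance;
        }

        // No collider was hit at all; treat the vertex as unoccluded.
        return true;
    }
}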

3.2.4 RGBD Module and Incremental Super Resolution


It is now possible to combine the knowledge discussed in Sections 3.2.1 to 3.2.3 to
create a module that collects RGB colour information (Red, Green, Blue), combines it with
the mesh information, saves the result for post-processing, and thereby completes the first
phase of the ISR technique.

The RGBD Module

It is best to find the corresponding colour based on the depth data, as the resolution of the
depth sensor is lower than the resolution of the Locatable Camera; that is, the camera
collects more points of information, in the form of pixels, than the depth camera can
collect for the scene that is in the field of view.
Program Flow: The program flow of a module as discussed above can first be sum-
marized in the following steps:

1. Select a point around which spatial information needs to be acquired and surround
it with a bounding volume of required size and shape

2. Collect spatial information within the observer volume, either as a complete mesh
that overlays the volume or just selective information (which must contain vertex
information) that occurs within it

3. Simultaneously initiate the photo capturing module to obtain an image of the scene
in view using the RGB module discussed in Section 3.2.2

4. Pass the vertices obtained from the spatial information capture into the RGB module
to obtain colour information

5. Store mesh information as an OBJ file along with colour information in a different
or similar file based on requirements

Step 1 involves the selection of a point around which we need to create an observer volume
within which we intend to collect the spatial information. Both the Spatial Mapping and
the Understanding meshes are rooted in the depth information exposed by the ‘Spatial
Mapping Observer’ script, which is an interface to the ‘SurfaceObserver’ base class. In
addition, the script, as mentioned in Section 2.2.7, provides options to specify an observer
volume. The observer volume requires as input a shape and its size. There are several
shapes that can be considered by the observer, including boxes and spheres, for which we
additionally have to provide parameters such as orientation and size. Considering a
box-shaped observer volume with its length, breadth and height specified as float values
l, b and h respectively, and with its centre c specified in 3-dimensional coordinates as a
‘Vector3’ object, we can request an observer volume for spatial observation as follows (the
code additionally specifies an integer parameter TrianglesPerCubicMeter as the resolution
of depth scanning in the observer volume; this value is inconsequential above the
approximate range of 1500 to 2000):
SpatialMappingObserver SMO = SpatialMappingManager.Instance.SurfaceObserver;
SMO.TrianglesPerCubicMeter = TrianglesPerCubicMeter;
SMO.Origin = c;
SMO.ObserverVolumeType = ObserverVolumeTypes.AxisAlignedBox;
SMO.Extents = new Vector3(l, b, h);

Once the observer volume is in place, the spatial information we receive only concerns
the surface objects that pass through this volume or are contained in it. Step 2 is
the acquisition of spatial information as specified in Section 3.2.3. The mesh filters that are
obtained can then be filtered for the characteristic information we are seeking. Steps 3 and
4 involve using this vertex data to collect colour information as mentioned in Section 3.2.2,
which is finally stored as an OBJ file. The colour information can be stored in an OBJ file
as well, with colour components substituted for vertex coordinates such that an OBJ file
reader would read the same without a hitch.
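
A minimal sketch of this serialization step is shown below; it simply writes ‘v’ lines for the vertices and, as described above, reuses the same line layout for the colour components so that a standard OBJ reader can parse both files. The class and method names are illustrative.

using System.Collections.Generic;
using System.Text;
using UnityEngine;

public static class ObjSerializer
{
    // Serializes vertices as standard OBJ "v x y z" lines.
    public static string VerticesToObj(List<Vector3> vertices)
    {
        StringBuilder sb = new StringBuilder();
        foreach (Vector3 v in vertices)
        {
            sb.AppendLine(string.Format("v {0} {1} {2}", v.x, v.y, v.z));
        }
        return sb.ToString();
    }

    // Stores the RGB components in place of vertex coordinates so that any
    // OBJ reader can parse the colour file with the same code path.
    public static string ColorsToObj(List<Color> colors)
    {
        StringBuilder sb = new StringBuilder();
        foreach (Color c in colors)
        {
            sb.AppendLine(string.Format("v {0} {1} {2}", c.r, c.g, c.b));
        }
        return sb.ToString();
    }
}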

Incremental Super Resolution

The HoloLens updates its mesh over a period of time that can be specified. Since raw
access to depth information is not allowed and only mesh information can be received, the
identification of the surface of an object is easier if the vertex information is collected over
a period of time. The RGBD data for a specific volume covering the area of interest can
be collected multiple times. This results in a large set of points, for which a majority
of the noise can be filtered out by using the colour information that we obtained. This
filtered set of vertices can be used as input to surface fitting algorithms to generate a
surface as mentioned in Section 3.3. Experimentation regarding this technique using a
sphere as the model, the subsequent surface reconstruction, and its registration with a
simulated point cloud of a perfect sphere are discussed in Chapter 4.
Note: The Spatial Mapping mesh is the best option for this method. Even though
this mesh is coarser than the Spatial Understanding mesh, the Understanding mesh is a
product of the Spatial Mapping mesh and, in addition, focuses at this point on detecting
flatter surfaces such as tables, chairs, floors, walls and ceilings. This means that curved
surfaces such as a breast would run the risk of being oversimplified. Thus it is better to
access the rawest form of information we can acquire, especially one that does not take
shape priors into account.

3.2.5 Pitfalls while Developing for the HoloLens


The HoloLens only allocates a finite amount of dynamically allocatable memory to its
applications. Thus, any allocation that exceeds this amount results in the application
being killed from the outside. There are other pitfalls as well, related to implementing the
system such that heavy functions run in parallel to the frame rendering done by the
HoloLens. In the context of this framework, following a set of precautionary measures can
result in smoother performance of the application.

Use Coroutines

While running a function, it is natural that the frame of Holograms would be held in
place until processing completes. For certain applications, and especially in the context of
our data acquisition technique, a function that stores vertices from a mesh, correlates them
with colour and saves them to a file is extremely heavy on processing in the context of
1/60th of a second (this assumes that our application is running at 60 frames per second).
Given the above problem, we would need a method to divide the processing of a function
across frames. A coroutine is a function that has the ability to pause execution and return
control to Unity but then to continue where it left off in the following frame [14]. In the
context of C#, a coroutine is executed as follows:
1 IEnumerator DoStuff ()
2 {
3 for ( ... ) // Executes some time consuming operation
4 {
5 ...
6 yield return null ;
7 }
8 }

Here, when the coroutine needs to return control back to Unity, it stops at the line con-
taining the ‘yield’ command until Unity returns control back to the coroutine, after which
the method continues from where it left off. If necessary, it is also possible to provide a
time delay in restarting execution by doing the following:

1 IEnumerator DoStuff ()
2 {
3 for ( ... ) // Executes some time consuming operation
4 {
5 ...
6 yield return new WaitForSeconds (2.3 f ) ;
7 }
8 }

The above code makes the coroutine wait for 2.3 seconds before it continues its operation.
The value 2.3 can be substituted with any measure of time, based on the requirements of
the application. In context, the application we implemented for performing ISR waits 4
seconds per sample.² This time is also useful if the user wishes to move from their original
position to observe the volume under scanning from a different angle or position.

Memory Bottleneck

The limited memory available for dynamic allocation in the HoloLens can sometimes result
in the application being killed once it exceeds this threshold. The following are a few ways
this could happen, especially in the context of our application, along with possible solutions
and precautions that can be taken to avoid them.
Instantiating Meshes: Instantiation is where an object is cloned completely, including
its inner components. More information regarding instantiation is available within the
Unity Scripting API documentation [17]. While working on an application that performs
the ISR technique, it is possible to instantiate meshes so that they may be processed outside
the Spatial Mapping object. The utilization of the mesh in a coroutine is one such situation
which might necessitate this usage. In the context of our application, the spatial observer
is instructed to update itself every two seconds. Considering that we would be analyzing
the mesh in parallel to the surface observer's action, it is possible that the original mesh
object might change while mesh processing is happening. Continuous instantiation without
the proper disposal of objects after processing creates a scarcity of memory and
consequently kills the application.
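
A minimal sketch of this clone-then-release pattern is given below; the class and method names are illustrative, and the important point is only that the instantiated copy is destroyed as soon as its data has been extracted.

using UnityEngine;

public static class MeshSnapshot
{
    // Clones a mesh so it can be analysed while the surface observer keeps
    // updating the original, then releases the copy explicitly.
    public static Vector3[] CopyVerticesAndRelease(MeshFilter filter)
    {
        Mesh snapshot = Object.Instantiate(filter.mesh);
        Vector3[] vertices = snapshot.vertices;

        // Destroy the clone as soon as the needed data has been copied out,
        // otherwise repeated sampling exhausts the available memory.
        Object.Destroy(snapshot);
        return vertices;
    }
}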

² This time delay is provided so that the user can move around and observe the surface from a different
point of view. Changing the user's point of observation during scanning can result in benefits such as the
inflow of fresh mesh information from areas that may not have been in the field of view earlier.

Using Textures: While using colours for filtering noise in the context of ISR, we
need to use Textures to store the image we collect in parallel, such that the world
coordinates of the collected points can be mapped back into an image. Considering the
resolution at which we intend to capture the image, textures consume a large amount of
memory per image which, at times, is greater than that of the objects that store the mesh
information; thus proper handling of this variable is a necessity.
Defensive Measures Against Memory Bottlenecks: The following are a few mea-
sures one can adopt to avoid issues concerning memory, especially in the context of
creating a system implementing ISR:

1. Delete all instantiated objects and free all variables by setting them to null values.

2. Manually initiate the Garbage collector after a set number of samples are collected.
This is possible through the system command ”GC.Collect()” (More information
can be found in the Microsoft Developer Network [6]). An example of running the
garbage collector every 5 iterations is as follows:
1 using System ;
2 ...
3 for ( int i = 0; i < TotalSamples ; i ++)
4 {
5 ...
6 if ( i % 5 == 0)
7 {
8 GC . Collect () ;
9 }
10 ...
11 }

3. Consider limiting the observer volume and filtering vertices within the span of a single
frame. Limiting the observer volume would result in a smaller mesh to analyze, thus
making it possible to run the filtering action in a normal function rather than a
coroutine.

4. Dispose textures after their use by doing the following:


1 ...
2 Texture2D . Destroy ( texture_variable ) ;
3 ...

3.3 Surface Reconstruction
The HoloLens, even with the use of the ISR technique, only yields about 1000 to 1500 points
around the region of interest (a 30 cm × 30 cm × 30 cm box). After noise filtration through
colour, this reduces to about 300 to 600 points. This poses a challenge for surface matching
techniques, as it is difficult to correlate the orientation and position of the scanned breast
with respect to the 3D model generated from MRI. The specifications of the HoloLens
additionally mention that the upper limit of resolution for a mesh generated via the
device's depth sensors is approximately 1500 triangles per cubic meter. The situation calls
for the application of surface fitting techniques, which provide a much smoother distribution
of a point cloud for subsequent matching.

3.3.1 Proposed Technique for Surface Reconstruction


There are multiple types of fitting techniques available for problems dealing with surface
reconstruction, as mentioned in Section 2.4.1. However, our data set is sparse, noisy and
has a high chance of containing holes, ruling out explicit surface representations. While
there are several implicit methods that could be considered, our problem deals with
extracting the surface of a breast during real-time sampling. Rather than delving into
generic solutions based on implicit representations, it is best to formulate a method that
fits our problem. Thus, we propose a method where a functional involving a signed distance
function is optimized through a total variation solver and gradient descent, creating a
surface by accessing its zero level set.
In Section 2.4.2, we introduced a level set representation φ, which models the boundary
of our region of interest, which in this case is the breast mound. Given a set of K points,
$\{r_i\}_{i=1}^{K}$, delivered from the HoloLens, we need a level set function that adheres to this data.
Since the breast mound is smooth, it can be reasonably assumed that its mean curvature
is relatively small. As mentioned earlier, we thus need to minimize a functional
E(φ), where its component Eint penalizes the non-smoothness of the zero level set of φ, while
Eext measures the adherence of the zero level set to our input data. It is useful to
define φ to be a Signed Distance Function (SDF), viz.,

$$\phi(r) = \inf_{r'} \|r - r'\| \cdot \begin{cases} 1 & \text{if } r \in \Omega_{in} \\ -1 & \text{if } r \in \Omega \setminus \Omega_{in} \end{cases}, \quad \forall r \in \Omega, \tag{3.11}$$

where the infimum is taken over the points r′ that lie on the boundary of our region of
interest (the zero level set), ∂Ω. Being an SDF, such a φ should obey the Eikonal equation:

$$\begin{cases} \|\nabla \phi(r)\| = 1, & \forall r \in \Omega, \\ \phi(r) = 0, & \forall r \in \Gamma = \partial\Omega. \end{cases} \tag{3.12}$$

Here ∇ denotes the gradient of φ at r. We now simplify Equation (2.13) as

$$\frac{\delta E_{int}}{\delta \phi} = -\operatorname{div}\!\left(\frac{\nabla \phi}{\|\nabla \phi\|}\right). \tag{3.13}$$

At the same time, we write Eext as an expression of mean distance, to ensure a measure of
adherence to our data, as follows:

$$E_{ext}(\phi) = \frac{1}{K} \sum_{i=1}^{K} |\phi(r_i)|. \tag{3.14}$$

Thus when we minimize the combined energy as mentioned in Equation (2.7) based on
the components mentioned above, the process pushes the surface as close to the data as
possible while ensuring that the surface we get as output is smooth.
To derive the first variational derivative of the external energy component, we do as
follows:

$$\delta E_{ext} = \left.\frac{dE_{ext}(\phi + \epsilon \cdot \delta\phi)}{d\epsilon}\right|_{\epsilon=0} = \frac{1}{K} \sum_{i=1}^{K} \left.\frac{d\left(|\phi(r_i) + \epsilon \cdot \delta\phi(r_i)|\right)}{d\epsilon}\right|_{\epsilon=0}. \tag{3.15}$$

For simplicity, we define the derivative of the absolute value function, (|a|)′, as sign(a),
which is true ∀a ≠ 0. This is justified since |a| can be approximated as

$$|a| = \lim_{\tau \to \infty} \frac{2}{\pi}\, a \tan^{-1}(\tau a), \tag{3.16}$$

where the right-hand side of the equation is 0 when a = 0. So, from a numerical perspective,
the assumption we made of the derivative of the absolute value function being the sign of
the value is quite acceptable.

Now continuing from Equation (3.15),

$$\delta E_{ext}(\phi) = \frac{1}{K} \sum_{i=1}^{K} \left.\operatorname{sign}\!\left(\phi(r_i) + \epsilon \cdot \delta\phi(r_i)\right) \cdot \delta\phi(r_i)\right|_{\epsilon=0} = \frac{1}{K} \sum_{i=1}^{K} \operatorname{sign}\!\left(\phi(r_i)\right) \cdot \delta\phi(r_i). \tag{3.17}$$

Let D denote the Dirac delta function, for which one of its defining attributes is its sifting
property. Specifically, given a continuous function f(τ), we have

$$f(\tau) = \int f(\tau')\, D(\tau - \tau')\, d\tau'. \tag{3.18}$$

Using the sifting property, we derive the first variation of Eext as

$$\delta E_{ext}(\phi) = \frac{1}{K} \sum_{i=1}^{K} \int_{r \in \Omega} \operatorname{sign}(\phi(r)) \cdot \delta\phi(r) \cdot D(r - r_i)\, dr = \int_{r \in \Omega} \left[\frac{1}{K} \sum_{i=1}^{K} \operatorname{sign}(\phi(r))\, D(r - r_i)\right] \cdot \delta\phi(r)\, dr. \tag{3.19}$$

From this, we have $\frac{\delta E_{ext}}{\delta \phi}$ as

$$\frac{\delta E_{ext}}{\delta \phi} = \operatorname{sign}(\phi) \cdot \left[\frac{1}{K} \sum_{i=1}^{K} D(r - r_i)\right]. \tag{3.20}$$

In practical computation, however, using the Dirac delta function is impossible, so we
replace the same with an approximation as follows [27]:

$$D(r) \approx \frac{1}{(2\pi\sigma^2)^{3/2}}\, e^{-\frac{1}{2}\frac{\|r\|^2}{\sigma^2}} = D_\sigma(r). \tag{3.21}$$

Note that

$$D_\sigma(r) \xrightarrow[\sigma \to 0]{} D(r). \tag{3.22}$$
Discretization

Given the above definitions, we now proceed to the discretization of the total gradient flow,
which now has the form

$$\frac{\partial \phi(r, t)}{\partial t} = \operatorname{div}\!\left(\frac{\nabla \phi}{\|\nabla \phi\|}\right) - \lambda \left[\frac{1}{K} \sum_{i=1}^{K} D_\sigma(r - r_i)\right] \cdot \operatorname{sign}(\phi), \tag{3.23}$$

where λ acts as the regularization parameter for the equation. For the sake of notational
clarity, we denote the component $\frac{1}{K} \sum_{i=1}^{K} D_\sigma(r - r_i)$ by g(r) and re-write the equation as

$$\frac{\partial \phi}{\partial t} = \operatorname{div}\!\left(\frac{\nabla \phi}{\|\nabla \phi\|}\right) - \lambda g \cdot \operatorname{sign}(\phi). \tag{3.24}$$
We now apply a semi-implicit discretization scheme as follows [62]:

$$\frac{\phi_{t+\Delta t} - \phi_t}{\Delta t} \approx \operatorname{div}\!\left(\frac{\nabla \phi_{t+\Delta t}}{\|\nabla \phi_{t+\Delta t}\|}\right) - \lambda g \cdot \operatorname{sign}(\phi_t). \tag{3.25}$$

Note that, here, t is not a physical unit of time, but a representation of an algorithmic
step. Since the semi-implicit scheme is more stable and less sensitive to a change in t, it
allows for larger discretization steps in time. Now we can write this equation as

$$\phi_{t+\Delta t} - \Delta t \cdot \operatorname{div}\!\left(\frac{\nabla \phi_{t+\Delta t}}{\|\nabla \phi_{t+\Delta t}\|}\right) = \phi_t - (\Delta t \lambda)\, g \cdot \operatorname{sign}(\phi_t). \tag{3.26}$$

The left-hand side of the above equation can be expressed in terms of the action of an
operator A. Consequently, we have:

$$A\{\phi_{t+\Delta t}\} = f, \tag{3.27}$$

where

$$f = \phi_t - (\Delta t \lambda)\, g \cdot \operatorname{sign}(\phi_t) \tag{3.28}$$

is available from time t.
It is known that this equation amounts to the Euler-Lagrange optimality condition for
the functional [54]:

$$E(\phi) = \frac{1}{2}\int |\phi(r) - f(r)|^2\, dr + \Delta t \cdot \int \|\nabla \phi(r)\|\, dr, \tag{3.29}$$

where the last term amounts to the total variation semi-norm of φ. Thus solving Equa-
tion (3.27) is equivalent to finding the minimizer of E(φ), which is characterized by the
Euler-Lagrange condition

$$\left.\frac{\delta E(\phi)}{\delta \phi}\right|_{\phi=\phi_{t+\Delta t}} = 0. \tag{3.30}$$

Additionally, since E(φ) is strictly convex, such a minimizer is unique and can be found
by a range of optimization algorithms such as the Proximal Gradient Method (PGM),
which in this case is also known as Chambolle's method [26]. So, symbolically, the solution
to Equation (3.27) is given by

$$\phi_{t+\Delta t} = \operatorname{PGM}(f) = \operatorname{PGM}\!\left(\phi_t - \Delta t\, \lambda\, g\, \operatorname{sign}(\phi_t)\right). \tag{3.31}$$

Our Surface Fitting Algorithm

Now, we can proceed to build our algorithm as follows:

Data: {rᵢ}ᵢ₌₁ᴷ, an initial SDF guess φ₀, λ > 0, ∆t > 0, σ > 0, 0 < β ≤ 1, Iter > 0
Result: Minimizer of E, φ*

    i = 0;
    while i < Iter do
        g(r) = (1/K) Σⱼ₌₁ᴷ Dσ(r − rⱼ);
        f = φ − (∆t λ) · g · sign(φ);
        φ = PGM(f);
        φ = Redist(φ);
        σ = β σ;
        i = i + 1;
    end

Algorithm 1: Our Surface Fitting Algorithm
The ‘Redist’ operation mentioned above is a method termed redistancing, by which
we make sure that the continuously evolving function within our algorithm remains a signed
distance function [57, 56]. The β parameter reduces σ in every iteration such that the
delta function subsequently becomes tighter.

3.4 Surface Registration
In registering the surface that we acquire through the process described in Section 3.3
with the actual surface information of the patient acquired from MRI, we use an intensity-
based automatic registration technique. Consider that the fitting process mentioned in
the previous section yields an interface ΓHL. Additionally, we possess an interface ΓMRI
from the MRI data. Our objective here is to register ΓHL to ΓMRI.

Given a set of affine transforms T : R³ → R³, our objective is to find an optimal T that
could be applied to the MRI interface such that it minimizes the distance between both
interfaces, i.e.,

$$\min_{T} \left[\operatorname{dist}(\Gamma_{MRI} \circ T, \Gamma_{HL})\right]. \tag{3.32}$$

Before specifying the distance measure, we define the following Gaussian approximation
of the Dirac delta function:

$$D_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\frac{x^2}{\sigma^2}}. \tag{3.33}$$

Indeed, the registration can alternatively be done between the SDFs φMRI and φHL
corresponding to ΓMRI and ΓHL, respectively. Such an approach allows us to use
relatively simple and convenient quantities such as Dσ(φMRI(r)) and Dσ(φHL(r)); Dσ(φ(r))
in essence forms a fuzzy indicator of the zero level set of the SDF. Here σ is a scalar,
positive value dictating the thickness of the intensity band around the surface, with a larger
value corresponding to a higher thickness. Thus, our problem becomes the minimization of
the following over an affine transform T:

$$\min_{T} \int \left|D_\sigma(\phi_{HL}(T(r))) - D_\sigma(\phi_{MRI}(r))\right|^2 dr. \tag{3.34}$$

The calculated value of the Dirac delta function from the corresponding φ can be
regarded as an intensity-based representation of the surface. Thus, it is now possible
to use a registration technique built for this purpose. We use an intensity-based auto-
matic registration technique, for which more information can be found in the MathWorks
documentation [4, 9].
Once the registration technique provides an acceptable T, the same transform can be
used to locate the tumour within the breast. This method, however, rests on the assumption
that a transform obtained by registering surface information can be used to extrapolate
the motion of the interior as well. The surgeon can then proceed to place physical markers
that identify these locations with precision and proceed to surgery.

3.5 Summary
This chapter detailed the proposed methodology for realizing the AR framework that would
assist in breast-conserving surgery. The sections above also detailed the contributions we
have made towards the same. In terms of data acquisition, this involves the collection of
depth information from a specific area and the merging of data collected from the HoloLens
depth sensors with the colour information acquired from the device's Locatable Camera.
On the side of surface reconstruction, we suggest algorithms that are capable of
reconstructing the observed surface from the data points collected through the HoloLens
(given that this data represents a sparse point cloud that is convex in nature). Finally, we
suggest a method to register this surface with the MRI data. The following chapter details
the experimental phase of our project based on the techniques mentioned in this chapter.
This is facilitated through the use of a hemisphere as a breast-mimicking surface.

Chapter 4

ISR Experimentation

This chapter details the proof-of-concept experiment in registering the collected data of
a spherical object with simulated data, and finally its visualization. Since the HoloLens is
particularly weak in terms of understanding curved surfaces through its depth sensor,
registering a sphere through the ISR technique demonstrates the detection of curved surface
data and subsequently provides promise in assisting breast-conserving surgery.

4.1 Setup for Data Collection


The understanding of the low resolution of depth sensing led to the consideration of
a spherical structure as the subject of experimentation. A sphere approximately 14 cm in
diameter was used and was placed in the centre of a black poster paper. Based on the
discussions above, the following is a list of parameter settings used before scanning the sphere:

• Observer Volume Shape: Box

• Observer Volume Size: 50 × 50 × 50 cm³ (even though this setting results in
  considering a larger surface object that overlies the area)

• Observer Volume Origin: Can be selected at the beginning of scanning; usually, a
  point observed by the depth sensor on the surface of the sphere

• Vertex Filter Size: 30 × 30 × 30 cm³

• Vertex Filter Origin: Same as the Observer Volume Origin

• Number of Samples Collected: 50

The setting used for the experiment can be seen in Figure 4.1. The uniform black color
around the sphere helps in filtering out noise through the analysis of the color that we
attach to each point.

Figure 4.1: Setting used for Spatial Scanning in Conducting the ISR Experiment

4.2 Data Acquisition Program Flow


Given the setup, we collect the data for ISR based on the following algorithm:

1. Start a coroutine and initiate the process of generating the spatial mapping mesh,
set the iteration counter to zero and initialize a vertex and colour list. The mesh
overlay on the object during scanning is as seen in Figure 4.2.

56
2. While the iteration counter is less than the total sample requirement, move on to the
next step; otherwise go to step 10.

3. Capture a photo and save it in a texture using ‘RGBCapture’ as mentioned in Sec-
tion 3.2.2.

4. Capture vertex information from surface objects within world space coordinates as
mentioned in Section 3.2.3.

5. Filter vertices based on the vertex filter size and origin.

6. Filter vertices based on the Z-filtering method as mentioned in Section 3.2.3.

7. Map filtered vertex data and correlate with colour information from the image cap-
tured in Step 3 as mentioned in Section 3.2.2.

8. Append captured data into appropriate lists as mentioned in Step 1 and increment
the iteration counter.

9. Wait for 5 seconds using the yield functionality of coroutines as mentioned in Sec-
tion 3.2.5 and then move to Step 2.

10. Store the information from the vertex and colour lists into OBJ files using the meth-
ods discussed in Section 3.2.1 and then exit the coroutine.

Figure 4.2: The Spatial Mapping Mesh Overlay as seen from the HoloLens During the
Scanning Procedure
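
The steps above can be summarized in a single coroutine skeleton such as the one sketched below; ‘CollectWorldVertices’, ‘MapColours’ and ‘SaveLists’ are placeholder stubs standing in for the routines of Sections 3.2.1 to 3.2.3 and are not part of the actual implementation.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class IsrScanner : MonoBehaviour
{
    public int totalSamples = 50;
    private readonly List<Vector3> vertices = new List<Vector3>();
    private readonly List<Color> colors = new List<Color>();

    public IEnumerator ScanRoutine()
    {
        for (int sample = 0; sample < totalSamples; sample++)
        {
            // Steps 3-7: capture an image, gather filtered world-space
            // vertices and correlate them with colour (placeholder stubs).
            List<Vector3> visible = CollectWorldVertices();
            colors.AddRange(MapColours(visible));
            vertices.AddRange(visible);

            // Step 9: yield so that rendering is not blocked and the user
            // has time to move to a new viewpoint before the next sample.
            yield return new WaitForSeconds(5.0f);
        }

        // Step 10: hand the accumulated lists to the OBJ writer.
        SaveLists(vertices, colors);
    }

    // Placeholder stubs standing in for the routines of Sections 3.2.1-3.2.3.
    private List<Vector3> CollectWorldVertices() { return new List<Vector3>(); }
    private List<Color> MapColours(List<Vector3> verts) { return new List<Color>(); }
    private void SaveLists(List<Vector3> v, List<Color> c) { }
}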

4.3 Filtering Noise Using Color
The data acquired is then filtered first by color. Since the surroundings of the sphere are
black in color, a majority of the noise acquired from the mesh would have a corresponding
color that is close to the color value of black ((0, 0, 0)). Thus a simple distance metric used
on the colors can be used to filter out noise. In this case a Euclidean metric is used as
follows:

$$\mathrm{dist} = \sqrt{x^2 + y^2 + z^2}. \tag{4.1}$$

Here (x, y, z) are the corresponding red, green and blue components of the color that we
collect. For data collected for N points, we can define the data C as:

$$C = \{(x, y, z) \mid x, y, z \in [0, 1]\}. \tag{4.2}$$

Now, if each (x, y, z) in C is represented as Cᵢ, we can define the threshold t to filter
the colors as:

$$\mathrm{mean}(\mathrm{dist}(C)) = \frac{\sum_{i=1}^{N} \mathrm{dist}(C_i)}{N}, \tag{4.3}$$

$$\mathrm{std}(\mathrm{dist}(C)) = \sqrt{\frac{\sum_{i=1}^{N} \left[\mathrm{dist}(C_i) - \mathrm{mean}(\mathrm{dist}(C))\right]^2}{N-1}}, \tag{4.4}$$

$$t = \mathrm{mean}(\mathrm{dist}(C)) - \mathrm{std}(\mathrm{dist}(C)). \tag{4.5}$$

The threshold as seen in Equation (4.5) is simply the difference between the mean and
standard deviation of the distances of the colors from the origin. Since the black background
serves as the source of color for a majority of the noise, the points associated with it lie
closer to the origin, and thus the threshold helps in filtering them out. Once the colors have
been filtered, we also cut away points from the base of the sphere so as to obtain a
hemisphere. This can be done by filtering out a range of axis values measured from the
centre of the sphere.
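
Although the filtering in our experiment was performed offline, the thresholding of Equations (4.3) to (4.5) can be sketched as follows (shown in C# for consistency with the earlier listings; the class and method names are illustrative).

using System.Collections.Generic;
using System.Linq;
using UnityEngine;

public static class ColorNoiseFilter
{
    // Keeps only the points whose colour distance from black exceeds the
    // threshold t = mean - std of Equations (4.3)-(4.5).
    public static List<Vector3> Filter(List<Vector3> points, List<Color> colors)
    {
        // Euclidean distance of each colour from the origin (black).
        List<float> dist = colors
            .Select(c => Mathf.Sqrt(c.r * c.r + c.g * c.g + c.b * c.b))
            .ToList();

        float mean = dist.Average();
        float std = Mathf.Sqrt(dist.Sum(d => (d - mean) * (d - mean))
                               / (dist.Count - 1));
        float threshold = mean - std;

        List<Vector3> kept = new List<Vector3>();
        for (int i = 0; i < points.Count; i++)
        {
            if (dist[i] > threshold)
            {
                kept.Add(points[i]);
            }
        }
        return kept;
    }
}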
The unfiltered and filtered scatter plots for the point cloud are shown in Figure 4.3.
The figure exhibits a point cloud of 364 points collected over 50 samples of the Spatial
Mapping mesh; a single sample, in contrast, only contains approximately 30 viable points.
The samples collected with our data acquisition technique can now be used in our surface
reconstruction algorithm to generate a surface that can be mapped against another
set of higher-resolution data.

(a) Point Cloud Before Filtering

(b) Point Cloud After Filtering

Figure 4.3: MATLAB Plot of Points Before and After the Filtering Process Mentioned in
Section 4.3 (X, Y and Z are 3D Coordinates Representative of Distance in Meters)

4.4 Surface Reconstruction - Setup and Results
We prototyped our surface reconstruction algorithm in MATLAB. In addition to the algo-
rithm mentioned in Algorithm 1, the following steps were taken as part of initializing the
process:

1. The convex hull C of the point cloud was derived using the inbuilt ‘convhull’ function
in MATLAB as seen in Figure 4.4. Points in red are the ones selected in forming the
shape of the hull.
2. A three-dimensional grid G is generated of size N × N × N based on the filtered
point cloud F (where (x, y, z) ∈ F), containing equally spaced values on the x, y and
z coordinates in the ranges [min(x) − o, max(x) + o], [min(y) − o, max(y) + o] and
[min(z) − o, max(z) + o] respectively. Here o > 0 and is set to the equivalent of 1 cm,
that is 0.01, ensuring that a set of points remains above and below the convex hull
as padding.

Figure 4.4: Convex Hull Created for the Filtered Point Cloud Using MATLAB (X, Y and
Z are 3D Coordinates Representative of Distance in Meters)

3. A function called ‘intriangulation’ [43, 52] is used to identify if each point on the grid
lies inside or outside the generated convex hull C, in effect creating a binary mask.
This mask indicates the initial set of points in G which we would further reduce, as
seen in Figure 4.5.

Figure 4.5: Points from the Grid Correlating to a Logical 1 Value in the Mask (X, Y and
Z are 3D Coordinates Representative of Distance in Meters)

4. A function called ‘bwdistsc’ [49, 48] is used to calculate the three dimensional signed
distance function of each point in the mask, forming the φ0 component of the algo-
rithm mentioned in Algorithm 1.

5. Subsequently, the constants based on Algorithm 1 are set as λ = 200, τ = 10, T = 20
and σ = 10, with λ, σ and τ being reduced by three percent of their own value in
each iteration.¹

¹ These values were selected based on trial-and-error runs of the algorithm, obeying the parameter
constraints, until a surface of satisfactory form was attained.

The result of the above experiment, obtained by accessing the zero level set of the
optimized function φ, identifies the shape of our surface in G as shown in Figure 4.6.

Figure 4.6: Surface G corresponding to the Zero Level Set of our Optimized φ with the
Original Data Points Represented in Blue (X, Y and Z are 3D Coordinates Representative
of Distance in Meters)

4.5 Surface Registration - Setup and Results


Using the results from Section 4.4 we can now proceed to register a simulated point cloud
of a hemisphere to assess our ISR technique. The following are the steps involved in setting
up the registration process:

1. Given a point cloud F in grid GF as the result of our reconstruction experiment, and
the optimized signed distance function (SDF) result φF, we generate the point cloud
M and surrounding grid GM of a hemisphere of similar size to the sphere we scanned,
which in this case is 13 cm in diameter, containing 1711 points. We consider the point
cloud F as the fixed point cloud and M as the moving point cloud to be registered
with the position of F. The surface point cloud M is as seen in Figure 4.7.

2. We calculate the convex hull and use it to generate the mask of point cloud M as
Mmask, following the same set of steps as previously mentioned in Section 4.4. The
signed distance function φM is then calculated for the points.

3. The intensity is calculated for all points in both point clouds M and F as IM and
IF respectively. The intensity calculation involves the use of the combined data sets
of grid and SDF, (GM, φM) and (GF, φF), as stated in Equation (3.33).

Figure 4.7: Simulated Hemisphere with Identical Size as the Scanned Object (X, Y and Z
are 3D Coordinates Representative of Distance in Meters)

Given these intensity values, the algorithm for acquiring the transform that registers M
to F is as follows:

1. Set the initial σ based on Equation (3.33) to 20 (see footnote 2).

2. Create a three dimensional image to world coordinate reference for both point clouds
RF and RM by passing in values corresponding to GF and GM in the MATLAB
function ‘imref3D’ [12].

3. Retrieve optimizer (Opt) and metric (Met) configurations based on the image capture
modality (‘monomodal’ in this case, as both point clouds exhibit readings from an object
surface with similar brightness and contrast post surface reconstruction) using the
MATLAB function ‘imregconfig’ [3].

4. Set total number of iterations N as 10, initialize iteration counter n as 0.

5. When n < N , proceed to step 6, else exit.

6. Calculate initial intensity for point clouds M and F as IM and IF .

7. On the first iteration, the first transform T1 is found using the ‘imregtform’ [4] function
in MATLAB, passing the values [IM , RM , IF , RF , Opt, Met, ‘affine’]. Otherwise, the
current transform Tn is found using the same function, additionally passing Tn−1 as
the initial transform (‘affine’ is passed as the type of transform we expect from the
function [1]).

8. Update σ = 0.8σ, increment n, and go to step 5.

The cross-sectional view of the surface registration is shown in Figure 4.8. The surfaces are
seen to overlap one another about their respective cross-sections, providing a satisfactory
registration. A condensed MATLAB sketch of the registration loop above is given at the
end of this section.

2
This value is based on trial-and-error results. The higher value of σ at the start gives a rougher
estimate of the surfaces that are then registered. Subsequent iterations reduce σ (resulting in a sharper
estimate of the surface) and additionally use the prior registration information to refine the acquired
transform.

(a) Cross-sectional view of Scanned Model (b) Cross-sectional view of Simulated Model

(c) Cross-sectional view of Registered Model

Figure 4.8: Cross-Sectional View of Registered Models
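
The following is a condensed MATLAB sketch of the registration loop above. The grid
size, world limits, and the helper ‘computeIntensity’ (standing in for the intensity of
Equation (3.33)) are placeholders rather than the values used in the experiment, and the
argument order shown for ‘imregtform’ follows the MATLAB documentation [4], with the
previous transform supplied through the ‘InitialTransformation’ option.

% Condensed sketch of the intensity-based registration loop (illustrative values).
sigma = 20;                                     % initial sigma for Equation (3.33)
gridSize = [64 64 64];                          % assumed grid resolution
lim = [-0.08 0.08];                             % assumed world limits in meters
RF = imref3d(gridSize, lim, lim, lim);          % image-to-world reference, fixed cloud
RM = imref3d(gridSize, lim, lim, lim);          % image-to-world reference, moving cloud
[Opt, Met] = imregconfig('monomodal');          % both volumes share brightness/contrast

tform = [];
for n = 1:10
    IM = computeIntensity(GM, phiM, sigma);     % placeholder for Equation (3.33)
    IF = computeIntensity(GF, phiF, sigma);     % placeholder for Equation (3.33)
    if n == 1
        tform = imregtform(IM, RM, IF, RF, 'affine', Opt, Met);
    else
        tform = imregtform(IM, RM, IF, RF, 'affine', Opt, Met, ...
                           'InitialTransformation', tform);
    end
    sigma = 0.8 * sigma;                        % sharper surface estimate each pass
end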

4.6 Visualization
Finally, considering that the simulated surface has a sphere of 2 cm diameter within it
representing a tumour, with its centre at (0.03, 0.03, 0.03) as seen in Figure 4.9, we apply
the transform generated by the registration procedure to this point cloud, translating it
into the HoloLens application space. The transformed position is then transferred to the
device so that the tumour can be visualized in the real world within the scanned sphere.
The results are shown in Figure 4.10, where the tumour is seen to sit accurately on the
upper half of the sphere we considered for reconstruction. Additionally, the visualized
tumour is oriented towards the same quadrant as in the simulated model and is seen to be
accurate to the specified position. A short sketch of this transform application is given
after Figure 4.9.

Figure 4.9: Representation of a Tumor within the Simulated Surface (X, Y and Z are 3D
Coordinates Representative of Distance in Meters)
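
A short sketch of applying the resulting transform to the simulated tumour points is
given below; ‘tumourPts’ and the export file name are illustrative assumptions, and ‘tform’
is taken to be the affine transform returned by ‘imregtform’ above.

% Apply the registration transform to the simulated tumour points (illustrative names).
% tumourPts is an M-by-3 matrix of points on the 2 cm tumour sphere.
[tx, ty, tz] = transformPointsForward(tform, tumourPts(:,1), tumourPts(:,2), tumourPts(:,3));
tumourWorld = [tx, ty, tz];                 % coordinates in the HoloLens application space
dlmwrite('tumour_world.csv', tumourWorld);  % example export for transfer to the device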

(a) Top View

(b) Side View

Figure 4.10: Top and Side View of the Simulated Tumor Visualized within the Real World
Object

4.7 Summary
We started by scanning a sphere of 13 cm in diameter using our HoloLens, over the course
of 50 samples, using the technique mentioned in Sections 3.2.4 and 4.2. The resulting
point cloud consisted of 1776 points, which included noise and a small portion of the table
that the sphere was set up on. Subsequent colour filtration and further culling of the lower
half of the sphere, as explained in Section 4.3, yielded the point cloud of a hemisphere with
364 points. We passed this point cloud into the surface reconstruction algorithm devised
in Section 3.3 and executed in Section 4.4 to generate a surface. A hemisphere of 13 cm in
diameter containing 1711 points is then simulated at the origin. An intensity-based
automated registration technique is then used to register the simulated point cloud into
the space of the scanned point cloud, as described in Sections 3.4 and 4.5. Finally, a
representative tumour body of 2 cm in diameter is simulated within the space of the
simulated point cloud, with its centre at (0.03, 0.03, 0.03), and the transform obtained
from the registration process is applied to it to move the tumour into the HoloLens
application space. The transformed coordinates are then sent to the HoloLens, where they
are visualized in the real world, creating a correspondence between the simulated point
cloud and the scanned object.

Through this experiment it is clear that the HoloLens can be used to effectively read
depth information from a surface, such that the scan can be registered with another
three-dimensional surface represented as a point cloud.

Chapter 5

Conclusion and Future Work

This thesis provides a method in which the Microsoft HoloLens could be used in a frame-
work that enables breast-conserving surgery. Attaining this goal meant that a surface or
mesh needed to be generated for the target surface so that it may be registered with the
information we receive from the MRI, which included the location of the tumours. The
HoloLens, with its portability and ease of use, is a highly viable candidate in this respect,
but its sensors offer only coarse depth resolution. To address this problem,
we proposed a method termed "Incremental Super Resolution", which leverages the
continuously changing mesh information in the HoloLens by accumulating it over time in a
space of interest and further using this data within a surface reconstruction algorithm for
convex surfaces, to help in registration. Further, we demonstrate an experiment through
which the viability of the information collected from the HoloLens can be validated by
registering a simulated curved convex surface with data collected from a similar surface in
the real-world.
Our initial intention with this project was to contribute towards the software end of
this problem's spectrum. However, the hardware limitations mentioned above led us to
seek methods for overcoming them, so a large avenue of work remains in perfecting our
proposed method. Future work based on this thesis would be to experiment with the
system on objects that possess varying topology. The methods discussed within this thesis
can also be adapted into a continuous system that cycles through the three steps outlined
in Section 3.1, rather than the batch data collection and processing we performed in
Chapter 4. Additional experiments for such a framework would involve using the HoloLens
to capture information in its real-world scenario of use; collecting data from a live patient
and registering it with the MRI information would serve as a final validation step. Further
investigation into the scalability and robustness of the surface reconstruction algorithm,
including the use of varying prior information such as vertex normal data, can be
performed. The registration techniques for such a use case could also be modified,
optimized, and improved through experimentation with surfaces of complicated topology.
The current filters we use to reduce noise in the point cloud acquired from the HoloLens
are inadequate in a real-world scenario and can be reworked using prior information such
as the skin colour of the patient for isolating the breast. Moving forward, the system can
be modified to work during surgery, where the tumour location adjusts itself as the breast
is moved. Finally, a complete framework that could be used in surgical scenarios can be
constructed, which scans data in real time, sends it wirelessly to a processing system (for
reconstruction and registration), which in turn responds with the MRI-based surface
registered in the surgical world coordinate space, thus locating the tumour.

References

[1] Affine transformation - MATLAB & Simulink. https://www.mathworks.com/discovery/affine-transformation.html. (Accessed on 11/26/2017).

[2] Case Western Reserve University - HoloLens. http://case.edu/hololens/. (Accessed on 12/13/2017).

[3] Configurations for intensity-based registration - MATLAB imregconfig. https://www.mathworks.com/help/images/ref/imregconfig.html. (Accessed on 11/26/2017).

[4] Estimate geometric transformation that aligns two 2-D or 3-D images - MATLAB imregtform. https://www.mathworks.com/help/images/ref/imregtform.html. (Accessed on 11/26/2017).

[5] Facts & stats. http://www.cbcf.org/central/AboutBreastCancerMain/FactsStats/Pages/default.aspx. (Accessed on 12/13/2017).

[6] GC methods (System). https://msdn.microsoft.com/en-us/library/system.gc_methods%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396. (Accessed on 11/22/2017).

[7] HoloLens hardware details. https://developer.microsoft.com/en-us/windows/mixed-reality/hololens_hardware_details. (Accessed on 11/04/2017).

[8] Install the tools. https://developer.microsoft.com/en-us/windows/mixed-reality/install_the_tools. (Accessed on 11/08/2017).

[9] Intensity-based automatic image registration - MATLAB & Simulink. https://www.mathworks.com/help/images/intensity-based-automatic-image-registration.html. (Accessed on 11/26/2017).

[10] Locatable camera. https://developer.microsoft.com/en-us/windows/mixed-reality/locatable_camera. (Accessed on 11/11/2017).

[11] ObjExporter - Unify Community Wiki. http://wiki.unity3d.com/index.php?title=ObjExporter. (Accessed on 11/08/2017).

[12] Reference 3-D image to world coordinates - MATLAB. https://www.mathworks.com/help/images/ref/imref3d.html. (Accessed on 11/26/2017).

[13] Screening for breast cancer - Canadian Cancer Society - breast cancer support and information programs. http://support.cbcf.org/get-information/hereditary-breast-ovarian-cancer-hboc/managing-your-breast-cancer-risk/screening-for-breast-cancer/. (Accessed on 12/14/2017).

[14] Unity - manual: Coroutines. https://docs.unity3d.com/Manual/Coroutines.html. (Accessed on 11/21/2017).

[15] Unity - manual: ShaderLab: Culling & depth testing. https://docs.unity3d.com/Manual/SL-CullAndDepth.html. (Accessed on 11/10/2017).

[16] Unity - manual: ShaderLab syntax. https://docs.unity3d.com/Manual/SL-Shader.html. (Accessed on 11/10/2017).

[17] Unity - scripting API: Object.Instantiate. https://docs.unity3d.com/ScriptReference/Object.Instantiate.html. (Accessed on 11/22/2017).

[18] https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/, 5 2017. (Accessed on 11/08/2017).

[19] Materials, shaders and textures. https://docs.unity3d.com/Manual/Shaders.html, Oct 2017. (Accessed on 11/06/2017).

[20] Nina Amenta, Sunghee Choi, Tamal K Dey, and Naveen Leekha. A simple algorithm
for homeomorphic surface reconstruction. In Proceedings of the sixteenth annual sym-
posium on Computational geometry, pages 213–222. ACM, 2000.

[21] Nina Amenta, Sunghee Choi, and Ravi Krishna Kolluri. The power crust. In Proceed-
ings of the sixth ACM symposium on Solid modeling and applications, pages 249–266.
ACM, 2001.

[22] Franz Aurenhammer. Voronoi diagrams - a survey of a fundamental geometric data
structure. ACM Computing Surveys (CSUR), 23(3):345–405, 1991.
[23] Ronald T Azuma. A survey of augmented reality. Presence: Teleoperators and virtual
environments, 6(4):355–385, 1997.
[24] Jonathan C Carr, Richard K Beatson, Jon B Cherrie, Tim J Mitchell, W Richard
Fright, Bruce C McCallum, and Tim R Evans. Reconstruction and representation of
3D objects with radial basis functions. In Proceedings of the 28th annual conference
on Computer graphics and interactive techniques, pages 67–76. ACM, 2001.
[25] Jonathan C Carr, W Richard Fright, and Richard K Beatson. Surface interpolation
with radial basis functions for medical imaging. IEEE transactions on medical imaging,
16(1):96–107, 1997.
[26] Antonin Chambolle. An algorithm for total variation minimization and applications.
Journal of Mathematical imaging and vision, 20(1):89–97, 2004.
[27] Tony F Chan and Luminita A Vese. Active contours without edges. IEEE Transactions
on image processing, 10(2):266–277, 2001.
[28] Wikimedia Commons. File:dolphin triangle mesh.svg — Wikimedia Commons, the
free media repository, 2012. [Online; accessed 16-December-2017].
[29] Claudius Conrad, Matteo Fusaglia, Matthias Peterhans, Huanxiang Lu, Stefan We-
ber, and Brice Gayet. Augmented reality navigation surgery facilitates laparoscopic
rescue of failed portal vein embolization. Journal of the American College of Surgeons,
223(4):e31–e34, 2016.
[30] Nan Cui, Pradosh Kharel, and Viktor Gruev. Augmented reality with Microsoft
Hololens holograms for near infrared fluorescence based image guided surgery. In
Proc. of SPIE Vol, volume 10049, pages 100490I–1, 2017.
[31] Jesús A De Loera, Jörg Rambau, and Francisco Santos. Triangulations: Structures for
algorithms and applications. Springer, 2010.
[32] Tamal K Dey and Samrat Goswami. Tight cocone: a water-tight surface reconstructor.
In Proceedings of the eighth ACM symposium on Solid modeling and applications, pages
127–134. ACM, 2003.
[33] Herbert Edelsbrunner and Ernst P Mücke. Three-dimensional alpha shapes. ACM
Transactions on Graphics (TOG), 13(1):43–72, 1994.

[34] Michele Fiorentino, Raffaele de Amicis, Giuseppe Monno, and Andre Stork.
Spacedesign: A mixed reality workspace for aesthetic industrial design. In Proceedings
of the 1st International Symposium on Mixed and Augmented Reality, page 86. IEEE
Computer Society, 2002.

[35] Henry Fuchs, Mark A Livingston, Ramesh Raskar, Kurtis Keller, Jessica R Crawford,
Paul Rademacher, Samuel H Drake, Anthony A Meyer, et al. Augmented reality
visualization for laparoscopic surgery. In International Conference on Medical Image
Computing and Computer-Assisted Intervention, pages 934–943. Springer, 1998.

[36] Hugues Hoppe, Tony DeRose, Tom Duchamp, Mark Halstead, Hubert Jin, John Mc-
Donald, Jean Schweitzer, and Werner Stuetzle. Piecewise smooth surface reconstruc-
tion. In Proceedings of the 21st annual conference on Computer graphics and interac-
tive techniques, pages 295–302. ACM, 1994.

[37] Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, and Werner Stuetzle.
Surface reconstruction from unorganized points, volume 26. ACM, 1992.

[38] Etienne G Huot, Hussein M Yahia, Isaac Cohen, and Isabelle L Herlin. Surface
matching with large deformations and arbitrary topology: a geodesic distance evo-
lution scheme on a 3-manifold. In European Conference on Computer Vision, pages
769–783. Springer, 2000.

[39] Hannes Kaufmann and Dieter Schmalstieg. Mathematics and geometry education
with collaborative augmented reality. Computers & graphics, 27(3):339–345, 2003.

[40] Michael Kazhdan and Hugues Hoppe. Screened poisson surface reconstruction. ACM
Transactions on Graphics (TOG), 32(3):29, 2013.

[41] Michael M Kazhdan et al. Reconstruction of solid models from oriented point sets. In
Symposium on Geometry Processing, pages 73–82, 2005.

[42] Patrick J Kelly, Bruce A Kall, Stephan Goerss, and Franklin Earnest IV. Computer-
assisted stereotaxic laser resection of intra-axial brain neoplasms. Journal of neuro-
surgery, 64(3):427–439, 1986.

[43] Johannes Korsawe. intriangulation(vertices,faces,testp,heavytest) - File Exchange -
MATLAB Central. https://www.mathworks.com/matlabcentral/fileexchange/43381-intriangulation-vertices-faces-testp-heavytest-. (Accessed on 11/26/2017).

[44] Aaron Kotranza and Benjamin Lok. Virtual human + tangible interface = mixed reality
human: An initial exploration with a virtual breast exam patient. In Virtual Reality
Conference, 2008. VR'08. IEEE, pages 99–106. IEEE, 2008.

[45] Ivo Kuhlemann, Markus Kleemann, Philipp Jauer, Achim Schweikard, and Floris
Ernst. Towards x-ray free endovascular interventions–using Hololens for on-line holo-
graphic visualisation. Healthcare Technology Letters, 2017.

[46] Jian Liang, Frederick Park, and Hongkai Zhao. Robust and efficient implicit surface
reconstruction for point clouds based on convexified image segmentation. Journal of
Scientific Computing, 54(2-3):577–602, 2013.

[47] Kenton McHenry and Peter Bajcsy. An overview of 3D data content, file formats and
viewers. National Center for Supercomputing Applications, 1205:22, 2008.

[48] Yuriy Mischenko. 3D Euclidean distance transform for variable data aspect ratio -
File Exchange - MATLAB Central. https://www.mathworks.com/matlabcentral/fileexchange/15455-3d-euclidean-distance-transform-for-variable-data-aspect-ratio. (Accessed on 11/26/2017).

[49] Yuriy Mishchenko. A fast algorithm for computation of discrete Euclidean distance
transform in three or more dimensions on vector processing architectures. Signal,
Image and Video Processing, 9(1):19–27, 2015.

[50] Wolfgang Narzt, Gustav Pomberger, Alois Ferscha, Dieter Kolb, Reiner Müller, Jan
Wieghardt, Horst Hörtner, and Christopher Lindinger. Augmented reality navigation
systems. Universal Access in the Information Society, 4(3):177–187, 2006.

[51] Stanley Osher and Nikos Paragios. Geometric level set methods in imaging, vision,
and graphics. Springer Science & Business Media, 2003.

[52] Sandeep Patil and B Ravi. Voxel-based representation, display and thickness analysis
of intricate shapes. In Computer Aided Design and Computer Graphics, 2005. Ninth
International Conference on, pages 6–pp. IEEE, 2005.

[53] Peter Pietrzak, Manit Arya, Jean V Joseph, and Hiten RH Patel. Three-dimensional
visualization in laparoscopic surgery. BJU international, 98(2):253–256, 2006.

[54] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based
noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.

[55] Yoshinobu Sato, Masahiko Nakamoto, Yasuhiro Tamaki, Toshihiko Sasama, Isao
Sakita, Yoshikazu Nakajima, Morito Monden, and Shinichi Tamura. Image guid-
ance of breast cancer surgery using 3-D ultrasound images and augmented reality
visualization. IEEE Transactions on Medical Imaging, 17(5):681–693, 1998.

[56] Mark Sussman and Emad Fatemi. An efficient, interface-preserving level set redis-
tancing algorithm and its application to interfacial incompressible fluid flow. SIAM
Journal on scientific computing, 20(4):1165–1191, 1999.

[57] Mark Sussman, Peter Smereka, and Stanley Osher. A level set approach for com-
puting solutions to incompressible two-phase flow. Journal of Computational physics,
114(1):146–159, 1994.

[58] Renoald Tang, Setan Halim, and Majid Zulkepli. Surface reconstruction algorithms:
Review and comparison. In The 8th International Symposium On Digital Earth (ISDE
2013), 2013.

[59] Fabienne Thibault, Claude Nos, Martine Meunier, Carl El Khoury, Liliane Ollivier,
Brigitte Sigal-Zafrani, and Krishna Clough. MRI for surgical planning in patients
with breast cancer who undergo preoperative chemotherapy. American Journal of
Roentgenology, 183(4):1159–1168, 2004.

[60] DWF Van Krevelen and Ronald Poelman. A survey of augmented reality technologies,
applications and limitations. International Journal of Virtual Reality, 9(2):1, 2010.

[61] Sebastian Vogt, Ali Khamene, and Frank Sauer. Reality augmentation for medical
procedures: System architecture, single camera marker tracking, and system evalua-
tion. International Journal of Computer Vision, 70(2):179, 2006.

[62] Joachim Weickert. Anisotropic diffusion in image processing, volume 1. Teubner
Stuttgart, 1998.

[63] Eric W Weisstein. Euler-Lagrange differential equation. MathWorld - A Wolfram Web
Resource. http://mathworld.wolfram.com/Euler-LagrangeDifferentialEquation.html. (Accessed on 11/07/2017).

[64] Steve Chi-Yin Yuen, Gallayanee Yaoyuneyong, and Erik Johnson. Augmented reality:
An overview and five directions for ar in education. Journal of Educational Technology
Development and Exchange (JETDE), 4(1):11, 2011.

APPENDICES

Appendix A

Semantic and Syntactical References

The sections detailed below describe semantic and syntactical references to the topics
discussed in the main body of this thesis in developing a framework performing ISR.

A.1 Building Objects in an OBJ File


Listings A.1 and A.2 demonstrate the construction of two elements, a 2D square in 3D
space and a cube, through the vertex and face parameters supplied in an OBJ file. The
corresponding output of the constructed OBJ files for the square and cube is shown in
Figures A.1a and A.1b respectively.
v -1 -1 0.1
v 1 -1 0.1
v 1 1 0.1
v -1 1 0.1
f 1 2 3
f 1 3 4
Listing A.1: Example of OBJ file structure representing the square from Listing 2.4

v 0.0 0.0 0.0
v 0.0 0.0 1.0
v 0.0 1.0 0.0
v 0.0 1.0 1.0
v 1.0 0.0 0.0
v 1.0 0.0 1.0
v 1.0 1.0 0.0
v 1.0 1.0 1.0
f 1 7 5
f 1 3 7
f 1 4 3
f 1 2 4
f 1 5 6
f 1 6 2
f 3 8 7
f 3 4 8
f 5 7 8
f 5 8 6
f 1 5 6
f 1 6 2
f 2 6 8
f 2 8 4
Listing A.2: Example of OBJ file structure representing a unit cube

(a) OBJ File View of Listing A.1 (b) OBJ File View of Listing A.2

Figure A.1: OBJ File Viewed through Microsoft’s 3D Builder

A.2 Creating an OBJ File Exporter


The following details the techniques in creating an OBJ file exporter, including compila-
tion segregation using #if definitions to ensure a clean build in Unity, and methods that
translate C# mesh objects into strings that could be written to file.

A.2.1 Writing to a File and Code Compile Segregation
Listing A.3 details code written such that certain lines compile only outside the Unity
editor environment. This ensures that the project builds into the VS solution, from which
it can be seamlessly built and deployed to the device, and that the ‘Stream’ object is
initialized correctly and performs without glitches in the methods that request it from the
‘OpenFileForWrite’ function.
/* namespace imports that need to be ignored in the Unity Editor
   environment pre-build */
#if !UNITY_EDITOR && UNITY_WSA
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Streams;
#endif

public class ObjExporter
{
    public static string MeshFolderName
    {
        get
        {
#if !UNITY_EDITOR && UNITY_METRO
            return ApplicationData.Current.RoamingFolder.Path;
#else
            return Application.persistentDataPath;
#endif
        }
    }

    private static Stream OpenFileForWrite(string folderName, string fileName)
    {
        Stream stream = null;
#if !UNITY_EDITOR && UNITY_METRO
        Task<Task> task = Task<Task>.Factory.StartNew(
            async () =>
            {
                StorageFolder folder = await StorageFolder.GetFolderFromPathAsync(folderName);
                StorageFile file = await folder.CreateFileAsync(fileName, CreationCollisionOption.ReplaceExisting);
                stream = await file.OpenStreamForWriteAsync();
            });
        task.Wait();
        task.Result.Wait();
#else
        stream = new FileStream(Path.Combine(folderName, fileName), FileMode.Create, FileAccess.Write);
#endif
        return stream;
    }
}
Listing A.3: Writing a file during run-time in the HoloLens (Compile code segregation
using #if as modified from the Mixed Reality Toolkit’s MeshSaver functionality)

A.2.2 Converting C# Mesh Objects to String


From the above sub-section, we have a Stream object which we can leverage in saving
a file into the HoloLens file system during application runtime and a file structure we
intend to utilize for storing information. The next step is to create a generic method that
can translate a ‘MeshFilter’ object into a string. The Unity3D Wiki has an ObjExporter
function to do this but the file saving section of the code is not compatible with the
HoloLens [11] because of the VS solution build issue. We can modify the same code
to handle multiple mesh filters or we can create a function to handle storage of specific
information from a mesh such as a list of vertices in the form of a Vector3 object list.
public class ObjExporter
{
    ... // Other code

    public static string MeshListToString(List<MeshFilter> mfList)
    {
        StringBuilder sb = new StringBuilder();
        int verticeCount = 1;
        foreach (MeshFilter mf in mfList)
        {
            Mesh m = mf.mesh;
            Material[] mats = mf.GetComponent<Renderer>().sharedMaterials;

            sb.Append("g ").Append(mf.name).Append("\n");
            foreach (Vector3 v in m.vertices)
            {
                sb.Append(string.Format("v {0} {1} {2}\n", v.x, v.y, v.z));
            }
            sb.Append("\n");
            foreach (Vector3 v in m.normals)
            {
                sb.Append(string.Format("vn {0} {1} {2}\n", v.x, v.y, v.z));
            }
            sb.Append("\n");
            foreach (Vector3 v in m.uv)
            {
                sb.Append(string.Format("vt {0} {1}\n", v.x, v.y));
            }
            for (int material = 0; material < m.subMeshCount; material++)
            {
                sb.Append("\n");
                sb.Append("usemtl ").Append(mats[material].name).Append("\n");
                sb.Append("usemap ").Append(mats[material].name).Append("\n");

                int[] triangles = m.GetTriangles(material);
                for (int i = 0; i < triangles.Length; i += 3)
                {
                    sb.Append(string.Format("f {0}/{0}/{0} {1}/{1}/{1} {2}/{2}/{2}\n",
                        triangles[i] + verticeCount, triangles[i + 1] + verticeCount,
                        triangles[i + 2] + verticeCount));
                }
            }

            verticeCount += m.vertices.Length;
        }
        return sb.ToString();
    }

    public static string Vector3toString(List<Vector3> vectorList, int option)
    {
        // option 1: vertice
        // option 2: normal
        StringBuilder sb = new StringBuilder();
        foreach (Vector3 v in vectorList)
        {
            if (option == 1)
            {
                sb.Append(string.Format("v {0} {1} {2}\n", v.x, v.y, v.z));
            }
            else
            {
                sb.Append(string.Format("vn {0} {1} {2}\n", v.x, v.y, v.z));
            }
        }
        sb.Append("\n");
        return sb.ToString();
    }
}
Listing A.4: Converting Objects and Components into OBJ format String (Modified from
the OBJExporter functionality as mentioned in the Unify Community Wiki [11])

Listing A.4 features two methods: ‘MeshListToString’, which takes a list of mesh filters as
input, and ‘Vector3toString’, which takes a list of Vector3 values and an integer option as
input.

MeshListToString

This function iterates over the mesh filters in the supplied list and extracts the mesh
component from each. The mesh component is then broken down into elements such as
vertices and faces, and a string is constructed following the OBJ syntax. This function is
particularly useful when we intend to store more than one MeshFilter in a single file.

Note: A room scanned by the Microsoft HoloLens is stored as multiple MeshFilter com-
ponents so that a large amount of space can be modified at select locations based on
iterative scanning results from the device, without loading all the mesh concerning
the room at once. That is, when the depth sensor of the HoloLens picks up new
information regarding an area of a room that is scanned, modifications would have
to be made. These modifications can only be made if the mesh corresponding to
the area is loaded into memory and edited based on the internal mesh modification
algorithms. Thus, storing the entire mesh in a single filter would then be a bad
idea because even minor updates to a single area would necessitate the import of the
entire mesh for the room.

Vector3ToString

This function exclusively encodes a list of Vector3 data into either the vertex or the vertex
normal format, based on the option provided to the function. This helps in selectively
extracting only the information we need from the mesh rather than storing everything in
one go, making any function that calls this method more efficient while also reducing the
load on the HoloLens system.

A.2.3 Completing the OBJ Exporter Class


The combination of the ideas mentioned above and the code as mentioned in Listings A.3
and A.4 is a class called ‘ObjExporter’ that provides the functionality to store data as
OBJ files. Listing A.5 demonstrates such a method that could be built into the ObjEx-
porter class such that it would leverage both the ‘OpenFileForWrite’ method as well as
the ‘MeshListToString’ method that has been mentioned above.
public class ObjExporter {
    ... // Other code
    public static string SaveAll(string fileName, List<MeshFilter> meshFilters)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            throw new ArgumentException("Must specify a valid fileName.");
        }

        if (meshFilters == null)
        {
            throw new ArgumentNullException("Value of meshFilters cannot be null.");
        }

        // Create the mesh file.
        String folderName = MeshFolderName;
        using (Stream stream = OpenFileForWrite(folderName, fileName + fileExtension))
        {
            // Serialize and write the meshes to the file.
            using (StreamWriter sw = new StreamWriter(stream))
            {
                sw.Write(MeshListToString(meshFilters));
                sw.Flush();
            }
        }

        return Path.Combine(folderName, fileName + fileExtension);
    }
}
Listing A.5: Function to Save an Incoming List of Mesh Filters

A.3 Initializing and Using Photo Mode


Listing A.6 demonstrates the coding process. First, the ‘PhotoCapture’ object is initiated
as null and as a global variable, so that other methods in the class have access to it
without it being explicitly passed in. This is followed by a call to the static ‘CreateAsync’
function of the ‘PhotoCapture’ class.

The ‘CreateAsync’ function asynchronously creates a new ‘PhotoCapture’ object and
takes two parameters as input: a Boolean value that specifies whether the holograms in
view should be considered while capturing the image (if false, holograms are ignored), and
a callback.
Once ‘CreateAsync’ has initiated a ‘PhotoCapture’ object, it calls the function ‘OnPhoto-
CaptureCreated’, which takes the newly created ‘PhotoCapture’ object as input. This
object is passed in automatically by ‘CreateAsync’, and the callback assigns it to our
global variable. The callback then initializes a camera parameters variable that specifies
the output we expect; the supplied parameters include settings such as the camera
resolution needed for capturing the image and the pixel format in which the image is
captured. Once these settings are copied, we can call the ‘StartPhotoModeAsync’ method,
which takes the camera parameters and another callback as input. That callback is called
once the photo mode of the HoloLens is initiated.
‘StartPhotoModeAsync’ calls its callback with a variable that conveys whether the photo
mode was successfully initiated. Given this result, the callback (‘OnPhotoModeStarted’)
commands the HoloLens to take a photo by calling ‘TakePhotoAsync’ if successful, and
throws an error otherwise. Once a photo has been taken, the callback provided to ‘Take-
PhotoAsync’ is called with the result regarding the success of the photo capture, as well as
a ‘PhotoCaptureFrame’ object that contains the photo. If the photo capture is a success,
a texture object capable of storing image information is initiated with the width and
height of the resolution we specified earlier. The image data is uploaded into this texture,
and the ‘cameraToWorldMatrix’ and the ‘projectionMatrix’ for the image are then
obtained.
The camera-to-world matrix translates any three-dimensional position expressed in the
‘locatable camera’ perspective into the three-dimensional coordinate system used by the
application for drawing objects into the scene (our mesh is drawn, and can be translated,
in these coordinates). The projection matrix converts three-dimensional camera space
coordinates into the pixel coordinates of the image. These matrices are discussed further
below. Finally, once all the information is obtained, the ‘StopPhotoModeAsync’ method
is called, which stops the photo mode of the HoloLens and also initiates a callback that
can be used to dispose of the photo capture object and free memory.
More information regarding the Locatable Camera can be found in the Microsoft De-
veloper documentation [10].

using HoloToolkit.Unity;
using System.Linq;
using UnityEngine;
using UnityEngine.XR.WSA.WebCam;

public class RGBCapture {
    PhotoCapture pCO = null; // The photo capture object global variable
    Texture2D targetTexture = null; // The Texture object where we intend to store the image information
    Matrix4x4 cameraToWorldMatrix = Matrix4x4.identity;
    Matrix4x4 projectionMatrix = Matrix4x4.identity;

    public void capturePhoto()
    {
        PhotoCapture.CreateAsync(false, OnPhotoCaptureCreated);
    }

    private void OnPhotoCaptureCreated(PhotoCapture captureObject)
    {
        pCO = captureObject;

        Resolution cameraResolution = PhotoCapture.SupportedResolutions.OrderByDescending(
            (res) => res.width * res.height).First();
        CameraParameters c = new CameraParameters(); // The camera parameters variable for storing the photo requirements during capture
        c.hologramOpacity = 0.0f;
        c.cameraResolutionWidth = cameraResolution.width;
        c.cameraResolutionHeight = cameraResolution.height;
        c.pixelFormat = CapturePixelFormat.BGRA32;

        captureObject.StartPhotoModeAsync(c, OnPhotoModeStarted);
    }

    private void OnPhotoModeStarted(PhotoCapture.PhotoCaptureResult result)
    {
        if (result.success)
        {
            pCO.TakePhotoAsync(OnCapturedPhotoToMemory);
        }
        else
        {
            Debug.LogError("Unable to start photo mode!");
        }
    }

    void OnCapturedPhotoToMemory(PhotoCapture.PhotoCaptureResult result, PhotoCaptureFrame photoCaptureFrame)
    {
        if (result.success)
        {
            // Create our Texture2D for use and set the correct resolution
            Resolution cameraResolution = PhotoCapture.SupportedResolutions.OrderByDescending(
                (res) => res.width * res.height).First();
            targetTexture = new Texture2D(cameraResolution.width, cameraResolution.height);
            // Copy the raw image data into our target texture
            photoCaptureFrame.UploadImageDataToTexture(targetTexture);
            photoCaptureFrame.TryGetCameraToWorldMatrix(out cameraToWorldMatrix);
            photoCaptureFrame.TryGetProjectionMatrix(out projectionMatrix);
        }
        // Clean up
        pCO.StopPhotoModeAsync(OnStoppedPhotoMode);
    }

    void OnStoppedPhotoMode(PhotoCapture.PhotoCaptureResult result)
    {
        pCO.Dispose();
        pCO = null;
    }
}
Listing A.6: Creating a ‘PhotoCapture’ object and Initiating Photo mode (Modified from
the Locatable Camera documentation from the Microsoft Developer website[10])

A.4 A Function to Map Points from World Space to Pixel Space
This method mentioned below can be placed in the same class as the other methods de-
scribed in Listing A.6. The code in Listing A.7 assumes that the global variables mentioned
in Listing A.6 have been populated by the process of taking a photo.
The function in Listing A.7 first finds the transformation matrix for world-to-camera
space by taking the inverse of the camera-to-world matrix, following the process given in
Equations (3.5) and (3.9). This transformation matrix is multiplied with the world space
position passed into the function to obtain the position of the point in camera space
coordinates. The projection matrix is then multiplied with the camera space position,
and the result is normalized into the [x y 1] form by dividing the whole vector by its
third component, where x and y are the pixel positions expressed in the range of [−1, 1],
making them members of the Pminimal matrix mentioned in Equation (3.10).
As an example of when normalization is required, suppose the result obtained from
multiplication with the projection matrix is

[a  b  2]

Here we divide the entire vector by 2 to obtain

[a/2  b/2  1]

Finally, Equation (3.10) is used to find the correct pixel position of the world space coor-
dinate.
// Other initializations
public class RGBCapture {
    // Other code

    public Vector2 WorldToPixelPos(Vector3 WorldSpacePos)
    {
        Matrix4x4 WorldToCamera = cameraToWorldMatrix.inverse;
        Vector3 CameraSpacePos = WorldToCamera.MultiplyPoint(WorldSpacePos);
        Vector3 ImagePosUnnormalized = projectionMatrix.MultiplyPoint(CameraSpacePos); // Get unnormalized position (Vector3 whose z component is not yet 1)
        Vector2 ImagePosProjected = new Vector2(ImagePosUnnormalized.x, ImagePosUnnormalized.y) / ImagePosUnnormalized.z; // normalize by z, gives -1 to 1 space
        Vector2 ImagePosZeroToOne = (ImagePosProjected * 0.5f) + new Vector2(0.5f, 0.5f); // good for GPU textures
        Vector2 PixelPos = new Vector2(ImagePosZeroToOne.x * targetTexture.width, (1 - ImagePosZeroToOne.y) * targetTexture.height);
        return PixelPos;
    }
}
Listing A.7: World Space to Pixel Position Function (Modified from the Locatable Camera
documentation from the Microsoft Developer Website [10])

More information for camera space conversions are available at the Microsoft Developer
documentation online [10].

A.5 Implementing Z-filtering


Listing A.8 exhibits a simple example of this procedure, where the method takes as input
the camera position and a vertex to be tested for occlusion. The layer mask is set such
that only layer 31 is considered for collision with the ray (layer 31 is the layer the spatial
mapping mesh is assigned to by default). If a hit occurs in close proximity to the intended
point, the method returns true; otherwise, the vertex is flagged as occluded by returning
false.
Boolean RayCastVerification(Vector3 cameraPos, Vector3 vertPos)
{
    RaycastHit hitinfo;
    Vector3 heading = vertPos - cameraPos;
    Vector3 direction = heading / heading.magnitude; // Unit vector calculation of direction vector for a ray cast
    int layermask = 1 << 31; // Layers to be focused on during the RayCast procedure
    if (Physics.Raycast(cameraPos, direction, out hitinfo, 10.0f, layermask))
    {
        if (Math.Abs((hitinfo.point - cameraPos).magnitude - heading.magnitude) > 0.006) // If the hit did not happen in proximity of the intended location
        {
            return false;
        }
        else
        {
            return true;
        }
    }
    else
    {
        return false;
    }
}
Listing A.8: Ray Cast verification procedure

