Vision-Based Location Positioning Using Augmented Reality For Indoor Navigation
IEEE Transactions on Consumer Electronics, Vol. 54, No. 3, AUGUST 2008
Abstract — In this paper, we propose a vision-based location positioning system using the augmented reality technique for indoor navigation. The proposed system automatically recognizes a location from image sequences taken of indoor environments, and it realizes augmented reality by seamlessly overlaying the user's view with location information. To recognize a location, we pre-constructed an image database and a location model, which consists of locations and the paths between them, of an indoor environment. Location is recognized by using prior knowledge about the layout of the indoor environment. The image sequence is obtained by a wearable mobile PC with a camera, which transmits the images to remote PCs for processing. The remote PCs perform marker detection, image sequence matching, and location recognition, and transmit the recognized location information back to the wearable mobile PC. The system provides the ability to identify similar locations in the image database and display location-related information. Accurate and efficient location positioning is achieved by using several vision-based techniques. The proposed system was tested in an indoor environment and achieved an average location recognition success rate of 89%. The proposed system could be applied to various consumer applications including door plate, notice board, shopping assistance, and bus service route guide systems, among others.

Index Terms — Location positioning, Indoor navigation, Image sequence matching, Location model.

I. INTRODUCTION

Computers, from simple calculators to life-saving equipment, are ubiquitous in modern society. In many areas, computers increase human efficiency and save time. Thus, humans have come to expect rapid access to important information through computers. The rapid development of computer technology has yielded shrinking computer size and increasing processing power. Such progress is well realized by wearable computers [1-3]. Wearable computers are small, worn on the body during use, and intended to provide information not directly detectable through human senses. Wearable computers that seamlessly overlay information regarding the human environment may contribute to a convenient and efficient human lifestyle. The augmented reality (AR) technique has been developed to achieve such lifestyle-enhancing results, and it is currently used in several applications [3]. AR is an effective means for utilizing and exploiting the potential of computer-generated information. AR techniques are applied to systems for applications including monitoring, remote intelligence, military applications, and location positioning [3-5]. More recently, AR technology has been actively studied for indoor positioning applications. Indoor positioning systems currently use GPS-based [6-8], sensor-based [9, 10], and RFID-based [11-14] systems to establish user location. Among them, GPS-based systems are the most common.

In this paper, we propose a vision-based indoor location positioning system that uses the AR technique and does not require any additional devices. Our system employs the AR technique to provide location information to persons unfamiliar with the layout of an indoor environment. In vision-based methods, locations in indoor environments can be recognized by first characterizing each location with special identifiers. Alternatively, as with previously mentioned systems, location can be established using the signal strength of RF (radio frequency) bands or the IR echo distance [15, 16]. The RFID-based system characterizes location by measuring the strength of a signal received from an RF sensor attached to known locations. In our system, however, specific locations are each identified by a marker together with color information and prior knowledge. Here, each marker is a black-and-white square with a characteristic pattern. Topographical information of the indoor environment is established from prior knowledge of locations, and it is represented in the location model by a hierarchical tree structure. Our system has several distinct advantages over other location positioning systems. First, it is an economical solution because the marker that identifies each location is simply printed on paper, and the mobile PC and camera are general devices; the rest of the system can be implemented in software. Second, although our system has inferior performance compared with systems such as GPS- or sensor-based location positioning systems, it is not limited by signal propagation and multiple reflections. To increase performance, it adopts an adaptive thresholding method [3] to detect markers under illumination changes, and it uses the location model to reduce execution time during image sequence matching.

The paper is structured as follows. Section II describes research related to location positioning in indoor environments. In Sections III and IV, we present an overview and a detailed explanation of the proposed system. Section V contains a discussion of our results. In Section VI, we present the conclusions of this paper.

JongBae Kim is with the School of Computer Engineering, Seoul Digital University, Seoul, S. Korea (e-mail: jbkim@sdu.ac.kr).
HeeSung Jun is with the School of Computer Engineering and Information Technology, University of Ulsan, Ulsan, S. Korea (e-mail: hsjun@ulsan.ac.kr).
JB. Kim and HS. Jun: Vision-Based Location Positioning using Augmented Reality for Indoor Navigation 955
II. RELATED WORKS

Indoor positioning systems provide location information for indoor environments using various sensors. Previously developed systems are shown in Table I. As shown in Table I, previous systems have recognized user location in indoor environments with a sensor. These systems work by measuring the angle of signal arrival, the time difference of signal arrival, or the signal strength of the sensor. However, the performance of previous systems, including their accuracy and efficiency, is highly dependent on the number of sensors and the structural characteristics of indoor environments. Below is a brief description of some of the location positioning technologies that have emerged in the past few years to aid in navigation.

TABLE I
EXAMPLES OF LOCATION POSITIONING SYSTEMS

System                  Technique             Usability
Smart Sight [17]        GPS                   outdoor
Cricket [9]             RF + ultrasonic       indoor
Active Badge [10]       Infrared (IR)         indoor
RADAR [11]              RF                    indoor
Drishti [18]            GPS + sensors         outdoor, indoor
CyberGuide [19]         GPS + infrared (IR)   outdoor, indoor
SHOSLIF [20]            Vision                indoor
H. Aoki's method [21]   Vision                indoor

A. GPS-based Location Positioning

Major developments of location positioning systems have focused on enhancing positioning capability in open outdoor environments. The most obvious way to determine location information is to use GPS receivers, which can determine their positions to within a few meters in outdoor environments. Therefore, GPS-based location positioning systems are used for precise applications such as aircraft or vehicle navigation systems. Unfortunately, GPS radio signals have difficulty penetrating building walls, so the relative distances between the reference points and the location device cannot be easily calculated. This difficulty arises because the communication paths of the GPS radio signals are long and not always empty. GPS-based location positioning devices therefore do not work well indoors, or in many outdoor areas, because the satellite signals are not strong enough to penetrate building walls, dense vegetation, or other obstructive features.

B. Sensor-based Location Positioning

Sensors such as IR, electromagnetic wave, and other sensors have been adapted for location positioning systems. Sonar or IR sensors placed at fixed positions within a room-scale space receive sensor signals and, through software analysis, make the sightings available to services and applications. To operate effectively, such systems require the deployment of large arrays of hundreds to thousands of IR beacons on ceiling tiles. The position and orientation of the IR sensor are estimated by sighting the relative angles and positions of the ceiling IR beacons. A single transmitter emits a magnetic signal to track the position and orientation of numerous sensors. The location error of these sensors is typically within a few meters. However, these techniques require extensive wiring, which makes them prohibitively expensive and difficult to deploy. More importantly, these techniques only allow for room-scale location, and are therefore not suitable for wide-scale deployment.

C. RFID tag-based Location Positioning

The RFID-based location positioning method uses RF tags and a reader with an antenna to identify user location. Tags are generally affixed to places or objects, such as consumer equipment, so that the objects can be located without line of sight. Tags contain circuitry that gains power from radio waves emitted by readers in their vicinity. Tags then use this power to transmit their unique identifier to the reader. The detection range of these tags is approximately 4-6 meters; therefore, this technique offers room-level precision. As a result, the RFID tag method requires a large infrastructure in order to be highly precise and effective at location positioning. Additionally, the use of sensing equipment for indoor location positioning results in exorbitant costs. Likewise, the installation of RFID sensors on ceiling tiles or walls can be costly. Thus, there exists a need for a location positioning system that overcomes the shortcomings associated with current indoor positioning methods while still being economical. The development of such a method is discussed herein.

III. OVERVIEW OF THE PROPOSED SYSTEM

A. System Design

Location positioning systems for indoor environments should have minimal weight and consistent performance. Therefore, our location positioning system comprises a mobile tablet PC, a wireless camera, a Head Mounted Display (HMD), and desktop PCs with a wireless LAN interface. Fig. 1 shows a front and top view of a user using our system. The mobile PC is used to input the destination and to display the location information and map. The HMD is used to annotate the direction sign on the user's view with location information. The camera, which is mounted on the user's cap, captures image sequences and transmits them to remote PCs via a wireless LAN. The remote PCs estimate location information from the received image sequences and transmit the result to the user's mobile PC.

B. Overview

A process diagram of our system is shown in Fig. 2. Given an image sequence taken as input from a cap-mounted wireless camera, the proposed system annotates the recognized location information. Our system mainly consists of two parts: wearable units and remote PC units. The wearable units capture and display images. The remote PC units perform marker detection, image sequence matching, and location recognition.
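The location model described above — locations as nodes and the paths between them as edges — is what later lets the recognition process restrict its search to the "moveable locations" reachable from the previously recognized location. A minimal sketch of such a model follows; the location names and distances are illustrative assumptions, not the paper's actual building data.

```python
class LocationModel:
    """Illustrative location model: locations are nodes, walkable paths are edges."""

    def __init__(self):
        self.edges = {}  # location ID -> {neighbor ID: path distance in meters}

    def add_path(self, a, b, distance):
        """Register a bidirectional path between two locations."""
        self.edges.setdefault(a, {})[b] = distance
        self.edges.setdefault(b, {})[a] = distance

    def moveable_locations(self, current):
        """The set of locations reachable in one step from `current`."""
        return set(self.edges.get(current, {}))


# Hypothetical fragment of a building layout.
model = LocationModel()
model.add_path("lobby", "corridor-1", 12.0)
model.add_path("corridor-1", "room-101", 4.5)
model.add_path("corridor-1", "elevator", 8.0)

print(sorted(model.moveable_locations("corridor-1")))
# ['elevator', 'lobby', 'room-101']
```

During recognition, only these one-step neighbors need to be matched against the location dictionary, which is how the model narrows the search range.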
Fig. 4. PCA (a) and LDA (b) distributions of the first two features of each location

B. Image Sequence Matching

This process estimates user location by analyzing the differences between features of the input and pre-stored image sequences. The location dictionary is constructed from a video sequence captured by a user walking in an indoor environment. The video sequence used for the location dictionary was recorded over a period of 30 minutes at 12 frames per second, and was then sub-sampled to 8 frames per second to create the location dictionary. Subsequently, the user annotated the sub-sampled frame number and location ID at each location where he stepped into a new area. At each of these locations, features of the sub-sampled frames were calculated and stored in the location dictionary. Each location in the dictionary is represented by a bundle of 64 frames. Therefore, this process compares every 64 frames of the input image sequence to the 8-frame bundle representing each location in the location dictionary.

After constructing the location dictionary, features of the input image sequence are calculated and matched to features in the location dictionary. When the input image sequence is adequately similar to a pre-stored image sequence in the location dictionary, the image sequence matching process outputs the corresponding location information. The recognition task is difficult because indoor areas such as hallways, rooms, and lobbies often have similar features. Therefore, features must be used to characterize each location for matching. Although they are obtained at consistent locations, image sequences may differ widely since they are acquired from different camera viewpoints. Therefore, to operate effectively, the image sequence matching process must be able to characterize whole frames of an image sequence with features of the image. However, more features do not necessarily result in more accurate location recognition. Instead, each feature must have strong location discrimination power [20, 27].

Color, as a component of visual context, is an important source of information for image matching. Therefore, many researchers have used color information to match images. H. Aoki used a hue histogram with 32 bins to match frames [21]. However, the use of a hue histogram to match features has several limitations. When large variation exists between frames, prominent hue features and the global contrast regions may not be stable. In response to this deficiency, the proposed method identifies features as the variation of color information between frames in an image sequence. However, the dimensionality and number of image sequences are large. Thus, a dimension reduction method is applied in real time in our system. Linear Discriminant Analysis (LDA) is used to effectively reduce the dimension of the image feature space and to successfully discriminate each location [27].
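As a concrete reference for the hue-based features discussed above, a 32-bin hue histogram of a frame can be computed as follows. This is a generic sketch using the standard RGB-to-hue conversion, not the authors' exact implementation.

```python
import numpy as np


def hue_histogram(frame_rgb, bins=32):
    """32-bin hue histogram of an RGB frame (H x W x 3, values in [0, 255])."""
    rgb = frame_rgb.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    minc = rgb.min(axis=-1)
    delta = maxc - minc
    hue = np.zeros_like(maxc)
    mask = delta > 0  # gray pixels have undefined hue and are skipped
    # Standard RGB -> hue conversion, normalized to [0, 1)
    r_max = mask & (maxc == r)
    g_max = mask & (maxc == g) & ~r_max
    b_max = mask & ~r_max & ~g_max
    hue[r_max] = ((g - b)[r_max] / delta[r_max]) % 6.0
    hue[g_max] = (b - r)[g_max] / delta[g_max] + 2.0
    hue[b_max] = (r - g)[b_max] / delta[b_max] + 4.0
    hue /= 6.0
    hist, _ = np.histogram(hue[mask], bins=bins, range=(0.0, 1.0))
    return hist


# A solid red frame puts every pixel into the first hue bin.
red = np.zeros((24, 32, 3), dtype=np.uint8)
red[..., 0] = 255
print(hue_histogram(red)[0])  # 768 (= 24 * 32 pixels)
```

Per-frame histograms like this would then feed the dimension-reduction step (PCA or LDA) before matching.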
S = \min_{n} f(D_{XY_{n}}) \qquad (4)
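Eq. (4) selects the dictionary entry whose feature distance to the input sequence is minimal. The following numpy sketch illustrates that minimum-distance selection; the feature vectors and location IDs are invented for illustration, and Euclidean distance stands in for the paper's distance function f.

```python
import numpy as np


def match_location(input_feature, dictionary):
    """Pick the location whose stored feature vector is nearest (Euclidean)
    to the input sequence's feature vector, in the spirit of Eq. (4)."""
    ids = list(dictionary)
    dists = [np.linalg.norm(input_feature - dictionary[loc]) for loc in ids]
    best = int(np.argmin(dists))
    return ids[best], dists[best]


# Hypothetical two-dimensional (e.g. first-two LDA) features per location.
dictionary = {
    "lobby":      np.array([0.9, 0.1]),
    "corridor-1": np.array([0.2, 0.8]),
    "room-101":   np.array([0.5, 0.5]),
}
loc, dist = match_location(np.array([0.25, 0.75]), dictionary)
print(loc)  # corridor-1
```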
In our case, location models were made for every building at our university. The location model for a specific building is selected by inputting the destination of a user. By using the location model, the range of image sequences searched for in the location dictionary is narrower, which allows for a lower matching error.

Fig. 6. Layout of the (a) laboratory area and (b) location model

Three potential location recognition output scenarios must be considered. The first is the case where the marker detection process and the image sequence matching process output the same result. The second is the case where the two processes output different results. The third is the case where only the result of the image sequence matching process is output.

To determine the current location in the first case, the location recognition process constantly checks the moveable locations connected to the previously recognized location. The process moves to the next location when the result is found to be one of the moveable locations.

In the second case, the current location is determined if the moveable locations connected to the previously recognized location include one of the results of the marker detection or image sequence matching processes. However, if the moveable locations do not include either of the results of the two processes, then the location recognition process outputs the previously recognized location.

In the third case, if only one of the two processes has a result, and if the moveable locations include that result, the location is determined. However, if the moveable locations do not include the result, then the location recognition process outputs the previously recognized location. The location recognition process is performed as detailed in Table II.

TABLE II
RULES GUIDING THE LOCATION RECOGNITION PROCESS

a    - the result of the marker detection process
b    - the result of the image sequence matching process
pl   - the previously recognized location in the location model
{cl} - the set of moveable locations connected to pl in the location model

if a = b:
    a ∈ {cl}: output a
    a ∉ {cl}: output pl
if a ≠ b:
    a, b ∈ {cl}: output the location with the shortest distance from pl
    a ∈ {cl} and b ∉ {cl}: output a
    a ∉ {cl} and b ∈ {cl}: output b
    a, b ∉ {cl}: output pl
if only b:
    b ∈ {cl}: output b
    b ∉ {cl}: output pl

D. Location Annotation

The location annotation process annotates the location information on the user's view. To annotate the location information on the user's view in a seamless manner, we used the OpenGL graphics library. Location information is provided using virtual 3D direction sign graphics and text such as "go straight", "turn left", and "turn right". The location information is received from the remote units and annotated by the wearable unit. The wearable unit annotates the 3D direction sign at the top-left position of the user's view. The user can see the 3D sign on the input image sequence through the HMD. The distance to the destination, according to the office layout, is also displayed through the HMD.

V. EXPERIMENTAL RESULT

In order to verify the effectiveness of the proposed system, an experiment was performed using indoor image sequences acquired by the cap-mounted camera. Images were captured at a rate of 8 frames per second and were digitized to a size of 320×240 pixels. The experiments were performed with a Pentium Mobile 1.8 GHz tablet PC with Windows XP, a wireless camera, desktop PCs (P-3.2 GHz, 2 GB RAM), and an HMD, and the algorithm was implemented using the MS Visual C++ development tool. Fig. 7 shows the interface and system setup of the proposed system. The top-left side of the interface shows the input image overlaid with a direction sign, the bottom-left side shows the office map overlaid with the user's current location, and the right side consists of buttons to set the departure and destination locations, as well as text boxes that display the recognized location. This interface is displayed on the mobile PC, and the HMD displays the top-left image of the interface.

Fig. 7. Interface (a) and system setup (b) of the proposed system
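The rules of Table II can be transcribed almost directly into code. The sketch below is an illustrative transcription; the `distance_from_pl` helper used to break the tie when both results lie in {cl} is an assumption about how "shortest distance from pl" would be supplied.

```python
def recognize_location(a, b, pl, cl, distance_from_pl=None):
    """Location recognition rules of Table II (illustrative sketch).

    a  - result of the marker detection process (None if no marker was found)
    b  - result of the image sequence matching process
    pl - previously recognized location
    cl - set of moveable locations connected to pl in the location model
    distance_from_pl - optional {location: distance} map, an assumed helper
                       for the a != b, both-in-cl tie-breaking case
    """
    if a is None:  # only the matching process produced a result
        return b if b in cl else pl
    if a == b:
        return a if a in cl else pl
    # a != b
    in_a, in_b = a in cl, b in cl
    if in_a and in_b:  # choose the location closest to pl
        if distance_from_pl is not None:
            return min((a, b), key=lambda loc: distance_from_pl[loc])
        return a
    if in_a:
        return a
    if in_b:
        return b
    return pl


cl = {"corridor-1", "room-101"}
print(recognize_location("room-101", "room-101", "corridor-1", cl))  # room-101
print(recognize_location("lobby", "room-101", "corridor-1", cl))     # room-101
print(recognize_location(None, "lobby", "corridor-1", cl))           # corridor-1
```

The last call shows the fallback behavior: when the only available result is not a moveable location, the previously recognized location is kept.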
A. Evaluation of the Marker Detection

The marker detection process is applied to every frame of the input image sequence. To show the robustness of the marker detection process, we performed the detection process for a total of 53 locations on image sequences captured during the day and at night. We followed Eq. (5) to test the performance of the location detection system.

R = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{C_i}{T_i} \times 100 \right) \qquad (5)

Here, n is the number of locations included in the captured image sequence, T_i is the number of frames that show the ith location in the image sequence, and C_i is the number of times that the ith location is recognized by the system. The robustness of the marker detection process was tested with a motion blur sensitivity test. In our system, the movement of the camera follows the user's view. A blurry image, which may decrease the success rate of marker detection, may result from a user's quick movement. The robustness of our marker detection process under increasing motion blur is shown in Table III. For this test, the term "ground-truth marker" denotes the ground-truth bounding square marked around the region of each marker. The images were blurred in one direction (the camera moving from left to right) using a 9×9 motion blur filter. The fixed threshold value was set to 100. The marker detection success rate gradually decreased with increasing blurriness. Still, the detection success rate was maintained at over 92% using the adaptive threshold method.

TABLE III
MARKER DETECTION RESULTS FOR A MOTION-BLURRED IMAGE SEQUENCE (286 FRAMES) ANALYZED USING THE FIXED (FT) AND ADAPTIVE THRESHOLDING (AT) METHODS

Blurring factor   4      6      8      10     12     14
with FT (=100)    87.3   82.6   73.9   68.9   63.9   57.2
with AT           93.7   92.3   92.0   81.6   75.1   60.8

B. Evaluation of the Image Sequence Matching Process

The image sequence matching process analyzes the difference between features of the input image sequence and features of the location dictionary. The process uses LDA features to match features from both image sequences. As mentioned, LDA performs dimensionality reduction while preserving as much of the class discriminatory information as possible. To evaluate the image sequence matching process, we measure the false positive rate (FPR) and false negative rate (FNR) using the PCA and LDA features, and color and hue histograms, as features for the image sequence matching process. Only the first two PCA and LDA features are used. Color and hue histograms are the most widely used color feature representations [21]. The histogram information is partially reliable for matching purposes even in the presence of small variations in a frame's visual appearance. The 32-bin histograms are calculated and recorded for every frame. The FPR is the proportion of negative instances that were erroneously reported as positive. The FNR is the proportion of positive instances that were erroneously reported as negative. The optimal case is when the values of the FPR and FNR are zero. For this test, one video sequence with 64 frames per location was stored in the location dictionary, 10 test image sequences per location were obtained, the number of locations was 53, and the best result was obtained when the Euclidean distance between features was at a minimum. Because the test sequences were obtained during 24-second intervals, the test image sequences had a total of 192 frames per location. The measures of the FPR and FNR are given by Eq. (6), where # represents the number of each component.

\mathrm{FPR} = \frac{\#\text{false positives}}{\#\text{locations} \times \#\text{test sequences}} = \frac{\#\text{false positives}}{53 \times 10} \qquad (6)
\mathrm{FNR} = \frac{\#\text{false negatives}}{\#\text{locations} \times \#\text{test sequences}} = \frac{\#\text{false negatives}}{53 \times 10}

Table IV shows the FPR and FNR results obtained using several features for image sequence matching. As shown in Table IV, LDA is suitable for image sequence matching, and the performance of image sequence matching is higher than with PCA.

TABLE IV
FPR AND FNR RESULTS USING SEVERAL FEATURES FOR IMAGE SEQUENCE MATCHING

Features                        FPR / #false positives   FNR / #false negatives
Color histogram with 32 bins    0.200 / 106              0.115 / 61
Hue histogram with 32 bins      0.136 / 72               0.075 / 40
PCA first-two features          0.092 / 49               0.030 / 16
LDA first-two features          0.045 / 24               0.017 / 9

C. Evaluation of the Location Recognition with the Location Model

The location recognition process was evaluated by comparing the measured positions after they were calibrated with the content of the location dictionary. For this evaluation, we assume that the user walked in the middle of corridors and followed the nodes of the location model. Fig. 8 shows the location recognition results obtained using the fixed and adaptive thresholding methods with the location model. As previously mentioned, location recognition achieved a better success rate using the adaptive thresholding method instead of the fixed thresholding method. Two measures were used to evaluate the accuracy of location recognition: average probability and average execution time. Table V shows the results of the location recognition accuracy assessment. Daytime- and nighttime-captured image sequences were used for this test. As shown in Table V, the rate of incorrect recognition and the execution time were both decreased by using the location model.
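Both evaluation measures above are simple ratios and can be reproduced directly. The sketch below implements Eq. (5) and Eq. (6), using the LDA row of Table IV (24 false positives, 9 false negatives over 53 locations × 10 test sequences) as a sanity check; the recognition-rate inputs are invented example counts.

```python
def recognition_rate(counts):
    """Eq. (5): average per-location recognition rate in percent.
    counts: list of (C_i, T_i) pairs, i.e. (frames recognized as location i,
    frames actually showing location i)."""
    n = len(counts)
    return sum(c / t * 100.0 for c, t in counts) / n


def fpr_fnr(false_pos, false_neg, n_locations=53, n_sequences=10):
    """Eq. (6): false positive / false negative rates over all test sequences."""
    total = n_locations * n_sequences
    return false_pos / total, false_neg / total


# LDA row of Table IV: 24 false positives and 9 false negatives.
fpr, fnr = fpr_fnr(24, 9)
print(round(fpr, 3), round(fnr, 3))  # 0.045 0.017
```

The computed values match the 0.045 / 0.017 entries reported for the LDA features in Table IV.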
Fig. 8. Location recognition results obtained using the fixed and adaptive thresholding methods with the location model: recognition results of (a) daytime- and (b) nighttime-captured image sequences

Fig. 9 shows typical indoor navigation situations: passing through a corridor, moving to another floor using an elevator, and going up or down stairs. The system was found to have an average recognition rate of 89%, and the average execution time for location recognition was 2.3 sec. Incorrect location recognition occurred in corridor locations connected to an outside environment because of rapid changes in illumination.

TABLE V
AVERAGE LOCATION RECOGNITION RATE AND PROCESSING TIME WITH AND WITHOUT THE LOCATION MODEL FOR THE DAYTIME-CAPTURED IMAGE SEQUENCE

              Without location model       With location model
Location ID   rate (%)    time (sec.)     rate (%)    time (sec.)
1             85.1        0.31            87          0.19
2             87          0.30            91          0.21
3             78          0.35            84.3        0.21
4             85.8        0.29            87.5        0.19
5             92          0.29            92.1        0.16
6             92          0.27            96.6        0.15
7             91.5        0.31            90.4        0.16
8             89.7        0.30            90          0.20
9             84          0.29            85.7        0.19
10            90          0.27            92.5        0.16
total         87.5        0.29            89.7        0.18

The proposed system can be applied to various consumer applications, such as the door plate system, notice board system, shopping assistance system, and bus service route guide system. The main purpose of our system is to insert computer-generated graphical content into a real scene in real time. Such a system could be beneficial for shopping, for example, as it would be possible to check the prices of goods without spending time visiting shops and walking down every single aisle. This is beneficial for both consumers and shop owners because less time and money must be devoted to investigating and advertising which products are carried. The proposed system could be applied in a shopping environment by putting a marker on each product in a store.

Fig. 9. Results of indoor location positioning

The proposed system could also be beneficial as a bus service route guide. At each bus stop, the system user could see all of the destinations on each bus route at the bus stop. The proposed system could inform the user of the proper bus route and number to reach a specified destination. Generally, due to space limitations, bus service route tables shown at bus stops only show the destinations of each bus line. Visitors unfamiliar with a city may not know the destinations well enough to select a bus line. However, with the proposed system, the user would only have to input the final destination, and the bus service route and bus number would be annotated on the user's view. To be employed in an economical manner, our system could be installed at each bus stop. Transit authorities would also save money by not having to reprint maps when bus routes are modified. Various examples of consumer applications of our system are shown in Fig. 10. To characterize each appliance, a 9×9 cm marker is used. Fig. 10(a) shows the price of a book along with its publisher and author, and Fig. 10(b) shows the price, manufacturer, and manufacturing data of an electric heater.

VI. CONCLUSION

A vision-based location positioning system that uses the augmented reality technique for indoor navigation has been proposed herein. This system automatically recognizes a location from image sequences taken of indoor environments.
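The "total" row of Table V is the mean of the ten per-location values; a quick check of the recognition-rate columns confirms the reported averages and the roughly two-point gain from using the location model.

```python
# Per-location daytime recognition rates from Table V.
without_model = [85.1, 87, 78, 85.8, 92, 92, 91.5, 89.7, 84, 90]
with_model    = [87, 91, 84.3, 87.5, 92.1, 96.6, 90.4, 90, 85.7, 92.5]

avg_without = sum(without_model) / len(without_model)
avg_with = sum(with_model) / len(with_model)
print(round(avg_without, 1), round(avg_with, 1))  # 87.5 89.7
```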