1 Introduction
Affordable yet highly sophisticated drone devices and operating systems are increasingly available on the consumer market. Drones are now being used for aerial photography and scanning of static objects [23, 24, 45, 56] and for interactive applications with environments, objects, and humans, including disaster investigation and rescue [44], product delivery [7], remote repairs [81], haptic proxies for VR [1], outdoor navigation [37], and communication agents [3, 83]. For such scenarios, which require the drone operator's real-time judgment, interactive drone-piloting approaches have become dominant over recently advanced autopiloting technologies [68, 73]. However, drone operation is generally difficult for most pilots due to the inevitable challenges of speed-control mechanisms [9, 72, 80], environmental factors (e.g., wind, field complexity [14]), poor spatial awareness [67], and so on. Reflecting this difficulty, most countries have enacted strict regulations that rarely permit remote drone piloting, i.e., beyond visual line of sight (BVLOS) flights, by inexperienced pilots.
Our modern societies nevertheless desire BVLOS drone operations that fully utilize drones' sensing and locomotion capabilities [7]. The biggest challenge in achieving BVLOS operation is the pilot's poor spatial awareness of the remote drone and its surrounding environment. With a typical consumer drone system, a video feed from an on-drone front-facing camera (FPV: first-person view) is streamed to the pilot. Since this camera does not cover the left, right, and rear directions, the pilot cannot fully grasp the drone's spatial status relative to its surroundings. If the drone enters such a blind-spot region, the pilot is naturally concerned about the operation's safety. This issue becomes critical when pilots must position drones near surrounding objects or humans in remote places.
To address these blind-spot issues of FPVs, previous works have implemented multiple cameras (e.g., DJI Matrice 210 [12]) or a 360-degree camera attached to the aircraft [34, 59, 85]. However, the images obtained by these methods are still from a first-person perspective and do not fully support the pilot's perception of the distance between the drone and its surroundings.
3D digital map generation using SLAM [69, 70] has also been explored as a way to provide environmental knowledge that FPV cameras cannot capture. Such a world-reconstruction approach using on-drone sensors is quite promising for concisely representing the remote drone's operating area, but current technologies still have difficulty reflecting dynamic changes in unknown remote environments and objects. Specifically, running current SLAM algorithms over a wireless network suffers from significant data-transmission delays and low frame rates [77, 82], suggesting that a fully reliable, real-time 3D digital map cannot yet be reconstructed [8].
Another notable approach is using additional cameras to provide a third-person view (TPV) that shows an overview of the area around the operating drone. The idea of using a wide, elevated perspective has increased operators' spatial awareness in the teleoperated-robot domain (e.g., [32, 40, 64]) and improved 2D/3D content-navigation interfaces (e.g., [10]). This idea was recently applied to drone interface systems that allow pilots to manipulate a remote drone (i.e., the main drone) through a TPV from an additional, higher-positioned drone [67] or a preset indoor overhead camera [46]. Hereinafter, we call this unique navigation style TPV-based piloting. This is an important advance toward successful BVLOS drone operation because the TPV captures real-world information in real time and covers the blind-spot regions of FPV cameras from higher viewing angles.
Despite the great potential of TPV-based piloting, studies of it are scant, and two critical challenges remain. The first challenge is the main drone's low visibility in the TPV. Pilots often struggle to distinguish the main drone's body against a colorful or dark background or to precisely perceive its spatial status (e.g., height, position) relative to the surroundings, since only vague geometrical depth cues are available from an overview perspective (Fig. 2a) [9, 11, 36, 67, 74, 86]. The second challenge concerns TPV framing. Depending on the main drone's motion direction, a fixed bird's-eye TPV is not always suitable; it needs to be dynamically adjusted to better focus on the drone's current motion direction [70]. However, prior TPV-based piloting interfaces offer either a static or pre-adjustable TPV [46, 67], which does not fully enhance the pilot's spatial awareness while the main drone moves flexibly. Although addressing these two challenges is crucial for practical TPV-based piloting, many other technical difficulties remain, including GPS's limited distance-tracking accuracy, potential optical noise in outdoor environments, and radio-wave interference.
In this work, we propose BirdViewAR, a novel surroundings-aware remote drone-operation system that significantly expands a pilot's spatial awareness using an augmented third-person view (TPV) from an autopiloted follower drone. The follower drone autonomously flies behind the main drone at a higher altitude to offer a bird's-eye TPV that entirely captures the main drone's 3D motions: horizontal translation, ascending/descending motion (along the z-axis), and yaw rotation (around the z-axis). We expand these basic TPV capabilities by introducing two augmentations that address the two crucial TPV challenges mentioned above. First, to improve the pilot's spatial understanding of the TPV's contents, we employ AR overlays that visually highlight the main drone's spatial statuses, including its current position, heading, height, camera field-of-view (FOV), and proximity areas (Fig. 1B). Second, to improve the TPV framing for the fast-moving main drone, we employ motion-dependent automatic TPV framing, in which the follower drone's position and orientation are dynamically controlled to clearly capture the main drone's status, the visualized overlays, and the drone's near-future destination in the TPV (Fig. 1A). We introduce the two augmentations together because they work in a complementary manner; for example, the AR overlays become more effective when the TPV captures more motion-related areas, while the dynamic TPV framing would not be recognizable without clear visual feedback from our AR overlay graphics. Later, we show how their combination is superior to either augmentation alone.
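To make the framing idea concrete, the minimal Python sketch below shows one way a motion-dependent follower placement could be derived from the main drone's pose and velocity: the follower sits behind and above the main drone, offset against the motion direction, and its camera aims between the drone and a short-horizon predicted destination. The function and parameters (follower_target, back_dist, lookahead_s, etc.) are hypothetical illustrations, not BirdViewAR's actual control algorithm.

# Illustrative sketch (not the authors' algorithm): deriving a
# motion-dependent follower-drone pose from the main drone's state.
import numpy as np

def follower_target(main_pos, main_yaw, main_vel,
                    back_dist=6.0, height_offset=4.0, lookahead_s=2.0):
    """Place the follower behind and above the main drone, shifted opposite
    to the motion direction so the TPV also frames the near-future destination.

    main_pos : (3,) world position of the main drone [m]
    main_yaw : heading of the main drone [rad]
    main_vel : (3,) world velocity of the main drone [m/s]
    Returns (follower_pos, gimbal_yaw, gimbal_pitch).
    """
    main_pos = np.asarray(main_pos, dtype=float)
    main_vel = np.asarray(main_vel, dtype=float)

    heading = np.array([np.cos(main_yaw), np.sin(main_yaw), 0.0])
    speed = np.linalg.norm(main_vel[:2])
    # Direction the main drone is actually moving; fall back to its heading when hovering.
    motion_dir = main_vel / (np.linalg.norm(main_vel) + 1e-6) if speed > 0.3 else heading

    # Predicted near-future position of the main drone (constant-velocity assumption).
    lookahead = main_pos + main_vel * lookahead_s

    # Follower sits behind the motion direction and above the main drone.
    follower_pos = main_pos - motion_dir * back_dist + np.array([0.0, 0.0, height_offset])

    # Aim the follower's camera at the midpoint between the main drone and
    # its predicted destination so both stay in frame.
    aim = 0.5 * (main_pos + lookahead)
    d = aim - follower_pos
    gimbal_yaw = np.arctan2(d[1], d[0])
    gimbal_pitch = np.arctan2(d[2], np.linalg.norm(d[:2]))  # negative: looking down
    return follower_pos, gimbal_yaw, gimbal_pitch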
While our work builds upon previous TPV-based piloting studies [46, 67], we significantly expand them by introducing the above two technical augmentations. To demonstrate our concept, we focus on designing and prototyping the BirdViewAR system, for which we further propose a feasible drone-formation tracker using the follower drone's on-board camera, an accurate AR-overlay generator, and an optimization-based follower-drone control algorithm. In terms of target scenarios, among the many application opportunities for drones, we focus on both aerial recording and a drone's interactive activities with remote objects (e.g., communication, delivery, inspection), where a pilot's real-time spatial awareness of remote areas and accurate drone-positioning ability are required.
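As a complement, the sketch below illustrates the geometric core that an AR-overlay generator of this kind relies on: projecting the main drone's estimated 3D position into the follower camera's image plane with a calibrated pinhole model. It is a simplified illustration under standard camera-model assumptions; the function name and the overlay-anchoring example are hypothetical and do not reflect the system's actual implementation.

# Minimal sketch of the geometry behind image-anchored AR overlays:
# project a 3D world point into the follower camera's pixel coordinates.
import numpy as np

def project_to_image(point_world, cam_pos, cam_R, K):
    """Project a 3D world point into pixel coordinates.

    point_world : (3,) point in world frame (e.g., the main drone's position)
    cam_pos     : (3,) follower camera position in world frame
    cam_R       : (3, 3) rotation from world frame to camera frame
    K           : (3, 3) camera intrinsic matrix
    Returns (u, v) pixel coordinates, or None if the point is behind the camera.
    """
    point_world = np.asarray(point_world, dtype=float)
    cam_pos = np.asarray(cam_pos, dtype=float)
    p_cam = cam_R @ (point_world - cam_pos)   # world frame -> camera frame
    if p_cam[2] <= 0:                          # behind the image plane
        return None
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Example use: projecting both the drone and the ground point directly below it
# yields the two pixel endpoints of a height-indicator line (drawing code omitted).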
The following are the main contributions of this work:
• We propose BirdViewAR, a surroundings-aware remote drone-operation system that increases pilots' spatial awareness using a TPV augmented through AR overlays and motion-dependent automatic TPV framing.
• We describe all of the design considerations for the TPV visual augmentations and the follower drone's motion-dependent automatic control.
• We present technical insights for implementing BirdViewAR on consumer-level programmable drones, including our vision-based drone-sensing platform and an optimization-based control process for the follower drone.
• We describe BirdViewAR's potential, as well as its remaining challenges for beginner pilots, through a preliminary outdoor user study.