LEVIOSA: Natural Language-Based Uncrewed Aerial Vehicle Trajectory Generation
Figure 1. Our framework incorporates several LLMs to generate and refine drone waypoints based on user commands.
Figure 2. Illustrative diagram of the components of the high-level planner system, showing the role of each LLM agent type, their inputs, and outputs. (a) Instructor agent. (b) Generator agent. (c) Critic agents. (d) Aggregator agent.
Figure 3. The overall trajectory is divided into individual waypoints for each drone. The waypoints, combined with each drone's real-time observations, are then processed by the dedicated low-level policy for that UAV. The process generates the specific actions required to guide the drone's movement.
Figure 4. Sample star generated with Gemini.
Figure 5. Sample star generated with Gemini Flash.
Figure 6. Sample star generated with GPT-4o.
Figure 7. Successful 5-petal flower trajectory generated by the Gemini model.
Figure 8. Common failure mode of the Gemini model for petal flower geometries.
Figure 9. A thousand drones successfully form parallel lines generated by Gemini.
Figure 10. One hundred drones successfully form a spiral generated by Gemini.
Figure 11. A thousand drones unsuccessfully form a dragon generated by Gemini.
Abstract
1. Introduction
- Multi-critic consensus mechanism: a system that utilizes multiple critics to evaluate generated trajectories with a majority voting scheme to ensure high-quality outputs.
- Hierarchical prompt structuring: a method that organizes and summarizes outputs from multiple critics into a coherent context, improving the LLMs’ ability to understand and execute complex tasks.
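As a rough illustration of hierarchical prompt structuring, the aggregation step can be sketched as folding each round of critic feedback into a running summary so downstream agents see one compact context instead of every raw critique. The function name and text format below are illustrative assumptions; in the framework this role is performed by an aggregator LLM:

```python
def hierarchical_context(prior_summary: str, critiques: list[str]) -> str:
    """Fold the current round of critic feedback into the running summary.

    Illustrative only: a real aggregator would be an LLM call that
    summarizes, deduplicates, and resolves conflicts among critiques.
    """
    bullets = "\n".join(f"- {c}" for c in critiques)
    return (
        f"Previous direction:\n{prior_summary}\n"
        f"New feedback this round:\n{bullets}"
    )

# Example: two critiques folded into the prior summary.
context = hierarchical_context(
    "Fix Drone 3's incomplete petal.",
    ["Petal count is wrong.", "Starting positions look correct."],
)
```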
2. Related Work
2.1. Large Language Models in Robotics
2.2. Reinforcement Learning in Aerial Vehicle Control
3. Methodology
3.1. Problem Formulation
- High-level planner: This component employs four types of multimodal LLMs to process the operator’s natural language input and produce the final coordinated waypoint paths for the UAVs to execute. The four LLM types collaborate as a multi-agent flow to convert the speech or audio command into a Python (v3.10) script that, when executed, generates 3D waypoints. Each LLM type is assigned a unique role that contributes to the system’s overall effectiveness: some roles require only text input and output (the aggregator LLM agent), while others handle multiple modalities, such as audio (the instructor LLM agent) or images (the critic LLM agents). We discuss the details of the high-level planner system, its LLM organization, and the role of each LLM in the following sections.
- Per-UAV low-level controllers: Implemented as policies trained with RL, these controllers provide low-level motor control and are identical across drones. Each controller takes the 3D waypoints provided by the high-level planner and produces the low-level motor commands that steer its UAV along the intended trajectory.
3.2. High-Level Planning
- The instructor multimodal LLM agent converts the input audio from the user into high-level requirements that capture the semantics of the audio. Optionally, the agent also converts text commands from the user into requirements.
- The generator LLM agent takes the high-level requirements provided by the instructor agent and synthesizes a Python program that generates the 3D waypoints for all the drones when executed.
- The critic multimodal LLM agents or critics, characteristic of our multi-critic consensus mechanism, take as input the visual plot of the generated waypoints and the requirements to provide feedback on the quality of the waypoints.
- The aggregator LLM agent, illustrative of our hierarchical prompt structuring, aggregates current and previous aggregated feedback from the critics to provide new, comprehensive feedback and progress direction to the generator.
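Under the assumption that each agent is a role-prompted LLM call, the four-agent flow above can be sketched as follows. The function bodies are placeholder stubs standing in for model calls (e.g., to Gemini or GPT-4o), not the paper's implementation:

```python
def instructor(command: str) -> str:
    """Stub: convert a raw user command into high-level requirements."""
    return f"requirements derived from: {command}"

def generator(requirements: str, feedback: str) -> str:
    """Stub: synthesize a Python waypoint-generation program."""
    return "waypoint_program"

def critic(plot: str, requirements: str) -> tuple[bool, str]:
    """Stub: judge a rendered waypoint plot against the requirements."""
    return True, "trajectory matches requirements"

def aggregator(critiques: list[str], previous: str) -> str:
    """Stub: summarize current and prior feedback into one directive."""
    return " | ".join(critiques)

def plan(command: str, max_iters: int = 8, n_critics: int = 3) -> str:
    """Generation-reflection loop over the four agent roles."""
    requirements = instructor(command)
    feedback, program = "", ""
    for _ in range(max_iters):
        program = generator(requirements, feedback)   # generation phase
        plot = f"plot of {program}"                   # render waypoints to an image
        votes, critiques = [], []
        for _ in range(n_critics):                    # reflection phase
            valid, comment = critic(plot, requirements)
            votes.append(valid)
            critiques.append(comment)
        feedback = aggregator(critiques, feedback)
        if sum(votes) > n_critics // 2:               # majority-vote consensus
            break
    return program
```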
3.2.1. Instructor Agent
3.2.2. Generator Agent
3.2.3. Critic Agents: Multi-Critic Consensus Mechanism
3.2.4. Aggregator Agent: Hierarchical Prompt Structuring
3.3. Per-UAV Low-Level Controllers
- Observations and actions: Each drone receives its own sequence of observations, a state vector concatenated with a buffer of previous actions. The state vector consists of the drone’s current position, orientation, linear velocity, angular velocity, and the difference between the current position and the target waypoint. The buffer of recent actions, whose length was chosen after some trial and error, adds temporal information that lets the agent relate the current state to the sequence of recent commands. This state representation allows the RL policy to understand the drone’s kinematic state and its relation to the waypoints. The observation space of each drone excludes the positions of the other drones because the LLM generates collision-free paths under the assumption of constant velocity for all drones. The action space consists of continuous low-level control inputs for the drone’s rotors, specifically the revolutions per minute (RPM) of each rotor, which allows fine-grained control over the drone’s movement and orientation.
- Reward: The reward function encourages the drones to navigate through their assigned waypoints efficiently while avoiding collisions, balancing immediate positional goals with broader flight characteristics. The primary component, a distance reward, uses an exponential decay based on the squared Euclidean distance to the current waypoint, providing a continuous gradient that intensifies as the drone approaches its target. It is complemented by a binary waypoint reward, triggered when the drone is within a tight threshold of a waypoint, which offers a substantial bonus for precision. A velocity-based reward encourages the drone to maintain moderate speeds, peaking at 0.5 units per time step, while an orientation reward and a corresponding penalty work in tandem to promote smooth rotational movements and discourage erratic changes in orientation. To address long-term objectives, a completion reward provides a significant bonus upon finishing the entire set of waypoints, motivating the drone to navigate the full course efficiently. Careful tuning of the temperature parameters adjusts each component’s influence and lets the reward function be adapted to various mission profiles and drone capabilities. Together, these terms ensure that the drone reaches the waypoints with a flight pattern that is efficient, stable, and suitable for real-world applications.
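A minimal sketch of the observation assembly and reward shaping described above. The buffer length, thresholds, bonus magnitudes, and temperature values here are illustrative assumptions, not the paper's tuned constants:

```python
import numpy as np
from collections import deque

ACTION_DIM = 4   # one RPM command per rotor
BUFFER_LEN = 5   # past actions kept in the observation (assumed length)

class ObservationBuilder:
    """Concatenate the drone's kinematic state with a rolling buffer
    of recent rotor commands (illustrative layout)."""

    def __init__(self):
        self.action_buffer = deque(
            [np.zeros(ACTION_DIM)] * BUFFER_LEN, maxlen=BUFFER_LEN
        )

    def build(self, position, orientation, lin_vel, ang_vel, target):
        state = np.concatenate([
            position,           # current 3D position
            orientation,        # attitude (e.g., roll, pitch, yaw)
            lin_vel,            # linear velocity
            ang_vel,            # angular velocity
            target - position,  # offset to the current waypoint
        ])
        return np.concatenate([state, *self.action_buffer])

    def record_action(self, rpm_action):
        self.action_buffer.append(np.asarray(rpm_action, dtype=float))

def waypoint_reward(pos, waypoint, speed, ang_vel, done,
                    sigma=1.0, hit_radius=0.05, v_peak=0.5):
    """Sum of the reward components described above; all constants
    here are assumptions, not the paper's tuned values."""
    dist_sq = float(np.sum((np.asarray(waypoint) - np.asarray(pos)) ** 2))
    r_dist = np.exp(-dist_sq / sigma)                    # exponential distance decay
    r_hit = 10.0 if dist_sq < hit_radius ** 2 else 0.0   # precision bonus
    r_vel = np.exp(-(speed - v_peak) ** 2)               # peaks at v_peak
    r_smooth = -0.1 * float(np.sum(np.square(ang_vel)))  # erratic-rotation penalty
    r_complete = 100.0 if done else 0.0                  # full-course bonus
    return r_dist + r_hit + r_vel + r_smooth + r_complete
```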
4. Experiments
4.1. Setup
4.1.1. Simulation Setup
4.1.2. Low-Level Policy Training
- Basic control: The initial stage focused on fundamental hover capability, requiring the policy to maintain stable position control at fixed points in space. This established the foundation for all subsequent flight behaviors.
- Structured navigation: Once hovering was mastered, the policy progressed to following predefined circular trajectories, introducing continuous motion and coordinated control across multiple axes.
- Advanced trajectory tracking: The final stage involved tracking arbitrary trajectories, requiring the policy to generalize its learned skills to diverse and complex flight paths.
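The staged progression above can be captured as a simple curriculum schedule. The stage names mirror the three stages described; the promotion thresholds are illustrative assumptions, not the values used in training:

```python
# Three-stage curriculum for the low-level policy (thresholds assumed).
CURRICULUM = [
    {"stage": "hover",     "task": "hold a fixed point",        "promote_at": 0.90},
    {"stage": "circle",    "task": "follow circular paths",     "promote_at": 0.85},
    {"stage": "arbitrary", "task": "track generated waypoints", "promote_at": 0.80},
]

def next_stage(index: int, success_rate: float) -> int:
    """Advance to the next stage once the policy clears the current
    stage's success threshold; the final stage is terminal."""
    if index < len(CURRICULUM) - 1 and success_rate >= CURRICULUM[index]["promote_at"]:
        return index + 1
    return index
```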
4.2. Results
2024-08-25 21:14:22,929 - INFO - Aggregated feedback from multiple critics:
MAJORITY INVALID (0/3) Feedback Summary: The feedback highlights several issues with the drone trajectories, particularly concerning the completeness and shape of the 3-petal rose curve.
**Common Points:**
* **Drone 3’s trajectory is the biggest problem.** All critics agree that Drone 3’s path is incomplete and does not match the expected shape of a petal.
* **The overall shape is incorrect.** The combined trajectories do not form a proper 3-petal rose curve. This is mainly due to Drone 3’s incomplete path.
* **Starting positions are generally good.** There is no consensus on issues with starting positions, except for Drone 3, which doesn’t follow its designated starting point in the second iteration.
**Consensus:**
The consensus is that the drone trajectories are not valid and need significant improvement. The primary focus should be on fixing Drone 3’s path to ensure it traces a complete petal and adjusting the other drones’ paths to achieve the correct overall shape.
4.3. Discussion on Varying Results Among LLMs
5. Ablation Studies
5.1. Contributions of High-Level Planning Modules
- Instructor agents translate the natural language prompt into a detailed set of requirements that guide the generator agent; their value lies less in performance than in efficiency. Audio data are larger in storage size than text, so sharing audio across multiple agent calls is expensive compared with simple text descriptions, and text is more efficient to manipulate than audio. We therefore used an instructor agent so that the audio command is handled only once, during the initial conversion of the audio command to text.
- Critic agents provide feedback to the generator. In Table 2, we compare no critic, one critic, and three critics. We chose three critics for two reasons: redundancy overcomes hallucinations in the feedback evaluation, and an odd number of critics guarantees no ties. Additional critics can be included as long as the total remains odd, but each adds computation cost and overhead; three is the minimum number satisfying both objectives while keeping the framework efficient. In Table 2, three critics significantly outperformed the other configurations with an average success rate of 64.0%, compared with a single critic (56.0%) and no critics (54.5%). Three critics achieved the best performance on most path types, with substantial improvements on complex trajectories such as the cross (100%), helix (100%), and zigzag (90%). A single critic showed modest improvements over no critics (56.0% vs. 54.5%), suggesting that even minimal feedback helps refine trajectories. However, we observed interesting failure modes where no critics performed better, particularly on the triangle (90% vs. 70%), square (80% vs. 60%), and octagon (40% vs. 30%). This suggests that when the generator is already confident in its output, adding critics may introduce confusion and lead the model away from an initially correct trajectory. These findings highlight the trade-off between the benefits of multiple perspectives and the potential for overcomplicated feedback in simpler scenarios.
- Aggregator agents: Similar to the instructor agent, the aggregator agent is for efficiency. We found that with multiple critics, the context window of the generator was quickly exhausted on mostly redundant information from the critics. We also found that the outputs from the critics occasionally differed due to the occasional hallucinations in the LLM/VLM experiments. We used an aggregator agent to provide an unambiguous feedback signal to the generator agent.
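As a concrete sketch of the voting logic that motivates the odd critic count, the consensus over critic validity judgments can be written as a simple majority function (illustrative, not the paper's code):

```python
def consensus(votes: list[bool]) -> bool:
    """Majority vote over critic validity judgments; an odd number of
    critics guarantees the vote can never tie."""
    assert len(votes) % 2 == 1, "use an odd number of critics"
    return sum(votes) > len(votes) // 2

# With three critics, one hallucinated rejection cannot veto two approvals,
# and one hallucinated approval cannot override two rejections.
assert consensus([True, True, False]) is True
assert consensus([False, False, True]) is False
```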
5.2. Timing Analysis
- Examining the Gemini configurations with varying numbers of critics showed a clear trade-off between refinement capability and computational cost. While Gemini without reflection was the fastest, with a total time of 5.45 s, it lacked any refinement mechanisms. Adding critics significantly increased total computation time: one critic (127.72 s), three critics (203.60 s), and five critics (444.36 s). The increase was driven by two factors: (1) reflection time per round grew substantially with more critics (9.23 s for one critic to 59.33 s for five critics) as more feedback had to be processed, and (2) the generation phase became more complex (from 4.24 s to 14.73 s) as the model had to process more comprehensive feedback. Notably, adding more critics did not necessarily reduce the number of rounds needed, with similar average rounds for one and five critics (six rounds) and more for three critics (eight rounds), suggesting diminishing returns from adding critics beyond a certain point.
- Comparing Gemini Flash (small model) with Gemini (large model) in their three-critic configurations revealed the efficiency–capability trade-off. While Gemini Flash had a notably lower average success rate of 50.5% compared to Gemini’s 64.0%, it achieved faster generation (1.73 s vs. 5.21 s) and reflection times (9.23 s vs. 20.78 s). However, it required more reflection rounds on average (nine vs. eight) to achieve satisfactory results, indicating that while individual operations were faster, the model often needed more iterations to converge and still achieved lower performance. This resulted in a lower but comparable total time (94.95 s vs. 203.60 s), suggesting that the smaller model might be preferable when computational resources were constrained and longer convergence times were acceptable, though this came at a significant performance cost (−13.5% success rate).
- GPT-4o demonstrated superior per-iteration performance among the three-critic configurations with the highest average success rate of 76.0%, compared to Gemini (64.0%) and Gemini Flash (50.5%). Despite having higher generation time (8.55 s) and reflection time (31.48 s) compared to both Gemini (5.21 s, 20.78 s) and Gemini Flash (1.73 s, 9.23 s), it required significantly fewer reflection rounds (two vs. eight and nine, respectively). This efficiency in convergence led to the best total time (80.06 s) among configurations with critics, even outperforming the smaller Gemini Flash model (94.95 s). This suggested that GPT-4o’s enhanced capabilities enabled it to generate higher-quality outputs that needed less refinement, making it more efficient overall despite higher per-operation costs. The combination of superior success rate (+12% over Gemini, +25.5% over Gemini Flash) and fastest total computation time underscored GPT-4o’s exceptional performance in both quality and efficiency. These results demonstrated that investing in a stronger base model for the LEVIOSA framework yielded compounding benefits: better generation quality required fewer refinement iterations, ultimately leading to both superior performance and faster convergence.
5.3. Drone Capacity
6. Findings
- Role of critic agents: The inclusion of critic agents and the consensus mechanism significantly enhanced the robustness of the generated trajectories. By providing iterative feedback, the system could correct and refine outputs over several iterations. This approach was particularly effective for complex, multi-agent coordination tasks, where precision and synchronization are critical.
- Model performance: GPT-4o consistently achieved higher success rates across various path types than Gemini and GeminiFlash. For instance, GPT-4o produced a more intricate and recognizable design in the star-shaped path, successfully capturing the underlying intent despite minimal detail in the prompt. In contrast, the trajectories generated by the Gemini models were simpler and less aligned with the complex structure typically associated with a star.
- Complex geometries: The Gemini model struggled significantly with petal flower geometries, as evidenced by lower success rates for the 3-petal, 4-petal, and 5-petal rose curves. Common failure modes included generating the wrong number of petals or failing to assign a drone to each petal, leading to incomplete or incorrect geometries. This suggests the model has difficulty handling complex spatial reasoning tasks required for intricate shapes.
- Impact of model size: There was a noticeable difference in success rates between GeminiFlash and the larger models (Gemini and GPT-4o), illustrating that model size matters in the waypoint path generation task. Both Gemini and GPT-4o, which are larger in terms of parameters, outperformed GeminiFlash in most path types, indicating that larger models may have more capacity to handle the complex spatial reasoning and code synthesis required for generating accurate trajectories.
- Generative iterations: The iterative process involving generation and reflection cycles contributed to incremental improvements in trajectory generation. However, achieving satisfactory results for more complex geometries often required multiple retries up to the maximum allowed, which could be computationally expensive and yield marginal improvements.
- Computational efficiency: timing analysis revealed three key insights about computational trade-offs: (1) adding critics increased computational time; (2) while smaller models like Gemini Flash had faster per-operation times, they required more refinement iterations, leading to longer total execution times; (3) GPT-4o, despite higher per-operation costs, achieved better overall efficiency through fewer refinement iterations, demonstrating that model capability had more impact on total performance than raw operational speed.
7. Conclusions
Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Algorithm Description
Algorithm A1 LEVIOSA framework.

1. High-level planner
Input: Text prompt p or voice audio v, maximum iterations M, number of critics C
Output: Set of waypoint coordinates for N drones
1:  requirements ← Instructor(p or v)                ▷ Instructor VLM agent converts input to requirements
2:  feedback ← ∅                                     ▷ Initialize empty feedback
3:  for i ← 1 to M do                                ▷ Generation–reflection loop
       % Generation phase
4:     code ← Generator(requirements, feedback)      ▷ Generator LLM agent synthesizes Python code
5:     waypoints ← Execute(code)                     ▷ Non-agent function
6:     plot ← Render(waypoints)                      ▷ Non-agent function
       % Reflection phase
7:     for c ← 1 to C do
8:        critique_c ← Critic_c(plot, requirements)  ▷ Critic VLM agents evaluate plot
9:     end for
10:    feedback ← Aggregator(critique_1…C, feedback) ▷ Aggregator LLM agent combines feedback
11:    if MajorityValid(critique_1…C) then           ▷ Non-agent function
12:       break                                      ▷ Stop as good waypoints found
13:    end if
14: end for
15: return waypoints

2. Per-UAV low-level controller                      ▷ Executed in parallel for each of the N drones
Input: Waypoint sequence from the high-level planner, current state s
Output: Sequence of control actions for drone n until the terminal position
16: while not reached terminal position do
17:    a ← π(s, waypoint)                            ▷ Generate control action using RL policy
18:    Execute a and observe next state s′
19:    s ← s′
20: end while
Appendix B
| Path Type | Path Name | Prompt |
|---|---|---|
| Single | Circle | Create a circular trajectory using 2 drones, where each drone traces out one half of the circle. The drones should move in perfect synchronization to form a complete circle. |
| | Hyperbola | Design a hyperbolic path using 2 drones, with each drone tracing one branch of the hyperbola. The drones should maintain symmetry and smoothness in their paths. |
| | 3-Petal rose | Generate a 3-petal rose curve using 3 drones, where each drone is responsible for tracing out one petal. The drones should coordinate to form a seamless rose pattern. |
| | 4-Petal rose | Create a 4-petal rose curve using 4 drones, with each drone tracing one petal. The drones should work together to ensure the rose curve is smooth and continuous. |
| | 5-Petal rose | Design a 5-petal rose curve using 5 drones, where each drone forms one petal. The drones should synchronize their movements to create a harmonious rose shape. |
| | Sine wave | Construct a sine wave pattern using 3 drones, where each drone covers a separate section of the wave. The drones should ensure a continuous and smooth wave formation. |
| | Helix | Draw a helical path using 1 drone, creating a spiral in three-dimensional space. The drone should maintain a consistent radius and pitch throughout the helix. |
| | Double helix | Create a double helix trajectory using 2 drones, with each drone forming one strand of the helix. The drones should maintain parallel paths and synchronized movement. |
| | Triple helix | Generate a triple helix pattern using 3 drones, with each drone forming one strand. The drones should coordinate to maintain uniform spacing and synchronization. |
| | Double conical helix | Design a double conical helix using 2 drones, where each drone traces one conical spiral. The drones should ensure the cones are symmetrical and the paths are smooth. |
| Composite | Star | Generate a star-shaped trajectory using 5 drones. The drones should move in such a way that their combined flight paths trace out a symmetrical star with equal arm lengths. |
| | Zigzag | Create a dynamic zigzag pattern using 3 drones. The drones should move in unison, forming a synchronized zigzag path. Each drone should follow a separate path within the zigzag, ensuring the pattern is evenly spaced and consistent throughout the trajectory. |
| | Heart | Design a geometric, angular heart-shaped path using 2 drones. Each drone should trace one half of the heart, starting from the bottom point and meeting at the top. The heart should have an angular appearance, with both halves perfectly mirroring each other. |
| | Cross | Generate a cross-shaped path using 2 drones. Each drone should be responsible for one arm of the cross. Ensure that the paths are perpendicular to each other and intersect at the center. |
| | Pentagon | Create a pentagon using 5 drones. Each drone should trace one side of the pentagon, with their paths combining to form the shape. |
| | Hexagon | Design a hexagon-shaped path using 3 drones, each responsible for two sides of the hexagon. The drones should work together to form a complete hexagon, ensuring that the drones’ paths connect seamlessly at the vertices to maintain the shape’s integrity. |
| | Triangle | Create an equilateral triangle path using 3 drones. Each drone should trace one side of the triangle, starting from a common point and moving outward to form the triangle. The drones should synchronize their movements to complete the triangle simultaneously. |
| | Square | Generate a square trajectory using 4 drones. Each drone should be responsible for one side of the square, ensuring that the angles at each corner are well-defined. The drones should coordinate their movements to maintain equal side lengths and complete the square simultaneously. |
| | Octagon | Design an octagon-shaped path using 8 drones. Each drone should be responsible for tracing two sides of the octagon. Ensure that the drones’ paths create a symmetric and precise overall shape. |
| | Pyramid | Create a pyramid-shaped path using 4 drones. Each drone should trace one side of the pyramid, starting from the base and converging at the apex. The drones should coordinate their movements to form a symmetrical and well-defined pyramid shape. |
| Path Type | Path Name | Gemini (%) | GeminiFlash (%) | GPT-4o (%) |
|---|---|---|---|---|
| Single | Circle | 90 | 90 | 80 |
| | Hyperbola | 70 | 10 | 10 |
| | 3-Petal rose | 70 | 50 | 90 |
| | 4-Petal rose | 70 | 30 | 100 |
| | 5-Petal rose | 70 | 70 | 100 |
| | Sine wave | 20 | 60 | 60 |
| | Helix | 100 | 90 | 100 |
| | Double helix | 90 | 30 | 80 |
| | Triple helix | 80 | 60 | 100 |
| | Double conical helix | 50 | 0 | 30 |
| Composite | Star | 40 | 40 | 80 |
| | Zigzag | 90 | 60 | 90 |
| | Heart | 10 | 0 | 10 |
| | Cross | 100 | 60 | 100 |
| | Pentagon | 70 | 80 | 90 |
| | Hexagon | 10 | 20 | 80 |
| | Triangle | 70 | 60 | 30 |
| | Square | 60 | 90 | 100 |
| | Octagon | 30 | 40 | 90 |
| | Pyramid | 90 | 70 | 100 |
| Average Success Rate | | 64.0 | 50.5 | 76.0 |
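The per-model figures in the last row are simple arithmetic means over the twenty per-path success rates. A minimal sketch, with the values transcribed from the table in row order (Circle through Pyramid), reproduces them:

```python
# Success rates (%) per path, transcribed from the table above,
# in row order: Circle, Hyperbola, ..., Octagon, Pyramid.
gemini = [90, 70, 70, 70, 70, 20, 100, 90, 80, 50,
          40, 90, 10, 100, 70, 10, 70, 60, 30, 90]
gemini_flash = [90, 10, 50, 30, 70, 60, 90, 30, 60, 0,
                40, 60, 0, 60, 80, 20, 60, 90, 40, 70]
gpt_4o = [80, 10, 90, 100, 100, 60, 100, 80, 100, 30,
          80, 90, 10, 100, 90, 80, 30, 100, 90, 100]

def average_success(rates):
    """Mean success rate over all 20 path types, in percent."""
    return sum(rates) / len(rates)

print(average_success(gemini))        # 64.0
print(average_success(gemini_flash))  # 50.5
print(average_success(gpt_4o))        # 76.0
```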
| Path Type | Path Name | No Critic (%) | One Critic (%) | Three Critics (%) |
|---|---|---|---|---|
| Single | Circle | 90 | 80 | 90 |
| | Hyperbola | 50 | 50 | 70 |
| | 3-Petal rose | 40 | 50 | 70 |
| | 4-Petal rose | 30 | 70 | 70 |
| | 5-Petal rose | 70 | 70 | 70 |
| | Sine wave | 40 | 40 | 20 |
| | Helix | 90 | 90 | 100 |
| | Double helix | 30 | 80 | 90 |
| | Triple helix | 80 | 90 | 80 |
| | Double conical helix | 40 | 20 | 50 |
| Composite | Star | 30 | 40 | 40 |
| | Zigzag | 50 | 40 | 90 |
| | Heart | 0 | 0 | 10 |
| | Cross | 80 | 70 | 100 |
| | Pentagon | 40 | 30 | 70 |
| | Hexagon | 30 | 40 | 10 |
| | Triangle | 90 | 60 | 70 |
| | Square | 80 | 80 | 60 |
| | Octagon | 40 | 30 | 30 |
| | Pyramid | 100 | 90 | 90 |
| Average Success Rate | | 54.5 | 56.0 | 64.0 |
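In the multi-critic configurations, a generated trajectory is accepted or sent back for another refinement round based on a majority vote over the critics' verdicts. A minimal sketch of that consensus step; the `"accept"`/`"revise"` labels and the `majority_vote` helper are illustrative assumptions, not the paper's exact implementation:

```python
from collections import Counter

def majority_vote(verdicts):
    """Return the verdict ('accept' or 'revise') held by most critics.

    `verdicts` holds one label per critic; with an odd number of
    critics (e.g. 3 or 5) a strict majority always exists.
    """
    counts = Counter(verdicts)
    label, _ = counts.most_common(1)[0]
    return label

# Three critics independently judge the rendered trajectory image.
print(majority_vote(["accept", "revise", "accept"]))  # accept
print(majority_vote(["revise", "revise", "accept"]))  # revise
```

An odd critic count avoids ties, which is one reason three-critic panels are a natural choice over two or four.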
| Model | Generation (s) | Reflection (s) | Rounds | Total Time (s) |
|---|---|---|---|---|
| Gemini (no reflection) | 5.45 | – | – | 5.45 |
| Gemini (1 critic) | 4.24 | 9.23 | 6 | 127.72 |
| Gemini (3 critics) | 5.21 | 20.78 | 8 | 203.60 |
| Gemini (5 critics) | 14.73 | 59.33 | 6 | 444.36 |
| GPT-4o (3 critics) | 8.55 | 31.48 | 2 | 80.06 |
| Gemini Flash (3 critics) | 1.73 | 9.38 | 9 | 94.95 |
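Several of the reported totals are consistent with estimating wall-clock time as rounds × (generation + reflection) per round: the estimate matches the 5-critic Gemini and GPT-4o rows exactly, while other rows deviate, presumably due to per-round variance in the averaged timings. A sketch of that estimate, under the stated assumption:

```python
def estimated_total(generation_s, reflection_s, rounds):
    """Estimate total wall-clock time as rounds * (generation + reflection).

    This is an assumed model of the timing breakdown, not the paper's
    measurement procedure; generation_s and reflection_s are per-round.
    """
    return round(rounds * (generation_s + reflection_s), 2)

print(estimated_total(14.73, 59.33, 6))  # 444.36 (matches the 5-critic row)
print(estimated_total(8.55, 31.48, 2))   # 80.06  (matches the GPT-4o row)
print(estimated_total(5.21, 20.78, 8))   # 207.92 (table reports 203.60)
```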
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Aikins, G.; Dao, M.P.; Moukpe, K.J.; Eskridge, T.C.; Nguyen, K.-D. LEVIOSA: Natural Language-Based Uncrewed Aerial Vehicle Trajectory Generation. Electronics 2024, 13, 4508. https://doi.org/10.3390/electronics13224508