Applying Large Language Model to a Control System for Multi-Robot Task Assignment
Figure 1. Level of the MultiBotGPT.
Figure 2. MultiBotGPT performs a task: the operator enters the command "The UAV finds the number 3 and then the car moves to the position of the number 6". The command is passed through the Clue Core to the LLM (GPT-3.5), which returns a response in a fixed format (the format is described by a predefined document read by GPT-3.5 in advance); the Clue Core analyzes and processes the response and sends the corresponding tasks and parameters to the specific robot APIs.
Figure 3. System architecture of MultiBotGPT.
Figure 4. Rule text composition.
Figure 5. Sequential execution of multiple tasks, with an example.
Figure 6. Algorithm fusion in the layer of robot control.
Figure 7. UAV and UGV mapping of commands to functions.
Figure 8. Simulation scenario and robot models: (a) simulation scenario, (b) UAV, (c) UGV.
Figure 9. MultiBotGPT task execution results: (A) mission execution flow of the UAV searching for digital landmarks (searching for the number 6 as an example), (B) execution flow of the UGV reaching a position below the UAV, (C) execution flow of the UGV reaching a numerical landmark (reaching the number 6 as an example).
Figure 10. Success rates of tasks performed with MultiBotGPT and with BERT as the natural language-processing algorithm, respectively. Orange: MultiBotGPT; green: MultiBotBERT.
Figure 11. Using the console to control the UAV and UGV in the simulation scenario.
Figure 12. Experiment results: (a) mean time consumption in the three conditions, (b) mean self-evaluation performance scores in the three conditions, (c) mean mental and physical consumption scores in the three conditions.
Abstract
1. Introduction
2. MultiBotGPT Control System
2.1. System Architecture
- Rules Message Organize: After MultiBotGPT starts, the Clue Core first sets up the question-and-answer rules for GPT-3.5: based on the stored information, the program assembles a complete rule text and sends it to GPT-3.5. GPT-3.5 reads the text, retains the logical relationships described in it, and in all subsequent question-and-answer exchanges replies in the format established by the rules. Assembling the rule text draws on the stored Competency Library, which lists the tasks the robots can perform in the system, the message format each task requires, and some examples. This Competency Library information is incorporated into the rule text so that GPT-3.5 understands what the control system needs it to do.
- Obtaining Operator Commands: This module is the interface exposed to the operator for entering commands; it generates a terminal input box into which commands are typed. It can also be combined with algorithms such as speech recognition to enable voice control commands.
- Sending Question to LLM and Obtaining Response: This module sends the command entered by the operator to GPT-3.5, obtains the response returned by GPT-3.5, and performs initial processing. Denote the operation of this step by $f_{\mathrm{LLM}}(\cdot)$. After the operator enters the command $C$, it is processed and the initial task text of the reply $T_{\mathrm{raw}}$ is obtained as shown in Equation (1): $T_{\mathrm{raw}} = f_{\mathrm{LLM}}(C)$.
- Splitting Tasks and Sending in Order: After obtaining the GPT-3.5 response $T_{\mathrm{raw}}$, this module parses it, which consists primarily of correcting portions of the response that are not output in the desired format, so as to minimize control-system failures caused by formatting issues. Denote the operation of this step by $f_{\mathrm{pre}}(\cdot)$; $T_{\mathrm{raw}}$ becomes the task text $T_{\mathrm{task}}$ after this step, as shown in Equation (2): $T_{\mathrm{task}} = f_{\mathrm{pre}}(T_{\mathrm{raw}})$. The resulting tasks are then sent to the robots one by one, in order. A minimal code sketch of the rule-setup and query flow described in the items above is given directly below.
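To make the Clue Core steps above concrete, the following is a minimal Python sketch of the rule-text composition and the GPT-3.5 query. It is not the authors' implementation: the Competency Library entries, the task-line format, and the helper names (build_rule_text, ask_llm) are illustrative assumptions, and the request uses the standard OpenAI chat-completions client.

```python
from openai import OpenAI  # assumes the official OpenAI Python client (v1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical Competency Library: the tasks the robots can perform, the message
# format each task requires, and one example per task (cf. Section 2.1).
COMPETENCY_LIBRARY = [
    {"robot": "UAV", "task": "search_number",
     "format": "UAV;search_number;<digit>", "example": "UAV;search_number;3"},
    {"robot": "UGV", "task": "goto_number",
     "format": "UGV;goto_number;<digit>", "example": "UGV;goto_number;6"},
    {"robot": "UGV", "task": "goto_uav",
     "format": "UGV;goto_uav;-", "example": "UGV;goto_uav;-"},
]


def build_rule_text(library):
    """Rules Message Organize: fold the Competency Library into one rule text
    that tells GPT-3.5 how every answer must be formatted."""
    lines = [
        "You are the task planner of a multi-robot control system.",
        "Answer ONLY with task lines, one per line, in the exact formats below:",
    ]
    for entry in library:
        lines.append(f"- {entry['robot']} {entry['task']}: format '{entry['format']}', "
                     f"e.g. '{entry['example']}'")
    return "\n".join(lines)


def ask_llm(command, rule_text, model="gpt-3.5-turbo"):
    """f_LLM(C): send the operator command C and return the raw task text T_raw."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": rule_text},
            {"role": "user", "content": command},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    rules = build_rule_text(COMPETENCY_LIBRARY)
    t_raw = ask_llm("The UAV finds the number 3 and then the car moves to "
                    "the position of the number 6", rules)
    print(t_raw)  # expected to contain lines such as "UAV;search_number;3"
```

Keeping the Competency Library as plain data makes the rule text easy to extend when a new robot capability is added, without changing the query logic.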
- Basic Control: This module defines the underlying basic control of the robots, including basic motions such as forward, backward, and, for the UAV, up and down, and provides an interface for the Execution Mission module to call.
- Execution Mission: This module accomplishes broader tasks by using the interfaces provided by Basic Control. For instance, by coordinating basic robot motions, it enables the Unmanned Ground Vehicle (UGV) to navigate to specific coordinates, the Unmanned Aerial Vehicle (UAV) to reach designated locations, and tasks such as automatic cruising for the UAV. The tasks this module can perform match those listed in the Robotics Competency Library stored in the Clue Core, and the module also includes a program that parses the task codes sent by the Clue Core. After a command is received from the Clue Core, the task parameters are parsed and the task is executed accordingly.
- Return Execution Result: Regardless of whether the task execution succeeds or fails, a message is returned to the Clue Core reporting the result so that further operations can be carried out.
- Save Information in Shared Libraries: Based on the topic messaging mechanism of ROS, the robots are able to obtain information about each other. For example, when the UAV finds a ground sign, it saves the sign's number and coordinates, which speeds up later searches for the same sign and allows the UAV to guide the UGV to that location. A minimal sketch of the robot-side modules described above follows this list.
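The robot-side modules can be pictured with a similarly minimal rospy sketch. The topic names (/ugv/task, /ugv/task_result), the task-code format, and the parameter-server lookup that stands in for the topic-based shared library are assumptions rather than the paper's actual interfaces.

```python
import rospy
from std_msgs.msg import String


def goto_position(x, y):
    """Basic Control stand-in: the real system would issue the low-level motion
    commands that drive the UGV to the target coordinates."""
    rospy.loginfo("UGV navigating to (%.2f, %.2f)", x, y)
    return True  # placeholder result


class ExecutionMission:
    """Execution Mission module (sketch): parse task codes sent by the Clue Core,
    run them through Basic Control, and publish the result (Return Execution Result)."""

    def __init__(self):
        self.result_pub = rospy.Publisher("/ugv/task_result", String, queue_size=1)
        rospy.Subscriber("/ugv/task", String, self.on_task)

    def on_task(self, msg):
        # Assumed task-code format: "<robot>;<task>;<param>", e.g. "UGV;goto_number;6"
        robot, task, param = msg.data.split(";")
        ok = False
        if task == "goto_number":
            # Stand-in for the shared library: coordinates the UAV saved earlier
            # for landmark <param> (the paper shares them via ROS topics).
            x, y = rospy.get_param("/shared/landmark_" + param, [0.0, 0.0])
            ok = goto_position(x, y)
        self.result_pub.publish(String(data="%s:%s" % (task, "success" if ok else "failure")))


if __name__ == "__main__":
    rospy.init_node("ugv_execution_mission")
    ExecutionMission()
    rospy.spin()
```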
2.2. MultiBotGPT Key Algorithm Realization
2.2.1. Layer of Core Interaction
- Rules message organize
- Preprocessing of responses from GPT-3.5
Algorithm 1: Preprocessing of Responses from GPT-3.5
Input: original command text (Command)
Output: pre-processed mission text
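The body of Algorithm 1 is not reproduced above, so the following Python sketch only illustrates the kind of preprocessing it describes: keep the lines of the GPT-3.5 reply that match the expected task format, repair common deviations, and return the tasks in order. The task-line pattern and the fix-ups are assumptions, not the authors' exact rules.

```python
import re

# Assumed task-line format from the rule text: "<ROBOT>;<task>;<param>"
TASK_PATTERN = re.compile(r"^(UAV|UGV)\s*;\s*([a-z_]+)\s*;\s*(\S+)$", re.IGNORECASE)


def preprocess_response(t_raw):
    """Algorithm 1 (sketch): turn the raw GPT-3.5 reply into an ordered list of
    well-formed task strings, repairing or discarding malformed lines."""
    tasks = []
    for line in t_raw.splitlines():
        line = line.strip().strip("`").rstrip(".")   # drop stray fences / trailing periods
        line = re.sub(r"^\d+[\).]\s*", "", line)      # drop list numbering such as "1) "
        match = TASK_PATTERN.match(line)
        if match:
            robot, task, param = match.groups()
            tasks.append(f"{robot.upper()};{task.lower()};{param}")
    return tasks  # sent to the robots one by one, in order


# Example: a slightly off-format reply is still recovered.
print(preprocess_response("1) uav ; search_number ; 3\n2) UGV;goto_number;6."))
# -> ['UAV;search_number;3', 'UGV;goto_number;6']
```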
2.2.2. Layer of Robot Control
- Gmapping: Gmapping is a real-time, robust SLAM algorithm known for its simple implementation, flexibility, and adaptability to dynamic environments. We used the Gmapping algorithm in conjunction with a UGV-mounted LiDAR to perform base mapping of the simulation environment and generate a grid map that can be used for path planning.
- Theta* path planning: The Theta* algorithm is an improved path-planning algorithm that adds smoothness and flexibility to A* by allowing the path to bend at any angle, thus generating more natural and intuitive paths. In our previous research, the Artificial Potential Field (APF) method was introduced into the Theta* algorithm to form the Theta*-APF algorithm [22], which exhibits superior computational efficiency and path safety. We introduce the Theta* algorithm into the control system, together with the grid map generated by the Gmapping algorithm, to realize the path-planning capability of the UGV. A brief sketch of the any-angle relaxation that distinguishes Theta* from A* is given at the end of this subsection.
- YOLOv7 image recognition: YOLOv7 is one of the newest target-detection algorithms in the YOLO series and stands out for its excellent real-time performance and high-precision detection capability. We introduce the YOLOv7 algorithm into the control system to realize the UAV's recognition of ground targets (the digital signs on the ground). During the UAV's flight, the control system runs the YOLOv7 algorithm in a separate thread to perform image recognition and shares the recognition results with ROS in real time for the robots to use. A sketch of this threading-and-publishing pattern is also given at the end of this subsection.
- UGV
- UAV
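As a small illustration of the any-angle idea behind Theta* (and not of the Theta*-APF method from [22]), the sketch below shows the characteristic relaxation step: a node may connect directly to its parent's parent whenever the straight line between them crosses no occupied cell of the grid map. The grid encoding and helper names are assumptions.

```python
def line_of_sight(grid, a, b):
    """True if the straight segment from cell a to cell b crosses no occupied cell
    (simple sampling version; Theta* implementations often use a Bresenham-style check)."""
    (x0, y0), (x1, y1) = a, b
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        if grid[y][x]:          # nonzero = occupied in this assumed grid encoding
            return False
    return True


def relax(grid, parent, g, current, neighbor, dist):
    """Theta*'s key difference from A*: prefer the direct edge grandparent->neighbor
    when it is visible, so the path can bend at any angle instead of only 45/90 degrees."""
    grandparent = parent[current]
    if grandparent is not None and line_of_sight(grid, grandparent, neighbor):
        candidate = g[grandparent] + dist(grandparent, neighbor)   # path 2 in Theta*
        if candidate < g.get(neighbor, float("inf")):
            g[neighbor], parent[neighbor] = candidate, grandparent
            return True
    candidate = g[current] + dist(current, neighbor)               # path 1 (plain A*)
    if candidate < g.get(neighbor, float("inf")):
        g[neighbor], parent[neighbor] = candidate, current
        return True
    return False
```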
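The threading-and-publishing pattern mentioned for YOLOv7 can be sketched as follows. The detector call, camera interface, topic name, and message layout are placeholders, not the actual interfaces of the authors' detection node.

```python
import threading

import rospy
from std_msgs.msg import String


def run_yolov7(image):
    """Placeholder for the real YOLOv7 inference call; returns (label, confidence) pairs."""
    return [("6", 0.93)]


class DummyCamera:
    """Stand-in for the UAV's downward-facing camera feed."""
    def read(self):
        return None  # the real system would return the latest image frame


class DetectionThread(threading.Thread):
    """Runs detection continuously in its own thread, as described in Section 2.2.2,
    and shares the results over ROS so the UAV/UGV nodes can use them."""

    def __init__(self, camera):
        super().__init__(daemon=True)
        self.camera = camera
        self.pub = rospy.Publisher("/uav/detections", String, queue_size=1)

    def run(self):
        rate = rospy.Rate(10)                      # ~10 Hz detection loop
        while not rospy.is_shutdown():
            frame = self.camera.read()
            detections = run_yolov7(frame)
            text = ";".join(f"{label}:{conf:.2f}" for label, conf in detections)
            self.pub.publish(String(data=text))
            rate.sleep()


if __name__ == "__main__":
    rospy.init_node("uav_yolov7_detection")
    DetectionThread(DummyCamera()).start()
    rospy.spin()
```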
3. Experiments and Results Presentation
3.1. Simulation Experiment
3.2. Comparison Experiment Between MultiBotGPT and Human Operation
3.2.1. Experimental Design
3.2.2. Presentation and Analysis of Experimental Results
4. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://hayate-lab.com/wp-content/uploads/2023/05/43372bfa750340059ad87ac8e538c53b.pdf (accessed on 10 October 2024).
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; et al. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Brohan, A.; Brown, N.; Carbajal, J.; Chebotar, Y.; Chen, X.; Choromanski, K.; Ding, T.; Driess, D.; Dubey, A.; Finn, C.; et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. arXiv 2023, arXiv:2307.15818. [Google Scholar]
- Bunk, T.; Varshneya, D.; Vlasov, V.; Nichol, A. Diet: Lightweight language understanding for dialogue systems. arXiv 2020, arXiv:2004.09936. [Google Scholar]
- Zhang, Y.; Jin, R.; Zhou, Z.H. Understanding bag-of-words model: A statistical framework. Int. J. Mach. Learn. Cybern. 2010, 1, 43–52. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Alzubi, J.A.; Jain, R.; Singh, A.; Parwekar, P.; Gupta, M. COBERT: COVID-19 question answering system using BERT. Arab. J. Sci. Eng. 2023, 48, 11003–11013. [Google Scholar] [CrossRef] [PubMed]
- Xu, H.; Liu, B.; Shu, L.; Yu, P.S. BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv 2019, arXiv:1904.02232. [Google Scholar]
- Alghanmi, I.; Anke, L.E.; Schockaert, S. Combining BERT with static word embeddings for categorizing social media. In Proceedings of the Sixth Workshop on Noisy User-Generated Text (W-NUT 2020), Online, 19 November 2020; pp. 28–33. [Google Scholar]
- Ye, J.; Chen, X.; Xu, N.; Zu, C.; Shao, Z.; Liu, S.; Cui, Y.; Zhou, Z.; Gong, C.; Shen, Y.; et al. A comprehensive capability analysis of gpt-3 and gpt-3.5 series models. arXiv 2023, arXiv:2303.10420. [Google Scholar]
- Du, Z.; Qian, Y.; Liu, X.; Ding, M.; Qiu, J.; Yang, Z.; Tang, J. Glm: General language model pretraining with autoregressive blank infilling. arXiv 2021, arXiv:2103.10360. [Google Scholar]
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609. [Google Scholar]
- Team, G.; Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupatiraju, S.; Pathak, S.; Sifre, L.; Rivière, M.; Kale, M.S.; Love, J.; et al. Gemma: Open models based on gemini research and technology. arXiv 2024, arXiv:2403.08295. [Google Scholar]
- Huang, W.; Wang, C.; Zhang, R.; Li, Y.; Wu, J.; Fei-Fei, L. Voxposer: Composable 3d value maps for robotic manipulation with language models. arXiv 2023, arXiv:2307.05973. [Google Scholar]
- Chalvatzaki, G.; Younes, A.; Nandha, D.; Le, A.T.; Ribeiro, L.F.; Gurevych, I. Learning to reason over scene graphs: A case study of finetuning GPT-2 into a robot language model for grounded task planning. Front. Robot. AI 2023, 10, 1221739. [Google Scholar] [CrossRef] [PubMed]
- Ahn, M.; Brohan, A.; Brown, N.; Chebotar, Y.; Cortes, O.; David, B.; Finn, C.; Fu, C.; Gopalakrishnan, K.; Hausman, K.; et al. Do as i can, not as i say: Grounding language in robotic affordances. arXiv 2022, arXiv:2204.01691. [Google Scholar]
- Zhao, C.; Yuan, S.; Jiang, C.; Cai, J.; Yu, H.; Wang, M.Y.; Chen, Q. Erra: An embodied representation and reasoning architecture for long-horizon language-conditioned manipulation tasks. IEEE Robot. Autom. Lett. 2023, 8, 3230–3237. [Google Scholar] [CrossRef]
- Tang, C.; Huang, D.; Ge, W.; Liu, W.; Zhang, H. Graspgpt: Leveraging semantic knowledge from a large language model for task-oriented grasping. IEEE Robot. Autom. Lett. 2023, 8, 7551–7558. [Google Scholar] [CrossRef]
- Ding, Y.; Zhang, X.; Paxton, C.; Zhang, S. Task and motion planning with large language models for object rearrangement. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2086–2092. [Google Scholar]
- Zhao, W.; Li, L.; Wang, Y.; Zhan, H.; Fu, Y.; Song, Y. Research on A Global Path-Planning Algorithm for Unmanned Aerial Vehicle Swarm in Three-Dimensional Space Based on Theta*–Artificial Potential Field Method. Drones 2024, 8, 125. [Google Scholar] [CrossRef]
| Role | Output |
|---|---|
| Mission | UAV searches for number 1, then UGV reaches below UAV. |
| Human Operator | UAV looking for where number 5 is. |
| MultiBotGPT | Obtained the task, executing. |
| MultiBotGPT | Searching for target with coordinates (−4.92, −6.84). |
| Classification | Participant Control | Mix Control | MultiBotGPT Control |
|---|---|---|---|
| Time consumption | 23.8 s | 29.6 s | 18.2 s |
| Score of self-evaluation performance ¹ | 6.3 | 7.9 | 8.7 |
| Score of mental and physical consumption ² | 7.3 | 4.8 | 1.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).