short-paper

WiP: A Solution for Reducing MLLM-Based Agent Interaction Overhead

Authors:

Ming Fu

EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models

Pages 16 - 17

https://doi.org/10.1145/3662006.3662062

Published: 11 June 2024

Abstract

Current multimodal LLM (MLLM)-based mobile agents raise concerns over high inference time and cost. We propose to address these issues by building a lightweight UI Transition Graph (UTG) and executing automated tasks locally. Specifically, we build a lightweight HTML-based UTG for both system-level and third-party applications, which avoids heavy computational overhead and laborious effort. We then simplify the interaction phase with the LLM: once the LLM derives a target option, a shortest-path search over the UTG is performed locally to reach it. Small-scale experiments demonstrate the benefits of our method.
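The abstract does not describe the implementation, so the following Python sketch only illustrates the general idea under our own assumptions. Screens are graph nodes and UI actions are edges. The LLM sees one compact HTML list of candidate target screens and returns a single target. The route to that target is then computed locally, with no further LLM calls. The class `UTG`, its methods `add_transition`, `options_html` and `shortest_path`, the HTML format, and the example screens are all hypothetical, not the author's code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from html import escape


@dataclass
class UTG:
    """Hypothetical UI Transition Graph: each node is a screen and each edge is
    a UI action (e.g. a tap) that moves the app from one screen to another."""

    # screen id -> list of (action label, destination screen id)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_transition(self, src: str, action: str, dst: str) -> None:
        self.edges.setdefault(src, []).append((action, dst))
        self.edges.setdefault(dst, [])  # make sure the destination is a node

    def options_html(self) -> str:
        """Compact HTML list of candidate target screens, sent to the LLM in a
        single prompt (this format is an assumption, not the paper's)."""
        items = "".join(
            f'<li id="{escape(s)}">{escape(s)}</li>' for s in sorted(self.edges)
        )
        return f"<ul>{items}</ul>"

    def shortest_path(self, start: str, target: str) -> list[str] | None:
        """Breadth-first search over unweighted edges: returns the actions of a
        fewest-step route from start to target, or None if it is unreachable."""
        if start == target:
            return []
        parent: dict[str, tuple[str, str]] = {}  # screen -> (previous screen, action)
        seen = {start}
        queue = deque([start])
        while queue:
            screen = queue.popleft()
            for action, nxt in self.edges.get(screen, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                parent[nxt] = (screen, action)
                if nxt == target:
                    # Walk parent links back to start, then reverse the actions.
                    actions: list[str] = []
                    node = target
                    while node != start:
                        node, act = parent[node]
                        actions.append(act)
                    return actions[::-1]
                queue.append(nxt)
        return None


# Example screens of a hypothetical phone UI.
utg = UTG()
utg.add_transition("home", "tap Settings", "settings")
utg.add_transition("settings", "tap Network & internet", "network")
utg.add_transition("network", "tap Wi-Fi", "wifi")
utg.add_transition("home", "tap Clock", "clock")
utg.add_transition("clock", "tap Alarm", "alarm")

prompt = utg.options_html()  # the only UI description the LLM would see
target = "wifi"  # stand-in for the option the LLM returns; no LLM is called here
print(utg.shortest_path("home", target))
# ['tap Settings', 'tap Network & internet', 'tap Wi-Fi']
```

Breadth-first search is used here because, with unweighted edges, it returns a route with the fewest UI actions. If transitions carried costs (for example, measured latency), Dijkstra's algorithm would be the natural substitute.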

Index Terms

1. Computing methodologies
  1. Artificial intelligence
2. Human-centered computing
  1. Ubiquitous and mobile computing

Information

Published In

EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models

June 2024

44 pages

ISBN:9798400706639

DOI:10.1145/3662006

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

In-Cooperation

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

MOBISYS '24

Sponsor:

SIGMOBILE

MOBISYS '24: The 22nd Annual International Conference on Mobile Systems, Applications and Services

June 3 - 7, 2024

Tokyo, Minato-ku, Japan

Bibliometrics

Total Citations: 0
Total Downloads: 106
Downloads (last 12 months): 106
Downloads (last 6 weeks): 13

Reflects downloads up to 22 Nov 2024
