DiLogics: Creating Web Automation Programs with Diverse Logics

Research article · Published: 29 October 2023 · DOI: 10.1145/3586183.3606822

Abstract

Knowledge workers frequently encounter repetitive web data entry tasks, such as updating records or placing orders. Web automation increases productivity, but accurately translating tasks into web actions and extending them to new specifications is challenging. Existing tools can automate tasks that perform the same logical trace of UI actions (e.g., entering text in each field in order), but they do not support tasks that require different executions depending on varied input conditions. We present DiLogics, a programming-by-demonstration system that uses NLP to assist users in creating web automation programs that handle diverse specifications. DiLogics first semantically segments input data into structured task steps. By recording user demonstrations for each step, DiLogics generalizes the web macros to novel but semantically similar task requirements. Our evaluation showed that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions. DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs that satisfy diverse specifications.
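
To make the idea of generalizing a demonstrated macro to "semantically similar" steps concrete, here is a minimal sketch. It is not the DiLogics implementation: the bag-of-words `embed` function is a toy stand-in for whatever sentence embedding a real system would use, and the step texts, macro names, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_step(step, demos, threshold=0.3):
    """Return the macro recorded for the most similar demonstrated step.

    Returns None when no demonstrated step is similar enough, which is
    the point where a system would ask the user for a new demonstration.
    """
    best_name, best_score = None, 0.0
    for demo_text, macro_name in demos.items():
        score = cosine(embed(step), embed(demo_text))
        if score > best_score:
            best_name, best_score = macro_name, score
    return best_name if best_score >= threshold else None


# Hypothetical demonstrated steps, each mapped to a recorded macro name.
demos = {
    "add extra cheese": "click_topping_checkbox",
    "deliver to the front door": "fill_delivery_note",
}

print(match_step("add extra bacon", demos))
print(match_step("leave it at the front door", demos))
print(match_step("pay with gift card", demos))
```

In this sketch, a new step that shares wording with a demonstrated step reuses that step's macro, while an unrelated step falls below the threshold and yields `None`. A word-overlap measure misses paraphrases that share no vocabulary, which is why a learned sentence embedding would be the natural replacement for `embed`.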





Published In

UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
October 2023
1825 pages
ISBN: 9798400701320
DOI: 10.1145/3586183
Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. PBD
2. Web automation
3. neurosymbolic programming

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

UIST '23

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%
