3.1 Insights from AutoDOViz
From exploratory interviews, user studies, and reflections on previous work in AutoDOViz [69], we derived further features and suggestions to drive the system design of AutoRL X. Furthermore, since the AutoDO [41] engine is not easily accessible as open-source software outside the IBM network for industry and clients, we propose AutoRL X as an alternative option to visualize and interact with AutoRL on other engines. In AutoDOViz, we introduced an ensemble of visualizations for RL analytics, such as state space visualizations, action space visualizations, policy and value function visualizations, training progress and convergence, and agent-environment interaction dynamics. We also proposed a set of interactive tools tailored for RL visualization: drag-and-drop features let users input and then customize their own RL environments; zooming in the performance charts focuses on particular areas of the state space or time intervals; real-time feedback supports hyperparameter tuning and algorithm adjustments; and line charts compare different RL algorithms (agents) and their configurations side by side. The principles and requirements behind the design decisions of AutoDOViz were derived from exploratory semi-structured interviews with two groups. First, we interviewed DO practitioners whose roles, titles, and personas ranged from business representatives, business analysts, data engineers, data scientists, quantitative analysts, developers, IT specialists, and salespeople to optimization experts. A second round of remote video interviews was conducted with another user group: eight business consultants from domains including the agriculture, oil, automotive, government, manufacturing, and retail industries. The potential target end-users for AutoDOViz were identified to be data scientists. With AutoRL X, however, we aim to also enable domain experts from non-computer-science fields who need to solve optimization problems. Insights from these semi-structured interviews led to nine design requirements for AutoDOViz for developing human-centered automation for DO using RL.
These requirements include (1) creation of generic templates that match common categories of DO problems, (2) visual tools for categorizing DO challenges, (3) fostering stakeholder collaboration, (4) defining user skills and goals, (5) supporting the complete workflow within a unified framework, (6) enhancing trust in automated solutions, (7) demystifying RL training for non-experts, (8) organizing templates by industry for easy access, and (9) offering templates for widespread business issues across various sectors. In our AutoDOViz user interface, a dashboard provides users with straightforward access to three core entities: environments (gyms), engine configurations, and executed jobs (
Figure 3). This structured layout enables users, including business domain stakeholders, to trigger executions and access high-level visualizations of their RL experiments without delving into the technicalities of gym implementations. Moreover, AutoDOViz incorporates a configuration wizard that simplifies the complex process of setting up RL agents and their hyperparameters, and it implements two further types of visualizations, transition matrices and trajectory networks, which present behavioral information about the agent to give users detailed insights and increased confidence. The interface displays a list of selectable RL agents and, for each one, a detailed configuration panel that allows users to adjust hyperparameters, providing options for types, possible values for discrete parameters, ranges for continuous ones, and default values. Further, AutoDOViz applies tutoring interface strategies in modelling the gym, where the composer leads users through a series of decisions.
3.2 Requirements
Our main priority in developing an open-source version was to preserve the functionality we offered users in the proprietary software. We further aim to incorporate additional findings and qualitative feedback from the user studies, which served as a baseline from which we informally derived our requirements. AutoDOViz [
69] Section 5.2.4 describes participants’ post-survey reflections, likes and dislikes, and findings of the post-study questionnaire. The post-study questionnaire consisted of 14 questions, including 11 5-point Likert agreement scale questions. The user study was conducted with 13 participants who were encouraged to “think aloud” as they followed through with user study tasks while working in AutoDOViz’s
user interface (UI).
In the following section, we present the requirements that guide the development of AutoRL X as a more refined and user-centric follow-up system. First, participants of the AutoDOViz study felt that the user experience on small screens could be improved, for instance by reducing scrolling in the composer screen; the system should therefore be optimized for various devices, including tablets and mobile phones (
R1). The agent listing screen was also identified by the users as an area for improvement. Enhancing its layout, functionality, and filter options could provide a smoother experience (
R2). One participant expressed a need for more understandable visualizations. Addressing this could involve using tooltips, legends, and contextual guides to help users decipher visual data (
R3). Suggestions were made to incorporate time sliders to replay real-time feedback on agent progress visualizations. This feature would allow users to rewind, pause, and analyze agents’ actions over time (
R4). For those less familiar with the tool, there is a need for additional on-screen explanations, tutorials, or a help section to guide them (
R5). Given that one user expressed interest in collaborating using AutoDOViz, introducing features that enable collaboration, such as shared views, commenting, and real-time edits, could be beneficial (
R6). Several participants showed interest in integrating AutoDOViz with their existing toolkits. Developing plugins or APIs to facilitate integration with popular data science and ML tools could enhance its adoption (
R7). Based on feedback, while users are keen on using pre-existing templates, there is a hesitance in contributing due to confidentiality concerns. A potential solution is to provide more generic templates or allow for anonymized sharing (
R8). Recognizing that preferences for working in shared vs. custom environments are highly use-case dependent, the system could offer more granular control over environment settings, with attention to security, privacy, and cost (
R9). While the UI was appreciated for its familiarity, maintaining consistency with popular data science software can ensure users find the platform intuitive (
R10). Since the UI successfully allowed data scientists to learn about DO tasks quickly, adding more educational tools, walkthroughs, or interactive demos might enhance user understanding (
R11). Emphasizing transparency, especially on metrics, was claimed essential by user study participants (
R12).
Table 1 lists all the requirements we could identify.
3.3 AutoRL X Architecture
The architectural schema of the AutoRL X system is depicted in
Figure 4. Informed by the identified requirements, the design closely mirrors that of our proprietary AutoDOViz platform. Similarly, the system manages three entities for the user of AutoRL X: gyms (or environments), engine configurations and agents, and resulting runs (or jobs). The system architecture is structured into the following three principal components:
Backend. The backend of our application uses an open-source AutoRL engine, ARLO [
47], to facilitate automatic computation of RL pipelines. ARLO handles OpenAI Gyms [
6], MuJoCo [
60]
three-dimensional (3D) environments, and leverages agent implementations from Mushroom RL [
13]. It is suitable for diverse research and development scenarios.
Figure 5 shows eight ARLO [
47] models offered through our UI. While we focus on online RL scenarios throughout this work, next to DQN, PPO, SAC, DDPG, and Gradient of a Partially Observable Markov Decision Process, the ARLO framework also features FQI, DoubleFQI, and LSPI for offline scenarios. ARLO further provides different tuner strategies that users can choose from in our interface, for example a Genetic Tuner, which evolves a population of model configurations by mutation and selection to optimize hyperparameters for performance on a given evaluation metric. We also provide access to the Optuna Tuner, which performs HPO by searching through a predefined space and evaluating model performance; it uses advanced algorithms to determine the best set of parameters, with features like trial pruning and parallel execution to speed up the search. From a dropdown menu in the UI, the user can also choose different evaluation metrics for the RL pipeline. The discounted reward evaluates the average of cumulative rewards received over episodes, adjusted by a discount factor (gamma) to account for the time value of rewards. In contrast, the temporal difference error calculates the average squared deviation between predicted and actual rewards in subsequent states, reflecting the accuracy of the value function. Lastly, the time series rolling average discounted reward tracks a rolling average of these discounted returns over time.
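As a concrete illustration, these metrics can be sketched in a few lines of Python; the function names and signatures below are our own simplifications and do not reflect ARLO's actual API.

```python
# Illustrative sketch of the three evaluation metrics; names and
# signatures are our own, not ARLO's actual API.

def discounted_reward(rewards, gamma=0.99):
    """Cumulative reward of one episode, discounted by gamma."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def mean_discounted_reward(episodes, gamma=0.99):
    """Average discounted return over a batch of episodes."""
    return sum(discounted_reward(ep, gamma) for ep in episodes) / len(episodes)

def td_error(values, rewards, next_values, gamma=0.99):
    """Mean squared temporal-difference error of a value estimate."""
    deltas = [(r + gamma * nv - v) ** 2
              for v, r, nv in zip(values, rewards, next_values)]
    return sum(deltas) / len(deltas)

def rolling_discounted_reward(episodes, window=3, gamma=0.99):
    """Rolling average of discounted returns over a window of episodes."""
    returns = [discounted_reward(ep, gamma) for ep in episodes]
    return [sum(returns[max(0, i - window + 1):i + 1])
            / len(returns[max(0, i - window + 1):i + 1])
            for i in range(len(returns))]
```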
Further enhancing the backend, our system is designed with extensibility in mind (
R7). It is not limited to the proposed ARLO framework [
47]; the architecture allows for the integration of alternative AutoRL engines. This is achieved through AutoRL X's robust logging mechanism, which records run metadata and streams model logs into a database, ensuring a structured and retrievable data management process. Additional AutoRL frameworks can be integrated simply by providing a job execution script (Python) that communicates with the REST API provided by AutoRL X.
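A minimal job execution script for such an integration might look as follows; the endpoint path and payload fields are illustrative assumptions, not the actual AutoRL X API.

```python
# Hypothetical job execution script for plugging a third-party AutoRL
# engine into AutoRL X; endpoint path and payload fields are assumptions.
import json
from urllib import request

API_BASE = "http://localhost:8000"  # assumed local AutoRL X instance

def build_log_payload(run_id, status, logs):
    """Assemble the JSON body for a (hypothetical) run-log endpoint."""
    return json.dumps({"run_id": run_id, "status": status, "logs": logs})

def report_run(run_id, status, logs):
    """POST a log entry so AutoRL X can store run metadata and model logs."""
    req = request.Request(
        f"{API_BASE}/runs/{run_id}/logs",
        data=build_log_payload(run_id, status, logs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # called by the engine, e.g. per epoch
        return resp.status
```

An engine wrapper would call `report_run` at checkpoints of its training loop, letting AutoRL X pick the logs up from the database for visualization.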
API. The REST API (see
Figure 6) built with FastAPI
offers execution of RL gyms in ARLO and integrates with the SQL database via an internal service layer. One part of the API is responsible for managing the database operations related to gyms, configurations, runs, and models. These endpoints expose different HTTP methods to create or send data to the server. The services handle database connections securely and perform queries and updates efficiently to ensure lazy loading on the front-end. For example, to get information about a model run's episode trajectories, we offer a get_trajectory method in the runs endpoint, which retrieves only the currently selected step sequence requested by the user in the UI. The request body must be in JSON format.
Figure 6 illustrates the gyms endpoint, which is essential for operations that involve adding new gym entities or updating existing ones within the system. Each gym instance has multiple attributes that need to be defined, and it supports customization through its various parameters and modules. Our API further comprises other components, such as a logger with zlib compression support for fast write and read operations, enabling an overall comprehensive and scalable backend solution.
Frontend. Our frontend framework of choice is Svelte.js [
53], a modern and friendly framework [
4] with
compiler-ensured state reactivity from the ground up. Svelte stands out for its innovative approach to building user interfaces, leading to more efficient updates and cleaner code. This advantage lies in Svelte's departure from the virtual document object model (DOM) paradigm, offering direct manipulation of the DOM and, thus, faster performance and a more straightforward development experience. For responsive and user-friendly interface components, we add the Carbon Design Framework. The Carbon Components Svelte library implements the Carbon Design System, an open-source design system developed by IBM that emphasizes reuse, consistency, and extensibility. This design is tailored for complex, enterprise-level interfaces to accelerate the development process while maintaining a high standard of design quality and user experience. By utilizing this library, we were able to maintain a consistent look and feel across our application. Our decision was informed by insights gathered from interviews with data scientists using AutoDOViz [
69], and many were already acquainted with the Carbon UI library from other software tools they use in their daily work. The familiarity of the UI framework contributed positively to the user experience. Participants noted that this consistency aided them in efficiently performing their tasks, fulfilling the user interface consistency requirement
R10 highlighted as a priority for our system’s design.
As a result, as shown in
Figure 7, users of AutoRL X can now select from automatically refined agents in a pipeline run according to
R2, with novel filter options to search for specific phases such as the learn or test phase, select a certain epoch, and filter through iterations and actions to examine agent behaviors more closely. In line with
R3 and
R4, we have added tooltips and time sliders that make it easier for users to retrieve information from the line charts and understand the visual data. The user can also see which agents are still running or have already finished. Next to the filtering options, we also added the possibility to view agent logs, visited states, and hyperparameters in more detail, as demanded in
R12, improving transparency for the user. However, as noted in [
8,
15,
52], capturing user trust can be challenging, which we also point out as a limitation for further discussion.
In response to user feedback from AutoDOViz, where users expressed confusion regarding the navigation of categories in the gym template catalog, we present an improved design prototype in AutoRL X, as seen in
Figure 12. Specifically, in AutoDOViz, we categorized gyms based on the
North American Industry Classification System (NAICS) into various business problem categories. Compared to AutoDOViz, AutoRL X introduces a more user-friendly navigation system that guides users through the different gym categories via breadcrumbs, streamlining the user experience and facilitating easier exploration. Similar to AutoDOViz, users click through the hierarchy via tiles on each level. At the leaf level, the gym catalog then shows the list of available templates; an example is shown in
Figure 13. For requirement
R1, we tested our platform on multiple devices: a tablet, a smartphone, and even the mixed reality browser of the Meta Quest 3, connecting via the local network. Overall, we addressed 8 of the 12 requirements from our full list in this work. The remaining requirements
R5, R6, R9, and
R11 are mapped to GitHub issues in our open-source repository (
https://github.com/lorifranke/autorlx).
Extensibility. We provide four points of extensibility: (1) We build on the extensible OpenAI Gym API [
6]. However, OpenAI Gym provides only a preliminary rendering infrastructure, essentially a pythonic render function, which can be challenging for RL environments. This Python-based setup is disconnected from web applications and is neither dynamic nor flexible enough to serve real-time visualization. (2) Therefore, an extensible feature we have implemented is a 3D visualization showing agent dynamics within a simulated environment using WebGL [
22] and Three.js [
7]. Both are powerful tools for rendering interactive 3D graphics directly in web browsers without plugins. Unlike OpenAI’s Gym pythonic render function, this novel feature naturally supports interactivity within the web app, providing a native experience for users engaging with our platform. As illustrated in
Figure 14, users can optionally create this visualization by inserting TypeScript code via Three.js in the editor. This addition enables visualization of the agent's movements and actions within the environment, as well as step-by-step agent interactions in different epochs, offering a more intuitive understanding of its behavior in 3D space. Users can use the step-back button to review the agent's previous behavior and click through the sequence. Furthermore, the 3D environment is included as an interactive thumbnail version in the catalog leaf node (see
Figure 13). (3) Another point regarding extensibility is the ARLO [
47] framework, which lays the foundation with its different models. Moreover, ARLO offers extensibility by allowing users to customize RL pipelines and add custom stages that incorporate automatic RL training. (4) Lastly, we offer extensibility via gym code parametrization. In AutoRL X, we enhance the UI by enabling users to define individual parameters during the implementation of a gym. This approach marks a significant lesson learned, as it allows users to set parameters externally before the execution of a run and to test multiple gyms with slightly differing parameters (see
Figure 9).
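To illustrate the idea of gym code parametrization, a toy environment whose dynamics are injected from the outside might look as follows; the class and parameter names are hypothetical and do not reflect AutoRL X's actual gym interface.

```python
# Toy example of gym code parametrization: parameters are set
# externally before a run, so several gym variants can be tested
# without editing the gym's code. All names are hypothetical.
class ParametrizedGym:
    def __init__(self, params):
        self.goal = params.get("goal", 10)           # target state
        self.step_size = params.get("step_size", 1)  # movement per action
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves toward the goal; anything else stays put.
        if action == 1:
            self.state += self.step_size
        done = self.state >= self.goal
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}

# Two runs of the "same" gym with slightly differing parameters,
# as the UI allows before execution:
variants = [{"goal": 3}, {"goal": 3, "step_size": 3}]
envs = [ParametrizedGym(p) for p in variants]
```

Because the parameters live outside the gym code, the same template can back many experiment configurations without any re-implementation.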