Author(s)
Guan, Wen (Brookhaven National Laboratory (US)) ; Barreiro Megino, Fernando Harald (University of Texas at Arlington (US)) ; De, Kaushik (University of Texas at Arlington (US)) ; Karavakis, Edward (Brookhaven National Laboratory (US)) ; Lin, Fa-Hui (University of Texas at Arlington (US)) ; Maeno, Tadashi (Brookhaven National Laboratory (US)) ; Nilsson, Paul (Brookhaven National Laboratory (US)) ; Wenaus, Torre (Brookhaven National Laboratory (US)) ; Zhang, Rui (University of Wisconsin Madison (US)) ; Zhao, Xin (Brookhaven National Laboratory (US)) ; Yang, Zhaoyu (Brookhaven National Laboratory (US))
Abstract
Increasingly complex high energy physics analyses often involve running a large number of different tools. This demands a multi-step data processing approach, in which each step requires different resources and depends on the results of preceding steps. A tool that automates these diverse steps efficiently is therefore valuable. Using the Production and Distributed Analysis (PanDA) system and the intelligent Data Delivery Service (iDDS), we provide a platform that coordinates sequences of tasks as a workflow, automating their execution in a specified order and under predefined conditions. In this presentation, we first give an overview of the platform's architecture. We then describe a user-friendly interface in which workflows are written in Python and tasks are defined as Python functions. Next, we detail how Python functions are transformed into tasks and scheduled onto distributed heterogeneous resources, together with a messaging-based mechanism for processing results asynchronously. Finally, we present a practical example showing how the platform converts the machine learning hyperparameter optimization of an ATLAS ttH analysis into a distributed workflow.
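The abstract does not show the platform's actual interface, so the following is only a minimal, local sketch of the general pattern it describes: tasks written as plain Python functions inside a workflow, each task dispatched once its upstream results are available, and results consumed asynchronously as messages. The names `Workflow`, `task` and `run` are hypothetical and are not the PanDA or iDDS API; a local thread pool stands in for distributed heterogeneous resources, and `queue.Queue` stands in for the messaging service.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class Workflow:
    """Hypothetical in-process stand-in for a workflow of Python-function tasks."""

    def __init__(self):
        self.tasks = {}  # task name -> (function, names of upstream tasks)

    def task(self, *depends_on):
        # Decorator that registers a Python function as a task; its positional
        # arguments will be the results of the named upstream tasks.
        def register(func):
            self.tasks[func.__name__] = (func, depends_on)
            return func
        return register

    def run(self, max_workers=4):
        results = {}
        messages = queue.Queue()  # stands in for the messaging service
        pending = dict(self.tasks)
        running = 0
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            while pending or running:
                # Dispatch every task whose upstream results are all available.
                ready = [n for n, (_, deps) in pending.items()
                         if all(d in results for d in deps)]
                for name in ready:
                    func, deps = pending.pop(name)
                    future = pool.submit(func, *[results[d] for d in deps])
                    # Publish a "task finished" message when the future completes.
                    future.add_done_callback(lambda f, n=name: messages.put((n, f)))
                    running += 1
                if not running:
                    raise RuntimeError("unsatisfiable dependencies: " + ", ".join(pending))
                # Block until the next result message arrives, then record it.
                name, future = messages.get()
                running -= 1
                results[name] = future.result()
        return results


# Toy hyperparameter scan: prepare candidates, "train" them, select the best.
wf = Workflow()


@wf.task()
def prepare():
    return [0.01, 0.1, 1.0]  # candidate learning rates


@wf.task("prepare")
def train(rates):
    return {r: abs(r - 0.1) for r in rates}  # toy "loss" per candidate


@wf.task("train")
def select(losses):
    return min(losses, key=losses.get)


print(wf.run()["select"])  # prints 0.1
```

Waiting on a result queue, rather than polling each task, mirrors the asynchronous, message-driven result handling the abstract mentions; in a real distributed setting the messages would come from a broker instead of local callbacks.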