DOI: 10.1145/3569951.3593597
Research article, public access

Active Research Data Management with the Django Globus Portal Framework

Published: 10 September 2023

Abstract

Publishing and sharing data is critical to fostering collaboration and advancing scientific research. Data portals are commonly used to organize, publish, and securely disseminate data, a critical step toward making data findable, accessible, interoperable, and reusable (FAIR). However, the diversity of scientific data types, sizes, and locations presents significant challenges; for example, it is difficult for portals to accommodate heterogeneous research products when using strict metadata schemas and rigid interfaces. Thus, there is a need for a user-customizable data portal solution that enables rapid creation of new portals that can be tailored to a researcher's needs while accommodating distributed data sources and engaging advanced computing resources. In this paper, we present the Django Globus Portal Framework (DGPF), a tool designed to help users rapidly create secure, customizable, and extensible data portals. DGPF is a flexible framework that builds upon the Globus platform for authentication, data sharing, creation of automation flows, and search capabilities, allowing for seamless integration with existing research workflows. We present the design and implementation of DGPF and describe our experiences operating the Argonne Community Data Co-op (ACDC), a collection of DGPF portals with over 1 million records and over 100 TB of published data that has been accessed by more than 300 users.

Published In

PEARC '23: Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good
July 2023
519 pages
ISBN: 9781450399852
DOI: 10.1145/3569951
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. FAIR Data
  2. Globus
  3. Modern Research Data Portal

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '23

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Article Metrics

  • Downloads (last 12 months): 167
  • Downloads (last 6 weeks): 31
Reflects downloads up to 16 Feb 2025

Cited By

  • (2024) Zero Code and Infrastructure Research Data Portals. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1-4. DOI: 10.1145/3626203.3670595. Online publication date: 17-Jul-2024
  • (2024) Cheap and FAIR: A Serverless Research Data Repository for the Next Generation Cosmic Microwave Background Experiment. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1-4. DOI: 10.1145/3626203.3670558. Online publication date: 17-Jul-2024
  • (2024) Design Thinking for Human Centric Research Data Systems Engineering. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1-7. DOI: 10.1145/3626203.3670542. Online publication date: 17-Jul-2024
  • (2024) GPU-Driven Optimization of Web-Based Volume Rendering in Peripheral Artery Disease CT Imaging. 2024 IEEE 24th International Conference on Bioinformatics and Bioengineering (BIBE), 1-8. DOI: 10.1109/BIBE63649.2024.10820457. Online publication date: 27-Nov-2024
  • (2024) Research data management in institutional repositories: an architectural approach using data lakehouses. Digital Library Perspectives. DOI: 10.1108/DLP-02-2024-0022. Online publication date: 24-Dec-2024
  • (2023) Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to Supercomputers. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2140-2146. DOI: 10.1145/3624062.3624614. Online publication date: 12-Nov-2023
