DOI: 10.1145/3219104.3219122
research-article
Public Access

Deploying Jupyter Notebooks at scale on XSEDE resources for Science Gateways and workshops

Published: 22 July 2018

Abstract

Jupyter Notebooks have become a mainstream tool for interactive computing in every field of science. They are well suited as companion applications for Science Gateways, giving users more flexibility and post-processing capability. Moreover, they are often used in training events and workshops to provide immediate access to a pre-configured interactive computing environment. The Jupyter team released the JupyterHub web application to provide a platform where multiple users can log in and access a Jupyter Notebook environment. When the number of users and the memory requirements are low, it is easy to set up JupyterHub on a single server. However, setup becomes more complicated when Jupyter Notebooks must be served at scale to tens or hundreds of users. In this paper we present three strategies for deploying JupyterHub at scale on XSEDE resources. All three deploy JupyterHub itself on a virtual machine on XSEDE Jetstream. In the first scenario, JupyterHub connects to a supercomputer, launches a single-node job on behalf of each user, and proxies the Notebook from the compute node back to the user's browser. In the second scenario, implemented in the context of an XSEDE consultation for the IRIS consortium for seismology, we deploy Docker in Swarm mode to coordinate many XSEDE Jetstream virtual machines and provide Notebooks with persistent storage and quota. In the last scenario, we install the Kubernetes container orchestration framework on Jetstream to provide a fault-tolerant JupyterHub deployment with a distributed filesystem and the capability to scale to thousands of users. In the conclusion section we provide a link to step-by-step tutorials, complete with all the commands and configuration files needed to replicate these deployments.
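
As a rough illustration of how one JupyterHub instance can target three different backends, the sketch below maps each scenario to a spawner class and prints the corresponding `jupyterhub_config.py` line. The abstract does not name the spawners used, so the class paths here (`batchspawner.SlurmSpawner`, `dockerspawner.SwarmSpawner`, `kubespawner.KubeSpawner`) are assumptions about plausible community spawners, and the scenario keys and the `config_line` helper are hypothetical names introduced only for this example.

```python
# Sketch only: one possible mapping from the three scenarios in the abstract
# to JupyterHub spawner classes. The class paths are ASSUMPTIONS; the abstract
# does not say which spawner each deployment uses.
SPAWNER_BY_SCENARIO = {
    "supercomputer": "batchspawner.SlurmSpawner",   # single-node batch job per user
    "docker_swarm": "dockerspawner.SwarmSpawner",   # containers across Swarm-mode VMs
    "kubernetes": "kubespawner.KubeSpawner",        # one pod per user
}


def config_line(scenario: str) -> str:
    """Return the jupyterhub_config.py line that selects the spawner.

    `scenario` is a hypothetical key from SPAWNER_BY_SCENARIO.
    """
    try:
        spawner = SPAWNER_BY_SCENARIO[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None
    # JupyterHub reads its spawner from the c.JupyterHub.spawner_class setting.
    return f"c.JupyterHub.spawner_class = {spawner!r}"


if __name__ == "__main__":
    for name in SPAWNER_BY_SCENARIO:
        print(config_line(name))
```

In all three cases the hub itself stays on the same virtual machine; only the spawner setting, and the backend-specific options that go with it, would change.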


Published In

PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity
July 2018
652 pages
ISBN:9781450364461
DOI:10.1145/3219104
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Docker
  2. Jupyter Notebook
  3. Kubernetes
  4. PEARC Proceedings

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '18

Acceptance Rates

PEARC '18 Paper Acceptance Rate 79 of 123 submissions, 64%;
Overall Acceptance Rate 133 of 202 submissions, 66%

Article Metrics

  • Downloads (Last 12 months): 112
  • Downloads (Last 6 weeks): 20
Reflects downloads up to 09 Nov 2024

Cited By

  • (2023) EasyScienceGateway: A new framework for providing reproducible user environments on science gateways. Concurrency and Computation: Practice and Experience 36:4. DOI: 10.1002/cpe.7929. Online publication date: 13-Oct-2023
  • (2022) A Study about Future Prospects of JupyterHub in MOOCs. Proceedings of the Ninth ACM Conference on Learning @ Scale (275-279). DOI: 10.1145/3491140.3529537. Online publication date: 1-Jun-2022
  • (2021) Jetstream2: Accelerating cloud computing via Jetstream. Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions (1-8). DOI: 10.1145/3437359.3465565. Online publication date: 17-Jul-2021
  • (2021) Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds. 2021 IEEE 17th International Conference on eScience (eScience) (30-39). DOI: 10.1109/eScience51609.2021.00013. Online publication date: Sep-2021
  • (2021) Easy-to-Use Cloud Computing for Teaching Data Science. Journal of Statistics and Data Science Education 29:sup1 (S103-S111). DOI: 10.1080/10691898.2020.1860726. Online publication date: 22-Mar-2021
  • (2020) Introducing Students to Scientific Python for Atmospheric Science. Bulletin of the American Meteorological Society 101:9 (E1492-E1496). DOI: 10.1175/BAMS-D-20-0069.1. Online publication date: 1-Sep-2020
  • (2020) Using Containers to Create More Interactive Online Training and Education Materials. Practice and Experience in Advanced Research Computing 2020: Catch the Wave (246-251). DOI: 10.1145/3311790.3396641. Online publication date: 26-Jul-2020
  • (2018) Fuzzy-Based Conversational Recommender for Data-intensive Science Gateway Applications. 2018 IEEE International Conference on Big Data (Big Data) (4870-4875). DOI: 10.1109/BigData.2018.8622046. Online publication date: Dec-2018
