Peter W Rose

Papers by Peter W Rose

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

ArXiv, 2018

Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific progress. Since many experimental studies rely on computational analyses, biologists need guidance on how to set up and document reproducible data analyses or simulations. In this paper, we address several questions about reproducibility. For example, what are the technical and non-technical barriers to reproducible computational studies? What opportunities and challenges do computational notebooks offer to overcome some of these barriers? What tools are available and how can they be used effectively? We have developed a set of rules to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications. Notebooks combine detailed workflows with...

Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks

PLOS Computational Biology, 2019

A Biomedical Open Knowledge Network Harnesses the Power of AI to Understand Deep Human Biology

AI Magazine

Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging, in part due to the daunting complexity of the molecular and cellular pathways that govern human physiology and pathology. In this article, we describe concrete uses of Scalable PrecisiOn Medicine Knowledge Engine (SPOKE), an open knowledge network that connects curated information from thirty-seven specialized and human-curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID-19 research, and chronic disease diagnosis and management.

ISMB sequence-structure hackathon 2016!

F1000Research, Aug 11, 2016

The evolution of the RCSB Protein Data Bank website

Wiley Interdisciplinary Reviews: Computational Molecular Science, May 20, 2011

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) supports scientific research and education by providing an essential resource of information about biomolecular structures. As a member of the Worldwide Protein Data Bank (wwPDB), the RCSB PDB curates and annotates the data about the experimentally determined three‐dimensional structures of proteins and nucleic acids that are deposited into the PDB archive. The RCSB PDB also provides online resources to access the data in the archive, including a relational database supporting simple and complex query and reporting, visualization tools, structure‐sequence comparison tools, access to the associated literature, and educational services. In the 11 years (1999–2010) since RCSB PDB has been in operation, the amount of data in the archive has increased six‐fold, along with an increase in the complexity of structures being determined and in the number of experimental methods used. The evolution required by RCSB PDB to meet these challenges provides insight into the motivation and challenges of developing and maintaining a major biological resource, particularly one used in understanding the molecular details of living systems in both normal and disease states. © 2011 John Wiley & Sons, Ltd. WIREs Comput Mol Sci 2011 1 782–789 DOI: 10.1002/wcms.57. This article is categorized under: Computer and Information Science > Chemoinformatics

RCSB Protein Data Bank: Views of structural biology for basic and applied research

F1000Research, Jul 25, 2016

OUP accepted manuscript

Nucleic Acids Research, 2016

Pre-calculated protein structure alignments at the RCSB PDB website

Carolina Digital Repository (University of North Carolina at Chapel Hill), 2010

Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm

PLOS Computational Biology, Apr 22, 2019

New online curriculum: the PDB pipeline and data archiving

Acta Crystallographica, Jul 20, 2018

Integration of open access literature into the RCSB Protein Data Bank using BioLit

Background: Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now online and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB). Results: BioLit provides an enhanced view of articles with markup of semantic data and links to biological databases, based on the content of the article. For example, words matching to existing biological ontologies are highlighted and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs that are mentioned in the open access literature, by parsing the full text for all research articles in PubMed Central (PMC) and exposing t...

Compressive structural bioinformatics: High efficiency 3D structure compression

F1000Research, 2016

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility (P. aeruginosa only). We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. We conclude that, w...

Trendspotting in the Protein Data Bank

FEBS Letters, 2013

The Protein Data Bank (PDB) was established in 1971 as a repository for the three-dimensional structures of biological macromolecules. Since then, more than 85 000 biological macromolecule structures have been determined and made available in the PDB archive. Through analysis of the corpus of data, it is possible to identify trends that can be used to inform us about the future of structural biology and to plan the best ways to improve the management of the ever‐growing amount of PDB data.

BioJava 5: A community driven open-source bioinformatics library

PLOS Computational Biology, Feb 8, 2019

EROS 6.0, a Knowledge Based System for Reaction Prediction — Application to the Regioselectivity of the Diels-Alder Reaction

Software Development in Chemistry 4, 1990

The RCSB PDB Protein Comparison Tool

F1000Research, 2011

BioJava-ModFinder: identification of protein modifications in 3D structures from the Protein Data Bank

Bioinformatics, 2017

Summary: We developed a new software tool, BioJava-ModFinder, for identifying protein modifications observed in 3D structures archived in the Protein Data Bank (PDB). Information on more than 400 types of protein modifications was collected and curated from annotations in PDB, RESID, and PSI-MOD. We divided these modifications into three categories: modified residues, attachment modifications, and cross-links. We have developed a systematic method to identify these modifications in 3D protein structures. We have integrated this package with the RCSB PDB web application and added protein modification annotations to the sequence diagram and structure display. By scanning all 3D structures in the PDB using BioJava-ModFinder, we identified more than 30 000 structures with protein modifications, which can be searched, browsed, and visualized on the RCSB PDB website. Availability and Implementation: BioJava-ModFinder is available as open source (LGPL license) at (https://github.com/biojava...

The Protein Data Bank: Overview and Tools for Drug Discovery

NATO Science for Peace and Security Series A: Chemistry and Biology, 2015

The increasing size and complexity of the three-dimensional (3D) structures of biomacromolecules in the Protein Data Bank (PDB) is a reflection of the growth in the field of structural biology. Although the PDB archive was initially used only in the field of structural biology, it has grown to become a valuable resource for understanding biology at a molecular level and is critical for designing new therapeutic options for various diseases. The many uses of the PDB archive depend upon the tools and resources for data management and for data access and analysis.

Thermodynamics and kinetics of ligand—protein binding studied with the weighted histogram analysis method and simulated annealing

Biocomputing '99, 1998
