
AlphaMap An Open-Source Python Package For The Visual Annotation of Proteomics Data With Sequence-Specific Knowledge


Bioinformatics, 2021, 1–4

doi: 10.1093/bioinformatics/btab674
Advance Access Publication Date: 29 September 2021
Applications Note

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab674/6377770 by Academia Sinica user on 10 December 2021


Sequence analysis
AlphaMap: an open-source Python package for the visual annotation of proteomics data with sequence-specific knowledge
Eugenia Voytik1,†, Isabell Bludau1,†, Sander Willems1, Fynn M. Hansen1, Andreas-David Brunner1, Maximilian T. Strauss1 and Matthias Mann1,2,*
1 Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany and
2 Department of Clinical Proteomics, NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
*To whom correspondence should be addressed.

†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
Associate Editor: Olga Vitek

Received on July 30, 2021; revised on September 2, 2021; editorial decision on September 13, 2021; accepted on September 22, 2021

Abstract
Summary: Integrating experimental information across proteomic datasets with the wealth of publicly available sequence annotations is a crucial part in many proteomic studies that currently lacks an automated analysis platform. Here, we present AlphaMap, a Python package that facilitates the visual exploration of peptide-level proteomics data. Identified peptides and post-translational modifications in proteomic datasets are mapped to their corresponding protein sequence and visualized together with prior knowledge from UniProt and with expected proteolytic cleavage sites. The functionality of AlphaMap can be accessed via an intuitive graphical user interface or, more flexibly, as a Python package that allows its integration into common analysis workflows for data visualization. AlphaMap produces publication-quality illustrations and can easily be customized to address a given research question.
Availability and implementation: AlphaMap is implemented in Python and released under an Apache license. The
source code and one-click installers are freely available at https://github.com/MannLabs/alphamap.
Contact: mmann@biochem.mpg.de
Supplementary information: Supplementary data are available at Bioinformatics online.
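
As an illustration of the mapping step described in the Summary, the following minimal Python sketch locates identified peptides in a protein sequence, computes the resulting sequence coverage and lists expected tryptic cleavage sites. It is not AlphaMap's implementation or API: the function names and the toy sequence are hypothetical, peptides are matched by exact string search (ignoring modifications and isoleucine/leucine ambiguity), and trypsin is modelled with the common simplified rule of cleavage C-terminal to K or R unless the next residue is P.

```python
import re
from typing import List


def find_peptide_positions(protein_seq: str, peptide: str) -> List[int]:
    """Return 0-based start indices of all (possibly overlapping) exact matches."""
    if not peptide:
        return []
    starts = []
    start = protein_seq.find(peptide)
    while start != -1:
        starts.append(start)
        start = protein_seq.find(peptide, start + 1)
    return starts


def coverage_mask(protein_seq: str, peptides: List[str]) -> List[bool]:
    """Mark every residue that is covered by at least one peptide."""
    covered = [False] * len(protein_seq)
    for peptide in peptides:
        for start in find_peptide_positions(protein_seq, peptide):
            for i in range(start, start + len(peptide)):
                covered[i] = True
    return covered


def sequence_coverage(protein_seq: str, peptides: List[str]) -> float:
    """Fraction of residues covered by the given peptides (0.0 for an empty sequence)."""
    if not protein_seq:
        return 0.0
    return sum(coverage_mask(protein_seq, peptides)) / len(protein_seq)


def tryptic_cleavage_sites(protein_seq: str) -> List[int]:
    """0-based indices of K/R residues after which trypsin is expected to cut.

    Simplified rule (an assumption of this sketch): cut after K or R unless
    the following residue is P; the last residue is never a cleavage site.
    """
    return [m.start() for m in re.finditer(r"[KR](?=[^P])", protein_seq)]


if __name__ == "__main__":
    # Toy data, purely hypothetical and not taken from the paper.
    protein = "MAKRPTEKLLR"
    sample_peptides = ["RPTEK", "LLR"]
    print(find_peptide_positions(protein, "LLR"))   # [8]
    print(tryptic_cleavage_sites(protein))          # [2, 7]
    print(round(sequence_coverage(protein, sample_peptides), 3))
```

Calling `coverage_mask` once per sample or dataset would give one trace per experiment over the same sequence, which is the kind of multi-sample comparison the package is designed to display.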

© The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

1 Introduction
Bottom-up mass spectrometry (MS) has become the leading technology for identifying and quantifying proteomes (Aebersold and Mann, 2003, 2016; Müller et al., 2020). Since peptides rather than intact proteins are measured, visualizing identified peptides and post-translational modifications (PTMs) together with known protein sequence information is an important aspect of downstream MS data exploration. However, the ability to easily integrate and visualize experimental data together with already known sequence annotations is an unmet need in the proteomics community. Although established visualization platforms provide manual visualization of a single experimental sample or dataset at a time (Omasits et al., 2014), there is a lack of tools that support state-of-the-art data analysis software frameworks and that can visualize experimental sequence coverage across multiple samples or datasets in combination with available sequence annotations mined from UniProt, the standard knowledgebase for protein information (Bateman, 2019). To make this wealth of information easily accessible to proteomics researchers, we developed AlphaMap, a Python package that facilitates the visual exploration of peptide-level proteomics data.

2 The AlphaMap computational framework
In line with other recently developed software tools from our lab (Strauss et al., 2021; Willems et al., 2021), we implemented AlphaMap in pure Python because of its clear, easy to understand syntax and the availability of excellent supporting scientific libraries. To read fasta files, we leverage the Pyteomics Python package (Goloborodko et al., 2013; Levitsky et al., 2019). Plotly is a well-established plotting library that we use for generating AlphaMap’s sequence visualization (Plotly Technologies Inc., 2015), allowing flexible customization and great user interactivity. To enable easy access to the AlphaMap functionality with a low barrier of entry, a stand-alone graphical user interface (GUI) was implemented using the Panel library (Rudiger et al., 2021). AlphaMap can be launched either as a browser-based GUI after simple local installation or as a
standard Python module installed via PyPI (Python Software Foundation, n.d.) or directly from its GitHub repository.
In line with the AlphaPept ecosystem (Strauss et al., 2021), we make the AlphaMap code openly available on GitHub, using its many supporting features for unit and system testing via GitHub Actions. For code development, we adopted the concept of ‘literate programming’ (Knuth, 1984), which combines the algorithmic code with readable documentation and testing. Using the nbdev package, the codebase can directly be inspected in well-documented Jupyter Notebooks, from which the code is automatically extracted (Kluyver et al., 2016). We envision that these design principles will encourage the broader community to integrate AlphaMap in their own data analysis and visualization workflows with the possibility to easily adapt the code according to specific needs.

Fig. 1. (A) Overview of the AlphaMap workflow from MS data upload to the interactive sequence visualization. (B) Exemplary sequence visualization for epidermal growth factor receptor (EGFR). A zoom-in on a selected sequence region, indicated by dashed lines, is provided at the lower part of the panel

3 Overview of the AlphaMap workflow
AlphaMap uses peptide-level proteomics data as input. It currently supports the direct import of data processed by MaxQuant (Cox and Mann, 2008), Spectronaut (Bruderer et al., 2015), DIA-NN (Demichev et al., 2020), FragPipe (Kong et al., 2017) and our
recently introduced AlphaPept framework (Strauss et al., 2021). In contrast to Protter (Omasits et al., 2014), users can select multiple independent datasets for co-visualization. These could either have been processed by the same or with different MS analysis tools. It is also possible to select only a single sample, or a subset of samples of a given input file for individual sequence visualization. In addition to the peptide-level data generated from LC-MS analysis, AlphaMap leverages a plethora of manually curated sequence-specific protein level information available from UniProt. Fasta files and UniProt sequence annotations are readily accessible in AlphaMap for the 13 most popular UniProt organisms as well as for SARS-CoV and SARS-CoV-2. Functionality to enable the integration of additional organisms is further available as part of our Python package. Finally, the user can select the different layers of information that should be displayed in the interactive sequence representation, including selected protease cleavage sites and UniProt sequence annotations. Figure 1A shows a schematic overview of the AlphaMap workflow. Detailed instructions for its installation and usage are further provided in the supplementary user guide. In addition to interactive sequence visualization of a user-selected protein, AlphaMap provides individual links to external databases and tools for further sequence evaluation in UniProt (Bateman, 2019), PhosphoSitePlus (Hornbeck et al., 2015), Protter (Omasits et al., 2014), PDB (Berman et al., 2000) and Peptide Atlas (Desiere et al., 2006).

4 Application of AlphaMap to investigate full proteome and PTM data
Figure 1B shows the sequence visualization of the peptides and PTMs identified for the epidermal growth factor receptor (EGFR) in human A549-ACE2 cells that were infected with SARS-CoV-2 or SARS-CoV (an exemplary viral protein detected in this dataset is visualized in the Supplementary Material) (Stukalov et al., 2021). We show three independent experimental traces: one for full proteome data, one for phospho-enriched peptides and one for ubiquitin-enriched peptides. The proteome data indicates a homogeneous coverage across the entire protein sequence. As expected, phosphorylation and ubiquitination are limited to the C-terminal region of the protein, which is annotated to be exposed to the cytosol. In addition, the kinase domain of EGFR is highly ubiquitinated in our dataset, whereas the surrounding cytosolic regions are phosphorylated. Interestingly, AlphaMap reports that most of our observed phosphorylation sites have been previously identified, whereas none of the identified ubiquitination sites are annotated in UniProt. Please note that unmodified peptides are also observed in both the phospho- and ubiquitin-enriched samples due to the imperfect selectivity of enrichment protocols.
Beyond the uses highlighted here, we envision AlphaMap to facilitate data analysis and interpretation for a variety of different applications:

• Candidate validation: AlphaMap can be used to assess the sequence coverage of identified biomarker candidates (or other proteins of interest) to evaluate possible sequence variations or unexpected anomalies on the basis of readily available sequence information.
• Preparation of panels for publication: Sequence visualizations from AlphaMap can directly highlight the precise MS derived information about proteins of interest in biological or clinical projects.
• Technical comparisons: AlphaMap can be used to evaluate sequence coverage between different data acquisition strategies such as data-dependent and data-independent acquisition, alternative instrument platforms or software tools.
• Optimization of sample processing: Visualization of protein cleavage sites for different proteases can help to optimize sample processing with the goal to achieve a more complete sequence coverage.

5 Conclusion
AlphaMap offers an interactive GUI and a Python package for visualizing peptide-level bottom-up proteomics data on the basis of individual protein sequences, including information of curated UniProt sequence annotations and expected proteolytic cleavage sites. We expect that future developments by us and the community will extend the variety of available annotations in AlphaMap, for example by including prior knowledge of sequence conservation or predicted functional domains. In addition, we will integrate quantitative information and differential analysis results into the AlphaMap sequence representations. We envision that AlphaMap will assist MS-based proteomics researchers in inspecting peptide- and PTM-level data, thereby providing valuable information in the process of candidate validation in biological and clinical context.

Author contributions
I.B. conceptualized the project and together with E.V. and M.M. wrote the manuscript with contributions from all authors. I.B. and E.V. implemented the core AlphaMap functions. E.V. implemented the GUI. S.W. provided important help with the AlphaMap installers. F.M.H. and A.-D.B. provided valuable ideas for the concept and visualization in AlphaMap and F.M.H. further contributed by rigorous testing. M.T.S. designed the general AlphaPept ecosystem and assisted with the nbdev environment. M.M. supervised the study and provided critical feedback on all aspects of the presented work.

Funding
This study was supported by The Max-Planck Society for Advancement of Science and by the Bavarian State Ministry of Health and Care through the research project DigiMed Bayern (www.digimed-bayern.de, G64b-A1070-2018/131-2 DMB-1805-0008). I.B. acknowledges funding support from her Postdoc.Mobility fellowship granted by the Swiss National Science Foundation [P400PB_191046].

Conflict of Interest: none declared.

Acknowledgements
The authors thank Julia Schessner, Barbara Steigenberger, Jakob Bader and Sophia Mädler for testing and providing critical feedback on AlphaMap. They are grateful to Özge Karayel and Maria C. Tanzer for valuable discussions and for providing experimental data.

References
Aebersold,R. and Mann,M. (2003) Mass spectrometry-based proteomics. Nature, 422, 198–207.
Aebersold,R. and Mann,M. (2016) Mass-spectrometric exploration of proteome structure and function. Nature, 537, 347–355.
Bateman,A.; UniProt Consortium. (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.
Berman,H.M. et al. (2000) The protein data bank. Nucleic Acids Res., 28, 235–242.
Bruderer,R. et al. (2015) Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol. Cell. Proteomics, 14, 1400–1410.
Cox,J. and Mann,M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol., 26, 1367–1372.
Demichev,V. et al. (2020) DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods, 17, 41–44.
Desiere,F. et al. (2006) The PeptideAtlas project. Nucleic Acids Res., 34, D655–D658.
Goloborodko,A.A. et al. (2013) Pyteomics – a python framework for exploratory data analysis and rapid software prototyping in proteomics. J. Am. Soc. Mass Spectrometry, 24, 301–304.
Hornbeck,P.V. et al. (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res., 43, D512–D520.
Kluyver,T. et al. (2016) Jupyter Notebooks – a publishing format for reproducible computational workflows. In: Positioning and Power in Academic Publishing: Players, Agents and Agendas – Proceedings of the 20th International Conference on Electronic Publishing, ELPUB 2016, Göttingen, Germany, pp. 87–90.
Knuth,D.E. (1984) Literate programming. Comput. J., 27, 97–111.
Kong,A.T. et al. (2017) MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods, 14, 513–520.
Levitsky,L.I. et al. (2019) Pyteomics 4.0: five years of development of a Python proteomics framework. J. Proteome Res., 18, 709–714.
Müller,J.B. et al. (2020) The proteome landscape of the kingdoms of life. Nature, 582, 592–596.
Omasits,U. et al. (2014) Protter: interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics, 30, 884–886.
Plotly Technologies Inc. (2015) plotly. Montréal, QC. https://plot.ly (27 September 2021, date last accessed).
Python Software Foundation. (n.d.) Python Package Index – PyPI. https://pypi.org/ (27 September 2021, date last accessed).
Rudiger,P. et al. (2021) holoviz/panel: Version 0.11.3. doi: 10.5281/ZENODO.4692827.
Strauss,M.T. et al. (2021) AlphaPept, a modern and open framework for MS-based proteomics. BioRxiv, 2021.07.23.453379. doi:10.1101/2021.07.23.453379.
Stukalov,A. et al. (2021) Multilevel proteomics reveals host perturbations by SARS-CoV-2 and SARS-CoV. Nature, 594, 246–252.
Willems,S. et al. (2021) AlphaTims: indexing trapped ion mobility spectrometry – time of flight data for fast and easy accession and visualization. Mol. Cell. Proteomics, 100149.
