SPIN enables high throughput species identification of archaeological bone by proteomics

Nature Communications

Species determination based on genetic evidence is an indispensable tool in archaeology, forensics, ecology, and food authentication. Most available analytical approaches involve compromises with regard to the number of detectable species, high cost due to low throughput, or a labor-intensive manual process. Here, we introduce “Species by Proteome INvestigation” (SPIN), a shotgun proteomics workflow for analyzing archaeological bone capable of querying over 150 mammalian species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Rapid peptide chromatography and data-independent acquisition (DIA) with throughput of 200 samples per day reduce expensive MS time, whereas streamlined sample preparation and automated data interpretation save labor costs. We confirm the successful classification of known reference bones, including domestic species and great apes, beyond the taxonomic resolution of the conventional peptide mass fingerprinting (PMF)-based Zooarchaeology by Mass Sp...

ARTICLE https://doi.org/10.1038/s41467-022-30097-x OPEN SPIN enables high throughput species identiﬁcation of archaeological bone by proteomics 1234567890():,; Patrick Leopold Rüther 1 ✉, Immanuel Mirnes Husic1, Pernille Bangsgaard2, Kristian Murphy Gregersen3, Pernille Pantmann4, Milena Carvalho5,6, Ricardo Miguel Godinho5, Lukas Friedl5,7, João Cascalheira5, Alberto John Taurozzi 2, Marie Louise Schjellerup Jørkov8, Michael M. Benedetti 5,9, Jonathan Haws Nuno Bicho 5, Frido Welker2, Enrico Cappellini 2 & Jesper Velgaard Olsen 1 ✉ 5,10, Species determination based on genetic evidence is an indispensable tool in archaeology, forensics, ecology, and food authentication. Most available analytical approaches involve compromises with regard to the number of detectable species, high cost due to low throughput, or a labor-intensive manual process. Here, we introduce “Species by Proteome INvestigation” (SPIN), a shotgun proteomics workﬂow for analyzing archaeological bone capable of querying over 150 mammalian species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Rapid peptide chromatography and data-independent acquisition (DIA) with throughput of 200 samples per day reduce expensive MS time, whereas streamlined sample preparation and automated data interpretation save labor costs. We conﬁrm the successful classiﬁcation of known reference bones, including domestic species and great apes, beyond the taxonomic resolution of the conventional peptide mass ﬁngerprinting (PMF)-based Zooarchaeology by Mass Spectrometry (ZooMS) method. In a blinded study of degraded Iron-Age material from Scandinavia, SPIN produces reproducible results between replicates, which are consistent with morphological analysis. Finally, we demonstrate the high throughput capabilities of the method in a high-degradation context by analyzing more than two hundred Middle and Upper Palaeolithic bones from Southern European sites with late Neanderthal occupation. While this initial study is focused on modern and archaeological mammalian bone, SPIN will be open and expandable to other biological tissues and taxa. 1 Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark. 2 Globe institute, University of Copenhagen, Copenhagen, Denmark. 3 Institute of Conservation, Royal Danish Academy, Copenhagen, Denmark. 4 Dept. of Archaeology, Museum Nordsjælland, Copenhagen, Denmark. 5 Interdisciplinary Center of Archaeology and Evolution of Human Behavior, University of Algarve, Faro, Portugal. 6 Departmrent of Anthropology, University of New Mexico, Albuquerque, NM, USA. 7 Dept. of Anthropology University of West Bohemia, Pilsen, Czech Republic. 8 The Laboratory of Biological Anthropology, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark. 9 Department of Earth and Ocean Sciences, University of North Carolina Wilmington, Wilmington, NC, USA. 10 Department of Anthropology, University of Louisville, Louisville, KY, USA. ✉email: patrick.ruether@palaeome.org; jesper.olsen@cpr.ku.dk NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x olecular identiﬁcation methods enable species determination of archaeological objects and other biological samples independent of morphological features. Genetic methods analyze species-speciﬁc variations either in DNA, which provides maximum resolution1,2, or in proteins, which have superior longevity and can be analyzed without ampliﬁcation bias3,4. The two most common approaches to protein-based species identiﬁcation are immuno-assays for targeting a single or few selected species at a time5 and mass spectrometry (MS) for global species determination6,7. Here, we employ MS-based methods because they offer greater ﬂexibility and can cope with more degraded and contaminated material8,9. The typical workﬂow for MS-based protein analysis involves tryptic digestion of extracted proteins into shorter peptides, which are subsequently analyzed by liquid chromatographytandem mass spectrometry (LC-MS/MS)10 or by peptide mass ﬁngerprinting (PMF)11. Peptide sequencing by LC-MS/MS using high-resolution MS instrumentation facilitates conﬁdent identiﬁcation of proteins in a complex biological matrix and is therefore the standard technology in proteomics. The technique is broadly compatible with virtually any type of protein and suitable for single and mixed species analysis. The recent development of high-throughput data acquisition and interpretation workﬂows for clinical diagnostics12,13 have yet to be adapted with regard to the samples and research questions relevant for molecular species identiﬁcation. As of now, species identiﬁcation by LC-MS/MS is primarily reserved for a few individual samples of high proﬁle that justify the high analytical costs, long analysis time, and complex data interpretation14,15. Conversely, most studies with many samples continue to be based on the faster and more costefﬁcient PMF16,17. Here, peptides are analyzed without chromatographic separation by matrix-assisted laser desorption/ionization time-of-ﬂight (MALDI-TOF) MS, which delivers high throughput but has limited versatility and dynamic range, and suffers from lower conﬁdence in peptide identiﬁcations due to a lack of MS/MS-based sequencing. The downside of this is that the samples cannot be too complex and should contain, or be dominated by, only one or few proteins, such as type I collagen (Collagen alpha-1(I) chain and Collagen alpha-2(I) chain) in bone. A popular PMF-based workﬂow for collagen-based species identiﬁcation is “Zooarchaeology by Mass Spectrometry” (ZooMS)18,19. Although applicable to many groups of vertebrates, the maximum obtainable species resolution remains limited by the relatively small number of analyzed peptides from just two evolutionary conserved proteins. Consequently, there are many species that are indistinguishable by ZooMS, in particular within closely related genera. Unsurprisingly, their bones often share a highly similar morphology and therefore frequently require molecular species identiﬁcation for unambiguous classiﬁcation. With “Species by Proteome INvestigation” (SPIN), we present an LC-MS/MS-based species identiﬁcation workﬂow that combines sequencing-based reliability from conventional LC-MS/MS with the speed and cost-effectiveness from ZooMS, ultimately overcoming the current limits of both these two approaches. To facilitate the analysis of large cohorts with hundreds to thousands of samples, major bottlenecks in sample preparation, data acquisition, and conclusive species assignment were removed. While the SPIN concept can in principle be applied to many protein-containing materials, we selected bone remains for its ﬁrst implementation, due to their high relevance in archaeology and forensics. Preparing hundreds of samples per day requires a scalable, reproducible, and automated procedure with few manual steps resulting in peptides without contaminants that could interfere with MS analysis. To achieve this, we took advantage of the protein aggregation capture (PAC) cleanup and digestion method using magnetic beads20,21 that have previously been applied to M 2 bone proteins22,23. PAC can be transferred to a robotics platform24 and its robustness allowed us to use a combined bonedemineralization/protein-extraction solution and thereby omit buffer exchange, a time-consuming step during which signiﬁcant protein loss can occur. Although this protocol is not necessarily faster than standard in-solution digestion25 for a single sample, the automation and protein-level cleanup facilitate the scalability needed for routine high-throughput analysis. The cost of high-resolution tandem mass spectrometers poses a major barrier to the application of conventional LC-MS/MS for large-scale species identiﬁcation. Instead of compromising on technology, we aimed at reducing the cost per sample by shortening the acquisition time by an order of magnitude compared to previous LC-MS/MS-based species identiﬁcation strategies. This became possible due to two recent developments, liquid chromatography (LC) systems with signiﬁcantly shorter overhead time between analyses26 and software for the conﬁdent peptide identiﬁcation based on data-independent acquisition (DIA)27,28. In addition to higher throughput and lower analysis costs, shorter gradients are beneﬁcial for sensitivity and the Evosep LC makes tedious preparation steps like peptide elution after cleanup, solvent evaporation, and MALDI target plate spotting obsolete. As an alternative to using the latest LC-MS/MS equipment, the data acquisition methods required for SPIN should theoretically be transferable to previous generations of Orbitrap MS instruments29,30 and lower sensitivity microﬂow LC systems31 if compensated with higher peptide loads. The interpretation of the peptide identiﬁcation data to determine the most probable species is an effort that is often neglected when comparing different technologies. Particularly in large projects with optimized and parallelized laboratory processes, downstream data interpretation is often the major bottleneck if done manually. Nevertheless, this has been the case for virtually all species identiﬁcation studies based on proteins15,32, even despite the development of a software tool for species analysis by PMF of collagen33 that has unfortunately not been made publicly available. In the SPIN workﬂow, peptides are identiﬁed using broadly available and conﬁdence-controlled peptide search software, such as Spectronaut27 and Maxquant34, followed by data interpretation by a newly developed automated species inference algorithm that rationally compares the MS results with a species database by sequence alignment. In contrast to the previously described tool for ZooMS33, our algorithm operates without untraceable machine learning steps and does not strictly require experimental spectral libraries. We optimized our methods with reference bones from nine domesticated animals, human, and three other great ape species, validated the reproducibility, and benchmarked comparability to morphology-based identiﬁcation with more than 60 partially degraded bones from the Danish Early Iron-Age. Finally, we stress-tested the SPIN approach with a set of over 200 Middle and Upper Palaeolithic bone fragments from three archaeological sites in Portugal dating to 30–60,000 BP. We demonstrate that mammalian species families and taxa can be correctly assigned in most cases, while misassignments due to low signal intensity or gaps in the species sequence database can be avoided through thresholds based on two different quality control conﬁdence scores. Comparative analysis of the same peptides from the references and a subset of the archaeological bones by PMF-based ZooMS analysis conﬁrmed superior separation of closely related taxa and higher sensitivity with SPIN. Results Streamlined and automated sample preparation. To increase sample processing throughput, we developed a sample preparation NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x a DDA - 100 spd 80 % EtOH wash 5 % HCl 0.1 % NP-40 Desalting Database search 100 % ACN wash 70 % ACN wash Gelatinization, reduction & alkylation 11 min gradient DDA MS2 OR Library search 10 mM TCEP 20 mM NEM Extract + beads + 60 % ACN c Identified precursors 4000 Specificity Nterm Cterm tryptic 3000 2000 1000 ap Str G AS P In -s ol ut io n an ua SP l IN ro bo t FA SP SP IN m Library DIA Direct DIA DDA 2000 1500 1000 500 0 0 Library-free search DIA XIC DIA - 200 spd d Identified precursors b 5.6 min gradient Lys-C & Trypsin digestion Automated cleanup & digestion 0 2 4 6 8 10 Cumulative identified sites Demineralization & extraction 5 mg Library DIA DDA Direct DIA 6000 4000 2000 0 0 Retention time [min] 5 10 15 20 Gene rank Fig. 1 High-throughput bone proteome analysis workﬂow and benchmark. a Sample preparation and data acquisition. Proteins were obtained from bone chips or powder by simultaneous demineralization and extraction. Cleanup by Protein Aggregation Capture and digestion can be automated by a magnetic bead-handling robot. Peptides were rapidly separated and analyzed by tandem mass spectrometry with a throughput of 100 samples per day (spd) in datadependent or 200 spd in data-independent acquisition mode. b Performance of the “Species by Proteome INvestigation” (SPIN) sample preparation protocol executed manually (SPINmanual) or by robot (SPINrobot), compared to other common sample preparation techniques. Methods were compared by analyzing Pleistocene mammoth bone powder from the same batch in n = 3 technical replicate experiments with a 60 spd gradient and data-dependent acquisition. Bars indicate mean precursor identiﬁcations obtained with each method separated by enzymatic cleavage speciﬁcity shown by color with sand color for tryptic, dark blue for semi-tryptic with non-tryptic C-terminus and light blue for semi-tryptic with non-tryptic N-terminus. Error bars are centered at the total number of precursor identiﬁcations and indicate standard deviation. c Comparison of precursor identiﬁcations accumulated over retention time between the fast 100 spd DDA method analyzed by conventional database searching (sand color) and the rapid 200 spd DIA method analyzed with a library-based (dark blue) vs. library-free (light blue) approach. Peptides were generated by SPIN using a bovine bone and analyzed by LC-MS/MS with the two acquisition methods. d Gene-wise cumulative absolute amino acid coverage based on the precursors identiﬁed in c shown over the top 20 genes ranked by the number of precursors. Color indicates acquisition and peptide identiﬁcation method with sand color for DDA, dark blue for library-based DIA, and light blue for library-free DirectDIA. protocol consisting of only a few manual steps for easy scale-up (Fig. 1a). A combination of hydrochloric acid and the non-ionic detergent NP-40 in the extraction buffer facilitated demineralization and protein extraction in the same mixture, which otherwise usually requires intermediate centrifugation and buffer exchange35. Compared to other tested combinations, HCl and NP-40 were the only candidates that enabled efﬁcient protein cleanup without precipitation (Table S1) and high peptide identiﬁcation rates in the subsequent LC-MS/MS analysis (Fig. S1). We optimized the demineralization (Fig. S2) and extraction times (Fig. S3) and tested the effect of reduction and alkylation during the extraction step (Fig. S4). Contaminants, detergents, and minerals were removed using a modiﬁed protein aggregation capture (PAC) protocol with optimized solvent amounts (Fig. S5), which was automated on a magnetic bead-handling Kingﬁsher robot starting with only 5 mg of bone material. When parallelized with fast analysis, the automated sample preparation enables a single laboratory operator to continuously process and analyze 200 bone samples per day (Fig. S17). We benchmarked our new sample preparation workﬂow against the commonly used “insolution” digestion protocol25, “ﬁlter-aided sample preparation” (FASP)36, “gel-aided sample preparation” (GASP)37, and the more recent “S-trap”38 using 5 mg per replicate of the same Pleistocene Mammoth bone sample39 for all methods (Fig. 1b). The number of identiﬁed peptide precursors by LC-MS/MS was the lowest for FASP and GASP, which was probably due to losses in the ﬁlter or gel. “In-solution” and “S-trap” performed about two-fold better, although “in-solution” had a relatively poor digestion efﬁciency leading to more missed tryptic cleavages (Fig. S6). The protocol developed for SPIN produced signiﬁcantly more peptide precursor identiﬁcations by almost a factor of two compared to S-trap and in-solution. When comparing robotic processing to manual sample preparation following the SPIN protocol, we observed similar numbers of peptide identiﬁcations, but better reproducibility with the robot. Importantly, from a practical standpoint, the sample preparation procedure in SPIN requires much less handson time than FASP and GASP, which cannot easily be scaled to 96-well format. Moreover, although the “in-solution” digestion protocol is relatively fast, the lack of protein cleanup makes it more susceptible to contamination problems, which complicates scale-up. Finally, from an economic standpoint, “S-trap” can be costly due to the requirement for proprietary ﬁlter devices, whereas the modiﬁed PAC workﬂow in SPIN can be performed with any type of magnetic beads and a simple magnet rack instead of a Kingﬁsher robot. Rapid peptide identiﬁcation. To maximize scalability and throughput, SPIN uses very short online LC gradients to speed up the chromatographic separation of the peptide mixtures in line with Orbitrap tandem MS. We achieved sufﬁcient proteome coverage for species discrimination (Fig. 1c) at a throughput of 100 samples per day with data-dependent acquisition (DDA, Fig. S7) and 200 samples per day with data-independent acquisition (DIA, Fig. S8, Table S2)26. Consequently, data acquisition in SPIN is NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x roughly 20 times faster than common practice in palaeoproteomics (<10 samples per day)40 and 4 times faster than high-throughput plasma proteomics methods (60 samples per day)12. In a modern bovine bone sample, fast-scanning DDA identiﬁed about 1200 peptide precursors in 11 min, while DIA reached the same number of identiﬁcations in 5.6 min when analyzed without a spectral library by directDIA (Fig. 1c). Almost twice as many peptide precursors were identiﬁed by analyzing the same ﬁle using a dedicated spectral library that was generated once for each species by DDA analysis of ofﬂine fractionated peptide mixtures41 (Figs. S9 and S10). However, as spectral library-based DIA yielded more overlapping peptide precursors, it did not result in a proportional gain in absolute sequence coverage (Fig. 1d). Across the 40 modern reference samples, the DDA method had a median coverage of 3678 amino acids and thereby outperformed the spectral library-free directDIA approach with 3226 amino acids (Fig. S11). The highest median coverage of 4480 amino acids was achieved with librarybased DIA. As expected, sequence coverage was highest for the two most abundant bone proteins, COL1A1 and COL1A2. Since there was almost no additional coverage gained by including more protein sequences for the database search than the top 20 most abundant protein-coding genes, we decided to focus the SPIN analysis on only those 20 genes and thereby reduce noise and simplify the protein sequence database assembly and alignment (Supplementary Data 1). Automated species inference strategy. The completeness and quality of protein sequence databases are vastly different between taxa (Fig. 2b). Missing genes, gaps, and stretches of incorrect amino acid sequences can introduce a bias towards wellannotated species when the proteomics-based taxonomic assignment is performed based on simple metrics like the number of identiﬁed peptides or protein groups. Therefore, we built a species inference algorithm encompassing site-based species-tospecies comparisons (Fig. 2a). It is based on a gene-wise multiple sequence alignment (MSA) for all protein sequences across all available mammalian species for each of the 20 most highly expressed protein-coding genes in bone (Supplementary Data 1). Further manual reﬁnement was needed to remove faulty sequence inserts or obvious prediction errors like frameshifts and the SPIN sequence inference algorithm was conﬁgured to automatically remove species (21 out of 177 in the current database) that lack more than 5 out of the 20 genes, because species with too many missing genes could not be reliably assigned. For all species identiﬁed in this study, sequences for all 20 genes were available for all taxa, except the white-tailed deer (19 genes), the European bison (15 genes), and the aurochs (15 genes). To conﬁrm that the sequence information of the 20 genes is sufﬁcient for resolving the taxonomy of all the 156 species in the database, we built a phylogenetic tree based on the reﬁned sequences of the 20 most highly expressed bone proteins. Reassuringly, we observed that its topology matched the currently established phylogeny based on morphological and genomic data (Figs. 2b and S16). The protein sequence alignment was also the basis for creating a “site-speciﬁc difference matrix” by performing every possible species-to-species comparison. For all pairwise comparisons, this matrix only considers sites that are known for both species and have different amino acids. The aligned database was also used for mapping the LC-MS/MS-based peptide identiﬁcations to the correct genes and locations (“Mapping to alignment”, Fig. 2a). Once mapped, the peptide data could be converted to site-level by splitting peptide identiﬁcations into amino acid identiﬁcations. In this format, the data can be used to score every species-to-species comparison in the site difference matrix. To best integrate the multiple MS metrics in the scoring scheme, peptide intensity, precursor count, 4 peptide count, and maximum score, were scaled and combined into a normalized joined score (J-Score). The J-Score does not necessarily reﬂect the actual amino acid probability for each site but assigns a higher weight to amino acids with better underlying data. The summed J-Scores were used for determining the winner of every species-to-species comparison in the site-speciﬁc difference matrix (Fig. 2c). The algorithm allows for a single or multiple indistinguishable species to win most comparisons depending on the sequence coverage and the number of closely related species in the protein sequence database. We added an optional “ﬁne grouping” step using a manually curated list of marker peptides to keep the phylogenetic placement between closely related species consistent, even at low sequence coverage. To this point, a species was assigned to every sample, even including the blanks. The algorithm includes two mechanisms for controlling and minimizing the false-discovery rate (FDR), one to identify samples with too low signal, like blanks, and a second one to control for species that are not yet in the database. Samples with low peptide intensity were removed through quantitative comparison to the abundance of autolysis-derived protease peptides, which we treated like a spike-in standard. The threshold was automatically calibrated based on relative protease intensity in laboratory blanks. The second control mechanism was aimed at the identiﬁcation of species with insufﬁcient sequence coverage, as this would lead to unreliable classiﬁcation. Therefore, we extended the database with an equally sized set of decoy species (randomly generated chimera species). The ﬁnal results were then ranked by sequence coverage and a cutoff was applied to keep the number of decoy identiﬁcations below 1%. The comparison of site coverage and relative protease abundance demonstrated that most of the blanks along with the empty samples were successfully removed using the two thresholds and that the coverage decreases with sample age (Fig. 2d). To ensure global FDR control and for easy comparability between samples, all samples analyzed in the entire study were processed in the species inference pipeline, together. The species results were collected in a global result table (Supplementary Data 2) and the proteomics results converted to site-level were used to create a consensus of the identiﬁed sequences for each sample (Supplementary Data 3). Validating SPIN with bones from known species. We optimized and assessed the performance of the different data acquisition and interpretation strategies using a set of 49 known reference bones from 13 species (Fig. 3a). We compared the sequence coverage (Fig. S11) and shared and unique peptide identiﬁcations (Fig. S12) between the three data types library-based DIA, directDIA, and DDA. All samples were placed in the correct genus using library-based DIA, whereas spectral library-free directDIA could not differentiate human from chimpanzee, and DDA was not able to exclude goats for one of the eight sheep samples. Interestingly, all three methods performed equally well, when it came to the placement of taxa within the families. Within bovines, the domestic cattle (Bos taurus) could be distinguished from European bison (Bison bonasus) but not from the aurochs (Bos primigenius). The European bison itself could not be discriminated against from American bison (Bison bison) and yak (Bos mutus) and in one case, from zebu (Bos taurus indicus) (Fig. 3b). The closely related goat (Capra hircus) and sheep (Ovis aries), were correctly identiﬁed in all DIA analyses and 10 out of 11 samples in DDA analysis. Within equines, domestic horse (Equus ferus caballus) was successfully discriminated from donkey (Equus africanus asinus), but not the Mongolian wild horse (Equus ferus przewalskii) (Fig. 3c). Fine-grouping was not actually required to distinguish goat and sheep or horse and donkey, but it made the caballus/przewalskii classiﬁcation more uniform. NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x a 20 Genes Reannotation & alignment Selected peptides Site difference matrix 23H Uniprot COL1A1 NCBI Peptides identified by LC-MS/MS F2 AHSG Equidae Equidae 7A 124A 23Q 23N Decoy GASGPAGVR 7A 65R GATGPAGVR 7P 7P 65K 124V Mapping to alignment E. caballus E. przewalskii Equidae Scoring variable sites E. asinus Fine grouping PPGTPGR VGPAGT # Precursors Intensity Score PGHVGPP # Precursors Intensity Score # wins 1 2 3 4 5 6 7 8 9 11 12 13 7 P 2 7e6 85 0.84 7 A 1 2e5 50 0.16 Bovidae Chiroptera Eulipotyphla Leporidae Muridae Rodentia Strepsirrhini Platyrrhini Cercopithecidae Equus przewalskii Pan troglodytes Homo sapiens Equus caballus Gorilla gorilla Ovis aries 8 10 10 11 12 7 6 5 5 0 2 3 1 Sus scrofa 0 15 8 98 137 128 137 151 150 149 149 Equus asinus 0 Capra hircus # wins Carnivora Equus Camelidae Cetadea Pongo abelii Bos primigenius 1 Bos taurus Bison bonasus Bison bonasus Bos primigenius Feliformia d Comparison Lower score Higher score Same sites 0 16 9 99 138 129 138 151 150 149 149 Bos taurus 1 0 18 12 115 153 145 155 179 176 175 178 Capra hircus 36 36 42 4 129 169 161 171 198 196 194 197 Ovis aries 39 39 44 12 134 173 165 176 202 200 198 200 Sus scrofa 32 32 49 40 37 Equus asinus 36 36 55 47 Equus przewalskii 31 31 50 38 Equus caballus 35 35 54 46 Gorilla gorilla 29 28 51 47 103 98 106 143 140 137 142 37 65 1 43 71 2 43 82 Pan troglodytes 29 28 50 47 43 81 Homo sapiens 29 28 51 47 43 80 Pongo abelii 31 30 51 47 42 82 Hominoidae 5 43 71 5 118 114 114 114 0 104 101 101 102 0 116 113 113 113 89 83 89 6 8 7 83 8 8 9 89 85 90 9 4 14 2 14 4 86 82 87 16 13 9 12 Blank 40 0 6000 Sample set Decoy Metatheria Afrotheria AHSG ALB BGN CHAD COL11A1 COL11A2 COL12A1 COL1A1 COL1A2 COL22A1 COL2A1 COL3A1 COL4A4 COL5A1 COL5A2 COL5A3 F2 LUM POSTN SERPINF1 0.03 Database Uniprot UniRef Manual NCBI Both 1 2 c Distance 5 0 Site AA Prec. Intensity Score J-Score b Peptides 0 PGHVGPP VGPAGT PPGTPGR Identified sites # Precursors Intensity Score Blank Portugal Reference Denmark 4000 2000 0 100% 10% 1% 0.1% 0 100 Relative protease intensity Fig. 2 Data analysis pipeline for species identiﬁcation. a The aligned protein database, species difference matrix, and manually curated species marker peptides (top row) are used at multiple stages of the data processing pipeline (bottom row): Peptide identiﬁcations were converted to site-level, scored by joining intensity, score, and number of precursors (J-Score) and used to identify the winner of all possible species-to-species comparisons. Fine resolution of closely related species can be further improved by using manually selected species marker peptides. b The mammalian species database comprising 20 genes across 177 species (156 species with >14 genes) was generated by merging Uniprot and NCBI with manually curated and reannotated protein sequences. The phylogenetic tree was generated from the protein database using Fast Tree and FigTree. Color indicates database source with dark blue for sequences with annotated gene name from Uniprot, light blue for sequences from Uniprot with gene name added from UniRef, sand color for sequences from NCBI, pink for sequences available in both databases, and black for manually added sequences. c Example species competition matrix for the reference sample “Ovis_07” only showing the 13 reference species. White numbers indicate the summed joint scores (J-Score) of the speciesdiscriminating sites. Gray cells indicate species pairs, where no species-discriminating sites have been identiﬁed in the sample. Pink indicates that the left species wins and blue indicates that the top species wins the comparison. The phylogenetic tree is a subset of the tree in panel b. The complete species competition matrices comprise all 156 target and 156 decoy species, i.e., 24,336 comparisons. d Absolute sequence coverage and relative protease intensity in reversed log10-scale for all samples from the three datasets in this study. The vertical site coverage cutoff is used to control the false-discovery rate at 1%. The horizontal protease intensity cutoff excludes samples with low signal (lower than 75% of the blank runs). Independent analysis of both parameters is displayed as histograms. Sample sets are indicated by color with black dots for blank runs, sand color for the Portuguese sample set, light blue for the samples from Denmark, and pink for the reference samples. Besides common domesticated animals and their wild relatives, we explored the potential to detect and correctly identify great apes. While all three peptide identiﬁcation methods could successfully classify human (Homo sapiens), orangutan (Pongo abelii), and gorilla (Gorilla gorilla), only DDA and library-based DIA analyses could correctly separate chimpanzee (Pan troglodytes) from Homo sapiens (Fig. 3d). It is noteworthy that both DDA and library-based DIA analysis of two chimpanzee bones assigned one of them to chimpanzee and the other to bonobo (Pan paniscus). Unfortunately, the low quality of the available bonobo protein database prevented closer investigation. These results conﬁrmed that the SPIN workﬂow can be used to classify great apes at the genus level. As a proof-of-concept, we investigated the potential to detect species hybrids with SPIN by analyzing two samples from mules. Focusing on two peptides that are distinct between horse and donkey, one of the two mule samples showed high intensity for both sequence variants, as expected, but the second mule showed peptide intensities typical for a donkey (Fig. 3c). We concluded that SPIN is technically capable of hybrid detection, but an assessment of its reliability would require a larger study size. Performance and reproducibility benchmark. To benchmark the SPIN analysis strategy against standard-practice bioarchaeological species determination based on bone morphology, we analyzed a set of 63 bone fragments related to human activities at the “Salpetermosen Syd 10” site (MNS50010, ZMK5/2013) in Denmark, which dates to the early Pre-Roman Iron-Age (380 BC–540 AD, Fig. 4a)42. Some of the bones showed strong signs of decay due to the age and the wet anoxic conditions in the Salpetermosen bog (Fig. 4b). Each specimen was morphologically NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x a Species match Correct species identification Family match Not matching DDA Dir Lib DIA DIA Identified sites (Lib DIA) 0 2000 4000 Relative protease intensity (Lib DIA) 0% b Bos primigenius Library-based DIA Bos taurus Identify best matching species 5% 20% 100% Blank Determine species family Fine grouping Bos taurus* Bovine phylogeny Bison bonasus Bos primigenius Bos primigenius* Bos taurus Bos indicus Bison bonasus* Bos mutus Bison bonasus Bison bison Bubalus bubalis Best match 0.001 c Equus phylogeny Equus asinus Capra hircus Equus caballus 0.0003 Equus caballus* Equus przewalskii Equus caballus Best match Equus przewalskii* Fine grouping Equus asinus* log10(Intensity) GATGPAGVR Ovis aries Fine grouping P36 4 2 0 0 2 4 6 log10(Intensity) GASGPAGVR Fine grouping Species Equus asinus Equus asinus X Equus caballus Equus caballus Equus przewalskii Homo sapiens log10(Intensity) GPTGEPGKPGDK Best match Sus scrofa P35 6 Best match Fine grouping Fine grouping Collagen alpha-2(I) chain 10618 - 10626 Equus asinus Equus przewalskii Best match Collagen alpha-2(I) chain 492 - 508 6 P35 P36 4 2 0 0 2 4 6 log10(Intensity) GPSGEPGKPGDK Pan troglodytes* Gorilla gorilla d Hominoidae phylogeny Pan troglodytes Homo sapiens Pongo abelii Pan troglodytes Gorilla gorilla Homo sapiens Pan paniscus Gorilla gorilla Pongo abelii Pongo abelii Blank Nomascus leucogenys 0.002 Hylobates moloch DDA Dir DIA Lib DIA Any Fig. 3 Reference species analysis. a Species identiﬁcation results based on DDA, directDIA, and library-based DIA analysis. Dark blue boxes indicate correct identiﬁcation of a single or multiple indistinguishable (marked by asterisk) species. Light blue indicates species that could not be separated from their closest relatives. Blanks that were below the relative protease intensity threshold are shown in pink. The “identiﬁed sites” bar chart shows the absolute amino acid coverage in blue for sites matching the true species and pink for non-matching sites. The relative protease intensity was calculated by dividing the intensity of protease peptides by total intensity and plotted in log-scale. b Bovine species identiﬁcations obtained by library-based DIA analysis. Phylogeny was based on the protein database. Correctly identiﬁed single or indistinguishable species are highlighted in blue. Inconsistent identiﬁcations are marked in pink. Best Matching species are on the left and the reﬁned “ﬁne-grouping” on the right side. c Same display as in b for equine species analyzed by library-based DIA data. The additional plots on the right side show the log10 intensity of two species-discriminating peptides for the horse isoform on the xaxis and the donkey isoform on the y-axis. Missing quantiﬁcations are shown as zero log10 intensity. Donkey samples are marked in sand color, horse in dark blue, Przewalskii horse in light blue, and hybrids pink. d Species identiﬁcations after ﬁne-grouping for great apes comparing the three different peptide identiﬁcation strategies. Correctly identiﬁed genus is highlighted in blue. Broader matches within the family are marked in pink. Differences between the three identiﬁcation strategies were only observed for the genus Pan. analyzed by an experienced zooarchaeologist, and the SPIN analysis was conducted in technical duplicates starting on bone powder level. Variations in the input amount, peptide recoveries, and LC-MS/MS performance resulted in one experiment with higher and one with lower average MS intensity. The study was blinded by keeping the morphological species identiﬁcation undisclosed until the SPIN analysis was ﬁnalized. SPIN analysis using the 5 min DIA method and ﬁne-grouping resulted in 49 exact and 3 approximate species identiﬁcations in the higher intensity experiment. In case of the lower-intensity replica experiment, we obtained 44 exact and 2 false species 6 identiﬁcations (Fig. 4c). The remaining 11 (high-intensity replicate) and 17 (low-intensity replicate) samples were excluded by the algorithm due to high relative protease intensity. The laboratory blanks were also correctly excluded. The comparison of replicates showed perfect reproducibility between duplicates with site coverage >3000 amino acids. We observed four cases of missing identiﬁcations in the high-intensity replicate that were identiﬁed in the low-intensity experiment, which we attributed to variability in the bone chips, input amounts, and peptide recovery. Importantly, there were no contradicting species identiﬁcations between the two replicates. For 94 out of 98 NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x d Rel. protease intensity Identified sites signal too low Cervidae Capra hircus Växjö Ovis aries Sweden Aalborg Bos/Bison Boras Sus scrofa Gothenburg Hjørring Equus caballus c a Karlskrona Aarhus Helsingborg Denmark Hillerød Salpetermosen Copenhagen Species by SPIN High 5 / Halmstad lab blank signal too low Sus Equus Odocoileus Capra Capra/Ovis Ovis Bison/Bos/Ovis Bison/Bos Bos 3 11 14 3 Low 5 1 injection amount Malmö 1 Species by morphology Odense 14 Flensburg 2 7 anima yb detaerC tcejorP nuoN eht morf Identified sites Kiel Germany Rostock Lübeck e 5000 4000 b 2000 1000 Rel. Protease Intensity 0% 25% 50% 50 # True identifications 3000 40 30 20 Injection Acquisition method amount 10 75% Laboratory blank 0 10 High Low DDA DirDIA LibDIA 20 30 # False/missing identifications Fig. 4 Species identiﬁcation of bones from the Scandinavian Iron-Age. a Location of the archaeological site “Salpetermosen Syd 10” on Zealand in Denmark in the Hillerød municipality 30 km north of Copenhagen. Map drawn in Mapbox Studio using a custom style. b Cross section of an in situ wetland bone deposit. Scale bar is 50 cm. Four bones were radiocarbon dated between 1720 and 1570 BP. Picture provided by the Museum of North Zealand. c Species identiﬁcation results by SPIN (5 min method, library-based DIA) and by morphological assessment for 63 samples from the Salpetermosen site measured in technical duplicates and 3 blanks. Rows represent individual samples and have been ordered ﬁrst by morphological species assignment and then by decreasing mean site coverage. The upper left and lower right wedge of each cell represent results measured in two separate experiments, one with higher (upper left, dark blue) and the other with lower (lower right, light blue) MS signal intensity. The ﬁrst seven columns indicate SPIN species by blue wedges and morphological species possibilities by pink boxes. Bovine species assignments are combined in column two. The eighth and ninth columns are heatmaps showing the absolute number of covered amino acids and relative protease intensity, respectively. d Summary of SPIN species identiﬁcations from panel c in the replicate with high MS intensity. Bovine identiﬁcations are separated into cow (Bos) and broader bovine identiﬁcations (Bos/Bison). Striped colors indicate samples with insufﬁcient sequence coverage to distinguish closely related taxa. Samples with insufﬁcient sequence coverage for conﬁdent species identiﬁcation are marked as “signal too low” and correctly excluded blanks are marked in black. e Pseudo receiver operating characteristic (ROC) curves for comparing the sensitivity and success rate of three different data acquisition and analysis strategies. Results of each dataset were sorted by decreasing number of identiﬁed sites. The y-axis shows the cumulative number of correct species identiﬁcations in agreement with the morphology. The x-axis shows the cumulative number of false or missing identiﬁcations below the relative protease intensity threshold. Color indicates data acquisition and analysis mode with pink for DDA, dark blue for library-based DIA, and sand color for library-free DirectDIA. Experiments with lower MS intensity are shown by dashed and high intensity by solid lines. identiﬁed samples from both replicates, the SPIN analysis was in agreement with the morphological analysis (Fig. 4c). Two of the inconsistencies were sheep identiﬁcations for a sample that had the appearance of a goat bone, while the other two had low peptide intensity and were classiﬁed as cattle and sheep by SPIN but morphologically closer to pig and cattle, respectively. Sheep and goat, which often cannot be discriminated morphologically, were unambiguously identiﬁed by SPIN in 39 cases and could not be distinguished in two. For cattle, only Bos is plausible at this time and location43 and the SPIN and morphological identiﬁcations showed good agreement. Nevertheless, we were interested in the performance of SPIN for distinguishing Bos and Bison in degraded material and therefore looked at the overall species distribution in the high-intensity replica experiment (Fig. 4d). Discriminating between Bos and Bison was only possible for 5 out of 14 bovine bones and became signiﬁcantly more challenging with lower sequence coverage (Supplementary Data 2). Cattle and horses could not be distinguished morphologically in 10 cases, 7 of which could be resolved by SPIN. All three laboratory blanks were correctly excluded (“signal too low”) by the relative protease intensity threshold. We compared the performance of the three different types of peptide identiﬁcations by library DIA, directDIA, and DDA, which performed very similarly to the reference samples. Compared to the well-preserved reference samples, the differences became much more apparent in the more degraded Salpetermosen sample set. The pseudo ROC-curve analysis shows that the DIA-based methods outcompeted DDA, especially in the low amount replicate, indicating higher sensitivity in DIA-based measurements (Fig. 4e). Between the two DIA methods, librarybased DIA consistently produced more true species identiﬁcations than directDIA. Species identiﬁcation of Middle and Upper Palaeolithic archaeological bones. To challenge the SPIN workﬂow with highly degraded samples and demonstrate its scalability, we analyzed a set of 213 archaeological bone fragments from three Portuguese archaeological sites with early human occupation (Fig. 5a). To this end, we translated the output of the species inference algorithm to reﬂect the most likely ancestors that were present at the location and time (Table S3). Analogous to all other samples in this study, we analyzed the Portuguese bones with DDA and DIA, which took 52 h and 26 h of MS acquisition time, respectively. We used the library-based DIA results as the basis for species identiﬁcation because of its higher resolution, as demonstrated with the Salpetermosen dataset. However, to allow the identiﬁcation of species for which no spectral library is NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x a b Vale Boi Lapa do Picareiro Gruta da Companheira Valladolid Braga 11 1 4 4 3 1 Porto 2 7 24 2 37 Averio Leiria 1 Madrid Picareiro Cave 3 14 1 1 11 Toledo Cáceres Portugal 49 Spain Córdoba Huelva Vale Boi 4 3 Ciudad Real Mérida Lisbon Jaén 6 Seville Granada 31 11 3 13 6 1 Level 18 1 8 Málaga Galeria 5 GG 7 2 7 3 1 1 10 2 4 4 2 1 2 1 6 1 12 JJ Companheira cave 8 Level 32 11 24 3 11 3 8 8 6 18 Jerez Gibraltar Tangier Rabbit Wild ass Deer Ibex Aurochs/Bison below FDR Wild boar Horse Deer/Chamois Ibex/Chamois Aurochs/Bison/Buffalo Aurochs/Bison/Buffalo/ Horse/Chamois weak signal Morocco Chamois POSTN SERPINF1 F2 LUM COL5A3 COL5A2 COL5A1 COL4A4 COL3A1 COL2A1 COL22A1 COL1A2 COL1A1 30 COL12A1 10 COL11A2 0 1 3 COL11A1 Avg. fold-coverage AHSG ALB BGN CHAD c Reference Salpetermosen Vale Boi Lapa do Picareiro Gruta da Companheira 0 10000 20000 Global Site Fig. 5 Large-scale species identiﬁcation at three sites with early human occupation on the Iberian peninsula. a Locations of the three sites on a current map of Portugal. Map drawn in Mapbox Studio using a custom style. b Species identiﬁed in 84 samples from levels 6–7 (29–31,500 BP) of Vale Boi, in 95 samples from layers GG to JJ (38–45,000 BP) of Lapa do Picareiro, and 34 samples from chambers 1 and 2 (estimated 50–60,000 BP) of Gruta da Companheira. Overall species distribution is displayed by the pie chart, whereas bar charts show species ratios for separate compartments of the assemblage. Colors are used to distinguish species, as indicated in the legend. c Average fold-coverage of the 20 genes used for SPIN comparing the three Portuguese sites with the modern reference and iron-age material. Coverage was calculated by summing the number of precursors at each site in the global aligned database and is indicated by color using white for no coverage, blue for medium coverage, and pink for high coverage. The values represent the average fold-coverage in 10 amino acid bins for each dataset. currently available, such as rodents, we replaced the result with the taxonomy identiﬁed by directDIA, whenever directDIA detected such a species. As both results were based on the same raw data, the relative protease threshold remained unaffected, but the sequence coverage was lower with directDIA. In addition, compared to the reference and Salpetermosen samples, these Southern European Middle and Upper Palaeolithic samples suffered from reduced protein sequence coverage across the proteome assembly (Figs. 5c, S13) and an increase in protein deamidation (Fig. S14). For Lapa do Picareiro, 94 out of 95 samples could be conﬁdently assigned with a species. For these specimens, which were dated approximately between 38,000–41,000 BP (layers GG-II) and 45,000 BP (layer JJ)44,45, species composition was relatively similar for both layers (Fig. 5b). Of particular interest was the identiﬁcation of one specimen of the now-extinct European wild ass (E. hemionus hydruntinus), alongside 37 caballine horses. Most of the ibex and chamois bones, which are not easy to distinguish morphologically, could be uniquely assigned to one of the two (18 out of 20). Finally, bovines and wild boar were exclusively identiﬁed in the older “JJ” layer. For Vale Boi, dated between 31,500 and 29,000 BP46,47, 60 out of 84 samples could be conﬁdently identiﬁed. The remaining 24 samples failed to meet the abundance-based quality threshold and were therefore not assigned a species identity (Fig. 5b). The 8 vast majority of the identiﬁed bones from layers 6 and 7 and all bones in layer 8 were classiﬁed as deer, which is in agreement with the previously reported numbers for large mammals46. Equids, including one E. hydruntinus, were only identiﬁed in layer 6. With directDIA, four smaller bone fragments from layers 6 and 7 could be classiﬁed as rabbits, which were highly abundant at Vale Boi48. Finally, for Gruta da Companheira 12 out of 34 samples could be assigned a conﬁdent species identiﬁcation. Expected to date around 50,000–60,000 BP, 14 samples failed to meet the FDR threshold while eight samples were excluded due to failing to meet the abundance-based quality threshold. Amongst the conﬁdently identiﬁed species at Gruta da Companheira were bovines, deer, and rabbits, whereas the only ovicaprine sample could not be uniquely assigned to either ibex or chamois. Although below the relative protease cutoff, the two most interesting bones, which were both found in Galeria 2, matched best to great apes. The sample with the highest sequence coverage (1817 aa) was classiﬁed as human or chimpanzee (Fig. S15), whereas the sample with lower coverage (235 aa) matched equally well with all great apes. Here, the SPIN results can be used as a starting point for future in-depth protein and ancient DNA analyses to ﬁnd out whether these are actually human remains and to eventually deﬁne their genetic proﬁle40. NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x a Reference samples SPIN b True Species c Salpetermosen PMF Portugal Cervidae/Saiga SPIN PMF SPIN PMF Fig. 6 Comparison of SPIN and PMF. a Alluvial diagram showing species identiﬁcation of 46 reference bone samples and 3 laboratory blanks. Small bars on the x-axis indicate individual samples. Color and position in the middle column represent the true species, whereas the left and right column report the species identiﬁcation by SPIN and PMF, respectively. Bars with color gradients indicate changing species assignments. b Alluvial diagram showing species identiﬁcation of 20 representative samples from the Danish Salpetermosen site (Fig. 5). Left column indicates the species identiﬁcation by SPIN, whereas the right column indicates the species identiﬁed by PMF. c Alluvial diagram showing species identiﬁcation of 21 representative samples from the three Portuguese sites (Fig. 5). Left column indicates the species identiﬁcation by SPIN, whereas the right column indicates the species identiﬁed by PMF. Performance comparison against collagen type I PMF. To evaluate the SPIN method against the current method of choice for species identiﬁcation by mass spectrometry, we compared it directly with MALDI-TOF MS-based species analysis of collagen type I PMF, i.e., ZooMS analysis18,49. To ensure a fair comparison, we assembled a representative test set of peptides comprising all 46 samples from our reference set, 20 from Salpetermosen, and 21 from the Portuguese bone assemblages (total = 87 samples), as well as 3 extraction blanks. SPIN and ZooMS analysis were performed using the same amount of peptides per replicate that were generated following the protocol developed for SPIN (Methods, Fig. 1a). Amongst the known references, SPIN and ZooMS obtained the same level of taxonomic identity in 14 cases (30%), SPIN was more speciﬁc in 28 cases (60%), and ZooMS could not determine the species of 4 samples (9%), all of which were pig bones readily ARTICLE identiﬁed by SPIN (Fig. 6a). In case of the material from Salpetermosen (Fig. 6b), the same level of identiﬁcation was achieved for 10 samples (50%), more speciﬁc classiﬁcation was achieved with SPIN in 3 cases (15%) and with ZooMS in 1 case (5%). Samples were not identiﬁable by SPIN in 1 case (5%) and by ZooMS in 6 cases (30%). For the most degraded material from the Portuguese sites (Fig. 6c), species identiﬁcations were on the same level for 10 samples (48%), more speciﬁc with SPIN for 4 samples (19%), and exclusively identiﬁed by SPIN in 6 cases (29%). We found no case in any of the samples, where the ZooMS and SPIN identiﬁcations were mutually exclusive. With ZooMS, all three laboratory blanks were all marked as “unidentiﬁable”, whereas SPIN assigned one of them as Bos taurus (Fig. 6a). The majority of the ZooMS identiﬁcations (57 out of 87, 63.3%) provided a level of taxonomic speciﬁcity that cannot be improved further without adding more peptide markers beyond the nine we looked for. We therefore conclude that SPIN provides a level of taxonomic speciﬁcity unreachable by current ZooMS collagen PMF approaches, most likely due to the analysis of more divergent non-collagenous proteins, higher dynamic range achieved by chromatography, conﬁdence control for peptide identiﬁcations, and higher resolution of the used MS instrument used for SPIN. Discussion Here, we present a streamlined proteomics approach for genetic evidence-based species identiﬁcation that bridges the gap between high-throughput low-cost targeted methods, such as PMF, PCR, or ELISA, and more powerful low-throughput high-cost approaches like NGS or conventional proteomics analysis by LC-MS/MS. We demonstrate that the automated sample preparation workﬂow based on PAC produces high-quality peptide samples for bone proteome analysis by LC-MS/MS and allows for easy scale-up. Data acquisition, which used to be a bottleneck in terms of speed and costs, was drastically shortened to reach a throughput of up to 200 samples per day per MS instrument. This became possible with a gradient storage-based LC system and advanced software for DIA data interpretation, two emerging technologies that have previously been undescribed in bone proteome analysis. The data interpretation strategy used for SPIN facilitates an unbiased comparison of currently up to 156 species and provides high conﬁdence due to FDR control on both peptide and species identiﬁcation levels. We concluded that the highest conﬁdence and sensitivity could be achieved with the library-based DIA approach, based on the analysis of 46 references and 64 more degraded bones from Denmark. These results were validated internally with replication and externally with morphological species identiﬁcation. Furthermore, we demonstrated the high throughput of this method by analyzing a set of 213 Palaeolithic bones from Portugal in little more than a single day of MS time. SPIN performed well during the analysis of degraded samples and still had a reasonable success rate for 50,000–60,000-year-old bone material from a Mediterranean climate. In contrast, samples in a similar state of degradation require considerable effort for ancient DNA analysis. We expect, with an optimized sample preparation protocol for very stable specimens like teeth, SPIN will become applicable for palaeoproteomics analysis of million-year-old material40. By comparing SPIN with PMF analysis of type I collagen using the popular ZooMS protocol, we demonstrated that SPIN outperformed the species resolution of PMF in most cases. It needs to be noted that the species selection in the test set was not aimed at broad coverage of mammalian taxa but rather at the more challenging task of discriminating a few common species from their close relatives. Since we analyzed the same peptide extracts for fair comparability of the MS and data analysis performance, NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x the use of a PAC-based sample preparation method potentially impacted the PMF sensitivity. Nevertheless, many of the PMFbased species discriminations could not even be improved in an ideal scenario because they are inherently limited by the restriction to type I collagen. While a fair cost estimate for the two methods is hardly possible due to variable machine and labor costs, we believe that the shorter analysis time and automated data interpretation with SPIN will at least drive the costs of the two approaches closer together. In our case and possibly for many other groups, a major bottleneck in the PMF workﬂow was the manual analysis of peptide markers which was necessary because no universally accepted taxonomic assignment procedure of MALDI-TOF MS spectra was available at the time of this study. So far, the SPIN data analysis is limited to 156 species that can be analyzed with DirectDIA, providing good species separation, or alternatively 13 spectral libraries for library-based DIA analysis, achieving the best possible species identiﬁcation. Depending on the research question and archaeological context, studies may require an extension of the protein database and the set of spectral libraries. Although genomes are available for a plethora of extant mammals50, the main bottleneck towards a more comprehensive protein database in our eyes is the genome annotation and sequence prediction for understudied species. With the appropriate sequence database and reference samples at hand, generating spectral libraries is a relatively straightforward task if required for a project. We envision that the number of available reference data will grow with more research groups sharing their protein sequence databases and spectral libraries, in the future. Furthermore, the SPIN workﬂow itself will likely be improved and expanded over time. We think of it as a modular protocol that can serve as the foundation for “SPIN-off” methods with custom building blocks, like: (i) sample preparation modiﬁed to support protein extraction from heavily-processed food products, (ii) data acquisition adapted for different instruments, and (iii) data interpretation including sex identiﬁcation. Furthermore, it can be adapted to resolve mixtures of proteins from multiple taxa or to quantify protein damage (Fig. S14), in the future. Finally, we anticipate that the SPIN workﬂow will make LC-MS/ MS more accessible for everyone, due to the reduction of the analytical costs per sample and a high degree of automation. Methods Sample description. Each bone sample in this study was taken with permission from the respective museum, curator, or institution and the impact was minimized by only removing necessary amounts. The list of all samples and museum identiﬁers can be found in Supplementary Data 4. A fragment of a Pleistocene mammoth bone from permafrost and dated to ~43,000 BP39,51 was used for optimizing methods. Reference samples for Bos taurus, Ovis aries, Sus scrofa (mandibles), and Equus caballus (phalanx) were from the mixed viking-medieval deposits of the archaeological site Hotel Skandinavien (Århus Søndervold) (ZMK139/1964) in Århus, Denmark. The Laboratory of Biological Anthropology, Department of Forensic Medicine, at the University of Copenhagen provided the human reference sample, dentine from a previously described40 200–400-year-old premolar from “Almindelig Hospitals cemetery on Østerbrogade” in Copenhagen, Denmark. Reference samples for Bos primigenius, Bison bonasus, Capra hircus, Equus asinus, Equus primigenius, mule (Equus caballus X Equus asinus), Pongo pygmaeus, Gorilla gorilla, and Pan troglodytes were provided by the Natural History Museum of Denmark. The collection of 63 bone fragments dated to the Danish Early Iron-Age was sourced from the “Salpetermosen Syd 10” (MNS50010) site in Denmark52. The set of 213 upper-palaeolithic bone fragments from three Portuguese sites consisted of 84 samples from level 6 (29,300 BP46), level 7 (30,400 BP47), and level 8 (31,500 BP, unpublished) from Vale Boi, 95 samples from layers GG-II (38.000–41,000 BP) and JJ (45,000 BP44,45) from Lapa do Picareiro, and 34 samples from Galeria 1 and 2 from Gruta da Companheira53. Sampling. The reference samples for Bos taurus, Ovis aries, Sus scrofa, Equus caballus, and Homo sapiens and the 213 Portuguese samples were sampled in a clean laboratory designed for ancient DNA and protein work at the Globe Institute, at the University of Copenhagen. The remaining samples were processed in a laboratory with measures against human protein contamination. Working areas and tools were decontaminated with 5% bleach and 70% ethanol, between samples. 10 Reference bones intended for the generation of spectral libraries were surfacedecontaminated by mechanical ablation and small samples of <100 mg were removed using a rotary cutting tool, followed by crushing of the pieces by mortar and pestle. For high-throughput analysis, ~5 mg samples were collected by scraping the fracture site of bone fragments with a small chisel or pliers and transferred into a 96-well plate. At least three laboratory blanks, i.e., empty wells, were included for each project with at least 1 blank on every plate. Combined demineralization and extraction for SPIN. Five milligrams of bone powder or chips were suspended in 100 µL of demineralization/extraction solution containing 5% HCl and 0.1% NP-40 (ThermoFisher Scientiﬁc, 28324) in ultrapure water. Demineralization takes place at room temperature (rt) with continuous shaking at 1000 rpm, for 16–24 h. Reduction, alkylation, and collagen gelatinization were facilitated by adding 10 µL 0.1 M tris(2-carboxyethyl)phosphine (TCEP, Sigma–Aldrich, C4706) and 0.2 M N-ethylmaleimide (NEM, Sigma–Aldrich, E3876) in 50% ethanol and 50% ultrapure water and shaking at 1000 rpm at 60 °C, for 1 h. Protein cleanup and digestion for SPIN. The puriﬁcation and digestion take place on a KingFisherTM Flex (ThermoFisher Scientiﬁc) magnetic bead-handling robot. Debris was removed from the protein extract by centrifuging the plate at 800 × g, for 5 min. Magnetic SiMAG-Sulfon beads (Chemicell, 1202) were washed and prepared at a ﬁnal concentration of 5 mg/ml in 60% acetonitrile (ACN) and 40% water. In a deep-well KingFisherTM plate, 10 µL bead solution and 40 µL of the clear protein extract were brieﬂy mixed. Protein aggregation capture (PAC) was initiated by the addition of 240 µL of 70% ACN and 30% water (60% ﬁnal ACN concentration) and ﬁnalized by incubating with shaking at 800 rpm for 5 min and without shaking for 1 min. The robot was loaded with this plate, “wash I” (500 µL 70% acetonitrile, 30% water), “wash II” (500 µL 80% ethanol, 20% water), “wash III” (500 µL 100% acetonitrile, and the “on-bead-digestion” plate (100 µL 20 mM Tris pH 8.5, Sigma–Aldrich 10708976001, 1 µg/mL LysC, Wako 129-02541, 2 µg/ mL Trypsin, Promega V5111). The programmed sequence was: (i) collect the beads at low speed for 3:30 min, (ii–iv) washes I-III with slow mixing for 2 min, (v) digestion at 37 °C with slow mixing for 1 h, and (vi) bead collection and removal. The digestion was ﬁnalized outside the robot with shaking at 800 rpm and 37 °C, overnight. The peptides were acidiﬁed with 10 µL of 10% triﬂuoroacetic acid (TFA, Sigma–Aldrich, T6508). One Evotip (Evosep, EV-2001) per sample was washed in ACN, soaked with isopropyl alcohol, and equilibrated with 0.1% TFA in water, according to the manufacturer’s protocol. The equilibrated tips were loaded with 10 µL peptide solution and subsequently washed with 20 µL 0.1% TFA, before LCMS/MS. Peptide fractionation for spectral libraries. Peptides for spectral libraries were obtained either from 3 × 5 mg bone powder processed by robot-based SPIN or from 20 mg bone powder processed manually with the same workﬂow. The peptides were desalted using C18 (3 M Empore, 66883-U) StageTips54. After quantiﬁcation by Nanodrop (Thermo Fisher Scientiﬁc) at A280nm, 12 µg peptides were adjusted to pH 7–8 by adding one volume of 50 mM ammonium bicarbonate (ABC, Sigma–Aldrich, A6141). Ofﬂine fractionation by high-pH reversed-phase chromatography was carried out on an Ultimate 3000 HPLC (Thermo Fisher Scientiﬁc) operated with Chromeleon (6.8) and equipped with a 15 cm long, 1 mm i.d., 1.7 µm particle size C18 column (Waters ACQUITY Peptide CSH) and 5 mM ABC as buffer A and 100% ACN as buffer B. The gradient at a ﬂow rate of 30 µL/ min started at 6% B, was increased to 18% B over 55 min, to 25% B over 12 min, and to 70% B over 3 min, followed by a column wash at 70% B for 7 min and reequilibration at 6% B for 9 min. During the gradient, 12 fractions of equal size were collected. Blanks were run between the different species to reduce carryover. The fractions were acidiﬁed with 1% (ﬁnal concentration) TFA and vacuumconcentrated. An equivalent of 250 ng peptides (25% of each fraction) was loaded on Evotips as described above. Protein extraction and digestion methods for comparison. Each of the tested protocols “in-solution”7, FASP55, GASP37, and S-trap38 was conducted in triplicates using 5 mg bone powder from the Pleistocene mammoth bone test sample. Demineralization was done by adding 100 µL of 5% HCl and incubating at r.t. and shaking at 1000 rpm, for 24 h. Insolubles were separated by spinning the suspension at 5000 × g for 5 min and the supernatant was discarded or kept for FASP. The protein pellet was washed with 100 µL ultrapure water followed by repeating the centrifugation and discarding the supernatant. Proteins were extracted either in 100 µL 3 M guanidinium hydrochloride (Gnd-HCl, Sigma–Aldrich, G3272) in 0.1 M Tris, pH 8.5 for “in-solution” and FASP or in 100 µL 2% SDS (Sigma–Aldrich, 428018) for GASP and S-trap. To all protein extractions, 10 mM TCEP and 20 mM chloroacetamide (CAA, Sigma–Aldrich, C0267) were added and the samples were incubated with shaking at 1000 rpm at 60 °C, for 1 h. The protein concentrations of the extracts were quantiﬁed by BCA assay (ThermoFisher Scientiﬁc, 23225). For “in-solution”, the guanidinium extract was incubated with 1:300 (proteasewt:proteinwt) LysC at 37 °C, for 1 h. Subsequently, the sample was diluted with 3 volumes of 25 mM Tris, pH 8.5 and 1:100 (proteasewt:proteinwt) trypsin was NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x added for overnight digestion at 37 °C. The protein digest was acidiﬁed with 1% (ﬁnal concentration) TFA and desalted using StageTips. For FASP, the guanidinium extract was spun at 20,000 × g, for 10 min, and the supernatant was mixed with 9 volumes 8 M urea in 0.1 M Tris, pH 8.5 and transferred and passed through a 2.5 kDA MWCO ﬁlter (Millipore, UFC500324) in 200 µL steps by spinning at 5000 × g for about 10 min. The ﬁlter was washed with 200 µL 8 M urea in 0.1 M Tris, pH 8.5 the demineralization supernatant was mixed with 6 volumes 8 M urea in 0.1 M Tris, pH 8.5, and passed through the same ﬁlter in 200 µL steps. After the last wash with 200 µL 8 M urea in 0.1 M Tris, pH 8.5, 100 ng LysC in 100 µL 50 mM ABC was added to the ﬁlter. Predigestion took place at 37 °C with shaking at 1000 rpm, for 1 h. Next, 200 ng trypsin was added and the digestion continued at 37 °C, overnight. The peptides were collected by spinning the ﬁlter and washing the membrane with 100 µL Tris, pH 8.5, spinning, washing with 100 µL 40% ACN 60% water, and spinning for ﬁnal collection. The ﬂowthrough was concentrated to <50 µL and acidiﬁed with 1% (ﬁnal concentration) TFA, before desalting on C18 StageTips. For GASP, 100 µL SDS extract was mixed with 100 µL acrylamide solution (37.5:1 Acrylamide:Bisacrylamide), for 20 min. Polymerization was started by adding 8 µL TEMED (Sigma–Aldrich, T9281) and 8 µL 10% ammonium persulfate (Sigma–Aldrich, A3678). The obtained gel was sliced into pieces and ﬁxed in 50% methanol, 40% water, and 10% acetic acid, for 30 min. The supernatant was removed and the gel was sequentially washed with 1 mL ACN, 1 mL 6 M urea, 1 mL ACN, 1 mL ACN, 1 mL 6 M urea, 1 mL ACN, 1 mL 50 mM ABC, 1 mL ACN, and ﬁnally resuspended in 200 µL 50 mM ABC. For predigestion, 100 ng LysC was added and the sample incubated at 37 °C with shaking at 1000 rpm, for 1 h. Next, 200 ng trypsin was added and the digestion continued at 37 °C, overnight. The supernatant containing the peptides was collected and the gel pieces extracted using 200 µL ACN, followed by 200 µL 5% formic acid, and 200 µL ACN. The peptide solutions were combined, vacuum-concentrated, and desalted on C18 StageTips. For S-trap, 1.2% (ﬁnal concentration) phosphoric acid was added to the SDS extract. An equivalent of 10 µg extract was mixed with 1 volume 90% methanol, 100 mM triethylammonium bicarbonate, pH 7 (TEAB, Sigma–Aldrich, 18597) and loaded on an S-trap ﬁlter (Protiﬁ) by spinning at 2000 × g, for 1 min. The ﬂowthrough was re-loaded three times, before washing the ﬁlter with 200 µL of 90% methanol, 100 mM TEAB, pH 7. For predigestion, 100 ng LysC in 100 µL 50 mM Tris, pH 8.5 was added and the samples incubated with shaking at 1000 rpm at 37 °C, for 1 h. Then, 200 ng Trypsin in 50 µL 50 mM Tris, pH 8.5 was added and the digestion continued, overnight. The peptides were collected by centrifugation at 2,000 × g, for 1 min, and sequential washing with 100 µL 50 mM Tris, pH 8.5, 100 µL 0.1% TFA, and 100 µL 0.1% TFA in 50% ACN. After concentrating the peptides by vacuum evaporation, they were desalted on C18 StageTips. LC-MS/MS data acquisition methods. For SPIN by DDA, chromatography was carried out using the 100 samples per day (SPD) method of an Evosep One (Evosep, Odense, Denmark) operated with the Evosep plugin (1.4.381.0) in Chronos (4.9.2.0) and an analytical column made in-house using a laser-pulled 8 cm long 150 µm inner diameter capillary packed with 1.9 µm C18 particles (Reprosil, Dr. Maisch). Peptides were ionized by nano-electrospray at 2 kV and analyzed on an Orbitrap Exploris 480TM (Thermo Fisher Scientiﬁc, Bremen, Germany) MS operated with Xcalibur (3.1-4). Full scans ranging from 350 to 1400 m/z were measured at 60 k resolution, 25 ms max. IT, 300% AGC target. The top 6 precursors were selected (30 s dynamic exclusion) for HCD fragmentation with an isolation window of 1.3 m/z and an NCE of 30. MS2 scans were acquired at 15 k resolution, 22 ms max. IT, and 200% AGC target. DIA SPIN analysis was based on the 200 SPD method of an Evosep One and the same column, ESI, and MS instrument as for DDA. Full scans ranging from 350 to 1400 m/z were measured at 120 k resolution, 45 ms max. IT, 300% AGC target. Precursors were selected for data-independent fragmentation in 15 windows ranging from 349.5 to 770.5 m/z and 3 windows ranging from 769.5–977.5 m/z with 1 m/z overlap. HCD fragmentation was set to an NCE of 27 and MS2 scans were acquired at 30 k resolution, 45 ms max. IT, and 1000% AGC target. Samples for method optimization and spectral libraries were measured in DDA mode. Method optimization experiments were analyzed on the 60 SPD Evosep One gradient and spectral libraries on the 200 SPD gradient using the same column, ESI, and MS instrument as described above. Full scans ranging from 350 to 1400 m/z were measured at 60 k resolution, 25 ms max. IT, 300% AGC target. The top 12 precursors were selected (30 s dynamic exclusion) for HCD fragmentation with an isolation window of 1.3 m/z and an NCE of 30. MS2 scans were acquired at 15 k resolution, 22 ms max. IT, and 200% AGC target. Protein database for SPIN. The database includes all protein sequences from UniProt knowledgebase56 release 2020_06 and NCBI RefSeq57 release 201 (July 2020) from mammalian species and matching to the top 20 genes (Fig. 2b) in.fasta format. The NCBI entries were reannotated with gene and species information using the respective GenPept ﬁles and the fasta-headers were changed to a pseudoUniprot format: “>NCBI | [protein ID] | [protein ID]_[gene alias] [protein description] OS = [species name] OX = [species ID] GN = [gene name]”. Relevant Uniprot entries with missing or false gene annotations were added by sequence similarity-based reannotation. The UniRef90 (release 2020_06) repository was used ARTICLE to annotate each “90 % similarity” cluster with its most common gene name, followed by downloading all additional proteins matching the top 20 genes and updating the fasta-headers to include the correct gene name. Protein sequences from species missing in the databases like Equus przewalskii, Bison bonasus, and the extinct Bos primigenius were manually extracted from the available genomes58–60 using reference sequences of the closest living relatives and the local BLAST61 and visualization in UGENE62. After combining all protein sequences, ﬁltering for mammalian species, and removing duplicates, the sequences were split into 20 separate ﬁles for each gene. A multiple sequence alignment (MSA) was performed for each ﬁle using MUSCLE version 3.8.42563 and the alignment was visualized in AliView version 1.2664. Upon manual inspection, faulty sequences, for instance, large inserts that were not shared by any other species or frameshifts identiﬁed by very low similarity to the rest of the alignment, were removed or changed into gaps. The aligned and manually reﬁned databases were combined into one.fasta ﬁle with gene-wise alignment and a second gapless ﬁle for use in search engines. Phylogenetic trees based on this database were made by merging all genes of each species in alphabetical order, generating a consensus, generating the tree with FastTree version 2.1.1165, and visualization with FigTree version 1.4.4 (Andrew Rambaut). Peptide identiﬁcation. All raw ﬁles from DDA were analyzed in MaxQuant version 1.6.0.1766. Variable modiﬁcations included oxidation (M), deamidation (NQ), Gln -> pyro-Glu, Glu -> pyro-Glu, and proline hydroxylation, whereas NEM-derivatization of Cys was conﬁgured as ﬁxed modiﬁcation. Spectral libraries and method optimization runs were searched against all protein sequences in the UniProt knowledgebase for the given species (download dates available in the MaxQuant “summary.txt”). Searches were ﬁrst run with tryptic speciﬁcity and up to 2 missed cleavages to reduce the database size and then searched again with semi-tryptic speciﬁcity allowing for a peptide length between 8–30 aa and max. mass of 4000 Da. SPIN ﬁles were searched against the abovementioned gapless database only with semi-tryptic speciﬁcity using the same settings. Up to 5 variable modiﬁcations were allowed in tryptic and up to 4 in semi-tryptic searches. In all searches, “Second peptides” search and “Match between runs” were disabled, the score thresholds for identiﬁcation were set to a minimum Andromeda score of 40 and delta score of 6, and the internal MaxQuant contaminant list was replaced with a custom database (Supplementary File “PR200512_HumanCons.fasta”). All other settings were left as default. All raw ﬁles from DIA were analyzed in Biognosys Spectronaut67,68 version 14.5.200813, or version 14.11.210528 for the ﬁles analyzed in the SPIN vs. MALDITOF comparison, using either library-based or library-free DirectDIA search. Spectral libraries were imported from the individual semi-tryptic MaxQuant search results with default settings except for digestion speciﬁcity and merged into a single library, before searching. The library-based search was carried out with default settings, except for MS1 as “Quantity MS-Level”. The DirectDIA search was conﬁgured with the abovementioned gapless and custom contaminant databases. Digestion speciﬁcity was set to semi-tryptic with a peptide length between 7–40 amino acids, up to 2 missed cleavages, and the variable modiﬁcations included oxidation (M), deamidation (NQ), Gln -> pyro-Glu, Glu -> pyro-Glu, and proline hydroxylation, whereas NEMderivatization of Cys was conﬁgured as ﬁxed modiﬁcation. Again, the “Quantity MSLevel” was set to MS1 and all other settings were kept as default. Species inference. The species determination based on peptides identiﬁed with library-based DIA, DirectDIA, or DDA was done in R version 4.0.3 using RStudio version 1.3.10.93 and additional packages (reporting summary). The required peptide identiﬁcation data was either generated with a Spectronaut report based on the SPIN.rs scheme (Supplementary Files) for DIA or extracted from the Maxquant output “evidence.txt” for DDA. Additionally, the aligned protein sequence database, the contaminant database, the experimental annotations, a list of species in the spectral library, and the ﬁne-grouping table are needed (Supplementary Code69). The workﬂow was almost identical for the three types of data with small differences between MaxQuant and Spectronaut output column names, which can be found in the provided R scripts69. Species inference starts with loading the precursor identiﬁcation ﬁles and ﬁltering for 1% FDR. The databases were loaded and all isoleucines in the database and precursor sequences were changed into leucines. To allow for unambiguous site assignment, “global sites” were determined for every amino acid in the database by putting the 20 aligned genes in alphabetic order and numbering the positions from 1 through 25,550. The protein database was extended with an equal amount of “species decoy” proteins, which were generated by slicing the globally aligned sequences into 500 amino acid-long pieces and combining slices from randomized species into a “chimeric” decoy species. The combined target/decoy database was used to annotate the precursors with matching genes, proteins, species, and global sites covered by the peptide sequence. Site-level information was generated by counting precursors/peptides, ﬁnding the maximum score, and summing the intensity for every possible amino acid at every possible global site, for each raw ﬁle. The summed intensities larger than 1 were log10-transformed and the precursor count, peptide count, log-intensity, and max. score were scaled by dividing by the maximum at the global site. These four metrics were then multiplied to calculate a joint score (J-score), which was again scaled by dividing by the maximum J-score at the global site. NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x Based on the combined target/decoy database, a site difference matrix was generated. For every possible comparison of two species, the difference matrix lists the global sites and amino acids that are different between them. Global sites with gaps in one of the species are ignored. For each raw ﬁle, these species-discriminating sites were then scored using the J-score. The species with the higher J-score sum was selected as the “winner”, while both species are “winners” in case of a “tie”. The best match for a raw ﬁle was the species that won the most comparisons. Fine-grouping was done for raw ﬁles with the best match that was amongst the species with manually selected marker peptides. The ﬁne-grouping uses a list of hand-picked peptides that can be used to discriminate closely related species within a genus. These peptides were scored based on their precursor intensities in the respective sample. Based on the highest score, a single or multiple indistinguishable ﬁne-grouping species were reported. The ﬁnal output species was selected by taking either the best match or, whenever available, the ﬁne-grouping species. The “site count”, which is the absolute sequence coverage, was added to this output table by counting the number of matching sites that were identiﬁed in the respective raw ﬁle. After ranking the list of raw ﬁles by decreasing site count, a q-value can be calculated as the fraction of decoy-species identiﬁcations at a given site count cutoff. Files with a q-value above 1% were marked in the ﬁnal output. Furthermore, the relative protease intensity was calculated for each sample by dividing the summed precursor intensity of proteases peptides by the total precursor intensity sum. The relative protease intensity cutoff was determined as the upper quartile of relative protease intensity amongst the laboratory blanks. Samples below that threshold will also be marked in the ﬁnal output. MALDI-TOF MS spectral acquisition and PMF analysis. For the comparison between PMF- and LC-MS/MS-based species identiﬁcation, 90 samples and 3 extraction blanks (Supplementary Data 5) were analyzed with both methods. Peptides generated using the SPIN protocol for protein extraction and digestion were split into a 10 µL aliquot for LC-MS/MS and a 30 µL aliquot for triplicate MALDI-TOF analysis. For desalting, the 30 µL were loaded and washed on Evotips as described in “Protein cleanup and digestion for SPIN” and subsequently eluted with two times 20 µL 50% ACN. After vacuum concentration to dryness, the peptides were reconstituted in 3.5 µL 50% ACN and 1 µL was spotted on a MALDI target plate for each replicate and mixed with 1 µL of CHCA (α-cyano-4-hydroxycinnamic acid). MALDITOF MS data acquisition was performed using an AutoFlex LRF MALDI-TOF MS instrument (Bruker) at the Fraunhofer Institute (Leipzig, Germany) in reﬂector mode, positive polarity, matrix suppression up to 590 Da, and spectral collection in the range of 700–3500 m/z. Spectra were exported to.txt ﬁles, their baselines were removed, were aligned to their respective triplicate replicates, and subsequently merged through R (R Core Team, 2018). Masses of the nine common peptide markers observed for mammalian species were taken into account following the nomenclature of Brown70 and the peptide marker database presented by Welker71. Instead of peptide marker COL1ɑ1 586–618, which contains a tryptic cleavage site and was therefore cleaved during efﬁcient PAC digestion, the fully tryptic peptide COL1ɑ1 604–618 in the mass range 1281–1327 Da was used. Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. Data availability The raw mass spectrometry proteomics data generated in this study have been deposited to the ProteomeXchange Consortium via the PRIDE72 partner repository under accession code PXD024487. The processed proteomics data are available as part of the same repository and accession code. The data interpretation results generated in this study are provided as Supplementary Data ﬁles. 23. 24. 25. Code availability Scripts in R written for usage in RStudio and the sequence-aligned protein database are publicly available under Creative Commons Attribution 4.0 International license on Zenodo (https://doi.org/10.5281/zenodo.6406044)69. 26. 27. Received: 23 March 2021; Accepted: 18 April 2022; 28. 29. References 1. 2. 12 Orlando, L., Gilbert, M. T. P. & Willerslev, E. Reconstructing ancient genomes and epigenomes. Nat. Rev. Genet. 16, 395–408 (2015). Slon, V. et al. The genome of the offspring of a Neanderthal mother and a Denisovan father. Nature 561, 113–116 (2018). 30. 31. Cappellini, E., Collins, M. J. & Gilbert, M. T. P. Biochemistry. Unlocking ancient protein palimpsests. Science 343, 1320–1322 (2014). Demarchi, B. et al. Protein sequences bound to mineral surfaces persist into deep time. elife 5, e17092 (2016). Asensio, L., González, I., García, T. & Martín, R. Determination of food authenticity by enzyme-linked immunosorbent assay (ELISA). Food Control 19, 1–8 (2008). Martisius, N. L. et al. Non-destructive ZooMS identiﬁcation reveals strategic bone tool raw material selection by Neandertals. Sci. Rep. 10, 7746 (2020). Jersie-Christensen, R. R. et al. Quantitative metaproteomics of medieval dental calculus reveals individual oral health status. Nat. Commun. 9, 4744 (2018). Solazzo, C., Fitzhugh, W. W., Rolando, C. & Tokarski, C. Identiﬁcation of protein remains in archaeological potsherds by proteomics. Anal. Chem. 80, 4590–4597 (2008). Welker, F. et al. Enamel proteome shows that Gigantopithecus was an early diverging pongine. Nature 576, 262–265 (2019). Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003). Pappin, D. J., Hojrup, P. & Bleasby, A. J. Rapid identiﬁcation of proteins by peptide-mass ﬁngerprinting. Curr. Biol. 3, 327–332 (1993). Geyer, P. E. et al. Plasma proteome proﬁling to assess human health and disease. Cell Syst. 2, 185–195 (2016). Santos, A. et al. Clinical knowledge graph integrates proteomics data into clinical decision-making. bioRxiv https://doi.org/10.1101/2020.05.09.084897 (2020). Dallongeville, S. et al. Identiﬁcation of animal glue species in artworks using proteomics: application to a 18th century gilt sample. Anal. Chem. 83, 9431–9437 (2011). Mackie, M. et al. Palaeoproteomic proﬁling of conservation layers on a 14th century Italian wall painting. Angew. Chem. Int. Ed. Engl. 57, 7369–7374 (2018). Ostrom, P. H. et al. New strategies for characterizing ancient proteins using matrix-assisted laser desorption ionization mass spectrometry. Geochim. Cosmochim. Acta 64, 1043–1050 (2000). Hynek, R., Kuckova, S., Hradilova, J. & Kodicek, M. Matrix-assisted laser desorption/ionization time-of-ﬂight mass spectrometry as a tool for fast identiﬁcation of protein binders in color layers of paintings. Rapid Commun. Mass Spectrom. 18, 1896–1900 (2004). Buckley, M., Collins, M., Thomas-Oates, J. & Wilson, J. C. Species identiﬁcation by analysis of bone collagen using matrix-assisted laser desorption/ionisation time-of-ﬂight mass spectrometry. Rapid Commun. Mass Spectrom. 23, 3843–3854 (2009). Sinet-Mathiot, V. et al. Combining ZooMS and zooarchaeology to study Late Pleistocene hominin behaviour at Fumane (Italy). Sci. Rep. 9, 12350 (2019). Hughes, C. S. et al. Ultrasensitive proteome analysis using paramagnetic bead technology. Mol. Syst. Biol. 10, 757 (2014). Batth, T. S. et al. Protein aggregation capture on microparticles enables multipurpose proteomics sample preparation. Mol. Cell. Proteomics 18, 1027–1035 (2019). Cleland, T. P. Human bone paleoproteomics utilizing the single-pot, solidphase-enhanced sample preparation method to maximize detected proteins and reduce humics. J. Proteome Res. 17, 3976–3983 (2018). Palmer, K. S. et al. Comparing the use of magnetic beads with ultraﬁltration for ancient dental calculus proteomics. J. Proteome Res. https://doi.org/10. 1021/acs.jproteome.0c00862 (2021). Bekker-Jensen, D. B. et al. A compact quadrupole-orbitrap mass spectrometer with FAIMS interface improves proteome coverage in short LC gradients. Mol. Cell. Proteomics 19, 716–729 (2020). Jersie-Christensen, R. R., Sultan, A. & Olsen, J. V. Simple and reproducible sample preparation for single-shot phosphoproteomics with high sensitivity. Methods Mol. Biol. 1355, 251–260 (2016). Bache, N. et al. A novel LC system embeds analytes in pre-formed gradients for rapid, ultra-robust proteomics. Mol. Cell. Proteomics 17, 2284–2296 (2018). Rosenberger, G. et al. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat. Methods 14, 921–927 (2017). Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIANN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020). Kelstrup, C. D. et al. Performance evaluation of the Q exactive HF-X for shotgun proteomics. J. Proteome Res. 17, 727–738 (2018). Kelstrup, C. D. et al. Rapid and deep proteomes by faster sequencing on a benchtop quadrupole ultra-high-ﬁeld Orbitrap mass spectrometer. J. Proteome Res. 13, 6187–6195 (2014). Bian, Y. et al. Robust, reproducible and quantitative analysis of thousands of proteomes by micro-ﬂow LC–MS/MS. Nat. Commun. 11, 157 (2020). NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x 32. Bradﬁeld, J., Kitchener, A. C. & Buckley, M. Selection preferences for animal species used in bone-tool-manufacturing strategies in KwaZulu-Natal, South Africa. PLoS ONE 16, e0249296 (2021). 33. Gu, M. & Buckley, M. Semi-supervised machine learning for automated species identiﬁcation by collagen peptide mass ﬁngerprinting. BMC Bioinformatics 19, 241 (2018). 34. Tyanova, S. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016). 35. Cleland, T. P., Voegele, K. & Schweitzer, M. H. Empirical evaluation of bone extraction protocols. PLoS ONE 7, e31443 (2012). 36. Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 (2009). 37. Hendy, J. et al. Ancient proteins from ceramic vessels at Çatalhöyük West reveal the hidden cuisine of early farmers. Nat. Commun. 9, 4064 (2018). 38. Zougman, A., Selby, P. J. & Banks, R. E. Suspension trapping (STrap) sample preparation method for bottom-up proteomics analysis. Proteomics 14, 1006–1000 (2014). 39. Cappellini, E. et al. Proteomic analysis of a pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012). 40. Welker, F. et al. The dental proteome of Homo antecessor. Nature 580, 235–238 (2020). 41. Batth, T. S., Francavilla, C. & Olsen, J. V. Off-line high-pH reversed-phase fractionation for in-depth phosphoproteomics. J. Proteome Res. 13, 6176–6186 (2014). 42. Bennike, O., Pantmann, P. & Aarsleff, E. Lateglacial and Holocene ﬂoras and faunas from the Salpetermosen area, north-east Sjælland, Denmark. Bull. Geol. Soc. Den. 68, 231–244 (2020). 43. Aaris-Sørensen, K. Diversity and dynamics of the mammalian fauna in Denmark throughout the last glacial-interglacial cycle, 115-0 kyr BP. (John Wiley & Sons, 2010). 44. Benedetti, M. M., Haws, J. A., Bicho, N. F., Friedl, L. & Ellwood, B. B. Late Pleistocene site formation and paleoclimate at Lapa do Picareiro, Portugal. Geoarchaeology 34, 698–726 (2019). 45. Haws, J. A. et al. The early Aurignacian dispersal of modern humans into westernmost Eurasia. Proc. Natl Acad. Sci. USA 117, 25414–25422 (2020). 46. Bicho, N. et al. The ecodynamics of the ﬁrst modern humans in Southwestern Iberia: the case of Vale Boi, Portugal. Quat. Int. 318, 102–116 (2013). 47. Pereira, T. et al. Territory and abiotic resources between 33 and 15.6 ka at Vale Boi (SW Portugal). Quat. Int. 412, 124–134 (2016). 48. Manne, T. Early Upper Paleolithic bone processing and insights into smallscale storage of fats at Vale Boi, southern Iberia. J. Archaeol. Sci. 43, 111–123 (2014). 49. Dekker, J. et al. Human and cervid osseous materials used for barbed point manufacture in Mesolithic Doggerland. J. Archaeol. Sci. Rep. 35, 102678 (2021). 50. Hotaling, S., Kelley, J. L. & Frandsen, P. B. Toward a genome sequence for every animal: where are we now? Proc. Natl Acad. Sci. USA 118, e2109019118 (2021). 51. Römpler, H. et al. Nuclear gene indicates coat-color polymorphism in mammoths. Science 313, 62 (2006). 52. Bangsgaard, P. & Pantmann, P. The phenomenon of primary and secondary animals within Iron Age deposits in Denmark. Danish J. Archaeol. 10, 1–20 (2022). 53. Bicho, N., Cascalheira, J., Tátá, F. & Horta, P. in Mudanças em Sistemas Ambientais e sua Expressão Temporal: Livro de Resumos da IX Reunião do Quaternário Ibérico 113–114 (Asociación Española para el Estudio del Cuaternario, 2017). 54. Rappsilber, J., Ishihama, Y. & Mann, M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670 (2003). 55. Wadsworth, C. & Buckley, M. Proteome degradation in fossils: investigating the longevity of protein survival in ancient bone. Rapid Commun. Mass Spectrom. 28, 605–615 (2014). 56. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019). 57. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016). 58. Huang, J. et al. Analysis of horse genomes provides insight into the diversiﬁcation and adaptive evolution of karyotype. Sci. Rep. 4, 4958 (2014). 59. Wang, K. et al. The genome sequence of the wisent (Bison bonasus). Gigascience 6, 1–5 (2017). 60. Park, S. D. E. et al. Genome sequencing of the extinct Eurasian wild aurochs, Bos primigenius, illuminates the phylogeography and evolution of cattle. Genome Biol. 16, 234 (2015). 61. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990). ARTICLE 62. Okonechnikov, K., Golosova, O. & Fursov, M., UGENE team. Unipro UGENE: a uniﬁed bioinformatics toolkit. Bioinformatics 28, 1166–1167 (2012). 63. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). 64. Larsson, A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30, 3276–3278 (2014). 65. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2-approximately maximumlikelihood trees for large alignments. PLoS ONE 5, e9490 (2010). 66. Cox, J. & Mann, M. MaxQuant enables high peptide identiﬁcation rates, individualized ppb-range mass accuracies and proteome-wide protein quantiﬁcation. Nat. Biotechnol. 26, 1367 (2008). 67. Bruderer, R. et al. Optimization of experimental parameters in dataIndependent mass spectrometry signiﬁcantly increases depth and reproducibility of results. Mol. Cell. Proteomics 16, 2296–2309 (2017). 68. Bekker-Jensen, D. B. et al. Rapid and site-speciﬁc deep phosphoproteome proﬁling by data-independent acquisition without the need for spectral libraries. Nat. Commun. 11, 787 (2020). 69. Rüther, P. L. et al. SPIN—Species by Proteome INvestigation: Code, databases, and example data. https://doi.org/10.5281/zenodo.6406044. (2022). 70. Brown, S., Douka, K., Collins, M. J. & Richter, K. K. On the standardization of ZooMS nomenclature. J. Proteomics 235, 104041 (2021). 71. Welker, F. et al. Palaeoproteomic evidence identiﬁes archaic hominins associated with the Châtelperronian at the Grotte du Renne. Proc. Natl Acad. Sci. USA 113, 11162–11167 (2016). 72. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantiﬁcation data. Nucleic Acids Res. 47, D442–D450 (2019). Acknowledgements We would like to express our gratitude to Daniel Klingberg Johansson and the Natural History Museum of Denmark for providing samples from non-human primates, Eske Willerslev for providing the mammoth bone, and Pedro Horta and Cláudia Costa for supporting the sample selection. We thank Meaghan Emma Mackie and Matthew Collins for helping to explore the PAC methods for very ancient samples. We thank the IZI Fraunhofer (Leipzig, Germany), Stefan Kalkhof, and Johannes Schmidt for access to the MALDI-TOF MS instrument, and Dorothea Mylopotamitaki and Huan Xia for assistance with generating the MALDI-TOF MS spectra. Thanks to Ulises Hernández Guzmán for naming our method, and to Dorte Breinholdt Bekker-Jensen and Tanveer Singh-Baath for providing technical foundations for our workﬂow. P.L.R., E.C., and J.V.O. were supported by the European Commission through the MSC European Training Network ‘TEMPERA’ (grant number 722606). Work at The Novo Nordisk Foundation Center for Protein Research (CPR) is funded in part by a generous donation from the Novo Nordisk Foundation (NNF14CC0001). This work has been supported by EPIC-XS, project number 823839, funded by the Horizon 2020 programme of the European Union. E.C. has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 101021361) and from VILLUM FONDEN (No. 17649). F.W. has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 948365) and a Marie Skłodowska Curie Individual Fellowship (No. 795569). The work in Vale Boi and Companheira is funded by Fundação para a Ciência e Tecnologia (FCT), grant PTDC/HAR-ARQ/27833/2017. The work at Lapa do Picareiro is funded by U.S. National Science Foundation (NSF) awards to J.H. (BCS1420299, BCS-1724997) and M.B. (BCS-1420453, BCS-1725015). L.F. was supported by an SGS grant of the University of West Bohemia (SGS-2022-002). J.C. is funded by Fundação para a Ciência e para a Tecnologia (FCT), contract references DL57/2016/CP1361/CT0026. R.M.G. is funded by the European Regional Development Fund (FEDER) via the Programa Operacional CRESC Algarve 2020, of Portugal2020 (project ALG-01-0145-FEDER-29680), and the Fundação para a Ciência e a Tecnologia (FCT; contract reference 2020.00499.CEECIND). We thank all members of the JVO group for the constructive feedback, the MS platform team at CPR for keeping our instruments in good shape, and Thermo Fisher Scientiﬁc in Bremen for early access to MS instruments. Author contributions P.L.R., J.V.O., and E.C. conceived the study. E.C. and F.W. established collaborations. P.B., K.M.G., P.P., M.C., R.M.G., L.F., J.C., M.L.S.J., M.M.B., J.H., N.B., F.W., and E.C. contributed archaeological samples. P.L.R., I.H., F.W., and A.J.T. performed laboratory research. P.L.R., I.H., and F.W. analyzed the data. P.B., K.M.G., P.P., M.C., R.M.G., L.F., J.C., M.M.B., J.H., N.B., F.W., and E.C. provided archaeological interpretations. P.R. prepared the manuscript. J.V.O., F.W., and E.C., and all co-authors contributed to the revision of the manuscript. Competing interests The authors declare no competing interests. NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30097-x Additional information Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-022-30097-x. Correspondence and requests for materials should be addressed to Patrick Leopold Rüther or Jesper Velgaard Olsen. Peer review information Nature Communications thanks Timothy Cleland, Arnaud Droit and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Reprints and permission information is available at http://www.nature.com/reprints Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afﬁliations. 14 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/. © The Author(s) 2022 NATURE COMMUNICATIONS | (2022)13:2458 | https://doi.org/10.1038/s41467-022-30097-x | www.nature.com/naturecommunications

Log In

SPIN enables high throughput species identification of archaeological bone by proteomics

Related papers

Related papers

Related topics