WO2020205426A1

WO2020205426A1 - Comprehensive identification of interacting protein targets using mRNA display of uniform libraries

Info

Publication number: WO2020205426A1
Application number: PCT/US2020/024945
Authority: WO
Inventors: Yushen DU; Ren Sun
Original assignee: The Regents Of The University Of California
Priority date: 2019-03-29
Filing date: 2020-03-26
Publication date: 2020-10-08

Abstract

Disclosed herein are methods for detecting and analyzing protein-protein interactions with high specificity and sensitivity.

Description

COMPREHENSIVE IDENTIFICATION OF INTERACTING PROTEIN TARGETS USING MRNA DISPLAY OF UNIFORM LIBRARIES

[0001] CROSS-REFERENCE TO RELATED APPLICATIONS

[0002] This application claims the benefit of U.S. Patent Application No. 62/826,611, filed March 29, 2019, which is herein incorporated by reference in its entirety.

[0003] REFERENCE TO A SEQUENCE LISTING SUBMITTED VIA EFS-WEB

[0004] The content of the ASCII text file of the sequence listing named “20200323_034044_200W01_ST25”, which is 2.93 kb in size, was created on March 23, 2020, and was electronically submitted via EFS-Web with the application, is incorporated herein by reference in its entirety.

[0005] BACKGROUND OF THE INVENTION

[0006] 1. FIELD OF THE INVENTION

[0007] The present invention generally relates to methods for assaying protein-protein interactions.

[0008] 2. DESCRIPTION OF THE RELATED ART

[0009] Defining the protein-protein interaction (PPI) network is essential for understanding the regulation of cellular biological processes. However, the accurate detection of interactions between proteins, especially those of low abundance, remains a challenge. Virus-host PPIs offer a particularly complex case, as viral proteins are often multi-functional and can form extensive connections with multiple cellular proteins. These physical interactions are often crucial for viral replication and pathogenesis, making them attractive targets for the generation of antiviral drugs. Influenza virus, for example, leverages PPIs to hijack and/or interfere with diverse cellular pathways, including growth, apoptosis, metabolism, and the immune response. A comprehensive evaluation of virus-host interactions is therefore fundamental for understanding the functional connections between cellular networks and disease pathogenesis.

[0010] Currently, affinity purification-mass spectrometry (AP-MS) is one of the most commonly used and well-established methods for detecting protein-protein interactions. Although the sensitivity and accuracy of AP-MS continue to increase, some limitations remain. First, high-quality antibodies are required for efficient pull-down of the bait protein. This limitation can be partially circumvented by tagging the target protein with high-affinity epitopes, but it is often difficult to express tagged proteins in the cell type of interest, tagging may result in non-physiological levels of expression, and it is difficult to determine how the tag impacts the protein-protein interactions. Second, as protein samples cannot be amplified, low abundance proteins may not reach the detection limit for mass spectrometry. As a consequence, information about low abundance proteins in the affinity-purified protein complex may be lost in the acquired dataset. Third, the current technical reproducibility of mass spectrometry is generally not as high as that of next generation sequencing. This is mainly associated with the high complexity of sample preparation, chromatography, and mass spectrometry procedures. In particular, when running under data-dependent acquisition (DDA) methods, the most commonly used approach for peptide identification, peptide signals of true binding proteins might be masked by signals from co-eluted contaminating proteins. Hence, true binding proteins may show up as random hits among replicates and be filtered out in data analysis procedures or removed as noise. All AP-MS methods rely on complex scoring algorithms to determine the probability that an interaction is ‘real’, and these algorithms oftentimes give very different answers. Finally, the complexity of the cellular proteome and limitations in detectable peptides make comprehensive measurement of an input sample impossible. As such, relative quantification of AP-MS results relies on the inclusion of internal standards between samples.

[0011] SUMMARY OF THE INVENTION

[0012] In some embodiments, the present invention provides methods for assaying protein-protein interactions, which comprise a) obtaining an exon library, b) preparing an mRNA library by transcribing the exons of the exon library, c) generating a peptide library by translating the mRNA sequences of the mRNA library, d) generating an mRNA display library by linking the peptides of the peptide library with the mRNA sequences of the mRNA library, e) generating an input library of cDNA sequences by reverse transcription of the mRNA sequences of the mRNA display library, f) enriching the cDNA sequences of the input library to obtain an enriched library of enriched cDNA sequences by using one or more proteins of interest as bait proteins, and g) obtaining enriched peptides from the enriched cDNA sequences, contacting the enriched peptides with the one or more proteins of interest, and analyzing any interactions between the enriched peptides and the one or more proteins of interest. In some embodiments, steps a) to f) are repeated one or more times whereby the enriched cDNA sequences are used as the exon library in the repeated steps. In some embodiments, step b) comprises adding a T7 promoter and a FLAG peptide sequence. In some embodiments, step b) comprises linking the transcribed exons to puromycin. In some embodiments, step b) comprises purifying the peptide-mRNA complexes by FLAG tag selection. In some embodiments, step f) comprises contacting the cDNA sequences of the input library with the bait proteins and amplifying the cDNA sequences that bind the bait proteins by PCR amplification. In some embodiments, the methods comprise sequencing the cDNA sequences of the input library and/or sequencing the enriched cDNA sequences of the enriched library. In some embodiments, the exon library is generated from fragmented DNA. In some embodiments, the exon library is generated from a given cell type and/or organism of interest. In some embodiments, the exon library is obtained from genomic DNA. In some embodiments, the methods further comprise minimizing protein fragments that do not represent complete exon sequences by using methods in the art to control the length and composition of the input library or utilizing an open reading frame (ORF) library that is substantially evenly distributed.

[0013] Both the foregoing general description and the following detailed description are exemplary and explanatory only and are intended to provide further explanation of the invention as claimed. The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute part of this specification, illustrate several embodiments of the invention, and together with the description explain the principles of the invention.

[0014] DESCRIPTION OF THE DRAWINGS

[0015] This invention is further understood by reference to the drawings (color versions of which may be obtained in US 62/826,611) wherein:

[0016] Figure 1 to Figure 3: Construction of human exon library for mRNA display to detect protein-protein interactions. Figure 1: The schematic diagram shows the experimental design of PED. A human exon library was enriched from fragmented DNA. The DNA fragments were transcribed in vitro and translated. Puromycin was utilized to link mRNA to its encoded protein. The nucleic acid and protein fusion complexes were pre-selected using a C-terminal FLAG tag as an input library. The pre-selected input library was then selected against bait proteins and subjected to high-throughput sequencing to determine the identity and frequency of each exon. Figure 2: Scatter plot shows the correlation between two independent input libraries. Exon frequencies were calculated for each replicate and a strong correlation was observed between biological duplicates. Figure 3: The distribution of transcript frequency in input libraries is shown as a histogram. Medium gray (blue) bars represent the distribution of the exon library, and light gray bars correspond to the cDNA library. Dark gray shows the overlap between the medium gray and light gray bars.

[0017] Figure 4 to Figure 7: Identification of cellular interactors of influenza NS1 protein using the PED method. Figure 4: Cellular proteins identified as interacting with NS1. The corresponding enrichment score is shown. Data are shown as the average of 6 biological replicates and error bars represent standard error. Figure 5: Interactions between NS1 protein and indicated cellular binders were examined by immunoprecipitation (IP)-western. Strep-tagged NS1 protein and one of the FLAG-tagged cellular proteins were co-expressed in 293T cells. Cells were lysed 48 hours post transfection. Cellular proteins were pulled down with FLAG beads and co-eluted NS1 protein was detected using an antibody against Strep (N=3). Figure 6: GO enrichment analysis of genes that were identified as interacting with NS1 through PED. Figure 7: Cellular interaction network of NS1 binders. Interactions with confidence > 0.15 in the STRING database were included. Each node represents a cellular binder, and the width of each edge represents the confidence level of the interaction. FASN, AKT1, PDGFRA, NXF1, CPSF4, SF3B2, and TFAP2C (pink nodes) are known host-dependent factors and the orange nodes are host-restriction factors for influenza viral replication, extracted from the IAV database. The large circle (light yellow circle) indicates the cluster of genes (PABPC1, NXF1, CPSF1, CPSF4, SF3B2) involved in mRNA surveillance. The irregular box (orange shade) indicates the cluster of genes (FASN, AKT1, PDGFRA) in the pathway of regulation of lipid metabolic process.

[0018] Figure 8 to Figure 12: PED facilitates identification of binders of low abundance. Figure 8: GO enrichment analysis of genes that were identified as interacting with NS1 through AP-MS. Figure 9: The Circos plot shows the overlap between proteins identified through expanded PED and through AP-MS. On the outside, each arc represents the identity of a gene list, using the same color code as shown in the legend (the top section is Expanded PED and the bottom section is AP-MS). On the inside, dark gray (dark orange color) represents the genes that appear in both methods and light gray (light orange color) represents genes that are unique to one method. Purple lines link the same genes that are shared by multiple gene lists. Blue lines link different genes that fall into the same ontology term. Figure 10: Venn plot shows the overlap among the proteins identified through expanded PED, AP-MS, and the literature. Figure 11: The cellular abundance of identified protein binders was compared among total cellular proteins, PED, and AP-MS methods. Cellular abundance of each protein was quantified with a published cellular proteome database (PRIDE, project: PXD000418, left panel) and correlated transcript abundance was quantified through RNA-seq (right panel). ***P<0.001 (Wilcoxon rank sum test). Figure 12: Enrichment scores of each CDS are shown for CPSF4 (left panel) and PABPC1 (right panel). Orange shading indicates the previously reported domain interacting with NS1. Data are shown as the average of 6 biological replicates and error bars represent standard error.

[0019] Figure 13 to Figure 19: FASN is required for viral replication and regulated by NS1. Figure 13: The gene expression level and protein expression level of FASN were examined after NS1 expression. 293T cells were transfected with NS1 expression plasmid for the indicated time. Gene expression was examined by real-time PCR and endogenous FASN protein was detected by western blotting (N=3). Figure 14: The effect of FASN on viral replication was examined using the C75 inhibitor. A549 cells were infected with WT virus at an MOI of 0.1. Indicated concentrations of C75 were used at the time of infection. The effect of C75 on viral replication at 24 hours post infection was examined by TCID50 assay (N=6). Figure 16 to Figure 19: The levels of newly synthesized fatty acids or cholesterol upon expression of indicated viral proteins were examined by GC/MS. The control (Ctl) is transfection with a GFP-expressing vector (N=4). Error bars represent the standard deviation for all panels. *P<0.05, **P<0.01, ***P<0.001 (two-tailed t-test).

[0020] Figure 20 to Figure 26: Mechanistic characterization of an NS1 IFN sensitive mutation D92Y. Figure 20: Ratio of average enrichment scores of indicated proteins binding to NS1 versus D92Y mutant (N=6 for NS1 protein and N=2 for D92Y mutant). A lower ratio represents a stronger loss of binding for the D92Y mutant. Figure 21: Interactions between CPSF1 and NS1 proteins (WT and D92Y mutant) were examined by immunoprecipitation. Figure 22: Inhibition of protein expression by WT or D92Y NS1 proteins using a GFP reporter. The GFP reporter was transfected in 293T cells together with WT or D92Y NS1 expression plasmid. Fluorescence intensity was examined 24 hours post transfection. Empty vector was used as a control. Three biological replicates were performed. Representative images are shown. The histogram of fluorescence intensity is shown in the right panel. K-S test of Green Channel intensity distribution between NS1 and Ctl shows p<0.001. Figure 23: Mature GFP mRNA was examined by poly-A specific reverse transcription and real-time PCR relative to GAPDH. Empty vector was used as a control (N=3). Figure 24: Effect of CPSF1 overexpression on replication of WT or D92Y mutant viruses (N=3). Viral titer was examined by TCID50 assay. Figure 25: Enrichment score of each CDS is shown for CPSF1 binding with WT NS1 protein. The range of amino acids of each fragment is marked in gray boxes. Figure 26: Interactions between NS1 proteins and CPSF1 fragments were examined by immunoprecipitation (IP)-western. Error bars represent the standard deviation for Figure 22 and Figure 23. *P<0.05, **P<0.01, ***P<0.001 (two-tailed t-test).

[0021] Figure 27 to Figure 29: Generation of exon library for mRNA display. Figure 27 and Figure 28: Gel picture (Figure 27) and histogram (Figure 28) show the distribution of fragment sizes of the input exon library. Double-stranded DNAs, from 25 bp and 1500 bp, were loaded as size markers. Figure 29: Schematic diagram shows the construct of the exon library for mRNA display. The enriched exon fragment is shown as the shaded (light blue) box. The sequence on the 5’ end of the Exon Fragment is SEQ ID NO: 1 (complementary sequence is SEQ ID NO: 2) and the sequence on the 3’ end of the Exon Fragment is SEQ ID NO: 3 (complementary sequence is SEQ ID NO: 4).

[0022] Figure 30 and Figure 31: Quality control of the exon library. Figure 30: The scatter plot shows the correlation of the cDNA library and mRNA sequencing. Dots represent the frequency of a transcript/cDNA in each library. Figure 31: The histogram shows the frequency of genes in the exon library and cDNA library counted by HTseq software. Medium gray (blue) bars represent the distribution of the exon library, and light gray bars correspond to the cDNA library. Dark gray is the overlap of the medium gray and light gray portions.

[0023] Figure 32: Feasibility of detecting protein-protein interaction using exon display. Enrichment of indicated target protein sequences after one round of enrichment. A 3*HA tag with linker sequence and the influenza NS1 gene were spiked into the human exon library at a frequency of 0.01%. Anti-HA antibodies and monoclonal and polyclonal anti-NS1 antibodies were conjugated onto protein G beads as bait proteins, respectively. The enrichment scores of HA and NS1 sequences after one round of selection were measured by real-time PCR and normalized to input. The first bar in each set is 3*HA and the second bar in each set is NS1.

[0024] Figure 33: Identification of cellular binders of influenza virus NS1 protein. Schematic plot shows the experimental procedures of using PED to identify cellular binders of NS1 protein. Briefly, NS1 with a C-terminal HA tag was expressed in 293T cells. GFP-HA was expressed as a control. The cells were then lysed and centrifuged, and proteins in the supernatant were conjugated to anti-HA beads at 4 degrees overnight. Five washes were performed to clean the conjugated bait proteins. These purified baits were then incubated with the input fusion libraries for three hours, precipitated, and washed, and the precipitated fractions were prepared for next generation sequencing as the output library. A total of 6 replicates were performed.

[0025] Figure 34: Data analysis procedure. The schematic diagram shows the general process of data analysis. Raw sequencing reads were mapped onto the human hg19 reference genome. Enrichment scores of each coding DNA sequence (CDS, representing the exons that encode proteins) were calculated as the relative frequency of the CDS in the selection library to that in the input library. Sequencing reads with wrong orientations or shifted reading frames were filtered out. Genes with enriched CDSs were compared between biological replicates and the consistent ones were selected. To further reduce false positive rates, the enrichment score of each gene transcript was calculated and overlapped with the enriched CDS. The enrichment score of each gene was calculated as the highest score among all gene transcripts of this gene.
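The scoring described for Figure 34 can be illustrated with a minimal Python sketch. It is not the pipeline used in the examples: the function names, the CDS and transcript identifiers, and the read counts are hypothetical, and read mapping, orientation and frame filtering, and replicate comparison are assumed to have been done already.

```python
from collections import defaultdict

def relative_frequencies(counts):
    """Convert raw read counts per CDS into relative frequencies."""
    total = sum(counts.values())
    return {cds: n / total for cds, n in counts.items()}

def cds_enrichment(selected_counts, input_counts):
    """Enrichment = frequency in the selection library / frequency in the input library."""
    sel = relative_frequencies(selected_counts)
    inp = relative_frequencies(input_counts)
    # Only CDSs observed in the input library are scored (avoids division by zero).
    return {cds: sel.get(cds, 0.0) / f for cds, f in inp.items() if f > 0}

def gene_scores(transcript_scores, transcript_to_gene):
    """Gene score = highest enrichment score among the transcripts of that gene."""
    best = defaultdict(float)
    for transcript, score in transcript_scores.items():
        gene = transcript_to_gene[transcript]
        best[gene] = max(best[gene], score)
    return dict(best)

# Hypothetical toy counts, for illustration only.
input_counts = {"cds_A": 100, "cds_B": 100, "cds_C": 200}
selected_counts = {"cds_A": 300, "cds_B": 50, "cds_C": 50}
scores = cds_enrichment(selected_counts, input_counts)

# Hypothetical transcript-level scores and transcript-to-gene mapping.
genes = gene_scores({"tx1": 3.0, "tx2": 0.5, "tx3": 0.25},
                    {"tx1": "G1", "tx2": "G1", "tx3": "G2"})
```

In this toy case a CDS that makes up a quarter of the input reads and three quarters of the selected reads receives a score of 3, and a gene inherits the highest score among its transcripts.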

[0026] Figure 35: Quality control of NS1 enrichment of exon library. The scatter plot shows the correlation of the frequency of gene transcripts (left panel) and exons (right panel) between two replicates, post selection against NS1 protein. The two replicates were randomly selected from the total of 6 replicates that were performed, for the purpose of easier visualization. Selecting different replicates shows similar correlations (R=0.94-0.99 for pair-wise Spearman correlation).
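As an illustration of the pair-wise Spearman correlation mentioned for Figure 35, the following Python sketch ranks each replicate's frequencies (averaging tied ranks) and computes the Pearson correlation of the ranks. The replicate names and frequency values are hypothetical, and the implementation is a plain standard-library sketch rather than the software used in the examples.

```python
from itertools import combinations
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def pairwise_spearman(replicates):
    """Spearman correlation for every pair of named replicates."""
    return {(a, b): spearman(replicates[a], replicates[b])
            for a, b in combinations(sorted(replicates), 2)}

# Hypothetical exon frequencies for three replicates (same exon order in each list).
replicates = {
    "rep1": [0.10, 0.20, 0.40, 0.30],
    "rep2": [0.15, 0.25, 0.50, 0.35],
    "rep3": [0.40, 0.10, 0.20, 0.30],
}
correlations = pairwise_spearman(replicates)
```

Because the coefficient depends only on rank order, it is insensitive to differences in sequencing depth between replicates.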

[0027] Figure 36: Confirmation of NS1 binding cellular proteins. Complementary to Figure 5, interactions between NS1 proteins and the newly identified cellular binders were examined by Co-IP-western. HA-tagged NS1 protein and FLAG-tagged cellular protein were co-expressed in 293T cells. Cells were lysed 48 hours post transfection. NS1 was pulled down with HA beads and co-eluted cellular proteins were detected using FLAG antibody.

[0028] Figure 37: Enriched GO pathways correlated between Expanded PED and AP-MS. GO enrichment analysis was performed for proteins identified by the expanded PED method and AP-MS. The enrichment score for each pathway is shown.

[0029] Figure 38: FASN is required for viral replication. The effect of FASN on viral replication was examined with shRNA knock-down. shRNA construct targeting CPSF1 was transfected in 293T cells. The expression level of FASN was examined by western blot (upper panel). Viral replication capacity in cells with or without FASN knockdown was examined using mCherry reporter assay (bottom panel).

[0030] Figure 39: Low cell toxicity of C75. Cell viability post C75 treatment was measured by CCK8 assay at 24 hours post-treatment. The doses that were used have low cell toxicity but have significant impact on viral replication (Figure 14), which is shown in log scale.

[0031] Figure 40: Effect of NS1 on lipid and cholesterol synthesis. Panels A-D: The percentage of synthesized lipid over total lipid is shown for myristic acid (14:0), palmitic acid (16:0), palmitoleic acid (16:1) and stearic acid (18:0). Data are shown as the average of 4 biological replicates and error bars represent standard deviation. No significant differences in cholesterol level were detected among different conditions. *P<0.05, **P<0.01, ***P<0.001 (two-tailed t-test).

[0032] Figure 41 and Figure 42: D92Y NS1 mutant has reduced binding to CPSF1. Figure 41: Scatter plot shows the MiST score of all proteins detected by AP-MS using WT or mutant NS1 protein as bait. 38 proteins showed increased binding to the mutant D92Y protein, while 80 showed reduced binding to the mutant, including CPSF1 and the known CPSF complex member FIP1L1. Figure 42: Interactions between NS1 proteins (WT and D92Y mutant) and CPSF1 were examined by in vitro binding. FLAG-tagged CPSF1 protein was expressed in 293T cells, purified by FLAG antibody, and eluted with FLAG peptides. Binding was performed by incubating purified CPSF1 with HA-tagged NS1 that was conjugated to beads by HA antibody.

[0033] DETAILED DESCRIPTION OF THE INVENTION

[0034] Disclosed herein is an exon mRNA display method for the detection of protein-protein interactions, referred to herein as “Protein interaction detection by Exon Display (PED)”, that uses an exon library to achieve an even representation across a proteome. PED leverages high-throughput sequencing to quantify the interactions of a given bait protein with a given exon library presented by mRNA display.

[0035] As disclosed herein, the multi-functional NS1 protein of influenza A virus (IAV) was used as a model bait to exemplify PED. In comparison with an AP-MS approach conducted in parallel and with the published literature, PED identified several previously described interactors. In addition, several new interactions were discovered using PED, and these new interactions were then validated through immunoprecipitation and immunoblotting. Because of the exon library design, the protein-protein interactions are traceable to potential domains of each interacting protein. Additionally, PED can be used to examine the mechanisms underlying the function and/or activity of a given protein or mutant by, for example, comparing the differential cellular binders of the given protein or mutant with a similar protein or the wildtype protein. PED can also be used as a complementary approach for the identification of PPIs, particularly for the identification of low abundance interactors.

[0036] Specifically, as disclosed herein, using a human exon display library, 25 cellular binders that potentially have direct interactions with NS1 were identified using PED. These results were correlated with AP-MS data and the literature, revealing novel NS1 interactions with FASN, AKT1, and PMSB10. It was found that NS1 can directly bind fatty acid synthase (FASN) and affect its function; that FASN protein expression was up-regulated with NS1 protein expression alone or during viral infection; and that cellular lipid synthesis was significantly up-regulated upon NS1 protein expression. PED was also used to examine the differential cellular binders of wild type NS1 and a single point mutant, D92Y, which had been previously shown to weaken the ability of NS1 to disrupt the interferon response. It was found that the D92Y mutant failed to engage CPSF1, likely resulting in an increased host response.

[0037] PED provides several advantages over affinity purification-mass spectrometry (AP-MS). For example, PED overcomes some of the limitations of AP-MS by converting the task of detecting a large number of different proteins to that of detecting nucleic acid sequences, thereby significantly increasing the sensitivity and reproducibility. The relatively even distribution of the exon library enables equal representation of cellular proteins, regardless of their expression levels in specific types of cells. Moreover, the high-throughput, comprehensive sequencing of both the input and bound output allows some quantitative analysis of binding affinity and increases the confidence level of hit identification. Due to the isolation of specific protein domains, interactions that are identified are likely to be direct, which simplifies downstream analysis and hypothesis generation.

[0038] PED also presents advantages over mRNA display using natural cDNA libraries. The evenly distributed input exon libraries enable the identification of interactors with a single round of selection, and no specific elution conditions are needed for target binders. Moreover, as the input libraries are extracted from genomic DNA, one need not re-generate cDNA libraries for each tissue type of a given organism. The information obtained from PED also provides more robust analyses, as one can directly map the sequence reads to exons without the need for peak calling and peak comparison for fragmented cDNA libraries. These advantages also allow scaling up PED for parallel experiments. For these reasons, PED is a unique and advantageous way to detect and analyze protein-protein interactions.

[0039] As provided in the detailed experiments herein, the input library was constructed through exon enrichment of a randomly fragmented DNA library. As such, the translated protein fragments may not correlate with an intact exon, and thereby the proteins in the library may not accurately represent the proteome of any particular cell type. Although C-terminal FLAG tags and a GFP bait control were used to filter out prematurely stopped or aggregated proteins, and fragments with the wrong orientation or frame-shifts were removed during data analysis, some such protein fragments may still remain. Therefore, in some embodiments, the input library may be modified or optimized to minimize protein fragments that do not represent intact exons by, for example, using methods in the art to control the length and composition of the input library or to utilize an evenly distributed ORF library.

[0040] Because PED is based on direct physical protein-protein interactions, it cannot be used to characterize other indirect protein interactions. Additionally, PED cannot detect interactions that depend on posttranslational modifications or depend on tertiary folds within a protein structure that may cross multiple exons. Further, protein interactions with low affinity or of a transient nature may be difficult to detect or analyze using PED. Therefore, in some embodiments, PED may be used in combination with other methods in the art, such as genome-wide Bi-FC, proximity labeling, protein correlation profiling methods, and the like.

[0041] Although PED is exemplified herein using NS1 of influenza A virus (IAV), PED may be used as a quantitative high-throughput method to examine other PPIs of proteins such as those of Hepatitis C Virus (HCV), Human Immunodeficiency Virus (HIV), Zika Virus (ZIKV), other influenza viruses, and the like, alone or in parallel. As an example of using PED for “parallel” analysis, one may examine the property of 10⁵ or more mutants of a given protein in a single experiment by coupling saturating mutagenesis and high-throughput sequencing. This allows for the discovery of many loss-of-function mutations and offers distinctive opportunities for mechanistic studies. Differential screening is an important approach to elucidate the mechanisms underlying loss of function mutations on the viruses and to uncover cellular functions that are activated or inhibited by viral proteins. Through quantitative analysis of direct protein-protein interactions, PED allows one to explore the PPIs across such genetic mutants in a high-throughput manner to uncover related mechanisms. Thus, PED is a new assay method for the identification of PPIs, which can be used to interpret functional variations among protein variants and mutants by quantitatively profiling their interactomes.

[0042] Therefore, the present invention provides an assay method for protein-protein interactions, which comprises obtaining an exon library of interest, preparing an mRNA library by transcribing the exons of the exon library, generating a peptide library by translating the mRNA sequences of the mRNA library, generating an mRNA display library by linking the peptides of the peptide library with the mRNA sequences of the mRNA library, generating an input library of cDNA sequences by reverse transcription of the mRNA sequences of the mRNA display library, enriching the cDNA sequences of the input library to obtain an enriched library of enriched cDNA sequences by using one or more proteins of interest as bait proteins, and obtaining enriched peptides from the enriched cDNA sequences, contacting the enriched peptides with the one or more proteins of interest, and analyzing any interactions between the enriched peptides and the one or more proteins of interest. In some embodiments, the steps from preparing the mRNA library to obtaining enriched cDNA sequences are repeated one or more times using the enriched cDNA sequences as the exon library in the repeated steps. In some embodiments, the step of preparing the mRNA library comprises adding a T7 promoter and a FLAG peptide sequence. In some embodiments, the step of preparing the mRNA library comprises linking the transcribed exons to puromycin. In some embodiments, the step of generating the mRNA display library comprises purifying the peptide-mRNA complexes by FLAG tag selection. In some embodiments, the step of enriching the cDNA sequences comprises contacting the cDNA sequences of the input library with the bait proteins and amplifying the cDNA sequences that bind the bait proteins by PCR amplification. In some embodiments, the cDNA sequences of the input library and/or the enriched cDNA sequences of the enriched library are sequenced. In some embodiments, the exon library is generated from fragmented DNA. In some embodiments, the exon library is generated from a given cell type and/or organism of interest. In some embodiments, the exon library is obtained from genomic DNA. In some embodiments, the exon library is obtained from fragmented genomic DNA. In some embodiments, the method further comprises minimizing protein fragments that do not represent complete exon sequences by using methods in the art to control the length and composition of the input library or utilizing an open reading frame (ORF) library that is substantially evenly distributed. An ORF library that is substantially evenly distributed is one wherein the frequencies of the ORFs in the library are substantially equal. A substantially evenly distributed ORF library may be obtained using methods in the art. For example, one can extract and/or amplify ORFs using methods that are not biased by transcription or initial abundance (i.e., initial copy numbers in a starting sample). In some embodiments, the enriched peptides that bind the one or more proteins of interest and/or the resulting protein-protein complexes are further analyzed, e.g., sequenced, subjected to X-ray crystallography, etc.
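One way to check whether the ORF frequencies in a library are substantially equal is to compute an evenness index from sequencing read counts. The Python sketch below uses normalized Shannon entropy, which is an assumption made for illustration only: this disclosure does not prescribe a particular metric or threshold, and the function name and counts are hypothetical.

```python
from math import log

def evenness(counts):
    """Normalized Shannon entropy of read counts: 1.0 means all ORFs are equally frequent."""
    if len(counts) < 2:
        return 1.0  # a single-member library is trivially even
    total = sum(counts)
    # Zero-count members contribute nothing to the entropy but still count toward library size.
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * log(p) for p in probs)
    return entropy / log(len(counts))

# Hypothetical read counts per ORF.
uniform_library = [10, 10, 10, 10]
skewed_library = [1000, 1, 1, 1]

uniform_score = evenness(uniform_library)
skewed_score = evenness(skewed_library)
```

An index near 1 indicates a near-uniform library, while a library dominated by a few abundant members scores near 0.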

[0043] Kits

[0044] In some embodiments, the present invention provides kits for performing one or more assays as described herein. In some embodiments, the kits comprise one or more reagents, e.g., blocking buffers, assay buffers, diluents, wash solutions, etc. In some embodiments, the kits comprise additional components such as interpretive information, control samples, reference levels, and standards.

[0045] In some embodiments, the kits include a carrier, package, or container that may be compartmentalized to receive one or more containers, such as vials, tubes, and the like. In some embodiments, the kits optionally include an identifying description or label or instructions relating to its use. In some embodiments, the kits include information prescribed by a governmental agency that regulates the manufacture, use, or sale of compounds and compositions as contemplated herein.

[0046] Applications

[0047] The methods and kits as contemplated herein may be used in the evaluation of protein interactions with a protein of interest. In some embodiments, the methods and kits may be used for experiments to elucidate the mechanism of action of a given protein. The methods and kits may be used to elucidate the underlying mechanisms of a given disease, develop and/or screen for candidate protein-based therapeutics that may be used to treat the given disease, and/or assess the efficacy of a given protein-based therapeutic for treating the given disease.

[0048] The methods and kits may be used to identify diseases that are caused by a given protein and/or identify mutant proteins that are involved in the pathology of a given disease. The methods and kits may be used to study mechanisms, e.g. , mechanisms and pathways involving given protein. The methods and kits may be used to develop and screen for therapeutics that reduce or block the binding of a given protein to its intended protein (binding partner). [0049] The following examples are intended to illustrate but not to limit the invention.

[0050] EXAMPLES

[0051] As a proof-of-principle, PED was applied to Influenza A Virus (IAV) NS1 to better understand the PPIs that drive the multi-functionality of this critical host regulator. PED enabled us to validate several previously described PPIs as well as identify several novel cellular binders, interactors that are of low abundance in the cell. Interaction with one such host protein, fatty acid synthase (FASN), allows NS1 to directly regulate cellular fatty acid synthesis and lipid metabolism. As quantified by gas chromatography-mass spectrometry (GC-MS), lipid synthesis was significantly upregulated by NS1 protein expression. In addition, PED was used to examine and compare PPIs of IAV wild type NS1 and a single amino acid point mutant, D92Y, which is known to abrogate NS1’s role in blocking the interferon response. While most interactions are conserved, the D92Y mutant failed to bind CPSF1 and, as a result, failed to suppress immune activation at the transcriptional level. PED is highly complementary to current methodologies for PPI discovery, enabling both the detection of low abundance interactors and interaction domain mapping. Furthermore, the use of high-throughput DNA sequencing as the readout for PED enables sensitive quantification of interactions, ultimately enabling massively parallel experimentation for the investigation of the cellular protein interactome.

[0052] Establishment of a Human Exon Library for mRNA Display to Detect Protein Interactions

[0053] Human genomic DNA was extracted from the peripheral blood mononuclear cells (PBMCs) of two independent, anonymous donors (Figure 1). The genomic DNA was fragmented and filtered to a size of 300-700 bp (Figure 27 & Figure 28). Two rounds of exon enrichment were performed to generate an exon DNA library. T7 promoter and FLAG tag DNA sequences were added onto the 5’ and 3’ termini of each enriched exon fragment, respectively (Figure 29). Following in vitro transcription and mRNA purification, a puromycin linker was ligated onto the 3’ ends of the mRNA. In vitro translation was then performed using rabbit reticulocyte lysate. FLAG purification was used for pre-selection of the input library, an essential step to remove untranslated mRNAs or prematurely stopped proteins. mRNA/cDNA duplexes were then generated through reverse transcription to remove secondary structures of mRNAs, prevent mRNA degradation during selection procedures, and enable PCR amplification post-selection. This final displayed exon library includes complexes of nascent proteins and their corresponding mRNA/cDNA, which can be used to screen for binding to immobilized protein baits, a technique referred to herein as Protein interaction detection by Exon Display (PED). High-throughput sequencing can then be used to monitor the frequency change of each exon fragment before and after selection enrichment.

[0054] The library quality post-FLAG purification was evaluated using next-generation sequencing. Raw sequencing reads were mapped onto the human reference genome (hg19). Exon frequencies in the biological duplicates were highly correlated (R=0.99, Figure 2). 55% of total input reads contained exon sequences, an over 70-fold enrichment of exon representation relative to the human genome (1.4%). Between two donors, on average, 66% (1268138/1967640) of exons and 92% (707359/723789) of coding DNA sequences (CDS) were covered.

[0055] As an mRNA display approach using a cDNA library was previously reported for the identification of cellular binders using a multi-stage selection protocol, a cDNA library was generated from A549 cells to compare the coverage with the exon library. As expected, the frequency of each gene transcript in the cDNA library correlated highly with transcript frequency as monitored by RNA-Seq, with heavy representation bias towards highly expressed genes (Figure 30). In contrast, the frequency distribution of gene transcripts was significantly more even for the exon library (two-sample KS test, p < 0.001), peaking at 4-6 reads per million (RPM, Figure 3). HTSeq software was used to count the gene features of the two libraries. Consistently, the exon library showed a more even distribution in the frequency of gene counts compared with the cDNA library (two-sample KS test, p < 0.001, Figure 31). For the cDNA library, 0.4% of genes had read counts of more than 1000 RPM, while this number was reduced to 0.01% (a 40-fold decrease) for the exon library. The even distribution of the input library ensures a more equal representation of each gene in the final selection pool, especially of those genes with a low expression level in cells.
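By way of non-limiting illustration, the two-sample Kolmogorov-Smirnov (KS) comparison of library frequency distributions referred to above may be sketched as follows. This is a minimal sketch, not the analysis code actually used: the function name `ks_statistic` and the per-gene RPM values are hypothetical, and only the KS statistic D (the largest gap between the two empirical cumulative distributions) is computed. A p-value would ordinarily be obtained from a statistics package, e.g., `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D: the maximum absolute difference
    between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to evaluate the gap at every observed point.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical per-gene RPM values, for illustration only.
cdna_rpm = [0.1, 0.2, 0.5, 1.0, 2000.0, 5000.0]  # skewed toward a few abundant genes
exon_rpm = [3.0, 4.0, 5.0, 5.0, 6.0, 7.0]        # more even representation
d_stat = ks_statistic(cdna_rpm, exon_rpm)
```

A larger D indicates that the two frequency distributions differ more strongly; identical samples give D = 0 and fully separated samples give D = 1.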

[0056] To ensure the exon library would enable the detection of protein interactions, a trial immunoprecipitation experiment was performed to test whether the RNA-puromycin-protein prey could efficiently bind to the corresponding antibody as the bait. A 3xHA tag sequence and an influenza NS1 sequence were spiked into the exon library to a frequency of 0.01%, and a new input library of fusion proteins was synthesized. Anti-HA antibody (monoclonal) and anti-NS1 antibodies (monoclonal and polyclonal) were immobilized onto protein G beads as baits and incubated with the fusion library. The enrichment of HA or NS1 sequences after one round of selection was examined by real-time qPCR. Normalized to input, a 2-16 fold enrichment of the expected prey was observed (Figure 32). Taken together, these results demonstrate the quality of the exon library and the capability of the mRNA displayed exon library to detect PPIs.

[0057] Identification of Cellular Binders of Influenza Virus NS1 Protein

[0058] As a proof-of-principle, the Protein interaction detection by Exon Display (PED) method was employed to examine the cellular binders of the influenza A virus (IAV) protein, NS1 (A/WSN/33 (H1N1) strain). NS1 is important for efficient virus replication, being largely responsible for counteracting the host immune response and interfering with multiple cellular pathways. NS1 with a C-terminal HA tag was expressed in 293T cells and conjugated to anti-HA beads alongside GFP-HA as a control. Extensive washing steps were performed to clean the conjugated bait proteins. These purified baits were then incubated with the fusion libraries for three hours, precipitated, and washed, and the precipitated fractions were prepared for deep sequencing as the output library (Figure 33). Enrichment scores of each exon and corresponding gene transcript were calculated as the ratio of the relative frequency in the selected output library to that in the input library. The highest enrichment score of functional transcripts was selected to represent the binding capacity of the corresponding protein to the bait (Figure 34). Six biological replicates of NS1 selection were performed, revealing a modest correlation (Figure 35). After filtering out potential sticky binders that were enriched in the GFP control condition, 25 proteins were enriched in all six replicates and were thus identified as potential cellular binders interacting with NS1 (Figure 4, Table 1). Among the identified binders, four (CPSF4, CPSF1, PABPC1, NFX1) were previously reported and validated by Immunoprecipitation (IP)-Western, and one (SF3B2) was identified through previously published AP-MS screening.

[0059] Of the 25 potential interactors, 10 were chosen for further validation by co-IP (Figure 5). These 10 genes include previously identified binders, as well as novel binders that may have functional roles in influenza virus replication pathways as suggested by gene ontology (GO) enrichment (Figure 6). Each cellular gene was cloned into a FLAG-tagged vector and co-expressed with C-terminal 2xStrep-tagged NS1 in 293T cells. FLAG-tagged GFP and an empty vector were used as controls. Upon pull-down of the FLAG-tagged proteins, all 10 binders co-precipitated NS1 with the exception of APMD3 and TGFB2 (Figure 5). Notably, three unreported binders of NS1, namely PSMB10, AKT1, and FASN, were identified and validated. These three interactions were also confirmed by reciprocal co-IP while APMD3 and TGFB2 again failed to validate (Figure 36). GO analysis revealed that the mRNA surveillance pathway and the regulation of lipid metabolic process are the top ranked pathways enriched in NS1 binders (Figure 6). The identified binders were mapped onto the cellular gene interaction network using the STRING (search tool for recurring instances of neighboring genes) database in the art (Figure 7). The interaction between NS1 and the mRNA cleavage and polyadenylation specificity factor (CPSF) complex (including CPSF1 and CPSF4 here) has been well described and is essential for shut-off of the host response. Besides the cluster of RNA surveillance and transport related proteins, AKT1 is a hub protein that has multiple connections. The NS1 protein is known to induce the activation of the PI3K/AKT pathway, which supports viral replication, but no direct physical connection with AKT1 has been previously reported. Moreover, inhibition of AKT activity was shown to restrict viral growth. Besides AKT, a few other novel interactors have known functional roles in viral replication (Figure 7). The identified binders were significantly enriched for influenza virus host factors when compared with cellular proteins globally (chi-square test, p=0.002), suggesting a linkage between physical and functional interactions.

[0060] Comparison between PED and AP-MS

[0061] To obtain a detailed comparison between PED and AP-MS, quantitative AP-MS was performed using the same IAV NS1 clone. NS1 and GFP as a control were cloned with a C-terminal 2xStrep tag into a lentiviral vector and the subsequent virus used to transduce A549 cells. An antibody against the Strep tag was used to affinity purify the baits and affiliated protein complexes in three biological replicates. Samples were subjected to on-bead digest and the resultant peptides analyzed by tandem mass spectrometry using methods in the art. As NS1 is known to interact with the interferon (IFN) pathway and the basal expression level of many IFN-stimulated genes is low in A549 cells, these experiments were performed in the presence and absence of 12-hour pre-treatment with type I interferon (IFND at 1,000 U/ml). Interacting proteins identified by mass spectrometry were scored for confidence based on their specificity, reproducibility, and abundance using the MiST scoring algorithm in the art. A total of 317 proteins were found to interact with NS1 with a MiST score > 0.8: 161 proteins were found regardless of treatment condition, 41 were identified only in the absence of IFN, and 115 were identified only in the presence of IFN (Table 2). Among the 25 genes that were identified with high confidence by PED, only DDX6 and CPSF1 were identified by both methodologies. Nevertheless, GO analysis revealed an enrichment of the same major pathways, including RNA processing and RNA 3’ processing (Figure 8).

[0062] One explanation for the limited overlap is that PED relies on the expression of shorter exon fragments, not whole proteins, and so preferentially identifies direct interactors while AP-MS will pull down entire protein complexes. To examine this, the PED dataset was expanded by the inclusion of known protein complexes and compared again to the AP-MS data. All validated human protein complexes that included any of the 25 interactors were extracted from the CORUM database, increasing the dataset to 232 proteins (Table 3). In this extended dataset, 20 proteins overlapped directly with the AP-MS result while the enriched pathways displayed extensive overlap (Figure 9, Figure 10, Figure 37). In addition, 558 proteins reported in the literature to interact with NS1 were identified, including data from three AP-MS studies, one yeast-two-hybrid study, and all factors from the VirHost database. 49 (21%) of the extended PED binders and 155 (49%) of the AP-MS binders overlapped directly with these prior reports. Overall, these data suggest that while both AP-MS and PED can be used to identify high-confidence interactors, they have distinct advantages for the identification of protein complexes and direct binders, respectively.
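By way of non-limiting illustration, the expansion of a set of direct binders by known complex membership, and the subsequent overlap calculation, may be sketched as follows. This is a minimal sketch under stated assumptions: the function names `expand_with_complexes` and `overlap_fraction`, the toy complex table, and the toy AP-MS list are hypothetical and do not reproduce the actual datasets. Only the last two lines use figures reported above.

```python
def expand_with_complexes(hits, complexes):
    """Return the hits plus every member of any complex that
    contains at least one of the original hits."""
    hit_set = set(hits)
    expanded = set(hit_set)
    for members in complexes.values():
        members = set(members)
        if hit_set & members:  # complex shares a member with the direct hits
            expanded |= members
    return expanded


def overlap_fraction(query, reference):
    """Fraction of the query set that is also present in the reference set."""
    query = set(query)
    if not query:
        return 0.0
    return len(query & set(reference)) / len(query)


# Toy inputs, for illustration only (not the actual complex database).
direct_hits = {"CPSF1", "FASN"}
toy_complexes = {
    "toy CPSF complex": {"CPSF1", "CPSF4", "FIP1L1"},
    "toy unrelated complex": {"GENE_X", "GENE_Y"},
}
toy_apms_hits = {"CPSF1", "FIP1L1", "DDX6"}

extended = expand_with_complexes(direct_hits, toy_complexes)
shared = overlap_fraction(extended, toy_apms_hits)  # 2 of 4 extended binders

# Percentages reported in the text, recomputed from the stated counts.
ped_vs_literature = f"{49 / 232:.0%}"    # "21%"
apms_vs_literature = f"{155 / 317:.0%}"  # "49%"
```

The expansion is deliberately non-transitive: a complex is added only when it contains one of the original direct binders, not when it merely shares a member with a previously added complex.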

[0063] One important feature and potential advantage of PED over AP-MS is that it should enable the identification of low abundance interactors. These are often missed by proteomic techniques and are hard to detect using traditional purification techniques. To quantitatively examine this point, the cellular abundance of identified binders in the matched PED and AP-MS datasets was compared. Mass spectrometry data of the A549 cellular proteome were obtained from PRIDE and used to calculate the average abundance of each set of interactors using methods in the art. A significantly lower abundance of PED interactors compared with AP-MS interactors was observed (Figure 11). This was further confirmed at the transcript level based on the A549 RNA-Seq data. Consistently, the binders identified by PED had significantly lower transcript levels than those identified by AP-MS (Figure 11).

[0064] An additional advantage of the PED data is that it might indicate the exact binding domains for each interactor. As the exon library was largely composed of individual exons or exon fragments, the interaction is localized to specific and identifiable regions of each protein. For example, CPSF4 and PABPC1 both appeared as significant interactors in the PED dataset and both have been previously validated as NS1 interactors. Loops 2 and 3 (the entirety of exon 3 and part of exon 4) of CPSF4 were shown to be important for NS1 binding based on a co-crystal structure (PDB: 2RHK), and residues 365-535 of PABPC1 were found to be essential for the NS1 interaction based on deletion mapping. Examining the enrichment score of each exon in the PED data, exon 3 of CPSF4, which encodes the reported binding region, showed the strongest enrichment for binding with NS1 (Figure 12). Similarly, exons 8-11 were also among the highest enriched segments in PABPC1, overlapping with the reported binding region (Figure 12). However, some false positive exons peaking outside the known binding area, such as exon 1 of CPSF4 and exon 5 of PABPC1, were also observed, possibly due to the technical limitations of the current PED method.

[0065] Collectively, the above results suggest that PED is capable of identifying cellular interactors of proteins in a manner complementary to AP-MS approaches. PED also offers the potential advantages of identifying low abundance interactors and facilitating the identification of binding domains.

[0066] NS1 Regulates Cellular Lipid Metabolism by Binding to FASN

[0067] Among the detected binders, FASN (Fatty Acid Synthase) was chosen for functional analysis. FASN is a multi-functional protein that is critical for catalyzing and regulating fatty acid synthesis in mammalian cells. It was reported to be a host factor of influenza virus, but has not been previously shown to bind to any influenza viral protein. While influenza virus infection significantly induces fatty acid biosynthesis, the mechanisms driving this induction have remained elusive. The data above demonstrate an interaction between influenza virus NS1 and FASN (Figure 5, Figure 35). Upon NS1 expression by transient transfection, the mRNA expression level of FASN did not significantly change; however, the protein steady-state level increased (Figure 13). FASN was knocked down by transient transfection of shRNA in 293T cells, which were co-transfected with an mCherry reporter for viral replication. Infection of these cells with wild type WSN virus (strain A/WSN/33) indicated a drop in viral replication with decreased FASN protein expression (Figure 38). Virus replication also decreased with an increasing concentration of a FASN inhibitor, C75, while cell viability was unaffected (Figure 14, Figure 39).

[0068] To determine if expression of the NS1 protein can impact fatty acid synthesis directly, gas chromatography-mass spectrometry (GC-MS) was used to quantitatively measure de novo synthesized fatty acids using ¹³C isotopomer enrichment analysis. Inducible lentiviral vectors expressing influenza A virus NS1 (strain A/WSN/33 and strain A/Cal/04/09 (H1N1 pdm)), PB2 (strain A/WSN/33), PA (strain A/WSN/33), and GFP as a control, were constructed. Lentiviruses were used to transduce A549 cells, which were induced to express the respective protein products with doxycycline for 24 hours. Cells were then switched to complete media containing 50% U¹³C-glucose for 24 hours to label de novo synthesized lipids. Absolute amounts of de novo synthesized fatty acids, including myristic acid (14:0), palmitic acid (16:0), palmitoleic acid (16:1), and stearic acid (18:0), and of cholesterol were determined by GC-MS. Isotopologue distributions were fit by Isotopomer Spectral Analysis (ISA) using methods in the art. Myristic acid, palmitic acid, and stearic acid are the direct products of FASN, while palmitoleic acid is an indirect product of FASN following desaturation. The NS1 proteins from both influenza A virus strains increased the de novo synthesis of all tested fatty acids relative to the GFP control (Figure 15-Figure 18, Figure 40). While a slight increase in myristic acid was observed upon PB2 expression, no other viral protein tested caused a significant change in any of the monitored fatty acids. Furthermore, NS1 overexpression did not cause any significant change in the synthesis of cholesterol, whose levels are not controlled by FASN (Figure 19). These results suggest that the binding of NS1 to FASN results in the up-regulation of fatty acid synthesis and lipid metabolism and that this up-regulation is beneficial to virus replication.

[0069] Revealing the Mechanism Underlying an IFN-Sensitive NS1 Mutation

[0070] As the enrichment of the exon library by PED gives a semi-quantitative evaluation of the binding efficiency, one potential application is the direct comparison of PPI binding by different protein variants and mutations. This can be particularly valuable for the mechanistic interrogation of mutations discovered by genetic screens. Towards this end, the interaction profile of a previously described NS1 mutant, D92Y, that was discovered in a high-throughput genetic screen for interferon (IFN) sensitive variants was investigated. Cells infected with influenza A viruses containing the D92Y mutation produce higher amounts of IFN compared to infection with wild-type virus, indicating an inability of this mutant NS1 to inhibit IFN induction, but the exact mechanism of this loss-of-function is unknown. To determine if this phenotype could be due to a change in the mutant NS1 protein’s PPI profile, PED was performed with NS1 D92Y (strain A/WSN/33). Among the 25 binders identified for the wild-type protein (Figure 4), only one interactor, CPSF1, failed to bind the D92Y mutant (Figure 20). In comparison, the AP-MS derived profiles of the wild-type protein and D92Y NS1 mutant revealed 80 proteins with loss of binding affinity to the mutant, including CPSF1 and the known CPSF complex member FIP1L1 (Figure 41, Table 2). While this large change in PPI profile in the AP-MS data likely reflects global changes to complex recruitment and protein localization, it does not suggest a clear mechanistic hypothesis like the PED result, again pointing to the high complementarity of these two approaches. Together, the data from PED and AP-MS suggest that the D92Y mutation results in a loss of binding to the CPSF complex. In support of this, immunoprecipitation of HA-tagged wild-type NS1, but not the D92Y mutant, in 293T cells resulted in co-immunoprecipitation of Flag-tagged CPSF1 (Figure 21). Furthermore, purified CPSF1-Flag was found to interact with NS1-HA in vitro, but this interaction was reduced by the D92Y mutation (Figure 42).

[0071] CPSF1 is the largest component of the CPSF complex, which is critical for pre-mRNA 3' processing, cleavage, and poly(A) addition. By blocking the function of the CPSF complex, influenza A virus NS1 inhibits cellular mRNA transport and protein expression, including the expression of many anti-viral interferon-stimulated genes. To examine the inhibition of cellular protein expression, a GFP expression plasmid, pEGFP-C1, was used as a reporter. While wild-type NS1 efficiently down-regulated the expression of the GFP reporter, the D92Y mutant had no impact on GFP expression relative to the control (Figure 22). The reduced protein expression is consistent with reduced mRNA levels, as quantified by real-time PCR after reverse transcription with oligo-dT, relative to the expression of GAPDH (Figure 23). Conversely, over-expression of CPSF1 to prevent NS1 disruption of the CPSF complex results in significant inhibition of wild-type influenza A virus replication, but not of the D92Y mutant virus, which already lacks CPSF complex recruitment (Figure 24).

[0072] CPSF1 is a large, multi-domain protein and its binding interface with NS1 has not been previously mapped. By mining the PED data, enrichment of exons located at the N-terminus of the protein, especially exons 5 and 6, was observed (Figure 25). To test if this was indeed the binding site, the secondary structure and exon arrangement of CPSF1 were examined and the protein was divided into 6 small regions that should still fold properly (Figure 25). All fragments were expressed well in 293T cells upon transient transfection. Immunoprecipitation of each fragment revealed that only fragment 1, corresponding to amino acids 1-313 and exons 1-8, pulled down NS1 (Figure 26), consistent with the region predicted by PED.

[0073] MATERIALS AND METHODS

[0074] Cells, Viruses, and Plasmids

[0075] An eight-plasmid reverse genetics system of influenza A/WSN/33 virus (WSN) was utilized to reconstitute WT and mutant viruses using methods in the art. 293T cells were cultured in DMEM (Corning) with 10% FBS (Corning). A549 cells were cultured in RPMI 1640 (Corning) with 10% FBS (Corning). 293T cells were used for transfection of mammalian expression plasmids to overexpress viral and cellular proteins. A549 cells were used for transduction of the lentivirus vector expressing each bait protein for AP-MS. NS1, PB2, PB1, PA, NP (from WSN strain), NS1 from Cal09, and GFP protein were cloned into the pcDNA5 mammalian expression vector with a 1xHA affinity tag. WSN NS1 protein was cloned into the Lenti-X Tet-one inducible expression plasmid with a 2xStrep affinity tag at the C-terminus. Cellular proteins with a 1xFLAG tag were purchased from Harvard plasmid database or Origene, or amplified from cellular mRNA/cDNA and cloned into the pCMV mammalian expression vector. NS1 protein with the D92Y mutation was generated using a PCR-based site-directed mutagenesis strategy.

[0076] Generation of Exon Library for mRNA Display

[0077] PBMCs, collected from anonymous donors, were obtained from the UCLA CARF Virology Core. Genomic DNAs were extracted using the DNeasy Blood and Tissue Kit (Qiagen). DNAs were fragmented using Covaris focused-ultrasonic technology, and size-selected with a range from 300 bp-700 bp. The fragmented DNAs were end-repaired (NEB), dA-tailed using Klenow exo- (NEB), and ligated with the customized Y shape adaptor as below.

5'-GGAGCCGCTACCCTTATCGTCGTCATCCTTGTAATCTGCCTGGCTTCCAGTGGAGCT (SEQ ID NO: 5)

3'-CCCTGTTAATGATAAATGTTAATGGTGGTACCGAAGGTCACCTCG-p (SEQ ID NO: 6)

[0078] Fragments were amplified and hybridized twice to a human exon array using the Roche NimbleGen SeqCap kit (Roche). The hybridized fragments were finally enriched with the customized primers linking to the Y shape adaptor, as below. A T7 promoter and Kozak sequence (5’ end), as well as a constant 3’ linker sequence encoding a Flag-tag, were encoded in the adaptor sequence for affinity purification from the in vitro translation reaction.

T7Koz : TTCTAATACGACTCACTATAGGGACAATTACTATTTACAATTACCACCATGG (SEQ ID NO: 7)

Lib Rev: GGAGCCGCTACCCTTATCGTCG (SEQ ID NO: 8)

[0079] Generation of cDNA Library for mRNA Display

[0080] mRNAs from A549 cells were extracted using Trizol (Thermo Fisher) and fragmented with 10 mM magnesium at 94°C for 4 mins. Reverse transcription was performed with the SuperScript III system (Invitrogen) using poly-dT (IDT) as primers. Second strand synthesis was performed using the mRNA second strand synthesis module (NEB). The resulting double-stranded DNAs were end-repaired, dA-tailed, and ligated with the customized Y shape adaptor, same as for the exon library.

[0081] Expression and Purification of Bait Proteins

[0082] Open reading frames of viral proteins were cloned into the pcDNA5 mammalian expression plasmid with a C-terminal HA tag. For each replicate, about 150 million 293T cells were transfected with 100 µg DNA plasmid with Calcium Phosphate transfection reagents (Clontech). Cells were lysed at two days post-transfection with binding buffer (50 mM Tris-HCl pH 7.4, 0.5% NP-40, 150 mM KCl, 1 mM EDTA and protease inhibitors). Cell lysates were incubated with HA beads (Sigma-Aldrich) overnight at 4°C with constant agitation, and washed 5 times with wash buffer (50 mM Tris-HCl pH 7.4, 2% NP-40, 300 mM KCl, 1 mM EDTA and protease inhibitors).

[0083] mRNA Display and Sequencing

[0084] For each reaction, the exon library (DNA templates) was transcribed by T7 run-off transcription (Ambion), and 1 nmole of mRNA was ligated to the pF30P linker (Phospho-polyA-spacer9-spacer9-spacer9-ACC-puromycin, 1.2 nmoles) via the splint oligonucleotide (1.1 nmoles) by T4 DNA ligase (NEB) in a 200 µL reaction. After purification and isolation of ligated mRNA templates, in vitro translation was performed using reticulocyte lysate (Ambion) in a 100 µL reaction volume, followed by incubation with KCl (500 mM final) and MgCl2 (60 mM final) for 30 minutes at room temperature to enhance fusion formation. The mRNA-protein fusions were then affinity-purified using M2 anti-Flag beads (Sigma-Aldrich) to remove sequences containing nonsense mutations and non-fused RNA templates and proteins. After elution with 3xFlag peptide, the fusions were reverse transcribed with SuperScript III (Invitrogen) and a fraction of the purified sample was reserved to determine the frequencies of each coding sequence in the input library. The purified fusion sample was incubated with bait protein for 3 hours at 4°C. After washing, the immobilized fusion samples were eluted by heat (95°C) and PCR amplified using the following primers (T7-Rec, Lib Rev). The amplified DNA fragments from input and post-selection samples were then prepared for high-throughput sequencing using Illumina HiSeq PE150. Barcodes of 6 bp were added to distinguish among different samples. Six biological replicates were performed for each bait protein, including the in vitro transcription, translation, and enrichment steps.

T7-Rec: GGGACAATTACTATTTACAATTACCACCATGG (SEQ ID NO: 9)

Lib Rev: GGAGCCGCTACCCTTATCGTCG (SEQ ID NO: 4)

[0085] Sequencing Data Analysis

[0086] Data were analyzed by customized bash and Python scripts. Paired-end fastq reads were de-multiplexed into corresponding samples by the 6 bp barcodes. Reads were mapped to the human reference genome (hg19) using TopHat2 with default parameters. Unmatched or multi-matched (> 2) reads were filtered out. The coding frame of each fragment was analyzed by the mapping position and corresponding CDS position. Only the reads with correct orientation and frame for defined ORFs on cDNAs were considered. The read counts for each CDS and transcript feature were calculated using bamtools. The enrichment score for each CDS was calculated as the ratio of the relative frequency of a particular CDS in the selection pool to that in the input library. The enrichment of transcripts was also evaluated by comparing the combined frequency of all related CDSs in the selection library to that in the input library.
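By way of non-limiting illustration, the per-CDS enrichment score described above (relative frequency in the selection pool divided by relative frequency in the input library) may be sketched as follows. This is a minimal sketch rather than the customized scripts actually used: the function names, the CDS identifiers, and the read counts are hypothetical, and a CDS with zero input reads is simply skipped because its ratio is undefined.

```python
def relative_frequencies(counts):
    """Convert raw read counts per CDS into fractions of the library total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {cds: n / total for cds, n in counts.items()}


def enrichment_scores(input_counts, output_counts):
    """Enrichment score per CDS: relative frequency after selection
    divided by relative frequency in the input library."""
    in_freq = relative_frequencies(input_counts)
    out_freq = relative_frequencies(output_counts)
    scores = {}
    for cds, f_in in in_freq.items():
        if f_in == 0:
            continue  # score undefined without input reads (assumption: skip)
        scores[cds] = out_freq.get(cds, 0.0) / f_in
    return scores


# Hypothetical read counts, for illustration only.
input_counts = {"cds_A": 100, "cds_B": 100, "cds_C": 200}
output_counts = {"cds_A": 300, "cds_B": 50, "cds_C": 50}
scores = enrichment_scores(input_counts, output_counts)
```

In this toy example `cds_A` rises from 25% of the input to 75% of the selected pool, giving a score of 3, while `cds_B` and `cds_C` are depleted (scores below 1). Normalizing to library totals makes the score independent of sequencing depth.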

[0087] To reduce noise, only genes that had both enriched CDSs (the highest CDS enrichment score of the corresponding gene > 2) and enriched transcripts (enrichment score > 1), and that were consistent among replicates, were considered hits for the corresponding bait protein. Genes that showed enrichment in the GFP bait control were considered to be noise and filtered out. The enrichment score of a specific gene was calculated as the highest enrichment score of the corresponding functional transcripts, representing the binding possibility of the corresponding cellular gene to the bait protein.

Enrichment Score_Gene = Max (Enrichment Score_transcripts)
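By way of non-limiting illustration, the hit-calling criteria described above may be sketched as follows. This is a minimal sketch under stated assumptions: the function names, gene identifiers, and scores are hypothetical; "consistent among replicates" is taken to mean enriched in every replicate; and enrichment in the GFP control is assumed to be judged with the same two cutoffs, which the description does not specify.

```python
def gene_score(transcript_scores):
    """Gene-level score: the highest enrichment score among its transcripts."""
    return max(transcript_scores)


def is_enriched(max_cds_score, transcript_score,
                cds_cutoff=2.0, transcript_cutoff=1.0):
    """Both criteria must hold: best CDS score > 2 and transcript score > 1."""
    return max_cds_score > cds_cutoff and transcript_score > transcript_cutoff


def enriched_genes(sample):
    """sample maps gene -> (highest CDS score, transcript score)."""
    return {gene for gene, (cds, tx) in sample.items() if is_enriched(cds, tx)}


def call_hits(bait_replicates, control_replicates):
    """Genes enriched in every bait replicate and in no control replicate."""
    if not bait_replicates:
        raise ValueError("at least one bait replicate is required")
    consistent = set.intersection(*(enriched_genes(r) for r in bait_replicates))
    sticky = set().union(*(enriched_genes(r) for r in control_replicates))
    return consistent - sticky


# Hypothetical scores, for illustration only.
rep1 = {"GENE_A": (3.1, 1.5), "GENE_B": (2.5, 1.2), "GENE_C": (4.0, 0.8)}
rep2 = {"GENE_A": (2.2, 1.1), "GENE_B": (2.9, 1.4), "GENE_C": (5.0, 2.0)}
gfp_control = {"GENE_B": (2.6, 1.3)}

hits = call_hits([rep1, rep2], [gfp_control])
best = gene_score([0.9, 3.1, 2.0])
```

Here `GENE_C` is dropped because its transcript score falls below the cutoff in one replicate, and `GENE_B` is dropped as a sticky binder enriched in the control, leaving `GENE_A` as the only hit.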

[0088] Affinity Purification-Mass Spectrometry (AP-MS)

[0089] WT and D92Y mutant NS1 ORFs were cloned into the pLVX-TetOne-Puro vector with 2xStrep tag at the N-terminus. A 2xStrep tagged GFP was cloned as a control. Lentiviruses were generated by transfecting 293T cells with the pLVX-TetOne-Puro vector, Gag-Pol packaging construct, and VSV-G envelope. A549 cells were transduced with the generated lentiviruses and selected under 1 µg/ml puromycin. The expression of desired viral proteins was confirmed by western blot.

[0090] For immunoprecipitation, A549 cells expressing different viral proteins (or the control GFP protein) were induced with 1 µg/ml Dox for 24 hours. They were left untreated or treated with 1000 U/ml type I interferon for 12 hours before harvesting. Cells were lysed, cleared of cellular debris, and bound with 20 µL of Strep-Tactin Sepharose beads (IBA Lifesciences) in 550 µL IP buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, and 1 mM EDTA). Beads were washed 4 times (2 times with 0.05% NP-40 and 2 times without) prior to on-bead protein digest. Streptactin-purified proteins were reduced and alkylated on beads with 20 µL reduction-alkylation buffer (50 mM Tris-HCl, pH 8.0, 2 M Urea, 1 mM Dithiothreitol (DTT), 3 mM iodoacetamide). An additional 3 mM DTT was added to quench the reaction, and proteins were digested with 0.75 µg trypsin (Invitrogen). Formic acid was added to a final concentration of 1% to acidify the peptides. Peptides were desalted using Agilent OMIX C18 10 µL tips.

[0091] Digested peptides were subjected to LC-MS/MS analysis using an Easy-nLC 1000 coupled to a dual-pressure linear ion trap (Velos Pro) Orbitrap Elite mass spectrometer (Thermo Fisher Scientific, San Jose, CA). Peptides were eluted by a gradient of 5% to 30% acetonitrile in 0.1% formic acid in 110 minutes delivered at a flow rate of 300 nL/minute. For each cycle, one full MS scan (150-1500 m/z, resolution of 120,000) in the Orbitrap was followed by 20 data-dependent MS/MS scans fragmented by normalized collision energy (setting of 35%) and acquired in the linear ion trap. Raw MS files were analyzed by MaxQuant version 1.3.0.3 and MS/MS spectra searched by the Andromeda search engine against a database containing reviewed SwissProt human and influenza protein sequences (20,226 total). The MiST scoring algorithm was used to assign scores to bait-prey interactions against the GFP controls.

[0092] Network and GO Analysis

[0093] Gene ontology enrichment analysis was performed through Metascape. STRING was used as the cellular PPI database. Network analysis was performed using the NetworkX package in Python.

[0094] Gas Chromatography-Mass Spectrometry (GC-MS)

[0095] GC-MS was performed using methods in the art. Briefly, cells were cultured in a 1:1 ratio of U¹³C glucose tracer for 24 hours. Prior to collection, cells were imaged on a Molecular Devices ImageXpress XL to assess cell numbers. Cells were then dissolved in 6 M Guanidine HCl and transferred to glass tubes for derivatization with 3 M methanolic guanidine HCl. Samples were prepared alongside standard curve samples made up of FAMES mix (Nu-chek Prep, GLC 20a) and Cholesterol (Sigma, C8667).

[0096] Total cellular fatty acids were prepared by mild acid methanolysis (Ichihara and Fukubayashi, 2010) with the modifications previously described in York et al. (2015). Integration and quantification were performed with the MassHunter Quantitative Analysis Program (Agilent Technologies, B.06.00). Total quantities of fatty acids and cholesterol, and the relative contributions of synthesis to the respective pools over the labeling period, were determined by fitting the isotopologue distributions using Isotopomer Spectral Analysis (ISA).

[0097] Examining the Cellular Abundance of Detected Binders

[0098] The protein abundances of binders detected by the PED method and by AP-MS were compared. Mass spectrometry raw files of the A549 cellular proteome were downloaded from PRIDE (project: PXD000418). Raw files were searched against the UniProt human proteome database using the ProLuCID search engine with protein FDR < 0.01. The “AP-MS” protein category comprised proteins identified in the NS1 AP-MS experiments with a score > 0.8 in either IFN-treated or non-treated A549 cells. A549 RNA-seq quantification with 3 biological replicates was downloaded from ENCODE (experiment ENCSR937WIG).

[0099] Immunoprecipitation and In Vitro Binding

[0100] Immunoprecipitation experiments were performed with HA- and FLAG-tagged proteins expressed in 293T cells. Briefly, cells were transfected with the corresponding expression plasmids using Lipofectamine 2000 reagent (Invitrogen), and lysed at two days post-transfection with RIPA buffer (50 mM Tris-HCl pH 7.4, 0.5% NP-40, 150 mM NaCl, 1 mM EDTA and protease inhibitors). For binding experiments with NP protein, 300 mM NaCl was used to further increase the stringency. Cell lysates were incubated with 1 µg anti-FLAG overnight at 4°C with constant agitation, washed with RIPA buffer 5 times and eluted with 60 µL of SDS-PAGE sample buffer. All samples were subjected to SDS-PAGE and western blotting analysis.

[0101] For in vitro binding experiments, FLAG-tagged CPSF1 was expressed in 293T cells. Cells were lysed at two days post-transfection with RIPA buffer, and bound with anti-FLAG antibody for 4 hours at 4°C with constant agitation. CPSF1 protein was then eluted with 3X FLAG peptide overnight at 4°C with constant agitation. HA-tagged WT or D92Y mutant NS1 protein was expressed in 293T cells, lysed at two days post-transfection and bound with anti-HA overnight at 4°C. The bound samples were washed 5 times with RIPA buffer and incubated with eluted CPSF1 protein for 4 hours at 4°C. Samples were then washed with RIPA buffer 5 times and eluted with 60 µL of SDS-PAGE sample buffer. All samples were subjected to SDS-PAGE and western blotting analysis.

[0102] Western Blotting

[0103] Proteins in SDS-PAGE sample buffer were heated at 95°C, resolved by SDS-PAGE, and then transferred onto PVDF membrane. Proteins were detected with antibodies against FLAG-epitope, HA-epitope, or actin.

[0104] shRNA

[0105] shRNA against FASN was ordered from Sigma (SHCLNG-NM_004104). Non-mammalian shRNA control (SHC002) was used as the scramble control. FASN was knocked down by transient transfection of shRNA in 293T cells, co-transfected with an mCherry reporter for viral replication. At 24 hours post-transfection, cells were infected with wild-type WSN virus (strain A/WSN/33) at an MOI of 0.1. The mCherry reporter intensity was examined by fluorescence microscopy at 24 hours post-infection.

[0106] Viral Replication Assay (TCID50, Viral Copy Number, mCherry Reporter)

[0107] TCID50 assays were performed in A549 cells by observing cytopathic effect (CPE). Viral copy numbers were measured by real-time PCR against a standard curve, using the following primers targeting the NP segment.

NP-Forward: GAC GAT GCA ACG GCT GGT CTG (SEQ ID NO: 10)

NP-Reverse: ACC ATT GTT CCA ACT CCT TT (SEQ ID NO: 11)
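Quantification against a standard curve can be sketched as follows. This is an illustrative calculation only: the standard dilutions and Ct values below are hypothetical, and the fitting approach (ordinary least squares of Ct against log10 copies) is a common convention assumed here rather than a procedure stated in this disclosure.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical standards: ten-fold dilutions of a plasmid of known copy number.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [30.0, 26.7, 23.4, 20.1, 16.8]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = copies_from_ct(25.05, slope, intercept)
efficiency = amplification_efficiency(slope)
```

A slope near -3.3 cycles per ten-fold dilution corresponds to an efficiency close to 1.0, which is a common sanity check before trusting the interpolated copy numbers.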

[0108] 50 ng of virus-inducible mCherry reporter was transfected into 293T cells in 24-well plates. Media were changed 24 hours post-transfection and cells were infected with the indicated virus for 24 hours. The mCherry signal was observed under a fluorescence microscope as an indicator of the replication capacity of the virus.

[0109] Generation of Mutant Viruses

[0110] Individual mutant viral plasmids were generated with the QuikChange system. To generate the mutant virus, about 2 million 293T cells were transfected with 10 µg DNA. To measure the growth curve, about 1 million A549 cells were infected at an MOI of 0.1 and supernatants were collected at the indicated times to measure viral titer.

[0111] Quantification of mRNA (GFP, poly A)

[0112] 293T cells were transfected with WT or mutant NS1 protein (100 ng) and GFP reporter (10 ng). At 24 hours post-transfection, total cellular RNAs were extracted from the cells with the PureLink RNA Mini Kit (Ambion) and reverse transcribed by SuperScript III Reverse Transcriptase (Thermo Fisher) using oligo-dT as primer. Quantitative real-time PCR was performed using Taq polymerase and SYBR Green.

[0113] Fragmentation of CPSF1

[0114] To avoid disrupting protein structures and restricting protein expression, the secondary structure and intrinsically unstructured regions of CPSF1 were evaluated and the entire protein was divided into 6 small regions. Six fragments of CPSF1 were constructed by cloning the corresponding regions into the pCMV6 vector.

[0115] Statistical Analysis

[0116] Detailed statistical analyses are included in each figure legend. Results were analyzed by two-tailed Student’s t test. Differences were considered statistically significant when p < 0.05 (*), p < 0.01 (**) or p < 0.001 (***).

[0117] Data Availability

[0118] Scored mass spectrometry data files are provided in Table 2. The sequencing data are deposited in the NIH Short Read Archive (SRA) under accession numbers PRJNA520773 and PRJNA383938.

[0119] EXEMPLARY PROTOCOL

[0120] The following is an exemplary PED protocol:

[0121] Generation of Exon Library

1) Purify genomic DNA from PBMCs

2) DNA fragmentation and size selection

3) End repair, dA tailing, ligation

4) Amplification after ligation

5) Hybridization and elution

6) Final enrichment

7) End repair and dA tailing of the fragmented library

8) Ligation to customized Y-shaped adaptor to add T7 promoter and FLAG tag (as below)

5'-GGAGCCGCTACCCTTATCGTCGTCATCCTTGTAATCTGCCTGGCTTCCAGTGGAGCT (SEQ ID NO: 1)

3'-CCCTGTTAATGATAAATGTTAATGGTGGTACCGAAGGTCACCTCG-p (SEQ ID NO: 2)

9) PCR to amplify the ligated product

After enrichment, wash, and perform a 2nd PCR step. Preferably, use a 3:1 ratio of test library to normal library and add equal amounts of forward/reverse custom and standard primers for the PCR step. The Qubit concentration was about 17.1 ng/µL (about 45 µL total).
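Step 8 states that the adaptor adds a FLAG tag. As a quick consistency check, the sketch below takes the reverse complement of SEQ ID NO: 1 and scans it for a stretch that translates to the FLAG epitope. The epitope sequence DYKDDDDK is the standard FLAG definition and is an assumption here, since the protocol does not restate it; the codon table is deliberately limited to the Asp, Tyr, and Lys codons of the standard genetic code.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Only the codons needed to spell the FLAG epitope (standard genetic code).
CODONS = {"GAT": "D", "GAC": "D", "TAT": "Y", "TAC": "Y", "AAA": "K", "AAG": "K"}

FLAG_PEPTIDE = "DYKDDDDK"  # assumed standard FLAG epitope, not restated in the protocol

# SEQ ID NO: 1 as listed in step 8, written 5' to 3'.
ADAPTOR_SEQ1 = "GGAGCCGCTACCCTTATCGTCGTCATCCTTGTAATCTGCCTGGCTTCCAGTGGAGCT"

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_peptide(dna, peptide):
    """Return the first offset at which dna translates to peptide, else None."""
    width = 3 * len(peptide)
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        translated = "".join(
            CODONS.get(window[i:i + 3], "X") for i in range(0, width, 3)
        )
        if translated == peptide:
            return start
    return None

sense_strand = reverse_complement(ADAPTOR_SEQ1)
flag_start = find_peptide(sense_strand, FLAG_PEPTIDE)
flag_coding = None if flag_start is None else sense_strand[flag_start:flag_start + 24]
```

A non-`None` result for `flag_start` indicates that the strand complementary to SEQ ID NO: 1 carries a FLAG coding stretch; whether it is in frame with a given library insert is not checked by this sketch.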

[0122] mRNA Display

• For first round, scale up to cover desired complexity

▪ PCR to generate gExon Library from stock (generation of the gExon library is detailed in the gExon Lib protocol)

Stock is generated by exon enrichment from human PBMC

400 µL PCR reaction system, 40 µL template, using high-fidelity enzyme KOD

56°C, 6-8 cycles, primers to use:

T7 Tmv Koz : TTCTAATACGACTCACTATAGGGACAATTACTATTTACAATTACCACCATGG (SEQ ID NO: 3)

Lib Rev: GGAGCCGCTACCCTTATCGTCG (SEQ ID NO: 4)

Check PCR by gel: should be about 300-700 bp

[0123] In Vitro Transcription

o 150-300 µL reaction volume

o DNA template concentration is about 100-200 nM (measure the pool DNA)

o 5x transcription buffer:

▪ 10 mM spermidine

▪ 125 mM MgCl2

▪ 400 mM Tris (pH = 7.5)

o 5x NTP = 20 mM each

o Incubate at 65°C for 3-5 minutes

o Cool 1 min, then add T7 RNA polymerase (Ambion or Promega)

o Incubate at 37°C for 2-3 hours

o Add 1/10th volume of 0.5 M EDTA to dissolve the phosphate precipitate, purify using PCR-cleanup column (use 1 column per 150 µL rxn). Elute with 50 µL X 2 RNase-free H2O

o Take OD to measure concentration

Exemplary 300 µL system:

DNA 100 µL

5x buffer 60 µL

10x DTT 30 µL

5x NTP 60 µL

T7 15 µL

H2O 35 µL
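Because the protocol allows 150-300 µL reactions, the exemplary 300 µL system can be rescaled proportionally. The sketch below does this with simple arithmetic; scaling every component (including the enzyme) linearly is an assumption made for illustration, and the function name is hypothetical.

```python
# Volumes in µL for the exemplary 300 µL transcription reaction listed above.
EXEMPLARY_300_UL = {
    "DNA": 100.0,
    "5x buffer": 60.0,
    "10x DTT": 30.0,
    "5x NTP": 60.0,
    "T7": 15.0,
    "H2O": 35.0,
}

def scale_recipe(recipe, target_total_ul):
    """Scale every component linearly so the volumes sum to target_total_ul."""
    base_total = sum(recipe.values())
    factor = target_total_ul / base_total
    return {name: round(volume * factor, 2) for name, volume in recipe.items()}

# Example: the lower end of the stated 150-300 µL range.
half_scale = scale_recipe(EXEMPLARY_300_UL, 150.0)
```

Summing the exemplary volumes first also serves as a check that the listed components account for the full 300 µL.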

[0124] Ligation

o 100-200 µL reaction volume

o 8-10 µM RNA

o 11 µM splint, 12 µM linker (pF30P)

o Then anneal at 65°C for 1 min

o 20 µL T4 DNA ligase buffer, 2 µL T4 DNA ligase

o Room temp, 60 minutes

Exemplary 200 µL system

8 µL splint (up to 10 µL)

4 µL linker (up to 5 µL)

150 µL RNA

14 µL H2O

o 65°C 1 min

o Add 20 µL buffer

o Room temp 1 min

o Ice 1 min

o Add 4 µL T4 ligase

o PCR purification

o Elute with 50 µL X 2 RNase-free H2O

Splint (splint20) : TTTTTTTTTTTTGGAGCCGCTACCCTTATCGT (SEQ ID NO: 12)

Linker (pF30P): Phospho-poly A-spacer9-spacer9-spacer9-ACC-puromycin

[0125] Lambda Exonuclease Digestion (to remove the splint)

Exonuclease https://www.neb.com/products/m0262-lambda-exonuclease

200-400 µL reaction system, 37°C, 1 h, PCR purification

Elute with 50 µL X 2 RNase-free H2O

Exemplary 400 µL system:

Ligated RNA 100 µL

10x buffer 40 µL

Lambda exonuclease 12 µL

H2O 248 µL

[0126] dT beads purification (to remove the non-ligated portion) using Dynabeads® mRNA DIRECT™ Kit (Ambion 61011)

o Prepare oligo(dT)25 Dynabeads as follows:

i. Use an appropriate amount of oligo(dT)25 Dynabeads (50 µL per 50 µL sample)

ii. Collect the beads on a magnetic stand

iii. Wash twice with an equal volume of 1X binding buffer

o Resuspend the Dynabeads in an appropriate volume of 1X binding buffer (50 µL per sample). Aliquot the Dynabeads to each well/tube containing total RNA in 50 µL binding buffer. Mix by pipetting

o Heat to 65°C for 2 min.

o Incubate at room temperature for 5-10 min with occasional shaking

o Collect and wash the beads twice with 150 µL of washing buffer A

o Collect and wash the beads with 150 µL of washing buffer B

o Resuspend the Dynabeads in 20-30 µL of RNase-free H2O, heat at 75°C for 5 min, then move directly to an ice-cold magnetic stand

o Collect the H2O eluate and measure OD; if needed, do another round of binding and elution

In some embodiments, gel purification of the ligated product may be used in place of lambda exonuclease digestion and dT beads purification.

[0127] Ethanol precipitation of purified ligated mRNA (optional)

Add 2-2.5 volumes of 100% ethanol to the purified mRNA

Add 1/10 volume of sodium acetate

Centrifuge at 12000 rpm for 5 min at 4°C

Wash once or twice with 90% ethanol

Dry and dissolve with H2O

The product is then ready for translation. It can be stored at -20°C; however, the remaining steps are preferably done without storage.

[0128] Translation

Option 1: Use PURE system (E. coli, https://www.neb.com/products/e6800-purexpress-invitro-protein-synthesis-kit)

Exemplary 25 µL translation system:

Solution A 10 µL

Solution B 7.5 µL

RNase Inhibitor 1 µL

Nuclease-free H2O x µL

Ligated mRNA x µL (0.4 µM RNA (ligated) template)

Total 25 µL

37°C for 1.5 h

o Then add salt to facilitate fusion formation, per 50 µL rxn:

▪ 12 µL 3 M KCl, 4 µL 1 M MgCl2

(preferably 18 µL 2 M KCl, 4 µL 1 M MgCl2)

▪ Incubate at room temperature for 30 minutes

Option 2: Use Ambion retic lysate (rabbit)

▪ For the Fn template, 100 mM KOAc is preferred

Exemplary 25 µL translation system:

17 µL lysate

1.25 µL 20X translation buffer

4 µL RNA

2.75 µL water

0.4 µM RNA (ligated) template

1 hour at 30°C
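Both exemplary systems call for about 0.4 µM ligated RNA template in 25 µL, with the RNA concentration taken from an OD reading. The sketch below shows one way to go from an A260 reading to the volume of stock needed. It relies on two common approximations that are assumptions here, not values from this protocol: 1 A260 unit corresponds to roughly 40 ng/µL of single-stranded RNA, and the average mass is about 340 g/mol per nucleotide. The example reading, dilution, and 500 nt length are hypothetical.

```python
def rna_ng_per_ul(a260, dilution_factor=1.0):
    """Approximate RNA concentration (ng/µL), assuming 1 A260 ~ 40 ng/µL ssRNA."""
    return a260 * 40.0 * dilution_factor

def rna_micromolar(ng_per_ul, length_nt, avg_nt_mass=340.0):
    """Convert ng/µL to µM, assuming ~340 g/mol per nucleotide."""
    return ng_per_ul * 1000.0 / (length_nt * avg_nt_mass)

def template_volume_ul(stock_um, final_um=0.4, reaction_ul=25.0):
    """Volume of RNA stock giving final_um in reaction_ul (C1*V1 = C2*V2)."""
    volume = final_um * reaction_ul / stock_um
    if volume > reaction_ul:
        raise ValueError("RNA stock is too dilute for the requested final concentration")
    return volume

# Hypothetical reading: A260 of 0.85 on a 50-fold dilution, ~500 nt average length.
stock_ng_per_ul = rna_ng_per_ul(0.85, dilution_factor=50.0)
stock_um = rna_micromolar(stock_ng_per_ul, length_nt=500)
volume_needed = template_volume_ul(stock_um)
```

For a library with a broad length distribution, the average length used in the conversion is itself an estimate, so the resulting molarity should be treated as approximate.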

Note that the Kozak sequence and the distance between the Kozak sequence and the ATG differ between the two systems. The current primer is best matched to the mammalian (rabbit) system, but can also be used with the E. coli system.

▪ Any of the buffers below can be used for binding and washing:

TBK (K rather than Na, since there is more K inside the cell) - mild selection condition

25 mM Tris

150 mM KCl

0.02% Tween (detergent; 0.5% NP40 can also be used)

Note: The buffer condition can be changed, e.g., glycerol can be added.

▪ Anti-FLAG preselection

o Wash 10 µL M2 agarose beads / reaction with TBK

o Add 10 µL 0.5 M EDTA. Dilute to 500 µL with TBK, and add to 10 µL M2 agarose beads / reaction (can do binding in spin-x cup)

o Bind 1 hour, 4°C, then wash 5 times (1500 rcf, 1 min/20 s, 4°C) with cold TBK

o Change to a new EP tube, elute by incubating with 100 µL of 1X RT buffer (5X first strand buffer, Invitrogen) with 0.2 mg/ml 3X FLAG peptide at room temperature, rotating half circle for 30 minutes

o Spin at RT, 1500 rcf, 1 min, two times (with and without parafilm)

▪ Reverse transcription before selection to prevent RNA degradation and secondary structure

o 100 µL reaction volume

o 150 pmoles Fn oligo #10 (use concentrated stock, 1.5 µL of 100 µM primer)

o 2 µL of concentrated dNTP mix, 25 mM each

o 100 µL elution buffer

o 1 µL of SuperScript III

o 20°C for 10 min, 42°C for 60 min

Exemplary reaction

RNA 1 µL

5X FS 20 µL

FnOligo 1.5 µL

dNTP 2 µL

H2O 74.5 µL

SSIII 1 µL

[0129] Affinity Enrichment

o For cellular targets, prepare one day before: bind to the proper beads and wash extensively (5 times with detergent) before use

o Bind beads (loaded with target) in 0.2 mL buffer (can do binding in spin-x filter cup), cold room for 2-3 hours

o Wash 5 times in 0.2 mL TBK

o Elute with 50-100 µL H2O by heating at 95°C for 1 min, then spin at RT, 1500 rcf, 1 min; the eluate can be stored at -20°C

o PCR amplify the eluted enriched library using:

T7-Short: GGGACAATTACTATTTACAATTACCACCATGG (SEQ ID NO: 5)

Lib Rev: GGAGCCGCTACCCTTATCGTCG (SEQ ID NO: 4)

KOD system

10X KOD buffer 2.5 µL

dNTP 2.5 µL

MgSO₄ 1.5 µL

Primer 1.5 µL for each

Template 1 µL

KOD enzyme 0.5 µL

dd H2O _

Total 25 µL

Input about 16 cycles, and enriched elution usually takes about 26 cycles.
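The dd H2O volume is left blank in the KOD system above. If water is simply used to bring the reaction to the stated 25 µL total (an assumption, since the protocol does not say so), the volume follows from subtracting the listed components. The sketch below performs that subtraction; the dictionary keys and function name are hypothetical labels.

```python
# Listed KOD reaction components in µL; "1.5 µL for each" primer is entered twice.
KOD_MIX_UL = {
    "10X KOD buffer": 2.5,
    "dNTP": 2.5,
    "MgSO4": 1.5,
    "T7-Short primer": 1.5,
    "Lib Rev primer": 1.5,
    "Template": 1.0,
    "KOD enzyme": 0.5,
}
TOTAL_UL = 25.0

def water_to_volume(components, total_ul):
    """Water needed to reach total_ul, assuming water fills the remainder."""
    water = total_ul - sum(components.values())
    if water < 0:
        raise ValueError("Listed components already exceed the total volume")
    return water

dd_h2o_ul = water_to_volume(KOD_MIX_UL, TOTAL_UL)
```

The same helper can be reused when the template volume is changed, for example when more eluate is needed for the enriched library.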

[0130] REFERENCES

1. Tripathi, S. et al. Meta- and Orthogonal Integration of Influenza ‘OMICs’ Data Defines a Role for UBR4 in Virus Budding. Cell Host Microbe 18, 723-735 (2015).

2. Watanabe, T. et al. Influenza virus-host interactome screen as a platform for antiviral drug development. Cell Host Microbe 16, 795-805 (2014).

3. Brass, A. L. et al. Identification of host proteins required for HIV infection through a functional genomic screen. Science 319, 921-926 (2008).

4. Konig, R. & Stertz, S. Recent strategies and progress in identifying host factors involved in virus replication. Curr. Opin. Microbiol. 26, 79-88 (2015).

5. Karlas, A. et al. Genome-wide RNAi screen identifies human host factors crucial for influenza virus replication. Nature 463, 818-822 (2010).

6. Gack, M. U. et al. Influenza A virus NS1 targets the ubiquitin ligase TRIM25 to evade recognition by RIG-I. Cell Host Microbe 5, 439-449 (2010).

Shapira, S. D. et al. A Physical and Regulatory Map of Host-Influenza Interactions Reveals Pathways in H1N1 Infection. Cell 139, 1255-1267 (2009).

Lee, S. et al. An integrated approach to elucidate the intra-viral and viral-cellular protein interaction networks of a gamma-herpesvirus. PLoS Pathog. 7, (2011).

Konig, R. et al. Human host factors required for influenza virus replication. Nature 463, 813-817 (2010).

Shapira, S. D. et al. A Physical and Regulatory Map of Host-Influenza Interactions Reveals Pathways in H1N1 Infection. Cell 139, 1255-1267 (2009).

Wang, L. et al. Comparative influenza protein interactomes identify the role of plakophilin 2 in virus restriction. Nat. Commun. 8, 1-12 (2017).

Levy, M. J., Washburn, M. P. & Florens, L. Probing the Sensitivity of the Orbitrap Lumos Mass Spectrometer Using a Standard Reference Protein in a Complex Background. J. Proteome Res. 17, 3586-3592 (2018).

Joung, J. K., Ramm, E. I. & Pabo, C. O. A bacterial two-hybrid selection system for studying protein - DNA and protein - protein interactions. (2000).

Fields, S., Song, O. & Schnier, J. A novel genetic system to detect protein-protein interactions. Nature 340, (1989).

Smith, G. P. & Petrenko, V. A. Phage display. Chem. Rev. 97, 391-410 (1997).

Kim, S.-H. & Park, S.-Y. Selection and characterization of human antibodies against hepatitis B virus surface antigen (HBsAg) by phage-display. Hybrid. Hybridomics 21, 385-392 (2002).

GP, S. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315-1317 (1985).

McCafferty, J., Griffiths, A. D., Winter, G. & Chiswell, D. J. Phage antibodies: filamentous phage displaying antibody variable domains. Nature 348, 552 (1990).

Vega, N. M., Allison, K. R., Khalil, A. S. & Collins, J. J. Signaling-mediated bacterial persister formation. Nat. Chem. Biol. 8, 431-3 (2012).

Galan, A. et al. Library-based display technologies: where do we stand? Mol. BioSyst. 12, 2342-2358 (2016).

Pepper, L. R., Cho, Y. K., Boder, E. T. & Shusta, E. V. A decade of yeast surface display technology: Where are we now? Comb. Chem. High Throughput Screen. 11, 127-134 (2008).

Cherf, G. M. & Cochran, J. R. Yeast Surface Display. 1319, 155-175 (2015).

Weaver-Feldhaus, J. M. et al. Yeast mating for combinatorial Fab library generation and surface display. FEBS Lett. 564, 24-34 (2004).

Manuscript, A. & Libraries, N. P. Ribosome Display and Related Technologies. 805, 287-297 (2012).

Schaffitzel, C., Hanes, J., Jermutus, L. & Plückthun, A. Ribosome display: An in vitro method for selection and evolution of antibodies from libraries. J. Immunol. Methods 231, 119-135 (1999).

Cujec, T. P., Medeiros, P. F., Hammond, P., Rise, C. & Kreider, B. L. Selection of v-abl tyrosine kinase substrate sequences from randomized peptide and cellular proteomic libraries using mRNA display. Chem. Biol. 9, 253-264 (2002).

Wilson, D. S., Keefe, A. D. & Szostak, J. W. The use of mRNA display to select high-affinity protein-binding peptides. Proc. Natl. Acad. Sci. 98, 3750-3755 (2001).

Wang, H. & Liu, R. Advantages of mRNA display selections over other selection techniques for investigation of protein-protein interactions. Expert Rev. Proteomics 8, 335-346 (2011).

Huang, B. C. & Liu, R. Comparison of mRNA-display-based selections using synthetic peptide and natural protein libraries. Biochemistry 46, 10102-10112 (2007).

Olson, C. A., Wu, N. C. & Sun, R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643-2651 (2014).

Olson, C. A. et al. Rapid mRNA-Display Selection of an IL-6 Inhibitor Using Continuous-Flow Magnetic Separation. Angew. Chem. Int. Ed. 50, 8295-8298 (2011).

Olson, C. A. et al. Single-Round, Multiplexed Antibody Mimetic Design through mRNA Display. Angew. Chem. Int. Ed. 51, 12449-12453 (2012).

Shen, X. et al. Scanning the human proteome for calmodulin-binding proteins. Proc. Natl. Acad. Sci. U. S. A. 102, 5969-74 (2005).

Ju, W. et al. Proteome-wide identification of family member-specific natural substrate repertoire of caspases. Proc. Natl. Acad. Sci. USA 104, 14294-14299 (2007).

Hale, B. G., Randall, R. E., Ortin, J. & Jackson, D. The multifunctional NS1 protein of influenza A viruses. J. Gen. Virol. 89, 2359-76 (2008).

Kochs, G., Garcia-Sastre, A. & Martinez-Sobrido, L. Multiple Anti-Interferon Actions of the Influenza A Virus NS1 Protein. J. Virol. 81, 7011-7021 (2007).

Das, K. et al. Structural basis for suppression of a host antiviral response by influenza A virus. Proc. Natl. Acad. Sci. U. S. A. 105, 13093-8 (2008).

Burgui, I., Aragon, T., Ortin, J. & Nieto, A. PABP1 and eIF4GI associate with influenza virus NS1 protein in viral mRNA translation initiation complexes. J. Gen. Virol. 84, 3263-3274 (2003).

Hale, B. G., Randall, R. E., Ortin, J. & Jackson, D. The multifunctional NS1 protein of influenza A viruses. J. Gen. Virol. 89, 2359-2376 (2008).

Dubois, J., Terrier, O. & Rosa-Calatrava, M. Influenza viruses and mRNA splicing: Doing more with less. MBio 5, 1-13 (2014).

Szklarczyk, D. et al. The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362-D368 (2017).

Ehrhardt, C. et al. Influenza A Virus NS1 Protein Activates the PI3K/Akt Pathway To Mediate Antiapoptotic Signaling Responses. J. Virol. 81, 3058-3067 (2007).

Hirata, N. et al. Inhibition of Akt kinase activity suppresses entry and replication of influenza virus. Biochem. Biophys. Res. Commun. 450, 891-898 (2014).

Davis, Z. H. et al. Global mapping of herpesvirus-host protein complexes reveals a transcription strategy for late genes. Mol. Cell 57, 349-360 (2015).

Batra, J. et al. Protein Interaction Mapping Identifies RBBP6 as a Negative Regulator of Ebola Virus Replication. Cell 175, 1917-1930.e13 (2018).

Ruepp, A. et al. CORUM: The comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36, 646-650 (2008).

Ruepp, A. et al. CORUM: The comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res. 38, 497-501 (2009).

Guirimand, T., Navratil, V. & Lyon, D. VirHostNet 2.0: surfing on the web of virus/host. Nucleic Acids Res. 43, 583-587 (2015).

Hultin-Rosenberg, L., Forshed, L, Branca, R. M. M., Lehtio, J. & Johansson, H. J. Defining, Comparing, and Improving iTRAQ Quantification in Mass Spectrometry Proteomics Data. Mol. Cell. Proteomics 12, 2021-2031 (2013).

Tisoncik-Go, J. et al. Integrated Omics Analysis of Pathogenic Host Responses during Pandemic H1N1 Influenza Virus Infection: The Crucial Role of Lipid Metabolism. Cell Host Microbe 19, 254-266 (2016).

Munger, J. et al. Systems-level metabolic flux profiling identifies fatty acid synthesis as a target for antiviral therapy. Nat. Biotechnol. 26, 1179-1186 (2008).

Williams, K. J. et al. An essential requirement for the SCAP/SREBP signaling axis to protect cancer cells from lipotoxicity. Cancer Res. 73, 2850-2862 (2013).

Wu, N. C. et al. High-throughput identification of loss-of-function mutations for anti-interferon activity in the influenza A virus NS segment. J. Virol. 88, 10157-64 (2014).

Letunic, I., Doerks, T. & Bork, P. SMART: Recent updates, new developments and status in 2015. Nucleic Acids Res. 43, D257-D260 (2015).

Letunic, I. & Bork, P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 1-4 (2017). doi: 10.1093/nar/gkx922

Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21, 3433-3434 (2005).

Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol. 347, 827-839 (2005).

Yates, J. R. in Annual Review of Biophysics and Biomolecular Structure 297-316 (2004). doi: 10.1146/annurev.biophys.33.111502.082538

Zhang, Y., Fonslow, B. R., Shan, B., Baek, M. & Yates, J. R. Protein Analysis by Shotgun / Bottom-up Proteomics. (2013). doi: 10.1021/cr3003533

Miller, K. E. et al. Bimolecular fluorescence complementation (BiFC) analysis: advances and recent applications for genome-wide interaction studies. J. Mol. Biol. 427, 2039-2055 (2016).

Sung, M. K. et al. Genome-wide bimolecular fluorescence complementation analysis of SUMO interactome in yeast. Genome Res. 23, 736-746 (2013).

Havugimana, P. C. et al. A Census of Human Soluble Protein Complexes. 150, 1068-1081 (2012).

Shin, Y. et al. SH3 Binding Motif 1 in Influenza A Virus NS1 Protein Is Essential for PI3K/Akt Signaling Pathway Activation. J. Virol. 81, 12730-12739 (2007).

York, A. G. et al. Limiting Cholesterol Biosynthetic Flux Spontaneously Engages Type I IFN Signaling. Cell 163, 1716-1729 (2015).

Heaton, N. S. & Randall, G. Dengue virus-induced autophagy regulates lipid metabolism. Cell Host Microbe 8, 422-432 (2010).

Heaton, N. S. et al. Dengue virus nonstructural protein 3 redistributes fatty acid synthase to sites of viral replication and increases cellular fatty acid synthesis. Proc. Natl. Acad. Sci. 107, 17345-17350 (2010).

Nasheri, N. et al. Modulation of fatty acid synthase enzyme activity and expression during hepatitis C virus replication. Chem. Biol. 20, 570-582 (2013).

Qi, H. et al. Systematic identification of anti-interferon function on hepatitis C virus genome reveals p7 as an immune evasion protein. 114, 3-8 (2018).

Qi, H. et al. A quantitative high-resolution genetic profile rapidly identifies sequence determinants of hepatitis C viral fitness and drug sensitivity. PLoS Pathog. 10, e1004064 (2014).

Doud, M. B., Hensley, S. E. & Bloom, J. D. Complete mapping of viral escape from neutralizing antibodies. bioRxiv (2016).

Al-Mawsawi, L. Q. et al. High-throughput profiling of point mutations across the HIV-1 genome. Retrovirology 11, 124 (2014).

Heaton, N. S., Sachs, D., Chen, C.-L, Hai, R. & Palese, P. Genome-wide mutagenesis of influenza virus reveals unique plasticity of the hemagglutinin and NS1 proteins. Proc. Natl. Acad. Sci. U. S. A. 110, 20248-53 (2013).

Du, Y. et al. Genome-wide identification of interferon-sensitive mutations enables influenza vaccine design. Science 359, (2018).

Du, Y. et al. Effects of Mutations on Replicative Fitness and Major Histocompatibility Complex Class I Binding Affinity Are Among the Determinants Underlying Cytotoxic-T-Lymphocyte Escape of HIV-1 Gag Epitopes. MBio 8, e01050-17 (2017).

Gong, D. et al. High-Throughput Fitness Profiling of Zika Virus E Protein Reveals Different Roles for Glycosylation during Infection of Mammalian and Mosquito Cells. iScience 1, 97-111 (2018).

Hoffmann, E., Krauss, S., Perez, D., Webby, R. & Webster, R. Eight-plasmid system for rapid generation of influenza virus vaccines. Vaccine 20, 3165-3170 (2002).

Hoffmann, E. & Neumann, G. A DNA transfection system for generation of influenza A virus from eight plasmids. Proc. Natl. Acad. Sci. 97, 6108-6113 (2000).

Neuhauser, N., Michalski, A., Scheltema, R. A., Olsen, J. V & Mann, M. Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. 1794-1805 (2011). doi: 10.1021/pr101065j

Consortium, T. U. UniProt: a hub for protein information. 43, 204-212 (2015).

Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367-1372 (2008).

Consortium, T. E. P. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57 (2012).

82. Lutz, A., Dyall, J., Olivo, P. D. & Pekosz, A. Virus-inducible reporter genes as a tool for detecting and quantifying influenza A virus replication. J. Virol. Methods 126, 13-20 (2005).

[0131] All scientific and technical terms used in this application have meanings commonly used in the art unless otherwise specified.

[0132] As used herein, a given percentage of “sequence identity” refers to the percentage of nucleotides or amino acid residues that are the same between sequences, when compared and optimally aligned for maximum correspondence over a given comparison window, as measured by visual inspection or by a sequence comparison algorithm in the art, such as the BLAST algorithm, which is described in Altschul et al., (1990) J Mol Biol 215:403-410. Software for performing BLAST (e.g., BLASTP and BLASTN) analyses is publicly available through the National Center for Biotechnology Information (ncbi.nlm.nih.gov). The comparison window can exist over a given portion, e.g., a functional domain, or an arbitrary selection of a given number of contiguous nucleotides or amino acid residues of one or both sequences. Alternatively, the comparison window can exist over the full length of the sequences being compared. For purposes herein, where a given comparison window (e.g., over 80% of the given sequence) is not provided, the recited sequence identity is over 100% of the given sequence.

Additionally, for the percentages of sequence identity of the proteins provided herein, the percentages are determined using BLASTP 2.8.0+, scoring matrix BLOSUM62, and the default parameters available at blast.ncbi.nlm.nih.gov/Blast.cgi. See also Altschul, et al., (1997) Nucleic Acids Res 25:3389-3402; and Altschul, et al., (2005) FEBS J 272:5101-5109.
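The definition above can be illustrated with a minimal sketch that counts identical positions over a comparison window of two sequences that are already aligned. This is not BLAST and performs no alignment; the function name, the use of "-" for gaps, and the choice to divide by the number of alignment columns in the window are all assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, window=None):
    """Percent of identical, non-gap columns within a window of a pairwise alignment.

    aligned_a and aligned_b must be pre-aligned strings of equal length, with
    "-" marking gaps. window is an optional (start, end) slice of columns;
    when omitted, the full alignment length is used.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length")
    start, end = window if window is not None else (0, len(aligned_a))
    columns = list(zip(aligned_a[start:end], aligned_b[start:end]))
    if not columns:
        raise ValueError("Comparison window is empty")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

# Hypothetical example: one substitution in an 8-residue alignment.
full_length = percent_identity("DYKDDDDK", "DYKDEDDK")
first_half = percent_identity("DYKDDDDK", "DYKDEDDK", window=(0, 4))
```

The example shows how the same pair of sequences can give different identity values depending on the comparison window, which is why the window is specified when an identity percentage is recited.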

[0133] Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv Appl Math 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J Mol Biol 48:443 (1970), by the search for similarity method of Pearson & Lipman, PNAS USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by visual inspection.

[0134] As used herein, the terms “protein”, “polypeptide” and “peptide” are used interchangeably to refer to two or more amino acids linked together. Groups or strings of amino acid abbreviations are used to represent peptides. Except when specifically indicated, peptides are indicated with the N-terminus on the left and the sequence is written from the N-terminus to the C-terminus.

[0135] Proteins may be made using methods in the art including chemical synthesis, biosynthesis or in vitro synthesis using recombinant DNA methods, and solid phase synthesis. See, e.g., Kelly & Winkler (1990) Genetic Engineering Principles and Methods, vol. 12, J. K. Setlow ed., Plenum Press, NY, pp. 1-19; Merrifield (1964) J Amer Chem Soc 85:2149; Houghten (1985) PNAS USA 82:5131-5135; and Stewart & Young (1984) Solid Phase Peptide Synthesis, 2nd ed., Pierce, Rockford, IL, which are herein incorporated by reference. Proteins may be purified using protein purification techniques in the art such as reverse phase high-performance liquid chromatography (HPLC), ion-exchange or immunoaffinity chromatography, filtration or size exclusion, or electrophoresis. See, e.g., Olsnes and Pihl (1973) Biochem. 12(16):3121-3126; and Scopes (1982) Protein Purification, Springer-Verlag, NY, which are herein incorporated by reference. Alternatively, the polypeptides may be made by recombinant DNA techniques in the art. Thus, polynucleotides that encode proteins are contemplated herein. In some embodiments, the polypeptides and polynucleotides are isolated.

[0136] As used herein, an “isolated” compound refers to a compound that is isolated from its native environment. For example, an isolated polynucleotide is one which does not have the bases normally flanking the 5’ end and/or the 3’ end of the polynucleotide as it is found in nature. As another example, an isolated polypeptide is one which does not have its native amino acids, which correspond to the full-length polypeptide, flanking the N-terminus, C-terminus, or both. For example, an isolated fragment of a polypeptide refers to an isolated polypeptide that consists of only a portion of the full-length protein or comprises some, but not all, of the amino acid residues of the wild-type protein and non-native amino acids, i.e., amino acids that are different from the amino acids found at the corresponding positions of the wild-type protein, at its N-terminus, C-terminus, or both. In some embodiments, isolated polynucleotides and polypeptides are made “by the hand of man”, e.g., using synthetic and/or recombinant techniques.

[0137] As used herein, a “substantially purified” compound refers to a compound that is removed from its natural environment and/or is at least about 60% free, preferably about 75% free, more preferably about 90% free, and most preferably about 95-100% free from other macromolecular components or compounds with which the compound is associated in nature or from its synthesis.

[0138] As used herein, “antibody” refers to naturally occurring and synthetic immunoglobulin molecules and immunologically active portions thereof (i.e., molecules that contain an antigen binding site that specifically binds the molecule against which the antibody is directed). As such, the term antibody encompasses not only whole antibody molecules, but also antibody multimers and antibody fragments as well as variants (including derivatives) of antibodies, antibody multimers and antibody fragments. Examples of molecules which are described by the term “antibody” herein include: single chain Fvs (scFvs), Fab fragments, Fab’ fragments, F(ab’)2, disulfide linked Fvs (sdFvs), Fvs, and fragments comprising or alternatively consisting of, either a VL or a VH domain.

[0139] As used herein, a compound (e.g., receptor or antibody) “specifically binds” a given target (e.g., ligand or epitope) if it reacts or associates more frequently, more rapidly, with greater duration, and/or with greater binding affinity with the given target than it does with a given alternative, and/or indiscriminate binding that gives rise to non-specific binding and/or background binding. As used herein, “non-specific binding” and “background binding” refer to an interaction that is not dependent on the presence of a specific structure (e.g., a given epitope). An example of an antibody that specifically binds a given protein is an antibody that binds the given protein with greater affinity, avidity, more readily, and/or with greater duration than it does to other compounds. As used herein, “binding affinity” refers to the propensity of a compound to associate with (or alternatively dissociate from) a given target and may be expressed in terms of its dissociation constant, Kd. In some embodiments, the antibodies have a Kd of 10⁻⁵ or less, 10⁻⁶ or less, preferably 10⁻⁷ or less, more preferably 10⁻⁸ or less, even more preferably 10⁻⁹ or less, and most preferably 10⁻¹⁰ or less, to their given target. Binding affinity can be determined using methods in the art, such as equilibrium dialysis, equilibrium binding, gel filtration, immunoassays, surface plasmon resonance, and spectroscopy using experimental conditions that exemplify the conditions under which the compound and the given target may come into contact and/or interact. Dissociation constants may be used to determine the binding affinity of a compound for a given target relative to a specified alternative. Alternatively, methods in the art, e.g., immunoassays, in vivo or in vitro assays for functional activity, etc., may be used to determine the binding affinity of the compound for the given target relative to the specified alternative. Thus, in some embodiments, the binding affinity of the antibody for the given target is at least 1-fold or more, preferably at least 5-fold or more, more preferably at least 10-fold or more, and most preferably at least 100-fold or more than its binding affinity for the specified alternative.
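
As a non-limiting illustration, the relative binding affinity described above may be computed from two dissociation constants. The sketch below assumes that affinity is taken as the reciprocal of Kd, so that the fold difference equals the Kd for the specified alternative divided by the Kd for the given target, and that both constants are expressed in the same units. The function names, tier labels, and example values are hypothetical and are not part of the disclosure.

```python
# Fold thresholds quoted from the paragraph above; the labels are illustrative only.
TIERS = [
    (100.0, "most preferred"),
    (10.0, "more preferred"),
    (5.0, "preferred"),
    (1.0, "minimum"),
]


def fold_selectivity(kd_target, kd_alternative):
    """Return the fold difference in affinity for the target over the alternative.

    Assumes affinity = 1 / Kd and that both Kd values share the same units.
    A lower Kd means tighter binding, so the ratio is Kd(alternative) / Kd(target).
    """
    if kd_target <= 0 or kd_alternative <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_alternative / kd_target


def preference_tier(fold):
    """Map a fold difference onto the highest threshold it satisfies."""
    for threshold, label in TIERS:
        if fold >= threshold:
            return label
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical values: Kd of 2e-9 for the target, 1e-6 for the alternative.
    fold = fold_selectivity(kd_target=2e-9, kd_alternative=1e-6)
    print(f"{fold:.0f}-fold, tier: {preference_tier(fold)}")
```

In this sketch a smaller Kd for the given target yields a larger fold value, consistent with the statement that lower dissociation constants are preferred.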

[0140] As used herein, the term “sample” is used in its broadest sense and includes specimens and cultures obtained from any source, as well as biological samples and environmental samples. Biological samples may be obtained from animals (including humans) and encompass fluids, solids, tissues, and gases. Biological samples include blood products, such as plasma, serum, and the like. A biological sample can be obtained from a subject using methods in the art. A sample to be analyzed using one or more methods described herein can be either an initial unprocessed sample taken from a subject or a subsequently processed sample, e.g., partially purified, diluted, concentrated, fluidized, pretreated with a reagent (e.g., protease inhibitor, anticoagulant, etc.), and the like. In some embodiments, the sample is a blood sample. In some embodiments, the blood sample is a whole blood sample, a serum sample, or a plasma sample. In some embodiments, the sample may be processed, e.g., condensed, diluted, partially purified, and the like. In some embodiments, the sample is pretreated with a reagent, e.g., a protease inhibitor. In some embodiments, two or more samples are collected at different time intervals to assess any difference in the amount of the analyte of interest, the progression of a disease or disorder, or the efficacy of a treatment. The test sample is then contacted with a capture reagent and, if the analyte is present, a conjugate between the analyte and the capture reagent is formed and is detected and/or measured with a detection reagent.

[0141] As used herein, a “capture reagent” refers to a molecule which specifically binds an analyte of interest. The capture reagent may be immobilized on an assay substrate. For example, if the analyte of interest is an antibody, the capture reagent may be an antigen or an epitope thereof to which the antibody specifically binds.

[0142] As used herein, an “assay substrate” refers to any substrate that may be used to immobilize a capture reagent thereon and then detect an analyte when bound thereto. Examples of assay substrates include membranes, beads, slides, and multi-well plates.

[0143] As used herein, a “detection reagent” refers to a substance that has a detectable label attached thereto and specifically binds an analyte of interest or a conjugate of the analyte of interest, e.g., an antibody-analyte conjugate.

[0144] As used herein, a “detectable label” is a compound or composition that produces or can be induced to produce a signal that is detectable by, e.g., visual, spectroscopic, photochemical, biochemical, immunochemical, or chemical means. The use of the term “labeled” as a modifier of a given substance, e.g., a labeled antibody, means that the substance has a detectable label attached thereto. A detectable label can be attached directly or indirectly by way of a linker (e.g., an amino acid linker or a chemical moiety). Examples of detectable labels include radioactive and non-radioactive isotopes (e.g., ¹²⁵I, ¹⁸F, ¹³C, etc.), enzymes (e.g., β-galactosidase, peroxidase, etc.) and fragments thereof, enzyme substrates, enzyme inhibitors, coenzymes, catalysts, fluorophores (e.g., rhodamine, fluorescein isothiocyanate, etc.), dyes, chemiluminescers and luminescers (e.g., dioxetanes, luciferin, etc.), and sensitizers. A substance, e.g., antibody, having a detectable label means that a detectable label that is not linked, conjugated, or covalently attached to the substance, in its naturally-occurring form, has been linked, conjugated, or covalently attached to the substance by the hand of man. As used herein, the phrase “by the hand of man” means that a person or an object under the direction of a person (e.g., a robot or a machine operated or programmed by a person), not nature itself, has performed the specified act. Thus, the steps set forth in the claims are performed by the hand of man, e.g., a person or an object under the direction of the person.

[0145] As used herein, the terms “subject”, “patient”, and “individual” are used interchangeably to refer to humans and non-human animals. The term “non-human animal” includes all vertebrates, e.g., mammals and non-mammals, such as non-human primates, horses, sheep, dogs, cows, pigs, chickens, and other veterinary subjects and test animals. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human.

[0146] As used herein, the term “diagnosing” refers to the physical and active step of informing, i.e., communicating verbally or by writing (on, e.g., paper or electronic media), another party, e.g., a patient, of the diagnosis. Similarly, “providing a prognosis” refers to the physical and active step of informing, i.e., communicating verbally or by writing (on, e.g., paper or electronic media), another party, e.g., a patient, of the prognosis.

[0147] The use of the singular can include the plural unless specifically stated otherwise. As used in the specification and the appended claims, the singular forms “a”, “an”, and “the” can include plural referents unless the context clearly dictates otherwise.

[0148] As used herein, “and/or” means “and” or “or”. For example, “A and/or B” means “A, B, or both A and B” and “A, B, C, and/or D” means “A, B, C, D, or a combination thereof” and said “A, B, C, D, or a combination thereof” means any subset of A, B, C, and D, for example, a single member subset (e.g., A or B or C or D), a two-member subset (e.g., A and B; A and C; etc.), or a three-member subset (e.g., A, B, and C; or A, B, and D; etc.), or all four members (e.g., A, B, C, and D).
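
As a non-limiting illustration, the subsets covered by “A, B, C, and/or D” may be enumerated as every non-empty combination of the listed members, which for four members gives 2⁴ − 1 = 15 subsets. The sketch below assumes that reading; the function name is hypothetical and is not part of the disclosure.

```python
from itertools import combinations


def and_or_subsets(members):
    """Return every non-empty subset of the members, ordered by subset size."""
    items = list(members)
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    # "A and/or B" yields A, B, or both A and B.
    print(and_or_subsets(["A", "B"]))
    # "A, B, C, and/or D" yields single, two-, three-, and four-member subsets.
    for subset in and_or_subsets(["A", "B", "C", "D"]):
        print(" and ".join(subset))
```

The two-member call returns the three readings given for “A and/or B”, and the four-member call lists the single-member, two-member, three-member, and four-member subsets in that order.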

[0149] As used herein, the phrase “one or more of”, e.g., “one or more of A, B, and/or C” means “one or more of A”, “one or more of B”, “one or more of C”, “one or more of A and one or more of B”, “one or more of B and one or more of C”, “one or more of A and one or more of C” and “one or more of A, one or more of B, and one or more of C”.

[0150] The phrase “comprises or consists of A” is used as a tool to avoid excess page and translation fees and means that in some embodiments the given thing at issue comprises A or consists of A. For example, the sentence “In some embodiments, the composition comprises or consists of A” is to be interpreted as if written as the following two separate sentences: “In some embodiments, the composition comprises A. In some embodiments, the composition consists of A.”

[0151] Similarly, a sentence reciting a string of alternates is to be interpreted as if a string of sentences were provided such that each given alternate was provided in a sentence by itself. For example, the sentence “In some embodiments, the composition comprises A, B, or C” is to be interpreted as if written as the following three separate sentences: “In some embodiments, the composition comprises A. In some embodiments, the composition comprises B. In some embodiments, the composition comprises C.” As another example, the sentence “In some embodiments, the composition comprises at least A, B, or C” is to be interpreted as if written as the following three separate sentences: “In some embodiments, the composition comprises at least A. In some embodiments, the composition comprises at least B. In some embodiments, the composition comprises at least C.”

[0152] To the extent necessary to understand or complete the disclosure of the present invention, all publications, patents, and patent applications mentioned herein are expressly incorporated by reference herein to the same extent as though each were individually so incorporated.

[0153] Having thus described exemplary embodiments of the present invention, it should be noted by those skilled in the art that the within disclosures are exemplary only and that various other alternatives, adaptations, and modifications may be made within the scope of the present invention. Accordingly, the present invention is not limited to the specific embodiments as illustrated herein, but is only limited by the following claims.

Claims

What is claimed is:

1. A method for assaying protein-protein interactions, which comprises

a) obtaining an exon library,

b) preparing an mRNA library by transcribing the exons of the exon library,

c) generating a peptide library by translating the mRNA sequences of the mRNA library,

d) generating an mRNA display library by linking the peptides of the peptide library with the mRNA sequences of the mRNA library,

e) generating an input library of cDNA sequences by reverse transcription of the mRNA sequences of the mRNA display library,

f) enriching the cDNA sequences of the input library to obtain an enriched library of enriched cDNA sequences by using one or more proteins of interest as bait proteins, and

g) obtaining enriched peptides from the enriched cDNA sequences, contacting the enriched peptides with the one or more proteins of interest, and analyzing any interactions between the enriched peptides and the one or more proteins of interest.

2. The method according to claim 1, wherein steps a) to f) are repeated one or more times whereby the enriched cDNA sequences are used as the exon library in the repeated steps.

3. The method according to claim 1, wherein step b) comprises adding a T7 promoter and a FLAG peptide sequence.

4. The method according to claim 1, wherein step b) comprises linking the transcribed exons to puromycin.

5. The method according to claim 1, wherein step d) comprises purifying the peptide-mRNA complexes by FLAG tag selection.

6. The method according to claim 1, wherein step f) comprises contacting the cDNA sequences of the input library with the bait proteins and amplifying the cDNA sequences that bind the bait proteins by PCR amplification.

7. The method according to any one of claims 1 to 6, which comprises sequencing the cDNA sequences of the input library and/or sequencing the enriched cDNA sequences of the enriched library.

8. The method according to any one of claims 1 to 7, wherein the exon library is generated from fragmented DNA.

9. The method according to any one of claims 1 to 8, wherein the exon library is generated from a given cell type and/or organism of interest.

10. The method according to any one of claims 1 to 9, wherein the exon library is obtained from genomic DNA.

11. The method according to any one of claims 1 to 10, which further comprises minimizing protein fragments that do not represent complete exon sequences by using methods in the art to control the length and composition of the input library or utilizing an open reading frame (ORF) library that is substantially evenly distributed.