WO2024197185A1

WO2024197185A1 - Methods and compositions for dissecting organelle physiology

Info

Publication number: WO2024197185A1
Application number: PCT/US2024/020984
Authority: WO
Inventors: Vamsi Mootha; Tsz-Leung TO
Original assignee: The Broad Institute, Inc.; The General Hospital Corporation
Priority date: 2023-03-21
Filing date: 2024-03-21
Publication date: 2024-09-26

Abstract

The subject matter disclosed herein is generally directed to methods for detailed organelle functional measurements in cell-based genetic screening assays. Specifically, disclosed herein are methods for combining detailed bioenergetics measurements with cell-based genetic-screening for mutant phenotypes.

Description

METHODS AND COMPOSITIONS FOR DISSECTING ORGANELLE PHYSIOLOGY

CROSS-REFERENCE TO RELATED APPLICATION

[0001] The present application is related to and claims priority under 35 U.S.C. § 119(e) to U.S. provisional patent application No. 63/491,424, entitled “METHODS AND COMPOSITIONS FOR DISSECTING ORGANELLE PHYSIOLOGY,” filed March 21, 2023. The entire content of the aforementioned patent application is incorporated herein by this reference.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

[0002] The contents of the electronic sequence listing ("BROD-5780P_ST26.xml"; Size is 4,804,944 bytes and it was created on February 22, 2023) is herein incorporated by reference in its entirety.

TECHNICAL FIELD

[0003] The subject matter disclosed herein is generally directed to methods for detailed organelle functional measurements in cell-based genetic screening assays.

BACKGROUND

[0004] Mitochondria are affectionately termed the “powerhouses of the cell” as they house the machinery for oxidative phosphorylation (OXPHOS), classically defined as respiratory chain complexes I to IV, and the F₁F_o-ATPase complex V (Fig. 1a). When reducing equivalents feed into the respiratory chain, free energy is conserved by coupling electron transport to the generation of a proton motive force across the inner membrane, consisting primarily of a membrane potential (ΔΨ_m) as well as a pH gradient (ΔpH). This proton motive force is then dissipated by complex V to catalyze the formation of ATP.
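As a worked illustration of how the two components combine, the standard chemiosmotic relation (not stated in this application) adds the membrane potential to the pH gradient scaled by 2.303RT/F. The sketch below assumes that the membrane potential is given as a positive magnitude in millivolts and that the pH gradient is defined as matrix pH minus intermembrane-space pH; sign conventions differ between texts, and the function names and the example values in the usage section are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_factor_mV(temp_c: float = 37.0) -> float:
    """Return 2.303*R*T/F in millivolts per pH unit at the given temperature."""
    return 1000.0 * math.log(10) * R * (temp_c + 273.15) / F


def proton_motive_force_mV(delta_psi_mV: float, delta_pH: float,
                           temp_c: float = 37.0) -> float:
    """Combine membrane potential and pH gradient into one value in mV.

    Assumed convention: delta_psi_mV is the magnitude of the membrane
    potential (matrix negative) and delta_pH is pH(matrix) - pH(IMS),
    so both terms add with the same sign.
    """
    return delta_psi_mV + nernst_factor_mV(temp_c) * delta_pH


if __name__ == "__main__":
    # Illustrative inputs only; they are not measurements from this application.
    print(proton_motive_force_mV(delta_psi_mV=150.0, delta_pH=0.5))
```

With this convention the membrane potential term dominates, which is consistent with the description of the proton motive force as consisting primarily of a membrane potential.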

[0005] Early insights into coupling mechanisms in OXPHOS bioenergetics came from studies of suspensions of isolated mitochondria, in which it is possible to monitor key parameters such as oxygen consumption and membrane potential following titration of substrates and inhibitors that are not cell permeable ^1-4. Addition of such substrates leads to an energization of mitochondria and a stable, high membrane potential state (called state 4), which can be rapidly dissipated by complex V following the addition of ADP (state 3). For example, mitochondria can be energized with glutamate/malate, a classic substrate combination that feeds specifically into complex I, or succinate/piericidin, a classic substrate/inhibitor pair that feeds directly into complex II. These classical approaches make it possible to investigate the mitochondrial respiratory chain in highly defined media conditions, but they are low throughput.

[0006] A highly complementary approach to investigate mitochondrial biology is genetic screening. For example, studies on yeast genetics were crucial in delineating pathways required for the function and assembly of cytochrome c oxidase ⁵. Additionally, analysis of cell lines from patients with inherited mitochondrial disease has led to the discovery of numerous respiratory chain assembly factors - notably of complex I ⁶, which is missing in the genetically tractable yeast S. cerevisiae ⁷. Recently, pooled CRISPR genetic screening has made it possible to systematically identify genes required for OXPHOS based on cellular growth phenotypes that are dependent on intact mitochondrial function ⁹. However, these growth screens are gross readouts and do not provide fine granularity on specific features of organelle physiology.

[0007] Ideally, it would be possible to combine the power of classical bioenergetics - in which substrates and inhibitors can be carefully fed to the mitochondrial respiratory chain - with modern genetic screening technologies that employ CRISPR. Such hybrid technology would ideally be scalable so that Applicants could systematically connect genes to mitochondrial physiology with the biochemical resolution afforded by classical studies of bioenergetics in isolated mitochondria.

[0008] Citation or identification of any document in this application is not an admission that such a document is available as prior art to the present invention.

SUMMARY

[0009] In one aspect, the present invention provides for a method of measuring organelle function in cells comprising: a) permeabilizing the plasma membranes of a population of cells with one or more agents that preserve cell integrity and functionally intact organelles, wherein the population of cells comprise one or more perturbations, with or without one or more barcodes identifying the one or more perturbations, optionally, wherein the population of cells comprise an organelle-specific genetically encoded functional reporter; b) optionally, labeling the population of cells with an organelle-specific functional probe when the population of cells does not comprise a genetically encoded functional reporter; c) treating the population of cells with one or more organelle specific substrate/inhibitor combinations; d) measuring organelle function for the population of cells by detecting the organelle-specific functional probe or organelle-specific functional reporter; e) sequencing the barcodes or sequences encoding the perturbations from the population of cells to correlate the one or more perturbations with an effect on the measured organelle function. In certain embodiments, the method further comprises, prior to step e), sorting the cell population based on an activity level of the organelle specific functional probe thereby allowing for detection of perturbations enriched or depleted in the sorted cell population.

[0010] In certain embodiments, the one or more perturbations are genetic (e.g., INDELs, substitutions, CRISPRa (CRISPR activation), CRISPRi (CRISPR interference), RNAi (RNA interference), and base editor mediated mutagenesis), chemical, or biologic perturbations. In certain embodiments, the one or more perturbations are genome-wide perturbations. In certain embodiments, the one or more genetic perturbations comprise CRISPR guide RNAs (gRNAs) from a library of CRISPR gRNAs, wherein the population of cells are configured to express a CRISPR enzyme, optionally, wherein the guide sequences are capable of being used as the barcodes. In certain embodiments, the library of CRISPR gRNAs comprises one or more guide sequences targeting one or more genes selected from the group consisting of genes encoding the mitochondrial proteome, the lysosomal proteome, and metabolic pathways. In certain embodiments, the one or more genetic perturbations comprise barcoded open reading frames (ORFs) from a library of barcoded ORFs, wherein each ORF may be identified by a unique barcode or by the sequence of the ORF. Direct sequencing of the ORF may be used to identify mutations when no barcode is used. In certain embodiments, the ORFs comprise variants of an enzyme. In certain embodiments, the ORFs comprise organelle specific ORFs.

[0011] In certain embodiments, the organelle is a mitochondrion or chloroplast and the organelle-specific functional probe is a membrane potential probe. In certain embodiments, the membrane potential probe is selected from the group consisting of tetramethylrhodamine methyl ester (TMRM), TMRE, JC-1, JC-10, MITO-ID, MitoTracker Red, CMXRos/Deep Red, Rhodamine 123, DiOC6, SPIRIT RhoVR, MitoView 405, MitoView 633, MitoView 650, and MitoView 720. In certain embodiments, the organelle is a mitochondrion and the organelle-specific functional probe is a chemical probe for mitochondria abundance. In certain embodiments, the chemical probe for mitochondria abundance is selected from the group consisting of PK Mito Orange, MitoTracker Green, MitoTracker Deep Red, and MitoView Green. In certain embodiments, the organelle is a mitochondrion and the organelle-specific functional probe is a chemical probe for mitochondria reactive oxygen species (ROS). In certain embodiments, the chemical probe for mitochondria ROS is selected from the group consisting of MitoSOX, MitoB, MitoPY1, and MitoPeDPP. In certain embodiments, the organelle is a mitochondrion or chloroplast and the organelle-specific functional probe is a calcium probe. In certain embodiments, the organelle is a mitochondrion or a chloroplast and the organelle-specific functional probe is an NADH probe. In certain embodiments, the organelle is a lysosome and the organelle-specific functional probe is a fluorescent acidotropic probe. In certain embodiments, the organelle-specific functional genetic reporter is selected from the group consisting of genetic reporters of NADH (SONAR), NADPH (iNAP), ATP (iATPsnFR), calcium (GCaMP, GECO), pH (SypHer, pHRed), ROS (roGFP, HyPer), citrate (Citron, Citroff), and lactate (GEM-IL, eLACCO). In certain embodiments, the probe or reporter is a fluorescent probe or reporter.

[0012] In certain embodiments, the one or more organelle-specific substrate/inhibitors is one or more mitochondria specific substrate(s)/inhibitor(s). In certain embodiments, the one or more mitochondria-specific substrate(s)/inhibitor(s) are selected from the group consisting of glutamate, malate, succinate, piericidin A, coenzyme Q-linked substrates, glycerol-3-phosphate, ATP, ADP, D-lactate, antimycin A, ascorbate, N,N,N’,N’-tetramethyl-p-phenylenediamine (TMPD), oligomycin A, BAM15, and carbonyl cyanide m-chlorophenyl hydrazone (CCCP).

[0013] In certain embodiments, the plasma membranes of the plurality of cells are permeabilized with Perfringolysin O. In certain embodiments, the plasma membranes of the plurality of cells are permeabilized with a cholesterol-specific detergent, such as digitonin or saponin, or lower concentrations of commonly used detergents such as Triton X-100.

[0014] In certain embodiments, the cells further comprise a sample barcode that identifies a sample source for the population of cells, optionally wherein the sample barcode is introduced via a vector to all cells in a given sample, optionally wherein the sample comprises a specific perturbation. Regarding chemical (e.g., drugs) or biological (e.g., antibodies) perturbations, barcodes can be specific to different samples, wherein each sample has a different perturbation.

[0015] These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of example embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] An understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention may be utilized, and the accompanying drawings of which:

[0017] FIG. 1A-1E - Schematic overview of PMF-Seq. a, Schematic diagram showing how various substrates can feed into the respiratory chain to support the membrane potential. b, Seahorse permeabilized cell oxygen consumption rate (OCR) measurements in A375 cells with glutamate/malate (left panel), succinate (middle panel), or ascorbate/TMPD (right panel) as the substrate. c, Kinetic TMRM permeabilized cell membrane potential measurements by a fluorescence spectrophotometer in A375 cells with glutamate/malate (left panel), succinate (middle panel), or ascorbate/TMPD (right panel) as the substrate. d, Endpoint permeabilized cell membrane potential measurements by a flow cytometer in A375 cells with glutamate/malate (left panel), succinate (middle panel), or ascorbate/TMPD (right panel) as the substrate. e, Experimental workflow of PMF-Seq in permeabilized A375 cells. IMS: intermembrane space; G/M: glutamate/malate; Succ: succinate; Asc/TMPD: ascorbate/N,N,N',N'-tetramethyl-p-phenylenediamine; Pier: piericidin A; Anti: antimycin A; Oligo: oligomycin A; Bam: BAM15; CCCP: carbonyl cyanide m-chlorophenyl hydrazone; TMRM: tetramethylrhodamine methyl ester; A.U.: arbitrary units.

[0018] FIG. 2 - Genetic dissection of respiratory chain flow and branching by PMF-Seq. Scatterplots of Z-scores showing the specified respiratory chain components when glutamate/malate (first column), succinate (second column), or ascorbate/TMPD (third column) was used as the respiratory chain substrate in permeabilized A375 cells. Data from biological duplicates are shown. A highly negative Z-score indicates enrichment of sgRNAs for a given gene in the low tail of the TMRM distribution, suggesting a dependency on that gene for membrane potential generation. IMS: intermembrane space; G/M: glutamate/malate; Succ: succinate; Asc/TMPD: ascorbate/N,N,N',N'-tetramethyl-p-phenylenediamine; Pier: piericidin A; Anti: antimycin A; A.U.: arbitrary units.

[0019] FIG. 3A-3H - PMF-Seq identifies the genetic determinant for the utilization of D-lactate as a respiratory chain substrate in human cells. a, Endpoint permeabilized cell membrane potential measurements by a flow cytometer in A375 cells with D-lactate as the substrate. b, Schematic diagram showing that either ascorbate/TMPD or D-lactate can feed into the respiratory chain to support the membrane potential when complex III is inhibited by antimycin A. c, PMF-Seq identifies LDHD as a specific genetic requirement under D-lactate. d, Kinetic TMRM permeabilized cell membrane potential measurements by a fluorescence spectrophotometer in control vs. LDHD KO cells in A375 (first and second panels) and HepG2 (third and final panels). Ascorbate/TMPD (gray line) or D-lactate (blue line) was added as the substrate after antimycin A treatment, as indicated by the second arrow. e, Size exclusion chromatography profile of purified human LDHD. f, SDS-PAGE analysis of purified human LDHD visualized with Coomassie stain. g, In vitro steady-state enzyme kinetics of LDHD-catalyzed cytochrome c reduction by D-lactate. h, Relative initial reaction rates of LDHD-catalyzed cytochrome c reduction by 10 mM of the specified substrates. Asc/TMPD: ascorbate/N,N,N',N'-tetramethyl-p-phenylenediamine; Anti: antimycin A; A.U.: arbitrary units. Shown is the mean +/- s.d. in Fig. 3g-h.

[0020] FIG. 4 - Comparison of PMF-seq with intact cell OXPHOS CRISPR screening. OXPHOS genes are organized by complex, with red indicating genes that are hits in the galactose death screen in intact K562 cells (top panel, Arroyo et al. Cell Metab 2016) or in the PMF-seq screens in permeabilized cells (bottom three panels, with average Z-scores from the two replicates depicted in Fig. 2). Genes are ordered alphabetically within each complex by prefix (NDUF-, SDH-, UQCR-, and COX-).

[0021] FIG. 5A-5B - Cell-type specific branching into the respiratory chain can be revealed by PMF-Seq. Scatterplots of Z-scores showing the specified respiratory chain components when glutamate/malate (x-axis of both columns), glycerol 3-phosphate (y-axis of first column), or ascorbate/TMPD (y-axis of second column) was used as the respiratory chain substrate in permeabilized (a) A375 or (b) K562 cells. A highly negative Z-score indicates the enrichment of sgRNAs for a given gene in the low tail of the TMRM distribution, suggesting the dependency on that gene for membrane potential generation. G3P: glycerol 3-phosphate; G/M: glutamate/malate; Asc/TMPD: ascorbate/N,N,N',N'-tetramethyl-p-phenylenediamine; Pier: piericidin A; Anti: antimycin A.

[0022] FIG. 6A-6D - Dissecting complex V reversal with PMF-seq. a, Schematic diagram showing complex V running in reverse by ATP hydrolysis to support the membrane potential. b, Kinetic TMRM permeabilized cell membrane potential measurements by a fluorescence spectrophotometer in A375 cells with ATP as the substrate. c, Endpoint permeabilized cell membrane potential measurements by a flow cytometer in A375 cells with ATP as the substrate. d, Scatterplots of Z-scores showing components of complex I (red) or complex V (black) when glutamate/malate (first column), or ATP (second column) was used as the substrate. Data from biological duplicates are shown. A highly negative Z-score indicates the enrichment of sgRNAs for a given gene in the low tail of the TMRM distribution, suggesting the dependency on that gene for membrane potential generation. RC: respiratory chain; Anti: antimycin A; Oligo: oligomycin A; G/M: glutamate/malate; A.U.: arbitrary units.

[0023] FIG. 7A-7B - D-lactate dehydrogenase loss does not greatly increase sensitivity to methylglyoxal toxicity. a, Cell counts of control and LDHD KO cells in A375 after 3 days of methylglyoxal treatment at the indicated dose. b, Viability of control and LDHD KO cells in the same samples as in (a). Shown is the mean +/- s.e.m., n = 3. ***p < 0.001 or ****p < 0.0001 indicates P values for the specified comparisons from two-way ANOVA after adjustment for multiple comparisons (Tukey’s).

[0024] The figures herein are for illustrative purposes only and are not necessarily drawn to scale.

DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS

General Definitions

[0025] Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F.M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (1995) (M.J. MacPherson, B.D. Hames, and G.R. Taylor eds.); Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.); Antibodies A Laboratory Manual, 2nd edition 2013 (E.A. Greenfield ed.); Animal Cell Culture (1987) (R.I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).

[0026] As used herein, the singular forms “a”, “an”, and “the” include both singular and plural referents unless the context clearly dictates otherwise.

[0027] The term “optional” or “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.

[0028] The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.

[0029] The terms “about” or “approximately” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/-10% or less, +/-5% or less, +/-1% or less, and +/-0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” or “approximately” refers is itself also specifically, and preferably, disclosed.

[0030] As used herein, a “biological sample” may contain whole cells and/or live cells and/or cell debris. The biological sample may contain (or be derived from) a “bodily fluid”. The present invention encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof. Biological samples include cell cultures, bodily fluids, cell cultures from bodily fluids. Bodily fluids may be obtained from a mammal organism, for example by puncture, or other collecting or sampling procedures.

[0031] The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

[0032] Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.

[0033] All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.

OVERVIEW

[0034] Embodiments disclosed herein provide methods for detailed in situ organelle functional measurements combined with pooled cell-based perturbation assays (e.g., genetic screening assays). As used herein the term “perturbation” refers to any alteration of the function of a biological system by external or internal means, such as alterations in gene expression, alterations by environmental stimuli, or alterations by drug treatment. In example embodiments, the perturbation is genetic, chemical, or biological. In example embodiments, functional measurements can be detected in pooled cells that were treated with a library of genetic perturbations, such that the cells can be sorted based on different levels of the functional measurements and the perturbations in the sorted cells can be identified. Thus, genetic perturbations targeting genes can link the perturbed genes to the function of an organelle of interest. In example embodiments, the cells can be any eukaryotic cell and the organelle can be any organelle in the cells. Functional measurements can be made using reporter genes operably linked to an organelle function or by labeling cells with a functional probe. Specific functions can be dissected by treating the cells with specific substrate/inhibitor combinations that block a biological system in an organelle at a specific step or that push a biological system in an organelle in a specific direction.

[0035] In example embodiments, bioenergetics measurements are detected in mitochondria in cells treated with a genetic perturbation. The study of mitochondrial physiology has benefited from two distinct approaches: detailed bioenergetics measurements in suspensions of isolated mitochondria, and cell-based genetic-screening for mutant phenotypes. Here, Applicants marry these complementary approaches in Permeabilized-cell Mitochondrial Function sequencing, or PMF-seq, in which mitochondrial bioenergetics is probed in pools of permeabilized cells that have been CRISPR mutagenized. Applicants demonstrate that PMF-seq can reveal the genetic basis for pervasive branching of the mitochondrial respiratory chain. The core concept behind the approach is to interrogate mitochondrial function in mutagenized pools of cells where the plasma membrane has been permeabilized while the mitochondria remain physiologically competent and accessible to substrates/inhibitors. Applicants reasoned that by CRISPR mutagenizing cultured cells prior to permeabilization Applicants could perform kinetic measurements in bulk, sort cells based on a bioenergetic parameter at a specified time, and perform next-generation sequencing, thereby connecting genes to bioenergetics. Also disclosed is a mitochondrial specific CRISPR library comprising 15,271 guide sequences (MitoPlus).

MEASURING ORGANELLE FUNCTION

[0036] In one aspect, the present invention provides for a method of measuring organelle function in cells by identifying perturbations that modulate organelle function. The method includes a step of permeabilizing the plasma membranes of a population of cells with one or more agents that preserve cell integrity and functionally intact organelles. In example embodiments, the permeabilization allows the cells to be treated with organelle specific substrates/inhibitors used to pinpoint the assay to different functional components of a biological system in an organelle. The one or more agents can be a substrate for a specific functional component of a biological system, such that the activity of the biological system in an organelle is pushed in one direction. The one or more agents can be one or more inhibitors that block a specific component of a biological system in an organelle, such that the function of an upstream or downstream component can be analyzed. In example embodiments, the population of cells comprise one or more perturbations, with or without one or more barcodes identifying the one or more perturbations. The perturbations can be any agent that modulates a target gene, such as a genetic, chemical or biologic perturbation. In example embodiments, the perturbation is in a gene and the gene sequence can be used to identify the perturbation. In example embodiments, the perturbation is a guide sequence that guides a programmable nuclease system to a specific gene target and the guide sequence can identify the perturbation. In example embodiments, each cell encodes a barcode sequence that is associated with the perturbation, such that identification of the barcode identifies the perturbation. In example embodiments, the population of cells comprise an organelle-specific functional genetic reporter. For example, the reporter can be a reporter gene that becomes activated in response to a specific activity in an organelle. Alternatively, the population of cells can be labeled with an organelle-specific functional probe that can be detected in response to a specific activity in an organelle. In example embodiments, measuring organelle function for the population of cells can be performed by detecting the organelle-specific functional probe or genetically encoded organelle-specific functional reporter. In example embodiments, the readout is a fluorescent readout. For example, the reporter can be a fluorescent reporter gene and the probe can be a fluorescent probe. Detecting the reporter or probe can be performed using fluorescence cell sorting, such as FACS. In example embodiments, the cells are sorted based on the level of detection. In example embodiments, the perturbations are identified for the sorted cells. For example, cells identified as having the highest level of detection are analyzed for perturbations. In example embodiments, perturbations that are enriched or depleted in sorted cell populations are detected. In example embodiments, the perturbations are identified by sequencing the barcodes or sequences encoding the perturbations from the population of cells to correlate the one or more perturbations with an effect on the measured organelle function. Sequencing can include amplification of barcodes followed by sequencing. Sequencing can be any next generation sequencing platform.
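The detection of perturbations enriched or depleted in a sorted population can be sketched as follows. This is a minimal illustration and not necessarily the analysis used by Applicants: it assumes that guide or barcode read counts are already available for a sorted bin (here, a low-signal bin) and for a reference pool, that a set of non-targeting control guides exists, and that gene-level scores are standardized against those controls. The function name, the "CONTROL" label, the pseudocount, and the example guide names are all hypothetical. The sign is chosen so that enrichment in the low-signal bin yields a negative score, matching the convention used in the figure descriptions.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def gene_z_scores(low_tail_counts, reference_counts, guide_to_gene,
                  control_label="CONTROL", pseudocount=1.0):
    """Score each gene by how strongly its guides are enriched in the sorted bin."""
    low_total = sum(low_tail_counts.values())
    ref_total = sum(reference_counts.values())

    # Per-guide log2 ratio of read fractions (sorted bin vs. reference pool).
    lfc_by_gene = defaultdict(list)
    for guide, gene in guide_to_gene.items():
        low_frac = (low_tail_counts.get(guide, 0) + pseudocount) / low_total
        ref_frac = (reference_counts.get(guide, 0) + pseudocount) / ref_total
        lfc_by_gene[gene].append(math.log2(low_frac / ref_frac))

    # Standardize against the control guides (assumed to have no effect).
    controls = lfc_by_gene.get(control_label, [])
    if len(controls) < 2:
        raise ValueError("need at least two control guides")
    mu, sigma = mean(controls), pstdev(controls)
    if sigma == 0:
        raise ValueError("control guides show no spread")

    # Sign flipped so that enrichment in the low-signal bin gives a negative score.
    return {gene: -(mean(lfcs) - mu) / sigma
            for gene, lfcs in lfc_by_gene.items()}


if __name__ == "__main__":
    # Toy counts for illustration only; not data from this application.
    guide_to_gene = {"a1": "GENE_A", "a2": "GENE_A",
                     "c1": "CONTROL", "c2": "CONTROL",
                     "c3": "CONTROL", "c4": "CONTROL"}
    reference = {guide: 100 for guide in guide_to_gene}
    low_tail = {"a1": 400, "a2": 300, "c1": 90, "c2": 110, "c3": 100, "c4": 100}
    for gene, z in gene_z_scores(low_tail, reference, guide_to_gene).items():
        print(gene, round(z, 2))
```

Normalizing each sample to read fractions before taking the ratio keeps the score independent of sequencing depth, and the pseudocount avoids taking the logarithm of zero for guides that drop out of one sample.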

Population of cells

[0037] In example embodiments, the population of cells is any clonal eukaryotic cell line comprising organelles and capable of being perturbed as described herein. The cells can be animal or plant cells. The animal cells are not limited to human cells and can include any mammalian cell line or insect cell lines. In example embodiments, plant protoplasts, plant cells without cell walls, can be used (see, e.g., Yue JJ, Yuan JL, Wu FH, et al. Protoplasts: From Isolation to CRISPR/Cas Genome Editing Application. Front Genome Ed. 2021;3:717017).

Permeabilizing Cells

[0038] In example embodiments, the plasma membranes of the plurality of cells are permeabilized with one or more agents that preserve cell integrity and functionally intact organelles (see, e.g., Divakaruni AS, Rogers GW, Murphy AN. Measuring Mitochondrial Function in Permeabilized Cells Using the Seahorse XF Analyzer or a Clark-Type Oxygen Electrode. Curr Protoc Toxicol. 2014;60:25.2.1-25.2.16). In example embodiments, the permeabilizing agent is titrated to empirically determine the appropriate concentration for the population of cells used. The permeabilization may be performed in a way to minimally perturb the spatial proximity of nucleic acids, protein folding, organelles, and/or nuclei. In example embodiments, the cells are permeabilized, such that protein complexes do not fall apart or proteins are not denatured.

[0039] In example embodiments, the plasma membranes of the plurality of cells are permeabilized with a cytolysin (see, e.g., Divakaruni, et al., 2014). Cytolysin refers to a substance secreted by microorganisms, plants or animals that is specifically toxic to individual cells, in many cases causing their dissolution through lysis. Cytolysins can also disrupt membranes without lysing the cells. Cytolysins comprise more than 1/3 of all bacterial protein toxins. Therefore, "membrane damaging toxins" (MDTs) describes the essential actions of cytolysins. Cytolysins which form pores on target cells' membranes are also known as pore-forming toxins (PFTs) and comprise the largest portion of all cytolysins. Examples of this type include perfringolysin O from Clostridium perfringens bacteria, hemolysin from Escherichia coli, and listeriolysin from Listeria monocytogenes. Targets of this type of cytolysin range from general cell membranes to more specific targets, such as cholesterol and phagocyte membranes. In example embodiments, the plasma membranes of the plurality of cells are permeabilized with a PFT or derivative thereof.

[0040] In example embodiments, the plasma membranes of the plurality of cells are permeabilized with Perfringolysin O. Perfringolysin O (PFO) is a thiol-activated cytolysin, which creates membrane holes on eukaryotic cells (see, e.g., Rossjohn J, Feil SC, McKinstry WJ, Tweten RK, Parker MW. Structure of a cholesterol-binding, thiol-activated cytolysin and a model of its membrane form. Cell. 1997;89(5):685-692). PFO is a cholesterol-dependent cytolysin derived from Clostridium perfringens that forms oligomeric pores in cholesterol-containing membranes (see, e.g., Divakaruni, et al., 2014). In example embodiments, a mutant recombinant perfringolysin O (rPFO) is used to selectively permeabilize the plasma membrane (see, e.g., Divakaruni, et al., 2014). In some embodiments, high concentrations of rPFO are not injurious to mitochondria in the way excess digitonin is (see, e.g., Divakaruni, et al., 2014). Concentrations 10-fold higher than those used to permeabilize the plasma membrane of cells did not affect state 3 respiration, mitochondrial membrane potential, or cytochrome c release from the intermembrane space (Divakaruni AS, Wiley SE, Rogers GW, et al. Thiazolidinediones are acute, specific inhibitors of the mitochondrial pyruvate carrier. Proc Natl Acad Sci U S A. 2013;110(14):5422-5427). This is likely due to the threshold property for rPFO, as the mitochondrial outer membrane may not have the cholesterol content required to form the rPFO oligomer. Thus, in example embodiments, any cholesterol-binding cytolysin can be used for permeabilization (see, e.g., US patent application publication US20130164774 A9).

[0041] In example embodiments, the plasma membranes of the plurality of cells are permeabilized with a cholesterol-specific detergent, such as the amphipathic steroid glycosides digitonin or saponin. In example embodiments, the concentration of digitonin needed to permeabilize cells is about 0.008% (w/v) (see, e.g., Divakaruni, et al., 2014).

[0042] In general, a biological sample can be permeabilized by exposing the sample to one or more permeabilizing agents. Suitable agents for this purpose include, but are not limited to, organic solvents (e.g., acetone, ethanol, and methanol), cross-linking agents (e.g., paraformaldehyde), detergents (e.g., saponin, Triton X-100™, Tween-20™, or sodium dodecyl sulfate (SDS)), and enzymes (e.g., trypsin or proteases (e.g., proteinase K)). In example embodiments, the population of cells is permeabilized with a detergent. In example embodiments, the plasma membranes of the plurality of cells are permeabilized with lower concentrations of commonly used detergents, such as saponin, Triton X-100™, Tween-20™, or sodium dodecyl sulfate (SDS). Saponin interacts with membrane cholesterol, selectively removing it and leaving holes in the membrane. In example embodiments, the detergent is non-ionic. In some embodiments, the detergent is an anionic detergent (e.g., SDS or N-lauroylsarcosine sodium salt solution). In some embodiments, the plurality of cells can be permeabilized using any of the detergents described herein (e.g., SDS and/or N-lauroylsarcosine sodium salt solution) before or after enzymatic treatment (e.g., treatment with any of the enzymes described herein, e.g., trypsin, proteases (e.g., pepsin and/or proteinase K)). Additional methods for sample permeabilization are described, for example, in Jamur et al., Methods Mol. Biol. 588:63-66, 2010, the entire contents of which are incorporated herein by reference.

[0043] In example embodiments, the concentration of the detergent is sufficient to permeabilize the cells without denaturing proteins. In example embodiments, NP-40, digitonin, or Tween is used. For example, the concentration of detergent used herein may be from 0.005% to 1%, from 0.01% to 0.8%, from 0.01% to 0.6%, from 0.01% to 0.4%, from 0.01% to 0.2%, from 0.01% to 0.1%, from 0.005% to 0.05%, from 0.01% to 0.03%, from 0.015% to 0.025%, from 0.018% to 0.022%, from 0.015% to 0.017%, from 0.016% to 0.018%, from 0.017% to 0.019%, from 0.018% to 0.02%, from 0.019% to 0.021%, from 0.02% to 0.022%, or from 0.021% to 0.023%. In some cases, the concentration of the detergent may be about 0.01%, about 0.015%, about 0.02%, about 0.025%, or about 0.03%. For example, the concentration of the detergent may be about 0.02%. In example embodiments, SDS is used at concentrations below 0.5%, such as 0.1%, 0.05%, or less than 0.01%.

Organelles and functional measurements

[0044] In example embodiments, the methods described herein can be used to analyze the function of any organelle in intact cells. As used herein, “organelle” refers to any of a number of organized or specialized structures within a living cell or a small structure in a cell that is surrounded by a membrane and has a specific function. Non-limiting examples of organelles are the nucleus, mitochondria, lysosomes, chloroplast (plastid), endoplasmic reticulum, Golgi apparatus, and vacuole. In example embodiments, a probe or genetically encoded reporter is used to observe an intracellular parameter in an organelle. In example embodiments, the probe or reporter is a fluorescent probe or reporter.

[0045] In example embodiments, membrane potential probes are used for measuring the membrane potential of organelles, such as mitochondria, lysosomes/endosomes, and the endoplasmic reticulum (see, e.g., Klier PEZ, Roo R, Miller EW. Fluorescent indicators for imaging membrane potential of organelles. Curr Opin Chem Biol. 2022;71:102203). In example embodiments, membrane potential is measured in mitochondria or chloroplasts with a membrane potential probe (see, e.g., Perry SW, Norman JP, Barbieri J, Brown EB, Gelbard HA. Mitochondrial membrane potential probes and the proton gradient: a practical usage guide. Biotechniques. 2011;50(2):98-115). In example embodiments, the membrane potential probe is selected from the group consisting of tetramethylrhodamine methyl ester (TMRM), tetramethylrhodamine ethyl ester (TMRE), 5,5',6,6'-tetrachloro-1,1',3,3'-tetraethylbenzimidazolylcarbocyanine iodide (JC-1), JC-10, MITO-ID, MitoTracker Red CMXRos/Deep Red, Rhodamine 123, 3,3'-dihexyloxacarbocyanine iodide (DiOC6), SPIRIT RhoVR, MitoView 405, MitoView 633, MitoView 650, and MitoView 720. In example embodiments, a water-soluble mitochondrial membrane potential sensor (Mito-MPS), a modified version of JC-1 with similar fluorescent properties and subcellular staining patterns, is used to quantify membrane potential (see, e.g., Sakamuru S, Li X, Attene-Ramos MS, et al. Application of a homogenous membrane potential assay to assess mitochondrial function. Physiol Genomics. 2012;44(9):495-503).

[0046] In example embodiments, a chemical probe for mitochondrial abundance is used as a functional probe. In example embodiments, the chemical probe for mitochondrial abundance is selected from the group consisting of PK Mito Orange, MitoTracker Green, MitoTracker Deep Red, and MitoView Green.

[0047] In example embodiments, a chemical probe for mitochondrial reactive oxygen species (ROS) is used as a functional probe. In example embodiments, the chemical probe for mitochondrial ROS is selected from the group consisting of MitoSOX, MitoB, MitoPY1, and MitoPeDPP.

[0048] In example embodiments, the organelle is a mitochondrion or chloroplast and the organelle-specific functional probe is a calcium probe. A calcium probe can be used to measure Ca²⁺ movement across membranes, such as inner-envelope membranes. For example, the Ca²⁺-sensitive fluorophore fura-2 can be used (see, e.g., Roh MH, Shingles R, Cleveland MJ, McCarty RE. Direct measurement of calcium transport across chloroplast inner-envelope vesicles. Plant Physiol. 1998;118(4):1447-1454).

[0049] In example embodiments, the organelle is a mitochondrion or a chloroplast and the organelle-specific functional probe is an NADH probe.

[0050] In example embodiments, the organelle is a lysosome and the organelle-specific functional probe is a fluorescent acidotropic probe.

[0051] In example embodiments, an organelle-specific functional genetically encoded reporter responds to a specific metabolite or reaction product produced in an organelle. In example embodiments, the reporter is a genetically encoded reporter that allows for the observation of an intracellular parameter in a living system. Genetically encoded reporters specific for various metabolic and redox parameters are available (see, e.g., Germond A, Fujita H, Ichimura T, Watanabe TM. Design and development of genetically encoded fluorescent sensors to monitor intracellular chemical and physical parameters. Biophys Rev. 2016;8(2):121-138; and Kostyuk AI, Kokova AD, Podgorny OV, et al. Genetically Encoded Tools for Research of Cell Signaling and Metabolism under Brain Hypoxia. Antioxidants (Basel). 2020;9(6):516). In example embodiments, the organelle-specific functional genetic reporter is selected from the group consisting of genetic reporters of NADH (SoNar) (Zhao Y, Hu Q, Cheng F, et al. SoNar, a Highly Responsive NAD+/NADH Sensor, Allows High-Throughput Metabolic Screening of Anti-tumor Agents. Cell Metab. 2015;21(5):777-789), NADPH (iNAP) (Tao R, Zhao Y, Chu H, et al. Genetically encoded fluorescent sensors reveal dynamic regulation of NADPH metabolism. Nat Methods. 2017;14(7):720-728), ATP (iATPsnFR) (Lobas MA, Tao R, Nagai J, et al. A genetically encoded single-wavelength sensor for imaging cytosolic and cell surface ATP. Nat Commun. 2019;10(1):711), calcium (GCaMP, GECO) (Podor B, Hu YL, Ohkura M, Nakai J, Croll R, Fine A. Comparison of genetically encoded calcium indicators for monitoring action potentials in mammalian brain by two-photon excitation fluorescence microscopy. Neurophotonics. 2015;2(2):021014; Nat. Biotechnol. 19(2), 137-141 (2001); and Science 333(6051), 1888-1891 (2011)), pH (SypHer, pHRed) (Matlashov ME, Bogdanova YA, Ermakova GV, et al. Fluorescent ratiometric pH indicator SypHer2: Applications in neuroscience and regenerative biology. Biochim Biophys Acta. 2015;1850(11):2318-2328; and Tantama M, Hung YP, Yellen G. Imaging intracellular pH in live cells with a genetically encoded red fluorescent protein sensor. J Am Chem Soc. 2011;133(26):10034-10037), ROS (roGFP, HyPer) (Hanson GT, Aggeler R, Oglesbee D, et al. Investigating mitochondrial redox potential with redox-sensitive green fluorescent protein indicators. J Biol Chem. 2004;279(13):13044-13053; Bilan DS, Belousov VV. In Vivo Imaging of Hydrogen Peroxide with HyPer Probes. Antioxid Redox Signal. 2018;29(6):569-584), citrate (Citron, Citroff) (Zhao Y, Shen Y, Wen Y, Campbell RE. High-Performance Intensiometric Direct- and Inverse-Response Genetically Encoded Biosensors for Citrate. ACS Cent Sci. 2020;6(8):1441-1450), and lactate (GEM-IL, eLACCO) (Bekdash R, Quejada JR, Ueno S, et al. GEM-IL: A highly responsive fluorescent lactate indicator. Cell Rep Methods. 2021;1(7):100092; and Nasu, et al., Improved genetically encoded fluorescent biosensors for monitoring of intra- and extracellular L-lactate. bioRxiv 2022.12.27.522013).

Perturbations

[0052] In example embodiments, before measuring organelle function a population of cells is contacted with one or more perturbations. In example embodiments, the perturbation can be a genetic perturbation (e.g., INDELs, substitutions, CRISPRa (CRISPR activation), CRISPRi (CRISPR interference), RNAi (RNA interference), and base editor mediated mutagenesis). As used herein a genetic perturbation refers to a perturbation that perturbs a nucleic acid, such as a genome sequence (e.g., a target gene or regulatory element) or RNA sequence (e.g., a transcript sequence). As used herein a chemical perturbation refers to a perturbation such as a small molecule, compound, or drug. As used herein a biological perturbation refers to a perturbation such as a biologic drug (e.g., antibody or peptide). In example embodiments, the perturbations can be identified by sequencing. In preferred embodiments, each perturbation can be identified by a barcode sequence. In example embodiments, the one or more perturbations are genome-wide perturbations. In example embodiments, the one or more perturbations target genes specific to the organelle of interest (e.g., mitochondrial genes).

Barcodes

[0053] The term “barcode” as used herein refers to a short sequence of nucleotides (for example, DNA or RNA) that is used as an identifier for an associated molecule, such as a target molecule and/or target nucleic acid, or as an identifier of the source of an associated molecule, such as a cell-of-origin, sample of origin, or individual transcript. A barcode may also refer to any unique, non-naturally occurring, nucleic acid sequence that may be used to identify the originating source of a nucleic acid fragment. Although it is not necessary to understand the mechanism of an invention, it is believed that the barcode sequence provides a high-quality individual read of a barcode associated with a perturbation, single cell, single nucleus, viral vector, labeling ligand (e.g., antibody or aptamer), protein, shRNA, sgRNA or cDNA such that multiple species can be sequenced together. Barcoding may be performed based on any of the compositions or methods disclosed in patent publication WO 2014047561 A1, Compositions and methods for labeling of agents, incorporated herein in its entirety. In certain embodiments barcoding uses an error correcting scheme (T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms (Wiley, New York, ed. 1, 2005)).

[0054] A nucleic acid barcode can have a length of at least, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides, and can be in single- or double-stranded form. Target molecule and/or target nucleic acids can be labeled with multiple nucleic acid barcodes in combinatorial fashion, such as a nucleic acid barcode concatemer. Typically, a nucleic acid barcode is used to identify a target molecule and/or target nucleic acid as being from a particular discrete volume, having a particular physical property (for example, affinity, length, sequence, etc.), or having been subject to certain treatment conditions. Target molecule and/or target nucleic acid can be associated with multiple nucleic acid barcodes to provide information about all of these features (and more).

[0055] In some embodiments, a nucleic acid barcode may be attached to sequences that allow for amplification and sequencing (for example, SBS3 and P5 elements for Illumina sequencing). In example embodiments, a nucleic acid barcode can further include a hybridization site for a primer (for example, a single-stranded DNA primer) attached to the end of the barcode. For example, a barcode may be a nucleic acid including a barcode and a hybridization site for a specific primer. Nucleic acid barcodes (optionally in combination with other nucleic acid barcodes as described herein) can be amplified by methods known in the art, such as polymerase chain reaction (PCR). For example, the nucleic acid barcode can contain universal primer recognition sequences that can be bound by a PCR primer for PCR amplification and subsequent high-throughput sequencing, also known as next generation sequencing. In example embodiments, the nucleic acid barcode includes or is linked to sequencing adapters (for example, universal primer recognition sequences) such that the barcode and sequencing adapter elements are both coupled to the target molecule (perturbation construct).
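The error-correcting barcode scheme mentioned above can be illustrated with a minimal sketch. It assumes equal-length DNA barcodes and nearest-match decoding by Hamming distance, which is one common approach and not necessarily the scheme used in the cited references. The barcode set and the function names (`hamming`, `min_pairwise_distance`, `correct`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correct(read, barcodes, max_mismatches=1):
    """Return the unique barcode within max_mismatches of the read, else None."""
    hits = [bc for bc in barcodes if hamming(read, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcode set (illustrative only, not from the disclosure).
BARCODES = ["AAAAAA", "CCCAAA", "AAACCC", "GGGGGG"]

if __name__ == "__main__":
    print(min_pairwise_distance(BARCODES))
    print(correct("AAAAAT", BARCODES))
    print(correct("TTTTTT", BARCODES))
```

A set whose minimum pairwise distance is at least 3 allows any single-base sequencing error to be assigned unambiguously to its barcode of origin, which is why the sketch checks that distance before decoding.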

CRISPR Libraries

[0056] In example embodiments, the perturbations are introduced to the population of cells with a library of CRISPR guide RNAs. Exemplary guide sequences specific for mitochondria are disclosed herein. In example embodiments, the CRISPR library is introduced to a population of cells capable of expressing a CRISPR enzyme or a sequence encoding the CRISPR enzyme is introduced with the guide perturbation sequence. The sequence of the guide RNAs can act as the identifying barcode sequence or the library can include separate barcode sequences identifying the guide sequence targets. In example embodiments, the CRISPR library is designed to knock down or eliminate expression of a gene by generating an INDEL at the gene. In example embodiments, the CRISPR library is designed to knock down or activate expression of a gene by recruiting a repressor or activator domain to the gene (e.g., CRISPRa (CRISPR activation), CRISPRi (CRISPR interference)). In example embodiments, the CRISPR library is designed to replace or insert a sequence in a target gene.

[0057] In example embodiments, the perturbations can be introduced using a perturb-seq library. As used herein “perturb-seq” refers to a pooled method of introducing perturbations (guide sequences) to a population of cells, such that both the barcode sequences identifying the perturbations and transcriptome data for single cells can be identified using single cell RNA-seq. Methods and tools for genome-scale screening of perturbations in single cells using CRISPR have been described, herein referred to as perturb-seq (see e.g., Dixit et al., “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens” 2016, Cell 167, 1853-1866; Adamson et al., “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response” 2016, Cell 167, 1867-1882; Feldman et al., Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens, bioRxiv 262121, doi: doi.org/10.1101/262121; Datlinger, et al., 2017, Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods. Vol.14 No.3 DOI: 10.1038/nmeth.4177; Hill et al., On the design of CRISPR-based single cell molecular screens, Nat Methods. 2018 Apr; 15(4): 271-274; Replogle, et al., “Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing” Nat Biotechnol (2020). doi.org/10.1038/s41587-020-0470-y; Schraivogel D, Gschwind AR, Milbank JH, et al. "Targeted Perturb-seq enables genome-scale genetic screens in single cells". Nat Methods. 2020;17(6):629-635; Frangieh CJ, Melms JC, Thakore PI, et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat Genet. 2021;53(3):332-341; US patent application publication number US20200283843A1; and US Patent number US11214797B2). 
In example embodiments, the barcode sequence does not have to be encoded by a sequence that is capable of being transcribed and captured using RNA-seq (see, e.g., Genome-Scale CRISPR-Cas9 Knockout Screening in Human Cells. Shalem, O., Sanjana, NE., Hartenian, E., Shi, X., Scott, DA., Mikkelsen, T., Heckl, D., Ebert, BL., Root, DE., Doench, JG., Zhang, F. Science Dec 12. (2013); and Parnas et al., “A Genome-wide CRISPR Screen in Primary Immune Cells to Dissect Regulatory Networks,” Cell 162, 675-686 (July 30, 2015)). For example, the barcode sequence can be the sequence encoding the guide RNA. In some embodiments, the barcode sequence can be amplified from a vector and sequenced. Thus, in some embodiments, the CRISPR library includes guide sequences from a perturb-seq library and/or barcodes identifying the guide sequences.
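When the guide sequence itself serves as the barcode and is amplified and sequenced, perturbations can be tallied by matching reads against the library. The following is a minimal sketch under stated assumptions: each read carries the guide at a fixed offset, only exact matches are counted, and two cell populations (for example, cells with high and low probe signal) are compared by a depth-normalized log2 ratio. The function names, the two-guide library, and the pseudocount are hypothetical and are not taken from the disclosure.

```python
import math
from collections import Counter


def count_guides(reads, library, start=0, length=20):
    """Count exact matches of library guides at a fixed position in each read.

    library maps guide sequence -> guide name (hypothetical layout).
    """
    counts = Counter({name: 0 for name in library.values()})
    for read in reads:
        window = read[start:start + length].upper()
        name = library.get(window)
        if name is not None:
            counts[name] += 1
    return counts


def log2_enrichment(high, low, pseudocount=1.0):
    """Depth-normalized log2 ratio of guide counts between two populations."""
    total_high = sum(high.values()) or 1
    total_low = sum(low.values()) or 1
    return {
        guide: math.log2(
            ((high[guide] + pseudocount) / total_high)
            / ((low[guide] + pseudocount) / total_low)
        )
        for guide in high
    }


# Hypothetical two-guide library and reads (illustrative only).
LIBRARY = {
    "ACGTACGTACGTACGTACGT": "geneA_g1",
    "TTTTGGGGCCCCAAAATTTT": "control_g1",
}
READS_HIGH = ["ACGTACGTACGTACGTACGT"] * 3 + ["TTTTGGGGCCCCAAAATTTT"]
READS_LOW = ["ACGTACGTACGTACGTACGT"] + ["TTTTGGGGCCCCAAAATTTT"] * 3

if __name__ == "__main__":
    high = count_guides(READS_HIGH, LIBRARY)
    low = count_guides(READS_LOW, LIBRARY)
    print(log2_enrichment(high, low))
```

Production screens typically tolerate mismatches and use dedicated statistical models; the sketch shows only the bookkeeping that links a sequenced barcode back to its perturbation.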

[0058] In example embodiments, barcoded CRISPR libraries comprising guide sequences targeting regulatory sequences are introduced to the population of cells (see, e.g., Shalem et al., “High-throughput functional genomics using CRISPR-Cas9,” Nature Reviews Genetics 16, 299-311 (May 2015); and Gilbert, L.A., et al., (2013). "CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes". Cell. 154 (2): 442-51).

[0059] CRISPR-Cas systems comprise a Cas polypeptide and a guide sequence, wherein the guide sequence is capable of forming a CRISPR-Cas complex with the Cas polypeptide and directing site-specific binding of the CRISPR-Cas sequence to a target sequence in one or more of the target genes. The Cas polypeptide may induce a double- or single-stranded break at a designated site in the target sequence. The site of CRISPR-Cas cleavage, for most CRISPR-Cas systems, is dictated by distance from a protospacer-adjacent motif (PAM), discussed in further detail below. Accordingly, a guide sequence may be selected to direct the CRISPR-Cas system to a desired target site at or near the one or more target genes.
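The dependence of target-site selection on the PAM can be sketched as follows. The example assumes a Streptococcus pyogenes Cas9-style 5'-NGG-3' PAM with a 20-nucleotide protospacer immediately 5' of the PAM and a blunt cut 3 bp upstream of the PAM. It scans the given strand only. Other Cas polypeptides use different PAMs and geometries, and the function name `find_ngg_sites` and the test sequence are hypothetical.

```python
import re


def find_ngg_sites(seq, spacer_len=20):
    """List candidate target sites on the given strand for an NGG PAM.

    Returns dicts with the protospacer, the PAM, and cut_index, the 0-based
    index of the first base 3' of the expected cut (3 bp upstream of the PAM).
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            sites.append(
                {
                    "protospacer": seq[pam_start - spacer_len:pam_start],
                    "pam": match.group(1),
                    "cut_index": pam_start - 3,
                }
            )
    return sites


if __name__ == "__main__":
    # Hypothetical sequence: 20-nt protospacer, TGG PAM, 4-nt flank.
    example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
    for site in find_ngg_sites(example):
        print(site)
```

A complete guide-design workflow would also scan the reverse complement and score off-target matches; those steps are omitted here for brevity.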

[0060] In general, a CRISPR-Cas or CRISPR system as used herein and in documents, such as International Patent Publication No. WO 2014/093622 (PCT/US2013/074667), refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (transactivating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)) or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). See, e.g., Shmakov et al. (2015) “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems”, Molecular Cell, DOI: dx.doi.org/10.1016/j.molcel.2015.10.008.

[0061] CRISPR-Cas systems generally fall into two classes based on the architecture of their effector modules, and each class is further subdivided by type and subtype. The two classes are Class 1 and Class 2. Class 1 CRISPR-Cas systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes, while Class 2 CRISPR-Cas systems include a single, multi-domain crRNA-binding protein.
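The class, type, and subtype hierarchy can be represented as a simple lookup. The sketch below assumes the type-to-class assignment of the Makarova et al. 2020 classification cited in this disclosure (Types I, III, and IV in Class 1; Types II, V, and VI in Class 2). The names `CLASS_BY_TYPE` and `crispr_class` are hypothetical.

```python
# Type-to-class assignment per the cited Makarova et al. 2020 classification.
CLASS_BY_TYPE = {"I": 1, "III": 1, "IV": 1, "II": 2, "V": 2, "VI": 2}


def crispr_class(subtype: str) -> int:
    """Return 1 or 2 for a type or subtype label such as 'I-F1' or 'V-A'."""
    cas_type = subtype.split("-")[0].strip().upper()
    try:
        return CLASS_BY_TYPE[cas_type]
    except KeyError:
        raise ValueError(f"unknown CRISPR-Cas type: {subtype!r}") from None


if __name__ == "__main__":
    for label in ("I-F1", "III-B", "II-C1", "V-A", "VI-D"):
        print(label, "-> Class", crispr_class(label))
```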

[0062] In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 2 CRISPR-Cas system.

Class 1 CRISPR-Cas Systems

[0063] In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. Class 1 CRISPR-Cas systems are divided into Types I, III, and IV. Makarova et al. 2020. Nat. Rev. Microbiol. 18: 67-83, particularly as described in Figure 1. Type I CRISPR-Cas systems are divided into 9 subtypes (I-A, I-B, I-C, I-D, I-E, I-F1, I-F2, I-F3, and I-G). Makarova et al., 2020. Class 1, Type I CRISPR-Cas systems can contain a Cas3 protein that can have helicase activity. Type III CRISPR-Cas systems are divided into 6 subtypes (III-A, III-B, III-C, III-D, III-E, and III-F). Type III CRISPR-Cas systems can contain a Cas10 that can include an RNA recognition motif called Palm and a cyclase domain that can cleave polynucleotides. Makarova et al., 2020. Type IV CRISPR-Cas systems are divided into 3 subtypes (IV-A, IV-B, and IV-C). Makarova et al., 2020. Class 1 systems also include CRISPR-Cas variants, including Type I-A, I-B, I-E, I-F and I-U variants, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems. Peters et al., PNAS 114 (35) (2017); DOI: 10.1073/pnas.1709035114; see also, Makarova et al. 2018. The CRISPR Journal, v. 1, n. 5, Figure 5.

[0064] The Class 1 systems typically comprise a multi-protein effector complex, which can, in some embodiments, include ancillary proteins, such as one or more proteins in a complex referred to as a CRISPR-associated complex for antiviral defense (Cascade), one or more adaptation proteins (e.g., Cas1, Cas2, RNA nuclease), and/or one or more accessory proteins (e.g., Cas4, DNA nuclease), CRISPR associated Rossmann fold (CARF) domain containing proteins, and/or RNA transcriptase.

[0065] The backbone of the Class 1 CRISPR-Cas system effector complexes can be formed by RNA recognition motif domain-containing protein(s) of the repeat-associated mysterious proteins (RAMPs) family subunits (e.g., Cas5, Cas6, and/or Cas7). RAMP proteins are characterized by having one or more RNA recognition motif domains. In some embodiments, multiple copies of RAMPs can be present. In some embodiments, the Class 1 CRISPR-Cas system can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more Cas5, Cas6, and/or Cas7 proteins. In some embodiments, the Cas6 protein is an RNase, which can be responsible for pre-crRNA processing. When present in a Class 1 CRISPR-Cas system, Cas6 can be optionally physically associated with the effector complex.

[0066] Class 1 CRISPR-Cas system effector complexes can, in some embodiments, also include a large subunit. The large subunit can be composed of or include a Cas8 and/or Cas10 protein. See, e.g., Figures 1 and 2. Koonin EV, Makarova KS. 2019. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087 and Makarova et al. 2020.

[0067] Class 1 CRISPR-Cas system effector complexes can, in some embodiments, include a small subunit (for example, Cas11). See, e.g., Figures 1 and 2. Koonin EV, Makarova KS. 2019. Origins and Evolution of CRISPR-Cas systems. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087.

[0068] In some embodiments, the Class 1 CRISPR-Cas system can be a Type I CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-A CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-B CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-C CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-D CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-E CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F1 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F2 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F3 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-G CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a CRISPR-Cas variant, such as a Type I-A, I-B, I-E, I-F, or I-U variant, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems as previously described.

[0069] In some embodiments, the Class 1 CRISPR-Cas system can be a Type III CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-A CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-B CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-C CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-D CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-E CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-F CRISPR-Cas system.

[0070] In some embodiments, the Class 1 CRISPR-Cas system can be a Type IV CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-A CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-B CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-C CRISPR-Cas system.

[0071] The effector complex of a Class 1 CRISPR-Cas system can, in some embodiments, include a Cas3 protein that is optionally fused to a Cas2 protein, a Cas4, a Cas5, a Cas6, a Cas7, a Cas8, a Cas10, a Cas11, or a combination thereof. In some embodiments, the effector complex of a Class 1 CRISPR-Cas system can have multiple copies, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14, of any one or more Cas proteins.

Class 2 CRISPR-Cas Systems

[0072] The compositions, systems, and methods described in greater detail elsewhere herein can be designed and adapted for use with Class 2 CRISPR-Cas systems. Thus, in some embodiments, the CRISPR-Cas system is a Class 2 CRISPR-Cas system. Class 2 systems are distinguished from Class 1 systems in that they have a single, large, multi-domain effector protein. In certain example embodiments, the Class 2 system can be a Type II, Type V, or Type VI system, which are described in Makarova et al. “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants” Nature Reviews Microbiology, 18:67-83 (Feb 2020), incorporated herein by reference. Each type of Class 2 system is further divided into subtypes. See Makarova et al. 2020, particularly at Figure 2. Class 2, Type II systems can be divided into 4 subtypes: II-A, II-B, II-C1, and II-C2. Class 2, Type V systems can be divided into 17 subtypes: V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F1(V-U3), V-F2, V-F3, V-G, V-H, V-I, V-K (V-U5), V-U1, V-U2, and V-U4. Class 2, Type VI systems can be divided into 5 subtypes: VI-A, VI-B1, VI-B2, VI-C, and VI-D.

[0073] The distinguishing feature of these types is that their effector complexes consist of a single, large, multi-domain protein. Type V systems differ from Type II effectors (e.g., Cas9), which contain two nuclease domains that are each responsible for the cleavage of one strand of the target DNA, with the HNH nuclease inserted inside the RuvC-like nuclease domain sequence. The Type V systems (e.g., Cas12) only contain a RuvC-like nuclease domain that cleaves both strands. Type VI (Cas13) are unrelated to the effectors of Type II and V systems and contain two HEPN domains and target RNA. Cas13 proteins also display collateral activity that is triggered by target recognition. Some Type V systems have also been found to possess this collateral activity with two single-stranded DNA in in vitro contexts.

[0074] In some embodiments, the Class 2 system is a Type II system. In some embodiments, the Type II CRISPR-Cas system is a II-A CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-B CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C1 CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C2 CRISPR-Cas system. In some embodiments, the Type II system is a Cas9 system. In some embodiments, the Type II system includes a Cas9.

[0075] In some embodiments, the Class 2 system is a Type V system. In some embodiments, the Type V CRISPR-Cas system is a V-A CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-C CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-D CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-E CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 (V-U3) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F3 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-G CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-H CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-I CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-K (V-U5) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U4 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system includes a Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas14, and/or CasΦ.

[0076] In some embodiments, the Class 2 system is a Type VI system. In some embodiments, the Type VI CRISPR-Cas system is a VI-A CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B1 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B2 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-C CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-D CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system includes a Cas13a (C2c2), Cas13b (Group 29/30), Cas13c, and/or Cas13d.

Guide Molecules

[0077] The following include general design principles that may be applied to the guide molecule. The terms guide molecule, guide sequence and guide polynucleotide refer to polynucleotides capable of guiding Cas to a target genomic locus and are used interchangeably as in foregoing cited documents such as International Patent Publication No. WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. The guide molecule can be a polynucleotide.

[0078] The ability of a guide sequence (within a nucleic acid-targeting guide RNA) to direct sequence-specific binding of a nucleic acid-targeting complex to a target nucleic acid sequence may be assessed by any suitable assay. For example, the components of a nucleic acid-targeting CRISPR system sufficient to form a nucleic acid-targeting complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target nucleic acid sequence, such as by transfection with vectors encoding the components of the nucleic acid-targeting complex, followed by an assessment of preferential targeting (e.g., cleavage) within the target nucleic acid sequence, such as by Surveyor assay (Qiu et al. 2004. BioTechniques. 36(4):702-707). Similarly, cleavage of a target nucleic acid sequence may be evaluated in a test tube by providing the target nucleic acid sequence, components of a nucleic acid-targeting complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and will occur to those skilled in the art.

[0079] In some embodiments, the guide molecule is an RNA. The guide molecule(s) (also referred to interchangeably herein as guide polynucleotide and guide sequence) that are included in the CRISPR-Cas or Cas based system can be any polynucleotide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of a nucleic acid-targeting complex to the target nucleic acid sequence. In some embodiments, the degree of complementarity, when optimally aligned using a suitable alignment algorithm, can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
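As an illustration of the degree-of-complementarity measure described above, the following minimal Python sketch scores an ungapped, equal-length guide against a target. The function names, the ungapped comparison, and the restriction to Watson-Crick pairs (no G-U wobble) are assumptions made for this example; a full treatment would use one of the alignment algorithms listed above.

```python
# Illustrative sketch only: names and the ungapped comparison are assumptions.
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G", "T": "A"}


def reverse_complement_rna(seq: str) -> str:
    """Return the RNA reverse complement of a DNA or RNA sequence (5'->3')."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target.

    Both sequences are written 5'->3' and must have equal length. The guide
    is compared position by position with the reverse complement of the
    target; gaps and wobble pairs are not considered.
    """
    if len(guide) != len(target):
        raise ValueError("ungapped comparison requires equal lengths")
    ideal_guide = reverse_complement_rna(target)
    guide_rna = guide.upper().replace("T", "U")
    matches = sum(g == i for g, i in zip(guide_rna, ideal_guide))
    return 100.0 * matches / len(guide_rna)
```

Under these assumptions, a 20-nt guide with a single mismatch scores 95%, which can then be compared against whichever threshold from the list above is chosen.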

[0080] A guide sequence, and hence a nucleic acid-targeting guide, may be selected to target any target nucleic acid sequence. The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.

[0081] In some embodiments, a nucleic acid-targeting guide is selected to reduce the degree of secondary structure within the nucleic acid-targeting guide. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the nucleic acid-targeting guide participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A.R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62).
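The fraction of nucleotides participating in self-complementary base pairing can be read directly from a predicted structure. The sketch below assumes the structure has already been obtained from a folding program as a dot-bracket string (a common text format in which "(" and ")" mark paired positions and "." marks unpaired positions); the function names and the default threshold are hypothetical and chosen only for illustration.

```python
def paired_fraction(dot_bracket: str) -> float:
    """Fraction of positions that are base-paired in a dot-bracket structure."""
    if not dot_bracket:
        raise ValueError("empty structure")
    depth = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    paired = len(dot_bracket) - dot_bracket.count(".")
    return paired / len(dot_bracket)


def passes_structure_threshold(dot_bracket: str, max_paired: float = 0.5) -> bool:
    """True if no more than max_paired of the nucleotides are base-paired."""
    return paired_fraction(dot_bracket) <= max_paired
```

For example, a 20-nt guide predicted to fold as `((((....))))........` has 8 of 20 nucleotides paired (40%), so it would satisfy a 50% limit but not a 25% limit.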

[0082] In one example embodiment, a guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a guide sequence or spacer sequence. In another example embodiment, the guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat sequence fused or linked to a guide sequence or spacer sequence. In another example embodiment, the direct repeat sequence may be located upstream (i.e., 5’) from the guide sequence or spacer sequence. In other embodiments, the direct repeat sequence may be located downstream (i.e., 3’) from the guide sequence or spacer sequence.

[0083] In one example embodiment, the crRNA comprises a stem loop, preferably a single stem loop. In one example embodiment, the direct repeat sequence forms a stem loop, preferably a single stem loop.

[0084] In one example embodiment, the spacer length of the guide RNA is from 15 to 35 nt. In another example embodiment, the spacer length of the guide RNA is at least 15 nucleotides. In another example embodiment, the spacer length is from 15 to 17 nt, e.g., 15, 16, or 17 nt, from 17 to 20 nt, e.g., 17, 18, 19, or 20 nt, from 20 to 24 nt, e.g., 20, 21, 22, 23, or 24 nt, from 23 to 25 nt, e.g., 23, 24, or 25 nt, from 24 to 27 nt, e.g., 24, 25, 26, or 27 nt, from 27 to 30 nt, e.g., 27, 28, 29, or 30 nt, from 30 to 35 nt, e.g., 30, 31, 32, 33, 34, or 35 nt, or 35 nt or longer.

[0085] The “tracrRNA” sequence or analogous terms includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize. In some embodiments, the degree of complementarity between the tracrRNA sequence and crRNA sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and crRNA sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.

[0086] In general, degree of complementarity is with reference to the optimal alignment of the sca sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm and may further account for secondary structures, such as self-complementarity within either the sca sequence or tracr sequence. In some embodiments, the degree of complementarity between the tracr sequence and sca sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.

[0087] In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%; a guide or RNA or sgRNA can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length; or guide or RNA or sgRNA can be less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length; and tracr RNA can be 30 or 50 nucleotides in length. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5% or 95% or 95.5% or 96% or 96.5% or 97% or 97.5% or 98% or 98.5% or 99% or 99.5% or 99.9%, or 100%. Off target is less than 100% or 99.9% or 99.5% or 99% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% or 94% or 93% or 92% or 91% or 90% or 89% or 88% or 87% or 86% or 85% or 84% or 83% or 82% or 81% or 80% complementarity between the sequence and the guide, with it being advantageous that off target is 100% or 99.9% or 99.5% or 99% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% complementarity between the sequence and the guide.

[0088] In some embodiments according to the invention, the guide RNA (capable of guiding Cas to a target locus) may comprise (1) a guide sequence capable of hybridizing to a genomic target locus in the eukaryotic cell; (2) a tracr sequence; and (3) a tracr mate sequence. All of (1) to (3) may reside in a single RNA, i.e., an sgRNA (arranged in a 5’ to 3’ orientation), or the tracr RNA may be a different RNA than the RNA containing the guide and tracr sequence. The tracr hybridizes to the tracr mate sequence and directs the CRISPR/Cas complex to the target sequence. Where the tracr RNA is on a different RNA than the RNA containing the guide and tracr sequence, the length of each RNA may be optimized to be shortened from their respective native lengths, and each may be independently chemically modified to protect from degradation by cellular RNase or otherwise increase stability.

[0089] Many modifications to guide sequences are known in the art and are further contemplated within the context of this invention. Various modifications may be used to increase the specificity of binding to the target sequence and/or increase the activity of the Cas protein and/or reduce off-target effects. Example guide sequence modifications are described in International Patent Application No. PCT/US2019/045582, specifically paragraphs [0178]-[0333], which is incorporated herein by reference.

Target Sequences, PAMs, and PFSs

[0090] In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. In other words, the target polynucleotide can be a polynucleotide or a part of a polynucleotide to which a part of the guide sequence is designed to have complementarity with and to which the effector function mediated by the complex comprising the CRISPR effector protein and a guide molecule is to be directed. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.

[0091] PAM elements are sequences that can be recognized and bound by Cas proteins. Cas proteins/effector complexes can then unwind the dsDNA at a position adjacent to the PAM element. It will be appreciated that Cas proteins and systems that target RNA do not require PAM sequences (Marraffini et al. 2010. Nature. 463:568-571). Instead, many rely on PFSs, which are discussed elsewhere herein. In one example embodiment, the target sequence should be associated with a PAM (protospacer adjacent motif) or PFS (protospacer flanking sequence or site), that is, a short sequence recognized by the CRISPR complex. Depending on the nature of the CRISPR-Cas protein, the target sequence should be selected such that its complementary sequence in the DNA duplex (also referred to herein as the non-target sequence) is upstream or downstream of the PAM. In the embodiments, the complementary sequence of the target sequence is downstream or 3’ of the PAM or upstream or 5’ of the PAM. The precise sequence and length requirements for the PAM differ depending on the Cas protein used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence). Examples of the natural PAM sequences for different Cas proteins are provided herein below and the skilled person will be able to identify further PAM sequences for use with a given Cas protein.
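The relationship between a protospacer and its adjacent PAM can be illustrated with a short Python sketch that scans one strand of a DNA sequence for candidate sites. The PAM pattern used here (a 3’ NGG motif, written as the regular expression `[ACGT]GG`), the 20-nt protospacer length, and the function name are assumptions chosen for illustration only; the actual PAM and its orientation depend on the Cas protein used.

```python
import re


def find_protospacers(dna: str, pam_regex: str = "[ACGT]GG", spacer_len: int = 20):
    """List (start, protospacer, pam) for each site on the given strand where a
    protospacer of spacer_len nucleotides is immediately followed (3') by a PAM.

    A lookahead is used so that overlapping candidate sites are all reported.
    Only the supplied strand is scanned; the reverse strand would need to be
    passed in separately as its reverse complement.
    """
    dna = dna.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]
```

A Cas protein that recognizes a 5’ PAM would need the two capture groups swapped, so that the PAM precedes the protospacer.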

[0092] The ability to recognize different PAM sequences depends on the Cas polypeptide(s) included in the system. See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517. Table A (from Gleditzsch et al. 2019) below shows several Cas polypeptides and the PAM sequence they recognize.

[0093] In a preferred embodiment, the CRISPR effector protein may recognize a 3’ PAM. In one example embodiment, the CRISPR effector protein may recognize a 3’ PAM which is 5’H, wherein H is A, C or U.

[0094] Further, engineering of the PAM Interacting (PI) domain on the Cas protein may allow programing of PAM specificity, improve target site recognition fidelity, and increase the versatility of the CRISPR-Cas protein, for example as described for Cas9 in Kleinstiver BP et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul 23;523(7561):481-5. doi: 10.1038/nature14592. As further detailed herein, the skilled person will understand that Cas13 proteins may be modified analogously. Gao et al, “Engineered Cpf1 Enzymes with Altered PAM Specificities,” bioRxiv 091611; doi: dx.doi.org/10.1101/091611 (Dec. 4, 2016). Doench et al. created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. The authors showed that optimization of the PAM improved activity and also provided an on-line tool for designing sgRNAs.

[0095] PAM sequences can be identified in a polynucleotide using an appropriate design tool; such tools are available commercially as well as online. Freely available tools include, but are not limited to, CRISPRFinder and CRISPRTarget. Mojica et al. 2009. Microbiol. 155(Pt. 3):733-740; Altschul et al. 1990. J. Mol. Biol. 215:403-410; Biswas et al. 2013 RNA Biol. 10:817-827; and Grissa et al. 2007. Nucleic Acid Res. 35:W52-57. Experimental approaches to PAM identification can include, but are not limited to, plasmid depletion assays (Jiang et al. 2013. Nat. Biotechnol. 31:233-239; Esvelt et al. 2013. Nat. Methods. 10:1116-1121; Kleinstiver et al. 2015. Nature. 523:481-485), screening with a high-throughput in vivo model called PAM-SCANR (Pattanayak et al. 2013. Nat. Biotechnol. 31:839-843 and Leenay et al. 2016. Mol. Cell. 16:253), and negative screening (Zetsche et al. 2015. Cell. 163:759-771).

[0096] As previously mentioned, CRISPR-Cas systems that target RNA do not typically rely on PAM sequences. Instead, such systems typically recognize protospacer flanking sites (PFSs). Thus, Type VI CRISPR-Cas systems typically recognize PFSs instead of PAMs. PFSs represent an analogue to PAMs for RNA targets. Type VI CRISPR-Cas systems employ a Cas13. Some Cas13 proteins analyzed to date, such as Cas13a (C2c2) identified from Leptotrichia shahii (LshCas13a), have a specific discrimination against G at the 3’ end of the target RNA. The presence of a C at the corresponding crRNA repeat site can indicate that nucleotide pairing at this position is rejected. However, some Cas13 proteins (e.g., LwaCas13a and PspCas13b) do not seem to have a PFS preference. See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.

[0097] Some Type VI proteins, such as subtype B, have 5'-recognition of D (G, T, A) and a 3'-motif requirement of NAN or NNA. One example is the Cas13b protein identified in Bergeyella zoohelcum (BzCas13b). See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
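A PFS preference of the kind described for LshCas13a can be checked with a few lines of Python. The sketch below interprets the discrimination against G as applying to the single nucleotide immediately 3’ of the target site in the RNA; that interpretation, the function name, and the parameters are assumptions made for illustration and would need to be adapted for other Cas13 proteins or motifs.

```python
def passes_3prime_pfs(rna: str, site_start: int, site_len: int,
                      disallowed: str = "G") -> bool:
    """True if the nucleotide immediately 3' of the target site is permitted.

    rna is written 5'->3'; site_start is the 0-based index of the first
    nucleotide of the target site. DNA-style input (T) is converted to RNA (U).
    """
    rna = rna.upper().replace("T", "U")
    flank = site_start + site_len
    if site_start < 0 or site_len <= 0 or flank >= len(rna):
        raise ValueError("target site must leave one flanking 3' nucleotide")
    return rna[flank] not in disallowed
```

For a protein with no apparent PFS preference, `disallowed` can be passed as an empty string so that every site passes.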

[0098] Overall, Type VI CRISPR-Cas systems appear to have less restrictive rules for substrate (e.g., target sequence) recognition than those that target DNA (e.g., Type V and Type II).

Sequences related to nucleus targeting and transportation

[0099] In some embodiments, one or more components (e.g., the Cas protein) in the composition for engineering cells may comprise one or more sequences related to nucleus targeting and transportation. Such sequences may facilitate the one or more components in the composition for targeting a sequence within a cell. In order to improve targeting of the CRISPR-Cas protein used in the methods of the present disclosure to the nucleus, it may be advantageous to provide one or both of these components with one or more nuclear localization sequences (NLSs).

[0100] In one example embodiment, the NLSs used in the context of the present disclosure are heterologous to the proteins. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 15272) or PKKKRKVEAS (SEQ ID NO: 15273); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 15274)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 15275) or RQRRNELKRSP (SEQ ID NO: 15276); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 15277); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 15278) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 15279) and PPKKARED (SEQ ID NO: 15280) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 15281) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 15282) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 15283) and PKQKKRK (SEQ ID NO: 15284) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 15285) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 15286) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 15287) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 15288) of the steroid hormone receptors (human) glucocorticoid. In general, the one or more NLSs are of sufficient strength to drive accumulation of the DNA-targeting Cas protein in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs in the CRISPR-Cas protein, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the nucleic acid-targeting protein, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g., a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of nucleic acid-targeting complex formation (e.g., assay for deaminase activity) at the target sequence, or assay for altered gene expression activity affected by DNA-targeting complex formation and/or DNA-targeting, as compared to a control not exposed to the Cas protein, or exposed to a Cas protein lacking the one or more NLSs.

[0101] The Cas proteins may be provided with 1 or more, such as with 2, 3, 4, 5, 6, 7, 8, 9, 10, or more heterologous NLSs. In some embodiments, the protein comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g., zero or at least one or more NLS at the amino-terminus and zero or at least one or more NLS at the carboxy-terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. In preferred embodiments of the Cas proteins, an NLS is attached to the C-terminus of the protein.
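Whether an NLS lies near a terminus in the sense described above can be checked from the amino acid sequence. In the Python sketch below, distance is counted as the number of residues between the terminus and the nearest residue of the NLS; this counting convention, the function names, and the default cutoff of 50 are assumptions for illustration only. The SV40 NLS PKKKRKV is used as the example motif.

```python
def nls_terminal_distances(protein: str, nls: str):
    """Return (residues before the NLS, residues after the NLS) for the first
    occurrence of nls in protein, or None if the NLS is absent."""
    start = protein.find(nls)
    if start == -1:
        return None
    end = start + len(nls)
    return start, len(protein) - end


def is_near_terminus(protein: str, nls: str, max_distance: int = 50) -> bool:
    """True if the NLS is within max_distance residues of either terminus."""
    distances = nls_terminal_distances(protein, nls)
    return distances is not None and min(distances) <= max_distance
```

A protein carrying several NLSs can be checked by calling the function once per motif.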

[0102] In certain embodiments, the CRISPR-Cas protein and a functional domain protein (described further herein) are delivered to the cell or expressed within the cell as separate proteins. In these embodiments, each of the CRISPR-Cas and functional domain protein can be provided with one or more NLSs as described herein. In certain embodiments, the CRISPR-Cas and functional domain protein are delivered to the cell or expressed within the cell as a fusion protein. In these embodiments, one or both of the CRISPR-Cas and functional domain protein is provided with one or more NLSs. Where the functional domain protein is fused to an adaptor protein (such as MS2) as described above, the one or more NLS can be provided on the adaptor protein, provided that this does not interfere with aptamer binding. In particular embodiments, the one or more NLS sequences may also function as linker sequences between the functional domain protein and the CRISPR-Cas protein.

[0103] In certain embodiments, guides of the disclosure comprise specific binding sites (e.g. aptamers) for adapter proteins, which may be linked to or fused to a functional domain protein or catalytic domain thereof. When such a guide forms a CRISPR complex (e.g., CRISPR-Cas protein binding to guide and target), the adapter proteins bind and the functional domain protein or catalytic domain thereof associated with the adapter protein is positioned in a spatial orientation which is advantageous for the attributed function to be effective.

[0104] The skilled person will understand that modifications to the guide which allow for binding of the adapter + nucleotide deaminase, but not proper positioning of the adapter + nucleotide deaminase (e.g., due to steric hindrance within the three-dimensional structure of the CRISPR complex) are modifications which are not intended. The one or more modified guide may be modified at the tetra loop, the stem loop 1, stem loop 2, or stem loop 3, as described herein, preferably at either the tetra loop or stem loop 2, and in some cases at both the tetra loop and stem loop 2.

[0105] In some embodiments, a component (e.g., the dead Cas protein, the functional domain protein or catalytic domain thereof, or a combination thereof) in the systems may comprise one or more nuclear export signals (NES), one or more nuclear localization signals (NLS), or any combinations thereof. In some cases, the NES may be an HIV Rev NES. In certain cases, the NES may be a MAPK NES. When the component is a protein, the NES or NLS may be at the C terminus of the component. Alternatively or additionally, the NES or NLS may be at the N terminus of the component. In some examples, the Cas protein and optionally said functional domain protein or catalytic domain thereof comprise one or more heterologous nuclear export signal(s) (NES(s)) or nuclear localization signal(s) (NLS(s)), preferably an HIV Rev NES or MAPK NES, preferably C-terminal.

CRISPR-Cas Cleavage

[0106] In one example embodiment, the CRISPR-Cas system may induce a double- or single-stranded break at a designated site in the target sequence. The CRISPR-Cas system may introduce an indel, which, as used herein, refers to insertions or deletions of the DNA at particular locations on the chromosome. The site of CRISPR-Cas cleavage, for most CRISPR-Cas systems, is dictated by distance from a protospacer-adjacent motif (PAM). Accordingly, a guide sequence may be selected to direct the CRISPR-Cas system to induce cleavage at a desired target site at or near the one or more variants.

NHEJ-Based Editing

[0107] In one example embodiment, the CRISPR-Cas system is used to introduce one or more insertions or deletions to a target sequence on the gene or enhancer associated with the gene such that one or more indels or insertions reduce expression or activity of the one or more polypeptides. More than one guide sequence may be selected to insert multiple insertions, deletions, or a combination thereof. Likewise, more than one Cas protein type may be used, for example, to maximize target sites adjacent to different PAMs. In one example embodiment, a guide sequence is selected that directs the CRISPR-Cas system to make one or more insertions or deletions within the enhancer region. In one example embodiment, a guide is selected that directs the CRISPR-Cas system to make an insertion 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 base pairs upstream of an enhancer controlling expression of a target gene. In one example embodiment, a guide sequence is selected that directs the CRISPR-Cas system to make an insertion 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 base pairs downstream of an enhancer controlling expression of a target gene. In one example embodiment, a guide sequence is selected that directs the CRISPR-Cas system to make a deletion 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 base pairs downstream of an enhancer controlling expression of a target gene. In one example embodiment, a guide sequence is selected that directs the CRISPR-Cas system to make a deletion 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 base pairs downstream of an enhancer controlling expression of a target gene.

HDR Template Based Editing

[0108] In one example embodiment, a donor template is provided to replace a genomic sequence in a target gene or sequence controlling expression of the target gene. A donor template may comprise an insertion sequence flanked by two homology regions. The insertion sequence comprises an edited sequence to be inserted in place of the target sequence (e.g. a portion of genomic DNA to be edited). The homology regions comprise sequences that are homologous to the genomic DNA strands at the site of the CRISPR-Cas induced double-strand break. Cellular HDR mechanisms then facilitate insertion of the insertion sequence at the site of the DSB.
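The donor template layout described above (an insertion sequence flanked by two homology regions) can be sketched in Python as follows. The function name, the 0-based cut-site coordinate, the `replace_len` parameter, and the default arm length are hypothetical choices for illustration; they are not taken from the disclosure.

```python
def build_donor_template(genome: str, cut_site: int, insert: str,
                         arm_len: int = 50, replace_len: int = 0) -> str:
    """Assemble left homology arm + insertion sequence + right homology arm.

    cut_site is the 0-based index of the first base 3' of the break. The left
    arm is the arm_len bases immediately 5' of the cut site. The right arm
    starts replace_len bases 3' of the cut site, so that replace_len genomic
    bases are replaced by the insertion sequence.
    """
    left_start = cut_site - arm_len
    right_start = cut_site + replace_len
    right_end = right_start + arm_len
    if arm_len <= 0 or replace_len < 0:
        raise ValueError("arm_len must be positive and replace_len non-negative")
    if left_start < 0 or right_end > len(genome):
        raise ValueError("homology arms extend beyond the supplied sequence")
    left_arm = genome[left_start:cut_site]
    right_arm = genome[right_start:right_end]
    return left_arm + insert + right_arm
```

Setting `replace_len` to zero models a pure insertion at the break, while a positive value models replacement of a stretch of the target sequence with the edited sequence.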

[0001] Accordingly, in certain example embodiments, a donor template and guide sequence are selected to direct excision and replacement of a section of genome DNA comprising an enhancer controlling expression of a target gene or a section of genome DNA within the gene that is required for activity of the target gene. In one example embodiment, the insertion sequence comprises a transcription factor binding site that recruits a repressor to the gene.

[0109] The donor template may include a sequence which results in a change in sequence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence.

[0110] A donor template may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In an embodiment, the template nucleic acid may be 20+/-10, 30+/-10, 40+/-10, 50+/-10, 60+/-10, 70+/-10, 80+/-10, 90+/-10, 100+/-10, 110+/-10, 120+/-10, 130+/-10, 140+/-10, 150+/-10, 160+/-10, 170+/-10, 180+/-10, 190+/-10, 200+/-10, 210+/-10, or 220+/-10 nucleotides in length. In an embodiment, the template nucleic acid may be 30+/-20, 40+/-20, 50+/-20, 60+/-20, 70+/-20, 80+/-20, 90+/-20, 100+/-20, 110+/-20, 120+/-20, 130+/-20, 140+/-20, 150+/-20, 160+/-20, 170+/-20, 180+/-20, 190+/-20, 200+/-20, 210+/-20, or 220+/-20 nucleotides in length. In an embodiment, the template nucleic acid is 10 to 1,000, 20 to 900, 30 to 800, 40 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, or 50 to 100 nucleotides in length.

[0111] The homology regions of the donor template may be complementary to a portion of a polynucleotide comprising the target sequence. When optimally aligned, a donor template might overlap with one or more nucleotides of a target sequence (e.g. about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more nucleotides). In some embodiments, when a template sequence and a polynucleotide comprising a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.

[0112] The donor template comprises a sequence to be integrated (e.g., a mutated gene). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function.

[0113] Homology arms of the donor template may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequences have about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.

[0114] In one example embodiment, one or both homology arms may be shortened to avoid including certain sequence repeat elements. For example, a 5' homology arm may be shortened to avoid a sequence repeat element. In other embodiments, a 3' homology arm may be shortened to avoid a sequence repeat element. In some embodiments, both the 5' and the 3' homology arms may be shortened to avoid including certain sequence repeat elements.

[0115] The donor template may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The donor template of the disclosure can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996).

[0116] In one example embodiment, a donor template is a single-stranded oligonucleotide. When using a single-stranded oligonucleotide, 5' and 3' homology arms may range up to about 200 base pairs (bp) in length, e.g., at least 25, 50, 75, 100, 125, 150, 175, or 200 bp in length.

[0117] Suzuki et al. describe in vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration (2016, Nature 540:144-149).

Templates

[0118] In some embodiments, a composition for engineering cells comprises a template, e.g., a recombination template. A template may be a component of another vector as described herein, contained in a separate vector, or provided as a separate polynucleotide. In some embodiments, a recombination template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-targeting effector protein as a part of a nucleic acid-targeting complex.

[0119] In an embodiment, the template nucleic acid alters the sequence of the target position. In an embodiment, the template nucleic acid results in the incorporation of a modified or non-naturally occurring base into the target nucleic acid.

[0120] The template sequence may undergo a breakage mediated or catalyzed recombination with the target sequence. In an embodiment, the template nucleic acid may include sequence that corresponds to a site on the target sequence that is cleaved by a Cas protein mediated cleavage event. In an embodiment, the template nucleic acid may include a sequence that corresponds to both, a first site on the target sequence that is cleaved in a first Cas protein mediated event, and a second site on the target sequence that is cleaved in a second Cas protein mediated event.

[0121] In certain embodiments, the template nucleic acid can include a sequence which results in an alteration in the coding sequence of a translated sequence, e.g., one which results in the substitution of one amino acid for another in a protein product, e.g., transforming a mutant allele into a wild type allele, transforming a wild type allele into a mutant allele, and/or introducing a stop codon, insertion of an amino acid residue, deletion of an amino acid residue, or a nonsense mutation. In certain embodiments, the template nucleic acid can include a sequence which results in an alteration in a non-coding sequence, e.g., an alteration in an exon or in a 5' or 3' non-translated or non-transcribed region. Such alterations include an alteration in a control element, e.g., a promoter, enhancer, and an alteration in a cis-acting or trans-acting control element.

[0122] A template nucleic acid having homology with a target position in a target gene may be used to alter the structure of a target sequence. The template sequence may be used to alter an unwanted structure, e.g., an unwanted or mutant nucleotide. The template nucleic acid may include a sequence which, when integrated, results in decreasing the activity of a positive control element; increasing the activity of a positive control element; decreasing the activity of a negative control element; increasing the activity of a negative control element; decreasing the expression of a gene; increasing the expression of a gene; increasing resistance to a disorder or disease; increasing resistance to viral entry; correcting a mutation or altering an unwanted amino acid residue conferring, increasing, abolishing or decreasing a biological property of a gene product, e.g., increasing the enzymatic activity of an enzyme, or increasing the ability of a gene product to interact with another molecule.

[0123] The template nucleic acid may include a sequence which results in a change in sequence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence.

[0124] A template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In an embodiment, the template nucleic acid may be 20+/-10, 30+/-10, 40+/-10, 50+/-10, 60+/-10, 70+/-10, 80+/-10, 90+/-10, 100+/-10, 110+/-10, 120+/-10, 130+/-10, 140+/-10, 150+/-10, 160+/-10, 170+/-10, 180+/-10, 190+/-10, 200+/-10, 210+/-10, or 220+/-10 nucleotides in length. In an embodiment, the template nucleic acid may be 30+/-20, 40+/-20, 50+/-20, 60+/-20, 70+/-20, 80+/-20, 90+/-20, 100+/-20, 110+/-20, 120+/-20, 130+/-20, 140+/-20, 150+/-20, 160+/-20, 170+/-20, 180+/-20, 190+/-20, 200+/-20, 210+/-20, or 220+/-20 nucleotides in length. In an embodiment, the template nucleic acid is 10 to 1,000, 20 to 900, 30 to 800, 40 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, or 50 to 100 nucleotides in length.

[0125] In some embodiments, the template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence. When optimally aligned, a template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more nucleotides). In some embodiments, when a template sequence and a polynucleotide comprising a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.

[0126] The exogenous polynucleotide template comprises a sequence to be integrated (e.g., a mutated gene). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a noncoding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function.

[0127] An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequences have about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.

[0128] An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequences have about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.

[0129] In certain embodiments, one or both homology arms may be shortened to avoid including certain sequence repeat elements. For example, a 5' homology arm may be shortened to avoid a sequence repeat element. In other embodiments, a 3' homology arm may be shortened to avoid a sequence repeat element. In some embodiments, both the 5' and the 3' homology arms may be shortened to avoid including certain sequence repeat elements.

[0130] In some methods, the exogenous polynucleotide template may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous polynucleotide template of the disclosure can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996).

[0131] In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use as a single-stranded oligonucleotide. When using a single-stranded oligonucleotide, 5' and 3' homology arms may range up to about 200 base pairs (bp) in length, e.g., at least 25, 50, 75, 100, 125, 150, 175, or 200 bp in length.
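
By way of non-limiting illustration, the assembly of a single-stranded oligonucleotide template from a 5' homology arm, a replacement sequence, and a 3' homology arm can be sketched in Python as follows. The function name `design_ssodn`, its parameters, the toy reference sequence, and the 25 to 200 nucleotide bound on `arm_len` (taken from the arm lengths listed above) are assumptions made for this sketch and are not part of the disclosure.

```python
def design_ssodn(reference: str, edit_start: int, edit_end: int,
                 replacement: str, arm_len: int = 50) -> dict:
    """Assemble a donor as 5' arm + replacement + 3' arm (hypothetical helper).

    reference   : 5'->3' sequence around the target position
    edit_start  : index of the first reference base to be replaced
    edit_end    : index one past the last reference base to be replaced
    replacement : sequence written in place of reference[edit_start:edit_end]
    arm_len     : requested length of each homology arm, in nucleotides
    """
    if not (0 <= edit_start <= edit_end <= len(reference)):
        raise ValueError("edit coordinates fall outside the reference")
    if not 25 <= arm_len <= 200:
        # Assumed bound for this sketch, mirroring the listed arm lengths.
        raise ValueError("arm_len outside the 25-200 nt range assumed here")
    # Arms are truncated automatically if the reference is too short.
    five_arm = reference[max(0, edit_start - arm_len):edit_start]
    three_arm = reference[edit_end:edit_end + arm_len]
    return {
        "five_prime_arm": five_arm,
        "three_prime_arm": three_arm,
        "donor": five_arm + replacement + three_arm,
    }


# Toy usage: replace one base at index 60 of a 120-nt reference.
reference = "ACGT" * 30
result = design_ssodn(reference, 60, 61, "G", arm_len=25)
print(len(result["donor"]))
```

A practical design would additionally check the arms for sequence repeat elements and shorten them where needed, as described in the preceding paragraphs.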

[0132] Suzuki et al. describe in vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration (2016, Nature 540:144-149).

Specialized Cas-based Systems

[0133] In some embodiments, the system is a Cas-based system that is capable of performing a specialized function or activity. For example, the Cas protein may be fused, operably coupled to, or otherwise associated with one or more functional domains. In certain example embodiments, the Cas protein may be a catalytically dead Cas protein (“dCas”) and/or have nickase activity. A nickase is a Cas protein that cuts only one strand of a double-stranded target. In such embodiments, the dCas or nickase provides a sequence-specific targeting functionality that delivers the functional domain to or proximate a target sequence. Example functional domains that may be fused to, operably coupled to, or otherwise associated with a Cas protein can be or include, but are not limited to, a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translational activation domain, a transcriptional activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA, and SET7/9), a translation initiation domain, a transcriptional repression domain (e.g., a KRAB domain, NuE domain, NcoR domain, and a SID domain such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase), a light inducible/controllable domain, a chemically inducible/controllable domain, a transposase domain, a homologous recombination machinery domain, a recombinase domain, an integrase domain, and combinations thereof. Methods for generating catalytically dead Cas9 or a nickase Cas9 (WO 2014/204725; Ran et al. Cell. 2013 Sept 12; 154(6):1380-1389), Cas12 (Liu et al. Nature Communications, 8, 2095 (2017)), and Cas13 (International Patent Publication Nos. WO 2019/005884 and WO 2019/060746) are known in the art and incorporated herein by reference.

[0134] In some embodiments, the functional domains can have one or more of the following activities: methylase activity, demethylase activity, translation activation activity, translation initiation activity, translation repression activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, molecular switch activity, chemical inducibility, light inducibility, and nucleic acid binding activity. In some embodiments, the one or more functional domains may comprise epitope tags or reporters. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporters include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and auto-fluorescent proteins including blue fluorescent protein (BFP).

[0135] The one or more functional domain(s) may be positioned at, near, and/or in proximity to a terminus of the effector protein (e.g., a Cas protein). In embodiments having two or more functional domains, each of the two can be positioned at or near or in proximity to a terminus of the effector protein (e.g., a Cas protein). In some embodiments, such as those where the functional domain is operably coupled to the effector protein, the one or more functional domains can be tethered or linked via a suitable linker (including, but not limited to, GlySer linkers) to the effector protein (e.g., a Cas protein). When there is more than one functional domain, the functional domains can be the same or different. In some embodiments, all the functional domains are the same. In some embodiments, all of the functional domains are different from each other. In some embodiments, at least two of the functional domains are different from each other. In some embodiments, at least two of the functional domains are the same as each other.

[0136] Other suitable functional domains can be found, for example, in International Patent Publication No. WO 2019/018423.

Split CRISPR-Cas systems

[0137] In one example embodiment, the CRISPR-Cas system is a split CRISPR-Cas system. See, e.g., Zetsche et al., 2015. Nat. Biotechnol. 33(2):139-142 and International Patent Publication WO 2019/018423, the compositions and techniques of which can be used in and/or adapted for use with the present invention. Split CRISPR-Cas proteins are set forth herein and in documents incorporated herein by reference in further detail herein. In certain embodiments, each part of a split CRISPR protein is attached to a member of a specific binding pair, and when bound with each other, the members of the specific binding pair maintain the parts of the CRISPR protein in proximity. In certain embodiments, each part of a split CRISPR protein is associated with an inducible binding pair. An inducible binding pair is one which is capable of being switched “on” or “off” by a protein or small molecule that binds to both members of the inducible binding pair. In some embodiments, CRISPR proteins may preferably be split between domains, leaving domains intact. In particular embodiments, said Cas split domains (e.g., RuvC and HNH domains in the case of Cas9) can be simultaneously or sequentially introduced into the cell such that said split Cas domain(s) process the target nucleic acid sequence in the algae cell. The reduced size of the split Cas compared to the wild type Cas allows other methods of delivery of the systems to the cells, such as the use of cell penetrating peptides as described herein.

DNA and RNA Base Editing

[0138] In one example embodiment, the gene editing system configured to modify the one or more target genes disclosed herein is a base editing system. In one example embodiment, a Cas protein is connected or fused to a nucleotide deaminase. As used herein, “base editing” refers generally to the process of polynucleotide modification via a CRISPR-Cas-based or Cas-based system that does not include excising nucleotides to make the modification. Base editing can convert base pairs at precise locations without generating excess undesired editing byproducts that can be made using traditional CRISPR-Cas systems. Accordingly, in one example embodiment, the base editing system edits the target gene to reduce or eliminate its expression.

[0139] In one example embodiment, the nucleotide deaminase may be a DNA base editor used in combination with a DNA binding Cas protein such as, but not limited to, Class 2 Type II and Type V systems. Two classes of DNA base editors are generally known: cytosine base editors (CBEs) and adenine base editors (ABEs). CBEs convert a C•G base pair into a T•A base pair (Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Li et al. Nat. Biotech. 36:324-327) and ABEs convert an A•T base pair to a G•C base pair. Collectively, CBEs and ABEs can mediate all four possible transition mutations (C to T, A to G, T to C, and G to A). Rees and Liu. 2018. Nat. Rev. Genet. 19(12):770-788, particularly at Figures 1b, 2a-2c, 3a-3f, and Table 1. In some embodiments, the base editing system includes a CBE and/or an ABE. In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a base editing system. Rees and Liu. 2018. Nat. Rev. Genet. 19(12):770-788. Base editors also generally do not need a DNA donor template and/or rely on homology-directed repair. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471. Upon binding to a target locus in the DNA, base pairing between the guide RNA of the system and the target DNA strand leads to displacement of a small segment of ssDNA in an “R-loop”. Nishimasu et al. Cell. 156:935-949. DNA bases within the ssDNA bubble are modified by the enzyme component, such as a deaminase. In some systems, the catalytically disabled Cas protein can be a variant or modified Cas that has nickase functionality and can generate a nick in the non-edited DNA strand to induce cells to repair the non-edited strand using the edited strand as a template. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471.

[0140] Other example Type V base editing systems are described in International Patent Publication Nos. WO 2018/213708, WO 2018/213726, and International Patent Applications No. PCT/US2018/067207, PCT/US2018/067225, and PCT/US2018/067307, each of which is incorporated herein by reference.
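
By way of non-limiting illustration, the four transition outcomes attributed to CBEs and ABEs above can be sketched in Python as follows. The function `apply_base_editor`, the half-open `window` of editable positions, and the toy sequence are hypothetical and are chosen only to show how deamination on either strand of a duplex reads out on the top strand; the sketch does not model any particular editor, its editing window, or its efficiency.

```python
from typing import Dict, Tuple

CBE: Dict[str, str] = {"C": "T"}  # cytosine base editor: C•G -> T•A
ABE: Dict[str, str] = {"A": "G"}  # adenine base editor: A•T -> G•C
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def apply_base_editor(top_strand: str, editor: Dict[str, str],
                      window: Tuple[int, int], edit_top: bool = True) -> str:
    """Return the top-strand sequence after editing positions in `window`.

    window   : (start, end) half-open index range of editable positions
               (an assumption of this sketch)
    edit_top : True if the deaminated base lies on the top strand, False if
               it lies on the complementary (bottom) strand
    """
    start, end = window
    edited = list(top_strand)
    for i in range(start, end):
        base = top_strand[i]
        if edit_top:
            edited[i] = editor.get(base, base)
        else:
            # Edit the paired base, then report the outcome on the top strand.
            paired = COMPLEMENT[base]
            edited[i] = COMPLEMENT[editor.get(paired, paired)]
    return "".join(edited)


seq = "GACCT"
c_to_t = apply_base_editor(seq, CBE, (2, 4))                  # C to T
a_to_g = apply_base_editor(seq, ABE, (0, 2))                  # A to G
g_to_a = apply_base_editor(seq, CBE, (0, 1), edit_top=False)  # G to A
t_to_c = apply_base_editor(seq, ABE, (4, 5), edit_top=False)  # T to C
print(c_to_t, a_to_g, g_to_a, t_to_c)
```

The two `edit_top=False` calls show why two editor classes suffice for four transitions: a C to T change on one strand is read as G to A on the other, and an A to G change is read as T to C.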

[0141] In one example embodiment, the base editing system may be an RNA base editing system. As with DNA base editors, a nucleotide deaminase capable of converting nucleotide bases may be fused to a Cas protein. However, in these embodiments, the Cas protein will need to be capable of binding RNA. Example RNA binding Cas proteins include, but are not limited to, RNA-binding Cas9s such as Francisella novicida Cas9 (“FnCas9”), and Class 2 Type VI Cas systems. The nucleotide deaminase may be a cytidine deaminase or an adenosine deaminase, or an adenosine deaminase engineered to have cytidine deaminase activity. In certain example embodiments, the RNA base editor may be used to delete or introduce a post-translation modification site in the expressed mRNA. In contrast to DNA base editors, whose edits are permanent in the modified cell, RNA base editors can provide edits where finer, temporal control may be needed, for example in modulating a particular immune response. Example Type VI RNA base editing systems are described in Cox et al. 2017. Science 358:1019-1027, International Patent Publication Nos. WO 2019/005884, WO 2019/005886, and WO 2019/071048, and International Patent Application Nos. PCT/US20018/05179 and PCT/US2018/067207, which are incorporated herein by reference. An example FnCas9 system that may be adapted for RNA base editing purposes is described in International Patent Publication No. WO 2016/106236, which is incorporated herein by reference.

[0142] An example method for delivery of base-editing systems, including use of a split-intein approach to divide CBE and ABE into reconstitutable halves, is described in Levy et al. Nature Biomedical Engineering doi.org/10.1038/s41441-019-0505-5 (2019), which is incorporated herein by reference.

Prime Editors

[0143] In one example embodiment, the gene editing system configured to modify the target genes is a prime editing system. See, e.g., Anzalone et al. 2019. Nature. 576:149-157; and International patent application publication No. WO2022150790A2. Prime editing advantageously provides lower off-target editing than a Cas9 nuclease system. In example embodiments, the target gene is edited to introduce a stop codon, mutate an essential residue (e.g., an active site residue in a target enzyme, a residue essential for protein-protein binding, or a residue required for modification), or introduce a frameshift that inactivates the gene. In example embodiments, a regulatory sequence, such as an enhancer sequence, is edited to reduce or eliminate binding of a transcription factor.

[0144] In one example embodiment, a genomic sequence in a target gene or a sequence controlling expression of the target gene is replaced or deleted using a prime editing system. Like base editing systems, prime editing systems can be capable of targeted modification of a polynucleotide without generating double-stranded breaks. Further, prime editing systems are capable of all 12 possible combination swaps. Prime editing may operate via a “search-and-replace” methodology and can mediate targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof. Generally, a prime editing system, as exemplified by PE1, PE2, and PE3 (Id.), can include a reverse transcriptase fused or otherwise coupled or associated with an RNA-programmable nickase and a prime-editing extended guide RNA (pegRNA) to facilitate direct copying of genetic information from the extension on the pegRNA into the target polynucleotide. Embodiments that can be used with the present invention include these and variants thereof. Prime editing can have the advantage of lower off-target activity.

[0145] In some embodiments, the prime editing guide molecule can specify both the target polynucleotide information (e.g., sequence) and contain a new polynucleotide cargo that replaces target polynucleotides. To initiate transfer from the guide molecule to the target polynucleotide, the PE system can nick the target polynucleotide at a target site to expose a 3'-hydroxyl group, which can prime reverse transcription of an edit-encoding extension region of the guide molecule (e.g., a prime editing guide molecule or peg guide molecule) directly into the target site in the target polynucleotide. See, e.g., Anzalone et al. 2019. Nature. 576:149-157, particularly at Figures 1b, 1c, related discussion, and Supplementary discussion.

[0146] In some embodiments, a prime editing system can be composed of a Cas polypeptide having nickase activity, a reverse transcriptase, and a guide molecule. The Cas polypeptide can lack nuclease activity. The guide molecule can include a target binding sequence as well as a primer binding sequence and a template containing the edited polynucleotide sequence. The guide molecule, Cas polypeptide, and/or reverse transcriptase can be coupled together or otherwise associate with each other to form an effector complex and edit a target sequence. In some embodiments, the Cas polypeptide is a Class 2, Type V Cas polypeptide. In some embodiments, the Cas polypeptide is a Cas9 polypeptide (e.g., is a Cas9 nickase). In some embodiments, the Cas polypeptide is fused to the reverse transcriptase. In some embodiments, the Cas polypeptide is linked to the reverse transcriptase.
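
By way of non-limiting illustration, the relationship between the nicked target strand, the primer binding sequence, and the template containing the edited sequence can be sketched in Python as follows. The function `design_peg_extension`, its parameters, the default `pbs_len` of 13, and the toy sequences are assumptions of this sketch; sequences are written in the DNA alphabet for simplicity, whereas an actual guide molecule would be RNA (U in place of T), and the target binding sequence and scaffold are omitted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_peg_extension(nicked_strand: str, nick_index: int,
                         edited_downstream: str, pbs_len: int = 13) -> dict:
    """Derive the 3' extension of a peg guide molecule (hypothetical helper).

    nicked_strand     : 5'->3' sequence of the strand that is nicked
    nick_index        : the nick lies between nick_index - 1 and nick_index
    edited_downstream : desired 5'->3' sequence to be written onto the nicked
                        strand, starting at nick_index
    pbs_len           : length of the primer binding sequence (assumed default)
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the PBS")
    # The 3' end exposed by the nick anneals to the primer binding sequence.
    primer = nicked_strand[nick_index - pbs_len:nick_index]
    pbs = reverse_complement(primer)
    # The template is copied by the reverse transcriptase, so it is the
    # reverse complement of the desired edited strand.
    rt_template = reverse_complement(edited_downstream)
    # Written 5'->3', the extension is the template followed by the PBS.
    return {"pbs": pbs, "rt_template": rt_template,
            "extension": rt_template + pbs}


# Toy usage: change the first base after the nick from A to T.
strand = "AAACCCGGGTTTACGTACGTAC"
design = design_peg_extension(strand, nick_index=12,
                              edited_downstream="TCGTACG", pbs_len=6)
print(design["extension"])
```

Because the extension is the reverse complement of the primer region joined to the edited sequence, reverse transcription primed from the exposed 3'-hydroxyl group writes the edited sequence directly onto the nicked strand.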

[0147] In some embodiments, the prime editing system can be a PE1 system or variant thereof, a PE2 system or variant thereof, or a PE3 (e.g., PE3, PE3b) system. See, e.g., Anzalone et al. 2019. Nature. 576:149-157, particularly at pgs. 2-3, Figs. 2a, 3a-3f, 4a-4b, Extended data Figs. 3a-3b, 4,

[0148] The peg guide molecule can be about 10 to about 200 or more nucleotides in length, such as 10 to/or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, or 200 or more nucleotides in length. Optimization of the peg guide molecule can be accomplished as described in Anzalone et al. 2019. Nature. 576:149-157, particularly at pg. 3, Fig. 2a-2b, and Extended Data Figs. 5a-c.

[0149] Prime editing can also include a system that uses a prime editor (PE) protein and two prime editing guide RNAs (pegRNAs), such that, the two pegRNAs template the synthesis of complementary DNA flaps on opposing strands of genomic DNA, which replace the endogenous DNA sequence between the PE-induced nick sites. See, e.g., Anzalone AV, Gao XD, Podracky CJ, et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat Biotechnol. 2022;40(5):731-740. Thus, use of two pegRNAs allows for larger insertions because of the two overlapping 3’ flaps created by the two nicked sites. The system can be combined with a site-specific serine recombinase to allow targeted integration of gene-sized DNA plasmids (>5,000 bp) and targeted sequence inversions of 40 kb in human cells. Id. In one example embodiment, the system can be used to insert or replace a sequence into one or more target genes. In example embodiments, the insertion or replacement results in an inactive target gene or less active form of the target gene. In one example embodiment, the system is used to replace all or a portion of the entire target gene. In one example embodiment, the system is used to replace all or a portion of an enhancer controlling the target gene expression.

CRISPR-directed integrase

[0150] In example embodiments, the prime editing system inserts a serine integrase attachment site for large, multiplexed gene insertion without reliance on DNA repair pathways. See, e.g., Yarnall MTN, Ioannidi EI, Schmitt-Ulms C, et al. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases [published online ahead of print, 2022 Nov 24]. Nat Biotechnol. 2022. This system is a variation of prime editing that includes all of the components of prime editing, but with an integrase. Serine integrases typically insert sequences containing an attP attachment site into a target containing the related attB attachment site. By using programmable genome editing to place integrase landing sites at desired locations in the genome, this system directly guides the activity of the associated integrase to the specific genomic site. In one embodiment, pegRNAs including attB sequences are used to insert the sites at desired locations in the genome. In one embodiment, the system uses a Cas enzyme-reverse transcriptase-integrase fusion protein to directly recruit the integrase to the target site.

[0151] “Uni-directional recombinases” or “integrases” refer to recombinase enzymes whose recognition sites are destroyed after the recombination has taken place. The term “integrase” refers to a type of recombinase. In other words, the sequence recognized by the recombinase is changed into one that is not recognized by the recombinase upon recombination. As a result, once a sequence is subjected to recombination by the uni-directional recombinase, the continued presence of the recombinase cannot reverse the previous recombination event.

[0152] Typically, two different sites are involved (in regards to recombination termed “complementary sites”), one present in the target nucleic acid (e.g., a chromosome or episome of a eukaryote) and another on the nucleic acid that is to be integrated at the target recombination site. The terms “attB” and “attP,” which refer to attachment (or recombination) sites originally from a bacterial target (attachment site of bacteria) and a phage donor (attachment site of phage), respectively, are used herein although recombination sites for particular enzymes may have different names. The two attachment sites can share as little sequence identity as a few base pairs. The recombination sites typically include left and right arms separated by a core or spacer region. Thus, an attB recombination site consists of BOB', where B and B' are the left and right arms, respectively, and O is the core region. Similarly, attP is POP', where P and P' are the arms and O is again the core region. Upon recombination between the attB and attP sites, and concomitant integration of a nucleic acid at the target, the recombination sites that flank the integrated DNA are referred to as “attL” and “attR.” The attL and attR sites, using the terminology above, thus consist of BOP' and POB', respectively. In some representations herein, the “O” is omitted and attB and attP, for example, are designated as BB' and PP', respectively.
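
By way of non-limiting illustration, the arm exchange described above (BOB' and POP' giving BOP' and POB') can be sketched in Python as follows. The class `AttSite` and the function `integrate` are hypothetical names introduced for this sketch, and the requirement that both sites carry the same core is a simplifying assumption drawn from the use of a single “O” in the notation above.

```python
from typing import NamedTuple, Tuple


class AttSite(NamedTuple):
    """An attachment site modeled as left arm, core, and right arm."""
    left: str
    core: str
    right: str

    def label(self) -> str:
        return self.left + self.core + self.right


def integrate(att_b: AttSite, att_p: AttSite) -> Tuple[AttSite, AttSite]:
    """Recombine attB with attP and return the hybrid sites (attL, attR)."""
    if att_b.core != att_p.core:
        # Simplifying assumption of this sketch: the cores must match.
        raise ValueError("attB and attP cores differ")
    att_l = AttSite(att_b.left, att_b.core, att_p.right)  # BOP'
    att_r = AttSite(att_p.left, att_p.core, att_b.right)  # POB'
    return att_l, att_r


att_b = AttSite("B", "O", "B'")
att_p = AttSite("P", "O", "P'")
att_l, att_r = integrate(att_b, att_p)
print(att_l.label(), att_r.label())  # prints: BOP' POB'
```

Because neither hybrid site has the arm pairing of attB or attP, feeding `att_l` and `att_r` back into the same integrase would not regenerate the original sites, which is the uni-directional behavior described in the preceding paragraph.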

[0153] In example embodiments, the recombinase of the present invention is a serine integrase. In example embodiments, serine integrases specifically recombine when recognizing the two attachment sites specific for the integrase. In example embodiments, the heterologous sites are referred to as attP and attB, however, these terms refer to the specific sequences recognized by the specific integrase and do not refer to a single consensus sequence. Serine integrases mediate site-specific recombination between short recognition sites located in phage genomes and bacterial chromosomes, respectively, the attachment site of phage (attP) and attachment site of bacteria (attB) (i.e., the target sites of the integrase), to form the hybrid attachment sites attL and attR. Unlike Cre and Flp recombinases that catalyze reversible site-specific recombination reactions, serine integrases are unidirectional and catalyze only attP and attB recombination without RDF or Xis accessory proteins. Thus, in the absence of any accessory factors integrase is unidirectional. In addition, DNA substrates identified by serine integrases (attP and attB) are relatively short (30-50 bp) and have a minimal length of approximately 34-40 base pairs (bp) (Groth AC et al., Proc. Natl. Acad. Sci. USA 97, 5995-6000 (2000)). The compatibility of distinct DNA topological structures is also quite different from recognition of DNA by Hin recombinase or Tn3 resolvase. Serine integrases recognize DNA substrates specifically, not at random, but can facilitate recombination at sequences with partial identity with wild-type recombination sites, termed pseudo attachment sites (either pseudo attP or pseudo attB). A “pseudo-recombination site” is a DNA sequence recognized by a recombinase enzyme such that the recognition site differs in one or more base pairs from the wild-type recombinase recognition sequence and/or is present as an endogenous sequence in a genome that differs from the genome where the wild-type recognition sequence for the recombinase resides. “Pseudo attP site” or “pseudo attB site” refer to pseudo sites that are similar to wild-type phage or bacterial attachment site sequences, respectively, for phage integrase enzymes. “Pseudo att site” is a more general term that can refer to either a pseudo attP site or a pseudo attB site. Specific attB and attP sequences for use in the present invention include all wildtype sequences as well as pseudo attB and attP sequences.
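
By way of non-limiting illustration, candidate pseudo attachment sites, i.e., sequences that differ from a wild-type site in one or more base pairs, can be enumerated with a simple mismatch scan, sketched in Python as follows. The function `find_pseudo_sites`, the `max_mismatches` threshold, and the toy sequences are hypothetical; partial sequence identity alone does not establish that an integrase recombines at a site, which would be determined experimentally.

```python
from typing import List, Tuple


def find_pseudo_sites(genome: str, wild_type_site: str,
                      max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """List windows that differ from the wild-type site at 1..max_mismatches
    positions, as (start index, window sequence, mismatch count)."""
    n = len(wild_type_site)
    hits = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, wild_type_site))
        # Zero mismatches is the wild-type site itself, not a pseudo site.
        if 1 <= mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Toy usage with an 8-nt stand-in for a wild-type site.
wild_type = "ACGTACGT"
genome = "TTTT" + "ACGTACGT" + "CCCC" + "ACGAACGT" + "GG"
hits = find_pseudo_sites(genome, wild_type, max_mismatches=1)
print(hits)
```

This scan reads a single strand only and compares equal-length windows; a fuller treatment would also examine the reverse complement and weight positions in the core region differently from the arms.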

[0154] Recombination sites used in the present methods include those recognized by unidirectional, site-directed recombinases (e.g., integrases). Non-limiting examples of serine integrases and recombination sites applicable to the present invention include ΦC31 integrase, Bxb1, ΦBT1 integrase, A118, TP901-1, and R4 and the corresponding recombination sites for each (see, e.g., Groth, A. C. and Calos, M. P. (2004) J. Mol. Biol. 335, 667-678; Lei, et al., FEBS Lett. 2018 Apr;592(8):1389-1399; Singh, et al., Attachment Site Selection and Identity in Bxb1 Serine Integrase-Mediated Site-Specific Recombination, PLoS Genet. 2013 May;9(5):e1003490; and Gupta, et al., Nucleic Acids Res. 2007 May; 35(10):3407-3419). Additional serine recombinases and recombination sites may be any of those disclosed in US 20180346934A1 and US 2010/0190178. In certain embodiments, a functional domain of the serine integrase is used.

[0155] In one example embodiment, the system can be used to insert or replace a sequence into one or more target genes. In example embodiments, the insertion or replacement results in an inactive target gene or less active form of the target gene. In one example embodiment, the system is used to replace all or a portion of the entire target gene. In one example embodiment, the system is used to replace all or a portion of an enhancer controlling the target gene expression.

CRISPR Associated Transposase (CAST) Systems

[0156] In one example embodiment, the gene editing system configured to modify the one or more target genes is a CRISPR associated transposase system (CAST). In one example embodiment, the CAST system can be used to insert or replace a sequence into one or more target genes. In example embodiments, the insertion or replacement results in an inactive target gene or less active form of the target gene. In one example embodiment, a CAST system is used to replace all or a portion of an enhancer controlling the target gene expression. A CAST system can include a Cas protein that is catalytically inactive, or engineered to be catalytically active, and further comprises a transposase (or subunits thereof) that catalyzes RNA-guided DNA transposition. Such systems are able to insert DNA sequences at a target site in a DNA molecule without relying on host cell repair machinery. CAST systems can be Class 1 or Class 2 CAST systems. An example Class 1 system is described in Klompe et al. Nature, doi: 10.1038/s41586-019-1323, which is incorporated herein by reference. An example Class 2 system is described in Strecker et al. Science. 10.1126/science.aax9181 (2019), and PCT/US2019/066835, which are incorporated herein by reference.

OMEGA systems

[0157] In one example embodiment, the gene editing system configured to modify the one or more target genes is a transposon-encoded RNA-guided nuclease system, referred to herein as OMEGA (obligate mobile element-guided activity). See, e.g., Altae-Tran H, Kannan S, Demircioglu FE, et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science. 2021;374(6563):57-65. OMEGA systems include, but are not limited to IscB, IsrB, TnpB systems.

[0158] In some embodiments, the nucleic acid-guided nucleases herein may be an IscB protein (see, e.g., International patent application publication No. WO2022087494A1; and Altae-Tran H, et al. 2021). An IscB protein may comprise an X domain and a Y domain as described herein. In some examples, the IscB proteins may form a complex with one or more guide molecules. In some cases, the IscB proteins may form a complex with one or more hRNA molecules which serve as a scaffold molecule and comprise guide sequences. In some examples, the IscB proteins are CRISPR-associated proteins, e.g., the loci of the nucleases are associated with a CRISPR array. In some examples, the IscB proteins are not CRISPR-associated. In some examples, the IscB protein may be a homolog or ortholog of IscB proteins described in Kapitonov VV et al., ISC, a Novel Group of Bacterial and Archaeal DNA Transposons That Encode Cas9 Homologs, J Bacteriol. 2015 Dec 28;198(5):797-807. doi: 10.1128/JB.00783-15, which is incorporated by reference herein in its entirety.

[0159] In some embodiments, the nucleic acid-guided nucleases herein may be an IsrB (Insertion sequence RuvC-like OrfB) protein (see, e.g., International patent application publication No. WO2022087494A1; and Altae-Tran H, et al. 2021). IsrB refers to a group of shorter, ~350 aa IscB homologs that are also encoded in IS200/605 superfamily transposons. These proteins contain a PLMP domain and split RuvC but lack the HNH domain.

[0160] In some embodiments, the nucleic acid-guided nucleases herein may be a TnpB protein (see, e.g., International patent application publication No. WO2022159892A1; and Altae-Tran H, et al. 2021). TnpB is a putative endonuclease distantly related to IscB and thought to be the ancestor of Cas12, the type V CRISPR effector. The TnpB system comprises a TnpB polypeptide and a nucleic acid component capable of forming a complex with the TnpB polypeptide and directing the complex to a target polynucleotide. The TnpB systems and TnpB/nucleic acid component complexes may also be referred to herein as OMEGA (Obligate Mobile Element Guided Activity) systems or complexes, or Ω systems or complexes for short. TnpB systems are a distinct type of Ω system, which further include IscB, IsrB, and IshB systems. The nucleic acid component of Ω systems is structurally distinct from that of other RNA-guided nucleases, such as CRISPR-Cas systems, and may also be referred to as an ωRNA. In certain example embodiments, the TnpB systems are RNA-predominant, that is, the nucleic acid component makes a larger contribution to the overall size of the TnpB complex relative to other RNA-guided nuclease systems such as CRISPR-Cas. Also, given the more minimal structural features of TnpB relative to other known programmable nucleases such as CRISPR-Cas, the polynucleotide binding pocket is open and more accessible, which can facilitate greater access to and ability to manipulate, modify, edit, remove, or delete nucleotides at a target region on the bound polynucleotide.

Epigenetic Editing

[0161] In one example embodiment, the one or more agents is an epigenetic modification polypeptide comprising a DNA binding domain linked to or otherwise capable of associating with an epigenetic modification domain such that binding of the DNA binding domain at a target sequence on genomic DNA (e.g., chromatin) results in one or more epigenetic modifications by the epigenetic modification domain that increase or decrease expression of the one or more polypeptides disclosed herein. As used herein, “linked to or otherwise capable of associating with” refers to a fusion protein or a recruitment domain or an adaptor protein, such as an aptamer (e.g., MS2) or an epitope tag. The recruitment domain or an adaptor protein can be linked to an epigenetic modification domain or the DNA binding domain (e.g., an adaptor for an aptamer). The epigenetic modification domain can be linked to an antibody specific for an epitope tag fused to the DNA binding domain. An aptamer can be linked to a guide sequence.

[0162] In example embodiments, the DNA binding domain is a programmable DNA binding protein linked to or otherwise capable of associating with an epigenetic modification domain. Programmable DNA binding proteins for modifying the epigenome include, but are not limited to, CRISPR systems, transcription activator-like effectors (TALEs), Zn finger proteins and meganucleases (see, e.g., Thakore PI, Black JB, Hilton IB, Gersbach CA. Editing the epigenome: technologies for programmable transcription and epigenetic modulation. Nat Methods. 2016;13(2):127-137; and described further herein). In example embodiments, the DNA binding domain is a nuclease-deficient RNA-guided DNA endonuclease enzyme or a nuclease-deficient endonuclease enzyme. In example embodiments, a CRISPR system having an inactivated nuclease activity (e.g., dCas) is used as the DNA binding domain.

[0163] In example embodiments, the epigenetic modification domain is a functional domain and includes, but is not limited to a histone methyltransferase (HMT) domain, histone demethylase domain, histone acetyltransferase (HAT) domain, histone deacetylation (HDAC) domain, DNA methyltransferase domain, DNA demethylation domain, histone phosphorylation domain (e.g., serine and threonine, or tyrosine), histone ubiquitylation domain, histone sumoylation domain, histone ADP ribosylation domain, histone proline isomerization domain, histone biotinylation domain, histone citrullination domain (see, e.g., Epigenetics, Second Edition, 2015, Edited by C. David Allis; Marie-Laure Caparros; Thomas Jenuwein; Danny Reinberg; Associate Editor Monika Lachlan; Dawson MA, Kouzarides T. Cancer epigenetics: from mechanism to therapy. Cell. 2012; 150(1): 12-27; Syding LA, Nickl P, Kasparek P, Sedlacek R. CRISPR/Cas9 Epigenome Editing Potential for Rare Imprinting Diseases: A Review. Cells. 2020;9(4):993; and Zhang Y. Transcriptional regulation by histone ubiquitination and deubiquitination. Genes Dev. 2003;17(22):2733-2740). 
Example epigenetic modification domains can be obtained from, but are not limited to, chromatin modifying enzymes, such as DNA methyltransferases (e.g., DNMT1, DNMT3a and DNMT3b), TET1, TET2, thymine-DNA glycosylase (TDG), GCN5-related N-acetyltransferases family (GNAT), MYST family proteins (e.g., MOZ and MORF), and CBP/p300 family proteins (e.g., CBP, p300), Class I HDACs (e.g., HDAC 1-3 and HDAC8), Class II HDACs (e.g., HDAC 4-7 and HDAC 9-10), Class III HDACs (e.g., sirtuins), HDAC11, SET domain containing methyltransferases (e.g., SET7/9 (KMT7, NCBI Entrez Gene: 80854), KMT5A (SET8), MMSET, EZH2, and MLL family members), DOT1L, LSD1, Jumonji demethylases (e.g., KDM5A (JARID1A), KDM5C (JARID1C), and KDM6A (UTX)), kinases (e.g., Haspin, VRK1, PKCα, PKCβ, PIM1, IKKα, Rsk2, PKB/Akt, Aurora B, MSK1/2, JNK1, MLTKα, PRK1, Chk1, Dlk/ZIP, PKCδ, MST1, AMPK, JAK2, Abl, BMK1, CaMK, S6K1, SIK1), Ubp8, ubiquitin C-terminal hydrolases (UCH), the ubiquitin-specific processing proteases (UBP), and poly(ADP-ribose) polymerase 1 (PARP-1). See, also, US Patent US11001829B2 for additional domains.

[0164] In example embodiments, histone acetylation is targeted to a target sequence using a CRISPR system (see, e.g., Hilton IB, et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat Biotechnol. 2015). In example embodiments, histone deacetylation is targeted to a target sequence (see, e.g., Cong et al., 2012; and Konermann S, et al. Optical control of mammalian endogenous transcription and epigenetic states. Nature. 2013;500:472-476). In example embodiments, histone methylation is targeted to a target sequence (see, e.g., Snowden AW, Gregory PD, Case CC, Pabo CO. Gene-specific targeting of H3K9 methylation is sufficient for initiating repression in vivo. Curr Biol. 2002;12:2159-2166; and Cano-Rodriguez D, Gjaltema RA, Jilderda LJ, et al. Writing of H3K4Me3 overcomes epigenetic silencing in a sustained but context-dependent manner. Nat Commun. 2016;7:12284). In example embodiments, histone demethylation is targeted to a target sequence (see, e.g., Kearns NA, Pham H, Tabak B, et al. Functional annotation of native enhancers with a Cas9-histone demethylase fusion. Nat Methods. 2015;12(5):401-403). In example embodiments, histone phosphorylation is targeted to a target sequence (see, e.g., Li J, Mahata B, Escobar M, et al. Programmable human histone phosphorylation and gene activation using a CRISPR/Cas9-based chromatin kinase. Nat Commun. 2021;12(1):896). In example embodiments, DNA methylation is targeted to a target sequence (see, e.g., Rivenbark AG, et al. Epigenetic reprogramming of cancer cells via targeted DNA methylation. Epigenetics. 2012;7:350-360; Siddique AN, et al. Targeted methylation and gene silencing of VEGF-A in human cells by using a designed Dnmt3a-Dnmt3L single-chain fusion protein with increased DNA methylation activity. J Mol Biol. 2013;425:479-491; Bernstein DL, Le Lay JE, Ruano EG, Kaestner KH. 
TALE-mediated epigenetic suppression of CDKN2A increases replication in human fibroblasts. J Clin Invest. 2015;125:1998-2006; Liu XS, Wu H, Ji X, et al. Editing DNA Methylation in the Mammalian Genome. Cell. 2016;167(1):233-247.e17; Stepper P, Kungulovski G, Jurkowska RZ, et al. Efficient targeted DNA methylation with chimeric dCas9-Dnmt3a-Dnmt3L methyltransferase. Nucleic Acids Res. 2017;45(4):1703-1713; and Pflueger C., Tan D., Swain T., Nguyen T., Pflueger J., Nefzger C., Polo J.M., Ford E., Lister R. A modular dCas9-SunTag DNMT3A epigenome editing system overcomes pervasive off-target activity of direct fusion dCas9-DNMT3A constructs. Genome Res. 2018;28:1193-1206). In example embodiments, DNA demethylation is targeted to a target sequence using a CRISPR system (see, e.g., TET1, see Xu et al, Cell Discov. 2016 May 3;2:16009; Choudhury et al, Oncotarget. 2016 Jul 19;7(29):46545-46556; and Kang JG, Park JS, Ko JH, Kim YS. Regulation of gene expression by altered promoter methylation using a CRISPR/Cas9-mediated epigenetic editing system. Sci Rep. 2019;9(1):11960). In example embodiments, DNA demethylation is targeted to a target sequence (see, e.g., TDG, see, Gregory DJ, Zhang Y, Kobzik L, Fedulov AV. Specific transcriptional enhancement of inducible nitric oxide synthase by targeted promoter demethylation. Epigenetics. 2013;8:1205-1212).

[0165] Example epigenetic modification domains can be obtained from, but are not limited to transcription activators, such as, VP64 (see, e.g., Ji Q, et al. Engineered zinc-finger transcription factors activate OCT4 (POU5F1), SOX2, KLF4, c-MYC (MYC) and miR302/367. Nucleic Acids Res. 2014;42:6158-6167; Perez-Pinera P, et al. Synergistic and tunable human gene activation by combinations of synthetic transcription factors. Nat Methods. 2013;10:239-242; Farzadfard F, Perli SD, Lu TK. Tunable and multifunctional eukaryotic transcription factors based on CRISPR/Cas. ACS Synth Biol. 2013;2:604-613; Black JB, Adler AF, Wang HG, et al. Targeted Epigenetic Remodeling of Endogenous Loci by CRISPR/Cas9-Based Transcriptional Activators Directly Converts Fibroblasts to Neuronal Cells. Cell Stem Cell. 2016;19(3):406-414; and Maeder ML, Linder SJ, Cascio VM, Fu Y, Ho QH, Joung JK. CRISPR RNA-guided activation of endogenous human genes. Nat Methods. 2013;10(10):977-979), p65 (see, e.g., Liu PQ, et al. Regulation of an endogenous locus using a panel of designed zinc finger proteins targeted to accessible chromatin regions. Activation of vascular endothelial growth factor A. J Biol Chem. 2001;276: 11323-11334; and Konermann S, et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 2015;517:583-588), HSF1, and RTA (see, e.g., Chavez A, et al. Highly efficient Cas9-mediated transcriptional programming. Nat Methods. 2015;12:326-328). Example epigenetic modification domains can be obtained from, but are not limited to transcription repressors, such as, KRAB (see, e.g., Beerli RR, Segal DJ, Dreier B, Barbas CF., 3rd Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proc Natl Acad Sci U S A. 1998;95: 14628-14633; Cong L, Zhou R, Kuo YC, Cunniff M, Zhang F. 
Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat Commun. 2012;3:968; Gilbert LA, et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013;154:442-451; and Yeo NC, Chavez A, Lance-Byrne A, et al. An enhanced CRISPR repressor for targeted mammalian gene regulation. Nat Methods. 2018;15(8):611-616).

[0166] In example embodiments, the epigenetic modification domain linked to a DNA binding domain recruits an epigenetic modification protein to a target sequence. In example embodiments, a transcriptional activator recruits an epigenetic modification protein to a target sequence. For example, recruitment by VP64 can lead to DNA demethylation and increased H3K27ac and H3K4me. In example embodiments, a transcriptional repressor protein recruits an epigenetic modification protein to a target sequence. For example, recruitment by KRAB can lead to increased H3K9me3 (see, e.g., Thakore PI, D'Ippolito AM, Song L, et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat Methods. 2015;12(12):1143-1149). In an example embodiment, methyl-binding proteins linked to a DNA binding domain, such as MBD1, MBD2, MBD3, and MeCP2, recruit an epigenetic modification protein to a target sequence. In an example embodiment, Mi2/NuRD, Sin3A, or Co-REST recruit HDACs to a target sequence.

[0167] In example embodiments, the epigenetic modification domain can be a eukaryotic or prokaryotic (e.g., bacteria or Archaea) protein. In example embodiments, the eukaryotic protein can be a mammalian, insect, plant, or yeast protein and is not limited to human proteins (e.g., a yeast, insect, or plant chromatin modifying protein, such as yeast HATs, HDACs, methyltransferases, etc.).

[0168] In one aspect, the invention provides a fusion protein (epigenetic modification polypeptide) comprising, from N-terminus to C-terminus, an epigenetic modification domain, an XTEN linker, and a nuclease-deficient RNA-guided DNA endonuclease enzyme or a nuclease-deficient endonuclease enzyme.

[0169] In aspects, the epigenetic modification polypeptide further comprises a transcriptional activator. In aspects, the transcriptional activator is VP64, p65, RTA, or a combination of two or more thereof. In another aspect, the epigenetic modification polypeptide further comprises one or more nuclear localization sequences. In embodiments, the epigenetic modification polypeptide comprises the nuclease-deficient RNA-guided DNA endonuclease enzyme. In embodiments, the fusion protein comprises the nuclease-deficient DNA endonuclease enzyme.

[0170] In some embodiments, the functional domain associated with the adaptor protein or the CRISPR enzyme is a transcriptional activation domain comprising VP64, p65, MyoD1, HSF1, RTA or SET7/9. Other references herein to activation (or activator) domains in respect of those associated with the adaptor protein(s) include any known transcriptional activation domain and specifically VP64, p65, MyoD1, HSF1, RTA or SET7/9 (see, e.g., US Patent US11001829B2).

[0171] In certain embodiments, the present invention provides a fusion protein comprising, from N-terminus to C-terminus, an RNA-binding sequence, an XTEN linker, and a transcriptional activator. In aspects, the transcriptional activator is VP64, p65, RTA, or a combination of two or more thereof. In aspects, the fusion protein further comprises a demethylation domain, a nuclease-deficient RNA-guided DNA endonuclease enzyme or a nuclease-deficient endonuclease enzyme, a nuclear localization sequence, or a combination of two or more thereof. In embodiments, the fusion protein comprises the nuclease-deficient RNA-guided DNA endonuclease enzyme. In embodiments, the fusion protein comprises the nuclease-deficient DNA endonuclease enzyme.

[0172] In certain embodiments, the present invention provides a method of activating a silenced target nucleic acid sequence in a cell, the method comprising: (i) delivering a first polynucleotide encoding an epigenetic modification polypeptide described herein, including embodiments thereof, to a cell containing the silenced target nucleic acid; and (ii) delivering to the cell a second polynucleotide comprising: (a) a sgRNA or (b) a cr:tracrRNA; thereby reactivating the silenced target nucleic acid sequence in the cell. In aspects, the sgRNA comprises at least one MS2 stem loop. In aspects, the second polynucleotide comprises a transcriptional activator. In aspects, the second polynucleotide comprises two or more sgRNA.

Open reading frame (ORF) Libraries

[0173] In example embodiments, the perturbation can be introduced using an open reading frame (ORF) or cDNA library (as used herein cDNA or ORF may be used interchangeably). A cDNA may be synthesized and cloned into a vector. In example embodiments, a first type of ORF library is an ORF library representing a plurality of different ORFs in the genome. In example embodiments, a second type of ORF library is an ORF library representing variants of a single ORF or variants for a very small number of ORFs, such as 5-20 ORFs (also known as a “deep mutational scanning” library) (see, e.g., Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801-807; and Wei H, Li X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front Genet. 2023;14:1087267). A plurality of cDNAs may be cloned into a library of vectors, such that each gene of interest or variant is represented in the library. In example embodiments, the vectors encoding the ORFs include a barcode specific to each ORF. Construction of barcoded ORF libraries has been described and is applicable to the present invention (see, e.g., US Patent Application Publication US20200283843A1; Ursu O, Neal JT, Shea E, et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nat Biotechnol. 2022;40(6):896-905; and Massively parallel phenotyping of variant impact in cancer with Perturb-seq reveals a shift in the spectrum of cell states induced by somatic mutations. Oana Ursu, James T. Neal, Emily Shea, Pratiksha I. Thakore, Livnat Jerby-Arnon, Lan Nguyen, Danielle Dionne, Celeste Diaz, Julia Bauman, Mariam Mounir Mosaad, Christian Fagre, Andrew O. Giacomelli, Seav Huong Ly, Orit Rozenblatt-Rosen, William C. Hahn, Andrew J. Aguirre, Alice H. Berger, Aviv Regev, Jesse S. Boehm. bioRxiv 2020.11.16.383307).

RNAi Libraries

[0174] In example embodiments, perturbations are introduced to the population of cells with an RNAi based library. For example, shRNA libraries have been disclosed, such that the shRNA sequence acts as the barcode (see, e.g., Cheung HW, Cowley GS, Weir BA, et al. Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proc Natl Acad Sci U S A. 2011;108(30):12372-12377). As used herein, the term “RNAi” refers to any type of interfering RNA, including but not limited to, siRNAi, shRNAi, endogenous microRNA and artificial microRNA. For instance, it includes sequences previously identified as siRNA, regardless of the mechanism of down-stream processing of the RNA (i.e., although siRNAs are believed to have a specific method of in vivo processing resulting in the cleavage of mRNA, such sequences can be incorporated into the vectors in the context of the flanking sequences described herein). The term “RNAi” can include both gene silencing RNAi molecules, and also RNAi effector molecules which activate the expression of a gene.

[0175] As used herein, a “siRNA” refers to a nucleic acid that forms a double stranded RNA, which double stranded RNA has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is present or expressed in the same cell as the target gene. The double stranded RNA siRNA can be formed by the complementary strands. In one embodiment, a siRNA refers to a nucleic acid that can form a double stranded siRNA. The sequence of the siRNA can correspond to the full-length target gene, or a subsequence thereof. Typically, the siRNA is at least about 15-50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is about 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length, preferably about 19-30 nucleotides, preferably about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length).

[0176] As used herein “shRNA” or “small hairpin RNA” (also called stem loop) is a type of siRNA. In one embodiment, these shRNAs are composed of a short, e.g., about 19 to about 25 nucleotide, antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand. Alternatively, the sense strand can precede the nucleotide loop structure and the antisense strand can follow.
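
The hairpin layout described above (antisense strand, loop, sense strand) can be illustrated with a minimal Python sketch. The function names, the example target sequence, and the 9-nucleotide loop are hypothetical choices made for illustration and are not taken from the disclosure; only the length ranges come from the paragraph above.

```python
# Illustrative sketch only: names, target, and loop sequence are assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U)."""
    return rna.translate(COMPLEMENT)[::-1]


def design_shrna(target_sense: str, loop: str = "UUCAAGAGA") -> str:
    """Assemble antisense strand + loop + sense strand, written 5' to 3'."""
    sense = target_sense.upper().replace("T", "U")
    if not 19 <= len(sense) <= 25:
        raise ValueError("stem should be about 19 to about 25 nucleotides")
    if not 5 <= len(loop) <= 9:
        raise ValueError("loop should be about 5 to about 9 nucleotides")
    antisense = reverse_complement(sense)
    return antisense + loop + sense


# Hypothetical 20-nucleotide target used purely as an example.
target = "GCAAGCUGACCCUGAAGUUC"
hairpin = design_shrna(target)
```

Swapping the order of the two strands in the return statement gives the alternative arrangement in which the sense strand precedes the loop.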

[0177] The terms “microRNA” or “miRNA” are used interchangeably herein and refer to endogenous RNAs, some of which are known to regulate the expression of protein-coding genes at the posttranscriptional level. Endogenous microRNAs are small RNAs naturally present in the genome that are capable of modulating the productive utilization of mRNA. The term artificial microRNA includes any type of RNA sequence, other than endogenous microRNA, which is capable of modulating the productive utilization of mRNA. MicroRNA sequences have been described in publications such as Lim, et al., Genes & Development, 17, p. 991-1008 (2003), Lim et al., Science 299, 1540 (2003), Lee and Ambros, Science, 294, 862 (2001), Lau et al., Science 294, 858-861 (2001), Lagos-Quintana et al., Current Biology, 12, 735-739 (2002), Lagos-Quintana et al., Science 294, 853-857 (2001), and Lagos-Quintana et al., RNA, 9, 175-179 (2003), which are incorporated herein by reference. Multiple microRNAs can also be incorporated into a precursor molecule. Furthermore, miRNA-like stem-loops can be expressed in cells as a vehicle to deliver artificial miRNAs and short interfering RNAs (siRNAs) for the purpose of modulating the expression of endogenous genes through the miRNA and/or RNAi pathways.

[0178] As used herein, “double stranded RNA” or “dsRNA” refers to RNA molecules that are comprised of two strands. Double-stranded molecules include those comprised of a single RNA molecule that doubles back on itself to form a two-stranded structure. For example, the stem loop structure of the progenitor molecules from which the single-stranded miRNA is derived, called the pre-miRNA (Bartel et al. 2004. Cell 116:281-297), comprises a dsRNA molecule.

Chemical/Biologic perturbations

[0179] In example embodiments, non-genetic perturbations (e.g., chemical/biologic perturbations) are provided to the population of cells to determine their effect on the function of an organelle (e.g., mitochondria). In example embodiments, cells in separate wells or reaction volumes are tagged with a sample nucleic acid barcode sequence identifying each well.

[0180] In one embodiment, the sample barcode can be a vector encoding the barcode sequence, preferably, introduced before perturbation. For example, a population of cells is separated into individual wells of a plate and a vector is introduced to each well, each well having a specific barcode encoded for by the vector. A different chemical or biologic perturbation can then be added to each well. The methods as described can then be followed to identify perturbations that modulate the function of an organelle.
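
As a minimal sketch of the well-to-barcode bookkeeping described above, the following Python assigns one distinct barcode to each well of a 96-well plate. The barcode length, the minimum Hamming distance of 2, and the greedy selection are assumptions made for illustration; the disclosure does not specify how sample barcodes are designed.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(n: int, length: int = 6, min_dist: int = 2) -> list:
    """Greedily pick n barcodes that differ pairwise in >= min_dist positions."""
    chosen = []
    for cand in map("".join, product("ACGT", repeat=length)):
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough barcodes; increase the barcode length")


# Hypothetical 96-well layout: rows A-H, columns 1-12.
wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
well_to_barcode = dict(zip(wells, pick_barcodes(len(wells))))
```

A practical design would likely add further constraints (for example, avoiding homopolymer runs), which this sketch deliberately omits.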

[0181] In another embodiment, each reaction volume is provided a binding agent that includes a barcode sequence. For example, a population of cells is separated into individual wells of a plate and each well is treated with a different chemical or biologic perturbation. After perturbation, the cells can be washed and the binding agent added to label each well with a specific barcode. The methods as described can then be followed to identify perturbations that modulate the function of an organelle. Specifically, permeabilization and treating with substrate/inhibitors would not disturb binding of the binding agent. After detection of the probe or reporter in the intact cells, the barcodes for different levels of probe or reporter can be identified. The agent can be a binding agent linked to an identifying oligonucleotide, such as an antibody linked to an oligonucleotide barcode. The agent can be an antibody as described for CITE-seq (Stoeckius et al., Simultaneous epitope and transcriptome measurement in single cells. Nat Methods. 2017 Sep;14(9):865-868). The binding agent (e.g., antibody) is specific for a surface marker on the cells to be analyzed. The labeling of the cells can use any method of cell hashing described (see, e.g., US Patent 11,332,736).

[0182] In example embodiments, the chemical perturbations can be small molecules or combinations of small molecules. In example embodiments, the small molecules are derived from a combinatorial library containing a large number of potential therapeutic compounds. A combinatorial chemical library may be a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis, by combining a number of chemical "building blocks" such as reagents. For example, a linear combinatorial chemical library, such as a polypeptide library, is formed by combining a set of chemical building blocks (amino acids) in every possible way for a given compound length (for example, the number of amino acids in a polypeptide compound). Millions of chemical compounds can be synthesized through such combinatorial mixing of chemical building blocks.
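
The scale of a linear combinatorial library can be illustrated with a short Python sketch that enumerates every ordered combination of building blocks for a given length. The function name and the use of one-letter codes for the 20 standard amino acids are assumptions made for illustration.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids (illustrative building blocks).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def linear_library(building_blocks: str, length: int):
    """Yield every ordered combination of building blocks of the given length."""
    for combo in product(building_blocks, repeat=length):
        yield "".join(combo)


# Enumerate all tripeptides explicitly: 20 ** 3 members.
tripeptides = list(linear_library(AMINO_ACIDS, 3))

# Library size grows as (number of blocks) ** length, so pentapeptides
# already number in the millions without needing to be enumerated.
pentapeptide_count = len(AMINO_ACIDS) ** 5
```

The counts follow directly from the exponent: 8,000 tripeptides and 3,200,000 pentapeptides for 20 building blocks.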

[0183] Appropriate agents can be contained in libraries, for example, synthetic or natural compounds in a combinatorial library. Numerous libraries are commercially available or can be readily produced; means for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides, such as antisense oligonucleotides and oligopeptides, also are known. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available or can be readily produced. Additionally, natural or synthetically produced libraries and compounds are readily modified through conventional chemical, physical and biochemical means, and may be used to produce combinatorial libraries. Such libraries are useful for the screening of a large number of different compounds.

[0184] Preparation of combinatorial libraries is well known to those of skill in the art. Libraries (such as combinatorial chemical libraries) useful in the disclosed methods include, but are not limited to, peptide libraries (see, e.g., U.S. Patent No. 5,010,175; Furka, Int. J. Pept. Prot. Res., 37:487-493, 1991; Houghton et al, Nature, 354:84-88, 1991; PCT Publication No. WO 91/19735), (see, e.g., Lam et al., Nature, 354:82-84, 1991; Houghten et al., Nature, 354:84-86, 1991), and combinatorial chemistry-derived molecular library made of D- and/or L-configuration amino acids, phosphopeptides (including, but not limited to, members of random or partially degenerate, directed phosphopeptide libraries; see, e.g., Songyang et al., Cell, 72:767-778, 1993), antibodies (including, but not limited to, polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or single chain antibodies, nanobodies, and Fab, F(ab')2 and Fab expression library fragments, and epitope-binding fragments thereof), small organic or inorganic molecules (such as, so-called natural products or members of chemical combinatorial libraries), molecular complexes (such as protein complexes), or nucleic acids, encoded peptides (e.g., PCT Publication WO 93/20242), random bio-oligomers (e.g., PCT Publication No. WO 92/00091), benzodiazepines (e.g., U.S. Patent No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs et al., Proc. Natl. Acad. Sci. USA, 90:6909-6913, 1993), vinylogous polypeptides (Hagihara et al., J. Am. Chem. Soc., 114:6568, 1992), nonpeptidal peptidomimetics with glucose scaffolding (Hirschmann et al., J. Am. Chem. Soc., 114:9217-9218, 1992), analogous organic syntheses of small compound libraries (Chen et al., J. Am. Chem. Soc., 116:2661, 1994), oligo carbamates (Cho et al., Science, 261:1303, 1993), and/or peptidyl phosphonates (Campbell et al., J. Org. Chem., 59:658, 1994), nucleic acid libraries (see Sambrook et al. Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y., 1989; Ausubel et al., Current Protocols in Molecular Biology, Green Publishing Associates and Wiley Interscience, N.Y., 1989), peptide nucleic acid libraries (see, e.g., U.S. Patent No. 5,539,083), antibody libraries (see, e.g., Vaughn et al., Nat. Biotechnol, 14:309-314, 1996; PCT App. No. PCT/US96/10287), carbohydrate libraries (see, e.g., Liang et al., Science, 274:1520-1522, 1996; U.S. Patent No. 5,593,853), small organic molecule libraries (see, e.g., benzodiazepines, Baum, C&EN, Jan 18, page 33, 1993; isoprenoids, U.S. Patent No. 5,569,588; thiazolidionones and methathiazones, U.S. Pat. No. 5,549,974; pyrrolidines, U.S. Patent Nos. 5,525,735 and 5,519,134; morpholino compounds, U.S. Patent No. 5,506,337; benzodiazepines, 5,288,514) and the like.

[0185] Libraries useful for the disclosed screening methods can be produced in a variety of manners including, but not limited to, spatially arrayed multipin peptide synthesis (Geysen, et al., Proc. Natl. Acad. Sci., 81(13):3998-4002, 1984), "teabag" peptide synthesis (Houghten, Proc. Natl. Acad. Sci., 82(15):5131-5135, 1985), phage display (Scott and Smith, Science, 249:386-390, 1990), spot or disc synthesis (Dittrich et al., Bioorg. Med. Chem. Lett., 8(17):2351-2356, 1998), or split and mix solid phase synthesis on beads (Furka et al., Int. J. Pept. Protein Res., 37(6):487-493, 1991; Lam et al., Chem. Rev., 97(2):411-448, 1997).

[0186] Devices for the preparation of combinatorial libraries are also commercially available (see, e.g., 357 MPS, 390 MPS, Advanced Chem Tech, Louisville Ky., Symphony, Rainin, Woburn, Mass., 433A Applied Biosystems, Foster City, Calif., 9050 Plus, Millipore, Bedford, Mass.). In addition, numerous combinatorial libraries are themselves commercially available (see, for example, ComGenex, Princeton, N.J., Asinex, Moscow, RU, Tripos, Inc., St. Louis, Mo., ChemStar, Ltd, Moscow, RU, 3D Pharmaceuticals, Exton, Pa., Martek Biosciences, Columbia, Md., etc.).

[0187] Libraries can include a varying number of compositions (members), such as up to about 100 members, such as up to about 1,000 members, such as up to about 5,000 members, such as up to about 10,000 members, such as up to about 100,000 members, such as up to about 500,000 members, or even more than 500,000 members. In one example, the methods can involve providing a combinatorial chemical or peptide library containing a large number of potential therapeutic compounds. Such combinatorial libraries are then screened by the methods disclosed herein to identify those library members (particularly chemical species or subclasses) that display a desired characteristic activity.

[0188] The compounds identified using the methods disclosed herein can serve as conventional "lead compounds" or can themselves be used as potential or actual therapeutics. In some instances, pools of candidate agents can be identified and further screened to determine which individual or subpools of agents in the collective have a desired activity. Compounds identified by the disclosed methods can be used as therapeutics or lead compounds for drug development for a variety of conditions.

Organelle specific substrate/inhibitor combinations

[0189] In example embodiments, the one or more organelle-specific substrate/inhibitors is one or more mitochondria-specific substrate(s)/inhibitor(s). Substrates and inhibitors for use in mitochondrial respiration assays have been previously described (see, e.g., Divakaruni, et al., 2014). In example embodiments, oxidizable substrates include, but are not limited to, Glutamate, Malate, Succinate, Glycerol-3-phosphate, Palmitoyl CoA, Palmitoyl carnitine, Octanoyl carnitine, Pyruvate, β-Hydroxybutyrate, Ascorbate, and TMPD. In example embodiments, inhibitors and other reagents include, but are not limited to, ADP K+ salt (ADP-K), Rotenone, Antimycin A, Oligomycin, carbonyl cyanide 4-(trifluoromethoxy)phenyl-hydrazone (FCCP), dichloroacetate or DCA K+ salt (DCA-K), Carnitine, and ATP. In example embodiments, the one or more mitochondria-specific substrate(s)/inhibitor(s) are selected from the group consisting of glutamate, malate, succinate, piericidin A, coenzyme Q-linked substrates, glycerol-3-phosphate, ATP, ADP, D-lactate, antimycin A, ascorbate, N,N,N’,N’-tetramethyl-p-phenylenediamine (TMPD), oligomycin A, BAM15, and carbonyl cyanide m-chlorophenyl hydrazone (CCCP).

Detection of functional probes or reporters

[0190] In example embodiments, the population of cells is sorted based on the level of the functional probes or reporters. Thus, barcodes or perturbations that increase or decrease the signal can be identified. Methods of cell sorting are well known in the art. Fluorescence-activated cell sorting (FACS), also known as flow cytometry cell sorting, can be used to sort the population of cells based on the level of the probe or reporter signal.

Sequencing and Amplification

[0191] In example embodiments, identification of perturbations associated with an organelle function comprises sequencing and, preferably, amplification of barcodes before sequencing. In example embodiments, sequencing comprises high-throughput (formerly “next-generation”) technologies to generate sequencing reads. In DNA sequencing, a read is an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment. Methods for constructing sequencing libraries are known in the art (see, e.g., Head et al., Library construction for next-generation sequencing: Overviews and challenges. Biotechniques. 2014; 56(2): 61-77). A “library” or “fragment library” may be a collection of nucleic acid molecules derived from one or more nucleic acid samples, in which fragments of nucleic acid have been modified, generally by incorporating terminal adapter sequences comprising one or more primer binding sites and identifiable sequence tags. In certain embodiments, the library members (e.g., genomic DNA, cDNA) may include sequencing adaptors that are compatible with use in, e.g., Illumina's reversible terminator method, long read nanopore sequencing, Roche's pyrosequencing method (454), Life Technologies' sequencing by ligation (the SOLiD platform) or Life Technologies' Ion Torrent platform. Examples of such methods are described in the following references: Margulies et al (Nature 2005 437: 376-80); Schneider and Dekker (Nat Biotechnol. 2012 Apr 10;30(4):326-8); Ronaghi et al (Analytical Biochemistry 1996 242: 84-9); Shendure et al (Science 2005 309: 1728-32); Imelfort et al (Brief Bioinform. 2009 10:609-18); Fox et al (Methods Mol. Biol. 2009; 553:79-108); Appleby et al (Methods Mol. Biol. 2009; 513: 19-39); and Morozova et al (Genomics. 2008 92:255-64), which are incorporated by reference for the general descriptions of the methods and the particular steps of the methods, including all starting products, reagents, and final products for each of the steps. In example embodiments, barcodes are amplified by PCR using primers specific to universal sequences flanking the barcode sequences. Thus, barcode sequences are enriched by amplification before sequencing.
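By way of illustration only, the counting of barcodes located between universal flanking sequences in sequencing reads might be sketched as follows. The flank sequences `LEFT` and `RIGHT`, the function names, and the example reads are hypothetical and chosen for this sketch; the two barcode sequences are the LDHD guide sequences given elsewhere in this disclosure. Exact string matching is assumed, with no allowance for sequencing errors.

```python
from collections import Counter


def extract_barcode(read, left_flank, right_flank):
    """Return the sequence between the two flanks, or None if a flank is missing."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def count_barcodes(reads, left_flank, right_flank):
    """Tally barcodes across reads, skipping reads without both flanks."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read, left_flank, right_flank)
        if barcode:
            counts[barcode] += 1
    return counts


# Hypothetical universal flanks and reads, for illustration only.
LEFT = "ACCG"
RIGHT = "GTTT"
reads = [
    "TTACCGAGGTTCGCGAGTCCTACCCAGTTTAA",
    "ACCGCACCGCGGCAGTGGACACGTGTTTCC",
    "TTACCGAGGTTCGCGAGTCCTACCCAGTTTGG",
    "NNNNNNNN",  # no flanks, so this read is ignored
]
counts = count_barcodes(reads, LEFT, RIGHT)
```

In practice, a dedicated read-counting tool with mismatch tolerance would likely be preferred; the sketch only shows the flank-anchored lookup.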

[0192] Further embodiments are illustrated in the following Examples which are given for illustrative purposes only and are not intended to limit the scope of the invention.

EXAMPLES

Example 1 - Large-scale genetic dissection of mitochondrial physiology using PMF-seq

[0193] Here, Applicants report the development of such a technology, which Applicants call permeabilized-cell mitochondrial function sequencing (PMF-seq). The core concept behind the approach is to interrogate mitochondrial function in mutagenized pools of cells where the plasma membrane has been permeabilized while the mitochondria remain physiologically competent and accessible to substrates/inhibitors. Applicants reasoned that by CRISPR mutagenizing cultured cells prior to permeabilization Applicants could perform kinetic measurements in bulk, sort cells based on a bioenergetic parameter at a specified time, and perform next-generation sequencing, thereby connecting genes to bioenergetics.

[0194] Applicants sought to optimize cell permeabilization and buffer conditions such that mitochondria remained functionally intact and single cells could still be reliably sorted. Applicants used Perfringolysin O ¹⁰, a bacterial toxin that selectively permeabilizes cholesterol-rich membranes such as the plasma membrane by forming giant ring-shaped pores. Applicants confirmed plasma membrane permeabilization and substrate-specific respiration in A375 cells using plasma membrane-impermeable glutamate/malate (complex I), succinate (complex II), and ascorbate/TMPD (cytochrome c) with the Seahorse XFe96 Analyzer (Fig. 1b). Applicants could monitor both basal steady state and energized steady states of mitochondrial membrane potential (ΔΨ_m) in pools of permeabilized cells using the lipophilic cationic dye tetramethylrhodamine methyl ester (TMRM) (Fig. 1c). Applicants then repeated the same experiment and could detect low and high ΔΨ_m states as an endpoint assay on a flow cytometer (Fig. 1d). Collectively, these studies show that in suspensions of permeabilized cells, mitochondria can be energized using specific substrate/inhibitor combinations that are otherwise cell impermeable, that they can be monitored using TMRM, and that steady state endpoints from kinetic measurements can be reliably detected by flow cytometry.

[0195] To enable the PMF-seq workflow, Applicants combined high-throughput pooled CRISPR/Cas9 mutagenesis with a TMRM readout by fluorescence activated cell sorting (FACS) (Fig. 1e). Applicants generated a custom CRISPR sgRNA library targeting genes encoding the mitochondrial proteome ¹¹, also including genes encoding the lysosomal proteome ¹² as well as genes from key metabolic pathways such as glycolysis. The resulting CRISPR library, which Applicants term “MitoPlus”, consists of 15,271 sgRNAs targeting 1,864 genes (SEQ ID NOS: 1-15271). Applicants lentivirally infected wild-type A375 cells with CRISPR/Cas9 and the MitoPlus library. CRISPR mutagenized cells were harvested on day 15 post infection, permeabilized and stained with TMRM, treated with specific substrate/inhibitor combinations, and subjected to FACS. Cells corresponding to the top and bottom ~15% of the TMRM distributions (designated as “high tail” and “low tail”) were collected, with genomic DNA harvested and subjected to PCR amplification and gRNA sequencing. Analyses of the screening results were performed using a Z-score based method as previously described ⁹. In this analysis, genes required for generating the membrane potential have highly negative Z-scores (Online Methods).
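As a minimal sketch of how the "low tail" and "high tail" gates described above could be expressed computationally, the following derives cutoffs that place roughly 15% of events in each tail. The function names and the synthetic intensity values are assumptions made for illustration; on an actual cell sorter the gates are set in the instrument software rather than computed this way.

```python
def tail_gates(intensities, tail_fraction=0.15):
    """Return (low_cutoff, high_cutoff) so that about tail_fraction of
    events lie at or below low_cutoff and at or above high_cutoff."""
    ordered = sorted(intensities)
    n = len(ordered)
    k = max(1, round(n * tail_fraction))  # events per tail
    return ordered[k - 1], ordered[n - k]


def assign_tail(value, low_cutoff, high_cutoff):
    """Label one event as belonging to the low tail, high tail, or middle."""
    if value <= low_cutoff:
        return "low"
    if value >= high_cutoff:
        return "high"
    return "middle"


# Synthetic TMRM intensities (arbitrary units), for illustration only.
signal = list(range(1, 101))
low, high = tail_gates(signal)
labels = [assign_tail(v, low, high) for v in signal]
```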

[0196] Applicants began by asking if PMF-seq can recover the known genetic basis of respiratory chain branching, which is not readily probed in intact cells. Although the respiratory chain is traditionally depicted as a simple linear sequence of protein complexes, it is in fact highly branched ¹³, with multiple inputs into coenzyme Q (e.g., complex I, SDH, GPD2, DHODH, ETFDH, SQOR) and into cytochrome c (e.g., complex III, MIA40, SUOX). Applicants previously reported intact-cell based CRISPR screens that identified genes required for OXPHOS based on differential growth in glucose versus galactose ⁸. Although complex I and SDH scored in those screens, genes corresponding to these non-canonical respiratory chain inputs did not, likely because of pathway redundancy and lack of endogenous substrates. In theory the advantage of PMF-seq is that the substrate and inhibitor milieu can be controlled, forcing the cell to utilize a pathway whose genetic basis can then be queried. In agreement with the theory, Applicants find that PMF-seq recovers complex I genes only when glutamate/malate (a classic complex I cocktail) is used, complex II genes only under conditions of succinate/piericidin (a classic complex II cocktail), and complex III genes only when coenzyme Q-linked substrates are provided (Fig. 2). As expected, the downstream respiratory chain complexes cytochrome c and complex IV score for all these substrates (Fig. 2). PMF-seq can be readily implemented in both adherent and suspension cell types and can reveal cell-type differences. For example, Applicants find differential dependency on GPD2 using glycerol-3-phosphate, another Q-linked substrate ¹⁴, in A375 cells (Fig. 5a) versus K562 cells (Fig. 5b). Crucially, direct comparison to the previous OXPHOS CRISPR screen in intact cells ⁸ (Fig. 4) clearly indicates that PMF-seq has far greater resolving power to spotlight branch-points and substrate specificities.

[0197] Applicants next applied PMF-seq to probe the genetic basis for another property of OXPHOS that is not widely appreciated: reversibility. Nearly all OXPHOS complexes are reversible, depending on the prevailing metabolic state of the cell. Complex V can run in reverse by consuming glycolytic ATP to support mitochondrial membrane potential ¹⁵ (Fig 6a). Applicants confirmed in the permeabilized A375 system that added ATP can energize mitochondria when the respiratory chain is inhibited by antimycin (Fig 6b), and Applicants further showed that ATP can support the membrane potential during FACS sorting (Fig. 6c). PMF-seq reveals strong genetic dependency on known complex V genes when ATP, but not other substrates, is used (Fig. 6d).

[0198] Finally, Applicants sought to determine whether PMF-seq can provide genetic insights into respiratory chain fuels not previously linked to human OXPHOS bioenergetics. In isolated mitochondria and permeabilized cells, Applicants find that D-lactate can boost the membrane potential under antimycin conditions (Fig. 3a). By performing PMF-seq using D-lactate as the substrate and comparing it to ascorbate/TMPD under antimycin treatment, Applicants identified human D-lactate dehydrogenase (LDHD). LDHD is a mitochondrial intermembrane space protein ¹⁶, and in S. cerevisiae and A. thaliana, its homolog transfers electrons directly to cytochrome c by oxidizing D-lactate ^{17, 18}, but this function has not been established in any mammalian counterparts. Applicants validated the screening result in A375 and HepG2 LDHD knockout cell lines. In these knockout cell lines, Applicants used real-time, spectrophotometric measurements of mitochondrial membrane potential in permeabilized cells, and Applicants found that although they could mount a membrane potential response to ascorbate/TMPD, their response to D-lactate was blunted (Fig. 3d). To further characterize human LDHD, Applicants expressed and purified it for in vitro biochemical characterizations. Human LDHD, which has a predicted molecular weight of ~50 kDa, evidently forms a dimer based on gel filtration (Fig. 3e, f). Using the purified LDHD, Applicants could confirm direct electron transfer from D-lactate to cytochrome c by monitoring its heme absorbance (Fig. 3g). Applicants further characterized the steady-state enzyme kinetics and found the k_cat and K_M of this reaction to be 103 min⁻¹ and 112 µM, respectively. In addition, Applicants tested candidate related substrates for LDHD and identified R-2-hydroxybutyrate as another substrate that can reduce cytochrome c (Fig. 3h). LDHD has been associated with the detoxification pathway for methylglyoxal, a highly toxic byproduct of glycolysis ¹⁹. However, Applicants only observed a very modest change in sensitivity towards methylglyoxal in LDHD KO (Fig. 7a-b). Human LDHD deficiency has been linked to diverse pathologies, ranging from D-lactic acidosis, a complication of short bowel syndrome ²⁰, to complex IV deficiency with neurological manifestations ²¹. While prior studies have focused on LDHD in detoxifying reactive aldehydes, this work shows that as a substrate D-lactate is able to contribute to the proton motive force. Future studies will be required to determine whether this bioenergetic role of LDHD (as opposed to any detoxification role) may be important in disease pathogenesis, or perhaps more generally in helping to harvest energy from end products of bacterial metabolism.

[0199] PMF-seq provides a simple new way to investigate the genetic basis of mitochondrial organelle-level physiology with extremely high throughput. In the current paper, Applicants have focused on vectorial bioenergetics and have shown that PMF-seq can easily reveal the genetic basis for branching and reversibility of the OXPHOS system, which cannot otherwise be readily determined in intact cells using genetic screening. PMF-seq is ideally suited to genetically map all mitochondrial metabolic pathways and transporters coupled to the respiratory chain, and the platform can be readily extended to other bioenergetic organelles such as chloroplasts. In the current report, Applicants focused on membrane potential (ΔΨ_m), the major component of the proton motive force in mitochondria; however, by simple extension, PMF-seq can be used to monitor other parameters classically measured in isolated mitochondria such as NADH ²² and calcium ²³ to paint a more complete picture of mitochondrial physiology.

Example 2 - Methods

[0200] Cell lines and cell culture. A375 (ATCC CRL-1619) and K562 (ATCC CCL-243) cells were obtained from ATCC. All experiments with wild-type cells or CRISPR-Cas9 mediated knockouts were performed under passage number 20 upon receipt from ATCC, and late passage cells were authenticated by STR profiling. Cells were cultured in DMEM (Gibco #11995-065) supplemented with 10% FBS (Gibco #26140-079) and penicillin/streptomycin (Gibco #15140-122).

[0201] Determination of infection conditions for CRISPR pooled screens. The custom all-in-one MitoPlus library (15,271 sgRNAs targeting 1,864 genes) was generated by the Broad’s Genetic Perturbation Platform as a service. The library was delivered as lentivirus. Optimal infection conditions were determined to achieve 30%-50% infection efficiency, corresponding to a multiplicity of infection (MOI) of ~0.5-1. Spin-infections were performed in 12-well plate format with 1.5x10⁶ cells in each well. Optimal conditions were determined by infecting cells with different virus volumes (0, 100, 200, 400, 600, 800 ul) with a final concentration of 4 ug/ml polybrene in A375 cells. Cells were centrifuged for 2 h at 1000 x g at 37°C. Approximately 24 h after infection, cells were collected and supplemented with 1 ug/ml puromycin. Cells were counted 3 days post selection to determine the infection efficiency, comparing survival with and without puromycin selection. The volume of virus that yielded 30%-50% infection efficiency was used for screening.

[0202] CRISPR Screens with the MitoPlus “All-in-one” Library. Infection, selection, and expansion were performed in two distinct replicates. Screening-scale infections were performed with the pre-determined volume of virus in the same 12-well format as the viral titration described above, and pooled 24 h post-centrifugation. Infections were performed with 3.6x10⁷ cells per replicate, to achieve a representation of at least 500 cells per sgRNA following puromycin selection (8x10⁶ surviving cells). Approximately 24 h after infection, all wells within a replicate were pooled and were split into T175 flasks and cells were selected with puromycin for 3 days to remove uninfected cells. After selection was complete, at least 5x10⁶ A375 cells were seeded in T175 flasks. Cells were passaged in fresh media (high glucose DMEM supplemented with pyruvate, uridine, and 10% regular FBS, pen/strep) every 2-3 days. Cells were harvested ~15 days after infection for PMF-seq.
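The library representation stated above can be checked with simple arithmetic. The sketch below uses only the figures given in the text (15,271 sgRNAs, 3.6x10⁷ infected cells, 8x10⁶ surviving cells); the function and variable names are illustrative assumptions.

```python
LIBRARY_SIZE = 15_271     # sgRNAs in the MitoPlus library (from the text)
CELLS_INFECTED = 3.6e7    # cells per replicate at infection (from the text)
CELLS_SURVIVING = 8e6     # cells surviving puromycin selection (from the text)


def cells_per_sgrna(n_cells, n_sgrnas):
    """Average number of cells carrying each sgRNA, assuming even representation."""
    return n_cells / n_sgrnas


def min_cells_for_coverage(n_sgrnas, target_coverage):
    """Smallest cell number giving the target average coverage."""
    return n_sgrnas * target_coverage


coverage = cells_per_sgrna(CELLS_SURVIVING, LIBRARY_SIZE)     # about 524 cells per sgRNA
surviving_fraction = CELLS_SURVIVING / CELLS_INFECTED          # about 0.22
cells_needed = min_cells_for_coverage(LIBRARY_SIZE, 500)       # 7,635,500 cells
```

Under the even-representation assumption, 8x10⁶ surviving cells correspond to roughly 524 cells per sgRNA, consistent with the stated target of at least 500.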

[0203] Flow cytometry-based screening with TMRM as the readout in permeabilized cells. A375-MitoPlus cells were harvested in sequential batches. Each batch of cells (~200x10⁶) was resuspended in Agilent Seahorse XF DMEM media (supplemented with 10 mM glucose, 1 mM pyruvate, 2 mM glutamine) at 10x10⁶/mL (~20 ml total). Cells were seeded in an ultra-low binding 6-well plate up to 4 ml per well and were incubated at 37°C in a CO2 incubator before sorting. Permeabilization buffers were prepared with the following: 1xMAS buffer (70 mM sucrose, 220 mM mannitol, 5 mM KH2PO4, 5 mM MgCl2, 2 mM HEPES, 1 mM EGTA), 4 mM ADP (except for the antimycin+ATP condition), 5 uM oligomycin (except for the antimycin+ATP condition), 0.7% fatty acid free BSA, 10 nM Seahorse PMP (Agilent), and 100 nM TMRM (Invitrogen). At this concentration, TMRM signal behaves in the non-quenching mode where inner membrane polarization leads to a stronger TMRM signal. Right before each sort, 2x10⁷ cells were harvested and spun down at 600 x g for 0.5 min. Media were aspirated and replaced with 1 mL of permeabilization buffer + treatment (respiratory chain substrate and small-molecule OXPHOS modulator). When cells were ready for sorting, the 2x10⁷ cells were transferred to a flow cytometry tube through the strainer cap. Applicants used the following conditions in this paper: 10 mM Glutamate + 10 mM Malate, 1 uM Piericidin + 10 mM Succinate, 1 uM Piericidin + 25 mM Glycerol 3-phosphate, 1 uM Antimycin + 10 uM Ascorbate + 0.1 uM TMPD, 1 uM Antimycin + 25 uM D-Lactate, and 1 uM Antimycin + 5 mM ATP. All chemicals were purchased from Sigma-Aldrich, except for Piericidin (Enzo Life Sciences) and ATP (ThermoFisher).

[0204] A Sony SH800 flow cytometer was used for all cell sorting experiments in this study. Sample flow rate was adjusted to achieve an event rate of ~10,000/s. Cell population was first gated by FSC/SSC, and the top and bottom ~15% of the TMRM distribution were sorted. Cells were sorted until all cells were consumed. For an input of 2x10⁷ cells, around 7.5x10⁵ cells could be harvested from each “tail”. Genomic DNA isolation, PCR, sequencing, and subsequent analyses of the screening results were performed as previously described (To et al., Cell 2019). For each replicate and each condition, the Z-score represents the Z-score transformation of mean log2 fold difference in sgRNA abundances between the low tail and high tail for each gene in each treatment.

[0205] Analysis of the screening results. Next-generation sequencing was performed and for each condition the abundances of each sgRNA in the “high tail” and “low tail” were quantified and compared, to identify gene knockouts that are depolarized or hyperpolarized under the said condition. For a given substrate Applicants created scatter plots of Z-scores, which are derived from the mean log2 fold difference in sgRNA abundances between the low tail and high tail for each gene. A highly negative Z-score indicates the enrichment of a gene knockout in the “low tail” and thus a depolarized state, whereas a highly positive Z-score indicates the enrichment of a gene knockout in the “high tail” and thus a hyperpolarized state. Applicants used a set of non-expressed genes (bottom 300 genes as determined by the abundance of transcripts) in the MitoPlus library as the controls. These controls have tight Z-score distributions centered around zero in all cases. To visualize the “electron flow” that supports membrane potential generation under each substrate, Applicants highlighted the gene sets for the complexes I-IV and cytochrome c (Fig. 2) in two biological replicates. A gene set is required for membrane potential generation if its members are enriched in the lower left quadrant.
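The Z-score transformation described above might be sketched as follows. This is not the Applicants' analysis code: the sign convention (log2 of high tail over low tail, so that enrichment in the low tail gives a negative value), the pseudocount, the assumption that counts are already normalized for sequencing depth, and all gene names and counts are assumptions introduced for illustration. The null distribution is taken from the control gene set, as stated in the text.

```python
import math
from statistics import mean, stdev


def gene_log2_fold(high_counts, low_counts, pseudocount=1.0):
    """Mean log2(high tail / low tail) across the sgRNAs targeting one gene."""
    return mean(
        math.log2((h + pseudocount) / (l + pseudocount))
        for h, l in zip(high_counts, low_counts)
    )


def z_scores(gene_lfc, control_genes):
    """Z-transform gene-level values against the control-gene distribution."""
    control = [gene_lfc[g] for g in control_genes]
    mu, sigma = mean(control), stdev(control)
    return {gene: (value - mu) / sigma for gene, value in gene_lfc.items()}


# Synthetic, depth-normalized counts per gene: (high tail, low tail).
counts = {
    "GENE_A": ([10, 20], [80, 160]),      # hypothetical knockout enriched in the low tail
    "CTRL_1": ([100, 100], [100, 100]),
    "CTRL_2": ([120, 110], [100, 100]),
    "CTRL_3": ([90, 95], [100, 100]),
}
lfc = {gene: gene_log2_fold(high, low) for gene, (high, low) in counts.items()}
z = z_scores(lfc, ["CTRL_1", "CTRL_2", "CTRL_3"])
```

With these synthetic counts, the hypothetical GENE_A receives a strongly negative Z-score, which under the stated convention corresponds to a depolarized knockout.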

[0206] Gene-specific CRISPR-Cas9 knockouts. The two best sgRNAs from the MitoPlus library were ordered as complementary oligonucleotides (Integrated DNA Technologies) and cloned into pLentiCRISPRv2 (Addgene # 52961). An sgRNA (CTTGAGACTGAGTCAGACCA (SEQ ID NO: 15289)) targeting a non-expressed gene OR4N4 was used as a cutting negative control. Lentiviruses were produced according to Addgene’s protocol and cells were selected with 2 ug/mL puromycin 24 h post infection. Puromycin was withdrawn 48 h later and cells were maintained for 10-20 additional days during which experiments were performed. Gene disruption efficiency was verified by protein immunoblotting. The sequences of the sgRNAs used are: LDHD sg1 (AGGTTCGCGAGTCCTACCCA (SEQ ID NO: 15290)), LDHD sg2 (CACCGCGGCAGTGGACACGT (SEQ ID NO: 15291)). The resulting knockouts were evaluated by Western blotting against LDHD (Sigma-Aldrich HPA0066148).

[0207] Oxygen consumption measurement in permeabilized cells. For permeabilized Seahorse OCR measurements with the XFe96 analyzer (Agilent), A375 cells were seeded at 1.5x10⁴ cells/well in 80 ul/well growth media and grown overnight at 37°C. Seahorse cartridges were hydrated overnight at 37°C, according to the manufacturer’s protocol. After 16-20 hrs, cells were washed once with 1xMAS buffer (70 mM sucrose, 220 mM mannitol, 5 mM KH2PO4, 5 mM MgCl2, 2 mM HEPES, 1 mM EGTA). Cells were then permeabilized with 1xMAS buffer supplemented with 0.2% fatty acid free BSA, 2 nM XF Plasma Membrane Permeabilizer (PMP) (Agilent 102504-100), and 4 mM ADP. Piericidin (1 uM) or antimycin (1 uM) was added to the buffer depending on the respiratory chain substrate under investigation. Upon assay start, baseline respiratory rate measurements were taken, followed by injection of 10 mM respiratory chain substrate, 2 uM oligomycin, 8 uM BAM15, and 1 uM antimycin (or 20 mM sodium azide when antimycin was already added). All chemicals were purchased from Sigma-Aldrich, except for Piericidin (Enzo Life Sciences).

[0208] Kinetic measurements of TMRM in permeabilized cells. Kinetic TMRM measurements in permeabilized cells were performed using a LS-55 fluorescence spectrometer (PerkinElmer). One million A375 cells were harvested and pelleted, washed once in dPBS, and resuspended in 500 ul 1xMAS buffer containing 2 nM XF PMP and 0.2% fatty acid free BSA. The entire 500 ul cell suspension was transferred to a quartz (Suprasil) cuvette with a 4 mm magnetic stirrer. Trace reading was started, and fluorescence was measured at an excitation of 530 nm and emission of 600 nm (with slits of 5 nm). TMRM (Invitrogen) was added at 500 nM (1.25 ul of 200 uM) and fluorescence was allowed to stabilize (~5 min). Substrate (see concentrations above) and chemical OXPHOS modulators (see concentrations above) were subsequently added. TMRM signal quenching is measured and therefore inner membrane polarization has an inverse relationship with TMRM signal. Depolarization is inferred from increased intensity (lower 1-TMRM) and hyperpolarization from decreased intensity (higher 1-TMRM).

[0209] Purification and biochemical characterization of human LDHD. A Pichia codon-optimized construct containing the mitochondrial targeting sequence of DLD1 from Komagataella phaffii, followed by human LDHD isoform 2, a C-terminal TEV cleavage site, and a 10X histidine tag was cloned into expression vector pJGG (Biogrammatics) and then integrated into the genome of Komagataella phaffii expression strain BG24 (Biogrammatics). The recombinant strain was used to inoculate 1 L cultures of BMGY media (Biogrammatics) containing 0.5 mg/ml G418 (Goldbio) which were grown at 30°C with shaking for 24 hrs to an optical density around 30. Cells were pelleted at 4000 x g and then frozen in liquid nitrogen. Cells were lysed with a Retsch Mixer Mill MM40 and the powder was stored at -80°C.

[0210] At the time of purification, lysed cells were resuspended in 50 mM HEPES pH 8, 300 mM NaCl, 0.1% Triton-X-100 (Sigma), 1 mM PMSF (Sigma), 1 ug/ml pepstatin (GoldBio), 1 ug/ml leupeptin (GoldBio), 1 ug/ml aprotinin (Sigma), 1 mM benzamidine (GoldBio), 0.1 mg/ml soybean trypsin inhibitor (GoldBio), 0.1 mg/ml AEBSF (GoldBio), and DNase I (Sigma). The lysate was pelleted at 45,000 x g for 1 hr and the supernatant loaded onto Talon cobalt resin (Takara). The column was washed, and the protein eluted with 25 mM HEPES pH 8.0, 300 mM NaCl, and 300 mM imidazole. The protein was then run over a Superdex 200 10/300 GL column (Cytiva), concentrated to approximately 5 mg/ml, and analyzed by SDS-PAGE with Coomassie stain.

[0211] Steady-state enzyme kinetics. The reaction buffer contained 50 mM potassium phosphate pH 8.5, 200 uM bovine heart cytochrome c (Sigma), and the required concentration of substrate. Purified LDHD protein (20 ug per reaction) was added to the reaction and the redox state of cytochrome c was monitored spectrophotometrically with a Cary 100 UV-VIS spectrophotometer (Agilent) by the absorbance at 550 nm. The initial rates were determined using up to 30 seconds of the linear portion of the trace. Values of k_cat and K_M were determined using Prism 9 (GraphPad Software). The relative rates among various substrates were determined from the initial reaction rate of LDHD and cytochrome c with 10 mM substrate.
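For orientation, the Michaelis-Menten relationship underlying k_cat and K_M can be sketched as below, using the values reported in Example 1 (k_cat of 103 min⁻¹ and K_M of 112, read here as micromolar). The Applicants fitted their data in Prism 9; the Hanes-Woolf linearization shown here is a substituted, simpler technique used only to illustrate parameter recovery, and the substrate concentrations and rates are synthetic, noise-free values generated from the model itself.

```python
K_CAT_PER_MIN = 103.0  # reported k_cat (min^-1)
K_M_UM = 112.0         # reported K_M, assumed to be in micromolar


def turnover_rate(substrate_um, k_cat=K_CAT_PER_MIN, k_m=K_M_UM):
    """Michaelis-Menten rate per enzyme, v/[E] = k_cat*[S]/(K_M + [S]), in min^-1."""
    return k_cat * substrate_um / (k_m + substrate_um)


def hanes_woolf_fit(substrate_um, rates):
    """Least-squares fit of [S]/v = [S]/Vmax + K_M/Vmax; returns (vmax, k_m)."""
    ys = [s / v for s, v in zip(substrate_um, rates)]
    n = len(substrate_um)
    mean_x = sum(substrate_um) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in substrate_um)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(substrate_um, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    return vmax, intercept * vmax


# Synthetic substrate series (micromolar) and model-generated rates.
concs = [25.0, 50.0, 100.0, 200.0, 400.0, 800.0]
rates = [turnover_rate(s) for s in concs]
vmax_fit, km_fit = hanes_woolf_fit(concs, rates)
```

Because the rates are expressed per enzyme, the fitted maximum corresponds to k_cat. With real, noisy data a nonlinear fit is generally preferable to a linearization.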

[0212] Methylglyoxal toxicity. A375 control and LDHD KO cells were seeded at 5x10⁴/well in a 12-well plate in regular media (high glucose DMEM supplemented with pyruvate, uridine, and 10% regular FBS, pen/strep), and treated with 0-5 mM methylglyoxal (Sigma), followed by 3 days of growth. Cell counts and cell viability were determined by a Beckman Coulter Vi-Cell XR cell viability analyzer.

Example 3 - Tables

Tables 1-4 show Z-scores from the screens, related to FIGS. 2, 3C, 4, and 5. Z-scores: Z-score transformation of mean log2 fold difference in sgRNA abundances between the low tail and high tail for each treatment. Null distribution was defined by the "Control" set.

Table 1. related to FIG. 2. The data in this table is organized in the following order: gene name 1, A375 GM Rep1, A375 GM Rep2, A375 Pier-Succ Rep1, A375 Pier-Succ Rep2, A375 Anti-AscTMPD Rep1, A375 Anti-AscTMPD Rep2; gene name 2, A375 GM Rep1, A375 GM Rep2, A375 Pier-Succ Rep1, A375 Pier-Succ Rep2, A375 Anti-AscTMPD Rep1, A375 Anti-AscTMPD Rep2; etc. The data can be viewed in an alternative table form by replacing commas with the tab character and the semicolons with a paragraph character.

Table 2. related to FIG. 3C. The data in this table is organized in the following order: gene name 1, A375 Anti-D-lac, A375 Anti-AscTMPD; gene name 2, A375 Anti-D-lac, A375 Anti-AscTMPD; etc. The data can be viewed in an alternative table form by replacing commas with the tab character and the semicolons with a paragraph character.

0.413432238, 0.101162415; ZNF219, -0.457501705, -0.307682363; ZNF680, -0.631991973, 1.775617543; ZNF682, -0.383121089, -0.803608688; ZNF91, 0.478105234, -1.011317677; ZNRF1, 0.765321858, 0.2989664; ZNRF2, -0.595158363, 0.341129637; ZYG11A, 0.489899035, 2.198622417

Table 3. related to FIG. 4. The data in this table is organized in the following order: gene name 1, A375 GM, A375 Pier-G3P, A375 Anti-AscTMPD, K562 GM, K562 Pier-G3P, K562 Anti-AscTMPD; gene name 2, A375 GM, A375 Pier-G3P, A375 Anti-AscTMPD, K562 GM, K562 Pier-G3P, K562 Anti-AscTMPD; etc. The data can be viewed in an alternative table form by replacing commas with the tab character and the semicolons with a paragraph character.

Table 4. related to FIG. 5. The data in this table is organized in the following order: gene name 1, A375 GM Rep1, A375 GM Rep2, A375 Anti-ATP Rep1, A375 Anti-ATP Rep2; gene name 2, A375 GM Rep1, A375 GM Rep2, A375 Anti-ATP Rep1, A375 Anti-ATP Rep2; etc. The data can be viewed in an alternative table form by replacing commas with the tab character and the semicolons with a paragraph character.
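The conversion to table form described for Tables 1-4 (commas to tabs, semicolons to line breaks) might be carried out programmatically as sketched below. The function name is an illustrative assumption; the column names follow the stated order for Table 2, and the two sample records are taken from the Table 2 data. Records that do not contain the expected number of fields are skipped rather than guessed at.

```python
def parse_flat_table(text, columns):
    """Split 'gene, v1, v2; gene, v1, v2; ...' into a list of row dictionaries."""
    rows = []
    for record in text.split(";"):
        fields = [field.strip() for field in record.split(",")]
        if len(fields) != len(columns) or not fields[0]:
            continue  # skip truncated or empty records
        row = {columns[0]: fields[0]}
        for name, value in zip(columns[1:], fields[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


TABLE2_COLUMNS = ["gene", "A375 Anti-D-lac", "A375 Anti-AscTMPD"]
sample = "ZNF219, -0.457501705, -0.307682363; ZNF680, -0.631991973, 1.775617543"

rows = parse_flat_table(sample, TABLE2_COLUMNS)

# Tab-separated rendering, one gene per line.
tsv = "\n".join("\t".join(str(row[c]) for c in TABLE2_COLUMNS) for row in rows)
```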

References

1. Chance, B. & Williams, G.R. Respiratory enzymes in oxidative phosphorylation. I. Kinetics of oxygen utilization. J Biol Chem 217, 383-393 (1955).

2. Chance, B. & Hollunger, G. The interaction of energy and electron transfer reactions in mitochondria. I. General properties and nature of the products of succinate-linked reduction of pyridine nucleotide. J Biol Chem 236, 1534-1543 (1961).

3. Mitchell, P. & Moyle, J. Respiration-driven proton translocation in rat liver mitochondria. Biochem J 105, 1147-1162 (1967).

4. Nicholls, D.G. & Ferguson, S.J., Edn. 4th 1 online resource (434 p (Elsevier, Amsterdam, Netherlands; 2013).

5. Tzagoloff, A. & Dieckmann, C.L. PET genes of Saccharomyces cerevisiae. Microbiol Rev 54, 211-225 (1990).

6. Calvo, S.E. et al. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency. Nat Genet 42, 851-858 (2010).

7. Pagliarini, D.J. et al. A mitochondrial protein compendium elucidates complex I disease biology. Cell 134, 112-123 (2008).

8. Arroyo, J.D. et al. A Genome-wide CRISPR Death Screen Identifies Genes Essential for Oxidative Phosphorylation. Cell Metab 24, 875-885 (2016).

9. To, T.L. et al. A Compendium of Genetic Modifiers of Mitochondrial Dysfunction Reveals Intra-organelle Buffering. Cell 179, 1222-1238 e1217 (2019).

10. Divakaruni, A.S., Rogers, G.W. & Murphy, A.N. Measuring Mitochondrial Function in Permeabilized Cells Using the Seahorse XF Analyzer or a Clark-Type Oxygen Electrode. Curr Protoc Toxicol 60, 25 22 21-16 (2014).

11. Calvo, S.E., Clauser, K.R. & Mootha, V.K. MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins. Nucleic Acids Res 44, D1251-1257 (2016).

12. Brozzi, A., Urbanelli, L., Germain, P.L., Magini, A. & Emiliani, C. hLGDB: a database of human lysosomal genes and their regulation. Database (Oxford) 2013, bat024 (2013).

13. Vafai, S.B. & Mootha, V.K. Mitochondrial disorders as windows into an ancient organelle. Nature 491, 374-383 (2012).

14. Klingenberg, M. Localization of the glycerol-phosphate dehydrogenase in the outer phase of the mitochondrial inner membrane. Eur J Biochem 13, 247-252 (1970).

15. Walker, J.E. The ATP synthase: the understood, the uncertain and the unknown. Biochem Soc Trans 41, 1-16 (2013).

16. Rath, S. et al. MitoCarta3.0: an updated mitochondrial proteome now with sub-organelle localization and pathway annotations. Nucleic Acids Res 49, D1541-D1547 (2021).

17. Pajot, P. & Claisse, M.L. Utilization by yeast of D-lactate and L-lactate as sources of energy in the presence of antimycin A. Eur J Biochem 49, 275-285 (1974).

18. Engqvist, M., Drincovich, M.F., Flugge, U.I. & Maurino, V.G. Two D-2-hydroxy-acid dehydrogenases in Arabidopsis thaliana with catalytic capacities to participate in the last reactions of the methylglyoxal and beta-oxidation pathways. J Biol Chem 284, 25026-25037 (2009).

19. Allaman, I., Belanger, M. & Magistretti, P.J. Methylglyoxal, the dark side of glycolysis. Front Neurosci 9, 23 (2015).

20. Monroe, G.R. et al. Identification of human D lactate dehydrogenase deficiency. Nat Commun 10, 1477 (2019).

21. Kwong, A.K. et al. Human d-lactate dehydrogenase deficiency by LDHD mutation in a patient with neurological manifestations and mitochondrial complex IV deficiency. JIMD Rep 60, 15-22 (2021).

22. Blinova, K. et al. Distribution of mitochondrial NADH fluorescence lifetimes: steady-state kinetics of matrix NADH interactions. Biochemistry 44, 2585-2594 (2005).

23. Qian, Y. et al. A genetically encoded near-infrared fluorescent calcium ion indicator. Nat Methods 16, 171-174 (2019).

***

[0213] Various modifications and variations of the described methods, pharmaceutical compositions, and kits of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it will be understood that it is capable of further modifications and that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the art are intended to be within the scope of the invention. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known customary practice within the art to which the invention pertains and may be applied to the essential features hereinbefore set forth.

Claims

CLAIMS What is claimed is:

1. A method of measuring organelle function in cells comprising: a) permeabilizing the plasma membranes of a population of cells with one or more agents that preserve cell integrity and functionally intact organelles, wherein the population of cells comprise one or more perturbations, with or without one or more barcodes identifying the one or more perturbations, optionally, wherein the population of cells comprise an organelle-specific genetically encoded functional reporter; b) optionally, labeling the population of cells with an organelle-specific functional probe when the population of cells does not comprise a genetically encoded functional reporter; c) treating the population of cells with one or more organelle-specific substrate/inhibitor combinations; d) measuring organelle function for the population of cells by detecting the organelle-specific functional probe or organelle-specific functional reporter; and e) sequencing the barcodes or sequences encoding the perturbations from the population of cells to correlate the one or more perturbations with an effect on the measured organelle function.

2. The method of claim 1, further comprising, prior to step e), sorting the cell population based on an activity level of the organelle-specific functional probe, thereby allowing for detection of perturbations enriched or depleted in the sorted cell population.

3. The method of claim 1 or 2, wherein the one or more perturbations are genetic, chemical, or biologic perturbations.

4. The method of claim 3, wherein the one or more genetic perturbations comprise CRISPR guide RNAs (gRNAs) from a library of CRISPR gRNAs, wherein the population of cells are configured to express a CRISPR enzyme, optionally, wherein the guide sequences are capable of being used as the barcodes.

5. The method of claim 4, wherein the library of CRISPR gRNAs comprises one or more guide sequences targeting one or more genes selected from the group consisting of genes encoding the mitochondrial proteome, the lysosomal proteome, and metabolic pathways.

6. The method of claim 3, wherein the one or more genetic perturbations comprise barcoded open reading frames (ORFs) from a library of barcoded ORFs, wherein each ORF may be identified by a unique barcode or by the sequence of the ORF. Direct sequencing of the ORF may be used to identify mutations when no barcode is used.

7. The method of claim 6, wherein the ORFs comprise variants of an enzyme.

8. The method of claim 6, wherein the ORFs comprise organelle specific ORFs.

9. The method of any of claims 1 to 8, wherein the organelle is a mitochondria or chloroplast and the organelle-specific functional probe is a membrane potential probe.

10. The method of claim 9, wherein the membrane potential probe is selected from the group consisting of tetramethylrhodamine methyl ester (TMRM), TMRE, JC-1, JC-10, MITO-ID, MitoTracker Red, CMXRos/Deep Red, Rhodamine 123, DiOC6, SPIRIT RhoVR, MitoView 405, MitoView 633, MitoView 650, and MitoView 720.

11. The method of any of claims 1 to 8, wherein the organelle is a mitochondria and the organelle-specific functional probe is a chemical probe for mitochondria abundance.

12. The method of claim 11, wherein the chemical probe for mitochondria abundance is selected from the group consisting of PK Mito Orange, MitoTracker Green, MitoTracker Deep Red, and MitoView Green.

13. The method of any of claims 1 to 8, wherein the organelle is a mitochondria and the organelle-specific functional probe is a chemical probe for mitochondria reactive oxygen species (ROS).

14. The method of claim 13, wherein the chemical probe for mitochondria ROS is selected from the group consisting of MitoSOX, MitoB, MitoPY1, and MitoPeDPP.

15. The method of any one of claims 1 to 8, wherein the organelle is a mitochondria or chloroplast and the organelle-specific functional probe is a calcium probe.

16. The method of any one of claims 1 to 8, wherein the organelle is a mitochondria or a chloroplast and the organelle-specific functional probe is an NADH probe.

17. The method of any one of claims 1 to 8, wherein the organelle is a lysosome and the organelle-specific functional probe is a fluorescent acidotropic probe.

18. The method of any of claims 1 to 8, wherein the organelle-specific functional genetic reporter is selected from the group consisting of genetic reporters of NADH, NADPH, ATP, calcium, pH, ROS (roGFP, HyPer), citrate (Citron, Citroff), and lactate (GEM-IL, eLACCO).

19. The method of any one of claims 1 to 18, wherein the one or more organelle-specific substrate/inhibitors is one or more mitochondria-specific substrate(s)/inhibitor(s).

20. The method of claim 19, wherein the one or more mitochondria-specific substrate(s)/inhibitor(s) are selected from the group consisting of glutamate, malate, succinate, piericidin A, coenzyme Q-linked substrates, glycerol-3-phosphate, ATP, ADP, D-lactate, antimycin A, ascorbate, N,N,N’,N’-tetramethyl-p-phenylenediamine (TMPD), oligomycin A, BAM15, and carbonyl cyanide m-chlorophenyl hydrazone (CCCP).

21. The method of any of claims 1 to 20, wherein the plasma membranes of the plurality of cells are permeabilized with Perfringolysin O.

22. The method of any of claims 1 to 20, wherein the plasma membranes of the plurality of cells are permeabilized with a cholesterol-specific detergent, such as digitonin or saponin, or lower concentrations of commonly used detergents such as Triton X-100.

23. The method of any of claims 1 to 22, wherein the cells further comprise a sample barcode that identifies a sample source for the population of cells, optionally wherein the sample barcode is introduced via a vector to all cells in a given sample, optionally wherein the sample comprises a specific perturbation.

24. The method of any of the preceding claims, wherein the probe or reporter is a fluorescent probe or reporter.
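
By way of non-limiting illustration only, the correlation of perturbations with measured organelle function recited in claim 1, step e), together with the enrichment or depletion detection of claim 2, could be sketched as a comparison of barcode read counts between a sorted cell fraction and an unsorted reference. The function name, the pseudocount, the barcode labels, and the read counts below are hypothetical assumptions introduced for this sketch and are not taken from the disclosure; the log2 ratio of read fractions is one common scoring choice, not necessarily the analysis used by the inventors.

```python
import math


def log2_enrichment(sorted_counts, reference_counts, pseudocount=1.0):
    """Score each barcode by log2(fraction in sorted bin / fraction in reference).

    Both arguments map barcode -> raw read count. A pseudocount (an assumed
    value of 1.0 by default) is added to every count so that barcodes absent
    from one sample do not produce a division by zero.
    """
    barcodes = set(sorted_counts) | set(reference_counts)
    n = len(barcodes)
    # Totals include the pseudocounts so the adjusted fractions sum to 1.
    sorted_total = sum(sorted_counts.values()) + pseudocount * n
    reference_total = sum(reference_counts.values()) + pseudocount * n

    scores = {}
    for bc in barcodes:
        s = (sorted_counts.get(bc, 0) + pseudocount) / sorted_total
        r = (reference_counts.get(bc, 0) + pseudocount) / reference_total
        scores[bc] = math.log2(s / r)
    return scores


# Hypothetical read counts for three guide barcodes: a sorted low-signal bin
# versus the unsorted population.
low_signal_bin = {"guide_A": 90, "guide_B": 10, "guide_C": 50}
unsorted_population = {"guide_A": 50, "guide_B": 50, "guide_C": 50}

scores = log2_enrichment(low_signal_bin, unsorted_population)
for barcode, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{barcode}\t{score:+.2f}")
```

In this sketch a positive score marks a barcode enriched in the sorted fraction and a negative score marks a depleted one; in practice such scores would typically be aggregated across multiple barcodes per perturbation and across replicates before any perturbation is called a hit.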