WO2024263826A1 - SARS-CoV-2 T cell vaccines - Google Patents

SARS-CoV-2 T cell vaccines

Info

Publication number
WO2024263826A1
WO2024263826A1 PCT/US2024/034888 US2024034888W WO2024263826A1 WO 2024263826 A1 WO2024263826 A1 WO 2024263826A1 US 2024034888 W US2024034888 W US 2024034888W WO 2024263826 A1 WO2024263826 A1 WO 2024263826A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
cov
sars
amino acid
full
Application number
PCT/US2024/034888
Other languages
French (fr)
Inventor
Edison ONG
Meghana PESHWA
Original Assignee
Modernatx, Inc.
Application filed by Modernatx, Inc.
Publication of WO2024263826A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P31/00 Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
    • A61P31/12 Antivirals
    • A61P31/14 Antivirals for RNA viruses
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K9/00 Medicinal preparations characterised by special physical form
    • A61K9/10 Dispersions; Emulsions
    • A61K9/127 Synthetic bilayered vehicles, e.g. liposomes or liposomes with cholesterol as the only non-phosphatidyl surfactant
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K2039/51 Medicinal preparations containing antigens or antibodies comprising whole cells, viruses or DNA/RNA
    • A61K2039/53 DNA (RNA) vaccination
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K2039/555 Medicinal preparations containing antigens or antibodies characterised by a specific combination antigen/adjuvant
    • A61K2039/55511 Organic adjuvants
    • A61K2039/55555 Liposomes; Vesicles, e.g. nanoparticles; Spheres, e.g. nanospheres; Polymers
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K39/12 Viral antigens
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2770/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssRNA viruses positive-sense
    • C12N2770/00011 Details
    • C12N2770/20011 Coronaviridae
    • C12N2770/20022 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2770/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssRNA viruses positive-sense
    • C12N2770/00011 Details
    • C12N2770/20011 Coronaviridae
    • C12N2770/20034 Use of virus or viral component as vaccine, e.g. live-attenuated or inactivated virus, VLP, viral protein

Definitions

  • BACKGROUND Human coronaviruses are highly contagious enveloped, positive-sense single-stranded RNA viruses of the Coronaviridae family. Two sub-families of Coronaviridae are known to cause human disease, the most important being the β-coronaviruses (betacoronaviruses).
  • the β-coronaviruses are common etiological agents of mild to moderate upper respiratory tract infections. Outbreaks of novel coronavirus infections, however, such as the infections caused by a coronavirus initially identified in the Chinese city of Wuhan in December 2019, have been associated with a high mortality rate.
  • SARS-CoV-2 Severe Acute Respiratory Syndrome Coronavirus 2
  • 2019-nCoV Severe Acute Respiratory Syndrome Coronavirus 2
  • WHO World Health Organization
  • COVID-19 Coronavirus Disease 2019
  • the first genome sequence of a SARS-CoV-2 isolate was released by investigators from the Chinese CDC in Beijing on January 10, 2020 at Virological, a UK-based discussion forum for analysis and interpretation of virus molecular evolution and epidemiology.
  • SARS-CoV-2 strain variants have been identified, some of which are more infectious than the SARS-CoV-2 isolate.
  • RBD receptor binding domain
  • NTD N-terminal domain
  • the entry of coronavirus into host cells is mediated by interaction between the RBD of the viral S protein and host angiotensin-converting enzyme 2 (ACE2).
  • ACE2 angiotensin-converting enzyme 2
  • NTD neutralization “supersite”
  • RBD e.g., K417N, E484K, and N501Y
  • NTD e.g., L18F, D80A, D215G, and Δ242-244
  • N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2.
  • Cell doi:10.1016/j.cell.2021.03.028 (2021). Since the identification of the above isolates, Omicron BA.4 and BA.5 sub-variants were detected in samples from South Africa in January 2022 and February 2022, respectively. While BA.4 and BA.5 are two different Omicron sub-variants, they are usually discussed together for vaccine/immunization purposes, as they encode identical spike proteins. Early data from South Africa and genetic and epidemic surveillance in several countries indicated that BA.4/BA.5 had a substantial growth advantage over other circulating SARS-CoV-2 strains.
  • BA.4/BA.5 This advantage was likely driven by new mutations in BA.4/BA.5 spike that provided increased escape from pre-existing immunity in the populations acquired either via natural infection or vaccinations.
  • ECDC European Centre for Disease Prevention and Control
  • UK Health Security Agency UK Health Security Agency
  • BA.4/BA.5 Variants of Concern (VOC) in May 2022.
  • BA.5 became the dominant variant in Portugal in May 2022, while BA.4/BA.5 became dominant in the USA, France, UK, and Germany in June 2022.
  • the recent emergence of SARS-CoV-2 variants (XBB.1.5 lineage, “Kraken”, and XBB.1.16 lineage, “Arcturus”) has raised concerns due to their increased rates of transmission and potential to circumvent immunity elicited by natural infection or vaccination.
  • the XBB.1.5 variant (“Kraken”) is derived from the BA.2 Omicron subvariant and has increased apparent transmissibility compared to ancestral SARS-CoV-2 strains.
  • the XBB.1.5 Spike protein includes 38 substitutions and 4 amino acid deletions, relative to the wild-type Spike protein sequence of the Wuhan-Hu-1 isolate. These substitutions include G252V and F486P.
  • the XBB.1.16 variant (“Arcturus”) is derived from the BA.2 Omicron subvariant and includes 39 substitutions and 4 deletions in the Spike protein, relative to the wild-type Spike protein sequence of the Wuhan-Hu-1 isolate. These substitutions include T478R, which increases viral infectivity.
  • SUMMARY A monovalent SARS-CoV-2 Spike (S) protein-encoding mRNA vaccine (developed by Moderna Therapeutics) has been demonstrated to be highly efficacious in prevention of symptomatic COVID-19 disease and severe disease.
  • T cell responses to SARS-CoV-2 antigens other than the S protein such as the Nucleocapsid (N), Matrix (M), and Non-structural protein 3 (Nsp3) play important roles in anti-betacoronavirus immunity, including reducing the severity and duration of infections.
  • SARS-CoV-2 variants e.g., the XBB.1.5 lineage, “Kraken”, and the XBB.1.16 lineage, “Arcturus”
  • Spike proteins that evade antibodies elicited by exposure to ancestral Spike proteins, such as through immunization or prior infection, but T cell epitopes in other proteins are more often conserved between variants and ancestral strains.
  • compositions comprising a lipid nanoparticle and a messenger RNA (mRNA) comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 N protein portion; a SARS-CoV-2 NSP3 protein portion; and a SARS-CoV-2 M protein portion comprising one or more transmembrane domains.
  • mRNA messenger RNA
  • the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein and a C-terminal domain of the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to the amino acids 43-87 of the full-length SARS-CoV-2 N protein.
  • the first and second N-terminal domain amino acid sequences are connected by a linker.
  • the linker is a glycine linker or a glycine-serine linker.
  • the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the SARS-CoV-2 N protein.
  • the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
  • the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
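The residue ranges listed above read as 1-based, inclusive positions in the full-length N protein. The following is a minimal sketch of how such ranges might be cut out and joined, not the applicant's construction method. The placeholder sequence, both function names, the segment order, the `GGGGS` linker, and the use of that linker at every junction are illustrative assumptions. The placeholder is not SEQ ID NO: 84, and the result is not SEQ ID NO: 91 or 92.

```python
def residues(full_length: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(full_length)):
        raise ValueError(
            f"range {start}-{end} is outside a sequence of length {len(full_length)}"
        )
    return full_length[start - 1:end]


def assemble_portion(full_length: str, ranges, linker: str) -> str:
    """Join the given 1-based inclusive ranges, in the order supplied, with a linker."""
    return linker.join(residues(full_length, s, e) for s, e in ranges)


# Placeholder sequence (NOT SEQ ID NO: 84); it exists only to make the sketch runnable.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
placeholder_n = "".join(ALPHABET[i % 20] for i in range(419))

portion = assemble_portion(
    placeholder_n,
    ranges=[(104, 143), (43, 87), (213, 366)],  # segment order is an assumption
    linker="GGGGS",  # hypothetical glycine-serine linker, assumed at every junction
)
print(len(portion))  # 40 + 45 + 154 residues plus two 5-residue linkers
```

Keeping the ranges as data rather than hard-coded slices makes it easy to try a different segment order or linker without touching the slicing logic.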
  • the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
  • the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
  • the CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein.
  • one or more junctional epitopes present in a concatenated amino acid sequence consisting of two or more CD8+ T cell epitopes are not present in the SARS-CoV-2 NSP3 protein portion.
  • the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85.
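Concatenating epitopes creates new peptides that straddle each boundary, and reordering the epitopes changes which of these junction-spanning peptides exist. The sketch below illustrates that idea only; the text above does not say how junctional epitopes are identified. The window length `k=9`, the toy epitopes, the `flagged` set (standing in for the output of some epitope predictor), and both function names are assumptions.

```python
from itertools import permutations


def junction_kmers(epitopes, k=9):
    """Return the set of k-mers in the concatenation that span an epitope boundary."""
    joined = "".join(epitopes)
    spanning = set()
    boundary = 0
    for epitope in epitopes[:-1]:
        boundary += len(epitope)  # index of the first residue of the next epitope
        # A k-mer starting at i spans the boundary when
        # i <= boundary - 1 and i + k - 1 >= boundary.
        first = max(0, boundary - k + 1)
        last = min(boundary, len(joined) - k + 1)
        for i in range(first, last):
            spanning.add(joined[i:i + k])
    return spanning


def best_order(epitopes, flagged, k=9):
    """Pick the ordering whose junction-spanning k-mers hit the fewest flagged peptides."""
    return min(
        permutations(epitopes),
        key=lambda order: len(junction_kmers(order, k) & flagged),
    )


# Toy data: homopolymer "epitopes" and one hypothetical flagged junction peptide.
toy_epitopes = ["AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD"]
flagged = {"AAAAACCCC"}

print(best_order(toy_epitopes, flagged))
```

Exhaustive permutation is only practical for a handful of epitopes; with 18 or more, a heuristic search would be needed, which this sketch does not attempt.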
  • the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the β-sheet domain is connected to one or more transmembrane domains by a linker.
  • the linker is a glycine or glycine-serine linker.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
  • the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
  • two or more of the N protein portion, the NSP3 protein portion, and the M protein portion are separated by a linker.
  • the N protein portion and the NSP3 protein portion are separated by a first linker, and/or the NSP3 protein portion and the M protein portion are separated by a second linker.
  • the N protein portion and the M protein portion are separated by a first linker, and/or the M protein portion and the NSP3 protein portion are separated by a second linker.
  • the M protein portion and the N protein portion are separated by a first linker, and/or the N protein portion and the NSP3 protein portion are separated by a second linker.
  • each of the first and second linkers is a glycine or glycine-serine linker.
  • each of the first and second linkers comprises the amino acid sequence AAY.
  • the SARS-CoV-2 chimeric protein further comprises a signal peptide.
  • the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide.
  • HA hemagglutinin
  • Some aspects relate to a composition comprising a lipid nanoparticle and an mRNA comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 S protein portion; a SARS-CoV-2 N protein portion; and a transmembrane portion comprising a transmembrane domain.
  • the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a SARS-CoV-2 N protein and a C-terminal domain of the SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of a full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein.
  • the first and second N-terminal domain amino acid sequences are connected by a linker.
  • the linker is a glycine or glycine-serine linker.
  • the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the full-length SARS-CoV-2 N protein.
  • the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
  • the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
  • the transmembrane portion comprises an influenza HA transmembrane domain.
  • the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the β-sheet domain is connected to one or more transmembrane domains by a linker.
  • the linker is a glycine or a glycine-serine linker.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
  • the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
  • the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed.
  • the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker.
  • the linker is a glycine linker or a glycine-serine linker.
  • the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein.
  • the NTD corresponds to amino acids 1-290 of the full-length SARS-CoV-2 S protein.
  • the RBD corresponds to amino acids 316-517 of the full-length SARS-CoV-2 S protein.
  • the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein.
  • the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein.
  • the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87.
  • the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98.
  • two or more of the S protein portion, the N protein portion, and the transmembrane portion are separated by a linker.
  • the S protein portion and the N protein portion are separated by a first linker and/or the N protein portion and the transmembrane portion are separated by a second linker.
  • the S protein portion and the transmembrane portion are separated by a first linker, and/or the transmembrane portion and the N protein portion are separated by a second linker.
  • each of the first and second linkers is a glycine or a glycine-serine linker.
  • each of the first and second linkers comprises the amino acid sequence AAY.
  • compositions comprising a lipid nanoparticle and a messenger ribonucleic acid comprising an open reading frame (ORF) encoding a protein comprising an amino acid sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183.
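The identity thresholds above (90% through 100%) presuppose some way of scoring two sequences against each other. As a rough illustration, percent identity can be computed from a pairwise alignment that has already been produced by an alignment tool. The denominator convention used here (all alignment columns, gaps included), the function names, and the short example sequences are assumptions; the excerpt above does not define the calculation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Gaps are written as '-'. A column counts as a match only when both
    sequences carry the same residue. Columns that are gaps in both
    sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


THRESHOLDS = (90, 95, 97, 98, 99, 100)  # the identity levels recited above


def thresholds_met(identity: float):
    """Return the recited thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Toy peptides, not sequences from the sequence listing.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
print(identity, thresholds_met(identity))
```

Other conventions divide by the length of the shorter sequence or by the reference length, and they can give different values for the same alignment, so the convention should be fixed before comparing against a threshold.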
  • the mRNA comprises a 5′ untranslated region (UTR), wherein the 5′ UTR comprises a nucleotide sequence with at least 90% sequence identity to a nucleotide sequence selected from SEQ ID NOs: 1, 2, 5–35, 66, 70–72, 75, 76, and 81.
  • the 5′ UTR comprises a nucleotide sequence selected from SEQ ID NOs: 1, 2, 5–35, 66, 70–72, 75, 76, and 81.
  • the mRNA comprises a 3′ untranslated region (UTR), wherein the 3′ UTR comprises a nucleotide sequence with at least 90% sequence identity to a nucleotide sequence selected from SEQ ID NOs: 3–4, 36–44, 68, 69, 73, 74, 77–79, and 82.
  • the 3′ UTR comprises a nucleotide sequence selected from SEQ ID NOs: 3–4, 36–44, 68, 69, 73, 74, 77–79, and 82.
  • the mRNA comprises one or more stop codons immediately downstream from the open reading frame.
  • the one or more stop codons comprise the nucleotide sequence UGAUGA.
  • the one or more stop codons comprise the nucleotide sequence UGAUAAUAG.
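Both recited sequences decompose into in-frame stop codons: UGAUGA is UGA followed by UGA, and UGAUAAUAG is UGA, UAA, UAG. A small sketch of reading consecutive stop codons from the sequence immediately downstream of an open reading frame follows; the function name and the trailing `GCUAGC` bases in the example are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three standard RNA stop codons


def leading_stop_codons(downstream: str):
    """Return the run of in-frame stop codons at the start of a sequence.

    `downstream` is the RNA immediately after the last sense codon of the
    open reading frame, so reading it in steps of three stays in frame.
    """
    found = []
    for i in range(0, len(downstream) - 2, 3):
        codon = downstream[i:i + 3]
        if codon not in STOP_CODONS:
            break
        found.append(codon)
    return found


print(leading_stop_codons("UGAUGA"))
print(leading_stop_codons("UGAUAAUAG" + "GCUAGC"))  # trailing bases are made up
```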
  • the mRNA comprises a polyadenosine (polyA) sequence comprising 20 or more consecutive adenosine nucleotides.
  • the polyA sequence comprises 100 consecutive adenosine nucleotides. In some embodiments, the polyA sequence comprises, in 5′-to-3′ order, a first nucleotide sequence comprising 30 consecutive adenosine nucleotides, an intervening sequence comprising no more than three adenosine nucleotides, and a second nucleotide sequence comprising 70 consecutive adenosine nucleotides. In some embodiments, the polyA sequence comprises the nucleotide sequence of SEQ ID NO: 80. In some embodiments, the mRNA further comprises a polycytidine (polyC) sequence comprising 20 or more consecutive cytidine nucleotides.
  • polyC polycytidine
  • the polyC sequence comprises 30 consecutive cytidine nucleotides. In some embodiments, the polyC sequence is downstream from the polyA sequence, wherein the polyA sequence comprises 64 consecutive adenosine nucleotides. In some embodiments, the polyA sequence comprises 109 consecutive adenosine nucleotides. In some embodiments, the mRNA comprises a 5’ cap analog. In some embodiments, the 5’ cap analog comprises a 7mG(5’)ppp(5’)NlmpNp cap.
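The segmented polyA layout described above (30 adenosines, an intervening sequence with no more than three adenosines, then 70 adenosines) can be written as a small builder that enforces the stated limit. This is an illustrative sketch: the function name and the example intervening sequence are assumptions, and the output is not asserted to be SEQ ID NO: 80.

```python
def segmented_poly_a(intervening: str, first: int = 30, second: int = 70) -> str:
    """Build first x A, an intervening sequence, then second x A (5'-to-3')."""
    if set(intervening) - set("ACGU"):
        raise ValueError("intervening sequence must contain only A, C, G, U")
    if intervening.count("A") > 3:
        raise ValueError("intervening sequence may contain at most three adenosines")
    return "A" * first + intervening + "A" * second


# Hypothetical 10-nucleotide intervening sequence containing three adenosines.
tail = segmented_poly_a("GCAUAUGACU")
print(len(tail), tail.count("A"))
```

Note that the check counts adenosines anywhere in the intervening sequence, which is one reading of "no more than three adenosine nucleotides"; a stricter reading limiting consecutive adenosines would need a different test.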
  • the lipid nanoparticle comprises 40-55 mol% ionizable amino lipid, 30-45 mol% sterol, 5-15 mol% neutral lipid, and 1-5 mol% PEG-modified lipid.
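The four molar ranges above can be checked mechanically for a candidate formulation. The sketch below does so under two assumptions not stated in the text: that the four lipid classes are the only lipid components, so their mol% values should sum to 100, and that range endpoints are inclusive. The dictionary keys, the function name, and the example values are illustrative.

```python
# Ranges in mol%, taken from the composition recited above.
RANGES = {
    "ionizable_amino_lipid": (40, 55),
    "sterol": (30, 45),
    "neutral_lipid": (5, 15),
    "peg_modified_lipid": (1, 5),
}


def check_formulation(mol_percent: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = mol_percent.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} mol% is outside {low}-{high} mol%")
    total = sum(mol_percent.values())
    if abs(total - 100.0) > 1e-6:  # assumes these four components are the whole lipid mix
        problems.append(f"components sum to {total} mol%, expected 100")
    return problems


# Illustrative values only; not asserted to be a formulation from this publication.
example = {
    "ionizable_amino_lipid": 50.0,
    "sterol": 38.5,
    "neutral_lipid": 10.0,
    "peg_modified_lipid": 1.5,
}
print(check_formulation(example))
```

Returning a list of problems rather than raising on the first failure lets a caller report every out-of-range component at once.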
  • the ionizable amino lipid comprises a compound of Formula (I): R1 is R”M’R’ or C5-20 alkenyl; R2 and R3 are each independently selected from C1-14 alkyl and C2-14 alkenyl; R4 is (CH2)nQ, wherein Q is OH and n is selected from 3, 4, and 5; M and M’ are each independently -OC(O)- or -C(O)O-; R5, R6, and R7 are each H; R’ is a linear C1-12 alkyl, or C1-12 alkyl substituted with C6-9 alkyl; R” is C3-14 alkyl; m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13.
  • the ionizable amino lipid comprises Compound 1: (Compound 1). In some embodiments, the ionizable amino lipid comprises a compound of the structure A7: (A7). In some embodiments, the neutral lipid is 1,2 distearoyl-sn-glycero-3-phosphocholine (DSPC). In some embodiments, the sterol is cholesterol. In some embodiments, the PEG-modified lipid is PEG2000-DMG. In some embodiments, the open reading frame comprises one or more chemically modified nucleotides. In some embodiments, the open reading frame comprises N1-methylpseudouridine.
  • At least 80% of uracil nucleotides in the open reading frame comprise N1-methylpseudouridine. In some embodiments, 100% of uracil nucleotides in the open reading frame comprise N1-methylpseudouridine. In some embodiments, the open reading frame comprises 5-methylcytidine. In some embodiments, at least 80% of cytosine nucleotides in the open reading frame comprise 5-methylcytidine. In some embodiments, 100% of cytosine nucleotides in the open reading frame comprise 5-methylcytidine. In some embodiments, the open reading frame comprises 5-methyluridine.
  • At least 80% of uracil nucleotides in the open reading frame comprise 5-methyluridine. In some embodiments, 100% of uracil nucleotides in the open reading frame comprise 5-methyluridine.
  • a pharmaceutical composition comprising an mRNA and a pharmaceutically acceptable excipient. Some aspects relate to a method comprising administering to a subject a composition. In some embodiments, the composition is administered intramuscularly. In some embodiments, the composition is effective to induce, in the subject, CD4+ and/or CD8+ T cells specific to one or more epitopes of the protein. In some embodiments, the method comprises administering a first dose and a second dose of the composition.
  • SARS-CoV-2 chimeric protein comprising a SARS-CoV-2 N protein portion; a SARS-CoV-2 NSP3 protein portion; and a SARS-CoV-2 M protein portion comprising one or more transmembrane domains.
  • the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein and a C-terminal domain of the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to the amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker.
  • the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the SARS-CoV-2 N protein.
  • the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
  • the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
  • the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
  • the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
  • the CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length wild-type SARS-CoV-2 NSP3 protein.
  • one or more junctional epitopes present in a concatenated amino acid sequence consisting of two or more CD8+ T cell epitopes are not present in the SARS-CoV-2 NSP3 protein portion.
  • the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length wild-type SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the β-sheet domain is connected to one or more transmembrane domains by a linker.
  • the linker is a glycine or glycine-serine linker.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
  • the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
  • two or more of the N protein portion, the NSP3 protein portion, and the M protein portion are separated by a linker.
  • the N protein portion and the NSP3 protein portion are separated by a first linker, and/or the NSP3 protein portion and the M protein portion are separated by a second linker.
  • the N protein portion and the M protein portion are separated by a first linker, and/or the M protein portion and the NSP3 protein portion are separated by a second linker.
  • the M protein portion and the N protein portion are separated by a first linker, and/or the N protein portion and the NSP3 protein portion are separated by a second linker.
  • each of the first and second linkers is a glycine or glycine-serine linker.
  • each of the first and second linkers comprises the amino acid sequence AAY.
  • the SARS-CoV-2 chimeric protein further comprises a signal peptide.
  • the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide.
  • HA hemagglutinin
  • SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 S protein portion; a SARS-CoV-2 N protein portion; and a transmembrane portion comprising a transmembrane domain.
  • the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a SARS-CoV-2 N protein and a C-terminal domain of the SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of a full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein.
  • the first and second N-terminal domain amino acid sequences are connected by a linker.
  • the linker is a glycine or glycine-serine linker.
  • the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the full-length wild-type SARS-CoV-2 N protein.
  • the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
  • the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
  • the transmembrane portion comprises an influenza HA transmembrane domain.
  • the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length wild-type SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the β-sheet domain is connected to one or more transmembrane domains by a linker.
  • the linker is a glycine or a glycine-serine linker.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
  • the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
  • the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed.
  • the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker.
  • the linker is a glycine linker or a glycine-serine linker.
  • the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein.
  • the NTD corresponds to amino acids 1-290 of the full-length SARS-CoV-2 S protein.
  • the RBD corresponds to amino acids 316-517 of the full-length SARS-CoV-2 S protein.
  • the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein.
  • the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein.
  • the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87.
  • the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98.
  • two or more of the S protein portion, the N protein portion, and the transmembrane portion are separated by a linker.
  • the S protein portion and the N protein portion are separated by a first linker and/or the N protein portion and the transmembrane portion are separated by a second linker.
  • the S protein portion and the transmembrane portion are separated by a first linker, and/or the transmembrane portion and the N protein portion are separated by a second linker.
  • each of the first and second linkers is a glycine or a glycine-serine linker.
  • each of the first and second linkers comprises the amino acid sequence AAY.
  • a SARS-CoV-2 chimeric protein comprising an amino acid sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183.
  • an mRNA comprises an ORF encoding any one of the proteins.
  • an mRNA comprises an ORF encoding the SARS-CoV-2 chimeric protein.
  • the mRNA comprises a chemical modification.
  • 100% of the uracil nucleotides of the mRNA comprise a chemical modification.
  • 100% of the uracil nucleotides of the mRNA comprise N1-methylpseudouridine.
  • Some aspects relate to a composition comprising a self-amplifying RNA encoding a SARS-CoV-2 chimeric protein comprising a SARS-CoV-2 N protein portion; a SARS-CoV-2 NSP3 protein portion; and a SARS-CoV-2 M protein portion comprising one or more transmembrane domains.
  • the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein and a C-terminal domain of the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to the amino acids 43-87 of the full-length SARS-CoV-2 N protein.
  • the first and second N-terminal domain amino acid sequences are connected by a linker.
  • the linker is a glycine linker or a glycine-serine linker.
  • the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the SARS-CoV-2 N protein.
  • the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
  • the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
  • the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein.
  • one or more junctional epitopes present in a concatenated amino acid sequence consisting of two or more CD8+ T cell epitopes are not present in the SARS-CoV-2 NSP3 protein portion.
  • the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85.
  • the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93.
  • the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the β-sheet domain is connected to one or more transmembrane domains by a linker.
  • the linker is a glycine or glycine-serine linker.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
  • the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
  • two or more of the N protein portion, the NSP3 protein portion, and the M protein portion are separated by a linker.
  • the N protein portion and the NSP3 protein portion are separated by a first linker, and/or the NSP3 protein portion and the M protein portion are separated by a second linker.
  • the N protein portion and the M protein portion are separated by a first linker, and/or the M protein portion and the NSP3 protein portion are separated by a second linker.
  • the M protein portion and the N protein portion are separated by a first linker, and/or the N protein portion and the NSP3 protein portion are separated by a second linker.
  • each of the first and second linkers is a glycine or glycine-serine linker.
  • each of the first and second linkers comprises the amino acid sequence AAY.
  • the SARS-CoV-2 chimeric protein further comprises a signal peptide.
  • the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide.
  • Some aspects relate to a composition comprising a self-amplifying RNA encoding a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 S protein portion; a SARS-CoV-2 N protein portion; and a transmembrane portion comprising a transmembrane domain.
  • the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a SARS-CoV-2 N protein and a C-terminal domain of the SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of a full-length SARS-CoV-2 N protein.
  • the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein.
  • the first and second N-terminal domain amino acid sequences are connected by a linker.
  • the linker is a glycine or glycine-serine linker.
  • the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the full-length SARS-CoV-2 N protein.
  • the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
  • the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
  • the transmembrane portion comprises an influenza HA transmembrane domain.
  • the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
  • the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the β-sheet domain is connected to one or more transmembrane domains by a linker.
  • the linker is a glycine or a glycine-serine linker.
  • the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
  • the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
  • the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
  • the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed.
  • the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker.
  • the linker is a glycine linker or a glycine-serine linker.
  • the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein.
  • the NTD corresponds to amino acids 1-290 of the full-length SARS-CoV-2 S protein.
  • the RBD corresponds to amino acids 316-517 of the full-length SARS-CoV-2 S protein.
  • the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein.
  • the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein.
  • the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87.
  • the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98.
  • two or more of the S protein portion, the N protein portion, and the transmembrane portion are separated by a linker.
  • the S protein portion and the N protein portion are separated by a first linker and/or the N protein portion and the transmembrane portion are separated by a second linker.
  • the S protein portion and the transmembrane portion are separated by a first linker, and/or the transmembrane portion and the N protein portion are separated by a second linker.
  • each of the first and second linkers is a glycine or a glycine-serine linker.
  • each of the first and second linkers comprises the amino acid sequence AAY.
  • FIG.1 is a schematic of T cell composite vaccine designs. N/M/Nsp3 composite vaccine designs are shown on top, either with or without a signal peptide (SP), with two possible compositions of nucleocapsid (N) protein sequences, six non-structural protein 3 (Nsp3) epitopes, and either a full-length or truncated membrane (M) protein sequence. Segments of this composite are either not linked or are linked with an AAY linker (cleavable) or a GGSGG (SEQ ID NO: 99) linker (non-cleavable).
  • NTD-RBD/N/M composite vaccine designs are shown on the bottom, with a spike (S) protein N terminal domain (NTD) and receptor binding domain (RBD), two possible compositions of N protein sequences, and three possible compositions of M protein sequences (HA-TM, full-length M protein, or truncated M protein). Segments of this composite are linked with a GGSGG (SEQ ID NO: 99) linker.
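The N/M/Nsp3 design options described for FIG.1 can be laid out as a small combinatorial space. The sketch below is illustrative only: it assumes the four choices (signal peptide, N composition, M composition, linker) combine freely, which the figure description does not state explicitly, and the option labels are hypothetical names rather than identifiers from the source.

```python
from itertools import product

# Option labels are illustrative placeholders, not identifiers from the source.
SIGNAL_PEPTIDE = ("with_SP", "without_SP")
N_COMPOSITION = ("N_design_1", "N_design_2")
M_COMPOSITION = ("M_full_length", "M_truncated")
LINKER = ("none", "AAY", "GGSGG")  # unlinked, cleavable, non-cleavable


def enumerate_designs():
    """Return every combination of the four design choices as a dict.

    Assumption: the choices are independent of one another.
    """
    keys = ("signal_peptide", "n_protein", "m_protein", "linker")
    options = product(SIGNAL_PEPTIDE, N_COMPOSITION, M_COMPOSITION, LINKER)
    return [dict(zip(keys, combo)) for combo in options]


designs = enumerate_designs()
print(len(designs))  # 2 * 2 * 2 * 3 combinations
```

Under the independence assumption the enumeration yields 24 combinations; whether every combination corresponds to a disclosed construct is not something this sketch establishes.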
  • FIG.2 represents the prediction methods underlying the design of the N protein antigens used in the vaccine composites shown in FIG.1. Shown in the bottom line graph is the N protein sequence with areas of immunogenicity; the top line graph illustrates the conservation of the N protein sequence across sarbecoviruses.
  • FIG.3A-3C relate to the sequence and structure of the N protein.
  • FIG.3A is a schematic of the regions of the N protein, including the N-terminal arm (N-arm); the N-terminal domain (NTD), which contains the RNA-binding sequence; the linker region (LKR) containing serine and arginine-rich (SR) motifs; the C-terminal domain (CTD), which contains a sequence associated with RNA-binding and oligomerization; and the C-terminal tail (C-tail).
  • FIG.3B shows the protein structures of the RNA-binding residues of the N protein.
  • FIG.3C iterates the designs of the N protein sequences used in the vaccine compositions.
  • the design on the top contains residues 104-143 and 213-366 of the N protein, while the design on the bottom also includes residues 43-87 and a linker, preserving most of the NTD domain.
  • FIG.4 represents the prediction methods underlying the design of the M protein antigens used in the vaccine composites shown in FIG.1. Shown in the bottom line graph is the M protein sequence with areas of immunogenicity; the top line graph illustrates the conservation of the M protein sequence across sarbecoviruses.
  • FIG.5 iterates designs of M protein sequences used in vaccine compositions. The design on the top is the full-length M protein, while the design on the bottom is a truncated M protein containing the β-sheet domain and residues 6-104.
  • FIG.6A-6B illustrate the junctional epitopes that arise from the concatenation of Nsp3 epitopes in two different configurations.
  • FIG.6A includes a schematic of the concatenation of Nsp3 epitopes in the order of epitopes 1-6 (amino acid sequences for each epitope are shown below the epitope names) and the resulting junctional epitopes upon various epitope pairings.
  • the world population coverage of HLAs that are capable of presenting the antigens of the junctional epitopes is shown as a percent in the “World Pop. Cov.” columns.
  • the bolded and underlined junctional epitope that results from the combination of Nsp3-E3 and Nsp3-E4 is similarly matched to human proteins.
  • FIG.6B shows the minimization of junctional epitopes that result from a concatenation of Nsp3 epitopes in a different configuration: Nsp3-E6---Nsp3-E2---Nsp3-E1---Nsp3-E4---Nsp3-E5---Nsp3-E3.
  • Junctional epitopes result from three of these pairings, the sequences for which are shown in “MHC-I Junctional Epitopes.”
  • the world population coverage of HLAs that are capable of presenting the resultant antigens is shown in the “World Pop. Cov.” column.
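The comparison between FIG.6A and FIG.6B, in which reordering the Nsp3 epitopes reduces the number of junctional epitopes, can be sketched as a search over orderings. This is a minimal illustration under stated assumptions: the epitope strings are placeholders, and the `flagged` set is a hypothetical stand-in for the output of an MHC-I binding predictor, which is not named in this passage.

```python
from itertools import permutations


def junction_kmers(left, right, k=9):
    """Return the k-mers that span the left/right boundary.

    A window spans the boundary when it contains at least one residue
    from each side. Simplification: only adjacent pairs are examined,
    so each segment should be at least k - 1 residues long.
    """
    joined = left + right
    boundary = len(left)
    kmers = []
    for start in range(max(0, boundary - k + 1), boundary):
        end = start + k
        if end <= len(joined):
            kmers.append(joined[start:end])
    return kmers


def count_flagged(order, flagged, k=9):
    """Count junction-spanning k-mers of an ordering that appear in `flagged`."""
    return sum(kmer in flagged
               for left, right in zip(order, order[1:])
               for kmer in junction_kmers(left, right, k))


def best_order(epitopes, flagged, k=9):
    """Exhaustively pick an ordering with the fewest flagged junction k-mers."""
    return min(permutations(epitopes),
               key=lambda order: count_flagged(order, flagged, k))


# Placeholder data (hypothetical): short strings and k=3 keep the example small.
epitopes = ["AAAA", "CCCC", "GGGG"]
flagged = {"AAC"}  # pretend a predictor flagged this junction peptide
best = best_order(epitopes, flagged, k=3)
print(best, count_flagged(best, flagged, k=3))
```

Exhaustive search is practical for six epitopes (720 orderings); a larger epitope set would need a heuristic instead.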
  • SARS-CoV-2
  • The genome of SARS-CoV-2 is a single-stranded positive-sense RNA (+ssRNA) with a size of 29.8–30 kb encoding about 9860 amino acids (Chan et al. 2000, supra; Kim et al. 2020 Cell, May 14; 181(4):914-921.e10).
  • the SARS-CoV-2 genome is organized into specific genes encoding structural proteins and nonstructural proteins (Nsps).
  • the order of the structural proteins in the genome is 5′-replicase (open reading frame (ORF)1/ab)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]-3′.
  • the genome of coronaviruses includes a variable number of open reading frames that encode accessory proteins, nonstructural proteins, and structural proteins (Song et al. 2019 Viruses; 11(1):p.59). Most of the antigenic peptides are located in the structural proteins (Cui et al. 2019 Nat. Rev. Microbiol., 17(3):181–192).
  • Spike surface glycoprotein (S), a small envelope protein (E), matrix protein (M), and nucleocapsid protein (N) are four main structural proteins. Since S-protein contributes to cell tropism and virus entry and also induces neutralizing antibodies (NAb) and protective immunity, it can be considered one of the most important targets in coronavirus vaccine development among all other structural proteins. Moreover, amino acid sequence analysis has shown that S-protein contains conserved regions among the coronaviruses, which may be the basis for universal vaccine development.
  • compositions comprising nucleic acids (e.g., mRNAs) encoding proteins of interest, e.g., a protein derived from one or more betacoronavirus proteins such as a SARS-CoV-2 nucleocapsid (N), matrix (M), non-structural protein 3 (nsp3), and/or spike (S) protein.
  • nucleic acids in particular mRNA(s)
  • appropriate carriers or delivery vehicles e.g., lipid nanoparticles
  • nucleic acid upon administration to cells, tissues or subjects, nucleic acid is taken up by cells which, in turn, express protein(s) encoded by the nucleic acids, e.g., mRNAs.
  • Antigens, as used herein, are proteins capable of inducing an immune response (e.g., causing an immune system to produce antibodies against the antigens).
  • the vaccines provide a unique advantage over traditional protein-based vaccination approaches, in which protein antigens are purified or produced in vitro, e.g., recombinant protein production technologies.
  • the vaccines feature RNA (e.g., mRNA) encoding the desired antigens, which when introduced into the body, i.e., administered to a mammalian subject (for example a human) in vivo, cause the cells of the body to express the desired antigens.
  • the mRNAs are encapsulated in lipid nanoparticles (LNPs).
  • Upon delivery and uptake by cells of the body, the mRNAs are translated in the cytosol and protein antigens are generated by the host cell machinery.
  • the protein antigens are presented and elicit an adaptive humoral and cellular immune response.
  • Neutralizing antibodies are directed against the expressed protein antigens and hence the protein antigens are considered relevant target antigens for vaccine development.
  • The term “antigen” encompasses immunogenic proteins and immunogenic fragments (an immunogenic fragment that induces (or is capable of inducing) an immune response to a (at least one) SARS-CoV-2 variant), unless otherwise stated.
  • The term “protein” encompasses peptides and the term “antigen” encompasses antigenic fragments.
  • Other molecules may be antigenic, such as bacterial polysaccharides or combinations of protein and polysaccharide structures, but for the viral vaccines included herein, viral proteins, fragments of viral proteins, and designed and/or mutated proteins derived from SARS-CoV-2 are the antigens.
  • viral proteins have a quaternary or three-dimensional structure, which consists of more than one polypeptide or several polypeptide chains that associate into an oligomeric molecule.
  • The term “subunit” refers to a single protein molecule, for example, a polypeptide or polypeptide chain resulting from processing of a nascent protein molecule, which subunit assembles (or “coassembles”) with other protein molecules (e.g., subunits or chains) to form a protein complex.
  • Proteins can have a relatively small number of subunits and therefore be described as “oligomeric” or can consist of a large number of subunits and therefore be described as “multimeric”.
  • the subunits of an oligomeric or multimeric protein may be identical, homologous or totally dissimilar and dedicated to disparate tasks. Proteins or protein subunits can further comprise domains.
  • The term “domain” refers to a distinct functional and/or structural unit within a protein. Typically, a “domain” is responsible for a particular function or interaction, contributing to the overall role of a protein. Domains can exist in a variety of biological contexts. Similar domains (i.e., domains sharing structural, functional and/or sequence homology) can exist within a single protein or can exist within distinct proteins having similar or different functions. A protein domain is often a conserved part of a given protein tertiary structure or sequence that can function and exist independently of the rest of the protein or subunit thereof.
  • The term “antigen” is distinct from the term “epitope”, which is a substructure of an antigen, e.g., a polypeptide, such as 7-10 amino acids, or carbohydrate structure, which may be recognized by an antigen binding site.
  • protein antigens that are delivered to subjects or immune cells in isolated form, e.g., isolated protein, polypeptide or peptide antigens, however, the design, testing, validation, and production of protein antigens can be costly and time-consuming, especially when producing proteins at large scale.
  • mRNA technology is amenable to rapid design and testing of mRNA constructs encoding a variety of antigens.
  • rapid production of mRNA coupled with inclusion in appropriate delivery vehicles (e.g., lipid nanoparticles) can proceed quickly and can rapidly produce mRNA vaccines at large scale.
  • Potential benefit also arises from the fact that antigens encoded by the mRNAs are expressed by the cells of the subject, e.g., are expressed by the human body, and thus the subject, e.g., the human body, serves as the “factory” to produce the antigens which, in turn, elicit the desired immune response.
  • the compositions may include an RNA or multiple RNAs encoding two or more antigens of the same or different viral strains.
  • Vaccines may be combination vaccines that include RNA encoding one or more coronavirus antigens and one or more antigen(s) of a different organism.
  • the vaccines may be combination vaccines that target one or more antigens of the same strain/species, or one or more antigens of different strains/species, e.g., antigens which induce immunity to organisms which are found in the same geographic areas where the risk of coronavirus infection is high or organisms to which an individual is likely to be exposed when exposed to a coronavirus (e.g., COVID-19).
  • the second or subsequent circulating SARS-CoV-2 antigen is an immunodominant antigen from an emerging strain.
  • An immunodominant antigen of an emerging strain is assessed with respect to the strain from which the antigen is derived, relative to a different strain of the virus, such as the original strain or other variant thereof.
  • An immunodominant antigen of the emerging strain induces a stronger immune response against the emerging strain than against the different strain.
  • an immunodominant antigen of the emerging strain is more infective than a different strain of the virus, such as the original strain or other variant thereof.
  • the nucleocapsid of the SARS-CoV-2 virus plays an essential role in its replication and assembly and is highly conserved among SARS-CoV-2 variants. It contains the N protein, which protein forms oligomers and assembles around the viral RNA in a helical arrangement, resulting in the formation of the viral ribonucleoprotein (vRNP) complex. This complex is essential for protecting the viral genome and facilitating its packaging into new viral particles during viral assembly.
  • the N protein includes two domains: the N-terminal domain (NTD) and the C-terminal domain (CTD), connected by a linking region.
  • an mRNA encodes a protein comprising a SARS-CoV-2 nucleocapsid (N) protein portion.
  • a protein comprises a SARS-CoV-2 nucleocapsid (N) protein portion.
  • N protein portions may contain a truncation or deletion of one or more regions, relative to a full-length or naturally occurring N protein, that are sparse in CD4+ and/or CD8+ T cell epitopes, thereby increasing the epitope density of the N protein portion compared to a full-length N protein.
  • an N protein portion comprises a higher density of CD4+ T cell epitopes than a wild-type SARS-CoV-2 N protein.
  • an N protein portion comprises a higher density of CD8+ T cell epitopes than a wild-type SARS-CoV-2 N protein.
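The effect of removing epitope-sparse regions on epitope density can be illustrated numerically. The sketch below assumes one possible definition of density (the fraction of retained residues covered by at least one epitope); the source does not define the term here, and the residue ranges are hypothetical placeholders, not mapped epitopes.

```python
def positions(ranges):
    """Expand 1-indexed, inclusive (start, end) ranges into a set of positions."""
    return {p for start, end in ranges for p in range(start, end + 1)}


def epitope_density(kept_ranges, epitope_ranges):
    """Fraction of retained residues covered by at least one epitope.

    Assumed definition for illustration; an epitope partly removed by a
    truncation still contributes its remaining residues.
    """
    kept = positions(kept_ranges)
    covered = positions(epitope_ranges)
    return len(kept & covered) / len(kept)


# Hypothetical 100-residue protein with three epitope ranges (placeholders).
epitopes = [(10, 18), (50, 58), (55, 63)]
full_density = epitope_density([(1, 100)], epitopes)
# Truncate the epitope-free C-terminal region 71-100.
truncated_density = epitope_density([(1, 70)], epitopes)
print(full_density, truncated_density)
```

Because the removed region contains no epitope residues, the numerator is unchanged while the denominator shrinks, so the density rises.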
  • N protein portions may also be modified to remove or disrupt a functional region of full-length N protein to improve safety or immunogenicity.
  • an N protein portion may lack one or more amino acids of an RNA-binding domain.
  • the N protein portion has a truncation in, or a deletion of, a basic loop of an RNA-binding domain.
  • a SARS-CoV-2 N protein having the amino acid sequence of SEQ ID NO: 84 comprises a basic loop at amino acids 88–103.
  • a basic loop corresponds to amino acids 88–103, and so an N protein portion lacking or having a truncated basic loop lacks one or more amino acids corresponding to amino acids 88–103 of SEQ ID NO: 84.
  • a SARS-CoV-2 N protein portion comprising an internal truncation lacks one or more amino acids that are present in a full-length N protein sequence, but comprises one or more amino acids that flank the deleted amino acid(s) in the full-length N protein sequence.
  • a modified N protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present between the N-terminal amino acid and C-terminal amino acid of a full-length N amino acid sequence.
  • a modified N protein portion lacks 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 amino acids corresponding to amino acids 88–103 of SEQ ID NO: 84.
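An internal truncation of the basic loop can be sketched as a deletion of a 1-indexed, inclusive residue range that leaves both flanks in place. The sequence below is a synthetic 419-residue placeholder, not SEQ ID NO: 84, and the helper name `delete_internal` is hypothetical.

```python
def delete_internal(seq, start, end):
    """Delete residues start..end (1-indexed, inclusive), keeping both flanks."""
    if not (1 < start <= end < len(seq)):
        raise ValueError("range must be internal so that both flanks remain")
    return seq[:start - 1] + seq[end:]


# Synthetic placeholder: 'K' marks positions 88-103 so the deletion is visible.
placeholder_n = "A" * 87 + "K" * 16 + "G" * 316
without_loop = delete_internal(placeholder_n, 88, 103)
print(len(placeholder_n), len(without_loop))
```

Deleting a sub-range of 88–103 (for example 90–95) with the same helper models a truncated, rather than fully deleted, basic loop.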
  • a SARS-CoV-2 N protein portion may comprise a truncated N-terminus.
  • a modified N protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the N-terminus of a full-length N protein amino acid sequence.
  • a modified N protein portion lacks an amino acid sequence comprising 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 101, 102, or 103 amino acids, which is present at the N-terminus of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–103 of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–103 of SEQ ID NO: 84. In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 10, 20, 30, 40, 41, or 42 amino acids, which is present at the N-terminus of a full-length N protein.
  • a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–42 of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–42 of SEQ ID NO: 84.
  • a SARS-CoV-2 N protein portion may comprise a truncated C-terminus.
  • a modified N protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the C-terminus of a full-length N protein amino acid sequence.
  • a modified N protein portion lacks an amino acid sequence comprising 10, 20, 30, 40, 50, 51, 52, or 53 amino acids, which is present at the C-terminus of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 367–419 of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 367–419 of SEQ ID NO: 84.
  • a SARS-CoV-2 N protein portion may comprise two or more amino acid sequences derived from a full-length N protein.
  • the amino acid sequences of the full-length N protein may be derived from the same N protein, or different N proteins (e.g., N proteins of different SARS-CoV-2 lineages). Any pair of the two or more amino acid sequences may be contiguous in the N protein portion (e.g., without an intervening amino acid sequence), or the two amino acid sequences may be separated by a linker. Where multiple linkers separate multiple pairs of amino acid sequences, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY). Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS).
  • an N protein portion comprises an N-terminal domain of a SARS-CoV-2 N protein. In some embodiments, the N protein portion comprises a truncated N-terminal domain of a SARS-CoV-2 N protein. In some embodiments, the N protein portion lacks a basic loop corresponding to amino acids 88–103 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence with at least 90% identity to amino acids 104–143 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence corresponding to amino acids 104–143 of SEQ ID NO: 84.
  • the N protein portion comprises an amino acid sequence with at least 90% identity to amino acids 43–87 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence corresponding to amino acids 43–87 of SEQ ID NO: 84. In some embodiments, two or more amino acid sequences of an N protein portion are connected by GGSGG (SEQ ID NO: 99). In some embodiments, the N protein portion comprises a C-terminal domain of a SARS-CoV-2 N protein. In some embodiments, the C-terminal domain comprises an amino acid sequence 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, or 110 amino acids in length.
  • the C-terminal domain comprises an amino acid sequence corresponding to amino acids 255–364 of SEQ ID NO: 84.
  • the N protein portion comprises an amino acid sequence with at least 90% identity to amino acids 213–366 of SEQ ID NO: 84.
  • the N protein portion comprises an amino acid sequence corresponding to amino acids 213–366 of SEQ ID NO: 84.
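The residue ranges recited above can be turned into a short sketch that assembles an N protein portion from a full-length sequence. Several points are assumptions made for illustration: the segment order (43–87, linker, 104–143, then 213–366), the direct joining of 104–143 to 213–366 without a linker, and the input sequence, which is a synthetic 419-residue placeholder rather than SEQ ID NO: 84. The function names are hypothetical.

```python
GGSGG = "GGSGG"  # linker recited in the source as SEQ ID NO: 99


def segment(seq, start, end):
    """Return residues start..end of seq (1-indexed, inclusive)."""
    return seq[start - 1:end]


def build_n_portion(full_n, include_43_87=False, linker=GGSGG):
    """Assemble an N protein portion from residue ranges of full_n.

    Assumed layout: [43-87 + linker +] 104-143 + 213-366, so the basic
    loop (88-103) is always absent from the result.
    """
    ntd = segment(full_n, 104, 143)
    if include_43_87:
        ntd = segment(full_n, 43, 87) + linker + ntd
    return ntd + segment(full_n, 213, 366)


# Synthetic placeholder of length 419 (not a real N protein sequence).
placeholder_n = ("ACDEFGHIKLMNPQRSTVWY" * 21)[:419]
short_design = build_n_portion(placeholder_n)
long_design = build_n_portion(placeholder_n, include_43_87=True)
print(len(short_design), len(long_design))
```

The shorter design is 40 + 154 residues long; the longer one adds the 45-residue segment and the 5-residue linker.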
  • Matrix (M) Proteins and Portions
  • Some aspects relate to proteins comprising a matrix (M) portion and/or RNAs encoding the same.
  • the SARS-CoV-2 M protein is an integral membrane protein that plays a crucial role in the assembly, budding, and maturation of viral particles.
  • the M protein contributes to maintenance of virion structural integrity via stabilization of the lipid bilayer. It also participates in intracellular trafficking and localization of viral components by interacting with cellular transport proteins to direct viral proteins to the sites of viral assembly. Structurally, the M protein is a transmembrane protein of 222-229 amino acids, depending on the variant. It has three main regions: the N-terminal domain (NTD), the transmembrane domains (TMD), and the C-terminal domain (CTD).
  • NTD is the region of the M protein exposed on the cytoplasmic side of the viral envelope, where it interacts with the nucleocapsid protein.
  • the TMD spans the viral membrane, anchoring the M protein within the viral envelope.
  • an mRNA encodes a protein comprising a SARS-CoV-2 matrix (M) protein portion.
  • a protein comprises a SARS-CoV-2 matrix (M) protein portion.
  • M protein portions may contain a truncation or deletion of one or more regions, relative to a full-length or naturally occurring M protein, that are sparse in CD4+ and/or CD8+ T cell epitopes, thereby increasing the epitope density of the M protein portion compared to a full-length M protein.
  • an M protein portion comprises a higher density of CD4+ T cell epitopes than a wild-type SARS-CoV-2 M protein.
  • an M protein portion comprises a higher density of CD8+ T cell epitopes than a wild-type SARS-CoV-2 M protein.
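The effect of removing epitope-sparse regions on epitope density can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sequences and epitopes are hypothetical placeholders rather than SARS-CoV-2 M protein sequences, and density is defined here simply as the number of listed epitopes found per residue, which is only one of several possible definitions.

```python
# Toy illustration only: the sequences and epitopes below are hypothetical
# placeholders, not SARS-CoV-2 M protein sequences.

def epitope_density(sequence: str, epitopes: list[str]) -> float:
    """Return the number of listed epitopes found in `sequence`, per residue."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    present = sum(1 for epitope in epitopes if epitope in sequence)
    return present / len(sequence)

EPITOPES = ["KLMNPQRST", "WYVHGFDEA"]   # hypothetical epitopes
SPARSE_N_TERM = "G" * 10                 # hypothetical epitope-free region
SPARSE_INTERNAL = "G" * 12               # hypothetical epitope-free region

# Hypothetical "full-length" protein: two epitopes separated by sparse regions.
full_length = SPARSE_N_TERM + EPITOPES[0] + SPARSE_INTERNAL + EPITOPES[1]
# Hypothetical "portion": both sparse regions removed, epitopes retained.
portion = EPITOPES[0] + EPITOPES[1]

full_density = epitope_density(full_length, EPITOPES)
portion_density = epitope_density(portion, EPITOPES)
print(f"full-length: {full_density:.3f} epitopes per residue")
print(f"portion:     {portion_density:.3f} epitopes per residue")
```

The same two epitopes are present in both constructs; only the denominator (the length) changes, which is why the truncated portion has the higher density.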
  • M protein portions may also be modified to disrupt or remove functional regions of the M protein to improve safety or immunogenicity.
  • an M protein lacks one or more glycosylation sites.
  • Some portions of SARS-CoV-2 M proteins that are shortened or removed by truncation are located internally on a wild-type SARS-CoV-2 M protein, and so truncations in these portions are internal truncations.
  • a SARS-CoV-2 M protein portion comprising an internal truncation lacks one or more amino acids that is present in a full-length M protein sequence, but comprises one or more amino acids that flank the deleted amino acid(s) in the full-length M protein sequence.
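An internal truncation of this kind, in which deleted residues are removed while the flanking residues on both sides are retained, might be sketched as follows. The helper name, the 1-based inclusive coordinate convention, and the toy 20-residue sequence are assumptions chosen for illustration and are not taken from any sequence listing.

```python
def delete_internal(sequence: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive), keeping both flanks."""
    if not (1 < start <= end < len(sequence)):
        raise ValueError("deleted span must leave at least one residue on each side")
    return sequence[: start - 1] + sequence[end:]

# Hypothetical toy sequence (the 20 standard amino acids in one-letter code).
toy_full_length = "ACDEFGHIKLMNPQRSTVWY"

# Delete residues 6-10 (GHIKL); residues 1-5 and 11-20 are retained and joined.
toy_portion = delete_internal(toy_full_length, start=6, end=10)
print(toy_portion)
```

The bounds check rejects spans that touch either terminus, because those would be N-terminal or C-terminal truncations rather than internal ones.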
  • a modified M protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130– 200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present between the N-terminal amino acid and C-terminal amino acid of a full-length M amino acid sequence.
  • a SARS-CoV-2 M protein portion may comprise a truncated N-terminus.
  • a modified M protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the
  • a modified M protein portion lacks an amino acid sequence comprising 1, 2, 3, 4, or 5 amino acids, which is present at the N-terminus of a full-length M protein. In some embodiments, a modified M protein portion lacks an amino acid sequence corresponding to amino acids 1–5 of a full-length M protein. The N-terminal amino acids 1–5 of a full-length M protein are involved in glycosylation of the full-length M protein, and so removal of one or more of these amino acids results in a less glycosylated M protein portion, relative to a full-length M protein. In some embodiments, a modified M protein portion lacks an amino acid sequence corresponding to amino acids 1–5 of SEQ ID NO: 86.
  • a SARS-CoV-2 M protein portion may comprise a truncated C-terminus.
  • a modified M protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the
  • an M protein portion lacks one or more C-terminal amino acids of SEQ ID NO: 86.
  • an M protein portion comprises an amino acid sequence comprising 1–222, 1–220, 1–217, 1–215, 1–210, 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–222, 20–222, 30–222, 40–222, 50–222, 60–222, 70–222, 80–222, 90–222, 100–222, 110–222, 120–222, 130–222, 140–222, 150–222, 160–222, 170–222, 180–222, 190–222, 200–222, 210–222, 217–222, 10–30,
  • the full-length M protein amino acid sequence is SEQ ID NO: 86.
  • an M protein portion comprises one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
  • an M protein portion comprises 1, 2, or 3 transmembrane domains of a full-length SARS-CoV-2 M protein.
  • Exemplary transmembrane domains of a full-length SARS-CoV-2 M protein are located at amino acids 19–100 of SEQ ID NO: 86, such that the N-terminal amino acids are present outside of a virus particle, and C-terminal amino acids are present inside a virus particle.
  • an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 19–100 of SEQ ID NO: 86. In some embodiments, an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 6–104 of SEQ ID NO: 86. In some embodiments, an M protein portion comprises a β-sheet domain.
  • an M protein portion of a protein comprises a β-sheet domain that is C-terminal to one or more transmembrane domains of the M protein portion.
  • an M protein portion comprises the β-sheet domain of a full-length M protein, and the β-sheet domain is N-terminal to one or more transmembrane domains (e.g., M protein transmembrane domains). In some embodiments, the β-sheet domain is N-terminal to 1, 2, or 3 transmembrane domains of the full-length SARS-CoV-2 M protein.
  • Rearrangement of the M protein domains in this manner, where the β-sheet domain is N-terminal to one or more transmembrane domains, allows the β-sheet domain to be located extracellularly when a protein comprising the M protein portion is expressed in a cell and embedded in the cell membrane.
  • this localization of the β-sheet domain outside the cell membrane reduces interaction of the β-sheet domain with intracellular components, and also exposes the β-sheet domain for the generation of antibodies specific to the β-sheet domain.
  • an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 118–222 of SEQ ID NO: 86. In some embodiments, an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 105–222 of SEQ ID NO: 86.
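The identity thresholds recited for these residue ranges can be translated into residue counts with a short worked calculation. The sketch below is illustrative only: the function names are hypothetical, and identity is computed over an ungapped, equal-length comparison, whereas identity in practice is typically determined after sequence alignment.

```python
def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive residue range such as 19-100."""
    return end - start + 1

def max_mismatches(length: int, min_identity_pct: int) -> int:
    """Largest number of mismatched residues that still meets the threshold."""
    # identity >= p%  is equivalent to  matches * 100 >= p * length
    min_matches = -(-min_identity_pct * length // 100)  # ceiling division
    return length - min_matches

def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Residue ranges of SEQ ID NO: 86 recited in the text above.
for start, end in [(19, 100), (6, 104), (118, 222), (105, 222)]:
    n = region_length(start, end)
    print(f"{start}-{end}: {n} residues, "
          f"up to {max_mismatches(n, 90)} mismatches at 90% identity")
```

For example, amino acids 19–100 span 82 residues, so a sequence of the same length stays at or above 90% identity with up to 8 substituted positions.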
  • a SARS-CoV-2 M protein portion may comprise two or more amino acid sequences derived from a full-length M protein.
  • the amino acid sequences of the full-length M protein may be derived from the same M protein, or different M proteins (e.g., M proteins of different SARS-CoV-2 lineages). Any pair of the two or more amino acid sequences may be contiguous in the M protein portion (e.g., without an intervening amino acid sequence), or the two amino acid sequences may be separated by a linker. Where multiple linkers separate multiple pairs of amino acid sequences, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY). Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS).
  • Non-structural protein 3 (Nsp3) Proteins and Portions Some aspects relate to proteins comprising a non-structural protein 3 (Nsp3) portion and/or RNAs encoding the same.
  • the SARS-CoV-2 Nsp3 protein is large and multifunctional. It is involved in several processes during the viral life cycle, including viral replication, host immune response modulation, and viral pathogenesis. Nsp3 plays a crucial role in the formation of the replication-transcription complex (RTC) and acts as a scaffold for various enzymatic activities.
  • the SARS-CoV-2 Nsp3 protein is a large, multidomain protein with a molecular weight of approximately 200 kDa.
  • Some of the important domains within Nsp3 include the papain-like protease (PLpro) domain; the macrodomain (also known as ADP-ribose-1''-phosphatase); and the ubiquitin-like domain (Ubl1), which are involved in various enzymatic activities and protein-protein interactions.
  • an mRNA encodes a protein comprising a SARS-CoV-2 non-structural protein 3 (Nsp3) protein portion.
  • a protein comprises a SARS-CoV-2 non-structural protein 3 (Nsp3) protein portion.
  • a SARS-CoV-2 Nsp3 protein portion may comprise one or more T cell epitopes.
  • T cell epitopes may be CD4+ T cell epitopes or CD8+ T cell epitopes.
  • a CD4+ T cell epitope refers to an amino acid sequence that is presented on a class II major histocompatibility complex (MHC-II) protein.
  • a CD8+ T cell epitope refers to an amino acid sequence that is presented on a class I major histocompatibility complex (MHC-I) protein.
  • MHC-I proteins present peptides that are typically 8–11 amino acids in length.
  • a CD8+ T cell epitope may thus comprise an amino acid sequence 8–11 amino acids in length.
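As a simple illustration of the 8–11 amino acid length range, the sketch below enumerates every contiguous window of those lengths within a sequence. The function name is hypothetical, and the 15-residue input is the sequence recited elsewhere in this section as SEQ ID NO: 150; the enumeration only lists candidate windows and makes no prediction about which of them, if any, are actually processed and presented on an MHC-I protein.

```python
def candidate_mhc1_peptides(sequence: str, min_len: int = 8, max_len: int = 11) -> list[str]:
    """List every contiguous window of min_len..max_len residues in `sequence`."""
    peptides = []
    for k in range(min_len, max_len + 1):
        # A sequence of length n contains n - k + 1 windows of length k.
        for i in range(len(sequence) - k + 1):
            peptides.append(sequence[i : i + k])
    return peptides

EPITOPE = "SNEKQEILGTVSWNL"  # SEQ ID NO: 150 as recited in this section
peptides = candidate_mhc1_peptides(EPITOPE)
print(len(peptides), "candidate windows of 8-11 residues")
```

A 15-residue sequence yields 8 + 7 + 6 + 5 windows across the four lengths.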
  • a protein may comprise a T cell epitope, such that when a peptide consisting of the amino acid sequence of the epitope is presented on an MHC protein, a T cell recognizes the peptide:MHC complex.
  • Peptides consisting of amino acid sequences present in a protein are generated by cleavage of proteins by a proteasome, which cleaves peptide bonds to release peptide fragments, and peptides are loaded into antigen-presenting grooves of MHC proteins.
  • an Nsp3 protein portion comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more T cell epitopes.
  • an Nsp3 protein portion comprises 2–6 T cell epitopes. In some embodiments, an Nsp3 protein portion comprises at least 6 T cell epitopes. In some embodiments, an Nsp3 protein portion comprises 6 T cell epitopes. In some embodiments, the Nsp3 protein portion comprises 6 different epitope sequences. In some embodiments, the Nsp3 protein portion comprises 6 different epitope sequences that do not overlap in the amino acid sequence of the protein. In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence ALRKVPTDNYITTY (SEQ ID NO: 149).
  • the Nsp3 protein portion comprises an epitope with the amino acid sequence SNEKQEILGTVSWNL (SEQ ID NO: 150). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence HTTDPSFLGRYMSAL (SEQ ID NO: 151). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence LVAEWFLAYILFTRFFYV (SEQ ID NO: 152). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence YIFFASFYYVWKSYV (SEQ ID NO: 153).
  • the Nsp3 protein portion comprises an epitope with the amino acid sequence AEAELAKNVSLDNVL (SEQ ID NO: 154). In some embodiments, the Nsp3 protein portion comprises each of SEQ ID NOs: 149–154. In some embodiments, an Nsp3 protein portion comprises two or more T cell epitopes, and those T cell epitopes occur in a different order in the Nsp3 protein portion than in a full-length SARS-CoV-2 Nsp3 protein.
  • a naturally occurring SARS-CoV-2 Nsp3 protein comprises three epitopes (E1, E2, and E3) in the order E1–X–E2–X–E3, where each X is either a peptide bond, or one or more intervening amino acids between epitopes in the naturally occurring Nsp3 protein.
  • an Nsp3 protein may comprise the three epitopes in the same order (E1–E2–E3), or a different order.
  • the Nsp3 protein portion comprises two or more epitopes in the same order as they occur in a naturally occurring or full-length Nsp3.
  • the Nsp3 protein portion comprises two or more epitopes in a different order than they occur in a naturally occurring or full-length Nsp3 protein. In some embodiments, the Nsp3 protein portion lacks one or more junctional epitopes formed by an amino acid sequence that overlaps with a first epitope sequence and a second epitope sequence.
  • a concatenation of both epitopes includes the junctional epitopes YITTYSNEK (SEQ ID NO: 156) and ITTYSNEKQ (SEQ ID NO: 157) (underlined amino acids are present in the first epitope sequence, and bolded amino acids are present in the second epitope sequence).
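The junctional epitopes in this example can be reproduced by listing every 9-residue window that spans the boundary between the two concatenated epitope sequences. The sketch below uses the sequences recited as SEQ ID NOs: 149 and 150; the function name and the fixed window length of 9 are assumptions made for illustration.

```python
def junction_spanning_peptides(first: str, second: str, k: int = 9) -> list[str]:
    """Return all k-residue windows containing residues from both sequences."""
    joined = first + second
    boundary = len(first)  # index of the first residue of `second`
    spans = []
    # A window starting at i covers residues i..i+k-1; it spans the junction
    # when it includes the last residue of `first` and the first of `second`.
    for i in range(max(0, boundary - k + 1), boundary):
        peptide = joined[i : i + k]
        if len(peptide) == k:
            spans.append(peptide)
    return spans

FIRST = "ALRKVPTDNYITTY"    # SEQ ID NO: 149
SECOND = "SNEKQEILGTVSWNL"  # SEQ ID NO: 150

for peptide in junction_spanning_peptides(FIRST, SECOND):
    print(peptide)
```

Eight 9-residue windows span this junction, and YITTYSNEK and ITTYSNEKQ are two of them; which junction-spanning windows are of concern in practice would depend on factors not modeled here, such as MHC binding.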
  • junctional epitopes may be absent from a SARS-CoV-2 Nsp3 protein, and so T cells specific to those junctional epitopes are expected to provide little protection, if any, against a SARS-CoV-2 infection. Additionally, junctional epitopes formed by concatenation of two or more amino acid sequences may include amino acid sequences present in endogenous human proteins, or resemble amino acid sequences present in endogenous human proteins. Presentation of such junctional epitopes may cause deleterious activation of T cells that then exhibit a response to endogenous proteins in a subject.
  • rearrangement of two or more epitopes of a full-length Nsp3 protein in an Nsp3 protein portion allows for the avoidance of one or more junctional epitopes, while maintaining the presence of those epitopes that are useful in generating an anti-SARS-CoV-2 T cell response. While rearrangement may also introduce a different junctional epitope, such an introduced junctional epitope may present on an MHC-I allele that is less common in the human population.
  • the Nsp3 protein portion lacks one or more amino acid sequences selected from YITTYSNEK (SEQ ID NO: 156), ITTYSNEKQ (SEQ ID NO: 157), TVSWNLHTT (SEQ ID NO: 158), WNLHTTDPS (SEQ ID NO: 159), RYMSALLVA (SEQ ID NO: 160), MSALLVAEW (SEQ ID NO: 161), SALLVAEWF (SEQ ID NO: 162), RFFYVYIFF (SEQ ID NO: 163), TRFFYVYIF (SEQ ID NO: 164), KSYVAEAEL (SEQ ID NO: 165), VWKSYVAEA (SEQ ID NO: 166), WKSYVAEAE (SEQ ID NO: 168), and VAEAELA (SEQ ID NO: 169).
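One way to see how epitope order affects the presence of the sequences listed above is to scan orderings of the six epitopes and count how many listed sequences appear in each direct concatenation. The sketch below is an illustration under stated assumptions: the epitopes are joined without linkers, orderings are keyed by SEQ ID NO purely for convenience, and a plain substring check stands in for any real assessment of antigen processing or MHC presentation.

```python
from itertools import permutations

# Epitope sequences keyed by SEQ ID NO, as recited in this section.
EPITOPES = {
    149: "ALRKVPTDNYITTY",
    150: "SNEKQEILGTVSWNL",
    151: "HTTDPSFLGRYMSAL",
    152: "LVAEWFLAYILFTRFFYV",
    153: "YIFFASFYYVWKSYV",
    154: "AEAELAKNVSLDNVL",
}

# Sequences that the Nsp3 protein portion may lack, as listed above.
LISTED = [
    "YITTYSNEK", "ITTYSNEKQ", "TVSWNLHTT", "WNLHTTDPS", "RYMSALLVA",
    "MSALLVAEW", "SALLVAEWF", "RFFYVYIFF", "TRFFYVYIF", "KSYVAEAEL",
    "VWKSYVAEA", "WKSYVAEAE", "VAEAELA",
]

def count_listed(order: tuple[int, ...]) -> int:
    """Count listed sequences present in a linker-free concatenation."""
    construct = "".join(EPITOPES[seq_id] for seq_id in order)
    return sum(seq in construct for seq in LISTED)

numeric_order = (149, 150, 151, 152, 153, 154)
reversed_order = tuple(reversed(numeric_order))
print("SEQ ID NO order:", count_listed(numeric_order), "listed sequences present")
print("reversed order: ", count_listed(reversed_order), "listed sequences present")

# Orderings of all six epitopes that contain none of the listed sequences.
clean_orders = [o for o in permutations(EPITOPES) if count_listed(o) == 0]
print(len(clean_orders), "of 720 orderings contain none of the listed sequences")
```

Concatenating the epitopes in SEQ ID NO order produces every one of the listed sequences, while other orderings avoid them, which is consistent with the rearrangement rationale described above.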
  • two Nsp3 epitope sequences are connected by a linker.
  • the presence of a linker may eliminate a junctional epitope that would otherwise be present if two epitope sequences were concatenated without any intervening amino acids.
  • the presence of a linker reduces the chance a junctional epitope will be presented on an MHC-I protein.
  • the two epitopes of a pair of Nsp3 epitopes are connected by an AAY linker.
  • the amino acid sequence AAY is a cleavage site for mammalian proteasomes, and so inclusion of an AAY linker facilitates cleavage between the two epitopes, increasing the efficiency of Nsp3 peptide epitope production and presentation.
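The effect of an AAY linker on a junction can be sketched by comparing a direct concatenation with a linker-joined construct. The helper name is hypothetical, the two epitopes are those recited as SEQ ID NOs: 149 and 150, and the check is a plain substring search that does not model proteasomal cleavage or MHC binding.

```python
def join_epitopes(epitopes: list[str], linker: str = "") -> str:
    """Join epitope sequences in order, placing `linker` between each pair."""
    return linker.join(epitopes)

EPITOPE_PAIR = ["ALRKVPTDNYITTY", "SNEKQEILGTVSWNL"]  # SEQ ID NOs: 149 and 150
JUNCTIONAL = ["YITTYSNEK", "ITTYSNEKQ"]               # SEQ ID NOs: 156 and 157

direct = join_epitopes(EPITOPE_PAIR)
linked = join_epitopes(EPITOPE_PAIR, linker="AAY")

for name, construct in [("direct", direct), ("AAY-linked", linked)]:
    found = [j for j in JUNCTIONAL if j in construct]
    print(f"{name}: {construct} -> junctional sequences present: {found}")
```

The linker-joined construct no longer contains either junctional sequence. It does contain new windows that include linker residues, so a linker changes which junction-spanning sequences exist rather than guaranteeing that none are presented.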
  • a SARS-CoV-2 Nsp3 portion may comprise two or more amino acid sequences derived from a full-length Nsp3.
  • the amino acid sequences of the full-length Nsp3 may be derived from the same Nsp3, or different Nsp3s (e.g., Nsp3s of different SARS-CoV-2 lineages). Any pair of the two or more amino acid sequences may be contiguous in the Nsp3 portion (e.g., without an intervening amino acid sequence), or the two amino acid sequences may be separated by a linker. Where multiple linkers separate multiple pairs of amino acid sequences, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY).
  • Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS).
  • two or more amino acid sequences of an Nsp3 portion are connected by GGSGG (SEQ ID NO: 99).
  • Spike (S) Proteins The envelope spike (S) proteins of known betacoronaviruses determine the virus host tropism and entry into host cells. Coronavirus spike (S) protein is a choice antigen for the vaccine design as it can induce neutralizing antibodies and protective immunity. S protein is critical for SARS-CoV-2 infection.
  • S protein refers to a glycoprotein that forms homotrimers protruding from the envelope (viral surface) of viruses including betacoronaviruses. Trimerized Spike protein facilitates entry of the virion into a host cell by binding to a receptor on the surface of a host cell followed by fusion of the viral and host cell membranes.
  • the S protein is a highly glycosylated and large type I transmembrane fusion protein that is made up of 1,160 to 1,400 amino acids, depending upon the type of virus. Betacoronavirus Spike proteins comprise between about 1100 and about 1500 amino acids.
  • SARS-CoV-2 spike (S) protein is a primary antigen choice for vaccine design, as it can induce neutralizing antibodies and protective immunity.
  • mRNAs are designed to produce SARS-CoV-2 Spike proteins (i.e., encode Spike proteins such that Spike protein is expressed when the mRNA is delivered to a cell or tissue, for example a cell or tissue in a subject), as well as variants thereof.
  • While Spike protein may be necessary for a virus, e.g., a betacoronavirus, to perform its intended function of facilitating virus entry into a host cell, a certain amount of variation in Spike protein structure and/or sequence is tolerated when seeking primarily to elicit an immune response against Spike protein.
  • minor truncation, e.g., of one to a few, possibly up to 5 or up to 10, amino acids from the N- or C-terminus of the encoded Spike protein, e.g., encoded Spike protein antigen, may be tolerated without changing the antigenic properties of the protein.
  • the Spike protein is a stabilized Spike protein, for example, the Spike protein is stabilized by two proline substitutions (a 2P mutation).
  • the Spike protein is not a stabilized Spike protein, for example, the Spike protein is not stabilized by two proline substitutions (a 2P mutation).
  • the Spike protein is from a different virus strain.
  • a strain is a genetic variant of a microorganism (e.g., a virus).
  • New viral strains can be created due to mutation, which may be selected due to enhanced replication, transmissibility, and/or evasion of pre-existing immune responses (e.g., antigenic drift), or recombination of genetic components when two or more viruses infect the same cell, with such recombinant viruses being selected due to enhanced replication, transmissibility, and/or evasion of pre-existing immune responses.
  • Antigenic drift is a kind of genetic variation in viruses, arising by the accumulation of mutations in the virus genes that code for virus-surface proteins recognized by host immune responses (antibodies and T cells).
  • Antigenic shift is the process by which two or more different strains of a virus, or strains of two or more different viruses, combine to form a new subtype having a mixture of the surface antigens of the two or more original strains, which may create virus with a novel combination of surface antigens that did not previously exist in nature.
  • the term is often applied specifically to influenza viruses, where segmentation of the viral genome into distinct RNA segments, and reassortment of genome segments during virion production, allows the production of reassortant progeny with novel combinations of genome segments from co-infected cells.
  • genetic recombination may occur between non-segmented viruses (e.g., SARS-CoV-2) where multiple viral strains replicate in the same cell, e.g., by switching between two template genomes during replication, resulting in progeny genomes with combinations of sequences from two or more viral strains.
  • Antigenic shift is contrasted with antigenic drift (in which individual mutations accumulate over time, and may lead to a loss of immunity, or in vaccine mismatch).
  • a virus strain as used herein is a genetic variant of a virus that is characterized by a differing isoform of one or more surface proteins of the virus.
  • for SARS-CoV-2, for example, a strain may be characterized by a different amino acid sequence in the SARS-CoV-2 spike protein, where the immune response in an individual to the new strain is less effective than to the strain used to immunize or first infect the individual.
  • a new virus strain may arise from natural mutation or a combination of natural mutation and immune selection due to an ongoing immune response in an immunized or previously infected individual.
  • a new virus strain can differ by one, two, three or more amino acid mutations in regions of the spike protein responsible for a viral function such as receptor binding or viral fusion with a target cell.
  • a spike protein from a new strain may share 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with the spike protein of the parental strain at the amino acid level.
  • a natural virus strain is a variant of a given virus that is recognizable because it possesses some “unique phenotypic characteristics” that remain stable (e.g., stable and heritable biological, serological, and/or molecular characters) under natural conditions.
  • Such “unique phenotypic characteristics” are biological properties different from the compared reference virus, such as unique antigenic properties, host range (e.g., infecting a different kind of host), symptoms of disease caused by the strain, different type of disease caused by the strain (e.g., transmitted by different means), etc.
  • a “unique phenotypic characteristic” can be detected clinically (e.g., clinical manifestations detected in a host infected with the strain) or within a comparative animal experiment in which a researcher skilled in the art of virology can distinguish between the reference control virus-infected animal and the animal infected with the alleged new strain, without knowing which animal received which virus and without having any information about the differences between the two viruses.
  • a virus variant with a simple difference in genome sequence is not a separate strain if there is no recognizable distinct viral phenotype. The extent of genomic sequence variation is irrelevant for the classification of a variant as a strain since a distinct phenotype sometimes arises from few mutations.
  • the mRNA encodes an antigen from at least one virus strain variant or comprises mutations from at least one virus strain that is not wild-type SARS-CoV-2.
  • the vaccine comprises an mRNA encoding a Spike protein associated with the XBB.1.5 lineage variant.
  • the XBB.1.5 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including an N460K substitution, an F486P substitution, and an F490S substitution in the Spike protein.
  • an mRNA encodes a Spike protein with at least one substitution associated with the XBB.1.5 lineage variant.
  • an mRNA encodes a Spike protein with an N460K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F490S substitution relative to SEQ ID NO: 87. In some embodiments, the vaccine comprises an mRNA encoding a Spike protein associated with the XBB.1.16 lineage variant.
  • the XBB.1.16 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including a Q183E substitution, an F456L substitution, an F486P substitution, and an F490S substitution in the Spike protein.
  • an mRNA encodes a Spike protein with at least one substitution associated with the XBB.1.16 lineage variant.
  • an mRNA encodes a Spike protein with a Q183E substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a Spike protein with an F456L substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a Spike protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F490S substitution relative to SEQ ID NO: 87. Table 4, below, presents examples of Spike protein mutations in SARS-CoV-2 variants. In some embodiments, a Spike protein, e.g., an encoded Spike protein portion, has the amino acid sequence of SEQ ID NO: 87.
  • a Spike protein e.g., an encoded Spike protein antigen
  • the variant preferably has the same activity as the reference Spike protein sequence and/or has the same immune specificity as the reference Spike protein, as determined, for example, in immunoassays (e.g., enzyme-linked immunosorbent assays (ELISAs)).
  • S proteins of coronaviruses can be divided into two important functional subunits: the N-terminal S1 subunit, which forms the globular head of the S protein, and the C-terminal S2 region, which forms the stalk of the protein and is directly embedded into the viral envelope.
  • Upon interaction with a potential host cell, the S1 subunit will recognize and bind to receptors on the host cell, specifically angiotensin-converting enzyme 2 (ACE2) receptors, whereas the S2 subunit, which is the most conserved component of the S protein, will be responsible for fusing the envelope of the virus with the host cell membrane.
  • Each monomer of the trimeric S protein contains the two subunits, S1 and S2, mediating attachment and membrane fusion, respectively.
  • the two subunits are separated from each other by an enzymatic cleavage process.
  • S protein is first cleaved by furin-mediated cleavage at the S1/S2 site in infected cells. In vivo, a subsequent serine protease-mediated cleavage event occurs at the S2′ site within S2.
  • the S1/S2 cleavage site is at amino acids 676 – TQTNSPRRAR/SVA – 688 (SEQ ID NO: 45).
  • the S2′ cleavage site is at amino acids 811 – KPSKR/SFI – 818 (SEQ ID NO: 46).
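The residue numbers flanking each cleavage site follow from the numbering of the motifs given above, where the slash marks the scissile bond. The sketch below works through that arithmetic; the function names are hypothetical, and the only inputs are the motif strings and first-residue numbers recited in the text.

```python
def motif_span(first_residue: int, motif: str) -> tuple[int, int]:
    """Return the first and last residue numbers covered by a motif with a '/' marker."""
    n_residues = len(motif.replace("/", ""))
    return first_residue, first_residue + n_residues - 1

def cleavage_positions(first_residue: int, motif: str) -> tuple[int, int]:
    """Return the residue numbers immediately before and after the '/' marker."""
    upstream, _downstream = motif.split("/")
    last_before = first_residue + len(upstream) - 1
    return last_before, last_before + 1

SITES = {
    "S1/S2": (676, "TQTNSPRRAR/SVA"),  # SEQ ID NO: 45 with the cleavage marker
    "S2'": (811, "KPSKR/SFI"),         # SEQ ID NO: 46 with the cleavage marker
}

for name, (first, motif) in SITES.items():
    start, end = motif_span(first, motif)
    before, after = cleavage_positions(first, motif)
    print(f"{name}: motif spans {start}-{end}, cleavage between {before} and {after}")
```

With this numbering, the S1/S2 motif spans residues 676–688 with cleavage between residues 685 and 686, and the S2′ motif spans residues 811–818 with cleavage between residues 815 and 816.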
  • S1 subunit e.g., S1 subunit antigen
  • S2 subunit e.g., S2 subunit antigen
  • Although a Spike protein S1 or S2 subunit may be necessary for receptor binding or membrane fusion, respectively, a certain amount of variation in S1 or S2 structure and/or sequence is tolerated when seeking primarily to elicit an immune response against Spike protein subunits.
  • minor truncation, e.g., of one to a few, possibly up to 4, 5, 6, 7, 8, 9 or 10, amino acids from the N- or C-terminus of the encoded subunit, e.g., encoded S1 or S2 protein antigens, may be tolerated without changing the antigenic properties of the protein.
  • the S1 and S2 subunits of the SARS-CoV-2 Spike protein further include domains readily discernable by structure and function, which in turn can be featured in designing antigens to be encoded by the nucleic acid vaccines, in particular, mRNA vaccines.
  • domains include the N-terminal domain (NTD) and the receptor-binding domain (RBD), said RBD domain further including a receptor-binding motif (RBM).
  • domains include fusion peptide (FP), heptad repeat 1 (HR1), heptad repeat 2 (HR2), transmembrane domain (TM), and cytoplasm domain, also known as cytoplasmic tail (CT) (Lu R. et al., supra; Wan et al., J. Virol. Mar 2020, 94 (7) e00127-20).
  • the HR1 and HR2 domains can be referred to as the “fusion core region” of SARS-CoV-2 (Xia et al., 2020 Cell Mol Immunol. Jan; 17(1):1-12.).
  • the S1 subunit includes an N terminal domain (NTD), a linker region, a receptor binding domain (RBD), a first subdomain (SD1), and a second subdomain (SD2).
  • the S2 subunit includes, inter alia, a first heptad repeat (HR1), a second heptad repeat (HR2), a transmembrane domain (TM), and a cytoplasmic tail.
  • NTD and RBD of S1 are good antigens for the vaccine design approach of some embodiments, as these domains have been shown to be the targets of neutralizing antibodies in betacoronavirus-infected individuals.
  • NTD refers to a domain within the SARS-CoV-2 S1 subunit comprising approximately 290 amino acids in length, having identity to amino acids 1-290 of the S1 subunit of the Spike protein having the amino acid sequence set forth as SEQ ID NO: 87.
  • the term “receptor binding domain” or “RBD” refers to a domain within the S1 subunit of SARS-CoV-2 comprising approximately 175-225 amino acids in length, having identity to amino acids 316-517 of the S1 subunit of the Spike protein having the amino acid sequence set forth as SEQ ID NO: 87.
  • the term “receptor binding motif” refers to the portion of the RBD that directly contacts the ACE2 receptor.
  • compositions may include mRNA that encodes any one or more full-length or partial (truncated or other deletion of sequence) S protein subunit (e.g., S1 or S2 subunit), one or more domain or combination of domains of an S protein subunit (e.g., NTD, RBD, or NTD-RBD fusions, with or without an SD1 and/or SD2), or chimeras of full-length or partial S1 and S2 protein subunits.
  • Proteins comprising one or more Spike protein domains may comprise one or more mutations associated with a virus strain variant, in the respective domain of the encoded protein.
  • an encoded NTD-RBD fusion protein may comprise a substitution corresponding to N460K in a full-length Spike protein, which the skilled artisan will understand refers to substitution of asparagine with lysine, where the substituted asparagine is one corresponding to N460 of a full-length Spike protein, when the NTD-RBD fusion protein sequence is aligned to a full-length Spike protein sequence (e.g., SEQ ID NO: 87).
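The correspondence between a full-length position such as N460 and a position in a fusion protein can be illustrated with a simplified coordinate mapping. The sketch below rests on assumptions that are not taken from the text: the fusion is treated as a linker-free concatenation of residues 1–290 (the NTD range recited above) and residues 316–517 (the RBD range recited above), and a direct offset calculation stands in for the sequence alignment that would be used in practice. The function names are hypothetical.

```python
import re

def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a substitution such as 'N460K' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def map_position(full_length_pos: int, segments: list[tuple[int, int]]):
    """Map a full-length residue number to its 1-based position in a fusion.

    `segments` lists the (start, end) ranges of the full-length protein, 1-based
    and inclusive, that are concatenated in order. Returns None when the residue
    is not part of the fusion.
    """
    offset = 0
    for start, end in segments:
        if start <= full_length_pos <= end:
            return offset + (full_length_pos - start + 1)
        offset += end - start + 1
    return None

# Hypothetical fusion layout: NTD range followed directly by RBD range.
SEGMENTS = [(1, 290), (316, 517)]

original, position, replacement = parse_substitution("N460K")
fusion_position = map_position(position, SEGMENTS)
print(f"{original}{position}{replacement} maps to fusion position {fusion_position}")
```

Under these assumptions, full-length residue 460 falls in the second segment and maps to fusion position 290 + (460 − 316 + 1) = 435, while a residue between the two ranges (for example, 300) has no counterpart in the fusion.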
  • the vaccine comprises an mRNA encoding an NTD-RBD fusion protein comprising one or more mutations associated with the XBB.1.5 lineage variant.
  • the XBB.1.5 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including an N460K substitution, an F486P substitution, and an F490S substitution in the Spike protein.
  • an mRNA encodes an NTD-RBD fusion protein with at least one substitution associated with the XBB.1.5 lineage variant.
  • an mRNA encodes an NTD-RBD fusion protein with an N460K substitution relative to SEQ ID NO: 87.
  • an mRNA encodes an NTD-RBD fusion protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F490S substitution relative to SEQ ID NO: 87. In some embodiments, the vaccine comprises an mRNA encoding an NTD-RBD fusion protein comprising one or more mutations associated with the XBB.1.16 lineage variant.
  • the XBB.1.16 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including a Q183E substitution, an F456L substitution, an F486P substitution, and an F490S substitution in the Spike protein.
  • an mRNA encodes an NTD-RBD fusion protein with at least one substitution associated with the XBB.1.16 lineage variant.
  • an mRNA encodes an NTD-RBD fusion protein with a Q183E substitution relative to SEQ ID NO: 87.
  • an mRNA encodes an NTD-RBD fusion protein with an F456L substitution relative to SEQ ID NO: 87.
  • an mRNA encodes an NTD-RBD fusion protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F490S substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having at least one of the following mutations relative to SEQ ID NO: 87: T19I, A27S, V83A, G142D, H146Q, E180V, Q183E, V213E, G252V, G339H, R346T, L368I, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, V445P, G446S, N460K, S477N, T478K, T478R, E484A, F486P, F490S, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, and N969K.
  • the mRNA encodes a protein having 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 substitutions selected from T19I, A27S, V83A, G142D, H146Q, E180V, Q183E, V213E, G252V, G339H, R346T, L368I, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, V445P, G446S, N460K, S477N, T478K, T478R, E484A, F486P, F490S, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, and N969K.
  • an mRNA encodes a protein having a T19I substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an A27S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a V83A substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G142D substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an H146Q substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an E180V substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having a Q183E substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a V213E substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G252V substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G339H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an R346T substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an L368I substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having an S371F substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S373P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S375F substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a T376A substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a D405N substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an R408S substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having a K417N substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N440K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a V445P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G446S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N460K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S477N substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having a T478K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a T478R substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an E484A substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an F490S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Q498R substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having an N501Y substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Y505H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a D614G substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an H655Y substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N679K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a P681H substitution relative to SEQ ID NO: 87.
  • an mRNA encodes a protein having an N764K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a D796Y substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Q954H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N969K substitution relative to SEQ ID NO: 87. In some embodiments, the mRNA encodes a protein having one or more deletions relative to the SARS-CoV-2 S protein of SEQ ID NO: 87.
  • Exemplary deletions include, but are not limited to, deletions of L24, P25, P26, and Y144.
  • the mRNA encodes a protein lacking 1, 2, 3, or 4 amino acids corresponding to L24, P25, P26, or Y144 of SEQ ID NO: 87.
  • the mRNA encodes a protein having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 substitutions selected from T19I, A27S, V83A, G142D, H146Q, E180V, Q183E, V213E, G252V, G339H, R346T, L368I, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, V445P, G446S, N460K, S477N, T478K, T478R, E484A, F486P, F490S, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, and N969K.
  • an mRNA encodes a protein having a deletion of L24 relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a deletion of P25 relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a deletion of P26 relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a deletion of Y144 relative to SEQ ID NO: 87. In some embodiments, the mRNA vaccine comprises 1, 2, 3, 4, 5, or 6 mRNAs encoding different proteins, wherein each protein comprises at least one mutation and/or at least one deletion.
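The substitution and deletion notation used above (for example, N460K for a substitution, Y144 for a deleted residue) can be read mechanically as "reference residue, 1-based position, replacement residue". The following is a minimal sketch of that reading, not a method taken from the source: the function name `apply_changes` and the 14-residue toy reference are hypothetical, and the toy sequence is not SEQ ID NO: 87.

```python
import re

# Substitution notation: reference residue, 1-based position, new residue (e.g., "N460K").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")
# Deletion notation as used in the text: reference residue and position (e.g., "Y144").
DELETION = re.compile(r"^([A-Z])(\d+)$")


def apply_changes(reference, substitutions=(), deletions=()):
    """Return a variant of `reference` carrying the listed changes.

    All positions refer to the numbering of the unmodified reference.
    """
    residues = list(reference)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"not a substitution: {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if reference[pos - 1] != old:
            raise ValueError(f"{sub}: reference has {reference[pos - 1]} at {pos}")
        residues[pos - 1] = new
    for deletion in deletions:
        match = DELETION.match(deletion)
        if match is None:
            raise ValueError(f"not a deletion: {deletion!r}")
        old, pos = match.group(1), int(match.group(2))
        if reference[pos - 1] != old:
            raise ValueError(f"{deletion}: reference has {reference[pos - 1]} at {pos}")
        residues[pos - 1] = ""  # empty slot keeps the reference numbering intact
    return "".join(residues)


# Hypothetical toy reference used only for illustration.
toy_reference = "MKTAYIAKQRQISF"
variant = apply_changes(toy_reference, substitutions=["K2N", "S13P"], deletions=["Y5"])
print(variant)
```

Checking the reference residue before each change guards against applying a mutation list to the wrong reference numbering.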
  • the mRNA vaccine further comprises an mRNA encoding a wild-type SARS-CoV-2 S protein or the antigenic fragment thereof.
  • the mRNA vaccine, in some embodiments, is in a lipid nanoparticle (that is, the lipid nanoparticle comprises 1, 2, 3, 4, 5, or 6 mRNAs encoding different proteins).
  • a composition comprises a first mRNA encoding a protein or variant thereof of a first SARS-CoV-2 virus and a second mRNA encoding a second protein or variant thereof of a second SARS-CoV-2 virus.
  • the first SARS-CoV-2 virus is a first circulating SARS-CoV-2 virus.
  • the second SARS-CoV-2 virus is a second circulating SARS-CoV-2 virus.
  • “Circulating viruses” as used herein refer to viruses that have been in circulation for 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, a portion of a year, 1 year, 1.5 years, 2 years, 3 years, or longer.
  • the first and second mRNAs are present in the composition in a 1:1, 1:2, 1:3, or 1:4 ratio.
  • the first and second mRNAs are present in the composition in a 2:1, 3:1, or 4:1 ratio.
  • the first and second mRNAs are present in the composition in a 1:1 ratio.
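As a small arithmetic illustration of the ratios listed above, the sketch below splits a total amount between a first and a second mRNA. The source does not state whether the ratios are by mass or by mole, and the total of 100 arbitrary units is a hypothetical value chosen only for the example; `split_by_ratio` is likewise a hypothetical helper.

```python
from fractions import Fraction


def split_by_ratio(total, first_parts, second_parts):
    """Split `total` between two components in a first:second ratio."""
    whole = first_parts + second_parts
    first = Fraction(total) * first_parts / whole
    return first, Fraction(total) - first


# Ratios named in the text; 100 is an arbitrary total in unspecified units.
for ratio in [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 1)]:
    first, second = split_by_ratio(100, *ratio)
    print(f"{ratio[0]}:{ratio[1]} -> first={float(first):.2f}, second={float(second):.2f}")
```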
  • a S protein portion comprises a receptor binding domain (RBD) from a SARS-CoV-2 Spike protein.
  • a S protein portion comprises an N-terminal domain (NTD) from a SARS-CoV-2 Spike protein.
  • Fusion Proteins. Some aspects relate to SARS-CoV-2 chimeric proteins comprising one or more portions of a SARS-CoV-2 N protein, a SARS-CoV-2 M protein, a SARS-CoV-2 Nsp3 protein, and/or a SARS-CoV-2 S protein, and/or RNAs encoding the same.
  • the encoded protein or proteins may include two or more proteins (e.g., protein and/or protein portions) joined together with or without a linker.
  • a chimeric protein comprises an N protein portion, an Nsp3 protein portion, and an M protein portion.
  • the N protein portion, Nsp3 protein portion, and M protein portion may occur in any order in the chimeric protein.
  • the portions occur, in N-to-C-terminal order, N-Nsp3-M. In some embodiments, the portions occur, in N-to-C-terminal order, N-M-Nsp3.
  • the portions occur, in N-to-C-terminal order, M-N-Nsp3. In some embodiments, the portions occur, in N-to-C-terminal order, M-Nsp3-N. In some embodiments, the portions occur, in N-to-C-terminal order, N-Nsp3-M. In some embodiments, the portions occur, in N-to-C-terminal order, N-M-Nsp3.
  • a chimeric protein comprising an M protein portion including one or more transmembrane domains
  • the amino acids N-terminal to the transmembrane domain(s) will be present extracellularly when the protein is embedded in a cell membrane, and the amino acids C-terminal to the transmembrane domain(s) will be present inside the cytoplasm when the protein is embedded in the cell membrane.
  • a chimeric protein comprises an S protein portion, an N protein portion, and a transmembrane portion comprising one or more transmembrane domains.
  • the transmembrane portion may comprise an M protein portion.
  • the transmembrane protein portion comprises a transmembrane domain from a protein other than a SARS-CoV-2 M protein.
  • the transmembrane domain is an influenza virus hemagglutinin (HA) transmembrane domain.
  • the S portion, N portion, and transmembrane portions may occur in any order.
  • the S portion is N-terminal to the transmembrane portion.
  • the N portion is N-terminal to the transmembrane portion.
  • the N portion is C-terminal to the transmembrane portion.
  • a fusion protein comprises a transmembrane domain.
  • the transmembrane domain may, in some embodiments, be from a virus that is not SARS-CoV-2.
  • the transmembrane domain may be from an influenza hemagglutinin transmembrane domain, which has been demonstrated to effectively anchor proteins at the cell surface.
  • Any pair of protein portions (e.g., N portion, M portion, Nsp3 portion, S portion)
  • the multiple linkers may each comprise the same amino acid sequence (e.g., AAY).
  • Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS).
  • two or more portions of a chimeric protein are connected by a glycine linker or a glycine-serine linker.
  • each pair of protein portions of a chimeric protein are connected by a glycine linker or a glycine-serine linker.
  • each pair of protein portions are connected by the amino acid sequence GGSGG (SEQ ID NO: 99).
  • two or more portions of a chimeric protein are connected by a linker comprising the amino acid sequence AAY.
  • each pair of protein portions are connected by the amino acid sequence AAY.
  • two or more portions of a chimeric protein are connected by a linker comprising the amino acid sequence RKSY (SEQ ID NO: 136).
  • each pair of protein portions are connected by the amino acid sequence RKSY (SEQ ID NO: 136).
  • the amino acid sequence RKSY (SEQ ID NO: 136) is a cleavable linker.
  • inclusion of an RKSY (SEQ ID NO: 136) linker facilitates cleavage between the two connected protein portions, increasing the efficiency of peptide epitope production and presentation.
  • no linkers are present between protein portions in the chimeric protein.
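The ordering and linker options described above amount to joining protein portions in a chosen N-to-C-terminal order, with an optional linker between each adjacent pair. The sketch below illustrates this under stated assumptions: the portion sequences are placeholder strings (not real N, Nsp3, or M sequences), and `assemble` is a hypothetical helper. The linker sequences AAY, GGSGG, and RKSY are taken from the text.

```python
from itertools import permutations


def assemble(portions, order, linker=""):
    """Join the named portions in N-to-C-terminal order with `linker` between each pair."""
    return linker.join(portions[name] for name in order)


# Placeholder strings standing in for real protein portions.
portions = {"N": "NNNN", "Nsp3": "PPPP", "M": "MMMM"}

# Three portions "in any order" gives six possible arrangements.
orders = list(permutations(portions))

# Linker options named in the text; the empty string means no linker.
linkers = ["", "AAY", "GGSGG", "RKSY"]

for order in orders:
    for linker in linkers:
        label = "-".join(order)
        print(f"{label:12s} linker={linker or 'none':6s} {assemble(portions, order, linker)}")
```

Because the linker is passed per call, the same helper covers the "same linker between every pair" case; constructs that use different linkers at different junctions would need a list of linkers instead.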
  • SARS-CoV-2 chimeric protein comprising an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183.
  • Exemplary sequences of SARS-CoV-2 chimeric proteins are provided in Appendix I and Table 5.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 101. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 102.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 103. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 104.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 105. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 106.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 107. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 108.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 109. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 110.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 111. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 112.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 113. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 114.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 115. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 116.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 117. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 118.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 119. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 120.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 121. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 122.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 123. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 124.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 170. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 171.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 172. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 173.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 174. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 175.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 176. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 177.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 178. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 179.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 180. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 181.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 182.
  • a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 183.
  • compositions include RNA that encodes a SARS-CoV-2 antigen variant.
  • Antigen variants or other polypeptide variants refers to molecules that differ in their amino acid sequence from a wild-type, native, or reference sequence.
  • the antigen/polypeptide variants may possess substitutions, deletions, and/or insertions at certain positions within the amino acid sequence, as compared to a native or reference sequence.
  • variants possess at least 50% identity to a wild-type, native or reference sequence.
  • variants share at least 80%, or at least 90% identity with a wild-type, native, or reference sequence.
  • the nucleic acid vaccines encode SARS-CoV-2 variant proteins comprising 1, 2, 3, 4, or more mutations relative to a reference sequence.
  • the nucleic acid vaccines encode SARS-CoV-2 variant proteins comprising fewer than 20, 18, 15, 12, or 10 mutations relative to a reference sequence. In some embodiments, the nucleic acid vaccines encode SARS-CoV-2 variant proteins having 1-50, 1-40, 1-30, 1-25, 1-20, 1-15, 1-10, 5-50, 5-40, 5-30, 5-25, 5-20, 5-15, 5-10, 10-50, 10-40, 10-30, 10-25, 10-20, 10-15, 20-50, 20-40, 20-30, 20-25, 25-50, 25-40, 25-30, 30-50, 30-40, or 40-50 mutations (e.g., substitutions). As used herein, “mutation” refers to an amino acid substitution, insertion, or deletion.
  • a reference sequence refers to a naturally-occurring strain, for example, a naturally-occurring circulating strain of SARS-CoV-2.
  • Variant antigens/polypeptides encoded by nucleic acids may contain amino acid changes that confer any of a number of desirable properties, e.g., that enhance their immunogenicity, enhance their expression, and/or improve their stability or PK/PD properties in a subject.
  • Variant antigens/polypeptides can be made using routine mutagenesis techniques and assayed as appropriate to determine whether they possess the desired property. Assays to determine expression levels and immunogenicity are well known in the art and exemplary such assays are set forth in the Examples section.
  • PK/PD properties of a protein variant can be measured using art recognized techniques, e.g., by determining expression of antigens in a vaccinated subject over time and/or by looking at the durability of the induced immune response.
  • the stability of protein(s) encoded by a variant nucleic acid may be measured by assaying thermal stability or stability upon urea denaturation or may be measured using in silico prediction. Methods for such experiments and in silico determinations are known in the art.
  • a composition comprises an RNA or an RNA ORF that comprises a nucleotide sequence of any one of the sequences in Table 5, or comprises a nucleotide sequence at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence of any one of the sequences in Table 5.
  • identity refers to a relationship between the sequences of two or more polypeptides (e.g. antigens) or polynucleotides (nucleic acids), as determined by comparing the sequences.
  • Identity also refers to the degree of sequence relatedness between or among sequences as determined by the number of matches between strings of two or more amino acid residues or nucleic acid residues. Identity measures the percent of identical matches between the smaller of two or more sequences with gap alignments (if any) addressed by a particular mathematical model or computer program (e.g., “algorithms”). Identity of related antigens or nucleic acids can be readily calculated by known methods.
  • Percent (%) identity as it applies to polypeptide or polynucleotide sequences is defined as the percentage of residues (amino acid residues or nucleic acid residues) in the candidate amino acid or nucleic acid sequence that are identical with the residues in the amino acid sequence or nucleic acid sequence of a second sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. It is understood that identity depends on a calculation of percent identity but may differ in value due to gaps and penalties introduced in the calculation.
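The percent identity definition above can be illustrated with a short calculation on two sequences that have already been aligned. This is a sketch under one reading of the definition: the denominator is the number of residues in the candidate sequence (gaps excluded), and `-` is assumed as the gap character. Other conventions (for example, dividing by alignment length) give different values, which is consistent with the remark that the value depends on how gaps are handled. The sequences are hypothetical.

```python
def percent_identity(aligned_candidate, aligned_reference, gap="-"):
    """Percent of candidate residues identical to the aligned reference residue."""
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (never a gap).
    matches = sum(
        1
        for c, r in zip(aligned_candidate, aligned_reference)
        if c == r and c != gap
    )
    # Denominator: residues actually present in the candidate.
    candidate_length = sum(1 for c in aligned_candidate if c != gap)
    if candidate_length == 0:
        raise ValueError("candidate contains no residues")
    return 100.0 * matches / candidate_length


# Hypothetical aligned pair: one gap in the candidate and one mismatch (I vs L).
candidate = "MKT-YIAK"
reference = "MKTAYLAK"
print(f"{percent_identity(candidate, reference):.1f}% identity")
```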
  • variants of a particular polynucleotide or polypeptide have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% but less than 100% sequence identity to that particular reference polynucleotide or polypeptide as determined by sequence alignment programs and parameters known to those skilled in the art.
  • tools for alignment include those of the BLAST suite (Stephen F. Altschul, et al. (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402).
  • Another popular local alignment technique is based on the Smith-Waterman algorithm (Smith, T.F.
  • a general global alignment technique based on dynamic programming is the Needleman–Wunsch algorithm (Needleman, S.B. & Wunsch, C.D. (1970) “A general method applicable to the search for similarities in the amino acid sequences of two proteins.” J. Mol. Biol. 48:443-453). More recently a Fast Optimal Global Sequence Alignment Algorithm (FOGSAA) has been developed that purportedly produces global alignment of nucleotide and protein sequences faster than other optimal global alignment methods, including the Needleman–Wunsch algorithm.
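For readers unfamiliar with dynamic-programming alignment, the following is a minimal sketch of Needleman–Wunsch global alignment. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for simplicity; practical tools typically use substitution matrices and affine gap penalties. The example sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align `a` and `b`; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill: each cell is the best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner, preferring the diagonal move.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[-1][-1]


top, bottom, total = needleman_wunsch("MKTAYIAK", "MKTYIAK")
print(top)
print(bottom)
print("score:", total)
```

The aligned strings returned here have equal length, so they can be fed directly into a column-by-column percent identity calculation.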
  • sequence tags or amino acids such as one or more lysines
  • Sequence tags can be used for peptide detection, purification or localization.
  • Lysines can be used to increase peptide solubility or to allow for biotinylation.
  • amino acid residues located at the carboxy and amino terminal regions of the amino acid sequence of a peptide or protein may optionally be deleted providing for truncated sequences.
  • amino acids may alternatively be deleted depending on the use of the sequence, as for example, expression of the sequence as part of a larger sequence which is soluble or linked to a solid support.
  • sequences for (or encoding) signal sequences, termination sequences, transmembrane domains, linkers, multimerization domains (such as, e.g., foldon regions) and the like may be substituted with alternative sequences that achieve the same or a similar function.
  • cavities in the core of proteins can be filled to improve stability, e.g., by introducing larger amino acids.
  • buried hydrogen bond networks may be replaced with hydrophobic residues to improve stability.
  • glycosylation sites may be removed and replaced with appropriate residues.
  • sequences are readily identifiable to one of skill in the art. It should also be understood that some of the sequences contain sequence tags or terminal peptide sequences (e.g., at the N-terminal or C-terminal ends) that may be deleted, for example, prior to use in the preparation of an mRNA vaccine.
  • protein fragments, functional protein domains, and homologous proteins are also considered to be within the scope of coronavirus antigens of interest.
  • any protein fragment (meaning a polypeptide sequence at least one amino acid residue shorter than a reference antigen sequence but otherwise identical)
  • an antigen includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations, as shown in any of the sequences provided or referenced herein.
  • Antigens/antigenic polypeptides can range in length from about 4, 6, or 8 amino acids to full length proteins.
  • an RNA that encodes a protein encodes a linker located between at least one or each domain (portion) of the protein.
  • the linker may be, for example, a cleavable linker or protease-sensitive linker.
  • the linker is selected from the group consisting of F2A linker, P2A linker, T2A linker, E2A linker, and combinations thereof (see, e.g., WO 2017/127750).
  • This family of self-cleaving peptide linkers, referred to as 2A peptides, has been described in the art (see, e.g., Kim, J.H.
  • the linker is an F2A linker.
  • the linker is a GS linker.
  • GS linkers are polypeptide linkers that include glycine and serine amino acids repeats. They comprise flexible and hydrophilic residues and can be used to perform fusion of protein subunits without interfering in the folding and function of the protein domains, and without formation of secondary structures.
  • an RNA (e.g., mRNA) encodes a protein that comprises a GS linker that is 3 to 20 amino acids long.
  • the GS linker may have a length of (or have a length of at least) 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids.
  • a GS linker is (or is at least) 15 amino acids long (e.g., GGSGGSGGSGGSGGG (SEQ ID NO: 47)).
  • a GS linker is (or is at least) 8 amino acids long (e.g., GGGSGGGS (SEQ ID NO: 48)).
  • a GS linker is (or is at least) 7 amino acids long (e.g., GGGSGGG (SEQ ID NO: 49)).
  • a GS linker comprises the amino acid sequence GGGSGG (SEQ ID NO: 50). In some embodiments, a GS linker is (or is at least) 4 amino acids long (e.g., GGGS (SEQ ID NO: 51)). In some embodiments, the GS linker comprises (GGGS)n (SEQ ID NO: 127), where n is any integer from 1-5. In some embodiments, a GS linker is (or is at least) 4 amino acids long (e.g., GSGG (SEQ ID NO: 52)). In some embodiments, the GS linker comprises (GSGG)n (SEQ ID NO: 130), where n is any integer from 1-5.
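The repeat-unit notation (GGGS)n and (GSGG)n, with n from 1 to 5, can be expanded programmatically. The sketch below does so and checks each result against the 3 to 20 amino acid length range given for GS linkers. The helper names `repeat_linker` and `is_gs_linker` are hypothetical, and the composition check (glycine and serine only, both present) is an illustrative reading of "GS linker" rather than a definition from the source.

```python
def repeat_linker(unit, n):
    """Expand a repeat unit such as GGGS into (unit)n for n from 1 to 5."""
    if not 1 <= n <= 5:
        raise ValueError("n must be an integer from 1 to 5")
    return unit * n


def is_gs_linker(sequence, min_len=3, max_len=20):
    """True if `sequence` contains only G and S (both present) and is 3-20 long."""
    return (
        min_len <= len(sequence) <= max_len
        and set(sequence) == {"G", "S"}
    )


for unit in ("GGGS", "GSGG"):
    for n in range(1, 6):
        linker = repeat_linker(unit, n)
        print(f"({unit}){n}: {linker} length={len(linker)} ok={is_gs_linker(linker)}")
```

With a four-residue unit, n = 5 gives exactly 20 residues, the upper end of the stated length range.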
  • a linker is a glycine linker, for example having a length of (or a length of at least) 3 amino acids (e.g., GGG).
  • a protein encoded by an RNA e.g., mRNA
  • two or more linkers which may be the same or different from each other.
  • linkers may be suitable for use in the constructs (e.g., encoded by nucleic acids).
  • polycistronic constructs (RNA (e.g., mRNA) encoding more than one protein separately within the same molecule) may be suitable.
  • an RNA e.g., mRNA
  • an ORF that encodes a signal peptide fused to a protein.
  • Signal peptides, comprising the N-terminal 15-60 amino acids of proteins, are typically needed for translocation across the membrane on the secretory pathway and thus control the entry of proteins into the secretory pathway in both eukaryotes and prokaryotes.
  • the signal peptide of a nascent precursor protein (pre-protein)
  • endoplasmic reticulum (ER)
  • ER processing produces mature proteins, wherein the signal peptide is cleaved from precursor proteins, typically by an ER-resident signal peptidase of the host cell, or they remain uncleaved and function as a membrane anchor.
  • a signal peptide may also facilitate the targeting of the protein to the cell membrane.
  • a signal peptide may have a length of 15-60 amino acids.
  • a signal peptide may have a length of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids.
  • a signal peptide has a length of 20-60, 25-60, 30-60, 35-60, 40-60, 45-60, 50-60, 55-60, 15-55, 20-55, 25-55, 30-55, 35-55, 40-55, 45-55, 50-55, 15-50, 20-50, 25-50, 30-50, 35-50, 40-50, 45-50, 15-45, 20-45, 25-45, 30-45, 35-45, 40-45, 15-40, 20-40, 25-40, 30-40, 35-40, 15-35, 20-35, 25-35, 30-35, 15-30, 20-30, 25-30, 15-25, 20-25, or 15-20 amino acids.
  • an RNA comprises an open reading frame that encodes a protein fused to a signal peptide comprising an amino acid sequence that has at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of any one of the sequences, such as those reproduced below in Table 5.
  • an mRNA comprises an open reading frame that encodes a protein including an endogenous signal peptide of the wild-type protein (e.g., an mRNA encoding a (wild-type or modified) SARS-CoV-2 protein or variant thereof encodes a SARS-CoV-2 signal peptide).
  • an mRNA comprises an open reading frame that encodes a SARS-CoV-2 protein having an influenza virus hemagglutinin (HA) signal peptide.
  • the SARS-CoV-2 protein comprises the amino acid sequence of SEQ ID NO: 100.
  • Nucleic Acids Encoding SARS-CoV-2 Proteins Nucleic acids comprise a polymer of nucleotides (nucleotide monomers). Thus, nucleic acids are also referred to as polynucleotides. Nucleic acids may be or may include, for example, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), threose nucleic acid (TNA), glycol nucleic acid (GNA), peptide nucleic acid (PNA), locked nucleic acid (LNA, including LNA having a β-D-ribo configuration, α-LNA having an α-L-ribo configuration (a diastereomer of LNA), 2′-amino-LNA having a 2′-amino functionalization, and 2′-amino-α-LNA having a 2′-amino functionalization), ethylene nucleic acid (ENA), cyclohexenyl nucleic acid (CeNA) and/or
  • RNA comprises an open reading frame (ORF) encoding a SARS-CoV-2 protein or variant thereof.
  • the RNA e.g., mRNA
  • the RNA further comprises a 5′ untranslated region (UTR), 3′ UTR, a poly(A) tail and/or a 5′ cap analog.
  • Messenger RNA (mRNA) Messenger RNA (mRNA) is RNA that encodes a (at least one) protein (a naturally-occurring, non-naturally-occurring, or modified polymer of amino acids) and can be translated to produce the encoded protein in vitro, in vivo, in situ, or ex vivo.
  • mRNA is not self-amplifying RNA (saRNA) (see, e.g., Bloom K et al. Gene Therapy 2021; 28: 117–129 for a comparison of mRNA and saRNA).
  • saRNAs include alphavirus replicase sequences that encode an RNA-dependent RNA polymerase. mRNA does not include alphavirus replicase sequences.
  • nucleic acid sequences set forth in the instant application may recite “T”s in a representative DNA sequence, but where the sequence represents mRNA, the “T”s would be substituted with “U”s.
  • any of the DNAs disclosed and identified by a particular sequence identification number herein also disclose the corresponding mRNA sequence complementary to the DNA, where each “T” of the DNA sequence is substituted with “U.”
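The T-to-U convention described above can be expressed as a simple string substitution. The sketch below is illustrative only; the function name `dna_to_mrna` is hypothetical, and the code performs a direct substitution on the recited sequence (it does not compute a reverse complement).

```python
# Illustrative sketch: rewrite a recited DNA sequence as the corresponding
# mRNA sequence by substituting each "T" with "U".
# `dna_to_mrna` is a hypothetical helper name.
def dna_to_mrna(dna: str) -> str:
    """Return the sequence with every T replaced by U (case-normalized)."""
    seq = dna.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(unexpected)}")
    return seq.replace("T", "U")


example_mrna = dna_to_mrna("ATGGCCTAA")
```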
  • Naturally-occurring eukaryotic mRNA molecules can contain stabilizing elements, including, but not limited to, UTRs at their 5′-end (5′ UTR) and/or at their 3′-end (3′ UTR), in addition to other structural features, such as a 5′-cap structure or a 3′-poly(A) tail. Both the 5′ UTR and the 3′ UTR are typically transcribed from the genomic DNA and are elements of the premature mRNA.
  • Untranslated Regions (UTRs) mRNAs may comprise one or more regions or parts which act or function as an untranslated region.
  • a “5′ untranslated region” (UTR) refers to a region of an mRNA that is directly upstream (i.e., 5′) from the start codon (i.e., the first codon of an mRNA transcript translated by a ribosome) that does not encode a polypeptide.
  • a “3′ untranslated region” refers to a region of an mRNA that is directly downstream (i.e., 3′) from the open reading frame (e.g., downstream from the last amino acid-encoding codon of an open reading frame, where the stop codon is considered part of the 3′ UTR, or downstream from the first stop codon signaling translation termination, where that stop codon is considered part of the open reading frame), and which does not encode a polypeptide.
  • the 5’ UTR may comprise a promoter sequence. Such promoter sequences are known in the art. It should be understood that such promoter sequences will not be present in an mRNA vaccine.
  • the mRNA may comprise a 5’ UTR and/or 3’ UTR.
  • UTRs of an mRNA are transcribed but not translated.
  • the 5′ UTR starts at the transcription start site and continues to the start codon but does not include the start codon; the 3′ UTR starts immediately following the open reading frame and continues until the transcriptional termination signal.
  • the 3′ UTR begins with a stop codon, such that no amino acids are added to a polypeptide beyond the last amino acid encoded by the open reading frame.
  • a 3′ UTR may further comprise one or more stop codons.
  • the regulatory features of a UTR can be incorporated into the polynucleotides to, among other things, enhance the stability of the molecule.
  • the specific features can also be incorporated to ensure controlled down-regulation of the transcript in case it is misdirected to undesired organ sites.
  • a variety of 5’ UTR and 3’ UTR sequences are known.
  • the 5′ UTR comprises a sequence provided in Table 1 or a sequence with at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to a 5′ UTR sequence provided in Table 1, or a variant or a fragment thereof.
  • the 3′ UTR comprises a sequence provided in Table 2 or a sequence with at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to a 3′ UTR sequence provided in Table 2, or a variant or a fragment thereof.
  • the mRNA may include any 5’ UTR and/or any 3’ UTR.
  • Exemplary UTR sequences include SEQ ID NOs: 1-44, 66-79 and 81-82; however, other UTR sequences may be used.
  • a 5' UTR comprises a sequence selected from: GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 1), GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGACCCCGGCGCCGCCACC (SEQ ID NO: 2), GAGGAAAUCGCAAAAUUUGCUCUUCGCGUUAGAUUUCUUUUAGUUUUCUCGCAACUAGCAAGCUUUUUGUUCUCGCC (SEQ ID NO: 66), and GGAAAUCGCAAAAUUUGCUCUUCGCGUUAGAUUUCUUUAGUUUUCUCGCAACUAGCAAGCUUUUUGUUCUCGCC (SEQ ID NO: 5).
  • a 3′ UTR comprises, in 5′-to-3′ order: (a) the nucleic acid sequence UAAAGCUCCCCGGGGGCCUCGGUGGCCUAGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCAG (SEQ ID NO: 68), (b) an identification and ratio determination (IDR) sequence, and (c) the nucleic acid sequence UGGUCUUUGAAUAAAGUCUGAGUGGGCGGC (SEQ ID NO: 69).
  • each mRNA encoding a distinct protein comprises a 3′ UTR comprising, in 5′-to-3′ order: (a) the nucleotide sequence of SEQ ID NO: 68; (b) a distinct IDR sequence; and (c) the nucleotide sequence of SEQ ID NO: 69.
  • a 5′ UTR comprises a sequence derived from a 5′ UTR of a gene selected from HSD17B4, RPL32, ASAH1, ATP5A1, MP68, NDUFA4, NOSIP, RPL31, SLC7A3, TUBB4B and UBQLN2.
  • the 5′ UTR comprises a sequence derived from the 5′ UTR of human hydroxysteroid 17-beta dehydrogenase 4 (HSD17B4).
  • a 5′ UTR comprises the sequence GGGAGAGUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUCGUUGCAGGCCUUAUUCAAGCUUACC (SEQ ID NO: 70). In some embodiments, a 5′ UTR comprises the sequence GUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUCGUUGCAGGCCUUAUUC (SEQ ID NO: 71). In some embodiments, a 5′ UTR comprises the sequence GGGAGAAAGCUUACC (SEQ ID NO: 72).
  • a 3′ UTR comprises a sequence derived from a 3′ UTR of a gene selected from PSMB3, ALB7, alpha-globin, CASP1, COX6B1, GNAS, NDUFA1 and RPS9.
  • a 3′ UTR comprises a sequence derived from a 3′ UTR of PSMB3 (proteasome 20S subunit beta 3).
  • a 3′ UTR comprises a sequence derived from a 3′ UTR of alpha-globin (MUAG).
  • a 3′ UTR comprises the sequence AGGACUAGUCCCUGUUCCCAGAGCCCACUUUUUUUCUUUUUGAAAUAAAAUAGCCUGUCUUUCAGAUCU (SEQ ID NO: 73). In some embodiments, a 3′ UTR comprises the sequence GGACUAGUUAUAAGACUGACUAGCCCGAUGGGCCUCCCAACGGGCCCUCCUCCCCUCCUUGCACCGAGAUUAAU (SEQ ID NO: 74).
  • the mRNA comprises a 5′ UTR comprising the nucleotide sequence of any one of SEQ ID NOs: 70–72, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 73 or SEQ ID NO: 74.
  • the mRNA further comprises a polyA sequence comprising at least 64 consecutive adenosine nucleotides.
  • the mRNA further comprises a polyC sequence comprising at least 30 consecutive cytidine nucleotides.
  • a 5′ UTR comprises the sequence AACUAGUAUUCUUCUGGUCCCCACAGACUCAGAGAACCCGCCACC (SEQ ID NO: 75). In some embodiments, a 5′ UTR comprises the sequence GAGAAUAAACUAGUAUUCUUCUGGUCCCCACAGACUCAGAGAACCCGCCACC (SEQ ID NO: 76).
  • a 3′ UTR comprises the sequence CUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCGAGUCUCCCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCUCUGCUAGUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGCCACACCCCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAGCUAUACUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACC (SEQ ID NO: 77).
  • a 3′ UTR comprises the sequence CUCGAGCUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCGAGUCUCCCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCUCUGCUAGUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGCCACACCCCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAGCUAUACUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACCCUGGAGCUAGC (SEQ ID NO: 78).
  • a 3′ UTR comprises the sequence CUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCGAGUCUCCCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCUCUGCUAGUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGCCACACCCCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAGCUAUACUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACCCUGGAGCUAGC (SEQ ID NO: 79).
  • an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 75 or SEQ ID NO: 76, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of any one of SEQ ID NOs: 77–79.
  • an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 76, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 78.
  • an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 76, an open reading frame, the nucleotide sequence UGAUGA, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 78.
  • the mRNA further comprises two poly(A) sequences separated by an intervening nucleotide sequence.
  • the mRNA further comprises the nucleotide sequence of SEQ ID NO: 80.
  • a 5′ UTR comprises the sequence GAGGAGACCCAAGCUACAUUUGCUUCUGACACAACUGUGUUCACUAGCAACCUCAAACAGACACCGCCACC (SEQ ID NO: 81).
  • a 3′ UTR comprises the sequence GCUCGCUUUCUUGCUGUCCAAUUUCUAUUAAAGGUUCCUUUGUUCCCUAAGUCCAACUACUAAACUGGGGGAUAUUAUGAAGGGCCUUGAGCAUCUGGAUUCUGCCUAAUAAAAAACAUUUAUUUUCAUUGC (SEQ ID NO: 82).
  • an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 81, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 82.
  • the mRNA further comprises a polyA tail comprising 109 consecutive adenosine nucleotides.
  • UTRs may also be omitted from the mRNA.
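The element order recited in the embodiments above (5′ UTR, open reading frame, one or more stop codons, 3′ UTR, and a poly(A) sequence) can be sketched as a simple concatenation. This is an illustration under stated assumptions, not a definitive construct design: the class name `MrnaDesign`, the toy ORF, and the 6-nucleotide 3′ UTR placeholder are hypothetical; only the 5′ UTR (SEQ ID NO: 72), the UGAUGA stop sequence, and the 64-nucleotide poly(A) length are taken from the text.

```python
# Illustrative sketch: concatenate mRNA elements in 5'-to-3' order.
# `MrnaDesign` and its field names are hypothetical.
from dataclasses import dataclass

STOP_CODONS = ("UAA", "UAG", "UGA")


@dataclass(frozen=True)
class MrnaDesign:
    five_prime_utr: str   # may be "" if the UTR is omitted
    orf: str              # coding codons only, starting with AUG
    stop_codons: str      # one or more stop codons in series
    three_prime_utr: str  # may be "" if the UTR is omitted
    poly_a_length: int

    def assemble(self) -> str:
        """Return the full sequence, validating the ORF and stop codons."""
        if not self.orf.startswith("AUG") or len(self.orf) % 3 != 0:
            raise ValueError("ORF must start with AUG and contain whole codons")
        stops = [self.stop_codons[i:i + 3] for i in range(0, len(self.stop_codons), 3)]
        if not stops or any(codon not in STOP_CODONS for codon in stops):
            raise ValueError("stop_codons must be one or more of UAA, UAG, UGA")
        return (
            self.five_prime_utr
            + self.orf
            + self.stop_codons
            + self.three_prime_utr
            + "A" * self.poly_a_length
        )


design = MrnaDesign(
    five_prime_utr="GGGAGAAAGCUUACC",  # SEQ ID NO: 72 from the text
    orf="AUGGCUGGA",                   # hypothetical toy ORF (Met-Ala-Gly)
    stop_codons="UGAUGA",              # stop sequence recited in the text
    three_prime_utr="GCUAGC",          # hypothetical placeholder, not a disclosed UTR
    poly_a_length=64,
)
mrna = design.assemble()
```

Passing empty strings for either UTR field models the case in which UTRs are omitted from the mRNA.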
  • a 5′ UTR does not encode a protein (is non-coding).
  • Natural 5′ UTRs have features that play roles in translation initiation. They harbor signatures like Kozak sequences which are commonly known to be involved in the process by which the ribosome initiates translation of many genes.
  • a 5’ UTR is a heterologous UTR, i.e., is a UTR found in nature associated with a different ORF.
  • a 5’ UTR is a synthetic UTR, i.e., does not occur in nature.
  • Synthetic UTRs include UTRs that have been mutated to improve their properties, e.g., which increase gene expression as well as those which are completely synthetic.
  • Exemplary 5’ UTRs include Xenopus or human derived α-globin or β-globin (8278063; 9012219), human cytochrome b-245 α polypeptide, and hydroxysteroid (17β) dehydrogenase, and Tobacco etch virus (US8278063, US9012219).
  • CMV immediate-early 1 (IE1) gene (US2014/0206753, WO2013/185069)
  • the sequence GGGAUCCUACC (SEQ ID NO: 54) (WO 2014/144196) may also be used.
  • a 5' UTR is a 5' UTR of a TOP gene lacking the 5' TOP motif (the oligopyrimidine tract) (e.g., WO2015/101414, WO2015/101415, WO2015/062738, WO2015/024667); a 5' UTR element derived from the ribosomal protein Large 32 (L32) gene (WO2015/101414, WO2015/101415, WO2015/062738), a 5' UTR element derived from the 5' UTR of a hydroxysteroid (17-β) dehydrogenase 4 gene (HSD17B4) (WO2015/024667), or a 5' UTR element derived from the 5' UTR of ATP5A1 (WO2015/024667) can be used.
  • L32 ribosomal protein Large 32
  • an internal ribosome entry site is used instead of a 5' UTR.
  • a 3′ UTR does not encode a protein (is non-coding).
  • Natural or wild type 3′ UTRs are known to have stretches of adenosines and uridines embedded in them. These AU rich signatures are particularly prevalent in genes with high rates of turnover. Based on their sequence features and functional properties, the AU rich elements (AREs) can be separated into three classes (Chen et al, 1995): Class I AREs contain several dispersed copies of an AUUUA motif within U-rich regions. C-Myc and MyoD contain class I AREs.
  • Class II AREs possess two or more overlapping UUAUUUA(U/A)(U/A) nonamers. Molecules containing this type of AREs include GM-CSF and TNF-α. Class III AREs are less well defined. These U rich regions do not contain an AUUUA motif. c-Jun and Myogenin are two well-studied examples of this class. Most proteins binding to the AREs are known to destabilize the messenger, whereas members of the ELAV family, most notably HuR, have been documented to increase the stability of mRNA. HuR binds to AREs of all the three classes.
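The motifs that distinguish the ARE classes described above (dispersed AUUUA pentamers and overlapping UUAUUUA(U/A)(U/A) nonamers) can be located in a 3′ UTR with regular expressions. The sketch below is illustrative only: the function name `are_motif_positions` is hypothetical, and it reports motif positions without attempting to assign a class, since class assignment also depends on sequence context such as U-richness.

```python
# Illustrative sketch: report 0-based start positions of ARE-related motifs.
# Lookahead patterns are used so that overlapping occurrences are all found.
import re

PENTAMER = re.compile(r"(?=AUUUA)")
NONAMER = re.compile(r"(?=UUAUUUA[UA][UA])")


def are_motif_positions(utr: str) -> dict:
    """Return start positions of AUUUA pentamers and UUAUUUA(U/A)(U/A) nonamers."""
    seq = utr.upper().replace("T", "U")
    return {
        "AUUUA": [m.start() for m in PENTAMER.finditer(seq)],
        "nonamer": [m.start() for m in NONAMER.finditer(seq)],
    }


# Hypothetical toy sequence containing one nonamer (which embeds one pentamer).
hits = are_motif_positions("CCUUAUUUAUUCC")
```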
  • AREs 3′ UTR AU rich elements
  • cells can be transfected with different ARE-engineering molecules, and protein production can be assayed using an ELISA kit for the relevant protein at 6 hours, 12 hours, 1 day, 2 days, and 7 days post-transfection.
  • 5’ UTRs that are heterologous or synthetic may be used with any desired 3’ UTR sequence.
  • a heterologous or synthetic 5’ UTR may be used with a synthetic 3’ UTR or with a heterologous 3’ UTR.
  • Non-UTR sequences may also be used as regions or subregions within a nucleic acid.
  • introns or portions of intron sequences may be incorporated into regions of nucleic acid.
  • the ORF may be flanked by a 5′ UTR which may contain a strong Kozak translational initiation signal and/or a 3' UTR which may include an oligo(dT) sequence for templated addition of a poly-A tail.
  • a 5′ UTR may comprise a first polynucleotide fragment and a second polynucleotide fragment from the same and/or different genes such as the 5′ UTRs described in US2010/0293625 and WO2015/085318A2, each of which is herein incorporated by reference.
  • any UTR from any gene may be incorporated into the regions of a nucleic acid.
  • multiple wild-type UTRs of any known gene may be utilized.
  • Artificial UTRs which are not variants of wild type regions may be used. These UTRs or portions thereof may be placed in the same orientation as in the transcript from which they were selected or may be altered in orientation or location. Hence a 5′ or 3′ UTR may be inverted, shortened, lengthened, made with one or more other 5′ UTRs or 3′ UTRs.
  • the term “altered” as it relates to a UTR sequence means that the UTR has been changed in some way in relation to a reference sequence.
  • a 3′ UTR or 5′ UTR may be altered relative to a wild-type/native UTR by the change in orientation or location as taught above or may be altered by the inclusion of additional nucleotides, deletion of nucleotides, swapping or transposition of nucleotides. Any of these changes producing an “altered” UTR (whether 3′ or 5′) comprise a variant UTR.
  • a double, triple or quadruple UTR such as a 5′ UTR or 3′ UTR may be used.
  • a “double” UTR is one in which two copies of the same UTR are encoded either in series or substantially in series.
  • a double beta-globin 3′ UTR may be used as described in US2010/0129877, which is incorporated herein by reference.
  • Patterned UTRs may be used in RNAs.
  • patterned UTRs are those UTRs which reflect a repeating or alternating pattern, such as ABABAB or AABBAABBAABB or ABCABCABC or variants thereof repeated once, twice, or more than 3 times. In these patterns, each letter, A, B, or C represents a different UTR at the nucleotide level.
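A patterned UTR of the kind described above can be built by mapping each pattern letter to a UTR sequence and concatenating in pattern order. The following is a minimal sketch; the function name `patterned_utr` and the three-nucleotide unit sequences are hypothetical placeholders rather than disclosed UTRs.

```python
# Illustrative sketch: expand a letter pattern such as "AB" or "AABB" into a
# concatenated UTR, where each letter stands for a distinct UTR sequence.
from typing import Mapping


def patterned_utr(pattern: str, units: Mapping[str, str], repeats: int = 1) -> str:
    """Concatenate units[letter] for each letter of the pattern, repeated `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    missing = set(pattern) - set(units)
    if missing:
        raise KeyError(f"no UTR sequence supplied for: {sorted(missing)}")
    return "".join(units[letter] for letter in pattern * repeats)


# Hypothetical placeholder units; real UTRs would be substituted here.
units = {"A": "ACU", "B": "GGC", "C": "UUG"}
ababab = patterned_utr("AB", units, repeats=3)
aabb_x3 = patterned_utr("AABB", units, repeats=3)
```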
  • flanking regions are selected from a family of transcripts whose proteins share a common function, structure, feature, or property.
  • polypeptides of interest may belong to a family of proteins which are expressed in a particular cell, tissue or at some time during development.
  • the UTRs from any of these genes may be swapped for any other UTR of the same or different family of proteins to create a new polynucleotide.
  • a “family of proteins” is used in the broadest sense to refer to a group of two or more polypeptides of interest which share at least one function, structure, feature, localization, origin, or expression pattern.
  • the untranslated region may also include translation enhancer elements (TEE).
  • TEE translation enhancer elements
  • the TEE may include those described in US 2009/0226470, herein incorporated by reference, and those known in the art.
  • Open Reading Frames An open reading frame (ORF) is a continuous stretch of DNA or RNA that (1) begins with a start codon (e.g., ATG or AUG, encoding methionine), and (2) ends with a stop codon (e.g., TAA, TAG or TGA, or UAA, UAG or UGA) or is immediately followed by a stop codon.
  • a stop codon does not encode an amino acid, such that translation of an ORF terminates when a ribosome reaches the stop codon immediately following the last amino acid-encoding codon in the ORF.
  • a stop codon that results in translation termination may be considered part of the ORF, in which case the ORF ends with the stop codon.
  • the first stop codon immediately following the last amino acid-encoding codon of an ORF may be considered part of the 3′ untranslated region (3′ UTR) of a DNA or RNA, rather than part of the ORF.
  • 3′ UTR 3′ untranslated region
  • an ORF sequence that ends in a codon encoding an amino acid will be followed by one or more stop codons in a DNA or RNA.
  • An ORF may be followed by multiple stop codons.
  • including multiple stop codons reduces the extent of continued translation that may occur if a stop codon is mutated to a codon encoding an amino acid (readthrough), as a second stop codon may terminate translation even if a first stop codon is mutated and encodes an amino acid, such that only one amino acid is added to the C-terminus of the translated protein.
  • the multiple stop codons may comprise the same stop codon (e.g., UGAUGA).
  • Multiple stop codons may comprise different stop codons in series (e.g., UGAUAAUAG).
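The effect of placing multiple stop codons in series after an ORF can be illustrated by walking codons from the start codon. The sketch below is an illustration under stated assumptions: the function name `split_orf` and the toy sequences are hypothetical. It shows that when the first of two stop codons is mutated to a sense codon, the second stop codon still terminates the coding region after a single additional codon.

```python
# Illustrative sketch: split an mRNA into its coding codons and the run of
# consecutive in-frame stop codons that follows them.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_orf(mrna: str, start: int):
    """Return (coding_region, list_of_consecutive_stop_codons) reading from `start`."""
    seq = mrna.upper().replace("T", "U")
    if seq[start:start + 3] != "AUG":
        raise ValueError("no start codon at the given position")
    pos = start
    # Advance codon by codon until the first in-frame stop codon (or the end).
    while pos + 3 <= len(seq) and seq[pos:pos + 3] not in STOP_CODONS:
        pos += 3
    coding = seq[start:pos]
    stops = []
    # Collect every stop codon that follows directly in series.
    while pos + 3 <= len(seq) and seq[pos:pos + 3] in STOP_CODONS:
        stops.append(seq[pos:pos + 3])
        pos += 3
    return coding, stops


# Hypothetical toy transcript: GG | AUG GCU GGA | UGA UGA | CC
intact = split_orf("GGAUGGCUGGAUGAUGACC", start=2)
# Same transcript with the first stop codon mutated (UGA -> UGG).
mutated = split_orf("GGAUGGCUGGAUGGUGACC", start=2)
```

In the mutated case the coding region is exactly one codon longer, which corresponds to the single added C-terminal amino acid described in the text.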
  • an ORF typically encodes a protein. It will be understood that the sequences disclosed herein may further comprise additional elements, e.g., 5’ and/or 3’ UTRs, but that those elements, unlike the ORF, need not necessarily be present in an RNA (e.g., mRNA).
  • RNA e.g., mRNA or self-amplifying RNA
  • SARS-CoV-2 chimeric protein comprising an open reading frame with a sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of any one of SEQ ID NOs: 125, 126, 128, 129, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, or 147.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 125.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 126.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 128.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 129.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 131.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 132.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 134.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 135.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 137.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 138.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 140.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 141.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 143.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 144.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 146.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 147.
  • an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 133.
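The identity thresholds recited above can be checked numerically once two sequences have been aligned. The following is a deliberately simplified sketch: it assumes the two sequences are already aligned and of equal length (no gaps), which is an assumption made here for illustration and not the identity calculation defined by the source; a real comparison would normally use a pairwise alignment tool. The names `percent_identity`, `highest_threshold_met`, and `THRESHOLDS` are hypothetical.

```python
# Illustrative sketch: ungapped percent identity between two pre-aligned,
# equal-length sequences, and the highest recited threshold that it satisfies.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99, 100)


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def highest_threshold_met(identity: float):
    """Return the largest threshold <= identity, or None if below all thresholds."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


# Hypothetical 10-nucleotide toy sequences differing at one position.
identity = percent_identity("AUGGCUGGAU", "AUGGCUGGAA")
```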
  • an RNA comprises a 5′ terminal cap.
  • 5′-capping of polynucleotides may be completed concomitantly during an in vitro transcription reaction using, for example, the following chemical RNA cap analogs to generate the 5′-guanosine cap structure according to manufacturer protocols: 3′-O-Me-m7G(5')ppp(5')G [the ARCA cap]; G(5')ppp(5')A; G(5')ppp(5')G; m7G(5')ppp(5')A; m7G(5')ppp(5')G (New England BioLabs, Ipswich, MA).
  • 5′-capping of modified RNA may be completed post-transcriptionally using, for example, a Vaccinia Virus Capping Enzyme to generate the “Cap 0” structure: m7G(5')ppp
  • Cap 1 structure may be generated using both Vaccinia Virus Capping Enzyme and a 2′-O methyl-transferase to generate: m7G(5')ppp(5')G-2′-O-methyl.
  • Cap 2 structure may be generated from the Cap 1 structure followed by the 2′-O-methylation of the 5′-antepenultimate nucleotide using a 2′-O methyl-transferase.
  • Cap 3 structure may be generated from the Cap 2 structure followed by the 2′-O-methylation of the 5′-preantepenultimate nucleotide using a 2′-O methyl-transferase.
  • Enzymes may be derived from a recombinant source. Other cap analogs may be used.
  • a cap analog may be, for example, a dinucleotide cap, a trinucleotide cap, or a tetranucleotide cap.
  • a cap analog is a dinucleotide cap.
  • a cap analog is a trinucleotide cap.
  • a cap analog is a tetranucleotide cap.
  • a nucleotide cap (e.g., a trinucleotide cap or tetranucleotide cap), in some embodiments, comprises a compound of formula (I), or a stereoisomer, tautomer or salt thereof;
  • ring B1 is a modified or unmodified Guanine;
  • ring B2 and ring B3 each independently is a nucleobase or a modified nucleobase;
  • X2 is O, S(O)p, NR24 or CR25R26 in which p is 0, 1, or 2;
  • Y0 is O or CR6R7;
  • Y1 is O, S(O)n, CR6R7, or NR8, in which n is 0, 1, or 2;
  • each --- is a single bond or absent, wherein when each --- is a single bond, Y1 is O, S(O)n, CR6R7, or NR8; and when each --- is absent, Y1 is
  • a cap analog may include any of the cap analogs described in international publication WO 2017/066797, published on 20 April 2017, incorporated by reference herein in its entirety.
  • the B2 middle position can be a non-ribose molecule, such as arabinose.
  • R2 is ethyl-based.
  • a tetranucleotide cap comprises the following structure: .
  • a tetranucleotide cap comprises the following structure:
  • R is an alkyl (e.g., C1-C6 alkyl). In some embodiments, R is a methyl group (e.g., C1 alkyl). In some embodiments, R is an ethyl group (e.g., C2 alkyl). In some embodiments, R is a hydrogen. In some embodiments, a tetranucleotide cap comprises GGAG. In some embodiments, a tetranucleotide cap comprises any one of the following structures: ; or .
  • poly(A) tail is a region of mRNA that is downstream, e.g., directly downstream (i.e., 3′), from the 3′ UTR that contains multiple, consecutive adenosine monophosphates.
  • a poly(A) tail may contain 10 to 300 adenosine monophosphates. It can, in some instances, comprise up to about 400 adenine nucleotides.
  • a poly(A) tail may contain 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or 300 adenosine monophosphates.
  • a poly(A) tail contains 50 to 250 adenosine monophosphates.
  • the poly(A) tail functions to protect mRNA from enzymatic degradation, e.g., in the cytoplasm, and aids in transcription termination, and/or export of the mRNA from the nucleus and translation.
  • the length of the 3′-poly(A) tail may be an essential element with respect to the stability of the individual mRNA.
  • a poly(A) tail has a length of about 50, about 100, about 150, about 200, about 250, about 300, about 350, or about 400 nucleotides.
  • a poly(A) tail has a length of 100 nucleotides.
  • an mRNA comprises a poly(A) sequence that has a length of 50–75 nucleotides.
  • an mRNA comprises a poly(A) sequence that comprises 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 consecutive adenosine nucleotides.
  • an mRNA comprises a poly(A) sequence comprising 64 consecutive adenosine nucleotides.
  • the consecutive adenosine nucleotides of a poly(A) sequence are flanked at the 5′ and 3′ end by nucleotides that are not adenosine nucleotides.
  • an mRNA comprises a poly(C) sequence, which may comprise 10 to 300 cytidine nucleotides.
  • the poly(C) sequence comprises 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 consecutive cytidine nucleotides.
  • the poly(C) sequence comprises 30 cytidine nucleotides.
  • the consecutive cytidine nucleotides of a poly(C) sequence are flanked at the 5′ and 3′ end by nucleotides that are not cytidine nucleotides.
  • an mRNA comprises two poly(A) sequences separated by an intervening nucleotide sequence.
  • the intervening nucleotide sequence comprises no more than 3, no more than two, no more than 1, or no adenosine nucleotides. In some embodiments, the intervening sequence comprises 3 adenosine nucleotides. In some embodiments, the intervening sequence does not comprise an adenosine nucleotide. In some embodiments, the intervening sequence is no more than 30, no more than 25, no more than 20, no more than 15, or no more than 10 nucleotides long. In some embodiments, the intervening sequence consists of 10 nucleotides. In some embodiments, the intervening sequence comprises the sequence of GCAUAUGACU (SEQ ID NO: 55).
  • the intervening sequence does not begin with an adenosine nucleotide, and does not end with an adenosine nucleotide.
  • the first poly(A) sequence comprises at least 15, at least 20, at least 25, or at least 30 consecutive adenosine nucleotides.
  • the second poly(A) sequence comprises at least 55, at least 60, at least 65, or at least 70 consecutive adenosine nucleotides.
  • the first poly(A) sequence comprises 30 consecutive adenosine nucleotides.
  • the second poly(A) sequence comprises 70 adenosine nucleotides.
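The segmented poly(A) arrangement described above (a first run of 30 adenosines, the 10-nucleotide intervening sequence GCAUAUGACU of SEQ ID NO: 55, and a second run of 70 adenosines) can be sketched as follows. The function names `segmented_poly_a` and `poly_a_runs` are hypothetical, and the minimum run length of 10 used when measuring runs is an arbitrary illustrative choice.

```python
# Illustrative sketch: build a two-segment poly(A) sequence and measure its
# runs of consecutive adenosines.
import re


def segmented_poly_a(first_len: int = 30, linker: str = "GCAUAUGACU", second_len: int = 70) -> str:
    """Return A*first_len + linker + A*second_len, with a non-A-flanked linker."""
    if not linker or linker[0] == "A" or linker[-1] == "A":
        raise ValueError("intervening sequence must not begin or end with adenosine")
    return "A" * first_len + linker + "A" * second_len


def poly_a_runs(seq: str, min_len: int = 10) -> list:
    """Lengths of every run of at least `min_len` consecutive adenosines."""
    return [len(m.group()) for m in re.finditer(r"A{%d,}" % min_len, seq.upper())]


tail = segmented_poly_a()
runs = poly_a_runs(tail)
```

With the default arguments the tail is 110 nucleotides long, and the intervening sequence contributes three isolated adenosines that are too short to count as poly(A) runs.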
  • an mRNA comprises the nucleotide sequence AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  • an mRNA comprises a poly(A) sequence that comprises 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, or 120 consecutive adenosine nucleotides.
  • an mRNA comprises a poly(A) sequence that comprises at least 109 consecutive adenosine nucleotides.
  • an mRNA comprises a poly(A) sequence that comprises 109 consecutive adenosine nucleotides.
  • an mRNA comprises a poly(A) sequence that consists of 109 consecutive adenosine nucleotides.
  • Self-amplifying RNA Some aspects relate to self-amplifying RNA (e.g., an RNA replicon) encoding a SARS-CoV-2 chimeric protein.
  • A self-amplifying RNA refers to an RNA encoding one or more molecules (e.g., proteins) that, individually or in conjunction, are capable of replicating the self-amplifying RNA.
  • the proteins encoded by the self-amplifying RNA are non-structural proteins nsP1, nsP2, nsP3, and nsP4, which form an RNA-dependent RNA polymerase (RdRp), or replicase, that is capable of replicating the self-amplifying RNA.
  • RdRp RNA-dependent RNA polymerase
  • a self-amplifying RNA is capable of self-amplification in a cell, provided that the cell can translate the RNA and produce the encoded protein(s).
  • a self-amplifying RNA may be referred to as an RNA replicon.
  • a viral non-structural protein is a protein that is encoded by a virus but is not part of the virus particle.
  • the viral non-structural proteins, in the context of self-amplifying RNA, replicate the nucleotide sequences encoding the vaccine antigen or therapeutic protein (e.g., SARS-CoV-2 chimeric protein) from the self-amplifying RNA via the sub-genomic viral promoters. Such replication driven by the viral sub-genomic promoter using the viral non-structural proteins enhances the expression level of the encoded protein.
  • the viral non-structural proteins are from a single-stranded positive-sense RNA virus. In some embodiments, the viral non-structural proteins are from an Alphavirus, belonging to the Togaviridae family. In some embodiments, the alphavirus is Sindbis or Venezuelan equine encephalitis virus. In some embodiments, the viral non-structural protein is an RNA-dependent RNA polymerase (RdRp) polyprotein P1234 (also termed NSP1-4). Upon translation, P1234 is rapidly cleaved into P123 and nsP4 by autoproteolytic activity originating from the nsP2 (proteinase) portion of the polyprotein.
  • Alphaviral RNA synthesis occurs at the plasma membrane of a cell, where the nsPs, together with alphaviral RNA, form membrane invaginations (or “spherules”). These spherules contain dsRNA created by replication of “+” strand viral genomic RNA into “–” strand anti-genomic RNA. The “–” strand serves as a template from which additional “+” strand genomic RNA (synthesized from the 5′ UTR) or a shorter subsequence of the genomic RNA (termed subgenomic RNA) is synthesized from the subgenomic viral promoter region located near the end of the nonstructural protein ORF.
  • the “+” strand genomic RNA and the subgenomic RNA are exported out of the spherules into the cytoplasm where they are translated by endogenous ribosomes.
  • the exported “+” strand genomic RNA can associate with nsPs and form additional spherules, thus resulting in exponential increase of replicon RNA.
  • the viral non-structural proteins facilitate the replication of the nucleotide sequences encoding the SARS-CoV-2 protein via the subgenomic viral promoters (also referred to as “subgenomic promoters” herein).
  • a “subgenomic viral promoter” refers to a promoter that drives the transcription of subgenomic mRNAs.
  • an mRNA is transcribed from genomic DNAs and episomal DNAs (e.g., plasmids). Some viruses may transcribe subgenomic mRNAs from an RNA replicon that is produced from its genomic RNA. Many positive-sense RNA viruses produce subgenomic mRNAs as a common infection technique, and these mRNAs generally encode late viral genes. Subgenomic viral promoters range from 20 nucleotides (Sindbis virus) to over 100 nucleotides (Beet necrotic yellow vein virus) and are usually found upstream of the transcription start.
  • the subgenomic viral promoter is 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 nucleotides long, or longer.
  • Stabilizing elements may include, for example, a histone stem-loop.
  • a stem-loop binding protein (SLBP), a 32 kDa protein, has been identified. It is associated with the histone stem-loop at the 3′ end of the histone messages in both the nucleus and the cytoplasm.
  • the RNA binding domain of SLBP is conserved through metazoa and protozoa; its binding to the histone stem-loop depends on the structure of the loop.
  • the minimum binding site includes at least three nucleotides 5′ and two nucleotides 3′ relative to the stem-loop.
  • an RNA (e.g., mRNA) includes an open reading frame (coding region), a histone stem-loop, and optionally, a poly(A) sequence or polyadenylation signal.
  • the poly(A) sequence or polyadenylation signal generally should enhance the expression level of the encoded protein.
  • the encoded protein, in some embodiments, is not a histone protein, a reporter protein (e.g., Luciferase, GFP, EGFP, β-Galactosidase), or a marker or selection protein (e.g., alpha-Globin, Galactokinase and Xanthine:guanine phosphoribosyl transferase (GPT)).
  • an RNA includes the combination of a poly(A) sequence or polyadenylation signal and at least one histone stem-loop; even though the two represent alternative mechanisms in nature, they act synergistically to increase protein expression beyond the level observed with either element alone.
  • the synergistic effect of the combination of poly(A) and a histone stem-loop does not depend on the order of the elements or the length of the poly(A) sequence.
  • a histone downstream element (HDE) includes a purine-rich polynucleotide stretch of approximately 15 to 20 nucleotides 3′ of naturally-occurring stem-loops, representing the binding site for the U7 snRNA, which is involved in processing of histone pre-mRNA into mature histone mRNA.
  • the nucleic acid does not include an intron.
  • the histone stem-loop is generally derived from histone genes and includes an intramolecular base pairing of two neighbored partially or entirely reverse complementary sequences separated by a spacer, consisting of a short sequence, which forms the loop of the structure.
  • the unpaired loop region is typically unable to base pair with either of the stem-loop elements. The stem-loop occurs more often in RNA, as it is a key component of many RNA secondary structures, but may be present in single-stranded DNA as well. Stability of the stem-loop structure generally depends on the length, number of mismatches or bulges, and base composition of the paired region. In some embodiments, wobble base pairing (non-Watson-Crick base pairing) may result.
  • the at least one histone stem-loop sequence comprises a length of 15 to 45 nucleotides.
  • an RNA (e.g., mRNA) has one or more AU-rich sequences removed. These sequences, sometimes referred to as AURES, are destabilizing sequences found in the 3′ UTR. The AURES may be removed from the mRNA. Alternatively, the AURES may remain in the mRNA.
Sequence Modification
  • an open reading frame encoding a protein is codon optimized. Codon optimization methods are known in the art. An open reading frame of any one or more of the sequences may be codon optimized.
  • Codon optimization may be used to match codon frequencies in target and host organisms to ensure proper folding; bias GC content to increase RNA (e.g., mRNA) stability or reduce secondary structures; minimize tandem repeat codons or base runs that may impair gene construction or expression; customize transcriptional and translational control regions; insert or remove protein trafficking sequences; remove/add post translation modification sites in encoded protein (e.g., glycosylation sites); add, remove or shuffle protein domains; insert or delete restriction sites; modify ribosome binding sites and RNA (e.g., mRNA) degradation sites; adjust translational rates to allow the various domains of the protein to fold properly; or reduce or eliminate problem secondary structures within the polynucleotide.
  • Codon optimization tools, algorithms and services are known in the art – non-limiting examples include services from GeneArt (Life Technologies), DNA2.0 (Menlo Park CA) and/or proprietary methods.
  • the open reading frame sequence is optimized using optimization algorithms.
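One of the codon optimization goals listed above, biasing GC content without changing the encoded protein, can be illustrated with a short Python sketch. Picking the GC-richest synonymous codon for every position is a deliberately simple strategy assumed here for illustration; it is not the optimization algorithm of the disclosure, and the function names are hypothetical. Real tools balance many of the other listed criteria as well.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_count(codon: str) -> int:
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


# For each amino acid (and stop, "*"), keep the synonymous codon with the most G/C.
# Ties are broken by alphabetical codon order, which is an arbitrary choice.
GC_RICHEST = {}
for codon, aa in sorted(CODON_TABLE.items()):
    if aa not in GC_RICHEST or gc_count(codon) > gc_count(GC_RICHEST[aa]):
        GC_RICHEST[aa] = codon


def split_codons(orf: str) -> list:
    """Split an ORF (RNA or DNA alphabet) into RNA codons."""
    orf = orf.upper().replace("T", "U")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    """Translate an ORF into one-letter amino acids ("*" marks a stop)."""
    return "".join(CODON_TABLE[c] for c in split_codons(orf))


def gc_rich_recode(orf: str) -> str:
    """Replace every codon with its GC-richest synonym."""
    return "".join(GC_RICHEST[CODON_TABLE[c]] for c in split_codons(orf))
```

Because every replacement is synonymous, `translate(gc_rich_recode(orf)) == translate(orf)` holds for any valid ORF.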
  • a codon optimized sequence shares less than 95% sequence identity to a naturally-occurring or wild-type sequence open reading frame (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein antigen).
  • a codon optimized sequence shares less than 90% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon optimized sequence shares less than 85% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein).
  • a codon optimized sequence shares less than 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon optimized sequence shares less than 75% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein).
  • a codon optimized sequence shares between 65% and 85% (e.g., between about 67% and about 85% or between about 67% and about 80%) sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein).
  • a codon optimized sequence shares between 65% and 75% or about 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein).
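The identity thresholds above can be made concrete with a small calculation. The sketch below assumes an ungapped, position-by-position comparison of two equal-length open reading frames; the text here does not prescribe an alignment method, so that choice, the function name, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for ungapped identity")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy ORFs that both encode Met-Ala-Lys-Leu (hypothetical, not from the disclosure).
wild_type = "AUGGCUAAAUUA"
optimized = "AUGGCCAAGCUG"
identity = percent_identity(wild_type, optimized)  # 8 of 12 positions match
```

In this toy case 8 of 12 positions match, so the recoded sequence shares less than 75% identity with the original while encoding the same peptide.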
  • a codon-optimized sequence encodes an antigen that is as immunogenic as, or more immunogenic than (e.g., at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 100%, or at least 200% more), a SARS-CoV-2 protein encoded by a non-codon-optimized sequence.
  • When transfected into mammalian host cells, the modified mRNAs have a stability of between 12 and 18 hours, or greater than 18 hours (e.g., 24, 36, 48, 60, 72, or greater than 72 hours), and are capable of being expressed by the mammalian host cells.
  • a codon optimized RNA may be one in which the levels of G/C are enhanced.
  • the G/C-content of nucleic acid molecules may influence the stability of the RNA.
  • RNA (e.g., mRNA) having an increased amount of guanine (G) and/or cytosine (C) residues may be functionally more stable than RNA containing a large amount of adenine (A) and thymine (T) or uracil (U) nucleotides.
  • WO02/098443 discloses a pharmaceutical composition containing an RNA (e.g., mRNA) stabilized by sequence modifications in the translated region.
  • Some embodiments of mRNAs comprise a sequence with a %G/C content of 30%–80%, 40%–70%, 50%–60%, 35%–50%, 50%–65%, 65%–70%, 40%–45%, 45%–50%, 50%–55%, 55%–70%, 70%–75%, or 75%–80%.
  • the nucleic acid sequence of the full-length mRNA comprises a %G/C content of 30%–80%, 40%–70%, 50%–60%, 35%–50%, 50%–65%, 65%–70%, 40%–45%, 45%–50%, 50%–55%, 55%–70%, 70%–75%, or 75%–80%.
  • the mRNA comprises an ORF with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
  • the mRNA comprises a 5′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
  • the mRNA comprises a 3′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
  • a modified mRNA comprises a higher %G/C content than a wild-type mRNA sequence.
  • the %G/C content of the modified mRNA sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more higher than the %G/C content of the wild-type RNA sequence.
  • the %G/C content of the modified ORF sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more higher than the %G/C content of the wild-type ORF sequence.
  • the %G/C content of the modified 5′ UTR sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more higher than the %G/C content of the wild-type 5′ UTR sequence.
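The %G/C figures above reduce to a simple count. The following Python sketch computes %G/C for a sequence and the difference between a modified and a wild-type sequence. It assumes the "X% or more" comparisons are read as absolute percentage points, which is an interpretation rather than something this passage states; the function names and toy sequences are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_gain(modified: str, wild_type: str) -> float:
    """Difference in %G/C (modified minus wild type), in percentage points."""
    return gc_percent(modified) - gc_percent(wild_type)


# Toy ORFs encoding the same peptide (hypothetical, not from the disclosure).
wild_type = "AUGGCUAAAUUA"   # 3 of 12 residues are G or C
modified = "AUGGCCAAGCUG"    # 7 of 12 residues are G or C
```

The same function can be applied separately to the full-length mRNA, the ORF, the 5′ UTR, or the 3′ UTR, matching the region-by-region ranges recited above.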
Chemically Modified Nucleotides
  • the compositions comprise, in some embodiments, an RNA having an open reading frame encoding a protein, wherein the nucleic acid comprises nucleotides and/or nucleosides that can be standard (unmodified) or modified as is known in the art.
  • nucleotides and nucleosides comprise modified nucleotides or nucleosides.
  • modified nucleotides and nucleosides can be naturally-occurring modified nucleotides and nucleosides or non-naturally-occurring modified nucleotides and nucleosides.
  • modifications can include those at the sugar, backbone, or nucleobase portion of the nucleotide and/or nucleoside as are recognized in the art.
  • a naturally-occurring modified nucleotide or nucleoside is one as is generally known or recognized in the art.
  • Non-limiting examples of such naturally-occurring modified nucleotides and nucleosides can be found, inter alia, in the widely recognized MODOMICS database.
  • a non-naturally-occurring modified nucleotide or nucleoside is one as is generally known or recognized in the art.
  • Non-limiting examples of such non-naturally-occurring modified nucleotides and nucleosides can be found, inter alia, in international publication numbers WO2013052523A1; WO2014093924A1; WO2015051173A2; WO2015051169A2; WO2015089511A2; or WO2017153936A1, each of which is herein incorporated by reference.
  • nucleic acids can comprise standard nucleotides and nucleosides, naturally-occurring nucleotides and nucleosides, non-naturally-occurring nucleotides and nucleosides, or any combination thereof.
  • nucleic acids, in some embodiments, comprise various (more than one) different types of standard and/or modified nucleotides and nucleosides.
  • a particular region of a nucleic acid contains one, two or more (optionally different) types of standard and/or modified nucleotides and nucleosides.
  • a modified RNA, introduced into a cell or organism, exhibits reduced degradation in the cell or organism, respectively, relative to an unmodified nucleic acid comprising standard nucleotides and nucleosides.
  • a modified RNA, introduced into a cell or organism, may exhibit reduced immunogenicity in the cell or organism, respectively (e.g., a reduced innate response), relative to an unmodified nucleic acid comprising standard nucleotides and nucleosides.
  • Nucleic acids, in some embodiments, comprise non-natural modified nucleotides that are introduced during synthesis or post-synthesis of the nucleic acids to achieve desired functions or properties.
  • the modifications may be present on internucleotide linkages, purine or pyrimidine bases, or sugars.
  • the modification may be introduced with chemical synthesis or with a polymerase enzyme at the terminal of a chain or anywhere else in the chain. Any of the regions of a nucleic acid may be chemically modified.
  • Modified nucleosides and nucleotides may be present in a nucleic acid (e.g., RNA nucleic acids, such as mRNA nucleic acids).
  • a “nucleoside” refers to a compound containing a sugar molecule (e.g., a pentose or ribose) or a derivative thereof in combination with an organic base (e.g., a purine or pyrimidine) or a derivative thereof (also referred to herein as “nucleobase”).
  • a “nucleotide” refers to a nucleoside, including a phosphate group. Modified nucleotides may be synthesized by any useful method, such as, for example, chemically, enzymatically, or recombinantly, to include one or more modified or non-natural nucleosides.
  • Nucleic acids can comprise a region or regions of linked nucleosides.
  • Such regions may have variable backbone linkages.
  • the linkages can be standard phosphodiester linkages, in which case the nucleic acids would comprise regions of nucleotides.
  • Modified nucleotide base pairing encompasses not only the standard adenosine-thymine, adenosine-uracil, or guanosine-cytosine base pairs, but also base pairs formed between nucleotides and/or modified nucleotides comprising non-standard or modified bases, wherein the arrangement of hydrogen bond donors and hydrogen bond acceptors permits hydrogen bonding between a non-standard base and a standard base or between two complementary non-standard base structures, such as, for example, in those nucleic acids having at least one chemical modification.
  • modified nucleobases in nucleic acids comprise 1-methyl-pseudouridine (m1ψ), 1-ethyl-pseudouridine (e1ψ), 5-methoxy-uridine (mo5U), 5-methyl-cytidine (m5C), 5-methyl-uridine (m5U), and/or pseudouridine (ψ).
  • modified nucleobases in nucleic acids comprise 5-methyluridine, 5-methoxymethyl uridine, 5-methylthio uridine, 1-methoxymethyl pseudouridine, 5-methyl cytidine, and/or 5-methoxy cytidine.
  • the polyribonucleotide includes a combination of at least two (e.g., 2, 3, 4 or more) of any of the aforementioned modified nucleobases, including but not limited to chemical modifications.
  • an mRNA comprises 1-methyl-pseudouridine (m1ψ) substitutions at one or more or all uridine positions of the nucleic acid. In some embodiments, an mRNA comprises 5-methyl-uridine (5mU) substitutions at one or more or all uridine positions of the nucleic acid. In some embodiments, an mRNA comprises 5-methyl-uridine (5mU) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid.
  • an mRNA comprises 1-methyl-pseudouridine (m1ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid.
  • an mRNA comprises 5-methyl-uridine (5mU) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid.
  • an mRNA comprises pseudouridine (ψ) substitutions at one or more or all uridine positions of the nucleic acid.
  • an mRNA comprises pseudouridine (ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid.
  • an mRNA comprises unmodified uridine at one or more or all uridine positions of the nucleic acid.
  • mRNAs are uniformly modified (e.g., fully modified, modified throughout the entire sequence) for a particular modification.
  • a nucleic acid can be uniformly modified with 1-methyl-pseudouridine, meaning that all uridine residues in the mRNA sequence are replaced with 1-methyl-pseudouridine.
  • nucleic acid can be uniformly modified for any type of nucleoside residue present in the sequence by replacement with a modified residue such as those set forth above.
  • the nucleic acids may be partially or fully modified along the entire length of the molecule.
  • one or more or all or a given type of nucleotide e.g., purine or pyrimidine, or any one or more or all of A, G, U, C
  • all nucleotides X in a nucleic acid are modified nucleotides, wherein X may be any one of nucleotides A, G, U, C, or any one of the combinations A+G, A+U, A+C, G+U, G+C, U+C, A+G+U, A+G+C, G+U+C or A+G+C.
  • the nucleic acid may contain from about 1% to about 100% modified nucleotides (either in relation to overall nucleotide content, or in relation to one or more types of nucleotide, i.e., any one or more of A, G, U or C) or any intervening percentage (e.g., from 1% to 20%, from 1% to 25%, from 1% to 50%, from 1% to 60%, from 1% to 70%, from 1% to 80%, from 1% to 90%, from 1% to 95%, from 10% to 20%, from 10% to 25%, from 10% to 50%, from 10% to 60%, from 10% to 70%, from 10% to 80%, from 10% to 90%, from 10% to 95%, from 10% to 100%, from 20% to 25%, from 20% to 50%, from 20% to 60%, from 20% to 70%, from 20% to 80%, from 20% to 90%, from 20% to 95%, from 20% to 100%, from 50% to 60%, from 50% to 70%, from 50% to 80%, from 50% to 90%, from 50% to 95%, from 50% to 100%, from 70% to
  • the mRNAs may contain at a minimum 1% and at maximum 100% modified nucleotides, or any intervening percentage, such as at least 5% modified nucleotides, at least 10% modified nucleotides, at least 25% modified nucleotides, at least 50% modified nucleotides, at least 80% modified nucleotides, or at least 90% modified nucleotides.
  • the nucleic acids may contain a modified pyrimidine such as a modified uracil or cytosine.
  • At least 5%, at least 10%, at least 25%, at least 50%, at least 80%, at least 90% or 100% of the uracil in the nucleic acid is replaced with a modified uracil (e.g., a 5-substituted uracil).
  • the modified uracil can be replaced by a compound having a single unique structure or can be replaced by a plurality of compounds having different structures (e.g., 2, 3, 4 or more unique structures).
  • cytosine in the nucleic acid is replaced with a modified cytosine (e.g., a 5-substituted cytosine).
  • the modified cytosine can be replaced by a compound having a single unique structure or can be replaced by a plurality of compounds having different structures (e.g., 2, 3, 4 or more unique structures).
  • Modified nucleotides may include modified nucleobases.
  • an RNA transcript may include a modified uracil nucleobase selected from pseudouracil (ψ), N1-methylpseudouracil (m1ψ), 1-ethylpseudouracil, 2-thiouracil, 4′-thiouracil, 2-thio-1-methyl-1-deaza-pseudouracil, 2-thio-1-methyl-pseudouracil, 2-thio-5-aza-uracil, 2-thio-dihydropseudouracil, 2-thio-dihydrouracil, 2-thio-pseudouracil, 4-methoxy-2-thio-pseudouracil, 4-methoxy-pseudouracil, 4-thio-1-methyl-pseudouracil, 4-thio-pseudouracil, 5-aza-uracil, dihydropseudouracil, 5-methyluracil, 5-methyluracil,
  • an RNA transcript (e.g., mRNA transcript) includes a modified guanine nucleobase selected from digoxigeninated guanine, 6-thioguanine, 7-deazaguanine, 7-deaza-7-propargylaminoguanine, 8-oxoguanine, araguanine, biotin-16-7-deaza-7-propargylaminoguanine, isoguanine, N2-methylguanine, O6-methylguanine, thienoguanine, and 2,6-diaminoguanine.
  • an RNA transcript may include a modified cytosine nucleobase selected from digoxigeninated cytosine, 2-thiocytosine, 5-aminoallylcytosine, 5-bromocytosine, 5-carboxycytosine, 5-formylcytosine, 5-hydroxycytosine, 5-hydroxymethylcytosine, 5-methoxycytosine, 5-methylcytosine, 5-propargylaminocytosine, 5-propynylcytosine, 6-azacytosine, aracytosine, cyanine 3-5-propargylaminocytosine, cyanine 3-aminoallylcytosine, cyanine 5-6-propargylaminocytosine, cyanine 5-aminoallylcytosine, desthiobiotin-6-aminoallylcytosine, N4-biotin-OBEA-cytosine, N4-methylcytosine, pseudoisocytosine, and thienocytosine.
  • an RNA transcript (e.g., mRNA transcript) includes a modified adenine nucleobase selected from digoxigeninated adenine, N6-methyladenine, 7-deazaadenine, 7-deaza-7-propargylaminoadenine, 8-azaadenine, 8-azidoadenine, 8-chloroadenine, 8-oxoadenine, araadenine, N1-methyladenine, N6-methyladenine, 3-deazaadenine, 2,6-diaminoadenine, 2-methyl-thio-N6-isopentenyladenine (ms2i6A), 2-methylthio-N6-methyladenine (ms2m6A), N6-(cis-hydroxyisopentenyl)adenine (io6A), 2-methylthio-N6-(cis-hydroxyisopentenyl)adenine (ms2io6A), N6-gly
  • an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified nucleobases.
  • Modified nucleotides may include modified sugars.
  • an RNA transcript may include a modified sugar selected from 2′-thioribose, 2′,3′-dideoxyribose, 2′-amino-2′-deoxyribose, 2′ deoxyribose, 2′-azido-2′-deoxyribose, 2′-fluoro-2′-deoxyribose, 2′-O-methylribose, 2′-O-methyldeoxyribose, 3′-amino-2′,3′-dideoxyribose, 3′-azido-2′,3′-dideoxyribose, 3′-deoxyribose, 3′-O-(2-nitrobenzyl)-2′-deoxyribose, 3′-O-methylribose, 5′-aminoribose, 5′-thioribose, 5-nitro-1-indolyl-2′-deoxyribose, 5
  • an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified sugars.
  • Modified nucleotides may include modified phosphates.
  • a modified phosphate group is a phosphate group that differs from the canonical structure of phosphate.
  • An example of a canonical phosphate is shown below: [chemical structure not reproduced], where R5 and R3 are atoms or molecules to which the canonical phosphate is bonded.
  • R5 may refer to the upstream nucleotide of the nucleic acid
  • R3 may refer to the downstream nucleotide of the nucleic acid.
  • the canonical structure of phosphate also refers to structures in which one or more hydroxyl groups of the phosphate are deprotonated, or in which an oxygen atom of the phosphate is bonded to an adjacent nucleotide in a nucleic acid sequence.
  • an RNA transcript may include a modified phosphate selected from phosphorothioate (PS), thiophosphate, 5′-O-methylphosphonate, 3′-O-methylphosphonate, 5′-hydroxyphosphonate, hydroxyphosphanate, phosphoroselenoate, selenophosphate, phosphoramidate, carbophosphonate, methylphosphonate, phenylphosphonate, ethylphosphonate, H-phosphonate, guanidinium ring, triazole ring, boranophosphate (BP), methylphosphonate, and guanidinopropyl phosphoramidate.
  • an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified phosphates.
  • an mRNA includes N1-methylpseudouridine.
  • at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of uracil nucleotides in an mRNA comprise N1-methylpseudouridine.
  • each uracil nucleotide of an mRNA transcript comprises N1-methylpseudouridine.
  • an mRNA includes 5-methylcytidine. In some embodiments, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of cytosine nucleotides in an mRNA comprise 5-methylcytidine. In some embodiments, each cytosine nucleotide of an mRNA transcript comprises 5-methylcytidine. In some embodiments, an mRNA includes 5-methyluridine. In some embodiments, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of uracil nucleotides in an mRNA comprise 5-methyluridine.
  • each uracil nucleotide of an mRNA transcript comprises 5-methyluridine.
  • an mRNA includes 5-methylcytidine and 5-methyluridine.
  • at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of uracil nucleotides in an mRNA comprise 5-methyluridine and at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of cytosine nucleotides in an mRNA comprise 5-methylcytidine.
  • each cytosine nucleotide of an mRNA transcript comprises 5-methylcytidine and each uracil nucleotide of an mRNA transcript comprises 5-methyluridine.
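The "uniformly modified" and "at least N% of uracil nucleotides" language above amounts to counting positions. The Python sketch below models an mRNA as a list of residue tokens and reports what percentage of uridine positions carry a modification. The token `"m1psi"` (standing for N1-methylpseudouridine), the list representation, and the function names are hypothetical conventions chosen for this example only.

```python
def uniformly_modify(residues, standard="U", modified="m1psi"):
    """Replace every occurrence of one standard residue with its modified form."""
    return [modified if r == standard else r for r in residues]


def percent_modified(residues, standard="U", modified="m1psi"):
    """Percentage of positions of one nucleotide type that carry the modification."""
    positions = [r for r in residues if r in (standard, modified)]
    if not positions:
        raise ValueError("sequence contains no positions of the requested type")
    return 100.0 * sum(r == modified for r in positions) / len(positions)


unmodified = list("AUGUUC")                          # three uridine positions
fully_modified = uniformly_modify(unmodified)        # all three replaced
partially_modified = ["A", "m1psi", "G", "U", "m1psi", "C"]  # two of three replaced
```

Passing `standard="C"` with a token for 5-methylcytidine gives the corresponding figure for cytidine positions, so combined modifications can be checked one nucleotide type at a time.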
  • nucleotides and nucleosides comprise standard nucleoside residues such as those present in transcribed RNA (e.g., A, G, C, or U).
  • nucleotides and nucleosides comprise standard deoxyribonucleosides such as those present in DNA (e.g., dA, dG, dC, or dT).
In Vitro Transcription
  • cDNA encoding the polynucleotides may be transcribed using an in vitro transcription (IVT) system.
  • In vitro transcription of RNA is known in the art and is described in International Publication WO 2014/152027, which is incorporated by reference herein in its entirety.
  • the RNA is prepared in accordance with any one or more of the methods described in WO 2018/053209 and WO 2019/036682, each of which is incorporated by reference herein.
  • the RNA transcript is generated using a non-amplified, linearized DNA template in an in vitro transcription reaction to generate the RNA transcript.
  • the template DNA is isolated DNA.
  • the template DNA is cDNA.
  • the cDNA is formed by reverse transcription of an RNA polynucleotide, for example, but not limited to, influenza virus mRNA.
  • cells, e.g., bacterial cells, e.g., E. coli, e.g., DH-1 cells, are transfected with the plasmid DNA template.
  • the transfected cells are cultured to replicate the plasmid DNA which is then isolated and purified.
  • the DNA template includes an RNA polymerase promoter, e.g., a T7 promoter, located 5′ to and operably linked to the gene of interest.
  • an in vitro transcription template encodes a 5′ untranslated region (UTR), contains an open reading frame, and encodes a 3′ UTR and a poly(A) tail.
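The template layout described above (promoter located 5′ to the gene of interest, followed by 5′ UTR, open reading frame, 3′ UTR, and poly(A) tail) can be sketched as a small data structure. This is an illustrative Python sketch only: the class name, the field names, the validation rules, and the placeholder UTR and ORF sequences in the usage note are hypothetical, and the promoter string is the commonly cited T7 consensus rather than a sequence taken from this document.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class IvtTemplate:
    """Sense-strand layout of a linear DNA template for in vitro transcription."""
    promoter: str        # RNA polymerase promoter, placed 5' of everything else
    utr5: str            # encoded 5' untranslated region
    orf: str             # open reading frame, start codon through stop codon
    utr3: str            # encoded 3' untranslated region
    poly_a_length: int   # number of adenosines in the encoded poly(A) tail

    def __post_init__(self):
        # Basic sanity checks on the coding region (illustrative rules only).
        if len(self.orf) % 3 != 0 or not self.orf.startswith("ATG"):
            raise ValueError("ORF must start with ATG and be a multiple of 3 long")
        if self.orf[-3:] not in STOP_CODONS:
            raise ValueError("ORF must end with a stop codon")
        if self.poly_a_length < 0:
            raise ValueError("poly(A) length must be non-negative")

    def sense_strand(self) -> str:
        """Concatenate the elements in 5'-to-3' order."""
        return (self.promoter + self.utr5 + self.orf + self.utr3
                + "A" * self.poly_a_length)
```

A usage example with placeholder elements: `IvtTemplate("TAATACGACTCACTATAG", "GGGAAATAAGAGAG", "ATGGCCAAGCTGTAA", "TGATAATAG", 100).sense_strand()` returns a string that starts with the promoter and ends with 100 adenosines.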
  • An in vitro transcription system typically comprises a transcription buffer, nucleotide triphosphates (NTPs), an RNase inhibitor and a polymerase.
  • the NTPs may be manufactured in house, may be selected from a supplier, or may be synthesized.
  • the NTPs may be selected from, but are not limited to, those including natural and unnatural (modified) NTPs. Any number of RNA polymerases or variants may be used in the method.
  • the polymerase may be selected from, but is not limited to, a phage RNA polymerase, e.g., a T7 RNA polymerase, a T3 RNA polymerase, an SP6 RNA polymerase, and/or mutant polymerases such as, but not limited to, polymerases able to incorporate modified nucleic acids and/or modified nucleotides, including chemically modified nucleic acids and/or nucleotides. Some embodiments exclude the use of DNase.
  • the RNA transcript is capped via enzymatic capping.
  • the RNA comprises a 5′ terminal cap, for example, 7mG(5’)ppp(5’)NlmpNp.
  • the RNA polymerase is an RNA polymerase variant, such as those described in WO 2020/172239, incorporated herein by reference in its entirety. RNA polymerase variants include at least one amino acid substitution, relative to the wild type (WT) RNA polymerase.
  • a WT T7 RNA polymerase is represented by SEQ ID NO: 83: MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVA DNAAAKPLITTLLPKMIARINDWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNT TVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVEEQLNKRVGHVYKKAFMQVVEADMLSKGL LGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELAPEYAEAIATRAGA LAGISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQ NTAWKINKKVLAVANVITKWKHCPVEDIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKD KARKSRRISLEFMLEQANKFANHKAI
  • the RNA polymerase variant is a T7 RNA polymerase variant comprising at least one (one or more) amino acid substitution relative to WT RNA polymerase (e.g., WT T7 RNA polymerase having an amino acid sequence of SEQ ID NO: 83).
  • an RNA polymerase variant comprises an RNA polymerase that includes an (at least one) amino acid modification that causes a loop structure of the RNA polymerase variant to undergo a conformational change to a helix structure as the RNA polymerase variant transitions from an initiation complex to an elongation complex.
  • the amino acid modification is an amino acid substitution at one or more of positions 42, 43, 44, 45, 46, and 47, relative to the wild-type RNA polymerase, wherein the wild-type RNA polymerase comprises the amino acid sequence of SEQ ID NO: 83.
  • the amino acid substitution, in some embodiments, is a high propensity amino acid substitution.
  • an RNA polymerase variant comprises an RNA polymerase that includes an additional C-terminal amino acid, relative to the wild-type RNA polymerase.
  • the additional C-terminal amino acid, in some embodiments, is selected from glycine, alanine, threonine, proline, glutamine, and serine.
  • the additional C-terminal amino acid (e.g., at position 884 relative to wild-type RNA polymerase comprising the amino acid sequence of SEQ ID NO: 83) is glycine.
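The two kinds of modification described for the polymerase variant (a substitution at one of positions 42 to 47, and an additional C-terminal residue) can be expressed as simple string operations. The following is a minimal sketch, not the applicant's method. It assumes 1-based residue numbering, uses only the first 50 residues printed for SEQ ID NO: 83 as a stand-in for the full-length enzyme, and the helper names (`substitute`, `add_c_terminal`) and the glycine-to-alanine change at position 47 are illustrative choices.

```python
# First 50 residues of SEQ ID NO: 83 as printed in this section.
# Stand-in only: a real use would pass the full-length wild-type sequence.
WT_PREFIX = "MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEAR"

# One-letter codes for glycine, alanine, threonine, proline, glutamine, serine.
ALLOWED_C_TERMINAL = set("GATPQS")

# Positions 42-47 (1-based), the substitution window named in the text.
LOOP_POSITIONS = range(42, 48)


def substitute(seq, position, new_residue):
    """Return seq with the residue at a 1-based loop position replaced."""
    if position not in LOOP_POSITIONS:
        raise ValueError("position must be one of 42-47")
    if position > len(seq):
        raise ValueError("sequence is shorter than the requested position")
    return seq[: position - 1] + new_residue + seq[position:]


def add_c_terminal(seq, residue="G"):
    """Return seq with one additional C-terminal residue appended."""
    if residue not in ALLOWED_C_TERMINAL:
        raise ValueError("residue must be one of G, A, T, P, Q, S")
    return seq + residue


if __name__ == "__main__":
    print("WT residues 42-47:", WT_PREFIX[41:47])
    variant = add_c_terminal(substitute(WT_PREFIX, 47, "A"), "G")  # hypothetical variant
    print("Variant residues 42-47:", variant[41:47])
    print("Length change:", len(variant) - len(WT_PREFIX))
```

Applied to a full-length 883-residue wild-type sequence, `add_c_terminal` would place the new residue at position 884, matching the numbering used in the text.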
  • Co-transcriptional capping methods may also be used for ribonucleic acid (RNA) synthesis, using an RNA polymerase variant. That is, RNA is produced in a “one-pot” reaction, without the need for a separate capping reaction.
  • the methods, in some embodiments, comprise reacting a polynucleotide template with an RNA polymerase variant, nucleoside triphosphates, and a cap analog under in vitro transcription reaction conditions to produce an RNA transcript.
  • compositions may include RNA (e.g., mRNA) or multiple RNAs (e.g., mRNAs) encoding two or more antigens of the same or different species.
  • composition includes an RNA (e.g., mRNA) or multiple RNAs (e.g., mRNAs) encoding two or more proteins.
  • the RNA may encode 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more proteins.
  • two or more different RNA (e.g., mRNA) encoding antigens may be formulated in the same lipid nanoparticle.
  • two or more different RNA (e.g., mRNA) encoding antigens may be formulated in separate lipid nanoparticles (each RNA (e.g., mRNA) formulated in a single lipid nanoparticle). Lipid nanoparticles may then be combined and administered as a single vaccine composition (e.g., comprising multiple RNAs (e.g., mRNAs) encoding multiple antigens) or may be administered separately.
Identification and Ratio Determination (IDR) Sequences
  • In some embodiments, one or more nucleic acids comprise an Identification and Ratio Determination sequence.
  • An Identification and Ratio Determination (IDR) sequence is a sequence of a biological molecule (e.g., nucleic acid or protein) that, when combined with the sequence of a target biological molecule, serves to identify the target biological molecule.
  • an IDR sequence is a heterologous sequence that is incorporated within or appended to a sequence of a target biological molecule and can be used as a reference to identify the target molecule.
  • a nucleic acid (e.g., mRNA) comprises a target sequence of interest (e.g., a coding sequence encoding a therapeutic and/or antigenic peptide or protein) and a unique IDR sequence.
  • RNA species may comprise an IDR sequence that differs from the IDR sequence of other RNA species (e.g., RNA(s) having different coding sequence(s)).
  • Each IDR sequence thus identifies a particular RNA species, and so the abundance of IDR sequences may be measured to determine the abundance of each RNA species in a composition.
  • Use of distinct IDR sequences to identify RNA species allows for analysis of multivalent RNA compositions (e.g., containing multiple RNA species) containing RNA species with similar coding sequences and/or lengths, which could otherwise be difficult to distinguish using PCR- or chromatography-based analysis of full-length RNAs.
  • Each RNA species in a multivalent RNA composition may comprise an IDR sequence that is not a sequence isomer of an IDR sequence of another RNA species in the multivalent RNA composition (i.e., the IDR sequence does not have the same number of adenosine nucleotides, the same number of cytosine nucleotides, the same number of guanine nucleotides, and the same number of uracil nucleotides as another IDR sequence in the composition, even if the order of the nucleotides differs).
  • Having identical nucleotide compositions causes sequence isomers to have the same mass, presenting a challenge to distinguishing sequence isomers using mass-based identification methods (e.g., mass spectrometry).
  • Each RNA species in a multivalent RNA composition may comprise an IDR sequence having a mass that differs from the mass of IDR sequences of each other RNA species in a multivalent RNA composition.
  • the mass of each IDR sequence may differ from the mass of other IDR sequences by at least 9 Da, at least 25 Da, or at least 50 Da.
  • Use of IDR sequences with distinct masses allows RNA fragments comprising different IDR sequences to be distinguished using mass-based analysis methods (e.g., mass spectrometry), which do not require reverse transcription, amplification, or sequencing of RNAs.
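The isomer and mass criteria can be checked directly on the example IDR sequences given in this section (SEQ ID NOs: 56 to 60). The sketch below is illustrative only. The per-residue masses are approximate average values for RNA nucleotide residues within a chain (an assumption, not taken from the source), terminal groups are ignored, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Approximate average masses (Da) of RNA nucleotide residues within a chain.
# Assumed textbook values; terminal groups and modifications are ignored.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}

# Example IDR sequences listed in the source (SEQ ID NOs: 56-60).
IDR_SEQUENCES = {
    "SEQ ID NO: 56": "GAGAUUGAGUGUAGUGACUAG",
    "SEQ ID NO: 57": "GAGAUUGAGUGUAGUGAC",
    "SEQ ID NO: 58": "GAGAUUGAGUGUAGUG",
    "SEQ ID NO: 59": "GAUUGAGACUACGGG",
    "SEQ ID NO: 60": "CAUAGACACUACG",
}


def composition(seq):
    """Return the (A, C, G, U) counts; equal tuples indicate sequence isomers."""
    counts = Counter(seq)
    return tuple(counts[base] for base in "ACGU")


def approx_mass(seq):
    """Sum of approximate residue masses for an RNA sequence."""
    return sum(RESIDUE_MASS[base] for base in seq)


def has_sequence_isomers(seqs):
    """True if any two sequences share the same nucleotide composition."""
    comps = [composition(s) for s in seqs]
    return len(set(comps)) < len(comps)


def min_pairwise_mass_gap(seqs):
    """Smallest absolute mass difference between any two sequences."""
    return min(abs(approx_mass(a) - approx_mass(b)) for a, b in combinations(seqs, 2))


if __name__ == "__main__":
    seqs = list(IDR_SEQUENCES.values())
    for name, seq in IDR_SEQUENCES.items():
        print(name, composition(seq), round(approx_mass(seq), 2))
    print("Any sequence isomers:", has_sequence_isomers(seqs))
    print("Smallest mass gap (Da):", round(min_pairwise_mass_gap(seqs), 2))
```

Because these five sequences also differ in length, their mass gaps are far larger than the thresholds named in the text. The composition check matters most when candidate IDR sequences have the same length.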
  • Each RNA species in an RNA composition may comprise an IDR sequence with a different length.
  • each IDR sequence may have a length independently selected from 0 to 25 nucleotides.
  • the length of a nucleic acid influences the rate at which the nucleic acid traverses a chromatography column, and so the use of IDR sequences of different lengths on different RNA species allows RNA fragments having different IDR sequences to be distinguished using chromatography-based methods (e.g., LC-UV).
  • IDR sequences may be chosen such that no IDR sequence comprises a start codon, ‘AUG’. Lack of a start codon in an IDR sequence prevents undesired translation of nucleotide sequences within and/or downstream from the IDR sequence.
  • IDR sequences may be chosen such that no IDR sequence comprises a recognition site for a restriction enzyme.
  • no IDR sequence comprises a recognition site for XbaI, ‘UCUAG’.
  • Lack of a recognition site for a restriction enzyme (e.g., XbaI recognition site ‘UCUAG’) allows the restriction enzyme to be used in generating and modifying a DNA template for in vitro transcription, without affecting the IDR sequence or sequence of the transcribed RNA.
  • Non-limiting examples of distinct IDR sequences include: GAGAUUGAGUGUAGUGACUAG (SEQ ID NO: 56), GAGAUUGAGUGUAGUGAC (SEQ ID NO: 57), GAGAUUGAGUGUAGUG (SEQ ID NO: 58), GAUUGAGACUACGGG (SEQ ID NO: 59), and CAUAGACACUACG (SEQ ID NO: 60).
  • each mRNA encoding a distinct protein comprises a 3′ UTR comprising a distinct IDR sequence selected from SEQ ID NOs: 56–60.
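The design rules stated for IDR sequences (no ‘AUG’ start codon, no restriction-site motif, distinct lengths, length up to 25 nucleotides) lend themselves to an automated screen. The following is a minimal sketch under stated assumptions: the XbaI motif is taken exactly as written in the text (‘UCUAG’), and the function names and report format are hypothetical.

```python
# Motifs that the text says an IDR sequence should not contain.
# The XbaI motif is used as written in the text ('UCUAG').
FORBIDDEN_MOTIFS = {
    "start codon AUG": "AUG",
    "XbaI motif UCUAG": "UCUAG",
}

# Example IDR sequences from the source (SEQ ID NOs: 56-60).
EXAMPLE_IDRS = [
    "GAGAUUGAGUGUAGUGACUAG",
    "GAGAUUGAGUGUAGUGAC",
    "GAGAUUGAGUGUAGUG",
    "GAUUGAGACUACGGG",
    "CAUAGACACUACG",
]


def forbidden_motifs(seq):
    """Return the names of all forbidden motifs found in seq."""
    return [name for name, motif in FORBIDDEN_MOTIFS.items() if motif in seq]


def screen_idr_set(seqs, max_len=25):
    """Return a list of (sequence, problem) tuples; empty if the set passes."""
    problems = []
    for seq in seqs:
        for name in forbidden_motifs(seq):
            problems.append((seq, "contains " + name))
        if len(seq) > max_len:
            problems.append((seq, "longer than %d nt" % max_len))
    lengths = [len(seq) for seq in seqs]
    if len(set(lengths)) != len(lengths):
        problems.append(("<set>", "two or more sequences share a length"))
    return problems


if __name__ == "__main__":
    print("Problems in example set:", screen_idr_set(EXAMPLE_IDRS))
    print("Problems in bad set:", screen_idr_set(["GAUGC", "ACGUA"]))
```

The distinct-length check reflects the embodiment in which each RNA species carries an IDR sequence of a different length, so that fragments can be resolved by chromatography. It can be dropped when only mass-based identification is intended.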
Nucleic Acid Production
Chemical Synthesis
  • Nucleic acids may be manufactured in whole or in part using solid phase techniques.
  • Solid-phase chemical synthesis of nucleic acids is an automated method wherein molecules are immobilized on a solid support and synthesized step by step in a reactant solution. Solid-phase synthesis is useful in site-specific introduction of chemical modifications in the nucleic acid sequences.
  • the synthesis of nucleic acids by the sequential addition of monomer building blocks may be carried out in a liquid phase.
  • the synthetic methods discussed above each have their own advantages and limitations. Attempts have been made to combine these methods to overcome the limitations. Such combinations of methods are also suitable.
  • the use of solid-phase or liquid-phase chemical synthesis in combination with enzymatic ligation provides an efficient way to generate long chain nucleic acids that cannot be obtained by chemical synthesis alone.
Ligation
  • Assembly of nucleic acids by a ligase may also be used.
  • DNA or RNA ligases promote intermolecular ligation of the 5’ and 3’ ends of polynucleotide chains through the formation of a phosphodiester bond.
  • Nucleic acids such as chimeric polynucleotides and/or circular nucleic acids may be prepared by ligation of one or more regions or subregions. DNA fragments can be joined by a ligase catalyzed reaction to create recombinant DNA with different functions. Two oligodeoxynucleotides, one with a 5’ phosphoryl group and another with a free 3’ hydroxyl group, serve as substrates for a DNA ligase.
  • nucleic acid clean-up may be performed by methods known in the art such as, but not limited to, AGENCOURT® beads (Beckman Coulter Genomics, Danvers, MA), poly-T beads, LNA™ oligo-T capture probes (EXIQON® Inc, Vedbaek, Denmark) or HPLC-based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC).
  • the term “purified,” when used in relation to a nucleic acid, such as a “purified nucleic acid,” refers to one that is separated from at least one contaminant.
  • a “contaminant” is any substance that makes another unfit, impure or inferior.
  • a purified nucleic acid e.g., DNA and RNA
  • a quality assurance and/or quality control check may be conducted using methods such as, but not limited to, gel electrophoresis, UV absorbance, or analytical HPLC.
  • the nucleic acids may be sequenced by methods including, but not limited to, reverse-transcriptase PCR.
Quantification
  • In some embodiments, the nucleic acids may be quantified in exosomes or when derived from one or more bodily fluids.
  • Bodily fluids include peripheral blood, serum, plasma, ascites, urine, cerebrospinal fluid (CSF), sputum, saliva, bone marrow, synovial fluid, aqueous humor, amniotic fluid, cerumen, breast milk, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper's fluid or pre-ejaculatory fluid, sweat, fecal matter, hair, tears, cyst fluid, pleural and peritoneal fluid, pericardial fluid, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vomit, vaginal secretions, mucosal secretion, stool water, pancreatic juice, lavage fluids from sinus cavities, bronchopulmonary aspirates, blastocoel cavity fluid, and umbilical cord blood.
  • exosomes may be retrieved from an organ selected from the group consisting of lung, heart, pancreas, stomach, intestine, bladder, kidney, ovary, testis, skin, colon, breast, prostate, brain, esophagus, liver, and placenta.
  • Assays may be performed using construct specific probes, cytometry, qRT-PCR, real-time PCR, PCR, flow cytometry, electrophoresis, mass spectrometry, or combinations thereof while the exosomes may be isolated using immunohistochemical methods such as enzyme linked immunosorbent assay (ELISA) methods.
  • Exosomes may also be isolated by size exclusion chromatography, density gradient centrifugation, differential centrifugation, nanomembrane ultrafiltration, immunoabsorbent capture, affinity purification, microfluidic separation, or combinations thereof. These methods afford the investigator the ability to monitor, in real time, the level of nucleic acids remaining or delivered. This is possible because the nucleic acids, in some embodiments, differ from the endogenous forms due to the structural or chemical modifications.
  • the nucleic acid may be quantified using methods such as, but not limited to, ultraviolet visible spectroscopy (UV/Vis).
  • a non-limiting example of a UV/Vis spectrometer is a NANODROP® spectrometer (ThermoFisher, Waltham, MA).
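As an illustration of UV/Vis quantification, the absorbance at 260 nm can be converted to an RNA concentration with the Beer-Lambert relationship. This sketch is not taken from the source: it assumes the widely used conversion factor of about 40 µg/mL per absorbance unit for single-stranded RNA at a 1 cm path length, and the function name and parameters are hypothetical.

```python
# Conventional conversion factor for single-stranded RNA (assumption):
# an A260 of 1.0 at a 1 cm path length corresponds to roughly 40 ug/mL.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate the RNA concentration of the undiluted sample in ug/mL.

    a260            -- measured absorbance at 260 nm (blank-subtracted)
    dilution_factor -- fold dilution applied before the measurement
    path_length_cm  -- optical path length of the measurement in cm
    """
    if a260 < 0:
        raise ValueError("absorbance must be non-negative")
    if dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("dilution factor and path length must be positive")
    # Normalize to a 1 cm path, convert to ug/mL, then undo the dilution.
    return (a260 / path_length_cm) * RNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.5 on a 10-fold diluted sample, 1 cm path.
    print(rna_concentration_ug_per_ml(0.5, dilution_factor=10.0), "ug/mL")
```

Chemically modified nucleotides can shift the extinction coefficient, so the conversion factor would need to be adjusted for a specific construct.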
  • the quantified nucleic acid may be analyzed to determine whether the nucleic acid is of the proper size and to check that no degradation of the nucleic acid has occurred. Degradation of the nucleic acid may be checked by methods such as, but not limited to, agarose gel electrophoresis; HPLC-based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC); liquid chromatography-mass spectrometry (LCMS); capillary electrophoresis (CE); and capillary gel electrophoresis (CGE).
  • the nucleic acids are formulated as a lipid composition, such as a composition comprising a lipid nanoparticle, a liposome, and/or a lipoplex.
  • nucleic acids are formulated as lipid nanoparticle (LNP) compositions.
  • Lipid nanoparticles typically comprise amino lipid, non-cationic lipid, structural lipid, and PEG lipid components along with the nucleic acid cargo of interest.
  • the lipid nanoparticles can be generated using components, compositions, and methods as are generally known in the art, see for example PCT/US2016/052352; PCT/US2016/068300; PCT/US2017/037551; PCT/US2015/027400; PCT/US2016/047406; PCT/US2016/000129; PCT/US2016/014280; PCT/US2017/038426; PCT/US2014/027077; PCT/US2014/055394; PCT/US2016/052117; PCT/US2012/069610; PCT/US2017/027492; PCT/US2016/059575; PCT/US2016/069491; PCT/US2016/069493; and PCT/US2014/66242, all of which are incorporated by reference herein in their entirety.
  • the lipid nanoparticle comprises at least one ionizable amino lipid, at least one non-cationic lipid, at least one sterol, and/or at least one polyethylene glycol (PEG)-modified lipid.
  • the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-25% non-cationic lipid, 25-55% structural lipid, and 0.5-15% PEG-modified lipid.
  • the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-30% non-cationic lipid, 10-55% structural lipid, and 0.5-15% PEG-modified lipid.
  • the lipid nanoparticle comprises 40-50 mol% ionizable lipid, optionally 45-50 mol%, for example, 45-46 mol%, 46-47 mol%, 47-48 mol%, 48-49 mol%, or 49-50 mol%, for example, about 45 mol%, 45.5 mol%, 46 mol%, 46.5 mol%, 47 mol%, 47.5 mol%, 48 mol%, 48.5 mol%, 49 mol%, or 49.5 mol%.
  • the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid.
  • the lipid nanoparticle may comprise 20-50 mol%, 20-40 mol%, 20-30 mol%, 30-60 mol%, 30-50 mol%, 30-40 mol%, 40-60 mol%, 40-50 mol%, or 50-60 mol% ionizable amino lipid.
  • the lipid nanoparticle comprises 20 mol%, 30 mol%, 40 mol%, 50 mol%, or 60 mol% ionizable amino lipid.
  • the lipid nanoparticle comprises 35 mol%, 36 mol%, 37 mol%, 38 mol%, 39 mol%, 40 mol%, 41 mol%, 42 mol%, 43 mol%, 44 mol%, 45 mol%, 46 mol%, 47 mol%, 48 mol%, 49 mol%, 50 mol%, 51 mol%, 52 mol%, 53 mol%, 54 mol%, or 55 mol% ionizable amino lipid. In some embodiments, the lipid nanoparticle comprises 45–55 mole percent (mol%) ionizable amino lipid.
  • the lipid nanoparticle may comprise 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 mol% ionizable amino lipid.
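The molar-ratio embodiments for the four lipid components can be read as range checks on a candidate formulation. The sketch below encodes the two range sets stated in the text and tests a composition against them. The example compositions, the requirement that the four components sum to 100 mol%, and all function and variable names are assumptions made for illustration.

```python
# Range sets (mol%) for the four lipid components, as stated in the text.
RANGES_A = {
    "ionizable amino lipid": (20.0, 60.0),
    "non-cationic lipid": (5.0, 25.0),
    "structural lipid": (25.0, 55.0),
    "PEG-modified lipid": (0.5, 15.0),
}
RANGES_B = {
    "ionizable amino lipid": (20.0, 60.0),
    "non-cationic lipid": (5.0, 30.0),
    "structural lipid": (10.0, 55.0),
    "PEG-modified lipid": (0.5, 15.0),
}


def check_composition(mol_percent, ranges, tol=1e-6):
    """Return a list of issues; an empty list means the composition fits."""
    issues = []
    for name, (low, high) in ranges.items():
        value = mol_percent.get(name)
        if value is None:
            issues.append(name + ": missing")
        elif not low <= value <= high:
            issues.append("%s: %s mol%% outside %s-%s" % (name, value, low, high))
    # Assumption: the four lipid components account for all of the lipid.
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        issues.append("components sum to %s mol%%, not 100" % total)
    return issues


if __name__ == "__main__":
    # Hypothetical formulation used only to exercise the checks.
    candidate = {
        "ionizable amino lipid": 48.0,
        "non-cationic lipid": 11.0,
        "structural lipid": 38.5,
        "PEG-modified lipid": 2.5,
    }
    print("Range set A:", check_composition(candidate, RANGES_A) or "fits")
    print("Range set B:", check_composition(candidate, RANGES_B) or "fits")
```

Keeping the two range sets as separate dictionaries makes it easy to see which embodiment a given formulation falls under, since the second set is wider for the non-cationic and structural lipids.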
  • Ionizable amino lipids Formula (AI) the ionizable amino lipid of a lipid nanoparticle is a compound of Formula (AI): ; a wherein R a ⁇ , R a ⁇ , R a ⁇ , and R a ⁇ are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH 2 ) n OH, wherein n is selected from the group consisting of 1, 2, 3, 4, and 5, and , wherein denotes a point of attachment; wherein R 10 is N(R) 2 ; each R is independently selected from the group consisting of C 1-6
  • R’a is R’branched; is a point of attachment; Raα, Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7.
  • R’a is R’branched; R’branched is ; a point of attachment; Raα, Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7.
  • R’ a is R’ branched ; R’ branched is ; denotes a point of attachment; R a ⁇ is C 2-12 alkyl; R a ⁇ , R a ⁇ , and R a ⁇ are each H; R 2 and R 3 are each C 1-14 alkyl; 6 alkyl); n2 is 2; R 5 is H; each R 6 is H; M and M’ are each - m is 7.
  • R’ a is R’ branched ; R’ branched is a point of attachment; R a ⁇ , R a ⁇ , and R a ⁇ are each H; R a ⁇ is C 2-12 are C1-14 alkyl; R 4 is -(CH2)nOH; n is 2; each R 5 is H; each R 6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7.
  • the compound of Formula (AI) is selected from: .
  • the ionizable amino lipid of Formula (AI) is a compound of (AIa), or its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched ; wherein denotes a point of attachment; selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting , wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2- 3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R 5 is independently selected from the group consisting of C 1-3 alkyl, C2-3 alkenyl
  • the ionizable amino lipid of Formula (AI) is a compound of (AIb), or its N-oxide, or a salt or isomer thereof, wherein R’ branched is: ; wherein denotes a point of attachment; wherein R a ⁇ , R a ⁇ , R a ⁇ , and R a ⁇ are each independently selected from the group consisting of H, C 2-12 alkyl, and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R 4 is -(CH 2 ) n OH, wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; each R 5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R 6 is independently selected from the group consisting of C 1-3 alkyl, C 2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting
  • R’ a is R’ branched ; is attachment; R a ⁇ , R a ⁇ , and R a ⁇ are each H; R 2 and R 3 are each C 1-14 alkyl; R 4 is -(CH 2 ) n OH; n is 2; each R 5 is H; each R 6 is H; M and M’ are each - C(O)O-; R’ is a C 1-12 alkyl; l is 5; and m is 7.
  • R’ a is R’ branched ; is a point of attachment; R a ⁇ , R a ⁇ , and R a ⁇ are each H; R 2 and R 3 are each C 1-14 alkyl; R 4 is -(CH 2 ) n OH; n is 2; each R 5 is H; each R 6 is H; M and M’ are each - C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7.
  • R’ a is R’ branched ; is ; denotes a point of attachment; R a ⁇ and R a ⁇ are each H; R a ⁇ is C2-12 alkyl; R 2 and R 3 are each C 1-14 alkyl; R 4 is -(CH 2 ) n OH; n is 2; each R 5 is H; each R 6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7.
  • the ionizable amino lipid of Formula (AI) is a compound of Formula (AIc): its N-oxide, or a salt or isomer thereof, R’ branched denotes a point of attachment; wherein are selected from the group consisting of H, C 2-12 alkyl, and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R 4 is wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R 5 is independently selected from the group consisting of C 1-3 alkyl, C 2-3 alkenyl, and H; each R 6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M
  • the compound of Formula (AIc) is: .
  • the ionizable amino lipid is a compound of Formula (AII):
  • R’ b is: ; wherein denotes a point of attachment;
  • R a ⁇ and R a ⁇ are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of R a ⁇ and R a ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl;
  • R b ⁇ and R b ⁇ are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of R b ⁇ and R b ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl;
  • R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C2-14 alkenyl;
  • R 4 is wherein n is selected from the
  • the ionizable amino lipid of Formula (AII) is a compound of wherein R’ a is R’ branched or R’ cyclic ; wherein denotes a point of attachment; R a ⁇ and R a ⁇ are each independently selected from the group consisting of H, C1-12 alkyl, and C 2-12 alkenyl, wherein at least one of R a ⁇ and R a ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R b ⁇ and R b ⁇ are each independently selected from the group consisting of H, C1-12 alkyl, and C 2-12 alkenyl, wherein at least one of R b ⁇ and R b ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R 4 is wherein n is selected from the group consisting of 1, 2, 3, 4,
  • the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-b): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched or R’ cyclic ; wherein a point of attachment; R a ⁇ and R b ⁇ are each independently selected from the group consisting of C 1-12 alkyl and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting , wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C 1-12
  • the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-c): denotes a point of wherein R a ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting , wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C 1-6 alkyl, C 2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; R’ is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9.
  • the ionizable amino lipid of Formula (AII) is a compound of Formula : (AII-d), or its N-oxide, or a salt or isomer thereof, R’ branched is: and R’ b is: ; wherein denotes a point of attachment; wherein R a ⁇ and R b ⁇ are each independently selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting , wherein denotes a point of attachment; 2; from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C 1-12 alkyl or C 2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8,
  • the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-e): denotes a point of attachment; wherein R a ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R 4 is -(CH 2 ) n OH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C 1-12 alkyl or C 2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9.
  • m and l are each independently selected from 4, 5, and 6. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each 5. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), each R’ independently is a C 1-12 alkyl.
  • each R’ independently is a C2-5 alkyl.
  • R’ b is: and R 2 and R 3 are each independently a C 1-14 alkyl.
  • R’ b is: R 3 are each independently a C 6-10 alkyl.
  • R’ b R 2 and R 3 are each a C 8 alkyl.
  • R 3 are each independently a C6-10 alkyl.
  • embodiments of the compound of Formula is: embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), , 6 alkyl, and R 2 and R 3 are each a C8 alkyl.
  • , , , (AII- d), or , , are each a , (AII-d), or (AII-e), R’ branched is: , R’ b is: , and R a ⁇ and R b ⁇ are each a C2-6 alkyl.
  • m and l are each independently selected from 4, 5, and 6 and each R’ independently is a C 1-12 alkyl.
  • m and l are each 5 and each R’ independently is a C 2-5 alkyl.
  • R’ branched is: independently R a ⁇ and R b ⁇ are each a C 1-12 alkyl.
  • each R’ independently is a C2-5 alkyl, and R a ⁇ and are each a C2-6 alkyl.
  • R a ⁇ is a C 1-12 alkyl and R 2 and R 3 are each independently a C 6-10 alkyl.
  • R’ is a C2-5 alkyl
  • R a ⁇ is a C 2-6 alkyl
  • R 2 and R 3 are each a C 8 alkyl.
  • R 10 is NH(C1-6 alkyl) and n2 is 2.
  • R 4 is , R 10 is NH(CH3) and n2 is 2.
  • R’ branched is: independently is a C 2-5 alkyl, R a ⁇ and R b ⁇ are each a C 2-6 alkyl, and R 4 , wherein R 10 is NH(CH 3 ) and n2 is 2.
  • R’ is a C1-12 alkyl
  • R 2 and R 3 are each independently a C 6-10 alkyl
  • R a ⁇ is a C 1-12 alkyl
  • R 4 is , wherein R 10 is NH(C 1-6 alkyl) and n2 is 2.
  • R’ is a C2- a ⁇ 2 3 4 5 alkyl
  • R is a C 2-6 alkyl
  • R and R are each a C 8 alkyl
  • R 4 is -(CH 2 ) n OH and n is 2, 3, or 4.
  • R 4 is -(CH2)nOH and n is 2.
  • c) independently is a C 1-12 alkyl
  • R a ⁇ and R b ⁇ are each a C 1-12 alkyl
  • R 4 is -(CH 2 ) n OH
  • n is 2, 3, or 4.
  • R’ b is: , m and l are each 5, each R’ independently is a C2-5 alkyl, R a ⁇ and R b ⁇ are each a C 2-6 alkyl, R 4 is -(CH 2 ) n OH, and n is 2.
  • the ionizable amino lipid of Formula (AII) is a compound of R’ branched is: and R’ b is: ; wherein denotes a point of attachment; R a ⁇ is a C 1-12 alkyl; R 2 and R 3 are each independently a C 1-14 alkyl; R 4 is -(CH2)nOH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C 1-12 alkyl; m is selected from 4, 5, and 6; and l is selected from 4, 5, and 6.
  • m and l are each 5, and n is 2, 3, or 4.
  • R’ is a C2-5 alkyl, R a ⁇ is a C2-6 alkyl, and R 2 and R 3 are each a C6-10 alkyl.
  • m and l are each 5, n is 2, 3, or 4
  • R’ is a C 2-5 alkyl, R a ⁇ is a C 2-6 alkyl, and R 2 and R 3 are each a C 6-10 alkyl.
  • the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-g): R a ⁇ is a C2-6 alkyl; R’ is a C 2-5 alkyl; and R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting wherein denotes a point of attachment, R 10 is NH(C 1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3.
  • the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-h): thereof; wherein R a ⁇ and R b ⁇ are each independently a C 2-6 alkyl; each R’ independently is a C 2-5 alkyl; and R 4 is selected from the of - nOH wherein n is selected from the group consisting , wherein denotes a point of attachment, R 10 is NH(C1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3. of the compound of Formula (AII-g) or (AII-h), R 4 is , wherein R 10 is NH(CH3) and n2 is 2.
  • R 4 is -(CH 2 ) 2 OH.
  • the ionizable amino lipids of a lipid nanoparticle may be one or more of compounds of Formula (AIII): (AIII), or their N-oxides, or salts or isomers thereof, wherein: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of hydrogen, a C 3-6 carbocycle, -(CH 2 )
  • another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of a C 3-6 carbocycle, -(CH 2 ) n Q, -(CH 2 ) n CHQR, -CHQR, -CQ(R) 2 , and unsubstituted C 1-6 alkyl, where Q is selected from a C 3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N,
  • another subset of compounds of Formula (AIII) includes those in which: R 1 is selected from the group consisting of C 5-30 alkyl, C 5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R 2 and R 3 , together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R) 2 , and unsubstituted C 1-6 alkyl, where Q is selected from a C 3-6 carbocycle, a 5- to 14-membered heterocycle having one or more heteroatoms selected from N, O, and S,
  • another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of a C 3-6 carbocycle, -(CH 2 ) n Q, -(CH 2 ) n CHQR, -CHQR, -CQ(R) 2 , and unsubstituted C 1-6 alkyl, where Q is selected from a C 3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N,
  • another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 2-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is -(CH 2 ) n Q or -(CH 2 ) n CHQR, where Q is -N(R) 2 , and n is selected from 3, 4, and 5; each R 5 is independently selected from the group consisting of C 1-3 alkyl, C 2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M
  • another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of -(CH 2 ) n Q, -(CH 2 ) n CHQR, -CHQR, and -CQ(R)2, where Q is -N(R)2, and n is selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R 6 is independently selected from the group consisting of
  • m is 5, 7, or 9.
  • Q is OH, -NHC(S)N(R) 2 , or -NHC(O)N(R) 2 .
  • Q is -N(R)C(O)R, or -N(R)S(O) 2 R.
  • m is 5, 7, or 9.
  • Q is OH, -NHC(S)N(R) 2 , or -NHC(O)N(R) 2 .
  • Q is -N(R)C(O)R, or -N(R)S(O) 2 R.
  • the compounds of Formula (AIII) are of Formula (AIII-D), or their N-oxides, or salts or isomers thereof, wherein R4 is as described in this “Lipid Compositions” section.
  • the compounds of Formula (AIII) are of Formula (AIII-E), or their N-oxides, or salts or isomers thereof, wherein R4 is as described in this “Lipid Compositions” section.
  • the compounds of Formula (AIII) are of Formula (AIII-F) or (AIII-G), or their N-oxides, or salts or isomers thereof, wherein R4 is as described in this “Lipid Compositions” section.
  • the compounds of Formula (AIII) are of Formula (AIII-H), or their N-oxides, or salts or isomers thereof, wherein M is -C(O)O- or -OC(O)-, M” is C1-6 alkyl or C2-6 alkenyl, R2 and R3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl, and n is selected from 2, 3, and 4.
  • the compounds of Formula (AIII) are of Formula (AIII-I), or their N-oxides, or salts or isomers thereof, wherein n is 2, 3, or 4; and m, R’, R”, and R2 through R6 are as described in this “Lipid Compositions” section.
  • each of R2 and R3 may be independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl.
  • an ionizable amino lipid comprises a compound having structure: (Compound 1).
  • an ionizable amino lipid comprises a compound having structure: .
  • the compounds of Formula (AIII) are of Formula (AIII-J), or their N-oxides, or salts or isomers thereof, wherein l is selected from 1, 2, 3, 4, and 5; m is selected from 5, 6, 7, 8, and 9; M1 is a bond or M’; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl.
  • M” is C1-6 alkyl (e.g., C1-4 alkyl) or C2-6 alkenyl (e.g., C2-4 alkenyl).
  • R 2 and R 3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl.
  • the ionizable amino lipids are one or more of the compounds described in U.S. Application Nos.
  • the central amine moiety of a lipid according to Formula (AIII), (AIII-A), (AIII-B), (AIII-C), (AIII-D), (AIII-E), (AIII-F), (AIII-G), (AIII-H), (AIII-I), or (AIII-J) may be protonated at a physiological pH.
  • a lipid may have a positive or partial positive charge at physiological pH.
  • Such amino lipids may be referred to as cationic lipids, ionizable lipids, cationic amino lipids, or ionizable amino lipids.
  • Amino lipids may also be zwitterionic, i.e., neutral molecules having both a positive and a negative charge.
  • the ionizable amino lipids of a lipid nanoparticle may be one or more of compounds of formula (AIV), t is 1 or 2; A1 and A2 are each independently selected from CH or N; Z is CH 2 or absent wherein when Z is CH 2 , the dashed lines (1) and (2) each represent a single bond; and when Z is absent, the dashed lines (1) and (2) are both absent; R1, R2, R3, R4, and R5 are independently selected from the group consisting of C5-20 alkyl, C 5-20 alkenyl, -R”MR’, -R*YR”, -YR”, and -R*OR”; R X1 and R X2 are each independently H or C 1-3 alkyl; each M is independently selected from the group consisting of -C(O)O-, -OC(O)-, -OC(O)O-, -C(O)N(R’)-, -N(R’)C(O)-
  • the compound is of any of formulae (AIVa)-(AIVh).
  • the ionizable amino lipid is a salt thereof.
  • the central amine moiety of a lipid according to Formula (AIV), (AIVa), (AIVb), (AIVc), (AIVd), (AIVe), (AIVf), (AIVg), or (AIVh) may be protonated at a physiological pH.
  • a lipid may have a positive or partial positive charge at physiological pH.
  • each R1a is independently hydrogen, R1c, or R1d; each R1b is independently R1c or R1d; each R1c is independently –[CH2]2C(O)X1R3; each R1d is independently -C(O)R4; each R2 is independently -[C(R2a)2]cR2b; each R2a is independently hydrogen or C1-C6 alkyl; R2b is -N(L1-B)2; -(OCH2CH2)6OH; or -(OCH2CH2)bOCH3; each R3 and R4 is independently C6-C30 aliphatic; each I.
  • each B is independently hydrogen or an ionizable nitrogen-containing group
  • each X 1 is independently a covalent bond or O
  • each a is independently an integer of 1-10
  • each b is independently an integer of 1-10
  • each c is independently an integer of 1-10.
  • the lipid nanoparticle comprises a lipid having the structure: - - - - - G 1 and G 2 are each independently C 2 -C 12 alkylene or C 2 -C 12 alkenylene; G 3 is C1-C24 alkylene, C2-C24 alkenylene, C3-C8 cycloalkylene or C3-C8 cycloalkenylene; R a , R b , R d and R e are each independently H or C 1 -C 12 alkyl or C 1 -C 12 alkenyl; R c and R f are each independently C 1 -C 12 alkyl or C 2 -C 12 alkenyl; R 1 and R 2 are each independently branched C6-C24 alkyl or branched C6-C24 alkenyl; R 3 is -N(R 4 )R 5 ; R 4 is C 1 -C 12 alkyl; R 5 is substituted C1-C12 alkyl;
  • the lipid nanoparticle comprises a lipid having the structure: , or a pharmaceutically acceptable salt thereof, wherein R1 and R2 are the same or different, each a linear or branched alkyl with 1-9 carbons, or an alkenyl or alkynyl with 2 to 11 carbon atoms, L1 and L2 are the same or different, each a linear alkyl having 5 to 18 carbon atoms, or form a heterocycle with N, X1 is a bond, or is -CO-O- whereby L2-CO-O-R2 is formed, X2 is S or O, L3 is a bond or a lower alkyl, or form a heterocycle with N, R3 is a lower alkyl, and R4 and R5 are the same or different, each a lower alkyl.
  • the lipid nanoparticle comprises an ionizable lipid having the structure: , or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: , or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A3), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A4), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: , or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A6), or a pharmaceutically acceptable salt thereof.
  • the lipid nanoparticle comprises a lipid having the structure: (A7), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A8), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A9), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: , or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A11), or a pharmaceutically acceptable salt thereof.
  • Non-cationic lipids In certain embodiments, the lipid nanoparticles comprise one or more non-cationic lipids. Non-cationic lipids may be phospholipids.
  • the lipid nanoparticle comprises 5-25 mol% non-cationic lipid.
  • the lipid nanoparticle may comprise 5-20 mol%, 5-15 mol%, 5-10 mol%, 10-25 mol%, 10-20 mol%, 10-15 mol%, 15-25 mol%, 15-20 mol%, or 20-25 mol% non-cationic lipid.
  • the lipid nanoparticle comprises 5 mol%, 10 mol%, 15 mol%, 20 mol%, or 25 mol% non-cationic lipid.
  • a non-cationic lipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine
  • the lipid nanoparticle comprises 5–15 mol%, 5–10 mol%, or 10–15 mol% DSPC.
  • the lipid nanoparticle may comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 mol% DSPC.
  • the lipid composition of the lipid nanoparticle composition disclosed herein can comprise one or more phospholipids, for example, one or more saturated or (poly)unsaturated phospholipids or a combination thereof.
  • phospholipids comprise a phospholipid moiety and one or more fatty acid moieties.
  • a phospholipid moiety can be selected, for example, from the non-limiting group consisting of phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl glycerol, phosphatidyl serine, phosphatidic acid, 2-lysophosphatidyl choline, and a sphingomyelin.
  • a fatty acid moiety can be selected, for example, from the non-limiting group consisting of lauric acid, myristic acid, myristoleic acid, palmitic acid, palmitoleic acid, stearic acid, oleic acid, linoleic acid, alpha-linolenic acid, erucic acid, phytanoic acid, arachidic acid, arachidonic acid, eicosapentaenoic acid, behenic acid, docosapentaenoic acid, and docosahexaenoic acid.
  • Particular phospholipids can facilitate fusion to a membrane.
  • a cationic phospholipid can interact with one or more negatively charged phospholipids of a membrane (e.g., a cellular or intracellular membrane). Fusion of a phospholipid to a membrane can allow one or more elements (e.g., a therapeutic agent) of a lipid-containing composition (e.g., LNPs) to pass through the membrane permitting, e.g., delivery of the one or more elements to a target tissue.
  • Non-natural phospholipid species including natural species with modifications and substitutions including branching, oxidation, cyclization, and alkynes are also contemplated.
  • a phospholipid can be functionalized with or cross-linked to one or more alkynes (e.g., an alkenyl group in which one or more double bonds is replaced with a triple bond).
  • an alkyne group can undergo a copper-catalyzed cycloaddition upon exposure to an azide.
  • Such reactions can be useful in functionalizing a lipid bilayer of a nanoparticle composition to facilitate membrane permeation or cellular recognition or in conjugating a nanoparticle composition to a useful component such as a targeting or imaging moiety (e.g., a dye).
  • Phospholipids include, but are not limited to, glycerophospholipids such as phosphatidylcholines, phosphatidylethanolamines, phosphatidylserines, phosphatidylinositols, phosphatidylglycerols, and phosphatidic acids. Phospholipids also include phosphosphingolipids, such as sphingomyelin.
  • a phospholipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine (DSPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-
  • a phospholipid is an analog or variant of DSPC.
  • a phospholipid is a compound of Formula (HI), or a salt thereof, wherein: each R 1 is independently optionally substituted alkyl; or optionally two R 1 are joined together with the intervening atoms to form optionally substituted monocyclic carbocyclyl or optionally substituted monocyclic heterocyclyl; or optionally three R 1 are joined together with the intervening atoms to form optionally substituted bicyclic carbocyclyl or optionally substituted bicyclic heterocyclyl; n is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; m is 0, 1, 2, 3, 4, 5, A is of the formula: or ; each instance of L 2 is independently a bond or optionally substituted C 1-6 alkylene, wherein one methylene unit of the optionally substituted C 1-6 alkylene is optionally replaced with O, N(R N ), S, C(O), C(O)N(R N)
  • the compound is not of the formula: , wherein each instance of R 2 is independently unsubstituted alkyl, unsubstituted alkenyl, or unsubstituted alkynyl.
  • the phospholipids may be one or more of the phospholipids described in PCT Application No. PCT/US2018/037922.
  • the lipid nanoparticle comprises a molar ratio of 5-25% non-cationic lipid relative to the other lipid components.
  • the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 10-15%, 15-25%, 15-20%, 20-25%, or 25-30% non-cationic lipid.
  • the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% non-cationic lipid.
  • the lipid nanoparticle comprises a molar ratio of 5-25% phospholipid relative to the other lipid components.
  • the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 10-15%, 15-25%, 15-20%, 20-25%, or 25-30% phospholipid.
  • the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% phospholipid.
  • Structural lipids The lipid composition of a pharmaceutical composition disclosed herein can comprise one or more structural lipids.
  • the term “structural lipid” includes sterols and also lipids containing sterol moieties.
  • Structural lipids can be selected from the group including but not limited to, cholesterol, fecosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, alpha-tocopherol, hopanoids, phytosterols, steroids, and mixtures thereof.
  • the structural lipid is a sterol.
  • “sterols” are a subgroup of steroids consisting of steroid alcohols.
  • the structural lipid is a steroid.
  • the structural lipid is cholesterol.
  • the structural lipid is an analog of cholesterol. In certain embodiments, the structural lipid is alpha-tocopherol. In some embodiments, the structural lipids may be one or more of the structural lipids described in U.S. Application No. 16/493,814. In some embodiments, the lipid nanoparticle comprises a molar ratio of 25-55% structural lipid relative to the other lipid components.
  • the lipid nanoparticle may comprise a molar ratio of 10-55%, 25-50%, 25-45%, 25-40%, 25-35%, 25-30%, 30-55%, 30-50%, 30-45%, 30-40%, 30-35%, 35-55%, 35-50%, 35-45%, 35-40%, 40-55%, 40-50%, 40-45%, 45-55%, 45-50%, or 50-55% structural lipid.
  • the lipid nanoparticle comprises a molar ratio of 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or 55% structural lipid.
  • the lipid nanoparticle comprises 30-45 mol% sterol, optionally 35-40 mol%, for example, 30-31 mol%, 31-32 mol%, 32-33 mol%, 33-34 mol%, 34-35 mol%, 35-36 mol%, 36-37 mol%, 37-38 mol%, 38-39 mol%, or 39-40 mol%. In some embodiments, the lipid nanoparticle comprises 25-55 mol% sterol.
  • the lipid nanoparticle may comprise 25-50 mol%, 25-45 mol%, 25-40 mol%, 25-35 mol%, 25-30 mol%, 30-55 mol%, 30-50 mol%, 30-45 mol%, 30-40 mol%, 30-35 mol%, 35-55 mol%, 35-50 mol%, 35-45 mol%, 35-40 mol%, 40-55 mol%, 40-50 mol%, 40-45 mol%, 45-55 mol%, 45-50 mol%, or 50-55 mol% sterol.
  • the lipid nanoparticle comprises 25 mol%, 30 mol%, 35 mol%, 40 mol%, 45 mol%, 50 mol%, or 55 mol% sterol. In some embodiments, the lipid nanoparticle comprises 35–40 mol% cholesterol. For example, the lipid nanoparticle may comprise 35, 35.5, 36, 36.5, 37, 37.5, 38, 38.5, 39, 39.5, or 40 mol% cholesterol.
  • Polyethylene glycol (PEG)-Lipids The lipid composition of a pharmaceutical composition disclosed herein can comprise one or more polyethylene glycol (PEG) lipids.
  • “PEG-lipid” or “PEG-modified lipid” refers to polyethylene glycol (PEG)-modified lipids.
  • PEG-lipids include PEG-modified phosphatidylethanolamine and phosphatidic acid, PEG-ceramide conjugates (e.g., PEG-CerC14 or PEG-CerC20), PEG-modified dialkylamines, and PEG-modified 1,2-diacyloxypropan-3-amines.
  • a PEG lipid can be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid.
  • the PEG-lipid includes, but is not limited to, 1,2-dimyristoyl-sn-glycerol methoxypolyethylene glycol (PEG-DMG), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[amino(polyethylene glycol)] (PEG-DSPE), PEG-disteryl glycerol (PEG-DSG), PEG-dipalmetoleyl, PEG-dioleyl, PEG-distearyl, PEG-diacylglycamide (PEG-DAG), PEG-dipalmitoyl phosphatidylethanolamine (PEG-DPPE), or PEG-1,2-dimyristyloxypropyl-3-amine
  • the PEG-lipid is selected from the group consisting of a PEG-modified phosphatidylethanolamine, a PEG-modified phosphatidic acid, a PEG-modified ceramide, a PEG-modified dialkylamine, a PEG-modified diacylglycerol, a PEG-modified dialkylglycerol, and mixtures thereof.
  • the PEG-modified lipid is PEG-DMG, PEG-c-DOMG (also referred to as PEG-DOMG), PEG-DSG, and/or PEG-DPG.
  • the lipid moiety of the PEG-lipids includes those having lengths of from about C14 to about C22, preferably from about C14 to about C16.
  • a PEG moiety for example an mPEG-NH 2 , has a size of about 1000, 2000, 5000, 10,000, 15,000 or 20,000 daltons.
  • the PEG-lipid is PEG2k-DMG.
  • the lipid nanoparticles can comprise a PEG lipid which is a non-diffusible PEG.
  • Non-limiting examples of non-diffusible PEGs include PEG-DSG and PEG-DSPE.
  • PEG-lipids are known in the art, such as those described in U.S.
  • the lipid component of a lipid nanoparticle composition may include one or more molecules comprising polyethylene glycol, such as PEG or PEG-modified lipids. Such species may be alternately referred to as PEGylated lipids.
  • a PEG lipid is a lipid modified with polyethylene glycol.
  • a PEG lipid may be selected from the non-limiting group including PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof.
  • a PEG lipid may be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid.
  • the PEG-modified lipids are a modified form of PEG-DMG.
  • PEG-DMG has the following structure:
  • PEG lipids can be PEGylated lipids described in International Publication No. WO2012099755, the contents of which are herein incorporated by reference in their entirety. Any of these exemplary PEG lipids may be modified to comprise a hydroxyl group on the PEG chain.
  • the PEG lipid is a PEG-OH lipid.
  • a “PEG-OH lipid” (also referred to herein as “hydroxy-PEGylated lipid”) is a PEGylated lipid having one or more hydroxyl (–OH) groups on the lipid.
  • the PEG-OH lipid includes one or more hydroxyl groups on the PEG chain.
  • a PEG-OH or hydroxy-PEGylated lipid comprises an –OH group at the terminus of the PEG chain.
  • Formula (PI) In certain embodiments, a PEG lipid is a compound of Formula (PI), or salts thereof, wherein: R 3 is –OR O ; R O is hydrogen, optionally substituted alkyl, or an oxygen protecting group; r is an integer between 1 and 100, inclusive; L 1 is optionally substituted C 1-10 alkylene, wherein at least one methylene of the optionally substituted C1-10 alkylene is independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, O, N(R N ), S, C(O), C(O)N(R N ), NR N C(O), C(O)O, OC(O), OC(O)O, OC(O)N(R N ), NR N C(O)O, or NR N C(O)N(R N ); D is a moiety obtained by click chemistry or a moiety cleavable under physiological conditions;
  • the compound of Formula (PI) is a PEG-OH lipid (i.e., R 3 is –OR O , and R O is hydrogen).
  • the compound of Formula (PI) is of Formula (PI-OH), or a salt thereof.
  • Formula (PII) In certain embodiments, a PEG lipid is a PEGylated fatty acid. In certain embodiments, a PEG lipid is a compound of Formula (PII).
  • compounds of Formula (PII) have the following formula: (PII), or a salt thereof, wherein: R 3 is –OR O ; R O is hydrogen, optionally substituted alkyl or an oxygen protecting group; r is an integer between 1 and 100, inclusive; R 5 is optionally substituted C10-40 alkyl, optionally substituted C10-40 alkenyl, or optionally substituted C 10-40 alkynyl; and optionally one or more methylene groups of R 5 are replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, N(R N ), O, S, C(O), C(O)N(R N ), NR N C(O), NR N C(O)N(R N ), C(O)O, OC(O), OC(O)O, OC(O)N(R N ), NR N C(O)O, C(O)S, SC(O), C
  • the compound of Formula (PII) is of Formula (PII-OH), or a salt thereof.
  • r is 40-50.
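As a rough illustration of how the repeat count r relates to PEG size, the sketch below multiplies r by the mass of one ethylene oxide repeat unit (C2H4O, about 44.05 g/mol). The function name and the choice to ignore end groups are assumptions made for illustration only; neither is taken from this disclosure.

```python
# Approximate mass of one -CH2CH2O- repeat unit in g/mol (C2H4O).
ETHYLENE_OXIDE_UNIT_MASS = 44.05


def approx_peg_chain_mass(r):
    """Estimate the PEG chain mass (g/mol) for r repeat units.

    Assumption: end groups are ignored, so this is only a coarse estimate.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    return r * ETHYLENE_OXIDE_UNIT_MASS


# Print the estimate for the ends and the middle of the 40-50 window.
for r in (40, 45, 50):
    print(r, round(approx_peg_chain_mass(r), 1))
```

Under this approximation, r values of 40-50 correspond to chains of roughly 1,760 to 2,200 g/mol, which brackets the "about 2000 daltons" size mentioned above for a PEG moiety.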
  • the compound of Formula (PII) is: , or a salt thereof.
  • the lipid composition of the pharmaceutical compositions disclosed herein does not comprise a PEG-lipid.
  • the PEG-lipids may be one or more of the PEG lipids described in U.S. Application No. US15/674,872.
  • the lipid nanoparticle comprises a molar ratio of 0.5-15% PEG lipid relative to the other lipid components.
  • the lipid nanoparticle may comprise a molar ratio of 0.5-10%, 0.5-5%, 1-15%, 1-10%, 1-5%, 2-15%, 2-10%, 2-5%, 5-15%, 5-10%, or 10-15% PEG lipid.
  • the lipid nanoparticle comprises a molar ratio of 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, or 15% PEG-lipid.
  • the lipid nanoparticle comprises 1-5% PEG-modified lipid, optionally 1-3 mol%, for example 1.5 to 2.5 mol%, 1-2 mol%, 2-3 mol%, 3-4 mol%, or 4-5 mol%.
  • the lipid nanoparticle comprises 0.5-15 mol% PEG-modified lipid.
  • the lipid nanoparticle may comprise 0.5-10 mol%, 0.5-5 mol%, 1-15 mol%, 1-10 mol%, 1-5 mol%, 2-15 mol%, 2-10 mol%, 2-5 mol%, 5-15 mol%, 5-10 mol%, or 10-15 mol%.
  • the lipid nanoparticle comprises 0.5 mol%, 1 mol%, 2 mol%, 3 mol%, 4 mol%, 5 mol%, 6 mol%, 7 mol%, 8 mol%, 9 mol%, 10 mol%, 11 mol%, 12 mol%, 13 mol%, 14 mol%, or 15 mol% PEG-modified lipid.
  • Some embodiments comprise adding PEG to a composition comprising an LNP encapsulating a nucleic acid (e.g., which already includes PEG in the amounts listed above).
  • the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid, 5-25 mol% non-cationic lipid, 25-55 mol% sterol, and 0.5-15 mol% PEG-modified lipid.
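The four mol% windows just recited can be checked mechanically. The following is a minimal sketch, assuming the four lipid classes account for all lipid in the particle so that their mol% values sum to 100. The dictionary keys, the function name, and the example composition are illustrative choices, not terms defined by this disclosure.

```python
# mol% windows mirroring the ranges recited above (low, high), inclusive.
RANGES = {
    "ionizable_amino_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),
    "sterol": (25.0, 55.0),
    "peg_modified_lipid": (0.5, 15.0),
}


def check_composition(mol_percent, tol=1e-6):
    """Return a list of problems; an empty list means the composition fits."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = mol_percent.get(name)
        if value is None:
            problems.append(f"missing component: {name}")
        elif not (low <= value <= high):
            problems.append(f"{name} = {value} mol% is outside {low}-{high} mol%")
    # Assumption: the listed components make up the whole lipid fraction.
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        problems.append(f"components sum to {total} mol%, not 100 mol%")
    return problems


# Hypothetical example composition used only to exercise the check.
example = {
    "ionizable_amino_lipid": 49.0,
    "non_cationic_lipid": 10.0,
    "sterol": 38.5,
    "peg_modified_lipid": 2.5,
}
print(check_composition(example))
```

Returning a list of problems rather than a single boolean makes it easier to see which window a candidate formulation misses.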
  • a LNP comprises an ionizable amino lipid of Compound 1, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG.
  • a LNP comprises an ionizable amino lipid of Compound 2, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG.
  • a LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising PEG-DMG.
  • a LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising a compound having Formula (PII).
  • a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid comprising a compound having Formula (HI), a structural lipid, and the PEG lipid comprising a compound having Formula (PI) or (PII).
  • a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PII).
  • the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 10 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG. In some embodiments, the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 1.5 mol% DMG-PEG. In some embodiments, the lipid nanoparticle comprises 48 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG. In some embodiments, a LNP comprises an N:P ratio of from about 2:1 to about 30:1.
  • a LNP comprises an N:P ratio of about 6:1. In some embodiments, a LNP comprises an N:P ratio of about 3:1, 4:1, or 5:1. In some embodiments, a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of from about 10:1 to about 100:1. In some embodiments, a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 20:1. In some embodiments, a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 10:1.
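The N:P ratio and the lipid:RNA wt/wt ratio describe the same formulation from two angles. The sketch below shows one common way to relate them, assuming the N:P ratio means moles of ionizable amine nitrogen per mole of RNA phosphate. The function name, the default of one ionizable amine per lipid, the average nucleotide mass of about 330 g/mol, and the lipid molecular weight in the example are all assumptions for illustration and do not come from this disclosure.

```python
def n_to_p_ratio(lipid_mass_mg, lipid_mw, rna_mass_mg,
                 amines_per_lipid=1, avg_nucleotide_mw=330.0):
    """Estimate the N:P ratio from masses of ionizable lipid and RNA.

    Assumptions: each nucleotide contributes one phosphate, and
    avg_nucleotide_mw (g/mol) is an approximate average for RNA.
    """
    if lipid_mw <= 0 or rna_mass_mg <= 0 or avg_nucleotide_mw <= 0:
        raise ValueError("molecular weights and RNA mass must be positive")
    amine_mmol = amines_per_lipid * lipid_mass_mg / lipid_mw
    phosphate_mmol = rna_mass_mg / avg_nucleotide_mw
    return amine_mmol / phosphate_mmol


# Hypothetical lipid of 700 g/mol at 10:1 and 20:1 lipid:RNA wt/wt.
for wt_ratio in (10, 20):
    print(wt_ratio, round(n_to_p_ratio(wt_ratio, 700.0, 1.0), 2))
```

Because the result scales with the assumed lipid molecular weight and amine count, the same wt/wt ratio can map to quite different N:P ratios for different ionizable lipids.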
  • Some embodiments comprise a composition having one or more LNPs having a diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less.
  • Some embodiments comprise a composition having a mean LNP diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less.
  • the composition has a mean LNP diameter from about 30 nm to about 150 nm, or a mean diameter from about 60 nm to about 120 nm.
  • a LNP may comprise one or more types of lipids, including but not limited to amino lipids (e.g., ionizable amino lipids), neutral lipids, non-cationic lipids, charged lipids, PEG-modified lipids, phospholipids, structural lipids and sterols.
  • a LNP may further comprise one or more cargo molecules, including but not limited to nucleic acids (e.g., mRNA, plasmid DNA, DNA or RNA oligonucleotides, siRNA, shRNA, snRNA, snoRNA, lncRNA, etc.), small molecules, proteins and peptides.
  • the composition comprises a liposome.
  • a liposome is a lipid particle comprising lipids arranged into one or more concentric lipid bilayers around a central region. The central region of a liposome may comprise an aqueous solution, suspension, or other aqueous composition.
  • a lipid nanoparticle may comprise two or more components (e.g., amino lipid and nucleic acid, PEG-lipid, phospholipid, structural lipid).
  • a lipid nanoparticle may comprise an amino lipid and a nucleic acid.
  • Compositions comprising the lipid nanoparticles may be used for a wide variety of applications, including the stealth delivery of therapeutic payloads with minimal adverse innate immune response. Effective in vivo delivery of nucleic acids represents a continuing medical challenge. Exogenous nucleic acids (i.e., originating from outside of a cell or organism) are readily degraded in the body, e.g., by the immune system.
  • a particulate carrier e.g., lipid nanoparticles
  • the particulate carrier should be formulated to have minimal particle aggregation, be relatively stable prior to intracellular delivery, effectively deliver nucleic acids intracellularly, and elicit no or minimal immune response.
  • many conventional particulate carriers have relied on the presence and/or concentration of certain components (e.g., PEG-lipid).
  • certain components may decrease the stability of encapsulated nucleic acids (e.g., mRNA molecules). The reduced stability may limit the broad applicability of the particulate carriers.
  • the lipid nanoparticles comprise one or more of ionizable molecules, polynucleotides, and optional components, such as structural lipids, sterols, neutral lipids, phospholipids and a molecule capable of reducing particle aggregation (e.g., polyethylene glycol (PEG), PEG-modified lipid), such as those described above.
  • a LNP may include one or more ionizable molecules (e.g., amino lipids or ionizable lipids).
  • the ionizable molecule may comprise a charged group and may have a certain pKa.
  • the pKa of the ionizable molecule may be greater than or equal to about 6, greater than or equal to about 6.2, greater than or equal to about 6.5, greater than or equal to about 6.8, greater than or equal to about 7, greater than or equal to about 7.2, greater than or equal to about 7.5, greater than or equal to about 7.8, or greater than or equal to about 8.
  • the pKa of the ionizable molecule may be less than or equal to about 10, less than or equal to about 9.8, less than or equal to about 9.5, less than or equal to about 9.2, less than or equal to about 9.0, less than or equal to about 8.8, or less than or equal to about 8.5. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to 6 and less than or equal to about 8.5). Other ranges are also possible. In embodiments in which more than one type of ionizable molecule are present in a particle, each type of ionizable molecule may independently have a pKa in one or more of the ranges described above.
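To see why the pKa window matters, the sketch below applies the Henderson-Hasselbalch relationship to estimate what fraction of a basic (amine-type) ionizable group is protonated at a given pH. It assumes a single, independent ionizable site. The function name and the example pKa and pH values are hypothetical and chosen only for illustration.

```python
def fraction_protonated(pka, ph):
    """Fraction of a basic group in its protonated (charged) form.

    Henderson-Hasselbalch for a base: [BH+]/([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    Assumption: one ionizable site that behaves independently.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical lipid with pKa 6.5, compared at an acidic pH and at pH 7.4.
for ph in (5.5, 6.5, 7.4):
    print(ph, round(fraction_protonated(6.5, ph), 3))
```

At a pH equal to the pKa, half of the groups are protonated; the charged fraction rises as the pH drops below the pKa and falls as the pH rises above it.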
  • an ionizable molecule comprises one or more charged groups.
  • an ionizable molecule may be positively charged or negatively charged.
  • an ionizable molecule may be positively charged.
  • an ionizable molecule may comprise an amine group.
  • the term “ionizable molecule” has its ordinary meaning in the art and may refer to a molecule or matrix comprising one or more charged moieties.
  • a “charged moiety” is a chemical moiety that carries a formal electronic charge, e.g., monovalent (+1, or -1), divalent (+2, or -2), trivalent (+3, or -3), etc.
  • the charged moiety may be anionic (i.e., negatively charged) or cationic (i.e., positively charged).
  • positively-charged moieties include amine groups (e.g., primary, secondary, and/or tertiary amines), ammonium groups, pyridinium groups, guanidine groups, and imidazolium groups.
  • the charged moieties comprise amine groups.
  • negatively-charged groups or precursors thereof include carboxylate groups, sulfonate groups, sulfate groups, phosphonate groups, phosphate groups, hydroxyl groups, and the like.
  • the charge of the charged moiety may vary, in some cases, with the environmental conditions, for example, changes in pH may alter the charge of the moiety, and/or cause the moiety to become charged or uncharged.
  • the charge density of the molecule and/or matrix may be selected as desired.
  • an ionizable molecule e.g., an amino lipid or ionizable lipid
  • the ionizable molecule may include a neutral moiety that can be hydrolyzed to form a charged moiety, such as those described above.
  • the molecule or matrix may include an amide, which can be hydrolyzed to form an amine.
  • Those of ordinary skill in the art will be able to determine whether a given chemical moiety carries a formal electronic charge (for example, by inspection, pH titration, ionic conductivity measurements, etc.), and/or whether a given chemical moiety can be reacted (e.g., hydrolyzed) to form a chemical moiety that carries a formal electronic charge.
  • the ionizable molecule e.g., amino lipid or ionizable lipid
  • the molecular weight of an ionizable molecule is less than or equal to about 2,500 g/mol, less than or equal to about 2,000 g/mol, less than or equal to about 1,500 g/mol, less than or equal to about 1,250 g/mol, less than or equal to about 1,000 g/mol, less than or equal to about 900 g/mol, less than or equal to about 800 g/mol, less than or equal to about 700 g/mol, less than or equal to about 600 g/mol, less than or equal to about 500 g/mol, less than or equal to about 400 g/mol, less than or equal to about 300 g/mol, less than or equal to about 200 g/mol, or less than or equal to about 100 g/mol.
  • the molecular weight of an ionizable molecule is greater than or equal to about 100 g/mol, greater than or equal to about 200 g/mol, greater than or equal to about 300 g/mol, greater than or equal to about 400 g/mol, greater than or equal to about 500 g/mol, greater than or equal to about 600 g/mol, greater than or equal to about 700 g/mol, greater than or equal to about 1,000 g/mol, greater than or equal to about 1,250 g/mol, greater than or equal to about 1,500 g/mol, greater than or equal to about 1,750 g/mol, greater than or equal to about 2,000 g/mol, or greater than or equal to about 2,250 g/mol.
  • each type of ionizable molecule may independently have a molecular weight in one or more of the ranges described above.
  • the percentage (e.g., by weight, or by mole) of a single type of ionizable molecule (e.g., amino lipid or ionizable lipid) and/or of all the ionizable molecules within a particle may be greater than or equal to about 15%, greater than or equal to about 16%, greater than or equal to about 17%, greater than or equal to about 18%, greater than or equal to about 19%, greater than or equal to about 20%, greater than or equal to about 21%, greater than or equal to about 22%, greater than or equal to about 23%, greater than or equal to about 24%, greater than or equal to about 25%, greater than or equal to about 30%, greater than or equal to about 35%, greater than or equal to about 40%, greater than or equal to about 42%, greater than or equal to about 45%, greater than or equal to about 48%, greater than or equal to about 50%, greater than or equal to about 52%, greater than or equal to about 55%, greater than or equal to about 58%, greater than
  • the percentage (e.g., by weight, or by mole) may be less than or equal to about 70%, less than or equal to about 68%, less than or equal to about 65%, less than or equal to about 62%, less than or equal to about 60%, less than or equal to about 58%, less than or equal to about 55%, less than or equal to about 52%, less than or equal to about 50%, or less than or equal to about 48%. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to 20% and less than or equal to about 60%, greater than or equal to 40% and less than or equal to about 55%, etc.).
  • each type of ionizable molecule may independently have a percentage (e.g., by weight, or by mole) in one or more of the ranges described above.
  • the percentage e.g., by weight, or by mole
  • the percentage may be determined by extracting the ionizable molecule(s) from the dried particles using, e.g., organic solvents, and measuring the quantity of the agent using high pressure liquid chromatography (i.e., HPLC), liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), or mass spectrometry (MS).
  • HPLC may be used to quantify the amount of a component, by, e.g., comparing the area under the curve of an HPLC chromatogram to a standard curve.
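The standard-curve comparison described above can be illustrated with a minimal sketch. It assumes a linear detector response and uses hypothetical calibration values; the function names, the masses, and the peak areas are illustrative assumptions and do not come from the specification.

```python
# Minimal sketch: quantify a component from an HPLC peak area using a
# linear standard curve. All numbers below are hypothetical.

def fit_standard_curve(amounts, areas):
    """Ordinary least-squares fit of area = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amount_from_area(area, slope, intercept):
    """Invert the standard curve to estimate the amount for a measured area."""
    return (area - intercept) / slope

def weight_percent(component_mass, particle_mass):
    """Percentage by weight of a component within the dried particles."""
    return 100.0 * component_mass / particle_mass

# Hypothetical standards: injected mass (ug) versus integrated peak area.
standards_ug = [1.0, 2.0, 4.0, 8.0]
peak_areas = [10.0, 20.0, 40.0, 80.0]

slope, intercept = fit_standard_curve(standards_ug, peak_areas)
lipid_ug = amount_from_area(50.0, slope, intercept)  # sample peak area of 50
lipid_wt_pct = weight_percent(lipid_ug, 10.0)        # 10 ug of dried particles
```

A mole percentage could be obtained the same way by converting each extracted mass to moles with the component's molecular weight before taking the ratio.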
  • “charge” or “charged moiety” does not refer to a “partial negative charge” or “partial positive charge” on a molecule.
  • “partial negative charge” and “partial positive charge” are given their ordinary meaning in the art.
  • a “partial negative charge” may result when a functional group comprises a bond that becomes polarized such that electron density is pulled toward one atom of the bond, creating a partial negative charge on the atom.
  • a lipid composition comprises one or more lipids.
  • Such lipids may include those useful in the preparation of lipid nanoparticle formulations as described above or as known in the art.
Stabilizing compounds
  • Some embodiments of the compositions are stabilized pharmaceutical compositions.
  • Various non-viral delivery systems, including nanoparticle formulations, present attractive opportunities to overcome many challenges associated with mRNA delivery.
  • Lipid nanoparticles (LNPs) have drawn particular attention in recent years as various LNP formulations have shown promise in a variety of pharmaceutical applications.
  • lipids have been shown to degrade nucleic acids, including mRNA, and lipid nanoparticle formulations undergo rapid loss of purity when stored as refrigerated liquids. Moreover, the storage stability of mRNA encapsulated within LNPs is lower than that of unencapsulated mRNA.
  • a class of compounds has been found to stabilize nucleic acids within a lipid carrier such as an LNP, an unexpected and unprecedented discovery which enables applications including extended refrigerated liquid shelf-life, extended in-use periods at room temperature, and extended in-use stability at physiological temperatures up to higher temperatures such as 40°C. Such stabilizing compounds solve a critical problem, as current manufacturing processes and formulations experience a 5-10% purity loss during LNP formation and processing that is typical with current large-scale LNP production.
  • the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a stabilizing compound (e.g., a compound of Formula (I), of Formula (II), or a tautomer or solvate thereof).
  • the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (I) (structure drawing not reproduced), wherein: [bond symbol] is a single bond or a double bond;
  • R1 is H;
  • R2 is OCH3, or together with R3 is OCH2O;
  • R3 is OCH3, or together with R2 is OCH2O;
  • R4 is H;
  • R5 is H or OCH3;
  • R6 is OCH3;
  • R7 is H or OCH3;
  • R8 is H;
  • R9 is H or CH3;
  • X is a pharmaceutically acceptable anion, e.g., a halide such as chloride.
  • the compound of Formula (I) has the structure of Formula (Ia), Formula (Ib), or Formula (Ic), or a tautomer or solvate thereof.
  • the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (II), or a tautomer or solvate thereof, wherein: R10 is H; R11 is H; R12 together with R13 is OCH2O; R14 is H; R15 together with R16 is OCH2O; R17 is H; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride.
  • the compound of Formula (II) has the structure of Formula (IIa) (structure drawing not reproduced).
  • Stabilizing compounds of Formulas (I), (Ia), (Ib), (Ic), (II), and (IIa) are described in International Application No. PCT/US2022/025967, which is incorporated by reference herein in its entirety.
  • the nucleic acid formulation comprises lipid nanoparticles.
  • the nucleic acid is mRNA.
  • the stabilizing compound (“the compound”) has a purity of at least 70%, 80%, 90%, 95%, or 99%. In some embodiments, the compound contains fewer than 100 ppm of elemental metals.
  • the stabilized pharmaceutical composition (“the composition”) comprises a pharmaceutically acceptable metal chelator, e.g., EDTA (ethylenediaminetetraacetic acid) or DTPA (diethylenetriaminepentaacetic acid).
  • the composition is an aqueous solution.
  • the compound is present at a concentration between about 0.1 mM and about 10 mM in the aqueous solution.
  • the aqueous solution has a pH of about 5 to about 8, including a pH of about 5, 5.5, 6, 6.5, 7, 7.5, or 8.
  • the aqueous solution does not comprise NaCl.
  • the aqueous solution comprises NaCl in a concentration of or about 150 mM. In some embodiments, the aqueous solution comprises a phosphate buffer, a tris buffer, an acetate buffer, a histidine buffer, or a citrate buffer. In some embodiments, microbial growth in the composition is inhibited by the compound. In some embodiments, the composition is characterized as having an mRNA purity level of greater than 60%, greater than 70%, greater than 80%, or greater than 90% main peak mRNA purity after at least thirty days of storage. In some embodiments, the composition comprises an mRNA purity level of greater than 50% main peak mRNA purity after at least six months of storage. In some embodiments, the storage is at room temperature.
  • the composition comprises a lipid nanoparticle encapsulating an mRNA, and the composition comprises less than 50%, less than 60%, less than 70%, less than 80%, less than 90%, or less than 95% RNA fragments after at least thirty days of storage.
  • the storage temperature is greater than room temperature. In some embodiments, the storage temperature is about 4°C.
  • the compound interacts with the nucleic acid comprised within a lipid nanostructure (e.g., a lipid nanoparticle, liposome, or lipoplex), e.g., via pi-pi stacking and/or by changing backbone helicity of the nucleic acid.
  • the compound intercalates with a nucleic acid.
  • the compound binds with a nucleic acid, e.g., reversible binding, and/or binding to the stranded regions of the nucleic acid.
  • the compound self-associates, binds to nucleic acid ribose contacts, and/or binds to nucleic acid base contacts.
  • the compound does not substantially bind to nucleic acid phosphate contacts.
  • the positive charge of the compound contributes to nucleic acid binding.
  • the compound interacts with a nucleic acid and provides shielding from solvent, e.g., water.
  • the compound shields ribose from solvent more than the compound shields the phosphate groups of the nucleic acid.
  • the solvent exposure is measured by the solvent accessible surface area (SASA).
  • a stabilizing compound decreases the solvent accessible area of ribose to about 5-10 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of ribose to about 6-8 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 9-12 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 10-11 nm². In some embodiments, a nucleic acid that is conformationally stabilized by the compound exhibits thermal unfolding temperatures (measured by circular dichroism or DSC, for example) that are higher than in the absence of the compound.
  • the compound confers increased stability, e.g., thermal stability, to the nucleic acid in a folded structure, e.g., relative to its unfolded or less folded or more linear form.
  • the compound causes compaction of the nucleic acid upon interaction with the nucleic acid.
  • the compound causes a decrease in the hydrodynamic radius of the nucleic acid molecule upon interaction with the nucleic acid.
  • a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule by 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, or more.
  • a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule when the compound is in a concentration of 1 µM, 2 µM, 3 µM, 4 µM, 5 µM, 6 µM, 7 µM, 8 µM, 9 µM, 10 µM, 15 µM, 20 µM, 25 µM, 30 µM, 35 µM, 40 µM, 45 µM, 50 µM, 60 µM, 70 µM, 80 µM, 90 µM, or 100 µM.
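As a small worked illustration of the compaction percentages listed above, the following sketch computes the relative decrease in hydrodynamic radius from two measurements. The function name and the radii are hypothetical, and the measurement method is not specified here; any technique that reports a hydrodynamic radius could feed it.

```python
# Minimal sketch: percent decrease in hydrodynamic radius on adding a compound.
# The radii used in the example are hypothetical, not measured values.

def percent_compaction(radius_without_nm, radius_with_nm):
    """Relative decrease in hydrodynamic radius, expressed as a percentage."""
    if radius_without_nm <= 0:
        raise ValueError("reference radius must be positive")
    return 100.0 * (radius_without_nm - radius_with_nm) / radius_without_nm

# Example: a radius that shrinks from 40 nm to 30 nm is a 25% decrease.
example_compaction = percent_compaction(40.0, 30.0)
```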
  • compositions e.g., pharmaceutical compositions
  • methods, kits and reagents for prevention or treatment of coronavirus in humans and other mammals, for example.
  • the compositions can be used as therapeutic or prophylactic agents. They may be used in medicine to prevent and/or treat a coronavirus infection.
  • the SARS-CoV-2 vaccine containing RNA can be administered to a subject (e.g., a mammalian subject, such as a human subject), and the RNA polynucleotides are translated in vivo to produce an antigenic polypeptide (antigen).
  • an “effective amount” of a composition is based, at least in part, on the target tissue, target cell type, means of administration, physical characteristics of the RNA (e.g., length, nucleotide composition, and/or extent of modified nucleosides), other components of the vaccine, and other determinants, such as age, body weight, height, sex and general health of the subject.
  • an effective amount of a composition provides an induced or boosted immune response as a function of antigen production in the cells of the subject.
  • an effective amount of the composition containing RNA polynucleotides having at least one chemical modification is more efficient than a composition containing a corresponding unmodified polynucleotide encoding the same antigen or a peptide antigen.
  • Increased antigen production may be demonstrated by increased cell transfection (the percentage of cells transfected with the RNA vaccine), increased protein translation and/or expression from the polynucleotide, decreased nucleic acid degradation (as demonstrated, for example, by increased duration of protein translation from a modified polynucleotide), or altered antigen specific immune response of the host cell.
  • composition refers to the combination of an active agent with a carrier, inert or active, making the composition especially suitable for diagnostic or therapeutic use in vivo or ex vivo.
  • a “pharmaceutically acceptable carrier,” after being administered to or upon a subject, does not cause undesirable physiological effects.
  • the carrier in the pharmaceutical composition must be “acceptable” also in the sense that it is compatible with the active ingredient and can be capable of stabilizing it.
  • One or more solubilizing agents can be utilized as pharmaceutical carriers for delivery of an active agent.
  • examples of a pharmaceutically acceptable carrier include, but are not limited to, biocompatible vehicles, adjuvants, additives, and diluents to achieve a composition usable as a dosage form.
  • compositions comprising polynucleotides and their encoded polypeptides
  • a composition may be administered prophylactically or therapeutically as part of an active immunization scheme to healthy individuals or early in infection during the incubation phase or during active infection after onset of symptoms.
  • the amount of RNA provided to a cell, a tissue or a subject may be an amount effective for immune prophylaxis.
  • a composition may be administered with other prophylactic or therapeutic compounds.
  • a prophylactic or therapeutic compound may be an adjuvant or a booster.
  • the term “booster” refers to an extra administration of the vaccine composition and may include a traditional boost, seasonal boost or a pandemic shift boost.
  • a booster (or booster vaccine) may be given after an earlier administration of the prophylactic composition.
  • the time of administration between the initial administration of the prophylactic composition and the booster may be, but is not limited to, 1 minute, 2 minutes, 3 minutes, 4 minutes, 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 15 minutes, 20 minutes, 35 minutes, 40 minutes, 45 minutes, 50 minutes, 55 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 13 hours, 14 hours, 15 hours, 16 hours, 17 hours, 18 hours, 19 hours, 20 hours, 21 hours, 22 hours, 23 hours, 1 day, 36 hours, 2 days, 3 days, 4 days, 5 days, 6 days, 1 week, 10 days, 2 weeks, 3 weeks, 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, one year, or more.
  • the time of administration between the initial administration of the prophylactic composition and the booster is at least 6 months.
  • the time of administration between the initial administration of the prophylactic composition and the booster may be, but is not limited to, 1 week, 2 weeks, 3 weeks, 1 month, 2 months, 3 months, or 6 months.
  • the booster may comprise the same or different mRNAs as compared to the earlier administration of the prophylactic composition.
  • the booster may comprise a combination of the same mRNA from the earlier administration of the prophylactic composition and at least one different mRNA.
  • the ratio of the mRNA from the earlier administration of the prophylactic composition and the at least one different mRNA is 1:1, 1:2, 1:4, 4:1, or 2:1.
  • the ratio is 1:1.
  • the booster may comprise different mRNAs as compared to the earlier administration of the prophylactic compositions. In some embodiments, such a booster may comprise 1, 2, 3, 4 or more mRNAs that were not present in the prophylactic composition. In some embodiments, the ratio of two mRNA polynucleotides (none of which were in the prophylactic composition) in the booster is 1:1, 1:2, 1:4, 4:1, or 2:1. In one embodiment, the ratio is 1:1.
  • a boost or booster dose may be administered more than once, for example 2, 3, 4, 5, 6 or more times after the initial prophylactic (prime) dose.
  • a subsequent boost is administered within weeks, e.g., within 3-4 weeks of the first (or previous) boost.
  • a second boost is administered 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or more weeks after the first (or previous) boost.
  • the booster in some embodiments is monovalent (e.g., the mRNA encodes a single antigen). In some embodiments, the booster is multivalent (e.g., the mRNA encodes more than one antigen).
  • the booster dose is 5 µg-30 µg, 5 µg-25 µg, 5 µg-20 µg, 5 µg-15 µg, 5 µg-10 µg, 10 µg-30 µg, 10 µg-25 µg, 10 µg-20 µg, 10 µg-15 µg, 15 µg-30 µg, 15 µg-25 µg, 15 µg-20 µg, 20 µg-30 µg, 25 µg-30 µg, or 25 µg-300 µg.
  • the booster dose is 10 µg-60 µg, 10 µg-55 µg, 10 µg-50 µg, 10 µg-45 µg, 10 µg-40 µg, 10 µg-35 µg, 10 µg-30 µg, 10 µg-25 µg, 10 µg-20 µg, 15 µg-60 µg, 15 µg-55 µg, 15 µg-50 µg, 15 µg-45 µg, 15 µg-40 µg, 15 µg-35 µg, 15 µg-30 µg, 15 µg-25 µg, 15 µg-20 µg, 20 µg-60 µg, 20 µg-55 µg, 20 µg-50 µg, 20 µg-45 µg, 20 µg-40 µg, 20 µg-35 µg, 20 µg-30 µg, 20 µg-25 µg, 20 µg
  • the booster dose is at least 10 µg and less than 25 µg of the composition. In some embodiments, the booster dose is at least 5 µg and less than 25 µg of the composition. For example, the booster dose is 5 µg, 10 µg, 15 µg, 20 µg, 25 µg, 30 µg, 35 µg, 40 µg, 45 µg, 50 µg, 55 µg, 60 µg, 65 µg, 70 µg, 75 µg, 80 µg, 85 µg, 90 µg, 95 µg, 100 µg, 110 µg, 120 µg, 130 µg, 140 µg, 150 µg, 160 µg, 170 µg, 180 µg, 190 µg, 200 µg, 250 µg, or 300 µg.
  • a composition may be administered intramuscularly, intranasally or intradermally, similarly to the administration of inactivated vaccines known in the art.
  • a composition may be utilized in various settings depending on the prevalence of the infection or the degree or level of unmet medical need.
  • the RNA vaccines may be utilized to treat and/or prevent a variety of infectious diseases. RNA vaccines have superior properties in that they produce much larger antibody titers, produce better neutralizing immunity, produce more durable immune responses, and/or produce responses earlier than commercially available vaccines.
  • Some aspects relate to pharmaceutical compositions including RNA and/or complexes optionally in combination with one or more pharmaceutically acceptable excipients.
  • RNA may be formulated or administered alone or in conjunction with one or more other components.
  • an immunizing composition may comprise other components including, but not limited to, adjuvants.
  • an immunizing composition does not include an adjuvant (it is adjuvant free).
  • An RNA may be formulated or administered in combination with one or more pharmaceutically-acceptable excipients.
  • vaccine compositions comprise at least one additional active substance, such as, for example, a therapeutically-active substance, a prophylactically-active substance, or a combination of both.
  • Vaccine compositions may be sterile, pyrogen-free or both sterile and pyrogen-free.
  • an immunizing composition is administered to humans, human patients or subjects.
  • active ingredient generally refers to the RNA vaccines or the polynucleotides contained therein, for example, RNA polynucleotides (e.g., mRNA polynucleotides) encoding antigens.
  • Formulations of the vaccine compositions may be prepared by any method known or hereafter developed in the art of pharmacology.
  • such preparatory methods include the step of bringing the active ingredient (e.g., mRNA polynucleotide) into association with an excipient and/or one or more other accessory ingredients, and then, if necessary and/or desirable, dividing, shaping and/or packaging the product into a desired single- or multi-dose unit.
  • Relative amounts of the active ingredient, the pharmaceutically acceptable excipient, and/or any additional ingredients in a pharmaceutical composition will vary, depending upon the identity, size, and/or condition of the subject treated and further depending upon the route by which the composition is to be administered.
  • the composition may comprise between 0.1% and 100% (w/w) active ingredient, e.g., between 0.5% and 50%, between 1% and 30%, between 5% and 80%, or at least 80% (w/w) active ingredient.
  • an RNA is formulated using one or more excipients to: (1) increase stability; (2) increase cell transfection; (3) permit the sustained or delayed release (e.g., from a depot formulation); (4) alter the biodistribution (e.g., target to specific tissues or cell types); (5) increase the translation of encoded protein in vivo; and/or (6) alter the release profile of encoded protein (antigen) in vivo.
  • excipients can include, without limitation, lipidoids, liposomes, lipid nanoparticles, polymers, lipoplexes, core-shell nanoparticles, peptides, proteins, cells transfected with the RNA (e.g., for transplantation into a subject), hyaluronidase, nanoparticle mimics and combinations thereof.
  • immunizing compositions e.g., RNA vaccines
  • methods, kits and reagents for prevention and/or treatment of coronavirus infection in humans and other mammals can be used as therapeutic or prophylactic agents.
  • immunizing compositions are used to provide prophylactic protection from coronavirus infection.
  • immunizing compositions are used to treat a coronavirus infection.
  • immunizing compositions are used in the priming of immune effector cells, for example, to activate peripheral blood mononuclear cells (PBMCs) ex vivo, which are then infused (re-infused) into a subject.
  • a subject may be any mammal, including non-human primate and human subjects.
  • a subject is a human subject.
  • an immunizing composition e.g., RNA vaccine
  • a subject e.g., a mammalian subject, such as a human subject
  • an immunizing composition is administered to a subject (e.g., a mammalian subject, such as a human subject) in an effective amount to induce an antigen-specific immune response.
  • the RNA encoding the coronavirus spike protein antigen is expressed and translated in vivo to produce the antigen, which then stimulates an immune response in the subject.
  • Prophylactic protection from a coronavirus can be achieved following administration of an immunizing composition (e.g., an RNA vaccine).
  • Immunizing compositions can be administered once, twice, three times, four times or more but it is likely sufficient to administer the vaccine once (optionally followed by a single booster). It is possible, although less desirable, to administer an immunizing composition to an infected individual to achieve a therapeutic response. Dosing may need to be adjusted accordingly.
  • Some aspects relate to a method of eliciting an immune response in a subject against a coronavirus antigen (or multiple antigens).
  • a method involves administering to the subject an immunizing composition comprising an mRNA having an open reading frame encoding a coronavirus antigen, thereby inducing in the subject an immune response specific to the coronavirus antigen, wherein anti-antigen antibody titer in the subject is increased following vaccination relative to anti-antigen antibody titer in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the antigen.
  • An “anti-antigen antibody” is a serum antibody that binds specifically to the antigen.
  • a prophylactically effective dose is an effective dose that prevents infection with the virus at a clinically acceptable level.
  • the effective dose is a dose listed in a package insert for the vaccine.
  • a traditional vaccine refers to a vaccine other than the mRNA vaccines.
  • a traditional vaccine includes, but is not limited to, live microorganism vaccines, killed microorganism vaccines, subunit vaccines, protein antigen vaccines, DNA vaccines, virus-like particle (VLP) vaccines, etc.
  • a traditional vaccine is a vaccine that has achieved regulatory approval and/or is registered by a national drug regulatory body, for example the Food and Drug Administration (FDA) in the United States or the European Medicines Agency (EMA).
  • the anti-antigen antibody titer in the subject is increased 1 log to 10 log following vaccination relative to anti-antigen antibody titer in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the coronavirus or an unvaccinated subject. In some embodiments, the anti-antigen antibody titer in the subject is increased 1 log, 2 log, 3 log, 4 log, 5 log, or 10 log following vaccination relative to anti-antigen antibody titer in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the coronavirus or an unvaccinated subject. Some aspects relate to a method of eliciting an immune response in a subject against a coronavirus.
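The "log" increases in titer mentioned above can be made concrete with a short sketch. It assumes that a "1 log" increase means a 10-fold higher titer (a base-10 logarithm), which is the usual convention but is stated here as an assumption; the function name and the titer values are hypothetical.

```python
# Minimal sketch: express a titer increase in log10 units.
# Assumes "1 log" = 10-fold; the titers below are hypothetical.
import math

def log10_fold_increase(titer_test, titer_reference):
    """Base-10 log of the ratio between a test titer and a reference titer."""
    if titer_test <= 0 or titer_reference <= 0:
        raise ValueError("titers must be positive")
    return math.log10(titer_test / titer_reference)

# Example: a titer of 10,000 against a reference titer of 100 is a 2-log increase.
example_logs = log10_fold_increase(10_000, 100)
```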
  • the method involves administering to the subject a composition comprising an mRNA comprising an open reading frame encoding a coronavirus antigen, thereby inducing in the subject an immune response specific to the coronavirus, wherein the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine against the coronavirus at 2 times to 100 times the dosage level relative to the composition.
  • the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at twice the dosage level relative to a composition.
  • the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at three times the dosage level relative to a composition.
  • the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at 4 times, 5 times, 10 times, 50 times, or 100 times the dosage level relative to a composition. In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at 10 times to 1000 times the dosage level relative to a composition. In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at 100 times to 1000 times the dosage level relative to a composition. In some embodiments, the immune response is assessed by determining [protein] antibody titer in the subject.
  • serum or antibody from an immunized subject is tested for its ability to neutralize viral uptake or reduce coronavirus transformation of human B lymphocytes.
  • the ability to promote a robust T cell response(s) is measured using art recognized techniques.
  • Some aspects relate to methods of eliciting an immune response in a subject against a coronavirus by administering to the subject a composition comprising an mRNA having an open reading frame encoding a coronavirus antigen, thereby inducing in the subject an immune response specific to the coronavirus antigen, wherein the immune response in the subject is induced 2 days to 10 weeks earlier relative to an immune response induced in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the coronavirus.
  • the immune response in the subject is induced in a subject vaccinated with a prophylactically effective dose of a traditional vaccine at 2 times to 100 times the dosage level relative to a composition.
  • the immune response in the subject is induced 2 days, 3 days, 1 week, 2 weeks, 3 weeks, 5 weeks, or 10 weeks earlier relative to an immune response induced in a subject vaccinated with a prophylactically effective dose of a traditional vaccine.
  • Some aspects relate to methods of eliciting an immune response in a subject against a coronavirus by administering to the subject an mRNA having an open reading frame encoding a first antigen, wherein the RNA does not include a stabilization element, and wherein an adjuvant is not co-formulated or co-administered with the vaccine.
  • a composition may be administered by any route that results in a therapeutically effective outcome.
  • routes of administration for RNA vaccines include, but are not limited to, intradermal, intramuscular, intranasal, and/or subcutaneous administration.
  • Some aspects relate to methods comprising administering RNA vaccines to a subject in need thereof.
  • the exact amount required will vary from subject to subject, depending on the species, age, and general condition of the subject, the severity of the disease, the particular composition, its mode of administration, its mode of activity, and the like.
  • the RNA is typically formulated in dosage unit form for ease of administration and uniformity of dosage. It will be understood, however, that the total daily usage of the RNA may be decided by the attending physician within the scope of sound medical judgment.
  • the specific therapeutically effective, prophylactically effective, or appropriate imaging dose level for any particular patient will depend upon a variety of factors including the disorder being treated and the severity of the disorder; the activity of the specific compound employed; the specific composition employed; the age, body weight, general health, sex and diet of the patient; the time of administration, route of administration, and rate of excretion of the specific compound employed; the duration of the treatment; drugs used in combination or coincidental with the specific compound employed; and like factors well known in the medical arts.
  • the effective amount (e.g., effective dose) of the RNA may be as low as 20 µg, administered for example as a single dose or as two 10 µg doses.
  • the effective amount is a total dose of 20 µg-300 µg, 5 µg-30 µg, 5 µg-25 µg, 5 µg-20 µg, 5 µg-15 µg, 5 µg-10 µg, 10 µg-30 µg, 10 µg-25 µg, 10 µg-20 µg, 10 µg-15 µg, 15 µg-30 µg, 15 µg-25 µg, 15 µg-20 µg, 20 µg-30 µg, 25 µg-30 µg, or 25 µg-300 µg.
  • the effective dose (e.g., effective amount) is at least 10 µg and less than 25 µg of the composition. In some embodiments, the effective dose (e.g., effective amount) is at least 5 µg and less than 25 µg of the composition.
  • the effective amount may be a total dose of 5 µg, 10 µg, 15 µg, 20 µg, 25 µg, 30 µg, 35 µg, 40 µg, 45 µg, 50 µg, 55 µg, 60 µg, 65 µg, 70 µg, 75 µg, 80 µg, 85 µg, 90 µg, 95 µg, 100 µg, 110 µg, 120 µg, 130 µg, 140 µg, 150 µg, 160 µg, 170 µg, 180 µg, 190 µg, 200 µg, 250 µg, or 300 µg.
  • the effective amount (e.g., effective dose) is a total dose of 10 µg. In some embodiments, the effective amount is a total dose of 20 µg (e.g., two 10 µg doses). In some embodiments, the effective amount is a total dose of 25 µg. In some embodiments, the effective amount is a total dose of 30 µg. In some embodiments, the effective amount is a total dose of 50 µg. In some embodiments, the effective amount is a total dose of 60 µg (e.g., two 30 µg doses). In some embodiments, the effective amount is a total dose of 75 µg. In some embodiments, the effective amount is a total dose of 100 µg.
  • the effective amount is a total dose of 150 µg. In some embodiments, the effective amount is a total dose of 200 µg. In some embodiments, the effective amount is a total dose of 250 µg. In some embodiments, the effective amount is a total dose of 300 µg. Any of the doses provided above may be an effective amount for a booster dose; for example, in some embodiments, the booster dose is a total dose of 50 µg. In some embodiments, the composition comprises two or more mRNA polynucleotides and the effective amount is a total dose of 20 µg (e.g., 10 µg of a first mRNA and 10 µg of a second mRNA).
  • the composition comprises two or more mRNA polynucleotides and the effective amount is a total dose of 50 µg (e.g., 25 µg of a first mRNA and 25 µg of a second mRNA). In some embodiments, the composition comprises two or more mRNA polynucleotides and the effective amount is a total dose of 100 µg (e.g., 50 µg of a first mRNA and 50 µg of a second mRNA).
  • RNA can be formulated into a dosage form, such as an intranasal, intratracheal, or injectable (e.g., intravenous, intraocular, intravitreal, intramuscular, intradermal, intracardiac, intraperitoneal, and subcutaneous).
  • Vaccine Efficacy. Some aspects relate to compositions containing RNA (e.g., RNA vaccines), wherein the RNA is present in an effective amount to produce an antigen-specific immune response in a subject (e.g., production of antibodies specific to a coronavirus antigen). “An effective amount” is a dose of the RNA effective to produce an antigen-specific immune response. Some aspects relate to methods of inducing an antigen-specific immune response in a subject.
  • an immune response to a vaccine or LNP is the development in a subject of a humoral and/or a cellular immune response to a (one or more) coronavirus protein(s) present in the vaccine.
  • a “humoral” immune response refers to an immune response mediated by antibody molecules, including, e.g., secretory (IgA) or IgG molecules, while a “cellular” immune response is one mediated by T-lymphocytes (e.g., CD4+ helper and/or CD8+ T cells (e.g., CTLs)) and/or other white blood cells.
  • CTLs cytolytic T-cells
  • CTLs have specificity for peptide antigens that are presented in association with proteins encoded by the major histocompatibility complex (MHC) and expressed on the surfaces of cells. CTLs help induce and promote the destruction of intracellular microbes or the lysis of cells infected with such microbes.
  • MHC major histocompatibility complex
  • Another aspect of cellular immunity involves an antigen-specific response by helper T-cells. Helper T-cells act to help stimulate the function, and focus the activity, of nonspecific effector cells against cells displaying peptide antigens in association with MHC molecules on their surface.
  • a cellular immune response also leads to the production of cytokines, chemokines, and other such molecules produced by activated T-cells and/or other white blood cells including those derived from CD4+ and CD8+ T-cells.
  • the antigen-specific immune response is characterized by measuring an anti-coronavirus antigen antibody titer produced in a subject administered a composition.
  • An antibody titer is a measurement of the amount of antibodies within a subject, for example, antibodies that are specific to a particular antigen or epitope of an antigen.
  • Antibody titer is typically expressed as the inverse of the greatest dilution that provides a positive result.
  • Enzyme-linked immunosorbent assay is a common assay for determining antibody titers, for example.
  • a variety of serological tests can be used to measure antibody against an encoded antigen of interest, for example, SARS-CoV-2 virus or a SARS-CoV-2 viral antigen, e.g., SARS-CoV-2 spike or S protein, or a domain thereof. These tests include the hemagglutination-inhibition test, complement fixation test, fluorescent antibody test, enzyme-linked immunosorbent assay (ELISA), and plaque reduction neutralization test (PRNT). Each of these tests measures different antibody activities.
  • a plaque reduction neutralization test, or PRNT is used as a serological correlate of protection.
  • PRNT measures the biological parameter of in vitro virus neutralization and is the most serologically virus-specific test among certain classes of viruses, correlating well to serum levels of protection from virus infection.
  • the basic design of the PRNT allows for virus-antibody interaction to occur in a test tube or microtiter plate, and then measuring antibody effects on viral infectivity by plating the mixture on virus-susceptible cells, preferably cells of mammalian origin. The cells are overlaid with a semi-solid media that restricts spread of progeny virus.
  • virus that initiates a productive infection produces a localized area of infection (a plaque), that can be detected in a variety of ways. Plaques are counted and compared back to the starting concentration of virus to determine the percent reduction in total virus infectivity.
  • the serum sample being tested is usually subjected to serial dilutions prior to mixing with a standardized amount of virus.
  • concentration of virus is held constant such that, when added to susceptible cells and overlaid with semi-solid media, individual plaques can be discerned and counted.
  • PRNT end-point titers can be calculated for each serum sample at any selected percent reduction of virus activity.
  • the serum sample dilution series for antibody titration should ideally start below the “seroprotective” threshold titer.
  • the “seroprotective” threshold titer remains unknown; but a seropositivity threshold of 1:10 can be considered a seroprotection threshold in certain embodiments.
  • a neutralizing immune response is an immune response that produces a level of antibodies that meet or exceed a seroprotection threshold.
  • PRNT end-point titers are expressed as the reciprocal of the last serum dilution showing the desired percent reduction in plaque counts. The PRNT titer can be calculated based on a 50% or greater reduction in plaque counts (PRNT50).
  • a PRNT50 titer is preferred over titers using higher cut-offs (e.g., PRNT90) for vaccine sera, providing more accurate results from the linear portion of the titration curve.
  • PRNT titers There are several ways to calculate PRNT titers. The simplest and most widely used way to calculate titers is to count plaques and report the titer as the reciprocal of the last serum dilution to show >50% reduction of the input plaque count as based on the back-titration of input plaques. Use of curve fitting methods from several serum dilutions may permit calculation of a more precise result. There are a variety of computer analysis programs available for this (e.g., SPSS or GraphPad Prism).
  • an antibody titer is used to assess whether a subject has had an infection or to determine whether immunizations are required. In some embodiments, an antibody titer is used to determine the strength of an autoimmune response, to determine whether a booster immunization is needed, to determine whether a previous vaccine was effective, and to identify any recent or prior infections. An antibody titer may be used to determine the strength of an immune response induced in a subject by a composition (e.g., RNA vaccine). In some embodiments, an anti-coronavirus antigen antibody titer produced in a subject is increased by at least 1 log relative to a control.
  • anti-coronavirus antigen antibody titer produced in a subject may be increased by at least 1.5, at least 2, at least 2.5, or at least 3 log relative to a control.
  • the anti-coronavirus antigen antibody titer produced in the subject is increased by 1, 1.5, 2, 2.5 or 3 log relative to a control.
  • the anti-coronavirus antigen antibody titer produced in the subject is increased by 1-3 log relative to a control.
  • the anti-coronavirus antigen antibody titer produced in a subject may be increased by 1-1.5, 1-2, 1-2.5, 1-3, 1.5-2, 1.5-2.5, 1.5-3, 2-2.5, 2-3, or 2.5-3 log relative to a control.
  • the anti-coronavirus antigen antibody titer produced in a subject is increased at least 2 times relative to a control.
  • the anti-coronavirus antigen antibody titer produced in a subject may be increased at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times relative to a control.
  • the anti-coronavirus antigen antibody titer produced in the subject is increased 2, 3, 4, 5, 6, 7, 8, 9, or 10 times relative to a control.
  • the anti-coronavirus antigen antibody titer produced in a subject is increased 2-10 times relative to a control.
  • the anti-coronavirus antigen antibody titer produced in a subject may be increased 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4, 2-3, 3-10, 3-9, 3-8, 3-7, 3-6, 3-5, 3-4, 4-10, 4-9, 4-8, 4-7, 4-6, 4-5, 5-10, 5-9, 5-8, 5-7, 5-6, 6-10, 6-9, 6-8, 6-7, 7-10, 7-9, 7-8, 8-10, 8-9, or 9-10 times relative to a control.
  • an antigen-specific immune response is measured as a ratio of geometric mean titer (GMT), referred to as a geometric mean ratio (GMR), of serum neutralizing antibody titers to coronavirus.
  • GMT geometric mean titer
  • a geometric mean titer (GMT) is the average antibody titer for a group of subjects calculated by multiplying all values and taking the nth root of the number, where n is the number of subjects with available data.
  • a control in some embodiments, is an anti-coronavirus antigen antibody titer produced in a subject who has not been administered a composition (e.g., RNA vaccine).
  • a control is an anti-coronavirus antigen antibody titer produced in a subject administered a recombinant or purified protein vaccine.
  • Recombinant protein vaccines typically include protein antigens that either have been produced in a heterologous expression system (e.g., bacteria or yeast) or purified from large amounts of the pathogenic organism.
  • the ability of a composition e.g., RNA vaccine
  • a composition may be administered to a murine model and the murine model assayed for induction of neutralizing antibody titers. Viral challenge studies may also be used to assess the efficacy of a vaccine.
  • a composition may be administered to a murine model, the murine model challenged with virus, and the murine model assayed for survival and/or immune response (e.g., neutralizing antibody response, T cell response (e.g., cytokine response)).
  • an effective amount of a composition e.g., RNA vaccine
  • a “standard of care” refers to a medical or psychological treatment guideline and can be general or specific. “Standard of care” specifies appropriate treatment based on scientific evidence and collaboration between medical professionals involved in the treatment of a given condition.
  • a “standard of care dose” refers to the dose of a recombinant or purified protein vaccine, or a live attenuated or inactivated vaccine, or a VLP vaccine, that a physician/clinician or other medical professional would administer to a subject to treat or prevent coronavirus infection or a related condition, while following the standard of care guideline for treating or preventing coronavirus infection or a related condition.
  • the anti-coronavirus antigen antibody titer produced in a subject administered an effective amount of a composition is equivalent to an anti-coronavirus antigen antibody titer produced in a control subject administered a standard of care dose of a recombinant or purified protein vaccine, or a live attenuated or inactivated vaccine, or a VLP vaccine.
  • Vaccine efficacy may be assessed using standard analyses (see, e.g., Weinberg et al., J Infect Dis. 2010 Jun 1;201(11):1607-10). For example, vaccine efficacy may be measured by double-blind, randomized, clinical controlled trials.
  • AR disease attack rate
  • RR relative risk
  • vaccine effectiveness may be assessed using standard analyses (see, e.g., Weinberg et al., J Infect Dis. 2010 Jun 1;201(11):1607-10).
  • Vaccine effectiveness is an assessment of how a vaccine (which may have already proven to have high vaccine efficacy) reduces disease in a population.
  • Vaccine effectiveness is proportional to vaccine efficacy (potency) but is also affected by how well target groups in the population are immunized, as well as by other non-vaccine-related factors that influence the ‘real-world’ outcomes of hospitalizations, ambulatory visits, or costs.
  • a retrospective case control analysis may be used, in which the rates of vaccination among a set of infected cases and appropriate controls are compared.
  • efficacy of the composition is at least 60% relative to unvaccinated control subjects.
  • efficacy of the composition may be at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 95%, at least 98%, or 100% relative to unvaccinated control subjects.
  • Sterilizing Immunity refers to a unique immune status that prevents effective pathogen infection into the host.
  • the effective amount of a composition is sufficient to provide sterilizing immunity in the subject for at least 1 year.
  • the effective amount of a composition is sufficient to provide sterilizing immunity in the subject for at least 2 years, at least 3 years, at least 4 years, or at least 5 years.
  • the effective amount of a composition is sufficient to provide sterilizing immunity in the subject at an at least 5-fold lower dose relative to control.
  • the effective amount may be sufficient to provide sterilizing immunity in the subject at an at least 10-fold lower, 15-fold, or 20-fold lower dose relative to a control.
  • Detectable Antigen. In some embodiments, the effective amount of a composition is sufficient to produce detectable levels of coronavirus antigen as measured in serum of the subject at 1-72 hours post administration.
  • Titer. An antibody titer is a measurement of the number of antibodies within a subject, for example, antibodies that are specific to a particular antigen (e.g., an anti-coronavirus antigen).
  • Antibody titer is typically expressed as the inverse of the greatest dilution that provides a positive result.
  • Enzyme-linked immunosorbent assay (ELISA) is a common assay for determining antibody titers, for example.
  • a neutralizing immune response is an immune response that is a neutralizing antibody response and/or an effective neutralizing T cell response. In some embodiments a neutralizing antibody response produces a level of antibodies that meet or exceed a seroprotection threshold.
  • An effective T cell response is a response which produces a baseline level of viral activated or viral specific T cells including CD8+ and CD4+ T helper type 1 cells.
  • CD8+ cytotoxic T lymphocytes typically clear the intracellular virus compartment and CD4+ T cells exert various functions in the body such as helping B and other T cells, promoting memory generation and indirect or direct cytotoxic activity.
  • the effective T cells comprise a high proportion of CD8+ T cells and/or CD4+ T cells, relative to a baseline level (in a naïve subject). In some embodiments these T cells are differentiated towards an early-differentiated memory phenotype with co-expression of CD27 and CD28.
  • the effective amount of a composition is sufficient to produce a 1,000-10,000 neutralizing antibody titer produced by neutralizing antibody against the coronavirus antigen as measured in serum of the subject at 1-72 hours post administration.
  • the effective amount is sufficient to produce a 1,000-5,000 neutralizing antibody titer produced by neutralizing antibody against the coronavirus antigen as measured in serum of the subject at 1-72 hours post administration. In some embodiments, the effective amount is sufficient to produce a 5,000-10,000 neutralizing antibody titer produced by neutralizing antibody against the coronavirus antigen as measured in serum of the subject at 1-72 hours post administration. In some embodiments, the neutralizing antibody titer is at least 100 NT50. For example, the neutralizing antibody titer may be at least 200, 300, 400, 500, 600, 700, 800, 900 or 1000 NT50. In some embodiments, the neutralizing antibody titer is at least 10,000 NT50.
  • the neutralizing antibody titer is at least 100 neutralizing units per milliliter (NU/mL).
  • the neutralizing antibody titer may be at least 200, 300, 400, 500, 600, 700, 800, 900 or 1000 NU/mL.
  • the neutralizing antibody titer is at least 10,000 NU/mL.
  • an anti-coronavirus antigen antibody titer produced in the subject is increased by at least 1 log relative to a control.
  • an anti-coronavirus antigen antibody titer produced in the subject may be increased by at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 log relative to a control.
  • an anti-coronavirus antigen antibody titer produced in the subject is increased at least 2 times relative to a control.
  • an anti-coronavirus antigen antibody titer produced in the subject is increased by at least 3, 4, 5, 6, 7, 8, 9 or 10 times relative to a control.
  • a geometric mean which is the nth root of the product of n numbers, is generally used to describe proportional growth. Geometric mean, in some embodiments, is used to characterize antibody titer produced in a subject.
  • a control may be, for example, an unvaccinated subject, or a subject administered a live attenuated viral vaccine, an inactivated viral vaccine, or a protein subunit vaccine.
  • Characterization of polynucleotides may be accomplished using polynucleotide mapping, reverse transcriptase sequencing, charge distribution analysis, detection of RNA impurities, or any combination of two or more of the foregoing.
  • “Characterizing” comprises determining the RNA transcript sequence, determining the purity of the RNA transcript, or determining the charge heterogeneity of the RNA transcript, for example. Such methods are taught in, for example, International Publication WO2014/144711 and International Publication WO2014/144767, the contents of each of which are incorporated herein by reference in their entirety.
  • lipid nanoparticle (LNP) formulation included 48 mol% ionizable lipid of Compound 1, 11 mol% 1,2 distearoyl-sn-glycero-3-phosphocholine (DSPC), 38.5 mol% cholesterol, and 2.5 mol% PEG-modified 1,2 dimyristoyl-sn-glycerol, methoxypolyethyleneglycol (PEG2500 DMG).
  • Immunization Methods. Vaccine compositions of lipid nanoparticles containing mRNAs are administered to mice according to the following administration schedule. C57BL/6 mice are immunized with two doses of a given composition, receiving the first dose on day 0, and the second dose on day 22.
  • Sera are collected on day 21, three weeks after the first (prime) dose but before administration of the second (boost) dose, and day 36, two weeks after administration of the second (boost) dose.
  • mice are euthanized on day 36, and spleens are collected and processed to harvest splenocytes.
  • Splenocytes are stimulated with one of a panel of peptide pools, each pool containing peptides from a single SARS-CoV-2 antigen, in the presence of a Golgi blocker so that cells producing cytokines in response to stimulation retain cytokines instead of secreting them.
  • Cell surfaces are stained for lymphocyte markers, including CD3, CD4, and CD8, and cells are permeabilized and stained for multiple cytokines.
  • Neutralization assays. Antibodies in serum, when bound to a viral surface protein that is essential for infection, can prevent a virus from infecting a target cell, an activity referred to as “neutralization.” To determine the ability of mRNA compositions to generate neutralizing antibodies against SARS-CoV-2, the neutralization activity of sera is quantified using a neutralization assay. For each assay, ARPE-19 cells are plated in 96-well plates at a density of 2 × 10^4 cells/well and incubated for 20–24 hours. Then, serial 3-fold dilutions of each serum sample are prepared in phenol red-free cDMEM.
  • NT50 50% neutralization titer
  • ADCC antibody-dependent cell-mediated cytotoxicity
  • This assay uses Jurkat cells that constitutively express mouse FcγRIV, allowing for recognition of antibody Fc regions, and luciferase under the control of the NFAT pathway, which is activated following Fc recognition.
  • Vero cells are plated in 96-well plates at a density of 2.5 × 10^4 cells/well, incubated for 20–24 hours, then inoculated with SARS-CoV-2 at a multiplicity of infection (MOI) of 5 plaque-forming units (PFU) per cell. 16 hours after inoculation, serial 3-fold dilutions of serum samples are prepared in RPMI + 4% fetal bovine serum (FBS) containing only minimal amounts of IgG, to reduce background.
  • FBS fetal bovine serum
  • Serum samples are added to each well, to allow antibodies to bind to infected Vero cells expressing viral surface proteins. Then, reporter effector cells are serially diluted in RPMI + 4% low-IgG FBS, added to wells, and incubated for 6 hours to allow for recognition of surface-bound antibodies and expression of luciferase. After the 6 hours of incubation, a luciferase substrate is added to wells, so that any luciferase present can react with the substrate to produce light. Light emitted from wells is measured to quantify the amount of luciferase activity as a measurement of ADCC activity.
  • Example 1 Design of mRNA encoding antigenic SARS-CoV-2 antigens.
  • the protein sequences of SARS-CoV-2 M and N proteins were analyzed to identify regions containing high densities of T cell epitopes. Regions of M and N proteins that are rich in T cell epitopes are shown in FIG.2 and FIG.4. Modified forms of each protein containing epitope-rich regions were designed. These modified proteins contain higher densities of T cell epitopes than full-length forms and are thus useful for eliciting T cell responses to the proteins.
  • Example 2 Immunization of mice with compositions containing mRNAs encoding SARS-CoV-2 antigens.
  • mice are immunized with lipid nanoparticles containing the mRNAs encoding Nsp3 and N and M proteins, optionally with a signal peptide or the S protein N-terminal and receptor-binding domains, shown in FIG.1.
  • the antigens encoded by the mRNAs of each composition are shown in Table E2, the sequences of which can be found in Appendix I.
  • a first dose (prime) is administered on day 0, and a second dose (booster) is administered on day 22.
  • Serum is collected on day 21, three weeks after administration of the first dose but before booster dose administration, and on day 36, two weeks after administration of the booster dose.
  • mice are also euthanized to collect spleens for analysis of T cells by cell surface marker and intracellular cytokine staining.
  • T cell effector phenotype and response to antigen are evaluated as described above.
  • Sera are evaluated for antiviral activities such as neutralization, ADCC activity, and prevention of cell-cell spread by SARS-CoV-2.
  • Table E2 Panel of mRNA vaccines containing mRNAs encoding SARS-CoV-2 chimeric protein.
  • SARS-CoV-2 N protein-specific IgG titers, M-specific IgG titers, and Nsp3-specific IgG titers are measured by ELISA at Day 21 post vaccination.
  • the assay to evaluate the neutralization capacity of IgG antibodies generated in response to immunization is carried out as described above.
  • Example 4. Immunogenicity and neutralization assay at day 36 following two doses. The same compositions of the mRNA vaccines described in Example 2 are again administered to mice as booster doses on Day 22 post-vaccination with the first dose.
  • the titers of antibodies generated after the booster dose to each of N antigen, M antigen, and Nsp3 are measured by ELISA from day 36 serum.
  • Example 5 Immunogenicity and neutralization assay following administration of mRNA vaccines encoding full-length or composite antigens
  • Table E5-1 full-length T cell antigens
  • Table E5-2 composite T cell antigens with or without S antigen
  • mice are euthanized to collect spleens for analysis of antigen-specific cells by ELISPOT, and analysis of T cell responses by intracellular cytokine staining.
  • Table E5-1 Panel of mRNA vaccines containing mRNAs encoding SARS-CoV-2 proteins (full-length N and/or M).
  • any of the mRNA sequences may include a 5’ UTR and/or a 3’ UTR.
  • the UTR sequences may be selected from the following sequences, or other known UTR sequences may be used.
  • any of the mRNA constructs may further comprise a poly(A) tail and/or cap (e.g., 7mG(5’)ppp(5’)NlmpNp).
  • mRNAs and encoded antigen sequences include a signal peptide and/or a peptide tag (e.g., C-terminal His tag), it should be understood that the indicated signal peptide and/or peptide tag may be substituted for a different signal peptide and/or peptide tag, or the signal peptide and/or peptide tag may be omitted.
  • 5’ UTR: GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGACCCCGGCGCCGCCACC (SEQ ID NO: 1)
  • 5’ UTR: GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGACCCCGGCGCCGCCACC (SEQ ID NO: 2)
  • 3’ UTR: UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC (SEQ ID NO: 3)
  • 3’ UTR: UGAUAAUAGGCUGGAGCCUCGGUGGCCUAGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC (SEQ ID NO: 4)
  • any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.
  • All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms. All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
  • a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in some embodiments, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
  • “or” should be understood to have the same meaning as “and/or” as defined above.
  • At least one of A and B can refer, in some embodiments, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
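
The titer and efficacy quantities described in the list above (PRNT end-point titers, geometric mean titers and their ratios, log increases relative to a control, and efficacy relative to unvaccinated subjects) reduce to short calculations. The following Python sketch is illustrative only. The function names, argument layout, and the assumption that plaque reduction falls monotonically with dilution are choices made for this example and are not taken from the disclosure. The efficacy function uses the conventional attack-rate formula, (ARU - ARV) / ARU × 100, where ARU and ARV are the attack rates in the unvaccinated and vaccinated groups.

```python
import math


def prnt_endpoint_titer(reciprocal_dilutions, plaque_counts, input_plaques, reduction=0.5):
    """Return the reciprocal of the last serum dilution that still reaches the
    requested plaque reduction (0.5 for PRNT50, 0.9 for PRNT90).

    reciprocal_dilutions must be in increasing order, e.g. [10, 20, 40].
    Returns None if even the first dilution misses the threshold.
    """
    titer = None
    for reciprocal, count in zip(reciprocal_dilutions, plaque_counts):
        observed_reduction = 1 - count / input_plaques
        if observed_reduction >= reduction:
            titer = reciprocal
        else:
            # Assumption: reduction only decreases with further dilution.
            break
    return titer


def geometric_mean_titer(titers):
    """GMT: nth root of the product of n titers, computed via logarithms."""
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def geometric_mean_ratio(titers_a, titers_b):
    """GMR: ratio of the GMT of group A to the GMT of group B."""
    return geometric_mean_titer(titers_a) / geometric_mean_titer(titers_b)


def log10_increase(titer, control_titer):
    """Increase of a titer over a control, in log10 units (1 log = 10-fold)."""
    return math.log10(titer / control_titer)


def vaccine_efficacy(attack_rate_unvaccinated, attack_rate_vaccinated):
    """Efficacy in percent from attack rates: (ARU - ARV) / ARU * 100."""
    return (attack_rate_unvaccinated - attack_rate_vaccinated) / attack_rate_unvaccinated * 100
```

For instance, `prnt_endpoint_titer([10, 20, 40, 80], counts, input_plaques)` walks the dilution series and reports the last dilution at which plaques are reduced by at least half. A curve-fitting approach over several dilutions, as the text notes, may give a more precise result than this end-point rule.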


Abstract

Some aspects of the disclosure relate to mRNA vaccines encoding chimeric proteins and variants thereof, including portions of coronavirus nucleocapsid (N), matrix (M), non-structural protein 3 (Nsp3), and/or Spike (S). Also provided are methods of using the vaccines.

Description

SARS-COV-2 T CELL VACCINES

RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/509,650, filed June 22, 2023, and U.S. Provisional Application No. 63/582,967, filed September 15, 2023, the contents of each of which are incorporated by reference herein in their entirety.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (M137870281WO00-SEQ-NTJ.xml; Size: 232,186 bytes; and Date of Creation: June 18, 2024) are incorporated by reference herein in their entirety.

BACKGROUND

Human coronaviruses are highly contagious enveloped, positive-sense single-stranded RNA viruses of the Coronaviridae family. Two sub-families of Coronaviridae are known to cause human disease, the most important being the β-coronaviruses (betacoronaviruses). The β-coronaviruses are common etiological agents of mild to moderate upper respiratory tract infections. However, outbreaks of novel coronavirus infections, such as the infections caused by a coronavirus initially identified in the Chinese city of Wuhan in December 2019, have been associated with a high mortality rate. This recently identified coronavirus, referred to as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) (formerly referred to as a “2019 novel coronavirus,” or a “2019-nCoV”), has rapidly infected millions of people and caused a global pandemic. The pandemic disease that the SARS-CoV-2 virus causes has been named COVID-19 (Coronavirus Disease 2019) by the World Health Organization (WHO). The first genome sequence of a SARS-CoV-2 isolate (Wuhan-Hu-1; USA-WA1/2020 isolate) was released by investigators from the Chinese CDC in Beijing on January 10, 2020 at Virological, a UK-based discussion forum for analysis and interpretation of virus molecular evolution and epidemiology. The sequence was then deposited in GenBank on January 12, 2020, having GenBank Accession number MN908947.1.
Subsequently, a number of SARS-CoV-2 strain variants have been identified, some of which are more infectious than the SARS-CoV-2 isolate. The emergence of SARS-CoV-2 variants with substitutions in the receptor binding domain (RBD) and N-terminal domain (NTD) of the viral S protein has raised concerns among scientists and health officials. The entry of coronavirus into host cells is mediated by interaction between the RBD of the viral S protein and host angiotensin-converting enzyme 2 (ACE2). Vaccine development has focused on inducing antibody responses against this region of SARS-CoV-2 S protein. More recently, a neutralization “supersite” has also been identified in the NTD. A significant decrease in vaccine efficacy has been correlated with amino acid substitutions in the RBD (e.g., K417N, E484K, and N501Y) and NTD (e.g., L18F, D80A, D215G, and Δ242-244) of the S protein. Some of the most recently circulating isolates containing these substitutions from the United Kingdom (B.1.1.7, Alpha), Republic of South Africa (B.1.351, Beta), Brazil (P.1 lineage, Gamma), New York (B.1.526, Iota), and California (B.1.427/B.1.429 or CAL.20C lineage, Epsilon), have shown a reduction in neutralization from convalescent serum in pseudovirus neutralization (PsVN) assays and resistance to certain monoclonal antibodies. In particular, mutations in the NTD subdomain, and specifically the neutralization supersite, are most extensive in the B.1.351 lineage virus. See McCallum, M. et al. N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. Cell, doi:10.1016/j.cell.2021.03.028 (2021). Since the identification of the above isolates, Omicron BA.4 and BA.5 sub-variants were detected in samples from South Africa in January 2022 and February 2022, respectively. While BA.4 and BA.5 are two different Omicron sub-variants, they are usually discussed together for vaccine/immunization purposes, as they encode identical spike proteins.
Early data from South Africa and genetic and epidemic surveillance in several countries indicated that BA.4/BA.5 had a substantial growth advantage over other circulating SARS-CoV-2 strains. This advantage was likely driven by new mutations in the BA.4/BA.5 spike that provided increased escape from pre-existing immunity acquired in the population via either natural infection or vaccination. In response, the European Centre for Disease Prevention and Control (ECDC) and the UK Health Security Agency (UKHSA) designated BA.4/BA.5 as Variants of Concern (VOC) in May 2022. BA.5 became the dominant variant in Portugal in May 2022, while BA.4/BA.5 became dominant in the USA, France, UK, and Germany in June 2022. The recent emergence of SARS-CoV-2 variants (XBB.1.5 lineage, “Kraken”; and XBB.1.16 lineage, “Arcturus”) has raised concerns due to their increased rates of transmission and potential to circumvent immunity elicited by natural infection or vaccination. The XBB.1.5 variant (“Kraken”) is derived from the BA.2 Omicron subvariant and has increased apparent transmissibility compared to ancestral SARS-CoV-2 strains. The XBB.1.5 Spike protein includes 38 substitutions and 4 amino acid deletions, relative to the wild-type Spike protein sequence of the Wuhan-Hu-1 isolate. These substitutions include G252V and F486P. The XBB.1.16 variant (“Arcturus”) is derived from the BA.2 Omicron subvariant and includes 39 substitutions and 4 deletions in the Spike protein, relative to the wild-type Spike protein sequence of the Wuhan-Hu-1 isolate. These substitutions include T478R, which increases viral infectivity.

SUMMARY

A monovalent SARS-CoV-2 Spike (S) protein-encoding mRNA vaccine (developed by Moderna Therapeutics) has been demonstrated to be highly efficacious in prevention of symptomatic COVID-19 disease and severe disease.
However, T cell responses to SARS-CoV-2 antigens other than the S protein, such as the Nucleocapsid (N), Matrix (M), and Non-structural protein 3 (Nsp3), play important roles in anti-betacoronavirus immunity, including reducing the severity and duration of infections. Moreover, emerging SARS-CoV-2 variants (e.g., the XBB.1.5 lineage, “Kraken”, and XBB.1.16 lineage, “Arcturus”) frequently have altered Spike proteins that evade antibodies elicited by exposure to ancestral Spike proteins, such as through immunization or prior infection, but T cell epitopes in other proteins are more often conserved between variants and ancestral strains. Vaccination approaches for eliciting T cell responses to such non-Spike proteins, alone or in combination with approaches for generating anti-Spike antibodies, therefore allow robust immune responses that maintain anti-betacoronavirus immunity in the face of Spike protein evolution among circulating SARS-CoV-2 strains. Accordingly, some aspects relate to a composition comprising a lipid nanoparticle and a messenger RNA (mRNA) comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 N protein portion; a SARS-CoV-2 NSP3 protein portion; and a SARS-CoV-2 M protein portion comprising one or more transmembrane domains. In some embodiments, the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein and a C-terminal domain of the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of the full-length SARS-CoV-2 N protein.
In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker. In some embodiments, the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the SARS-CoV-2 N protein. In some embodiments, the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84. In some embodiments, the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein. In some embodiments, one or more junctional epitopes present in a concatenated amino acid sequence consisting of two or more CD8+ T cell epitopes are not present in the SARS-CoV-2 NSP3 protein portion. In some embodiments, the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93.
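The residue ranges recited above for the N protein portion (amino acids 104-143, 43-87, and 213-366 of the full-length N protein) can be illustrated with a minimal Python sketch. Several things here are assumptions made only for illustration: the helper names `residues` and `build_n_portion`, the example glycine-serine linker `GGGGS`, the arrangement of the segments (first N-terminal segment, linker, second N-terminal segment, C-terminal segment), and the placeholder input, which is an arbitrary string and not SEQ ID NO: 84.

```python
def residues(full_length: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(full_length)):
        raise ValueError(f"range {start}-{end} is outside the sequence")
    return full_length[start - 1:end]


def build_n_portion(full_length_n: str, linker: str = "GGGGS") -> str:
    """Assemble a hypothetical N protein portion from the recited residue ranges.

    The segment order and the linker are assumptions for illustration only.
    """
    first_ntd = residues(full_length_n, 104, 143)   # first N-terminal domain segment
    second_ntd = residues(full_length_n, 43, 87)    # second N-terminal domain segment
    ctd = residues(full_length_n, 213, 366)         # C-terminal domain segment
    return first_ntd + linker + second_ntd + ctd


# Arbitrary placeholder sequence (NOT a real N protein sequence).
placeholder = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(400))
portion = build_n_portion(placeholder)
print(len(portion))  # 40 + 5 + 45 + 154 residues
```

The 1-based inclusive convention matters here: residues 104-143 span 40 positions, so the slice is `[103:143]` in Python's 0-based, end-exclusive indexing.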
In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the β-sheet domain is connected to one or more transmembrane domains by a linker. In some embodiments, the linker is a glycine or glycine-serine linker. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97. In some embodiments, the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86. In some embodiments, two or more of the N protein portion, the NSP3 protein portion, and the M protein portion are separated by a linker. In some embodiments, the N protein portion and the NSP3 protein portion are separated by a first linker, and/or the NSP3 protein portion and the M protein portion are separated by a second linker. In some embodiments, the N protein portion and the M protein portion are separated by a first linker, and/or the M protein portion and the NSP3 protein portion are separated by a second linker.
In some embodiments, the M protein portion and the N protein portion are separated by a first linker, and/or the N protein portion and the NSP3 protein portion are separated by a second linker. In some embodiments, each of the first and second linkers is a glycine or glycine-serine linker. In some embodiments, each of the first and second linkers comprises the amino acid sequence AAY. In some embodiments, the SARS-CoV-2 chimeric protein further comprises a signal peptide. In some embodiments, the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide. Some aspects relate to a composition comprising a lipid nanoparticle and an mRNA comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 S protein portion; a SARS-CoV-2 N protein portion; and a transmembrane portion comprising a transmembrane domain. In some embodiments, the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a SARS-CoV-2 N protein and a C-terminal domain of the SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of a full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine or glycine-serine linker. In some embodiments, the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the full-length SARS-CoV-2 N protein.
In some embodiments, the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84. In some embodiments, the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92. In some embodiments, the transmembrane portion comprises an influenza HA transmembrane domain. In some embodiments, the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the β-sheet domain is connected to one or more transmembrane domains by a linker. In some embodiments, the linker is a glycine or a glycine-serine linker. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97. In some embodiments, the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
In some embodiments, the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed. In some embodiments, the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker. In some embodiments, the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein. In some embodiments, the NTD corresponds to amino acids 1-290 of the full-length SARS-CoV-2 S protein, and/or the RBD corresponds to amino acids 316-517 of the full-length SARS-CoV-2 S protein. In some embodiments, the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein. In some embodiments, the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein. In some embodiments, the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87. In some embodiments, the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98. In some embodiments, two or more of the S protein portion, the N protein portion, and the transmembrane portion are separated by a linker. In some embodiments, the S protein portion and the N protein portion are separated by a first linker and/or the N protein portion and the transmembrane portion are separated by a second linker. 
In some embodiments, the S protein portion and the transmembrane portion are separated by a first linker, and/or the transmembrane portion and the N protein portion are separated by a second linker. In some embodiments, each of the first and second linkers is a glycine or a glycine-serine linker. In some embodiments, each of the first and second linkers comprises the amino acid sequence AAY. Some aspects relate to a composition comprising a lipid nanoparticle and a messenger ribonucleic acid comprising an open reading frame (ORF) encoding a protein comprising an amino acid sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183. In some embodiments, the mRNA comprises a 5′ untranslated region (UTR), wherein the 5′ UTR comprises a nucleotide sequence with at least 90% sequence identity to a nucleotide sequence selected from SEQ ID NOs: 1, 2, 5–35, 66, 70–72, 75, 76, and 81. In some embodiments, the 5′ UTR comprises a nucleotide sequence selected from SEQ ID NOs: 1, 2, 5–35, 66, 70–72, 75, 76, and 81. In some embodiments, the mRNA comprises a 3′ untranslated region (UTR), wherein the 3′ UTR comprises a nucleotide sequence with at least 90% sequence identity to a nucleotide sequence selected from SEQ ID NOs: 3–4, 36–44, 68, 69, 73, 74, 77–79, and 82. In some embodiments, the 3′ UTR comprises a nucleotide sequence selected from SEQ ID NOs: 3–4, 36–44, 68, 69, 73, 74, 77–79, and 82. In some embodiments, the mRNA comprises one or more stop codons immediately downstream from the open reading frame. In some embodiments, the one or more stop codons comprise the nucleotide sequence UGAUGA. In some embodiments, the one or more stop codons comprise the nucleotide sequence UGAUAAUAG. In some embodiments, the mRNA comprises a polyadenosine (polyA) sequence comprising 20 or more consecutive adenosine nucleotides.
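The stop codon sequences (UGAUGA, UGAUAAUAG) and the polyA criterion (20 or more consecutive adenosine nucleotides) recited above lend themselves to a simple check. The following Python sketch is illustrative only; the function names `split_stop_codons` and `longest_a_run` and the toy tail string are hypothetical and not taken from this disclosure.

```python
import re

# The three standard stop codons in RNA notation.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_stop_codons(seq: str) -> list[str]:
    """Split an RNA string into codons and confirm each one is a stop codon."""
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    bad = [c for c in codons if c not in STOP_CODONS]
    if bad:
        raise ValueError(f"not stop codons: {bad}")
    return codons


def longest_a_run(seq: str) -> int:
    """Length of the longest run of consecutive adenosine nucleotides."""
    return max((len(run) for run in re.findall(r"A+", seq)), default=0)


print(split_stop_codons("UGAUGA"))      # two tandem stop codons
print(split_stop_codons("UGAUAAUAG"))   # three different stop codons

toy_tail = "GC" + "A" * 100             # hypothetical 3' end, for illustration
print(longest_a_run(toy_tail) >= 20)    # satisfies the 20-or-more criterion
```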
In some embodiments, the polyA sequence comprises 100 consecutive adenosine nucleotides. In some embodiments, the polyA sequence comprises, in 5′-to-3′ order, a first nucleotide sequence comprising 30 consecutive adenosine nucleotides, an intervening sequence comprising no more than three adenosine nucleotides, and a second nucleotide sequence comprising 70 consecutive adenosine nucleotides. In some embodiments, the polyA sequence comprises the nucleotide sequence of SEQ ID NO: 80. In some embodiments, the mRNA further comprises a polycytidine (polyC) sequence comprising 20 or more consecutive cytidine nucleotides. In some embodiments, the polyC sequence comprises 30 consecutive cytidine nucleotides. In some embodiments, the polyC sequence is downstream from the polyA sequence, wherein the polyA sequence comprises 64 consecutive adenosine nucleotides. In some embodiments, the polyA sequence comprises 109 consecutive adenosine nucleotides. In some embodiments, the mRNA comprises a 5’ cap analog. In some embodiments, the 5’ cap analog comprises a 7mG(5’)ppp(5’)NlmpNp cap. In some embodiments, the lipid nanoparticle comprises 40-55 mol% ionizable amino lipid, 30-45 mol% sterol, 5-15 mol% neutral lipid, and 1-5 mol% PEG-modified lipid. In some embodiments, the ionizable amino lipid comprises a compound of Formula (I): R1 is R”M’R’ or C5-20 alkenyl; R2 and R3 are each independently selected from C1-14 alkyl and C2-14 alkenyl; R4 is (CH2)nQ, wherein Q is OH and n is selected from 3, 4, and 5; M and M’ are each independently -OC(O)- or -C(O)O-; R5, R6, and R7 are each H; R’ is a linear C1-12 alkyl, or C1-12 alkyl substituted with C6-9 alkyl; R” is C3-14 alkyl; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13. In some embodiments, the ionizable amino lipid comprises Compound 1:
[Chemical structure of Compound 1; image not reproduced]
(Compound 1). In some embodiments, the ionizable amino lipid comprises a compound of the structure A7:
[Chemical structure of A7; image not reproduced]
(A7). In some embodiments, the neutral lipid is 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC). In some embodiments, the sterol is cholesterol. In some embodiments, the PEG-modified lipid is PEG2000-DMG. In some embodiments, the open reading frame comprises one or more chemically modified nucleotides. In some embodiments, the open reading frame comprises N1-methylpseudouridine. In some embodiments, at least 80% of uracil nucleotides in the open reading frame comprise N1-methylpseudouridine. In some embodiments, 100% of uracil nucleotides in the open reading frame comprise N1-methylpseudouridine. In some embodiments, the open reading frame comprises 5-methylcytidine. In some embodiments, at least 80% of cytosine nucleotides in the open reading frame comprise 5-methylcytidine. In some embodiments, 100% of cytosine nucleotides in the open reading frame comprise 5-methylcytidine. In some embodiments, the open reading frame comprises 5-methyluridine. In some embodiments, at least 80% of uracil nucleotides in the open reading frame comprise 5-methyluridine. In some embodiments, 100% of uracil nucleotides in the open reading frame comprise 5-methyluridine. Some aspects relate to a pharmaceutical composition comprising an mRNA and a pharmaceutically acceptable excipient. Some aspects relate to a method comprising administering to a subject a composition. In some embodiments, the composition is administered intramuscularly. In some embodiments, the composition is effective to induce, in the subject, CD4+ and/or CD8+ T cells specific to one or more epitopes of the protein. In some embodiments, the method comprises administering a first dose and a second dose of the composition. Some aspects relate to a SARS-CoV-2 chimeric protein comprising a SARS-CoV-2 N protein portion; a SARS-CoV-2 NSP3 protein portion; and a SARS-CoV-2 M protein portion comprising one or more transmembrane domains.
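The lipid nanoparticle molar ranges recited above (40-55 mol% ionizable amino lipid, 30-45 mol% sterol, 5-15 mol% neutral lipid, and 1-5 mol% PEG-modified lipid) can be checked mechanically. The Python sketch below is a hypothetical helper, not part of this disclosure: the function name, the dictionary keys, the example values, and the additional check that the four components sum to 100 mol% are all assumptions made for illustration.

```python
# Recited mol% ranges (inclusive) for each lipid component.
RANGES = {
    "ionizable_amino_lipid": (40.0, 55.0),
    "sterol": (30.0, 45.0),
    "neutral_lipid": (5.0, 15.0),
    "peg_modified_lipid": (1.0, 5.0),
}


def check_composition(mol_percent: dict[str, float], tol: float = 1e-6) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = mol_percent.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} mol% outside {low}-{high} mol%")
    # Assumption: the four components are expected to account for all lipid.
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        problems.append(f"total is {total} mol%, expected 100 mol%")
    return problems


# Hypothetical example values chosen only to fall inside the recited ranges.
example = {
    "ionizable_amino_lipid": 50.0,
    "sterol": 38.5,
    "neutral_lipid": 10.0,
    "peg_modified_lipid": 1.5,
}
print(check_composition(example))
```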
In some embodiments, the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein and a C-terminal domain of the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker. In some embodiments, the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the SARS-CoV-2 N protein. In some embodiments, the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84. In some embodiments, the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length wild-type SARS-CoV-2 NSP3 protein.
In some embodiments, one or more junctional epitopes present in a concatenated amino acid sequence consisting of two or more CD8+ T cell epitopes are not present in the SARS-CoV-2 NSP3 protein portion. In some embodiments, the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length wild-type SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the β-sheet domain is connected to one or more transmembrane domains by a linker. In some embodiments, the linker is a glycine or glycine-serine linker. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97. In some embodiments, the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
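The notion of junctional epitopes mentioned above, that is, peptides created where two concatenated CD8+ T cell epitopes meet, can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the method used in this disclosure: the function name `junctional_kmers`, the default window length `k = 9`, and the toy epitope strings are hypothetical, and the sketch only enumerates junction-spanning peptides without predicting whether any of them would actually be presented or recognized.

```python
def junctional_kmers(epitopes: list[str], k: int = 9) -> set[str]:
    """Return every k-mer of the concatenated epitopes that spans a junction."""
    concatenated = "".join(epitopes)
    # Positions in the concatenated string where one epitope ends and the next begins.
    boundaries = []
    position = 0
    for epitope in epitopes[:-1]:
        position += len(epitope)
        boundaries.append(position)
    spanning = set()
    for start in range(len(concatenated) - k + 1):
        end = start + k
        # A window spans a junction if a boundary falls strictly inside it.
        if any(start < b < end for b in boundaries):
            spanning.add(concatenated[start:end])
    return spanning


# Toy strings (not real epitopes) showing that reordering changes the junction peptides.
print(sorted(junctional_kmers(["AAAA", "CCCC"], k=3)))
print(sorted(junctional_kmers(["CCCC", "AAAA"], k=3)))
```

Comparing the junction-spanning peptides of alternative epitope orders, or of orders with linkers between epitopes, is one way such a set could be screened before choosing an arrangement.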
In some embodiments, two or more of the N protein portion, the NSP3 protein portion, and the M protein portion are separated by a linker. In some embodiments, the N protein portion and the NSP3 protein portion are separated by a first linker, and/or the NSP3 protein portion and the M protein portion are separated by a second linker. In some embodiments, the N protein portion and the M protein portion are separated by a first linker, and/or the M protein portion and the NSP3 protein portion are separated by a second linker. In some embodiments, the M protein portion and the N protein portion are separated by a first linker, and/or the N protein portion and the NSP3 protein portion are separated by a second linker. In some embodiments, each of the first and second linkers is a glycine or glycine-serine linker. In some embodiments, each of the first and second linkers comprises the amino acid sequence AAY. In some embodiments, the SARS-CoV-2 chimeric protein further comprises a signal peptide. In some embodiments, the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide. Some aspects relate to a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 S protein portion; a SARS-CoV-2 N protein portion; and a transmembrane portion comprising a transmembrane domain. In some embodiments, the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a SARS-CoV-2 N protein and a C-terminal domain of the SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of a full-length SARS-CoV-2 N protein.
In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine or glycine-serine linker. In some embodiments, the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the full-length wild-type SARS-CoV-2 N protein. In some embodiments, the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84. In some embodiments, the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92. In some embodiments, the transmembrane portion comprises an influenza HA transmembrane domain. In some embodiments, the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length wild-type SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the β-sheet domain is connected to one or more transmembrane domains by a linker. In some embodiments, the linker is a glycine or a glycine-serine linker. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97. In some embodiments, the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86. In some embodiments, the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed. In some embodiments, the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker. In some embodiments, the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein. In some embodiments, the NTD corresponds to amino acids 1-290 of the full-length SARS-CoV-2 S protein, and/or the RBD corresponds to amino acids 316-517 of the full-length SARS-CoV-2 S protein. In some embodiments, the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein. 
In some embodiments, the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein. In some embodiments, the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87. In some embodiments, the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98. In some embodiments, two or more of the S protein portion, the N protein portion, and the transmembrane portion are separated by a linker. In some embodiments, the S protein portion and the N protein portion are separated by a first linker and/or the N protein portion and the transmembrane portion are separated by a second linker. In some embodiments, the S protein portion and the transmembrane portion are separated by a first linker, and/or the transmembrane portion and the N protein portion are separated by a second linker. In some embodiments, each of the first and second linkers is a glycine or a glycine-serine linker. In some embodiments, each of the first and second linkers comprises the amino acid sequence AAY. Some aspects relate to a SARS-CoV-2 chimeric protein comprising an amino acid sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183. In some embodiments, an mRNA comprising an ORF encodes any one of the proteins. In some embodiments, an mRNA comprises an ORF encoding the SARS-CoV-2 chimeric protein. In some embodiments, the mRNA comprises a chemical modification. In some embodiments, 100% of the uracil nucleotides of the mRNA comprise a chemical modification. In some embodiments, 100% of the uracil nucleotides of the mRNA comprise N1-methylpseudouridine.
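The sequence identity thresholds recited above (at least 90%, 95%, 97%, 98%, 99%, or 100%) presuppose a way of computing percent identity. The Python sketch below shows one simple convention, offered only as an illustration: it assumes the two sequences have already been aligned (gaps written as `-`) and divides the number of identical columns by the alignment length. Other conventions use a different denominator, and the definition that governs this disclosure may differ; the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumed convention: identical columns divided by alignment length,
    ignoring columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy sequences (not taken from any SEQ ID NO): one mismatch in ten positions.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity, identity >= 90.0)
```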
Some aspects relate to a composition comprising a self-amplifying RNA encoding a SARS-CoV-2 chimeric protein comprising a SARS-CoV-2 N protein portion; a SARS-CoV-2 NSP3 protein portion; and a SARS-CoV-2 M protein portion comprising one or more transmembrane domains. In some embodiments, the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein and a C-terminal domain of the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker. In some embodiments, the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the SARS-CoV-2 N protein. In some embodiments, the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84. In some embodiments, the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein. In some embodiments, the CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein. In some embodiments, one or more junctional epitopes present in a concatenated amino acid sequence consisting of two or more CD8+ T cell epitopes are not present in the SARS-CoV-2 NSP3 protein portion. In some embodiments, the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85. In some embodiments, the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the β-sheet domain is connected to one or more transmembrane domains by a linker.
In some embodiments, the linker is a glycine or glycine-serine linker. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97. In some embodiments, the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86. In some embodiments, two or more of the N protein portion, the NSP3 protein portion, and the M protein portion are separated by a linker. In some embodiments, the N protein portion and the NSP3 protein portion are separated by a first linker, and/or the NSP3 protein portion and the M protein portion are separated by a second linker. In some embodiments, the N protein portion and the M protein portion are separated by a first linker, and/or the M protein portion and the NSP3 protein portion are separated by a second linker. In some embodiments, the M protein portion and the N protein portion are separated by a first linker, and/or the N protein portion and the NSP3 protein portion are separated by a second linker. In some embodiments, each of the first and second linkers is a glycine or glycine-serine linker. In some embodiments, each of the first and second linkers comprises the amino acid sequence AAY. In some embodiments, the SARS-CoV-2 chimeric protein further comprises a signal peptide. In some embodiments, the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide. Some aspects relate to a composition comprising a self-amplifying RNA encoding a SARS-CoV-2 chimeric protein comprising: a SARS-CoV-2 S protein portion; a SARS-CoV-2 N protein portion; and a transmembrane portion comprising a transmembrane domain. In some embodiments, the SARS-CoV-2 N protein portion comprises a truncated or modified N-terminal domain of a SARS-CoV-2 N protein and a C-terminal domain of the SARS-CoV-2 N protein. 
In some embodiments, the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104-143 of a full-length SARS-CoV-2 N protein. In some embodiments, the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43-87 of the full-length SARS-CoV-2 N protein. In some embodiments, the first and second N-terminal domain amino acid sequences are connected by a linker. In some embodiments, the linker is a glycine or glycine-serine linker. In some embodiments, the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213-366 of the full-length SARS-CoV-2 N protein. In some embodiments, the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84. In some embodiments, the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92. In some embodiments, the transmembrane portion comprises an influenza HA transmembrane domain. In some embodiments, the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. 
In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96. In some embodiments, the SARS-CoV-2 M protein portion comprises, in N-to-C terminal order, a β-sheet domain of a full-length SARS-CoV-2 M protein, and one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the β-sheet domain is connected to one or more transmembrane domains by a linker. In some embodiments, the linker is a glycine or a glycine-serine linker. In some embodiments, the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97. In some embodiments, the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86. In some embodiments, the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell. In some embodiments, the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed. In some embodiments, the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker. In some embodiments, the linker is a glycine linker or a glycine-serine linker. In some embodiments, the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein. In some embodiments, the NTD corresponds to amino acids 1-290 of the full-length SARS-CoV-2 S protein, and/or the RBD corresponds to amino acids 316-517 of the full-length SARS-CoV-2 S protein. 
In some embodiments, the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein. In some embodiments, the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein. In some embodiments, the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87. In some embodiments, the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98. In some embodiments, two or more of the S protein portion, the N protein portion, and the transmembrane portion are separated by a linker. In some embodiments, the S protein portion and the N protein portion are separated by a first linker and/or the N protein portion and the transmembrane portion are separated by a second linker. In some embodiments, the S protein portion and the transmembrane portion are separated by a first linker, and/or the transmembrane portion and the N protein portion are separated by a second linker. In some embodiments, each of the first and second linkers is a glycine or a glycine-serine linker. In some embodiments, each of the first and second linkers comprises the amino acid sequence AAY.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG.1 is a schematic of T cell composite vaccine designs. N/M/Nsp3 composite vaccine designs are shown on top, either with or without a signal peptide (SP), with two possible compositions of nucleocapsid (N) protein sequences, six non-structural protein 3 (Nsp3) epitopes, and either a full-length or truncated membrane (M) protein sequence. Segments of this composite are either not linked or are linked with an AAY linker (cleavable) or a GGSGG (SEQ ID NO: 99) linker (non-cleavable). 
NTD-RBD/N/M composite vaccine designs are shown on the bottom, with a spike (S) protein N terminal domain (NTD) and receptor binding domain (RBD), two possible compositions of N protein sequences, and three possible compositions of M protein sequences (HA-TM, full-length M protein, or truncated M protein). Segments of this composite are linked with a GGSGG (SEQ ID NO: 99) linker. FIG.2 represents the prediction methods underlying the design of the N protein antigens used in the vaccine composites shown in FIG.1. Shown in the bottom line graph is the N protein sequence with areas of immunogenicity; the top line graph illustrates the conservation of the N protein sequence across sarbecoviruses. Graphs in between represent the coverage of CD4+ and CD8+ TCRs, indicating immunogenic epitopes within the N protein sequence. FIG.3A-3C relate to the sequence and structure of the N protein. FIG.3A is a schematic of the regions of the N protein, including the N-terminal arm (N-arm); the N-terminal domain (NTD), which contains the RNA-binding sequence; the linker region (LKR) containing serine and arginine-rich (SR) motifs; the C-terminal domain (CTD), which contains a sequence associated with RNA-binding and oligomerization; and the C-terminal tail (C-tail). FIG.3B shows the protein structures of the RNA-binding residues of the N protein. The circle in the structure on the right highlights the N protein basic loop, which contains at least 7 residues that are removed in the N protein compositions included in the vaccine designs. FIG.3C iterates the designs of the N protein sequences used in the vaccine compositions. The design on the top contains residues 104-143 and 213-366 of the N protein, while the design on the bottom also includes residues 43-87 and a linker, preserving most of the NTD domain. FIG.4 represents the prediction methods underlying the design of the M protein antigens used in the vaccine composites shown in FIG.1. 
Shown in the bottom line graph is the M protein sequence with areas of immunogenicity; the top line graph illustrates the conservation of the M protein sequence across sarbecoviruses. Graphs in between represent the coverage of CD4+ and CD8+ TCRs, indicating immunogenic epitopes within the M protein sequence.
FIG.5 iterates designs of M protein sequences used in vaccine compositions. The design on the top is the full-length M protein, while the design on the bottom is a truncated M protein containing the β-sheet domain and residues 6-104.
FIG.6A-6B illustrate the junctional epitopes that arise from the concatenation of Nsp3 epitopes in two different configurations. FIG.6A includes a schematic of the concatenation of Nsp3 epitopes in the order of epitopes 1-6 (amino acid sequences for each epitope are shown below the epitope names) and the resulting junctional epitopes upon various epitope pairings. The world population coverage of HLAs that are capable of presenting the antigens of the junctional epitopes is shown as a percent in the “World Pop. Cov.” columns. The bolded and underlined junctional epitope that results from the combination of Nsp3-E3 and Nsp3-E4 is similarly matched to human proteins. FIG.6B shows the minimization of junctional epitopes that result from a concatenation of Nsp3 epitopes in a different configuration: Nsp3-E6---Nsp3-E2---Nsp3-E1---Nsp3-E4---Nsp3-E5---Nsp3-E3. Junctional epitopes result from three of these pairings, the sequences for which are shown in “MHC-I Junctional Epitopes.” The world population coverage of HLAs that are capable of presenting the resultant antigens is shown in the “World Pop. Cov.” column.
DETAILED DESCRIPTION
SARS-CoV-2
The genome of SARS-CoV-2 is a single-stranded positive-sense RNA (+ssRNA) with a size of 29.8–30 kb encoding about 9860 amino acids (Chan et al. 2000, supra; Kim et al. 2020 Cell, May 14; 181(4):914-921.e10.). 
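Returning to the junctional epitopes described for FIG.6A-6B, the idea can be sketched in a few lines of Python. This is a minimal illustration and not the prediction method used to generate the figures: the epitope strings are toy placeholders rather than Nsp3 sequences, the window length of 9 residues is an assumption, and `is_flagged` is a hypothetical stand-in for whatever MHC-I binding predictor would be used in practice.

```python
from itertools import permutations


def junctional_peptides(epitopes, k=9):
    """Return every k-residue window of the concatenated epitopes that
    crosses at least one epitope-epitope junction."""
    concat = "".join(epitopes)
    # Positions (0-based) at which one epitope ends and the next begins.
    boundaries = []
    pos = 0
    for epitope in epitopes[:-1]:
        pos += len(epitope)
        boundaries.append(pos)
    peptides = []
    for start in range(len(concat) - k + 1):
        end = start + k
        # A window crosses a junction if the junction lies strictly inside it.
        if any(start < b < end for b in boundaries):
            peptides.append(concat[start:end])
    return peptides


def best_order(epitopes, is_flagged, k=9):
    """Brute-force the epitope order with the fewest flagged junctional peptides.

    is_flagged is a hypothetical predicate (peptide -> bool); a real workflow
    would call a binding predictor here.
    """
    def cost(order):
        return sum(1 for p in junctional_peptides(order, k) if is_flagged(p))
    return min(permutations(epitopes), key=cost)


# Toy placeholders, not sequences from the specification.
toy_epitopes = ["AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD"]
toy_flagged = {"AAAAAAAAC"}  # pretend this junctional peptide is undesirable

pair_peptides = junctional_peptides(toy_epitopes[:2])
chosen = best_order(toy_epitopes, lambda p: p in toy_flagged)
```

With six epitopes there are only 720 possible orders, so exhaustive search is practical; a longer epitope string would call for a heuristic instead.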
The SARS-CoV-2 genome is organized into specific genes encoding structural proteins and nonstructural proteins (Nsps). The order of the structural proteins in the genome is 5′-replicase (open reading frame (ORF)1/ab)-structural proteins [Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)]-3′. The genome of coronaviruses includes a variable number of open reading frames that encode accessory proteins, nonstructural proteins, and structural proteins (Song et al.2019 Viruses;11(1):p.59). Most of the antigenic peptides are located in the structural proteins (Cui et al.2019 Nat. Rev. Microbiol., 17(3):181–192). Spike surface glycoprotein (S), a small envelope protein (E), matrix protein (M), and nucleocapsid protein (N) are four main structural proteins. Since S-protein contributes to cell tropism and virus entry and also induces neutralizing antibodies (NAb) and protective immunity, it can be considered one of the most important targets in coronavirus vaccine development among all other structural proteins. Moreover, amino acid sequence analysis has shown that S-protein contains conserved regions among the coronaviruses, which may be the basis for universal vaccine development. SARS-CoV-2 Proteins Some aspects relate to compositions comprising nucleic acids (e.g., mRNAs) encoding proteins of interest, e.g., a protein derived from one or more betacoronavirus proteins such as a SARS-CoV-2 nucleocapsid (N), matrix (M), non-structural protein 3 (nsp3), and/or spike (S) protein. Such compositions do not comprise antigens per se, but rather comprise nucleic acids, in particular, mRNA(s) that encode antigens or antigenic sequences once delivered to a cell, tissue or subject. 
Delivery of nucleic acids, in particular mRNA(s), is achieved by inclusion of nucleic acids in appropriate carriers or delivery vehicles (e.g., lipid nanoparticles) such that upon administration to cells, tissues or subjects, nucleic acid is taken up by cells which, in turn, express protein(s) encoded by the nucleic acids, e.g., mRNAs. Antigens, as used herein, are proteins capable of inducing an immune response (e.g., causing an immune system to produce antibodies against the antigens). The vaccines provide a unique advantage over traditional protein-based vaccination approaches, in which protein antigens are purified or produced in vitro, e.g., recombinant protein production technologies. The vaccines feature RNA (e.g., mRNA) encoding the desired antigens, which when introduced into the body, i.e., administered to a mammalian subject (for example a human) in vivo, cause the cells of the body to express the desired antigens. To facilitate delivery of the mRNAs to the cells of the body, the mRNAs are encapsulated in lipid nanoparticles (LNPs). Upon delivery and uptake by cells of the body, the mRNAs are translated in the cytosol and protein antigens are generated by the host cell machinery. The protein antigens are presented and elicit an adaptive humoral and cellular immune response. Neutralizing antibodies are directed against the expressed protein antigens and hence the protein antigens are considered relevant target antigens for vaccine development. Herein, use of the term “antigen” encompasses immunogenic proteins and immunogenic fragments (an immunogenic fragment that induces (or is capable of inducing) an immune response to a (at least one) SARS-CoV-2 variant), unless otherwise stated. It should be understood that the term “protein” encompasses peptides and the term “antigen” encompasses antigenic fragments. 
Other molecules may be antigenic such as bacterial polysaccharides or combinations of protein and polysaccharide structures, but for the viral vaccines included herein, viral proteins, fragments of viral proteins and designed and/or mutated proteins derived from SARS-CoV-2 are the antigens. Many proteins have a quaternary or three-dimensional structure, which consists of more than one polypeptide or several polypeptide chains that associate into an oligomeric molecule. As used herein the term “subunit” refers to a single protein molecule, for example, a polypeptide or polypeptide chain resulting from processing of a nascent protein molecule, which subunit assembles (or “coassembles”) with other protein molecules (e.g., subunits or chains) to form a protein complex. Proteins can have a relatively small number of subunits and therefore be described as “oligomeric” or can consist of a large number of subunits and therefore be described as “multimeric”. The subunits of an oligomeric or multimeric protein may be identical, homologous or totally dissimilar and dedicated to disparate tasks. Proteins or protein subunits can further comprise domains. As used herein, the term “domain” refers to a distinct functional and/or structural unit within a protein. Typically, a “domain” is responsible for a particular function or interaction, contributing to the overall role of a protein. Domains can exist in a variety of biological contexts. Similar domains (i.e., domains sharing structural, functional and/or sequence homology) can exist within a single protein or can exist within distinct proteins having similar or different functions. A protein domain is often a conserved part of a given protein tertiary structure or sequence that can function and exist independently of the rest of the protein or subunit thereof. 
In structural and molecular biology, identical, homologous or similar subunits or domains can help to classify newly identified or novel proteins, as was done immediately upon publication of the SARS-CoV-2 viral genomic sequence. As used herein, the term antigen is distinct from the term “epitope” which is a substructure of an antigen, e.g., a polypeptide, such as 7-10 amino acids, or carbohydrate structure, which may be recognized by an antigen binding site. The art describes protein antigens that are delivered to subjects or immune cells in isolated form, e.g., isolated protein, polypeptide or peptide antigens, however, the design, testing, validation, and production of protein antigens can be costly and time-consuming, especially when producing proteins at large scale. By contrast, mRNA technology is amenable to rapid design and testing of mRNA constructs encoding a variety of antigens. Moreover, rapid production of mRNA coupled with inclusion in appropriate delivery vehicles (e.g., lipid nanoparticles), can proceed quickly and can rapidly produce mRNA vaccines at large scale. Potential benefit also arises from the fact that antigens encoded by the mRNAs are expressed by the cells of the subject, e.g., are expressed by the human body, and thus the subject, e.g., the human body, serves as the “factory” to produce the antigens which, in turn, elicit the desired immune response. The compositions may include an RNA or multiple RNAs encoding two or more antigens of the same or different viral strains. Vaccines may be combination vaccines that include RNA encoding one or more coronavirus antigens and one or more antigen(s) of a different organism. 
Thus, the vaccines may be combination vaccines that target one or more antigens of the same strain/species, or one or more antigens of different strains/species, e.g., antigens which induce immunity to organisms which are found in the same geographic areas where the risk of coronavirus infection is high or organisms to which an individual is likely to be exposed to when exposed to a coronavirus (e.g., COVID-19). In some embodiments, the second or subsequent circulating SARS-CoV-2 antigen is an immunodominant antigen from an emerging strain. An immunodominant antigen of an emerging strain is assessed with respect to the strain from which the antigen is derived, relative to a different strain of the virus, such as the original strain or other variant thereof. An immunodominant antigen of the emerging strain induces a stronger immune response against the emerging strain than against the different strain. In some embodiments, an immunodominant antigen of the emerging strain is more infective than a different strain of the virus, such as the original strain or other variant thereof. Nucleocapsid (N) Proteins and Portions Some aspects relate to proteins comprising a nucleocapsid (N) portion and/or RNAs encoding the same. The nucleocapsid of the SARS-CoV-2 virus plays an essential role in its replication and assembly and is highly conserved among SARS-CoV-2 variants. It contains the N protein, which protein forms oligomers and assembles around the viral RNA in a helical arrangement, resulting in the formation of the viral ribonucleoprotein (vRNP) complex. This complex is essential for protecting the viral genome and facilitating its packaging into new viral particles during viral assembly. Structurally, the N protein includes two domains: the N-terminal domain (NTD) and the C-terminal domain (CTD), connected by a linking region. 
The NTD is involved in RNA binding and oligomerization, while the CTD is responsible for protein-protein interactions, including interactions with other viral proteins and host factors. The N protein is predominantly intracellular but can be released from cells during active viral replication following cell lysis or by active secretion mechanisms. The N protein is highly immunogenic, containing many epitopes, including CD8+ and CD4+ T cell epitopes, a large number of which are conserved between emerging variants and ancestral lineages. In some embodiments, an mRNA encodes a protein comprising a SARS-CoV-2 nucleocapsid (N) protein portion. In some embodiments, a protein comprises a SARS-CoV-2 nucleocapsid (N) protein portion. N protein portions may contain a truncation or deletion of one or more regions, relative to a full-length or naturally occurring N protein, that are sparse in CD4+ and/or CD8+ T cell epitopes, thereby increasing the epitope density of the N protein portion compared to a full-length N protein. In some embodiments, an N protein portion comprises a higher density of CD4+ T cell epitopes than a wild-type SARS-CoV-2 N protein. In some embodiments, an N protein portion comprises a higher density of CD8+ T cell epitopes than a wild-type SARS-CoV-2 N protein. N protein portions may also be modified to remove or disrupt a functional region of full-length N protein to improve safety or immunogenicity. For example, an N protein portion may lack one or more amino acids of an RNA-binding domain. In some embodiments, the N protein portion has a truncation in, or a deletion of, a basic loop of an RNA-binding domain. For example, a SARS-CoV-2 N protein having the amino acid sequence of SEQ ID NO: 84 comprises a basic loop at amino acids 88–103. Thus, in SEQ ID NO: 84, a basic loop corresponds to amino acids 88–103, and so an N protein portion lacking or having a truncated basic loop lacks one or more amino acids corresponding to amino acids 88–103 of SEQ ID NO: 84. 
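The notion of epitope density used above can be made concrete with a short sketch. It assumes one simple definition, the number of epitopes fully contained in the retained residues divided by the number of retained residues; the specification may intend a different measure. The epitope coordinates below are hypothetical and chosen only for illustration, and the full length of 419 residues is inferred from the 367–419 C-terminal range given in the text.

```python
def span_length(ranges):
    """Total residues covered by 1-based, inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def epitope_density(epitopes, kept_ranges):
    """Epitopes fully contained in the kept ranges, per retained residue."""
    retained = [
        (s, e) for s, e in epitopes
        if any(ks <= s and e <= ke for ks, ke in kept_ranges)
    ]
    return len(retained) / span_length(kept_ranges)


BASIC_LOOP = (88, 103)                      # range given in the text
FULL_LENGTH = [(1, 419)]                    # assumed full-length N numbering
TRUNCATED = [(104, 143), (213, 366)]        # residue ranges named in the text

# Hypothetical epitope positions in full-length numbering (illustration only).
HYPOTHETICAL_EPITOPES = [
    (10, 18), (105, 113), (130, 138), (220, 228), (300, 308), (320, 328),
]

loop_residues = span_length([BASIC_LOOP])   # 103 - 88 + 1
full_density = epitope_density(HYPOTHETICAL_EPITOPES, FULL_LENGTH)
truncated_density = epitope_density(HYPOTHETICAL_EPITOPES, TRUNCATED)
```

Under these made-up coordinates, dropping epitope-sparse regions loses one epitope but more than halves the length, so the density rises. The same arithmetic shows that the basic loop at 88–103 spans 16 residues.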
Some portions of SARS-CoV-2 N protein that are shortened or removed by truncation are located internally on a wild-type SARS-CoV-2 N protein, and so truncations in these portions are internal truncations. A SARS-CoV-2 N protein portion comprising an internal truncation lacks one or more amino acids that are present in a full-length N protein sequence, but comprises one or more amino acids that flank the deleted amino acid(s) in the full-length N protein sequence. In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present between the N-terminal amino acid and C-terminal amino acid of a full-length N amino acid sequence. In some embodiments, a modified N protein portion lacks 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 amino acids corresponding to amino acids 88–103 of SEQ ID NO: 84. A SARS-CoV-2 N protein portion may comprise a truncated N-terminus. In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the N-terminus of a full-length N protein amino acid sequence. 
In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 101, 102, or 103 amino acids, which is present at the N-terminus of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–103 of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–103 of SEQ ID NO: 84. In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 10, 20, 30, 40, 41, or 42 amino acids, which is present at the N-terminus of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–42 of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 1–42 of SEQ ID NO: 84. A SARS-CoV-2 N protein portion may comprise a truncated C-terminus. In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the C-terminus of a full-length N protein amino acid sequence. In some embodiments, a modified N protein portion lacks an amino acid sequence comprising 10, 20, 30, 40, 50, 51, 52, or 53 amino acids, which is present at the C-terminus of a full-length N protein. In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 367–419 of a full-length N protein. 
In some embodiments, a modified N protein portion lacks an amino acid sequence corresponding to amino acids 367–419 of SEQ ID NO: 84. A SARS-CoV-2 N protein portion may comprise two or more amino acid sequences derived from a full-length N protein. The amino acid sequences of the full-length N protein may be derived from the same N protein, or different N proteins (e.g., N proteins of different SARS-CoV-2 lineages). Any pair of the two or more amino acid sequences may be contiguous in the N protein portion (e.g., without an intervening amino acid sequence), or the two amino acid sequences may be separated by a linker. Where multiple linkers separate multiple pairs of amino acid sequences, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY). Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS). In some embodiments, an N protein portion comprises an N-terminal domain of a SARS-CoV-2 N protein. In some embodiments, the N protein portion comprises a truncated N-terminal domain of a SARS-CoV-2 N protein. In some embodiments, the N protein portion lacks a basic loop corresponding to amino acids 88–103 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence with at least 90% identity to amino acids 104–143 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence corresponding to amino acids 104–143 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence with at least 90% identity to amino acids 43–87 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence corresponding to amino acids 43–87 of SEQ ID NO: 84. In some embodiments, two or more amino acid sequences of an N protein portion are connected by GGSGG (SEQ ID NO: 99). 
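As an illustration of how residue ranges and linkers of this kind combine, the following Python sketch assembles two N protein portion layouts from a full-length sequence and computes a simple percent identity. Several things here are assumptions: `PLACEHOLDER_N` is a synthetic 419-residue string, not SEQ ID NO: 84; the order 43–87, linker, 104–143, 213–366 is one plausible reading of the second design; and `percent_identity` uses a plain matches-over-alignment-length definition, which may differ from the definition of identity applied elsewhere in the specification.

```python
def segment(seq, start, end):
    """Residues start..end of seq, 1-based and inclusive, as in the text."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} outside a {len(seq)}-residue sequence")
    return seq[start - 1:end]


def assemble(full_length, parts):
    """Join parts in order; a tuple is a residue range, a str is a literal linker."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)
        else:
            pieces.append(segment(full_length, *part))
    return "".join(pieces)


def percent_identity(aligned_a, aligned_b):
    """Identical columns over alignment length, for two pre-aligned sequences.

    Gap characters ('-') count as mismatches. This is an assumed definition.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and already aligned")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Synthetic placeholder, NOT the sequence of SEQ ID NO: 84.
PLACEHOLDER_N = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(419))
LINKER = "GGSGG"

# Layout without residues 43-87, and layout that adds them via a linker.
design_short = assemble(PLACEHOLDER_N, [(104, 143), (213, 366)])
design_long = assemble(PLACEHOLDER_N, [(43, 87), LINKER, (104, 143), (213, 366)])

# One substitution in a 40-residue segment, to exercise the identity threshold.
reference = segment(PLACEHOLDER_N, 104, 143)
substitute = "A" if reference[0] != "A" else "C"
variant = substitute + reference[1:]
identity = percent_identity(reference, variant)
```

A segment of 40 residues tolerates at most four substitutions before falling below a 90% threshold under this definition; real comparisons would first align the sequences with a dedicated tool.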
In some embodiments, the N protein portion comprises a C-terminal domain of a SARS-CoV-2 N protein. In some embodiments, the C-terminal domain comprises an amino acid sequence 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, or 110 amino acids in length. In some embodiments, the C-terminal domain comprises an amino acid sequence corresponding to amino acids 255–364 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence with at least 90% identity to amino acids 213–366 of SEQ ID NO: 84. In some embodiments, the N protein portion comprises an amino acid sequence corresponding to amino acids 213–366 of SEQ ID NO: 84. Matrix (M) Proteins and Portions Some aspects relate to proteins comprising a matrix (M) portion and/or RNAs encoding the same. The SARS-CoV-2 M protein is an integral membrane protein that plays a crucial role in the assembly, budding, and maturation of viral particles. The M protein contributes to maintenance of virion structural integrity via stabilization of the lipid bilayer. It also participates in intracellular trafficking and localization of viral components by interacting with cellular transport proteins to direct viral proteins to the sites of viral assembly. Structurally, the M protein is a transmembrane protein of 222-229 amino acids, depending on the variant. It has three main regions: the N-terminal domain (NTD), the transmembrane domains (TMD), and the C-terminal domain (CTD). The NTD is the region of the M protein exposed on the cytoplasmic side of the viral envelope, where it interacts with the nucleocapsid protein. The TMD spans the viral membrane, anchoring the M protein within the viral envelope. It is a hydrophobic region that enables the protein’s integration into the lipid bilayer. The CTD is located on the extracellular side of the viral envelope. 
It interacts with other structural proteins, such as the S protein and envelope (E) protein, to maintain the integrity of the viral envelope. Though the M protein is generally considered less immunogenic than the S protein, multiple CD8+ and CD4+ T cell epitopes have been predicted to occur in the M protein, the amino acid sequence of which is highly conserved among SARS-CoV-2 variants. In some embodiments, an mRNA encodes a protein comprising a SARS-CoV-2 matrix (M) protein portion. In some embodiments, a protein comprises a SARS-CoV-2 matrix (M) protein portion. M protein portions may contain a truncation or deletion of one or more regions, relative to a full-length or naturally occurring M protein, that are sparse in CD4+ and/or CD8+ T cell epitopes, thereby increasing the epitope density of the M protein portion compared to a full-length M protein. In some embodiments, an M protein portion comprises a higher density of CD4+ T cell epitopes than a wild-type SARS-CoV-2 M protein. In some embodiments, an M protein portion comprises a higher density of CD8+ T cell epitopes than a wild-type SARS-CoV-2 M protein. M protein portions may also be modified to disrupt or remove functional regions of the M protein to improve safety or immunogenicity. For example, in some embodiments, an M protein lacks one or more glycosylation sites. Some portions of SARS-CoV-2 M proteins that are shortened or removed by truncation are located internally on a wild-type SARS-CoV-2 M protein, and so truncations in these portions are internal truncations. A SARS-CoV-2 M protein portion comprising an internal truncation lacks one or more amino acids that are present in a full-length M protein sequence, but comprises one or more amino acids that flank the deleted amino acid(s) in the full-length M protein sequence. 
In some embodiments, a modified M protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present between the N-terminal amino acid and C-terminal amino acid of a full-length M amino acid sequence. A SARS-CoV-2 M protein portion may comprise a truncated N-terminus. In some embodiments, a modified M protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the N-terminus of a full-length M protein amino acid sequence. In some embodiments, a modified M protein portion lacks an amino acid sequence comprising 1, 2, 3, 4, or 5 amino acids, which is present at the N-terminus of a full-length M protein. In some embodiments, a modified M protein portion lacks an amino acid sequence corresponding to amino acids 1–5 of a full-length M protein. The N-terminal amino acids 1–5 of a full-length M protein are involved in glycosylation of the full-length M protein, and so removal of one or more of these amino acids results in a less glycosylated M protein portion, relative to a full-length M protein. In some embodiments, a modified M protein portion lacks an amino acid sequence corresponding to amino acids 1–5 of SEQ ID NO: 86. 
A SARS-CoV-2 M protein portion may comprise a truncated C-terminus. In some embodiments, a modified M protein portion lacks an amino acid sequence comprising 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–200, 20–200, 30–200, 40–200, 50–200, 60–200, 70–200, 80–200, 90–200, 100–200, 110–200, 120–200, 130–200, 140–200, 150–200, 160–200, 170–200, 180–200, 190–200, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–200 amino acids that is present at the C-terminus of a full-length M protein amino acid sequence. In some embodiments, the modified M protein portion lacks one or more C-terminal amino acids of SEQ ID NO: 86. In some embodiments, an M protein portion comprises an amino acid sequence comprising 1–222, 1–220, 1–217, 1–215, 1–210, 1–200, 1–190, 1–180, 1–170, 1–160, 1–150, 1–140, 1–130, 1–120, 1–110, 1–100, 1–90, 1–80, 1–70, 1–60, 1–50, 1–40, 1–30, 1–25, 1–20, 1–10, or 1–5, 10–222, 20–222, 30–222, 40–222, 50–222, 60–222, 70–222, 80–222, 90–222, 100–222, 110–222, 120–222, 130–222, 140–222, 150–222, 160–222, 170–222, 180–222, 190–222, 200–222, 210–222, 217–222, 10–30, 30–50, 50–75, 75–100, 100–125, 125–150, 150–175, or 175–217 amino acids, which is present in a full-length M protein amino acid sequence. In some embodiments, the full-length M protein amino acid sequence is SEQ ID NO: 86. In some embodiments, an M protein portion comprises one or more transmembrane domains of a full-length SARS-CoV-2 M protein. In some embodiments, an M protein portion comprises 1, 2, or 3 transmembrane domains of a full-length SARS-CoV-2 M protein. Exemplary transmembrane domains of a full-length SARS-CoV-2 M protein are located at amino acids 19–100 of SEQ ID NO: 86, such that the N-terminal amino acids are present outside of a virus particle, and C-terminal amino acids are present inside a virus particle. 
In some embodiments, an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 19–100 of SEQ ID NO: 86. In some embodiments, an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 6–104 of SEQ ID NO: 86. In some embodiments, an M protein portion comprises a β-sheet domain. In a naturally occurring SARS-CoV-2 M protein, the β-sheet domain corresponding, e.g., to amino acids 118–222 of SEQ ID NO: 86, is C-terminal to the transmembrane domains of the M protein, such that the β-sheet domain is present inside a virus particle (virion), or inside the cytoplasm of an infected cell. In some embodiments, an M protein portion of a protein comprises a β-sheet domain that is C-terminal to one or more transmembrane domains of the M protein portion. Location of the β-sheet domain downstream from the one or more transmembrane domains allows the β-sheet domain to be localized in the cytoplasm when a protein having the M protein portion is expressed in a cell (e.g., by translation of an mRNA encoding the protein). In some embodiments, an M protein portion comprises the β-sheet domain of a full-length M protein, and the β-sheet domain is N-terminal to one or more transmembrane domains (e.g., M protein transmembrane domains). In some embodiments, the β-sheet domain is N-terminal to 1, 2, or 3 transmembrane domains of the full-length SARS-CoV-2 M protein. Rearrangement of the M protein domains in this manner, where the β-sheet domain is N-terminal to one or more transmembrane domains, allows the β-sheet domain to be located extracellularly when a protein comprising the M protein portion is expressed in a cell and embedded in the cell membrane. 
Without wishing to be bound by theory, it is expected that this localization of the β-sheet domain outside the cell membrane reduces interaction of the β-sheet domain with intracellular components, and also exposes the β-sheet domain for the generation of antibodies specific to the β-sheet domain. In some embodiments, an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 118–222 of SEQ ID NO: 86. In some embodiments, an M protein portion comprises an amino acid sequence that is at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to amino acids 105–222 of SEQ ID NO: 86. A SARS-CoV-2 M protein portion may comprise two or more amino acid sequences derived from a full-length M protein. The amino acid sequences of the full-length M protein may be derived from the same M protein, or different M proteins (e.g., M proteins of different SARS-CoV-2 lineages). Any pair of the two or more amino acid sequences may be contiguous in the M protein portion (e.g., without an intervening amino acid sequence), or the two amino acid sequences may be separated by a linker. Where multiple linkers separate multiple pairs of amino acid sequences, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY). Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS). In some embodiments, two or more amino acid sequences of an M protein portion are connected by GGSGG (SEQ ID NO: 99).
Non-structural protein 3 (Nsp3) Proteins and Portions
Some aspects relate to proteins comprising a non-structural protein 3 (Nsp3) portion and/or RNAs encoding the same. The SARS-CoV-2 Nsp3 protein is large and multifunctional. 
It is involved in several processes during the viral life cycle, including viral replication, host immune response modulation, and viral pathogenesis. Nsp3 plays a crucial role in the formation of the replication-transcription complex (RTC) and acts as a scaffold for various enzymatic activities. Structurally, the SARS-CoV-2 Nsp3 protein is a large, multidomain protein with a molecular weight of approximately 200 kDa. Some of the important domains within Nsp3 include the papain-like protease (PLpro) domain; the macrodomain (also known as ADP-ribose-1''-phosphatase); and the ubiquitin-like domain (Ubl1), which are involved in various enzymatic activities and protein-protein interactions. Nsp3 protein sequences are highly conserved among coronaviruses, including SARS-CoV-2. In addition to the functions described above, Nsp3 contributes to the virulence of SARS-CoV-2 via interaction with host factors, which can influence the severity of COVID-19 and its clinical manifestations. Like the N protein, Nsp3 is largely intracellular but may be exposed to the extracellular space upon cell lysis or through active secretion. Several T cell epitopes, both CD8+ and CD4+ T cell epitopes, are predicted to occur in Nsp3, which are conserved across multiple SARS-CoV-2 variants. In some embodiments, an mRNA encodes a protein comprising a SARS-CoV-2 non-structural protein 3 (Nsp3) protein portion. In some embodiments, a protein comprises a SARS-CoV-2 non-structural protein 3 (Nsp3) protein portion. A SARS-CoV-2 Nsp3 protein portion may comprise one or more T cell epitopes. Such T cell epitopes may be CD4+ T cell epitopes or CD8+ T cell epitopes. A CD4+ T cell epitope refers to an amino acid sequence that is presented on a class II major histocompatibility complex (MHC-II) protein. A CD8+ T cell epitope refers to an amino acid sequence that is presented on a class I major histocompatibility complex (MHC-I) protein. MHC-I proteins present peptides that are typically 8–11 amino acids in length. 
A CD8+ T cell epitope may thus comprise an amino acid sequence 8–11 amino acids in length. A protein may comprise a T cell epitope, such that when a peptide consisting of the amino acid sequence of the epitope is presented on an MHC protein, a T cell recognizes the peptide:MHC complex. Peptides consisting of amino acid sequences present in a protein are generated by cleavage of proteins by a proteasome, which cleaves peptide bonds to release peptide fragments, and peptides are loaded into antigen-presenting grooves of MHC proteins. In some embodiments, an Nsp3 protein portion comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more T cell epitopes. In some embodiments, an Nsp3 protein portion comprises 2–6 T cell epitopes. In some embodiments, an Nsp3 protein portion comprises at least 6 T cell epitopes. In some embodiments, an Nsp3 protein portion comprises 6 T cell epitopes. In some embodiments, an Nsp3 protein portion comprises 6 different epitope sequences. In some embodiments, an Nsp3 protein portion comprises 6 different epitope sequences that do not overlap in the amino acid sequence of the protein. In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence ALRKVPTDNYITTY (SEQ ID NO: 149). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence SNEKQEILGTVSWNL (SEQ ID NO: 150). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence HTTDPSFLGRYMSAL (SEQ ID NO: 151). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence LVAEWFLAYILFTRFFYV (SEQ ID NO: 152). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence YIFFASFYYVWKSYV (SEQ ID NO: 153). In some embodiments, the Nsp3 protein portion comprises an epitope with the amino acid sequence AEAELAKNVSLDNVL (SEQ ID NO: 154). 
In some embodiments, the Nsp3 protein portion comprises each of SEQ ID NOs: 149–154. In some embodiments, an Nsp3 protein portion comprises two or more T cell epitopes, and those T cell epitopes occur in a different order in the Nsp3 protein portion than in a full-length SARS-CoV-2 Nsp3 protein. For example, if a naturally occurring SARS-CoV-2 Nsp3 protein comprises three epitopes (E1, E2, and E3) in the order E1–X–E2–X–E3, where each X is either a peptide bond, or one or more intervening amino acids between epitopes in the naturally occurring Nsp3 protein, an Nsp3 protein may comprise the three epitopes in the same order (E1–E2–E3), or a different order. In some embodiments, the Nsp3 protein portion comprises two or more epitopes in the same order as they occur in a naturally occurring or full-length Nsp3. In some embodiments, the Nsp3 protein portion comprises two or more epitopes in a different order than they occur in a naturally occurring or full-length Nsp3 protein. In some embodiments, the Nsp3 protein portion lacks one or more junctional epitopes formed by an amino acid sequence that overlaps with a first epitope sequence and a second epitope sequence. For example, where a first epitope has the amino acid sequence ALRKVPTDNYITTY (SEQ ID NO: 149) and a second epitope has the amino acid sequence SNEKQEILGTVSWNL (SEQ ID NO: 150), a concatenation of both epitopes, having the amino acid sequence ALRKVPTDNYITTYSNEKQEILGTVSWNL (SEQ ID NO: 155), includes the junctional epitopes YITTYSNEK (SEQ ID NO: 156) and ITTYSNEKQ (SEQ ID NO: 157) (underlined amino acids are present in the first epitope sequence, and bolded amino acids are present in the second epitope sequence). Such junctional epitopes may be absent from a SARS-CoV-2 Nsp3 protein, and so T cells specific to those junctional epitopes are expected to provide little protection, if any, against a SARS-CoV-2 infection. 
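The junctional epitopes in the example above can be enumerated with a short sketch. It lists every window of a given length that contains residues from both epitope sequences; the window length of 9 is an assumption taken from the two 9-residue examples in the text, and the function name is hypothetical. The sketch only enumerates candidate peptides and does not predict whether any of them would be presented on an MHC protein.

```python
def junctional_kmers(first, second, k=9):
    """Return every k-mer of first+second that spans the junction,
    i.e. contains at least one residue from each input sequence."""
    joined = first + second
    junction = len(first)  # index of the first residue contributed by `second`
    kmers = []
    for start in range(max(0, junction - k + 1), junction):
        end = start + k
        if end <= len(joined):
            kmers.append(joined[start:end])
    return kmers


first = "ALRKVPTDNYITTY"    # SEQ ID NO: 149
second = "SNEKQEILGTVSWNL"  # SEQ ID NO: 150

for peptide in junctional_kmers(first, second):
    print(peptide)
```

The two junctional epitopes named in the text, YITTYSNEK and ITTYSNEKQ, are among the eight 9-residue windows that this sketch reports for the pair.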
Additionally, junctional epitopes formed by concatenation of two or more amino acid sequences may include amino acid sequences present in endogenous human proteins, or resemble amino acid sequences present in endogenous human proteins. Presentation of such junctional epitopes may cause deleterious activation of T cells that then exhibit a response to endogenous proteins in a subject. Accordingly, rearrangement of two or more epitopes of a full-length Nsp3 protein, in an Nsp3 protein portion, allows for the avoidance of one or more junctional epitopes, while maintaining the presence of those epitopes that are useful in generating an anti-SARS-CoV-2 T cell response. While rearrangement may also introduce a different junctional epitope, such an introduced junctional epitope may be presented on an MHC-I allele that is less common in the human population. For example, if a first junctional epitope is presented on an MHC allele that is present in 25% of humans, and rearrangement of two Nsp3 epitopes eliminates this junctional epitope while introducing a second junctional epitope that is presented on an MHC allele that is present in only 5% of humans, this rearrangement reduces the chance that an Nsp3 protein portion having the rearranged epitopes will elicit off-target T cells, because the introduced junctional epitope is less likely to be presented than the one that was eliminated. In some embodiments, the Nsp3 protein portion lacks one or more amino acid sequences selected from YITTYSNEK (SEQ ID NO: 156), ITTYSNEKQ (SEQ ID NO: 157), TVSWNLHTT (SEQ ID NO: 158), WNLHTTDPS (SEQ ID NO: 159), RYMSALLVA (SEQ ID NO: 160), MSALLVAEW (SEQ ID NO: 161), SALLVAEWF (SEQ ID NO: 162), RFFYVYIFF (SEQ ID NO: 163), TRFFYVYIF (SEQ ID NO: 164), KSYVAEAEL (SEQ ID NO: 165), VWKSYVAEA (SEQ ID NO: 166), WKSYVAEAE (SEQ ID NO: 168), and VAEAELA (SEQ ID NO: 169). In some embodiments, two Nsp3 epitope sequences are connected by a linker. 
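The effect of a linker on the listed junctional sequences can be checked with a minimal sketch. It assumes, for illustration only, that the six epitopes of SEQ ID NOs: 149–154 are joined in numerical order, once directly and once with an AAY linker between each pair, and it reports which of the listed sequences occur in each construct. The helper names are hypothetical. A linker creates new junctions of its own (epitope-linker and linker-epitope), which this sketch does not evaluate.

```python
EPITOPES = [
    "ALRKVPTDNYITTY",      # SEQ ID NO: 149
    "SNEKQEILGTVSWNL",     # SEQ ID NO: 150
    "HTTDPSFLGRYMSAL",     # SEQ ID NO: 151
    "LVAEWFLAYILFTRFFYV",  # SEQ ID NO: 152
    "YIFFASFYYVWKSYV",     # SEQ ID NO: 153
    "AEAELAKNVSLDNVL",     # SEQ ID NO: 154
]

# Junctional sequences listed in the text above.
LISTED_JUNCTIONAL = [
    "YITTYSNEK", "ITTYSNEKQ", "TVSWNLHTT", "WNLHTTDPS", "RYMSALLVA",
    "MSALLVAEW", "SALLVAEWF", "RFFYVYIFF", "TRFFYVYIF", "KSYVAEAEL",
    "VWKSYVAEA", "WKSYVAEAE", "VAEAELA",
]


def join_epitopes(epitopes, linker=""):
    """Concatenate epitope sequences, placing `linker` between each pair."""
    return linker.join(epitopes)


def sequences_present(construct, peptides):
    """Return the peptides that occur as substrings of the construct."""
    return [p for p in peptides if p in construct]


direct = join_epitopes(EPITOPES)
linked = join_epitopes(EPITOPES, linker="AAY")

print(len(sequences_present(direct, LISTED_JUNCTIONAL)))
print(sequences_present(linked, LISTED_JUNCTIONAL))
```

Under these assumptions, all of the listed sequences occur in the direct concatenation and none occur once the AAY linker separates each pair of epitopes.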
The presence of a linker may eliminate a junctional epitope that would otherwise be present if two epitope sequences were concatenated without any intervening amino acids. In some embodiments, the presence of a linker reduces the chance a junctional epitope will be presented on an MHC-I protein. In some embodiments, the two epitopes of a pair of Nsp3 epitopes are connected by an AAY linker. The amino acid sequence AAY is a cleavage site for mammalian proteasomes, and so inclusion of an AAY linker facilitates cleavage between the two epitopes, increasing the efficiency of Nsp3 peptide epitope production and presentation. A SARS-CoV-2 Nsp3 portion may comprise two or more amino acid sequences derived from a full-length Nsp3. The amino acid sequences of the full-length Nsp3 may be derived from the same Nsp3, or different Nsp3s (e.g., Nsp3s of different SARS-CoV-2 lineages). Any pair of the two or more amino acid sequences may be contiguous in the Nsp3 portion (e.g., without an intervening amino acid sequence), or the two amino acid sequences may be separated by a linker. Where multiple linkers separate multiple pairs of amino acid sequences, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY). Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS). In some embodiments, two or more amino acid sequences of an Nsp3 portion are connected by GGSGG (SEQ ID NO: 99).
Spike (S) Proteins
The envelope spike (S) proteins of known betacoronaviruses determine the virus host tropism and entry into host cells. Coronavirus spike (S) protein is a choice antigen for vaccine design as it can induce neutralizing antibodies and protective immunity. S protein is critical for SARS-CoV-2 infection. The organization of the S protein is similar among betacoronaviruses, such as SARS-CoV-2, SARS-CoV, MERS-CoV, HKU1-CoV, MHV-CoV and NL63-CoV. 
As used herein, the term “Spike protein” refers to a glycoprotein that forms homotrimers protruding from the envelope (viral surface) of viruses including betacoronaviruses. Trimerized Spike protein facilitates entry of the virion into a host cell by binding to a receptor on the surface of a host cell followed by fusion of the viral and host cell membranes. The S protein is a highly glycosylated and large type I transmembrane fusion protein that is made up of 1,160 to 1,400 amino acids, depending upon the type of virus. Betacoronavirus Spike proteins comprise between about 1100 and 1500 amino acids. SARS-CoV-2 spike (S) protein is a primary antigen choice for vaccine design, as it can induce neutralizing antibodies and protective immunity. mRNAs are designed to produce SARS-CoV-2 Spike proteins (i.e., encode Spike proteins such that Spike protein is expressed when the mRNA is delivered to a cell or tissue, for example a cell or tissue in a subject), as well as variants thereof. The skilled artisan will understand that, while an essentially full length or complete Spike protein may be necessary for a virus, e.g., a betacoronavirus, to perform its intended function of facilitating virus entry into a host cell, a certain amount of variation in Spike protein structure and/or sequence is tolerated when seeking primarily to elicit an immune response against Spike protein. For example, minor truncation, e.g., of one to a few, possibly up to 5 or up to 10 amino acids from the N- or C-terminus of the encoded Spike protein, e.g., encoded Spike protein antigen, may be tolerated without changing the antigenic properties of the protein. Likewise, variation (e.g., conservative substitution) of one to a few, possibly up to 5 or up to 10 amino acids (or more) of the encoded Spike protein, e.g., encoded Spike protein antigen, may be tolerated without changing the antigenic properties of the protein. 
In some embodiments, the Spike protein is a stabilized Spike protein, for example, the Spike protein is stabilized by two proline substitutions (a 2P mutation). In some embodiments, the Spike protein is not a stabilized Spike protein, for example, the Spike protein is not stabilized by two proline substitutions (a 2P mutation). In some embodiments, the Spike protein is from a different virus strain. A strain is a genetic variant of a microorganism (e.g., a virus). New viral strains can be created due to mutation, which may be selected due to enhanced replication, transmissibility, and/or evasion of pre-existing immune responses (e.g., antigenic drift), or recombination of genetic components when two or more viruses infect the same cell, with such recombinant viruses being selected due to enhanced replication, transmissibility, and/or evasion of pre-existing immune responses. Antigenic drift is a kind of genetic variation in viruses, arising by the accumulation of mutations in the virus genes that code for virus-surface proteins recognized by host immune responses (antibodies and T cells). This results in selection for new virus strains that are not effectively inhibited by the antibodies and/or T cell responses that prevented or mitigated infection by previous (wild-type or ancestral) strains. This makes it easier for the changed virus to spread throughout a partially immune population. Antigenic shift is the process by which two or more different strains of a virus, or strains of two or more different viruses, combine to form a new subtype having a mixture of the surface antigens of the two or more original strains, which may create virus with a novel combination of surface antigens that did not previously exist in nature. 
The term is often applied specifically to influenza viruses, where segmentation of the viral genome into distinct RNA segments, and reassortment of genome segments during virion production, allows the production of reassortant progeny with novel combinations of genome segments from co-infected cells. However, genetic recombination may occur between non-segmented viruses (e.g., SARS-CoV-2) where multiple viral strains replicate in the same cell, e.g., by switching between two template genomes during replication, resulting in progeny genomes with combinations of sequences from two or more viral strains. Antigenic shift is contrasted with antigenic drift (in which individual mutations accumulate over time, and may lead to a loss of immunity, or to vaccine mismatch). In contrast to accumulation of mutations on an ancestral genome (antigenic drift), genetic recombination (through reassortment or molecular recombination of genomes) is often associated with a major reorganization of viral surface antigens, resulting in novel combinations of genes, which may cause a more immediate and drastic change in viral phenotype. A virus strain as used herein is a genetic variant of a virus that is characterized by a differing isoform of one or more surface proteins of the virus. In the case of SARS-CoV-2, for example, a strain may have a different amino acid sequence in the SARS-CoV-2 spike protein, such that the immune response in an individual to the new strain is less effective than the response to the strain used to immunize or first infect the individual. A new virus strain may arise from natural mutation or a combination of natural mutation and immune selection due to an ongoing immune response in an immunized or previously infected individual. A new virus strain can differ by one, two, three or more amino acid mutations in regions of the spike protein responsible for a viral function such as receptor binding or viral fusion with a target cell. 
A spike protein from a new strain may differ from the parental strain by as much as 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity at the amino acid level. A natural virus strain is a variant of a given virus that is recognizable because it possesses some “unique phenotypic characteristics” that remain stable (e.g., stable and heritable biological, serological, and/or molecular characters) under natural conditions. Such “unique phenotypic characteristics” are biological properties different from the compared reference virus, such as unique antigenic properties, host range (e.g., infecting a different kind of host), symptoms of disease caused by the strain, different type of disease caused by the strain (e.g., transmitted by different means), etc. A “unique phenotypic characteristic” can be detected clinically (e.g., clinical manifestations detected in a host infected with the strain) or within a comparative animal experiment in which a researcher skilled in the art of virology can distinguish between the reference control virus-infected animal and the animal infected with the alleged new strain, without knowing which animal received which virus and without having any information about the differences between the two viruses. Importantly, a virus variant with a simple difference in genome sequence is not a separate strain if there is no recognizable distinct viral phenotype. The extent of genomic sequence variation is irrelevant for the classification of a variant as a strain since a distinct phenotype sometimes arises from few mutations. As an example, in some embodiments, the mRNA encodes an antigen from at least one virus strain variant or comprises mutations from at least one virus strain that is not wild-type SARS-CoV-2. In some embodiments, the vaccine comprises an mRNA encoding a Spike protein associated with the XBB.1.5 lineage variant. 
The XBB.1.5 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including an N460K substitution, an F486P substitution, and an F490S substitution in the Spike protein. In some embodiments, an mRNA encodes a Spike protein with at least one substitution associated with the XBB.1.5 lineage variant. In some embodiments, an mRNA encodes a Spike protein with an N460K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F490S substitution relative to SEQ ID NO: 87. In some embodiments, the vaccine comprises an mRNA encoding a Spike protein associated with the XBB.1.16 lineage variant. The XBB.1.16 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including a Q183E substitution, an F456L substitution, an F486P substitution, and an F490S substitution in the Spike protein. In some embodiments, an mRNA encodes a Spike protein with at least one substitution associated with the XBB.1.16 lineage variant. In some embodiments, an mRNA encodes a Spike protein with a Q183E substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F456L substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a Spike protein with an F490S substitution relative to SEQ ID NO: 87. Table 4, below, presents examples of Spike protein mutations in SARS-CoV-2 variants. In some embodiments, a Spike protein, e.g., an encoded Spike protein portion, has the amino acid sequence of SEQ ID NO: 87. 
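The substitution notation used above (reference residue, 1-based position, replacement residue, as in N460K) can be applied programmatically. The following is a minimal sketch, assuming the notation refers to positions in the reference sequence being edited; the function name and the short toy sequence are hypothetical, and the toy sequence is not SEQ ID NO: 87.

```python
import re


def apply_substitution(sequence, mutation):
    """Apply a substitution written as <ref><position><alt>, e.g. 'N460K'.

    Positions are 1-based. Raises ValueError if the notation is malformed,
    the position is out of range, or the reference residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    return sequence[:pos - 1] + alt + sequence[pos:]


# Hypothetical 7-residue reference used only to show the mechanics.
reference = "MFVNLTQ"
variant = apply_substitution(reference, "N4K")
print(variant)
```

Checking the reference residue before substituting guards against applying a mutation to a sequence that is numbered differently from the reference, for example a fragment that has not been aligned to the full-length protein.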
In some embodiments, a Spike protein, e.g., an encoded Spike protein antigen, has no greater than 100, no greater than 90, no greater than 80, no greater than 70, no greater than 60, no greater than 50, no greater than 40, no greater than 30, no greater than 20, no greater than 10, or no greater than 5 amino acid substitutions and/or deletions as compared to (when aligned with) a Spike protein having the amino acid sequence as provided in SEQ ID NO: 87. Where minor variations are made in encoded Spike protein sequences, the variant preferably has the same activity as the reference Spike protein sequence and/or has the same immune specificity as the reference Spike protein, as determined for example, in immunoassays (e.g., enzyme-linked immunosorbent assays (ELISA assays)). S proteins of coronaviruses can be divided into two important functional subunits: the N-terminal S1 subunit, which forms the globular head of the S protein, and the C-terminal S2 region, which forms the stalk of the protein and is directly embedded into the viral envelope. Upon interaction with a potential host cell, the S1 subunit will recognize and bind to receptors on the host cell, specifically angiotensin-converting enzyme 2 (ACE2) receptors, whereas the S2 subunit, which is the most conserved component of the S protein, will be responsible for fusing the envelope of the virus with the host cell membrane. (See e.g., Shang et al., PLoS Pathog. 2020 Mar; 16(3):e1008392.). Each monomer of the S protein trimer contains the two subunits, S1 and S2, mediating attachment and membrane fusion, respectively. As part of the infection process in vivo, the two subunits are separated from each other by an enzymatic cleavage process. S protein is first cleaved by furin-mediated cleavage at the S1/S2 site in infected cells. In vivo, a subsequent serine protease-mediated cleavage event occurs at the S2′ site within S2. 
In SARS-CoV-2, the S1/S2 cleavage site is at amino acids 676 – TQTNSPRRAR/SVA – 688 (SEQ ID NO: 45). The S2′ cleavage site is at amino acids 811 – KPSKR/SFI – 818 (SEQ ID NO: 46). As used herein, for example in the context of designing SARS-CoV-2 S protein antigens encoded by the nucleic acids, e.g., mRNAs, the term “S1 subunit” (e.g., S1 subunit antigen) refers to the N-terminal subunit of the Spike protein beginning at the S protein N-terminus and ending at the S1/S2 cleavage site whereas the term “S2 subunit” (e.g., S2 subunit antigen) refers to the C-terminal subunit of the Spike protein beginning at the S1/S2 cleavage site and ending at the C-terminus of the Spike protein. As described supra, the skilled artisan will understand that, while an essentially full length or complete Spike protein S1 or S2 subunit may be necessary for receptor binding or membrane fusion, respectively, a certain amount of variation in S1 or S2 structure and/or sequence is tolerated when seeking primarily to elicit an immune response against Spike protein subunits. For example, minor truncation, e.g., of one to a few, possibly up to 4, 5, 6, 7, 8, 9 or 10 amino acids from the N- or C-terminus of the encoded subunit, e.g., encoded S1 or S2 protein antigens, may be tolerated without changing the antigenic properties of the protein. Likewise, variation (e.g., conservative substitution) of one to a few, possibly up to 4, 5, 6, 7, 8, 9 or 10 amino acids (or more) of the encoded Spike protein subunits, e.g., encoded S1 or S2 protein antigen, may be tolerated without changing the antigenic properties of the protein(s). In some embodiments, a Spike protein, e.g., an encoded Spike protein antigen, has an amino acid sequence of SEQ ID NO: 87. The S1 and S2 subunits of the SARS-CoV-2 Spike protein further include domains readily discernable by structure and function, which in turn can be featured in designing antigens to be encoded by the nucleic acid vaccines, in particular, mRNA vaccines. 
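The S1 and S2 subunit definitions given above can be illustrated by splitting a sequence at the S1/S2 cleavage site. This is a minimal sketch that locates the motif TQTNSPRRARSVA (SEQ ID NO: 45) and cuts at the position marked by the slash in the text, after RRAR. The function name and the toy flanking residues are hypothetical; the toy sequence is not a Spike protein sequence.

```python
S1_S2_MOTIF = "TQTNSPRRARSVA"  # SEQ ID NO: 45, written in the text as TQTNSPRRAR/SVA
RESIDUES_BEFORE_CUT = 10       # TQTNSPRRAR lies on the S1 side of the cut


def split_s1_s2(spike):
    """Split a sequence into (S1, S2) at the S1/S2 cleavage site."""
    index = spike.find(S1_S2_MOTIF)
    if index == -1:
        raise ValueError("S1/S2 cleavage motif not found")
    cut = index + RESIDUES_BEFORE_CUT
    return spike[:cut], spike[cut:]


# Toy sequence: hypothetical flanking residues around the real motif.
toy_spike = "AAAA" + S1_S2_MOTIF + "GGGG"
s1, s2 = split_s1_s2(toy_spike)
print(s1)
print(s2)
```

An exact motif search would fail for variants that carry substitutions within the motif (for example at position 679 or 681), so a real workflow would more likely locate the site by alignment to a reference sequence.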
Within the S1 subunit, domains include the N-terminal domain (NTD) and the receptor-binding domain (RBD), said RBD domain further including a receptor-binding motif (RBM). Within the S2 subunit, domains include fusion peptide (FP), heptad repeat 1 (HR1), heptad repeat 2 (HR2), transmembrane domain (TM), and cytoplasm domain, also known as cytoplasmic tail (CT) (Lu R. et al., supra; Wan et al., J. Virol. Mar 2020, 94 (7) e00127-20). The HR1 and HR2 domains can be referred to as the “fusion core region” of SARS-CoV-2 (Xia et al., 2020 Cell Mol Immunol. Jan; 17(1):1-12.). The S1 subunit includes an N terminal domain (NTD), a linker region, a receptor binding domain (RBD), a first subdomain (SD1), and a second subdomain (SD2). The S2 subunit includes, inter alia, a first heptad repeat (HR1), a second heptad repeat (HR2), a transmembrane domain (TM), and a cytoplasmic tail. The NTD and RBD of S1 are good antigens for the vaccine design approach of some embodiments, as these domains have been shown to be the targets of neutralizing antibodies in betacoronavirus-infected individuals. As used herein, for example, in the context of an antigen design (said antigen encoded by an mRNA and to be expressed, for example, from an mRNA vaccine), the term “N-terminal domain” or “NTD” refers to a domain within the SARS-CoV-2 S1 subunit comprising approximately 290 amino acids in length, having identity to amino acids 1-290 of the S1 subunit of the Spike protein having the amino acid sequence set forth as SEQ ID NO: 87. As used herein, for example, in the context of an antigen design (said antigen encoded by an mRNA and to be expressed, for example, from an mRNA vaccine), the term “receptor binding domain” or “RBD” refers to a domain within the S1 subunit of SARS-CoV-2 comprising approximately 175-225 amino acids in length, having identity to amino acids 316-517 of the S1 subunit of the Spike protein having the amino acid sequence set forth as SEQ ID NO: 87. 
As used herein, the term “receptor binding motif” refers to the portion of the RBD that directly contacts the ACE2 receptor. Expressed RBDs are predicted to specifically bind to angiotensin-converting enzyme 2 (ACE2) as their receptor and/or specifically react with RBD-binding and/or neutralizing antibodies, e.g., CR3022. In some embodiments these antigens include a stabilizing 2P mutation. Compositions may include mRNA that encodes any one or more full-length or partial (truncated or other deletion of sequence) S protein subunit (e.g., S1 or S2 subunit), one or more domain or combination of domains of an S protein subunit (e.g., NTD, RBD, or NTD-RBD fusions, with or without an SD1 and/or SD2), or chimeras of full-length or partial S1 and S2 protein subunits. Other S protein subunit and/or domain configurations are contemplated herein. Proteins comprising one or more Spike protein domains (e.g., NTD and RBD) may comprise one or more mutations associated with a virus strain variant, in the respective domain of the encoded protein. For example, an encoded NTD-RBD fusion protein may comprise a substitution corresponding to N460K in a full-length Spike protein, which the skilled artisan will understand refers to substitution of asparagine with lysine, where the substituted asparagine is one corresponding to N460 of a full-length Spike protein, when the NTD-RBD fusion protein sequence is aligned to a full-length Spike protein sequence (e.g., SEQ ID NO: 87). In some embodiments, the vaccine comprises an mRNA encoding an NTD-RBD fusion protein comprising one or more mutations associated with the XBB.1.5 lineage variant. The XBB.1.5 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including an N460K substitution, an F486P substitution, and an F490S substitution in the Spike protein. 
In some embodiments, an mRNA encodes an NTD-RBD fusion protein with at least one substitution associated with the XBB.1.5 lineage variant. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an N460K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F490S substitution relative to SEQ ID NO: 87. In some embodiments, the vaccine comprises an mRNA encoding an NTD-RBD fusion protein comprising one or more mutations associated with the XBB.1.16 lineage variant. The XBB.1.16 lineage variant encodes a Spike protein with multiple mutations relative to an ancestral Wuhan-Hu-1 Spike protein (SEQ ID NO: 87), including a Q183E substitution, an F456L substitution, an F486P substitution, and an F490S substitution in the Spike protein. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with at least one substitution associated with the XBB.1.16 lineage variant. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with a Q183E substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F456L substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes an NTD-RBD fusion protein with an F490S substitution relative to SEQ ID NO: 87. Table 4, below, presents examples of Spike protein mutations in SARS-CoV-2 variants.
[Table 4. Spike protein mutations in SARS-CoV-2 variants. The table is reproduced as images (imgf000041_0001 and imgf000042_0001) in the source; the only legible entries are 20A.EU2, S477N-D614G, and N439K-D614G.]
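The substitution notation used in this section (e.g., N460K relative to a full-length Spike reference) can be illustrated with a small Python sketch. It parses a label into reference residue, position, and replacement residue, and then maps the full-length position onto a fragment built from selected residue ranges. The function names are hypothetical, and the ranges 1-290 and 316-517 are taken from the approximate NTD and RBD definitions above; joining them directly, with no linker and no gaps, is an assumption made only for illustration. An actual construct would be aligned to the reference sequence instead.

```python
import re
from typing import NamedTuple, Optional, Sequence, Tuple


class Substitution(NamedTuple):
    ref: str       # residue in the reference sequence, e.g. "N"
    position: int  # 1-based position in the full-length reference
    alt: str       # replacement residue, e.g. "K"


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'N460K' into (ref, position, alt)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def fragment_index(position: int,
                   segments: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Map a 1-based full-length position to a 0-based fragment index.

    `segments` lists the inclusive (start, end) ranges of the full-length
    protein that were concatenated, in order, to form the fragment.
    Returns None when the position is not part of the fragment.
    """
    offset = 0
    for start, end in segments:
        if start <= position <= end:
            return offset + (position - start)
        offset += end - start + 1
    return None


# Hypothetical NTD-RBD fragment: residues 1-290 followed by 316-517.
NTD_RBD_SEGMENTS = [(1, 290), (316, 517)]

sub = parse_substitution("N460K")
index_in_fragment = fragment_index(sub.position, NTD_RBD_SEGMENTS)
```

Under these assumptions a position outside both ranges, such as 614, maps to `None`, which signals that the corresponding substitution cannot be represented in the fragment.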
An exemplary sequence of a protein comprising one or more domains of a coronavirus Spike protein (e.g., NTD-RBD fusion proteins) encoded by mRNAs is provided as SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having at least one of the following mutations relative to SEQ ID NO: 87: T19I, A27S, V83A, G142D, H146Q, E180V, Q183E, V213E, G252V, G339H, R346T, L368I, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, V445P, G446S, N460K, S477N, T478K, T478R, E484A, F486P, F490S, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, and N969K. In some embodiments, the mRNA encodes a protein having 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 substitutions selected from T19I, A27S, V83A, G142D, H146Q, E180V, Q183E, V213E, G252V, G339H, R346T, L368I, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, V445P, G446S, N460K, S477N, T478K, T478R, E484A, F486P, F490S, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, and N969K. In some embodiments, an mRNA encodes a protein having a T19I substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an A27S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a V83A substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G142D substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an H146Q substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an E180V substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Q183E substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a V213E substitution relative to SEQ ID NO: 87.
In some embodiments, an mRNA encodes a protein having a G252V substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G339H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an R346T substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an L368I substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S371F substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S373P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S375F substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a T376A substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a D405N substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an R408S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a K417N substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N440K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a V445P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a G446S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N460K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an S477N substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a T478K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a T478R substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an E484A substitution relative to SEQ ID NO: 87.
In some embodiments, an mRNA encodes a protein having an F486P substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an F490S substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Q498R substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N501Y substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Y505H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a D614G substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an H655Y substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N679K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a P681H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N764K substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a D796Y substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a Q954H substitution relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having an N969K substitution relative to SEQ ID NO: 87. In some embodiments, the mRNA encodes a protein having one or more deletions relative to the SARS-CoV-2 S protein of SEQ ID NO: 87. Exemplary deletions include, but are not limited to, deletions of L24, P25, P26, and Y144. In some embodiments, the mRNA encodes a protein lacking 1, 2, 3, or 4 amino acids corresponding to L24, P25, P26, or Y144 of SEQ ID NO: 87.
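How substitutions and deletions of the kind enumerated above combine can be sketched in Python. The example below is a minimal illustration rather than a description of any construct in this document: the function name is hypothetical and the reference is a made-up ten-residue toy sequence, not SEQ ID NO: 87. All positions are interpreted in reference numbering, so deletions are marked first and removed only at the end, which keeps later positions from shifting.

```python
from typing import Iterable, Tuple


def apply_mutations(reference: str,
                    substitutions: Iterable[Tuple[str, int, str]] = (),
                    deletions: Iterable[Tuple[str, int]] = ()) -> str:
    """Return `reference` with the given substitutions and deletions applied.

    substitutions: (ref_residue, 1-based position, new_residue) tuples.
    deletions:     (ref_residue, 1-based position) tuples.
    Every position refers to the numbering of the unmodified reference.
    """
    residues = list(reference)

    def check(ref: str, pos: int) -> None:
        # Guard against a label that does not match the reference.
        if not 1 <= pos <= len(reference) or reference[pos - 1] != ref:
            raise ValueError(f"reference has no {ref} at position {pos}")

    for ref, pos, alt in substitutions:
        check(ref, pos)
        residues[pos - 1] = alt

    for ref, pos in deletions:
        check(ref, pos)
        residues[pos - 1] = ""  # mark as deleted; removed when joined

    return "".join(residues)


# Toy reference (hypothetical): substitute C2W and delete F5 and G6.
toy_reference = "ACDEFGHIKL"
variant = apply_mutations(toy_reference,
                          substitutions=[("C", 2, "W")],
                          deletions=[("F", 5), ("G", 6)])
```

Checking the stated reference residue before each change is a design choice: it catches labels that were written against a different numbering scheme.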
In some embodiments, the mRNA encodes a protein having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 substitutions selected from T19I, A27S, V83A, G142D, H146Q, E180V, Q183E, V213E, G252V, G339H, R346T, L368I, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, V445P, G446S, N460K, S477N, T478K, T478R, E484A, F486P, F490S, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, and N969K relative to SEQ ID NO: 87, and lacking 1, 2, 3, or 4 amino acids corresponding to L24, P25, P26, or Y144 of SEQ ID NO: 87, or any combination thereof. In some embodiments, an mRNA encodes a protein having a deletion of L24 relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a deletion of P25 relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a deletion of P26 relative to SEQ ID NO: 87. In some embodiments, an mRNA encodes a protein having a deletion of Y144 relative to SEQ ID NO: 87. In some embodiments, the mRNA vaccine comprises 1, 2, 3, 4, 5, or 6 mRNAs encoding different proteins, wherein each protein comprises at least one mutation and/or at least one deletion. In some embodiments, the mRNA vaccine further comprises an mRNA encoding a wild-type SARS-CoV-2 S protein or the antigenic fragment thereof. The mRNA vaccine, in some embodiments, is in a lipid nanoparticle (that is, the lipid nanoparticle comprises 1, 2, 3, 4, 5, or 6 mRNAs encoding different proteins). In some embodiments, a composition comprises a first mRNA encoding a protein or variant thereof of a first SARS-CoV-2 virus and a second mRNA encoding a second protein or variant thereof of a second SARS-CoV-2 virus. In some embodiments, the first SARS-CoV-2 virus is a first circulating SARS-CoV-2 virus. In some embodiments, the second SARS-CoV-2 virus is a second circulating SARS-CoV-2 virus.
“Circulating viruses” as used herein refer to viruses that have been in circulation for 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, a portion of a year, 1 year, 1.5 years, 2 years, 3 years, or longer. In some embodiments, the first and second mRNAs are present in the composition in a 1:1, 1:2, 1:3, or 1:4 ratio. In some embodiments, the first and second mRNAs are present in the composition in a 2:1, 3:1, or 4:1 ratio. In some embodiments, the first and second mRNAs are present in the composition in a 1:1 ratio. In some embodiments, an S protein portion comprises a receptor binding domain (RBD) from a SARS-CoV-2 Spike protein. In some embodiments, an S protein portion comprises an N-terminal domain (NTD) from a SARS-CoV-2 Spike protein.

Fusion Proteins

Some aspects relate to SARS-CoV-2 chimeric proteins comprising one or more portions of a SARS-CoV-2 N protein, a SARS-CoV-2 M protein, a SARS-CoV-2 Nsp3 protein, and/or a SARS-CoV-2 S protein, and/or RNAs encoding the same. Thus, the encoded protein or proteins may include two or more proteins (e.g., protein and/or protein portions) joined together with or without a linker. Alternatively, the protein to which a protein is fused does not promote a strong immune response to itself, but rather to the coronavirus antigen. Antigenic fusion proteins, in some embodiments, retain the functional property from each original protein. In some embodiments, a chimeric protein comprises an N protein portion, an Nsp3 protein portion, and an M protein portion. The N protein portion, Nsp3 protein portion, and M protein portion may occur in any order in the chimeric protein. In some embodiments, the portions occur, from N-to-C-terminal order, N-Nsp3-M. In some embodiments, the portions occur, from N-to-C-terminal order, N-M-Nsp3. In some embodiments, the portions occur, from N-to-C-terminal order, M-N-Nsp3.
In some embodiments, the portions occur, from N-to-C-terminal order, M-Nsp3-N. Those skilled in the art will understand that in a chimeric protein comprising an M protein portion including one or more transmembrane domains, the amino acids N-terminal to the transmembrane domain(s) will be present extracellularly when the protein is embedded in a cell membrane, and the amino acids C-terminal to the transmembrane domain(s) will be present inside the cytoplasm when the protein is embedded in the cell membrane. In some embodiments, a chimeric protein comprises an S protein portion, an N protein portion, and a transmembrane portion comprising one or more transmembrane domains. The transmembrane portion may comprise an M protein portion. In some embodiments, the transmembrane protein portion comprises a transmembrane domain from a protein other than a SARS-CoV-2 M protein. In some embodiments, the transmembrane domain is an influenza virus hemagglutinin (HA) transmembrane domain. The S portion, N portion, and transmembrane portions may occur in any order. In some embodiments, the S portion is N-terminal to the transmembrane portion. In some embodiments, the N portion is N-terminal to the transmembrane portion. In some embodiments, the N portion is C-terminal to the transmembrane portion. Those skilled in the art will understand that in a chimeric protein comprising a transmembrane portion including one or more transmembrane domains, the amino acids N-terminal to the transmembrane domain(s) will be present extracellularly when the protein is embedded in a cell membrane, and the amino acids C-terminal to the transmembrane domain(s) will be present inside the cytoplasm when the protein is embedded in the cell membrane.
The presence of an S portion extracellularly, in particular, allows exposure to the extracellular environment, facilitating more efficient generation of anti-Spike antibodies in a subject. In some embodiments, a fusion protein comprises a transmembrane domain. The transmembrane domain may, in some embodiments, be from a virus that is not SARS-CoV-2. For example, the transmembrane domain may be an influenza hemagglutinin transmembrane domain, which has been demonstrated to effectively anchor proteins at the cell surface. Any pair of protein portions (e.g., N portion, M portion, Nsp3 portion, S portion) may be contiguous in the chimeric protein (e.g., without an intervening amino acid sequence), or the two portions may be separated by a linker. Where multiple linkers separate multiple pairs of portions, the multiple linkers may each comprise the same amino acid sequence (e.g., AAY). Multiple linkers may comprise different amino acid sequences (e.g., a first linker comprises the amino acid sequence GGS, and a second linker comprises the amino acid sequence GGS). In some embodiments, two or more portions of a chimeric protein are connected by a glycine linker or a glycine-serine linker. In some embodiments, each pair of protein portions of a chimeric protein is connected by a glycine linker or a glycine-serine linker. In some embodiments, each pair of protein portions is connected by the amino acid sequence GGSGG (SEQ ID NO: 99). In some embodiments, two or more portions of a chimeric protein are connected by a linker comprising the amino acid sequence AAY. In some embodiments, each pair of protein portions is connected by the amino acid sequence AAY. In some embodiments, two or more portions of a chimeric protein are connected by a linker comprising the amino acid sequence RKSY (SEQ ID NO: 136). In some embodiments, each pair of protein portions is connected by the amino acid sequence RKSY (SEQ ID NO: 136).
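The arrangement of portions and linkers described above can be sketched in Python. The example enumerates every N-to-C-terminal order of three named portions and joins adjacent portions either directly or with a single linker sequence. The function name is hypothetical and the portion sequences are short placeholder strings chosen only so the output is easy to read; they are not sequences from this document. The linker strings GGSGG, AAY, and RKSY are those named in the text.

```python
from itertools import permutations
from typing import Dict


def chimeric_orders(portions: Dict[str, str], linker: str = "") -> Dict[str, str]:
    """Return every ordering of `portions`, joined with `linker`.

    Keys are order labels such as 'N-Nsp3-M'; values are the concatenated
    amino acid strings. An empty linker means the portions are contiguous.
    """
    return {
        "-".join(order): linker.join(portions[name] for name in order)
        for order in permutations(portions)
    }


# Placeholder portion sequences (hypothetical, for illustration only).
toy_portions = {"N": "NNNN", "Nsp3": "PPPP", "M": "MMMM"}

with_ggsgg = chimeric_orders(toy_portions, linker="GGSGG")
with_aay = chimeric_orders(toy_portions, linker="AAY")
contiguous = chimeric_orders(toy_portions)
```

Three portions give six possible orders. Using a different linker between each pair would need a list of linkers instead of a single string, which this sketch does not cover.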
The amino acid sequence RKSY (SEQ ID NO: 136) is a cleavable linker. Thus, inclusion of an RKSY (SEQ ID NO: 136) linker facilitates cleavage between the two connected protein portions, increasing the efficiency of peptide epitope production and presentation. In some embodiments, no linkers are present between protein portions in the chimeric protein. Some aspects relate to a SARS-CoV-2 chimeric protein comprising an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183. Exemplary sequences of SARS-CoV-2 chimeric proteins are provided in Appendix I and Table 5. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 101. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 102. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 103. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 104.
In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 105. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 106. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 107. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 108. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 109. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 110. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 111. 
In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 112. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 113. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 114. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 115. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 116. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 117. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 118. 
In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 119. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 120. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 121. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 122. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 123. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 124. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 170. 
In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 171. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 172. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 173. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 174. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 175. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 176. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 177. 
In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 178. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 179. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 180. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 181. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 182. In some embodiments, a SARS-CoV-2 chimeric protein comprises an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the amino acid sequence of SEQ ID NO: 183.

Variants

In some embodiments, compositions include RNA that encodes a SARS-CoV-2 antigen variant. Antigen variants or other polypeptide variants refer to molecules that differ in their amino acid sequence from a wild-type, native, or reference sequence. The antigen/polypeptide variants may possess substitutions, deletions, and/or insertions at certain positions within the amino acid sequence, as compared to a native or reference sequence.
Ordinarily, variants possess at least 50% identity to a wild-type, native or reference sequence. In some embodiments, variants share at least 80%, or at least 90% identity with a wild-type, native, or reference sequence. In some embodiments, the nucleic acid vaccines encode SARS-CoV-2 variant proteins comprising 1, 2, 3, 4, or more mutations relative to a reference sequence. In some embodiments, the nucleic acid vaccines encode SARS-CoV-2 variant proteins comprising less than 20, 18, 15, 12, or 10 mutations relative to a reference sequence. In some embodiments, the nucleic acid vaccines encode SARS-CoV-2 variant proteins having 1-50, 1-40, 1-30, 1-25, 1-20, 1-15, 1-10, 5-50, 5-40, 5-30, 5-25, 5-20, 5-15, 5-10, 10-50, 10-40, 10-30, 10-25, 10-20, 10-15, 20-50, 20-40, 20-30, 20-25, 25-50, 25-40, 25-30, 30-50, 30-40, or 40-50 mutations (e.g., substitutions). As used herein, “mutation” refers to an amino acid substitution, insertion, or deletion. A reference sequence refers to a naturally-occurring strain, for example, a naturally-occurring circulating strain of SARS-CoV-2. Variant antigens/polypeptides encoded by nucleic acids may contain amino acid changes that confer any of a number of desirable properties, e.g., that enhance their immunogenicity, enhance their expression, and/or improve their stability or PK/PD properties in a subject. Variant antigens/polypeptides can be made using routine mutagenesis techniques and assayed as appropriate to determine whether they possess the desired property. Assays to determine expression levels and immunogenicity are well known in the art and exemplary such assays are set forth in the Examples section. Similarly, PK/PD properties of a protein variant can be measured using art recognized techniques, e.g., by determining expression of antigens in a vaccinated subject over time and/or by looking at the durability of the induced immune response.
The stability of protein(s) encoded by a variant nucleic acid may be measured by assaying thermal stability or stability upon urea denaturation or may be measured using in silico prediction. Methods for such experiments and in silico determinations are known in the art. In some embodiments, a composition comprises an RNA or an RNA ORF that comprises a nucleotide sequence of any one of the sequences in Table 5, or comprises a nucleotide sequence at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence of any one of the sequences in Table 5. The term “identity” refers to a relationship between the sequences of two or more polypeptides (e.g. antigens) or polynucleotides (nucleic acids), as determined by comparing the sequences. Identity also refers to the degree of sequence relatedness between or among sequences as determined by the number of matches between strings of two or more amino acid residues or nucleic acid residues. Identity measures the percent of identical matches between the smaller of two or more sequences with gap alignments (if any) addressed by a particular mathematical model or computer program (e.g., “algorithms”). Identity of related antigens or nucleic acids can be readily calculated by known methods. “Percent (%) identity” as it applies to polypeptide or polynucleotide sequences is defined as the percentage of residues (amino acid residues or nucleic acid residues) in the candidate amino acid or nucleic acid sequence that are identical with the residues in the amino acid sequence or nucleic acid sequence of a second sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. It is understood that identity depends on a calculation of percent identity but may differ in value due to gaps and penalties introduced in the calculation. 
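The percent identity calculation described above can be illustrated with a short Python sketch. It aligns two sequences with a basic global alignment (Needleman–Wunsch dynamic programming, one of several methods that could be used) and then counts identical aligned residues. The scoring values (match +1, mismatch -1, linear gap -1) and the choice of the shorter sequence's length as the denominator are assumptions made for this illustration; as the text notes, the reported value depends on how gaps and penalties are handled, so other tools may give different numbers. The function names are hypothetical.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> Tuple[str, str]:
    """Globally align `a` and `b`; return both strings padded with '-'."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of the shorter sequence."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aligned_a, aligned_b = global_align(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / min(len(a), len(b))
```

With these assumptions, a truncated fragment that is otherwise identical to its reference scores 100%, while a single substitution in a ten-residue sequence scores 90%.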
Generally, variants of a particular polynucleotide or polypeptide (e.g., antigen) have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% but less than 100% sequence identity to that particular reference polynucleotide or polypeptide as determined by sequence alignment programs and parameters known to those skilled in the art. Such tools for alignment include those of the BLAST suite (Stephen F. Altschul, et al. (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402). Another popular local alignment technique is based on the Smith-Waterman algorithm (Smith, T.F. & Waterman, M.S. (1981) “Identification of common molecular subsequences.” J. Mol. Biol. 147:195-197). A general global alignment technique based on dynamic programming is the Needleman–Wunsch algorithm (Needleman, S.B. & Wunsch, C.D. (1970) “A general method applicable to the search for similarities in the amino acid sequences of two proteins.” J. Mol. Biol. 48:443-453). More recently a Fast Optimal Global Sequence Alignment Algorithm (FOGSAA) has been developed that purportedly produces global alignment of nucleotide and protein sequences faster than other optimal global alignment methods, including the Needleman–Wunsch algorithm. As such, polynucleotides encoding peptides or polypeptides containing substitutions, insertions and/or additions, deletions and covalent modifications with respect to reference sequences are contemplated. For example, sequence tags or amino acids, such as one or more lysines, can be added to peptide sequences (e.g., at the N-terminal or C-terminal ends). Sequence tags can be used for peptide detection, purification or localization. Lysines can be used to increase peptide solubility or to allow for biotinylation. Alternatively, amino acid residues located at the carboxy and amino terminal regions of the amino acid sequence of a peptide or protein may optionally be deleted providing for truncated sequences.
Certain amino acids (e.g., C-terminal or N-terminal residues) may alternatively be deleted depending on the use of the sequence, as for example, expression of the sequence as part of a larger sequence which is soluble or linked to a solid support. In some embodiments, sequences for (or encoding) signal sequences, termination sequences, transmembrane domains, linkers, multimerization domains (such as, e.g., foldon regions) and the like may be substituted with alternative sequences that achieve the same or a similar function. In some embodiments, cavities in the core of proteins can be filled to improve stability, e.g., by introducing larger amino acids. In some embodiments, buried hydrogen bond networks may be replaced with hydrophobic residues to improve stability. In yet some embodiments, glycosylation sites may be removed and replaced with appropriate residues. Such sequences are readily identifiable to one of skill in the art. It should also be understood that some of the sequences contain sequence tags or terminal peptide sequences (e.g., at the N-terminal or C-terminal ends) that may be deleted, for example, prior to use in the preparation of an mRNA vaccine. As recognized by those skilled in the art, protein fragments, functional protein domains, and homologous proteins are also considered to be within the scope of coronavirus antigens of interest. For example, any protein fragment (meaning a polypeptide sequence at least one amino acid residue shorter than a reference antigen sequence but otherwise identical) of a reference protein, provided that the fragment is immunogenic and confers a protective immune response to the coronavirus, may be suitable. In addition to variants that are identical to the reference protein but are truncated, in some embodiments, an antigen includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations, as shown in any of the sequences provided or referenced herein.
Antigens/antigenic polypeptides can range in length from about 4, 6, or 8 amino acids to full length proteins.
Linkers and Cleavable Peptides
In some embodiments, an RNA (e.g., mRNA) that encodes a protein encodes a linker located between at least one or each domain (portion) of the protein. The linker may be, for example, a cleavable linker or protease-sensitive linker. In some embodiments, the linker is selected from the group consisting of F2A linker, P2A linker, T2A linker, E2A linker, and combinations thereof (see, e.g., WO 2017/127750). This family of self-cleaving peptide linkers, referred to as 2A peptides, has been described in the art (see, e.g., Kim, J.H. et al. PLoS ONE 2011;6:e18556). In some embodiments, the linker is an F2A linker. In some embodiments, the linker is a GS linker. GS linkers are polypeptide linkers that include glycine and serine amino acid repeats. They comprise flexible and hydrophilic residues and can be used to perform fusion of protein subunits without interfering in the folding and function of the protein domains, and without formation of secondary structures. In some embodiments, an RNA (e.g., mRNA) encodes a protein that comprises a GS linker that is 3 to 20 amino acids long. For example, the GS linker may have a length of (or have a length of at least) 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids. In some embodiments, a GS linker is (or is at least) 15 amino acids long (e.g., GGSGGSGGSGGSGGG (SEQ ID NO: 47)). In some embodiments, a GS linker is (or is at least) 8 amino acids long (e.g., GGGSGGGS (SEQ ID NO: 48)). In some embodiments, a GS linker is (or is at least) 7 amino acids long (e.g., GGGSGGG (SEQ ID NO: 49)). In some embodiments, a GS linker comprises the amino acid sequence GGGSGG (SEQ ID NO: 50). In some embodiments, a GS linker is (or is at least) 4 amino acids long (e.g., GGGS (SEQ ID NO: 51)).
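As a non-limiting illustration, the composition and length constraints on the GS linkers described above can be checked programmatically. The sketch below assumes that a GS linker contains only glycine (G) and serine (S), includes at least one of each, and is 3 to 20 amino acids long. The helper names `is_gs_linker` and `repeat_linker`, and the choice to reject repeat counts outside 1 to 5, are hypothetical conveniences for this example.

```python
def is_gs_linker(seq, min_len=3, max_len=20):
    """True if seq contains only G and S, contains both, and is min_len..max_len long."""
    return (min_len <= len(seq) <= max_len
            and set(seq) <= {"G", "S"}
            and "G" in seq
            and "S" in seq)


def repeat_linker(unit, n):
    """Build a linker by repeating a unit such as GGGS n times (n assumed to be 1 to 5)."""
    if not 1 <= n <= 5:
        raise ValueError("n is assumed to be an integer from 1 to 5")
    return unit * n


# Exemplary linkers recited above (SEQ ID NOs: 47-51) and their lengths.
EXAMPLES = {
    "GGSGGSGGSGGSGGG": 15,
    "GGGSGGGS": 8,
    "GGGSGGG": 7,
    "GGGSGG": 6,
    "GGGS": 4,
}

for linker, expected_len in EXAMPLES.items():
    assert len(linker) == expected_len and is_gs_linker(linker)
```

A glycine-only linker such as GGG would not pass `is_gs_linker` because it contains no serine; it would be handled as a separate linker type.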
In some embodiments, the GS linker comprises (GGGS)n (SEQ ID NO: 127), where n is any integer from 1-5. In some embodiments, a GS linker is (or is at least) 4 amino acids long (e.g., GSGG (SEQ ID NO: 52)). In some embodiments, the GS linker comprises (GSGG)n (SEQ ID NO: 130), where n is any integer from 1-5. In some embodiments, a linker is a glycine linker, for example having a length of (or a length of at least) 3 amino acids (e.g., GGG). In some embodiments, a protein encoded by an RNA (e.g., mRNA) includes two or more linkers, which may be the same or different from each other. The skilled artisan will appreciate that other art-recognized linkers may be suitable for use in the constructs (e.g., encoded by nucleic acids). The skilled artisan will likewise appreciate that other polycistronic constructs (RNA (e.g., mRNA) encoding more than one protein separately within the same molecule) may be suitable.
Signal Peptides
In some embodiments, an RNA (e.g., mRNA) has an ORF that encodes a signal peptide fused to a protein. Signal peptides, comprising the N-terminal 15-60 amino acids of proteins, are typically involved in translocation across the membrane on the secretory pathway and, thus, control the entry of proteins both in eukaryotes and prokaryotes to the secretory pathway. In eukaryotes, the signal peptide of a nascent precursor protein (pre-protein) directs the ribosome to the rough endoplasmic reticulum (ER) membrane and initiates the transport of the growing peptide chain across it for processing. ER processing produces mature proteins, wherein the signal peptide is cleaved from precursor proteins, typically by an ER-resident signal peptidase of the host cell, or they remain uncleaved and function as a membrane anchor. A signal peptide may also facilitate the targeting of the protein to the cell membrane. A signal peptide may have a length of 15-60 amino acids.
For example, a signal peptide may have a length of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids. In some embodiments, a signal peptide has a length of 20-60, 25-60, 30-60, 35-60, 40-60, 45-60, 50-60, 55-60, 15-55, 20-55, 25-55, 30-55, 35-55, 40-55, 45-55, 50-55, 15-50, 20-50, 25-50, 30-50, 35-50, 40-50, 45-50, 15-45, 20-45, 25-45, 30-45, 35-45, 40-45, 15-40, 20-40, 25-40, 30-40, 35-40, 15-35, 20-35, 25-35, 30-35, 15-30, 20-30, 25-30, 15-25, 20-25, or 15-20 amino acids. Signal peptides from heterologous genes (which regulate expression of genes other than SARS-CoV-2 proteins in nature) are known in the art and can be tested for desired properties and then incorporated into a nucleic acid. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame that encodes a protein fused to a signal peptide comprising an amino acid sequence that has at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of any one of the sequences, such as those reproduced below in Table 5. In some embodiments, an mRNA comprises an open reading frame that encodes a protein including an endogenous signal peptide of the wild-type protein (e.g., an mRNA encoding a (wild-type or modified) SARS-CoV-2 protein or variant thereof encodes a SARS-CoV-2 signal peptide). In some embodiments, an mRNA comprises an open reading frame that encodes a SARS-CoV-2 protein having an influenza virus hemagglutinin (HA) signal peptide. In some embodiments, the SARS-CoV-2 protein comprises the amino acid sequence of SEQ ID NO: 100.
Nucleic Acids Encoding SARS-CoV-2 Proteins
Nucleic acids comprise a polymer of nucleotides (nucleotide monomers). Thus, nucleic acids are also referred to as polynucleotides.
Nucleic acids may be or may include, for example, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), threose nucleic acid (TNA), glycol nucleic acid (GNA), peptide nucleic acid (PNA), locked nucleic acid (LNA, including LNA having a β-D-ribo configuration, α-LNA having an α-L-ribo configuration (a diastereomer of LNA), 2′-amino-LNA having a 2′-amino functionalization, and 2′-amino-α-LNA having a 2′-amino functionalization), ethylene nucleic acid (ENA), cyclohexenyl nucleic acid (CeNA) and/or chimeras and/or combinations thereof. RNA (e.g., mRNA) comprises an open reading frame (ORF) encoding a SARS-CoV-2 protein or variant thereof. In some embodiments, the RNA (e.g., mRNA) further comprises a 5′ untranslated region (UTR), 3′ UTR, a poly(A) tail and/or a 5′ cap analog.
Messenger RNA (mRNA)
Messenger RNA (mRNA) is RNA that encodes a (at least one) protein (a naturally-occurring, non-naturally-occurring, or modified polymer of amino acids) and can be translated to produce the encoded protein in vitro, in vivo, in situ, or ex vivo. It is understood that mRNA is not self-amplifying RNA (saRNA) (see, e.g., Bloom K et al. Gene Therapy 2021; 28: 117–129 for a comparison of mRNA and saRNA). saRNAs include alphavirus replicase sequences that encode an RNA-dependent RNA polymerase. mRNA does not include alphavirus replicase sequences. The skilled artisan will appreciate that, except where otherwise noted, nucleic acid sequences set forth in the instant application may recite “T”s in a representative DNA sequence but where the sequence represents mRNA, the “T”s would be replaced with “U”s.
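For illustration, the T-to-U substitution described above can be sketched as a direct character replacement applied to a DNA sequence written in the sense (coding) orientation. No complementing or reversal is performed. The function name `dna_to_mrna` and the decision to reject characters other than A, C, G, and T are assumptions made for this example.

```python
def dna_to_mrna(dna):
    """Return the mRNA form of a sense-strand DNA sequence by replacing each T with U."""
    # Remove whitespace that may have been introduced by line wrapping.
    seq = "".join(dna.split()).upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")
```

For example, `dna_to_mrna("ATGGCCTGA")` yields `AUGGCCUGA`.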
Thus, any of the DNAs disclosed and identified by a particular sequence identification number herein also disclose the corresponding mRNA sequence complementary to the DNA, where each “T” of the DNA sequence is substituted with “U.” Naturally-occurring eukaryotic mRNA molecules can contain stabilizing elements, including, but not limited to, UTRs at their 5′-end (5′ UTR) and/or at their 3′-end (3′ UTR), in addition to other structural features, such as a 5′-cap structure or a 3′-poly(A) tail. Both the 5′ UTR and the 3′ UTR are typically transcribed from the genomic DNA and are elements of the premature mRNA. Characteristic structural features of mature mRNA, such as the 5′-cap and the 3′-poly(A) tail are usually added to the transcribed (premature) mRNA during mRNA processing.
Untranslated Regions (UTRs)
mRNAs may comprise one or more regions or parts which act or function as an untranslated region. A “5′ untranslated region” (UTR) refers to a region of an mRNA that is directly upstream (i.e., 5′) from the start codon (i.e., the first codon of an mRNA transcript translated by a ribosome) that does not encode a polypeptide. A “3′ untranslated region” (UTR) refers to a region of an mRNA that is directly downstream (i.e., 3′) from the open reading frame (e.g., downstream from the last amino acid-encoding codon of an open reading frame, where the stop codon is considered part of the 3′ UTR, or downstream from the first stop codon signaling translation termination, where that stop codon is considered part of the open reading frame), and which does not encode a polypeptide. When RNA transcripts are being generated, the 5′ UTR may comprise a promoter sequence. Such promoter sequences are known in the art. It should be understood that such promoter sequences will not be present in an mRNA vaccine. Where mRNAs encode a (at least one) protein, the mRNA may comprise a 5′ UTR and/or 3′ UTR. UTRs of an mRNA are transcribed but not translated.
In mRNA, the 5′ UTR starts at the transcription start site and continues to the start codon but does not include the start codon; the 3′ UTR starts immediately following the open reading frame and continues until the transcriptional termination signal. Where an open reading frame ends with a codon encoding an amino acid, the 3′ UTR begins with a stop codon, such that no amino acids are added to a polypeptide beyond the last amino acid encoded by the open reading frame. A 3′ UTR may further comprise one or more stop codons. There is a growing body of evidence about the regulatory roles played by the UTRs in terms of stability of the nucleic acid molecule and translation. The regulatory features of a UTR can be incorporated into the polynucleotides to, among other things, enhance the stability of the molecule. The specific features can also be incorporated to ensure controlled down-regulation of the transcript in case it is misdirected to undesired organ sites. A variety of 5′ UTR and 3′ UTR sequences are known. In some embodiments, the 5′ UTR comprises a sequence provided in Table 1 or a sequence with at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to a 5′ UTR sequence provided in Table 1, or a variant or a fragment thereof. In some embodiments, the 3′ UTR comprises a sequence provided in Table 2 or a sequence with at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identity to a 3′ UTR sequence provided in Table 2, or a variant or a fragment thereof. It should also be understood that the mRNA may include any 5′ UTR and/or any 3′ UTR. Exemplary UTR sequences include SEQ ID NOs: 1-44, 66-79 and 81-82; however, other UTR sequences may be used.
In some embodiments, a 5' UTR comprises a sequence selected from: GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 1), GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGACCCCGGCGCCGCCACC (SEQ ID NO: 2), GAGGAAAUCGCAAAAUUUGCUCUUCGCGUUAGAUUUCUUUUAGUUUUCUCGCAACUAGC AAGCUUUUUGUUCUCGCC (SEQ ID NO: 66), and GGAAAUCGCAAAAUUUGCUCUUCGCGUUAGAUUUCUUUUAGUUUUCUCGCAACUAGCAA GCUUUUUGUUCUCGCC (SEQ ID NO: 5). In some embodiments, a 3′ UTR comprises, in 5′-to-3′ order: (a) the nucleic acid sequence UAAAGCUCCCCGGGGGCCUCGGUGGCCUAGCUUCUU GCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCAG (SEQ ID NO: 68), (b) an identification and ratio determination (IDR) sequence, and (c) the nucleic acid sequence UGGUCUUUGAAUAAAGUCUGAGUGGGCGGC (SEQ ID NO: 69). In some embodiments, each mRNA encoding a distinct protein (i.e., having a different amino acid sequence from proteins encoded by other mRNAs in a composition) comprises a 3′ UTR comprising, in 5′-to-3′ order: (a) the nucleotide sequence of SEQ ID NO: 68; (b) a distinct IDR sequence; and (c) the nucleotide sequence of SEQ ID NO: 69. Non-limiting examples of IDR sequences are described in the section entitled “Identification and Ratio Determination (IDR) Sequences.” In some embodiments, a 5′ UTR comprises a sequence derived from a 5′ UTR of a gene selected from HSD17B4, RPL32, ASAH1, ATP5A1, MP68, NDUFA4, NOSIP, RPL31, SLC7A3, TUBB4B and UBQLN2. In some embodiments, the 5′ UTR comprises a sequence derived from the 5′ UTR of human hydroxysteroid 17-beta dehydrogenase 4 (HSD17B4). In some embodiments, a 5′ UTR comprises the sequence GGGAGAGUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUGUCGUUGCAGG CCUUAUUCAAGCUUACC (SEQ ID NO: 70). In some embodiments, a 5′ UTR comprises the sequence GUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUGUCGUUGCAGGCCUUAU UC (SEQ ID NO: 71). In some embodiments, a 5′ UTR comprises the sequence GGGAGAAAGCUUACC (SEQ ID NO: 72). 
In some embodiments, a 3′ UTR comprises a sequence derived from a 3′ UTR of a gene selected from PSMB3, ALB7, alpha-globin, CASP1, COX6B1, GNAS, NDUFA1 and RPS9. In some embodiments, a 3′ UTR comprises a sequence derived from a 3′ UTR of PSMB3 (proteasome 20S subunit beta 3). In some embodiments, a 3′ UTR comprises a sequence derived from a 3′ UTR of alpha-globin (MUAG). In some embodiments, a 3′ UTR comprises the sequence AGGACUAGUCCCUGUUCCCAGAGCCCACUUUUUUUUCUUUUUUUGAAAUAAAAUAGCCU GUCUUUCAGAUCU (SEQ ID NO: 73). In some embodiments, a 3′ UTR comprises the sequence GGACUAGUUAUAAGACUGACUAGCCCGAUGGGCCUCCCAACGGGCCCUCCUCCCCUCCUU GCACCGAGAUUAAU (SEQ ID NO: 74). In some embodiments, the mRNA comprises a 5′ UTR comprising the nucleotide sequence of any one of SEQ ID NOs: 70–72, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 73 or SEQ ID NO: 74. In some embodiments, the mRNA further comprises a polyA sequence comprising at least 64 consecutive adenosine nucleotides. In some embodiments, the mRNA further comprises a polyC sequence comprising at least 30 consecutive cytidine nucleotides. In some embodiments, a 5′ UTR comprises the sequence AACUAGUAUUCUUCUGGUCCCCACAGACUCAGAGAGAACCCGCCACC (SEQ ID NO: 75). In some embodiments, a 5′ UTR comprises the sequence GAGAAUAAACUAGUAUUCUUCUGGUCCCCACAGACUCAGAGAGAACCCGCCACC (SEQ ID NO: 76). In some embodiments, a 3′ UTR comprises the sequence CUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCGAGUCUC CCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCUCUGCUA GUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGCCACACC CCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAGCUAUA CUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACC (SEQ ID NO: 77). 
In some embodiments, a 3′ UTR comprises the sequence CUCGAGCUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCG AGUCUCCCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCU CUGCUAGUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGC CACACCCCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAG CUAUACUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACCCUGGAGCUAGC (SEQ ID NO: 78). In some embodiments, a 3′ UTR comprises the sequence CUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCGAGUCUC CCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCUCUGCUA GUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGCCACACC CCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAGCUAUA CUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACCCUGGAGCUAGC (SEQ ID NO: 79). In some embodiments, an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 75 or SEQ ID NO: 76, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of any one of SEQ ID NOs: 77–79. In some embodiments, an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 76, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 78. In some embodiments, an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 76, an open reading frame, the nucleotide sequence UGAUGA, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 78. In some embodiments, the mRNA further comprises two poly(A) sequences separated by an intervening nucleotide sequence. In some embodiments, the mRNA further comprises the nucleotide sequence of SEQ ID NO: 80. In some embodiments, a 5′ UTR comprises the sequence GAGGAGACCCAAGCUACAUUUGCUUCUGACACAACUGUGUUCACUAGCAACCUCAAACAG ACACCGCCACC (SEQ ID NO: 81). In some embodiments, a 3′ UTR comprises the sequence GCUCGCUUUCUUGCUGUCCAAUUUCUAUUAAAGGUUCCUUUGUUCCCUAAGUCCAACUAC UAAACUGGGGGAUAUUAUGAAGGGCCUUGAGCAUCUGGAUUCUGCCUAAUAAAAAACAU UUAUUUUCAUUGC (SEQ ID NO: 82). 
In some embodiments, an mRNA comprises a 5′ UTR comprising the nucleotide sequence of SEQ ID NO: 81, an open reading frame, one or more stop codons, and a 3′ UTR comprising the nucleotide sequence of SEQ ID NO: 82. In some embodiments, the mRNA further comprises a polyA tail comprising 109 consecutive adenosine nucleotides. UTRs may also be omitted from the mRNA. A 5′ UTR does not encode a protein (is non-coding). Natural 5′ UTRs have features that play roles in translation initiation. They harbor signatures like Kozak sequences which are commonly known to be involved in the process by which the ribosome initiates translation of many genes. Kozak sequences have the consensus CCR(A/G)CCAUGG (SEQ ID NO: 53), where R is a purine (adenine or guanine) three bases upstream of the start codon (AUG), which is followed by another 'G'. 5′ UTRs are also known to form secondary structures which are involved in elongation factor binding. In some embodiments, a 5′ UTR is a heterologous UTR, i.e., is a UTR found in nature associated with a different ORF. In some embodiments, a 5′ UTR is a synthetic UTR, i.e., does not occur in nature. Synthetic UTRs include UTRs that have been mutated to improve their properties, e.g., which increase gene expression, as well as those which are completely synthetic. Exemplary 5′ UTRs include those of Xenopus or human derived α-globin or β-globin (US8278063; US9012219), human cytochrome b-245 α polypeptide, hydroxysteroid (17β) dehydrogenase, and Tobacco etch virus (US8278063, US9012219). A 5′ UTR of the CMV immediate-early 1 (IE1) gene (US2014/0206753, WO2013/185069) or the sequence GGGAUCCUACC (SEQ ID NO: 54) (WO 2014/144196) may also be used.
In some embodiments, a 5′ UTR is a 5′ UTR of a TOP gene lacking the 5′ TOP motif (the oligopyrimidine tract) (e.g., WO2015/101414, WO2015/101415, WO2015/062738, WO2015/024667, WO2015/024667); a 5′ UTR element derived from ribosomal protein Large 32 (L32) gene (WO2015/101414, WO2015/101415, WO2015/062738), a 5′ UTR element derived from the 5′ UTR of a hydroxysteroid (17-β) dehydrogenase 4 gene (HSD17B4) (WO2015/024667), or a 5′ UTR element derived from the 5′ UTR of ATP5A1 (WO2015/024667) can be used. In some embodiments, an internal ribosome entry site (IRES) is used instead of a 5′ UTR. A 3′ UTR does not encode a protein (is non-coding). Natural or wild type 3′ UTRs are known to have stretches of adenosines and uridines embedded in them. These AU rich signatures are particularly prevalent in genes with high rates of turnover. Based on their sequence features and functional properties, the AU rich elements (AREs) can be separated into three classes (Chen et al, 1995): Class I AREs contain several dispersed copies of an AUUUA motif within U-rich regions. C-Myc and MyoD contain class I AREs. Class II AREs possess two or more overlapping UUAUUUA(U/A)(U/A) nonamers. Molecules containing this type of AREs include GM-CSF and TNF-α. Class III AREs are less well defined. These U rich regions do not contain an AUUUA motif. c-Jun and Myogenin are two well-studied examples of this class. Most proteins binding to the AREs are known to destabilize the messenger, whereas members of the ELAV family, most notably HuR, have been documented to increase the stability of mRNA. HuR binds to AREs of all the three classes. Engineering the HuR specific binding sites into the 3′ UTR of nucleic acid molecules will lead to HuR binding and thus, stabilization of the message in vivo. Introduction, removal or modification of 3′ UTR AU rich elements (AREs) can be used to modulate the stability of mRNA.
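As an illustrative aid only, candidate ARE motifs in a 3′ UTR can be located by pattern matching before deciding which to introduce, remove, or mutate. The sketch below reports the positions of AUUUA pentamers and of UUAUUUA(U/A)(U/A) nonamers, including overlapping occurrences. The function name `find_are_motifs` is hypothetical, and motif counts alone are an assumption-laden proxy: assigning an element to class I, II, or III also depends on sequence context (e.g., U-richness), which this example does not assess.

```python
import re

# Zero-width lookaheads allow overlapping motif occurrences to be reported.
PENTAMER = re.compile(r"(?=AUUUA)")
NONAMER = re.compile(r"(?=UUAUUUA[UA][UA])")


def find_are_motifs(utr):
    """Return 0-based start positions of AUUUA pentamers and UUAUUUA(U/A)(U/A) nonamers."""
    # Accept DNA-style input by converting T to U.
    seq = utr.upper().replace("T", "U")
    return {
        "pentamers": [m.start() for m in PENTAMER.finditer(seq)],
        "nonamers": [m.start() for m in NONAMER.finditer(seq)],
    }
```

A sequence returning two or more nonamer positions that are fewer than nine nucleotides apart contains overlapping nonamers, which is the feature described above for class II AREs.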
When engineering specific nucleic acids, one or more copies of an ARE can be introduced to make nucleic acids less stable and thereby curtail translation and decrease production of the resultant protein. Likewise, AREs can be identified and removed or mutated to increase the intracellular stability and thus increase translation and production of the resultant protein. Transfection experiments can be conducted in relevant cell lines using the nucleic acids, and protein production can be assayed at various time points post-transfection. For example, cells can be transfected with different ARE-engineering molecules, and the protein produced can be assayed using an ELISA kit for the relevant protein at 6 hours, 12 hours, 1 day, 2 days, and 7 days post-transfection. Those of ordinary skill in the art will understand that 5′ UTRs that are heterologous or synthetic may be used with any desired 3′ UTR sequence. For example, a heterologous or synthetic 5′ UTR may be used with a synthetic 3′ UTR or with a heterologous 3′ UTR. Non-UTR sequences may also be used as regions or subregions within a nucleic acid. For example, introns or portions of intron sequences may be incorporated into regions of nucleic acid. Incorporation of intronic sequences may increase protein production as well as nucleic acid levels. Combinations of features may be included in flanking regions and may be contained within other features. For example, the ORF may be flanked by a 5′ UTR which may contain a strong Kozak translational initiation signal and/or a 3′ UTR which may include an oligo(dT) sequence for templated addition of a poly-A tail. A 5′ UTR may comprise a first polynucleotide fragment and a second polynucleotide fragment from the same and/or different genes such as the 5′ UTRs described in US2010/0293625 and WO2015/085318A2, each of which is herein incorporated by reference. It should be understood that any UTR from any gene may be incorporated into the regions of a nucleic acid.
Furthermore, multiple wild-type UTRs of any known gene may be utilized. Artificial UTRs which are not variants of wild type regions may be used. These UTRs or portions thereof may be placed in the same orientation as in the transcript from which they were selected or may be altered in orientation or location. Hence a 5′ or 3′ UTR may be inverted, shortened, lengthened, or made with one or more other 5′ UTRs or 3′ UTRs. As used herein, the term “altered” as it relates to a UTR sequence, means that the UTR has been changed in some way in relation to a reference sequence. For example, a 3′ UTR or 5′ UTR may be altered relative to a wild-type/native UTR by the change in orientation or location as taught above or may be altered by the inclusion of additional nucleotides, deletion of nucleotides, swapping or transposition of nucleotides. Any of these changes producing an “altered” UTR (whether 3′ or 5′) comprise a variant UTR. In some embodiments, a double, triple or quadruple UTR such as a 5′ UTR or 3′ UTR may be used. As used herein, a “double” UTR is one in which two copies of the same UTR are encoded either in series or substantially in series. For example, a double beta-globin 3′ UTR may be used as described in US2010/0129877, which is incorporated herein by reference. Patterned UTRs may be used in RNAs. As used herein “patterned UTRs” are those UTRs which reflect a repeating or alternating pattern, such as ABABAB or AABBAABBAABB or ABCABCABC or variants thereof repeated once, twice, or more than 3 times. In these patterns, each letter, A, B, or C, represents a different UTR at the nucleotide level. In some embodiments, flanking regions are selected from a family of transcripts whose proteins share a common function, structure, feature, or property. For example, polypeptides of interest may belong to a family of proteins which are expressed in a particular cell, tissue or at some time during development.
The UTRs from any of these genes may be swapped for any other UTR of the same or different family of proteins to create a new polynucleotide. As used herein, a “family of proteins” is used in the broadest sense to refer to a group of two or more polypeptides of interest which share at least one function, structure, feature, localization, origin, or expression pattern. The untranslated region may also include translation enhancer elements (TEE). As a non-limiting example, the TEE may include those described in US 2009/0226470, herein incorporated by reference, and those known in the art.
Open Reading Frames
An open reading frame (ORF) is a continuous stretch of DNA or RNA that (1) begins with a start codon (e.g., ATG or AUG, encoding methionine), and (2) ends with a stop codon (e.g., TAA, TAG or TGA, or UAA, UAG or UGA) or is immediately followed by a stop codon. A stop codon does not encode an amino acid, such that translation of an ORF terminates when a ribosome reaches the stop codon immediately following the last amino acid-encoding codon in the ORF. A stop codon that results in translation termination may be considered part of the ORF, in which case the ORF ends with the stop codon. Alternatively, the first stop codon immediately following the last amino acid-encoding codon of an ORF may be considered part of the 3′ untranslated region (3′ UTR) of a DNA or RNA, rather than part of the ORF. Those skilled in the art will understand that an ORF sequence that ends in a codon encoding an amino acid will be followed by one or more stop codons in a DNA or RNA. An ORF may be followed by multiple stop codons.
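The ORF definition above can be illustrated with a short sketch that separates an mRNA sequence into the region upstream of the start codon, the amino acid-encoding codons, the run of consecutive in-frame stop codons, and the remaining downstream sequence. Because the stop codon may be considered either part of the ORF or part of the 3′ UTR, the stop codons are returned separately so that either convention can be applied. The function name `split_orf` is hypothetical, and treating the first AUG in the sequence as the start codon is a simplifying assumption that will not hold for 5′ UTRs containing an upstream AUG.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_orf(mrna):
    """Return (upstream, coding, stops, downstream) for the first AUG-initiated ORF."""
    seq = mrna.upper().replace("T", "U")
    start = seq.find("AUG")  # assumption: the first AUG is the start codon
    if start < 0:
        raise ValueError("no start codon found")
    # Walk codon by codon until the first in-frame stop codon.
    i = start
    while i + 3 <= len(seq) and seq[i:i + 3] not in STOP_CODONS:
        i += 3
    if i + 3 > len(seq):
        raise ValueError("no in-frame stop codon found")
    coding = seq[start:i]
    # Collect every consecutive in-frame stop codon that follows.
    stops = []
    j = i
    while seq[j:j + 3] in STOP_CODONS:
        stops.append(seq[j:j + 3])
        j += 3
    return seq[:start], coding, stops, seq[j:]
```

For the made-up input `GCCACCAUGGCCAAAUGAUAAUAGGCUCGC`, the function returns `GCCACC` as the upstream region, `AUGGCCAAA` as the coding codons, the three stop codons `UGA`, `UAA`, and `UAG`, and `GCUCGC` as the downstream remainder.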
Inclusion of multiple consecutive stop codons reduces the extent of continued translation that may occur if a stop codon is mutated to a codon encoding an amino acid (readthrough), as a second stop codon may terminate translation even if a first stop codon is mutated and encodes an amino acid, such that only one amino acid is added to the C-terminus of the translated protein. Where multiple stop codons are present at the end of, or immediately following, an ORF, the multiple stop codons may comprise the same stop codon (e.g., UGAUGA). Multiple stop codons may comprise different stop codons in series (e.g., UGAUAAUAG). In addition to reducing the extent of readthrough if a first stop codon is mutated, the presence of multiple different stop codons reduces the extent of readthrough if the first stop codon fails to allow translation termination (e.g., if a suppressor tRNA with an anticodon complementary to the first stop codon is present in the cell). An ORF typically encodes a protein. It will be understood that the sequences disclosed herein may further comprise additional elements, e.g., 5′ and/or 3′ UTRs, but that those elements, unlike the ORF, need not necessarily be present in an RNA (e.g., mRNA). Some aspects relate to an RNA (e.g., mRNA or self-amplifying RNA) encoding a SARS-CoV-2 chimeric protein and comprising an open reading frame with a sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of any one of SEQ ID NOs: 125, 126, 128, 129, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, or 147. Exemplary sequences of open reading frames encoding a SARS-CoV-2 chimeric protein are provided in Table 5.
In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 125. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 126. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 128. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 129. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 131. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 132. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 134. 
In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 135. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 137. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 138. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 140. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 141. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 143. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 144. 
In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 146. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 147. In some embodiments, an RNA (e.g., mRNA) comprises an open reading frame having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity to the nucleotide sequence of SEQ ID NO: 133.
5′ Caps
In some embodiments, an RNA (e.g., mRNA) comprises a 5′ terminal cap. 5′-capping of polynucleotides may be completed concomitantly during an in vitro transcription reaction using, for example, the following chemical RNA cap analogs to generate the 5′-guanosine cap structure according to manufacturer protocols: 3´-O-Me-m7G(5')ppp(5')G [the ARCA cap]; G(5')ppp(5')A; G(5')ppp(5')G; m7G(5')ppp(5')A; m7G(5')ppp(5')G (New England BioLabs, Ipswich, MA). 5′-capping of modified RNA (e.g., mRNA) may be completed post-transcriptionally using, for example, a Vaccinia Virus Capping Enzyme to generate the “Cap 0” structure: m7G(5')ppp(5')G (New England BioLabs, Ipswich, MA). Cap 1 structure may be generated using both Vaccinia Virus Capping Enzyme and a 2′-O methyl-transferase to generate: m7G(5')ppp(5')G-2′-O-methyl. Cap 2 structure may be generated from the Cap 1 structure followed by the 2′-O-methylation of the 5′-antepenultimate nucleotide using a 2′-O methyl-transferase. Cap 3 structure may be generated from the Cap 2 structure followed by the 2′-O-methylation of the 5′-preantepenultimate nucleotide using a 2′-O methyl-transferase.
Enzymes may be derived from a recombinant source. Other cap analogs may be used. A cap analog may be, for example, a dinucleotide cap, a trinucleotide cap, or a tetranucleotide cap. In some embodiments, a cap analog is a dinucleotide cap. In some embodiments, a cap analog is a trinucleotide cap. In some embodiments, a cap analog is a tetranucleotide cap. A nucleotide cap (e.g., a trinucleotide cap or tetranucleotide cap), in some embodiments, comprises a compound of formula (I), or a stereoisomer, tautomer or salt thereof:
Figure imgf000066_0001
; ring B1 is a modified or unmodified Guanine; ring B2 and ring B3 each independently is a nucleobase or a modified nucleobase; X2 is O, S(O)p, NR24 or CR25R26 in which p is 0, 1, or 2; Y0 is O or CR6R7; Y1 is O, S(O)n, CR6R7, or NR8, in which n is 0, 1, or 2; each --- is a single bond or absent, wherein when each --- is a single bond, Yi is O, S(O)n, CR6R7, or NR8; and when each --- is absent, Y1 is void; Y2 is (OP(O)R4)m in which m is 0, 1, or 2, or -O-(CR40R41)u-Q0-(CR42R43)v-, in which Q0 is a bond, O, S(O)r, NR44, or CR45R46, r is 0, 1 , or 2, and each of u and v independently is 1, 2, 3 or 4; each R2 and R2' independently is halo, LNA, or OR3; each R3 independently is H, C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl and R3, when being C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl, is optionally substituted with one or more of halo, OH and C1-C6 alkoxyl that is optionally substituted with one or more OH or OC(O)-C1-C6 alkyl; each R4 and R4' independently is H, halo, C1-C6 alkyl, OH, SH, SeH, or BH3-; each of R6, R7, and R8, independently, is -Q1-T1, in which Q1 is a bond or C1-C3 alkyl linker optionally substituted with one or more of halo, cyano, OH and C1-C6 alkoxy, and T1 is H, halo, OH, COOH, cyano, or Rs1, in which Rs1 is C1-C3 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1- C6 alkoxyl, C(O)O-C1-C6 alkyl, C3-C8 cycloalkyl, C6-C10 aryl, NR31R32, (NR31R32R33)+, 4 to 12- membered heterocycloalkyl, or 5- or 6-membered heteroaryl, and Rs1 is optionally substituted with one or more substituents selected from the group consisting of halo, OH, oxo, C1-C6 alkyl, COOH, C(O)O-C1-C6 alkyl, cyano, C1-C6 alkoxyl, NR31R32, (NR31R32R33)+, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, and 5- or 6-membered heteroaryl; each of R10, R11, R12, R13 R14, and R15, independently, is -Q2-T2, in which Q2 is a bond or C1-C3 alkyl linker optionally substituted with one or more of halo, cyano, OH and C1-C6 alkoxy, and T2 is H, halo, OH, NH2, cyano, NO2, N3, Rs2, or ORs2, in 
which Rs2 is C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C3-C8 cycloalkyl, C6-C10 aryl, NHC(O)-C1-C6 alkyl, NR31R32, (NR31R32R33)+, 4 to 12-membered heterocycloalkyl, or 5- or 6-membered heteroaryl, and Rs2 is optionally substituted with one or more substituents selected from the group consisting of halo, OH, oxo, C1-C6 alkyl, COOH, C(O)O-C1-C6 alkyl, cyano, C1 - C6 alkoxyl, NR31R32, (NR31R32R33)+, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, and 5- or 6- membered heteroaryl; or alternatively R12 together with R14 is oxo, or R13 together with R15 is oxo, each of R20, R21, R22, and R23 independently is -Q3-T3, in which Q3 is a bond or C1-C3 alkyl linker optionally substituted with one or more of halo, cyano, OH and C1-C6 alkoxy, and T3 is H, halo, OH, NH2, cyano, NO2, N3, RS3, or ORS3, in which RS3 is C1-C6 alkyl, C2- C6 alkenyl, C2-C6 alkynyl, C3-C8 cycloalkyl, C6-C10 aryl, NHC(O)-C1-C6 alkyl, mono-C1- C6 alkylamino, di-C1-C6 alkylamino, 4 to 12-membered heterocycloalkyl, or 5- or 6-membered heteroaryl, and Rs3 is optionally substituted with one or more substituents selected from the group consisting of halo, OH, oxo, C1-C6 alkyl, COOH, C(O)O-C1-C6 alkyl, cyano, C1-C6 alkoxyl, amino, mono-C1-C6 alkylamino, di-C1-C6 alkylamino, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, and 5- or 6-membered heteroaryl; each of R24, R25, and R26 independently is H or C1-C6 alkyl; each of R27 and R28 independently is H or OR29; or R27 and R28 together form O-R30-O; each R29 independently is H, C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl and R29, when being C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl, is optionally substituted with one or more of halo, OH and C1-C6 alkoxyl that is optionally substituted with one or more OH or OC(O)-C1-C6 alkyl; R30 is C1-C6 alkylene optionally substituted with one or more of halo, OH and C1-C6 alkoxyl; each of R31, R32, and R33, independently is H, C1-C6 alkyl, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered 
heterocycloalkyl, or 5- or 6-membered heteroaryl; each of R40, R41, R42, and R43 independently is H, halo, OH, cyano, N3, OP(O)R47R48, or C1-C6 alkyl optionally substituted with one or more OP(O)R47R48, or one R41 and one R43, together with the carbon atoms to which they are attached and Q0, form C4-C10 cycloalkyl, 4- to 14-membered heterocycloalkyl, C6-C10 aryl, or 5- to 14-membered heteroaryl, and each of the cycloalkyl, heterocycloalkyl, phenyl, or 5- to 6-membered heteroaryl is optionally substituted with one or more of OH, halo, cyano, N3, oxo, OP(O)R47R48, C1-C6 alkyl, C1-C6 haloalkyl, COOH, C(O)O-C1-C6 alkyl, C1-C6 alkoxyl, C1-C6 haloalkoxyl, amino, mono-C1-C6 alkylamino, and di-C1-C6 alkylamino; R44 is H, C1-C6 alkyl, or an amine protecting group; each of R45 and R46 independently is H, OP(O)R47R48, or C1-C6 alkyl optionally substituted with one or more OP(O)R47R48, and each of R47 and R48, independently is H, halo, C1-C6 alkyl, OH, SH, SeH, or BH3. It should be understood that a cap analog may include any of the cap analogs described in international publication WO 2017/066797, published on 20 April 2017, incorporated by reference herein in its entirety. In some embodiments, the B2 middle position can be a non-ribose molecule, such as arabinose. In some embodiments R2 is ethyl-based. Thus, in some embodiments, a tetranucleotide cap comprises the following structure:
Figure imgf000068_0001
In some embodiments, a tetranucleotide cap comprises the following structure:
(IX). In some embodiments, R is an alkyl (e.g., C1-C6 alkyl). In some embodiments, R is a methyl group (e.g., C1 alkyl). In some embodiments, R is an ethyl group (e.g., C2 alkyl). In some embodiments, R is a hydrogen. In some embodiments, a tetranucleotide cap comprises GGAG. In some embodiments, a tetranucleotide cap comprises any one of the following structures:
Figure imgf000070_0001
Polyadenylation

A “poly(A) tail” is a region of mRNA that is downstream, e.g., directly downstream (i.e., 3′), from the 3′ UTR that contains multiple, consecutive adenosine monophosphates. A poly(A) tail may contain 10 to 300 adenosine monophosphates. It can, in some instances, comprise up to about 400 adenine nucleotides. For example, a poly(A) tail may contain 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or 300 adenosine monophosphates. In some embodiments, a poly(A) tail contains 50 to 250 adenosine monophosphates. In a relevant biological setting (e.g., in cells, in vivo) the poly(A) tail functions to protect mRNA from enzymatic degradation, e.g., in the cytoplasm, and aids in transcription termination, and/or export of the mRNA from the nucleus and translation. In some embodiments, the length of the 3′-poly(A) tail may be an essential element with respect to the stability of the individual mRNA. In some embodiments, a poly(A) tail has a length of about 50, about 100, about 150, about 200, about 250, about 300, about 350, or about 400 nucleotides. In some embodiments, a poly(A) tail has a length of 100 nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence that has a length of 50–75 nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence that comprises 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 consecutive adenosine nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence comprising 64 consecutive adenosine nucleotides. In some embodiments, the consecutive adenosine nucleotides of a poly(A) sequence are flanked at the 5′ and 3′ end by nucleotides that are not adenosine nucleotides. In some embodiments, an mRNA comprises a poly(C) sequence, which may comprise 10 to 300 cytidine nucleotides.
In some embodiments, the poly(C) sequence comprises 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 consecutive cytidine nucleotides. In some embodiments, the poly(C) sequence comprises 30 cytidine nucleotides. In some embodiments, the consecutive cytidine nucleotides of a poly(C) sequence are flanked at the 5′ and 3′ end by nucleotides that are not cytidine nucleotides. In some embodiments, an mRNA comprises two poly(A) sequences separated by an intervening nucleotide sequence. In some embodiments, the intervening nucleotide sequence comprises no more than 3, no more than two, no more than 1, or no adenosine nucleotides. In some embodiments, the intervening sequence comprises 3 adenosine nucleotides. In some embodiments, the intervening sequence does not comprise an adenosine nucleotide. In some embodiments, the intervening sequence is no more than 30, no more than 25, no more than 20, no more than 15, or no more than 10 nucleotides long. In some embodiments, the intervening sequence consists of 10 nucleotides. In some embodiments, the intervening sequence comprises the sequence of GCAUAUGACU (SEQ ID NO: 55). In some embodiments, the intervening sequence does not begin with an adenosine nucleotide, and does not end with an adenosine nucleotide. In some embodiments, the first poly(A) sequences comprises at least 15, at least 20, at least 25, or at least 30 consecutive adenosine nucleotides. In some embodiments, the second poly(A) sequences comprises at least 55, at least 60, at least 65, or at least 70 consecutive adenosine nucleotides. In some embodiments, the first poly(A) sequence comprises 30 consecutive adenosine nucleotides. In some embodiments, the second poly(A) sequence comprises 70 adenosine nucleotides. In some embodiments, an mRNA comprises the nucleotide sequence AAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCAUAUGACUAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (SEQ ID NO: 80). 
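The two-segment poly(A) arrangement described above (a first adenosine run, an intervening sequence that neither begins nor ends with adenosine, and a second adenosine run) can be illustrated with a minimal Python sketch. The function names `build_segmented_poly_a` and `adenosine_runs` are hypothetical helpers introduced only for illustration. The default lengths (30 and 70) and the linker GCAUAUGACU are taken from the embodiments recited above. The output is an illustrative construct and is not asserted to be identical to any SEQ ID NO.

```python
import re


def build_segmented_poly_a(first_len=30, linker="GCAUAUGACU", second_len=70):
    """Join two adenosine runs with an intervening sequence (hypothetical helper)."""
    if set(linker) - set("ACGU"):
        raise ValueError("linker must contain only A, C, G, U")
    # The text describes an intervening sequence that does not begin or end with A.
    if not linker or linker[0] == "A" or linker[-1] == "A":
        raise ValueError("linker must not begin or end with adenosine")
    return "A" * first_len + linker + "A" * second_len


def adenosine_runs(seq):
    """Return the length of every run of consecutive adenosines, 5' to 3'."""
    return [len(run) for run in re.findall(r"A+", seq)]


tail = build_segmented_poly_a()
print(len(tail))             # 110
print(adenosine_runs(tail))  # [30, 1, 1, 1, 70]
```

The three single-adenosine runs in the output come from the linker itself, which contains 3 adenosine nucleotides, consistent with the embodiment in which the intervening sequence comprises 3 adenosine nucleotides.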
In some embodiments, an mRNA comprises a poly(A) sequence that has a length of 90–120 nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence that comprises 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, or 120 consecutive adenosine nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence that comprises at least 109 consecutive adenosine nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence that comprises 109 consecutive adenosine nucleotides. In some embodiments, an mRNA comprises a poly(A) sequence that consists of 109 consecutive adenosine nucleotides.

Self-amplifying RNA

Some aspects relate to self-amplifying RNA (e.g., an RNA replicon) encoding a SARS-CoV-2 chimeric protein. A self-amplifying RNA refers to an RNA encoding one or more molecules (e.g., proteins) that, individually or in conjunction, are capable of replicating the self-amplifying RNA. In some embodiments, the proteins encoded by the self-amplifying RNA are non-structural proteins nsP1, nsP2, nsP3, and nsP4, which form an RNA-dependent RNA polymerase (RdRp), or replicase, that is capable of replicating the self-amplifying RNA. By encoding proteins that are capable of replicating the RNA, a self-amplifying RNA is capable of self-amplification in a cell, provided that the cell can translate the RNA and produce the encoded protein(s). A self-amplifying RNA may be referred to as an RNA replicon. When an RNA replicon or self-amplifying RNA is translated, the one or more encoded viral non-structural proteins are translated. A “viral non-structural protein” is a protein encoded by a virus but that is not part of the virus particle. The viral non-structural proteins, in the context of self-amplifying RNA, replicate the nucleotide sequences encoding the vaccine antigen or therapeutic protein (e.g., SARS-CoV-2 chimeric protein) from the self-amplifying RNA via the sub-genomic viral promoters.
Such replication driven by the viral sub-genomic promoter using the viral non-structural proteins enhances the expression level of the encoded protein. In some embodiments, the viral non-structural proteins are from a single-stranded positive-sense RNA virus. In some embodiments, the viral non-structural proteins are from an Alphavirus, belonging to the Togaviridae family. In some embodiments, the alphavirus is Sindbis or Venezuelan equine encephalitis virus. In some embodiments, the viral non-structural protein is an RNA-dependent RNA polymerase (RdRp) polyprotein P1234 (also termed NSP1-4). Upon translation, P1234 is rapidly cleaved into P123 and nsP4 by autoproteolytic activity originating from the nsP2 (proteinase) portion of the polyprotein. Alphaviral RNA synthesis occurs at the plasma membrane of a cell, where the nsPs, together with alphaviral RNA, form membrane invaginations (or “spherules”). These spherules contain dsRNA created by replication of “+” strand viral genomic RNA into “–” strand anti-genomic RNA. The “–” strand serves as a template from which additional “+” strand genomic RNA (synthesized from the 5′ UTR) or a shorter subsequence of the genomic RNA (termed subgenomic RNA) is synthesized, the latter from the subgenomic viral promoter region located near the end of the nonstructural protein ORF. The “+” strand genomic RNA and the subgenomic RNA are exported out of the spherules into the cytoplasm where they are translated by endogenous ribosomes. The exported “+” strand genomic RNA can associate with nsPs and form additional spherules, thus resulting in an exponential increase of replicon RNA. The viral non-structural proteins facilitate the replication of the nucleotide sequences encoding the SARS-CoV-2 protein via the subgenomic viral promoters (also referred to as “subgenomic promoters” herein). A “subgenomic viral promoter” refers to a promoter that drives the transcription of subgenomic mRNAs.
Typically, an mRNA is transcribed from genomic DNAs and episomal DNAs (e.g., plasmids). Some viruses may transcribe subgenomic mRNAs from an RNA replicon that is produced from its genomic RNA. Many positive-sense RNA viruses produce subgenomic mRNAs as one of the common infection techniques used by these viruses and generally transcribe late viral genes. Subgenomic viral promoters range from 20 nucleotides (Sindbis virus) to over 100 nucleotides (Beet necrotic yellow vein virus) and are usually found upstream of the transcription start. In some embodiments, the subgenomic viral promoter is 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 nucleotides long, or longer. Subgenomic viral promoters have been described in the art, e.g., in PCT Application Publication No. WO 2016/040359, and Wagner et al., Nature Chemical Biology, DOI: 10.1038/s41589-018-0146-9 (2018).

Additional Stabilizing Elements

RNA (e.g., mRNA), in some embodiments, includes an additional stabilizing element. Stabilizing elements may include, for example, a histone stem-loop. A stem-loop binding protein (SLBP), a 32 kDa protein, has been identified. It is associated with the histone stem-loop at the 3′-end of the histone messages in both the nucleus and the cytoplasm. Its expression level is regulated by the cell cycle; it peaks during the S-phase, when histone mRNA levels are also elevated. The protein has been shown to be essential for efficient 3′-end processing of histone pre-mRNA by the U7 snRNP. SLBP continues to be associated with the stem-loop after processing, and then stimulates the translation of mature histone mRNAs into histone proteins in the cytoplasm.
The RNA binding domain of SLBP is conserved through metazoa and protozoa; its binding to the histone stem-loop depends on the structure of the loop. The minimum binding site includes at least three nucleotides 5′ and two nucleotides 3′ relative to the stem-loop. In some embodiments, an RNA (e.g., mRNA) includes an open reading frame (coding region), a histone stem-loop, and optionally, a poly(A) sequence or polyadenylation signal. The poly(A) sequence or polyadenylation signal generally should enhance the expression level of the encoded protein. The encoded protein, in some embodiments, is not a histone protein, a reporter protein (e.g., Luciferase, GFP, EGFP, β-Galactosidase), or a marker or selection protein (e.g., alpha-Globin, Galactokinase and Xanthine:guanine phosphoribosyl transferase (GPT)). In some embodiments, an RNA (e.g., mRNA) includes the combination of a poly(A) sequence or polyadenylation signal and at least one histone stem-loop. Even though both represent alternative mechanisms in nature, they act synergistically to increase the protein expression beyond the level observed with either of the individual elements. The synergistic effect of the combination of poly(A) and a histone stem-loop does not depend on the order of the elements or the length of the poly(A) sequence. In some embodiments, an RNA (e.g., mRNA) does not include a histone downstream element (HDE). “Histone downstream element” (HDE) includes a purine-rich polynucleotide stretch of approximately 15 to 20 nucleotides 3′ of naturally-occurring stem-loops, representing the binding site for the U7 snRNA, which is involved in processing of histone pre-mRNA into mature histone mRNA. In some embodiments, the nucleic acid does not include an intron. An RNA (e.g., mRNA) may or may not contain an enhancer and/or promoter sequence, which may be modified or unmodified or which may be activated or inactivated.
In some embodiments, the histone stem-loop is generally derived from histone genes and includes an intramolecular base pairing of two neighboring partially or entirely reverse complementary sequences separated by a spacer, consisting of a short sequence, which forms the loop of the structure. The unpaired loop region is typically unable to base pair with either of the stem-loop elements. A stem-loop occurs more often in RNA, as it is a key component of many RNA secondary structures, but may be present in single-stranded DNA as well. Stability of the stem-loop structure generally depends on the length, number of mismatches or bulges, and base composition of the paired region. In some embodiments, wobble base pairing (non-Watson-Crick base pairing) may result. In some embodiments, the at least one histone stem-loop sequence comprises a length of 15 to 45 nucleotides. In some embodiments, an RNA (e.g., mRNA) has one or more AU-rich sequences removed. These sequences, sometimes referred to as AURES, are destabilizing sequences found in the 3′ UTR. The AURES may be removed from the mRNA. Alternatively, the AURES may remain in the mRNA.

Sequence Modification

In some embodiments, an open reading frame encoding a protein is codon optimized. Codon optimization methods are known in the art. An open reading frame of any one or more of the sequences may be codon optimized.
Codon optimization, in some embodiments, may be used to match codon frequencies in target and host organisms to ensure proper folding; bias GC content to increase RNA (e.g., mRNA) stability or reduce secondary structures; minimize tandem repeat codons or base runs that may impair gene construction or expression; customize transcriptional and translational control regions; insert or remove protein trafficking sequences; remove/add post translation modification sites in encoded protein (e.g., glycosylation sites); add, remove or shuffle protein domains; insert or delete restriction sites; modify ribosome binding sites and RNA (e.g., mRNA) degradation sites; adjust translational rates to allow the various domains of the protein to fold properly; or reduce or eliminate problem secondary structures within the polynucleotide. Codon optimization tools, algorithms and services are known in the art; non-limiting examples include services from GeneArt (Life Technologies), DNA2.0 (Menlo Park CA) and/or proprietary methods. In some embodiments, the open reading frame sequence is optimized using optimization algorithms. In some embodiments, a codon optimized sequence shares less than 95% sequence identity to a naturally-occurring or wild-type sequence open reading frame (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein antigen). In some embodiments, a codon optimized sequence shares less than 90% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon optimized sequence shares less than 85% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein).
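As a simple illustration of how the sequence identity between a codon optimized open reading frame and a wild-type open reading frame might be expressed, the following Python sketch computes an ungapped, position-by-position percent identity. This is a simplifying assumption: identity in practice is generally determined from a sequence alignment, and the two toy sequences below are hypothetical examples (each encoding Met-Ala-Ser-Lys), not sequences from this disclosure. The function name `percent_identity` is likewise hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy ORFs: both encode Met-Ala-Ser-Lys with different codons.
wild_type = "AUGGCUUCUAAA"  # AUG GCU UCU AAA
optimized = "AUGGCCAGCAAG"  # AUG GCC AGC AAG (synonymous codons)

identity = percent_identity(wild_type, optimized)
print(f"{identity:.1f}% identity")  # 58.3% identity
```

Because synonymous codons differ mostly at the third position, nucleotide identity can fall well below 100% while the encoded amino acid sequence is unchanged.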
In some embodiments, a codon optimized sequence shares less than 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon optimized sequence shares less than 75% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon optimized sequence shares between 65% and 85% (e.g., between about 67% and about 85% or between about 67% and about 80%) sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon optimized sequence shares between 65% and 75% or about 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type RNA (e.g., mRNA) sequence encoding a SARS-CoV-2 protein). In some embodiments, a codon-optimized sequence encodes an antigen that is as immunogenic as, or more immunogenic than (e.g., at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 100%, or at least 200% more), a SARS-CoV-2 protein encoded by a non-codon-optimized sequence. When transfected into mammalian host cells, the modified mRNAs have a stability of between 12 and 18 hours, or greater than 18 hours, e.g., 24, 36, 48, 60, 72, or greater than 72 hours, and are capable of being expressed by the mammalian host cells. In some embodiments, a codon optimized RNA (e.g., mRNA) may be one in which the levels of G/C are enhanced. The G/C-content of nucleic acid molecules (e.g., mRNA) may influence the stability of the RNA.
RNA (e.g., mRNA) having an increased amount of guanine (G) and/or cytosine (C) residues may be functionally more stable than RNA containing a large amount of adenine (A) and thymine (T) or uracil (U) nucleotides. As an example, WO02/098443 discloses a pharmaceutical composition containing an RNA (e.g., mRNA) stabilized by sequence modifications in the translated region. Due to the degeneracy of the genetic code, the modifications work by substituting existing codons for those that promote greater RNA stability without changing the resulting amino acid. The approach is limited to coding regions of the RNA (e.g., mRNA). Some embodiments of mRNAs comprise a sequence with a %G/C content of 30%–80%, 40%–70%, 50%–60%, 35%–50%, 50%–65%, 65%–70%, 40%–45%, 45%–50%, 50%–55%, 55%–70%, 70%–75%, or 75%–80%. In some embodiments, the nucleic acid sequence of the full-length mRNA comprises a %G/C content of 30%–80%, 40%–70%, 50%–60%, 35%–50%, 50%–65%, 65%–70%, 40%–45%, 45%–50%, 50%–55%, 55%–70%, 70%–75%, or 75%–80%. In some embodiments, the mRNA comprises an ORF with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%. In some embodiments, the mRNA comprises a 5′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
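The %G/C content referred to above can be computed directly from a sequence. The following Python sketch is illustrative only. The function name `gc_percent` and the two toy sequences are hypothetical, and the comparison is reported as an absolute difference in percentage points, which is an assumption about how a difference in %G/C content would be expressed.

```python
def gc_percent(seq):
    """Percentage of residues in seq that are guanine (G) or cytosine (C)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


# Hypothetical toy ORFs encoding the same peptide (Met-Ala-Ser-Lys).
wild_type = "AUGGCUUCUAAA"
modified = "AUGGCCAGCAAG"

wt_gc = gc_percent(wild_type)
mod_gc = gc_percent(modified)
print(f"wild-type %G/C: {wt_gc:.1f}")                       # 33.3
print(f"modified  %G/C: {mod_gc:.1f}")                      # 58.3
print(f"difference (percentage points): {mod_gc - wt_gc:.1f}")  # 25.0
```

The same function can be applied separately to an ORF, a 5′ UTR, a 3′ UTR, or a full-length mRNA sequence.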
In some embodiments, the mRNA comprises a 3′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%. In some embodiments, a modified mRNA comprises a higher %G/C content than a wild-type mRNA sequence. In some embodiments, the %G/C content of the modified mRNA sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more greater than the %G/C content of the wild-type RNA sequence. In some embodiments, the %G/C content of the modified ORF sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more greater than the %G/C content of the wild-type ORF sequence. In some embodiments, the %G/C content of the modified 5′ UTR sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more greater than the %G/C content of the wild-type 5′ UTR sequence.

Chemically Modified Nucleotides

The compositions comprise, in some embodiments, an RNA having an open reading frame encoding a protein, wherein the nucleic acid comprises nucleotides and/or nucleosides that can be standard (unmodified) or modified as is known in the art. In some embodiments, nucleotides and nucleosides comprise modified nucleotides or nucleosides. Such modified nucleotides and nucleosides can be naturally-occurring modified nucleotides and nucleosides or non-naturally-occurring modified nucleotides and nucleosides.
Such modifications can include those at the sugar, backbone, or nucleobase portion of the nucleotide and/or nucleoside as are recognized in the art. In some embodiments, a naturally-occurring modified nucleotide or nucleoside is one as is generally known or recognized in the art. Non-limiting examples of such naturally-occurring modified nucleotides and nucleosides can be found, inter alia, in the widely recognized MODOMICS database. In some embodiments, a non-naturally-occurring modified nucleotide or nucleoside is one as is generally known or recognized in the art. Non-limiting examples of such non-naturally-occurring modified nucleotides and nucleosides can be found, inter alia, in international publication numbers WO2013052523A1; WO2014093924A1; WO2015051173A2; WO2015051169A2; WO2015089511A2; or WO2017153936A1, each of which is herein incorporated by reference. Hence, nucleic acids (e.g., DNA and RNA, such as mRNA) can comprise standard nucleotides and nucleosides, naturally-occurring nucleotides and nucleosides, non-naturally-occurring nucleotides and nucleosides, or any combination thereof. Nucleic acids (e.g., DNA and RNA, such as mRNA), in some embodiments, comprise various (more than one) different types of standard and/or modified nucleotides and nucleosides. In some embodiments, a particular region of a nucleic acid contains one, two or more (optionally different) types of standard and/or modified nucleotides and nucleosides. In some embodiments, a modified RNA (e.g., mRNA) introduced into a cell or organism exhibits reduced degradation in the cell or organism, respectively, relative to an unmodified nucleic acid comprising standard nucleotides and nucleosides. In some embodiments, a modified RNA (e.g., mRNA) introduced into a cell or organism may exhibit reduced immunogenicity in the cell or organism, respectively (e.g., a reduced innate response) relative to an unmodified nucleic acid comprising standard nucleotides and nucleosides.
Nucleic acids (e.g., RNA, such as mRNA), in some embodiments, comprise non-natural modified nucleotides that are introduced during synthesis or post-synthesis of the nucleic acids to achieve desired functions or properties. The modifications may be present on internucleotide linkages, purine or pyrimidine bases, or sugars. The modification may be introduced with chemical synthesis or with a polymerase enzyme at the terminus of a chain or anywhere else in the chain. Any of the regions of a nucleic acid may be chemically modified. Modified nucleosides and nucleotides may be present in a nucleic acid (e.g., RNA nucleic acids, such as mRNA nucleic acids). A “nucleoside” refers to a compound containing a sugar molecule (e.g., a pentose or ribose) or a derivative thereof in combination with an organic base (e.g., a purine or pyrimidine) or a derivative thereof (also referred to herein as “nucleobase”). A “nucleotide” refers to a nucleoside including a phosphate group. Modified nucleotides may be synthesized by any useful method, such as, for example, chemically, enzymatically, or recombinantly, to include one or more modified or non-natural nucleosides. Nucleic acids can comprise a region or regions of linked nucleosides. Such regions may have variable backbone linkages. The linkages can be standard phosphodiester linkages, in which case the nucleic acids would comprise regions of nucleotides. Modified nucleotide base pairing encompasses not only the standard adenosine-thymine, adenosine-uracil, or guanosine-cytosine base pairs, but also base pairs formed between nucleotides and/or modified nucleotides comprising non-standard or modified bases, wherein the arrangement of hydrogen bond donors and hydrogen bond acceptors permits hydrogen bonding between a non-standard base and a standard base or between two complementary non-standard base structures, such as, for example, in those nucleic acids having at least one chemical modification.
One example of such non-standard base pairing is the base pairing between the modified nucleotide inosine and adenine, cytosine or uracil. Any combination of base/sugar or linker may be incorporated into nucleic acids. In some embodiments, modified nucleobases in nucleic acids (e.g., RNA nucleic acids, such as mRNA nucleic acids) comprise 1-methyl-pseudouridine (m1ψ), 1-ethyl-pseudouridine (e1ψ), 5-methoxy-uridine (mo5U), 5-methyl-cytidine (m5C), 5-methyl-uridine (m5U), and/or pseudouridine (ψ). In some embodiments, modified nucleobases in nucleic acids (e.g., RNA nucleic acids, such as mRNA nucleic acids) comprise 5-methyluridine, 5-methoxymethyl uridine, 5-methylthio uridine, 1-methoxymethyl pseudouridine, 5-methyl cytidine, and/or 5-methoxy cytidine. In some embodiments, the polyribonucleotide includes a combination of at least two (e.g., 2, 3, 4 or more) of any of the aforementioned modified nucleobases, including but not limited to chemical modifications. In some embodiments, an mRNA comprises 1-methyl-pseudouridine (m1ψ) substitutions at one or more or all uridine positions of the nucleic acid. In some embodiments, an mRNA comprises 5-methyl-uridine (5mU) substitutions at one or more or all uridine positions of the nucleic acid. In some embodiments, an mRNA comprises 5-methyl-uridine (5mU) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid. In some embodiments, an mRNA comprises 1-methyl-pseudouridine (m1ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid.
In some embodiments, an mRNA comprises pseudouridine (ψ) substitutions at one or more or all uridine positions of the nucleic acid. In some embodiments, an mRNA comprises pseudouridine (ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid. In some embodiments, an mRNA comprises unmodified uridine at one or more or all uridine positions of the nucleic acid. In some embodiments, mRNAs are uniformly modified (e.g., fully modified, modified throughout the entire sequence) for a particular modification. For example, a nucleic acid can be uniformly modified with 1-methyl-pseudouridine, meaning that all uridine residues in the mRNA sequence are replaced with 1-methyl-pseudouridine. Similarly, a nucleic acid can be uniformly modified for any type of nucleoside residue present in the sequence by replacement with a modified residue such as those set forth above. The nucleic acids may be partially or fully modified along the entire length of the molecule. For example, one or more or all of a given type of nucleotide (e.g., purine or pyrimidine, or any one or more or all of A, G, U, C) may be uniformly modified in a nucleic acid, or in a predetermined sequence region thereof (e.g., in the mRNA including or excluding the poly(A) tail). In some embodiments, all nucleotides X in a nucleic acid (or in a sequence region thereof) are modified nucleotides, wherein X may be any one of nucleotides A, G, U, C, or any one of the combinations A+G, A+U, A+C, G+U, G+C, U+C, A+G+U, A+G+C, G+U+C or A+G+C.
The nucleic acid may contain from about 1% to about 100% modified nucleotides (either in relation to overall nucleotide content, or in relation to one or more types of nucleotide, i.e., any one or more of A, G, U or C) or any intervening percentage (e.g., from 1% to 20%, from 1% to 25%, from 1% to 50%, from 1% to 60%, from 1% to 70%, from 1% to 80%, from 1% to 90%, from 1% to 95%, from 10% to 20%, from 10% to 25%, from 10% to 50%, from 10% to 60%, from 10% to 70%, from 10% to 80%, from 10% to 90%, from 10% to 95%, from 10% to 100%, from 20% to 25%, from 20% to 50%, from 20% to 60%, from 20% to 70%, from 20% to 80%, from 20% to 90%, from 20% to 95%, from 20% to 100%, from 50% to 60%, from 50% to 70%, from 50% to 80%, from 50% to 90%, from 50% to 95%, from 50% to 100%, from 70% to 80%, from 70% to 90%, from 70% to 95%, from 70% to 100%, from 80% to 90%, from 80% to 95%, from 80% to 100%, from 90% to 95%, from 90% to 100%, and from 95% to 100%). It will be understood that any remaining percentage is accounted for by the presence of unmodified A, G, U, or C. The mRNAs may contain at a minimum 1% and at maximum 100% modified nucleotides, or any intervening percentage, such as at least 5% modified nucleotides, at least 10% modified nucleotides, at least 25% modified nucleotides, at least 50% modified nucleotides, at least 80% modified nucleotides, or at least 90% modified nucleotides. For example, the nucleic acids may contain a modified pyrimidine such as a modified uracil or cytosine. In some embodiments, at least 5%, at least 10%, at least 25%, at least 50%, at least 80%, at least 90% or 100% of the uracil in the nucleic acid is replaced with a modified uracil (e.g., a 5-substituted uracil). The modified uracil can be replaced by a compound having a single unique structure or can be replaced by a plurality of compounds having different structures (e.g., 2, 3, 4 or more unique structures). 
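The relationship between partial and uniform modification described above can be illustrated with a minimal sketch. It is not a description of any synthesis method: the single-character symbol `Ψ` standing in for a modified uridine, the function names, and the random choice of positions are all assumptions made only for illustration.

```python
import random

# Hypothetical one-character stand-in for a modified uridine (e.g., m1ψ).
MODIFIED_U = "Ψ"


def modify_uridines(seq: str, fraction: float, seed: int = 0) -> str:
    """Replace the given fraction (0.0 to 1.0) of U positions with MODIFIED_U."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    u_positions = [i for i, nt in enumerate(seq) if nt == "U"]
    n_modified = round(fraction * len(u_positions))
    # Positions are picked at random here; this is an illustrative assumption.
    chosen = set(random.Random(seed).sample(u_positions, n_modified))
    return "".join(MODIFIED_U if i in chosen else nt for i, nt in enumerate(seq))


def percent_uridine_modified(seq: str) -> float:
    """Percentage of uridine positions that carry the modification."""
    total = seq.count("U") + seq.count(MODIFIED_U)
    return 100.0 * seq.count(MODIFIED_U) / total if total else 0.0


example = "GAGAUUGAGUGUAGUGACUAG"  # contains six uridines
half = modify_uridines(example, 0.5)
uniform = modify_uridines(example, 1.0)
print(percent_uridine_modified(half), percent_uridine_modified(uniform))
```

In this sketch a sequence counts as uniformly modified for uridine when no unmodified `U` remains, which mirrors the "100% of the uracil" case in the text; any remaining percentage corresponds to unmodified uridine.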
In some embodiments, at least 5%, at least 10%, at least 25%, at least 50%, at least 80%, at least 90% or 100% of the cytosine in the nucleic acid is replaced with a modified cytosine (e.g., a 5-substituted cytosine). The modified cytosine can be replaced by a compound having a single unique structure or can be replaced by a plurality of compounds having different structures (e.g., 2, 3, 4 or more unique structures). Modified nucleotides may include modified nucleobases. For example, an RNA transcript (e.g., mRNA transcript) may include a modified uracil nucleobase selected from pseudouracil (ψ), N1-methylpseudouracil (m1ψ), 1-ethylpseudouracil, 2-thiouracil, 4′-thiouracil, 2-thio-1-methyl-1-deaza-pseudouracil, 2-thio-1-methyl-pseudouracil, 2-thio-5-aza-uracil, 2-thio-dihydropseudouracil, 2-thio-dihydrouracil, 2-thio-pseudouracil, 4-methoxy-2-thio-pseudouracil, 4-methoxy-pseudouracil, 4-thio-1-methyl-pseudouracil, 4-thio-pseudouracil, 5-aza-uracil, dihydropseudouracil, 5-methyluracil, 5-methoxyuracil (mo5U) and 2′-O-methyluracil. In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a modified guanine nucleobase selected from digoxigeninated guanine, 6-thioguanine, 7-deazaguanine, 7-deaza-7-propargylaminoguanine, 8-oxoguanine, araguanine, biotin-16-7-deaza-7-propargylaminoguanine, isoguanine, N2-methylguanine, O6-methylguanine, thienoguanine, and 2,6-diaminoguanine.
In some embodiments, an RNA transcript may include a modified cytosine nucleobase selected from digoxigeninated cytosine, 2-thiocytosine, 5-aminoallylcytosine, 5-bromocytosine, 5-carboxycytosine, 5-formylcytosine, 5-hydroxycytosine, 5-hydroxymethylcytosine, 5-methoxycytosine, 5-methylcytosine, 5-propargylaminocytosine, 5-propynylcytosine, 6-azacytosine, aracytosine, cyanine 3-5-propargylaminocytosine, cyanine 3-aminoallylcytosine, cyanine 5-6-propargylaminocytosine, cyanine 5-aminoallylcytosine, desthiobiotin-6-aminoallylcytosine, N4-biotin-OBEA-cytosine, N4-methylcytosine, pseudoisocytosine, and thienocytosine. In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a modified adenine nucleobase selected from digoxigeninated adenine, N6-methyladenine, 7-deazaadenine, 7-deaza-7-propargylaminoadenine, 8-azaadenine, 8-azidoadenine, 8-chloroadenine, 8-oxoadenine, araadenine, N1-methyladenine, N6-methyladenine, 3-deazaadenine, 2,6-diaminoadenine, 2-methyl-thio-N6-isopentenyladenine (ms2i6A), 2-methylthio-N6-methyladenine (ms2m6A), N6-(cis-hydroxyisopentenyl)adenine (io6A), 2-methylthio-N6-(cis-hydroxyisopentenyl)adenine (ms2io6A), N6-glycinylcarbamoyladenine (g6A), N6-threonylcarbamoyladenine (t6A), 2-methylthio-N6-threonylcarbamoyladenine (ms2t6A), N6-methyl-N6-threonylcarbamoyladenine (m6t6A), N6-hydroxynorvalylcarbamoyladenine (hn6A), 2-methylthio-N6-hydroxynorvalylcarbamoyladenine (ms2hn6A), N6,N6-dimethyladenine (m62A), and N6-acetyladenine (ac6A). In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified nucleobases. Modified nucleotides may include modified sugars.
For example, an RNA transcript (e.g., mRNA transcript) may include a modified sugar selected from 2′-thioribose, 2′,3′-dideoxyribose, 2′-amino-2′-deoxyribose, 2′-deoxyribose, 2′-azido-2′-deoxyribose, 2′-fluoro-2′-deoxyribose, 2′-O-methylribose, 2′-O-methyldeoxyribose, 3′-amino-2′,3′-dideoxyribose, 3′-azido-2′,3′-dideoxyribose, 3′-deoxyribose, 3′-O-(2-nitrobenzyl)-2′-deoxyribose, 3′-O-methylribose, 5′-aminoribose, 5′-thioribose, 5-nitro-1-indolyl-2′-deoxyribose, 5′-biotin-ribose, 2′-O,4′-C-methylene-linked, 2′-O,4′-C-amino-linked ribose, and 2′-O,4′-C-thio-linked ribose. In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified sugars. Modified nucleotides may include modified phosphates. A modified phosphate group is a phosphate group that differs from the canonical structure of phosphate. An example of a canonical phosphate is shown below:
Figure imgf000082_0001
, where R5 and R3 are atoms or molecules to which the canonical phosphate is bonded. For example, for a phosphate in a nucleic acid sequence, R5 may refer to the upstream nucleotide of the nucleic acid, and R3 may refer to the downstream nucleotide of the nucleic acid. The canonical structure of phosphate also refers to structures in which one or more hydroxyl groups of the phosphate are deprotonated, or in which an oxygen atom of the phosphate is bonded to an adjacent nucleotide in a nucleic acid sequence. In some embodiments, an RNA transcript (e.g., mRNA transcript) may include a modified phosphate selected from phosphorothioate (PS), thiophosphate, 5′-O-methylphosphonate, 3′-O-methylphosphonate, 5′-hydroxyphosphonate, hydroxyphosphanate, phosphoroselenoate, selenophosphate, phosphoramidate, carbophosphonate, methylphosphonate, phenylphosphonate, ethylphosphonate, H-phosphonate, guanidinium ring, triazole ring, boranophosphate (BP), methylphosphonate, and guanidinopropyl phosphoramidate. In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified phosphates. In some embodiments, an mRNA includes N1-methylpseudouridine. In some embodiments, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of uracil nucleotides in an mRNA comprise N1-methylpseudouridine. In some embodiments, each uracil nucleotide of an mRNA transcript comprises N1-methylpseudouridine. In some embodiments, an mRNA includes 5-methylcytidine. In some embodiments, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of cytosine nucleotides in an mRNA comprise 5-methylcytidine. In some embodiments, each cytosine nucleotide of an mRNA transcript comprises 5-methylcytidine. In some embodiments, an mRNA includes 5-methyluridine.
In some embodiments, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of uracil nucleotides in an mRNA comprise 5-methyluridine. In some embodiments, each uracil nucleotide of an mRNA transcript comprises 5-methyluridine. In some embodiments, an mRNA includes 5-methylcytidine and 5-methyluridine. In some embodiments, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of uracil nucleotides in an mRNA comprise 5-methyluridine and at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% of cytosine nucleotides in an mRNA comprise 5-methylcytidine. In some embodiments, each cytosine nucleotide of an mRNA transcript comprises 5-methylcytidine and each uracil nucleotide of an mRNA transcript comprises 5-methyluridine.

Unmodified Nucleotides

In some embodiments, an RNA (e.g., mRNA) is not chemically modified and comprises the standard ribonucleotides consisting of adenosine, guanosine, cytidine and uridine. In some embodiments, nucleotides and nucleosides comprise standard nucleoside residues such as those present in transcribed RNA (e.g., A, G, C, or U). In some embodiments, nucleotides and nucleosides comprise standard deoxyribonucleosides such as those present in DNA (e.g., dA, dG, dC, or dT).

In Vitro Transcription (IVT)

cDNA encoding the polynucleotides may be transcribed using an in vitro transcription (IVT) system. In vitro transcription of RNA is known in the art and is described in International Publication WO 2014/152027, which is incorporated by reference herein in its entirety. In some embodiments, the RNA is prepared in accordance with any one or more of the methods described in WO 2018/053209 and WO 2019/036682, each of which is incorporated by reference herein.
In some embodiments, the RNA transcript is generated using a non-amplified, linearized DNA template in an in vitro transcription reaction. In some embodiments, the template DNA is isolated DNA. In some embodiments, the template DNA is cDNA. In some embodiments, the cDNA is formed by reverse transcription of an RNA polynucleotide, for example, but not limited to influenza virus mRNA. In some embodiments, cells, e.g., bacterial cells, e.g., E. coli, e.g., DH-1 cells are transfected with the plasmid DNA template. In some embodiments, the transfected cells are cultured to replicate the plasmid DNA which is then isolated and purified. In some embodiments, the DNA template includes an RNA polymerase promoter, e.g., a T7 promoter located 5′ to and operably linked to the gene of interest. In some embodiments, an in vitro transcription template encodes a 5′ untranslated region (UTR), contains an open reading frame, and encodes a 3′ UTR and a poly(A) tail. The particular nucleic acid sequence composition and length of an in vitro transcription template will depend on the mRNA encoded by the template. An in vitro transcription system typically comprises a transcription buffer, nucleotide triphosphates (NTPs), an RNase inhibitor and a polymerase. The NTPs may be manufactured in house, may be selected from a supplier, or may be synthesized. The NTPs may be selected from, but are not limited to, those including natural and unnatural (modified) NTPs. Any number of RNA polymerases or variants may be used in the method. The polymerase may be selected from, but is not limited to, a phage RNA polymerase, e.g., a T7 RNA polymerase, a T3 RNA polymerase, an SP6 RNA polymerase, and/or mutant polymerases such as, but not limited to, polymerases able to incorporate modified nucleic acids and/or modified nucleotides, including chemically modified nucleic acids and/or nucleotides. Some embodiments exclude the use of DNase.
In some embodiments, the RNA transcript is capped via enzymatic capping. In some embodiments, the RNA comprises 5' terminal cap, for example, 7mG(5’)ppp(5’)NlmpNp. In some embodiments, the RNA polymerase is an RNA polymerase variant, such as those described in WO 2020/172239, incorporated herein by reference in its entirety. RNA polymerase variants include at least one amino acid substitution, relative to the wild type (WT) RNA polymerase. A WT T7 RNA polymerase is represented by SEQ ID NO: 83: MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVA DNAAAKPLITTLLPKMIARINDWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNT TVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVEEQLNKRVGHVYKKAFMQVVEADMLSKGL LGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELAPEYAEAIATRAGA LAGISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQ NTAWKINKKVLAVANVITKWKHCPVEDIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKD KARKSRRISLEFMLEQANKFANHKAIWFPYNMDWRGRVYAVSMFNPQGNDMTKGLLTLAKGKP IGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAKSPLENTWWAEQDSPFCFLAFCF EYAGVQHHGLSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVAKKVNEI LQADAINGTDNEVVTVTDENTGEISEKVKLGTKALAGQWLAYGVTRSVTKRSVMTLAYGSKEF GFRQQVLEDTIQPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWLKSAAKLLA AEVKDKKTGEILRKRCAVHWVTPDGFPVWQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDA HKQESGIAPNFVHSQDGSHLRKTVVWAHEKYGIESFALIHDSFGTIPADAANLFKAVRETMVDTY ESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDFAFA (SEQ ID NO: 83). For example, with reference to WT T7 RNA polymerase having an amino acid sequence of SEQ ID NO: 83, the glycine at position 47 is considered a “wild-type amino acid,” whereas a substitution of the glycine for alanine at position 47 is considered an “amino acid substitution” that has a high-helix propensity. In some embodiments, the RNA polymerase variant is a T7 RNA polymerase variant comprising at least one (one or more) amino acid substitution relative to WT RNA polymerase (e.g., WT T7 RNA polymerase having an amino acid sequence of SEQ ID NO: 83). 
In some embodiments, an RNA polymerase variant comprises an RNA polymerase that includes an (at least one) amino acid modification that causes a loop structure of the RNA polymerase variant to undergo a conformational change to a helix structure as the RNA polymerase variant transitions from an initiation complex to an elongation complex. In some embodiments, the amino acid modification is an amino acid substitution at one or more of positions 42, 43, 44, 45, 46, and 47, relative to the wild-type RNA polymerase, wherein the wild-type RNA polymerase comprises the amino acid sequence of SEQ ID NO: 83. The amino acid substitution, in some embodiments, is a high-helix propensity amino acid substitution. Examples of high-helix propensity amino acids include alanine, isoleucine, leucine, arginine, methionine, lysine, glutamine, and/or glutamate. In some embodiments, the amino acid substitution at position 47 is G47A. In some embodiments, an RNA polymerase variant comprises an RNA polymerase that includes an additional C-terminal amino acid, relative to the wild-type RNA polymerase. The additional C-terminal amino acid, in some embodiments, is selected from glycine, alanine, threonine, proline, glutamine, and serine. In some embodiments, the additional C-terminal amino acid (e.g., at position 884 relative to wild-type RNA polymerase comprising the amino acid sequence of SEQ ID NO: 83) is glycine. Co-transcriptional capping methods may also be used for ribonucleic acid (RNA) synthesis, using an RNA polymerase variant. That is, RNA is produced in a “one-pot” reaction, without the need for a separate capping reaction. Thus, the methods, in some embodiments, comprise reacting a polynucleotide template with an RNA polymerase variant, nucleoside triphosphates, and a cap analog under in vitro transcription reaction conditions to produce an RNA transcript.
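The substitution notation used above (e.g., G47A, with 1-based positions relative to SEQ ID NO: 83) can be made concrete with a short sketch. To keep the example compact it uses only the first 50 residues of SEQ ID NO: 83 as given in this document; the helper name `substitute` and the set `HIGH_HELIX_PROPENSITY` are illustrative assumptions, not part of the disclosure.

```python
# First 50 residues of the WT T7 RNA polymerase sequence (SEQ ID NO: 83).
T7_WT_PREFIX = "MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEAR"

# One-letter codes for the high-helix propensity amino acids listed in the
# text: alanine, isoleucine, leucine, arginine, methionine, lysine,
# glutamine, and glutamate.
HIGH_HELIX_PROPENSITY = set("AILRMKQE")


def substitute(protein: str, wild_type: str, position: int, new: str) -> str:
    """Apply a substitution such as G47A; position is 1-based."""
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {protein[position - 1]}"
        )
    return protein[: position - 1] + new + protein[position:]


variant = substitute(T7_WT_PREFIX, "G", 47, "A")
print(T7_WT_PREFIX[41:47], "->", variant[41:47])  # residues 42 to 47
print(variant[46] in HIGH_HELIX_PROPENSITY)
```

Checking the wild-type residue before substituting guards against off-by-one errors when converting the 1-based positions used in the text to 0-based string indices.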
Multivalent Vaccines

The compositions may include RNA (e.g., mRNA) or multiple RNAs (e.g., mRNAs) encoding two or more antigens of the same or different species. In some embodiments, the composition includes an RNA (e.g., mRNA) or multiple RNAs (e.g., mRNAs) encoding two or more proteins. In some embodiments, the RNA (e.g., mRNA) may encode 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more proteins. In some embodiments, two or more different RNA (e.g., mRNA) encoding antigens may be formulated in the same lipid nanoparticle. In some embodiments, two or more different RNA (e.g., mRNA) encoding antigens may be formulated in separate lipid nanoparticles (each RNA (e.g., mRNA) formulated in a single lipid nanoparticle). Lipid nanoparticles may then be combined and administered as a single vaccine composition (e.g., comprising multiple RNAs (e.g., mRNAs) encoding multiple antigens) or may be administered separately.

Identification and Ratio Determination (IDR) Sequences

In some embodiments, one or more nucleic acids comprise an Identification and Ratio Determination sequence. An Identification and Ratio Determination (IDR) sequence is a sequence of a biological molecule (e.g., nucleic acid or protein) that, when combined with the sequence of a target biological molecule, serves to identify the target biological molecule. Typically, an IDR sequence is a heterologous sequence that is incorporated within or appended to a sequence of a target biological molecule and can be used as a reference to identify the target molecule. Thus, in some embodiments, a nucleic acid (e.g., mRNA) comprises (i) a target sequence of interest (e.g., a coding sequence encoding a therapeutic and/or antigenic peptide or protein); and (ii) a unique IDR sequence. An RNA species (e.g., RNA having a given coding sequence) may comprise an IDR sequence that differs from the IDR sequence of other RNA species (e.g., RNA(s) having different coding sequence(s)).
Each IDR sequence thus identifies a particular RNA species, and so the abundance of IDR sequences may be measured to determine the abundance of each RNA species in a composition. Use of distinct IDR sequences to identify RNA species allows for analysis of multivalent RNA compositions (e.g., containing multiple RNA species) containing RNA species with similar coding sequences and/or lengths, which could otherwise be difficult to distinguish using PCR- or chromatography-based analysis of full-length RNAs. Each RNA species in a multivalent RNA composition may comprise an IDR sequence that is not a sequence isomer of an IDR sequence of another RNA species in a multivalent RNA composition (e.g., the IDR sequence does not have the same number of adenosine nucleotides, the same number of cytosine nucleotides, the same number of guanine nucleotides, and the same number of uracil nucleotides, as another IDR sequence in the composition, even if the order of those nucleotides differs). Having identical nucleotide compositions causes sequence isomers to have the same mass, presenting a challenge to distinguishing sequence isomers using mass-based identification methods (e.g., mass spectrometry). Each RNA species in a multivalent RNA composition may comprise an IDR sequence having a mass that differs from the mass of IDR sequences of each other RNA species in a multivalent RNA composition. For example, the mass of each IDR sequence may differ from the mass of other IDR sequences by at least 9 Da, at least 25 Da, or at least 50 Da. Use of IDR sequences with distinct masses allows RNA fragments comprising different IDR sequences to be distinguished using mass-based analysis methods (e.g., mass spectrometry), which do not require reverse transcription, amplification, or sequencing of RNAs. Each RNA species in an RNA composition may comprise an IDR sequence with a different length.
For example, each IDR sequence may have a length independently selected from 0 to 25 nucleotides. The length of a nucleic acid influences the rate at which the nucleic acid traverses a chromatography column, and so the use of IDR sequences of different lengths on different RNA species allows RNA fragments having different IDR sequences to be distinguished using chromatography-based methods (e.g., LC-UV). IDR sequences may be chosen such that no IDR sequence comprises a start codon, ‘AUG’. Lack of a start codon in an IDR sequence prevents undesired translation of nucleotide sequences within and/or downstream from the IDR sequence. IDR sequences may be chosen such that no IDR sequence comprises a recognition site for a restriction enzyme. In one example, no IDR sequence comprises a recognition site for XbaI, ‘UCUAG’. Lack of a recognition site for a restriction enzyme (e.g., XbaI recognition site ‘UCUAG’) allows the restriction enzyme to be used in generating and modifying a DNA template for in vitro transcription, without affecting the IDR sequence or sequence of the transcribed RNA. Non-limiting examples of distinct IDR sequences include: GAGAUUGAGUGUAGUGACUAG (SEQ ID NO: 56), GAGAUUGAGUGUAGUGAC (SEQ ID NO: 57), GAGAUUGAGUGUAGUG (SEQ ID NO: 58), GAUUGAGACUACGGG (SEQ ID NO: 59), and CAUAGACACUACG (SEQ ID NO: 60). In some embodiments of the compositions, each mRNA encoding a distinct protein comprises a 3′ UTR comprising a distinct IDR sequence selected from SEQ ID NOs: 56–60.

Nucleic Acid Production

Chemical Synthesis

Nucleic acids may be manufactured in whole or in part using solid phase techniques. Solid-phase chemical synthesis of nucleic acids is an automated method wherein molecules are immobilized on a solid support and synthesized step by step in a reactant solution. Solid-phase synthesis is useful in site-specific introduction of chemical modifications in the nucleic acid sequences.
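The IDR selection criteria described earlier in this section (no ‘AUG’ start codon, no ‘UCUAG’ site, no sequence isomers, distinct masses, distinct lengths) can be checked programmatically. The following is a minimal sketch applied to SEQ ID NOs: 56–60. The function names are hypothetical, and the residue masses are approximate average values for unmodified ribonucleotide residues, included as an assumption for illustration only; they are not a substitute for an exact mass calculation.

```python
from collections import Counter
from itertools import combinations

IDR_SEQUENCES = {
    "SEQ ID NO: 56": "GAGAUUGAGUGUAGUGACUAG",
    "SEQ ID NO: 57": "GAGAUUGAGUGUAGUGAC",
    "SEQ ID NO: 58": "GAGAUUGAGUGUAGUG",
    "SEQ ID NO: 59": "GAUUGAGACUACGGG",
    "SEQ ID NO: 60": "CAUAGACACUACG",
}

# Approximate average residue masses in Da (assumed values, unmodified RNA).
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}


def approx_mass(seq: str) -> float:
    """Rough mass estimate: sum of per-residue average masses."""
    return sum(RESIDUE_MASS[nt] for nt in seq)


def check_idr_set(idrs: dict, min_mass_gap: float = 50.0,
                  site: str = "UCUAG") -> list:
    """Return a list of human-readable problems; empty if all criteria pass."""
    problems = []
    for name, seq in idrs.items():
        if "AUG" in seq:
            problems.append(f"{name} contains a start codon")
        if site in seq:
            problems.append(f"{name} contains the site {site}")
    for (n1, s1), (n2, s2) in combinations(idrs.items(), 2):
        # Sequence isomers share the same nucleotide composition.
        if Counter(s1) == Counter(s2):
            problems.append(f"{n1} and {n2} are sequence isomers")
        if abs(approx_mass(s1) - approx_mass(s2)) < min_mass_gap:
            problems.append(f"{n1} and {n2} differ by < {min_mass_gap} Da")
    return problems


print(check_idr_set(IDR_SEQUENCES))
print(sorted(len(s) for s in IDR_SEQUENCES.values()))
```

Because the five example sequences all differ in length, no two of them can share a nucleotide composition, so the isomer and mass checks pass for this set; those checks matter more when equal-length IDR sequences are compared.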
The synthesis of nucleic acids by the sequential addition of monomer building blocks may be carried out in a liquid phase. The synthetic methods discussed above each have their own advantages and limitations. Attempts have been made to combine these methods to overcome the limitations. Such combinations of methods are also suitable. The use of solid-phase or liquid-phase chemical synthesis in combination with enzymatic ligation provides an efficient way to generate long chain nucleic acids that cannot be obtained by chemical synthesis alone.

Ligation

Assembling nucleic acids by a ligase may also be used. DNA or RNA ligases promote intermolecular ligation of the 5’ and 3’ ends of polynucleotide chains through the formation of a phosphodiester bond. Nucleic acids such as chimeric polynucleotides and/or circular nucleic acids may be prepared by ligation of one or more regions or subregions. DNA fragments can be joined by a ligase catalyzed reaction to create recombinant DNA with different functions. Two oligodeoxynucleotides, one with a 5’ phosphoryl group and another with a free 3’ hydroxyl group, serve as substrates for a DNA ligase.

Purification

Purification of the nucleic acids may include, but is not limited to, nucleic acid clean-up, quality assurance and quality control. Clean-up may be performed by methods known in the art such as, but not limited to, AGENCOURT® beads (Beckman Coulter Genomics, Danvers, MA), poly-T beads, LNA™ oligo-T capture probes (EXIQON® Inc, Vedbaek, Denmark) or HPLC based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC). The term “purified” when used in relation to a nucleic acid such as a “purified nucleic acid” refers to one that is separated from at least one contaminant. A “contaminant” is any substance that makes another unfit, impure or inferior.
Thus, a purified nucleic acid (e.g., DNA and RNA) is present in a form or setting different from that in which it is found in nature, or a form or setting different from that which existed prior to subjecting it to a treatment or purification method. A quality assurance and/or quality control check may be conducted using methods such as, but not limited to, gel electrophoresis, UV absorbance, or analytical HPLC. In some embodiments, the nucleic acids may be sequenced by methods including, but not limited to, reverse-transcriptase-PCR.

Quantification

In some embodiments, the nucleic acids may be quantified in exosomes or when derived from one or more bodily fluids. Bodily fluids include peripheral blood, serum, plasma, ascites, urine, cerebrospinal fluid (CSF), sputum, saliva, bone marrow, synovial fluid, aqueous humor, amniotic fluid, cerumen, breast milk, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper's fluid or pre-ejaculatory fluid, sweat, fecal matter, hair, tears, cyst fluid, pleural and peritoneal fluid, pericardial fluid, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vomit, vaginal secretions, mucosal secretion, stool water, pancreatic juice, lavage fluids from sinus cavities, bronchopulmonary aspirates, blastocoel cavity fluid, and umbilical cord blood. Alternatively, exosomes may be retrieved from an organ selected from the group consisting of lung, heart, pancreas, stomach, intestine, bladder, kidney, ovary, testis, skin, colon, breast, prostate, brain, esophagus, liver, and placenta. Assays may be performed using construct specific probes, cytometry, qRT-PCR, real-time PCR, PCR, flow cytometry, electrophoresis, mass spectrometry, or combinations thereof while the exosomes may be isolated using immunohistochemical methods such as enzyme linked immunosorbent assay (ELISA) methods.
Exosomes may also be isolated by size exclusion chromatography, density gradient centrifugation, differential centrifugation, nanomembrane ultrafiltration, immunoabsorbent capture, affinity purification, microfluidic separation, or combinations thereof. These methods afford the investigator the ability to monitor, in real time, the level of nucleic acids remaining or delivered. This is possible because the nucleic acids, in some embodiments, differ from the endogenous forms due to the structural or chemical modifications. In some embodiments, the nucleic acid may be quantified using methods such as, but not limited to, ultraviolet visible spectroscopy (UV/Vis). A non-limiting example of a UV/Vis spectrometer is a NANODROP® spectrometer (ThermoFisher, Waltham, MA). The quantified nucleic acid may be analyzed to determine whether it is of the proper size and to check that no degradation of the nucleic acid has occurred. Degradation of the nucleic acid may be checked by methods such as, but not limited to, agarose gel electrophoresis, HPLC based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC), liquid chromatography-mass spectrometry (LCMS), capillary electrophoresis (CE) and capillary gel electrophoresis (CGE).

Lipid Compositions

In some embodiments, the nucleic acids are formulated as a lipid composition, such as a composition comprising a lipid nanoparticle, a liposome, and/or a lipoplex. In some embodiments, nucleic acids are formulated as lipid nanoparticle (LNP) compositions. Lipid nanoparticles typically comprise amino lipid, non-cationic lipid, structural lipid, and PEG lipid components along with the nucleic acid cargo of interest.
The lipid nanoparticles can be generated using components, compositions, and methods as are generally known in the art, see for example PCT/US2016/052352; PCT/US2016/068300; PCT/US2017/037551; PCT/US2015/027400; PCT/US2016/047406; PCT/US2016/000129; PCT/US2016/014280; PCT/US2017/038426; PCT/US2014/027077; PCT/US2014/055394; PCT/US2016/052117; PCT/US2012/069610; PCT/US2017/027492; PCT/US2016/059575; PCT/US2016/069491; PCT/US2016/069493; and PCT/US2014/66242, all of which are incorporated by reference herein in their entirety. In some embodiments, the lipid nanoparticle comprises at least one ionizable amino lipid, at least one non-cationic lipid, at least one sterol, and/or at least one polyethylene glycol (PEG)-modified lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-25% non-cationic lipid, 25-55% structural lipid, and 0.5-15% PEG-modified lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-30% non-cationic lipid, 10-55% structural lipid, and 0.5-15% PEG-modified lipid. In some embodiments, the lipid nanoparticle comprises 40-50 mol% ionizable lipid, optionally 45-50 mol%, for example, 45-46 mol%, 46-47 mol%, 47-48 mol%, 48-49 mol%, or 49-50 mol%, for example about 45 mol%, 45.5 mol%, 46 mol%, 46.5 mol%, 47 mol%, 47.5 mol%, 48 mol%, 48.5 mol%, 49 mol%, or 49.5 mol%. In some embodiments, the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid. For example, the lipid nanoparticle may comprise 20-50 mol%, 20-40 mol%, 20-30 mol%, 30-60 mol%, 30-50 mol%, 30-40 mol%, 40-60 mol%, 40-50 mol%, or 50-60 mol% ionizable amino lipid. In some embodiments, the lipid nanoparticle comprises 20 mol%, 30 mol%, 40 mol%, 50 mol%, or 60 mol% ionizable amino lipid.
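As an illustration of how the molar-ratio ranges above constrain a formulation, the sketch below checks a candidate composition against the first set of ranges (20-60% ionizable amino lipid, 5-25% non-cationic lipid, 25-55% structural lipid, 0.5-15% PEG-modified lipid). The dictionary keys, the function name, the example compositions, and the requirement that the four lipid components sum to 100 mol% are assumptions made for this example.

```python
# Ranges in mol%, taken from the first molar-ratio embodiment in the text.
MOL_PERCENT_RANGES = {
    "ionizable_amino_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),
    "structural_lipid": (25.0, 55.0),
    "peg_modified_lipid": (0.5, 15.0),
}


def check_lnp_composition(composition: dict, ranges: dict = MOL_PERCENT_RANGES,
                          tolerance: float = 1e-6) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    total = sum(composition.values())
    # Assumption: the lipid components are expressed so they sum to 100 mol%.
    if abs(total - 100.0) > tolerance:
        problems.append(f"components sum to {total} mol%, not 100")
    for lipid, (low, high) in ranges.items():
        value = composition.get(lipid, 0.0)
        if not low <= value <= high:
            problems.append(f"{lipid} = {value} mol% is outside {low}-{high}")
    return problems


# Hypothetical compositions, for illustration only.
in_range = {
    "ionizable_amino_lipid": 48.0,
    "non_cationic_lipid": 10.0,
    "structural_lipid": 40.5,
    "peg_modified_lipid": 1.5,
}
out_of_range = dict(in_range, ionizable_amino_lipid=70.0, structural_lipid=18.5)

print(check_lnp_composition(in_range))
print(check_lnp_composition(out_of_range))
```

The other embodiments in the text (for example, the 5-30% non-cationic and 10-55% structural lipid ranges) could be checked the same way by passing a different `ranges` dictionary.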
In some embodiments, the lipid nanoparticle comprises 35 mol%, 36 mol%, 37 mol%, 38 mol%, 39 mol%, 40 mol%, 41 mol%, 42 mol%, 43 mol%, 44 mol%, 45 mol%, 46 mol%, 47 mol%, 48 mol%, 49 mol%, 50 mol%, 51 mol%, 52 mol%, 53 mol%, 54 mol%, or 55 mol% ionizable amino lipid. In some embodiments, the lipid nanoparticle comprises 45–55 mole percent (mol%) ionizable amino lipid. For example, the lipid nanoparticle may comprise 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 mol% ionizable amino lipid.

Ionizable amino lipids

Formula (AI)

In some embodiments, the ionizable amino lipid of a lipid nanoparticle is a compound of Formula (AI):
Figure imgf000091_0001
; a wherein R, R, R, and R are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH, wherein n is selected from the group consisting of 1, 2, 3, 4, and 5, and , wherein denotes a point of
Figure imgf000092_0001
attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13. In some embodiments of the compounds of Formula (AI), R’a is R’branched;
Figure imgf000092_0002
is
Figure imgf000092_0003
a point of attachment; R, R, R, and R are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. In some embodiments of the compounds of Formula (AI), R’a is R’branched; R’branched is
Figure imgf000092_0004
; a point of attachment; R, R, R, and R are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7. In some embodiments of the compounds of Formula (AI), R’a is R’branched; R’branched is
Figure imgf000092_0005
; denotes a point of attachment; R is C2-12 alkyl; R, R, and R are each H; R2 and R3 are each C1-14 alkyl; 6 alkyl); n2 is 2; R5 is H; each R6 is H; M and M’ are each - m is 7. In some embodiments of the compounds of Formula (AI), R’a is R’branched; R’branched is
Figure imgf000093_0001
a point of attachment; R, R, and R are each H; R is C2-12
Figure imgf000093_0002
are C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. In some embodiments, the compound of Formula (AI) is selected from:
Figure imgf000093_0003
. In some embodiments, the ionizable amino lipid of Formula (AI) is a compound of
Figure imgf000093_0004
(AIa), or its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched; wherein denotes a point of attachment;
Figure imgf000094_0001
selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
Figure imgf000094_0002
, wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13. In some embodiments, the ionizable amino lipid of Formula (AI) is a compound of
Figure imgf000094_0003
(AIb), or its N-oxide, or a salt or isomer thereof, wherein
Figure imgf000094_0004
R’branched is: ; wherein denotes a point of attachment; wherein R, R, R, and R are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is -(CH2)nOH, wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13. In some embodiments of Formula (AI) or (AIb), R’a is R’branched;
Figure imgf000095_0001
is
Figure imgf000095_0002
attachment; R, R, and R are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. In some embodiments of Formula (AI) or (AIb), R’a is R’branched;
Figure imgf000095_0003
is
Figure imgf000095_0004
a point of attachment; R, R, and R are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7.
Figure imgf000095_0005
In some embodiments of Formula (AI) or (AIb), R’a is R’branched; is
Figure imgf000095_0006
; denotes a point of attachment; R and R are each H; R is C2-12 alkyl; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. In some embodiments, the ionizable amino lipid of Formula (AI) is a compound of Formula (AIc): its N-oxide, or a salt or isomer thereof, R’branched denotes a point of attachment;
Figure imgf000096_0001
wherein are selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is
Figure imgf000096_0002
wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the
Figure imgf000096_0003
In some embodiments, R’a is R’branched; R’branched is ; denotes a point of each H; R is C2-12 alkyl; R2 and R3 are each C1-14
Figure imgf000096_0004
alkyl; R4 is ; denotes a point of attachment; R10 is NH(C1-6 alkyl); n2 is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. In some embodiments, the compound of Formula (AIc) is: .
Figure imgf000097_0001
Formula (AII)
In some embodiments, the ionizable amino lipid is a compound of Formula (AII):
Figure imgf000097_0002
R’b is:
Figure imgf000097_0006
; wherein
Figure imgf000097_0003
denotes a point of attachment; R and R are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of R and R is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R and R are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of R and R is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is wherein n is selected from the
Figure imgf000097_0004
Figure imgf000097_0005
group consisting of 1, 2, 3, 4, and 5, and , wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; Ya is a C3-6 carbocycle; R*”a is selected from the group consisting of C1-15 alkyl and C2-15 alkenyl; and s is 2 or 3; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of
Figure imgf000098_0001
wherein R’a is R’branched or R’cyclic; wherein
Figure imgf000098_0002
denotes a point of attachment; R and R are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of R and R is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R and R are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of R and R is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is wherein n is selected from the
Figure imgf000098_0003
group consisting of 1, 2, 3, 4, and 5, and , wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-b):
Figure imgf000099_0001
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched or R’cyclic; wherein
Figure imgf000099_0002
a point of attachment; R and R are each independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
Figure imgf000099_0003
, wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-c): denotes a point of
Figure imgf000100_0001
wherein R is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
Figure imgf000100_0002
, wherein
Figure imgf000100_0003
denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; R’ is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula :
Figure imgf000100_0004
(AII-d), or its N-oxide, or a salt or isomer thereof,
Figure imgf000100_0005
R’branched is: and R’b is: ; wherein denotes a point of attachment; wherein R and R are each independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting , wherein denotes a point of
Figure imgf000101_0001
attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-e):
Figure imgf000101_0003
Figure imgf000101_0002
denotes a point of attachment; wherein R is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is -(CH2)nOH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each independently selected from 4, 5, and 6. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each 5. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), each R’ independently is a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), each R’ independently is a C2-5 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’b is: and R2 and R3 are each independently a C1-14 alkyl. In some
Figure imgf000102_0001
embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’b is: R3 are each independently a C6-10 alkyl. In some embodiments of the
Figure imgf000102_0002
compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’b
Figure imgf000102_0003
R2 and R3 are each a C8 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d)
Figure imgf000102_0004
and R3 are each independently a C6-10 alkyl. In some embodiments of the compound of Formula is:
Figure imgf000102_0008
embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e),
Figure imgf000102_0005
, 6 alkyl, and R2 and R3 are each a C8 alkyl. In some embodiments of the of Formula , , , , (AII-
Figure imgf000102_0006
d), or , , , are each a ,
Figure imgf000102_0007
(AII-d), or (AII-e), R’branched is: , R’b is: , and R and R are each a C2-6 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each independently selected from 4, 5, and 6 and each R’ independently is a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each 5 and each R’ independently is a C2-5 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is: independently
Figure imgf000103_0001
R and R are each a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), l
Figure imgf000103_0002
are each 5, each R’ independently is a C2-5 alkyl, and R and R are each a C2-6 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d)
Figure imgf000103_0003
l are each independently selected from 4, 5, and 6, R’ is a C1-12 alkyl, R is a C1-12 alkyl and R2 and R3 are each independently a C6-10 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d)
Figure imgf000103_0004
l are each 5, R’ is a C2-5 alkyl, R is a C2-6 alkyl, and R2 and R3 are each a C8 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or
Figure imgf000103_0005
, , wherein R10 is NH(C1-6 alkyl) and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R4
Figure imgf000103_0006
is , R10 is NH(CH3) and n2 is 2. c), (AII-
Figure imgf000103_0007
d), or (AII-e), R’branched is: , R’b is: , m and l are each independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, R and R are each a C1-12 alkyl, and R4 is wherein R10 is NH(C1-6 alkyl), and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
Figure imgf000104_0001
independently is a C2-5 alkyl, R and R are each a C2-6 alkyl, and R4
Figure imgf000104_0002
, wherein R10 is NH(CH3) and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d)
Figure imgf000104_0003
each independently selected from 4, 5, and 6, R’ is a C1-12 alkyl, R2 and R3 are each independently a C6-10 alkyl, R is a C1-12 alkyl, and R4 is , wherein R10 is NH(C1-6 alkyl) and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d)
Figure imgf000104_0004
l are each 5, R’ is a C2-5 alkyl, Raγ is a C2-6 alkyl, R2 and R3 are each a C8 alkyl, and R4 is
Figure imgf000104_0005
, wherein R10 is NH(CH3) and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R4 is -(CH2)nOH and n is 2, 3, or 4. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R4 is -(CH2)nOH and n is 2. c), (AII-d),
Figure imgf000104_0006
, , , independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, R and R are each a C1-12 alkyl, R4 is -(CH2)nOH, and n is 2, 3, or 4. In some embodiments of the compound of Formula , R’b is: , m and l are each 5, each R’ independently is a C2-5 alkyl, R and R are each a C2-6 alkyl, R4 is -(CH2)nOH, and n is 2. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of
Figure imgf000105_0001
R’branched is:
Figure imgf000105_0002
and R’b is:
Figure imgf000105_0003
; wherein
Figure imgf000105_0004
denotes a point of attachment; R is a C1-12 alkyl; R2 and R3 are each independently a C1-14 alkyl; R4 is -(CH2)nOH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C1-12 alkyl; m is selected from 4, 5, and 6; and l is selected from 4, 5, and 6. In some embodiments of the compound of Formula (AII-f), m and l are each 5, and n is 2, 3, or 4. In some embodiments of the compound of Formula (AII-f) R’ is a C2-5 alkyl, R is a C2-6 alkyl, and R2 and R3 are each a C6-10 alkyl. In some embodiments of the compound of Formula (AII-f), m and l are each 5, n is 2, 3, or 4, R’ is a C2-5 alkyl, R is a C2-6 alkyl, and R2 and R3 are each a C6-10 alkyl. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-g): R is a C2-6 alkyl; R’ is a C2-5 alkyl; and R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
Figure imgf000106_0001
wherein denotes a point of attachment, R10 is NH(C1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3. In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-h):
Figure imgf000106_0002
thereof; wherein R and R are each independently a C2-6 alkyl; each R’ independently is a C2-5 alkyl; and R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the
Figure imgf000106_0003
group consisting , wherein denotes a point of attachment, R10 is NH(C1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3. In some embodiments of the compound of Formula (AII-g) or (AII-h), R4 is
Figure imgf000106_0004
, wherein R10 is NH(CH3) and n2 is 2. In some embodiments of the compound of Formula (AII-g) or (AII-h), R4 is -(CH2)2OH.
Formula (AIII)
In some embodiments, the ionizable amino lipids of a lipid nanoparticle may be one or more of compounds of Formula (AIII): (AIII), or their N-oxides, or salts or isomers thereof, wherein:
Figure imgf000107_0001
R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of hydrogen, a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a carbocycle, heterocycle, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -N(R)2, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -N(R)R8, -N(R)S(O)2R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and –C(R)N(R)2C(O)OR, and each n is independently selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group, in which M” is a bond, C1-13 alkyl or C2-13 alkenyl; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the 
group consisting of C3-15 alkyl and C3-15 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13; and wherein when R4 is -(CH2)nQ, -(CH2)nCHQR, –CHQR, or -CQ(R)2, then (i) Q is not -N(R)2 when n is 1, 2, 3, 4 or 5, or (ii) Q is not 5, 6, or 7-membered heterocycloalkyl when n is 1 or 2. In some embodiments, another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N, O, and S, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -CRN(R)2C(O)OR, -N(R)R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and a 5- to 14-membered heterocycloalkyl having one or more heteroatoms selected from N, O, and S which is substituted with one or more substituents selected from oxo (=O), OH, amino, mono- or di-alkylamino, and C1-3 alkyl, and each n is independently selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected 
from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. 
In some embodiments, another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heterocycle having one or more heteroatoms selected from N, O, and S, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -CRN(R)2C(O)OR, -N(R)R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and -C(=NR9)N(R)2, and each n is independently selected from 1, 2, 3, 4, and 5; and when Q is a 5- to 14-membered heterocycle and (i) R4 is -(CH2)nQ in which n is 1 or 2, or (ii) R4 is -(CH2)nCHQR in which n is 1, or (iii) R4 is -CHQR, and -CQ(R)2, then Q is either a 5- to 14-membered heteroaryl or 8- to 14-membered heterocycloalkyl; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 
alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. In some embodiments, another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N, O, and S, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -CRN(R)2C(O)OR, -N(R)R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and -C(=NR9)N(R)2, and each n is independently selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 
alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. 
In some embodiments, another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C2-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is -(CH2)nQ or -(CH2)nCHQR, where Q is -N(R)2, and n is selected from 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof.
In some embodiments, another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of -(CH2)nQ, -(CH2)nCHQR, -CHQR, and -CQ(R)2, where Q is -N(R)2, and n is selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. In certain embodiments, a subset of compounds of Formula (AIII) includes those of Formula (AIII-A):
Figure imgf000113_0001
(AIII-A), or its N-oxide, or a salt or isomer thereof, wherein l is selected from 1, 2, 3, 4, and 5; m is selected from 5, 6, 7, 8, and 9; M1 is a bond or M’; R4 is hydrogen, unsubstituted C1-3 alkyl, or -(CH2)nQ, in which Q is -OH, -NHC(S)N(R)2, -NHC(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)R8, -NHC(=NR9)N(R)2, -NHC(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, heteroaryl or heterocycloalkyl; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. For example, m is 5, 7, or 9. For example, Q is OH, -NHC(S)N(R)2, or -NHC(O)N(R)2. For example, Q is -N(R)C(O)R, or -N(R)S(O)2R. In certain embodiments, a subset of compounds of Formula (AIII) includes those of Formula (AIII-B): (AIII-B), or its N-oxide, or a salt or isomer thereof in which all are as For example, m is selected from 5, 6, 7, 8, and 9; R4 is hydrogen, unsubstituted C1-3 alkyl, or -(CH2)nQ, in which Q is H, -NHC(S)N(R)2, -NHC(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)R8, -NHC(=NR9)N(R)2, -NHC(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, heteroaryl or heterocycloalkyl; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. For example, m is 5, 7, or 9. For example, Q is OH, -NHC(S)N(R)2, or -NHC(O)N(R)2. For example, Q is -N(R)C(O)R, or -N(R)S(O)2R. In certain embodiments, a subset of compounds of Formula (AIII) includes those of Formula (AIII-C):
Figure imgf000114_0001
(AIII-C), or its N-oxide, or a salt or isomer thereof, wherein l is selected from 1, 2, 3, 4, and 5; M1 is a bond or M’; R4 is hydrogen, unsubstituted C1-3 alkyl, or -(CH2)nQ, in which n is 2, 3, or 4, and Q is OH, -NHC(S)N(R)2, -NHC(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)R8, -NHC(=NR9)N(R)2, -NHC(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, heteroaryl or heterocycloalkyl; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-D),
Figure imgf000114_0002
(AIII-D), or their N-oxides, or salts or isomers thereof, wherein R4 is as described in this “Lipid Compositions” section. In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-E), or their N-oxides, or salts or isomers thereof, wherein R4 is as described in this “Lipid Compositions” section. In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-F) or (AIII-G):
Figure imgf000115_0001
their N-oxides, or salts or isomers thereof, wherein R4 is as described in this “Lipid Compositions” section. In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-H):
Figure imgf000115_0002
their N-oxides, or salts or isomers thereof, wherein M is -C(O)O- or –OC(O)-, M” is C1-6 alkyl or C2-6 alkenyl, R2 and R3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl, and n is selected from 2, 3, and 4. In a further embodiment, the compounds of Formula (AIII) are of Formula (AIII-I):
Figure imgf000115_0003
(AIII-I), or their N-oxides, or salts or isomers thereof, wherein n is 2, 3, or 4; and m, R’, R”, and R2 through R6 are as described in this “Lipid Compositions” section. For example, each of R2 and R3 may be independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl. In some embodiments, an ionizable amino lipid comprises a compound having structure: (Compound 1).
Figure imgf000116_0001
In some embodiments, an ionizable amino lipid comprises a compound having structure:
Figure imgf000116_0002
In a further embodiment, the compounds of Formula (AIII) are of Formula (AIII-J),
Figure imgf000116_0003
(AIII-J), or their N-oxides, or salts or isomers thereof, wherein l is selected from 1, 2, 3, 4, and 5; m is selected from 5, 6, 7, 8, and 9; M1 is a bond or M’; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. For example, M” is C1-6 alkyl (e.g., C1-4 alkyl) or C2-6 alkenyl (e.g. C2-4 alkenyl). For example, R2 and R3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl. In some embodiments, the ionizable amino lipids are one or more of the compounds described in U.S. Application Nos. 62/220,091, 62/252,316, 62/253,433, 62/266,460, 62/333,557, 62/382,740, 62/393,940, 62/471,937, 62/471,949, 62/475,140, and 62/475,166, and PCT Application No. PCT/US2016/052352. The central amine moiety of a lipid according to Formula (AIII), (AIII-A), (AIII-B), (AIII-C), (AIII-D), (AIII-E), (AIII-F), (AIII-G), (AIII-H), (AIII-I), or (AIII-J) may be protonated at a physiological pH. Thus, a lipid may have a positive or partial positive charge at physiological pH. Such amino lipids may be referred to as cationic lipids, ionizable lipids, cationic amino lipids, or ionizable amino lipids. Amino lipids may also be zwitterionic, i.e., neutral molecules having both a positive and a negative charge. 
Formula (AIV) In some embodiments, the ionizable amino lipids of a lipid nanoparticle may be one or more of compounds of formula (AIV), t is 1 or 2; A1 and A2 are each independently selected from CH or N; Z is CH2 or absent wherein when Z is CH2, the dashed lines (1) and (2) each represent a single bond; and when Z is absent, the dashed lines (1) and (2) are both absent; R1, R2, R3, R4, and R5 are independently selected from the group consisting of C5-20 alkyl, C5-20 alkenyl, -R”MR’, -R*YR”, -YR”, and -R*OR”; RX1 and RX2 are each independently H or C1-3 alkyl; each M is independently selected from the group consisting of -C(O)O-, -OC(O)-, -OC(O)O-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -C(O)S-, -SC(O)-, an aryl group, and a heteroaryl group; M* is C1-C6 alkyl, W1 and W2 are each independently selected from the group consisting of -O- and -N(R6)-; each R6 is independently selected from the group consisting of H and C1-5 alkyl; X1, X2, and X3 are independently selected from the group consisting of a bond, -CH2-, -(CH2)2-, -CHR-, -CHY-, -C(O)-, -C(O)O-, -OC(O)-, -(CH2)n-C(O)-, -C(O)-(CH2)n-, -(CH2)n-C(O)O-, -OC(O)-(CH2)n-, -(CH2)n-OC(O)-, -C(O)O-(CH2)n-, -CH(OH)-, -C(S)-, and -CH(SH)-; each Y is independently a C3-6 carbocycle; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each R is independently selected from the group consisting of C1-3 alkyl and a C3-6 carbocycle; each R’ is independently selected from the group consisting of C1-12 alkyl, C2-12 alkenyl, and H; each R” is independently selected from the group consisting of C3-12 alkyl, C3-12 alkenyl and -R*MR’; and n is an integer from 1-6; wherein when ring , then
Figure imgf000118_0001
i) at least one of X1, X2, and X3 is not -CH2-; and/or ii) at least one of R1, R2, R3, R4, and R5 is -R”MR’. In some embodiments, the compound is of any of formulae (AIVa)-(AIVh): ,
Figure imgf000118_0002
(AIVd), In some embodiments, the ionizable amino lipid is
Figure imgf000119_0001
a salt thereof. The central amine moiety of a lipid according to Formula (AIV), (AIVa), (AIVb), (AIVc), (AIVd), (AIVe), (AIVf), (AIVg), or (AIVh) may be protonated at a physiological pH. Thus, a lipid may have a positive or partial positive charge at physiological pH. Formula (AV) In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000119_0002
(AV), or a pharmaceutically acceptable salt, tautomer, or stereoisomer thereof, wherein: R1 is optionally substituted C1-C24 alkyl or optionally substituted C2-C24 alkenyl; R2 and R3 are each independently optionally substituted C1-C36 alkyl; R4 and R5 are each independently optionally substituted C1-C6 alkyl, or R4 and R5 join, along with the N to which they are attached, to form a heterocyclyl or heteroaryl; L1, L2, and L3 are each independently optionally substituted C1-C18 alkylene; G1 is a direct bond, -(CH2)nO(C=O)-, -(CH2)n(C=O)O-, or -(C=O)-; G2 and G3 are each independently -(C=O)O- or -O(C=O)-; and n is an integer greater than 0. Formula (AVI) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: acceptable salt, tautomer, or stereoisomer
Figure imgf000120_0001
thereof, wherein: G1 is -N(R3)R4 or -OR5; R1 is optionally substituted branched, saturated or unsaturated C12-C36 alkyl; R2 is optionally substituted branched or unbranched, saturated or unsaturated C12-C36 alkyl when L is -C(=O)-; or R2 is optionally substituted branched or unbranched, saturated or unsaturated C4-C36 alkyl when L is C6-C12 alkylene, C6-C12 alkenylene, or C2-C6 alkynylene; R3 and R4 are each independently H, optionally substituted branched or unbranched, saturated or unsaturated C1-C6 alkyl; or R3 and R4 are each independently optionally substituted branched or unbranched, saturated or unsaturated C1-C6 alkyl when L is C6-C12 alkylene, C6-C12 alkenylene, or C2-C6 alkynylene; or R3 and R4, together with the nitrogen to which they are attached, join to form a heterocyclyl; R5 is H or optionally substituted C1-C6 alkyl; L is -C(=O)-, C6-C12 alkylene, C6-C12 alkenylene, or C2-C6 alkynylene; and n is an integer from 1 to 12. Formula (AVII) In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000120_0002
(AVII), or a pharmaceutically acceptable salt thereof, wherein: each R1a is independently hydrogen, R1c, or R1d; each R1b is independently R1c or R1d; each R1c is independently –[CH2]2C(O)X1R3; each R1d is independently -C(O)R4; each R2 is independently -[C(R2a)2]cR2b; each R2a is independently hydrogen or C1-C6 alkyl; R2b is -N(L1-B)2; -(OCH2CH2)6OH; or -(OCH2CH2)bOCH3; each R3 and R4 is independently C6-C30 aliphatic; each L1 is independently C1-C10 alkylene; each B is independently hydrogen or an ionizable nitrogen-containing group; each X1 is independently a covalent bond or O; each a is independently an integer of 1-10; each b is independently an integer of 1-10; and each c is independently an integer of 1-10. Formula (AVIII) In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000121_0001
pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: X is N, and Y is absent; or X is CR, and Y is NR; L1 is -O(C=O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)xR1, -S-SR1, -C(=O)SR1, -SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc, or -NRaC(=O)OR1; L2 is -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)xR2, -S-SR2, -C(=O)SR2, -SC(=O)R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, -OC(=O)NReRf, -NRdC(=O)OR2 or a direct bond to R2; L3 is -O(C=O)R3 or -(C=O)OR3; G1 and G2 are each independently C2-C12 alkylene or C2-C12 alkenylene; G3 is C1-C24 alkylene, C2-C24 alkenylene, C1-C24 heteroalkylene or C2-C24 heteroalkenylene when X is CR, and Y is NR; and G3 is C1-C24 heteroalkylene or C2-C24 heteroalkenylene when X is N, and Y is absent; Ra, Rb, Rd and Re are each independently H or C1-C12 alkyl or C1-C12 alkenyl; Rc and Rf are each independently C1-C12 alkyl or C2-C12 alkenyl; each R is independently H or C1-C12 alkyl; R1, R2 and R3 are each independently C1-C24 alkyl or C2-C24 alkenyl; and x is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, heteroalkylene and heteroalkenylene is independently substituted or unsubstituted unless otherwise specified. Formula (AIX) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: pharmaceutically acceptable salt, tautomer,
Figure imgf000122_0001
or L1 and L2 are each independently -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, -SC(=O)-, -NRaC(=O)-, -C(=O)NRa-, -NRaC(=O)NRa-, -OC(=O)NRa-, -NRaC(=O)O- or a direct bond; G1 is C1-C2 alkylene, -(C=O)-, -O(C=O)-, -SC(=O)-, -NRaC(=O)- or a direct bond; G2 is -C(=O)-, -(C=O)O-, -C(=O)S-, -C(=O)NRa- or a direct bond; G3 is C1-C6 alkylene; Ra is H or C1-C12 alkyl; R1a and R1b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R1a is H or C1-C12 alkyl, and R1b together with the carbon atom to which it is bound is taken together with an adjacent R1b and the carbon atom to which it is bound to form a carbon-carbon double bond; R2a and R2b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R2a is H or C1-C12 alkyl, and R2b together with the carbon atom to which it is bound is taken together with an adjacent R2b and the carbon atom to which it is bound to form a carbon-carbon double bond; R3a and R3b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R3a is H or C1-C12 alkyl, and R3b together with the carbon atom to which it is bound is taken together with an adjacent R and the carbon atom to which it is bound to form a carbon-carbon double bond; R4a and R4b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R4a is H or C1-C12 alkyl, and R4b together with the carbon atom to which it is bound is taken together with an adjacent R4b and the carbon atom to which it is bound to form a carbon-carbon double bond; R5 and R6 are each independently H or methyl; R7 is H or C1-C20 alkyl; R8 is OH, -N(R9)(C=O)R10, -(C=O)NR9R10, -NR9R10, -(C=O)OR11 or -O(C=O)R11, provided that G3 is C4-C6 alkylene when R8 is -NR9R10; R9 and R10 are each independently H or C1-C12 alkyl; R11 is aralkyl; a, b, c and d are each independently an integer from 1 to 24; and x is 0, 1 or 2, wherein each alkyl, alkylene and aralkyl is optionally substituted. 
Formula (AX) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: pharmaceutically acceptable salt, prodrug or
Figure imgf000123_0001
stereoisomer thereof, wherein: X and X' are each independently N or CR; Y and Y' are each independently absent, -O(C=O)-, -(C=O)O- or NR, provided that: a) Y is absent when X is N; b) Y' is absent when X' is N; c) Y is -O(C=O)-, -(C=O)O- or NR when X is CR; and d) Y' is -O(C=O)-, -(C=O)O- or NR when X' is CR; L1 and L1' are each independently -O(C=O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)zR1, -S-SR1, -C(=O)SR1, -SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc or -NRaC(=O)OR1; L2 and L2’ are each independently -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)zR2, - R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, - OR2 or a direct bond to R2;
Figure imgf000123_0002
are each independently C2-C12 alkylene or C2-C12 alkenylene; G is C2-C24 heteroalkylene or C2-C24 heteroalkenylene; Ra, Rb, Rd and Re are, at each occurrence, independently H, C1-C12 alkyl or C2-C12 alkenyl; Rc and Rf are, at each occurrence, independently C1-C12 alkyl or C2-C12 alkenyl; R is, at each occurrence, independently H or C1-C12 alkyl; R1 and R2 are, at each occurrence, independently branched C6-C24 alkyl or branched C6-C24 alkenyl; z is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, heteroalkylene and heteroalkenylene is independently substituted or unsubstituted unless otherwise specified. Formula (AXI) In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000124_0002
G1 and G2 are each independently C2-C12 alkylene or C2-C12 alkenylene; G3 is C1-C24 alkylene, C2-C24 alkenylene, C3-C8 cycloalkylene or C3-C8 cycloalkenylene; Ra, Rb, Rd and Re are each independently H or C1-C12 alkyl or C1-C12 alkenyl; Rc and Rf are each independently C1-C12 alkyl or C2-C12 alkenyl; R1 and R2 are each independently branched C6-C24 alkyl or branched C6-C24 alkenyl; R3 is -N(R4)R5; R4 is C1-C12 alkyl; R5 is substituted C1-C12 alkyl; and x is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, cycloalkylene, cycloalkenylene, aryl and aralkyl is independently substituted or unsubstituted unless otherwise specified. In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000124_0001
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: L1 is -O(C=O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)xR1, -S-SR1, -C(=O)SR1, -SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc or -NRaC(=O)OR1; L2 is -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)xR2, -S-SR2, -C(=O)SR2, -SC(=O)R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, -OC(=O)NReRf, -NRdC(=O)OR2 or a direct bond to R2; G1a and G2b are each independently C2-C12 alkylene or C2-C12 alkenylene; G1b and G2b are each independently C1-C12 alkylene or C2-C12 alkenylene; G3 is C1-C24 alkylene, C2-C24 alkenylene, C3-C8 cycloalkylene or C3-C8 cycloalkenylene; Ra, Rb, Rd and Re are each independently H or C1-C12 alkyl or C2-C12 alkenyl; Rc and Rf are each independently C1-C12 alkyl or C2-C12 alkenyl; R1 and R2 are each independently branched C6-C24 alkyl or branched C6-C24 alkenyl; R3a is -C(=O)N(R4a)R5a or -C(=O)OR6; R3b is -NR4bC(=O)R5b; R4a is C1-C12 alkyl; R4b is H, C1-C12 alkyl or C2-C12 alkenyl; R5a is H, C1-C8 alkyl or C2-C8 alkenyl; R5b is C2-C12 alkyl or C2-C12 alkenyl when R4b is H; or R5b is C1-C12 alkyl or C2-C12 alkenyl when R4b is C1-C12 alkyl or C2-C12 alkenyl; R6 is H, aryl or aralkyl; and x is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, cycloalkylene, cycloalkenylene, aryl and aralkyl is independently substituted or unsubstituted. Formula (AXII) In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000125_0001
acceptable salt, prodrug or stereoisomer thereof, wherein: G1 is -OH, -R3R4, -(C=O)R5 or -R3(C=O)R5; G2 is -CH2- or -(C=O)-; R is, at each occurrence, independently H or OH; R1 and R2 are each independently optionally substituted branched, saturated or unsaturated C12-C36 alkyl; R3 and R4 are each independently H or optionally substituted straight or branched, saturated or unsaturated C1-C6 alkyl; R5 is optionally substituted straight or branched, saturated or unsaturated C1-C6 alkyl; and n is an integer from 2 to 6. Formula (AXIII) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: pharmaceutically acceptable salt, prodrug or one of G1 or G2 is, at each occurrence, -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)y-, -S-S-, -C(=O)S-, -SC(=O)-, -N(Ra)C(=O)-, -C(=O)N(Ra)-, -N(Ra)C(=O)N(Ra)-, -OC(=O)N(Ra)- or -N(Ra)C(=O)O-, and the other of G1 or G2 is, at each occurrence, -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)y-, -S-S-, -C(=O)S-, -SC(=O)-, -N(Ra)C(=O)-, -C(=O)N(Ra)-, -N(Ra)C(=O)N(Ra)-, -OC(=O)N(Ra)- or -N(Ra)C(=O)O- or a direct bond; L is, at each occurrence, ~O(C=O)-, wherein ~ represents a covalent bond to X; X is CRa; Z is alkyl, cycloalkyl or a monovalent moiety comprising at least one polar functional group when n is 1; or Z is alkylene, cycloalkylene or a polyvalent moiety comprising at least one polar functional group when n is greater than 1; Ra is, at each occurrence, independently H, C1-C12 alkyl, C1-C12 hydroxylalkyl, C1-C12 aminoalkyl, C1-C12 alkylaminylalkyl, C1-C12 alkoxyalkyl, C1-C12 alkoxycarbonyl, C1-C12 alkylcarbonyloxy, C1-C12 alkylcarbonyloxyalkyl or C1-C12 alkylcarbonyl; R is, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R together with the carbon atom to which it is bound is taken together with an adjacent R and the carbon atom to which it is bound to form a carbon-carbon double bond; R1 and R2 have, at each occurrence, the following structure, respectively:
Figure imgf000126_0001
a1 and a2 are, at each occurrence, independently an integer from 3 to 12; b1 and b2 are, at each occurrence, independently 0 or 1; c1 and c2 are, at each occurrence, independently an integer from 5 to 10; d1 and d2 are, at each occurrence, independently an integer from 5 to 10; y is, at each occurrence, independently an integer from 0 to 2; and n is an integer from 1 to 6, wherein each alkyl, alkylene, hydroxylalkyl, aminoalkyl, alkylaminylalkyl, alkoxyalkyl, alkoxycarbonyl, alkylcarbonyloxy, alkylcarbonyloxyalkyl and alkylcarbonyl is optionally substituted with one or more substituent. Formula (AXIV) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: pharmaceutically acceptable salt, prodrug or
Figure imgf000127_0001
stereoisomer thereof, wherein: one of L1 or L2 is -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, - the other of L1
Figure imgf000127_0002
-, -RaC(=O)-, -C(=O)Ra-, RaC(=O)Ra-, -OC(=O)Ra- or -NRaC(=O)O- or a direct bond; G1 and G2 are each independently unsubstituted C1-C12 alkylene or C1-C12 alkenylene; G3 is C1-C24 alkylene, C1-C24 alkenylene, C3-C8 cycloalkylene, C3-C8 cycloalkenylene; Ra is H or C1-C12 alkyl; R1 and R2 are each independently C6-C24 alkyl or C6-C24 alkenyl; R3 is H, OR5, CN, -C(=O)OR4, -OC(=O)R4 or -R5C(=O)R4; R4 is C1-C12 alkyl; R5 is H or C1-C6 alkyl; and x is 0, 1 or 2. Formula (AXV) In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000127_0003
(AXV), or a pharmaceutically acceptable salt, tautomer, prodrug or stereoisomer thereof, wherein: L1 and L2 are each independently -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, -SC(=O)-, -RaC(=O)-, -C(=O)Ra-, -RaC(=O)Ra-, -OC(=O)Ra-, -RaC(=O)O- or a direct bond; G1 is C1-C2 alkylene, -(C=O)-, -O(C=O)-, -SC(=O)-, -RaC(=O)- or a direct bond; G2 is -C(=O)-, -(C=O)O-, -C(=O)S-, -C(=O)NRa- or a direct bond; G3 is C1-C6 alkylene; Ra is H or C1-C12 alkyl; R1a and R1b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R1a is H or C1-C12 alkyl, and R1b together with the carbon atom to which it is bound is taken together with an adjacent R1b and the carbon atom to which it is bound to form a carbon-carbon double bond; R2a and R2b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R2a is H or C1-C12 alkyl, and R2b together with the carbon atom to which it is bound is taken together with an adjacent R2b and the carbon atom to which it is bound to form a carbon-carbon double bond; R3a and R3b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R3a is H or C1-C12 alkyl, and R3b together with the carbon atom to which it is bound is taken together with an adjacent R and the carbon atom to which it is bound to form a carbon-carbon double bond; R4a and R4b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R4a is H or C1-C12 alkyl, and R4b together with the carbon atom to which it is bound is taken together with an adjacent R4b and the carbon atom to which it is bound to form a carbon-carbon double bond; R5 and R6 are each independently H or methyl; R7 is C4-C20 alkyl; R8 and R9 are each independently C1-C12 alkyl; or R8 and R9, together with the nitrogen atom to which they are attached, form a 5, 6 or 7-membered heterocyclic ring; a, b, c and d are each independently an integer from 1 to 24; and x is 0, 1 or 2. 
Formula (AXVI) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: pharmaceutically acceptable salt, tautomer, L1 and L2 are each independently -O(C=O)-, -(C=O)O- or a carbon-carbon double bond; R1a and R1b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R1a is H or C1-C12 alkyl, and R1b together with the carbon atom to which it is bound is taken together with an adjacent R1b and the carbon atom to which it is bound to form a carbon-carbon double bond; R2a and R2b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R2a is H or C1-C12 alkyl, and R2b together with the carbon atom to which it is bound is taken together with an adjacent R2b and the carbon atom to which it is bound to form a carbon-carbon double bond; R3a and R3b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R3a is H or C1-C12 alkyl, and R3b together with the carbon atom to which it is bound is taken together with an adjacent R3b and the carbon atom to which it is bound to form a carbon-carbon double bond; R4a and R4b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R4a is H or C1-C12 alkyl, and R4b together with the carbon atom to which it is bound is taken together with an adjacent R4b and the carbon atom to which it is bound to form a carbon-carbon double bond; R5 and R6 are each independently methyl or cycloalkyl; R7 is, at each occurrence, independently H or C1-C12 alkyl; R8 and R9 are each independently unsubstituted C1-C12 alkyl; or R8 and R9, together with the nitrogen atom to which they are attached, form a 5, 6 or 7- membered heterocyclic ring comprising one nitrogen atom; a and d are each independently an integer from 0 to 24; b and c are each independently an integer from 1 to 24; and e is 1 or 2, provided that: at least one of R1a, R2a, R3a or R4a is C1-C12 alkyl, or at least one of L1 or L2 is -O(C=O)- or -(C=O)O-; and R1a and R1b are not isopropyl 
when a is 6 or n-butyl when a is 8. Formula (AXVII) In some embodiments, the lipid nanoparticle comprises a lipid having the structure: pharmaceutically acceptable salt thereof, wherein
Figure imgf000130_0001
are same or a linear or branched alkyl with 1-9 carbons, or as alkenyl or alkynyl with 2 to 11 carbon atoms, L1 and L2 are the same or different, each a linear alkyl having 5 to 18 carbon atoms, or form a heterocycle with N, X1 is a bond, or is -CO-O- whereby L2-CO-O-R2 is formed, X2 is S or O, L3 is a bond or a lower alkyl, or form a heterocycle with N, R3 is a lower alkyl, and R4 and R5 are the same or different, each a lower alkyl. Compounds (A1)-(A11) In some embodiments, the lipid nanoparticle comprises an ionizable lipid having the structure:
Figure imgf000130_0002
, or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000130_0003
, or a pharmaceutically acceptable salt thereof.
Figure imgf000130_0004
(A3), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A4), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000131_0001
a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000131_0002
(A6), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000131_0003
(A7), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000131_0004
(A8), or a pharmaceutically acceptable salt thereof. In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (A9), or a pharmaceutically acceptable salt thereof.
Figure imgf000132_0001
In some embodiments, the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
Figure imgf000132_0002
In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
Figure imgf000132_0003
(A11), or a pharmaceutically acceptable salt thereof. Non-cationic lipids In certain embodiments, the lipid nanoparticles comprise one or more non-cationic lipids. Non-cationic lipids may be phospholipids. In some embodiments, the lipid nanoparticle comprises 5-25 mol% non-cationic lipid. For example, the lipid nanoparticle may comprise 5-20 mol%, 5-15 mol%, 5-10 mol%, 10-25 mol%, 10-20 mol%, 10-25 mol%, 15-25 mol%, 15-20 mol%, or 20-25 mol% non-cationic lipid. In some embodiments, the lipid nanoparticle comprises 5 mol%, 10 mol%, 15 mol%, 20 mol%, or 25 mol% non-cationic lipid. In some embodiments, a non-cationic lipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG), sphingomyelin, or mixtures thereof. In some embodiments, the lipid nanoparticle comprises 5–15 mol%, 5–10 mol%, or 10–15 mol% DSPC. 
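By way of illustration only, the mol% figures recited above can be read as each lipid's share of the total moles of lipid in the particle. The following minimal sketch, which is not part of the disclosed embodiments, computes mol% from molar amounts and checks a value against a recited range. The function names, the component labels, and the numerical amounts are hypothetical and chosen solely for the example.

```python
def mol_percent(moles_by_lipid):
    """Convert molar amounts of each lipid into mol% of total lipid."""
    total = sum(moles_by_lipid.values())
    if total <= 0:
        raise ValueError("total lipid amount must be positive")
    return {name: 100.0 * amount / total
            for name, amount in moles_by_lipid.items()}


def in_range(value, low, high):
    """Return True if value lies within the inclusive range [low, high]."""
    return low <= value <= high


# Hypothetical molar amounts (arbitrary units); not taken from the source.
example = {
    "ionizable amino lipid": 50.0,
    "DSPC": 10.0,
    "other lipids": 40.0,
}

percentages = mol_percent(example)

# Check the DSPC share against the 5-15 mol% range recited above.
dspc_within_range = in_range(percentages["DSPC"], 5.0, 15.0)
```

The check is inclusive at both ends, which is an assumption of this sketch about how the recited endpoints are read.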
For example, the lipid nanoparticle may comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 mol% DSPC. In certain embodiments, the lipid composition of the lipid nanoparticle composition disclosed herein can comprise one or more phospholipids, for example, one or more saturated or (poly)unsaturated phospholipids or a combination thereof. In general, phospholipids comprise a phospholipid moiety and one or more fatty acid moieties. A phospholipid moiety can be selected, for example, from the non-limiting group consisting of phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl glycerol, phosphatidyl serine, phosphatidic acid, 2-lysophosphatidyl choline, and a sphingomyelin. A fatty acid moiety can be selected, for example, from the non-limiting group consisting of lauric acid, myristic acid, myristoleic acid, palmitic acid, palmitoleic acid, stearic acid, oleic acid, linoleic acid, alpha-linolenic acid, erucic acid, phytanoic acid, arachidic acid, arachidonic acid, eicosapentaenoic acid, behenic acid, docosapentaenoic acid, and docosahexaenoic acid. Particular phospholipids can facilitate fusion to a membrane. For example, a cationic phospholipid can interact with one or more negatively charged phospholipids of a membrane (e.g., a cellular or intracellular membrane). Fusion of a phospholipid to a membrane can allow one or more elements (e.g., a therapeutic agent) of a lipid-containing composition (e.g., LNPs) to pass through the membrane permitting, e.g., delivery of the one or more elements to a target tissue. Non-natural phospholipid species including natural species with modifications and substitutions including branching, oxidation, cyclization, and alkynes are also contemplated. For example, a phospholipid can be functionalized with or cross-linked to one or more alkynes (e.g., an alkenyl group in which one or more double bonds is replaced with a triple bond). 
Under appropriate reaction conditions, an alkyne group can undergo a copper-catalyzed cycloaddition upon exposure to an azide. Such reactions can be useful in functionalizing a lipid bilayer of a nanoparticle composition to facilitate membrane permeation or cellular recognition or in conjugating a nanoparticle composition to a useful component such as a targeting or imaging moiety (e.g., a dye). Phospholipids include, but are not limited to, glycerophospholipids such as phosphatidylcholines, phosphatidylethanolamines, phosphatidylserines, phosphatidylinositols, phosphatidylglycerols, and phosphatidic acids. Phospholipids also include phosphosphingolipids, such as sphingomyelin. In some embodiments, a phospholipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine (DSPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG), sphingomyelin, or mixtures thereof. 
Formula (HI) In certain embodiments, a phospholipid is an analog or variant of DSPC. In certain embodiments, a phospholipid is a compound of Formula (HI):
Figure imgf000134_0001
(HI), or a salt thereof, wherein: each R1 is independently optionally substituted alkyl; or optionally two R1 are joined together with the intervening atoms to form optionally substituted monocyclic carbocyclyl or optionally substituted monocyclic heterocyclyl; or optionally three R1 are joined together with the intervening atoms to form optionally substituted bicyclic carbocyclyl or optionally substituted bicyclic heterocyclyl; n is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; m is 0, 1, 2, 3, 4, 5,
Figure imgf000134_0002
A is of the formula: or ; each instance of L2 is independently a bond or optionally substituted C1-6 alkylene, wherein one methylene unit of the optionally substituted C1-6 alkylene is optionally replaced with O, N(RN), S, C(O), C(O)N(RN), NRNC(O), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, or NRNC(O)N(RN); each instance of R2 is independently optionally substituted C1-30 alkyl, optionally substituted C1-30 alkenyl, or optionally substituted C1-30 alkynyl; optionally wherein one or more methylene units of R2 are independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted ,
Figure imgf000135_0002
, N(RN)S(O)O, S(O)2, N(RN)S(O)2, S(O)2N(RN), N(RN)S(O)2N(RN), OS(O)2N(RN), or N(RN)S(O)2O; each instance of RN is independently hydrogen, optionally substituted alkyl, or a nitrogen protecting group; Ring B is optionally substituted carbocyclyl, optionally substituted heterocyclyl, optionally substituted aryl, or optionally substituted heteroaryl; and p is 1 or 2. In certain embodiments, the compound is not of the formula:
Figure imgf000135_0001
, wherein each instance of R2 is independently unsubstituted alkyl, unsubstituted alkenyl, or unsubstituted alkynyl. In some embodiments, the phospholipids may be one or more of the phospholipids described in PCT Application No. PCT/US2018/037922. In some embodiments, the lipid nanoparticle comprises a molar ratio of 5-25% non-cationic lipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 15-25%, 15-20%, 20-25%, or 25-30% non-cationic lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% non-cationic lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 5-25% phospholipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 15-25%, 15-20%, 20-25%, or 25-30% phospholipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% phospholipid. Structural lipids The lipid composition of a pharmaceutical composition disclosed herein can comprise one or more structural lipids. As used herein, the term “structural lipid” refers to sterols and also to lipids containing sterol moieties. Incorporation of structural lipids in the lipid nanoparticle may help mitigate aggregation of other lipids in the particle. Structural lipids can be selected from the group including, but not limited to, cholesterol, fecosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, alpha-tocopherol, hopanoids, phytosterols, steroids, and mixtures thereof. In some embodiments, the structural lipid is a sterol. As defined herein, “sterols” are a subgroup of steroids consisting of steroid alcohols. In certain embodiments, the structural lipid is a steroid. In certain embodiments, the structural lipid is cholesterol. 
In certain embodiments, the structural lipid is an analog of cholesterol. In certain embodiments, the structural lipid is alpha-tocopherol. In some embodiments, the structural lipids may be one or more of the structural lipids described in U.S. Application No.16/493,814. In some embodiments, the lipid nanoparticle comprises a molar ratio of 25-55% structural lipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 10-55%, 25-50%, 25-45%, 25-40%, 25-35%, 25-30%, 30-55%, 30- 50%, 30-45%, 30-40%, 30-35%, 35-55%, 35-50%, 35-45%, 35-40%, 40-55%, 40-50%, 40-45%, 45-55%, 45-50%, or 50-55% structural lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or 55% structural lipid. In some embodiments, the lipid nanoparticle comprises 30-45 mol% sterol, optionally 35- 40 mol%, for example, 30-31 mol%, 31-32 mol%, 32-33 mol%, 33-34 mol%, 34-35 mol%, 35- 36 mol%, 36-37 mol%, 37-38 mol%, 38-39 mol%, or 39-40 mol%. In some embodiments, the lipid nanoparticle comprises 25-55 mol% sterol. For example, the lipid nanoparticle may comprise 25-50 mol%, 25-45 mol%, 25-40 mol%, 25-35 mol%, 25-30 mol%, 30-55 mol%, 30- 50 mol%, 30-45 mol%, 30-40 mol%, 30-35 mol%, 35-55 mol%, 35-50 mol%, 35-45 mol%, 35- 40 mol%, 40-55 mol%, 40-50 mol%, 40-45 mol%, 45-55 mol%, 45-50 mol%, or 50-55 mol% sterol. In some embodiments, the lipid nanoparticle comprises 25 mol%, 30 mol%, 35 mol%, 40 mol%, 45 mol%, 50 mol%, or 55 mol% sterol. In some embodiments, the lipid nanoparticle comprises 35 – 40 mol% cholesterol. For example, the lipid nanoparticle may comprise 35, 35.5, 36, 36.5, 37, 37.5, 38, 38.5, 39, 39.5, or 40 mol% cholesterol. Polyethylene glycol (PEG)-Lipids The lipid composition of a pharmaceutical composition disclosed herein can comprise one or more polyethylene glycol (PEG) lipids. 
As used herein, the term “PEG-lipid” or “PEG-modified lipid” refers to polyethylene glycol (PEG)-modified lipids. Non-limiting examples of PEG-lipids include PEG-modified phosphatidylethanolamine and phosphatidic acid, PEG-ceramide conjugates (e.g., PEG-CerC14 or PEG-CerC20), PEG-modified dialkylamines, and PEG-modified 1,2-diacyloxypropan-3-amines. Such lipids are also referred to as PEGylated lipids. For example, a PEG lipid can be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid. In some embodiments, the PEG-lipid includes, but is not limited to, 1,2-dimyristoyl-sn-glycerol methoxypolyethylene glycol (PEG-DMG), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[amino(polyethylene glycol)] (PEG-DSPE), PEG-disteryl glycerol (PEG-DSG), PEG-dipalmetoleyl, PEG-dioleyl, PEG-distearyl, PEG-diacylglycamide (PEG-DAG), PEG-dipalmitoyl phosphatidylethanolamine (PEG-DPPE), or PEG-1,2-dimyristyloxypropyl-3-amine (PEG-c-DMA). In some embodiments, the PEG-lipid is selected from the group consisting of a PEG-modified phosphatidylethanolamine, a PEG-modified phosphatidic acid, a PEG-modified ceramide, a PEG-modified dialkylamine, a PEG-modified diacylglycerol, a PEG-modified dialkylglycerol, and mixtures thereof. In some embodiments, the PEG-modified lipid is PEG-DMG, PEG-c-DOMG (also referred to as PEG-DOMG), PEG-DSG, and/or PEG-DPG. In some embodiments, the lipid moiety of the PEG-lipids includes those having lengths of from about C14 to about C22, preferably from about C14 to about C16. In some embodiments, a PEG moiety, for example an mPEG-NH2, has a size of about 1000, 2000, 5000, 10,000, 15,000 or 20,000 daltons. In some embodiments, the PEG-lipid is PEG2k-DMG. In some embodiments, the lipid nanoparticles can comprise a PEG lipid which is a non-diffusible PEG. Non-limiting examples of non-diffusible PEGs include PEG-DSG and PEG-DSPE. PEG-lipids are known in the art, such as those described in U.S. Patent No. 8158601 and International Publ. No. 
WO 2015/130584 A2, which are incorporated herein by reference in their entirety. In general, some of the other lipid components (e.g., PEG lipids) of various formulae may be synthesized as described in International Patent Application No. PCT/US2016/000129, filed December 10, 2016, entitled “Compositions and Methods for Delivery of Therapeutic Agents,” which is incorporated by reference in its entirety. The lipid component of a lipid nanoparticle composition may include one or more molecules comprising polyethylene glycol, such as PEG or PEG-modified lipids. Such species may be alternately referred to as PEGylated lipids. A PEG lipid is a lipid modified with polyethylene glycol. A PEG lipid may be selected from the non-limiting group including PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof. For example, a PEG lipid may be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid. In some embodiments the PEG-modified lipids are a modified form of PEG-DMG. PEG-DMG has the following structure:
Figure imgf000138_0001
In some embodiments, PEG lipids can be PEGylated lipids described in International Publication No. WO2012099755, the contents of which are herein incorporated by reference in their entirety. Any of these exemplary PEG lipids may be modified to comprise a hydroxyl group on the PEG chain. In certain embodiments, the PEG lipid is a PEG-OH lipid. As generally defined herein, a “PEG-OH lipid” (also referred to herein as “hydroxy-PEGylated lipid”) is a PEGylated lipid having one or more hydroxyl (–OH) groups on the lipid. In certain embodiments, the PEG-OH lipid includes one or more hydroxyl groups on the PEG chain. In certain embodiments, a PEG-OH or hydroxy-PEGylated lipid comprises an –OH group at the terminus of the PEG chain. Each possibility represents a separate embodiment. Formula (PI) In certain embodiments, a PEG lipid is a compound of Formula (PI):
Figure imgf000138_0002
(PI), or salts thereof, wherein: R3 is –ORO; RO is hydrogen, optionally substituted alkyl, or an oxygen protecting group; r is an integer between 1 and 100, inclusive; L1 is optionally substituted C1-10 alkylene, wherein at least one methylene of the optionally substituted C1-10 alkylene is independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, O, N(RN), S, C(O), C(O)N(RN), NRNC(O), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, or NRNC(O)N(RN); D is a moiety obtained by click chemistry or a moiety cleavable under physiological conditions; m is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; A is of the formula: ; 2
Figure imgf000139_0001
each instance of L2 is independently a bond or optionally substituted C1-6 alkylene, wherein one methylene unit of the optionally substituted C1-6 alkylene is optionally replaced with O, N(RN), S, C(O), C(O)N(RN), NRNC(O), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, or NRNC(O)N(RN); each instance of R2 is independently optionally substituted C1-30 alkyl, optionally substituted C1-30 alkenyl, or optionally substituted C1-30 alkynyl; optionally wherein one or more methylene units of R2 are independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, N(RN), O, S, C(O), C(O)N(RN), NRNC(O), NRNC(O)N(RN), C(O)O, OC(O), , , N
Figure imgf000139_0002
N(RN)S(O)2O; each instance of RN is independently hydrogen, optionally substituted alkyl, or a nitrogen protecting group; Ring B is optionally substituted carbocyclyl, optionally substituted heterocyclyl, optionally substituted aryl, or optionally substituted heteroaryl; and p is 1 or 2. In certain embodiments, the compound of Formula (PI) is a PEG-OH lipid (i.e., R3 is –ORO, and RO is hydrogen). In certain embodiments, the compound of Formula (PI) is of Formula
Figure imgf000139_0003
(PI-OH), or a salt thereof. Formula (PII) In certain embodiments, a PEG lipid is a PEGylated fatty acid. In certain embodiments, a PEG lipid is a compound of Formula (PII). In some embodiments, compounds of Formula (PII) have the following formula: (PII), or a salt thereof, wherein:
Figure imgf000140_0001
R3 is–ORO; RO is hydrogen, optionally substituted alkyl or an oxygen protecting group; r is an integer between 1 and 100, inclusive; R5 is optionally substituted C10-40 alkyl, optionally substituted C10-40 alkenyl, or optionally substituted C10-40 alkynyl; and optionally one or more methylene groups of R5 are replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, N(RN), O, S, C(O), C(O)N(RN), - NRNC(O), NRNC(O)N(RN), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, C(O)S, SC(O), C , - S , - N
Figure imgf000140_0002
each instance of RN is independently hydrogen, optionally substituted alkyl, or a nitrogen protecting group. In certain embodiments, the compound of Formula (PII) is of Formula (PII-OH):
Figure imgf000140_0003
(PII-OH), or a salt thereof. In some embodiments, r is 40-50. In some embodiments, the compound of Formula (PII) is:
Figure imgf000140_0004
. or a salt thereof.
Figure imgf000140_0005
. In some embodiments, the lipid composition of the pharmaceutical compositions disclosed herein does not comprise a PEG-lipid. In some embodiments, the PEG-lipids may be one or more of the PEG lipids described in U.S. Application No. US15/674,872. In some embodiments, the lipid nanoparticle comprises a molar ratio of 0.5-15% PEG lipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 0.5-10%, 0.5-5%, 1-15%, 1-10%, 1-5%, 2-15%, 2-10%, 2-5%, 5-15%, 5-10%, or 10-15% PEG lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, or 15% PEG-lipid. In some embodiments, the lipid nanoparticle comprises 1-5% PEG-modified lipid, optionally 1-3 mol%, for example 1.5 to 2.5 mol%, 1-2 mol%, 2-3 mol%, 3-4 mol%, or 4-5 mol%. In some embodiments, the lipid nanoparticle comprises 0.5-15 mol% PEG-modified lipid. For example, the lipid nanoparticle may comprise 0.5-10 mol%, 0.5-5 mol%, 1-15 mol%, 1-10 mol%, 1-5 mol%, 2-15 mol%, 2-10 mol%, 2-5 mol%, 5-15 mol%, 5-10 mol%, or 10-15 mol%. In some embodiments, the lipid nanoparticle comprises 0.5 mol%, 1 mol%, 2 mol%, 3 mol%, 4 mol%, 5 mol%, 6 mol%, 7 mol%, 8 mol%, 9 mol%, 10 mol%, 11 mol%, 12 mol%, 13 mol%, 14 mol%, or 15 mol% PEG-modified lipid. Some embodiments comprise adding PEG to a composition comprising an LNP encapsulating a nucleic acid (e.g., which already includes PEG in the amounts listed above). Some embodiments comprise adding about 0.5 mol% or more PEG to an LNP composition, such as about 1 mol%, about 1.5 mol%, about 2 mol%, about 2.5 mol%, about 3 mol%, about 3.5 mol%, about 4 mol%, about 5 mol%, or more after formation of an LNP composition (e.g., which already contains PEG in the amounts listed elsewhere herein). In some embodiments, the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid, 5-25 mol% non-cationic lipid, 25-55 mol% sterol, and 0.5-15 mol% PEG-modified lipid. 
In some embodiments, an LNP comprises an ionizable amino lipid of Compound 1, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG. In some embodiments, an LNP comprises an ionizable amino lipid of Compound 2, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG. In some embodiments, an LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising PEG-DMG. In some embodiments, an LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising a compound having Formula (PII). In some embodiments, an LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid comprising a compound having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PI) or (PII). In some embodiments, an LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PII). In some embodiments, the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 10 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG. In some embodiments, the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 1.5 mol% DMG-PEG. In some embodiments, the lipid nanoparticle comprises 48 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG. 
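As an illustration only, the following minimal Python sketch checks that a four-component lipid composition sums to 100 mol% and that each component falls within the broad ranges recited above (20-60 mol% ionizable amino lipid, 5-25 mol% non-cationic lipid, 25-55 mol% sterol, and 0.5-15 mol% PEG-modified lipid). The dictionary keys and the function name `check_composition` are hypothetical names chosen for this sketch and are not part of the disclosure; the numerical values are the three exemplary compositions recited above.

```python
# Illustrative sketch only. Component names and the function name are
# hypothetical; the ranges and example values are taken from the text above.
RANGES_MOL_PERCENT = {
    "ionizable_amino_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),
    "sterol": (25.0, 55.0),
    "peg_lipid": (0.5, 15.0),
}


def check_composition(mol_percent, tol=1e-9):
    """Return True if the composition sums to 100 mol% and every
    component lies within its recited range (inclusive)."""
    total = sum(mol_percent[name] for name in RANGES_MOL_PERCENT)
    if abs(total - 100.0) > tol:
        return False
    return all(
        low <= mol_percent[name] <= high
        for name, (low, high) in RANGES_MOL_PERCENT.items()
    )


# The three exemplary compositions recited above (DSPC as the non-cationic
# lipid, cholesterol as the sterol, DMG-PEG as the PEG lipid).
EXAMPLES = [
    {"ionizable_amino_lipid": 49.0, "non_cationic_lipid": 10.0,
     "sterol": 38.5, "peg_lipid": 2.5},
    {"ionizable_amino_lipid": 49.0, "non_cationic_lipid": 11.0,
     "sterol": 38.5, "peg_lipid": 1.5},
    {"ionizable_amino_lipid": 48.0, "non_cationic_lipid": 11.0,
     "sterol": 38.5, "peg_lipid": 2.5},
]

if __name__ == "__main__":
    for example in EXAMPLES:
        print(example, check_composition(example))
```

Each of the three exemplary compositions sums to 100 mol%, so the sketch can serve as a simple consistency check when tabulating alternative formulations.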
In some embodiments, an LNP comprises an N:P ratio of from about 2:1 to about 30:1. In some embodiments, an LNP comprises an N:P ratio of about 6:1. In some embodiments, an LNP comprises an N:P ratio of about 3:1, 4:1, or 5:1. In some embodiments, an LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of from about 10:1 to about 100:1. In some embodiments, an LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 20:1. In some embodiments, an LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 10:1. Some embodiments comprise a composition having one or more LNPs having a diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less. Some embodiments comprise a composition having a mean LNP diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less. In some embodiments, the composition has a mean LNP diameter from about 30 nm to about 150 nm, or a mean diameter from about 60 nm to about 120 nm. An LNP may comprise one or more types of lipids, including but not limited to amino lipids (e.g., ionizable amino lipids), neutral lipids, non-cationic lipids, charged lipids, PEG-modified lipids, phospholipids, structural lipids and sterols. In some embodiments, an LNP may further comprise one or more cargo molecules, including but not limited to nucleic acids (e.g., mRNA, plasmid DNA, DNA or RNA oligonucleotides, siRNA, shRNA, snRNA, snoRNA, lncRNA, etc.), small molecules, proteins and peptides. In some embodiments, the composition comprises a liposome. A liposome is a lipid particle comprising lipids arranged into one or more concentric lipid bilayers around a central region. The central region of a liposome may comprise an aqueous solution, suspension, or other aqueous composition. 
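By way of a hedged illustration, the N:P ratio and the wt/wt ratio of ionizable amino lipid to RNA recited above can be related by simple arithmetic, because the N:P ratio is the molar ratio of ionizable nitrogen atoms in the lipid to phosphate groups in the nucleic acid. The sketch below rests on assumptions that are not stated in the text: an average nucleotide residue mass of about 330 g/mol with one phosphate per nucleotide, a hypothetical lipid molecular weight supplied by the caller, and one ionizable nitrogen per lipid molecule by default. The function name `n_to_p_ratio` is likewise hypothetical.

```python
# Illustrative sketch only. The default nucleotide mass (330 g/mol) and
# the single ionizable nitrogen per lipid are assumptions for this example,
# not values taken from the disclosure.
def n_to_p_ratio(lipid_to_rna_wt_ratio, lipid_mw,
                 nitrogens_per_lipid=1, nucleotide_mw=330.0):
    """Estimate the N:P ratio from a lipid:RNA weight ratio.

    lipid_to_rna_wt_ratio: grams of ionizable lipid per gram of RNA
    lipid_mw: molecular weight of the ionizable lipid in g/mol
    nitrogens_per_lipid: ionizable nitrogen atoms per lipid molecule
    nucleotide_mw: average mass per nucleotide (one phosphate each), g/mol
    """
    if lipid_mw <= 0 or nucleotide_mw <= 0:
        raise ValueError("molecular weights must be positive")
    # Moles of ionizable nitrogen per gram of RNA.
    mol_nitrogen = lipid_to_rna_wt_ratio / lipid_mw * nitrogens_per_lipid
    # Moles of phosphate per gram of RNA.
    mol_phosphate = 1.0 / nucleotide_mw
    return mol_nitrogen / mol_phosphate


if __name__ == "__main__":
    # Hypothetical lipid of 700 g/mol at wt/wt ratios of 10:1 and 20:1.
    for wt_ratio in (10.0, 20.0):
        print(wt_ratio, round(n_to_p_ratio(wt_ratio, lipid_mw=700.0), 2))
```

Under these assumptions the estimated N:P ratio scales linearly with the wt/wt ratio and inversely with the lipid molecular weight, so the actual value for a given formulation depends on the specific ionizable lipid used.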
In some embodiments, a lipid nanoparticle may comprise two or more components (e.g., amino lipid and nucleic acid, PEG-lipid, phospholipid, structural lipid). For instance, a lipid nanoparticle may comprise an amino lipid and a nucleic acid. Compositions comprising the lipid nanoparticles may be used for a wide variety of applications, including the stealth delivery of therapeutic payloads with minimal adverse innate immune response. Effective in vivo delivery of nucleic acids represents a continuing medical challenge. Exogenous nucleic acids (i.e., originating from outside of a cell or organism) are readily degraded in the body, e.g., by the immune system. Accordingly, effective delivery of nucleic acids to cells often requires the use of a particulate carrier (e.g., lipid nanoparticles). The particulate carrier should be formulated to have minimal particle aggregation, be relatively stable prior to intracellular delivery, effectively deliver nucleic acids intracellularly, and elicit no or minimal immune response. To achieve minimal particle aggregation and pre-delivery stability, many conventional particulate carriers have relied on the presence and/or concentration of certain components (e.g., PEG-lipid). However, it has been discovered that certain components may decrease the stability of encapsulated nucleic acids (e.g., mRNA molecules). The reduced stability may limit the broad applicability of the particulate carriers. As such, there remains a need for methods by which to improve the stability of nucleic acid (e.g., mRNA) encapsulated within lipid nanoparticles. In some embodiments, the lipid nanoparticles comprise one or more of ionizable molecules, polynucleotides, and optional components, such as structural lipids, sterols, neutral lipids, phospholipids and a molecule capable of reducing particle aggregation (e.g., polyethylene glycol (PEG), PEG-modified lipid), such as those described above. 
In some embodiments, an LNP may include one or more ionizable molecules (e.g., amino lipids or ionizable lipids). The ionizable molecule may comprise a charged group and may have a certain pKa. In certain embodiments, the pKa of the ionizable molecule may be greater than or equal to about 6, greater than or equal to about 6.2, greater than or equal to about 6.5, greater than or equal to about 6.8, greater than or equal to about 7, greater than or equal to about 7.2, greater than or equal to about 7.5, greater than or equal to about 7.8, or greater than or equal to about 8. In some embodiments, the pKa of the ionizable molecule may be less than or equal to about 10, less than or equal to about 9.8, less than or equal to about 9.5, less than or equal to about 9.2, less than or equal to about 9.0, less than or equal to about 8.8, or less than or equal to about 8.5. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to about 6 and less than or equal to about 8.5). Other ranges are also possible. In embodiments in which more than one type of ionizable molecule is present in a particle, each type of ionizable molecule may independently have a pKa in one or more of the ranges described above. In general, an ionizable molecule comprises one or more charged groups. In some embodiments, an ionizable molecule may be positively charged or negatively charged. For instance, an ionizable molecule may be positively charged. For example, an ionizable molecule may comprise an amine group. As used herein, the term “ionizable molecule” has its ordinary meaning in the art and may refer to a molecule or matrix comprising one or more charged moieties. As used herein, a “charged moiety” is a chemical moiety that carries a formal electronic charge, e.g., monovalent (+1, or -1), divalent (+2, or -2), trivalent (+3, or -3), etc. The charged moiety may be anionic (i.e., negatively charged) or cationic (i.e., positively charged). 
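As a non-limiting illustration of how the pKa of an ionizable amine relates to its charge state, the following minimal sketch applies the Henderson-Hasselbalch relationship to estimate the fraction of amine groups that are protonated (positively charged) at a given pH. It assumes a single, independent ionizable amine and ideal solution behavior; the apparent pKa of a lipid within a nanoparticle can differ from that of the free molecule. The function name `fraction_protonated` and the example pH values are hypothetical choices made for this sketch.

```python
# Illustrative sketch only: a single ionizable amine treated with the
# Henderson-Hasselbalch relationship under ideal-solution assumptions.
def fraction_protonated(ph, pka):
    """Fraction of a basic (amine) group carrying a positive charge.

    For the equilibrium BH+ <-> B + H+, with pka the pKa of the
    conjugate acid BH+:  [BH+] / ([BH+] + [B]) = 1 / (1 + 10**(ph - pka)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical example: pKa values from the ranges recited above,
    # evaluated at an acidic pH and at a near-neutral pH.
    for pka in (6.0, 6.5, 7.0, 8.5):
        acidic = fraction_protonated(5.0, pka)
        neutral = fraction_protonated(7.4, pka)
        print(f"pKa {pka}: pH 5.0 -> {acidic:.3f}, pH 7.4 -> {neutral:.3f}")
```

Under this model, half of the groups are protonated when the pH equals the pKa, and the protonated fraction rises as the pH falls below the pKa.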
Examples of positively-charged moieties include amine groups (e.g., primary, secondary, and/or tertiary amines), ammonium groups, pyridinium groups, guanidine groups, and imidazolium groups. In a particular embodiment, the charged moieties comprise amine groups. Examples of negatively-charged groups or precursors thereof include carboxylate groups, sulfonate groups, sulfate groups, phosphonate groups, phosphate groups, hydroxyl groups, and the like. The charge of the charged moiety may vary, in some cases, with the environmental conditions, for example, changes in pH may alter the charge of the moiety, and/or cause the moiety to become charged or uncharged. In general, the charge density of the molecule and/or matrix may be selected as desired. In some cases, an ionizable molecule (e.g., an amino lipid or ionizable lipid) may include one or more precursor moieties that can be converted to charged moieties. For instance, the ionizable molecule may include a neutral moiety that can be hydrolyzed to form a charged moiety, such as those described above. As a non-limiting specific example, the molecule or matrix may include an amide, which can be hydrolyzed to form an amine. Those of ordinary skill in the art will be able to determine whether a given chemical moiety carries a formal electronic charge (for example, by inspection, pH titration, ionic conductivity measurements, etc.), and/or whether a given chemical moiety can be reacted (e.g., hydrolyzed) to form a chemical moiety that carries a formal electronic charge. The ionizable molecule (e.g., amino lipid or ionizable lipid) may have any suitable molecular weight. 
In certain embodiments, the molecular weight of an ionizable molecule is less than or equal to about 2,500 g/mol, less than or equal to about 2,000 g/mol, less than or equal to about 1,500 g/mol, less than or equal to about 1,250 g/mol, less than or equal to about 1,000 g/mol, less than or equal to about 900 g/mol, less than or equal to about 800 g/mol, less than or equal to about 700 g/mol, less than or equal to about 600 g/mol, less than or equal to about 500 g/mol, less than or equal to about 400 g/mol, less than or equal to about 300 g/mol, less than or equal to about 200 g/mol, or less than or equal to about 100 g/mol. In some instances, the molecular weight of an ionizable molecule is greater than or equal to about 100 g/mol, greater than or equal to about 200 g/mol, greater than or equal to about 300 g/mol, greater than or equal to about 400 g/mol, greater than or equal to about 500 g/mol, greater than or equal to about 600 g/mol, greater than or equal to about 700 g/mol, greater than or equal to about 1000 g/mol, greater than or equal to about 1,250 g/mol, greater than or equal to about 1,500 g/mol, greater than or equal to about 1,750 g/mol, greater than or equal to about 2,000 g/mol, or greater than or equal to about 2,250 g/mol. Combinations of the above ranges (e.g., at least about 200 g/mol and less than or equal to about 2,500 g/mol) are also possible. In embodiments in which more than one type of ionizable molecules are present in a particle, each type of ionizable molecule may independently have a molecular weight in one or more of the ranges described above. 
In some embodiments, the percentage (e.g., by weight, or by mole) of a single type of ionizable molecule (e.g., amino lipid or ionizable lipid) and/or of all the ionizable molecules within a particle may be greater than or equal to about 15%, greater than or equal to about 16%, greater than or equal to about 17%, greater than or equal to about 18%, greater than or equal to about 19%, greater than or equal to about 20%, greater than or equal to about 21%, greater than or equal to about 22%, greater than or equal to about 23%, greater than or equal to about 24%, greater than or equal to about 25%, greater than or equal to about 30%, greater than or equal to about 35%, greater than or equal to about 40%, greater than or equal to about 42%, greater than or equal to about 45%, greater than or equal to about 48%, greater than or equal to about 50%, greater than or equal to about 52%, greater than or equal to about 55%, greater than or equal to about 58%, greater than or equal to about 60%, greater than or equal to about 62%, greater than or equal to about 65%, or greater than or equal to about 68%. In some instances, the percentage (e.g., by weight, or by mole) may be less than or equal to about 70%, less than or equal to about 68%, less than or equal to about 65%, less than or equal to about 62%, less than or equal to about 60%, less than or equal to about 58%, less than or equal to about 55%, less than or equal to about 52%, less than or equal to about 50%, or less than or equal to about 48%. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to 20% and less than or equal to about 60%, greater than or equal to 40% and less than or equal to about 55%, etc.). In embodiments in which more than one type of ionizable molecule is present in a particle, each type of ionizable molecule may independently have a percentage (e.g., by weight, or by mole) in one or more of the ranges described above. 
The percentage (e.g., by weight, or by mole) may be determined by extracting the ionizable molecule(s) from the dried particles using, e.g., organic solvents, and measuring the quantity of the agent using high pressure liquid chromatography (i.e., HPLC), liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), or mass spectrometry (MS). Those of ordinary skill in the art would be knowledgeable of techniques to determine the quantity of a component using the above-referenced techniques. For example, HPLC may be used to quantify the amount of a component, by, e.g., comparing the area under the curve of an HPLC chromatogram to a standard curve. It should be understood that the terms “charged” and “charged moiety” do not refer to a “partial negative charge” or “partial positive charge” on a molecule. The terms “partial negative charge” and “partial positive charge” are given their ordinary meaning in the art. A “partial negative charge” may result when a functional group comprises a bond that becomes polarized such that electron density is pulled toward one atom of the bond, creating a partial negative charge on the atom. Those of ordinary skill in the art will, in general, recognize bonds that can become polarized in this way. A lipid composition comprises one or more lipids. Such lipids may include those useful in the preparation of lipid nanoparticle formulations as described above or as known in the art. Stabilizing compounds Some embodiments of the compositions are stabilized pharmaceutical compositions. Various non-viral delivery systems, including nanoparticle formulations, present attractive opportunities to overcome many challenges associated with mRNA delivery. Lipid nanoparticles (LNPs) have drawn particular attention in recent years as various LNP formulations have shown promise in a variety of pharmaceutical applications. 
However, lipids have been shown to degrade nucleic acids, including mRNA, and lipid nanoparticle formulations undergo rapid loss of purity when stored as refrigerated liquids. Moreover, the storage stability of mRNA encapsulated within LNPs is lower than that of unencapsulated mRNA. A class of compounds has been found to stabilize nucleic acids within a lipid carrier such as an LNP, an unexpected and unprecedented discovery which enables applications including extended refrigerated liquid shelf-life, extended in-use periods at room temperature, and extended in-use stability at physiological temperatures up to higher temperatures such as 40°C. Such stabilizing compounds solve a critical problem, as current manufacturing processes and formulations experience a 5-10% purity loss during LNP formation and processing that is typical with current large-scale LNP production. In some embodiments, the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a stabilizing compound (e.g., a compound of Formula (I), of Formula (II), or a tautomer or solvate thereof). In some embodiments, the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (I):
Figure imgf000147_0001
is a single bond or a double bond; R1 is H; R2 is OCH3, or together with R3 is OCH2O; R3 is OCH3, or together with R2 is OCH2O; R4 is H; R5 is H or OCH3; R6 is OCH3; R7 is H or OCH3; R8 is H; R9 is H or CH3; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride. In some embodiments, the compound of Formula (I) has the structure of:
Figure imgf000147_0002
Formula (Ia), Formula (Ib), or Formula (Ic), or a tautomer or solvate thereof. In some embodiments, the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (II):
Figure imgf000147_0003
(II), or a tautomer or solvate thereof, wherein: R10 is H; R11 is H; R12 together with R13 is OCH2O; R14 is H; R15 together with R16 is OCH2O; R17 is H; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride. In some embodiments, the compound of Formula (II) has the structure of: Stabilizing compounds of Formulas (I), (Ia), (Ib), (Ic), (II), and (IIa) are described in International Application No. PCT/US2022/025967, which is incorporated by reference herein in its entirety. In some embodiments, the nucleic acid formulation comprises lipid nanoparticles. In some embodiments, the nucleic acid is mRNA. In some embodiments, the stabilizing compound (“the compound”) has a purity of at least 70%, 80%, 90%, 95%, or 99%. In some embodiments, the compound contains fewer than 100 ppm of elemental metals. In some embodiments, the stabilized pharmaceutical composition (“the composition”) comprises a pharmaceutically acceptable metal chelator, e.g., EDTA (ethylenediaminetetraacetic acid) or DTPA (diethylenetriaminepentaacetic acid). In some embodiments, the composition is an aqueous solution. In some embodiments, the compound is present at a concentration between about 0.1 mM and about 10 mM in the aqueous solution. In some embodiments, the aqueous solution has a pH of or about 5 to 8, including pH of about 5, 5.5, 6, 6.5, 7, 7.5, or 8. In some embodiments, the aqueous solution does not comprise NaCl. In some embodiments, the aqueous solution comprises NaCl in a concentration of or about 150 mM. In some embodiments, the aqueous solution comprises a phosphate buffer, a tris buffer, an acetate buffer, a histidine buffer, or a citrate buffer. In some embodiments, microbial growth in the composition is inhibited by the compound. In some embodiments, the composition is characterized as having an mRNA purity level of greater than 60%, greater than 70%, greater than 80%, or greater than 90% main peak mRNA purity after at least thirty days of storage. 
In some embodiments, the composition comprises an mRNA purity level of greater than 50% main peak mRNA purity after at least six months of storage. In some embodiments, the storage is at room temperature. In some embodiments, the composition comprises a lipid nanoparticle encapsulating an mRNA, and the composition comprises less than 50%, less than 60%, less than 70%, less than 80%, less than 90%, or less than 95% RNA fragments after at least thirty days of storage. In some embodiments, the storage temperature is greater than room temperature. In some embodiments, the storage temperature is about 4°C. In some embodiments, the compound interacts with the nucleic acid comprised within a lipid nanostructure (e.g., a lipid nanoparticle, liposome, or lipoplex), e.g., via pi-pi stacking and/or by changing backbone helicity of the nucleic acid. In some embodiments, the compound intercalates with a nucleic acid. In some embodiments, the compound binds with a nucleic acid, e.g., reversible binding, and/or binding to the stranded regions of the nucleic acid. In some embodiments, the compound self-associates, binds to nucleic acid ribose contacts, and/or binds to nucleic acid base contacts. In some embodiments, the compound does not substantially bind to nucleic acid phosphate contacts. In some embodiments, the positive charge of the compound contributes to nucleic acid binding. In some embodiments, the compound interacts with the nucleic acid with a binding affinity defined by an equilibrium dissociation constant of less than 10⁻³ M (e.g., less than 10⁻⁴ M, less than 10⁻⁵ M, less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, or less than 10⁻⁹ M). In some embodiments, the compound interacts with a nucleic acid and provides shielding from solvent, e.g., water. In some embodiments, the compound shields ribose from solvent more than the compound shields the phosphate groups of the nucleic acid. 
In some embodiments, the solvent exposure is measured by the solvent accessible surface area (SASA). In some embodiments, a stabilizing compound decreases the solvent accessible area of ribose to about 5-10 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of ribose to about 6-8 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 9-12 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 10-11 nm². In some embodiments, a nucleic acid that is conformationally stabilized by the compound exhibits thermal unfolding temperatures (measured by circular dichroism or DSC, for example) that are higher than in the absence of the compound. In some embodiments, the compound confers increased stability, e.g., thermal stability, to the nucleic acid in a folded structure, e.g., relative to its unfolded or less folded or more linear form. In some embodiments, the compound causes compaction of the nucleic acid upon interaction with the nucleic acid. In some embodiments, the compound causes a decrease in the hydrodynamic radius of the nucleic acid molecule upon interaction with the nucleic acid. In some embodiments, a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule by 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, or more. In some embodiments, a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule when the compound is in a concentration of 1 μM, 2 μM, 3 μM, 4 μM, 5 μM, 6 μM, 7 μM, 8 μM, 9 μM, 10 μM, 15 μM, 20 μM, 25 μM, 30 μM, 35 μM, 40 μM, 45 μM, 50 μM, 60 μM, 70 μM, 80 μM, 90 μM, or 100 μM.
Pharmaceutical Compositions
Some aspects relate to compositions (e.g., pharmaceutical compositions), methods, kits and reagents for prevention or treatment of coronavirus in humans and other mammals, for example. 
The compositions can be used as therapeutic or prophylactic agents. They may be used in medicine to prevent and/or treat a coronavirus infection. In some embodiments, the SARS-CoV-2 vaccine containing RNA can be administered to a subject (e.g., a mammalian subject, such as a human subject), and the RNA polynucleotides are translated in vivo to produce an antigenic polypeptide (antigen). An “effective amount” of a composition (e.g., comprising RNA) is based, at least in part, on the target tissue, target cell type, means of administration, physical characteristics of the RNA (e.g., length, nucleotide composition, and/or extent of modified nucleosides), other components of the vaccine, and other determinants, such as age, body weight, height, sex and general health of the subject. Typically, an effective amount of a composition provides an induced or boosted immune response as a function of antigen production in the cells of the subject. In some embodiments, an effective amount of the composition containing RNA polynucleotides having at least one chemical modification is more efficient than a composition containing a corresponding unmodified polynucleotide encoding the same antigen or a peptide antigen. Increased antigen production may be demonstrated by increased cell transfection (the percentage of cells transfected with the RNA vaccine), increased protein translation and/or expression from the polynucleotide, decreased nucleic acid degradation (as demonstrated, for example, by increased duration of protein translation from a modified polynucleotide), or altered antigen specific immune response of the host cell. The term "pharmaceutical composition" refers to the combination of an active agent with a carrier, inert or active, making the composition especially suitable for diagnostic or therapeutic use in vivo or ex vivo. A "pharmaceutically acceptable carrier," after being administered to or upon a subject, does not cause undesirable physiological effects. 
The carrier in the pharmaceutical composition must be "acceptable" also in the sense that it is compatible with the active ingredient and can be capable of stabilizing it. One or more solubilizing agents can be utilized as pharmaceutical carriers for delivery of an active agent. Examples of a pharmaceutically acceptable carrier include, but are not limited to, biocompatible vehicles, adjuvants, additives, and diluents to achieve a composition usable as a dosage form. Examples of other carriers include colloidal silicon oxide, magnesium stearate, cellulose, and sodium lauryl sulfate. Additional suitable pharmaceutical carriers and diluents, as well as pharmaceutical necessities for their use, are described in Remington's Pharmaceutical Sciences. In some embodiments, the compositions (comprising polynucleotides and their encoded polypeptides) may be used for treatment or prevention of a coronavirus infection. A composition may be administered prophylactically or therapeutically as part of an active immunization scheme to healthy individuals or early in infection during the incubation phase or during active infection after onset of symptoms. In some embodiments, the amount of RNA provided to a cell, a tissue or a subject may be an amount effective for immune prophylaxis. A composition may be administered with other prophylactic or therapeutic compounds. As a non-limiting example, a prophylactic or therapeutic compound may be an adjuvant or a booster. As used herein, when referring to a prophylactic composition, such as a vaccine, the term “booster” refers to an extra administration of the vaccine composition and may include a traditional boost, seasonal boost or a pandemic shift boost. A booster (or booster vaccine) may be given after an earlier administration of the prophylactic composition. 
The time of administration between the initial administration of the prophylactic composition and the booster may be, but is not limited to, 1 minute, 2 minutes, 3 minutes, 4 minutes, 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 15 minutes, 20 minutes, 35 minutes, 40 minutes, 45 minutes, 50 minutes, 55 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 13 hours, 14 hours, 15 hours, 16 hours, 17 hours, 18 hours, 19 hours, 20 hours, 21 hours, 22 hours, 23 hours, 1 day, 36 hours, 2 days, 3 days, 4 days, 5 days, 6 days, 1 week, 10 days, 2 weeks, 3 weeks, 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, one year, or more. In some embodiments, the time of administration between the initial administration of the prophylactic composition and the booster is at least 6 months. In exemplary embodiments, the time of administration between the initial administration of the prophylactic composition and the booster may be, but is not limited to, 1 week, 2 weeks, 3 weeks, 1 month, 2 months, 3 months, or 6 months. The booster may comprise the same or different mRNAs as compared to the earlier administration of the prophylactic composition. In some embodiments, the booster may comprise a combination of the same mRNA from the earlier administration of the prophylactic composition and at least one different mRNA. In some embodiments, the ratio of the mRNA from the earlier administration of the prophylactic composition and the at least one different mRNA is 1:1, 1:2, 1:4, 4:1, or 2:1. In one embodiment, the ratio is 1:1. In some embodiments, the booster may comprise different mRNAs as compared to the earlier administration of the prophylactic compositions. In some embodiments, such a booster may comprise 1, 2, 3, 4 or more mRNAs that were not present in the prophylactic composition. 
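As an illustration of the ratios recited above, the following minimal sketch splits a total booster dose between an mRNA carried over from the earlier administration and a different mRNA. It assumes the ratio is taken by mass, which the text does not state; the function name `split_dose` and its parameters are hypothetical and are not part of the disclosure.

```python
def split_dose(total_ug, ratio):
    """Split a total mRNA dose (micrograms) across components.

    Assumption: `ratio` (e.g., (1, 1) or (1, 4)) is a mass ratio.
    The text recites the ratios but does not state their basis.
    """
    if total_ug <= 0 or any(part <= 0 for part in ratio):
        raise ValueError("dose and ratio parts must be positive")
    whole = sum(ratio)
    # Each component receives its proportional share of the total dose.
    return [total_ug * part / whole for part in ratio]


# Example: a 50 microgram booster at a 1:1 ratio of two mRNAs.
shares = split_dose(50, (1, 1))
```

Under the same assumption, a 1:4 ratio applied to a 50 µg total would give one component a fifth of the dose and the other four fifths.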
In some embodiments, the ratio of two mRNA polynucleotides (none of which were in the prophylactic composition) in the booster is 1:1, 1:2, 1:4, 4:1, or 2:1. In one embodiment, the ratio is 1:1. A boost or booster dose may be administered more than once, for example 2, 3, 4, 5, 6 or more times after the initial prophylactic (prime) dose. In some embodiments, a subsequent boost is administered within weeks, e.g., within 3-4 weeks of the first (or previous) boost. In some embodiments, a second boost is administered 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or more weeks after the first (or previous) boost. The booster, in some embodiments, is monovalent (e.g., the mRNA encodes a single antigen). In some embodiments, the booster is multivalent (e.g., the mRNA encodes more than one antigen). In some embodiments, the booster dose is 5 µg-30 µg, 5 µg-25 µg, 5 µg-20 µg, 5 µg-15 µg, 5 µg-10 µg, 10 µg-30 µg, 10 µg-25 µg, 10 µg-20 µg, 10 µg-15 µg, 15 µg-30 µg, 15 µg-25 µg, 15 µg-20 µg, 20 µg-30 µg, 25 µg-30 µg, or 25 µg-300 µg. In some embodiments, the booster dose is 10 µg-60 µg, 10 µg-55 µg, 10 µg-50 µg, 10 µg-45 µg, 10 µg-40 µg, 10 µg-35 µg, 10 µg-30 µg, 10 µg-25 µg, 10 µg-20 µg, 15 µg-60 µg, 15 µg-55 µg, 15 µg-50 µg, 15 µg-45 µg, 15 µg-40 µg, 15 µg-35 µg, 15 µg-30 µg, 15 µg-25 µg, 15 µg-20 µg, 20 µg-60 µg, 20 µg-55 µg, 20 µg-50 µg, 20 µg-45 µg, 20 µg-40 µg, 20 µg-35 µg, 20 µg-30 µg, 20 µg-25 µg, 25 µg-60 µg, 25 µg-55 µg, 25 µg-50 µg, 25 µg-45 µg, 25 µg-40 µg, 25 µg-35 µg, 25 µg-30 µg, 30 µg-60 µg, 30 µg-55 µg, 30 µg-50 µg, 30 µg-45 µg, 30 µg-40 µg, 30 µg-35 µg, 35 µg-60 µg, 35 µg-55 µg, 35 µg-50 µg, 35 µg-45 µg, 35 µg-40 µg, 40 µg-60 µg, 40 µg-55 µg, 40 µg-50 µg, 40 µg-45 µg, 45 µg-60 µg, 45 µg-55 µg, 45 µg-50 µg, 50 µg-60 µg, 50 µg-55 µg, or 55 µg-60 µg. In some embodiments, the booster dose is at least 10 µg and less than 25 µg of the composition. In some embodiments, the booster dose is at least 5 µg and less than 25 µg of the composition. 
For example, the booster dose is 5 µg, 10 µg, 15 µg, 20 µg, 25 µg, 30 µg, 35 µg, 40 µg, 45 µg, 50 µg, 55 µg, 60 µg, 65 µg, 70 µg, 75 µg, 80 µg, 85 µg, 90 µg, 95 µg, 100 µg, 110 µg, 120 µg, 130 µg, 140 µg, 150 µg, 160 µg, 170 µg, 180 µg, 190 µg, 200 µg, 250 µg, or 300 µg. In some embodiments, the booster dose is 50 μg. In some embodiments, a composition may be administered intramuscularly, intranasally or intradermally, similarly to the administration of inactivated vaccines known in the art. A composition may be utilized in various settings depending on the prevalence of the infection or the degree or level of unmet medical need. As a non-limiting example, the RNA vaccines may be utilized to treat and/or prevent a variety of infectious diseases. RNA vaccines have superior properties in that they produce much larger antibody titers, better neutralizing immunity, produce more durable immune responses, and/or produce responses earlier than commercially available vaccines. Some aspects relate to pharmaceutical compositions including RNA and/or complexes optionally in combination with one or more pharmaceutically acceptable excipients. The RNA may be formulated or administered alone or in conjunction with one or more other components. For example, an immunizing composition may comprise other components including, but not limited to, adjuvants. In some embodiments, an immunizing composition does not include an adjuvant (it is adjuvant free). An RNA may be formulated or administered in combination with one or more pharmaceutically-acceptable excipients. In some embodiments, vaccine compositions comprise at least one additional active substance, such as, for example, a therapeutically-active substance, a prophylactically-active substance, or a combination of both. Vaccine compositions may be sterile, pyrogen-free or both sterile and pyrogen-free. 
General considerations in the formulation and/or manufacture of pharmaceutical agents, such as vaccine compositions, may be found, for example, in Remington: The Science and Practice of Pharmacy 21st ed., Lippincott Williams & Wilkins, 2005 (incorporated herein by reference in its entirety). In some embodiments, an immunizing composition is administered to humans, human patients or subjects. The phrase “active ingredient” generally refers to the RNA vaccines or the polynucleotides contained therein, for example, RNA polynucleotides (e.g., mRNA polynucleotides) encoding antigens. Formulations of the vaccine compositions may be prepared by any method known or hereafter developed in the art of pharmacology. In general, such preparatory methods include the step of bringing the active ingredient (e.g., mRNA polynucleotide) into association with an excipient and/or one or more other accessory ingredients, and then, if necessary and/or desirable, dividing, shaping and/or packaging the product into a desired single- or multi-dose unit. Relative amounts of the active ingredient, the pharmaceutically acceptable excipient, and/or any additional ingredients in a pharmaceutical composition will vary, depending upon the identity, size, and/or condition of the subject treated and further depending upon the route by which the composition is to be administered. By way of example, the composition may comprise between 0.1% and 100%, e.g., between 0.5% and 50%, between 1% and 30%, between 5% and 80%, or at least 80% (w/w) active ingredient. In some embodiments, an RNA is formulated using one or more excipients to: (1) increase stability; (2) increase cell transfection; (3) permit the sustained or delayed release (e.g., from a depot formulation); (4) alter the biodistribution (e.g., target to specific tissues or cell types); (5) increase the translation of encoded protein in vivo; and/or (6) alter the release profile of encoded protein (antigen) in vivo. 
In addition to traditional excipients such as any and all solvents, dispersion media, diluents, or other liquid vehicles, dispersion or suspension aids, surface active agents, isotonic agents, thickening or emulsifying agents, preservatives, excipients can include, without limitation, lipidoids, liposomes, lipid nanoparticles, polymers, lipoplexes, core-shell nanoparticles, peptides, proteins, cells transfected with the RNA (e.g., for transplantation into a subject), hyaluronidase, nanoparticle mimics and combinations thereof.
Administration and Dosing
Some aspects relate to immunizing compositions (e.g., RNA vaccines), methods, kits and reagents for prevention and/or treatment of coronavirus infection in humans and other mammals. Immunizing compositions can be used as therapeutic or prophylactic agents. In some embodiments, immunizing compositions are used to provide prophylactic protection from coronavirus infection. In some embodiments, immunizing compositions are used to treat a coronavirus infection. In some embodiments, immunizing compositions are used in the priming of immune effector cells, for example, to activate peripheral blood mononuclear cells (PBMCs) ex vivo, which are then infused (re-infused) into a subject. A subject may be any mammal, including non-human primate and human subjects. Typically, a subject is a human subject. In some embodiments, an immunizing composition (e.g., RNA vaccine) is administered to a subject (e.g., a mammalian subject, such as a human subject) in an effective amount to induce an antigen-specific immune response. The RNA encoding the coronavirus spike protein antigen is expressed and translated in vivo to produce the antigen, which then stimulates an immune response in the subject. Prophylactic protection from a coronavirus can be achieved following administration of an immunizing composition (e.g., an RNA vaccine). 
Immunizing compositions can be administered once, twice, three times, four times or more, but it is likely sufficient to administer the vaccine once (optionally followed by a single booster). It is possible, although less desirable, to administer an immunizing composition to an infected individual to achieve a therapeutic response. Dosing may need to be adjusted accordingly. Some aspects relate to a method of eliciting an immune response in a subject against a coronavirus antigen (or multiple antigens). In some embodiments, a method involves administering to the subject an immunizing composition comprising an mRNA having an open reading frame encoding a coronavirus antigen, thereby inducing in the subject an immune response specific to the coronavirus antigen, wherein anti-antigen antibody titer in the subject is increased following vaccination relative to anti-antigen antibody titer in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the antigen. An “anti-antigen antibody” is a serum antibody that binds specifically to the antigen. A prophylactically effective dose is an effective dose that prevents infection with the virus at a clinically acceptable level. In some embodiments, the effective dose is a dose listed in a package insert for the vaccine. A traditional vaccine, as used herein, refers to a vaccine other than the mRNA vaccines. For instance, a traditional vaccine includes, but is not limited to, live microorganism vaccines, killed microorganism vaccines, subunit vaccines, protein antigen vaccines, DNA vaccines, virus like particle (VLP) vaccines, etc. In exemplary embodiments, a traditional vaccine is a vaccine that has achieved regulatory approval and/or is registered by a national drug regulatory body, for example the Food and Drug Administration (FDA) in the United States or the European Medicines Agency (EMA). 
In some embodiments, the anti-antigen antibody titer in the subject is increased 1 log to 10 log following vaccination relative to anti-antigen antibody titer in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the coronavirus or an unvaccinated subject. In some embodiments, the anti-antigen antibody titer in the subject is increased 1 log, 2 log, 3 log, 4 log, 5 log, or 10 log following vaccination relative to anti-antigen antibody titer in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the coronavirus or an unvaccinated subject. Some aspects relate to a method of eliciting an immune response in a subject against a coronavirus. The method involves administering to the subject a composition comprising an mRNA comprising an open reading frame encoding a coronavirus antigen, thereby inducing in the subject an immune response specific to the coronavirus, wherein the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine against the coronavirus at 2 times to 100 times the dosage level relative to the composition. In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at twice the dosage level relative to a composition. In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at three times the dosage level relative to a composition. In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at 4 times, 5 times, 10 times, 50 times, or 100 times the dosage level relative to a composition. 
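The "log" increases recited above can be related to fold changes with a short calculation. The sketch below assumes that "1 log" means one unit of base-10 logarithm (a 10-fold change), which is the usual serological convention but is not defined in the text; the function names are illustrative only.

```python
import math


def log10_increase(titer, reference_titer):
    """Return the log10 difference between a titer and a reference titer.

    Assumption: "1 log" denotes a 10-fold change (base-10 logarithm).
    """
    if titer <= 0 or reference_titer <= 0:
        raise ValueError("titers must be positive")
    return math.log10(titer / reference_titer)


def fold_from_log10(log_increase):
    """Convert a log10 increase back into a fold change."""
    return 10.0 ** log_increase


# Example: a titer of 1000 against a reference titer of 10.
increase = log10_increase(1000, 10)
```

On this convention, a 3 log increase corresponds to a 1000-fold higher titer than the reference.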
In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at 10 times to 1000 times the dosage level relative to a composition. In some embodiments, the immune response in the subject is equivalent to an immune response in a subject vaccinated with a traditional vaccine at 100 times to 1000 times the dosage level relative to a composition. In some embodiments, the immune response is assessed by determining [protein] antibody titer in the subject. In some embodiments, serum or antibody from an immunized subject is tested for its ability to neutralize viral uptake or reduce coronavirus transformation of human B lymphocytes. In some embodiments, the ability to promote a robust T cell response(s) is measured using art recognized techniques. Some aspects relate to methods of eliciting an immune response in a subject against a coronavirus by administering to the subject a composition comprising an mRNA having an open reading frame encoding a coronavirus antigen, thereby inducing in the subject an immune response specific to the coronavirus antigen, wherein the immune response in the subject is induced 2 days to 10 weeks earlier relative to an immune response induced in a subject vaccinated with a prophylactically effective dose of a traditional vaccine against the coronavirus. In some embodiments, the immune response in the subject is induced in a subject vaccinated with a prophylactically effective dose of a traditional vaccine at 2 times to 100 times the dosage level relative to a composition. In some embodiments, the immune response in the subject is induced 2 days, 3 days, 1 week, 2 weeks, 3 weeks, 5 weeks, or 10 weeks earlier relative to an immune response induced in a subject vaccinated with a prophylactically effective dose of a traditional vaccine. 
Some aspects relate to methods of eliciting an immune response in a subject against a coronavirus by administering to the subject an mRNA having an open reading frame encoding a first antigen, wherein the RNA does not include a stabilization element, and wherein an adjuvant is not co-formulated or co-administered with the vaccine. A composition may be administered by any route that results in a therapeutically effective outcome. These include, but are not limited to, intradermal, intramuscular, intranasal, and/or subcutaneous administration. Some aspects relate to methods comprising administering RNA vaccines to a subject in need thereof. The exact amount required will vary from subject to subject, depending on the species, age, and general condition of the subject, the severity of the disease, the particular composition, its mode of administration, its mode of activity, and the like. The RNA is typically formulated in dosage unit form for ease of administration and uniformity of dosage. It will be understood, however, that the total daily usage of the RNA may be decided by the attending physician within the scope of sound medical judgment. The specific therapeutically effective, prophylactically effective, or appropriate imaging dose level for any particular patient will depend upon a variety of factors including the disorder being treated and the severity of the disorder; the activity of the specific compound employed; the specific composition employed; the age, body weight, general health, sex and diet of the patient; the time of administration, route of administration, and rate of excretion of the specific compound employed; the duration of the treatment; drugs used in combination or coincidental with the specific compound employed; and like factors well known in the medical arts. The effective amount (e.g., effective dose) of the RNA may be as low as 20 µg, administered for example as a single dose or as two 10 µg doses. 
In some embodiments, the effective amount (e.g., effective dose) is a total dose of 20 µg-300 µg, 5 µg-30 µg, 5 µg-25 µg, 5 µg-20 µg, 5 µg-15 µg, 5 µg-10 µg, 10 µg-30 µg, 10 µg-25 µg, 10 µg-20 µg, 10 µg-15 µg, 15 µg-30 µg, 15 µg-25 µg, 15 µg-20 µg, 20 µg-30 µg, 25 µg-30 µg, or 25 µg-300 µg. In some embodiments, the effective dose (e.g., effective amount) is at least 10 µg and less than 25 µg of the composition. In some embodiments, the effective dose (e.g., effective amount) is at least 5 µg and less than 25 µg of the composition. For example, the effective amount may be a total dose of 5 µg, 10 µg, 15 µg, 20 µg, 25 µg, 30 µg, 35 µg, 40 µg, 45 µg, 50 µg, 55 µg, 60 µg, 65 µg, 70 µg, 75 µg, 80 µg, 85 µg, 90 µg, 95 µg, 100 µg, 110 µg, 120 µg, 130 µg, 140 µg, 150 µg, 160 µg, 170 µg, 180 µg, 190 µg, 200 µg, 250 µg, or 300 µg. In some embodiments, the effective amount (e.g., effective dose) is a total dose of 10 μg. In some embodiments, the effective amount is a total dose of 20 μg (e.g., two 10 μg doses). In some embodiments, the effective amount is a total dose of 25 μg. In some embodiments, the effective amount is a total dose of 30 μg. In some embodiments, the effective amount is a total dose of 50 μg. In some embodiments, the effective amount is a total dose of 60 μg (e.g., two 30 μg doses). In some embodiments, the effective amount is a total dose of 75 μg. In some embodiments, the effective amount is a total dose of 100 μg. In some embodiments, the effective amount is a total dose of 150 μg. In some embodiments, the effective amount is a total dose of 200 μg. In some embodiments, the effective amount is a total dose of 250 μg. In some embodiments, the effective amount is a total dose of 300 μg. Any of the doses provided above may be an effective amount for a booster dose; for example, in some embodiments, the booster dose is a total dose of 50 μg. 
In some embodiments, the composition comprises two or more mRNA polynucleotides and the effective amount is a total dose of 20 μg (e.g., 10 μg of a first mRNA and 10 μg of a second mRNA). In some embodiments, the composition comprises two or more mRNA polynucleotides and the effective amount is a total dose of 50 μg (e.g., 25 μg of a first mRNA and 25 μg of a second mRNA). In some embodiments, the composition comprises two or more mRNA polynucleotides and the effective amount is a total dose of 100 μg (e.g., 50 μg of a first mRNA and 50 μg of a second mRNA). The RNA can be formulated into a dosage form, such as an intranasal, intratracheal, or injectable (e.g., intravenous, intraocular, intravitreal, intramuscular, intradermal, intracardiac, intraperitoneal, and subcutaneous) dosage form.
Vaccine Efficacy
Some aspects relate to compositions containing RNA (e.g., RNA vaccines), wherein the RNA is present in an effective amount to produce an antigen specific immune response in a subject (e.g., production of antibodies specific to a coronavirus antigen). “An effective amount” is a dose of the RNA effective to produce an antigen-specific immune response. Some aspects relate to methods of inducing an antigen-specific immune response in a subject. As used herein, an immune response to a vaccine or LNP is the development in a subject of a humoral and/or a cellular immune response to one or more coronavirus proteins present in the vaccine. A “humoral” immune response refers to an immune response mediated by antibody molecules, including, e.g., secretory (IgA) or IgG molecules, while a “cellular” immune response is one mediated by T-lymphocytes (e.g., CD4+ helper and/or CD8+ T cells (e.g., CTLs)) and/or other white blood cells. One important aspect of cellular immunity involves an antigen-specific response by cytolytic T-cells (CTLs). 
CTLs have specificity for peptide antigens that are presented in association with proteins encoded by the major histocompatibility complex (MHC) and expressed on the surfaces of cells. CTLs help induce and promote the destruction of intracellular microbes or the lysis of cells infected with such microbes. Another aspect of cellular immunity involves an antigen-specific response by helper T-cells. Helper T-cells act to help stimulate the function and focus the activity of nonspecific effector cells against cells displaying peptide antigens in association with MHC molecules on their surface. A cellular immune response also leads to the production of cytokines, chemokines, and other such molecules produced by activated T-cells and/or other white blood cells including those derived from CD4+ and CD8+ T-cells. In some embodiments, the antigen-specific immune response is characterized by measuring an anti-coronavirus antigen antibody titer produced in a subject administered a composition. An antibody titer is a measurement of the amount of antibodies within a subject, for example, antibodies that are specific to a particular antigen or epitope of an antigen. Antibody titer is typically expressed as the inverse of the greatest dilution that provides a positive result. Enzyme-linked immunosorbent assay (ELISA) is a common assay for determining antibody titers, for example. A variety of serological tests can be used to measure antibody against an encoded antigen of interest, for example, SARS-CoV-2 virus or a SARS-CoV-2 viral antigen, e.g., SARS-CoV-2 spike or S protein, or a domain thereof. These tests include the hemagglutination-inhibition test, complement fixation test, fluorescent antibody test, enzyme-linked immunosorbent assay (ELISA), and plaque reduction neutralization test (PRNT). Each of these tests measures different antibody activities. 
In exemplary embodiments, a plaque reduction neutralization test, or PRNT (e.g., PRNT50 or PRNT90), is used as a serological correlate of protection. PRNT measures the biological parameter of in vitro virus neutralization and is the most serologically virus-specific test among certain classes of viruses, correlating well to serum levels of protection from virus infection. The basic design of the PRNT allows virus-antibody interaction to occur in a test tube or microtiter plate, after which antibody effects on viral infectivity are measured by plating the mixture on virus-susceptible cells, preferably cells of mammalian origin. The cells are overlaid with a semi-solid media that restricts spread of progeny virus. Each virus that initiates a productive infection produces a localized area of infection (a plaque) that can be detected in a variety of ways. Plaques are counted and compared back to the starting concentration of virus to determine the percent reduction in total virus infectivity. In PRNT, the serum sample being tested is usually subjected to serial dilutions prior to mixing with a standardized amount of virus. The concentration of virus is held constant such that, when added to susceptible cells and overlaid with semi-solid media, individual plaques can be discerned and counted. In this way, PRNT end-point titers can be calculated for each serum sample at any selected percent reduction of virus activity. In functional assays intended to assess vaccinal immunogenicity, the serum sample dilution series for antibody titration should ideally start below the “seroprotective” threshold titer. Regarding SARS-CoV-2 neutralizing antibodies, the “seroprotective” threshold titer remains unknown; but a seropositivity threshold of 1:10 can be considered a seroprotection threshold in certain embodiments. In some embodiments, a neutralizing immune response is an immune response that produces a level of antibodies that meets or exceeds a seroprotection threshold. 
PRNT end-point titers are expressed as the reciprocal of the last serum dilution showing the desired percent reduction in plaque counts. The PRNT titer can be calculated based on a 50% or greater reduction in plaque counts (PRNT50). A PRNT50 titer is preferred over titers using higher cut-offs (e.g., PRNT90) for vaccine sera, providing more accurate results from the linear portion of the titration curve. There are several ways to calculate PRNT titers. The simplest and most widely used way to calculate titers is to count plaques and report the titer as the reciprocal of the last serum dilution to show >50% reduction of the input plaque count as based on the back-titration of input plaques. Use of curve fitting methods from several serum dilutions may permit calculation of a more precise result. There are a variety of computer analysis programs available for this (e.g., SPSS or GraphPad Prism). In some embodiments, an antibody titer is used to assess whether a subject has had an infection or to determine whether immunizations are required. In some embodiments, an antibody titer is used to determine the strength of an autoimmune response, to determine whether a booster immunization is needed, to determine whether a previous vaccine was effective, and to identify any recent or prior infections. An antibody titer may be used to determine the strength of an immune response induced in a subject by a composition (e.g., RNA vaccine). In some embodiments, an anti-coronavirus antigen antibody titer produced in a subject is increased by at least 1 log relative to a control. For example, anti-coronavirus antigen antibody titer produced in a subject may be increased by at least 1.5, at least 2, at least 2.5, or at least 3 log relative to a control. In some embodiments, the anti-coronavirus antigen antibody titer produced in the subject is increased by 1, 1.5, 2, 2.5 or 3 log relative to a control. 
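By way of non-limiting illustration, the simplest end-point calculation described above (the reciprocal of the last serum dilution showing the desired percent reduction relative to the back-titrated input plaque count) may be sketched in Python as follows. The function name, argument names, and plaque counts are hypothetical and are not taken from this disclosure. The sketch assumes that dilutions are supplied in increasing order and, as a further assumption, stops at the first dilution that fails the cut-off.

```python
def prnt_endpoint_titer(recip_dilutions, plaque_counts, input_plaques, cutoff=0.5):
    """Return the reciprocal of the last serum dilution meeting the cut-off.

    recip_dilutions: reciprocal dilution factors in increasing order (e.g. 10 for 1:10)
    plaque_counts:   plaques counted at each dilution
    input_plaques:   plaque count from back-titration of the virus input
    cutoff:          fractional reduction required (0.5 for PRNT50, 0.9 for PRNT90)
    Returns None when no dilution reaches the cut-off.
    """
    if len(recip_dilutions) != len(plaque_counts):
        raise ValueError("one plaque count is required per dilution")
    if input_plaques <= 0:
        raise ValueError("input plaque count must be positive")

    titer = None
    for recip, count in zip(recip_dilutions, plaque_counts):
        reduction = 1.0 - count / input_plaques
        if reduction >= cutoff:
            titer = recip      # this dilution still neutralizes enough virus
        else:
            break              # assumption: stop at the first failing dilution
    return titer


# Hypothetical two-fold dilution series with 50 input plaques.
dilutions = [10, 20, 40, 80, 160]
counts = [2, 6, 18, 30, 45]
prnt50 = prnt_endpoint_titer(dilutions, counts, input_plaques=50, cutoff=0.5)
prnt90 = prnt_endpoint_titer(dilutions, counts, input_plaques=50, cutoff=0.9)
```

With these hypothetical counts, the PRNT90 titer is lower than the PRNT50 titer, because the higher cut-off is satisfied only at the most concentrated serum dilution. A curve-fitting approach across several dilutions, as noted above, may give a more precise result than this end-point rule.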
In some embodiments, the anti-coronavirus antigen antibody titer produced in the subject is increased by 1-3 log relative to a control. For example, the anti-coronavirus antigen antibody titer produced in a subject may be increased by 1-1.5, 1-2, 1-2.5, 1-3, 1.5-2, 1.5-2.5, 1.5-3, 2-2.5, 2-3, or 2.5-3 log relative to a control. In some embodiments, the anti-coronavirus antigen antibody titer produced in a subject is increased at least 2 times relative to a control. For example, the anti-coronavirus antigen antibody titer produced in a subject may be increased at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times relative to a control. In some embodiments, the anti-coronavirus antigen antibody titer produced in the subject is increased 2, 3, 4, 5, 6, 7, 8, 9, or 10 times relative to a control. In some embodiments, the anti-coronavirus antigen antibody titer produced in a subject is increased 2-10 times relative to a control. For example, the anti-coronavirus antigen antibody titer produced in a subject may be increased 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4, 2-3, 3-10, 3-9, 3-8, 3-7, 3-6, 3-5, 3-4, 4-10, 4-9, 4-8, 4-7, 4-6, 4-5, 5-10, 5-9, 5-8, 5-7, 5-6, 6-10, 6-9, 6-8, 6-7, 7-10, 7-9, 7-8, 8-10, 8-9, or 9-10 times relative to a control. In some embodiments, an antigen-specific immune response is measured as a ratio of geometric mean titer (GMT), referred to as a geometric mean ratio (GMR), of serum neutralizing antibody titers to coronavirus. A geometric mean titer (GMT) is the average antibody titer for a group of subjects calculated by multiplying all values and taking the nth root of the product, where n is the number of subjects with available data. A control, in some embodiments, is an anti-coronavirus antigen antibody titer produced in a subject who has not been administered a composition (e.g., RNA vaccine).
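As a non-limiting sketch, the GMT and GMR defined above, together with a log (base 10) increase relative to a control, may be computed as shown below. The function names and titer values are hypothetical. The sketch assumes all titers are positive and computes the nth root of the product through the mean of logarithms, which is mathematically equivalent and avoids overflow for large groups.

```python
import math


def geometric_mean_titer(titers):
    """Geometric mean of a group's titers: nth root of the product of n values."""
    if not titers:
        raise ValueError("at least one titer is required")
    if any(t <= 0 for t in titers):
        raise ValueError("titers must be positive")
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def geometric_mean_ratio(group_titers, control_titers):
    """Ratio of the GMT of a test group to the GMT of a control group."""
    return geometric_mean_titer(group_titers) / geometric_mean_titer(control_titers)


def log10_increase(titer, control_titer):
    """Increase of a titer over a control, expressed in log10 units."""
    return math.log10(titer / control_titer)


# Hypothetical reciprocal titers for a vaccinated group and a control group.
vaccinated = [100, 400, 1600]
control = [10, 40, 160]
gmt = geometric_mean_titer(vaccinated)
gmr = geometric_mean_ratio(vaccinated, control)
log_gain = log10_increase(1000, 10)
```

Under these assumed values, a 10-fold GMR corresponds to a 1 log increase, which is how the "times" and "log" ranges recited above relate to one another.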
In some embodiments, a control is an anti-coronavirus antigen antibody titer produced in a subject administered a recombinant or purified protein vaccine. Recombinant protein vaccines typically include protein antigens that either have been produced in a heterologous expression system (e.g., bacteria or yeast) or purified from large amounts of the pathogenic organism. In some embodiments, the ability of a composition (e.g., RNA vaccine) to be effective is measured in a murine model. For example, a composition may be administered to a murine model and the murine model assayed for induction of neutralizing antibody titers. Viral challenge studies may also be used to assess the efficacy of a vaccine. For example, a composition may be administered to a murine model, the murine model challenged with virus, and the murine model assayed for survival and/or immune response (e.g., neutralizing antibody response, T cell response (e.g., cytokine response)). In some embodiments, an effective amount of a composition (e.g., RNA vaccine) is a dose that is reduced compared to the standard of care dose of a recombinant protein vaccine. A “standard of care” refers to a medical or psychological treatment guideline and can be general or specific. “Standard of care” specifies appropriate treatment based on scientific evidence and collaboration between medical professionals involved in the treatment of a given condition. It is the diagnostic and treatment process that a physician/clinician should follow for a certain type of patient, illness or clinical circumstance. A “standard of care dose” refers to the dose of a recombinant or purified protein vaccine, or a live attenuated or inactivated vaccine, or a VLP vaccine, that a physician/clinician or other medical professional would administer to a subject to treat or prevent coronavirus infection or a related condition, while following the standard of care guideline for treating or preventing coronavirus infection or a related condition.
In some embodiments, the anti-coronavirus antigen antibody titer produced in a subject administered an effective amount of a composition is equivalent to an anti-coronavirus antigen antibody titer produced in a control subject administered a standard of care dose of a recombinant or purified protein vaccine, or a live attenuated or inactivated vaccine, or a VLP vaccine. Vaccine efficacy may be assessed using standard analyses (see, e.g., Weinberg et al., J Infect Dis. 2010 Jun 1;201(11):1607-10). For example, vaccine efficacy may be measured by double-blind, randomized, clinical controlled trials. Vaccine efficacy may be expressed as a proportionate reduction in disease attack rate (AR) between the unvaccinated (ARU) and vaccinated (ARV) study cohorts and can be calculated from the relative risk (RR) of disease among the vaccinated group with use of the following formulas: Efficacy = (ARU – ARV)/ARU x 100; and Efficacy = (1 – RR) x 100. Likewise, vaccine effectiveness may be assessed using standard analyses (see, e.g., Weinberg et al., J Infect Dis. 2010 Jun 1;201(11):1607-10). Vaccine effectiveness is an assessment of how a vaccine (which may have already proven to have high vaccine efficacy) reduces disease in a population. This measure can assess the net balance of benefits and adverse effects of a vaccination program, not just the vaccine itself, under natural field conditions rather than in a controlled clinical trial. Vaccine effectiveness is proportional to vaccine efficacy (potency) but is also affected by how well target groups in the population are immunized, as well as by other non-vaccine-related factors that influence the ‘real-world’ outcomes of hospitalizations, ambulatory visits, or costs. For example, a retrospective case control analysis may be used, in which the rates of vaccination among a set of infected cases and appropriate controls are compared.
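The two efficacy formulas above are equivalent when RR is taken as ARV/ARU, which the following non-limiting Python sketch illustrates. The function names and cohort numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def attack_rate(cases, cohort_size):
    """Disease attack rate (AR): cases divided by the number of subjects at risk."""
    if cohort_size <= 0:
        raise ValueError("cohort size must be positive")
    return cases / cohort_size


def efficacy_from_attack_rates(aru, arv):
    """Efficacy = (ARU - ARV) / ARU x 100."""
    if aru <= 0:
        raise ValueError("unvaccinated attack rate must be positive")
    return (aru - arv) / aru * 100.0


def efficacy_from_relative_risk(rr):
    """Efficacy = (1 - RR) x 100, where RR = ARV / ARU."""
    return (1.0 - rr) * 100.0


# Hypothetical trial: 100 cases among 10,000 unvaccinated subjects
# and 10 cases among 10,000 vaccinated subjects.
aru = attack_rate(100, 10_000)
arv = attack_rate(10, 10_000)
efficacy_ar = efficacy_from_attack_rates(aru, arv)
efficacy_rr = efficacy_from_relative_risk(arv / aru)
```

Both routes give the same percentage for the assumed cohorts, since (ARU – ARV)/ARU = 1 – ARV/ARU.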
Vaccine effectiveness may be expressed as a rate difference, with use of the odds ratio (OR) for developing infection despite vaccination: Effectiveness = (1 – OR) x 100. In some embodiments, efficacy of the composition (e.g., RNA vaccine) is at least 60% relative to unvaccinated control subjects. For example, efficacy of the composition may be at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 95%, at least 98%, or 100% relative to unvaccinated control subjects. Sterilizing Immunity. Sterilizing immunity refers to a unique immune status that prevents effective pathogen infection into the host. In some embodiments, the effective amount of a composition is sufficient to provide sterilizing immunity in the subject for at least 1 year. For example, the effective amount of a composition is sufficient to provide sterilizing immunity in the subject for at least 2 years, at least 3 years, at least 4 years, or at least 5 years. In some embodiments, the effective amount of a composition is sufficient to provide sterilizing immunity in the subject at an at least 5-fold lower dose relative to control. For example, the effective amount may be sufficient to provide sterilizing immunity in the subject at an at least 10-fold lower, 15-fold, or 20-fold lower dose relative to a control. Detectable Antigen. In some embodiments, the effective amount of a composition is sufficient to produce detectable levels of coronavirus antigen as measured in serum of the subject at 1-72 hours post administration. Titer. An antibody titer is a measurement of the number of antibodies within a subject, for example, antibodies that are specific to a particular antigen (e.g., an anti-coronavirus antigen). Antibody titer is typically expressed as the inverse of the greatest dilution that provides a positive result. Enzyme-linked immunosorbent assay (ELISA) is a common assay for determining antibody titers, for example. 
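For the effectiveness formula recited above, Effectiveness = (1 – OR) x 100, a non-limiting Python sketch is given below. It assumes a retrospective case-control layout in which the odds ratio is computed from a 2x2 table of vaccination status among infected cases and controls. The function names and counts are hypothetical.

```python
import math


def odds_ratio(vaccinated_cases, unvaccinated_cases,
               vaccinated_controls, unvaccinated_controls):
    """Odds of vaccination among cases divided by odds of vaccination among controls."""
    if min(unvaccinated_cases, vaccinated_controls) <= 0:
        raise ValueError("denominator cells of the 2x2 table must be positive")
    return (vaccinated_cases * unvaccinated_controls) / (
        unvaccinated_cases * vaccinated_controls
    )


def effectiveness_from_odds_ratio(or_value):
    """Effectiveness = (1 - OR) x 100."""
    return (1.0 - or_value) * 100.0


# Hypothetical case-control study:
# 20 of 100 infected cases were vaccinated; 60 of 100 controls were vaccinated.
or_value = odds_ratio(vaccinated_cases=20, unvaccinated_cases=80,
                      vaccinated_controls=60, unvaccinated_controls=40)
effectiveness = effectiveness_from_odds_ratio(or_value)
```

An odds ratio below 1 indicates that vaccination is less common among cases than among controls, which yields a positive effectiveness value under this formula.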
A neutralizing immune response is an immune response that is a neutralizing antibody response and/or an effective neutralizing T cell response. In some embodiments a neutralizing antibody response produces a level of antibodies that meets or exceeds a seroprotection threshold. An effective T cell response is a response which produces a baseline level of viral activated or viral specific T cells including CD8+ and CD4+ T helper type 1 cells. CD8+ cytotoxic T lymphocytes typically clear the intracellular virus compartment and CD4+ T cells exert various functions in the body such as helping B and other T cells, promoting memory generation and indirect or direct cytotoxic activity. In some embodiments the effective T cells comprise a high proportion of CD8+ T cells and/or CD4+ T cells, relative to a baseline level (in a naïve subject). In some embodiments these T cells are differentiated towards an early-differentiated memory phenotype with co-expression of CD27 and CD28. In some embodiments, the effective amount of a composition is sufficient to produce a 1,000-10,000 neutralizing antibody titer produced by neutralizing antibody against the coronavirus antigen as measured in serum of the subject at 1-72 hours post administration. In some embodiments, the effective amount is sufficient to produce a 1,000-5,000 neutralizing antibody titer produced by neutralizing antibody against the coronavirus antigen as measured in serum of the subject at 1-72 hours post administration. In some embodiments, the effective amount is sufficient to produce a 5,000-10,000 neutralizing antibody titer produced by neutralizing antibody against the coronavirus antigen as measured in serum of the subject at 1-72 hours post administration. In some embodiments, the neutralizing antibody titer is at least 100 NT50. For example, the neutralizing antibody titer may be at least 200, 300, 400, 500, 600, 700, 800, 900 or 1000 NT50.
In some embodiments, the neutralizing antibody titer is at least 10,000 NT50. In some embodiments, the neutralizing antibody titer is at least 100 neutralizing units per milliliter (NU/mL). For example, the neutralizing antibody titer may be at least 200, 300, 400, 500, 600, 700, 800, 900 or 1000 NU/mL. In some embodiments, the neutralizing antibody titer is at least 10,000 NU/mL. In some embodiments, an anti-coronavirus antigen antibody titer produced in the subject is increased by at least 1 log relative to a control. For example, an anti-coronavirus antigen antibody titer produced in the subject may be increased by at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 log relative to a control. In some embodiments, an anti-coronavirus antigen antibody titer produced in the subject is increased at least 2 times relative to a control. For example, an anti-coronavirus antigen antibody titer produced in the subject is increased by at least 3, 4, 5, 6, 7, 8, 9 or 10 times relative to a control. In some embodiments, a geometric mean, which is the nth root of the product of n numbers, is generally used to describe proportional growth. Geometric mean, in some embodiments, is used to characterize antibody titer produced in a subject. A control may be, for example, an unvaccinated subject, or a subject administered a live attenuated viral vaccine, an inactivated viral vaccine, or a protein subunit vaccine.
EXAMPLES
Methods
Manufacture of polynucleotides and/or parts or regions thereof may be accomplished utilizing the methods taught in International Publication WO2014/152027, entitled “Manufacturing Methods for Production of RNA Transcripts,” the content of which is incorporated herein by reference in its entirety. Purification methods may include those taught in International Publication WO2014/152030 and International Publication WO2014/152031, each of which is incorporated herein by reference in its entirety.
Detection and characterization methods of the polynucleotides may be performed as taught in International Publication WO2014/144039, which is incorporated herein by reference in its entirety. Characterization of polynucleotides may be accomplished using polynucleotide mapping, reverse transcriptase sequencing, charge distribution analysis, detection of RNA impurities, or any combination of two or more of the foregoing. “Characterizing” comprises determining the RNA transcript sequence, determining the purity of the RNA transcript, or determining the charge heterogeneity of the RNA transcript, for example. Such methods are taught in, for example, International Publication WO2014/144711 and International Publication WO2014/144767, the contents of each of which are incorporated herein by reference in their entirety. In experiments where a lipid nanoparticle (LNP) formulation was used, the formulation included 48 mol% ionizable lipid of Compound 1, 11 mol% 1,2 distearoyl-sn-glycero-3-phosphocholine (DSPC), 38.5 mol% cholesterol, and 2.5 mol% PEG-modified 1,2 dimyristoyl-sn-glycerol, methoxypolyethyleneglycol (PEG2500 DMG).
Immunization Methods
Vaccine compositions of lipid nanoparticles containing mRNAs are administered to mice according to the following administration schedule. C57BL/6 mice are immunized with two doses of a given composition, receiving the first dose on day 0, and the second dose on day 22. Sera are collected on day 21, three weeks after the first (prime) dose but before administration of the second (boost) dose, and day 36, two weeks after administration of the second (boost) dose. Where T cells are evaluated, mice are euthanized on day 36, and spleens are collected and processed to harvest splenocytes.
Splenocytes are stimulated with one of a panel of peptide pools, each pool containing peptides from a single SARS-CoV-2 antigen, in the presence of a Golgi blocker so that cells producing cytokines in response to stimulation retain cytokines instead of secreting them. Cell surfaces are stained for lymphocyte markers, including CD3, CD4, and CD8, and cells are permeabilized and stained for multiple cytokines. Stained cells are incubated with a viability dye and analyzed by flow cytometry.
Neutralization assays
Antibodies in serum, when bound to a viral surface protein that is essential for infection, can prevent a virus from infecting a target cell, an activity referred to as “neutralization.” To determine the ability of mRNA compositions to generate neutralizing antibodies against SARS-CoV-2, the neutralization activity of sera is quantified using a neutralization assay. For each assay, ARPE-19 cells are plated in 96-well plates, at a density of 2 × 10^4 cells/well and incubated for 20–24 hours. Then, serial 3-fold dilutions of each serum sample are prepared in phenol red-free cDMEM. A consistent amount of SARS-CoV-2 reporter virus, containing a gene encoding GFP, is incubated with each serum dilution sample, to allow for binding of any SARS-CoV-2-specific antibodies to the virus. After incubation, cells are washed, and incubated with SARS-CoV-2/serum mixtures. 24 hours after incubation, GFP fluorescence in each well is measured to determine the extent of infection. For a given serum sample, the 50% neutralization titer (NT50) is calculated as the reciprocal of the serum dilution at which 50% of GFP+ cells are observed.
Antibody-dependent cell-mediated cytotoxicity assays
Antibodies in serum, when bound to viral protein expressed on the surface of an infected cell, can be recognized by effector cells, such as natural killer (NK) cells.
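As a non-limiting illustration of the NT50 calculation described for the neutralization assay, the following Python sketch generates a serial 3-fold dilution series and interpolates the reciprocal dilution at which infection reaches 50%. Several points are assumptions not stated in this disclosure: the infection readout is normalized to a virus-only control, the interpolation is linear on a log10 dilution scale between the two bracketing dilutions, and the starting dilution, function names, and readout values are hypothetical.

```python
import math


def serial_dilutions(start_recip, fold, n):
    """Reciprocal dilution factors for an n-step serial dilution (e.g. 3-fold)."""
    return [start_recip * fold ** i for i in range(n)]


def nt50(recip_dilutions, pct_infection, target=50.0):
    """Interpolate the reciprocal dilution at which infection equals `target` percent.

    recip_dilutions: reciprocal dilutions in increasing order
    pct_infection:   GFP+ signal at each dilution, as percent of a virus-only control
    Returns None if the target is not bracketed by two adjacent dilutions.
    """
    points = list(zip(recip_dilutions, pct_infection))
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= target <= p_hi and p_lo != p_hi:
            frac = (target - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return None


# Hypothetical readout for one serum sample, starting at a 1:20 dilution.
dilutions = serial_dilutions(20, 3, 6)      # 20, 60, 180, 540, 1620, 4860
infection = [5.0, 10.0, 30.0, 70.0, 90.0, 98.0]
titer = nt50(dilutions, infection)
```

In practice a sigmoidal curve fit across the whole dilution series may be preferred over two-point interpolation. The sketch is intended only to make the definition of NT50 concrete.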
Effector cells recognize the constant (Fc) region of antibodies bound to the target cells, and following recognition, release cytotoxic granules that induce apoptosis in the infected target cells. This process is referred to as “antibody-dependent cell-mediated cytotoxicity” (ADCC). To determine the ability of mRNA compositions to generate antibodies that can facilitate ADCC, the ADCC activity of sera is quantified using an ADCC assay. This assay uses Jurkat cells that constitutively express mouse FcγRIV, allowing for recognition of antibody Fc regions, and luciferase under the control of the NFAT pathway, which is activated following Fc recognition. In each assay, Vero cells are plated in 96-well plates, at a density of 2.5 × 10^4 cells/well, incubated for 20–24 hours, then inoculated with SARS-CoV-2 at a multiplicity of infection (MOI) of 5 plaque-forming units (PFU) per cell. 16 hours after inoculation, serial 3-fold dilutions of serum samples are prepared in RPMI + 4% fetal bovine serum (FBS) containing only minimal amounts of IgG, to reduce background. Serum samples are added to each well, to allow antibodies to bind to infected Vero cells expressing viral surface proteins. Then, reporter effector cells are serially diluted in RPMI + 4% low-IgG FBS, added to wells, and incubated for 6 hours to allow for recognition of surface-bound antibodies and expression of luciferase. After the 6 hours of incubation, a luciferase substrate is added to wells, so that any luciferase present can react with the substrate to produce light. Light emitted from wells is measured to quantify the amount of luciferase activity as a measurement of ADCC activity.
Example 1: Design of mRNA encoding antigenic SARS-CoV-2 antigens.
The protein sequences of SARS-CoV-2 M and N proteins were analyzed to identify regions containing high densities of T cell epitopes. Regions of M and N proteins that are rich in T cell epitopes are shown in FIG. 2 and FIG. 4.
Modified forms of each protein containing epitope-rich regions were designed. These modified proteins contain higher densities of T cell epitopes than full-length forms and are thus useful for eliciting T cell responses to the proteins.
Example 2: Immunization of mice with compositions containing mRNAs encoding SARS-CoV-2 antigens.
Mice are immunized with lipid nanoparticles containing the mRNAs encoding Nsp3 and N and M proteins, optionally with a signal peptide or the S protein N-terminal and receptor-binding domains, shown in FIG. 1. The antigens encoded by the mRNAs of each composition are shown in Table E2, the sequences of which can be found in Appendix I. A first dose (prime) is administered on day 0, and a second dose (booster) is administered on day 22. Serum is collected on day 21, three weeks after the administration of the first dose but before booster dose administration, and on day 36, two weeks after the administration of the booster dose. At day 36, mice are also euthanized to collect spleens for analysis of T cells by cell surface marker and intracellular cytokine staining. T cell effector phenotype and response to antigen are evaluated as described above. Sera are evaluated for antiviral activities such as neutralization, ADCC activity, and prevention of cell-cell spread by SARS-CoV-2.
Table E2: Panel of mRNA vaccines containing mRNAs encoding SARS-CoV-2 chimeric protein
Figure imgf000166_0001
[Table E2 is reproduced as images in the source; legible row text: 3.1 M-N_trunc_dBF-AAY-NSP3_E6_minjunc-AAY-M_FL 8 107; 3.2 M-N_trunc_dBF-GGSGG-NSP3_E6_minjunc-GGSGG- 8 108]
Figure imgf000167_0001
Example 3. Immunogenicity and neutralization assay at day 21 following a single dose
Mice are immunized with lipid nanoparticles containing mRNAs encoding Nsp3, N protein, and M protein, with or without a signal peptide or the S protein N-terminal and receptor-binding domains (described in Example 2 and Table E2). SARS-CoV-2 N protein-specific IgG titers, M-specific IgG titers, and Nsp3-specific IgG titers are measured by ELISA at Day 21 post vaccination. The assay to evaluate the neutralization capacity of IgG antibodies generated in response to immunization is carried out as described above.
Example 4. Immunogenicity and neutralization assay at day 36 following two doses
The same compositions of the mRNA vaccines described in Example 2 are again administered to mice as booster doses on Day 22 post-vaccination with the first dose. The titers of antibodies generated after the booster dose to each of N antigen, M antigen, and Nsp3 are measured by ELISA from day 36 serum. The assay to evaluate the neutralization capacity of IgG antibodies generated in response to immunization is carried out as described above.
Example 5. Immunogenicity and neutralization assay following administration of mRNA vaccines encoding full-length or composite antigens
One of a panel of mRNA vaccines shown in Table E5-1 (full-length T cell antigens) or Table E5-2 (composite T cell antigens with or without S antigen) is administered to mice at the indicated doses, with a first (prime) dose on day 0 and a second (boost) dose on day 22. At day 21 (before boost, 3 weeks post-prime dose), and day 36 (2 weeks post-boost dose), sera are collected and analyzed to quantify S protein-specific antibody titers, and neutralizing antibody titers against pseudoviruses expressing D614G, XBB.1.5, or BA.4/BA.5 S proteins. At day 36, mice are euthanized to collect spleens for analysis of antigen-specific cells by ELISPOT, and analysis of T cell responses by intracellular cytokine staining.
Table E5-1: Panel of mRNA vaccines containing mRNAs encoding SARS-CoV-2 proteins (full-length N and/or M).
Figure imgf000168_0001
Table E5-2: Panel of mRNA vaccines containing mRNAs encoding SARS-CoV-2 chimeric proteins.
Figure imgf000168_0002
[Table E5-2 continues as an image in the source; legible text: Composite 2; 11 XBB.1.5 S (NTD-RBD-HATM)]
Figure imgf000169_0001
In Table E5-2 above, the N-M-Nsp3 antigen of Group 2 may be any one of SEQ ID NOs: 180–183, and the Composite 1–3 antigens are, for example:
• Composite 1 = NTD40-NSP3-M FL-AAY (Protein 1.1, SEQ ID NO: 101, encoded by SEQ ID NO: 125);
• Composite 2 = HAsp-dBF-NSP3-M FL-AAY (Protein 7.1, SEQ ID NO: 119, encoded by SEQ ID NO: 143); and
• Composite 3 = HAsp-NTD40-NSP3-M FL-GGSGG (Protein 5.2, SEQ ID NO: 114, encoded by SEQ ID NO: 138).
EXEMPLARY SEQUENCES
It should be understood that any of the mRNA sequences may include a 5’ UTR and/or a 3’ UTR. The UTR sequences may be selected from the following sequences, or other known UTR sequences may be used. It should also be understood that any of the mRNA constructs may further comprise a poly(A) tail and/or cap (e.g., 7mG(5’)ppp(5’)NlmpNp). Further, while many of the mRNAs and encoded antigen sequences include a signal peptide and/or a peptide tag (e.g., C-terminal His tag), it should be understood that the indicated signal peptide and/or peptide tag may be substituted for a different signal peptide and/or peptide tag, or the signal peptide and/or peptide tag may be omitted.
5’ UTR: GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 1)
5’ UTR: GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGACCCCGGCGCCGCCACC (SEQ ID NO: 2)
3’ UTR: UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC (SEQ ID NO: 3)
3’ UTR: UGAUAAUAGGCUGGAGCCUCGGUGGCCUAGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC (SEQ ID NO: 4)
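As a non-limiting illustration of how an mRNA construct may combine the UTRs listed above with an open reading frame (ORF) and an optional poly(A) tail, consider the Python sketch below. The 5’ UTR and 3’ UTR strings are SEQ ID NO: 1 and SEQ ID NO: 3 as given above. The function names, the placeholder ORF, and the poly(A) length are hypothetical and are not taken from this disclosure. The sketch performs only string assembly and alphabet checks and does not model the cap or any chemical modification.

```python
# SEQ ID NO: 1 (5' UTR) and SEQ ID NO: 3 (3' UTR), as listed above.
FIVE_PRIME_UTR = "GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC"
THREE_PRIME_UTR = (
    "UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCA"
    "CCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC"
)


def dna_to_rna(seq):
    """Convert a DNA-alphabet sequence (A, C, G, T) to the RNA alphabet (A, C, G, U)."""
    return seq.upper().replace("T", "U")


def assemble_mrna(orf, five_utr=FIVE_PRIME_UTR, three_utr=THREE_PRIME_UTR,
                  poly_a_length=0):
    """Concatenate 5' UTR + ORF + 3' UTR + optional poly(A) tail.

    poly_a_length is a hypothetical parameter; no tail length is specified here.
    """
    orf_rna = dna_to_rna(orf)
    unexpected = set(orf_rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters in ORF: {sorted(unexpected)}")
    if poly_a_length < 0:
        raise ValueError("poly(A) length cannot be negative")
    return five_utr + orf_rna + three_utr + "A" * poly_a_length


# Hypothetical two-codon placeholder ORF; a real ORF would encode an antigen.
construct = assemble_mrna("ATGGCC", poly_a_length=10)
```

The same helper could be called with SEQ ID NO: 2 or SEQ ID NO: 4 passed as `five_utr` or `three_utr`, or with a different UTR, consistent with the statement above that other known UTR sequences may be used.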
Figure imgf000169_0002
G CUUUUUGUUCUCGCC GGAAAUCCCCACAACCGCCUCAUAUCCAGGCUCAAGAAUAGAGCUCAGUGUUUUGUUGUU U U G U G G C U G U U A A A C U A G U U
Figure imgf000170_0001
C GUGCACUUAUAAGUAUUUG 30 GGAAAGCGAUUGAAGGCGUCUUUUCAACUACUCGAUUAAGGUUGGGUAUCGUCGUGGGAC 3 A 3 3 A 3
Figure imgf000171_0001
Table 2. 3′ UTR sequences (stop cassette is italicized; miR binding sites are boldened)
Figure imgf000171_0002
(miR122 binding site boldened)
Figure imgf000171_0003
[Signal peptide table reproduced as images in the source; legible row text: Japanese encephalitis PRM signal MLGSNSGQRVVFTILLLLVAPAYS 63]
Figure imgf000172_0001
Table 5. Exemplary full-length SARS-CoV-2 protein sequences and portions thereof
Figure imgf000172_0002
G S G G G G C TVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNYKLNTDHSSSSDNIALLVQ MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFL Full-length S PFF NVTWFHAIHV TN TKRFDNPVLPFND VYFA TEK NIIR WIF TTLD
Figure imgf000173_0001
TGCVIAWNSNKLDSKVGGNYNYRYRLFRKSNLKPFERDISTEIYQAGNKPCNGVAG VNCYFPLQSYGFRPTYGVGHQPYRVVVLSFELLHAPATVCGPK MLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKNGGDAALALLLLDRL 1.1 N LE KM K TVTKK AAEA KKPR KRTATKAYNVT AF RR PE T
Figure imgf000174_0001
MSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQ ELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPN FKDQVILLNKHIDAYKTFPPTaayAEAELAKNVSLDNVLSNEKQEILGTVSWNLAL RKVPTDNYITTYLVAEWFLAYILFTRFFYVYIFFA FYYVWK YVHTTDP FL RY
Figure imgf000175_0001
YNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGME VTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTaayAEAELAKNVS LDNVLSNEKQEILGTVSWNLALRKVPTDNYITTYLVAEWFLAYILFTRFFYVYIFF A FYYVWK YVHTTDP FL RYM AL TITVEELKKLLEEWNLVI FLFLTWI
Figure imgf000176_0001
NEKQEILGTVSWNLALRKVPTDNYITTYLVAEWFLAYILFTRFFYVYIFFASFYYV WKSYVHTTDPSFLGRYMSALaayGTITVEELKKLLEEWNLVIGFLFLTWICLLQFA YANRNRFLYIIKLIFLWLLWPVTLTCFVLAAVYRINWITGGIAIAMACLVGLMWLS YFIA FRLFARTR MW FNPETNILLNVPLH TILTRPLLE ELVI AVILR HLR
Figure imgf000177_0001
ACACACCAAAGAACGGCGGAGACGCGGCTTTGGCACTGCTGCTCCTGGACCGGCTG AACCAACTAGAGAGCAAGATGAGCGGCAAGGGCCAGCAGCAGCAAGGCCAGACCGT Protein 1.1 GACCAAGAAGAGCGCCGCCGAAGCCAGCAAGAAGCCCCGGCAGAAACGGACCGCCA AA TA AA T A A TTT A A A A A A A
Figure imgf000178_0001
GACGACAAGGACCCCAACTTTAAGGACCAGGTGATCTTGCTGAACAAGCACATCGA CGCCTACAAGACCTTCCCTCCCACTGCCGCCTACGCAGAAGCCGAGCTGGCCAAGA Protein 2.1 ACGTGAGCCTGGACAACGTGCTGAGCAACGAGAAGCAGGAGATCCTGGGCACCGTG A T AAT T T AA T A A AA TA AT A TA TA T
Figure imgf000179_0001
GGCCTACATCCTGTTCACCCGGTTCTTCTACGTGTACATCTTCTTCGCAAGCTTTT ACTACGTGTGGAAGAGCTACGTGCACACAACCGACCCCAGCTTTCTGGGACGGTAC Protein 3.1 ATGAGCGCGCTGGCAGCTTACGGAACCATAACCGTGGAGGAGCTGAAGAAGCTGCT A A T AA TTA T AT TT T TTT T A T ATTT T T
Figure imgf000180_0001
ATGAGCGCGCTGGCTGCTTACCGGACCCGGAGCATGTGGAGCTTCAACCCCGAGAC TAACATCCTGCTGAACGTGCCCCTGCACGGCACCATCCTGACCCGGCCCCTGTTAG Protein 4.1 AGAGCGAGCTGGTTATCGGCGCCGTGATTCTGCGGGGCCACTTGCGGATCGCCGGT A AT T T A AT AA A T AA A AT A TA A
Figure imgf000181_0001
GATCTTTCTGTGGCTGCTGTGGCCCGTGACCCTGACCTGCTTCGTGCTGGCCGCCG TGTACCGGATCAACTGGATCACCGGCGGCATCGCTATCGCCATGGCCTGCCTGGTG Protein 5.1 GGCCTGATGTGGCTGTCCTACTTCATCGCCAGCTTCCGACTGTTCGCGCGGACCCG A AT T A TT AA A A TAA AT T T AA T T A
Figure imgf000182_0001
CTGGTGATCGGCTTCCTGTTCCTCACCTGGATCTGCCTGCTGCAGTTCGCATACGC CAACCGGAACAGGTTCCTGTACATCATCAAGCTGATCTTCCTGTGGCTGCTGTGGC Protein 6.1 CCGTGACCCTGACCTGCTTCGTGCTGGCCGCCGTGTACCGGATCAACTGGATCACC AT AAT AT T T T T AT T T A TA TT
Figure imgf000183_0001
CTGGGGACTCCGGCTTTGCCGCATACAGCCGGTATCGGATCGGCAACTACAAGCTG AACACCGACCACAGCTCAAGCTCAGACAACATCGCCCTGCTGGTGCAG Protein 7.1 ATGAAGGCCATCTTAGTCGTGCTGCTGTACACCTTCACAACAGCCAACGCCCAAGG T AA AA A A T TT A T A A A A AA A
Figure imgf000184_0001
TCGTGCTGGCCGCCGTGTACCGGATCAACTGGATCACCGGCGGCATCGCAATCGCC Protein 8.1 ATGGCCTGCCTGGTGGGCCTGATGTGGCTGAGCTACTTCATTGCCAGCTTCCGGCT TT
Figure imgf000185_0001
MFVFLVLLPLVSSQCVNLITRTQSYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFF SNVTWFHAISGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSL RBD-N-M 3 LIVNNATNVVIKVCEFQFCNDPFLDVYYHKNNKSWMESEFRVYSSANNCTFEYVSQ PFLMDLE K NFKNLREFVFKNID YFKIY KHTPINL RDLP F ALEPLVDL
Figure imgf000186_0001
IAGHHLGRCDIKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNYKL NTDHSSSSDNIALLVQ RBD-N-M 6 MFVFLVLLPLVSSQCVNLITRTQSYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFF NVTWFHAI TN TKRFDNPVLPFND VYFA TEK NIIR WIF TTLD KT L
Figure imgf000187_0001
KHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLN KHIDAYKTFPPTggsggRTRSMWSFNPETNILLNVPLHGTILTRPLLESELVIGAV 10 ILRGHLRIAGHHLGRCDIKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRY RI NYKLNTDH DNIALLV TITVEELKKLLEEWNLVI FLFLTWI , ,
Figure imgf000188_0001
GTTCAAACTGAGCGAGGTTGGCCCCGAGCATAGCCTGGCCGAGTACTACATCTTCT , TCGCCAGCTTCTACTACCGAAAGTCATACTTCGCCTACGCCAACCGCAACAGATTC linkers) CTGTACATCATCAAGCTGATATTCCTGTGGCTGCTGTGGCCCGTGACTCTGGCCTG TT T T T TA AT AA T AT A A AT ATT
Figure imgf000189_0001
EQUIVALENTS AND SCOPE While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure. All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms. 
All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document. The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.” The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in some embodiments, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc. As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. 
In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law. As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in some embodiments, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc. Each possibility represents a separate embodiment of the present invention. 
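As a non-limiting illustration, the inclusive reading of “or” and “and/or,” the exclusive reading that applies only with terms such as “exactly one of,” and the reading of “at least one of” described above may be sketched in Python. The function names are hypothetical and are introduced here only for illustration; they form no part of the definitions above.

```python
from typing import Mapping


def inclusive_or(*present: bool) -> bool:
    """'A or B' / 'A and/or B': at least one listed element is present."""
    return any(present)


def exactly_one_of(*present: bool) -> bool:
    """'exactly one of A or B': one and only one listed element is present."""
    return sum(present) == 1


def at_least_one_of(counts: Mapping[str, int]) -> bool:
    """'at least one of A and B': some listed element occurs one or more
    times; not every listed element has to occur."""
    return any(n >= 1 for n in counts.values())


# A only, B only, or both A and B satisfy "A and/or B".
print(inclusive_or(True, False), inclusive_or(True, True))
# Both A and B together do not satisfy "exactly one of A or B".
print(exactly_one_of(True, True))
# Two of A with no B still satisfies "at least one of A and B".
print(at_least_one_of({"A": 2, "B": 0}))
```

Unlisted elements are simply not passed to these functions, which mirrors the statement that other elements may optionally be present without affecting the result.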
It should be understood that, unless clearly indicated to the contrary, the disclosure of numerical values and ranges of numerical values in the specification includes both i) the exact value(s) or range specified, and ii) values that are “about” the value(s) or ranges specified (e.g., values or ranges falling within a reasonable range (e.g., about 10% similar)) as would be understood by a person of ordinary skill in the art. It should also be understood that, unless clearly indicated to the contrary, in any methods disclosed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are disclosed. In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedure, Section 2111.03.
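As a non-limiting illustration, the reading of a numerical value or range as including values “about” that value or range may be sketched in Python. The 10% tolerance is taken from the example given above and is an assumption, not a fixed rule; the function names are hypothetical, and the range helper assumes non-negative bounds.

```python
def is_about(value: float, stated: float, tolerance: float = 0.10) -> bool:
    """True if `value` lies within `tolerance` (as a fraction) of `stated`."""
    return abs(value - stated) <= tolerance * abs(stated)


def in_about_range(value: float, low: float, high: float,
                   tolerance: float = 0.10) -> bool:
    """True if `value` lies in [low, high] after widening each bound by
    `tolerance`. Assumes 0 <= low <= high."""
    return low * (1 - tolerance) <= value <= high * (1 + tolerance)


# A stated value of 100 is read as covering 105 but not 115.
print(is_about(105, 100), is_about(115, 100))
# A stated range of 40 to 55 is read as covering 38 but not 30.
print(in_about_range(38, 40, 55), in_about_range(30, 40, 55))
```

What counts as a reasonable tolerance depends on the context and on the understanding of a person of ordinary skill in the art, so the default used here should be treated as a placeholder.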

Claims

CLAIMS
What is claimed is:
1. A composition comprising a lipid nanoparticle and a messenger ribonucleic acid (mRNA) comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: (i) a SARS-CoV-2 nucleocapsid (N) protein portion; (ii) a SARS-CoV-2 non-structural protein 3 (NSP3) protein portion; and (iii) a SARS-CoV-2 matrix (M) protein portion comprising one or more transmembrane domains.
2. The composition of claim 1, wherein the SARS-CoV-2 N protein portion comprises (a) a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein, and (b) a C-terminal domain of the full-length SARS-CoV-2 N protein.
3. The composition of claim 2, wherein the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
4. The composition of claim 2 or 3, wherein the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104–143 of the full-length SARS-CoV-2 N protein.
5. The composition of claim 4, wherein the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43–87 of the full-length SARS-CoV-2 N protein.
6. The composition of claim 5, wherein the first and second N-terminal domain amino acid sequences are connected by a linker.
7. The composition of claim 6, wherein the linker is a glycine linker or glycine-serine linker.
8. The composition of any one of claims 2–6, wherein the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213–366 of the full-length SARS-CoV-2 N protein.
9. The composition of any one of claims 2–8, wherein the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
10. The composition of any one of claims 1–8, wherein the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
11. The composition of any one of claims 1–10, wherein the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
12. The composition of claim 11, wherein the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
13. The composition of claim 11 or 12, wherein the two or more CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein.
14. The composition of any one of claims 11–13, wherein one or more junctional epitopes, which are present in a concatenated amino acid sequence consisting of the two or more CD8+ T cell epitopes, are not present in the SARS-CoV-2 NSP3 protein portion.
15. The composition of any one of claims 11–14, wherein the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85.
16. The composition of any one of claims 11–15, wherein the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93.
17. The composition of any one of claims 1–16, wherein the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
18. The composition of any one of claims 1–17, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and (b) a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
19. The composition of claim 18, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
20. The composition of any one of claims 1–17, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) a β-sheet domain of a full-length SARS-CoV-2 M protein, and (b) one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
21. The composition of claim 20, wherein the β-sheet domain is connected to the one or more transmembrane domains by a linker.
22. The composition of claim 21, wherein the linker is a glycine or glycine-serine linker.
23. The composition of any one of claims 11–14, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
24. The composition of any one of claims 18–23, wherein the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
25. The composition of any one of claims 1–24, wherein two or more of the N protein portion, NSP3 protein portion, and M protein portion are separated by a linker.
26. The composition of claim 25, wherein: (i) the N protein portion and the NSP3 protein portion are separated by a first linker, and/or (ii) the NSP3 protein portion and the M protein portion are separated by a second linker.
27. The composition of claim 25, wherein: (i) the N protein portion and the M protein portion are separated by a first linker, and/or (ii) the M protein portion and the NSP3 protein portion are separated by a second linker.
28. The composition of claim 25, wherein: (i) the M protein portion and the N protein portion are separated by a first linker, and/or (ii) the N protein portion and the NSP3 protein portion are separated by a second linker.
29. The composition of any one of claims 26–28, wherein each of the first and second linkers is a glycine or glycine-serine linker.
30. The composition of any one of claims 26–28, wherein each of the first and second linkers comprises the amino acid sequence AAY.
31. The composition of any one of claims 1–30, wherein the SARS-CoV-2 chimeric protein further comprises a signal peptide.
32. The composition of claim 31, wherein the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide.
33. A composition comprising a lipid nanoparticle and a messenger RNA (mRNA) comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: (i) a SARS-CoV-2 Spike (S) protein portion; and (ii) a SARS-CoV-2 nucleocapsid (N) protein portion; and (iii) a transmembrane portion comprising a transmembrane domain.
34. The composition of claim 33, wherein the SARS-CoV-2 N protein portion comprises (a) a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein, and (b) a C-terminal domain of the full-length SARS-CoV-2 N protein.
35. The composition of claim 34, wherein the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
36. The composition of claim 34 or 35, wherein the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104–143 of a full-length SARS-CoV-2 N protein.
37. The composition of claim 36, wherein the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43–87 of the full-length SARS-CoV-2 N protein.
38. The composition of claim 37, wherein the first and second N-terminal domain amino acid sequences are connected by a linker.
39. The composition of claim 38, wherein the linker is a glycine or glycine-serine linker.
40. The composition of any one of claims 34–39, wherein the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213–366 of the full-length SARS-CoV-2 N protein.
41. The composition of any one of claims 34–40, wherein the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
42. The composition of any one of claims 33–41, wherein the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
43. The composition of any one of claims 33–42, wherein the transmembrane portion comprises an influenza virus hemagglutinin (HA) transmembrane domain.
44. The composition of any one of claims 33–42, wherein the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
45. The composition of claim 44, wherein the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
46. The composition of claim 44 or 45, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and (b) a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
47. The composition of claim 46, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
48. The composition of claim 44 or 45, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) a β-sheet domain of a full-length SARS-CoV-2 M protein, and (b) one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
49. The composition of claim 48, wherein the β-sheet domain is connected to the one or more transmembrane domains by a linker.
50. The composition of claim 49, wherein the linker is a glycine or glycine-serine linker.
51. The composition of claim 48, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
52. The composition of any one of claims 44–51, wherein the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
53. The composition of any one of claims 33–51, wherein the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
54. The composition of any one of claims 33–53, wherein the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
55. The composition of any one of claims 33–54, wherein the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker.
56. The composition of claim 55, wherein the linker is a glycine linker or a glycine-serine linker.
57. The composition of any one of claims 33–55, wherein the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein.
58. The composition of claim 57, wherein the NTD corresponds to amino acids 1–290 of the full-length SARS-CoV-2 S protein, and/or the RBD corresponds to amino acids 316–517 of the full-length SARS-CoV-2 S protein.
59. The composition of claim 57 or 58, wherein the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein.
60. The composition of claim 57 or 58, wherein the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein.
61. The composition of any one of claims 57–59, wherein the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87.
62. The composition of any one of claims 33–61, wherein the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98.
63. The composition of any one of claims 33–61, wherein two or more of the S protein portion, N protein portion, and transmembrane portion are separated by a linker.
64. The composition of claim 63, wherein: (i) the S protein portion and the N protein portion are separated by a first linker, and/or (ii) the N protein portion and the transmembrane portion are separated by a second linker.
65. The composition of claim 63, wherein: (i) the S protein portion and the transmembrane portion are separated by a first linker, and/or (ii) the transmembrane portion and the N protein portion are separated by a second linker.
66. The composition of claim 64 or 65, wherein each of the first and second linkers is a glycine or glycine-serine linker.
67. The composition of claim 64 or 65, wherein each of the first and second linkers comprises the amino acid sequence AAY.
68. A composition comprising a lipid nanoparticle and a messenger ribonucleic acid comprising an open reading frame (ORF) encoding a protein comprising an amino acid sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183.
69. The composition of any one of claims 1–68, wherein the mRNA comprises a 5′ untranslated region (UTR), wherein the 5′ UTR comprises a nucleotide sequence with at least 90% sequence identity to a nucleotide sequence selected from SEQ ID NOs: 1, 2, 5–35, 66, 70–72, 75, 76, and 81.
70. The composition of claim 69, wherein the 5′ UTR comprises a nucleotide sequence selected from SEQ ID NOs: 1, 2, 5–35, 66, 70–72, 75, 76, and 81.
71. The composition of any one of claims 1–70, wherein the mRNA comprises a 3′ untranslated region (UTR), wherein the 3′ UTR comprises a nucleotide sequence with at least 90% sequence identity to a nucleotide sequence selected from SEQ ID NOs: 3–4, 36–44, 68, 69, 73, 74, 77–79, and 82.
72. The composition of claim 71, wherein the 3′ UTR comprises a nucleotide sequence selected from SEQ ID NOs: 3–4, 36–44, 68, 69, 73, 74, 77–79, and 82.
73. The composition of any one of claims 1–72, wherein the mRNA comprises one or more stop codons immediately downstream from the open reading frame.
74. The composition of claim 73, wherein the one or more stop codons comprise the nucleotide sequence UGAUGA.
75. The composition of claim 73, wherein the one or more stop codons comprise the nucleotide sequence UGAUAAUAG.
76. The composition of any one of claims 1–75, wherein the mRNA comprises a polyadenosine (polyA) sequence comprising 20 or more consecutive adenosine nucleotides.
77. The composition of claim 76, wherein the polyA sequence comprises 100 consecutive adenosine nucleotides.
78. The composition of claim 76, wherein the polyA sequence comprises, in 5′-to-3′ order, a first nucleotide sequence comprising 30 consecutive adenosine nucleotides, an intervening sequence comprising no more than three adenosine nucleotides, and a second nucleotide sequence comprising 70 consecutive adenosine nucleotides.
79. The composition of claim 76 or 78, wherein the polyA sequence comprises the nucleotide sequence of SEQ ID NO: 80.
80. The composition of claim 76, wherein the mRNA further comprises a polycytidine (polyC) sequence comprising 20 or more consecutive cytidine nucleotides.
81. The composition of claim 80, wherein the polyC sequence comprises 30 consecutive cytidine nucleotides.
82. The composition of claim 80 or 81, wherein the polyC sequence is downstream from the polyA sequence, wherein the polyA sequence comprises 64 consecutive adenosine nucleotides.
83. The composition of claim 76, wherein the polyA sequence comprises 109 consecutive adenosine nucleotides.
84. The composition of any one of claims 1–83, wherein the mRNA comprises a 5′ cap analog.
85. The composition of claim 84, wherein the 5′ cap analog comprises a 7mG(5′)ppp(5′)NlmpNp cap.
86. The composition of any one of claims 1–85, wherein the lipid nanoparticle comprises 40-55 mol% ionizable amino lipid, 30-45 mol% sterol, 5-15 mol% neutral lipid, and 1-5 mol% PEG-modified lipid.
87. The composition of any one of claims 1–86, wherein the ionizable amino lipid comprises a compound of Formula (I):
Figure imgf000201_0001
salt or isomer thereof, wherein: R1 is R”M’R’ or C5-20 alkenyl; R2 and R3 are each independently selected from C1-14 alkyl and C2-14 alkenyl; R4 is -(CH2)nQ, wherein Q is OH and n is selected from 3, 4, and 5; M and M’ are each independently -OC(O)- or -C(O)O-; R5, R6, and R7 are each H; R’ is a linear C1-12 alkyl, or C1-12 alkyl substituted with C6-9 alkyl; R” is C3-14 alkyl; m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13.
88. The the ionizable amino lipid comprises
Figure imgf000201_0002
Compound 1: (Compound 1).
89. The composition of any one of claims 1–87, wherein the ionizable amino lipid comprises a compound of the structure A7: .
Figure imgf000202_0001
90. The composition of any one of claims 86–89, wherein the neutral lipid is 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC).
91. The composition of any one of claims 86–90, wherein the sterol is cholesterol.
92. The composition of any one of claims 86–91, wherein the PEG-modified lipid is PEG2000-DMG.
93. The composition of any one of claims 1–92, wherein the open reading frame comprises one or more chemically modified nucleotides.
94. The composition of any one of claims 1–93, wherein the open reading frame comprises N1-methylpseudouridine.
95. The composition of any one of claims 1–94, wherein at least 80% of uracil nucleotides in the open reading frame comprise N1-methylpseudouridine.
96. The composition of any one of claims 1–95, wherein 100% of uracil nucleotides in the open reading frame comprise N1-methylpseudouridine.
97. The composition of any one of claims 1–96, wherein the open reading frame comprises 5-methylcytidine.
98. The composition of any one of claims 1–97, wherein at least 80% of cytosine nucleotides in the open reading frame comprise 5-methylcytidine.
99. The composition of any one of claims 1–98, wherein 100% of cytosine nucleotides in the open reading frame comprise 5-methylcytidine.
100. The composition of any one of claims 1–93 or 97–99, wherein the open reading frame comprises 5-methyluridine.
101. The composition of any one of claims 1–93 or 97–100, wherein at least 80% of uracil nucleotides in the open reading frame comprise 5-methyluridine.
102. The composition of any one of claims 1–93 or 97–101, wherein 100% of uracil nucleotides in the open reading frame comprise 5-methyluridine.
103. A pharmaceutical composition comprising the composition of any one of claims 1–102 and a pharmaceutically acceptable excipient.
104. A method comprising administering to a subject the composition of any one of claims 1–102.
105. The method of claim 104, wherein the composition is administered intramuscularly.
106. The method of claim 104 or 105, wherein the composition is effective to induce, in the subject, CD4+ and/or CD8+ T cells specific to one or more epitopes of the protein.
107. The method of any one of claims 104–106, the method comprising administering a first dose and a second dose of the composition.
108. A SARS-CoV-2 chimeric protein comprising: (i) a SARS-CoV-2 nucleocapsid (N) protein portion; (ii) a SARS-CoV-2 non-structural protein 3 (NSP3) protein portion; and (iii) a SARS-CoV-2 matrix (M) protein portion comprising one or more transmembrane domains.
109. The SARS-CoV-2 chimeric protein of claim 108, wherein the SARS-CoV-2 N protein portion comprises (a) a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein, and (b) a C-terminal domain of the full-length SARS-CoV-2 N protein.
110. The SARS-CoV-2 chimeric protein of claim 109, wherein the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
111. The SARS-CoV-2 chimeric protein of claim 109 or 110, wherein the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104–143 of the full-length SARS-CoV-2 N protein.
112. The SARS-CoV-2 chimeric protein of claim 111, wherein the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43–87 of the full-length SARS-CoV-2 N protein.
113. The SARS-CoV-2 chimeric protein of claim 112, wherein the first and second N-terminal domain amino acid sequences are connected by a linker.
114. The SARS-CoV-2 chimeric protein of claim 113, wherein the linker is a glycine linker or glycine-serine linker.
115. The SARS-CoV-2 chimeric protein of any one of claims 109–113, wherein the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213–366 of the full-length SARS-CoV-2 N protein.
116. The SARS-CoV-2 chimeric protein of any one of claims 109–115, wherein the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
117. The SARS-CoV-2 chimeric protein of any one of claims 108–115, wherein the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
118. The SARS-CoV-2 chimeric protein of any one of claims 108–117, wherein the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
119. The SARS-CoV-2 chimeric protein of claim 118, wherein the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
120. The SARS-CoV-2 chimeric protein of claim 118 or 119, wherein the two or more CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein.
121. The SARS-CoV-2 chimeric protein of any one of claims 118–120, wherein one or more junctional epitopes, which are present in a concatenated amino acid sequence consisting of the two or more CD8+ T cell epitopes, are not present in the SARS-CoV-2 NSP3 protein portion.
122. The SARS-CoV-2 chimeric protein of any one of claims 118–121, wherein the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85.
123. The SARS-CoV-2 chimeric protein of any one of claims 118–122, wherein the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93.
124. The SARS-CoV-2 chimeric protein of any one of claims 108–123, wherein the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
125. The SARS-CoV-2 chimeric protein of any one of claims 108–124, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and (b) a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
126. The SARS-CoV-2 chimeric protein of claim 125, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
127. The SARS-CoV-2 chimeric protein of any one of claims 108–124, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) a β-sheet domain of a full-length SARS-CoV-2 M protein, and (b) one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
128. The SARS-CoV-2 chimeric protein of claim 127, wherein the β-sheet domain is connected to the one or more transmembrane domains by a linker.
129. The SARS-CoV-2 chimeric protein of claim 128, wherein the linker is a glycine or glycine-serine linker.
130. The SARS-CoV-2 chimeric protein of any one of claims 118–121, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
131. The SARS-CoV-2 chimeric protein of any one of claims 125–130, wherein the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
132. The SARS-CoV-2 chimeric protein of any one of claims 108–131, wherein two or more of the N protein portion, NSP3 protein portion, and M protein portion are separated by a linker.
133. The SARS-CoV-2 chimeric protein of claim 132, wherein: (i) the N protein portion and the NSP3 protein portion are separated by a first linker, and/or (ii) the NSP3 protein portion and the M protein portion are separated by a second linker.
134. The SARS-CoV-2 chimeric protein of claim 132, wherein: (i) the N protein portion and the M protein portion are separated by a first linker, and/or (ii) the M protein portion and the NSP3 protein portion are separated by a second linker.
135. The SARS-CoV-2 chimeric protein of claim 132, wherein: (i) the M protein portion and the N protein portion are separated by a first linker, and/or (ii) the N protein portion and the NSP3 protein portion are separated by a second linker.
136. The SARS-CoV-2 chimeric protein of any one of claims 133–135, wherein each of the first and second linkers is a glycine or glycine-serine linker.
137. The SARS-CoV-2 chimeric protein of any one of claims 133–135, wherein each of the first and second linkers comprises the amino acid sequence AAY.
138. The SARS-CoV-2 chimeric protein of any one of claims 108–137, wherein the SARS-CoV-2 chimeric protein further comprises a signal peptide.
139. The SARS-CoV-2 chimeric protein of claim 138, wherein the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide.
140. A SARS-CoV-2 chimeric protein comprising: (i) a SARS-CoV-2 Spike (S) protein portion; and (ii) a SARS-CoV-2 nucleocapsid (N) protein portion; and (iii) a transmembrane portion comprising a transmembrane domain.
141. The SARS-CoV-2 chimeric protein of claim 140, wherein the SARS-CoV-2 N protein portion comprises (a) a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein, and (b) a C-terminal domain of the full-length SARS-CoV-2 N protein.
142. The SARS-CoV-2 chimeric protein of claim 141, wherein the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
143. The SARS-CoV-2 chimeric protein of claim 141 or 142, wherein the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104–143 of a full-length SARS-CoV-2 N protein.
144. The SARS-CoV-2 chimeric protein of claim 143, wherein the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43–87 of the full-length SARS-CoV-2 N protein.
145. The SARS-CoV-2 chimeric protein of claim 144, wherein the first and second N-terminal domain amino acid sequences are connected by a linker.
146. The SARS-CoV-2 chimeric protein of claim 145, wherein the linker is a glycine or glycine-serine linker.
147. The SARS-CoV-2 chimeric protein of any one of claims 141–146, wherein the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213–366 of the full-length SARS-CoV-2 N protein.
148. The SARS-CoV-2 chimeric protein of any one of claims 141–147, wherein the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
149. The SARS-CoV-2 chimeric protein of any one of claims 140–148, wherein the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
150. The SARS-CoV-2 chimeric protein of any one of claims 140–149, wherein the transmembrane portion comprises an influenza virus hemagglutinin (HA) transmembrane domain.
151. The SARS-CoV-2 chimeric protein of any one of claims 140–149, wherein the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
152. The SARS-CoV-2 chimeric protein of claim 151, wherein the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
153. The SARS-CoV-2 chimeric protein of claim 151 or 152, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and (b) a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the full-length SARS-CoV-2 chimeric protein is expressed in a cell.
154. The SARS-CoV-2 chimeric protein of claim 153, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
155. The SARS-CoV-2 chimeric protein of claim 151 or 152, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) a β-sheet domain of a full-length SARS-CoV-2 M protein, and (b) one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
156. The SARS-CoV-2 chimeric protein of claim 155, wherein the β-sheet domain is connected to the one or more transmembrane domains by a linker.
157. The SARS-CoV-2 chimeric protein of claim 156, wherein the linker is a glycine or glycine-serine linker.
158. The SARS-CoV-2 chimeric protein of claim 155, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
159. The SARS-CoV-2 chimeric protein of any one of claims 151–158, wherein the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
160. The SARS-CoV-2 chimeric protein of any one of claims 140–158, wherein the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
161. The SARS-CoV-2 chimeric protein of any one of claims 140–160, wherein the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
162. The SARS-CoV-2 chimeric protein of any one of claims 140–161, wherein the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker.
163. The SARS-CoV-2 chimeric protein of claim 162, wherein the linker is a glycine linker or a glycine-serine linker.
164. The SARS-CoV-2 chimeric protein of any one of claims 140–162, wherein the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein.
165. The SARS-CoV-2 chimeric protein of claim 164, wherein the NTD corresponds to amino acids 1–290 of the full-length SARS-CoV-2 S protein, and/or the RBD corresponds to amino acids 316–517 of the full-length SARS-CoV-2 S protein.
166. The SARS-CoV-2 chimeric protein of claim 164 or 165, wherein the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein.
167. The SARS-CoV-2 chimeric protein of claim 164 or 165, wherein the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein.
168. The SARS-CoV-2 chimeric protein of any one of claims 164–166, wherein the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87.
169. The SARS-CoV-2 chimeric protein of any one of claims 140–168, wherein the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98.
170. The SARS-CoV-2 chimeric protein of any one of claims 140–169, wherein two or more of the S protein portion, N protein portion, and transmembrane portion are separated by a linker.
171. The SARS-CoV-2 chimeric protein of claim 170, wherein: (i) the S protein portion and the N protein portion are separated by a first linker, and/or (ii) the N protein portion and the transmembrane portion are separated by a second linker.
172. The SARS-CoV-2 chimeric protein of claim 170, wherein: (i) the S protein portion and the transmembrane portion are separated by a first linker, and/or (ii) the transmembrane portion and the N protein portion are separated by a second linker.
173. The SARS-CoV-2 chimeric protein of claim 171 or 172, wherein each of the first and second linkers is a glycine or glycine-serine linker.
174. The SARS-CoV-2 chimeric protein of claim 171 or 172, wherein each of the first and second linkers comprises the amino acid sequence AAY.
175. A SARS-CoV-2 chimeric protein comprising an amino acid sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 101–124 or 170–183.
176. A ribonucleic acid (RNA) comprising an open reading frame (ORF) encoding the protein of any one of claims 108–175.
177. A messenger ribonucleic acid (mRNA) comprising an open reading frame encoding the SARS-CoV-2 chimeric protein of claim 176.
178. The mRNA of claim 176 or 177, wherein the mRNA comprises a chemical modification.
179. The mRNA of any one of claims 176–178, wherein 100% of the uracil nucleotides of the mRNA comprise a chemical modification.
180. The mRNA of claim 179, wherein 100% of uracil nucleotides of the mRNA comprise N1-methylpseudouridine.
181. A self-amplifying ribonucleic acid (saRNA) comprising an open reading frame encoding a SARS-CoV-2 chimeric protein comprising: (i) a SARS-CoV-2 nucleocapsid (N) protein portion; (ii) a SARS-CoV-2 non-structural protein 3 (NSP3) protein portion; and (iii) a SARS-CoV-2 matrix (M) protein portion comprising one or more transmembrane domains.
182. The saRNA of claim 181, wherein the SARS-CoV-2 N protein portion comprises (a) a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein, and (b) a C-terminal domain of the full-length SARS-CoV-2 N protein.
183. The saRNA of claim 182, wherein the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
184. The saRNA of claim 182 or 183, wherein the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104–143 of the full-length SARS-CoV-2 N protein.
185. The saRNA of claim 184, wherein the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43–87 of the full-length SARS-CoV-2 N protein.
186. The saRNA of claim 185, wherein the first and second N-terminal domain amino acid sequences are connected by a linker.
187. The saRNA of claim 186, wherein the linker is a glycine linker or glycine-serine linker.
188. The saRNA of any one of claims 182–186, wherein the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213–366 of the full-length SARS-CoV-2 N protein.
189. The saRNA of any one of claims 182–188, wherein the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
190. The saRNA of any one of claims 181–188, wherein the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
191. The saRNA of any one of claims 181–190, wherein the SARS-CoV-2 NSP3 protein portion comprises two or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
192. The saRNA of claim 191, wherein the SARS-CoV-2 NSP3 protein portion comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more CD8+ T cell epitopes of a full-length SARS-CoV-2 NSP3 protein.
193. The saRNA of claim 191 or 192, wherein the two or more CD8+ T cell epitopes occur in a different order in the SARS-CoV-2 NSP3 protein portion, relative to the order of the epitopes in a full-length SARS-CoV-2 NSP3 protein.
194. The saRNA of any one of claims 191–193, wherein one or more junctional epitopes, which are present in a concatenated amino acid sequence consisting of the two or more CD8+ T cell epitopes, are not present in the SARS-CoV-2 NSP3 protein portion.
195. The saRNA of any one of claims 191–194, wherein the full-length SARS-CoV-2 NSP3 protein comprises the amino acid sequence of SEQ ID NO: 85.
196. The saRNA of any one of claims 191–195, wherein the SARS-CoV-2 NSP3 protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 93.
197. The saRNA of any one of claims 181–196, wherein the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
198. The saRNA of any one of claims 181–197, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and (b) a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
199. The saRNA of claim 198, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
200. The saRNA of any one of claims 181–197, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) a β-sheet domain of a full-length SARS-CoV-2 M protein, and (b) one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
201. The saRNA of claim 200, wherein the β-sheet domain is connected to the one or more transmembrane domains by a linker.
202. The saRNA of claim 201, wherein the linker is a glycine or glycine-serine linker.
203. The saRNA of any one of claims 191–194, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
204. The saRNA of any one of claims 198–203, wherein the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
205. The saRNA of any one of claims 181–204, wherein two or more of the N protein portion, NSP3 protein portion, and M protein portion are separated by a linker.
206. The saRNA of claim 205, wherein: (i) the N protein portion and the NSP3 protein portion are separated by a first linker, and/or (ii) the NSP3 protein portion and the M protein portion are separated by a second linker.
207. The saRNA of claim 205, wherein: (i) the N protein portion and the M protein portion are separated by a first linker, and/or (ii) the M protein portion and the NSP3 protein portion are separated by a second linker.
208. The saRNA of claim 205, wherein: (i) the M protein portion and the N protein portion are separated by a first linker, and/or (ii) the N protein portion and the NSP3 protein portion are separated by a second linker.
209. The saRNA of any one of claims 206–208, wherein each of the first and second linkers is a glycine or glycine-serine linker.
210. The saRNA of any one of claims 206–208, wherein each of the first and second linkers comprises the amino acid sequence AAY.
211. The saRNA of any one of claims 181–210, wherein the SARS-CoV-2 chimeric protein further comprises a signal peptide.
212. The saRNA of claim 211, wherein the signal peptide comprises an influenza A virus hemagglutinin (HA) signal peptide.
213. A self-amplifying ribonucleic acid (saRNA) encoding a SARS-CoV-2 chimeric protein comprising: (i) a SARS-CoV-2 Spike (S) protein portion; (ii) a SARS-CoV-2 nucleocapsid (N) protein portion; and (iii) a transmembrane portion comprising a transmembrane domain.
214. The saRNA of claim 213, wherein the SARS-CoV-2 N protein portion comprises (a) a truncated or modified N-terminal domain of a full-length SARS-CoV-2 N protein, and (b) a C-terminal domain of the full-length SARS-CoV-2 N protein.
215. The saRNA of claim 214, wherein the SARS-CoV-2 N protein portion does not comprise a basic loop of an N-terminal domain, relative to the full-length SARS-CoV-2 N protein.
216. The saRNA of claim 214 or 215, wherein the SARS-CoV-2 N protein portion comprises a first N-terminal domain amino acid sequence corresponding to amino acids 104–143 of a full-length SARS-CoV-2 N protein.
217. The saRNA of claim 216, wherein the SARS-CoV-2 N protein portion further comprises a second N-terminal domain amino acid sequence corresponding to amino acids 43–87 of the full-length SARS-CoV-2 N protein.
218. The saRNA of claim 217, wherein the first and second N-terminal domain amino acid sequences are connected by a linker.
219. The saRNA of claim 218, wherein the linker is a glycine or glycine-serine linker.
220. The saRNA of any one of claims 214–219, wherein the SARS-CoV-2 N protein portion comprises a C-terminal domain amino acid sequence corresponding to amino acids 213–366 of the full-length SARS-CoV-2 N protein.
221. The saRNA of any one of claims 214–220, wherein the full-length SARS-CoV-2 N protein comprises the amino acid sequence of SEQ ID NO: 84.
222. The saRNA of any one of claims 213–221, wherein the SARS-CoV-2 N protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 91 or SEQ ID NO: 92.
223. The saRNA of any one of claims 213–222, wherein the transmembrane portion comprises an influenza virus hemagglutinin (HA) transmembrane domain.
224. The saRNA of any one of claims 213–222, wherein the transmembrane portion comprises a SARS-CoV-2 M protein portion comprising one or more transmembrane domains of a full-length SARS-CoV-2 M protein.
225. The saRNA of claim 224, wherein the SARS-CoV-2 M protein portion does not comprise an N-terminal glycosylation site, relative to a full-length SARS-CoV-2 M protein.
226. The saRNA of claim 224 or 225, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) one or more transmembrane domains of a full-length SARS-CoV-2 M protein, and (b) a β-sheet domain of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
227. The saRNA of claim 226, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 96.
228. The saRNA of claim 224 or 225, wherein the SARS-CoV-2 M protein portion comprises, in N-to-C-terminal order, (a) a β-sheet domain of a full-length SARS-CoV-2 M protein, and (b) one or more transmembrane domains of the full-length SARS-CoV-2 M protein, wherein the β-sheet domain is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
229. The saRNA of claim 228, wherein the β-sheet domain is connected to the one or more transmembrane domains by a linker.
230. The saRNA of claim 229, wherein the linker is a glycine or glycine-serine linker.
231. The saRNA of claim 228, wherein the SARS-CoV-2 M protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 97.
232. The saRNA of any one of claims 224–231, wherein the full-length SARS-CoV-2 M protein comprises the amino acid sequence of SEQ ID NO: 86.
233. The saRNA of any one of claims 213–231, wherein the SARS-CoV-2 N protein portion is C-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is present in the cytoplasm when the SARS-CoV-2 chimeric protein is expressed in a cell.
234. The saRNA of any one of claims 213–233, wherein the SARS-CoV-2 N protein portion is N-terminal to the transmembrane portion in the SARS-CoV-2 chimeric protein, wherein the SARS-CoV-2 N protein portion is extracellular when the SARS-CoV-2 chimeric protein is expressed in a cell.
235. The saRNA of any one of claims 213–234, wherein the SARS-CoV-2 N protein portion and the transmembrane portion are connected by a linker.
236. The saRNA of claim 235, wherein the linker is a glycine linker or a glycine-serine linker.
237. The saRNA of any one of claims 213–235, wherein the SARS-CoV-2 S protein portion comprises an N-terminal domain (NTD) and a receptor-binding domain (RBD) of a full-length SARS-CoV-2 S protein.
238. The saRNA of claim 237, wherein the NTD corresponds to amino acids 1–290 of the full-length SARS-CoV-2 S protein, and/or the RBD corresponds to amino acids 316–517 of the full-length SARS-CoV-2 S protein.
239. The saRNA of claim 237 or 238, wherein the full-length SARS-CoV-2 S protein is a BA.4 or BA.5 lineage S protein.
240. The saRNA of claim 237 or 238, wherein the full-length SARS-CoV-2 S protein is a Wuhan-Hu-1 lineage S protein.
241. The saRNA of any one of claims 237–239, wherein the full-length SARS-CoV-2 S protein comprises the amino acid sequence of SEQ ID NO: 87.
242. The saRNA of any one of claims 213–241, wherein the S protein portion comprises an amino acid sequence with at least 90% identity to the amino acid sequence of SEQ ID NO: 98.
243. The saRNA of any one of claims 213–241, wherein two or more of the S protein portion, N protein portion, and transmembrane portion are separated by a linker.
244. The saRNA of claim 243, wherein: (i) the S protein portion and the N protein portion are separated by a first linker, and/or (ii) the N protein portion and the transmembrane portion are separated by a second linker.
245. The saRNA of claim 243, wherein: (i) the S protein portion and the transmembrane portion are separated by a first linker, and/or (ii) the transmembrane portion and the N protein portion are separated by a second linker.
246. The saRNA of claim 244 or 245, wherein each of the first and second linkers is a glycine or glycine-serine linker.
247. The saRNA of claim 244 or 245, wherein each of the first and second linkers comprises the amino acid sequence AAY.
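
Several of the claims above recite an amino acid sequence with "at least 90% identity" to a reference SEQ ID NO. As an illustration only, the following Python sketch shows one common way such a percentage can be computed: a global alignment in the style of Needleman and Wunsch, followed by a count of identical aligned positions. The scoring values (match 1, mismatch -1, gap -1), the use of the full alignment length as the denominator, and the function names are assumptions made for this sketch. They are not taken from the application, whose specification governs how identity is actually determined.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences (Needleman-Wunsch style, linear gap cost).

    The scoring parameters are illustrative assumptions, not values from the
    application. Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: the denominator is the alignment length, gaps included.
    Other conventions (for example, the shorter sequence length) also exist.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    x, y = global_align(a, b)
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x)
```

Under these assumptions, a candidate sequence would meet a "at least 90% identity" threshold when `percent_identity(candidate, reference) >= 90.0`. The sequences themselves (for example, SEQ ID NO: 91) are not reproduced here and would have to be supplied from the sequence listing.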
PCT/US2024/034888 2023-06-22 2024-06-21 Sars-cov-2 t cell vaccines WO2024263826A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363509650P 2023-06-22 2023-06-22
US63/509,650 2023-06-22
US202363582967P 2023-09-15 2023-09-15
US63/582,967 2023-09-15

Publications (1)

Publication Number Publication Date
WO2024263826A1 (en) 2024-12-26

Family

ID=91853236

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/034888 WO2024263826A1 (en) 2023-06-22 2024-06-21 Sars-cov-2 t cell vaccines

Country Status (1)

Country Link
WO (1) WO2024263826A1 (en)

Citations (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002098443A2 (en) 2001-06-05 2002-12-12 Curevac Gmbh Stabilised mrna with an increased g/c content and optimised codon for use in gene therapy
US20090226470A1 (en) 2007-12-11 2009-09-10 Mauro Vincent P Compositions and methods related to mRNA translational enhancer elements
US20100129877A1 (en) 2005-09-28 2010-05-27 Ugur Sahin Modification of RNA, Producing an Increased Transcript Stability and Translation Efficiency
US20100293625A1 (en) 2007-09-26 2010-11-18 Interexon Corporation Synthetic 5'UTRs, Expression Vectors, and Methods for Increasing Transgene Expression
US8158601B2 (en) 2009-06-10 2012-04-17 Alnylam Pharmaceuticals, Inc. Lipid formulation
WO2012099755A1 (en) 2011-01-11 2012-07-26 Alnylam Pharmaceuticals, Inc. Pegylated lipids and their use for drug delivery
US8278063B2 (en) 2007-06-29 2012-10-02 Commonwealth Scientific And Industrial Research Organisation Methods for degrading toxic compounds
WO2013052523A1 (en) 2011-10-03 2013-04-11 modeRNA Therapeutics Modified nucleosides, nucleotides, and nucleic acids, and uses thereof
WO2013185069A1 (en) 2012-06-08 2013-12-12 Shire Human Genetic Therapies, Inc. Pulmonary delivery of mrna to non-lung target cells
WO2014093924A1 (en) 2012-12-13 2014-06-19 Moderna Therapeutics, Inc. Modified nucleic acid molecules and uses thereof
US20140206753A1 (en) 2011-06-08 2014-07-24 Shire Human Genetic Therapies, Inc. Lipid nanoparticle compositions and methods for mrna delivery
WO2014144039A1 (en) 2013-03-15 2014-09-18 Moderna Therapeutics, Inc. Characterization of mrna molecules
WO2014144196A1 (en) 2013-03-15 2014-09-18 Shire Human Genetic Therapies, Inc. Synergistic enhancement of the delivery of nucleic acids via blended formulations
WO2014144767A1 (en) 2013-03-15 2014-09-18 Moderna Therapeutics, Inc. Ion exchange purification of mrna
WO2014144711A1 (en) 2013-03-15 2014-09-18 Moderna Therapeutics, Inc. Analysis of mrna heterogeneity and stability
WO2014152030A1 (en) 2013-03-15 2014-09-25 Moderna Therapeutics, Inc. Removal of dna fragments in mrna production process
WO2014152027A1 (en) 2013-03-15 2014-09-25 Moderna Therapeutics, Inc. Manufacturing methods for production of rna transcripts
WO2014152031A1 (en) 2013-03-15 2014-09-25 Moderna Therapeutics, Inc. Ribonucleic acid purification
WO2015024667A1 (en) 2013-08-21 2015-02-26 Curevac Gmbh Method for increasing expression of rna-encoded proteins
WO2015051173A2 (en) 2013-10-02 2015-04-09 Moderna Therapeutics, Inc Polynucleotide molecules and uses thereof
WO2015051169A2 (en) 2013-10-02 2015-04-09 Moderna Therapeutics, Inc. Polynucleotide molecules and uses thereof
US9012219B2 (en) 2005-08-23 2015-04-21 The Trustees Of The University Of Pennsylvania RNA preparations comprising purified modified RNA for reprogramming cells
WO2015062738A1 (en) 2013-11-01 2015-05-07 Curevac Gmbh Modified rna with decreased immunostimulatory properties
WO2015085318A2 (en) 2013-12-06 2015-06-11 Moderna Therapeutics, Inc. Targeted adaptive vaccines
WO2015089511A2 (en) 2013-12-13 2015-06-18 Moderna Therapeutics, Inc. Modified nucleic acid molecules and uses thereof
WO2015101415A1 (en) 2013-12-30 2015-07-09 Curevac Gmbh Artificial nucleic acid molecules
WO2015101414A2 (en) 2013-12-30 2015-07-09 Curevac Gmbh Artificial nucleic acid molecules
WO2015130584A2 (en) 2014-02-25 2015-09-03 Merck Sharp & Dohme Corp. Lipid nanoparticle vaccine adjuvants and antigen delivery systems
WO2016040359A1 (en) 2014-09-08 2016-03-17 WebMD Health Corporation Structuring multi-sourced medical information into a collaborative health record
WO2017066797A1 (en) 2015-10-16 2017-04-20 Modernatx, Inc. Trinucleotide mrna cap analogs
WO2017127750A1 (en) 2016-01-22 2017-07-27 Modernatx, Inc. Messenger ribonucleic acids for the production of intracellular binding polypeptides and methods of use thereof
WO2017153936A1 (en) 2016-03-10 2017-09-14 Novartis Ag Chemically modified messenger rna's
WO2018053209A1 (en) 2016-09-14 2018-03-22 Modernatx, Inc. High purity rna compositions and methods for preparation thereof
WO2019036682A1 (en) 2017-08-18 2019-02-21 Modernatx, Inc. Rna polymerase variants
WO2020172239A1 (en) 2019-02-20 2020-08-27 Modernatx, Inc. Rna polymerase variants for co-transcriptional capping
WO2021188969A2 (en) * 2020-03-20 2021-09-23 Biontech Us Inc. Coronavirus vaccines and methods of use
WO2021216743A2 (en) * 2020-04-21 2021-10-28 Emory University Coronavirus vaccines, compositions, and methods related thereto
WO2022043551A2 (en) * 2020-08-31 2022-03-03 Curevac Ag Multivalent nucleic acid based coronavirus vaccines

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
"Genbank", Database accession no. MN908947.1
"Remington: The Science and Practice of Pharmacy", 2005, LIPPINCOTT WILLIAMS & WILKINS
BLOOM K ET AL., GENE THERAPY, vol. 28, 2021, pages 117 - 129
CUI ET AL., NAT. REV. MICROBIOL., vol. 17, no. 3, 2019, pages 181 - 192
KIM ET AL., CELL, vol. 181, no. 4, 2020, pages 914 - 921
KIM, J.H ET AL., PLOS ONE, vol. 6, 2011, pages e18556
LEEKHA ANKITA ET AL: "Ending transmission of SARS-CoV-2: sterilizing immunity using an intranasal subunit vaccine", BIORXIV, 15 July 2022 (2022-07-15), pages 1 - 47, XP093079471, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2022.07.14.500068v1.full> [retrieved on 20230906], DOI: 10.1101/2022.07.14.500068 *
MCCALLUM, M ET AL.: "N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2", CELL, 2021
NEEDLEMAN, S.B. & WUNSCH, C.D.: "A general method applicable to the search for similarities in the amino acid sequences of two proteins.", J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
SHANG ET AL., PLOS PATHOG, vol. 16, no. 3, March 2020 (2020-03-01), pages e1008392
SMITH, T.F. & WATERMAN, M.S.: "Identification of common molecular subsequences.", J. MOL. BIOL., vol. 147, 1981, pages 195 - 197, XP024015032, DOI: 10.1016/0022-2836(81)90087-5
SONG ET AL., VIRUSES, vol. 11, no. 1, 2019, pages 59
STEPHEN F. ALTSCHUL ET AL.: "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389 - 3402, XP002905950, DOI: 10.1093/nar/25.17.3389
WAGNER ET AL., NATURE CHEMICAL BIOLOGY, 2018
WAN ET AL., J. VIROL, vol. 94, no. 7, March 2020 (2020-03-01), pages e00127 - 20
WEINBERG ET AL., J INFECT DIS, vol. 201, no. 11, 1 June 2010 (2010-06-01), pages 1607 - 10
XIA ET AL., CELL MOL IMMUNOL, vol. 17, no. 1, 2020, pages 1 - 12

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24739977

Country of ref document: EP

Kind code of ref document: A1