GB2582850A - Molecular barcodes for single cell sequencing, compositions and methods thereof - Google Patents

Info

Publication number
GB2582850A
Authority
GB
United Kingdom
Prior art keywords
sequence
barcode
variable
nucleobases
solid support
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2000471.9A
Other versions
GB202000471D0 (en)
Inventor
Adam M. McCoy
Kathleen J. Campau
James R. Tollervey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Biosearch Technologies Inc
Original Assignee
Biosearch Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Biosearch Technologies Inc
Publication of GB202000471D0
Publication of GB2582850A

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6853: Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6869: Methods for sequencing

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided are beads or particles having attached thereto nucleic acid or oligonucleotide sequences comprising a primer binding sequence, at least three molecular barcode sequences and a terminal polynucleotide or oligonucleotide sequence for use in single cell sequencing analysis and generating sequence libraries. At least one of the barcode sequences comprises a cell barcode sequence comprising a plurality of contiguous invariant or partially-variable bases, linked or unlinked. At least one of the barcode sequences comprises a unique molecular identifier sequence comprising variable, partially-variable or invariant bases. The third barcode comprises at least one of (i) a plurality of linked or unlinked invariant, partially-variable or variable bases, (ii) a plurality of invariant bases comprising at least one terminal partially variable base or (iii) a single partially-variable or invariant base.

Description

MOLECULAR BARCODES FOR SINGLE CELL SEQUENCING, COMPOSITIONS AND METHODS THEREOF
BACKGROUND
Single cell analyses are widely used to determine which genes or proteins are expressed individually or together in any given cell or to identify gene mutations or variations that are associated with cancer or cancer progression. High throughput sequencing technologies, such as single-cell RNA sequencing (scRNA-seq), and accompanying bioinformatics provide platforms for the analysis of single cells at the polynucleotide and/or protein level, e.g., to determine subpopulations of cells that are associated with self-renewal or drug resistance.
In the analysis of single cells, it is necessary to track all fragments of genetic or protein information obtained from a given cell back to that cell. If cells are to be mixed together while remaining intact, they can be distinguished, for example, by multiple fluorescent tagging of the cells, which are then visualized by microscopy or flow cytometry. Alternatively, prior to mixing, the fragments of genetic information must be identified as having been obtained from the same cell. This type of single cell identification is the principle behind molecular barcoding, which adds a nucleic acid index sequence (a barcode), typically a unique one, to all of the polynucleotide fragments obtained from the cell, or to all of its RNA transcripts. For plate-based technology, in which single-cell reactions are performed in microwells, this becomes problematic when the readout is to be pooled, as generally occurs in next-generation sequencing (NGS) procedures. The principle is the same for droplet-based techniques, in which the reactions (including barcoding) are carried out on a single cell in an isolated droplet before the contents are allowed to mix.
A feature of many NGS barcodes is that the sequences of the barcodes are known in advance and are defined sequences. Needed in the art are sequencing methods and techniques with components, e.g., barcodes, and features to improve efficiency, accuracy and precision in performing various types of single cell analyses.
SUMMARY OF THE DISCLOSURE
The present invention features a solid support, such as a bead or particle, comprising a linker and a plurality of nucleotide or oligonucleotide sequences attached thereto, wherein each attached nucleotide or oligonucleotide sequence comprises: a primer binding sequence; at least three nonidentical molecular barcode nucleic acid sequences, wherein at least one of the barcode sequences comprises a cell barcode (CBC) sequence comprising a plurality of contiguous invariant or partially-variable nucleobases, linked or unlinked; at least one of the barcode sequences comprises a unique molecular identifier (UMI) sequence comprising variable nucleobases, partially-variable nucleobases, invariant nucleobases, or a combination thereof; and a third or additional barcode comprises at least one of (i) a plurality of invariant nucleobases, partially-variable nucleobases, variable nucleobases, linked or unlinked, or a combination thereof; (ii) a plurality of invariant nucleobases comprising at least one terminal partially-variable nucleobase; or (iii) a single partially-variable or invariant nucleobase; and a terminal polynucleotide or oligonucleotide sequence (3' sequence). In an embodiment, the 3' terminal polynucleotide or oligonucleotide sequence is a capture sequence which binds to another polynucleotide or oligonucleotide sequence. In an embodiment, the 3' terminal polynucleotide or oligonucleotide sequence comprises identical repeating nucleobases (e.g., poly-dT nucleobases) to which a polyadenylated nucleotide sequence binds. In embodiments, the barcodes can be spatially separated or contiguous. It will be appreciated that the term "molecular barcode sequence" is used interchangeably with the terms "molecular barcode" or simply "barcode" herein.
In an embodiment, the solid support is a bead or particle. In an embodiment, the bead or particle is a nanoparticle, microparticle, or macroparticle. In an embodiment, the bead or particle comprises glass, polystyrene, silicon-based polymer, polymethylmethacrylate (PMMA), polydimethylsiloxane (PDMS), silica gel, polyethylene, or composite materials, optionally with a paramagnetic core or coating. In an embodiment, the bead or particle is monodisperse and spherical. In an embodiment, the bead or particle has an average diameter of 30 µm. In an embodiment, the linker component of the bead or particle is a cleavable linker or a non-cleavable linker. In a particular embodiment, the linker is a cleavable linker selected from a photocleavable (photolabile) linker, disulfide linker, thermally cleavable linker, or chemically cleavable linker. In a particular embodiment, the linker is a non-cleavable linker selected from a straight-chain polymer, a substituted hydrocarbon polymer, or polyethylene glycol, such as a(ethylene glycol) or PEG-C3-C24.
In an embodiment, the at least three barcodes of the bead or particle are spatially separated or contiguous within the attached nucleotide or oligonucleotide sequence. In an embodiment, the CBC barcode sequence comprises at least four invariant, partially-variable, and/or variable nucleobases. In an embodiment, the CBC barcode sequence comprises at least four variable nucleobases. In an embodiment, the CBC barcode sequence comprises at least four variable nucleobases comprising 4*4*4*4*, wherein each 4* is a single nucleobase across all oligonucleotides attached to a given bead or particle, and wherein the single nucleobase is selected from A, C, T, or G. In the foregoing embodiment, "4*" in the CBC barcode sequence designates a variable nucleobase at that position (i.e., all 4 nucleobases are possible at that position) in the sequence, with the "*" indicating that the same base is found in that position across all oligonucleotides attached to a given bead or particle (solid support).
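Because each "4*" position is fixed on a given bead but may be any of four bases across beads, the number of distinguishable cell barcodes is the product of the per-position choices. The sketch below illustrates only that arithmetic; the function name `cbc_capacity` is hypothetical, and it assumes the choice at each position is independent of the others.

```python
from math import prod
from typing import Sequence


def cbc_capacity(choices_per_position: Sequence[int]) -> int:
    """Count distinct cell barcodes for a block of bead-fixed positions.

    choices_per_position[i] is how many bases position i may take across
    beads: 4 for a "4*" position, 3 for a "3*" position, 2 for a "2*"
    position, and 1 for an invariant base.
    """
    if any(c < 1 or c > 4 for c in choices_per_position):
        raise ValueError("each position allows between 1 and 4 bases")
    # Independent positions multiply (product rule).
    return prod(choices_per_position)


if __name__ == "__main__":
    print(cbc_capacity([4, 4, 4, 4]))  # 256, the 4*4*4*4* embodiment
    print(cbc_capacity([4] * 12))      # 16777216, a 12-position all-4* CBC
    print(cbc_capacity([4, 4, 4, 3]))  # 192, mixing in one 3* position
```

The 256-barcode ceiling of a four-position block suggests why longer CBCs, such as the 8-12 nucleobase embodiment, are useful when many beads are pooled.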
In an embodiment, the UMI barcode sequence comprises at least four variable nucleobases. In an embodiment, the UMI barcode sequence comprises at least four variable nucleobases comprising NNNN, wherein each N is a single nucleobase of unfixed identity across all oligonucleotides on a given bead or particle (solid support), and wherein the single nucleobase is selected from A, C, T, or G. In another embodiment, the UMI barcode sequence terminates in a partially-variable nucleobase, V, which is one of A, C, or G. In an aspect, a solid support, e.g., a bead or particle, as delineated in any of the foregoing is provided, in which the oligonucleotide sequences attached to the bead or microparticle comprise, in a 5' to 3' direction: a sequencing priming site; a CBC barcode sequence comprising 8-12 nucleobases; a UMI barcode sequence comprising 14 nucleobases, wherein the 14 nucleobases comprise 13 nucleobases, each of which is A, T, C, or G, and either a partially-variable (V) nucleobase (V = A, C, or G, but not T) or an invariant nucleobase; and a terminal (3') polynucleotide or oligonucleotide capture sequence. In an embodiment, the polynucleotide capture sequence is a poly-dT nucleobase sequence to which a polyadenylated nucleotide sequence binds. In an embodiment, the oligonucleotide sequence attached to the solid support, e.g., bead or particle, comprises identical repeating nucleobases forming a poly-dT nucleobase sequence. In a particular embodiment, the poly-T nucleobase sequence comprises at least 30 T nucleobases.
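The 5' to 3' layout above can be illustrated with a small parser for the barcode portion of a sequencing read. This is a sketch under stated assumptions, not the applicant's pipeline: it assumes the read begins immediately 3' of the sequencing priming site and reports the bead-strand sequence, uses a 12-nucleobase CBC (the upper end of the 8-12 range), and treats the 14-nucleobase UMI as 13 N followed by a terminal V (the invariant-base alternative is not handled). The names `parse_read1` and `BarcodeRead` are hypothetical.

```python
from typing import NamedTuple


class BarcodeRead(NamedTuple):
    cbc: str   # cell barcode
    umi: str   # unique molecular identifier, assumed to end in V
    tail: str  # remainder of the read (expected to start the poly-dT run)


def parse_read1(seq: str, cbc_len: int = 12, umi_len: int = 14) -> BarcodeRead:
    """Split a read into CBC, UMI, and the remaining 3' sequence."""
    seq = seq.upper()
    if len(seq) < cbc_len + umi_len:
        raise ValueError("read too short to contain a CBC and a UMI")
    cbc = seq[:cbc_len]
    umi = seq[cbc_len:cbc_len + umi_len]
    if set(cbc + umi) - set("ACGT"):
        raise ValueError("ambiguous or non-ACGT base in the barcode region")
    # Assumed V embodiment: the last UMI base is A, C or G, so a T there
    # means the read does not match the expected layout.
    if umi[-1] == "T":
        raise ValueError("terminal UMI base must be A, C or G (V), not T")
    return BarcodeRead(cbc, umi, seq[cbc_len + umi_len:])


if __name__ == "__main__":
    read = "ACGTACGTACGT" + "GATTACAGATTACG" + "T" * 10
    print(parse_read1(read))
```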
In certain exemplary, non-limiting aspects, the solid support, e.g., a bead or particle, delineated in any of the foregoing aspects comprises an attached nucleotide or oligonucleotide sequence comprising at least three barcode sequences, selected from: 5/f-TT ITT TA IRGCACiTGGTATCAACGCAGAGIAC414*444.4*4*4*4*4*4*4*3* NNNNNN-NTNVT4*4*4*4*VTTTMTTTTTTTTTTTTTTTTTTTTTTTT-3'; TTTTTTTAA/RGCACriGGT ATC AA C GCA GAGTAC4 * 4 *4*4*4*-4*4*4*NNNNNNNN4''4*4 *4*TITTITTITT MTH T _3'- 5'-ITITITTAA/RGCAGTGGIATCAACGCAGAGIAC4*4*4*4*4'* 4 NINNNT-Nr4"4* 4*NNNNVTTTT1TTT1"TTTTTTTTYITTFITVITTTT-Y; ITITITTAA/RCICAGIGGIATC AACGCACiAGTAC4*4*4*(PRR4*4*4*3*TNNTRRNN4*4* 4*4*3*TNNNNVETTFTTTTTITTTITT1717117171771-T-3'; s'- 1TITITTAA/RCiCACIMGIATCAACGCACiAGTAC4*4*4*4*R2*4*4*4*3*TNN2*RNN4*4 *4*4;*3*INNNWLIFTTTTITITITITTTTYTTTITTTITTT-3% 5?-TTITITT AA/RGC AGIGGIATC AAC GC AGAGT AC4*4*4*4*2*2*:4*4*4*3"INN2*2*NN4 *4*4*4*3 *TN NN N VTTTTTTTTTTTTTTTTTTTTTTT ITTITTT-3 '; Sf-TTIITTTAA/RCK; AGTGGTATC AACGC A GAGT CA*4*4*4*AL A4*4*4*3*TNN NN4*4* 4 *03 *TNNNNVETTFTTTTTITTTITT1717117171717T-3 '; 5r-ITTITTIAA/R.GCAGTG-CiTA ICAACCiCACIA.GTAC4*4*4*4*G64*4*4*3*TNNG6NN4"4* 4*4*3*TNNINN VTITTMTMITITTT ITITITYMITT-3% ITTITTIAARGCACMiCITATCAACCICAGAGTAC4*44'4*4't2*2*4*4"4*39 rN2*-2*NN4 *4*4*4*2*2*3*TNNNN VTTITTTTTITTTTTITTTTTTITTTITTTT-3"; 5r-ITITITTAA/RGCACITGCiTAICAACCiCA A.GT,A.C4 *)4*4*4*3*TNEN(4"4*)N N4*4*4*4*(4*4*)3*TNNNNVT1 FTITITTI ETTITTI] TTITIT1 TT] T-3';
A 5'-
TTITITTAA/RCiCACiTGGTATCAACGCAGAGTAC4*4*4*4*(AA)4*4*4*3 IN (AA)NN4 *4*4*4*(AA)3*TNNNNVTTTTTTTFITTTTTTTTTTTTFrTTTTTT T-3 5'.
TTTT-rf TA A /13.GC. Cr-MG-TA Te A ACGC AGA GT AC4*4*4*411-114*4*4*3 *I IN(TT)NN4* 4*4*4*(TT)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; TTITTITAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*((i G)4*4*4*3*1 T(GG)NN4 *4*4*4*(GG)3*INNNNA, 171-MITT-TT TTTITTITTTITT I TIT 5;-TTTTTTT A A/RGC AGTGGT A TC AACGC.AGAGTAC4*4*4*4*(CC)4*4*4*3*TNN(CC)NN4 *4*4*4*(CC)3*TNNNNV 'ITT t TTITITTITITT TITTITTTITITT-3'; 5'-TTITTTTAA/RCiCAGTOGTATCA.ACGCACiAGTAC4*4*4*4*4*(4*4*)4*4*4*4* TN, NN -NNNNNNNNNVT4*4*4*4* VTITTITITTFETTITITT 'FITT TTTITTITT-3'; TTTITTT AA/RCiCAGTGGTATCA.ACGCAGAGTAC4*4 4*4*4*4*4*4*N1 N NNN4*4*4* 4*NNNNVITTTITITITTTTITTTITITTITTITTTI3', s'_ TT'TTTTTAA/RCK:AGTCKiTATCAACGCAGAGTAC'4*4*4*4*4*4*4*4*NNNNNNNR*R*R tik*NNNNVTITTTI ITT TITTTTITTTITTITTI-TIT-3% TT TITTAA/RGCACi fliGTATCAACGCAGAGTAC4*4*4*4*4*4*4*4*NNN-NNNNY*Y*1" Nib N VITTTITT IFTI"FTrrITYLTTT'ETTTTI-, 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4* -*TNNNNNN4*4*4*4* 4*4*V*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT 3'; TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACB*B*B*B*B*B*B*V*1 N NNNNNB*B *B*B*B*B*V*IN NNN VTITTITFLITTITT Fri FITITITITITIT-3, 5'-TTITITTAA/RGCAGTGGTATCAACGCAGAGTACI-YD*D*D*D*D*V*TNN-NN-N-N-D*D* D*D*D*D*V*TNNNN TITITITITITTITTITTITTITITTITTI-3; or 5'-TITTITTAA/RGCAGTGGTATC AA CGC AGAGTACEPIVH*H*H*H*V*T i\ININNNNIVH li*IPMPH*V*INNNNVITT TT TITTIT TITTTITTITTITTITITT-3; wherein each of 2*, 3*, and 4* represents a single nucleobase at a given position in the sequence, with differing numbers of possible bases (A, adenine; T, thymine; C, cytosine; G, guanine) at each position; wherein 2* is a single nucleobase of either A or T; A or C; A or G; C or T; T or G; or C or G; wherein 3* is a single nucleobase of either A, T, or C; A, C, or G; or T, C, or G; wherein 4* is a single nucleobase of either A, C, T, or G; wherein N = A, C, T, or G; wherein R = A or G; wherein V = A, C, or G; wherein B = C, G, or T; wherein D = A, G, or T; wherein H = A, C, or T; and wherein nucleobases set forth in parentheses, identical or nonidentical, are linked.
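The single-letter codes above match the standard IUPAC nucleotide alphabet, and each "2*" or "3*" option corresponds to one IUPAC two- or three-base code (for example, A or G is R; A, C, or G is V). The illustrative sketch below, which is not part of the disclosure, compiles a pattern written in IUPAC letters into a regular expression and counts how many concrete sequences it admits. It assumes "4*" positions are written as N and each "2*"/"3*" position has been replaced by its IUPAC letter; the fixed-per-bead meaning of "*" and the parenthesized linked bases are not modeled. The function names are hypothetical.

```python
import re
from math import prod

# Standard IUPAC nucleotide codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile an IUPAC pattern into a regex over concrete A/C/G/T sequences."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for symbols outside the alphabet
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def pattern_diversity(pattern: str) -> int:
    """Number of distinct concrete sequences an IUPAC pattern can represent."""
    return prod(len(IUPAC[code]) for code in pattern.upper())


if __name__ == "__main__":
    umi = "N" * 13 + "V"             # 13 N followed by one V
    print(pattern_diversity(umi))    # 201326592 (4**13 * 3)
    rx = pattern_to_regex("NNNNVT")
    print(bool(rx.fullmatch("ACGTAT")), bool(rx.fullmatch("ACGTTT")))  # True False
```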
In an aspect, a nucleic acid or polynucleotide sequence comprising at least three nonidentical molecular barcode nucleic acid sequences is provided, wherein at least one of the barcode sequences comprises a cell barcode (CBC) sequence comprising a plurality of contiguous invariant or partially-variable nucleobases, linked or unlinked; at least one of the barcode sequences comprises a unique molecular identifier (UMI) sequence comprising variable nucleobases, partially-variable nucleobases, invariant nucleobases, or a combination thereof; and a third or additional barcode sequence comprises at least one of (i) a plurality of invariant nucleobases, partially-variable nucleobases, variable nucleobases, linked or unlinked, or a combination thereof; (ii) a plurality of invariant nucleobases comprising at least one terminal partially-variable nucleobase; or (iii) a single partially-variable or invariant nucleobase; and a terminal polynucleotide or oligonucleotide capture sequence (to which a nucleotide sequence binds). In an embodiment, the terminal polynucleotide or oligonucleotide sequence comprises identical repeating nucleobases to which a polyadenylated nucleotide sequence binds.
In an aspect, a composition comprising the nucleic acid or polynucleotide sequence comprising said at least three barcodes, or the solid support, e.g., bead or particle, having the nucleic acid or polynucleotide sequence comprising the at least three barcodes as described herein, is provided.
In embodiments of any of the foregoing aspects delineated herein, the presence of the at least three (or three or more) barcode sequences within the nucleic acid or oligonucleotide sequence, such as one attached to the solid support, e.g., bead or particle, allows for the identification of artefacts following amplification of cellular nucleic acids. In an aspect, a kit comprising a bead or particle or the nucleic acid or polynucleotide delineated in any of the foregoing aspects and embodiments, and instructions for use thereof, is provided.
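The disclosure does not spell out how the additional barcode positions are used to recognize artefacts. One simple possibility, sketched here purely as an assumption, is to compare each read against the designed layout and flag reads whose invariant or partially-variable positions disagree with it, which can indicate truncated, shifted, or chimeric products. The layout string `NNNNNNNNTNNNNV` and the function names are hypothetical.

```python
from typing import List

# Minimal subset of IUPAC codes used by this illustrative layout.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "V": "ACG"}


def structural_mismatches(read: str, layout: str) -> List[int]:
    """Return 0-based layout positions that the read violates.

    A position is violated if the read is too short to cover it or the
    observed base is not allowed by the layout code there. N positions only
    fail on non-ACGT calls (e.g., an 'N' base call).
    """
    read = read.upper()
    return [
        i for i, code in enumerate(layout)
        if i >= len(read) or read[i] not in ALLOWED[code]
    ]


def is_probable_artifact(read: str, layout: str, max_mismatches: int = 0) -> bool:
    """Flag reads with more structural mismatches than the tolerance allows."""
    return len(structural_mismatches(read, layout)) > max_mismatches


if __name__ == "__main__":
    layout = "NNNNNNNNTNNNNV"  # hypothetical: CBC block, invariant T, UMI, V
    print(structural_mismatches("ACGTACGTTACGTA", layout))  # []
    print(structural_mismatches("ACGTACGTATACGT", layout))  # [8, 13]
```

In the second read, a one-base shift moves an A into the invariant T position and a T into the V position, so both fixed-identity checks fail together.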
In an aspect, a method of analyzing nucleic acid sequences or sequence libraries obtained from a single cell is provided, in which the method involves lysing the cell to produce a single cell lysate; admixing the single cell lysate with the bead or particle (solid support) of any of the foregoing aspects and embodiments, wherein nucleic acids from the single cell lysate are captured on the nucleic acids or oligonucleotides attached to the bead or particle (solid support); amplifying the nucleic acid from the cell; and analyzing the amplified nucleic acid sequences or sequence libraries using bioinformatic methods. In an embodiment, the single cell lysate is produced by lysing the cell with a surfactant.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise.
By "agent" is meant a peptide, polypeptide, nucleic acid molecule, or small molecule chemical compound, antibody, or a fragment thereof.
By "alteration" is meant a change (increase or decrease) in the expression levels or activity of a gene or polypeptide as detected by standard art-known methods such as those described herein. As used herein, an alteration includes a 10% change in expression levels, a 25% change, a 40% change, or a 50% or greater change in expression levels.
A molecular "barcode" refers to a nucleotide sequence that serves to identify polynucleotides and fragments thereof from a given cell (cell source) that is subjected to single cell analysis or sequencing. In embodiments, a barcode is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides (nucleobases) in length. In an embodiment, a barcode is at least 8 nucleobases in length. A cell barcode (CBC) is a nucleic acid sequence that identifies the cell of origin (individual oligonucleotide primers on a bead may share the same CBC).
A unique molecular identifier (UMI) is a short nucleotide sequence (barcode) that is included in each read in some next generation sequencing (NGS) protocols. Using UMI barcodes, the RNA molecules to be amplified are tagged with random N-mer oligonucleotides.
The number of UMI tags is designed to significantly exceed the number of copies of each transcript that is to be amplified; this provides a control for amplification bias. UMIs serve to reduce the quantitative bias introduced by cDNA amplification or PCR amplification, which is necessary to get enough reads for detection (Kivioja T, et al., 2012, "Counting absolute numbers of molecules using unique molecular identifiers," Nat. Methods 9(1):72-74). The inclusion of UMIs is particularly useful for single cell RNA-Seq (Islam S, et al., 2014, "Quantitative single-cell RNA-seq with unique molecular identifiers," Nat. Methods 11(2):163-166).
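The counting principle behind UMIs can be shown in a few lines: reads that share a cell barcode, UMI, and gene are treated as PCR copies of one original molecule, so molecules are counted as distinct UMIs rather than as reads. This is a minimal sketch that assumes exact UMI matches; practical pipelines typically also merge UMIs that differ only by sequencing errors, which is not shown. The function name and the input tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (cell barcode, UMI, gene)


def count_molecules(records: Iterable[Record]) -> Dict[Tuple[str, str], int]:
    """Collapse reads to molecules: count distinct UMIs per (cell, gene)."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cbc, umi, gene in records:
        umis[(cbc, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


if __name__ == "__main__":
    reads = [
        ("AAAACCCC", "GATTACAG", "GAPDH"),  # three PCR copies of one molecule
        ("AAAACCCC", "GATTACAG", "GAPDH"),
        ("AAAACCCC", "GATTACAG", "GAPDH"),
        ("AAAACCCC", "CCGGTTAA", "GAPDH"),  # a second GAPDH molecule
        ("GGGGTTTT", "GATTACAG", "GAPDH"),  # same UMI, different cell
    ]
    print(count_molecules(reads))
```

Five reads collapse to three molecules: two in the first cell and one in the second, regardless of how unevenly PCR amplified each molecule.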
Barcoding refers to molecularly tagging single cells or sequencing libraries with barcode sequences (unique nucleic acid or oligonucleotide sequences), allowing for sample multiplexing.
The cell barcode (CBC) identifies the well or cell of origin of molecules of interest so that many samples (originating from many wells or cells) can be processed in parallel and sequenced in bulk. Sequencing reads that correspond to each sample are subsequently deconvoluted using the cell barcode sequence information.
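Deconvolution by cell barcode amounts to grouping reads by their CBC. Because bead barcodes built from "4*" positions are not known in advance, the sketch below (an assumption, not the disclosed method) keeps only barcodes supported by a minimum number of reads and sends the rest to an "unassigned" bin; real cell calling and barcode error correction are more involved. The function name, the 12-nucleobase CBC length, and the `min_reads` threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], cbc_len: int = 12,
                min_reads: int = 2) -> Dict[str, List[str]]:
    """Group reads by their leading cell barcode.

    Barcodes seen in fewer than min_reads reads are pooled under
    "unassigned" (a crude stand-in for proper cell calling).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        groups[read[:cbc_len]].append(read)

    result: Dict[str, List[str]] = {}
    unassigned: List[str] = []
    for cbc, members in groups.items():
        if len(members) >= min_reads:
            result[cbc] = members
        else:
            unassigned.extend(members)
    if unassigned:
        result["unassigned"] = unassigned
    return result


if __name__ == "__main__":
    reads = ["AAAACCCCGGGG" + "GATTACAGATTACG",
             "AAAACCCCGGGG" + "CCGGTTAACCGGTA",
             "TTTTGGGGCCCC" + "GATTACAGATTACG"]
    for cbc, members in demultiplex(reads).items():
        print(cbc, len(members))
```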
An "invariant nucleotide or nucleobase" refers to a nucleobase of a defined identity at a given position, such as in all nucleic acid (polynucleotide or oligonucleotide) sequences on all beads. By way of example, in the sequence "NNNNVT4*4*4*4*", the "T" nucleobase is an invariant nucleobase. By way of further example, in the "primer binding sequence" region of the following standard sequence, all of the nucleobases indicated in bold are invariant and will always (without variation) constitute the sequence indicated in the following: 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC.
A "variable nucleotide or nucleobase" refers to a nucleobase of unfixed identity, which can be any of A, T, G, or C, or another non-standard nucleobase, at a given position in a nucleic acid (polynucleotide or oligonucleotide) sequence. By way of example, in the sequence "NNNNVT4*4*4*4*", all "N"s and "4*"s are variable nucleobases.
"Semi-variable" or "partially-variable" nucleotides or nucleobases, as used herein, refer to a nucleobase of unfixed but limited potential identity at a given position in a nucleic acid (polynucleotide or oligonucleotide) sequence. By way of example, a partially-variable nucleobase can be A, G, or C, but not T, at a given position of a sequence. Thus, in the sequence "NNNNVT4*4*4*4*", "V" is a semi-variable or partially-variable nucleotide or nucleobase, indicating either A, C, or G, but not T, at that position of the sequence. The terms "semi-variable" and "partially-variable" nucleotides or nucleobases are used interchangeably herein.
As used herein, "comprises," "comprising," "containing," "having," and the like can have the meaning ascribed to them in U.S. Patent law and can mean "includes," "including," and the like; "consisting essentially of" or "consists essentially" likewise has the meaning ascribed in U.S. Patent law, and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.
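One practical consequence of ending the UMI with a partially-variable V (A, C, or G, never T) immediately before a poly-dT capture sequence, offered here as an interpretation rather than as a statement from the disclosure, is that the UMI/poly-T boundary can be located unambiguously: the poly-T run begins right after the last non-T base. The hypothetical helper below shows this, and also what goes wrong if the final UMI base were allowed to be T.

```python
from typing import Tuple


def split_umi_and_polyt(segment: str) -> Tuple[str, str]:
    """Split a 'UMI ending in V + poly-T' segment into (UMI part, poly-T run).

    Strips the trailing run of T bases; when the UMI's last base is V
    (never T), the UMI is recovered intact.
    """
    segment = segment.upper()
    umi = segment.rstrip("T")
    return umi, segment[len(umi):]


if __name__ == "__main__":
    # UMI ends in V (here G): the split is exact.
    print(split_umi_and_polyt("GATTACAGATTACG" + "TTTT"))
    # UMI ending in T: that T is absorbed into the poly-T run.
    print(split_umi_and_polyt("GATTACAGATTACT" + "TTTT"))
```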
"Detect" refers to identifying the presence, absence or amount of a molecule, compound, or agent to be detected.
By "fragment" is meant a portion of a polypeptide or nucleic acid (polynucleotide) molecule. The portion contains at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the entire length of the reference nucleic acid molecule or polypeptide. A fragment may contain 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides or amino acids.
"Hybridization" means hydrogen bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary nucleobases. For example, adenine and thymine are complementary nucleobases that pair through the formation of hydrogen bonds.
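For the poly-dT capture sequence described earlier, the relevant pairing is antiparallel Watson-Crick pairing: a probe captures a target perfectly when the target, read 5' to 3', is the reverse complement of the probe. The sketch below checks only perfect Watson-Crick complementarity in the DNA alphabet (RNA U bases would first need mapping to T); it does not model Hoogsteen pairing, mismatches, or partial overlaps, and its function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fully_complementary(probe: str, target: str) -> bool:
    """True if the two strands pair base-for-base in antiparallel orientation."""
    return len(probe) == len(target) and reverse_complement(probe) == target.upper()


if __name__ == "__main__":
    print(reverse_complement("AACG"))               # CGTT
    print(fully_complementary("T" * 30, "A" * 30))  # True: poly-dT vs poly(A)
```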
The terms "isolated," "purified," or "biologically pure" refer to material that is free to varying degrees from components which normally accompany or are associated with it as found in its native state. "Isolate" denotes a degree of separation from original source or surroundings. "Purify" denotes a degree of separation that is higher than isolation. A "purified" or "biologically pure" protein or polynucleotide is sufficiently free of other materials such that any impurities do not materially affect the biological properties of the protein or polynucleotide, or cause other adverse consequences. That is, a polynucleotide (nucleic acid), polypeptide, or peptide is purified if it is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Purity and homogeneity are typically determined using analytical chemistry techniques, for example, polyacrylamide gel electrophoresis or high-performance liquid chromatography. The term "purified" can denote that a nucleic acid, protein, or peptide gives rise to essentially one band in an electrophoretic gel. For a protein that can be subjected to modifications, for example, phosphorylation or glycosylation, different modifications may give rise to different isolated proteins, which can be separately purified.
By "isolated polynucleotide" is meant a nucleic acid (e.g., a DNA) that is free of the genes which, in the naturally-occurring genome of the organism from which the nucleic acid molecule of the invention is derived, flank the gene. The term includes, for example, a recombinant DNA that is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote; or that exists as a separate molecule (for example, a cDNA, or a genomic or cDNA fragment produced by PCR or restriction endonuclease digestion) independent of other sequences. In addition, the term includes an RNA molecule that is transcribed from a DNA molecule. Fragments or portions of polynucleotides and nucleic acid sequences, such as are obtained by lysing cells, e.g., a single cell, as described herein, are also considered to be isolated polynucleotides.
By "marker" is meant any polynucleotide (or protein) that is detectable in a sample, such as a cell, undergoing analysis. In one embodiment, a marker identifies a cell or cell sample or a polynucleotide or protein obtained therefrom. In an embodiment, a marker is one or more nucleobases in a barcode nucleotide sequence. In an embodiment, a marker is one or more invariant or semi-variable nucleobases in a sequence, such as a barcode sequence, that is identifiable in bioinformatics analysis, e.g., of sequences generated from a single cell.
The term "mutation" refers to a substitution of a nucleotide base or amino acid residue within a sequence, e.g., a nucleic acid or amino acid sequence, respectively, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
As used herein, "obtaining" as in 'obtaining an agent' includes synthesizing, purchasing, performing an action on to acquire, or otherwise acquiring the agent.
By "polynucleotide" is meant a nucleic acid molecule or sequence, e.g., a double-stranded (ds) DNA polynucleotide, a single-stranded (ss) DNA polynucleotide, a dsRNA polynucleotide, or a ssRNA polynucleotide, e.g., that encodes one or more polypeptides. The term encompasses positive-sense (i.e., protein-coding) DNA polynucleotides, which are capable of being transcribed to form an RNA transcript, which can be subsequently translated to produce a polypeptide following one or more optional RNA processing events (e.g., intron excision by RNA splicing, or ligation of a 5' cap or a polyadenyl tail). The term additionally encompasses positive-sense RNA polynucleotides, capable of being directly translated to produce a polypeptide following one or more optional RNA processing events.
The terms "nucleic acid" and "nucleic acid molecule," as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides, are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, "nucleic acid" refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, "nucleic acid" refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms "oligonucleotide" and "polynucleotide" can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides).
In some embodiments, "nucleic acid" encompasses RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms "nucleic acid," "DNA," "RNA," and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated.
In some embodiments, a nucleic acid is or comprises natural nucleosides/nucleotides (nucleobases) (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolopyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
By "reduces" is meant a negative alteration of at least 5%, 10%, 25%, 50%, 75%, or 100%.
By "reference" is meant a standard or control condition. A "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset of or the entirety of a specified sequence, for example, a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence. For polypeptides, the length of the reference polypeptide sequence will generally be at least about 16 amino acids, at least about 20 amino acids, at least about 25 amino acids, or about 35 amino acids, about 50 amino acids, or about 100 amino acids. For nucleic acids, the length of the reference nucleic acid sequence will generally be at least about 50 nucleotides, at least about 60 nucleotides, at least about 75 nucleotides, or about 100 nucleotides, or about 300 nucleotides, or any integer thereabouts or therebetween.
By "specifically binds" is meant a nucleic acid molecule, polypeptide, or complex thereof (e.g., a binding protein such as a transcription factor and its cognate nucleic acid binding region), or a compound, or molecule that recognizes and binds a polypeptide and/or nucleic acid molecule of the invention, but which does not substantially recognize and bind other molecules in a sample, for example, a biological sample.
By "subject" is meant a mammal, including, but not limited to, a human or non-human mammal, such as a non-human primate, e.g., a marmoset, or a non-human mammal, such as a bovine, equine, canine, ovine, or feline mammal, or a sheep, goat, llama, camel, or a rodent (rat, mouse), ferret, gerbil, or hamster. In an embodiment, the subject whose cellular nucleic acids are to be analyzed by single cell sequencing methods is a human, such as a human patient who has a particular disease or condition, such as a cancer, tumor, or neoplasm. Examples of subjects and patients include mammals, such as humans, having diseases or conditions or who are at risk of having such diseases or conditions.
Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50.
By "substantially identical" is meant a polypeptide or nucleic acid molecule exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein). Preferably, such a sequence is at least 60%, preferably at least 70%, more preferably 80% or 85%, and most preferably 90%, 95% or even 99% identical at the amino acid or nucleic acid level to the sequence used for comparison, for example, over a specified comparison window. Optimal alignment may be conducted using the homology alignment algorithm of Needleman and Wunsch, 1970, J. Mol. Biol., 48:443. Sequence identity is typically measured using sequence analysis software (for example, the Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, or the BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX programs). Such software matches identical or similar sequences by assigning degrees of homology to various substitutions, deletions, and/or other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In an exemplary approach to determining the degree of identity, a BLAST program may be used, with a probability score between e^-3 and e^-100 indicating a closely related sequence.
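The global alignment algorithm referred to above can be illustrated with a short sketch. The following is a minimal, textbook Needleman-Wunsch implementation with an assumed scoring scheme (match +1, mismatch -1, linear gap -1), plus a percent-identity helper that divides identical aligned positions by the alignment length including gaps. The scoring values and the identity denominator are illustrative assumptions only; the software packages named above use their own substitution matrices, gap penalties, and identity definitions, so their results can differ.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str]:
    """Return one optimal global alignment of a and b, with '-' marking gaps."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length (gaps included), x 100."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        raise ValueError("cannot compute identity of two empty sequences")
    same = sum(x == y for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)
```

For example, `percent_identity("ACGT", "ACGTT")` aligns the sequences as `ACG-T` over `ACGTT` and reports 80.0.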
Sequencing depth refers to a measure of sequencing capacity spent on a single sample, which can be reported, for example, as the number of raw reads per cell.
Split pooling refers to an approach in which sample material is subjected to multiple rounds of aliquoting and pooling, often used for producing unique barcodes by step-wise introduction of distinct barcode elements into each aliquot, as also described further infra.
By "substantially identical" is meant a polypeptide or nucleic acid molecule exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein). In one embodiment, such a sequence is at least 60%, 80% or 85%, 90%, 95% or even 99% identical at the amino acid or nucleic acid level to the sequence used for comparison.
Polynucleotides may include any nucleic acid molecule or sequence as described herein. A polynucleotide may also encode a polypeptide, or a fragment thereof. Polynucleotides having substantial identity to a nucleic acid sequence or oligonucleotide sequence, such as an endogenous sequence or a sequence attached to a solid support, e.g., bead or particle, as described herein, are typically capable of hybridizing with at least one strand of a double-stranded nucleic acid molecule. By "hybridize" is meant pairing of the nucleic acid molecules to form a double-stranded molecule between complementary polynucleotide sequences (e.g., a nucleic acid sequence or oligonucleotide sequence attached to a solid support, e.g., bead or particle, as described herein), or portions thereof, under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399; Kimmel, A. R. (1987) Methods Enzymol. 152:507).
For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and more preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30°C, more preferably of at least about 37°C, and most preferably of at least about 42°C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In one embodiment, hybridization will occur at 30°C in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37°C in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In another embodiment, hybridization will occur at 42°C in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.
For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature.
As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25°C, more preferably of at least about 42°C, and even more preferably of at least about 68°C. In an embodiment, wash steps will occur at 25°C in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In another embodiment, wash steps will occur at 42°C in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In yet another embodiment, wash steps will occur at 68°C in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science, 196:180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA, 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.
Nucleic acids that do not hybridize to each other under stringent conditions may still be substantially identical, e.g., if the polypeptides that they encode are substantially identical, which occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions. Nonlimiting examples of "moderately stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 1x SSC at 45°C. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency.
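The sodium chloride and trisodium citrate pairs in the hybridization and wash embodiments are all in the same 10:1 ratio as SSC (saline-sodium citrate) buffer, which is conventionally defined at 1x as 150 mM NaCl and 15 mM trisodium citrate. The minimal sketch below converts the stated salt conditions to SSC equivalents, which makes the stringency ordering easier to see: the hybridization embodiments correspond to roughly 5x, 3.3x, and 1.7x SSC, and the washes to 0.2x and 0.1x SSC, consistent with stringency rising as salt falls. Only the salt component is converted; temperature is carried along for display, and formamide, SDS, and carrier DNA are not modeled.

```python
# 1x SSC convention: 150 mM NaCl + 15 mM trisodium citrate
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_equivalent(nacl_mm: float, citrate_mm: float) -> float:
    """Express a NaCl / trisodium citrate pair as an 'x SSC' multiple."""
    x_nacl = nacl_mm / SSC_1X_NACL_MM
    x_citrate = citrate_mm / SSC_1X_CITRATE_MM
    if abs(x_nacl - x_citrate) > 1e-9:
        raise ValueError("NaCl and citrate are not in the 10:1 SSC ratio")
    return x_nacl


# (step, temperature in degrees C, NaCl mM, trisodium citrate mM), as stated in the text
CONDITIONS = [
    ("hybridization", 30, 750, 75),
    ("hybridization", 37, 500, 50),
    ("hybridization", 42, 250, 25),
    ("wash", 25, 30, 3),
    ("wash", 42, 15, 1.5),
    ("wash", 68, 15, 1.5),
]

if __name__ == "__main__":
    for step, temp_c, nacl, citrate in CONDITIONS:
        print(f"{step:<13} {temp_c:>3} C  {ssc_equivalent(nacl, citrate):.2f}x SSC")
```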
Ranges as provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50, inclusive of the first and last values.
Unless specifically stated or obvious from context, as used herein, the term "or" is understood to be inclusive. Unless specifically stated or obvious from context, as used herein, the terms "a", "an", and "the" are understood to be singular or plural.
As used herein, the term "about" or "approximately" means within an acceptable error range for the type of value described and the method used to measure the value. For example, these terms can signify within 20%, more preferably within 10%, and most preferably still within 5% of a given value or range. More specifically, "about" can be understood as within 20%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value or range. Alternatively, especially in biological systems, the term "about" means within one log unit (i.e., one order of magnitude), preferably within a factor of two of a given value. Unless specifically stated or obvious from context, as used herein, the term "about" is understood as within a range of normal tolerance in the art, for example, within 2 standard deviations of the mean. Unless otherwise clear from context, all numerical values provided herein are modified by the term "about."
The recitation of a listing of chemical groups or component groups in any definition of a variable herein includes definitions of that variable as any single group or combination of listed groups. The recitation of an embodiment for a variable or aspect herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.
Any compositions or methods provided herein can be combined with one or more of any of the other compositions and methods provided herein.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The products, molecules, compositions and methods described herein relate, in part, to discoveries made to improve single cell sequencing techniques and approaches that comprise molecular barcodes.
As described herein, solid supports, such as beads or microparticles, comprising attached oligonucleotides and identifying barcode sequences to capture cellular nucleic acids, such as mRNA, are provided. Such barcoded beads are of a diameter that can be employed in single cell sequencing methods using high-throughput technology comprising microtiter plates or using microfluidic devices. The beads, which are useful in methods and techniques for performing single molecule or single cell analysis, allow molecular/genetic "information" to be appended to individual reactions to process and analyze a large population of reactions or assays collectively, while also being able to partition results by individual reactions or assays. In general, the beads have oligonucleotides ("oligos"), including barcodes, attached to their surfaces in a particular configuration. The combination of a barcoded bead with the particular configuration of oligos (and especially the barcode(s)) paired with a capture sequence(s) provides beads that are advantageous and useful in single cell applications, e.g., transcriptome analysis and sequence library generation.
The analysis of nucleic acids, particularly mRNA, derived from single cells is performed by a number of single cell sequencing methods. In general, individual cells isolated from tissue are admixed with beads or microparticles to which oligonucleotides (primer sequences) containing molecular barcode sequences ("barcodes") are affixed. The individual cells, which may be in the well of a culture plate or encapsulated in a droplet, are lysed, and their mRNA sequences bind to the oligonucleotide primers on the beads or microparticles. The mRNAs captured on the beads are reverse transcribed into cDNAs such that the beads contain single-cell transcriptomes that are barcoded so as to retain and molecularly memorialize the identity of the individual cell from which each mRNA originated. The oligonucleotide primer sequences attached to the beads or microparticles contain a sequence allowing for PCR amplification as well as barcode sequences. By way of example, attached to a given bead or microparticle may be at least 10^8 oligonucleotides (oligos), each of which comprises a primer binding site followed by at least three barcodes or barcode combinations, including a cell barcode ("CBC") sequence, a unique molecular identifier (UMI) barcode sequence, and at least a third barcode sequence comprised of at least one of (i) a plurality of invariant nucleobases, partially-variable nucleobases, variable nucleobases, linked or unlinked, or a combination thereof; (ii) a plurality of invariant nucleobases comprising at least one terminal partially-variable nucleobase; or (iii) a single partially-variable or invariant nucleobase, such as 4*4*4*4, and a terminal polynucleotide capture sequence, as described herein. The presence of three or more (3+) barcodes per oligonucleotide attached to a bead, which has a plurality of such oligos attached, offers advantageous properties and utility as described herein.
In an embodiment, the terminal polynucleotide capture sequence comprises an oligo(dT) sequence present at the 3' terminus to capture poly(A)-containing mRNAs from the cell.
In embodiments, nucleic acids to be analyzed by single cell sequencing approaches using the barcoded beads described herein (i.e., the analyte) may be obtained from any type of cell or cell sample. The cell may be obtained from a tissue or organ of a mammalian subject, including humans. The analyte nucleic acids may be obtained from an organelle within a cell, such as a nucleus, mitochondrion, lysosome, or vesicle. In embodiments, the cell as analyte may be any type of cell, for example, without limitation, a cell from brain, eye, retina, skin, blood, plasma, serum, lymph, heart, lung, pancreas, liver, kidney, bladder, ovary, testis, cervix, prostate, bone marrow, spleen, stomach, intestine, gastrointestinal tract, gall bladder, esophagus, or trachea.
The cell may derive from a microorganism or pathogenic microorganism, such as a bacterium, fungus, parasite, worm, and the like. In addition, the cell may derive from a tissue or organ having a disease or pathology, such as a cancer, tumor, or neoplasm cell, or another type of cell variant or mutant.
Cells and other agents for nucleic acid analysis may be lysed for release of nucleic acid using a lysis reagent as known and practiced in the art, for example, without limitation, a nonionic surfactant (detergent), e.g., ethoxylates; a zwitterionic surfactant, e.g., 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS) or betaines; an anionic surfactant, e.g., sodium dodecyl sulfate (SDS), ammonium lauryl sulfate (SLS), sodium laureth sulfate, or sodium lauryl sarcosinate; or a chaotropic salt, e.g., urea or guanidinium thiocyanate. In a particular embodiment, the lysis reagent is an anionic surfactant. In single cell sequencing methods, cells are suspended in an aqueous solution, e.g., phosphate buffered saline or saline, and the barcoded beads as described herein are suspended in lysis reagent. The cells and beads are admixed, e.g., in the well of a microtiter (multi-well) plate or while simultaneously being flowed into a microfluidic apparatus or device where they mix and form droplets. In the plate well or droplets, the cells lyse, and nucleic acid (e.g., mRNA) is released and bound to, or captured onto, the surface of the barcoded bead by hybridization of the cell's nucleic acid (e.g., mRNA) to the nucleic acid, polynucleotide, or oligonucleotide sequences, including molecular barcode sequences, attached to the surface of the beads.
Single cell sequencing methods and platforms, such as high-throughput methods for transcriptome analysis of tens of thousands of individual cells, in which the described barcoded beads may be employed are known and practiced by those having skill in the art. (See, e.g., Liang, J. et al., 2014, J Genetics and Genomics, 41(10):513-528; Haque, A. et al., 2017, Genome Medicine, 9:75; Olsen, T.K. and Baryawno, N., 2018, Curr Protoc Mol Biol, 122(1):e57; Hwang, B. et al., 2018, Experimental & Molecular Medicine, 50: Article No. 96, 14 pages).
In some cases, barcodes, based on their method of manufacture and/or the number of barcodes required to maintain sequence uniqueness, are created without explicitly knowing in advance the specific barcode polynucleotide sequences. In such cases, the features of the resulting data are used to determine the barcode sequences used in a particular experiment, and it is often advantageous to use two barcodes. A first cell barcode (CBC) is typically generated by a combinatorial synthesis approach and allows the identification of the specific partition (well or droplet), and thus, the associated cell. Typically, this cell barcode is produced by split-pool synthesis, where the population of beads is distributed ("split") into separate columns or other partitions (typically 4 partitions), and only 1 nucleobase (base) is added per column, such that all beads in column 1 receive a single A, all beads in column 2 receive a single T, etc. Following the addition of the single base, the beads from all 4 columns are pooled together and mixed before again being distributed (split) among 4 columns for the second and subsequent rounds of nucleotide addition. This split-pool-split method, or "split-pooling," when performed for sufficient synthesis cycles, allows for the construction of barcodes in which all oligonucleotide sequences on a given bead are the same, but the barcode sequence on each bead differs from that of every other bead in the population.
The second barcode, the UMI barcode sequence, distinguishes downstream workflow duplications and allows for identification of unique molecules. A system involving two barcodes is especially useful in that the cell barcode (CBC) identifies the cell of origin (individual oligonucleotide primers on the same bead share the same CBC), and the UMI barcode identifies unique (and non-unique) molecules of interest, so that many samples (originating from many cells) can be processed in parallel and sequenced in bulk. Thereafter, the associations can be reconstructed with correction for downstream workflow duplication events. As will be appreciated by the skilled practitioner in the art, the UMI enables de-duplication of copies introduced during PCR amplification of the mRNA library that is captured on a bead. The UMI barcode puts a unique barcode on each mRNA molecule when it is copied to cDNA. In this way, multiple copies of a particular mRNA can be distinguished as arising naturally in the cell or as a duplication event that occurs during the PCR process. If the same cDNA is sequenced multiple times associated with the same UMI, then it is determined to be a duplication arising from PCR in the workflow.
If the same cDNA is sequenced multiple times, and each copy is associated with a different UMI barcode, then multiple copies of that mRNA were present in that cell.
A two-barcode approach, while advantageous over single cell sequencing methods that use one barcode, presents certain challenges. One challenge is that several factors can lead to sequences that confound the downstream analysis. These factors include imperfect sequence integrity of the barcodes themselves and downstream events that can collectively lead to artefacts in data and sequencing analyses. Such downstream events include, but are not limited to, partial sequences participating in the reactions, in vitro recombination events, PCR errors and artefacts, and sequencing errors. Various types of bioinformatics tools are used by those skilled in the art to identify and correct for these errors; however, the compositions and methods as described herein, which involve the use of multiple barcodes, e.g., more than two or more than three barcodes, are highly useful for distinguishing true biological variation from errors arising from downstream events as described above and for improving the data quality resulting from molecular barcode-based data analysis, without disrupting the practicality of manufacture or the single cell sequencing workflow.
In an embodiment, a single cell analysis platform or technique is provided in which beads or microparticles comprising attached nucleotide or oligonucleotide sequences are employed. The nucleotide or oligonucleotide sequences comprise oligonucleotide primers attached to the bead or microparticle and a plurality of barcodes, e.g., at least three barcodes, which are spatially arranged and/or comprise sequence features that create predictable, allowable states within, between, or among the barcodes, where non-conforming states can be attributed to imperfect sequence integrity of the barcodes and/or downstream events.
In an embodiment, the barcodes are incorporated into primer nucleic acid molecules or oligonucleotides that are attached to beads or microparticles. In another embodiment, the oligonucleotide primers containing the barcodes as described herein are synthesized directly on the beads or microparticles, e.g., in a 5' to 3' direction. In an embodiment, a barcoded primer bead is provided which comprises, from 5' to 3', a bead or microparticle to which are attached a linker sequence, a primer binding site or sequence (sequencing primer binding sequence) to enable PCR amplification, three or more barcode sequences, and an oligo dT sequence.
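The split-pool procedure just described can be simulated in a few lines, which shows why all oligos on one bead share a barcode while different beads almost always differ. This is a minimal sketch under stated assumptions: each bead is assumed to land in one of the four columns uniformly at random in every round, and one string per bead stands in for all of that bead's oligos (they receive the same base in each round). The function name `split_pool` is hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def split_pool(n_beads: int, n_rounds: int, seed: int = 0) -> list[str]:
    """Simulate split-pool synthesis and return one barcode per bead.

    Every oligo on a bead receives the same base in each round, so a single
    string per bead stands in for all of that bead's oligos.
    """
    rng = random.Random(seed)
    barcodes = [""] * n_beads
    for _ in range(n_rounds):
        # Split: assume each bead lands in one of four columns uniformly at random
        columns: dict[str, list[int]] = {base: [] for base in BASES}
        for bead in range(n_beads):
            columns[rng.choice(BASES)].append(bead)
        # Couple one base per column to every bead held in that column
        for base, beads in columns.items():
            for bead in beads:
                barcodes[bead] += base
        # Pool: beads are recombined and mixed before the next round (implicit here)
    return barcodes


if __name__ == "__main__":
    barcodes = split_pool(n_beads=10_000, n_rounds=12)
    counts = Counter(barcodes)
    shared = sum(n for n in counts.values() if n > 1)
    print(f"possible 12-base barcodes: {4 ** 12:,}")
    print(f"beads sharing a barcode with another bead: {shared}")
```

Because columns are assigned at random, a small number of beads can still share a barcode by chance; the number of rounds controls how rare that is.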
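The counting logic in the two preceding paragraphs can be sketched as follows. This minimal sketch assumes reads have already been parsed into (cell barcode, UMI, gene) tuples and that the barcodes contain no errors; the cell and gene names are hypothetical. Reads that share a cell barcode, gene, and UMI are counted once (PCR duplicates), while distinct UMIs for the same cell and gene count as separate molecules present in that cell.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to molecules using exact (cell barcode, UMI) matches.

    reads: iterable of (cell_barcode, umi, gene) tuples.
    Returns {(cell_barcode, gene): (distinct_umis, total_reads)}.
    """
    umis = defaultdict(set)
    n_reads = defaultdict(int)
    for cbc, umi, gene in reads:
        umis[(cbc, gene)].add(umi)  # a repeated UMI adds nothing: PCR duplicate
        n_reads[(cbc, gene)] += 1
    return {key: (len(umis[key]), n_reads[key]) for key in umis}


reads = [
    ("CELL1", "AACG", "geneA"),
    ("CELL1", "AACG", "geneA"),  # PCR duplicate of the read above
    ("CELL1", "AACG", "geneA"),
    ("CELL1", "TTGC", "geneB"),
    ("CELL1", "GGAT", "geneB"),  # a second, distinct molecule of geneB
    ("CELL2", "AACG", "geneA"),  # same UMI, but a different cell
]
print(count_molecules(reads))
# {('CELL1', 'geneA'): (1, 3), ('CELL1', 'geneB'): (2, 2), ('CELL2', 'geneA'): (1, 1)}
```

Exact matching is the simplest possible rule; any sequencing or synthesis error in a UMI would make a duplicate look like a new molecule.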
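To make the error-correction problem concrete, the sketch below shows one generic bioinformatic approach of the kind referred to above; it is not the method of this disclosure. Frequently observed cell barcodes are trusted, and a rare barcode that is one substitution away from exactly one trusted barcode is merged into it. The count threshold and the 4-base barcodes are illustrative assumptions. A Hamming-distance rule only handles substitutions: an insertion or deletion (e.g., an N-1 synthesis product) shifts every downstream base and cannot be corrected this way.

```python
from collections import Counter
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcodes(observed: List[str], min_count: int) -> Dict[str, Optional[str]]:
    """Map each observed barcode to a trusted barcode, or to None if uncorrectable.

    Barcodes seen at least min_count times are trusted as-is. A rarer barcode is
    reassigned only when exactly one trusted barcode of the same length differs
    from it by a single substitution.
    """
    counts = Counter(observed)
    trusted = {bc for bc, n in counts.items() if n >= min_count}
    mapping: Dict[str, Optional[str]] = {}
    for bc in counts:
        if bc in trusted:
            mapping[bc] = bc
            continue
        neighbours = [t for t in trusted if len(t) == len(bc) and hamming(t, bc) == 1]
        mapping[bc] = neighbours[0] if len(neighbours) == 1 else None
    return mapping


observed = ["AAAA"] * 50 + ["CCCC"] * 40 + ["AAAT"] * 2 + ["GGGG"]
print(correct_barcodes(observed, min_count=10))
# {'AAAA': 'AAAA', 'CCCC': 'CCCC', 'AAAT': 'AAAA', 'GGGG': None}
```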
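One way to express "predictable, allowable states" is a per-position template written in IUPAC nucleotide codes, against which each sequenced barcode region is checked. The sketch below is a minimal illustration under assumptions: the template (a 12-base cell barcode, a UMI of 13 N plus 1 V, and a shortened 10-base poly-T) is hypothetical, loosely modeled on the 13 N + 1 V UMI embodiment described in this section, and only the codes A, C, G, T, N, and V are supported. A read of the wrong length is reported as non-conforming as a whole.

```python
from typing import List, Optional

# IUPAC codes used here: fixed bases, N = any base, V = A, C, or G (not T)
ALLOWED = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",
    "V": "ACG",
}


def nonconforming_positions(read: str, template: str) -> Optional[List[int]]:
    """Return indices where read violates template, or None on a length mismatch."""
    if len(read) != len(template):
        return None  # a wrong length is itself a non-conforming state
    return [i for i, (base, code) in enumerate(zip(read, template))
            if base not in ALLOWED[code]]


# Hypothetical template: 12-base cell barcode, 13 N + 1 V UMI, 10-base poly-T
TEMPLATE = "N" * 12 + "N" * 13 + "V" + "T" * 10

good = "ACGT" * 6 + "A" + "G" + "T" * 10
bad = "ACGT" * 6 + "A" + "T" + "T" * 10  # T where the template allows only V
print(nonconforming_positions(good, TEMPLATE))  # []
print(nonconforming_positions(bad, TEMPLATE))   # [25]
```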
In embodiments, the oligo dT sequence comprises about or equal to 10, 15, 20, 25, 30, 35, 40, 45, or 50 or more base pairs (bp). In an embodiment, the oligo dT sequence comprises about or equal to 20 or more bp. In an embodiment, the oligo dT sequence comprises about or equal to 30 or more bp. In an embodiment, the oligo dT sequence comprises about or equal to 35 or more bp.
In a particular embodiment, the oligos attached to a bead comprise, in the 5' to 3' direction, a (sequencing) primer binding site or sequencing priming site; an 8-12 base cell barcode (generated by split-pool synthesis such that every oligo on the bead has the same barcode, but every bead has a unique barcode); a UMI region of 14 bases comprising 13 nucleobases (N's) followed by 1 variable (V) nucleobase (V = A, C, or G, but not T); followed by a poly-T region of approximately 30 T bases with a 3' OH so that the oligos can be extended by polymerase or reverse transcriptase. In an embodiment, the primer binding sequence comprises one mixed base and is not identical to the sequence used as a sequencing priming site. In an embodiment, the bead is monodisperse and spherical with a 30 µm average diameter, comprising a polymer with a pore size of 100 nm (1000 angstrom), with oligos including 3+ barcodes as described herein attached to the bead surface through a linker, which may be cleavable or non-cleavable. In an embodiment, the oligos are attached to beads comprising a hydroxylated methacrylic polymer.
In an embodiment, the oligos are attached to the beads through a non-cleavable linker comprising hexa(ethylene glycol). As will be appreciated by the skilled practitioner, monodisperse beads or particles are typically of a uniform size in a dispersed phase.
The beads or microparticles (solid supports) suitable for use in the products and methods described herein may comprise materials such as, without limitation, glass, polystyrene, silicon-based polymer, hydroxylated methacrylic polymer, polydimethylsiloxane (PDMS), silica gel, polyethylene, or composite materials, optionally with a paramagnetic core or coating. In an embodiment, the beads are monodisperse and spherical. The beads may have an average diameter of 30 µm. Alternatively, the beads may be of a diameter such as, for example, without limitation, 1 µm, 2 µm, 5 µm, 10 µm, 15 µm, 20 µm, 25 µm, 30 µm, 35 µm, 40 µm, 45 µm, 50 µm, 55 µm, 60 µm, 65 µm, 70 µm, 75 µm, 80 µm, 85 µm, 90 µm, 95 µm, or 100 µm in diameter, including values therebetween. In addition, the beads need not be "micro" in size, but may encompass "nano" sized beads or, in some cases, macro sized beads.
The linker sequences suitable for use in the products and methods described herein include, without limitation, cleavable linkers, e.g., photocleavable (photolabile) linkers, disulfide linkers, or other thermally or chemically cleavable linkers, and non-cleavable linker sequences. In an embodiment, the non-cleavable linker comprises a straight-chain polymer. In an embodiment, the linker is a chemically-cleavable, straight-chain polymer. In an embodiment, the linker is non- 0 and may be a substituted hydrocarbon polymer. In an embodiment, the linker is photolabile and may be a substituted hydrocarbon polymer. In an embodiment, the linker is non-cleavable and may be a polyethylene glycol, e.g., a PEG-C3 to PEG-C2 ii, such as hexa(ethylene glycol).
In an embodiment, split-pool synthesis may be employed to prepare the barcoded beads as described herein. Split-pool barcoding methods are known and used in the art. By way of example, the preparation of a large number of beads, particles, microbeads, nanoparticles, or the like having attached thereto unique nucleic acid barcodes generally involves performing polynucleotide synthesis on the surface of the beads in a split-pool technique such that, in each cycle of synthesis, the beads are split into subsets that are subjected to different chemical reactions, and then this split-pool process is repeated in two or more cycles to produce a combinatorially large number of distinct nucleic acid barcodes. Such a method may involve, for example, performing reverse phosphoramidite synthesis on the surface of the bead in a split-pool manner, such that in each cycle of synthesis the beads are "split" into four reactions with one of the four canonical nucleotides (T, C, G, or A) or unique oligonucleotides two or more bases in length, and repeating this process numerous times, e.g., at least two or three times, and preferably more than twelve times, so that by the end of the repeated cycles each bead in the pool carries one of many millions (e.g., more than 16 million) of unique barcodes. See, e.g., WO 2016/040476. In an embodiment, the pool-and-split process is repeated for 2 cycles to 500 cycles, or for 2 cycles to 300 cycles, or for 2 cycles to 200 cycles. In an embodiment, oligos of 1 to about 150 split-pool bases are built on (attached to) the solid support (bead).
The barcodes as described herein are attached to beads or microparticles in conjunction with other oligos and capture sequences and are used in single cell analysis techniques to enhance sequencing data analysis, improve overall data quality, and greatly enhance the ability to distinguish signal from noise in the sequence analyses. Based on computational analysis of the presence and configurations of the at least three barcodes (e.g., "3+ barcodes"), the techniques described herein provide cost-effective and more efficient components and procedures for single cell sequencing and manufacture. For example, in the oligos attached to the bead or particle solid support as described herein, the primer binding site is upstream of the barcodes (i.e., 5' to the barcodes), and the capture sequence is at the terminal 3' end of the oligos on the beads. The barcode elements contained in the oligos may be configured in any way and still be useful, such as described and exemplified herein. In theory, there is no upper or lower limit to the number of oligos that can be attached per bead. By way of example, 10^8 oligos per bead are provided, each oligo comprising a primer binding site, 3+ barcodes, and a terminal (3') capture sequence.
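The capacity of this layout follows directly from the segment lengths: a 12-base cell barcode built by four-way split-pool synthesis can take 4^12 = 16,777,216 values, and the 13 N + 1 V UMI can take 4^13 x 3 = 201,326,592 values because the final V position excludes T. The minimal sketch below computes these numbers from IUPAC-style templates. The 12-base cell barcode (the upper end of the 8-12 base range) and the 30-base poly-T are representative values chosen for illustration, the primer binding site is omitted because its sequence is not given here, and the `Segment` and `layout` names are hypothetical. A plausible reason for the V base, consistent with the description but not stated in it, is that it keeps the UMI from ending in T so the UMI/poly-T boundary stays unambiguous.

```python
import math
from dataclasses import dataclass

# Number of bases each IUPAC code allows (only the codes used here)
CHOICES = {"A": 1, "C": 1, "G": 1, "T": 1, "N": 4, "V": 3}


@dataclass(frozen=True)
class Segment:
    name: str
    template: str  # IUPAC codes, written 5' to 3'

    def diversity(self) -> int:
        """Number of distinct sequences this segment can encode."""
        return math.prod(CHOICES[code] for code in self.template)


def layout(cbc_len: int = 12, polyt_len: int = 30) -> list[Segment]:
    # Primer binding site omitted: its sequence is not specified in this section
    return [
        Segment("cell_barcode", "N" * cbc_len),
        Segment("umi", "N" * 13 + "V"),
        Segment("poly_t", "T" * polyt_len),
    ]


if __name__ == "__main__":
    for seg in layout():
        print(f"{seg.name:<12} {len(seg.template):>2} nt  {seg.diversity():,} sequences")
```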
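The "more than 16 million" figure corresponds to twelve four-way rounds (4^12 = 16,777,216 possible sequences). Because beads are distributed among columns at random, two beads can end up with the same barcode by chance. Under the idealizing assumption that each bead's barcode is an independent, uniform draw from 4^L sequences, the expected fraction of N beads whose barcode no other bead shares is (1 - 1/4^L)^(N - 1); the sketch below evaluates this. The 100,000-bead count is an arbitrary illustrative value, and real synthesis (unequal splits, truncated products) will deviate from the ideal.

```python
import math


def expected_unique_fraction(n_beads: int, barcode_len: int) -> float:
    """Expected fraction of beads whose barcode no other bead shares.

    Assumes each bead's barcode is drawn independently and uniformly from
    4**barcode_len possibilities (an idealization of random four-way splits).
    """
    m = 4 ** barcode_len
    # (1 - 1/m) ** (n - 1), computed via log1p for numerical stability
    return math.exp((n_beads - 1) * math.log1p(-1 / m))


if __name__ == "__main__":
    for length in (8, 10, 12):
        frac = expected_unique_fraction(100_000, length)
        print(f"{length} rounds: {4 ** length:>12,} barcodes, {frac:.4f} of beads unique")
```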
The arrangement of barcodes and oligos attached to the beads for the uses as described offer several beneficial properties. Adding markers within the data provides better separation of signals allowing for better risk assessment(s). An error within one barcode does not necessarily alter the confidence in the other barcodes if they can be clearly distinguished. A properly configured third barcode, even if only a single partially-variable or a single invariant nucleobase, not only carries its own barcode information, but also helps to better delineate the other barcodes and thus better partition confidence levels.
It becomes possible to establish expected linkage even when the surrounding sequences show variation within the data set and may not be known prior to the analysis. For example, the oligo sequence presented below and described in Example 2 includes a "cell barcode ((IBC) -- --201tt'LltiLl* third barcode" arrangement. Observing a pattern of bases-N bases-) bases" in the sequencing data allows the attribution of each base to the correct barcode, even if the number of bases in that section do not precisely match the exemplar sequence (such as would occur due to an N-1 synthesis error).
Detecting the expected barcode patterns in the sequencing data helps to identify synthesis or downstream workflow errors. Example 2 sequence: 51-T FITT' IAA (RCICAGTGOTATC TACiACITAOPW84*4*4*4*44*NNNINNNNN4*4*4 TITTTEVITTITTITTITTITTITTT-3'. Often with these analyses, the specific sequences of the -Lli\TI or even the cell barcode are not known prior to sequencing, which makes errors particularly difficult to identify and characterize. The expected linkage provides additional analysis options that assist in identifying variance caused by single events and variance due to the noise in the workflow and analysis. When the expected linkage is among features spread within the oligo, then it becomes particularly useful in identifying particular sources of error, such as recombination The same bases can be used for multiple purposes. In essence, the barcodes can be used independently or collectively, and they can be combined in different configurations in whole or in part.
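The pattern-based attribution described above can be sketched with a regular expression in which an invariant marker anchors the end of the UMI. Because the Example 2 sequence as reproduced here cannot be read reliably, the layout below is hypothetical: a 12-base cell barcode, a UMI expected to be 8 bases but allowed 7 to 9, an invariant "CAG" standing in for the third barcode, and a poly-T tail of at least 8 bases. The parser returns the cell barcode and UMI together with the UMI length offset (e.g., -1 for an N-1 synthesis product), or None when the expected pattern is absent.

```python
import re
from typing import Optional

# Hypothetical layout: CBC (12) | UMI (8 expected, 7-9 tolerated) | invariant "CAG" | poly-T
EXPECTED_UMI_LEN = 8
PATTERN = re.compile(
    r"^(?P<cbc>[ACGT]{12})(?P<umi>[ACGT]{7,9})(?P<marker>CAG)(?P<polyt>T{8,})"
)


def attribute(read: str) -> Optional[dict]:
    """Assign bases to barcodes using the expected pattern, tolerating UMI length errors."""
    m = PATTERN.match(read)
    if m is None:
        return None  # non-conforming: marker missing or lengths too far off
    return {
        "cbc": m["cbc"],
        "umi": m["umi"],
        "umi_len_offset": len(m["umi"]) - EXPECTED_UMI_LEN,
    }


print(attribute("ACGTACGTACGT" + "TTGCAAG" + "CAG" + "T" * 20))
# {'cbc': 'ACGTACGTACGT', 'umi': 'TTGCAAG', 'umi_len_offset': -1}
```

A caveat of this simple sketch is that a UMI which happens to contain the marker sequence can make the split ambiguous; requiring the poly-T run immediately after the marker reduces, but does not eliminate, such cases.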
Of particular note, the number of effective barcodes can be higher than the number of actual barcodes; i.e., the three-or-more (3+) barcode system provided and described herein provides more than simply three pieces of information. This embraces existing two-barcode systems that can be analyzed as if they contained more than two barcodes, although most of the benefits require spatial arrangements that are not practical with current two-barcode systems. Either markers such as invariant bases (each of which becomes an additional single-base barcode) or semi- or partially-variable bases need to be present, or the barcodes must be arranged such that there are three or more different barcodes, to take full advantage of the improved analyses; the potential advantages of these approaches are thus not realized without a third or additional barcode. Simply considering a cell barcode to be two barcodes provides only a limited benefit.
While the analyses that derive from the at-least-3-barcode system described herein can be applied to existing systems, physically separating the barcodes provides additional benefit, especially when there are one or more bases between barcodes that are analyzed together. One such benefit is that an additional data layer can be established. Whereas two-barcode systems allow distinguishing among cells and among molecules within a cell, additional classes can now be distinguished that are groupings either higher (e.g., groups of beads/cells) or lower (e.g., subsets of amplicons from one molecular capture) than the cell and molecule levels. These additional layers allow a more complete analysis with only limited additional sequence, the same sequence information, or even less sequence.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are well within the purview of the skilled artisan. Such techniques are explained fully in the literature, such as "Molecular Cloning: A Laboratory Manual", second edition (Sambrook, 1989); "Oligonucleotide Synthesis" (Gait, 1984); "Animal Cell Culture" (Freshney, 1987); "Methods in Enzymology"; "Handbook of Experimental Immunology" (Weir, 1996); "Gene Transfer Vectors for Mammalian Cells" (Miller and Calos, 1987); "Current Protocols in Molecular Biology" (Ausubel, 1987); "PCR: The Polymerase Chain Reaction" (Mullis, 1994); and "Current Protocols in Immunology" (Coligan, 1991). These techniques are applicable to the production of the polynucleotides, viral vectors and viral particles of the invention and, as such, may be considered in making and practicing the invention. Particularly useful techniques for particular embodiments are described in the Examples herein.
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the products, compositions and therapeutic methods of the invention, and are not intended to limit the scope of what is described and exemplified herein.
Nomenclature and conventions used in the Examples described herein
Certain nomenclature and conventions associated with the barcode sequences contained in the nucleotides or oligonucleotides linked to beads, as shown in the Examples herein, are described below. Typically, bases added combinatorially, such as by split-pool synthesis or by several other methods (such as strand extension, unique amidites such as trimer amidites, ligation/chemical attachment methods, combinatorial chemistry, etc.), are referred to as "J", as reported, for example, by Macosko, E.Z. et al. (2015, Cell, 161:1202-1214). While the nomenclature "J" is used as a convention of split-pool synthesis, the goal is to indicate bases that vary among wells/beads/cells or other features, but that differ from the "random" N's in that the method(s) of manufacture result(s) in at least a plurality of oligos per bead or per feature that are the same, while differences exist among the set (e.g., among beads). It will be appreciated by the skilled practitioner that the use of the nomenclature "J" when referring to bases added combinatorially does not necessarily correlate with, and is not limited to, a particular method of production or manufacture.
To avoid confusion that may arise when the "J" designation is used alongside standard IUPAC nomenclature, a new nomenclature has been adopted herein, which employs numbers with an asterisk to denote the respective number of possible bases, i.e., A, T, C, or G (or the respective number of choices of alternative bases), that can occupy a given position in a nucleotide or oligonucleotide sequence. By way of example, the nomenclature "1*" indicates an identical nucleotide base ("nucleobase" or "base") within a given sequence and/or across a given set of sequences. Of note, the nomenclature "1*" does not indicate a particular base, such that the sequence A1*A may be AAA, ATA, ACA, or AGA. In a similar manner, 1*1* may be AA, AT, AC, AG, TA, TT, TC, TG, CA, CT, CC, CG, GA, GT, GC, or GG, but across the set of sequences being considered (e.g., a set of beads), only one of the aforementioned 16 combinations would generally be found.
Similarly, 4*4* may be AA, AT, AC, AG, TA, TT, TC, TG, CA, CT, CC, CG, GA, GT, GC, or GG, but across the set of sequences being considered, position 1 may potentially include A, T, C, and G, and position 2 may potentially include A, T, C, and G; however, it is not necessarily the case that all of the combinations (e.g., AA, AT, AC, AG, TA, TT, TC, TG, CA, CT, CC, CG, GA, GT, GC, or GG) are or will be present. For example, the 4*4* nomenclature could designate the sets (AT, TC, CA, GC) or (AA, AT, TA, TT, CA, CT, GC, GG).
According to the nomenclature adopted herein, each asterisk denotes a base; therefore, 1*, 2*, and 4* all represent a single base length, although the number of possible bases at that position differs among the three.
Thus, for nucleobase designations within the barcodes described herein, e.g., 1*, 2*, 3*, or 4*, the numerical portion designates the number of alternative choices for a base that can be used at a given position in a nucleic acid or oligonucleotide sequence, and the asterisk (*) combined with the numerical portion indicates that there is a single base at that given position in the sequence. Accordingly, and by way of example: 1* = one of A, T, C, or G at the given position; 2* = either A or T; A or C; A or G; T or C; T or G; or C or G at the given position; 3* = either A, T, or C; A, T, or G; A, C, or G; or T, C, or G at the given position; and 4* = A, T, C, or G at the given position.
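Because the asterisk nomenclature constrains a set of sequences rather than a single sequence, it can be expressed as a check over the whole set. The following minimal sketch supports only n* tokens and literal A/C/G/T bases (IUPAC mixed-base letters are omitted), and `parse_template` and `conforms` are hypothetical helper names; the checks mirror the A1*A and 4*4* examples given above.

```python
import re

# A token is either "<n>*" (n alternative bases at one position; which bases
# are used is not specified) or a literal invariant base.
TOKEN = re.compile(r"[1-4]\*|[ACGT]")


def parse_template(template):
    """Split an asterisk template such as "A1*A" into per-position tokens."""
    tokens = TOKEN.findall(template)
    if "".join(tokens) != template:
        raise ValueError(f"unsupported template: {template!r}")
    return tokens


def conforms(sequences, template):
    """Check a *set* of sequences against an asterisk template.

    "n*" allows at most n distinct bases at that position across the set;
    a literal base must be present at that position in every sequence.
    """
    tokens = parse_template(template)
    if any(len(seq) != len(tokens) for seq in sequences):
        return False
    for i, token in enumerate(tokens):
        observed = {seq[i] for seq in sequences}
        if token.endswith("*"):
            if len(observed) > int(token[0]):
                return False
        elif observed - {token}:
            return False
    return True


print(len(parse_template("1*1*1*")), len(parse_template("1*4*2*")))  # 3 3
print(conforms({"AT", "TC", "CA", "GC"}, "4*4*"))  # True
print(conforms({"AT", "TC", "CA", "GC"}, "1*1*"))  # False
print(conforms({"ACA"}, "A1*A"), conforms({"AAA", "ATA"}, "A1*A"))  # True False
```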
In a similar fashion, the designations 1*1*1* and 1*4*2* both represent oligos three (3) bases in length. It will also be understood in this context that the "1-3*" and "1-4*" designations both represent a single base, but the set may include from 1 to 3 different bases in the former case and from 1 to 4 different bases in the latter case. For brevity, the dash nomenclature need not be used. Thus, it will be appreciated that, unless explicitly stated otherwise, "2*" signifies "1-2*", "3*" signifies "1-3*", and "4*" signifies "1-4*". In some examples described herein, a similar nomenclature to that described above is used, but the IUPAC nomenclature for each of the bases is used together with an asterisk for clarity of illustration. By way of example, IUPAC nomenclature provides the following designations for bases used at a given position in a sequence:
IUPAC nucleotide code | Base designation
A | Adenine
C | Cytosine
G | Guanine
T (or U) | Thymine (or Uracil)
R | A or G
Y | C or T
S | G or C
W | A or T
K | G or T
M | A or C
B | C or G or T
D | A or G or T
H | A or C or T
V | A or C or G
N | Any base
. or - | Gap
Thus, "R*" represents a single base position that may be either A or G at a given position in a nucleotide or oligonucleotide sequence shown in the Examples. As used in the Examples, "R*" refers specifically to a special case of "2*" in which "2*" = A or G. Similarly, "Y*" = C or T. Also, similarly, "V*" represents a single base position that may be A, C, or G, and is therefore a special case of 3*. As demonstrated in the Examples, "V*" is used preceding a T, in part because "V*" provides a bioinformatic marker in the sequence data. Since V represents any of the bases A, C, or G, but not T, a T found at a position designated "V" upon data analysis is due to an error, often a base deletion or "N-1" error. The use of "standard" V bases before T is known to those skilled in the art and is used in the barcoded beads described herein. The use of "V*" is a novel extension of this aspect and is used herein to illustrate that a specific "3*" before an invariant base can be a particularly useful bioinformatic marker for single cell sequencing analysis. The specific use of V* and T is for illustration only and is not intended to be limiting.
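The V*-before-T marker can be checked mechanically. In the sketch below, which assumes reads are trimmed to start at the first barcode base, n* positions accept any base (the specific bases at those positions are not known per position), and standard IUPAC letters restrict the permitted bases. A T at a V* position usually means an upstream base was lost, so the downstream T run has shifted one position toward the 5' end. The function names and the short template are hypothetical.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# "n*" tokens, or an IUPAC letter optionally followed by "*" (e.g. "V*").
TOKEN = re.compile(r"[1-4]\*|[ACGTRYSWKMBDHVN]\*?")


def allowed_bases(template):
    """Per-position sets of permitted bases; n* positions accept any base."""
    return [set("ACGT") if tok[0].isdigit() else set(IUPAC[tok[0]])
            for tok in TOKEN.findall(template)]


def marker_violations(read, template):
    """Positions where the read contains a base the template does not allow."""
    return [i for i, (base, ok) in enumerate(zip(read, allowed_bases(template)))
            if base not in ok]


template = "4*4*4*4*V*TTTT"
print(marker_violations("ACGGCTTTT", template))  # [] : intended product
print(marker_violations("ACGGTTTTT", template))  # [4] : V* base (C) lost
print(marker_violations("ACGCTTTTT", template))  # [4] : a 4* base lost upstream
```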
Similarly, B*, D*, and H* may be used in a sequence directly before A, C, or G, respectively. In some cases herein, parentheses (or brackets) are used to indicate specifically-linked bases. For example, A(4*4*)A may comprise the sets of bases AAAA, ATTA, ACCA, and AGGA, where the two bases represented within the parentheses are linked and identical, as well as all the combinations in which they are non-identical. The bases may be linked due to their method of synthesis, for example, a split-pool synthesis method in which two bases are added per column or partition (the "split") during one round of synthesis, e.g., TT, AA, GG, or CC, rather than the typical single base addition per split round of split-pool synthesis. This results in a population of beads (solid supports) that have a defined position in the oligonucleotide sequence containing two identical bases, in which every oligo on a particular bead has the same two bases at that location; the two identical bases are linked by virtue of being synthesized during the same "split" of split-pool synthesis. Thus, for A(4*4*)A, AACA, ATCA and other sequence combinations are also possible, but for any given subset of sequences, only one of the possible combinations exists. For utility, it must be known which possible combinations exist within a set, and the combinations must be fewer than all possible combinations, or the 4*4* equivalent becomes the default. The simplest use case is that in which both bases are identical.
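Linked bases arise naturally from split-pool synthesis when a split adds a dinucleotide rather than a single base. The following toy simulation is a hypothetical sketch: it assigns each bead to a random partition in every round and, for a "(4*4*)" round, adds one of AA, CC, GG or TT, so the two linked positions always agree on a given bead. Real manufacturing details (pool sizes, coupling efficiency, capping) are not modelled.

```python
import random


def split_pool(template, n_beads, seed=0):
    """Toy split-pool synthesis returning one barcode string per bead.

    Each step of `template` is one synthesis round:
      "4*"      -> the split adds A, C, G or T (one base per partition)
      "(4*4*)"  -> the split adds AA, CC, GG or TT (a linked pair)
      "A"/"C"/"G"/"T" -> an invariant base added to every bead
    All oligos on a bead share the same additions, so a bead is one string.
    """
    rng = random.Random(seed)
    beads = [""] * n_beads
    for step in template:
        if step == "4*":
            choices = ["A", "C", "G", "T"]
        elif step == "(4*4*)":
            choices = ["AA", "CC", "GG", "TT"]
        elif step in ("A", "C", "G", "T"):
            choices = [step]
        else:
            raise ValueError(f"unsupported step: {step!r}")
        # Each bead lands in a randomly chosen partition for this round.
        beads = [bead + rng.choice(choices) for bead in beads]
    return beads


beads = split_pool(["A", "(4*4*)", "A"], n_beads=200)
print(sorted(set(beads)))  # drawn from AAAA, ACCA, AGGA and ATTA only
```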
It will be understood that, while an asterisk is often used in sequence nomenclature to refer to a phosphorothioate linkage between bases, the asterisk nomenclature used in the polynucleotides or oligonucleotides comprising the barcodes and sequences described herein does not indicate a phosphorothioate linkage between the bases, unless specifically described as such.
The three-barcode approach described herein provides more utility in distinguishing artefacts and a high degree of resolving power in the same types of single cell experiments, while also allowing new applications. Because of sequencing length constraints, it may be practical to shorten the barcodes as much as possible. In this context, particular configurations of multiple barcodes can be especially useful. Aside from invariant bases, such as those in the primer binding site or in the terminal capture sequence (polyT), the identity of a particular base is not selected. Instead, a position is designed to vary across every oligonucleotide on a given bead (e.g., N or V nomenclature); such bases are useful to identify a unique molecule and are thus used to generate the UMI. Alternatively, a position is designed to be identical across all oligos on a given bead but different across a population of beads (i.e., the "J" nomenclature, or the 4*, V*, etc. nomenclature used herein); such bases are useful to identify a well/partition or cell and are thus used to generate a CBC. The type of base used at each position depends on the needs of the experimental system. For example, the amount of diversity needed in the experimental system can dictate the structure and length of the barcodes in the oligos. Consequently, and by way of example, a system that is limited to only 1,000 wells to partition cells plus beads can accommodate a shorter CBC than a system that uses 1,000,000 wells.
The CBC is what identifies each independent well, and thus each single cell; e.g., a CBC of 5 bases in length (4*4*4*4*4*) provides 4^5 = 1,024 possible sequences, while a CBC of 10 bases in length provides 4^10 = 1,048,576 possible sequences. Practical constraints, such as next generation sequencing (NGS) read length and oligonucleotide synthesis failure rate, may dictate that a shorter barcode length is preferable; however, if greater diversity is required, a three-plus (3+) barcode system provides greater diversity in a limited sequence length, among other benefits.
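The relationship between CBC length and the number of wells can be computed directly. This sketch counts sequence space only; it does not account for the random assignment of barcodes in split-pool synthesis (two beads can receive the same barcode by chance) or for synthesis errors, so a practical design would usually include headroom. The function name is hypothetical.

```python
def min_barcode_length(n_partitions, choices_per_base=4):
    """Smallest number of barcode positions whose sequence space
    (choices_per_base ** length) is at least n_partitions."""
    length, capacity = 0, 1
    while capacity < n_partitions:
        length += 1
        capacity *= choices_per_base
    return length


for wells in (1_000, 1_000_000):
    k = min_barcode_length(wells)
    print(wells, k, 4 ** k)  # 1000 5 1024, then 1000000 10 1048576
```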
In particular, a unique feature of the 3+ barcode system is that the same sequences can be used for multiple purposes. For example, as described in Example 1 below, the first 4* series may be used as the CBC barcode (identified as 4*4*4*4*4*4*4*4*4*4*4*3*), the 13 N's plus V may be used as the UMI barcode, and the second 4*4*4*4*V series may be used as the third barcode (or as fourth and fifth barcodes, if the Vs are defined as single-base barcodes). Because the third barcode serves to better identify artefacts due to synthesis errors in the preceding CBC and UMI barcodes, and to delineate specific features of the well/cell location in a potentially well/cell-specific and individual oligo-specific fashion owing to increased discriminatory power, the third barcode can enhance both of the preceding barcodes.
EXAMPLES
EXAMPLE 1.
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4*4*4*4*4*4*3*TNNNNNNNNNNNNNVT4*4*4*4*VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'. This bead-barcode product comprises a bead or microparticle, a linker, and multiple molecular barcode components within the oligonucleotide molecule attached to the bead or microparticle. The barcode shown in this Example allows a lower total diversity to be tolerated in the barcode while still achieving similar or improved downstream analysis; e.g., artefacts due to synthesis errors of the preceding CBC and UMI barcodes are better identified, and there is more discriminatory power to identify specific wells/cells and specific captured nucleic acids (e.g., RNA transcripts). In the exemplary, representative barcoded bead described above, the component sequences are identified as follows:
Bead-Linker-
Priming sequence: TTTTTTTA[A or R]GCAGTGGTATCAACGCAGAGTAC
Barcode one (cell barcode (CBC)): 4*4*4*4*4*4*4*4*4*4*4*3*
Barcode two (UMI barcode): NNNNNNNNNNNNNV
Barcode three: 4*4*4*4*V
Repeated oligonucleotide terminus that binds mRNA polyA: TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
These benefits are further described in the Examples below. Such features are provided through several properties or mechanisms that are intrinsic to the 3+ barcodes described herein and can be enhanced or fine-tuned for different uses by virtue of the different placements and complexity of the barcodes in the oligonucleotide molecule attached to the bead. In the priming sequence component above, "R" indicates an A or a G nucleobase, in accordance with the IUPAC conventional nucleotide code.
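The component layout above can be used to split a read into its barcodes. The sketch below assumes a read that starts immediately after the priming sequence; the segment names are hypothetical, and checking only the two invariant T bases catches many, but not all, length shifts (a shifted read still passes whenever the base that moves into a spacer position happens to be a T).

```python
# Hypothetical segment layout for the Example 1 oligo, read 5'->3' starting
# right after the priming sequence: (name, length, invariant base or None).
LAYOUT = [
    ("cbc", 12, None),    # 4* x 11 + 3*
    ("spacer1", 1, "T"),
    ("umi", 14, None),    # N x 13 + V
    ("spacer2", 1, "T"),
    ("third", 5, None),   # 4* x 4 + V
]


def split_read(read):
    """Slice a trimmed read into barcode segments.

    Returns None when a segment is truncated or an invariant spacer base is
    missing, which is one simple signal of an upstream insertion/deletion
    (e.g. an N-1 synthesis error).
    """
    parts, pos = {}, 0
    for name, length, invariant in LAYOUT:
        segment = read[pos:pos + length]
        if len(segment) < length or (invariant and segment != invariant):
            return None
        parts[name] = segment
        pos += length
    parts["polyT"] = read[pos:]
    return parts


read = "ACGTACGTACGA" + "T" + "ACGTACGTACGTAC" + "T" + "GATCA" + "T" * 30
print(split_read(read)["third"])  # GATCA
print(split_read(read[1:]))       # None: first CBC base deleted
```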
An advantage of the third barcode as described herein is that it can be used to "extend" the number of nucleotides in, or the overall length of, the CBC barcode, as well as the number of nucleotides in, or the overall length of, the UMI barcode (albeit with slightly more complicated analysis of the UMI portion). Thus, the overall length of the CBC barcode and/or the UMI barcode can be shortened while the combination still provides more total discriminatory power, even when an additional barcode is included. For example, given the sequence described in Example 1, barcode three can be considered in conjunction with the CBC barcode and in conjunction with the UMI barcode. The CBC barcode in isolation allows for (4^11) × 3 possible sequence combinations, or 12,582,912 discrete barcodes. By considering the third barcode in conjunction with the CBC, there are ((4^11) × 3) × ((4^4) × 3), or about 9.66 billion (9,663,676,416), discrete barcode sequences possible. This allows for more total discriminatory power than is available from a two-barcode system that utilizes only a CBC and a UMI, because more independent wells/cells, and molecules arising from those wells/cells, can be uniquely identified. The third barcode (plus any additional barcodes) thus extends the effective length of both preceding barcodes. Use of the 3+ barcode system provides more discriminatory power (more potential discrete barcode sequences) in fewer total bases, compared with a two-barcode system.
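The diversity figures above follow from multiplying the number of choices at each position. The sketch below uses a hypothetical `diversity` helper that counts n* as n choices and IUPAC letters by the number of bases they denote, treating V as 3 choices in the same way as the calculation above.

```python
import re
from math import prod

IUPAC_CHOICES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "S": 2,
                 "W": 2, "K": 2, "M": 2, "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}
TOKEN = re.compile(r"[1-4]\*|[ACGTRYSWKMBDHVN]\*?")


def diversity(template):
    """Maximum number of distinct sequences a barcode template can encode."""
    return prod(int(tok[0]) if tok[0].isdigit() else IUPAC_CHOICES[tok[0]]
                for tok in TOKEN.findall(template))


cbc = "4*" * 11 + "3*"   # Example 1 cell barcode
third = "4*" * 4 + "V"   # Example 1 third barcode
print(diversity(cbc))                     # 12582912
print(diversity(cbc) * diversity(third))  # 9663676416
```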
EXAMPLE 2
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4*4*4*NNNNNNNN4*4*4*4*VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
In the bead or microparticle linked to the barcode-containing oligo as described in this Example, the first barcode (CBC barcode) is the first series of 4* bases, the second barcode (UMI barcode) is the series of N bases, and the third barcode is the second series of 4* bases. In this Example, the linker comprises a non-cleavable linker composed of polyethylene glycol, such as hexa(ethylene glycol). In the barcode sequence of this Example, the CBC barcode comprises 8 bases and the UMI barcode comprises 8 bases (now known to be inadequate for most analyses), but the third barcode adds another 4 barcode bases.
In general, a CBC barcode or a UMI barcode comprising 8 nucleotide bases would not be adequate for most single cell sequencing analyses, because a barcode having only 8 bases does not provide sufficient diversity (4^8 = 65,536 possible sequences). It is likely that a greater number of individual wells/cells (in the case of the CBC) and a greater number of transcripts (in the case of the UMI) would need to be discriminated in a typical single cell experiment. Thus, the barcodes need to be longer (contain more nucleobases per barcode). Total length is limited by the constraints of oligo synthesis and sequencing read length, i.e., the shorter the bead-bound sequence, the better in terms of manufacturability and sequencing read length. However, after experimentation and sequencing using the above bead-barcode product, individual cells could be distinguished by using both "4*" barcodes contained therein, which essentially function as a single barcode. Thus, the effective CBC barcode length is 12 bases. In addition, by combining the 8-base UMI barcode and the third barcode, 4*4*4*4*, the multiple barcodes in the oligonucleotide attached to the linker ultimately provide a more effective UMI barcode, which functions to distinguish unique sequence capture events from duplicates that arise from PCR amplification and other downstream processes. By way of example, nucleotide sequences that are identical at all 12 bases are identified as duplicates and can be collapsed to a single capture event.
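Collapsing duplicates using the combined identifier can be sketched as follows. The tuple layout (cbc, umi, third, gene), the gene labels and the function name are hypothetical, and a real pipeline would usually also tolerate sequencing errors in the UMI rather than requiring exact identity.

```python
from collections import Counter


def molecules_per_cell(reads):
    """Count unique capture events per (CBC, third barcode) combination.

    Each read is a hypothetical (cbc, umi, third, gene) tuple. Reads that are
    identical in every field are duplicates of one capture event (e.g. PCR
    copies) and are collapsed; the 8-base UMI plus the 4-base third barcode
    therefore act together as a 12-base identifier.
    """
    unique_events = set(reads)
    return dict(Counter((cbc, third) for cbc, _umi, third, _gene in unique_events))


reads = [
    ("ACGTACGT", "AAACCCGG", "GATC", "GAPDH"),
    ("ACGTACGT", "AAACCCGG", "GATC", "GAPDH"),  # PCR duplicate of the read above
    ("ACGTACGT", "AAACCCGG", "TTAG", "GAPDH"),  # same CBC and UMI, different third barcode
    ("ACGTACGT", "TTTGGGCC", "GATC", "ACTB"),
]
print(molecules_per_cell(reads))
```

Here the duplicate GAPDH read collapses into one event (two events for CBC ACGTACGT with third barcode GATC), while the read that shares the 8-base CBC and UMI but carries a different third barcode is kept as a separate event instead of being silently merged.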
Those sequences that are identical, or those sets of sequences that are unique from other sets, comprise the data used in downstream analysis. As noted above, an 8-base UMI barcode corresponds to about 65,000 potential sequences but, owing to imperfect sequence balancing and other factors, the effective number is lower. The inclusion of the third barcode increases the effective diversity and provides new analysis potential, such as the ability to better distinguish randomly identical UMI barcodes attached to two different beads from potential artefacts in which a UMI barcode attached to one bead becomes associated with the CBC barcode of another bead. Notably, for certain analyses, the location of the bases can be ignored, thus allowing the first and the third barcodes to function effectively as if they were a single barcode. However, the emergent properties that occur through the selective placement and use of the barcodes as described herein are not achieved by a two-barcode arrangement.
Moreover, for many applications, the CBC barcodes may be positioned between or among UMI barcodes, as described in Example 3.
EXAMPLE 3
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4*4*4*NNNNNNN4*4*4*4*NNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
This Example demonstrates that a larger number of interspersions of the different types of barcodes can be included in the oligonucleotide portion containing the barcode sequences, including some defined or partially defined sequences, for example, using invariant bases as markers, or semi- or partially-variable bases rather than invariant bases.
EXAMPLE 4
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*RR4*4*4*3*TNNRRNN4*4*4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
Invariant bases and semi- or partially-variable bases serve as extremely useful bioinformatic markers. However, sequencing instruments and the software that assesses the data often read invariant bases (low base diversity at a position) as errors, which results in warnings and can even terminate sequencing runs, thus resulting in lost data. To avoid these problems, it is typically necessary either to include other library material that is sufficiently variable to avoid these errors (which increases costs and/or difficulty in setting up the sequencing) or to avoid the use of invariant bases. Consequently, invariant bases are usually not used. Therefore, it is particularly useful to use a limited combinatorial approach in a slightly different way. For example, in the polynucleotide sequence shown in this Example, some or all of the "R" nucleobases interspersed within the other barcodes could be "2*"s. By way of example, the above sequence in this Example is provided in two different configurations, presented in Examples 5 and 6, respectively, below. Illustratively, in the sequences shown in Examples 5 and 6, R and 2* could be used singly or together, depending on the specific need, and are not intended to be limiting, as other semi- or partially-variable bases (e.g., other IUPAC mixed bases) or combinatorial bases (e.g., 1-4*) could also be used.
EXAMPLE 5
As described above in Example 4, a bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*R2*4*4*4*3*TNN2*RNN4*4*4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
EXAMPLE 6
As described above in Example 4, a bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*2*2*4*4*4*3*TNN2*2*NN4*4*4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
In this Example, one particular configuration of the 2* bases would be analogous to that of R, such that a collection of beads is provided, some of which contain only A at one of the 2* positions and others of which contain only G at the same position. This allows for identification of potentially problematic sequences, for example, those arising from errors during oligonucleotide synthesis, such as failure of a base to incorporate into the growing oligonucleotide together with a capping failure, which collectively result in an oligonucleotide sequence that is missing a base (referred to as an "N-1" failure). In the configuration of this Example as outlined above, the presence of a base other than A or G at the specified 2* positions would indicate a problematic sequence. The configuration presented in this Example can also both eliminate unwanted sequences that are likely to have arisen by artefactual processes and recognize and retain those sequences that contain errors but may nonetheless be informative, and, by extension, allow the confidence to be quantified. This works by having two invariant bases (or variable bases without the full complement of all 4 bases) that, by virtue of being known and linked, can better discern the different errors. The bases may be the same or different and may include multiple permutations. Illustratively, two representative permutations were designed and produced, based on the specific 2* configuration (analogous to R).
The above sequence in this Example was produced by making two sets of beads and subsequently mixing them together, resulting in beads comprising the sequences shown in 6-1 and 6-2 below; substantially all of the beads have either the sequence of 6-1 or the sequence of 6-2, but not both.
6-1: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*AA4*4*4*3*TNNAANN4*4*4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
6-2: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*GG4*4*4*3*TNNGGNN4*4*4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
The beneficial result of this approach is that, collectively across the set of beads, there can be a diversity of sequences that avoids sequencing flags and other problems, while still providing stable markers that offer multiple benefits to the analyses. An additional benefit is that the sets of unique barcodes become smaller, and thus any errors that do arise will not result in adverse effects. More importantly, the use of such sequences allows for easier identification of in vitro recombinants or crossing events because of the different markers within the larger set.
For example, if a sequence comprises AA at the first marker positions and GG at the last marker positions, then there is a problem with that sequence, because neither an AA...GG nor a GG...AA combination exists within the bead-bound oligo sequences. Typically, such sequences are simply removed from analysis. Using known sequences as markers in this way can assist in the identification of such problem sequences, and, in some cases, those sequences can potentially be "recovered" and assigned correctly.
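The marker check can be expressed as membership in the set of marker combinations that were actually synthesized. In the sketch below, the read is assumed to be trimmed so that position 0 is the first base after the priming sequence of the Example 6 oligo; the slice positions, names and example reads are hypothetical.

```python
# Hypothetical layout: reads trimmed so that position 0 is the first base
# after the priming sequence of Example 6, i.e.
# 4*4*4*4*2*2*4*4*4*3*TNN2*2*NN4*4*4*4*3*TNNNNV
# The two linked 2*2* marker sets therefore occupy positions 4-5 and 13-14.
MARKER_SLICES = (slice(4, 6), slice(13, 15))

# Marker combinations actually synthesized (bead sets 6-1 and 6-2).
EXPECTED = {("AA", "AA"), ("GG", "GG")}


def markers(read):
    """Extract the marker bases from a trimmed read."""
    return tuple(read[s] for s in MARKER_SLICES)


def is_consistent(read, expected=EXPECTED):
    """False when the markers form a combination that no bead carries
    (e.g. AA ... GG), which flags a likely recombinant or synthesis error."""
    return markers(read) in expected


good = "ACGT" + "AA" + "CGT" + "A" + "T" + "GC" + "AA" + "TG" + "ACGT" + "C" + "T" + "GATC" + "A"
chimera = good[:13] + "GG" + good[15:]  # 3' part taken from a GG-set bead
print(is_consistent(good), is_consistent(chimera))  # True False
```

Because the check compares against an explicit set of expected combinations, it also covers designs in which the markers differ from one another: with `expected={("AA", "GG")}`, the all-AA read would be the inconsistent one.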
By way of illustration, a sequence similar to that shown in this Example is presented in Example 7 below, with the difference that the sequence comprises an additional set of 2*2* bases.
EXAMPLE 7
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced in the bead-linker-oligonucleotide as shown below: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*2*2*4*4*4*3*TNN2*2*NN4*4*4*4*2*2*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
The sequence presented in this Example is similar to that shown in Example 6, with the difference that the above sequence in this Example comprises an additional set of 2*2* bases.
Illustratively, if the sets of 2* bases should be AA, AA, and AA or GG, GG, and GG, then sequences having the sets of 2* bases AA, GG, GG can be eliminated as artefactual, and the point of error can be identified as occurring between the first and second sets of 2* bases. Thus, all of the bases from the second set of 2* bases onward can be used. By way of example, in an analysis, all of the bases from the second set of 2* bases onward would connect via the polyT or other capture sequence to the target of interest. Thus, the first barcode (comprising 4*4*4*4*), the second barcode (comprising 2*2*), and the additional barcodes preceding the second 2* set (4*4*4*3*TNN) might be discarded from consideration in matching and other analyses, but the remaining barcodes (e.g., 2*2*NN4*4*4*4*2*2*3*TNNNNV) could still be used with high confidence, thus allowing those sequences to be compared to their respective barcodes or to full-length sequences and still be utilized in the analysis.
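Locating the break point and keeping the downstream barcodes can be sketched as follows. The sketch assumes reads are trimmed to the Example 7 barcode region and that each bead set carries AA at every marker or GG at every marker; it walks from the 3' marker (closest to the capture sequence) toward the 5' end. Names and offsets are hypothetical.

```python
# Hypothetical Example 7 layout (priming sequence trimmed):
# 4*4*4*4*2*2*4*4*4*3*TNN2*2*NN4*4*4*4*2*2*3*TNNNNV
# The three linked 2*2* marker sets start at read positions 4, 13 and 21.
MARKER_STARTS = [4, 13, 21]
VALID = {"AA", "GG"}  # every marker set on a bead is AA, or every one is GG


def trusted_start(read):
    """Return the read offset from which the barcodes can still be used.

    The trusted region extends from the 3' marker back to the earliest marker
    that agrees with all markers downstream of it. Returns None if even the
    3' marker is not a valid AA/GG pair.
    """
    sets = [read[i:i + 2] for i in MARKER_STARTS]
    if sets[-1] not in VALID:
        return None
    start = MARKER_STARTS[-1]
    for i in range(len(sets) - 2, -1, -1):
        if sets[i] != sets[-1]:
            break  # break point lies between marker i and marker i + 1
        start = MARKER_STARTS[i]
    # If every marker agrees, the whole read (position 0 onward) is usable.
    return 0 if start == MARKER_STARTS[0] else start


read = "ACGT" + "AA" + "CGT" + "A" + "T" + "GC" + "GG" + "TG" + "ACGT" + "GG" + "C" + "T" + "GATC" + "A"
print(trusted_start(read))  # 13: markers AA, GG, GG
```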
If, for example, only one bead in the analysis perfectly matches the numbered barcode bases (i.e., the n* positions in 2*2*NN4*4*4*4*2*2*3*TNNNNV), then the bead can be reasonably identified, and the remaining UMI bases can be effectively used as well. Where the remaining bases do not unambiguously identify the bead, they will restrict the possible beads to a subset of the total beads, and this may also be sufficient for many analyses. Specifically, among other advantages, such a method avoids or reduces the number of times that sequences are erroneously attributed to the incorrect first barcode (which, in a two-barcode system, may be the entire CBC barcode) and the number of times that chimeric sequences result either in eliminating the data or in mistakenly assigning the data to a unique barcode, which, by extension, may mean mistakenly assigning the data to a unique cell.
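Matching the trusted part of a read against known bead barcodes might look like the following. The whitelist format (the expected barcode region with "N" at oligo-specific positions), the bead IDs and the offsets are hypothetical; with `start=13`, only the region from the second marker onward (Example 7 layout) is compared. More than one returned ID corresponds to the subset case described above.

```python
def matching_beads(read, whitelist, start=0):
    """Return bead IDs whose bead-constant bases agree with the read from `start` on.

    `whitelist` maps a hypothetical bead ID to its expected barcode region,
    written with 'N' at positions that differ from oligo to oligo (N and V
    bases), so only the numbered (n*) and invariant positions are compared.
    """
    hits = []
    for bead_id, expected in whitelist.items():
        if len(read) < len(expected):
            continue
        if all(e == "N" or e == r for e, r in zip(expected[start:], read[start:])):
            hits.append(bead_id)
    return hits


whitelist = {
    "bead_A": "ACGT" + "AA" + "CGT" + "A" + "T" + "NN" + "AA" + "NN" + "GATC" + "AA" + "C" + "T" + "NNNNN",
    "bead_B": "TTGC" + "GG" + "ACA" + "G" + "T" + "NN" + "GG" + "NN" + "GATC" + "GG" + "C" + "T" + "NNNNN",
    "bead_C": "CCAA" + "AA" + "TTT" + "C" + "T" + "NN" + "AA" + "NN" + "CCGG" + "AA" + "A" + "T" + "NNNNN",
}
# Chimera: 5' part resembles bead_B, 3' part (from position 13) comes from bead_A.
read = "TTGC" + "GG" + "ACA" + "G" + "T" + "CA" + "AA" + "TG" + "GATC" + "AA" + "C" + "T" + "ACGTA"
print(matching_beads(read, whitelist))            # []
print(matching_beads(read, whitelist, start=13))  # ['bead_A']
```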
While the linkage/matching approach has been illustratively described herein using bases that are all the same (identical bases), the approach may also use bases that are different. If, in a data set, the expected sequence begins with AA and has GG as the later marker, then a sequence with AA at the start and AA at the end is the incorrect one. The bases are not required to be fully defined as invariant bases. If R (mixed A and G) and/or other mixed bases are used, the approach can still distinguish correct from incorrect bases, although it may be less effective than using invariant bases. What is needed is the ability to establish the expected linkage between the bases used as markers, and thus to identify deviations from that linkage. Also, even a single barcode that consists of expected bases can be valuable for use in the approach.
Matching (or failure to match) can occur within a single barcode. In practice, however, a single barcode of known composition will not provide sufficient diversity for most experiments on its own unless it is quite a long sequence. By using combinations of barcodes, sufficient diversity is added: one barcode may provide most of the diversity, and the second barcode may provide some additional diversity as well as the benefit of known composition. With only a single barcode of known composition, it may be difficult to do certain linkage analyses, but there is still the benefit of identifying some incorrect sequences. It is not necessary to use more than one invariant/partially-variable base in sequence (as illustrated above). In practice, however, adding two such bases provides a clearer signal, and adding 3 or more such bases provides a still clearer signal, although there are diminishing returns from the standpoint of simply increasing confidence in the sequence calls, even if there can be additional benefits.
In this Example, the sequence provided comprises three sets of markers (viz., the three sets of "2*2*" nucleobases present in the sequence) in addition to the other barcodes. The first and last sets do not necessarily need to be compared; linkage (or lack thereof) between any sets of markers can be useful in this way. In practice, there is utility in having the markers within, between, and among the other barcodes, but they can optionally be positioned elsewhere in the sequence.
EXAMPLE 8
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced as the bead-linker-oligonucleotide products shown below:
Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(4*4*)4*4*4*3*TNN(4*4*)NN4*4*4*4*(4*4*)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
8-1: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(AA)4*4*4*3*TNN(AA)NN4*4*4*4*(AA)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
8-2: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(TT)4*4*4*3*TNN(TT)NN4*4*4*4*(TT)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
8-3: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(GG)4*4*4*3*TNN(GG)NN4*4*4*4*(GG)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
8-4: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(CC)4*4*4*3*TNN(CC)NN4*4*4*4*(CC)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
8-5: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*(4*4*)4*4*4*4*3*TNNNNNNNNNNNNNVT4*4*4*4*VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
Variations of the bead-linker-oligonucleotide products that have manufacturing benefits are illustrated in this Example. The first bead-linker-oligonucleotide sequence presented in this Example is essentially a variation of that presented in Example 7, and the bead-linker-oligonucleotide sequence denoted 8-5 above is essentially a variation of that presented in Example 1, but with the dual 2* bases replaced with a two-base set of linked 4* bases. The sequences further include, illustratively, repeats of 2 identical bases that are the same as those in the 2-base repeat and are consistent within each of the sequences 8-1 to 8-5. In embodiments, the bases (base repeats) may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bases, or longer. The bases only need to be known, and not necessarily identical.
For the purposes of example, 2 identical bases may be used for illustration and in actual analysis. The bases can occur multiple times in different barcodes or, optionally, within a single CBC.
The described method is particularly advantageous when using the split-pool technique. Typically, in split-pool synthesis, the beads are split into separate columns (typically 4) or other partitions, and only one base is added per column, such that all of the beads in column 1 receive a single A, all those in column 2 receive a T, etc. The beads are then pooled together and mixed before being redistributed among the columns. By adding 2 or more bases rather than 1 base at defined positions (from 1 to all possible positions within the barcode), an internal marker is created. Because the two positions are fully linked (even if they are not the same nucleotide base, "nucleobase"), they add nothing to the total diversity of the barcodes. Lengthening a barcode without a benefit to barcode diversity would not normally be done; however, in the context of the benefits of 3+ barcodes described herein and the associated analysis derived therefrom, increasing barcode length in the manner described herein can be extremely valuable. With the addition of only a single extra base to be sequenced, the approach described herein provides a particularly useful additional barcode (two additional bases in the example shown). Of note, each round of split-pool synthesis is very costly and labor-intensive. The addition of an extra base requires essentially no additional labor and adds very little overall cost, yet it can provide many advantages for the subsequent sequence and data analyses. In fact, because the approach provides markers that allow better analysis overall, the total diversity of barcodes required is reduced, because sequencing and other errors are less likely to convert one barcode into another (analogous to Hamming distance in information theory and string metrics). Thus, in practice, for many experiments, the length of the CBC barcode may remain the same or decrease by less than one base for each of the dinucleotide markers.
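The effect of doublet rounds can be illustrated with a small Python enumeration. This is a sketch under a simplifying assumption of our own: a doublet round adds two copies of the column's base (the text notes the two linked bases need not be identical). It counts the barcodes each split-pool design can produce and the minimum Hamming distance between any two of them; the function names are hypothetical.

```python
from itertools import combinations, product

COLUMNS = "ATGC"  # column 1 adds A, column 2 adds T, and so on


def split_pool_barcodes(bases_per_round):
    """All barcodes a split-pool design can produce.

    bases_per_round[i] is how many copies of the column's base round i adds:
    1 for an ordinary round, 2 for a round that adds a linked doublet.
    """
    return ["".join(base * n for base, n in zip(choice, bases_per_round))
            for choice in product(COLUMNS, repeat=len(bases_per_round))]


def min_hamming(codes):
    """Smallest number of mismatched positions between any two barcodes."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(codes, 2))


for design in ([1, 1, 1, 1], [1, 1, 2], [2, 2]):
    codes = split_pool_barcodes(design)
    print(design, "length", len(codes[0]), "barcodes", len(codes),
          "min distance", min_hamming(codes))
```

All three designs give 4-base barcodes. The [1, 1, 2] design has the same 64 barcodes as three single-base rounds, just one base longer, so the doublet adds length but no diversity. Two doublet rounds give only 16 barcodes, but any two of them differ in at least 2 positions, and a doublet whose halves disagree (for example AG) flags an error directly.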
As with the above Examples, the expected linkage and the marker effects provide additional analysis power. In the case of this specific Example (and other arrangements of the two base markers), one of the primary advantages is that the manufacture is particularly simple.
Specifically, sequencing problems due to invariant bases are avoided; two (or more) bases are provided with which to identify the marker or markers with greater confidence; only certain sets of barcodes can be deemed useful, because only certain sets of doublets are expected; and, if more than one set of doublets is used as illustrated herein, the doublets are also useful for the more distant linkage analyses. It is also useful if a set of doublets is inserted only once within an otherwise standard CBC barcode. Illustratively, a simple form of this arrangement is shown in the 8-5 sequence above, which comprises only a single 2-base repeat, as adapted from the sequence shown in Example 1.
In this Example, sufficient diversity exists due to the inclusion of multiple barcodes; thus, the barcode length is the same as that shown in Example 1. This results in less diversity overall, but better options for data analysis. If, in the case of this Example, (4*4*) = AA, TT, GG, or CC is used, then all of the previously mentioned advantages are gained, other than the ability to use the linkage between di-base repeats as the specific linkage analysis method. Nonetheless, those advantages are achieved in a way that is efficient and cost-effective to manufacture.
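For the 8-1 to 8-4 products, every read should carry the same doublet (AA, TT, GG, or CC) at all three marker positions. The following Python sketch checks this, assuming the barcode region of a read (between the primer binding site and the poly-T tract) has already been extracted and that the three (4*4*) groups of the generic Example 8 product are the marker positions. All names are hypothetical, and reading 3* as A, C, or G is an assumption.

```python
import re

# Regex fragment for each symbol of the exemplar. Treating 3* as A, C or G is an
# assumption: claim 23 allows several three-base sets for 3*.
SYMBOLS = {"4*": "[ACGT]", "3*": "[ACG]", "N": "[ACGT]", "V": "[ACG]",
           "A": "A", "C": "C", "G": "G", "T": "T", "(": "(", ")": ")"}

# Barcode region of the first (generic) Example 8 product, without primer site or poly-T.
EXEMPLAR = "4*4*4*4*(4*4*)4*4*4*3*TNN(4*4*)NN4*4*4*4*(4*4*)3*TNNNNV"
DOUBLETS = {"AA", "TT", "GG", "CC"}


def exemplar_to_regex(exemplar):
    """Translate the exemplar into a regex; each linked (4*4*) pair becomes a capture group."""
    tokens = re.findall(r"\d\*|[ACGTNV()]", exemplar)
    return re.compile("".join(SYMBOLS[token] for token in tokens))


PATTERN = exemplar_to_regex(EXEMPLAR)


def classify_read(barcode_region):
    """'good' if the three doublet markers agree and are expected, otherwise 'artefact'."""
    match = PATTERN.fullmatch(barcode_region)
    if match is None:
        return "no_match"
    markers = set(match.groups())
    return "good" if len(markers) == 1 and markers <= DOUBLETS else "artefact"


read = "GATC" "AA" "CGT" "A" "T" "GC" "AA" "TG" "CCAT" "AA" "G" "T" "ACGT" "C"
print(classify_read(read))                          # good
print(classify_read(read[:21] + "GG" + read[23:]))  # artefact: third marker differs
```

Because the check only compares markers with each other, reads from all four pools can be analyzed together without knowing in advance which pool a bead came from.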
EXAMPLE 9
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced as shown in the bead-linker products below:
Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4"4 zrzt*-NNNNNNN4"44*4* 4*NNINNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
9-1: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4'4"4*4*4*4*4*-4*Ni NiNNI\IR*R*11 R*NtNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
9-2: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC44414*4*4 4*NNNNNNNY*Y*Y *1/*NNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
This Example illustrates a specific, yet non-limiting, case in which the semi- or partially-variable barcodes used as markers also serve other barcode purposes. Thus, a barcode length of greater than 2 consecutive bases is warranted.
The R* series and Y* series in 9-1 and 9-2, respectively, serve as barcodes, but also serve as markers of a particular bead pool and of a particular location in the bead-bound oligonucleotide. Although the total diversity of the third barcode in a mixture of the 9-1 and 9-2 bead-linker-sequence molecules is much lower than the full diversity that four 4* bases would provide, the diversity achieved is sufficient to provide reasonable discriminatory power in sequence analysis. The third barcode also serves as a marker that unambiguously distinguishes the pool of 9-1 beads and linked sequences from the pool of 9-2 beads and linked sequences, and distinguishes both pools from the various error sequences. Since the 9-1 molecules comprise only A or G at all 4 positions of barcode 3, and the 9-2 molecules comprise only C or T at all 4 positions, any barcode 3 sequence that contains C or T mixed with A or G must result from one of the types of error described herein.
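The pool-assignment rule for barcode 3 can be sketched in Python as follows, assuming barcode 3 has already been extracted from each read as a 4-base string. The function name is hypothetical; R* (A or G) and Y* (C or T) are as described for the 9-1 and 9-2 molecules.

```python
PURINES = set("AG")      # R* positions in the 9-1 molecules
PYRIMIDINES = set("CT")  # Y* positions in the 9-2 molecules


def assign_pool(barcode3: str) -> str:
    """Assign a 4-base barcode 3 to the 9-1 or 9-2 bead pool, or flag it as an error."""
    if len(barcode3) != 4:
        return "error"
    bases = set(barcode3)
    if bases <= PURINES:
        return "9-1"
    if bases <= PYRIMIDINES:
        return "9-2"
    return "error"  # a purine/pyrimidine mix cannot come from either pool


print(assign_pool("AGGA"))  # 9-1
print(assign_pool("TCCT"))  # 9-2
print(assign_pool("AGCA"))  # error
# Diversity of barcode 3 in the mixed pools versus four fully variable 4* positions:
print(2 * 2**4, "versus", 4**4)  # 32 versus 256
```

The last line makes the trade-off concrete: the mixed pools give 32 possible barcode 3 sequences instead of 256, in exchange for a built-in error and pool marker.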
EXAMPLE 10
A bead or microparticle comprising a linker oligonucleotide sequence attached to an oligonucleotide sequence comprising multiple (3+) barcode sequences as described herein was designed and produced as shown in the bead-linker products below:
Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4 NNNN4* 4*4* 4*4*V*INNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
10-1: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACB*B*B*B*B*B*B1V NNI\TNNNB*B LT132i TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
10-2: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACb*D*D*D*D*Dti* \INTN-; D*D*IT*D*V*TNNN1 rTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
10-3: Bead-Linker-5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACkrirti*H*1-MtilrTNICNNIN. WErtr, fl*H*EPH*V*INNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
This Example provides another illustrative, yet non-limiting, case in which the semi- or partially-variable barcodes used as markers also serve other barcode purposes. Thus, a barcode length of greater than 2 consecutive bases is warranted.
This Example illustrates the generation of barcodes that are not predefined in exact sequence and that include overlapping sets of nucleobases; such barcodes provide a high degree of discriminatory power compared with commonly used, currently available barcodes. By "discriminatory power" is meant the ability of the barcoded oligo to distinguish/define a unique well/cell or transcript molecule (mRNA). The number of discrete sequences possible defines the discriminatory power of the barcode or combination of barcodes.
Specifically, barcode one in the above-depicted 10-1 and 10-2 comprises overlapping barcode sets. In the representative examples above, the first barcodes contained in the oligos are designated within a box, and the collective 12-base barcode is underlined. 10-1 comprises the bases C, G, and T, and 10-2 comprises the bases A, G, and T. Thus, barcodes comprising only the nucleobases G/T may be derived from either 10-1 or 10-2, but barcodes that contain any C can be assigned to 10-1, and those containing any A can be assigned to 10-2. In practice, the probability that two beads carry the same barcode should be low, especially when both of the B* or D* barcodes (12 bases in total, plus the two V's, which add further diversity) are considered; this probability becomes sufficiently low for downstream analysis. Notably, if both barcodes in molecule 10-1, or both barcodes in molecule 10-2, are used to establish barcode uniqueness, the advantage of multiple barcodes in identifying recombinations and other artefacts is not lost.
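The assignment logic for overlapping base sets can be sketched in Python. This assumes the first and second B*/D* barcodes have already been extracted from each read; the function names and the three-way labels are illustrative assumptions. The base sets follow claim 23 (B = C, G, or T; D = A, G, or T), and adding `ALPHABETS["10-3"] = set("ACT")` would extend the sketch to the H* product.

```python
# Base alphabets of barcode one (claim 23: B = C, G or T; D = A, G or T).
ALPHABETS = {"10-1": set("CGT"), "10-2": set("AGT")}


def candidates(barcode):
    """Products whose alphabet could have produced every base in the barcode."""
    return {name for name, allowed in ALPHABETS.items() if set(barcode) <= allowed}


def classify(first, second):
    """Compare the assignments of the first and second B*/D* barcodes of one read."""
    c1, c2 = candidates(first), candidates(second)
    if len(c1) == 1 and c1 == c2:
        return "good"       # both barcodes independently point to the same product
    if not c1 & c2:
        return "artefact"   # e.g. a recombinant joining 10-1 and 10-2 sequences
    return "ambiguous"      # a G/T-only barcode cannot be assigned on its own


print(classify("CGTCGT", "GGTCTT"))  # good
print(classify("CGTCGT", "AGTAGT"))  # artefact
print(classify("GGTTGT", "CGTCGT"))  # ambiguous
```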
The set of actual barcodes present in a particular experiment comprises barcodes that can be, with a defined probability cutoff, unambiguously assigned to either the 10-1 or the 10-2 barcode sequence by the first and second B* or D* barcodes. When the actual barcodes both properly match the same exemplar sequence, the sequence is considered "good," i.e., to have no errors in that particular oligo sequence (errors due to synthesis problems, or downstream workflow errors such as those introduced by PCR, recombination, sequencing machine errors, etc.). An exemplar sequence is the sequence that was intended to be present on every oligo on every bead. The actual set of barcodes sequenced reflects all synthesis and workflow errors (PCR errors, crossovers, sequencing errors, etc.). The actual set is also a set of defined sequences, rather than 4*, N, R, etc. For example, "4*TTV" is an exemplar sequence to which the following sets of actual barcodes all conform: Bead 1: ATTA, ATTG, ATTC; Bead 2: TTTA, TTTG, TTTC; and Bead 3: GTTA, GTTG, GTTC.
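Checking that a bead's actual barcodes conform to the "4*TTV" exemplar can be sketched in Python with a regular expression. This assumes, following claim 14, that a 4* position is a single base across all oligonucleotides on a bead, while V (A, C, or G) may vary between oligos; the function name is hypothetical.

```python
import re

# Exemplar 4*TTV: the 4* base is fixed per bead, while V (A, C or G) varies between oligos.
EXEMPLAR_4TTV = re.compile(r"([ACGT])TT([ACG])")


def bead_conforms(barcodes):
    """True if every barcode from one bead matches 4*TTV and all share the same 4* base."""
    matches = [EXEMPLAR_4TTV.fullmatch(bc) for bc in barcodes]
    if not matches or any(m is None for m in matches):
        return False
    return len({m.group(1) for m in matches}) == 1


print(bead_conforms(["ATTA", "ATTG", "ATTC"]))  # True (Bead 1 above)
print(bead_conforms(["ATTA", "TTTG"]))          # False: two different 4* bases on one bead
print(bead_conforms(["ATTT"]))                  # False: T is not allowed at V
```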
When the actual barcodes map to different exemplar sequences, they provide evidence of artefacts such as recombination. These known segments can be used to determine the rate of artefacts (which is itself a valuable confidence-level estimator). The rate can then be used to correct the probability of artefacts among those barcodes that cannot be independently assigned to one of the exemplary barcodes shown above. The 10-3 molecule shares bases A and T with 10-2, and shares C and T with 10-1; therefore, the sets again comprise overlapping and non-overlapping sets of barcodes. The same process can be applied across the full set of potential sequence sets, provided that there remains a set of sequences that can be unambiguously assigned. Accordingly, the represented sets cannot cover all possible combinations of bases, or the benefit of having sets is lost. Moreover, the total number of sets that can be covered depends on the complexity of the barcodes and the ability to distinguish separate sets (e.g., increasing the length and/or complexity of the barcodes enables more sets to be covered). Overall, this yields probabilities that allow reasonably high confidence in the uniqueness of the cell barcodes (allowing calculation of sufficient total barcode diversity for the number of samples in the experiment and the degree of certainty of uniqueness required), while keeping the total barcode length short and retaining the advantages of the 3+ barcode system.
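The rate correction described above amounts to simple arithmetic. The following Python sketch assumes reads have already been labelled "good", "artefact", or "ambiguous" (for example by comparing the first and second barcodes), and assumes ambiguous reads share the artefact rate of the assignable ones. The function names are hypothetical and the counts in the usage lines are invented for illustration.

```python
def artefact_rate(labels):
    """Fraction of independently assignable reads whose barcodes disagree."""
    good = labels.count("good")
    artefact = labels.count("artefact")
    return artefact / (good + artefact) if good + artefact else 0.0


def expected_hidden_artefacts(labels):
    """Expected artefacts among ambiguous reads, assuming they share the same rate."""
    return artefact_rate(labels) * labels.count("ambiguous")


# Invented counts: 95 good, 5 artefact, 40 ambiguous reads.
labels = ["good"] * 95 + ["artefact"] * 5 + ["ambiguous"] * 40
print(artefact_rate(labels))              # 0.05
print(expected_hidden_artefacts(labels))  # 2.0
```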
In another aspect, these exemplified barcode sequences, or other sets of three or more barcodes, can also be used to resolve another layer of complexity in an experiment. For example, if in the first stage of an RNA capture experiment the cell membranes, but not the nuclei, are lysed, and in a subsequent stage the nuclei are lysed in the presence of a second bead, a second set of capture sequences, or another mechanism, the different signals can be teased apart (separately identified) even if one or more of the barcodes is identical, as long as there is an additional barcoding level or parameter that is distinct (i.e., a third barcode, or additional barcodes beyond three, e.g., 4, 5, 6, 7, 8, 9, 10, etc. barcodes). For example, an exogenous sequence may be included in the well, in the droplet, or in another partition, such that mapping back to the same cell is readily achieved in the end if multiple beads are used. If the partition includes an exogenous cell barcode that is separate from the beads (for example, on a primer that binds to a sequence on the beads), the cell barcode is encoded by the partition space and thus is identical between the cell lysis and nuclei lysis portions (or other differentiation steps), but the multiple barcodes on the bead(s) provide another layer that allows differentiation between the respective steps.
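The two-level separation can be sketched in Python as a grouping step. The read layout used here, a tuple of (partition barcode, bead-level barcode, UMI), and the mapping from bead-level barcodes to processing steps are hypothetical; in practice they would come from the experimental design.

```python
from collections import defaultdict


def split_by_stage(reads, stage_of_bead_barcode):
    """Group UMIs by (partition barcode, processing step).

    reads: iterable of (partition_bc, bead_bc, umi) tuples (a hypothetical layout).
    stage_of_bead_barcode: maps a bead-level barcode to the step in which it was used.
    """
    groups = defaultdict(set)
    for partition_bc, bead_bc, umi in reads:
        stage = stage_of_bead_barcode.get(bead_bc, "unassigned")
        groups[(partition_bc, stage)].add(umi)
    return dict(groups)


reads = [
    ("ACGTACGT", "AGGA", "TTGCA"),
    ("ACGTACGT", "AGGA", "CATGA"),
    ("ACGTACGT", "CTTC", "GGACT"),
]
stages = {"AGGA": "cell lysis", "CTTC": "nuclear lysis"}
print(split_by_stage(reads, stages))
```

All three reads share the partition barcode, so they map to the same cell, while the bead-level barcode separates the cytoplasmic and nuclear captures.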
EXAMPLE 11
Method of preparing beads containing the described barcodes
A population of beads composed of hydroxylated methyl methacrylate with 1000 Å pore size and 30 µm average diameter is functionalized with a non-cleavable hexa(ethylene glycol) linker. Oligonucleotide synthesis, in the 5' to 3' direction, is carried out to build (attach) oligonucleotide sequences onto the non-cleavable linkers. Specifically, the following representative sequence, containing cell barcodes (shown in boxes) and a UMI comprising 8 N nucleobases, is constructed:
Bead-Linker-5'-TTTTTTTAAGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4*4*4*NNNNNNNN4*4*4*4*TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'
First, the invariant 32 bases are built using standard methods of oligonucleotide synthesis; this region comprises the primer binding site. Second, immediately adjacent to the invariant 32-base primer binding site, the first barcode, the cell barcode (CBC), is synthesized using 8 rounds of split-pool synthesis: the population of beads is split into 4 separate columns and only 1 base is added per column, such that all beads in column 1 receive a single A, all beads in column 2 receive a single T, all beads in column 3 receive a single G, and all beads in column 4 receive a single C. Following the addition of the single base, the beads from all 4 columns are pooled together and mixed before being distributed (split) among 4 columns again for the second and subsequent rounds of nucleotide addition. This process is repeated until 8 bases have been added, completing the cell barcode. Third, the UMI barcode is synthesized. This is accomplished by using all four nucleobases as an "N blend" in the synthesis reaction, instead of a particular base, such that T, A, G, or C is equally likely to be incorporated into each growing oligonucleotide during each round of nucleotide addition.
This is repeated for 8 cycles to complete the 8-base UMI. Fourth, a third barcode is constructed, again using the split-pool technique, over 4 rounds of nucleotide addition. Finally, 30 rounds of nucleotide addition are carried out using only 1 base (T) to generate a polynucleotide capture region.
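The synthesis steps above can be simulated in Python to show which parts of the oligonucleotide are shared by all oligos on a bead and which are unique to each oligo. This is a sketch with simplifying assumptions: each bead is assigned to a column at random in every split-pool round (real splits are roughly equal partitions), and the bead-level barcodes are drawn before the per-oligo UMIs, which does not change the outcome because the steps are independent. The function names are hypothetical.

```python
import random

PRIMER = "TTTTTTTAAGCAGTGGTATCAACGCAGAGTAC"  # invariant 32-base primer binding site
COLUMNS = "ATGC"  # column 1 adds A, column 2 T, column 3 G, column 4 C


def split_pool(n_beads, rounds, rng):
    """Bead-level barcodes after `rounds` rounds of split-pool synthesis."""
    barcodes = [""] * n_beads
    for _ in range(rounds):
        # Split: each bead goes to one of the four columns and receives that base;
        # pool: all beads are mixed again before the next round.
        barcodes = [bc + COLUMNS[rng.randrange(4)] for bc in barcodes]
    return barcodes


def synthesize(n_beads, oligos_per_bead, seed=0):
    """Simulate the Example 11 oligonucleotides for a handful of beads."""
    rng = random.Random(seed)
    cbcs = split_pool(n_beads, 8, rng)  # cell barcode: 8 split-pool rounds
    bc3s = split_pool(n_beads, 4, rng)  # third barcode: 4 split-pool rounds
    beads = []
    for cbc, bc3 in zip(cbcs, bc3s):
        oligos = []
        for _ in range(oligos_per_bead):
            # UMI: 8 cycles of the "N blend", so every oligo gets its own UMI.
            umi = "".join(rng.choice("ACGT") for _ in range(8))
            oligos.append(PRIMER + cbc + umi + bc3 + "T" * 30)  # 30-T capture region
        beads.append(oligos)
    return beads


for oligo in synthesize(n_beads=2, oligos_per_bead=2)[0]:
    print(oligo)
```

Each simulated oligo is 82 bases long; positions 33 to 40 (the CBC) and 49 to 52 (the third barcode) are identical across a bead, while positions 41 to 48 (the UMI) differ from oligo to oligo.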
References: Koshinsky, H. et al., 2015, Nucleic acid analysis by joining barcoded polynucleotide probes, US 2018/0258476 A1.
Kivioja, T., 2011, Counting absolute numbers of molecules using unique molecular identifiers, Nat Methods 9:72-74.
Macosko, E.Z. et al., 2015, Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets, Cell, 161(5):1202-1214.

Claims (27)

  1. What is claimed is: 1. A solid support comprising a linker and a plurality of nucleic acid or oligonucleotide sequences attached thereto, wherein each attached nucleic acid or oligonucleotide sequence comprises: a primer binding sequence; at least three nonidentical molecular barcode nucleic acid sequences, wherein at least one of the barcode sequences comprises a cell barcode (CBC) sequence comprising a plurality of contiguous invariant or partially-variable nucleobases, linked or unlinked; at least one of the barcode sequences comprises a unique molecular identifier (UMI) sequence comprising variable nucleobases, partially-variable nucleobases, invariant nucleobases, or a combination thereof; and a third or additional barcode comprises at least one of (i) a plurality of invariant nucleobases, partially-variable nucleobases, variable nucleobases, linked or unlinked, or a combination thereof; (ii) a plurality of invariant nucleobases comprising at least one terminal partially variable nucleobase; or (iii) a single partially-variable or invariant nucleobase; and a terminal polynucleotide or oligonucleotide sequence.
  2. 2. The solid support of claim 1, wherein the terminal polynucleotide or oligonucleotide sequence comprises identical repeating nucleobases to which a polyadenylated nucleotide sequence binds.
  3. 3. The solid support of claim 1 or claim 2, which comprises a bead or a particle.
  4. 4. The solid support of claim 1 or claim 2, wherein the bead or particle is a nanoparticle, a microparticle, or a macroparticle.
  5. 5. The solid support of claim 3 or claim 4, wherein the bead or particle comprises glass, polystyrene, silicon-based polymer, polymethylmethacrylate (PMMA), polydimethylsiloxane (PDMS), silica gel, polyethylene, or composite materials, optionally with a paramagnetic core or coating.
  6. 6. The solid support of any one of claims 3-5, wherein the bead or particle is monodisperse and spherical.
  7. 7. The solid support of any one of claims 3-6, wherein the bead or particle comprises an average diameter of 30 µm.
  8. 8. The solid support of any one of claims 1-7, wherein the linker is a cleavable linker or a non-cleavable linker.
  9. 9. The solid support of claim 8, wherein the linker is a cleavable linker selected from a photocleavable (photolabile) linker, disulfide linker, thermally cleavable linker, or chemically cleavable linker.
  10. 10. The solid support of claim 8, wherein the linker is a non-cleavable linker selected from a straight-chain polymer, a substituted hydrocarbon polymer, polyethylene glycol, or PEG-C3-C24 polyethylene glycol.
  11. 11. The solid support of any one of claims 1-10, wherein the at least three barcodes are spatially separated or contiguous within the attached nucleotide or oligonucleotide sequence.
  12. 12. The solid support of any one of claims 1-11, wherein the CBC barcode sequence comprises at least four invariant, partially variable, and/or variable nucleobases.
  13. 13. The solid support of claim 12, wherein the CBC barcode sequence comprises at least four variable nucleobases.
  14. 14. The solid support of claim 12 or claim 13, wherein the CBC barcode sequence comprises at least four variable nucleobases comprising 4*4*4*4*, wherein each 4* is a single nucleobase across all oligonucleotides attached to a given bead or particle, and wherein the single nucleobase is selected from A, C, T, or G.
  15. 15. The solid support of any one of claims 1-14, wherein the UMI barcode sequence comprises at least four variable nucleobases.
  16. 16. The solid support of claim 15, wherein the UMI barcode sequence comprises at least four variable nucleobases comprising NNNN, wherein each N is a single nucleobase of unfixed identity across all oligonucleotides on a given bead, and wherein the single nucleobase is selected from A, C, T, or G.
  17. 17. The solid support of claim 16, wherein the UMI barcode sequence terminates in a partially-variable nucleobase, V, which comprises one of A, C, or G.
  18. 18. The solid support of any one of claims 1-17, wherein the oligonucleotide sequences attached to the bead or microparticle comprise, in a 5' to 3' direction, a sequencing priming site, a CBC barcode sequence comprising 8-12 nucleobases, a UMI barcode sequence comprising 14 nucleobases, wherein the 14 nucleobases comprise 13 nucleobases, each of which is either A, T, C, or G, and a terminal variable (V) nucleobase, wherein V = A, C, or G, but not T, and a terminal polynucleotide or oligonucleotide capture sequence.
  19. 19. The solid support of any one of claims 1-18, wherein the terminal polynucleotide or oligonucleotide sequence or capture sequence comprises a poly-dT sequence.
  20. 20. The solid support of claim 19, wherein the poly-dT sequence comprises at least 30 T nucleobases.
  21. 21. A nucleic acid or polynucleotide sequence comprising at least three nonidentical molecular barcode nucleic acid sequences, wherein at least one of the barcode sequences comprises a cell barcode (CBC) sequence comprising a plurality of contiguous invariant or partially-variable nucleobases, linked or unlinked; at least one of the barcode sequences comprises a unique molecular identifier (UMI) sequence comprising variable nucleobases, partially-variable nucleobases, invariant nucleobases, or a combination thereof; and a third or additional barcode sequence comprises at least one of (i) a plurality of invariant nucleobases, partially-variable nucleobases, variable nucleobases, linked or unlinked, or a combination thereof; (ii) a plurality of invariant nucleobases comprising at least one terminal partially variable nucleobase; or (iii) a single partially-variable or invariant nucleobase; and a terminal polynucleotide or oligonucleotide capture sequence.
  22. 22. The nucleic acid or polynucleotide sequence of claim 21, wherein the terminal polynucleotide capture sequence comprises identical repeating nucleobases to which a polyadenylated nucleotide sequence binds.
  23. 23. The solid support of any one of claims - comprising an attached nucleic acid or oligonucleotide sequence, or the nucleic acid or polynucleotide sequence of claim 21 or 22, wherein the sequence is selected from: 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4"4*4*4*4 NNNNN NNNINNINNNVI4*4 4*VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACA*4*4*4*1*,r4 4*NNICNI\INNN4*4*4 *4*TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4*4 *NNICNNNN4*4"4* 4*NNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*RR4"< *3* TNNHANN4*4" 4*4*3"TNiNiNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*=1*4*R2W4"4"3*TNN2*RNN4*4 *4*4*3*INNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACA*,4*4*4*2*2*zr 4*3*TNN2*2*NN4 *4*4*4*3*TNINiNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4' 4*4*4*3*TNN, N4-'4* 4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*GG4*4*4*3*TNNGGNN4*4*4*4*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*2*2*4*4*4*3*TNN2*2*NN4*4*4*4*2*2*3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(4*4*)4*4*4*3*TNN(4*4*)NN4*4*4*4*(4*4*)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(AA)4*4*4*3*TNN(AA)NN4*4*4*4*(AA)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(TT)4*4*4*3*TNN(TT)NN4*4*4*4*(TT)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(GG)4*4*4*3*TNN(GG)NN4*4*4*4*(GG)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*(CC)4*4*4*3*TNN(CC)NN4*4*4*4*(CC)3*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*(4*4*)4*4*4*4*3*TNNNNNNNNNNNNNVT4*4*4*4*VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4s4*4*4 -r 4 *NM. 4*NNNN VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4 *4*4*4*4* 4*i NNNNNNNR*R*R*R*NNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4* 4*i NNNNNNNY*Y*'Y*Y*NNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTAC4*4*4*4*4*4*V" T NNNN4*4*4*4* 4*4*V*INNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACB*B*B*B*B*B*B*V*TNINNNNNWB *B 413* B *B *V* TNNNN VTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACD*D*D*D*D*D*V*TNNNNNN D's D*D'D*D*V*INNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; or 5'-TTTTTTTAA/RGCAGTGGTATCAACGCAGAGTACH*WH*WH*H*V*TINr; ma-N-3*I1* H*H*H*.H*V*TNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT-3'; wherein each of 2*, 3*, and 4* represents a single nucleobase at a given position in the sequence with differing numbers of possible bases (A, adenine; T, thymine; C, cytosine; G, guanine) at each position, wherein 2* is a single nucleobase of either A or T; A or C; A or G; T or C; T or G; or C or G; wherein 3* is a single nucleobase of either A, T, or C; A, C, or G; or T, C, or G; and wherein 4* is a single nucleobase of either A, C, T, or G; wherein N = A, C, T, or G; wherein R = A or G; wherein V = A, C, or G; wherein B = C or G or T; wherein D = A or G or T; wherein H = A or C or T; and wherein nucleobases set forth in parentheses, identical or nonidentical, are linked.
  24. 24. The solid support of any one of claims 1-20 or 23, or the nucleic acid or polynucleotide sequence of claims 21 or 22, wherein the at least three barcode sequences allow for identification of artefacts following amplification of cellular nucleic acids.
  25. 25. A composition comprising the solid support of any one of claims 1-20, 23, or 24, or the nucleic acid or polynucleotide sequence of claims 20 or 21.
  26. 26. A kit comprising a solid support of any one of claims 1-20, 23, or 24, or the nucleic acid or polynucleotide sequence of claims 20 or 21, and instructions for use thereof.
  27. 27. A method of analyzing nucleic acid sequences or sequence libraries obtained from a single cell, comprising lysing the cell to produce a single cell lysate; admixing the single cell lysate with the solid support of any one of claims 1-20, 23, or 24, wherein nucleic acids from the single cell lysate are captured on the nucleic acids or oligonucleotides attached to the bead or particle; amplifying the nucleic acid from the cell; and analyzing the amplified nucleic acid sequences or sequence libraries using bioinformatic methods.
GB2000471.9A 2019-02-13 2020-01-13 Molecular barcodes for single cell sequencing, compositions and methods thereof Withdrawn GB2582850A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201962804926P 2019-02-13 2019-02-13

Publications (2)

Publication Number Publication Date
GB202000471D0 GB202000471D0 (en) 2020-02-26
GB2582850A true GB2582850A (en) 2020-10-07

Family

ID=69626267

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2000471.9A Withdrawn GB2582850A (en) 2019-02-13 2020-01-13 Molecular barcodes for single cell sequencing, compositions and methods thereof

Country Status (1)

Country Link
GB (1) GB2582850A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022188827A1 (en) * 2021-03-10 2022-09-15 Nanjing University Chemical sample indexing for high-throughput single-cell analysis

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114958996B (en) * 2021-05-12 2022-12-20 浙江大学 Ultrahigh-throughput unicellular sequencing reagent combination

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2015200893A2 (en) * 2014-06-26 2015-12-30 10X Genomics, Inc. Methods of analyzing nucleic acids from individual cells or cell populations
WO2016040476A1 (en) 2014-09-09 2016-03-17 The Broad Institute, Inc. A droplet-based method and apparatus for composite single-cell nucleic acid analysis
WO2016138496A1 (en) * 2015-02-27 2016-09-01 Cellular Research, Inc. Spatially addressable molecular barcoding
US20180258476A1 (en) 2015-09-08 2018-09-13 Affymetrix, Inc. Nucleic acid analysis by joining barcoded polynucleotide probes

Non-Patent Citations (24)

Title
"The Cambridge Dictionary of Science and Technology", 1988
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 2001, WILEY INTERSCIENCE
AUSUBEL, CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, 1987
BENTON, DAVIS, SCIENCE, vol. 196, 1977, pages 180
COLIGAN, CURRENT PROTOCOLS IN IMMUNOLOGY, 1991
DANIEL ALPERN ET AL: "Time- and cost-efficient high-throughput transcriptomics enabled by Bulk RNA Barcoding and sequencing", BIORXIV, 8 January 2019 (2019-01-08), XP055719233, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/256594v2.full.pdf> [retrieved on 20200730], DOI: 10.1101/256594 *
DARIO ROMAGNOLI ET AL: "ddSeeker: a tool for processing Bio-Rad ddSEQ single cell RNA-seq data", BMC GENOMICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 19, no. 1, 24 December 2018 (2018-12-24), pages 1 - 7, XP021265869, DOI: 10.1186/S12864-018-5249-X *
FRESHNEY, ANIMAL CELL CULTURE, 1987
GAIT, OLIGONUCLEOTIDE SYNTHESIS, 1984
GRUNSTEIN, HOGNESS, PROC. NATL. ACAD. SCI. USA, vol. 72, 1975, pages 3961
HAQUE, A., GENOME MEDICINE, vol. 9, 2017, pages 75
HWANG, B., EXPERIMENTAL & MOLECULAR MEDICINE, vol. 50, 2018, pages 14
ISLAM S ET AL.: "Quantitative single-cell RNA-seq with unique molecular identifiers", NAT. METHODS, vol. 11, no. 2, 2014, pages 163 - 166, XP055614140, DOI: 10.1038/nmeth.2772
KIVIOJA T ET AL.: "Counting absolute numbers of molecules using unique molecular identifiers", NAT. METHODS., vol. 9, no. 1, 2012, pages 72 - 74, XP055201583, DOI: 10.1038/nmeth.1778
KIVIOJA, T. ET AL.: "Counting absolute numbers of molecules using unique molecular identifiers", NAT METHODS, vol. 9, 2011, pages 72 - 74, XP055401382, DOI: 10.1038/nmeth.1778
LIANG, J. ET AL., J GENETICS AND GENOMICS, vol. 41, no. 10, 2014, pages 513 - 528
MACOSKO, E.Z. ET AL.: "Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets", CELL, vol. 161, no. 5, 2015, pages 1202 - 1214, XP055586617, DOI: 10.1016/j.cell.2015.05.002
MILLER, CALOS, GENE TRANSFER VECTORS FOR MAMMALIAN CELLS, 1987
MULLIS, PCR: THE POLYMERASE CHAIN REACTION, 1994
NEEDLEMAN & WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443
OLSEN, T.K. & BARYAWNO, N., CURR PROTOC MOL BIOL, vol. 122, no. 1, 2018, pages e57
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
WAHL, G. M. & S. L. BERGER, METHODS ENZYMOL., vol. 152, 1987, pages 507
WEIR: "Handbook of Experimental Immunology", 1996, article "Methods in Enzymology"

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022188827A1 (en) * 2021-03-10 2022-09-15 Nanjing University Chemical sample indexing for high-throughput single-cell analysis

Also Published As

Publication number Publication date
GB202000471D0 (en) 2020-02-26

Similar Documents

Publication Publication Date Title
US20240327827A1 (en) Whole transcriptome analysis of single cells using random priming
US12071617B2 (en) Hybrid targeted and whole transcriptome amplification
US11649497B2 (en) Methods and compositions for quantitation of proteins and RNA
EP3914728B1 (en) Oligonucleotides associated with antibodies
EP4055160B1 (en) Using random priming to obtain full-length v(d)j information for immune repertoire sequencing
US11390914B2 (en) Methods and compositions for whole transcriptome amplification
JP2018538006A (en) Combination set of nucleic acid barcodes for analysis of nucleic acids associated with a single cell
EP3861134A1 (en) Determining 5' transcript sequences
WO2022026909A9 (en) Single cell assay for transposase-accessible chromatin
JP2019528059A (en) Method for de novo assembly of barcoded genomic DNA fragments
TW201321518A (en) Method of micro-scale nucleic acid library construction and application thereof
EP3574112B1 (en) Barcoded dna for long range sequencing
WO2011100617A2 (en) Nucleic acid, biomolecule and polymer identifier codes
US20200255888A1 (en) Determining expressions of transcript variants and polyadenylation sites
JP2021094038A (en) Isothermal methods for preparing nucleic acids and related compositions
US11371076B2 (en) Polymerase chain reaction normalization through primer titration
EP3728636B1 (en) Particles associated with oligonucleotides
GB2582850A (en) Molecular barcodes for single cell sequencing, compositions and methods thereof
CN114729349A (en) Method for detecting and sequencing barcode nucleic acid
JP7562424B2 (en) Sequencing primer oligonucleotides
CN111801428B (en) Method for obtaining single-cell mRNA sequence
US10465242B2 (en) Multi-sequence capture system
Alam et al. Microfluidics in Genomics
EP4305201A1 (en) Chemical sample indexing for high-throughput single-cell analysis

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)