WO2024256581A1 - Identification of modified cytosines - Google Patents
Identification of modified cytosines
- Publication number
- WO2024256581A1 (application PCT/EP2024/066447)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- strand
- polynucleotide
- hairpin
- sequence
- library
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- The invention relates to methods of distinguishing between different types of modified cytosines in nucleic acid sequences.
- Modified cytosines, including 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), are well-studied epigenetic modifications that play fundamental roles in human development and disease. Their genome-wide distribution differs between tissue types, and between healthy and diseased states. As a result, there has been an intense focus on developing methods for mapping modified cytosines at single-base resolution, with minimal loss of sample polynucleotide quantity, quality, and complexity.
- a method of preparing polynucleotide templates for distinguishing between modified cytosines comprising: (a) providing a polynucleotide library hairpin strand comprising: a double-stranded polynucleotide comprising a forward library strand and a reverse library strand, a hairpin loop adaptor ligated to an end of the double-stranded polynucleotide, wherein the hairpin loop adaptor comprises a cleavable site, wherein the polynucleotide library hairpin strand has been generated from a precursor polynucleotide library hairpin strand such that any CpG dyads in the precursor polynucleotide library hairpin comprising only unmodified cytosine are converted to a first dyad in the polynucleotide library hairpin strand, any CpG dyads in the precursor polynucleotide
- the method further comprises a step of: (c) synthesising at least one template complement strand by generating a complement of the template strand, each of the template complement strands comprising a forward complement template strand, a spacer complement strand, and a reverse complement template strand, wherein the spacer complement strand comprises a second cleavable site.
- the method further comprises a step of: (d) cleaving the first cleavable site on the at least one template strand to generate at least one first polynucleotide sequence each comprising a first portion and cleaving the second cleavable site on the at least one template complement strand to generate at least one second polynucleotide sequence each comprising a second portion, wherein the first portion corresponds with the forward template strand and the second portion corresponds with the reverse complement template strand, or wherein the first portion corresponds with the reverse template strand and the second portion corresponds with the forward complement template strand.
- the first portion is at least 25 base pairs and the second portion is at least 25 base pairs.
- the first cleavable site is a first restriction site for an endonuclease.
- the second cleavable site is a second restriction site for an endonuclease.
- the at least one first polynucleotide sequence each comprise a first sequencing primer binding site.
- the first sequencing primer binding site is located after a 3’-end of the first portion.
- the at least one second polynucleotide sequence each comprise a second sequencing primer binding site.
- the second sequencing primer binding site is located after a 3’-end of the second portion.
- a double C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; where CpG dyads comprising 5-methylcytosine were present in the precursor polynucleotide library hairpin, then a double mismatch is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; and where CpG dyads comprising 5-hydroxymethylcytosine were present in the precursor polynucleotide library hairpin, then a single mismatch and single C-C/G-G match is present when comparing corresponding positions in the at least one first poly
- a double mismatch is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; where CpG dyads comprising 5-methylcytosine were present in the precursor polynucleotide library hairpin, then a double C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; and where CpG dyads comprising 5-hydroxymethylcytosine were present in the precursor polynucleotide library hairpin, then a single mismatch and single C-C/G-G match is present when comparing corresponding positions in the at least one first poly
- the method further comprises a step of preparing the first portion and the second portion for concurrent sequencing.
- the method comprises simultaneously contacting first sequencing primer binding sites located after a 3’-end of the first portions with first primers and second sequencing primer binding sites located after a 3’-end of the second portions with second primers.
- the method further comprises a step of processing the at least one first polynucleotide sequence comprising a first portion and the at least one second polynucleotide sequence comprising a second portion, such that a proportion of first portions are capable of generating a first signal and a proportion of second portions are capable of generating a second signal.
- the processing involves selective processing to cause an intensity of the first signal to be greater than an intensity of the second signal.
- a concentration of the first portions capable of generating the first signal is greater than a concentration of the second portions capable of generating the second signal.
- a ratio between the concentration of the first portions capable of generating the first signal and the concentration of the second portions capable of generating the second signal is between 1.25:1 and 5:1. In one aspect, the ratio is between 1.5:1 and 3:1. In one aspect, the ratio is about 2:1.
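As a rough illustration of what such a ratio means for the two signals, the sketch below converts a ratio of the form r:1 into the fraction of total signal contributed by each portion. It assumes, for illustration only, that signal intensity scales linearly with the concentration of signal-capable portions; the function name is hypothetical.

```python
from __future__ import annotations


def signal_fractions(ratio: float) -> tuple[float, float]:
    """Return (first, second) signal fractions for a first:second ratio of ratio:1.

    Assumes intensity is proportional to concentration (illustrative only).
    """
    if not 1.25 <= ratio <= 5.0:
        raise ValueError("ratio outside the 1.25:1 to 5:1 range stated in the text")
    total = ratio + 1.0
    return ratio / total, 1.0 / total
```

Under this assumption, a ratio of about 2:1 gives roughly two thirds of the combined signal to the first portions and one third to the second portions.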
- selective processing comprises preparing for selective sequencing or conducting selective sequencing. In one aspect, selectively processing comprises conducting selective amplification.
- selectively processing comprises contacting first sequencing primer binding sites located after a 3’-end of the first portions with first primers and contacting second sequencing primer binding sites located after a 3’-end of the second portions with second primers, wherein the second primers comprise a mixture of blocked second primers and unblocked second primers.
- the blocked second primer comprises a blocking group at a 3’ end of the blocked second primer.
- the blocking group is selected from the group consisting of: a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer, a modification blocking the 3’-hydroxyl group, or an inverted nucleobase.
- the selective processing comprises selectively removing some or substantially all of second immobilised primers that are not yet extended, and conducting a further amplification cycle in order to selectively amplify the first polynucleotide sequence(s) relative to the second polynucleotide sequence(s).
- selectively processing comprises selectively blocking some or substantially all of second immobilised primers that are not yet extended using a primer blocking agent, wherein the primer blocking agent is configured to limit or prevent synthesis of a strand extending from the second immobilised primer, and conducting a further amplification cycle in order to selectively amplify the first polynucleotide sequence(s) relative to the second polynucleotide sequence(s).
- the primer blocking agent is added whilst first polynucleotide sequence(s) are hybridised to the second immobilised primers.
- the method comprises contacting some or substantially all of the second immobilised primers with an extended primer sequence, wherein the extended primer sequence is substantially complementary to the second immobilised primer and further comprises a 5’ additional nucleotide; and adding the primer blocking agent, wherein the primer blocking agent is complementary to the 5’ additional nucleotide.
- the primer blocking agent is a blocked nucleotide.
- the blocked nucleotide comprises a blocking group at a 3’ end of the blocked nucleotide.
- the blocking group is selected from the group consisting of: a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer, a modification blocking the 3’-hydroxyl group, or an inverted nucleobase.
- the blocked nucleotide is A or G.
- the first signal and the second signal are spatially resolved. In one aspect, the first signal and the second signal are spatially unresolved.
- the at least one first polynucleotide sequence comprising the first portion and the at least one second polynucleotide sequence comprising the second portion are attached to a solid support.
- the solid support is a flow cell.
- the at least one first polynucleotide sequence comprising the first portion and the at least one second polynucleotide sequence comprising the second portion forms a cluster on the solid support.
- the cluster is formed by bridge amplification.
- the at least one first polynucleotide sequence comprising the first portion and the at least one second polynucleotide sequence comprising the second portion form a duoclonal cluster.
- the solid support comprises at least one first immobilised primer and at least one second immobilised primer.
- the first immobilised primer comprises a sequence as defined in SEQ ID NO.1 or 5, or a variant or fragment thereof; and the second immobilised primer comprises a sequence as defined in SEQ ID NO.2, or a variant or fragment thereof.
- each first polynucleotide sequence is attached to a first immobilised primer, and wherein each second polynucleotide sequence is attached to a second immobilised primer.
- each first polynucleotide sequence comprises a second adaptor sequence and wherein each second polynucleotide sequence comprises a first adaptor sequence, wherein the second adaptor sequence is substantially complementary to the second immobilised primer and wherein the first adaptor sequence is substantially complementary to the first immobilised primer.
- a method of sequencing polynucleotide sequences to distinguish between modified cytosines comprising: preparing polynucleotide templates for distinguishing between modified cytosines using a method as described herein; sequencing nucleobases in the first portion and the second portion; and identifying the presence of 5-methylcytosine or 5-hydroxymethylcytosine by detecting differences when comparing a sequence output from the first portion with a sequence output from the second portion.
- the step of sequencing nucleobases in the first portion and the second portion involves concurrent sequencing of nucleobases in the first portion and the second portion.
- the step of sequencing nucleobases comprises performing sequencing-by-synthesis.
- the method further comprises a step of conducting paired-end reads.
- the step of concurrently sequencing nucleobases comprises: (a) obtaining first intensity data comprising a combined intensity of a first signal component obtained based upon a respective first nucleobase at the first portion and a second signal component obtained based upon a respective second nucleobase at the second portion, wherein the first and second signal components are obtained simultaneously; (b) obtaining second intensity data comprising a combined intensity of a third signal component obtained based upon the respective first nucleobase at the first portion and a fourth signal component obtained based upon the respective second nucleobase at the second portion, wherein the third and fourth signal components are obtained simultaneously; (c) selecting one of a plurality of classifications based on the first and the second intensity data, wherein each classification represents a possible combination of respective first and second nucleobases; and (d) based on the selected classification, base calling the respective first and second nucleobases.
- selecting the classification based on the first and second intensity data comprises selecting the classification based on the combined intensity of the first and second signal components and the combined intensity of the third and fourth signal components.
- the plurality of classifications comprises sixteen classifications, each classification representing one of sixteen unique combinations of first and second nucleobases.
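The sixteen-way classification can be sketched as a nearest-centroid decision over two combined intensities. The example below is an illustration under stated assumptions, not the base-calling procedure of the method: the two-channel dye encoding, the 2:1 weighting of the first and second signals, and all names are hypothetical.

```python
from itertools import product

# Hypothetical two-channel encoding (assumption): base -> (channel 1, channel 2).
ENCODING = {"A": (1, 1), "C": (1, 0), "T": (0, 1), "G": (0, 0)}

# Assumed 2:1 intensity ratio between first-portion and second-portion signals.
FIRST_WEIGHT, SECOND_WEIGHT = 2.0, 1.0


def combined_intensity(first_base, second_base):
    """Combined (channel 1, channel 2) intensity for one pair of bases."""
    f, s = ENCODING[first_base], ENCODING[second_base]
    return tuple(FIRST_WEIGHT * f[k] + SECOND_WEIGHT * s[k] for k in range(2))


# One centroid per classification: sixteen combinations of two nucleobases.
CENTROIDS = {pair: combined_intensity(*pair) for pair in product("ACGT", repeat=2)}


def call_bases(intensity_1, intensity_2):
    """Select the nearest classification and return (first base, second base)."""
    def squared_distance(pair):
        c1, c2 = CENTROIDS[pair]
        return (intensity_1 - c1) ** 2 + (intensity_2 - c2) ** 2

    return min(CENTROIDS, key=squared_distance)
```

With unequal weights, each channel takes one of four distinct combined levels (0, 1, 2 or 3), so the sixteen base pairs map to sixteen distinct points. With equal weights some pairs would coincide, which is why the unequal signal intensities matter in this sketch.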
- the first signal component, second signal component, third signal component and fourth signal component are generated based on light emissions associated with the respective nucleobase.
- the light emissions are detected by a sensor, wherein the sensor is configured to provide a single output based upon the first and second signals.
- the sensor comprises a single sensing element.
- a computer program product comprising instructions which, when the program is executed by a processor, cause the processor to carry out a method as described herein.
- a computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out a method as described herein.
- a computer-readable data carrier having stored thereon a computer program product as described herein.
- a data carrier signal carrying a computer program product as described herein.
- the hairpin loop adaptor connects a 3’-end of the precursor forward library strand with a 5’-end of the precursor reverse library strand; or wherein the hairpin loop adaptor connects a 3’-end of the precursor reverse library strand with a 5’-end of the precursor forward library strand.
- the cleavable site is a restriction site for an endonuclease.
- the method further comprises a step of: (c) removing the precursor reverse library strand from the first hairpin polynucleotide to generate a second hairpin polynucleotide comprising the precursor forward library strand and the hairpin loop adaptor, wherein the hairpin loop adaptor comprises the cleavable site.
- the method further comprises a step of: (e) exposing the third hairpin polynucleotide to an enzyme configured to convert hemimethylated 5-methylcytosine CpG dyads to fully methylated 5-methylcytosine CpG dyads, but not convert hemimethylated 5-hydroxymethylcytosine dyads, in order to generate a fourth hairpin polynucleotide.
- the enzyme configured to convert hemimethylated 5-methylcytosine CpG dyads to fully methylated 5-methylcytosine CpG dyads, but not convert hemimethylated 5-hydroxymethylcytosine dyads is a DNA methyltransferase.
- the enzyme configured to convert hemimethylated 5-methylcytosine CpG dyads to fully methylated 5-methylcytosine CpG dyads, but not convert hemimethylated 5-hydroxymethylcytosine dyads is a member of the DNA methyltransferase 1 (DNMT1) family or the DNA methyltransferase 5 (DNMT5) family.
- the method further comprises a step of: (f) exposing the fourth hairpin polynucleotide to a conversion agent configured to convert 5-methylcytosine and 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil, or to a conversion agent configured to convert an unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil, in order to generate a fifth hairpin polynucleotide.
- the conversion agent is configured to convert 5-methylcytosine and 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil.
- the conversion agent is configured to convert unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil.
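The combined effect of steps (e) and (f) on a single CpG dyad can be sketched as a small simulation. This is an illustrative model only. It assumes that the strand paired with the precursor forward library strand starts out unmodified, and that each converted cytosine produces one mismatched position when the two portions are later compared. The function names and agent labels are hypothetical.

```python
def maintain_methylation(top: str, bottom: str) -> tuple:
    """Step (e): complete hemimethylated 5mC dyads; leave 5hmC dyads unchanged."""
    if top == "5mC" and bottom == "C":
        bottom = "5mC"
    elif bottom == "5mC" and top == "C":
        top = "5mC"
    return top, bottom


def is_converted(state: str, agent: str) -> bool:
    """Step (f): does the conversion agent change a cytosine in this state?"""
    if agent == "modified":      # converts 5mC and 5hmC
        return state in ("5mC", "5hmC")
    if agent == "unmodified":    # converts unmodified C
        return state == "C"
    raise ValueError(f"unknown agent: {agent}")


def dyad_mismatches(original_mark: str, agent: str) -> int:
    """Mismatched positions expected at a dyad whose original strand carries original_mark."""
    # Assumption: the paired strand carries an unmodified cytosine before step (e).
    top, bottom = maintain_methylation(original_mark, "C")
    return sum(is_converted(state, agent) for state in (top, bottom))
```

In this model, 5-hydroxymethylcytosine gives one mismatch with either agent, because the enzyme of step (e) leaves the paired cytosine unmodified, so exactly one of the two cytosines is always converted.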
- the conversion agent comprises a chemical agent and/or an enzyme.
- the conversion agent comprises a boron-based reducing agent and a ten-eleven translocation (TET) methylcytosine dioxygenase.
- the boron-based reducing agent is an amine-borane compound or an azine-borane compound.
- the cytidine deaminase is a member of the APOBEC3A subfamily. In one aspect, the cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein. In one aspect, the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO. 16. In one aspect, the substitution mutation at the position functionally equivalent to Tyr130 comprises Ala, Val or Trp. In one aspect, the substitution mutation at the position functionally equivalent to Tyr132 comprises a mutation to His, Arg, Gln or Lys.
- the mutant cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO.51).
- the mutant cytidine deaminase is a member of the APOBEC3A subfamily and comprises a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO.52).
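Motifs written in this notation translate directly into regular expressions, which can be used to screen candidate protein sequences. The sketch below assumes the usual reading of the notation (X is any residue, a bracketed range is a repeat count, and slash-separated residues are alternatives); the constant and function names are hypothetical.

```python
import re

# X[m-n] becomes ".{m,n}"; alternatives such as [P/A/V] become character classes.
ZDD_MOTIF_51 = re.compile(r"H[PAV]E.{23,28}PC.{2,4}C")
ZDD_MOTIF_52 = re.compile(
    r"H.E.{24}SW[ST]PC.{2,4}C.{6}F.{8}L.{5}R[LI]Y.{8,11}L.{2}L.{10}M"
)


def has_motif(protein: str, motif: re.Pattern) -> bool:
    """Return True if the one-letter protein sequence contains the motif."""
    return motif.search(protein.upper()) is not None
```

A match only shows that the sequence pattern is present; it says nothing about whether the protein is a functional deaminase.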
- the mutant cytidine deaminase converts 5-methylcytosine to thymine by deamination at a greater rate than a conversion rate of cytosine to uracil by deamination. In one aspect, the rate is at least 100-fold greater.
- the conversion agent further comprises a glycosyltransferase. In one aspect, the glycosyltransferase is a β-glucosyltransferase.
- the method further comprises a step of ligating a flanking adaptor to an end of the double-stranded polynucleotide away from the hairpin loop adaptor to the third hairpin polynucleotide, the fourth hairpin polynucleotide or the fifth hairpin polynucleotide, wherein the flanking adaptor comprises a primer-binding sequence and a primer-binding complement sequence.
- the flanking adaptor is a forked adaptor comprising a base-paired stem, a first arm and a second arm.
- the primer-binding sequence is located on the first arm.
- the primer-binding complement sequence is located on the second arm.
- a polynucleotide library hairpin strand prepared according to a method of preparing a polynucleotide library hairpin strand as described herein.
- Figure 1 shows a forward strand, reverse strand, forward complement strand, and reverse complement strand of a polynucleotide molecule.
- Figure 2 shows the steps involved in a loop fork method.
- Figure 3 shows an example of a polynucleotide sequence prepared using a loop fork method.
- Figure 4 shows an example of a polynucleotide sequence prepared using a loop fork method.
- Figure 5 shows a typical solid support.
- Figure 6 shows the stages of bridge amplification for polynucleotide templates prepared using a loop fork method and the generation of an amplified cluster, comprising (Panel A) a concatenated library strand hybridising to an immobilised primer; (Panel B) generation of a template strand from the library strand; (Panel C) dehybridisation and washing away the library strand; (Panel D) generation of a template complement strand from the template strand via bridge amplification and dehybridisation of the sequence bridge; and (Panel E) further amplification to provide a plurality of template and template complement strands.
- Figure 7 shows the detection of nucleobases using 4-channel, 2-channel and 1-channel chemistry.
- Figure 8 shows a method of selective sequencing.
- Figure 9 shows a method of selective amplification comprising (Panel A) starting from a plurality of template and template complement strands; (Panel B) selective cleavage of one type of immobilised primer from the support; (Panel C) only template (or template complement) strands complementary to the free immobilised primer anneal and undergo bridge amplification, (Panel D) producing different proportions of template and template complement strands; (Panel E) subsequent standard (non-selective) sequencing occurs in different proportions enabling signal differentiation.
- Figure 10 shows a method of selective amplification comprising (Panel A) template and template complement strands annealing to immobilised primers; (Panel B) addition of a primer-blocking agent that binds only to one type of immobilised primer, preventing the extension from that one type of immobilised primer; (Panel C) producing different proportions of template and template complement strands; (Panel D) subsequent standard (non-selective) sequencing occurs in different proportions enabling signal differentiation.
- Figure 11 shows a method of selective amplification comprising (Panel A) flowing a (or a plurality of) extended primer sequence(s) containing at least one additional 5’ nucleotide across the surface of the solid support; (Panel B) addition of a primer-blocking agent that binds only to one type of immobilised primer and is complementary to the additional 5’ nucleotide of the extended primer sequence, preventing the extension from one type of immobilised primer.
- Figure 12 is a plot showing graphical representations of sixteen distributions of signals generated by polynucleotide sequences according to one embodiment.
- Figure 13 is a flow diagram showing a method for base calling according to one embodiment.
- Figure 14 shows a prior art method for detecting 5-hydroxymethylcytosine and 5-methylcytosine. This involves conducting a sequencing run to determine the presence of both 5-hydroxymethylcytosine and 5-methylcytosine (left), then another separate sequencing run to determine the presence of only 5-methylcytosine (right). The presence of 5-hydroxymethylcytosine is obtained by comparing the two separate runs.
- Figure 15 shows a prior art method (Füllgrabe et al.) for detecting 5-hydroxymethylcytosine and 5-methylcytosine using hairpin polynucleotides. In the prior art method, the hairpin loop adaptor does not comprise a cleavable site.
- Figure 16 shows an example workflow of preparing a polynucleotide library hairpin strand according to a method as described herein, then a subsequent example workflow for preparing polynucleotide templates for distinguishing between modified cytosines according to a method as described herein.
- 5-methylcytosine is represented as 5m in bubbles
- 5-hydroxymethylcytosine is represented as 5hm in bubbles
- U represents uracil, thymine or a nucleobase which is read as thymine/uracil
- X represents A or T (the exact nature of A or T is not material and is not shown for clarity – XX base pairs are AT/TA base pairs which remain unchanged during the preparation process).
- Figure 17 shows another example workflow of preparing a polynucleotide library hairpin strand according to a method as described herein, then a subsequent example workflow for preparing polynucleotide templates for distinguishing between modified cytosines according to a method as described herein.
- 5-methylcytosine is represented as 5m in bubbles
- 5-hydroxymethylcytosine is represented as 5hm in bubbles
- U represents uracil, thymine or a nucleobase which is read as thymine/uracil
- X represents A or T (the exact nature of A or T is not material and is not shown for clarity – XX base pairs are AT/TA base pairs which remain unchanged during the preparation process).
- Figure 18 shows a further example workflow conducted according to the method shown in Figure 17.
- All patents, patent applications, and other publications referred to herein, including all sequences disclosed within these references, are expressly incorporated herein by reference, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference.
- All documents cited are, in relevant part, incorporated herein by reference in their entireties for the purposes indicated by the context of their citation herein. However, the citation of any document is not to be construed as an admission that it is prior art with respect to the present disclosure.
- the present invention can be used in sequencing, in particular concurrent sequencing.
- variant refers to a variant polypeptide sequence or part of the polypeptide sequence that retains desired function of the full non-variant sequence.
- a desired function of the immobilised primer is the ability to bind (i.e. hybridise) to a target sequence.
- a “variant” has at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to the non-variant nucleic acid sequence.
- sequence identity of a variant can be determined using any number of sequence alignment programs known in the art.
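For two sequences that have already been aligned, overall sequence identity reduces to a simple count. The sketch below shows one common convention (identical columns divided by alignment columns, with "-" as the gap character). Alignment programs differ in how they treat gaps, so this is an assumption for illustration rather than the definition used in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```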
- fragment refers to a functionally active series of consecutive nucleic acids from a longer nucleic acid sequence.
- the fragment may be at least 99%, at least 95%, at least 90%, at least 80%, at least 70%, at least 60%, at least 50%, at least 40% or at least 30% the length of the longer nucleic acid sequence.
- a fragment as used herein may also retain the ability to bind (i.e. hybridise) to a target sequence.
- Sequencing generally comprises four fundamental steps: 1) library preparation to form a plurality of target polynucleotides for identification; 2) cluster generation to form an array of amplified template polynucleotides; 3) sequencing the cluster array of amplified template polynucleotides; and 4) data analysis to identify characteristics of the target polynucleotides from the amplified template polynucleotide sequences.
- the polynucleotide sequence 100 comprises a forward strand of the sequence 101 and a reverse strand of the sequence 102.
- when the polynucleotide sequence 100 is replicated (e.g. using a DNA/RNA polymerase), complementary versions of the forward strand 101 of the sequence 100 and the reverse strand 102 of the sequence 100 are generated.
- replication of the polynucleotide sequence 100 provides a double-stranded polynucleotide sequence 100a that comprises a forward strand of the sequence 101 and a forward complement strand of the sequence 101’, and a double-stranded polynucleotide sequence 100b that comprises a reverse strand of the sequence 102 and a reverse complement strand of the sequence 102’.
- the term “template” may be used to describe a complementary version of the double-stranded polynucleotide sequence 100. As such, the “template” comprises a forward complement strand of the sequence 101’ and a reverse complement strand of the sequence 102’.
- a sequencing process (e.g. a sequencing-by-synthesis or a sequencing-by-ligation process) reproduces information that was present in the original forward strand of the sequence 101.
- a sequencing process (e.g. a sequencing-by-synthesis or a sequencing-by-ligation process) reproduces information that was present in the original reverse strand of the sequence 102.
- the two strands in the template may also be referred to as a forward strand of the template 101’ and a reverse strand of the template 102’.
- the complement of the forward strand of the template 101’ is termed the forward complement strand of the template 101.
- the complement of the reverse strand of the template 102’ is termed the reverse complement strand of the template 102.
| Language for original polynucleotide sequence 100 | Corresponding language for the “template” |
| --- | --- |
| Forward strand of the sequence 101 | Forward complement strand of the template 101 (sometimes referred to herein as forward complement strand 101) |
| Reverse strand of the sequence 102 | Reverse complement strand of the template 102 (sometimes referred to herein as reverse complement strand 102) |
| Forward complement strand of the sequence 101’ | Forward strand of the template 101’ (sometimes referred to herein as forward strand 101’) |
| Reverse complement strand of the sequence 102’ | Reverse strand of the template 102’ (sometimes referred to herein as reverse strand 102’) |
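The relationship between a strand and its complement strand can be made concrete with a short helper. In this sketch both strands are written 5’ to 3’, so the complement strand is the reverse complement of the original. The function name is hypothetical and only the four standard DNA bases are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return strand.translate(_COMPLEMENT)[::-1]


# Example: a forward strand and its forward complement strand.
forward_strand = "ATGC"
forward_complement_strand = reverse_complement(forward_strand)
```

Applying the function twice returns the original strand, which mirrors the way the strand and template terminologies map onto each other.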
- Library preparation is the first step in any high-throughput sequencing platform.
- nucleic acid sequences (for example a genomic DNA sample, or a cDNA or RNA sample) are converted into a sequencing library which can then be sequenced.
- the first step in library preparation is random fragmentation of the DNA sample.
- Sample DNA is first fragmented and the fragments of a specific size (typically 200–500 bp, but can be larger) are ligated, sub-cloned or “inserted” in-between two oligo adaptors (adaptor sequences).
- the original sample DNA fragments are referred to as “inserts”.
- the target polynucleotides may advantageously also be size-fractionated prior to modification with the adaptor sequences.
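Size fractionation can be mimicked in silico by filtering fragments on length. The sketch below uses the typical 200 to 500 bp window mentioned above as default bounds; the function name and the inclusive treatment of the bounds are assumptions for illustration.

```python
def size_select(fragments, min_bp=200, max_bp=500):
    """Keep fragments whose length lies within [min_bp, max_bp] (inclusive)."""
    return [fragment for fragment in fragments if min_bp <= len(fragment) <= max_bp]
```

Larger windows can be selected by passing different bounds, since the fragment size is not limited to this range.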
- the templates to be generated from the libraries may include separate polynucleotide sequences, in particular a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion. Generating these templates from particular libraries may be performed according to methods known to persons of skill in the art. However, some example approaches of preparing libraries suitable for generation of such templates are described below. In some embodiments, the library may be prepared using a loop fork method, which is described below.
- This procedure may be used, for example, for preparing templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, wherein the first portion is a forward strand of the template, and the second portion is a reverse complement strand of the template (or alternatively, wherein the first portion is a reverse strand of the template, and the second portion is a forward complement strand of the template).
- a representative process for conducting a loop fork method is shown in Figure 2. Starting from a double-stranded polynucleotide sequence comprising a forward strand of the sequence and a reverse strand of the sequence, adaptors may be ligated to a first end of the sequence (e.g.
- a second end of the sequence (different from the first end) may be ligated to a loop, which connects the forward strand of the sequence and the reverse strand of the sequence, thus generating a loop fork ligated polynucleotide sequence.
- additional steps e.g.
- templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, wherein the first portion is a forward strand of the template, and the second portion is a reverse complement strand of the template (or alternatively, wherein the first portion is a reverse strand of the template, and the second portion is a forward complement strand of the template).
- the processes described above in relation to loop fork methods generate libraries that have self-tandem insert polynucleotides.
- one strand of a polynucleotide within a polynucleotide library may comprise, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 (e.g. P7), an optional first terminal sequencing primer binding site complement 303’, a first insert sequence 401 (A and B), a loop sequence 403 (L) (also referred to herein as a hairpin loop adaptor), a second insert sequence 402 (B’ and A’), an optional second terminal sequencing primer binding site 304, and a first primer-binding sequence 301’ (e.g. P5’) (Figures 3 and 4).
- one or more sequencing primer binding sites may be provided within the loop sequence 403 (L) (or hairpin loop adaptor).
- the strand may further comprise one or more index sequences.
- a first index sequence (e.g. i7) may be located between the second primer-binding complement sequence 302 (e.g. P7) and the optional first terminal sequencing primer binding site complement 303’, and a second index complement sequence (e.g. i5’) may be located between the optional second terminal sequencing primer binding site 304 and the first primer-binding sequence 301’ (e.g. P5’).
- one strand of a polynucleotide within a polynucleotide library may comprise, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 (e.g. P7), a first index sequence (e.g. i7), an optional first terminal sequencing primer binding site complement 303’, a first insert sequence 401 (A and B), a loop sequence 403 (L) (or hairpin loop adaptor), a second insert sequence 402 (B’ and A’), an optional second terminal sequencing primer binding site 304, a second index complement sequence (e.g. i5’), and a first primer-binding sequence 301’ (e.g. P5’).
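The 5’ to 3’ layout described above can be sketched as a simple concatenation of segments. Everything in this example is illustrative: the segment sequences are short placeholders and not the real P5, P7, index or sequencing primer sequences, the dictionary keys and function names are hypothetical, and the second insert is assumed to be the exact reverse complement of the first (that is, before any conversion step alters the bases).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Placeholder segments only (hypothetical, NOT real adaptor sequences).
PLACEHOLDERS = {
    "302_P7": "AAAA",      # second primer-binding complement sequence
    "i7": "CCCC",          # first index sequence
    "303c": "GGAA",        # first terminal sequencing primer binding site complement
    "403_loop": "TTTTTT",  # loop sequence (hairpin loop adaptor)
    "304": "CCGG",         # second terminal sequencing primer binding site
    "i5c": "GGGG",         # second index complement sequence
    "301c_P5c": "TTAA",    # first primer-binding sequence
}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_library_strand(insert_401: str, parts=PLACEHOLDERS) -> str:
    """Concatenate the segments in the 5' to 3' order given in the text."""
    insert_402 = reverse_complement(insert_401)  # B' and A' (assumed exact complement)
    segments = [
        parts["302_P7"], parts["i7"], parts["303c"],
        insert_401, parts["403_loop"], insert_402,
        parts["304"], parts["i5c"], parts["301c_P5c"],
    ]
    return "".join(segments)
```

Because the two inserts flank the loop and are complementary to each other, the strand can fold back on itself into a hairpin.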
- one or more index sequences may be provided within the loop sequence 403 (L) (or hairpin loop adaptor).
- the loop sequence 403 (L) (or hairpin loop adaptor) is shown in Figures 3 and 4 as being ligated on the right hand side of the double-stranded fragment, and the P5’/P7 adaptor being ligated on the left hand side of the double-stranded fragment – in other words, the loop sequence 403 (L) (or hairpin loop adaptor) connects a 3’-end of the forward library strand with a 5’-end of the reverse library strand.
- the positions of the hairpin loop adaptor and the P5’/P7 adaptor may be reversed, such that the loop sequence 403 (L) (or hairpin loop adaptor) is ligated on the left hand side of the double-stranded fragment, and the P5’/P7 adaptor being ligated on the right hand side of the double-stranded fragment – in other words, the loop sequence 403 (L) (or hairpin loop adaptor) connects a 3’-end of the reverse library strand with a 5’-end of the forward library strand.
- the first insert sequence 401 may comprise a forward strand of the sequence 101, and the second insert complement sequence 402’ may comprise a reverse complement strand of the sequence 102’ (or the first insert sequence 401 may comprise a reverse strand of the sequence 102, and the second insert complement sequence 402’ may comprise a forward complement strand of the sequence 101’), for example where the library is prepared using a loop fork method.
- Although Figure 3 shows a first terminal sequencing primer binding site complement 303’ and a second terminal sequencing primer binding site 304 (as well as their complements, a first terminal sequencing primer binding site 303 and a second terminal sequencing primer binding site complement 304’), these are optional as mentioned above. Accordingly, these sections may be omitted from the library.
- a double-stranded nucleic acid will typically be formed from two complementary polynucleotide strands comprised of deoxyribonucleotides or ribonucleotides joined by phosphodiester bonds, but may additionally include one or more ribonucleotides and/or non-nucleotide chemical moieties and/or non-naturally occurring nucleotides and/or non-naturally occurring backbone linkages.
- the double-stranded nucleic acid may include non-nucleotide chemical moieties, e.g. linkers or spacers, at the 5' end of one or both strands.
- the double-stranded nucleic acid may include methylated nucleotides, uracil bases, phosphorothioate groups, peptide conjugates etc.
- Such non-DNA or non-natural modifications may be included in order to confer some desirable property to the nucleic acid, for example to enable covalent, non-covalent or metal-coordination attachment to a solid support, or to act as spacers to position the site of cleavage an optimal distance from the solid support.
- a single stranded nucleic acid consists of one such polynucleotide strand.
- a sequence comprising at least a primer-binding sequence (e.g. a primer-binding sequence and a sequencing primer binding site, or a combination of a primer-binding sequence, an index sequence and a sequencing primer binding site) may be referred to herein as an adaptor sequence, and an insert is flanked by a 5’ adaptor sequence and a 3’ adaptor sequence.
- the primer-binding sequence may also comprise a sequencing primer for the index read.
- an “adaptor” refers to a sequence that comprises a short sequence-specific oligonucleotide that is ligated to the 5′ and 3′ ends of each DNA (or RNA) fragment in a sequencing library as part of library preparation.
- the adaptor sequence may further comprise non-peptide linkers.
- the P5’ and P7’ primer-binding sequences are complementary to short primer sequences (or lawn primers) present on the surface of a flow cell. Binding of P5’ and P7’ to their complements (P5 and P7) on – for example – the surface of the flow cell, permits nucleic acid amplification.
- the prime symbol (’) denotes the complementary strand.
- the primer-binding sequences in the adaptor which permit hybridisation to amplification primers will typically be around 20-40 nucleotides in length, although the invention is not limited to sequences of this length.
- the precise identity of the amplification primers (e.g. lawn primers), and hence the cognate sequences in the adaptors, are generally not material to the invention, as long as the primer-binding sequences are able to interact with the amplification primers in order to direct PCR amplification.
- Sequencing reads from pooled libraries are identified and sorted computationally, based on their barcodes, before final data analysis.
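The computational sorting step can be pictured with a minimal sketch. The barcode strings, sample names and read records below are invented for illustration, and exact-match lookup is an assumption; practical demultiplexers usually tolerate a small number of barcode mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group (barcode, insert_sequence) read records by sample name.

    Reads whose barcode is not listed for any sample are collected
    under the key 'undetermined'.
    """
    bins = defaultdict(list)
    for barcode, insert in reads:
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)

# Hypothetical 6-base barcodes and reads, for illustration only.
sample_sheet = {"ACGTAC": "sample_1", "TTGCAA": "sample_2"}
reads = [
    ("ACGTAC", "GATTACA"),
    ("TTGCAA", "CCGGTT"),
    ("ACGTAC", "TTAACC"),
    ("GGGGGG", "ACACAC"),
]
sorted_reads = demultiplex(reads, sample_sheet)
```

Each sample's reads can then be analysed separately, while unassigned reads are kept apart for inspection.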
- Library multiplexing is also a useful technique when working with small genomes or targeting genomic regions of interest. Multiplexing with barcodes can exponentially increase the number of samples analysed in a single run, without drastically increasing run cost or run time. Examples of tag sequences are found in WO05/068656, whose contents are incorporated herein by reference in their entirety. The tag can be read at the end of the first read, or equally at the end of the second read, for example using a sequencing primer complementary to the strand marked P7.
- the invention is not limited by the number of reads per cluster, for example two reads per cluster: three or more reads per cluster are obtainable simply by dehybridising a first extended sequencing primer, and rehybridising a second primer before or after a cluster repopulation/strand resynthesis step. Methods of preparing suitable samples for indexing are described in, for example WO 2008/093098, which is incorporated herein by reference.
- Single or dual indexing may also be used. With single indexing, up to 48 unique 6-base indexes can be used to generate up to 48 uniquely tagged libraries. With dual indexing, up to 24 unique 8-base Index 1 sequences and up to 16 unique 8-base Index 2 sequences can be used in combination to generate up to 384 uniquely tagged libraries.
- Pairs of indexes can also be used such that every i5 index and every i7 index are used only one time. With these unique dual indexes, it is possible to identify and filter indexed hopped reads, providing even higher confidence in multiplexed samples.
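The index arithmetic above can be checked with a short sketch. The index names are placeholders rather than real index sequences, and the hopping check is a simplified illustration of the filtering idea, assuming each library carries exactly one expected i7/i5 pair.

```python
from itertools import product

def combinatorial_dual_indexes(index1_set, index2_set):
    """Every Index 1 sequence combined with every Index 2 sequence."""
    return list(product(index1_set, index2_set))

def unique_dual_indexes(i7_set, i5_set):
    """Pair indexes one-to-one so each i7 and each i5 is used only once."""
    return set(zip(i7_set, i5_set))

def is_index_hopped(observed_pair, expected_pairs):
    """A read whose index pair was never assigned to a library is flagged."""
    return observed_pair not in expected_pairs

# Placeholder names standing in for 8-base index sequences.
index1 = [f"i7_{n:02d}" for n in range(1, 25)]  # 24 Index 1 sequences
index2 = [f"i5_{n:02d}" for n in range(1, 17)]  # 16 Index 2 sequences

combos = combinatorial_dual_indexes(index1, index2)   # 24 x 16 pairs
udi_pairs = unique_dual_indexes(index1[:16], index2)  # 16 one-to-one pairs
```

With unique dual indexes, a read carrying the i7 of one library and the i5 of another does not match any expected pair, which is what allows it to be filtered out.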
- the sequencing primer binding sites are sequencing and/or index primer binding sites and indicate the starting point of the sequencing read.
- a sequencing primer anneals (i.e. hybridises) to at least a portion of the sequencing primer binding site on the template strand.
- the polymerase enzyme binds to this site and incorporates complementary nucleotides base by base into the growing opposite strand.
- Where a double-stranded nucleic acid library is formed, the library has typically been subjected to denaturing conditions beforehand to provide single-stranded nucleic acids. Suitable denaturing conditions will be apparent to the skilled reader with reference to standard molecular biology protocols (Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, 4th Ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Current Protocols, eds Ausubel et al). In one embodiment, chemical denaturation may be used. Following denaturation, a single-stranded library may be contacted in free solution with a solid support comprising surface capture moieties (for example P5 and P7 lawn primers).
- methods of the present invention may be performed on a solid support 200, such as a flowcell.
- seeding and clustering can be conducted off-flowcell using other types of solid support.
- the solid support 200 may comprise a substrate 204. See Figure 5.
- the substrate 204 comprises at least one well 203 (e.g. a nanowell), and typically comprises a plurality of wells 203 (e.g. a plurality of nanowells).
- the solid support comprises at least one first immobilised primer and at least one second immobilised primer.
- each well 203 may comprise at least one first immobilised primer 201, and typically may comprise a plurality of first immobilised primers 201.
- each well 203 may comprise at least one second immobilised primer 202, and typically may comprise a plurality of second immobilised primers 202.
- each well 203 may comprise at least one first immobilised primer 201 and at least one second immobilised primer 202, and typically may comprise a plurality of first immobilised primers 201 and a plurality of second immobilised primers 202.
- the first immobilised primer 201 may be attached via a 5’-end of its polynucleotide chain to the solid support 200. When extension occurs from first immobilised primer 201, the extension may be in a direction away from the solid support 200.
- the second immobilised primer 202 may be attached via a 5’-end of its polynucleotide chain to the solid support 200. When extension occurs from second immobilised primer 202, the extension may be in a direction away from the solid support 200.
- the first immobilised primer 201 may be different to the second immobilised primer 202 and/or a complement of the second immobilised primer 202.
- the second immobilised primer 202 may be different to the first immobilised primer 201 and/or a complement of the first immobilised primer 201.
- the (or each of the) first immobilised primer(s) 201 may comprise a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof.
- the second immobilised primer(s) 202 may comprise a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof. Whilst first immobilised primer(s) 201 are shown here to correspond to P5 and second immobilised primer(s) 202 are shown here to correspond to P7, the definitions of these may be swapped – in other words, first immobilised primer(s) 201 may correspond instead to P7, and second immobilised primer(s) 202 may correspond to P5. In some embodiments, the first immobilised primer(s) 201 and the second immobilised primer(s) 202 within a well 203 may be spatially separated from each other.
- the first immobilised primer(s) 201 may occupy a first region
- the second immobilised primer(s) 202 may occupy a second region, wherein the first region and the second region do not overlap with each other.
- any signals generated e.g. a first signal and a second signal as referred to herein
- the first immobilised primer(s) 201 and the second immobilised primer(s) 202 within a well 203 may not be spatially separated from each other.
- the first immobilised primer(s) 201 may occupy a first region
- the second immobilised primer(s) 202 may occupy a second region, wherein the first region and the second region may correspond to the same region or may be substantially overlapping.
- any signals generated e.g. a first signal and a second signal as referred to herein
- the solid support may be contacted with the template to be amplified under conditions which permit hybridisation (or annealing – such terms may be used interchangeably) between the template and the immobilised primers.
- Both strands of the amplification products will be immobilised on the solid support at or near the 5' end, this attachment being derived from the original attachment of the amplification primers.
- the amplification products within each colony will be derived from amplification of a single template molecule.
- Other amplification procedures may be used, and will be known to the skilled person.
- amplification may be isothermal amplification using a strand displacement polymerase; or may be exclusion amplification as described in WO 2013/188582. Further information on amplification can be found in WO 02/06456 and WO 07/107710, the contents of which are incorporated herein in their entirety by reference.
- a cluster of template molecules comprising copies of a template strand and copies of the complement of the template strand.
- the steps of cluster generation and amplification for templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion are illustrated below and in Figure 6.
- each first polynucleotide sequence may be attached (via the 5’-end of the first polynucleotide sequence) to a first immobilised primer, and wherein each second polynucleotide sequence is attached (via the 5’-end of the second polynucleotide sequence) to a second immobilised primer.
- Each first polynucleotide sequence may comprise a second adaptor sequence, wherein the second adaptor sequence comprises a portion which is substantially complementary to the second immobilised primer (or is substantially complementary to the second immobilised primer).
- the second adaptor sequence may be at a 3’-end of the first polynucleotide sequence.
- Each second polynucleotide sequence may comprise a first adaptor sequence, wherein the first adaptor sequence comprises a portion which is substantially complementary to the first immobilised primer (or is substantially complementary to the first immobilised primer).
- the first adaptor sequence may be at a 3’-end of the second polynucleotide sequence.
- a solution comprising a polynucleotide library prepared by a loop fork method as described above may be flowed across a flowcell.
- a particular polynucleotide strand from the polynucleotide library to be sequenced comprising, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 (e.g.
- an optional first terminal sequencing primer binding site complement 303’, a first insert sequence 401 (A and B), a loop sequence 403 (L) (or hairpin loop adaptor), a second insert sequence 402 (B’ and A’), an optional second terminal sequencing primer binding site 304, and a first primer-binding sequence 301’ (e.g. P5’), may anneal (via the first primer-binding sequence 301’) to the first immobilised primer 201 (e.g. P5 lawn primer) located within a particular well 203 ( Figure 6A).
- the polynucleotide library may comprise other polynucleotide strands with different first insert sequences 401 and second insert sequences 402.
- Such other polynucleotide strands may anneal to corresponding first immobilised primers 201 (e.g. P5 lawn primers) in different wells 203, thus enabling parallel processing of the various different strands within the polynucleotide library.
- a new polynucleotide strand may then be synthesised, extending from the first immobilised primer 201 (e.g. P5 lawn primer) in a direction away from the substrate 204.
- this generates a template strand comprising, in a 5’ to 3’ direction, the first immobilised primer 201 (e.g. P5 lawn primer), which is attached to the solid support 200, an optional second terminal sequencing primer binding site complement 304’, a second insert complement sequence 402’ (A’ copy and B’ copy), a loop complement sequence 403’ (L’) (also referred to herein as a spacer strand, that is complementary to the hairpin loop adaptor), a first insert complement sequence 401’ (B copy and A copy), an optional first terminal sequencing primer binding site 303, and a second primer-binding sequence 302’ (e.g. P7’) ( Figure 6B).
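The relationship between the library strand and the template strand copied from it can be sketched by treating each strand as an ordered list of segment labels. The labels reuse the reference numerals above; writing the immobilised primer 201 as "301" (the complement of 301’) and marking complements with a trailing apostrophe are conventions assumed only for this illustration.

```python
def complement_label(label):
    """Toggle the trailing prime that marks a complementary segment."""
    return label[:-1] if label.endswith("'") else label + "'"

def copy_strand(segments_5_to_3):
    """Segment order (5'->3') of the strand synthesised on a given template.

    The new strand is antiparallel to its template, so the segment order
    is reversed and every segment is replaced by its complement.
    """
    return [complement_label(s) for s in reversed(segments_5_to_3)]

# Library strand, 5'->3', using the reference numerals from the text.
library_strand = ["302", "303'", "401", "403", "402", "304", "301'"]

template_strand = copy_strand(library_strand)
```

Copying the template strand again returns the original segment order, which is why successive amplification cycles alternate between the two strand types.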
- a polymerase such as a DNA or RNA polymerase. If the polynucleotides in the library comprise index sequences, then corresponding index sequences are also produced in the template.
- the polynucleotide strand from the polynucleotide library may then be dehybridised and washed away, leaving a template strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) ( Figure 6C).
- the second primer-binding sequence 302’ (e.g. P7’) on the template strand may then anneal to a second immobilised primer 202 (e.g. P7 lawn primer) located within the well 203. This forms a “bridge”.
- a new polynucleotide strand may then be synthesised by bridge amplification, extending from the second immobilised primer 202 (e.g. P7 lawn primer) (initially) in a direction away from the substrate 204.
- the strand attached to the second immobilised primer 202 may then be dehybridised from the strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) ( Figure 6D).
- a subsequent bridge amplification cycle can then lead to amplification of the strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) and the strand attached to the second immobilised primer 202 (e.g. P7 lawn primer).
- the second primer-binding sequence 302’ e.g. P7’
- on the template strand attached to the first immobilised primer 201 may then anneal to another second immobilised primer 202 (e.g.
- first primer-binding sequence 301’ e.g. P5’
- second immobilised primer 202 e.g. P7 lawn primer
- another first immobilised primer 201 e.g. P5 lawn primer
- Completion of bridge amplification and dehybridisation may then provide an amplified cluster, thus providing a plurality of polynucleotide sequences comprising a first insert complement sequence 401’ and a second insert complement sequence 402’, as well as a plurality of polynucleotide sequences comprising a first insert sequence 401 and a second insert sequence 402 ( Figure 6E). If desired, further bridge amplification cycles may be conducted to increase the number of polynucleotide sequences within the well 203.
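How repeated bridge amplification cycles populate a well can be illustrated with an idealised counting model. It assumes every strand finds a free complementary lawn primer and is copied exactly once per cycle whenever such a primer is available; real amplification is less efficient, and the primer counts used here are arbitrary.

```python
def bridge_amplify(on_first, on_second, free_first, free_second, cycles):
    """Idealised bridge amplification within one well.

    on_first / on_second: strands attached to first (e.g. P5) or second
    (e.g. P7) immobilised primers.
    free_first / free_second: immobilised primers not yet extended.
    Each cycle, a strand on one primer type bridges to a free primer of
    the other type, which is extended into a complementary strand.
    """
    for _ in range(cycles):
        new_on_second = min(on_first, free_second)
        new_on_first = min(on_second, free_first)
        on_first += new_on_first
        on_second += new_on_second
        free_first -= new_on_first
        free_second -= new_on_second
    return on_first, on_second, free_first, free_second

# One seeded template strand on a first primer; 100 primers of each type.
after_three = bridge_amplify(1, 0, 99, 100, cycles=3)
saturated = bridge_amplify(1, 0, 99, 100, cycles=10)
```

In this model the strand count roughly doubles each cycle until the free primers are exhausted, leaving equal numbers of strands on the two primer types.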
- Although Figure 6 shows the presence of a first terminal sequencing primer binding site complement 303’, a second terminal sequencing primer binding site 304, a second terminal sequencing primer binding site complement 304’, and a first terminal sequencing primer binding site 303, these are optional as mentioned above. Accordingly, these sections may be omitted from the template and template complement strands.
- the methods for clustering and amplification described above generally relate to conducting non-selective amplification. However, methods of the present invention relating to selective processing may comprise conducting selective amplification, which is described in further detail below under selective processing.
Sequencing
- As described herein, the template provides information (e.g. identification of the genetic sequence, identification of epigenetic modifications) on the original target polynucleotide sequence.
- a sequencing process may reproduce information that was present in the original target polynucleotide sequence, by using complementary base pairing.
- sequencing may be carried out using any suitable "sequencing-by-synthesis" technique, wherein nucleotides are added successively in cycles to the free 3' hydroxyl group, resulting in synthesis of a polynucleotide chain in the 5' to 3' direction. The nature of the nucleotide added may be determined after each addition.
- One particular sequencing method relies on the use of modified nucleotides that can act as reversible chain terminators. Such reversible chain terminators comprise removable 3' blocking groups.
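The cycle-by-cycle character of sequencing-by-synthesis with reversible terminators can be sketched as below. This is a toy model, assuming error-free incorporation and a template written in the order in which the polymerase encounters it (3' to 5'); the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template_3_to_5, cycles):
    """Return the bases called over a number of sequencing cycles.

    In each cycle exactly one nucleotide is incorporated, because its 3'
    blocking group prevents further extension. The incorporated base is
    detected, and the block is then removed so that the next cycle can
    add to the restored free 3' hydroxyl group.
    """
    calls = []
    for cycle in range(min(cycles, len(template_3_to_5))):
        incorporated = COMPLEMENT[template_3_to_5[cycle]]  # one base only
        calls.append(incorporated)                         # detection step
        # Deblocking happens here; nothing to model in this sketch.
    return "".join(calls)

full_read = sequence_by_synthesis("TACG", cycles=4)
short_read = sequence_by_synthesis("TACG", cycles=2)
```

The read grows by one base per cycle in the 5' to 3' direction, so read length is set by the number of cycles run.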
- Suitable labels are described in PCT application PCT/GB2007/001770, the contents of which are incorporated herein by reference in their entirety.
- a separate reaction may be carried out containing each of the modified nucleotides added individually.
- the modified nucleotides may carry a label to facilitate their detection.
- a label may be configured to emit a signal, such as an electromagnetic signal, or a (visible) light signal.
- the label is a fluorescent label (e.g. a dye).
- such a label may be configured to emit an electromagnetic signal, or a (visible) light signal.
- One method for detecting the fluorescently labelled nucleotides comprises using laser light of a wavelength specific for the labelled nucleotides, or the use of other suitable sources of illumination.
- the fluorescence from the label on an incorporated nucleotide may be detected by a CCD camera or other suitable detection means. Suitable detection means are described in PCT/US2007/007991, the contents of which are incorporated herein by reference in their entirety.
- the detectable label need not be a fluorescent label. Any label can be used which allows the detection of the incorporation of the nucleotide into the DNA sequence. Each cycle may involve simultaneous delivery of four different nucleotide types to the array of template molecules.
- each nucleotide type may have a (spectrally) distinct label.
- four channels may be used to detect four nucleobases (also known as 4-channel chemistry) ( Figure 7 – left).
- a first nucleotide type e.g. A
- a second nucleotide type e.g. G
- a third nucleotide type (e.g. T) may include a third label (e.g. configured to emit a third wavelength, such as green light), and a fourth nucleotide type (e.g. C) may include a fourth label (e.g. configured to emit a fourth wavelength, such as yellow light).
- a detection channel that is selective for one of the four different labels.
- the first nucleotide type e.g. A
- the second nucleotide type e.g. G
- the third nucleotide type (e.g. T) may be detected in a third channel (e.g. configured to detect the third wavelength, such as green light), and the fourth nucleotide type (e.g. C) may be detected in a fourth channel (e.g. configured to detect the fourth wavelength, such as yellow light).
- detection of each nucleotide type may be conducted using fewer than four different labels. For example, sequencing-by-synthesis may be performed using methods and systems described in US 2013/0079232, which is incorporated herein by reference.
- two channels may be used to detect four nucleobases (also known as 2-channel chemistry) ( Figure 7 – middle).
- a first nucleotide type e.g. A
- a second label e.g. configured to emit a second wavelength, such as red light
- a second nucleotide type e.g. G
- a third nucleotide type e.g. T
- the first label e.g.
- the first nucleotide type (e.g. A) may be detected in both a first channel (e.g. configured to detect the first wavelength, such as red light) and a second channel (e.g. configured to detect the second wavelength, such as green light), the second nucleotide type (e.g.
- the third nucleotide type (e.g. T) may be detected in the first channel (e.g. configured to detect the first wavelength, such as red light) and may not be detected in the second channel
- the fourth nucleotide type (e.g. C) may not be detected in the first channel and may be detected in the second channel (e.g. configured to detect the second wavelength, such as green light).
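The 2-channel scheme can be written as a small lookup. The assignments for A, T and C follow the text above; treating G as the type detected in neither channel is an assumption, since the corresponding passage is incomplete here. The threshold value and function name are hypothetical.

```python
# (signal in first channel, signal in second channel) -> nucleotide type
TWO_CHANNEL_CODE = {
    (True, True): "A",    # detected in both channels
    (True, False): "T",   # first channel only
    (False, True): "C",   # second channel only
    (False, False): "G",  # assumption: detected in neither channel
}

def call_base_2ch(first_channel, second_channel, threshold=0.5):
    """Call a base from two normalised channel intensities."""
    key = (first_channel > threshold, second_channel > threshold)
    return TWO_CHANNEL_CODE[key]

example_calls = [
    call_base_2ch(0.9, 0.8),
    call_base_2ch(0.9, 0.1),
    call_base_2ch(0.1, 0.9),
    call_base_2ch(0.0, 0.1),
]
```

Two binary channel states give four combinations, which is enough to distinguish four nucleotide types.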
- one channel may be used to detect four nucleobases (also known as 1-channel chemistry) ( Figure 7 – right).
- a first nucleotide type e.g. A
- a second nucleotide type e.g. G
- a third nucleotide type e.g. T
- a non-cleavable label e.g. configured to emit the wavelength, such as green light
- a fourth nucleotide type e.g. C
- a label-accepting site which does not include the label.
- a first image can then be obtained, and a subsequent treatment carried out to cleave the label attached to the first nucleotide type, and to attach the label to the label-accepting site on the fourth nucleotide type.
- a second image may then be obtained.
- the first nucleotide type e.g. A
- the second nucleotide type e.g. G
- the third nucleotide type e.g. T
- the channel e.g.
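The two-image logic of 1-channel chemistry can be sketched in the same spirit. The patterns for A (label cleaved between images) and C (label attached between images) follow the treatment described above; the patterns for T (non-cleavable label, visible in both images) and G (no signal in either image) are read from incomplete passages and should be taken as assumptions.

```python
# Signal expected for each nucleotide type: (first image, second image).
ONE_CHANNEL_PATTERN = {
    "A": (True, False),   # label cleaved between the two images
    "T": (True, True),    # assumption: non-cleavable label, always visible
    "C": (False, True),   # label attached between the two images
    "G": (False, False),  # assumption: no label in either image
}
DECODE = {pattern: base for base, pattern in ONE_CHANNEL_PATTERN.items()}

def call_base_1ch(first_image, second_image, threshold=0.5):
    """Call a base from the intensities seen in two successive images."""
    return DECODE[(first_image > threshold, second_image > threshold)]

def call_read(image_pairs, threshold=0.5):
    """Call one base per cycle from a list of (image 1, image 2) pairs."""
    return "".join(call_base_1ch(a, b, threshold) for a, b in image_pairs)

read = call_read([(0.9, 0.1), (0.8, 0.9), (0.1, 0.7), (0.0, 0.1)])
```

Here the second dimension comes from time (two images) rather than from a second detection channel.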
- the sequencing process comprises a first sequencing read and second sequencing read.
- the first sequencing read is conducted separately from the second sequencing read.
- the first sequencing read and the second sequencing read may be conducted concurrently. In other words, the first sequencing read and the second sequencing read may be conducted at the same time.
- the first sequencing read may comprise the binding of a first sequencing primer (also known as a read 1.1 sequencing primer) to the first sequencing primer binding site (e.g. within loop complement sequence 403’).
- the second sequencing read may comprise the binding of a second sequencing primer (also known as a read 1.2 sequencing primer) to the second sequencing primer binding site (e.g. within loop sequence 403). This leads to sequencing of the first portion (e.g. second insert complement sequence 402’) and the second portion (e.g. first insert sequence 401).
- the methods for sequencing described above generally relate to conducting non-selective sequencing.
- methods of the present invention relating to selective processing may comprise conducting selective sequencing, which is described in further detail below under selective processing.
- the signals generated may be spatially resolved or spatially unresolved.
- the signals generated by the first portion and the second portion may be parsed by interpreting these signals separately in view of the spatial separation, and non-selective processing methods (such as non-selective amplification and non-selective sequencing) may be used.
- spatially unresolved signals may involve selective processing methods (such as selective amplification and/or selective sequencing).
Selective processing methods
- In some embodiments, selective processing methods may be used to generate signals of different intensities.
- the method may comprise selectively processing the at least one first polynucleotide sequence comprising a first portion and the at least one second polynucleotide sequence comprising a second portion, such that a proportion of first portions are capable of generating a first signal and a proportion of second portions are capable of generating a second signal, wherein the selective processing causes an intensity of the first signal to be greater than an intensity of the second signal.
- the method may comprise selectively processing a plurality of first polynucleotide sequences each comprising a first portion and a plurality of second polynucleotide sequences each comprising a second portion, such that a proportion of first portions are capable of generating a first signal and a proportion of second portions are capable of generating a second signal, wherein the selective processing causes an intensity of the first signal to be greater than an intensity of the second signal.
- By “selective processing” is meant here performing an action that changes relative properties of the first portion and the second portion in the at least one first polynucleotide sequence comprising a first portion and at least one second polynucleotide sequence comprising a second portion (or the plurality of first polynucleotide sequences each comprising a first portion and the plurality of second polynucleotide sequences each comprising a second portion), so that the intensity of the first signal is greater than the intensity of the second signal.
- the property may be, for example, a concentration of first portions capable of generating the first signal relative to a concentration of second portions capable of generating the second signal.
- the action may include, for example, conducting selective amplification, conducting selective sequencing, or preparing for selective sequencing.
- the selective processing results in the concentration of the first portions capable of generating the first signal being greater than the concentration of the second portions capable of generating the second signal.
- the method of the invention results in an altered ratio of R1:R2 molecules, such as within a single cluster or a single well.
- the ratio may be between 1.25:1 and 5:1.
- the ratio may be between 1.5:1 and 3:1.
- the ratio may be about 2:1.
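One way to see why unequal intensities help when two portions are read at the same time is the idealised sketch below. It assumes noise-free 4-channel detection in which each base lights up only its own channel, a first signal twice as intense as the second, and a simple nearest-match decoder; none of these details is stated in the text, and the function names are hypothetical.

```python
BASES = "ACGT"

def combined_signal(first_base, second_base, first_int=2.0, second_int=1.0):
    """Per-channel intensity when both portions are read in one cycle."""
    return {
        b: first_int * (b == first_base) + second_int * (b == second_base)
        for b in BASES
    }

def decode(signal, first_int=2.0, second_int=1.0):
    """Return the (first, second) base pair whose expected signal fits best."""
    best = None
    for b1 in BASES:
        for b2 in BASES:
            expected = combined_signal(b1, b2, first_int, second_int)
            error = sum((signal[b] - expected[b]) ** 2 for b in BASES)
            if best is None or error < best[0]:
                best = (error, b1, b2)
    return best[1], best[2]

# With a 2:1 ratio, all 16 base pairs give distinct combined signals.
round_trip_ok = all(
    decode(combined_signal(b1, b2)) == (b1, b2) for b1 in BASES for b2 in BASES
)
# With equal intensities, swapping the two bases gives the same signal.
ambiguous = combined_signal("A", "C", 1.0, 1.0) == combined_signal("C", "A", 1.0, 1.0)
```

In this model an intensity difference is what makes it possible to tell which base belongs to the first portion and which to the second.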
- Selective processing may refer to conducting selective sequencing.
- selective processing may refer to preparing for selective sequencing. As shown in Figure 8, in one example, selective sequencing may be achieved using a mixture of unblocked and blocked sequencing primers.
- the method of the invention involves (separate) polynucleotide strands, with a first polynucleotide strand with a first portion, and a second polynucleotide strand with a second portion
- the first polynucleotide strand may comprise a first sequencing primer binding site
- the second polynucleotide strand may comprise a second sequencing primer binding site, where the first sequencing primer binding site and second sequencing primer binding site are of a different sequence to each other and bind different sequencing primers.
- binding of first sequencing primers to the first sequencing primer site generates a first signal and binding of second sequencing primers to the second sequencing primer site generates a second signal, where the intensity of the first signal is greater than the intensity of the second signal.
- first polynucleotide strand comprises a first sequencing primer binding site
- second polynucleotide strand comprises a second sequencing primer binding site.
- Any ratio of blocked:unblocked second primers can be used that generates a second signal of lower intensity than the first signal. For example, the ratio of blocked:unblocked primers may be 20:80 to 80:20.
- the ratio may be 1:2 to 2:1.
- a ratio of 50:50 of blocked:unblocked second primers is used, which in turn generates a second signal that is around 50% of the intensity of the first signal.
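The link between the primer mixture and the relative signal can be expressed as a one-line calculation. The sketch assumes that blocked and unblocked second primers anneal equally well, that only unblocked primers contribute signal, and that all first sequencing primers are unblocked; the function name is hypothetical.

```python
def relative_second_signal(blocked_parts, unblocked_parts):
    """Expected second-signal intensity as a fraction of the first signal.

    Only the unblocked fraction of second sequencing primers can be
    extended, so the second signal scales with that fraction.
    """
    total = blocked_parts + unblocked_parts
    if total <= 0:
        raise ValueError("the primer mixture must contain some primer")
    return unblocked_parts / total

half = relative_second_signal(50, 50)        # 50:50 blocked:unblocked
mostly_open = relative_second_signal(20, 80)
mostly_blocked = relative_second_signal(80, 20)
```

Under these assumptions a 50:50 mixture gives a second signal of about half the first, in line with the figure quoted above.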
- the first and second sequencing primers may be added to the flow cell at the same time, or separately but sequentially.
- By “blocked” is meant that the sequencing primer comprises a blocking group at a 3’ end of the sequencing primer. Suitable blocking groups include a hairpin loop (e.g. a polynucleotide attached to the 3’-end comprising, in a 5’ to 3’ direction, a cleavable site such as a nucleotide comprising uracil, a loop portion, and a complement portion, wherein the complement portion is substantially complementary to all or a portion of the immobilised primer), a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer (e.g. -O-(CH2)3-OH instead of a 3’-OH group), a modification blocking the 3’-hydroxyl group (e.g. hydroxyl protecting groups such as silyl ether groups (e.g. trimethylsilyl, triethylsilyl, triisopropylsilyl, t-butyl(dimethyl)silyl, t-butyl(diphenyl)silyl), ether groups (e.g. benzyl, allyl, t-butyl, methoxymethyl (MOM), 2-methoxyethoxymethyl (MEM), tetrahydropyranyl), or acyl groups (e.g. acetyl, benzoyl)), or an inverted nucleobase.
- the blocking group may be any modification that prevents extension (i.e.
- the sequences of the sequencing primers and of the sequencing primer binding sites are not material to the methods of the invention, as long as the sequencing primers are able to bind to the sequencing primer binding sites to enable amplification and sequencing of the regions to be identified.
- the unblocked and blocked second sequencing primers are present in the sequencing composition in equal concentrations. That is, the ratio of blocked:unblocked second sequencing primers is around 50:50.
- the sequencing composition may further comprise at least one additional (first) sequencing primer.
- the sequencing composition comprises blocked second sequencing primers, unblocked second sequencing primers and at least one first sequencing primer.
- selective sequencing may be conducted on the amplified (duoclonal) cluster shown in Figure 6E, after restriction sites in the loop complement sequence 403' and the loop sequence 403 are cleaved by an endonuclease, as described in further detail below.
- a plurality of first sequencing primers 501 are added. These sequencing primers 501 anneal to a sequencing primer binding site present in the loop complement sequence 403’.
- a plurality of second unblocked sequencing primers 502a and a plurality of second blocked sequencing primers 502b are added, either at the same time as the first sequencing primers 501, or sequentially (e.g. prior to or after addition of first sequencing primers 501).
- second unblocked sequencing primers 502a and second blocked sequencing primers 502b anneal to a sequencing primer binding site present in the loop sequence 403.
- This then allows the second insert complement sequences 402’ (i.e. “first portions”) to be sequenced and the first insert sequences 401 (i.e. “second portions”) to be sequenced, wherein a greater proportion of second insert complement sequences 402’ are sequenced (black arrow) compared to a proportion of first insert sequences 401 (grey arrow).
- the positioning of first sequencing primers and second sequencing primers may be swapped. In other words, the first sequencing primers may anneal instead to the loop sequence 403, and the second sequencing primers may anneal instead to the loop complement sequence 403’.
- selective processing may refer to selective amplification. That is, selectively amplifying one portion (e.g. the first or second portion) on a first or second polynucleotide strand.
- selective processing comprises selectively removing some or substantially all of second immobilised primers that have not yet been extended (extended to form a second polynucleotide strand), and conducting at least one further amplification cycle in order to selectively amplify the first polynucleotide sequence(s) relative to the second polynucleotide sequence(s).
- Immobilised primers that have not yet been extended may be referred to herein as free or un-extended second immobilised primers.
- selective removal of some or substantially all free second immobilised primers is carried out before at least one further round of bridge amplification and before any sequencing of the target regions.
- the ratio of first polynucleotide capable of generating a first signal to the second polynucleotide that is capable of generating a second signal is altered, which in turn leads to two signals of different intensities, permitting concurrent sequencing of both sequences (or the target regions within those sequences).
- By “some or substantially all” is meant that at least 75%, at least 80%, at least 90% or between 95% and 100% of free second immobilised primers are removed.
- the selective removal of all or substantially all free second immobilised primers may be carried out using a reagent capable of cleaving the immobilised primer from the solid support.
- This reagent may be added following at least 5, at least 10, at least 15 or following at least 20 to 24 rounds of bridge amplification.
- the reagent may be added separately or together with the amplification reagents for performing the at least one further round of amplification.
- the first and second immobilised primers may be attached to the surface of a solid support through a linker.
- the linker may be different for the first and second immobilised primers.
- the linker may be any cleavable linker; that is the linker may comprise one or more moieties, such as modified nucleotides, that enable selective cleavage of the immobilised primer from the surface of the solid support.
- the linker may comprise uracil bases, phosphorothioate groups, ribonucleotides, diol linkages, disulphide linkages, peptides etc. which may be included, not only to allow covalent attachment to a solid support, but also to allow selective cleavage of the linker.
- the first immobilised primer is attached to a solid support through a first linker, where the linker comprises uracil, or 2-deoxyuridine.
- free first immobilised primers can be removed using uracil glycosylase.
- free first immobilised primers can be removed using a USER enzyme mix (which is a cocktail of uracil glycosylase and endonuclease VIII).
- the second immobilised primer is attached to a solid support through a second linker, where the linker comprises 8-oxoguanine.
- free second immobilised primers that is, primers that are not extended
- One example of this method is shown in Figure 9. Selective amplification may be conducted on the amplified (duoclonal) cluster as shown in Figure 6E.
- the solid support 200 comprises free first immobilised primers 201 and free second immobilised primers 202 ( Figure 9A).
- strand 1001’ represents second insert complement sequence 402’, loop complement sequence 403’ and first insert complement sequence 401’, whilst strand 1001 represents first insert sequence 401, loop sequence 403 and second insert sequence 402.
- Free second immobilised primers 202 are cleaved from the solid support 200, thus leaving behind free first immobilised primers 201 ( Figure 9B).
- the first primer-binding sequence 301’ e.g. P5’
- free first immobilised primers 201 e.g. P5 lawn primer located within the well 203.
- free second immobilised primers 202 e.g.
- second primer-binding sequences 302’ (e.g. P7’) are not able to anneal ( Figure 9C). After conducting a cycle of bridge amplification, this leads to selective amplification of the strand 1001’, relative to the strand 1001 ( Figure 9D). Conducting standard (non-selective) sequencing then allows strands 1001’ and strands 1001 to be sequenced, wherein a greater proportion of strands 1001’ are sequenced (grey arrow) compared to a proportion of strands 1001 (black arrow) ( Figure 9E).
- selectively processing comprises selectively blocking the extension of some or substantially all of the second immobilised primers that have not yet been extended (extended to form a second polynucleotide strand).
- these primers may be referred to herein as free or un-extended second immobilised primers.
- the method may involve using a primer-blocking agent, wherein the primer-blocking agent is configured to limit or prevent synthesis of a strand (i.e. a polynucleotide strand) extending from the second immobilised primer.
- the method may further involve conducting at least one further amplification cycle. As the free second immobilised primers are blocked from being extended by the primer-blocking agent, only the first immobilised primers can be extended.
- the primer-blocking agent may be flowed across the solid support following bridge amplification. In one embodiment, the primer-blocking agent is flowed across the solid support following at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 cycles, following at least 15, following at least 20 or following at least 25 rounds of bridge amplification.
- the primer-blocking agent is added whilst first polynucleotide sequence(s) are hybridised to the second immobilised primers. That is, the primer-blocking agent is added during amplification and following extension of at least the first polynucleotide strand. At this stage the extended first polynucleotide strand bends (bridges) and hybridises at its 5’ end to the second immobilised primer. Addition of the primer-blocking agent at this stage prevents extension of the second immobilised primer, which would normally occur using the first polynucleotide strand as its template.
- the primer-blocking agent is a blocked nucleotide.
- the blocked nucleotide may be A, C, T or G, and in particular may be selected from A or G.
- by “blocked” is meant that the sequencing primer comprises a blocking group at a 3’ end of the sequencing primer. Suitable blocking groups include a hairpin loop (e.g. a polynucleotide attached to the 3’-end comprising, in a 5’ to 3’ direction, a cleavable site such as a nucleotide comprising uracil, a loop portion, and a complement portion, wherein the complement portion is substantially complementary to all or a portion of the immobilised primer), a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer (e.g.
- hydroxyl protecting groups such as silyl ether groups (e.g. trimethylsilyl, triethylsilyl, triisopropylsilyl, t-butyl(dimethyl)silyl, t- butyl(diphenyl)silyl), ether groups (e.g. benzyl, allyl, t-butyl, methoxymethyl (MOM), 2- methoxyethoxymethyl (MEM), tetrahydropyranyl), or acyl groups (e.g.
- the blocking group may be any modification that prevents extension (i.e. elongation) of the primer by a polymerase.
- the block may be reversible or irreversible.
- the blocked nucleotide may be added as part of a mixture comprising both blocked and unblocked nucleotides. Alternatively, the blocked nucleotide may be added to the flow cell separately and either before or after unblocked nucleotides are added. Following addition of the blocked nucleotide, at least one more round of bridge amplification is performed.
- One example of this method is shown in Figure 10.
- Selective amplification may be conducted on the amplified (duoclonal) cluster as shown in Figure 9A.
- the first primer-binding sequence 301’ (e.g. P5’) on one set of template strands may anneal to first immobilised primers 201 (e.g. P5 lawn primer), and the second primer-binding sequence 302’ (e.g. P7’) on another set of template strands may anneal to second immobilised primers 202 (e.g. P7 lawn primer) ( Figure 10A).
- the second primer-binding sequence 302’ e.g.
- a primer-blocking agent 601 is selectively installed onto a 3’-end of the second immobilised primer 202, whilst no installation occurs to the 3’-end of the first immobilised primer 201 ( Figure 10B). After conducting a cycle of bridge amplification, this leads to selective amplification of the strands 1001’, relative to the strands 1001.
- the primer-blocking agent 601 prevents extension from the second immobilised primer 202 ( Figure 10C).
- the method comprises flowing at least one, or a plurality of, extended primer sequence(s) across the surface of the solid support (e.g. a flow cell), wherein such sequences can bind (e.g. hybridise) free immobilised primers (e.g.
- the extended primer sequences further comprise at least one 5’ additional nucleotide; and (b) adding the primer blocking agent, where the primer blocking agent is complementary to the 5’ additional nucleotide.
- the extended primer sequences are substantially complementary to the first or second immobilised primers (e.g. P5 or P7), or substantially complementary to a portion of the first or second immobilised primer.
- the 5’ additional nucleotide may be selected from A, T, C or G, and in particular may be T (or U) or C.
- the 5’ additional nucleotide is not a complement of the 3’ nucleotide of the second immobilised primer (where the extended primer sequence binds the first immobilised primer) or is not a complement of the 3’ nucleotide of the first immobilised primer (where the extended primer sequence binds the second immobilised primer).
- the first immobilised primer is P5 (for example as defined in SEQ ID NO. 1) and the second immobilised primer is P7 (for example as defined in SEQ ID NO. 2).
- the 5’ additional nucleotide is not A.
- the 5’ additional nucleotide is not G.
- the primer-blocking agent is a blocked nucleotide, for example, as described above.
- the blocked nucleotide may be A, C, T or G, and in particular may be selected from A or G. Accordingly, where the 5’ additional nucleotide is T or U, the primer-blocking agent is A, and where the 5’ additional nucleotide is C, the primer-blocking agent is G.
- the extended primer sequence(s) and primer-blocking agent may be flowed across the solid support following bridge amplification.
- the primer-blocking agent is flowed across the solid support following at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20 or following at least 25 rounds of bridge amplification.
- the extended primer sequence is selected from SEQ ID NO. 55 to 66 or a variant or fragment thereof.
- Selective amplification may be conducted on the amplified (duoclonal) cluster as shown in Figure 9A; as such following a number of rounds of amplification, a cluster is formed comprising both extended first (e.g. P5) and second (e.g. P7) immobilised polynucleotide strands.
- a (or a plurality of) extended primer sequence(s) is flowed across the surface of the solid support 200.
- the extended primer sequence 701 is substantially complementary to at least a portion, if not all of the immobilised primer (e.g. either P5 or P7) and binds to the immobilised primer (e.g. P5 or P7) as shown in Figure 11A.
- the extended primer sequence 701 comprises at least one additional 5’ nucleotide.
- a primer blocking agent 601 is added and flowed across the surface of the solid support (e.g. flow cell).
- As the primer-blocking agent 601 is complementary to the 5’ additional nucleotide of the extended primer sequence 701, the primer-blocking agent 601 binds to the 3’-end of the immobilised strands that are hybridised to the extended primer sequence 701, as shown in Figure 16B. As a consequence, addition of the primer-blocking agent 601 not only prevents extension of the immobilised strand (e.g. P5 or P7) but also renders the immobilised primer (P5 or P7) unavailable for hybridisation and subsequent bridge amplification for other extended strands (e.g. 101’) (see Figure 11B). Performing at least one more cycle of bridge amplification leads to selective amplification of strands 1001’ (in a 2:1 ratio of 1001’ to 1001).
- conducting standard (non-selective) sequencing then allows strands 1001’ and strands 1001 to be sequenced, wherein a greater proportion of strands 1001’ are sequenced (grey arrow) compared to a proportion of strands 1001 (black arrow) ( Figure 10D).
- the extended primer sequences may be added as part of the amplification mixture described above.
- the blocked immobilised primer-binding sequence may be added to the flow cell separately and may be before the amplification mixture is added. Following addition of the blocked immobilised primer-binding sequence, at least one more round of bridge amplification is performed.
- Data analysis using 16 QAM
- Figure 12 is a scatter plot showing an example of sixteen distributions of signals generated by polynucleotide sequences disclosed herein.
- the scatter plot of Figure 12 shows sixteen distributions (or bins) of intensity values from the combination of a brighter signal (i.e. a first signal as described herein) and a dimmer signal (i.e. a second signal as described herein); the two signals may be co-localized and may not be optically resolved as described above.
- the intensity values shown in Figure 12 may be defined up to a scale or normalisation factor; the units of the intensity values may be arbitrary or relative (i.e., representing the ratio of the actual intensity to a reference intensity).
- the sum of the brighter signal generated by the first portions and the dimmer signal generated by the second portions results in a combined signal.
- the combined signal may be captured by a first optical channel and a second optical channel. Since the brighter signal may be A, T, C or G, and the dimmer signal may be A, T, C or G, there are sixteen possibilities for the combined signal, corresponding to sixteen distinguishable patterns when optically captured. That is, each of the sixteen possibilities corresponds to a bin shown in Figure 12.
- the computer system can map the combined signal generated into one of the sixteen bins, and thus determine the added nucleobase at the first portion and the added nucleobase at the second portion, respectively.
- the computer processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as C.
- the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as T.
- the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as G.
- the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as A.
- the processor base calls the added nucleobase at the first portion as T and the added nucleobase at the second portion as C.
- the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as T.
- the processor base calls the added nucleobase at the first portion as T and the added nucleobase at the second portion as G.
- the processor base calls the added nucleobase at the first portion as T and the added nucleobase at the second portion as A.
- the processor base calls the added nucleobase at the first portion as G and the added nucleobase at the second portion as C.
- the processor base calls the added nucleobase at the first portion as G and the added nucleobase at the second portion as T.
- the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as G.
- the processor base calls the added nucleobase at the first portion as G and the added nucleobase at the second portion as A.
- the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as C.
- the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as T.
- the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as G.
- the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as A.
- T is configured to emit a signal in both the IMAGE 1 channel and the IMAGE 2 channel
- A is configured to emit a signal in the IMAGE 1 channel only
- C is configured to emit a signal in the IMAGE 2 channel only
- G does not emit a signal in either channel.
- A may be configured to emit a signal in both the IMAGE 1 channel and the IMAGE 2 channel
- T may be configured to emit a signal in the IMAGE 1 channel only
- C may be configured to emit a signal in the IMAGE 2 channel only
- G may be configured to not emit a signal in either channel.
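The mapping of a combined two-channel signal onto one of sixteen bins can be illustrated with a minimal sketch. It assumes the first dye encoding listed above (T in both channels, A in IMAGE 1 only, C in IMAGE 2 only, G dark), a brighter-to-dimmer intensity ratio of 2:1, idealised unit intensities, and a simple nearest-centroid rule. The names `ENCODING`, `build_centroids` and `call_bases` are hypothetical and are not taken from the disclosure.

```python
from itertools import product

# Channel encoding (IMAGE 1, IMAGE 2) per base, following the first scheme above.
ENCODING = {"T": (1, 1), "A": (1, 0), "C": (0, 1), "G": (0, 0)}

# Assumed relative brightness of the two co-localised portions (2:1).
BRIGHT, DIM = 2.0, 1.0


def build_centroids():
    """Return {(first_base, second_base): (image1, image2)} for all 16 pairs."""
    centroids = {}
    for first, second in product(ENCODING, repeat=2):
        f1, f2 = ENCODING[first]
        s1, s2 = ENCODING[second]
        # The combined signal in each channel is the sum of both contributions.
        centroids[(first, second)] = (BRIGHT * f1 + DIM * s1,
                                      BRIGHT * f2 + DIM * s2)
    return centroids


def call_bases(image1, image2, centroids):
    """Pick the bin whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda pair: (centroids[pair][0] - image1) ** 2
        + (centroids[pair][1] - image2) ** 2,
    )


centroids = build_centroids()
first_base, second_base = call_bases(2.9, 2.1, centroids)
print(first_base, second_base)
```

With a 2:1 ratio each channel takes one of four levels (0, 1, 2 or 3), so the sixteen base pairs fall on a 4 x 4 grid of distinct points. A real base caller would likely fit the bin positions to the observed data rather than assume ideal centroids.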
- Figure 13 is a flow diagram showing a method 1700 of base calling according to the present disclosure.
- the described method allows for simultaneous sequencing of two (or more) portions (e.g. the first portion and the second portion) in a single sequencing run from a single combined signal obtained from the first portion and the second portion, thus requiring less sequencing reagent consumption and faster generation of data from both the first portion and the second portion.
- the simplified method may reduce the number of workflow steps while producing the same yield as compared to existing next-generation sequencing methods. Thus, the simplified method may result in reduced sequencing runtime.
- the disclosed method 1700 may start from block 1701. The method may then move to block 1710.
- intensity data is obtained.
- the intensity data includes first intensity data and second intensity data.
- the first intensity data comprises a combined intensity of a first signal component obtained based upon a respective first nucleobase of the first portion and a second signal component obtained based upon a respective second nucleobase of the second portion.
- the second intensity data comprises a combined intensity of a third signal component obtained based upon the respective first nucleobase of the first portion and a fourth signal component obtained based upon the respective second nucleobase of the second portion.
- the first portion is capable of generating a first signal comprising a first signal component and a third signal component.
- the second portion is capable of generating a second signal comprising a second signal component and a fourth signal component.
- the first portion and the second portion may be arranged on the solid support such that signals from the first portion and the second portion are detected by a single sensing portion and/or may comprise a single cluster such that first signals and second signals from each of the respective first portions and second portions cannot be spatially resolved.
- obtaining the intensity data comprises selecting intensity data that corresponds to two (or more) different portions (e.g. the first portion and the second portion).
- intensity data is selected based upon a chastity score.
- a chastity score may be calculated as the ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities.
- the desired chastity score may be different depending upon the expected intensity ratio of the light emissions associated with the different portions. As described above, it may be desired to produce clusters comprising the first portion and the second portion, which give rise to signals in a ratio of 2:1. In one example, high-quality data corresponding to two portions with an intensity ratio of 2:1 may have a chastity score of around 0.8 to 0.9.
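The chastity calculation described above can be sketched as follows. This is an illustration only: the input is assumed to be a list of per-base intensities for one cluster in one cycle, the 0.8 to 0.9 window is the example range quoted above for a 2:1 intensity ratio, and the function names are hypothetical.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest intensities."""
    if len(intensities) < 2:
        raise ValueError("need at least two base intensities")
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    if total == 0:
        return 0.0  # no signal at all; treat as lowest possible score
    return top / total


def in_expected_window(intensities, low=0.8, high=0.9):
    """Keep data whose chastity falls in the window assumed for a 2:1 cluster."""
    return low <= chastity(intensities) <= high


score = chastity([8.5, 1.5, 0.4, 0.1])
print(round(score, 2), in_expected_window([8.5, 1.5, 0.4, 0.1]))
```

A single-template cluster would be expected to score close to 1, so a window rather than a simple minimum threshold is one way to select clusters that contain both portions.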
- the method may proceed to block 1720. In this step, one of a plurality of classifications is selected based on the intensity data. Each classification represents a possible combination of respective first and second nucleobases. In one example, the plurality of classifications comprises sixteen classifications as shown in Figure 12, each representing a unique combination of first and second nucleobases.
- Selecting the classification based on the first and second intensity data comprises selecting the classification based on the combined intensity of the first and second signal components and the combined intensity of the third and fourth signal components.
- the method may then proceed to block 1730, where the respective first and second nucleobases are base called based on the classification selected in block 1720.
- the signals generated during a cycle of sequencing are indicative of the identity of the nucleobase(s) added during sequencing (e.g. using sequencing-by-synthesis). It will be appreciated that there is a direct correspondence between the identity of the nucleobases that are incorporated and the identity of the complementary base at the corresponding position of the template sequence bound to the solid support.
- any reference herein to the base calling of respective nucleobases at the two portions encompasses the base calling of nucleobases hybridised to the template sequences and, alternatively or additionally, the identification of the corresponding nucleobases of the template sequences.
- the method may then end at block 1740.
- the polynucleotide library hairpin strand (in particular, a fourth hairpin polynucleotide as described herein) may be exposed to a conversion agent configured to convert 5-methylcytosine and 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil, or to a conversion agent configured to convert an unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil (in particular, in order to generate a fifth hairpin polynucleotide as described herein).
- modified cytosine may refer to any one or more of 5-methylcytosine (5-mC), 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC) and 5-carboxylcytosine (5-caC): wherein the wavy line indicates an attachment point of the modified cytosine to the polynucleotide.
- unmodified cytosine refers to cytosine (C): wherein the wavy line indicates an attachment point of the unmodified cytosine to the polynucleotide.
- the term “conversion agent configured to convert 5-methylcytosine and 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil” may refer to a reagent which converts 5-methylcytosine and 5-hydroxymethylcytosine to thymine (i.e. would base pair with adenine), or to an equivalent nucleobase which would base pair with adenine.
- the conversion may comprise a deamination reaction converting 5-methylcytosine or 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil.
- the term “conversion agent configured to convert an unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil, in order to generate a fifth hairpin polynucleotide” may refer to a reagent which converts one or more unmodified cytosines to uracil (i.e. would base pair with adenine), or to an equivalent nucleobase which would base pair with adenine.
- the conversion may comprise a deamination reaction converting the unmodified cytosine to uracil or nucleobase which is read as thymine/uracil.
- the conversion agent configured to convert 5-methylcytosine and 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil may further be configured to be selective for converting 5-methylcytosine and/or 5-hydroxymethylcytosine over converting unmodified cytosine.
- the selectivity may be measured by comparing reaction parameters (e.g. deamination reaction parameters) of the conversion of 5-methylcytosine and/or 5-hydroxymethylcytosine to thymine or equivalent nucleobase which is read as thymine/uracil, with corresponding reaction parameters (e.g.
- a rate of a reaction (e.g. deamination) of 5-methylcytosine and/or 5-hydroxymethylcytosine to thymine or nucleobase which is read as thymine/uracil may be greater (e.g. at least 2 times greater, at least 5 times greater, at least 10 times greater, at least 20 times greater, at least 50 times greater, or at least 100 times greater) than a corresponding rate of a reaction (e.g.
- a yield of a reaction (e.g. deamination) of 5-methylcytosine and/or 5-hydroxymethylcytosine to thymine or nucleobase which is read as thymine/uracil may be greater (e.g. at least 2 times greater, at least 5 times greater, at least 10 times greater, at least 20 times greater, at least 50 times greater, or at least 100 times greater) than a corresponding yield of a reaction (e.g. deamination) of the unmodified cytosine to uracil or nucleobase which is read as thymine/uracil.
- the conversion agent configured to convert an unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil may further be configured to be selective for converting unmodified cytosine over converting 5-methylcytosine and/or 5-hydroxymethylcytosine.
- the selectivity may be measured by comparing reaction parameters (e.g. deamination reaction parameters) of the conversion of unmodified cytosine to uracil or nucleobase which is read as thymine/uracil, with corresponding reaction parameters (e.g. deamination reaction parameters) of the conversion of 5-methylcytosine and/or 5-hydroxymethylcytosine to thymine or nucleobase which is read as thymine/uracil.
- reaction parameters such as rate of reaction or yield may be compared.
- a rate of a reaction (e.g. deamination) of the unmodified cytosine to uracil or nucleobase which is read as thymine/uracil may be greater (e.g. at least 2 times greater, at least 5 times greater, at least 10 times greater, at least 20 times greater, at least 50 times greater, or at least 100 times greater) than a rate of a reaction (e.g.
- a yield of a reaction (e.g. deamination) of the unmodified cytosine to uracil or nucleobase which is read as thymine/uracil may be greater (e.g. at least 2 times greater, at least 5 times greater, at least 10 times greater, at least 20 times greater, at least 50 times greater, or at least 100 times greater) than a corresponding yield of a reaction (e.g.
- the conversion agent may comprise a chemical agent and/or an enzyme.
- the conversion agent may comprise a boron-based reducing agent and a ten-eleven translocation (TET) methylcytosine dioxygenase.
- the boron-based reducing agent is an amine-borane compound or an azine-borane compound (wherein the term “azine” refers to a nitrogenous heterocyclic compound comprising a 6-membered aromatic ring).
- Non-limiting examples of amine-borane compounds include compounds such as t-butylamine borane, ammonia borane, ethylenediamine borane and dimethylamine borane.
- Non-limiting examples of azine-borane compounds include compounds such as pyridine borane and 2-picoline borane.
- This process is selective for a particular type of modified cytosine (5-carboxylcytosine) and does not convert unmodified cytosine.
- treatment with additional reagents may be included to convert 5-methylcytosine and 5-hydroxymethylcytosine to 5-formylcytosine and/or 5-carboxylcytosine.
- boron-based reducing agents may be combined with ten-eleven translocation (TET) methylcytosine dioxygenases as described herein.
- TET methylcytosine dioxygenase may be a member of the TET1 subfamily, the TET2 subfamily, or the TET3 subfamily.
- the enzyme may be configured to convert 5-methylcytosine to 5-hydroxymethylcytosine, 5-hydroxymethylcytosine to 5-formylcytosine, and 5-formylcytosine to 5-carboxylcytosine.
- Non-limiting examples of TET methylcytosine dioxygenase include:
  - TET1: UniProt: Q8NFU7 (SEQ ID NO. 43); UniProt: Q3URK3 (SEQ ID NO. 44)
  - TET2: UniProt: Q6N021 (SEQ ID NO. 45); UniProt: Q4JK59 (SEQ ID NO. 46)
  - TET3: UniProt: O43151 (SEQ ID NO. 47); UniProt: Q8BG87 (SEQ ID NO. 48)
- the conversion agent e.g.
- the chemical agent may comprise sulfite.
- the sulfite may be present in a partially acid/salt form (e.g. as bisulfite ions), or be present in a salt form (e.g. as sulfite ions).
- the sulfite may comprise a cation (not including H+).
- the cation may be selected from “metal cations” or “non-metal cations”.
- Metal cations may include alkali metal ions (e.g. lithium, sodium, potassium, rubidium or caesium ions).
- Non-metal cations may include ammonium salts (e.g.
- sulfite also encompasses “metabisulfite”, which dissolves in aqueous solution to form bisulfite.
- the sulfite is bisulfite.
- the bisulfite is sodium bisulfite.
- sulfite e.g. bisulfite
- the conversion agent e.g. enzyme
- cytidine deaminase may refer to an enzyme which is able to catalyse the following reaction: wherein R is hydrogen, methyl, hydroxymethyl, formyl or carboxyl, and wherein the wavy line indicates an attachment point to a polynucleotide.
- the cytidine deaminase is a wild-type cytidine deaminase or a mutant cytidine deaminase.
- the cytidine deaminase is a mutant cytidine deaminase.
- the cytidine deaminase is a member of the APOBEC protein family. In a further embodiment, the cytidine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3 subfamily (e.g. the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, or the APOBEC3H subfamily), or the APOBEC4 subfamily.
- the cytidine deaminase is a member of the APOBEC3A subfamily.
- cytidine deaminases are able to catalyse the deamination of 5-methylcytosine and 5-hydroxymethylcytosine to their equivalent deaminated versions (i.e. nucleobases which are read as thymine/uracil), as well as catalysing the deamination of unmodified cytosines to uracil.
- rates of reaction may differ depending on the type of modified cytosine; for example, wild-type APOBEC3A catalyses the deamination of unmodified cytosine and 5-methylcytosine relatively efficiently, whereas deamination of 5-hydroxymethylcytosine is approximately 5000-fold slower relative to unmodified cytosine.
- particular cytidine deaminases e.g. mutant cytidine deaminases
- the APOBEC protein family is a member of the large cytidine deaminase superfamily that contains a canonical zinc-dependent deaminase (ZDD) signature motif embedded within a core cytidine deaminase fold.
- This fold includes a five-stranded mixed beta (β)-sheet surrounded by six alpha (α)-helices with the order α1-β1-β2-α2-β3-α3-β4-α4-β5-α5-α6 (Salter et al., Trends Biochem Sci. 2016 41(7):578–594. doi:10.1016/j.tibs.2016.05.001; Salter et al., Trends Biochem. Sci.
- Each cytidine deaminase domain core structure of APOBEC proteins contains a highly conserved spatial arrangement of the catalytic centre residues of a zinc-binding motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO. 51) (referred to herein as the ZDD motif, where X is any amino acid, and the subscript range of numbers after X refers to the number of amino acids) (Salter et al., Trends Biochem Sci. 2016 41(7):578–594. doi:10.1016/j.tibs.2016.05.001).
- the H and two C residues coordinate a Zn atom
- the E residue polarises a water molecule near the Zn-atom for catalysis
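The ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C can be expressed as a regular expression for scanning protein sequences. The following is a minimal sketch, assuming one-letter amino acid codes with X matching any uppercase letter. The function name and the toy sequence are hypothetical, and the toy sequence is synthetic rather than a real APOBEC protein.

```python
import re

# H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C, where X is any amino acid (one-letter code).
ZDD_PATTERN = re.compile(r"H[PAV]E[A-Z]{23,28}PC[A-Z]{2,4}C")


def find_zdd_motifs(protein_sequence):
    """Return (start, end) spans of non-overlapping matches of the ZDD motif."""
    return [m.span() for m in ZDD_PATTERN.finditer(protein_sequence.upper())]


# Synthetic toy sequence: filler + H-A-E + 25 residues + P-C + 2 residues + C + filler.
toy = "MK" + "HAE" + "G" * 25 + "PC" + "SS" + "C" + "LL"
print(find_zdd_motifs(toy))
```

Because `finditer` reports non-overlapping matches, a protein carrying two ZDD domains would be expected to yield two separate spans, provided the domains do not overlap in sequence.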
- Some members of the APOBEC protein family, e.g., the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3C subfamily, the APOBEC3H subfamily, and the APOBEC4 subfamily, include one copy of the ZDD motif.
- Other members of the APOBEC protein family, e.g., the APOBEC3B subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, and the APOBEC3G subfamily, include two copies of the ZDD motif.
- a mutant cytidine deaminase disclosed herein includes one or two ZDD motifs.
- a mutant cytidine deaminase based on a member of the APOBEC3A subfamily includes the following ZDD motif: HXEX[24]SW(S/T)PCX[2-4]CX[6]FX[8]LX[5]R(L/I)YX[8-11]LX[2]LX[10]M (SEQ ID NO. 52) (where X is any amino acid, and the bracketed number or range of numbers after X refers to the number of amino acids) (Salter et al., Trends Biochem Sci. 2016 41(7):578–594. doi:10.1016/j.tibs.2016.05.001).
- Non-limiting examples of wild-type cytidine deaminases in the APOBEC protein family are shown in the table below (from UniProt, database of protein sequence and functional information, available at uniprot.org; or GenBank, collection of nucleotide sequences and their protein translations, available at ncbi.nlm.nih.gov/protein/):
  - AID: UniProt: Q9GZX7 (SEQ ID NO. 7); UniProt: G3QLD2 (SEQ ID NO. 8); UniProt: Q9WVE0 (SEQ ID NO. 9)
- mutant cytidine deaminases are described in further detail in US Provisional Application 63/328,444, which is incorporated herein by reference.
- by “functionally equivalent” it is meant that the mutant cytidine deaminase has the amino acid substitution at the amino acid position in a reference (wild-type) cytidine deaminase that has the same functional role in both the reference (wild-type) cytidine deaminase and the mutant cytidine deaminase.
- the (Tyr/Phe)130 may be Tyr130
- the wild-type APOBEC3A protein may be SEQ ID NO.16.
- the mutant cytidine deaminase may convert 5-methylcytosine to thymine by deamination at a greater rate than it converts cytosine to uracil by deamination. In a further embodiment, the rate may be at least 100-fold greater. In some embodiments, the mutant cytidine deaminase may comprise both a 5-methylcytosine specific deaminase and a 5-hydroxymethylcytosine specific deaminase. In one embodiment, the substitution mutation at the position functionally equivalent to Tyr130 may comprise Ala, Val or Trp. In one embodiment, the substitution mutation at the position functionally equivalent to Tyr132 may comprise a mutation to His, Arg, Gln or Lys.
- the mutant cytidine deaminase may comprise a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO. 51).
- the mutant cytidine deaminase may be a member of the APOBEC3A subfamily and may comprise a ZDD motif HXEX[24]SW(S/T)PCX[2-4]CX[6]FX[8]LX[5]R(L/I)YX[8-11]LX[2]LX[10]M (SEQ ID NO. 52).
- the conversion agent may further comprise a glycosyltransferase (e.g.
- the glycosyltransferase may be a β-glucosyltransferase.
- Such a glycosyltransferase may be configured to convert 5-hydroxymethylcytosine to a 5-hydroxymethylcytosine analogue bearing a hydroxyl protecting group, wherein the hydroxyl protecting group is glycosyl.
- a non-limiting example of the enzyme includes T4-βGT, for example as supplied by New England BioLabs (catalog # M0357S, M0357L) or by ThermoFisher Scientific (catalog # EO0831); further non-limiting examples of glycosyltransferases include:
  - α-glucosyltransferase: UniProt: P04519 (SEQ ID NO. 49)
  - β-glucosyltransferase: UniProt: P04547 (SEQ ID NO. 50)
- Specific methods of modified cytosine sequencing using conversion agents are further illustrated below. However, the types of conversion agent are not limited thereto.
- BS-seq: Bisulfite sequencing involves using bisulfite as the conversion agent. This process is described in Frommer et al. (Proc. Natl. Acad. Sci. U.S.A., 1992, 89, pp. 1827-1831), which is incorporated herein by reference. This process converts unmodified cytosines in the target polynucleotide to uracil or to deaminated analogues, but does not convert 5-methylcytosine and 5-hydroxymethylcytosine. Accordingly, BS-seq allows identification of the modified cytosines 5-mC and 5-hmC by reading them as C; whereas unmodified C is converted to nucleobases which are read as T/U.
- EM-seq: Enzymatic Methyl sequencing involves using T4 bacteriophage β-glucosyltransferase and a TET2 enzyme as the further agents and APOBEC3A as the conversion agent. This process is described in Vaisvila et al. (Genome Res., 2021, 31, pp. 1280-1289), US 10,619,200 B2 and US 9,121,061 B2, which are incorporated herein by reference.
- the T4 bacteriophage β-glucosyltransferase converts 5-hydroxymethylcytosine in the target polynucleotide to β-glucosyl-5-hydroxymethylcytosine, which prevents oxidation.
- the TET2 enzyme causes oxidation of 5-methylcytosine in the target polynucleotide to 5-hydroxymethylcytosine, which in turn is converted to β-glucosyl-5-hydroxymethylcytosine by the T4 bacteriophage β-glucosyltransferase.
- Subsequent treatment with APOBEC3A converts unmodified cytosines in the target polynucleotide to uracil. Accordingly, EM-seq allows identification of the modified cytosines 5-mC and 5-hmC (as protected glycosyl residues) by reading them as C; whereas unmodified C is converted to U.
- Modified APOBEC sequencing involves using a mutant APOBEC3A enzyme as the conversion agent, which is described in more detail in the Reference Examples 1 to 4 below. This process is described in US Provisional Application 63/328,444, which is incorporated herein by reference.
- TAPS: TET-assisted pyridine borane sequencing involves using a TET1 enzyme as the further agent and pyridine borane as the conversion agent. This process is described in Liu et al. (Nature Biotechnology, 2019, 37, pp. 424-429), which is incorporated herein by reference.
- the TET1 enzyme causes oxidation of 5-methylcytosine, 5-hydroxymethylcytosine and 5-formylcytosine in the target polynucleotide to 5-carboxylcytosine.
- Subsequent treatment with pyridine borane converts 5-carboxylcytosine (including residues that used to be 5-methylcytosine, 5-hydroxymethylcytosine and 5-formylcytosine) to dihydrouracil, but does not convert unmodified cytosine.
- TAPS allows identification of the modified cytosines 5-mC and 5-hmC by reading them as T/U; whereas unmodified cytosine is read as C.
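The read-outs described above for BS-seq, EM-seq and TAPS can be summarised as a small lookup. The following Python sketch is illustrative only: the names `READ_AS` and `read_base` are hypothetical and not part of the disclosure, uracil and dihydrouracil are both represented by the base call T, and modified APOBEC sequencing is omitted because its read-out is described in the Reference Examples rather than in this passage.

```python
# Illustrative sketch (hypothetical names): the base call expected for each
# cytosine state after conversion, restating the descriptions above.
READ_AS = {
    # method: {original cytosine state: base call after conversion}
    "BS-seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    "EM-seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    "TAPS": {"C": "C", "5mC": "T", "5hmC": "T"},
}


def read_base(method: str, state: str) -> str:
    """Return the base call expected for a cytosine state under a method."""
    return READ_AS[method][state]


if __name__ == "__main__":
    for method, table in READ_AS.items():
        print(method, table)
```

In this simplified table, none of the three methods on its own separates 5-mC from 5-hmC, which is the gap the dyad-based approach described below is intended to address.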
- a method of preparing polynucleotide templates for distinguishing between modified cytosines comprising: (a) providing a polynucleotide library hairpin strand comprising: a double-stranded polynucleotide comprising a forward library strand and a reverse library strand, a hairpin loop adaptor ligated to an end of the double-stranded polynucleotide, wherein the hairpin loop adaptor comprises a cleavable site, wherein the polynucleotide library hairpin strand has been generated from a precursor polynucleotide library hairpin strand such that any CpG dyads in the precursor polynucleotide library hairpin comprising only unmodified cytosine are converted to a first dyad in the polynucleotide library hairpin strand, any CpG dyads in the
- the hairpin loop adaptor used in the polynucleotide library hairpin strand contains a cleavable site.
- the template strand that is generated from the polynucleotide library hairpin strand contains a first cleavable site in the spacer strand. This differs from previous methods, which do not contain a cleavable site within the hairpin loop adaptor, or in the resulting template strand generated from the library.
- a risk associated with templates generated from these types of polynucleotide library hairpin strands is that the template may rehybridise to itself – in other words, the forward template strand (complementary to the forward library strand) may self-hybridise to the reverse template strand (complementary to the reverse library strand).
- the formation of such self-hybridised structures can interfere with the sequencing process since nucleotides that are capable of emitting a signal (e.g. fluorescent nucleotides) are now unable to base-pair with the forward template strand (or reverse template strand).
- the forward template strand (or the reverse template strand) may be removed at a later stage, removing the risk of hairpins forming within the template.
- a further advantage is that longer reads are enabled.
- the hairpin loop adaptor may be ligated to one end of the double-stranded polynucleotide such that the hairpin loop adaptor connects a 3’-end of the forward library strand with a 5’-end of the reverse library strand.
- the polynucleotide library hairpin strand may comprise, in a 5’ to 3’ direction, the forward library strand, the hairpin loop adaptor (which comprises the cleavable site), and the reverse library strand.
- the hairpin loop adaptor may instead be ligated to another end of the double-stranded polynucleotide such that the hairpin loop adaptor connects a 3’-end of the reverse library strand with a 5’-end of the forward library strand.
- the polynucleotide library hairpin strand may comprise, in a 5’ to 3’ direction, the reverse library strand, the hairpin loop adaptor (which comprises the cleavable site), and the forward library strand.
- the term “dyad” refers to a portion of a double-stranded polynucleotide containing two adjacent nucleotides on one strand, and two adjacent nucleotides at corresponding positions on the other strand. Both base pairs within the dyad may be complementary (e.g.
- one or both base pairs within the dyad may not be complementary.
- this may cause one base pair within the dyad to be complementary, and one base pair in the dyad to not be complementary; in other cases, both base pairs within the dyad may not be complementary.
- CpG dyad refers to a double-stranded polynucleotide containing the following motif: (5’)-CG-(3’) (3’)-GC-(5’)
- C may represent unmodified cytosine, 5-methylcytosine or 5-hydroxymethylcytosine.
- both C’s in the motif are unmodified cytosine.
- the CpG dyad comprises 5-methylcytosine
- one or both C’s may be 5-methylcytosine.
- the C in the forward library strand may be 5-methylcytosine and the C in the reverse library strand may be unmodified cytosine (e.g. in a third hairpin polynucleotide as described herein) – such a CpG dyad may be described as a “hemimethylated 5-methylcytosine CpG dyad”.
- the C in the forward library strand may be 5-methylcytosine and the C in the reverse library strand may also be 5-methylcytosine (e.g.
- a CpG dyad in a fourth hairpin polynucleotide as described herein) – such a CpG dyad may be described as a “fully methylated 5-methylcytosine CpG dyad”.
- the CpG dyad comprises 5-hydroxymethylcytosine
- one or both C’s may be 5-hydroxymethylcytosine.
- the C in the forward library strand may be 5-hydroxymethylcytosine and the C in the reverse library strand may be unmodified cytosine (e.g.
- such a CpG dyad may be described as a “hemimethylated 5-hydroxymethylcytosine CpG dyad”.
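The CpG dyad terminology above can be illustrated with a short Python sketch. The function name `describe_cpg_dyad` and the state labels `"C"`, `"5mC"` and `"5hmC"` are hypothetical. As a simplifying assumption, the sketch treats the two strands symmetrically, whereas the examples in the text place the modified cytosine on the forward library strand. Combinations not named in this passage are reported as such rather than guessed.

```python
# Illustrative sketch (hypothetical names): naming a CpG dyad from the state
# of the cytosine on each strand, using the terms defined in the text.
ALLOWED = {"C", "5mC", "5hmC"}


def describe_cpg_dyad(forward_c: str, reverse_c: str) -> str:
    """Return the descriptive name of a CpG dyad given both cytosine states."""
    states = {forward_c, reverse_c}
    if not states <= ALLOWED:
        raise ValueError(f"unknown cytosine state in {sorted(states)}")
    if states == {"C"}:
        return "unmodified CpG dyad"
    if states == {"5mC"}:
        return "fully methylated 5-methylcytosine CpG dyad"
    if states == {"C", "5mC"}:
        return "hemimethylated 5-methylcytosine CpG dyad"
    if states == {"C", "5hmC"}:
        return "hemimethylated 5-hydroxymethylcytosine CpG dyad"
    # Assumption: other combinations are not named in this passage.
    return "other modified CpG dyad (not named in this passage)"


if __name__ == "__main__":
    print(describe_cpg_dyad("5mC", "C"))
    print(describe_cpg_dyad("5hmC", "C"))
```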
- the first dyad (produced from the CpG dyads comprising only unmodified cytosine), the second dyad (produced from the CpG dyads comprising 5-methylcytosine) and the third dyad (produced from the CpG dyads comprising 5-hydroxymethylcytosine) are different to each other when read.
- as used herein, the “first dyad”, “second dyad” and “third dyad” are different to each other when read if the complements of the respective dyads are different.
- when a “first dyad” is sequenced, the sequence output is different compared to a sequence output of the “second dyad” and the “third dyad”, since the complement of the “first dyad” is different to that of the complement of the “second dyad” and the complement of the “third dyad”; similarly, when a “second dyad” is sequenced, the sequence output is different compared to a sequence output of the “first dyad” and the “third dyad”, since the complement of the “second dyad” is different to that of the complement of the “first dyad” and the complement of the “third dyad”; and
- the “first dyad” (produced from the CpG dyads comprising only unmodified cytosine) may contain the following motif: (5’)-CG-(3’) (3’)-GC-(5’); the “second dyad” (produced from the CpG dyads comprising 5-methylcytosine) may contain the following motif: (5’)-UG-(3’) (3’)-GU-(5’); and the “third dyad” (produced from the CpG dyads comprising 5-hydroxymethylcytosine) may contain the following motif: (5’)-UG-(3’) (3’
- the “first dyad” (produced from the CpG dyads comprising only unmodified cytosine) may contain the following motif: (5’)-UG-(3’) (3’)-GU-(5’); the “second dyad” (produced from the CpG dyads comprising 5-methylcytosine) may contain the following motif: (5’)-CG-(3’) (3’)-GC-(5’); and the “third dyad” (produced from the CpG dyads comprising 5-hydroxymethylcytosine) may contain the following motif: (5’)-CG-(3’) (3’)-GU
- 5-methylcytosine, 5-hydroxymethylcytosine, or derivatives thereof such as protected 5-hydroxymethylcytosine, for example β-glucosyl-5-hydroxymethylcytosine
- U represents uracil or a nucleobase which is read as thymine/uracil.
- the “first dyad” is read as (5’)-TG-(3’) in the forward library strand and (3’)-GT-(5’) in the reverse library strand
- the “second dyad” is read as (5’)-CG-(3’) in the forward library strand and (3’)-GC-(5’) in the reverse library strand
- the “third dyad” is read as (5’)-CG-(3’) in the forward library strand and (3’)-GT-(5’) in the reverse library strand
- any CpG dyads comprising only unmodified cytosine, any CpG dyads comprising 5-methylcytosine, and any CpG dyads comprising 5-hydroxymethylcytosine can again be distinguished from each other.
- the polynucleotide library hairpin strand comprises, in a 5’ to 3’ direction, the forward library strand, the hairpin loop adaptor (which comprises the cleavable site), and the reverse library strand
- the corresponding template strand may comprise, in a 5’ to 3’ direction, the reverse template strand, the spacer strand (which comprises the first cleavable site), and the forward template strand.
- the corresponding template strand may comprise, in a 5’ to 3’ direction, the forward template strand, the spacer strand (which comprises the first cleavable site), and the reverse template strand.
- step (b) may comprise synthesising a plurality of template strands.
- the method of preparing polynucleotide templates for distinguishing between different modified cytosines may instead comprise a step of: (b) synthesising a plurality of template strands by generating a complement of the polynucleotide library hairpin strand, each of the template strands comprising a forward template strand complementary to the forward library strand, a spacer strand complementary to the hairpin loop adaptor, and a reverse template strand complementary to the reverse library strand, wherein the spacer strand comprises a first cleavable site.
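The strand ordering described above follows from generating a reverse complement: if the hairpin strand reads forward library strand, hairpin loop adaptor, reverse library strand in the 5’ to 3’ direction, its complement reads reverse template strand, spacer strand, forward template strand. The Python sketch below illustrates this under stated assumptions: the sequences are invented toy examples, the loop is assumed to carry an EcoRI site purely for illustration, and the function names are hypothetical.

```python
# Illustrative sketch (hypothetical names, toy sequences): a template strand is
# the reverse complement of the hairpin strand, which reverses segment order.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_template(forward_library: str, hairpin_loop: str, reverse_library: str):
    """Return (reverse template, spacer, forward template), each 5'->3'."""
    hairpin_strand = forward_library + hairpin_loop + reverse_library
    template = reverse_complement(hairpin_strand)
    n_rev, n_loop = len(reverse_library), len(hairpin_loop)
    reverse_template = template[:n_rev]
    spacer = template[n_rev:n_rev + n_loop]
    forward_template = template[n_rev + n_loop:]
    return reverse_template, spacer, forward_template


if __name__ == "__main__":
    # Assumed toy loop containing GAATTC (an EcoRI site) as the cleavable site.
    parts = build_template("ACCGT", "TTGAATTCTT", "ACGGT")
    print(parts)
```

Because GAATTC is palindromic, the spacer strand in this toy example carries the same recognition sequence as the loop, which is one simple way a cleavable site in the adaptor could give rise to a cleavable site in the spacer strand.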
- the method may further comprise a step of: (c) synthesising at least one template complement strand by generating a complement of the template strand, each of the template complement strands comprising a forward complement template strand, a spacer complement strand, and a reverse complement template strand, wherein the spacer complement strand comprises a second cleavable site.
- the template strand comprises, in a 5’ to 3’ direction, the reverse template strand, the spacer strand (which comprises the first cleavable site), and the forward template strand
- the corresponding template complement strand may comprise, in a 5’ to 3’ direction, the forward complement strand, the spacer complement strand (which comprises the second cleavable site), and the reverse complement template strand.
- the corresponding template complement strand may comprise, in a 5’ to 3’ direction, the reverse complement strand, the spacer complement strand (which comprises the second cleavable site), and the forward complement template strand.
- step (c) may comprise synthesising a plurality of template complement strands.
- the method of preparing polynucleotide templates for distinguishing between different modified cytosines may instead comprise a step of: (c) synthesising a plurality of template complement strands by generating a complement of the template strand, each of the template complement strands comprising a forward complement template strand, a spacer complement strand, and a reverse complement template strand, wherein the spacer complement strand comprises a second cleavable site.
- the method may further comprise a step of: (d) cleaving the first cleavable site on the at least one template strand to generate at least one first polynucleotide sequence each comprising a first portion and cleaving the second cleavable site on the at least one template complement strand to generate at least one second polynucleotide sequence each comprising a second portion, wherein the first portion corresponds with the forward template strand and the second portion corresponds with the reverse complement template strand, or wherein the first portion corresponds with the reverse template strand and the second portion corresponds with the forward complement template strand.
- first portion corresponds with the forward template strand and the second portion corresponds with the reverse complement template strand
- first portion corresponds with the reverse template strand and the second portion corresponds with the forward complement template strand
- forward template strand and the reverse complement template strand that have been removed. Since portions of the original template and template complement strands that would self-hybridise have been removed, this reduces the risk of unwanted hairpins forming within the first polynucleotide sequence and the second polynucleotide sequence, as mentioned above.
- step (b) may comprise synthesising a plurality of template strands
- step (c) may comprise synthesising a plurality of template complement strands.
- the method of preparing polynucleotide templates for distinguishing between different modified cytosines may instead comprise a step of: (d) cleaving the first cleavable site on the plurality of template strands to generate a plurality of first polynucleotide sequences each comprising a first portion and cleaving the second cleavable site on the plurality of template complement strands to generate a plurality of second polynucleotide sequences each comprising a second portion, wherein the first portion corresponds with the forward template strand and the second portion corresponds with the reverse complement template strand, or wherein the first portion corresponds with the reverse template strand and the second portion corresponds with the forward complement template strand.
- by “cleavable site” is meant any moiety that allows the hairpin loop adaptor, spacer strand, or spacer complement strand to be separated into two strands from a single strand.
- the cleavable site is a restriction site.
- by “restriction site” is meant a sequence of nucleotides recognised by an endonuclease.
- the endonuclease may be a double strand restriction endonuclease or restriction enzyme. By either of these terms is meant an enzyme that can hydrolyze both strands of a double-stranded polynucleotide (duplex), to produce polynucleotide molecules that are cleaved on both strands.
- the restriction enzyme may be a type II restriction enzyme.
- the restriction enzyme may be a type IIP restriction enzyme, a type IIS restriction enzyme, a type IIC restriction enzyme, or a type IIT restriction enzyme.
- the type II restriction enzyme may be EcoRI and the restriction site is G/AATTC, wherein EcoRI catalyzes a double stranded break within the recognition site.
- the type II restriction enzyme may be BglII and the restriction site is A/GATCT, wherein BglII catalyzes a double stranded break within the recognition site.
- the type II restriction enzyme may be NotI and the restriction site is GC/GGCCGC, wherein NotI catalyses a double stranded break within the recognition site.
- the type II restriction enzyme may be FokI and the restriction site is GGATG(9/13).
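The “/” notation used above marks where the enzyme cuts within its recognition site on the strand as written. The following Python sketch locates those cut points and splits a single strand accordingly. It is a simplified illustration: the names `SITES`, `cut_positions` and `cleave` are hypothetical, only the strand as written is modelled (not the staggered cut on the opposite strand), and FokI is omitted because it cuts outside its recognition sequence, as the (9/13) notation indicates.

```python
# Illustrative sketch (hypothetical names): locating cut points for the
# restriction sites listed above on one strand written 5'->3'.
SITES = {
    # enzyme: (recognition sequence, cut offset within the site)
    "EcoRI": ("GAATTC", 1),    # G/AATTC
    "BglII": ("AGATCT", 1),    # A/GATCT
    "NotI": ("GCGGCCGC", 2),   # GC/GGCCGC
}


def cut_positions(seq: str, enzyme: str) -> list:
    """Return the indices at which the strand as written would be cut."""
    site, offset = SITES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def cleave(seq: str, enzyme: str) -> list:
    """Split the strand at every cut position and return the fragments."""
    fragments, previous = [], 0
    for cut in cut_positions(seq, enzyme):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    print(cleave("AAGAATTCTT", "EcoRI"))
```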
- Other suitable endonucleases are available from commercial sources, including New England Biolabs and Fisher Scientific.
- the endonuclease is a CRISPR enzyme. CRISPR-Cas mechanisms are currently classified into two classes (classes 1 and 2) and six types (types I to VI).
- the CRISPR enzyme is a class 1 enzyme, and is selected from type I, III and IV.
- the CRISPR enzyme is Cas6. In another embodiment, the CRISPR enzyme is a class 2 enzyme, and is selected from type II, IV, V, and VI. In one embodiment, the CRISPR enzyme is selected from Cas9, Cpf1 (Cas12a), Mad7, CasC2c1, C2C2 and C2c3. In another embodiment, the CRISPR enzyme is a type VI CRISPR enzyme. In one embodiment, the enzyme is selected from CasC2c1, C2C2 and C2c3.
- CRISPR enzymes may be naturally occurring, for example Cas9 may be obtained from any one of Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola or Campylobacter jejuni.
- Cpf1 enzymes may be selected from Acidaminococcus sp. (AsCpf1) or Lachnospiraceae bacterium (LbCpf1).
- the CRISPR enzyme is a Cas9 paired nickase. Examples of a Cas9 paired nickase include Cas9 D10A and Cas9 H840A.
- the Cas9 protein may comprise the D10A or H840A amino acid substitutions. These nickases cleave only the DNA strand that is complementary to and recognised by a gRNA.
- the restriction site may be or may comprise a PAM (protospacer adjacent motif) sequence. Examples of suitable PAM sequences include NGG, NGAG, NGCG, NGN, NG, GAA, GAT, NNG, NGN, NRN, YG, NNGRRT, NNNRRT, NNAGAA, NNNNGATT and NNNNCRAA and complements thereof.
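The PAM sequences listed above use IUPAC ambiguity codes (N for any base, R for A or G, Y for C or T). As a minimal sketch, such a pattern can be expanded into a regular expression and searched for in a sequence. The function name `find_pam` and the example sequences are hypothetical, and only the strand as written is searched, not its complement.

```python
# Illustrative sketch (hypothetical names): finding PAM occurrences written
# with IUPAC ambiguity codes on one strand.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any base
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
}


def find_pam(seq: str, pam: str) -> list:
    """Return start indices of all (possibly overlapping) PAM matches."""
    body = "".join(IUPAC[ch] for ch in pam)
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + body + "))")
    return [match.start() for match in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_pam("ATGGCGG", "NGG"))
    print(find_pam("AAGAGTCC", "NNGRRT"))
```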
- the Cas9 protein may alternatively or additionally comprise the N863A or N854A amino acid substitutions.
- the Cas9 protein has been modified to improve activity.
- the Cas9 protein may additionally comprise a D1135E substitution.
- the Cas9 protein may also be the VQR variant.
- the first cleavable site may be a first restriction site for an endonuclease.
- the endonuclease may be a type II restriction enzyme as defined herein, or a CRISPR enzyme as defined herein (e.g. a Cas9 paired nickase as defined herein).
- the restriction enzyme is a type IIP restriction enzyme, a type IIS restriction enzyme, a type IIC restriction enzyme, or a type IIT restriction enzyme.
- Non-limiting examples of the type II restriction enzyme may include EcoRI, BglII, NotI and FokI; non-limiting examples of the CRISPR enzyme include Cas6, Cas9, Cpf1 (Cas12a), Mad7, CasC2c1, C2C2 and C2c3, and Cas9 paired nickases such as Cas9 D10A and Cas9 H840A.
- the second cleavable site may be a second restriction site for an endonuclease.
- the endonuclease may be a type II restriction enzyme as defined herein, or a CRISPR enzyme as defined herein (e.g. a Cas9 paired nickase as defined herein).
- the restriction enzyme is a type IIP restriction enzyme, a type IIS restriction enzyme, a type IIC restriction enzyme, or a type IIT restriction enzyme.
- non-limiting examples of the type II restriction enzyme may include EcoRI, BglII, NotI and FokI; non-limiting examples of the CRISPR enzyme include Cas6, Cas9, Cpf1 (Cas12a), Mad7, CasC2c1, C2C2 and C2c3, and Cas9 paired nickases such as Cas9 D10A and Cas9 H840A.
- the first cleavable site and the second cleavable site may be cleaved under the same reaction conditions.
- the first cleavable site and the second cleavable site may be a restriction site recognised by the same endonuclease.
- the endonuclease may be a type II restriction enzyme as defined herein that recognises both the first cleavable site and the second cleavable site, or a CRISPR enzyme as defined herein (e.g. a Cas9 paired nickase as defined herein) that recognises both the first cleavable site and the second cleavable site.
- the restriction enzyme is a type IIP restriction enzyme, a type IIS restriction enzyme, a type IIC restriction enzyme, or a type IIT restriction enzyme, that recognises both the first cleavable site and the second cleavable site.
- non-limiting examples of the type II restriction enzyme may include EcoRI, BglII, NotI and FokI, recognising both the first cleavable site and the second cleavable site;
- non-limiting examples of the CRISPR enzyme include Cas6, Cas9, Cpf1 (Cas12a), Mad7, CasC2c1, C2C2 and C2c3, and Cas9 paired nickases such as Cas9 D10A and Cas9 H840A, recognising both the first cleavable site and the second cleavable site.
- the first dyad (produced from the CpG dyads comprising only unmodified cytosine), the second dyad (produced from the CpG dyads comprising 5-methylcytosine) and the third dyad (produced from the CpG dyads comprising 5-hydroxymethylcytosine) generate different patterns when comparing appropriate bases in the forward template strand and the reverse complement template strand, or when comparing appropriate bases in the reverse template strand and the forward complement template strand.
- Figure 16 shows how the first dyad, second dyad and the third dyad form certain patterns that allow each of these to be distinguished from each other.
- the first dyad (produced from the CpG dyads comprising only unmodified cytosine) produces a (5’)-CG-(3’) sequence in the forward template strand at positions corresponding to the original CpG dyad, and a (5’)-CG-(3’) sequence in the reverse complement template strand at positions corresponding to the original CpG dyad.
- the first dyad (produced from the CpG dyads comprising only unmodified cytosine) produces a (5’)-CG-(3’) sequence in the reverse template strand at positions corresponding to the original CpG dyad, and a (5’)-CG-(3’) sequence in the forward complement template strand at positions corresponding to the original CpG dyad.
- a double C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad (e.g.
- the second dyad (produced from the CpG dyads comprising 5-methylcytosine) produces a (5’)-CA-(3’) sequence in the forward template strand at positions corresponding to the original CpG dyad, and a (5’)-TG-(3’) sequence in the reverse complement template strand at positions corresponding to the original CpG dyad.
- the second dyad (produced from the CpG dyads comprising 5-methylcytosine) produces a (5’)-CA-(3’) sequence in the reverse template strand at positions corresponding to the original CpG dyad, and a (5’)-TG-(3’) sequence in the forward complement template strand at positions corresponding to the original CpG dyad. Accordingly, a double mismatch is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad (e.g.
- the third dyad (produced from the CpG dyads comprising 5-hydroxymethylcytosine) produces a (5’)-CA-(3’) sequence in the forward template strand at positions corresponding to the original CpG dyad, and a (5’)-CG-(3’) sequence in the reverse complement template strand at positions corresponding to the original CpG dyad.
- the third dyad (produced from the CpG dyads comprising 5-hydroxymethylcytosine) produces a (5’)-CG-(3’) sequence in the reverse template strand at positions corresponding to the original CpG dyad, and a (5’)-TG-(3’) sequence in the forward complement template strand at positions corresponding to the original CpG dyad. Accordingly, a single mismatch and single C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad (e.g.
- a double C-C/G-G match may be present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; where CpG dyads comprising 5-methylcytosine were present in the precursor polynucleotide library hairpin, then a double mismatch may be present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; and where CpG dyads comprising 5-hydroxymethylcytosine were present in the precursor polynucleotide library hairpin, then a single mismatch and single C-C/G-G match may be present when comparing corresponding positions in
- Figure 17 shows how the first dyad, second dyad and the third dyad form certain patterns that allow each of these to be distinguished from each other.
- the first dyad (produced from the CpG dyads comprising only unmodified cytosine) produces a (5’)-CA-(3’) sequence in the forward template strand at positions corresponding to the original CpG dyad, and a (5’)-TG-(3’) sequence in the reverse complement template strand at positions corresponding to the original CpG dyad.
- the first dyad (produced from the CpG dyads comprising only unmodified cytosine) produces a (5’)-CA-(3’) sequence in the reverse template strand at positions corresponding to the original CpG dyad, and a (5’)-TG-(3’) sequence in the forward complement template strand at positions corresponding to the original CpG dyad. Accordingly, a double mismatch is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad (e.g.
- the second dyad (produced from the CpG dyads comprising 5-methylcytosine) produces a (5’)-CG-(3’) sequence in the forward template strand at positions corresponding to the original CpG dyad, and a (5’)-CG-(3’) sequence in the reverse complement template strand at positions corresponding to the original CpG dyad.
- the second dyad (produced from the CpG dyads comprising 5-methylcytosine) produces a (5’)-CG-(3’) sequence in the reverse template strand at positions corresponding to the original CpG dyad, and a (5’)-CG-(3’) sequence in the forward complement template strand at positions corresponding to the original CpG dyad. Accordingly, a double C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad (e.g.
- the third dyad (produced from the CpG dyads comprising 5-hydroxymethylcytosine) produces a (5’)-CG-(3’) sequence in the forward template strand at positions corresponding to the original CpG dyad, and a (5’)-TG-(3’) sequence in the reverse complement template strand at positions corresponding to the original CpG dyad.
- the third dyad (produced from the CpG dyads comprising 5-hydroxymethylcytosine) produces a (5’)-CA-(3’) sequence in the reverse template strand at positions corresponding to the original CpG dyad, and a (5’)-CG-(3’) sequence in the forward complement template strand at positions corresponding to the original CpG dyad. Accordingly, a single mismatch and single C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad (e.g.
- a double mismatch may be present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; where CpG dyads comprising 5-methylcytosine were present in the precursor polynucleotide library hairpin, then a double C-C/G-G match may be present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; and where CpG dyads comprising 5-hydroxymethylcytosine were present in the precursor polynucleotide library hairpin, then a single mismatch and single C-C/G-G match may be present when comparing
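The match and mismatch patterns described for Figures 16 and 17 amount to a simple decision rule: compare the two bases of the first polynucleotide sequence with the two bases of the second polynucleotide sequence at the dyad position and count the matches. The Python sketch below encodes that rule under stated assumptions. The scheme labels `"figure16"` and `"figure17"`, the output labels and the function names are hypothetical, and the sketch counts any identical bases as a match rather than checking specifically for C-C and G-G.

```python
# Illustrative sketch (hypothetical names): calling the original CpG dyad from
# the number of matching positions between the two reads at that dyad.
CALLS = {
    # scheme: {number of matching positions: inferred original dyad}
    "figure16": {2: "unmodified", 0: "5mC", 1: "5hmC"},
    "figure17": {0: "unmodified", 2: "5mC", 1: "5hmC"},
}


def count_matches(first: str, second: str) -> int:
    """Count positions at which the two dinucleotide reads are identical."""
    return sum(a == b for a, b in zip(first, second))


def call_dyad(first: str, second: str, scheme: str) -> str:
    """Infer the original CpG dyad from the two dinucleotide reads."""
    if len(first) != 2 or len(second) != 2:
        raise ValueError("each read must cover exactly the two dyad positions")
    return CALLS[scheme][count_matches(first, second)]


if __name__ == "__main__":
    # Patterns taken from the Figure 16 description above.
    for pair in [("CG", "CG"), ("CA", "TG"), ("CA", "CG"), ("CG", "TG")]:
        print(pair, call_dyad(*pair, scheme="figure16"))
```

The same pair of reads yields opposite calls for unmodified cytosine and 5-methylcytosine under the two schemes, so the scheme must be chosen to match the conversion chemistry actually used.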
- whilst Figures 16 and 17 show that the reverse template strand and the forward complement template strand are retained as part of the at least one first polynucleotide sequence and the at least one second polynucleotide sequence after cleavage, it should be appreciated that in other cases, it may be the forward template strand and the reverse complement template strand that are retained instead. This can be achieved, for example, by attaching the hairpin loop adaptor at the other end to that shown in Figures 16 and 17, as explained above. Furthermore, whilst Figures 16 and 17 show that the template strand is generated extending from the second immobilised primer (e.g. P7), and that the template complement strand is generated extending from the first immobilised primer (e.g. P5), it should be appreciated that the template strand may instead extend from the first immobilised primer (e.g. P5), and that the template complement strand may instead extend from the second immobilised primer (e.g. P7).
- the polynucleotide sequences each comprise portions of a double-stranded nucleic acid template, and the first portion may comprise (or be) the forward strand of a polynucleotide sequence (e.g.
- the second portion may comprise (or be) the reverse complement strand of the polynucleotide sequence (e.g. reverse complement strand of the template, or reverse complement template strand) (in effect, a reverse complement strand may be considered a “copy” of the forward strand).
- the first portion may comprise (or be) the reverse strand of a polynucleotide sequence (e.g. reverse strand of a template, or reverse template strand)
- the second portion may comprise (or be) the forward complement strand of the polynucleotide sequence (e.g.
- the first portion may be derived from a forward strand of a target polynucleotide to be sequenced (also referred to herein as a forward library strand), and the second portion may be derived from a reverse complement strand of the target polynucleotide to be sequenced (also may be considered as a reverse complement library strand, a complement of a reverse library strand); or the first portion may be derived from a reverse strand of a target polynucleotide to be sequenced (also referred to herein as a reverse library strand), and the second portion may be derived from a forward complement strand of the target polynucleotide to be sequenced (also may be considered as a forward complement library strand, a complement of a forward library strand).
- the template is generated from a (double-stranded) target polynucleotide to be sequenced via complementary base pairing.
- the (double-stranded) target polynucleotide may be one (double-stranded) polynucleotide present in a polynucleotide library to be sequenced.
- the template allows sequence information to be obtained for that particular polynucleotide.
- the method may further comprise a step of preparing the first portion and the second portion for concurrent sequencing. For example, the method may comprise simultaneously contacting first sequencing primer binding sites located after a 3’-end of the first portions with first primers and second sequencing primer binding sites located after a 3’-end of the second portions with second primers.
- the method may comprise a step of processing the at least one first polynucleotide sequence comprising a first portion and the at least one second polynucleotide sequence comprising a second portion, such that a proportion of first portions are capable of generating a first signal and a proportion of second portions are capable of generating a second signal.
- the first signal and the second signal may be spatially resolved. In other embodiments, the first signal and the second signal may be spatially unresolved. In some embodiments (e.g.
- a proportion of first portions may be capable of generating a first signal and a proportion of second portions may be capable of generating a second signal, wherein an intensity of the first signal is substantially the same as an intensity of the second signal.
- a proportion of first portions may be capable of generating a first signal and a proportion of second portions may be capable of generating a second signal, wherein the selective processing causes an intensity of the first signal to be greater than an intensity of the second signal.
- the first signal and the second signal may be spatially unresolved (e.g. generated from the same region or substantially overlapping regions). Further aspects relating to selective processing methods (e.g.
- the first portion may be referred to herein as read 1.1 (R1.1).
- the second portion may be referred to herein as read 1.2 (R1.2).
- the first portion is at least 25 or at least 50 base pairs and the second portion is at least 25 base pairs or at least 50 base pairs.
- the first and second strand may be separately attached to a solid support. In a further embodiment, this solid support may be a flow cell. In a further embodiment, each of the first and second strands are attached to the solid support (e.g.
- the term “cluster” may refer to a clonal group of template polynucleotides (e.g. DNA or RNA) bound within a single well of a solid support (e.g. flow cell).
- a cluster may refer to the population of polynucleotide molecules within a well that are then sequenced.
- a “cluster” may contain a sufficient number of copies of template polynucleotides such that the cluster is able to output a signal (e.g. a light signal) that allows sequencing reads to be performed on the cluster.
- a “cluster” may comprise, for example, about 500 to about 2000 copies, about 600 to about 1800 copies, about 700 to about 1600 copies, about 800 to about 1400 copies, about 900 to about 1200 copies, or about 1000 copies of template polynucleotides.
- a cluster may be formed by bridge amplification, as described above. Where the method of the invention involves a first polynucleotide strand and a second polynucleotide strand, the cluster formed may be a duoclonal cluster.
- by “duoclonal” cluster is meant that the population of polynucleotide sequences that are then sequenced (as the next step) are substantially of two types – e.g. a first sequence and a second sequence.
- a “duoclonal” cluster may refer to the population of single first sequences and single second sequences within a well that are then sequenced.
- a “duoclonal” cluster may contain a sufficient number of copies of a single first sequence and copies of a single second sequence such that the cluster is able to output a signal (e.g. a light signal) that allows sequencing reads to be performed on the “duoclonal” cluster.
- a “duoclonal” cluster may comprise, for example, about 500 to about 2000 combined copies, about 600 to about 1800 combined copies, about 700 to about 1600 combined copies, about 800 to about 1400 combined copies, about 900 to about 1200 combined copies, or about 1000 combined copies of single first sequences and single second sequences.
- the copies of single first sequences and single second sequences together may comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, or about 95%, 98%, 99% or 100% of all polynucleotides within a single well of the flow cell, and thus providing a substantially duoclonal “cluster”.
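As a minimal sketch of the percentage criterion above, the fraction of a well occupied by the two intended sequences can be computed as follows. The function names and the copy counts in the usage note are hypothetical; the threshold is a parameter because the text lists several possible values (from at least about 50% upwards).

```python
def duoclonal_fraction(first_copies: int, second_copies: int, other_copies: int) -> float:
    """Fraction of polynucleotides in a well that are first or second sequences."""
    total = first_copies + second_copies + other_copies
    if total == 0:
        raise ValueError("well contains no polynucleotides")
    return (first_copies + second_copies) / total


def is_substantially_duoclonal(first_copies: int, second_copies: int,
                               other_copies: int, threshold: float = 0.5) -> bool:
    """Apply a chosen threshold (an assumption of this sketch) to the duoclonal fraction."""
    return duoclonal_fraction(first_copies, second_copies, other_copies) >= threshold
```

For example, a well holding 450 first sequences, 450 second sequences and 100 other polynucleotides has a duoclonal fraction of 0.9.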
- the at least one first polynucleotide sequence comprising a first portion and at least one second polynucleotide sequence may be prepared using a loop fork method as described herein (see Figure 4).
- the polynucleotide library hairpin strand may be prepared using methods of preparing a polynucleotide library hairpin strand as described herein.
- the method may further comprise a step of concurrently sequencing nucleobases in the first portion and the second portion.
- Figure 16 shows an example workflow for preparing a polynucleotide library hairpin strand according to a method as described herein (using a conversion agent configured to convert 5- methylcytosine and 5-hydroxymethylcytosine to thymine or a nucleobase which is read as thymine/uracil), then a subsequent example workflow for preparing polynucleotide templates for distinguishing between modified cytosines according to a method as described herein.
- a hairpin loop adaptor 2002 is ligated, where the hairpin loop adaptor comprises a cleavable site 2003, to generate a first hairpin polynucleotide (not shown) (steps (a) and (b) for methods of preparing a polynucleotide library hairpin strand as described herein).
- the precursor reverse library strand then generates second hairpin polynucleotide 2200, which comprises the precursor forward library strand 2001 and the hairpin loop adaptor 2002, wherein the hairpin loop adaptor 2002 comprises the cleavable site 2003 (step (c) for methods of preparing a polynucleotide library hairpin strand as described herein).
- the precursor forward library strand 2001 comprises a CpG sequence comprising 5-methylcytosine, a CpG sequence comprising 5- hydroxymethylcytosine, and a CpG sequence comprising unmodified cytosine.
- third hairpin polynucleotide 2300 comprises the precursor forward library strand 2001, the hairpin loop adaptor 2002 (comprising the cleavable site 2003), and resynthesised reverse library strand 2004.
- This process may be conducted using unmodified cytosines, thus meaning that the resynthesised reverse library strand 2004 is devoid of modified cytosines such as 5-methylcytosine and 5-hydroxymethylcytosine.
- This generates a CpG dyad 2005 comprising only unmodified cytosine, a CpG dyad 2006 comprising 5-methylcytosine (a hemimethylated 5-methylcytosine CpG dyad), and a CpG dyad 2007 comprising 5-hydroxymethylcytosine (a hemimethylated 5-hydroxymethylcytosine CpG dyad).
- flanking adaptor 2005 a P7’/P5 fork adaptor
- Treatment of hairpin polynucleotide 2300-1 with DNA methyltransferase 1 enzyme transforms CpG dyad 2006 comprising 5-methylcytosine (a hemimethylated 5-methylcytosine CpG dyad) to a fully methylated 5-methylcytosine CpG dyad.
- fourth hairpin polynucleotide 2400 comprises the precursor forward library strand 2001, the hairpin loop adaptor 2002 (comprising the cleavable site 2003), and partially methylated reverse library strand 2008.
- step (f) for methods of preparing a polynucleotide library hairpin strand as described herein.
- the precursor forward library strand 2001 is converted to forward library strand 2009.
- the partially methylated reverse library strand 2008 is converted to reverse library strand 2010.
- the CpG dyad 2005 comprising only unmodified cytosine is unaffected, which can now be considered as first dyad 2011.
- the CpG dyad 2006 comprising 5-methylcytosine (a hemimethylated 5-methylcytosine CpG dyad), which was transformed to a fully methylated 5-methylcytosine CpG dyad, is converted to second dyad 2012.
- the CpG dyad 2007 comprising 5-hydroxymethylcytosine (a hemimethylated 5- hydroxymethylcytosine CpG dyad) is converted to third dyad 2013.
- fifth hairpin polynucleotide 2500 anneals (via P7’) to immobilised primer P7 on a solid support.
- Extending from the 3’-end of immobilised primer P7 in a 5’ to 3’ direction generates a template strand (e.g. using a DNA polymerase).
- the template strand comprises, in a 5’ to 3’ direction, a reverse template strand 2050 which is complementary to reverse library strand 2010, a spacer strand 2051 (comprising a first cleavable site 2053) which is complementary to hairpin loop adaptor 2002, and a forward template strand 2052 which is complementary to forward library strand 2009.
- a template complement strand is generated which extends from the 3’-end of immobilised primer P5.
- the template complement strand comprises, in a 5’ to 3’ direction, a forward complement template strand 2052’, a spacer complement strand 2051’ (comprising a second cleavable site 2054), and a reverse complement template strand 2050’.
- Cleavage of the first cleavable site 2053 and second cleavable site 2054 causes removal of forward template strand 2052 and reverse complement template strand 2050’, thus leaving behind a first polynucleotide sequence comprising a first portion (reverse template strand 2050) and a second polynucleotide sequence comprising a second portion (forward complement template strand 2052’).
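The dyad chemistry of the Figure 16 workflow can be sketched as a small state model. This is a simplification under stated assumptions: each CpG dyad is reduced to the state of its forward-strand cytosine and its reverse-strand cytosine, the function names are hypothetical, and the conversion agent is assumed to act completely and only on 5-methylcytosine and 5-hydroxymethylcytosine.

```python
# States of a CpG cytosine: "C" (unmodified), "mC" (5-methylcytosine),
# "hmC" (5-hydroxymethylcytosine). All names here are illustrative.

def resynthesise_reverse(forward_states):
    """Step (d): the resynthesised reverse strand carries only unmodified cytosine."""
    return ["C"] * len(forward_states)


def dnmt1_maintenance(forward_states, reverse_states):
    """Step (e): hemimethylated 5mC dyads become fully methylated; 5hmC dyads do not."""
    return ["mC" if f == "mC" else r for f, r in zip(forward_states, reverse_states)]


def convert_modified_to_t(states):
    """Step (f), Figure 16 variant: 5mC and 5hmC are read as T; unmodified C stays C."""
    return ["T" if s in ("mC", "hmC") else "C" for s in states]


forward = ["C", "mC", "hmC"]                       # precursor forward library strand
reverse = dnmt1_maintenance(forward, resynthesise_reverse(forward))
dyads = list(zip(convert_modified_to_t(forward), convert_modified_to_t(reverse)))
```

Under these assumptions the three dyads read (C, C), (T, T) and (T, C), corresponding to the first, second and third dyads respectively, so each original cytosine state yields a distinct pair.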
- Figure 17 shows another example workflow for preparing a polynucleotide library hairpin strand according to a method as described herein (using a conversion agent configured to convert an unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil), then a subsequent example workflow for preparing polynucleotide templates for distinguishing between modified cytosines according to a method as described herein.
- a hairpin loop adaptor 2002 is ligated, where the hairpin loop adaptor comprises a cleavable site 2003, to generate a first hairpin polynucleotide (not shown) (steps (a) and (b) for methods of preparing a polynucleotide library hairpin strand as described herein).
- the precursor reverse library strand then generates second hairpin polynucleotide 2200, which comprises the precursor forward library strand 2001 and the hairpin loop adaptor 2002, wherein the hairpin loop adaptor 2002 comprises the cleavable site 2003 (step (c) for methods of preparing a polynucleotide library hairpin strand as described herein).
- the precursor forward library strand 2001 comprises a CpG sequence comprising 5-methylcytosine, a CpG sequence comprising 5- hydroxymethylcytosine, and a CpG sequence comprising unmodified cytosine.
- third hairpin polynucleotide 2300 comprises the precursor forward library strand 2001, the hairpin loop adaptor 2002 (comprising the cleavable site 2003), and resynthesised reverse library strand 2004.
- This process may be conducted using unmodified cytosines, thus meaning that the resynthesised reverse library strand 2004 is devoid of modified cytosines such as 5-methylcytosine and 5-hydroxymethylcytosine.
- This generates a CpG dyad 2005 comprising only unmodified cytosine, a CpG dyad 2006 comprising 5-methylcytosine (a hemimethylated 5-methylcytosine CpG dyad), and a CpG dyad 2007 comprising 5-hydroxymethylcytosine (a hemimethylated 5-hydroxymethylcytosine CpG dyad).
- flanking adaptor 2005 a P7’/P5 fork adaptor
- Treatment of hairpin polynucleotide 2300-1 with DNA methyltransferase 1 enzyme transforms CpG dyad 2006 comprising 5-methylcytosine (a hemimethylated 5-methylcytosine CpG dyad) to a fully methylated 5-methylcytosine CpG dyad.
- fourth hairpin polynucleotide 2400 comprises the precursor forward library strand 2001, the hairpin loop adaptor 2002 (comprising the cleavable site 2003), and partially methylated reverse library strand 2008.
- step (f) for methods of preparing a polynucleotide library hairpin strand as described herein.
- the precursor forward library strand 2001 is converted to forward library strand 2009.
- the partially methylated reverse library strand 2008 is converted to reverse library strand 2010.
- the CpG dyad 2005 comprising only unmodified cytosine is converted to first dyad 2011.
- the CpG dyad 2006 comprising 5-methylcytosine (a hemimethylated 5-methylcytosine CpG dyad), which was transformed to a fully methylated 5- methylcytosine CpG dyad, is unchanged in this step and can be considered as second dyad 2012.
- the CpG dyad 2007 comprising 5-hydroxymethylcytosine (a hemimethylated 5- hydroxymethylcytosine CpG dyad) is converted to third dyad 2013.
- fifth hairpin polynucleotide 2500 anneals (via P7’) to immobilised primer P7 on a solid support.
- Extending from the 3’-end of immobilised primer P7 in a 5’ to 3’ direction generates a template strand (e.g. using a DNA polymerase).
- the template strand comprises, in a 5’ to 3’ direction, a reverse template strand 2050 which is complementary to reverse library strand 2010, a spacer strand 2051 (comprising a first cleavable site 2053) which is complementary to hairpin loop adaptor 2002, and a forward template strand 2052 which is complementary to forward library strand 2009.
- a template complement strand is generated which extends from the 3’-end of immobilised primer P5.
- the template complement strand comprises, in a 5’ to 3’ direction, a forward complement template strand 2052’, a spacer complement strand 2051’ (comprising a second cleavable site 2054), and a reverse complement template strand 2050’.
- Cleavage of the first cleavable site 2053 and second cleavable site 2054 causes removal of forward template strand 2052 and reverse complement template strand 2050’, thus leaving behind a first polynucleotide sequence comprising a first portion (reverse template strand 2050) and a second polynucleotide sequence comprising a second portion (forward complement template strand 2052’).
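The Figure 17 workflow can be sketched in the same reduced way, with the conversion acting on unmodified cytosine instead. The sketch assumes that both 5-methylcytosine and 5-hydroxymethylcytosine resist the conversion agent (an assumption of this example, needed for the 5-hydroxymethylcytosine dyad to remain distinguishable), and that conversion is complete. Function names are hypothetical.

```python
# Assumption of this sketch: both modified cytosines are protected from C-to-U conversion.
PROTECTED = {"mC", "hmC"}


def read_after_c_to_u(state: str) -> str:
    """Step (f), Figure 17 variant: unmodified C becomes U (read as T); protected bases read as C."""
    return "C" if state in PROTECTED else "T"


def figure17_dyad(forward_state: str) -> tuple:
    """Return (forward read, reverse read) for one CpG dyad after steps (d) to (f)."""
    # Steps (d) and (e): the reverse cytosine is unmodified unless the
    # methyltransferase copies a 5mC across from the forward strand.
    reverse_state = "mC" if forward_state == "mC" else "C"
    return (read_after_c_to_u(forward_state), read_after_c_to_u(reverse_state))
```

Under these assumptions an unmodified dyad reads (T, T), a 5-methylcytosine dyad reads (C, C), and a 5-hydroxymethylcytosine dyad reads (C, T).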
- a method of sequencing polynucleotide sequences to distinguish between modified cytosines comprising: preparing polynucleotide templates for distinguishing between modified cytosines using a method as described herein; sequencing nucleobases in the first portion and the second portion; and identifying the presence of 5-methylcytosine or 5-hydroxymethylcytosine by detecting differences when comparing a sequence output from the first portion with a sequence output from the second portion.
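The comparison step can be sketched as a lookup from a pair of base calls at a CpG position to a cytosine state. This is an illustrative decoder, not the claimed implementation: the scheme names are hypothetical, each call is expressed in the sense of the strand that carried the cytosine, and the Figure 17 entries assume that 5-hydroxymethylcytosine is not converted.

```python
# (forward-strand call, reverse-strand call) -> inferred state of the original cytosine.
DECODE = {
    # Conversion agent turns 5mC and 5hmC into a base read as T (Figure 16 style).
    "modified_to_T": {("C", "C"): "C", ("T", "T"): "5mC", ("T", "C"): "5hmC"},
    # Conversion agent turns unmodified C into U, read as T (Figure 17 style).
    "unmodified_to_U": {("T", "T"): "C", ("C", "C"): "5mC", ("C", "T"): "5hmC"},
}


def call_cytosine(forward_call: str, reverse_call: str, scheme: str) -> str:
    """Classify one CpG position; unexpected pairs are reported as 'ambiguous'."""
    return DECODE[scheme].get((forward_call, reverse_call), "ambiguous")
```

In both schemes of this sketch, 5-hydroxymethylcytosine is the only state for which the two calls differ, while 5-methylcytosine and unmodified cytosine are told apart by which base both calls agree on.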
- the step of sequencing nucleobases in the first portion and the second portion may involve concurrent sequencing of nucleobases in the first portion and the second portion.
- the step of concurrently sequencing nucleobases may comprise: (a) obtaining first intensity data comprising a combined intensity of a first signal component obtained based upon a respective first nucleobase at the first portion and a second signal component obtained based upon a respective second nucleobase at the second portion, wherein the first and second signal components are obtained simultaneously; (b) obtaining second intensity data comprising a combined intensity of a third signal component obtained based upon the respective first nucleobase at the first portion and a fourth signal component obtained based upon the respective second nucleobase at the second portion, wherein the third and fourth signal components are obtained simultaneously; (c) selecting one of a plurality of classifications based on the first and the second intensity data, wherein each classification represents a possible combination of respective first and second nucleobases; and (d) based on the selected classification, base calling the respective first and second nucleobases.
- selecting the classification based on the first and second intensity data may comprise selecting the classification based on the combined intensity of the first and second signal components and the combined intensity of the third and fourth signal components.
- the plurality of classifications may comprise sixteen classifications, each classification representing one of sixteen unique combinations of first and second nucleobases.
- the first signal component, second signal component, third signal component and fourth signal component may be generated based on light emissions associated with the respective nucleobase.
- the light emissions may be detected by a sensor, wherein the sensor is configured to provide a single output based upon the first and second signals.
- the sensor may comprise a single sensing element.
- the method may further comprise repeating steps (a) to (d) for each of a plurality of base calling cycles.
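Steps (a) to (d) can be sketched as a nearest-centroid classifier over sixteen base-pair combinations. Everything numeric here is an assumption for illustration: the two-channel dye encoding, the relative signal weights of the first and second portions (1.0 and 0.5), and the use of Euclidean distance are not taken from the specification.

```python
from itertools import product
from math import hypot

# Hypothetical two-channel encoding: (signal in channel 1, signal in channel 2) per base.
CHANNEL_CODE = {"A": (1.0, 1.0), "C": (1.0, 0.0), "T": (0.0, 1.0), "G": (0.0, 0.0)}
# Assumed unequal intensities for the first and second portions.
FIRST_WEIGHT, SECOND_WEIGHT = 1.0, 0.5


def build_centroids():
    """Expected combined intensities for each of the 16 (first, second) base pairs."""
    centroids = {}
    for first_base, second_base in product("ACGT", repeat=2):
        channel_1 = (FIRST_WEIGHT * CHANNEL_CODE[first_base][0]
                     + SECOND_WEIGHT * CHANNEL_CODE[second_base][0])
        channel_2 = (FIRST_WEIGHT * CHANNEL_CODE[first_base][1]
                     + SECOND_WEIGHT * CHANNEL_CODE[second_base][1])
        centroids[(first_base, second_base)] = (channel_1, channel_2)
    return centroids


CENTROIDS = build_centroids()


def call_bases(first_intensity: float, second_intensity: float):
    """Select the classification whose centroid is closest to the measured intensities."""
    return min(
        CENTROIDS,
        key=lambda pair: hypot(first_intensity - CENTROIDS[pair][0],
                               second_intensity - CENTROIDS[pair][1]),
    )
```

With this toy encoding the unequal weights keep all sixteen centroids distinct; with equal weights some combinations (for example A with G, and C with T) would coincide, so a different encoding or classification scheme would be needed in that case.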
- Kits
- Methods as described herein may be performed by a user physically. In other words, a user may themselves conduct the methods of preparing polynucleotide templates for distinguishing between modified cytosines as described herein, and as such the methods as described herein may not need to be computer-implemented.
- a kit comprising instructions for preparing polynucleotide templates for distinguishing between modified cytosines as described herein, and/or for sequencing polynucleotide sequences to distinguish between modified cytosines as described herein.
- DSP digital signal processor
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- a processor can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, or state machine, combinations of the same, or the like.
- a processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- systems described herein may be implemented using a discrete memory chip, a portion of memory in a microprocessor, flash, EPROM, or other types of memory.
- the elements of a method, process, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
- a software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of computer-readable storage medium known in the art.
- An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor.
- the processor and the storage medium can reside in an ASIC.
- a software module can comprise computer-executable instructions which cause a hardware processor to execute the computer-executable instructions. Computer-executable instructions may be stored in a (transitory or non-transitory) computer readable storage medium (e.g., memory, storage system, etc.) storing code, or computer readable instructions.
- Methods of preparing library hairpin strands
- Also described herein is a method of preparing a polynucleotide library hairpin strand, comprising: (a) providing a double-stranded polynucleotide comprising a precursor forward library strand and a precursor reverse library strand; and (b) ligating a hairpin loop adaptor to an end of the double-stranded polynucleotide to generate a first hairpin polynucleotide, wherein the hairpin loop adaptor comprises a cleavable site.
- because the polynucleotide library hairpin strand comprises a cleavable site within the hairpin loop adaptor, template strands (and template complement strands) can be generated that themselves have first cleavable sites and second cleavable sites as described herein. Accordingly, these polynucleotide library hairpin strands allow the generation of template strands (and template complement strands) that have a reduced risk of forming hairpins during sequencing.
- the hairpin loop adaptor may be an oligonucleotide of any structure or any sequence that allows the (precursor) forward library strand and the (precursor) reverse library strand to be connected via a loop.
- the hairpin loop adaptor may connect a 3’-end of the precursor forward library strand with a 5’-end of the precursor reverse library strand. In another embodiment, the hairpin loop adaptor may connect a 3’-end of the precursor reverse library strand with a 5’-end of the precursor forward library strand. In one embodiment, the hairpin loop adaptor may comprise a base-paired stem and a non-base- paired loop (e.g.
- the cleavable site may be located in the non-base-paired loop.
- the cleavable site may be a restriction site for an endonuclease.
- the endonuclease may be a type II restriction enzyme as defined herein, or a CRISPR enzyme as defined herein (e.g.
- the restriction enzyme is a type IIP restriction enzyme, a type IIS restriction enzyme, a type IIC restriction enzyme, or a type IIT restriction enzyme.
- type II restriction enzymes may include EcoRI, BglII, NotI and FokI.
- CRISPR enzymes include Cas6, Cas9, Cpf1 (Cas12a), Mad7, CasC2c1, C2C2 and C2c3, and Cas9 paired nickases such as Cas9 D10A and Cas9 H840A.
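As an illustration of placing a restriction site in the hairpin loop adaptor, the sketch below scans a sequence for the recognition sequences of the four type II enzymes named above. The loop sequence is hypothetical, and the sketch only searches the strand as written; it does not model cut positions (FokI, a type IIS enzyme, cuts outside its recognition sequence) or sites on the opposite strand.

```python
# Recognition sequences (5'->3') of the type II restriction enzymes named in the text.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "NotI": "GCGGCCGC",
    "FokI": "GGATG",
}


def find_sites(sequence: str) -> dict:
    """Map each enzyme to the 0-based start positions of its recognition sequence."""
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical loop sequence containing a single EcoRI site.
loop = "TTTTGAATTCTTTT"
sites_in_loop = find_sites(loop)
```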
- the method may further comprise a step of: (c) removing the precursor reverse library strand from the first hairpin polynucleotide to generate a second hairpin polynucleotide comprising the precursor forward library strand and the hairpin loop adaptor, wherein the hairpin loop adaptor comprises the cleavable site.
- the precursor reverse library strand may not be present.
- as the native library may additionally contain modified cytosines on the precursor reverse library strand, it is advantageous to remove these, as they may interfere with the generation of particular dyad patterns depending on the methylation status (e.g. 5-methylcytosine or 5-hydroxymethylcytosine) that is present on the precursor forward library strand.
- the method may further comprise a step of: (d) forming a resynthesised reverse library strand from the second hairpin polynucleotide to generate a third hairpin polynucleotide, wherein when any cytosine bases are present in the resynthesised reverse library strand, then all such cytosine bases are unmodified cytosine. Accordingly, in the third hairpin polynucleotide, only the precursor forward library strand contains modified cytosines (e.g. 5-methylcytosine or 5-hydroxymethylcytosine).
- the method may further comprise a step of: (e) exposing the third hairpin polynucleotide to an enzyme configured to convert hemimethylated 5-methylcytosine CpG dyads to fully methylated 5-methylcytosine CpG dyads, but not convert hemimethylated 5-hydroxymethylcytosine dyads, in order to generate a fourth hairpin polynucleotide.
- the methylation status of all CpG dyads is controlled.
- the resynthesised reverse library strand is converted to a partially methylated reverse library strand in step (e).
- the corresponding CpG dyad also contains only unmodified cytosine.
- the corresponding CpG dyad contains 5-methylcytosine on both the precursor forward library strand and the partially methylated reverse library strand.
- the corresponding CpG dyad contains 5-hydroxymethylcytosine on the precursor forward library strand, but unmodified cytosine in the partially methylated reverse library strand.
- the enzyme configured to convert hemimethylated 5-methylcytosine CpG dyads to fully methylated 5-methylcytosine CpG dyads, but not convert hemimethylated 5- hydroxymethylcytosine dyads may be a DNA methyltransferase.
- the DNA methyltransferase may be a member of the DNA methyltransferase 1 (DNMT1) family or the DNA methyltransferase 5 (DNMT5) family.
- Non-limiting examples of the DNA methyltransferase 1 or DNA methyltransferase 5 enzyme include: DNMT protein
- the method may further comprise a step of: (f) exposing the fourth hairpin polynucleotide to a conversion agent configured to convert 5-methylcytosine and 5-hydroxymethylcytosine and 5-
- the fifth hairpin polynucleotide may comprise first dyads, second dyads and third dyads as described herein.
- the method may further comprise a step of ligating a flanking adaptor to an end of the double-stranded polynucleotide away from the hairpin loop adaptor to the third hairpin polynucleotide, the fourth hairpin polynucleotide or the fifth hairpin polynucleotide, wherein the flanking adaptor comprises a primer-binding sequence and a primer-binding complement sequence.
- although Figures 16 and 17 show ligation of the flanking adaptor (P7’/P5 fork adaptor) to the third hairpin polynucleotide (in other words, immediately after step (d) as described herein), the flanking adaptor need not be ligated at this stage.
- the flanking adaptor may instead be ligated onto the fourth hairpin polynucleotide (in other words, immediately after step (e) as described herein), or the fifth hairpin polynucleotide (in other words, immediately after step (f) as described herein).
- flanking adaptor P7’/P5 fork adaptor
- the positions of the hairpin loop adaptor and the flanking adaptor may be swapped instead.
- the flanking adaptor may instead be ligated onto the right hand side
- the hairpin loop adaptor may be ligated to the left hand side.
- Figures 16 and 17 show ligation of a P7’/P5 type fork adaptor
- a P5’/P7 type fork adaptor may be used instead, as mentioned above.
- the flanking adaptor may comprise a first and second strand, wherein the first and second strands are base-paired for a portion of their sequence (forming the base-paired stem) and are non-complementary for the remainder of their sequence, for example, P5’ and P7 or P7’ and P5, which subsequently forms a fork structure, wherein a first arm of the fork structure comprises a primer-binding sequence and the second arm of the fork structure comprises a primer- binding complement sequence.
- the flanking adaptor may be a forked adaptor comprising a base-paired stem, a first arm and a second arm.
- the primer-binding sequence may be located on the first arm
- the primer- binding complement sequence may be located on the second arm.
- the primer-binding sequence may be capable of binding to a lawn or immobilised primer that is immobilised on the surface of a solid support.
- the primer-binding sequence may be either P5’ (for example, SEQ ID NO.3 or 6 or a variant or fragment thereof) or P7’ (for example, SEQ ID NO. 4 or a variant or fragment thereof).
- the primer-binding complement sequence may be either P5 (for example, SEQ ID NO.1 or 5 or a variant or fragment thereof) or P7 (for example, SEQ ID NO.2 or a variant or fragment thereof). If the primer-binding sequence is P5’, the primer-binding complement sequence is P7. If the primer-binding sequence is P7’, the primer-binding complement sequence is P5.
- the hairpin loop adaptor may comprise one or more sequencing primer binding sites (or sequencing primer binding site complements).
- the sequencing primer binding sites and the sequencing primer binding site complements may allow binding of a sequencing primer.
- the sequencing primer-binding sites may be in the non-base-paired loop or in the base-paired stem.
- the base-paired stem may comprise at least one sequencing primer binding site.
- the sequencing primer-binding site may be in the base-paired stem, and a sequencing primer-binding site complement may also be in the base-paired stem.
- the sequencing primer-binding site and sequencing primer-binding site complement may be in the base-paired stem, and the cleavable site may be in the non-base-paired loop.
- non-base-paired loop may comprise two sequencing primer binding sites.
- non-base-paired loop may comprise two sequencing primer- binding sites, wherein the sequencing primer-binding sites are either side of the cleavable site.
- the sequencing primer binding sites indicate the starting point of the sequencing read.
- a sequencing primer anneals (i.e. hybridises) to at least a portion of the sequencing primer binding site on the template strand.
- the polymerase enzyme binds to this site and incorporates complementary nucleotides base by base into the growing opposite strand.
- the sequences of the sequencing primers and the sequencing primer binding sites are not material to the methods of the invention, as long as the sequencing primers are able to bind to the sequencing primer binding site (or sequencing primer binding site complement) to enable amplification and sequencing of the regions to be identified.
- a polynucleotide library hairpin strand prepared according to a method of preparing a polynucleotide library hairpin strand as described herein. Any of the methods of preparing polynucleotide library hairpin strands as described herein may be utilised in methods of preparing polynucleotide templates for distinguishing between modified cytosines as described herein.
- Additional Notes
- The embodiments described herein are exemplary. Modifications, rearrangements, substitute processes, etc.
- Disjunctive language such as the phrase “at least one of X, Y or Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y or Z, or any combination thereof (e.g., X, Y and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y or at least one of Z to each be present.
- phrases such as “a device configured to” or “a device to” are intended to include one or more recited devices. Such one or more recited devices can also be collectively configured to carry out the stated recitations.
- a processor to carry out recitations A, B and C can include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.
- the corresponding proteins were expressed in BL21(DE3) cells, purified using Ni-NTA agarose beads, and desalted/concentrated using spin columns to storage buffer (50mM Tris pH 7.5, 200mM NaCl, 5%(v/v) glycerol, 0.01% (v/v) Tween-20, 0.5mM DTT). This yielded APOBEC3A(Y130X) mutant protein preparations with 80-85% purity, as judged by SDS-PAGE analysis.
- Wild type APOBEC3A deaminated 5mC and C substrates to completion, consistent with previous literature. Different mutants exhibited a wide range of reactivities towards 5mC and C substrates, with some showing preference towards either substrate. Remarkably, APOBEC3A(Y130A) (first box) deaminated 5mC substrates almost completely (94.2%), while it deaminated the corresponding C substrate to a minor extent (29.4%). Other mutants, such as APOBEC3A(Y130P) and APOBEC3A(Y130T), also exhibited more complete deamination of the 5mC than C substrate, albeit to a lesser extent than APOBEC3A(Y130A).
- APOBEC3A(Y130L) (second box) deaminated approximately half of the C substrate (56%), but almost none of the 5mC substrate (6.8%).
- the deaminase activity of all APOBEC3A(Y130X) mutants is quantified and summarised in the table below:

  Protein      % C deamination   % 5mC deamination   Protein      % C deamination   % 5mC deamination
  NEB APOBEC   92                94.9                NEB APOBEC   99.3              95.3
  Y130A        29.4              94.2                Y130V        11.6              15
  Y130G        3.7               19.1                Y130D        0.8               2.3
  Y130L        56                6.8                 Y130E        0                 2.9
  Y130F        84.6              94.3                Y130S        53.5              95.4
  Y130I        8.1               8.1                 Y130C        88.2              96.2
  Y130H        83.4              95.8                Y130W        97.6              71.6
  Y130Q        1.6               14.6                Y130P        0.2               22.3
  Y130M        37                55.4                Y
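One simple way to compare the mutants in the table above is the difference, in percentage points, between 5mC and C deamination. The sketch below uses only values stated in the table (the positive control rows and the truncated final entry are left out); the choice of a difference rather than a ratio is an assumption of this example, made because some C values are zero or near zero.

```python
# mutant: (% C deamination, % 5mC deamination), copied from the table above.
DEAMINATION = {
    "Y130A": (29.4, 94.2), "Y130V": (11.6, 15.0), "Y130G": (3.7, 19.1),
    "Y130D": (0.8, 2.3),   "Y130L": (56.0, 6.8),  "Y130E": (0.0, 2.9),
    "Y130F": (84.6, 94.3), "Y130S": (53.5, 95.4), "Y130I": (8.1, 8.1),
    "Y130C": (88.2, 96.2), "Y130H": (83.4, 95.8), "Y130W": (97.6, 71.6),
    "Y130Q": (1.6, 14.6),  "Y130P": (0.2, 22.3),  "Y130M": (37.0, 55.4),
}


def selectivity(mutant: str) -> float:
    """5mC deamination minus C deamination, in percentage points."""
    c_percent, mc_percent = DEAMINATION[mutant]
    return round(mc_percent - c_percent, 1)


# Mutants ordered from most 5mC-selective to most C-selective.
ranked = sorted(DEAMINATION, key=selectivity, reverse=True)
```

By this measure Y130A ranks first (64.8 percentage points in favour of 5mC) and Y130L ranks last (49.2 percentage points in favour of C), matching the two mutants highlighted in the text.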
- a time course analysis of APOBEC3A(Y130A) deaminase activity was therefore performed.
- the extent of C and 5mC deamination was monitored at 0, 5, 10, 30, 60 and 120 minutes by incubation of ~10-20 µM of APOBEC3A(Y130A) with 500 nM C and 5mC oligonucleotide substrate.
- a greater difference in the extent of 5mC versus C deamination was observed at t ⁇ 30 min.
- the kinetics of deamination by wild type APOBEC3A and mutant APOBEC3A(Y130A) were quantitatively compared.
- the initial deamination reaction velocity was measured at a range of DNA substrate concentrations and used to construct Michaelis-Menten curves for 5mC and C substrates, respectively.
- the resulting Km and Kcat values were then derived from these data.
- the catalytic efficiency of APOBEC3A(Y130A) was ⁇ 100-fold higher on 5mC than C substrates corroborating the endpoint SwaI assays shown above.
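
For readers less familiar with the kinetic terms, the following minimal sketch shows how catalytic efficiency (kcat/Km) and a fold preference between two substrates are conventionally computed from Michaelis-Menten parameters. The function names are introduced here for illustration, and no measured Km or kcat values from the source are used; any numbers passed in are placeholders.

```python
def mm_velocity(substrate: float, kcat: float, km: float, enzyme: float) -> float:
    """Michaelis-Menten initial velocity: v0 = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def catalytic_efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency, conventionally defined as kcat / Km."""
    return kcat / km


def fold_preference(kcat_a: float, km_a: float, kcat_b: float, km_b: float) -> float:
    """Ratio of catalytic efficiencies for substrate A over substrate B."""
    return catalytic_efficiency(kcat_a, km_a) / catalytic_efficiency(kcat_b, km_b)
```

At a substrate concentration equal to Km, `mm_velocity` returns half of the maximal velocity kcat * [E], which is a quick sanity check on any fitted parameters.
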
- Reference Example 4 DNA deaminase activity of APOBEC3A(Y130A-Y132H) double mutant protein
- The deaminase activity of purified APOBEC3A(Y130A-Y132H) double mutant protein was then analyzed using the SwaI assay, with a 37°C, 2 hour reaction time and NEB APOBEC3A as positive control.
- The conditions used were the same as described in Reference Example 2, with the exception that the SwaI assay used reaction conditions of 40 mM sodium acetate pH 5.2 at 37°C for 1 hour to 16 hours.
- The DNA substrates are shown below:
  5’-GAGGTGTATGGTTGTACTAAT/5mC/ACT/5mC/CTGGA/5mC/GAATCTTAA/5mC/ACAA/5mC/GTGCAG/5mC/CAAA/5mC/GCTT/5mC/GC/5mC/ACGG/5mC/AACGTG/5mC/GGACT/5mC/GTCG/5mC/CTTA/5mC/AATCG/5mC/GCAGGT/5mC/ACGTTGAAGATGAGGATG-3’ (SEQ ID NO: 74)
  GAGGTGTATGGTTGTAG/5mC/GCAAATCGTAAAA/5mC/GCAAAGCGAAAAC/5mC/GCAAACCGTAAAC/5mC/GAAAAGCGCTTGAAGATGAGGATG (SEQ ID NO: 75)
  GAGGTGTATGGTTGTAG/5mC/GGAAAACGGAAAT/5mC/GGAAAACGTAAAG/5mC/GTAAATCGGAAAG/5mC/
- APOBEC3A(Y130A-Y132H) exhibited higher levels of deamination at all methylated sites compared to unmethylated sites. This was consistent across both CpG and non-CpG contexts, and was robust to variation in reaction time. The difference in deamination level between methylated and unmethylated sites was markedly higher for APOBEC3A(Y130A-Y132H) than APOBEC3A(Y130A), indicating that APOBEC3A(Y130A-Y132H) achieves better discrimination of methylated sites than APOBEC3A(Y130A).
- Clause 1. A method of preparing polynucleotide templates for distinguishing between modified cytosines comprising: (a) providing a polynucleotide library hairpin strand comprising: a double-stranded polynucleotide comprising a forward library strand and a reverse library strand, a hairpin loop adaptor ligated to an end of the double-stranded polynucleotide, wherein the hairpin loop adaptor comprises a cleavable site, wherein the polynucleotide library hairpin strand has been generated from a precursor polynucleotide library hairpin strand such that any CpG dyads in the precursor polynucleotide library hairpin comprising only unmodified cytosine are converted to
- Clause 2. A method according to clause 1, wherein the method further comprises a step of: (c) synthesising at least one template complement strand by generating a complement of the template strand, each of the template complement strands comprising a forward complement template strand, a spacer complement strand, and a reverse complement template strand, wherein the spacer complement strand comprises a second cleavable site.
- Clause 9. A method according to clause 8, wherein the second sequencing primer binding site is located after a 3’-end of the second portion.
- Clause 10 A method according to any one of clauses 3 to 9, wherein where CpG dyads comprising only unmodified cytosine were present in the precursor polynucleotide library hairpin, then a double C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; where CpG dyads comprising 5-methylcytosine were present in the precursor polynucleotide library hairpin, then a double mismatch is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; and where CpG dyads comprising 5- hydroxymethylcytos
- Clause 11 A method according to any one of clauses 3 to 9, wherein where CpG dyads comprising only unmodified cytosine were present in the precursor polynucleotide library hairpin, then a double mismatch is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; where CpG dyads comprising 5-methylcytosine were present in the precursor polynucleotide library hairpin, then a double C-C/G-G match is present when comparing corresponding positions in the at least one first polynucleotide sequence and the at least one second polynucleotide sequence corresponding to the CpG dyad; and where CpG dyads comprising 5-hydroxymethylcytosine were present in the precursor polynucleotide library hairpin, then a single mismatch and single C-C/G-G match is
- Clause 12. A method according to any one of clauses 3 to 11, wherein the method further comprises a step of preparing the first portion and the second portion for concurrent sequencing.
- Clause 13. A method according to clause 12, wherein the method comprises simultaneously contacting first sequencing primer binding sites located after a 3’-end of the first portions with first primers and second sequencing primer binding sites located after a 3’-end of the second portions with second primers.
- Clause 14.
- Clause 15. A method according to clause 14, wherein the processing involves selective processing to cause an intensity of the first signal to be greater than an intensity of the second signal.
- Clause 16. A method according to clause 15, wherein a concentration of the first portions capable of generating the first signal is greater than a concentration of the second portions capable of generating the second signal.
- Clause 17. A method according to clause 16, wherein a ratio between the concentration of the first portions capable of generating the first signal and the concentration of the second portions capable of generating the second signal is between 1.25:1 to 5:1.
- Clause 18. A method according to clause 17, wherein the ratio is between 1.5:1 to 3:1.
- Clause 19. A method according to clause 18, wherein the ratio is about 2:1.
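
Clauses 15 to 19 describe making the first signal stronger than the second, with a ratio of about 2:1. One way to see why unequal intensities could help when both portions are read together is the toy model sketched below. It is an assumption made purely for illustration: each portion is taken to contribute a one-hot signal over four hypothetical base channels, and the two contributions simply add. Nothing here is taken from the source's detection chemistry.

```python
from itertools import product

BASES = "ACGT"


def combined_signal(first_base: str, second_base: str, ratio: float = 2.0) -> tuple:
    """Sum of two one-hot base signals; the first portion is scaled by `ratio`."""
    signal = [0.0] * 4
    signal[BASES.index(first_base)] += ratio
    signal[BASES.index(second_base)] += 1.0
    return tuple(signal)


def decode(signal: tuple, ratio: float = 2.0) -> list:
    """Return every (first, second) base pair consistent with `signal`."""
    return [
        (a, b)
        for a, b in product(BASES, repeat=2)
        if combined_signal(a, b, ratio) == tuple(signal)
    ]
```

In this toy model a 1:1 ratio cannot tell (A, C) from (C, A), whereas a 2:1 ratio yields a single consistent pair for each of the 16 combinations.
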
- Clause 20. A method according to any one of clauses 15 to 19, wherein selective processing comprises preparing for selective sequencing or conducting selective sequencing.
- Clause 21. A method according to any one of clauses 15 to 19, wherein selectively processing comprises conducting selective amplification.
- A method according to any one of clauses 15 to 20, wherein selectively processing comprises contacting first sequencing primer binding sites located after a 3’-end of the first portions with first primers and contacting second sequencing primer binding sites located after a 3’-end of the second portions with second primers, wherein the second primers comprise a mixture of blocked second primers and unblocked second primers.
- the blocked second primer comprises a blocking group at a 3’ end of the blocked second primer.
- the blocking group is selected from the group consisting of: a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer, a modification blocking the 3’-hydroxyl group, or an inverted nucleobase.
- selectively processing comprises selectively blocking some or substantially all of second immobilised primers that are not yet extended using a primer blocking agent, wherein the primer blocking agent is configured to limit or prevent synthesis of a strand extending from the second immobilised primer, and conducting a further amplification cycle in order to selectively amplify the first polynucleotide sequence(s) relative to the second polynucleotide sequence(s).
- the primer blocking agent is added whilst first polynucleotide sequence(s) are hybridised to the second immobilised primers.
- Clause 29. A method according to any one of clauses 26 to 28, wherein the primer blocking agent is a blocked nucleotide.
- Clause 30. A method according to clause 29, wherein the blocked nucleotide comprises a blocking group at a 3’ end of the blocked nucleotide.
- the blocking group is selected from the group consisting of: a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer, a modification blocking the 3’-hydroxyl group, or an inverted nucleobase.
- Clause 32. A method according to any one of clauses 29 to 31, wherein the blocked nucleotide is A or G.
- Clause 33. A method according to any one of clauses 14 to 32, wherein the first signal and the second signal are spatially resolved.
- Clause 34. A method according to any one of clauses 14 to 32, wherein the first signal and the second signal are spatially unresolved.
- Clause 35. A method according to any one of clauses 3 to 34, wherein the at least one first polynucleotide sequence comprising the first portion and the at least one second polynucleotide sequence comprising the second portion are attached to a solid support.
- Clause 36. A method according to clause 35, wherein the solid support is a flow cell.
- Clause 37. A method according to clause 35 or clause 36, wherein the at least one first polynucleotide sequence comprising the first portion and the at least one second polynucleotide sequence comprising the second portion form a cluster on the solid support.
- Clause 39. A method according to any one of clauses 35 to 38, wherein the at least one first polynucleotide sequence comprising the first portion and the at least one second polynucleotide sequence comprising the second portion form a duoclonal cluster.
- Clause 40. A method according to any one of clauses 35 to 39, wherein the solid support comprises at least one first immobilised primer and at least one second immobilised primer.
- the first immobilised primer comprises a sequence as defined in SEQ ID NO.
- each first polynucleotide sequence is attached to a first immobilised primer, and wherein each second polynucleotide sequence is attached to a second immobilised primer.
- each first polynucleotide sequence comprises a second adaptor sequence and wherein each second polynucleotide sequence comprises a first adaptor sequence, wherein the second adaptor sequence is substantially complementary to the second immobilised primer and wherein the first adaptor sequence is substantially complementary to the first immobilised primer.
- Clause 44. A method of sequencing polynucleotide sequences to distinguish between modified cytosines comprising: preparing polynucleotide templates for distinguishing between modified cytosines using a method according to any one of clauses 3 to 43; sequencing nucleobases in the first portion and the second portion; and identifying the presence of 5-methylcytosine or 5-hydroxymethylcytosine by detecting differences when comparing a sequence output from the first portion with a sequence output from the second portion.
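
The comparison of the two sequence outputs, together with the match and mismatch patterns set out in clauses 10 and 11, can be sketched as a small lookup. This is a simplified illustration under stated assumptions: both reads are taken to be already aligned and expressed in the same orientation, a "match" is counted only for C-C or G-G at a dyad position, and the function and table names are hypothetical. Because the text of clause 10 available here is cut off before its 5-hydroxymethylcytosine case, that outcome is reported as "not specified" rather than guessed.

```python
def count_matches(first_dyad: str, second_dyad: str) -> int:
    """Count C-C / G-G matches across the two positions of a CpG dyad."""
    if len(first_dyad) != 2 or len(second_dyad) != 2:
        raise ValueError("each dyad must be exactly two bases")
    pairs = zip(first_dyad.upper(), second_dyad.upper())
    return sum(a == b and a in "CG" for a, b in pairs)


# Number of matches -> original dyad state, as stated in clause 11.
CLAUSE_11 = {0: "unmodified C", 1: "5hmC", 2: "5mC"}
# Clause 10 reverses the first two cases; its 5hmC case is truncated here.
CLAUSE_10 = {2: "unmodified C", 0: "5mC"}

SCHEMES = {"clause 10": CLAUSE_10, "clause 11": CLAUSE_11}


def classify_dyad(first_dyad: str, second_dyad: str, scheme: str = "clause 11") -> str:
    """Map the match count for one dyad to a cytosine state under a scheme."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    matches = count_matches(first_dyad, second_dyad)
    return SCHEMES[scheme].get(matches, "not specified")
```

Keeping the two schemes as separate tables makes it explicit that the same observed pattern is interpreted in opposite ways depending on which conversion route produced the library.
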
- Clause 45. A method according to clause 44, wherein the step of sequencing nucleobases in the first portion and the second portion involves concurrent sequencing of nucleobases in the first portion and the second portion.
- Clause 46. A method according to clause 44 or clause 45, wherein the step of sequencing nucleobases comprises performing sequencing-by-synthesis.
- Clause 47. A method according to any one of clauses 44 to 46, wherein the method further comprises a step of conducting paired-end reads.
- Clause 48. A kit comprising instructions for preparing polynucleotide templates for distinguishing between modified cytosines according to any one of clauses 1 to 43, and/or for sequencing polynucleotide sequences to distinguish between modified cytosines according to any one of clauses 44 to 47.
- Clause 49. A data processing device comprising means for carrying out a method according to any one of clauses 1 to 47. Clause 50.
- a data processing device comprising instructions which, when the program is executed by a processor, cause the processor to carry out a method according to any one of clauses 1 to 47.
- Clause 52. A computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out a method according to any one of clauses 1 to 47.
- Clause 53. A computer-readable data carrier having stored thereon a computer program product according to clause 51.
- Clause 54. A data carrier signal carrying a computer program product according to clause 51.
- Clause 55. A method of preparing a polynucleotide library hairpin strand comprising: (a) providing a double-stranded polynucleotide comprising a precursor forward library strand and a precursor reverse library strand; and (b) ligating a hairpin loop adaptor to an end of the double-stranded polynucleotide to generate a first hairpin polynucleotide, wherein the hairpin loop adaptor comprises a cleavable site.
- Clause 56. A method according to clause 55, wherein the hairpin loop adaptor comprises a base-paired stem and a non-base-paired loop.
- Clause 58. A method according to any one of clauses 55 to 57, wherein the hairpin loop adaptor connects a 3’-end of the precursor forward library strand with a 5’-end of the precursor reverse library strand; or wherein the hairpin loop adaptor connects a 3’-end of the precursor reverse library strand with a 5’-end of the precursor forward library strand.
- Clause 59. A method according to any one of clauses 55 to 58, wherein the cleavable site is a restriction site for an endonuclease.
- a method according to clause 63, wherein the enzyme configured to convert hemimethylated 5-methylcytosine CpG dyads to fully methylated 5-methylcytosine CpG dyads, but not convert hemimethylated 5-hydroxymethylcytosine dyads is a member of the DNA methyltransferase 1 (DNMT1) family or the DNA methyltransferase 5 (DNMT5) family.
- Clause 66.
- Clause 68. A method according to any one of clauses 65 to 67, wherein the conversion agent comprises a chemical agent and/or an enzyme. Clause 69.
- the conversion agent comprises a boron-based reducing agent and a ten-eleven translocation (TET) methylcytosine dioxygenase.
- the boron-based reducing agent is an amine-borane compound or an azine-borane compound.
- the boron-based reducing agent is selected from the group consisting of pyridine borane, 2-picoline borane, t-butylamine borane, ammonia borane, ethylenediamine borane and dimethylamine borane.
- Clause 73. A method according to clause 68, wherein the conversion agent comprises sulfite.
- Clause 74. A method according to clause 73, wherein the sulfite is bisulfite.
- Clause 75. A method according to clause 74, wherein the bisulfite is sodium bisulfite.
- Clause 76. A method according to clause 68, wherein the conversion agent comprises a cytidine deaminase. Clause 77.
- cytidine deaminase is a wild-type cytidine deaminase or a mutant cytidine deaminase.
- Clause 78. A method according to clause 76 or clause 77, wherein the cytidine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, the APOBEC3H subfamily, or the APOBEC4 subfamily.
- Clause 79. A method according to clause 78, wherein the cytidine deaminase is a member of the APOBEC3A subfamily.
- Clause 80. A method according to any one of clauses 76 to 79, wherein the cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein.
- Clause 81. A method according to clause 80, wherein the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO.16.
- Clause 82. A method according to clause 80, wherein the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO.16.
- Clause 84. A method according to any one of clauses 77 to 83, wherein the mutant cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO.51).
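
The ZDD motif in clause 84 is written in a pattern notation that maps directly onto a regular expression. The sketch below assumes that X stands for any amino acid in one-letter code and that the bracketed ranges are inclusive repeat counts; the helper name `find_zdd_motif` is hypothetical, and the example sequence in the usage note is synthetic, not a real protein.

```python
import re

# H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C as a regular expression, where X is
# assumed to be any amino acid written as an uppercase one-letter code.
ZDD_MOTIF = re.compile(r"H[PAV]E[A-Z]{23,28}PC[A-Z]{2,4}C")


def find_zdd_motif(protein: str):
    """Return (start, end) of the first ZDD motif match, or None if absent."""
    match = ZDD_MOTIF.search(protein.upper())
    return (match.start(), match.end()) if match else None
```

For example, a synthetic string built as `"MM" + "HAE" + "G" * 25 + "PC" + "GG" + "C" + "KK"` contains one match, while shortening the run of 25 residues to 10 removes it.
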
- Clause 88. A method according to any one of clauses 65 to 87, wherein the conversion agent further comprises a glycosyltransferase.
- Clause 89. A method according to clause 88, wherein the glycosyltransferase is a β-glucosyltransferase. Clause 90.
- the flanking adaptor is a forked adaptor comprising a base-paired stem, a first arm and a second arm.
Abstract
Certain aspects relate to methods for distinguishing between different types of modified cytosines in nucleic acid sequences.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363508166P | 2023-06-14 | 2023-06-14 | |
US63/508,166 | 2023-06-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024256581A1 (fr) | 2024-12-19 |
Family
ID=91664534
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2024/066447 WO2024256581A1 (fr) | 2023-06-14 | 2024-06-13 | Identification of modified cytosines |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024256581A1 (fr) |
- 2024-06-13 WO PCT/EP2024/066447 patent/WO2024256581A1/fr unknown
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24735888 Country of ref document: EP Kind code of ref document: A1 |