CN103233072B - High-flux methylation detection technology for DNA (deoxyribonucleic acid) of complete genome

Info

Publication number: CN103233072B
Application number: CN201310163085.0A
Authority: CN
Inventors: 王师; 吕佳; 包振民; 张玲玲; 胡晓丽; 陆维
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2013-05-06
Filing date: 2013-05-06
Publication date: 2014-07-02
Anticipated expiration: 2033-05-06
Also published as: CN103233072A

Abstract

The purpose of the present invention is to provide a high-throughput genome-wide DNA methylation detection method. Genomic DNA is digested with the methylation-dependent restriction enzyme FspEI, which recognizes methylation at CG and CHG sites in the genome; digestion produces equal-length tags, each with a methylated site at its core. High-throughput sequencing of the tag library yields sequence information for methylation sites across the whole genome. The method locates methylation sites accurately, enriches genomic methylation sites by enzymatic digestion, and sequences and quantifies the methylation tags directly, which effectively reduces sequencing cost. The experimental workflow takes only two days and offers high throughput, simple operation, low cost, and good reliability, making it well suited to high-throughput genome-wide methylation detection in non-model organisms.

Description

A high-throughput genome-wide DNA methylation detection technology

Technical field

The invention belongs to the technical field of DNA methylation detection, and in particular relates to a high-throughput genome-wide DNA methylation detection technology.

Background art

DNA methylation is an important epigenetic modification of eukaryotic genomic DNA: under the action of a DNA methyltransferase (DNMT), the methyl group of S-adenosylmethionine (SAM) is covalently attached to a cytosine of the DNA molecule to form 5-methylcytosine (5-mC). DNA methylation plays an important role in biological processes such as the maintenance of normal cell function in higher organisms, genomic imprinting, embryonic development, aging, and the development of human tumors. In invertebrates, genomic DNA methylation regulates gene expression patterns and thereby takes part in the organism's adaptation to its environment. Obtaining methylation data for all cytosine sites across the whole genome is therefore of great significance for studying the spatiotemporal specificity of epigenetic regulation.

As methylation research has deepened, a variety of methylation analysis methods have been developed to meet different research needs. For genome-wide DNA methylation studies, detection relies mainly on microarray-based screening of genome-wide methylation sites and on methylation profiling with high-throughput sequencing platforms. Microarray technology is a relatively complete and mature tool for methylation research in model organisms, offering high coverage, convenience, and good cost performance. For example, the human genome-wide methylation array Human Methylation HD 450 from Illumina contains 450,000 CpG sites and covers all NCBI-annotated genes; its workflow is simple and it detects methylation sites accurately at high throughput. For non-model organisms with relatively scarce genetic information, however, arrays are expensive to produce and offer little flexibility in the choice of methylation sites, so existing array platforms are difficult to use for high-throughput genome-wide methylation research in these species.

As the throughput of next-generation sequencing has risen and its cost has fallen, many genome-wide methylation detection methods based on next-generation sequencing platforms have come into use. These include whole genome bisulfite sequencing (BS-seq), reduced representation bisulfite sequencing (RRBS), methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA binding domain sequencing (MBD-seq), and methylation-sensitive restriction enzyme sequencing (MRE-seq). Each technique has its own advantages, disadvantages, and scope of application. Bisulfite-based methods, although regarded as the gold standard for DNA methylation detection, require a complicated bisulfite treatment and carry a high sequencing cost, so they are not suitable for species with large genomes such as shellfish. MeDIP-seq performs high-throughput sequencing of methylated genomic regions enriched with a specific antibody to obtain genome-wide methylation sites, but it requires a large amount of DNA and the antibodies are expensive.

Summary of the invention

The purpose of the present invention is to provide a high-throughput genome-wide DNA methylation detection method, namely a low-cost, simple, and rapid method suitable for non-model organisms, so as to remedy the deficiencies of the prior art.

The genome-wide DNA methylation detection method of the present invention comprises the following steps:

1) Genomic DNA is digested with the endonuclease FspEI to obtain digested fragments;

2) Adapters are ligated to both ends of the digested fragments to serve as binding sites for the amplification primers;

3) The adapter-ligated fragments are amplified with primers in a first round of PCR, thereby enriching the fragments with correctly ligated adapters;

4) The first-round PCR product is gel-purified and then amplified with primers in a second round of PCR, which introduces a barcode to construct the sequencing library;

5) The sequencing library is sequenced, and the sequencing data are analyzed to obtain genome-wide methylation information.

The adapters in step 2) are adapters slx1 and slx2. The two nucleotide fragments constituting slx1 have the sequences 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′ (SEQ ID NO: 1) and 3′-CGAGAAGGCTAGANNNNN-5′ (SEQ ID NO: 2), where N is any one of the bases A, T, G, and C;

The two nucleotide fragments constituting slx2 have the sequences 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′ (SEQ ID NO: 3) and 3′-CGAGAAGGCTAGANNNNN-5′ (SEQ ID NO: 2).
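The adapter description above implies that the defined (non-N) part of the shared bottom strand pairs with the 3′ end of each top strand. The following minimal sketch checks this for the sequences as printed in the text; the function name `paired_stem` is hypothetical and introduced only for illustration, and the check covers sequence complementarity only, not ligation behavior.

```python
# Watson-Crick complement table; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

SLX1_TOP = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"   # SEQ ID NO: 1, written 5'->3'
SLX2_TOP = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # SEQ ID NO: 3, written 5'->3'
BOTTOM_3_TO_5 = "CGAGAAGGCTAGANNNNN"             # SEQ ID NO: 2, written 3'->5' as in the text


def paired_stem(top: str, bottom_3_to_5: str) -> str:
    """Return the 3' segment of `top` that pairs with the defined part of the
    bottom strand, or an empty string if the two strands do not pair."""
    fixed = bottom_3_to_5.rstrip("N")       # drop the degenerate positions
    # The bottom strand is written 3'->5', so it already lies antiparallel
    # under the top strand; a plain complement gives the expected top suffix.
    expected = fixed.translate(COMPLEMENT)
    return expected if top.endswith(expected) else ""


for name, top in (("slx1", SLX1_TOP), ("slx2", SLX2_TOP)):
    stem = paired_stem(top, BOTTOM_3_TO_5)
    print(name, "stem:", stem, "length:", len(stem))
```

Both adapters share the same double-stranded stem, which is consistent with one bottom-strand oligonucleotide (SEQ ID NO: 2) being reused for slx1 and slx2.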

The primers in step 3) are Slx-Primer 1 and Slx-Primer 2, with the following nucleotide sequences:

Slx-Primer 1: 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3′ (SEQ ID NO: 4);

Slx-Primer 2: 5′-GTGACTGGAGTTCAGACGTGT-3′ (SEQ ID NO: 5);

The primers in step 4) are Slx-Primer 1 and Slx-Index Primer, with the following nucleotide sequences:

Slx-Primer 1: 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3′ (SEQ ID NO: 4);

Slx-Index Primer: 5′-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT-3′ (SEQ ID NO: 6), where N is any one of the bases A, T, G, and C.

The method locates methylation sites accurately, enriches genomic methylation sites by enzymatic digestion, and sequences and quantifies the methylation tags directly, which effectively reduces sequencing cost. The experimental workflow takes only two days and offers high throughput, simple operation, low cost, and good reliability, making it well suited to high-throughput genome-wide methylation detection in non-model organisms.

Description of the drawings

Figure 1: Schematic diagram of the workflow and principle of the genome-wide methylation detection method of the present invention.

Detailed description of the embodiments

The present invention provides MethylRAD-Seq (Methylation-dependent restriction-site associated DNA sequencing), a new genome-wide methylation detection and analysis technology based on a high-throughput sequencing platform. By combining a methylation-dependent endonuclease with high-throughput sequencing, it can discover methylation sites across the whole genome on a large scale without prior genomic information, and it detects the methylated cytosine sites directly and precisely. The principle is as follows: genomic DNA is digested with the methylation-dependent restriction enzyme FspEI, which recognizes methylation at CG and CHG sites in the genome; digestion produces equal-length tags, each with a methylated site at its core, and high-throughput sequencing of the tag library yields sequence information for methylation sites across the whole genome. Library construction is simple and fast, and genome-wide DNA methylation analysis can be performed on multiple samples at the same time. In addition, the adapters and primers used in the invention act on genomic DNA efficiently. For non-model organisms with relatively little epigenetic background information, the method is a low-cost, easy-to-operate, high-throughput approach to genome-wide methylation analysis.
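To make the CG and CHG site contexts concrete, the sketch below classifies the sequence context of a cytosine using the standard IUPAC convention that H stands for A, C, or T. The function name `cytosine_context` is hypothetical; it looks only at the sequence and says nothing about whether the cytosine is actually methylated or whether FspEI would cut there.

```python
def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at index i of a 5'->3' sequence.

    Returns "CG", "CHG", "CHH", or "unknown" when the downstream bases are
    missing or ambiguous. H means A, C, or T (IUPAC).
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is {seq[i]!r}, not a cytosine")
    downstream = seq[i + 1:i + 3]          # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return "unknown"


print(cytosine_context("ACGT", 1))   # CG
print(cytosine_context("ACAGT", 1))  # CHG
print(cytosine_context("ACATT", 1))  # CHH
```

Per the description above, tags from the first two contexts are the ones this method is expected to capture.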

The terms used in the present invention are defined as follows:

1. Endonuclease: a nucleic acid hydrolase that hydrolyzes phosphodiester bonds inside a nucleic acid chain to produce oligonucleotides. The FspEI endonuclease used in the present invention is a methylation-dependent endonuclease: it cleaves DNA only when the cytosine at its recognition site is methylated. The enzyme was purchased from NEB (New England Biolabs).

2. Adapter (adaptor DNA): a short synthetic DNA fragment that contains a restriction site and can be matched to a blunt end or a cohesive end. Adapter DNA is often used to join a blunt-ended DNA to a cohesive-ended DNA. Adapter DNA is also ligated to cohesive ends to give an unknown DNA fragment a known sequence, from which primers can be designed to amplify the unknown fragment.

3. N is any one of the bases A, T, G, and C, where A, T, G, and C represent the four deoxynucleotides that make up a DNA molecule.

4. Barcode: a short characteristic sequence. When multiple samples are sequenced together at high throughput, sequencing the specific short sequence (the barcode) carried on each read identifies the sample from which the read originated.

The method of the present invention comprises the following steps:

1) Preparation of genomic DNA: extract the organism's genomic DNA and store it at 4°C until use.

The extracted genomic DNA is digested with the methylation-dependent endonuclease FspEI to obtain methylation tags across the whole genome. The 20 μl digestion reaction contains 300 ng genomic DNA, 4 U FspEI endonuclease (NEB), 1× NEBuffer 4, 1× Enzyme Activator Solution, and 1× BSA, and is incubated at 37°C for 4 hours.

2) Design adapters with cohesive ends and ligate them to the tags

Each tag produced by the digestion carries a 4-base overhang at its 5′ end, so the adapters Slx-Adaptor 1 (Slx-Ad1) and Slx-Adaptor 2 (Slx-Ad2) are designed with 4 degenerate bases at the 3′ end. The 20 μl ligation reaction contains 10 μl of the digestion product from the previous step, 800 U T4 DNA ligase (NEB), 1× T4 Ligase Buffer, 4 μM Slx-Ad1, 4 μM Slx-Ad2, and 20 mM adenosine triphosphate (ATP); ligation is carried out at 4°C for 16 h.

The adapter sequences are listed in Table 1.

3) First-round PCR amplification to enrich the tags. The 20 μl PCR reaction contains 7 μl of adapter-ligated fragments as template, 4 μM Slx-Primer 1 (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3′), 4 μM Slx-Primer 2 (5′-GTGACTGGAGTTCAGACGTGT-3′), 0.3 mM dNTPs, 0.4 U Phusion high-fidelity DNA polymerase (NEB), and 1× HF buffer. Cycling conditions are 98°C denaturation for 5 s, 60°C annealing for 20 s, and 72°C extension for 10 s for 14-18 cycles, followed by a final extension at 72°C for 5 min. The PCR product is checked by 8% non-denaturing polyacrylamide gel electrophoresis; the amplified product is about 120 bp and is recovered by excising it from the gel.

4) Second-round PCR amplification with barcode-specific primers

To allow methylation detection by pooled sequencing of multiple individuals, each individual is given a different barcode, introduced through a different primer in the PCR reaction. The 20 μl PCR reaction contains 25 ng of purified first-round PCR product, 4 μM Slx-Primer 1 (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3′), 4 μM Slx-Index Primer (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT-3′, where NNNNNN varies with the barcode sequence), 0.3 mM dNTPs, 0.4 U Phusion high-fidelity DNA polymerase (NEB), and 1× HF buffer. Cycling conditions are 98°C denaturation for 5 s, 60°C annealing for 20 s, and 72°C extension for 10 s for 5-7 cycles, followed by a final extension at 72°C for 5 min. Three tubes are amplified in parallel. The PCR product is checked by 8% non-denaturing polyacrylamide gel electrophoresis; the amplified product is about 150 bp and is recovered and purified with the QIAGEN PCR product purification kit. Sequencing is performed on the Solexa HiSeq 2000 platform by a sequencing service provider.
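The product sizes quoted for the two PCR rounds (about 120 bp, then about 150 bp) can be sanity-checked from the primer sequences alone. The sketch below computes how many bases the Slx-Index Primer adds beyond the Slx-Primer 2 binding site; the helper name `added_length` is hypothetical, and the calculation assumes only that the index primer anneals through its 3′ portion, which is identical to Slx-Primer 2.

```python
SLX_PRIMER_2 = "GTGACTGGAGTTCAGACGTGT"  # SEQ ID NO: 5
SLX_INDEX_PRIMER = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT"  # SEQ ID NO: 6


def added_length(outer_primer: str, inner_primer: str) -> int:
    """Number of bases the outer primer adds 5' of the inner primer sequence."""
    if not outer_primer.endswith(inner_primer):
        raise ValueError("outer primer does not end with the inner primer sequence")
    return len(outer_primer) - len(inner_primer)


extra = added_length(SLX_INDEX_PRIMER, SLX_PRIMER_2)
print("bases added in round 2:", extra)
print("expected round-2 size from a ~120 bp round-1 product:", 120 + extra)
```

The 30 added bases (a 24-base tail plus the 6-base barcode) account for the shift from roughly 120 bp to roughly 150 bp between the two rounds.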

5) Data analysis:

The method used in this embodiment is as follows:

1. Illumina/Solexa sequencing produces result files in fastq format. The raw sequences are first quality-filtered with the SolexaQA software package, removing reads that contain N and reads in which more than 5 bases have a quality value below 10;

2. The short sequences are clustered with the CD-HIT software to obtain the distinct methylation tags in the sequencing library and the abundance of each representative tag; the number of reads covering a methylation site serves as a measure of the methylation level at that site;

3. The methylation site sequences are aligned to the genome reference sequence with the SOAP software to obtain the genomic origin of each site.
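The filtering and abundance steps above can be illustrated with a small, self-contained sketch. It is not the SolexaQA or CD-HIT implementation: the filter simply applies the two stated rules, and exact-match counting with `Counter` is a simplified stand-in for CD-HIT clustering, which also groups near-identical sequences. The function names and the Phred offset of 33 are assumptions; older Illumina pipelines used an offset of 64.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from four-line FASTQ records."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(rows) - 3, 4):
        yield rows[i + 1], rows[i + 3]


def passes_filter(seq: str, qual: str, min_q: int = 10,
                  max_low: int = 5, offset: int = 33) -> bool:
    """Keep a read unless it contains N or has more than `max_low` bases
    with quality below `min_q`. `offset` is an assumed Phred encoding."""
    if "N" in seq.upper():
        return False
    low = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low <= max_low


def tag_abundance(lines: Iterable[str]) -> Counter:
    """Count identical tag sequences among reads that pass the filter."""
    return Counter(seq for seq, qual in parse_fastq(lines)
                   if passes_filter(seq, qual))


# Hypothetical toy records, not data from the patent.
example = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACNT", "+", "IIII",        # contains N: removed
    "@r3", "ACGT", "+", "IIII",
    "@r4", "GGGGGGG", "+", "!!!!!!I",  # six low-quality bases: removed
]
counts = tag_abundance(example)
print(dict(counts))
```

In this reading, the count for each tag plays the role of the per-site read coverage that the text uses as a proxy for methylation level.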

Table 1. Adapter and primer sequences used in the present invention

The invention is described in detail below through an embodiment that uses the Yesso scallop as an example. For the reagents used, those skilled in the art may choose among existing reagents according to the technical solution of the invention and are not limited to the specific embodiment.

1) Extraction of scallop genomic DNA

Twelve two-year-old individuals each of wild-population Yesso scallop and of the "Haida Golden Scallop" strain were sampled, taking about 0.1 g of adductor muscle from each individual. The tissue was placed in 500 μl STE lysis buffer (NaCl: 100 mM; EDTA: 1 mM, pH 8.0; Tris-HCl: 10 nM, pH 8.0) and minced, then 50 μl of 10% SDS and 5 μl of proteinase K (20 mg/ml) were added and the sample was digested in a 56°C water bath until the tissue fragments were completely lysed and the lysate was clear. An equal volume of saturated phenol (250 μl) plus chloroform/isoamyl alcohol (24:1) (250 μl) was added and the extraction was repeated 3 times. The supernatant was then extracted once with an equal volume of chloroform/isoamyl alcohol (24:1) (500 μl). To the resulting supernatant, 1/10 volume of NaAc (3 M, pH 5.2) (50 μl) and 2 volumes of absolute ethanol stored at -20°C (1000 μl) were added and mixed gently. The sample was precipitated at -20°C for 30 min and centrifuged at 12000 rpm for 10 min to pellet the nucleic acid at the bottom of the tube. The pellet was washed with 70% ethanol (1000 μl) and dried until the ethanol had evaporated completely, then dissolved in 100 μl sterile water with a small amount (1-2 μl) of RNase A and stored at 4°C until use.

Digestion of scallop genomic DNA

The genome was digested with the methylation-dependent endonuclease FspEI to obtain methylation tags across the whole genome. The 20 μl digestion reaction contained 300 ng genomic DNA, 4 U FspEI endonuclease (NEB), 1× NEBuffer 4, 1× Enzyme Activator Solution, and 1× BSA, and was incubated at 37°C for 4 hours.

2) Ligation of adapters to both ends of the digested fragments to serve as binding sites for the amplification primers

Each tag produced by the digestion carries a 4-base overhang at its 5′ end, so the adapters Slx-Ad1 and Slx-Ad2 were designed with 4 degenerate bases at the 3′ end. The 20 μl ligation reaction contained 10 μl of the digestion product from the previous step, 800 U T4 DNA ligase (NEB), 1× T4 Ligase Buffer, 4 μM Adaptor 1, 4 μM Adaptor 2, and 20 mM adenosine triphosphate (ATP); ligation was carried out at 4°C for 16 h.

3) First-round PCR amplification of the adapter-ligated fragments with primers, thereby enriching the fragments with correctly ligated adapters

The 20 μl PCR reaction contained 7 μl of reaction template, 4 μM Slx-Primer 1 (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3′), 4 μM Slx-Primer 2 (5′-GTGACTGGAGTTCAGACGTGT-3′), 0.3 mM dNTPs, 0.4 U Phusion high-fidelity DNA polymerase (NEB), and 1× HF buffer. Cycling conditions were 98°C denaturation for 5 s, 60°C annealing for 20 s, and 72°C extension for 10 s for 14-18 cycles, followed by a final extension at 72°C for 5 min. The PCR product was checked by 8% non-denaturing polyacrylamide gel electrophoresis; the amplified product was about 120 bp and was recovered by excising it from the gel.

4) Second-round PCR amplification of the first-round product with primers, introducing a barcode to construct the sequencing library

To allow methylation detection by pooled sequencing of multiple individuals, each individual was given a different barcode, introduced through a different primer in the PCR reaction. The 20 μl PCR reaction contained 25 ng of purified first-round PCR product, 4 μM Slx-Primer 1 (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3′), 4 μM Slx-Index Primer (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT-3′, where NNNNNN varies with the barcode sequence; the Slx-Index Primer sequences used for the 24 scallop methylation libraries are listed in Table 2), 0.3 mM dNTPs, 0.4 U Phusion high-fidelity DNA polymerase (NEB), and 1× HF buffer. Cycling conditions were 98°C denaturation for 5 s, 60°C annealing for 20 s, and 72°C extension for 10 s for 5-7 cycles, followed by a final extension at 72°C for 5 min. Three tubes were amplified in parallel. The PCR product was checked by 8% non-denaturing polyacrylamide gel electrophoresis; the amplified product was about 150 bp and was recovered and purified with the QIAGEN PCR product purification kit. Sequencing was performed on the Solexa HiSeq 2000 platform by a sequencing service provider.

Table 2. Slx-Index Primer sequences used in the present invention

No.  Barcode  Slx-Index Primer sequence (5′→3′)
1    AACCTG   CAAGCAGAAGACGGCATACGAGATAACCTGGTGACTGGAGTTCAGACGTGT
2    AACTGA   CAAGCAGAAGACGGCATACGAGATAACTGAGTGACTGGAGTTCAGACGTGT
3    ACAGTG   CAAGCAGAAGACGGCATACGAGATACAGTGGTGACTGGAGTTCAGACGTGT
4    AGACTC   CAAGCAGAAGACGGCATACGAGATAGACTCGTGACTGGAGTTCAGACGTGT
5    ATCACG   CAAGCAGAAGACGGCATACGAGATATCACGGTGACTGGAGTTCAGACGTGT
6    ATGTCA   CAAGCAGAAGACGGCATACGAGATATGTCAGTGACTGGAGTTCAGACGTGT
7    CACGAT   CAAGCAGAAGACGGCATACGAGATCACGATGTGACTGGAGTTCAGACGTGT
8    CATCGG   CAAGCAGAAGACGGCATACGAGATCATCGGGTGACTGGAGTTCAGACGTGT
9    CATGGC   CAAGCAGAAGACGGCATACGAGATCATGGCGTGACTGGAGTTCAGACGTGT
10   CCATCT   CAAGCAGAAGACGGCATACGAGATCCATCTGTGACTGGAGTTCAGACGTGT
11   CGACAT   CAAGCAGAAGACGGCATACGAGATCGACATGTGACTGGAGTTCAGACGTGT
12   CGTACG   CAAGCAGAAGACGGCATACGAGATCGTACGGTGACTGGAGTTCAGACGTGT
13   CGTGCA   CAAGCAGAAGACGGCATACGAGATCGTGCAGTGACTGGAGTTCAGACGTGT
14   CTCAGA   CAAGCAGAAGACGGCATACGAGATCTCAGAGTGACTGGAGTTCAGACGTGT
15   CTGTAG   CAAGCAGAAGACGGCATACGAGATCTGTAGGTGACTGGAGTTCAGACGTGT
16   GACATC   CAAGCAGAAGACGGCATACGAGATGACATCGTGACTGGAGTTCAGACGTGT
17   GCACTA   CAAGCAGAAGACGGCATACGAGATGCACTAGTGACTGGAGTTCAGACGTGT
18   GCTCCA   CAAGCAGAAGACGGCATACGAGATGCTCCAGTGACTGGAGTTCAGACGTGT
19   GTAGAG   CAAGCAGAAGACGGCATACGAGATGTAGAGGTGACTGGAGTTCAGACGTGT
20   GTCTAG   CAAGCAGAAGACGGCATACGAGATGTCTAGGTGACTGGAGTTCAGACGTGT
21   TAGTCA   CAAGCAGAAGACGGCATACGAGATTAGTCAGTGACTGGAGTTCAGACGTGT
22   TCGGCA   CAAGCAGAAGACGGCATACGAGATTCGGCAGTGACTGGAGTTCAGACGTGT
23   TGAGTG   CAAGCAGAAGACGGCATACGAGATTGAGTGGTGACTGGAGTTCAGACGTGT
24   TGTAGC   CAAGCAGAAGACGGCATACGAGATTGTAGCGTGACTGGAGTTCAGACGTGT

Note: the barcode sequence is shown in bold in the original table; it is the six bases between CAAGCAGAAGACGGCATACGAGAT and GTGACTGGAGTTCAGACGTGT.
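Since every Slx-Index Primer shares the same flanking sequences, the barcode can be recovered or inserted programmatically. The sketch below shows one way to do this; the function names are hypothetical. It works only on the primer sequences as written and makes no claim about the orientation in which the barcode appears in the sequencer's index read.

```python
PREFIX = "CAAGCAGAAGACGGCATACGAGAT"  # constant 5' part of the Slx-Index Primer
SUFFIX = "GTGACTGGAGTTCAGACGTGT"     # constant 3' part, identical to Slx-Primer 2


def build_index_primer(barcode: str) -> str:
    """Assemble an index primer around a 6-base barcode."""
    if len(barcode) != 6 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be six bases from A, C, G, T")
    return PREFIX + barcode + SUFFIX


def extract_barcode(primer: str) -> str:
    """Return the 6-base barcode embedded in an index primer."""
    if not (primer.startswith(PREFIX) and primer.endswith(SUFFIX)):
        raise ValueError("primer lacks the expected constant flanks")
    barcode = primer[len(PREFIX):len(primer) - len(SUFFIX)]
    if len(barcode) != 6:
        raise ValueError("unexpected barcode length")
    return barcode


primer_1 = "CAAGCAGAAGACGGCATACGAGATAACCTGGTGACTGGAGTTCAGACGTGT"  # Table 2, No. 1
print(extract_barcode(primer_1))
print(build_index_primer("ATCACG"))
```

A round trip through both functions is a quick way to catch transcription errors when ordering a new set of index primers.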

5) The sequencing library was sequenced on the Solexa HiSeq 2000 platform, and the sequencing data were analyzed to obtain genome-wide methylation information.

1. Illumina/Solexa sequencing produces result files in fastq format. The raw sequences of the 24 libraries were first quality-filtered with a custom Perl script, removing reads that contain N and reads in which more than 5 bases have a quality value below 10. The number of high-quality reads obtained from the 24 methylation libraries ranged from 6,749,208 to 14,661,303, and in every library the high-quality reads accounted for more than 99% of the raw reads.

2. The high-quality tag sequences were clustered with the CD-HIT software to obtain the distinct methylation tags in each sequencing library and the abundance of each representative tag, which reflects the methylation level at that site. A total of 98,754 methylation sites were obtained across the genomes of wild-type Yesso scallop and a new mutant strain of Yesso scallop, the "Haida Golden Scallop". Aligning the methylation sites to the genome reference sequence and annotating the genes showed that genes carrying methylation sites are involved in a variety of biological functions and metabolic pathways, suggesting that epigenetic mechanisms may take part in regulating pathways related to growth, reproduction, and immunity in shellfish.

3. Differences in methylation level between wild-type Yesso scallop and the "Haida Golden Scallop" were analyzed across genome-wide sites with the edgeR software, yielding 2452 differentially methylated sites. In the "Haida Golden Scallop", the methylation level at 8 sites correlated strongly with the carotenoid content of the adductor muscle, with correlation coefficients above 0.90; aligning these sites to the genome reference sequence identified several candidate genes related to carotenoid accumulation. To test the reliability of the technology, two sets of technical-replicate experiments were designed, and the Poisson coefficients of both pairs of parallel controls exceeded 0.99, which confirms the stability and reliability of the technology.
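The text reports coefficients above 0.99 between technical replicates and above 0.90 between methylation level and carotenoid content, without stating how they were computed. As one plausible reading, the sketch below computes a Pearson correlation over per-site read counts; treating the reported values as Pearson correlations is an assumption, and the count vectors shown are hypothetical toy values, not data from the patent.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical read counts for the same five sites in two replicate libraries.
replicate_a = [120, 45, 300, 8, 77]
replicate_b = [115, 50, 310, 10, 70]
print(round(pearson(replicate_a, replicate_b), 4))
```

In practice the counts would usually be normalized for library size before comparison, for example as reads per million; that step is omitted here for brevity.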

The experiments show that the technology can be used both to screen methylation sites across the whole genome and to analyze differences in methylation between cells, tissues, or samples, and thus to explore the regulatory mechanisms by which DNA methylation affects gene expression.

The invention offers high throughput, high efficiency, and low cost; it is suitable for analyzing and detecting methylation sites across the whole genome and has good application potential for genome-wide methylation detection in non-model organisms.

Claims

1. A genome-wide DNA methylation detection method, comprising the following steps:

1) digesting genomic DNA with the endonuclease FspEI to obtain digested fragments;

2) ligating adapters to both ends of the digested fragments to serve as binding sites for the amplification primers;

3) amplifying the adapter-ligated fragments with primers in a first round of PCR, thereby enriching the fragments with correctly ligated adapters;

4) gel-purifying the first-round PCR product and amplifying it with primers in a second round of PCR, which introduces a barcode to construct the sequencing library;

5) sequencing the sequencing library and analyzing the sequencing data to obtain genome-wide methylation information;

wherein the adapters in step 2) are adapters slx1 and slx2, the two nucleotide fragments constituting slx1 having the sequences SEQ ID NO: 1 and SEQ ID NO: 2, and the two nucleotide fragments constituting slx2 having the sequences SEQ ID NO: 3 and SEQ ID NO: 2;

the primers in step 3) have the nucleotide sequences SEQ ID NO: 4 and SEQ ID NO: 5;

the primers in step 4) have the nucleotide sequences SEQ ID NO: 4 and SEQ ID NO: 6.