CN113012757B - Method and system for identifying bases in nucleic acids - Google Patents

Method and system for identifying bases in nucleic acids

Info

Publication number
CN113012757B
CN113012757B CN201911331502.1A CN201911331502A CN113012757B CN 113012757 B CN113012757 B CN 113012757B CN 201911331502 A CN201911331502 A CN 201911331502A CN 113012757 B CN113012757 B CN 113012757B
Authority
CN
China
Prior art keywords
image
images
intensity
sequencing
bright spot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911331502.1A
Other languages
Chinese (zh)
Other versions
CN113012757A (en)
Inventor
李林森
金欢
姜泽飞
孙雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genemind Biosciences Co Ltd
Original Assignee
Genemind Biosciences Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genemind Biosciences Co Ltd
Priority to CN201911331502.1A (CN113012757B)
Priority to PCT/CN2020/114355 (WO2021120715A1)
Priority to US17/787,824 (US12211589B2)
Priority to EP20902453.8A (EP4116402A4)
Publication of CN113012757A
Application granted
Publication of CN113012757B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention discloses a method for identifying bases in nucleic acids, a computer-readable storage medium, a computer program product and a system. The method for identifying bases in nucleic acids includes: mapping the coordinates of each bright spot in a bright spot set corresponding to the template onto an image to be inspected, to determine the position of the corresponding coordinates on the image to be inspected; determining the intensity of the signal at that position, the intensity being a corrected intensity; and comparing the intensity of the signal at that position with a first preset value and determining, based on the comparison result, the base type corresponding to the position, thereby achieving base calling. The method can identify bases quickly and accurately and determine the order of the nucleotides/bases of at least a part of the template.

Description

Methods and systems for identifying bases in nucleic acids

Technical field

The present invention relates to the field of data processing, and in particular to a method for identifying bases in nucleic acids, a computer-readable storage medium, a computer program product and a system.

Background art

In the related art, sequencing generally refers to determining the primary structure or sequence of a biopolymer, including nucleic acids such as DNA and RNA; it includes the process of determining the order of the nucleotide bases (adenine A, guanine G, thymine T/uracil U and cytosine C) of a given nucleic acid fragment. Such methods usually involve identifying the base at one or more positions in the nucleic acid, that is, performing base calling, to determine the sequence of the nucleic acid.

The signal, and/or the change in signal intensity, that corresponds to the binding of a nucleotide/base to a specific position of the nucleic acid molecule under test (the template) can indicate the type of base at that position; for example, different bases can be identified by labeling them with different fluorescent molecules. The binding of a nucleotide/base to a specific position of the nucleic acid molecule under test is also called incorporation of the nucleotide/base into that molecule, or base extension, and can be achieved, for example, by polymerization, ligation or hybridization.

Specifically, on a platform that uses an optical imaging system to acquire images of base extension signals multiple times and performs nucleic acid sequencing by processing those images, optical effects, spatial effects and/or chemical reactions, such as chromatic aberration, crosstalk and/or phasing, affect image acquisition, positioning and/or signal intensity, and often make it difficult to identify bases accurately from image processing.

Therefore, how to process the information, including how to correlate images acquired at multiple different time points, so as to determine the nucleotide/base types and order of at least a part of the template effectively and accurately, is a problem that remains to be solved or improved.

Summary of the invention

The embodiments of the present invention aim to solve, at least to some extent, one of the technical problems in the prior art, or at least to provide a useful means. To this end, embodiments of the present invention provide a method for identifying one or more bases in a nucleic acid, a computer-readable storage medium, a computer program product and a system.

A method for identifying one or more bases in a nucleic acid according to an embodiment of the present invention works on images obtained from sequencing and includes: mapping the coordinates of each bright spot in a bright spot set corresponding to the template onto an image to be inspected, to determine the position of the corresponding coordinates on the image to be inspected; determining the intensity of the signal at that position, the intensity being a corrected intensity; and comparing the intensity of the signal at that position with a first preset value and determining, based on the comparison result, the base type corresponding to the position, thereby achieving base calling.

The bright spot set corresponding to the template is constructed from a set of images, each of which contains multiple bright spots; the set of images and the image to be inspected both come from sequencing and correspond to the same field of view; the set of images comes from at least one round of sequencing; and at least some of the signals appear as at least some of the bright spots in the set of images.
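
The three steps above (map coordinates, read the corrected intensity, compare with the first preset value) can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of the embodiments. The assumptions are: one image to be inspected per base type, a pure translation (`offsets`) as the coordinate mapping, images that already hold corrected intensities, and a strict "greater than" comparison. The function name `call_bases` and all parameter names are hypothetical.

```python
import numpy as np

def call_bases(template_spots, images, offsets, first_preset_value):
    """Call one base per template spot.

    template_spots: (N, 2) array of (row, col) coordinates from the spot set.
    images: dict base letter -> 2-D array of corrected intensities
            (assumption: one image to be inspected per base type).
    offsets: dict base letter -> (d_row, d_col) translation that maps
             template coordinates onto that image (assumed mapping).
    first_preset_value: threshold the intensity is compared against.
    """
    calls = []
    for r, c in template_spots:
        called = ""
        for base, img in images.items():
            d_row, d_col = offsets[base]
            # Step 1: map the template coordinate onto the image.
            rr = int(round(float(r + d_row)))
            cc = int(round(float(c + d_col)))
            inside = 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]
            # Steps 2 and 3: read the intensity and compare with the threshold.
            if inside and img[rr, cc] > first_preset_value:
                called += base
        # No signal, or more than one signal, is reported as "N" here.
        calls.append(called if len(called) == 1 else "N")
    return calls
```

Reporting "N" for spots with no signal or with more than one signal is a choice made for this sketch; the text above does not say how such cases are handled.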

Other embodiments of the invention relate to computer-readable media, computer products, computer program products and systems related to the method of the above embodiment.

For example, a computer-readable storage medium according to an embodiment of the present invention stores a program for execution by a computer; executing the program includes carrying out the method for identifying bases in nucleic acids of any of the above embodiments.

A computer product according to an embodiment of the present invention includes the computer-readable storage medium of any of the above embodiments.

A system according to an embodiment of the present invention includes the computer product of any of the above embodiments and one or more processors for executing the program stored in the computer-readable storage medium. Executing the program includes carrying out the base calling method of any of the above embodiments.

A computer program product according to an embodiment of the present invention includes instructions for identifying one or more bases in a nucleic acid; when a computer executes the program, the instructions cause the computer to perform the base calling method of any of the above embodiments.

A system according to an embodiment of the present invention is configured to perform the method for identifying bases in nucleic acids of any of the above embodiments.

A system according to an embodiment of the present invention includes a plurality of modules for performing the steps of the method for identifying bases in nucleic acids of any of the above embodiments.

The method, product and/or system for identifying bases in nucleic acids of any of the above embodiments places no particular restriction on the type or format of the image to be inspected, that is, the raw input data. The image to be inspected may come from any platform that performs nucleic acid sequencing based on optical imaging detection, including but not limited to what are commonly called second-generation and third-generation sequencing platforms, for example one or more series of sequencing platforms from BGI (including Complete Genomics, CG), Illumina (including PacBio, Pacific Biosciences), Thermo Fisher (including Life Technologies), Roche and Helicos.

Base calling with the method, product and/or system of any embodiment of the present invention identifies bases quickly and accurately and determines the order of the nucleotides/bases of at least a part of the template.

Additional aspects and advantages of embodiments of the invention will be set forth in part in the description that follows, will in part become apparent from that description, or may be learned by practice of embodiments of the invention.

Description of the drawings

The above and/or additional aspects and advantages of embodiments of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the drawings, in which:

Figure 1 is a schematic flow chart of a method for identifying one or more bases in a nucleic acid according to an embodiment of the present invention;

Figure 2 shows the spectral curves of four fluorescent dyes according to an embodiment of the present invention;

Figure 3 is a schematic diagram of a 5*5 matrix according to an embodiment of the present invention;

Figure 4 is a schematic diagram of a 5*5 convolution kernel according to an embodiment of the present invention;

Figure 5 is a schematic comparison of an image before and after convolution according to an embodiment of the present invention;

Figure 6 is a schematic diagram showing that the pixel values of multiple pixels along specified directions of a 5*5 matrix vary monotonically, according to an embodiment of the present invention;

Figure 7 is a schematic diagram of the process of dividing an image into multiple blocks and determining the block-to-block offsets in order to align the images of one round for one field of view, according to an embodiment of the present invention;

Figure 8 is a schematic diagram of the offsets between at least some of the corresponding block pairs after two images are divided into blocks of size 100*100, according to an embodiment of the present invention;

Figure 9 is a schematic diagram of the process of constructing the bright spot set corresponding to the template from a set of images of one round of sequencing, according to an embodiment of the present invention;

Figure 10 shows the pairwise crosstalk scatter plots of the four images of one field of view according to an embodiment of the present invention;

Figure 11 is an A-G crosstalk scatter plot according to an embodiment of the present invention, with A on the abscissa and G on the ordinate;

Figure 12 is a fitted curve of the A-T signal intensity according to an embodiment of the present invention;

Figure 13 is a schematic diagram of the A-T signal intensity before and after correction according to an embodiment of the present invention;

Figure 14 is a schematic diagram of the pairwise crosstalk among the four images of one round of sequencing of one field of view after chromatic aberration correction, according to an embodiment of the present invention;

Figure 15 is a schematic diagram of the signal crosstalk between the first and second rounds (cycle1 and cycle2) for specific bases in one field of view according to an embodiment of the present invention; from top to bottom and from left to right are the phasing scatter plots of A, C, G and T; in each phasing scatter plot, the abscissa is the relative signal intensity of that base in cycle1 and the ordinate is the relative signal intensity of the same base in cycle2;

Figure 16 is a schematic diagram of the signal crosstalk between the thirtieth and thirty-first rounds (cycle30 and cycle31) for specific bases in one field of view according to an embodiment of the present invention; from top to bottom and from left to right are the phasing scatter plots of A, C, G and T; in each phasing scatter plot, the abscissa is the relative signal intensity of that base in cycle30 and the ordinate is the relative signal intensity of the same base in cycle31;

Figure 17 is a schematic diagram of the signal crosstalk between the sixtieth and sixty-first rounds (cycle60 and cycle61) for specific bases in one field of view according to an embodiment of the present invention; from top to bottom and from left to right are the phasing scatter plots of A, C, G and T; in each phasing scatter plot, the abscissa is the relative signal intensity of that base in cycle60 and the ordinate is the relative signal intensity of the same base in cycle61;

Figure 18 is a schematic diagram of the signal crosstalk between the ninetieth and ninety-first rounds (cycle90 and cycle91) for specific bases in one field of view according to an embodiment of the present invention; from top to bottom and from left to right are the phasing scatter plots of A, C, G and T; in each phasing scatter plot, the abscissa is the relative signal intensity of that base in cycle90 and the ordinate is the relative signal intensity of the same base in cycle91;

Figure 19 is a schematic diagram of the relationship between the phasing ratio or prephasing ratio of the four bases and the number of sequencing rounds according to an embodiment of the present invention; the abscissa is the number of sequencing rounds and the ordinate is the prephasing ratio;

Figure 20 is a phasing scatter plot of A for cycles 30 and 31 according to an embodiment of the present invention;

Figure 21 is a schematic diagram of the system 100 according to an embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.

In the embodiments of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to; a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. Unless otherwise stated, "a set" or "a plurality" means two or more.

Unless otherwise stated, "connected" and "connection" are to be understood broadly: a connection may be fixed, detachable or integral; it may be mechanical, electrical or a communication link; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements or an interaction between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in a given example according to the circumstances.

The present disclosure may repeat reference numerals and/or reference letters in different examples. Such repetition is for simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or configurations discussed.

In the embodiments of the present invention, the terms "sequencing", "nucleic acid sequencing" and "gene sequencing" are interchangeable and refer to determination of a nucleic acid sequence. Sequencing includes sequencing by synthesis (SBS) and/or sequencing by ligation (SBL); DNA sequencing and/or RNA sequencing; and long-fragment sequencing and/or short-fragment sequencing, where long and short are relative: for example, nucleic acid molecules longer than 1 Kb, 2 Kb, 5 Kb or 10 Kb may be called long fragments, and those shorter than 1 Kb or 800 bp may be called short fragments. It also includes double-end sequencing, single-end sequencing and/or paired-end sequencing, where double-end or paired-end sequencing may refer to reading any two segments or parts of the same nucleic acid molecule that do not completely overlap. Sequencing includes the process of binding nucleotides (including nucleotide analogs) to the template and acquiring the corresponding signals emitted.

Sequencing generally includes multiple rounds of sequencing to determine the order of multiple nucleotides/bases on the template. A "round of sequencing" (cycle), also called a "sequencing round", can be defined as one base extension of the four nucleotides/bases; in other words, it can be defined as completing the determination of the base type at any one specified position on the template. For sequencing platforms based on polymerization or ligation reactions, a round of sequencing includes the process of binding the four nucleotides (including nucleotide analogs) to the template once and acquiring the corresponding signals emitted. For platforms that sequence by polymerization, the reaction system includes the substrate nucleotides, a polymerase and the template; a sequence (the sequencing primer) is bound to the template, and, following base pairing and the polymerization reaction, an added substrate nucleotide is attached to the sequencing primer under catalysis by the polymerase, so that the nucleotide binds to a specific position of the template. Generally, a round of sequencing may include one or more base extensions (repeats). For example, when the four nucleotides are added to the reaction system one after another, with base extension and acquisition of the corresponding reaction signal performed for each, a round of sequencing includes four base extensions. As another example, when the four nucleotides are added in any grouping, such as two plus two or one plus three, with base extension and signal acquisition performed for each of the two groups, a round of sequencing includes two base extensions. As a further example, when the four nucleotides are added to the reaction system simultaneously for base extension and signal acquisition, a round of sequencing includes one base extension.

A "bright spot" on an image (spot or peak), also called a "bright point" or "light spot", is a position on the image where the signal is relatively strong, for example stronger than its surroundings, so that it appears on the image as a relatively bright spot or point; a bright spot, or such a position, occupies one or more pixels. The signal of a bright spot/position may come from a target molecule or from non-target material. Detection of bright spots includes detection of optical signals from target molecules, such as extended bases or base clusters.
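
The idea of a position whose signal is stronger than its surroundings can be made concrete with a small Python sketch. It is only one possible reading of the definition, not the detection procedure of the embodiments. It assumes a 3*3 neighbourhood and a hypothetical `min_contrast` parameter, and it marks an interior pixel as a candidate bright spot when it is strictly brighter than all eight neighbours and exceeds their mean by at least `min_contrast`.

```python
import numpy as np

def find_bright_spots(image, min_contrast):
    """Return (row, col) of interior pixels that are local maxima.

    A pixel qualifies when it is strictly brighter than all 8 neighbours
    and exceeds the neighbour mean by at least min_contrast (assumed rule).
    """
    img = np.asarray(image, dtype=float)
    spots = []
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            window = img[r - 1:r + 2, c - 1:c + 2]
            centre = img[r, c]
            # Drop the centre (flat index 4) to keep only the 8 neighbours.
            neighbours = np.delete(window.ravel(), 4)
            if centre > neighbours.max() and centre - neighbours.mean() >= min_contrast:
                spots.append((r, c))
    return spots
```

Such a test finds candidates only; as the definition notes, a bright spot may also come from non-target material, so further filtering would still be needed.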

"Chromatic aberration" (CA) is the optical phenomenon in which a lens cannot focus light of all wavelengths at the same point [Max Born; Emil Wolf. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light (7th Edition). Cambridge University Press. October 13, 1999: 334. ISBN 0521642221.]. In imaging, chromatic aberration means that the colors of the spectrum are not focused at the same point on the optical axis. For a sequencing platform that images the same object (for example one or more nucleic acid molecules) with light of several wavelengths, chromatic aberration at the very least gives the object different positions/coordinates in the images acquired at different wavelengths; in other words, the object does not actually move, but because of chromatic aberration it appears to have moved between the images taken at different wavelengths.
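
The apparent displacement between images taken at different wavelengths can be estimated in many ways. The Python sketch below shows one simple approach, which is an assumption of this example and not the alignment procedure of the embodiments: a brute-force search over integer translations that maximises the cross-correlation between a reference image and a second image. The function name `estimate_shift` and the `max_shift` parameter are hypothetical, and `np.roll` wraps pixels around the image border, which is acceptable only for small shifts on images with empty borders.

```python
import numpy as np

def estimate_shift(reference, moving, max_shift=3):
    """Integer (d_row, d_col) that best aligns `moving` to `reference`.

    Applying np.roll(moving, (d_row, d_col), axis=(0, 1)) gives the image
    with the highest cross-correlation score within +/- max_shift pixels.
    """
    best, best_score = (0, 0), -np.inf
    for d_row in range(-max_shift, max_shift + 1):
        for d_col in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (d_row, d_col), axis=(0, 1))
            score = float(np.sum(reference * shifted))
            if score > best_score:
                best, best_score = (d_row, d_col), score
    return best
```

Real chromatic shifts may vary across the field of view and need not be whole pixels, so a single global integer offset is a deliberate simplification.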

"Crosstalk" (also laser-crosstalk or spectra-crosstalk), also called "spectral crosstalk" or "spectral overlap", is the phenomenon in which the signal corresponding to one base spreads into the signal of another base. For sequencing platforms that identify different bases by labeling them with different fluorescent molecules, if the emission spectra of two or more of the selected fluorescent molecules overlap, the signal of one fluorescent molecule may be detected spreading into another fluorescence channel within a round of sequencing.
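
A common way to reason about crosstalk is a linear mixing model: the intensities observed in the channels are a fixed linear combination of the true per-dye signals. The Python sketch below illustrates that model and its inversion. The linear model, the `mixing` matrix and the function name `unmix` are assumptions of this example and are not taken from the embodiments; in practice the mixing coefficients would have to be estimated from data.

```python
import numpy as np

def unmix(observed, mixing):
    """Recover per-dye signals from crosstalk-affected channel intensities.

    observed: (n_spots, n_channels) array of measured intensities.
    mixing:   (n_channels, n_channels) matrix where mixing[i, j] is the
              fraction of dye j's signal that appears in channel i
              (assumed known and invertible).
    """
    observed = np.asarray(observed, dtype=float)
    mixing = np.asarray(mixing, dtype=float)
    # observed.T = mixing @ true.T, so solve the linear system for true.T.
    return np.linalg.solve(mixing, observed.T).T
```

With two channels, for example, an entry `mixing[1, 0] = 0.3` would mean that 30% of the first dye's signal also shows up in the second channel.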

"Phase dephasing", "phase imbalance", "dephasing" and "phase difference" refer to the phenomenon in which the reactions of the members of a population, for example the nucleic acid molecules in a nucleic acid cluster, are not synchronized during a chemical reaction. It includes falling behind (phasing, or sequence lag) and running ahead (prephasing, or sequence lead). In sequencing platforms that identify different bases by labeling them with different fluorescent molecules, it shows up as the fluorescent molecule corresponding to the base at a specific position giving a non-zero signal in more than one round of sequencing. Generally, sequencing uses nucleotides that carry a fluorescent label and a blocking group; the blocking group on a nucleotide prevents other nucleotides from binding to the next position of the template. The blocking group is, for example, an azide attached at the 3' position of the sugar of the nucleotide. Both premature loss of the blocking group and failure to remove it before the next base extension cause dephasing.
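
The effect of lag and lead on a cluster can be illustrated with a toy simulation. The Python sketch below assumes, purely for illustration, that in every round each strand independently fails to extend with probability `lag`, extends by two bases with probability `lead`, and otherwise extends by exactly one base. This simple model and the name `phase_distribution` are assumptions of the example and are not the model of the embodiments.

```python
import numpy as np

def phase_distribution(n_cycles, lag, lead):
    """Fraction of strands in a cluster at each extension length.

    Returns dist where dist[j] is the fraction of strands that have been
    extended by j bases after n_cycles rounds (toy per-round model).
    """
    dist = np.zeros(2 * n_cycles + 1)
    dist[0] = 1.0
    for _ in range(n_cycles):
        nxt = lag * dist                            # strands that did not extend
        nxt[1:] += (1.0 - lag - lead) * dist[:-1]   # normal extension by one base
        nxt[2:] += lead * dist[:-2]                 # prephasing: two bases at once
        dist = nxt
    return dist
```

In this model, strands with `j < n_cycles` lag behind and strands with `j > n_cycles` run ahead, so after `n_cycles` rounds the cluster emits a mixture of the signals of several template positions, which is the non-zero signal in more than one round described above.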

In embodiments of the present invention, the images come from a platform that performs nucleic acid sequencing by optical imaging of a chip. Such platforms include, but are not limited to, one or more series of sequencing platforms from companies or institutions such as BGI/CG (Complete Genomics), Illumina/Solexa, ThermoFisher/Life Technologies/ABI SOLiD, and Roche 454.

On some platforms, multiple sequences (probes or sequencing primers) are immobilized on a solid support such as a chip, and the template (the nucleic acid molecule to be sequenced) is attached to the chip by binding to the probes, for example by hybridization. Optionally, the template is amplified on the chip. The chip carrying the template is then loaded into a sequencing device that includes an imaging system and a fluidics system. By controlling the fluidics system, solutions containing polymerase and nucleotides, respectively, are delivered to the chip, and a controlled polymerase chain reaction is carried out under suitable conditions. For example, the nucleotides in the delivered solution include modified nucleotides carrying a blocking group and a fluorescent molecule. Following base complementarity and catalyzed by the polymerase, a modified nucleotide binds to a specific position of a template, and its blocking group prevents other nucleotides (including modified nucleotides) from binding to the next position of that template. The imaging system then excites the fluorescent molecules so that they emit fluorescent signals, and collects those signals, for example by photographing the reaction area on the chip, to obtain images. Finally, a cleavage reagent is delivered through the fluidics system to remove the blocking group and the fluorescent molecule from the modified nucleotide bound to the template. This completes one base extension; solutions containing polymerase and nucleotides, respectively, are then delivered to the chip again and the base extension reaction is repeated. Based on the captured images, together with the time order of the exposures and/or the type of base added, the type of nucleotide/base bound at the specific position of the template in each cycle is determined, that is, the nucleotides/bases at those positions of the template are determined.

Because the efficiency of each step of a biochemical reaction does not reach 100%, residual signals are to be expected. Even if modified nucleotides not bound to the template are cleared before signal acquisition, for example by washing the reaction area on the chip with a buffer that does not affect base extension, a position that appears as a bright spot on a captured image may correspond to a modified nucleotide bound to the template, to a modified nucleotide or fluorescent molecule that is not bound to the template but was not removed, or to a signal emitted by other non-target substances present in the detection area of the chip.

In one embodiment of the present invention, the images come from a second-generation sequencing platform, such as the Illumina HiSeq/MiSeq series or the BGI MGISeq series. The input raw data are parameters such as the position and intensity of the collected signals, including pixel-related information of the images. Detection of a "bright spot" on an image includes detection of the optical signal corresponding to a nucleic acid molecule cluster.

Referring to Figure 1, a method for identifying one or more bases in a nucleic acid according to an embodiment of the present invention works by examining images obtained from sequencing and includes: S11, mapping the coordinates of each bright spot in a bright spot set corresponding to the template onto an image under test, and determining the positions of the corresponding coordinates on the image under test; S21, determining the intensity of the signal at each of those positions on the image under test, the intensity being a corrected intensity; and S31, comparing the intensity of the signal at each of those positions with a first preset value, and determining the base type corresponding to the position from the result of the comparison, thereby achieving base identification.
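Steps S11 to S31 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `call_positions`, the identity coordinate mapping with nearest-pixel lookup, the use of raw pixel values in place of the corrected intensity (the correction itself is not reproduced), and the choice of `>=` for the comparison with the first preset value.

```python
import numpy as np

def call_positions(image, spot_coords, preset_value):
    """Return True for each template spot whose signal reaches preset_value.

    image        : 2-D array, the image under test (one channel / base type)
    spot_coords  : iterable of (row, col) coordinates from the template spot set
    preset_value : hypothetical stand-in for the "first preset value"
    """
    image = np.asarray(image)
    # S11: map template coordinates onto the image (identity mapping assumed),
    # rounding to the nearest pixel and clamping to the image bounds.
    coords = np.rint(np.asarray(spot_coords, dtype=float)).astype(int)
    rows = np.clip(coords[:, 0], 0, image.shape[0] - 1)
    cols = np.clip(coords[:, 1], 0, image.shape[1] - 1)
    # S21: read the signal intensity at each mapped position
    # (raw values here; the method uses a corrected intensity).
    intensities = image[rows, cols].astype(float)
    # S31: compare with the preset value.
    return intensities >= preset_value
```

Running this once per channel image would give, for every template spot, a yes/no call for the base type associated with that channel.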

The bright spot set corresponding to the template is constructed from a set of images, each of which contains multiple bright spots. The set of images and the image under test all come from sequencing and correspond to the same field of view (FOV). The set of images comes from at least one round of sequencing, and at least some of the signals appear as at least some of the bright spots on the set of images.

The method can identify bases quickly and accurately, and can therefore quickly and accurately determine the order of nucleotides/bases in at least part of the template sequence.

Specifically, in S11, the bright spot set corresponding to the template includes multiple bright spots corresponding to the template, together with the intensity and coordinate information of each bright spot.

Coordinate mapping establishes a mapping relationship between a source image, such as the bright spot set corresponding to the template, and a target image, such as the image under test. The mapping relationship includes determining the coordinate position, in the mapped image, of any bright spot of the source image.

This embodiment places no restriction on the method of determining coordinates or on the method of implementing coordinate mapping. Coordinate mapping can be implemented, for example, with the remap function of OpenCV. As for determining the coordinates of a bright spot, a bright spot on an image usually occupies one or more pixels; the coordinates of one of those pixels can be taken as the coordinates of the bright spot, or the sub-pixel center of the bright spot can be determined, for example by quadratic function interpolation, and taken as its coordinates.
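As an illustration of quadratic function interpolation, the sketch below fits a parabola through three neighboring pixel values along each axis and takes its vertex as the sub-pixel center. The function names and the choice of a separable, three-point fit are assumptions made for illustration; the text does not specify the exact interpolation scheme.

```python
import numpy as np

def subpixel_offset(left, center, right):
    """Vertex offset (in pixels, relative to the center sample) of the parabola
    through (-1, left), (0, center), (1, right)."""
    left, center, right = float(left), float(center), float(right)
    denom = left - 2.0 * center + right
    if denom == 0.0:          # flat neighborhood: no refinement possible
        return 0.0
    return 0.5 * (left - right) / denom

def subpixel_center(image, row, col):
    """Refine an integer peak position (row, col) to sub-pixel precision.
    Assumes the peak does not lie on the image border."""
    d_row = subpixel_offset(image[row - 1, col], image[row, col], image[row + 1, col])
    d_col = subpixel_offset(image[row, col - 1], image[row, col], image[row, col + 1])
    return row + d_row, col + d_col
```

For a spot whose profile is exactly parabolic near its maximum, the fit recovers the true center; for real spots it is an approximation.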

Specifically, in some embodiments, the input image under test may be a 16-bit TIFF image of 512*512 or 2048*2048 pixels, and the TIFF image may be a grayscale image. For a grayscale image, the pixel value is the gray value. The input image may also be a color image, in which each pixel has three pixel values; the color image can be converted into a grayscale image before subsequent processing and detection, to reduce the computation and complexity of image processing. A non-grayscale image may be converted into a grayscale image using, without limitation, a floating-point method, an integer method, a shift method, or an averaging method.
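The four conversion methods named above can be sketched as follows. The text gives no coefficients, so the weights used here (0.299, 0.587, 0.114 and their integer and shift approximations) are an assumption: they are the commonly used luma weights, not values taken from the source. The function name `to_gray` is likewise illustrative.

```python
import numpy as np

def to_gray(rgb, method="float"):
    """Convert an (..., 3) RGB array to grayscale with one of four methods."""
    rgb = np.asarray(rgb)
    # Work in 64-bit integers so that sums and products cannot overflow.
    r, g, b = (rgb[..., i].astype(np.int64) for i in range(3))
    if method == "float":      # floating-point weights
        return 0.299 * r + 0.587 * g + 0.114 * b
    if method == "integer":    # same weights scaled by 1000, with rounding
        return (299 * r + 587 * g + 114 * b + 500) // 1000
    if method == "shift":      # weights scaled by 256, division done by a shift
        return (77 * r + 151 * g + 28 * b) >> 8
    if method == "average":    # plain mean of the three channels
        return (r + g + b) // 3
    raise ValueError(f"unknown method: {method}")
```

The integer and shift variants avoid floating-point arithmetic, which is the usual reason for preferring them when speed matters.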

The bright spot set corresponding to the template can be constructed at the time of base identification, or constructed in advance and saved. Here, a set of images collected from at least one round of sequencing is used to construct the bright spot set corresponding to the template in advance, and the set is saved for later use.

In some examples, the four nucleotides carry different labels. During sequencing, the different labels are excited to emit signals of different colors, and different signals correspond to different types of nucleotides/bases. The bright spot set corresponding to the template includes four bright spot sets, one for each of the four nucleotides.

In one example, the bright spot set corresponding to the template is constructed from a set of images from one round of sequencing, as follows. The four nucleotides are added, sequentially or simultaneously, to a reaction system that includes the template and a polymerase, and one round of sequencing is performed to obtain the set of images. The set of images includes a first image, a second image, a third image, and a fourth image, which are collected from the signals emitted when the four nucleotides react, respectively. Bright spot detection is performed on each of the first, second, third, and fourth images to determine the bright spots of each image, including their coordinates. The set of images is aligned so that the bright spots of all images lie in the same coordinate system. The bright spots on the aligned images are merged to obtain a primary bright spot set. From the primary bright spot set, bright spot sets corresponding to each of the four nucleotides are established, that is, templates for the four nucleotides/bases are established.

When the bright spot set corresponding to the template is constructed, bright spot detection and alignment of the set of images may be performed in either order. The alignment may or may not use the bright spots on the images; for example, markers can be placed at specific positions of the detection area, and the images can be aligned using the information of those markers in each image.

One round of sequencing may include four base extensions, for example when the four nucleotides are added to the reaction system one after another and each independently completes a base extension, including collection of the corresponding reaction signal. It may include two base extensions, for example when the four nucleotides are combined in pairs and the nucleotides of each pair enter the reaction system together for base extension. It may also include only one base extension, for example when all four nucleotides undergo base extension in the reaction system at the same time.

In one example, the four nucleotides are added to the reaction system simultaneously, and the corresponding reaction signals are collected with an imaging system to obtain the set of images and/or the image under test. The imaging system includes a first laser, a second laser, a first camera, and a second camera.

Further, the template is DNA, and the four nucleotides carry a first label, a second label, a third label, and a fourth label, respectively, for example four fluorescent molecules whose emission spectra differ or do not completely overlap. In one round of sequencing, the first laser excites the nucleotides, two of the four nucleotides emit a first signal and a second signal, respectively, and the first camera and the second camera operate synchronously to collect the first signal and the second signal, yielding the first image and the second image. The second laser then excites the nucleotides, the other two nucleotides emit a third signal and a fourth signal, respectively, and the first camera and the second camera operate synchronously to collect the third signal and the fourth signal, yielding the third image and the fourth image. The first laser and the second laser may come from two lasers that emit different wavelengths, or from a single laser that emits multiple wavelengths.

Specifically, for example, the four deoxyribonucleotides dATP (sometimes abbreviated A), dTTP (sometimes abbreviated T), dGTP (sometimes abbreviated G), and dCTP (sometimes abbreviated C) carry the four fluorescent dyes ATTO-532, ROX, CY5, and IF700, respectively. The spectra of the four dyes are shown in Figure 2. The dashed curves, from left to right, are the absorption spectra of ATTO-532, ROX, CY5, and IF700, with peak wavelengths of 531 nm, 577 nm, 651 nm, and 692 nm, respectively. The solid curves, from left to right, are the emission spectra of ATTO-532, ROX, CY5, and IF700, with peak wavelengths of 551 nm, 602 nm, 670 nm, and 712 nm, respectively. In designing the optical path of the imaging system, and taking the excitation efficiency of the dyes into account, lasers of at least two wavelengths are used to excite the four dyes two at a time in a time-shared manner, and two cameras collect the fluorescence signals in a time-shared manner through a dichroic beam splitter and dual band-pass filters. In other words, the first laser and the second laser can operate asynchronously while the first camera and the second camera operate synchronously, so that excitation of the four dyes and collection of the corresponding signals are achieved efficiently.

The purpose of identifying and detecting bright spots on an image is to detect the signals that come from target molecules. This embodiment of the present invention does not restrict the manner of bright spot detection; for example, the method disclosed in CN107918931A may be used.

In some embodiments, detecting bright spots includes examining each image in the set of images with a k1*k2 matrix, including: determining that a matrix whose relationship midS between center intensity and edge intensity satisfies a first preset condition corresponds to one bright spot. The center intensity reflects the intensity of the central region of the matrix, and the edge intensity reflects the intensity of the edge region of the matrix. One central region and one edge region together form the k1*k2 matrix; k1 and k2 are both natural numbers greater than 1, and the k1*k2 matrix contains k1*k2 pixels.

The values of k1 and k2 depend on the density and distribution of the template molecules on the solid substrate and on the imaging resolution. The k1*k2 matrix is generally expected to be no smaller than one target bright spot, where a target bright spot corresponds to a target signal, or to a target molecule/molecule cluster. Preferably, the k1*k2 matrix is also smaller than the area occupied by two separate bright spots on the image.

In the k1*k2 matrix, k1 and k2 may or may not be equal. Generally, k1 and k2 are each greater than 1 and less than 10.

In one example, the parameters of the imaging system are as follows: the objective is 60x and the size of the electronic sensor is 6.5 μm, so that the smallest size (resolution) visible after the microscope image passes through the electronic sensor is about 0.1 μm. The acquired or input image may be a 16-bit grayscale or color image of 512*512, 1024*1024, or 2048*2048 pixels. One target bright spot corresponds to a single molecule, whose size is usually less than 10 nm. Here, "single molecule" covers one or a few molecules/nucleic acid fragments, generally fewer than 10, for example 1, 2, 3, 4, or 5 molecules. One target bright spot occupies approximately 3*3 pixels on the image.

In another example, the parameters of the imaging system are as follows: the objective is 20x, and the resolution after the microscope image passes through the electronic sensor is about 0.3 μm. The acquired or input image may be a grayscale or color image of 512*512, 1024*1024, 2048*2048, or 2560*2048 pixels. One target bright spot corresponds to one molecule cluster and occupies approximately 5*5 pixels on the image.
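The quoted resolutions are consistent with dividing the sensor size by the objective magnification. The short check below assumes that the 6.5 μm figure is the size of one sensor pixel and that the same sensor is used in the 20x example; neither assumption is stated in the text, and the function name is illustrative.

```python
def object_space_pixel_um(sensor_pixel_um, magnification):
    """Size that one sensor pixel covers on the sample, in micrometers."""
    return sensor_pixel_um / magnification

pixel_60x = object_space_pixel_um(6.5, 60)   # about 0.108 um, quoted as ~0.1 um
pixel_20x = object_space_pixel_um(6.5, 20)   # 0.325 um, quoted as ~0.3 um
```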

k1 and k2 may be odd or even. In some embodiments, k1 and k2 are both odd, which makes it easier to define the central region and the edge region of the matrix and simplifies subsequent calculation.

In one example, k1=k2=3.

The central region and the edge region are defined relative to each other. For example, a region of a certain size centered on the central pixel or central sub-pixel of the matrix can be taken as the central region, and the remaining area then forms the edge region of the matrix.

The intensity, or signal intensity, including the center intensity and edge intensity mentioned here, is reflected on the image and generally relates to the magnitude of pixel values. It may be, for example, the pixel value of one or more pixels, the mean or median of several pixel values, the sum of several pixel values, or a quantity positively correlated with pixel value.

In some examples, the first preset condition is midS≥S1, with midS=midInt-sumInts(1:n)/n, where midInt denotes the center intensity, sumInts(1:n)/n denotes the edge intensity, sumInts(1:n) denotes the sum of the pixel values of the 1st to n-th pixels of the edge region, n is a natural number not less than 4, and S1 is any value in [2,4]. The inventors arrived at this first preset condition by training on a large amount of image data; it is suitable for bright spot detection in images with different signal intensities, bright spot densities, and bright spot distributions from various sequencing platforms.

Specifically, k1 and k2 are both odd numbers greater than 3, and the central region is the 3*3 region centered on the central pixel of the matrix. In one example, referring to Figure 3, k1=k2=5. Figure 3 shows a 5*5 matrix whose central region is the 3*3 region centered on the pixel marked midS. The pixel value of any one pixel of the central region is taken as the intensity of the central region (the center intensity), for example the pixel value of the pixel marked midS in the figure; n is 12, corresponding to the pixels marked 1 to 12 in the figure; and S1 is 2. In this way, bright spots corresponding to target molecules can be detected quickly and effectively, which facilitates construction of the bright spot set corresponding to the template and accurate identification of subsequent bases.
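The midS test for the 5*5 case can be sketched as below. Figure 3 is not reproduced here, so the positions of the 12 edge pixels are an assumption: the sketch takes the outermost ring of the 5*5 window with the four corners excluded. The function names are illustrative, and the window is assumed to lie fully inside the image.

```python
import numpy as np

# Assumed edge layout: the 16 outermost pixels of a 5*5 window minus its 4 corners.
EDGE_OFFSETS = [(dr, dc)
                for dr in range(-2, 3) for dc in range(-2, 3)
                if max(abs(dr), abs(dc)) == 2 and abs(dr) != abs(dc)]

def mid_s(image, row, col):
    """midS = midInt - sumInts(1:n)/n for the 5*5 window centered at (row, col)."""
    img = np.asarray(image, dtype=float)
    edge_sum = sum(img[row + dr, col + dc] for dr, dc in EDGE_OFFSETS)
    return img[row, col] - edge_sum / len(EDGE_OFFSETS)

def is_bright_spot(image, row, col, s1=2.0):
    """First preset condition: midS >= S1, with S1 = 2 as in the example."""
    return mid_s(image, row, col) >= s1
```

Scanning every interior pixel with `is_bright_spot` yields the candidate bright spots for one image.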

In other embodiments, bright spot detection includes: convolving each image in the set of images to obtain a convolved image; finding all pixels in the convolved image that are the peak within a k3*k4 region, where k3 and k4 are both natural numbers greater than 1 and the k3*k4 region contains k3*k4 pixels of the convolved image; and determining that a k5*k6 region centered on a peak pixel and satisfying a second preset condition corresponds to one bright spot. The second preset condition is that the pixel value of the peak pixel of the k5*k6 region is not less than S2, where k5 and k6 are both natural numbers greater than 1 and S2 can be determined from the pixels of the convolved image.
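The peak search and the second preset condition can be sketched as follows, applied to an image that has already been convolved. The sketch assumes k3=k4 (a square window of odd size `k`), treats a pixel as a peak when it equals the maximum of its window, and uses illustrative names (`find_peaks`, `s2`); none of these details is fixed by the text.

```python
import numpy as np

def find_peaks(convolved, k=5, s2=25):
    """Return (row, col) of pixels that are the maximum of their k*k window
    and whose value is not less than s2."""
    img = np.asarray(convolved, dtype=float)
    half = k // 2
    h, w = img.shape
    # Pad with -inf so that border pixels are compared only with real pixels.
    padded = np.pad(img, half, mode="constant", constant_values=-np.inf)
    local_max = np.full_like(img, -np.inf)
    for dr in range(k):
        for dc in range(k):
            local_max = np.maximum(local_max, padded[dr:dr + h, dc:dc + w])
    mask = (img == local_max) & (img >= s2)
    return [(int(r), int(c)) for r, c in np.argwhere(mask)]
```

A flat plateau of equal maximal values would be reported at every pixel of the plateau; a real pipeline would need a tie-breaking rule for that case.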

The image is convolved with a convolution kernel, also called a convolution template, filter, filter template, or scanning window. This embodiment does not restrict how the convolution is implemented; for example, once the kernel is set, the relevant functions of MATLAB can be used. Convolving an image generally involves optionally flipping the convolution template, sliding the template over the original image, multiplying the elements at corresponding positions, and summing the products to obtain the result. Ordinary filtering, for example, can be implemented with a Gaussian template.

In some examples, the target molecule is a nucleic acid molecule cluster, for example one formed from a single nucleic acid molecule by amplification such as strand displacement amplification or bridge amplification, the resolution of the imaging system used for image acquisition is about 0.3 μm, and k3=k4=k5=k6=5. Further, after studying how the shape and/or intensity of a large number of such target molecules vary on images, the inventors designed a 5*5 convolution kernel for this convolution, shown in Figure 4. Each label on the kernel in Figure 4 gives the coordinates/position of the labeled pixel relative to the central pixel, with x denoting the horizontal direction, y the vertical direction, and pixels as the unit. Convolving an image with this 5*5 kernel includes reassigning the value of every pixel in the image. This enhances the difference between the central pixel and the edge pixels (for example the outermost pixels) of a 5*5 region in the image.

Specifically, in one example, based on a large amount of training data, the inventors set the intensity value/pixel value of the positions/pixels without coordinate labels on the kernel shown in Figure 4 to 0. After the convolution operation defined below is applied with this kernel, the intensity/pixel value Ints(x,y) of a pixel in the image, for example the pixel at coordinates (x,y), becomes newInts(x,y), where newInts(x,y)=(12*Ints(x,y)-Edge8Ints(x,y,2))*200/(Ints(x,y)+Edge8Ints(x,y,2)),

Ints(x,y) denotes the pixel value/intensity value of the pixel/position at coordinates (x,y) before convolution. To speed up computation, the range of newInts(x,y) can further be limited to [0,255]: values of newInts(x,y) less than 0 are set to 0, and values greater than 255 are set to 255;

Edge8Ints(x,y,2) denotes the sum of the pixel values/intensity values of the 12 pixels that lie in the 8 directions (8-neighborhood) of the center coordinates (x,y) at a distance of not less than 2 pixels from (x,y). In this example, those 12 pixels are the pixels carrying coordinate labels in Figure 4, and Edge8Ints(x,y,2) can be written as Edge8Ints(x,y,2)=Ints(x-2,y-1)+Ints(x-2,y)+Ints(x-2,y+1)+Ints(x+2,y-1)+Ints(x+2,y)+Ints(x+2,y+1)+Ints(x-1,y-2)+Ints(x,y-2)+Ints(x+1,y-2)+Ints(x-1,y+2)+Ints(x,y+2)+Ints(x+1,y+2), where each term Ints(x+i,y+j) denotes the intensity value/pixel value, before convolution, of the position/pixel at coordinates (x+i,y+j), that is, at (x-2,y-1), (x-2,y), (x-2,y+1), (x+2,y-1), (x+2,y), (x+2,y+1), (x-1,y-2), (x,y-2), (x+1,y-2), (x-1,y+2), (x,y+2), and (x+1,y+2).
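The transform defined by newInts and Edge8Ints can be sketched with NumPy as below. The 12 offsets are the ones listed above. Two details are assumptions made for illustration because the text does not address them: image borders are handled by replicating edge pixels, and a pixel whose denominator Ints+Edge8Ints is zero is assigned 0. The function name `edge_contrast_transform` is illustrative.

```python
import numpy as np

# (dx, dy) offsets of the 12 labeled pixels, as listed for Edge8Ints(x, y, 2).
EDGE12 = [(-2, -1), (-2, 0), (-2, 1), (2, -1), (2, 0), (2, 1),
          (-1, -2), (0, -2), (1, -2), (-1, 2), (0, 2), (1, 2)]

def edge_contrast_transform(image):
    """newInts = (12*Ints - Edge8Ints) * 200 / (Ints + Edge8Ints), clipped to [0, 255]."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")      # border handling: an assumption
    edge_sum = np.zeros_like(img)
    for dx, dy in EDGE12:
        # Shifted view: element (y, x) holds Ints(x + dx, y + dy).
        edge_sum += padded[2 + dy:2 + dy + h, 2 + dx:2 + dx + w]
    denom = img + edge_sum
    safe = np.where(denom == 0, 1.0, denom)   # avoid division by zero
    new = np.where(denom == 0, 0.0, (12 * img - edge_sum) * 200 / safe)
    return np.clip(new, 0, 255)
```

On a uniform background the numerator is zero, so flat regions map to 0 while a pixel brighter than its ring of 12 neighbors maps to a positive value.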

Optionally, Gaussian filtering is applied to the image before this convolution, and the convolution operation above is then applied to the Gaussian-filtered image.

Figure 5 compares an image before and after convolution by the method above. The upper panel is before convolution and the lower panel is after convolution; the boxes mark the same region in both panels and show the change in signal intensity and/or shape.

It will be understood that, as required, for example when the shape and/or intensity variation of the target molecules on the image has different characteristics, the size of the convolution kernel, the values in the kernel, and the value of n in, for example, Edge8Ints(x,y,n) can be adjusted. As a general rule for adjusting n, if the ideal bright spot is known to be m*m in size, n can be set to m/2 rounded down.

Similarly, the values of k3 and k4, or of k5 and k6, depend on the density and distribution of the template molecules on the solid substrate and on the imaging resolution. The k3*k4 or k5*k6 region is generally expected to be no smaller than one target bright spot, where a target bright spot corresponds to a target signal, or to a target molecule/molecule cluster. Preferably, the k3*k4 or k5*k6 region is also smaller than the area occupied by two separate bright spots on the image.

k3 and k4, or k5 and k6, may or may not be equal. Generally, k3, k4, k5, and k6 are each greater than 1 and less than 10.

In some embodiments, k3 equals k4, and/or k5 equals k6.

In some embodiments, k3 and k4 are both odd numbers greater than 1, and/or k5 and k6 are both odd numbers greater than 1. Further, on platforms where one target bright spot corresponds to one molecule cluster, for example where a template is amplified into a molecule cluster that is immobilized on a microsphere or on the chip surface, the cluster is usually several hundred nanometers in size; with an imaging optical path of 20x magnification, k3 and k4 can both be odd numbers greater than 3, and/or k5 and k6 can both be odd numbers greater than 3. This simplifies calculation, facilitates construction of the bright spot set corresponding to the template, and facilitates accurate identification of subsequent bases.

S2 is related to the pixels of the transformed image; for example, S2 can be determined from all pixels of the transformed image. In some embodiments, S2 is not less than the median of all pixels of the convolved image sorted in ascending order of pixel value, and/or not greater than the 80th percentile of all pixels of the transformed image sorted in ascending order of pixel value. In one example, the input image is converted into a 256-color image (16-bit image), and S2 can be set to any value from 19 to 25. In this way, bright spots can be detected effectively.
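As a non-limiting illustration, the bounds on S2 described above can be sketched as follows. The function name s2_bounds and the use of NumPy's default (linear-interpolation) percentile are assumptions made for this sketch and are not part of the described method.

```python
import numpy as np

def s2_bounds(transformed):
    """Return (lower, upper) bounds for S2 from the pixels of the transformed image.

    lower: median of all pixel values; upper: 80th percentile of all pixel values.
    """
    pixels = np.asarray(transformed, dtype=float).ravel()
    return float(np.percentile(pixels, 50)), float(np.percentile(pixels, 80))

# Toy "image" whose pixel values are 0, 1, ..., 100.
image = np.arange(101, dtype=float).reshape(1, 101)
low, high = s2_bounds(image)
```

Any S2 chosen between low and high satisfies both conditions stated in the paragraph above.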

In one example, Gaussian filtering is applied to the original image, followed by the convolution operation described above, to obtain the convolved image. All points with a peak (bright spots) are located on the convolved image, and each peak is required to exceed a specific value, for example any value from 19 to 25; generally, a larger peak indicates a brighter and better-shaped point. Specifically, each pixel of the transformed image has a midS value, and midS is higher at raised positions. Here 25 is used as the filtering threshold, and a position whose value exceeds 25 is regarded as a raised point. Further, for every point that satisfies these conditions, its sub-pixel coordinates are determined on the original image using the 3*3 region center-of-gravity (centroid) method.
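A minimal sketch of a 3*3 region center-of-gravity calculation is given below for illustration. The function name centroid_3x3 and the fallback for border pixels are assumptions of this sketch; the text above does not specify how border pixels are handled.

```python
import numpy as np

def centroid_3x3(image, row, col):
    """Intensity-weighted centroid of the 3*3 window centered on pixel (row, col)."""
    window = np.asarray(image, dtype=float)[max(row - 1, 0):row + 2,
                                            max(col - 1, 0):col + 2]
    total = window.sum()
    if window.shape != (3, 3) or total <= 0:
        # Assumption: at the image border, or for an empty window, keep the integer peak.
        return float(row), float(col)
    rows, cols = np.mgrid[row - 1:row + 2, col - 1:col + 2]
    return float((rows * window).sum() / total), float((cols * window).sum() / total)

# Two equally bright neighboring pixels: the centroid falls halfway between them.
img = np.zeros((5, 5))
img[2, 2] = 4.0
img[2, 3] = 4.0
sub_pixel = centroid_3x3(img, 2, 2)
```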

In some embodiments, after bright spots on the image are detected by the method of any of the above examples, the method further includes screening the detected bright spots based on the intensity of the region of the original image in which each bright spot is located. This removes bright spots that are relatively dim or exceptionally bright, that is, signals that probably do not come, or do not come solely, from the target molecules, which reduces the amount of computation and increases the proportion of high-quality output data.

In still other embodiments, the bright spot detection includes detecting each image of the set of images using a k7*k8 matrix, including: determining that a k7*k8 matrix in which multiple pixels along a specified direction fluctuate monotonically corresponds to one candidate bright spot; and screening the candidate bright spot using the pixels of at least a part of the corresponding k7*k8 matrix to determine the bright spots. k7 and k8 are both natural numbers greater than 1, and the k7*k8 matrix contains k7*k8 pixels.

Similarly, the values of k7 and k8 generally depend on the density and distribution of the template molecules on the solid-phase substrate and on the imaging resolution. It is generally expected that k7*k8 is not smaller than the size of one target bright spot, where a target bright spot corresponds to a target signal or to a target molecule/molecular cluster; preferably, k7*k8 is also smaller than the area occupied by two independent bright spots on the image.

In some examples, k7 is equal to k8, and/or k7 and k8 are both odd numbers greater than 1. This simplifies calculation and facilitates both the construction of the bright spot set corresponding to the template and subsequent base identification.

The specified direction can be any direction passing through the center of the k7*k8 matrix, for example through its central pixel or sub-pixel. Monotonic fluctuation means that the pixel values of the multiple pixels along the specified direction show no fluctuation, symmetric fluctuation, or approximately symmetric fluctuation about the center of the k7*k8 matrix.

In one example, referring to Figure 6, the pixel values of multiple pixels along a specified direction of a 5*5 matrix fluctuate monotonically. A specific specified direction can be the direction shown by any of the arrows; a0, a1, a2 and a3 denote the pixel values of the pixels at which they are located. This matrix corresponds to one candidate bright spot.

Screening the candidate bright spot using the pixels of at least a part of the corresponding k7*k8 matrix further removes bright spots that are relatively dim or exceptionally bright, that is, signals that probably do not come, or do not come solely, from the target molecules. This reduces the amount of computation and increases the proportion of high-quality output data.

For example, the average value or the most frequent value of all pixels, or of the pixels in any one row or column, of the k7*k8 matrix corresponding to a candidate bright spot is taken as the background, and the candidate is screened by comparing the intensity at its center with this background. For instance, the screening condition can be that the intensity at the center of the candidate bright spot is not less than 3 times the background; candidates that satisfy this condition are the bright spots. This increases the proportion of high-quality data in the output data.
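The screening condition above can be sketched as follows, purely by way of illustration. The function name passes_background_filter is hypothetical, and the sketch takes the mean of all pixels of the matrix as the background, which is only one of the options listed above.

```python
import numpy as np

def passes_background_filter(patch, factor=3.0):
    """True if the center intensity of the k7*k8 patch is at least factor * background.

    Assumption: background is the mean of all pixels of the patch.
    """
    patch = np.asarray(patch, dtype=float)
    center = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    background = patch.mean()
    return bool(center >= factor * background)

# 5*5 patches with a flat surround of 10 and different center intensities.
bright = np.full((5, 5), 10.0)
bright[2, 2] = 100.0   # mean 13.6, threshold 40.8: kept
dim = np.full((5, 5), 10.0)
dim[2, 2] = 20.0       # mean 10.4, threshold 31.2: rejected
```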

In some examples, the bright spot detection further includes determining the sub-pixel coordinates of the detected bright spots using the center-of-gravity (centroid) method. In this way, the coordinate information of the bright spots is obtained.

A system that images through an optical path generally cannot avoid chromatic aberration, which typically causes a stationary signal to appear at different positions in multiple images acquired at different time points. In addition, if the sequencing platform relies on relative motion between the imaging system and the chip to acquire images of a field of view on the chip multiple times, the acquisition of images of the same field of view in different sequencing rounds involves mechanical movement of the relevant structures, which also generally causes the same field of view to appear at different positions in images acquired at different time points. Aligning the set of images can correct, at least to some extent, the positional deviations arising from these causes.

In some embodiments, aligning the set of images includes taking the coordinate system of any image in the set, for example the first image, as the reference and converting the coordinate systems of the second, third and fourth images respectively, so that all images in the set share the same coordinate system.

This embodiment of the present invention places no limitation on the method used to convert the coordinate systems; for example, the relevant MatLab functions can be used.

Specifically, in one round of sequencing, the four images of one field of view come from four wavebands of two cameras. Although the optics are adjusted as far as possible, pixel offsets (chromatic aberration) remain between the four images; generally, if the optical setup is unchanged, the offsets caused by the chromatic aberration can be regarded as fixed. If the set of images comes from the first round of sequencing (cycle1) or from the first few rounds, then in those rounds there is generally no crosstalk, or no obvious crosstalk, between certain pairs of the four signals corresponding to the four bases. For example, A, T, G and C carry the fluorescent dyes ATTO-532, ROX, CY5 and IF700, respectively. In any of the first few rounds of sequencing, at one time point the first camera images A while the second camera images G, and at another time point the first camera images T while the second camera images C. In the images/signals acquired in such a round, the A and T signals, or the G and C signals, usually exhibit crosstalk, whereas the C and T signals, or the A and G signals, exhibit no crosstalk or no obvious crosstalk. The absence of obvious crosstalk between C and T means that where a C signal is acquired at a position, no T signal is acquired there (C is bright and T is dark). Consequently, in a given sequencing run, it is generally difficult to determine the fixed offsets from the images of a single one of the first few rounds.

Therefore, in some examples, the set of images is aligned using images from the M-th round of sequencing, where M is, for example, greater than 20, 30 or 50. One round of sequencing generally determines the base type at one position of the template. By the M-th round (cycle M), for example round 20, 50, 80, 100 or 150, cross-color caused by partially overlapping emission spectra of the fluorescent dyes and/or phase imbalance caused by unsynchronized chemical reactions has generally become pronounced through accumulation or superposition, so that crosstalk exists between every pair of the four base signals. The images acquired in the M-th round can then be used to determine the offsets and thereby align the set of images.

In some examples, the images of the M-th round of sequencing include a fifth image, a sixth image, a seventh image and an eighth image, which correspond to the same kinds of nucleotides as the first image, the second image, the third image and the fourth image of the set of images, respectively.

In one example, the set of images used to construct the bright spot set corresponding to the template also comes from the M-th round, in which case the fifth, sixth, seventh and eighth images are the first, second, third and fourth images of the set, respectively.

In one example, the set of images comes from cycle1, and images of the same field of view from round 100 (cycle100) are used to determine the offsets for aligning the set of images. Specifically, converting the coordinate systems of the sixth, seventh and eighth images with the coordinate system of the fifth image as the reference may include: dividing the fifth image and the sixth image in the same manner into a set of blocks of size k9*k10, where k9 and k10 are both natural numbers greater than 30 and a k9*k10 block contains k9*k10 pixels; determining the offset of each block of the sixth image relative to the corresponding block of the fifth image; and aligning the second image with the first image based on these offsets. The third image and the fourth image are aligned with the first image in a similar way, so that the set of images is aligned quickly and accurately.
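The description does not specify how the offset of one block relative to its corresponding block is computed. One possible approach, shown here only as an assumed sketch, is FFT-based circular cross-correlation of the two blocks, taking the location of the correlation peak as the integer offset. The function name block_offset is hypothetical.

```python
import numpy as np

def block_offset(reference_block, moving_block):
    """Integer (d_row, d_col) such that moving_block ~ np.roll(reference_block, (d_row, d_col)).

    Assumption: the offset is estimated from the peak of the circular cross-correlation.
    """
    ref = np.asarray(reference_block, dtype=float)
    mov = np.asarray(moving_block, dtype=float)
    ref = ref - ref.mean()
    mov = mov - mov.mean()
    corr = np.fft.ifft2(np.fft.fft2(mov) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices beyond half the block size represent negative (wrapped) shifts.
    return tuple(int(p) if p <= s // 2 else int(p - s) for p, s in zip(peak, corr.shape))

# Synthetic 100*100 block and a copy shifted by 3 rows down and 2 columns left.
rng = np.random.default_rng(0)
ref_block = rng.random((100, 100))
mov_block = np.roll(ref_block, shift=(3, -2), axis=(0, 1))
estimated = block_offset(ref_block, mov_block)
```

Applying such a function to every pair of corresponding blocks would yield one offset per block.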

k9 and k10 may be equal or unequal. Their values are constrained by the distribution and density of the target molecules/molecular clusters in the detection region and by the imaging resolution; the number of bright spots in one k9*k10 block is expected to be statistically meaningful, for example greater than 30, 50, 100 or 500.

Assuming that the chromatic-aberration-induced offsets between the multiple images of a given field of view in one sequencing round are fixed, it follows that, as long as signal crosstalk exists between every pair of images of that field of view in a round, the images of that round can be used to determine the fixed offsets and thus to align the set of images used to construct the bright spot set corresponding to the template, regardless of the shape these crosstalk signals take on the images. In some cases, grid signals regularly distributed in the horizontal and vertical directions can be provided as feature information (signal sources) on different detection regions, for example on different chips, or on the same detection region. Such feature information can be imaged in every channel or waveband, that is, it is captured regardless of which base signal is being acquired, and the images are easily aligned using this feature information, including its distribution pattern. After the images are divided into multiple blocks, the offsets of each group of corresponding blocks can be determined by aligning this feature information.

When an image is divided into blocks, adjacent blocks may or may not overlap. In one example, adjacent blocks do not overlap and share a common edge or vertex.

Figure 7 illustrates, for one embodiment, the process of dividing images into multiple blocks and determining block-to-block offsets in order to align the images of one round for one field of view. Each black square in the figure represents one block. Specifically, with the coordinate system of a block on the image corresponding to base G (the G image) as the reference, the offsets of the corresponding blocks on the A image, the C image and the T image relative to that block of the G image are determined respectively.

Testing showed that the offsets between corresponding blocks are not constant, that is, two block pairs located at different positions on the images have different offsets. For example, the two blocks in the central region of two images may be offset by 5 pixels while the two blocks in the edge region are offset by 10 pixels; moreover, the offsets of adjacent block pairs differ only slightly. For example, for a 4112*2176 image, the offset along the long side (4112) is 4-5 pixels and the offset along the short side (2176) is about 2-3 pixels. In one example, k9=k10=100; generally, the offset can be regarded as constant within one 100*100 block. Figure 8 shows the offsets between at least some of the corresponding block pairs after two images are divided into 100*100 blocks; the offset table shown in Figure 8 can represent the relationship between the coordinate systems of the two images.
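For illustration, the way such a per-block offset table can relate two coordinate systems is sketched below. The table contents, the function name to_reference, and the sign convention (the stored offset is subtracted to map a coordinate into the reference image) are all assumptions of this sketch.

```python
import numpy as np

BLOCK = 100  # k9 = k10 = 100, as in the example above

def to_reference(row, col, offsets, block=BLOCK):
    """Map a coordinate into the reference coordinate system using its block's offset.

    offsets[i, j] holds the assumed (d_row, d_col) of block (i, j) relative to the reference.
    """
    d_row, d_col = offsets[int(row) // block, int(col) // block]
    return float(row - d_row), float(col - d_col)

# Hypothetical table for a 4112*2176 image: 22 x 42 blocks, one (d_row, d_col) per block.
offsets = np.zeros((22, 42, 2))
offsets[10, 20] = (2.0, 5.0)
mapped = to_reference(1050.0, 2050.0, offsets)
```

Because the offset is treated as constant within a block, a single table lookup per bright spot is sufficient.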

In some embodiments, merging the bright spots on the aligned set of images includes merging multiple bright spots within a preset range k11*k12 into one bright spot, where k11 and k12 are both natural numbers greater than 1 and k11*k12 contains k11*k12 pixels.

Generally, k11*k12 is set to be no larger than the size of two separate target bright spots; preferably, k11*k12 is set to be no larger than the size of one target bright spot.

In a certain imaging system, for example, the size of the electronic sensor is 6.5 μm, the microscope magnification is 60x, and the resolution is 0.1 μm; a bright spot corresponding to a target molecule, including a molecular cluster, is then generally smaller than 10*10 or 5*5 in size. k11=k12=3 can be set, that is, the preset range is 3*3 for bright spot merging, so that the bright spot set corresponding to the template can be constructed accurately.

Specifically, to merge bright spots within the preset range, a blank set/blank image/blank template (TemplateVec) can first be created, and the bright spots on the first image, the second image, the third image and the fourth image (the A image, C image, G image and T image for short) are then marked onto the blank image in turn. When a bright spot is being marked, if another bright spot is found already present nearby (within the preset range), the intensities of the two bright spots are used as weights to determine the position of the new bright spot that results from merging them. For example, if bright spot 1 has intensity 350 and coordinates (3.0,5.0) and bright spot 2 has intensity 150 and coordinates (4.0,7.0), the two are marked as one new bright spot with intensity 290 and coordinates (3.3,5.6). In this way, the bright spots on the set of images that satisfy the preset condition are merged, which facilitates obtaining the bright spot set corresponding to the template.
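The numerical example above can be reproduced with an intensity-weighted average, as in the following sketch. The text states only that the intensities serve as weights; applying the same weights to the merged intensity itself is an inference made here because it yields the stated value of 290. The function name merge_spots is hypothetical.

```python
def merge_spots(spot_a, spot_b):
    """Merge two bright spots given as (intensity, x, y) using their intensities as weights."""
    (ia, xa, ya), (ib, xb, yb) = spot_a, spot_b
    total = ia + ib
    intensity = (ia * ia + ib * ib) / total   # intensity-weighted mean of the intensities
    x = (ia * xa + ib * xb) / total           # intensity-weighted mean of the x coordinates
    y = (ia * ya + ib * yb) / total           # intensity-weighted mean of the y coordinates
    return intensity, x, y

# Bright spot 1: intensity 350 at (3.0, 5.0); bright spot 2: intensity 150 at (4.0, 7.0).
merged = merge_spots((350.0, 3.0, 5.0), (150.0, 4.0, 7.0))
```

Here merged evaluates to an intensity of 290 at approximately (3.3, 5.6), matching the values given in the text.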

Referring to Figure 9, which illustrates the process, in one embodiment, of constructing the bright spot set corresponding to the template from a set of images of one round of sequencing: the bright spots on the A image, C image, G image and T image of the set are detected and identified to obtain the bright spot set of each image; with the coordinate system of the G image as the reference coordinate system, the set of images is aligned and the bright spot sets of the images are merged to obtain a primary bright spot set; the coordinate system of the primary bright spot set is then converted back into the original coordinate systems of the A image, C image, G image and T image to obtain the bright spot sets corresponding to the four nucleotides/bases, that is, the templates for the four nucleotides/bases.

In this embodiment, the intensity in S21 is the corrected intensity. In some embodiments, correcting the intensity includes cross-color correction and/or phase correction.

Specifically, before the intensities at the positions of corresponding coordinates on the image to be inspected are corrected, the image to be inspected is aligned with the bright spot set corresponding to the template. This facilitates the subsequent steps.

In one example, A, T, G and C carry the fluorescent dyes ATTO-532, ROX, CY5 and IF700, respectively. During sequencing, lasers of two wavebands are used to excite the four dyes, and after each excitation two cameras acquire the fluorescence signals simultaneously. Figure 10 shows, for this example, the pairwise crosstalk plots of the four images of one field of view in round 50 of sequencing. From top to bottom and left to right, they are: the base A-C crosstalk scatter plot (abscissa: relative intensity of the A signal; ordinate: relative intensity of the C signal), the base A-G crosstalk scatter plot (abscissa: A; ordinate: G), the base A-T crosstalk scatter plot (abscissa: A; ordinate: T), the base C-G crosstalk scatter plot (abscissa: C; ordinate: G), the base C-T crosstalk scatter plot (abscissa: C; ordinate: T), and the base G-T crosstalk scatter plot (abscissa: G; ordinate: T). Each point on a scatter plot represents one position of corresponding coordinates on the image to be inspected. From the two arms of each crosstalk scatter plot and the dispersion of the points, it can be seen that the A signal (A image) is subject to noticeable crosstalk from the T signal and the C signal (C image) is subject to noticeable crosstalk from the G signal: multiple positions of corresponding coordinates on the A image show a clear T signal, and multiple positions of corresponding coordinates on the C image show a clear G signal.

In some examples, the correction of intensity includes cross-color (crosstalk) correction, which is performed based on at least one of the images that come from the same round of sequencing and the same field of view and that correspond to different kinds of nucleotides/bases.

Correcting crosstalk facilitates accurate identification of bases. In some examples, an image Xi and the image to be inspected come from the same round of sequencing and correspond to the same field of view, and the image to be inspected is subject to crosstalk from the signal of the nucleotide corresponding to the image Xi. Performing cross-color correction on the image to be inspected includes: fitting the signals at multiple positions of corresponding coordinates within a specific region of the image to be inspected to obtain a fitting result; and correcting the signals at the positions of corresponding coordinates on the image to be inspected based on the fitting result. In this way, the crosstalk on the image to be inspected from the base corresponding to the image Xi can be eliminated, so that the signals in the image to be inspected correspond, as far as possible, to only one kind of base, which facilitates accurate identification of bases and accurate determination of the nucleotide sequence.

Unless otherwise stated, "AC correction", "A->C" or "A-C" denotes correcting the A-signal crosstalk received at the positions of corresponding coordinates on the C image (that is, correcting the crosstalk of the A signal into the C signal). Similarly, "TA correction" or "T->A" denotes correcting the T-signal crosstalk received at the positions of corresponding coordinates on the A image (that is, correcting the crosstalk of the T signal into the A signal), "CG correction" or "C->G" denotes correcting the C-signal crosstalk received at the positions of corresponding coordinates on the G image (that is, correcting the crosstalk of the C signal into the G signal), and so on.

Pairwise correction of the four-dimensional data gives 12 cases. The correction process can be expressed as a matrix relation [formula not reproduced in the source text] between the observed values and the true (corrected) values, in which the coefficient matrix is the crosstalk correction coefficient matrix. The values in this crosstalk matrix are the fitting results (correction coefficients) for each pair of signals; for example, R_AC denotes the fitting result/correction coefficient used when correcting the A-signal crosstalk received at the positions of corresponding coordinates on the C image.

The specific region can be the entire image to be inspected or a part of it. Preferably, the specific region is selected from at least a part of the central region of the image to be inspected. The central region of an image has its ordinary meaning; for example, for an image of size 4000*2000, the central region can be 3000*1500, 2056*1024, 2000*1500, 1024*1024, 1024*512, 1000*500, 1000*1000, 512*512 or 512*256, etc., and the remaining regions of the image can be called edge regions. Generally, the intensity values at the positions of corresponding coordinates within the central region of the image fluctuate less and appear more tightly clustered on a crosstalk plot, such as the points inside the black circle in the A-G crosstalk scatter plot of Figure 11. Correcting with the fitting result/correction coefficient determined by fitting the intensity values of at least some positions in this region allows the correction to be achieved quickly and accurately.

This embodiment places no limitation on the fitting method; for example, software such as the MatLab cftool curve fitting toolbox, aTool or CurveExpert can be used, and the fitting can be linear or nonlinear. There is no particular limitation on the amount of data, or sample size, used for the fitting, that is, on how many intensities at positions of corresponding coordinates within the specific region of the image are selected. In principle, it suffices to be able to solve for the Y coefficients of the Y-variable equation to be fitted; for a linear fit, for example, 2, 5, 10, 20, 30 or 50 samples can be taken. Preferably, the sample size is statistically meaningful, for example not less than 20, 30 or 50; optionally, to keep the amount of computation moderate, the sample size can also be limited to less than 200 or less than 100. In this way, correction can be achieved accurately using the corresponding fitting result (correction coefficient).

In some examples, a linear fit is performed. This is simple to compute, takes little time, and facilitates rapid correction.

Specifically, referring to Figures 12 and 13, in one example the image to be inspected is the A image and the image Xi is the T image. The signal intensities at 20 positions of corresponding coordinates in the central region of the image to be inspected are selected for a linear fit. Figure 12 shows the result of the fit, in which the abscissa is the relative signal intensity of A and the ordinate is the relative signal intensity of T. The fit determines the slope k of the fitted straight line, and this slope is used as the correction coefficient to correct the signal intensity at each position of corresponding coordinates on the image to be inspected, for example I_T' = I_T - I_A × k, where I_T' is the corrected T signal intensity at the position, I_T is the observed T signal intensity at the position (observed value), and I_A is the observed A signal intensity at the position (observed value). Figure 13 shows the signals at the 20 positions of corresponding coordinates on the image to be inspected before correction and after correction in this manner. In this way, the contribution of the T signal to the fluctuation of the signal intensity at the positions of corresponding coordinates on the image to be inspected can be eliminated or reduced, and a corrected image to be inspected is obtained.
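The linear fit and the correction I_T' = I_T - I_A × k can be sketched as follows, for illustration only. The function names and the synthetic data are assumptions of this sketch; NumPy's polyfit is used here in place of the fitting tools mentioned elsewhere in the description.

```python
import numpy as np

def fit_crosstalk_slope(i_a, i_t):
    """Slope k of the straight line fitted to T intensity (ordinate) vs A intensity (abscissa)."""
    slope, _intercept = np.polyfit(np.asarray(i_a, dtype=float),
                                   np.asarray(i_t, dtype=float), 1)
    return float(slope)

def correct_t(i_a, i_t, k):
    """Apply I_T' = I_T - I_A * k at every position."""
    return np.asarray(i_t, dtype=float) - np.asarray(i_a, dtype=float) * k

# Synthetic intensities at 20 positions where the T channel is pure crosstalk from A.
i_a = np.linspace(100.0, 2000.0, 20)
i_t = 0.25 * i_a
k = fit_crosstalk_slope(i_a, i_t)
i_t_corrected = correct_t(i_a, i_t, k)
```

With these synthetic values the fitted slope is approximately 0.25 and the corrected T intensities are approximately zero, since all of the T signal in this toy data originates from A.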

In comparison with Figure 10, Figure 14 shows the pairwise crosstalk plots of the four images of this sequencing round for the same field of view after chromatic aberration correction performed as in the above example. It can be seen that, after this correction, the signal crosstalk between images of the same field of view and the same round that correspond to different bases is markedly reduced, which favors accurate base identification and longer reads.

Referring to Figures 15-18, these figures show, for one example, signal crosstalk plots between two images of adjacent rounds that correspond to the same base in the same field of view. Each point on a plot represents one position of the corresponding coordinates, and both axes are relative signal intensity. From top to bottom and left to right, the four phasing scatter plots in Figure 15 are the signal-intensity relationship plots of the two A images, two C images, two G images, and two T images of cycle 1 and cycle 2; the four plots in Figure 16 are the corresponding plots for cycle 30 and cycle 31; the four plots in Figure 17 are those for cycle 60 and cycle 61; and the four plots in Figure 18 are those for cycle 90 and cycle 91.

It can be seen that, in this example, dephasing (phasing or prephasing) is more pronounced for C or T than for A or G, and that, as the number of sequencing rounds increases, the signal crosstalk caused by the phase imbalance of the chemical reactions of the various bases becomes increasingly severe. As Figure 18 shows, by the 91st round of sequencing in this example, dephasing has made it difficult to tell accurately whether the signal at a given position in the T image comes from the 90th round or from the 91st round. In general, toward the end of sequencing, all positions of the corresponding coordinates become bright with uniform brightness; in that situation the correct base can no longer be identified, that is, sequencing cannot continue. Dephasing is the main factor limiting the read length of sequencing by synthesis.

The upper and lower plots in Figure 19 show, respectively, the phasing ratio and the prephasing ratio of the four bases as a function of the number of sequencing rounds in one nucleic acid sample sequencing run. As the number of sequencing rounds increases, both the phasing ratio and the prephasing ratio of each base increase.

Performing phasing or prephasing correction facilitates correct base identification and longer reads. The phase correction may be performed either before or after the crosstalk correction.

In some examples, the correction of the intensity includes a phase correction, which is performed based on at least one of the images that come from adjacent rounds of sequencing and correspond to the same type of nucleotide.

Specifically, in one example, image Yj and the image to be inspected come from two adjacent rounds of sequencing, for example image Yj from the 31st round and the image to be inspected from the 30th round. Image Yj and the image to be inspected correspond to the same field of view and to the same type of nucleotide/base, for example A. The phase correction includes: fitting the signals at a plurality of positions of the corresponding coordinates within a specific region of the image to be inspected to obtain a fitting result; and correcting the signals at the positions of the corresponding coordinates on the image to be inspected based on the fitting result.

Similarly, the specific region referred to here may be the entire image to be inspected or a part of it. Preferably, the specific region is selected from at least a part of the central region of the image to be inspected. The central region of an image is as generally understood; for example, for an image of size 4000*2000, the central region may be 3000*1500, 2056*1024, 2000*1500, 1024*1024, 1024*512, 1000*500, 1000*1000, 512*512, or 512*256, and the other regions of the image may correspondingly be called edge regions. In general, the intensity values at positions of the corresponding coordinates in the central region of an image fluctuate less and appear more tightly clustered on a phasing scatter plot, like the points in the black circle in the phasing scatter plot of the A images of cycles 30 and 31 illustrated in Figure 20. Correcting with the fitting result/correction coefficient determined by fitting the intensity values of at least some of the positions in this region allows the phase correction to be achieved quickly and accurately.

Similarly, this embodiment does not limit the fitting method; the fit may be linear or nonlinear. The amount of data used for the fit, that is, the number of positions of the corresponding coordinates within the specific region of the image whose intensities are sampled, is not particularly limited. In principle, it is sufficient that the Y coefficients of the Y-variable equation to be fitted can be solved; for a linear fit, for example, 2, 5, 10, 20, 30, or 50 positions may be taken. Preferably, the sample size is statistically meaningful, for example no fewer than 20, 30, or 50. Optionally, to keep the amount of computation modest, the sample size may also be limited to fewer than 200 or fewer than 100. The correction can then be made accurately using the corresponding fitting result (the correction coefficient).

In some examples, a linear fit to correct phasing is performed before the crosstalk correction according to the method of the above example, and the R^2 of the linear fit is 0.97. In other examples, the signals at the same plurality of positions are taken for fitting, and the linear fit to correct phasing is performed after the crosstalk correction according to the method of the above example; the R^2 of that linear fit is 0.93.
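The R^2 values quoted above measure how well a straight line explains the relationship between the two adjacent-round images. The following Python sketch shows how a slope and R^2 could be obtained from sampled positions. It assumes, by analogy with the crosstalk example, that the phase correction subtracts the fitted contribution of image Yj from the image to be inspected; the text does not state the phase-correction formula, so that step, the function name `linear_fit_with_r2`, and the sample values are all assumptions.

```python
def linear_fit_with_r2(xs, ys):
    """Return (slope, intercept, R^2) of an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical intensities at the same four positions in two adjacent rounds.
image_yj = [1.0, 2.0, 3.0, 4.0]    # e.g. the A image of round 31
inspected = [0.2, 0.4, 0.6, 0.8]   # e.g. the A image of round 30

slope, intercept, r2 = linear_fit_with_r2(image_yj, inspected)

# Assumed correction form: remove the fitted share of the adjacent round.
phase_corrected = [y - slope * x for x, y in zip(image_yj, inspected)]
```

An R^2 close to 1 indicates that the sampled positions lie near a line, so a single slope is a reasonable correction coefficient for the region.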

For S31, in some examples, the signal intensity at a position of the corresponding coordinates on the image to be inspected is an array of four values (four-dimensional data) corresponding to the signal intensities of the four nucleotides/bases at that position. It may be written, for example, as {Ints A, Ints T, Ints G, Ints C}, where Ints A, Ints T, Ints G, and Ints C denote the signal intensity values of bases A, T, G, and C, respectively. After correction, Ints A, Ints T, Ints G, and Ints C generally share the same baseline, so the maximum value (max) in the array can be compared with the first preset value. If the maximum is greater than or equal to the first preset value, the base type corresponding to that position on the image is determined to be the base corresponding to the maximum; that is, the base at the corresponding position on the corresponding nucleic acid molecule is identified as the base corresponding to the maximum. If the maximum is less than the first preset value, the base type corresponding to that position on the image is determined to be not accurately identifiable, and the base at that position of the corresponding nucleic acid molecule may be recorded as N, where N is any one of A, T, G, and C, or left as a gap. In some examples, reads that contain N or gaps after base identification may be processed further, for example by inferring the base type represented by the N or gap from the information of other reads such as adjacent reads, or by partial filtering, so as to improve the utilization or the quality of the output data.
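The decision rule above can be written as a short Python sketch. The function name `call_base`, the dictionary layout, and the example threshold are hypothetical; only the rule itself (take the maximum, compare it with the first preset value, otherwise record N) comes from the text.

```python
def call_base(intensities, first_preset):
    """Return the base with the largest corrected intensity, or "N".

    intensities maps each base ("A", "T", "G", "C") to its corrected
    signal intensity at one position of the corresponding coordinates.
    """
    base, value = max(intensities.items(), key=lambda item: item[1])
    return base if value >= first_preset else "N"


# Hypothetical corrected intensities at two positions.
clear_position = {"A": 0.90, "T": 0.10, "G": 0.05, "C": 0.02}
dim_position = {"A": 0.08, "T": 0.07, "G": 0.06, "C": 0.05}

calls = [call_base(clear_position, 0.5), call_base(dim_position, 0.5)]
```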

In some examples, each value in {Ints A, Ints T, Ints G, Ints C} is a processed value, for example a normalized value.

In one example, a quality score (QualityScore, abbreviated QScore) is computed for the four-dimensional data. A QScore is essentially a prior probability and can be computed with known methods, for example with reference to [Ewing et al., Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998 Mar; 8(3):175-85]. Here, the inventors compute the QScore from the ratio of the maximum value to the total of the corrected four-dimensional data, and the computed QScore lies in the range [0, 40]. Specifically, QScore = (1.0*maxInts/sumInts - 0.25)/0.75*40, where maxInts is the maximum of Ints A, Ints T, Ints G, and Ints C, and sumInts is the sum of Ints A, Ints T, Ints G, and Ints C. Correspondingly, the first preset value is set to 0.1; if the QScore is greater than 0.1, the base type at the position is determined to be the base corresponding to maxInts. In this way, base identification can be performed effectively.
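The QScore formula above maps the share of the strongest channel onto [0, 40]: a share of 0.25 (four equal channels) gives 0, and a share of 1 (a single non-zero channel) gives 40. The Python sketch below evaluates it. The function names and sample intensities are hypothetical, and non-negative intensities with a positive sum are assumed so that the score stays within the stated range.

```python
def qscore(ints):
    """QScore = (1.0 * maxInts / sumInts - 0.25) / 0.75 * 40."""
    max_ints = max(ints)
    sum_ints = sum(ints)
    if sum_ints <= 0:
        raise ValueError("sum of intensities must be positive")
    return (1.0 * max_ints / sum_ints - 0.25) / 0.75 * 40


def call_with_qscore(ints_by_base, first_preset=0.1):
    """Call the base of maxInts when the QScore exceeds the preset value."""
    q = qscore(list(ints_by_base.values()))
    if q > first_preset:
        return max(ints_by_base, key=ints_by_base.get), q
    return "N", q


# Hypothetical corrected intensities at two positions.
confident = call_with_qscore({"A": 0.7, "T": 0.1, "G": 0.1, "C": 0.1})
ambiguous = call_with_qscore({"A": 1.0, "T": 1.0, "G": 1.0, "C": 1.0})
```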

The logic and/or steps represented in the flowcharts or otherwise described herein may be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable storage medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For example, a computer-readable storage medium according to an embodiment of the present invention stores a program for execution by a computer, and executing the program includes carrying out the method of any of the above embodiments. The computer-readable storage medium may be any apparatus that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device, including but not limited to a read-only memory, a magnetic disk, or an optical disc. More specifically, computer-readable storage media include the following (a non-exhaustive list): an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). The computer-readable storage medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise suitably processing it if necessary, and then stored in a computer memory. The above description of the technical features and advantages of the base identification method in any embodiment also applies to the computer-readable storage medium and is not repeated here.

Further, an embodiment of the present invention provides a computer product that includes the computer-readable storage medium of any of the above embodiments.

For example, an embodiment of the present invention provides a system that includes the computer product provided in any of the above embodiments and at least one processor for executing the program stored in the computer-readable storage medium.

For example, an embodiment of the present invention provides a computer program product that includes instructions for identifying one or more bases in a nucleic acid; when the computer executes the program, the instructions cause the computer to perform the method for identifying one or more bases in a nucleic acid in any of the above embodiments.

An embodiment of the present invention provides a system configured to perform the method for identifying one or more bases in a nucleic acid in any of the above embodiments.

Referring to Figure 21, an embodiment of the present invention provides a system 100 that includes a plurality of modules and is used to perform the steps of the method for identifying one or more bases in a nucleic acid in any of the above embodiments. The system 100 includes a mapping module 110, a signal determination module 120, and a comparison module 130. The mapping module 110 is used to map the coordinates of each bright spot in the bright spot set corresponding to the template onto the image to be inspected and to determine the positions of the corresponding coordinates on the image to be inspected. The bright spot set corresponding to the template is constructed from a set of images, each of which contains a plurality of bright spots. The set of images and the image to be inspected both come from sequencing and correspond to the same field of view. The sequencing includes multiple rounds of sequencing with addition of nucleotides, the set of images comes from at least one round of sequencing, and at least a portion of the signals appear as at least a portion of the bright spots on the set of images.

The signal determination module 120 is used to determine the signal intensity at the positions of the corresponding coordinates on the image to be inspected from the mapping module 110, the intensity being a corrected intensity. The comparison module 130 is used to compare the signal intensity at a position of the corresponding coordinates on the image to be inspected from the signal determination module 120 with the first preset value and to determine, based on the comparison result, the base type corresponding to that position, thereby performing the base identification.

Those skilled in the art know that, besides implementing a controller/processor purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller/processor may therefore be regarded as a hardware component, and the means included in it for implementing various functions may be regarded as structures within the hardware component. Alternatively, the means for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.

The description of the technical features and advantages of the method for identifying one or more bases in a nucleic acid in the above embodiments also applies to this system and is not repeated here. It can be understood that the additional technical features of the method in any of the above embodiments, including sub-steps, additional steps, and optional, alternative, or preferred settings or processing, can be implemented by having the system or its modules further include units/modules or sub-units/sub-modules.

In some examples, the system 100 further includes a bright spot set construction module 140 for constructing the bright spot set corresponding to the template; the bright spot set construction module 140 is connected to the mapping module 110.

In other examples, the mapping module 110 includes a bright spot set construction sub-module 111 for constructing the bright spot set corresponding to the template. The bright spot set construction sub-module 111 includes: an image acquisition unit 1111, used to add four types of nucleotides, sequentially or simultaneously, to a reaction system for one round of sequencing to obtain a set of images, where the four types of nucleotides carry different labels and emit signals of different colors when excited, the set of images includes a first image, a second image, a third image, and a fourth image acquired respectively from the reaction signals of the four types of nucleotides in the same field of view, and the reaction system includes the template and a polymerase; a bright spot detection unit 1113, used to perform bright spot detection on the first image, the second image, the third image, and the fourth image from the image acquisition unit 1111 and to determine the bright spots of each image; an alignment unit 1115, used to align the set of images; a merging unit 1117, used to merge the bright spots on the aligned set of images from the alignment unit 1115 to obtain a primary bright spot set; and a bright spot set establishing unit 1119, used to establish, from the primary bright spot set from the merging unit 1117, bright spot sets corresponding respectively to the four types of nucleotides.

In some examples, the image acquisition unit 1111 adds the four types of nucleotides to the reaction system simultaneously and uses an imaging system to acquire the signals to obtain the set of images and/or the image to be inspected; the imaging system includes a first laser, a second laser, a first camera, and a second camera.

In some examples, the four types of nucleotides added by the image acquisition unit 1111 carry a first label, a second label, a third label, and a fourth label, respectively. In the round of sequencing, the first laser is turned on to excite the nucleotides, two of the four types of nucleotides emit a first signal and a second signal respectively, and the first camera and the second camera operate synchronously to acquire the first signal and the second signal respectively, yielding the first image and the second image. The second laser is then turned on to excite the nucleotides, the other two types of nucleotides emit a third signal and a fourth signal respectively, and the first camera and the second camera operate synchronously to acquire the third signal and the fourth signal respectively, yielding the third image and the fourth image.

In some examples, the bright spot detection unit 1113 detects each image in the set of images using a k1*k2 matrix, which includes determining that a matrix whose relationship midS between center intensity and edge intensity satisfies a first preset condition corresponds to one bright spot. The center intensity reflects the intensity of the central region of the matrix, the edge intensity reflects the intensity of the edge region of the matrix, and the central region and the edge region together constitute the matrix. k1 and k2 are both natural numbers greater than 1, and the k1*k2 matrix contains k1*k2 pixels.

In some examples, when the bright spot detection unit 1113 detects each image in the set of images using the k1*k2 matrix, k1 is equal to k2.

In some examples, when the bright spot detection unit 1113 detects each image in the set of images using the k1*k2 matrix, k1 and k2 are both odd numbers greater than 1.

When the bright spot detection unit 1113 detects each image in the set of images using the k1*k2 matrix, k1 and k2 may both be odd numbers greater than 3, in which case the central region is the 3*3 region centered on the center pixel of the matrix.

In some examples, when the bright spot detection unit 1113 detects each image in the set of images using the k1*k2 matrix, the first preset condition is midS ≥ S1, with midS = midInt - sumInts(1:n)/n, where midInt denotes the center intensity, sumInts(1:n)/n denotes the edge intensity, sumInts(1:n) denotes the sum of the pixel values of the 1st to n-th pixels of the edge region, n is a natural number not less than 4, and S1 is any value in [2, 4].
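The center-versus-edge test can be illustrated with a small Python sketch for a single window. It assumes an odd window larger than 3*3 whose central region is the middle 3*3 block, and it takes the center intensity midInt to be the mean of that block; the text only says that midInt reflects the central region, so the mean is an assumption, as are the function names and pixel values.

```python
def mid_s(window):
    """midS = midInt - sumInts(1:n)/n for one k1*k2 window (k1, k2 odd, > 3)."""
    k1, k2 = len(window), len(window[0])
    center_r, center_c = k1 // 2, k2 // 2
    center, edge = [], []
    for r in range(k1):
        for c in range(k2):
            in_center = abs(r - center_r) <= 1 and abs(c - center_c) <= 1
            (center if in_center else edge).append(window[r][c])
    mid_int = sum(center) / len(center)   # assumed: mean of the 3*3 center
    edge_int = sum(edge) / len(edge)      # sumInts(1:n)/n over n edge pixels
    return mid_int - edge_int


def is_bright_spot(window, s1=3.0):
    """First preset condition: midS >= S1, with S1 chosen in [2, 4]."""
    return mid_s(window) >= s1


# Hypothetical 5*5 windows: flat background, and background plus a spot.
flat = [[10.0] * 5 for _ in range(5)]
spot = [row[:] for row in flat]
for r in range(1, 4):
    for c in range(1, 4):
        spot[r][c] = 20.0
```

Sliding such a window over the image and keeping the windows that pass the test would give the candidate bright spots of that image.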

In some examples, the bright spot detection unit 1113 is used to: convolve each image in the set of images to obtain a convolved image; find all pixels in the convolved image that are the peak within a k3*k4 region, where k3 and k4 are both natural numbers greater than 1 and the k3*k4 region contains k3*k4 pixels of the convolved image; and determine that a k5*k6 region centered on a peak pixel and satisfying a second preset condition corresponds to one bright spot, where the second preset condition is that the pixel value of the peak pixel of the k5*k6 region is not less than S2, k5 and k6 are both natural numbers greater than 1, and S2 can be determined from the pixels of the convolved image.

In some examples, k3 is equal to k4, and/or k5 is equal to k6.

In some examples, k3 and k4 are both odd numbers greater than 1, and/or k5 and k6 are both odd numbers greater than 1.

In some examples, k3 and k4 are both odd numbers greater than 3, and/or k5 and k6 are both odd numbers greater than 3.

In some examples, S2 is not less than the median of all pixels of the convolved image sorted in ascending order of pixel value, and/or is not greater than the 80th percentile of all pixels of the convolved image sorted in ascending order of pixel value.
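The convolution-based detection above can be sketched in plain Python. The smoothing kernel, the zero padding, the square k*k peak neighborhood, and the choice of the median as S2 are all assumptions made for illustration; the text does not fix the kernel, and it allows S2 anywhere between the median and the 80th percentile. The sketch reports only the peak coordinates rather than the surrounding k5*k6 region.

```python
from statistics import median


def convolve_same(image, kernel):
    """Naive same-size 2D filtering with zero padding (symmetric kernel)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    off_r, off_c = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - off_r, c + j - off_c
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def find_spots(conv, k=3, s2=None):
    """Return (row, col) of pixels that are the unique peak of their k*k
    neighborhood and whose value is not less than S2."""
    if s2 is None:
        s2 = median(v for row in conv for v in row)  # assumed choice of S2
    half = k // 2
    spots = []
    for r in range(half, len(conv) - half):
        for c in range(half, len(conv[0]) - half):
            hood = [conv[rr][cc]
                    for rr in range(r - half, r + half + 1)
                    for cc in range(c - half, c + half + 1)]
            v = conv[r][c]
            if v >= s2 and v == max(hood) and hood.count(v) == 1:
                spots.append((r, c))
    return spots


# Hypothetical 7*7 image with one bright pixel and a 3*3 smoothing kernel.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 9.0
kernel = [[1 / 16, 2 / 16, 1 / 16],
          [2 / 16, 4 / 16, 2 / 16],
          [1 / 16, 2 / 16, 1 / 16]]
smoothed = convolve_same(image, kernel)
detected = find_spots(smoothed)
```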

In some examples, the bright spot detection unit 1113 further screens the bright spots of an image based on the intensity of the region of the original image where each bright spot is located.

In some examples, the bright spot detection unit 1113 detects each image in the set of images using a k7*k8 matrix, which includes determining that a k7*k8 matrix in which a plurality of pixel values along a specified direction fluctuate monotonically corresponds to one candidate bright spot, and screening the candidate bright spot using the pixels of at least a part of the corresponding k7*k8 matrix to determine the bright spot. k7 and k8 are both natural numbers greater than 1, and the k7*k8 matrix contains k7*k8 pixels.

In some examples, k7 is equal to k8, and/or k7 and k8 are both odd numbers greater than 1.

In some examples, the mapping module 110 further includes a sub-pixel coordinate determination sub-module 112 for determining the sub-pixel coordinates of the bright spots using the center-of-gravity (centroid) method.
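A centroid computation of the kind referred to above weights each pixel coordinate by its intensity. The Python sketch below is a minimal illustration; the function name `centroid`, the window layout, and the offset arguments are hypothetical, and background subtraction, which a real pipeline might apply first, is left out.

```python
def centroid(window, top, left):
    """Intensity-weighted center of a window, in image coordinates.

    window is a list of pixel rows cut from the image; (top, left) is the
    image coordinate of window[0][0].
    """
    total = sum(sum(row) for row in window)
    if total <= 0:
        raise ValueError("window has no positive signal")
    row_pos = sum(r * v for r, row in enumerate(window) for v in row) / total
    col_pos = sum(c * v for row in window for c, v in enumerate(row)) / total
    return top + row_pos, left + col_pos


# Hypothetical 3*3 window whose signal is split between two neighboring pixels.
window = [[0.0, 0.0, 0.0],
          [0.0, 2.0, 2.0],
          [0.0, 0.0, 0.0]]
sub_pixel = centroid(window, top=10, left=20)
```

Here the two equally bright pixels pull the estimated column halfway between them, which is the sub-pixel refinement the centroid method provides.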

In some examples, the alignment unit 1115 performs the alignment using images from the M-th round of sequencing.

In some examples, aligning the set of images includes using images from the M-th round of sequencing, where M is, for example, greater than 20, 30, or 50.

In some examples, when the alignment unit 1115 performs the alignment using images from the M-th round of sequencing, the images of the M-th round include a fifth image, a sixth image, a seventh image, and an eighth image, which correspond to the reaction signals of the same types of nucleotides as the first image, the second image, the third image, and the fourth image, respectively. Taking the coordinate system of the fifth image as the reference, the coordinate systems of the sixth image, the seventh image, and the eighth image are each transformed, which includes: dividing the fifth image and the sixth image in the same manner into a set of blocks of size k9*k10, where k9 and k10 are both natural numbers greater than 30 and each k9*k10 block contains k9*k10 pixels; determining the offset of each block of the sixth image relative to the corresponding block of the fifth image; and, based on the offsets, aligning the second image with the first image.
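The text does not say how the offset of a block is determined. One possible approach, shown here purely as an assumption, is an exhaustive search over small integer shifts that maximizes the raw cross-correlation between a reference block and the block to be aligned. The function name `block_offset`, the search radius, and the tiny 6*6 blocks are hypothetical; the source requires blocks larger than 30*30.

```python
def block_offset(ref, mov, max_shift=2):
    """Return (dr, dc) such that mov[r + dr][c + dc] best matches ref[r][c].

    The score is the unnormalized cross-correlation over the overlapping
    pixels; the first shift reaching the highest score is kept.
    """
    h, w = len(ref), len(ref[0])
    best, best_score = (0, 0), float("-inf")
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            score = 0.0
            for r in range(h):
                for c in range(w):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        score += ref[r][c] * mov[rr][cc]
            if score > best_score:
                best, best_score = (dr, dc), score
    return best


# Hypothetical blocks: the same spot, displaced by one row and two columns.
ref_block = [[0.0] * 6 for _ in range(6)]
ref_block[2][2] = 5.0
mov_block = [[0.0] * 6 for _ in range(6)]
mov_block[3][4] = 5.0

offset = block_offset(ref_block, mov_block)
```

Computing one such offset per block, rather than one for the whole image, lets the transformation vary across the field of view.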

In some examples, when merging the bright spots on the aligned set of images, the merging unit 1117 merges a plurality of bright spots within a preset range k11*k12 into one bright spot, where k11 and k12 are both natural numbers greater than 1 and k11*k12 contains k11*k12 pixels.
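Merging nearby bright spots from the four aligned images can be sketched as a simple greedy grouping in Python. The grouping strategy, the use of the running mean as the merged coordinate, and the function name `merge_spots` are assumptions; the text only requires that spots falling within a k11*k12 range become one spot.

```python
def merge_spots(spots, k11=3, k12=3):
    """Greedily merge spots lying within a k11*k12 range of a merged spot.

    spots is a list of (row, col) coordinates; the result lists the mean
    coordinate of each merged group, in order of first appearance.
    """
    groups = []  # each entry: [sum_row, sum_col, count]
    for r, c in spots:
        for g in groups:
            mean_r, mean_c = g[0] / g[2], g[1] / g[2]
            if abs(r - mean_r) <= k11 / 2 and abs(c - mean_c) <= k12 / 2:
                g[0] += r
                g[1] += c
                g[2] += 1
                break
        else:
            groups.append([r, c, 1])
    return [(g[0] / g[2], g[1] / g[2]) for g in groups]


# Hypothetical spots: two detections of one cluster, plus a distant spot.
all_spots = [(10.0, 10.0), (10.5, 10.5), (30.0, 30.0)]
merged = merge_spots(all_spots)
```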

In some examples, the signal determination module 120 is used to determine the signal intensity at the positions of the corresponding coordinates on the image to be inspected, the intensity being a corrected intensity; correcting the intensity includes crosstalk correction and/or phase correction.

In some examples, before the signal determination module 120 corrects the intensity, the mapping module 110 aligns the image to be inspected with the bright spot set corresponding to the template.

In some examples, the intensity correction performed by the signal determination module 120 includes a crosstalk correction, which is performed based on at least one of the images that come from the same round of sequencing and correspond to different types of nucleotides.

在一些示例中,信号确定模块120在进行强度的矫正时,所采用的串色矫正包括:In some examples, when the signal determination module 120 performs intensity correction, the cross-color correction used includes:

对所述待检图像的特定区域内的多个相应坐标的位置的信号进行拟合,获得拟合结果;以及,基于拟合结果矫正所述待检图像上的相应坐标的位置的信号。其中,图像Xi和所述待检图像来自相同一轮测序,所述图像Xi和所述待检图像对应相同的视野,所述待检图像包含来自所述图像Xi对应的核苷酸的信号。Fit the signals of the positions of multiple corresponding coordinates in the specific area of the image to be inspected to obtain a fitting result; and correct the signals of the positions of the corresponding coordinates on the image to be inspected based on the fitting results. The image Xi and the image to be tested are from the same round of sequencing, the image Xi and the image to be tested correspond to the same field of view, and the image to be tested contains signals from the nucleotides corresponding to the image Xi.

In some examples, the fit is a linear fit.
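A linear-fit cross-color correction can be sketched in Python as below. The sketch assumes that the fit regresses the intensities on the image to be detected against the intensities at the same coordinates on image Xi, and that the correction subtracts the fitted bleed-through term; the function names and the choice to keep the intercept as background are assumptions made for illustration only.

```python
import numpy as np

def fit_cross_color(target, reference):
    """Least-squares line target ~ slope * reference + intercept.

    target:    intensities at the mapped coordinates on the image to be detected
    reference: intensities at the same coordinates on image Xi
               (same round, same field of view, different nucleotide)
    The points passed in are assumed to be spots whose true signal belongs
    to the nucleotide of image Xi, so the slope estimates the bleed-through.
    """
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    slope, intercept = np.polyfit(reference, target, deg=1)
    return slope, intercept

def apply_cross_color(target, reference, slope):
    """Subtract the fitted bleed-through from every mapped position."""
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return target - slope * reference
```

With a reference of `[100, 200, 300, 400]` and a target equal to `0.2 * reference + 5`, the fitted slope is 0.2 and the corrected target is flat at the background level 5.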

In some examples, the intensity correction performed by the signal determination module 120 includes phase correction, which is based on at least one of the images that come from adjacent rounds of sequencing and correspond to the same type of nucleotide.

In some examples, the phase correction used by the signal determination module 120 when correcting the intensity includes: fitting the signals at the positions of a plurality of corresponding coordinates within a specific region of the image to be detected to obtain a fitting result; and correcting the signals at the positions of the corresponding coordinates on the image to be detected based on the fitting relationship. Here, image Yj and the image to be detected come from two adjacent rounds of sequencing, correspond to the same field of view, and correspond to the same type of nucleotide.
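The region-restricted fitting used for phase correction can be sketched as follows. The sketch assumes a linear relationship between the intensities on the image to be detected and those on image Yj, and a rectangular "specific region"; the description does not fix either choice, so the function name `phase_correct_region`, the region format, and the subtraction rule are hypothetical.

```python
import numpy as np

def phase_correct_region(coords, current, previous, region):
    """Fit and correct only the spots inside one rectangular region.

    coords:   (N, 2) array of (row, col) positions of the mapped coordinates
    current:  (N,) intensities on the image to be detected
    previous: (N,) intensities at the same positions on image Yj
              (adjacent round, same field of view, same nucleotide)
    region:   (row_min, row_max, col_min, col_max), upper bounds exclusive
    """
    coords = np.asarray(coords, dtype=float)
    current = np.asarray(current, dtype=float)
    previous = np.asarray(previous, dtype=float)
    r0, r1, c0, c1 = region
    inside = (
        (coords[:, 0] >= r0) & (coords[:, 0] < r1)
        & (coords[:, 1] >= c0) & (coords[:, 1] < c1)
    )
    # Assumed linear model: current ~ a * previous + b, fitted inside the region only.
    a, b = np.polyfit(previous[inside], current[inside], deg=1)
    corrected = current.copy()
    # Remove the carry-over from the adjacent round; spots outside stay untouched.
    corrected[inside] = current[inside] - a * previous[inside]
    return corrected, a
```

Fitting per region rather than over the whole image lets the carry-over coefficient vary across the field of view, which is one plausible reason for restricting the fit to a specific area.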

Base identification using the method, product and/or system of any embodiment of the present invention is fast and accurate, and allows the order of the nucleotides/bases in at least part of the template sequence to be determined.

In the description of this specification, references to the terms "one embodiment", "some embodiments", "illustrative embodiments", "examples", "specific examples", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are illustrative and are not to be construed as limiting the invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (51)

1. A method of identifying one or more bases in a nucleic acid by detecting an image obtained from sequencing, comprising:
mapping coordinates of each bright spot in a bright spot set corresponding to a template onto an image to be detected, wherein the mapping comprises determining the coordinate position of any bright spot in the bright spot set corresponding to the template on the image to be detected so as to determine the position of corresponding coordinates on the image to be detected;
determining the intensity of a signal of the position of the corresponding coordinate on the image to be detected, wherein the intensity is corrected intensity; and
comparing the intensity of the signal of the position of the corresponding coordinate on the image to be detected with a first preset value, and judging the base type corresponding to the position based on the comparison result to realize the base identification;
the set of bright spots corresponding to the template is constructed based on a set of images, each image of the set of images comprising a plurality of bright spots,
the set of images and the image to be detected both come from sequencing that includes the addition of nucleotides, and both correspond to one and the same field of view,
the sequencing includes multiple rounds of sequencing, the set of images comes from at least one round of sequencing, and at least a portion of the signals appear as at least a portion of the bright spots on the set of images.
2. The method of claim 1, wherein in performing said sequencing, four nucleotides are added, the four nucleotides bearing different labels, said different labels being excited to emit signals of different colors, and constructing said set of bright spots corresponding to the template comprises:
sequentially or simultaneously adding four nucleotides into a reaction system to perform one-round sequencing to obtain a group of images, wherein the group of images comprises a first image, a second image, a third image and a fourth image, the first image, the second image, the third image and the fourth image are respectively acquired from reaction signals of the four nucleotides in the same visual field, and the reaction system comprises the template and polymerase;
respectively carrying out bright spot detection on the first image, the second image, the third image and the fourth image to determine bright spots of the images;
aligning the set of images;
combining the bright spots on the aligned group of images to obtain a first-level bright spot set;
and establishing a set of bright spots respectively corresponding to the four nucleotides according to the first-level bright spot set.
3. The method of claim 2, wherein the four nucleotides are added simultaneously to the reaction system, and the signals are acquired with an imaging system comprising a first laser, a second laser, a first camera and a second camera to obtain the set of images and/or the image to be detected.
4. The method of claim 3, wherein the template is DNA and the four nucleotides bear a first label, a second label, a third label, and a fourth label, respectively, and wherein, during the one round of sequencing,
the first laser is turned on to excite the nucleotides, two of the four nucleotides emit a first signal and a second signal respectively, and the first camera and the second camera operate synchronously to acquire the first signal and the second signal respectively, obtaining the first image and the second image,
and the second laser is turned on to excite the nucleotides, the other two of the four nucleotides emit a third signal and a fourth signal respectively, and the first camera and the second camera operate synchronously to acquire the third signal and the fourth signal respectively, obtaining the third image and the fourth image.
5. The method of any of claims 2-4, wherein the bright spot detection comprises detecting each image in the set of images using a k1*k2 matrix, comprising,
determining that a matrix in which the relation midS between the center intensity and the edge intensity meets a first preset condition corresponds to one bright spot, wherein the center intensity reflects the intensity of a center area of the matrix, the edge intensity reflects the intensity of an edge area of the matrix, the center area and the edge area together form the matrix, k1 and k2 are natural numbers larger than 1, and a k1*k2 matrix comprises k1*k2 pixels.
6. The method of claim 5, wherein k1 is equal to k2.
7. The method of claim 5, wherein k1 and k2 are each an odd number greater than 1.
8. The method of claim 7, wherein k1 and k2 are each an odd number greater than 3, and the center region is a 3*3 region centered on a center pixel of the matrix.
9. The method of claim 5, wherein the first preset condition is midS ≥ S1,
where midS = midInt - sumInts(1:n)/n, midInt represents the center intensity, sumInts(1:n)/n represents the edge intensity, sumInts(1:n) represents the sum of the pixel values of the 1st to nth pixels of the edge region, n is a natural number not less than 4, and S1 is any value in [2, 4].
10. The method of any of claims 2-4, wherein the bright spot detection comprises:
respectively convolving each image in the group of images to obtain convolved images;
searching all pixels containing peaks in a k3*k4 region in the convolved image, wherein k3 and k4 are natural numbers larger than 1, and the k3*k4 region contains k3*k4 pixels of the convolved image;
and determining that a k5*k6 region that is centered on a peak pixel and meets a second preset condition corresponds to one bright spot, wherein the second preset condition is that the pixel value of the peak pixel of the k5*k6 region is not less than S2, k5 and k6 are natural numbers larger than 1, and S2 can be determined from the pixel values of the convolved image.
11. The method of claim 10, wherein k3 is equal to k4 and/or k5 is equal to k6.
12. The method according to claim 10, wherein k3 and k4 are each odd numbers greater than 1, and/or k5 and k6 are each odd numbers greater than 1.
13. The method of claim 12, wherein k3 and k4 are each an odd number greater than 3, and/or k5 and k6 are each an odd number greater than 3.
14. The method of claim 10, wherein S2 is not less than a median of all pixels of the convolved image ordered in ascending pixel value order and/or is not greater than an eighteenth median of all pixels of the convolved image ordered in ascending pixel value order.
15. The method of claim 5, further comprising screening the image for bright spots based on the intensity of the area of the original image where the bright spots are located.
16. The method of claim 10, further comprising screening the image for bright spots based on the intensity of the area of the original image where the bright spots are located.
17. The method of any of claims 2-4, wherein the bright spot detection comprises detecting each image in the set of images using a k7*k8 matrix, comprising,
determining that a k7*k8 matrix in which a plurality of pixel values in a specified direction fluctuate monotonically corresponds to one candidate bright spot,
and screening the candidate bright spots by utilizing pixels of at least a part of the area of the corresponding k7*k8 matrix to determine the bright spots, wherein k7 and k8 are natural numbers larger than 1, and the k7*k8 matrix comprises k7*k8 pixels.
18. The method of claim 17, wherein k7 is equal to k8, and/or k7 and k8 are each an odd number greater than 1.
19. The method of any of claims 2-4, further comprising determining subpixel coordinates of the bright spot using a barycentric method.
20. The method of claim 5, further comprising determining subpixel coordinates of the bright spot using a barycentric method.
21. The method of claim 10, further comprising determining subpixel coordinates of the bright spot using a barycentric method.
22. The method of claim 17, further comprising determining subpixel coordinates of the bright spot using a barycentric method.
23. The method of any one of claims 2-4, wherein aligning the set of images comprises using images from an Mth round of sequencing for the alignment.
24. The method of claim 5, wherein aligning the set of images comprises using images from an Mth round of sequencing for the alignment.
25. The method of claim 10, wherein aligning the set of images comprises using images from an Mth round of sequencing for the alignment.
26. The method of claim 17, wherein aligning the set of images comprises using images from an Mth round of sequencing for the alignment.
27. The method of claim 23, wherein M is greater than 20.
28. The method of claim 27, wherein M is greater than 50.
29. The method of claim 23, wherein the images from the Mth round of sequencing comprise a fifth image, a sixth image, a seventh image, and an eighth image, the fifth image, the sixth image, the seventh image, and the eighth image corresponding to the reaction signals of the same type of nucleotide as the first image, the second image, the third image, and the fourth image, respectively,
converting coordinate systems of the sixth image, the seventh image and the eighth image respectively by taking the coordinate system of the fifth image as a reference, wherein the fifth image and the sixth image are respectively divided in the same way into a group of blocks with a size of k9*k10, k9 and k10 are natural numbers larger than 30, and k9*k10 comprises k9*k10 pixels;
determining an offset of each block of the sixth image relative to a corresponding block of the fifth image, respectively;
the second image and the first image are aligned based on the offset.
30. The method of any one of claims 24-26, wherein the images from the Mth round of sequencing comprise a fifth image, a sixth image, a seventh image, and an eighth image, wherein the fifth image, the sixth image, the seventh image, and the eighth image correspond to the reaction signals of the same type of nucleotide as the first image, the second image, the third image, and the fourth image, respectively,
converting coordinate systems of the sixth image, the seventh image and the eighth image respectively by taking the coordinate system of the fifth image as a reference, wherein the fifth image and the sixth image are respectively divided in the same way into a group of blocks with a size of k9*k10, k9 and k10 are natural numbers larger than 30, and k9*k10 comprises k9*k10 pixels;
determining an offset of each block of the sixth image relative to a corresponding block of the fifth image, respectively;
the second image and the first image are aligned based on the offset.
31. The method of any of claims 2-4, wherein merging the bright spots on the aligned set of images comprises merging a plurality of bright spots within a preset range k11*k12 into one bright spot, where k11 and k12 are natural numbers greater than 1, and k11*k12 comprises k11*k12 pixels.
32. The method of claim 5, wherein merging the bright spots on the aligned set of images comprises merging a plurality of bright spots within a preset range k11*k12 into one bright spot, where k11 and k12 are natural numbers greater than 1, and k11*k12 comprises k11*k12 pixels.
33. The method of claim 10, wherein merging the bright spots on the aligned set of images comprises merging a plurality of bright spots within a preset range k11*k12 into one bright spot, where k11 and k12 are natural numbers greater than 1, and k11*k12 comprises k11*k12 pixels.
34. The method of claim 17, wherein merging the bright spots on the aligned set of images comprises merging a plurality of bright spots within a preset range k11*k12 into one bright spot, where k11 and k12 are natural numbers greater than 1, and k11*k12 comprises k11*k12 pixels.
35. The method of any of claims 2-4, wherein the correction of intensity comprises cross-color correction and/or phase correction.
36. The method of claim 5, wherein the correction of intensity comprises cross-color correction and/or phase correction.
37. The method of claim 10, wherein the correction of intensity comprises cross-color correction and/or phase correction.
38. The method of claim 17, wherein the correction of intensity comprises cross-color correction and/or phase correction.
39. The method of claim 35, wherein the image to be detected is aligned to the set of bright spots corresponding to the template prior to correcting the intensity.
40. The method of claim 39, wherein the correction of intensity comprises a cross-color correction based on at least one of the images from the same round of sequencing and corresponding to different types of nucleotides.
41. The method of any one of claims 36-38, wherein the correction of intensity comprises a cross-color correction based on at least one of the images from the same round of sequencing and corresponding to different types of nucleotides.
42. The method of claim 40, wherein image Xi and said image to be detected are from the same round of sequencing, image Xi and said image to be detected correspond to the same field of view, said image to be detected comprises signals from the nucleotide corresponding to image Xi, and said cross-color correction comprises,
fitting signals of positions of a plurality of corresponding coordinates in a specific area of the image to be detected to obtain a fitting result; and
correcting signals of positions of corresponding coordinates on the image to be detected based on the fitting result.
43. The method of claim 41, wherein image Xi and said image to be detected are from the same round of sequencing, image Xi and said image to be detected correspond to the same field of view, said image to be detected comprises signals from the nucleotide corresponding to image Xi, and said cross-color correction comprises,
fitting signals of positions of a plurality of corresponding coordinates in a specific area of the image to be detected to obtain a fitting result; and
correcting signals of positions of corresponding coordinates on the image to be detected based on the fitting result.
44. The method of claim 42 or 43, wherein the fit is a linear fit.
45. The method of claim 35, wherein the correction of intensity comprises phase correction based on at least one of images from adjacent rounds of sequencing and corresponding to the same kind of nucleotide.
46. The method of any one of claims 36-38, wherein the correction of intensity comprises phase correction based on at least one of images from adjacent rounds of sequencing and corresponding to the same kind of nucleotide.
47. The method of claim 45, wherein image Yj and the image to be detected are from two adjacent rounds of sequencing, image Yj and the image to be detected correspond to the same field of view, image Yj and the image to be detected correspond to the same kind of nucleotide, and the phase correction comprises,
fitting signals of positions of a plurality of corresponding coordinates in a specific area of the image to be detected to obtain a fitting result; and
correcting signals of positions of corresponding coordinates on the image to be detected based on the fitting relationship.
48. The method of claim 46, wherein image Yj and the image to be detected are from two adjacent rounds of sequencing, image Yj and the image to be detected correspond to the same field of view, image Yj and the image to be detected correspond to the same kind of nucleotide, and the phase correction comprises,
fitting signals of positions of a plurality of corresponding coordinates in a specific area of the image to be detected to obtain a fitting result; and
correcting signals of positions of corresponding coordinates on the image to be detected based on the fitting relationship.
49. A computer readable storage medium storing a program for execution by a computer, the execution of the program comprising performing the method of any one of claims 1-48.
50. A system for identifying one or more bases in a nucleic acid, comprising:
the computer readable storage medium of claim 49; and
at least one processor configured to execute the program stored in the computer-readable storage medium.
51. A system for identifying one or more bases in a nucleic acid, comprising a plurality of modules for performing the steps of the method of any one of claims 1-48.
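As an illustration of the matrix-based bright spot detection of claims 5, 8 and 9 and the barycentric refinement of claim 19, the following Python sketch may help. It assumes a square k*k window with k odd and greater than 3, takes the mean of the central 3*3 block as the center intensity midInt, and additionally requires the window center to be the local maximum so that each spot is reported once; that last requirement, the function names, and the default values are assumptions for illustration and are not recited in the claims.

```python
import numpy as np

def detect_bright_spots(image, k=5, s1=3.0):
    """Return (row, col) centers of k*k windows with midS >= s1."""
    image = np.asarray(image, dtype=float)
    half = k // 2
    # Edge region: every pixel of the window outside the central 3*3 block.
    edge_mask = np.ones((k, k), dtype=bool)
    edge_mask[half - 1:half + 2, half - 1:half + 2] = False
    n = edge_mask.sum()  # number of edge pixels
    spots = []
    for r in range(half, image.shape[0] - half):
        for c in range(half, image.shape[1] - half):
            win = image[r - half:r + half + 1, c - half:c + half + 1]
            mid_int = win[~edge_mask].mean()       # assumed definition of midInt
            edge_int = win[edge_mask].sum() / n    # sumInts(1:n)/n
            mid_s = mid_int - edge_int
            # Assumption: keep only windows whose center pixel is the window maximum.
            if mid_s >= s1 and image[r, c] == win.max():
                spots.append((r, c))
    return spots

def barycenter(image, r, c, half=1):
    """Intensity-weighted centroid of the (2*half+1)^2 block around (r, c)."""
    image = np.asarray(image, dtype=float)
    win = image[r - half:r + half + 1, c - half:c + half + 1]
    rows, cols = np.mgrid[r - half:r + half + 1, c - half:c + half + 1]
    total = win.sum()
    return (rows * win).sum() / total, (cols * win).sum() / total
```

The integer position returned by `detect_bright_spots` can be passed to `barycenter` to obtain subpixel coordinates for the same spot.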
CN201911331502.1A 2019-12-21 2019-12-21 Method and system for identifying bases in nucleic acids Active CN113012757B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201911331502.1A CN113012757B (en) 2019-12-21 2019-12-21 Method and system for identifying bases in nucleic acids
PCT/CN2020/114355 WO2021120715A1 (en) 2019-12-21 2020-09-10 Method for identifying base in nucleic acid and system
US17/787,824 US12211589B2 (en) 2019-12-21 2020-09-10 Method for identifying base in nucleic acid and system
EP20902453.8A EP4116402A4 (en) 2019-12-21 2020-09-10 METHOD FOR IDENTIFYING A BASE IN A NUCLEIC ACID AND SYSTEM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911331502.1A CN113012757B (en) 2019-12-21 2019-12-21 Method and system for identifying bases in nucleic acids

Publications (2)

Publication Number Publication Date
CN113012757A CN113012757A (en) 2021-06-22
CN113012757B true CN113012757B (en) 2023-10-20

Family

ID=76382824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911331502.1A Active CN113012757B (en) 2019-12-21 2019-12-21 Method and system for identifying bases in nucleic acids

Country Status (4)

Country Link
US (1) US12211589B2 (en)
EP (1) EP4116402A4 (en)
CN (1) CN113012757B (en)
WO (1) WO2021120715A1 (en)

Also Published As

Publication number Publication date
US20230027811A1 (en) 2023-01-26
CN113012757A (en) 2021-06-22
EP4116402A1 (en) 2023-01-11
EP4116402A4 (en) 2023-10-25
WO2021120715A1 (en) 2021-06-24
US12211589B2 (en) 2025-01-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40054464; Country of ref document: HK)
GR01 Patent grant
CP03 Change of name, title or address
Address after: 518000 podium 502A and 502B, podium 602, Luohu investment holding building, No. 112, Qingshuihe 1st Road, Qingshuihe community, Qingshuihe street, Luohu District, Shenzhen, Guangdong
Patentee after: Shenzhen Zhenmai Biotechnology Co.,Ltd.
Country or region after: China
Address before: 518000 5th and 6th floors, block 2, Shenye Jinyuan Building, No.116, Qingshuihe 1st Road, Qingshuihe street, Luohu District, Shenzhen City, Guangdong Province
Patentee before: Shenzhen Zhenmai Biotechnology Co.,Ltd.
Country or region before: China