KR102322308B1

KR102322308B1 - Apparatus and method for expanding the amount of omics sequencing data from partial omics sequencing data

Info

Publication number: KR102322308B1
Application number: KR1020200037217A
Authority: KR
Inventors: 박종화; 김병철; 조윤성
Original assignee: Clinomics Inc. (주식회사 클리노믹스)
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2021-11-05
Also published as: KR20210120471A

Abstract

Disclosed are an apparatus and method for expanding partial omics information into full omics information. The present invention expands partial omics information into full omics information through a series of imputation steps that use reference standard data for each omics, association information within individual omics, and association information between omics. In addition, full omics information can be produced from a small amount of omics information by using the genetic similarity of closely related samples, such as family members. According to the present invention, the resources (cost, etc.) required to produce full omics information can be reduced.

Description

Apparatus and method for expanding the amount of omics sequencing data from partial omics sequencing data

The present invention relates to an apparatus and method that use partial omics information obtained from a single sample or a set of samples to produce the full omics information that can be derived from that sample or set of samples. More specifically, it relates to an apparatus and method for expanding partial omics information into full omics information through a series of imputation steps that use reference standard data for each omics, association information within individual omics, and association information between omics.

Across the various fields of omics, such as the genome, transcriptome, epigenome, and long-distance genetic mapping data, the usual way to obtain the most extensive and accurate genetic information is to sequence and analyze a sufficient amount (e.g., sequencing depth) of DNA and RNA sequence data. However, producing a sufficient amount of data is costly; what counts as "sufficient" is highly subjective, and it is generally assumed that the more data produced, the better. To reduce these high production costs, approaches such as targeted sequencing of only the regions of interest, and chips (called biochips or DNA chips) that assay only selected biological markers (by spotting the target markers on an array or similar substrate), are therefore in use.

When DNA chip data are used, the target variant sequences (generally called markers) may not be included on the chip, or some of the markers that are included may fail to be genotyped. A common practice is therefore to perform imputation using linkage disequilibrium information calculated from reference genome data, thereby producing genetic information for regions not directly assayed by experiment (that is, increasing the coverage of the genome).

However, current imputation methods based on linkage disequilibrium are limited to the target markers or the regions near them, and therefore cannot produce or infer whole-genome or whole-omics data. There is thus a need for a method that accurately infers full omics data from low-cost experimental methods, such as targeted sequencing, low-pass sequencing (LPS), or chip-based methods.

Korean Patent Application Laid-Open No. 2014-0119723 (Dow AgroSciences LLC), dated October 10, 2014 (Patent Document 1), relates to data analysis of DNA sequences. It discloses an analysis method comprising: electronically receiving sequence data; electronically receiving one or more reference data sequences associated with one or more expression vectors; associating the sequence data with one or more of the reference data sequences to identify transgene flanking sequences; searching a genome for one or more insertion sites of the transgene flanking sequences; and, when one or more insertion sites are found in the searching step, annotating the genome and the one or more insertion sites within it.

The technical object of the present invention is to provide an apparatus and method that expand the partial omics information of a single sample or a set of samples into full omics information for that sample or set of samples, through a series of imputation steps that use reference standard data for each omics, association information within individual omics, and association information between omics.

To achieve the above technical object, a method according to the present invention for expanding partial omics information into full omics information comprises: producing partial omics information for each of a plurality of omics; performing iterative imputation on the partial omics information produced for each of the plurality of omics, using pre-built association information between genetic information within each individual omics; performing additional imputation on the result of the iterative imputation, using pre-built association information between genetic information across omics; and performing further imputation on the result of the cross-omics imputation, using pre-built reference standard data for each omics.

The iterative imputation step may consist of: a first process of adding, to the partial omics information produced for an omics, genetic data inferred by imputation using association information between genetic information within that omics; a second process of again adding genetic data inferred by the same within-omics imputation, this time from the partial omics information together with the genetic data added in the previous process; and a third process of repeating the second process.

The iterative imputation step may use all association information between genetic information, regardless of chromosomal position.

The association information between genetic information within an individual omics may represent linkage disequilibrium between genetic information within that omics.

The association information between genetic information across omics represents markers or genetic information that show associations between two or more omics, and may be either an eQTL (expression quantitative trait locus), i.e., a SNP (single nucleotide polymorphism) in DNA associated with gene expression, or an mQTL (methylation quantitative trait locus), i.e., a SNP in DNA associated with the DNA methylation pattern of the epigenome.

The step of performing further imputation using the reference standard data for each omics may consist of determining the distribution of genetic information in the population from that reference standard data and measuring likelihood to perform the imputation.

When there are multiple test samples within a preset range of kinship, the step of performing further imputation using the reference standard data for each omics may also use information on the sequencing depth and variants of samples that have hierarchical kinship relations with one another.

To achieve the above technical object, an apparatus according to the present invention for expanding partial omics information into full omics information comprises: a production unit that produces partial omics information for each of a plurality of omics; a first imputation unit that performs iterative imputation on the partial omics information produced for each of the plurality of omics, using pre-built association information between genetic information within each individual omics; a second imputation unit that performs additional imputation on the result of the iterative imputation, using pre-built association information between genetic information across omics; and a third imputation unit that performs further imputation on the result of the cross-omics imputation, using pre-built reference standard data for each omics.
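The apparatus chains three imputation stages so that each later stage works on what the earlier stages left unfilled. The following is a minimal Python sketch of that ordering only, assuming a hypothetical representation in which a profile maps marker IDs to values and `None` marks a gap; the names `run_pipeline`, `Profile`, and `Stage` and the toy stages are illustrative, not taken from the patent.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical representation: marker ID -> value; None marks a gap.
Profile = Dict[str, Optional[str]]
# A stage inspects the current profile and proposes values for some markers.
Stage = Callable[[Profile], Dict[str, str]]


def run_pipeline(observed: Profile, stages: List[Stage]) -> Profile:
    """Apply imputation stages in order (within-omics, cross-omics, reference).

    A stage's proposal is accepted only for markers that are still empty, so
    measured values and earlier imputations are never overwritten.
    """
    profile = dict(observed)
    for stage in stages:
        for marker, value in stage(profile).items():
            if profile.get(marker) is None:
                profile[marker] = value
    return profile


if __name__ == "__main__":
    # Toy stages standing in for the three imputation units (illustrative only).
    within = lambda p: {"snp2": "A/G"} if p["snp1"] is not None else {}
    across = lambda p: {"snp2": "G/G", "snp3": "C/T"}
    reference = lambda p: {m: "REF" for m, v in p.items() if v is None}
    observed = {"snp1": "A/A", "snp2": None, "snp3": None, "snp4": None}
    print(run_pipeline(observed, [within, across, reference]))
    # {'snp1': 'A/A', 'snp2': 'A/G', 'snp3': 'C/T', 'snp4': 'REF'}
```

In this toy run the cross-omics proposal for `snp2` is ignored because the within-omics stage already filled it. Whether a later stage should instead be allowed to refine earlier imputations is a design choice the description does not settle; the sketch follows its wording that later stages handle regions not yet produced or inferred.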

The first imputation unit may perform a procedure consisting of: a first process of adding, to the partial omics information produced for an omics, genetic data inferred by imputation using association information between genetic information within that omics; a second process of again adding genetic data inferred by the same within-omics imputation, this time from the partial omics information together with the genetic data added in the previous process; and a third process of repeating the second process.

The first imputation unit may perform imputation using all association information between genetic information, regardless of chromosomal position.

The third imputation unit may determine the distribution of genetic information in the population from the reference standard data for each omics and measure likelihood to perform further imputation.

When there are multiple test samples within a preset range of kinship, the third imputation unit may perform further imputation using information on the sequencing depth and variants of samples that have hierarchical kinship relations with one another.

With the apparatus and method according to the present invention, partial omics information is expanded into full omics information through a series of imputation steps that use reference standard data for each omics, association information within individual omics, and association information between omics, thereby reducing the resources (cost, etc.) required to produce full omics information.

In addition, by using the genetic similarity of closely related samples, such as family members, full omics information can be produced from a small amount of omics information, reducing the sequencing cost.

FIG. 1 is a block diagram illustrating an apparatus for expanding partial omics information into full omics information according to a preferred embodiment of the present invention.
FIG. 2 is a diagram illustrating the process of expanding partial omics information into full omics information according to a preferred embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method for expanding partial omics information into full omics information according to a preferred embodiment of the present invention.

Hereinafter, preferred embodiments of the apparatus and method according to the present invention for expanding partial omics information into full omics information will be described in detail with reference to the accompanying drawings.

First, an apparatus according to a preferred embodiment of the present invention for expanding partial omics information into full omics information will be described with reference to FIGS. 1 and 2.

FIG. 1 is a block diagram illustrating an apparatus for expanding partial omics information into full omics information according to a preferred embodiment of the present invention, and FIG. 2 is a diagram illustrating the process of expanding partial omics information into full omics information according to a preferred embodiment of the present invention.

Referring to FIG. 1, the apparatus 100 according to a preferred embodiment of the present invention for expanding partial omics information into full omics information (hereinafter, the "expansion apparatus") expands the partial omics information of a single sample or a set of samples into full omics information for that sample or set of samples, through a series of imputation steps that use reference standard data for each omics, association information within individual omics, and association information between omics.

To this end, the expansion apparatus 100 may include a storage unit 110, an analysis unit 120, a production unit 130, a first imputation unit 140, a second imputation unit 150, and a third imputation unit 160.

The storage unit 110 stores the programs and data required for the operation of the expansion apparatus 100 and may be divided into a program area and a data area.

The program area may store a program that controls the overall operation of the expansion apparatus 100, an operating system (OS) that boots the expansion apparatus 100, and the application programs required for its operation, such as those for collecting reference standard data for each omics; obtaining association information between genetic information within individual omics and across omics; partially producing omics information; performing imputation using within-omics association information; performing imputation using cross-omics association information; and performing imputation using reference standard data.

The data area stores data generated through use of the expansion apparatus 100, such as reference standard data for each omics, association information between genetic information within individual omics, association information between genetic information across omics, partial omics information, and full omics information.

As in the first block P1 shown in FIG. 2, the analysis unit 120 collects reference standard data for each omics. Here, omics refers to the genome, transcriptome, epigenome, long-distance genetic mapping data, proteome, metabolome, environmental omics, and the like. For the genome, for example, data such as the Korean standard variome (KoVariome) and reference data from the 1000 Genomes Project can be used. The analysis unit 120 may store the collected reference standard data for each omics in the storage unit 110.

In addition, as in the second block P2 and the third block P3 shown in FIG. 2, the analysis unit 120 obtains association information between genetic information within individual omics and association information between genetic information across omics, and may store both in the storage unit 110.

Here, the association information within an individual omics may represent linkage disequilibrium between genetic information within that omics. The association information across omics represents markers or genetic information that show associations between two or more omics, such as eQTLs (expression quantitative trait loci) and mQTLs (methylation quantitative trait loci). An eQTL is a SNP (single nucleotide polymorphism) in DNA that is associated with gene expression, and an mQTL is a SNP in DNA that is associated with the DNA methylation pattern of the epigenome.
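Linkage disequilibrium between two markers is commonly summarized by the coefficient D and the squared correlation r². The description does not fix a particular LD statistic, so the following is only a minimal sketch, assuming biallelic markers and phased reference haplotypes coded as 0/1 allele pairs; the function name `ld_r2` is hypothetical.

```python
from typing import List, Tuple


def ld_r2(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 between two biallelic markers from phased reference haplotypes.

    Each haplotype is (allele at marker A, allele at marker B), coded 0/1.
    Returns 0.0 when either marker is monomorphic (r^2 is undefined there).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at marker A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at marker B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # joint frequency
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom
```

Marker pairs with high r² in the reference panel are the within-omics associations a later imputation step could exploit: knowing one allele then says a lot about the other.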

The production unit 130 produces partial omics information for each of a plurality of omics. That is, the production unit 130 may produce a small amount of genetic information for each omics using previously published experimental methods. For example, as in the fourth block P4 shown in FIG. 2, a small (that is, partial) amount of genetic information can be produced at low cost by targeted sequencing, low-pass sequencing (LPS), chip-based methods, and the like. Any of various previously published experimental methods may be used to produce this partial genetic information; the invention is not limited to a particular method.
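Why low-pass sequencing yields only partial information can be illustrated with an idealized model in which reads land uniformly at random, so that per-position depth follows a Poisson distribution whose mean is the sequencing depth. This Lander-Waterman-style model is an assumption added for illustration, not something the description states, and the helper name `fraction_at_least` is hypothetical.

```python
import math


def fraction_at_least(mean_depth: float, k: int) -> float:
    """Expected fraction of positions covered by at least k reads,
    assuming per-position depth ~ Poisson(mean_depth)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below
```

Under this model, at a mean depth of 1x about 63% of positions (1 - e^-1) receive at least one read, and a smaller fraction reach the several reads usually wanted for a confident genotype call. The remaining positions are the gaps that the imputation stages are meant to fill.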

The first imputation unit 140 performs iterative imputation on the partial omics information produced for each of the plurality of omics, using pre-built association information between genetic information within each individual omics. Existing statistical methods, artificial intelligence algorithms, and the like may be used for imputation; the invention is not limited to a particular imputation method.

That is, the first imputation unit 140 may perform a procedure consisting of the following first, second, and third processes.

First process: add genetic data inferred by imputation, using association information between genetic information within the individual omics, to the partial omics information produced for that omics.

Second process: from the partial omics information produced for that omics together with the genetic data added in the previous process, again add genetic data inferred by imputation using the within-omics association information.

Third process: repeat the second process.

Here, the first imputation unit 140 may perform imputation using all association information between genetic information, regardless of chromosomal position. Loci that are not close on the chromosome can still be brought into proximity by chromatin-level folding, or can interact through trans regulation, so the present invention imputes using all association information regardless of chromosomal position. In contrast, existing DNA sequence imputation methods restrict imputation to loci that are close to each other on the chromosome.

In other words, the first imputation unit 140 may iteratively impute omics data that show strong analyzed associations (i.e., strong linkage disequilibrium) within the same omics (for example, within the genomic data in the case of the genome).

For example, as in the fifth block P5 shown in FIG. 2, in the top (first) row the third empty datum (white circle) is imputed from the second datum (black circle), and the fourth and sixth empty data (white circles) are imputed from the fifth datum (black circle). Then, in the next round, the seventh empty datum (white circle) in the middle (second) row may be imputed from the sixth datum filled in during the previous round. Through this series of iterative imputation steps, the missing data (the third, fourth, and sixth empty data, etc.) can be filled in. The imputation between adjacent positions shown in block P5 of FIG. 2 is a simplification to aid understanding; in the present invention, association information for all combinations may be used regardless of chromosomal position. Because loci that are not close on the chromosome can be brought into proximity by chromatin-level folding or can interact through trans regulation, the imputation procedure can be performed iteratively and continuously based on each datum's associations with other data. That is, imputation may be performed based on association information between data, regardless of chromosomal position.
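The round-by-round filling in block P5 can be sketched as propagation over an association graph. The sketch assumes associations are given as (source, target) marker pairs with no constraint on chromosomal distance, and it uses a hypothetical `infer` rule (identity by default) in place of a real statistical or AI imputation model. Each round uses only values known at the start of that round, mirroring the first and second processes, and the loop repeats until nothing new can be inferred (the third process).

```python
from typing import Callable, Dict, List, Optional, Tuple


def iterative_impute(
    values: Dict[int, Optional[str]],
    links: List[Tuple[int, int]],
    infer: Callable[[str], str] = lambda v: v,  # hypothetical stand-in for a real model
    max_rounds: int = 100,
) -> Tuple[Dict[int, Optional[str]], Dict[int, int]]:
    """Round-based imputation over an association graph.

    values: position -> value (None = missing).
    links:  (source, target) pairs of associated markers; they need not be adjacent.
    Returns the filled values and the round in which each imputed position was filled.
    """
    filled = dict(values)
    filled_in_round: Dict[int, int] = {}
    for rnd in range(1, max_rounds + 1):
        # Snapshot: only data known before this round may be used as a source.
        known = {p for p, v in filled.items() if v is not None}
        new: Dict[int, str] = {}
        for src, dst in links:
            if src in known and filled.get(dst) is None and dst not in new:
                new[dst] = infer(filled[src])
        if not new:  # fixed point: nothing more can be inferred
            break
        filled.update(new)
        filled_in_round.update({p: rnd for p in new})
    return filled, filled_in_round
```

With the P5 pattern (links 2→3, 5→4, 5→6, 6→7) plus a hypothetical long-range link 2→40, positions 3, 4, 6, and 40 are filled in round 1 and position 7 in round 2, because position 6 only becomes usable as a source after the first round.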

The second imputation unit 150 performs additional imputation on the result of the iterative within-omics imputation, using pre-built association information between genetic information across omics.

That is, for regions that were neither produced nor inferred through the iterative imputation of the first imputation unit 140, the second imputation unit 150 may perform additional imputation using association information between genetic information across omics.

For example, as in the sixth block P6 shown in FIG. 2, empty data may be filled in using associations of genetic information between different types of omics (the genome and the epigenome). That is, this process additionally fills in data based on correlation and association information between different omics data (for example, filling unfilled regions of the DNA sequence using RNA expression information and DNA methylation information).
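Cross-omics imputation can be illustrated with QTL records. In the sketch below, each hypothetical `QTL` record links a SNP to a measured feature in another omics layer (expression for an eQTL, methylation for an mQTL) under an assumed additive model, expected value = intercept + beta × alternate-allele dosage. A missing SNP is assigned the dosage (0, 1, or 2) whose expected values best fit the measured features. The model, the field names, and the feature labels are assumptions for illustration, not the patent's procedure.

```python
from typing import Dict, List, NamedTuple, Optional


class QTL(NamedTuple):
    snp: str          # DNA marker (an eQTL or mQTL SNP)
    feature: str      # linked feature in another omics layer (expression or methylation)
    intercept: float  # assumed expected feature value at dosage 0
    beta: float       # assumed additive effect per alternate allele


def _sse(dosage: int, evidence: List[QTL], features: Dict[str, float]) -> float:
    """Sum of squared residuals between measured and expected feature values."""
    return sum(
        (features[q.feature] - (q.intercept + q.beta * dosage)) ** 2 for q in evidence
    )


def impute_from_qtl(
    genotypes: Dict[str, Optional[int]],
    features: Dict[str, float],
    qtls: List[QTL],
) -> Dict[str, Optional[int]]:
    """Fill missing SNP dosages (0/1/2) from measured features of other omics."""
    result = dict(genotypes)
    for snp, g in genotypes.items():
        if g is not None:
            continue  # keep values produced or imputed earlier
        evidence = [q for q in qtls if q.snp == snp and q.feature in features]
        if not evidence:
            continue  # no measured cross-omics link; leave for the next stage
        result[snp] = min((0, 1, 2), key=lambda d: _sse(d, evidence, features))
    return result
```

SNPs without any measured linked feature stay empty, which matches the description's ordering: whatever this stage cannot fill is passed on to the reference-data stage.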

The third imputation unit 160 performs further imputation on the result of the cross-omics imputation, using pre-built reference standard data for each omics.

That is, the third imputation unit 160 may determine the distribution of genetic information in the population from the reference standard data for each omics and measure likelihood to perform further imputation.

For example, as in the seventh block P7 shown in FIG. 2, some regions cannot be produced or inferred even after the iterative imputation by the first imputation unit 140 and the additional imputation by the second imputation unit 150. The third imputation unit 160 may fill in these remaining portions using reference standard data for each omics (e.g., standard genome data). In this step, the distribution of genetic information in the population (polymorphisms, etc.) is determined from the reference standard data, and the likelihood and similar measures are computed. Genetic information is then inferred for data that is still unfilled (i.e., empty data), and the empty data is filled in. In other words, each empty position is filled with the information that has the highest probability of occurrence according to the reference standard data.
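The following is a minimal sketch of the reference-based fill. It rests on four assumptions that the patent does not state. Loci are biallelic SNPs. The reference panel is summarized as alternate-allele frequencies. The genotype prior follows Hardy-Weinberg equilibrium. A simple per-read error rate supplies the optional likelihood term. All function names and the `error` parameter are hypothetical. The patent itself says only that the most probable information according to the reference data is used.

```python
from typing import Dict, Optional, Tuple

def hwe_prior(alt_freq: float) -> Dict[int, float]:
    """Genotype probabilities for alt-allele dosage 0/1/2 under Hardy-Weinberg equilibrium."""
    ref_freq = 1.0 - alt_freq
    return {0: ref_freq ** 2, 1: 2 * ref_freq * alt_freq, 2: alt_freq ** 2}

def read_likelihood(dosage: int, ref_reads: int, alt_reads: int, error: float = 0.01) -> float:
    """Likelihood of the observed reads given a diploid genotype (up to a binomial constant)."""
    alt_fraction = {0: error, 1: 0.5, 2: 1.0 - error}[dosage]
    return alt_fraction ** alt_reads * (1.0 - alt_fraction) ** ref_reads

def fill_from_reference(
    genotypes: Dict[str, Optional[int]],
    alt_freqs: Dict[str, float],
    read_counts: Optional[Dict[str, Tuple[int, int]]] = None,
) -> Dict[str, Optional[int]]:
    """Fill still-missing genotypes with the most probable genotype given the reference panel.

    The prior comes from the reference allele frequency. If sparse reads exist at the
    locus (e.g. from low-pass sequencing), they are multiplied in as a likelihood.
    """
    read_counts = read_counts or {}
    result = dict(genotypes)
    for locus, value in genotypes.items():
        if value is not None or locus not in alt_freqs:
            continue  # already known, or no reference information for this locus
        posterior = hwe_prior(alt_freqs[locus])
        if locus in read_counts:
            ref_reads, alt_reads = read_counts[locus]
            posterior = {g: p * read_likelihood(g, ref_reads, alt_reads) for g, p in posterior.items()}
        result[locus] = max(posterior, key=posterior.get)
    return result
```

When no reads are available, the function returns the most common genotype in the population. When sparse reads are informative, they can override that prior.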

Here, multiple test samples may fall within a preset kinship range (e.g., parent-child relationships, immediate family members, or twins). In that case, the third imputation unit 160 may also perform additional imputation using information on the sequencing depth and variants of samples that have hierarchical kinship relationships with one another (e.g., parents and children).

More specifically, suppose the goal is to obtain genome information for an entire family. If each family member's sample underwent whole-genome sequencing, the total sequencing cost could be very high. This invention instead exploits the genetic similarity (matching omics information) between samples with hierarchical relationships (parents and children, etc.). In that case, full omics information including the genome can be obtained even if the amount of sequencing for the parents and children is reduced by one third or more. That is, when both the shared and the variant portions of the parents' genomes are passed on to the children almost without error, omics information nearly equivalent to a whole genome can readily be produced from partial omics (genome) information. Therefore, when this additional imputation is combined with the reference standard data for each omics, highly accurate omics information for the entire family can be produced at very low cost.
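The family-based idea can be sketched as follows. The sketch assumes biallelic SNPs, dosage encoding, calls stored as (dosage, depth) pairs, and a hypothetical `min_depth` threshold that decides which parental calls are trusted. The patent does not say exactly how sequencing depth is used. This sketch therefore fills a child's missing genotype only where well-covered parental genotypes leave a single Mendelian-consistent possibility, for example when both parents are homozygous for the reference allele.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Call = Tuple[int, int]  # (alt-allele dosage 0/1/2, sequencing depth at the locus)

def transmission_probs(dosage: int) -> Dict[int, float]:
    """Probability that a parent with this dosage passes on 0 or 1 alt allele."""
    return {0: 1.0 - dosage / 2, 1: dosage / 2}

def offspring_distribution(mother: int, father: int) -> Dict[int, float]:
    """Mendelian distribution of the child's dosage given both parents' dosages."""
    dist: Dict[int, float] = {}
    pairs = product(transmission_probs(mother).items(), transmission_probs(father).items())
    for (m_allele, m_p), (f_allele, f_p) in pairs:
        if m_p > 0 and f_p > 0:
            child = m_allele + f_allele
            dist[child] = dist.get(child, 0.0) + m_p * f_p
    return dist

def impute_child(
    child: Dict[str, Optional[Call]],
    mother: Dict[str, Call],
    father: Dict[str, Call],
    min_depth: int = 10,
) -> Dict[str, Optional[Call]]:
    """Fill a child's missing calls where well-covered parental calls fix the genotype."""
    result = dict(child)
    for locus, call in child.items():
        if call is not None:
            continue
        m, f = mother.get(locus), father.get(locus)
        if m is None or f is None or min(m[1], f[1]) < min_depth:
            continue  # parental evidence missing or too shallow to trust
        dist = offspring_distribution(m[0], f[0])
        if len(dist) == 1:  # only one genotype is Mendelian-consistent
            (dosage,) = dist
            result[locus] = (dosage, 0)  # depth 0 marks an inferred, not sequenced, call
    return result
```

Ambiguous loci stay empty and can be passed to the reference-based step. An example is a locus where both parents are heterozygous.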

Next, a method of expanding partial omics information into full omics information according to a preferred embodiment of the present invention will be described with reference to FIG. 3.

FIG. 3 is a flowchart illustrating a method of expanding partial omics information into full omics information according to a preferred embodiment of the present invention.

Referring to FIG. 3, the expansion apparatus 100 collects reference standard data for each omics (S110). The expansion apparatus 100 may store the collected reference standard data for each omics internally.

Here, omics refers to the genome, transcriptome, epigenome, long-distance genetic mapping data, proteome, metabolome, environmental omics, and the like.

The expansion apparatus 100 then obtains association information between genetic information within individual omics and association information between genetic information across omics (S120). The expansion apparatus 100 may store both kinds of association information internally.

Here, the association information between genetic information within an individual omics may represent linkage disequilibrium between genetic information within that omics. The association information between genetic information across omics represents markers or genetic information that show associations between two or more omics. Examples include expression quantitative trait loci (eQTL) and methylation quantitative trait loci (mQTL).
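The following sketch makes the within-omics association information concrete. It computes pairwise linkage disequilibrium (r^2) from phased reference haplotypes and keeps the pairs above a threshold. The 0.8 cut-off and the function names are illustrative assumptions. The sketch checks every pair of loci rather than a fixed window, which matches the statement below that association information is used regardless of chromosomal position.

```python
from typing import Dict, List, Sequence, Tuple

def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci from phased reference haplotypes (0 = ref, 1 = alt)."""
    if not hap_a or len(hap_a) != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n  # alt-alt haplotype freq
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom  # monomorphic loci carry no LD information

def ld_associations(
    panel: Dict[str, Sequence[int]], threshold: float = 0.8
) -> Dict[str, List[Tuple[str, float]]]:
    """For every locus, list other loci whose r^2 meets the threshold (all pairs, no window)."""
    table: Dict[str, List[Tuple[str, float]]] = {locus: [] for locus in panel}
    loci = sorted(panel)
    for i, a in enumerate(loci):
        for b in loci[i + 1:]:
            r2 = ld_r2(panel[a], panel[b])
            if r2 >= threshold:
                table[a].append((b, r2))
                table[b].append((a, r2))
    return table
```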

Thereafter, the expansion apparatus 100 produces partial omics information for each of the plurality of omics (S130). For example, a small amount of genetic information, i.e., partial genetic information, can be produced at low cost for each of the plurality of omics. Suitable methods include targeted sequencing, low-pass sequencing (LPS), and chip-based methods.

Next, the expansion apparatus 100 performs iterative imputation for each of the plurality of omics (S140). The imputation is applied to the partial omics information produced for that omics and uses the previously established association information between genetic information within individual omics.

That is, the expansion apparatus 100 may carry out a procedure consisting of three processes. In the first process, genetic data inferred by imputation based on association information between genetic information within individual omics is added to the partial omics information produced for an omics. In the second process, the same kind of imputation is applied to the partial omics information together with the genetic data added in the previous process, and the newly inferred genetic data is added again. In the third process, the second process is repeated.

In doing so, the expansion apparatus 100 may perform the imputation using all association information between genetic information, regardless of position on the chromosome.
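The three processes of step S140 amount to repeating imputation until a round produces no new values. The minimal sketch below assumes that the association information has already been reduced to hypothetical rules of the form "if the source locus has value x, the target locus has value y". Rules are not restricted by chromosomal position. Each round uses only the values available at its start, that is, the output of the previous process. A chain such as A -> B -> C therefore takes two rounds.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: target locus -> [(source locus, {source value: predicted target value})]
Rules = Dict[str, List[Tuple[str, Dict[int, int]]]]

def iterative_impute(
    data: Dict[str, Optional[int]], rules: Rules, max_rounds: int = 100
) -> Tuple[Dict[str, Optional[int]], int]:
    """Repeat rule-based imputation until a round adds nothing (or max_rounds is hit).

    Returns the imputed data and the number of rounds that added at least one value.
    """
    current = dict(data)
    productive_rounds = 0
    for _ in range(max_rounds):
        snapshot = dict(current)  # output of the previous process
        added = 0
        for target, sources in rules.items():
            if snapshot.get(target) is not None:
                continue  # already observed or imputed
            for source, mapping in sources:
                value = snapshot.get(source)
                if value is not None and value in mapping:
                    current[target] = mapping[value]
                    added += 1
                    break  # first applicable rule wins
        if added == 0:
            break  # fixed point reached: nothing more can be inferred
        productive_rounds += 1
    return current, productive_rounds
```

Loci that are still empty at the fixed point are the ones handed to the cross-omics step (S150).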

The expansion apparatus 100 then performs additional imputation on the result of the iterative imputation based on association information within individual omics (S150). This step uses the previously established association information between genetic information across omics.

That is, some regions cannot be produced or inferred through the iterative imputation step based on association information within individual omics. For those regions, the expansion apparatus 100 may perform additional imputation using association information between genetic information across omics.

In addition, the expansion apparatus 100 performs additional imputation on the result of the additional imputation based on association information across omics (S160). This step uses the previously established reference standard data for each omics.

That is, the expansion apparatus 100 may determine the distribution of genetic information in the population from the reference standard data for each omics, measure the likelihood, and perform additional imputation accordingly. In other words, some regions remain unproduced and uninferred even after both the iterative imputation step based on association information within individual omics and the additional imputation step based on association information across omics. The expansion apparatus 100 can fill in these remaining portions using the reference standard data for each omics.

Here, multiple test samples may fall within a preset kinship range. In that case, the expansion apparatus 100 may also perform additional imputation using information on the sequencing depth and variants of samples that have hierarchical kinship relationships with one another.

The expansion apparatus 100 according to the present invention applies the series of imputation steps described above: "iterative imputation using association information between genetic information within individual omics" -> "additional imputation using association information between genetic information across omics" -> "additional imputation using reference standard data for each omics". Through these steps, it can expand partial omics information into full omics information.
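The fixed order of the three steps can be expressed as a small pipeline. All names in this sketch are hypothetical. Each stage proposes values, and a proposal is accepted only for loci that are still missing, so values filled by an earlier step are never overwritten by a later one. The toy stages at the bottom stand in for S140, S150 and S160 and do not implement the actual statistics.

```python
from typing import Callable, Dict, List, Optional, Tuple

Genotypes = Dict[str, Optional[int]]
Stage = Callable[[Genotypes], Dict[str, int]]  # returns proposed values for some loci

def run_pipeline(partial: Genotypes, stages: List[Tuple[str, Stage]]) -> Tuple[Genotypes, Dict[str, int]]:
    """Apply imputation stages in a fixed order.

    A stage's proposal is accepted only for loci that are still missing, so values
    produced by an earlier stage are never overwritten by a later one.
    """
    current = dict(partial)
    filled_by: Dict[str, int] = {}
    for name, stage in stages:
        proposals = stage(dict(current))  # stages see a copy and cannot mutate state
        count = 0
        for locus, value in proposals.items():
            if current.get(locus) is None:
                current[locus] = value
                count += 1
        filled_by[name] = count
    return current, filled_by

# Toy stand-ins for S140, S150 and S160 (hypothetical rules, for illustration only)
def within_omics(g: Genotypes) -> Dict[str, int]:
    return {"B": g["A"]} if g.get("A") is not None else {}

def across_omics(g: Genotypes) -> Dict[str, int]:
    return {"C": 2}

def reference_fill(g: Genotypes) -> Dict[str, int]:
    return {locus: 0 for locus, value in g.items() if value is None}

full, counts = run_pipeline(
    {"A": 1, "B": None, "C": None, "D": None},
    [("within", within_omics), ("across", across_omics), ("reference", reference_fill)],
)
```

The returned counts show how many loci each step contributed. Such counts could help judge how much a sample depends on population-level reference data rather than on its own measurements.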

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. A computer-readable recording medium is any recording device that stores data readable by a computer. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

Preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these specific embodiments. Anyone of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed in the following claims, and such modifications fall within the scope of the claims.

100: expansion apparatus
110: storage unit, 120: analysis unit
130: production unit, 140: first imputation unit
150: second imputation unit, 160: third imputation unit

Claims

A method of expanding partial omics information into full omics information, the method comprising:
producing, by a production unit of an apparatus that expands partial omics information into full omics information, partial omics information for each of a plurality of omics;
performing, by a first imputation unit of the apparatus, iterative imputation for each of the plurality of omics on the partial omics information produced for that omics, using previously established association information between genetic information within individual omics;
performing, by a second imputation unit of the apparatus, additional imputation on the result obtained through the iterative imputation using association information between genetic information within individual omics, using previously established association information between genetic information across omics; and
performing, by a third imputation unit of the apparatus, additional imputation on the result obtained through the additional imputation using association information between genetic information across omics, using previously established reference standard data for each omics.

The method of claim 1, wherein performing the iterative imputation by the first imputation unit of the apparatus consists of:
a first process of adding, to the partial omics information produced for an omics, genetic data inferred through imputation using association information between genetic information within individual omics;
a second process of again adding genetic data inferred through imputation using association information between genetic information within individual omics, applied to the partial omics information produced for the omics together with the genetic data added in the previous process; and
a third process of repeatedly performing the second process.

The method of claim 1, wherein performing the iterative imputation by the first imputation unit of the apparatus consists of performing imputation using all association information between genetic information regardless of position on the chromosome.

The method of claim 1, wherein the association information between genetic information within individual omics represents linkage disequilibrium between genetic information within an omics.

The method of claim 1, wherein the association information between genetic information across omics represents markers or genetic information showing an association between two or more omics, and is one of:
an expression quantitative trait locus (eQTL), which is a single nucleotide polymorphism (SNP) on DNA correlated with gene expression; and
a methylation quantitative trait locus (mQTL), which is a SNP on DNA correlated with the DNA methylation pattern of the epigenome.

The method of claim 1, wherein performing the additional imputation using the reference standard data for each omics by the third imputation unit of the apparatus consists of determining the distribution of genetic information in the population from the reference standard data for each omics, measuring the likelihood, and performing additional imputation.

The method of claim 1, wherein performing the additional imputation using the reference standard data for each omics by the third imputation unit of the apparatus comprises, when there are multiple test samples within a preset kinship range, performing additional imputation using information on the sequencing depth and variants of samples having hierarchical kinship relationships with one another.

An apparatus for expanding partial omics information into full omics information, the apparatus comprising:
a production unit that produces partial omics information for each of a plurality of omics;
a first imputation unit that performs iterative imputation for each of the plurality of omics on the partial omics information produced for that omics, using previously established association information between genetic information within individual omics;
a second imputation unit that performs additional imputation on the result obtained through the iterative imputation using association information between genetic information within individual omics, using previously established association information between genetic information across omics; and
a third imputation unit that performs additional imputation on the result obtained through the additional imputation using association information between genetic information across omics, using previously established reference standard data for each omics.

The apparatus of claim 8, wherein the first imputation unit performs a procedure consisting of:
a first process of adding, to the partial omics information produced for an omics, genetic data inferred through imputation using association information between genetic information within individual omics;
a second process of again adding genetic data inferred through imputation using association information between genetic information within individual omics, applied to the partial omics information produced for the omics together with the genetic data added in the previous process; and
a third process of repeatedly performing the second process.

The apparatus of claim 8, wherein the first imputation unit performs imputation using all association information between genetic information regardless of position on the chromosome.

The apparatus of claim 8, wherein the association information between genetic information within individual omics represents linkage disequilibrium between genetic information within an omics.

The apparatus of claim 8, wherein the association information between genetic information across omics represents markers or genetic information showing an association between two or more omics, and is one of:
an expression quantitative trait locus (eQTL), which is a single nucleotide polymorphism (SNP) on DNA correlated with gene expression; and
a methylation quantitative trait locus (mQTL), which is a SNP on DNA correlated with the DNA methylation pattern of the epigenome.

The apparatus of claim 8, wherein the third imputation unit determines the distribution of genetic information in the population from the reference standard data for each omics, measures the likelihood, and performs additional imputation.

The apparatus of claim 8, wherein, when there are multiple test samples within a preset kinship range, the third imputation unit performs additional imputation using information on the sequencing depth and variants of samples having hierarchical kinship relationships with one another.