
CN109585017A - Risk prediction algorithm model and device for age-related macular degeneration - Google Patents

Risk prediction algorithm model and device for age-related macular degeneration

Info

Publication number
CN109585017A
Authority
CN
China
Prior art keywords
age
macular degeneration
biomarker
risk
amd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910101067.7A
Other languages
Chinese (zh)
Other versions
CN109585017B (en)
Inventor
王丽君
高军晖
袁卫兰
龚建兵
刘慧敏
林灵
许骋
张英霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biotecan Medical Diagnostics Co ltd
Shanghai Biotecan Biology Medicine Technology Co ltd
Original Assignee
Shanghai Biotecan Medical Diagnostics Co ltd
Shanghai Biotecan Biology Medicine Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biotecan Medical Diagnostics Co ltd and Shanghai Biotecan Biology Medicine Technology Co ltd
Priority to CN201910101067.7A
Publication of CN109585017A
Application granted
Publication of CN109585017B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a risk prediction algorithm model and device for age-related macular degeneration (AMD). Specifically, the genotypes of 7 related single nucleotide polymorphisms (SNPs) are converted into odds ratio (OR) values and combined with 7 items of clinical information, and a risk prediction algorithm model and device are constructed by a machine learning method. The invention can assist clinicians in predicting AMD in advance and diagnosing it early, and is of great clinical significance for reducing the incidence of AMD and improving its treatment rate.

Description

Risk prediction algorithm model and device for age-related macular degeneration
Technical field
The present invention relates to the field of medical biotechnology detection, and in particular to a risk prediction algorithm model and device for age-related macular degeneration (AMD).
Background art
Age-related macular degeneration (AMD) is a leading cause of blindness in the elderly. The disease has a complex etiology related to factors such as age, sex, smoking, ethnicity and heredity; it causes irreversible vision loss, and there is currently no effective treatment for it. AMD has a relatively high incidence: a meta-analysis reports a global overall incidence of 8.01%, with incidences of 11.2%, 7.1% and 6.8% in European, African and Asian populations, respectively. Among the elderly in China, the incidences of early-stage and late-stage AMD are 4.7%-9.2% and 0.2%-1.9%, respectively. The number of AMD patients worldwide is projected to reach 196 million by 2020 and 288 million by 2040. As the aging of China's population accelerates, AMD shows a clear upward trend.
AMD results from the combined action of environmental and genetic factors, and its pathogenesis is complex. Genetic factors account for a large share of the risk of onset, reaching 45-70%. Clearly, if genetic and environmental factors are considered together and combined with routine and auxiliary AMD examinations such as visual acuity, intraocular pressure, funduscopy, fundus fluorescein angiography and optical coherence tomography, the accuracy of AMD diagnosis and the effectiveness of risk assessment can be greatly improved, which also benefits the prevention, early detection and treatment of AMD.
Therefore, there is an urgent need in the art for a method that reliably predicts and diagnoses AMD at an early stage.
Summary of the invention
The object of the present invention is to provide a risk prediction algorithm model and device for age-related macular degeneration (AMD).
In a first aspect of the present invention, a biomarker set is provided, the set comprising two biomarkers selected from the following group: rs2338104, rs754203, or a combination thereof.
In another preferred embodiment, the biomarker set is a biomarker set for diagnosing macular degeneration (AMD) and further comprises biomarkers selected from the following group: rs2284664, rs2071277, rs1999930, rs10490924, rs5749482, or a combination thereof.
In another preferred embodiment, the biomarker set is a biomarker set for diagnosing macular degeneration (AMD) and comprises biomarkers selected from Table A:
Table A
SNP ID       Chromosome location   Base change
rs2338104    12:109457363          C>G
rs754203     14:99691630           A>G
rs2284664    1:196733395           C>T
rs2071277    6:32203906            T>C
rs1999930    6:116065971           C>T
rs10490924   10:122454932          G>T
rs5749482    22:32663679           C>G
In another preferred embodiment, the biomarker set is used for diagnosing macular degeneration (AMD), or for preparing a kit or reagent, the kit or reagent being used to assess the AMD risk (susceptibility) of a subject to be tested or to diagnose (including early diagnosis and/or auxiliary diagnosis) AMD in the subject.
In another preferred embodiment, the set comprises biomarkers selected from Table B:
Table B
In another preferred embodiment, the set comprises biomarkers b1~b2.
In another preferred embodiment, the set further comprises biomarkers b3~b7.
In another preferred embodiment, the set further comprises the biomarkers rs551397, rs800292, rs10737680, rs3753396, rs1410996, rs2284664, rs1065489, or a combination thereof.
In another preferred embodiment, the biomarker or biomarker set is derived from a blood, plasma, serum or buccal swab sample.
In another preferred embodiment, each biomarker is detected by PCR.
In another preferred embodiment, DNA fragment amplification and single-base extension are performed by fluorescence quantitative PCR.
In another preferred embodiment, the biomarkers are detected using the MassARRAY Analyzer 4 system.
In another preferred embodiment, the PCR includes qPCR and fluorescence quantitative PCR.
In another preferred embodiment, the set is used for the assessment or diagnosis of AMD risk.
In another preferred embodiment, assessing the AMD risk of the subject to be tested includes early screening for AMD.
In a second aspect of the present invention, a reagent combination for the assessment or diagnosis of AMD risk is provided, the reagent combination comprising reagents for detecting each biomarker in the set according to the first aspect of the invention.
In a third aspect of the present invention, a kit is provided, the kit comprising the set according to the first aspect of the invention and/or the reagent combination according to the second aspect of the invention.
In another preferred embodiment, each biomarker in the set according to the first aspect of the invention is used as a standard.
In another preferred embodiment, the kit further includes instructions.
In a fourth aspect of the present invention, a use of a biomarker set is provided for preparing a kit, the kit being used for the assessment or diagnosis of AMD risk, wherein the biomarker set comprises two biomarkers selected from the following group: rs2338104, rs754203, or a combination thereof.
In another preferred embodiment, when used for the assessment or diagnosis of AMD risk, the biomarker set further comprises biomarkers selected from the following group: rs2284664, rs2071277, rs1999930, rs10490924, rs5749482, or a combination thereof.
In another preferred embodiment, the assessment comprises the steps of:
(1) providing a sample from the subject to be tested, and detecting the SNP genotype value (i.e. A1 or A2 in Table 2) of each biomarker of the set in the sample;
(2) comparing the site information measured in step (1) with a reference data set;
Preferably, the reference data set includes data for each biomarker in the set from AMD patients and healthy controls;
In another preferred embodiment, the sample is selected from the following group: blood, plasma, serum and buccal swab.
In another preferred embodiment, comparing the site information measured in step (1) with a reference data set further includes a step of building a supervised machine learning multivariate statistical model to output the probability of disease; preferably, the machine learning model is an Xgboost model.
In another preferred embodiment, if the probability of disease is > 0.5, the subject is judged to be at risk of AMD or to have AMD.
In another preferred embodiment, before step (1), the method further includes a step of processing the sample.
In a fifth aspect of the present invention, a method for assessing or diagnosing the AMD risk of a subject to be tested is provided, comprising the steps of:
(1) providing a sample from the subject to be tested, and detecting the site information (such as the SNP genotype value, i.e. A1 or A2 in Table 2) of each biomarker of the set in the sample;
(2) comparing the genotypes measured in step (1) with a reference data set;
Preferably, the reference data set includes data for each biomarker in the set from AMD patients and healthy controls;
In another preferred embodiment, the sample is selected from the following group: blood, plasma, serum and buccal swab.
In another preferred embodiment, comparing the data calculated from the genotypes measured in step (1) with a reference data set further includes a step of building a supervised ensemble-learning machine learning model to output the probability of disease; preferably, the machine learning model is an Xgboost model.
In another preferred embodiment, if the probability of disease is > 0.5, the subject is judged to be at risk of AMD or to have AMD.
In another preferred embodiment, before step (1), the method further includes a step of processing the sample.
In a sixth aspect of the present invention, a method is provided for screening candidate compounds for assessing or diagnosing AMD risk, comprising the steps of:
(1) in a test group, administering a test compound to the subject and detecting the level V1 of each biomarker of the set in a sample from the subject in the test group; in a control group, administering a blank control (including vehicle) to the subject and detecting the level V2 of each biomarker of the set in a sample from the subject in the control group;
(2) comparing the level V1 and the level V2 detected in the previous step to determine whether the test compound is a candidate compound for treating AMD, wherein the set comprises two or more biomarkers selected from the following group: rs2338104, rs1999930, rs10490924.
In another preferred embodiment, the subject has AMD.
In another preferred embodiment, if the level V1 of one or more biomarkers selected from subset H is significantly lower than the level V2, the test compound is indicated to be a candidate compound for treating AMD.
In another preferred embodiment, "significantly lower" means that the ratio of level V1 to level V2 is ≤ 0.8, preferably ≤ 0.6, more preferably ≤ 0.4.
In a seventh aspect of the present invention, a use of a biomarker set is provided for screening candidate compounds for assessing or diagnosing AMD risk and/or for assessing the therapeutic effect of a candidate compound on AMD, wherein the biomarker set comprises two biomarkers selected from the following group: rs2338104, rs754203, or a combination thereof.
In another preferred embodiment, the biomarker set further comprises: rs2284664, rs2071277, rs1999930, rs10490924, rs5749482, or a combination thereof.
In an eighth aspect of the present invention, an auxiliary early screening system for AMD is provided, characterized in that the system comprises:
(a) an AMD-related disease feature input module, used to input the AMD-related disease features of a subject;
wherein the AMD-related disease features include the site information (such as the SNP genotype value, i.e. A1 or A2 in Table 2) of two or more sites selected from group A: rs2284664, rs2071277, rs1999930, rs10490924, rs2338104, rs754203, rs5749482, or a combination thereof;
(b) an AMD-related disease discrimination processing module, which scores the input AMD-related disease features according to a predetermined criterion to obtain a risk score, and compares the risk score with a risk threshold for AMD-related disease to obtain an auxiliary screening result, wherein when the risk score is higher than the risk threshold, the subject's risk of AMD-related disease is indicated to be higher than that of the normal population; and
(c) an auxiliary screening result output module, used to output the auxiliary screening result.
In another preferred embodiment, module (a) further includes the following AMD-related disease features: age, diabetes status, body mass index (BMI), kidney injury status, atherosclerosis, alcohol consumption, and whether the subject is often outdoors.
In another preferred embodiment, the subject is a human.
In another preferred embodiment, the subject includes an infant, a teenager or an adult.
In another preferred embodiment, risk scoring is carried out in the processing module as follows:
In another preferred embodiment, the feature input module includes a sample collection instrument.
In another preferred embodiment, the feature input module is selected from the following group: the genotyping output module of the MassARRAY Analyzer 4 system, and the Askme module.
In another preferred embodiment, the AMD-related disease discrimination processing module includes a processor and a memory, wherein the memory stores risk threshold data or a model for AMD-related disease based on the AMD-related disease features.
In another preferred embodiment, the output module includes a reporting system (such as the Askme reporting system).
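The three modules of the screening system can be pictured with a minimal sketch. Everything named here (`ScreeningResult`, `screen`, `report`, `toy_score`) is hypothetical and for illustration only; the scoring formula of the processing module is not reproduced in the text, so a placeholder scorer stands in for it, and the 0.5 threshold is borrowed from the disease-probability cut-off stated in the fourth and fifth aspects.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ScreeningResult:
    score: float
    elevated_risk: bool


def screen(features: Dict[str, float],
           score_fn: Callable[[Dict[str, float]], float],
           threshold: float = 0.5) -> ScreeningResult:
    """Processing module: score the input features and compare with the threshold."""
    score = score_fn(features)
    return ScreeningResult(score=score, elevated_risk=score > threshold)


def report(result: ScreeningResult) -> str:
    """Output module: format the auxiliary screening result."""
    status = "higher than" if result.elevated_risk else "not higher than"
    return f"risk score {result.score:.2f}: {status} the normal population"


def toy_score(features: Dict[str, float]) -> float:
    # Placeholder scorer (an assumption); a trained model would be used in practice.
    return min(1.0, features["age"] / 100.0)


# Input module: features for one hypothetical subject.
result = screen({"age": 72, "rs2338104": 1.773}, toy_score)
print(report(result))
```

Passing the scorer in as a function keeps the threshold comparison independent of whichever model is stored in the memory of the processing module.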
It should be understood that, within the scope of the present invention, the technical features described above and the technical features specifically described below (e.g. in the examples) can be combined with each other to form new or preferred technical solutions. For reasons of space, they are not described one by one here.
Brief description of the drawings
Fig. 1 shows the technical route of the invention.
Fig. 2 shows the steps of SNP genotyping using the MassARRAY Analyzer 4 system.
Fig. 3 shows ROC curves for the logistic regression, random forest, Adaboost and Xgboost classifiers, plotted from test-set results averaged over 1000 repeated random splits into training and test sets; the feature variables include clinical information and site information (SNP+CC).
Fig. 4 shows ROC curves of the average test-set predictions of Xgboost over 1000 repeated rounds of learning and prediction. "CC" means the feature variables contain clinical information only, "SNP" means the feature variables contain SNP sites only, and "SNP+CC" means the feature variables contain both clinical information and site information.
Fig. 5 shows the importance scores of the top 10 feature variables output by Xgboost.
Fig. 6 shows the relationship between the number of variables and the ROC-AUC score. Importance (feature importance) scores of the feature variables are obtained from the Xgboost model, and the model is then re-screened for the optimum according to these scores: feature variables are added one at a time in descending order of importance, the model is trained and tested on each input set, and the number of variables giving the best test ROC-AUC is found. As shown in the figure, the best ROC-AUC corresponds to 4 variables, so the four feature variables with the highest importance scores can be used as input variables, giving the highest ROC-AUC score.
Fig. 7 shows the ROC curve of the averaged results of 1000 test sets, using Xgboost as the machine learning model and age, rs754203, rs2338104 and diabetes as variables.
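The selection procedure described for Fig. 6 can be sketched as follows. This is a minimal illustration, not the inventors' code: `select_top_features` is a hypothetical helper, `score_fn` stands in for "train the model on these features and return the test ROC-AUC", and the feature names and scores are invented toy values.

```python
from typing import Callable, Dict, List, Tuple


def select_top_features(importance: Dict[str, float],
                        score_fn: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Add features in descending order of importance and keep the best-scoring prefix."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_subset: List[str] = []
    best_score = float("-inf")
    for n in range(1, len(ranked) + 1):
        subset = ranked[:n]
        score = score_fn(subset)  # e.g. test ROC-AUC of a model trained on `subset`
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy inputs (assumptions): importance scores and a score that peaks at 3 features.
importance = {"f3": 0.20, "f1": 0.40, "f2": 0.30, "f4": 0.07, "f5": 0.03}
toy_scores = {1: 0.60, 2: 0.70, 3: 0.75, 4: 0.74, 5: 0.72}
best = select_top_features(importance, lambda subset: toy_scores[len(subset)])
print(best)
```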
Detailed description of embodiments
After extensive and in-depth research, the inventors have for the first time developed a risk prediction algorithm model and device for age-related macular degeneration (AMD). The invention uses the odds ratio (OR) values of 7 related SNPs combined with 7 items of clinical information, and constructs the risk prediction algorithm model and device by a machine learning method. The invention can assist clinicians in predicting AMD in advance and diagnosing it early, and is of great clinical significance for reducing the incidence of AMD and improving its treatment rate. The present invention has been completed on this basis.
Term
rs2338104: sequence
TGAAAAAGTTCTAAAATTAGATAGT[C/G]GTTATGGCCTCACAACTTGTGAATA, chromosome location 12:109457363, gene KCTD10
rs754203: sequence
GTGCTGTCCTGGGGCCCAGGAGCCC[C/T]GGGGGCAAGGCTCTGCCCTGTTGCT, chromosome location 14:99691630, gene CYP46A1
rs2284664: sequence
AGAAAAATACCAGTCTCCATAGATC[A/G/T]TAAAGCAAATAGATGGTCTTAAAAT, chromosome location 1:196733395, gene CFH
rs2071277: sequence
GGCAGTGACTGATGCAGTGTGTGAC[A/G]TCTAATCTCCCCCATAATTACAGGC, chromosome location 6:32203906, gene NOTCH4
rs1999930: sequence
ATAGGACAGATTCTAGATTTTCCTT[A/C/G/T]TGATACAGAGAAATATAAGACATAA, chromosome location 6:116065971, gene FRK
rs10490924: sequence
TTTATCACACTCCATGATCCCAGCT[G/T]CTAAAATCCACACTGAGCTCTGCTT, chromosome location 10:122454932, gene ARMS2
rs5749482: sequence
TGGGAACTGACTAATACAGCATGTA[C/G]GAACTATGAAATATGAATTGTGTAA, chromosome location 22:32663679, genes LOC105373002, SYN3
Age-related macular degeneration (Age-related macular degeneration, AMD)
AMD is an aging-related structural change of the macular region. It manifests mainly as a decline in the ability of retinal pigment epithelium (RPE) cells to phagocytose and digest the outer segment disc membranes of photoreceptors, so that incompletely digested disc membrane residual bodies are retained in the basal cytoplasm and discharged out of the cell, where they deposit on Bruch's membrane and form drusen. Macular degeneration then arises from the various secondary pathological changes, or Bruch's membrane ruptures and choroidal capillaries grow through the ruptured membrane beneath the RPE and the neurosensory retina, forming choroidal neovascularization. Because the walls of the new vessels are structurally abnormal, the vessels leak and bleed, which in turn triggers a series of secondary pathological changes. Senile macular degeneration mostly occurs in people aged 45 and above; its prevalence increases with age, and it is currently an important blinding disease of the elderly.
Single nucleotide polymorphism (Single nucleotide polymorphism, SNP)
It mainly refers to DNA sequence polymorphism caused by variation of a single nucleotide at the genome level. SNPs are widespread in the human genome, occurring on average once every [...] base pairs, with an estimated total of 3 million or more. A SNP is a biallelic marker caused by the transition or transversion of a single base, or by the insertion or deletion of a base. SNPs may lie within gene sequences or in non-coding sequences outside genes.
Xgboost
A supervised ensemble learning model of the boosting type, built by combining multiple associated CART trees. CART is a binary decision tree. At each branch, every threshold of every feature column is enumerated, and the feature column and threshold that give the largest reduction in impurity according to the Gini coefficient are selected; the samples are then divided into two branches, feature ≤ threshold and feature > threshold, each containing the samples that meet its condition. Branching continues in the same way until all samples under a branch belong to the same class or a preset termination condition is reached; if the classes in a final leaf node are not unique, the class with the most samples is taken as the class of that leaf node. Xgboost can be expressed by the following formula:
\[ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \]
where \(\hat{y}_i\) is the predicted value, \(F\) denotes the set of all possible CART trees, and \(f_k\) denotes a specific CART tree.
The objective function of the model is:
\[ \mathrm{Obj}(\theta) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \]
where \(l(y_i, \hat{y}_i)\) is the loss function and \(\sum_k \Omega(f_k)\) is the regularization term. The point at which \(\mathrm{Obj}(\theta)\) is minimized is the predicted value of the node, and the minimum function value is the minimum loss. Xgboost uses additive training and optimizes the objective function step by step: it first optimizes the first tree, then the second, and so on until all K trees have been optimized.
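The CART branching rule described above (enumerate thresholds, pick the one that most reduces Gini impurity) can be illustrated for a single feature column. This is a minimal sketch following the description in the text; the function names and the toy data are assumptions, and the XGBoost library itself scores splits with a gain derived from gradient statistics of the objective rather than with raw Gini impurity.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    Returns (None, parent impurity) if no split lowers the impurity.
    """
    n = len(values)
    best_threshold: Optional[float] = None
    best_impurity = gini(labels)
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Toy data (an assumption): ages and disease labels for five hypothetical subjects.
ages = [50, 55, 62, 70, 75]
labels = [0, 0, 0, 1, 1]
print(best_split(ages, labels))
```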
ROC-AUC
A method of evaluating model accuracy. The ROC curve is the receiver operating characteristic curve, a plot with the false positive rate on the horizontal axis and the true positive rate on the vertical axis; it is a comprehensive index reflecting sensitivity and specificity as continuous variables. AUC is the area under the ROC curve. The ROC-AUC value lies between 0.5 and 1.0; the closer it is to 1, the better the diagnostic performance. Accuracy is low at 0.5-0.7, moderate at 0.7-0.9, and high above 0.9. AUC = 0.5 means the diagnostic method does not work at all and has no diagnostic value. AUC < 0.5 does not correspond to real situations and rarely occurs in practice.
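As a small illustration of the AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This equals the area under the ROC curve. The function name and the toy labels and scores are assumptions.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy predictions (an assumption): 1 = patient, 0 = healthy control.
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2])
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; library implementations sort the scores instead.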
Main advantages of the present invention include:
1) For the first time in the clinical field, the invention predicts an AMD risk value from site information and clinical data, and is suitable for high-throughput sample detection;
2) The invention predicts the risk of developing AMD at a future age and can indicate the effect of changes such as lifestyle on the risk value, serving as a preventive warning for AMD.
The present invention is further described below with reference to specific examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.
Example 1.
The 7 AMD-related sites needed by the algorithm model and device are screened out of 108 candidate SNP sites by statistical analysis.
A case group and a control group were recruited for SNP statistics and clinical information analysis; after extensive screening, 108 SNP sites were identified (see Table 1). SNP genotyping data were obtained by the following steps:
1. Sample collection: one of the two collection methods below is used.
A) Blood sample collection: 2-4 mL of whole blood is collected in an EDTA anticoagulant tube.
B) Buccal swab collection: a nylon flocked buccal swab is scraped across the subject's oral cavity and the mucosa on both sides until the flocked tip is fully wet; the collected swab is then placed in a test tube containing sample preservation solution (1-2 mL) for storage.
2. Sample transport: ice packs are added to the foam box holding the samples for low-temperature transport.
3. DNA fragment amplification and single-base extension are performed with a 7500 fluorescence quantitative PCR instrument. First, the dye MIX is prepared; several extra wells should be prepared, and the mix is stored at -20 °C after preparation. Second, the dye-method and probe-method mixtures should be labelled on the centrifuge tube wall to avoid confusing the two. Then the reagents are added in order: MIXTURE (17 μL), primer 1 (1 μL) and sample (2 μL). Finally, the plate is sealed and loaded on the instrument.
4. SNP genotyping is performed with the MassARRAY Analyzer 4 system; the operating steps are shown in Fig. 2.
5. AMD-related SNP sites are obtained by genome-wide association analysis (GWAS). The association analysis includes the following hypotheses:
1) Genotypic model: assuming A is the minor allele and a is the major allele, the 3 genotypes have different effects.
2) Dominant model: AA/Aa and aa genotypes have different effects.
3) Recessive model: AA and Aa/aa genotypes have different effects.
4) Allelic model: A and a have different effects.
Based on the above hypotheses, the chi-square value is calculated:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where O is the observed frequency and E is the expected frequency. Taking hypothesis (2) as an example: first, for the AA or Aa genotype (either one) in normal subjects, the squared difference between the observed and expected frequencies is divided by the expected frequency, giving the value V1; second, the value V2 for AA or Aa in patients is calculated in the same way; third, the values V3 for aa in normal subjects and V4 for aa in patients are obtained by the same method. The chi-square value is then V1+V2+V3+V4. The p value of the association is obtained from the chi-square value, and 14 related sites are obtained by screening for p < 0.05.
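The dominant-model calculation (V1+V2+V3+V4) can be sketched for a 2×2 table of genotype counts. The function name and the counts are hypothetical; expected counts are taken as row total × column total / grand total, and the p value uses the closed form for a chi-square distribution with one degree of freedom, p = erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-square test for a 2x2 table under the dominant model.

    a, b: AA-or-Aa carriers among controls and among cases
    c, d: aa subjects among controls and among cases
    """
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected  # one of V1..V4
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # survival function, 1 degree of freedom
    return chi2, p_value


# Hypothetical counts (an assumption): 100 controls and 100 cases.
chi2, p = chi_square_2x2(30, 50, 70, 50)
print(round(chi2, 4), p < 0.05)
```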
Some of these 14 related sites are collinear on the chromosome, and 7 highly collinear sites are excluded by the following algorithm: a sliding window of 50 SNP sites is moved 5 SNPs at a time; within the window, the multiple correlation coefficient R² between one site and the other sites is calculated, and the VIF index is computed as 1/(1−R²); if the index is greater than 2, the SNP site is excluded. The sites rs551397, rs800292, rs10737680, rs3753396, rs1410996, rs2284664 and rs1065489 are excluded, finally leaving the sites rs2284664, rs2071277, rs1999930, rs10490924, rs2338104, rs754203 and rs5749482.
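The window-based exclusion can be sketched in simplified form. The text computes a multiple correlation of each site against the others in the window; the sketch below is an assumption-laden simplification that uses the pairwise squared Pearson correlation instead and drops the later site of any pair whose VIF = 1/(1−R²) exceeds the cut-off. The function names, SNP ids and genotype vectors are hypothetical.

```python
from typing import Dict, List


def pearson_r2(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation of two equally long genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy * sxy / (sxx * syy)


def prune_by_vif(genotypes: Dict[str, List[float]],
                 window: int = 50, step: int = 5, vif_max: float = 2.0) -> List[str]:
    """Slide a window over the SNPs (in chromosomal order) and drop collinear sites."""
    ids = list(genotypes)
    removed = set()
    for start in range(0, len(ids), step):
        in_window = [s for s in ids[start:start + window] if s not in removed]
        for i, first in enumerate(in_window):
            if first in removed:
                continue
            for second in in_window[i + 1:]:
                if second in removed:
                    continue
                r2 = pearson_r2(genotypes[first], genotypes[second])
                vif = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
                if vif > vif_max:
                    removed.add(second)
    return [s for s in ids if s not in removed]


# Toy minor-allele counts per sample (an assumption); snpB duplicates snpA.
toy = {
    "snpA": [0, 1, 2, 1, 0, 2],
    "snpB": [0, 1, 2, 1, 0, 2],
    "snpC": [1, 1, 0, 2, 1, 1],
}
print(prune_by_vif(toy))
```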
After the above process, the required sites are obtained; their information is shown in Table 2.
6. The SNP genotypes are cleaned into corresponding numerical values, namely the OR values (odds ratios) of the 7 AMD-related sites screened above. The odds of an event are the ratio of the probability that it occurs to the probability that it does not occur; the OR is the ratio of the odds in cases to the odds in controls. The formula is as follows:
OR = (nA/na)/(mA/ma) = (nA × ma)/(mA × na)
Assume A is the minor allele; nA is the number of A alleles in cases, na is the number of non-A alleles in cases, mA is the number of A alleles in controls, and ma is the number of non-A alleles in controls. The OR is interpreted as follows:
A) When OR > 1, the frequency of A in the case group is higher than in the control group, i.e. A confers a higher risk of onset.
B) When OR < 1, the frequency of A in the case group is lower than in the control group, i.e. A has a protective effect.
C) The more closely the disease is associated with allele A, the larger the odds ratio.
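The OR formula can be checked with a short sketch. The function names and the allele counts are hypothetical; the frequency-based variant divides numerator and denominator of the same formula by the group sizes, and the frequencies used are the F_A and F_U values listed for rs2338104 in Table 2.

```python
def odds_ratio(n_A: int, n_a: int, m_A: int, m_a: int) -> float:
    """OR = (nA/na) / (mA/ma), from allele counts in cases (n) and controls (m)."""
    return (n_A * m_a) / (m_A * n_a)


def odds_ratio_from_freq(freq_cases: float, freq_controls: float) -> float:
    """The same ratio computed from minor-allele frequencies."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Hypothetical allele counts (an assumption).
print(odds_ratio(20, 80, 10, 90))

# rs2338104: F_A = 0.4206 in cases, F_U = 0.2905 in controls (Table 2).
or_rs2338104 = odds_ratio_from_freq(0.4206, 0.2905)
print(round(or_rs2338104, 3))
```

The value computed for rs2338104 agrees with the OR column of Table 2 to rounding, and being above 1 it marks G as the risk allele at that site.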
Table 1. Initially selected SNP sites
(unified dbSNP numbering of the US National Center for Biotechnology Information (NCBI))
Table 2. AMD-related SNP site information obtained by genome-wide association analysis (GWAS)
CHR  SNP         A1  F_A      F_U       A2  CHISQ  P         OR      SE      L95     U95
1    rs2284664   T   0.2687   0.3762    C   4.25   0.03924   0.6091  0.2414  0.3795  0.9777
6    rs2071277   C   0.3672   0.4471    T   7.591  0.022470  0.7175  0.2304  0.4568  1.127
6    rs1999930   T   0.03676  0.004587  C   5.204  0.02253   8.282   1.101   0.9571  71.67
10   rs10490924  T   0.5397   0.4231    G   4.286  0.03842   1.599   0.2273  1.024   2.496
12   rs2338104   G   0.4206   0.2905    C   5.951  0.01471   1.773   0.2359  1.117   2.816
14   rs754203    G   0.2868   0.3773    A   7.925  0.019020  0.6636  0.2352  0.4186  1.052
22   rs5749482   G   0.2353   0.3636    C   6.42   0.01128   0.5385  0.246   0.3325  0.872
First row CHR is the chromosome information in site, and second is classified as the number of SNP site, and it is time equipotential that third, which arranges (A1), Genotype, the 4th column F_A are the frequency observed of A1 genotype disease, and the 5th to be classified as F_U be A1 allele in Healthy People The frequency observed, the 6th is classified as the i.e. main allele (A2) of another allelotype, and the 7th column CHISQ is chi-square value, the Eight column P are the P value that chi-square value converts, and the 9th column OR is then OR value-at-risk, and being left ten, 11, ten second is the mark of OR value Quasi- mistake and the thereon upper value of 95% confidence interval and lower value.
Genotypes are subsequently replaced by values based on the OR of the minor allele. For example, assuming that A is the minor allele and a is the major allele, a genotype containing one minor allele (Aa) is replaced with the OR value, a genotype containing two minor alleles (AA) is replaced with the square of the OR value, and a genotype without the minor allele (aa) is replaced with 1.
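The replacement rule amounts to raising the OR to the number of minor-allele copies in the genotype. A minimal sketch follows. The function name and the two-letter genotype strings are assumptions made for illustration, and the example uses the rs2338104 row of Table 2 (minor allele G, OR 1.773).

```python
def encode_genotype(genotype, minor_allele, odds_ratio):
    """Replace a two-allele genotype with OR ** (number of minor-allele copies)."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'GC'")
    copies = genotype.count(minor_allele)  # 0, 1 or 2
    return odds_ratio ** copies


# rs2338104 in Table 2: minor allele G, major allele C, OR = 1.773.
for genotype in ("CC", "GC", "GG"):
    print(genotype, encode_genotype(genotype, "G", 1.773))
```

Under this encoding a protective minor allele (OR < 1) yields values below 1, so risk and protective sites share one numeric scale.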
Embodiment 2.
Thirteen items of clinical survey data were compiled from the questionnaires completed by the subjects: age, body mass index (BMI), hypertension status, hyperlipidemia status, diabetes status, kidney injury status, whether the subject is often outdoors, whether the subject is vegetarian, whether the subject has ever smoked, whether the subject has ever drunk alcohol, atherosclerosis status, eye surgery status, and sex.
Embodiment 3.
Machine learning algorithms can be divided into three classes: supervised learning, unsupervised learning, and semi-supervised learning. Supervised learning uses the correspondence between a portion of the input data and the output data to generate a function that maps inputs to suitable outputs, for example in classification. The sample data of the present invention were all clinically diagnosed and carry class labels, so the exploratory selection is carried out among supervised machine learning classification models. Three kinds of input data are used: data containing only the SNP site information of all samples (SNP), data containing only the clinical information of all samples (CC), and integrated data combining SNP site and clinical information (SNP+CC). The diagnostic result of each sample is used as the output class label.
The algorithm is built according to the following steps:
A) All data are randomly divided into a 75% training set and a 25% test set.
B) Machine learning classifiers are constructed. With SNP+CC as input data, logistic regression, random forest, AdaBoost, and XGBoost are tried in turn.
C) Parameters are tuned by cross-validation, and the best-scoring parameters are chosen.
D) The results are verified with the test set.
E) Model evaluation. The above steps are repeated 1000 times, and the average area under the receiver operating characteristic curve (ROC-AUC) on the test set is calculated. XGBoost, which has the highest ROC-AUC score, is chosen as the best model (see Fig. 3).
F) Feature variables are screened. Clinical information (CC), site information (SNP), and combined clinical and site information (SNP+CC) are each used as input data and classified by XGBoost, repeated 1000 times. The average test-set ROC curves are shown in Fig. 4, where SNP+CC has the highest ROC-AUC.
G) Feature selection is further optimized. The XGBoost model yields a feature importance score for each variable (the top 10 are shown in Fig. 5). To optimize the model further, the variables are sorted by score from largest to smallest and added one at a time to train and test the model, giving the relationship between the number of variables and the ROC-AUC score (see Fig. 6). The results show that training and testing the model on the 4 most important variables (age, rs754203, rs2338104, diabetes) gives the highest test ROC-AUC score.
H) With XGBoost as the machine learning model and age, rs754203, rs2338104, and diabetes as input variables, the average ROC-AUC over 1000 repetitions is 0.800 ± 0.06.
I) The model is stored for AMD risk prediction on subsequently measured data.
J) Risk value output: the trained algorithm model predicts, for the input test data, a probability between 0 (control) and 1 (AMD). The probability of class 1 (having the disease) is taken as the risk value, and a risk value above 0.5 is judged as having AMD.
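The evaluation protocol in steps A, D, E, and J can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the patent's implementation. `fit` is a hypothetical hook that trains a classifier and returns a function giving the predicted probability of AMD. `fit_toy_model` is a made-up stand-in for the XGBoost model, and the data are synthetic. Cross-validated parameter tuning (step C) and feature ranking (step G) are left out.

```python
import math
import random
import statistics


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("ROC-AUC needs at least one case and one control")
    correct = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return correct / (len(cases) * len(controls))


def split_75_25(n_samples, rng):
    """Step A: random 75% / 25% split of the sample indices."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(0.75 * n_samples)
    return indices[:cut], indices[cut:]


def repeated_evaluation(X, y, fit, repeats=1000, seed=0):
    """Steps A, D, E: repeat split, training and testing; return mean and SD of test ROC-AUC."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        train, test = split_75_25(len(y), rng)
        predict_risk = fit([X[i] for i in train], [y[i] for i in train])
        test_labels = [y[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUC is undefined when the test set holds a single class
        aucs.append(roc_auc(test_labels, [predict_risk(X[i]) for i in test]))
    return statistics.mean(aucs), statistics.stdev(aucs)


def classify(risk_value, threshold=0.5):
    """Step J: a risk value above the threshold is judged as AMD (1), otherwise control (0)."""
    return 1 if risk_value > threshold else 0


def fit_toy_model(X_train, y_train):
    """Hypothetical stand-in for the classifier: a logistic curve on the first feature."""
    case_vals = [x[0] for x, y in zip(X_train, y_train) if y == 1]
    control_vals = [x[0] for x, y in zip(X_train, y_train) if y == 0]
    case_mean = sum(case_vals) / len(case_vals)
    control_mean = sum(control_vals) / len(control_vals)
    midpoint = (case_mean + control_mean) / 2
    direction = 1.0 if case_mean >= control_mean else -1.0
    return lambda x: 1.0 / (1.0 + math.exp(-direction * (x[0] - midpoint)))


# Synthetic, perfectly separable data used only to exercise the functions.
X_demo = [[float(i)] for i in range(40)]
y_demo = [0] * 20 + [1] * 20
mean_auc, sd_auc = repeated_evaluation(X_demo, y_demo, fit_toy_model, repeats=50)
print(f"ROC-AUC: {mean_auc:.3f} +/- {sd_auc:.3f}")
```

In practice `fit` would wrap the chosen classifier, trained on the selected input variables, and the mean and standard deviation returned by `repeated_evaluation` correspond to the kind of summary reported in step H.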
All documents mentioned in the present invention are incorporated herein by reference, as if each document were individually incorporated by reference. In addition, it should be understood that, after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
Sequence table
<110> Shanghai Biotecan Biology Medicine Technology Co ltd
Shanghai Biotecan Medical Diagnostics Co ltd
<120> Risk prediction algorithm model and device for age-related macular degeneration
<130> P2018-2112
<160> 7
<170> SIPOSequenceListing 1.0
<210> 1
<211> 52
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 1
tgaaaaagtt ctaaaattag atagtcggtt atggcctcac aacttgtgaa ta 52
<210> 2
<211> 52
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 2
gtgctgtcct ggggcccagg agcccctggg ggcaaggctc tgccctgttg ct 52
<210> 3
<211> 53
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 3
agaaaaatac cagtctccat agatcagtta aagcaaatag atggtcttaa aat 53
<210> 4
<211> 52
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 4
ggcagtgact gatgcagtgt gtgacagtct aatctccccc ataattacag gc 52
<210> 5
<211> 54
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 5
ataggacaga ttctagattt tccttacgtt gatacagaga aatataagac ataa 54
<210> 6
<211> 52
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 6
tttatcacac tccatgatcc cagctgtcta aaatccacac tgagctctgc tt 52
<210> 7
<211> 52
<212> DNA
<213>artificial sequence (Artificial Sequence)
<400> 7
tgggaactga ctaatacagc atgtacggaa ctatgaaata tgaattgtgt aa 52

Claims (9)

1. A biomarker set, characterized in that the set comprises two biomarkers selected from the following group: rs2338104, rs754203, or a combination thereof.
2. The biomarker set according to claim 1, characterized in that the set further comprises a biomarker selected from the following group: rs2284664, rs2071277, rs1999930, rs10490924, rs5749482, or a combination thereof.
3. The biomarker set according to claim 1, characterized in that the set further comprises a biomarker: rs551397, rs800292, rs10737680, rs3753396, rs1410996, rs2284664, rs1065489, or a combination thereof.
4. A reagent combination for the assessment or diagnosis of age-related macular degeneration risk, characterized in that the reagent combination comprises reagents for detecting each biomarker in the set according to claim 1.
5. A kit, characterized in that the kit comprises the set according to claim 1 and/or the reagent combination according to claim 4.
6. Use of a biomarker set, characterized in that the set is used to prepare a kit, the kit being used for the assessment or diagnosis of age-related macular degeneration risk, wherein the biomarker set comprises two biomarkers selected from the following group: rs2338104, rs754203, or a combination thereof.
7. A method for screening candidate compounds for assessing or diagnosing age-related macular degeneration risk, characterized in that it comprises the steps of:
(1) in a test group, administering a test compound to the subject to be tested, and detecting the level V1 of each biomarker in the set in a sample derived from the subject in the test group; in a control group, administering a blank control to the subject to be tested, and detecting the level V2 of each biomarker in the set in a sample derived from the subject in the control group;
(2) comparing the level V1 and the level V2 detected in the previous step, thereby determining whether the test compound is a candidate compound for treating age-related macular degeneration, wherein the set comprises two or more biomarkers selected from the following group: rs2338104, rs1999930, rs10490924.
8. Use of a biomarker set, characterized in that the set is used for screening candidate compounds for assessing or diagnosing age-related macular degeneration risk and/or for evaluating the therapeutic effect of a candidate compound on age-related macular degeneration, wherein the biomarker set comprises two biomarkers selected from the following group: rs2338104, rs754203, or a combination thereof.
9. An early-stage auxiliary screening system for age-related macular degeneration, characterized in that the system comprises:
(a) an age-related macular degeneration related disease feature input module, the input module being used to input the age-related macular degeneration related disease features of a subject;
wherein the age-related macular degeneration related disease features comprise site information for two or more selected from group A: rs2284664, rs2071277, rs1999930, rs10490924, rs2338104, rs754203, rs5749482, or a combination thereof;
(b) an age-related macular degeneration related disease discrimination processing module, the processing module scoring the input age-related macular degeneration related disease features according to a predetermined judgment criterion to obtain a risk score, and comparing the risk score with a risk threshold for age-related macular degeneration related disease to obtain an auxiliary screening result, wherein when the risk score is higher than the risk threshold, the subject is indicated to have a higher risk of age-related macular degeneration related disease than the normal population; and
(c) an auxiliary screening result output module, the output module being used to output the auxiliary screening result.
CN201910101067.7A 2019-01-31 2019-01-31 Risk prediction algorithm model and device for age-related macular degeneration Active CN109585017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910101067.7A CN109585017B (en) 2019-01-31 2019-01-31 Risk prediction algorithm model and device for age-related macular degeneration

Publications (2)

Publication Number Publication Date
CN109585017A true CN109585017A (en) 2019-04-05
CN109585017B CN109585017B (en) 2023-12-12

Family

ID=65918525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910101067.7A Active CN109585017B (en) 2019-01-31 2019-01-31 Risk prediction algorithm model and device for age-related macular degeneration

Country Status (1)

Country Link
CN (1) CN109585017B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101173314A (en) * 2006-10-18 2008-05-07 四川省医学科学院(四川省人民医院) Reagent kit for detecting senility macular degeneration disease
CN101501194A (en) * 2006-06-13 2009-08-05 英国贝尔法斯特女王大学 Protection against and treatment of age related macular degeneration
CN101550451A (en) * 2008-03-04 2009-10-07 四川省医学科学院(四川省人民医院) Reagent kit for detecting agedness yellow spot degenerative disease
CN101748189A (en) * 2008-12-22 2010-06-23 上海基康生物技术有限公司 Senile dementia related locus detection method
CN101857899A (en) * 2009-04-03 2010-10-13 四川省医学科学院(四川省人民医院) Kit for detecting senile macular degeneration disease
CN103201393A (en) * 2010-11-01 2013-07-10 霍夫曼-拉罗奇有限公司 Predicting progression to advanced age-related macular degeneration using a polygenic score
CN203307338U (en) * 2012-09-25 2013-11-27 浙江爱易生物医学科技有限公司 Macular degeneration related gene locus detection kit
CN104334173A (en) * 2012-05-01 2015-02-04 特兰斯拉图姆医学公司 Methods for treating and diagnosing blinding eye diseases
US20170091425A1 (en) * 2010-07-19 2017-03-30 Pathway Genomics Corporation Genetic based health management systems for weight and nutrition control
CN107974500A (en) * 2018-01-22 2018-05-01 常熟市第二人民医院 LncRNAGAS5 is as the application in age related macular degeneration diagnosis marker

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
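Zhang Min (张敏): "Effect of KCTD10 on glioma invasion and migration", China Master's Theses Full-text Database (Medicine and Health Sciences) *
Zhang Min (张敏): "Effect of KCTD10 on glioma invasion and migration", China Master's Theses Full-text Database (Medicine and Health Sciences), 28 February 2017 (2017-02-28), pages 2 *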

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110042156A (en) * 2019-04-22 2019-07-23 苏州亿康医学检验有限公司 A kind of method and its application judging endometrium receptivity
CN110042156B (en) * 2019-04-22 2021-12-28 苏州亿康医学检验有限公司 Method for judging endometrial receptivity and application thereof
CN113906296A (en) * 2019-04-23 2022-01-07 中国医学科学院北京协和医院 Method and apparatus for diagnosing autism spectrum disorder using metabolite as marker based on machine learning
CN111471753A (en) * 2020-04-22 2020-07-31 优生贝(北京)生物技术有限公司 Female fertility genetic risk gene detection method based on risk assessment model
CN114283883A (en) * 2021-12-27 2022-04-05 河北北方学院附属第一医院 Liver cancer tumor screening model based on molecular marker and application
CN114283883B (en) * 2021-12-27 2022-11-22 上海华测艾普医学检验所有限公司 System for screening and risk prediction of liver cancer based on molecular marker and application
CN114373547A (en) * 2022-01-11 2022-04-19 平安科技(深圳)有限公司 Method and system for predicting disease risk
CN114373547B (en) * 2022-01-11 2024-10-25 平安科技(深圳)有限公司 Disease risk prediction method and system
CN116179682A (en) * 2022-12-29 2023-05-30 温州谱希基因科技有限公司 Kit for detecting age-related macular degeneration and application thereof
CN116179682B (en) * 2022-12-29 2024-02-06 温州谱希基因科技有限公司 Kit for detecting age-related macular degeneration and application thereof

Also Published As

Publication number Publication date
CN109585017B (en) 2023-12-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant