WO2018049946A1

WO2018049946A1 - Biomarker composition for detection of adenomyosis and application thereof

Info

Publication number: WO2018049946A1
Application number: PCT/CN2017/096248
Authority: WO
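Inventors: 贾慧珏 (Jia Huijue); 钟焕姿 (Zhong Huanzi); 宋晓蕾 (Song Xiaolei); 王子榕 (Wang Zirong); 陈晨 (Chen Chen)
Original assignee: 深圳华大基因研究院 (Shenzhen Huada Gene Research Institute)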
Priority date: 2016-09-19
Filing date: 2017-08-07
Publication date: 2018-03-22
Also published as: CN107858415A; CN109689890A; CN107858415B; CN109689890B

Abstract

Provided are a biomarker composition for the detection or risk assessment of adenomyosis, and applications thereof. The biomarker composition comprises at least one of 44 nucleic acids, which have the sequences represented by Seq ID No. 1 to Seq ID No. 44, respectively, or sequences having 97% or more similarity to those sequences.

Description

Biomarker combination for adenomyosis detection and its application

Technical field

The present application relates to the field of biomarkers, and in particular to a biomarker combination for the detection or risk assessment of adenomyosis, and its use.

Background art

Adenomyosis is a condition in which the endometrium and its glands invade the myometrium. Normally the endometrium lines the inner surface of the myometrium, with a clear boundary between the two. When the endometrium and the superficial muscle layer are damaged, for example by childbirth, repeated abortions, or curettage, endometrial tissue can take advantage of the damage and grow into the myometrium, stimulating proliferation of the surrounding muscle cells and forming adenomyosis. Like normal endometrium, the endometrium within the myometrium undergoes periodic congestion, edema, and even hemorrhage over the menstrual cycle, causing intense uterine contractions and severe abdominal pain. The patient's uterus becomes uniformly enlarged and firm, with menorrhagia and prolonged menstruation; severe cases can lead to anemia.

At present, there are three main treatments for adenomyosis: (1) hysterectomy; (2) conservative surgery; and (3) traditional Chinese medicine. Each has its advantages and disadvantages. Adenomyosis used to occur mostly in women over 40, but in recent years the age of onset has gradually decreased, which may be related to the increasing number of cesarean sections, induced abortions, and other procedures.

The clinical diagnosis of adenomyosis depends mainly on symptoms, pelvic examination, and ultrasound. Ultrasound shows an enlarged uterus with a thickened uterine wall, especially the posterior wall; a wall thickness exceeding 2.5 cm is almost certainly abnormal. A discrete mass may be either a fibroid or an adenomyoma. The two can also be distinguished by ultrasound, because an adenomyoma has no surrounding capsule whereas a fibroid does, and the echo of an adenomyoma is stronger than that of a fibroid. In addition, the tumor marker CA125 can assist diagnosis. However, none of these methods allows early detection or risk assessment of adenomyosis.

Therefore, finding sensitive and specific biomarkers for adenomyosis is an urgent problem.

Summary of the invention

The purpose of the present application is to provide a biomarker combination for the detection or risk assessment of adenomyosis, and its use in adenomyosis test kits, detection tools, or drug screening.

In order to achieve the above objectives, the present application adopts the following technical solutions:

One aspect of the present application discloses a biomarker combination for the detection or risk assessment of adenomyosis, the biomarker combination comprising at least one of forty-four nucleic acids, which have the sequences shown in Seq ID No. 1 to Seq ID No. 44, respectively, or sequences having 97% or more similarity to the sequences shown in Seq ID No. 1 to Seq ID No. 44, respectively.

It should be noted that research has shown each of the forty-four nucleic acids of the present application to be associated with adenomyosis. Therefore, where the required accuracy of the judgment is low, they may be used individually or in any combination for adenomyosis detection or risk assessment. However, in a preferred embodiment of the present application, the forty-four nucleic acids are not only used together but are also divided according to a specific rule into several marker groups, each of which is used for adenomyosis detection or risk assessment; this is described in detail in the preferred technical solutions below.

It should also be noted that the forty-four nucleic acids of the present application were obtained by clustering sequences at 97% or greater similarity into operational taxonomic units (OTUs) and selecting the most representative sequence of each OTU as its seed sequence; the forty-four seed sequences associated with adenomyosis constitute the biomarker combination of the present application. Therefore, in the biomarker combination of the present application, the forty-four nucleic acids are not limited to the sequences shown in Seq ID No. 1 to Seq ID No. 44, but may also be sequences having 97% or more similarity to them.
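Whether a detected sequence falls within the "97% or more similarity" definition can be sketched as a simple identity check against the seed sequences. This is a minimal illustration, assuming the query and marker sequences are already aligned to the same length; real tools compute identity from a pairwise alignment, and the marker IDs and toy sequences used here are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned sequences.

    Simplification: both inputs must already be aligned to the same length;
    columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # gap-gap columns carry no information
        columns += 1
        if x == y:
            matches += 1
    return matches / columns if columns else 0.0


def matching_markers(query: str, markers: dict[str, str], threshold: float = 0.97) -> list[str]:
    """IDs of marker sequences that the query matches at >= threshold identity.

    Markers whose aligned length differs from the query are skipped.
    """
    return [
        marker_id
        for marker_id, seq in markers.items()
        if len(seq) == len(query) and percent_identity(query, seq) >= threshold
    ]
```

With a 100-base toy seed, a query differing at 2 positions (98% identity) matches, while one differing at 5 positions (95%) does not.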

It should be added that the biomarker combination of the present application is not used to detect adenomyosis or assess its risk directly from the presence or absence of the biomarkers. Instead, after the biomarker combination is measured, a random forest model is applied, and whether the subject has adenomyosis, or the subject's risk of adenomyosis, is judged from the probability output by the model. This is explained in detail in the technical solutions below.

Preferably, another aspect of the present application discloses a biomarker combination for the detection or risk assessment of adenomyosis, the biomarker combination comprising at least one of a first marker group, a second marker group, and a third marker group, wherein: the first marker group consists of eighteen nucleic acids having the sequences shown in Seq ID No. 1 to Seq ID No. 18, respectively, or sequences having 97% or more similarity to them; the second marker group consists of twenty-two nucleic acids having the sequences shown in Seq ID No. 1, Seq ID No. 4, Seq ID No. 5, Seq ID No. 7, Seq ID No. 10, Seq ID No. 11, Seq ID No. 13, Seq ID No. 15, and Seq ID No. 18 to Seq ID No. 31, respectively, or sequences having 97% or more similarity to them; and the third marker group consists of eighteen nucleic acids having the sequences shown in Seq ID No. 1, Seq ID No. 2, Seq ID No. 13, Seq ID No. 19, Seq ID No. 28, and Seq ID No. 32 to Seq ID No. 44, respectively, or sequences having 97% or more similarity to them.

It should be noted that, in a preferred embodiment of the present application, the forty-four nucleic acids are further divided into three marker groups, namely the first, second, and third marker groups; combining the judgments of the three marker groups can greatly improve the accuracy with which the biomarker combination of the present application detects adenomyosis or assesses its risk.

Preferably, the first marker group is the CL marker group, used for adenomyosis detection or risk assessment with samples from the lower third of the vagina.

Preferably, the second marker group is the CU marker group, used for adenomyosis detection or risk assessment with samples from the posterior vaginal fornix.

Preferably, the third marker group is the CV marker group, used for adenomyosis detection or risk assessment with samples from the cervical canal.
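As a cross-check on the group definitions, the three marker groups and their sampling sites can be written as Python sets of Seq ID numbers. This is only an illustrative encoding; the names `MARKER_GROUPS` and `group_for_site` are hypothetical and not part of the application.

```python
# Marker groups as sets of Seq ID numbers, keyed by sampling site.
FIRST_CL = set(range(1, 19))                                    # Seq ID No. 1-18
SECOND_CU = {1, 4, 5, 7, 10, 11, 13, 15} | set(range(18, 32))  # ... plus No. 18-31
THIRD_CV = {1, 2, 13, 19, 28} | set(range(32, 45))             # ... plus No. 32-44

MARKER_GROUPS = {
    "CL": FIRST_CL,   # lower third of the vagina
    "CU": SECOND_CU,  # posterior vaginal fornix
    "CV": THIRD_CV,   # cervical canal
}


def group_for_site(site: str) -> set[int]:
    """Seq ID numbers to evaluate for a sample taken from the given site."""
    try:
        return MARKER_GROUPS[site]
    except KeyError:
        raise ValueError(f"unknown sampling site: {site!r}") from None
```

The encoding reproduces the stated group sizes (18, 22, and 18), the union of the three groups is exactly Seq ID No. 1 to 44, and `FIRST_CL & SECOND_CU & THIRD_CV` is `{1, 13}`, so only those two markers are shared by all three sites.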

It should be noted that the forty-four nucleic acids in the biomarker combination of the present application actually represent 28 microorganisms in the lower third of the vagina, the posterior vaginal fornix, and the cervical canal. The present application detected these forty-four nucleic acids of the 28 microorganisms at the three sites, statistically analyzed the relationship between their relative abundances and adenomyosis, and established random forest models to determine whether a subject has adenomyosis or is at risk of developing it. The three marker groups therefore correspond to the three sampling sites: samples from each site are analyzed and judged independently with the corresponding marker group, and a combined judgment based on the three results can improve the accuracy with which the biomarker combination of the present application detects adenomyosis or assesses its risk.

It should also be noted that the lower third of the vagina, the posterior vaginal fornix, and the cervical canal harbor far more than 28 microbial species, and the nucleic acid sequences of these 28 microorganisms are far more numerous than the forty-four described in the present application. However, the present application used the random forest model to screen out forty-four nucleic acids of the 28 microorganisms as biomarkers for adenomyosis detection, providing a new approach to the detection and assessment of adenomyosis.

It should be added that, of the three marker groups, the CL marker group is the marker group for samples from the lower third of the vagina (abbreviated CL); the CU marker group is the marker group for samples from the posterior vaginal fornix (abbreviated CU); and the CV marker group is the marker group for samples from the cervical canal (abbreviated CV).

Another aspect of the present application discloses a kit for the detection or risk assessment of adenomyosis, comprising a primer pair for detecting the biomarker combination of the present application, wherein the forward primer of the primer pair has the sequence shown in SEQ ID No. 45 and the reverse primer has the sequence shown in SEQ ID No. 46.

It should be noted that the biomarker combination of the present application may be included in the kit as a standard reference, and the primer pair is used directly for PCR amplification of the biomarker combination in the sample to be tested.

Yet another aspect of the present application discloses the use of the biomarker combination of the present application in screening drugs for adenomyosis, or in the preparation of a kit or detection tool for adenomyosis detection or risk assessment.

It can be understood that, because the biomarker combination of the present application was identified by studying adenomyosis, it can naturally be used for the detection or risk assessment of adenomyosis. The biomarker combination can also be incorporated into dedicated kits or tools for detecting adenomyosis to facilitate its detection and assessment; any such kit or tool that employs the biomarker combination of the present application falls within the scope of protection of the present application. Moreover, because the biomarker combination can detect adenomyosis or assess its risk, it can also be used for drug screening: comparing adenomyosis status or risk before and after medication shows whether the drug used is effective.

A further aspect of the present application discloses a method for detecting adenomyosis, comprising the following steps:

(1) collecting a sample from the subject to be tested, detecting the biomarker combination of the present application in the collected sample, and analyzing the level of each nucleic acid in the biomarker combination;

(2) comparing the level of each nucleic acid measured in step (1) with a reference data set or a reference value to obtain a detection result.

Preferably, the level of each nucleic acid is the relative abundance of each nucleic acid; the reference data set or reference value is the level of each nucleic acid in the biomarker combination derived from adenomyosis patients and non-adenomyosis controls.

More preferably, the reference data set or reference value in step (2) is at least one of Table 5, Table 6, and Table 7; comparing the level of each nucleic acid with the reference data set or reference value to obtain the detection result specifically includes calculating the probability of disease using a multivariate statistical model, preferably a random forest model.

More preferably, collecting a sample from the subject in step (1) includes collecting a sample from the lower third of the vagina, a sample from the posterior vaginal fornix, and a sample from the cervical canal.

It should be noted that the biomarker combination of the present application consists of nucleic acids that research has shown to be associated with adenomyosis. Therefore, by analyzing the level, that is, the relative abundance, of the corresponding biomarker combination in samples collected from different sites of the subject, it is possible to detect whether the subject has the disease or to judge the subject's risk of disease.

A further aspect of the present application discloses a method for determining adenomyosis by detecting a biomarker, for use in preparing a kit or tool for assessing adenomyosis or its risk, wherein the biomarker is the biomarker combination of the present application.

A method for determining adenomyosis by detecting a biomarker includes the following steps:

A further aspect of the present application discloses a method of screening for a candidate drug for treating adenomyosis, comprising the following steps:

1) separately detecting the biomarker combination of the present application in samples taken before and after administration, and analyzing the level of each nucleic acid in the biomarker combination;

2) evaluating the candidate drug by comparing the level of each nucleic acid in the samples taken before and after administration.

In step 2), comparing the level of each nucleic acid in the samples before and after administration specifically includes calculating the probability of disease using a multivariate statistical model, preferably a random forest model.
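The before/after comparison in step 2) can be sketched as the change in the model's predicted disease probability. This assumes the per-subject probabilities have already been produced by the random forest model; the averaging across subjects and the `min_drop` cutoff of 0.2 are hypothetical placeholders, since the text does not specify a decision rule.

```python
def probability_change(prob_before: float, prob_after: float) -> float:
    """Change in predicted disease probability after administration.

    Negative values mean the post-treatment profile looks less like the
    adenomyosis class to the model.
    """
    for p in (prob_before, prob_after):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return prob_after - prob_before


def mean_change(pairs: list[tuple[float, float]]) -> float:
    """Average change over several subjects, each given as (before, after)."""
    if not pairs:
        raise ValueError("no subjects")
    return sum(probability_change(b, a) for b, a in pairs) / len(pairs)


def looks_effective(pairs: list[tuple[float, float]], min_drop: float = 0.2) -> bool:
    """Hypothetical screening rule: flag the candidate if the average predicted
    probability drops by at least `min_drop` (0.2 is an arbitrary placeholder)."""
    return mean_change(pairs) <= -min_drop
```

A paired design like this compares each subject with herself, which avoids confounding the drug effect with differences in baseline microbiota between individuals.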

A further aspect of the present application discloses a method for detecting the microbiota in the female reproductive tract, comprising the following steps:

(1) collecting a microbial sample from the reproductive tract of the subject, detecting the biomarker combination of the present application in the collected sample, and analyzing the level of each nucleic acid in the biomarker combination;

Preferably, the reference data set or reference value in step (2) is at least one of Table 5, Table 6, and Table 7; comparing the level of each nucleic acid with the reference data set or reference value to obtain the detection result specifically includes calculating the probability of disease using a multivariate statistical model, more preferably a random forest model.

Preferably, in step (1), collecting the microbial sample from the reproductive tract of the subject specifically comprises collecting a sample from the lower third of the vagina, a sample from the posterior vaginal fornix, and a sample from the cervical canal. The microbial samples may be collected with conventional nylon flocked swabs, which is not specifically limited herein.

It should be noted that the biomarker combination of the present application is based on the relationship between microbial DNA in the female reproductive tract and adenomyosis; that is, the biomarkers are microbial OTUs in the female reproductive tract that reflect adenomyosis status. Therefore, the present application proposes a method for detecting the microbiota in the female reproductive tract, which provides a basis for judging and evaluating adenomyosis or its risk by examining the microbial population.

A further aspect of the present application discloses a method of preparing an adenomyosis biomarker combination, comprising the following steps:

(1) collecting microbial samples from the reproductive tracts of adenomyosis patients and non-patients, respectively, and performing 16S sequencing on all collected samples;

(2) clustering the 16S sequencing results to obtain OTUs and the seed sequence of each OTU, and calculating the relative abundance of each OTU;

(3) fitting the relative abundance of each OTU to adenomyosis status with a random forest model and performing five repetitions of 10-fold cross-validation to obtain the optimal OTU combination; the seed sequences of the OTUs in the optimal combination constitute the adenomyosis biomarker combination.

Preferably, in step (1), collecting microbial samples from the reproductive tract specifically comprises collecting a sample from the lower third of the vagina, a sample from the posterior vaginal fornix, and a sample from the cervical canal.

It should be noted that the key to the preparation method of the present application is using a random forest model to fit and validate the association between microbial DNA in the reproductive tract and adenomyosis, ultimately obtaining a biomarker combination capable of assessing adenomyosis or its risk. It will be understood that the preparation method of the present application, or its basic idea, is not limited to adenomyosis; it can also be used to prepare biomarker combinations for similar conditions associated with microbial DNA in the reproductive tract, for example endometriosis.

By adopting the above technical solutions, the present application achieves the following beneficial effects:

The biomarker combination for adenomyosis detection of the present application provides a new approach to the detection or risk assessment of adenomyosis. It can be used for early diagnosis, avoiding the delays in diagnosis or treatment that arise from relying on conventional examinations such as symptoms, pelvic examination, or ultrasound. Other key advantages of this application include:

(a) The biomarkers of the present application offer high sensitivity and high specificity for the detection or risk assessment of adenomyosis, and have important application value.

(b) Using reproductive tract samples for biomarker detection has the advantages of convenient sampling, simple operation, and the possibility of continuous in vitro testing.

(c) The biomarkers of the present application give reproducible results in the detection or risk assessment of adenomyosis.

DRAWINGS

Figure 1 shows the results of identifying adenomyosis with the marker group for the lower third of the vagina (CL) in an embodiment of the present application. In the figure, (a) shows the distribution of error rates from five repetitions of 10-fold cross-validation of random forest classification of adenomyosis as the number of OTUs increases; (b) shows the combined cross-validated receiver operating characteristic (ROC) curve, with an area under the curve (AUC) of 0.8668; the shaded area represents the 95% confidence interval, and the diagonal represents a curve with an AUC of 0.5;

Figure 2 shows the results of identifying adenomyosis with the marker group for the posterior vaginal fornix (CU) in an embodiment of the present application. In the figure, (a) shows the distribution of error rates from five repetitions of 10-fold cross-validation of random forest classification of adenomyosis as the number of OTUs increases; (b) shows the combined cross-validated ROC curve, with an AUC of 0.8404; the shaded area represents the 95% confidence interval, and the diagonal represents a curve with an AUC of 0.5;

Figure 3 shows the results of identifying adenomyosis with the marker group for the cervical canal (CV) in an embodiment of the present application. In the figure, (a) shows the distribution of error rates from 10-fold cross-validation of random forest classification of adenomyosis as the number of OTUs increases; (b) shows the combined cross-validated ROC curve, with an AUC of 0.8369; the shaded area represents the 95% confidence interval, and the diagonal represents a curve with an AUC of 0.5;

Figure 4 is the ROC curve of the CL marker group (lower third of the vagina) for identifying adenomyosis in the second population in an embodiment of the present application;

Figure 5 is the ROC curve of the CU marker group (posterior vaginal fornix) for identifying adenomyosis in the second population in an embodiment of the present application;

Figure 6 is the ROC curve of the CV marker group (cervical canal) for identifying adenomyosis in the second population in an embodiment of the present application;

In the figures, the number of variables refers to the number of OTUs; sensitivity = true positives / (true positives + false negatives); specificity = true negatives / (true negatives + false positives).
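The sensitivity and specificity definitions above, together with the AUC reported for each ROC curve, can be computed as follows. This is a plain-Python sketch rather than the authors' code; the AUC is computed with the rank (Mann-Whitney) formulation, the probability that a randomly chosen patient scores higher than a randomly chosen control, which equals the area under the empirical ROC curve. Labels use 1 for adenomyosis and 0 for controls (an assumed encoding).

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); 1 = adenomyosis, 0 = control."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen patient
    scores higher than a randomly chosen control (ties count 1/2). This equals
    the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An uninformative classifier that gives every sample the same score gets an AUC of exactly 0.5, which corresponds to the diagonal drawn in Figures 1 to 3.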

Detailed description

The biomarkers of the present application were obtained from the relationship between microbial DNA at three sites of the subject and adenomyosis; they are in effect microbial OTUs at the three sites that reflect adenomyosis status. Specifically, in one preparation method of the present application, the relative abundance of each OTU (represented by its seed sequence) is taken as one variable and adenomyosis status (diseased or non-diseased) as the other; the two are fitted with a random forest model, and the biomarkers are finally obtained through five repetitions of 10-fold cross-validation. Through rigorous calculation and experimental study, the present application finally obtained forty-four nucleic acids of 28 microorganisms at the three sites as its biomarkers.

In one implementation of the present application, the marker group for each of the three sites can independently assess adenomyosis or its risk, but combining the probabilities from the three sites to determine whether the subject has adenomyosis or is at risk of it gives higher accuracy.
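Combining the per-site probabilities can be sketched as below. The text does not state how the three probabilities are combined, so this sketch simply averages whichever sites are available (for example, subject T048 has no CL sample) and applies a hypothetical 0.5 cutoff; both choices are assumptions for illustration.

```python
from statistics import mean
from typing import Mapping, Optional

SITES = ("CL", "CU", "CV")


def combined_probability(site_probs: Mapping[str, Optional[float]]) -> float:
    """Average the per-site disease probabilities that are available.

    ASSUMPTION: plain averaging; the source does not give the combination rule.
    Missing sites (absent or None), e.g. a subject with no CL sample, are skipped.
    """
    available = [site_probs[s] for s in SITES if site_probs.get(s) is not None]
    if not available:
        raise ValueError("no site probabilities available")
    return mean(available)


def call_adenomyosis(site_probs: Mapping[str, Optional[float]], cutoff: float = 0.5) -> bool:
    """Hypothetical decision rule: positive if the combined probability >= cutoff."""
    return combined_probability(site_probs) >= cutoff
```

Averaging is the simplest rule that lets one confidently abnormal site outweigh a borderline one; weighted averages or a second-level model trained on the three site probabilities are common alternatives.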

The terms used in this application have the meanings generally understood by those of ordinary skill in the art. For a better understanding of this application, some definitions and related terms are explained as follows:

"Adenomyosis" in the present application refers to a diffuse or localized lesion in which endometrial glands and stroma invade the myometrium. Like endometriosis, it is a common and difficult-to-treat gynecological disease.

The levels of the biomarkers of the present application are expressed as relative abundance.

In one embodiment of the present application, the reference value refers to the reference or normal value of healthy controls. It will be apparent to those skilled in the art that, when the number of samples is sufficient, the normal range (i.e., absolute values) of each biomarker can be obtained by testing and calculation.
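One common way to derive a normal range from enough healthy controls is a central 95% percentile interval. The sketch below assumes that approach (the source only says the range "can be obtained by testing and calculation") and uses Python's standard `statistics.quantiles`; the function names are illustrative.

```python
from statistics import quantiles


def reference_interval(control_values: list[float]) -> tuple[float, float]:
    """Central 95% range (2.5th to 97.5th percentile) of one marker's
    relative abundance in healthy controls.

    ASSUMPTION: a percentile-based interval; the source does not specify
    how the normal range is calculated.
    """
    if len(control_values) < 2:
        raise ValueError("need at least two control values")
    # n=40 gives 39 cut points in 2.5% steps: first = 2.5th, last = 97.5th percentile.
    cuts = quantiles(control_values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def outside_reference(value: float, interval: tuple[float, float]) -> bool:
    """True if a subject's value falls outside the control reference interval."""
    low, high = interval
    return value < low or value > high
```

For example, `reference_interval(list(range(101)))` returns `(2.5, 97.5)`.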

A "biomarker", also called a "biological marker", as used herein refers to a measurable indicator of an individual's biological state. A biomarker may be any substance in the individual, provided it is related to a specific biological state of the individual being examined, such as a disease. Biomarkers can be, for example, nucleic acid markers (e.g., DNA), protein markers, cytokine markers, chemokine markers, carbohydrate markers, antigenic markers, antibody markers, species markers (species/genus markers), and functional markers (KO/OG markers). The biomarkers of the present application are specifically DNA markers.

The term "OTU" in this application refers to an operational taxonomic unit: a label artificially assigned, in phylogenetic or population genetics studies, to a taxonomic unit (such as a strain, species, genus, or group) to facilitate analysis. In the present application, sequences are grouped into OTUs at a 97% similarity threshold, yielding a number of OTUs for each of the three sites, and each OTU is regarded as one microbial species. The microbial diversity of the samples and the abundances of different microorganisms are analyzed on the basis of OTUs.

As used herein, "individual" refers to an animal, particularly a mammal such as a primate; in the examples of the present application, it refers to a human.
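Grouping sequences into OTUs at a 97% threshold can be illustrated with a simplified greedy centroid clustering in the spirit of uclust: sequences are visited from most to least abundant, and each joins the first existing seed it matches at 97% or greater identity, otherwise becoming a new seed. This is not uclust itself, which uses alignments and k-mer heuristics; the sketch assumes equal-length, pre-aligned reads.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b) or not a:
        raise ValueError("reads must be non-empty and pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, list[int]]:
    """Greedy centroid clustering into OTUs.

    Reads are visited from most to least abundant; each read joins the first
    existing seed it matches at >= threshold identity, otherwise it becomes a
    new seed. Returns {seed sequence: [indices of member reads]}.
    """
    counts = Counter(reads)
    order = sorted(range(len(reads)), key=lambda i: -counts[reads[i]])  # stable sort
    clusters: dict[str, list[int]] = {}
    for i in order:
        for seed, members in clusters.items():
            if identity(reads[i], seed) >= threshold:
                members.append(i)
                break
        else:
            clusters[reads[i]] = [i]  # no seed close enough: start a new OTU
    return clusters
```

Processing abundant sequences first tends to make the most common variant of each taxon its seed, which matches the idea of choosing the most representative sequence of each OTU.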

The present application will be further described in detail below by way of specific embodiments and the accompanying drawings. The following examples are only intended to further illustrate the present application and are not to be construed as limiting the invention.

Example

1. Materials and methods

1.1 Sample Collection

The samples in this example were collected with the assistance of gynecologists at Peking University Shenzhen Hospital. Cases of inflammation were excluded; subjects were women who were not menstruating, pregnant, or lactating, had no endocrine or autoimmune diseases, and had normal liver and kidney function. Subjects had not used hormones or antibiotics for some time before sampling, had received no vaginal medication, vaginal lavage, or cervical treatment, and had not had sexual intercourse within 48 hours before sampling. Based on these criteria, 95 women of childbearing age were selected as the first population. Detailed phenotypic information was recorded for all eligible individuals, including medical history, family history, medication history, and lifestyle habits, and all of them signed informed consent.

Lower genital tract sampling was performed after admission, without disinfection and after the subject had emptied her bladder. With the subject on a gynecological examination table, secretion samples were collected from the lower third of the vagina (abbreviated CL), the posterior vaginal fornix (abbreviated CU), and the cervical canal (abbreviated CV). The sample numbers and sampling information for the 95 subjects are as follows. The fourteen subjects numbered C033, C038, C043, C051, C057, C062, C063, C065, T023, T069, T078, T089, T092, and T095 are adenomyosis patients, and CL, CU, and CV samples were collected from all fourteen. The eighty-one subjects numbered C023, C026, C028, C035, C039, C040, C041, C042, C045, C047, C048, C050, C053, C055, C056, C058, C059, C060, C064, C066, C067, C068, T022, T024, T025, T026, T027, T028, T029, T030, T031, T032, T033, T035, T036, T038, T039, T040, T041, T042, T043, T044, T045, T046, T047, T048, T049, T051, T052, T053, T054, T055, T056, T057, T058, T059, T060, T061, T062, T063, T064, T065, T066, T067, T068, T070, T071, T072, T073, T074, T075, T076, T084, T085, T086, T087, T088, T090, T091, T093, and T094 are non-adenomyosis individuals; CL, CU, and CV samples were collected from all of them except T048, from whom only CU and CV samples were collected (no CL sample).

Samples were collected using nylon flocked swabs purchased from Chenyang Global Group (CY-93050 and CY-98000). After sampling, the swab heads were quickly frozen in liquid nitrogen, stored at -80 °C, and shipped on dry ice to Shenzhen Huada Gene Research Institute for subsequent experiments.

1.2 DNA extraction and 16S sequencing

In this example, DNA was extracted using the QIAamp DNA Mini Kit (QIAGEN) according to the manufacturer's instructions. The V4-V5 hypervariable region of the 16S rRNA gene was amplified with region-specific primers V4-515F and V5-907R, where V4-515F has the sequence shown in Seq ID No. 45 and V5-907R has the sequence shown in Seq ID No. 46.

Seq ID No. 45: 5'-GTGCCAGCMGCCGCGGTAA-3'

Seq ID No. 46: 5'-CCGTCAATTCMTTTRAGT-3'
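Both primers are degenerate: M stands for A or C and R for A or G under the standard IUPAC nucleotide codes. The sketch below expands the primers into the concrete sequences they encode and counts mismatches between a primer and an aligned target site, which is the quantity behind the "fewer than 2 mismatches" read filter in section 1.3. The helper names are illustrative.

```python
# IUPAC nucleotide codes: each symbol and the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

V4_515F = "GTGCCAGCMGCCGCGGTAA"  # Seq ID No. 45 (forward)
V5_907R = "CCGTCAATTCMTTTRAGT"   # Seq ID No. 46 (reverse)


def expand(primer: str) -> list[str]:
    """Every concrete sequence encoded by a degenerate primer."""
    variants = [""]
    for symbol in primer:
        variants = [v + base for v in variants for base in IUPAC[symbol]]
    return variants


def primer_mismatches(primer: str, site: str) -> int:
    """Count positions where the target base is not allowed by the primer's code.

    `site` must be the target region already aligned to the primer and written
    in the same orientation as the primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(base not in IUPAC[symbol] for symbol, base in zip(primer, site.upper()))
```

`expand(V4_515F)` yields 2 variants (one degenerate position) and `expand(V5_907R)` yields 4 (two degenerate positions).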

The PCR program was as follows: initial denaturation at 94 °C for 3 min; 25 cycles of denaturation at 94 °C for 45 s, annealing at 50 °C for 60 s, and extension at 72 °C for 90 s; then a final extension at 72 °C for 10 min. The PCR products were purified with AMPure Beads (Axygen). Because multiple samples were pooled and sequenced together on a single chip lane, library construction required attaching a 10 bp barcode sequence to the outer end of each sample's primer sequence, followed by an adapter sequence; the distinct barcode (sample identification sequence) added to each sample allows the samples to be told apart. After library construction, V5-V4 reverse sequencing was performed on the Ion Torrent PGM sequencing platform. Library construction and sequencing were carried out by Shenzhen Huada Gene.
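Separating the pooled reads back into samples by their 10 bp barcode can be sketched as follows. This assumes each read begins with its barcode (adapter already removed) and uses exact matching only; production pipelines often tolerate barcode errors. The barcodes and sample names are hypothetical.

```python
BARCODE_LEN = 10  # the text specifies a 10 bp barcode per sample


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign reads to samples by exact match of the leading barcode.

    `barcodes` maps barcode -> sample ID. Matched reads are returned with the
    barcode trimmed; reads whose prefix matches no barcode go to "unassigned".
    """
    bins: dict[str, list[str]] = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        sample = barcodes.get(read[:BARCODE_LEN])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[BARCODE_LEN:])
    return bins
```

Keeping unmatched reads in their own bin, rather than discarding them silently, makes it easy to check what fraction of a run could not be attributed to any sample.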

1.3 16S sequencing data processing

Raw data were extracted from the PGM system and preprocessed using Mothur (v1.33.3). High-quality sequences were required to: 1) be longer than 200 bp; 2) have fewer than 2 mismatches with the degenerate PCR primer; and 3) have an average quality score greater than 25. OTUs were clustered from the 16S rRNA gene sequences using the QIIME uclust method with a similarity threshold of 97%. A seed sequence was selected for each OTU and annotated against the reference gg_13_8_otus from the Greengenes database. The relative abundance of each OTU in each sample was then calculated as the ratio of that OTU's abundance in the sample to the sum of all OTU abundances in the sample.
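The three quality criteria and the relative-abundance definition translate directly into code. This sketch assumes primer mismatches have already been counted upstream and stored on each read; the `Read` record and function names are hypothetical, not Mothur or QIIME APIs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    sequence: str
    qualities: list[int]    # per-base Phred quality scores
    primer_mismatches: int  # mismatches against the degenerate primer, computed upstream


def is_high_quality(read: Read) -> bool:
    """The three criteria from the text: length > 200 bp,
    fewer than 2 primer mismatches, mean quality score > 25."""
    if not read.qualities:
        return False
    return (
        len(read.sequence) > 200
        and read.primer_mismatches < 2
        and sum(read.qualities) / len(read.qualities) > 25
    )


def relative_abundance(otu_counts: dict[str, int]) -> dict[str, float]:
    """Each OTU's count divided by the total OTU count in the same sample."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no OTU counts")
    return {otu: n / total for otu, n in otu_counts.items()}


def abundance_from_assignments(otu_ids: list[str]) -> dict[str, float]:
    """Relative abundances from the OTU label assigned to each read of one sample."""
    return relative_abundance(Counter(otu_ids))
```

For a sample whose four reads are assigned to `otu_1`, `otu_1`, `otu_2`, and `otu_3`, the relative abundances are 0.5, 0.25, and 0.25.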

1.4 Analysis of microbial consistency between samples at different sites

Based on the presence or absence of OTUs, this example uses the Sørensen index (also called the Sørensen-Dice index) to measure the similarity of microbial populations at different sites in the same individual, calculated as follows:

$$QS = \frac{2C}{A + B}$$

where A and B are the numbers of OTUs in samples A and B, respectively, and C is the number of OTUs shared by the two samples. QS is the similarity index and ranges from 0 to 1. In this example, the similarity indices between CL and CU, between CL and CV, and between CU and CV were calculated. A similarity index closer to 1 indicates greater similarity between the microbiota at the two sampling sites.
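The QS formula reduces to a few lines of set arithmetic on per-site OTU presence sets. The sketch below computes it for every pair of sites of one individual; the OTU names in the usage are hypothetical.

```python
from itertools import combinations


def sorensen_dice(otus_a: set[str], otus_b: set[str]) -> float:
    """QS = 2C / (A + B) on OTU presence/absence."""
    a, b = len(otus_a), len(otus_b)
    if a + b == 0:
        raise ValueError("both samples contain no OTUs")
    c = len(otus_a & otus_b)  # OTUs shared by the two samples
    return 2 * c / (a + b)


def site_similarities(site_otus: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """QS for every pair of sampling sites of one individual (CL-CU, CL-CV, CU-CV)."""
    return {
        (s1, s2): sorensen_dice(site_otus[s1], site_otus[s2])
        for s1, s2 in combinations(sorted(site_otus), 2)
    }
```

For example, if CL has OTUs {o1, o2, o3, o4} and CU has {o1, o2, o5}, then A = 4, B = 3, C = 2, and QS = 4/7.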

1.5 Random forest classifier

To build a model capable of identifying samples in the abnormal state, for each sampling site the relative abundances of the OTUs in each sample were fitted to adenomyosis status using the randomForest package in R (3.1.2RC) with default parameters. Only OTUs present in at least 10% of samples were used; that is, OTUs detected in fewer than 10% of all samples at a site were excluded. Five repetitions of 10-fold cross-validation were then performed, the five error curves were averaged, and the lowest point of the averaged curve plus the standard error at that point was taken as the threshold for acceptable error. Among the OTU sets whose classification error fell below this threshold, the one with the fewest OTUs was taken as the optimal OTU combination, i.e., the biomarker combination for identifying adenomyosis.
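The prevalence filter and the "minimum error plus standard error" selection rule can be sketched in plain Python. This assumes the cross-validation error curves have already been computed (the R randomForest package provides `rfcv` for this kind of feature-count cross-validation, though the source does not say which routine was used); the standard error is assumed here to be the standard deviation across repeats divided by the square root of the number of repeats.

```python
from math import sqrt
from statistics import mean, stdev


def prevalent_otus(abundance: dict[str, list[float]], min_fraction: float = 0.10) -> list[str]:
    """OTUs detected (relative abundance > 0) in at least `min_fraction` of samples."""
    return [
        otu
        for otu, values in abundance.items()
        if values and sum(v > 0 for v in values) / len(values) >= min_fraction
    ]


def select_otu_count(error_curves: list[list[float]], otu_counts: list[int]) -> int:
    """Smallest OTU count whose mean CV error is below (minimum mean error + its SE).

    error_curves[r][k] is the error of repeat r (e.g. one of five 10-fold CV runs)
    when the model uses otu_counts[k] OTUs. Needs at least two repeats.
    ASSUMPTION: SE = standard deviation across repeats / sqrt(number of repeats).
    """
    mean_err = [mean(column) for column in zip(*error_curves)]
    best = min(range(len(mean_err)), key=mean_err.__getitem__)
    se = stdev([curve[best] for curve in error_curves]) / sqrt(len(error_curves))
    cutoff = mean_err[best] + se
    # Keep the minimum itself even if SE is zero, then take the fewest OTUs.
    return min(n for k, n in enumerate(otu_counts) if mean_err[k] < cutoff or k == best)
```

The rule trades a statistically negligible increase in error for a smaller marker set: if 8 OTUs give the lowest mean error but 4 OTUs fall within one standard error of it, 4 OTUs are chosen.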

1.6 Biomarker verification

To verify the biomarkers obtained in this example, an independent test population, the second population, was additionally used. For CL and CU, the second population comprised 4 adenomyosis patients and 36 non-adenomyosis individuals. For CV, it comprised 4 adenomyosis patients and 37 non-adenomyosis individuals.

2. Experimental results

2.1 Structural characteristics and trends of the microbiota along the reproductive tract of the same individual

To explore the relationship between the microbial communities in different regions of the reproductive tract, this example calculates the distances between samples from the same individual. Relative to the lower-third vaginal (CL) samples, the weighted UniFrac distance increased progressively for the posterior vaginal fornix (CU), cervical mucus (CV), uterine and peritoneal fluid samples. This again indicates that the community structure of the female reproductive tract changes continuously from bottom to top along the anatomical structure.

Samples from different sites of the same individual were highly correlated, and the Sørensen index between samples from different sites was consistent with their anatomical positions. Cervical mucus (CV) and peritoneal fluid samples were significantly correlated, with an average Sørensen index of 0.255. This suggests that the health of the uterine and abdominal cavities could be evaluated in the general population by analyzing readily available cervical mucus samples.

In addition, in this example cervical mucus was sampled both through the vagina and through the uterine cavity. The bacterial distributions of the samples obtained by the two routes were highly similar. This further indicates that the uterine cavity microbiota can be evaluated by analyzing easily obtained cervical canal samples.

2.2 Disease-related microorganisms

To obtain OTU biomarkers for identifying adenomyosis, this example builds a random forest model as follows. (1) Using the relative abundances of OTUs as input features, a random forest model is designed based on the first population. (2) The model is evaluated by 10-fold cross-validation, with the first population divided into adenomyosis and non-adenomyosis individuals. The ROC curve of each random forest model is obtained, and the area under the ROC curve (AUC) is used as the evaluation index.
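As an illustration of the evaluation index, the AUC can be computed directly from predicted probabilities with the rank (Mann-Whitney) formulation sketched below. This formulation is equivalent to the area under the empirical ROC curve. It is a standard technique, not necessarily the software the authors used. Applied to the CL probabilities and diagnoses listed in Table 8, it gives 126/144 = 0.875.

```python
def roc_auc(scores: list[float], labels: list[bool]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive (adenomyosis) sample
    scores higher than a randomly chosen negative one; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```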

In this example, a random forest model combined with 10-fold cross-validation was used to obtain the optimal biomarkers for identifying adenomyosis at each site, as shown in Table 1. Tables 2 to 4 show the enrichment of the marker groups of the three sites in the samples. Tables 5 to 7 show the relative abundances of the marker groups of the three sites in the first-population samples. Figures 1 to 3 show the results of identifying adenomyosis with the biomarkers of the three sites: Figure 1 for the marker group of the lower third of the vagina (CL), Figure 2 for the posterior vaginal fornix (CU), and Figure 3 for the cervical canal (CV).

Table 1 Biomarkers and the sites at which they are used

Seq ID No.	OTU number	OTU classification	CL	CU	CV
1	7	Acinetobacter sp.	√	√	√
2	80	Anaerococcus sp.	√	--	√
3	83	Finegoldia sp.	√	--	--
4	36	Ochrobactrum sp.	√	√	--
5	1	Lactobacillus crispatus	√	√	--
6	421	Lactobacillus iners	√	--	--
7	61	Lactobacillus sp.	√	√	--
8	71	Ruminococcaceae	√	--	--
9	550	Lactobacillus sp.	√	--	--
10	274	Peptoniphilus sp.	√	√	--
11	157	Bifidobacteriaceae	√	√	--
12	56	Staphylococcus sp.	√	--	--
13	34	Comamonadaceae	√	√	√
14	304	Peptoniphilus sp.	√	--	--
15	204	Lactobacillus iners	√	√	--
16	59	Lactobacillus iners	√	--	--
17	13	Bifidobacteriaceae	√	--	--
18	184	Lactobacillus iners	√	√	--
19	44	Enterobacteriaceae	--	√	√
20	12	Delftia sp.	--	√	--
21	11	Vagococcus sp.	--	√	--
22	307	Corynebacterium sp.	--	√	--
23	37	Pseudomonas viridiflava	--	√	--
24	26	Shewanella sp.	--	√	--
25	101	Lactobacillus iners	--	√	--
26	95	Paracoccus sp.	--	√	--
27	38	Lactobacillus sp.	--	√	--
28	41	Pseudomonas sp.	--	√	√
29	306	Lactobacillus iners	--	√	--
30	138	Lactobacillus iners	--	√	--
31	60	Lactobacillus iners	--	√	--
32	30	Stenotrophomonas sp.	--	--	√
33	43	Pseudochrobactrum sp.	--	--	√
34	89	Oxalobacteraceae	--	--	√
35	112	Pseudomonas sp.	--	--	√
36	533	Pseudomonas sp.	--	--	√
37	315	Corynebacterium sp.	--	--	√
38	99	Micrococcus luteus	--	--	√
39	419	Tissierellaceae	--	--	√
40	492	Paenibacillus sp.	--	--	√
41	147	Shewanella sp.	--	--	√
42	17	Pseudomonas fragi	--	--	√
43	98	Vagococcus sp.	--	--	√
44	81	Sphingobium sp.	--	--	√

Table 1 lists the markers for each of the three sites, CL, CU and CV. "√" indicates that the biomarker is used for that site, and "--" indicates that it is not used.

When testing a sample, the relative abundances of the OTUs marked "√" for the sampled site are calculated and input into the random forest model. The model output is then used to determine whether the individual has adenomyosis.
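The per-site marker sets of Table 1 can be encoded as below to assemble the model input for a new sample. This is a sketch under stated assumptions. The abundance profile is keyed by Seq ID number, which assumes that OTUs in the new sample have already been matched to Seq ID No. 1 to 44 upstream. Marker OTUs absent from the sample are taken as abundance 0. The trained random forest model itself is not reproduced here.

```python
# Seq ID numbers marked "√" for each site in Table 1.
SITE_MARKERS: dict[str, list[int]] = {
    "CL": list(range(1, 19)),                                    # Seq ID No. 1-18
    "CU": [1, 4, 5, 7, 10, 11, 13, 15] + list(range(18, 32)),   # 22 markers
    "CV": [1, 2, 13, 19, 28] + list(range(32, 45)),             # 18 markers
}


def site_feature_vector(site: str, rel_abundance: dict[int, float]) -> list[float]:
    """Relative abundances of the site's marker OTUs in Seq ID order (missing -> 0.0)."""
    return [rel_abundance.get(seq_id, 0.0) for seq_id in SITE_MARKERS[site]]
```

The set sizes (18, 22 and 18) match the first, second and third marker groups in the claims, and together they cover all 44 sequences.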

Table 2 OTU abundance information of the CL marker group

Table 3 OTU abundance information of the CU marker group

Table 4 OTU abundance information of the CV marker group

In Tables 2 to 4, the adenomyosis group refers to the samples from individuals with adenomyosis among the 95 individuals of the first population. The control group refers to the samples from individuals without adenomyosis among those 95 individuals.

Table 5 Relative abundance of each OTU of the CL marker group in the first population

Table 6 Relative abundance of each OTU of the CU marker group in the first population

Table 7 Relative abundance of each OTU of the CV marker group in the first population

Figure 1 shows the identification of adenomyosis by the marker group of the lower third of the vagina (CL). Panel a shows the distribution of error rates from five runs of 10-fold cross-validation of the random forest classifier as the number of OTUs increases. The model was trained on the relative abundances of OTUs in CL samples from a total of 14 individuals with adenomyosis and 80 individuals without adenomyosis. The black line represents the mean of the 5 runs, the gray lines represent the individual runs, and the black vertical line marks the number of OTUs in the optimal combination. Panel b shows the receiver operating characteristic (ROC) curve of the cross-validated combination. The area under the curve (AUC) is 0.8668, the shaded area represents the 95% confidence interval, and the diagonal line represents a curve with an AUC of 0.5.

Figure 2 shows the identification of adenomyosis by the marker group of the posterior vaginal fornix (CU). Panel a shows the distribution of error rates from five runs of 10-fold cross-validation of the random forest classifier as the number of OTUs increases. The model was trained on the relative abundances of OTUs in CU samples from a total of 14 individuals with adenomyosis and 81 individuals without adenomyosis. The black line represents the mean of the 5 runs, the gray lines represent the individual runs, and the black vertical line marks the number of OTUs in the optimal combination. Panel b shows the ROC curve of the cross-validated combination. The AUC is 0.8404, the shaded area represents the 95% confidence interval, and the diagonal line represents a curve with an AUC of 0.5.

Figure 3 shows the identification of adenomyosis by the marker group of the cervical canal (CV). Panel a shows the distribution of error rates from five runs of 10-fold cross-validation of the random forest classifier as the number of OTUs increases. The model was trained on the relative abundances of OTUs in CV samples from a total of 14 individuals with adenomyosis and 81 individuals without adenomyosis. The black line represents the mean of the 5 runs, the gray lines represent the individual runs, and the black vertical line marks the number of OTUs in the optimal combination. Panel b shows the ROC curve of the cross-validated combination. The AUC is 0.8369, the shaded area represents the 95% confidence interval, and the diagonal line represents a curve with an AUC of 0.5.

The results in Figs. 1 to 3 show that the OTU biomarker groups at all three sites can distinguish individuals with adenomyosis from individuals without it. The areas under the ROC curves are 0.8668 (CL), 0.8404 (CU) and 0.8369 (CV). The closer the AUC is to 1, the stronger the discriminating ability, i.e. the more accurate the judgment.

2.3 Biomarker verification

The OTU biomarkers obtained from the random forest were verified on the second-population samples, with results shown in Tables 8, 9 and 10. In these tables, sample IDs such as C002CL, C002CU and C002CV denote the CL, CU and CV samples, respectively, collected from the same subject, C002. Tables 8 to 10 give the probability, predicted by each of the three marker groups, that the individual has adenomyosis. The ROC curves obtained from these probabilities are shown in Figs. 4 to 6, respectively. In Tables 8 to 10, a probability > 0.5 means that, according to the marker group of that site, the individual has adenomyosis or is at risk of adenomyosis.
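The per-site decision rule (probability > 0.5) can be tallied against the actual diagnoses with a small confusion-count helper, sketched below. This is an illustrative helper, not part of the patent. Sensitivity and specificity are derived in the usual way. For the CL probabilities in Table 8 it counts 3 true positives, 1 false negative, 4 false positives and 32 true negatives.

```python
def confusion_at_threshold(probs: list[float], labels: list[bool],
                           threshold: float = 0.5) -> dict[str, int]:
    """Count TP/FP/TN/FN, reading probability > threshold as 'adenomyosis or at risk'."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, actual in zip(probs, labels):
        predicted = p > threshold
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def sensitivity_specificity(counts: dict[str, int]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return (counts["TP"] / (counts["TP"] + counts["FN"]),
            counts["TN"] / (counts["TN"] + counts["FP"]))
```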

Table 8 Probability of adenomyosis in second-population samples predicted by the CL marker group at the CL site

Sample ID	Actual adenomyosis status (Y: yes; N: no)	Probability
C001CL	N	0.445
C002CL	N	0.168
C003CL	Y	0.289
C004CL	N	0.011
C005CL	N	0.358
C007CL	N	0.166
C008CL	N	0.000
C009CL	N	0.095
C011CL	N	0.447
C012CL	Y	0.550
C014CL	N	0.477
C016CL	N	0.311
C018CL	N	0.213
C019CL	Y	0.855
C020CL	N	0.132
C021CL	N	0.376
T000CL	N	0.117
T001CL	N	0.109
T003CL	N	0.526
T005CL	N	0.570
T006CL	N	0.079
T007CL	N	0.013
T008CL	N	0.382
T009CL	N	0.055
T010CL	N	0.038
T011CL	N	0.195
T012CL	N	0.147
T013CL	N	0.016
T014CL	N	0.348
T015CL	Y	0.540
T016CL	N	0.352
T017CL	N	0.394
T018CL	N	0.053
T019CL	N	0.159
T020CL	N	0.766
T021CL	N	0.061
T080CL	N	0.006
T081CL	N	0.532
T082CL	N	0.089
T083CL	N	0.228

Table 9 Probability of adenomyosis in second-population samples predicted by the CU marker group at the CU site

Sample ID	Actual adenomyosis status (Y: yes; N: no)	Probability
C001CU	N	0.495
C002CU	N	0.074
C003CU	Y	0.316
C004CU	N	0.040
C005CU	N	0.302
C007CU	N	0.000
C008CU	N	0.033
C009CU	N	0.083
C011CU	N	0.427
C012CU	Y	0.234
C014CU	N	0.244
C016CU	N	0.346
C018CU	N	0.489
C019CU	Y	0.798
C020CU	N	0.012
C021CU	N	0.069
T000CU	N	0.077
T001CU	N	0.017
T002CU	N	0.097
T003CU	N	0.274
T005CU	N	0.201
T006CU	N	0.163
T007CU	N	0.071
T008CU	N	0.244
T009CU	N	0.061
T010CU	N	0.001
T011CU	N	0.172
T013CU	N	0.090
T014CU	N	0.027
T015CU	Y	0.240
T016CU	N	0.000
T017CU	N	0.000
T018CU	N	0.076
T019CU	N	0.056
T020CU	N	0.701
T021CU	N	0.020
T080CU	N	0.007
T081CU	N	0.150
T082CU	N	0.136
T083CU	N	0.017

Table 10 Probability of adenomyosis in second-population samples predicted by the CV marker group at the CV site

Sample ID	Actual adenomyosis status (Y: yes; N: no)	Probability
C002CV	N	0.404
C003CV	Y	0.374
C004CV	N	0.118
C005CV	N	0.403
C007CV	N	0.429
C008CV	N	0.278
C009CV	N	0.377
C011CV	N	0.466
C012CV	Y	0.547
C014CV	N	0.333
C016CV	N	0.408
C018CV	N	0.081
C019CV	Y	0.587
C020CV	N	0.349
C021CV	N	0.020
T000CV	N	0.370
T001CV	N	0.346
T002CV	N	0.000
T003CV	N	0.004
T004CV	N	0.000
T005CV	N	0.000
T006CV	N	0.366
T007CV	N	0.406
T008CV	N	0.066
T009CV	N	0.249
T010CV	N	0.100
T011CV	N	0.317
T012CV	N	0.344
T013CV	N	0.409
T014CV	N	0.138
T015CV	Y	0.645
T016CV	N	0.371
T017CV	N	0.000
T018CV	N	0.000
T019CV	N	0.024
T020CV	N	0.640
T021CV	N	0.031
T080CV	N	0.316
T081CV	N	0.355
T082CV	N	0.312
T083CV	N	0.387

Fig. 4 shows that the probability of adenomyosis determined from the CL marker group at the CL site gives an AUC of 0.8750. Fig. 5 shows an AUC of 0.840 for the CU marker group at the CU site, and Fig. 6 shows an AUC of 0.9189 for the CV marker group at the CV site. These three marker groups therefore have good discriminating ability and can be used for the detection of adenomyosis, consistent with the results in Tables 8 to 10. Suppose an individual is judged to have adenomyosis, or to be at risk of it, when at least one of the three marker groups predicts a probability greater than 0.5. The resulting judgments are then largely consistent with the actual diagnoses. Three of the four adenomyosis patients (C012, C019 and T015) are identified, while C003 is not, and four non-adenomyosis individuals (T003, T005, T020 and T081) are also flagged.
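The combined rule (flag an individual if any site's probability exceeds 0.5) can be sketched as follows. The helper names are hypothetical. The sample IDs are assumed to follow the pattern used in Tables 8 to 10, i.e. a subject ID followed by a two-letter site code. Sites with no sample for a subject (for example, C001 has no CV row in Table 10) are simply skipped.

```python
def group_by_subject(rows: dict[str, float]) -> dict[str, dict[str, float]]:
    """Split sample IDs such as 'C003CL' into subject 'C003' and site 'CL'."""
    grouped: dict[str, dict[str, float]] = {}
    for sample_id, prob in rows.items():
        subject, site = sample_id[:-2], sample_id[-2:]
        grouped.setdefault(subject, {})[site] = prob
    return grouped


def flag_individuals(site_probs: dict[str, dict[str, float]],
                     threshold: float = 0.5) -> dict[str, bool]:
    """Flag a subject when any available site probability exceeds the threshold."""
    return {
        subject: any(p > threshold for p in probs.values())
        for subject, probs in site_probs.items()
    }
```

With the values in Tables 8 to 10, C003 (0.289, 0.316, 0.374) is not flagged, whereas T020 (0.766, 0.701, 0.640) is.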

The above is a further detailed description of the present application in conjunction with specific embodiments, and the specific implementation of the present application is not limited to these descriptions. Persons of ordinary skill in the art to which the present application pertains may make a number of simple deductions or substitutions without departing from the concept of the present application.

Claims

1. A biomarker combination for adenomyosis detection or disease risk assessment, characterized in that the biomarker combination comprises at least one of forty-four nucleic acids, the forty-four nucleic acids respectively having the sequences shown in Seq ID No. 1 to Seq ID No. 44, or sequences having 97% or more similarity to the sequences shown in Seq ID No. 1 to Seq ID No. 44, respectively.
2. A biomarker combination for adenomyosis detection or disease risk assessment, characterized in that the biomarker combination comprises at least one of a first marker group, a second marker group and a third marker group;

the first marker group consists of eighteen nucleic acids, which respectively have the sequences shown in Seq ID No. 1 to Seq ID No. 18, or sequences having 97% or more similarity to the sequences shown in Seq ID No. 1 to Seq ID No. 18, respectively;

the second marker group consists of twenty-two nucleic acids, which respectively have the sequences shown in Seq ID No. 1, Seq ID No. 4, Seq ID No. 5, Seq ID No. 7, Seq ID No. 10, Seq ID No. 11, Seq ID No. 13, Seq ID No. 15 and Seq ID No. 18 to Seq ID No. 31, or sequences having 97% or more similarity to the sequences shown in Seq ID No. 1, Seq ID No. 4, Seq ID No. 5, Seq ID No. 7, Seq ID No. 10, Seq ID No. 11, Seq ID No. 13, Seq ID No. 15 and Seq ID No. 18 to Seq ID No. 31, respectively;

the third marker group consists of eighteen nucleic acids, which respectively have the sequences shown in Seq ID No. 1, Seq ID No. 2, Seq ID No. 13, Seq ID No. 19, Seq ID No. 28 and Seq ID No. 32 to Seq ID No. 44, or sequences having 97% or more similarity to the sequences shown in Seq ID No. 1, Seq ID No. 2, Seq ID No. 13, Seq ID No. 19, Seq ID No. 28 and Seq ID No. 32 to Seq ID No. 44, respectively.
3. The biomarker combination according to claim 2, wherein the first marker group is a CL marker group for performing adenomyosis detection or disease risk assessment on a sample from the lower third of the vagina.
4. The biomarker combination according to claim 2, wherein the second marker group is a CU marker group for performing adenomyosis detection or disease risk assessment on a sample from the posterior vaginal fornix.
5. The biomarker combination according to claim 2, wherein the third marker group is a CV marker group for performing adenomyosis detection or disease risk assessment on a sample from the cervical canal.
6. A kit for adenomyosis detection or disease risk assessment, characterized in that the kit comprises a primer pair for detecting the biomarker combination according to any one of claims 1 to 5, the forward primer of the primer pair having the sequence shown in SEQ ID No. 45 and the reverse primer having the sequence shown in SEQ ID No. 46.
7. Use of the biomarker combination according to any one of claims 1 to 5 in drug screening for adenomyosis, or in the preparation of a kit or detection tool for adenomyosis detection or risk assessment.
8. A method for detecting adenomyosis, characterized by comprising the following steps:

(1) collecting a sample from the subject to be tested, detecting the biomarker combination according to any one of claims 1 to 5 in the collected sample, and analyzing the level of each nucleic acid in the biomarker combination;

(2) comparing the level of each nucleic acid measured in step (1) with a reference data set or reference value to obtain a detection result;

preferably, the level of each nucleic acid is the relative abundance of that nucleic acid, and the reference data set or reference value is the level of each nucleic acid of the biomarker combination in adenomyosis patients and non-adenomyosis controls.
9. The detection method according to claim 8, wherein the reference data set or reference value in step (2) is at least one of Table 5, Table 6 and Table 7, and comparing the level of each nucleic acid with the reference data set or reference value to obtain the detection result specifically comprises calculating the probability of disease using a multivariate statistical model; preferably, the multivariate statistical model is a random forest model.
10. The detection method according to claim 8 or 9, wherein collecting a sample from the subject in step (1) comprises collecting a lower-third vaginal sample, a posterior vaginal fornix sample and a cervical canal sample from the subject.
11. Use of the biomarker combination according to any one of claims 1 to 5 in the preparation of a kit or tool for determining adenomyosis, or assessing the disease or disease risk, by detecting biomarkers;

the method for determining adenomyosis by detecting biomarkers comprises the following steps:

(1) collecting a sample from the subject to be tested, detecting the biomarker combination according to any one of claims 1 to 5 in the collected sample, and analyzing the level of each nucleic acid in the biomarker combination;

(2) comparing the level of each nucleic acid measured in step (1) with a reference data set or reference value to obtain a detection result;

preferably, the level of each nucleic acid is the relative abundance of that nucleic acid, and the reference data set or reference value is the level of each nucleic acid of the biomarker combination in adenomyosis patients and non-adenomyosis controls.
12. The use according to claim 11, wherein the reference data set or reference value in step (2) is at least one of Table 5, Table 6 and Table 7, and comparing the level of each nucleic acid with the reference data set or reference value to obtain the detection result specifically comprises calculating the probability of disease using a multivariate statistical model; preferably, the multivariate statistical model is a random forest model.
13. A method for screening a candidate drug for treating adenomyosis, characterized by comprising the following steps:

1) detecting the biomarker combination according to any one of claims 1 to 5 in samples taken before and after administration, respectively, and analyzing the level of each nucleic acid in the biomarker combination;

2) evaluating the candidate drug by comparing the level of each nucleic acid in the samples before and after administration;

in step 2), comparing the level of each nucleic acid in the samples before and after administration specifically comprises calculating the probability of disease using a multivariate statistical model; preferably, the multivariate statistical model is a random forest model.
14. A method for detecting the microbiota of the female reproductive tract, characterized by comprising the following steps:

(1) collecting a microbial sample from the reproductive tract of the subject to be tested, detecting the biomarker combination according to any one of claims 1 to 5 in the collected sample, and analyzing the level of each nucleic acid in the biomarker combination;

(2) comparing the level of each nucleic acid measured in step (1) with a reference data set or reference value to obtain a detection result;

preferably, the level of each nucleic acid is the relative abundance of that nucleic acid, and the reference data set or reference value is the level of each nucleic acid of the biomarker combination in adenomyosis patients and non-adenomyosis controls.
15. The detection method according to claim 14, wherein the reference data set or reference value in step (2) is at least one of Table 5, Table 6 and Table 7, and comparing the level of each nucleic acid with the reference data set or reference value to obtain the detection result specifically comprises calculating the probability of disease using a multivariate statistical model; preferably, the multivariate statistical model is a random forest model.
16. The detection method according to claim 14 or 15, wherein collecting the microbial sample from the reproductive tract of the subject in step (1) comprises collecting a lower-third vaginal sample, a posterior vaginal fornix sample and a cervical canal sample from the subject.
17. A method for preparing an adenomyosis biomarker combination, characterized by comprising the following steps:

(1) collecting microbial samples from the reproductive tracts of adenomyosis patients and non-patients, respectively, and performing 16S rRNA gene sequencing on all collected samples;

(2) clustering the 16S sequencing results to obtain OTUs and the seed sequence of each OTU, and calculating the relative abundance of each OTU;

(3) fitting the relative abundance of each OTU against adenomyosis status using a random forest model and performing five runs of 10-fold cross-validation to obtain the optimal OTU combination, the seed sequences of the OTUs in the optimal OTU combination constituting the adenomyosis biomarker combination.
18. The method according to claim 17, wherein collecting the microbial samples from the reproductive tract in step (1) specifically comprises collecting a lower-third vaginal sample, a posterior vaginal fornix sample and a cervical canal sample from the subject.