CN110674841B

CN110674841B - Logging curve identification method based on clustering algorithm

Info

Publication number: CN110674841B
Application number: CN201910780696.7A
Authority: CN
Inventors: 周军; 姬庆庆; 李国军; 胡家琦; 张娟; 朱登明; 刘昱晟; 王兆其
Original assignee: China National Petroleum Corp; Institute of Computing Technology of CAS; China Petroleum Logging Co Ltd
Current assignee: China National Petroleum Corp; Institute of Computing Technology of CAS; China Petroleum Logging Co Ltd
Priority date: 2019-08-22
Filing date: 2019-08-22
Publication date: 2022-03-29
Anticipated expiration: 2039-08-22
Also published as: CN110674841A

Abstract

The invention discloses a logging curve identification method based on a clustering algorithm, belonging to the field of logging curve identification. The method first uses principal component analysis to reduce the dimensionality of the logging curves, simplifying the information they reflect and replacing it with a few mutually independent, uncorrelated comprehensive indicators, so that a limited number of indicators reflects the original multi-indicator information as fully as possible. It then combines machine learning ideas with the K clustering analysis method to construct a KNN network, using the calibrated layering information as the data labels of the training data in machine learning, which better guides the K-means clustering problem toward the cluster centers and yields real and effective clustering results. The identification method of the present invention does not need to normalize the curve data, which greatly reduces the time complexity of program operation.

Description

Logging curve identification method based on clustering algorithm

Technical Field

The invention belongs to the field of logging curve identification, and particularly relates to a logging curve identification method based on a clustering algorithm.

Background

Logging technology originated with Schlumberger in France and mainly collects attributes that reflect the properties of the formation, such as radioactivity, acoustic properties and conductivity. Over the last hundred years, logging technology has developed from analog logging to digital logging, numerically controlled logging and imaging logging. It is widely applied in the exploration and development of oil and gas fields and has become an important technical means of helping geological exploration and oil production personnel to find and evaluate oil and gas reservoirs. The technology is also increasingly applied to the exploration of other mineral resources. With the continuous development of logging technology in recent years, the logging information obtained is more and more abundant. Well logging interpretation has gradually moved from qualitative and semi-quantitative manual interpretation to quantitative interpretation by computer, and the related interpretation models and interpretation efficiency have improved to a certain extent. In general, however, well logging interpretation methods still lag behind, with low accuracy and efficiency. As the scale of oil and gas exploration keeps expanding, well logging interpretation faces increasingly complex research objects, and the existing interpretation methods have difficulty meeting the continuously rising interpretation requirements.

At present, the layering of logging curves is mainly done by manual interpretation, which requires a large amount of manpower and material resources, and the layering result is easily influenced by the subjective judgment of the interpreter. It is therefore highly desirable to realize automatic layering by computer, thereby avoiding human error, reducing labor consumption and improving production efficiency. With the development of computer technology, automatic layering of well logging curves has made great progress. Current methods for automatic lithology identification mainly include the probability statistics method, the support vector machine method and the like. The probability statistics method identifies and interprets well logs by estimating the posterior probability from the prior and conditional probabilities. Liu Ziyun et al. first proposed judging lithology with a probability statistics method and obtained a certain effect. Other researchers proposed using a probabilistic neural network model (PNN model) for lithology recognition from well logging information, training and testing the model with well logging data, and showed that the PNN model achieves a certain effect in layered well logging interpretation. The probability statistics method is suitable for digital logging data in which the curves reflect the surrounding rock with good physical property characteristics, and it can achieve a certain effect when core information is scarce but logging information is plentiful; its drawbacks are that the prior probability is difficult to obtain and that it is strongly influenced by human factors.

The support vector machine (SVM for short) is a newer pattern recognition method developed on the basis of statistical learning theory, and it performs well on pattern recognition problems with small sample sizes, nonlinearity and high-dimensional data. Research and experimental verification show that the method is feasible and effective for automatic well logging layering and lithology identification. One study combined the method with key logging curves to layer single-well logging curves and then used the layering result to identify sedimentary facies on the single-well stratigraphic profile, obtaining certain experimental results; Zhang Yan [16], based on the logging characteristics of different lithologies and fluids, studied lithology and fluid identification with an SFLA-SVM method and identified and judged the lithology. However, the support vector machine method has difficulty classifying large-scale training samples, and SVM also has some difficulty handling multi-class problems. In the experiments of Zhangxihua, Shenkao et al., the method had difficulty with the high-precision well logging layering problem and showed larger errors.

An investigation of the current state of research at home and abroad shows that research on self-adaptive multi-scale layering calibration is still at the exploratory stage, and the following problems mainly exist: both the probability statistics method and the manual calibration interpretation method are easily influenced by human factors, and the support vector machine method has difficulty achieving good results on the high-precision layering problem even when the algorithm is improved. In addition, current logging horizon interpretation studies usually rely on only one or two methods, which often leads to unclear horizon differentiation and inaccurate interpretation. Moreover, many studies can only divide the logging information into levels and cannot directly identify the corresponding specific horizon name, so manual identification is still needed.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a logging curve identification method based on a clustering algorithm.

In order to achieve the purpose, the invention adopts the following technical scheme to realize the purpose:

a logging curve identification method based on a clustering algorithm comprises the following steps:

1) analyzing the logging curves by adopting a principal component analysis method, and determining m principal component curves according to the contribution rate from high to low; wherein m is the number of logging attributes;

2) using the first a principal component curves of the known wells, training with a KNN classification algorithm, with the calibrated classification information taken as the data labels of the training data in machine learning, until the clustering centers of two consecutive iterations are unchanged, to form a training data set; then adding the first a principal component curves of the well to be identified, as the instances to be identified, to the training data set, and training again to complete the classification.

Further, before the step 1), abnormal value elimination and depth correction processing are carried out on the logging curve.

Further, step 2) comprises the following operations:

201) identifying reservoirs and non-reservoirs;

202) carrying out oil-water-gas layer multi-scale classification identification in a reservoir;

wherein the multi-scale classification of the oil-water-gas layer comprises: the dry layer, the oil-poor layer, the water layer, the oil-gas layer, the oil-containing water layer, the gas-poor layer, the gas-water layer, the oil-water layer and the gas layer.

Further, the first a principal component curves in step 201) are respectively acoustic wave, natural potential and natural gamma.

Further, the specific process of step 201) is as follows:

calculating Euclidean distances from data points of the first a principal component curves of the well to be identified to reservoir and non-reservoir logging data points in the training data set;

selecting k data points closest to the logging data points to be identified;

counting the number of reservoirs and non-reservoirs in the k data points;

and taking the category with the highest occurrence frequency as the category of the logging data points to be identified.

Further, the following operations are included after the category of the logging data point to be identified is identified:

and carrying out filtering processing for filtering the prediction error point positions.

Further, the first a principal component curves in step 202) are respectively acoustic wave, natural potential, neutron, density, natural gamma, deep lateral resistivity and shallow lateral resistivity.

Further, the specific process of step 202) is as follows:

adding data points of the first a principal component curves of the reservoir stratum of the well to be identified as an example to be identified into a training data set, and calculating the Euclidean distance from the data points of the example to be identified to the logging data points of the known category;

selecting k data points closest to the example data points to be identified;

counting the number of the occurrences of each prediction category in the k data points;

and the class with the highest class occurrence frequency in the k data points is used as the class of the example data point to be identified.

Compared with the prior art, the invention has the following beneficial effects:

the logging curve identification method based on the clustering algorithm comprises the steps of firstly utilizing a principal component analysis method to carry out dimensionality reduction on a logging curve so as to simplify information reflected by the logging curve, replacing the information with a few mutually independent and unrelated comprehensive indexes, and fully reflecting original multi-index information to the greatest extent by the limited indexes; the machine learning idea is combined with a K clustering analysis method to construct a KNN network, and calibrated hierarchical information is used as a data label of training data in machine learning, so that the K mean clustering problem can be better guided to find a clustering center, and a real and effective clustering result is obtained; the identification method of the invention does not need to carry out normalization operation on curve data, thereby reducing the time complexity of program operation to a great extent.

Furthermore, abnormal value elimination and depth correction processing are carried out on the logging curve, on one hand, negative influence of the abnormal value is eliminated, on the other hand, corresponding attribute values of different logging attributes are ensured to exist under the same depth, and extraction and analysis of the layer position characteristic information are facilitated by the method;

furthermore, when the oil-water-gas layer is identified, the identified non-reservoir data is removed, and then identification is carried out in the reservoir, so that curve characteristics can be better extracted, and higher identification accuracy is obtained; the multi-scale recognition can be realized, different requirements of actual production are met, and the problems that the existing layered recognition method can only realize layered recognition of a single scale and is difficult to realize multi-scale recognition aiming at the layered recognition of the logging curve are solved;

furthermore, the acoustic wave AC, the natural potential SP and the natural gamma GR are used as the first a principal component curves, so that the reservoir and non-reservoir identification can be realized, and the calculation speed is increased.

Drawings

FIG. 1 is a schematic flow chart of a logging curve identification method based on a clustering algorithm according to the present invention;

FIG. 2 is a schematic view of the recognition of the oil-water layer according to the present invention;

FIG. 3 is a schematic diagram showing the well location distribution in the experimental region in example 1;

FIG. 4 is a graph of the reservoir and non-reservoir identification results for well Y189 in example 1.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The invention is described in further detail below with reference to the accompanying drawings:

Referring to FIG. 1, which is a schematic flow chart of the logging curve identification method based on a clustering algorithm, the method includes three steps:

S1, analyzing the logging curves by principal component analysis, and determining m principal component curves in descending order of contribution rate

Owing to the limitations of the logging equipment and the geological structure, abnormal measurement values such as 99999, 0, etc. often appear in the initial and final stages of logging. These abnormal values have a negative influence on automatic layering identification of the logging curve, so the abnormal data must be removed before layering. In addition, several logging devices are often used during logging, and their depth sampling intervals differ when the logging attributes are acquired; three sampling intervals are common: 0.075 m, 0.1 m and 0.125 m. Because the multiple attribute features of each sampling point are best extracted at the same depth, the selected logging curves need to be depth-corrected. Correcting the logging curves ensures that the different logging attributes have corresponding attribute values at the same depth, which facilitates the extraction and analysis of horizon feature information.
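The preprocessing described above can be sketched as follows. This is only an illustrative sketch: the set of invalid values, the function names `clean_curve` and `align_curves`, and the use of linear interpolation onto a shared 0.1 m grid as the "depth correction" are assumptions, since the text does not specify how the correction is carried out.

```python
import numpy as np

# Assumed invalid readings; the text only gives 99999 and 0 as examples.
INVALID_VALUES = (99999.0, 0.0)

def clean_curve(depth, values, invalid=INVALID_VALUES):
    """Drop samples whose reading equals one of the assumed invalid values."""
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = ~np.isin(values, invalid)
    return depth[keep], values[keep]

def align_curves(curves, step=0.1):
    """Resample every cleaned curve onto one shared depth grid.

    curves: dict mapping curve name -> (depth array, value array),
            with depths in ascending order.
    Returns the common depth grid and a dict name -> resampled values.
    Linear interpolation is an assumption made for this sketch.
    """
    cleaned = {name: clean_curve(d, v) for name, (d, v) in curves.items()}
    top = max(d[0] for d, _ in cleaned.values())      # deepest first sample
    bottom = min(d[-1] for d, _ in cleaned.values())  # shallowest last sample
    n = int(np.floor((bottom - top) / step + 1e-9)) + 1
    grid = top + step * np.arange(n)
    resampled = {name: np.interp(grid, d, v) for name, (d, v) in cleaned.items()}
    return grid, resampled
```

After this step every curve has one value per grid depth, so the attributes of a sampling point can be stacked into a single feature vector.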

With the development of logging equipment, more and more abundant logging data can be acquired. To reduce the dimensionality of the high-dimensional logging data, the method studies linear combinations of a few indexes in the data on the principle of losing as little data information as possible, so that the comprehensive indexes formed by these linear combinations retain as much of the information in the original indexes as possible. These comprehensive indexes are called principal components.

For the logging curve data, a sample comprises m logging attributes and each variable has n sampling points, forming a matrix of order m×n; such a large amount of data is difficult for a computer to process. Useful information that characterizes the object therefore has to be extracted from the complicated data, that is, the key attributes have to be found among the m logging attributes, which places high demands on human analysis and observation. To solve this problem effectively, the high-dimensional logging data are reduced in dimensionality: the information reflected by the many original logging curves is simplified and replaced by a few mutually independent, uncorrelated comprehensive indexes, while these limited indexes should still reflect the original multi-index information as fully as possible. This effect can be obtained by selecting linear combinations of the original indexes.

Suppose $W_1, W_2, \ldots, W_m$ are the logging attributes corresponding to all logging curves, where each logging attribute contains n sampling points: $W_j = (w_{1j}, \ldots, w_{nj})^T$. Let $W = (W_1, W_2, \ldots, W_m)^T$; then the following high-order matrix is obtained:

$$W = \begin{pmatrix} w_{11} & w_{21} & \cdots & w_{n1} \\ w_{12} & w_{22} & \cdots & w_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1m} & w_{2m} & \cdots & w_{nm} \end{pmatrix} \tag{1}$$

where $w_{ij}$ is the value of the jth logging attribute at the ith sampling point; i indexes the sampling point and j indexes the logging attribute.

The principal components are expressed as:

$$Y_i = e_{i1} W_1 + e_{i2} W_2 + \cdots + e_{im} W_m, \quad i = 1, 2, \ldots, m \tag{2}$$

where $Y_i$ is the ith principal component and $e_{ij}$ is the coefficient of the jth logging attribute in the ith principal component. The covariance matrix $\Sigma$ of $W$ is a non-negative definite matrix of order m, and the coefficients are determined by the following rules:

1. $Y_i$ and $Y_j$ ($i \neq j$; $i, j = 1, 2, \ldots, m$) are uncorrelated.

2. $Y_1$ has the largest variance among all linear combinations of $W_1, W_2, \ldots, W_m$ with a unit-length coefficient vector; $Y_2$ has the largest variance among all such linear combinations that are uncorrelated with $Y_1$; in general, $Y_i$ has the largest variance among all such linear combinations that are uncorrelated with $Y_j$ ($j < i$).

First, the covariance matrix is calculated:

$$\Sigma = (\sigma_{ij})_{m \times m}, \qquad \sigma_{ij} = E\left[(W_i - \bar{W}_i)(W_j - \bar{W}_j)\right] \tag{3}$$

where $\sigma_{ij}$ is the covariance between logging attribute i and logging attribute j, and $\bar{W}_i$ and $\bar{W}_j$ are the means of logging attributes i and j.

Solving gives the eigenvalues of $\Sigma$: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m \geq 0$, where $e_i = (e_{i1}, e_{i2}, \ldots, e_{im})^T$ is the eigenvector corresponding to the eigenvalue $\lambda_i$. The contribution rate of the ith principal component is:

$$\frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}$$

Because the well log data have different dimensions (units), the data are dispersed to different degrees and the calculated variances differ. To eliminate the influence of the different dimensions of the logging curves, the data are usually standardized, and formula (3) is transformed to obtain:

$$R = (r_{ij})_{m \times m}, \qquad r_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$

where $R$ is the covariance matrix of the standardized variables and $r_{ij}$ is the covariance between the standardized logging attribute i and the standardized logging attribute j. Solving gives the eigenvalues of $R$: $\lambda_1^* \geq \lambda_2^* \geq \cdots \geq \lambda_m^* \geq 0$, where $e_i^* = (e_{i1}^*, e_{i2}^*, \ldots, e_{im}^*)^T$ is the eigenvector corresponding to the eigenvalue $\lambda_i^*$. The contribution rate of the ith principal component is:

$$\frac{\lambda_i^*}{\sum_{k=1}^{m} \lambda_k^*}$$

The well log data to which the present invention relates include 60 different attribute dimensions, such as acoustic wave (AC), density (DEN) and neutron (CNL). Through the above principal component analysis, seven logging curves, namely acoustic wave (AC), natural potential (SP), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), are finally selected as the principal component curves for automatic layering.
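The principal component computation of step S1 can be illustrated with a short NumPy sketch. It follows the standardized form (eigen-decomposition of the correlation matrix) and returns the contribution rate of each component; the function name `pca_contribution` and the samples-by-attributes input layout are assumptions made for this example, not part of the patent text.

```python
import numpy as np

def pca_contribution(X):
    """Principal component analysis on standardized logging data.

    X: array of shape (n_samples, m_attributes); no attribute may be constant.
    Returns eigenvalues (descending), eigenvectors (as columns, same order),
    contribution rates, and the principal component scores.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each attribute to remove the effect of different units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the attributes (m x m).
    R = np.corrcoef(X, rowvar=False)
    # eigh is used because R is symmetric; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Contribution rate: each eigenvalue divided by the sum of all eigenvalues.
    contribution = eigvals / eigvals.sum()
    scores = Z @ eigvecs
    return eigvals, eigvecs, contribution, scores
```

Sorting by `contribution` gives the descending order of contribution rate from which the first a curves are taken.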

S2, reservoir and non-reservoir classification identification

Because the non-reservoir occupies most of the whole depth section in the logging problem, it poses a great challenge to the identification and calibration of each horizon within the reservoir. To identify the horizons within the reservoir accurately, the invention therefore first distinguishes the reservoir from the non-reservoir.

Because dividing reservoir from non-reservoir is a comparatively simple two-class problem, the invention uses the three curves with the highest contribution rate calculated by principal component analysis as the basis for this division, namely: acoustic wave (AC), natural potential (SP) and natural gamma (GR).

For classification problems, cluster analysis is a commonly used data processing method. Cluster analysis is often used to classify sample data, and many variants exist; the K-means clustering method, the hierarchical clustering method and the fuzzy clustering method are common. The core idea of cluster analysis is to classify the sample data so that samples in the same class are as similar as possible and samples in different classes are as different as possible. Cluster analysis usually measures the similarity of sample data with a specific criterion, generally the actual distance between samples in space. For the present invention, assuming that there are m logging attributes, the similarity between two samples in m-dimensional space can be measured with a distance formula. In the K-means clustering method, the Euclidean distance is usually used as the measure, and the Euclidean distance between the ith sample and the jth sample is:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (w_{ik} - w_{jk})^2}$$

where $w_{ik}$ is the value of the kth logging attribute of the ith sample.

The invention combines the specific characteristics of the logging data and adopts a K-means clustering algorithm to classify the logging data. The K-means clustering method, also called the fast clustering method, is a common method in cluster analysis. Given a classification number K for a database containing n logging sample points, K initial clustering centers are selected, the Euclidean distance from each sample point to each clustering center is calculated, and each sample point is assigned to the class of its closest clustering center. After all sample points have been assigned, a new clustering center is obtained for each class by averaging its sample points. The procedure is iterated, and when the clustering centers obtained in two consecutive iterations no longer change, the iteration is considered finished, giving the final clustering centers and completing the classification. The method places high demands on the initial clustering centers: different initial clustering centers can produce different clustering results, and if suitable initial clustering centers are not selected, a true and effective clustering result may not be obtained.
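The K-means iteration described above can be written compactly. The sketch below is a plain textbook version under assumed names (`kmeans`, `max_iter`); the initial centers are passed in explicitly, which makes the dependence of the result on the initial centers easy to try out.

```python
import numpy as np

def kmeans(X, centers, max_iter=100):
    """Basic K-means with Euclidean distance.

    X: (n_samples, m_attributes); centers: (K, m_attributes) initial centers.
    Stops when two consecutive iterations give the same centers.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Distance from every sample to every center: shape (n_samples, K).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)          # assign to the closest center
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):                  # keep the old center if a class is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers): # centers unchanged: finished
            break
        centers = new_centers
    return centers, labels
```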

To address these problems, the method combines the idea of supervised machine learning with the K clustering analysis method to construct a KNN network. The logging data contain accurate, manually calibrated layering information that can serve as the data labels of the training data in machine learning; training the network on these data better guides the K-means clustering problem toward the clustering centers, thereby completing the layering task for the designated exploratory well.

For the well logging data points to be identified, the following algorithm steps are specifically required:

(1) calculating the distances from the logging data points to be identified to all reservoir and non-reservoir logging data points;

(2) sorting according to the calculated distance in ascending order;

(3) selecting the first k data points which are closest to the logging data points to be identified;

(4) counting the number of occurrences of two types of reservoirs and non-reservoirs in the k data points;

(5) the category with the highest category occurrence frequency in the k points is used as the category of the logging data point to be identified;

(6) filtering the prediction result to remove isolated mispredicted points.

After the steps are executed, the input logging curve data can be divided into two types of reservoir and non-reservoir.
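Steps (1) to (6) can be sketched as follows. The nearest-neighbour vote follows the steps as listed; the filter in step (6) is not specified in the text, so the sliding-window majority vote used here (`majority_filter`, with a hypothetical `window` parameter) is only one plausible choice, and all function names are assumptions for illustration.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=5):
    """Label each query point by majority vote among its k nearest training points."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    query_X = np.asarray(query_X, dtype=float)
    preds = []
    for q in query_X:
        dist = np.linalg.norm(train_X - q, axis=1)          # step (1): distances
        nearest = np.argsort(dist, kind="stable")[:k]       # steps (2)-(3): k closest
        classes, counts = np.unique(train_y[nearest], return_counts=True)  # step (4)
        preds.append(classes[counts.argmax()])              # step (5): most frequent
    return np.array(preds)

def majority_filter(labels, window=5):
    """Step (6), assumed form: replace each label by the most frequent label
    in a depth window centered on it, which removes isolated outliers."""
    labels = np.asarray(labels)
    half = window // 2
    out = labels.copy()
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        classes, counts = np.unique(segment, return_counts=True)
        out[i] = classes[counts.argmax()]
    return out
```

Because the predictions are ordered by depth, a single non-reservoir point surrounded by reservoir points is relabelled by the window vote, while longer runs are left unchanged.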

S3, carrying out multi-scale oil-water-gas layer identification in the reservoir

After the logging curve has been divided into reservoir and non-reservoir in the previous step, the logging data points of the depth sections corresponding to the non-reservoir are removed and the reservoir sections are divided at a finer granularity. This amplifies the differences in data characteristics between the oil, water and gas layers in the reservoir to the greatest extent and eliminates the interference of the non-reservoir data, which account for a large number of data points, so that the oil, gas and water layers can be better distinguished.

In distinguishing oil, gas and water layers, the feature information contained in three logging curves is limited, and a good layering effect is difficult to obtain. To solve this problem, the invention uses the seven logging curves finally selected by principal component analysis in step S1, namely acoustic wave (AC), natural potential (SP), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), as the curve feature sources for distinguishing oil, gas and water layers.

The oil-water-gas layer comprises 11 layering categories in total, including: dry layer, poor oil layer, water layer, oil-gas layer, oil-containing water layer, gas-containing water layer, poor gas layer, gas-water layer, oil-water layer, gas layer, etc.

For the well logging data points to be identified in the reservoir, the following algorithm steps are specifically required to be executed:

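(1) calculating the distances from the logging data points to be identified to all the known logging data points;

(2) sorting according to the calculated distance in ascending order;

(3) selecting the first k data points which are closest to the logging data points to be identified;

(4) counting the number of the occurrences of each prediction category in the k data points;

(5) the category with the highest occurrence frequency in the k points is used as the category of the logging data point to be identified.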

After the steps are executed, the input reservoir logging curves can be classified in a fine granularity mode, and oil-water-gas layer identification is completed.
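The two-stage flow of S2 and S3 can be put together in one sketch: a coarse reservoir/non-reservoir vote on a subset of the curves, followed by a fine vote on all curves restricted to the reservoir points. The helper names (`knn`, `two_stage_identify`), the `coarse_cols` parameter and the placeholder label "non-reservoir" in the output are assumptions for illustration.

```python
import numpy as np

def knn(train_X, train_y, query_X, k):
    """Plain k-nearest-neighbour majority vote with Euclidean distance."""
    train_X = np.asarray(train_X, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    train_y = np.asarray(train_y)
    out = []
    for q in query_X:
        nearest = np.argsort(np.linalg.norm(train_X - q, axis=1), kind="stable")[:k]
        classes, counts = np.unique(train_y[nearest], return_counts=True)
        out.append(classes[counts.argmax()])
    return np.array(out)

def two_stage_identify(train_X, train_is_reservoir, train_fluid, query_X,
                       coarse_cols, k=3):
    """Stage 1 separates reservoir from non-reservoir on `coarse_cols`;
    stage 2 classifies only the reservoir points using all columns."""
    train_X = np.asarray(train_X, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    train_is_reservoir = np.asarray(train_is_reservoir, dtype=bool)
    train_fluid = np.asarray(train_fluid)

    # Stage 1: coarse two-class identification.
    is_res = knn(train_X[:, coarse_cols], train_is_reservoir,
                 query_X[:, coarse_cols], k).astype(bool)

    # Stage 2: non-reservoir training and query points are removed first.
    result = np.full(len(query_X), "non-reservoir", dtype=object)
    if is_res.any():
        result[is_res] = knn(train_X[train_is_reservoir],
                             train_fluid[train_is_reservoir],
                             query_X[is_res], k)
    return result
```

In the setting of the text, `coarse_cols` would select the three curves used in S2 and the full feature matrix would hold the seven curves used in S3.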

The main differences between the invention and existing methods are as follows. Existing layered identification methods for logging curves can only realize layered identification at a single scale and have difficulty realizing multi-scale identification, whereas the invention realizes multi-scale identification and meets the different requirements of actual production. Existing logging curve layered identification methods usually need to normalize the data first, whereas, in order to better discover curve characteristics, the method of the invention does not normalize the curve data, which greatly reduces the time complexity of program operation. When the oil-water-gas layers are identified, the identified non-reservoir data are removed first and identification is then carried out within the reservoir, so that curve characteristics are better extracted and better identification accuracy is obtained.

Example 1

The invention selects the logging data of 10 vertical wells in a certain block as experimental data; the well position distribution is shown in FIG. 3. The reservoir and non-reservoir logging data of 9 wells are used as training data, and the distribution of the two kinds of horizons, reservoir and non-reservoir, is identified for the remaining well, Y189, in the related experiments.

The logging attributes are input into expression (1) to construct the high-order matrix, and the contribution ranking of the logging attributes in logging curve identification is finally obtained through the calculation of expressions (2) to (9): natural potential (SP) > acoustic wave (AC) > neutron (CNL) > density (DEN) > natural gamma (GR) > deep lateral resistivity (LLD) > shallow lateral resistivity (LLS) > other logging attributes. Therefore, three attributes, natural potential (SP), acoustic wave (AC) and neutron (CNL), are selected as the division basis in the division of the reservoir and the non-reservoir; when oil-water-gas layer division is carried out in the reservoir, seven curves, namely natural potential (SP), acoustic wave (AC), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), are selected as the division basis.

S2, reservoir and non-reservoir classification identification

Before the experiment, the three curves acoustic wave (AC), natural potential (SP) and natural gamma (GR) of the 9 exploratory wells Y220, Y219, Y205, Y194, Y192, Y181, Y148, Y146 and Y45 are selected. The depth ranges of the curves of different wells are not exactly the same but are mostly concentrated in the range 500 m to 1300 m, and the sampling interval is 0.1 m. The prediction range of the well to be predicted, Y189, is 500 m to 1100 m, a total length of 600 m, so 6000 sampling points need to be predicted in total.

The reservoir and non-reservoir identification results for well Y189 obtained with the KNN cluster analysis method are shown in FIG. 4; owing to space limitations, only the results for part of the depth section are shown. In FIG. 4, the three tracks on the left are the logging curves used in the method of the present invention, which produce different response values in different strata as the depth changes. The first track on the right is the reservoir and non-reservoir prediction result before filtering, and the second track on the right is the prediction result after filtering; the depth sections filled with color belong to the reservoir, and the unfilled depth sections belong to the non-reservoir. The third track on the right is the manually marked identification result, in which the marked depth sections are reservoir and the unmarked depth sections are non-reservoir.

The horizon identification results in Table 1 show that the method accurately identifies most horizons, which indicates that it distinguishes reservoirs from non-reservoirs well.

TABLE 1 Reservoir and non-reservoir stratification results for well Y189

On the basis of this identification result, the method can further divide the interior of the reservoir at a finer granularity. The reservoir in the experimental region has 11 types of layers, including a dry layer, a poor oil layer, a water layer, an oil-gas layer, an oil-containing water layer, a gas-containing water layer, a poor gas layer, a gas-water layer, an oil-water layer, a gas layer and the like. Before the experiment, the sample sets of classes with few samples are expanded so that the samples of each layer type are balanced and the layering result is more accurate.
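The expansion of under-represented classes can be sketched, for example, as random oversampling with replacement. This is a minimal sketch under that assumption; the text does not state which expansion technique is used, and the function name `oversample`, the seed and the sample values are hypothetical.

```python
import random
from collections import defaultdict

def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

# Illustrative one-feature samples: four water-layer points, one gas-layer point.
samples = [(1.0,), (1.1,), (1.2,), (1.3,), (5.0,)]
labels = ["water layer"] * 4 + ["gas layer"]
bal_samples, bal_labels = oversample(samples, labels)
```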

In the experiment, the seven logging curves finally selected by the principal component analysis method, namely acoustic (AC), natural potential (SP), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), are taken from the 9 exploratory wells Y220, Y219, Y205, Y194, Y192, Y181, Y148, Y146 and Y45 and used as training data input. Table 2 gives the results of the multi-scale layering method:

TABLE 2 Multi-scale identification results for well Y44

For oil-water layer identification in well Y44, Table 2 shows that although the multi-scale identification method designed by the invention has low accuracy on horizons of the oil-gas same layer, it correctly identifies 19 of the 24 layers, an identification accuracy of 79.2%. The experiments show that, in oil-water-gas layer identification, removing the non-reservoir data first and then continuing identification within the reservoir yields a more accurate result.
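The two-stage idea, first screening out non-reservoir points and then classifying fluid type only within the reservoir, can be sketched as follows. The names `two_stage_identify`, `is_reservoir` and `fluid_type`, and the threshold rules standing in for the two trained classifiers, are hypothetical and are used only to make the sketch runnable. The last line reproduces the reported accuracy figure.

```python
def two_stage_identify(points, is_reservoir, fluid_type):
    """Stage 1 screens out non-reservoir points; stage 2 labels the rest."""
    results = []
    for p in points:
        if is_reservoir(p):
            results.append(fluid_type(p))
        else:
            results.append("non-reservoir")
    return results

def accuracy(predicted, actual):
    """Fraction of positions where the predicted label matches the actual one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Stand-in rules (assumptions); real use would plug in the trained classifiers.
is_reservoir = lambda p: p[0] > 0.5
fluid_type = lambda p: "gas layer" if p[1] > 20 else "water layer"

points = [(0.2, 10.0), (0.8, 3.0), (0.9, 30.0)]
predicted = two_stage_identify(points, is_reservoir, fluid_type)

# Reported figure: 19 of 24 layers identified correctly.
reported_percent = round(19 / 24 * 100, 1)
```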

The above content merely illustrates the technical idea of the present invention and does not limit its protection scope; any modification made on the basis of this technical idea falls within the protection scope of the claims of the present invention.

Claims

1. A logging curve identification method based on a clustering algorithm, characterized by comprising the following steps:

2) training on the first a principal component curves of the known wells with a KNN classification algorithm, using the calibrated classification information as the data labels of the training data in machine learning, until the clustering centers are unchanged between two consecutive iterations, to form a training data set; adding the first a principal component curves of the well to be identified to the training data set as the instances to be identified, and training again to complete the classification;

step 2) comprises the following operations:

201) identifying reservoirs and non-reservoirs;

the specific process of step 201) is as follows:

selecting k data points closest to the logging data points to be identified;

counting the number of reservoirs and non-reservoirs in the k data points;

taking the category with the highest occurrence frequency as the category of the logging data points to be identified;

wherein the multi-scale classification of the oil-water-gas layer comprises: a dry layer, a poor oil layer, a water layer, an oil-gas layer, an oil-containing water layer, a gas-containing water layer, a poor gas layer, a gas-water layer, an oil-water layer and a gas layer;

the specific process of step 202) is:

selecting k data points closest to the example data points to be identified;

2. The method for identifying the logging curve based on the clustering algorithm as claimed in claim 1, wherein the step 1) is preceded by outlier rejection and depth correction processing of the logging curve.

3. The method for identifying logging curves based on clustering algorithm as claimed in claim 1, wherein the first a principal component curves in step 201) are respectively acoustic wave, natural potential and natural gamma.

4. The method of claim 3, wherein identifying the category of the log data point to be identified further comprises:

5. The method for identifying a logging curve based on a clustering algorithm as claimed in claim 1, wherein the first a principal component curves in step 202) are respectively acoustic, natural potential, neutron, density, natural gamma, deep lateral resistivity and shallow lateral resistivity.

6. The method of claim 5, wherein identifying the category of the log data point to be identified further comprises: