Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a logging curve identification method based on a clustering algorithm.
To achieve this purpose, the invention adopts the following technical scheme:
a logging curve identification method based on a clustering algorithm comprises the following steps:
1) analyzing the logging curves by principal component analysis and ranking the resulting m principal component curves by contribution rate from high to low, where m is the number of logging attributes;
2) taking the first a principal component curves of the known wells as training data and the calibrated layer classification information as the data labels for machine learning, training with a KNN classification algorithm until the cluster centers obtained in two consecutive iterations no longer change, thereby forming a training data set; then adding the first a principal component curves of the well to be identified to the training data set as the instances to be identified, and training again to complete the classification.
Further, before step 1), outliers are removed from the logging curves and depth correction is applied to them.
Further, step 2) comprises the following operations:
201) identifying reservoirs and non-reservoirs;
202) performing multi-scale classification and identification of oil, water and gas layers within the reservoir;
wherein the multi-scale oil-water-gas layer classes include: dry layer, oil-poor layer, water layer, oil-gas layer, oil-containing water layer, gas-poor layer, gas-water layer, oil-water layer and gas layer.
Further, the first a principal component curves used in step 201) (i.e., a = 3) are the acoustic wave, natural potential and natural gamma curves.
Further, the specific process of step 201) is as follows:
calculating Euclidean distances from data points of the first a principal component curves of the well to be identified to reservoir and non-reservoir logging data points in the training data set;
selecting the k data points closest to the logging data point to be identified;
counting the numbers of reservoir and non-reservoir points among the k data points;
and taking the category that occurs most frequently as the category of the logging data point to be identified.
Further, after the category of the logging data point to be identified has been determined, the method further comprises:
filtering the prediction results to remove isolated mispredicted points.
Further, the first a principal component curves used in step 202) (i.e., a = 7) are the acoustic wave, natural potential, neutron, density, natural gamma, deep lateral resistivity and shallow lateral resistivity curves.
Further, the specific process of step 202) is as follows:
adding the data points of the first a principal component curves of the reservoir intervals of the well to be identified to the training data set as instances to be identified, and calculating the Euclidean distances from each instance to the logging data points of known category;
selecting the k data points closest to the instance to be identified;
counting the number of occurrences of each category among the k data points;
and taking the category that occurs most frequently among the k data points as the category of the instance to be identified.
Further, after the category of the logging data point to be identified has been determined, the method further comprises:
filtering the prediction results to remove isolated mispredicted points.
Compared with the prior art, the invention has the following beneficial effects:
the logging curve identification method based on the clustering algorithm first applies principal component analysis to reduce the dimensionality of the logging curves, simplifying the information they reflect and replacing it with a few mutually independent, uncorrelated composite indexes that reflect the original multi-index information as fully as possible; the idea of machine learning is combined with K-means cluster analysis to construct a KNN network, and the calibrated layering information is used as data labels for the training data, which better guides the K-means clustering in finding cluster centers and yields a realistic and effective clustering result; in addition, the identification method does not require normalization of the curve data, which substantially reduces the running time of the program.
Furthermore, removing outliers and correcting depth eliminates the negative influence of abnormal values and ensures that every logging attribute has a value at the same depths, which facilitates extraction and analysis of horizon feature information;
furthermore, when oil, water and gas layers are identified, the data already identified as non-reservoir are removed and identification is carried out only within the reservoir, so curve features can be extracted better and higher identification accuracy is obtained; multi-scale identification is achieved, meeting the different requirements of actual production and overcoming the limitation of existing layering methods for logging curves, which can only identify layers at a single scale;
furthermore, using the acoustic wave (AC), natural potential (SP) and natural gamma (GR) curves as the first a principal component curves achieves reservoir and non-reservoir identification while increasing the calculation speed.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to Fig. 1, which is a schematic flow chart of the logging curve identification method based on a clustering algorithm, the method includes three steps:
S1, analyzing the logging curves by principal component analysis and determining m principal component curves ranked by contribution rate from high to low
Owing to limitations of the logging equipment and the geological structure, abnormal values such as 99999 or 0 are often recorded at the beginning and end of a logging run. These abnormal values degrade the automatic layering of the logging curves, so they must be removed before layering. In addition, several logging tools are usually run in one well, and they sample their logging attributes at different depth intervals, typically one of three: 0.075 m, 0.1 m and 0.125 m. Because the attribute features of each sampling point are best extracted at a common depth, the selected logging curves must be depth-corrected. Depth correction ensures that every logging attribute has a value at the same depths, which facilitates extraction and analysis of horizon feature information.
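The outlier removal and depth correction described above could be sketched as follows. This is a minimal illustration, not the invention's implementation: it assumes NumPy, assumes sentinel codes of 99999 and -99999 (zero is left to the caller because it can be a legitimate reading on some curves), and assumes linear interpolation onto a common depth grid, which the source does not specify. The function names are hypothetical.

```python
import numpy as np

# Hypothetical sentinel codes for invalid readings (an assumption; the text
# cites values such as 99999 and 0). Pass extra codes, e.g. 0.0, if needed.
DEFAULT_SENTINELS = (99999.0, -99999.0)


def remove_outliers(depth, values, sentinels=DEFAULT_SENTINELS):
    """Drop samples whose value is NaN or equal to a sentinel code."""
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    bad = np.isnan(values) | np.isin(values, sentinels)
    return depth[~bad], values[~bad]


def depth_correct(depth, values, grid):
    """Resample one curve onto a shared depth grid by linear interpolation.

    `depth` must be increasing; grid points outside the curve's depth range
    take the nearest end value (np.interp's default behaviour).
    """
    return np.interp(grid, depth, values)


# A curve sampled every 0.125 m with one invalid reading, resampled to 0.1 m.
d, v = remove_outliers([500.0, 500.125, 500.25, 500.375, 500.5],
                       [10.0, 99999.0, 30.0, 40.0, 50.0])
grid = 500.0 + 0.1 * np.arange(6)
print(depth_correct(d, v, grid))
```

Running every curve through `depth_correct` with the same `grid` yields one value per attribute at each depth, which is the condition the depth correction is meant to guarantee.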
With the development of logging equipment, increasingly rich logging data can be acquired. To reduce the dimensionality of high-dimensional logging data, the method seeks linear combinations of a few indexes in the data under the principle of minimum information loss, so that the composite indexes formed by these linear combinations retain as much of the information in the original indexes as possible; these composite indexes are called principal components.
For logging curve data, a sample comprises m logging attributes, each with n sampling points, forming an m × n matrix; such a large volume of data is difficult to process directly. Useful information that characterizes the target must therefore be extracted from the complex data, i.e. the key attributes must be found among the m logging attributes, which places high demands on an analyst's ability to observe and interpret. To solve this problem effectively, the high-dimensional logging data are reduced in dimension: the information reflected by the many original logging curves is simplified and replaced by a few independent, uncorrelated composite indexes, while these few indexes must still reflect the original multi-index information as fully as possible. Linear combinations of the original indexes achieve this.
Suppose $W_1, W_2, \ldots, W_m$ are the logging attributes corresponding to the logging curves, each comprising n sampling points, so that $W_j = (w_{j1}, \ldots, w_{jn})^T$. Letting $W = (W_1, W_2, \ldots, W_m)^T$ gives the following high-order matrix:
$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{pmatrix} \tag{1}$$
where $w_{ij}$ is the value of the i-th logging attribute at the j-th sampling point.
The principal components are expressed as:
$$Y_i = e_{i1}W_1 + e_{i2}W_2 + \cdots + e_{im}W_m = e_i^{T}W, \qquad i = 1, 2, \ldots, m \tag{2}$$
where $Y_i$ is the i-th principal component and $e_{ij}$ is the coefficient (loading) of the j-th logging attribute in the i-th principal component. Subject to $e_i^T e_i = 1$, the coefficients are determined by the following rules:
1. $Y_i$ and $Y_j$ ($i \neq j$; $i, j = 1, 2, \ldots, m$) are uncorrelated;
2. $Y_1$ has the largest variance among all linear combinations of $W_1, W_2, \ldots, W_m$; $Y_2$ has the largest variance among all linear combinations of $W_1, W_2, \ldots, W_m$ that are uncorrelated with $Y_1$; in general, $Y_i$ has the largest variance among all linear combinations of $W_1, W_2, \ldots, W_m$ that are uncorrelated with $Y_1, \ldots, Y_{i-1}$.
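First, the covariance matrix of the logging attributes is calculated:
$$\Sigma = (\sigma_{ij})_{m \times m}, \qquad \sigma_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}\left(w_{ik} - \bar{w}_i\right)\left(w_{jk} - \bar{w}_j\right) \tag{3}$$
where $\sigma_{ij}$ is the covariance between logging attribute i and logging attribute j, and $\bar{w}_i = \frac{1}{n}\sum_{k=1}^{n} w_{ik}$ and $\bar{w}_j$ are the means of the i-th and j-th logging attributes over the n sampling points. $\Sigma$ is an m-order symmetric non-negative definite matrix, so its eigenvalues are real and non-negative: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m \geq 0$. The coefficient vector of the i-th principal component, $e_i = (e_{i1}, e_{i2}, \ldots, e_{im})^T$, is the unit eigenvector corresponding to $\lambda_i$, and the variance of $Y_i$ equals $\lambda_i$. The contribution rate of the i-th principal component is:
$$\eta_i = \frac{\lambda_i}{\sum_{k=1}^{m}\lambda_k}$$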
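Because the logging attributes have different dimensions (units), the data differ in their degree of dispersion and the calculated variances differ accordingly. To eliminate the influence of the different dimensions of the logging curves, the data are usually standardized, which transforms formula (3) into the correlation matrix:
$$R = (R_{ij})_{m \times m}, \qquad R_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$
where R is the covariance matrix of the standardized variables and $R_{ij}$ is the correlation coefficient between the standardized logging attribute i and the standardized logging attribute j. Solving for the eigenvalues of R gives $\lambda_1^{*} \geq \lambda_2^{*} \geq \cdots \geq \lambda_m^{*} \geq 0$, with $e_i^{*} = (e_{i1}^{*}, e_{i2}^{*}, \ldots, e_{im}^{*})^T$ the unit eigenvector corresponding to $\lambda_i^{*}$. The contribution rate of the i-th principal component is:
$$\eta_i^{*} = \frac{\lambda_i^{*}}{\sum_{k=1}^{m}\lambda_k^{*}} = \frac{\lambda_i^{*}}{m}$$
where the last equality holds because every diagonal entry of R equals 1, so its trace is m.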
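The contribution-rate calculation can be checked numerically. The sketch below assumes NumPy and a data matrix laid out with one row per sampling point and one column per logging attribute (the transpose of the m × n matrix described above); it eigendecomposes either the covariance matrix Σ or the correlation matrix R and returns contribution rates in descending order. The function name is hypothetical, and because the source does not state how the per-attribute ranking in Example 1 is derived from the loadings, that step is not reproduced.

```python
import numpy as np


def pca_contributions(X, standardize=True):
    """Eigen-decompose the covariance (Sigma) or correlation (R) matrix.

    X has shape (n_samples, m_attributes). Returns eigenvalues in descending
    order, the matching unit eigenvectors as columns, and contribution rates.
    """
    X = np.asarray(X, dtype=float)
    # rowvar=False: columns are the variables (logging attributes)
    M = np.corrcoef(X, rowvar=False) if standardize else np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(M)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()          # lambda_i / sum_k lambda_k
    return eigvals, eigvecs, contrib


# Synthetic example: attribute 2 is almost a multiple of attribute 1.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), rng.normal(size=200)])
vals, vecs, contrib = pca_contributions(X)
print(np.round(contrib, 3), np.round(np.cumsum(contrib), 3))
```

With `standardize=True` the eigenvalues sum to m (here 3), matching the trace argument above; the two nearly collinear attributes collapse into one dominant component.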
The logging data involved in the invention include 60 different logging attributes, such as acoustic (AC), density (DEN) and neutron (CNL). Based on the principal component analysis, seven logging curves, namely acoustic (AC), natural potential (SP), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), are finally selected as the principal component curves for automatic layering.
S2, reservoir and non-reservoir classification identification
Because non-reservoir intervals occupy most of the logged depth range, they make the identification and calibration of horizons within the reservoir very difficult. Therefore, to identify horizons accurately within the reservoir, the invention first separates reservoir from non-reservoir.
Since dividing reservoir from non-reservoir is a simple two-class problem, the invention uses the three curves with the highest contribution rates calculated by principal component analysis as the basis for this division, namely acoustic (AC), natural potential (SP) and natural gamma (GR).
Classification problems are commonly handled with cluster analysis, which is widely used to classify sample data; common methods include K-means clustering, hierarchical clustering and fuzzy clustering. The core idea of cluster analysis is to group the sample data so that samples in the same class are as similar as possible and samples in different classes are as different as possible. Cluster analysis usually measures the similarity between samples with a specific criterion, generally the distance between samples in space. In the present invention, with m logging attributes, the similarity between two samples in m-dimensional space can be measured by a distance formula; K-means clustering commonly uses the Euclidean distance, which between the i-th sample and the j-th sample is:
$$d_{ij} = \sqrt{\sum_{k=1}^{m}\left(x_{ik} - x_{jk}\right)^{2}}$$
where $x_{ik}$ is the value of the k-th logging attribute of the i-th sample.
Taking the characteristics of the logging data into account, the invention classifies the logging data with the K-means clustering algorithm, also called the fast clustering method, which is commonly used in cluster analysis. Given a database of n logging sample points and a number of classes K, K initial cluster centers are selected; the Euclidean distance from each sample point to each cluster center is calculated, and each sample point is assigned to the class of its nearest cluster center. After all sample points have been assigned, the new center of each class is obtained as the mean of its sample points. These steps are iterated until the cluster centers obtained in two consecutive iterations no longer change, at which point the iteration ends, the final cluster centers are obtained and the classification is complete. The method is sensitive to the choice of initial cluster centers: different initial centers can produce different clustering results, and if suitable initial centers are not selected, a realistic and effective clustering result may not be obtained.
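The K-means iteration described above (assign each point to its nearest center, recompute centers as class means, stop when two consecutive sets of centers coincide) can be sketched in NumPy as follows. This is a generic illustration, not the invention's implementation; the function name, the `max_iter` cap and the rule of keeping an empty cluster's previous center are assumptions.

```python
import numpy as np


def kmeans(X, init_centers, max_iter=100):
    """Plain K-means with Euclidean distance and user-supplied initial centres."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Distance from every sample to every centre, shape (n, K)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New centre = mean of its assigned samples (keep old centre if empty)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):   # centres unchanged: converged
            break
        centers = new_centers
    return labels, centers


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = kmeans(X, init_centers=[[0, 0], [10, 10]])
print(labels, centers)
```

The caller supplies `init_centers` explicitly, which makes the dependence on the initial centers, noted above as the method's main weakness, visible in the interface.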
To address these problems, the method combines the idea of supervised machine learning with K-means cluster analysis to construct a KNN network. In the invention, the logging data contain manually calibrated, accurate layering information that can serve as data labels for the training data; training the network on these labeled data better guides the K-means clustering in finding cluster centers, so that the layering task for a designated exploration well is completed.
For each logging data point to be identified, the following algorithm steps are performed:
(1) calculating the distances from the logging data point to be identified to all reservoir and non-reservoir logging data points;
(2) sorting the calculated distances in ascending order;
(3) selecting the first k data points, i.e. those closest to the logging data point to be identified;
(4) counting the number of occurrences of the two categories, reservoir and non-reservoir, among the k data points;
(5) taking the category that occurs most frequently among the k points as the category of the logging data point to be identified;
(6) filtering the prediction results to remove isolated mispredicted points.
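Steps (1) to (6) can be sketched as a brute-force k-nearest-neighbour classifier followed by a smoothing filter. This sketch assumes NumPy, labels 0 (non-reservoir) and 1 (reservoir), and a sliding-window majority vote as the filter in step (6); the source does not specify the filter, the window size or the value of k, so these choices and the function names are hypothetical.

```python
import numpy as np
from collections import Counter


def knn_predict(train_X, train_y, query_X, k=5):
    """Steps (1)-(5): majority label among the k nearest labelled points."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for q in np.asarray(query_X, dtype=float):
        d = np.sqrt(((train_X - q) ** 2).sum(axis=1))   # (1) Euclidean distances
        nearest = np.argsort(d, kind="stable")[:k]      # (2)-(3) k nearest
        votes = Counter(train_y[nearest].tolist())      # (4) count categories
        preds.append(votes.most_common(1)[0][0])        # (5) most frequent
    return np.array(preds)


def majority_filter(labels, window=5):
    """Step (6), hypothetical: replace each label by the majority in a centred window."""
    labels = list(labels)
    half = window // 2
    out = []
    for i in range(len(labels)):
        seg = labels[max(0, i - half): i + half + 1]
        out.append(Counter(seg).most_common(1)[0][0])
    return out


train_X = [[0], [1], [2], [10], [11], [12]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [[0.5], [11.5], [5]], k=3))
print(majority_filter([0, 0, 1, 0, 0, 1, 1, 1, 1], window=3))
```

Because `Counter.most_common` keeps first-seen order for equal counts and the neighbours are counted in distance order, a tied vote goes to the class of the nearer neighbour. The filter removes the isolated `1` at the third position while leaving the longer reservoir run intact.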
After the steps are executed, the input logging curve data can be divided into two types of reservoir and non-reservoir.
S3, carrying out multi-scale oil-water-gas layer identification in the reservoir
After the logging curves have been divided into reservoir and non-reservoir in the previous step, the logging curve data points in the non-reservoir depth intervals are removed and the reservoir intervals are divided at a finer granularity. This amplifies the differences in data characteristics among the oil, gas and water layers within the reservoir as far as possible and eliminates interference from the large number of non-reservoir data points, so that oil, gas and water layers can be distinguished better.
When distinguishing oil, gas and water layers, the feature information contained in only three logging curves is limited, which makes a good layering result difficult to obtain. To solve this problem, the invention uses the seven logging curves finally selected by principal component analysis in step S1, namely acoustic (AC), natural potential (SP), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), as the curve feature sources for distinguishing oil, gas and water layers.
The oil-water-gas layering categories include dry layer, poor oil layer, water layer, oil-gas layer, oil-containing water layer, gas-containing water layer, poor gas layer, gas-water layer, oil-water layer, gas layer, etc., 11 types in total.
For each logging data point to be identified within the reservoir, the following algorithm steps are performed:
(1) calculating the distances from the logging data point to be identified to all logging data points of known category;
(2) sorting the calculated distances in ascending order;
(3) selecting the first k data points, i.e. those closest to the logging data point to be identified;
(4) counting the number of occurrences of each category among the k data points;
(5) taking the category that occurs most frequently among the k points as the category of the logging data point to be identified;
(6) filtering the prediction results to remove isolated mispredicted points.
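The two-stage procedure (S2 followed by S3) can be sketched as below: points predicted as non-reservoir from the three-curve features are set aside, and only the remaining points are classified into fluid classes using the seven-curve features and a reservoir-only training set. The sketch assumes NumPy, reservoir label 1 and hypothetical function names; the step-(6) filtering is omitted for brevity.

```python
import numpy as np
from collections import Counter


def knn_vote(train_X, train_y, q, k):
    """Majority label among the k training points nearest to q (Euclidean)."""
    dist = np.linalg.norm(train_X - q, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    return Counter(train_y[nearest].tolist()).most_common(1)[0][0]


def two_stage_identify(q3, q7, train3, y_res, train7, y_fluid, k=5):
    """q3/q7: query points described by the 3-curve and 7-curve feature sets.

    train3/y_res: labelled reservoir (1) / non-reservoir (0) points.
    train7/y_fluid: reservoir-only points labelled with fluid classes.
    """
    train3, train7 = np.asarray(train3, float), np.asarray(train7, float)
    y_res, y_fluid = np.asarray(y_res), np.asarray(y_fluid)
    result = []
    for a, b in zip(np.asarray(q3, float), np.asarray(q7, float)):
        if knn_vote(train3, y_res, a, k) != 1:
            result.append("non-reservoir")           # excluded from stage 2
        else:
            result.append(knn_vote(train7, y_fluid, b, k))
    return result


# Toy 1-D features stand in for the 3- and 7-curve feature vectors.
print(two_stage_identify(q3=[[0.2], [10.4], [10.9]], q7=[[3.0], [0.2], [5.8]],
                         train3=[[0], [1], [10], [11]], y_res=[0, 0, 1, 1],
                         train7=[[0], [1], [5], [6]],
                         y_fluid=["water layer", "water layer", "oil layer", "oil layer"],
                         k=1))
```

Note that the first query point is never compared with the fluid-class training data, which is the mechanism by which removing non-reservoir data reduces interference in stage 2.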
After the steps are executed, the input reservoir logging curves can be classified in a fine granularity mode, and oil-water-gas layer identification is completed.
The main innovations of the invention compared with existing methods are as follows: existing layering methods for logging curves can only identify layers at a single scale and have difficulty achieving multi-scale identification, whereas the invention achieves multi-scale identification and meets the different requirements of actual production; existing logging curve layering methods usually normalize the data first, whereas the present method, in order to better preserve curve features, does not normalize the curve data, which considerably reduces the running time of the program; and when oil, water and gas layers are identified, the data identified as non-reservoir are removed first and identification is then carried out within the reservoir, so curve features are extracted better and higher identification accuracy is obtained.
Example 1
The invention uses logging data from 10 vertical wells in a certain block as experimental data; the well locations are shown in Fig. 2. The reservoir and non-reservoir logging data of 9 wells are used as training data, and the reservoir/non-reservoir horizon distribution of the remaining well, Y189, is identified in the related experiments.
S1, analyzing the logging curve by adopting a principal component analysis method, and determining m principal component curves according to the contribution rate from high to low
The logging attributes are arranged into the high-order matrix of expression (1), and the calculations of expressions (2) to (9) give the following ranking of the contribution of each logging attribute to logging curve identification: natural potential (SP) > acoustic (AC) > neutron (CNL) > density (DEN) > natural gamma (GR) > deep lateral resistivity (LLD) > shallow lateral resistivity (LLS) > other logging attributes. Therefore, natural potential (SP), acoustic (AC) and neutron (CNL) are selected as the basis for dividing reservoir and non-reservoir; for oil-water-gas layer division within the reservoir, the seven curves natural potential (SP), acoustic (AC), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS) are selected.
S2, reservoir and non-reservoir classification identification
Before the experiment, the acoustic (AC), natural potential (SP) and natural gamma (GR) curves of 9 exploration wells, Y220, Y219, Y205, Y194, Y192, Y181, Y148, Y146 and Y45, are selected. The depth ranges of the curves differ between wells but are mostly concentrated within 500 m to 1300 m, with a sampling interval of 0.1 m. The prediction range for well Y189 is 500 m to 1100 m, a total length of 600 m, giving 6000 sampling points to be predicted.
Fig. 4 shows the reservoir and non-reservoir identification results for well Y189 obtained with the KNN cluster analysis method; owing to space limitations, only part of the depth range is shown. In Fig. 4, the three logs on the left are the curves used by the method of the invention, whose response values vary with depth across different strata. The first track on the right shows the reservoir/non-reservoir prediction before filtering and the second track shows the prediction after filtering; color-filled depth intervals are reservoir and unfilled intervals are non-reservoir. The third track on the right shows the manually labeled result, in which marked depth intervals are reservoir and unmarked intervals are non-reservoir.
The horizon identification results in Table 1 show that the method identifies most horizons accurately, indicating that it distinguishes reservoirs from non-reservoirs well.
Table 1 Reservoir and non-reservoir layering results for well Y189
S3, carrying out multi-scale oil-water-gas layer identification in the reservoir
Building on the above identification result, the method divides the interior of the reservoir at a finer granularity. The reservoirs in the experimental area contain 11 layer types, including dry layer, poor oil layer, water layer, oil-gas layer, oil-containing water layer, gas-containing water layer, poor gas layer, gas-water layer, oil-water layer and gas layer. Before the experiment, the classes with few samples were expanded to balance the number of samples across layer types and make the layering result more accurate.
In the experiment, the seven logging curves of the 9 exploration wells Y220, Y219, Y205, Y194, Y192, Y181, Y148, Y146 and Y45 finally selected by principal component analysis, namely acoustic (AC), natural potential (SP), neutron (CNL), density (DEN), natural gamma (GR), deep lateral resistivity (LLD) and shallow lateral resistivity (LLS), are used as the training data input. Table 2 gives the results of the multi-scale layering method:
Table 2 Multi-scale identification results for well Y44
For the oil-water-gas layer identification of well Y44, Table 2 shows that although the identification accuracy of the multi-scale method designed in the invention is not high for oil-gas layers, it correctly identifies 19 of 24 layers, an identification accuracy of 79.2%. The experiments show that, in oil-water-gas layer identification, removing the non-reservoir data first and then identifying within the reservoir gives more accurate identification.
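The source does not say how the minority classes were expanded. One simple possibility, shown here purely as an assumption, is random oversampling: rows of each under-represented layer type are duplicated at random until every type has as many samples as the largest one. The function name and the fixed seed are hypothetical, and NumPy is assumed.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate random rows of minority classes up to the majority class size."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        if n < target:
            # Draw (with replacement) indices of rows belonging to this class
            idx = rng.choice(np.flatnonzero(y == cls), size=target - n, replace=True)
            X_parts.append(X[idx])
            y_parts.append(y[idx])
    return np.concatenate(X_parts), np.concatenate(y_parts)


X = np.arange(10).reshape(5, 2)
y = np.array(["oil layer", "oil layer", "oil layer", "water layer", "gas layer"])
Xb, yb = random_oversample(X, y)
print(np.unique(yb, return_counts=True))
```

For a KNN classifier, duplicated rows simply increase the chance that a rare layer type appears among the k nearest neighbours; they add no new curve information.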
The above content only illustrates the technical idea of the present invention and does not limit its protection scope; any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.