1. Introduction
Wavelet methods are widely used for various signal and image processing tasks: the time–frequency analysis of signals, the identification of singular points and data compression. Moreover, during reception, processing and transmission, signals are often contaminated with noise due to system errors, environmental interference and the internal noise of receivers. To solve this problem, various noise reduction methods have been proposed, such as linear and median filtering. Recently, wavelet decomposition has become a popular tool for solving noise reduction problems.

Compared with traditional linear and median filters, wavelet analysis methods offer advantages such as computational speed and adaptive spectral analysis capabilities: a flexible time–frequency window automatically narrows when considering high-frequency phenomena and expands when studying low-frequency oscillations. This makes wavelets an effective tool for the analysis of non-stationary signals with varying regularity in their domain.

Telecommunication channels and computer networks may contain additive Gaussian noise caused by thermal sources, solar radiation and other natural causes. Sensor noise, electronic component noise and poor lighting are also sources of Gaussian noise in digital images. Standard filtering methods may excessively blur small-scale image details. Wavelets offer advantages in this situation as well: the wavelet transform preserves the Gaussian structure of the noise, while the useful signal is represented economically by a small number of wavelet coefficients. Thus, in the space of wavelet coefficients, the noise can be effectively separated from the useful signal.
Many methods based on wavelet theory have been proposed [1]. Among them, threshold processing is the most widely used due to its simplicity and efficiency. Donoho and Johnstone [2] observed that only a small number of wavelet coefficients are needed to restore the signal, and they proposed the classical method of hard threshold processing. With the hard thresholding function, coefficients whose absolute values are lower than the threshold are removed, while large coefficients remain unchanged. This function is discontinuous, and additional artifacts, such as an analog of the Gibbs effect, may appear in the restored signal. To make the threshold function continuous, Donoho proposed the soft thresholding function, which reduces large coefficients by the threshold value. However, this introduces an additional bias that can lead to distortions in the reconstructed signal (for more information, see Ref. [3]). Many authors have proposed various versions of thresholding functions that improve the classical methods to some extent [4,5,6,7,8,9,10,11,12,13,14,15]. Such functions, like the soft thresholding function, are continuous and allow one to construct an unbiased estimate of the mean square risk using Stein’s method [16]. In this case, the threshold value is usually calculated by minimizing this estimate. Given the variety of threshold processing methods, Johnstone [3] proposed considering a fairly general class of thresholding functions that satisfy some natural requirements and allow the construction of an unbiased risk estimate. The study of the properties of the risk estimate is also an important practical task, since this estimate quantifies the error of the noise suppression method using only the observed data. However, while the properties of the theoretical risk are well studied, the properties of its estimates have received less attention so far. The purpose of this paper is to fill this gap and study the statistical properties of thresholding risk estimates. This paper proves that, under some additional requirements on the thresholding function, the unbiased estimate of the mean square risk is strongly consistent and asymptotically normal. These properties justify the use of the risk estimate as a quality criterion for thresholding methods.
2. Wavelet Decomposition
For a function $f \in L_2(\mathbb{R})$, the wavelet decomposition is a series of the form
$$
f = \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle \, \psi_{j,k},
$$
where $\psi_{j,k}(t) = 2^{j/2}\psi(2^{j}t - k)$ are shifts and dilations of the chosen wavelet function $\psi$. The index $j$ in this expansion is called the scale, and the index $k$ is called the shift. Under certain conditions, the sequence $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$ forms an orthonormal basis of $L_2(\mathbb{R})$. The function $\psi$ can be chosen so that it is sufficiently smooth and has other useful properties [1].
In what follows, we will consider functions $f$ with support on some finite interval $[a, b]$, uniformly Lipschitz regular with an exponent $\gamma > 0$. If a wavelet function $\psi$ is $M$ times continuously differentiable ($M \geqslant \gamma$), has $M$ vanishing moments, i.e.,
$$
\int_{-\infty}^{\infty} t^{m}\,\psi(t)\,dt = 0, \quad m = 0, \ldots, M - 1,
$$
and rapidly decays at infinity together with its derivatives, i.e., for all $0 \leqslant m \leqslant M$ and any $n \in \mathbb{N}$, there is a constant $C_n$ such that, for all $t \in \mathbb{R}$,
$$
|\psi^{(m)}(t)| \leqslant \frac{C_n}{1 + |t|^{n}},
$$
then
$$
|\langle f, \psi_{j,k} \rangle| \leqslant A \, 2^{-j(\gamma + 1/2)},
$$
where $A$ is some positive constant depending on $f$ (a wavelet function with the given properties always exists; for example, a Daubechies wavelet of the corresponding order). This inequality shows that, in the space of wavelet coefficients, Lipschitz regular functions have a sparse representation.
In practice, functions are defined by their values on some grid of points. Suppose that this grid is uniform and the number of points is $N = 2^{J}$ for some $J \in \mathbb{N}$. In the case where the number of observations differs from a power of two, some extension of the array to the nearest power of 2 is applied [17]. The discrete wavelet transform is the multiplication of the vector of function values by the orthogonal matrix $W$ defined by the wavelet function $\psi$. This results in a set of discrete wavelet coefficients $\mu_{j,k}$, for which, for a sufficiently large $N$, the relation $\mu_{j,k} \approx 2^{J/2} \langle f, \psi_{j,k} \rangle$ is valid [1]. Thus, for $0 \leqslant j \leqslant J - 1$, the following inequality holds:
$$
|\mu_{j,k}| \leqslant A \, 2^{J/2} \, 2^{-j(\gamma + 1/2)}. \qquad (1)
$$
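To make the matrix formulation concrete, here is a minimal sketch in Python (NumPy) of a single-level Haar transform matrix; the function name and the choice of the Haar wavelet are ours, for illustration only:

```python
import numpy as np

def haar_matrix(n: int) -> np.ndarray:
    """Orthogonal matrix of a single-level Haar transform (n even).

    Rows 0..n/2-1 compute pairwise averages (low-pass part),
    rows n/2..n-1 compute pairwise differences (high-pass part).
    """
    w = np.zeros((n, n))
    s = 1.0 / np.sqrt(2.0)
    for i in range(n // 2):
        w[i, 2 * i] = w[i, 2 * i + 1] = s             # average (low-pass) row
        w[n // 2 + i, 2 * i] = s                      # difference (high-pass) row
        w[n // 2 + i, 2 * i + 1] = -s
    return w

W = haar_matrix(8)
# Orthogonality: W @ W.T is the identity, so W.T inverts the transform.
assert np.allclose(W @ W.T, np.eye(8))

# A smooth (here: constant) signal concentrates in the low-pass half,
# illustrating the sparse representation of regular functions:
coeffs = W @ np.ones(8)
print(coeffs)  # the high-pass half is exactly zero
```

A full discrete wavelet transform iterates this construction over several scales, but the single-level matrix already shows the two key properties used below: orthogonality and the concentration of a regular signal in few coefficients.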
3. Thresholding of Wavelet Coefficients
Real observations are usually contaminated by noise. In many cases, it is assumed that the noise is additive, white and Gaussian. Thus, we consider the following model:
where
are values of the function
f, and the random variables
have the distribution
and are independent. Since the matrix
W is orthogonal, the discrete wavelet coefficients have the form
where
have distribution
and are independent. Classical methods of signal processing assume the additivity and Gaussianity of the noise. However, there are approaches (see, for example, [
18,
19]) that assume non-Gaussianity. The consideration of such a problem could be the subject of a separate study.
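The fact that an orthogonal transform keeps white Gaussian noise white and Gaussian with the same variance can be checked numerically; a small sketch, with an arbitrary orthogonal matrix standing in for W:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any orthogonal matrix (here obtained via QR of a random matrix)
# plays the role of the wavelet transform matrix W.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
assert np.allclose(Q @ Q.T, np.eye(4))

# Many independent white Gaussian noise vectors of dimension 4 ...
eps = rng.standard_normal((100_000, 4))
# ... transformed by the orthogonal matrix (rows become Q @ eps_i):
transformed = eps @ Q.T

# The empirical covariance of the transformed noise stays close to the
# identity: the noise remains white with the same unit variance.
emp_cov = np.cov(transformed, rowvar=False)
print(np.max(np.abs(emp_cov - np.eye(4))))  # small sampling error
```

This is exactly the reason the coefficients $\epsilon_{j,k}$ above inherit the distribution of the original noise: the covariance of $W z$ is $\sigma^2 W W^T = \sigma^2 I$.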
To suppress noise, a thresholding function is applied to the wavelet coefficients; its purpose is to remove sufficiently small coefficients that are considered noise. The most common are the hard thresholding function
$$
\rho_H(y, T) = y \, \mathbf{1}(|y| > T)
$$
and the soft thresholding function
$$
\rho_S(y, T) = \mathrm{sgn}(y) \left( |y| - T \right)_+
$$
with some threshold $T$. However, each of them has its own drawbacks. The function $\rho_H$ is discontinuous, which leads to a lack of stability and the appearance of additional artifacts, and the function $\rho_S$ leads to the appearance of an additional bias in the estimate of the function $f$. A number of studies have proposed alternative thresholding functions $\rho(y, T)$, which, in fact, are a compromise between hard and soft thresholding [4,5,6,7,8,9,10,11,12,13,14,15]. These functions are continuous, like the function $\rho_S$, but $\rho(y, T) - y \to 0$ as $|y| \to \infty$, i.e., for large absolute values of the coefficients, they resemble the hard thresholding function.
Due to the emergence of various types of thresholding functions, it makes sense to consider some general classes of such functions. Let the function have the following properties:
1. (oddness);
2. for (boundedness);
3. (continuity).
Then, the thresholding function is defined by the expression
This form is shared, for example, by the soft thresholding function, the hybrid thresholding function [5,15], the thresholding function based on the hyperbolic tangent [10], the sigmoid function [12] and some others.
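For illustration, the hard and soft rules, along with one smooth compromise of the kind described above, can be sketched as follows; the hyperbolic-tangent parametrization here is illustrative only and is not the exact function of Ref. [10]:

```python
import numpy as np

def hard_threshold(y, t):
    """rho_H: zero out coefficients with |y| <= t, keep the rest unchanged."""
    return np.where(np.abs(y) > t, y, 0.0)

def soft_threshold(y, t):
    """rho_S: shrink every surviving coefficient toward zero by t."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def tanh_threshold(y, t, k=10.0):
    """A smooth compromise (illustrative parametrization): zero for |y| <= t,
    continuous and odd, and approaching the identity (i.e., hard
    thresholding) as |y| grows, since tanh saturates at 1."""
    return y * np.tanh(k * np.maximum(np.abs(y) - t, 0.0) / t)

y = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
t = 1.0
print(hard_threshold(y, t))  # large coefficients kept unchanged, small zeroed
print(soft_threshold(y, t))  # surviving coefficients shrunk by t
print(tanh_threshold(y, t))  # smooth interpolation between the two
```

Note that `tanh_threshold` satisfies the properties listed above: it is odd, continuous, vanishes for $|y| \leqslant T$, and resembles hard thresholding for large $|y|$.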
Thus, the signal processing scheme is as follows. First, a discrete wavelet transform is performed using the orthogonal matrix $W$. Then, threshold processing of the resulting wavelet coefficients is performed, which suppresses most of the noise, since the useful signal has an economical representation in the space of wavelet coefficients, while the noise is spread uniformly over all coefficients. Finally, the inverse wavelet transform is performed using the matrix $W^T$ to obtain the processed signal ($W^{-1} = W^T$, since $W$ is orthogonal). Moreover, the task of the economical representation of data for storage or transmission over communication channels is solved along the way.
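The whole scheme (forward transform, thresholding of the coefficients, inverse transform) can be put together in a toy end-to-end sketch; it uses a single-level Haar matrix and soft-thresholds only the detail coefficients, which simplifies the full multilevel procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_matrix(n):
    """Orthogonal matrix of a single-level Haar transform (n even)."""
    w = np.zeros((n, n))
    s = 1.0 / np.sqrt(2.0)
    for i in range(n // 2):
        w[i, 2 * i] = w[i, 2 * i + 1] = s                       # averages
        w[n // 2 + i, 2 * i], w[n // 2 + i, 2 * i + 1] = s, -s  # details
    return w

n, sigma = 256, 0.1
f = np.sin(2 * np.pi * np.arange(n) / n)    # smooth "useful" signal
x = f + sigma * rng.standard_normal(n)      # noisy observations

W = haar_matrix(n)
y = W @ x                                   # noisy wavelet coefficients
t = sigma * np.sqrt(2.0 * np.log(n))        # universal threshold

# Soft-threshold only the detail (high-frequency) half; a full multilevel
# transform would apply thresholding at every detail level.
y_hat = y.copy()
d = y[n // 2:]
y_hat[n // 2:] = np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

f_hat = W.T @ y_hat                         # inverse transform (W is orthogonal)

mse_noisy = float(np.mean((x - f) ** 2))
mse_denoised = float(np.mean((f_hat - f) ** 2))
print(mse_noisy, mse_denoised)              # thresholding reduces the error
```

Because the smooth signal has nearly zero detail coefficients while the noise is spread over all of them, thresholding removes roughly the detail half of the noise energy at almost no cost to the signal.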
The error (or risk) of threshold processing is defined as follows:
$$
r_J(T) = \sum_{j,k} \mathbf{E} \left( \rho(Y_{j,k}, T) - \mu_{j,k} \right)^2. \qquad (2)
$$
One of the key issues is the choice of the threshold value $T$ (note that a “reasonable” threshold value should increase with $J$ [1]). One of the first to appear was the so-called universal threshold $T_U = \sigma \sqrt{2 \ln N}$. The following statement demonstrates that this threshold is, in a certain sense, the maximum among the “reasonable” ones [1].
Lemma 1. Let $z_1, \ldots, z_N$ be independent random variables with the distribution $N(0, \sigma^2)$. Then,
$$
\mathbf{P} \left( \max_{1 \leqslant i \leqslant N} |z_i| > \sigma \sqrt{2 \ln N} \right) \to 0
$$
as $N \to \infty$. Thus, the maximum noise amplitude with a high probability does not exceed the universal threshold. In Refs. [20,21], additional considerations are also given, showing that it is reasonable not to consider thresholds greater than the universal one.
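Lemma 1 is easy to probe by simulation: for white Gaussian noise, the fraction of samples in which max|ε_i| exceeds the universal threshold σ√(2 ln N) stays small. A Monte Carlo sketch (the sample sizes and trial counts are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def exceed_rate(n, sigma=1.0, trials=200):
    """Fraction of Monte Carlo trials in which max|eps_i| exceeds
    the universal threshold sigma * sqrt(2 ln n)."""
    t_univ = sigma * np.sqrt(2.0 * np.log(n))
    eps = sigma * rng.standard_normal((trials, n))
    return float(np.mean(np.max(np.abs(eps), axis=1) > t_univ))

rates = {n: exceed_rate(n) for n in (2 ** 8, 2 ** 11, 2 ** 14)}
print(rates)  # each rate is well below 1/2
```

The convergence in Lemma 1 is slow (the exceedance probability decays only logarithmically in $N$), which the simulated rates reflect: they are small but decrease gradually as $N$ grows.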
4. Statistical Estimate of the Mean Square Risk
Another approach to calculating the threshold value is based on minimizing a statistical estimate of the risk (2) constructed using Stein’s method [16]. If $\rho(y, T)$ is differentiable with respect to the variable $y$, then the following relation is valid:
$$
\mathbf{E} \left( \rho(Y_{j,k}, T) - \mu_{j,k} \right)^2 = \mathbf{E} \, F[Y_{j,k}, T],
$$
where
$$
F[y, T] = \sigma^2 + \left( \rho(y, T) - y \right)^2 + 2 \sigma^2 \left( \frac{\partial \rho(y, T)}{\partial y} - 1 \right).
$$
That is, Stein’s method consists of the fact that, in (2), the expression containing the unknown $\mu_{j,k}$ can be replaced by an expression containing only the observed data.
Thus, the following expression can be used as an unbiased risk estimate:
$$
\widehat{r}_J(T) = \sum_{j,k} F[Y_{j,k}, T]. \qquad (3)
$$
Let $\widehat{T}$ denote the threshold that minimizes the estimate (3):
$$
\widehat{T} = \arg \min_T \widehat{r}_J(T).
$$
This threshold mimics the theoretical “ideal” threshold that minimizes the theoretical risk (2):
$$
T_{\min} = \arg \min_T r_J(T).
$$
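For the classical soft thresholding function, the unbiased risk estimate has a well-known closed form (each coefficient contributes $y^2 - \sigma^2$ if $|y| \leqslant T$ and $\sigma^2 + T^2$ otherwise), and the minimizing threshold can be found by scanning the candidate values $\{0\} \cup \{|Y_{j,k}|\}$, between which the estimate is smooth in $T$. A sketch with illustrative names:

```python
import numpy as np

def sure_soft(y, t, sigma):
    """Unbiased estimate of the mean square risk of soft thresholding:
    a coefficient contributes (y**2 - sigma**2) if |y| <= t,
    and (sigma**2 + t**2) otherwise (Stein's formula for rho_S)."""
    small = np.abs(y) <= t
    return float(np.sum(np.where(small, y ** 2 - sigma ** 2,
                                 sigma ** 2 + t ** 2)))

def sure_threshold(y, sigma):
    """Threshold minimizing the estimate over candidates {0} U {|y_i|}."""
    candidates = np.concatenate(([0.0], np.abs(y)))
    risks = [sure_soft(y, t, sigma) for t in candidates]
    return float(candidates[int(np.argmin(risks))])

rng = np.random.default_rng(3)
mu = np.zeros(512)
mu[:16] = 5.0                       # sparse "useful" coefficients
y = mu + rng.standard_normal(512)   # noisy coefficients, sigma = 1

# At t = 0 the estimate equals n * sigma**2, the risk of no thresholding;
# the minimizing threshold gives a much smaller estimated risk.
print(sure_soft(y, 0.0, 1.0))
t_hat = sure_threshold(y, 1.0)
print(t_hat, sure_soft(y, t_hat, 1.0))
```

The minimizer lands between the bulk of the noise coefficients and the few large signal coefficients, which is exactly the behavior the theoretical analysis below quantifies.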
Remark 1. Since this paper studies the order of the expressions under consideration with respect to J, we will use the notations C, c and others for some positive constants that may depend on the model parameters but do not depend on J. In this case, in one chain of transformations, different constants may be designated by the same letters, and the transformations themselves are performed under the assumption that J is sufficiently large.
The following upper bound is valid for the minimal value of the theoretical risk (2).
Lemma 2. Let $f$ be supported on some finite interval and be uniformly Lipschitz regular with some exponent $\gamma > 0$. Then, the minimal theoretical risk satisfies the upper bound (5). Proof. Let the index
be such that
In this case, due to (1),
Let us split the expression for risk into two sums:
Let us estimate the first sum from above. Given the restrictions on
, we obtain
Hence,
Now, consider the second sum. Given the constraints on
and the fact that, for each term of the second sum
(and hence
, since
), we obtain
Taking into account (1), we have
Thus,
In this expression, the first term increases with the growing $T$, and the second one decreases (while, as noted, $T$ is not taken greater than the universal threshold). Repeating the reasoning of Ref. [22] and choosing the value of $T$ that equalizes the orders of these terms, we obtain the upper bound (5) (in this case, the chosen value represents the lower bound for the “ideal” threshold minimizing the risk (2)). The lemma is proven. □
For the threshold minimizing the risk estimate, the following statement holds, showing that it cannot be too small.
Lemma 3. Let f be supported on some finite interval and be uniformly Lipschitz regular with some exponent . Let, for some positive θ,Then, for and arbitrary ,If , then, for an arbitrary ,where , . Proof. Let
. Then, for
, we obtain
Next, since
,
Using the inequality
$$
1 - \Phi(x) \leqslant \frac{\varphi(x)}{x}, \quad x > 0,
$$
where $\varphi$ and $\Phi$ are the density and the distribution function of the standard normal law, and given that, by (1),
we conclude that there is a constant
such that, for
,
If
, then, obviously, for some
, independent of
J,
Thus,
Therefore, taking into account (5),
Denote
For
,
Let
. Then, given the form of
and the constraints (6), we obtain the estimate
Let us divide the interval
into equal parts using points
Then,
where
Using Hoeffding’s inequality [23], we obtain
Next,
Since
we have
Applying Hoeffding’s inequality one more time, we obtain
Combining (9) and (10), we obtain
Now, let
. Then, for
, the estimate of the probability
has exactly the same form. Next, let
and
. Then, repeating the previous reasoning, we obtain
and
where
It can also be shown that
Now, repeating the previous reasoning, to estimate the probabilities of the events
and
, instead of Hoeffding’s inequality, we apply Bernstein’s inequality [23] and obtain
Hence,
Combining this estimate with the estimate of the probability
and taking into account that
, we obtain the statement of the lemma for
.
The lemma is proven. □
7. Conclusions
Wavelet analysis and threshold processing methods have found wide application in the signal analysis of noisy data. The advantages of these methods include high computational efficiency and the ability to adapt to the local features of the estimated function. Under some additional assumptions, it is possible to economically represent the signal function in such a way that the useful signal is concentrated in a relatively small number of coefficients with sufficiently large absolute values, after which threshold processing is performed to remove noise arising both for natural reasons (for example, due to background sources) and due to imperfections in the equipment used. Typically, the noise distribution is assumed to be additive and Gaussian. This model has been well studied in the literature. Various types of threshold processing and parameter selection strategies adapted to specific practical applications have been proposed. The analysis of errors arising from the application of threshold processing methods is an important practical task, since it allows one to evaluate the quality of both the methods themselves and the equipment used for processing. Most of the efforts in this area are devoted to studying the asymptotic order of the theoretical risk, which cannot be calculated in practice because it explicitly depends on the theoretical unobservable noise-free signal. However, a statistical risk estimate based only on the observed data can also be used to assess the quality of noise reduction methods.
In this paper, a model of a signal contaminated with additive Gaussian noise is considered, and the general formulation of the thresholding problem with threshold functions belonging to a certain special class is discussed. A number of examples of known threshold functions from the considered class are given, which are used in such problems as ECG and EEG signal processing, the processing of noisy images, electromyography, noise reduction in audio signals, seismic signal analysis, etc. The specific type of thresholding function for practical use depends on the application and the amount of available data. Tuning additional parameters of the specified thresholding methods improves their performance in some specific applications. For example, Refs. [10,11] present numerical studies demonstrating the effectiveness of hybrid thresholding and hyperbolic tangent-based thresholding in ECG signal processing.
This paper obtains lower bounds for thresholds that minimize the unbiased risk estimate. It also provides conditions under which this risk estimate is asymptotically normal and strongly consistent. The authors plan to conduct a number of further studies in this direction. In particular, we will seek to obtain results concerning the construction of confidence intervals and estimates of the convergence rate, which, in turn, will allow us to obtain specific error values in signal processing for a wide range of threshold processing methods. It is also planned to consider similar models for dependent sample components and non-Gaussian noise.