1. Introduction
Recently, scientific research in the field of wireless acoustic sensor networks has been addressing very important technical problems. Many areas have been covered, such as self-localization of acoustic sensors, recognition and coding of audio signals, active noise control, and localization of sound sources [1,2].
This paper considers another important area of acoustic sensor systems, information security: the highly redundant audio signal received from acoustic sensors is used as a container for hiding text information, so that the classical problem of audio steganography is solved. The developed method is also relevant for voice messengers, where a decoy voice message is transmitted that hides a true text message. In this case, an attacker cannot recognize the essence of the users' hidden correspondence, and if the microphone of a mobile device acts as the acoustic sensor, the hidden correspondence can be masked against the background of another audio conference in real time, which further confuses the attacker. It is also worth keeping in mind text recognition systems that operate on multimedia information (images, video) obtained from video sensors, since the recognized text can likewise be hidden in the audio signal of an acoustic sensor network.
It should be noted that if the developed method is slightly modified at the stage of processing the hidden information before integrating it into the audio signal (i.e., adapted to another type of information), it can easily be used to hide not only text but also signal parameters (object recognition features) obtained from the analysis, processing, and classification of information received from different types of network sensors (video and audio sensors). This type of hidden information is very common today in computer vision and in speech and video recognition. In this case, it is not the carrier signal itself (audio, video, images) that is hidden in the audio signal, but its recognition features, which depend on the specific classification task being solved. For example, the semantic parameters of speech or the biometric features of a voice can be hidden in the audio signal; for video or image recognition, the parameters that characterize the tracking of moving objects over time, identification of a person from a photo, optical character recognition, and other such signal parameters can be hidden.
To hide text information in an audio signal effectively, a deep understanding of their amplitude–frequency characteristics [3] is required, because many factors depend on correctly choosing where in the amplitude–frequency domain the text information is to be integrated. The main ones are the effectiveness of the hiding (masking) itself and the resistance of the steganocodec to transcoding of the audio container. A fundamental understanding of the spectral features of audio signals [4] makes it possible to balance increasing the efficiency of hiding text information in an audio container against resistance to the various compression algorithms applied to a steganographic audio file.
Therefore, the question arises whether the secret text information will be preserved without distortion when the steganographic audio file is transcoded, and if so, what is the maximum compression ratio at which the secret information maintains its integrity? This question, in particular, prompted the authors to write this article and to develop a method for steganographic hiding of text information in an audio signal [5,6,7] that resolves these contradictions using modern methods of digital audio signal processing and spectral analysis.
1.1. Problem Statement
The developed method of steganographic hiding of text information in an audio signal based on the wavelet transform [8] acquires particular importance when an attacker deliberately performs unauthorized manipulations of the steganocoded audio signal to distort the text information embedded in it, that is, to make its semantic constructions illegible. The main form of such manipulation is the application of various audio compression algorithms [9,10], not in order to remove the uninformative components that, according to the human psychophysiological model of sound perception, lie beyond the threshold of audibility, but rather to remove the text information hidden in the audio signal through the distortions deliberately introduced by the compression algorithm.
Thus, the main objective of this research is to increase the robustness of the stego-system to compression (redundancy reduction) of the steganocoded audio signal [11,12], subject to preserving the integrity of the text information (its genuine semantic structures), while taking into account the features of the psychophysiological model of sound perception (hiding the very fact of text transmission by masking it in acoustic signals).
1.2. Analysis of Existing Research and Formation of a Scientific Hypothesis
The task of this research is effectively solved using a multilevel discrete wavelet transform [8,13] based on adaptive block normalization of the text information with subsequent recursive embedding in the low-frequency component of the audio signal and a further scalar product of the obtained coefficients with the Daubechies wavelet filters [14,15]. This is a new approach in the field of steganography that makes the stego-system more resistant to transcoding. The difference from existing work is that in existing steganographic methods based on the wavelet transform [16,17,18,19,20], text information is usually embedded in the high-frequency wavelet coefficients (HFWC) at the last level of the wavelet decomposition, whereas the developed method uses recursive embedding in the low-frequency wavelet coefficients (LFWC) followed by a scalar product with orthogonal Daubechies filters at each level of the wavelet decomposition, which increases the average power of the hidden data. This raises the critical compression threshold of the steganocoded audio signal at which the text begins to be distorted (i.e., at which the received message differs from the transmitted one).
The formalization of the mentioned statements is as follows:
(1) An existing method, used in many studies [16,17,18,19,20] in different configurations, for the most part applies the idea of text integration according to the expression in Formula (1).
(2) The proposed method differs significantly in the expression in Formula (5), which increases the average power of the hidden text information due to the scalar product with the coefficients of the low-frequency wavelet filter $h$.
In Formulas (1)–(6), $S$ and $S^{*}$ are the input and output audio signals with $N$ samples; $T$ and $T^{*}$ are the input and output texts, divided into blocks according to the number of wavelet decomposition levels $L$; $a$ and $d$ are the low-frequency and high-frequency wavelet coefficients, whose quantity depends on the decomposition level; $h$, $g$, $\tilde{h}$, and $\tilde{g}$ are the low-pass and high-pass Daubechies filters of the $p$-th order for decomposition and reconstruction; $\downarrow 2$ and $\uparrow 2$ are the operations of double thinning (downsampling) and doubling (upsampling); and $\hookleftarrow$ is a symbol used to logically denote the operation of integrating text information into the wavelet coefficients.
The expression in Formula (5) shows that the integration of the blocks of text information into the wavelet coefficients occurs at each level of the wavelet decomposition prior to their scalar product with the low-pass Daubechies filter $h$, as opposed to the expression in Formula (1), where integration into the wavelet coefficients occurs after the scalar product with the high-pass Daubechies filter $g$ of Formula (3).
Extraction of the text information $T^{*}$ from the audio signal $S^{*}$ occurs recursively, depending on the number of levels of the wavelet decomposition $L$, according to Formulas (2) and (3) in the existing approaches [16,17,18,19,20] and Formulas (5) and (6) in the proposed approach, respectively.
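To make the difference concrete, the two strategies can be sketched in this notation (an illustrative schematic based on the authors' description, not a reproduction of the exact Formulas (1)–(6)), with $a_{0} = S$:

\begin{align*}
\text{Existing:}\quad & a_{j} = (a_{j-1} \ast h)\downarrow 2, \quad d_{j} = (a_{j-1} \ast g)\downarrow 2, \quad j = 1,\ldots,L; \qquad d_{L}^{\,*} = d_{L} \hookleftarrow T,\\
\text{Proposed:}\quad & a_{j}^{\,*} = a_{j} \hookleftarrow T_{j}, \quad a_{j+1} = (a_{j}^{\,*} \ast h)\downarrow 2, \quad d_{j+1} = (a_{j}^{\,*} \ast g)\downarrow 2, \quad j = 1,\ldots,L-1.
\end{align*}

In the proposed scheme, the embedded values pass through the low-pass filter at every subsequent level, which is what the authors credit for the increased average power of the hidden data.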
Thus, with the developed method, an attacker may be allowed to re-encode the audio signal with various lossy compression algorithms, while the text information embedded in the audio signal still maintains its integrity. This statement is based on the fact that the current variety of compression algorithms [9,10,11,12] operates on the same principle, namely, the elimination of the uninformative, redundant component of the audio signal. Since the proposed method hides text information at medium frequencies and amplitudes of the wavelet coefficients, which is its main feature, it can significantly increase the resistance of the stego-system to audio compression while taking into account the features of the psychophysiological model of sound perception. The only exceptions are complete deletion of the audio file or critical compression with a complete loss of meaningful audio information. A quantitative assessment of the boundary values at which critical compression occurs is obtained in the experimental study. Critical compression should be understood as the degree of compression at which the text information is distorted or completely removed from the audio signal (its semantic links are violated) by a significant reduction in redundancy. The main criterion for evaluating the effectiveness of the proposed stego-system is therefore the maximum degree of audio compression at which the text information remains integral, that is, the highest compression ratio that maintains the full integrity of the semantic structures of the text.
Analysis of the literature [16,17,18,19,20,21,22,23,24,25,26] shows an almost complete absence of embedding methods that are resistant to compression of the audio signal. One transform that allows such an embedding is the multilevel discrete wavelet transform, which has clear advantages in representing the local characteristics of a signal and takes into account the features of the psychophysiological model of sound perception. The proposed method increases the robustness of the stego-system to deliberate compression (elimination of highly informative features). We will show that applying this approach in the development of a steganography algorithm designed for maximum robustness can solve the main tasks of steganography, namely, minimization of the introduced distortions and resistance to attacks by a passive intruder.
The next section presents the main theoretical aspects of the proposed method, namely, (1) integrating text information into the low-frequency wavelet coefficients of an audio signal followed by their scalar product with the low-frequency and high-frequency orthogonal Daubechies wavelet filters for decomposition; (2) reconstructing the audio signal with the text integrated into it from the low-frequency and high-frequency wavelet coefficients; (3) extracting the text information from the low-frequency wavelet coefficients of the audio signal.
2. Presentation of the Proposed Method
Structural diagrams of the developed method of steganographic protection of text information based on the wavelet transform are shown in Figure 1 and Figure 2. A detailed explanation of all the blocks in the diagrams and their formal presentation is given below.
Any text information in English can be represented in the ASCII encoding, where all characters of the computer alphabet are numbered from 0 to 127, the ordinal number of a character being a seven-digit binary code from 0000000 to 1111111. Thus, we form a set of numbers $A = \{0, 1, \ldots, 127\}$ that correspond to the characters of the ASCII encoding. Text information can then be represented as a set $T = \{t_{1}, t_{2}, \ldots, t_{K}\}$, which corresponds to a sequence of numbers from the set $A$, where the occurrence of each $t_{k}$ in the set $T$ is determined by the sequence of characters in the text.
So, given some text information $T = \{t_{1}, t_{2}, \ldots, t_{K}\}$, where $K$ is the total number of characters to be hidden in the audio signal, it is necessary to perform an interleaving operation to remove statistical dependencies between the characters of the text. This operation is implemented using a pseudo-random number generator (PRNG), which forms a sequence of uniformly distributed numbers $\xi_{i}$ in the range $[0, 1)$.
For a random variable, we often compute the expectation and the variance, two important summary statistics: the expectation describes the average value, and the variance describes the spread (amount of variability) around the expectation.
Then, the mathematical expectation $M(\xi)$ and variance $D(\xi)$ of such a sequence, which consists of $n$ pseudo-random numbers $\xi_{i}$, should tend to the following values: $M(\xi) \to \tfrac{1}{2}$ and $D(\xi) \to \tfrac{1}{12}$.
In order to shuffle the characters of the set $T$ in a pseudo-random way, the pseudo-random numbers generated by the PRNG must lie in the range $[1, K]$, which is different from $[0, 1)$. Numbers in the range $[1, K]$ are equivalent to the indexes of the characters of the text information $T$.
To solve this problem, the pseudo-random numbers $\xi_{i}$ from the range $[0, 1)$ are rescaled to the range $[1, K]$. The correctness of this transform is demonstrated in Figure 3.
The resulting numbers are then pseudo-random numbers uniformly distributed in the range from $1$ to $K$.
Thus, we can form a set of non-repeating numbers that correspond to the new indexes of the characters of the text information $T$. This set of numbers corresponds to Key 1, which is used at the stage of integrating the text into the audio signal (Figure 1) and at the stage of extracting the text from the audio signal (Figure 2).
Then, the operation of interleaving, which is used at the stage of integrating the text into the audio signal (Figure 1), and the operation of de-interleaving, which is used at the stage of extracting the text from the audio signal (Figure 2), can be represented as a pair of mutually inverse permutations of $T$ controlled by Key 1.
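As a rough illustration of these two operations, a minimal MATLAB sketch is given below; the PRNG seed, the sample text, and the variable names are assumptions made for the example and are not taken from the paper or its listings.

% Illustrative interleaving (Key 1); the seed, text, and names are example assumptions.
T  = double('The Road Not Taken');    % ASCII codes of the text characters
K  = numel(T);                        % total number of characters
rng(12345);                           % shared secret seed of the PRNG
key1 = randperm(K);                   % non-repeating pseudo-random indexes in [1, K] (Key 1)
T_interleaved = T(key1);              % interleaving, used before embedding (Figure 1)
T_deinterleaved = zeros(1, K);        % de-interleaving, used after extraction (Figure 2)
T_deinterleaved(key1) = T_interleaved;
isequal(char(T_deinterleaved), 'The Road Not Taken')   % returns true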
Since the low-frequency wavelet coefficients increase their absolute power with each subsequent level of decomposition, the text information must be sorted in such a way that its integration into the low-frequency wavelet coefficients proceeds from the minimum to the maximum values; this is the main task of the sorting operation. So, having received the text information that was subjected to the interleaving operation using Key 1, it is necessary to sort the set of characters from the minimum to the maximum value. The input text information was presented as a set $T$, where each $t_{k}$ is a number corresponding to a specific character of the ASCII encoding and positioned according to the initial sequence of characters in the text; after interleaving, the same sequence of numbers in the range from 0 to 127 is simply reordered according to Key 1.
Then, the operation of sorting, which is used at the stage of integrating the text into the audio signal (Figure 1), and the operation of de-sorting, which is used at the stage of extracting the text from the audio signal (Figure 2), can be written as a pair of mutually inverse operations, where the sequence of indexes of the sorted set of characters corresponds to Key 2 in Figure 1 and Figure 2.
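A corresponding minimal MATLAB sketch of the sorting and de-sorting operations, continuing the variables of the previous sketch, could be:

% Illustrative sorting (Key 2): ascending sort of the interleaved ASCII codes.
[T_sorted, key2] = sort(T_interleaved, 'ascend');   % key2 plays the role of Key 2
T_desorted = zeros(1, numel(T_sorted));             % de-sorting, used after extraction
T_desorted(key2) = T_sorted;                        % restores T_interleaved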
So, the text information that has undergone the sorting operation needs to be divided into $L - 1$ blocks, where $L$ is the maximum number of levels of the wavelet decomposition of the audio signal, since there is no text integration at the last level of the wavelet decomposition (Figure 1). The number of blocks of text information is therefore determined by finding the maximum level of the wavelet decomposition $L$ of the audio signal. This maximum level is found from a condition relating $N$, the number of samples of the audio signal, and $k$, the number of coefficients of the Daubechies wavelet filter, where the symbol $\lfloor \cdot \rfloor$ denotes rounding a number down [27,28,29].
Then, the number of characters in one block of the text information is obtained by dividing the total number of characters $K$ of the text information $T$ to be hidden in the audio signal among the $L - 1$ blocks.
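A possible MATLAB sketch of these calculations is shown below; the db12 wavelet matches the experimental section, while the file name and the use of wmaxlev (MATLAB's floor-based maximum-level rule) are assumptions for the example.

% Illustrative computation of the maximum decomposition level and the text blocks.
[S, Fs] = audioread('speech.wav');      % hypothetical mono WAV file
N = numel(S);                           % number of audio samples
L = wmaxlev(N, 'db12');                 % maximum level for db12 (floor-based rule)
B = L - 1;                              % number of text blocks (no embedding at the last level)
K = numel(T_sorted);                    % total number of characters
n = ceil(K / B);                        % characters per block (last block may be shorter)
blocks = cell(1, B);
for j = 1:B
    blocks{j} = T_sorted((j-1)*n + 1 : min(j*n, K));   % j-th block of sorted ASCII codes
end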
It should be noted that the number of characters of text information in one block directly depends on the maximum level of the wavelet decomposition of the audio signal, as can be seen from Formulas (16)–(19). Finding the maximum level of the wavelet decomposition makes it possible to integrate all blocks of text information uniformly across all decomposition levels, which increases the resistance to audio compression: as the decomposition level increases, the amplitude of the wavelet coefficients increases and, accordingly, so does the amplitude of the text information integrated into them, due to the subsequent scalar product with the wavelet filter at each decomposition level, which is a characteristic feature of the proposed method.
Thus, the text information $T$, which is divided into $L - 1$ blocks with $n$ characters in one block, can be represented in the form of a set of blocks $T = \{T_{1}, T_{2}, \ldots, T_{L-1}\}$, which corresponds to the operation of dividing the text information into blocks used at the stage of integrating the text into the audio signal, according to Figure 1.
Then, the operation of combining the blocks of text information, which is used at the stage of extracting the text from the audio signal according to Figure 2, is the inverse operation: the extracted blocks $T^{*}_{1}, \ldots, T^{*}_{L-1}$ are concatenated back into the text $T^{*}$.
At the final stage of preparing the text information for integration into the audio signal, it is necessary to perform a normalization operation so that the text information $T$ and the audio signal $S$ are on the same normalization scale, namely, so that the values of the ASCII codes of the text characters and the audio signal samples lie in the range from 0 to 1. The restoration of the normalized text information and audio signal to the original (de-normalized) scale is carried out by the corresponding inverse expressions. This sequence of operations corresponds to the normalization and de-normalization blocks used at the stage of integrating the text into the audio signal (Figure 1) and of extracting the text from the audio signal (Figure 2).
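One possible realization of this normalization and de-normalization in MATLAB, assuming the audio samples are mapped linearly from $[-1, 1]$ to $[0, 1]$ (the exact mapping is an assumption of this sketch), is:

% Illustrative normalization to a common [0, 1] scale (mapping choice is an assumption).
T_norm = T_sorted / 127;            % ASCII codes 0..127 -> [0, 1]
S_max  = max(abs(S));               % scale factor remembered for de-normalization
S_norm = (S / S_max + 1) / 2;       % audio samples [-1, 1] -> [0, 1]
% De-normalization (inverse expressions):
T_back = round(T_norm * 127);       % back to integer ASCII codes
S_back = (S_norm * 2 - 1) * S_max;  % back to the original amplitude range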
Thus, we obtain blocks of normalized text information that are ready for integration into the normalized audio signal. However, since the integration does not take place in the audio signal itself, but in its low-frequency wavelet coefficients (LFWC), followed by their scalar product with the low-frequency (LPF-D) and high-frequency (HPF-D) orthogonal Daubechies wavelet filters at each level of the wavelet decomposition, it is necessary to perform a wavelet transform of the audio signal and find the low-frequency (LFWC) and high-frequency (HFWC) wavelet coefficients for each level of the wavelet decomposition [30,31]. It should be noted that not only Daubechies filters can be used, but also other orthogonal wavelet filters, such as Coiflets, Symlets, or the Meyer wavelet.
Then, the discrete wavelet transform is the scalar product of the values of the studied audio signal $S$ with the coefficients of the orthogonal Daubechies wavelet filters of low (LPF-D, $h$) and high (HPF-D, $g$) frequencies for decomposition, followed by double thinning ($\downarrow 2$) of the obtained coefficients. The results are the low-frequency (LFWC) and high-frequency (HFWC) wavelet coefficients of the first level of the audio signal decomposition [32,33].
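In MATLAB, this single-level decomposition can be sketched with the Wavelet Toolbox as follows (db12 filters, as in the experimental section; variable names continue the earlier sketches):

% Illustrative single-level DWT of the normalized audio signal with db12 filters.
[LoD, HiD, LoR, HiR] = wfilters('db12');   % decomposition/reconstruction filter coefficients
[cA1, cD1] = dwt(S_norm, 'db12');          % cA1 = LFWC, cD1 = HFWC of level 1
% Conceptually: convolve with LoD/HiD and keep every second sample (double thinning),
% e.g., cA1 ~ downsample(conv(S_norm, LoD), 2) up to boundary handling.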
Since the text information has been sorted from its minimum to its maximum values, then to find the indexes (Key 3) of the low-frequency wavelet coefficients that should be replaced with the corresponding block of text information, it is also necessary to sort the low-frequency wavelet coefficients from the minimum to the maximum values and determine the indexes of the $n$ absolute minimum values, where $n$ is the number of characters in one block of text information.
Then, the operation of integrating the text information into the low-frequency wavelet coefficients (Figure 1) and the operation of extracting the text information from the low-frequency wavelet coefficients (Figure 2) can be written in terms of the sequence of indexes of the absolute minimum values of the low-frequency wavelet coefficients, which is formed according to the above condition and corresponds to Key 3 in Figure 1 and Figure 2.
This operation is needed in order to replace the absolute minimum values of the low-frequency wavelet coefficients with the minimum values of the corresponding block of text information.
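For a single decomposition level, the selection of Key 3 and the replacement step might look like the following sketch; the ascending sort by absolute value and the division by 127 are assumptions carried over from the earlier sketches, not details taken from the paper.

% Illustrative Key 3 and replacement for one decomposition level.
n_1 = numel(blocks{1});                  % characters in the first block
[~, order] = sort(abs(cA1), 'ascend');   % LFWC ordered from absolute minimum to maximum
key3_1 = order(1:n_1);                   % indexes of the minimal coefficients (Key 3)
cA1_stego = cA1;
cA1_stego(key3_1) = blocks{1}(:) / 127;  % replace them with the normalized text block
% Extraction side, given Key 3:
block1_rec = round(cA1_stego(key3_1) * 127);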
This approach provides less distortion of the audio signal during its inverse recovery from the wavelet coefficients, since both the audio signal and the text information are on the same normalization scale, namely from 0 to 1, which allows their absolute powers to be correlated [34,35].
Then, the operation of recursively integrating all blocks of text information into the low-frequency wavelet coefficients at all levels of the wavelet decomposition of the audio signal, followed by their scalar product with the low-frequency (LPF-D) and high-frequency (HPF-D) orthogonal Daubechies wavelet filters for decomposition (Figure 1), can be written as in Formulas (34)–(44). In these formulas, $a_{j}$ and $d_{j}$ are the low-frequency and high-frequency wavelet coefficients of the decomposition levels $j = 1, 2, \ldots, L$ of the audio signal, and $a_{j}^{*}$ are the low-frequency wavelet coefficients of the decomposition levels $j = 1, 2, \ldots, L-1$ with the integrated blocks of text information $T_{j}$.
If we shorten expressions (34)–(44), we obtain the operation of recursively integrating the text information into the low-frequency wavelet coefficients of the audio signal (Figure 1) in the compact form of Formulas (45)–(49).
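A compressed MATLAB reading of this recursive embedding, carrying over the variables of the earlier sketches, is given below; it is an illustrative sketch, not the paper's Listing A1.

% Illustrative recursive embedding across all levels (a sketch, not Listing A1).
a    = S_norm(:).';                % running low-frequency branch as a row vector
cD   = cell(1, L);                 % detail (HFWC) coefficients of every level
lenA = zeros(1, L);                % lengths needed later for exact reconstruction
key3 = cell(1, L-1);               % embedding positions (Key 3) for every level
for j = 1:L-1
    lenA(j) = numel(a);
    [cA, cD{j}] = dwt(a, 'db12');                % LFWC / HFWC of level j
    [~, order]  = sort(abs(cA), 'ascend');
    key3{j}     = order(1:numel(blocks{j}));     % indexes of the minimal LFWC
    cA(key3{j}) = blocks{j} / 127;               % embed the normalized text block j
    a = cA;                                      % embedded LFWC feed the next level
end
lenA(L) = numel(a);
[cA_L, cD{L}] = dwt(a, 'db12');                  % last level: no text integration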
Then, to reconstruct the audio signal with the text integrated into it (Figure 1), it is required to perform the operation of doubling ($\uparrow 2$) the low-frequency and high-frequency wavelet coefficients, followed by the summation of the results of their scalar products with the coefficients of the orthogonal Daubechies wavelet filters of low (LPF-R, $\tilde{h}$) and high (HPF-R, $\tilde{g}$) frequencies for reconstruction at each level of the wavelet decomposition, according to Formula (53).
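Continuing the same sketch, the reconstruction of the stego audio signal from the deepest level upwards might be written as follows (the length argument of idwt keeps each level at its original size):

% Illustrative reconstruction of the stego audio signal (deepest level upwards).
a_rec = cA_L;
for j = L:-1:1
    a_rec = idwt(a_rec, cD{j}, 'db12', lenA(j));  % upsample, filter with LoR/HiR, and sum
end
S_stego = (a_rec * 2 - 1) * S_max;                % de-normalize back to the audio amplitude scale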
Then, the operation of recursively extracting all blocks of text information from the low-frequency wavelet coefficients at all levels of the wavelet decomposition of the audio signal (Figure 2) can be represented by Formulas (54)–(56).
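The matching extraction sketch, again illustrative rather than the paper's Listing A3, re-decomposes the stego signal and reads the Key 3 positions at every level; the clamping of the recovered codes is an added safeguard of this sketch.

% Illustrative extraction (Figure 2): re-decompose and read the Key 3 positions.
a = (S_stego / S_max + 1) / 2;                    % back to the [0, 1] scale
T_rec = [];
for j = 1:L-1
    [cA, ~] = dwt(a, 'db12');                     % LFWC of level j of the stego signal
    codes = round(cA(key3{j}) * 127);             % recover block j as ASCII codes
    T_rec = [T_rec, max(0, min(127, codes))];     % clamp to the valid ASCII range
    a = cA;                                       % descend to the next level
end
T_desort = zeros(1, numel(T_rec)); T_desort(key2) = T_rec;   % undo sorting (Key 2)
T_out    = zeros(1, numel(T_rec)); T_out(key1)   = T_desort; % undo interleaving (Key 1)
disp(char(T_out));                                % recovered text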
Thus, we have the following operations: (1) integrating the text information into the low-frequency wavelet coefficients of the audio signal, followed by their scalar product with the low-frequency (LPF-D) and high-frequency (HPF-D) orthogonal Daubechies wavelet filters for decomposition, Formulas (45)–(49) (Listing A1 in Appendix A); (2) reconstructing the audio signal with the text integrated into it from the low-frequency and high-frequency wavelet coefficients, Formula (53) (Listing A2 in Appendix A); (3) extracting the text information from the low-frequency wavelet coefficients of the audio signal, Formulas (54)–(56) (Listing A3 in Appendix A).
These are the main scientific results of the proposed method of steganographic hiding of text information in an audio signal based on the wavelet transform.
3. Results of Scientific Experimental Research
A computer model of the method of steganographic protection of text information based on the wavelet transform was implemented and studied in the MATLAB R2021b environment using the following toolboxes: Signal Processing Toolbox, Wavelet Toolbox, Audio Toolbox, Text Analytics Toolbox, Filter Design HDL Coder, DSP System Toolbox, Communications Toolbox, and Statistics and Machine Learning Toolbox.
In the experimental study, the initial audio signal for the proposed method of steganographic hiding of text information is a mono recording of a male announcer reading the poem The Road Not Taken by Robert Frost. The recording is 91 s long, in WAV format, with a sampling rate of 44.1 kHz and a quantization depth of 16 bits per sample. Therefore, the audio data bit rate at the input of the computer model of the developed method is 705.6 kbps, and the total amount of audio data is 7.8 MB.
The audio signal was recorded using a sound card with a maximum sampling rate of 192 kHz, a resolution of 24 bits per sample, and a signal-to-noise ratio of 116 dB, together with a unidirectional 16-bit condenser microphone with a sensitivity of 110 dB.
Figure 4 shows the original audio signal before steganographic processing to embed the secret text information, and Figure 5 shows the wavelet coefficients of the 17th level of decomposition, where the Daubechies function of the 12th order was used as the generating wavelet function. It should be noted that the optimal choice of the generating wavelet function and of the number of decomposition levels is not a trivial task, since the speech signal is a non-stationary process and changes in its spectral content over time cannot be predicted. Therefore, in practice, it is recommended to use the smoothest wavelet functions with a large number of vanishing moments (function order) and the maximum possible number of decomposition levels, which is determined through the energy of the signal under study and the wavelet function. This makes the wavelet spectrum of the speech signal the most suitable for integrating text information.
As the initial text information (to be hidden) in the method under study, the poem The Road Not Taken by Robert Frost was used in TXT format, 740 characters long. With the ASCII encoding, where 8 bits are allocated per character, the total amount of text information at the input of the computer model of the developed method is 740 bytes.
Figure 6 shows the original text information in symbolic form before steganographic embedding in the audio signal, which takes into account the psychophysiological features of human hearing. Figure 7 shows the same text information, but encoded according to the ASCII rules. It is these normalized ASCII code values that must be masked as well as possible in the highly redundant audio data stream in order to hide the very fact of text transmission.
Table 1 presents the results of an experimental study, namely, quantitative estimates of the effectiveness of the existing stego-system based on the wavelet transform under conditions of passive or deliberate distortion of the text information hidden in the audio signal by the application of redundancy reduction (compression) methods.
The main task formulated earlier is to increase the robustness of the stego-system to compression algorithms, so that when the steganocoded audio signal is compressed, the text information hidden inside it remains as intact as possible. Objective metrics are used to automate the evaluation of the effectiveness of embedding text information in an audio signal; they quantify the distortions introduced by the stego-system into the original audio signal. The criteria for evaluating the effectiveness of the stego-system are the compression ratio (CR), correlation coefficient (CC), normalized root mean square error (NRMSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). It should be noted that CC, NRMSE, SNR, and PSNR are very sensitive to changes in the amplitude of the audio signal. Since it is precisely the change in amplitude that characterizes the degree of distortion of the audio signal, this is exactly what is needed to evaluate the quality of masking text information in an audio container, because the masking process itself introduces (amplitude) distortion. In this experimental study, Daubechies wavelet filters of the 12th order were used. This should be taken into account when interpreting the obtained CR, CC, NRMSE, SNR, and PSNR values, which depend directly on the specific realization of the audio signal, the text information, and the selected wavelet filter, so the critical compression threshold will change in different versions of the experiment.
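For reference, these objective metrics can be computed with the standard textbook definitions sketched below; the exact scalings used for Tables 1 and 2 may differ slightly, and the variable names continue the earlier sketches.

% Illustrative objective metrics (standard definitions; Table 1/2 scalings may differ).
x = S(:);                         % original audio signal
y = S_stego(:);                   % steganocoded (and possibly transcoded) audio signal
CC    = corr(x, y);                                        % correlation coefficient
NRMSE = norm(x - y) / norm(x);                             % normalized root mean square error
SNR   = 10 * log10( sum(x.^2) / sum((x - y).^2) );         % signal-to-noise ratio, dB
PSNR  = 10 * log10( max(abs(x))^2 / mean((x - y).^2) );    % peak signal-to-noise ratio, dB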
The obtained values of the performance indicators should be interpreted as follows: with CR = 1, the steganocoded audio signal is not subjected to distortions introduced by a compression algorithm; at the same time, very good masking with respect to the psychophysiological model of sound perception is observed, which is confirmed by CC = 0.9999, NRMSE = 0.0060, SNR = 36.7794, and PSNR = 63.0616. In this case, the text information extracted from the audio signal has the ideal indicators CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞, which means that the text information has not been subjected to the slightest distortion and is completely intact. The infinity symbol ∞ here denotes an infinitely high value of the criterion. According to Table 1, the parsed text information fully matches the input text; that is, at the output of the transformations we obtain the text shown in Figure 6.
Figure 8 shows the wavelet coefficients after the audio signal has been compressed six-fold; the integrity of the text information nevertheless remains unchanged, which is the ideal outcome.
It should be noted that, for the existing method, CR = 6 is the boundary value at which the text information is not yet subjected to distortions of the compression algorithm; this can be seen from the values CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞, while a sufficient masking quality is maintained according to CC = 0.9861, NRMSE = 0.0681, SNR = 30.2330, and PSNR = 51.6068. In other words, at a compression level of six times, there are no audible differences between the original and steganocoded audio signals. This is the so-called 'critical level of compression' at which there is still no distortion of the text information; raising the compression above this critical level causes distortion.
For clarity, Figure 9 presents the values of the wavelet coefficients of the steganocoded audio signal compressed by a factor of 30. From CC = 0.8632, NRMSE = 0.4984, SNR = 6.8333, and PSNR = 11.8327, it can be seen that under such conditions one can no longer speak of good sound quality of the audio signal. Moreover, because the redundancy of the steganocoded audio signal is reduced so significantly, it becomes problematic to maintain the integrity of the text information hidden in it.
Consider what happens to the text information under such compression. Figure 10 shows the recovered text information when the steganocoded audio signal is compressed by a factor of 30. According to the indicators in Table 1, with CR = 30 the text distortion is proportional to the values CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, which represent sufficiently large distortions; the result is clearly visible in Figure 10.
As can be seen from the above, the existing method of steganographic hiding of text information in an audio signal based on the wavelet transform shows rather mediocre results in terms of compression resistance.
Let us conduct an experimental study of the developed method and clearly demonstrate its advantage over the existing one.
Table 2 presents the results of an experimental study of the developed method for hiding text information in an audio signal based on the wavelet transform; as will be seen below, the proposed approach significantly increases the robustness of the stego-system to the deliberate and passive elimination of redundancy aimed at distorting the text information.
At CR = 1, we have CC = 0.9999, NRMSE = 0.0059, SNR = 36.7274, and PSNR = 63.7437, which corresponds to high-quality masking with respect to the psychophysiological model of audio perception, while CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞ characterize the full integrity of the text information extracted from the audio signal.
Very close attention should be paid to the results shown in Table 2 for CR = 20, namely CC = 0.9109, NRMSE = 0.2990, SNR = 16.3873, and PSNR = 28.3405: they characterize a strong distortion of the steganocoded audio signal, but according to CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞, the text information retains its integrity. These results are quite remarkable, since at 20-fold compression the integrity of the text is preserved in full; it is this result that is significant in our study.
It should be remembered that, for the existing method, we obtained the boundary value CR = 6, while for the developed method it is CR = 20, with full integrity of the text information in both cases. We can therefore reasonably conclude that by applying the developed method of steganographic hiding of text information in an audio signal, we obtain a gain of about 3.3 times over the existing method, thereby increasing the robustness of the stego-system to deliberate or passive compression of the audio signal aimed at distorting the embedded text information.
The wavelet coefficients of the steganocoded audio signal after 20-fold compression are shown in Figure 11. According to Table 2, CR = 20 is the borderline result above which the text information begins to be distorted.
Figure 12 shows the wavelet coefficients of the steganocoded audio signal after compression by a factor of 30. At such compression, according to the metric values CC = 0.9133, NRMSE = 0.3433, SNR = 21.3553, and PSNR = 34.3475, it can be concluded that the text information is distorted; however, comparing them with the indicators in Table 1 at the same compression level (CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958), we conclude that, objectively, the developed method of steganographic hiding of text information in an audio signal gives a many-fold gain in resisting distortions, all other things being equal.
Figure 13 shows the text information after 30-fold compression of the steganocoded audio signal using the developed method. It is clearly seen that distortion occurs, but in comparison with the existing hiding method, whose results are shown in Figure 10, we have a significant increase in the effectiveness of the steganographic processing of audio signals for embedding text information.
Based on the results obtained in the experimental study, it can reasonably be concluded that the proposed method of steganographic protection of text information is promising and warrants further research.