
1 Introduction

Electronic warfare has become an indispensable part of modern combat and is key to contending for information superiority across the battlefield. With the increasingly complex electromagnetic environment, the classification of radar emitter signal modulation has become an urgent problem [1].

With the application of new radar systems, the modulation modes of radar emitter signals become more and more complex, and signal features change and evolve continually. Hence traditional modulation classification methods cannot meet the requirements of present-day electronic reconnaissance [2]. Traditional feature analysis methods mostly focus on features in a single domain while ignoring the others, so they cannot effectively extract the modulation features of a signal, which degrades electronic reconnaissance [3]. In [4], a deep belief network was used to automatically extract feature parameters from large-sample data. The extracted feature set is large, which makes the computational complexity of the system high, and a deep network easily suffers from gradient vanishing and gradient explosion. Literature [5] used the LeNet-5 network to recognize documents and achieved good results. LeNet-5 is a small network that adaptively trains its parameters, making it well suited to the data at hand. Moreover, higher-order statistics (HOS) features can improve anti-noise performance [6], Renyi entropy features reflect the energy concentration level of a signal [7], and fusing them helps improve the recognition rate at low SNR.

Based on a self-training network, a multi-dimensional feature fusion modulation classification system is proposed in this paper. The system uses the self-training network to extract the time-frequency features of radar signals, fuses the network-extracted features with HOS and Renyi entropy features, and finally sends them into an extreme learning machine (ELM) to realize accurate classification.

The rest of this paper is organized as follows. In Sect. 2, the system and signal models are established. Section 3 introduces radar signal processing. Section 4 presents the feature extraction methods. Section 5 describes the classifier. Simulation results and analysis are given in Sect. 6. Finally, conclusions are drawn in Sect. 7.

2 Model of System and Signal

To realize multi-dimensional feature fusion modulation classification based on a self-training network, five stages are needed: time-frequency transform, image preprocessing, feature extraction, feature fusion, and feature classification. Firstly, the pseudo Wigner-Ville distribution (PWVD) transforms radar signals into time-frequency images. Secondly, the time-frequency images are preprocessed before feature extraction. Thirdly, LeNet-5, a small self-training network, extracts features. Fourthly, to express the information of radar signals more comprehensively, the network features, Renyi entropy features, and HOS features are fused by non-negative matrix factorization (NMF). Finally, an ELM realizes the classification. The structure of the proposed system is shown in Fig. 1.

Fig. 1.

The structure of the proposed system.

To implement the proposed algorithm, it is necessary to generate radar signals to train the network parameters. The unified model of a radar signal is as follows.

$$ s\left( t \right) = A\left( t \right)\exp \left[ {j\left( {2\pi f_{0} t + c\left( t \right) + \varphi_{0} } \right)} \right] $$
(1)

where \( A\left( t \right) \) represents the amplitude function, \( f_{0} \) is the carrier frequency, \( c\left( t \right) \) is the phase function and \( \varphi_{0} \) is the initial phase.

The nine classical types of radar signal are CW, BPSK, LFM, COSTAS, FRANK, T1, T2, T3 and T4; their detailed models are shown in Table 1.

Table 1. Models of nine types of radar signal.

3 Signal Processing

Because a radar signal is non-stationary, traditional methods cause signal aliasing, which leads to a low recognition rate. Therefore, a time-frequency transform is adopted.

3.1 Time-Frequency Transform

The wavelet transform and the short-time Fourier transform (STFT) are two common time-frequency transforms, but the wavelet transform is sensitive to noise and the STFT is only suited to stationary signals [8, 9]. The Wigner-Ville distribution (WVD) has good time-frequency aggregation, and the PWVD further enhances the aggregation of the distribution, which is helpful for classification [10].

$$ P_{x} (t,f) = \int_{ - \infty }^{ + \infty } {h(\tau )} \,x\left( {t + \tau /2} \right)x^{*} \left( {t - \tau /2} \right)e^{ - j2\pi f\tau } d\tau $$
(2)

where \( h\left( \tau \right) \) is the lag window function. Figure 2 shows the noise-free time-frequency images obtained by the PWVD.
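For illustration, the discrete PWVD of Eq. (2) can be sketched in Python; the Hamming window, its length, and the FFT-based evaluation over the lag variable are implementation choices, not details from the paper.

```python
import numpy as np

def pwvd(x, win_len=63):
    """Discrete pseudo Wigner-Ville distribution of an analytic signal x.

    Returns an (N, N) real matrix with rows indexing time and columns
    indexing frequency.  A Hamming window h(tau) smooths along the lag
    axis, which is what distinguishes the PWVD from the plain WVD.
    """
    N = len(x)
    half = win_len // 2
    h = np.hamming(win_len)
    W = np.zeros((N, N))
    for t in range(N):
        # admissible lags keep both t + tau and t - tau inside the signal
        taumax = min(t, N - 1 - t, half)
        tau = np.arange(-taumax, taumax + 1)
        kernel = np.zeros(N, dtype=complex)
        kernel[tau % N] = h[half + tau] * x[t + tau] * np.conj(x[t - tau])
        # FFT over the lag variable yields the frequency axis
        W[t, :] = np.real(np.fft.fft(kernel))
    return W

# A pure tone concentrates along a single frequency line (at twice the
# tone's bin, a known property of the integer-lag discrete WVD).
N = 256
n = np.arange(N)
W = pwvd(np.exp(2j * np.pi * 32 * n / N))
```

A chirp (LFM) signal produces a tilted line in the same way, which is why the PWVD image is a natural input for an image-based classifier.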

Fig. 2.

Time-frequency images obtained by the PWVD for CW, BPSK, LFM, COSTAS, FRANK, T1, T2, T3 and T4.

3.2 Image Preprocessing

With the development of artificial intelligence, neural networks are applied more and more widely. To apply LeNet-5 to radar signals, the images need to be preprocessed before being fed into the network. Preprocessing suppresses noise and reduces the computational complexity of the LeNet-5 network. Most image preprocessing algorithms are defined on grayscale or binary images. In grayscale preprocessing, the brightness of the original image is expressed in gray levels, converting the color image into grayscale format.

The time-frequency image can be represented by an \( M \times N \) matrix, and the brightness of each pixel can be calculated by the grayscale formula.

$$ {{\boldsymbol{I}}_{fg}} = 0.3{{\boldsymbol{R}}_{fg}} + 0.59\,{{\boldsymbol{G}}_{fg}} + 0.11{{\boldsymbol{B}}_{fg}} $$
(3)

where \( f,g \) represent the pixel point of image, \( 0 < f \le M,0 < g \le N \).

Because the dynamic range of the gray values differs among the time-frequency images of different signals, the range of gray values would affect classification. To reduce the effect of this data imbalance on classification, the gray values need to be normalized.

$$ {\hat{\boldsymbol{I}}}_{fg} = \frac{{{\boldsymbol{I}}_{fg} - {\bar{\boldsymbol{I}}}}}{{\sqrt {\frac{1}{{MN - 1}}\sum\limits_{f = 1}^{M} {\sum\limits_{g = 1}^{N} {\left( {{\boldsymbol{I}}_{fg} - {\bar{\boldsymbol{I}}}} \right)^{2} } } } }} $$
(4)

where \( {\bar{\boldsymbol{I}}} \) is the average of gray value.

To further enhance the signal, reduce the influence of noise, and reduce the amount of data, binarization is carried out.

$$ \user2{P}_{{fg}} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {\user2{I}_{{fg}} \ge \partial } \hfill \\ 0 \hfill & {\user2{I}_{{fg}} < \partial } \hfill \\ \end{array} } \right.$$
(5)

where \( \partial \) is the binarization threshold, set to 0.4 in this paper.
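The preprocessing chain of Eqs. (3)-(5) can be sketched in Python as follows. Rescaling the normalized image back to [0, 1] before applying the 0.4 threshold is an assumption, since the paper does not state on what scale the threshold operates.

```python
import numpy as np

def preprocess(rgb, threshold=0.4):
    """Grayscale (Eq. 3), z-score normalize (Eq. 4), binarize (Eq. 5).

    rgb: (M, N, 3) array, channels in R, G, B order, values in [0, 1].
    """
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = 0.3 * R + 0.59 * G + 0.11 * B            # luminance weights, Eq. (3)
    I_hat = (I - I.mean()) / I.std(ddof=1)       # Eq. (4), sample std (MN - 1)
    # assumption: rescale to [0, 1] so the fixed 0.4 threshold is meaningful
    scaled = (I_hat - I_hat.min()) / (I_hat.max() - I_hat.min())
    P = (scaled >= threshold).astype(np.uint8)   # Eq. (5)
    return I_hat, P

rng = np.random.default_rng(0)
I_hat, P = preprocess(rng.random((32, 32, 3)))
```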

4 Feature Extraction

In the classification of radar signals, feature extraction is essential. This paper proposes a multi-dimensional feature extraction method that enables the extracted features to represent the signal information more comprehensively.

4.1 LeNet-5

LeNet-5, proposed in 1998, is the most representative of the early neural networks. It was initially used for document recognition. Its network structure is simple, so it is suitable for training on small sample sizes. Applying LeNet-5 to feature extraction makes the information of radar signals more comprehensive, which improves the reliability of the system at low SNR. The structure of the LeNet-5 network is shown in Fig. 3.

Fig. 3.

The structure of LeNet-5 network.

After training, LeNet-5 can be used to extract features. This paper chooses principal component analysis (PCA) and kernel principal component analysis (KPCA) to reduce the dimension of the network-extracted features. Specifically, PCA treats the extracted features as a data matrix F, whose covariance matrix can be represented as \( \varvec{R} = \varvec{FF}^{T} \).

$$ {\varvec{R}} = {\varvec{UAU}}^{T} $$
(6)

where A is the diagonal matrix of eigenvalues of the covariance matrix and U is the corresponding eigenvector matrix.

$$ {\varvec{P}} = {{\varvec{U}}^T}{\varvec{F}} = {\left[ {{p_1},{p_2}, \cdots {p_K}} \right]^T} $$
(7)

where \( p \) denotes a principal component of the extracted feature matrix. The first \( K \) principal components are chosen as the feature matrix.
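The PCA reduction of Eqs. (6)-(7) can be sketched as follows; centering the features and the one-column-per-sample layout are assumptions consistent with \( \varvec{R} = \varvec{FF}^{T} \).

```python
import numpy as np

def pca_reduce(F, K=20):
    """Keep the first K principal components of feature matrix F.

    F: (d, n) matrix, one column per sample, so R = F F^T is (d, d).
    Returns the (K, n) matrix P = U^T F of Eq. (7).
    """
    Fc = F - F.mean(axis=1, keepdims=True)   # center each feature row
    R = Fc @ Fc.T                            # covariance (up to a 1/n factor)
    vals, U = np.linalg.eigh(R)              # eigendecomposition, Eq. (6)
    order = np.argsort(vals)[::-1]           # sort eigenvalues descending
    U = U[:, order[:K]]
    return U.T @ Fc                          # projections, Eq. (7)

rng = np.random.default_rng(1)
P = pca_reduce(rng.normal(size=(50, 200)), K=20)
```

The paper keeps the components whose cumulative proportion reaches 90%; the fixed K here simply stands in for that choice.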

The difference between PCA and KPCA is that KPCA maps the feature matrix into a high-dimensional feature space by a nonlinear mapping. The feature matrix F is mapped into a high-dimensional space to get Φ(f), and the covariance matrix is \( \varvec{R} = \frac{1}{M}\sum\limits_{c = 1}^{M} {\Phi \left( {f_{c} } \right)}\Phi \left( {f_{c} } \right)^{T} \). The eigenvalue \( \lambda_{c} \) and eigenvector \( \mu_{c} \) can be obtained from the following equation.

$$ \varvec{R}\mu_{c} = \lambda_{c} \mu_{c} $$
(8)

The eigenvector \( \mu_{c} \) can be represented as a linear combination of the mapped samples \( \Phi \left( {x_{c} } \right) \).

$$ \mu_{c} = \sum\limits_{c = 1}^{M} {a_{c} }\Phi \left( {x_{c} } \right) $$
(9)
$$ \lambda_{c} a = \frac{1}{M}\Phi \left( {x_{c} } \right)\Phi \left( {x_{c} } \right)^{T} \cdot a $$
(10)

where \( a = \left( {a_{1} ,a_{2} , \ldots ,a_{n} } \right)^{T} \) is the vector of linear combination coefficients. A kernel function can be defined as follows.

$$ K_{cs} = K\left( {f_{c} ,f_{s} } \right) =\Phi \left( {f_{c} } \right)^{T}\Phi \left( {f_{s} } \right) $$
(11)
$$ M\lambda_{c} a = Ka $$
(12)

The \( k \)th kernel principal component obtained through the KPCA mapping is

$$ p_{k} = \mu_{k}^{T}\Phi \left( x \right) = \sum\limits_{s = 1}^{M} {a_{s} K\left( {x_{s} ,x} \right)} $$
(13)

The first \( k \) kernel principal components are chosen as the feature matrix.
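A sketch of the KPCA projection of Eqs. (8)-(13) follows. The paper does not name its kernel, so the RBF kernel and its `gamma` parameter here are assumptions.

```python
import numpy as np

def kpca_reduce(F, K=12, gamma=0.1):
    """Kernel PCA of feature matrix F (one column per sample).

    Builds an RBF kernel matrix (Eq. 11), centers it in feature space,
    solves the eigenproblem of Eq. (12), and returns the (K, n) matrix
    of kernel principal components (Eq. 13).
    """
    X = F.T                                   # samples as rows
    sq = np.sum(X**2, axis=1)
    Km = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = Km.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = Km - one @ Km - Km @ one + one @ Km @ one   # center Phi in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:K]                # top K eigenpairs
    alphas = vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))
    return (Kc @ alphas).T                            # projections, Eq. (13)

rng = np.random.default_rng(2)
P = kpca_reduce(rng.normal(size=(8, 60)), K=12)
```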

4.2 Renyi Entropy

The more regularly the time-frequency distribution of a signal is arranged, the less information it contains and the smaller its entropy. When the components of a signal are cluttered, it contains more information and the entropy increases. The Renyi entropy of a time-frequency image can be represented as

$$ R^{\alpha } = \frac{1}{1 - \alpha }\log_{2} \iint {P_{x}^{\alpha } }(t,f)dtdf $$
(14)

The entropy order reflects the feature well; this paper uses the Renyi entropies of order \( \alpha \) = 3, 5, 7, 9 and 11 as signal features.
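Equation (14) on a discrete time-frequency image can be sketched as follows. Normalizing the distribution to unit energy before taking the entropy is an assumption; it makes the values comparable across signals.

```python
import numpy as np

def renyi_entropy(tfd, alpha):
    """Renyi entropy of order alpha for a time-frequency distribution, Eq. (14)."""
    P = np.abs(tfd)
    P = P / P.sum()                         # normalize to a unit-sum distribution
    return np.log2(np.sum(P ** alpha)) / (1.0 - alpha)

def renyi_features(tfd, orders=(3, 5, 7, 9, 11)):
    """The five entropy orders used as features in this paper."""
    return np.array([renyi_entropy(tfd, a) for a in orders])

# A uniform 8x8 distribution has entropy log2(64) = 6 at every order,
# while a single concentrated peak has entropy 0 - matching the text's
# claim that concentrated energy means a smaller entropy value.
uniform = np.ones((8, 8))
peak = np.zeros((8, 8)); peak[0, 0] = 1.0
```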

4.3 Higher-Order Statistic

HOS can express the essential features of a signal well, which improves the robustness of the system. The kurtosis and margin of the signal are extracted in the time domain. The average of the time-domain signal \( x\left( t \right) \) is

$$ \bar{X} = \frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i} } \left( t \right) $$
(15)

where \( N \) is the number of samples of \( x\left( t \right) \). The mean-square value can be represented as

$$ X_{rms}^{2} = \frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i}^{2} } \left( t \right) $$
(16)

Thus, the margin of signal is

$$ C_{e} = \frac{{X_{rms} }}{{\bar{X}}} $$
(17)

The kurtosis of signal is

$$ C_{q} = \frac{{\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {\left| {x_{i} } \right| - \bar{X}} \right)^{4} } }}{{X_{rms}^{4} }} $$
(18)

The time-domain signal can be transformed into a frequency-domain signal \( X\left( f \right) \) by the Fourier transform. We extract the kurtosis and margin of the spectrum as the spectral kurtosis and spectral margin features.
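Eqs. (15)-(18) and their spectral counterparts can be sketched as a small feature extractor. Applying the statistics to the magnitude of the sequence, as below, is an assumption consistent with the \( \left| {x_{i} } \right| \) term in Eq. (18).

```python
import numpy as np

def hos_features(x):
    """Margin (Eq. 17) and kurtosis (Eq. 18) of a sequence x, plus the
    same two statistics computed on |FFT(x)| as spectral features."""
    def margin_kurtosis(v):
        v = np.abs(v)
        mean = v.mean()                               # Eq. (15)
        rms = np.sqrt(np.mean(v ** 2))                # sqrt of Eq. (16)
        margin = rms / mean                           # Eq. (17)
        kurt = np.mean((v - mean) ** 4) / rms ** 4    # Eq. (18)
        return margin, kurt

    t_feats = margin_kurtosis(x)                      # time-domain pair
    f_feats = margin_kurtosis(np.abs(np.fft.fft(x)))  # spectral pair
    return np.array(t_feats + f_feats)

# A constant sequence has margin 1 (rms equals mean) and kurtosis 0.
feats = hos_features(np.ones(128))
```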

4.4 Feature Fusion

This paper applies the self-training network LeNet-5 to extract features, and the extracted features are reduced by PCA and KPCA. In addition, Renyi entropy and HOS features are extracted as a supplement to make the feature extraction more complete. However, the value ranges of features from different extraction methods differ, which may greatly affect the accuracy of the subsequent classifier, so the extracted features must each be normalized.

$$ {\varvec{T}} = \left[ {{{\varvec{T}}_{{\varvec{PCA}}}},{{\varvec{T}}_{{\varvec{KPCA}}}},{{\varvec{T}}_{{\varvec{HOS}}}},{{\varvec{T}}_{{\varvec{Renyi}}}}} \right] $$
(19)

where \( \varvec{T}_{PCA} \) is the normalized feature reduced by PCA, \( \varvec{T}_{KPCA} \) the normalized feature reduced by KPCA, \( \varvec{T}_{HOS} \) the normalized HOS feature, and \( \varvec{T}_{Renyi} \) the normalized Renyi entropy feature. After normalization, NMF is used to fuse the features together, which reduces their redundant information. The NMF can be expressed as

$$ \mathop {\hbox{min} }\limits_{{\varvec{W,H}}} f\left( {\varvec{W,H}} \right) = \frac{1}{2}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {\left( {\varvec{T}_{ij} - \left( {\varvec{WH}} \right)_{ij} } \right)^{2} } } $$
(20)

where \( m \times n \) is the size of T, and W and H are two non-negative matrices; W is the fused feature matrix. The fused features are then sent into the classifier to realize classification.
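The objective in Eq. (20) can be minimized with the classical Lee-Seung multiplicative updates; the rank, iteration count, and initialization below are assumptions, not values from the paper.

```python
import numpy as np

def nmf_fuse(T, rank=10, iters=500, seed=0):
    """Factor a non-negative matrix T into W @ H by minimizing the
    squared Frobenius error of Eq. (20) with multiplicative updates.
    W is the fused feature matrix."""
    rng = np.random.default_rng(seed)
    m, n = T.shape
    W = rng.random((m, rank)) + 1e-3
    H = rng.random((rank, n)) + 1e-3
    for _ in range(iters):
        # Lee-Seung updates keep W and H non-negative by construction
        H *= (W.T @ T) / (W.T @ W @ H + 1e-12)
        W *= (T @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# On an exactly rank-5 non-negative matrix the reconstruction error
# should become small.
rng = np.random.default_rng(3)
T = rng.random((20, 5)) @ rng.random((5, 30))
W, H = nmf_fuse(T, rank=5, iters=1000)
err = np.linalg.norm(T - W @ H) / np.linalg.norm(T)
```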

5 Feature Classification

ELM overcomes shortcomings of traditional neural networks such as slow training, a tendency to fall into local optima, and sensitivity to the learning rate [11]. During training, ELM randomly generates the connection weights between the input layer and the hidden layer as well as the thresholds of the hidden layer. There is no need to adjust them during training; a global optimum can be obtained by setting the number of hidden-layer neurons.

The mathematical expression of ELM is

$$ v_{k} = \omega^{T} g\left( {W_{in} u_{k} + b} \right),k = 1,2, \cdots ,N $$
(21)

where \( v_{k} \) is the output vector, \( \omega \) is the output weight, \( g \) is the activation function, \( W_{in} \) is the input weight, \( u_{k} \) is the input vector, \( b \) is the bias of the hidden layer, and \( N \) is the number of samples.

During training, \( W_{in} \) and \( b \) are randomly initialized and kept fixed; the only parameter that needs to be trained is \( \omega \). The detailed calculation is as follows.

$$ \omega = \varvec{H}^{{\mathbf{ + }}} \varvec{I} $$
(22)

where \( \varvec{H}^{{\mathbf{ + }}} \) is the Moore-Penrose generalized inverse of the hidden-layer output matrix H, which can be expanded as

$$ \varvec{H} = \left[ {\begin{array}{*{20}c} {g\left( {W_{in} u_{1} + b_{1} } \right)} & \cdots & {g\left( {W_{in} u_{1} + b_{n} } \right)} \\ \vdots & \ddots & \vdots \\ {g\left( {W_{in} u_{N} + b_{1} } \right)} & \cdots & {g\left( {W_{in} u_{N} + b_{n} } \right)} \\ \end{array} } \right]_{N \times n} $$
(23)

The expected output matrix I is

$$ \varvec{I} = \left( {I_{1} ,I_{2} , \cdots ,I_{N} } \right)^{T} $$
(24)

Thus, the training process of the ELM is a simple linear regression. Once \( \omega \) is found, training is finished.
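Eqs. (21)-(23) amount to a random hidden layer followed by one pseudo-inverse solve. A minimal sketch, with the tanh activation and hidden-layer size as assumptions:

```python
import numpy as np

def elm_train(U, I, hidden=100, seed=0):
    """Train an ELM: random, fixed input weights and biases; output
    weights solved in closed form as omega = H^+ I (Eq. 22).

    U: (N, d) inputs; I: (N, c) one-hot targets. Returns a predictor.
    """
    rng = np.random.default_rng(seed)
    d = U.shape[1]
    W_in = rng.normal(size=(d, hidden))       # random input weights, never trained
    b = rng.normal(size=hidden)               # random hidden biases, never trained
    H = np.tanh(U @ W_in + b)                 # hidden-layer output matrix, Eq. (23)
    omega = np.linalg.pinv(H) @ I             # Moore-Penrose solve, Eq. (22)

    def predict(U_new):
        return np.tanh(U_new @ W_in + b) @ omega   # Eq. (21)

    return predict

# Two well-separated Gaussian clusters should be classified near-perfectly.
rng = np.random.default_rng(4)
U = np.vstack([rng.normal(-2, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
predict = elm_train(U, np.eye(2)[y], hidden=40)
acc = np.mean(np.argmax(predict(U), axis=1) == y)
```

Because only `omega` is fitted, training reduces to one linear least-squares problem, which is what gives the ELM its speed advantage over gradient-trained networks.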

6 Simulation Results and Analysis

To verify the feasibility of the proposed algorithm, the radar signal parameters are set as follows: the sampling frequency is \( f_{s} = 32\,\text{MHz} \), the number of sampling points is \( N = 512 \), the noise is white Gaussian, the pulse width is 10 μs, and the carrier frequency is 10 MHz. According to the signal models in Table 1, we randomly generate the nine types of radar signal at SNRs from −3 dB to 6 dB and perform the time-frequency transform on them. For each SNR, we randomly choose 100 signals of each type. In addition, we generate the nine types of radar signal without noise to train the LeNet-5 network; 300 samples of each type are randomly chosen for time-frequency transformation and image preprocessing.

Figure 4 shows the proportion of each component in the total at 0 dB. For both PCA and KPCA, as the component index increases, the proportion of each subcomponent decreases while the cumulative proportion increases. In PCA, the first 20 subcomponents account for 90% of the total, which is enough to represent most of the feature information; the later components are smaller and smaller and easily disturbed by noise, so they are discarded, giving 20 PCA features. Similarly, the first 12 subcomponents account for 90% of the total in KPCA, so the first 12 KPCA subcomponents are chosen as the feature matrix. If the features were output directly from the C5 convolutional layer, 4096 features would be obtained; after dimension reduction there are only 32, which greatly reduces the computational complexity.

Fig. 4.

The proportion of each component in PCA and KPCA.

Figure 5 shows the recognition rate curves of the training and test sets of the proposed algorithm from −3 dB to 6 dB, where the number of PCA components is 20, the number of KPCA components is 12, each type of signal is randomly divided into training and test sets in a 7:3 ratio, and the results are averaged over 500 repetitions.

Fig. 5.

Recognition rate curves of training set and test set.

Figure 5 shows that the recognition rates of both the training and test sets increase with SNR. The recognition rate of the training set reaches 91% at 0 dB, which indicates that the proposed algorithm performs well. Moreover, the recognition rate trend of the test set is similar to that of the training set and the gap between them is small, indicating that the classifier neither over-fits nor under-fits. Figure 5 confirms that the proposed algorithm is well suited to radar signal modulation classification.

Figure 6 shows that the recognition rates of the test set under all four methods increase with SNR. Moreover, the recognition rate of the algorithm based on LeNet-5 + HOS + Renyi entropy is higher than that of LeNet-5 alone, which proves that the feature fusion in the proposed algorithm improves the recognition rate. The recognition rate of the proposed algorithm levels off above 4 dB, reaching 96% at 4 dB and 78% at the low SNR of −3 dB. In addition, at low SNR the performance of the LeNet-5 features is clearly worse than that of the Renyi entropy features. This is mainly because the signal is chaotic under heavy noise: the Renyi entropy features describe the energy concentration level of the signal, which improves the anti-noise performance effectively, and fusing the HOS features further enhances reliability. As the SNR increases, however, the influence of noise decreases and the ability of the Renyi entropy features to characterize the modulation weakens; conversely, LeNet-5 extracts signal features completely and accurately, so its recognition rate surpasses that of Renyi entropy above 3 dB. In summary, although fusing the LeNet-5 features with the HOS and Renyi entropy features introduces a small amount of cumulative error, the fused features express the modulation information more comprehensively and the performance is further improved.

Fig. 6.

Recognition rate of test set based on four algorithms.

7 Conclusion

A multi-dimensional feature fusion modulation classification algorithm based on a self-training network is proposed in this paper. The algorithm applies the LeNet-5 network to extract modulation features automatically, which solves the problems that traditional algorithms extract features incompletely and that deep neural networks are not suited to small-sample training. The Renyi entropy and HOS features are fused by NMF to increase the recognition rate. Simulation results show that modulation classification based on the fused features outperforms the other classification algorithms and performs well at low SNR.