1 Introduction

The electrocardiogram (ECG) is a standard measurement widely used in the diagnosis and monitoring of cardiovascular diseases. It reflects the electrophysiological activity of the heart to a certain extent and helps doctors diagnose diseases accordingly [7]. However, the automatic classification of arrhythmias is a challenging task because ECG signals vary significantly between patients and under different conditions.

Deep-learning-based approaches have been applied in many fields, including ECG classification. Following this line of work, Kiranyaz et al. [3] developed an adaptive implementation of a one-dimensional convolutional neural network (1D-CNN), which uses the CNN for both feature extraction and classification and achieved excellent results. Although a CNN is well suited to extracting the morphological features within a beat, it cannot exploit the information between beats. Researchers have therefore tried other methods that pay more attention to the features shared among adjacent beats. Zhang et al. [9] proposed combining recurrent neural networks (RNN) with clustering techniques to classify ECG beats: a representative training set is obtained by clustering, and the beat morphology information is fed directly into the RNN to obtain the classification results.

In this work, we propose a novel end-to-end multi-label classification model that combines neural networks (NN) with the ECG characteristic points. The ECG waveforms from all 12 leads are first fed into a CNN to extract internal morphological features, which are then passed to a bidirectional RNN (BiRNN) to learn the relationship between the current beat and its adjacent beats. In addition, the median beat and the normalized coordinates of each characteristic point are fed into two further input channels to extract more relevant features. The experimental results on the dataset of The First China ECG Intelligent Competition demonstrate that the proposed scheme achieves superior multi-label classification performance.

2 System Framework

The framework of our proposed system is shown in Fig. 1. First, the raw ECG data is denoised and then normalized. Next, we use the DPI algorithm [5] to detect the positions of \(R_{peak}\). Afterward, with the help of the random walk algorithm [8] that we designed earlier, we obtain the other characteristic points with high precision. Meanwhile, the normalized data is realigned according to the coordinate of the first \(R_{peak}\). If the record is then longer than 10000 samples, we cut off the extra tail; otherwise, we pad the tail with the front part of the record. This gives us three input channels for the classification model. The processed data is randomly divided into a training set, a testing set, and a validation set. Finally, our model is trained and then evaluated on the validation set.

Fig. 1. The framework of the whole algorithm.

2.1 Denoising

A wavelet method is used to filter the noise from the raw ECG data. First, the raw ECG signal is decomposed into nine scales using the Dual-Tree Complex Wavelet Transform (DTCWT) proposed by Selesnick [6]. Then only the information in levels three to eight is retained to reconstruct the signal, while the remaining information is treated as noise and discarded.

2.2 Alignment and Padding

We chose the DPI algorithm to detect the \(R_{peak}\) position of each beat. All leads of the normalized ECG data are then realigned according to the position of the first \(R_{peak}\) (\(R_{0}\)), and the points before \(R_{0}\) are discarded. When the record is shorter than 10000 points, we pad each lead by appending the head of the signal to its tail until the record forms a 10000 \(\times \,\)12 matrix. Otherwise, the points beyond the first 10000 are discarded.
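The alignment and padding step can be sketched for a single lead as follows. This is a minimal illustration, not the authors' implementation: the function name `align_and_pad` is hypothetical, the R-peak index `r0` is assumed to come from a separate detector, and wrapping around more than once for very short records is an assumption, since the text only states that the head is padded to the tail.

```python
def align_and_pad(lead, r0, target_len=10000):
    """Align one lead to its first R peak and force it to target_len samples.

    lead       : sequence of amplitude values for one lead
    r0         : index of the first R peak (assumed to be detected elsewhere)
    target_len : fixed record length, 10000 in the text
    """
    aligned = list(lead[r0:])            # drop the samples before R_0
    if not aligned:
        raise ValueError("no samples left after alignment")
    if len(aligned) >= target_len:
        return aligned[:target_len]      # discard the excess tail

    # Pad by copying samples from the head of the aligned signal to its tail.
    # Assumption: wrap around again if one copy of the head is not enough.
    padded = list(aligned)
    i = 0
    while len(padded) < target_len:
        padded.append(aligned[i % len(aligned)])
        i += 1
    return padded


# Toy example with a short target length so the result is easy to read.
print(align_and_pad([9, 9, 1, 2, 3], r0=2, target_len=7))
```

Applying the same function to each of the 12 leads with a shared `r0` gives the 10000 \(\times \,\)12 record described above.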

2.3 Characteristic Points Detection

In our previous work [8], we proposed a fast ECG delineation scheme that leverages wavelet transform and machine learning techniques to detect characteristic points in ECG waveforms. With the RSWT (randomly selected wavelet transform) feature pool, we build a random forest regressor for each type of characteristic point. Each regression tree is trained to estimate the probability distribution of the direction toward the target point, relative to the current position. We then devise a random walk testing scheme to refine the final position of each ECG characteristic point.

Our model trained on the QT database [4] can infer eight types of ECG characteristic points: \(P_{onset}\), \(P_{peak}\), \(P_{offset}\), \(R_{onset}\), \(R_{peak}\), \(R_{offset}\), \(T_{peak}\), and \(T_{offset}\) (\(T_{onset}\) is not annotated in the QT database). Since \(R_{peak}\) is already located by the DPI algorithm, we use the random walk algorithm to generate the other seven characteristic points from the processed ECG data. We collect 160 points from 20 consecutive beats (eight points per beat), and each point contributes two features: its normalized abscissa and ordinate.
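As a rough sketch of how 160 points turn into a feature vector, the snippet below flattens (position, amplitude) pairs into normalized coordinates. The text does not state how the coordinates are normalized, so dividing the abscissa by the record length and the ordinate by an amplitude scale are assumptions, and the name `point_features` is hypothetical.

```python
def point_features(points, record_len, amp_scale):
    """Flatten (sample_index, amplitude) pairs into normalized coordinates.

    points     : list of (index, amplitude) pairs, e.g. 8 points x 20 beats
    record_len : number of samples in the record (assumed abscissa scale)
    amp_scale  : positive amplitude used to scale the ordinate (assumed)
    """
    features = []
    for index, amplitude in points:
        features.append(index / record_len)      # normalized abscissa
        features.append(amplitude / amp_scale)   # normalized ordinate
    return features


# 160 synthetic points on one lead give 160 * 2 = 320 feature values.
demo_points = [(i * 50, 0.5) for i in range(160)]
demo_features = point_features(demo_points, record_len=10000, amp_scale=2.0)
print(len(demo_features))
```

With two features per point, the 160 points of one lead yield 320 values, which is consistent with the length of 320 used for the characteristic points channel later in the paper.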

2.4 Median Beat Extraction

The ECG waveform of each person has its own specificity, and the waveforms in a static ECG follow their own morphological patterns. We therefore aim to extract the most representative heartbeat as a simple reference.

General Electric (GE) has conducted in-depth research on this issue [2], and its final solution is the median heartbeat algorithm. The basic idea is to use the pre-detected \(R_{peak}\)s to cut out each heartbeat in a record and then align all the obtained heartbeats according to their \(R_{peak}\)s. For each time point, the median of the amplitude values at that point across all heartbeats is taken as the amplitude of the median heartbeat at the same point. This yields a representative median beat that characterizes the current record for later processing (Fig. 2).

Fig. 2. Example of the Median Beat.
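The median beat idea described above can be sketched for one lead as follows. This is an illustrative version under stated assumptions rather than the GE algorithm itself: the window sizes `pre` and `post` around each R peak are hypothetical parameters, and beats that do not fit entirely inside the record are simply skipped.

```python
from statistics import median


def median_beat(lead, r_peaks, pre, post):
    """Median beat of one lead from windows of [r - pre, r + post) samples.

    lead    : sequence of amplitude values for one lead
    r_peaks : indices of the pre-detected R peaks
    pre     : samples kept before each R peak (assumed parameter)
    post    : samples kept from each R peak onward (assumed parameter)
    """
    beats = [
        lead[r - pre:r + post]
        for r in r_peaks
        if r - pre >= 0 and r + post <= len(lead)   # skip beats cut by the edges
    ]
    if not beats:
        raise ValueError("no complete beat found")
    # zip(*beats) groups the samples that share the same offset from the R peak,
    # so the median is taken point by point across the aligned beats.
    return [median(column) for column in zip(*beats)]


# Three toy beats; the second one is the point-wise median.
toy_lead = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0, 0, 9, 7, 3, 0]
print(median_beat(toy_lead, r_peaks=[2, 7, 12], pre=2, post=3))
```

Because the median is taken per time point, a single outlier beat (such as the value 9 in the toy record) does not distort the resulting template the way a mean would.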

2.5 Classification

We build an end-to-end model to accomplish the multi-label classification task. The model has three input channels: the ECG channel, the median beat channel, and the characteristic points channel. The ECG channel learns the overall morphological features of the ECG waveform. The median beat channel focuses on the most significant and representative features in each lead. Since most of the clinically useful information in an ECG can be inferred from the intervals and amplitudes of its characteristic points, we add a third input channel that carries the position and amplitude information of the 8 characteristic points (\(P_{onset}\), \(P_{peak}\), \(P_{offset}\), \(R_{onset}\), \(R_{peak}\), \(R_{offset}\), \(T_{peak}\), and \(T_{offset}\)) to improve performance.

The final classification result is obtained from the features provided by all three input channels. More details about our model are presented in the following section.

3 Classification Model

This model is used for both feature extraction and classification of the 12-lead ECG signals. The network architecture of our model is shown in Fig. 3. The network takes the three input channels as input and outputs a vector of multi-label predictions.

To achieve multi-label classification, we choose the sigmoid function rather than the softmax function as the activation of the final dense layer. The sigmoid function activates each output node independently, so each node outputs its own probability that the corresponding label is present. Moreover, the binary cross-entropy loss function is used, so that training continuously reduces the cross-entropy between the output and the label. This is equivalent to pushing the output of a node whose label is one closer to one, and the output of a node whose label is zero closer to zero.
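The behavior of a sigmoid output layer with binary cross-entropy can be illustrated with a few lines of plain Python. This is only a sketch of the two functions, not the training code of the model; the example logits and the 0.5 decision threshold are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy over the output nodes."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical logits for three labels of one record.
logits = [2.0, -1.0, 0.5]
probs = [sigmoid(z) for z in logits]          # each node is scored independently
predicted = [int(p >= 0.5) for p in probs]    # assumed threshold of 0.5

print(predicted)
print(binary_cross_entropy([1, 0, 1], probs))
```

Unlike softmax outputs, these probabilities are not forced to sum to one, so several labels can be predicted for the same record.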

Fig. 3. The structure of the multi-label classification model.

In the ECG channel, the shape of the input data is N \(\times \,\)10000 \(\times \,\)12, where N is the number of records. The input is first fed into four sequential CNN blocks, which effectively extract the interior morphological features of the 12-lead ECG. As Fig. 3 shows, each CNN block consists of a 1-D convolution layer, a leaky ReLU layer, and a dropout layer; we chose this structure for its stronger ability to learn interior features. The CNN blocks are followed by another convolution layer with a large kernel size, which helps to enhance the relevance between those interior features. Next, the features are sent to the BiRNN layer, which considers them in context and thus captures more relevant information between adjacent beats. Finally, an AttentionWithContext layer balances the weights between different combinations of features and focuses on the essential parts.

In the median beat channel, the input shape is N \(\times \,\)400 \(\times \,\)12. Because this input is relatively small, it passes through only two CNN blocks to learn the most representative features of each lead. These are likewise followed by a BiRNN layer and an AttentionWithContext layer. This channel may pay more attention to the relevance among the 12 leads and to the information implied by specific groups of leads.

In the characteristic points channel, the input shape is N \(\times \,\)320 \(\times \,\)12. This input passes through only one CNN block, which is connected directly to the BiRNN layer and the AttentionWithContext layer. The characteristic point features help the model account for the intervals between different waves and for sudden changes in amplitude, which is quite effective for identifying certain diseases.

4 Results and Discussion

The First China ECG Intelligent Competition [1] provides a dataset of 6500 12-lead records (I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5, and V6) digitized at 500 Hz, each with corresponding multi-labels. Each record may contain more than one arrhythmia, and there are eight arrhythmias in total (AF, FDAVB, CRBBB, LAFB, PVC, PAC, ER, and TWC). We randomly split the dataset into 5000, 1000, and 500 records as the training set, testing set, and validation set, respectively. The distribution of each arrhythmia is kept similar across the three sets. Note that all of the following experimental results are averages over ten repetitions of the experiment, each with a new random split of the dataset.

Our proposed algorithm achieves excellent performance on the validation set. Classification performance is measured with four standard metrics: classification accuracy (Acc.), sensitivity (Sen.), positive predictivity (Ppr.), and F1 score (F1). Using the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), the metrics are defined as follows. Acc. is the ratio of the number of correctly classified patterns to the total number of patterns classified: \(Acc.=(TP+TN)/(TP+TN+FP+FN)\). Sen. is the rate of correctly classified events among all events: \(Sen.=TP/(TP+FN)\). Ppr. is the rate of correctly classified events among all detected events: \(Ppr.=TP/(TP+FP)\). F1 is the harmonic mean of Sen. and Ppr.: \(F1=2\times Sen.\times Ppr./(Sen.+Ppr.)\). The results are shown in Table 1. As the table shows, the overall performance of our algorithm is superior, and the performance on CRBBB is particularly outstanding.
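The four metrics defined above can be computed from the confusion counts of a single label as in the sketch below. The function name `class_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the paper.

```python
def class_metrics(tp, fp, tn, fn):
    """Acc., Sen., Ppr. and F1 for one label from its confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: report 0.0 when a ratio is undefined.
    sen = tp / (tp + fn) if tp + fn else 0.0
    ppr = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * sen * ppr / (sen + ppr) if sen + ppr else 0.0
    return {"Acc": acc, "Sen": sen, "Ppr": ppr, "F1": f1}


# Made-up counts for one arrhythmia on a 100-record set.
for name, value in class_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

In a multi-label setting these counts are collected separately for each arrhythmia, treating every label as its own binary decision.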

To show the significance of the median beat channel and the characteristic points channel, we trained four models with different configurations. The results of these control experiments are shown in Table 2, from which several observations can be made. First, with the median beat embedded alone, the overall F1 score improves considerably. Note that its performance on Normal and ER is markedly better than that of the characteristic points (CP) configuration. We argue that this is because the median beat can distinguish between normal and abnormal ECG waveforms. The results with the characteristic points embedded alone are also encouraging. For instance, the results on CRBBB and PVC are even slightly better than those of the model combining the two channels. Moreover, for arrhythmias such as PVC and TWC, whose characteristic points possess distinguishing features, the results are better, as expected. The combined model achieves a sharp improvement over the baseline, which demonstrates that the median beat and characteristic points are highly effective for multi-label ECG classification.

Table 1. Evaluation Results on the Validation Set
Table 2. Control Experiments for the Median Beat and Characteristic Points

5 Conclusion

In this paper, we propose a novel end-to-end ECG classification model with extra input channels for the median beat and the characteristic points. We use a CNN to extract interior morphological features, and a BiRNN to consider the features between beats in context. The median beat helps represent the most significant features across adjacent beats and leads. The position and amplitude information of the characteristic points enhances the associations between features, and it performs better on arrhythmias that depend more on interval and amplitude changes in the waveform. The evaluation results on the dataset of The First China ECG Intelligent Competition show that our proposed system achieves superior multi-label ECG classification performance.