Introduction

Previously designated the 2019 novel coronavirus (2019-nCoV), the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and the disease it causes (COVID-19) have precipitated a global outbreak, and the World Health Organization (WHO) has declared it a pandemic [1]. The virus has spread rapidly around the world since its emergence in Wuhan, China, at the end of 2019 [2, 3]. Although early reports linked the outbreak to a Wuhan animal market, suggesting animal-to-human transmission, subsequent studies confirmed human-to-human transmission [4] through droplets and inadequate social distancing. While the virus spreads expeditiously, costly Polymerase Chain Reaction (PCR) testing and inefficient isolation procedures still stand in the way of proper treatment. We therefore need an economical and straightforward testing approach that can scale with the speed of transmission, so that infected persons can be promptly identified and isolated. Containing the proliferation of this malignant virus, which continues to ravage the world, is thus a most decisive task.

The intricate PCR test is the only WHO-approved method for detecting the novel COVID-19 disease. In most countries, people cannot afford this verification cost; as a result, patients die of the fatal virus without receiving proper treatment. Moreover, the virus affects major organs including the lungs, heart, and brain. A study published in Nature reported that it directly induces lung injury and deteriorates the respiratory system [5]. There is also a considerable set of features and observations by which the infected regions of a lung image can be distinguished. Establishing a detection model that classifies COVID-19 disease from Chest X-Ray (CXR) or CT scan images would therefore be a beneficial and significant development for society.

Researchers around the world are continuously trying to build time-efficient and cost-effective models to identify COVID-19 disease, adopting CXR and CT scan images to classify infected lungs. Disparate types of AI-based architectures have been developed to recognize infected lung images efficiently [6,7,8,9]. Among these AI-based methods, machine learning and deep learning architectures stand out in most COVID-19 classification tasks [10,11,12]. One of the biggest hindrances researchers face, however, is the deficiency of data: training a model efficiently requires a reasonable number of subject images, but only an insufficient number of COVID-19-affected lung images is available for research. Consequently, image processing or machine learning models can hardly segregate COVID from non-COVID images. To cope with this dataset complication, researchers are leaning towards deep learning architectures because of their augmentation and transfer learning capabilities [13, 14]. Various CNN models such as Inception Net, XceptionNet, ResNet, and VGGNet are the prominent architectures employed in this line of research.

Studying the existing classification models, we have adopted the High-Resolution Network (HRNet) for feature extraction. In HRNet, convolutional streams of different resolutions are connected in parallel, with plentiful interactions across the low- and high-resolution streams that bolster its internal representation ability. Because the high-resolution feature representation is maintained throughout training, the network prevents the loss of small-target information in the feature map. For vision problems such as small-target segmentation and classification, this parallel design makes HRNet more accurate and precise. In summary, our contributions in this research are as follows:

  • Firstly, we propose a COVID-19 classification method based on a high-resolution network for feature extraction, which provides competitive results compared with the existing architectures.

  • Secondly, we integrate UNet for lung segmentation along with HRNet to classify the COVID-affected region precisely. This addition improves the results significantly and ensures that the model attends to the infected lung region rather than redundant, non-relevant areas.

  • Finally, we conduct a performance comparison by implementing existing advanced models and evaluating them against our proposed work. From the experimental results, we can affirm that the proposed model surpasses the existing models in accuracy, sensitivity, specificity, and other evaluation metrics.

The rest of this paper is organized as follows. In section “Related Works”, we provide a study of the related works in this area of research. Subsequently, in section “Proposed Model”, we discuss our proposed model, including a detailed description of our dataset and classification network. In section “Experimental Analysis”, the experimental analysis is presented in detail, followed by the performance evaluation. Finally, in section “Conclusion”, we conclude the paper and outline significant future work.

Related Works

Over the years, numerous works have been established to detect COVID-19 disease from distinct perspectives. Researchers around the world have tried to come up with models that can classify this disease efficiently within a short amount of time. Various transfer learning approaches have attracted attention because of the insufficient amount of COVID-19-affected lung images [15,16,17]. In this section, the existing works on COVID-19 classification are thoroughly described with appropriate characterization and depiction.

Apostolopoulos et al. [14] proposed an architecture based on transfer learning for feature extraction. The authors first employed a pre-trained CNN model, used only as a feature extractor, to extract features of a different nature; this was done by applying transfer learning. The extracted features were then fed into a dedicated network for classification. Although the authors accomplished exceptional results, they did not address the handling of negative transfer.

Borghesi et al. [18] introduced a Chest X-Ray (CXR) scoring system named the Brixia score to determine the outcome (recovery or death). Dividing the lungs into six zones on the frontal chest projection, the authors assigned each zone one of four scores (Score 0: no abnormalities, up to Score 3: alveolar predominance or critical condition). The six zone scores were then aggregated into a final score ranging from 0 to 18. For validation, the weighted kappa \((k_\mathrm{w})\), confidence interval (CI) and P-values were calculated. Although the scoring system is a unique way to identify the disease, the experiment should be applied to a more considerable amount of CXR images.

Leveraging the multi-resolution capability of the inception module, Das et al. [19] built a truncated Inception Net architecture combined with an adaptive learning rate protocol. In this model, kernels with disparate receptive fields are executed in parallel for feature extraction, and the extracted features are combined depth-wise to obtain the final output. Because of the diminutive dataset used, the Inception Net is truncated at a particular point of the model. An accuracy of 99.92% was achieved when classifying COVID-19 cases combined with pneumonia cases.

A COVID-19 detection model considering multi-class and hierarchical classification was developed by Pereira et al. [20]. To address the natural data imbalance, a resampling algorithm for rebalancing the class distribution was applied. For feature extraction, Inception V3 and texture descriptors such as Local Binary Pattern (LBP), Local Phase Quantisation (LPQ), Local Directional Number pattern descriptor (LDN), Locally Encoded Transform Feature Histogram (LETRIST), and Binarized Statistical Image Features (BSIF) were employed, with early and late fusion techniques applied to the feature descriptors. The authors then used the resampling operation to rebalance the class distribution in both the flat multi-class and the hierarchical setting. Macro-average \(F_{1}\) scores of 0.65 and 0.89 were achieved using the multi-class approach and the hierarchical classification respectively. In an unbalanced environment, this architecture stands out among the related existing works.

A 19-layer CNN architecture was proposed by Ozturk et al. [21] for binary COVID-19 classification. The authors also covered the multiclass problem by adding the Pneumonia class to the COVID and No Finding classes. DarkNet-19, adopted by the authors, is based on the prominent real-time object detection model YOLO. In the proposed model, a total of 17 layers were built from 2D convolutional layers of different dimensions and trainable parameters, one flatten layer and one linear layer, with each convolutional layer followed by batch normalization and a LeakyReLU operation. The authors attained an accuracy of 87.02% for the multi-class case and 98.08% for the binary case.

A highly diverse, long-range selectively connected architecture was proposed by Wang et al. [22]. The machine-driven design strategy is leveraged by generative synthesis [23]. A PEPX module (conv\(1\times 1\) + conv\(1\times 1\) + DWconv\(3\times 3\) + conv\(1\times 1\) + conv\(1\times 1\)) was assembled with general convolutional, flatten and fully connected layers, and softmax was used for the final classification. Although an accuracy of 93.3% was achieved with this network, the long-range connections in the densely connected DNN produce memory overhead and make the architecture computationally expensive. Moreover, the heterogeneous incorporation of convolutional layers with different kernels and grouping configurations heavily affects the interconnection and operation of the architecture.

Narin et al. [24] worked with pre-trained models, namely ResNet50, InceptionV3 and Inception-ResNetV2. Because of the diminutive amount of data, transfer learning was incorporated to reduce the training time and compensate for the deficiency in the dataset. The input images were first fed into the pre-trained models integrated with transfer learning; during training, Global Average Pooling and a fully connected layer with ReLU were employed, followed by a final fully connected layer with softmax for classification. The model achieved 97%, 98% and 87% accuracy with the InceptionV3, ResNet50 and Inception-ResNetV2 architectures respectively. Although transfer learning was incorporated to address the deficiency in the dataset, the model overfits the data.

Working on both imaging modalities (CT scans and chest X-rays), Mukherjee et al. [25] proposed a Deep Neural Network (DNN) model to detect COVID-19 disease. The authors structured a nine-layer DNN architecture from convolutional, max pooling, and dense layers. Employing this DNN model on 672 images (336 CXR and 336 CT scans), the authors achieved an overall accuracy of 96.28%, an AUC of 98.08% and a False Negative Rate of 2.08%. To validate the experimental results, the model was evaluated with 10-fold cross-validation, which ensures that each image is considered for training and testing at least once. Another deep learning COVID-19 model was proposed by Mukherjee et al. [26], a shallow Convolutional Neural Network consisting of four layers. This lightweight CNN architecture is trained with fewer parameters, which reduces the time complexity while focusing on the meticulous lung-affected areas (features). The authors validated their proposed model on 321 COVID-19-affected CXR images together with 5856 non-COVID-19 images. Considering possible bias, the authors adopted a 5-fold cross-validation technique and achieved 99.69% accuracy.

Table 1 Analysis of the existing COVID-19 classification works

From the studied research works, we have summarized the methods, datasets and performance in Table 1. Although these deep learning methods perform well on this classification task, the insufficient number of images carries a high risk of bias and oversampling in the learning process. In addition, feature information needs to be incorporated in every layer of the high-to-low and low-to-high resolution processes, which the existing models do not do. To overcome these difficulties, we employ the High-Resolution Network (HRNet) for feature extraction in this classification task. Moreover, the existing models do not address segmentation, even though it plays a critical role in training an architecture: the redundant areas of a lung image should be excluded so that the model works on the infected lung portion only. For that purpose, we have integrated the UNet architecture for segmentation. In summary, we have built a COVID-19 detection architecture based on UNet (for segmentation) and HRNet (for feature extraction).

Proposed Model

In this section, we discuss our approaches to data pre-processing and our proposed research methodology, with proper illustration and characterization. To build this model, we explored several pre-processing and classification techniques, including supervised machine learning, deep learning, and transfer learning. In the extant research on COVID-19 detection, supervised learning focuses on binary classification (COVID vs. Non-COVID) or multiclass classification (COVID vs. Pneumonia vs. Normal lung conditions) [41]. Studying these existing works and considering the drawbacks described in the literature review, we introduce HRNet [42] for feature extraction, combined with UNet [43] for segmentation. The segmentation step is needed because the publicly available COVID CXR images contain several redundant marks, cropped lung regions, regions shifted in different directions, etc. HRNet, introduced here to the field of COVID-19 detection from CXR images, has the capability of avoiding the loss of small features in the images. After feature extraction, a classification head composed of fully connected layers classifies COVID vs. Non-COVID images. Furthermore, we assemble a standard dataset for COVID detection from chest X-ray images drawn from several public data sources; these sources are updated every day, giving researchers opportunities to focus more on COVID detection from CXR images. The proposed architecture is illustrated in Fig. 1. In the following, we thoroughly describe the steps of the proposed method.

Fig. 1

Proposed model to classify COVID-19 disease using HRNet

Dataset Collection and Preparation

The dataset for the classification task was built by assembling images from several acknowledged sources. Firstly, the COVID-19 CXR dataset used in our experiment was collected from a public GitHub repository [27]. As of July 03, 2020, this repository contained 759 images; its class distribution is depicted in Fig. 2. In this repository, 521 images are labeled as COVID-19 and 12 images as Acute Respiratory Distress Syndrome (ARDS) COVID-19, so a total of 533 images were collected from this source for our primary COVID-19 dataset. Although this has been the most prominent repository since the beginning of COVID-19 classification research, the images have some shortcomings. In particular, the collection gathers images from several publicly available resources such as websites and PDF documents, so the images vary in size and quality. There are also a few side-view images, whereas the majority are frontal views, and some images have color issues, contain markers, etc. In Fig. 3, examples of COVID-19 images are depicted, illustrating these issues: side view (Fig. 3a), washed out (Fig. 3b), color issue (Fig. 3c) and markers (Fig. 3d).

Fig. 2

A class distribution representation of the public repository

For non-COVID or normal images, we collected images from the National Institutes of Health (NIH) Chest X-Ray dataset [30], which contains 108,948 frontal-view X-ray images of 32,717 unique patients with 14 condition labels, including normal condition and pneumonia. Unlike the COVID dataset, these images are of uniform dimensions: all are resized to 1024 \(\times\) 1024 in portrait orientation. As most of the images in the COVID dataset belong to adult patients, we applied an age threshold of 18 years or older to the normal-condition images to keep the X-ray conditions as similar as possible. We also explored CheXpert [44], which contains 224,316 chest radiographs with 14 labeled observations of 65,240 patients, of which 16,627 images are Normal (non-COVID). In Table 2, we summarize the properties of the final dataset created for the evaluation of our proposed architecture.
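As an illustration of this filtering step, the sketch below assumes the column names of the public NIH metadata file (Data_Entry_2017.csv); file and column names may differ in a local copy.

```python
# Sketch of the age filtering on the NIH metadata, assuming the standard
# Data_Entry_2017.csv layout ('Finding Labels', 'Patient Age'); these names
# come from the public NIH release and may need adjusting locally.
import pandas as pd

meta = pd.read_csv("Data_Entry_2017.csv")

# Keep only images labeled as normal ("No Finding") from adult patients.
normal_adults = meta[
    (meta["Finding Labels"] == "No Finding") & (meta["Patient Age"] >= 18)
]

print(f"{len(normal_adults)} candidate non-COVID images after filtering")
```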

Table 2 Dataset preparation
Fig. 3

Variations observed in COVID dataset [a side view, b washed out, c color issue, d and e markers, f and g medical equipment]

As described earlier, the COVID dataset contains lateral X-ray images, images taken from PDF files, marked images (Fig. 3d, e), etc. We first removed these images from the COVID dataset, as they could bias the classifier. Secondly, in the NIH and CheXpert datasets, some images contain medical equipment (Fig. 3f, g), which creates redundancies and abnormalities while training the model; these images were also excluded from the main data repository. Even so, most of the remaining images contain several marks around the lung area. To avoid these marks, we segmented the lung area from the images; the next section describes the lung segmentation and pre-processing steps that yield the practical COVID dataset for our experiment. In Fig. 4, single random images from the COVID, NIH, and CheXpert datasets are compared. Since the Normal images from NIH and CheXpert look similar, we aggregated them.

Fig. 4

Image comparison of COVID and non-COVID datasets [a COVID dataset, b NIH dataset, and c CheXpert dataset]

Lung Segmentation

None of the datasets mentioned above contain annotations for the lung area. Thus, to segment the lungs, we collected a dataset with annotated lung areas and trained UNet [43] from scratch. Belonging to the semantic segmentation family, UNet was created specifically for medical image segmentation and has proven its worth in recent segmentation tasks. Another strength of UNet is that it makes strong use of augmented data, and its contracting and symmetric expanding paths enable precise localization.

Built upon the Fully Convolutional Network (FCN), UNet is symmetric and uses skip connections that pass local information to the global context during upsampling. As a consequence of this symmetry, the network has an extended number of feature maps in the intermediate connections, which allows information to be transferred. UNet has three distinct parts: the downsampling path (contraction), the bottleneck, and the upsampling path (expansion). While the contracting path is built from basic convolutional operations, the expanding path is constituted of transposed 2D convolutional layers.
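To make this structure concrete, the following is a minimal PyTorch sketch of the UNet idea; the channel widths and depth are illustrative and do not reproduce the exact configuration of Table 4.

```python
# Minimal sketch of UNet: a contracting path, a bottleneck, and an expanding
# path with skip connections. Widths/depth are illustrative assumptions.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = double_conv(1, 32)       # grayscale CXR input
        self.down2 = double_conv(32, 64)
        self.bottleneck = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.conv2 = double_conv(128, 64)     # 64 skip + 64 upsampled channels
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.conv1 = double_conv(64, 32)
        self.out = nn.Conv2d(32, 1, 1)        # one-channel lung mask

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        b = self.bottleneck(self.pool(d2))
        u2 = self.conv2(torch.cat([self.up2(b), d2], dim=1))   # skip connection
        u1 = self.conv1(torch.cat([self.up1(u2), d1], dim=1))  # skip connection
        return torch.sigmoid(self.out(u1))

mask = MiniUNet()(torch.randn(1, 1, 512, 512))  # 512x512 input, as in the paper
```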

For this segmentation task, we used the dataset from Jaeger et al. [45]. It contains a total of 800 frontal X-ray images: 138 from the Montgomery County chest X-ray dataset of the Department of Health and Human Services, Montgomery County, Maryland, USA, and 662 from the Shenzhen chest X-ray dataset of Guangdong Medical College, Shenzhen, China. Chest X-rays from these sources are visually similar to the dataset of 2472 images we selected for our COVID detection task. A side-by-side comparison of our COVID dataset images and the segmentation dataset images is depicted in Fig. 5.

Fig. 5

Image comparison used in lung segmentation [a COVID dataset, b Montgomery dataset, and c Shenzhen dataset]

Every image in the Montgomery dataset is either 4020 \(\times\) 4892 or 4892 \(\times\) 4020 pixels, while images in the Shenzhen dataset vary but are approximately 3000 \(\times\) 3000 pixels. We resized these images and their corresponding lung masks to 512 \(\times\) 512 pixels to train our model. Moreover, the images and their corresponding masks were augmented to take advantage of the UNet model; the augmentation procedure and parameters are described in the data augmentation section, and the training process with its parameters is explained in the “Experimental Analysis” section. After training the UNet model on this segmentation dataset, we used the resulting weights to obtain segmented lung images from the COVID dataset. The UNet model provides a prediction mask for every image; we applied 8 iterations of dilation and 2 iterations of closing on the predicted masks to recover more information from the edges of the lungs. To ensure the quality of the segmentation, we then applied a region-based threshold: for every image, we computed the total area of the segmented regions and required it to exceed 50,000 pixels. This threshold value was chosen by trial and error while observing the output of the segmentation model. After applying the area threshold, we obtained our final COVID dataset, consisting of 410 COVID images and 500 non-COVID images from the NIH and CheXpert datasets. The working procedure of this segmentation method is summarized in Algorithm 1 (Fig. 6).

Fig. 6

Segmented region after applying dilation and closing to the UNet output

Algorithm 1 Lung segmentation and mask post-processing
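As a concrete illustration of the mask post-processing in Algorithm 1, the following OpenCV sketch applies the stated 8 dilation iterations, 2 closing iterations, and the 50,000-pixel area threshold; the structuring-element size is an assumption, as the paper does not state it.

```python
# Post-processing sketch for the UNet prediction masks, under the stated
# dilation/closing iteration counts and total-area threshold.
import cv2
import numpy as np

def postprocess_mask(pred_mask, area_threshold=50_000):
    """pred_mask: uint8 binary mask (0/255) predicted by UNet at 512x512."""
    kernel = np.ones((3, 3), np.uint8)           # assumed structuring element
    mask = cv2.dilate(pred_mask, kernel, iterations=8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=2)

    # Total area of all segmented regions must pass the quality threshold.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    total_area = stats[1:, cv2.CC_STAT_AREA].sum()  # skip background label 0
    return mask if total_area >= area_threshold else None
```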

Data Augmentation

For better generalization of the classification model, we augmented both the segmentation dataset for the UNet model and the pre-processed, segmented COVID dataset. In both cases, we applied the same set of augmentations, such as scaling, padding, cropping, rotation, gamma correction, slight Gaussian blur, random noise, and salt-and-pepper noise, each applied with a probabilistic value. Figure 7 demonstrates the output of image augmentation, and Table 3 lists the augmentation parameters, which were used for both the segmentation dataset and the COVID dataset and were selected empirically.
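The paper does not name the augmentation library used; as one way to realize these operations, the following sketch uses albumentations, with placeholder probabilities and ranges rather than the exact values of Table 3.

```python
# Illustrative augmentation pipeline matching the operations listed above.
# All probabilities and ranges are placeholder values, not Table 3's.
import albumentations as A

augment = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1,
                       rotate_limit=10, p=0.5),      # scale / shift / rotate
    A.RandomCrop(height=480, width=480, p=0.3),      # crop
    A.PadIfNeeded(min_height=512, min_width=512),    # pad back to 512x512
    A.RandomGamma(gamma_limit=(80, 120), p=0.3),     # gamma correction
    A.GaussianBlur(blur_limit=(3, 5), p=0.2),        # slight Gaussian blur
    A.GaussNoise(p=0.2),                             # random noise
    # salt-and-pepper noise is omitted here; it may require a custom
    # transform depending on the library version.
])

# Applying one transform call to an image and its lung mask keeps the pair
# aligned, which is what UNet training requires:
# out = augment(image=img, mask=mask)
# img_aug, mask_aug = out["image"], out["mask"]
```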

Fig. 7

Sample images after augmentation

Table 3 Image augmentation parameters

Classification with HRNet

High-Resolution Net (HRNet) [42] is a state-of-the-art neural network architecture for feature extraction, and most recent HRNet-based architectures use it as the backbone of their models. Of the two common approaches, top-down and bottom-up, HRNet follows the top-down approach: it detects the subject first, establishes a bounding box around it, and then estimates the significant features. Moreover, HRNet relies on repeated multi-scale fusions instead of a single high-to-low and low-to-high resampling process. In the following, the HRNet architecture is briefly characterized.

Fig. 8

A general architecture of HRNet for feature extraction

The fundamental working procedure is to compute lower-resolution and higher-resolution subnetworks in parallel. These subnetworks are then coalesced by a fusion layer so that each subnetwork assembles and exchanges information with the others. The network consists of four stages, each built from repeated modularized multi-resolution blocks [42], and each block consists of a group convolution supporting the multi-resolution properties. A multi-resolution convolution can be constructed from a regular convolution, in which the input and output channels are connected in a fully connected fashion [46]. The subnetworks follow these properties to aggregate their multi-resolution attributes, so each subnetwork gains a global, high-resolution representation. The employed HRNet architecture is depicted in Fig. 8, and the working procedure of feature extraction using HRNet and classification is compiled in Algorithm 2.
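As a toy illustration of this parallel multi-resolution fusion, the sketch below exchanges information between one high-resolution and one low-resolution stream; channel counts are illustrative, not HRNet's actual configuration.

```python
# Toy sketch of HRNet-style fusion: two parallel streams exchange resampled
# copies of each other, so the high-resolution representation is maintained.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamFusion(nn.Module):
    def __init__(self, ch_high=32, ch_low=64):
        super().__init__()
        self.high = nn.Conv2d(ch_high, ch_high, 3, padding=1)   # high-res stream
        self.low = nn.Conv2d(ch_low, ch_low, 3, padding=1)      # low-res stream
        self.high_to_low = nn.Conv2d(ch_high, ch_low, 3, stride=2, padding=1)
        self.low_to_high = nn.Conv2d(ch_low, ch_high, 1)

    def forward(self, x_high, x_low):
        h, l = self.high(x_high), self.low(x_low)
        # Fusion: each stream receives a resampled copy of the other.
        h_out = h + F.interpolate(self.low_to_high(l), size=h.shape[2:],
                                  mode="bilinear", align_corners=False)
        l_out = l + self.high_to_low(h)
        return F.relu(h_out), F.relu(l_out)

h, l = TwoStreamFusion()(torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32))
```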

Algorithm 2 Feature extraction with HRNet and classification

Experimental Analysis

Computing Infrastructure

This whole experiment was conducted on a machine with a Ryzen 7 3700X processor, an NVIDIA GeForce RTX 2070 Super GPU, 16 GB of 3600 MHz RAM, and a Samsung 970 EVO M.2 SSD.

Dataset Segmentation

To make accurate detections, the classifiers need to focus on the lung regions properly. Our previous experiments without segmentation (Fig. 9) show that, despite high accuracy on the training and validation sets, the focus regions of the classifiers often deviate to areas outside the lungs, which may lead to false predictions. To address this issue, we segmented our dataset and kept only the lung portion of the X-ray images.

Fig. 9

Sample heatmaps (blue shade represents the focus region) of corona-positive detections by the classifier. Most of the focused regions are redundant, insignificant or erroneous, and can bias the classification
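The paper does not state how these heatmaps were produced; class-activation mapping methods such as Grad-CAM are a common choice for CNN classifiers, and a sketch under that assumption follows. Here `model` and `target_layer` are assumptions standing in for the trained classifier and its last convolutional layer.

```python
# Grad-CAM-style heatmap sketch (an assumed visualization method, not
# necessarily the one used in the paper).
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image):
    """image: (1, C, H, W) tensor; returns an (H, W) heatmap in [0, 1]."""
    feats, grads = {}, {}
    fwd = target_layer.register_forward_hook(
        lambda m, i, o: feats.update(a=o))
    bwd = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(a=go[0]))

    score = model(image)              # single-logit COVID score assumed
    model.zero_grad()
    score.sum().backward()
    fwd.remove(); bwd.remove()

    # Weight each feature map by its average gradient, then combine.
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```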

Only lung areas are kept, creating a clean dataset without redundant markers or flaws in the images. To collect these lung areas from the primary dataset of 2472 images, the UNet model was trained on the Montgomery and Shenzhen datasets (800 images in total) using 10-fold cross-validation. The 10-fold cross-validation technique was adopted to account for possible bias; it ensures that each image (COVID and non-COVID) is considered for training and testing at least once in the experiment cycle. In this dataset, each CXR image comes with two corresponding masks, one for the right lung and one for the left lung. We first combined the two lung masks into a single mask per CXR image, and these images were then used to train the UNet model. For the loss function, a Dice coefficient loss is used to obtain crisp borders. This loss function was introduced by Milletari et al. [47] in their research on 3D volumetric image segmentation. Dice loss originates from the Sørensen–Dice coefficient, a statistical measure developed in the 1940s. Given Dice coefficient D, it can be written as

$$\begin{aligned} D=\frac{2 \sum _{i}^{N} p_{i} g_{i}}{\sum _{i}^{N} p_{i}^{2}+\sum _{i}^{N} g_{i}^{2}} \end{aligned}$$
(1)

where \(p_{i}\) and \(g_{i}\) represent the pixels of the prediction and the ground truth respectively. In edge detection, the value of \(p_{i}\) or \(g_{i}\) is either 0 or 1, so if two sets of pixels overlap perfectly, D reaches its maximum value of 1, and it decreases otherwise. By minimizing the Dice loss, the predicted edge set and the ground-truth edge set are trained to overlap gradually. Traditional cross-entropy loss, in contrast, averages per-pixel losses that are computed independently, without considering whether adjacent pixels are edges, which makes it insufficient for image-level prediction; Dice loss provided better results in our lung segmentation with UNet. The model was trained for 25 epochs per fold with a learning rate of 0.005 and a custom Dice coefficient loss function, on grayscale input images of size 512 \(\times\) 512 pixels. Table 4 shows the input, output, and layer configuration used for UNet.
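Eq. (1) translates directly into a differentiable loss; a PyTorch sketch follows, with a small smoothing constant added for numerical stability (an implementation detail not stated in the paper).

```python
# Dice coefficient loss per Eq. (1), assuming sigmoid probabilities from UNet.
import torch

def dice_loss(pred, target, eps=1e-6):
    """pred, target: (N, 1, H, W) tensors; pred holds probabilities in [0, 1]."""
    p, g = pred.flatten(1), target.flatten(1)
    # D = 2*sum(p*g) / (sum(p^2) + sum(g^2)), per Eq. (1)
    dice = (2 * (p * g).sum(dim=1) + eps) / ((p ** 2).sum(dim=1)
                                             + (g ** 2).sum(dim=1) + eps)
    return 1 - dice.mean()   # minimizing the loss maximizes overlap
```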

Table 4 UNet configuration

As mentioned above, UNet was trained for 10 folds, 25 epochs per fold, with the Dice coefficient loss. The learning rate was selected by trial and error, and images were augmented according to the configuration described in the augmentation section. Figure 10 shows the average training accuracy, training loss, validation accuracy, and validation loss over the 10 folds.

Fig. 10

Training and validation curves after training the UNet architecture

After training the model, the weights are applied to the COVID dataset to remove redundant areas from the chest X-ray images and extract the lung areas. This segmented dataset is then used to train HRNet and obtain the classification results.

Feature Extraction and Training Procedure

In this section, the training procedure for feature extraction with HRNet and classification is discussed briefly. HRNet avoids the loss of small-target information in the feature map because its convolutions are connected in parallel and it maintains a high-resolution feature representation. To train the model, the segmented COVID dataset of 910 images is used, containing 410 lung-segmented COVID images and 500 lung-segmented non-COVID images, with 10-fold cross-validation. The model showed the best accuracy with a learning rate of 0.01, weight decay of 0.0001, momentum of 0.8, a learning rate factor of 0.1, and learning rate steps at epochs 30, 60 and 90. Stochastic gradient descent and the binary cross-entropy loss function are used. These parameters were chosen by observing the performance and evaluating the model progressively (Table 5). The performance of the model is shown for each fold in Table 6. Using the KFold algorithm, the dataset is first divided into 10 folds, where one fold is kept for testing and the other nine folds are used for training.
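Under the stated hyper-parameters, the training loop can be sketched as follows; the stand-in `model` and the dummy data are placeholders for the HRNet backbone with its classification head and for one fold's data loader.

```python
# Training-setup sketch with the stated hyper-parameters: SGD, lr 0.01,
# momentum 0.8, weight decay 0.0001, lr factor 0.1 at steps 30/60/90,
# binary cross-entropy loss.
import torch

# Stand-ins so the sketch runs; in the real pipeline `model` is the HRNet
# backbone plus the classification head of Fig. 11.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 1))
train_loader = [(torch.randn(4, 1, 64, 64), torch.randint(0, 2, (4,)))]
num_epochs = 90

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.8, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1)   # lr factor 0.1
criterion = torch.nn.BCEWithLogitsLoss()             # binary cross-entropy

for epoch in range(num_epochs):
    for images, labels in train_loader:              # one fold's training split
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), labels.float())
        loss.backward()
        optimizer.step()
    scheduler.step()                                 # decay lr at 30, 60, 90
```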

Table 5 Hyper-parameter setup
Fig. 11

Classification Head
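As a rough illustration of such a head on top of pooled HRNet features, the sketch below uses hypothetical layer widths and dropout rate; the paper selected these by trial and error, and the exact values are given in Fig. 11.

```python
# Hypothetical fully connected classification head; widths, dropout rate,
# and the 2048-dimensional feature size are assumptions for the sketch.
import torch.nn as nn

classification_head = nn.Sequential(
    nn.Linear(2048, 256),   # 2048 = assumed size of pooled HRNet features
    nn.ReLU(),
    nn.Dropout(p=0.5),      # placeholder dropout rate
    nn.Linear(256, 1),      # single logit: COVID vs. non-COVID
)
```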

In summary, HRNet is used as the backbone of our classification model to extract features from the images. These features are then passed through several fully connected layers, which we define as the classification head. The parameters of this classification head, such as the number of layers, dropout, activation function, and regularization, were selected by trial and error; its configuration is given in Fig. 11. For performance evaluation, with the numbers of True Positives, True Negatives, False Positives and False Negatives denoted by TP, TN, FP, and FN respectively, the three adopted evaluation metrics are as follows:

$$\begin{aligned} \mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \end{aligned}$$
(2)
$$\begin{aligned} \mathrm{Sensitivity} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{Specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \end{aligned}$$
(4)
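These metrics follow directly from the confusion-matrix counts; for instance, the worst fold in Fig. 12a (39 + 50 correct, 2 misclassified) yields an accuracy of 89/91, about 97.8%. A direct Python translation:

```python
# Direct translation of Eqs. (2)-(4) from the confusion-matrix counts.
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```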

Table 6 characterizes the average testing accuracy, sensitivity and specificity over the 10 folds: we accomplished 99.26% testing accuracy, 98.53% testing sensitivity and 98.52% testing specificity. Furthermore, we depict the worst- and best-case confusion matrices in Fig. 12. In the worst confusion matrix (Fig. 12a), 39 and 50 testing images are correctly classified as COVID and non-COVID respectively, while only two images are misclassified; in Fig. 12b, all testing images are accurately classified as COVID and non-COVID, representing the best case. We have also successfully eradicated the issue of falsely focused region detection through segmentation; Fig. 13 shows the heatmaps of correctly classified regions.

Fig. 12

Confusion matrix after training the proposed model

Table 6 Performance analysis of the proposed architecture
Fig. 13

Heatmaps in a visualize sample corona-positive detections, and those in b represent corona-negative detections. The lung regions are accurately focused (blue shade represents the focus region) by the classifier after training with segmented data

Comparative Analysis

We trained our model with both segmented and non-segmented data and compared its performance with existing adopted models: two prominent architectures, ResNet152 and DenseNet121, were implemented for comparison. Our proposed model achieved the best result on every evaluation metric: accuracy, testing specificity, sensitivity, and F1 score. HRNet achieved accuracies of 100% with non-segmented data and 99.26% with segmented data. In Table 7, we summarize the performance comparison of the existing models with our proposed model, and Fig. 14 depicts the optimal confusion matrices of the existing trained models.

Table 7 Performance comparison with the existing models
Fig. 14

Confusion matrix after training the existing models

Conclusion

In this study, we use a segmented X-ray dataset, as extensive experiments show that, with a non-segmented dataset, classifiers may focus outside the lung region, which can lead to false classification results. We also evaluate several state-of-the-art recognition methods on our dataset; the results demonstrate that HRNet performs best, with 99.26% accuracy, 98.53% sensitivity, and 98.82% specificity. The proposed model is fully automated, with no need for manual feature extraction. Moreover, to ensure a production-ready solution, we broadly investigated the results and focus regions of the classifiers, and our experimental results show that the proposed model robustly focuses on the right region when classifying. In conclusion, thanks to its unbiased high accuracy and correctly identified focus regions, this model can help radiologists make clinical decisions. We hope that our proposed methodology is a step towards reducing false positive detections from X-ray images.

In terms of data, however, this experiment is still at a primary level. As the number of patients increases around the world and the symptoms and formation of the virus change day by day, we intend to extend the experiment with continuously collected data and enhance the usability of the model.