AU2020102631A4

AU2020102631A4 - The Severity Level and Early Prediction of Covid-19 Using CEDCNN Classifier

Info

Publication number: AU2020102631A4
Application number: AU2020102631A
Authority: AU
Inventors: Babu Rao Dharavath; N. Kanagavalli; Shahina Parveen M.; P. Mangayarkarasi; Balusu Nandini; Balasubramani R.; Deepali Virmani
Original assignee: A Anbuchezian Dr; M Madiajagan Dr; M Shahina Parveen Dr; N Pradeep Dr
Current assignee: A Anbuchezian Dr; M Shahina Parveen Dr; N Pradeep Dr; T Subramani Dr
Priority date: 2020-10-07
Filing date: 2020-10-07
Publication date: 2020-11-26
Anticipated expiration: 2028-10-07

Abstract

The rapid spread of Coronavirus disease 2019 (COVID-19) has brought doctors, researchers, and data scientists together to find a solution. Scientists are using sophisticated technologies, such as big data analytics, machine learning, and natural language processing, for tracking the virus and learning more about it. The abundance of stored data has considerably increased the opportunities to interrelate and analyze it. This proposed invention predicts the severity level of COVID-19 and performs early prediction using a Cross Entropy-based Deep Convolutional Neural Network (CEDCNN) for big data. The invented work is composed of three phases, namely disease prediction, severity level analysis, and early prediction. In the first phase, the dataset is preprocessed, the important features are extracted from the dataset, and finally the data is clustered into disease positive and disease negative sets using Taxicab Norm K-Means (TNKM). In phase 2, the proposed system utilizes CEDCNN for severity analysis, which classifies the severity as high, low, or moderate. In phase 3, the non-coronavirus data undergoes preprocessing, and then important features are extracted from the dataset. Finally, the potential level of the patient against the coronavirus is predicted by the Mahalanobis Distance Ranking (MDR) method.

Description

Fig. 1: Architecture of the proposed methodology. The nCoV-19 dataset passes through preprocessing (normalization), feature extraction, and TNKM clustering, which outputs a coronavirus-positive dataset and a coronavirus-negative dataset. The positive dataset goes through preprocessing, feature extraction, and classification into low, moderate, or high severity. The negative dataset goes through feature extraction and prediction into potential or no potential.

Fig. 2: Pseudocode of the CEDCNN algorithm. Input: extracted feature set (NG). Output: classified severity as low, moderate, and high. The procedure initializes the weights W_f and the bias, checks the number of selected features, computes the convolution feature map A_f and the pooling feature map V_f for each feature, and computes the output O_f from A_f, W_f, and the bias.

Editorial Note 2020102631 There is only eleven pages of the description

TITLE OF THE INVENTION

The Severity Level and Early Prediction of Covid-19 Using CEDCNN Classifier

FIELD OF THE INVENTION

[001]. The present disclosure generally relates to severity level analysis and early prediction of Coronavirus disease 2019 (COVID-19) using a Cross Entropy-based Deep Convolutional Neural Network (CEDCNN) classifier.

BACKGROUND OF THE INVENTION

[002]. The World Health Organization (WHO) updated the clinical and epidemiological criteria used in its case definitions for the global surveillance of human infection with Coronavirus disease 2019 (COVID-19) on 27 February 2020. In the context of COVID-19, big data refers to patient care data, such as physician notes, X-ray reports, case histories, lists of doctors and nurses, and information on outbreak areas. Combined with deep learning analytics, big data helps in understanding COVID-19 in terms of outbreak tracking, virus structure, disease treatment, and vaccine manufacturing.

[003]. This is an efficient computer science technique for severity level analysis and early prediction of COVID-19. It is found to be robust and efficient for early coronavirus prediction and severity level classification.

[004]. The proposed system mainly consists of three phases: disease prediction, severity analysis, and early prediction. In the disease prediction phase, the dataset is clustered into a disease positive set and a disease negative set. In the severity analysis phase, the disease positive set is classified into high, moderate, and low risk levels using a Cross Entropy-based Deep Convolutional Neural Network (CEDCNN). In the early prediction phase, the disease negative set is used to predict the disease early. The performance of the proposed system is compared with the conventional DLNN, CNN, ANN, and SVM with respect to accuracy, f-measure, and computation time. Here, the proposed system achieves 98.56 % accuracy with a computation time of 36.78 s.

SUMMARY OF THE INVENTION

[005]. This proposed invention predicts the severity level of Coronavirus disease 2019 (COVID-19) and performs early prediction using a Cross Entropy-based Deep Convolutional Neural Network (CEDCNN) for big data. The invented work is composed of three phases, namely disease prediction, severity level analysis, and early prediction. In the first phase, the dataset is preprocessed, the important features are extracted from the dataset, and finally the data is clustered into disease positive and disease negative sets using Taxicab Norm K-Means (TNKM). In phase 2, the proposed system utilizes CEDCNN for severity analysis, which classifies the severity as high, low, or moderate. In phase 3, the non-coronavirus data undergoes preprocessing, and then important features are extracted from the dataset. Finally, the potential level of the patient against the coronavirus is predicted by the Mahalanobis Distance Ranking (MDR) method.

[006]. Disease Prediction Phase: The disease prediction phase is the first phase; it predicts whether the disease is positive or negative. It consists of three stages, namely dataset collection, preprocessing, and clustering. These stages are explained briefly below.

[007]. Dataset Collection: The WHO declared the coronavirus pandemic a health emergency. Researchers and hospitals give open access to the data regarding this pandemic. The proposed system collects its data from the Kaggle "Novel Corona Virus 2019 (nCoV-2019) Dataset". The dataset gathers information on individuals from national, provincial, and municipal health reports, along with additional knowledge from online reports. All data are geo-coded and contain further input where available, such as symptoms, key dates (date of onset, admission, and confirmation), and travel records. Mathematically, the dataset is expressed as below,

$$C_d = \{c_1, c_2, c_3, \ldots, c_n\} \qquad (1)$$

Where $C_d$ indicates the dataset for further processing and $c_1, \ldots, c_n$ signify the $n$ data records in the dataset.

[008]. Preprocessing: In data preprocessing, normalization is applied to preprocess the data. It is important to make the data more appropriate for data mining and analysis with respect to time, cost, and quality.

[009]. Zero score normalization (ZSN): The dataset contains multiple missing values, which cause an error when passed directly as input. Thus, the system first marks the missing values with "?". Then, the marked values are replaced and the data is normalized by Zero Score Normalization (ZSN). Here, the values are normalized based on the mean and standard deviation of the data. This calculation is mathematically described as follows:

$$N'_d = \frac{N_d - \mu_F}{\sigma_F} \qquad (2)$$

Where $N'_d$ and $N_d$ represent the new and old values of each entry in the dataset respectively, and $\sigma_F$ and $\mu_F$ signify the standard deviation and mean of the feature $F$ respectively.
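The normalization in equation (2) can be illustrated with a minimal Python sketch. The source does not state how the entries marked "?" are resolved, so replacing them with the column mean is an assumption made here for illustration, as are the function name `zero_score_normalize`, the use of the population standard deviation, and the sample values.

```python
from statistics import mean, pstdev


def zero_score_normalize(column, missing="?"):
    """Apply equation (2) to one feature column F.

    Assumption (not stated in the source): entries marked as missing are
    replaced by the column mean before normalization, so they map to 0.
    """
    observed = [float(v) for v in column if v != missing]
    mu = mean(observed)        # mean of F
    sigma = pstdev(observed)   # standard deviation of F (population form, an assumption)
    if sigma == 0:
        # A constant column carries no spread; map every entry to 0.
        return [0.0 for _ in column]
    filled = [mu if v == missing else float(v) for v in column]
    return [(v - mu) / sigma for v in filled]


# Hypothetical "age" column with one missing entry.
ages = [30, 50, "?", 70]
normalized_ages = zero_score_normalize(ages)
```

After this step the observed values of a column have zero mean and unit standard deviation, which keeps features with different units on a comparable scale before clustering.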

[0010]. Feature Extraction: Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones. The proposed system extracts features such as age, sex, co-morbidities, respiratory pattern, viral load, and survival from the dataset. These extracted features are expressed mathematically as,

$$G_f = \{g_1, g_2, g_3, \ldots, g_k\} \qquad (3)$$

Where $G_f$ indicates the extracted feature set and $g_1, \ldots, g_k$ represent the $k$ features.

[0011]. Clustering: In this section, the proposed system clusters the extracted feature sets into disease negative and disease positive groups. The clustering is done using the Taxicab Norm based K-Means algorithm (TNKM). Normally, the distance in the k-means algorithm is determined by applying the Euclidean distance. Euclidean distance only makes sense when all the dimensions have the same units (like meters), since it involves adding their squared values. In the TNKM algorithm, the Euclidean distance formula is replaced by the taxicab norm formula to reduce the number of iterations. Let $G_f$ be the set of extracted features and $M_f = \{m_1, m_2, m_3, \ldots, m_n\}$ be the set of cluster centers. Next, calculate the distance between each data point and the cluster centers using the taxicab norm. It is expressed as follows,

$$D_f = \sum |G_f - M_f| \qquad (4)$$

Where $D_f$ denotes the distance between each data point and a cluster center. Then, assign each data point to the cluster center whose distance from it is the minimum over all cluster centers. After that, recalculate the new cluster center using

$$R_f = \frac{1}{K_f} \sum_{i=1}^{K_f} G_i \qquad (5)$$

Where $K_f$ indicates the number of data points in the $f$-th cluster and $G_i$ represents each data point in that cluster. Next, recalculate the distance ($Q_f$) between each data point and the newly obtained cluster centers using the taxicab norm. It is evaluated as follows,

$$Q_f = \sum |G_f - R_f| \qquad (6)$$

[0012]. Based on the above steps, the proposed system forms two clusters, namely the disease positive set ($P_f$) and the disease negative set ($N_f$).
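The clustering loop of equations (4) to (6) can be sketched in plain Python as follows. This is an illustrative sketch rather than the patented implementation: the initial centers, the stopping rule (stop when the centers no longer change), the iteration cap, and the sample points are all assumptions, and the center update uses the mean of the assigned points as in equation (5).

```python
def taxicab(p, q):
    """Taxicab (L1) distance between two points, as in equations (4) and (6)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def tnkm(points, centers, max_iter=100):
    """K-means variant that assigns points by taxicab distance.

    `centers` are caller-supplied initial centers (an assumption; the
    source does not say how they are chosen).
    """
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign every point to the nearest center under the taxicab norm.
        labels = [
            min(range(len(centers)), key=lambda f: taxicab(p, centers[f]))
            for p in points
        ]
        # Recompute each center as the mean of its members, equation (5).
        new_centers = []
        for f in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == f]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[f])  # keep an empty cluster's center
        if new_centers == centers:  # assumed stopping rule: centers are stable
            break
        centers = new_centers
    return labels, centers


# Hypothetical normalized feature vectors forming two obvious groups.
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centers = tnkm(points, centers=[points[0], points[2]])
```

With two centers, the two resulting clusters play the role of the disease positive and disease negative sets; which cluster is which still has to be decided from the data, which the sketch does not attempt.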

[0013]. Severity Analysis Phase: Individuals of any age can acquire severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections, although adults of middle age and older are most commonly affected, and older adults are more likely to have severe disease. This phase ascertains the severity of the coronavirus infection as low, moderate, or high based on the patient features. The severity analysis phase consists of four stages, namely merged input dataset, preprocessing, feature extraction, and classification.

[0014]. Coronavirus-positive dataset: The proposed system gathers the data from the coronavirus-positive dataset ($P_f$). Then, this dataset is merged on some common attributes, such as age, sex, and respiratory pattern, to form a new dataset. The new dataset is expressed as follows:

$$ND_f = \{nd_1, nd_2, nd_3, \ldots, nd_n\} \qquad (7)$$

Where $ND_f$ indicates the new dataset, which contains details about the coronavirus-positive patients, and $nd_1, \ldots, nd_n$ signify the $n$ data records.

[0015]. Preprocessing and feature extraction: The preprocessing step of the disease prediction phase is applied to this new dataset using equation (2). Then the important features, such as body temperature, dry cough, and age, are extracted from the dataset for efficient analysis. The extracted feature set is denoted as $NG_f$.

[0016]. Severity analysis: After feature extraction, the extracted features are given as input to the CEDCNN. A typical Deep Convolutional Neural Network (DCNN) architecture includes convolution, pooling, fully connected, and softmax layers, and the DCNN classifier uses a larger number of such layers. The input to a convolutional layer is a set of feature maps, and each feature map is convolved by the filters in the convolutional layer. The activation value $A_f$ of a convolutional feature can be calculated as:

$$A_f = A(NG_f) \qquad (8)$$

Where $A(\cdot)$ represents the non-linear activation function. The result is then given to the pooling layer. Representing the pooling function as $V$, for each feature map $A_f$:

$$V_f = V(A_f) \qquad (9)$$

Where $V_f$ signifies the output of the pooling layer. After several convolutional and pooling layers, there may be one or more fully connected layers that aim to perform high-level reasoning. The fully connected layer function is denoted as $W_f$. Finally, the softmax activation function calculates the final output, which is expressed as follows,

$$O_f = A_f \cdot W_f + bias \qquad (10)$$

Where $O_f$ denotes the final softmax output, $W_f$ is the weight value of the layer, and $bias$ is the bias value of the layer. Here, the weight value of the layer is given based on the cross-entropy equation, which is described as follows,

$$W_f = -\frac{1}{N} \sum P(w_f) \log P(w_f) \qquad (11)$$

Where $N$ indicates the number of samples. The above cross-entropy equation gives a lower loss and a suitable weight value for classification. Lastly, the classifier assigns the severity level as high, low, or moderate. The pseudocode for the CEDCNN algorithm is shown in figure 2.
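The layer sequence of equations (8) to (10), followed by a cross-entropy measure, can be sketched as a single forward pass in plain Python. This is a toy illustration under several assumptions that are not in the source: a one-dimensional convolution, ReLU as the non-linear activation, max pooling, hand-picked filter and weight values, and cross-entropy used as the loss between a one-hot target and the softmax output. The fully connected layer is also applied to the pooled map here, as is usual in a CNN. Training is not shown.

```python
import math


def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation) of a feature vector with one filter."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def relu(v):
    """Assumed non-linear activation A(.) from equation (8)."""
    return [max(0.0, a) for a in v]


def max_pool(v, size=2):
    """Assumed pooling function V(.) from equation (9): non-overlapping max pooling."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def dense(v, weights, bias):
    """Fully connected layer: one weight row and one bias term per output class."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


LEVELS = ["low", "moderate", "high"]

# Hypothetical normalized feature vector NG_f for one patient.
features = [0.2, 0.8, 0.5, 0.9, 0.1, 0.7]
kernel = [0.5, -0.25, 0.5]                     # hypothetical filter values
A_f = relu(conv1d(features, kernel))           # convolution feature map
V_f = max_pool(A_f)                            # pooling feature map
W_f = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.6]]   # hypothetical layer weights
bias = [0.0, 0.1, -0.1]
O_f = softmax(dense(V_f, W_f, bias))           # class probabilities
predicted_level = LEVELS[O_f.index(max(O_f))]
loss = cross_entropy([0, 0, 1], O_f)           # loss if the true level were "high"
```

In a trained classifier the filter and weight values would be adjusted to lower the cross-entropy over all $N$ training samples; the fixed numbers above only show how the data flows through the layers.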

DETAILED DESCRIPTION OF THE INVENTION

[0017]. This proposed invention predicts the severity level of COVID-19 and performs early prediction using a Cross Entropy-based Deep Convolutional Neural Network (CEDCNN) for big data. The invented work is composed of three phases, namely disease prediction, severity level analysis, and early prediction. In the first phase, the dataset is preprocessed, the important features are extracted from the dataset, and finally the data is clustered into disease positive and disease negative sets using Taxicab Norm K-Means (TNKM). In phase 2, the proposed system utilizes CEDCNN for severity analysis, which classifies the severity as high, low, or moderate. In phase 3, the non-coronavirus data undergoes preprocessing, and then important features are extracted from the dataset. Finally, the potential level of the patient against the coronavirus is predicted by the Mahalanobis Distance Ranking (MDR) method. The architecture diagram for the proposed methodology is given in figure 1.

[0018]. Disease Prediction Phase: The disease prediction phase is the first phase; it predicts whether the disease is positive or negative. It consists of three stages, namely dataset collection, preprocessing, and clustering. These stages are explained briefly below.

[0019]. Dataset Collection: The WHO declared the coronavirus pandemic a health emergency. Researchers and hospitals give open access to the data regarding this pandemic. The proposed system collects its data from the Kaggle "Novel Corona Virus 2019 (nCoV-2019) Dataset". The dataset gathers information on individuals from national, provincial, and municipal health reports, along with additional knowledge from online reports. All data are geo-coded and contain further input where available, such as symptoms, key dates (date of onset, admission, and confirmation), and travel records. Mathematically, the dataset is expressed as below,

$$C_d = \{c_1, c_2, c_3, \ldots, c_n\} \qquad (1)$$

Where $C_d$ indicates the dataset for further processing and $c_1, \ldots, c_n$ signify the $n$ data records in the dataset.

[0020]. Preprocessing: In data preprocessing, normalization is applied to preprocess the data. It is important to make the data more appropriate for data mining and analysis with respect to time, cost, and quality.

[0021]. Zero score normalization (ZSN): The dataset contains multiple missing values, which cause an error when passed directly as input. Thus, the system first marks the missing values with "?". Then, the marked values are replaced and the data is normalized by Zero Score Normalization (ZSN). Here, the values are normalized based on the mean and standard deviation of the data. This calculation is mathematically described as follows:

$$N'_d = \frac{N_d - \mu_F}{\sigma_F} \qquad (2)$$

Where $N'_d$ and $N_d$ represent the new and old values of each entry in the dataset respectively, and $\sigma_F$ and $\mu_F$ signify the standard deviation and mean of the feature $F$ respectively.

[0022]. Feature Extraction: Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones. The proposed system extracts features such as age, sex, co-morbidities, respiratory pattern, viral load, and survival from the dataset. These extracted features are expressed mathematically as,

$$G_f = \{g_1, g_2, g_3, \ldots, g_k\} \qquad (3)$$

Where $G_f$ indicates the extracted feature set and $g_1, \ldots, g_k$ represent the $k$ features.

[0023]. Clustering: In this section, the proposed system clusters the extracted feature sets into disease negative and disease positive groups. The clustering is done using the Taxicab Norm based K-Means algorithm (TNKM). Normally, the distance in the k-means algorithm is determined by applying the Euclidean distance. Euclidean distance only makes sense when all the dimensions have the same units (like meters), since it involves adding their squared values. In the TNKM algorithm, the Euclidean distance formula is replaced by the taxicab norm formula to reduce the number of iterations. Let $G_f$ be the set of extracted features and $M_f = \{m_1, m_2, m_3, \ldots, m_n\}$ be the set of cluster centers. Next, calculate the distance between each data point and the cluster centers using the taxicab norm. It is expressed as follows,

$$D_f = \sum |G_f - M_f| \qquad (4)$$

Where $D_f$ denotes the distance between each data point and a cluster center. Then, assign each data point to the cluster center whose distance from it is the minimum over all cluster centers. After that, recalculate the new cluster center using

$$R_f = \frac{1}{K_f} \sum_{i=1}^{K_f} G_i \qquad (5)$$

Where $K_f$ indicates the number of data points in the $f$-th cluster and $G_i$ represents each data point in that cluster. Next, recalculate the distance ($Q_f$) between each data point and the newly obtained cluster centers using the taxicab norm. It is evaluated as follows,

$$Q_f = \sum |G_f - R_f| \qquad (6)$$

[0024]. Based on the above steps, the proposed system forms two clusters, namely the disease positive set ($P_f$) and the disease negative set ($N_f$).

[0025]. Severity Analysis Phase: Individuals of any age can acquire severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections, although adults of middle age and older are most commonly affected, and older adults are more likely to have severe disease. This phase ascertains the severity of the coronavirus infection as low, moderate, or high based on the patient features. The severity analysis phase consists of four stages, namely merged input dataset, preprocessing, feature extraction, and classification.

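[0026]. Coronavirus-positive dataset: The proposed system gathers the data from the coronavirus-positive dataset ($P_f$). Then, this dataset is merged on some common attributes, such as age, sex, and respiratory pattern, to form a new dataset. The new dataset is expressed as follows:

$$ND_f = \{nd_1, nd_2, nd_3, \ldots, nd_n\} \qquad (7)$$

Where $ND_f$ indicates the new dataset, which contains details about the coronavirus-positive patients, and $nd_1, \ldots, nd_n$ signify the $n$ data records.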

[0027]. Preprocessing and feature extraction: The preprocessing step of the disease prediction phase is applied to this new dataset using equation (2). Then the important features, such as body temperature, dry cough, and age, are extracted from the dataset for efficient analysis. The extracted feature set is denoted as $NG_f$.

[0028]. Severity analysis: After feature extraction, the extracted features are given as input to the CEDCNN. A typical Deep Convolutional Neural Network (DCNN) architecture includes convolution, pooling, fully connected, and softmax layers, and the DCNN classifier uses a larger number of such layers. The input to a convolutional layer is a set of feature maps, and each feature map is convolved by the filters in the convolutional layer. The activation value $A_f$ of a convolutional feature can be calculated as:

$$A_f = A(NG_f) \qquad (8)$$

Where $A(\cdot)$ represents the non-linear activation function. The result is then given to the pooling layer. Representing the pooling function as $V$, for each feature map $A_f$:

$$V_f = V(A_f) \qquad (9)$$

Where $V_f$ signifies the output of the pooling layer. After several convolutional and pooling layers, there may be one or more fully connected layers that aim to perform high-level reasoning. The fully connected layer function is denoted as $W_f$. Finally, the softmax activation function calculates the final output, which is expressed as follows,

$$O_f = A_f \cdot W_f + bias \qquad (10)$$

Where $O_f$ denotes the final softmax output, $W_f$ is the weight value of the layer, and $bias$ is the bias value of the layer. The weight value of the layer is given based on the cross-entropy equation,

$$W_f = -\frac{1}{N} \sum P(w_f) \log P(w_f) \qquad (11)$$

Where $N$ indicates the number of samples.

[0029]. Early Prediction Phase: In this phase, the early prediction of the disease is carried out. Early diagnosis is important so that patients can begin to manage the disease early and potentially prevent or delay serious complications that can decrease quality of life. This phase consists of two stages, namely feature extraction and the early prediction system.

[0030]. Feature Extraction: In this stage, the proposed system takes the non-coronavirus patient dataset ($N_f$) for early prediction. Then, common features, such as age, sex, temperature, respiratory rate, blood pressure, diabetic level, and body mass index, are extracted from the non-coronavirus dataset. Mathematically, this is described below,

$$KF_f = \{kf_1, kf_2, kf_3, \ldots, kf_n\} \qquad (12)$$

Where $KF_f$ indicates the feature set extracted from the non-coronavirus patient dataset and $kf_1, \ldots, kf_n$ represent the $n$ features.

[0031]. Mahalanobis distance ranking (MDR) method for early prediction: In this stage, the early prediction of the coronavirus is carried out using the Mahalanobis Distance Ranking method. The proposed technique uses two feature sets for the MDR calculation: the extracted feature set $KF_f$, and the "corona virus-prediction feature set", denoted $LH_f$, which contains the real value for each feature. The Mahalanobis distance between the two feature sets is calculated as,

$$D_{MDR} = \left[(LH_f - KF_f)^{T} \cdot Z^{-1} \cdot (LH_f - KF_f)\right]^{a} \qquad (13)$$

Where $Z$ indicates the sample covariance of the features and $a$ represents a constant value. The stage is determined from $D_{MDR}$: the proposed method fixes a threshold, and if $D_{MDR}$ is below the threshold, the stage is denoted as 'potential'. Otherwise, the stage is denoted as 'no potential'.
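The ranking step around equation (13) can be sketched in plain Python. The sketch assumes the usual Mahalanobis form with the inverse of the sample covariance and $a = 0.5$; the helper names, the two-feature patient rows, the reference vector, and the threshold value are hypothetical and chosen only for illustration.

```python
def covariance_matrix(rows):
    """Sample covariance Z of the feature columns (divides by n - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis(lh, kf, cov, a=0.5):
    """Distance between reference vector lh and patient vector kf, equation (13)."""
    diff = [l - k for l, k in zip(lh, kf)]
    # solve(cov, diff) equals Z^{-1} * diff without forming the inverse explicitly.
    return sum(d * s for d, s in zip(diff, solve(cov, diff))) ** a


def rank_potential(patients, reference, cov, threshold):
    """Sort patients by distance and label those below the threshold as 'potential'."""
    distances = sorted(mahalanobis(reference, kf, cov) for kf in patients)
    return [(d, "potential" if d < threshold else "no potential") for d in distances]


# Hypothetical rows: (body temperature in deg C, respiratory rate per minute).
patients = [[36.8, 16.0], [38.9, 24.0], [37.2, 18.0], [39.4, 28.0]]
reference = [39.0, 26.0]   # hypothetical "corona virus-prediction" feature values
ranking = rank_potential(patients, reference, covariance_matrix(patients), threshold=1.5)
```

Solving the linear system instead of inverting $Z$ keeps the sketch short and avoids a separate matrix-inverse routine; a singular covariance matrix raises an error rather than returning a misleading distance.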

[0032]. In the results analysis section, the performance of the proposed severity level analysis and early prediction of COVID-19 using the CEDCNN classifier is analyzed. The performance of the proposed system is validated through the performance analysis section.

[0033]. Performance Analysis Section: In this section, the performance of the proposed CEDCNN algorithm is compared with the Deep Learning Neural Network (DLNN), Convolutional Neural Network (CNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM) algorithms on three performance metrics: accuracy, f-measure, and computation time. The results are tabulated below,

Classifier         Accuracy (%)   F-measure (%)   Computation time (s)
Proposed CEDCNN    98.56          96.24           36.78
DLNN               95.30          93.47           52.24
CNN                93.24          91.11           59.99
ANN                92.14          90.02           69.87
SVM                89.99          87.82           81.24

[0034]. Discussion: The above table compares the performance of the proposed CEDCNN classifier with the conventional DLNN, CNN, ANN, and SVM classifiers in terms of accuracy, f-measure, and computation time, which are important measures for judging the prediction results. Here, the conventional SVM performs worse than the proposed classifier. Likewise, the conventional DLNN, CNN, and ANN classifiers attain lower performance than the proposed classifier. For example, the existing SVM offers an accuracy of 89.99 % and an f-measure of 87.82 % with a computation time of 81.24 s. The proposed CEDCNN classifier attains 98.56 % accuracy and 96.24 % f-measure with less computation time: it takes 36.78 s to predict the results. Consequently, it is deduced that the proposed system attains higher performance than the prevailing systems. These performance analyses are graphically represented in figure 3. Figure 3a compares the proposed CEDCNN classifier with the conventional classifiers based on accuracy and f-measure, and figure 3b compares them based on computation time.
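The comparison in the discussion can be made concrete with a few lines of arithmetic on the tabulated values. The sketch below only restates the figures from the table; the dictionary layout and the derived quantities (accuracy gain in percentage points and the ratio of computation times) are illustrative choices, not metrics reported in the source.

```python
# (accuracy %, f-measure %, computation time s) as tabulated above.
results = {
    "CEDCNN": (98.56, 96.24, 36.78),
    "DLNN": (95.30, 93.47, 52.24),
    "CNN": (93.24, 91.11, 59.99),
    "ANN": (92.14, 90.02, 69.87),
    "SVM": (89.99, 87.82, 81.24),
}

proposed_acc, proposed_f, proposed_time = results["CEDCNN"]

comparison = {}
for name, (acc, f_measure, seconds) in results.items():
    if name == "CEDCNN":
        continue
    comparison[name] = {
        # Gains are differences in percentage points, not relative changes.
        "accuracy_gain": proposed_acc - acc,
        "f_measure_gain": proposed_f - f_measure,
        # How many times longer the baseline takes than the proposed classifier.
        "time_ratio": seconds / proposed_time,
    }

for name, row in comparison.items():
    print(f"{name}: +{row['accuracy_gain']:.2f} pts accuracy, "
          f"+{row['f_measure_gain']:.2f} pts f-measure, "
          f"{row['time_ratio']:.2f}x computation time")
```

Against the SVM, for instance, this gives an accuracy gain of 8.57 percentage points and a computation time that is about 2.21 times that of the proposed classifier.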

Claims

Editorial Note 2020102631 There is only one page of the claim

CLAIMS:

We Claim:

1. This proposed invention is to predict the severity level and early prediction of COVID-19 using Cross Entropy-based Deep Convolutional Neural Network (CEDCNN) for big data.

2. As claimed in 1, this invented work is composed of '3' steps, namely, disease prediction, severity level analysis, and early prediction.

3. As claimed in 2, in the first phase of the invention, the dataset is preprocessed, then the important features are extracted from the dataset, and finally, the data is clustered into disease positive and disease negative sets using Taxicab Norm K-Means (TNKM).

4. As claimed in 2, in phase 2 of the proposed invention, CEDCNN is utilized for severity analysis, which classifies a high, low, and moderate level.

5. As claimed in 2, in phase 3 of the invention, the non-coronavirus data undergoes preprocessing, and then important features are extracted from the dataset. Also, the potential level of the patient against the coronavirus is predicted by the Mahalanobis Distance Ranking (MDR) method.

6. The proposed invention works by combining deep learning analytics, big data, virus structure, disease treatment, and vaccine manufacturing.

7. As claimed in 6, this is a computer science related efficient technique to predict the severity level and early prediction of COVID-19, and it is found to be robust and efficient for the early coronavirus prediction and severity level classification.

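Fig. 3: Performance of the proposed CEDCNN classifier compared with the conventional classifiers: (a) accuracy and f-measure; (b) computation time.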