Four CNN architectures were trained to identify deepfake images: a custom CNN model [58], VGG16, InceptionV3, and VGG19. Each classifier is fine-tuned to distinguish between real and fake images. The custom CNN model comprises six convolutional and max-pooling blocks, each pooling layer having a pool size of two and each convolutional layer a kernel size of three. The CNN model's input shape is 256 \(\times\) 256 \(\times\) 3. For classification, the CNN model uses the softmax activation function. In Equation (2), \(z_{i}\) is the score of the correct class, \(z_{j}\) are the scores of the other \(j\) classes, and \(f(x)\) is the softmax function.
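The softmax described above can be sketched as a short NumPy function; the max-subtraction for numerical stability is a standard implementation detail, not something stated in the text:

```python
import numpy as np

def softmax(z):
    """Map a vector of class scores to class probabilities (Equation (2))."""
    # Subtracting the maximum score avoids overflow in exp();
    # softmax is shift-invariant, so the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)  # values in (0, 1) that sum to 1
```

As the text notes, the outputs sum to one and the largest score receives the largest probability, so they can be read as label probabilities.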
After the classifier is applied, the softmax function computes a probability for each class. Softmax converts the raw scores into a vector of values between 0 and 1 that sum to 1, so in a classification problem the outputs of a softmax can be read as a vector of label probabilities. Training aims to increase the likelihood that the correct class is selected; accordingly, the negative log-likelihood of the correct class is minimized. The loss of a softmax classifier is calculated using the formula in Equation (3), where \(z_{y_{x}}\) is the score of the correct class and \(z_{j}\) are the scores of the other classes.
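The negative log-likelihood loss described around Equation (3) can be sketched as follows; folding the softmax into the log (the log-sum-exp form) is a common numerically stable variant, assumed here rather than taken from the paper:

```python
import numpy as np

def softmax_cross_entropy(z, y):
    """Negative log-likelihood of the correct class y given scores z,
    i.e. -log(softmax(z)[y]), the softmax classifier loss."""
    z = z - np.max(z)  # shift for numerical stability
    log_prob = z[y] - np.log(np.exp(z).sum())
    return -log_prob

# With uniform scores over 3 classes the loss is log(3) ~ 1.0986;
# raising the correct class's score lowers the loss.
uniform = softmax_cross_entropy(np.array([0.0, 0.0, 0.0]), 0)
confident = softmax_cross_entropy(np.array([4.0, 0.0, 0.0]), 0)
```

Minimizing this quantity is exactly the training objective stated above: it pushes the probability of the correct class toward one.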
The custom CNN architecture consists of 29 layers: an input layer, many hidden layers, and finally a fully connected layer. The total number of parameters is 1,601,534, of which 1,599,512 are trainable and only 2,022 are not. The loss was calculated using binary cross-entropy with the Adam optimizer, and the model was trained for seven epochs.

The VGG16 model used in this study has 24 layers: an input layer, followed by several hidden layers, and finally a dense layer. To obtain better results, we applied transfer learning. At the end of the VGG16 and InceptionV3 models, we added one flatten layer, two dense layers, and one dropout layer; the dropout and dense layers control overfitting and perform the classification of deepfake and real images. The VGG16 model was trained for three epochs and InceptionV3 for four epochs, and both yielded good results.

For the VGG19 model, we added one global average pooling layer and two dense layers, with ReLU and softmax activation functions respectively, at the end of the model. The VGG19 model was trained for seven epochs.