research-article
Open access

Data Augmentation-based Novel Deep Learning Method for Deepfaked Images Detection

Published: 12 September 2024

Abstract

Recent advances in artificial intelligence have led to deepfake images, which allow users to replace a real face with a fake one. Deepfake images have recently been used to malign public figures, politicians, and even average citizens. Realistic deepfake images have been used to stir political dissatisfaction, blackmail, propagate false news, and even stage bogus terrorist attacks. Thus, distinguishing real images from fakes has become more challenging. To address these issues, this study employs transfer learning and data augmentation techniques to classify deepfake images. For experimentation, 190,335 RGB deepfake and real images are used, and image augmentation methods are applied to prepare the dataset. The experiments use the following deep learning models with a transfer learning approach: a convolutional neural network (CNN), Inception V3, and the Visual Geometry Group models VGG19 and VGG16. Standard evaluation metrics (accuracy, precision, recall, F1-score, confusion matrix, and AUC-ROC score) are used to test the efficacy of the proposed approach. Results show that the proposed approach achieves 90% accuracy, recall, F1-score, and AUC-ROC score and 91% precision, with our fine-tuned VGG16 model outperforming the other DL models in recognizing real and deepfake images.

1 Introduction

Tools for digital image, video, and audio recognition and tampering are evolving quickly [1, 2]. Digital content creation and manipulation methods, and the technological expertise to use them, are also readily available [3]. Hyper-realistic digital graphics can now be created with few resources and simple how-to instructions accessible online [4, 5], leading to fake audio, videos, and images [6, 7]. With the help of the deepfake method, anyone can replace one person's face with the face of someone else [8]. Splicing a synthetic face into the original picture creates this effect. Using deep learning, deepfake methods produce faked photos in which one person’s face is substituted for another’s. Celebrities’ reputations might be jeopardized if their likenesses are manipulated to disseminate misinformation. The oldest known attempt at a face swap was reported on a photograph of Abraham Lincoln [9], in which a lithograph placed his face on top of John Calhoun’s body. Due to advances in artificial neural networks and ML classifiers, cyber criminals can now manipulate digital media to produce realistic, difficult-to-detect altered pictures that spread false information. Deepfake generation has become much simpler due to recent advances in areas such as Generative Adversarial Networks (GANs) [10, 11], which need a reference image and a set of intended defects to generate realistically changed photographs.
Furthermore, FaceApp [12], MTCNN, and other tools have made creating deepfake photos quite simple. Consequently, automated systems for identifying deepfake photos have become critical, considering the worldwide impact and the scale to which these images may harm the security and stability of any society. However, since the forged images include prominent visual properties, these auto-generated fake images can be assessed quickly using Convolutional Neural Networks (CNNs). One of the many applications of CNNs is picture classification: the object that occupies the bulk of a photograph is assigned a class label, which is used to classify the photograph [13]. A CNN is a deep learning architecture used for image identification and analysis and is built to handle pixel input. It takes the picture as input and assigns importance to distinctive image features, allowing it to tell one image from another [14]. CNNs need substantially less pre-processing than other classification techniques. While filters in traditional methods are designed by hand, a CNN can learn these filters with appropriate training [13].
Unlike standard neural networks, the neurons in a CNN’s layers are arranged in three dimensions: height, width, and depth. The most commonly used layers are convolutional, pooling, ReLU, and fully connected (FC) layers. Each layer has a simple API: it uses a differentiable function, which may or may not have parameters, to turn a 3D input volume into a 3D output volume. A simple CNN is therefore a sequence of layers in which each layer transforms one volume of activations into another through a differentiable function.
A complete CNN structure is built by stacking these layers. The input holds the raw pixel values of the image, with its height, width, and color channels. The convolutional layer computes the dot product between each filter's weights and the local input region it is connected to; for example, if we decide to use 12 filters on a 224 \(\times\) 224 input, we may end up with an output volume of 224 \(\times\) 224 \(\times\) 12. The ReLU layer applies an element-wise activation function. The pooling layer downsamples along the spatial dimensions. The FC layer calculates the final class scores. In this way, a CNN transforms the original picture layer by layer from the initial pixel values to the final class scores. Various approaches have been used to address problems such as a low detection rate for deepfake images, a high error rate, a lengthy processing time, and improper data access. This study employs the concept of transfer learning to improve the detection performance of both deepfake and real images, emphasizing data augmentation approaches to improve the performance of the models during training [15, 16, 17].
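To make the shape bookkeeping of stacked layers concrete, the following minimal Python sketch computes the output volume of one convolutional layer and one pooling layer. Only the 12 filters and the 224 \(\times\) 224 \(\times\) 12 volume come from the text; the 3 \(\times\) 3 kernel, stride 1, zero padding of 1, and 2 \(\times\) 2 pooling window are illustrative assumptions.

```python
def conv_output_shape(height, width, num_filters, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1 per axis."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    # The output depth equals the number of filters, not the input depth.
    return out_h, out_w, num_filters


def pool_output_shape(height, width, depth, pool=2, stride=2):
    """Pooling downsamples the spatial axes and leaves the depth unchanged."""
    out_h = (height - pool) // stride + 1
    out_w = (width - pool) // stride + 1
    return out_h, out_w, depth


# Assumed setup: 224 x 224 input, 12 filters, then 2 x 2 pooling.
conv_shape = conv_output_shape(224, 224, num_filters=12)
pooled_shape = pool_output_shape(*conv_shape)
print(conv_shape)    # (224, 224, 12)
print(pooled_shape)  # (112, 112, 12)
```

With these assumed settings the convolution preserves the spatial size and the pooling layer halves it, which is the downsampling role described above.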

1.1 Motivation

Researchers’ interest in identifying fakes has increased significantly in the past several years because of their impact on human behavior. An extensive body of research has aimed at identifying practical methods for deciding whether an image or video is a deepfake [18, 19, 20, 21]. Fake news has evolved partly because of the increasing accessibility of video modification tools. Improvements in the ability to identify deepfakes are thus critical, and this need has opened up a new research area. As manipulation software gets more common and simpler to use, the number of deepfake examples increases. In general, deepfakes present several issues, including severe ramifications for individuals, governments, and businesses. To better understand and use machine learning-based techniques for deepfake detection, researchers have compiled several of these solutions in one place [22, 23, 24, 25].
In this research, we employed a CNN and several of its variants that use binary classification to tell the difference between real and fake images. The most important contributions of this work are as follows:
We propose a CNN-based approach using the concept of transfer learning that increases the detection performance of deepfake and real images, reaching an accuracy of 90% with the VGG16 model.
We apply data augmentation to improve model performance by generating new image samples during the training of the VGG19 model.
We conduct a thorough comparison of multiple CNN variants, in which the VGG16 model performed much better than the other DL models and the state of the art.
The structure of this paper is as follows: Section 2 introduces methods for deepfake content generation and detection, together with the most up-to-date and widely used frameworks and architectures. Section 3 describes the techniques used to detect deepfake and real images, and Section 3.1 gives an overview of the deepfake and real images dataset. The findings of this research are presented in Section 4. Conclusions and future research opportunities on deepfake detection are discussed in Section 5.

2 Literature Review

While deepfake technology is still in its infancy, it has already been studied extensively. According to statistics from https://app.dimensions.ai at the end of 2020, the number of deepfake articles has increased dramatically in recent years, as reported by Nguyen et al. [26]. Even so, the number of deepfake articles gathered is probably lower than the true number. Research on deep learning’s ability to represent complex and multidimensional data is also rising. Deep autoencoders, a form of deep network with such an ability, have frequently been employed for dimensionality reduction and image compression [27, 28, 29].
Authors in [30] employed a neural network that uses binary classification to differentiate between real and fake photos. The CNN is one of the most often used deep learning models for distinguishing between fake and genuine images. The authors suggested the deepfake-Stack, a deep ensemble learning technique that combines multiple deep learning-based state-of-the-art models. The model incorporates the predictions from various level-0 (base) models, and the level-1 model is trained on out-of-sample predictions supplied by the base models. A CNN model called deepfake Classifier was created using the InceptionResNetV2 and InceptionV3 base learners together with ResNet101, DenseNet121, and MobileNet. These models achieved an accuracy of 99.65% and an AUC-ROC of 1.0 in the experiment. To identify deepfakes, Xu Chang et al. developed an algorithm that combines picture noise and image augmentation. The SRM filter method is used to extract image noise characteristics from the RGB pictures under test in order to emphasize the image noise. The need for a large dataset with low picture quality led them to use Celeb-DF for their experiment, which yielded significant results in identifying deepfake face photographs [31]. The AUC was only 73.2% in this experiment using the SRM filter and VGG-16, whereas their NA-VGG technique reached an AUC-ROC of 85.7% on the same dataset.
Badale et al. proposed a neural network technique to identify fake video content. Both original and deepfake frames were extracted from the videos, and every frame was labeled as either “real” or “fake” based on the source footage. The first layer of the neural network receives a flattened input of 256 \(\times\) 256 \(\times\) 3 dimensions [14]. With categorical cross-entropy loss, SGD obtained 88% accuracy, whereas Adam achieved 91%. With binary cross-entropy, Adam was 90% accurate and SGD 86%, while with mean squared error both reached 80%. Shraddha Suratkar et al. proposed another technique to identify fake videos that uses a CNN architecture and transfer learning [8]. A CNN is used to extract features from a video’s frames. InceptionV3 had a test accuracy of 91.56%, ResNet50 84.04%, ResNet18 84.68%, NasNetMobile 91.97%, and Xception 90.08% [32]. The InceptionV3-based model was reported to have the greatest testing accuracy in the research cited above. Rezende et al. [33] suggest that a pre-trained ResNet-50 can be used to recognize computer-generated pictures. The original top fully connected layer was replaced with an SVM classifier. The SVM classifier attained an accuracy of 94.05%, compared with 92.28% for a softmax layer. AlexNet and VGG16 were also employed by Sengur et al. [34] to detect fraudulent content by extracting facial features. This method uses transfer learning to import learned weights, and SVMs are used instead of dense layers to classify fake and legitimate faces. The authors suggested merging the features extracted from both networks to provide more information and better prediction accuracy.
On the CASIA dataset, the model’s accuracy was 94.01% when using the integrated features. Mo et al. suggested three sets of conv/max-pooling layers to identify fraudulent faces. They employed a collection of spatial high-pass filters, which emphasize minute details in pictures and thereby amplify the image’s noise. The suggested CNN architecture uses this residual noise as input features. A GAN-based technique was used to enhance legitimate photos of the CELEBA-HQ dataset, resulting in a 99.4% accuracy rate [35, 36]. Regarding face picture forgery detection, authors in [37, 38] proposed comparing multiple CNN architectures over the Real and Fake Face datasets. The study includes picture normalization and preprocessing using Error Level Analysis before training and fine-tuning several deep learning models.
Existing methods have many drawbacks, including a low detection rate for deepfake images, a high error rate, a long processing time, and inaccurate data access. This work focuses on data augmentation approaches that improve model performance by generating new image samples during training, and it also uses transfer learning to increase the detection performance on deepfake and real images. The survey results for identifying fake images are presented in Table 1.
| Ref. | Method | Classification Model | Dataset |
| --- | --- | --- | --- |
| [39] (2020) | GAN model for features extraction | SVM, KNN, and LDA | Multiple datasets (StarGAN, StyleGAN, StyleGAN2, AttGAN and GDWCT) |
| [40] (2020) | Deep Learning | Convolutional Neural Network | StyleGAN and iFakeFaceDB dataset (100K-Faces) |
| [41] (2020) | Deep Learning | CNN fusion with attention mechanism | Two datasets (ProGAN and StyleGAN) |
| [42] (2020) | Deep Learning | CNN and auto-encoder | Four datasets (Glow, ProGAN, StarGAN, and StyleGAN) |
| [43] (2020) | Deep Learning | CNN and LSTM model | Two datasets (Celeb-DF & UADFV) |
| [44] (2020) | Deep Learning | CNN and LSTM model | Three datasets (Celeb-DF, FaceForensics++, and deepfake Detection Challenge) |
| [45] (2019) | GAN-Model-Pipeline | SVM model | Two datasets (StyleGAN, InterFaceGAN) |
| [46] (2019) | Steganography | CNN model | StyleGAN dataset with 100K-Faces images |
| [47] (2018) | Deep Learning | CNN model | (ProGAN, SNGAN, CramerGAN, MMDGAN) |
| [48] (2019) | Deep Learning | CNN model | Four datasets (StyleGAN, StarGAN, ProGAN, Glow, and CycleGAN) |
| [49] (2018) | Deep Learning | Neural network, naive bayes, KNN and random forest model | Own dataset |
Table 1. Comparison of Various Studies on the Topic of Deepfake Detection

3 Proposed Approach

The use of deep learning for detecting deepfake images is crucial as the prevalence of these hoaxes becomes a greater social problem. Deepfakes are faked visual or audio content made with the help of deep learning algorithms to give the impression that a certain person or group said or did something they did not [50]. The harmful uses of this technology include disseminating misleading information and propaganda and producing fake evidence for use in the courts. As a result, it is crucial to detect deepfakes. To protect the public from misinformation and help maintain the integrity of our information ecosystem, deep learning algorithms must be developed to identify deepfakes reliably.
This section explains the suggested method for real and fake image detection, which is shown in Figure 1. The proposed method uses a deep CNN and several of its variants (VGG16, InceptionV3, and VGG19) with transfer learning for better classification of images. Data augmentation provides new and varied instances for the training dataset to enhance model performance. We separated the data into three parts: training, validation, and testing. The training and validation datasets are fed into a CNN model for feature extraction. We trained the CNN, VGG16, InceptionV3, and VGG19 models to predict whether an image is real or fake, so this research is based on two classes, Real and Fake. We evaluated our suggested models based on accuracy, precision, recall, F1-score, confusion matrix, and ROC-AUC score.
Fig. 1. Block diagram for deepfake and real images detection.

3.1 Dataset Selection

The deepfake and real image dataset employed in this study had not been used in any earlier research. The images depict both real and fake human faces. This dataset is freely available on Kaggle. The collection is made up of 190,335 RGB deepfake and real images, stored as 256 \(\times\) 256 JPG files. These images are split into two classes, Real and Fake, with 140,002 images used to train the model, 39,428 images used for validation, and 10,905 used to evaluate the model’s abilities. Figure 2 shows sample deepfake and real images from our dataset.
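As a quick arithmetic check on the split described above, the short Python sketch below sums the three partition sizes and reports each partition's share of the dataset. The counts are taken from the text; the variable names are illustrative.

```python
# Partition sizes as reported in the text.
splits = {"train": 140_002, "validation": 39_428, "test": 10_905}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>7}  {fractions[name]:.1%}")
print("total", total)
```

The three partitions add up to the 190,335 images of the full collection, corresponding to roughly 73.6% for training, 20.7% for validation, and 5.7% for testing.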
Fig. 2. Visualization of fake and real image.

3.2 Data Pre-Processing

We added a variety of image augmentations because deepfake detection techniques are known to suffer from overfitting and poor generalization performance [51, 52, 53]. Image augmentation generates new pictures by rotating, zooming, translating, and enhancing existing ones [54, 55, 56], which significantly reduces the risk of overfitting. Images are augmented in this study using the ImageDataGenerator [57] from the Keras framework, principally via random horizontal flips, zooms, and rotations, to weaken the features of human faces and emphasize the identification of deepfake trace characteristics. Specifically, we augmented our data with the following settings: (1) images are rescaled to 256 \(\times\) 256, (2) the rotation range is 25, (3) the shear range is 0.2, (4) the zoom range is 0.2, (5) horizontal flip is true, (6) fill mode is nearest, (7) width shift range is 0.1, and (8) height shift range is 0.1. Model effectiveness can be enhanced by constraining the pixel values to a predetermined range, such as 0 to 1 or -1 to 1. For this purpose, we also employed image normalization.
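The augmentation settings listed above map directly onto arguments of the Keras ImageDataGenerator. The sketch below is one possible way to express them, not the authors' code: the `rescale` factor of 1/255, the batch size, the `class_mode`, and the helper names `build_train_iterator` and `normalize_pixel` are assumptions introduced for illustration. TensorFlow is imported lazily so that the settings can be inspected without it.

```python
# Settings (2)-(8) from the text; "rescale" is an assumption that maps
# 8-bit pixel values into the range [0, 1] for normalization.
AUGMENTATION_KWARGS = {
    "rescale": 1.0 / 255,
    "rotation_range": 25,
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "fill_mode": "nearest",
    "width_shift_range": 0.1,
    "height_shift_range": 0.1,
}
TARGET_SIZE = (256, 256)  # setting (1): images are resized to 256 x 256


def build_train_iterator(train_dir, batch_size=32):
    """Hypothetical helper: yields augmented batches from class subfolders."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION_KWARGS)
    return datagen.flow_from_directory(
        train_dir,
        target_size=TARGET_SIZE,
        batch_size=batch_size,      # assumed value, not stated in the text
        class_mode="categorical",   # one-hot labels for a two-unit softmax
    )


def normalize_pixel(value, low=0.0, high=1.0):
    """Map an 8-bit pixel value (0-255) linearly into [low, high]."""
    return low + (value / 255.0) * (high - low)


print(normalize_pixel(255))             # 1.0
print(normalize_pixel(0, -1.0, 1.0))    # -1.0
```

Augmentation of this kind is normally applied only to the training iterator; validation and test images would typically be rescaled but not randomly transformed.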

3.3 Feature Extraction

The convolution operator is effective for solving complicated processes. A convolutional network analyzes data in several dimensions, such as images and time domain data. During the learning phase, it does feature extraction and weight computations. The most crucial benefit of CNNs is their ability to extract features automatically. As illustrated in Figure 3, the input image is first sent to a network for feature extraction.
Fig. 3. Block diagram of features extraction method.
The generated features are subsequently sent to a network for classifier inference. The feature extraction process uses many convolutional and pooling layer pairs to get the most out of the data. Each digital filter in the convolutional layer performs a convolution on the input data. The pooling layer acts as a dimensionality reduction layer: it reduces the number of connections in the network, and with it the number of parameters that must be adjusted during backpropagation. In this way, we extracted the features from the images. Convolution, a mathematical operation, is carried out by sliding the kernel matrix over the input matrix. The feature map is the accumulated result of the element-by-element multiplications performed at each position. Convolution is a linear operation with many applications beyond image processing, including statistics and physics. A convolution can be applied over more than one axis. Assuming we have the input picture I and the 2-dimensional kernel filter K, the convolved image can be computed as in Equation (1):
\begin{equation} {S(x,y) = \sum _{a}\sum _{b}I(a, b)K(x-a, y-b)}. \end{equation}
(1)
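Equation (1) can be illustrated with a small pure-Python sketch. It computes the full two-dimensional convolution of a toy image with a toy kernel; the function name `convolve2d_full` and the example values are illustrative. Deep learning frameworks usually implement the closely related cross-correlation (no kernel flip) and crop or pad the output, so this is a sketch of the formula rather than of a particular library layer.

```python
def convolve2d_full(image, kernel):
    """Full 2D convolution: S(x, y) = sum_a sum_b I(a, b) * K(x - a, y - b)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for a in range(ih):
        for b in range(iw):
            for u in range(kh):
                for v in range(kw):
                    # With x = a + u and y = b + v, K(x - a, y - b) = K(u, v).
                    out[a + u][b + v] += image[a][b] * kernel[u][v]
    return out


image = [[1, 2],
         [3, 4]]
kernel = [[0, 1],
          [1, 0]]
feature_map = convolve2d_full(image, kernel)
for row in feature_map:
    print(row)
# [0.0, 1.0, 2.0]
# [1.0, 5.0, 4.0]
# [3.0, 4.0, 0.0]
```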
Data augmentation and effective feature extraction are central to this research, since all model inference relies on these two steps. The sequence diagram in Figure 4 shows how data augmentation and feature extraction are repeated if the model's prediction performance is below 85%, which underlines the importance of these two steps in detecting deepfake images.
Fig. 4. Sequence diagram.

3.4 Modelling

Four CNN architectures are trained to identify deepfake images: a custom CNN model [58], VGG16, InceptionV3, and VGG19. The classifiers are fine-tuned to distinguish between real and fake images. The CNN model utilized in this work comprises six convolutional and max-pooling layers, each having a pool size of two and a kernel size of three. The CNN model’s input shape is 256 \(\times\) 256 \(\times\) 3. For classification, the CNN model used the softmax activation function. In Equation (2), \(z_{i}\) is the score of class i, the sum in the denominator runs over the scores \(z_{j}\) of all classes j, and \(f_{i}(x)\) is the softmax output for class i.
\begin{equation} {f_{i}(x) = \frac{e^{z_{i}}}{\sum _{j}e^{z_{j}}}}. \end{equation}
(2)
After the classifier produces its scores, the softmax function calculates the probabilities associated with each class. It converts the scores to a vector of values between 0 and 1 that add up to 1. As a result, in a classification problem, the outputs of a softmax can be understood as a vector of class probabilities. Training aims to increase the probability assigned to the correct class, which is equivalent to minimizing the negative log-likelihood of the correct class. The loss of a softmax classifier is calculated using the formula in Equation (3):
\begin{equation} {L_{x} = -\log \left(\frac{e^{z_{y_{x}}}}{\sum _{j}e^{z_{j}}}\right)}, \end{equation}
(3)
where \(z_{y_{x}}\) is the score of the correct class for sample x and the \(z_{j}\) are the scores of all classes, which are summed in the denominator. The CNN model architecture consists of 29 layers: an input layer first, then many hidden layers, and finally a fully connected layer. The total number of parameters is 1,601,534, of which 1,599,512 are trainable and only 2,022 are not. The model was trained with the binary cross-entropy loss and the Adam optimizer across seven epochs. The VGG16 model used in this study has 24 layers, the first of which is an input layer, followed by several hidden layers, and finally a dense layer. To get better outcomes, we used the principle of transfer learning. At the end of the VGG16 and InceptionV3 models, we added one flatten layer, two dense layers, and one dropout layer. The dropout layer controls overfitting, and the dense layers perform the classification of deepfake and real images. The VGG16 model was trained across three epochs and InceptionV3 across four epochs, and both yielded good results. At the end of the VGG19 model, we added one global average pooling layer and two dense layers, one with the ReLU and one with the softmax activation function. The VGG19 model was trained using seven epochs.
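The softmax of Equation (2) and the loss of Equation (3) can be sketched in a few lines of Python. The example scores and the function names are illustrative; subtracting the maximum score before exponentiating is a common numerical-stability step that is not part of the equations themselves and does not change the result.

```python
import math


def softmax(scores):
    """Equation (2): turn class scores into probabilities that sum to 1."""
    shift = max(scores)  # stability trick; cancels out in the ratio
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, correct_index):
    """Equation (3): negative log-probability of the correct class."""
    return -math.log(softmax(scores)[correct_index])


# Hypothetical scores for the two classes (index 0 = Real, index 1 = Fake).
scores = [2.0, 0.5]
probs = softmax(scores)
print(probs, sum(probs))
print(softmax_loss(scores, correct_index=0))  # small loss: Real is favored
print(softmax_loss(scores, correct_index=1))  # larger loss: Fake is disfavored
```

When both scores are equal, each class receives probability 0.5 and the loss is log 2, which is the value an uninformed two-class model would produce.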

3.5 Evaluation Measures

We used various evaluation measures to evaluate model abilities, but we mainly focused on accuracy, precision, recall, and F1-score. These are the essential criteria used to evaluate the models’ performance in this research, and the model’s deepfake detection capabilities can be assessed via these measures. In the equations below, tp, tn, fp, and fn denote the numbers of true positives, true negatives, false positives, and false negatives.
Accuracy: the fraction of test samples that are classified correctly. As Equation (4) shows, it is obtained by dividing the number of correct predictions by the total number of predictions.
\begin{equation} {Acc = \frac{(tp + tn)}{(tp + fp + tn + fn)}}. \end{equation}
(4)
Precision: the number of correctly predicted positive samples divided by the total number of samples predicted as positive, as defined in Equation (5).
\begin{equation} {Pre = \frac{(tp)}{(tp + fp)}}. \end{equation}
(5)
Recall: the number of correctly predicted positive samples divided by the total number of samples that are actually positive, as defined in Equation (6).
\begin{equation} {Rec = \frac{(tp)}{(tp + fn)}}. \end{equation}
(6)
F1-score: The harmonic mean of a classifier’s precision and recall is used to get the F1 score. The F1-score is a great way to test both precision and recall simultaneously. It is defined in Equation (7).
\begin{equation} {F1 = \frac{(2 * (Pre * Rec))}{(Pre + Rec)}}. \end{equation}
(7)
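Equations (4) to (7) can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts chosen only for illustration; they are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 as in Equations (4)-(7)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name:<9} {value:.4f}")
```

Because the F1-score is a harmonic mean, it always lies between precision and recall and is pulled toward the lower of the two.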

4 Experimental Analysis and Results

We used four different CNN models for deepfake and real image classification (custom CNN model, VGG16, VGG19, and InceptionV3). To evaluate the DL models’ capabilities, six assessment measures were used (accuracy, precision, recall, F1-score, confusion matrix, and AUC-ROC score). The dataset was separated into three parts: 140,002 images were used to train the models, 39,428 images were used for validation, and 10,905 were used to test the CNN models. The experiments were conducted in Python; the code was written and executed on the Kaggle platform, a web-based Python environment. Table 2 lists the experimental setup. The experiments also used several programming libraries (including pandas, numpy, Keras, OpenCV, TensorFlow, matplotlib, and sklearn) to aid analysis and visualization.
| Parameters | Value |
| --- | --- |
| Platform | Kaggle |
| Operating System (OS) | Windows 10 Home |
| GPU | TESLA P100-PCIE-16 GB |
| CPU | Intel Xeon (2 cores) |
| Computer memory unit | 16 |
| Programming Language | Python |
| Python Version | 3.8.8 |
Table 2. Experimental Settings

4.1 CNN

The experimental results of the CNN model are shown in Table 3. The dataset was prepared using various preprocessing steps, and the CNN model was trained on the extracted features. The CNN model takes an input image of shape 256 \(\times\) 256 \(\times\) 3 and has 1,599,512 trainable parameters. The binary cross-entropy loss and a learning rate of 0.01 were used during the experiments, and the model was trained for seven epochs. The CNN model's training accuracy is 96.29%, and its validation accuracy is 96.22%. On the testing data, the trained CNN model achieved an accuracy of 89%. Precision, recall, F1-score, and AUC-ROC score were also calculated on the testing data: precision was 90%, and the other three metrics were 89%. The model exhibited very promising results for detecting deepfake images.
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Roc Score (%) |
| --- | --- | --- | --- | --- | --- |
| CNN | 89 | 90 | 89 | 89 | 89 |
| InceptionV3 | 89 | 91 | 89 | 89 | 89 |
| VGG19 | 71 | 72 | 71 | 71 | 71 |
| VGG16 | 90 | 91 | 90 | 90 | 90 |
Table 3. Training and Testing Results of Models

4.2 InceptionV3 Model

The experimental results of the InceptionV3 model are presented in Table 3. The model was trained on CNN-based features. The InceptionV3 model takes an input image of shape 256 \(\times\) 256 \(\times\) 3 and is initialized with ImageNet weights. The softmax activation function was used for classification at the FC layer. The binary cross-entropy loss and a learning rate of 1e-4 were used during model training. The model was trained on 40,643,490 parameters for four epochs. The InceptionV3 model's training accuracy is 92.80%, and its validation accuracy is 98.66%. On the test data, the InceptionV3 model achieved an accuracy of 89%. Its precision of 91% was higher than the 90% of the CNN model, and all other assessment criteria were the same.

4.3 VGG19

Table 3 displays the empirical findings of the VGG19 model. The VGG19 model was trained using CNN-based features. It is initialized with ImageNet weights and accepts a 256 \(\times\) 256 \(\times\) 3 image as input. At the FC layer, we performed classification using a softmax activation function. The binary cross-entropy loss and the RMSprop optimizer were used during the model training process. There were 527,362 parameters used to train the model. Seven epochs were used to fine-tune the VGG19 model. The validation accuracy of the VGG19 model is 86.60%, whereas the training accuracy is 80.86%. The VGG19 model performed significantly worse than the other DL models: it had a precision score of 72% and the lowest accuracy, recall, F1-score, and ROC-AUC score, all at 71%. One possible reason is that the first dense layer has a large number of units and no dropout layer was used to control overfitting.
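The reported figure of 527,362 parameters can be related to the classification head described in Section 3.4 with a short calculation. The sketch below is an inference, not a statement of the authors' configuration: it assumes that the count covers only the added head on top of a frozen VGG19 base, that global average pooling yields 512 features (the channel count of the last VGG19 convolutional block), and that the first dense layer has 1,024 units, which is the width that reproduces the reported number.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


POOLED_FEATURES = 512   # channels of VGG19's last conv block after pooling
HIDDEN_UNITS = 1024     # assumption: width inferred from the reported count
NUM_CLASSES = 2         # Real and Fake

head_params = (dense_params(POOLED_FEATURES, HIDDEN_UNITS)
               + dense_params(HIDDEN_UNITS, NUM_CLASSES))
print(head_params)  # 527362
```

Under these assumptions the head alone accounts for the full reported count, which is consistent with the remark that the first dense layer is large.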

4.4 VGG16

The results of the VGG16 model’s empirical tests are shown in Table 3. CNN-based features were used during the training of the VGG16 model. The VGG16 model is initialized with ImageNet weights and accepts a 256 \(\times\) 256 \(\times\) 3 image as input. We used a softmax activation function for classification at the FC layer. The model training procedure uses the binary cross-entropy loss and a learning rate of 0.00001. The model has 23,104,066 parameters. Three epochs were used to fine-tune the VGG16 model. The VGG16 model has a validation accuracy of 98.15% and a training accuracy of 98.82%.
Finally, the VGG16 model outperformed all other DL models, with 90% accuracy, recall, F1-score, and AUC-ROC score and 91% precision. The classification report of the VGG16 model is presented in Table 4. Based on the classification report, it is reasonable to infer that our fine-tuned VGG16 model distinguishes well between real and deepfake pictures. The report also shows the macro and weighted averages of precision, recall, and F1-score. There are 5,492 real images and 5,413 fake images in the testing set. We also show the VGG16 model’s training and validation accuracy and loss graphs in Figure 5(a) and (b). Furthermore, Figure 5(c) depicts the VGG16 model’s confusion matrix, which demonstrates that the VGG16 model performed well on the testing data. In addition, we computed the VGG16 model’s AUC-ROC score. We regard an AUC score of more than 85% as indicating that a model did extremely well on the provided data. Figure 5(d) depicts the AUC-ROC curve. The AUC-ROC score for the VGG16 model was 90%.
| Class | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| Real | 0.85 | 0.97 | 0.91 | 5,492 |
| Fake | 0.96 | 0.82 | 0.89 | 5,413 |
| Accuracy | - | - | 0.90 | 10,905 |
| Macro_avg | 0.91 | 0.90 | 0.90 | 10,905 |
| Weighted_avg | 0.91 | 0.90 | 0.90 | 10,905 |
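The reported total of 23,104,066 parameters can be cross-checked against the architecture described in Section 3.4. The sketch below counts the parameters of the standard VGG16 convolutional base and adds a head of one flatten layer, two dense layers, and a dropout layer (which has no parameters). The width of 256 units for the first dense layer is an assumption: it is not stated in the text, but it is the value that reproduces the reported total for a 256 \(\times\) 256 \(\times\) 3 input.

```python
# Standard VGG16 convolutional base: (number of 3x3 conv layers, channels).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_base_params(blocks, in_channels=3, kernel=3):
    """Sum of weights and biases over all convolutional layers."""
    total = 0
    for num_convs, out_channels in blocks:
        for _ in range(num_convs):
            total += kernel * kernel * in_channels * out_channels + out_channels
            in_channels = out_channels
    return total


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


base = conv_base_params(VGG16_BLOCKS)
side = 256 // 2 ** len(VGG16_BLOCKS)   # five 2x2 poolings: 256 -> 8
flattened = side * side * 512          # flatten layer output size
HIDDEN_UNITS = 256                     # assumption inferred from the total
head = dense_params(flattened, HIDDEN_UNITS) + dense_params(HIDDEN_UNITS, 2)

print(base)         # 14714688
print(head)         # 8389378
print(base + head)  # 23104066
```

Most of the added parameters sit in the first dense layer, because flattening an 8 \(\times\) 8 \(\times\) 512 feature map produces 32,768 inputs.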
Table 4. Classification Report of VGG16 Model
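The per-class rows of Table 4 can be checked for consistency with the reported overall accuracy. Since accuracy is the support-weighted average of the per-class recalls, the sketch below recomputes it from the Real and Fake rows. The recall values are rounded to two decimals in the table, so the result is only approximate.

```python
# Per-class recall and support as listed in Table 4.
report = {
    "Real": {"recall": 0.97, "support": 5492},
    "Fake": {"recall": 0.82, "support": 5413},
}

# Recall times support approximates the number of correct predictions per class.
correct = sum(row["recall"] * row["support"] for row in report.values())
total = sum(row["support"] for row in report.values())
accuracy = correct / total

print(total)               # 10905
print(round(accuracy, 3))  # close to the reported 0.90
```

The recomputed value of about 0.896 agrees with the reported accuracy of 0.90 once rounding is taken into account.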
Fig. 5. Results of Fine-tuned VGG16 model.
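For readers unfamiliar with the AUC-ROC score shown in Figure 5(d), the sketch below computes it from its probabilistic definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are hypothetical and the quadratic pairwise loop is meant for illustration; libraries such as sklearn provide an efficient implementation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = fake, 0 = real) and predicted fake probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.8889
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.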
When comparing the outcomes of Deep Learning models, the Fine-tuned VGG16 model surpassed other DL models with a 90% accuracy rate.

5 Conclusion

Artificial intelligence has progressed to the point that we can now create “deepfake pictures,” which can be used to swap out a user’s face for a fake one. In recent years, people have used deepfake photos to smear celebrities, politicians, and ordinary people. Convincingly deep-faked images have been used to incite political unrest, exact blackmail, spread fake news, and even stage hoaxed terrorist acts. Therefore, it is more difficult to distinguish genuine from fake photographs. Our research uses a transfer learning and data augmentation technique to circumvent these problems to determine which photos are real and which deepfake. This research uses 190,335 deepfake and real photos of RGB resolution for testing purposes. In addition, the dataset is pre-processed using image augmentation techniques. Models are trained using features based on CNNs. In order to determine if the proposed method was effective, this study employed standard assessment metrics. The studies use a transfer learning strategy with deep learning models (CNN, Inception V3, VGG19, and VGG16). 10,905 fresh photos were used to evaluate the deep learning models. Our improved VGG16 model outperformed other DL models in spotting genuine and fake photographs, and the results showed that the suggested method achieved 90% accuracy, recall, F1-score, and AUC-ROC, with 91% precision.
The lack of comprehensive data is the primary shortcoming of this study, which will be addressed in further research. In the future, we intend to work on video forensics. A video may also be used to identify deepfake images. The AI and deep learning model might be used to build a mobile app that makes it simpler to identify deepfake and expands the technology’s reach. Other than that, it may be possible to use it on Android.

References

[1]
Bobby Chesney and Danielle Citron. 2019. Deep fakes: A looming challenge for privacy, democracy, and national security. California Law Review 107 (2019), 1753.
[2]
Imad Rida. 2020. Data-driven audio recognition: A supervised dictionary approach. arXiv:2012.14761. Retrieved from https://arxiv.org/abs/2012.14761.
[3]
Ahmed Abbasi, Abdul Rehman Javed, Amanullah Yasin, Zunera Jalil, Natalia Kryvinska, and Usman Tariq. 2022. A large-scale benchmark dataset for anomaly detection and rare event classification for audio forensics. IEEE Access 10 (2022), 38885–38894.
[4]
Rayan Al Sobbahi and Joe Tekli. 2022. Low-light image enhancement using image-to-frequency filter learning. In Proceedings of the International Conference on Image Analysis and Processing. Springer, 693–705.
[5]
Pavel Korshunov and Sébastien Marcel. 2018. Deepfakes: A new threat to face recognition? Assessment and detection. arXiv:1812.08685. Retrieved from https://arxiv.org/abs/1812.08685.
[6]
Farkhund Iqbal, Ahmed Abbasi, Abdul Rehman Javed, Zunera Jalil, and Jamal Al-Karaki. 2022. Deepfake audio detection via feature engineering and machine learning. In Proceedings of the Woodstock’22: Symposium on the Irreproducible Science. IEEE, 70–75.
[7]
Ameer Hamza, Abdul Rehman Javed, Farkhund Iqbal, Natalia Kryvinska, Ahmad S. Almadhor, Zunera Jalil, and Rouba Borghol. 2022. Deepfake audio detection via MFCC features using machine learning. IEEE Access 10 (2022), 134018–134028.
[8]
Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. 2018. Mesonet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security. IEEE, 1–7.
[9]
David Güera and Edward J. Delp. 2018. Deepfake video detection using recurrent neural networks. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS’18). IEEE, 1–6.
[10]
Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A. Bharath. 2018. Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35, 1 (2018), 53–65.
[11]
Charan D. Prakash and Lina J. Karam. 2021. It GAN DO better: GAN-based detection of objects on images with varying quality. IEEE Transactions on Image Processing 30 (2021), 9220–9230.
[12]
Ashar Neyaz, Avinash Kumar, Sundar Krishnan, Jessica Placker, and Qingzhong Liu. 2020. Security, privacy and steganographic analysis of FaceApp and TikTok. International Journal of Computer Science and Security 14, 2 (2020), 38–59.
[13]
Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Gang Wang, Jianfei Cai, and Tsuhan Chen. 2018. Recent advances in convolutional neural networks. Pattern Recognition 77 (2018), 354–377.
[14]
Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. 2020. Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3207–3216.
[15]
I. Kouatli. 2020. The use of fuzzy logic as augmentation to quantitative analysis to unleash knowledge of participants’ uncertainty when filling a survey: Case of cloud computing. IEEE Transactions on Knowledge and Data Engineering 34, 3 (2020), 1489–1500.
[16]
Issam M. Kouatli and Skander Ben Abdallah. 2018. An augmentation of fuzziness to randomness in project evaluation. In Proceedings of the FSDM. 164–173.
[17]
Saima Iqbal, Wilayat Khan, Abdulrahman Alothaim, Aamir Qamar, Adi Alhudhaif, and Shtwai Alsubai. 2022. Proving reliability of image processing techniques in digital forensics applications. Security and Communication Networks 2022 (2022).
[18]
Ye-Chan Ahn and Chang-Sung Jeong. 2019. Natural language contents evaluation system for detecting fake news using deep learning. In Proceedings of the 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE’19). IEEE, 289–292.
[19]
Yuta Yanagi, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara, and Akihiko Ohsuga. 2020. Fake news detection with generated comments for news articles. In Proceedings of the 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES’20). IEEE, 85–90.
[20]
Youngkyung Seo and Chang-Sung Jeong. 2018. FaGoN: Fake news detection model using grammatic transformation on neural network. In Proceedings of the 2018 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS’18). IEEE, 1–5.
[21]
G. A. Rajesh Kumar, Ravi Kant Kumar, and Goutam Sanyal. 2017. Discriminating real from fake smile using convolution neural network. In Proceedings of the 2017 International Conference on Computational Intelligence in Data Science (ICCIDS’17). IEEE, 1–6.
[22]
Xishuang Dong, Uboho Victor, and Lijun Qian. 2020. Two-path deep semisupervised learning for timely fake news detection. IEEE Transactions on Computational Social Systems 7, 6 (2020), 1386–1398.
[23]
Peisong He, Haoliang Li, and Hongxia Wang. 2019. Detection of fake images via the ensemble of deep representations from multi color spaces. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP’19). IEEE, 2299–2303.
[24]
Sunidhi Sharma and Dilip Kumar Sharma. 2019. Fake news detection: A long way to go. In Proceedings of the 2019 4th International Conference on Information Systems and Computer Networks (ISCON’19). IEEE, 816–821.
[25]
Paulo Roberto Da Cordeiro, Vladia Pinheiro, Ronaldo Moreira, Cecilia Carvalho, and Livio Freire. 2019. What is real or fake?-Machine learning approaches for rumor verification using stance classification. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence. 429–432.
[26]
Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Cuong M. Nguyen, Dung Nguyen, Duc Thanh Nguyen, and Saeid Nahavandi. 2019. Deep learning for deepfakes creation and detection: A survey. arXiv:1909.11573. Retrieved from https://arxiv.org/abs/1909.11573.
[27]
Abhijith Punnappurath and Michael S. Brown. 2019. Learning raw image reconstruction-aware deep image compressors. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2019), 1013–1019.
[28]
Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. 2019. Energy compaction-based image compression using convolutional autoencoder. IEEE Transactions on Multimedia 22, 4 (2019), 860–873.
[29]
Jan Chorowski, Ron J. Weiss, Samy Bengio, and Aäron Van Den Oord. 2019. Unsupervised speech representation learning using wavenet autoencoders. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, 12 (2019), 2041–2053.
[30]
Md Shohel Rana and Andrew H. Sung. 2020. Deepfakestack: A deep ensemble-based learning technique for deepfake detection. In Proceedings of the 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom’20). IEEE, 70–75.
[31]
Xu Chang, Jian Wu, Tongfeng Yang, and Guorui Feng. 2020. Deepfake face image detection based on improved VGG convolutional neural network. In Proceedings of the 2020 39th Chinese Control Conference (CCC’20). IEEE, 7252–7256.
[32]
Shraddha Suratkar, Elvin Johnson, Karan Variyambat, Mihir Panchal, and Faruk Kazi. 2020. Employing transfer-learning based CNN architectures to enhance the generalizability of deepfake detection. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT’20). IEEE, 1–9.
[33]
Edmar R. S. De Rezende, Guilherme C. S. Ruppert, and Tiago Carvalho. 2017. Detecting computer generated images with deep convolutional neural networks. In Proceedings of the 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI’17). IEEE, 71–78.
[34]
Abdulkadir Şengür, Zahid Akhtar, Yaman Akbulut, Sami Ekici, and Ümit Budak. 2018. Deep feature extraction for face liveness detection. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP’18). IEEE, 1–4.
[35]
Huaxiao Mo, Bolin Chen, and Weiqi Luo. 2018. Fake faces identification via convolutional neural network. In Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security. 43–47.
[36]
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. Progressive growing of gans for improved quality, stability, and variation. arXiv:1710.10196. Retrieved from https://arxiv.org/abs/1710.10196.
[37]
Qurat-ul-ain, Nudrat Nida, Aun Irtaza, and Nouman Ilyas. 2021. Forged face detection using ELA and deep learning techniques. In Proceedings of the 2021 International Bhurban Conference on Applied Sciences and Technologies (IBCAST’21). IEEE, 271–275.
[38]
Leandro A. Passos, Danilo Jodas, Kelton AP da Costa, Luis A. Souza Júnior, Danilo Colombo, and João Paulo Papa. 2022. A review of deep learning-based approaches for deepfake content detection. arXiv:2202.06095. Retrieved from https://arxiv.org/abs/2202.06095.
[39]
Luca Guarnera, Oliver Giudice, and Sebastiano Battiato. 2020. Deepfake detection by analyzing convolutional traces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 666–667.
[40]
Joao C. Neves, Ruben Tolosana, Ruben Vera-Rodriguez, Vasco Lopes, Hugo Proença, and Julian Fierrez. 2020. Ganprintr: Improved fakes and evaluation of the state-of-the-art in face manipulation detection. IEEE Journal of Selected Topics in Signal Processing 14, 5 (2020), 1038–1048.
[41]
Hao Dang, Feng Liu, Joel Stehouwer, Xiaoming Liu, and Anil K. Jain. 2020. On the detection of digital face manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5781–5790.
[42]
Nils Hulzebosch, Sarah Ibrahimi, and Marcel Worring. 2020. Detecting CNN-generated facial images in real-world scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 642–643.
[43]
Peng Chen, Jin Liu, Tao Liang, Guangzhi Zhou, Hongchao Gao, Jiao Dai, and Jizhong Han. 2020. Fsspotter: Spotting face-swapped video by spatial and temporal clues. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME’20). IEEE, 1–6.
[44]
Pranjal Ranjan, Sarvesh Patil, and Faruk Kazi. 2020. Improved generalizability of deep-fakes detection using transfer learning based CNN framework. In Proceedings of the 2020 3rd International Conference on Information and Computer Technologies (ICICT’20). IEEE, 86–90.
[45]
R. Wang, L. Ma, F. Juefei-Xu, X. Xie, J. Wang, and Y. Liu. 2019. Fakespotter: A simple baseline for spotting ai-synthesized fake faces. arXiv:1909.06122. Retrieved from https://arxiv.org/abs/1909.06122.
[46]
Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H. Bappy, Amit K. Roy-Chowdhury, and B. S. Manjunath. 2019. Detecting GAN generated fake images using co-occurrence matrices. arXiv:1903.06836. Retrieved from https://arxiv.org/abs/1903.06836.
[47]
Ning Yu, Larry Davis, and Mario Fritz. 2018. Attributing fake images to gans: Analyzing fingerprints in generated images. arXiv:1811.08180. Retrieved from https://arxiv.org/abs/1811.08180.
[48]
Francesco Marra, Cristiano Saltori, Giulia Boato, and Luisa Verdoliva. 2019. Incremental learning for the detection and classification of gan-generated images. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS’19). IEEE, 1–6.
[49]
P. D. Mahendhiran and Subramanian Kannimuthu. 2018. Deep learning techniques for polarity classification in multimodal sentiment analysis. International Journal of Information Technology and Decision Making 17, 03 (2018), 883–910.
[50]
Imad Rida, Romain Hérault, and Gilles Gasso. 2018. An efficient supervised dictionary learning method for audio signal recognition. arXiv:1812.04748. Retrieved from https://arxiv.org/abs/1812.04748.
[51]
Xinsheng Xuan, Bo Peng, Wei Wang, and Jing Dong. 2019. On the generalization of GAN image forensics. In Proceedings of the Chinese Conference on Biometric Recognition. Springer, 134–141.
[52]
Polychronis Charitidis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, and Ioannis Kompatsiaris. 2020. Investigating the impact of pre-processing and prediction aggregation on the deepfake detection task. arXiv:1701.00133. Retrieved from https://arxiv.org/abs/1701.00133.
[53]
Rayan Al Sobbahi and Joe Tekli. 2022. Low-light homomorphic filtering network for integrating image enhancement and classification. Signal Processing: Image Communication 100 (2022), 116527.
[54]
Rody El Nawar, Jennifer Yeung, Julien Labreuche, Marie-Laure Chadenat, Duc Long Duong, Maxime De Malherbe, Yves-Sebastien Cordoliani, Bertrand Lapergue, and Fernando Pico. 2019. MRI-based predictors of hemorrhagic transformation in patients with stroke treated by intravenous thrombolysis. Frontiers in Neurology 10 (2019), 897.
[55]
Guy Assaker, Peter O’Connor, and Rania El-Haddad. 2020. Examining an integrated model of green image, perceived quality, satisfaction, trust, and loyalty in upscale hotels. Journal of Hospitality Marketing and Management 29, 8 (2020), 934–955.
[56]
Joe Tekli. 2022. An overview of cluster-based image search result organization: Background, techniques, and ongoing challenges. Knowledge and Information Systems 64, 3 (2022), 589–642.
[57]
Zoe Papakipos and Joanna Bitton. 2022. Augly: Data augmentations for robustness. arXiv:2201.06494. Retrieved from https://arxiv.org/abs/2201.06494.
[58]
Ahmed Abbasi, Abdul Rehman Javed, Farkhund Iqbal, Natalia Kryvinska, and Zunera Jalil. 2022. Deep learning for religious and continent-based toxic content detection and classification. Scientific Reports 12, 1 (2022), 17478.


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 11
    November 2024
    702 pages
    EISSN:1551-6865
    DOI:10.1145/3613730

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 September 2024
    Online AM: 13 April 2023
    Accepted: 08 April 2023
    Revised: 23 February 2023
    Received: 09 December 2022
    Published in TOMM Volume 20, Issue 11

    Author Tags

    1. Deepfake detection
    2. data augmentation
    3. image processing
    4. deep learning
    5. artificial intelligence
    6. transfer learning

    Qualifiers

    • Research-article
