Four CNN architectures were trained to identify deepfake images: a custom CNN model [58], VGG16, InceptionV3, and VGG19. Each classifier is fine-tuned to distinguish between real and fake images. The custom CNN model comprises six convolutional and max-pooling blocks, each pooling layer having a pool size of two and each convolutional layer a kernel size of three. The CNN model's input shape is 256 \(\times\) 256 \(\times\) 3. For classification, the CNN model uses the softmax activation function. In Equation (2), \(z_{i}\) is the score of the correct class, \(z_{j}\) are the scores of the other \(j\) classes, and \(f(x)\) is the softmax function.
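The softmax described above can be sketched as a short NumPy function; the max-subtraction for numerical stability is a standard implementation detail, not something stated in the text:

```python
import numpy as np

def softmax(z):
    """Map a vector of class scores to class probabilities (Equation (2))."""
    # Subtracting the maximum score avoids overflow in exp();
    # softmax is shift-invariant, so the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)  # values in (0, 1) that sum to 1
```

As the text notes, the outputs sum to one and the largest score receives the largest probability, so they can be read as label probabilities.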
After the classifier is applied, the softmax function computes a probability for each class. Softmax converts the raw scores into a vector of values between 0 and 1 that sum to 1, so in a classification problem the outputs of a softmax can be read as a vector of label probabilities. Training aims to increase the likelihood that the correct class is selected; accordingly, the negative log-likelihood of the correct class is minimized. The loss of a softmax classifier is calculated using the formula in Equation (3), where \(z_{y_{x}}\) is the score of the correct class and \(z_{j}\) are the scores of the other classes.
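The negative log-likelihood loss described around Equation (3) can be sketched as follows; folding the softmax into the log (the log-sum-exp form) is a common numerically stable variant, assumed here rather than taken from the paper:

```python
import numpy as np

def softmax_cross_entropy(z, y):
    """Negative log-likelihood of the correct class y given scores z,
    i.e. -log(softmax(z)[y]), the softmax classifier loss."""
    z = z - np.max(z)  # shift for numerical stability
    log_prob = z[y] - np.log(np.exp(z).sum())
    return -log_prob

# With uniform scores over 3 classes the loss is log(3) ~ 1.0986;
# raising the correct class's score lowers the loss.
uniform = softmax_cross_entropy(np.array([0.0, 0.0, 0.0]), 0)
confident = softmax_cross_entropy(np.array([4.0, 0.0, 0.0]), 0)
```

Minimizing this quantity is exactly the training objective stated above: it pushes the probability of the correct class toward one.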
The custom CNN architecture consists of 29 layers: an input layer, many hidden layers, and finally a fully connected layer. The total number of parameters is 1,601,534, of which 1,599,512 are trainable and only 2,022 are not. The loss was calculated using binary cross-entropy with the Adam optimizer, and the model was trained for seven epochs.

The VGG16 model used in this study has 24 layers: an input layer, followed by several hidden layers, and finally a dense layer. To obtain better results, we applied transfer learning. At the end of the VGG16 and InceptionV3 models, we added one flatten layer, two dense layers, and one dropout layer; the dropout and dense layers control overfitting and perform the classification of deepfake and real images. The VGG16 model was trained for three epochs and InceptionV3 for four epochs, and both yielded good results.

For the VGG19 model, we added one global average pooling layer and two dense layers, with ReLU and softmax activation functions respectively, at the end of the model. The VGG19 model was trained for seven epochs.