
Copy-Move Document Image Forgery Detection and Localization Based on JPEG Clues


Chuiko A. V.¹, Bulatov K. B.¹,², Tropin D. V.¹,²
¹ Smart Engines Service LLC, Moscow, Russia
² Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Vavilova str., Moscow, 119333, Russia

Sixteenth International Conference on Machine Vision (ICMV 2023), edited by Wolfgang Osten, Proc. of SPIE Vol. 13072, 130720K · © 2024 SPIE · 0277-786X · doi: 10.1117/12.3023365

ABSTRACT

The amount of image forgery has grown sharply in recent years. There are different ways to fake an image; one of the most common is copy-move manipulation. Numerous methods exist for detecting copy-move manipulations in natural images. However, they are difficult to adapt to document images because of the specific features of such images. This work proposes an algorithm for detecting and localizing copy-move manipulations on digital images of documents. The main idea is to use JPEG artifacts to find the approximate target region and then localize the source and target regions precisely. For the proposed method to be applicable, firstly, the original image must have been subjected to JPEG compression, and secondly, after the manipulation the image must have been saved in a lossless format. The experiments were carried out on CMID, an open set of document images: in the detection task, the recall was 0.992 and the specificity was 1.0; in the localization task, the recall was 0.923 and the false discovery rate was 0.021. This means that the proposed algorithm successfully detects more than 99% of copy-move manipulations similar to those in the CMID and does not give false positives.
Keywords: copy-move, image tampering, image forgery, document forgery, JPEG artifacts.

1. INTRODUCTION
A digital image, after its registration by the capture device, can be modified using photo editors or other software. Such modifications are called digital image manipulations [1,2], and they are often used for data forgery [3]. In this paper, the initial image obtained after capture is referred to as the original image, and the image obtained after manipulation is referred to as the manipulated image.

Figure 1. Sample images from the COVERAGE [6], H2020 [7], and CASIA2.0 [8] datasets, respectively. The top row shows original images, and the bottom row shows copy-move, erase-fill, and splicing falsifications (from left to right).

Figure 2. Copy-move example from the CMID [16] (tampered/0_FRA_TS_N_2.png). The forged image is on the left; the source and target regions are shown on the right, marked green and red, respectively.

Sometimes manipulations are used in good faith to improve a digital image without changing its semantic content, for example, to increase the contrast of a photograph. Conversely, manipulations are sometimes used to change the semantic content of a digital image with malicious intent; such manipulations are called image forgery. A special place in image forgery research is held by manipulations in which part of the final image remains authentic and part is modified; such manipulations are called image tampering. Among them, three main types are distinguished: copy-move, splicing, and erase-fill [1,4] (Fig. 1). These types of manipulations can be viewed as filling the modified zone with new content. If the source for filling the zone is taken from the original image and represents a whole part, this is a copy-move manipulation. If the source for filling is taken from another image and also represents a whole part, this is a splicing manipulation. If the zone is filled with elementary fragments from the original or other images or with synthetic content, this is an erase-fill manipulation [5].
The scope of this report covers only copy-move manipulations. The region from which the copied content is taken is called the source region, and the region into which the copied content is pasted, that is, the region with modified pixels, is called the target region. The copied content can be transformed before being pasted into the target region; the image can also be post-processed after the new content is inserted, both locally, in the target region, and globally.
Fraudsters have found a vast scope of applications for copy-move manipulation when creating fakes. For example, by
copying a background section, an object can be removed from the image. In addition, copy-move manipulation is very
easy to implement and makes it possible to change the semantic content so that the manipulated image looks genuine and
natural. Therefore, the problem of copy-move manipulations is currently one of the most actively researched in the field
of image tampering [9, 10, 11]. In particular, fraudsters use copy-move operations when forging documents [12].
The task of detecting copy-move manipulations on a document image has some peculiarities (Fig. 2). Firstly, to change the content of a document image, it is enough to change the text depicted on it. In other words, to falsify a document image significantly, it is enough to modify an area of small relative size. Secondly, document images traditionally contain many symbols that are Similar but Genuine Objects (SGOs) (Fig. 1, top-left image). Traditional datasets with copy-move manipulations [6, 7, 8, 13, 14, 15] do not cover cases that are complex in terms of the SGO quantity and the size of the modified area. They are more focused on problems associated with post-processing of manipulated images or with transformations of the copied content. The COVERAGE dataset [6] stands apart: its images contain SGOs, but their quantity and the sizes of the modified areas are incomparable with those of copy-move manipulated document images.
The CMID dataset [16], containing 16 types of ID documents, was introduced in 2021 (Fig. 2). It contains 304 genuine and 893 fake images, each fake having one character changed by copy-move. The average area of the regions involved in copy-move is 0.21% of the entire image area, and the images have a resolution of 1342 × 943. The dataset was created to evaluate copy-move detection methods under conditions of a large number of SGOs and a small relative size of the source and target regions.
Besides presenting the dataset itself, the publication [16] includes an experimental study in which methods from publications [9, 17, 18, 19] were tested on the presented set. The study showed that the selected methods perform poorly on images from the CMID. We suppose this is because, firstly, these methods were tuned for images with a different SGO count and different sizes of the source and target regions. Secondly, these methods solve a more general problem than the one posed by the dataset, since they do not use all the clues available for images from the CMID.
The main contributions of the present paper are the following:
• An algorithm is proposed for reliable detection and localization of modifications when, firstly, the original image has been subjected to JPEG compression and, secondly, it has been saved in a lossless format after modification.
• We propose an algorithm for finding the source and target regions from an approximate localization of the target region, assuming a linear model of the transformation of the source region into the target region.
• It is shown that the method composed of a sequential combination of these two algorithms is robust to a large number of SGOs and small modification areas and achieves high results on the CMID set.
In this paper, we describe the JPEG compression artifacts our approach relies on (see Sec. 2), propose a method for detecting copy-move manipulations based on them (see Sec. 3), report its high-quality results on the CMID set (see Sec. 4), and, finally, provide a conclusion.

2. JPEG BACKGROUND INFORMATION


Let us consider JPEG compression in detail (JFIF specification [20]). First, the image is converted into the YCbCr color space; then compression is performed independently on a per-channel basis. We are interested only in the transformations of the Y channel, so we will consider only its compression. The Y channel is padded with zeros so that its sides are multiples of eight and is split into 8 × 8 blocks. These blocks are then processed independently. For each 8 × 8 block, a two-dimensional discrete cosine transform (DCT) is performed; its specific form is of little significance for this work. Thus, the DCT image D is obtained. The image compression ratio is determined by the quantization matrix, which is supplied to the JPEG compression algorithm along with the image. Let us provide definitions.
Definition 1. The quantization matrix Q is an 8 × 8 matrix of positive integers, fixed for the entire image (for a given compression). The number at a particular position of the quantization matrix is called the quantization step. A position in the quantization matrix, $\psi = (\psi_w, \psi_h)$, $\psi \in \{0, \ldots, 7\}^2$, is called a frequency.
Definition 2. The quantization of the matrix A by the quantization matrix Q is the operation of element-wise division of the values of A by the values of Q, followed by rounding: $U_{ij} = \left[ A_{ij} / Q_{ij} \right]$. The inverse quantization of the matrix U by the matrix Q is the element-wise operation $A_{ij} = U_{ij} \cdot Q_{ij}$.
Each 8 × 8 block of image D is quantized by the quantization matrix Q. The resulting integers are losslessly compressed using run-length and Huffman coding and saved to a JPEG file according to the standard.
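For illustration, a minimal sketch of the two operations from Definition 2 (Python with NumPy; the function names are ours):

```python
import numpy as np

def quantize(block: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Element-wise division of an 8x8 DCT block by Q, followed by rounding."""
    return np.round(block / Q).astype(np.int64)

def dequantize(U: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Inverse quantization: element-wise multiplication by Q."""
    return U * Q

# After a quantize/dequantize round trip, every coefficient is an exact
# multiple of the corresponding quantization step -- the JPEG clue used below.
```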
When the image is opened in an editor, the reverse chain of transformations is applied. Each 8 × 8 block packed in the JPEG file is inverse quantized by the quantization matrix Q, so the image D* is obtained. Note that D* ≠ D, since rounding of the divided values removes some of the information. The inverse discrete cosine transform DCT⁻¹ is applied to the resulting image of modified DCT coefficients D*, also independently on 8 × 8 blocks. The values resulting from DCT⁻¹ may not belong to {0, 1, ..., 255}; therefore, rounding and truncation are additionally performed, and the image Y_com is obtained.
Let us estimate the distribution of the values D_com = DCT(Y_com) at a fixed frequency ψ = (ψ_w, ψ_h). If the image was subjected to JPEG compression with Q(ψ) = q, its DCT coefficients will lie near multiples of q. Note that the values D*(ψ_w + 8x, ψ_h + 8y), obtained immediately after the inverse quantization, lie in the set {k · Q(ψ), k ∈ ℤ}. Image D* differs from D_com due to the rounding and truncation step. The difference between the coefficients of D* and D_com in blocks where no truncation occurred is called the rounding error and can be modeled by a normal distribution with zero mean and variance 1/12 [21].
Proposition 1. Suppose that when rounding and truncation were applied, no truncation actually occurred. Then
$$D_{com}(\psi_w + 8x, \psi_h + 8y) \sim \mathcal{N}\!\left(D^*(\psi_w + 8x, \psi_h + 8y), \tfrac{1}{12}\right), \quad (1)$$
where $\mathcal{N}(a, \sigma^2)$ is the normal distribution with mean $a$ and variance $\sigma^2$.
The article [22] shows that the distribution of DCT coefficients for images that have not been subjected to JPEG
compression can be modeled using a Laplace distribution.
Proposition 2. The distribution of the discrete cosine transform coefficients at any fixed frequency ψ ≠ (0, 0) is a Laplace distribution with zero mean:
$$D(\psi_w + 8x, \psi_h + 8y) \sim \rho(t) = \frac{\lambda_\psi}{2} e^{-\lambda_\psi |t|}, \quad t \in \mathbb{R}. \quad (2)$$

Histograms of DCT coefficients calculated on images that have not been / have been subjected to JPEG compression
are shown schematically in Figure 3.

Figure 3. Visual representation of histograms of the calculated DCT coefficients of frequency 𝜓 of a non-JPEG-compressed image
(left) and a JPEG-compressed image, excluding blocks where truncation occurred with 𝑄(𝜓) = 𝑞 (right).

3. PROPOSED ALGORITHM
Our algorithm consists of two stages. The first is determining the focus area using JPEG compression artifacts. The second is detecting the source and target regions given a known focus area. We will assume that the original image was subjected to JPEG compression and that, to create a manipulated image, the original image was subjected to a copy-move manipulation and then saved in a lossless format.
Detection of the modified area is based on searching for inconsistencies in the structure of the DCT coefficients of the
JPEG image. Let us introduce sets of the following form:
$$Qm(q, l) = \bigcup_{k \in \mathbb{Z}} [kq - l,\, kq + l), \quad q \in \mathbb{N},\ l \in \mathbb{R}_{>0}. \quad (3)$$
By our assumption, the DCT coefficients in the falsified area will not fit the model shown in Figure 3 (right), since this area has most likely not previously been subjected to JPEG compression with the same 8 × 8 partition. We will look for image areas where the computed D_com values do not fall into the sets Qm(q, l); for this, we will also need to estimate the quantization matrix with which the non-falsified part of the image was compressed. The schematic structure of the proposed algorithm is shown in Figure 4.
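As an illustration, membership in Qm(q, l) reduces to checking the distance from a value to the nearest multiple of q; a minimal sketch (the function name is ours):

```python
def in_Qm(v: float, q: int, l: float) -> bool:
    """True iff v lies in [k*q - l, k*q + l) for some integer k (Eq. 3)."""
    r = v % q                   # remainder in [0, q)
    return r < l or r >= q - l

# Note that in_Qm(v, 2, 1.5) is True for every v: Qm(2, 1.5) covers all reals,
# which is why a quantization step estimated as 2 instead of 1 is harmless.
```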

Figure 4. Schematic structure of the proposed algorithm.

3.1 Determining the focus area


3.1.1 Calculation of the image DCT coefficients
We have an image in RGB space; we pad its sides with black pixels so that they are multiples of 8 and calculate Y_com, the brightness in YCbCr space. Then we apply the discrete cosine transform, preserving the positional information, and round the obtained values:
$$D_{com} = [DCT(Y_{com})]. \quad (4)$$
The rounding, according to our observations, does not affect the overall quality, but it allows convenient construction of factor histograms (see Sec. 3.1.2) and reduces memory requirements. Let us denote by B_sat the set of blocks in which the source image contains the value 0 or 255 in at least one of the channels R, G, B. Truncation may have occurred in blocks from B_sat, so D_com values derived from them may behave unpredictably.
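A minimal sketch of this step, assuming SciPy is available (helper names are ours; the exact DCT normalization is of little significance here, as noted in Sec. 2):

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(y: np.ndarray) -> np.ndarray:
    """Rounded 2D DCT of every 8x8 block, positions preserved (Eq. 4)."""
    h, w = y.shape
    d = np.empty((h, w))
    for by in range(0, h, 8):
        for bx in range(0, w, 8):
            d[by:by + 8, bx:bx + 8] = dctn(y[by:by + 8, bx:bx + 8], norm='ortho')
    return np.round(d)

def compute_D_com(rgb: np.ndarray) -> np.ndarray:
    """Pad an RGB image with black to multiples of 8 and return D_com."""
    h, w = rgb.shape[:2]
    rgb = np.pad(rgb, ((0, (-h) % 8), (0, (-w) % 8), (0, 0)))
    # ITU-R BT.601 luma, as used by JFIF for the Y channel
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return blockwise_dct(y)
```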
3.1.2 Quantization matrix estimation
Next in our method, the quantization matrix is estimated using the calculated DCT coefficients. To reduce the influence of the falsified area on the matrix estimation, 20 non-overlapping blocks with side 232 are extracted from the image. We then fix a frequency and estimate the corresponding quantization step in each block using a modified method from work [23], as follows:
1. Calculate the factor histogram h, where h(x) is the number of non-zero DCT coefficients that are divisible by x without remainder, are not equal to -1, 0, or 1, and do not belong to blocks from B_sat.
2. Normalize h by the total number of coefficients involved.
3. Estimate the quantization step as q̂ = max{x | h(x) ≥ T}, with the threshold T = 0.7.
The most frequent non-unit estimate over the blocks is taken as the final estimate of the quantization step. Thus, quantization steps are estimated for all 64 frequencies; denote the estimated quantization step for the ψ-th frequency by q̂_ψ. When the true quantization step equals 1, the resulting estimate will, with high probability, be 2 due to Proposition 2 (even though the mode is taken over non-unit values); this does not affect the rest of the algorithm, since Qm(2, 1.5) = ℝ.
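A sketch of the per-block step estimation under these definitions (Python; the function name and the histogram range are our assumptions, and the 232-pixel block sampling is omitted):

```python
import numpy as np

def estimate_step(coeffs: np.ndarray, T: float = 0.7, x_max: int = 64) -> int:
    """Estimate one frequency's quantization step from its DCT coefficients
    via the factor histogram (modified from [23]); B_sat coefficients excluded."""
    c = coeffs[np.abs(coeffs) > 1].astype(np.int64)   # drop -1, 0, 1
    if c.size == 0:
        return 1
    # h(x): fraction of coefficients divisible by x without remainder
    h = np.array([np.count_nonzero(c % x == 0) for x in range(1, x_max + 1)],
                 dtype=np.float64) / c.size
    return int(np.nonzero(h >= T)[0].max() + 1)       # max x with h(x) >= T
```

Since h(1) = 1 by construction, the returned estimate is always defined; the per-block estimates are then aggregated by taking the most frequent non-unit value.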



3.1.3 Search for discrepancies between the DCT coefficients and the quantization matrix
The next step is to construct two auxiliary images I1 and I2:
$$I_1(x, y) = \sum_{\psi \in \{0,\ldots,7\}^2} \left[ D_{com}(\psi_w + 8x, \psi_h + 8y) \notin Qm(\hat{q}_\psi, 1.5) \right]; \quad (5)$$
$$\Psi = \left\{ \psi \,\middle|\, \exists (x, y): D_{com}(\psi_w + 8x, \psi_h + 8y) \notin Qm(\hat{q}_\psi, 1.5) \right\}; \quad (6)$$
$$I_2(x, y) = \sum_{\psi \in \Psi} \left[ D_{com}(\psi_w + 8x, \psi_h + 8y) \notin \{-1, 0, 1\} \right], \quad (7)$$
where [·] is the Iverson bracket. In other words, the brighter a pixel in image I1, the more frequencies there are for which the corresponding 8 × 8 block does not fit the assumed DCT coefficient distribution model. DCT coefficients equal to -1, 0, or 1 fall into the sets Qm(q, 1.5) for any q, and their high concentration in a certain area can be interpreted as low texture of the original image there. Consequently, falsified low-texture areas will have lower brightness in image I1. The normalizing image I2 is introduced to compensate for the effect of low texture on the detection of falsified areas. The frequency set Ψ is constructed so that each frequency in it contributes to at least one pixel of I1. I1 pixel values corresponding to blocks from B_sat are set to 0, and the corresponding I2 pixel values to 1.
Let us construct the image $I_3(x, y) = I_1(x, y) / (I_2(x, y) + 1)$, where unity is added in the denominator to avoid division by zero. Thus, an image with values in the range [0, 1] is obtained. Values in I3 can be interpreted as the fraction of frequencies whose corresponding DCT coefficients are inconsistent with the model.
Image I4 is obtained from image I3 by applying morphological operations (the maximum of openings with wing sizes (1, 0) and (0, 1), and a closing with wing size (1, 1)) to eliminate false features arising from random deviations of DCT coefficients from the model and from errors in the quantization matrix estimation. To avoid accidental false triggering, all I4 values are set to zero if the maximum value in I4 is less than 0.5. The focus area is defined as the set of 8 × 8 blocks whose corresponding pixel values in I4 are greater than 0.25. Contrast-enhanced images I1, I2, I3, I4 for the modified image tampered/0_FRA_TS_N_2.png from the CMID [16] are presented in Figure 5.
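A condensed sketch of the whole focus-area stage under Eqs. (5)-(7) (Python; in_Qm is the earlier sketch, and morph_filter stands for the opening/closing combination described above — both assumed helpers):

```python
import numpy as np

def focus_area(D_com: np.ndarray, q_hat: np.ndarray, B_sat: np.ndarray) -> np.ndarray:
    """Binary mask of 8x8 blocks inconsistent with the JPEG model.
    D_com: rounded DCT image; q_hat: 8x8 estimated steps; B_sat: block mask."""
    H, W = D_com.shape[0] // 8, D_com.shape[1] // 8
    I1 = np.zeros((H, W))
    I2 = np.zeros((H, W))
    mismatched = np.zeros((8, 8), dtype=bool)
    for pw in range(8):
        for ph in range(8):
            c = D_com[pw::8, ph::8]
            bad = ~np.vectorize(in_Qm)(c, q_hat[pw, ph], 1.5)   # Eq. 5
            I1 += bad
            mismatched[pw, ph] = bad.any()                      # Eq. 6
    for pw in range(8):
        for ph in range(8):
            if mismatched[pw, ph]:
                I2 += np.abs(D_com[pw::8, ph::8]) > 1           # Eq. 7
    I1[B_sat] = 0
    I2[B_sat] = 1
    I3 = I1 / (I2 + 1)
    I4 = morph_filter(I3)          # assumed opening/closing helper
    if I4.max() < 0.5:             # suppress accidental triggering
        I4[:] = 0
    return I4 > 0.25               # focus-area blocks
```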

Figure 5. The input image with the manipulation is shown in A; up-scaled computed images I1, I2, I3, I4 are depicted in B, C, D, E. The brightnesses of the two modified zones become equal in I3.

3.2 Finding the source and target regions


The second stage is to find the source and target regions given a known focus area (Fig. 6B). We take the upper-left connected component A of the focus area; then, in the smallest axis-aligned rectangle containing A, we select a central square with side 20 pixels and denote it A_t (Fig. 6C). Such a square has a high probability of lying completely within the target region, and its small size keeps the remaining steps of the algorithm computationally simple.
3.2.1 Approximate localization of the source region
When inspecting the dataset [16], it was observed that the target region is obtained by translating the source region and transforming the pixel values. Consider any two corresponding pixels in the target and source regions, and let i_t and i_s be their brightnesses, respectively; we assume that they are related as follows:
$$i_t = a \cdot i_s + b, \quad (8)$$
where a, b are parameters fixed for the given image. Accordingly, we move to the brightness image I and then, to have b = 0 in the model, to
$$G = |\nabla_1 I|, \text{ where } \nabla_1 \text{ denotes convolution with the kernel } \begin{pmatrix} +1 & 0 \\ 0 & -1 \end{pmatrix}. \quad (9)$$
It is worth noting that the model may no longer be valid for G at the edge pixels of the source and target regions.
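A sketch of the gradient image of Eq. (9), assuming SciPy (the function name is ours):

```python
import numpy as np
from scipy.ndimage import convolve

def grad_image(I: np.ndarray) -> np.ndarray:
    """G = |I * k| with kernel k = [[+1, 0], [0, -1]]; differencing along the
    diagonal cancels the additive offset b of the model i_t = a*i_s + b."""
    kernel = np.array([[1.0, 0.0],
                       [0.0, -1.0]])
    return np.abs(convolve(I.astype(np.float64), kernel))
```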



Next, we take the region in image G corresponding to A_t and find the closest non-coinciding region of the same size in G, in the sense of the normalized residual of the least squares method:
$$S(T, X) = \begin{cases} 1 - \dfrac{\left(\sum_i T_i X_i\right)^2}{\sum_i T_i^2 \sum_i X_i^2}, & \sum_i T_i^2 \sum_i X_i^2 \neq 0; \\ 1, & \text{otherwise}, \end{cases} \quad (10)$$
where the index i runs over the corresponding positions in regions T and X, and T_i and X_i are the pixel brightnesses at position i of regions T and X, respectively. Let us denote the found region by A_s.
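An exhaustive-search sketch of this step (Python; the names are ours, and any speed-ups the authors may use are not reflected):

```python
import numpy as np

def residual(T: np.ndarray, X: np.ndarray) -> float:
    """Normalized least-squares residual S(T, X) of Eq. 10."""
    denom = (T * T).sum() * (X * X).sum()
    if denom == 0.0:
        return 1.0
    return 1.0 - (T * X).sum() ** 2 / denom

def find_source(G: np.ndarray, ty: int, tx: int, size: int = 20):
    """Find the window of G closest to the window at (ty, tx) in S-residual."""
    T = G[ty:ty + size, tx:tx + size]
    best, best_pos = np.inf, None
    for y in range(G.shape[0] - size + 1):
        for x in range(G.shape[1] - size + 1):
            if (y, x) == (ty, tx):       # skip the coinciding window
                continue
            s = residual(T, G[y:y + size, x:x + size])
            if s < best:
                best, best_pos = s, (y, x)
    return best_pos
```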
3.2.2 Region edges refinement
Extend A_t and A_s symmetrically in all directions so that their size becomes 80 × 80 (Fig. 6C). Let us take the element-wise ratio of the regions of image G + ε corresponding to A_t and A_s, and denote it A_t/s (ε is necessary to avoid division by 0). Then, if A_s has been found correctly and model (8) holds, A_t/s will contain the constant |a| in the part corresponding to the source and target regions (possibly excluding edge pixels), so we will search A_t/s for a region with a nearly constant value.
To do this, we calculate $A_{d(t/s)} = |\nabla_1 A_{t/s}|$. In the area where A_t/s is almost constant, A_{d(t/s)} should be close to zero. However, if A_t/s itself contains an area close to zero, the corresponding area in A_{d(t/s)} will also be close to zero regardless of the model, and therefore we propose adding a penalty to A_{d(t/s)} for this. Let us denote the resulting auxiliary image with the added penalty by H (Fig. 6E):
$$H(x, y) = \min\!\left(A_{d(t/s)}(x, y), 2.5\right) + \left(2.5 - 2.5 \cdot \frac{\min\!\left(A_{t/s}(x, y), 0.5\right)}{0.5}\right), \quad (11)$$
where the upper limit of 2.5 on A_{d(t/s)} is set for more stable behavior (Fig. 6D).
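A sketch of H under Eq. (11), reusing grad_image from the previous sketch (eps is an assumed small constant; whether ε is added to both windows or only the denominator is our reading):

```python
import numpy as np

def penalty_image(G_t: np.ndarray, G_s: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """H of Eq. 11: small where the ratio A_t/s is constant and clearly non-zero."""
    A_ts = (G_t + eps) / (G_s + eps)       # element-wise ratio of the windows
    A_dts = grad_image(A_ts)               # |nabla_1| of the ratio
    return np.minimum(A_dts, 2.5) + (2.5 - 2.5 * np.minimum(A_ts, 0.5) / 0.5)
```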
To determine the relative position of the source and target regions within A_t and A_s, we find a rectangle r̂ in H (Fig. 6E) that minimizes the intra-class variance:
$$\hat{r} = \operatorname*{argmin}_{r} \left\{ \left( \frac{\sum_{u \in r} H(u)^2}{\sum_{u \in r} 1} - \left( \frac{\sum_{u \in r} H(u)}{\sum_{u \in r} 1} \right)^{\!2} \right) \frac{\sum_{u \in r} 1}{\sum_{u \in H} 1} + \left( \frac{\sum_{u \notin r} H(u)^2}{\sum_{u \notin r} 1} - \left( \frac{\sum_{u \notin r} H(u)}{\sum_{u \notin r} 1} \right)^{\!2} \right) \frac{\sum_{u \notin r} 1}{\sum_{u \in H} 1} \;\middle|\; \frac{\sum_{u \in r} H(u)}{\sum_{u \in r} 1} \leq \frac{\sum_{u \notin r} H(u)}{\sum_{u \notin r} 1} \right\}. \quad (12)$$
We extend r̂ by 2 pixels in each direction to compensate for the ∇1 operations (Fig. 6E). Then we obtain the absolute positions of the found source and target regions from the absolute positions of A_t, A_s and the found rectangle.
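A brute-force sketch of this rectangle search, using summed-area tables so each candidate is evaluated in O(1) (Python; names and search ranges are our assumptions):

```python
import numpy as np

def best_rectangle(H: np.ndarray):
    """Rectangle minimizing the weighted intra-class variance of Eq. 12,
    subject to mean(inside) <= mean(outside)."""
    n = H.size
    S = np.pad(H, ((1, 0), (1, 0))).cumsum(0).cumsum(1)        # sums
    S2 = np.pad(H * H, ((1, 0), (1, 0))).cumsum(0).cumsum(1)   # sums of squares
    tot, tot2 = S[-1, -1], S2[-1, -1]
    best, best_r = np.inf, None
    rows, cols = H.shape
    for y0 in range(rows):
        for y1 in range(y0 + 1, rows + 1):
            for x0 in range(cols):
                for x1 in range(x0 + 1, cols + 1):
                    k = (y1 - y0) * (x1 - x0)
                    if k == n:
                        continue                     # complement must be non-empty
                    s = S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]
                    s2 = S2[y1, x1] - S2[y0, x1] - S2[y1, x0] + S2[y0, x0]
                    if s / k > (tot - s) / (n - k):  # constraint of Eq. 12
                        continue
                    var_in = s2 / k - (s / k) ** 2
                    var_out = (tot2 - s2) / (n - k) - ((tot - s) / (n - k)) ** 2
                    crit = (var_in * k + var_out * (n - k)) / n
                    if crit < best:
                        best, best_r = crit, (y0, x0, y1, x1)
    return best_r                                    # (y0, x0, y1, x1), half-open
```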

Figure 6. Elements of the algorithm operation for tampered/0_FRA_TS_N_2.png from the CMID set [16] (see Fig. 2). A is the area of the source image. B is the area of the binary mask, where the focus area is white. C shows A_s (green) and A_t (red) before and after extension. D is the image A_{d(t/s)} limited from above at 2.5; E is image H with r̂ before and after expansion.

4. RESULTS
The CMID set is dedicated to two tasks: detecting the presence of falsification (image-level task) and localizing modified regions (pixel-level task). The following quality indicators are provided in the tables: True Positive Rate (TPR), False Positive Rate (FPR), Matthews Correlation Coefficient (MCC), False Discovery Rate (FDR), and F1 score (F1).
To calculate the TPR, FPR, and MCC for our method in the image-level task, we mark the images of the CMID set [16] from the folder ‘tampered’ as positive and the images from the folder ‘ref’ as negative. Our algorithm marks an image as manipulated (positive) if the focus area is not empty; otherwise, it marks the image as original (negative). We obtain TPR = 0.9922, FPR = 0, MCC = 0.9848. The image-level indicators for methods [9, 17, 18, 19] from the table in work [16], supplemented by the indicators of our method, are presented in Table 1 (columns labeled “image”).
In the CMID set, the folder ‘gt’ contains three-channel mask images corresponding to the images from the folder ‘tampered’. In the green channel of images from ‘gt’, values of 255 indicate pixels of the source region (the remaining values in the channel are 0); in the red channel, pixels of the target region are indicated, but not only with values of 255 (at the edges of the regions, values range from 0 to 255). The blue channel contains zeros. To calculate the TPR, FDR, F1, and MCC for our method in the pixel-level task, we convert the mask images from ‘gt’ into single-channel binarized ones as follows: if the value in the red or green channel is greater than 0.5, the result is 1, otherwise 0. Pixels with value 1 in the new single-channel image serve as positive tags for the corresponding pixels of the corresponding image from ‘tampered’; pixels with value 0 serve as negative tags. The algorithm computes masks of the same format from the input images from ‘tampered’. The pixel values of these masks are taken as the detector's responses for the corresponding image pixels (0 for negative, 1 for positive). Next, we calculate the quality indicators for the task of classifying the set of pixels (pooled from all images in the folder ‘tampered’) tagged positive and negative. We obtain TPR = 0.9234, FDR = 0.0209, F1 = 0.9504, MCC = 0.9507. The pixel-level indicators for methods [9, 17, 18, 19] from the table in work [16], supplemented by the indicators of our method, are presented in Table 1 (columns labeled “pixel”).
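For reference, a sketch of the pooled pixel-level metric computation described above (Python; mask loading and binarization are assumed to be done by the caller):

```python
import numpy as np

def pixel_metrics(pred_masks, gt_masks):
    """Pooled TPR, FDR, F1 and MCC over boolean masks from all images."""
    tp = fp = fn = tn = 0
    for pred, gt in zip(pred_masks, gt_masks):   # one boolean pair per image
        tp += np.count_nonzero(pred & gt)
        fp += np.count_nonzero(pred & ~gt)
        fn += np.count_nonzero(~pred & gt)
        tn += np.count_nonzero(~pred & ~gt)
    tpr = tp / (tp + fn)
    fdr = fp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return tpr, fdr, f1, mcc
```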

Table 1. Results of testing the proposed algorithm and comparison with others.

Method              | TPR ↑ (image) | FPR ↓ (image) | MCC ↑ (image) | TPR ↑ (pixel) | FDR ↓ (pixel) | F1 ↑ (pixel) | MCC ↑ (pixel)
--------------------|---------------|---------------|---------------|---------------|---------------|--------------|--------------
SURF [9]            | 0.7919        | 0.7697        | 0.0236        | 0.2155        | 0.9792        | 0.0378       | 0.0606
SIFT [9]            | 0.9676        | 0.9145        | 0.1104        | 0.6004        | 0.9610        | 0.0731       | 0.1471
BusterNet [17]      | 0.1601        | 0.1607        | -0.0006       | 0.0016        | 0.9979        | 0.0018       | 0.0000
FE-CMFD [18]        | 0.0246        | 0.0066        | 0.0561        | 0.0341        | 0.3114        | 0.0650       | 0.1530
SIFT-LDM [19]       | 0.7917        | 0.0197        | 0.6847        | 0.2555        | 0.0541        | 0.4024       | 0.4912
Proposed algorithm  | 0.9922        | 0.0000        | 0.9848        | 0.9234        | 0.0209        | 0.9504       | 0.9507

The performance indicators of our method exceed the best values achieved by methods [9, 17, 18, 19] in both tasks. In the image-level task, our MCC = 0.9848 versus MCC = 0.6847 for the method holding second place in this indicator [19]. In the pixel-level task the situation is similar: our MCC = 0.9507 versus MCC = 0.4912 for second place [19]. Interpreting the TPR and FPR in the image-level task, we can state that our method successfully detects more than 99% of copy-move manipulations similar to those in the CMID [16] and does not give false positives.

CONCLUSION
In this paper, we proposed a method for detecting and localizing copy-move manipulations on the CMID set [16]. The method uses properties of the JPEG format and assumptions about the model transforming the source region into the target region, which makes the algorithm robust to the large number of SGOs and the small modification areas that occur in CMID images [16]. The stability of the method is confirmed by experiments. In future work, we plan to make the method resistant to JPEG re-compression by using a different strategy for highlighting the focus area.

REFERENCES

[1] Zheng, Lilei, Ying Zhang, and Vrizlynn LL Thing. "A survey on image tampering and its detection in real-world
photos." Journal of Visual Communication and Image Representation 58 (2019): 380-399.
[2] Thakur, Rahul, and Rajesh Rohilla. "Recent advances in digital image manipulation detection techniques: A brief
review." Forensic science international 312 (2020): 110311.
[3] Garfinkel, S. L. "Digital forensics research: The next 10 years." Digital Investigation 7 (2010): S64-S73.
[4] Meena, Kunj Bihari, and Vipin Tyagi. "Image forgery detection: survey and future directions." Data, Engineering and
Applications: Volume 2 (2019): 163-194.
[5] Castillo Camacho, Ivan, and Kai Wang. "A comprehensive review of deep-learning-based methods for image
forensics." Journal of Imaging 7.4 (2021): 69.
[6] B. Wen, Y. Zhu, et al., COVERAGE: a novel database for copy-move forgery detection, in: Proceedings of ICIP, IEEE,
2016, pp. 161–165.
[7] Faria Hossain, Asim Gul, Rameez Raja, Tasos Dagiuklas, Chathura Galkandage, January 13, 2022, "Forgery Image
Dataset", IEEE Dataport, doi: https://dx.doi.org/10.21227/9dmj-yn86.



[8] J. Dong, W. Wang, T. Tan, CASIA image tampering detection evaluation database, in: Proceedings of ChinaSIP, IEEE,
2013, pp. 422–426.
[9] Christlein, Vincent, et al. "An evaluation of popular copy-move forgery detection approaches." IEEE Transactions on
Information Forensics and Security 7.6 (2012): 1841-1854.
[10] Kumar, Shubham, Soumya Mukherjee, and Arup Kumar Pal. "An improved reduced feature-based copy-move forgery
detection technique." Multimedia Tools and Applications 82.1 (2023): 1431-1456.
[11] Koul, Saboor, et al. "An efficient approach for copy-move image forgery detection using convolution neural network."
Multimedia Tools and Applications 81.8 (2022): 11259-11277.
[12] Abramova, Svetlana. "Detecting copy–move forgeries in scanned text documents." Electronic Imaging 2016.8 (2016):
1-9.
[13] Tralic, D.; Zupancic, I.; Grgic, S.; Grgic, M. CoMoFoD—New database for copy-move forgery detection. In
Proceedings of the International Symposium on Electronics in Marine, Zadar, Croatia, 25–27 September 2013; pp.
49–54.
[14] Novozamsky, Adam, Babak Mahdian, and Stanislav Saic. "IMD2020: A large-scale annotated dataset tailored for
detecting manipulated images." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer
Vision Workshops. 2020.
[15] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, G. Serra, A SIFT-based forensic method for copy–move attack
detection and transformation recovery, IEEE Trans. Inf. Forensics Secur. 6 (3) (2011) 1099–1110.
[16] Mahfoudi, Gaël, et al. "CMID: A New Dataset for Copy-Move Forgeries on ID Documents." 2021 IEEE International
Conference on Image Processing (ICIP). IEEE, 2021.
[17] Wu, Yue, Wael Abd-Almageed, and Prem Natarajan. "BusterNet: Detecting copy-move image forgery with
source/target localization." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[18] Li, Yuanman, and Jiantao Zhou. "Fast and effective image copy-move forgery detection via hierarchical feature point
matching." IEEE Transactions on Information Forensics and Security 14.5 (2018): 1307-1322.
[19] Mahfoudi, Gaël, et al. "Copy and move forgery detection using sift and local color dissimilarity maps." 2019 IEEE
Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2019.
[20] "JPEG File Interchange Format (JFIF)." https://www.ecma-international.org/wp-content/uploads/ECMA_TR-
98_1st_edition_june_2009.pdf. Accessed: 2023-10-03.
[21] Fan, Zhigang, and Ricardo L. De Queiroz. "Identification of bitmap compression history: JPEG detection and
quantizer estimation." IEEE Transactions on Image Processing 12.2 (2003): 230-235.
[22] Lam, Edmund Y., and Joseph W. Goodman. "A mathematical analysis of the DCT coefficient distributions for
images." IEEE Transactions on Image Processing 9.10 (2000): 1661-1666.
[23] Yang, Jianquan, et al. "Estimating JPEG compression history of bitmaps based on factor histogram." Digital Signal
Processing 41 (2015): 90-97.
