1 Introduction
Intellectual property theft is a growing problem in Europe and the USA. According to the FBI, the annual cost incurred by the US economy because of intellectual property theft lies between 225 and 600
billion USD.
This is an enormous number. In fact, the situation is considered to be so severe that in 2023, the US House of Representatives formed a bipartisan select committee to “investigate and submit policy recommendations” on the topic.
The FORGE [
8] framework addresses this problem by suggesting that when an inventor creates a document containing intellectual property, a process should automatically trigger the creation of
n fake versions of the document (e.g.,
\(n=49\) or
\(n=99\) or even more). An adversary, faced with
\((n+1)\) versions of a given document, will need to sift through all
\((n+1)\) documents to find the real one. This causes delays, imposes costs, increases frustration, and increases uncertainty for the adversary. Even if he finds the real document, he may not be sure that it is real. And if he identifies a fake document as the real thing and chooses to act on that basis, then the impact on his operations may be devastating. Several successor efforts build upon FORGE in various ways [
1,
21,
26,
49]. Nevertheless, all of these efforts seek to generate
n fake documents by replacing words or concepts (e.g., n-grams) in an original document with new words/concepts so the fake is “close enough” to the original to be believable, yet “far enough” to be wrong.
However, these past efforts fail to take advantage of existing knowledge about the process of human language understanding. Deception involves inducing a state of incorrect belief in a user’s mind, and therefore there may be significant value in looking to psycholinguistics for inspiration, specifically at how people get from a sentence to the information that is ultimately derived from it.
For some time, a measurement known as
surprisal has been one of the main workhorses in cognitive science research on sentence understanding [
19,
20,
41]. Surprisal measures the quantity of information conveyed by a probabilistic event x,
\(-\log p(x)\), where the base of the logarithm is often 2 so the quantity of information is measured in bits. Notice that the familiar definition of Shannon entropy for a random variable
X,
\(H(X) = \sum _x -p(x)\log p(x)\), can be written as
\(H(X) = E\left[-\log p(x)\right]\); that is, entropy is the expected value of surprisal. In psycholinguistics, the distribution most commonly of interest is the conditional probability of a word given its preceding context,
\(p(w_i|w_1 \ldots w_{i-1})\). When this probability is high, the next word
\(w_i\) conveys little information given its context and its surprisal is low. Conversely, the surprisal for
\(w_i\) is high when it is unexpected given the context.
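To make this concrete, per-token surprisal can be estimated with any autoregressive language model that exposes next-word probabilities. The sketch below is our illustration (it is not taken from any of the systems discussed in this article); it uses a pretrained GPT-2 model from the Hugging Face transformers library and reports surprisal in bits.

```python
# Minimal sketch: word-in-context surprisal, -log2 p(w_i | w_1 ... w_{i-1}),
# estimated with a pretrained GPT-2 model. Illustration only.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(text: str):
    """Return (token, surprisal-in-bits) pairs for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    pairs = []
    for i in range(1, ids.size(1)):
        lp = log_probs[0, i - 1, ids[0, i]].item()   # log p(w_i | preceding context)
        pairs.append((tokenizer.decode(ids[0, i]), -lp / math.log(2)))
    return pairs

for tok, s in token_surprisals("How many of each animal did Moses take on the Ark?"):
    print(f"{tok!r}: {s:.2f} bits")
```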
Psycholinguists and neurolinguists often adopt a
linking hypothesis [
14,
44], in which surprisal is connected with processing effort [
5,
11,
38,
42,
45]. For example, word-level surprisal estimated using corpus-based probabilities has been correlated with indices of human sentence processing effort using fMRI [
4,
6,
22,
39], MEG [
7,
13], EEG [
31,
34,
37], reading times [
11,
18,
40,
42,
48], and pupillometric measurements [
2]. In another influential line of work, Nobel Laureate Daniel Kahneman [
25] drew a connection between effort and attention: In a system where mental energy is a limited resource, he observes, the terms “exert effort” and “pay attention” can to a great extent be considered synonymous (p. 8).
Putting these ideas together leads to our working hypothesis related to deception. We know that a word’s surprisal in context is connected to cognitive effort in processing it, and we know that effort is connected to attention. Therefore, it is plausible to surmise that there is a relationship between surprisal and attention. Adding the last piece of the puzzle, there is an obvious connection between attention and deception—the common idiom “slipping something under the radar” encapsulates the idea that deception is not deception if it is noticed, i.e., if it rises to a conscious level of attention. We therefore conjecture that optimizing success at deception should involve surprisal as a crucial variable to manipulate.
In this article, we explore this hypothesis by introducing methodology for
Surprisal-based Fake (
SbFAKE for short) document generation. In particular, we develop objective functions involving surprisal, such as substituting words that minimize surprisal, in optimization-based fake document generation (cf. References [
1,
8]).
To our knowledge, we are the first to introduce methods inspired by research on human sentence understanding in service of improving the generation of documents that will deceive adversaries. In addition to that core idea, our principal contributions are the following: First, we contribute a bilevel optimization program that takes an original document and specifies how to substitute words in the original document with replacements (using the concept of surprisal) so
n fake versions of the document are generated. Second, because bilevel optimization is computationally challenging, we develop methods to scale this computation by showing how a single-level linear program can do the job in an equivalent manner. Finally, we use a prototype implementation to conduct an extensive set of experiments with human subjects, looking at the ability to detect deceptive documents in two domains, computer science and chemistry.
Our experiments: (i) identified the best parameters under which to run
SbFAKE, (ii) showed that
SbFAKE under these best parameters beat out the WE-FORGE system [
1] (which had previously been shown to beat FORGE [
8]), (iii) yielded interesting results about the surprisingness of words being replaced in the original document and the surprisingness of the replacement words, and (iv) demonstrated interesting relationships between the amount of time spent by subjects reading documents, their attentiveness, and their ability to detect the real document. In short,
SbFAKE beats the state-of-the-art along many dimensions and also reveals new insights about how words’ surprisal is linked to their ability to help deceive human users.
The organization of this article is as follows: Section
2 describes prior work on this topic. Section
3 provides a bird’s-eye view of the architecture of the
SbFAKE framework. Section
4 shows how we combine the psycholinguistic concept of surprisal and concepts from operations research to set up a bilevel optimization problem to solve the problem of finding
n fake versions of an original document. Because solving bilevel optimization problems is hard, Section
5 shows how to convert these bilevel optimization problems into a single-level linear programming problem. Section
6 contains details of the prototype implementation of the
SbFAKE system, along with the results of experiments conducted under appropriate IRB authorization.
2 Related Work
The goal of reducing IP theft has been in the minds of cybersecurity researchers for years. While much of this effort has gone into traditional security instruments (e.g., firewalls to keep intruders out [
9], encryption to protect secret data [
47], network and/or transaction monitoring [
27]) to identify malicious activity, recent work has focused on generating fake versions of documents to deter IP theft.
The use of fakes to deceive an adversary is not new. Reference [
43] was one of the first to propose the use of honeypots to identify insiders within an organization who are stealing sensitive data. Honey files [
51] created files with attractive sounding names such as
passwords.doc so attackers would be drawn toward those files—if a user touched those files, then s/he would be assumed to be malicious. Reference [
46] proposed using fake DNS information and HTML comments to lead attackers astray. Reference [
35] provides a comprehensive survey of honeypots in the literature, but conspicuously says little about work involving natural language processing.
Separately from this work, there is growing interest in combating data breaches through the use of fake data [
10,
15,
33] and intentionally falsified information [
30]. Because those efforts focus on relational databases (usually only tables), we do not describe them in detail here.
Concurrently with efforts to combat data breaches with fake data, there has been a noticeable increase in research on the idea of generating
n fake versions of a real, technical document to deter IP theft. Unlike honeypots, the idea here is not to identify attackers but to impose costs on them once they steal a tranche of documents from a compromised system, even if the victim does not know the system has been breached. FORGE [
8] was the first system to propose this idea. It extracted concepts from a network, built a multilevel concept graph to find words to replace in the original document, and then used an ontology to find appropriate replacement words.
FORGE suffered from several issues: First, it required a good ontology for the domain of documents being protected from IP theft, but such ontologies were not always available. Second, by choosing a word to replace in a first phase (without considering the quality of the potential replacements for that word), it often forced suboptimal choices in the second phase where the replacements were chosen. The WE-FORGE system [
1] improved upon FORGE by eliminating both of these flaws. Rather than using an ontology, WE-FORGE automatically associated an embedding vector [
28] with each word or concept in an original document as well as in some underlying background corpus. All of these word embeddings (and hence the words themselves) were then run through a clustering algorithm [
50] to generate clusters. Replacements for a word were selected from the cluster containing the word. Word-replacement pairs were selected simultaneously by solving an optimization problem. WE-FORGE was shown to outperform FORGE in its ability to deceive experts by generating high-quality documents.
Subsequent work focused on the fact that technical documents can be multimodal, e.g., they may contain tables or images or diagrams or equations/formulas. Probabilistic logic graphs [
21] provided a single framework based on graphs to represent knowledge expressed via such multimodal structures. Reference [
49] proposed a mechanism to generate fake equations that were sufficiently close to the real equation to be credible, yet sufficiently different to be wrong.
4 Generating Fake Documents
At this stage, each occurrence of a token in a document has an associated surprisal score. However, the same token may appear in different parts of the document with different surprisal scores. We can set the surprisal score of a token in a given document to be any aggregate (e.g., mean, median) of the set of surprisal scores of the token in the document.
For example, consider the sentences (S1) and (S2), both taken from a US patent [
23].
Sentence S1
Thus, for context, from 0.8 to 1.2 mol of catalyst can be employed per mole of chloroacetyl chloride.
The surprisal scores of “chloroacetyl” and “chloride” in S1 are 22.39 and 16.19, respectively.
Sentence S2
There is a need for a technically simple and highly efficient method for preparing 2,5-dimethylphenyl acetic acid.
The surprisal score of “dimethylphenyl” in S2 is 19.28.
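A minimal sketch of the aggregation step described above (the per-occurrence scores other than the 16.19 for “chloride” in S1 are hypothetical, chosen only to illustrate the computation):

```python
# Aggregate per-occurrence surprisal scores into a single score per token.
from statistics import mean, median

occurrence_scores = {
    "chloride": [16.19, 15.42, 17.03],   # one entry per occurrence of the token in the document
    "catalyst": [9.84, 10.37],
}

token_score = {tok: mean(vals) for tok, vals in occurrence_scores.items()}  # or median(vals)
print(token_score)
```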
Consider the case of the “How many of each animal did Moses take on the Ark?” example. This deception succeeded because the word “Moses” is not too surprising in the context of the word “Ark.” Had we instead posed this question as “How many of each animal did Oprah take on the Ark?” then the deception would have been much less effective because people do not mentally associate Oprah with Biblical themes. As a consequence, we hypothesized that the surprisal score of a word being replaced must not be too high. We therefore provided a surprisal score interval for each word being replaced so very surprising words are not replaced. At the same time, we do not want words being replaced to be very unsurprising—replacing such words may not have the desired effect of making the generated fake “wrong enough.”
In sentence S1 above, for instance, we might consider replacing the word “chloroacetyl” with “pyridyl” and the word “chloride” with “iodide” to yield sentence S3 below.
Sentence S3
Thus, for context, from 0.8 to 1.2 mol of catalyst can be employed per mole of pyridyl iodide.
The surprisal scores of “pyridyl” and “iodide” are 20.54 and 17.31, respectively. These surprisal scores are similar to those of the words “chloroacetyl” and “chloride” that they, respectively, replace.
The reader will notice that (S3) is intuitively a credible replacement for (S1). The rest of this section will show how such fakes can be generated automatically (and, in fact, S3 is generated automatically from S1 by one of the algorithms we discuss below).
4.1 An Explicit Bilevel Surprisal Optimization Problem
Given a surprisal score interval, we will try to replace some tokens whose score is in this interval to generate a set of fake documents. We need to find the interval that will give us a set of documents whose deceptive capabilities are the highest. In addition, we need to consider which token we should use to replace a token in the original document to make a fake document look real, while at the same time ensuring that the resulting set of fake documents look different from one another.
We can formulate this as a bilevel optimization problem: Determine the interval first and then determine the best set of fake documents. In this section, we will formulate this problem with several objective functions for each approach. Figure
2 shows the natural formulation of our fake document generation problem as a bilevel optimization problem. The notation used and its intuitive explanation are as follows:
—
\(\mathcal {C}\) is the set of tokens in the original document d.
—
\(S(c)\) is the surprisal score of token
c (aggregated across all occurrences of
c in the original document). Throughout this section, we slightly abuse notation and allow
\(S(c)\) to be any one of a number of different types of surprisal scores such as those in References [
4,
12,
18], as well as variants thereof such as the SSIDF score proposed earlier in this article. In fact, as new methods of evaluating surprisal are developed, those, too, can be used as potential definitions of
\(S(c)\)—and everything in this section will continue to work with those new definitions.
—
\(I=[I_1,I_2]\) is the surprisal score interval for each token in \(\mathcal {C}\) of the original document.
—
\(\mathcal {C}(c)\) is the set of tokens from the corpus of documents that can be used as replacements for c—these are tokens in the same cluster as c. In addition, the tokens in \(\mathcal {C}(c)\) should have a surprisal score in the range \(I^{\prime }=[I^{\prime }_1,I^{\prime }_2]\).
—
\(\mathcal {F}\) is a set of fake documents. For now, we can think of each of these as a copy of the original document—once we solve the optimization problem in Figure
2, we will make the appropriate replacements to generate the fake documents.
—
\(\text{fake}(\mathcal {F})\) is a function to measure deceptive capabilities. We will give specific examples of this function later.
—
\(X_{f,c,c^{\prime }}=1\) says that token c in document f is replaced by \(c^{\prime }\), and \(X_{f,c,c^{\prime }}=0\) says that c in document f is not replaced by \(c^{\prime }\). \(dist(c,c^{\prime })\) is the distance between c and \(c^{\prime }\).
Equation (3) ensures that at least \(\alpha\) tokens are replaced. This partially ensures that the fakes are sufficiently different from the original. Equation (4) ensures that the distance between each fake document and the original document is not less than a threshold \(\beta\). This ensures that the fakes are sufficiently different from the original. Equation (5) says that we never replace concept c in document f with \(c^{\prime }\) if the surprisal score of c is in the desired interval \([I_1,I_2]\) but the surprisal score of \(c^{\prime }\) is less than \(I_1\). Intuitively, this means that \(c^{\prime }\) has a surprisal score outside the desired interval. Later in this article, we will experimentally identify the desired interval that leads to maximal deception. Equation (6) is similar, but this time the surprisal score of \(c^{\prime }\) exceeds \(I_2\). Equation (7) says that we never replace concept c in document f with \(c^{\prime }\) if the surprisal score of c is outside the desired interval \([I_1,I_2]\) by being smaller than \(I_1\). Again, this is because once we have discovered an interval for the surprisal score that maximizes deception, we want the surprisal score of replaced concepts to be within that desired interval. Similarly, Equation (8) says that we never replace concept c in document f with \(c^{\prime }\) if the surprisal score of c is outside the desired interval \([I_1,I_2]\) by being greater than \(I_2\). The final constraint, Equation (9), says that all the variables \(X_{f,c,c^{\prime }}\) are binary, i.e., set to either 0 (indicating that c is not replaced by \(c^{\prime }\) in document f) or 1 (indicating that c is replaced by \(c^{\prime }\) in document f). This is because for each potential fake file f, a concept c is either replaced by concept \(c^{\prime }\) or not.
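The constraint structure just described can be made concrete with a small sketch. The code below is our simplified illustration, not the formulation in Figure 2: it fixes the surprisal intervals in advance (the full bilevel problem also optimizes over the interval), uses toy data with hypothetical scores and distances, interprets \(\alpha\) as a fraction of the tokens (consistent with the value \(\alpha = 0.8\) used in Section 6), adds an at-most-one-replacement assumption of our own, and uses a placeholder objective in place of \(\text{fake}(\mathcal {F})\).

```python
import cvxpy as cp

# Toy data -- all values below are hypothetical and for illustration only.
tokens = ["chloroacetyl", "chloride", "catalyst"]                    # C: tokens of the original document
cands  = {"chloroacetyl": ["pyridyl", "benzoyl"],                     # C(c): candidate replacements (same cluster)
          "chloride": ["iodide", "bromide"],
          "catalyst": ["reagent", "initiator"]}
S      = {"chloroacetyl": 0.82, "chloride": 0.61, "catalyst": 0.35,   # normalized surprisal scores
          "pyridyl": 0.75, "benzoyl": 0.70, "iodide": 0.64,
          "bromide": 0.58, "reagent": 0.40, "initiator": 0.52}
dist   = {(c, r): 0.3 for c in tokens for r in cands[c]}              # placeholder embedding distances
tfidf  = {c: 1.0 for c in tokens}                                     # placeholder TFIDF weights
n_fakes, alpha, beta = 3, 0.6, 0.4
I1, I2, J1, J2 = 0.3, 1.0, 0.3, 1.0                                   # surprisal intervals, fixed here

# X[f, c, r] = 1 iff token c is replaced by candidate r in fake document f.
X = {(f, c, r): cp.Variable(boolean=True)
     for f in range(n_fakes) for c in tokens for r in cands[c]}

cons = []
for f in range(n_fakes):
    # Each token gets at most one replacement per fake (our assumption).
    for c in tokens:
        cons.append(sum(X[f, c, r] for r in cands[c]) <= 1)
    # Analogue of Eq. (3): at least a fraction alpha of the tokens are replaced.
    cons.append(sum(X[f, c, r] for c in tokens for r in cands[c]) >= alpha * len(tokens))
    # Analogue of Eq. (4): weighted distance from the original is at least beta.
    cons.append(sum(dist[c, r] * tfidf[c] * X[f, c, r]
                    for c in tokens for r in cands[c]) >= beta)
    # Analogue of Eqs. (5)-(8): forbid replacements whose scores fall outside the intervals.
    for c in tokens:
        for r in cands[c]:
            if not (I1 <= S[c] <= I2 and J1 <= S[r] <= J2):
                cons.append(X[f, c, r] == 0)

# Placeholder objective (the paper's fake(F) objective is discussed in Section 4.2):
# prefer low-distance, i.e., more credible, replacements.
obj = cp.Minimize(sum(dist[c, r] * tfidf[c] * X[f, c, r]
                      for f in range(n_fakes) for c in tokens for r in cands[c]))
cp.Problem(obj, cons).solve(solver=cp.GUROBI)   # Section 6 uses Gurobi; any MILP-capable solver works
```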
4.2 The Objective Function
The objective function uses the term
\(\text{fake}(\mathcal {F})\), whose definition has not been provided thus far. This expression could be defined in many ways. One way is as follows:
This formulation ensures that a fake document is close to the original document and that two fake documents are different from each other. This is important, because we do not want all the fake documents to look exactly the same.
\(TFIDF(c)\) is the well-known product of the term frequency of
c and the inverse document frequency of
c, and
\(\lambda \gt 0\) is a constant.
We can also consider a variant of the above formulation. Suppose
\(\tau \in [0,\max _{f\in \mathcal {F}}\sum _{ c\in \mathcal {C},c^{\prime }\in \mathcal {C}(c)}dist(c,c^{\prime })TFIDF(c)]\). We can now try to make sure that the distance between each fake document and the original document is close to
\(\tau\) by updating the above objective function as follows:
This formulation allows us to adjust the distance between each fake document and the original document to make sure that they are not too close and not too far. When
\(\tau =0\), Equation (
11) is equivalent to Equation (
10).
4.3 An Implicit Bilevel Surprisal Optimization Problem
One problem with using the explicit bilevel surprisal optimization problem formulation above is that the number of variables can be enormous. If the corpus of documents from the domain of interest is large, then the set of tokens in that corpus can be enormous—and the explicit formulation has one variable
\(X_{f,c,c^{\prime }}\) for
each concept
\(c^{\prime }\) in the corpus. If we consider a case where we want to generate say 100 fake documents, then there are 1,000 tokens in the original document and 100k tokens in the domain corpus, then we may have
\(10^{11}\) variables
\(X_{f,c,c^{\prime }}\), which would likely cause any real-world application to struggle. In this section, we try to replace the
\(X_{f,c,c^{\prime }}\) variables with
\(X_{f,c}\) variables to tame this complexity. To do this, we use the “Implicit” Bilevel Optimization Problem in Figure
3.
The notation in Figure
3 is similar to that in Figure
2, but there are some differences worth noting.
—
\(X_{f,c}=1\) means that token c is replaced in fake f (without specifying which concept in \(\mathcal {C}(c)\) is the replacement), while \(X_{f,c}=0\) means that c is not replaced in fake f.
—
Equation (14) ensures that at least \(\alpha\) tokens are replaced, while Equation (15) ensures that the distance between each fake document and the original document is not less than a threshold \(\beta\). \(\overline{dist}(c)\) is the average distance between c and each element in \(\mathcal {C}(c)\).
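For concreteness, the following is a minimal sketch (ours, with hypothetical numbers) of the implicit formulation's variables and the two constraints just described; note that it needs only \(|\mathcal {F}|\times |\mathcal {C}|\) binary variables rather than one per \((f,c,c^{\prime })\) triple.

```python
import cvxpy as cp

# Toy data (hypothetical values, for illustration only).
tokens   = ["chloroacetyl", "chloride", "catalyst"]                       # C
avg_dist = {"chloroacetyl": 0.31, "chloride": 0.28, "catalyst": 0.35}     # mean distance from c to its cluster
tfidf    = {"chloroacetyl": 1.4, "chloride": 0.9, "catalyst": 0.7}        # TFIDF weights
n_fakes, alpha, beta = 3, 0.6, 0.4

# X[f, c] = 1 iff token c is replaced (by some member of its cluster) in fake f.
X = {(f, c): cp.Variable(boolean=True) for f in range(n_fakes) for c in tokens}

cons = []
for f in range(n_fakes):
    cons.append(sum(X[f, c] for c in tokens) >= alpha * len(tokens))              # analogue of Eq. (14)
    cons.append(sum(avg_dist[c] * tfidf[c] * X[f, c] for c in tokens) >= beta)    # analogue of Eq. (15)

# Feasibility check only; the fake(F) objective over these variables is discussed next in the text.
cp.Problem(cp.Minimize(0), cons).solve(solver=cp.GUROBI)
```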
As in the case of the explicit approach, we may now formulate our objective function
\(\text{fake}(\mathcal {F})\) in terms of the
\(X_{f,c}\) variables. One way to do this is as given below:
which ensures that a fake document is close to the original document and two fake documents are distinguishable.
\(TFIDF(c)\) is the product of the term frequency of
c and the inverse document frequency of
c, and
\(\lambda \gt 0\) is a constant. And as in the case of the explicit approach, if
\(\tau \in [0,\max _{f\in \mathcal {F}}\sum _{c\in \mathcal {C}}\overline{dist}(c)TFIDF(c)]\), then we can define a version of the objective function, which ensures that the distance between each fake document and the original document is close to
\(\tau\).
which allows us to adjust the distance between each fake and the original document to make sure that they are neither too close nor too far. When
\(\tau =0\), Equation (
20) is equivalent to Equation (
19).
5 Transforming the Bilevel Program into A Linear Program
The replacement of the \(X_{f,c,c^{\prime }}\) variables in the explicit bilevel surprisal optimization problem with \(X_{f,c}\) variables provides one potential performance improvement. However, bilevel optimization problems are still hard to solve. The goal of this section is to transform the bilevel programs into linear programs.
In Equations (1)–(9) of the explicit bilevel surprisal optimization problem, we want to use
\(c^{\prime }\) to replace
c so
\(I_1\le S(c)\le I_2\) and
\(I^{\prime }_1\le S(c^{\prime })\le I^{\prime }_2\), i.e., \(S(c)-I_1\ge 0\), \(I_2-S(c)\ge 0\), \(S(c^{\prime })-I^{\prime }_1\ge 0\), and \(I^{\prime }_2-S(c^{\prime })\ge 0\).
Recall, as stated earlier, that
\(S(c)\) could be any arbitrary but fixed surprisal function such as those proposed in References [
4,
12,
18], the SSIDF metric proposed earlier in this article, or, in fact, any function that is hypothesized to measure the quality of word replacements. Note that for any pair of tokens
\(c,c^{\prime }\) not satisfying the above constraints,
\(X_{f,c,c^{\prime }}\) should be 0, indicating that we cannot use
\(c^{\prime }\) to replace
c. To make this happen, we introduce a binary variable
\(I_{c,c^{\prime },i}\) for any
\(c,c^{\prime }\) for each of the above four constraints, which satisfies:
which assumes that the surprisal score values have been normalized to
\([0,1]\). For example, we have
\(I_{c,c^{\prime },1}=1\) if
\(S(c)-I_1\ge 0\), otherwise,
\(I_{c,c^{\prime },1}=0\). Then, we have:
which will make sure that
\(X_{f,c,c^{\prime }}=0\) if
c or
\(c^{\prime }\) is not in the given interval. Therefore, we can transform the bilevel program in Equations (1)–(9) into the linear program shown in Figure
4.
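To make the indicator device concrete, the following sketch (ours; the exact constraints in Figure 4 may differ) shows one standard encoding for a single token c and the lower interval endpoint \(I_1\), assuming scores normalized to \([0,1]\) as above. The full model uses one such indicator per \((c,c^{\prime })\) pair and per interval endpoint, i.e., four per pair.

```python
import cvxpy as cp

S_c = 0.61                              # normalized surprisal of token c (hypothetical value)
I1  = cp.Variable(nonneg=True)          # lower endpoint of the surprisal interval (a decision variable)
ind = cp.Variable(boolean=True)         # plays the role of I_{c,c',1}: 1 exactly when S(c) - I1 >= 0
X   = cp.Variable(boolean=True)         # 1 if token c is replaced (by some fixed c') in some fake f

cons = [I1 <= 1,
        S_c - I1 >= ind - 1,            # if S(c) < I1, this forces ind = 0
        S_c - I1 <= ind,                # if S(c) > I1, this forces ind = 1
        X <= ind]                       # the replacement is allowed only when ind = 1

# With the interval's lower endpoint above S(c), the token may not be replaced.
prob = cp.Problem(cp.Maximize(X), cons + [I1 == 0.7])
prob.solve(solver=cp.GUROBI)
print(X.value)                          # 0.0
```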
Similarly, the implicit bilevel program in Equations (12)–(18) can be transformed into the linear surprisal program shown in Figure
5.
The objective function contains absolute values of variables, e.g., the objective function
\(\text{fake}(\mathcal {F})\) in Equation (
10) has
\(\sum _{c\in \mathcal {C},f,f^{\prime }\in \mathcal {F}}|\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f,c,c^{\prime }}-\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f^{\prime },c,c^{\prime }}|\). For each absolute value term, we use the following linear constraints to represent the value
\(\sum _{c\in \mathcal {C},f,f^{\prime }\in \mathcal {F}}Y_{f,f^{\prime },c}\)=
\(\sum _{c\in \mathcal {C},f,f^{\prime }\in \mathcal {F}}|\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f,c,c^{\prime }}-\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f^{\prime },c,c^{\prime }}|\):
where
Y is a large constant (at least the upper bound of
\(Y_{f,f^{\prime },c}\), i.e.,
\(|\mathcal {C}|^2\)),
\(B_{f,f^{\prime },c}=0\) means
\(Y_{f,f^{\prime },c}=\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f,c,c^{\prime }}-\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f^{\prime },c,c^{\prime }}\ge 0\), and
\(B_{f,f^{\prime },c}=1\) means
\(Y_{f,f^{\prime },c}=-(\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f,c,c^{\prime }}-\sum _{c^{\prime }\in \mathcal {C}(c)}X_{f^{\prime },c,c^{\prime }})\ge 0\). Then, the objective function
Then, the objective function \(\text{fake}(\mathcal {F})\) in Equation (
10) becomes the following linear objective function:
with constraints in Equations (
60)–(
65). Similarly, the objective function
\(\text{fake}(\mathcal {F})\) in Equation (
11) becomes the following linear objective function:
with the help of constraints in Equations (
60)–(
65) and the following constraints:
Similarly, the objective function
\(\text{fake}(\mathcal {F})\) in Equation (
19) becomes the following linear objective function:
with the following constraints:
The objective function
\(\text{fake}(\mathcal {F})\) in Equation (
20) becomes the following linear objective function:
with the help of the above constraints for
\(Y_{f,f^{\prime },c}\) and the following constraints for
\(Y_f\):
6 Implementation and Experiments
The implementation of the SbFAKE system consisted of 980 lines of code written in Python. In addition, the implementation used multiple Python libraries, including the Natural Language Toolkit, scikit-learn, SciPy, and PyTorch. The optimization problems were set up using the CVXPY library and solved with GUROBI. We calculated surprisal scores via two different models: an LSTM-based model and a fine-tuned GPT-2 model. All experiments were run on a machine with an i7-10750H CPU and 64 GB RAM.
Our experiments fall into three broad categories: (i) finding the best parameter settings for the SbFAKE system, (ii) comparing SbFAKE against the state-of-the-art existing system, and (iii) assessing specific hypotheses regarding the SbFAKE system. We describe each of these below.
6.1 Finding the Best Parameters for SbFAKE
To find the best hyperparameters for SbFAKE, we conducted an experiment (with 40 human subjects) at the sentence level for both the Computer Science and the Chemistry domains and varied the following parameters:
(1)
Type of optimization: We compared the explicit and implicit linear program formulations described in Section
4.
(2)
Surprisal score versus SSIDF: We also varied the type of score used, considering both surprisal scores and SSIDF (described in Section
3.2). Recall that SSIDF is a combination of surprisal score and TFIDF.
(3)
Model that was used to compute surprisal score: We used both LSTM-based and fine-tuned GPT-based models to calculate the surprisal score [
29].
(4)
Score interval: The fake document generation problem requires specifying an interval \([I_1,I_2]\). In our experiments, we compared five different combinations of \(I_1\) and \(I_2\): [0.0, 1.0]; [0.3, 0.7]; [0.0, 0.4]; [0.6, 1.0]; and [0.4, 0.6].
(5)
Others: Based on some quick initial testing, we set the parameters
\(\alpha = 0.8\),
\(\beta = 0.8\), and
\(\lambda = 0.6\) throughout our experiments. These parameters were introduced in Equations (3), (15), and (10), respectively.
We present the best 16 parameter settings in Table
2.
The rows correspond to different parameter settings (in descending order of the quality of performance as measured by the last column). The columns in Table
2 can be interpreted as follows: The “Choice 1” column shows the percentage of subjects whose first choice was the result generated by a given parameter setting (or the original document, which corresponds to the “Authentic version” row). Similarly, the “Choice 2” and “Choice 3” columns show the percentage of subjects whose second and third choices, respectively, were the result generated by a given parameter setting (or the original document). The last column shows the percentage of subjects whose top 3 choices included the sentences generated by the given parameter setting.
Results. We see that the best parameter setting (based on any one of the top 3) used the explicit algorithm with the LSTM-based surprisal model and a surprisal interval of
\([0.0,1.0]\) (i.e., the surprisal interval was unconstrained). This corresponds to the first row in Table
2. However, if we consider only the top choice, then the second row, which has the same parameters except that the surprisal interval is
\([0.6,1.0]\), is best.
The results show that out of the 16 combinations examined, 11 beat out the “authentic” version, i.e., these combinations were chosen more frequently as one of the three preferred choices. These results are statistically significant via a Student’s t-test (\(p \lt 0.001\)). Based on these findings, we recommend the following three parameter combinations, which performed best:
(1)
explicit, lstm, surprisal, [0.0, 1.0] (we refer to this setting later in the article as SbFAKE[E]-01). The “01” denotes the surprisal interval used.
(2)
explicit, lstm, surprisal, [0.6, 1.0] (we refer to this setting later in the article as SbFAKE[E]-61). The “61” denotes the surprisal interval used.
(3)
implicit, lstm, surprisal, [0.6, 1.0] (we refer to this setting later in the article as SbFAKE[I]).
Each of these three parameter settings beats the next best one in a statistically significant manner (\(p \lt 0.001\) in all cases).
6.2 Comparison with WE-FORGE
This section describes an experiment that used the three best parameter settings discovered above to compare the deception ability of
SbFAKE with WE-FORGE [
1], which represents the best prior work on fake document generation. WE-FORGE was already shown to be superior to the earlier FORGE [
8] system and hence
SbFAKE was only compared with WE-FORGE.
A total of 40 participants were assigned to two surveys, each focusing on documents from either the field of Chemistry or Computer Science.
Each survey comprised 10 tasks, with each task presenting participants with 11 documents. Among these documents, only one was original, while the remaining 10 were generated using five distinct algorithms, two fabricated documents from each (the five algorithms being the three best settings of
SbFAKE identified above and two versions of WE-FORGE). The primary objective of each task was for participants to identify and select the three documents they believed to be authentic. Because each algorithm contributed the same number of fakes (two) to every task, for a total of 10 fakes plus one original per task, the five algorithms could be compared head-to-head on an equal footing; the results of these head-to-head comparisons are reported in this section.
Furthermore, as part of this experiment, we recorded the duration of time participants spent on the task. In addition, we included four questions related to the documents they were working with to assess their attentiveness to the task.
Table
3 shows the results on the Chemistry dataset, while Table
4 shows the results on the Computer Science data.
Results on Chemistry documents. Table
3 shows the results. Columns “Choice 1,” “Choice 2,” “Choice 3,” and “Total” are interpreted in the same way as in Table
2. We see that if only the top choice is considered, WE-FORGE[I] does the best and beats all versions of
SbFAKE. But if we consider the top 3 as was done in the WE-FORGE paper [
1] and as we did when tuning hyperparameters in Table
2, then
SbFAKE[E] is the top performer.
Results on Computer Science documents. Table
4 shows the results on the CS dataset. Here, we see that all three versions of
SbFAKE outperform both versions of WE-FORGE. In fact, WE-FORGE[I] (which was the top performer w.r.t. Chemistry documents) is the worst performer. All these results were statistically significant at the
\(p \lt 0.001\) level.
Results on combined documents. Table
5 presents the aggregated outcomes for documents from both datasets. We see that all versions of
SbFAKE beat all versions of WE-FORGE and that
SbFAKE[E] is the best and the surprisal intervals of [0,1] and [0.6,1] seem to give the same performance. The two pairwise comparisons of WE-FORGE with
SbFAKE[E] and
SbFAKE[I] show that WE-FORGE exhibits weaker performance in a statistically significant manner (
\(p \lt 10^{-10}\)). The claim that
SbFAKE[E] is better than
SbFAKE[I] is also statistically significant (
\(p \lt 10^{-10}\)).
6.3 Impact of Surprisal Scores
We also explored three hypotheses relating to surprisal scores.
We first define the deception rate of a fake document as the percentage of human subjects who included that document among their top 3 choices for the real document. For instance, if a fake document d was chosen to be in the top 3 by 10 out of 40 human subjects in the two experiments combined, then the deception rate of that document is 10/40 = 25%. A higher deception rate for a document indicates that the document has higher efficacy in achieving deception.
Choosing to Replace Low Surprisal Score Words Achieves Higher Deception. Suppose d is the original document and f is a fake version of it. We defined the average surprisal score \(AS(f,d)\) to be the average, over all words in d that were replaced to generate f, of each such word’s average per-occurrence surprisal score in d. Our hypothesis was that as \(AS(f,d)\) increased, the deception rate would go down.
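A minimal sketch of this nested average (the scores other than the 22.39 and 16.19 reported earlier are hypothetical):

```python
# Compute AS(f, d): average, over the words of d replaced to create f, of each
# word's average per-occurrence surprisal in d. Values are illustrative only.
from statistics import mean

occurrence_scores_in_d = {
    "chloroacetyl": [22.39],
    "chloride": [16.19, 15.40],
    "catalyst": [9.80, 10.10, 9.50],
}
replaced_in_f = ["chloroacetyl", "chloride"]   # words of d replaced to generate f

AS = mean(mean(occurrence_scores_in_d[w]) for w in replaced_in_f)
print(round(AS, 2))   # 19.09
```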
Figure
6 shows the result. We see that an increase in
\(AS(f,d)\) yields an increase in deception up to a point (surprisal score = 15), but any further increase causes the deception rate to go down. This suggests that deception is not achieved when the words chosen for replacement are very unsurprising, but it is not achieved either when the words chosen to be replaced are too surprising. The “sweet spot” is somewhere in the middle, with surprisal scores around 15.
Choosing Low Surprisal Score Words as the Replacements Achieves Higher Deception. This time, our hypothesis was that as the average surprisal score of the words chosen to be replacements increased (once they were substituted into the original document to generate a fake), the deception rate would go down. In other words, this time around, we looked at the replacement words (not the words being replaced) after substitution to create the fake.
Figure
7 shows the result. We see that an increase in the average surprisal score of replacement words
after replacement to generate the fake yields an increase in deception up to a point (surprisal score somewhere between 16 and 18), but any further increase causes the deception rate to go down. This suggests that deception is not achieved when the words chosen as replacements are very unsurprising, but it is not achieved either when the words chosen as replacements are too surprising. The “sweet spot” is somewhere in the middle, with surprisal scores around 16–18.
Choosing Replacement Words whose Surprisal Score Is About the Same as the Replaced Words Achieves Higher Deception. This time, our hypothesis was that as the
difference between the average surprisal score of the words chosen to be replaced and that of the words chosen as their replacements increased (once the latter were substituted into the original document to generate a fake), the deception rate would go down.
Figure
8 shows the result. We see that the hypothesis is in fact more or less valid. As long as the difference between the average surprisal score of the replaced words and the replacement words is less than or equal to two, the deception rate is high. After that, the deception rate drops precipitously.
6.4 Additional Hypotheses
We also examined some other hypotheses. Our experiments allowed us to monitor the amount of time the human subjects were active when responding to a task. We also periodically included attention checks.
Influence of time. Our hypothesis was that the more time an individual invests in the experiment, the higher the likelihood of successfully finding the original document (in the subject’s top 3 choices).
Figure
9 shows the time spent by the subject on the experiment (x-axis) against the probability that one of his/her top-3 answers was correct. To our surprise, the hypothesis is not completely true. We see an interesting pattern. Subjects who spent too little or too much time were the ones who were most deceived. In contrast, subjects who spent a “modest” amount of time were more successful in finding the original document. Those who spent relatively little time may not have been as diligent as they should have been and perhaps were quickly fooled. Those who spent a lot of time were perhaps unable to uncover the real document despite strong effort. Those who spent a modest amount of time (two–three hours for computer science, over four hours for chemistry) had perhaps achieved the right tradeoff in terms of time spent vs. accuracy in finding the real document.
Influence of attentiveness. We performed four attention checks while subjects were participating in the experiment: simple questions about the data were posed that an attentive subject should have answered correctly. We hypothesized that subjects who answered these attention-check queries correctly were more likely to find the real document than those who did not.
Figure
10 shows the results. We see that the results are inconclusive, regardless of the dataset considered.
6.4.1 Score Type Hypothesis.
One of the most notable aspects that captured our attention was the contrast between surprisal scores and SSIDF scores. As indicated in Table
6, surprisal scores produced significantly better outcomes in terms of deceiving adversaries: fakes generated using surprisal scores were chosen more frequently than those generated using SSIDF scores.
6.4.2 Model Type Hypothesis.
Regarding the model used to compute surprisal scores, fakes generated using the LSTM-based model were preferred over those generated using the fine-tuned GPT-2 model, at almost the same rate as in the preceding comparison (Table
7).
6.4.3 Interval Hypothesis.
Regarding the surprisal intervals used in the optimization, the best option is not to exclude any words on the basis of their surprisal scores, i.e., to use the unconstrained interval (Table
8).
6.4.4 Optimization Type Hypothesis.
As for the optimization itself, it was shown that explicit optimization outperforms implicit optimization (Table
9).
6.4.5 Runtime Assessments.
Another experiment compared the runtime of the explicit optimization with that of the implicit optimization. The overall runtime is highly dependent on the original text, but the optimization itself is around 40 times faster in the case of implicit optimization than in the case of explicit optimization.
We also checked the absolute runtimes of the implicit optimization. For this, we ran two experiments. In the first, we computed the runtime of the implicit algorithm when we varied the number of words in the original document (Figure
11(a)) and also varied the number of fakes desired. In the second, we computed the runtime of the implicit algorithm when we varied the number of
nouns in the original document (Figure
11(b)) and also varied the number of fakes desired. Recall that in
SbFAKE, we only replace nouns. The times reported are quite fast, just 1–2 seconds in the scenarios considered. But these times do not include the time to compute the surprisal scores.
Surprisal score computation is relatively expensive, as shown in Figure
12, and can take several minutes for long documents.
Regardless, the total time to generate a number of fake versions of even long documents is a matter of minutes—maybe 10 minutes maximum—which is deemed acceptable by potential users we have spoken with. In the worst case, the person tasked with creating the fakes merely pushes a “generate fakes” button and goes for a short coffee break or the like.
7 Conclusion and Future Work
There is now growing interest in combating intellectual property theft through the use of fake data [
10,
15,
33] and intentionally falsified information [
30]. One instance of this general trend is the use of fake documents, in which we create a set of fake versions that are similar enough to the original to be credible, yet sufficiently different to be wrong [
3,
17,
24,
36]. We develop a novel system called
SbFAKE that makes use of surprisal, an information-theoretic measure widely used in research on human sentence understanding. Bringing together surprisal, natural language processing, and optimization, we show that our novel method for generating a set of fake versions of a document significantly outperforms the state-of-the-art fake document system, WE-FORGE [
1].
The SbFAKE framework and method we have developed is, to the best of our knowledge, the first framework that combines the power of psychology with NLP and optimization methods to generate fake documents that are more likely to deceive an IP thief than the best prior work. More generally, we have established that drawing insights from cognitive science research on human language processing improves computational methods for deceptive text generation, suggesting the potential for cognitive science research in human language understanding to contribute to novel solutions for other problems. Our technical approach shows how any surprisal scoring method can be used to automatically construct a bilevel optimization problem whose solution corresponds to a desired set of n fake documents. We further show that this bilevel optimization problem can be transformed into an equivalent single-level optimization problem that can be solved efficiently to generate effective fakes. This approach has the potential to be improved over the coming years (e.g., by developing more sophisticated surprisal models) and to be made more scalable, so batches of fake documents can be produced (i.e., n fake documents generated simultaneously for each of a batch of k original documents).
A first direction for future work is to develop specialized data structures to scale the speed with which we can compute the surprisal scores of all words (or all nouns) within the original document. Currently, the problem of finding the surprisal score of all the words in the document dominates the runtime of the SbFAKE framework.
A second direction for future work is to look at the relationship between surprisal scores and large language models (LLMs). In an LLM, we compute the probability that a concept c (or a word) will appear in a given context. Text is generated by inserting high probability words in a given context, e.g., the partial sentence “He enjoyed his trip to New...” would likely be completed with the word “York” when an LLM is used. One interesting hypothesis is that modern LLMs might be better than other models at suggesting plausible replacement words because they use a much larger context than other kinds of language models and encode much more world knowledge because they are trained on vastly more text. In particular, this suggests that we can leverage LLMs to better identify replacement concepts.
We plan to study these issues in further detail in future work.