
An Experimental Comparison of a Document Deception Detection Policy using Real and Artificial Deception

Published: 01 August 2012

Abstract

Developing policies to screen documents for deception is often hampered by the cost of data collection and the inability to evaluate policy alternatives due to lack of data. To lower data collection costs and increase the amount of data, artificially generated deception data can be used, but the impact of using such data is not well understood. This article studies the impact of artificially generated deception on document screening policies. The deception and truth data were collected from financial aid applications, a document-centric area with limited resources for screening. Real deception was augmented with artificial data generated by noise and deception generation models. Using the real and artificially generated data, we designed an innovative experiment with deception type and deception rate as factors, and harmonic mean and cost as outcome variables. We used two budget models (fixed and variable) typically employed by financial aid offices to measure the cost of noncompliance in financial aid applications. The analysis included an evaluation of a common policy for deception screening using both fixed and varying screening rates. The results of the experiment provided evidence that the screening policy performed similarly with real and artificial deception, suggesting the possibility of using artificially generated deception to reduce the costs associated with obtaining training data.
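The "harmonic mean" outcome variable corresponds to the standard F-measure from information retrieval: the harmonic mean of a screening policy's precision and recall on the deceptive class. A minimal sketch of the computation follows; the function name and the example counts are illustrative and not taken from the article:

```python
def f_measure(tp, fp, fn):
    """F-measure: harmonic mean of precision and recall for a binary
    screening decision (tp = deceptive documents correctly flagged,
    fp = truthful documents flagged, fn = deceptive documents missed)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 8 deceptive applications flagged correctly, 2 false alarms,
# 2 deceptive applications missed -> precision 0.8, recall 0.8, F = 0.8
score = f_measure(8, 2, 2)
```

Because the harmonic mean penalizes imbalance, a policy that flags many applications but misses most deception (or vice versa) scores poorly, which is why it is a natural companion to a cost-based outcome variable.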



      Published In

      Journal of Data and Information Quality  Volume 3, Issue 3
      August 2012
      53 pages
      ISSN:1936-1955
      EISSN:1936-1963
      DOI:10.1145/2287714

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 August 2012
      Accepted: 01 April 2012
      Revised: 01 December 2011
      Received: 01 June 2010
      Published in JDIQ Volume 3, Issue 3

      Author Tags

      1. Screening policy
      2. artificial deception
      3. boosted deception
      4. data generation model
      5. deception
      6. natural deception
      7. noise

      Qualifiers

      • Research-article
      • Research
      • Refereed
