
An Experimental Comparison of a Document Deception Detection Policy using Real and Artificial Deception

Published: 01 August 2012

Abstract

Developing policies to screen documents for deception is often hampered by the cost of data collection and the inability to evaluate policy alternatives due to lack of data. To lower data collection costs and increase the amount of data, artificially generated deception data can be used, but the impact of using such data is not well understood. This article studies the impact of artificially generated deception on document screening policies. The deception and truth data were collected from financial aid applications, a document-centric area with limited resources for screening. Real deception was augmented with artificial data generated by noise and deception generation models. Using the real and artificially generated data, we designed an innovative experiment with deception type and deception rate as factors, and harmonic mean and cost as outcome variables. We used two budget models (fixed and variable) typically employed by financial aid offices to measure the cost of noncompliance in financial aid applications. The analysis included an evaluation of a common policy for deception screening using both fixed and varying screening rates. The results of the experiment provided evidence that the screening policy performed similarly with real and artificial deception, suggesting the possibility of using artificially generated deception to reduce the costs associated with obtaining training data.
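The "harmonic mean" outcome variable corresponds to the standard F-measure from information retrieval: the harmonic mean of a screening policy's precision and recall on the deceptive class. A minimal sketch of the computation follows; the function name and the example counts are illustrative and not taken from the article:

```python
def f_measure(tp, fp, fn):
    """F-measure: harmonic mean of precision and recall for a binary
    screening decision (tp = deceptive documents correctly flagged,
    fp = truthful documents flagged, fn = deceptive documents missed)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 8 deceptive applications flagged correctly, 2 false alarms,
# 2 deceptive applications missed -> precision 0.8, recall 0.8, F = 0.8
score = f_measure(8, 2, 2)
```

Because the harmonic mean penalizes imbalance, a policy that flags many applications but misses most deception (or vice versa) scores poorly, which is why it is a natural companion to a cost-based outcome variable.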



      Published In

      Journal of Data and Information Quality  Volume 3, Issue 3
      August 2012
      53 pages
      ISSN:1936-1955
      EISSN:1936-1963
      DOI:10.1145/2287714

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 August 2012
      Accepted: 01 April 2012
      Revised: 01 December 2011
      Received: 01 June 2010
      Published in JDIQ Volume 3, Issue 3

      Author Tags

      1. Screening policy
      2. artificial deception
      3. boosted deception
      4. data generation model
      5. deception
      6. natural deception
      7. noise

      Qualifiers

      • Research-article
      • Research
      • Refereed
