A survey on statistical methods for health care fraud detection

Health Care Management Science

Abstract

Fraud and abuse have led to significant additional expense in the health care system of the United States. This paper provides a comprehensive survey of the statistical methods applied to health care fraud detection. It focuses on classifying fraudulent behaviors; identifying the major sources and characteristics of the data on which fraud detection has been based; discussing the key steps in data preprocessing; and summarizing, categorizing, and comparing statistical fraud detection methods. Based on this survey, the paper discusses what has been lacking or under-addressed in existing research, with the aim of pinpointing future research directions.

Acknowledgement

We would like to thank the reviewers for the valuable comments that helped us significantly improve the paper.

Corresponding author

Correspondence to Jing Li.

Cite this article

Li, J., Huang, KY., Jin, J. et al. A survey on statistical methods for health care fraud detection. Health Care Manage Sci 11, 275–287 (2008). https://doi.org/10.1007/s10729-007-9045-4

  • DOI: https://doi.org/10.1007/s10729-007-9045-4
