A survey on statistical methods for health care fraud detection

Jing Li¹,
Kuei-Ying Huang²,
Jionghua Jin² &
…
Jianjun Shi²

Abstract

Fraud and abuse have added significant expense to the health care system of the United States. This paper aims to provide a comprehensive survey of the statistical methods applied to health care fraud detection. It focuses on classifying fraudulent behaviors; identifying the major sources and characteristics of the data on which fraud detection has been based; discussing the key steps in data preprocessing; and summarizing, categorizing, and comparing statistical fraud detection methods. Based on this survey, the paper discusses what has been lacking or under-addressed in existing research, with the aim of pinpointing future research directions.

Acknowledgement

We would like to thank the reviewers for the valuable comments that helped us significantly improve the paper.

Author information

Authors and Affiliations

Department of Industrial Engineering, Arizona State University, P.O. Box 875906, Tempe, AZ, 85287-5906, USA
Jing Li
Department of Industrial and Operations Engineering, University of Michigan, 1205 Beal Avenue, Ann Arbor, MI, 48109-2117, USA
Kuei-Ying Huang, Jionghua Jin & Jianjun Shi

Corresponding author

Correspondence to Jing Li.

About this article

Cite this article

Li, J., Huang, KY., Jin, J. et al. A survey on statistical methods for health care fraud detection. Health Care Manage Sci 11, 275–287 (2008). https://doi.org/10.1007/s10729-007-9045-4

Received: 29 May 2007
Accepted: 11 December 2007
Published: 10 January 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s10729-007-9045-4
