Abstract
Fraud and abuse have led to significant additional expense in the health care system of the United States. This paper aims to provide a comprehensive survey of the statistical methods applied to health care fraud detection. It focuses on four tasks: classifying fraudulent behaviors; identifying the major sources and characteristics of the data on which fraud detection has been based; discussing the key steps in data preprocessing; and summarizing, categorizing, and comparing statistical fraud detection methods. Based on this survey, we discuss what is lacking or under-addressed in existing research in order to identify directions for future research.
Acknowledgement
We would like to thank the reviewers for their valuable comments, which helped us significantly improve the paper.
About this article
Cite this article
Li, J., Huang, KY., Jin, J. et al. A survey on statistical methods for health care fraud detection. Health Care Manage Sci 11, 275–287 (2008). https://doi.org/10.1007/s10729-007-9045-4