Application of The K-Means Clustering Algorithm in Medical Claims Fraud / Abuse Detection
(2)
k=1
where i, j = records and n = number of variables
c) Comparing two medical claims
Record 1 shows sore throat, swollen neck and fractured shoulders, whereas Record 2 shows ruptured toe and broken
elbow. The system checks these ailments against the column of variables: an ailment that is present in a record is marked
with a one (1) and an ailment that is absent is marked with a zero (0), as shown in Table 4.5. For each row, the absolute
difference between the two records and its square are computed. The squares are then totalled and the corresponding
Euclidean distance is calculated. These distances are then used for clustering.
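The comparison of the two records can be sketched in Java as follows. This is a minimal illustration, not the authors' implementation: the class name ClaimDistance, its method names and the five-entry variable list are assumptions made for the example (the actual variables are those of Table 4.5), and the code needs Java 9 or later for List.of and Set.of.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class ClaimDistance {
    // Hypothetical variable list; the real one is defined by Table 4.5.
    static final List<String> VARIABLES = List.of(
        "sore throat", "swollen neck", "fractured shoulders",
        "ruptured toe", "broken elbow");

    // Marks each variable with 1 if the claim lists the ailment, else 0.
    static int[] encode(Set<String> ailments) {
        int[] x = new int[VARIABLES.size()];
        for (int k = 0; k < x.length; k++) {
            x[k] = ailments.contains(VARIABLES.get(k)) ? 1 : 0;
        }
        return x;
    }

    // Euclidean distance: square root of the summed squared differences.
    static double distance(int[] a, int[] b) {
        double total = 0;
        for (int k = 0; k < a.length; k++) {
            int diff = Math.abs(a[k] - b[k]);
            total += diff * diff;
        }
        return Math.sqrt(total);
    }

    public static void main(String[] args) {
        int[] r1 = encode(Set.of("sore throat", "swollen neck", "fractured shoulders"));
        int[] r2 = encode(Set.of("ruptured toe", "broken elbow"));
        System.out.println(Arrays.toString(r1)); // [1, 1, 1, 0, 0]
        System.out.println(Arrays.toString(r2)); // [0, 0, 0, 1, 1]
        System.out.println(distance(r1, r2));    // sqrt(5), about 2.236
    }
}
```

With this assumed variable list the two records share no ailment, so all five rows differ and the distance is the square root of 5.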
d) The clustering process and claim analysis process
Medical claim details are entered one by one. Once all of the working day's medical claim data have been entered into the
system, the Euclidean distance between each pair of claim forms is calculated, as shown in Figure 4.5. The user of the system
sets the desired maximum distance for records to belong to the same cluster. The sum claimed across all the claim forms
is computed and the average amount determined. If the amount for a given claim exceeds a set amount for that cluster,
that claim form is rejected. The same process is repeated for different set distances and the rejected claims
are listed.
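The rejection step for a single cluster can be sketched as below. The paper does not state how the "set amount" is chosen, so this example assumes it is a multiple (tolerance) of the cluster's average claim. The class name ClaimScreening, the method names, the tolerance value and the amounts are all hypothetical, and the cluster is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

class ClaimScreening {
    // Average amount claimed within one cluster (assumed non-empty).
    static double average(double[] amounts) {
        double sum = 0;
        for (double a : amounts) {
            sum += a;
        }
        return sum / amounts.length;
    }

    // Indices of claims whose amount exceeds tolerance * cluster average.
    // The use of a tolerance factor is an assumption of this sketch.
    static List<Integer> rejected(double[] amounts, double tolerance) {
        double limit = tolerance * average(amounts);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < amounts.length; i++) {
            if (amounts[i] > limit) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] amounts = {1000, 1200, 900, 5000}; // made-up amounts for one cluster
        System.out.println(average(amounts));       // 2025.0
        System.out.println(rejected(amounts, 1.5)); // [3]
    }
}
```

Here the limit is 1.5 x 2025 = 3037.5, so only the fourth claim (index 3) is flagged for rejection.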
4.4 Entry of Claim details
Figure 4.5: Entry of Claim details
4.5 Claims Analysis
The Analysis process:
The Analysis process starts when the user clicks the Analyse button shown in Figure 6.3. The program then computes
the Euclidean distance between each pair of claim forms using the Euclidean distance formula:
$$d_{i,j} = \sqrt{\sum_{k=1}^{n} |x_{i,k} - x_{j,k}|^{2}} \qquad (3)$$
where i, j = records, n = number of variables
The user of the system sets the desired maximum distance for records to belong to the same cluster. Varying the
distance also changes the number of clusters that are formed. The assumption here is that the greater the distance, the
lower the number of clusters for the given data set, and vice versa. The sum claimed across all the claim forms is computed
and the average amount determined. If the amount for a given claim exceeds the set amount for that cluster, that
claim form is rejected.
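The effect of the maximum distance on the number of clusters can be illustrated with the sketch below. The paper does not spell out how records are grouped once the distance is set, so the example assumes a simple single-link threshold grouping: any two records within the maximum distance end up in the same cluster. This is not the iterative K-Means procedure itself, and the class name ThresholdClustering, its methods and the four binary records are made up for the example.

```java
class ThresholdClustering {
    // Euclidean distance between two binary claim vectors.
    static double distance(int[] a, int[] b) {
        double total = 0;
        for (int k = 0; k < a.length; k++) {
            int diff = a[k] - b[k];
            total += diff * diff;
        }
        return Math.sqrt(total);
    }

    // Follows parent links to the representative of a group.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // shorten the path while walking up
            i = parent[i];
        }
        return i;
    }

    // Assumed grouping rule: records within maxDistance share a cluster.
    // Returns, for each record, the index of its cluster representative.
    static int[] cluster(int[][] records, double maxDistance) {
        int n = records.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (distance(records[i], records[j]) <= maxDistance) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            labels[i] = find(parent, i);
        }
        return labels;
    }

    // A record that is its own representative marks one cluster.
    static int countClusters(int[] labels) {
        int count = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == i) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] records = {{1, 1, 0, 0}, {1, 0, 0, 0}, {0, 0, 1, 1}, {0, 0, 1, 0}};
        for (double d : new double[]{0.5, 1.0, 1.5}) {
            System.out.println(d + " -> " + countClusters(cluster(records, d)));
        }
        // prints 0.5 -> 4, 1.0 -> 2, 1.5 -> 1
    }
}
```

For these four records, raising the maximum distance from 0.5 to 1.0 to 1.5 reduces the number of clusters from 4 to 2 to 1, in line with the assumption stated above.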
CLAIMS ANALYZER
package com.wakoli.claimsanalyser;
import javax.swing.*;
import javax.swing.event.*;
import java.awt.*;
import java.awt.event.*;
import java.sql.SQLException;
import com.wakoli.claimsanalyser.methods.*;
International Journal of Application or Innovation in Engineering& Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 7, July 2014 ISSN 2319 - 4847
public class ClaimsAnalyzer extends JFrame {
    // Side panel reserved for the flag icons.
    JPanel flags = new JPanel();
    JLabel[] flag = {
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif")),
        new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/flag.gif"))
    };
    JLabel logo = new JLabel(new ImageIcon("com/wakoli/claimsanalyser/support/images/eck.gif"));
    static JDBCAdapter dataModel;
    JDesktopPane desktop = new JDesktopPane();
    Login pass = new Login();
    ChangePassword change = new ChangePassword();
    boolean changed = false;
    boolean loaded = false;

    public ClaimsAnalyzer() {
        Color background = new Color(150, 160, 130);
        flags.setBackground(background);
        flags.setLayout(new GridLayout(8, 1));
        // Adding the flag icons to the side panel is disabled in this listing.
        for (int i = 0; i <= 7; i++) {
            //flags.add(flag[i]);
        }
        ContentPanel contentPane = new ContentPanel("com/wakoli/claimsanalyser/support/images/eck.gif");
        desktop.setBackground(background);
        // The Login frame is placed on the modal layer of the desktop pane.
        desktop.add(pass, JLayeredPane.MODAL_LAYER);
        getContentPane().add(flags, BorderLayout.WEST);
        getContentPane().add(desktop); // can either add the JDesktopPane or set it as the content pane
        //getContentPane().add(contentPane); // this sets a background image
        contentPane.setOpaque(false);
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        setTitle("THE CLAIMS ANALYZER");
        setSize(1024, 768);
        // A Label that is not showing makes setLocationRelativeTo centre the window on screen.
        Label c = new Label();
        setLocationRelativeTo(c);
        setUndecorated(true);
        setVisible(true);
        ChildFrame.select(pass);
        //AudioPlayer.play("com/wakoli/claimsanalyser/support/audio/wape vidonge.wav");
        // The listing in the source ends here; the rest of the class is not shown.
    }
}
Figure 4.6: Claims Analysis
Figure 4.6 shows the dialogue box for clustering and analyzing the claims. In this case the distance was set to 10 units
and the cluster had 11 records, of which 7 were approved and 4 rejected. Clicking the View Rejected button
displays a screen like the one shown in Figure 4.7.
Figure 4.7: Rejected Claims
Figure 4.7 also shows that the greater the distance, the higher the rejection rate. This is because increasing the Euclidean
distance means having more records within a cluster; hence the chances of netting more fraudulent ones are higher.
5. CONCLUSIONS
This paper shows the successful application of the K-Means clustering algorithm to medical claims records. The medical
claims were successfully clustered and the average amount claimed per cluster was computed. Claims that were far away
from the average were flagged for further scrutiny. Hence the prototype can be used to isolate and flag suspicious claims
that can subsequently be rechecked. This prototype could substantially increase the medical claim fraud detection rate, which
in turn would yield savings that cover operational costs and allow the quality of health care coverage to be improved,
justifying the investment.
Mr. Leonard W. Wakoli is a Lecturer at Kenya Methodist University (Department of Computer Science
and Business Information Systems), Nakuru Campus, with well over 10 years of experience. He is also
a Cyber Security Consultant and a PhD candidate (Business Information Systems Security) at Jaramogi
Oginga Odinga University, Bondo, Kenya. He holds an MSc in Software Engineering (JKUAT, Kenya),
a Postgraduate Diploma in Management of Information Systems (MIS) (Greenwich University, UK), a
BSc in Mathematics and Computer Science (JKUAT, Kenya) and a Diploma in Science Education (KSTC,
Kenya). He is also an Environmental Impact Assessment Auditor, a Motivational Speaker and a Certified
Public Accountant (K) finalist. His research interest is Cyber Security, and he is now working on Harnessing the
Power of Intrusion Detection Systems.
Mr. Abkul Orto is a Lecturer in the Department of Information Technology in the School of
Information Technology and Engineering, and Director of Open, Distance and eLearning, at Meru
University of Science and Technology, Kenya. He holds a Master of Science (Computer Based
Information Systems) from the University of Sunderland, United Kingdom. He previously taught at KCA
University for seven years as full-time faculty, and also at the Institute of Computer Science and
Information Technology and the IT Centre of the Jomo Kenyatta University of Agriculture and Technology,
Kenya. His research interests include Human Centred Computing, Spoken Language processing focusing on African
Languages, Information retrieval and security. He has 12 years of teaching experience.
Mr. Stephen Mageto is a Lecturer in Information Technology and Management and Chairman of
Department at Meru University of Science and Technology. He holds a Master of Studies (Information
Technology and Management) degree from Madurai Kamaraj University (2004). He has over 8 years of
teaching experience. He has a strong research interest in the impact and integration of ICT applications
in community development and strategic information systems.