International Journal of Machine Learning and Computing, Vol. 9, No. 6, December 2019

The Comparative Study on Clustering Method Using Hospital Facility Data in Jakarta District and Surrounding Areas
Yogi Wahyu Romadon and Devi Fitrianah

Abstract—The present study reports the findings of a comparative analysis conducted on health facility research (HFR) data, based on a clustering method with the K-Means and K-Medoids algorithms. The K-Means algorithm consists of four steps: specifying the centroid values, grouping data on the centroids, calculating new centroid values, and repeating the grouping and calculation until the cluster is stable (convergence). The K-Medoids algorithm consists of six steps: medoid initialization, data allocation to the nearest medoid, determination of new medoids, calculation of data distance to the medoids, calculation of the deviation, and repetition of the determination of new medoids until the deviation converges. The data utilized in this study are HFR data located in the Jakarta district and surrounding areas, from 2013 to 2018. Our results showed that the execution time of the K-Medoids algorithm outperformed the K-Means algorithm. The K-Medoids algorithm sped up the execution time and, at the same time, improved the density value between clusters (silhouette). By utilizing the clustering method, health facilities in hospitals in Jabodetabek can be categorized based on their resources.

Index Terms—Clustering, K-Means, K-Medoids, health facility research.


I. INTRODUCTION

Hospitals provide vitally important inpatient, outpatient, and emergency services [1]. Hospitals are one of the main facilities for all residents of the Jakarta district, allowing constant access to health services. There is a need to assure that high-quality health services and appropriate infrastructure exist to promote preventive, curative, and rehabilitative activities. The central government, local government, and the community assure this quality [2]. In 2016, hospital facilities and services were grouped into class A (2.42%), class B (14.11%), class C (41.25%), and class D (21.07%); 21.15% had not been grouped [1].

In accordance with the conditions described above, a system is needed to group hospitals into particular data groups (clusters) based on certain characteristics, so that users can find information regarding which hospital offers the set of doctors, facilities, and staff pertaining to the users' requirement.

Data was collected by the Ministry of Health of the Republic of Indonesia in the form of data on hospital resources obtained through health facility research (HFR), which is available to the public on the website (sirs.yankes.kemkes.go.id/rsonline). However, there is still a lack of effort to utilize the data for further analysis. Therefore, the authors perform a comparative study by conducting an analysis using the K-Means and K-Medoids algorithms and classifying hospital facilities based on the resources owned by a hospital. The data will be grouped into clusters based on certain similarities with the predetermined set of clusters. In accordance with these conditions, algorithms that are suitable for this task are K-Means and K-Medoids. These algorithms partition or group data into one or more clusters that have certain similarities, based on a predetermined cluster number. The clustering method, using the K-Means and K-Medoids algorithms, will be implemented with TensorFlow, which is an open-source software library for numerical calculation for both machine learning and deep learning [3].

Manuscript received July 19, 2019; revised October 17, 2019.
The authors are with the Department of Informatics, Faculty of Computer Science, Mercu Buana University, Jakarta 11650, Indonesia (e-mail: 41515010069@student.mercubuana.ac.id, devi.fitrianah@mercubuana.ac.id).
doi: 10.18178/ijmlc.2019.9.6.868


II. LITERATURE REVIEW

A. Clustering
One method for grouping data is clustering. In clustering, existing data are grouped based on their level of similarity: data sharing common characteristics with other data are grouped into one cluster, while data that do not share similarities are grouped into other clusters [4]. Clustering methods have been widely used in various fields, including pattern recognition, data analysis, and image processing [5]. Basically, this method has a simple set of steps: defining the distance between data elements, so that each data point is assigned to the nearest center point (called a centroid); if a data point is not close to that point, it is assigned to another [6].

B. K-Means
The K-Means algorithm is one method used to classify semi-structured or unstructured datasets. It is one of the most common and effective methods for grouping data, due to its simplicity and its ability to handle productive datasets [7]. The K-Means algorithm is generally implemented to relocate datasets into k clusters [8].

The K-Means algorithm has the following main steps: determining the value of k randomly as cluster centers; locating existing data so that they form or produce clusters with the k points as their centers (centroids); calculating new cluster centers (centroids); and repeatedly generating new



partitions and calculating new clusters until the members are stable (convergence) [9]. These stages can be explained by the following equations.

1) Determine the value k randomly as the centers of the clusters; the determination of the centers can be written as in equation (1):

    Z_1(1), Z_2(1), ..., Z_K(1)                                  (1)

where Z_1(1), Z_2(1), ..., Z_K(1) are the values of each cluster center (centroid).

2) After the cluster centers are determined, the next step is to group and distribute the data into each cluster, so that k clusters are formed, each with one of the k points as its center (centroid). The grouping and distribution can be formulated with equation (2),

    x ∈ S_j(k)  if  ||x − z_j(k)|| < ||x − z_i(k)||              (2)

where i ranges over all k existing clusters, i is not equal to j, and S_j(k) denotes the set of samples in the cluster with center z_j(k).

3) The next step, after the data have been grouped and distributed into the existing clusters, is to calculate the new cluster centers Z_j(k+1), j = 1, 2, ..., K, such that the sum of squared distances from all points in S_j(k) to the new cluster center is minimized. The new cluster center Z_j(k+1) is calculated to minimize the performance index given in equation (3):

    J_j = Σ_{x ∈ S_j(k)} ||x − z_j(k+1)||²,  j = 1, 2, ..., K    (3)

4) The last step in this algorithm: if Z_j(k+1) = Z_j(k) for j = 1, 2, ..., K, the K-Means procedure stops and does not repeat equations (2) and (3); this is called convergence.

C. K-Medoids
The K-Medoids algorithm is a clustering method that functions to break a dataset up into groups. The advantage of this method is that it is able to overcome the weakness of the K-Means algorithm with regard to its sensitivity to outliers [10].

The difference between the K-Means and the K-Medoids algorithms is that the K-Medoids algorithm uses an actual object as the medoid (cluster center) of each cluster, whereas the K-Means algorithm uses the mean as the center of the cluster.

The steps of the K-Medoids algorithm are as follows: initialize k as the number of clusters and cluster centers, allocate each data point to the nearest cluster, determine new medoids, calculate the distance between the objects and the cluster centers, repeat the determination of new medoids, calculate the total deviation, and calculate the distance of the objects to the clusters.

1) Initialize k as the number of clusters and as cluster centers (medoids), according to equations (4) and (5).

    d(x, y) = ||x − y||                                          (4)

    d = sqrt( Σ_{i=1}^{n} (x_i − y_i)² ),  i = 1, 2, 3, ..., n   (5)

where d is the Euclidean distance and x_i and y_i are the coordinates of points x and y.

2) Allocate each data point to its nearest cluster using the Euclidean distance equations shown in (4) and (5).

3) Randomly determine an object in each cluster as the new medoid.

4) Calculate the distance between each object in each cluster and the new medoid.

5) Calculate the total deviation (S) as the new total distance minus the previous total distance. If the result is S < 0, then exchange the objects with the cluster data to make the new object the new medoid, as shown in equation (6),

    E = Σ_{i=1}^{k} Σ_{p ∈ C_i} dist(p, o_i)                     (6)

where E is the absolute error deviation for each object p in the dataset, n is the number of objects to be grouped into k clusters, and o_i is the cluster center (medoid) of cluster C_i.

6) Repeat steps 3 through 5 until the medoid does not change and the clusters and each of their members are obtained (convergence).

D. Silhouette Coefficient
The silhouette coefficient is one way to perform a cluster evaluation internally. The silhouette coefficient is used to assess the quality and strength of a cluster, by assessing how precisely an object is placed into a cluster, on both synthetic and real-world data [11], [12]. The silhouette coefficient is also used to calculate the similarity between an object in one cluster and the objects in other clusters [13]. The silhouette coefficient can be determined by the following equation (7):

    s(i) = (b(i) − a(i)) / max(a(i), b(i))                       (7)

where s(i) is the silhouette of i, with a value between −1 and 1, i is an object in the cluster being evaluated, a(i) is the mean dissimilarity between i and the other points of its own cluster, and b(i) is the smallest mean dissimilarity between i and the elements of any cluster other than the cluster of i [14].

E. Rand Index
The rand index is a method for performing an external evaluation of a clustering [15]. The rand index measures the percentage of decisions that are appropriate to the obtained cluster result; it can be obtained by equation (8) [16]:

    Ri = (TP + TN) / (TP + FP + FN + TN)                         (8)

where TP (true positive) is a result where the model places two similar points in the same cluster, TN (true negative) is a result where the model places two dissimilar points into different clusters, FP (false positive) is a result where the model places two points that do not resemble each other into the same cluster, and FN (false negative) is a result where the model places similar data into different clusters.
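As a concrete illustration, the K-Means steps in equations (1)-(3) can be condensed into a short NumPy sketch. This is a minimal illustration on hypothetical toy data, not the paper's TensorFlow implementation:

```python
# Minimal K-Means per equations (1)-(3): random initial centers (1),
# nearest-center assignment (2), and mean update (3) until Z_j(k+1) = Z_j(k).
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Eq. (1): pick k data points at random as initial centers Z_1(1)..Z_K(1).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Eq. (2): x belongs to S_j(k) when ||x - z_j(k)|| is smallest.
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Eq. (3): the minimizing new center Z_j(k+1) is the cluster mean
        # (an empty cluster keeps its previous center).
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):  # convergence: Z_j(k+1) = Z_j(k)
            break
        centers = new_centers
    return labels, centers

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])
labels, centers = kmeans(X, k=2)
```

On this toy data the two well-separated pairs of points end up in different clusters regardless of which initial centers are drawn.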

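Both evaluation measures above are also available off the shelf; a sketch with scikit-learn on hypothetical toy data and labels (silhouette_score averages the s(i) of equation (7) over all points, and rand_score implements equation (8)):

```python
# Silhouette coefficient (7) and rand index (8) via scikit-learn,
# on hypothetical toy data: two well-separated groups, perfectly labeled.
import numpy as np
from sklearn.metrics import rand_score, silhouette_score

X = np.array([[1.0, 1.0], [1.1, 0.9], [8.0, 8.0], [8.2, 7.9]])
labels_pred = np.array([0, 0, 1, 1])  # clustering result
labels_true = np.array([0, 0, 1, 1])  # ground-truth grouping

sil = silhouette_score(X, labels_pred)     # mean s(i), in [-1, 1]
ri = rand_score(labels_true, labels_pred)  # (TP + TN) / all pairs
```

With identical true and predicted labels the rand index is exactly 1, and the tight, well-separated groups give a silhouette close to 1.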

F. Related Works Regarding Data Mining on Hospital Facilities
There are several related studies on grouping hospital facilities. In 2015, Bichen Zheng [17] implemented Neural Network, Random Forest, and Support Vector Machine models to classify facility data of hospitals in the United States, and readmission data to classify existing patients. The results obtained showed an average accuracy value of 78.4%, which, after the completion of the parameter-tuning process, increased to 97.3%.

Then, Abd Elrazek in 2017 [18] applied the Naïve Bayes algorithm to medical care system data, including facilities, time, and cost, in Egypt, which aimed to produce more efficient analysis than traditional statistical analysis in terms of controlling costs and improving patient care.

Next, in 2016 Magesh Vivek, Sundar Franco, and Thomas Greefin [19] applied the ANFIS method (a neuro-fuzzy system which contains both a neural network and a fuzzy system), which aimed to improve the quality of health services for tuberculosis and cardiovascular patients.


III. METHODOLOGY

In this paper, five consecutive methodological steps were taken: data collection, preprocessing, clustering with K-Means and K-Medoids, validation, and result analysis. An overview of the methodology is displayed in Fig. 1.

Fig. 1. Block diagram of methodology.

The following are the details of the methodology stages:

A. Data Collection
Data was obtained using the Web Scraping technique. This technique is one of the possible ways to collect content that is available on each page of a website [20]. The technique was implemented using the Python programming language and the Scipy module, with Spyder as the Scientific Python Development Environment, on the Ministry of Health of the Republic of Indonesia website sirs.yankes.kemkes.go.id/rsonline.

The data collected from the scraping process were HFR data gathered by the Ministry of Health of the Republic of Indonesia. They consisted of 37,518 data points, which had not been preprocessed, and contained all the resources of the hospitals throughout Indonesia from 2013 to 2018. The data consisted of various attributes, for example the number of rooms in the hospital, the number of medical personnel, and equipment data in the hospital.

The dataset used in this study included resource data on hospitals in the Jakarta district and surrounding areas (Jakarta, Bogor, Depok, Tangerang, and Bekasi), in the period from 2013 to 2018, consisting of 337 registered hospital records.

B. Preprocessing
Preprocessing is a process that aims to improve the quality of the data mining process [21], [22]. At the preprocessing stage, several processes are carried out: data cleaning, data reduction, data transformation, and data integration [23]. The data obtained from the scraping process were preprocessed using data cleaning, data reduction, and data transformation.

1) Data cleaning: data that were still incomplete or empty were supplemented with appropriate data to avoid incomplete data; empty or non-filled attributes were filled with integer data that corresponded to the attributes of the HFR data.

2) Data reduction: at this step, attributes that are not needed in the clustering process were reduced or eliminated; the dataset's 113 attributes were simplified to 3, namely beds in the hospital, medical personnel, and list of hospital equipment.

3) Data transformation: in this process, data other than integer type were converted into integer type. For example, the dataset contained string data with the values 'existing-function' and 'non-existent'; these were converted into integers by mapping 'existing-function' to 1 and 'non-existent' to 0. Another transformation used TensorFlowLearn; this process aimed to convert the integer data into data that can be understood by TensorFlow (the Tensor data type), which is tf.int32. Pseudo code for transforming data into the Tensor data type is shown in Fig. 2.

# Pseudo code for preprocessing data

def normalize(v):
    norm = tflearn.data_preprocessing.DataPreprocessing(v)
    if norm == 0:
        return v
    return v / norm

Fig. 2. Pseudo code for preprocessing data.

C. Clustering with K-Means and K-Medoids

1) K-Means
In the K-Means algorithm, the first thing to do is to determine the random value initialization of k as the cluster centers (centroids). The k value is set to four points, different from each other, which are used as the centers of the existing clusters.

After the four centers are chosen, the next step is to classify and distribute the data into each cluster, so that four clusters are formed with the k points as the center of each cluster (centroid). The grouping and distribution of the data can be displayed in pseudo code (Fig. 3), where _kmeans_clustering_model_fn is the function in tf.contrib.KMeansClustering, which serves to classify each data point into each cluster that has been defined in the class KMeansClustering.
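The string-to-integer mapping described in the data transformation step of III.B above can be sketched with pandas; the column name below is a hypothetical example attribute, not one taken from the HFR schema:

```python
# Sketch of the transformation step: map 'existing-function' -> 1 and
# 'non-existent' -> 0 so every attribute becomes an integer.
# 'ct_scanner' is a hypothetical equipment attribute used for illustration.
import pandas as pd

df = pd.DataFrame({
    'ct_scanner': ['existing-function', 'non-existent', 'existing-function'],
})
df['ct_scanner'] = df['ct_scanner'].map(
    {'existing-function': 1, 'non-existent': 0}).astype('int64')
```

After this step every column holds integers, which can then be converted to tensors (e.g. tf.int32) as the text describes.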

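The tf.contrib API that the Fig. 3 pseudo code relies on was removed in TensorFlow 2.x; the same grouping step can be sketched today with scikit-learn's KMeans. The three columns mirror the paper's attributes (beds, medical personnel, equipment), but the values are hypothetical:

```python
# K-Means grouping with scikit-learn as a stand-in for the deprecated
# tf.contrib KMeansClustering; hypothetical 3-attribute hospital records.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[120.0, 80.0, 300.0],   # beds, medical personnel, equipment
              [115.0, 75.0, 280.0],
              [20.0, 10.0, 40.0],
              [25.0, 12.0, 45.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_             # cluster index for each hospital
centers = km.cluster_centers_   # centroid of each cluster
```

The well-resourced and poorly resourced records land in different clusters, exactly the grouping the paper performs with k = 4 on the full dataset.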

"""Pseudo code for grouping each data into each """Pseudo code to allocate or distribute all data
cluaster""" into each medoid"""

def _kmeans_clustering_model_fn(features, def clustering_K_Medoids(instance, medoids=4):


labels, mode, params, config): row_count = len(medoids.index)
assert labels is None, labels distances = np.zeros((row_count),
(all_scores,model_predictions,losses,is_init dtype=float)
ialized,init_op,training_op) =
clustering_ops.KMeans(_parse_tensor_or_dict( for index, row in medoids.iterrows():
features), params.get('num_clusters'), distances[index] =
initial_clusters=params.get('training_initia euclidean_distance(instance,row)
l_clusters'), result = [0, 0]
distance_metric=params.get('distance_metric' result[0] = np.argmin(distances)
), result[1] = distances[np.argmin(
use_mini_batch=params.get('use_mini_batch'), distances)]
mini_batch_steps_per_iteration=params.get('m return result
ini_batch_steps_per_iteration'),
random_seed=params.get('random_seed'), Fig. 5. Pseudo code to allocate or distribute all data into each medoid.
kmeans_plus_plus_num_retries=params.get('kme
where clustering_K_Medoids is a function of grouping
ans_plus_plus_num_retries'))
... or allocating data in the dataset into each medoid.
Fig. 3. Pseudo code for grouping each data into each cluster.
"""Pseudo code to determine random object as new
medoids using previous function"""
The next step after the data has been grouped and
distributed into 4 existing clusters, is to calculate the new def k_medoids(tf, k, max_iterations):
cluster center. row_count = len(tf.index)
The last step in this algorithm is to determine, whether the col_count = len(tf.columns)
data move cluster after the calculation of a new cluster. The medoids = tf.sample(k)
medoids = medoids.reset_index(
process can be displayed in Fig. 4, key is the result of
drop=True)
calculation will be checked that data is experiencing
movement or not. ...

"""Pseudo code for check the data are change for index in range(0,k):
cluster or not""" membership.append([])

def transform(self, input_fn=None, # First time classify


as_iterable=False): prev_medoids = medoids.copy()
key = KMeansClustering.ALL_SCORES pred = np.zeros(
results = super(KMeansClustering, row_count).astype(int)
self).predict(
input_fn=input_fn, for index, row in tf.iterrows():
outputs=[key], tmp_array = clustering_K_Medoids(
as_iterable=as_iterable) row, prev_medoids)
if not as_iterable: pred[index] = tmp_array[0]
return results[key] membership[tmp_array[0]].append(
else: index)
return results error += tmp_array[1]

Fig. 4. Pseudo code for check the data are change cluster or not. best_error = error
best_pred = np.copy(pred)
2) K-Medoids best_medoids = medoids.copy()
...
In the K-Medoids algorithm six steps are carried out,
Fig. 6. Pseudo code to determine random object as new medoids.
which include initialization of k as many clusters and as a
cluster center (medoid). The third step was to determine an object randomly as new
The first step is to determine a medoid randomly; it shoukd medoids, in order to replace previous medoids. This step is
not overlap with the other centroids which is four medoid displayed in pseudo code in Fig. 6.
pieces. In the fourth step, after the new medoid is formed, the total
The second step is to allocate and distribute each existing deviation (S) of each object is calculated and the object
data into the medoid by using the euclidean distance equation. becomes the center of the cluster.
Before the data are distributed to each medoid, euclidean The fifth step is to redetermine the object as a new medoid,
distance must be calculated for each predetermined medoid. to calculate the distance between each of the different objects
After euclidean distance is known, data are allocated to in each cluster, and to calculate the total deviation, until the
each medoid based on the specified euclidean distance. medoid does not change (convergence), and a cluster and
Pseudo code for distributing or allocating each data into an each member are obtained. This step is displayed on the
existing medoids is shown in Fig. 5. pseudo code on Fig. 7.
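The allocation and medoid-replacement loop sketched in Figs. 5 and 6 can be condensed into a short runnable version. This is a simplified sketch of the same random-swap scheme on hypothetical toy data, not the paper's exact code:

```python
# Compact K-Medoids: random medoid initialization, nearest-medoid
# assignment, and repeated random re-selection of medoids, keeping the
# candidate set with the lowest total deviation (equation (6)).
import numpy as np

def k_medoids(X, k, max_iterations=50, seed=0):
    rng = np.random.default_rng(seed)

    def assign(idx):
        # Distance of every point to every medoid; the nearest medoid wins.
        d = np.linalg.norm(X[:, None] - X[idx][None], axis=2)
        labels = np.argmin(d, axis=1)
        return labels, d[np.arange(len(X)), labels].sum()  # total deviation

    best_idx = rng.choice(len(X), size=k, replace=False)
    best_labels, best_error = assign(best_idx)
    for _ in range(max_iterations):
        cand = rng.choice(len(X), size=k, replace=False)  # candidate medoids
        labels, error = assign(cand)
        if error < best_error:  # deviation decreased (S < 0): keep the swap
            best_idx, best_labels, best_error = cand, labels, error
    return best_labels, best_idx

X = np.array([[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [9.2, 8.8]])
labels, medoid_idx = k_medoids(X, k=2)
```

Because the medoids are always actual data points, a single extreme value cannot drag a center away from its cluster, which is the robustness-to-outliers property the paper attributes to K-Medoids.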


"""Pseudo code for calculate total deviation and


find new medoid"""

...
for index, row in tf.iterrows():
tmp_array = clustering_K_Medoids(
row, prev_medoids)
pred[index] = tmp_array[0]
membership[tmp_array[0]].append(
index)
error += tmp_array[1]
... Fig. 8. The result of Silhouette and Rand Index of K-Means and
K-Medoids.

"""stop condition when the distance deviation TABLE I: THE COMPARISON OF SILHOUETTE INDEX AND RAND INDEX
does not change """ BETWEEN K-MEANS AND K-MEDOIDS

Clusuter Result
if(error >= best_error): Value of k
Evaluation K-Means K-Medoids
iteration += 1
Silhouette Index 4 0.1817 .4461
else:
iteration = 0 Rand Index 4 0.1755 0.6214
best_error = error
best_pred = np.copy(pred) Table I shows that the K-Medoids algorithm outperform
best_medoids = medoids.copy() the K-Means algorithm is the K-Medoids. This is indicated
i += 1 by the higher values of the silhouette index and the rand index
if(iteration == max_iterations):
is_convergence = True
(close to positive 1). That is because in the K-Medoids
algorithm the data at the center of each cluster are randomly
Fig. 7. Pseudo code for calculate total deviation and find new medoid.
determined and then iterated, whereas in the center of
The last step in the K-Medoids algorithm is to repeat the K-Means algorithm, is the average result of each cluster.
determination of the new medoid until the calculation of the TABLE II: THE COMPARISON OF EXECUTION TIME BETWEEN K-MEANS AND
total deviation of the medoid does not change (convergence). K-MEDOIDS
Algorithm Value of k Execution Time (sec)
D. Validation
K-Means 4 2.96
In this study the K-Means algorithm and the K-Medoids K-Medoids 4 1.15
algorithm were compared, using the silhouette index and the
rand index. Details of the comparison are explained bellow: Table II shows that K-Medoids had better execution time
1) Silhouette index compared with the K-Means algorithm. That is because the
cluster center of K-Means is determined by calculating all the
Silhouette index is used to assess the quality and strength
averages in the data of every cluster, so it required more time
of a cluster, by assessing how precisely an object is placed
to calculate the average. Whereas in K-Medoids only
into a cluster based on synthetic data and real-world data
initialization data were used as cluster center.
[13].
After clustering was implemented with the K-Means and E. Result Analysis
the K-Medoids algorithm, the results regarding the best From the studies between the K-Means and the K-Medoids
cluster were calculated. The test was done with the same data, algorithms, the time execution of the K-Medoids algorithm is
namely HFR data and with different algorithms with the better than that of the K-Means algorithm. Furthemore, the
value of k = 4 in each algorithm. Silhouette Index was result of the cluster evaluation with the silhouette index and
calculated using best_pred (the best result of clustering the rand index showed that the K-Medoids algorithm is better
method using K-Means and K-Medoids models). All data than the K-Means algorithm.
were fitted, using StandardScaler on Scikit-Learn Python The execution time result of K-Means was 2.96 seconds
module, the best result of silhouette index within K-Means and the execution time of K-Medoids was 1.15 seconds.
and K-Medoids algorithm was 0.4461 from K-Medoids From that result K-Medoids algorithm showed better time
algorithm. execution time and outperformed the silhouette index and
2) Rand index rand index values, because the K-Medoids used robustness of
Rand index is a measure of similarity between data in one medoid (use medoid as a cluster center). Therefore, the
cluster. This validation was calculated by labels_true and K-Medoids algorithm is more robust to outliers than
labels_pred (on the K-Means and K-Medoids model), arithmetic mean. Ths the hospital resources data can be better
using Scikit-Lern Python module; the best result of the rand grouped using K-Medoids, rather than K-Means.
index was 0.6124 from the K-Medoids algorithm. Fig. 8
shows the silhouette index and rand index of the K-Means
and K-Medoids algorithms. IV. RESULT AND DISCUSSION
Table I shows more details on clustering results from the
A. K-Means Result
K-Means and K-Medoids algorithms< k refers to the result of
grouping hospitals based on resources the z owned. Experiments were carried out, using the clustering method


with the K-Means algorithm, on HFR data from 2013 to 2018 consisting of 337 listed hospitals in the Jabodetabek district. The three attributes were preprocessed first, and the data were grouped into 4 clusters (cluster 0, cluster 1, cluster 2, and cluster 3). Cluster 0 is a cluster of hospitals whose resources have excellent quality and completeness; if the data is in cluster 1, the hospital has good quality and complete resources; if a data point is included in cluster 2, then the hospital has adequate quality and resource completeness; the last one is cluster 3, which has deficient resources and quality.

The clustering model with the K-Means algorithm was created using the TensorFlow framework. The results are shown in the graphs.

Fig. 9. Data before clustering using K-Means.

Fig. 9 shows the HFR dataset from 2013 to 2018 before undergoing the clustering process with the K-Means algorithm. The dataset is visualized in a plot with green point symbols, which are the data in the HFR. The data consist of three attributes, namely beds in the hospital (the data on the number of beds in the hospital), medical personnel (data that contains the number of medical personnel available at the hospital), and list of medical equipment (data on the equipment in the hospital).

Fig. 10. Clustering result using K-Means with k=4.

Fig. 10 displays the results of the K-Means algorithm, which produced four clusters. Cluster0 (magenta) represents the group of hospitals with the most complete resources, so it belongs to the group of hospitals with excellent predicate; cluster1 (green) is a group of hospitals that have complete resources with good predicate; cluster2 (red) is a group of hospitals with enough predicate; while cluster3 (blue) is a hospital group with deficient resource predicate.

B. K-Medoids Result
After the experiments on the HFR data using the K-Means algorithm, the next step was to test the same data with a different algorithm, namely the K-Medoids algorithm; the number of cluster centers (medoids) was set at 4, as during the testing with the K-Means algorithm.

In this experiment, the K-Medoids model was constructed using the TensorFlow framework and with the help of the pandas module. The results of this experiment are shown in the following graph.

Fig. 11. Clustering result using K-Medoids with medoids=4.

In Fig. 11 the clustering results using the K-Medoids algorithm are shown; four clusters were produced. Cluster0 (presented in magenta) is the group of hospitals with the highest numbers of beds, medical personnel, and hospital equipment; cluster1 (presented in green) is a group of hospitals that have complete resources with a good predicate. Cluster2 (in red) contains hospitals with enough predicate. Cluster 2 has the same shape as cluster3; the difference is the number of data points on the attributes of medical personnel and beds in the hospitals. Cluster 3 (presented in blue) is a group of hospitals with a deficient resource predicate, due to the lowest numbers of beds in hospitals, medical personnel, and hospital equipment.

C. Comparative Analysis
The findings obtained from this study are the time of execution, that is, the time needed to do a one-time clustering process with the K-Means algorithm and the K-Medoids algorithm, and the number of members of each cluster. The validation of the findings was done by conducting cluster evaluation in terms of both internal evaluation and external evaluation.

The calculation of time is done by counting, beginning when the initial process starts and ending when the process stops and produces the clusters. The time obtained was different, depending on the algorithm used; namely, in the K-Means algorithm the time was 2.96 seconds, in the


K-Medoids algorithm the time obtained was faster, with a value of 1.15 seconds. Other results obtained are the sizes of each cluster. When clustering with K-Means, cluster 0 consisted of 9, cluster 1 of 124, cluster 2 of 65, and cluster 3 of 139 data points, respectively. When clustering with K-Medoids, cluster 0 consisted of 29 data points, cluster 1 of 129, cluster 2 of 6, and cluster 3 of 173.

The validation conducted in this study uses the cluster evaluation method both internally and externally. The results obtained from the validation are as follows: the K-Means algorithm obtained a silhouette index value of 0.1817 and a rand index value of 0.1755; the K-Medoids algorithm obtained a silhouette index value of 0.4461 and a rand index value of 0.6214.


V. CONCLUSION

The clustering method, using the K-Means algorithm and the K-Medoids algorithm, has been successfully applied to the HFR data. The results obtained show that the K-Medoids algorithm outperformed the K-Means algorithm in terms of execution time and the clusters generated. The execution time for K-Medoids is better than for K-Means (2.96 seconds for execution with K-Means and 1.15 seconds with K-Medoids, respectively). In addition to the differences in execution time, K-Medoids also maintained the accuracy of the grouping results. This can be seen from the values of the silhouette index and the rand index, which approached positive 1.

This study showed that the clustering method can be applied to group hospitals based on beds in the hospital, medical personnel, and list of hospital equipment. The adjustment of the data before the clustering method is applied is very influential on the cluster results. In this case the adjustment was performed with data cleaning, which filled in the incomplete data; data reduction, which removed the attributes that were not needed; and data transformation, which changed non-integer data into the integer data type.

From the comparison results between the two clustering algorithms, K-Means and K-Medoids, it was concluded that the K-Medoids algorithm was better than the K-Means, supported by the cluster evaluation results and the execution time. Hospital resources were grouped in the form of cluster 0 with excellent predicate, cluster 1 with good predicate, cluster 2 with enough predicate, and cluster 3 with deficient predicate.

REFERENCES
[1] Kementerian Kesehatan Republik Indonesia, Profil Kesehatan Indonesia, 2016.
[2] Kementerian Kesehatan Republik Indonesia. (2014). Peraturan Menteri Kesehatan Republik Indonesia Nomor 75 Tahun 2014 tentang Pusat Kesehatan Masyarakat, pp. 1–24. [Online]. Available: http://aspak.yankes.kemkes.go.id/beranda/wp-content/uploads/downloads/2015/03/PMK-No.-75-ttg-Puskesmas.pdf
[3] A. Gulli and A. Kapoor, TensorFlow 1.x Deep Learning Cookbook, 1st ed., Birmingham-Mumbai: Packt Publishing Ltd., 2017.
[4] R. Liu, X. Li, L. Du, S. Zhi, and M. Wei, "Parallel implementation of density peaks clustering algorithm based on spark," Procedia Comput. Sci., vol. 107, pp. 442–447, 2017.
[5] J. Han, M. Kamber, and J. Pei, "Data mining concepts and techniques," in Data Mining, 3rd ed., Morgan Kaufmann Publishers, 2011, pp. 1–38.
[6] R. Bonnin, Building Machine Learning Projects with TensorFlow. Birmingham: Packt Publishing Ltd., 2016.
[7] H. I. Arumawadu, R. M. K. T. Rathnayaka, and S. K. Illangarathne, "K-Means clustering for segment web search results," vol. 2, no. 8, pp. 79–83, 2015.
[8] Z. S. Younus et al., "Content-based image retrieval using PSO and k-means clustering algorithm," Arab. J. Geosci., vol. 8, no. 8, pp. 6211–6224, 2015.
[9] A. K. Jain, "Data clustering: 50 years beyond K-Means," in Proc. 19th Int. Conf. Pattern Recognit., 2010, pp. 651–666.
[10] D. F. Pramesti, M. T. Furqon, and C. Dewi, "Implementasi metode K-Medoids clustering untuk pengelompokan data potensi kebakaran hutan/lahan berdasarkan persebaran titik panas (hotspot)," J. Pengemb. Teknol. Inf. dan Ilmu Komput., vol. 1, no. 9, pp. 723–732, 2017.
[11] M. Anggara, H. Sujiani, and N. Helfi, "Pemilihan distance measure pada K-Means clustering untuk pengelompokkan member di Alvaro Fitness," J. Sist. dan Teknol. Inf., vol. 1, no. 1, pp. 1–6, 2016.
[12] M. A. Masud, J. Z. Huang, C. Wei, J. Wang, I. Khan, and M. Zhong, "I-nice: A new approach for identifying the number of clusters and initial cluster centres," Inf. Sci., vol. 466, pp. 129–151, 2018.
[13] D. Fitrianah, A. N. Hidayanto, H. Fahmi, J. L. Gaol, and A. M. Arymurthy, "ST-AGRID: A spatio temporal grid density based clustering and its application for determining the potential fishing zones," Int. J. Softw. Eng. its Appl., vol. 9, no. 1, pp. 13–26, 2015.
[14] F. Gargiulo, S. Silvestri, and M. Ciampi, "A clustering based methodology to support the translation of medical specifications to software models," Appl. Soft Comput. J., vol. 71, pp. 199–212, 2018.
[15] C. C. Yeh and M. S. Yang, "Evaluation measures for cluster ensembles based on a fuzzy generalized Rand index," Appl. Soft Comput. J., vol. 57, pp. 225–234, 2017.
[16] M. Hoffman, D. Steinley, and M. J. Brusco, "A note on using the adjusted Rand index for link prediction in networks," Soc. Networks, vol. 42, pp. 72–79, 2015.
[17] B. Zheng, J. Zhang, S. W. Yoon, S. S. Lam, M. Khasawneh, and S. Poranki, "Predictive modeling of hospital readmissions using metaheuristics and data mining," Expert Syst. Appl., vol. 42, no. 20, pp. 7110–7120, 2015.
[18] A. E. A. Elrazek, "How can data mining improve health care?" Appl. Math. Inf. Sci., vol. 11, no. 2, pp. 585–588, 2017.
[19] V. S. Magesh and T. G. Franco, Improving Indian Healthcare Using Data Mining, 2016, pp. 598–607.
[20] C. F. Zanon et al., "Web scraping computer program for the estomato web software: A potential tool for oral medicine practice and research," Oral Surg. Oral Med. Oral Pathol. Oral Radiol., vol. 124, no. 2, p. e141, 2017.
[21] M. Sadikin and F. Alfiandi, "Comparative study of classification method on customer candidate data to predict its potential risk," Int. J. Electr. Comput. Eng., vol. 8, no. 6, pp. 4763–4771, 2018.
[22] F. Gürbüz, L. Özbakir, and H. Yapici, "Data mining and preprocessing application on component reports of an airline company in Turkey," Expert Syst. Appl., vol. 38, no. 6, pp. 6618–6626, 2011.
[23] A. Idri, H. Benhar, J. L. Fernández-Alemán, and I. Kadi, "A systematic map of medical data preprocessing in knowledge discovery," Comput. Methods Programs Biomed., vol. 162, pp. 69–85, 2018.

Yogi Wahyu Romadon was born in Cilacap, Indonesia, in 1998. He studied for the bachelor's degree in informatics engineering at Mercu Buana University, Jakarta, Indonesia, in the period from 2015 to 2019. He is currently a student in the Faculty of Computer Science. His research interests include data mining, machine learning, and deep learning.

Devi Fitrianah was born in Jakarta, Indonesia, in 1978. She received the bachelor's degree in computer science from Bina Nusantara University, West Jakarta, Indonesia, in 2000, and the master's degree in information technology and the Ph.D. degree in computer science from Universitas Indonesia, Depok, Indonesia, in 2008 and 2015, respectively. She is currently a faculty member with the Department of Computer Science, Universitas Mercu Buana, where she is the head of the Research Center. Her research interests include image processing, data mining, applied remote sensing, and geographic information systems.

