Customer segmentation 21 (1)
Bachelor of Technology in
K.Bhavya (S180109)
B.Srinivasu (S180551)
V.Rupavathi (S180269)
K.Suryasri (S180291)
CERTIFICATE
This is to certify that the work entitled, “Customer segmentation using
k-means clustering” is the bona fide work of V.RUPAVATHI (ID No:
S180269), K.BHAVYA (ID No: S180109), B.SRINIVASU (ID No:
DECLARATION
We, Ch.Veerababu, K.Bhavya, V.Rupavathi, B.Srinivasu and K.Surya sri, hereby declare that
We also declare that it has not been submitted previously in part or in full to this University or
V.Rupavathi (S180269)
K.Bhavya (S180109)
B.Srinivasu (S180551)
Ch.Veerababu (S180642)
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our project guide, Mr. N. Ramesh Babu, for
valuable suggestions and keen interest throughout the progress of our course of research.
We are grateful to Mr. N. Ramesh Babu, Assistant Professor of ECE, for providing excellent computing
At the outset, we would like to thank Rajiv Gandhi University of Knowledge and
Technologies, Srikakulam for providing all the necessary resources for the successful
completion of our course work. Last but not least, we thank our classmates and other
V.Rupavathi,
Ch.Veerababu,
K.Bhavya,
B.Srinivasu,
K.Surya sri.
CUSTOMER SEGMENTATION USING MACHINE LEARNING
ABSTRACT
businesses to categorize their customers into distinct groups based on common attributes such
companies can identify patterns and trends within their customer base and use this
business, and optimize business operations. This abstract provides an overview of the benefits
increased revenue, and more efficient resource allocation. Additionally, it outlines the main
steps involved in creating a customer segmentation model, including data collection and
cleaning, feature engineering, model selection and training, and evaluation and deployment.
Overall, customer segmentation using machine learning offers businesses a powerful tool to
gain deeper insights into their customers and make data-driven decisions that drive business
Title
Certificate
Declaration
Acknowledgements
Abstract
1. INTRODUCTION
1.1 Introduction
1.2 Motivation
1.4 Objectives
1.5 Goal
1.6 Scope
1.7 Applications
2. LITERATURE SURVEY
2.2 Study
2.3 Summary
3. EXISTING SYSTEM
3.2 Disadvantages
4. PROPOSED SYSTEM
4.2 Advantages
5.1 Algorithm
5.2 Factors
6. EXPERIMENT RESULTS
8. CONCLUSION
9. REFERENCES
LIST OF FIGURES
Figure 1- Dataset
Figure 2- Data Preprocessing
Figure 3- Distance
Figure 4- Purchase Rate
Figure 5 - Gender
INTRODUCTION
1.1 Introduction
Customer segmentation is the process of dividing a target market into distinct groups based
on their characteristics and behaviours. By analysing data such as demographics,
psychographics, and purchase history, businesses can identify common patterns among
customers. This segmentation allows businesses to personalize their marketing efforts,
allocate resources effectively, enhance customer satisfaction, foster loyalty, and drive
innovation. Overall, customer segmentation helps businesses understand and serve their
customers better, leading to improved business performance and growth.
1.2 Motivation
The motivation behind customer segmentation is to personalize marketing efforts, target the
right audience, optimize resources, enhance customer satisfaction and loyalty, differentiate
in the market, and drive product development and innovation. By understanding and catering
to the diverse needs of their customers, businesses can gain a competitive advantage and
achieve sustainable growth.
1.4 Objectives
drive product development and innovation, creating new offerings that address specific customer
requirements and stay ahead of competitors.
1.5 Goal
The goal of customer segmentation is to divide a target market into distinct groups based on
specific characteristics and behaviours, in order to better understand and meet the needs of
different customer segments. The primary goal is to enable businesses to deliver personalized
experiences, tailor marketing efforts, allocate resources effectively, enhance customer
satisfaction, foster customer loyalty, differentiate in the market, and drive business growth.
Ultimately, the goal of customer segmentation is to improve overall business performance
by effectively catering to the diverse needs and preferences of customers.
1.6 Scope
The scope of customer segmentation involves collecting relevant customer data, selecting
segmentation criteria, analysing the data to identify patterns, grouping customers into distinct
segments, creating segment profiles, developing targeted marketing strategies, implementing
and testing those strategies, and continuously refining the segmentation approach. It
encompasses various departments and functions within a business and aims to understand
customers, deliver personalized experiences, and improve overall business performance.
1.7 Applications
1.8 Limitations
2. Static Nature: Segmentation is often based on historical data and assumptions about
customer behavior. However, customer preferences and behaviors can change over time.
Segmentation models may not capture these changes effectively, resulting in outdated
and less accurate segment profiles.
CHAPTER-2
LITERATURE SURVEY
If a dataset contains null values, duplicates, or other noisy data, data cleaning must be performed.
Data cleansing ensures that information is reliable, usable, and available for analysis.
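The cleaning step described above can be sketched as follows. This is a minimal illustration, assuming the data is held in a pandas DataFrame; the column names follow the dataset output shown later in this report, while the sample rows and the helper name `clean_customers` are hypothetical.

```python
import pandas as pd


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows that contain missing values."""
    cleaned = df.drop_duplicates()   # remove verbatim repeated records
    cleaned = cleaned.dropna()       # remove rows containing null values
    return cleaned.reset_index(drop=True)


# Hypothetical sample with one duplicate row and one missing value.
raw = pd.DataFrame({
    "Customer": [1, 2, 2, 3],
    "Distance(km)": [12.5, 40.0, 40.0, None],
    "PurchaseRate": [30, 75, 75, 10],
})

print(raw.isnull().sum())      # number of nulls per column before cleaning
cleaned = clean_customers(raw)
print(cleaned)
```

Whether rows with missing values should be dropped or filled in depends on the dataset; dropping is used here only because it is the simplest option.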
Once we have the data, we can visualize it by comparing distance and purchase rate, broken down by
gender. According to the study, five different plots illustrate the groups of customers and the
customer behaviours linked to distance and purchase rate.
The plots suggest that the customers fall into several groups, but not in enough detail to fix their
number, so we build a K-means model. K-means is run for a range of k values, and for each value we
compute the sum of squared distances from each point to its assigned center, the WCSS (Within
Cluster Sum of Squares). The silhouette score, which measures how well each point fits its own
cluster compared with the nearest other cluster, can be used alongside it, choosing the number of
clusters that gives the best score. We noticed that once K=5 is reached there is no further rapid
drop in WCSS, so K=5 is taken as the number of clusters.
Using the method stated above, we can divide the plot into groups, determine which clusters should
be prioritized, and assign a label to each. The K-means approach decides which of the five clusters
each customer belongs to, using distance and purchase rate.
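Choosing the number of clusters by silhouette score can be sketched as follows. This is only an illustration on synthetic data, assuming scikit-learn is available; the three blobs stand in for (distance, purchase rate) pairs and are not the project's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for (distance, purchase rate) pairs: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[10.0, 20.0], [60.0, 80.0], [120.0, 15.0]])
X = np.vstack([c + rng.normal(scale=3.0, size=(50, 2)) for c in centers])

scores = {}
for k in range(2, 7):  # the silhouette score is undefined for k = 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)   # k with the highest silhouette score
print(scores)
print("best k by silhouette:", best_k)
```

Because the silhouette score needs at least two clusters, the loop starts at k = 2, whereas a WCSS curve can start at k = 1.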
2.2 Study
Key Features of Customer Segmentation are
• Improved Targeting
2.3 Summary
Customer segmentation is a strategic approach used by businesses to divide their target market
into distinct groups based on characteristics and behaviours. It aims to understand and meet the unique
needs of different customer segments. The benefits include improved targeting, enhanced satisfaction,
optimized resource allocation, customer loyalty, and innovation. The process involves data collection,
analysis, profile creation, and targeted marketing. However, there are limitations such as
oversimplification, data limitations, and biases. Applications include marketing, product
development, customer retention, pricing, and personalized experiences. Overall, customer
segmentation helps businesses understand customers and drive growth.
CHAPTER-3
EXISTING SYSTEM
• Age
• Gender
• Income
• Education
• Ethnicity
3.2 DBSCAN
Clustering analysis, or simply clustering, is an unsupervised learning method that divides the data
points into a number of groups, such that the data points in the same group have similar properties and
data points in different groups have different properties in some sense. It comprises many different
methods, which differ in the distance or similarity measure they use.
E.g. K-Means (distance between points), Affinity propagation (graph distance), Mean-shift (distance between
points), DBSCAN (distance between nearest points), Gaussian mixtures (Mahalanobis distance to centers),
Spectral clustering (graph distance), etc.
Fundamentally, all clustering methods use the same approach: first we calculate similarities, and then we
use them to cluster the data points into groups. Here we focus on the density-based spatial clustering of
applications with noise (DBSCAN) method.
1. Find all the neighbor points within eps of each point, and identify as core points those with more than
MinPts neighbors.
2. For each core point, if it is not already assigned to a cluster, create a new cluster.
3. Recursively find all its density-connected points and assign them to the same cluster as the core point.
Two points a and b are said to be density-connected if there exists a core point c, with a sufficient number
of points in its neighborhood, from which both a and b can be reached through a chain of core points, each
within eps of the next. This is a chaining process: if b is a neighbor of c, c is a neighbor of d, and d is a
neighbor of e, which in turn is a neighbor of a (with c, d and e core points), then b is density-connected
to a.
4. Iterate through the remaining unvisited points in the dataset. Those points that do not belong to any
cluster are noise.
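The steps above can be tried out with scikit-learn's `DBSCAN` class. The following is a minimal sketch on synthetic data, not the project's dataset; the values chosen for `eps` and `min_samples` (scikit-learn's name for MinPts) are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense synthetic groups plus one isolated point far from both.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(40, 2))
group_b = rng.normal(loc=[10.0, 10.0], scale=0.3, size=(40, 2))
outlier = np.array([[5.0, 50.0]])
X = np.vstack([group_a, group_b, outlier])

# eps is the neighborhood radius; min_samples plays the role of MinPts.
model = DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = model.labels_            # cluster index per point, -1 marks noise

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

Unlike K-means, the number of clusters is not passed in; it follows from the density parameters, and points that belong to no dense region are labelled as noise.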
3.3 Disadvantages
CHAPTER-4
PROPOSED SYSTEM
The proposed system of customer segmentation is based on using both demographic and geographic
factors instead of using only demographic factors.
Instead of taking Annual Income and Spending Score, we take Distance and Purchase Rate.
4.2 Advantages
• Provide Home Delivery services to the customers who are in rural areas.
Hardware requirements:
CHAPTER-5
5.1 Algorithm
Clustering
Clustering is the task of dividing the population or data points into a number of groups such that data
points in the same group are more similar to each other than to the data points in other groups. It is
basically a grouping of objects on the basis of the similarity and dissimilarity between them.
K-Means
Unsupervised machine learning is the process of letting an algorithm operate on unlabeled, unclassified data
without supervision. Without any prior training on labeled data, the machine's job in this case is to
organize unsorted data according to similarities, patterns, and differences.
K-means clustering assigns each data point to one of K clusters depending on its distance from the centers
of the clusters. It starts by placing the cluster centroids at random positions in the feature space. Each
data point is then assigned to one of the clusters based on its distance from that cluster's centroid. After
every point has been assigned, the cluster centroids are recomputed. This process runs iteratively until the
assignments stop changing. In this analysis we assume that the number of clusters is given in advance and
that every point has to be placed in one of the groups.
In some cases K is not clearly defined, and we have to find the optimal value of K. K-means performs best
when the data is well separated; when the clusters overlap, it is not suitable. K-means is faster than many
other clustering techniques and provides strong coupling between the data points. However, it does not
provide clear information about the quality of the clusters, and different initial centroid assignments may
lead to different clusters. K-means is also sensitive to noise, and it may get stuck in a local minimum.
Step-2: Select K random points as centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Recompute the centroid of each cluster as the mean of the points assigned to it.
Step-5: Repeat the third step, that is, reassign each data point to the new closest centroid, until the
assignments no longer change.
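The steps listed above can be sketched in a few lines of NumPy. This is an illustrative implementation under simple assumptions (initial centroids drawn from the data points, Euclidean distance, a fixed iteration cap); the function name `kmeans` and the sample points are hypothetical, and the project itself uses scikit-learn's `KMeans`.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 3: assign every point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Step 5: repeat until the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two tight groups of points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

Which cluster receives index 0 depends on the random initialization, so only the grouping of the points is meaningful, not the label numbers themselves.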
The k-means algorithm is commonly used for customer segmentation due to its simplicity and effectiveness in
identifying distinct groups within a dataset.
1. Data Preparation: Gather the customer data that includes relevant attributes for
segmentation, such as demographics, purchase history, behavior, or any other variables that provide
insights into customer characteristics.
2. Select the Number of Clusters (k): Determine the desired number of segments or
clusters based on the specific needs and goals of the segmentation analysis. The appropriate value of
k can be determined through prior knowledge, domain expertise, or by utilizing techniques like the
elbow method or silhouette score.
3. Attribute Scaling: Normalize or standardize the attribute values if they have different
scales or variances. This is important to ensure that all attributes contribute equally to the clustering
process.
4. Assign Data Points to Clusters: Measure the similarity or distance between each
data point and the cluster centers using a distance metric such as Euclidean distance. Assign each data
point to the nearest cluster center based on the minimal distance.
5. Evaluate and Interpret the Results: Analyze the resulting clusters based on the
attributes and characteristics of the customers in each cluster. Determine meaningful segment labels
or profiles for each cluster to gain insights into customer behavior, preferences, or other relevant
factors.
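The workflow above can be sketched end to end with pandas and scikit-learn. This is a hedged illustration on a hypothetical, randomly generated customer table: the column names follow the dataset used in this report, but the values, the choice of k = 2, and the `Segment` column name are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Step 1: hypothetical customer table with two clearly different kinds of customer.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Distance(km)": np.concatenate([rng.normal(30, 5, 40), rng.normal(125, 5, 40)]),
    "PurchaseRate": np.concatenate([rng.normal(80, 5, 40), rng.normal(15, 5, 40)]),
})

# Step 3: standardize so that both attributes contribute equally.
features = StandardScaler().fit_transform(df[["Distance(km)", "PurchaseRate"]])

# Steps 2 and 4: choose k and assign each customer to the nearest center.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["Segment"] = kmeans.fit_predict(features)

# Step 5: profile each segment by the mean of its attributes.
profile = df.groupby("Segment")[["Distance(km)", "PurchaseRate"]].mean()
print(profile)
```

The profile table is what a segment label would be based on, for example "nearby, frequent buyers" versus "distant, occasional buyers".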
5.2 Factors
Both demographic and geographic factors are used for segmentation.
CHAPTER-6
EXPERIMENT RESULTS
Experiment Results
Mall shoppers can be divided into five groups depending on their distance from the mall and their
purchase rate. By analyzing the data, we can predict customer behaviour based on distance and purchase
rate. This cluster analysis may be applied to a number of consumer marketing methods. Using the
clustering results, we want to deliver products to customers who live far from the shopping mall,
increasing sales to the benefit of the mall. A cluster analysis may also be used to establish what
kind of things clients wish to consume, allowing for the development of more targeted marketing
efforts. The people in clusters 3 and 4 are the potential clients in this situation.
CHAPTER-7
SOURCE CODE
cs.head()
# Plotting aliases used throughout this chapter
import matplotlib.pyplot as m
import seaborn as s

m.figure(figsize=(15,5))
s.countplot(y="Gender",data=cs)
m.show()
m.figure(figsize=(15,6))
s.barplot(x=agex, y=agey, palette="mako")
m.title("Number of Customer and Ages")
m.xlabel("Age")
m.ylabel("Number of Customer")
m.show()
m.figure(figsize=(15,6))
s.barplot(x=ssx, y=ssy, palette="rocket")
m.title("PurchaseRate")
m.xlabel("Rate")
m.ylabel("Number of Customer having the purchase rate")
m.show()
m.figure(figsize=(6,3))
m.grid()
m.plot(range(1,12),wcss,linewidth=2,color="red",marker="8")
m.xlabel("K Value")
m.ylabel("WCSS")
m.show()
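The plot above needs a `wcss` list holding one value for each k from 1 to 11, but the code that fills it does not appear in this listing. One way it could be computed is sketched below, assuming scikit-learn's `KMeans`, whose `inertia_` attribute is the within-cluster sum of squares. The array `x1` here is a synthetic stand-in, with blob centers loosely rounded from the cluster centers printed in the output section, not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for x1, the (distance, purchase rate) array used in this chapter.
rng = np.random.default_rng(0)
x1 = np.vstack([
    rng.normal(loc=c, scale=4.0, size=(40, 2))
    for c in [(30, 15), (73, 45), (32, 80), (129, 82), (124, 15)]
])

wcss = []
for k in range(1, 12):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x1)
    wcss.append(km.inertia_)   # inertia_ is the within-cluster sum of squares

print([round(v, 1) for v in wcss])
```

The elbow is the value of k after which the printed values stop dropping sharply.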
Step7: Labeling
from sklearn.cluster import KMeans

# Fit K-means with the five clusters chosen from the WCSS plot
kmc=KMeans(n_clusters=5)
lab=kmc.fit_predict(x1)
print(lab)
print(kmc.cluster_centers_)
# Plot the customers coloured by cluster, with the cluster centers in black
m.scatter(x1[:,0],x1[:,1],c=kmc.labels_, cmap='rainbow')
m.scatter(kmc.cluster_centers_[:,0], kmc.cluster_centers_[:,1],
          color="black")
m.title("Clusters of Customers")
m.xlabel("Distance(km)")
m.ylabel("PurchaseRate")
m.show()
print("*******************************************")
OUTPUT
Step2:
Figure 1- Dataset
Step3:
Customer int64
Village object
Distance(km) float64
Customer 0
Gender 0
Age 0
Village 0
Distance(km) 0
PurchaseRate 0
dtype: int64
Step4:
Figure 3- Distance
Figure 5 - Gender
Step5:
Step6:
Step7:
[2 1 0 4 4 3 3 3 4 1 0 1 4 4 3 3 2 4 0 0 2 2 2 0 4 1 0 0 2 1 2 3 2 2 0 0 4
 3 3 4 4 2 2 0 4 0 1 1 0 4 4 1 4 3 3 3 3 4 4 4 3 3 3 3 4 2 2 2 2 2 2 4 1 1
 0 2 2 3 0 1 4 0 2 1 1 2 1 1 0 2 3 0 0 2 2 0 1 0 2 0 2 3 2 2 0 2 3 1 0 2 0
 1 0 0 0 2 1 0 2 2 0 2 0 0 0 2 2 2 2 0 0 2 2 2 2 0 1 0 0 1 0 0 1 0 2 2 3 0
 0 1 2 2 0 3 2 0 2 2 2 1 0 3 1 0 0 2 0 2 0 2 1 0 2 0 2 1 2 1 0 0 1 0 0 1 1
 1 1 0 1 2 1 0 1 0 0 2 2 1 2 1]
Step8:
[[ 30.82033898 14.84745763]
[ 73.22105263 45.23684211]
[ 31.6 80.41666667]
[129.07826087 82.34782609]
[123.57 14.7 ]]
Step9:
Step10:
Step11:
Customers present in group1= 20
Customers are - [ 4 5 9 13 14 18 25 37 40 41 45 50 51 53 58 59 60
65 72 81]
******************************************
Customers present in group2= 59
Customers are - [ 3 11 19 20 24 27 28 35 36 44 46 49 75 79 82
89 92 93
96 98 100 105 109 111 113 114 115 118 121 123 124 125 130 131 136 138
139 141 142 144 148 149 153 156 161 164 165 167 169 172 174 179 180 182
183 188 192 194 195]
******************************************
Customers present in group3= 60
Customers are - [ 1 17 21 22 23 29 31 33 34 42 43 66 67 68 69
70 71 76
77 83 86 90 94 95 99 101 103 104 106 110 116 119 120 122 126 127
128 129 132 133 134 135 145 146 151 152 155 157 158 159 166 168 170 173
175 177 190 196 197 199]
******************************************
Customers present in group4= 23
Customers are - [ 6 7 8 15 16 32 38 39 54 55 56 57 61
62 63 64 78 91
102 107 147 154 162]
******************************************
Customers present in group5= 38
Customers are - [ 2 10 12 26 30 47 48 52 73 74 80 84 85 87 88
97 108 112
117 137 140 143 150 160 163 171 176 178 181 184 185 186 187 189 191
193
198 200]
*******************************************
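A listing of this form can be produced by grouping customer IDs by their cluster label. The sketch below is an assumption about how such output might be generated, not the code used for the results above; the arrays `customer_ids` and `labels` are hypothetical.

```python
import numpy as np

# Hypothetical inputs: customer IDs and the cluster label assigned to each one.
customer_ids = np.array([1, 2, 3, 4, 5, 6])
labels = np.array([2, 4, 1, 0, 0, 3])

groups = {}
for g in range(5):
    members = customer_ids[labels == g]   # IDs of the customers in cluster g
    groups[g + 1] = members               # groups are numbered from 1 in the output
    print(f"Customers present in group{g + 1}= {len(members)}")
    print("Customers are -", members)
    print("*******************************************")
```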
CHAPTER-8
CONCLUSION
• In today's highly competitive business environment, customer segmentation is an
essential strategy for companies that want to stay ahead of the curve.
CHAPTER-9
Future Scope
Customer segmentation using machine learning has a promising future with several avenues for
growth and innovation. Here are some key areas of future scope:
Multimodal Data Integration: With the proliferation of data sources such as social media, IoT devices,
and online interactions, future customer segmentation models may incorporate diverse data types,
including text, images, videos, and sensor data. Advanced techniques like multimodal learning could
enhance segmentation accuracy and depth.
Unsupervised Learning Techniques: While k-means and similar clustering methods are commonly used for
customer segmentation, other unsupervised learning techniques such as self-organizing maps (SOMs),
autoencoders, and generative adversarial networks (GANs) hold potential for discovering hidden
patterns and structures in data without the need for labeled examples.
Real-time Segmentation and Decision Making: With advancements in computational power and
streaming data processing technologies, the future of customer segmentation may involve real-time
segmentation models capable of analyzing data streams and adapting marketing strategies on the fly
to meet changing customer needs and preferences.
Ethical and Fair Segmentation Practices: As machine learning algorithms play an increasingly central
role in customer segmentation, there will be a greater emphasis on ensuring ethical and fair practices,
including mitigating biases, protecting customer privacy, and maintaining transparency in
segmentation processes.
Integration with Marketing Automation Platforms: Future customer segmentation solutions may
seamlessly integrate with marketing automation platforms, enabling businesses to automate
personalized marketing campaigns, targeted advertisements, and customer communication across
various channels.
CHAPTER-10
REFERENCES
[1] Tushar Kansal, Suraj Bahuguna, Vishal Singh, Tanupriya Choudhury, “Customer
Segmentation using K-means Clustering”, 2018.
[3] https://www.analyticsvidhya.com/blog/2021/05/k-means-clustering-
withmallcustomersegmentation-data-full-detailed-code-and-explanation/t