Summarize Data Sets
Business
DEPARTMENT -AIT
M.B.A
Advanced Predictive Modelling-Course Code
22BBT616
• Why do we summarize? We summarize data to "simplify" it, so that we can
quickly see which values look typical and which look odd.
The distribution of a variable shows what values the variable takes
and how often the variable takes these values.
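The definition above (which values a variable takes and how often) can be illustrated with a frequency table. This is a minimal Python sketch. The function name `frequency_distribution` and the sample values are hypothetical and exist only for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count) pairs sorted by value:
    which values the variable takes and how often it takes each one."""
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical illustration data: number of items bought by 10 customers
sample = [2, 3, 3, 1, 2, 3, 4, 2, 3, 5]

for value, count in frequency_distribution(sample):
    print(f"{value}: {count}")
```

Each line of the output pairs a value with its count, so the most frequent value (here 3) stands out at a glance.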
• There are four key areas to consider when summarizing a set of numbers:
• The shape of the data affects which summary statistics describe them best. The
"shape" refers to how the data values are distributed across the range of values in the
sample. Generally, you expect a "cluster" of values around the average. It is
important to know whether the values are arranged more or less symmetrically around the average,
or whether more values fall to one side than the other.
• There are two main ways to explore the shape (distribution) of a sample of data values:
• Graphically – frequency histograms or tally plots draw a picture of the sample's shape.
• Shape statistics – such as skewness and kurtosis. Skewness measures how symmetrically the
values are arranged around the average; kurtosis measures how heavy the tails are compared with a
normal distribution (loosely, how peaked or clustered around the average the data are).
• The ultimate goal is to determine what kind of distribution your data follow. If your data
are normally distributed, you have a wide range of options for summarizing them and for
subsequent analysis.
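Both routes to exploring shape described above, a tally plot and the shape statistics, can be sketched in a few lines of Python. This is an illustrative sketch, not a prescribed method. It uses the simple moment-based formulas, skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3, where mk is the k-th central moment. Statistical packages often apply small-sample bias corrections, so their values may differ slightly. For normally distributed data, both statistics are close to 0. The function names and the two samples are hypothetical.

```python
from collections import Counter

def tally_plot(values):
    """Print one row per distinct value, with one '|' mark per occurrence."""
    for value, count in sorted(Counter(values).items()):
        print(f"{value:>3} | {'|' * count}")

def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k over the sample."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a perfectly symmetric sample)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (about 0 for normal data)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3

# Hypothetical illustration samples
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name)
    tally_plot(data)
    print(f"skewness = {skewness(data):.2f}, "
          f"excess kurtosis = {excess_kurtosis(data):.2f}")
```

The symmetric sample has a skewness of 0, while the long right tail of the second sample gives a positive skewness. Reading the tally plot and the statistics together is usually more informative than either one alone.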
Types of data distribution