The Art of Data Analysis: January 2015
1 author:
Muhammad Ibrahim
Govt. M A O College
All content following this page was uploaded by Muhammad Ibrahim on 27 October 2015.
Muhammad Ibrahim¹

Abstract
After reliable data have been collected, the next step is to analyze them. This paper gives a brief review of the data analysis procedure.

¹ Muhammad Ibrahim, Department of Statistics, Govt. MAO College, Lahore
Contact No. 0300-4668681
Email: Ibrahim_ap98@yahoo.com
Corresponding Author:
Muhammad Ibrahim
Department of Statistics, Govt. MAO
College, Lahore
Contact No. +92-300-4668681
Email: ibrahim.ap12@gmail.com
[Figure: SIMPLE BAR CHART]

[Figure: Pie charts (China, Pakistan, India, UK, USA)]

Multiple or Sub divided chart
It is simply an extension of the simple bar chart, which represents more than one related set of data.

[Figure: Multiple charts (China, Pakistan, India, UK, USA)]

HISTOGRAM
A histogram is a graph of continuous data, such as weight, height or age, used to see the underlying shape of the data. The shape of the histogram tells us whether the data are skewed or symmetrical. A histogram consists of a series of adjacent rectangles drawn for a grouped frequency distribution.

[Figure: Histogram (Mean = 56.1, Std. Dev = 9.95, N = 387)]
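Since a histogram is drawn from a grouped frequency distribution, the following Python sketch shows how such a table can be built and printed as text bars. It assumes equal-width classes of the form [lower, lower + width) and a starting point at or below the smallest value; the function names and the sample weights are hypothetical and chosen only for illustration.

```python
import math

def grouped_frequency(data, start, width):
    """Return (lower, upper, count) for equal-width classes [lower, upper)."""
    if width <= 0 or start > min(data):
        raise ValueError("need width > 0 and start <= smallest value")
    # Enough classes so that the largest value falls inside the last one.
    n_classes = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

def print_text_histogram(table):
    # One row of '#' per class; adjacent classes play the role of adjacent rectangles.
    for lower, upper, count in table:
        print(f"{lower:>4}-{upper:<4} {'#' * count}")

# Hypothetical weights (kg), for illustration only.
weights = [52, 57, 61, 63, 64, 68, 70, 71, 75, 83]
table = grouped_frequency(weights, start=50, width=10)
print_text_histogram(table)
```

Equal class widths are used so that the height of each bar is directly comparable; with unequal widths a histogram should instead use frequency density (count divided by width).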
[Figure: Histogram of AGE (Mean = 24.6, Std. Dev = 10.57, N = 55)]

[Figure: Scatter diagram]

Measure of Location
The graphical representation of data gives us only a tentative picture of the data; it cannot give the exact picture of the distribution, nor can it be used to estimate or predict values. The important features which enable us to describe the data in ...

MEAN:
Mathematically the arithmetic mean of n values is written as:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
For example, for the data 1.7, 2.2, 3.9, 3.11, 14.7 (n = 5):
$$\bar{x} = \frac{1.7 + 2.2 + 3.9 + 3.11 + 14.7}{5} = \frac{25.61}{5} = 5.12$$

MEDIAN:
Median is defined as the central value of the arranged (ordered) data; it does not depend on the magnitudes of the values, only on their positions in the arranged data. Mathematically it is written as:
$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value of the arranged data}$$
In the above example the arranged data are 1.7, 2.2, 3.11, 3.9, 14.7, so the median is the (5 + 1)/2 = 3rd value, 3.11. Other measures allied to the median are the quartiles and percentiles, which are useful in applications of probability theory.

MODE:
It is the value that occurs most frequently in the data. Suppose we have the collar sizes of patients as:
12, 14, 12, 15, 12, 13, 14, 12, 15.
Here 12, which occurs four times, is the mode of the data.

MEASURE OF VARIATION:
Sample variability plays an important role in data analysis. Variation is common in biological and pathological characteristics, as it is in real-life data generally. There are many measures of variation, but the most useful are the range, the standard deviation and the variance.

Range:
It is the difference between the maximum and minimum values. The range depends only on the extreme values of the data and does not consider the other values. Because extreme values greatly influence the range, it is not considered a good measure of dispersion, even though it is used in certain circumstances.

The standard deviation is:
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
and the variance is its square, SD². The standard error of the mean is:
$$SE(\bar{x}) = \frac{SD}{\sqrt{n}}$$
Another quantity related to the standard deviation is the coefficient of variation:
$$CV = \frac{SD}{\bar{x}} \times 100$$
The coefficient of variation is used to compare the variability of data from different sources, because it expresses the standard deviation as a percentage of the mean. The standard deviation is useful for describing the spread of a distribution.

MEASURE OF RELATIONSHIP:
In certain situations the researcher is interested in finding out the relationship between variables: is there a strong or a weak relationship between two variables?
The measures of relationship are classified as:
1. Regression.
2. Correlation.

... independent variable, and “a” is the initial value of the dependent variable (its value when the independent variable is zero).

CORRELATION:
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$
The value of r lies between -1 and +1. The coefficient of determination, 100r², gives the percentage of the variation in the dependent variable explained by the independent variable.
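The measures of location, variation and relationship described above can be restated as a short Python sketch, one function per formula. The function names are hypothetical. Two details go beyond the text and are assumptions: for an even number of values the median averages the two middle values (a common convention), and on a tie `mode` returns the most frequent value seen first. The standard deviation uses the divisor n, as in the formula above, and the correlation follows the same Σxy - n x̄ ȳ form.

```python
import math
from collections import Counter

def mean(xs):
    # x-bar = (x1 + x2 + ... + xn) / n
    return sum(xs) / len(xs)

def median(xs):
    # ((n + 1)/2)th value of the arranged data; for even n this position lies
    # halfway between two values, so their average is taken (a common convention).
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2

def mode(xs):
    # Most frequent value; on a tie, Counter.most_common keeps first-seen order.
    return Counter(xs).most_common(1)[0][0]

def value_range(xs):
    # Difference between the maximum and minimum values.
    return max(xs) - min(xs)

def variance(xs):
    # Mean squared deviation from x-bar, divisor n as in the text.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sd(xs):
    return math.sqrt(variance(xs))

def se_mean(xs):
    # Standard error of the mean: SD / sqrt(n).
    return sd(xs) / math.sqrt(len(xs))

def cv(xs):
    # Coefficient of variation, in percent.
    return sd(xs) / mean(xs) * 100

def correlation(xs, ys):
    # r = (sum xy - n*xbar*ybar) / sqrt((sum x^2 - n*xbar^2)(sum y^2 - n*ybar^2))
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - n * xbar * ybar) / math.sqrt(
        (sxx - n * xbar ** 2) * (syy - n * ybar ** 2))

data = [1.7, 2.2, 3.9, 3.11, 14.7]
print(round(mean(data), 2), median(data))  # 5.12 3.11
collar = [12, 14, 12, 15, 12, 13, 14, 12, 15]
print(mode(collar))  # 12
```

Writing each measure as a separate small function keeps it easy to check against its formula; for production work Python's standard `statistics` module offers tested equivalents for the location and spread measures.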
χ² Statistic
The χ² statistic is used to test the goodness of fit between observed data and expected data:
Critical region
Computation
DECISION
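The three steps named above (critical region, computation, decision) can be sketched in Python as follows. The formula is not stated in this section, so the sketch assumes the standard Pearson goodness-of-fit statistic χ² = Σ(O - E)²/E, with the critical value taken from a χ² table for k - 1 degrees of freedom (k classes, no parameters estimated from the data). The function names and the counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    # Pearson goodness-of-fit statistic: sum over classes of (O - E)^2 / E.
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit(observed, expected, critical_value):
    # Critical region: reject H0 when the statistic exceeds the tabulated
    # chi-square value for the chosen significance level and degrees of freedom.
    stat = chi_square_statistic(observed, expected)  # Computation
    reject = stat > critical_value                   # Decision
    return stat, reject

# Hypothetical counts in three classes, and the counts expected under H0.
observed = [45, 35, 20]
expected = [50, 30, 20]
# 5.991 is the tabulated 5% point of chi-square with k - 1 = 2 degrees of freedom.
stat, reject = goodness_of_fit(observed, expected, critical_value=5.991)
print(f"chi-square = {stat:.3f}, reject H0: {reject}")
```

Here χ² = 25/50 + 25/30 + 0 ≈ 1.333, which lies outside the critical region, so the hypothesized distribution is not rejected at the 5% level.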