Sta301 Final Quizz by Sarfraz
http://vuattach.ning.com/
FINALTERM EXAMINATION
FALL 2006
Marks: 50
Time: 120min
StudentID/LoginID:
______________________________
Student Name:
______________________________
Center Name/Code:
______________________________
Exam Date:
Tuesday, February 06, 2007
Please read the following instructions carefully before attempting any of the
questions:
1. Attempt all questions. Marks are written adjacent to each question.
2. Do not ask any question about the contents of this examination from
anyone.
a. If you think that there is something wrong with any of the questions,
attempt it to the best of your understanding.
b. If you believe that some essential piece of information is missing,
make an appropriate assumption and use it to solve the problem.
c. Write all steps, missing steps may lead to deduction of marks.
3. You are allowed to use the calculator & Statistical tables in order to solve the
questions.
4. For your convenience we are providing you the following symbols,
,∑
log x
variance for variance, ,for square root or whole square root.
**WARNING: Please note that Virtual University takes serious note of unfair
means. Anyone found involved in cheating will get an `F` grade in this
course.
Question 1 2 3 4 5 6 7 8 9 10 Total
Marks
Question 11 12 13 14
Marks
Into which two parts is statistics, as a subject, divided? Briefly explain both parts.
170, 169
170, 170
169, 170
176, 169
Quantitative variable
Qualitative variable
Discrete variable
Continuous variable
0
►
►
µ
►
σ
Normal distribution is
Uni-modal
Bi-modal
►
Multi-modal
None of these
One sided and two sided critical regions are based on:
Level of significance
Sample size
Null hypothesis
►
Alternative hypothesis
Estimate
Estimator
Denominator
None of these
Level of significance
Type-1 error
Type-II error
None of above
Zero
Negative
Greater than 1
None of these
The grade-point averages of college seniors selected at random from the graduating class are as
follows:
3.2
1.9
2.7
2.4
2.8
2.9
3.8
3.0
2.5
3.3
1.8
2.5
3.7
2.8
2.0
3.2
2.3
2.1
2.5
1.0
The mean lifetime of electric light bulbs produced by a company has in the past been 1120 hours
with a standard deviation of 125 hours. A sample of 8 electric bulbs recently chosen from a supply
of newly manufactured bulbs showed a mean lifetime of 1070 hours. Test the hypothesis that mean
lifetime of the bulbs has not changed using a level of significance of 0.05.
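A minimal sketch of how this test could be carried out, assuming the historical standard deviation of 125 hours still applies to the new bulbs, so that a two-sided z test is appropriate. The function name `z_test_mean` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test for a mean when the population sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit


z, z_crit, reject = z_test_mean(sample_mean=1070, mu0=1120, sigma=125, n=8)
print(f"z = {z:.3f}, critical value = +/-{z_crit:.3f}, reject H0: {reject}")
```

Under these assumptions the statistic is about -1.13, which lies inside the acceptance region, so the hypothesis of an unchanged mean lifetime would not be rejected at the 0.05 level.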
A random sample of 200 voters is selected and 120 are found to support an annexation suit. Find the
96% confidence interval for the fraction of the voting population favoring the suit.
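One way to sketch the interval, assuming the usual large-sample normal approximation for a proportion. The helper name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(successes=120, n=200, confidence=0.96)
print(f"96% CI: ({low:.3f}, {high:.3f})")
```

With a sample proportion of 0.6, this gives an interval of roughly 0.53 to 0.67.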
FINALTERM EXAMINATION
Fall 2009
STA301- Statistics and Probability (Session - 1)
Time: 120 min
Marks: 70
Student Info
StudentID:
Center:
Q No. 9 10 11 12 13 14 15 16
Marks
Q No. 17 18 19 20 21 22 23 24
Marks
Q No. 25 26 27 28 29 30 31
Marks
► 362880
► 3628800
► 362280
► 362800
►2
►0
► 0.5
►1
► Zero
► Less than 1
► Greater than 1
► Negative
► Degrees of freedom
► Sample size
► Mean
► Variance
► E(X) + E(Y)
► E(X) − E(Y)
► X − E(Y)
► E(X) − Y
Question No: 6 ( Marks: 1 ) - Please choose one
In testing a hypothesis, we always begin by assuming that:
► 10
► 0.135
► 1/4
► 1/2
► 3/4
► 1
► Single value
► Two values
► Range of values
► Zero
► Unbiased estimator of σ²
► Biased estimator of σ²
► Unbiased estimator of µ
► None of these
► 16 Var (X)
► 16 Var (X) + 5
► 4 Var (X) + 5
► 12 Var (X)
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy$ is equal to:
►1
►0
► -1
► ∞
► .0401
► .5500
► .4599
► .9599
► 16 outcomes
► Frequency polygon
► Ogive
► Histogram
► Frequency curve
►2
►5
► 10
► 20
Question No: 24 ( Marks: 3 )
For the given data, calculate the mean and standard deviation of the sampling distribution of the mean if the
sampling is done without replacement.
N = 1000, n = 25, µ = 68.5, σ = 2.7
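A short sketch of the calculation, assuming the standard finite population correction for sampling without replacement. The function name is illustrative.

```python
from math import sqrt


def sampling_dist_of_mean(N, n, mu, sigma):
    """Mean and standard error of the sample mean, sampling without replacement."""
    fpc = sqrt((N - n) / (N - 1))  # finite population correction
    return mu, (sigma / sqrt(n)) * fpc


mean_xbar, se_xbar = sampling_dist_of_mean(N=1000, n=25, mu=68.5, sigma=2.7)
print(f"mean = {mean_xbar}, standard error = {se_xbar:.4f}")
```

The mean of the sampling distribution equals the population mean, 68.5, and the standard error comes out slightly below 2.7 / 5 = 0.54 because of the correction factor.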
Factory Sam
f ( x) = 2
0, elsewhere
Question No: 27 ( Marks: 5 )
The means and variances of the weekly incomes in rupees of two samples of workers are given in
the following table, the samples being randomly drawn from two different factories:
A
Calculate the 90% confidence interval for the real difference in the incomes of the workers
from the two factories.
Question No: 28
B( Marks: 5 )
► Negatively skewed
► J-shaped
► Symmetrical
► Positively skewed
► Changed
► Vanish
► Does not change
► Dependent
► 0 to 1
► 0 to -∞
► -∞ to +∞
► 0 to +∞
►n-p
► n - p-1
► n - p- 2
►n–2
► -∞ ≤ χ² ≤ ∞
► -∞ ≤ χ² ≤ 1
► -∞ ≤ χ² ≤ 0
► 0 ≤ χ² ≤ ∞
E ( X ) + E (Y )
►
E ( X ) − E (Y )
►
X − E(Y)
►
E ( X ) − Y answr
►
► ŷ = mx + b, where m = slope
► x = ŷ + mb, where m = slope
► ŷ = x/m + b, where m = slope
► ŷ = x + mb, where m = slope
► Null hypothesis
► Alternative hypothesis
► Value of alpha
► Value of test-statistic
► $\sigma^2 = \frac{\nu}{\nu - 2}$
► $\sigma^2 = \frac{\nu^2}{\nu - 2}$
► $\sigma^2 = \frac{\nu}{\nu - 1}$
► $\sigma^2 = \frac{\nu}{\nu - 2}$
2
Zα .σ
n= 2
e
►
2
Zα . σ
n= 2
e
►
2
Zα . X
n= 2
e
►
Zα .σ
n= 2
► e
► Non-negative
► Negative
► One
► Zero
► 12 E (X)
► 4 E (X) + 5
► 16 E (X) + 5
► 16 E (X)
f(x | 1) = __________:
► f(1, 1)
► f(x, 1)
► f(x, 1) / h(1)
► f(x, 1) / h(x)
► .0401
► .5500
► .4599
► .9599
► 0.5σ
► 0.75σ
► 0.7979σ
► 0.6445σ
► Finite Set
► Infinite Set
► Universal Set
► None of these
► In all situations
► Infinite population
► Finite population
► Concrete population
► Hypothetical population
After drawing possible samples, we have calculated sampling mean $\mu_{\bar{x}} = 7$ and sampling
variance $\sigma_{\bar{x}}^2 = 5.833$. Verify the
following:
a) $\mu_{\bar{x}} = \mu$, b) $\sigma_{\bar{x}}^2 = \sigma^2 / n$
If s=15, x =14 and t=3, what is the value of n?
► 362880
► 3628800
► 362280
► 362800
►2
►0
► 0.5
►1
► Zero
► Less than 1
► Greater than 1
► Negative
► Degrees of freedom
► Sample size
► Mean
► Variance
► E(X) + E(Y)
► E(X) − E(Y)
► X − E(Y)
► E(X) − Y
► 10
► 0.135
► 1/4
► 1/2
► 3/4
► 1
► Single value
► Two values
► Range of values
► Zero
► Unbiased estimator of σ²
► Biased estimator of σ²
► Unbiased estimator of µ
► None of these
► 16 Var (X)
► 16 Var (X) + 5
► 4 Var (X) + 5
► 12 Var (X)
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy$ is equal to:
►1
►0
► -1
► ∞
► .0401
► .5500
► .4599
► .9599
► 16 outcomes
► Frequency polygon
► Ogive
► Histogram
► Frequency curve
Question No: 20 ( Marks: 1 )
In a set of 20 values, all the values are 10. What is the value of the median?
► 2
► 5
► 10
► 20
$P(X = 0) = \frac{1}{8}$, $P(X = 1) = \frac{3}{8}$, $P(X = 2) = \frac{3}{8}$ and $P(X = 3) = \frac{1}{8}$
Then find F(1)
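As a quick check, F(1) is the cumulative probability P(X ≤ 1). The sketch below uses exact fractions; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# Probability mass function from the question
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def cdf(x):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)


print(cdf(1))  # 1/8 + 3/8
```

So F(1) = 1/8 + 3/8 = 1/2.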
Question No: 22 ( Marks: 2 )
Write down the formula of mathematical expectation.
Calculate the 90% confidence interval for the real difference in the incomes of the workers
from the two factories.
$f(x) = \begin{cases} \frac{x}{2}, & \text{for } 0 \le x \le 2 \\ 0, & \text{elsewhere} \end{cases}$
Question No: 28 ( Marks: 5 )
From the given data n = 1340, x = 723, p = .54 and $H_0: P_0 = 0.5$ against $H_1: P_0 \ne 0.5$.
Carry out the significance test for the stated hypothesis.
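A possible sketch of the test, using the large-sample z statistic for a single proportion. The question does not state a significance level, so `alpha = 0.05` below is an assumption.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 1340, 723, 0.5
alpha = 0.05  # assumed; the question does not give a level of significance

p_hat = x / n
# Standard error is computed under H0, i.e. with p0 rather than p_hat
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit
print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, reject H0: {reject}")
```

Under that assumed level the statistic (about 2.9) exceeds 1.96, so the null hypothesis would be rejected.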
$T_1 = \frac{X_1 + X_2 + X_3}{3}$
$T_2 = \frac{X_1 + 2X_2 + X_3}{4}$
Which estimator should be preferred?
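One way to compare the two estimators, assuming X1, X2, X3 are independent observations with common mean µ and variance σ² (the question fragment does not state this). For a linear estimator T = Σ wᵢXᵢ, the expectation is µ·Σwᵢ and the variance is σ²·Σwᵢ², so only the weights matter. The helper name is illustrative.

```python
from fractions import Fraction


def weight_summary(weights):
    """Return (sum of weights, sum of squared weights) for T = sum(w_i * X_i)."""
    return sum(weights), sum(w * w for w in weights)


t1_weights = [Fraction(1, 3)] * 3
t2_weights = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]

t1_sum, t1_var_factor = weight_summary(t1_weights)
t2_sum, t2_var_factor = weight_summary(t2_weights)

print("T1: E = mu *", t1_sum, ", Var = sigma^2 *", t1_var_factor)
print("T2: E = mu *", t2_sum, ", Var = sigma^2 *", t2_var_factor)
```

Both weight sums equal 1, so both estimators are unbiased; T1 has the smaller variance (σ²/3 against 3σ²/8) and would be preferred under these assumptions.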
o understand statistical techniques underlying decisions that affect our lives and
well-being; and
o make informed decisions.
• Graphs - visual display of data used to present frequency distributions so that the
shape of the distribution can easily be seen.
o Bar graph - a form of graph that uses bars separated by an arbitrary amount
of space to represent how often elements within a category occur. The higher
the bar, the higher the frequency of occurrence. The underlying measurement
scale is discrete (nominal or ordinal-scale data), not continuous.
o Histogram - a form of a bar graph used with interval or ratio-scaled data.
Unlike the bar graph, bars in a histogram touch with the width of the bars
defined by the upper and lower limits of the interval. The measurement scale
is continuous, so the lower limit of any one interval is also the upper limit of the
previous interval.
o Boxplot - a graphical representation of dispersions and extreme scores.
Represented in this graphic are minimum, maximum, and quartile scores in the
form of a box with "whiskers." The box includes the range of scores falling into
the middle 50% of the distribution (Inter Quartile Range = 75th percentile - 25th
percentile) and the whiskers are lines extended to the minimum and maximum
scores in the distribution or to mathematically defined (+/-1.5*IQR) upper and
lower fences.
o Scatterplot - a form of graph that presents information from a bivariate
distribution. In a scatterplot, each subject in an experimental study is
represented by a single point in two-dimensional space. The underlying scale
of measurement for both variables is continuous (measurement data). This is
one of the most useful techniques for gaining insight into the relationship
between two variables.
• Measures of Center - Plotting data in a frequency distribution shows the general
shape of the distribution and gives a general sense of how the numbers are
bunched. Several statistics can be used to represent the "center" of the distribution.
These statistics are commonly referred to as measures of central tendency.
o Mode - The mode of a distribution is simply defined as the most frequent or
common score in the distribution. The mode is the point or value of X that
corresponds to the highest point on the distribution. If the highest frequency is
shared by more than one value, the distribution is said to be multimodal. It is
not uncommon to see distributions that are bimodal reflecting peaks in scoring
at two different points in the distribution.
o Median - The median is the score that divides the distribution into halves; half
of the scores are above the median and half are below it when the data are
arranged in numerical order. The median is also referred to as the score at the
50th percentile in the distribution. The median location of N numbers can be
found by the formula (N + 1) / 2. When N is an odd number, the formula yields
an integer that represents the position in a numerically ordered distribution
corresponding to the median location. For example, in the distribution of
numbers (3 1 5 4 9 9 8) the median location is (7 + 1) / 2 = 4. When applied to
the ordered distribution (1 3 4 5 8 9 9), the value 5 is the median; three scores
are above 5 and three are below 5. If there were only 6 values (1 3 4 5 8 9),
the median location is (6 + 1) / 2 = 3.5. In this case the median is half-way
between the 3rd and 4th scores (4 and 5), or 4.5.
o Mean - The mean is the most common measure of central tendency and the
one that can be mathematically manipulated. It is defined as the average of a
distribution and is equal to ΣX / N. Simply, the mean is computed by summing
all the scores in the distribution (ΣX) and dividing that sum by the total number
of scores (N). The mean is the balance point in a distribution such that if you
subtract each value in the distribution from the mean and sum all of these
deviation scores, the result will be zero.
• Measures of Spread - Although the average value in a distribution is informative
about how scores are centered in the distribution, the mean, median, and mode lack
context for interpreting those statistics. Measures of variability provide information
about the degree to which individual scores are clustered about or deviate from the
average value in a distribution.
o Range - The simplest measure of variability to compute and understand is the
range. The range is the difference between the highest and lowest score in a
distribution. Although it is easy to compute, it is not often used as the sole
measure of variability due to its instability. Because it is based solely on the
most extreme scores in the distribution and does not fully reflect the pattern of
variation within a distribution, the range is a very limited measure of variability.
o Interquartile Range (IQR) - Provides a measure of the spread of the middle
50% of the scores. The IQR is defined as the 75th percentile - the 25th
percentile. The interquartile range plays an important role in the graphical
method known as the boxplot. The advantage of using the IQR is that it is
easy to compute and extreme scores in the distribution have much less impact
but its strength is also a weakness in that it suffers as a measure of variability
because it discards too much data. Researchers want to study variability while
eliminating scores that are likely to be accidents. The boxplot allows for this
distinction and is an important tool for exploring data.
o Variance - The variance is a measure based on the deviations of individual
scores from the mean. As noted in the definition of the mean, however, simply
summing the deviations will result in a value of 0. To get around this problem
the variance is based on squared deviations of scores about the mean. When
the deviations are squared, the rank order and relative distance of scores in
the distribution are preserved while negative values are eliminated. Then to
control for the number of subjects in the distribution, the sum of the squared
deviations, Σ(X - X̄)², is divided by N (population) or by N - 1 (sample). The
result is the average of the squared deviations and it is called the
variance.
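The measures described above can be sketched in a few lines, reusing the small distribution (3 1 5 4 9 9 8) from the median example. This is only an illustration with Python's standard `statistics` module; variable names are arbitrary.

```python
from fractions import Fraction
from statistics import median, pvariance, variance

scores = [3, 1, 5, 4, 9, 9, 8]
ordered = sorted(scores)

# Median location (N + 1) / 2, counted from 1 in the ordered data
location = (len(ordered) + 1) / 2
print("median location:", location, "median:", median(scores))

# With an even number of scores the median falls half-way between two values
print("median of six values:", median([1, 3, 4, 5, 8, 9]))

# Deviations from the mean always sum to zero (exact arithmetic via Fraction)
mean = Fraction(sum(scores), len(scores))
deviation_sum = sum(x - mean for x in scores)
print("sum of deviations:", deviation_sum)

# Squared deviations divided by N (population) or N - 1 (sample)
print("population variance:", round(pvariance(scores), 4))
print("sample variance:", round(variance(scores), 4))
```

The sample variance is always a little larger than the population variance computed from the same numbers, because the sum of squared deviations is divided by N - 1 instead of N.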
Discrete Data
A set of data is said to be discrete if the values / observations belonging to it are distinct and
separate, i.e. they can be counted (1,2,3,....). Examples might include the number of kittens in
a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth;
gender (male, female); blood group (O, A, B, AB).
Compare continuous data.
Categorical Data
A set of data is said to be categorical if the values or observations belonging to it can be
sorted according to category. Each value is chosen from a set of non-overlapping categories.
For example, shoes in a cupboard can be sorted according to colour: the characteristic 'colour'
can have non-overlapping categories 'black', 'brown', 'red' and 'other'. People have the
characteristic of 'gender' with categories 'male' and 'female'.
Categories should be chosen carefully since a bad choice can prejudice the outcome of an
investigation. Every value should belong to one and only one category, and there should be no
doubt as to which one.
Nominal Data
A set of data is said to be nominal if the values / observations belonging to it can be assigned
a code in the form of a number where the numbers are simply labels. You can count but not
order or measure nominal data. For example, in a data set males could be coded as 0,
females as 1; marital status of an individual could be coded as Y if married, N if single.
Ordinal Data
A set of data is said to be ordinal if the values / observations belonging to it can be ranked (put
in order) or have a rating scale attached. You can count and order, but not measure, ordinal
data.
The categories for an ordinal set of data have a natural order, for example, suppose a group of
people were asked to taste varieties of biscuit and classify each biscuit on a rating scale of 1
to 5, representing strongly dislike, dislike, neutral, like, strongly like. A rating of 5 indicates
more enjoyment than a rating of 4, for example, so such data are ordinal.
However, the distinction between neighbouring points on the scale is not necessarily always
the same. For instance, the difference in enjoyment expressed by giving a rating of 2 rather
than 1 might be much less than the difference in enjoyment expressed by giving a rating of 4
rather than 3.
Interval Scale
An interval scale is a scale of measurement where the distance between any two adjacent
units of measurement (or 'intervals') is the same but the zero point is arbitrary. Scores on an
interval scale can be added and subtracted but cannot be meaningfully multiplied or divided.
For example, the time interval between the starts of years 1981 and 1982 is the same as that
between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not
begin then. Other examples of interval scales include the heights of tides, and the
measurement of longitude.
Continuous Data
A set of data is said to be continuous if the values / observations belonging to it may take on
any value within a finite or infinite interval. You can count, order and measure continuous data.
For example height, weight, temperature, the amount of sugar in an orange, the time required
to run a mile.
Compare discrete data.
Frequency Table
A frequency table is a way of summarising a set of data. It is a record of how often each value
(or set of values) of the variable in question occurs. It may be enhanced by the addition of
percentages that fall into each category.
A frequency table is used to summarise categorical, nominal, and ordinal data. It may also be
used to summarise continuous data once the data set has been divided up into sensible
groups.
When we have more than one categorical variable in our data set, a frequency table is
sometimes called a contingency table because the figures found in the rows are contingent
upon (dependent upon) those found in the columns.
Example
Suppose that in thirty shots at a target, a marksman makes the following scores:
5 2 2 3 4 4 3 2 0 3 0 3 2 1 5
1 3 1 5 5 2 4 0 0 4 5 4 4 5 5
The frequencies of the different scores can be summarised as:
Score   Frequency   Frequency (%)
0       4           13%
1       3           10%
2       5           17%
3       5           17%
4       6           20%
5       7           23%
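The table above can be reproduced with a simple tally. A minimal sketch using `collections.Counter`, with the thirty scores typed in as two strings of digits:

```python
from collections import Counter

# The thirty scores from the example, one digit per shot
shots = "522344320303215" "131552400454455"
scores = [int(ch) for ch in shots]

counts = Counter(scores)
print("Score  Frequency  Frequency (%)")
for score in sorted(counts):
    pct = 100 * counts[score] / len(scores)
    print(f"{score:<6} {counts[score]:<10} {pct:.0f}%")
```

The percentages are rounded to whole numbers, which is why they need not add up to exactly 100.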
Pie Chart
A pie chart is a way of summarising a set of categorical data. It is a circle which is divided into
segments. Each segment represents a particular category. The area of each segment is
proportional to the number of cases in that category.
Example
Suppose that, in the last year, a sportswear manufacturer has spent 6.5 million pounds on
advertising its products; 3 million has been spent on television adverts, 2 million on
sponsorship, 1 million on newspaper adverts, and a half million on posters. This spending can
be summarised using a pie chart:
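As a rough sketch of how the segments would be sized, each category's angle is its share of the total multiplied by 360 degrees. The total used here is the sum of the itemised amounts (6.5 million); the dictionary keys are just labels.

```python
# Advertising spend in millions of pounds, as itemised in the example
spending = {"television": 3.0, "sponsorship": 2.0, "newspapers": 1.0, "posters": 0.5}

total = sum(spending.values())
angles = {name: 360 * amount / total for name, amount in spending.items()}

for name, angle in angles.items():
    share = 100 * spending[name] / total
    print(f"{name:<12} {share:5.1f}%  {angle:6.1f} degrees")
```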
Bar Chart
A bar chart is a way of summarising a set of categorical data. It is often used in exploratory
data analysis to illustrate the major features of the distribution of the data in a convenient form.
It displays the data using a number of rectangles, of the same width, each of which represents
a particular category. The length (and hence area) of each rectangle is proportional to the
number of cases in the category it represents, for example, age group, religious affiliation.
Bar charts are used to summarise nominal or ordinal data.
Bar charts can be displayed horizontally or vertically and they are usually drawn with a gap
between the bars (rectangles), whereas the bars of a histogram are drawn immediately next to
each other.
Dot Plot
A dot plot is a way of summarising data, often used in exploratory data analysis to illustrate the
major features of the distribution of the data in a convenient form.
For nominal or ordinal data, a dot plot is similar to a bar chart, with the bars replaced by a
series of dots. Each dot represents a fixed number of individuals. For continuous data, the dot
plot is similar to a histogram, with the rectangles replaced by dots.
A dot plot can also help detect any unusual observations (outliers), or any gaps in the data set.
Histogram
A histogram is a way of summarising data that are measured on an interval scale (either
discrete or continuous). It is often used in exploratory data analysis to illustrate the major
features of the distribution of the data in a convenient form. It divides up the range of possible
values in a data set into classes or groups. For each group, a rectangle is constructed with a
base length equal to the range of values in that specific group, and an area proportional to the
number of observations falling into that group. This means that the rectangles might be drawn
of non-uniform height.
The histogram is only appropriate for variables whose values are numerical and measured on
an interval scale. It is generally used when dealing with large data sets (>100 observations),
when stem and leaf plots become tedious to construct. A histogram can also help detect any
unusual observations (outliers), or any gaps in the data set.
5-Number Summary
A 5-number summary is especially useful when we have so many data that it is sufficient to
present a summary of the data rather than the whole data set. It consists of 5 values: the most
extreme values in the data set (maximum and minimum values), the lower and upper quartiles,
and the median.
A 5-number summary can be represented in a diagram known as a box and whisker plot. In
cases where we have more than one data set to analyse, a 5-number summary is constructed
for each, with corresponding multiple box and whisker plots.
Outlier
An outlier is an observation in a data set which is far removed in value from the others in the
data set. It is an unusually large or an unusually small value compared to the others.
An outlier might be the result of an error in measurement, in which case it will distort the
interpretation of the data, having undue influence on many summary statistics, for example,
the mean.
If an outlier is a genuine result, it is important because it might indicate an extreme of
behaviour of the process under study. For this reason, all outliers must be examined carefully
before embarking on any formal analysis. Outliers should not routinely be removed without
further justification.
Symmetry
Symmetry is implied when data values are distributed in the same way above and below the
middle of the sample.
Symmetrical data sets:
a. are easily interpreted;
b. allow a balanced attitude to outliers, that is, those above and below the middle value
(median) can be considered by the same criteria;
c. allow comparisons of spread or dispersion with similar data sets.
Many standard statistical techniques are appropriate only for a symmetric distributional form.
For this reason, attempts are often made to transform skewed data so that they
become roughly symmetric.
Skewness
Skewness is defined as asymmetry in the distribution of the sample data values. Values on
one side of the distribution tend to be further from the 'middle' than values on the other side.
For skewed data, the usual measures of location will give different values, for example,
mode<median<mean would indicate positive (or right) skewness.
Positive (or right) skewness is more common than negative (or left) skewness.
If there is evidence of skewness in the data, we can apply transformations, for example, taking
logarithms of positive skew data.
Compare symmetry.
Transformation to Normality
If there is evidence of marked non-normality then we may be able to remedy this by applying
suitable transformations.
The more commonly used transformations which are appropriate for data which are skewed to
the right, in order of increasing strength (positive skew), are sqrt(x), log(x) and 1/x, where the x's are
the data values.
The more commonly used transformations which are appropriate for data which are skewed to
the left with increasing strength (negative skew) are squaring, cubing, and exp(x).
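To illustrate the effect on right-skewed data, the sketch below computes a simple moment-based skewness coefficient before and after the square-root and log transformations. The data values are made up purely for illustration, and `skewness` is a hypothetical helper, not a library function.

```python
from math import log, sqrt


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 2, 3, 4, 7, 20, 55]  # illustrative positively skewed data
roots = [sqrt(v) for v in raw]
logs = [log(v) for v in raw]

print(f"raw:  {skewness(raw):.2f}")
print(f"sqrt: {skewness(roots):.2f}")
print(f"log:  {skewness(logs):.2f}")
```

For this data set the square root reduces the positive skew somewhat and the logarithm reduces it further, consistent with log(x) being the stronger of the two transformations.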
Scatter Plot
A scatterplot is a useful summary of a set of bivariate data (two variables), usually drawn
before working out a linear correlation coefficient or fitting a regression line. It gives a good
visual picture of the relationship between the two variables, and aids the interpretation of the
correlation coefficient or regression model.
Each unit contributes one point to the scatterplot, on which points are plotted but not joined.
The resulting pattern indicates the type and strength of the relationship between the two
variables.
Illustrations
a. The more the points tend to cluster around a straight line, the stronger the linear
relationship between the two variables (the higher the correlation).
b. If the line around which the points tend to cluster runs from lower left to upper right,
the relationship between the two variables is positive (direct).
c. If the line around which the points tend to cluster runs from upper left to lower right,
the relationship between the two variables is negative (inverse).
d. If there exists a random scatter of points, there is no relationship between the two
variables (very low or zero correlation).
e. Very low or zero correlation could result from a non-linear relationship between the
variables. If the relationship is in fact non-linear (points clustering around a curve, not a
straight line), the correlation coefficient will not be a good measure of the strength.
A scatterplot will also show up a non-linear relationship between the two variables and
whether or not there exist any outliers in the data.
More information can be added to a two-dimensional scatterplot - for example, we might label
points with a code to indicate the level of a third variable.
If we are dealing with many variables in a data set, a way of presenting all possible scatter
plots of two variables at a time is in a scatterplot matrix.
Sample Mean
The sample mean is an estimator available for estimating the population mean µ. It is a
measure of location, commonly called the average, often symbolised $\bar{x}$.
Its value depends equally on all of the data which may include outliers. It may not appear
representative of the central region for skewed data sets.
It is especially useful as being representative of the whole sample for use in subsequent
calculations.
Example
Let's say our data set is: 5 3 54 93 83 22 17 19.
The sample mean is calculated by taking the sum of all the data values and dividing by the
total number of data values:
$\bar{x} = (5 + 3 + 54 + 93 + 83 + 22 + 17 + 19) / 8 = 296 / 8 = 37$
Median
The median is the value halfway through the ordered data set, below and above which there
lies an equal number of data values.
It is generally a good descriptive measure of the location which works well for skewed data, or
data with outliers.
The median is the 0.5 quantile.
Example
With an odd number of data values, for example 21, we have:
Data 96 48 27 72 39 70 7 68 99 36 95 4 6 13 34 74 65 42 28 54 69
Ordered Data 4 6 7 13 27 28 34 36 39 42 48 54 65 68 69 70 72 74 95 96 99
Median 48, leaving ten values below and ten values above
Mode
The mode is the most frequently occurring value in a set of discrete data. There can be more
than one mode if two or more values are equally common.
Example
Suppose the results of an end of term Statistics exam were distributed as follows:
Student Score
1 94
2 81
3 56
4 90
5 70
6 65
7 90
8 90
9 30
Then the mode (most common score) is 90, and the median (middle score) is 81.
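The three measures of location can be checked against the examples in this glossary with Python's standard `statistics` module. A brief sketch, with the data copied from the sample mean, median and mode examples:

```python
from statistics import mean, median, mode

# Sample mean example
data = [5, 3, 54, 93, 83, 22, 17, 19]
print("mean:", mean(data))

# Median example with 21 data values
values = [96, 48, 27, 72, 39, 70, 7, 68, 99, 36, 95,
          4, 6, 13, 34, 74, 65, 42, 28, 54, 69]
print("median:", median(values))

# Exam scores from the mode example
scores = [94, 81, 56, 90, 70, 65, 90, 90, 30]
print("mode:", mode(scores), "median:", median(scores))
```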
Dispersion
The data values in a sample are not all the same. This variation between values is called
dispersion.
When the dispersion is large, the values are widely scattered; when it is small they are tightly
clustered. The width of diagrams such as dot plots, box plots, stem and leaf plots is greater for
samples with more dispersion and vice versa.
There are several measures of dispersion, the most common being the standard deviation.
These measures indicate to what degree the individual observations of a data set are
dispersed or 'spread out' around their mean.
In manufacturing or measurement, high precision is associated with low dispersion.
Range
The range of a sample (or a data set) is a measure of the spread or the dispersion of the
observations. It is the difference between the largest and the smallest observed value of some
quantitative characteristic and is very easy to calculate.
A great deal of information is ignored when computing the range since only the largest and the
smallest data values are considered; the remaining data are ignored.
The range value of a data set is greatly influenced by the presence of just one unusually large
or small value in the sample (outlier).
Examples
1. The range of 65,73,89,56,73,52,47 is 89-47 = 42.
2. If the highest score in a 1st year statistics exam was 98 and the lowest 48, then the
range would be 98-48 = 50.
Quantile
Quantiles are a set of 'cut points' that divide a sample of data into groups containing (as far as
possible) equal numbers of observations.
Examples of quantiles include quartile, quintile, percentile.
Percentile
Percentiles are values that divide a sample of data into one hundred groups containing (as far
as possible) equal numbers of observations. For example, 30% of the data values lie below
the 30th percentile.
See quantile.
Compare quintile, quartile.
Quartile
Quartiles are values that divide a sample of data into four groups containing (as far as
possible) equal numbers of observations.
A data set has three quartiles. References to quartiles often relate to just the outer two, the
upper and the lower quartiles; the second quartile being equal to the median. The lower
quartile is the data value a quarter way up through the ordered data set; the upper quartile is
the data value a quarter way down through the ordered data set.
Example
Data 6 47 49 15 43 41 7 39 43 41 36
Ordered Data 6 7 15 36 39 41 41 43 43 47 49
Median 41
Upper quartile 43
Lower quartile 15
See quantile.
Compare percentile, quintile.
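The quartile example can be reproduced with `statistics.quantiles` (Python 3.8 or later). Several conventions exist for locating quartiles; the default "exclusive" method used below happens to agree with the values in the example, which is an observation about this data set rather than a claim about the method the glossary intended. Combining the quartiles with the minimum and maximum gives a 5-number summary.

```python
from statistics import median, quantiles

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]

# n=4 asks for the three cut points that split the data into four groups
lower_q, middle, upper_q = quantiles(data, n=4)
print("lower quartile:", lower_q)
print("median:", middle)
print("upper quartile:", upper_q)

five_number_summary = (min(data), lower_q, median(data), upper_q, max(data))
print("5-number summary:", five_number_summary)
```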
Quintile
Quintiles are values that divide a sample of data into five groups containing (as far as
possible) equal numbers of observations.
See quantile.
Compare quartile, percentile.
Sample Variance
Sample variance is a measure of the spread of or dispersion within a set of sample data.
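The sample variance is the sum of the squared deviations from their average divided by one
less than the number of observations in the data set. For example, for n observations x1, x2, x3,
... , xn with sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sample variance is
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$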
Standard Deviation
Standard deviation is a measure of the spread or dispersion of a set of data.
It is calculated by taking the square root of the variance and is symbolised by s.d, or s. In other
words, $s = \sqrt{s^2}$.
The more widely the values are spread out, the larger the standard deviation. For example,
say we have two separate lists of exam results from a class of 30 students; one ranges from
31% to 98%, the other from 82% to 93%, then the standard deviation would be larger for the
results of the first exam.
Coefficient of Variation
The coefficient of variation measures the spread of a set of data as a proportion of its mean. It
is often expressed as a percentage.
It is the ratio of the sample standard deviation to the sample mean: $CV = s / \bar{x}$
There is an equivalent definition for the coefficient of variation of a population, which is based
on the expected value and the standard deviation of a random variable.
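As an illustration, the sample variance, standard deviation and coefficient of variation can be computed for the data set used in the sample mean example (5 3 54 93 83 22 17 19). This is a sketch with Python's `statistics` module; `variance` and `stdev` use the n - 1 divisor.

```python
from statistics import mean, stdev, variance

data = [5, 3, 54, 93, 83, 22, 17, 19]

s2 = variance(data)   # sum of squared deviations divided by n - 1
s = stdev(data)       # square root of the sample variance
cv = s / mean(data)   # spread as a proportion of the mean

print(f"variance = {s2:.2f}")
print(f"standard deviation = {s:.2f}")
print(f"coefficient of variation = {cv:.1%}")
```

Here the standard deviation is almost as large as the mean itself, so the coefficient of variation is close to 100%.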
FINALTERM EXAMINATION
Fall 2009
STA301- Statistics and Probability (Session - 4)
Q No. 9 10 11 12 13 14 15 16
Marks
Q No. 17 18 19 20 21 22 23 24
Marks
Q No. 25 26 27 28 29 30 31
Marks
► Zero
► Less than 1
► Greater than 1
► Negative
► $\frac{v_1}{v_1 - 2}$ for $v_1 > 2$
► $\frac{v_2}{v_2 - 2}$ for $v_2 > 2$
► $\frac{v_1}{v_1 - 2}$ for $v_1 \ge 2$
► $\frac{v_2}{v_2 - 2}$ for $v_1 \le 2$
► E(X) + E(Y)
► E(X) − E(Y)
► X − E(Y)
► E(X) − Y
►0
►1
► 99
► 100
► Population distribution
► Frequency distribution
► Sampling distribution
► Sample distribution
► E (T) = θ
► E (T) =T
► E (T) =0
► E (T) =1
X 1 2 3
► 0.6
► 0.8
► 0.2
► 0.4
► Non-negative
► Negative
► One
► Zero
► Variance
► Mean
► Standard deviation
► Covariance
f(x | 1) = __________:
► f(1, 1)
► f(x, 1)
► f(x, 1) / h(1)
► f(x, 1) / h(x)
► .0401
► .5500
► .4599
► .9599
► Destructive tests
► Heterogeneous
► To make voters list
► None of these
Ans:
FINALTERM EXAMINATION
Spring 2010
STA301- Statistics and Probability (Session - 4)
► Rejected
► Accepted
► No conclusion
► Acknowledged
► Variances
► Means
► Proportions
► Groups
► Helmert
► Pearson
► R.A Fisher
► Francis
► 0,1,2,3
► 1,3,3,1
► 1, 2, 3
► 3, 2
► E(XX)
► E(X).E(Y)
► X.E(Y)
► Y.E(X)
►x&n
►x&p
►n&p
► x, n & p
► E (T) = θ
If the expectation of a statistic is equal to the parameter being estimated, the statistic is
called unbiased; otherwise it is biased.
► E (T) =T
► E (T) =0
► E (T) =1
► Sample mean
► Sample median
► Sample proportion
► Sample variance
► Unbiased estimator of σ²
► Biased estimator of σ²
► Unbiased estimator of µ
► None of these
1
0
c
-c
►0
►1
► c (the expectation of a constant is always that constant)
► -c
∫ f ( x, y ) dx
► −∞
∫ f ( x, y ) dy
► −∞
► $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy$
► $\int_{a}^{b}\int_{c}^{d} f(x, y)\,dy\,dx$
► Comparing F distributions
► Comparing three or more means
► Measuring sampling error
► Comparing variances
► In all situations
► Descriptive Statistics
► Advance Statistics
► Inferential Statistics
► Sampled Statistics
► 0.5σ
► 0.75σ
► 0.7979σ
► 0.6745σ
►1
►2
►3
►0
Question No: 23 ( Marks: 1 ) - Please choose one
If you connect the mid-points of rectangles in a histogram by a series of lines that also touch the
x-axis from both ends, what will you get?
► Ogive
► Frequency polygon
► Frequency curve
► Historigram
►1
►2
►3
►4
►5
►8
► 9 n-1
► 10
► Type I error
► Type II error
► Correct decision
Three N n k
H 0 ≤ 16000
H1 > 16000
3.37
Find $\sigma^2_{\hat{p}_1 - \hat{p}_2}$, where n = 10
$\sigma^2_{\hat{p}_1 - \hat{p}_2} = p_1 q_1 / n_1 + p_2 q_2 / n_2$
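$\hat{p}$    $f(\hat{p})$    $\hat{p} f(\hat{p})$    $\hat{p}^2 f(\hat{p})$
0      1/20
1/3    9/20
2/3    9/20
1      1/20
∑      1
Mean = $\mu_{\hat{p}} = \sum \hat{p} f(\hat{p})$
Variance = $\sigma_{\hat{p}}^2 = \sum \hat{p}^2 f(\hat{p}) - \left(\sum \hat{p} f(\hat{p})\right)^2$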
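The two remaining columns of the table and the resulting mean and variance can be filled in with exact fractions. A minimal sketch, assuming the four rows above are the complete sampling distribution of the sample proportion:

```python
from fractions import Fraction as F

# Sampling distribution of the sample proportion: value -> probability
dist = {F(0): F(1, 20), F(1, 3): F(9, 20), F(2, 3): F(9, 20), F(1): F(1, 20)}

total_prob = sum(dist.values())
mean_p = sum(p * f for p, f in dist.items())          # sum of p * f(p)
second_moment = sum(p * p * f for p, f in dist.items())  # sum of p^2 * f(p)
var_p = second_moment - mean_p ** 2

print("total probability:", total_prob)
print("mean:", mean_p)
print("variance:", var_p)
```

The probabilities sum to 1, the mean is 1/2 and the variance is 3/10 - 1/4 = 1/20.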