
Math Midterm


LESSON #8 Lesson Title: Describing Data Presentation and Constructing Frequency Distribution Table

STEPS IN CONSTRUCTING FREQUENCY DISTRIBUTION TABLE

Step (0): Arrange the data in order first; this makes counting easier in the next few steps.
Step (1): Find the range (R) of the data: R = highest value − lowest value.
Step (2): Decide on the approximate number of classes (K).
Step (3): Determine the approximate class interval size (C). The size of the class interval is obtained
by dividing the range of the data by the number of classes: C = R/K.
Step (4): 1st Column of the FDT. Create the class intervals (c.i.), setting the lowest value as the starting
point. Class intervals consist of end numbers called the lower and upper limits.
Step (5): 2nd Column of the FDT. Tally the raw data to identify the frequencies.
Step (6): 3rd Column of the FDT.
Class frequency (f) is the number of data values that belong to each class interval.
Distribute the data into their respective classes. The total of the frequency column must equal the
number of observations, or sample size (n).
Step (7): 4th and 5th Columns of the FDT.
Determine the LCB and UCB. *see the values on the table
Class boundaries are the numbers used to separate classes without the gaps created by class limits.
Lower Class Boundary (LCB) – the middle value between the lower class limit and the upper class
limit of the preceding class.
Upper Class Boundary (UCB) – the middle value between the upper class limit and the lower class
limit of the next class.
Step (8): 6th Column of the FDT.
Determine the class mark. This is the average, or midpoint, of the upper and lower limits.
The class mark (CM) is found by adding the lower and upper limits and dividing by 2.
Do not round off the CM.
Step (9): 7th and 8th Columns of the FDT.
To cumulate the frequencies, add up the frequencies of the class intervals.
Cumulative frequency “less than” (<CF) – obtained by adding the frequencies successively from the
lowest to the highest class interval.
Cumulative frequency “greater than” (>CF) – obtained by adding the frequencies successively from the
highest to the lowest class interval.
Step (10): 9th Column of the FDT.
Determine the relative frequency of each class interval by dividing the frequency of the interval by
the total number of observations.
The relative frequency percentage (RF%) is the frequency divided by the total frequency and multiplied
by 100 to express it in percentage form.
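The steps above can be sketched in Python as a single loop that builds every column of the FDT. The sample scores and the choice of K = 5 classes are illustrative assumptions, not data from this lesson:

```python
import math

# Hypothetical raw scores (n = 15), for illustration only
data = [12, 15, 18, 21, 23, 27, 30, 32, 35, 38, 41, 44, 45, 47, 48]

k = 5                             # Step 2: chosen number of classes (K)
r = max(data) - min(data)         # Step 1: range R = 48 - 12 = 36
c = math.ceil(r / k)              # Step 3: class size C = R/K, rounded up -> 8

rows, lower, cum_less = [], min(data), 0
for _ in range(k):
    upper = lower + c - 1                        # Step 4: class limits
    f = sum(lower <= x <= upper for x in data)   # Steps 5-6: tally / frequency
    lcb, ucb = lower - 0.5, upper + 0.5          # Step 7: class boundaries
    cm = (lower + upper) / 2                     # Step 8: class mark
    cum_less += f                                # Step 9: <CF
    rf = f / len(data) * 100                     # Step 10: RF%
    rows.append((lower, upper, f, lcb, ucb, cm, cum_less, rf))
    lower = upper + 1

total_f = sum(row[2] for row in rows)  # Step 6 check: must equal n
```

Running this sketch gives class intervals 12–19 up to 44–51, with the frequency column summing to n = 15 and the final <CF entry equal to n, exactly as Steps (6) and (9) require.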

FAQs
1) What is tabular data? Tabular data is data that is structured into rows, each of which contains
information about some entity. Comma- or tab-delimited files, fixed-field formats, spreadsheets, HTML
tables, and SQL dumps are all examples of tabular data files.
2) What does a Data Interpretation table represent? Data Interpretation questions based on tables are
common in competitive exams. The cells of the table contain different types of information, such as the
marks of a student, the income of a company, the production of a firm, expenditure on different items, and
so on.
3) How do you interpret data? Collect your data and make it as clean as possible. Choose the type of analysis
to perform, qualitative or quantitative, and apply the appropriate methods to each.

LESSON #9 Lesson Title: Interpreting Graphical Presentation


Graphs are a common method of visually illustrating relationships in data. The purpose of a graph is to present
data that are too numerous or complicated to describe adequately in the text, and to do so in less space. A
graph shows trends and relationships among the variables under study. Graphs enable us to study the cause
and effect relationship between two or more variables, and they help to measure the extent of change in one
variable when another variable changes by a certain amount.

Graphical presentation is a way of analyzing numerical data. It exhibits the relation between data, ideas,
information, and concepts in a diagram. It is easy to understand and is one of the most important learning
strategies. There are several types of graphical representation; some of them are as follows:

1) Bar Graph ⇒ Uses solid bars to represent quantities
⇒ Effective for discrete variables
⇒ Multiple bar graph for comparing figures of two or more categories
2) Line Graph ⇒ Shows trends over a period of time
⇒ Useful when the information is connected in some way
⇒ Effective for continuous variables
⇒ Multiple line graph for comparing figures of two or more categories
3) Pie Chart ⇒ Circle divided into slices to illustrate numerical proportion
⇒ Each part represents a percentage of the total.
⇒ A good way to show relative sizes
4) Pictograph ⇒ A way of showing data using images
⇒ A fun and interesting way to show data, but it is not very accurate
⇒ Can be vertical or horizontal

General Rules for Graphic Presentation of Data and Information

1) Suitable Title – Should clearly indicate the subject for which you are presenting it.

2) Unit of Measurement – Clearly state the unit of measurement.

3) Suitable Scale – To represent the entire data in an accurate manner.


4) Index – Explains the different colors, shades, lines, and designs used, and includes a scale of interpretation

5) Data Sources – Wherever possible, include the sources of information at the bottom of the graph.

6) Simple and Neat – Easy to understand and attractive to the eyes

As every graph tells a story, the graph creator has to be a good storyteller and needs basic knowledge of
creating and interpreting graphs. The person trying to understand the story also needs some basic
knowledge about graphs; otherwise, reading a graph is like reading text in a foreign language.

FAQs

1) What are the limitations of a Graph?


❖ A graph lacks complete accuracy of facts.
❖ It depicts only a few selected characteristics of the data.
❖ We cannot use a graph in support of a statement.
❖ A graph is not a substitute for tables.
❖ Usually, laymen find it difficult to understand and interpret a graph.
❖ Typically, a graph shows only the approximate tendency of the data, so the actual values are not
clear.
2) What are the general rules for the graphic presentation of data and information? The general rules
for the graphic presentation of data are:
❖ Use a suitable title
❖ Clearly specify the unit of measurement
❖ Ensure that you choose a suitable scale
❖ Provide an index specifying the colors, lines, and designs used in the graph
❖ If possible, provide the sources of information at the bottom of the graph
❖ Keep the graph simple and neat.
3) What are the other statistical graphs?
❖ Histograms – The graph that uses bars to represent the frequency of numerical data that are
organized into intervals. Since all the intervals are equal and continuous, all the bars have the same
width.
❖ Stem and Leaf Plot – In a stem and leaf plot, the data are organized from the least value to the greatest
value. The digits of the least place value form the leaves, and the next place value digits form the
stems.
❖ Box and Whisker Plot – This plot summarizes the data by dividing it into four parts. A box and
whisker plot shows the range (spread) and the middle (median) of the data.

LESSON 10 Lesson Title: Computing Measures of Central Tendency

In statistics, the three most common measures of central tendency are the mean, median and mode. Each of
these measures calculates the location of the central point using a different method.
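As a minimal sketch, the three measures can be computed with Python's standard statistics module; the scores below are a hypothetical example:

```python
import statistics

scores = [70, 75, 80, 80, 85, 90, 95]  # hypothetical exam scores

mean = statistics.mean(scores)      # sum of the values divided by n
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value
```

Here the median and mode coincide at 80 while the mean is about 82.14; with skewed data or outliers the three measures can differ substantially, which is why the FAQs below stress choosing the measure that fits the data.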
FAQs

1) What is the best measure of central tendency? There can often be a "best" measure of central tendency
with regards to the data you are analyzing, but there is no one "best" measure of central tendency. This is
because whether you use the median, mean or mode will depend on the type of data you have, such as
nominal or continuous data; whether your data has outliers and/or is skewed; and what you are trying to
show from your data.

2) Does all data have a median, mode and mean? Yes and no. All continuous data has a median, mode and
mean. However, strictly speaking, ordinal data has a median and mode only, and nominal data has only a
mode.

3) When is the mean the best measure of central tendency? The mean is usually the best measure of central
tendency to use when your data distribution is continuous and symmetrical, such as when your data is
normally distributed. However, it all depends on what you are trying to show from your data.

4) When is the mode the best measure of central tendency? The mode is the least used of the measures of
central tendency and can only be used when dealing with nominal data. For this reason, the mode will be the
best measure of central tendency (as it is the only one appropriate to use) when dealing with nominal data.
The mean and/or median are usually preferred when dealing with all other types of data, but this does not
mean it is never used with these data types.

5) When is the median the best measure of central tendency? The median is usually preferred to other
measures of central tendency when your data set is skewed or you are dealing with ordinal data. However,
the mode can also be appropriate in these situations, but is not as commonly used as the median.

LESSON #11 Lesson Title: Computing Measures of Dispersion


The measure of dispersion shows how scattered the data are. It tells the variation of the data from
one another and gives a clear idea of the distribution of the data. The measure of dispersion shows the
homogeneity or heterogeneity of the distribution of the observations. You need to understand how the
degree to which data values are spread out in a distribution can be assessed using simple measures that
best represent the variability in the data. These measures also occur very frequently in the medical and
social sciences research literature. In statistics, variability, dispersion, and spread are synonyms that denote
the width of the distribution.
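A short sketch, with hypothetical data, of the most common dispersion measures using Python's statistics module (population formulas are shown; `variance`/`stdev` give the sample versions):

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]  # hypothetical observations

data_range = max(data) - min(data)  # simplest measure of spread
var = statistics.pvariance(data)    # population variance: mean squared distance from the mean
std = statistics.pstdev(data)       # standard deviation: square root of the variance
```

Here the mean is 5.2, the variance 5.76, and the standard deviation 2.4; the standard deviation is in the same unit as the mean, which is why it is reported more often than the variance (see FAQ 1 below).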

FAQs

1) Why do we use standard deviation instead of variance? Standard Deviation which is the square root of
variance is a measure of dispersion like variance. But it is used more often than variance because the unit in
which it is measured is the same as that of mean, a measure of central tendency.

2) What is the purpose of measuring variability? In general, a good measure of variability describes the
distribution: specifically, it tells whether the scores are clustered close together or are spread out over a
large distance.

3) What tools are used to assess variability? Statisticians use summary measures to describe the
amount of variability or spread in a set of data. The most common measures of variability are the range, the
interquartile range (IQR), the variance, and the standard deviation.

4) How do you know if variance is high or low? All non-zero variances are positive. A small variance indicates
that the data points tend to be very close to the mean, and to each other. A high variance indicates that the
data points are very spread out from the mean, and from one another. Variance is the average of the
squared distances from each point to the mean.

LESSON #12 Lesson Title: Computing Probabilities Under Standard Normal Curve

The normal distribution is a probability function that describes how the values of a variable are distributed. It
is a symmetric distribution in which most of the observations cluster around the central peak and the
probabilities for values further away from the mean taper off equally in both directions. For practical
purposes, the normal distribution is good enough to represent the distribution of continuous variables such
as height, weight, blood pressure, food intake, and many more. In this module you will learn to compute
probabilities and percentage areas using the standard normal table, shade areas under the normal curve, and
solve problems that involve probabilities under the normal curve.
Standard Normal Table (z-score Table)

● To calculate the area under the normal curve, we use the standard normal or z-score table

● In a z-score table, found in the last pages of this module, the leftmost column tells you how many
standard deviations from the mean to 1 decimal place, the top row gives the second decimal place, and the
intersection of a row and column gives the probability.

Steps in Calculating Probabilities under Normal Curve

1. Translate the problem into one of the following: P (z < a), P (z > a) or P (a < z < b) .

2. Draw a picture of the normal curve. Shade in the area of the given probability P on the picture. This will
help you visualize the problem.

3. Standardize a (and/or b) to a z-score using the z-formula if it is not yet standardized.

4. Look up the z-score on the Z-table and find its corresponding probability.
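The four steps can be sketched with Python's statistics.NormalDist, which plays the role of the z-table; the raw score, mean, and standard deviation in the last lines are hypothetical:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Step 1 forms, Step 4 lookups (cdf gives the area to the left of a):
p_less = z.cdf(1.25)              # P(z < 1.25)
p_greater = 1 - z.cdf(1.25)       # P(z > 1.25)
p_between = z.cdf(1) - z.cdf(-1)  # P(-1 < z < 1)

# Step 3: standardize a raw score x with the z-formula z = (x - mean) / sd
x, mean, sd = 85, 70, 10          # hypothetical values
z_score = (x - mean) / sd         # -> 1.5
```

cdf(1.25) returns about 0.8944, matching the z-table entry at row 1.2, column 0.05.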

FAQs

1) What does it mean to be normally distributed? A normal distribution of data is one in which the majority
of data points are relatively similar, meaning they occur within a small range of values with fewer outliers on
the high and low ends of the data range.

2) Why is it important to assume a normal distribution? The normal distribution is important because of the
Central limit theorem. In simple terms, if you have many independent variables that may be generated by all
kinds of distributions, assuming that nothing too crazy happens, the aggregate of those variables will tend
toward a normal distribution.

3) Can a normal distribution be skewed? No. The normal distribution is a symmetric distribution with no
skew; its tails are exactly the same. Skewed distributions are different: left-skewed distributions, for
example, are also called negatively skewed because there is a long tail in the negative direction on the
number line.

LESSON #13 Lesson Title: Computing Linear Relationship using Pearson Correlation Coefficient

Correlation coefficients are used in statistics to measure how strong a relationship is between two
variables (measured on at least an interval scale). There are several types of correlation coefficient.
Pearson’s correlation (also called Pearson-r) is a correlation coefficient commonly used for linear
relationships. For example, you could use a Pearson-r correlation to understand whether there is an
association between exam performance and time spent studying in a coffee shop. Pearson’s correlation
attempts to draw a line of best fit through the data of two variables, and the correlation coefficient (r)
indicates how far away all these data points are from this line of best fit.
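Pearson's r can be sketched directly from its definition, the sum of cross-deviations divided by the product of each variable's root sum of squared deviations; the study-hours and exam-score pairs below are hypothetical:

```python
import math

hours = [1, 2, 3, 4, 5]       # hypothetical time spent studying
score = [52, 58, 63, 71, 78]  # hypothetical exam performance
n = len(hours)

mean_x = sum(hours) / n
mean_y = sum(score) / n

# r = sum of cross-deviations / (root sum of squares of each variable)
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))
ss_x = math.sqrt(sum((x - mean_x) ** 2 for x in hours))
ss_y = math.sqrt(sum((y - mean_y) ** 2 for y in score))
r = cross / (ss_x * ss_y)  # close to +1: strong positive linear relationship
```

For these data r is about 0.997, so the points lie almost exactly on the line of best fit; Python 3.10+ also offers statistics.correlation for the same computation.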
FAQs
1) Is the Pearson correlation affected by outliers? Pearson's correlation coefficient, r, is very sensitive to
outliers, which can have a very large effect on the line of best fit and on the coefficient itself. This means
that including outliers in your analysis can lead to misleading results.
2) What are the 5 types of correlation? Types of Correlation:
✔ Positive, Negative or Zero Correlation
✔ Linear or Curvilinear Correlation
✔ Scatter Diagram Method
✔ Pearson's Product Moment Coefficient of Correlation
✔ Spearman's Rank Correlation Coefficient
3) Can correlation be expressed as a percentage? The squared correlation gives a measure of the amount of
variation that can be explained by the model (the correlation is the model). It is sometimes expressed as a
percentage (e.g., 36% instead of 0.36) when we discuss the proportion of variance explained by the
correlation, but you should write it as a proportion (e.g., r² = 0.36).

LESSON #14 Lesson Title: Predicting Behavior using Regression Analysis

Regression analysis constitutes an important part of a statistical analysis to explore and model the
relationship between variables. Regression analysis helps in predicting the value of a dependent variable
based on the values of the independent variables. It produces a regression equation whose
coefficients represent the relationship between each independent variable and the dependent variable.
In this module, students will compute regression values, determine the regression equation, draw the
regression line, and perform regression analysis. By performing a regression analysis on a data set, we can
determine whether or not these variables have an impact on each other, and if so, to what extent.
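A minimal sketch of simple linear regression using the least-squares formulas; the (x, y) pairs are hypothetical:

```python
# Fit y = a + b*x by least squares; data are hypothetical
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# slope b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x  # intercept: the fitted line passes through (x̄, ȳ)

def predict(xi):
    """Predict the dependent variable from a new independent value."""
    return a + b * xi
```

For these points the regression equation is y = 2.2 + 0.6x, so the predicted value at x = 6 is 5.8.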

FAQs
1) Why is it called regression? The term "regression" was coined by Francis Galton in the 19th century to
describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors
tend to regress down towards a normal average (a phenomenon also known as regression toward the
mean).
2) Which regression model is best? A low predicted R-squared is a good way to check for this problem.
P-values, predicted and adjusted R-squared, and Mallows' Cp can suggest different models. Stepwise
regression and best subsets regression are great tools and can get you close to the correct model.
3) What are the types of regression? Types of Regression are Linear, Logistic, Polynomial, Stepwise, Ridge,
Lasso, and ElasticNet
