Basic Statistical Tools
Presented by Kh. Farjana Urmi, ACI HealthCare Limited
INTRODUCTION
This discussion session provides information regarding:
Acceptable practices for the analysis and consistent interpretation of data obtained from chemical and other analyses
Basic statistical approaches for evaluating the quality of the data
The treatment of outliers and the comparison of analytical methods
Session 1: Prerequisite laboratory practices and principles
Sound Record Keeping
Sampling considerations
Basic Tools of Quality
Ishikawa diagram
Control chart
Pareto chart
Scatter plot
Session 1: Prerequisite laboratory practices and principles
Significant figures: Addition & Subtraction
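The slide names the topic without stating the rule. The usual convention is that a sum or difference is reported to the same number of decimal places as the least precise term. The sketch below illustrates that convention; the function name `add_with_sig_rule` and the choice of half-up rounding are assumptions made for illustration, not something taken from the slides.

```python
from decimal import Decimal, ROUND_HALF_UP

def add_with_sig_rule(*terms):
    """Sum decimal strings, then round to the fewest decimal places among the terms."""
    values = [Decimal(t) for t in terms]
    # The exponent is minus the number of decimal places,
    # so the largest exponent belongs to the least precise term.
    least_precise = max(v.as_tuple().exponent for v in values)
    quantum = Decimal(1).scaleb(least_precise)
    # Half-up rounding is an assumption; a laboratory SOP may specify another rule.
    return sum(values).quantize(quantum, rounding=ROUND_HALF_UP)

print(add_with_sig_rule("12.11", "18.0", "1.013"))  # 31.1
```

The values are passed as strings so that trailing zeros such as the one in "18.0" are preserved; a float would lose that information.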
Sampling method: Description
Nonrandom/convenience sampling: Risks the possibility that the estimates will be biased.
Simple random sampling
Systematic random sampling
Stratified random sampling
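The three random sampling schemes listed above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the function names, the `seed` parameter, and the example container numbers are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Every unit has the same chance of selection; drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_random_sample(population, k, seed=None):
    """Pick a random start, then take every step-th unit after it."""
    rng = random.Random(seed)
    step = len(population) // k          # sampling interval
    start = rng.randrange(step)          # random start within the first interval
    return [population[start + i * step] for i in range(k)]

def stratified_random_sample(strata, k_per_stratum, seed=None):
    """Draw a simple random sample separately within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, k_per_stratum) for name, units in strata.items()}

# Hypothetical example: 100 numbered containers from one lot.
containers = list(range(1, 101))
print(simple_random_sample(containers, 10, seed=1))
print(systematic_random_sample(containers, 10, seed=1))
print(stratified_random_sample(
    {"top": containers[:50], "bottom": containers[50:]}, 5, seed=1))
```

Fixing the seed makes a selection reproducible, which can help when the sampling plan has to be documented.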
ACI HealthCare Limited, Kh. Farjana Urmi
Method Validation
All methods are appropriately validated as specified under Validation of Compendial Methods <1225>.
[Figure: accuracy versus precision. Panels: Accurate & Precise; Precise but not Accurate; Accurate but not Precise]
Student    Trial 1    Trial 2    Trial 3    Trial 4    Trial 5
A          14.8       14.7       14.8       14.7       14.8
B          14.7       14.2       14.6       14.6       14.8
C          14.4       14.4       14.5       14.4       14.5
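The three students' results can be summarized by their mean and sample standard deviation. The sketch below uses only the values from the slide; the spread (standard deviation) reflects precision, while accuracy can only be judged against an accepted reference value, which the slides do not state. The variable and function names are illustrative.

```python
from statistics import mean, stdev

# Trial results for each student, as listed on the slide.
results = {
    "A": [14.8, 14.7, 14.8, 14.7, 14.8],
    "B": [14.7, 14.2, 14.6, 14.6, 14.8],
    "C": [14.4, 14.4, 14.5, 14.4, 14.5],
}

def summarize(values):
    """Return (mean, sample standard deviation) for one set of trials."""
    return mean(values), stdev(values)

summary = {student: summarize(values) for student, values in results.items()}
for student, (m, s) in summary.items():
    print(f"Student {student}: mean = {m:.2f}, s = {s:.3f}")
```

Students A and C show the same small spread but different means, and Student B shows a noticeably larger spread; deciding which mean is closest to the truth requires the reference value.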
OUTLYING RESULTS
Occasionally, observed analytical results are very different from those expected. Aberrant, anomalous, contaminated, discordant, spurious, suspicious, or wild observations, as well as flyers, rogues, and mavericks, are properly called outlying results. Like all laboratory results, these outliers must be documented, interpreted, and managed. Such results may be accurate measurements of the entity being measured, yet very different from what is expected.
In statistics, outliers are relatively small or large values that are considered to be different from, and not to belong to, the main body of data. What to do with outliers is a constant dilemma facing research scientists. If the cause of an outlier is known, for example an obvious error, the value can be omitted from the analysis and tabulation of the data.
Factors to be considered when investigating an outlying result include, but are not limited to, human error, instrumentation error, calculation error, and product or component deficiency. If an assignable cause that is not related to a product or component deficiency can be identified, then retesting may be performed on the same sample, if possible, or on a new sample.
The precision and accuracy of the method, the Reference Standard, process trends, and the specification limits should all be examined. Data may be invalidated on the basis of this documented investigation and eliminated from subsequent calculations.
Outlier identification is the use of statistical significance tests to confirm that the values are inconsistent with the known or assumed statistical model.
When used appropriately, outlier tests are valuable tools for pharmaceutical laboratories. Several tests exist for detecting outliers. Three of these procedures are the Extreme Studentized Deviate (ESD) Test, Dixon's Test, and Hampel's Rule.
Outlier rejection is the actual removal of the identified outlier from the data set.
However, an outlier test cannot be the sole means for removing an outlying result
from the laboratory data.
All data, especially outliers, should be kept for future review. Unusual data, when
seen in the context of other historical data, are often not unusual after all but reflect
the influences of additional sources of variation.
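Hampel's Rule is named above without a description. As it is commonly described, each value's absolute deviation from the median is divided by a scaled median absolute deviation (MAD), and values whose normalized deviation exceeds a cutoff are flagged. The sketch below applies that description to the ten measurements used in this deck's worked outlier example. The scale factor 1.483 and the cutoff 3.5 are commonly quoted values and are treated here as assumptions; the function name is hypothetical. A flagged value is a candidate for investigation, not for automatic removal.

```python
from statistics import median

def hampel_outliers(values, cutoff=3.5, scale=1.483):
    """Flag values whose normalized deviation from the median exceeds the cutoff."""
    med = median(values)
    abs_dev = [abs(x - med) for x in values]
    mad = median(abs_dev)                      # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the rule cannot be applied")
    # scale * mad estimates the standard deviation for normally distributed data
    return [x for x, d in zip(values, abs_dev) if d / (scale * mad) > cutoff]

data = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]
print(hampel_outliers(data))  # [95.7]
```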
An outlier test may be useful as part of the evaluation of the significance of that
result, along with other data. Outlier tests have no applicability in cases where the
variability in the product is what is being assessed, such as content uniformity,
dissolution, or release-rate determination. In these applications, a value
determined to be an outlier may in fact be an accurate result of a nonuniform
product.
In summary, the rejection or retention of an apparent outlier can be a serious
source of bias. An outlier test can never take the place of a thorough laboratory
investigation. Rather, it is performed only when the investigation is inconclusive
and no deviations in the manufacture or testing of the product were noted.
Given the following set of 10 measurements, are there any outliers?
100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, and 95.7 (mean = 99.5, standard deviation = 1.369)
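As a quick check, the quoted mean and standard deviation can be reproduced, and the statistic used by the Extreme Studentized Deviate (ESD) Test can be computed for the most extreme value. This is a minimal sketch: it computes only the test statistic, and the critical value it must be compared against has to come from a published table or software for the chosen sample size and significance level. The function name is illustrative.

```python
from statistics import mean, stdev

data = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]

def esd_statistic(values):
    """Return the most extreme value and its studentized deviation |x - mean| / s."""
    m = mean(values)
    s = stdev(values)                                   # sample standard deviation
    suspect = max(values, key=lambda x: abs(x - m))     # value farthest from the mean
    return suspect, abs(suspect - m) / s

suspect, r = esd_statistic(data)
print(f"mean = {mean(data):.1f}, s = {stdev(data):.3f}")   # mean = 99.5, s = 1.369
print(f"suspect = {suspect}, R = {r:.3f}")                 # suspect = 95.7, R = 2.805
```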
Dixon-Type Tests
Stage 1 (n = 10): The results are ordered on the basis of their magnitude (i.e., Xn is the largest observation, Xn-1 is the second largest, etc., and X1 is the smallest observation). Dixon's Test uses different ratios depending on the sample size. In this example, with n = 10, to declare X1 an outlier, the following ratio, r11, is calculated by the formula:
$$ r_{11} = \frac{X_2 - X_1}{X_{n-1} - X_1} $$
If r11 > Qtable, where Qtable is a reference value corresponding to the sample size and confidence level, then reject the questionable point. Note that only one point may be rejected from a data set using this test.
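The Dixon ratio for the smallest observation can be computed directly from the ordered data. The sketch below uses the standard definition r11 = (X2 - X1) / (Xn-1 - X1) and the ten measurements from the example. The function names are hypothetical, and `q_table` is deliberately left for the caller to supply from a published Dixon table, because the slides do not quote a value.

```python
def dixon_r11_low(values):
    """Dixon r11 ratio for testing the smallest observation X1."""
    x = sorted(values)
    if len(x) < 3:
        raise ValueError("at least three observations are needed")
    denominator = x[-2] - x[0]          # X(n-1) - X1
    if denominator == 0:
        raise ValueError("ratio undefined: X(n-1) equals X1")
    return (x[1] - x[0]) / denominator  # (X2 - X1) / (X(n-1) - X1)

def is_dixon_outlier(values, q_table):
    """Apply the decision rule r11 > Qtable; q_table comes from a reference table."""
    return dixon_r11_low(values) > q_table

data = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]
r11 = dixon_r11_low(data)
print(f"r11 = {r11:.3f}")  # r11 = 0.844
```

Here X1 = 95.7, X2 = 99.5, and Xn-1 = 100.2, so r11 = 3.8 / 4.5. Even when the ratio exceeds the tabulated value, the result is only identified as a statistical outlier; rejecting it still depends on the laboratory investigation.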
Ishikawa diagram
Ishikawa diagrams (also called fishbone diagrams or cause-and-effect diagrams) are causal diagrams created by Kaoru Ishikawa (1968) that show the causes of a specific event. Common uses of the Ishikawa diagram are product design and quality defect prevention, where it helps identify potential factors causing an overall effect. Each cause or reason for imperfection is a source of variation. Causes are usually grouped into major categories to identify these sources of variation.
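To construct the upper and lower control limits (UCL and LCL) of the control chart, we use the following formulas:
$$ \mathrm{UCL} = \bar{\bar{x}} + z\,\sigma_{\bar{x}} $$
$$ \mathrm{LCL} = \bar{\bar{x}} - z\,\sigma_{\bar{x}} $$
where
$\bar{\bar{x}}$ = mean of the sample means or a target value set for the process
$z$ = number of normal standard deviations
$\sigma_{\bar{x}}$ = standard deviation of the sample means = $\sigma / \sqrt{n}$
$\sigma$ = population standard deviation
$n$ = sample size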
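The control limits follow directly from these definitions: the center line plus or minus z standard deviations of the sample means. The sketch below is a minimal illustration; the function name, the default z = 3, and the numbers in the example (target 100.0, population standard deviation 2.0, samples of size 4) are hypothetical.

```python
import math

def control_limits(center, sigma, n, z=3.0):
    """Return (LCL, UCL) for a chart of sample means.

    center: mean of the sample means, or a target value for the process
    sigma:  population standard deviation
    n:      sample size
    z:      number of normal standard deviations (3 is a common choice)
    """
    sigma_xbar = sigma / math.sqrt(n)   # standard deviation of the sample means
    return center - z * sigma_xbar, center + z * sigma_xbar

lcl, ucl = control_limits(center=100.0, sigma=2.0, n=4)
print(lcl, ucl)  # 97.0 103.0
```

A sample mean falling outside these limits signals that the process may no longer be in statistical control and should be investigated.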
A Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph: individual values are represented in descending order by the bars, and the cumulative total is represented by the line.
The left vertical axis is the frequency of occurrence, but it can alternatively represent cost or another important unit of measure. The right vertical axis is the cumulative percentage of the total number of occurrences, total cost, or total of the particular unit of measure. Because the reasons are in decreasing order, the cumulative function is a concave function. In the late-arrivals example, solving the first three issues is sufficient to lower the number of late arrivals by 78%.
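The numbers behind a Pareto chart are simply the categories sorted by frequency together with a running cumulative percentage. The sketch below computes that table. The late-arrival causes and counts are hypothetical, chosen only so that the first three causes account for 78% of the total, echoing the figure quoted above; they are not the data from the original chart.

```python
def pareto_table(counts):
    """Return rows of (cause, count, cumulative percent), largest count first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    rows, running = [], 0
    for cause, count in ordered:
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows

# Hypothetical reasons for late arrival and how often each occurred.
late_arrivals = {
    "Weather": 10, "Traffic": 40, "Overslept": 8,
    "Child care": 25, "Emergency": 4, "Public transport": 13,
}
rows = pareto_table(late_arrivals)
for cause, count, cum_pct in rows:
    print(f"{cause:<18}{count:>4}{cum_pct:>8.1f}%")
```

The bars of the chart would be drawn from the count column and the line from the cumulative percentage column.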
t-Test
Looks at differences between two groups on some variable of interest.
Ex: Do males and females differ in the number of hours they spend shopping in a given month?
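For two independent groups, one common form of the test is the pooled-variance (Student's) two-sample t statistic, which assumes the two groups have similar variances. The sketch below computes the statistic and its degrees of freedom; the shopping-hours data are hypothetical, and the resulting t value still has to be compared with a t table or passed to a statistics package to obtain a p-value.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df

# Hypothetical hours spent shopping in a month.
males = [1.0, 2.0, 3.0, 4.0, 5.0]
females = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = pooled_t(males, females)
print(f"t = {t:.3f}, df = {df}")  # t = -1.897, df = 8
```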
ANOVA
When results of laboratories or methods are compared where more than one factor can be of influence and must be distinguished from random effects, ANOVA is a powerful statistical tool to use. Examples of such factors are different analysts, samples with different pre-treatments, different analyte levels, and different methods within one of the laboratories. Most statistical packages for the PC can perform this analysis.
What is ANOVA?
A statistical method for testing whether the means of a dependent variable are equal across two or more groups (i.e., the probability that any differences in means across several groups are due solely to sampling error).
Variables in ANOVA (Analysis of Variance):
The dependent variable is metric.
The independent variable(s) is nominal with two or more levels; it is also called a treatment, manipulation, or factor.
One-way ANOVA: only one independent variable with two or more levels.
Two-way ANOVA: two independent variables, each with two or more levels.
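A one-way ANOVA compares the variation between group means with the variation within groups through the F ratio. The sketch below computes F by hand, reusing the three students' trial results from earlier in this deck as the three levels of a single factor (the analyst). The function name is illustrative, and the F value still has to be compared with an F table or evaluated by a statistics package to judge significance.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual results around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

student_a = [14.8, 14.7, 14.8, 14.7, 14.8]
student_b = [14.7, 14.2, 14.6, 14.6, 14.8]
student_c = [14.4, 14.4, 14.5, 14.4, 14.5]
f_ratio, df_between, df_within = one_way_anova_f([student_a, student_b, student_c])
print(f"F = {f_ratio:.3f} with ({df_between}, {df_within}) degrees of freedom")
```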
Thank you