Evaluation of Linearity

Evaluation of Linearity in the Clinical Laboratory

Jeffrey S. Jhang, MD; Chung-Che Chang, MD, PhD; Daniel J. Fink, MD, MPH; Martin H. Kroll, MD

Context. Clinical laboratory assessment of test linearity is often limited to satisfying regulatory requirements rather than integrating this tool into the laboratory quality assurance program. Although an important part of quality control and method validation for clinical laboratories, linearity of clinical tests does not get the attention it deserves.

Objective. This article evaluates the concepts and importance of linearity evaluations for clinical tests.

Design. We describe the theory and procedural steps of each linearity evaluation. We then evaluate the statistical methods for each procedure.

Results. Visual assessment, although simple, is subjective. The lack-of-fit error and the 1986 NCCLS EP6-P G test are sensitive to imprecision and assume that the data are first order. Regression analysis, as developed as the polynomial method, is partly based on the experiences of the College of American Pathologists Instrumentation Resource Committee and has proved to be a robust statistical method.

Conclusions. We provide general guidelines for handling non-linear results from a linearity evaluation. Handling linearity data in an objective manner will aid clinical laboratorians whose goal is to improve the quality of the tests they perform.

(Arch Pathol Lab Med. 2004;128:44–48)

Accepted for publication August 21, 2003.
From the College of Physicians and Surgeons of Columbia University, New York, NY (Drs Jhang and Fink); Medical College of Wisconsin, Milwaukee (Dr Chang); and University of Texas Southwestern Medical Center, Dallas (Dr Kroll).
Reprints: Martin H. Kroll, MD, VA Medical Center, Pathology and Laboratory Medicine, SVC (113), 4500 Lancaster Rd, Dallas, TX 75216-0000 (e-mail: martin.kroll@med.va.gov).
Arch Pathol Lab Med, Vol 128, January 2004. Linearity in Clinical Laboratory Testing, Jhang et al.

According to NCCLS EP6-A, a quantitative analytical method is said to be linear when the analyte recovery from a series of sample solutions (measured value) is linearly proportional to the actual concentration or content of the analyte (true value) in the sample solutions.[1] The points at the upper and lower limits of the analytic measurement range that acceptably fit a straight line determine the linear range. In some assays, the instrument response versus concentration of sample solutions is not linear; for example, competitive radioimmunoassays have a parabolic-shaped instrument response when plotted against concentration and a sigmoid-shaped curve when the response is plotted against the logarithm of the concentration. The responses may be transformed using a 4-parameter logistic formula or other formula such as logit-log. The test results from this transformation should be linearly proportional to the true value of the analyte in the sample solutions (Figure 1). Therefore, the curve of the instrument response, which can be parabolic or sigmoid-shaped, should not be confused with linearity between the measured value and the true value.

Some laboratory personnel mistake the requirement for calibration verification with that for evaluating linearity. The Clinical Laboratory Improvement Amendments of 1988[2] define calibration as the process of testing an analytical method to establish a relationship between known concentrations of an analyte (calibrators) and the measured value, but make no specifications about linearity. Calibration verification ensures accuracy and involves measuring analytes in calibrators or other samples of known values traceable to a reference method to confirm that the established relationship has remained stable.[2] In contrast, one does not need to know the absolute concentration to perform a linearity evaluation, even though knowledge of such is helpful in establishing upper and lower limits. For example, testing serial dilutions of a sample with an elevated value is an acceptable approach. It is the straight-line relationship between points that is the focus of interest in linearity evaluation. As described in NCCLS EP6-P2, this linear relationship is important to clinicians, who rely on it for easy interpolation of results.[1]

Prior to the availability of linearity survey programs, such as those provided by the College of American Pathologists (CAP; Northfield, Ill) and Casco (Portland, Me), individual laboratories were forced to undergo the difficult task of saving specimens from patients with elevated results for testing at the upper limits of the analytic measurement range. In the absence of testing at the upper limit of the analytic measurement range, laboratories had to narrow the range with a subsequent increase in the frequency of sample dilutions.[3] In 1988, the CAP Instrumentation Resource Committee (IRC) began offering linearity surveys as a tool for evaluating linearity. The linearity program provides pre-prepared, analyte-spiked human samples (mostly serum and some urine) covering the full, expected operating range for the analytes being tested for linearity. Although lyophilized samples were used initially, analyte-spiked serum and urine samples that do not require dilution are now being used, eliminating imprecision due to manual pipetting. Since survey samples are made to specific analyte target values, the data analysis verifies calibration within preset tolerances for the participants, unlike the use of stored patient samples, which can only be used to evaluate linearity. These surveys are also useful because they allow comparison across laboratories and methods.
Participation in a linearity and calibration verification program can add a layer of quality assessment above and beyond that provided by proficiency testing programs. Generally, linearity testing has a narrower range of acceptability than proficiency testing and is more effective in detecting analytical problems. For proficiency testing, either 3 SD from the peer group mean, an absolute percentage, or an absolute percentage deviation from the peer group mean is the usual limit, which is often disparate from the true analytical value for a given sample. Acceptability in linearity testing can have much narrower, absolute limits for error based on medically or analytically relevant criteria. Also, linearity testing challenges the entire calibration range, including the extremes, and can detect problems such as reagent or spectrophotometer deterioration earlier than quality control or proficiency testing failures. It is also good laboratory practice to periodically demonstrate linearity to detect reagent deterioration, monitor analyzer performance, or re-confirm linearity after a major servicing of equipment.[4]

PERFORMANCE OF LINEARITY STUDIES


Preparation of Standards
The appropriate evaluation of linearity requires 5 different concentrations spanning the analytical range. Five or more samples are necessary because a sigmoid-shaped, nonlinear curve can be missed with a regression using fewer than 5 points. Even though in geometry 2 points define a line, empirical studies require at least 3 points to add an additional degree of freedom for statistical computations. If a parabolic curve is to be captured, then 3 points define the curve, but 4 points are needed to assess "goodness of fit"; the 1 additional point is required for statistical calculation. A sigmoid curve can be defined by 4 points if it is symmetric about its axis, but by 5 points if it is asymmetric.

The samples should be spaced where analytically relevant. Frequently, equal spacing is sufficient. Spiking a biologic matrix with known amounts of analyte, making serial dilutions, or creating mixtures with different ratios of a high and low standard are all acceptable approaches that can be used to prepare the test samples.[1] Typically, mixtures and serial dilutions have less error than individually prepared solutions. At least 2 replicate samples should then be run to allow for estimation of random error.[1] Samples should be run in random order within the same day after establishing that the instrument is calibrated and in control.

Analysis of Linearity Results

A wide variety of analytic and statistical methods have been developed to estimate the departure of the linearity experiment from perfect linearity. The methods for interpretation of the data have evolved over the years from simple visual inspection to statistical regression analysis.[3,5,6] The techniques have been extensively reviewed by Tholen[6] and Kroll and colleagues.[5,7] The newer procedures have been developed to determine whether deviation from linearity is significantly relevant to analytic or clinical goals.[5] The more commonly used methods, described by Tholen,[6] are adapted here for completeness, and more recently adopted methods are then described.

Figure 1. Examples of 3 patterns of recovery (recovery vs x values): linear (triangles), parabolic (squares), and sigmoid (circles). A first-order fit describes the linear recovery best. A second-order fit describes the parabolic recovery best; note that the slope changes unidirectionally. A third-order fit describes the sigmoid recovery best; note that the slope changes in 2 directions and displays an inflection point.

Figure 2. Linear (circles) and loss of linearity (squares) responses. The loss of linearity response has lost its linearity at both the upper and lower ends of the data range.

Figure 3. Linear plot of data without bias (circles). Linear plot of data showing proportional bias as indicated by a slope of 1.2 (triangles). Linear plot of data showing constant bias as indicated by a y-intercept of 1.8 (squares).
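The mixing scheme described under Preparation of Standards (mixtures of a high and a low standard, at least 2 replicates, random run order) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `mixture_levels` and `randomized_run_order` are hypothetical, the pool values 20 and 500 are borrowed from the end points of the Table purely as sample numbers, and the fixed random seed is there only to make the example repeatable.

```python
import random


def mixture_levels(low, high, n_levels=5):
    """Expected concentration of each mixture of a low and a high pool.

    Hypothetical helper: level i contains a fraction i/(n_levels - 1) of the
    high pool by volume, so the expected values are equally spaced.
    """
    if n_levels < 2:
        raise ValueError("need at least 2 levels")
    levels = []
    for i in range(n_levels):
        frac_high = i / (n_levels - 1)  # volume fraction of the high pool
        levels.append((1 - frac_high) * low + frac_high * high)
    return levels


def randomized_run_order(n_levels, n_replicates=2, seed=0):
    """Return (level_index, replicate_index) pairs in shuffled run order."""
    runs = [(lvl, rep) for lvl in range(n_levels) for rep in range(n_replicates)]
    random.Random(seed).shuffle(runs)  # seed is an assumption, for repeatability
    return runs


# Illustrative pool values only (assumed, not a recommendation).
expected = mixture_levels(low=20.0, high=500.0)
run_order = randomized_run_order(len(expected))

print(expected)   # [20.0, 140.0, 260.0, 380.0, 500.0]
print(run_order)  # 10 (level, replicate) pairs in shuffled order
```

Mixing by volume ratio fixes the relative spacing of the levels even when the absolute concentrations of the two pools are not known exactly, which is consistent with the point that absolute concentrations are not required for a linearity evaluation.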
Visual Review. The most common method used to interpret the results of a linearity experiment is visual review of a plot of the replicate mean of the measured values versus the true value for each level of the sample solutions.[6,8] The desideratum is for y-axis values (measured value) to be as close as possible to the x-axis values (true value). The points are connected, and the evaluation is then based on the degree to which the data follow a straight line (Figure 2). For experiments with known x-values, it may be useful to draw a line with a slope of 1 passing through the origin as a visual reference for deviation from perfect linearity. The best-fit line may be drawn for experiments using equally spaced sample solutions. Visual assessment is a simple and intuitive tool for an expert laboratorian, but it is subjective, unreliable, and poorly reproducible when used without expert understanding of the method.[6,8]

Least Squares Linear Regression. The most commonly used method for fitting a line to the data is the least squares linear regression.[6] The true value is plotted on the x-axis. If the solutions are evenly spaced, unitless solution numbers may be assigned to the x-axis. The measured values are plotted on the y-axis. Least squares linear regression fits a straight line to a set of data points such that the sum of the squares of the vertical distance of the points to the fitted line is minimized. This minimization is performed in the vertical direction, since the x-axis represents true values. The equation is of the familiar form y = mx + b, where m is the slope of the line and b is the y-intercept. The y-intercept is the point where the regression line crosses the y-axis, that is, the value for y where x equals 0. This value can be either positive or negative. In more common terms, the y-intercept represents the constant systematic error or constant bias (Figure 3). The y-intercept should be as close to 0 as possible. The acceptable value of the y-intercept depends on the analyte being evaluated. Analytes for which the clinical decision points are close to the 0 point of the analytic measurement range require a y-intercept close to 0, while analytes that have clinical decision points in the middle or high end of the analytic measurement range are more tolerant of a larger y-intercept.

When the solutions have known values, the ideal value of the slope of the regression line, m, is 1. The deviation of the slope of the regression line from 1 is used as an estimate of the proportional systematic error of the testing system (Figure 3). Proportional error is most often caused by incorrect assignment of the amount of substance in the calibrator. As a result, the error is consistently high or low proportional to the concentration of the analyte. Again, the level of acceptability will depend on the analyte tested. The least squares method is exquisitely sensitive to outliers, which weight the regression heavily towards the largest values. Thus, the main drawback of least squares linear regression is that a single outlier may "pull" the regression line steeper or flatter. Mathematically, such problems can be reconciled by appropriately choosing the data range and using the polynomial method (see "The Polynomial Method").

Another useful tool when looking at the slope is to look at the successive deltas (ie, the difference between two successive points when the target values are equally spaced or the ratio of deltas when not equally spaced). If the fit is linear, the deltas should be the same for each subsequent interval. The successive deltas are the "poor man's" derivative; when a data set is linear, the first derivative is a constant (Table).

Table. Data for Analyte Recovery From a Series of Equally Spaced Analyte-Spiked Solutions for a Linear and Nonlinear Experiment*

Dilution   Analyte Recovery, Linear   Linear Deltas   Analyte Recovery, Nonlinear   Nonlinear Deltas
1          20                         ...             20                            ...
2          140                        120             145                           125
3          260                        120             255                           110
4          380                        120             335                           80
5          500                        120             405                           70

* The linear data demonstrate equal successive deltas, while the nonlinear data set shows smaller successive deltas at higher concentrations, indicating loss of linearity at the high end of the measurement range.

Error.[6] The total error around the regression line is equal to the sum of the pure error or random error and the lack-of-fit error. The components of error can be determined using an appropriately constructed linearity evaluation with 2 or more replicates at each level. The pure error is the error of duplicate samples around their common mean and estimates random error. It is the sum of the squared vertical deviations from the mean of replicated samples at all sample levels. The lack-of-fit error estimates the appropriateness of the given model. The lack-of-fit error is the sum of the squared vertical distances between the mean of replicates at a given level and the regression line. The total error is the sum of the pure and lack-of-fit error and can also be calculated as the sum of the squared vertical distances between all the measured values and the regression line. If the model is a good fit, the lack-of-fit error will be close to 0 and the total error will be all random error.

Error and the G Test. In 1986, the NCCLS EP6-P guidelines incorporated a statistical procedure, the G test, which can be used to determine the appropriateness of a regression model.[6,8] The G statistic is defined as the ratio of the lack-of-fit error to the pure error. This ratio is an F statistic and failure is set at P < .05.

If the value is less than a critical value, then the fit is linear. One limitation of the G test is that it is too sensitive when precision is good and too insensitive when precision is poor.[3,9–12] Furthermore, the G test assumes that the data are first order, which is often an inappropriate assumption. Therefore, additional methods of statistical evaluation were subsequently developed.

The Polynomial Method (The CAP IRC Method). The 2003 proposed revised guideline NCCLS EP6-A[1] has replaced the NCCLS EP6-P guidelines of 1986,[8] which did not provide methods to evaluate nonlinearity with clinically acceptable goals in mind. The approach proposed in 2001 as NCCLS EP6-P2 is based in part on the experiences of the CAP IRC. Kroll and Emancipator[7,13] developed a polynomial method to evaluate data for nonlinearity. The CAP IRC developed a computer program incorporating the polynomial approach and also found that first-, second-, and third-order polynomials are commonly observed patterns in participant data, as expected.[5,7,13] The CAP linearity survey provides pre-diluted liquid samples (mostly serum, 1 urine) containing analyte with analyte concentrations at the upper and lower limits of the analytic measurement range. Samples are run in duplicate,
and results are submitted to the CAP IRC for statistical comparison with peer group results. The procedure then calculates the regression equation and uses statistical tests to determine if the data are best modeled by a linear, quadratic, or cubic equation. A detailed mathematical treatment of this technique is provided elsewhere.[5,7] The program, after fitting the data to the 3 models, assesses whether the nonlinear coefficients are statistically significant. If the nonlinear coefficients are statistically significant, then the data are nonlinear. If the nonlinear coefficients are not statistically significant, then the data are linear. If the best-fit equation is linear, then the data are called "Linear 1" in the CAP IRC survey. If not, the data are nonlinear because a quadratic or cubic equation better models the data, but the data set undergoes a further check to determine if the nonlinearity in the data is clinically significant when tested against clinically relevant allowable error.[5] The process calculates the average distance between the best nonlinear fit and the linear one. The average distance is then compared against an analyte-specific bias adjusted for the random error.[5] If the average deviation is within these predetermined limits, then the deviation from linearity is not clinically important and the CAP IRC surveys designate it "Linear 2." It is important to note that Linear 1 and Linear 2 are equally valid demonstrations of linearity. Finally, if the data yield an equation that is higher order than a line (quadratic or cubic) and the difference between the polynomial and the straight line falls outside preset tolerance limits, the data are determined to be nonlinear.

It should be noted that all methods for evaluating linearity must have enough statistical power to detect nonlinearity. This requires that the data have a minimum level of precision. The CAP polynomial method uses a statistically derived, formal approach for assessing whether the data contain sufficient precision to attain appropriate statistical power.[5] Therefore, the data are put through a test of precision prior to initiating the linearity test. If the data are not sufficiently precise (poor repeatability), then linearity is not assessed and the data are labeled "imprecise." In other words, the data set has too much variability to assess accurately the departure from linearity, and there is insufficient statistical power to evaluate the data.[5] Calculating the ratio of the SD around the best-fit polynomial to the mean concentration for all assay solutions screens for imprecision. This ratio is compared to a quantity based on the clinically relevant tolerance limit for the analyte, the number of measurements made (number of solutions times the number of replicates), and a constant that depends on the degree of the best-fit polynomial. This statistical approach has gained general acceptance as the best statistical method to evaluate linearity of quantitative tests and has been adopted as an approved guideline (NCCLS EP6-A).[1]

TROUBLESHOOTING

Laboratories must ensure the reliability of test results when nonlinearities are discovered during a linearity evaluation. The specific actions will depend on the analyte, method, extent of nonlinearity, and the individual laboratory. Since each situation is different, we suggest some general guidelines for evaluating nonlinearity.

During the pre-analytic steps, human error is often involved. If the materials are not prepared or stored properly, the amount of analyte may be incorrect or may deteriorate over time. Systematic error can be introduced by incorrect pipette calibration, while an imprecise pipette can result in imprecise results. Samples may be incorrectly labeled or otherwise incorrectly matched for content and testing.

Analytic error can arise from various steps in the analytic process. A good starting point is to check maintenance, quality control, and calibration logs during the period prior to the linearity testing to identify measurements that are out of control. This may provide some evidence and direction for finding and correcting the source of poor performance on the linearity evaluation. Potential, common analytic problems include wavelength shift of the spectrophotometer, dirty or aging optics, and dirty or elevated background in scintillation counting wells.[4] Problems with reagents can occur as well. Improperly prepared reagents, expired reagents, or reagents near the end of their shelf life can cause nonlinearities. Transformation formulas, such as those used for immunoassays, can also be inappropriate for the binding characteristics of the antibody system being evaluated, leading to a nonlinear evaluation.

One common cause of post-analytic error for this and all other surveys is incorrect transcription of the data onto the submission forms. Transcription should be verified prior to submitting data and should be confirmed if an outlying point is seen on the initial visual inspection.

COMMENT

Methods for the determination of linearity have evolved over time, from simple visual assessment to the polynomial method. Although visual assessment is an important step in interpretation of a linearity experiment, it is subjective and poorly reproducible. Linear regression techniques were next used to fit the data to a regression line, and the G test was added to determine appropriateness of this model. However, this method is dependent upon precision and assumes that the data set is linear. It was subsequently deemed an inappropriate test for evaluating linearity. Kroll and Emancipator[7,13] developed a method for comparing first-, second-, third-, and higher-ordered polynomials, which the CAP IRC adopted for its linearity survey program. The polynomial method uses clinically relevant goals in assessing linearity.

The benefits of the CAP IRC and other linearity surveys are that the assessment picks up problems before quality control or proficiency testing failures occur. Therefore, enrollment in a linearity program can help detect problems, so that corrections can ensure accurate and reliable laboratory results. The linearity assessment can detect reagent deterioration, monitor analyzer performance, or re-confirm linearity after a major servicing.[4] Peer group comparisons allow the clinical laboratory physician to compare the laboratory's instrument to the same instruments of other enrollees.

The CAP IRC monitors the survey participant reports to evaluate the adequacy of both the survey materials and the statistical tools. Ongoing considerations include adding new analytes to be tested, refining the statistical methods, deciding how to best supply the materials, and determining how to best process the samples once the product is received in a laboratory. Through proficiency testing, calibration verification, and other means, one can determine the accuracy of a method at specific points. Accuracy within all points in the analytic measurement
range requires that one be able to show that the mathematical relationship between input and output (concentration, activity, etc) is continuous and acceptably linear. If that relationship is nonlinear, then one must know it exactly, which requires empirical study. If it is linear, then with 2 determined points, one can generate the rest of the response curve and report a consistent, reliable, and clinically meaningful value from the entire analytical range.

The authors extend their thanks to William J. Castellani, MD (Department of Pathology, Truman Medical Center, Kansas City, Mo), and the members and staff of the College of American Pathologists Standards and Instrumentation Committee for their review of this manuscript.

References

1. NCCLS. Evaluation of the Linearity of Quantitative Measurement Procedures: A Statistical Approach. Approved guideline NCCLS document EP6-A (ISBN 1-56238-498-8). Wayne, Pa: NCCLS; 2003.
2. Department of Health and Human Services, Health Care Financing Administration. Clinical Laboratory Improvement Amendments of 1988; Final Rule. Federal Register (1992) (codified at 42 CFR §493).
3. Floering D. College of American Pathologists' experience with the linearity surveys, 1987–1991. Arch Pathol Lab Med. 1992;116:739–745.
4. Kroll M, Gilstad C, Gochman G, et al, eds. Laboratory Instrument Evaluation, Verification and Maintenance Manual. 5th ed. Northfield, Ill: College of American Pathologists; 1999:1.
5. Kroll MH, Praestgaard J, Michaliszyn E, Styer PE. Evaluation of the extent of nonlinearity in reportable range studies. Arch Pathol Lab Med. 2000;124:1331–1338.
6. Tholen DW. Alternative statistical techniques to evaluate linearity. Arch Pathol Lab Med. 1992;116:746–756.
7. Kroll MH, Emancipator K. A theoretical evaluation of linearity. Clin Chem. 1993;39:405–413.
8. Passey RB, Bee DE, Caffo A, Erikson JM. Evaluation of Linearity of Quantitative Analytical Methods. Proposed guideline EP6-P. Wayne, Pa: NCCLS; 1988.
9. Tetrault G. Evaluation of assay linearity [letter]. Clin Chem. 1990;36:585–586.
10. Passey RB. Evaluation of assay linearity [response to letter]. Clin Chem. 1990;36:586.
11. Redondo FL. "G" test and evaluation of assay linearity [letter]. Clin Chem. 1990;36:1384.
12. Tetrault G. "G" test and evaluation of assay linearity [response to letter]. Clin Chem. 1990;36:1384.
13. Emancipator K, Kroll MH. A quantitative measure of nonlinearity. Clin Chem. 1993;39:766–772.
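As a supplementary sketch of the successive-delta check and the least squares fit described under Least Squares Linear Regression, the values in the Table can be run through a few lines of code. The function names are hypothetical, and the unitless dilution numbers 1 to 5 are used as x-values, as the text permits for evenly spaced solutions, so the fitted slope is not expected to be 1 here.

```python
def successive_deltas(values):
    """Differences between consecutive recoveries (equally spaced levels)."""
    return [b - a for a, b in zip(values, values[1:])]


def least_squares(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Values taken from the Table; x is the unitless dilution number.
dilution = [1, 2, 3, 4, 5]
linear = [20, 140, 260, 380, 500]
nonlinear = [20, 145, 255, 335, 405]

linear_deltas = successive_deltas(linear)        # constant deltas
nonlinear_deltas = successive_deltas(nonlinear)  # deltas shrink at the high end

slope, intercept = least_squares(dilution, nonlinear)
residuals = [y - (slope * x + intercept) for x, y in zip(dilution, nonlinear)]

print(linear_deltas)     # [120, 120, 120, 120]
print(nonlinear_deltas)  # [125, 110, 80, 70]
print(slope, intercept)  # 96.0 -56.0
print(residuals)         # [-20.0, 9.0, 23.0, 7.0, -19.0]
```

The residuals of the nonlinear data are negative at both ends and positive in the middle. A straight line leaves that kind of systematic pattern when the underlying recovery is curved, which is what the shrinking deltas already suggest.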

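The error decomposition described under Error and Error and the G Test can also be sketched in code. Several things here are assumptions rather than the EP6-P procedure itself: the duplicate measurements are hypothetical values centered on the nonlinear column of the Table, the lack-of-fit sum of squares is weighted by the number of replicates at each level (the usual convention, which makes pure error and lack-of-fit error add up to the total), and the ratio is formed from mean squares in the conventional lack-of-fit F form. No critical value is looked up.

```python
def fit_line(points):
    """Ordinary least squares fit of y = m*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def error_components(levels, replicates):
    """Split the residual sum of squares into pure and lack-of-fit error."""
    points = [(x, y) for x, reps in zip(levels, replicates) for y in reps]
    slope, intercept = fit_line(points)

    ss_pure = 0.0  # replicates around their own level mean
    ss_lof = 0.0   # level means around the fitted line
    for x, reps in zip(levels, replicates):
        level_mean = sum(reps) / len(reps)
        fitted = slope * x + intercept
        ss_pure += sum((y - level_mean) ** 2 for y in reps)
        ss_lof += len(reps) * (level_mean - fitted) ** 2

    # Total error: every measured value around the fitted line.
    ss_total = sum((y - (slope * x + intercept)) ** 2 for x, y in points)

    df_lof = len(levels) - 2             # levels minus 2 fitted parameters
    df_pure = len(points) - len(levels)  # measurements minus levels
    # Assumes ss_pure > 0; identical replicates would need separate handling.
    f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
    return ss_pure, ss_lof, ss_total, f_ratio


# Hypothetical duplicates; only their means come from the Table.
levels = [1, 2, 3, 4, 5]
replicates = [[19, 21], [144, 146], [254, 256], [334, 336], [404, 406]]

ss_pure, ss_lof, ss_total, f_ratio = error_components(levels, replicates)
print(ss_pure, ss_lof, ss_total)  # 10.0 2840.0 2850.0
print(round(f_ratio, 2))          # 473.33
```

With such tight hypothetical duplicates the ratio is very large, which illustrates the limitation noted in the text: when precision is good, even modest curvature produces a large lack-of-fit ratio, and when precision is poor the same curvature can go undetected.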
