Summary of Biostatistics Articles

The general linear mixed model extends the general linear model by the addition of random effect
parameters and by allowing a more flexible specification of the covariance matrix of the random
errors. For example, general linear mixed models allow for both correlated error terms and error terms
with heterogeneous variances.

The general linear mixed model can easily be fitted to longitudinal data. The model assumes that
the vector of repeated measurements on each subject follows a linear regression model where some
of the regression parameters are population-specific (fixed effects) whereas other parameters are
subject-specific (random effects). The subject-specific regression coefficients reflect how the
response evolves over time for each subject.
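
As a sketch of the model described above, in notation that is assumed here rather than given in the text ($X_i$, $Z_i$, $\beta$, $b_i$, $\varepsilon_i$, $\Sigma_i$; the symbols $D$ and $V_i$ match the option summary at the end of these notes), the vector of repeated measurements $Y_i$ on subject $i$ can be written as:

```latex
\[
  Y_i = X_i \beta + Z_i b_i + \varepsilon_i ,
  \qquad
  b_i \sim N(0, D), \quad
  \varepsilon_i \sim N(0, \Sigma_i), \quad
  b_i \text{ independent of } \varepsilon_i ,
\]
\[
  \operatorname{Var}(Y_i) = V_i = Z_i D Z_i^{\top} + \Sigma_i .
\]
```

Here $\beta$ holds the population-specific (fixed) effects and $b_i$ the subject-specific (random) effects, so $X_i\beta$ is the population mean profile and $Z_i b_i$ is the deviation of subject $i$ from it.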

Estimation is more difficult in the mixed model than in the general linear model. Not only must you
estimate the fixed effects, as in the general linear model, but you must also estimate the covariance
matrix of the random effects and the covariance matrix of the random errors. Ordinary least squares is
no longer the best method because its distributional assumptions regarding the random error terms are
too restrictive. Generalized least squares is used instead because it takes into account the covariance
structures of the random effects and random errors.
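
For reference, the generalized least squares estimator of the fixed effects has the standard form below. The notation is an assumption of this sketch: $y_i$ is the response vector and $X_i$ the fixed-effects design matrix for subject $i$, $N$ is the number of subjects, and $\hat{V}_i$ is the estimated marginal covariance matrix of $y_i$ (which combines the random-effects and random-error covariance structures).

```latex
\[
  \hat{\beta}
  = \Bigl( \sum_{i=1}^{N} X_i^{\top} \hat{V}_i^{-1} X_i \Bigr)^{-1}
    \sum_{i=1}^{N} X_i^{\top} \hat{V}_i^{-1} y_i .
\]
```

When every $\hat{V}_i$ is proportional to the identity matrix, this reduces to the ordinary least squares estimator, which is why ordinary least squares is adequate only under independent, equal-variance errors.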

Longitudinal models usually have three sources of random variation. The between-subject variability
is represented by the random effects. The within-subject variability is represented by the serial
correlation. The correlation between the measurements within subject usually depends on the time
interval between the measurements and decreases as the length of the interval increases. Finally, there
is potentially also measurement error in the measurement process.
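
One common way to write these three sources of variation, given here only as a sketch with assumed symbols ($D$ for the random-effects covariance matrix, $\tau^2$ for the serial-correlation variance, $\sigma^2$ for the measurement-error variance, $g$ for a correlation function of the time lag), is:

```latex
\[
  \operatorname{Var}(Y_i)
  = \underbrace{Z_i D Z_i^{\top}}_{\text{random effects}}
  + \underbrace{\tau^{2} H_i}_{\text{serial correlation}}
  + \underbrace{\sigma^{2} I_{n_i}}_{\text{measurement error}},
  \qquad
  (H_i)_{jk} = g\bigl( \lvert t_{ij} - t_{ik} \rvert \bigr),
\]
```

where $t_{ij}$ is the time of the $j$th measurement on subject $i$, $g(0) = 1$, and $g$ decreases as the lag grows. Examples of $g$ include the exponential form $g(u) = \exp(-\phi u)$ and the Gaussian form $g(u) = \exp(-\phi u^{2})$ with $\phi > 0$.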

The covariance structure that is appropriate for your model is directly related to which component
of variability is the dominant component. For example, if the serial correlation among the
measurements is minimal, then the random effects probably account for most of the variability in the
data and the remaining error components have a very simple covariance structure.

After a candidate mean model is selected, fitting the model using ordinary least squares regression
and examining the residuals might help determine the appropriate covariance structure. The sample
variogram is a function of the ordinary least squares residuals that describes the association among
repeated measurements and is easily estimated even with irregular observation times.

The data values in the sample variogram are calculated from the observed half-squared differences
between pairs of residuals within individuals, where the residuals are ordinary least squares residuals
based on the mean model, and the corresponding time differences. The vertical axis in the variogram
represents the residual variability within subject over time. The scatter plot contains a smoothed
nonparametric curve, which estimates the general pattern in the sample variogram. This curve can be
used to decide whether the mixed model should include serial correlation. If a serial correlation
component is warranted, the fitted curve can be used in selecting the appropriate serial correlation
function. The fitted curve can also be used to determine whether measurement error and random
effects are evident in the model.
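
The construction of the sample variogram can be sketched in SAS as follows. This is a minimal illustration, not a prescribed method: the data set name LONG, the variables ID, TIME, and Y, the simple mean model, and the output names OLS_RES and VARIO are all hypothetical.

```sas
/* Hypothetical input: data set LONG with one row per measurement and
   variables ID (subject), TIME (continuous), and Y (response).      */

/* 1. Ordinary least squares residuals from a candidate mean model.  */
proc reg data=long noprint;
   model y = time;
   output out=ols_res r=resid;
run;
quit;

/* 2. All pairs of residuals within a subject: the time difference
      and the half-squared difference between the two residuals.     */
proc sql;
   create table vario as
   select a.id,
          b.time - a.time              as lag,
          0.5 * (a.resid - b.resid)**2 as halfsq
   from ols_res as a, ols_res as b
   where a.id = b.id and a.time < b.time;
quit;

/* 3. Scatter plot of the sample variogram with a loess smooth.      */
proc sgplot data=vario;
   loess x=lag y=halfsq;
   xaxis label="Time lag";
   yaxis label="Half-squared residual difference";
run;
```

A smoothed curve that rises with the time lag would point toward serial correlation, whereas a roughly flat curve would suggest that random effects and measurement error account for most of the variability.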

You can also use the information criteria (such as the AIC and BIC) produced by PROC MIXED as a
tool to help you select the most appropriate covariance structure. The smaller the value of the
information criterion, the better the model. However, choose only among covariance structures that
make sense given the data. For data with unequally spaced time points and different time points across
subjects, only compound symmetry and the spatial covariance structures are appropriate. If the time
points are equally spaced, then the AR(1) and Toeplitz covariance structures could be examined. If the
time points are unequally spaced but are the same across subjects, then the unstructured covariance
structure could be examined.
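
A comparison of candidate covariance structures by information criteria might look like the sketch below. The data set LONG and the variables ID, VISIT, TIME, and Y are hypothetical, as are the output data set names; the two structures shown are simply the ones named above for irregularly timed data.

```sas
/* Hypothetical data set LONG: ID (subject), VISIT (visit number),
   TIME (continuous measurement time), Y (response).                 */

/* Candidate 1: compound symmetry.                                   */
proc mixed data=long method=reml;
   class id visit;
   model y = time / solution;
   repeated visit / subject=id type=cs;
   ods output FitStatistics=fit_cs;
run;

/* Candidate 2: spatial power, where the correlation depends on the
   distance between measurement times.                               */
proc mixed data=long method=reml;
   class id;
   model y = time / solution;
   repeated / subject=id type=sp(pow)(time);
   ods output FitStatistics=fit_sppow;
run;

/* Stack the fit statistics for a side-by-side look.                 */
data fit_all;
   length structure $12;
   set fit_cs(in=a) fit_sppow(in=b);
   if a then structure = "CS";
   else if b then structure = "SP(POW)";
run;

proc print data=fit_all noobs;
   var structure descr value;
run;
```

Both fits use the same fixed effects, so the REML-based criteria are comparable between them.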

PROC MIXED allows heterogeneity in the residual covariance parameters with the GROUP= option.
All observations having the same level of the GROUP effect have the same covariance parameters.
Each new level of the GROUP effect produces a new set of covariance parameters with the same
structure as the original group.
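
As an illustration of the GROUP= option, the sketch below fits a separate set of AR(1) parameters for each treatment group. The data set LONG and the variables ID, VISIT, TRT, TIME, and Y are hypothetical, and AR(1) is used only on the assumption that the visits are equally spaced.

```sas
/* Hypothetical data set LONG: ID (subject), VISIT (equally spaced
   visit number), TRT (treatment group), TIME, Y (response).         */
proc mixed data=long method=reml;
   class id visit trt;
   model y = trt time trt*time / solution;
   /* GROUP=TRT: each treatment level gets its own AR(1) variance
      and correlation parameters.                                    */
   repeated visit / subject=id type=ar(1) group=trt;
run;
```

The covariance parameter table then lists one set of AR(1) estimates per level of TRT, so the number of covariance parameters grows with the number of groups.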

After an appropriate covariance structure is selected, model-building efforts should be directed
at simplifying the mean structure of the model. Because the model should be hierarchically well
formulated, the first step is to evaluate the interactions. One recommended approach is to eliminate
the interactions one at a time, starting with the least significant interaction. If you use model fit
statistics such as AIC to compare mean structures, then you must use the ML estimation method,
because REML likelihoods are not comparable between models with different fixed effects. However,
after the final model is chosen, refit the model using REML because REML estimates of the covariance
parameters are less biased than ML estimates.
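
The ML-then-REML workflow could be sketched as follows. All names are hypothetical (data set LONG; variables ID, TRT, TIME, Y), and the random-effects specification is just a placeholder for whatever covariance structure was selected earlier.

```sas
/* Step 1: full mean model, fitted by ML so that fit statistics can
   be compared across different sets of fixed effects.               */
proc mixed data=long method=ml;
   class id trt;
   model y = trt time trt*time / solution;
   random intercept time / subject=id type=un;
run;

/* Step 2: reduced mean model (interaction removed), also by ML.     */
proc mixed data=long method=ml;
   class id trt;
   model y = trt time / solution;
   random intercept time / subject=id type=un;
run;

/* Step 3: refit the chosen mean model by REML for final estimates.  */
proc mixed data=long method=reml;
   class id trt;
   model y = trt time / solution;
   random intercept time / subject=id type=un;
run;
```

Step 3 assumes, purely for illustration, that the reduced model was the one chosen.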

When the sample variogram clearly shows that the random effects error component is much larger
than the serial correlation error component, a longitudinal model using the RANDOM statement might
be useful. These models are called random coefficient models because the regression coefficients for
one or more covariates are assumed to be a random sample from some population of possible
coefficients.

In longitudinal models, the random coefficients are the subject-specific parameter estimates. Random
coefficient models are useful for highly unbalanced data with many repeated measurements per
subject.

In random coefficient models, the fixed effect parameter estimates represent the expected values
of the population of intercepts and slopes. The random effects for intercept represent the difference
between the intercept for the ith subject and the overall intercept. The random effects for slope
represent the difference between the slope for the ith subject and the overall slope. Random coefficient
models also have a random error term for the within-subject variation.
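
For a single time covariate, the random coefficient model described above can be sketched as follows. The symbols are assumptions of this sketch: $t_{ij}$ is the time of the $j$th measurement on the $i$th subject, $\beta_0$ and $\beta_1$ are the overall intercept and slope, and $b_{0i}$ and $b_{1i}$ are the random effects for intercept and slope.

```latex
\[
  Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
  \qquad
  \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}
  \sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix}
  \right),
  \qquad
  \varepsilon_{ij} \sim N(0, \sigma^{2}).
\]
```

The intercept for subject $i$ is $\beta_0 + b_{0i}$ and the slope is $\beta_1 + b_{1i}$, so $b_{0i}$ and $b_{1i}$ are exactly the differences from the overall intercept and slope.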

In longitudinal models, it is recommended that the unstructured covariance structure be specified
in the RANDOM statement. PROC MIXED estimates the variances of the intercepts and slopes along
with the covariance between the intercepts and slopes in the G matrix. Specifying the unstructured
covariance structure indicates that you do not want to impose any structure on the variances for
intercepts and variances for slopes, and on the covariance between the intercepts and slopes.
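
A random intercept and slope model with an unstructured G matrix might be requested as in the sketch below. The data set LONG and the variables ID, TIME, and Y are hypothetical.

```sas
/* Hypothetical data set LONG: ID (subject), TIME (continuous),
   Y (response).                                                     */
proc mixed data=long method=reml;
   class id;
   model y = time / solution;
   /* TYPE=UN: separate variances for intercepts and slopes plus
      their covariance. G and GCORR print the estimated matrix and
      its correlation form; SOLUTION prints the subject-specific
      random-effect estimates.                                       */
   random intercept time / subject=id type=un g gcorr solution;
run;
```

With two random effects, TYPE=UN estimates three parameters: the intercept variance, the slope variance, and the intercept-slope covariance.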

You can also fit a model in PROC MIXED with both the RANDOM and REPEATED statements.
However, this model is generally not recommended in practice. These models tend to have
convergence and estimation problems, especially with complex covariance structures.

The purpose of model diagnostics is to compare the data with the fitted model to highlight any
systematic discrepancies. Conditional residual plots can be used to detect outliers and whether the
random effects are properly selected. Marginal residual plots can be used to diagnose whether you
selected the fixed effect part of the model properly. Model diagnostics are especially important in
linear mixed models because likelihood-based estimation methods are particularly sensitive to unusual
observations.
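
One way to obtain conditional and marginal residuals for plotting is sketched below, using the OUTP= and OUTPM= options of the MODEL statement. The data set LONG, the variables ID, TIME, and Y, and the output data set names are hypothetical, and the model itself is only a placeholder.

```sas
proc mixed data=long method=reml;
   class id;
   /* OUTP=  : predictions and residuals conditional on the random
               effects.
      OUTPM= : marginal (population-average) predictions and
               residuals.                                            */
   model y = time / solution outp=cond_res outpm=marg_res;
   random intercept time / subject=id type=un;
run;

/* Conditional residuals against conditional predicted values.       */
proc sgplot data=cond_res;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;

/* Marginal residuals against marginal predicted values.             */
proc sgplot data=marg_res;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;
```

Systematic trends in the marginal plot would point to a problem in the fixed-effects part of the model, while isolated large conditional residuals flag potential outliers.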

If the model is correctly specified and the covariance structure is appropriate, then the violation
of the normality assumption of the random effects has little effect on the estimation of the fixed effect
parameter estimates and their standard errors. However, violation of the normality assumption
of the random effects clearly affects the standard errors and parameter estimates of the random
effects.

General form of the MIXED procedure:

PROC MIXED DATA=SAS-data-set <options>;
   CLASS variables;
   MODEL response = <fixed effects> </ options>;
   RANDOM random-effects </ options>;
   REPEATED <repeated-effect> </ options>;
RUN;

• PROC MIXED
o Calls procedure MIXED
o Specifies the data set
o Estimation method: ML, REML (default)

• CLASS
o Definition of the factors in the model

• MODEL
o Response variable
o Fixed effects
o Options similar to SAS regression procedures

• RANDOM
o Definition of random effects (including intercepts)
o Identification of subjects: independence across subjects
o Type of random-effects covariance matrix D
o Options ‘g’ and ‘gcorr’ to print out D (random-effects covariance matrix) and the
corresponding correlation matrix
o Options ‘v’ and ‘vcorr’ to print out Vi (marginal covariance matrix) and the
corresponding correlation matrix

• REPEATED
o Ordering of measurements within subjects
o The effect(s) specified must be of the factor type
o Identification of subjects: independence across subjects
o Type of residual covariance matrix Σi
o Options ‘r’ and ‘rcorr’ to print out Σi (residual covariance matrix) and the
corresponding correlation matrix
o Type of covariance matrix defined before
o When serial correlation is to be fitted, it should be specified in the REPEATED
statement, and the option ‘local’ can then be added to also include measurement error,
if required.
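
Putting the statements together, a model with a random intercept, Gaussian serial correlation, and measurement error might be specified as in the sketch below. The data set LONG and the variables ID, TIME, and Y are hypothetical, and the Gaussian spatial structure is only one possible choice of serial correlation function.

```sas
/* Hypothetical data set LONG: ID (subject), TIME (continuous),
   Y (response).                                                     */
proc mixed data=long method=reml;
   class id;
   model y = time / solution;
   /* Random intercept; G prints D, V and VCORR print the marginal
      covariance and correlation matrices.                           */
   random intercept / subject=id g v vcorr;
   /* Serial correlation that decays with the distance in TIME;
      LOCAL adds a measurement-error variance on the diagonal.
      R and RCORR print the residual covariance and correlation.     */
   repeated / subject=id type=sp(gau)(time) local r rcorr;
run;
```

Models that combine RANDOM and REPEATED in this way can be slow to converge, so the iteration history and the covariance parameter estimates deserve a careful look.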
