%\VignetteEngine{knitr::knitr} %\VignetteDepends{ggplot2} %\VignetteDepends{gamm4} %\VignetteIndexEntry{Fitting Linear Mixed-Effects Models using lme4} \documentclass[nojss]{jss} \usepackage[T1]{fontenc}% for correct hyphenation and T1 encoding \usepackage[utf8]{inputenc}% \usepackage{lmodern}% latin modern font \usepackage[american]{babel} %% for texi2dvi ~ bug \usepackage{bm,amsmath,thumbpdf,amsfonts}%,minted} \usepackage{blkarray} \usepackage{array} %% huxtable-ish stuff %% \usepackage{adjustbox} %% \usepackage{threeparttable} %% \newcolumntype{P}[1]{>{\raggedright\arraybackslash}p{#1}} \newcommand{\matindex}[1]{\mbox{\scriptsize#1}}% Matrix index \newcommand{\github}{Github} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\VEC}{vec} \newcommand{\bmb}[1]{{\color{red} \emph{#1}}} \newcommand{\scw}[1]{{\color{blue} \emph{#1}}} \newcommand{\dmb}[1]{{\color{magenta} \emph{#1}}} \shortcites{bolker_strategies_2013,sleepstudy,gelman2013bayesian} \author{Douglas Bates\\University of Wisconsin-Madison\And Martin M\"achler\\ETH Zurich\And Benjamin M. Bolker\\McMaster University\And Steven C. Walker\\McMaster University } \Plainauthor{Douglas Bates, Martin M\"achler, Ben Bolker, Steve Walker} \title{Fitting Linear Mixed-Effects Models Using \pkg{lme4}} \Plaintitle{Fitting Linear Mixed-Effects Models using lme4} \Shorttitle{Linear Mixed Models with lme4} \Abstract{% Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the \code{lmer} function in the \pkg{lme4} package for \proglang{R}. As for most model-fitting functions in \proglang{R}, the model is described in an \code{lmer} call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in \proglang{R}, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by \code{lmer}.} \Keywords{% sparse matrix methods, linear mixed models, penalized least squares, Cholesky decomposition} \Address{ Douglas Bates\\ Department of Statistics, University of Wisconsin\\ 1300 University Ave.\\ Madison, WI 53706, U.S.A.\\ E-mail: \email{bates@stat.wisc.edu}\\ \par\bigskip Martin M\"achler\\ Seminar f\"ur Statistik, HG G~16\\ ETH Zurich\\ 8092 Zurich, Switzerland\\ E-mail: \email{maechler@stat.math.ethz.ch}\\ % URL: \url{http://stat.ethz.ch/people/maechler}\\ \par\bigskip Benjamin M. Bolker\\ Departments of Mathematics \& Statistics and Biology \\ McMaster University \\ 1280 Main Street W \\ Hamilton, ON L8S 4K1, Canada \\ E-mail: \email{bolker@mcmaster.ca}\\ \par\bigskip Steven C. 
Walker\\
Department of Mathematics \& Statistics \\
McMaster University \\
1280 Main Street W \\
Hamilton, ON L8S 4K1, Canada \\
E-mail: \email{scwalker@math.mcmaster.ca}
}
\newcommand{\thetavec}{{\bm\theta}}
\newcommand{\betavec}{{\bm\beta}}
\newcommand{\Var}{\operatorname{Var}}
\newcommand{\abs}{\operatorname{abs}}
\newcommand{\bLt}{\ensuremath{\bm\Lambda_{\bm\theta}}}
\newcommand{\mc}[1]{\ensuremath{\mathcal{#1}}}
\newcommand{\trans}{\ensuremath{^\top}} % JSS wants \top
\newcommand{\yobs}{\ensuremath{\bm y_{\mathrm{obs}}}}
\newcommand*{\eq}[1]{eqn.~\ref{#1}}% or just {(\ref{#1})}
<<setup, include=FALSE, cache=FALSE>>=
options(width = 70, show.signif.stars = FALSE,
        str = strOptions(strict.width = "cut"),
        ## prefer empty continuation for reader's cut'n'paste:
        continue = " ", #JSS: prompt = "R> ", continue = "+ ",
        useFancyQuotes = FALSE)
library("knitr")
library("lme4")
library("ggplot2") # Keeping default theme, nicer "on line": #JSS theme_set(theme_bw())
library("grid")
zmargin <- theme(panel.spacing = unit(0, "lines"))
library("lattice")
library("minqa")
opts_chunk$set(engine = 'R', dev = 'pdf', fig.width = 9, fig.height = 5.5,
               prompt = TRUE, cache = TRUE, tidy = FALSE, comment = NA,
               error = FALSE)
knitr::render_sweave()
@
\setkeys{Gin}{width=\textwidth}
\setkeys{Gin}{height=3.5in}
\begin{document}

A version of this manuscript has been published online in the
\emph{Journal of Statistical Software}, in October 2015, with DOI
\linebreak[3] \texttt{10.18637/jss.v067.i01}; see
\url{https://www.jstatsoft.org/article/view/v067i01/}.

\section{Introduction}
\label{sec:intro}

The \pkg{lme4} package \citep{lme4} for \proglang{R} \citep{R} provides
functions to fit and analyze linear mixed models, generalized linear
mixed models and nonlinear mixed models. In each of these names, the
term ``mixed'' or, more fully, ``mixed effects'', denotes a model that
incorporates both fixed- and random-effects terms in a linear predictor
expression from which the conditional mean of the response can be
evaluated. In this paper we describe the formulation and representation
of linear mixed models. The techniques used for generalized linear and
nonlinear mixed models will be described separately in a future paper.

At present, the main alternative to \pkg{lme4} for mixed modeling in
\proglang{R} is the \pkg{nlme} package \citep{nlme_pkg}. The main
features distinguishing \pkg{lme4} from \pkg{nlme} are (1) more
efficient linear algebra tools, giving improved performance on large
problems; (2) simpler syntax and more efficient implementation for
fitting models with crossed random effects; (3) the implementation of
profile likelihood confidence intervals on random-effects parameters;
and (4) the ability to fit generalized linear mixed models (although in
this paper we restrict ourselves to linear mixed models). The main
advantage of \pkg{nlme} relative to \pkg{lme4} is a user interface for
fitting models with structure in the residuals (various forms of
heteroscedasticity and autocorrelation) and in the random-effects
covariance matrices (e.g., compound symmetric models). With some extra
effort, the computational machinery of \pkg{lme4} can be used to fit
structured models that the basic \code{lmer} function cannot handle
(see Appendix~\ref{sec:modularExamples}).

The development of general software for fitting mixed models remains an
active area of research with many open problems. Consequently, the
\pkg{lme4} package has evolved since it was first released, and
continues to improve as we learn more about mixed models.
However, we recognize the need to maintain stability and backward
compatibility of \pkg{lme4} so that it continues to be broadly useful.
In order to maintain stability while continuing to advance mixed-model
computation, we have developed several additional frameworks that draw
on the basic ideas of \pkg{lme4} but modify its structure or
implementation in various ways. These descendants include the
\mbox{\pkg{MixedModels}} package \citep{MixedModels} in \proglang{Julia}
\citep{Julia}, the \pkg{lme4pureR} package \citep{lme4pureR} in
\proglang{R}, and the \pkg{flexLambda} development branch of
\pkg{lme4}. This article is largely restricted to describing the
current stable version of the \pkg{lme4} package (1.1-7), with
Appendix~\ref{sec:modularExamples} describing hooks into the
computational machinery that are designed for extension development.
The \pkg{gamm4} \citep{gamm4} and \pkg{blme} \citep{blme, blme2}
packages currently make use of these hooks.

Another goal of this article is to contrast the approach used by
\pkg{lme4} with previous formulations of mixed models. The expressions
for the profiled log-likelihood and profiled REML (restricted maximum
likelihood) criteria derived in Section~\ref{sec:profdev} are similar
to those presented in \citet{bates04:_linear} and, indeed, are closely
related to ``Henderson's mixed-model
equations''~\citep{henderson_1982}. Nonetheless, there are subtle but
important changes in the formulation of the model and in the structure
of the resulting penalized least squares (PLS) problem to be solved
(Section~\ref{sec:PLSpureR}). We derive the current version of the PLS
problem (Section~\ref{sec:plsMath}) and contrast this result with
earlier formulations (Section~\ref{sec:previous_lmm_form}).

This article is organized into four main sections
(Sections~\ref{sec:lFormula}, \ref{sec:mkLmerDevfun},
\ref{sec:optimizeLmer}, and \ref{sec:mkMerMod}), each of which
corresponds to one of the four largely separate modules that comprise
\pkg{lme4}. Before describing the details of each module, we describe
the general form of the linear mixed model underlying \pkg{lme4}
(Section~\ref{sec:LMMs}); introduce the \code{sleepstudy} data that
will be used as an example throughout (Section~\ref{sec:sleepstudy});
and broadly outline \pkg{lme4}'s modular structure
(Section~\ref{sec:modular}).

\subsection{Linear mixed models}
\label{sec:LMMs}

Just as a linear model is described by the distribution of a
vector-valued random response variable, $\mc{Y}$, whose observed value
is $\yobs$, a linear mixed model is described by the distribution of
two vector-valued random variables: $\mc{Y}$, the response, and
$\mc{B}$, the vector of random effects. In a linear model the
distribution of $\mc Y$ is multivariate normal,%\begin{linenomath}
\begin{equation}
  \label{eq:linearmodel}
  \mc Y\sim\mc{N}(\bm X\bm\beta+\bm o,\sigma^2\bm W^{-1}),
\end{equation}
where $n$ is the dimension of the response vector, $\bm W$ is a
diagonal matrix of known prior weights, $\bm\beta$ is a $p$-dimensional
coefficient vector, $\bm X$ is an $n\times p$ model matrix, and $\bm o$
is a vector of known prior offset terms. The parameters of the model
are the coefficients $\bm\beta$ and the scale parameter $\sigma$.
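
To make the roles of the prior weights and the offset concrete, the
following short sketch, which is ours rather than part of the original
text, simulates a small response vector from
Equation~\ref{eq:linearmodel} and recovers $\bm\beta$ with \code{lm};
all object names (\code{X}, \code{beta}, \code{o}, \code{w}) are
placeholders.
<<linearModelSketch, eval=FALSE>>=
## Sketch (not from the original paper): simulate from
## Y ~ N(X beta + o, sigma^2 W^{-1}) and fit by weighted least squares.
set.seed(101)
n <- 100
X <- cbind(1, runif(n))   # n x p model matrix, here p = 2
beta <- c(2, -1)          # coefficient vector
o <- rep(0.5, n)          # known prior offset
w <- runif(n, 0.5, 2)     # known prior weights, the diagonal of W
sigma <- 0.3
y <- drop(X %*% beta) + o + rnorm(n, sd = sigma / sqrt(w))
## lm() accepts the weights and offset directly:
coef(lm(y ~ X - 1, weights = w, offset = o))  # approximately beta
@
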
In a linear mixed model it is the \emph{conditional} distribution of
$\mc Y$ given $\mc B=\bm b$ that has such a form,
\begin{equation}
  \label{eq:LMMcondY}
  (\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b+\bm o,\sigma^2\bm W^{-1}), % | <- for ESS
\end{equation}
where $\bm Z$ is the $n\times q$ model matrix for the $q$-dimensional
vector-valued random-effects variable, $\mc B$, whose value we are
fixing at $\bm b$. The unconditional distribution of $\mc B$ is also
multivariate normal with mean zero and a parameterized $q\times q$
variance-covariance matrix, $\bm\Sigma$,
\begin{equation}
  \label{eq:LMMuncondB}
  \mc B\sim\mc N(\bm0,\bm\Sigma) .
\end{equation}
As a variance-covariance matrix, $\bm\Sigma$ must be positive
semidefinite. It is convenient to express the model in terms of a
\emph{relative covariance factor}, $\bLt$, which is a $q\times q$
matrix, depending on the \emph{variance-component parameter},
$\bm\theta$, and generating the symmetric $q\times q$
variance-covariance matrix, $\bm\Sigma$, according to%\begin{linenomath}
\begin{equation}
  \label{eq:relcovfac}
  \bm\Sigma_{\bm\theta}=\sigma^2\bLt\bLt\trans ,
\end{equation}%\end{linenomath}
where $\sigma$ is the same scale factor as in the conditional
distribution (\ref{eq:LMMcondY}).

Although Equations~\ref{eq:LMMcondY}, \ref{eq:LMMuncondB}, and
\ref{eq:relcovfac} fully describe the class of linear mixed models that
\pkg{lme4} can fit, this terse description hides many important
details. Before moving on to these details, we make a few observations:
\begin{itemize}
\item This formulation of linear mixed models allows for a relatively
  compact expression for the profiled log-likelihood of $\bm\theta$
  (Section~\ref{sec:profdev}, Equation~\ref{eq:profiledDeviance}).
\item The matrices associated with random effects, $\bm Z$ and $\bLt$,
  typically have a sparse structure with a sparsity pattern that
  encodes various model assumptions. Sections~\ref{sec:LMMmatrix} and
  \ref{sec:CSCmats} provide details on these structures, and how to
  represent them efficiently.
\item The interface provided by \pkg{lme4}'s \code{lmer} function is
  slightly less general than the model described by
  Equations~\ref{eq:LMMcondY}, \ref{eq:LMMuncondB}, and
  \ref{eq:relcovfac}. To take advantage of the entire range of
  possibilities, one may use the modular functions
  (Section~\ref{sec:modular} and Appendix~\ref{sec:modularExamples}) or
  explore the experimental \pkg{flexLambda} branch of \pkg{lme4} on
  \github.
\end{itemize}

\subsection{Example}
\label{sec:sleepstudy}

Throughout our discussion of \pkg{lme4}, we will work with a data set
on the average reaction time per day for subjects in a sleep
deprivation study \citep{sleepstudy}. On day 0 the subjects had their
normal amount of sleep. Starting that night they were restricted to 3
hours of sleep per night. The response variable, \code{Reaction},
represents average reaction times in milliseconds (ms) on a series of
tests given each \code{Day} to each \code{Subject}
(Figure~\ref{fig:sleepPlot}),
%
<<>>=
str(sleepstudy)
@
<<sleepPlot, echo=FALSE, fig.cap="Average reaction time versus days of sleep deprivation by subject. Subjects ordered by increasing slope of subject-specific linear regressions.">>=
## BMB: seemed more pleasing to arrange by increasing slope rather than
## intercept ...
xyplot(Reaction ~ Days | Subject, sleepstudy, aspect = "xy",
       layout = c(9, 2), type = c("g", "p", "r"),
       index.cond = function(x, y) coef(lm(y ~ x))[2],
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)", as.table = TRUE)
@
% |
Each subject's reaction time increases approximately linearly with the
number of sleep-deprived days.
However, subjects also appear to vary in the slopes and intercepts of
these relationships, which suggests a model with random slopes and
intercepts. As we shall see, such a model may be fitted by minimizing
the REML criterion (Equation~\ref{eq:REMLdeviance}) using
<<>>=
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
@
% |
The estimates of the standard deviations of the random effects for the
intercept and the slope are
\Sexpr{round(sqrt(VarCorr(fm1)$Subject[1,1]), 2)} ms % $
and \Sexpr{round(sqrt(VarCorr(fm1)$Subject[2,2]), 2)} ms/day. % $
The fixed-effects coefficients, $\betavec$, are
\Sexpr{round(fixef(fm1)[1], 1)} ms and \Sexpr{round(fixef(fm1)[2], 2)}
ms/day for the intercept and slope. In this model, one interpretation
of these fixed effects is that they are the estimated population mean
values of the random intercept and slope
(Section~\ref{sec:intuitiveFormulas}).

We have chosen the \code{sleepstudy} example because it is a relatively
small and simple example to illustrate the theory and practice
underlying \code{lmer}. However, \code{lmer} is capable of fitting more
complex mixed models to larger data sets. For example, we direct the
interested reader to \code{RShowDoc("lmerperf", package = "lme4")} for
examples that more thoroughly exercise the performance capabilities of
\code{lmer}.

\subsection{High-level modular structure}
\label{sec:modular}

The \code{lmer} function is composed of four largely independent
modules. In the first module, a mixed-model formula is parsed and
converted into the inputs required to specify a linear mixed model
(Section~\ref{sec:lFormula}). The second module uses these inputs to
construct an \proglang{R} function which takes the covariance
parameters, $\bm\theta$, as arguments and returns negative twice the
log profiled likelihood or the REML criterion
(Section~\ref{sec:mkLmerDevfun}). The third module optimizes this
objective function to produce maximum likelihood (ML) or REML estimates
of $\bm\theta$ (Section~\ref{sec:optimizeLmer}). Finally, the fourth
module provides utilities for interpreting the optimized model
(Section~\ref{sec:mkMerMod}).

\begin{table}[tb]
  \centering
  \begin{tabular}{lllp{2.1in}}
    \hline
    Module & & \proglang{R} function & Description \\
    \hline
    Formula module & (Section~\ref{sec:lFormula}) & \code{lFormula} &
    Accepts a mixed-model formula, data, and other user inputs, and
    returns a list of objects required to fit a linear mixed model. \\
    Objective function module & (Section~\ref{sec:mkLmerDevfun}) &
    \code{mkLmerDevfun} & Accepts the results of \code{lFormula} and
    returns a function to calculate the deviance (or restricted
    deviance) as a function of the covariance parameters,
    $\bm\theta$. \\
    Optimization module & (Section~\ref{sec:optimizeLmer}) &
    \code{optimizeLmer} & Accepts a deviance function returned by
    \code{mkLmerDevfun} and returns the results of the optimization of
    that deviance function. \\
    Output module & (Section~\ref{sec:mkMerMod}) & \code{mkMerMod} &
    Accepts an optimized deviance function and packages the results
    into a useful object.
\\
    \hline
  \end{tabular}
  \caption{The high-level modular structure of \code{lmer}.}
  \label{tab:modular}
\end{table}

To illustrate this modularity, we recreate the \code{fm1} object by a
series of four modular steps: the formula module,
<<>>=
parsedFormula <- lFormula(formula = Reaction ~ Days + (Days | Subject),
                          data = sleepstudy)
@
the objective function module,
<<>>=
devianceFunction <- do.call(mkLmerDevfun, parsedFormula)
@
the optimization module,
<<>>=
optimizerOutput <- optimizeLmer(devianceFunction)
@
and the output module,
<<>>=
mkMerMod(rho = environment(devianceFunction),
         opt = optimizerOutput,
         reTrms = parsedFormula$reTrms,
         fr = parsedFormula$fr)
@
% |

\section{Formula module}
\label{sec:lFormula}

\subsection{Mixed-model formulas}
\label{sec:formulas}

Like most model-fitting functions in \proglang{R}, \code{lmer} takes as
its first two arguments a \emph{formula} specifying the model and the
\emph{data} with which to evaluate the formula. This second argument,
\code{data}, is optional but recommended and is usually the name of an
\proglang{R} data frame. In the \proglang{R} \code{lm} function for
fitting linear models, formulas take the form \verb+resp ~ expr+, where
\code{resp} determines the response variable and \code{expr} is an
expression that specifies the columns of the model matrix. Formulas for
the \code{lmer} function contain special random-effects terms,
<<eval=FALSE>>=
resp ~ FEexpr + (REexpr1 | factor1) + (REexpr2 | factor2) + ...
@
where \code{FEexpr} is an expression determining the columns of the
fixed-effects model matrix, $\bm X$, and the random-effects terms,
\code{(REexpr1 | factor1)} and \code{(REexpr2 | factor2)}, determine
both the random-effects model matrix, $\bm Z$ (Section~\ref{sec:mkZ}),
and the structure of the relative covariance factor, $\bLt$
(Section~\ref{sec:mkLambdat}). In principle, a mixed-model formula may
contain arbitrarily many random-effects terms, but in practice the
number of such terms is typically low.

\subsection{Understanding mixed-model formulas}
\label{sec:intuitiveFormulas}

Before describing the details of how \pkg{lme4} parses mixed-model
formulas (Section~\ref{sec:LMMmatrix}), we provide an informal
explanation and then some examples. Our discussion assumes familiarity
with the standard \proglang{R} modeling paradigm \citep{Chambers:1993}.

Each random-effects term is of the form \code{(expr | factor)}. The
expression \code{expr} is evaluated as a linear model formula,
producing a model matrix following the same rules used in standard
\proglang{R} modeling functions (e.g., \code{lm} or \code{glm}). The
expression \code{factor} is evaluated as an \proglang{R} factor. One
way to think about the vertical bar operator is as a special kind of
interaction between the model matrix and the grouping factor. This
interaction ensures that the columns of the model matrix have different
effects for each level of the grouping factor. What makes this a
special kind of interaction is that these effects are modeled as
unobserved random variables, rather than unknown fixed parameters. Much
has been written about important practical and philosophical
differences between these two types of interactions \citep[e.g.,
][]{henderson_1982,gelman2005analysis}. For example, the random-effects
implementation of such interactions can be used to obtain shrinkage
estimates of regression coefficients \citep[e.g.,
][]{1977EfronAndMorris}, or account for lack of independence in the
residuals due to block structure or repeated measurements \citep[e.g.,
][]{laird_ware_1982}; the sketch below illustrates the shrinkage point.
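
As a concrete illustration, we can compare unpooled within-subject
least squares estimates with the subject-specific coefficients implied
by model \code{fm1} of Section~\ref{sec:sleepstudy}. The following
sketch is ours rather than part of the original text; it uses only
\code{lmList} and the \code{coef} method for fitted mixed models, both
provided by \pkg{lme4}.
<<shrinkageSketch, eval=FALSE>>=
## Sketch (not from the original paper): separate least squares fits
## for each subject ...
unpooled <- coef(lmList(Reaction ~ Days | Subject, sleepstudy))
## ... versus the mixed-model versions, which combine the fixed
## effects with the conditional modes of the random effects:
shrunken <- coef(fm1)$Subject
## The mixed-model estimates are pulled ("shrunken") toward the
## estimated population means:
head(cbind(unpooled, shrunken))
@
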
Table~\ref{tab:formulas} provides several examples of the
right-hand-sides of mixed-model formulas. The first example,
\code{(1 | g)}, % |
is the simplest possible mixed-model formula, where each level of the
grouping factor, \code{g}, has its own random intercept. The mean and
standard deviation of these intercepts are parameters to be estimated.
Our description of this model incorporates any nonzero mean of the
random effects as fixed-effects parameters. If one wishes to specify
that a random intercept has \emph{a priori} known means, one may use
the \code{offset} function as in the second model in
Table~\ref{tab:formulas}. This model contains no fixed effects, or,
more accurately, the fixed-effects model matrix, $\bm X$, has zero
columns and $\bm\beta$ has length zero.

\begin{table}[tb]
  \centering
  \begin{tabular}{llp{1.5in}}
    \hline
    Formula & Alternative & Meaning \\
    \hline%------------------------------------------------
    \code{(1 | g)} & \code{1 + (1 | g)} &
    Random intercept with fixed mean. \\
    \code{0 + offset(o) + (1 | g)} & \code{-1 + offset(o) + (1 | g)} &
    Random intercept with \emph{a priori} means. \\
    \code{(1 | g1/g2)} & \code{(1 | g1) + (1 | g1:g2)} % |
    & Intercept varying among \code{g1} and \code{g2} within
    \code{g1}. \\
    \code{(1 | g1) + (1 | g2)} & \code{1 + (1 | g1) + (1 | g2)} &
    Intercept varying among \code{g1} and \code{g2}. \\
    \code{x + (x | g)} & \code{1 + x + (1 + x | g)} &
    Correlated random intercept and slope. \\
    \code{x + (x || g)} & \code{1 + x + (1 | g) + (0 + x | g)} &
    Uncorrelated random intercept and slope. \\
    \hline
  \end{tabular}
  \caption{Examples of the right-hand-sides of mixed-effects model
    formulas. The names of grouping factors are denoted \code{g},
    \code{g1}, and \code{g2}, and covariates and \emph{a priori} known
    offsets as \code{x} and \code{o}.}
  \label{tab:formulas}
\end{table}

We may also construct models with multiple grouping factors. For
example, if the observations are grouped by \code{g2}, which is nested
within \code{g1}, then the third formula in Table~\ref{tab:formulas}
can be used to model variation in the intercept. A common objective in
mixed modeling is to account for such nested (or hierarchical)
structure. However, one of the most useful aspects of \pkg{lme4} is
that it can be used to fit random effects associated with non-nested
grouping factors. For example, if the data are grouped by fully
crossing two factors, \code{g1} and \code{g2}, then the fourth formula
in Table~\ref{tab:formulas} may be used. Such models are common in item
response theory, where \code{subject} and \code{item} factors are fully
crossed \citep{doran2007estimating}. In addition to varying intercepts,
we may also have varying slopes (e.g., the \code{sleepstudy} data,
Section~\ref{sec:sleepstudy}). The fifth example in
Table~\ref{tab:formulas} gives a model where both the intercept and
slope vary among the levels of the grouping factor.

\subsubsection{Specifying uncorrelated random effects}
\label{sec:uncor}

By default, \pkg{lme4} assumes that all coefficients associated with
the same random-effects term are correlated. To specify an uncorrelated
slope and intercept (for example), one may either use double-bar
notation, \code{(x || g)}, or equivalently use multiple random-effects
terms, \code{x + (1 | g) + (0 + x | g)}, as in the final example of
Table~\ref{tab:formulas}; the sketch below shows how \pkg{lme4} expands
such terms internally.
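
To see how such equivalences arise, one can inspect \pkg{lme4}'s
formula-processing utilities directly. The following brief sketch is
ours, not part of the original text; \code{findbars} and
\code{expandDoubleVerts} are formula utilities exported by \pkg{lme4}
at the time of writing, and the variables \code{y}, \code{x}, \code{g},
\code{g1}, and \code{g2} are placeholders.
<<formulaExpansionSketch, eval=FALSE>>=
## Sketch (not from the original paper): a nested grouping factor
## expands into two simple terms, (1 | g1) and (1 | g1:g2) ...
findbars(y ~ x + (1 | g1/g2))
## ... and a double-bar term expands into separate, uncorrelated
## terms, (1 | g) and (0 + x | g):
expandDoubleVerts(y ~ x + (x || g))
@
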
For example, if one examined the results of model \code{fm1} of the
\code{sleepstudy} data (Section~\ref{sec:sleepstudy}) using
\code{summary(fm1)}, one would see that the estimated correlation
between the slope for \code{Days} and the intercept is fairly low
(\Sexpr{round(attr(VarCorr(fm1)$Subject, "correlation")[2],3)}). % $
(See Section~\ref{sec:summary} below for more on how to extract the
random-effects covariance matrix.) We may use double-bar notation to
fit a model that excludes a correlation parameter:
<<>>=
fm2 <- lmer(Reaction ~ Days + (Days || Subject), sleepstudy)
@
Although mixed models where the random slopes and intercepts are
assumed independent are commonly used to reduce the complexity of
random-slopes models, they do have one subtle drawback. Models in which
the slopes and intercepts are allowed to have a nonzero correlation
(e.g., \code{fm1}) are invariant to additive shifts of the continuous
predictor (\code{Days} in this case). This invariance breaks down when
the correlation is constrained to zero; any shift in the predictor will
necessarily lead to a change in the estimated correlation, and in the
likelihood and predictions of the model. For example, we can eliminate
the correlation in \code{fm1} simply by adding an amount equal to the
ratio of the estimated among-subject standard deviations multiplied by
the estimated correlation (i.e.,
$\sigma_{\text{\small intercept}}/\sigma_{\text{\small slope}} \cdot
\rho_{\text{\small slope:intercept}}$) to the \code{Days} variable. The
use of models such as \code{fm2} should ideally be restricted to cases
where the predictor is measured on a ratio scale (i.e., the zero point
on the scale is meaningful, not just a location defined by convenience
or convention), as is the case here.
%% <<>>=
%% sleepstudyShift <- within(sleepstudy, {
%%     Days <- Days + (24.74*0.07)/5.92 })
%% lmer(Reaction ~ Days + (Days | Subject), sleepstudyShift)
%% @

\subsection{Algebraic and computational account of mixed-model
  formulas}
\label{sec:LMMmatrix}

The fixed-effects terms of a mixed-model formula are parsed to produce
the fixed-effects model matrix, $\bm X$, in the same way that the
\proglang{R} \code{lm} function generates model matrices. However, a
mixed-model formula incorporates $k\ge1$ random-effects terms of the
form \code{(r | f)} as well. % |
These $k$ terms are used to produce the random-effects model matrix,
$\bm Z$ (Equation~\ref{eq:LMMcondY}; Section~\ref{sec:mkZ}), and the
structure of the relative covariance factor, $\bLt$
(Equation~\ref{eq:relcovfac}; Section~\ref{sec:mkLambdat}), which are
matrices that typically have a sparse structure. We now describe how
one might construct these matrices from the random-effects terms,
considering first a single term, \code{(r | f)}, % |
and then generalizing to multiple terms. Tables~\ref{tab:dim} and
\ref{tab:algebraic} summarize the matrices and vectors that determine
the structure of $\bm Z$ and $\bLt$; the short sketch below shows where
these objects appear in the \pkg{lme4} code.
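
Readers who prefer a computational entry point may find it helpful to
inspect the \code{reTrms} component returned by \code{lFormula}
(Section~\ref{sec:modular}), which stores the objects developed in this
section. The following sketch is ours, not part of the original text;
it reuses the \code{parsedFormula} object created above.
<<reTrmsSketch, eval=FALSE>>=
## Sketch (not from the original paper): the reTrms list returned by
## lFormula() contains the quantities described in this section.
rt <- parsedFormula$reTrms
dim(rt$Zt)       # transposed random-effects model matrix, q x n
dim(rt$Lambdat)  # transposed relative covariance factor, q x q
rt$theta         # initial covariance parameters, length m
@
For the \code{sleepstudy} model there is a single vector-valued term,
\code{(Days | Subject)}, so $k = 1$, $p_1 = 2$, $\ell_1 = 18$, and
hence $q = 36$ and $m = 3$.
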
\begin{table}[tb]
  \centering
  \begin{tabular}{ll}
    \hline
    Symbol & Size \\
    \hline
    $n$ & Length of the response vector, $\mc{Y}$ \\
    $p$ & Number of columns of fixed-effects model matrix, $\bm X$ \\
    $q = \sum_{i=1}^k q_i$ & Number of columns of random-effects model
    matrix, $\bm Z$ \\
    $p_i$ & Number of columns of the raw model matrix, $\bm X_i$ \\
    $\ell_i$ & Number of levels of the grouping factor indices,
    $\bm i_i$ \\
    $q_i = p_i\ell_i$ & Number of columns of the term-wise model
    matrix, $\bm Z_i$ \\
    $k$ & Number of random-effects terms \\
    $m_i = \binom{p_i+1}{2}$ & Number of covariance parameters for
    term $i$ \\
    $m = \sum_{i=1}^k m_i$ & Total number of covariance parameters \\
    \hline
  \end{tabular}
  \caption{Dimensions of linear mixed models. The subscript
    $i = 1, \dots, k$ denotes a specific random-effects term.}
  \label{tab:dim}
\end{table}

\begin{table}[tb]
  \centering
  \begin{tabular}{lll}
    \hline
    Symbol & Size & Description \\
    \hline
    $\bm X_i$ & $n\times p_i$ & Raw random-effects model matrix \\
    $\bm J_i$ & $n\times \ell_i$ & Indicator matrix of grouping factor
    indices \\
    $\bm X_{ij}$ & $p_i\times 1$ & Column vector containing $j$th row
    of $\bm X_i$ \\
    $\bm J_{ij}$ & $\ell_i\times 1$ & Column vector containing $j$th
    row of $\bm J_i$ \\
    $\bm i_i$ & $n$ & Vector of grouping factor indices \\
    $\bm Z_i$ & $n\times q_i$ & Term-wise random-effects model matrix \\
    $\bm\theta$ & $m$ & Covariance parameters \\
    $\bm T_i$ & $p_i\times p_i$ & Lower triangular template matrix \\
    $\bm\Lambda_i$ & $q_i\times q_i$ & Term-wise relative covariance
    factor \\
    \hline
  \end{tabular}
  \caption{Symbols used to describe the structure of the random-effects
    model matrix and the relative covariance factor. The subscript
    $i = 1, \dots, k$ denotes a specific random-effects term.}
  \label{tab:algebraic}
\end{table}

The expression, \code{r}, is a linear model formula that evaluates to
an \proglang{R} model matrix, $\bm X_i$, of size $n\times p_i$, called
the \emph{raw random-effects model matrix} for term $i$. A term is said
to be a \emph{scalar} random-effects term when $p_i=1$, otherwise it is
\emph{vector-valued}. For a \emph{simple, scalar} random-effects term
of the form \code{(1 | f)}, $\bm X_i$ is the % |
$n\times 1$ matrix of ones, which implies a random intercept model.

The expression \code{f} evaluates to an \proglang{R} factor, called the
\emph{grouping factor}, for the term. For the $i$th term, we represent
this factor mathematically with a vector $\bm i_i$ of \emph{factor
  indices}, which is an $n$-vector of values from
$1,\dots,\ell_i$.\footnote{In practice, fixed-effects model matrices
  and random-effects terms are evaluated with respect to a \emph{model
    frame}, ensuring that any expressions for grouping factors have
  been coerced to factors and any unused levels of these factors have
  been dropped. That is, $\ell_i$, the number of levels in the grouping
  factor for the $i$th random-effects term, is well-defined.} Let
$\bm J_i$ be the $n\times \ell_i$ matrix of indicator columns for
$\bm i_i$. Using the \pkg{Matrix} package \citep{Matrix_pkg} in
\proglang{R}, we may construct the transpose of $\bm J_i$ from a factor
vector, \code{f}, by coercing \code{f} to a `\code{sparseMatrix}'
object.
For example,
<<echo=FALSE>>=
set.seed(2)
@
<<>>=
(f <- gl(3, 2))
(Ji <- t(as(f, Class = "sparseMatrix")))
@
When $k>1$ we order the random-effects terms so that
$\ell_1\ge\ell_2\ge\dots\ge\ell_k$; in general, this ordering reduces
``fill-in'' (i.e., the proportion of elements that are zero in the
lower triangle of $\bLt\trans\bm Z\trans\bm W\bm Z\bLt+\bm I$ but not
in the lower triangle of its left Cholesky factor, $\bm L_{\bm\theta}$,
described below in Equation~\ref{eq:blockCholeskyDecomp}). This
reduction in fill-in provides more efficient matrix operations within
the penalized least squares algorithm (Section~\ref{sec:plsMath}).

\subsubsection{Constructing the random-effects model matrix}
\label{sec:mkZ}

The $i$th random-effects term contributes $q_i=\ell_ip_i$ columns to
the model matrix $\bm Z$. We group these columns into a matrix,
$\bm Z_i$, which we refer to as the \emph{term-wise model matrix} for
the $i$th term. Thus $q$, the number of columns in $\bm Z$ and the
dimension of the random variable, $\mc{B}$, is
\begin{equation}
  \label{eq:qcalc}
  q=\sum_{i=1}^k q_i = \sum_{i=1}^k \ell_i\,p_i .
\end{equation}
Creating the matrix $\bm Z_i$ from $\bm X_i$ and $\bm J_i$ is a
straightforward concept that is, nonetheless, somewhat awkward to
describe. Consider $\bm Z_i$ as being further decomposed into $\ell_i$
blocks of $p_i$ columns. The rows in the first block are the rows of
$\bm X_i$ multiplied by the 0/1 values in the first column of $\bm J_i$
and similarly for the subsequent blocks. With these definitions we may
define the term-wise random-effects model matrix, $\bm Z_i$, for the
$i$th term as a transposed Khatri-Rao product,
\begin{equation}
  \label{eq:Zi}
  \bm Z_i = (\bm J_i\trans * \bm X_i\trans)\trans =
  \begin{bmatrix}
    \bm J_{i1}\trans \otimes \bm X_{i1}\trans \\
    \bm J_{i2}\trans \otimes \bm X_{i2}\trans \\
    \vdots \\
    \bm J_{in}\trans \otimes \bm X_{in}\trans \\
  \end{bmatrix},
\end{equation}
where $*$ and $\otimes$ are the Khatri-Rao\footnote{Note that the
  original definition of the Khatri-Rao product is more general than
  the definition used in the \pkg{Matrix} package, which is the
  definition we use here.} \citep{khatri1968solutions} and Kronecker
products, and $\bm J_{ij}\trans$ and $\bm X_{ij}\trans$ are row vectors
of the $j$th rows of $\bm J_i$ and $\bm X_i$. These rows correspond to
the $j$th sample in the response vector, $\mc Y$, and thus $j$ runs
from $1, \dots, n$. The \pkg{Matrix} package for \proglang{R} contains
a \code{KhatriRao} function, which can be used to form $\bm Z_i$. For
example, if we begin with a raw model matrix,
<<>>=
(Xi <- cbind(1, rep.int(c(-1, 1), 3L)))
@
then the term-wise random-effects model matrix is,
<<>>=
(Zi <- t(KhatriRao(t(Ji), t(Xi))))
@
<<echo=FALSE, results="hide">>=
## alternative formulation of Zi (eq:Zi)
rbind(
  Ji[1,] %x% Xi[1,],
  Ji[2,] %x% Xi[2,],
  Ji[3,] %x% Xi[3,],
  Ji[4,] %x% Xi[4,],
  Ji[5,] %x% Xi[5,],
  Ji[6,] %x% Xi[6,])
@
In particular, for a simple, scalar term, $\bm Z_i$ is exactly
$\bm J_i$, the matrix of indicator columns. For other scalar terms,
$\bm Z_i$ is formed by element-wise multiplication of the single column
of $\bm X_i$ by each of the columns of $\bm J_i$.

Because each $\bm Z_i$ is generated from indicator columns, its
cross-product, $\bm Z_i\trans\bm Z_i$, is block-diagonal, consisting of
$\ell_i$ diagonal blocks each of size $p_i$.\footnote{To see this, note
  that by the properties of Kronecker products we may write the
  cross-product matrix $\bm Z_i\trans\bm Z_i$ as
  $\sum_{j=1}^n \bm J_{ij} \bm J_{ij}\trans \otimes \bm X_{ij} \bm X_{ij}\trans$.
Because $\bm J_{ij}$ is a unit vector along a coordinate axis, the
  cross-product $\bm J_{ij} \bm J_{ij}\trans$ is an
  $\ell_i\times \ell_i$ matrix of all zeros except for a single $1$
  along the diagonal. Therefore, the cross-products,
  $\bm X_{ij} \bm X_{ij}\trans$, will be added to one of the $\ell_i$
  blocks of size $p_i\times p_i$ along the diagonal of
  $\bm Z_i\trans\bm Z_i$.} Note that this means that when $k=1$ (i.e.,
there is only one random-effects term, and $\bm Z_i = \bm Z$),
$\bm Z\trans\bm Z$ will be block diagonal. These block-diagonal
properties allow for more efficient sparse matrix computations
(Section~\ref{sec:CSCmats}).

The full random-effects model matrix, $\bm Z$, is constructed from
$k\ge 1$ blocks,
\begin{equation}
  \label{eq:Z}
  \bm Z =
  \begin{bmatrix}
    \bm Z_1 & \bm Z_2 & \hdots & \bm Z_k \\
  \end{bmatrix}.
\end{equation}
By transposing Equation~\ref{eq:Z} and substituting in
Equation~\ref{eq:Zi}, we may represent the structure of the transposed
random-effects model matrix as follows,
\begin{equation}
  \label{eq:Zt}
  \bm Z\trans =
  \begin{blockarray}{ccccc}
    \text{sample 1} & \text{sample 2} & \hdots & \text{sample } n & \\
    \begin{block}{[cccc]c}
      \bm J_{11} \otimes \bm X_{11} & \bm J_{12} \otimes \bm X_{12} &
      \hdots & \bm J_{1n} \otimes \bm X_{1n} & \text{term 1} \\
      \bm J_{21} \otimes \bm X_{21} & \bm J_{22} \otimes \bm X_{22} &
      \hdots & \bm J_{2n} \otimes \bm X_{2n} & \text{term 2} \\
      \vdots & \vdots & \ddots & \vdots & \vdots \\
    \end{block}
  \end{blockarray}.
\end{equation}
Note that the proportion of elements of $\bm Z\trans$ that are
structural zeros is
\begin{equation}
  \label{eq:ZtSparsity}
  \frac{\sum_{i=1}^k p_i(\ell_i - 1)}{\sum_{i=1}^k p_i \ell_i} .
\end{equation}
Therefore, the sparsity of $\bm Z\trans$ increases with the number of
grouping factor levels. As the number of levels is often large in
practice, it is essential for speed and efficiency to take account of
this sparsity, for example by using sparse matrix methods, when fitting
mixed models (Section~\ref{sec:CSCmats}).

\subsubsection{Constructing the relative covariance factor}
\label{sec:mkLambdat}

The $q\times q$ relative covariance factor, $\bLt$, is a block-diagonal
matrix whose $i$th diagonal block, $\bm\Lambda_i$, is of size $q_i$,
$i=1,\dots,k$. We refer to $\bm\Lambda_i$ as the \emph{term-wise
  relative covariance factor}. Furthermore, $\bm\Lambda_i$ is a
homogeneous block-diagonal matrix with each of the $\ell_i$
lower-triangular blocks on the diagonal being a copy of a
$p_i\times p_i$ lower-triangular \emph{template matrix}, $\bm T_i$. The
covariance parameter vector, $\bm\theta$, of length
$m = \sum_{i=1}^k m_i$, where $m_i = \binom{p_i+1}{2}$, consists of the
elements in the lower triangles of the $\bm T_i$, $i=1,\dots,k$. To
provide a unique representation we require that the diagonal elements
of the $\bm T_i$, $i=1,\dots,k$, be non-negative.

The template, $\bm T_i$, can be constructed from the number $p_i$
alone. In \proglang{R} code we denote $p_i$ as \code{nc}. For example,
if we set \code{nc <- 3}, we could create the template for term $i$ as,
<<>>=
nc <- 3
@
%% sequence() is equivalent to unlist(lapply(nvec, seq_len))
%% and (?sequence) ``mainly exists in reverence to the very early
%% history of R''
%% scw: i like sequence, and in fact i never understood why that
%% statement is there in the help file.
<