Handout For ALDA Workshop - 001

You may download this handout and supporting materials at:
http://gseweb.harvard.edu/~faculty/singer/
http://gseacademic.harvard.edu/alda/
http://gseacademic.harvard.edu/~willetjo/
http://www.ats.ucla.edu/stat/examples/alda/

Judith D. Singer & John B. Willett (2006)
Individual Growth Modeling: Modern Methods for Studying Change

Judith D. Singer & John B. Willett
Harvard Graduate School of Education
"Time is the one immaterial object which we cannot influence: neither speed up nor slow down, add to nor diminish." (Maya Angelou)
Judith D. Singer & John B. Willett, Harvard Graduate School of Education, Workshop Overview 1, slide 1
The fundamental problem of longitudinal research: the study of TIME, making continuous time stand still
Eadweard Muybridge (1830-1904), Animal Locomotion (1887)
The first known longitudinal study of growth: the height of the son of Count Filibert Guéneau de Montbeillard (1720-1785)
Scammon, RE (1927) The first seriation study of human growth, Am J of Physical Anthropology, 10, 329-336.
[Figure: height (in cm) vs. age (0 to 20 years), with one point annotated "oops: measurement error?"]
Recorded his son's height approximately every six months from birth (in 1759) until age 18
Fast forward to the present: In most fields, the quantity of longitudinal research is exploding
[Figure: annual searches for keyword 'longitudinal' in 9 OVID databases, between 1982 and 2005, on a log scale (10 to 10,000), for medicine, business, biology, psychology, sociology, agriculture, education, zoology, and economics]
But what about the quality? What does today's longitudinal research look like?
Read 150 articles in 10 issues of APA journals published in each of 1999, 2003 and 2006
First, the good news: More longitudinal studies are being published. More of these are truly longitudinal.
[Figure: studies with >1 wave, 2 waves, 3 waves, and 4+ waves, in 1999, 2003, and 2006]
Now, the bad news: Very few of these longitudinal studies use modern analytic methods.
[Figure: analytic approaches used in 1999, 2003, and 2006: growth modeling; survival analysis; repeated measures ANOVA; wave-on-wave regression; separate but parallel analyses; set aside waves; combine waves; ignore age heterogeneity]
Part of the problem may well be reviewers' ignorance
Comments received from two reviewers for Developmental Psychology of a paper that fit individual growth models to 3 waves of data on vocabulary size among young children:
Reviewer A: I do not understand the statistics used in this study deeply enough to evaluate their appropriateness. I imagine this is also true of 99% of the readers of Developmental Psychology. Previous studies in this area have used simple correlation or regression which provide easily interpretable values for the relationships among variables. In all, while the authors are to be applauded for a detailed longitudinal study, the statistics are difficult. I thus think Developmental Psychology is not really the place for this paper.
Reviewer B: The analyses fail to live up to the promise of the clear and cogent introduction. I will note as a caveat that I entered the field before the advent of sophisticated growth modeling techniques, and they have always aroused my suspicion to some extent. I have tried to keep up and to maintain an open mind, but parts of my review may be naïve, if not inaccurate.
What kinds of research questions require longitudinal methods?
Questions about systematic change over time
Questions about whether and when events occur
Curran et al (1997) studied alcohol use: 82 teens interviewed at ages 14, 15 & 16. Alcohol use tended to increase over time. Children of Alcoholics (COAs) drank more but had no steeper rates of increase over time.
Capaldi et al (1996) studied age of 1st sex: 180 boys interviewed annually from 7th to 12th grade (30% remained virgins at end of study). Boys who experienced early parental transitions were more likely to have had sex.
1. Within-person summary: How does a teen's alcohol consumption change over time? 2. Between-person comparison: How do these trajectories vary by teen characteristics?
1. Within-person summary: When are boys most at risk of having sex for the 1st time? 2. Between-person comparison: How does this risk vary by teen characteristics?
Individual Growth Model/ Multilevel Model for Change
Discrete- and Continuous-Time Survival Analysis
Four important advantages of modern longitudinal methods
You can identify temporal patterns in the data: Does the outcome increase, decrease, or remain stable over time? Is the general pattern linear or non-linear? Are there abrupt shifts at substantively interesting moments?
You can include time-varying predictors (those whose values vary over time): participation in an intervention; family circumstances (employment, marital status, etc.)
You can include interactions with time (to test whether a predictor's effect varies over time): some effects dissipate (they wear off); some effects increase (they become more important); some effects are especially pronounced at particular times
You have great flexibility in research design: Not everyone needs the same rigid data collection schedule (cadence can be person-specific). Not everyone needs the same number of waves (you can use all cases, even those with just one wave!). Design can be experimental or observational. Designs can be single level (individuals only) or multilevel (e.g., patients within physician practices).
What we're going to cover in this workshop

A word about programming, software and other supplemental materials
www.ats.ucla.edu/stat/examples/alda
[Table: ALDA chapters with columns for datasets and for MLwiN, SPSS, Mplus, SPlus, Stata, HLM, and SAS code]
Ch 1   A framework for investigating change over time
Ch 2   Exploring longitudinal data on change
Ch 3   Introducing the multilevel model for change
Ch 4   Doing data analysis with the multilevel model for change
Ch 5   Treating time more flexibly
Ch 6   Modeling discontinuous and nonlinear change
Ch 7   Examining the multilevel model's error covariance structure
Ch 8   Modeling change using covariance structure analysis
Ch 9   A framework for investigating event occurrence
Ch 10  Describing discrete-time event occurrence data
Ch 11  Fitting basic discrete-time hazard models
Ch 12  Extending the discrete-time hazard model
Ch 13  Describing continuous-time event occurrence data
Ch 14  Fitting the Cox regression model
Ch 15  Extending the Cox regression model
Applied Longitudinal Data Analysis website, http://gseacademic.harvard.edu/alda: materials from past workshops; videos of past workshops
S-077: Applied Longitudinal Data Analysis: more fully annotated computer code; examples of detailed computer output; course videos
Introducing the Multilevel Model for Change:

ALDA, Chapter Three
"When you're finished changing, you're finished." (Benjamin Franklin)
John B. Willett & Judith D. Singer Harvard Graduate School of Education
Judith D. Singer & John B. Willett, Harvard Graduate School of Education, ALDA, Chapter 3, slide 1
Chapter 3: Introducing the multilevel model for change
General Approach: We'll go through a worked example from start to finish; we'll save practical data analytic advice for the next session.
  The level-1 submodel for individual change (3.2): examining empirical growth trajectories and asking what population model might have given rise to these observations
  The level-2 submodels for systematic interindividual differences in change (3.3): what kind of population model should we hypothesize to represent the behavior of the parameters from the level-1 model?
  Fitting the multilevel model for change to data (3.4): there are now many options for model fitting and, more practically, many software options
  Interpreting the results of model fitting (3.5 and 3.6): Having fit the model, how do we sensibly interpret and display empirical results?
    Interpreting fixed effects
    Interpreting variance components
    Plotting prototypical trajectories
(ALDA, Chapter 3 intro, p. 45)
Illustrative example: The effects of early intervention on children's IQ
Data source: Peg Burchinal and colleagues (2000) Child Development.
Sample: 103 African American children born to low income families
  58 randomly assigned to an early intervention program
  45 randomly assigned to a control group
Research design
  Each child was assessed 12 times between ages 6 and 96 months
  Here, we analyze only 3 waves of data, collected at ages 12, 18, and 24 months
Research question: What is the effect of the early intervention program on children's cognitive performance?
  Within-individual: How does a child's cognitive performance change between 12 and 24 months?
  Between individuals: Do the trajectories for children in the early intervention program differ from those in the control group? [And, if they do differ, how do they differ?]
(ALDA, Section 3.1, pp. 46-49)
The fundamental building block of growth modeling: the person-period data set
General structure: A person-period data set has one row of data for each period when that particular person was observed.
Fully balanced, 3 waves per child: AGE = 1.0, 1.5, and 2.0 (clocked in years instead of months so that we assess annual rate of change)
PROGRAM is a dummy variable indicating whether the child was randomly assigned to the special early childhood program (1) or not (0)
COG is a nationally normed scale. COG declines within the empirical growth records, so instead of asking whether the growth rate is higher among program participants, we'll ask whether the rate of decline is lower.
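To make the person-period layout concrete, here is a minimal Python sketch that restructures one record per child into one row per child and occasion. The field names (`child_id`, `program`, `cog`) and the COG values are made up for illustration (they are not the Burchinal data), and the extra column `age_c` (AGE - 1) anticipates the centering used in the level-1 model.

```python
from dataclasses import dataclass

# Ages (in years) at which every child was assessed: a fully balanced design.
AGES = (1.0, 1.5, 2.0)


@dataclass
class PersonPeriodRow:
    child_id: int
    age: float
    age_c: float    # AGE - 1, so that 0 corresponds to age 1 (initial status)
    program: int    # 1 = early intervention program, 0 = control
    cog: float


def to_person_period(people):
    """Turn one record per child into one row per (child, occasion).

    `people` is a list of dicts with hypothetical keys 'child_id', 'program',
    and 'cog' (a sequence of scores aligned with AGES).
    """
    rows = []
    for person in people:
        if len(person["cog"]) != len(AGES):
            raise ValueError(f"child {person['child_id']}: expected {len(AGES)} scores")
        for age, score in zip(AGES, person["cog"]):
            rows.append(PersonPeriodRow(person["child_id"], age, age - 1.0,
                                        person["program"], score))
    return rows


# Illustrative (made-up) scores for two children.
people = [
    {"child_id": 1, "program": 1, "cog": [115.0, 108.0, 101.0]},
    {"child_id": 2, "program": 0, "cog": [110.0, 98.0, 90.0]},
]
for row in to_person_period(people):
    print(row)
```

Two children with three waves each yield six rows; a child with fewer waves would simply contribute fewer rows in a real person-period file (this sketch insists on balance only because the design here is balanced).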
Examining empirical growth plots to help suggest a suitable individual growth model (by superimposing fitted OLS trajectories)
[Figure: empirical growth plots of COG vs. AGE (1 to 2) with fitted OLS trajectories for children 68, 70, 71, 72, 902, 904, 906, and 908]
Many trajectories are smooth and systematic (70, 71, 72, 904, 908).
Other trajectories are scattered, irregular (and could even be curvilinear???) (68, 902, 906).
Overall impression: COG declines over time, but there's some variation in the fit (its quality and shape).
Key question when examining empirical growth plots: What type of population individual growth model might have generated these sample data?
  Linear or curvilinear? Smooth or jagged? Continuous or disjoint?
With just 3 waves of data and many of the empirical growth plots suggesting a linear model would be fine, it makes sense to start with a simple linear individual growth model.
Postulating a simple linear level-1 submodel for individual change: examining its structural and stochastic portions
$$COG_{ij} = \left[\pi_{0i} + \pi_{1i}(AGE_{ij} - 1)\right] + \left[\varepsilon_{ij}\right]$$
Structural portion (first bracket), which embodies our hypothesis about the shape of each person's true trajectory of change over time.
Key assumption: In the population, $COG_{ij}$ is a linear function of child i's AGE on occasion j.
Stochastic portion (second bracket), which allows for the effects of random error from the measurement of person i on occasion j. Usually assume $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$.
i indexes persons (i = 1 to 103); j indexes occasions/periods (j = 1 to 3).
[Figure: individual i's hypothesized true change trajectory, COG vs. AGE (1 to 2), with the deviations $\varepsilon_{i1}$, $\varepsilon_{i2}$, $\varepsilon_{i3}$ marked]
$\pi_{0i}$ is the intercept of i's true change trajectory. Because we have centered AGE at 1, $\pi_{0i}$ is i's true value of COG at AGE=1, his true initial status.
$\pi_{1i}$ is the slope of i's true change trajectory: his yearly rate of change in true COG, his true annual rate of change.
$\varepsilon_{i1}$, $\varepsilon_{i2}$, and $\varepsilon_{i3}$ are the deviations of i's observed scores from his true linear change trajectory on each occasion (including the effects of measurement error & omitted time-varying predictors).
Net result: The individual growth parameters, $\pi_{0i}$ and $\pi_{1i}$, fully describe person i's hypothesized true individual growth trajectory.
Examining fitted OLS trajectories to help suggest a suitable level-2 model
Most children decline over time (although there are a few exceptions).
But there's also great variation in these OLS estimates.
[Figure: stem-and-leaf displays of the fitted initial status, fitted rate of change, and residual variance from the child-by-child OLS fits]
[Figure: fitted OLS trajectories, COG vs. AGE (1 to 2), with the average OLS trajectory across the full sample superimposed]
Average OLS trajectory across the full sample: $\widehat{COG} = 110 - 10(AGE - 1)$
(ALDA, Section 3.2.3, pp. 55-56)
What does this behavior suggest about a suitable level-2 model? The level-2 model must capture both the averages of the individual growth parameters and variation about these averages. And it must also provide a way to represent systematic interindividual differences in change according to variation in predictor(s) (here, PROGRAM participation).
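The exploratory step on this slide can be sketched in Python: fit a separate OLS line to each child's three scores (with AGE centered at 1), then average the fitted intercepts and slopes to get an average OLS trajectory. The scores below are made up and `ols_line`/`fit_all` are hypothetical helper names; with the real data, the slide reports an average trajectory of 110 - 10(AGE - 1). These OLS fits are exploratory summaries, not the multilevel model itself.

```python
from statistics import fmean


def ols_line(x, y):
    """Least-squares intercept and slope for one child's data."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_all(records, ages=(1.0, 1.5, 2.0)):
    """Fit COG = b0 + b1 (AGE - 1) separately for every child."""
    x = [a - 1.0 for a in ages]  # center AGE at 1, so b0 estimates initial status
    return {cid: ols_line(x, cog) for cid, cog in records.items()}


# Made-up scores: child id -> COG at ages 1, 1.5, and 2.
records = {1: [115.0, 108.0, 101.0], 2: [110.0, 98.0, 90.0], 3: [104.0, 106.0, 96.0]}
fits = fit_all(records)
avg_intercept = fmean(b0 for b0, _ in fits.values())
avg_slope = fmean(b1 for _, b1 in fits.values())
print(fits)
print(f"average OLS trajectory: {avg_intercept:.2f} + ({avg_slope:.2f})(AGE - 1)")
```

Looking at the spread of the per-child intercepts and slopes (not just their means) is what motivates the level-2 residuals introduced next.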
Further developing the level-2 submodel for interindividual differences in change
[Figure: fitted OLS trajectories, COG vs. AGE (1 to 2), in separate panels for PROGRAM=0 and PROGRAM=1]
Program participants tend to have:
  Higher scores at age 1 (higher initial status)
  Less steep rates of decline (shallower slopes)
But these are only overall trends; there's great interindividual heterogeneity.
Four desired features of the level-2 submodel(s):
1. Outcomes are the level-1 individual growth parameters, $\pi_{0i}$ and $\pi_{1i}$.
2. Need two level-2 submodels, one per growth parameter (one for initial status, one for change).
3. Each level-2 submodel must specify the relationship between a level-1 growth parameter and predictor(s), here PROGRAM. We need to specify a functional form for these relationships at level-2 (beginning with linear but ultimately becoming more flexible).
4. Each level-2 submodel should allow individuals with common predictor values to nevertheless have different individual change trajectories. We need stochastic variation at level-2, too: each level-2 model will need its own error term, and we will need to allow for covariance across level-2 errors.
Level-2 submodels for systematic interindividual differences in change
For the level-1 intercept (initial status):
$$\pi_{0i} = \gamma_{00} + \gamma_{01}PROGRAM_i + \zeta_{0i}$$
For the level-1 slope (rate of change):
$$\pi_{1i} = \gamma_{10} + \gamma_{11}PROGRAM_i + \zeta_{1i}$$
Key to remembering subscripts on the gammas (the $\gamma$s):
  First subscript indicates role in level-1 model (0 for intercept; 1 for slope)
  Second subscript indicates role in level-2 model (0 for intercept; 1 for slope)
What about the zetas (the $\zeta$s)? They're level-2 residuals that permit the level-1 individual growth parameters to vary stochastically across people. As with most residuals, we're less interested in their values than in their population variances and covariances.
Understanding the stochastic components of the level-2 submodels
$$\pi_{0i} = \gamma_{00} + \gamma_{01}PROGRAM_i + \zeta_{0i} \qquad \pi_{1i} = \gamma_{10} + \gamma_{11}PROGRAM_i + \zeta_{1i}$$
[Figure: population trajectories, COG vs. AGE (1 to 2), in separate panels for PROGRAM=0 and PROGRAM=1]
  PROGRAM=0: average population trajectory, $\gamma_{00} + \gamma_{10}(AGE - 1)$; population trajectory for child i, $(\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i})(AGE - 1)$
  PROGRAM=1: average population trajectory, $(\gamma_{00} + \gamma_{01}) + (\gamma_{10} + \gamma_{11})(AGE - 1)$
Key ideas behind the level-2 models:
  Models posit the existence of an average population trajectory for each program group.
  Because the level-2 models also include residuals (the zetas), each child i has his own true change trajectory (defined by $\pi_{0i}$ and $\pi_{1i}$).
  In the figure, the shading is supposed to suggest the existence of many true population trajectories, one per child.
Assumptions about the level-2 residuals (initial status, rate of change):
$$\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$$
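To make the two levels concrete, here is a minimal Python sketch that generates data from the model: level-2 draws each child's $(\zeta_{0i}, \zeta_{1i})$ from the bivariate normal above (via a 2x2 Cholesky factor), builds $\pi_{0i}$ and $\pi_{1i}$, and level-1 adds $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$ at ages 1, 1.5, and 2. The function name, the alternating assignment of PROGRAM, and the parameter values in the usage line are illustrative assumptions (chosen to resemble the fitted model reported later in this chapter); this is a simulation sketch, not a model-fitting routine.

```python
import math
import random


def simulate_cog(n_children, gammas, var0, var1, cov01, var_eps,
                 ages=(1.0, 1.5, 2.0), seed=None):
    """Draw (child, program, age, cog) rows from the level-1/level-2 model.

    gammas = (g00, g01, g10, g11). Level-2 residuals (zeta0, zeta1) are
    bivariate normal with variances var0, var1 and covariance cov01; the
    level-1 residual is N(0, var_eps), independent of the zetas.
    """
    g00, g01, g10, g11 = gammas
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 level-2 covariance matrix.
    l11 = math.sqrt(var0)
    l21 = cov01 / l11 if l11 > 0 else 0.0
    l22 = math.sqrt(var1 - l21 ** 2)  # ValueError if the matrix is not PSD
    rows = []
    for i in range(n_children):
        program = i % 2  # hypothetical: alternate children between groups
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        zeta0, zeta1 = l11 * z1, l21 * z1 + l22 * z2
        pi0 = g00 + g01 * program + zeta0  # level-2 model for initial status
        pi1 = g10 + g11 * program + zeta1  # level-2 model for rate of change
        for age in ages:
            eps = rng.gauss(0.0, math.sqrt(var_eps))
            rows.append((i, program, age, pi0 + pi1 * (age - 1.0) + eps))  # level-1
    return rows


rows = simulate_cog(4, (107.84, 6.85, -21.13, 5.27), var0=124.64, var1=12.29,
                    cov01=-36.41, var_eps=74.24, seed=2024)
for row in rows:
    print(row)
```

Setting all variances to zero collapses every child onto the average population trajectory for his group, which is a handy sanity check on the structural part.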
Fitting the multilevel model for change to data
Three general types of software options (whose numbers are increasing over time):
  Programs expressly designed for multilevel modeling
  Multipurpose packages with multilevel modeling modules
  Specialty packages originally designed for another purpose that can also fit some multilevel models
[Figure: logos of example packages in each category, including MLwiN and aML]
Two sets of issues to consider when comparing (and selecting) packages:
8 practical considerations (that affect ease of use/pedagogic value):
  Data input options: level-1/level-2 vs. person-period; raw data or xyz.dataset
  Programming options: graphical interfaces and/or scripts
  Availability of other statistical procedures
  Model specification options: level-1/level-2 vs. composite; random effects
  Automatic centering options
  Wisdom of program's defaults
  Documentation & user support
  Quality of output: text & graphics
8 technical considerations (that affect research value):
  # of levels that can be handled
  Range of assumptions supported (for the outcomes & effects)
  Types of designs supported (e.g., cross-nested designs; latent variables)
  Estimation routines: full vs. restricted; ML vs. GLS (more on this later)
  Ability to handle design weights
  Quality and range of diagnostics
  Speed
  Strategies for handling estimation problems (e.g., boundary constraints)
Advice: Use whatever package you'd like, but be sure to invest the time and energy to learn to use it well. Visit http://www.ats.ucla.edu/stat/examples/alda for data, code in the major packages, and more.
Examining estimated fixed effects
Fitted model for initial status:
$$\hat{\pi}_{0i} = 107.84 + 6.85\,PROGRAM_i$$
Fitted model for rate of change:
$$\hat{\pi}_{1i} = -21.13 + 5.27\,PROGRAM_i$$
In the population from which this sample was drawn we estimate that:
  True initial status (COG at age 1) for the average non-participant is 107.84; for the average participant, it is 6.85 higher.
  True annual rate of change for the average non-participant is -21.13; for the average participant, it is 5.27 higher.
Advice: As you're learning these methods, take the time to actually write out the fitted level-1/level-2 models before interpreting computer output. It's the best way to learn what you're doing!
Plotting prototypical change trajectories
General idea: Substitute prototypical values for the level-2 predictors (here, just PROGRAM=0 or 1) into the fitted models.
$$\hat{\pi}_{0i} = 107.84 + 6.85\,PROGRAM_i \qquad \hat{\pi}_{1i} = -21.13 + 5.27\,PROGRAM_i$$
PROGRAM = 0: $\hat{\pi}_{0i} = 107.84 + 6.85(0) = 107.84$ and $\hat{\pi}_{1i} = -21.13 + 5.27(0) = -21.13$, so
$$\widehat{COG} = 107.84 - 21.13(AGE - 1)$$
PROGRAM = 1: $\hat{\pi}_{0i} = 107.84 + 6.85(1) = 114.69$ and $\hat{\pi}_{1i} = -21.13 + 5.27(1) = -15.86$, so
$$\widehat{COG} = 114.69 - 15.86(AGE - 1)$$
[Figure: the two prototypical trajectories, COG vs. AGE (1 to 2), for PROGRAM = 0 and PROGRAM = 1]
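The substitution above is easy to automate. A small Python sketch, assuming the fitted fixed effects from this slide are stored as a tuple named `GAMMA` (a hypothetical name) in the order $(\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11})$:

```python
# Fitted fixed effects from the slide: (gamma00, gamma01, gamma10, gamma11).
GAMMA = (107.84, 6.85, -21.13, 5.27)


def prototypical(program, gamma=GAMMA):
    """Return (initial status, annual rate of change) for a PROGRAM value."""
    g00, g01, g10, g11 = gamma
    return g00 + g01 * program, g10 + g11 * program


def predicted_cog(program, age, gamma=GAMMA):
    """Fitted COG on the prototypical trajectory, with AGE centered at 1."""
    pi0, pi1 = prototypical(program, gamma)
    return pi0 + pi1 * (age - 1.0)


for program in (0, 1):
    pi0, pi1 = prototypical(program)
    points = [round(predicted_cog(program, a), 2) for a in (1.0, 1.5, 2.0)]
    print(f"PROGRAM={program}: COG = {pi0:.2f} + ({pi1:.2f})(AGE - 1) -> {points}")
```

The same two functions work unchanged for any other prototypical predictor value, which is convenient once the level-2 models contain more than a single dummy.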
Tentative conclusion: Program participants appear to have higher initial status and slower rates of decline. Question: Might these differences be due to nothing more than sampling variation?
Testing hypotheses about fixed effects using single parameter tests
General formulation, where ase is the asymptotic standard error:
$$z = \frac{\hat{\gamma}}{\mathrm{ase}(\hat{\gamma})}$$
For initial status:
  Average non-participant had a non-zero level of COG at age 1 (surprise!)
  Program participants had higher initial status, on average, than non-participants (probably because the intervention had already started)
For rate of change:
  Average non-participant had a non-zero rate of decline (depressing)
  Program participants had slower rates of decline, on average, than non-participants (the program effect)
(ALDA, Section 3.5.2, pp. 71-72)
Careful: Most programs provide appropriate tests, but different programs use different terminology. Terms like z-statistic, t-statistic, t-ratio, and quasi-t-statistic (which are not the same) are used interchangeably.
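A minimal Python sketch of the single-parameter test, assuming a standard normal reference distribution (some programs instead compare the ratio to a t distribution, one reason the terminology differs). The function name `wald_z` and the standard error in the usage line are hypothetical; the slide does not report the standard errors.

```python
import math


def wald_z(estimate, ase):
    """Return (z, two-sided p) for H0: parameter = 0, using a normal reference."""
    z = estimate / ase
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# 6.85 is the PROGRAM effect on initial status; 2.5 is a made-up ase.
z, p = wald_z(6.85, 2.5)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```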
Examining estimated variance components
General idea: Variance components quantify the amount of residual variation left, at either level-1 or level-2, that is potentially explainable by other predictors not yet in the model. Interpretation is easiest when comparing different models that each have different predictors (which we will do in the next unit).
Level-1 residual variance ($\hat{\sigma}_\varepsilon^2 = 74.24^{***}$):
  Summarizes within-person variability in outcomes around individuals' own trajectories (usually non-zero)
  Here, we conclude there is some within-person residual variability
  If we had time-varying predictors, they might be able to explain some of this within-person residual variability
Level-2 residual variance:
$$\begin{bmatrix}\hat{\sigma}_0^2 & \hat{\sigma}_{01}\\ \hat{\sigma}_{10} & \hat{\sigma}_1^2\end{bmatrix} = \begin{bmatrix}124.64^{***} & -36.41\\ -36.41 & 12.29\end{bmatrix}$$
  Summarizes between-person variability in change trajectories (here, initial status and growth rates) after controlling for predictor(s) (here, PROGRAM)
  There are still statistically significant differences in true initial status after controlling for program (124.64***)
  There is no statistically significant residual variance in rates of change to be explained (12.29); it's probably of little use to add substantive predictors of change
  The residual covariance between initial status and rates of change (-36.41) is not statistically significant
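Variances are in squared COG units, so one way to read them is to take square roots and to rescale the covariance into a correlation. A Python sketch using the estimates above; the helper name is hypothetical, and since neither $\hat{\sigma}_1^2$ nor $\hat{\sigma}_{01}$ is statistically significant, the resulting correlation should be read cautiously.

```python
import math


def summarize_level2(var0, var1, cov01):
    """Convert level-2 variance components to SDs and a correlation."""
    sd0, sd1 = math.sqrt(var0), math.sqrt(var1)
    return sd0, sd1, cov01 / (sd0 * sd1)


sd0, sd1, r01 = summarize_level2(124.64, 12.29, -36.41)
print(f"SD(initial status) = {sd0:.2f}, SD(rate of change) = {sd1:.2f}, r = {r01:.2f}")
print(f"level-1 residual SD = {math.sqrt(74.24):.2f}")
```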
Doing data analysis with the multilevel model for change

ALDA, Chapter Four
"We are restless because of incessant change, but we would be frightened if change were stopped." (Lyman Bryson)
Judith D. Singer & John B. Willett Harvard Graduate School of Education
Chapter 4: Doing data analysis with the multilevel model for change
General Approach: Once again, we'll go through a worked example, but now we'll delve into the practical data analytic details.
  Composite specification of the multilevel model for change (4.2) and how it relates to the level-1/level-2 specification just introduced
  First steps: unconditional means model and unconditional growth model (4.4)
    Intraclass correlation
    Quantifying proportion of outcome variation explained
  Practical model building strategies (4.5)
    Developing and fitting a taxonomy of models
    Displaying prototypical change trajectories
    Recentering to improve interpretation
  Comparing models (4.6)
    Using deviance statistics
    Using information criteria (AIC and BIC)
Illustrative example: The effects of parental alcoholism on adolescent alcohol use
Data source: Pat Curran and colleagues (1997) Journal of Consulting and Clinical Psychology.
Sample: 82 adolescents
  37 are children of an alcoholic parent (COAs)
  45 are non-COAs
Each was assessed 3 times: at ages 14, 15, and 16
At age 14, PEER, a measure of peer alcohol use, was also gathered
The outcome, ALCUSE, was computed as follows:
  4 items: (1) drank beer/wine; (2) hard liquor; (3) 5 or more drinks in a row; and (4) got drunk
  Each item was scored on an 8 point scale (0=not at all to 7=every day)
  ALCUSE is the square root of the sum of these 4 items
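The ALCUSE computation can be written as a short Python function. The argument names are hypothetical labels for the four items; the 0 to 7 range check follows the 8-point scale described above.

```python
import math

ITEM_MIN, ITEM_MAX = 0, 7  # 0 = not at all ... 7 = every day


def alcuse(beer_wine, hard_liquor, five_plus, got_drunk):
    """ALCUSE: square root of the sum of the four 8-point alcohol items."""
    items = (beer_wine, hard_liquor, five_plus, got_drunk)
    for value in items:
        if not ITEM_MIN <= value <= ITEM_MAX:
            raise ValueError(f"item score {value} outside {ITEM_MIN}-{ITEM_MAX}")
    return math.sqrt(sum(items))


print(alcuse(0, 0, 0, 0))  # 0.0
print(alcuse(1, 1, 1, 1))  # 2.0
print(alcuse(7, 7, 7, 7))  # maximum possible score, sqrt(28)
```

So ALCUSE runs from 0 to about 5.3; the square root compresses the high end of the summed scale.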
Research question: Do trajectories of adolescent alcohol use differ by (1) parental alcoholism and (2) peer alcohol use?
What's an appropriate functional form for the level-1 submodel?
(Examining empirical growth plots with superimposed OLS trajectories)
3 features of these plots:
  1. Most seem approximately linear (but not always increasing over time)
  2. Some OLS trajectories fit well (23, 32, 56, 65)
  3. Other OLS trajectories show more scatter (04, 14, 41, 82)
A linear model makes sense:
$$ALCUSE_{ij} = \pi_{0i} + \pi_{1i}(AGE_{ij} - 14) + \varepsilon_{ij}, \quad \text{where } \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$$
or, more generally, with $TIME_{ij} = AGE_{ij} - 14$ here:
$$Y_{ij} = \pi_{0i} + \pi_{1i}TIME_{ij} + \varepsilon_{ij}$$
  $\pi_{0i}$ is i's true initial status (i.e., when TIME=0)
  $\pi_{1i}$ is i's true rate of change per unit of TIME
  $\varepsilon_{ij}$ is the portion of i's outcome that is unexplained on occasion j
(ALDA, Section 4.1, pp. 76-80)
Specifying the level-2 submodels for individual differences in change Specifying the level-2 submodels for individual differences in change
Examining variation in OLS-fitted Examining variation in OLS-fitted level-1 trajectories by: level-1 trajectories by:
COA: COAs have higher intercepts but no COA: COAs have higher intercepts but no steeper slopes steeper slopes PEER (split at mean): Teens whose friends at PEER (split at mean): Teens whose friends at age 14 drink more have higher intercepts but age 14 drink more have higher intercepts but shallower slopes shallower slopes
[Figure: OLS-fitted trajectories of ALCUSE (-1 to 4) against AGE (13 to 17), in panels for COA = 0 vs. COA = 1 and for Low PEER vs. High PEER]
$\pi_{0i} = \gamma_{00} + \gamma_{01}\text{COA}_i + \zeta_{0i}$ (for initial status)
$\pi_{1i} = \gamma_{10} + \gamma_{11}\text{COA}_i + \zeta_{1i}$ (for rate of change)
Level-2 intercepts ($\gamma_{00}$, $\gamma_{10}$): population average initial status and rate of change for a non-COA
Level-2 slopes ($\gamma_{01}$, $\gamma_{11}$): effect of COA on initial status and rate of change
Level-2 residuals: $\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$: deviations of individual change trajectories around predicted averages
(ALDA, Section 4.1, pp.76-80)
Developing the composite specification of the multilevel model for change
by substituting the level-2 submodels into the level-1 individual growth model
$\pi_{0i} = \gamma_{00} + \gamma_{01}\text{COA}_i + \zeta_{0i}$
$\pi_{1i} = \gamma_{10} + \gamma_{11}\text{COA}_i + \zeta_{1i}$
$Y_{ij} = \pi_{0i} + \pi_{1i}\,\text{TIME}_{ij} + \varepsilon_{ij}$
$Y_{ij} = (\gamma_{00} + \gamma_{01}\text{COA}_i + \zeta_{0i}) + (\gamma_{10} + \gamma_{11}\text{COA}_i + \zeta_{1i})\,\text{TIME}_{ij} + \varepsilon_{ij}$
$Y_{ij} = [\gamma_{00} + \gamma_{10}\text{TIME}_{ij} + \gamma_{01}\text{COA}_i + \gamma_{11}(\text{COA}_i \times \text{TIME}_{ij})] + [\zeta_{0i} + \zeta_{1i}\text{TIME}_{ij} + \varepsilon_{ij}]$
The composite specification shows how the outcome depends simultaneously on:
the level-1 predictor TIME and the level-2 predictor COA, as well as the cross-level interaction, COA×TIME. This tells us that the effect of one predictor (TIME) differs by the levels of another predictor (COA).
The composite specification also:
Demonstrates the complexity of the composite residual: this is not regular OLS regression
Is the specification used by most software packages for multilevel modeling
Is the specification that maps most easily onto the person-period data set
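The substitution step can be checked numerically: computing Y through the two level-2 submodels and the level-1 model gives the same value as the composite form. A minimal sketch; the fixed effects and residual draws below are hypothetical numbers used only to exercise the algebra.

```python
import itertools
import math


def via_submodels(g, coa, zeta0, zeta1, time, eps):
    """Level-2 submodels feed the level-1 individual growth model."""
    g00, g01, g10, g11 = g
    pi0 = g00 + g01 * coa + zeta0      # initial status
    pi1 = g10 + g11 * coa + zeta1      # rate of change
    return pi0 + pi1 * time + eps


def via_composite(g, coa, zeta0, zeta1, time, eps):
    """Composite specification: structural part plus composite residual."""
    g00, g01, g10, g11 = g
    structural = g00 + g10 * time + g01 * coa + g11 * (coa * time)
    composite_residual = zeta0 + zeta1 * time + eps
    return structural + composite_residual


# Hypothetical fixed effects (gamma00, gamma01, gamma10, gamma11).
gammas = (0.3, 0.7, 0.3, -0.05)
for coa, time in itertools.product((0, 1), (0, 1, 2)):
    a = via_submodels(gammas, coa, 0.2, -0.1, time, 0.05)
    b = via_composite(gammas, coa, 0.2, -0.1, time, 0.05)
    assert math.isclose(a, b), (coa, time)
print("both specifications agree")
```

The composite residual contains the term zeta1 * TIME, so its variance changes with TIME and it is shared across a person's occasions; that is why ordinary OLS, with its single independent, homoscedastic error, does not apply.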
The person-period data set and its relationship to the composite specification
ID   ALCUSE   AGE-14   COA   COA*(AGE-14)
3    1.00     0        1     0
3    2.00     1        1     1
3    3.32     2        1     2
4    0.00     0        1     0
4    2.00     1        1     1
4    1.73     2        1     2
44   0.00     0        0     0
44   1.41     1        0     0
44   3.00     2        0     0
66   1.41     0        0     0
66   3.46     1        0     0
66   3.00     2        0     0
$\text{ALCUSE}_{ij} = [\gamma_{00} + \gamma_{10}(\text{AGE}_{ij} - 14) + \gamma_{01}\text{COA}_i + \gamma_{11}(\text{COA}_i \times (\text{AGE}_{ij} - 14))] + [\zeta_{0i} + \zeta_{1i}(\text{AGE}_{ij} - 14) + \varepsilon_{ij}]$
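Building the person-period data set usually means reshaping one-row-per-person data into one row per occasion and adding the cross-level interaction column. A minimal sketch, assuming a hypothetical person-level layout with columns ALCUSE14 to ALCUSE16; the two teens reproduce rows for IDs 4 and 66 in the table above.

```python
def to_person_period(person_level, ages=(14, 15, 16)):
    """Reshape one-row-per-person records into one-row-per-occasion records."""
    rows = []
    for person in person_level:
        for age in ages:
            time = age - 14  # TIME = AGE - 14, so TIME = 0 at age 14
            rows.append({
                "ID": person["ID"],
                "ALCUSE": person[f"ALCUSE{age}"],
                "AGE_14": time,
                "COA": person["COA"],
                # Cross-level interaction column used by the composite model.
                "COA_x_AGE_14": person["COA"] * time,
            })
    return rows


# Teens 4 and 66 from the table above, in a hypothetical person-level layout.
person_level = [
    {"ID": 4, "COA": 1, "ALCUSE14": 0.00, "ALCUSE15": 2.00, "ALCUSE16": 1.73},
    {"ID": 66, "COA": 0, "ALCUSE14": 1.41, "ALCUSE15": 3.46, "ALCUSE16": 3.00},
]

for row in to_person_period(person_level):
    print(row)
```

Each column of the resulting rows lines up with one term of the composite specification, which is what makes it easy to hand to multilevel software.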
Words of advice before beginning data analysis
Be sure you've examined empirical growth plots and fitted OLS trajectories. You don't want to begin data analysis without being reasonably confident that you have a sound level-1 model.
Double check (and then triple check) your person-period data set.
Run simple diagnostics using statistical programs with which you're very comfortable. Once again, you don't want to invest too much data analytic effort in a mis-formed data set.
Don't jump in by fitting a range of models with substantive predictors. Yes, you want to know the answer, but first you need to understand how the data behave, so instead you should begin with:
First steps: Two unconditional models
1. Unconditional means model: a model with no predictors at either level, which will help partition the total outcome variation
2. Unconditional growth model: a model with TIME as the only level-1 predictor and no substantive predictors at level 2, which will help evaluate the baseline amount of change
What these unconditional models tell us:
1. Whether there is systematic variation in the outcome worth exploring and, if so, where that variation lies (within or between people)
2. How much total variation there is both within- and between-persons, which provides a baseline for evaluating the success of subsequent model building (that includes substantive predictors)
(ALDA, Section 4.4, p. 92+)
The Unconditional Means Model (Model A): Partitioning total outcome variation between and within persons
Level-1 Model: $Y_{ij} = \pi_{0i} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$, where $\zeta_{0i} \sim N(0, \sigma_0^2)$
Composite Model: $Y_{ij} = \gamma_{00} + \zeta_{0i} + \varepsilon_{ij}$
$\gamma_{00}$: grand mean across individuals and occasions
$\pi_{0i}$: person-specific means
$\varepsilon_{ij}$: within-person deviations
$\sigma_\varepsilon^2$: within-person variance
$\sigma_0^2$: between-person variance
Let's look more closely at these variances.

(ALDA, Section 4.4.1, p. 92-97)
Using the unconditional means model to estimate the Intraclass Correlation Coefficient (ICC or ρ)
Major purpose of the unconditional means model: To partition the variation in Y into two components
Estimated within-person variance $\hat\sigma_\varepsilon^2$: Quantifies the amount of variation within individuals over time
Estimated between-person variance $\hat\sigma_0^2$: Quantifies the amount of variation between individuals, regardless of time
Intraclass correlation compares the relative magnitude of these VCs by estimating the proportion of total variation in Y that lies between people:
$\hat\rho = \frac{\hat\sigma_0^2}{\hat\sigma_0^2 + \hat\sigma_\varepsilon^2} = \frac{0.564}{0.564 + 0.562} = 0.50$
An estimated 50% of the total variation in alcohol use is attributable to differences between adolescents
Having partitioned the total variation into within-persons and between-persons, let's ask: What role does TIME play?
(ALDA, Section 4.4.1, p. 92-97)
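The ICC arithmetic above, as a small Python helper. A minimal sketch; the function name is hypothetical and the two inputs are the Model A variance component estimates quoted above.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Proportion of total outcome variance that lies between people."""
    return between_var / (between_var + within_var)


# Model A estimates: between-person 0.564, within-person 0.562.
rho_hat = intraclass_correlation(between_var=0.564, within_var=0.562)
print(f"{rho_hat:.2f}")  # 0.50
```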
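The Unconditional Growth Model (Model B): A baseline model for change over time
Level-1 Model: $Y_{ij} = \pi_{0i} + \pi_{1i}\,\text{TIME}_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Composite Model: $Y_{ij} = \gamma_{00} + \gamma_{10}\,\text{TIME}_{ij} + [\zeta_{0i} + \zeta_{1i}\,\text{TIME}_{ij} + \varepsilon_{ij}]$, where the bracketed term is the composite residual
$\gamma_{00}$: average true initial status at AGE 14
$\gamma_{10}$: average true rate of change
Fitted average trajectory: $\widehat{\text{ALCUSE}} = 0.651 + 0.271(\text{AGE} - 14)$
[Figure: fitted average ALCUSE trajectory against AGE, 13 to 17]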
What about the variance components from this unconditional growth model?
(ALDA, Section 4.4.2, pp 97-102)
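Plugging ages into the fitted average trajectory gives the population-average ALCUSE predicted by Model B. A minimal sketch; the function name is hypothetical and the default arguments are simply the two fixed-effect estimates quoted above.

```python
def model_b_fitted(age: float, gamma00: float = 0.651, gamma10: float = 0.271) -> float:
    """Average fitted ALCUSE under the unconditional growth model, with TIME = AGE - 14."""
    return gamma00 + gamma10 * (age - 14)


for age in (14, 15, 16):
    print(age, round(model_b_fitted(age), 3))
```

At age 16 this gives 0.651 + 2(0.271) = 1.193; individual teens deviate from this line through their own residuals in initial status and rate of change.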
The unconditional growth model: Interpreting the variance components
Level-1 (within person): There is still unexplained within-person residual variance
Level-2 (between persons):
There is between-person residual variance in initial status (but careful, because the definition of initial status has changed)
There is between-person residual variance in rate of change (should consider adding a level-2 predictor)
Estimated residual covariance between initial status and change is n.s.
So what has been the effect of moving from an unconditional means model to an unconditional growth model?
(ALDA, Section 4.4.2, pp 97-102)
Quantifying the proportion of outcome variation explained
$R^2$ = proportional reduction in the Level-1 variance component $= \frac{0.562 - 0.337}{0.562} = 0.40$
40% of the within-person variation in ALCUSE is associated with linear time
$R^2_{Y,\hat Y} = (r_{Y,\hat Y})^2$, with $r_{Y,\hat Y} \approx 0.21$, giving $R^2_{Y,\hat Y} \approx 0.043$
4.3% of the total variation in ALCUSE is associated with linear time

For later: Extending the idea of proportional reduction in variance components to Level-2 (to estimate the percentage of between-person variation in ALCUSE associated with predictors)
$\text{Pseudo-}R^2 = \frac{\hat\sigma^2(\text{UncondGrowthModel}) - \hat\sigma^2(\text{LaterGrowthModel})}{\hat\sigma^2(\text{UncondGrowthModel})}$
Careful: Don't do this comparison with the unconditional means model (as you can see in this table!).
(ALDA, Section 4.4.3, pp 102-104)
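The level-1 R² and the level-2 pseudo-R² above are the same proportional-reduction formula applied to different variance components. A minimal sketch; the helper name is hypothetical. For a level-2 pseudo-R², pass the unconditional growth model's estimate as `var_before`, following the caution above.

```python
def proportional_reduction(var_before: float, var_after: float) -> float:
    """Pseudo-R^2: proportional reduction in a variance component from one model to the next."""
    if var_before <= 0:
        raise ValueError("the baseline variance component must be positive")
    return (var_before - var_after) / var_before


# Level-1 (within-person) variance component: 0.562 in Model A, 0.337 in Model B.
print(f"{proportional_reduction(0.562, 0.337):.2f}")  # 0.40
```

Nothing forces the result to be positive: if a variance component grows when predictors are added, the formula returns a negative value.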
Where we've been and where we're going
What these unconditional models tell us:
1. About half the total variation in ALCUSE is attributable to differences among teens
2. About 40% of the within-teen variation in ALCUSE is explained by linear TIME
3. There is significant variation in both initial status and rate of change, so it pays to explore substantive predictors (COA & PEER)
How do we build statistical models?

Use all the intuition and skill you bring from the cross-sectional world

But because the data are longitudinal, we have some other options:
Multiple level-2 outcomes (the individual growth parameters): each can be related separately to predictors
Two kinds of effects being modeled: fixed effects and variance components
Not all effects are required in every model
Examine the effect of each predictor separately
Prioritize the predictors: focus on your question predictors; include interesting and important control predictors
Progress towards a final model whose interpretation addresses your research questions
(ALDA, Section 4.5.1, pp 105-106)
What will our analytic strategy be?
Because our research interest focuses on the effect of COA, essentially treating PEER as a control, we're going to proceed as follows:
Model C: COA predicts both initial status and rate of change.
Model D: Adds PEER to both Level-2 sub-models in Model C.
Model E: Simplifies Model D by removing the non-significant effect of COA on change.
(ALDA, Section 4.5.1, pp 105-106)
Model C: Assessing the uncontrolled effects of COA (the question predictor)
Fixed effects
Est. initial value of ALCUSE for non-COAs is 0.316 (p<.001)
Est. differential in initial ALCUSE between COAs and non-COAs is 0.743 (p<.001)
Est. annual rate of change in ALCUSE for non-COAs is 0.293 (p<.001)
Est. differential in annual rate of change between COAs and non-COAs is -0.049 (ns)
Variance components
Within-person VC is identical to B's because no level-1 predictors were added
Initial status VC declines from B: COA explains 22% of variation in initial status (but still stat. sig., suggesting need for level-2 predictors)
Rate of change VC unchanged from B: COA explains no variation in change (but also still sig., suggesting need for level-2 predictors)
Next step?
Remove COA? Not yet: it is the question predictor.
Add PEER? Yes, to examine controlled effects of COA.
(ALDA, Section 4.5.2, pp 107-108)
Model D: Assessing the controlled effects of COA (the question predictor)
Fixed effects of COA
Est. diff in ALCUSE between COAs and non-COAs, controlling for PEER, is 0.579 (p<.001)
No sig. difference in rate of change
Fixed effects of PEER
Teens whose peers drink more at 14 also drink more at 14 (initial status)
Modest neg. effect on rate of change (p<.10)
Variance components
Within-person VC unchanged (as expected)
Still sig. variation in both initial status and change: need other level-2 predictors
Taken together, PEER and COA explain
61.4% of the variation in initial status
7.9% of the variation in rates of change
Next step?
If we had other predictors, we'd add them because the VCs are still significant.
Simplify the model? Since COA is not associated with rate of change, why not remove this term from the model?
(ALDA, Section 4.5.2, pp 108-109)
Model E: Removing the non-significant effect of COA on rate of change
Fixed effects of COA
Controlling for PEER, the estimated diff in ALCUSE between COAs and non-COAs is 0.571 (p<.001)
Fixed effects of PEER
Controlling for COA, for each 1-pt difference in PEER, initial ALCUSE is 0.695 higher (p<.001) but rate of change in ALCUSE is 0.151 lower (p<.10)
Variance components are unchanged, suggesting little is lost by eliminating the main effect of COA on rate of change (although there is still level-2 variance left to be predicted by other variables)
Partial covariance is indistinguishable from 0: after controlling for PEER and COA, initial status and rate of change are unrelated
(ALDA, Section 4.5.2, pp 109-110)
Where we've been and where we're going
Let's call Model E our tentative final model (based on not just these results but many other analyses not shown here):
Controlling for the effects of PEER, the estimated differential in ALCUSE between COAs and non-COAs is 0.571 (p<.001)
Controlling for the effects of COA, for each 1-pt difference in PEER: the average initial ALCUSE is 0.695 higher (p<.001) and average rate of change is 0.151 lower (p<.10)
Still to come:
Displaying prototypical trajectories
Recentering predictors to improve interpretation
Alternative strategies for hypothesis testing: comparing models using deviance statistics and information criteria
Additional comments about estimation
(ALDA, Section 4.5.1, pp 105-106)
Displaying analytic results: Constructing prototypical fitted plots
Key idea: Substitute prototypical values for the predictors into the fitted models to yield prototypical fitted growth trajectories
Review of the basic approach (with one dichotomous predictor)
Model C: $\hat\pi_{0i} = 0.316 + 0.743\,\text{COA}_i$ and $\hat\pi_{1i} = 0.293 - 0.049\,\text{COA}_i$
1. Substitute observed values for COA (0 and 1)
When $\text{COA}_i = 0$: $\hat\pi_{0i} = 0.316 + 0.743(0) = 0.316$ and $\hat\pi_{1i} = 0.293 - 0.049(0) = 0.293$
When $\text{COA}_i = 1$: $\hat\pi_{0i} = 0.316 + 0.743(1) = 1.059$ and $\hat\pi_{1i} = 0.293 - 0.049(1) = 0.244$
2. Substitute the estimated growth parameters into the level-1 growth model
When $\text{COA}_i = 0$: $\hat Y_{ij} = 0.316 + 0.293\,\text{TIME}_{ij}$
When $\text{COA}_i = 1$: $\hat Y_{ij} = 1.059 + 0.244\,\text{TIME}_{ij}$
[Figure: prototypical fitted ALCUSE trajectories for COA = 0 and COA = 1, AGE 13 to 17]
What happens when the predictors aren't all dichotomous?
(ALDA, Section 4.5.3, pp 110-113)
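The two substitution steps above translate directly into code. A minimal sketch using the Model C estimates; the function name is hypothetical, and TIME is coded AGE - 14.

```python
def model_c_parameters(coa):
    """Step 1: fitted (initial status, rate of change) for a prototypical teen under Model C."""
    pi0_hat = 0.316 + 0.743 * coa
    pi1_hat = 0.293 - 0.049 * coa
    return pi0_hat, pi1_hat


for coa in (0, 1):
    pi0_hat, pi1_hat = model_c_parameters(coa)
    # Step 2: substitute into the level-1 model, TIME = AGE - 14.
    fitted = [round(pi0_hat + pi1_hat * (age - 14), 3) for age in (14, 15, 16)]
    print(f"COA={coa}: intercept={pi0_hat:.3f}, slope={pi1_hat:.3f}, fitted={fitted}")
```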
Constructing prototypical fitted plots when some predictors are continuous
Key idea: Select interesting values of continuous predictors and plot prototypical trajectories by selecting:
1. Substantively interesting values. This is easiest when the predictor has inherently appealing values (e.g., 8, 12, and 16 years of education in the US)
2. A range of percentiles. When there are no well-known values, consider using a range of percentiles (either the 25th, 50th, and 75th or the 10th, 50th, and 90th)
3. The sample mean ± .5 (or 1) standard deviation. Best used with predictors with a symmetric distribution
4. The sample mean (on its own). If you don't want to display a predictor's effect but just control for it, use just its sample mean
Remember that exposition can be easier if you select whole number values (if the scale permits) or easily communicated fractions.
PEER: mean = 1.018, sd = 0.726
Low PEER: 1.018 - .5(0.726) = 0.655
High PEER: 1.018 + .5(0.726) = 1.381
Model E, intercepts for plotting: $\hat\pi_{0i} = -0.314 + 0.695\,\text{PEER}_i + 0.571\,\text{COA}_i$
Model E, slopes for plotting: $\hat\pi_{1i} = 0.425 - 0.151\,\text{PEER}_i$
[Figure: prototypical fitted ALCUSE trajectories, AGE 13 to 17, for COA = 0 and COA = 1, each at low and high PEER]
(ALDA, Section 4.5.3, pp 110-113)
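The same recipe with a continuous predictor: pick low and high PEER values, then substitute them alongside COA = 0 and 1 to get four prototypical trajectories. A minimal sketch; names are hypothetical. The code uses an initial-status intercept of -0.314, and that sign is worth double-checking against ALDA: it is the value for which centering PEER and COA at their sample means reproduces Model B's average initial status (-0.314 + 0.695 × 1.018 + 0.571 × 0.451 ≈ 0.651).

```python
PEER_MEAN, PEER_SD = 1.018, 0.726


def model_e_parameters(peer, coa):
    """Fitted (initial status, rate of change) under Model E.

    The initial-status intercept is taken as -0.314 (an assumption; see the lead-in).
    """
    pi0_hat = -0.314 + 0.695 * peer + 0.571 * coa
    pi1_hat = 0.425 - 0.151 * peer
    return pi0_hat, pi1_hat


# Prototypical PEER values: sample mean -/+ half a standard deviation.
peer_values = {"low": PEER_MEAN - 0.5 * PEER_SD, "high": PEER_MEAN + 0.5 * PEER_SD}

for coa in (0, 1):
    for label, peer in peer_values.items():
        pi0_hat, pi1_hat = model_e_parameters(peer, coa)
        print(f"COA={coa}, {label} PEER ({peer:.3f}): "
              f"intercept={pi0_hat:.3f}, slope={pi1_hat:.3f}")
```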
How can centering predictors improve the interpretation of their effects?
At level-1, re-centering TIME is usually beneficial:
Ensures that the individual intercepts are easily interpretable, corresponding to status at a specific age
Often use initial status, but as we'll see, we can center TIME on any sensible value
At level-2, you can re-center by subtracting out:
The sample mean, which causes the level-2 intercepts to represent average fitted values (mean PEER = 1.018; mean COA = 0.451)
Another meaningful value, e.g., 12 yrs of ed, IQ of 100
Model F centers only PEER; Model G centers PEER and COA
Many estimates are unaffected by centering
As expected, centering the level-2 predictors changes the level-2 intercepts
F's intercepts describe an average non-COA; G's intercepts describe an average teen
(ALDA, Section 4.5.4, pp 113-116)
Our preference: Here we prefer model F because it leaves the dichotomous question predictor COA uncentered
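Because centering a level-2 predictor is a linear reparameterization of the fixed effects, the recentered intercepts can be computed from the uncentered estimates: each intercept absorbs slope × centering constant, and the slopes stay put. A minimal sketch, assuming Models F and G are recentered versions of Model E and taking Model E's initial-status intercept as -0.314; names are hypothetical, and refitted models should agree with these values only up to rounding.

```python
# Model E fixed effects (PEER and COA uncentered); the -0.314 sign is an assumption.
G00, G_PEER0, G_COA0 = -0.314, 0.695, 0.571   # initial-status submodel
G10, G_PEER1 = 0.425, -0.151                  # rate-of-change submodel
PEER_MEAN, COA_MEAN = 1.018, 0.451


def recentered_intercepts(center_peer, center_coa):
    """Level-2 intercepts after subtracting the given centering constants.

    Centering a predictor at c shifts the intercept by slope * c and
    leaves every slope unchanged.
    """
    init = G00 + G_PEER0 * center_peer + G_COA0 * center_coa
    rate = G10 + G_PEER1 * center_peer   # COA does not enter the rate submodel in Model E
    return init, rate


model_f = recentered_intercepts(PEER_MEAN, 0.0)        # PEER centered, COA uncentered
model_g = recentered_intercepts(PEER_MEAN, COA_MEAN)   # both centered
print("F-style intercepts:", [round(v, 3) for v in model_f])
print("G-style intercepts:", [round(v, 3) for v in model_g])
```

The G-style values (about 0.651 and 0.271) line up with Model B's average trajectory, consistent with G's intercepts describing an average teen.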
Hypothesis testing: What we've been doing and an alternative approach
Single parameter hypothesis tests:
Simple to conduct and easy to interpret, making them very useful in hands-on data analysis (as we've been doing)
However, statisticians disagree about their nature, form, and effectiveness
Disagreement is so strong that some software packages (e.g., MLwiN) won't output them
Their behavior is poorest for tests on variance components
Deviance-based hypothesis tests:
Based on the log likelihood (LL) statistic that is maximized under Maximum Likelihood estimation
Have superior statistical properties (compared to the single parameter tests)
Special advantage: permit joint tests on several parameters simultaneously
You need to do the tests manually because automatic tests are rarely what you want
$\text{Deviance} = -2[\text{LL}_{\text{current model}} - \text{LL}_{\text{saturated model}}]$
Quantifies how much worse the current model is in comparison to a saturated model
A model with a small deviance statistic is nearly as good; a model with a large deviance statistic is much worse (we obviously prefer models with smaller deviance)
Simplification: Because a saturated model fits perfectly, its LL = 0 and the second term drops out, making $\text{Deviance} = -2\,\text{LL}_{\text{current model}}$
(ALDA, Section 4.6, p 116)
Hypothesis testing using Deviance statistics
You can use deviance statistics to compare two models if two criteria are satisfied:
1. Both models are fit to the same exact data (beware missing data)
2. One model is nested within the other: we can specify the less complex model (e.g., A) by imposing constraints on one or more parameters in the more complex model (e.g., B), usually, but not always, setting them to 0
If these conditions hold, then:
The difference in the two deviance statistics is asymptotically distributed as $\chi^2$, with df = # of independent constraints
Example: Model A vs. Model B
1. We can obtain Model A from Model B by invoking 3 constraints: $H_0: \gamma_{10} = 0, \sigma_1^2 = 0, \sigma_{01} = 0$
2. Compute the difference in Deviance statistics and compare to the appropriate $\chi^2$ distribution: $\Delta\text{Deviance} = 33.55$ (3 df, p<.001), so reject $H_0$
(ALDA, Section 4.6.1, pp 116-119)
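The deviance comparison needs only a chi-square upper-tail probability. A minimal stdlib-only sketch, assuming integer df; it uses the closed forms of the regularized upper incomplete gamma function for integer and half-integer shape rather than any particular statistics library, and the helper names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the regularized upper incomplete gamma Q(df/2, x/2), which has closed
    forms for integer and half-integer shape parameters.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(k, y) = exp(-y) * sum_{j=0}^{k-1} y^j / j!, with k = df / 2
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: Q(1/2, y) = erfc(sqrt(y)); then Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1)
    total = math.erfc(math.sqrt(y))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return total


def deviance_test(delta_deviance, n_constraints, alpha=0.05):
    """Compare a deviance difference to chi-square with df = number of constraints."""
    p_value = chi2_sf(delta_deviance, n_constraints)
    return p_value, p_value < alpha


# Model A vs. Model B: difference in deviance 33.55 on 3 constraints.
p_value, reject = deviance_test(33.55, 3)
print(f"p = {p_value:.1e}, reject H0: {reject}")
```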
Using deviance statistics to test more complex hypotheses
Key idea: Deviance statistics are great for simultaneously evaluating the effects of adding predictors to both level-2 models
1. We can obtain Model B from Model C by invoking 2 constraints: $H_0: \gamma_{01} = 0, \gamma_{11} = 0$
2. Compute the difference in Deviance statistics and compare to the appropriate $\chi^2$ distribution: $\Delta\text{Deviance} = 15.41$ (2 df, p<.001), so reject $H_0$
The pooled test does not imply that each level-2 slope is on its own statistically significant
(ALDA, Section 4.6.1, pp 116-119)
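With two constraints the reference distribution needs no special functions, because a chi-square variable with 2 df has upper tail exp(-x/2). A minimal check on the 15.41 quoted above; the variable names are illustrative.

```python
import math

# For df = 2 the chi-square upper tail has the closed form P(X > x) = exp(-x / 2).
delta_deviance = 15.41   # Model B vs. Model C, 2 constraints
p_value = math.exp(-delta_deviance / 2)
print(f"p = {p_value:.5f}")
```

This p-value covers the two constraints jointly.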
Comparing non-nested multilevel models using AIC and BIC
You can (supposedly) compare non-nested multilevel models using information criteria
Information Criteria: AIC and BIC
Each information criterion penalizes the log-likelihood statistic for excesses in the structure of the current model
The AIC penalty accounts for the number of parameters in the model.
The BIC penalty goes further and also accounts for sample size.
Models need not be nested, but datasets must be the same.
Smaller values of AIC & BIC indicate better fit
Here's the taxonomy of multilevel models that we ended up fitting in the ALCUSE example. Model E has the lowest AIC and BIC statistics.
Interpreting differences in BIC across models (Raftery, 1995):
0-2: Weak evidence
2-6: Positive evidence
6-10: Strong evidence
>10: Very strong
Careful: Gelman & Rubin (1995) declare these statistics and criteria to be off-target and only by serendipity manage to hit the target
(ALDA, Section 4.6.4, pp 120-122)
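A sketch of the two penalties and the Raftery bands, using the usual definitions AIC = Deviance + 2p and BIC = Deviance + ln(n)p. Which n enters the BIC penalty (number of people or number of person-period records) can differ between programs, so it is left as an argument; treating each band's upper endpoint as inclusive is an assumption, since the list above shares endpoints. The deviances in the usage lines are hypothetical.

```python
import math


def aic(deviance, n_params):
    """AIC = Deviance + 2 * (number of estimated parameters)."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n):
    """BIC = Deviance + ln(n) * (number of estimated parameters)."""
    return deviance + math.log(n) * n_params


def raftery_evidence(bic_difference):
    """Label a BIC difference using the Raftery (1995) bands listed above."""
    d = abs(bic_difference)
    if d <= 2:
        return "weak"
    if d <= 6:
        return "positive"
    if d <= 10:
        return "strong"
    return "very strong"


# Hypothetical deviances, parameter counts, and n, for illustration only.
bic_1 = bic(deviance=650.0, n_params=6, n=100)
bic_2 = bic(deviance=640.0, n_params=7, n=100)
print(round(bic_1 - bic_2, 2), raftery_evidence(bic_1 - bic_2))
```

Here the extra parameter lowers the deviance by 10 but costs ln(100), about 4.61, so the net BIC difference of about 5.39 counts only as positive evidence.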
A final comment about estimation and hypothesis testing
Two most common methods of estimation:
Maximum likelihood (ML): Seeks those parameter estimates that maximize the likelihood function, which assesses the joint probability of simultaneously observing all the sample data actually obtained (implemented, e.g., in HLM and SAS Proc Mixed).
Generalized Least Squares (GLS) (& Iterative GLS): Iteratively seeks those parameter estimates that minimize the sum of squared residuals (allowing them to be autocorrelated and heteroscedastic) (implemented, e.g., in MLwiN).
A more important distinction: Full vs. Restricted (ML or GLS)
Full: Simultaneously estimate the fixed effects and the variance components
Default in MLwiN & HLM
Goodness-of-fit statistics apply to the entire model (both fixed and random effects)
This is the method we've used in both the examples shown so far
Restricted: Sequentially estimate the fixed effects and then the variance components
Default in SAS Proc Mixed
Goodness-of-fit statistics apply to only the random effects
So we can only test hypotheses about VCs (and the models being compared must have identical fixed effects)
(ALDA, Section 3.4, pp 63-68; Section 4.3, pp 85-92)
Other topics covered in Chapter Four of ALDA
Using Wald statistics to test composite hypotheses about fixed effects (4.7): a generalization of the "parameter estimate divided by its standard error" approach that allows you to test composite hypotheses about fixed effects, even if you've used restricted estimation methods
Evaluating the tenability of the model's assumptions (4.8):
Checking functional form
Checking normality
Checking homoscedasticity
Model-based (empirical Bayes) estimates of the individual growth parameters (4.9): superior estimates that combine OLS estimates with population average estimates; usually your best bet if you would like to display individual growth trajectories for particular sample members
Extending the multilevel model for change

ALDA, Chapter Five
Change is a measure of time Edwin Way Teale
John B. Willett & Judith D. Singer Harvard Graduate School of Education
Chapter 5: Treating TIME more flexibly
General idea: Although all our examples have been equally spaced, time-structured, and fully balanced, the multilevel model for change is actually far more flexible
Variably spaced measurement occasions (5.1): each individual can have his or her own customized data collection schedule
Varying numbers of waves of data (5.2): not everyone need have the same number of waves of data
Allows us to handle missing data
Can even include individuals with just one or two waves
Including time-varying predictors (5.3)
The values of some predictors vary over time
They're easy to include and can have powerful interpretations
Re-centering the effect of TIME (5.4)
Initial status is not the only centering constant for TIME
Recentering TIME in the level-1 model improves interpretation in the level-2 model
Example for handling variably spaced waves: Reading achievement over time
Data source: Children of the National Longitudinal Survey of Youth (CNLSY)
Sample: 89 children
Research design:
  Each approximately 6 years old at study start
  3 waves of data collected in 1986, 1988, and 1990, when the children were to be in their 6th yr, in their 8th yr, and in their 10th yr
  Of course, not every child was tested on his/her birthday or half-birthday, which creates the variably spaced waves
  The outcome, PIAT, is the child's unstandardized score on the reading portion of the Peabody Individual Achievement Test
  Not standardized for age, so we can see growth over time
  No substantive predictors, to keep the example simple
How do PIAT scores change over time?
What does the person-period data set look like when waves are variably spaced?
Person-period data sets are easy to construct even with variably spaced waves
We could build models of PIAT scores over time using ANY of these 3 measures for TIME, so which should we use?
Three different ways of coding TIME:
  WAVE: reflects design but has no substantive meaning
  AGEGRP: child's expected age on each occasion
  AGE: child's actual age (to the day) on each occasion; notice the occasion creep: later waves are more likely to be even later in a child's life
(ALDA, Section 5.1.1, pp 139-144)
Comparing OLS trajectories fit using AGEGRP and AGE
[Figure: OLS trajectories of PIAT scores (0 to 80) against age (5 to 12 years) for individual children; AGEGRP shown as +'s with a solid line, AGE shown with a dashed line]
For many children, especially those assessed near the half-years, it makes little difference.
For some children, though, there's a big difference in slope, which is our conceptual outcome (rate of change).
Why ever use rounded AGE? Note that this is what we did in the past two examples, and so do lots of researchers!
(ALDA, Figure 5.1 p. 143)
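A minimal sketch of why the coding of TIME matters for a single child's OLS slope. The three PIAT scores and both sets of ages are invented for illustration (they are not CNLSY records); the actual ages drift later at later waves, mimicking occasion creep.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the ordinary least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data for one child (not from the CNLSY).
piat = [20, 35, 55]
agegrp = [6.5, 8.5, 10.5]  # assumed expected ages at the three waves
age = [6.6, 8.9, 11.2]     # assumed actual ages; later waves occur later

_, slope_agegrp = ols_fit(agegrp, piat)
_, slope_age = ols_fit(age, piat)

# Assigning later waves to earlier expected ages compresses the time axis,
# so the same score gain is spread over fewer years and the slope is steeper.
print(f"slope with AGEGRP: {slope_agegrp:.2f} points per year")
print(f"slope with AGE:    {slope_age:.2f} points per year")
```

With these made-up numbers the AGEGRP slope is 8.75 points per year versus about 7.61 with AGE, the same direction of bias the slides describe.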
Comparing models fit with AGEGRP and AGE
Level-1 model: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Composite model: $Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
By writing the level-1 model using the generic predictor TIME, the specification is identical for both codings.
Some parameter estimates are virtually identical; other estimates are larger with AGEGRP:
  $\gamma_{10}$, the slope, is about 0.5 pt larger per year, which cumulates to a 2 pt difference over 4 yrs
  Level-2 VCs are also larger
AGEGRP associates the data from later waves with earlier ages than observed, making the slope steeper
Unexplained variation for initial status is associated with real AGE
AIC and BIC better with AGE
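For reference, the usual definitions are AIC = Deviance + 2p and BIC = Deviance + ln(n)·p, where p is the number of estimated parameters and smaller values are better. The sketch below assumes those definitions; the deviances are placeholders, not the CNLSY results, and because software differs on which n to use (people or person-period records), n is simply left as an argument.

```python
import math


def aic(deviance: float, n_params: int) -> float:
    """Akaike information criterion: deviance plus 2 per estimated parameter."""
    return deviance + 2 * n_params


def bic(deviance: float, n_params: int, n: int) -> float:
    """Bayesian information criterion: deviance plus ln(n) per estimated parameter."""
    return deviance + math.log(n) * n_params


# Both codings of TIME give models with the same 6 parameters
# (gamma00, gamma10, sigma0^2, sigma1^2, sigma01, sigma_eps^2),
# so their AIC and BIC differences reduce to the deviance difference.
# Hypothetical deviances, for illustration only:
fits = {"AGEGRP": 1830.0, "AGE": 1820.0}
for name, dev in fits.items():
    print(name, round(aic(dev, 6), 1), round(bic(dev, 6, n=89), 1))
```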
Treating an unstructured data set as structured introduces error into the analysis
(ALDA, Section 5.1.2, pp 144-146)
Example for handling varying numbers of waves: Wages of HS dropouts
Data source: Murnane, Boudett and Willett (1999), Evaluation Review
Sample: 888 male high school dropouts
  Based on the National Longitudinal Survey of Youth (NLSY)
  Tracked from first job since HS dropout, when the men varied in age from 14 to 17
  Each interviewed between 1 and 13 times
Both variable number and spacing of waves:
  Interviews were approximately annual, but some were every 2 years
  Each wave's interview was conducted at different times during the year
Outcome is log(WAGES): the inflation-adjusted natural logarithm of hourly wage
How do log(WAGES) change over time? Do the wage trajectories differ by ethnicity and highest grade completed?
Examining a person-period data set with varying numbers of waves of data per person
Number of waves per man:
  # waves:   1    2   3-4   5-6   7-8   9-10   >10
  N men:    38   39    82   166   226    240    97
For example, ID 206 has 3 waves, ID 332 has 10 waves, and ID 1028 has 7 waves
EXPER = specific moment (to the nearest day) in each man's labor force history
Note the varying # of waves and the varying spacing; LNW in constant dollars seems to rise over time
(ALDA, Section 5.2.1, pp 146-148)
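The distribution of waves per man can be produced directly from a person-period data set by counting rows per ID. A minimal sketch in plain Python, assuming records are (ID, EXPER, LNW) tuples; the helper names are hypothetical, and only the number of rows per ID in the example mirrors the slide. As a sanity check, the published counts sum to 888, the full sample.

```python
from collections import Counter

# Bins used in the slide's table of waves per man.
BINS = [("1", 1, 1), ("2", 2, 2), ("3-4", 3, 4), ("5-6", 5, 6),
        ("7-8", 7, 8), ("9-10", 9, 10), (">10", 11, float("inf"))]


def waves_per_person(records):
    """Count person-period rows (waves) for each ID; records are (id, exper, lnw)."""
    return Counter(rec[0] for rec in records)


def bin_wave_counts(counts):
    """Tabulate how many people fall in each number-of-waves bin."""
    table = {label: 0 for label, _, _ in BINS}
    for n_waves in counts.values():
        for label, low, high in BINS:
            if low <= n_waves <= high:
                table[label] += 1
                break
    return table


# Hypothetical person-period rows: the EXPER and LNW values are placeholders.
records = ([(206, k, 0.0) for k in range(3)]
           + [(332, k, 0.0) for k in range(10)]
           + [(1028, k, 0.0) for k in range(7)])
print(bin_wave_counts(waves_per_person(records)))
print(sum([38, 39, 82, 166, 226, 240, 97]))  # published counts: 888 men
```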
Covariates: Race and Highest Grade Completed
Fitting multilevel models for change when data sets have varying numbers of waves
Everything remains the same; there's really no difference!
Unconditional growth model: On average, a dropout's hourly wage increases with work experience; $100(e^{0.0457} - 1) = 4.7$ is the percentage change in hourly wage per year of experience
Fully specified growth model (both HGC & BLACK): HGC is associated with initial status (but not change); BLACK is associated with change (but not initial status)
Fit Model C, which removes non-significant parameters
Model C: an intermediate final model
  Almost identical deviance to Model B
  Effect of HGC: dropouts who stay in school longer earn higher wages on labor force entry (~4% higher per yr of school)
  Effect of BLACK: in contrast to Whites and Latinos, the wages of Black males increase less rapidly with labor force experience
  Rate of change for Whites and Latinos is $100(e^{0.0489} - 1) = 5.0\%$
  Rate of change for Blacks is $100(e^{0.0489 - 0.0161} - 1) = 3.3\%$
  Significant level-2 VCs indicate that there's still unexplained variation; this is hardly a final model
(ALDA, Table 5.4 p. 149)
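Because the outcome is log(WAGES), a coefficient γ on a predictor translates into a 100(e^γ − 1) percent change in hourly wages per unit of that predictor. A small sketch reproducing the three percentages above from the coefficients quoted in this section:

```python
import math


def pct_change_per_unit(gamma: float) -> float:
    """Percentage change in WAGES per one-unit change in a predictor of log(WAGES)."""
    return 100 * (math.exp(gamma) - 1)


print(round(pct_change_per_unit(0.0457), 1))           # 4.7  unconditional growth model
print(round(pct_change_per_unit(0.0489), 1))           # 5.0  Whites and Latinos, Model C
print(round(pct_change_per_unit(0.0489 - 0.0161), 1))  # 3.3  Blacks, Model C
```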
Prototypical wage trajectories of HS dropouts
[Figure: prototypical LNW trajectories (1.6 to 2.4) against EXPER (0 to 10) for White/Latino and Black 9th grade and 12th grade dropouts]
Race: At dropout, there are no racial differences in wages. Racial disparities increase over time because wages for Blacks increase at a slower rate.
Highest grade completed: Those who stay in school longer have higher initial wages. This differential remains constant over time (lines remain parallel).
(ALDA, Sections 5.2.1 and 5.2.2, pp. 150-156)
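A sketch of the structure behind these prototypical trajectories: in Model C, HGC enters only the model for initial status and BLACK only the model for the rate of change. Only the two EXPER coefficients (0.0489 and −0.0161) come from this section; the intercept and HGC coefficient below are hypothetical placeholders, so the printed levels are illustrative, but the parallel lines for HGC and the widening racial gap do not depend on them (or on how HGC is centered).

```python
# Model C structure: HGC shifts initial status only; BLACK shifts the rate only.
G10 = 0.0489   # EXPER slope for Whites and Latinos (from the slides)
G12 = -0.0161  # slope differential for Blacks (from the slides)


def prototypical_lnw(exper, hgc, black, g00, g01):
    """Fitted LNW = g00 + g01*HGC + (G10 + G12*BLACK)*EXPER.

    g00 and g01 are supplied by the caller; no estimates for them are
    given in this section, so the values used below are placeholders.
    """
    return g00 + g01 * hgc + (G10 + G12 * black) * exper


G00_PLACEHOLDER, G01_PLACEHOLDER = 1.7, 0.04  # hypothetical, for illustration only

for hgc in (9, 12):
    for black in (0, 1):
        start = prototypical_lnw(0, hgc, black, G00_PLACEHOLDER, G01_PLACEHOLDER)
        end = prototypical_lnw(10, hgc, black, G00_PLACEHOLDER, G01_PLACEHOLDER)
        print(f"HGC={hgc} BLACK={black}: LNW {start:.3f} -> {end:.3f}")
```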
Practical advice: Problems can arise when analyzing unbalanced data sets
The multilevel model for change is designed to handle unbalanced data sets, and in most circumstances it does its job well. However, when imbalance is severe, or lots of people have just 1 or 2 waves of data, problems can occur:
  You may not be able to estimate some parameters (well)
  Iterative fitting algorithms may not converge
  Some estimates may hit boundary constraints
The problem usually manifests in the VCs, not the fixed effects (because the fixed portion of the model is like a regular regression model)
  If you're lucky, you'll get negative variance components
  Another sign is too much time to convergence (or no convergence)
Most common problem: your model is overspecified
Most common solution: simplify the model
Software packages may not issue clear warning signs
Many practical strategies are discussed in ALDA, Section 5.2.2
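Because software may not warn you, one cheap habit is to inspect the estimated level-2 variance components yourself. A hypothetical helper (not part of any package) that flags the boundary problems described above for the usual 2×2 covariance matrix of initial status and rate of change, using the fact that a valid covariance matrix needs positive variances and a positive determinant; the example values are invented.

```python
def level2_vc_warnings(var0, var1, cov01, tol=1e-8):
    """Flag boundary or inadmissible estimates in a 2x2 level-2 covariance matrix.

    var0: variance of initial status, var1: variance of rate of change,
    cov01: their covariance. Values at or below tol count as boundary cases.
    """
    warnings = []
    if var0 <= tol:
        warnings.append("variance of initial status is zero or negative")
    if var1 <= tol:
        warnings.append("variance of rate of change is zero or negative")
    if var0 * var1 - cov01 ** 2 <= tol:
        warnings.append("covariance matrix is singular or not positive definite")
    return warnings


print(level2_vc_warnings(var0=120.0, var1=2.5, cov01=-4.0))  # []
print(level2_vc_warnings(var0=120.0, var1=-0.3, cov01=1.0))
```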
Another major advantage of the multilevel model for change: How easy it is to include time-varying predictors
(ALDA, Section 5.2.2, pp. 151-156)
Example for illustrating time-varying predictors: Unemployment & depression
Source: Liz Ginexi and colleagues (2000), J of Occupational Health Psychology
Sample: 254 people identified at unemployment offices
Research design: Goal was to collect 3 waves of data per person, at 1, 5 and 11 months after job loss. In reality, however, the data set is not time-structured:
  Interview 1 was between 1 day and 2 months after job loss
  Interview 2 was between 3 and 8 months after job loss
  Interview 3 was between 10 and 16 months after job loss
  In addition, not everyone completed the 2nd and 3rd interviews
Time-varying predictor: Unemployment status (UNEMP)
  132 remained unemployed at every interview
  61 were always working after the 1st interview
  41 were still unemployed at the 2nd interview, but working by the 3rd
  19 were working at the 2nd interview, but were unemployed again by the 3rd
Outcome: CES-D scale, 20 4-pt items (score of 0 to 80)
How does unemployment affect depression symptomatology?
(ALDA, Section 5.3.1, pp. 160-161)
A person-period data set with a time-varying predictor
TIME = MONTHS since job loss
UNEMP (by design, must be 1 at wave 1)
ID 7589 has 3 waves, all unemployed
ID 65641 has 3 waves, re-employed after 1st wave
ID 53782 has 3 waves, re-employed at 2nd, unemployed again at 3rd
(ALDA, Table 5.6, p161)
Analytic approach: We're going to sequentially fit 4 increasingly complex models
Model A: An individual growth model with no substantive predictors
  $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Model B: Adding the main effect of UNEMP
  $Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + \gamma_{20} UNEMP_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
Model C: Allowing the effect of UNEMP to vary over TIME
  $Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + \gamma_{20} UNEMP_{ij} + \gamma_{30} UNEMP_{ij} \times TIME_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
Model D: Also allows the effect of UNEMP to vary over TIME, but does so in a very particular way
  $Y_{ij} = \gamma_{00} + \gamma_{20} UNEMP_{ij} + \gamma_{30} UNEMP_{ij} \times TIME_{ij} + [\zeta_{0i} + \zeta_{2i} UNEMP_{ij} + \zeta_{3i} UNEMP_{ij} \times TIME_{ij} + \varepsilon_{ij}]$
As we go through this analysis, we will demonstrate:
  Strategies for the thoughtful inclusion of time-varying predictors
  Strategies for practical data analysis more generally (you're almost ready to fly solo!)
  How both the level-1/level-2 and composite specifications facilitate understanding
  The need to simultaneously consider the model's structural (fixed effects) and stochastic (variance components) parts, and whether you want them to be parallel
(ALDA, Section 5.3.1, pp 159-164)
First step: Model A: The unconditional growth model
Let's get a sense of the data by ignoring UNEMP and fitting the usual unconditional growth model
Level-1 model: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Composite model: $Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
How can a time-varying predictor go at level-2??? It seems like it can go in the composite model
On the first day of job loss, the average person has an estimated CES-D of 17.7
On average, CES-D declines by 0.42/mo
There's significant residual within-person variation
There's significant variation in initial status and rates of change
How do we add the time-varying predictor UNEMP?

(ALDA, Section 5.3.1, pp 159-164)
Model B: Adding time-varying UNEMP to the composite specification
$Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + \gamma_{20} UNEMP_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
  $\gamma_{00}$: a logical impossibility (average CES-D on the day of job loss for someone with UNEMP = 0, yet by design everyone is unemployed at wave 1)
  $\gamma_{10}$: population average rate of change in CES-D, controlling for UNEMP
  $\gamma_{20}$: population average difference, over time, in CES-D by UNEMP status
How can we understand this graphically? Although the magnitude of the TV predictor's effect remains constant, the TV nature of UNEMP implies the existence of many possible population average trajectories, such as:
[Figure: four panels of population average CES-D trajectories against months since job loss: Remains unemployed; Reemployed at 5 months; Reemployed at 10 months; Reemployed at 5 months, unemployed again at 10]
What happens when we fit Model B to data?

(ALDA, Section 5.3.1, pp 159-164)
Fitting and interpreting Model B, which includes the TV predictor UNEMP
Monthly rate of decline is cut in half by controlling for UNEMP (still sig.)
UNEMP has a large and stat sig effect
Model A is a much poorer fit ($\Delta$Deviance = 25.5, 1 df, p<.001)
Consistently unemployed (UNEMP = 1): $\hat{Y}_j = (12.6656 + 5.1113) - 0.2020\,MONTHS_j = 17.7769 - 0.2020\,MONTHS_j$
Consistently employed (UNEMP = 0): $\hat{Y}_j = 12.6656 - 0.2020\,MONTHS_j$
[Figure: fitted CES-D (0 to 20) against months since job loss (0 to 14) for UNEMP = 1 and UNEMP = 0]
What about people who get a job?
What about the variance components?
(ALDA, Section 5.3.1, pp. 162-167)
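For people who get a job, the sketch below plugs the Model B fixed effects quoted above into the composite model for the four UNEMP histories pictured earlier. The step-function histories are assumptions made for illustration; in the data, UNEMP is only observed at each person's interview dates. The constant effect of UNEMP (5.1113) becomes a vertical shift whenever employment status changes.

```python
# Model B fixed effects as reported on the slide.
G00, G10, G20 = 12.6656, -0.2020, 5.1113


def model_b(months, unemp):
    """Population average CES-D under Model B: G00 + G10*MONTHS + G20*UNEMP."""
    return G00 + G10 * months + G20 * unemp


def trajectory(unemp_history, months):
    """Fitted CES-D at each month for a given UNEMP history (a function of month)."""
    return [model_b(m, unemp_history(m)) for m in months]


months = list(range(0, 15, 2))
histories = {
    "remains unemployed": lambda m: 1,
    "reemployed at 5 months": lambda m: 1 if m < 5 else 0,
    "reemployed at 10 months": lambda m: 1 if m < 10 else 0,
    "reemployed at 5, unemployed again at 10": lambda m: 0 if 5 <= m < 10 else 1,
}
for label, history in histories.items():
    print(f"{label:42s}", [round(y, 1) for y in trajectory(history, months)])
```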
Variance components behave differently when you're working with TV predictors
When analyzing time-invariant predictors, we know which VCs will change and how:
  Level-1 VCs will remain relatively stable because time-invariant predictors cannot explain much within-person variation
  Level-2 VCs will decline if the time-invariant predictors explain some of the between-person variation
When analyzing time-varying predictors, all VCs can change, but:
  Although you can interpret a decrease in the magnitude of the Level-1 VCs, changes in Level-2 VCs may not be meaningful!
Level-1 VC, $\sigma_\varepsilon^2$: Adding UNEMP to the unconditional growth model (A) reduces its magnitude from 68.85 to 62.39, so UNEMP explains 9.4% of the within-person variation in CES-D scores
Look what happened to the Level-2 VCs: in this example, they've increased! Why? Because including a TV predictor changes the meaning of the individual growth parameters (e.g., the intercept now refers to the value of the outcome when all level-1 predictors, including UNEMP, are 0). We can clarify what's happened by decomposing the composite specification back into a Level-1/Level-2 representation.
(ALDA, Section 5.3.1, pp. 162-167)
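The 9.4% figure is a proportional reduction in the level-1 variance component (a pseudo-R²). The arithmetic, as a short sketch; as noted above, the same calculation applied to level-2 VCs need not be meaningful once TV predictors are added.

```python
def pseudo_r2(var_before: float, var_after: float) -> float:
    """Proportional reduction in a variance component between two nested models."""
    return (var_before - var_after) / var_before


# Level-1 residual variance: Model A (68.85) vs. Model B (62.39), from the slide.
print(f"{100 * pseudo_r2(68.85, 62.39):.1f}%")  # 9.4%
```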
Decomposing the composite specification of Model B into a L1/L2 specification
Level-1 model: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \pi_{2i} UNEMP_{ij} + \varepsilon_{ij}$
  Unlike time-invariant predictors, TV predictors go into the level-1 model
Level-2 models: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$, $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, $\pi_{2i} = \gamma_{20}$
  Model B's level-2 model for $\pi_{2i}$ has no residual! Model B automatically assumes that $\pi_{2i}$ is fixed (that it has the same value for everyone)
Should we accept this constraint? Should we assume that the effect of the person-specific predictor is constant across people?
  When predictors are time-invariant, we have no choice
  When predictors are time-varying, we can try to relax this assumption
(ALDA, Section 5.3.1, pp. 168-169)
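Trying to add back the missing level-2 stochastic variation in the effect of UNEMP
Level-1 model: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \pi_{2i} UNEMP_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 models: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$, $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, $\pi_{2i} = \gamma_{20} + \zeta_{2i}$, where $\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\\ \zeta_{2i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} & \sigma_{02}\\ \sigma_{10} & \sigma_1^2 & \sigma_{12}\\ \sigma_{20} & \sigma_{21} & \sigma_2^2\end{bmatrix}\right)$
It's easy to allow the effect of UNEMP to vary randomly across people by adding in a level-2 residual. Check your software to be sure you know what you're doing.
But you pay a price you may not be able to afford:
  Adding this one term adds 3 new VCs ($\sigma_2^2$, $\sigma_{02}$, $\sigma_{12}$)
  If you have only a few waves, you may not have enough data
  Here, we can't actually fit this model!!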
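Where the count of 3 new VCs comes from: an unstructured covariance matrix for k level-2 residuals has k(k+1)/2 distinct variances and covariances, so moving from two random effects to three adds one variance and two covariances. A one-function sketch of the count:

```python
def n_level2_vcs(k: int) -> int:
    """Variances plus covariances in an unstructured k x k level-2 covariance matrix."""
    return k * (k + 1) // 2


# Model B: random intercept and TIME slope (k = 2); adding zeta_2i makes k = 3.
print(n_level2_vcs(2), n_level2_vcs(3), n_level2_vcs(3) - n_level2_vcs(2))  # 3 6 3
```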
Moral: The multilevel model for change can easily handle TV predictors, but:
  Think carefully about the consequences for both the structural and stochastic parts of the model
  Don't just buy the default specification in your software
  Until you're sure you know what you're doing, always write out your model before specifying code to a computer package
So, are we happy with Model B as the final model? Is there any other way to allow the effect of UNEMP to vary: if not across people, then across TIME?
(ALDA, Section 5.3.1, pp. 169-171)
Model C: Might the effect of a TV predictor vary over time?
When analyzing the effects of time-invariant predictors, we automatically allowed predictors to affect the trajectory's slope. Because of the way in which we've constructed the models with TV predictors, we've automatically constrained UNEMP to have only a main effect, influencing just the trajectory's level.
To allow the effect of the TV predictor to vary over time, just add its interaction with TIME:
$Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + \gamma_{20} UNEMP_{ij} + \gamma_{30} UNEMP_{ij} \times TIME_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
Two possible (equivalent) interpretations:
  The effect of UNEMP differs across occasions
  The rate of change in depression differs by unemployment status
But you need to think very carefully about the hypothesized error structure: we've basically added another level-1 parameter to capture the interaction. Just as we asked for the main effect of the TV predictor UNEMP, should we allow the interaction effect to vary across people? We won't right now, but we will in a minute.
What happens when we fit Model C to data?
(ALDA, Section 5.3.2, pp. 171-172)
Model C: Allowing the effect of a TV predictor to vary over time
Main effect of TIME is now positive (!) & not stat sig
UNEMP*TIME interaction is stat sig (p<.05)
Model B is a much poorer fit than C ($\Delta$Deviance = 4.6, 1 df, p<.05)
Consistently unemployed (UNEMP = 1): $\hat{Y}_j = (9.6167 + 8.5291) + (0.1620 - 0.4652)\,MONTHS_j = 18.1458 - 0.3032\,MONTHS_j$
Consistently employed (UNEMP = 0): $\hat{Y}_j = 9.6167 + 0.1620\,MONTHS_j$
[Figure: fitted CES-D (0 to 20) against months since job loss (0 to 14) for UNEMP = 1 and UNEMP = 0]
Should the slope of the trajectory for the reemployed be constrained to 0?
(ALDA, Section 5.3.2, pp. 171-172)
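The deviance comparisons quoted for Models A, B, and C are likelihood ratio tests: the difference in deviance between nested models is referred to a chi-square distribution with df equal to the number of extra parameters. For 1 df the upper-tail probability has a closed form, P(χ²₁ > d) = erfc(√(d/2)), so a standard-library sketch suffices. This assumes the deviances come from full maximum likelihood, which is needed when the compared models differ in their fixed effects.

```python
import math


def chi2_1df_pvalue(delta_deviance: float) -> float:
    """Upper-tail p-value for a 1-df chi-square statistic.

    If X = Z**2 with Z standard normal, P(X > d) = P(|Z| > sqrt(d)) = erfc(sqrt(d / 2)).
    """
    return math.erfc(math.sqrt(delta_deviance / 2))


for label, delta in [("Model A vs. B", 25.5), ("Model B vs. C", 4.6)]:
    print(f"{label}: delta deviance = {delta}, p = {chi2_1df_pvalue(delta):.2g}")
```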
How should we constrain the individual growth trajectory for the re-employed?
Should we remove the main effect of TIME (which is the slope when UNEMP = 0)? Yes, but this creates a lack of congruence between the model's fixed and stochastic parts:
$Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + \gamma_{20} UNEMP_{ij} + \gamma_{30} UNEMP_{ij} \times TIME_{ij} + [\zeta_{0i} + \zeta_{1i} TIME_{ij} + \varepsilon_{ij}]$
So, let's better align the parts by having UNEMP*TIME be both fixed and random:
$Y_{ij} = \gamma_{00} + \gamma_{20} UNEMP_{ij} + \gamma_{30} UNEMP_{ij} \times TIME_{ij} + [\zeta_{0i} + \zeta_{3i} UNEMP_{ij} \times TIME_{ij} + \varepsilon_{ij}]$
But this actually fits worse (larger AIC & BIC)!
If we're allowing the UNEMP*TIME slope to vary randomly, might we also need to allow the effect of UNEMP itself to vary randomly?
Model D: $Y_{ij} = \gamma_{00} + \gamma_{20} UNEMP_{ij} + \gamma_{30} UNEMP_{ij} \times TIME_{ij} + [\zeta_{0i} + \zeta_{2i} UNEMP_{ij} + \zeta_{3i} UNEMP_{ij} \times TIME_{ij} + \varepsilon_{ij}]$
  UNEMP has both a fixed & random effect
  UNEMP*TIME has both a fixed & random effect
What happens when we fit Model D to data?
(ALDA, Section 5.3.2, pp. 172-173)
Model D: Constraining the individual growth trajectory among the reemployed
Consistently unemployed: $\hat{Y}_j = (11.2666 + 6.8795) - 0.3254\,MONTHS_j = 18.1461 - 0.3254\,MONTHS_j$
Consistently employed: $\hat{Y}_j = 11.2666$
Best fitting model (lowest AIC and BIC)
(ALDA, Section 5.3.2, pp. 172-173)
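A sketch that evaluates the three fitted composite models at the start and end of the observation window, using only the fixed effects reported on the Model B, C, and D slides. It makes the key contrast concrete: under Model C the consistently employed have a (non-significant) rising CES-D trajectory, while Model D constrains it to be flat.

```python
# Fixed effects quoted on the Model B, C, and D slides.
def model_b(months, unemp):
    return 12.6656 + 5.1113 * unemp - 0.2020 * months


def model_c(months, unemp):
    return 9.6167 + 0.1620 * months + 8.5291 * unemp - 0.4652 * unemp * months


def model_d(months, unemp):
    # No main effect of TIME: the employed trajectory is flat by construction.
    return 11.2666 + 6.8795 * unemp - 0.3254 * unemp * months


for name, model in [("B", model_b), ("C", model_c), ("D", model_d)]:
    unemployed = [round(model(m, 1), 2) for m in (0, 14)]
    employed = [round(model(m, 0), 2) for m in (0, 14)]
    print(f"Model {name}: UNEMP=1 {unemployed}, UNEMP=0 {employed}")
```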
Recentering the effects of TIME
All our examples so far have centered TIME on the first wave of data collection
  Allows us to interpret the level-1 intercept as the individual's true initial status
  While commonplace and usually meaningful, this approach is not sacrosanct
We always want to center TIME on a value that ensures that the level-1 growth parameters are meaningful, but there are other options:
  Middle TIME point: focus on the average value of the outcome during the study
  Endpoint: focus on final status
  Any inherently meaningful constant can be used
Example for recentering the effects of TIME
Data source: Tomarken & colleagues (1997), American Psychological Society Meetings
Sample: 73 men and women with major depression who were already being treated with non-pharmacological therapy
Research design:
  Randomized trial to evaluate the efficacy of supplemental antidepressants (vs. placebo)
  On the pre-intervention night, the researchers prevented all participants from sleeping
  Each person was electronically paged 3 times a day (at 8 am, 3 pm, and 10 pm) to remind them to fill out a mood diary
  With full compliance (which didn't happen, of course), each person would have 21 mood assessments (most had at least 16 assessments, although 1 person had only 2 and 1 only 12)
  The outcome, POS, is the number of positive moods
Research questions:
  How does POS change over time?
  What is the effect of medication on the trajectories of change?
How might we clock and code TIME?
  WAVE: great for data processing, but no intuitive meaning
  DAY: intuitively appealing, but doesn't distinguish the readings within each day
  READING: right idea, but how to quantify?
  TIME OF DAY: quantifies the distance between readings (here, equal thirds of a day; could also make unequal)
  TIME: days since study began (centered on first wave of data collection)
  (TIME - 3.33): same as TIME but now centered on the study's midpoint
  (TIME - 6.67): same as TIME but now centered on the study's endpoint
(ALDA, Section 5.4, pp 181-183)
Understanding what happens when we recenter TIME
Instead of writing separate models depending upon the representation for TIME, let's use a generic form:
Level-1 model: $Y_{ij} = \pi_{0i} + \pi_{1i}(TIME_{ij} - c) + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 model: $\pi_{0i} = \gamma_{00} + \gamma_{01} TREAT_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11} TREAT_i + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Notice how changing the value of the centering constant, c, changes the definition of the intercept in the level-1 model:
When c = 0: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \varepsilon_{ij}$; $\pi_{0i}$ is the individual's mood at TIME = 0, usually called initial status
When c = 3.33: $Y_{ij} = \pi_{0i} + \pi_{1i}(TIME_{ij} - 3.33) + \varepsilon_{ij}$; $\pi_{0i}$ is the individual's mood at TIME = 3.33, useful to think of as mid-experiment status
When c = 6.67: $Y_{ij} = \pi_{0i} + \pi_{1i}(TIME_{ij} - 6.67) + \varepsilon_{ij}$; $\pi_{0i}$ is the individual's mood at TIME = 6.67, useful to think of as final status
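Recentering is just a reparameterization: a line a + bt can be rewritten as (a + bc) + b(t − c), so the slope and fitted values are unchanged while the intercept becomes the value at t = c. At level 2, the treatment effect on the intercept therefore becomes γ01 + γ11·c. The sketch back-calculates γ11 from the treatment effects reported in the comparison that follows (−3.11 at the start, 33.80 at the end), assuming the slides' 3.33 and 6.67 are rounded values of 10/3 and 20/3 days; it is an arithmetic check, not the fitted model.

```python
def recenter(intercept: float, slope: float, c: float) -> tuple[float, float]:
    """Rewrite y = intercept + slope*t as y = new_intercept + slope*(t - c)."""
    return intercept + slope * c, slope


T_MID, T_END = 10 / 3, 20 / 3  # assumed exact values behind 3.33 and 6.67

# Treatment effect on the intercept at c = 0 and at c = T_END (from the slides).
effect_start, effect_end = -3.11, 33.80
gamma11 = (effect_end - effect_start) / T_END  # implied treatment effect on the slope

effect_mid, _ = recenter(effect_start, gamma11, T_MID)
print(f"implied gamma11 = {gamma11:.3f}; treatment effect at midpoint = {effect_mid:.3f}")

# Fitted values do not depend on c: compare both parameterizations of one line.
a, b = 160.0, 2.0  # hypothetical intercept and slope
a_mid, _ = recenter(a, b, T_MID)
print(all(abs((a + b * t) - (a_mid + b * (t - T_MID))) < 1e-9 for t in range(8)))
```

The implied midpoint effect (15.345) agrees with the reported 15.35 up to rounding, as the reparameterization predicts.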
Comparing the results of using different centering constants for TIME
What is affected are the level-1 intercepts (and the level-2 parameters that describe them):
  $\gamma_{00}$ assesses the level of POS at time c for the control group (TREAT = 0)
  $\gamma_{01}$ assesses the difference in POS between the groups (TREATment effect): -3.11 (ns) at study beginning; 15.35 (ns) at study midpoint; 33.80* at study conclusion
The choice of centering constant has no effect on:
  Goodness-of-fit indices
  Estimates for rates of change
  Within-person residual variance
  Between-person residual variance in rate of change
[Figure: fitted POS trajectories (140 to 190) for the Treatment and Control groups against days (0 to 7)]
You can extend the idea of recentering TIME in lots of interesting ways
Example: Instead of focusing on rate of change, parameterize the level-1 model so it produces one parameter for initial status and one parameter for final status:
$Y_{ij} = \pi_{0i}\left(\frac{6.67 - TIME_{ij}}{6.67}\right) + \pi_{1i}\left(\frac{TIME_{ij}}{6.67}\right) + \varepsilon_{ij}$
where $\pi_{0i}$ is the individual initial status parameter and $\pi_{1i}$ is the individual final status parameter
Advantage: You can use all your longitudinal data to analyze initial and final status simultaneously.
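A sketch showing that the initial/final status parameterization describes the same straight line as the usual initial status/rate of change form, just with different parameters: final status equals initial status plus rate × 6.67. The values are hypothetical and 6.67 is assumed to stand for 20/3 days.

```python
T_END = 20 / 3  # assumed exact value behind the slide's 6.67


def predict_rate(initial: float, rate: float, t: float) -> float:
    """Usual parameterization: initial status plus rate of change times TIME."""
    return initial + rate * t


def predict_initial_final(initial: float, final: float, t: float,
                          t_end: float = T_END) -> float:
    """Weights (t_end - t)/t_end and t/t_end interpolate between initial and final status."""
    return initial * (t_end - t) / t_end + final * t / t_end


def final_status(initial: float, rate: float, t_end: float = T_END) -> float:
    """Final status implied by an (initial status, rate of change) pair."""
    return initial + rate * t_end


# Hypothetical person: initial POS of 160, rising 3 points per day.
initial, rate = 160.0, 3.0
final = final_status(initial, rate)
for t in (0.0, 10 / 3, T_END):
    print(round(predict_rate(initial, rate, t), 6),
          round(predict_initial_final(initial, final, t), 6))
```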
Modeling discontinuous and nonlinear change

ALDA, Chapter Six
Things have changed Bob Dylan
Chapter 6: Modeling discontinuous and nonlinear change
General idea: All our examples so far have assumed that individual growth is smooth and linear. But the multilevel model for change is much more flexible:
Discontinuous individual change (6.1): especially useful when discrete shocks or time-limited treatments affect the life course
Using transformations to model non-linear change (6.2): perhaps the easiest way of fitting non-linear change models
  Can transform either the outcome or TIME
  We already did this with ALCUSE (which was a square root of a sum of 4 items)
Using polynomials of TIME to represent non-linear change (6.3)
  While admittedly atheoretical, it's very easy to do
  Probably the most popular approach in practice
Truly non-linear trajectories (6.4)
  Logistic, exponential, and negative exponential models, for example
  A world of possibilities limited only by your theory (and the quality and amount of data)
Example for discontinuous individual change: Wage trajectories & the GED
Data source: Murnane, Boudett and Willett (1999), Evaluation Review
Sample: the same 888 male high school dropouts (from before)
Research design:
  Each was interviewed between 1 and 13 times after dropping out
  34.6% (n=307) earned a GED at some point during data collection
OLD research questions:
  How do log(WAGES) change over time?
  Do the wage trajectories differ by ethnicity and highest grade completed?
Additional NEW research questions: What is the effect of GED attainment? Does earning a GED:
  affect the wage trajectory's elevation?
  affect the wage trajectory's slope?
  create a discontinuity in the wage trajectory?
(ALDA, Section 6.1.1, pp 190-193)
First steps: Think about how GED receipt might affect an individual's wage trajectory
Let's start by considering four plausible effects of GED receipt by imagining what the wage trajectory might look like for someone who got a GED 3 years after labor force entry (post dropout):
  A: No effect of GED whatsoever
  B: An immediate shift in elevation; no difference in rate of change
  D: An immediate shift in rate of change; no difference in elevation
  F: Immediate shifts in both elevation & rate of change
[Figure: the four hypothetical LNW trajectories against EXPER (0 to 10), with GED receipt marked]
(ALDA, Figure 6.1, p 193)
How do we model trajectories like these within the context of a linear growth model???
Including a discontinuity in elevation, not slope (Trajectory B)
Key idea: It's easy; simply include GED as a time-varying predictor at level-1:
$Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \pi_{2i} GED_{ij} + \varepsilon_{ij}$
Pre-GED (GED = 0): $Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \varepsilon_{ij}$
Post-GED (GED = 1): $Y_{ij} = (\pi_{0i} + \pi_{2i}) + \pi_{1i} EXPER_{ij} + \varepsilon_{ij}$
where $\pi_{0i}$ is LNW at labor force entry, $\pi_{1i}$ is the common rate of change pre- and post-GED, and $\pi_{2i}$ is the elevation differential on GED receipt
[Figure: Trajectory B, LNW (1.6 to 2.4) against EXPER (0 to 10)]
(ALDA, Section 6.1.1, pp 194-195)
Using an additional temporal predictor to capture the extra slope post-GED receipt
Including a discontinuity in slope, not elevation (Trajectory D)
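$Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \pi_{3i} POSTEXP_{ij} + \epsilon_{ij}$
where $POSTEXP_{ij} = 0$ prior to GED receipt; afterwards, $POSTEXP_{ij}$ = post-GED experience, a new time-varying predictor that clocks TIME since GED receipt (in the same cadence as EXPER).
Pre-GED (POSTEXP = 0): $Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \epsilon_{ij}$
Post-GED (POSTEXP clocked in the same cadence as EXPER): $Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \pi_{3i} POSTEXP_{ij} + \epsilon_{ij}$
[Figure: LNW (1.6 to 2.4) vs. EXPER (0 to 10), annotated: $\pi_{0i}$ = LNW at labor force entry; $\pi_{1i}$ = rate of change pre-GED; $\pi_{3i}$ = slope differential pre-post GED.]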
(ALDA, Section 6.1.1, pp 195-198)
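The POSTEXP coding can be sketched in a few lines of Python. This is a minimal illustration, not the authors' data-preparation code: it assumes each person-period record carries EXPER and that we know a hypothetical value `ged_exper`, the EXPER at which the GED was received (None if the person never earns one). GED switches to 1 at receipt, and POSTEXP starts clocking from 0 in the same units as EXPER.

```python
def code_ged_predictors(exper_values, ged_exper=None):
    """Return (GED, POSTEXP) lists for one person's person-period records.

    exper_values: EXPER at each wave (years since labor force entry).
    ged_exper: hypothetical argument, the EXPER at GED receipt (None = never).
    """
    ged, postexp = [], []
    for exper in exper_values:
        received = ged_exper is not None and exper >= ged_exper
        ged.append(1 if received else 0)
        # POSTEXP clocks time since GED receipt, in the same units as EXPER
        postexp.append(exper - ged_exper if received else 0.0)
    return ged, postexp


ged, postexp = code_ged_predictors([0.0, 1.5, 3.0, 4.5, 6.0], ged_exper=3.0)
print(ged)      # [0, 0, 1, 1, 1]
print(postexp)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Treating a wave that falls exactly at `ged_exper` as post-GED is itself an assumption; real data would need a rule for receipt between waves.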
Including discontinuities in both elevation and slope (Trajectory F)
Simple idea: Combine the two previous approaches
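$Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \pi_{2i} GED_{ij} + \pi_{3i} POSTEXP_{ij} + \epsilon_{ij}$
Pre-GED (GED = 0, POSTEXP = 0): $Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \epsilon_{ij}$
Post-GED (GED = 1): $Y_{ij} = (\pi_{0i} + \pi_{2i}) + \pi_{1i} EXPER_{ij} + \pi_{3i} POSTEXP_{ij} + \epsilon_{ij}$
[Figure: LNW (1.6 to 2.4) vs. EXPER (0 to 10), annotated: $\pi_{0i}$ = LNW at labor force entry; $\pi_{1i}$ = rate of change pre-GED; $\pi_{2i}$ = constant elevation differential on GED receipt; $\pi_{3i}$ = slope differential pre-post GED.]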
(ALDA, Section 6.1.1, pp 195-198)
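To see how the GED and POSTEXP terms combine, here is a small Python sketch of one person's predicted trajectory with the error term omitted. The parameter values are hypothetical, chosen only for illustration. Setting pi3 to zero gives Trajectory B, setting pi2 to zero gives Trajectory D, and setting both to zero gives Trajectory A.

```python
def lnw_trajectory(exper, ged_exper, pi0, pi1, pi2=0.0, pi3=0.0):
    """Predicted LNW at one EXPER value (error term omitted).

    pi2: elevation differential on GED receipt (0 in Trajectories A and D)
    pi3: slope differential after GED receipt (0 in Trajectories A and B)
    """
    ged = 1 if exper >= ged_exper else 0
    postexp = exper - ged_exper if ged else 0.0
    return pi0 + pi1 * exper + pi2 * ged + pi3 * postexp


# Hypothetical Trajectory F for someone who earns a GED at EXPER = 3
params = dict(ged_exper=3.0, pi0=1.7, pi1=0.04, pi2=0.05, pi3=0.02)
jump = lnw_trajectory(3.0, **params) - (params["pi0"] + params["pi1"] * 3.0)
slope_before = lnw_trajectory(2.0, **params) - lnw_trajectory(1.0, **params)
slope_after = lnw_trajectory(6.0, **params) - lnw_trajectory(5.0, **params)
print(f"jump at GED receipt: {jump:.3f}")  # equals pi2
print(f"slope before / after: {slope_before:.3f} / {slope_after:.3f}")  # pi1, pi1 + pi3
```

The jump equals pi2 and the post-GED slope equals pi1 + pi3, which is exactly how the annotations on the slide read the two extra parameters.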
Many other types of discontinuous individual change trajectories are possible
What kinds of other complex trajectories could be used?
Effects on elevation and slope can depend upon the timing of GED receipt (ALDA pp. 199-201).
You might have non-linear changes before or after the transition point.
The effect of GED receipt might be instantaneous but not endure.
The effect of GED receipt might be delayed.
There might be multiple transition points (e.g., on entry into college for GED recipients).
Just like a regular regression model, the multilevel model for change can include discontinuities, nonlinearities, and other nonstandard terms.
We are generally more limited by data, theory, or both than by our ability to specify the model: extra terms in the level-1 model translate into extra parameters to estimate.
Think carefully about what kinds of discontinuities might arise in your substantive context
How do we select among the alternative discontinuous models?
(ALDA, Section 6.1.1, pp 199-201)
Let's start with a baseline model (Model A) against which we'll compare alternative discontinuous trajectories
(UERATE - 7) is the local-area unemployment rate (added in the previous chapter as an example of a time-varying predictor), centered at 7% for interpretability.
Benchmark against which we'll evaluate discontinuous models
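Level-1: $Y_{ij} = \pi_{0i} + \pi_{1i} EXPER_{ij} + \pi_{2i} (UERATE_{ij} - 7) + \epsilon_{ij}$
Level-2: $\pi_{0i} = \gamma_{00} + \gamma_{01} (HGC_i - 9) + \zeta_{0i}$, $\pi_{1i} = \gamma_{10} + \gamma_{11} BLACK_i + \zeta_{1i}$, $\pi_{2i} = \gamma_{20}$
$\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$ and $\begin{bmatrix}\zeta_{0i} \\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$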
To appropriately compare this deviance statistic to more complex models, we need to know how many parameters have been estimated to achieve this value of deviance
(ALDA, Section 6.1.2, pp 201-202)
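5 fixed effects ($\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11}, \gamma_{20}$) and 4 random-effect variance components ($\sigma_\epsilon^2, \sigma_0^2, \sigma_1^2, \sigma_{01}$)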
How we're going to proceed

Instead of constructing tables of (seemingly endless) parameter estimates, we're going to construct a summary table that presents the specific terms in each model.
For each model, the table lists its terms, the number of parameters (n parameters, for computing d.f.), and the deviance statistic (for model comparison); its first row is the baseline model just shown.
(ALDA, Section 6.1.2, pp 202-203)
First steps: Investigating the discontinuity in elevation by adding the effect of GED
B: Add GED as both a fixed and a random effect (1 extra fixed parameter; 3 extra variance components). Deviance difference = 25.0, 4 df, p < .001: keep the GED effect.
C: But does the GED discontinuity vary across people? (Do we need to keep the extra VCs for the effect of GED?) Deviance difference = 12.8, 3 df, p < .01: keep the VCs.
What about the discontinuity in slope?

(ALDA, Section 6.1.2, pp 202-203)
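These deviance comparisons are likelihood-ratio tests: for nested models fit by full ML, the difference in deviance is referred to a chi-square distribution with df equal to the number of extra parameters. A minimal, dependency-free Python sketch (integer df only, which is all these comparisons need) reproduces the decisions for Models B through E; each comparison is labeled by the model its slide introduces.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for a positive integer df.

    Uses the closed forms for df = 1 and df = 2, then the recurrence
    Q(df + 2) = Q(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1  # df = 1
    else:
        q, k = math.exp(-half), 2             # df = 2
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


# (model introduced on the slide, deviance difference, extra parameters)
comparisons = [("B", 25.0, 4), ("C", 12.8, 3), ("D", 13.1, 4), ("E", 3.3, 3)]
for model, diff, df in comparisons:
    print(f"Model {model}: deviance difference {diff}, {df} df, p = {chi2_sf(diff, df):.4f}")
```

Each p-value lands on the side of the cutoff that the slides report. Applied to the polynomial comparison reported later (11.1 on 5 df), the same function gives p of about 0.049: that deviance sits essentially at the .05 critical value of about 11.07, so the "ns" there is best read as borderline.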
Next steps: Investigating the discontinuity in slope by adding the effect of POSTEXP
(without the GED effect producing a discontinuity in elevation)
D: Add POSTEXP as both a fixed and a random effect (1 extra fixed parameter; 3 extra variance components). Deviance difference = 13.1, 4 df, p < .05: keep the POSTEXP effect.
E: But does the POSTEXP slope vary across people? (Do we need to keep the extra VCs for the effect of POSTEXP?) Deviance difference = 3.3, 3 df, ns: we don't need the POSTEXP random effects (but in comparison with A, we still need the POSTEXP fixed effect).
(ALDA, Section 6.1.2, pp 203-204)
What if we include both types of discontinuity?
Examining both discontinuities simultaneously
F: Add GED and POSTEXP simultaneously (each as both fixed and random effects)
Comparison with B shows the significance of POSTEXP; comparison with D shows the significance of GED.
(ALDA, Section 6.1.2, pp 204-205)
Can we simplify this model by eliminating the VCs for POSTEXP (G) or GED (H)?
Each results in a worse fit, suggesting that Model F (which includes both random effects) is better, even though Model E suggested we might be able to eliminate the VC for POSTEXP. We actually fit several other possible models (see ALDA), but F was the best alternative. So how do we display its results?
(ALDA, Section 6.1.2, pp 204-205)
Displaying prototypical discontinuous trajectories
(Log wages for HS dropouts pre- and post-GED attainment)
[Figure: prototypical LNW trajectories (1.6 to 2.4) vs. EXPERIENCE (0 to 10) for White/Latino and Black 9th- and 12th-grade dropouts, including those who earned a GED.]
Race: At dropout, there are no racial differences in wages. Racial disparities increase over time because wages for Blacks increase at a slower rate.
Highest grade completed: Those who stay in school longer have higher initial wages. This differential remains constant over time.
GED receipt has two effects: Upon GED receipt, wages rise immediately by 4.2%. Post-GED receipt, wages rise annually by 5.2% (vs. 4.2% pre-receipt).
(ALDA, Section 6.1.2, pp 204-206)
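Because the outcome is LNW, effects on the log scale map onto percentage changes in wages: a difference of b in LNW corresponds to a 100(e^b - 1)% change in WAGES, assuming LNW is a natural log. A short Python sketch converts the slide's percentages back to LNW units and round-trips them; for effects this small, 100 × b is already a close approximation.

```python
import math


def pct_change_from_log(delta_log):
    """Percent change in WAGES implied by a change of delta_log in LNW."""
    return 100.0 * (math.exp(delta_log) - 1.0)


def log_change_from_pct(pct):
    """Change in LNW corresponding to a pct% change in WAGES."""
    return math.log(1.0 + pct / 100.0)


for pct in (4.2, 5.2):
    delta = log_change_from_pct(pct)
    print(f"{pct}% in WAGES <-> {delta:.4f} in LNW; round trip: {pct_change_from_log(delta):.1f}%")
```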
Modeling non-linear change using transformations
When facing obviously non-linear trajectories, we usually begin by trying a transformation:
A straight line, even on a transformed scale, is a simple form with easily interpretable parameters. Since many outcome metrics are ad hoc, transformation to another ad hoc scale may sacrifice little.
Earlier, we modeled ALCUSE, an outcome that we formed by taking the square root of the researcher's original alcohol use measurement.
We can detransform the findings and return to the original scale by squaring the predicted values of ALCUSE and replotting.
[Figure: detransformed prototypical alcohol use trajectories vs. AGE (13 to 17) for COA = 1 and COA = 0, each at high and low PEER.]
The prototypical individual growth trajectories are now non-linear: by transforming the outcome before analysis, we have effectively modeled non-linear change over time.
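A minimal Python sketch of the detransformation step, assuming a hypothetical linear prototypical trajectory on the square-root (ALCUSE) scale; the intercept 0.3 and slope 0.25 are made up for illustration. Equal steps on the ALCUSE scale become increasingly large steps after squaring, which is why the replotted trajectories curve.

```python
def detransform_sqrt(predicted_alcuse):
    """Return predictions on the original alcohol-use scale.

    ALCUSE was formed as the square root of the original measure, so
    squaring the predicted values undoes the transformation.
    """
    return [y ** 2 for y in predicted_alcuse]


# Hypothetical linear prototypical trajectory on the ALCUSE (square-root) scale
ages = [13, 14, 15, 16, 17]
sqrt_scale = [0.3 + 0.25 * (age - 13) for age in ages]
original_scale = detransform_sqrt(sqrt_scale)
for age, s, o in zip(ages, sqrt_scale, original_scale):
    print(age, round(s, 3), round(o, 3))
```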
So, how do we know which variable to transform, and using what transformation?
The Rule of the Bulge and the Ladder of Transformations
Mosteller & Tukey (1977): EDA techniques for straightening lines
Step 1: What kinds of transformations do we consider?
[Figure: the ladder of transformations for a generic variable V; moving down the ladder compresses the scale, moving up expands it.]
Step 2: How do we know when to use which transformation?
1. Plot many empirical growth trajectories.
2. Find linearizing transformations by moving up or down the ladder in the direction of the bulge.

(ALDA, Section 6.2.1, pp. 210-212)
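The ladder itself is easy to encode. A minimal Python sketch, assuming the conventional rungs (cube, square, raw, square root, log, negative reciprocal root, negative reciprocal, negative reciprocal square); the direction in which to move is still read off the bulge in the plotted trajectories, as in the Berkeley example, where TIME moves down and IQ moves up.

```python
import math

# Conventional rungs of the ladder of powers, ordered from top (up) to
# bottom (down). This particular list of rungs is an assumption; the slide
# shows the ladder only as a figure. Power 0 stands for the logarithm.
LADDER = [3, 2, 1, 0.5, 0, -0.5, -1, -2]


def ladder_transform(v, p):
    """Re-express a positive value v at rung p of the ladder."""
    if v <= 0:
        raise ValueError("ladder transformations assume positive values")
    if p == 0:
        return math.log(v)
    if p < 0:
        return -(v ** p)  # negate so larger v still maps to larger values
    return v ** p


def move_on_ladder(p, steps):
    """Power reached by moving `steps` rungs up (steps > 0) or down (steps < 0)."""
    i = LADDER.index(p) - steps
    return LADDER[max(0, min(len(LADDER) - 1, i))]


# Moving TIME down one rung from the raw scale gives its square root
print(move_on_ladder(1, -1))  # 0.5
print([ladder_transform(t, move_on_ladder(1, -1)) for t in (1, 4, 9, 16)])  # [1.0, 2.0, 3.0, 4.0]
```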
The effects of transformation for a single child in the Berkeley Growth Study
[Figure: one child's IQ trajectory, straightened either by moving down the ladder in TIME or up the ladder in IQ.]
How else might we model non-linear change?
(ALDA, Section 6.2.1, pp. 211-213)
Representing individual change using a polynomial function of TIME
Polynomial of the zero order (because TIME^0 = 1): $Y_{ij} = \pi_{0i} + \epsilon_{ij}$
Like including a constant predictor 1 in the level-1 model. The intercept represents vertical elevation; different people can have different elevations.
Polynomial of the first order (because TIME^1 = TIME): $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \epsilon_{ij}$
Familiar individual growth model. Varying intercepts and slopes yield criss-crossing lines.
Second order polynomial for quadratic change: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \pi_{2i} TIME_{ij}^2 + \epsilon_{ij}$
Includes both TIME and TIME^2. $\pi_{0i}$ = intercept, but now both TIME and TIME^2 must be 0. $\pi_{1i}$ = instantaneous rate of change when TIME = 0 (there is no longer a constant slope). $\pi_{2i}$ = curvature parameter; the larger its absolute value, the more dramatic its effect. The peak (or trough) is called a stationary point; a quadratic has 1.
Third order polynomial for cubic change: $Y_{ij} = \pi_{0i} + \pi_{1i} TIME_{ij} + \pi_{2i} TIME_{ij}^2 + \pi_{3i} TIME_{ij}^3 + \epsilon_{ij}$
Includes TIME, TIME^2 and TIME^3. We can keep on adding powers of TIME. Each extra polynomial term allows another stationary point; a cubic can have up to 2.
(ALDA, Section 6.3.1, pp. 213-217)
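Stationary points are where the instantaneous rate of change is zero. For the quadratic, setting the slope $\pi_{1i} + 2\pi_{2i} TIME$ to zero gives $TIME = -\pi_{1i}/(2\pi_{2i})$; for the cubic, the slope is itself a quadratic in TIME, so there are at most two real stationary points (and possibly none). A short Python sketch with hypothetical coefficients:

```python
import math


def quadratic_stationary_point(pi1, pi2):
    """TIME where pi0 + pi1*TIME + pi2*TIME**2 has zero slope."""
    if pi2 == 0:
        raise ValueError("pi2 = 0: the trajectory is linear, with no stationary point")
    return -pi1 / (2.0 * pi2)


def cubic_stationary_points(pi1, pi2, pi3):
    """Real TIMEs where the cubic's slope pi1 + 2*pi2*TIME + 3*pi3*TIME**2 is zero."""
    if pi3 == 0:
        return [quadratic_stationary_point(pi1, pi2)] if pi2 != 0 else []
    a, b, c = 3.0 * pi3, 2.0 * pi2, pi1
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []  # slope never zero: no stationary point
    root = math.sqrt(disc)
    # a set removes the duplicate when the two roots coincide (disc == 0)
    return sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})


# Hypothetical coefficients
print(quadratic_stationary_point(2.0, -0.5))    # 2.0 (a peak, since pi2 < 0)
print(cubic_stationary_points(9.0, -6.0, 1.0))  # [1.0, 3.0]
print(cubic_stationary_points(1.0, 0.0, 1.0))   # []
```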
Example for illustrating use of polynomials in TIME to represent change
Source: Margaret Keiley & colleagues (2000), J of Abnormal Child Psychology
Sample: 45 boys and girls identified in 1st grade.
Goal was to study behavior changes over time (until 6th grade).

At the end of every school year, teachers rated each child's level of externalizing behavior using Achenbach's Child Behavior Checklist:
3-point scale (0 = rarely/never; 1 = sometimes; 2 = often) for 34 aggressive, disruptive, or delinquent behaviors.
Outcome: EXTERNAL ranges from 0 to 68 (simple sum of these scores). Predictor: FEMALE (are there gender differences?)

How does children's level of externalizing behavior change over time? Do the trajectories of change differ for boys and girls?
(ALDA, Section 6.3.2, p. 217)
Examining empirical growth plots (which invariably display great variability in temporal complexity)
Selecting a suitable level-1 polynomial trajectory for change
[Figure: empirical growth plots of EXTERNAL for selected children, annotated with patterns such as: little change over time (flat line?); linear decline (at least until 4th grade); quadratic change (but with varying curvatures); two stationary points? (suggests a cubic); three stationary points? (suggests a quartic!)]
When faced with so many different patterns, how do you select a common polynomial for analysis?
(ALDA, Section 6.3.2, pp 217-220)
Examining alternative fitted OLS polynomial trajectories
Order optimized for each child (solid curves) and a common quartic across children (dashed line)
First impression: Most fitted trajectories provide a reasonable summary of each child's data.
Second impression: Maybe these ad hoc decisions aren't the best?
Third realization: We need a common polynomial across all cases (and might the quartic be just too complex?)
Using sample data to draw conclusions about the shape of the underlying true trajectories is tricky; let's compare alternative models.
(ALDA, Section 6.3.2, pp 217-220)
Would a quadratic do?
Using model comparisons to test higher order terms in a polynomial level-1 model
Add polynomial functions of TIME to the person-period data set.
Compare goodness of fit (accounting for all the extra parameters that get estimated).
A: significant between- and within-child variation
B: no fixed effect of TIME but significant variance components; deviance difference = 18.5, 3 df, p < .01
C: no fixed effects of TIME and TIME^2, but significant variance components; deviance difference = 16.0, 4 df, p < .01
D: still no fixed effects for the TIME terms, but now the variance components are ns also; deviance difference = 11.1, 5 df, ns
Quadratic (C) is the best choice, and it turns out there are no gender differentials at all.
(ALDA, Section 6.3.3, pp 220-223)
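Each higher-order term is just another level-1 predictor, so the first step amounts to adding columns. A minimal Python sketch, assuming the person-period data are a list of dicts with a TIME field (a hypothetical layout, with TIME = 0 at the first wave), adding terms up to the quartic considered above:

```python
def add_time_polynomials(records, max_order=4, time_key="TIME"):
    """Return copies of person-period records with TIME**2 ... TIME**max_order added.

    The new column names (TIME2, TIME3, ...) are hypothetical; records are
    plain dicts, one per person-period, and the inputs are left unchanged.
    """
    out = []
    for rec in records:
        new = dict(rec)
        t = rec[time_key]
        for k in range(2, max_order + 1):
            new[f"{time_key}{k}"] = t ** k
        out.append(new)
    return out


# Hypothetical person-period rows for one child, TIME = 0 at the first wave
person_period = [{"ID": 1, "TIME": t} for t in range(6)]
for row in add_time_polynomials(person_period):
    print(row)
```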
Example for truly non-linear change
Data source: Terry Tivnan (1980) Dissertation at Harvard Graduate School of Education
Sample: 17 1st and 2nd graders
During a 3-week period, Terry repeatedly played a two-person checkerboard game called Fox n Geese, (hopefully) learning from experience.
The fox is controlled by the experimenter, at one end of the board. Children have four geese that they use to try to trap the fox.
Great for studying cognitive development because:
There exists a strategy that children can learn that will guarantee victory. This strategy is not immediately obvious to children. Many children can deduce the strategy over time.

Each child played up to 27 games (each game is a wave). The outcome, NMOVES, is the number of moves made by the child before making a catastrophic error (guaranteeing defeat); it ranges from 1 to 20.
Research questions:

How does NMOVES change over time? What is the effect of a child's reading (or cognitive) ability, READ (score on a standardized reading test)?
(ALDA, Section 6.4.1, pp. 224-225)
Examining empirical growth plots (and asking what features the hypothesized model should display)
Selecting a suitable level-1 nonlinear trajectory for change
A lower asymptote, because everyone makes at least 1 move and it takes a while to figure out what's going on.
An upper asymptote, because a child can make only a finite number of moves in each game.
A smooth curve joining the asymptotes, that initially accelerates and then decelerates.
These three features suggest a level-1 logistic change trajectory, which, unlike our previous growth models, will be non-linear in the individual growth parameters.
(ALDA, Section 6.4.2, pp. 225-228)
Understanding the logistic individual growth trajectory (which is anything but linear in the individual growth parameters)
$Y_{ij} = 1 + \dfrac{19}{1 + \pi_{0i} e^{-\pi_{1i} TIME_{ij}}} + \epsilon_{ij}$
The upper asymptote in this particular model is constrained to be 20 (1 + 19).
$\pi_{0i}$ is related to, and determines, the intercept: the higher the value of $\pi_{0i}$, the lower the intercept.
$\pi_{1i}$ determines the rapidity with which the trajectory approaches the upper asymptote: when $\pi_{1i}$ is large, the trajectory rises more rapidly; when $\pi_{1i}$ is small, the trajectory rises slowly (often not reaching an asymptote).
[Figure: three panels of NMOVES (0 to 25) vs. Game (0 to 30), for $\pi_0$ = 150, 15, and 1.5, each showing curves for $\pi_1$ = 0.5, 0.3, and 0.1.]
Models can be fit in the usual way, provided your software can do it.
(ALDA, Section 6.4.2, pp 226-230)
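Evaluating the logistic trajectory at the parameter values used in the three panels makes the roles of the two parameters concrete. At TIME = 0 the trajectory equals 1 + 19/(1 + π0i), so a larger π0i means a lower intercept, while π1i controls how fast the curve climbs toward 20. A minimal Python sketch, with the error term omitted and TIME taken as the game number:

```python
import math


def nmoves_logistic(time, pi0, pi1):
    """Logistic trajectory: lower asymptote 1, upper asymptote 20 (1 + 19).

    The error term is omitted; pi0 and pi1 play the roles of the
    individual growth parameters pi_0i and pi_1i.
    """
    return 1.0 + 19.0 / (1.0 + pi0 * math.exp(-pi1 * time))


# Parameter values from the three panels described above
for pi0 in (150.0, 15.0, 1.5):
    for pi1 in (0.1, 0.3, 0.5):
        values = [nmoves_logistic(game, pi0, pi1) for game in (0, 10, 20, 30)]
        print(f"pi0={pi0:g}, pi1={pi1:g}: " + ", ".join(f"{v:.2f}" for v in values))
```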
Results of fitting logistic change trajectories to the Fox n Geese data
Begins low and rises smoothly and non-linearly.
Not statistically significant (note the small sample), but better READers approach the asymptote more rapidly.
(ALDA, Section 6.4.2, pp 229-232)
A limitless array of non-linear trajectories awaits (each is illustrated in detail in ALDA, Section 6.4.3)
Hyperbola: $Y_{ij} = \alpha_i - \dfrac{1}{\pi_{1i} TIME_{ij}} + \epsilon_{ij}$
Exponential: $Y_{ij} = \pi_{0i} e^{\pi_{1i} TIME_{ij}} + \epsilon_{ij}$
Inverse polynomial: $Y_{ij} = \alpha_i - \dfrac{1}{\pi_{1i} TIME_{ij} + \pi_{2i} TIME_{ij}^2} + \epsilon_{ij}$
Negative exponential: $Y_{ij} = \alpha_i - (\alpha_i - \pi_{0i}) e^{-\pi_{1i} TIME_{ij}} + \epsilon_{ij}$
(ALDA, Section 6.4.3, pp 232-242)
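Writing these forms as functions is a quick way to check what each parameter does at the start and in the long run. A minimal Python sketch with error terms omitted; the parameter values in the loop are hypothetical.

```python
import math

# Error terms omitted; alpha, pi0, pi1, pi2 stand for the individual growth
# parameters alpha_i, pi_0i, pi_1i, pi_2i in the equations above.


def hyperbola(t, alpha, pi1):
    """alpha - 1/(pi1*t); undefined at t = 0, approaches alpha as t grows (pi1 > 0)."""
    return alpha - 1.0 / (pi1 * t)


def inverse_polynomial(t, alpha, pi1, pi2):
    """alpha - 1/(pi1*t + pi2*t**2)."""
    return alpha - 1.0 / (pi1 * t + pi2 * t ** 2)


def exponential(t, pi0, pi1):
    """pi0 * exp(pi1*t); equals pi0 at t = 0."""
    return pi0 * math.exp(pi1 * t)


def negative_exponential(t, alpha, pi0, pi1):
    """alpha - (alpha - pi0)*exp(-pi1*t); starts at pi0 and approaches alpha."""
    return alpha - (alpha - pi0) * math.exp(-pi1 * t)


# Hypothetical parameter values, purely for illustration
for t in (1, 5, 10, 20):
    print(t, round(hyperbola(t, 20, 0.5), 2), round(negative_exponential(t, 20, 2, 0.2), 2))
```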
Singer & Willett, page 28
Using SAS Proc Mixed to fit the multilevel model for change
Time is nature's way of keeping everything from happening at once. Woody Allen
Judith D. Singer & John B. Willett, Harvard Graduate School of Education, Using SAS Proc Mixed, slide 1
Resources to help you learn how to use SAS Proc Mixed
Textbook Examples: Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, by Judith D. Singer and John B. Willett
(Web page with example code for each chapter in HLM, MLwiN, Mplus, SAS, SPlus, SPSS, and Stata, plus the datasets)
Ch 1 A framework for investigating change over time
Ch 2 Exploring longitudinal data on change
Ch 3 Introducing the multilevel model for change
Ch 4 Doing data analysis with the multilevel model for change
Ch 5 Treating time more flexibly
Ch 6 Modeling discontinuous and nonlinear change
Ch 7 Examining the multilevel model's error covariance structure
Ch 8 Modeling change using covariance structure analysis
Ch 9 A framework for investigating event occurrence
Ch 10 Describing discrete-time event occurrence data
Ch 11 Fitting basic discrete-time hazard models
Ch 12 Extending the discrete-time hazard model
Ch 13 Describing continuous-time event occurrence data
Ch 14 Fitting the Cox regression model
Ch 15 Extending the Cox regression model
What we'll do now: Use the specific models we just fit in Chapter Four to demonstrate how to use SAS PROC MIXED to fit these models to data:
Model A: The unconditional means model
Model B: The unconditional growth model
Model C: The uncontrolled effects of COA
Model D: The controlled effects of COA
Using SAS Proc Mixed to fit Model A (the unconditional means model)
Level-1 Model: $Y_{ij} = \pi_{0i} + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$, where $\zeta_{0i} \sim N(0, \sigma_0^2)$
Composite Model: $Y_{ij} = \gamma_{00} + \zeta_{0i} + \epsilon_{ij}$
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = /solution;
  random intercept/subject=id;
run;
The proc mixed statement invokes the procedure, here using the dataset named one. The method=ml option tells SAS to use full maximum likelihood estimation; if you omit this option, SAS uses restricted maximum likelihood by default (as discussed in Chapter 4, slide 27). The covtest option tells SAS to display tests for the variance components; by default, SAS omits these tests (as discussed in Chapter 4, slide 23).
The class id statement tells SAS to treat the variable ID as a categorical (in SAS terms, a classification) variable. If you omit this statement, by default, SAS would treat ID as a continuous variable.
The model statement specifies the structural portion of the multilevel model for change. The specification model alcuse = (with nothing to the right of the equals sign) may seem unusual, but it's the way SAS represents the unconditional means model (see Chapter 4, slide 9). The model includes no explicit predictor but, like any regression model, includes an implicit intercept by default. The /solution option on the model statement tells SAS to display the estimated fixed effects (as well as the associated standard errors and hypothesis tests).
The random statement specifies the stochastic portion of the multilevel model for change. By default, SAS always includes a variance component for the level-1 residuals. In this unconditional means model, the random intercept option tells SAS to also include a variance component for the intercept (allowing the means to vary across people). The /subject=id option tells SAS that the intercepts (the means in this unconditional means model) should be allowed to vary randomly across individuals (as identified by the classification variable ID).
Results of fitting Model A (the unconditional means model) to data
Level-1 Model: $Y_{ij} = \pi_{0i} + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$, where $\zeta_{0i} \sim N(0, \sigma_0^2)$
Composite Model: $Y_{ij} = \gamma_{00} + \zeta_{0i} + \epsilon_{ij}$
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = /solution;
  random intercept/subject=id;
run;
Model A: Unconditional means model The Mixed Procedure

Covariance Parameter Estimates
Cov Parm    Subject   Estimate   Standard Error   Z Value   Pr Z
Intercept   ID        0.5639     0.1191           4.73      <.0001
Residual              0.5617     0.06203          9.06      <.0001

Fit Statistics
-2 Log Likelihood           670.2
AIC (smaller is better)     676.2
AICC (smaller is better)    676.3
BIC (smaller is better)     683.4

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   0.9220     0.09571          81   9.63      <.0001
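With only an intercept in the model, Model A's two variance components support a simple summary, the intraclass correlation: the share of total variation in ALCUSE that lies between adolescents. A short Python sketch using the estimates printed above:

```python
def intraclass_correlation(sigma2_0, sigma2_eps):
    """Share of total outcome variation that lies between people."""
    return sigma2_0 / (sigma2_0 + sigma2_eps)


# Covariance parameter estimates reported by PROC MIXED for Model A
between = 0.5639  # Intercept (subject = ID)
within = 0.5617   # Residual
rho = intraclass_correlation(between, within)
print(f"Estimated intraclass correlation: {rho:.3f}")
```

The result is about 0.50: roughly half of the total variation in ALCUSE is between-person variation.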
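Using SAS Proc Mixed to fit Model B (the unconditional growth model)
Level-1 Model: $Y_{ij} = \pi_{0i} + \pi_{1i} (AGE_{ij} - 14) + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i} \\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Composite Model: $Y_{ij} = \gamma_{00} + \gamma_{10} (AGE_{ij} - 14) + [\zeta_{0i} + \zeta_{1i} (AGE_{ij} - 14) + \epsilon_{ij}]$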
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = age_14/solution;
  random intercept age_14/type=un subject=id;
run;
As before, SAS implicitly includes a variance component for the level-1 residuals. But because Model B includes a second random effect to capture the hypothesized level-2 stochastic variation, the random statement must be modified to include this second term, denoted by the temporal predictor AGE_14. The /type=un option, which stands for unstructured, is crucial: it tells SAS not to impose any structure on the variance-covariance matrix for the level-2 residuals.
Model B, the unconditional growth model, includes a single predictor, age_14, whose coefficient represents the slope of the level-1 individual growth trajectory. As before, SAS implicitly includes an intercept term. Because the predictor age_14 is centered at age 14 (the first wave of data collection), the intercept now represents initial status.
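Results of fitting Model B (the unconditional growth model) to data
Level-2 Model: $\pi_{0i} = \gamma_{00} + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i} \\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = age_14/solution;
  random intercept age_14/type=un subject=id;
run;
Composite Model: $Y_{ij} = \gamma_{00} + \gamma_{10} (AGE_{ij} - 14) + [\zeta_{0i} + \zeta_{1i} (AGE_{ij} - 14) + \epsilon_{ij}]$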
Model B: Unconditional growth model The Mixed Procedure

Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    ID        0.6244     0.1481           4.22      <.0001
UN(2,1)    ID        -0.06844   0.07008          -0.98     0.3288
UN(2,2)    ID        0.1512     0.05647          2.68      0.0037
Residual             0.3373     0.05268          6.40      <.0001

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   0.6513     0.1051           81   6.20      <.0001
AGE_14      0.2707     0.06245          81   4.33      <.0001
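Two quantities are often derived from this output: the correlation between true initial status and true rate of change, computed from the UN(2,1) covariance and the two variances, and a proportional-reduction-in-variance (pseudo-R²) summary of how much within-person variance the linear TIME term explains relative to Model A. A minimal Python sketch using the ML estimates printed above for Models A and B:

```python
import math

# Model B covariance parameter estimates (PROC MIXED output above)
tau00 = 0.6244     # UN(1,1): variance of the initial status residuals
tau10 = -0.06844   # UN(2,1): covariance of initial status and rate of change residuals
tau11 = 0.1512     # UN(2,2): variance of the rate of change residuals
sigma2_b = 0.3373  # Residual (level-1) variance in Model B
sigma2_a = 0.5617  # Residual (level-1) variance in Model A

# Correlation between true initial status and true rate of change
r01 = tau10 / math.sqrt(tau00 * tau11)

# Proportional reduction in within-person variance from adding AGE_14
pseudo_r2_eps = (sigma2_a - sigma2_b) / sigma2_a

print(f"corr(initial status, rate of change) = {r01:.3f}")
print(f"within-person pseudo-R^2 = {pseudo_r2_eps:.3f}")
```

The correlation is about -0.22, and about 40% of the within-person variation is associated with linear time. Since UN(2,1) is not statistically significant (p = 0.3288), the correlation should be read cautiously.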
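Using SAS Proc Mixed to fit Model C (uncontrolled effects of COA)
Level-1 Model: $Y_{ij} = \pi_{0i} + \pi_{1i} (AGE_{ij} - 14) + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \gamma_{01} COA_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11} COA_i + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i} \\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Composite Model: $Y_{ij} = \gamma_{00} + \gamma_{01} COA_i + \gamma_{10} (AGE_{ij} - 14) + \gamma_{11} COA_i \times (AGE_{ij} - 14) + [\zeta_{0i} + \zeta_{1i} (AGE_{ij} - 14) + \epsilon_{ij}]$
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = coa age_14 coa*age_14/solution;
  random intercept age_14/type=un subject=id;
run;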
Like the companion level-2 model, Model C adds two terms to register the uncontrolled effects of COA: (1) a main effect of COA, which captures the effect on the intercept (initial status); and (2) the cross-level interaction, COA*AGE_14, which captures the effect of COA on the rate of change. All other statements, including the random statement, are unchanged from Model B because we have added only new fixed effects (for COA), not any new random effects.
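Results of fitting Model C (the uncontrolled effects of COA) to data
Level-2 Model: $\pi_{0i} = \gamma_{00} + \gamma_{01} COA_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11} COA_i + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i} \\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = coa age_14 coa*age_14/solution;
  random intercept age_14/type=un subject=id;
run;
Composite Model: $Y_{ij} = \gamma_{00} + \gamma_{01} COA_i + \gamma_{10} (AGE_{ij} - 14) + \gamma_{11} COA_i \times (AGE_{ij} - 14) + [\zeta_{0i} + \zeta_{1i} (AGE_{ij} - 14) + \epsilon_{ij}]$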
Model C: Uncontrolled effects of COA The Mixed Procedure

Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    ID        0.4876     0.1278           3.81      <.0001
UN(2,1)    ID        -0.05934   0.06573          -0.90     0.3666
UN(2,2)    ID        0.1506     0.05639          2.67      0.0038
Residual             0.3373     0.05268          6.40      <.0001

Solution for Fixed Effects
Effect       Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    0.3160     0.1307           80   2.42      0.0179
COA          0.7432     0.1946           82   3.82      0.0003
AGE_14       0.2930     0.08423          80   3.48      0.0008
COA*AGE_14   -0.04943   0.1254           82   -0.39     0.6944
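Plugging Model C's fixed-effect estimates into the composite model gives prototypical trajectories for children of alcoholics (COA = 1) and other adolescents (COA = 0). A minimal Python sketch over the observed ages 14 to 16; predictions stay on the square-root ALCUSE scale.

```python
# Model C fixed effects (PROC MIXED output above)
g00, g01 = 0.3160, 0.7432    # Intercept, COA
g10, g11 = 0.2930, -0.04943  # AGE_14, COA*AGE_14


def fitted_alcuse(age, coa):
    """Prototypical ALCUSE (square-root scale) for a given AGE and COA (0 or 1)."""
    age_14 = age - 14
    return (g00 + g01 * coa) + (g10 + g11 * coa) * age_14


for coa in (0, 1):
    print(f"COA={coa}: " + ", ".join(f"age {a}: {fitted_alcuse(a, coa):.3f}" for a in (14, 15, 16)))
```

COA shifts initial status by about 0.74, while the two groups' slopes differ by only about -0.05, an interaction that the output shows is not statistically significant.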
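Using SAS Proc Mixed to fit Model D (controlled effects of COA)
Level-1 Model: $Y_{ij} = \pi_{0i} + \pi_{1i} (AGE_{ij} - 14) + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$
Level-2 Model: $\pi_{0i} = \gamma_{00} + \gamma_{01} COA_i + \gamma_{02} PEER_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11} COA_i + \gamma_{12} PEER_i + \zeta_{1i}$, where $\begin{bmatrix}\zeta_{0i} \\ \zeta_{1i}\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2\end{bmatrix}\right)$
Composite Model: $Y_{ij} = \gamma_{00} + \gamma_{01} COA_i + \gamma_{02} PEER_i + \gamma_{10} (AGE_{ij} - 14) + \gamma_{11} COA_i \times (AGE_{ij} - 14) + \gamma_{12} PEER_i \times (AGE_{ij} - 14) + [\zeta_{0i} + \zeta_{1i} (AGE_{ij} - 14) + \epsilon_{ij}]$
proc mixed data=one method=ml covtest;
  class id;
  model alcuse = coa peer age_14 coa*age_14 peer*age_14/solution;
  random intercept age_14/type=un subject=id;
run;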
Like the companion level-2 model, Model D adds two PEER terms so that the effects of COA are now controlled for PEER: (1) a main effect of PEER, which captures the effect on the intercept (initial status); and (2) the cross-level interaction, PEER*AGE_14, which captures the effect of PEER on the rate of change. All other statements, including the random statement, are unchanged from Model C because we have added only new fixed effects (for PEER), not any new random effects.
Results of fitting Model D (the controlled effects of COA) to data
Model D: Controlled effects of COA The Mixed Procedure

Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    ID        0.2409     0.09259          2.60      0.0046
UN(2,1)    ID        -0.00612   0.05500          -0.11     0.9115
UN(2,2)    ID        0.1391     0.05481          2.54      0.0056
Residual             0.3373     0.05268          6.40      <.0001

Solution for Fixed Effects
Effect        Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept     -0.3165    0.1481           79   -2.14     0.0356
COA           0.5792     0.1625           82   3.56      0.0006
PEER          0.6943     0.1115           82   6.23      <.0001
AGE_14        0.4294     0.1137           79   3.78      0.0003
COA*AGE_14    -0.01403   0.1248           82   -0.11     0.9107
PEER*AGE_14   -0.1498    0.08564          82   -1.75     0.0840
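Comparing the level-2 variance components across Models B, C, and D shows how much of the between-person variation in initial status and rate of change the level-2 predictors account for. A minimal Python sketch of this proportional-reduction (pseudo-R²) summary, using the UN(1,1) and UN(2,2) estimates printed for each model:

```python
def pseudo_r2(var_baseline, var_conditional):
    """Proportional reduction in a variance component relative to a baseline model."""
    return (var_baseline - var_conditional) / var_baseline


# Level-2 variance components: (initial status UN(1,1), rate of change UN(2,2))
model_b = (0.6244, 0.1512)  # unconditional growth model
model_c = (0.4876, 0.1506)  # adds COA
model_d = (0.2409, 0.1391)  # adds COA and PEER

for label, model in (("C", model_c), ("D", model_d)):
    r2_0 = pseudo_r2(model_b[0], model[0])
    r2_1 = pseudo_r2(model_b[1], model[1])
    print(f"Model {label} vs B: initial status {r2_0:.3f}, rate of change {r2_1:.3f}")
```

COA alone explains about 22% of the variation in initial status; adding PEER raises that to about 61%, while very little of the variation in rates of change is explained by either model.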