
CATEGORICAL DATA ANALYSIS


WITH SAS AND SPSS
APPLICATIONS

Bayo Lawal
St. Cloud State University

2003

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS


Mahwah, New Jersey
London

Senior Editor: Debra Riegert
Editorial Assistant: Jason Planer
Cover Design: Kathryn Houghtaling Lacey
Textbook Production Manager: Paul Smolenski
Text and Cover Printer: Sheridan Books, Inc.

Camera ready copy for this book was provided by the author.

Copyright 2003 by Lawrence Erlbaum Associates, Inc.


All rights reserved. No part of this book may be reproduced in any form, by
photostat, microform, retrieval system, or any other means, without prior
written permission of the publisher.
Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430
Library of Congress Cataloging-in-Publication Data
Lawal, H. Bayo.
Categorical data analysis with SAS and SPSS applications / H. Bayo Lawal.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-4605-0 (cloth : alk. paper)
1. Multivariate analysis. 2. SAS (Computer file). 3. SPSS (Computer file).
I. Title.
QA278.L384 2003
519.5'3dc21
2003044361
CIP
Books published by Lawrence Erlbaum Associates are printed on acid-free
paper, and their bindings are chosen for strength and durability.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Disclaimer:
This eBook does not include the ancillary media that was
packaged with the original printed version of the book.

Contents

Preface

1 Introduction
1.1 Variable Classification
1.2 Categorical Data Examples
1.3 Analyses

2 Probability Models
2.1 Introduction
2.2 Moment-Generating Functions
2.3 Probability Models for Discrete Data
2.4 The Multinomial Distribution
2.5 The Hypergeometric Distribution
2.6 Generalized Linear Models
2.7 Exercises

3 One-Way Classification
3.1 The Multinomial Distribution
3.2 Test Statistics for Testing H0
3.3 An Alternative Test to EMT
3.4 Approximations to the Distribution of X^2
3.5 Goodness-of-Fit (GOF) Tests
3.6 Goodness of Fit for Poisson Data
3.7 Local Effects Models
3.8 Goodness of Fit for Binomial Data
3.9 Exercises

4 Models for 2 x 2 Contingency Tables
4.1 Introduction
4.2 The Hypergeometric Probability Model
4.3 Fisher's Exact Test
4.4 The Mid-P Test
4.5 Product Binomial Sampling Scheme
4.6 The Full Multinomial Model Scheme
4.7 Summary Recommendations
4.8 The 2 x 2 Tables with Correlated Data
4.9 Measures of Association in 2 x 2 Contingency Tables
4.10 Analyzing Several 2 x 2 Contingency Tables
4.11 Exercises

5 The General I x J Contingency Table
5.1 Introduction
5.2 Multivariate Hypergeometric Distributions
5.3 Large Sample Test
5.4 Product Multinomial Probability Model
5.5 The Full Multinomial Probability Model
5.6 Residual Analysis
5.7 Partitioning of the G^2 Statistic
5.8 The Quasi-Independence Model
5.9 Problems with Small Expected Frequencies
5.10 Association Measures in Two-Way Tables
5.11 Exercises

6 Log-Linear Models for Contingency Tables
6.1 Introduction
6.2 The 2 x 2 Table
6.3 Log-Linear Models for I x J Contingency Tables
6.4 Interaction Analysis
6.5 Three-Way Contingency Tables
6.6 Sufficient Statistics for Log-Linear Models
6.7 Maximum Likelihood Estimates, MLE
6.8 Decomposable and Graphical Log-Linear Models
6.9 MLE via Iterative Procedures
6.10 Interpretation of Parameters in Higher Tables
6.11 Model Interpretations in Higher Tables
6.12 Tests of Marginal and Partial Associations
6.13 Collapsibility Conditions for Contingency Tables
6.14 Problems with Fitting Log-Linear Models
6.15 Weighted Data and Models for Rates
6.16 Problems with Interpretation
6.17 Exercises

7 Strategies for Log-Linear Model Selection
7.1 Introduction
7.2 Example 7.1: The Stepwise Procedures
7.3 Marginal and Partial Associations
7.4 Aitkin's Selection Method
7.5 Selection Criteria
7.6 Interpretation of Final Model
7.7 Danish Welfare Study Data: An Example
7.8 Exercises

8 Models for Binary Responses
8.1 Introduction
8.2 Generalized Linear Model
8.3 Parameter Estimation in Logistic Regression
8.4 Example 8.4: Relative Potency in Bioassays
8.5 Analyzing Data Arising from Cohort Studies
8.6 Example 8.6: Analysis of Data in Table 8.1
8.7 Diagnostics
8.8 Overdispersion
8.9 Other Topics: Case-Control Data Analysis
8.10 A Five-Factor Response Example
8.11 Exercises

9 Logit and Multinomial Response Models
9.1 Introduction
9.2 Poisson Regression and Models for Rates
9.3 Multicategory Response Models
9.4 Models for Ordinal Response Variables
9.5 Exercises

10 Models in Ordinal Contingency Tables
10.1 Introduction
10.2 The Row Association (R) Model
10.3 The Column Association (C) Model
10.4 The R+C Association Model
10.5 The RC Association Model
10.6 Homogeneous R+C or RC Models
10.7 The General Association Model
10.8 Correlation Models
10.9 Grouping of Categories in Two-Way Tables
10.10 Higher Dimensional Tables
10.11 Conditional Association Models
10.12 Exercises

11 Analysis of Doubly Classified Data
11.1 Introduction
11.2 Symmetry Models
11.3 The Diagonal-Parameters Symmetry Models
11.4 The Odds-Symmetry Models
11.5 Generalized Independence Models
11.6 Diagonal Models
11.7 The Full Diagonal Models
11.8 Classification of All the Above Models
11.9 The Bradley-Terry Model
11.10 Measures of Agreement
11.11 Multirater Case
11.12 Exercises

12 Analysis of Repeated Measures Data
12.1 Introduction
12.2 Models for Repeated Binary Response
12.3 Generalized Estimating Equations
12.4 Example 12.2: Mother Rat Data
12.5 The Six Cities Longitudinal Study Data
12.6 Analysis of Nonbinary Response Data
12.7 Exercises

Appendices

Table of the Chi-Squared Distribution

Bibliography

Subject Index

Preface
This book is primarily designed for a senior undergraduate class in categorical data analysis and for majors in biomedical, biostatistics, and statistics programs, but it can also be used as a reference text for researchers working in the area, or as an introductory text in a graduate course on the subject. A prerequisite of one year of undergraduate calculus and a basic one-year background course in statistical methods is therefore recommended.
I have tried to write the text in such a way that students will be at ease with the material. Thus, where necessary, concepts have been explained in detail so that students can reproduce similar results for some of the problems in the exercises. Emphasis is on implementation of the models discussed in the text, with the use of the statistical package SAS/STAT, Version 8e. Relevant codes and instructions are included with each example presented. Corresponding analyses with SPSS (version 11) codes are presented on the CD-ROM.
Many examples are given, and tables of results of analyses as well as interpretations of those results are presented. Numerous references are also given in the bibliography for students interested in further reading on a particular topic or topics. Students will also find exercises at the end of each chapter, beginning from chapter 2. Most exercises require intensive use of PC-based statistical software. These exercises provide standard problems that have been chosen to conform to the level of the text and students. To further strengthen the students' understanding of the material, the exercises sometimes require them to employ some of the ideas expressed in the text in a more advanced capacity.
The data for illustration and exercises have been carefully selected from the various literature on the subject with one point in mind: to illustrate the concepts being discussed. Some of the data have been analyzed before in various texts and prints, but I have added relevant codes for their implementation. In some cases, I have looked at the analysis of some of the data from a completely different perspective. I hasten to add, however, that the reliability of these data depends on the honesty of the sources from which they were drawn.
Considerable use has been made of SAS PROC GENMOD in the text; the SAS software codes for implementing a variety of problems are presented either in the text or in the appendices, which come on a CD-ROM. Brief summaries of the contents of each chapter are presented below:
Chapter 1 introduces readers to the various types of variables commonly encountered in categorical data analysis.
Chapter 2 reviews the Poisson, binomial, multinomial, and hypergeometric distributions. It also gives an overview of the properties of generalized linear models.
Chapter 3 discusses the one-way classification, exact and large sample tests, goodness-of-fit test statistics, local effects, and goodness-of-fit tests for Poisson and binomial data.
Chapter 4 introduces models for the 2 x 2 and the 2 x 2 x k contingency tables. The chapter discusses various sampling schemes, Fisher's exact and mid-p tests, as well as the analysis of correlated 2 x 2 data.
Chapter 5 gives an overview of the general I x J table. Residual analysis, quasi-independence, and SAS codes for implementing all these are given.
Chapter 6 presents a comprehensive approach to log-linear model analyses. It discusses the concepts of sufficient statistics, decomposable models, marginal and partial associations, collapsibility conditions, and problems associated with log-linear models. It also discusses weighted data as well as models for rates. All models are implemented with SAS PROC GENMOD and CATMOD, and SPSS PROC GENLOG, HILOGLINEAR, and LOGLINEAR.
Chapter 7 discusses model selection strategies, including Aitkin's selection method, and discusses selection criteria. Ample examples are presented.
Chapter 8 discusses logistic regression. Examples in bioassay are presented. The analysis of cohort study data, diagnostics, overdispersion, factor-response models, and parameter interpretations are also discussed in this chapter. SAS PROCs LOGISTIC, GENMOD, PHREG, PROBIT, and CATMOD are heavily employed in this chapter. SPSS procedures PROBIT, LOGISTIC, PLUM, NOMREG, and COXREG are similarly employed, with implementation and results on the CD-ROM.
Chapter 9 discusses logit and multinomial logit models and Poisson regression models. Multicategory response models (adjacent category, baseline, proportional odds, cumulative odds, continuation ratio) are fully discussed in this chapter, with a considerable number of examples together with their SAS software and SPSS implementations.
Chapter 10 discusses association models. The chapter also discusses the general association models as well as grouping of categories in contingency tables. Extension to higher dimensional tables is provided, and conditional association models are fully discussed, together with their implementation in SAS with PROC GENMOD and in SPSS with PROC GENLOG.
Chapter 11 introduces the analysis of doubly classified data. A new unified approach to analyzing asymmetry, generalized independence, and skew-symmetry models using nonstandard log-linear models is also presented in this chapter. The treatment of these topics in this chapter is most comprehensive, and SAS and SPSS implementations of the various models are also presented. The chapter also discusses the Bradley-Terry and agreement models.
Chapter 12 discusses the analysis of repeated binary response data. It also introduces readers to the generalized estimating equations (GEE) methodology. Several examples involving binary response variables are given. An example involving a nonbinary response is also presented.
SPSS codes for all models discussed in this text are presented on the CD-ROM.

ACKNOWLEDGMENTS
I owe a debt of gratitude to a number of people: Dr. Raghavarao for encouraging
me to write the book, Dr. Graham Upton for comments on an earlier draft of
some of the chapters. I am also grateful to St. Cloud State University for the long
term research grant which provided me the summer 1994 stipend that enabled me
to complete chapter 11. I would also like to thank Phillip Rust from the Medical
University of South Carolina, and Alexander von Eye from Michigan State University, who reviewed the manuscript for publication. Their comments have been very
helpful. My sincere thanks to Debra Riegert, Jason Planer, Beth Dugger, and Paul
Smolenski for their encouraging support during the acquisition and production of
this text.
My thanks also go to the graduate students at Temple University for their invaluable comments on a section of this text when I taught the course in 1989. My
thanks similarly go to my STAT 436 classes at St. Cloud State University in 1996,
2000 and 2002.
My gratitude also goes to the Department of Biometry and Epidemiology, MUSC,
for the support provided during the final writing of this manuscript while I was on
sabbatical leave at MUSC and to the graduate students in the Department for their
insightful comments and input in my BIOM 711 class in the Spring of 2000.
I would also like to thank Professors Stuart Lipsitz of the Department of Biometry and Epidemiology, MUSC, Charleston, SC, for providing me some of the material in chapter 12, and Emmanuel Jolayemi of the Department of Statistics,
University of Ilorin, Nigeria, whose lecture notes taken during graduate classes at
the University of Michigan were very useful in some sections of this text.
I thank the SAS Institute Inc. and SPSS Inc. for the use of their software for the analyses of all the data examples in this text, the results of which are presented in the text or in the appendices.
Finally, I thank my wife Nike and our children for their patience and support
during the years of working on this project.

Bayo Lawal
Department of Statistics
St. Cloud State University
St. Cloud, MN


Chapter 1

Introduction
In this text, we will be dealing with categorical data, which consist of counts rather than measurements. A variable is often defined as a characteristic of objects or subjects that varies from one object or subject to another. Gender, for instance, is a variable, as it varies from one person to another. Because variables are of different types, we describe in the following section the variable classification that has been adopted over the years.

1.1 Variable Classification

Stevens (1946) developed the measurement scale hierarchy into four categories, namely, nominal, ordinal, interval, and ratio scales. Stevens (1951) further prescribed statistical analyses that are appropriate and/or inappropriate for data that are classified according to one of the four scales above. The nominal scale is the lowest, while the ratio scale variables are the highest.
However, this scale typology has seen a lot of criticism from, notably, Lord (1953), Guttman (1968), Tukey (1961), and Velleman and Wilkinson (1993). Most of the criticisms tend to focus on the prescription of scale types to justify statistical methods. Consequently, Velleman and Wilkinson (1993) give examples of situations where Stevens's categorization failed and where statistical procedures can often not be classified by Stevens's measurement theory.
Alternative scale taxonomies have therefore been suggested. One such taxonomy was presented in Mosteller and Tukey (1977, chap. 5). The hierarchy under their classification consists of grades, ranks, counted fractions, counts, amounts, and balances.
A categorical variable is one for which the measurement scale consists of a set of categories that is nonnumerical. There are two kinds of categorical variables: nominal and ordinal variables. The first kind, nominal variables, have a set of unordered, mutually exclusive categories, which according to Velleman and Wilkinson (1993) "may not even require the assignment of numerical values, but only of unique identifiers (numerals, letters, color)". This kind classifies individuals or objects into variables such as gender (male or female), marital status (married, single, widowed, divorced), and party affiliation (Republican, Democrat, Independent). Other variables of this kind are race, religious affiliation, etc. The number of occurrences in each category is referred to as the frequency count for that category. Nominal variables are invariant under any transformation that preserves the relationship between subjects (objects or individuals) and their identifiers, provided we do not combine categories under such transformations. For nominal variables, the statistical analysis remains invariant under permutation of the categories of the variables.
As an example, consider the data in Table 1.1, relating to the distribution of 120 students in an undergraduate biostatistics class in the fall of 2001. The students were classified by their gender (males or females). Table 1.1 displays the classification of the students by gender.

              Gender
            Male   Female   Total
Counts n_i    31       89     120

Table 1.1: Classification of 120 students in class by gender


When nominal variables such as gender have only two categories, we say that such variables are dichotomous or binary. Variables of this kind have sometimes been referred to as categorical-dichotomous. A nominal variable such as color of hair, with categories {black, red, blonde, brown, gray}, which has multiple categories but again without ordering, is referred to (Clogg, 1995) as categorical-nominal.
The second kind of variables, ordinal variables, are those whose categories are ordered. Using the definition in Velleman and Wilkinson (1993), if S is an ordinal scale that assigns real numbers in ℝ to the elements of a set P of observed observations, then

    S: P → ℝ  such that  i > j ⟺ S(i) > S(j)  for all i, j ∈ P    (1.1)

Such a scale S preserves the one-to-one relation between numerical order values under some transformation. The set of transformations that preserve the ordinality of the mapping in (1.1) has been described by Stevens (1946) as being permissible. That is, the monotonic transformation f is permissible if and only if

    S(i) > S(j) ⟺ f(S(i)) > f(S(j))    (1.2)

Thus for this scale of measurement, permissible transformations are logs or square roots (nonnegative), linear transformations, and addition of or multiplication by a positive constant.
Under an ordinal scale, therefore, the subjects or objects are ranked in terms of the degree to which they possess a characteristic of interest. Examples are: rank in graduating class (90th percentile, 80th percentile, 60th percentile); social status (upper, middle, lower); type of degree (BS, MS, PhD); etc. A Likert variable such as the attitudinal response variable with four levels (strongly disapprove, disapprove, approve, strongly approve) can sometimes be considered a partially ordinal variable (a variable with most levels ordered but with one or more unordered levels) by the inclusion of a "don't know" or "no answer" response category. Ordinal variables generally indicate that some subjects are better than others, but we cannot say by how much, because the intervals between categories are not equal. Consider again the 120 students in our biostatistics class, now classified by their status at the college (freshman, sophomore, junior, or senior). Thus the status variable has four categories in this case.

Status         F    S    J   Sr   Total
Counts n_i    10   45   50   15     120

Table 1.2: Distribution of 120 students in class by status


The results indicate that more than 75% of the students are either sophomores or
juniors. Table 1.2 will be referred to as a one-way classification table because it was
formed from the classification of a single variable, in this case, status. For the data
in Table 1.2, there is an intrinsic ordering of the categories of the variable "status",
with the senior category being the highest level in this case. Thus this variable can
be considered ordinal.
On the other hand, a metric variable has all the characteristics of nominal and
ordinal variables, but in addition, it is based on equal intervals. Height, weight,
scores on a test, age, and temperatures are examples of interval variables. With
metric variables, we not only can say that one observation is greater than another
but by how much. A fuller classification of variable types is provided by Stevens
(1968).
Within this grouping, that is, nominal, ordinal, and metric, the metric or interval variables are highest, followed by ordinal variables. The nominal variables are lowest. Statistical methods applicable to nominal variables will also be applicable to either ordinal or interval variables, but not vice versa. Similarly, statistical methods applicable to ordinal variables can also be applied to interval variables, again not vice versa; nor can they be applied to nominal variables, since these are lower in order on the measurement scale.
When subjects or objects are classified simultaneously by two or more attributes, the result of such a cross-classification can be conveniently arranged as a table of counts known as a contingency table. The pattern of association between the classificatory variables may be measured by computing certain measures of association or by fitting log-linear, logit, association, or other models.

1.2 Categorical Data Examples

Suppose the students in this class have been cross-classified by two variables, gender and status; the resulting table will be described as a two-way table or a two-way contingency table (see Table 1.3). Table 1.3 displays the cross-classification of students by gender and status.
                  Status
Gender       F    S    J   Sr   Total
Males        3   10   12    6      31
Females      7   35   38    9      89
Total       10   45   50   15     120

Table 1.3: Joint classification of students by status and gender


The frequencies in Table 1.3 above indicate that there are 31 male students in this
class and 89 female students. Further, the table also reveals that of the 120 students
in the class, 38 are females whose status category is junior.

In general, our data will consist of counts {n_i, i = 1, 2, ..., k} in the k cells (or categories) of a contingency table. For instance, these might be observations for the k levels of a single categorical variable, or for the k = IJ cells of a two-way I x J contingency table. In Tables 1.1 and 1.2, k equals 2 and 4, respectively, while k = 8 in Table 1.3. We shall treat counts as random variables. Each observed count n_i has a distribution that is concentrated on the nonnegative integers, with expected value denoted by m_i = E(n_i). The {m_i} are called expected frequencies, while the {n_i} are referred to as the observed frequencies. Observed frequencies refer to how many objects or subjects are observed in each category of the variable(s).

1.3 Analyses

For the data in Tables 1.1 and 1.2, interest usually centers on whether the observed frequencies follow a specified distribution, leading to what is often referred to as a "goodness-of-fit test". For the data in Table 1.3, on the other hand, our interest is often concerned with independence, that is, whether the students' status is independent of gender. This is referred to as the test of independence. An alternative form of this test is the test of "homogeneity", which postulates, for instance, that the proportion of females (or males) is the same in each of the four categories of "status". We will see in chapter 5 that both the independence and homogeneity tests lead asymptotically to the same result. We may on the other hand wish to exploit the ordinal nature of the status variable (since there is a hierarchy with regard to the categories of this variable). We shall consider this and other situations in chapters 6 and 9, which deal with log-linear model and association analyses of such tables. The various measures of association exhibited by such tables are discussed in chapter 5.
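As a preview of the goodness-of-fit test developed in chapter 3, the one-way counts in Table 1.2 could, for instance, be tested against a hypothesized set of category proportions with PROC FREQ. The following is only an illustrative sketch (the data set name and the equal-proportions hypothesis are our assumptions, not an analysis from later chapters):

data table12;
input status $ count;
datalines;
F 10
S 45
J 50
Sr 15
;
run;
proc freq data=table12 order=data;
weight count;                 /* cell counts rather than raw cases */
tables status / chisq testp=(0.25 0.25 0.25 0.25);
run;

Here the TESTP= option supplies the hypothesized category proportions for the chi-squared goodness-of-fit test.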

1.3.1 Example of a 2 x 2 Table

Suppose the students in the biostatistics class were asked whether they have ever smoked; this would lead to a response variable that we will call here "smoking status", with categories Yes and No. We display in Table 1.4 the two-way cross-classification of the students by gender and smoking status, leading to a 2 x 2 contingency table.

             Response
Gender      Yes   No   Total
Males        18   13      31
Females      52   37      89
Total        70   50     120

Table 1.4: 2 x 2 contingency table of response by gender


The resulting table indicates that of the 120 students in the class, 70 indicated that they have once smoked. The 2 x 2 table has perhaps received more attention than many other tables. The reason for this is related to how the observed frequencies in the table are obtained. In this case, there are only 120 students in this class, and we may therefore consider that this value, the sample size, is fixed. With 120 fixed, the students are then cross-classified by the two variables, response and gender, leading to Table 1.4. This scheme leads to what has been described as the "multinomial sampling scheme." Other sampling schemes relating to the 2 x 2 table will be examined in chapter 4. Again, interest centers here on whether smoking status is independent of gender or whether the response Yes is uniform across gender.
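One possible SAS setup for this test of independence is sketched below as a preview (the data set name is an illustrative assumption; the formal development and the book's own programs appear in chapters 4 and 5):

data table14;
input gender $ response $ count;
datalines;
M Yes 18
M No  13
F Yes 52
F No  37
;
run;
proc freq data=table14;
weight count;                 /* cell counts of Table 1.4 */
tables gender*response / chisq expected;
run;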

1.3.2 Three-Way Tables

Assuming the students were also asked their attitudes toward abortion, this would lead to yet another variable, designated here as attitudes, with three response categories {positive, mixed, negative}, designated here as 1, 2, and 3, respectively. We now cross-classify the resulting data as a three-way contingency table having variables gender, status, and attitude, resulting in a 2 x 4 x 3 three-way contingency table having 24 cells. Table 1.5 displays the observed frequencies in this case.
                       Attitudes
Gender     Status      1    2    3   Total
Males        1         1    2    0       3
             2         4    5    1      10
             3         5    4    3      12
             4         1    3    2       6
Females      1         3    2    2       7
             2        15   14    6      35
             3        12    9   17      38
             4         2    3    4       9

Table 1.5: Three-way table of gender, status, and attitudes


In Table 1.5, the status categories 1, 2, 3, and 4 refer respectively to freshman,
sophomore, junior, and senior. The form of analysis here can be viewed in several
ways.
(a) We may wish to consider whether attitudes toward abortion are independent of gender. In this case, we only need to collapse Table 1.5 over the status variable (in other words, we will ignore the status variable) and use the methods mentioned in the previous section to analyze the data; that is, we will conduct a test of independence for Table 1.6.
             Attitudes
Gender      1    2    3   Total
Males      11   14    6      31
Females    32   28   29      89
Total      43   42   35     120

Table 1.6: Two-way table of gender by attitudes collapsed over status


(b) We may also be interested in the relationship between the status category
and attitudes toward abortion. This again would lead to a 4 x 3 two-way
contingency table as in Table 1.7.
Again the usual test of independence will answer this question. Note here
that we are collapsing over the gender variable.

            Attitudes
Status      1    2    3   Total
1           4    4    2      10
2          19   19    7      45
3          17   13   20      50
4           3    6    6      15
Total      43   42   35     120

Table 1.7: Two-way table of status by attitudes collapsed over gender


(c) However, we may view collapsibility of Table 1.5 over either the status or gender variable as unsatisfactory, because we do not know whether the strength and direction of the association between the resulting two variables will change when collapsed over the third variable (this is called Simpson's paradox).
It is therefore desirable to analyze the data in Table 1.5 like the familiar analysis of variance for continuous-type data, where we can study the interactions between the various variables. The equivalent analysis of variance (ANOVA) model in categorical data analysis is the log-linear model analysis, which we shall apply not only to three-way tables but to higher dimensional tables in this book. The procedure allows us to study interactions between the explanatory variables and the response variable.
If, however, we consider the variable attitude as a response variable, an even better procedure would be to consider fitting logit models to the data in Table 1.5. Logit models are discussed in chapter 9. We can further exploit the intrinsic ordinality of the response variable {+ve, mixed, -ve} by fitting specialized models, such as the proportional odds, continuation ratio, adjacent category, and cumulative logit models, to these data. These models are discussed in the latter part of chapter 9, and we also present in that chapter how to implement them in SAS PROC CATMOD, GENMOD, and LOGISTIC, and in SPSS PROC GENLOG and NOMREG.
If we consider again the 4 x 3 table formed from the three-way table when collapsed over gender, Table 1.7, we notice that both variables, "status" and "attitudes", are ordinal in nature. We can study the associations between these two variables better by employing the row, column, or R+C association models discussed in chapter 10. This class of models, which first appeared in Goodman (1979), models the local odds ratios in the table.
Consider again the two-way table formed from collapsing over the "status" variable, that is, Table 1.6, where only the response variable "attitude" can be considered ordinal, the gender variable being nominal in this case. Thus, we can fit the row association (R) model to these data. Another possible model is the multiplicative RC association model. These models are discussed in chapter 10, together with their SAS implementation with example data in that chapter.
In chapter 11, we consider the analysis of doubly classified data. For the students in our class we asked the following questions: (1) What is your political party affiliation? (2) Which party did you vote for in the last presidential election? We gave them three party choices only: Democrat, Republican, Independent. Table 1.8 displays the joint classification of these students on the two variables.
                        Vote
Party Affiliation     D    R    I   Total
D                    35   10    8      53
R                     8   39    2      49
I                     4    2   12      18
Total                47   51   22     120

Table 1.8: Two-way table of party affiliation by voting pattern

For Table 1.8, while it is a two-way 3 x 3 table, we cannot employ the usual two-way table model of independence, as presented earlier, with these data. We would expect the analysis to be different in this case because it is obvious that the diagonal cells are highly correlated: Democrats tend to vote Democrat, and Republicans likewise.
Hence, the models that we would consider must take into consideration the diagonal structure of this table. In chapter 11, we shall develop diagonal-parameter and similar models necessary for a proper explanation of such data. Other possible models we would want to examine are whether there is symmetry with regard to the cells or symmetry with regard to the marginal totals. Models discussed in that chapter will examine symmetry and marginal symmetry models that are appropriate for this class of data. A unified approach to fitting these models will be presented there. We shall further extend our discussion in chapter 11 to methods of analysis for agreement data arising from two and three raters. Nonstandard log-linear techniques (von Eye & Spiel, 1996; Lawal, 2001) will be discussed for implementing this class of models.
In the last series of questions, the students were asked regarding abortion:
Should a pregnant woman be able to obtain a legal abortion if:
(1) She is married and does not want more children?
(2) The family has very low income and cannot afford any more children?
(3) She is not married and does not want to marry the man?
The form of these questions was tailored to the 1976 US General Social Survey. Since there are three questions whose responses are simply Yes or No, there are 2^3 = 8 possible combinations of responses. The classification of our students into these eight possible combinations of responses by gender is displayed in Table 1.9.
                            Responses
Gender    YYY  YYN  YNY  YNN  NYY  NYN  NNY  NNN   Total
Male        7    3    1    4    5    3    2    6      31
Female     18   10    7    9   11    8    5   21      89
Total      25   13    8   13   16   11    7   27     120

Table 1.9: Classification of students by gender and responses


In Table 1.9, for instance, the response NNN indicates that the students responded No to questions 1, 2, and 3. In this case, there are 27 students in this category. The data in Table 1.9 fall into the category of repeated binary response data. Because the responses are repeated, there is an intrinsic correlation among the three responses, and models employed for such data have to exploit this correlation structure within the subjects' responses. We discuss in the last chapter of this book logit, marginal, and other similar models for handling data of this nature. The discussion in that chapter also covers the case involving more than one covariate, with repeated observations coming from clusters of subjects. This approach leads to the discussion of the generalized estimating equations (GEE) originally proposed by Liang and Zeger (1986). Again, SAS programs are provided for all the examples in this chapter.

Chapter 2

Probability Models
2.1 Introduction

Categorical data analysis depends on a thorough understanding of the various sampling schemes that may give rise to the one-way, two-way, or multidimensional contingency tables under consideration. These sampling schemes, which will be discussed later in the book, have various underlying probability distributions, the most common of these being the Poisson distribution. Consequently, a proper understanding of these probability distributions (or probability models, as they are sometimes referred to) is highly desirable. However, before discussing these various probability models, it is necessary to define the concept of the moment-generating function, particularly as it relates to obtaining the moments of the distributions and proving some asymptotic results.
We also introduce readers to the general theory of generalized linear models. The discussion here is brief and will help readers gain an easy understanding of the theory behind categorical data analysis and some of the statistical software that has been developed to take advantage of this theory.

2.2 Moment-Generating Functions

The moment-generating function (mgf) of a random variable X is defined as

    M_X(t) = E(e^{tX})    (2.1)

that is,

    M_X(t) = Σ_x e^{tx} p(x)              if X is discrete
    M_X(t) = ∫_{-∞}^{∞} e^{tx} p(x) dx    if X is continuous

where p(x) is the probability density function (pdf) such that, for the random variable X,

    Σ_x p(x) = 1 if X is discrete,  and  ∫_{-∞}^{∞} p(x) dx = 1 if X is continuous

The Maclaurin series expansion for e^{tx} is given by:

    e^{tx} = 1 + tx + (tx)^2/2! + ⋯ + (tx)^r/r! + ⋯    (2.2)

Hence, substituting (2.2) in the expression for M_X(t) in (2.1), we have for the discrete case, for instance, that:

    M_X(t) = 1 + μt + μ'_2 t^2/2! + ⋯ + μ'_r t^r/r! + ⋯    (2.3)

since μ = Σ_x x p(x) and μ'_r = Σ_x x^r p(x).
The case for the continuous type distribution is equivalent to the general result in (2.3), except that the summation sign is replaced by the integral sign.
The results in (2.3) indicate that if we expand M_X(t) as a power series in t, then the moments can be obtained as the coefficients of t^r/r! in this expansion when evaluated at t = 0. That is, these moments are given by:

    μ'_r = [d^r M_X(t)/dt^r]_{t=0}    (2.4)

Thus, for instance, μ = [M_X'(t)]_{t=0}. Similar results can be obtained for the other higher moments.

2.2.1 Properties of M_X(t)

The following results relating to M_X(t) are also important. If a and β are constants, then it can be readily verified that:

    (i)   M_{X+a}(t) = e^{at} M_X(t)
    (ii)  M_{βX}(t) = M_X(βt)
    (iii) M_{(X+a)/β}(t) = e^{at/β} M_X(t/β)

The result in (i) is of special importance when a = -μ, while the result in (iii) is of special importance when a = -μ and β = σ, to obtain the moment-generating function of a standardized variable.
We give the proof of (iii) as follows:

    M_{(X+a)/β}(t) = E[e^{t(X+a)/β}] = e^{at/β} E[e^{(t/β)X}] = e^{at/β} M_X(t/β)

If we set β = 1 in (iii), we obtain (i). Similarly, setting a = 0 in (iii), we obtain (ii).

2.2.2 The Normal Probability Distribution

The normal or Gaussian probability distribution is defined as

    p(x) = (1/(σ√(2π))) exp{-(1/2)[(x - μ)/σ]^2}    for -∞ < x < ∞

where μ and σ are real constants and σ must be positive. We usually write X ~ N(μ, σ^2). The random variable defined by

    Z = (X - μ)/σ

is referred to as the standardized normal variate and has mean zero (μ = 0) and variance one (σ^2 = 1). That is, Z ~ N(0, 1).
The moment-generating function for the normal distribution can be obtained as:

    M_X(t) = ∫_{-∞}^{∞} e^{tx} (1/(σ√(2π))) exp{-(1/2)[(x - μ)/σ]^2} dx

By completing the square in the exponent, we have

    M_X(t) = e^{μt + σ^2 t^2/2} { (1/(σ√(2π))) ∫_{-∞}^{∞} exp{-(1/2)[(x - μ - σ^2 t)/σ]^2} dx } = e^{μt + σ^2 t^2/2}

because the quantity within the braces is the integral from -∞ to ∞ of a normal density with mean (μ + σ^2 t) and variance σ^2, and therefore equals 1.
If we set μ = 0 and σ^2 = 1, we have the corresponding moment-generating function of the standardized normal variate Z. That is,

    M_Z(t) = e^{t^2/2}    (2.5)

Categorical data analysis deals primarily with discrete data (counts or frequencies), which are hence described by discrete type distributions. We will therefore start by examining the various distributions that we will be considering in this book. These are discussed in the next section.

2.3 Probability Models for Discrete Data

2.3.1 The Binomial Distribution

Consider an experiment consisting of a sequence of n independent trials in which the outcome at each trial is binary or dichotomous. At each trial, the probability that a particular event A (success) occurs is π, and the probability that the event does not occur (failure) is therefore (1 - π). We shall also assume that this probability of success is constant from one trial to another. If the number of times the event A occurs in the sequence of n trials is represented by the random variable X, then X is said to follow the binomial distribution with parameters n and π (fixed). The probability distribution for X is:

    P[X = x] = C(n, x) π^x (1 - π)^{n-x},   x = 0, 1, ..., n;  and 0 otherwise    (2.6)

where 0 < π < 1 and C(n, x) = n!/[x!(n - x)!] denotes the binomial coefficient.
It can be shown that the moment-generating function for the binomial distribution is given by (see problem 4 in the exercises):

    M_X(t) = [1 + π(e^t - 1)]^n    (2.7)

Differentiating (2.7) with respect to t, we have (see problem 4 in the exercises)

    M_X'(t) = nπe^t [1 + π(e^t - 1)]^{n-1}    (2.8a)
    M_X''(t) = nπe^t [1 - π + nπe^t][1 + π(e^t - 1)]^{n-2}    (2.8b)

Evaluating these at t = 0 gives

    M_X'(t) |_{t=0} = μ'_1 = nπ    (2.9a)
    M_X''(t) |_{t=0} = μ'_2 = nπ(1 - π + nπ)    (2.9b)

These lead to:

    μ = nπ    (2.10a)
    σ^2 = μ'_2 - μ^2 = nπ(1 - π)    (2.10b)

Example
Suppose the probability that a person suffering from migraine headache will obtain relief with a certain analgesic drug is 0.8. Five randomly selected sufferers from migraine headache are given this drug. We wish to find the probabilities that the number obtaining relief will be:
(a) Exactly zero  (b) Exactly two  (c) More than 3
(d) Four or fewer  (e) Three or four  (f) At least 2
Solution: Let X be the binomial random variable with parameters n = 5 and π = 0.8. We can solve the problems in (a) to (f) by use of the cumulative distribution function (cdf) of the binomial random variable. The cdf F(x) = P(X ≤ x) for this problem is generated next with the accompanying SAS software program.

DATA BINOMIAL;
DO X=0 TO 5;
P=PROBBNML(0.8,5,X);
CUM=ROUND(P,.0001);
OUTPUT;
END;
DROP P;
PROC PRINT NOOBS;
RUN;

X     CUM
0    0.0003
1    0.0067
2    0.0579
3    0.2627
4    0.6723
5    1.0000

In the program just given, we have employed the SAS software function PROBBNML(p, n, x) for generating the cumulative distribution function of a binomial variable with the specified parameters. The ROUND function rounds the probabilities to four decimal places.

(a) The number obtaining relief will be exactly 0:

    P(X = 0) = F(0) = 0.0003

(b) The number obtaining relief will be exactly 2:

    P(X = 2) = P(X ≤ 2) - P(X ≤ 1) = F(2) - F(1) = 0.0579 - 0.0067 = 0.0512

(c) The number obtaining relief will be more than 3:

    P(X > 3) = 1 - P(X ≤ 3) = 1 - F(3) = 1 - 0.2627 = 0.7373

(d) The number obtaining relief will be 4 or fewer:

    P(X ≤ 4) = F(4) = 0.6723

(e) The number obtaining relief will be 3 or 4:

    P(3 ≤ X ≤ 4) = P(X ≤ 4) - P(X ≤ 2) = F(4) - F(2) = 0.6723 - 0.0579 = 0.6144

(f) The number obtaining relief will be at least 2:

    P(X ≥ 2) = 1 - P(X < 2) = 1 - P(X ≤ 1) = 1 - F(1) = 1 - 0.0067 = 0.9933

The mean and variance for the above problem are μ = nπ = 5 * 0.8 = 4 and σ^2 = nπ(1 - π) = 5 * 0.8 * 0.2 = 0.8, respectively. Further, the moment-generating function for this problem is M_X(t) = [1 + 0.8(e^t - 1)]^5.
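The same probabilities can also be obtained directly from SAS's generic PDF and CDF functions rather than from the tabulated cdf; the short sketch below (with an arbitrary data set name of our choosing) computes parts (a) to (c) this way:

data binchk;
pa = pdf('binomial', 0, 0.8, 5);       /* P(X = 0) */
pb = pdf('binomial', 2, 0.8, 5);       /* P(X = 2) */
pc = 1 - cdf('binomial', 3, 0.8, 5);   /* P(X > 3) */
run;
proc print data=binchk noobs;
run;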

2.3.2 Asymptotic Properties

All the asymptotic properties considered in this and other sections of this chapter depend heavily on the convergence results discussed in Appendix A at the end of this chapter. Readers are advised to first familiarize themselves with the notation and definitions adopted therein.
For a binomial variate X, designated X ~ b(n, π), 0 < π < 1 with π fixed, we have as n → ∞:

(i)

    Z_1 = (X - nπ)/√(nπ(1 - π))    (2.11)

is distributed N(0, 1) approximately for large n and moderate π. That is, Z_1 → N(0, 1) in distribution as n → ∞ with π fixed (central limit theorem).
Z_1 can also be rewritten as:

    Z_1 = (X - nπ)/√(nπ(1 - π)) = (X/n - π)/√(π(1 - π)/n)

That is, as n → ∞, the standardized binomial distribution approaches the standard normal distribution with mean 0 and variance 1.
To prove this, we note from property (iii) of the moment-generating functions, with a = -μ and β = σ, that

    M_{Z_1}(t) = e^{-μt/σ} M_X(t/σ) = e^{-μt/σ} [1 + π(e^{t/σ} - 1)]^n

and therefore

    log M_{Z_1}(t) = -μt/σ + n log[1 + π(e^{t/σ} - 1)]

Expanding e^{t/σ} and log(1 + x) in Maclaurin series and collecting terms, the coefficient of t^2 is 1/2, while the coefficients of t^k, k ≥ 3, converge to zero because of σ^k in the denominator. Thus we obtain

    lim_{n→∞} M_{Z_1}(t) = e^{t^2/2}

We conclude therefore that as n → ∞, the standardized binomial distribution approaches the standard normal distribution with mean 0 and variance 1.
(ii)

    X/n → π in probability    (2.12)

as n → ∞, with π fixed (the weak law of large numbers, which indicates that X/n converges in probability to π).
Combining (2.11) and (2.12), we have

    (X/n - π)/√(p(1 - p)/n) → N(0, 1) in distribution, where p = X/n    (2.13)

If p is an estimator for π, then p = X/n, and a 95% confidence interval for π is approximately given by:

    p ± 1.96 √(p(1 - p)/n)
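As a numerical illustration (a sketch of our own, using the smoking data of Table 1.4, where x = 70 of n = 120 students responded Yes), this approximate interval can be computed in a short data step:

data waldci;
x = 70; n = 120;
p = x/n;                      /* sample proportion */
se = sqrt(p*(1-p)/n);         /* estimated standard error */
lower = p - 1.96*se;
upper = p + 1.96*se;
run;
proc print data=waldci noobs;
run;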

(iii) If we define the odds of an event A occurring as P(A)/[1 - P(A)] (this is sometimes referred to as the odds in favor of A), then the log odds of the event A occurring is

    φ = log π - log(1 - π) = log[π/(1 - π)]

The observed log odds are given by:

    U = log(p) - log(1 - p) = log[p/(1 - p)] = log[X/(n - X)]

Since it can be shown, using the delta method with the linearized Taylor expansion (see Appendix A.4), that

    Var(U) ≈ 1/[nπ(1 - π)]

we have

    Z_3 = (U - φ) √(nπ(1 - π)) → N(0, 1) in distribution

as n → ∞ with π remaining fixed, using (2.11) and the limit theorem.
(iv) Since

    U = log x - log(n - x)

the variance of U may be estimated by

    Var(U) ≈ 1/x + 1/(n - x)

This is a general result, and using Slutsky's theorem (from Appendix A), as n → ∞ with π fixed, an approximate 95% confidence interval for φ is given by

    U ± 1.96 √(1/x + 1/(n - x))

The results just shown are due to Cox (1970), and φ is defined as the logit of π. On the other hand, if φ is defined in terms of the inverse normal probability integral as φ = Φ^{-1}(π), then φ is in this case referred to as the probit of π (Finney, 1971). Similarly, if ψ is defined as ψ = arcsin √π, we have the arcsine (variance stabilizing) transformation.
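A companion sketch (again ours, with the same x = 70 and n = 120 from Table 1.4) computes the observed logit and this approximate 95% confidence interval for φ:

data logitci;
x = 70; n = 120;
u = log(x/(n-x));             /* observed log odds */
se = sqrt(1/x + 1/(n-x));     /* delta-method standard error */
lower = u - 1.96*se;
upper = u + 1.96*se;
run;
proc print data=logitci noobs;
run;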

Other Binomial Properties

Other properties of the binomial distribution include the following:
(a) If X_1 and X_2 are independent random variables having binomial distributions with parameters n_1, π_1 and n_2, π_2, respectively, then the sum Y = X_1 + X_2 also has a binomial distribution, with parameters (n_1 + n_2) and π_1, provided π_1 = π_2 = π only. The moment-generating function (mgf) of Y = X_1 + X_2 is then:

    [1 + π(e^t - 1)]^{n_1} [1 + π(e^t - 1)]^{n_2} = [1 + π(e^t - 1)]^{n_1+n_2}

That is, the sum of the random variables also has the binomial distribution, with parameters (n_1 + n_2) and π. If, however, π_1 ≠ π_2, then Y = X_1 + X_2 does not have a binomial distribution.
(b) The distribution of X_1, conditional on X_1 + X_2 = x, is the hypergeometric distribution:

    P[X_1 = r | X_1 + X_2 = x] = C(n_1, r) C(n_2, x - r) / C(n_1 + n_2, x)    (2.15)

subject to the restrictions r ≤ n_1, x - r ≤ n_2, and max(0, x - n_2) ≤ r ≤ min(n_1, x).

2.3.3 The Poisson Distribution

The Poisson distribution is very important in the study of categorical data and is often a realization of a rare event. If X ~ P(μ), then the probability density function for this random variable is given by:

    P[X = x] = e^{-μ} μ^x / x!,   x = 0, 1, 2, ...

It can easily be shown that:

    P[X = x + 1] / P[X = x] = μ/(x + 1)

The moment-generating function for X is obtained as follows:

    M_X(t) = Σ_{x=0}^{∞} e^{tx} e^{-μ} μ^x / x! = e^{-μ} Σ_{x=0}^{∞} (μe^t)^x / x! = e^{μ(e^t - 1)}    (2.16)

From the above expression for M_X(t), it is not too difficult to show that

    E(X) = μ    and    Var(X) = μ

Hence the standard error (sd) of X is √μ.
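For illustration (a sketch of ours, taking μ = 2 arbitrarily), these Poisson probabilities can be generated with the SAS PDF function:

data poisprob;
mu = 2;
do x = 0 to 8;
p = pdf('poisson', x, mu);    /* P(X = x) for X ~ P(mu) */
output;
end;
run;
proc print data=poisprob noobs;
run;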

Sums of Independent Poissons

If X_1 and X_2 are independent random variables having Poisson distributions with parameters μ_1 and μ_2, respectively, then X = X_1 + X_2 also has a Poisson distribution, with parameter μ = μ_1 + μ_2. (Show using mgfs.)

An Important Conditional Property

Let X_1 and X_2 each be distributed independently as Poisson with parameters μ_1 and μ_2, respectively. Then the conditional distribution of X_1, given the value of X_1 + X_2 = N, is binomial with parameters based on (the observed value of) N and μ_1/(μ_1 + μ_2). That is,

    P[X_1 = a_1 | X_1 + X_2 = N] = C(N, a_1) π^{a_1} (1 - π)^{N - a_1}    (2.17)

where π = μ_1/(μ_1 + μ_2).

Proof:
Let X_1 ~ P(μ_1) and X_2 ~ P(μ_2); then, given X_1 + X_2 = N,

    P(X_1 = a_1 | X_1 + X_2 = N) = P(X_1 = a_1, X_2 = N - a_1) / P(X_1 + X_2 = N)
        = [e^{-μ_1} μ_1^{a_1}/a_1!] [e^{-μ_2} μ_2^{N-a_1}/(N - a_1)!] / [e^{-μ} μ^N/N!]
        = C(N, a_1) π^{a_1} (1 - π)^{N - a_1}

where μ = μ_1 + μ_2 and π = μ_1/(μ_1 + μ_2).
Moreover, if X_1 and X_2 are independent nonnegative integer-valued random variables and the preceding expression holds, then X_1 and X_2 must be Poisson variates with parameters in the ratio μ_1 to μ_2. This is referred to as the characterization property of the Poisson distribution, and the result is due to Chatterji (1963).
In general, if we have n independent Poisson variates X_1, ..., X_n with parameters μ_1, ..., μ_n, respectively, then the conditional distribution of X_1, ..., X_n given Σ X_i = N is multinomial.
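The conditional property can be verified numerically; the following small sketch (ours, with arbitrary values μ_1 = 3, μ_2 = 5, N = 6, a_1 = 2) shows that the conditional Poisson probability equals the corresponding binomial probability:

data condchk;
mu1 = 3; mu2 = 5; bign = 6; a1 = 2;
/* conditional probability P(X1 = a1 | X1 + X2 = N) */
pcond = ( pdf('poisson', a1, mu1) * pdf('poisson', bign-a1, mu2) )
        / pdf('poisson', bign, mu1+mu2);
/* matching binomial probability with pi = mu1/(mu1+mu2) */
pbin = pdf('binomial', a1, mu1/(mu1+mu2), bign);
run;
proc print data=condchk noobs;
run;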

2.3.4 Some Asymptotic Properties

Suppose X ~ P(μ), and let us consider the case when μ is assumed large. Then:

(i)

    Z_1 = (X - μ)/√μ    (2.18)

That is, Z_1 → N(0, 1) in distribution as μ → ∞; that is, X ~ N(μ, μ) approximately.
Proof:
Again, using the third result under the moment-generating functions, we have for the Poisson distribution:

    M_{Z_1}(t) = e^{-t√μ} exp[μ(e^{t/√μ} - 1)] = exp[t^2/2 + t^3/(3!√μ) + ⋯]

We have employed the Maclaurin series expansion for e^x in the above expressions. From the preceding, we therefore have

    lim_{μ→∞} M_{Z_1}(t) = e^{t^2/2}

That is, for large μ, the Poisson distribution approaches the normal with mean μ and variance μ.

(ii)

    X/μ → 1 in probability    (2.19)

as μ → ∞. Let

    Z_2 = (X - μ)/√X    (2.20)

Then, using (2.18), (2.19), and Slutsky's theorem, we have Z_2 → N(0, 1) in distribution, and since

    Pr(-1.96 < (X - μ)/√X < 1.96) ≈ 0.95

the probability that the random interval (X - 1.96√X, X + 1.96√X) contains μ is ≈ 0.95.
We may note here that 1/μ and 1/X are, respectively, called the Fisher's expected and likelihood information. As an example, consider a random variable X having pdf p(x; θ). Then, under regularity conditions, the Fisher information is defined as

    I(θ) = -E[∂^2 log p(x; θ)/∂θ^2]

Thus, for the Poisson distribution, we have the log-likelihood (L) as:

    L = -μ + x log μ - log x!

so that

    ∂L/∂μ = -1 + x/μ    and    ∂^2 L/∂μ^2 = -x/μ^2

giving

    I(μ) = -E[-X/μ^2] = 1/μ

which is larger for values of μ closer to zero.

(iii) Let

    Z_3 = (log X - log μ) √μ → N(0, 1) in distribution, as μ → ∞    (2.21)

This result follows from (2.18) and the limit theorem. That is,

    log X ~ N(log μ, 1/μ)

approximately for large μ, and hence,

    (log X - log μ) √X → N(0, 1) in distribution    (2.22)

The preceding result follows from combining (2.19), (2.21), and a limiting distribution theorem. Thus

    (log X - 1.96/√X,  log X + 1.96/√X)

is the approximate 95% confidence interval for log μ, for large μ.
We note here that the Poisson probability model for counts {n_i} assumes that they are independent Poisson variates. The joint probability function for {n_i} is then the product of the probabilities for the k cells. The total sample size n = Σ n_i also has a Poisson distribution, with parameter Σ μ_i.

2.4 The Multinomial Distribution

Here, a sample of n observations is classified into k mutually exclusive and exhaustive categories according to the specified underlying probabilities Π = (π_1, π_2, π_3, ..., π_k). Let n_i be the resulting observed frequencies in each of the k classes. We display this in Table 2.1.

                     Response Categories
                  1     2     3    ⋯    k     Totals
Probabilities    π_1   π_2   π_3   ⋯   π_k        1
Obs. Freq.       n_1   n_2   n_3   ⋯   n_k        n

Table 2.1: Table of observed frequencies with the underlying probability values

The joint distribution of the random variables n_1, n_2, ..., n_k, the observed frequencies, is given by the multinomial distribution:

    P(n, Π) = [n!/(n_1! n_2! ⋯ n_k!)] ∏_{i=1}^{k} π_i^{n_i}    (2.23)

where π_i > 0 and Σ_i π_i = 1. Here n = (n_1, n_2, ..., n_k) is a random vector that can take on any value for which
(i) 0 ≤ n_i ≤ n for i = 1, 2, ..., k
(ii) Σ_i n_i = n; thus we have a linear dependence among the n_i as a result of this constraint.
The expression in (2.23) is referred to as the multinomial distribution.

2.4.1 Proof:

n_1 objects can be selected for the first n_1 positions in C(n, n_1) ways, after which we select the n_2 from the remaining n - n_1 in C(n - n_1, n_2) ways, and so on. Using the multiplication rule, we therefore have the number of distinguishable permutations of the n objects, comprising n_1 of one kind, n_2 of another kind, etc., as:

    C(n, n_1) C(n - n_1, n_2) ⋯ = n!/(n_1! n_2! ⋯ n_k!)    (2.24)

The expression in (2.24) is sometimes referred to as the multinomial coefficient.
The joint probability density function of the linearly dependent random variables (n_1, n_2, ..., n_k) is given by:

    P(n_1, n_2, ..., n_k) = P(n_1, n_2, ..., n_{k-1})

where n_k = n - Σ_{i=1}^{k-1} n_i. Each particular arrangement of the n_i's arises from independent trials and has probability

    π_1^{n_1} π_2^{n_2} ⋯ π_k^{n_k}

Since the number of such arrangements is given by (2.24), it follows that the product of these expressions gives the joint pdf of the n's. That is,

    P(n_1, n_2, ..., n_k) = [n!/(n_1! n_2! ⋯ n_k!)] π_1^{n_1} π_2^{n_2} ⋯ π_k^{n_k}    (2.25)
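As a small numerical check (ours, with made-up values n = 5, k = 3, counts (2, 2, 1), and Π = (0.5, 0.3, 0.2)), the probability in (2.25) can be evaluated directly in a data step:

data multiprob;
n1 = 2; n2 = 2; n3 = 1;
n = n1 + n2 + n3;
p1 = 0.5; p2 = 0.3; p3 = 0.2;
/* multinomial coefficient times the product of cell probabilities */
prob = ( fact(n)/(fact(n1)*fact(n2)*fact(n3)) )
       * p1**n1 * p2**n2 * p3**n3;
run;
proc print data=multiprob noobs;
run;

For these chosen values the program returns 30 * 0.25 * 0.09 * 0.2 = 0.135.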

2.4.2 Moments of the Multinomial Distribution

The factorial moment-generating function (fmgf) of a random variable X can be defined as K(t) = E(t^X) when it exists for all real values of t in an open interval that includes the point t = 1. Then K^(m)(1) is equal to the mth factorial moment, defined as

    K^(m)(1) = E[X(X - 1)(X - 2) ⋯ (X - m + 1)]

With this definition of factorial moments, it can be shown that the fmgf for the multinomial distribution is

    η(t) = (π_1 t_1 + π_2 t_2 + ⋯ + π_k t_k)^n    (2.26)

We can obtain the means, variances, and covariances for each element and pairs of elements of the vector n from the first and second partial derivatives of η(t) evaluated at t_1 = t_2 = ⋯ = t_k = 1. Differentiating η(t) in (2.26) with respect to t_j, we have

    ∂η/∂t_j = n π_j (π_1 t_1 + π_2 t_2 + ⋯ + π_k t_k)^{n-1}    (2.27a)
    ∂^2 η/(∂t_j ∂t_j') = n(n - 1) π_j π_j' (π_1 t_1 + π_2 t_2 + ⋯ + π_k t_k)^{n-2}    (2.27b)

For the means, we evaluate (2.27a) at t_1 = t_2 = ⋯ = t_k = 1 to have

    ∂η/∂t_j |_{t_1=⋯=t_k=1} = n π_j,   j = 1, 2, ..., k    (2.28)

Similarly, for the variances and covariances, we evaluate (2.27b) at t_1 = t_2 = ⋯ = t_k = 1 to have

    ∂^2 η/(∂t_j ∂t_j') |_{t_1=⋯=t_k=1} = n(n - 1) π_j π_j'    (2.29)

From the expression in (2.29), therefore, we have:
(i) For j ≠ j',

    E(n_j n_j') = n(n - 1) π_j π_j'

so that

    Cov(n_j, n_j') = n(n - 1) π_j π_j' - (n π_j)(n π_j') = -n π_j π_j'

(ii) Similarly, for j = j',

    E[n_j(n_j - 1)] = n(n - 1) π_j^2

so that

    Var(n_j) = E[n_j(n_j - 1)] + E(n_j) - [E(n_j)]^2
             = n(n - 1) π_j^2 + n π_j - (n π_j)^2
             = n π_j (1 - π_j)

Note: Var(X) = E[X(X - 1)] + E(X) - [E(X)]^2. Hence,

(a) E(n_i) = n π_i,   i = 1, 2, ..., k
(b) E(n_i n_j) = n(n - 1) π_i π_j,   for i ≠ j
(c) Cov(n_i, n_j) = -n π_i π_j,   for i ≠ j

2.4.3 Special Case when k = 2

When k = 2, the multinomial reduces to the binomial case, and in this case, with n = n_1 + n_2 and π_2 = 1 - π_1:

    E(n_1) = n π_1
    E(n_2) = n π_2 = n(1 - π_1)
    Var(n_1) = n π_1 (1 - π_1)
    Var(n_2) = n π_1 (1 - π_1)
    Cov(n_1, n_2) = -n π_1 (1 - π_1) = -Var(n_1)

2.4.4 Maximum Likelihood Estimation of Parameters

Let N_1, N_2, ..., N_n denote a random sample of size n from a probability density function p(n; π), π ∈ Ω, where the parameter space Ω is an interval. Then the likelihood function (the joint pdf of N_1, N_2, ..., N_n) is

    L(π) = p(n_1; π) p(n_2; π) ⋯ p(n_n; π)

Using (2.23), the likelihood function for the multinomial distribution becomes:

    L = [n!/(n_1! n_2! ⋯ n_k!)] ∏_{j=1}^{k} π_j^{n_j}

so that

    log L = log n! - Σ_j log n_j! + Σ_j n_j log π_j

We seek to maximize the likelihood function given the observed sample n_1, n_2, ..., n_k. However, since the same value of π maximizes both L and its logarithm, we wish to maximize the above log-likelihood subject to the constraint Σ_j π_j = 1. The method of Lagrange multipliers is most appropriate for this situation. The method seeks relative maxima and minima of a function on which certain constraints are imposed. For this method, suppose we have a function p(n, y) subject to the constraint h(n, y) = 0. Then we construct a new function P of three variables defined by:

    P(n, y, λ) = p(n, y) - λ h(n, y)

Partial derivatives of P(n, y, λ) are then obtained and set to zero, resulting in simultaneous equations whose solution (n_0, y_0, λ_0) corresponds to the solution of p(n) under the given constraint.
Applying this method to the problem at hand, let

    G = log L - λ(Σ_j π_j - 1)

Then

    ∂G/∂π_j = n_j/π_j - λ

Setting these to zero, we have

    n_j/π_j - λ = 0,   so that   π_j = n_j/λ

But Σ_j π_j = 1, so that λ = Σ_j n_j = n. Hence, the maximum likelihood estimate of π_j is

    p_j = n_j/n

and therefore we note here that

    E(p_j) = (1/n) E(n_j) = π_j
    Var(p_j) = (1/n^2) Var(n_j) = π_j (1 - π_j)/n
    Cov(p_j, p_j') = -π_j π_j'/n

These can be summarized in matrix form as E(P) = Π and

    V_P = (1/n) [D_Π - Π Π']

that is, V_P is the k x k matrix whose diagonal elements are π_j (1 - π_j)/n and whose off-diagonal elements are -π_j π_j'/n, where

    D_Π = diag[π_1, π_2, ..., π_k], a diagonal matrix

Estimator for V_P

p_j is an unbiased estimator for π_j, with Var(p_j) → 0 as n → ∞. This implies that p_j is a consistent estimator for π_j. Hence, V_P is estimated by

    (1/n) [D_p - p p']

with

    D_p = diag[p_1, p_2, ..., p_k], a diagonal matrix
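For illustration (our own sketch, applied to the status counts of Table 1.2 with n = 120), the estimates p_j = n_j/n and their estimated standard errors √(p_j(1 - p_j)/n) are easily computed:

data phat;
input status $ count;
n = 120;                      /* total sample size */
p = count/n;                  /* MLE of the category probability */
se = sqrt(p*(1-p)/n);         /* estimated standard error */
datalines;
F 10
S 45
J 50
Sr 15
;
run;
proc print data=phat noobs;
run;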

2.5 The Hypergeometric Distribution

Consider a finite population of N subjects, where each subject is classified as either S (success) or F (failure). Suppose there are R of type S and therefore (N - R) of type F. Suppose we wish to draw a sample of size n from this population of size N (equivalent to sampling without replacement). Our aim is to find the probability of obtaining exactly r S's (successes) in the sample of size n.

Population    Sampled    Not Sampled      Totals
S                r          R - r            R
F              n - r     N - n - R + r     N - R
Totals           n          N - n            N

Table 2.2: The sampling procedure arranged as a 2 x 2 table

Let us denote by P[X = r | N, n, R] the probability that X = r successes occur in the sample of size n from the finite population containing R successes and N - R failures. Then

    P[X = r | N, n, R] = C(R, r) C(N - R, n - r) / C(N, n)

where the values of r satisfy the constraint

    max{0, n - (N - R)} ≤ r ≤ min{n, R}

2.5.1 Proof

The number of different samples of size n that can be drawn from the finite population of N subjects is

    C(N, n) = N!/[n!(N - n)!]

Of these, the number of different ways of drawing (without replacement) the r sample successes out of the possible R population successes is C(R, r), and for each of these ways there are C(N - R, n - r) different ways of selecting the (n - r) failures from the (N - R) population failures.
Thus the proportion of samples of size n having exactly r successes is given by:

    P[X = r] = C(R, r) C(N - R, n - r) / C(N, n)

Example
If N = 10, R = 5, and n = 6, then N - R = 5. The extreme samples are SSSSSF and FFFFFS. That is, min = 1 and max = 5.
As discussed earlier in section 2.3.2, if X_1 and X_2 are two independent binomial variates with parameters (n_1, π) and (n_2, π), respectively, then X_1 + X_2 has a binomial distribution with parameters N = n_1 + n_2 and π. The conditional distribution of X_1 given X_1 + X_2 = n is the hypergeometric distribution and is derived as follows. From Table 2.3, n_2 = N - n_1, and if we let n = X_1 + X_2, then we have

             Population
Outcome       I              II            Totals
S             x             n - x             n
F          n_1 - x      n_2 - n + x         N - n
Totals       n_1            n_2               N

Table 2.3: The distribution arranged as a 2 x 2 table

    P[X_1 = x | X_1 + X_2 = n] = P[X_1 = x, X_2 = n - x] / P[X_1 + X_2 = n]
                               = C(n_1, x) C(n_2, n - x) / C(N, n)    (2.30)

since N = n_1 + n_2. Notice that all the π and (1 - π) terms cancel in both the numerator and denominator in the above expression.
The above is the hypergeometric distribution, with constraints x ≤ n_1 and n - x ≤ n_2. This distribution is therefore used to test the equality of two binomial probabilities, and as observed above, this test does not depend on their common value. Rearranging (2.30), we have

    P[X_1 = x | X_1 + X_2 = n] = P[X_1 = x | N, n_1, n] = C(n, x) C(N - n, n_1 - x) / C(N, n_1)    (2.31)

with the constraints max(0, n_1 + n - N) ≤ x ≤ min(n_1, n). The expression in (2.31) can be rewritten, using Table 2.3, as:

    P[X_1 = x | N, n_1, n] = n! (N - n)! n_1! n_2! / [x! (n_1 - x)! (n - x)! (n_2 - n + x)! N!]

From (2.30) and (2.31), the means of X are, respectively, given by:

    E(X) = Rn/N    and    E(X) = n n_1/N

The corresponding variances are also, respectively, given by:

    Var(X) = Rn(N - R)(N - n)/[N^2 (N - 1)]    and    Var(X) = n n_1 (N - n_1)(N - n)/[N^2 (N - 1)]

2.5.2

Generalizations of the Hypergeometric

1 . Our first generalization of the hypergeometric distribution relates to the case


when the assumption that the two binomial probabilities are equal is no longer
applicable. This leads to the extended hypergeometric distribution. Here again,
as in the previous section, if X\ and X2 are two independent binomial variates
with parameters (ni,7Ti) and (712,^2) respectively such that TTI 7^ 7T2, the
distribution of X\ + X<2 no longer has a binomial distribution. In this case,
the conditional distribution of X\ given Xi + X^ = n is derived as follows:

CHAPTER 2. PROBABILITY MODELS

26

From the above, n<i = N n\ and again if we let n = X\ + X?, we have

where q\ = (1 TTI), q% = (1 T^), and the denominator is the sum of the


numerator over all possible values of x, which are constrained by:
max(0; ni -f n N) < x < min(ni, n)
If we denote the log-odds ratio of the two binomial probabilities by
, f\i r i ( l - 7 T 2 H\
Ai = log
[7T2(l-7ri) J

then the probability expression in (2.32) can be written as:


P[Xi = x | N, ni, n, A] =

(2.33)
V
n-j
When A = 0, the expression in (2.33) reduces to those in (2.30). The expression in (2.33) is sometimes written as:
P[Xi = x | 7V,ni,n, A] =

CeXx
X\(HI x)\(n x)\(N

n+x

The factorials are those in the 2 x 2 table cells in Table 2.3. C is a function
of (A, JV, ni,n), which is independent of x. The probability function in (2.33)
has been referred to as the extended or noncentral hypergeometric distribution
with noncentrality parameter A. If we instead write ijj = ex representing
the odds, rather than the log odds of the binomial probabilities, then the
probability distribution in (2.33) becomes
P[Xi =x

N,ni,n,ip] =

x\(n\ x)!(n x)\(N n\ n + x)!

Example
Consider again the above example in which N 10, n 5, and n\ = 6; then

Outcomes
S
F
Totals

Population
I
II
x 5 x
6 x x 1
6
4

Totals
5
5
10

Table 2.4: The distribution arranged as a 2 x 2 table


Here again, min = 1 and max = 5. nn = 2 and A = log(2/12). We give
an SAS^ program and its corresponding output for generating the hypergeometric distribution.

2.6. GENERALIZED LINEAR MODELS

27

data hyper;
n=10; k=5; m=6; i2=min(k,m);
il=max(0,(k+m-n));
sum=0.0;
do i=il to 12;
if i-1 It il then prob=probhypr(n,k,m,i);
else prob=probhypr(n,k,m,i)-probhypr(n,k,m,i-l) ;
sum=sum+prob;
output; end;
proc print data=lab2 noobs;
var i prob sum;
format prob sum 8.4;
run; quit;
i
prob
sum
1
2
3
4
5

0.0238
0.2381
0.4762
0.2381
0.0238

0.0238
0.2619
0.7381
0.9762
1.0000

The corresponding extended hypergeometric distribution when we assume


that x = 2, that is, the odds ratio in this case being 2/12, is similarly generated
with the SAS^ program below.
set hyper;
u=2/12;
do i=il to i2;
if i-1 It il then prob=probhypr(n,k,m,i,u);
else prob=probhypr(n,k,m,i,u)-probhypr(n,k,m,i-l,u);
sum=sum+prob;
output;
end;
proc print data=hyper noobs;
var i prob sum;
format prob sum 8.4;
run;
quit;
Extended Hypergeometric distribution

prob

sum

1
2
3
4
5

0.3059
0.5098
0.1699
0.0142
0.0002

0.3059
0.8157
0.9856
0.9998
1.0000

The extended hypergeometric is most useful for power calculations and sample
size determination.
2. Another extension of the hypergeometric distribution relates its extension to
the general / x J contingency tables: that is, the multivariate case. This can
further be extended to cases involving more than two dimensions.

2.6

Generalized Linear Models

The generalized linear models (GLM) are a restricted class of the usual linear models used for regression analysis and the analysis of variance. This class of models
is well treated in the book by McCullagh and Nelder (1989). SAS PROC GENMOD, SPSS PROC GENLOG and the statistical package GLIM (generalized linear

28

CHAPTER 2. PROBABILITY MODELS

interactive modeling) is based on the principles of GLM. The term generalized linear model is due to Nelder and Wedderburn (1972) while applying Fisher's scoring
method to obtain maximum likelihood estimates for exponentially distributed variables.
The general linear model assumes having an observation yi, with x^ being the
vector of explanatory variables, and if /3 = (A , , PP) represents the p-column
vector of unknown parameters, then the linear model is written in the form:
We know from the assumptions of linear models that the yi are independently
distributed normal with means XiP^ and variance a2. That is,

with E(yi) = x'ip.


The GLM extends the classical linear model:
to include a broader class of distributions and gives a
more general relationship between E(Y) and the linear combination of predictors X/3.
The GLM is described by three components, namely:
1. a random component: which describes the distribution of the response variable
(continuous or categorical);
2. a systemic component: which describes how the covariates enter into the
model for the mean; and
3. a link between the random and systematic components: which describes the
functional relationship between the mean response and the systematic component.

2.6.1

Random Component

The responses, y\,yi,...yn can be a random sample from any distribution within
the exponential family. Distributions belonging to this category are the normal,
the binomial, the Poisson, and the gamma, among many others. The exponential
family distributions all depend on a vector of parameters 6 whose log-likelihood can
be written in the form:
u^;

u\y)

')]

(2-34)

where
0=natural (canonical) parameters of the distribution
(more explicitly, O(^JL)}
4> (nuisance) scale parameter
a(), r() and h() characterize the particular member of the exponential family

2.6. GENERALIZED LINEAR MODELS

29

Note: Often a(0i) is of the form 4>/u>i, where the weights (or prior weights) (a>;)
are known and vary from one observations to another. 9 is a scalar (for the oneparameter exponential family) and 0 is a constant parameter. The GLM belongs
to the one-parameter exponential family of distributions and consequently, the pdf
is of the form
)

(2.35)

The particular form of the probability density distribution f ( y \ 6, 0) in (2.35) is


chosen so that the maximum likelihood estimate of 9 does not depend on 0.
The family of distribution in (2.35) has an expected value of Y depending only
on 9 but not on </>.
From the log-likelihood function of f ( y \ #;</>) given in (2.34), we have:

09

a(<p)

(2.36)

a((p)
0)-

(2.37)

Let us show first that the following well known results apply to exponential family
distributions. Consider for example, the Poisson distribution.

Hence,
Similarly,
22n
22
+E(V
)
=-^
M + A*2 2 + -o2 [y - p } ^ +
^M/
A^
A^
A* A^2
rnr

In general, therefore, for an exponential family of distributions, we have

(2 38)

Applying these to our problem, we have:


((j)}

=r'(9), and
= r;/ (0) a(0)

(2.40a)
(2.40b)

where r'(^) and r"(9) are, respectively, the first and second derivatives (with respect to 9) and V(//) is often called the variance function, which describes how the
variance is related to the mean.

CHAPTER 2. PROBABILITY MODELS

30

2.6.2

Systematic Component

The linear predictor, rj = X/3 is a linear function of covariates:


rji = A) + faxii + ... + PkXpi 4- i

2.6.3

Link Functions

The link function g(.), describes the relationship between E(yi) =


predictor:
/

and the linear

(Any monotonic differentiate function can be a link.)


Example: Binomial proportion:
= gfa) = log

g(.) = logit function


The canonical link is the link that relates ^ to the canonical parameter Qi(p,i)
and is often used in practice. That is, g(.) is called the canonical link function.
The density f(y \ 0, </>) is called the error function (as used in GENMOD) and the
parameter 0 is called the dispersion parameter.
We consider four examples of the error functions, namely, the normal, the Poisson, the binomial, and the gamma, in the next section.
The Normal Error Function
If y ~ N(fjJJa2},

then
1

F exp

[ (y

'

2<72

l=e-y2^2
Consequently,
L=

(2.41)

Comparing the above with (2.34), we have

Since E(Y) = //, the canonical linear structure is 0 = n x'/3, that is, a standard
linear model. This is often referred to as the identity link because g(fjj) = p,.

2.6. GENERALIZED LINEAR MODELS

31

The Poisson Error Function


For Poisson data, Y ~ Pois(/z) and the pdf is given by:

f(y
Hence,
- log(y!)

(2.42)

Here again, comparing with (2.34), we have


a(0) = 1;

r(0) = /*

We know from distribution theory that if Y ~ Pois(//), then E(Y) = p, and Var(Y)
= /z. From the above, /z = E(Y) = r'(0) and with a((f>) = 1, r(0) p, = e0 and thus
r'(0) = e61 and, r"(0) = e0 and the canonical linear structure is 9 = log(/z) =
x'/3. The canonical link therefore is the log and leads to a standard log-linear model
for Poisson data.
The Binomial Error Function
If Y ~ b(n, TT), with n known, then the pdf is given by

/-i

\n

= (1 -TT) exp ylog

7T

\ I /n

This again leads to


L = nlog(l - TT) + y log

7T

+ log " )

(2.43)

from which again, on comparing with (2.34), we have


y

a(0) = 1

r(6") = -nlog(l - TT)

Here again, E(Y) = // = nvr, so that the canonical linear structure is


9 = log ( - I = log ( - 1 = x'/3. The canonical link therefore is the logit.
\l-7ry
\n-vJ
The Gamma Error Function
The gamma distribution with parameters a and A, written G(a,A), has the pdf
given by
\a
f(y | a, A) = fT^y^V"1' for y > and a > 0
=

/A\Q
f -A.
- exp a - )y
aj
[ a J \T(a

32

CHAPTER 2. PROBABILITY MODELS

Therefore,

\M;

(2.44)

Comparing the above with (2.34) again, we have


0 = -A/a
0= a
h(fa y} = aaya~l/Y(a}
a(<) = I/a
r(0) = - log(A/a)
with E(Y) = p, = a/A = 1/0 so that the canonical link structure is 0 = =
P-

H~l x'/3. The distribution is only defined for y > 0. Thus for any gamma distribution, n > 0, an adequate restriction must therefore be placed on the parameter
vector /3 to ensure that the expected value is positive.
The canonical generalized linear model for n independent observations with
gamma distribution is (Christensen, 1991) given by:

where /3 is restricted so that x^/3 > 0 for all i. The gamma distribution regression
has been found to be very useful in modeling situations in which the coefficient of
variation is constant: that is, when
/
v Var(^) _ y/a/A? _ 1
As pointed out by Christensen (1987) and McCullagh and Nelder (1989), when the
coefficient of variation is constant in a set of data, such data are often adequately
fitted by using the gamma distribution on the logs of the data.

2.6.4

Summary of Canonical Links

The distributions discussed in the preceeding section each have distinct link functions g(p,). These functions are called the canonical links when g(p,) = Q, where
again,
From the preceding results, we can summarize the canonical link functions for the
distributions considered in the preceeding section in Table 2.5.

2.6.5

Parameter Estimation for the GLM

Again, consider a single observation (the i-th) from an exponential family of distributions with a random variable Y given in (2.35) with the log-likelihood given
as:

and

2.6. GENERALIZED LINEAR MODELS

33

Distributions
Normal

Canonical link
fl'(p')=:Ai

Name
Identity

</>
a2

fj,(9)

Poisson

<7(^,)=log [j,

Log

e9

Binomial

g(fj,) = log ( ^ 1

Logit

e9/(l+e&]

Gamma

g(n)=H~l

Reciprocal

I/A

-1/9

Table 2.5: Canonical links for some distributions

since // = r'(0i). Also,

since
Similarly,

dg

dg

But

hence, = EJ. And finally, we can write


Opi

dL

dLd

dLd

For n independent observations therefore , the maximum likelihood equations for


the (3i terms are given by:

i
dg

_^

~2-'

(Vt

_x. _

^l) dm *

(245)
l

where
(2.46)
The above is generally referred to as the likelihood equation, and these equations
are nonlinear functions in the parameters (3.

2.6.6

Fisher's Scoring Algorithm

Consider the Hessian matrix,

34

CHAPTER 2. PROBABILITY MODELS

The above can be written in matrix notation as X'WX. Thus, the Fisher's
/
<92/ \
information matrix I(/3) defined by E I
j is given by
\

(JPi&Pj /

I(/3) = X'WX
where W, a diagonal matrix, is as previously defined in (2.46). The matrix I(/3)
is particularly important in ML estimation, since its inverse gives the asymptotic
variance-covariance matrix of the ML parameters.
As observed before, the likelihood equations are non-linear functions of the /3's,
consequently, iterative methods are usually employed for the solutions of the equations. The Newton-Raphson iterative method is usually employed in which we shall
assume that (3\ is the t-th approximation for the ML estimate of $. Then, the
N-R method based on Fisher (1935) scoring is for the p (5 parameters in matrix
form is:
IM<*>/3<* +1 > = IMw/3(t) + q<*>
(2.47)
where IM^' is the t-th approximation for the estimated information matrix evaluated at /3^ , q^-1 is the vector having elements dl/d(3j also evaluated at /3^ .
McCullagh and Nelder (1989), and Agresti (1900) have shown that the RHS of
equation (3.18) can be written in the form:
where W^' is W estimated at /3'*' and where z^ is the dependent variate,
. . /(*) \
z-

dg, (*)
The Fisher scoring algorithm reduces therefore to
The above are the normal equations for a weighted least squares for fitting a linear
model with a response variable z^, with design matrix of constants X and weight
matrix W^ . The corresponding weighted least squares solutions are:
(2.48)

where we regress for the t-th iteration cycle, z\' on x i , X 2 , - - - , xp with weights
eights
to obtain a new estimate /3
' . These new estimates are then used to obtain

2.6. GENERALIZED LINEAR MODELS

35

a new linear predictor value g^t+l^ X/3^+ ' and a corresponding new adjusted
dependent variable value z^t+l\ which will in turn, be used in the next cycle of
iteration. The process is repeated until changes are small. This process has been
described as the iterative re weighted least squares. We note here that the vector
z is a linearized form of the link function applied to the data at n evaluated at y.
Hence, to first order approximation, we have
~ h(fa] + (j/i - fa)h'(fa)

As pointed out in McCullagh and Nelder, the iterative process is usually started by
using the data values yi, i = 1, 2,..., n as the first estimate of /}Q, which in turn determines go, (dg/dp,) |o, and VQ. The iterative process continues until differences in
estimates between successive cycles are very small. Note that both the adjusted dependent variable z (it is sometimes referred to as the "working" dependent variable)
and the weights W depend on the fitted values based on current estimates.
A more comprehensive literature on Fisher's scoring algorithm can be found in
Finney (1971) or Green (1984).
For canonical links, both the Fisher scoring algorithm and the Newton-Raphson
method are equivalent and if a(0) is identical for all yi, the like-lihood equations
reduces to
^_^

2_^ xiVi 2^ Xi^1


i

Examples
1. For binomial data with parameters n and TT, we have E(Y) = niTi fa and
V(Yi) = mn(l - TVi) =fa(n- fa)/n
The link is:

(
Hence, dgi/dfa
come

\
) = log(Atj) - log(n - fa)
n-faj

n
;
and the maximum likelihood equations in (2.45) befa(n- fa)
Tr

9 fa TT-^
yj - fa
/
) ^dgi / J fa(n-fa)/n

fa(n-

fa

Similarly, for Poisson data, we have shown that E(Yi) fa V(Yi) and the link
gi = log fa, that is,
log(/z) = X/5
9i
9i
Hence, fa = e and d fa/ dgi e = fa. Consequently, the likelihood equations
become
^(2/i - fa)xi = 0.
i

In general, for exponential- family models, the likelihood equations reduce to the
form given above for both binomial and Poisson cases. That is,

36

CHAPTER 2. PROBABILITY MODELS

>i - mfa = 0

(2.49)

In this case, the Fisher's scoring algorithm is equivalent to the well known Newton's Raphson Iterative procedure. Most statistical packages, GLIM, SAS^, SPSS,
EGRET have capabilities for fitting models to binary data and other data arising
from exponentially distributed variables. Collett (1991) discussed the fitting of a
generalized linear model to binomial data in its appendix.

2.7

Exercises
n | 7r "i 7r n 2

_ , _ ^nk

, 2,
-^ for the
ni!n 2 !---fc!
one-way classification having k categories, find the MLE of TT^, its variance
and covariance.

1. Given the multinomial probability model P(n) =

2. Using results in (2.40a) and (2.40b), obtain the mean and variance for the
binomial distribution with parameters n and 9.
3. Repeat the previous question for the Poisson distribution with parameter p,.
4. Show that the moment generating function for the binomial distribution with
parameters n and 9 is as given in equation (2.7).
5. (i) Give the name of the distribution of X, (ii) find the values of ^ and a2
and calculate (iii) P(2 < X < 4) when the mgf of X is given by:
(a) M(t) = (0.4 + 0.6et)10
(b) M(t) = e8^-1)
6. A prescription drug company claims that 15% of all new drugs that are shown
to be effective in animal tests ever got marketed. Suppose the company currently has 10 new drugs that have been shown to be effective in animal tests:
(a) Find the probability that none of the 10 drugs will ever be marketed.
(b) Find the probability that at most four of the drugs get marketed.
7. In a random sample of 1500 adult Americans, the question was asked as to
how many support the presidents proposals on the national health bill, 920 say
they favor the proposals. Obtain a 95% confidence interval for the proportion
of adult Americans who support the bill. Interpret your result. Also, use the
procedure developed in section 1.2.2 to obtain a 95% confidence interval for
U. Compare your two results.
8. A jar contains 25 pieces of candy, of which 11 are yogurt-covered nuts and
14 are yogurt-covered raisins. Let X equal the number of nuts in a random
sample of 7 pieces of candy that are selected without replacement. (This
example is from Hogg & Tanis, 1997) Find
(a) P(X = 3)
(b) P(X = 6)
(c) The mean and variance of X.

2.7.

EXERCISES

37

9. Let X have the hypergeometric distribution given in (2.31). Find the probability of X = 2 given N = 80, n = 4, and n\ 40. Compare this answer to
a binomial approximation for X with n 4 and TT = 40/80.
10. For the binomial distribution with parameters n and 0, show, using momentgenerating function techniques that Z\ = x~n9 has asymptotically a standard normal distribution.
11. Use SAS^ software for this problem. Let X have the hypergeometric distribution. Find the probability density function of X given that TV = 80, n = 4,
and n\ =40. Compare your answer to a binomial approximation for X with
n = 4 and TT = 40/80. Find P(X = 2) in both cases.
12. Obtain the hypergeometric pdf of X, given that ./V = 75, n = 3, and n\ = 5.
Compare your results here to a Poisson approximation distribution with a
mean A =

(3 x 5)

75

-. Find P(X = 1) again in both cases.

13. Find all possible outcomes that are consistent with this set of marginal totals:
18
9

17

27

10

14. Suppose that an electronic equipment contains eight transistors, three of which
are defective. Four transistors are selected at random and inspected. Let X
be the number of defective transistors observed in the sample, where X =
0, 1,2,3,4. Find the probability distribution for X. Obtain E(X).
15. For the geometric distribution with probability density function

Show that the geometric distribution belongs to the exponential family of


distributions. Find its canonical parameter.
16. Consider the multinomial distribution (trinomial really) with probability parameters 7Ti,7T2, and 7T3. Suppose we wish to test the hypothesis
7T2

7T3

Show that HQ is equivalent to


=,

7T3 = ^

This page intentionally left blank

Chapter 3

One-Way Classification
3.1

The Multinomial Distribution

Suppose we are sampling from a population 17 which contains k types of objects


for which TT^ = P{an object selected at random is of type i} for i = 1,2, ...,fc.
Now, suppose we draw a simple random sample of size n from fJ and classify the
n objects into the k mutually and exhaustive classes or categories according to the
specified underlying probabilities II = (TTI , IT? , TT^ , , Tr/t). Let n be the resulting
observations in each of the k classes. These observed frequencies and the underlying
probabilities are displayed in Table 3.1. Refer to discussion in Chapter 2 for more
details.

Response Categories
1
2
3
k

TTi

7T2

7T3

TTfc

Obs. Freq.

ni

n2

n3

nk

Totals
1
n

Table 3.1: Table of observed frequencies with the underlying probability values
Since the categories are mutually exclusive and exhaustive, we have ^ TT; = 1 and
^2 Hi = n. Thus, it can be shown that the probability mass function (called the
multinomial distribution) for n = (rai, n?, ,n^) with II as defined above is given
by:

n'7r n i 7r n 2 -..7r n f c

p(n,n) = n-^
n

\\t

*;=

(3-1)

where TT^ > 0, ]T] TT^ = 1 and Xli=i n = n- Thus n is a random vector which can
take on any value for which
(i) 0 < Hi < n for i = 1, 2,..., fc, and

(ii) ^ ^ = n.
Now suppose we have a null hypothesis of the form:
39

40

CHAPTER 3. ONE-WAY

CLASSIFICATION

HQ : IIo (TTQI, 7TQ2,

with 5^7Toi = 1. The exact multinomial probability of the observed configuration is


from (3.1) given by:
T7!-7T ni 7r n2 .7T n f c
n!7r

01 ^02 ^Ofc

n i n 2 -rife!
*

n,

(3-2)

To test at an lOOa % significance level, if the observed outcome n came from a


population distributed according to HQ, the following steps are performed:
(a) For every other possible outcome n, calculate the exact probability as in (3.2)
above.
(b) Rank the probabilities from smallest to largest.
(c) Starting from the smallest rank, add the consecutive probabilities up to and
including that associated with the observed configuration n. This cumulative
probability a', gives the chance of obtaining an outcome that is no more likely
than n (that is, outcomes that are as extreme or more extreme than n).
(d) Reject HQ if a' < a where a is specified a priori.
The above procedure was originally due to Roscoe and Byars (1971) and is known as
the exact test procedure. This procedure, as we saw, is based on probability ranking.
We shall explore other exact tests that are not based on probability ranking later
in this chapter.

3.1.1

An Example

Consider the simple case when k = 3 and n = 3. Suppose we observed a one-way


table with observed counts given by n = {2,0,1} and we wish to test whether such
an observed table could have come from an underlying population with probabilities
{0.1,0.2,0.7}. That is, we are interested in testing the null hypothesis HQ : UQ =
(0.1,0.2,0.7) against a two-sided alternative. Then, the number of possible vectors,
M, such as {3, 0, 0} that are consistent with k = 3 and n = 3 are given in
the one-way multinomial by M = (n^^1) (Lawal, 1984). In our simple case,
M = 10. These possible configurations (vectors) that are consistent with k and
n with their probabilities computed under HQ are displayed below from a
software program implementation. The program can be found in appendix B.I.
Vectors
nl

n2

n3

3
2
0
1
2
0
1
1
0
0

0
1
3
2
0
2
1
0
1
0

0
0
0
0
1
1
1
2
2
3

Prob

Cum

0.0010
0 . 0060
0 . 0080
0.0120
0.0210
0 . 0840
0 . 0840
0 . 1470
0 . 2940
0.3430

0.0010
0 . 0070
0.0150
0.0270
0.0480 **
0.1320
0.2160
0 . 3630
0.6570
1 . 0000

41

3.2. TEST STATISTICS FOR TESTING H0

For the observed configuration n = {2,0,1}, a' equals 0.048, and based on this
value, we would reject H0 at a specified a level of 0.05 for example. We note here
that this attained a.' is sometimes referred to as the P-value.
The above test has been referred to (Tate and Hyer 1970) as the exact multinomial test or EMT for brevity. There are certain problems associated with implementing the EMT which basically have to do with the
(i) enumeration of the M vectors, and the
(ii) computation of probabilities given in (3.2) for each vector.
Since the number of configurations is given by M = ( n ^_7 )' with moderate n and k
therefore, the EMT may be feasible but with n and k large, the sheer number of these
configurations as n or k (or both) increase can make the exact test cumbersome.
Table 3.2 for instance, gives values of M for some n and k.

k
3
5
10
20

5
21
126

2002
42504

Sample size (n)


10
20
66
231
1001
10,626
92,378
100015005
20030010 > 6 x 1010

50
1326
316,251
>10 9

Table 3.2: Number of possible configurations (M) consistent with given n and k
From the preceeding table of possible configurations, it is realized that in order to
conduct the exact multinomial test (EMT), one would need to generate all such
configurations that are consistent with n and k which we have seen from the above
table, that, it can be really tasking for k and n large. Additionally, the null multinomial probabilities given in (3.2) would need to be computed for each of these possible
configurations, ranked, and the probabilities accumulated. No wonder, then, that
there are good reasons to look for alternative procedures (or test statistics) to the
exact multinomial test.

3.2

Test Statistics for Testing HQ

Under the null hypothesis (Ho) model specified earlier, interest centers on whether
the observed data fit the hypothesized model. This fit is usually assessed by comparing the observed cell frequencies n with the expected frequencies m, which are
given by mroi. The hypothesized model would be rejected if there is considerable
discrepancy between the observed frequencies and those expected from the model.
Below, we discuss some of the test statistics that are employed to test this discrepancy between the observed and expected frequencies.

3.2.1

Goodness-of-Fit Test Statistics

One of the foremost goodness-of-fit test statistics developed is the classical


statistic introduced by Karl Pearson (1900) and given by:

test

42

CHAPTERS.

ONE-WAY

CLASSIFICATION

x- = y; (n* "U7r*)2

(3.3)'
v

where nTToi = m* is the expected frequency under H0 for category i and n^ is the
corresponding observed frequency. This test rejects HQ when X2 exceeds some
appropriate upper percentage point of a x2 distribution with (k 1) degrees of
freedom. When there is perfect fit between the null model and the data, m^ = ni
for i = 1, 2, ..., k and X2 = 0 in this case. However, as the discrepancy between the
observed n^ and the corresponding expected m^ increases, so will X2. In this case,
the true upper tail probability P(X2 > c) is approximated by the corresponding
upper tail probability of the x2 distribution with (k 1) degrees of freedom. This
is referred to as the x2 approximation.
We give in the next sections other goodness-of-fit test statistics that have also
received considerable attention in recent years.
The Likelihood Ratio Test Statistic
The likelihood ratio test (also known as the log-likelihood ratio test) is defined
(Wilks, 1938) as:
G2 = 2 V n a o g ( ^ )
(3.4)
777- i

To illustrate the derivation of the above, consider for example independent response
variables Yi, Y?, ,Yn having Poisson distributions with parameters /^. The maximum likelihood estimator of the parameter 6 is the value 0 which maximizes the
likelihood function (LF) or the log- likelihood function (I). That is, L(0; y) > L(6; y)
for all 9 in the parameter space. The log likelihood ratio statistic is
G2 = 2 log A = 2[^(0max; y) - *(0;-y)]

(3.5)

where 0max is the maximal model likelihood estimate.


Returning to our example above, the log-likelihood function is:
(3.6)

In the case where all the Yi terms have the same parameter /z, the MLE is given by

and so

^(0; y) = X, 2/t log y - ny - 22 lo yJ

For the maximal model, the MLE are fri = yi (a perfect fit), and so

^max; y) = Y^ yi log yi ~ 1>2 yi ~ Y^ log yi]


G2 in equation (3.5) therefore becomes:
x;y)-(0;y)]

log

Hence, the definition of G .

yi~

3.2. TEST STATISTICS FOR TESTING H0

43

The Deviance
The deviance D was introduced by Nelder and Wedderburn (1972) and is written
x ;y)-(0;y)]

(3.7)

We notice that the deviance is equivalent to the log-likelihood by definition. That


is, D = G2. A model's adequacy is easily assessed by computing D or G2 and
comparing the result with a x2 distribution with the relevant degrees of freedom.
In general, a model that is based on the estimation of p parameters from a data set
with n observations would have its computed test statistic distributed as X 2 _ p . In
particular, for D, we have
_ ...
/ 0 \
D = n-p
(3.8)
As we demonstrate from the Poisson example above, D can be calculated from the
observed and fitted values respectively. However, for some other distributions, D
cannot be calculated directly because of some nuisance parameters. For the normal
distribution for example, it can be shown that

a2D =
where (n denotes the MLE of /^. PROC GENMOD in SAS computes a2D and
also gives the scale parameter which is an estimate of a2 . That is,
a2 =

(3.9)
n p
For the one way multinomial, both G2 and D have asymptotic x2 distribution with
(k 1) degrees of freedom in case of specified null probabilities.

The Freeman- Tukey Test


The Freeman-Tukey test was introduced by Freeman and Tukey (1950). It is given
V^-)2

(3.10)

while Bishop et al. (1975) introduced the improved or modified Freeman Tukey test
which is given by
FT =

v/^T - y/(m + 1) + ^(Irm + l )

(3.11)

The Modified Log-Likelihood Ratio Test Statistic


Kullback (1959) gives the minimum-discriminant information statistic for the external constraints problem as:
(3.12)
The Neyman Modified X2 Statistic
Neyman (1949) proposed
NM 2 = ^(n, - mi]2/nl

(3.13)

44

CHAPTERS.

ONE-WAY

CLASSIFICATION

The Cressie-Read Power Divergence Test Statistic

Cressie and Read (1984) proposed the power divergent test statistic

where A is a real parameter. It can be shown that:


(i) For A - 1 , 1(1) = X2

(ii) As A -> 0, limA^o /(A) = G2.


(iii) As A -> -1, lim A _>_i /(A) = GM 2 .

(iv) For A = -2, 7(-2) = NM 2 .


(v) For A = -I, 7(-|) =1*.

Cressie and Read recommend A = 2/3 because of its superiority over the other values
of A in terms of attained significance levels and small sample power properties.
Under HQ and certain conditions on HQ and k, all the statistics discussed above
all have asymptotically, the x2 distribution with (k 1) degrees of freedom. Pearson
(1900) derived the asymptotic (i.e, as the sample size n increases) distribution for
X 2 , while Wilks (1938) derived that of G2. Cramer (1946) had also derived the
limiting distribution of X2 .
Lawal (1984), Larntz (1978), Koehler and Larntz (1980), and West and
Kempthorne (1972) have, among others, studied the comparative accuracies of some
of the above test statistics. One advantage of G2 over other test statistics however,
relates to its partioning property, which allows us to decompose the overall G2 into
small components. A typical example of this is based on the Lancaster (1949a)
partioning principle, which will be discussed in chapter 5. Another advantage of G2
is that it simplifies the process of comparing one model against another. For this
reasons, the likelihood ratio test statistic G2 is the choice test statistic in this text.

3.3

An Alternative Test to EMT

Earlier, we discuss the exact multinomial test (EMT) in the context of ranking
probabilities. An alternative exact test suggested by Radlow and Alf (1975) ranks
the possible outcomes in terms of a test criterion (say) for example X2. We give
below the result of this approach based on the X2 test criterion. As in the previous
case, we again perform the following:
a For every other possible outcome n, calculate the exact probability as in (3.2)
and the corresponding test statistic X2 under HQ.
b Rank the test statistic X2 in the order of magnitude from largest to smallest and
allow each ranked value to carry its corresponding multinomial probability.

3.3. AN ALTERNATIVE

TEST TO EMT

45

c Starting from the largest ranked, add the consecutive corresponding probabilities
up to and including that associated with the observed configuration XQ . This
cumulative probability gives the chance of obtaining an outcome that is at
least as unlikely as XQ (i.e., outcomes that are as extreme as or more extreme
than X$).
d Reject HQ if this cumulative probability a' is less or equal to a: that is, if a' < a.
Again, let us implement this procedure by referring to our simple example. I shall
use the X2 criterion for illustration here, although any other test statistic could
have been used. In the table 3.3 are the ordered values of the X2 statistic: starting
from the largest to the smallest under the null hypothesis HQ = II = (0.1,0.2,0.7).
We have also presented a SAS software program for implementing this procedure
in appendix B.2.
X-squard
27.0000
12.0000
12 . 0000
10.8095
7.0000
4.1429
2.4762
2.2381
1 . 2857
0.5714

nl

n2

n3

3
0
2
2
1
0
1
1
0
0

0
3
1
0
2
2
1
0
0
1

0
0
0
1
0
1
1
2
3
2

Prob
0.0010
0.0080
0 . 0060
0.0210
0.0120
0 . 0840
0 . 0840
0.1470
0.3430
0 . 2940

Cum
0.0010
0 . 0090
0.0150
0.0360 **
0.0480
0.1320
0.2160
0.3630
0 . 7060
1 . 0000

Table 3.3: Ordered values of X2 with accompanying probabilities


Here, XQ = 10.8095, corresponding to the observed configuration n = {2,0,1}, and
therefore the pvalue equals 0.036 and we notice that the vectors n = {ni, 712,713}
have been reordered from those given under the EMT. We further note that for this
case at least, there is stronger evidence for the rejection of HQ in this example. When
this result is compared with that obtained from the EMT (i.e., where outcomes are
ranked by probabilities), the corresponding pvalue is 0.048, indicating that the two
procedures in many cases do give different results, and this is the main reason for the
controversy regarding whether to conduct exact test by using ranked probabilities
or ranked test statistics.
The number of distinct values of the statistic X2, say, S from table 3.3, is 9
(instead of 10), because there are two configurations that yield the same X2 of
12.000. These are {0,3,0} and {2,1,0} with multinomial probabilities of 0.0080
and 0.0060, respectively. This value of X2 therefore carries a total probability of
0.0140, which is the sum of the two probabilities. The above table should have
therefore been displayed with only one value of X2 12.0000 with a corresponding
probability 0.0140.
In general, S < M, as some vectors sometimes yield the same value of the test
statistic, as is indeed the case for the two configurations above.
As mentioned earlier, not only do the EMT and exact test based on X2 give
different results, but this problem is further complicated by the fact that different
test criteria rank outcomes differently (Lawal, 1984). For example, the likelihood
test statistic G2 ranks the outcomes differently as seen from the results in Table
3.4. In this case, the number of distinct test statistics, S, equals the number of

CHAPTER 3. ONE-WAY CLASSIFICATION

46

possible configurations M. The pvalue based on the G2 criterion for our observed
outcome is 0.048 in this case. We have presented again an SAS^ software progi
that implements this test in appendix B.3.

1
2
3
4
5
6
7
8
9
10

13.8155
9 . 6566
8.6101
7 . 2238
6 . 1046
3 . 3320
2.2128
2 . 1400
1 . 9457
0.8265

3
0
2
1
2
0
1
0

0
3
1
2
0
2
0
0

0
0
0
0

1
1
2
3
1
2

0.0010
0 . 0080
0 . 0060
0.0120
0.0210
0 . 0840
0.1470
0 . 3430
0 . 0840
0 . 2940

0.0010
0 . 0090
0.0150
0 . 0270
0.0480
0.1320
0.2790
0.6220
0.7060
1 . 0000

Table 3.4: Ordered values of G2 with accompanying probabilities


As another illustration, consider again the null hypothesis, Hoi = (0.3, 0.3, 0.4) and
n = 3; we still have M = 10 but S = 6 for the X2 criterion in this case. We list the
values of this statistic in Table 3.5.

X2
7.000
4.500
2.556
2.278
1.444
0.056

Cumulative
Probability Probability
0.054
0.054
0.064
0.118
0.162
0.280
0.216
0.496
0.784
0.288
0.216
1.000

Table 3.5: Ordering of all distinct vectors consistent with k 3, n = 3 by X2


In Table 3.5, vectors (0, 3, 0} and {3, 0, 0} with probabilities 0.0270 each give a computed X2 value of 7.000 and therefore a combined probability of 0.0270 + 0.0270 =
0.054. We give in Table 3.6 the vectors and their corresponding probabilities for all
the M 10 possible vectors in this case.

X2
7.000
4.500
2.556
2.278

1.444
0.056

Vector Probability
0.0270
030
300
0.0270
003
0.0640
120
0.0810
210
0.0810
0 21
0.1080
2 01
0.1080
012
0.1440
1 02
0.1440
1 11
0.2160

Table 3.6: Ordered vectors for X2, under H0= (0.3,0.3,0.4)

3.3. AN ALTERNATIVE TEST TO EMT

47

Suppose n = (2,1,0); then the corresponding pvalue would be 0.2800 for the X2
criterion under the above null hypothesis. Similar results can be obtained for the
test statistics /() and the G2. Table 3.7 gives some results for some null hypotheses
for three goodness-of-fit statistics.
H0

.1 .2 .7
.1 .2 .7

.3 .3 .4

.2 .2 .6

Test
Sample Observed Observed
Criterion Size
Vector test value
X'2
201
3
10 .8095
201
G2
3
6. 1046
!(-}
201
3
8. 4495
Xz
307
10
6 .000
G2
10
307
6 .592
10
307
5.832
A^
10
30 7
5.250
G2
10
30 7
7.835
10
30 7
5.698
^2
2 .667
10
30 7
G2
10
307
4 .591
10
307
3.039

S
9
10
10
53
66
66
27
35
36
24
36
36

PTail
Value Probability
0. 0368
0 0045
0 .048
0 0473
0. 0360
0 0146
0. 0481
0 0498
0. 0345
0 0370
0. 0345
0 0541
0. 0691
0 0724
0. 0374
0 0199
0. 0559
0 0579
0. 3986
0 2636
0. 2395
0 1007
0. 2836
0 2188

Table 3.7: Comparative results for various null hypotheses, sample sizes and three
test statistics X2, G2 , 7(2/3)
In Table 3.7, the tail probabilities are the corresponding upper tail probability of the
approximating x2 distribution with two degrees of freedom. For instance, for the
very first line in Table 3.7 above, the tail probability in the last column is obtained
by evaluating P(%2 > 10.8095) = 0.0045.
We see that the x2 distribution gives a very poor approximation to the exact pvalues in most of the cases. The reason for this is that the three statistics
have discrete type distributions and these are being approximated by a continuous
type distribution. Further, the different test statistics order the vectors differently
(Lawal, 1984; West & Kempthorne, 1972). We also observe that the number of
distinct values of the test statistics varies from one test criterion to another and
from one null hypothesis to another. Our results in Table 3.7 further show that
the distribution of X2 is better approximated by the %2 distribution than any of
the other test criteria in terms of attained nominal level. The above conclusion has
been supported by various works (e.g. Lawal 1984; Koehler and Larntz 1980 among
others).
Returning to the underlying %2 approximation to each of these statistics, the
results in the preceeding table indicate that this approximation is not well suited
for the situations displayed in the above table. Consequently, it has been suggested
that this approximation is only valid when the expected values are large and that
the approximation ceases to be appropriate if any of the expected cell frequencies
mi nTTQi becomes too small, since the x"2 approximation to X2 was derived under
the assumption that with k fixed, ra^ becomes increasingly large as the sample size
n also becomes large. That is,
rrii > oc, as n > oo, for i = 1, 2, , k
How small is "too small" has been the subject of controversy for quite some time.
Suggested minimum expected values range from 1 to 20. Good et. al. (1970) provide

CHAPTER 3. ONE-WAY

48

CLASSIFICATION

an overview of the historical recommendations. Earlier recommendations are Fisher


(1924) rm > 5, Cramer (1946) rm > 10, and Kendall (1952) nn > 20, while Cochran
(1954) states that in goodness-of-fit tests of unimodal distributions (e.g., Poisson or
Normal), tail expectations should be at least 1, that is, m* > 1. In an earlier article,
Cochran (1952) has advocated that satisfactory approximation may be obtained in
tests of distributions such as the normal when a single expectation is as low as 0.5,
the others "being above the conventional limits of 5 to 10." Yarnold (1970) has
suggested that the following criteria be used:
If the number of classes A; > 3 and if r is the number of expectations
less than 5, then the minimum expectation may be as small as .
k
However, strict application of this rule generally involves collapsing together of
one or more categories, and in most cases this may detract greatly from the interest
and usefulness of the study. For instance in Table 3.8 (taken from Lawal, 1980),
k 6 and r = 4.

Classes
THi

I 2 3 4
5
6 Total
21
0.1 0.2 0.3 0.4 10.0 10.0

Table 3.8: Expected values under some model


Hence, the minimum expected value by Yarnold's rule is 3.33. Thus, it would be
necessary to collapse the first five cells, leaving us with only 1 degree of freedom
for our test of significance! A less restrictive use of Yarnold's rule was proposed by
Lawal (1980), and this is examined later in this chapter.
Because small expectations affect the validity of the x2 approximation to the
discrete distribution of X2 (and indeed to all the test statistics mentioned above)
and because X2 has received more attention than any other test criteria, we give
below some of the alternative approximations that have been proposed over the
years.

3.4
3.4.1

Approximations to the Distribution of X2


The Continuity Correction

Because of the discreteness of X2, Cochran (1942) proposed the continuity corrected
X2 whose upper tail area is given by
P\Y2 ^
> cr\\ ~
~ -"fc-H
H,
(C +
-HAQ
^ \)

where w means "approximately," Hk-\ is the upper tail probability of a x2 distribution with (k l)d.f., cis the observed value of X2, and d is the next smallest
possible value of X2.
As an example, consider the observed configuration {2,0,1} in the case when
k = 3 and n = 10 under the null 11 = {0.1,0.2,0.7} discussed earlier. There,
X$ = c = 10.810, d = 7.000, and hence P[X2 > 10.810] w P[xl > 8.905] = 0.012.
This value, while not exactly equaling the exact probability of 0.036, it is nonetheless
closer to the exact value than the usual x2 approximation pvalue of 0.0045 obtained
again earlier.

TO THE DISTRIBUTION OF X2

3.4. APPROXIMATIONS

3.4.2

49

The C(m) Approximation

The C(m) distribution was first proposed by Cochran (1942) and is derived under
the assumption that as n ~> oo, some expected values will remain finite, while the
remaining expected values will be large. The limiting distribution of X2, called the
C'(m) distribution with parameters s,r, and m, is defined as:

where s = (k 1) and [7; is a Poisson variate with mean ra^. Cochran's definition
of the above distribution was limited because it was defined only for cases when
r = I and r = 2. Yarnold (1970) extended Cochran's results and showed that the
approximation works very well in the region of the parameter space (upper tail area)
in which the x2 approximation fails. Lawal (1980) obtained the 5% and 1% points
of the C(m) distribution.

3.4.3

The Gamma Approximation

Nass (1959) suggested approximating cX2 by a Xd distribution, where c and d


are chosen by matching the means and variances of both cX2 and a x"d- That is,
E(cX2) = d and Var(cX2) = 2d since E(xd) = d and Var(xJ) = 2d. From the
above,
.
2E(X2)
and
Var(X2)
d=

Haldane (1937) showed that:


E(X2} = k-l
2

Var(X 2 )
and

/rj

-i >\

VarpsT ) = 2(fc - 1) + (R - k - Ik + 2)/n


where R = \J TT~ l. Consequently,
i

Pr(cX2 > X2d] = Pr[X2 > X2d}

where c and d are given by (3.15).


It is evident that practical use of this approximation necessarily involves fractional degrees of freedom. Earlier attempts to use this approximation (Nass, 1959;
Yarnold, 1970) have sought to obtain the critical value corresponding to the fractional degree of freedom by interpolation in standard x2 tables. This procedure
has limited utility because it can lead to inaccurate critical values and it is labor
intensive.
The cube approximation in Abramowitz and Stegun (1970, pg. 941) for approximating the percentage points of the x2 distribution when there are either integral
or fractional degrees of freedom has been used to overcome this difficulty. The
Best and Roberts (1975) algorithm along with the necessary auxiliary routines is
an excellent program for computing the lower tail area of the x2 distribution with
d degrees of freedom for a given percentile. Lawal (1986) produced excellent results
from this approximation.

50

3.4.4

CHAPTERS.

ONE-WAY

CLASSIFICATION

The Lognormal Approximation

Lawal and Upton (1980) employed the two-parameter lognormal distribution to


approximate the upper tail distribution of X2. From the expressions for the mean
and variance of X2 in (3.16) we observe that E(X2) k 1, which is also the mean
of the approximating x2 distribution with (k 1) degrees of freedom. Similarly,
when R = ]T\ Tr^1 and k are small in comparison to n, then in this case, Var(X2) w
2(fc 1), again the variance of the approximating x2- In this situation, we would
hope the approximation would work well. Unfortunately, if R/n is large, as it
would be if there are some very small expected values, then the variance of X2 is
greater than the variance of the approximating distribution and, consequently, the
tail probabilities of X2 may greatly exceed the nominal values.
The Lawal and Upton (1980) approximation sought to correct this by once again
matching the first two moments of X2 with those of the two parameter lognormal
distribution. Basically, the approximation amounts to writing
P[X2 >z} = P[Z > z]
(3.17)
where Z has a lognormal distribution with parameters IJL and <72, which are related
by the method of moments to the mean and variance of X2 given in (3.16) by

u
P

u- u translates
+
i , *to
which

= 9--ib
1
^ =il)-0
9 = 2log[E(X2}}
= 2 log (k - 1)

(3.18)

and

= log [A;2 - 1 + (R - k2 - 2k + 2)/n]


If ua is the upper a point of a unit normal random variable, the above implies that
the upper a point for X2 is obtained to be
exp{/u + aua}

where P[X2 > z] = $[(logz AO/cr], and <&(.) is the unit normal distribution
function.
Lawal and Upton showed that the lognormal approximates well the upper tail
areas of X2 and that the approximation seems impervious to the number of small
cell expectations. They further suggested that the approximation will usually work
well if we allow minimum expected cell value not to be less than r/d* where r is
the number of expected values less than 3.0 and d is the degrees of freedom. Based
on this, Lawal (1980) gives the following slightly restricted rule for the use of the
X 2 distribution.
If the degree of freedom on which fit is based is d and r is the number
of expectations less than 3, then the minimum expectation may be as
small as r/d3/2.
We summarize these various rules in Table 3.9.
If Yarnold and Lawal's rules are satisfied, use the usual x2 as the approximating
distribution. Both the lognormal and the C(m) require the use of computed critical
levels based on the expressions above and those in Lawal (1980).

3.4. APPROXIMATIONS

TO THE DISTRIBUTION OF X2

51

Number of small minimum


Rules

expectations (r)

771;

Yarnold

r <5

5r
k

Lawal

r <3

(fc-1)3/2

Lognormal

r <3

r
(fc-1) 3 / 2

Cm

r <3

(fc-l)3/ 2

Table 3.9: Summary of rules for the case when k > 3

3.4.5

The Modified X2 Statistic

Because the variance of X2 is highly inflated when some expected values are small,
Lawal (1992a) has proposed a variance stabilizing test statistic: the modified X2
test statistic, which is defined as:
k
2
r =E[(n*-o)-m*]2/mi
(3.19)
*

- T
1= 1

T has variance 2(k 1) if the ni follow the Poisson distribution and variance
2(k 1)(1 ^) if the Hi follow the multinomial distribution. This test is also shown
to belong to the family of the power divergence test statistic proposed by Cressie
and Read (1988), which includes
c,

d)

where

Then T2 is equivalent to the family with A = 1, c = -| and d = Q. The T2 defined


in (3.19) above is related to the statistic D2 proposed by Zelterman (1987) by:

where D2 =

3.4.6

- m^ 2 -

Applications

We now give examples of the use of these approximations. The first example is
concerned with gene heredity. Peas may be yellow or green, round or wrinkled,
short or tall. Within each of these classifications, the dominant gene is believed
to be the first-named category, and a theory suggests that these characteristics
are mutually independent with, for each characteristic, the ratio of dominant to
recessive being 3 to 1. Suppose that we have a random sample of 64 peas whose
characteristics are given in Table 3.10 below.

CHAPTER 3. ONE-WAY

52

CLASSIFICATION

Green
Yellow
round
wrinkled
round
wrinkled
Short long short long short long short long
Observed
Count
Expected
Count

36

12

27

Table 3.10: Example of a nominal data: Pea types


The expected frequencies to the genetic theory are included in the table and we
wish to enquire whether the observed data are consistent with the theory, where,
for instance, the expected values of 27 and 1 are obtained as:

1 = x x x64
4 4 4

The expected frequency of 1 in the final category rules out the use of the x2 approximation since Yarnold's rule is violated. The common practice of combining
this category with some adjacent category does not make sense here because the
data are nominal and not ordinal. There is therefore no category that can be said
to be adjacent and arbitrary clumping of categories seems unsatisfactory.
To use the lognormal approximation, we note that for these data n = 64, k = 8
and R = 2.37, giving 9 = 3.89, ^ = 4.12 and hence p = 1.83 and a2 = 0.23.
Consequently, the upper 1% point of the distribution of X2 is estimated as 19.13,
compared with the tabulated value of 18.48. The corresponding 1% point using the
C(m) approximation is 19.35 (from Lawal, 1980). Since the observed value of X2
is 13.56, there is clearly no cause to doubt the theory in the present case.

3.5

Goodness-of-Fit (GOF) Tests

The standard situation involves n observations obtained as a simple random sample


from a population, which are classified into k mutually exclusive categories. There
is some theory or null hypothesis which gives the probability KJ that an observation
will be classified into the j'-th category. The TTJ are sometimes completely specified
by the theory, e.g., genetic characteristics, or sometimes they are functions of other
parameters 0.1,0.2,' ,Ok whose actual values are unknown. An example here is,
for instance, an underlying normal distribution with unknown mean and variance.
The quantities rrij are called the expected frequencies. Under the null hypothesis,
interest centers on estimating the expected frequencies and consequently using any
of the well known statistics above to conduct the test.
We have shown in chapter 1 that the maximum likelihood estimate (MLE) of TT^
77 '
from the multinomial one-way distribution in (3.1) is given by TT; = . We shall
n
use this result and the concept of MLE to find solutions to the examples in the next
section.

53

3.5. GOODNESS-OF-FIT (GOF) TESTS

3.5.1

Examples: Nominal Variable with


Many Categories

The data below relate to five methods of child delivery in eclamptic (eclampsia)
patients (Jolayemi, 1990b). The variable "mode of delivery" in this case has five
categories. A variables with many categories is sometimes referred to as a polytomous variable.
Mode of Number of
Delivery
Cases
CS
313
Normal
258
Vacuum
95
Forceps
81
ABD
41

%
39.72
32.74
12.06
10.28
5.20

Table 3.11: Number of deliveries in eclamptic patients


for CS (Caesarian section); Normal (SVD) delivery; Vacuum (Vacuum extraction);
Forceps (Forceps delivery); and ABD (Assisted breech delivery).
Here, we have five classes or categories and the following null hypotheses are of
interest:
(i) HQI : 7Ti TT, i = 1, 2, ,5, that is, the equi-probable case
(H) HO? : TTi = 7T2 ; 7r3 = 7T4 = 27T5
(iii)

#03 == ^ = ^ = ^ = ^ = 7T5

against the general alternatives.


Under #01, the likelihood function becomes:
L(n,HQ) = C7r ni+n2+n3+n4+n5
Thus, minimizing logL subject to the constraint

= 1 gives MLE estimates

TT = . The expected values are n/k (=157.60) and the corresponding G2 and X"2
are 369.4453 and 365.5533, respectively on 4 d.f This model gives a poor fit to the
data with a pvalue = 0.0000.
For the second null hypothesis, if we let K\ = 7^2 = ^A > and KS = KI 27r5 = KB ,
then the likelihood function becomes:

Again, maximizing log L subject to the constraint ITT A H


mates -ftA =

= 1 gives MLE esti-

= 0.3623 and TTR =


r
= 0.1102 and estimated cell
2n
|n
probabilities {0.3623, 0.3623, 0.1102, 0.1102, 0.0551}. The corresponding expected
values for each cell which are the product of corresponding probability estimates
and the total sample size n are respectively {285.4924, 285.4924, 86.8376, 86.8376,
43.4188}. The corresponding G2 and X2 are 6.4315 and 6.5944, respectively, and

54

CHAPTERS.

ONE-WAY

CLASSIFICATION

these are each based on (5 3) = 2 degrees of freedom since two parameters (namely,
it A and TTB) are estimated with corresponding pvalue of 0.0401.
For the third null hypothesis, let 7T5 = TT. Then, TTI = ir-2 = TTT, 7^3 = 7^4 = 2n,
and
L = C(77r) ni+n2 (27r) n3+n4 7r n5
Maximizing log L subject to the constraint 7?r + TTT + 2n + 2?r + TT = 1 gives MLE
estimate TT = 1/19 = 0.0526, and the corresponding expected values are {290.3158,
290.3158, 82.9474, 82.9474, 41.4737}. Computed G2 and X2 under this model are
7.1900 and 7.1766, respectively, and are again based on (5 2) = 3 degrees of
freedom since only one parameter was estimated. The corresponding pvalue here is
0.0681.
Based on the above results, while both hypotheses HQZ and H03 are tenable
at a = 0.01 nominal level, only HQS is tenable at a = .05 nominal level and is
considered more parsimonious than #02- We would therefore prefer #03
We would thus conclude that the occurrence rate of the mode of delivery by
cesarian section and SVD is seven times the occurrence rate of birth by ABD and
that the occurrence rates of delivery by vacuum and forceps are twice that of delivery
by ABD. Thus in the population, the categories are not uniformly distributed, with
the cesarean mode being as likely as the SVD mode. Similarly, the vacuum model
is as likely as the forceps mode of delivery. We present in appendix B.4 the SAS^1
software program and corresponding output employed in implementing some of the
results above.
In the SAS^ software program in appendix B.4, we first fit the equiprobable
model using PROC GENMOD. In the model statement, we specify that the data
is Poisson distributed with dist^poisson and that link=log. The make obstats
obstats option asks for the output to be written to a file named aa. The print
command requests specific variables to be printed with a format statement indicating output to four decimal places. The GENMOD output includes predicted values,
lower and upper confidence intervals for each predicted observation, and residuals
and other statistics.
The corresponding SAS^ software output gives the estimate, std Err, and
the relevant ChiSquare for the test of statistical significance of the estimated
parameter. The column headed pred gives the estimated ra;, while xbeta gives the
corresponding estimated log-means. The Resraw gives the raw residuals Hi m^,
while Reschi gives the standardized Pearson's residual, which is the square root of
the individual Pearson's X2. The model gives a deviance or G2 value of 369.4458
on 4 d.f.
PROC FREQ can also be used to obtain some of the results in the preceeding section. The first TABLES statement in this appendix fits the equi-probable
model. The second TABLES statement computes X2 for the model based on the
second hypothesis using TESTP, where the last P proportions are inputed. The
third TABLES statement also computes X2 statistic for the model based on the
third hypothesis using TESTF, with the last F indicating that frequencies are being inputed. In all the cases, the X2 values are correct but the degrees of freedom
in TESTP and TESTF should be 2 and 3, respectively.
Finally, PROC CATMOD is also employed to obtain some of the results as in the previous case. This is implemented in CATMOD by specifying the appropriate restriction in the RESTRICT statement. The results are again similar to those obtained under PROC FREQ.
Another way of analyzing the data in the last example is to consider the variable of interest, mode of delivery, as a nominal polychotomous variable (Clogg & Shockey, 1988). If we denote the variable as A, then the saturated log-linear model (this will be fully discussed in Chapter 6) is given by:

log(m_i) = μ + λ_i^A

If we let L_i = log(m̂_i), then the MLE estimates are given by μ̂ = L_+/I and λ̂_i^A = L_i − μ̂, with Σ_i λ̂_i^A = 0. As pointed out by Clogg and Shockey (1988), it is sometimes useful to examine a set of logits for this type of data, where one category is used as a reference point. In this case, the logit φ_i = L_i − L_I = log(n_i/n_I) = λ̂_i^A − λ̂_I^A. This is the log odds that A = i rather than A = I.
In the example above, if we use the first category as the reference, then we can compute the logits φ_i = log(n_i/n_1), i = 2, ..., 5. These are respectively {−0.213, −1.192, −1.352, −2.033}.
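These logits can be computed directly from the observed counts with a short data step. The sketch below is not from the text, and the counts shown are hypothetical placeholders for the five delivery categories:

data logits;
retain n1;
input i n @@;
if i=1 then n1=n;
else phi=log(n/n1);    * logit relative to the first category;
datalines;
1 320 2 260 3 97 4 83 5 28
;
proc print data=logits noobs;
var i n phi;
format phi 8.3;
run;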

3.5.2 Analysis When Variable Categories Are Ordered

When the variable of interest is ordinal in nature, we may exploit this ordinal nature of the variable in our modeling techniques. Again, both the saturated model and the equiprobability model (H₀ : π_i = 1/k) discussed above are relevant, but various other models are quite possible. Suppose the ordered categories are assigned scores {v_i}, i = 1, 2, ..., I. Various forms of scoring can be used. Scores can be obtained that are based on distributional assumptions, or scores can be assigned based on prior knowledge. Popular scores are the integer scores, v_i = i, which are equivalent to the assumption of equal spacing, or v_i = i − (I + 1)/2, which centers the scores. Whichever scores we adopt, inferences derived from subsequent analyses are dependent on the scoring system adopted. We shall elaborate further on this in Chapter 10. Four main models have received wide attention in relation to the analysis of an ordered polytomous variable. These are the equiprobable model, the linear effect model, the quadratic effect model, and the symmetry effect model. The log-linear model formulations for these models are described below with the relevant identifiability constraints.
(i) The equiprobable model:

log(m_i) = μ

(ii) The linear effect model:

log(m_i) = μ + β v_i

(iii) The quadratic model:

log(m_i) = μ + β v_i + γ v_i²

(iv) The symmetry model:

log(m_i) = μ + λ_i^A

The symmetry model in (iv) is subject to the following constraints: λ_1^A = λ_I^A, λ_2^A = λ_{I−1}^A, λ_3^A = λ_{I−2}^A, and so on. We note here that since the model of symmetry above implies that m_1 = m_I, m_2 = m_{I−1}, m_3 = m_{I−2}, and so on, the symmetry model actually implies a composite of separate equiprobability models. That is, categories 1 and I are equiprobable, categories 2 and (I − 1) are equiprobable, and so forth. This is sometimes referred to in nonparametric statistics as the "umbrella alternatives."
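These models can all be fitted as Poisson log-linear models in PROC GENMOD. Below is a minimal sketch for the linear and quadratic effect models, assuming integer scores v_i = i and using the political views counts analyzed in the example that follows; it parallels, but is not identical to, the program in appendix B.5:

data ordered;
input v count @@;
vsq=v*v;               * squared score for the quadratic term;
datalines;
1 46 2 179 3 196 4 559 5 232 6 150 7 35
;
* (ii) linear effect model;
proc genmod data=ordered;
model count=v / dist=poisson link=log;
run;
* (iii) quadratic effect model;
proc genmod data=ordered;
model count=v vsq / dist=poisson link=log;
run;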


Example
We now apply some of the models discussed in the preceding section to the following data taken from Haberman (1978, p. 85), which concern the political views of subjects interviewed during the U.S. 1975 General Social Survey. The data are displayed in Table 3.12 with seven categories: (1) extremely liberal, (2) liberal, (3) slightly liberal, (4) moderate, (5) slightly conservative, (6) conservative, (7) extremely conservative.

         (1)     (2)     (3)     (4)     (5)     (6)     (7)
n_i       46     179     196     559     232     150      35
m̂_i     40.5   164.5   214.0   559.0   214.0   164.5    40.5

Table 3.12: Ordinal data: Political views


In Table 3.13 are the results of fitting some of the models (under integer scores) discussed above to our data.

Model           d.f.      G²     pvalue
Equiprobable      6     832.82   0.0000
Linear            5     832.28   0.0000
Quadratic         4     133.94   0.0000
Symmetry          3       7.09   0.0691

Table 3.13: G² and pvalues based on the above models
In this example, the model of symmetry fits, and its MLEs are given by the following expressions. The resulting expected frequencies are displayed in Table 3.12.

m̂_1 = m̂_7 = (n_1 + n_7)/2
m̂_2 = m̂_6 = (n_2 + n_6)/2
m̂_3 = m̂_5 = (n_3 + n_5)/2
m̂_4 = n_4

The above models can be implemented in SAS with the program in appendix B.5. The symmetry model, for instance, can be decomposed into its components in terms of their contributions to the overall G² value of 7.09. This is accomplished with the CONTRAST statements in the SAS software program in appendix B.5.
Type of symmetry    df       G²
1 & 7                1    1.4985
2 & 6                1    2.5596
3 & 5                1    3.0316
Total                3    7.0897

Table 3.14: Results from symmetry analysis


The results are given in Table 3.14, where, for instance, the G² = 2 Σ n_i log(n_i/m̂_i) = 1.4985 contribution is computed as:

1.4985 = 2 [46 × log(46/40.5) + 35 × log(35/40.5)]

since n_1 = 46, n_7 = 35, and m̂_1 = m̂_7 = 40.5. The SAS software results from the symmetry model implementation are displayed below.
SAS CONTRAST Statement Results

Contrast    DF    ChiSquare    Pr>Chi    Type
SYM          3       7.0896    0.0691     LR
1 & 7        1       1.4985    0.2209     LR
2 & 6        1       2.5596    0.1096     LR
3 & 5        1       3.0316    0.0817     LR

Haberman (1978) gives some further examples of analyses of data of this type.
Plackett (1981) discusses the general analysis of Poisson data, while McCullagh
(1980) discusses the various types of regression models that may be used for ordinal
data.

3.6 Goodness of Fit for Poisson Data

The theoretical distribution for counts that is often considered first is the Poisson distribution, which represents events in either time or space as random processes and where the probabilities of counts of 0, 1, 2, ... are given by:

p(x) = P{X = x} = e^{−λ} λ^x / x!,   for x = 0, 1, 2, ...

where p(x) is completely defined by the one parameter λ, which is the average count.
The following cases for goodness-of-fit tests for Poisson data are considered: the parameter λ is known a priori, or unknown. When λ is known and interest centers on whether the observed counts follow the Poisson distribution with the given parameter, then, specifically, we wish to test H₀ : X ∼ P(λ), where λ is known. This is illustrated with Table 3.15, where we consider a one-way table (with k classes) having the observed frequencies n_i and corresponding expected frequencies under a Poisson distribution with parameter λ.
X           0     1     2    ...    k−1    Total
Observed   n_0   n_1   n_2   ...   n_{k−1}    n
Expected   m_0   m_1   m_2   ...   m_{k−1}    n

Table 3.15: Table of observed and expected counts


With λ known, the expected values are m_x = nπ_x for x = 0, 1, ..., (k − 1), where

π_x = e^{−λ} λ^x / x!

The corresponding Pearson's statistic is computed as:

X²_GOF = Σ_{x=0}^{k−1} (n_x − m_x)² / m_x


and is distributed χ² with (k − 1) degrees of freedom.


Rarely is there any theoretical reason for expecting a particular value of λ. More often than not, λ has to be estimated from the sample. The maximum likelihood estimate of λ was shown earlier to be

λ̂ = (1/n) Σ_x x n_x = x̄

Consequently,

π̂_x = e^{−x̄} x̄^x / x!   and   m̂_x = n π̂_x   for x = 0, 1, ..., (k − 1).

It follows that X² = Σ_{x=0}^{k−1} (n_x − m̂_x)²/m̂_x is distributed χ² with (k − t − 1) degrees of freedom, where t is the number of parameters estimated from the sample. In this case, the d.f. equals (k − 2), since only one parameter, λ, is estimated from the data.
Example 3.5: Horse Kicks in the Prussian Army
As an example to illustrate the fitting of a Poisson model to data, let us consider the frequency of death by horse kicks in the Prussian corps as recorded by Bortkiewicz (1898) for each of 14 corps for each of 20 years, giving a total of 280 observations.

Deaths per corps per year      0     1     2     3    4+
Frequency of occurrence      144    91    32    11     2

Table 3.16: Number of deaths from Prussian horse kicks


The hypothesis of interest centers on whether the observed data can be modeled by a Poisson distribution with parameter λ. An estimate of the mean death rate (that is, λ) is

λ̂ = [0(144) + 1(91) + 2(32) + 3(11) + 4(2)]/280 = 196/280 = 0.70

and hence P[X = 0] = e^{−λ̂} = 0.4966. The remaining probabilities are obtained from the recursive formula given in chapter 2, P[X = x] = (λ/x) P[X = x − 1], which follows because p(x)/p(x − 1) = λ/x for the Poisson distribution. Hence, P[X = 1] = (0.7/1) × 0.4966 = 0.3476, and similarly for the probabilities when X = 2, 3, and 4, respectively. These results are displayed in the next table.
Deaths   Probability   Expected frequency   Observed frequency
0            0.4966           139.05               144
1            0.3476            97.33                91
2            0.1217            34.08                32
3            0.0284             7.95                11
4+           0.0057             1.60                 2
The computed G² and X² are 1.9978 and 2.1266, respectively, which are clearly not significant (pvalue = 0.5466) when compared with χ² on d.f. = 3 (= 5 − 1 − 1) at α = 0.05, and we therefore have no reason to doubt that the data follow a Poisson distribution.


We can implement the above fit with the SAS software statements in appendix B.6, with a partial output from the program shown below. Note that the FAC variable in the program generates the factorials of 0 to 4. A more detailed output is provided in appendix B.6.
Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               3      1.9978      0.6659
Pearson Chi-Square     3      2.1266      0.7089

Analysis Of Parameter Estimates

                          Standard        Wald 95%
Parameter  DF  Estimate      Error   Confidence Limits   ChiSquare   Pr > ChiSq
Intercept   1   -0.7028     0.0782   -0.8560   -0.5495       80.77       <.0001
DEATH       1   -0.3516     0.0720   -0.4928   -0.2104       23.82       <.0001
Scale       0    1.0000     0.0000    1.0000    1.0000

The estimate of λ is given by the negative of the intercept (0.7028), where X = 0, 1, 2, 3, 4. This estimate does not exactly match the estimate of λ̂ = 0.70 that we obtained earlier but is very close, and the model fits the data well. Similarly, the slope estimates the logarithm of the mean; that is, −0.3516 ≈ log(0.7028).
Notice that one of the expected values above is less than 3. Here r = 1 and d = 3, and hence the minimum expected value, in order to use either the C(m) or the lognormal approximation, cannot be lower than r/d^{3/2} = 0.19. The computed smallest expectation of 1.60 clearly satisfies this condition. Hence, the corresponding critical value from the C(m) table at the 5% nominal level is 8.02, which indicates that the null hypothesis is tenable.
In the above analysis, we have assumed that more than four deaths per corps per year are impossible; hence our estimate of λ from the Poisson model is not exactly 0.7. The Poisson model is based on the assumption that the sum of the probabilities Σ_x p(x) = 1 when λ = 0.7. As seen below from the SAS software output, these probabilities do not sum to 1 until the number of deaths gets to 6.
data horse;
sum=0.0;
do i=0 to 6;
  if i-1 lt 0 then prob=poisson(.7,0);
  else prob=poisson(.7,i)-poisson(.7,i-1);
  sum=sum+prob;
  output;
end;
proc print data=horse noobs;
var i prob sum;
format prob sum 8.4;
run;
i      prob       sum
0    0.4966    0.4966
1    0.3476    0.8442
2    0.1217    0.9659
3    0.0284    0.9942
4    0.0050    0.9992
5    0.0007    0.9999
6    0.0001    1.0000


It is therefore desirable to allow the number of deaths per corps per year greater than 4 to have zero frequencies. This can be implemented by setting the frequencies to zero for categories 5 and 6. When this new model is implemented, the estimate of λ equals the negative of the intercept (0.7000), while the mean is also recovered from the slope; that is, 0.7000 = e^{−0.3566}. This model gives a G² = 2.4378 on 5 d.f., which is a better fit than the earlier model (pvalue = 0.7858). The corresponding SAS software program is in appendix B.7, and again a partial output from implementing the new model is displayed below.
Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               5      2.4378      0.4876
Pearson Chi-Square     5      2.3687      0.4737

Analysis Of Parameter Estimates

                          Standard        Wald 95%
Parameter  DF  Estimate      Error   Confidence Limits   ChiSquare   Pr > ChiSq
Intercept   1   -0.7000     0.0779   -0.8528   -0.5473       80.70       <.0001
DEATH       1   -0.3566     0.0714   -0.4966   -0.2166       24.91       <.0001
Scale       0    1.0000     0.0000    1.0000    1.0000

Let us now consider some variations of the above assumptions. First, assume the data follow another single-parameter distribution, such as the negative binomial distribution, instead of the Poisson distribution. The second possible violation of the above analyses occurs if, for instance, the means of the observations n_i follow some systematic pattern. For example, for data gathered over several days, the means might be constant within a day but vary over days (e.g., traffic accidents across different seasons).
In such situations, a more powerful test than X²_GOF against these alternatives is the variance test. The rationale for this procedure is based on the fact that both the mean and the variance of a Poisson variate are equal to λ; that is, μ_x = λ and σ²_x = λ. Hence,

X²_V = Σ_{i=1}^{n} (x_i − x̄)² / x̄

The above is known as the variance test, and under H₀, X²_V ∼ χ²_{n−1}. The degrees of freedom (n − 1) arise because the summation is over n observations and only one parameter, namely x̄, was estimated from the data.
We note here that this statistic is calculated from the individual observations and does not require the construction of k classes as does X²_GOF. For this reason, X²_V can be used in small samples where X²_GOF would be inappropriate.



Example

In the preceding example, λ̂ = 0.70. The numbers of deaths and their corresponding observed frequencies are:

   i      Observed frequency
   0             144
   1              91
   2              32
   3              11
   4+              2
Totals           280
The variance test is now computed as follows:

X²_V = [144(0 − 0.7)² + 91(1 − 0.7)² + 32(2 − 0.7)² + 11(3 − 0.7)² + 2(4 − 0.7)²]/0.7
     = 100.80 + 11.70 + 77.257 + 83.129 + 31.114
     = 304.00

because the grouped data can be visualized as the individual observations

0, 0, ..., 0,   1, 1, ..., 1,   ...,   3, 3, ..., 3,   4, 4
 (144 times)     (91 times)            (11 times)

The above is based on 279 (= 280 − 1) degrees of freedom, with a pvalue of 0.1454. That is, we have no reason to doubt that the data actually came from a Poisson distribution.
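The same variance test can be reproduced from the grouped frequencies with a short SAS data step (a sketch, not the appendix program; the 4+ class is treated as exactly 4 deaths, as in the hand computation above):

data vartest;
input x count @@;
xbar=0.70;                          * estimated mean death rate;
contrib=count*(x-xbar)**2/xbar;     * contribution of each class;
datalines;
0 144 1 91 2 32 3 11 4 2
;
proc means data=vartest noprint;
var contrib;
output out=total sum=xsqv;
run;
data pval;
set total;
pvalue=1-probchi(xsqv,279);         * chi-square on n-1 = 279 d.f.;
proc print data=pval noobs;
var xsqv pvalue;
run;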

3.6.1 Tests For Change in Level of Poisson Data

Consider a situation in which the first n_1 observations have a common mean λ_1 and the additional n_2 = (n − n_1) observations have a common mean λ_2. Such a case usually arises when we are interested in testing that two Poisson distributions (the first having sample size n_1 and parameter λ_1, and the second with sample size n_2 and parameter λ_2) have the same parameter λ. Then it is of interest to test

H₀ : λ_1 = λ_2  versus  H_a : λ_1 ≠ λ_2

Under H₀,

Z = (x̄_1 − x̄_2) / [σ √(1/n_1 + 1/n_2)]

where σ² is the common variance. From this we have

Z² = (x̄_1 − x̄_2)² / [σ²(1/n_1 + 1/n_2)] = [n_1 n_2/(n_1 + n_2)] (x̄_1 − x̄_2)²/σ²

and Z² is distributed χ² with one degree of freedom. Under H₀, we can estimate the common variance as

σ̂² = (n_1 x̄_1 + n_2 x̄_2)/n = x̄_p

Thus we have, under H₀,

Z² = (n_1 n_2/n) (x̄_1 − x̄_2)²/x̄_p

Example
Consider as an example the data below, which relate to the number of accidents per working day in the months of December and January at a certain busy university road in Minnesota.

December:  3 3 3 4 9 7 6 4 3 4 8 4 2 5 5
January:   4 1 4 3 1 3 3 5 0 2 5 2 3 3 4 5 3 5 3 8

For these data, n_1 = 15, n_2 = 20, and n = n_1 + n_2 = 35; x̄_1 = 4.6667, x̄_2 = 3.350, and x̄_p = 3.9143; hence

Z² = (15 × 20/35) × (4.6667 − 3.350)²/3.9143 = 3.796

The corresponding pvalue equals 0.0514. Thus we do not have sufficient evidence, at the 5% level, to reject the null hypothesis that the rates of accidents for the two months are equal.
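This computation is easily verified in SAS; the following sketch (not from the text) computes Z² and its pvalue directly from the two samples:

data accidents;
input month $ x @@;
datalines;
D 3 D 3 D 3 D 4 D 9 D 7 D 6 D 4 D 3 D 4 D 8 D 4 D 2 D 5 D 5
J 4 J 1 J 4 J 3 J 1 J 3 J 3 J 5 J 0 J 2
J 5 J 2 J 3 J 3 J 4 J 5 J 3 J 5 J 3 J 8
;
proc sql noprint;
select mean(x), count(x) into :x1, :n1 from accidents where month='D';
select mean(x), count(x) into :x2, :n2 from accidents where month='J';
select mean(x) into :xp from accidents;
quit;
data test;
zsq=(&n1*&n2/(&n1+&n2))*((&x1-&x2)**2)/&xp;   * two-sample Poisson statistic;
pvalue=1-probchi(zsq,1);
proc print data=test noobs; run;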

3.6.2 Linear Trend Models

We shall illustrate the implementation of a linear trend model with the data in the following example.
The data in Table 3.17 relate to the parity (number of children born prior to the present condition) distribution of 186 infertile patients who reported at the Maternity Wing of the University of Ilorin (Nigeria) teaching hospital between 1982 and 1987.
Parity (Y)    No. of cases    Proportions
    i              n_i             p_i
    0               86            0.462
    1               41            0.220
    2               29            0.156
    3               14            0.076
    4                9            0.048
    5                5            0.027
    6+               2            0.011

Table 3.17: Parity distribution of 186 patients
In the constant-parity (or equiprobable) model, the expected parity rates m_i are all equal to some constant μ. Then one obtains the model in the form

log(m_i) = λ,  where λ = log μ

Using PROC GENMOD, the MLE is λ̂ = 3.280 with an estimated asymptotic standard error (a.s.e.) of 0.07318 and G² = 178.18 on 6 degrees of freedom, which clearly indicates a lack of fit of the model.


A good guess on the adequacy of the model is also provided by the following (Lindsey, 1995). If

| parameter estimate | > 2 a.s.e.

then the estimate is significantly different from zero at the 5% nominal level. Clearly for the above model, | 3.280 | > 2(0.07318) = 0.1464. The Lindsey result is motivated by the fact that, under the null hypothesis H₀ : parameter = 0, the standardized

z score = (parameter estimate)/(a.s.e.)

is distributed normally with mean 0 and variance 1. Since, for normal distributions, the interval μ ± 2σ contains 95% of the data, it is therefore reasonable to assume that any parameter value whose standardized z score is more than 2 in absolute value must belong either to the bottom 2.5% or the upper 2.5%.
The SAS software program and a modified output for fitting the equiprobability model are displayed next. The estimated model is log(m̂_k) = 3.2798.

DATA EXAMPLE;
INPUT PARITY COUNT INT @@;
DATALINES;
0 86 1 1 41 1 2 29 1 3 14 1 4 9 1 5 5 1 6 2 1
;
PROC GENMOD DATA=EXAMPLE order=data;
make 'obstats' out=aa;
model count= /DIST=POI LINK=LOG OBSTATS;
run;
proc print data=aa noobs;
var count pred resraw reschi;
format pred resraw reschi 10.4;
run;
Criteria For Assessing Goodness Of Fit

Criterion             DF        Value    Value/DF
Deviance               6     178.1752     29.6959
Pearson Chi-Square     6     198.7742     33.1290

Algorithm converged.

Analysis Of Parameter Estimates

                          Standard        Wald 95%
Parameter  DF  Estimate      Error   Confidence Limits   ChiSquare   Pr > ChiSq
Intercept   1    3.2798     0.0733    3.1361    3.4235     2000.86       <.0001
Scale       0    1.0000     0.0000    1.0000    1.0000

Summary Data set

COUNT        Pred      Resraw     Reschi
  86      26.5714     59.4286    11.5289
  41      26.5714     14.4286     2.7991
  29      26.5714      2.4286     0.4711
  14      26.5714    -12.5714    -2.4388
   9      26.5714    -17.5714    -3.4088
   5      26.5714    -21.5714    -4.1848
   2      26.5714    -24.5714    -4.7668

Looking at the raw residuals above, we observe that the first three are positive and the last four are negative, indicating that the probability of infertility is more than average in those patients with low parity values and less than average in patients with higher parities.
Suppose that the probability of infertility diminishes in the same proportion between any two consecutive parity values. That is,

π_{k+1}/π_k = φ

Then

π_k/π_1 = φ^{k−1}

and this implies that

log(π_k/π_1) = (k − 1) log(φ)

But π_k = m_k/n, where m_k is the expected frequency in category k and n is the sample size, such that Σ_k m_k = n. Thus,

log(π_k/π_1) = log(m_k/m_1)
⇒ log(m_k) = log(m_1) + (k − 1) log(φ) = log(m_1/φ) + k log(φ)

which is of the form

log(m_k) = β_0 + β_1 k     (3.20)

where

β_0 = log(m_1/φ)   and   β_1 = log(φ)

Fitting this log-linear linear-trend model using GENMOD, we have

β̂_0 = 4.4217,  β̂_1 = −0.5794,  and  log(m̂_k) = 4.4217 − 0.5794 k

with fitted G² = 1.3828 on 5 d.f. The model clearly fits the data well (pvalue = 0.9262), and the negative value of β̂_1 indicates a decrease in infertility with previous parity. Since β_1 = log(φ), φ̂ = e^{−0.5794} = 0.5602 is the proportional decline in probability per parity. Hence,

π̂_k = π̂_1 φ̂^{k−1} = π̂_1 e^{β̂_1(k−1)}

Because β̂_1 is negative, this is a model of exponential decay. If β̂_1 > 0, it would have been a model of exponential growth. Below are the SAS software program and the accompanying modified output.
PROC GENMOD DATA=EXAMPLE order=data;
make 'obstats' out=bb;
model count=PARITY/DIST=POI LINK=LOG OBSTATS;
run;
proc print data=bb noobs;
var count pred resraw reschi;
format pred resraw reschi 10.4;
run;
Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               5      1.3828      0.2766
Pearson Chi-Square     5      1.3579      0.2716

Analysis Of Parameter Estimates

                          Standard        Wald 95%
Parameter  DF  Estimate      Error   Confidence Limits   ChiSquare   Pr > ChiSq
Intercept   1    4.4217     0.0944    4.2368    4.6066     2196.01       <.0001
PARITY      1   -0.5794     0.0516   -0.6806   -0.4782      126.00       <.0001
Scale       0    1.0000     0.0000    1.0000    1.0000

Summary Data set

COUNT        Pred      Resraw     Reschi
  86      83.2388      2.7612     0.3026
  41      46.6330     -5.6330    -0.8249
  29      26.1252      2.8748     0.5624
  14      14.6362     -0.6362    -0.1663
   9       8.1996      0.8004     0.2795
   5       4.5937      0.4063     0.1896
   2       2.5735     -0.5735    -0.3575

We also consider fitting a gamma-type model to the data. This is accomplished by defining a new variable, which is the logarithm of X, and then fitting the two-parameter gamma distribution. The results of such a fit are reproduced below from a GENMOD output. We added 1 to the parity values to avoid taking the logarithm of zero; the values of X now range from 1 to 7.

data example2;
set example;
PP=PARITY+1;
PLOG=LOG(PP);
run;
PROC GENMOD DATA=EXAMPLE2 order=data;
make 'obstats' out=aa;
model count=PP PLOG/dist=poi link=log obstats;
run;
proc print data=aa noobs;
var count pred reschi streschi;
format pred reschi streschi 7.4;
run;
Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               4      1.2758      0.3190
Pearson Chi-Square     4      1.2586      0.3147

Analysis Of Parameter Estimates

                          Standard        Wald 95%
Parameter  DF  Estimate      Error   Confidence Limits   ChiSquare   Pr > ChiSq
Intercept   1    4.9596     0.1827    4.6016    5.3176      737.10       <.0001
PP          1   -0.5208     0.1857   -0.8846   -0.1569        7.87       0.0050
PLOG        1   -0.1466     0.4474   -1.0235    0.7303        0.11       0.7432
Scale       0    1.0000     0.0000    1.0000    1.0000

COUNT        Pred     Reschi   Streschi
  86      84.6778     0.1437     0.9051
  41      45.4451    -0.6594    -0.9666
  29      25.4400     0.7058     0.8803
  14      14.4892    -0.1285    -0.1457
   9       8.3307     0.2319     0.2656
   5       4.8185     0.0827     0.0995
   2       2.7986    -0.4774    -0.5987

As expected, the model fits the data very well, being based on a G² value of 1.2758 on 4 d.f. (pvalue = 0.8655), as compared to 1.3828 on 5 d.f. for the exponential. The exponential is certainly the more parsimonious in this case. Of course, the exponential is a special case of the gamma distribution, with α being 1 in the two-parameter gamma Γ(α, β).

3.7 Local Effects Models

Lindsey (1995) describes fitting local effects models to certain data, where models
are fitted to only some observations in the data, the rest being ignored. The ignored
data values are often described as outliers. We consider below the following data
which relate to intervals between explosions in British mines between 1851 and 1962
(Jarret, 1979). The distribution of the number of accidents on each day of the week
during the period covered is presented in Table 3.18.
Sunday        5
Monday       19
Tuesday      34
Wednesday    33
Thursday     36
Friday       35
Saturday     29

Table 3.18: Weekly distribution of number of accidents: 1851 - 1962


Interest centers on how the accidents vary over the week. We first consider fitting the equiprobability model to the data. This model gives a G² = 37.705 on 6 d.f. The GENMOD output for this model is given below. Examination of the residuals indicates that the model overestimates the rate on the first two weekdays (Sunday and Monday) and underestimates the rate on the last five days (Tuesday to Saturday). Sunday has a significant residual of −4.266. The SAS software program for implementing the models discussed here is presented in appendix B.8.
Obs   DAYS   COUNT   cex   weight   LL
 1      1       5      0       1      1
 2      2      19      0       2      4
 3      3      34      1       3      9
 4      4      33      1       3     16
 5      5      36      1       3     25
 6      6      35      1       3     36
 7      7      29      1       3     49

COUNT        Pred     Reschi
  5       27.2857    -4.2664
 19       27.2857    -1.5862
 34       27.2857     1.2854
 33       27.2857     1.0939
 36       27.2857     1.6683
 35       27.2857     1.4768
 29       27.2857     0.3282

A revised model, which assigns 0 to Sunday and Monday and 1 to Tuesday through Saturday, was next fitted. This dichotomous variable is the variable CEX in the output. That is,

CEX = 0  if days = 1, 2
CEX = 1  elsewhere

This model is implemented with the SAS software program below, together with a partial output.

data local2;
set local;
if days le 2 then cex=0;
else cex=1;
run;
Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               5      9.6028      1.9206

COUNT        Pred     Reschi
  5       12.0000    -2.0207
 19       12.0000     2.0207
 34       33.4000     0.1038
 33       33.4000    -0.0692
 36       33.4000     0.4499
 35       33.4000     0.2769
 29       33.4000    -0.7613

The model fits the data, with a G² = 9.603 on 5 d.f., but the residuals for Sunday and Monday are again significant. A look at the plot in Figure 3.1 of the number of accidents versus day of the week indicates that perhaps a quadratic model might be appropriate.
We next therefore fit a quadratic model to the data; the model gives G² = 4.4578 on 4 d.f. (pvalue = 0.3476). This model fits the data, and its residuals are now not significant. The parameter estimates for this model are β̂_0 = 1.2388, β̂_1 = 0.9848, and β̂_2 = −0.0990. A sketch of a GENMOD program for this model is given below, followed by a partial SAS software output for this model.
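The quadratic fit can be obtained by adding the squared day index (the LL column in the earlier printout) as a covariate. A minimal sketch, assuming the data set local defined above:

data local4;
set local;
ll=days*days;          * squared score for the quadratic term;
run;
proc genmod data=local4;
model count=days ll / dist=poi link=log;
run;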
[Figure 3.1: Plot of count by days (plot of COUNT*DAYS; plot symbol '+')]


Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               4      4.4578      1.1145
Pearson Chi-Square     4      4.3530      1.0883

COUNT        Pred     Reschi
  5        8.3692    -1.1646
 19       16.6474     0.5766
 34       27.1640     1.3116
 33       36.3602    -0.5573
 36       39.9249    -0.6212
 35       35.9621    -0.1604
 29       26.5724     0.4709

But then, one may ask, is this the most parsimonious model? To answer this question, we next fit another model, which involves a new factor variable we call weight (WT), which takes values 1 and 2, respectively, on Sunday and Monday, and value 3 from Tuesday through Saturday. That is,

wt = 1  if days = 1
wt = 2  if days = 2
wt = 3  elsewhere

This model ignores the first two weekdays (Sunday and Monday) and then fits the equiprobability model to the remaining days. This model fits, with a G² = 0.8952 on 4 d.f. (pvalue = 0.9252). Thus we can conclude that the number of accidents between Tuesday and Saturday is equally probable but varies between Sunday and Monday. This latter model is implemented in SAS with the following statements, and again with a partial output.

data local3;
set local;
if days eq 1 then weight=1;
else if days eq 2 then weight=2;
else weight=3;
run;
proc genmod data=local3 order=data;
class weight;
model count=weight/dist=poi;
run;
Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance               4      0.8952      0.2238
Pearson Chi-Square     4      0.8743      0.2186

COUNT        Pred     Reschi
  5        5.0000     0.0000
 19       19.0000     0.0000
 34       33.4000     0.1038
 33       33.4000    -0.0692
 36       33.4000     0.4499
 35       33.4000     0.2769
 29       33.4000    -0.7613

3.8 Goodness of Fit for Binomial Data

Suppose there are k experiments, each of which consists of n independent trials that result in either success (S) with probability π or failure (F) with probability 1 − π. The resulting data may be displayed as in Table 3.19.

Experiment         Response
number           S        F      Total
1              n_11     n_12       n
2              n_21     n_22       n
...             ...      ...      ...
k              n_k1     n_k2       n
Totals         n_+1     n_+2     N = kn

Table 3.19: Binomial experiment data display


If X_i = n_i1 is the number of successes in the i-th experiment, we wish to test

H₀ : X_i ∼ b(n, π),   i = 1, 2, ..., k

and the corresponding categories are:

Number of successes     0     1     2    ...    n    Totals
Observed frequency     n_0   n_1   n_2   ...   n_n       k
Expected frequency     m_0   m_1   m_2   ...   m_n       k

where n_s denotes the number of times that X_i = s, for s = 0, 1, 2, ..., n.
The expected frequencies are determined according to whether the success rate π is specified (CASE I) or estimated from the data (CASE II).
Below is an example of 100 simulated experiments (k = 100) from a binomial distribution with n = 5 and π = 0.3. For instance, the first experiment results in 0 successes, the second in 2, the third in 3, etc. These values correspond respectively to n_11, n_21, n_31, .... The frequency distribution of these outcomes is then tallied, leading to the following table of observed frequencies.

X        0     1     2     3     4     5    Total
COUNT   20    38    26    12     3     1      100

3.8.1 Case I: π Specified

If π is specified, then

m_s = k C(n, s) π^s (1 − π)^{n−s}   for s = 0, 1, 2, ..., n     (3.21)

where C(n, s) denotes the binomial coefficient, since there are k repeated binomial trials. The resulting X²_GOF statistic has an approximate χ² distribution with (n + 1) − 1 = n degrees of freedom.
Case II: π Estimated from Data
For b(n, π) data the likelihood can be written as

L = C π^{n_+1} (1 − π)^{N − n_+1}

where C is a constant. Then

log L = log C + n_+1 log π + (N − n_+1) log(1 − π)

and the MLE of π is

π̂ = n_+1/N

so that

m̂_s = k C(n, s) π̂^s (1 − π̂)^{n−s},   for s = 0, 1, ..., n     (3.22)

and

X²_GOF = Σ_{s=0}^{n} (n_s − m̂_s)²/m̂_s

is approximately distributed as χ² with n − 1 = (n + 1) − 1 − 1 degrees of freedom.
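A minimal SAS sketch of Case II for the simulated table above, computing π̂, the expected frequencies from (3.22), and Pearson's X² (the data set and variable names are illustrative only):

data binfit;
input s ns @@;            * number of successes and its frequency;
datalines;
0 20 1 38 2 26 3 12 4 3 5 1
;
data _null_;
set binfit end=last;
tot+ns;                   * k, the number of experiments;
succ+s*ns;                * total number of successes;
if last then do;
  call symputx('pihat',succ/(tot*5));   * n = 5 trials per experiment;
  call symputx('k',tot);
end;
run;
data gof;
set binfit;
ms=&k*comb(5,s)*(&pihat**s)*((1-&pihat)**(5-s));  * expected, eq. (3.22);
x2=(ns-ms)**2/ms;                                 * Pearson contribution;
run;
proc means data=gof sum;
var x2;
run;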


Example
The following example in Table 3.20 is taken from Mead and Curnow (1983). It is claimed that housewives cannot tell product A from product B. To test this assertion, 100 housewives are each given eight pairs of samples, each pair containing one sample of product A and one of product B, and are asked to identify the A sample. The number of times (out of eight) that each housewife is correct is recorded in Table 3.20.
In this example, k = 100, n = 8, and hence N = 800. The following two possible null hypotheses are considered:
(a) The housewives have no discriminatory power, in which case the null hypothesis is of the form H01 : π = 1/2 against the alternative H11 : π ≠ 1/2.
(b) The housewives have discriminatory ability, and the null hypothesis will be of the form H02 : π = π₀ > 1/2.
For case (a), we are interested in testing the null hypothesis H01 : π = 1/2 against the alternative H11 : π ≠ 1/2.

Times      Number of     Expected values   Expected values
correct    housewives    under H01         under H02
0                2             0.39              0.11
1                0             3.13              1.16
2                5            10.94              5.46
3               13            21.88             14.76
4               36            27.34             24.96
5               13            21.88             27.02
6               18            10.94             18.28
7               10             3.13              7.07
8                3             0.39              1.19
Totals         100           100.02            100.01

Table 3.20: Results from binomial fits
Under H01, π is known and the problem reduces to Case I in subsection 3.8.1. The expected values based on the null hypothesis are computed using (3.21) as:

m̂_0 = 100 C(8, 0)(1/2)^8 = 0.39,  ...,  m̂_8 = 100 C(8, 8)(1/2)^8 = 0.39

These expected values are given in column 3 of Table 3.20. Observe that two expected values are less than 3. The minimum expected value by the Lawal (1980) rule should be m = r/d^{3/2}, where r = 2 in this problem and d = 8. Hence, the minimum expected value equals 0.08. Because the computed expected values are all greater than 0.08, we can justifiably employ either the lognormal approximation or the critical values obtained from the C(m) tables.
To use the C(m) table in Lawal (1980), first compute an effective small expectation as:

m_G = (0.39 × 0.39)^{1/2} = 0.39

Hence, at the 5% and 1% levels, the computed X² will be compared with the C(m) values of 16.30 and 23.62, respectively.
The computed Pearson's X² = 60.06, which clearly indicates that H01 should be rejected, concluding that the housewives have discriminatory ability.
To test these discriminating powers, it is required to test whether the powers are the same for different housewives and whether the results within each set of eight are independent. What this implies is that we are interested in testing whether the data are compatible with a binomial distribution with probability π, the proportion of a correct identification, greater than 1/2. Thus we wish to test the null hypothesis

H02 : π = π₀ > 1/2

against the alternative

H22 : π ≠ π₀

Based on the data, an estimate of π is computed from Table 3.20 as follows:

π̂ = [0(2) + 1(0) + 2(5) + 3(13) + ··· + 8(3)]/(8 × 100) = 460/800 = 0.575

That is, π̂ = 0.575 and (1 − π̂) = 0.425.
The expected values based on the null hypothesis are computed using (3.21) or (3.22) and are displayed in column 4 of Table 3.20. They are again computed as follows:

m̂_0 = 100 C(8, 0)(0.575)^0(0.425)^8,  ...,  m̂_8 = 100 C(8, 8)(0.575)^8(0.425)^0
Here, there are r = 3 expected values less than 3, and with d = 7 in this case, the required minimum expected value is again computed as 3/7^{3/2} = 0.16. Since one of the expected cell frequencies, 0.11, is less than this minimum allowable value, it is necessary to collapse this cell with the adjacent second cell to give observed and expected values of 2 and 1.27, respectively, with a corresponding decrease in the degrees of freedom from 7 to 6. With this new setup, the minimum expected frequency for r = 2 and d = 6 is now 2/6^{3/2} = 0.13. The expected values satisfy this minimum requirement, and consequently

m_G = (1.27 × 1.19)^{0.5} = 1.23

The corresponding tabulated values with k = 6, r = 2 are, by interpolation, C(m) = 12.91 and C(m) = 18.32 at the 5% and 1% levels, respectively. The computed Pearson's X² = 16.17. Hence, the null is rejected at the 5% level, but we would fail to reject it at the 1% level. Observe that the results would be based on 8 classes instead of 6 if we were to use the usual χ² approximation.
Thus, at the 1% point, there is no reason to doubt that the binomial model fits these data. An estimate of the variance of the estimator is given by

V̂ar(π̂) = π̂(1 − π̂)/N = (0.575)(0.425)/800 = 0.0003055

and an approximate 99% confidence interval for the discriminating probability is

0.575 ± 2.58 √0.0003055,  or  (0.530, 0.620)

3.8.2 Variance Test for the Binomial Distribution

(a) Suppose that rather than the binomial distribution, the data follow a different discrete distribution (usually with a larger variance), or
(b) the probabilities of success π_i follow some systematic source of variation.
Situation (a) usually arises if the binary responses are correlated or if the wrong explanatory-response model is assumed when the actual relationship is different from that assumed (e.g., linear model versus quadratic model). Similarly, (b) usually arises in situations when one or more explanatory variables or interactions have been omitted from the model.
In situations like these, the variance test is more powerful than the X²_GOF test. The resulting data can be displayed as in Table 3.21. Note that the sample sizes n_{i+} may differ.
Hi may differ.
Experiment         Response
number           S        F      Total
1              n_11     n_12     n_1+
2              n_21     n_22     n_2+
...             ...      ...      ...
k              n_k1     n_k2     n_k+
Totals         n_+1     n_+2       N

Table 3.21: Resulting data arranged in a table

Let π̂_i = n_i1/n_i+ for i = 1, 2, ..., k, and let π̂ = n_+1/N. Then the variance test can be expressed as:


(i) When π₀ is specified a priori,

X²_V = Σ_{i=1}^{k} (n_i1 − n_i+ π₀)² / [n_i+ π₀(1 − π₀)]     (3.23)

Under H₀, the statistic X²_V is distributed χ² with k degrees of freedom.
(ii) When π is unknown,

X²_V = Σ_{i=1}^{k} (n_i1 − n_i+ π̂)² / [n_i+ π̂(1 − π̂)]     (3.24)

This variance test is equivalent to the overdispersion test that will be discussed in chapter 8 in the context of the linear logistic modeling of binary data. The statistic is based on k − 1 degrees of freedom, since one parameter is estimated from the data.

Example
The variance test for the data in Table 3.20 can be conducted as follows:

X²_V = [8/((0.575)(0.425))] Σ_{i=1}^{100} (p_i − 0.575)² = 32.73657 × 3.9375 = 128.900

on 99 degrees of freedom, where p_i = n_i1/8 and

Σ_{i=1}^{100} (p_i − 0.575)² = Σ_{i=1}^{100} (n_i1 − 4.6)²/64 = 252/64 = 3.9375

The above variance test is implemented in SAS with either PROC LOGISTIC or PROC GENMOD. The following program, along with its partial output, gives the relevant results.

data new;
input r count @@;
n=8;
datalines;
0 2 1 0 2 5 3 13 4 36 5 13 6 18 7 10 8 3
;
run;
proc logistic data=new; freq count;
model r/n=/scale=none; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF       Value    Value/DF    Pr > ChiSq
Deviance     99    149.2817      1.5079        0.0008
Pearson      99    128.9003      1.3020        0.0234

Number of events/trials observations: 100

proc genmod;
model r/n= /dist=b;
freq count;
run;

Criteria For Assessing Goodness Of Fit

Criterion             DF       Value    Value/DF
Deviance              99    149.2817      1.5079
Pearson Chi-Square    99    128.9003      1.3020

An alternative approximation suggested by Cochran (1954) utilizes the normal distribution and is given by

Z = [X²_V − E(X²_V)] / √Var(X²_V)

where

E(X²_V) = k − 1 = 99

and

Var(X²_V) = 2(k − 1)(n − 1)/n = (2 × 99)(7/8) = 173.250

Hence,

Z = (128.900 − 99)/√173.250 = 2.27

with a corresponding pvalue of 0.0116. This test has been shown to be more powerful than the X² test.
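The Z statistic and its pvalue are easily checked with a short data step (a sketch, not from the text):

data cochran;
z=(128.900-99)/sqrt(173.250);   * Cochran's normal approximation;
pvalue=1-probnorm(z);           * upper-tail pvalue;
proc print data=cochran noobs; run;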

3.9 Exercises

1. The power divergence goodness-of-fit test statistic is defined as:

I(λ) = [2/(λ(λ + 1))] Σ_i n_i [(n_i/m_i)^λ − 1],   −∞ < λ < ∞

Show that:
(i) When λ = 1, I(λ) equals the Pearson test statistic X².
(ii) As λ → −1 or 0, I(λ) converges respectively to 2 Σ_i m_i ln(m_i/n_i) and 2 Σ_i n_i ln(n_i/m_i).
2. For the data relating to methods of child delivery in eclamptic patients, estimate the probabilities if the null hypothesis is of the form H₀ : π_1 = π_2 = 3π_3 = π_4 = 7π_5. Test this hypothesis and compare your results with those obtained in the text.
3. The 1982 General Social Survey of political opinions of subjects interviewed has seven categories, which are displayed in the next table:

(1)    (2)    (3)    (4)    (5)    (6)    (7)
 31    136    217    583    206    201     55

where (1) extremely liberal, (2) liberal, (3) slightly liberal, (4) moderate, (5) slightly conservative, (6) conservative, (7) extremely conservative. Fit the equiprobable model, and the linear and quadratic models, to these data. Why does the symmetry model fail to fit these data (partition the symmetry G² into its components)?
4. The number of shutdowns per day caused by a breaking of the thread was noted for a nylon spinning process over a period of 10 days (Ott, 1984). The data collected are displayed below:

y_i    0    1    2    3    4    ≥5
n_i   20   28   15    8    7    12

Fit a Poisson distribution to the above data and test for goodness of fit at the α = 0.05 level of significance. Also do a variance test for the data.


5. The progeny in a qualitative experiment were expected to segregate into classes A, B, C, and D in the ratios 9:3:3:1. The numbers actually observed in the respective classes were 267, 86, 122, and 25. Are these data compatible with the theoretical expected numbers? (From Box, 1987.)
6. The data below (Haberman, 1978) relate to individuals randomly selected in California who reported one stressful event, and the time to illness.

Individuals reporting one stressful event

Months before    Number
      1             15
      2             11
      3             14
      4             17
      5              5
      6             11
      7             10
      8              4
      9              8
     10             10
     11              7
     12              9
     13             11
     14              3
     15              6
     16              1
     17              1
     18              4

Fit an exponential decay model to the above data.
7. Jarret (1979) also gave the following data, relating to the distribution of the mine explosions in British mines analyzed in section 3.7.

Distribution of explosions by month

Month        Number
January         14
February        20
March           20
April           13
May             14
June            10
July            18
August          15
September       11
October         16
November        16
December        24

Study to see if there is any systematic change over the year.


8. A class of 100 students took an exam consisting of 15 questions, with results as summarized below. Assume these data behave as a random sample. Fit the binomial distribution to these data and state your conclusion.

Number of           Number of
questions missed    students
      0                 1
      1                 6
      2                16
      3                23
      4                23
      5                17
      6                 9
      7                 4
      8                 1
     ≥9                 0
9. The numbers of tomato plants attacked by spotted wilt disease were counted in each of 160 areas of 9 plants. The results are displayed below (Snedecor & Cochran, 1973):

No. of diseased plants    0    1    2    3    4    5    6    7    Total
n_i                      36   48   38   23   10    3    1    1     160

Fit a binomial distribution to these data and perform the relevant goodness-of-fit test.


Chapter 4

Models for 2 × 2 Contingency Tables

4.1 Introduction

In this chapter, we shall consider the case where subjects are jointly classified by two dichotomous classificatory variables A and B, indexed by i = 1, 2 and j = 1, 2, respectively, providing four category combinations. The joint frequency distribution of the two variables is frequently displayed in an array called a 2 × 2 contingency table. If we let n_ij denote the number of subjects in the sample who are jointly classified as belonging to the i-th level of A and the j-th level of B, then these data can be summarized as shown in Table 4.1.
      Observed Frequencies                Population Probabilities

              B                                      B
A         1       2      Total       A          1        2      Total
1       n_11    n_12      n_1+       1        π_11     π_12      π_1+
2       n_21    n_22      n_2+       2        π_21     π_22      π_2+
Total   n_+1    n_+2        N        Total    π_+1     π_+2        1

Table 4.1: Notation for the (2 × 2) contingency table


Each entry in the body of the table is said to refer to a cell of the table.
In this formulation, n_{i+} = Σ_j n_ij denotes the marginal number of subjects in the i-th level of A. Similarly, n_{+j} = Σ_i n_ij denotes the marginal total number of subjects classified in the j-th level of B, and N = Σ_i Σ_j n_ij denotes the total sample size. If the row margin is assumed to be fixed, these totals will be denoted by M_1 = {n_1+, n_2+}, and similarly, if the column margin is assumed to be fixed, these totals will be denoted as M_2 = {n_+1, n_+2}.
Let us consider a simple example of a data set that gives rise to a 2 × 2 contingency table. In Table 4.2 are 13 males between the ages of 11 and 30 who were operated on for knee injuries using arthroscopic surgery. The patients were classified by type of injury: direct blow (D), or both twisted knee and direct blow (B). The results of the surgery were also classified as excellent (E) or good (G). The resulting 2 × 2 contingency table is displayed in Table 4.2.



                 Result
Injury     Excellent    Good    Total
Direct         3          2        5
Both           7          1        8
Total         10          3       13

Table 4.2: Surgery results

We would be interested in whether the two classificatory variables are independent of one another, that is, whether the result of a patient's surgery is independent of the type of injury sustained. When there is such independence, the conditional proportion of excellent surgery results would be the same regardless of whether the patient sustained a direct blow or both injuries. That is:

The probability that the result of a patient's surgery will be excellent given that he or she sustained a direct injury would be equal to the probability that the result of a patient's surgery is excellent given that he or she sustained both injuries.
There are at least four situations that might give rise to the observed 2 × 2 table in Table 4.2 above. These are:
1. The cross-classification was based on the information that 5 patients had sustained direct injuries and 8 had sustained both. Further, we were informed that of these 13 patients, 10 had excellent surgery results and 3 had good surgery results.
2. The cross-classification was solely based on the knowledge that 5 patients had sustained direct injuries and 8 patients had similarly sustained both injuries.
3. Here, all 13 patients were cross-classified according to type of injury and the result of the surgery. The classification is random.
4. Patients were cross-classified each at random until 3 were found to have sustained a direct injury.
The differences in the situations lie in the numbers fixed by the sampling schemes. In situation 1, each of n_1+, n_2+, n_+1, and n_+2 is fixed. In situation 2, only M_1 = {n_1+, n_2+} is fixed. In situation 3, only N, the total sample size, is fixed, while in situation 4, only n_11 is fixed. Of course, for situation 2 it is also possible to have fixed only n_+1 and n_+2. A particularly clear discussion of the conceptual differences in these sampling schemes has been given by Pearson (1947).
The underlying distributions for the four situations that could give rise to the 2 × 2 contingency table displayed in Table 4.2 are respectively the hypergeometric, the product binomial, the full multinomial, and the negative binomial (Kudo & Tarumi, 1978). The hypergeometric model, which assumes that both margins are fixed, has had the most publicity, though genuine examples in which both sets of marginal totals are fixed are very rare. The first three sampling models are discussed in turn in the following sections. The fourth scheme has not gained much popularity and will therefore not be discussed in this text.

4.2 The Hypergeometric Probability Model

For observational data obtained from restricted populations, or for experimental design data viewed within the framework of a strict randomization model, both marginal distributions M_1 and M_2 are assumed to be fixed (either by design or by conditional distribution arguments). In this context, the null hypothesis of "randomness" can be stated as:

H₀: The classificatory variable B is randomly distributed with respect to the other classificatory variable A.

In other words, the data in the first row of the 2 × 2 table can be regarded as a simple random sample of size n_1+ from a fixed population consisting of the column marginal distribution M_2. Under H₀, it can be shown using conditional arguments (see Chapter 2, section 2.4) that the vector n, where n = {n_11, n_12, n_21, n_22}, follows the hypergeometric distribution given by the probability model:

P[n | M_1, M_2; H₀] = (n_1+! n_2+! n_+1! n_+2!)/(N! n_11! n_12! n_21! n_22!)     (4.1)

The above can also be expressed in the form

P[n_11 = x | n_11 + n_21 = n_+1] = C(n_1+, x) C(n_2+, n_+1 − x)/C(N, n_+1),   for x = 0, 1, ..., n_+1     (4.2)

where C(a, b) denotes the binomial coefficient. For brevity, we usually write P[n_11 | n_11 + n_21 = n_+1] succinctly as simply P[n_11].


Note that under the assumption of both marginal distributions MI and M2 being
fixed, the entire distributions of the vector n can be characterized by the distribution
of any one of the internal cell frequencies n^-. Specifically, once the value of one of
the cell frequencies is fixed, the other three can be determined immediately from
the fixed margins MI and M 2 . Thus the distribution of n can be determined
completely from the distribution of one of the "pivot" cells, say nu as illustrated
below in Table 4.3.
B
Total
2
A
1
1
ni+
ni+ - n n
nn
2
n+i ~ nu n2+ - n+i + nu n2+
N
Total
n+i
n+2

Table 4.3: Observed frequencies as functions of pivot cell and fixed marginal totals
As was discussed in Chapter 2, the range of n_11 is from L_1 = max(0, n_1+ − n_+2) to L_2 = min(n_1+, n_+1). In other words, the number of jointly classified subjects in the first row and first column, n_11, cannot be less than either 0 or (n_1+ − n_+2), whichever is larger, and cannot be greater than n_1+ or n_+1, whichever is smaller. As a result, the probability density for n can be relabeled as

P[n_11] = P{n_11 | M_1, M_2; H₀}     (4.3)

and it can be shown that


Σ_{n_11 = L_1}^{L_2} P[n_11] = 1

which indicates that the probabilities in (4.3) do constitute a probability density function for the pivot cell frequency. We illustrate this below with the data from Table 4.2.
For the data in Table 4.2, n_11 = 3, L_1 = max(0, 5 − 3) = 2, and L_2 = min(5, 10) = 5. The probability for the observed configuration n = {3, 2, 7, 1} is therefore given, based on (4.1), as:

P[{3, 2, 7, 1}] = (5! 8! 10! 3!)/(13! 3! 2! 7! 1!) = 0.27972

Similarly, the probability for the possible outcome n = {5, 0, 5, 3} is given by:

P[{5, 0, 5, 3}] = (5! 8! 10! 3!)/(13! 5! 0! 5! 3!) = 0.19580
The collection of all possible tables consistent with the marginal totals consists of the four configurations n′ = {2, 3, 8, 0}, {3, 2, 7, 1}, {4, 1, 6, 2}, and {5, 0, 5, 3}.
A short-cut method of evaluating the probabilities for each of these tables is described by Feldman and Klinger (1963). This method involves the application of the following recursive formula:

P(n_11 + 1) = P(n_11) × [(n_1+ − n_11)(n_+1 − n_11)] / [(n_11 + 1)(N − n_1+ − n_+1 + n_11 + 1)]

where we start with the a-table, a = L_1 being the smallest of all possible values of the pivot cell n_11. Hence,

P(3) = P(2) × (3 × 8)/(3 × 1) = 8 P(2)
P(4) = P(3) × (2 × 7)/(4 × 2) = 14 P(2)
P(5) = P(4) × (1 × 6)/(5 × 3) = 5.6 P(2)

Adding these probabilities, and since they must sum to 1, we have

P(2) + 8 P(2) + 14 P(2) + 5.6 P(2) = 1

Hence,

P(2) = 1/28.6 = 0.03497

The other probabilities can now be obtained in terms of P(2), and these probabilities are displayed in the next table. The table therefore gives the distribution of the pivot cell (for all the possible configurations) for the data in Table 4.2.



n_11    Vector (n′)     Probability
  2     {2,3,8,0}         0.03497
  3     {3,2,7,1}         0.27972
  4     {4,1,6,2}         0.48951
  5     {5,0,5,3}         0.19580
        Total             1.00000

As expected, the probabilities based on the distribution of the pivot cell add up to 1.00.
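These probabilities can also be generated directly with the PROBHYPR function, in the same way as is done below for Table 4.4. A minimal sketch for the data in Table 4.2:

data knee;
n=13; k=5; m=10;                 * N, n1+, and n+1 for Table 4.2;
i2=min(k,m); i1=max(0,(k+m-n));
do i=i1 to i2;
  if i=i1 then prob=probhypr(n,k,m,i);
  else prob=probhypr(n,k,m,i)-probhypr(n,k,m,i-1);
  output;
end;
proc print data=knee noobs;
var i prob;
format prob 8.5;
run;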

4.2.1 Mean and Variance of the n_ij Terms

The moments of the hypergeometric distribution are most readily obtained (Johnson & Kotz, 1982) by computing the factorial moments. If we define

(a)_k = a(a − 1)(a − 2) ··· (a − k + 1)

then the k-th factorial moment can be expressed as

E{(n_11)_k | H₀} = (n_1+)_k (n_+1)_k / (N)_k     (4.4)
As a result of (4.4), it follows that the expected value of n_11 under H₀ (that is, the first factorial moment, when k = 1) is

m_11 = E{n_11 | H₀} = n_1+ n_+1 / N     (4.5)

Similarly, the second factorial moment (k = 2) is also given by:

E{n_11(n_11 − 1) | H₀} = (n_1+)(n_1+ − 1)(n_+1)(n_+1 − 1) / [N(N − 1)]     (4.6)

Consequently, the variance of n_11 under H₀ is obtained directly from (4.5) and (4.6) as

V_11 = Var(n_11 | H₀) = n_1+ n_2+ n_+1 n_+2 / [N²(N − 1)]     (4.7)
We are using here the fact that the variance of X can be defined as Var(X) = E[X(X − 1)] + E(X) − [E(X)]².
The central moments in (4.5) and (4.7) will be used later to develop a large sample test statistic for H₀. We may note here that, in general, for the covariance structure of n, each pairwise calculation can be obtained directly from the relationships shown in Table 4.3. In particular, for n_11 and n_22, we obtain

Cov(n_11, n_22) = Cov(n_11, n_2+ − n_+1 + n_11)
                = Cov(n_11, n_11)
                = V_11     (4.8)


because n_11 determines n_22, and n_2+, n_+1 are constants. These covariances can be summarized in a variance-covariance matrix, for n = (n_11, n_12, n_21, n_22)′, as:

VAR{n} = V_11 [  1  −1  −1   1
                −1   1   1  −1
                −1   1   1  −1
                 1  −1  −1   1 ]

4.2.2 Example

The data in Table 4.4, which appeared in Freeman (1987), were originally published in Correa et al. (1983) to study the effect of passive smoking on lung cancer. Each of 155 nonsmokers is tabulated according to whether the spouse smoked. For the moment, we will focus only on the females; Freeman (1987) had considered the case for the males. There were only 22 cancer cases among ever-married nonsmoking females, and only 75 spouses in the sample had ever smoked. In a strict sense, both margins of the table were not fixed in the actual sampling scheme. However, if we assume that the "spouse-smoked" sample sizes are fixed by design, then the proportion of being a case (having lung cancer), which equals 22/155, is of no interest in this situation, since our data are a nonrandom sample from the population. This proportion is referred to as the nuisance parameter, and we would accept this observed value as given. The test of interest therefore is whether the proportions of cases reported for both the spousal-smoking and nonspousal-smoking groups are the same, that is, a conditional homogeneity test; see section 10.11 for details of the homogeneity test.
                     Lung cancer status
Spouse smoked?     Case    Control    Total
Yes                  14       61        75
No                    8       72        80
Totals               22      133       155

Table 4.4: Spouse smoking by case-control study by sex: Nonsmokers who have been married (females)
In this example, N = 155, M_1 = {75, 80}, and M_2 = {22, 133}.
From the above, we would like to answer questions such as: how likely is it that exactly 14 cases are observed among the smoking spouses? Since there are 22 cases in a sample of 155, there are C(155, 22) possible samples. However, 14 of the 75 smoking spouses were diagnosed as cases, and this can occur in C(75, 14) ways. Similarly, 8 of the 80 nonsmoking spouses were also diagnosed as cases, and this can occur in C(80, 8) ways. Thus both events can occur jointly in C(75, 14) × C(80, 8) ways. Hence, the probability of the observed outcome is given by:

P = C(75, 14) C(80, 8)/C(155, 22) = (22! 133! 75! 80!)/(155! 14! 8! 61! 72!) = 0.05673

We notice that the arguments lead to the hypergeometric distribution of the pivot cell n_11 = 14. With the pivot cell being the observed value 14, we have L_1 = max(0, 22 − 80) = 0 and L_2 = min(22, 75) = 22. Under H₀ and the assumption that

M_1 and M_2 are fixed, the range of n_11 is restricted to L_1 ≤ n_11 ≤ L_2, that is, from 0 to 22, with probabilities obtained from (4.1), which are summarized in Table 4.5. All possible values of the pivot cell with the corresponding probabilities are tabulated there. There is a total of D = 1 + min(n_1+, n_2+, n_+1, n_+2) tables that are consistent with the fixed marginals. As in chapter 2, these probabilities are generated with the following SAS software program.

data hyper;
n=155; k=75; m=22; i2=min(k,m); i1=max(0,(k+m-n));
do i=i1 to i2;
  if i-1 lt i1 then prob=probhypr(n,k,m,i);
  else prob=probhypr(n,k,m,i)-probhypr(n,k,m,i-1);
  output;
end;
proc print data=hyper noobs;
var i prob;
format prob 8.4; run;

n_11      P(n_11)        Z        X²
  0      0.000000    -4.89     23.88
  1      0.000003    -4.43     19.61
  2      0.000034    -3.97     15.75
  3      0.000273    -3.51     12.32
  4      0.001507    -3.05      9.31
  5      0.006114    -2.59      6.72
  6      0.018948    -2.13      4.55
  7      0.045975    -1.67      2.80
  8      0.088815    -1.21      1.47
  9      0.138157    -0.76      0.57
 10      0.174321    -0.30      0.09
 11      0.179144     0.16      0.03
 12      0.150140     0.62      0.39
 13      0.102479     1.08      1.17
--------------------------------------
 14      0.056729     1.54      2.37
--------------------------------------
 15      0.025282     2.00      4.00
 16      0.008968     2.46      6.04
 17      0.002490     2.92      8.51
 18      0.000528     3.38     11.40
 19      0.000082     3.84     14.71
 20      0.000009     4.29     18.44
 21      0.000001     4.75     22.59
 22      0.000000     5.21     27.17
Total    1.000000

Table 4.5: pdf for the pivot cell in Table 4.4


The observed table is enclosed between the two lines; X² is the Pearson test statistic defined in Chapter 3, and Z is defined as

Z = (n_11 − m_11)/√V_11

with m_11 and V_11 given by (4.5) and (4.7). The pivot cell n_11 = 0 leads to the configuration n′ = {0, 75, 22, 58}. The null hypothesis of homogeneity, H₀ : π_1 = π_2 versus π_1 ≠ π_2, can be tested in terms of the probabilities in Table 4.5 using the well-known Fisher's exact test, which is discussed in the next section, or a large sample test that will be developed in subsection 4.3.1 by using the moments summarized in the previous section.
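The moments and the standardized Z for the observed cell are quickly verified; a minimal sketch for Table 4.4:

data moments;
n1p=75; n2p=80; np1=22; np2=133; n=155;
m11=n1p*np1/n;                        * expected value, eq. (4.5);
v11=n1p*n2p*np1*np2/(n*n*(n-1));      * variance, eq. (4.7);
z=(14-m11)/sqrt(v11);                 * observed n11 = 14;
proc print data=moments noobs; run;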

4.3 Fisher's Exact Test

Fisher's exact test consists first of generating all tables that are consistent with the given marginal totals M_1, M_2. Then, for each such table, we calculate the sums of

the individual probabilities associated with tables that are as extreme as or more
extreme than the observed table. This sum depends on what one considers to be
an extreme or more extreme table. Possible definitions of extreme or more extreme
tables are any other table satisfying the marginal total constraints and having cell counts n′ = {n′_11, n′_12, n′_21, n′_22} such that

P(n′_11) ≤ P(n_11)     (4.9a)
L(n′_11) ≥ L(n_11)     (4.9b)
|n′_11 n′_22 − n′_12 n′_21| ≥ |n_11 n_22 − n_12 n_21|     (4.9c)

where n = {n_11, n_12, n_21, n_22} are the cell counts of the observed table. Equations (4.9a) to (4.9c) refer respectively to the probability, log-likelihood (L), and absolute difference of cross-products (CP) orderings of events. We shall conduct Fisher's exact
test based on these methods later in this section.
If we assume that π_1 and π_2 are the underlying probabilities of success (B = 1) for the respective rows in the 2 × 2 setup in Table 4.1, then the null hypothesis can be conceptualized in terms of these underlying probabilities as

H₀ : π_1 = π_2 = π

Then the pvalue for the test of H₀ against the one-sided alternative

H₁ : π_1 > π_2

is obtained as:

P₁(n_11) = Σ_{a ≥ n_11} P(a) = P(n_11) + P(n_11 + 1) + ··· + P(L_2)

In our example, we have L_2 = 22; thus,

P₁(14) = Σ_{a=14}^{22} P(a) = 0.094089
Similarly, the pvalue for the test of H₀ against the one-sided alternative

H₁ : π_1 < π_2

is given by

P₂(n_11) = Σ_{a ≤ n_11} P(a) = P(L_1) + P(L_1 + 1) + ··· + P(n_11)

Again for our example, we have

P₂(14) = Σ_{a=0}^{14} P(a) = 0.96260

In many situations, one is interested in the general two-sided alternative hypothesis

H₁ : π_1 ≠ π_2

which reflects departure from the null hypothesis of homogeneity in either direction. For this two-tailed test, a natural choice for the pvalue calculation is the accumulation of the point probabilities for all extreme tables in either direction. However, unless n_1+ = n_2+ or n_+1 = n_+2 (in which case the distribution is symmetric), the two tail sums are not equal. Thus the two-tailed pvalue is computed in this case as the sum of all table probabilities that are as extreme as or more extreme than the observed probability of 0.0567. That is,

pvalue = [P(0) + ··· + P(7)] + [P(14) + ··· + P(22)]     (4.12)
               (ST)                    (PT)

where PT is sometimes referred to as the primary tail probability, which is the sum over the set of tables A_P = {n′_11} in the same tail of the distribution as n_11, that is, n′_11 ≥ n_11 such that P(n′_11) ≤ P(n_11). Similarly, we can define the secondary tail (ST) as the sum over the set of tables A_S = {n′_11} in the opposite tail of the distribution from the observed table, that is, n′_11 < n_11, such that P(n′_11) ≤ P(n_11). In this context,

PT(n_11) = Σ_{n′_11 ∈ A_P} P(n′_11)   and   ST(n_11) = Σ_{n′_11 ∈ A_S} P(n′_11)

and note that PT(n_11) = min[P₁(n_11), P₂(n_11)]. The two-tailed pvalue can then be obtained as

pvalue = PT(n_11) + ST(n_11)     (4.13)
This result agrees with the recommendation in Bradly (1968). Cox (1970a), however, suggested using
P* = 2 PT(n }
Again in our example, the primary tail probability is PT(14) = 0.094089 while the
secondary tail probability is
5T(14) = ]T Pr(nn) - 0.0728854
n' n =0

As a result, the two-tailed pvalue is given as either


P(14) = 0.094089 + 0.072854 = 0.1669
or

(4.13)

P*(14) = 2(0.094089) = 0.1882

The null hypothesis H₀ is rejected whenever the computed pvalue < α*, where α*
is usually specified. For instance, if we choose α* to be 0.05 in our example, then
we would conclude that we do not have sufficient evidence to reject H₀, and the null
hypothesis of randomness will be tenable in this case.
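Before turning to PROC FREQ, the tail cumulations above can be checked directly in a SAS data step. The following is a minimal sketch of our own (not from the original text) that uses the built-in PROBHYPR hypergeometric distribution function; the constants (N = 155, the margins n₁₊ = 75 and n₊₁ = 22 of Table 4.4, the observed n₁₁ = 14, and the small tolerance 1E-8) are our choices for illustration:

DATA TAILS;
   N = 155; R = 75; C = 22; X = 14;    /* N, n1+, n+1, observed n11 for Table 4.4 */
   PX = PROBHYPR(N, R, C, X) - PROBHYPR(N, R, C, X - 1);  /* point probability P(14) */
   PT = 0; ST = 0;
   DO K = 0 TO C;
      PK = PROBHYPR(N, R, C, K);
      IF K > 0 THEN PK = PK - PROBHYPR(N, R, C, K - 1);   /* P(n11' = K) */
      IF K >= X AND PK <= PX + 1E-8 THEN PT + PK;         /* primary (observed) tail */
      IF K <  X AND PK <= PX + 1E-8 THEN ST + PK;         /* secondary (opposite) tail */
   END;
   TWOTAIL = PT + ST;
   PUT PX= PT= ST= TWOTAIL=;
RUN;

Running this sketch should reproduce, up to rounding, P(14) = 0.0567, PT = 0.0941, ST = 0.0729, and the two-tailed pvalue 0.1669 obtained above.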
We can implement Fisher's exact test in SAS software, version 8, for the data in Table
4.4 as follows:

DATA FISHER;
DO STATUS=1 TO 2;
DO SMOKE=1 TO 2;
INPUT COUNT @@;
OUTPUT;
END; END;
DATALINES;
14 61 8 72
;
TITLE 'FISHERS EXACT TEST';
PROC FREQ; WEIGHT COUNT;
TABLES STATUS*SMOKE/EXACT;
RUN;
Statistics for Table of STATUS by SMOKE

Statistic                          DF     Value      Prob
Chi-Square                          1    2.3873    0.1223
Likelihood Ratio Chi-Square         1    2.4068    0.1208
Continuity Adj. Chi-Square          1    1.7288    0.1886
Mantel-Haenszel Chi-Square          1    2.3719    0.1235
Phi Coefficient                          0.1241
Contingency Coefficient                  0.1232
Cramer's V                               0.1241

Fisher's Exact Test
Cell (1,1) Frequency (F)        14
Left-sided Pr <= F          0.9626
Right-sided Pr >= F         0.0941

Table Probability (P)       0.0567
Two-sided Pr <= P           0.1669

Sample Size = 155

The SAS software output above gives results that agree with our results to four
decimal places. Fisher's exact test can be summarized in terms of the alternatives
as follows:
1. If the hypothesis is of the form

    H₀ : π₁ = π₂ versus H₁ : π₁ > π₂

then the one-tailed pvalue equals the primary tail, which in our example equals
0.094089.
2. If the hypothesis is of the form

    H₀ : π₁ = π₂ versus H₁ : π₁ < π₂

then the one-tailed pvalue equals the sum of one minus the primary tail probability and P(n₁₁), which in our example equals (1 − 0.094089) + 0.056729 =
0.96260.
3. If the hypothesis is of the form

    H₀ : π₁ = π₂ versus H₁ : π₁ ≠ π₂

then the two-tailed pvalue equals the sum of the primary and secondary tail
probabilities, which in our example equals 0.094089 + 0.072854 = 0.1669. We
can show that an exact test based on the X² criterion for the data above yields
0.166944, exactly the same as the result obtained for the two-sided alternative
in Fisher's test.
The pvalues based on the other ranking methods discussed earlier can also be obtained for the data in Table 4.4. For this table, we have presented in the next
SAS software output the values of P(n'₁₁), L(n'₁₁), and CP(n'₁₁). For the observed
n₁₁ = 14, therefore, P(14) = 0.0567, L(14) = 467.5133, and CP(14) = 520.00. Extreme
or more extreme tables are those indicating P(n'₁₁) ≤ 0.0567, L(n'₁₁) ≥ 467.5133,
and CP(n'₁₁) ≥ 520 for the three ranking procedures, respectively.


n11      prob      LHOOD          CP    CUMPROB
  0    0.0000   480.8179    1650.000     0.0000
  1    0.0000   477.4869    1495.000     0.0000
  2    0.0000   474.9258    1340.000     0.0000
  3    0.0003   472.8491    1185.000     0.0003
  4    0.0015   471.1414    1030.000     0.0018
  5    0.0061   469.7409     875.000     0.0079
  6    0.0189   468.6099     720.000     0.0269
  7    0.0460   467.7235     565.000     0.0729
  8    0.0888   467.0650     410.000     0.1617
  9    0.1382   466.6232     255.000     0.2998
 10    0.1743   466.3907     100.000     0.4741
 11    0.1791   466.3634      55.000     0.6533
 12    0.1501   466.5400     210.000     0.8034
 13    0.1025   466.9219     365.000     0.9059
 14    0.0567   467.5133     520.000     0.9626
 15    0.0253   468.3215     675.000     0.9879
 16    0.0090   469.3579     830.000     0.9969
 17    0.0025   470.6393     985.000     0.9994
 18    0.0005   472.1905    1140.000     0.9999
 19    0.0001   474.0494    1295.000     1.0000
 20    0.0000   476.2779    1450.000     1.0000
 21    0.0000   478.9913    1605.000     1.0000
 22    0.0000   482.4754    1760.000     1.0000

The pvalues for the test of H₀ against a two-sided alternative are computed for each
case therefore as:
(i) Probability ranking method:

    pvalue = P(0) + ··· + P(7) + P(14) + ··· + P(22) = 0.1669

(ii) Likelihood ranking method:

    pvalue = P(0) + ··· + P(7) + P(14) + ··· + P(22) = 0.1669

(iii) CP ranking method:

    pvalue = P(0) + ··· + P(7) + P(14) + ··· + P(22) = 0.1669

In the above cases, all three ranking methods give the same pvalue for the data
in Table 4.4. While it is true that the three criteria usually lead to the same ordering
of events, and consequently identical results, in some situations the three methods
may not necessarily lead to the same results. While this does not create much of a
problem for one-tailed tests, it does create a problem for two-tailed tests, where
the concepts of primary and secondary tail probabilities are important. Therefore,
in some cases, the three ordering results may not lead to the same conclusion.
Consider, for example, a table whose observed cell counts are n = {1, 10, 12, 13}.
n11      prob     LHOOD         CP    CUMPROB
  0    0.0023   60.0417   143.0000     0.0023
  1    0.0248   57.6438   107.0000     0.0270
  2    0.1061   56.1885    71.0000     0.1331
  3    0.2334   55.4000    35.0000     0.3665
  4    0.2918   55.1769     1.0000     0.6583
  5    0.2162   55.4764    37.0000     0.8745
  6    0.0961   56.2873    73.0000     0.9706
  7    0.0253   57.6224   109.0000     0.9959
  8    0.0038   59.5195   145.0000     0.9997
  9    0.0003   62.0532   181.0000     1.0000
 10    0.0000   65.3674   217.0000     1.0000
 11    0.0000   69.8021   253.0000     1.0000

For this table, the corresponding SAS software output is displayed above, and the
pvalues computed for a two-sided alternative using the three methods are:
(a) Probability and likelihood ranking methods:

    pvalue = [P(0) + P(1)] + [P(8) + P(9) + P(10) + P(11)] = 0.0311
                  PT                     ST

(b) CP ranking method:

    pvalue = [P(0) + P(1)] + [P(7) + P(8) + P(9) + P(10) + P(11)] = 0.0564
                  PT                         ST

Notice that the CP ordering includes n'₁₁ = 7 in its secondary tail probability;
hence the pvalues are not equal for this data set, and at α* = 0.05 the results
will lead to different conclusions.

4.3.1 Large Sample Test

With larger frequencies, the exact calculation is awkward as a result of the factorial
calculations; hence, it is simpler to use the excellent χ²₁ approximation due to Yates
(1934).
We may appeal to standard asymptotic (that is, as N tends to ∞) or large
sample size results concerning the convergence of the hypergeometric distribution to
the normal distribution. That is,

    (n₁₁ − m₁₁)/√V₁₁  →  N(0, 1)

where m₁₁ = n₁₊n₊₁/N and V₁₁ = n₁₊n₂₊n₊₁n₊₂/[N²(N − 1)].
A reasonable test suggested is the Wald (1943) test, defined as:

    Q = (n₁₁ − m₁₁)²/V₁₁ = (N − 1)(n₁₁n₂₂ − n₁₂n₂₁)²/(n₁₊n₂₊n₊₁n₊₂)

and the test asymptotically follows a χ² distribution with one degree of freedom.
We may also compute the expected values for each of the observed cells nᵢⱼ by
noting that, under H₀, the expected values for each of the nᵢⱼ terms are (from the
moment results earlier in this chapter)

    mᵢⱼ = E{nᵢⱼ | H₀} = nᵢ₊n₊ⱼ/N

Then Pearson's X² test statistic is

    X² = Σᵢ Σⱼ (nᵢⱼ − mᵢⱼ)²/mᵢⱼ = N(n₁₁n₂₂ − n₁₂n₂₁)²/(n₁₊n₂₊n₊₁n₊₂)

Under H₀, X² is asymptotically distributed χ² with one degree of freedom.


Note here that

    Q = [(N − 1)/N] X²

Thus, Q is asymptotically equivalent to the X² criterion. Further, Q is equivalent
to the empirically obtained X²_U proposed by Upton (1982), where

    X²_U = [(N − 1)/N] X²

We shall discuss this and other statistics further later in this chapter.
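As a quick numerical check, the following SAS data-step sketch (our own illustration, with the cell counts of Table 4.4 hard-coded) computes X², Q = [(N − 1)/N]X², and their χ²₁ pvalues:

DATA LARGE;
   N11 = 14; N12 = 61; N21 = 8; N22 = 72;
   N   = N11 + N12 + N21 + N22;
   NUM = (N11*N22 - N12*N21)**2;
   DEN = (N11+N12)*(N21+N22)*(N11+N21)*(N12+N22);
   X2  = N*NUM/DEN;            /* Pearson's X2, approx 2.3873      */
   Q   = (N-1)*NUM/DEN;        /* Wald (Upton) statistic, 2.3719   */
   PX2 = 1 - PROBCHI(X2, 1);
   PQ  = 1 - PROBCHI(Q, 1);
   PUT X2= PX2= Q= PQ=;
RUN;

The value of Q here, 2.3719, is what PROC FREQ reports as the Mantel-Haenszel chi-square for these data in the output displayed earlier.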


4.3.2 The Yates Continuity Correction

The Yates correction to X² involves calculating

    X²_Y = N(|n₁₁n₂₂ − n₁₂n₂₁| − N/2)²/(n₁₊n₂₊n₊₁n₊₂)
The correction was based on the concept that a discrete distribution was being approximated by a continuous distribution. The Yates corrected X² has been shown
to give a better approximation to the hypergeometric probabilities than the uncorrected X² (Yates, 1934; Mantel & Greenhouse, 1968; Grizzle, 1967; Conover, 1974).
For the hypergeometric sampling scheme, there are about ½(N + 2) possible outcomes
that are consistent with the given marginals. Ideally, the exact test is
usually employed for situations in which one or more of the expected values mᵢⱼ
is small. In the example above, the expected values are reasonably large, and one
would therefore be expected to use the χ² approximation. Consequently, for the
data in Table 4.4, X² = 2.3873 and X²_Y = 1.7288, which, when both are compared
with the tabulated χ²₁, give pvalues of 0.1223 and 0.1886, respectively. Thus, again
at α = 0.05, the results indicate that we will fail to reject the null hypothesis of independence, which agrees with the earlier results obtained using Fisher's exact test.
It may also be noted here that the above asymptotic tests involving these test criteria are two-tailed tests. These results agree with those given in the SAS software
output earlier displayed.
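The Yates-corrected value quoted above is easily verified; the following one-off SAS data step is our own sketch, again with the Table 4.4 counts hard-coded:

DATA YATES;
   N11 = 14; N12 = 61; N21 = 8; N22 = 72;
   N   = N11 + N12 + N21 + N22;
   DEN = (N11+N12)*(N21+N22)*(N11+N21)*(N12+N22);
   XY  = N*(ABS(N11*N22 - N12*N21) - N/2)**2 / DEN;   /* Yates-corrected X2, 1.7288 */
   P   = 1 - PROBCHI(XY, 1);                          /* pvalue, approx 0.1886      */
   PUT XY= P=;
RUN;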
The following is another example, taken from Goldstein (1965). There is reason to
believe that a certain drug might have value in particular kinds of cancer. However,
the drug is quite toxic, so it is not desirable to embark on large-scale clinical trials
without some good preliminary evidence of efficacy. In this randomized, double-blind clinical trial, 10 patients are assigned at random to control and treatment
groups, and their status is evaluated at the end of 6 months, with the results
displayed in Table 4.6.

                       Not
         Improved  improved  Total
Treated      4         1        5
Control      0         5        5
Total        4         6       10

Table 4.6: Goldstein's table


The expected values for these data are all very small, and the SAS software warns
us of this problem with a warning below the output (see below). The exact test
therefore would be most appropriate for the analysis of this data set. The two-sided and right-sided pvalues for the data, based on Fisher's exact test, are 0.0476 and 0.024,
respectively. The results imply that we would have to reject the null hypothesis,
at the 5% significance level, that the treatment is ineffective. The corresponding
computed values of X² and X²_Y are respectively 6.667 and 3.75 (see the SAS software
output below), with corresponding (large sample approximation based) pvalues of
0.010 and 0.053, respectively. Had we used these, the adjusted X²_Y would have
resulted in a contradictory conclusion (since pvalue > 0.05) in this case. Although


the result from the unadjusted X² agrees even more strongly with our conclusion,
this test is very unreliable in view of the very small expected values. In fact, the
continuity corrected X² is more reliable in this case. Thus, the use of X²_Y for the
two-sided alternative gives a result close to that of the exact test. We shall give
some guidelines for the use of these tests in a later section.
STATISTICS FOR THE GOLDSTEIN DATA

Statistic                          DF    Value     Prob
Chi-Square                          1    6.667    0.010
Likelihood Ratio Chi-Square         1    8.456    0.004
Continuity Adj. Chi-Square          1    3.750    0.053
Mantel-Haenszel Chi-Square          1    6.000    0.014
Fisher's Exact Test (Left)                        1.000
                    (Right)                       0.024
                    (2-Tail)                      0.048

Sample Size = 10
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

STATISTICS FOR TABLE 2

Statistic                          DF    Value     Prob
Chi-Square                          1    1.311    0.252
Likelihood Ratio Chi-Square         1    1.287    0.257
Continuity Adj. Chi-Square          1    0.219    0.640
Mantel-Haenszel Chi-Square          1    1.210    0.271
Fisher's Exact Test (Left)                        0.315
                    (Right)                       0.965
                    (2-Tail)                      0.510

Sample Size = 13
WARNING: 75% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

We also note here that the one-tailed Fisher's exact test, when significant, indicates a departure from the null hypothesis in a specific direction, while the X² test
indicates departure from the hypothesis in either direction. In the example above,
for instance, the one-tailed exact test is used to determine whether the proportions of patients in the two groups having improved condition are equal or whether
the proportion of treated patients with improved condition is less than the proportion of the control patients. The X² test, on the other hand, tests whether these
proportions are equal or unequal regardless of the direction of inequality.

4.4 The Mid-P Test

The mid-P tests can be considered as alternatives to Fisher's exact test. To conduct
the mid-P test, let us consider G(y) = P(Y ≥ y), where y is the observed value of
a random variable Y having a continuous-type distribution. Then, in this case, it
can be shown (Upton, 1992) that E[G(y)] = 1/2. However, when Y is discrete, as
it would be in the 2 × 2 table with small sample sizes, it can be shown (Barnard,
1989) that E[G(y)] > 1/2. In this case, Fisher's exact test would be biased in an
upward direction. In order to correct this, Lancaster (1961), Barnard (1989, 1990),
and Hirji et al. (1991), among others in recent years, have suggested the use of the mid-P
test, which is defined as follows:



Under H₀ and the given marginal totals {M₁, M₂}, let us define

    f(n'₁₁ | H₀) = P(n'₁₁ = n₁₁) and
    g(n'₁₁ | H₀) = P(n'₁₁ > n₁₁)                            (4.14)

where n₁₁ is the observed value of the pivot cell and n'₁₁ (a random variable) denotes
any other observed value of n₁₁ that is consistent with {M₁, M₂} under H₀. For
brevity, we simply write (4.14) above as f(n'₁₁) and g(n'₁₁), respectively.
With the above definitions, the one-sided Fisher's exact test, for example, is now
equivalent to rejecting H₀ if

    f(n'₁₁) + g(n'₁₁) < α*

For the data in Table 4.4, for example, f(n'₁₁) = 0.056729 and g(n'₁₁) = 0.0374.
Fisher's pvalue equals 0.056729 + 0.0374 = 0.0941, which as expected agrees with
the primary tail probability and our result above.
Based on the above definitions of the functions f and g, the one-sided mid-P
test proposed by Hirji et al. rejects H₀ if:

    (1/2) f(n'₁₁) + g(n'₁₁) < α*

and for the data in Table 4.4, for example, the mid-P based on Hirji's formulation
equals 0.5(0.056729) + 0.0374 = 0.0658. Hirji et al. also considered two other
methods for obtaining pvalues for the two-sided hypothesis. These are the methods
proposed by Cox (1970b) and by Gibbons and Pratt (1975). The Gibbons and
Pratt method (M_A), called the minimum likelihood method, rejects H₀ if

    P{n'₁₁ : f(n'₁₁) ≤ f(n'₁₁ = n₁₁)} < α*

while the Cox method (M_C), also called the twice the smallest tail method, rejects H₀
if

    2 min{f(n'₁₁) + g(n'₁₁), 1 − g(n'₁₁)} < α*

Again for the example in Table 4.4, M_A gives a pvalue of 0.072854 + 0.056729 +
0.0374 = 0.1670. Similarly, M_C = 2 min(0.0941, 0.9626) = 0.1882. In both cases,
these results agree with the results given earlier, and both support the null hypothesis.
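The quantities f and g, and hence the mid-P and Cox pvalues above, can be computed along the following lines in SAS. This sketch is our own, again using PROBHYPR with the margins of Table 4.4 entered by hand:

DATA MIDP;
   N = 155; R = 75; C = 22; X = 14;
   F = PROBHYPR(N, R, C, X) - PROBHYPR(N, R, C, X - 1);  /* f(n11): point probability */
   G = 1 - PROBHYPR(N, R, C, X);                         /* g(n11): P(n11' > n11)     */
   FISHER1 = F + G;               /* one-sided Fisher pvalue, approx 0.0941  */
   MIDP1   = 0.5*F + G;           /* Hirji et al. mid-P,       approx 0.0658 */
   COX2    = 2*MIN(F + G, 1 - G); /* Cox twice-smallest-tail,  approx 0.1882 */
   PUT FISHER1= MIDP1= COX2=;
RUN;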

4.5 Product Binomial Sampling Scheme

Consider, for example, a binary risk factor (e.g., smoking) and a disease state (e.g.,
lung cancer). The distribution of the number of subjects in a reference population
can compactly be written as n = (n₁₁, n₁₂, n₂₁, n₂₂), and the classification by the
risk factor and disease status is presented in Table 4.7, together with the accompanying underlying probability distribution, which can also be compactly written in
vector form as Π = (π₁₁, π₁₂, π₂₁, π₂₂).

              Disease                               Disease
R factor    Cases  Controls  Total     R factor    Cases  Controls  Total
Exposed      n₁₁     n₁₂      n₁₊      Exposed      π₁₁     π₁₂      π₁₊
Unexposed    n₂₁     n₂₂      n₂₊      Unexposed    π₂₁     π₂₂      π₂₊
Total        n₊₁     n₊₂       N       Total        π₊₁     π₊₂       1

Table 4.7: Observed and underlying probability tables


There are two possible situations that could give rise to the sampling scheme above.
It is not uncommon to refer to this design as arising from observational studies.
In the first case, called prospective studies, the marginal totals M₁ = nᵢ₊, i = 1, 2,
are often fixed. Other names for this scheme are cohort studies or follow-up studies.
The sampling scheme involves characterizing or identifying one sample with
a predetermined number n₁₊ of subjects that have the suspected antecedent or risk
factor present (the exposed), e.g., smoking, and the other sample, also of a
predetermined number n₂₊, of subjects that do not have the suspected antecedent factor
(the unexposed). The subjects are then followed up over time into the future (that
is, prospectively), and the proportions of subjects developing the disease (outcome
variable or response, e.g., lung cancer) at some point in time are then estimated for
both samples and inference made. As an example, we might identify one cohort (a
group of individuals who have a common characteristic) of smokers and the other
cohort of nonsmokers, and then observe them for a period of time to determine the
rate of incidence of lung cancer in the two groups. This kind of study is often very
expensive and time-consuming. Further, the study may sometimes not be feasible
if the disease of interest is a rare one, such as particular types of cancer. Under this
scheme, therefore, the observed frequencies and underlying population probabilities
are displayed in Table 4.8.
              Disease                               Disease
R factor    Cases  Controls  Total     R factor    Cases  Controls  Total
Exposed      n₁₁     n₁₂      n₁₊      Exposed      π₁₁     π₁₂       1
Unexposed    n₂₁     n₂₂      n₂₊      Unexposed    π₂₁     π₂₂       1
Total        n₊₁     n₊₂       N

Table 4.8: Observed and probability structure with M₁ fixed


In Table 4.8, n₊₁ and n₊₂ are random variables, with n₁₊ and n₂₊ fixed by design.
Under this scheme, the πᵢⱼ are unknown parameters that satisfy the constraints

    Σⱼ πᵢⱼ = 1 for i = 1, 2
In the second case, called retrospective studies, the marginal totals M₂ = n₊ⱼ, j =
1, 2, are often fixed. The retrospective study, or case-control study, is an alternative to
the prospective study and is characterized by selecting or identifying subjects that
fall into the categories of the outcome variable (disease of interest), accordingly
predetermining these numbers based on the presence (cases) or absence (controls)
of the disease, and then estimating the proportions possessing the antecedent factor
retrospectively. They are retrospective because we are looking back for possible
causes. Examples of both studies can be found in Fleiss (1981). Again under
this scheme, the observed frequencies and underlying population probabilities are
displayed in Table 4.9.
In Table 4.9, n₁₊ and n₂₊ are random variables, with n₊₁ and n₊₂ fixed by
design. Under this scheme, the πᵢⱼ are unknown parameters that satisfy the constraints

    Σᵢ πᵢⱼ = 1 for j = 1, 2
Thus in the cohort studies, individuals (study units) are selected who are initially
free of disease, and the rate of occurrence of the disease is observed over some period


              Disease                               Disease
R factor    Cases  Controls  Total     R factor    Cases  Controls
Exposed      n₁₁     n₁₂      n₁₊      Exposed      π₁₁     π₁₂
Unexposed    n₂₁     n₂₂      n₂₊      Unexposed    π₂₁     π₂₂
Total        n₊₁     n₊₂       N       Total         1       1

Table 4.9: Observed and probability structure with M₂ fixed


of time, in the presence and absence of factors suspected of being associated with
causing the disease. On the other hand, in the case-control study, individuals are
selected on the basis of presence or absence of the disease, and we determine the
rate of possible causative factors by looking back at their past history.
In general:
• In prospective studies, the marginal totals nᵢ₊ are often fixed.
• In retrospective studies, the marginal totals n₊ⱼ are often fixed.
• In cross-sectional studies, the sample size N is assumed fixed.
Now returning to the problem: under the first (prospective studies) sampling
scheme, n₁₁ follows the binomial distribution with parameters n₁₊ and π₁₁. Similarly,
n₂₁ follows the binomial distribution with parameters n₂₊ and π₂₁. That
is,

    n₁₁ ~ b(n₁₊, π₁₁) and n₂₁ ~ b(n₂₊, π₂₁)
Since these are independent samples, it follows that the vector

    n' = (n₁₁, n₁₂, n₂₁, n₂₂)

follows the product binomial probability model

    P{n | M₁, Π} = ∏ᵢ₌₁² [nᵢ₊!/(nᵢ₁! nᵢ₂!)] πᵢ₁^(nᵢ₁) πᵢ₂^(nᵢ₂)        (4.15)

with the constraints Σⱼ nᵢⱼ = nᵢ₊ and Σⱼ πᵢⱼ = 1 for i = 1, 2.

4.5.1 Homogeneity Hypothesis

In the above framework, the usual hypothesis of interest involves testing for the
equality of the "cases" rates π₁₁ and π₂₁ for the two independent samples or subpopulations (exposed and unexposed). That is,

    H₀ : π₁₁ = π₂₁                                          (4.16)

The above hypothesis is equivalent to the following:

    π₁₁ − π₂₁ = 0 or π₁₁/π₂₁ = 1

The ratio π₁₁/π₂₁ is often referred to as the "relative risk" of the event under consideration.
Under H₀, let the common parameter for the two populations be π_a and its
complement π_b, so that

    π_a + π_b = 1                                           (4.17)


Thus under H₀ the probability model in (4.15) becomes

    P{n₊₁ | M₁, π_a, H₀} = ∏ᵢ₌₁² [nᵢ₊!/(nᵢ₁! nᵢ₂!)] π_a^(nᵢ₁) π_b^(nᵢ₂)

The above simplifies to:

    P{n₊₁ | M₁, π_a, H₀} = {∏ᵢ₌₁² [nᵢ₊!/(nᵢ₁! nᵢ₂!)]} π_a^(n₊₁) π_b^(n₊₂)

where Σᵢ nᵢⱼ = n₊ⱼ is the column total for j = 1, 2.

4.5.2 Maximum Likelihood Estimates of π_a

From the above expression, the log-likelihood is

    ℓ = Σᵢ log(nᵢ₊!) − Σᵢⱼ log(nᵢⱼ!) + n₊₁ log(π_a) + n₊₂ log(π_b)        (4.18)

Let G = n₊₁ log(π_a) + n₊₂ log(π_b), the kernel of the log-likelihood; then, since the
first two terms on the right-hand side of (4.18) do not involve the πs, we will
therefore maximize G subject to the constraint in (4.17). Using a Lagrange
multiplier, we can write G* as:

    G* = n₊₁ log(π_a) + n₊₂ log(π_b) − λ(π_a + π_b − 1)

Then

    ∂G*/∂π_a = n₊₁/π_a − λ                                  (4.19a)

    ∂G*/∂π_b = n₊₂/π_b − λ                                  (4.19b)

    ∂G*/∂λ = −(π_a + π_b − 1)                               (4.19c)
Setting the equations in (4.19a) through (4.19c) to zero, we have

    π_a λ = n₊₁
    π_b λ = n₊₂

Adding these, and noting from equation (4.19c) that π_a + π_b = 1, we have λ = N.
Consequently,

    π̂_a = n₊₁/N and π̂_b = n₊₂/N

The corresponding expected frequencies under H₀ become

    m̂ᵢⱼ = nᵢ₊ π̂_a   if i = 1, 2 and j = 1
                                                            (4.20)
    m̂ᵢⱼ = nᵢ₊ π̂_b   if i = 1, 2 and j = 2

4.5.3 Parameter Estimates

If we let the estimates of the success (or cases) rates for the response or outcome
variable be given by

    p̂₁₁ = n₁₁/n₁₊ and p̂₂₁ = n₂₁/n₂₊

and if we further let

    d = p̂₁₁ − p̂₂₁ = n₁₁/n₁₊ − n₂₁/n₂₊ = (n₁₁n₂₂ − n₁₂n₂₁)/(n₁₊n₂₊)

be the estimate of the observed difference of the success rates in the two subpopulations, then under H₀,

    E{d} = 0

and

    Var(d) = Var(p̂₁₁ − p̂₂₁) = π₁₁π₁₂/n₁₊ + π₂₁π₂₂/n₂₊

so that

    Var{d | H₀} = π_a π_b (1/n₁₊ + 1/n₂₊) = Nπ_a(1 − π_a)/(n₁₊n₂₊)

Thus, the variance of d under the null hypothesis depends on the nuisance parameter
π_a.
Let us consider two possible estimates of the variance of d under H₀. The first
is to simply substitute the MLE of π_a obtained above into the expression for the
variance, to have an estimate of the variance of d under H₀ as

    est(V_d | H₀) = Nπ̂_a(1 − π̂_a)/(n₁₊n₂₊) = n₊₁n₊₂/(N n₁₊n₂₊)

A large sample test statistic for H₀ based on the MLE for the variance of d is

    X² = d²/est(V_d | H₀) = N(n₁₁n₂₂ − n₁₂n₂₁)²/(n₁₊n₂₊n₊₁n₊₂)

The Pearson's X² criterion can thus be viewed as the large sample test statistic based
on the MLE for the variance of d.
The second alternative approach to estimating the variance of d, which is based
on the difference π₁₁ − π₂₁, is given in appendix C.1. In this case, the variance estimate leads to the Wald large sample test statistic against the two-sided alternative,
given by

    Q = (N − 1)(n₁₁n₂₂ − n₁₂n₂₁)²/(n₁₊n₂₊n₊₁n₊₂) = [(N − 1)/N] X²

which again is equivalent to the Q statistic obtained for the hypergeometric model.
The test statistic based on the above is generated from an unbiased estimator for
the variance of d under H₀.

4.5.4 Example 4.4: HIV Testing

The data in Table 4.10, adapted from Samuels and Witmer (1999), concern a survey
conducted to compare the attitudes of males and females toward HIV testing at a
certain college. Of 61 females, 9 had taken the HIV test, while of the 59 males, 8
had taken the HIV test. The data are summarized in Table 4.10.

            HIV testing status
Gender      Yes      No     Total
Female        9      52        61
Male          8      51        59
Totals       17     103       120

Table 4.10: HIV testing data


We note in this example that the marginal totals {61, 59} are fixed by the design of
this study. The marginal totals {17, 103} in this case can be considered as random
variables. Thus, for each level of gender, HIV testing status (Yes or No) can be
assumed to have arisen from a binomial random vector. With this study design,
we would then be interested in the hypothesis of homogeneity; that is,

    H₀ : π₁₁ = π₂₁

The maximum likelihood estimate of π_a is π̂_a = n₊₁/N = 17/120 = 0.14167. The
expected frequencies under H₀ are m̂₁₁ = 61 × (0.14167) = 8.6417 and m̂₂₁ =
59 × (0.14167) = 8.3583. Note that m̂₁₁ + m̂₂₁ = 17.00. Also, m̂₁₂ = n₁₊ − m̂₁₁ =
61 − 8.6417 = 52.3583. Similarly, m̂₂₂ = n₂₊ − m̂₂₁ = 59 − 8.3583 = 50.6417.
The Pearson's X² test statistic equals:

    X² = (9 − 8.6417)²/8.6417 + (8 − 8.3583)²/8.3583
       + (52 − 52.3583)²/52.3583 + (51 − 50.6417)²/50.6417
       = 0.0352

or

    X² = 120(9 × 51 − 8 × 52)²/(17 × 103 × 61 × 59) = 0.0352

Alternatively, since under H₀ the four absolute deviations |nᵢⱼ − m̂ᵢⱼ| are all equal
in a 2 × 2 table, X² can also be obtained as:

    X² = W(1/8.6417 + 1/52.3583 + 1/8.3583 + 1/50.6417) = 0.0352

where W = (n₁₁ − m̂₁₁)² = (9 − 8.6417)² = 0.1284.
Similarly,

    G² = 2[9 log(9/8.6417) + 52 log(52/52.3583) + 8 log(8/8.3583) + 51 log(51/50.6417)]
       = 2(0.3656 − 0.3571 − 0.3505 + 0.3596)
       = 0.0353


The value of X² computed above, when compared with the tabulated χ² with one
degree of freedom, is not significant (pvalue = 0.8512). This result therefore indicates
that the proportion of females who had been tested for HIV seems to be the same
as the proportion of males who had similarly been tested for HIV. Ideally, Fisher's
exact test is most appropriate for the case when at least one of the expected values
m̂ᵢⱼ is small, that is, less than 3.
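The same test can of course be obtained from PROC FREQ. The sketch below is our own arrangement of the data (the variable names GENDER and TESTED are our choices), requesting the chi-square tests and the expected frequencies computed by hand above:

DATA HIV;
   INPUT GENDER $ TESTED $ COUNT @@;
   DATALINES;
F Y 9  F N 52  M Y 8  M N 51
;
PROC FREQ DATA=HIV ORDER=DATA;
   WEIGHT COUNT;
   TABLES GENDER*TESTED / CHISQ EXPECTED;
RUN;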
The product binomial (product multinomial in higher dimensional tables) sampling scheme has certain implications for models that can be fitted to data arising
from such a scheme. In the HIV example above, the marginal totals nᵢ₊ are fixed.
Any model that we would employ for these data must therefore accommodate these
constraints and be such that the expected values under such models satisfy

    m̂ᵢ₊ = nᵢ₊ for i = 1, 2

Thus a log-linear model (chapter 6) must include the term λ^G, where G stands for
gender. We shall discuss these implications more in chapter 6.

4.6 The Full Multinomial Model Scheme

Suppose we randomly sample subjects according to whether they belong to either
of two groups (risk factor present and risk factor not present). We next determine
the number or proportion of subjects in each group that have the outcome variable (case or no case). Here, the subjects are observed at only one point
in time, and this represents an independent-sample design. Such a design is referred
to as a cross-sectional study. For cross-sectional studies, only the sample size N is
fixed, and we are interested in the relationship between the risk and outcome
variables. In this situation, individuals sampled are classified simultaneously on
both variables, and thus only the sample size N is fixed here. In this context, the
observed counts and the corresponding underlying probability structure for these
counts are summarized in Table 4.11.
   Observed Frequencies                Population Probabilities
         Response                            Response
Popl     S      F     Total        Popl     S      F     Total
I       n₁₁    n₁₂     n₁₊         I       π₁₁    π₁₂     π₁₊
II      n₂₁    n₂₂     n₂₊         II      π₂₁    π₂₂     π₂₊
Total   n₊₁    n₊₂      N          Total   π₊₁    π₊₂      1

Table 4.11: Observed counts and corresponding probability structure


The πᵢⱼ in Table 4.11 are unknown parameters that satisfy the constraint

    Σᵢ Σⱼ πᵢⱼ = 1

Under this sampling scheme, the observed frequencies n follow the multinomial
distribution with parameters N and Π, that is,

    P{n | N, Π} = (N choose n₁₁, n₁₂, n₂₁, n₂₂) π₁₁^(n₁₁) π₁₂^(n₁₂) π₂₁^(n₂₁) π₂₂^(n₂₂)

where

    (N choose n₁₁, n₁₂, n₂₁, n₂₂) = N!/(n₁₁! n₁₂! n₂₁! n₂₂!)

is the standard multinomial coefficient.

4.6.1 The Independence Hypothesis

The primary focus is on the relationship between the two classificatory variables.
Specifically, the null hypothesis of independence can be stated formally as: the conditional proportion of being in column 1, given that an individual belongs to a given
row, is the same for both rows; that is,

    π₁₁/π₁₊ = π₂₁/π₂₊

The above can generally be written in the form

    H₀ : πᵢⱼ = πᵢ₊π₊ⱼ for (i, j) = 1, 2                      (4.21)

That is, under H₀, the joint probabilities can be obtained directly as products of
the corresponding marginal probabilities. From the above, it follows that

    π₁₁π₂₂ = π₁₂π₂₁  ⟺  π₁₁π₂₂ − π₁₂π₂₁ = 0

The latter is sometimes called the cross-product difference. Similarly, H₀ implies
that the cross-product ratio or odds ratio, θ, is

    θ = π₁₁π₂₂/(π₁₂π₂₁) = 1

so that H₀ can be stated as H₀ : θ = 1.
We observe here that under H₀, n₁₊ and n₊₁ are independent in spite of sharing n₁₁. This result can be demonstrated by using conditional arguments and is
presented in appendix C.2.

4.6.2 MLE Estimates of the π Terms

Under H₀, the πᵢⱼ are estimated by

    π̂ᵢⱼ = π̂ᵢ₊ π̂₊ⱼ                                          (4.22)

where π̂ᵢ₊ = nᵢ₊/N and π̂₊ⱼ = n₊ⱼ/N. Consequently, the expected values under H₀
are given by

    m̂ᵢⱼ = N π̂ᵢⱼ = nᵢ₊n₊ⱼ/N

Substituting these in X², we have again,

    X² = N(n₁₁n₂₂ − n₁₂n₂₁)²/(n₁₊n₂₊n₊₁n₊₂)

4.6.3 Example 4.5: Longitudinal Study Data

Christensen (1990) gives the following example from a longitudinal study. A sample of 3,182 individuals without cardiovascular disease were cross-classified by two
factors: personality type and exercise. Personality type has two categories, type
A and type B, where a type A individual shows signs of stress, uneasiness, and
hyperactivity, while a type B individual is relaxed, easygoing, and normally active.
The exercise variable is categorized as individuals who exercise regularly and those
who do not. The data are summarized in Table 4.12.

            Personality
Exercise      A       B     Total
Regular      483     477      960
Other       1101    1121     2222
Totals      1584    1598     3182

Table 4.12: Longitudinal study data
In this example, only the sample size N = 3182 is assumed fixed. Consequently,
π₁₁ + π₁₂ + π₂₁ + π₂₂ = 1 under this sampling scheme. The hypotheses of interest
here are:

    H₀ : Exercise is independent of personality type
    Hₐ : Exercise is dependent on personality type

For these data under H₀, therefore, m̂₁₁ = (960)(1584)/3182 = 477.8881. Similarly,
m̂₁₂ = 482.1119, m̂₂₁ = 1106.1119, and m̂₂₂ = 1115.8881, and the Pearson's X² =
0.1559 (pvalue = 0.6929), which is not significant when compared with the χ² distribution with one degree of freedom. Thus there is no evidence against independence
of exercise and personality type. Correspondingly, G² = 0.1559 for these data.

4.6.4 Alternative Formulation

We mentioned earlier that a test of independence in the 2 × 2 contingency table
under the multinomial sampling scheme is equivalent to testing that the cross-product ratio (or odds ratio)

    θ = π₁₁π₂₂/(π₁₂π₂₁)

equals 1. An estimate of θ can be obtained as

    θ̂ = n₁₁n₂₂/(n₁₂n₂₁)

If we define η to be

    η = ln(θ) = ln(m₁₁) + ln(m₂₂) − ln(m₁₂) − ln(m₂₁)

then η can be viewed as a log-contrast, that is, a linear combination Σᵢ Σⱼ aᵢⱼ ln(mᵢⱼ)
whose coefficients satisfy

    Σᵢ Σⱼ aᵢⱼ = 0                                           (4.23)

Let the corresponding observed log-odds be given by

    ℓ = ln(n₁₁) + ln(n₂₂) − ln(n₁₂) − ln(n₂₁)

Again, we see that ℓ can be considered as a log-contrast. The asymptotic variance
of ℓ, using the delta method, equals

    V = 1/n₁₁ + 1/n₁₂ + 1/n₂₁ + 1/n₂₂

A large sample Wald statistic B = ℓ/√V converges in distribution to N(0, 1), and a
test of the hypothesis that θ = 1 is equivalent to a test of

    H₀ : η = 0 versus Hₐ : η ≠ 0

Under H₀, we note that B is asymptotically N(0, 1), which implies that B² is
asymptotically distributed as χ² with one degree of freedom. The above is the test
statistic suggested by Lindley (1964) and can be presented in the form:

    B² = ℓ²/V = [ln(n₁₁) + ln(n₂₂) − ln(n₁₂) − ln(n₂₁)]² / (1/n₁₁ + 1/n₂₂ + 1/n₁₂ + 1/n₂₁)    (4.25)

where V is the variance of ℓ, and we would reject H₀ at the 5% significance level if
B² > (1.96)² = 3.84, or at the 1% level if B² > (2.58)² = 6.66. The corresponding
pvalue is given approximately by

    pvalue = 2Φ(−|ℓ|/√V), where V = Σᵢ Σⱼ 1/nᵢⱼ             (4.26)

For the data in Table 4.12,

    ℓ = ln(483) + ln(1121) − ln(477) − ln(1101) = 0.0305

and the asymptotic standard error is

    √V = (1/483 + 1/1121 + 1/477 + 1/1101)^(1/2) = 0.0772

Hence,

    B = ℓ/√V = 0.3955 and B² = 0.1564

The corresponding pvalue = 2Φ(−0.3955) = 2(0.3462) = 0.6924. There is not
sufficient evidence to reject H₀, either by comparing B² with a χ² distribution with
one degree of freedom or by the use of the pvalue computed above.
The odds ratio θ = e^η is estimated to be θ̂ = e^0.0305 = 1.0310, and a 95% confidence
interval for log θ is given by

    (ℓ ± 1.96√V) = 0.0305 ± 0.1513 = (−0.1208, 0.1818)

Since the interval includes 0, i.e., the null hypothesis is plausible, we would again
fail to reject H₀ in this example.
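These quantities are easily scripted; the following SAS data step is our own sketch, with the counts of Table 4.12 hard-coded:

DATA LOGODDS;
   N11 = 483; N12 = 477; N21 = 1101; N22 = 1121;
   L  = LOG(N11) + LOG(N22) - LOG(N12) - LOG(N21);  /* observed log odds ratio  */
   V  = 1/N11 + 1/N12 + 1/N21 + 1/N22;              /* asymptotic variance of L */
   B2 = L**2/V;                                     /* Lindley's statistic      */
   P  = 2*PROBNORM(-ABS(L)/SQRT(V));                /* two-sided pvalue         */
   LOWER = L - 1.96*SQRT(V);  UPPER = L + 1.96*SQRT(V);  /* CI for log(theta)   */
   PUT B2= P= LOWER= UPPER=;  /* approx 0.156, 0.69, and (-0.1208, 0.1818) */
RUN;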

4.7 Summary Recommendations

For the three sampling schemes designated as A, B, and C in Table 4.13, we give
the number of possible outcomes that could be consistent with a given sample size
N when both margins, one margin, and only the sample size are fixed, respectively.
We recognize from Table 4.13 that, for a given N, there are more possible configurations
for C, followed respectively by B and A. In other words, the distribution
of a test statistic, say X², under sampling scheme A is more discrete than those
Sampling scheme           Number of outcomes (M)
Hypergeometric: A         (1/2)(N + 2)
Product binomial: B       (1/4)(N + 2)²
Multinomial: C            (1/6)(N + 1)(N + 2)(N + 3)

Table 4.13: Possible number of configurations (M) under the sampling schemes
for schemes B or C. Thus X² would be better approximated by a continuous χ²
distribution for those schemes than for scheme A. The Yates continuity correction
is therefore meant to correct for this in the case when the sampling scheme is of type
A. It should be pointed out here that X² will be based on 1 d.f. for each of the
sampling schemes.
Another continuity-corrected X² criterion, similar to the one discussed in chapter
3 for the one-way classification, that has been found to be very successful for the
general 2 × 2 table is Cochran's (1942) correction, which is given by

    P(X² ≥ X²₀) ≈ P(χ²₁ ≥ X²_c)

where X²_c = (X²₀ + X²₋₁)/2, X²₀ is the observed X² statistic, and X²₋₁ is the next
lower value (when ranked) under the null distribution of X². The correction is of
course based on the discreteness of the distribution of X².
Conover (1974) has advocated this procedure. The argument for this is based on
the fact that the correction is an adaptive correction with respect to the sampling
models, in that its importance will decrease as the empirical discrete distribution of
the X² criterion becomes less discrete as we go from sampling scheme A to
C. Although this represents a considerable improvement over Yates's (fixed) correction,
the search for the next lower value of the test criterion can be computationally
expensive, especially for sampling schemes B and C.
For the data in Table 4.4, for instance, we list in Table 4.14 the configurations
as well as the corresponding X² for each configuration, ranked in terms of the test
statistic X², from largest to smallest. If we chose to use any other test statistic
(not X²), the result would be different. Cochran's procedure involves evaluation of

    P(χ²₁ ≥ (2.37 + 1.47)/2) = P(χ²₁ ≥ 1.92) = 0.1659
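In SAS, this evaluation amounts to one PROBCHI call; below is a minimal sketch of our own, with the two ranked X² values from Table 4.14 entered by hand:

DATA COCHRAN;
   X2OBS  = 2.37;                 /* observed X2 for Table 4.4      */
   X2NEXT = 1.47;                 /* next lower ranked value        */
   X2C = (X2OBS + X2NEXT)/2;      /* Cochran's corrected criterion  */
   P   = 1 - PROBCHI(X2C, 1);
   PUT X2C= P=;                   /* 1.92 and approx 0.1659         */
RUN;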
However, it should be pointed out here that, regardless of the sampling scheme,
under the null hypothesis of independence or homogeneity the expected values are
given by m̂ᵢⱼ = nᵢ₊n₊ⱼ/N, the Pearson's test statistic reduces in each case to
X² = N(n₁₁n₂₂ − n₁₂n₂₁)²/(n₁₊n₂₊n₊₁n₊₂), and it is based on 1 degree of freedom.
The corresponding likelihood ratio test statistic is similarly computed from
G² = 2 Σᵢ Σⱼ nᵢⱼ ln(nᵢⱼ/m̂ᵢⱼ).
4.7.1 Equivalent χ²-Based GOF Test Statistics

The following test statistics have all been employed in conducting tests for the
general 2 × 2 contingency table.


Configuration        Ranked
(n₁₁, n₂₁)             X²
(22, 0)              27.17
(0, 22)              23.88
   ...                 ...
(14, 8)               2.37
(8, 14)               1.47
   ...                 ...
(10, 12)              0.09
(11, 11)              0.03

Table 4.14: Relevant list of configurations for the data in Table 4.4
Upton's (1982) modified X² statistic is

    X²_U = [(N − 1)/N] X²

We notice that Upton's scaled statistic is equal to the statistic Q developed earlier
under the three sampling schemes.
Other test statistics that have also been considered are the continuity-corrected
statistics X²₊.₅ and X²₊.₂₅, introduced by Burstein (1981) to approximate two-tailed
exact binomial significance levels for the cases n₁₊ ≠ n₂₊ and n₁₊ = n₂₊, respectively,
under sampling scheme B.
Berchtold (1972) also introduced a test statistic with continuity correction:

    X²_B = (N − 1)(|n₁₁n₂₂ − n₁₂n₂₁| + δ)²/(n₁₊n₂₊n₊₁n₊₂)

where δ = ¼(z_α − 1)(2q − 1)(n₁₊ + n₂₊), z_α is the α-quantile of a standard normal
distribution, and q = n₁₊/N. Following Sachs (1986), the one-sided X²_B statistic
will be designated as b*. For this case, the z_α quantile would be taken as 1.645
(Dozzi & Riedwyl, 1984).
Obviously, when n₁₊ = n₂₊, X²_B reduces to Upton's test statistic discussed
above, which was labeled X²_U by Rice (1988).
And yet another test statistic, proposed by Schouten et al. (1980), is defined here
as:

    X²_S = (N − 1)[|n₁₁n₂₂ − n₁₂n₂₁| − ½ min(n₁₊, n₂₊)]²/(n₁₊n₂₊n₊₁n₊₂)    (4.27)

Again, as pointed out in Schouten et al. (1980), this result extends those of Pirie
and Hamdan (1972), for in the case when n₁₊ = n₂₊ the correction factor in (4.27)
becomes N/4, which asymptotically is equivalent to X²₊.₂₅.
The computation of χ² tail probabilities in SAS software for each of the above test
statistics can be accomplished by using the PROBCHI and PROBNORM functions for the
upper tail of a χ² with one degree of freedom and for the normal distribution, respectively.
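For instance, the following fragment (our own illustration) converts a computed criterion into the χ²₁ upper-tail probability and the equivalent two-sided normal probability:

DATA TAILP;
   X2 = 2.3873;                        /* any computed chi-square criterion  */
   PCHI  = 1 - PROBCHI(X2, 1);         /* upper-tail chi-square probability  */
   PNORM = 2*(1 - PROBNORM(SQRT(X2))); /* identical two-sided normal value   */
   PUT PCHI= PNORM=;                   /* both approx 0.1223 here            */
RUN;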


We give below some summary recommendations for the use of the continuity-corrected
X² suggested earlier.

4.7.2 For Observational Data

1. If the sample size N is very small, use Fisher's exact test or the mid-P test.
The latter is recommended.
2. For moderate N, use Q = [(N − 1)/N] X² with the Yates continuity correction.

4.7.3 If the Sampling Scheme Is the Product Binomial

1. If n₁₊ = n₂₊, use uncorrected Q, or Burstein's X²₊.₂₅ statistic, or simply resort
to the exact binomial test, which is discussed below.
2. For n₁₊ = kn₂₊, where k is an integer, use Q with a continuity correction of
k/2, where n₂₊ is the smaller sample. The exact binomial test or Burstein's
corrected statistic will provide suitable alternatives. We would, however,
recommend the latter if the exact binomial test is not immediately feasible.
3. If n₁₊ is not a multiple of n₂₊, use X² with a continuity correction of k/2,
where k is the largest common divisor of n₁₊ and n₂₊. Again, Burstein's
corrected statistic is highly recommended in lieu of the exact binomial test.
4. If the sampling scheme is multinomial (cross-classifying) with N fixed, use Q.

4.7.4 The Exact Binomial Test

Burstein (1981) obtained, for the test of H₀ versus H₁, significance levels based on
the joint probabilities of the two independent binomial outcomes (i, j), where
0 ≤ i ≤ n₁₊ and 0 ≤ j ≤ n₂₊, with Δ as defined in the Burstein (1981)
expressions for the upper-tailed probability and the two-tailed probability.
Thus, for B₁, the upper tail probability, the included events are those permitting

    (j/n₂₊ − i/n₁₊) ≥ Δ

while for B₂, the second-tail probability, the included events are those permitting

    (i/n₁₊ − j/n₂₊) ≥ Δ

and B, the two-tailed significance level, will be given by B = B₁ + B₂.
Burstein also provided simpler expressions and a computer program for calculating
B₁, B₂, and hence B.
The one-tailed probability B₁ so obtained is equivalent to the CBET result of
Rice (1988), case 0.
We give in Table 4.15 computed results for the exact binomial, mid-P, and
Fisher's exact tests for the data in Tables 4.4 and 4.6.


               Mid-P               Fisher's             Binomial
           1-Tail   2-Tail     1-Tail   2-Tail      1-Tail   2-Tail
Table 4.4  0.0658   0.1386     0.0941   0.1669      0.0637   0.1281
Table 4.6  0.0119   0.0238     0.0240   0.0476      0.0094   0.0188

Table 4.15: Summary of results for the data in Tables 4.4 and 4.6

4.8 The 2 × 2 Tables with Correlated Data

The procedures that we have employed so far for the analysis of the 2 × 2 table assume
that the samples are independent. Sometimes the data arising from two binomial
populations can be correlated. Our interest still centers on testing for homogeneity
in the two populations. For instance, a 2 × 2 table arising from husband and wife
pairs cannot be considered to be independent. Also not independent are data arising
from matched-pair experiments. An example of such a matched-pair experiment is
the data in Table 4.16, which relate to the side effect of pump therapy in the control
of blood glucose levels in diabetic patients (Mecklenburg et al., 1984). The data are
on the occurrence of diabetic ketoacidosis (DKA) in patients before and after the
onset of pump therapy. Here, individuals provide a pair of measurements, one before
and one after diabetic pump therapy.

                         After pump therapy
Before pump therapy     DKA    No DKA    Total
DKA                       7        7       14
No DKA                   19      128      147
Total                    26      135      161

Table 4.16: Matched-pair data for DKA patients


The concordant pairs (that is, pairs in which the outcomes are the same) number
128 + 7 = 135. The discordant pairs (cases in which the outcomes are different for
the two members of the pair), usually designated n_D, number 19 + 7 = 26. The
concordant pairs do not provide any information regarding the treatment effects.
The discordant pairs, however, do provide this information, and these pairs can
further be subdivided into:
(a) A type A discordant pair is one in which the before-pump-therapy member has
the event (DKA) and the after-pump-therapy member does not. This pair is
designated as n_A = n₁₂.
(b) A type B discordant pair is one in which the after-pump-therapy member has
the event (DKA) and the before-therapy member does not. Again, we shall
designate this pair by n_B = n₂₁.
The numbers of counts in these two discordant pair types for the data above are
n_A = 7 and n_B = 19, respectively. The test of the hypotheses

    H₀ : π₁₂ = π₂₁ versus Hₐ : π₁₂ ≠ π₂₁



is provided by McNemar's test statistic, given by (see exercise 4.3)

    X² = (n₁₂ − n₂₁)²/(n₁₂ + n₂₁)

The statistic is distributed χ²₁, and the pvalue is given by P(χ²₁ ≥ X²). A continuity-corrected version of the test statistic is defined as:

    X²_c = (|n₁₂ − n₂₁| − 1)²/(n₁₂ + n₂₁)

For the data in Table 4.16, we have

    X² = (19 − 7)²/(19 + 7) = 144/26 = 5.5385 and

    X²_c = (|19 − 7| − 1)²/(19 + 7) = 121/26 = 4.6538

The pvalues for the two statistics are, respectively, 0.0186 and 0.0310. Thus there
is a statistically significant difference between the DKA proportions before and after
pump therapy. The after-pump therapy would be preferred. The SAS software
program and output for this analysis are provided below. The statistic computed
by the SAS software is clearly the uncorrected X².

data corr;
input before after count @@;
datalines;
1 1 7  1 2 7  2 1 19  2 2 128
;
title 'McNemar''s Test';
proc freq order=data;
weight count; tables before*after/agree;
run;
STATISTICS FOR TABLE OF BEFORE BY AFTER

McNemar's Test
Statistic = 5.538    DF = 1    Prob = 0.019

Simple Kappa Coefficient
Kappa = 0.267    ASE = 0.102
95% Confidence Bounds: 0.067, 0.468

Sample Size = 161

The X²_c can alternatively be written in terms of n_D and n_A as follows:

    X²_c = (|n_A − n_D/2| − 1/2)² / (n_D/4)

Substituting the relevant values into the above expression leads to a value of 4.6538,
as expected.
The above test is valid if n_D > 20. For cases when n_D < 20, we shall use the
exact test, which is based on the binomial distribution. The exact pvalues can be
computed from the expressions below:

    p = 2 Σ_{j=0}^{n_A} (n_D choose j)(1/2)^(n_D)     if n_A < n_D/2

    p = 2 Σ_{j=n_A}^{n_D} (n_D choose j)(1/2)^(n_D)   if n_A > n_D/2

    p = 1                                             if n_A = n_D/2
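Although n_D = 26 > 20 for the DKA data, so that the asymptotic test is appropriate there, the exact pvalue is easily obtained with PROBBNML; the sketch below is our own. Using MIN to pick the smaller discordant count is safe here because the Binomial(n_D, 1/2) distribution is symmetric:

DATA DKAEXACT;
   N12 = 7; N21 = 19;
   ND = N12 + N21;  NA = MIN(N12, N21);     /* here NA = 7 < ND/2             */
   PEXACT = 2*PROBBNML(0.5, ND, NA);        /* 2*P(Bin(26,0.5) <= 7), ~0.029  */
   X2C = (ABS(N12 - N21) - 1)**2 / ND;      /* corrected McNemar statistic    */
   PC  = 1 - PROBCHI(X2C, 1);               /* ~0.031, close to the exact p   */
   PUT PEXACT= PC=;
RUN;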

4.8.1 Another Example

A recent phenomenon in the recording of blood pressure is the development of
the automated blood-pressure machine, where for a small fee a person can sit in
a booth and have his or her blood pressure measured by a computer device. A
study is conducted to compare the computer device with the standard methods of
measuring blood pressure. Twenty patients are recruited, and their hypertensive
status is assessed by both the computer device and a trained observer. Hypertensive
status is defined as either hypertensive (+), if either systolic bp > 160 or diastolic
bp > 95, or normotensive (−) otherwise. The data, taken from Rosner (2000), are
presented in Table 4.17. Assess the statistical significance of these findings.

              Hypertensive status
Person    Computer device    Trained observer
1-20      (+ or − determination for each person)

Table 4.17: Hypertensive status of 20 women


The 2 × 2 table arising from this problem is presented in the SAS software output
below.
data matched;
input computer $ obs $ @@;
datalines;
(the 20 paired determinations from Table 4.17)
;
proc freq; tables computer*obs/agree; run;

data exact; do i=0 to 8;
if i-1 = -1 then prob=probbnml(0.5,8,0);
else prob=probbnml(0.5,8,i)-probbnml(0.5,8,i-1);
output; end; run;
proc print noobs; run;
proc means data=exact noprint;
where i le 6; var prob;
output out=aa sum=p; run;
data result; set aa;
exact=2*(1-p);
proc print data=result;
var exact; format exact 10.4; run;


The FREQ Procedure
Table of computer by obs
(the 2 × 2 cross-classification of the paired determinations; Sample Size = 20)

Statistics for Table of computer by obs

McNemar's Test
Statistic (S)    4.5000
DF                    1
Pr > S           0.0339

Obs      exact
  1     0.0703

The McNemar's test above yields a pvalue of 0.0339, which indicates that we
would have to reject the null hypothesis. On the other hand, the exact result based
on the binomial gives a pvalue of 0.0703, indicating that we would fail to reject the null
hypothesis in this case; that is, there is agreement between the trained observer
and the computer device. The exact result is more accurate in this case.

4.9 Measures of Association in 2 × 2 Contingency Tables

In this section, we shall introduce some measures of association that have received
considerable attention for 2 × 2 contingency tables, and later generalize these to
higher dimensional tables. The structure of measures of association in larger tables
can be easily constructed from a proper understanding of the 2 × 2 case (Goodman,
1979a, 1985).
For the 2 × 2 table, let the underlying probability model under the multinomial
sampling scheme (cross-sectional study design) be displayed as in the next table.

            Response
Popl       S       F      Total
I         π₁₁     π₁₂      π₁₊
II        π₂₁     π₂₂      π₂₊
Total     π₊₁     π₊₂       1

Let πᵢ₊ be the probability that a randomly selected individual is in the i-th row.
Similarly, let π₊ⱼ be the corresponding probability of a randomly selected individual
being in the j-th column. Also let πᵢⱼ be the unconditional probability that an
individual selected at random is in row i and column j, for i, j = 1, 2.
A measure of association between the row categories and the column categories
is a population parameter that summarizes the relationship between the joint
variation of the two classificatory variables.


For the 2 × 2 table, interest is centered on the null hypothesis of independence,
that is, the hypothesis

    H₀ : πᵢⱼ = πᵢ₊π₊ⱼ

Suitable measures of association that we will be considering here are those that
equal zero under the null hypothesis of independence or homogeneity. We shall
commence our discussion of these measures by first distinguishing between a measure
of association and a test of independence. A test of independence is used to
determine whether a relationship exists between the two classificatory variables. On
the other hand, a measure of association indicates the particular type and extent of
this relationship. Thus tests of independence are existence tests, but measures of
association reflect the strength of that relationship.
For the 2 × 2 table with the underlying probabilities above, the odds that a
randomly selected individual will be classified in column 1 rather than in column 2,
given that he or she is classified in row 1, are given by

    (π₁₁/π₁₊) ÷ (π₁₂/π₁₊) = π₁₁/π₁₂

Similarly, the odds of being classified in column 1 rather than in column 2, given that
the individual is classified in row 2, are given by

    (π₂₁/π₂₊) ÷ (π₂₂/π₂₊) = π₂₁/π₂₂

The odds ratio is defined as the ratio of these two odds. That is,

    θ = π₁₁π₂₂/(π₁₂π₂₁)                                     (4.28)

θ is referred to as the odds ratio or cross-product ratio. Under the model of
independence, θ = 1. If θ > 1, then there is said to be positive association between
the two variables. Similarly, if θ < 1, there is said to be negative association between
the two variables.

4.9.1 Properties of θ

1. Under H₀: θ = 1.
2. θ is invariant under interchange of rows and columns. However, an interchange
of only rows or only columns will change θ to 1/θ.
3. θ is invariant under row and column multiplication by a constant, say β > 0.
4. θ ranges from 0 to ∞, but the logarithm of θ, that is, the log odds ratio, is
symmetric about 0. Thus 0 < θ < ∞, which implies that −∞ < log(θ) < ∞,
and under H₀, log(θ) = 0.
5. If we let r₁ = π₁₁/π₁₊ and r₂ = π₂₁/π₂₊ be functions of the conditional row
probabilities, and similarly let c₁ = π₁₁/π₊₁ and c₂ = π₁₂/π₊₂ be functions of the
conditional column probabilities, then we see that

    θ = [r₁(1 − r₂)]/[r₂(1 − r₁)] = [c₁(1 − c₂)]/[c₂(1 − c₁)]        (*)


(a) From (*), we can show that X² can be expressed as a function of the sample
estimates of these conditional probabilities.
(b) If we define γ = log(θ) = log(π₁₁) + log(π₂₂) − log(π₁₂) − log(π₂₁), then
we can reexpress γ as:

    γ = log[r₁/(1 − r₁)] − log[r₂/(1 − r₂)] = φ₁ − φ₂

where φᵢ = log[rᵢ/(1 − rᵢ)], i = 1, 2; that is, γ is the difference between two logits
based upon the conditional probabilities r₁ and r₂.

4.9.2 MLE and Asymptotic Variance

Under either the multinomial sampling scheme or the product binomial scheme, the
MLE estimator for θ is given by

    θ̂ = n₁₁n₂₂/(n₁₂n₂₁)

It is sometimes advocated that, for those situations where there are zero cell frequencies, the estimate be computed as

    θ* = (n₁₁ + 0.5)(n₂₂ + 0.5)/[(n₁₂ + 0.5)(n₂₁ + 0.5)]

We have shown earlier, using the delta method, that an estimate of the asymptotic
variance of θ̂ is given by

    Var(θ̂) = θ̂² (1/n₁₁ + 1/n₁₂ + 1/n₂₁ + 1/n₂₂)

For the data of Table 4.4 we have

    θ̂ = (14 × 72)/(8 × 61) = 2.066

and

    Var(θ̂) = 2.066²(0.2267) = 0.9676

The approximate 95% confidence interval for the true value of association, as measured by θ, is therefore

    2.066 ± 1.96√0.9676, that is, (0.1380, 3.9940)

The confidence interval includes the value 1, the independence value, so the
data do not rule out the possibility of independence (as measured by θ) or homogeneity.
A 100(1 − α)% confidence interval is sometimes obtained first in terms of the
log of the odds ratio as:

    exp(ℓ ± z_(1−α/2) √V̂)

where ℓ is the log of the estimated odds ratio and

    V̂ = 1/n₁₁ + 1/n₁₂ + 1/n₂₁ + 1/n₂₂

is the asymptotic variance of ℓ. We have, therefore, for our data,

    e^(ln(2.066) ± 1.96√0.2267) = e^(0.7256 ± 0.9332) = e^((−0.2076, 1.6588)) = (0.8125, 5.2530)

The latter is the confidence interval computed by the SAS software.

4.9.3 General Functions of θ

Several measures of association are monotonically increasing or decreasing functions of 0. Let f(9) be a positive monotonically increasing function of 9 such that
/(I) = 1. Then a normalized measure of association based on f(0) whose maximum
absolute value is +1 is obtained as

The asymptotic variance of g(9) is given by

where / (0) is the derivative of f(0) with respect to 9.

Proof
We have shown using the linearized Taylor's series method, which is otherwise
known as the 5 method, that the variance of a function g(9) can be expressed as
In the present case,

9(0) =

Hence,

but

4/'(0)

1 _ g(0]

Hence,
Specific Functions of θ
Yule's (1900) Q measure of association: Let f(θ) = θ; then

    Q = (θ − 1)/(θ + 1) = (π₁₁π₂₂ − π₁₂π₂₁)/(π₁₁π₂₂ + π₁₂π₂₁)

Q = +1 if either π₂₁ = 0 or π₁₂ = 0; that is, as the value of the classificatory
variable Popl increases, the value of Response either remains the same or increases.
Similarly, Q = −1 if either π₂₂ = 0 or π₁₁ = 0.

A sample estimate of Q is given by

    Q̂ = (θ̂ − 1)/(θ̂ + 1)

with an estimate of the asymptotic variance given by

    Var{Q̂} = [(1 − Q̂²)²/4] (1/n₁₁ + 1/n₁₂ + 1/n₂₁ + 1/n₂₂)

The hypothesis of independence is rejected when Q̂ departs significantly from 0.
Referring again to the data in Table 4.4, we have Q̂ = (2.066 − 1)/(2.066 + 1) = 0.3477,
with estimated asymptotic variance Var(Q̂) = 0.1932 × 0.2267 = 0.0438. An
approximate 95% CI for Q is therefore given by:

    0.3477 ± 1.96√0.0438, that is, (−0.0625, 0.7579)

Again, this interval contains zero; hence, we will again fail to reject the null hypothesis of independence.
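These computations can be scripted directly; the following minimal SAS sketch of our own reproduces Q̂, its estimated variance, and the confidence limits for the data of Table 4.4:

DATA YULEQ;
   N11 = 14; N12 = 61; N21 = 8; N22 = 72;
   THETA = (N11*N22)/(N12*N21);                            /* approx 2.066  */
   Q = (THETA - 1)/(THETA + 1);                            /* approx 0.3477 */
   V = ((1 - Q**2)**2/4)*(1/N11 + 1/N12 + 1/N21 + 1/N22);  /* approx 0.0438 */
   LOWER = Q - 1.96*SQRT(V);  UPPER = Q + 1.96*SQRT(V);
   PUT Q= V= LOWER= UPPER=;
RUN;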

4.9.4 Measure of Relative Risk

In situations in which one variable is assumed to be antecedent to another, but both
are measured at the same point in time, e.g., retrospective or case-control studies
in epidemiology, it is of interest to investigate the relative risk of developing the
consequent condition based on the antecedent characteristic.

Notation
Let D denote the condition of interest (that is, the disease variable), e.g., cancer
of the lung, and let E denote the characteristic, factor, or explanatory variable of
interest, e.g., smoking, often called the exposure variable.
Let + denote presence and − denote absence. In this framework, the corresponding
2 × 2 (exposure × disease) table is

                  D
E           +        −       Totals
+          π₁₁      π₁₂       π₁₊
−          π₂₁      π₂₂       π₂₊
Totals     π₊₁      π₊₂        1
Among subjects with the factor present (the exposed), the risk of D is

    P(D = + | E = +) = π₁₁/π₁₊

and among subjects with the factor absent (the unexposed), the risk of D is

    P(D = + | E = −) = π₂₁/π₂₊

The relative risk, RR, is then defined as

    RR = (π₁₁/π₁₊)/(π₂₁/π₂₊) = P(disease | exposed)/P(disease | unexposed)    (4.29)

In words, we can define the relative risk as the ratio of the risk of developing a
disease among exposed (or at risk) subjects to the risk of developing the disease
among unexposed subjects. The relative risk and the odds ratio are two different
measures that seek


to explain the same phenomenon. The relative risk can be expressed in terms of
the odds ratio by writing

    RR = θ × (1 + π₂₁/π₂₂)/(1 + π₁₁/π₁₂)

Now if D is a "rare disease" in the population, π₂₁/π₂₂ and π₁₁/π₁₂ will be essentially 0
(often referred to as the rare outcome assumption). Thus in this case (usually
when the outcome occurs in less than about 10% of the population),

    RR ≈ θ

For prospective data, RR is estimated by (n₁₁/n₁₊)/(n₂₁/n₂₊). Similarly, for cross-sectional studies this ratio is often referred to as the prevalence ratio. However, this
ratio does not indicate risk, since the disease and risk factors are assessed at the
same time, but it gives a comparison of the prevalence of the disease for the "at
risk" group versus the other group.
Example

The example below relates to the relationship between aspirin use and heart attacks
(Agresti, 1990). During the Physicians' Health Study, 11,037 physicians were
randomly assigned to take 325 mg of aspirin every other day. In another group,
11,034 physicians were randomly assigned to take a placebo. The resulting numbers
of heart attacks in each group are displayed in the next table as a 2 × 2 contingency
table.

              Myocardial infarction
           Heart attack   No heart attack   Totals
Placebo        189             10,845       11,034
Aspirin        104             10,933       11,037

This was a randomized clinical trial testing whether regularly taken aspirin reduces
mortality from cardiovascular disease. Let

    π₁ = P(heart attack | taking placebo)
    π₂ = P(heart attack | taking aspirin)

The estimates of these probabilities from the data are:

    π̂₁ = 189/11034 = 0.0171
    π̂₂ = 104/11037 = 0.0094

The sample difference of proportions is 0.0171 − 0.0094 = 0.0077, and the relative
risk is

    RR = 0.0171/0.0094 = 1.82

The proportion suffering heart attacks was 1.82 times higher for those taking placebo
than for those taking aspirin. The sample odds ratio is θ̂ = (189 × 10933)/(104 × 10845)
= 1.83, which is very close to the estimate of the relative risk. The above data are
from a prospective or cohort study, which allows us to estimate π₁ and π₂. In a
retrospective or case-control study, however, we would not be able to estimate these
probabilities, because in that case n₊₁ = 293 and n₊₂ = 21,778 would be fixed by
design, and the data therefore contain no information about the π values. We can,
however, use the result established earlier to show that, regardless, an estimate of
θ is 1.83.
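In SAS, both the odds ratio and the cohort relative-risk estimates (with confidence limits) are available from PROC FREQ through the RELRISK option on the TABLES statement. The data step below is our own arrangement of the table, with variable names chosen for illustration:

DATA ASPIRIN;
   INPUT GROUP $ MI $ COUNT @@;
   DATALINES;
PLACEBO YES 189  PLACEBO NO 10845  ASPIRIN YES 104  ASPIRIN NO 10933
;
PROC FREQ DATA=ASPIRIN ORDER=DATA;
   WEIGHT COUNT;
   TABLES GROUP*MI / RELRISK;   /* odds ratio approx 1.83, relative risk approx 1.82 */
RUN;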

4.9.5 Sensitivity, Specificity, and Predictive Values

Other measures for assessing the effectiveness of a test procedure (a screening test or
set of symptoms), such as a medical test to diagnose a disease, are sensitivity,
specificity, and the predictive values.
For screening tests, it must be realized that the tests themselves can sometimes
be wrong. That is, a testing procedure may yield either a false positive or a false
negative result.
1. A false positive results when a test indicates a positive status when the true
status is negative.
2. A false negative results when a test indicates a negative status when the true
status is positive.
Suppose we have, for a sample of n subjects (n always large), the information in the
next table:
                        Disease
Test result      Present (D)   Absent (D̄)   Total
Positive (T)          a             b         a+b
Negative (T̄)          c             d         c+d
Total                a+c           b+d         n

(i) The sensitivity of a test (or symptom) measures how well it detects disease or
a set of symptoms. It is the proportion among those with the disease who give
a positive result. That is, it gives the probability of a positive test result (or
presence of the symptom) given the presence of the disease. In the context of
the above table, it is given by:

    P(T | D) = a/(a + c)

(ii) The specificity of a test (or symptom) measures how well it detects the absence
of disease. It is the proportion of those without the disease who are correctly
diagnosed (i.e., give a negative result). Again, this is the probability of a negative
test result (or absence of the symptom) given the absence of the disease. That
is, it is

    P(T̄ | D̄) = d/(b + d)

(iii) The predictive value positive of a screening test (or symptom) is the probability
that a subject has the disease given that the subject has a positive screening
test result (or has the symptom):

    P(D | T) = P(T | D)P(D) / [P(T | D)P(D) + P(T | D̄)P(D̄)]

(iv) The predictive value negative of a screening test (or symptom) is the probability
that a subject does not have the disease given that the subject has a
negative screening test result (or does not have the symptom):

    P(D̄ | T̄) = P(T̄ | D̄)P(D̄) / [P(T̄ | D̄)P(D̄) + P(T̄ | D)P(D)]


Example

As an example, consider the data here from Pagano and Gauvreau (1993). Among the 1,820 subjects in a study, 30 suffered from tuberculosis and 1,790 did not. Chest x-rays were administered to all individuals; 73 had a positive x-ray, indicating the presence of inflammatory disease, and 1,747 had a negative x-ray. The data for the study are displayed in Table 4.18.

                        Disease status
X-ray result      Present (D)   Absent (D̄)    Total
Positive (T)           22            51           73
Negative (T̄)            8         1,739        1,747
Total                  30         1,790        1,820

Table 4.18: Study data on tuberculosis


We note that the prevalence of the disease (tuberculosis) in the population is 30/1820, or 1.65%.

(a) The sensitivity of the test is given by:
$$P(T \mid D) = \frac{22}{30} = 0.7333$$
A false negative occurs when the test of an individual who has tuberculosis incorrectly indicates that the individual does not. In our example this is given by $P[\text{test negative} \mid \text{tuberculosis}] = \frac{8}{30} = 0.2667$. That is,
$$P[\text{false negative}] = 1 - \text{sensitivity}$$
(b) The specificity of the test is given by:
$$P(\bar{T} \mid \bar{D}) = \frac{1739}{1790} = 0.9715$$
Since not all individuals tested are actually carrying tuberculosis, 2.85% of the tests among the disease-free were false positive outcomes; that is,
$$P[\text{test positive} \mid \text{no tuberculosis}] = \frac{51}{1790} = 0.0285$$
That is, $P[\text{false positive}] = 1 - \text{specificity}$.
(c) The predictive value positive of the test, that is, the probability that a subject who is positive on the x-ray test has tuberculosis, is:
$$P(D \mid T) = \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid \bar{D})P(\bar{D})} = \frac{(0.7333)(0.0165)}{(0.7333)(0.0165) + (0.0285)(1 - 0.0165)} = 0.3015$$
since $P(T \mid \bar{D}) = 51/1790 = 0.0285$. We see that the predictive value positive of the test is low.


(d) The predictive value negative of the test, that is, the probability that a subject who is negative on the x-ray test is tuberculosis free, is:
$$P(\bar{D} \mid \bar{T}) = \frac{P(\bar{T} \mid \bar{D})P(\bar{D})}{P(\bar{T} \mid \bar{D})P(\bar{D}) + P(\bar{T} \mid D)P(D)} = \frac{(0.9715)(0.9835)}{(0.9715)(0.9835) + (0.2667)(0.0165)} = 0.9954$$
We observe here that both the predictive value positive and the predictive value negative can be computed directly from the above table as:
$$P(D \mid T) = \frac{22}{73} = 0.3014 \qquad P(\bar{D} \mid \bar{T}) = \frac{1739}{1747} = 0.9954$$

It should be noted that for rare diseases, high values of specificity or sensitivity are
not necessarily sufficient to ensure that a large proportion of those testing positive
actually have the disease.
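Since all four quantities are simple ratios of the cell counts, a short DATA step reproduces the hand computations above; this is a minimal sketch using the counts of Table 4.18 (the variable names are ours).

DATA SCREEN;
* cell counts from Table 4.18: a=(T,D), b=(T,D-bar), c=(T-bar,D), d=(T-bar,D-bar);
A = 22; B = 51; C = 8; D = 1739;
N = A + B + C + D;
PREV = (A + C)/N;          * prevalence = 0.0165;
SENS = A/(A + C);          * sensitivity = 0.7333;
SPEC = D/(B + D);          * specificity = 0.9715;
PVP  = A/(A + B);          * predictive value positive = 0.3014;
PVN  = D/(C + D);          * predictive value negative = 0.9954;
RUN;
PROC PRINT DATA=SCREEN; RUN;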

4.9.6 Product Moment Correlation Coefficient

Consider a population with the following underlying probability structure.


                     B
A             0        1      Totals
0           π11      π12       π1+
1           π21      π22       π2+
Totals      π+1      π+2         1

With the rows of A scored 0 and 1, the row mean is
$$E(A) = 0(\pi_{11} + \pi_{12}) + 1(\pi_{21} + \pi_{22}) = \pi_{2+}$$
Similarly, for the columns,
$$E(B) = 0(\pi_{11} + \pi_{21}) + 1(\pi_{12} + \pi_{22}) = \pi_{+2}$$
and the product moment correlation coefficient is defined as
$$\rho = \frac{\mathrm{Cov}(A,B)}{\sqrt{\mathrm{Var}(A)\,\mathrm{Var}(B)}}$$
From the above specified probability structure, and since the product AB equals 1 only when both variables equal 1, we have
$$E(A) = \pi_{2+}, \qquad E(B) = \pi_{+2}, \qquad E(AB) = \pi_{22}$$
so that
$$\mathrm{Cov}(A,B) = \pi_{22} - \pi_{2+}\pi_{+2} = \sigma_{12}$$
Thus the variances are obtained as:
$$\mathrm{Var}(A) = \pi_{2+}(1 - \pi_{2+}) = \pi_{1+}\pi_{2+} = \sigma_1^2$$
$$\mathrm{Var}(B) = \pi_{+2}(1 - \pi_{+2}) = \pi_{+1}\pi_{+2} = \sigma_2^2$$
and hence,
$$\rho = \frac{\sigma_{12}}{\sigma_1\sigma_2} = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{\sqrt{\pi_{1+}\pi_{2+}\pi_{+1}\pi_{+2}}}$$
In the above formulation, we are assuming that the response variables A and B are
each measured on a dichotomous (or binary) scale for each of a sample of N subjects,
and that the measurement scale can be coded as 0 (absent) and 1 (present) so that
the usual arithmetic means represent the proportion present for each variable.
Properties of ρ:
1. ρ changes only its sign if either the rows or the columns (but not both) are interchanged; it is invariant under interchanging both rows and columns.
2. Under independence, ρ = 0.
3. ρ = +1 if and only if $\pi_{12} = \pi_{21} = 0$.
4. ρ = -1 if and only if $\pi_{11} = \pi_{22} = 0$.
5. ρ is invariant under positive linear transformations, so that the same value of ρ is obtained on scoring the rows and columns by any monotonic increasing functions of 0 and 1.
An MLE for ρ is given by:
$$\hat{\rho} = \frac{n_{11}n_{22} - n_{12}n_{21}}{(n_{1+}n_{2+}n_{+1}n_{+2})^{1/2}}$$
Under $H_0: \rho = 0$, $\mathrm{Var}(\hat{\rho}) = 1/N$.


In most statistical literature, $\hat{\rho}^2 = X^2/N$, where $X^2$ is Pearson's chi-squared statistic. The above is also known as the phi ($\phi$) coefficient. For the data in Table 4.4, we have
$$\hat{\rho} = \frac{14 \times 72 - 8 \times 61}{\sqrt{(75)(80)(22)(133)}} = \frac{520}{4190.0} = 0.1241$$
Further, $\widehat{\mathrm{Var}}(\hat{\rho}) = 1/N = 1/155 = 0.0065$, and an approximate 95% CI for ρ equals
$$0.1241 \pm 1.96\sqrt{0.0065} = 0.1241 \pm 0.1574 = (-0.0333,\ 0.2815)$$
This interval again includes zero, and we would again fail to reject the null hypothesis of independence or homogeneity.
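The computation of $\hat{\rho}$ and its confidence interval can be verified with a small DATA step sketch (counts from Table 4.4; the variable names are ours):

DATA PHI;
N11 = 14; N12 = 61; N21 = 8; N22 = 72;
N = N11 + N12 + N21 + N22;
RHOHAT = (N11*N22 - N12*N21)/
         SQRT((N11+N12)*(N21+N22)*(N11+N21)*(N12+N22));
SE = SQRT(1/N);              * Var(rho-hat) = 1/N under H0;
LOWER = RHOHAT - 1.96*SE;
UPPER = RHOHAT + 1.96*SE;
RUN;
PROC PRINT DATA=PHI; RUN;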


A function of ρ that has seen most application is the "coefficient of mean square contingency" proposed by Pearson (1904). It is defined as
$$P = \sqrt{\frac{\rho^2}{1 + \rho^2}}$$
An estimate of P is given by
$$\hat{P} = \sqrt{\frac{X^2}{X^2 + N}}$$
We give some of these measures as reproduced from a modified SAS software output.

OPTIONS NODATE NONUMBER LS=77 PS=66;
DATA MEASURE;
DO STATUS=1 TO 2;
DO SMOKE=1 TO 2;
INPUT COUNT @@; OUTPUT;
END; END;
DATALINES;
14 61 8 72
;
PROC PRINT; TITLE 'MEASURES OF ASSOCIATION';
PROC FREQ; WEIGHT COUNT; TABLES STATUS*SMOKE/ALL NOCOL NOPCT;
RUN;
Statistics for Table of STATUS by SMOKE

Statistic                         DF     Value      Prob
Chi-Square                         1    2.3873    0.1223
Likelihood Ratio Chi-Square        1    2.4068    0.1208
Continuity Adj. Chi-Square         1    1.7288    0.1886
Mantel-Haenszel Chi-Square         1    2.3719    0.1235
Phi Coefficient                         0.1241
Contingency Coefficient                 0.1232
Cramer's V                              0.1241
MEASURES OF ASSOCIATION
The FREQ Procedure
Statistics for Table of STATUS by SMOKE

Statistic                  Value      ASE
Pearson Correlation       0.1241   0.0784
Spearman Correlation      0.1241   0.0784

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value    95% Confidence Limits
Case-Control (Odds Ratio)     2.0656      0.8124    5.2521
Cohort (Col1 Risk)            1.8667      0.8308    4.1941

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method              Value    95% Confidence Limits
Case-Control      Mantel-Haenszel    2.0656      0.8124    5.2521
(Odds Ratio)      Logit              2.0656      0.8124    5.2521

4.9.7 Choosing a Measure of Association for a 2 x 2 Table

Bishop et al. (1975) suggest that the choice of a measure of association is basically between Q and ρ, since both measures take the value 0 under H0, the model of independence. Further, they always lie between -1 and +1 and have reasonable interpretations.

1. The measure Q takes the value +1 or -1 whenever any one of the cell probabilities in a 2 x 2 table is zero, whereas for ρ = +1 or -1, both entries on one of the diagonals must be zero.

2. In a given table, the marginal totals may constrain the cell entries in such a way that ρ cannot take the value +1 or -1. That is, ρ is a margin-sensitive measure. In fact, ρ = ±1 implies that the margins are the same for both variables. The more different the margins, the lower is the upper bound for ρ. On the other hand, Q is not affected by row and column multiplications and can assume its full range of values irrespective of the distribution of the marginal totals.

The choice between Q and ρ therefore depends to a great extent on whether we wish to use a measure that is sensitive to the margins, and whether we wish to consider association complete when only one cell of the 2 x 2 table is zero (Q) rather than two cells (ρ).

4.10 Analyzing Several 2 x 2 Contingency Tables

We now consider in this section the extension to analyzing several 2 x 2 contingency tables arising from stratified studies. We are interested in combining the information in the several 2 x 2 tables without sacrificing the inherent associations in each of them. The procedure for doing this is presented in the following subsections.

4.10.1 Combining Several 2 x 2 Tables

Often, associations between two categorical variables are examined across two or more populations. The resulting data usually lead to several (say, h) 2 x 2 contingency tables. Table 4.19 relates to the effect of passive smoking on lung cancer. It summarizes results of case-control studies from three different countries among nonsmoking women married to smokers. Such tables can sometimes come from a single study that has been stratified by some factor (in this case, country) that might be a confounder. The goal is usually to combine the tables in order to have unified information across them.

We would like to combine the evidence from the three countries to make an overall statement about whether having lung cancer is independent of passive smoking. The conditional test for these data within each country can be obtained by computing Fisher's exact test separately for each subtable or by obtaining Pearson's X² for each table. Thus the value of X² is first computed for each country, and the results suggest that none of them is significant, which suggests that lung cancer status is independent of passive smoking by wives whose husbands are smokers within each country. These results are given in the SAS software partial output below.

                           Cancer Status
           Spouse                                   Cases
Country    Smoked?    Cases   Control   Total       Prop.      X²
Japan      Yes          73      188       261        0.28
           No           21       82       103        0.20
           Totals       94      270       364        0.26    2.216
UK         Yes          19       38        57        0.33
           No            5       16        21        0.24
           Totals       24       54        78        0.31    0.653
USA        Yes         137      363       500        0.27
           No           71      249       320        0.22
           Totals      208      612       820        0.25    2.800
Totals     Yes         229      589       818        0.28
           No           97      347       444        0.22
           Total       326      936      1262        0.26    5.678

Table 4.19: Cancer status according to whether spouse smoked, from three countries
STATISTICS FOR TABLE OF SMOKE BY STATUS

Analysis for JAPAN

Statistic                         DF    Value    Prob
Chi-Square                         1    2.216   0.137
Likelihood Ratio Chi-Square        1    2.288   0.130
Continuity Adj. Chi-Square         1    1.838   0.175
Sample Size = 364

Analysis for UK

Statistic                         DF    Value    Prob
Chi-Square                         1    0.653   0.419
Likelihood Ratio Chi-Square        1    0.674   0.412
Continuity Adj. Chi-Square         1    0.283   0.595
Sample Size = 78

Analysis for USA

Statistic                         DF    Value    Prob
Chi-Square                         1    2.800   0.094
Likelihood Ratio Chi-Square        1    2.833   0.092
Continuity Adj. Chi-Square         1    2.532   0.112
Sample Size = 820

For the three countries, the estimated odds ratios are $\hat{\theta}_{Japan} = 1.5162$, $\hat{\theta}_{UK} = 1.6000$, and $\hat{\theta}_{USA} = 1.3236$, with corresponding 95% confidence intervals given respectively by (0.8745, 2.6288), (0.5090, 5.0293), and (0.9526, 1.8390). All three intervals include 1, indicating independence. The three odds ratios above are essentially estimating the same population value, and the estimated values differ only because of sampling variability. The corresponding pvalues based on G² for the three countries are respectively 0.130, 0.412, and 0.092. None is significant at α = 0.05.

In searching for an overall test of the above hypotheses, one may be tempted to collapse these tables across countries as displayed in Table 4.20, which gives the combined table collapsed across countries. Analysis of the data in Table 4.20 gives X² = 5.678 on 1 d.f. (significant), which now suggests that lung cancer status is related to passive smoking among wives whose husbands smoked. The corresponding SAS software output is displayed below. Here, $\hat{\theta} = 1.3908$ and the corresponding confidence interval equals (1.0598, 1.8258), which does not include 1, indicating once again that lung cancer is strongly associated with passive smoking among this group of women.

                  Cancer status
Spouse
smoked?        Cases    Control    Total
Yes              229       589       818
No                97       347       444
Total            326       936      1262

Table 4.20: Collapsed table over country


Combined analysis collapsed over country
STATISTICS FOR TABLE OF SMOKE BY STATUS

Statistic                         DF    Value    Prob
Chi-Square                         1    5.678   0.017
Likelihood Ratio Chi-Square        1    5.779   0.016
Continuity Adj. Chi-Square         1    5.362   0.021
Sample Size = 1262

This result obviously contradicts the results from the individual country analyses, where it was concluded that lung cancer is independent of smoking status of this group of women. This apparent contradiction (change in direction) with the earlier results is what is known as Simpson's paradox. The contradiction can immediately be attributed to the large differences in the overall number of individuals sampled from each country. While a total of 364 respondents were sampled from Japan, only 78 were sampled from the United Kingdom, and an even larger number (820) was sampled from the United States. There does not seem to be much difference in the case proportions for the three countries, these being respectively Japan (0.258), United Kingdom (0.308), and United States (0.254). We will reanalyze these data later using a log-linear model approach.

Simpson's paradox can also manifest itself in the magnitude or strength of an association. Consider the following example, published in Pagano and Gauvreau (1993). The data relate to the study of the relationship between smoking and aortic stenosis, stratified by gender.
                               Smoker
            Aortic
Gender      stenosis      Yes     No    Total
Males       Yes            37     25       62
            No             24     20       44
            Total          61     45      106
Females     Yes            14     29       43
            No             19     47       66
            Total          33     76      109
Combined    Yes            51     54      105
            No             43     67      110
            Totals         94    121      215

Table 4.21: Data on relationship between smoking and aortic stenosis, stratified by gender
The SAS software partial output for the individual tables (males and females), together with the combined table analysis, is displayed below. We notice that we would fail to reject the null hypothesis of independence for either the individual gender tables or the combined table. In this case, there is no contradiction or directional change between the individual tables and the combined table results. Notice, however, the magnitude of X² in each analysis (or the relative magnitudes of the pvalues). Clearly, the strength of association appears weaker for the combined data than for either males or females. In this case, Simpson's paradox manifests itself in terms of the strength of the association.
Analysis for MALES
STATISTICS FOR TABLE OF AORTIC BY SMOKER

Statistic                         DF    Value    Prob
Chi-Square                         1    0.277   0.598
Likelihood Ratio Chi-Square        1    0.277   0.599
Sample Size = 106

Analysis for FEMALES
STATISTICS FOR TABLE OF AORTIC BY SMOKER

Statistic                         DF    Value    Prob
Chi-Square                         1    0.175   0.675
Likelihood Ratio Chi-Square        1    0.175   0.676
Sample Size = 109

Combined analysis collapsed over GENDER
STATISTICS FOR TABLE OF AORTIC BY SMOKER

Statistic                         DF    Value    Prob
Chi-Square                         1    1.962   0.161
Likelihood Ratio Chi-Square        1    1.965   0.161
Sample Size = 215

In both examples that we have considered above, we observe that it would not be
wise to collapse the data set in Table 4.19 over the factor variable country without
serious distortion to the association between the two variables. Similarly, it would
also not be wise to collapse the tables over the factor variable gender in the second
example without distorting the strength of the association within the individual
tables.
In general, we are interested in collecting information for each of several 2 x 2
tables across the levels of the subpopulations (which may be determined by various
configurations of factor variables or covariates).
In many cases, the primary question involves the relationship between an independent variable (factor) that is either present (1) or absent (2) and a dependent
(response) variable that is either present (1) or absent (2) in the presence of several
covariates. This could give rise to frequency data that may be summarized as a set
of 2 x 2 tables (see Table 4.22).
In these tables, $i = 1, 2, \ldots, h$ indexes the separate levels of the covariate set or the stratified subpopulations. Let $a_i$, $i = 1, 2, \ldots, h$, denote the number of subjects in the sample who are jointly classified as having both factor and response present in the i-th table; $a_i$ is the pivot cell for the i-th subtable. Further, let $(a_i + b_i)$ and $(c_i + d_i)$ denote the row marginal totals and $(a_i + c_i)$ and $(b_i + d_i)$ the corresponding column marginal totals for subtable i.

            Response                              Response
Factor     (1)    (2)    Total       Factor     (1)    (2)    Total
(1)         a1     b1    a1+b1       (1)         a2     b2    a2+b2
(2)         c1     d1    c1+d1       (2)         c2     d2    c2+d2
Total     a1+c1  b1+d1     n1        Total     a2+c2  b2+d2     n2

            Response                              Response
Factor     (1)    (2)    Total       Factor     (1)    (2)    Total
(1)         ai     bi    ai+bi       (1)         ah     bh    ah+bh
(2)         ci     di    ci+di       (2)         ch     dh    ch+dh
Total     ai+ci  bi+di     ni        Total     ah+ch  bh+dh     nh

Table 4.22: Summary of the set of 2 x 2 tables


In particular, let $a_i$ be the "pivot" cell frequency of subjects in the i-th table who have both factor and response present. Under the assumption that the marginal totals are fixed, the overall null hypothesis of no partial association, against the alternative hypothesis that on average across the h subtables there is a consistent relationship between the row and column variables, is tested by the Cochran-Mantel-Haenszel (CMH) statistic, computed as follows. For table i, $\mathbf{n}_i = (a_i, b_i, c_i, d_i)$ follows the hypergeometric distribution, and therefore
$$P\{\mathbf{n}_i \mid H_0\} = \frac{(a_i+b_i)!\,(c_i+d_i)!\,(a_i+c_i)!\,(b_i+d_i)!}{n_i!\,a_i!\,b_i!\,c_i!\,d_i!}$$
From results in (4.5) and (4.7), it follows that the expected value and variance for the pivot cell in the i-th subtable are given by:
$$E(a_i \mid H_0) = m_i = \frac{(a_i+b_i)(a_i+c_i)}{n_i}$$
$$\mathrm{Var}(a_i \mid H_0) = v_i = \frac{(a_i+b_i)(c_i+d_i)(a_i+c_i)(b_i+d_i)}{n_i^2(n_i-1)}$$
Let
$$n = \sum_{i=1}^{h} a_i, \qquad m = \sum_{i=1}^{h} m_i, \qquad V = \sum_{i=1}^{h} v_i$$
be the corresponding sums of the observed frequencies, expected frequencies, and variances of the pivot cell across the subtables. Then the Cochran-Mantel-Haenszel (1959) statistic is obtained as
$$Q_{CMH} = \frac{(n - m)^2}{V}$$
and it is distributed χ² with 1 d.f. for large n.


For the data in Table 4.19, we have:
$$n = 73 + 19 + 137 = 229$$
$$m_1 = \frac{261 \times 94}{364} = 67.4011, \quad m_2 = \frac{57 \times 24}{78} = 17.5385, \quad m_3 = \frac{500 \times 208}{820} = 126.8293$$
so that $m = \sum_i m_i = 211.7689$. Similarly,
$$v_1 = \frac{(261)(103)(94)(270)}{364^2(363)} = 14.1860, \quad v_2 = \frac{(57)(21)(24)(54)}{78^2(77)} = 3.3115, \quad v_3 = \frac{(500)(320)(208)(612)}{820^2(819)} = 36.9847$$
and $V = \sum_i v_i = 54.4823$. Then
$$X^2_{MH} = \frac{(229 - 211.7689)^2}{54.4823} = 5.45$$
which is significant and clearly indicates that cases of lung cancer are related to spousal smoking for these groups of women across the three countries. The following are the SAS software statements (with a partial SAS software output) for implementing the CMH test.
DATA CMH;
INPUT COUNTRY $ SMOKE STATUS COUNT @@;
CARDS;
japan 1 1 73 japan 1 2 188 japan 2 1 21 japan 2 2 82
uk 1 1 19 uk 1 2 38 uk 2 1 5 uk 2 2 16
usa 1 1 137 usa 1 2 363 usa 2 1 71 usa 2 2 249
;
TITLE 'COCHRAN-MANTEL-HAENSZEL TEST';
PROC FREQ; WEIGHT COUNT;
TABLES COUNTRY*SMOKE*STATUS/NOPRINT CMH1;
RUN;
Summary Statistics for SMOKE by STATUS
Controlling for COUNTRY

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF     Value      Prob
1            Nonzero Correlation        1    5.4497    0.0196

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method              Value    95% Confidence Limits
Case-Control      Mantel-Haenszel    1.3854      1.0536    1.8217
(Odds Ratio)      Logit              1.3839      1.0521    1.8203

Cohort            Mantel-Haenszel    1.2779      1.0366    1.5753
(Col1 Risk)       Logit              1.2760      1.0351    1.5729

Cohort            Mantel-Haenszel    0.9225      0.8642    0.9848
(Col2 Risk)       Logit              0.9223      0.8640    0.9845

Total Sample Size = 1262

The resulting Cochran-Mantel-Haenszel statistic computed from SAS software is


5.4497, which agrees with the value we computed above. The statistic is based on
1 d.f., giving a pvalue of 0.0196. This again confirms the conclusion that incidence
of lung cancer among married women is associated with spousal smoking, after
adjusting for country effects.

4.10.2 Estimating the Common Odds Ratio

While the Cochran-Mantel-Haenszel test provides the significance of the relationship between lung cancer status and spousal smoking across the subtables, it does not tell us the strength of this association. An estimator of the common odds ratio is given by:
$$\hat{\theta}_{MH} = \frac{\sum_{i=1}^{h} a_i d_i / n_i}{\sum_{i=1}^{h} b_i c_i / n_i}$$
In our example, the estimate of this common odds ratio in favor of being a lung cancer case with spousal smoking, after controlling for the countries, is:
$$\hat{\theta}_{MH} = \frac{(73 \times 82/364) + (19 \times 16/78) + (137 \times 249/820)}{(188 \times 21/364) + (38 \times 5/78) + (363 \times 71/820)} = 1.3854$$
SAS software gives the estimate of this common odds ratio as the case-control Mantel-Haenszel odds ratio. Thus, the odds in favor of a married woman developing lung cancer are 1.3854 times higher for those whose spouses are smokers than for those whose spouses are nonsmokers. The expression for a confidence interval for this common odds ratio is a little complicated, but most statistical packages readily give these confidence bounds. The 95% confidence interval for this common odds ratio in our case is (1.0536, 1.8217). This interval does not include 1; therefore, there is dependence between lung cancer status and spousal smoking.
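The Mantel-Haenszel estimator is likewise easy to verify by hand; the following DATA step sketch (our own layout of the Table 4.19 counts) accumulates the numerator and denominator contributions:

DATA MHOR;
* one line per country: a=exposed case, b=exposed control,
  c=unexposed case, d=unexposed control;
INPUT A B C D @@;
N = A + B + C + D;
NUM = A*D/N;                 * numerator contribution a_i*d_i/n_i;
DEN = B*C/N;                 * denominator contribution b_i*c_i/n_i;
DATALINES;
73 188 21 82  19 38 5 16  137 363 71 249
;
PROC MEANS DATA=MHOR NOPRINT;
VAR NUM DEN;
OUTPUT OUT=SUMS SUM=SNUM SDEN;
RUN;
DATA OR; SET SUMS;
THETAMH = SNUM/SDEN;         * = 1.3854;
RUN;
PROC PRINT DATA=OR; VAR THETAMH; RUN;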
The above estimation of the common odds ratio assumes that the strength of association, as measured by the odds ratios in each subtable, is the same. This assumption is tested by the homogeneity hypothesis:
$$H_0: \theta_1 = \theta_2 = \cdots = \theta_h$$
To test this hypothesis, the Breslow-Day test is often employed. SAS software automatically gives this statistic, which should be compared to a standard χ² distribution with $(h-1)$ degrees of freedom. In this example, the Breslow-Day test gives a value of 0.2381 on 2 d.f. and a pvalue of 0.8878, which indicates that we would fail to reject the null hypothesis of homogeneity of odds ratios across the subtables. The SAS software output of this test is given below. This result is part of the general output displayed earlier.
Breslow-Day Test for
Homogeneity of the Odds Ratios

Chi-Square      0.2381
DF                   2
Pr > ChiSq      0.8878

The estimate of the common odds ratio is based on the assumption that the strength of the association is the same in each country. If this were not the case, then we would have believed that there is interaction or effect modification between country and spousal smoking. The factor variable (country) is often referred to as the effect modifier. Evidence of the homogeneity of the odds ratios across countries indicates that there is no significant effect modification in this case.
The logit odds ratio of 1.3839 displayed in the SAS software output above is the estimate obtained from the weighted regression approach, where
$$\log(\hat{\theta}) = \frac{\sum_{i=1}^{h} w_i \log(\hat{\theta}_i)}{\sum_{i=1}^{h} w_i} \tag{4.30}$$
with
$$\log(\hat{\theta}_i) = \log[(a_i d_i)/(b_i c_i)] \quad \text{and} \quad w_i = \left[\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}\right]^{-1}$$
For the data in Table 4.19,
$$\log(\hat{\theta}_1) = \log\left[\frac{73 \times 82}{21 \times 188}\right] = \log(1.5162) = 0.4162, \qquad w_1 = (0.0788)^{-1} = 12.6852$$
$$\log(\hat{\theta}_2) = \log\left[\frac{19 \times 16}{5 \times 38}\right] = \log(1.6) = 0.4700, \qquad w_2 = (0.3414)^{-1} = 2.9287$$
$$\log(\hat{\theta}_3) = \log\left[\frac{137 \times 249}{71 \times 363}\right] = \log(1.3236) = 0.2804, \qquad w_3 = (0.0282)^{-1} = 35.5181$$
From (4.30), therefore,
$$\log(\hat{\theta}) = \frac{(12.6852)(0.4162) + (2.9287)(0.4700) + (35.5181)(0.2804)}{12.6852 + 2.9287 + 35.5181} = 0.32495$$
Hence, $\hat{\theta} = e^{0.32495} = 1.38396$. This estimate agrees with the estimate of the logit odds ratio from SAS software. With this estimate, the test of homogeneity of odds ratios across the subpopulations is given by
$$\sum_{i=1}^{h} w_i\,[\log(\hat{\theta}_i) - \log(\hat{\theta})]^2 = 0.2377$$
Again, this result is very close to the Breslow-Day test statistic. The above statistic is also based on 2 degrees of freedom.
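The weighted logit computation also lends itself to a short sketch (our own variable names); it reproduces log(θ̂) = 0.32495 and the homogeneity statistic 0.2377:

DATA WOOLF;
INPUT A B C D @@;
LOGOR = LOG(A*D/(B*C));      * log odds ratio for each country;
W = 1/(1/A + 1/B + 1/C + 1/D);   * weight w_i;
WLOR = W*LOGOR;
DATALINES;
73 188 21 82  19 38 5 16  137 363 71 249
;
PROC MEANS DATA=WOOLF NOPRINT;
VAR W WLOR;
OUTPUT OUT=S SUM=SW SWLOR;
RUN;
DATA EST; SET S;
LOGTHETA = SWLOR/SW;         * = 0.32495;
THETA = EXP(LOGTHETA);       * = 1.38396;
RUN;
DATA HOM;
IF _N_ = 1 THEN SET EST;     * carry the common estimate across observations;
SET WOOLF;
CONTRIB = W*(LOGOR - LOGTHETA)**2;
RUN;
PROC MEANS DATA=HOM SUM;     * sum = 0.2377 on (h-1) = 2 d.f.;
VAR CONTRIB;
RUN;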
Most recently, von Eye and Indurkhya (2000) formulated representations of
the Cochran-Mantel-Haenszel and Breslow-Day tests in terms of log-linear models, which are discussed in this text in Chapter 6.

4.11 Exercises

1. Generate all outcomes consistent with the given marginal totals of the table in this exercise. Based on the definition for extreme or more extreme tables in (4.9a) to (4.9b), which of the tables so generated are in the tail of the observed table?

   16   18   34
   17   10   27
   33   28   61

Conduct exact tests based on both X² and G² and compare your results with those obtained from the probability ranking scheme.
2. Show that for a 2 x 2 table of cell counts $\{n_{ij}\}$, the odds ratio is invariant to:
(a) interchange of rows with columns, and
(b) multiplication of cell counts within rows or within columns by a nonzero constant.
(c) Show that the difference of proportions does not have these invariance properties.
3. The table for this exercise contains results of a study by Mendenhall et al. (1984) to compare radiation therapy with surgery in treating cancer of the larynx.

                 Cancer        Cancer
Treatment      controlled   not controlled
Surgery            21              2
Radiation          15              3

The distribution of the pivot cell $n_{11}$ is also given by:

n11    Probability
 18      0.0449
 19      0.2127
 20      0.3616
 21      0.2755
 22      0.0939
 23      0.0144
(a) What theoretical sampling scheme is assumed in this analysis? Justify the range of the pivot cell $n_{11}$.

(b) Conduct exact tests to test the following two hypotheses separately:
$$H_0: \theta = 1 \text{ against } H_a: \theta \neq 1 \qquad \text{and} \qquad H_0: \theta = 1 \text{ against } H_a: \theta > 1$$
where θ is the odds ratio for the table.
(c) Explain how you formed the pvalues in each case and draw your conclusions based on your analyses.
(d) Give the equivalent mid-pvalues for both hypotheses and compare your results here with those obtained in (b). Which is more conservative?
(e) Is it appropriate to conduct a large sample test in this study?
4. For the 2 x 2 contingency table with counts $n_{ij}$ following a multinomial distribution with probabilities $\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}$, obtain maximum likelihood estimates for testing the hypothesis $H_0: \pi_{1+} = \pi_{+1}$, or equivalently $\pi_{12} = \pi_{21}$, and use this to obtain an expression for Pearson's $X^2$.
5. The underlying probabilities for a 2 x 2 table under some sampling scheme are displayed in the following table.

Sample    Success   Failure   Total
1           π11       π12       1
2           π21       π22       1

(a) Identify the sampling scheme.
(b) Find the maximum likelihood estimators for the $\pi_{ij}$ given the homogeneity hypothesis $H_0: \pi_{11} = \pi_{21}$.
6. Suppose in the last exercise the sampling scheme is multinomial; test the hypothesis that $\pi_{1+} = \pi_{+1}$ and show that in this case Pearson's $X^2$ statistic reduces to McNemar's test statistic for correlated binary data.

7. For the 2 x 2 contingency table, show that if the $n_{ij}$ are distributed as independent Poisson($\lambda_{ij}$), the conditional distribution based on the fixed sample size n is the full multinomial.
8. The following data pertain to the presence of loss of memory for patients on hemodialysis at the University of Michigan Medical Center during two different time periods.

               Loss of memory
Period     Present    Absent    Total
I              8         26        34
II             0         22        22
Total          8         48        56


The investigator is interested in testing whether the incidence of loss of memory has been significantly lowered in time period II. Test an appropriate hypothesis and briefly discuss the choice of model for this analysis.
9. If it is believed that treatment A is better than treatment B, list all possible outcomes that are extreme or more extreme than the observed table in the following fictitious table of data.

              Treatment
Outcome      A      B     Total
Die          5      3        8
Live         9     15       24
Total       14     18       32

Conduct Fisher's exact test on these data. Calculate the corresponding mid-P probabilities and comment on your results.
10. The table in this exercise gives information on the numbers of applicants admitted to graduate programs in the six largest majors at the University of California, Berkeley, in the fall of 1973. There was no evidence to support the idea that men and women applicants were not equally well qualified.

                 Men                      Women
        Number of    Number      Number of    Number
Major   applicants   admitted    applicants   admitted
A          825         512          108          89
B          560         353           25          17
C          325         120          593         202
D          417         138          375         131
E          191          53          393          94
F          373          22          341          24

Is there any indication of bias in graduate admissions based on the sex of the applicant?
Construct separate 2 x 2 tables for men and women for each major and conduct individual tests for each subtable. Draw your conclusions.
11. The National Institute for Occupational Safety and Health has developed

a case definition of carpal tunnel syndrome that incorporates three criteria:


symptoms of median nerve involvement, a history of occupational risk factors,
and the presence of physical exam findings. The sensitivity of this definition
as a test for carpal tunnel syndrome is 0.67; its specificity is 0.58.
(a) In a population in which the prevalence of carpal tunnel syndrome is
estimated to be 15%, what is the predictive value of positive test results?
(b) How does this probability change if the prevalence is only 5%?
12. The following data are taken from Pagano and Gauvreau (1993) and relate to a study investigating the use of radionuclide ventriculography in detecting coronary disease.

               Disease
Test       Present   Absent   Total
Positive     302        80      382
Negative     179       372      551
Total        481       452      933

(a) What is the sensitivity of radionuclide ventriculography in this study?


What is its specificity?
(b) What is the predictive value negative of the test?
13. Show that Yule's Q falls between -1 and 1.
Give conditions under which Q = -1 or Q = 1.
Derive the relationship between Q and the odds ratio.
14. A study examining the effectiveness of a drug product for the treatment of arthritis was conducted at four different centers. At each center, 100 patients were treated, 50 on the test drug and 50 on an identically appearing placebo. The data are reproduced in the following table (Ott, 1984).

                              Global Outcome
                                                Completely
Clinic   Treatment    Worse   Same   Better        well
1        Placebo        17     15      10            8
         Test drug      10     12      14           14
2        Placebo         6     20      22            2
         Test drug       4     15      10           21
3        Placebo         7     25      12            6
         Test drug       5     22      12           11
4        Placebo         2     14      20           14
         Test drug       1     12      15           22
(a) Suppose an investigator wishes to collapse the global outcome categories


into improved and not improved. Comment on this.
(b) Conduct a Cochran-Mantel-Haenszel test on the collapsed data of part
(a) using uniform scores. Draw conclusions.
15. A medical research team wished to evaluate a proposed screening test for Alzheimer's disease. The test was given to a random sample of 450 patients with Alzheimer's disease and an independent random sample of 500 patients without symptoms of the disease. The two samples were drawn from populations of subjects who were 65 years of age or older, where it is assumed that 11.3% of the US population aged 65 and over have Alzheimer's disease. The data from this study are presented as:

                 Alzheimer's diagnosis?
Test result       Yes (D)    No (D̄)    Total
Positive (T)        436         5        441
Negative (T̄)         14       495        509
Total               450       500        950

Obtain:
(a) The sensitivity of the test.
(b) The specificity of the test.
(c) The predictive value positive of the test.
16. Suppose a retrospective study is conducted among men aged 50 to 54 in a specific county who died over a 1-month period. The investigators attempt to include approximately equal numbers of men who died from CVD (cardiovascular disease), the cases, and men who died from other causes, the controls. It is found that of 35 people who died from CVD, 5 were on a high-salt diet before they died, whereas of 25 people who died from other causes, 2 were on such a diet. The results are displayed in the table below. Is there sufficient evidence to conclude that there is a significant association between salt intake and cause of death?

                     Type of diet
Cause of death   High salt   Low salt   Total
CVD                  5           30        35
Non-CVD              2           23        25
Total                7           53        60
(a) Generate all tables that are extreme or more extreme as the observed
table.
(b) Compute the probabilities and conduct Fisher's exact test. Why was this
test necessary?
(c) Compute the corresponding mid-P test and draw your conclusions.
(d) Conduct both the asymptotic tests and the exact tests. What is the
equivalent exact test in this case?
17. Two drugs (A, B) are compared for the medical treatment of duodenal ulcer.
For this purpose, patients are carefully matched on age, sex, and clinical
condition. The treatment results based on 200 matched pairs show that for
89 matched pairs both treatments are effective; for 90 matched pairs, both
treatments are ineffective; for 5 matched-pairs, drug A is effective, whereas
drug B is ineffective; for 16 matched pairs drug B is effective, whereas drug
A is ineffective.
(a) What test procedure can be used to assess the results?
(b) Perform the test and report your result.


18. The following are data from two studies that investigated the risk factors for epithelial ovarian cancer (Pagano & Gauvreau, 1993).

Study I
                    Term pregnancies
Disease
status        None    One or more    Total
Cancer          31          80         111
No cancer       93         379         472
Total          124         459         583

Study II
                    Term pregnancies
Disease
status        None    One or more    Total
Cancer          39         149         188
No cancer       74         465         539
Total          113         614         727
(a) Estimate the odds ratio of developing ovarian cancer for women who have
never had a term pregnancy versus women who have had one or more in
the first study.
(b) Estimate the odds ratio of developing ovarian cancer for women who have
never had a term pregnancy versus women who have had one or more in
the second study.
(c) If possible, you would like to combine the evidence in these two strata to
make an overall statement about the relationship between ovarian cancer
and term pregnancies. What would happen if you were to simply sum
the entries in the tables?
(d) Conduct a test of homogeneity. Is it appropriate to use the Cochran-Mantel-Haenszel method to combine the information in these two tables?
(e) Obtain the Cochran-Mantel-Haenszel estimate of the common odds ratio.
(f) Interpret the confidence interval estimate for the common odds ratio.
(g) Test the null hypothesis that there is no significant association between
ovarian cancer and term pregnancies at the 0.01 level of significance.
19. Hansteen et al. (1982) reported the results of a clinical trial of propranolol on patients with myocardial infarction:

               Sudden death   No sudden death
Propranolol         11              267
Placebo             23              259

Conduct exact tests based on two-sided and one-sided hypotheses. Calculate the corresponding mid-P probabilities and comment on your results.


Chapter 5

The General I x J Contingency Table

5.1 Introduction

Suppose a sample of N objects is jointly classified according to two different and independent classifications A and B with I and J classes, respectively. Let $n_{ij}$ be the observed frequency in cell (i, j), with $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$. The observed table can be displayed as in Table 5.1.
                     B
A          1      2     ...     J      Total
1        n11    n12     ...    n1J      N1+
2        n21    n22     ...    n2J      N2+
...
I        nI1    nI2     ...    nIJ      NI+
Total    N+1    N+2     ...    N+J       N

Table 5.1: Observed I x J contingency table


The observed frequencies can be represented compactly by a vector
$$\mathbf{n}' = (\mathbf{n}_1', \mathbf{n}_2', \ldots, \mathbf{n}_I')$$
where
$$\mathbf{n}_i' = (n_{i1}, n_{i2}, \ldots, n_{iJ}), \qquad i = 1, 2, \ldots, I$$
is the vector of observed frequencies from the i-th row of the table.

Similar to our discussion for the 2 x 2 contingency table, we shall again discuss the general I x J contingency table under the three sampling schemes A, B, and C. We shall label the case when the row margin is fixed by $M_1 = \{N_{1+}, \ldots, N_{I+}\}$. Similarly, for the column totals, $M_2 = \{N_{+1}, \ldots, N_{+J}\}$.

5.2 Multivariate Hypergeometric Distributions

5.2.1 The One-Sample Case

Consider a population of N subjects, consisting of $N_1$ of type 1, $N_2$ of type 2, ..., and $N_J$ of type J, displayed in Table 5.2, such that $\sum_{j=1}^{J} N_j = N$.

                            Types
              1     2    ...   (J-1)     J     Total
Population   N1    N2    ...   N(J-1)   NJ       N
Sample       n1    n2    ...   n(J-1)   nJ       n

Table 5.2: Distribution of subjects


If we take a simple random sample of size n from this population without replacement, then the joint distribution of the number $n_j$ of type j within the sample is
$$P\{\mathbf{n}\} = \frac{\prod_{j=1}^{J} \binom{N_j}{n_j}}{\binom{N}{n}} \tag{5.1}$$
where $0 \le n_j \le n$ for $j = 1, 2, \ldots, J$, such that $\sum_j n_j = n$. This probability distribution is called the multivariate hypergeometric distribution. The proof of this can be found in appendix D.1.

The moments of the multivariate hypergeometric distribution are most readily obtained by computing factorial moments. It can be shown (see appendix D.2) that:
$$E\{n_j\} = \frac{nN_j}{N} \tag{5.2a}$$
$$\mathrm{Var}\{n_j\} = \frac{n(N-n)N_j(N-N_j)}{N^2(N-1)} \tag{5.2b}$$
$$E\{n_j n_{j'}\} = \frac{n(n-1)N_j N_{j'}}{N(N-1)}, \qquad j \neq j' \tag{5.2c}$$
$$\mathrm{Cov}\{n_j, n_{j'}\} = -\frac{n(N-n)N_j N_{j'}}{N^2(N-1)}, \qquad j \neq j' \tag{5.2d}$$
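Expression (5.1) is easy to evaluate on the log scale with LGAMMA; the sketch below (our own program, not the one in appendix D.4) computes P{n} for assumed population type totals (4, 18, 18) and sampled counts (2, 10, 8), the configuration that reappears in Example 5.1:

DATA MVHYPER;
ARRAY NN{3} (4, 18, 18);     * N_j, population type totals;
ARRAY KK{3} (2, 10, 8);      * n_j, sampled counts;
N = 40; SAMP = 20;
* start with log of 1/C(N, n);
LOGP = -(LGAMMA(N+1) - LGAMMA(SAMP+1) - LGAMMA(N-SAMP+1));
DO J = 1 TO 3;               * add log C(N_j, n_j) for each type;
LOGP = LOGP + LGAMMA(NN{J}+1) - LGAMMA(KK{J}+1) - LGAMMA(NN{J}-KK{J}+1);
END;
P = EXP(LOGP);               * = 0.08334 for this configuration;
RUN;
PROC PRINT DATA=MVHYPER; VAR P; RUN;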

5.2.2 Several-Samples Case

Consider an I x J table in which it is assumed that both marginal distributions $M_1$ and $M_2$ are fixed. In this context, the null hypothesis of interest is that the joint distribution over the I and J categories is random. That is, the response variable B is randomly distributed with respect to the categories of A. In other words, the data in the respective rows can be regarded as successive sets of simple random samples of sizes $\{N_{i+}\}$ from a fixed population corresponding to the marginal distribution of B, $\{N_{+j}\}$.
On the basis of the null hypothesis, it can be shown that the vector n follows the product multivariate hypergeometric distribution given by the probability model
$$P_0\{\mathbf{n}\} = \frac{\prod_{i=1}^{I} N_{i+}!\ \prod_{j=1}^{J} N_{+j}!}{N!\ \prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!} \tag{5.3}$$
The proof of the result in (5.3) can be found in appendix D.3.

5.2.3 Moments

By expanding the results for the one-sample hypergeometric model in expressions (5.2a) and (5.2b), it follows that
$$E(n_{ij} \mid H_0) = \frac{N_{i+}N_{+j}}{N} \tag{5.4a}$$
$$\mathrm{Var}(n_{ij} \mid H_0) = \frac{N_{i+}(N - N_{i+})\,N_{+j}(N - N_{+j})}{N^2(N-1)} \tag{5.4b}$$

However, the covariances are slightly different, depending on whether or not the two frequencies of interest share either a row or a column of the table. For example, since marginal and conditional distributions of this probability model are also hypergeometric, it follows that if either $i = i'$ or $j = j'$, the covariance between $n_{ij}$ and $n_{i'j'}$ can be obtained from the previous one-sample results in (5.2d). In particular, for observations in the same row but different columns,
$$\mathrm{Cov}\{n_{ij}, n_{ij'} \mid H_0\} = -\frac{N_{i+}(N - N_{i+})N_{+j}N_{+j'}}{N^2(N-1)} \tag{5.5}$$
and similarly, for observations in the same column but different rows, we have
$$\mathrm{Cov}\{n_{ij}, n_{i'j} \mid H_0\} = -\frac{N_{+j}(N - N_{+j})N_{i+}N_{i'+}}{N^2(N-1)} \tag{5.6}$$
The only remaining situation is the covariance of two frequencies that share neither the same row nor the same column, that is, $\mathrm{Cov}(n_{ij}, n_{i'j'} \mid H_0)$. It can be shown that for this situation (see exercise 5.10) the covariance becomes:
$$\mathrm{Cov}\{n_{ij}, n_{i'j'} \mid H_0\} = \frac{N_{i+}N_{i'+}N_{+j}N_{+j'}}{N^2(N-1)}, \qquad \text{for } i \neq i',\ j \neq j' \tag{5.7}$$
In general, therefore, the covariance of any two frequencies $n_{ij}$ and $n_{i'j'}$ can be written compactly as
$$\mathrm{Cov}\{n_{ij}, n_{i'j'} \mid H_0\} = \frac{N_{i+}N_{+j}(\delta_{ii'}N - N_{i'+})(\delta_{jj'}N - N_{+j'})}{N^2(N-1)} \tag{5.8}$$
where
$$\delta_{ii'} = \begin{cases} 1 & \text{if } i = i' \\ 0 & \text{otherwise} \end{cases} \qquad \delta_{jj'} = \begin{cases} 1 & \text{if } j = j' \\ 0 & \text{otherwise} \end{cases}$$
It is not too difficult to see that the expression in (5.8) generates those in (5.5), (5.6), and (5.7) for the appropriate choices of i and j. We also note here that each $n_{ij}$ in Table 5.1 has a marginal distribution that is itself hypergeometric with parameters $(N_{i+}, N_{+j}, N)$; that is, each has a univariate hypergeometric distribution.

5.2.4 Example 5.1

Stanley et al. (1996) examined the relative efficacy and side effects of morphine and pethidine (drugs commonly used for patient-controlled analgesia). The study was a prospective, randomized, double-blind study; the subjects were 40 women, between the ages of 20 and 60 years, undergoing total abdominal hysterectomy. The subjects were allocated randomly to receive morphine or pethidine. At the end of the study, subjects described their appreciation of nausea and vomiting, pain, and satisfaction by means of a 3-point verbal scale. The results for those who described their appreciation as painful are displayed in Table 5.3.

                            Drug
Pain                Pethidine   Morphine   Total
Unbearable/severe        2          2         4
Moderate                10          8        18
Slight/none              8         10        18
Total                   20         20        40

Table 5.3: Joint distribution of drug and painful appreciation

It is obvious that both margins of this table were not fixed in the actual sampling scheme. We are still, however, interested in the hypothetical question of whether or not the observed distribution of painful appreciation is randomly distributed with respect to the type of drug administered. We shall assume that the marginals are fixed, and in this case the multivariate hypergeometric probability model is appropriate for investigating this hypothesis.
However, unlike the simplicity of enumerating the distribution of the pivot cell $n_{11}$ in the 2 x 2 table, the I x J table has $(I-1)(J-1)$ pivot cells whose distribution can be used to characterize the entire distribution of the vector n. Specifically, once the values of $(I-1)(J-1)$ of the cell frequencies are fixed, the other $(I + J - 1)$ elements can be determined immediately from the fixed margins $M_1$ and $M_2$. Thus, without loss of generality, the distribution of n in the example data set can be completely determined from the distribution of $(n_{11}, n_{21})$, as illustrated in Table 5.4.

                                Drug
Pain                  Pethidine           Morphine        Total
Unbearable/severe       n11               4 - n11            4
Moderate                n21              18 - n21           18
Slight/none        20 - n11 - n21      n11 + n21 - 2        18
Total                    20                  20              40

Table 5.4: Observed frequencies as functions of the pivot cells and fixed marginals, subject to the constraints 0 ≤ n21 ≤ 18 and n11 + n21 ≤ 20

As a result, $n_{11}$ ranges from 0 to 4 and $n_{21}$ ranges from 2 to 18. Under $H_0$, $M_1$ and $M_2$ are fixed, and the probabilities of each of the 82 possible tables that are consistent with $M_1$ and $M_2$ are listed below. The SAS software program used to generate this result is provided in appendix D.4.
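A sketch of such an enumeration program (our own; the program actually used is in appendix D.4) loops over the pivot cells, recovers the remaining cells from the fixed margins as in Table 5.4, and evaluates (5.3) on the log scale:

DATA ENUM;
* constant part of (5.3): marginal factorials over N!, on the log scale;
LC = LGAMMA(5) + 2*LGAMMA(19) + 2*LGAMMA(21) - LGAMMA(41);
DO N11 = 0 TO 4;
DO N21 = 2 TO 18;
IF N11 + N21 <= 20 THEN DO;
N12 = 4 - N11; N22 = 18 - N21;         * remaining cells from the margins;
N31 = 20 - N11 - N21; N32 = N11 + N21 - 2;
LOGP = LC - (LGAMMA(N11+1) + LGAMMA(N12+1) + LGAMMA(N21+1)
      + LGAMMA(N22+1) + LGAMMA(N31+1) + LGAMMA(N32+1));
PROB = EXP(LOGP);
OUTPUT;
END;
END;
END;
KEEP N11 N21 PROB;
RUN;
PROC PRINT DATA=ENUM; RUN;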
Possible pairs of (n11, n21) and their
corresponding probabilities, generated using SAS (excerpt)

Obs   n11   n21      PROB    Cum. prob
  1    0     2     0.00000    0.00000
  2    0     3     0.00000    0.00000
  3    0     4     0.00000    0.00000
  4    0     5     0.00005    0.00005
  5    0     6     0.00041    0.00047
  6    0     7     0.00198    0.00244
  7    0     8     0.00589    0.00834
  8    0     9     0.01122    0.01956
  9    0    10     0.01389    0.03345
 ...
 35    2     2     0.00000    0.30250
 36    2     3     0.00003    0.30252
 37    2     4     0.00041    0.30293
 38    2     5     0.00320    0.30613
 39    2     6     0.01500    0.32113
 40    2     7     0.04408    0.36521
 41    2     8     0.08334    0.44855
 42    2     9     0.10289    0.55145
 43    2    10     0.08334    0.63479
 44    2    11     0.04408    0.67887
 45    2    12     0.01500    0.69387
 ...
 78    4    12     0.00041    0.99995
 79    4    13     0.00005    1.00000
 80    4    14     0.00000    1.00000
 81    4    15     0.00000    1.00000
 82    4    16     0.00000    1.00000

The observed table has $(n_{11}, n_{21}) = (2, 10)$, which is the 43rd configuration in the above list, with corresponding hypergeometric probability 0.08334 computed from (5.3). Observations numbered from 43 to 82 are therefore those configurations as extreme or more extreme (to the right of (2,10)) than the observed table, that is, tables for which $P(n_{11}, n_{21}) \le 0.08334$, the observed table probability. Similarly, observations numbered from 1 to 40 are those configurations as extreme or more extreme (to the left of (2,10)) than the observed probability.

5.2.5 Fisher's Exact Test for I x J Tables

The choice of a rejection region of size exactly equal to α has the same difficulties as discussed previously for the 2 x 2 case. On the other hand, one can use the probabilities to compute the pvalue for the test of $H_0$ against the two-sided alternative that the joint distribution is not randomly distributed for a specific observed table. The principle behind Fisher's exact test is valid for tables of any size, and the exact test procedure involves:

(i) Calculate the null probability of each table that is consistent with $M_1$ and $M_2$, using the expression in (5.3).

(ii) Compute the sum of the null probabilities of those tables that are as extreme or more extreme than the observed table. Here an extreme table is defined as any other table consistent with $M_1$ and $M_2$ having pivot cell frequencies $\tilde{n}_1, \tilde{n}_2$ such that $P(\tilde{n}_1, \tilde{n}_2) \le P(n_{11}, n_{21})$.

Note that for tables in which $(I-1)(J-1) > 1$, the notion of primary and secondary tail probabilities is not well defined for a general alternative hypothesis, since the tables cannot be ordered explicitly.

For our data in Table 5.3, the two-tailed pvalue is obtained as the sum of all the probabilities less than or equal to P(2, 10) = 0.08334. This total pvalue is T(2, 10) = 0.89711.
The above exact test is an extension of Fisher's exact test for the 2 x 2 table to the general I x J table. The test is due to Freeman and Halton (1951) and is well known as the exact conditional test. We can again implement Fisher's exact test using SAS software in this case because the sample size is not too large. Because about 33% of the cells in this example have small expected values, the exact test is most appropriate in this example. The ranked probabilities for all possible outcomes are presented in appendix D.5. Fisher's exact test is implemented with the following program and partial output.

data tab52;
input trt $ pain $ count @@;
if pain='sever' then resp=3;
else if pain='mod' then resp=2;
else resp=1;
datalines;
peth sever 2 peth mod 10 peth none 8
morp sever 2 morp mod 8 morp none 10
;
proc print; run;
proc freq data=tab52 order=data; weight count;
tables trt*resp/exact; run;

Fisher's Exact Test
Table Probability (P)    0.0833
Pr <= P                  0.8971
Sample Size = 40

The results obtained from SAS software agree with those obtained from the exact test using probability ranking. Similar results can be obtained when the ranking


criteria are the Pearson's X2 and the likelihood ratio test statistic G2 (appendices
D.6 and D.7, respectively). We present the results of these in the table below where
the first column denotes the criterion being used for ranking.
Test criterion    EXACT test    P(χ²₂ ≥ T₀)
Probability         0.89711          -
X²                  0.89711        0.801
G²                  0.89711        0.800
In the preceding table, $T_0$ represents the observed value of the corresponding test criterion. In this example, the observed values of X² and G² are respectively 0.44444 and 0.44536. Because of the near symmetry of the data in Table 5.3 (the column marginals are equal and two of the row marginals are also equal), the number of distinct values of X² and of G² is 28 each (see appendices D.6 and D.7); some of the 82 configurations yield the same value of the test statistics. In this example, the exact test probabilities computed from the ranked probabilities, X², or G² criteria yield the same exact pvalue. This is not often the case but, as explained earlier, the symmetry of our table makes the distribution of the test statistics more discrete.

For larger tables, exact enumeration might not be possible because of the sheer number of possible configurations consistent with the fixed margins. In such cases, the simulation approach implemented in the statistical package StatXact will be most appropriate.

The exact probabilities obtained for X² and G² are based on the ranking procedure suggested by Radlow and Alf (1975), that is: carry out the exact procedure as in (i) above, but obtain the pvalue as the sum of the null probabilities of those tables whose X² values are greater than or equal to the observed value $X^2_0$. The results from SAS software shown earlier agree with the results obtained above.

5.2.6 The Mid-P Test

A recent alternative to the exact test is the mid-P test discussed earlier in Chapter 4. The mid-P, as defined by Lancaster (1961) and Barnard (1989), is given by
$$\text{mid-}P = \Pr\{T(u) > T(n_0) \mid H_0\} + \tfrac{1}{2}\Pr\{T(u) = T(n_0) \mid H_0\}$$
where T(u) is the chosen test criterion, which in our case is Pearson's X². For our data, the two-tailed mid-P value is given as
$$0.73042 + 0.5(0.08334 + 0.08334) = 0.81376$$
In all four cases (ranked probability, X², G², mid-P), we would fail to reject $H_0$ on the basis of the data given; that is, pain appreciation is randomly distributed with respect to the type of drug administered. In other words, the type of pain experienced does not depend on the type of drug administered.
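The arithmetic can be checked in a one-step sketch (values taken from the enumeration above; the tables tying the observed criterion value are presumably the two symmetric configurations, each with probability 0.08334):

DATA MIDP;
PGT = 0.73042;               * Pr{T(u) > T(n0) | H0};
PEQ = 0.08334 + 0.08334;     * Pr{T(u) = T(n0) | H0};
MIDP = PGT + 0.5*PEQ;        * = 0.81376;
RUN;
PROC PRINT DATA=MIDP; RUN;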

5.3 Large Sample Test

As noted in Chapter 4, the main difficulty in carrying out the exact conditional tests is the sheer number of calculations involved. One must generate all I x J arrays of nonnegative integers having the same marginal totals $M_1$, $M_2$ as the given table; call this set S. The conditional probability in (5.3), as well as the test statistic X², must be computed for each such table in S. The number of tables in S, denoted by |S|, increases very rapidly as a function of the sample size N, I, and J. The |S| arrays generated are used to compute values of a GOF statistic such as X².

For relatively large table dimensions, |S| is too large for the practical implementation of the exact tests, even when asymptotic approximations would be crude. Discussion of the problem of enumerating the number of tables consistent with the marginal totals is provided by Good (1976), Agresti and Wackerly (1977), Gail and Mantel (1977), and Agresti et al. (1979). To illustrate this, Klotz and Teng (1977) gave |S| as 12,798,781 for a 4 x 4 table with sample size N = 56.

Earlier algorithms for exact enumeration of an I x J table include March's (1972) algorithm 434, which enumerates all tables and conducts exact conditional tests, and Baker's (1977) algorithm AS 112, which is applicable to two or one fixed margins or just a fixed sample size. More recent algorithms on the same subject are those by Mehta et al. (1983, 1990). The statistical software StatXact uses both exact and simulation procedures to conduct the conditional tests for large sample sizes. Senchaudhuri et al. (1993) have proposed a "smart" method of control variates, or Monte Carlo rescue, for estimating pvalues for the exact test in an I x J table.

Agresti (1992) has given a recent review of two- and three-dimensional exact methods, and some of the important issues in exact inference for higher dimensional tables are discussed in Kreiner (1992). An exact test of significance for the 2³ contingency table is provided (with the relevant FORTRAN algorithm) by Zelterman et al. (1993).

One solution to the exact test computational problem is to employ results concerning the convergence of the multivariate hypergeometric distribution to the normal distribution in large samples.

Let $\mathbf{m}' = (m_{11}, m_{12}, \ldots, m_{IJ})$ be the vector of expected values of the $n_{ij}$ under $H_0$, that is,
$$m_{ij} = \frac{N_{i+}N_{+j}}{N}$$
and let A be the matrix of appropriately positioned 1s and 0s such that the matrix product An eliminates the last column and last row of the frequencies in an I x J table. Here $\mathbf{I}_U$ denotes a U x U identity matrix, $\mathbf{0}_U$ denotes a (U x 1) vector of 0s, and ⊗ denotes Kronecker product multiplication. Thus A is an $[(I-1)(J-1) \times IJ]$ matrix, and if we let
$$\mathbf{G} = \mathbf{A}(\mathbf{n} - \mathbf{m})$$
be the vector of $(I-1)(J-1)$ differences of the observed and expected values under $H_0$, then the variances and covariances of G can be obtained from (5.7).


We observe that for the data of Example 5.2 below, for instance, A is a 2 x 6 matrix of this form (see Koch et al., 1982, and Koch and Bhapkar, 1982, for details).

Then for large N, such that $m_{ij} > 1$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$, it can be shown that
$$\mathbf{G} \sim \mathrm{AMN}(\mathbf{0},\ \mathrm{Var}\{\mathbf{G} \mid H_0\})$$
where AMN denotes asymptotically multivariate normal. Under $H_0$, therefore, it follows that
$$Q = \mathbf{G}'[\mathrm{Var}\{\mathbf{G} \mid H_0\}]^{-1}\mathbf{G}$$
is an $(I-1)(J-1)$ degrees of freedom statistic for testing $H_0$, which follows the χ² distribution asymptotically. Moreover, it can be shown by matrix equivalence that
$$Q = \frac{N-1}{N}\sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(n_{ij} - m_{ij})^2}{m_{ij}} = \frac{N-1}{N}\,X^2$$
Thus for large samples, the Pearson's X² criterion is essentially equivalent to the large sample test derived from the multivariate hypergeometric model.

5.3.1 Example 5.2

For the data in Table 5.3, we have


$$\mathbf{A} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$
That is, A is a 2 x 6 matrix, and hence G is given by:
$$\mathbf{G} = \mathbf{A}(\mathbf{n} - \mathbf{m}) = \begin{pmatrix} n_{11} \\ n_{21} \end{pmatrix} - \begin{pmatrix} 2 \\ 9 \end{pmatrix} = \begin{pmatrix} 2 \\ 10 \end{pmatrix} - \begin{pmatrix} 2 \\ 9 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
where $\mathbf{n}' = (n_{11}, n_{12}, n_{21}, n_{22}, n_{31}, n_{32})$ and $n_{11}$, $n_{21}$ are the pivot cells.

Furthermore, the covariance structure of G under $H_0$ is, from (5.8), computed as
$$\mathrm{Var}\{\mathbf{G} \mid H_0\} = \frac{1}{(40^2)(39)}\begin{pmatrix} (4)(36)(20)(20) & -(4)(18)(20)(20) \\ -(4)(18)(20)(20) & (18)(22)(20)(20) \end{pmatrix} = \begin{pmatrix} 0.9231 & -0.4615 \\ -0.4615 & 2.5385 \end{pmatrix}$$
Consequently,
$$[\mathrm{Var}\{\mathbf{G} \mid H_0\}]^{-1} = \begin{pmatrix} 1.1916 & 0.2166 \\ 0.2166 & 0.4333 \end{pmatrix}$$
From this, we have
$$Q = \mathbf{G}'[\mathrm{Var}\{\mathbf{G} \mid H_0\}]^{-1}\mathbf{G} = (0 \quad 1)\begin{pmatrix} 1.1916 & 0.2166 \\ 0.2166 & 0.4333 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0.4333$$


Alternatively, we can use the fact that under $H_0$ the expected values are given by $m_{ij} = N_{i+}N_{+j}/N$ to compute the Pearson's X² criterion as:
$$X^2 = \frac{(2-2)^2}{2} + \frac{(2-2)^2}{2} + \frac{(10-9)^2}{9} + \frac{(8-9)^2}{9} + \frac{(8-9)^2}{9} + \frac{(10-9)^2}{9} = 0.4444$$
We note from the above results that
$$Q = \frac{39}{40}(0.4444) = 0.4333$$
which agrees with our earlier result.
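The matrix arithmetic itself can be reproduced in PROC IML; the following is a sketch (our own code, not from the text) for this example:

PROC IML;
N_OBS = {2, 10};             * pivot cells n11 and n21;
M = {2, 9};                  * expected values under H0;
G = N_OBS - M;
* covariance matrix of the pivot cells from (5.8), with N = 40;
V = (1/(40**2 * 39)) * ((4*36*20*20 || -(4*18*20*20)) //
    (-(4*18*20*20) || (18*22*20*20)));
Q = G` * INV(V) * G;         * = 0.4333;
PRINT Q;
QUIT;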
The large sample test is implemented in SAS software as:

proc freq data=tab52 order=data;
weight count; tables trt*resp; run;

STATISTICS FOR TABLE OF A BY B

Statistic                         DF    Value    Prob
Chi-Square                         2    0.444   0.801
Likelihood Ratio Chi-Square        2    0.445   0.800
Mantel-Haenszel Chi-Square         1    0.228   0.633
Fisher's Exact Test (2-Tail)             0.897
Sample Size = 40
WARNING: 33% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

The warning in the SAS software output above indicates that almost one-third of the cells have small expected values, and hence the usual χ² large sample approximation may not be valid here. In such a situation, it is often desirable to use Fisher's exact test.

5.3.2 Example 5.3

The data below relate to the distribution of 59 female patients with rheumatoid arthritis who participated in a randomized clinical trial (Koch et al., 1982). Table 5.5 is a two-way table obtained by collapsing over the covariate variable age.
                           Patient response status
Treatment      Excellent   Good   Moderate   Fair   Poor   Total (ni+)
Active             5        11        5        1      5        27
Placebo            2         4        7        7     12        32
Total (n+j)        7        15       12        8     17        59

Table 5.5: Distributions of patient responses by treatment


The model of independence here will be based on 1 x 4 = 4 degrees of freedom. Consequently, we would expect A to be a 4 x 10 matrix such that $\mathbf{A} = [\mathbf{I}_4, \mathbf{0}_{(4,6)}]$. Here $\mathbf{I}_4$ is the 4 x 4 identity matrix (null d.f.) and $\mathbf{0}_{(4,6)}$ is the 4 x 6 matrix of zeros. Thus the matrix G that forms the differences of the observed and expected frequencies is of the form:
$$\mathbf{G} = \begin{pmatrix} 1&0&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0&0 \end{pmatrix}(\mathbf{n} - \mathbf{m})$$
where $\mathbf{n}' = (n_{11}, n_{12}, \ldots, n_{25})$ and the expected values are those relating to the pivot cells $n_{11}, n_{12}, n_{13}$, and $n_{14}$; that is, $\mathbf{m}' = (3.2034, 6.8644, 5.4915, 3.6610)$. The variance-covariance matrix of G would be 4 x 4, and arguments similar to those in the previous example lead to Q being computed as:
$$Q = \mathbf{G}'[\mathrm{Var}\{\mathbf{G} \mid H_0\}]^{-1}\mathbf{G}$$
For these data, it can be shown that Q = 11.73 and hence X² = 11.9322. This value gives a pvalue of 0.0179, which indicates that the null hypothesis is not tenable in this case.
We notice for this example that the response variable is ordinal in nature, and
we could therefore exploit this ordinality in remodeling the data. Following Koch
and Bhapkar (1982), let i/ = (1/1, 1^2, ^3, ^4> ^s)' be a set of scores that conforms to
the ordinal response variable. A possible set of scores is integer scores (1, 2, 3, 4, 5)
or scores whose total sums to zero, e.g., (2, 1,0, 1,2). Other possible scores are
the mid-rank scores, which are employed in StatXact, for instance. With our set
scores obtained, we then compute the test statistic as:
N-l A

.0

where

* = ip; ^ = ^; -d V(,,) =
j=l

l+

j=l

i=l

In the above, fji are the sample mean responses for the two treatment groups, and
rj and V(rf) are the finite population mean and variance of subjects, respectively.
For the data in Table 5.5, ?)i = 2.6296, and 7)2 = 3.7188 with fj = 3.2203 and
V( 77) =1.9684, where for instance

1(5) + 2(11) + 3(5) + 4(1) + 5(5)


1(7) + 2(15) + 3(12) + 4(8) + 5(17)
"1 = -2^- ; ^ = -59-

From the above, we have QRS 8.6764, and it is distributed x2 w^h 1 degree of
freedom (Koch et al., 1982; Koch & Bhapkar, 1982). The corresponding pvalue
is 0.0032, which provides a stronger evidence of rejection of the null than the Q
test based on 4 d.f. Thus, as argued by Koch et al. (1982), the QRS is more
powerful than the Q test, and further, it does not require the stringent condition
that expected values must be large since the test refers to means scores, which are
linear combinations of the n^ rather than the riij themselves. The QRS can be
implemented in SAS software and is produced from the CMH and CMH2 options


in SAS software. However, the test statistic is given in the SAS software output on
the "Row Mean Scores Differ" line. The above result is implemented in SAS software
(with partial output) as follows:
data kh;
input trt $ response $ count @@;
datalines;
active ex 5 active good 11 active mod 5 active fair 1
active poor 5 pla ex 2 pla good 4 pla mod 7 pla fair 7
pla poor 12
;
proc print; run;
proc freq data=kh order=data;
weight count;
tables trt*response/cmh scores=table;
tables trt*response/cmh2 scores=ridit; run;
Summary Statistics for trt by response

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic   Alternative Hypothesis     DF      Value      Prob
    1       Nonzero Correlation         1      8.6751    0.0032
    2       Row Mean Scores Differ      1      8.6751    0.0032
    3       General Association         4     11.7278    0.0195

Summary Statistics for trt by response

Cochran-Mantel-Haenszel Statistics (Based on Ridit Scores)

Statistic   Alternative Hypothesis     DF      Value      Prob
    1       Nonzero Correlation         1      8.7284    0.0031
    2       Row Mean Scores Differ      1      8.7284    0.0031

The SAS software results agree closely with the results obtained earlier. The test
statistic for the general association is the Q test we observed. In either
case, the null hypothesis is not tenable and would be strongly rejected. If we
recognize that the response categories of the data in Table 5.3 are ordinal (severe,
moderate, none), a trend model applied to those data, with severe, moderate, and
none given scores of (3, 2, 1), respectively, yields Q_RS = 0.228
on 1 d.f. with a pvalue of 0.633, again indicating that there is no reason to believe
that there is a trend in the degree of pain expressed in relation to
the type of drug being administered.
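For readers who wish to verify the quantities entering Q_RS directly, the following DATA step sketch reproduces the mean scores and the statistic for Table 5.5 under integer scores (the array and variable names here are arbitrary choices, not part of the original analysis):

data qrs;
  array a{5} _temporary_ (5 11 5 1 5);     /* Active row of Table 5.5  */
  array p{5} _temporary_ (2 4 7 7 12);     /* Placebo row              */
  do j = 1 to 5;                           /* integer scores v_j = j   */
    n1 + a{j};  n2 + p{j};
    s1 + j*a{j};  s2 + j*p{j};
    sv + j*(a{j} + p{j});  sv2 + j*j*(a{j} + p{j});
  end;
  n = n1 + n2;
  eta1 = s1/n1;  eta2 = s2/n2;  eta = sv/n;
  v = sv2/n - eta**2;                      /* finite population variance */
  qrs = ((n - 1)/n)*(n1*(eta1 - eta)**2 + n2*(eta2 - eta)**2)/v;
  put eta1= eta2= eta= v= qrs=;            /* qrs = 8.6764 as in the text */
run;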

5.4 Product Multinomial Probability Model

For data arising from stratified simple random sampling, only the one marginal
distribution M₁ is assumed to be fixed, and again in this case the underlying
probability structure for the observed frequencies may be summarized as shown in
Table 5.6. The probability structure can be written compactly in vector notation as
Π' = (π'_1, π'_2, ..., π'_I), where

π'_i = (π_i1, π_i2, ..., π_iJ),   for i = 1, 2, ..., I.

Here the π_ij are unknown parameters such that 0 < π_ij < 1, and they satisfy
the constraints

Σ_j π_ij = 1   for i = 1, 2, ..., I

Sample      1       2      ...      J      Total
  1       π_11    π_12    ...    π_1J        1
  2       π_21    π_22    ...    π_2J        1
  ⋮
  I       π_I1    π_I2    ...    π_IJ        1

Table 5.6: Probability structure with only M₁ fixed


The corresponding observed table is displayed in Table 5.7. The observed frequencies can also be represented compactly by a vector

n' = (n'_1, n'_2, ..., n'_I),   where   n'_i = (n_i1, n_i2, ..., n_iJ)

and the latter is the vector of observed frequencies from the i-th row of the table.
Under this sampling scheme, n_i follows the multinomial distribution with parameters N_i+ and π_i for i = 1, 2, ..., I. Since these are independent samples, it
follows that the vector n follows the multivariate product multinomial probability
model

P{n | N_i+, Π} = Π_{i=1}^{I} [ N_i+! Π_{j=1}^{J} π_ij^{n_ij} / n_ij! ]     (5.9)

with the constraints Σ_j n_ij = N_i+ and Σ_j π_ij = 1 for i = 1, 2, ..., I.

Sample      1       2      ...      J      Total
  1       n_11    n_12    ...    n_1J      N_1+
  2       n_21    n_22    ...    n_2J      N_2+
  ⋮
  I       n_I1    n_I2    ...    n_IJ      N_I+
Total     N_+1    N_+2    ...    N_+J       N

Table 5.7: Observed table for the underlying probability Table 5.6
5.4.1 Homogeneity Hypothesis

In the framework of (5.9), the usual hypothesis involves the comparison of the
distributions of, say, the column variable (B) across the I subgroups in the sense of
homogeneity for specific categories. In particular, let the common parameter for
the j-th level of B be denoted by π_*j for j = 1, 2, ..., J. Thus the null hypothesis
of row homogeneity can be stated formally as

H₀: π_ij = π_*j   for i = 1, 2, ..., I; j = 1, 2, ..., J     (5.10)

Under H₀ in (5.10), the probability model (5.9) simplifies to

P{n | N_i+, π_*} = Π_{i=1}^{I} [ N_i+! Π_{j=1}^{J} π_*j^{n_ij} / n_ij! ]     (5.11)

The number of degrees of freedom is computed as the total number of cells, minus
the number of independent linear constraints on the observations, minus the number
of independent parameters to be estimated from the data. That is,

Degrees of freedom = number of cells - number of constraints - number of parameters

Specifically, for testing hypothesis (5.10), we have

d.f. = IJ - I - (J - 1) = IJ - I - J + 1 = (I - 1)(J - 1)

since (J - 1) parameters are being estimated and there are I constraints.

5.4.2 Parameter Estimates

For each row indexed by i = 1, 2, ..., I, let the sample proportions be denoted by

p_ij = n_ij / N_i+,   for j = 1, 2, ..., J

Then from previous multinomial results in Chapter 2, we have

E{p_ij} = π_ij   and   Var{p_ij} = π_ij(1 - π_ij)/N_i+

The variance-covariance matrix of p_i can therefore be written as

V(p_i) = [D(π_i) - π_i π'_i] / N_i+

where D(π_i) is the diagonal matrix with the elements of π_i on its diagonal, and

p' = (p'_1, p'_2, ..., p'_I),   with   p'_i = (p_i1, p_i2, ..., p_iJ)

Now under H₀, it can be shown that an unbiased MLE for the {π_*j} can be obtained
as

p_*j = N_+j / N,   for j = 1, 2, ..., J

Therefore, for each row indexed by i = 1, 2, ..., I, the expected frequencies under
H₀ are given by

m̂_ij = N_i+ p_*j = N_i+ N_+j / N

As a result, under H₀,

E{(p_ij - p_*j) | H₀} = 0,   i = 1, 2, ..., I and j = 1, 2, ..., J     (5.12)

But since Σ_j p_ij = 1 for i = 1, 2, ..., I, without loss of generality only the (I - 1)(J - 1)
linearly independent differences in (5.12) have expected value 0. With the above
results, Pearson's test statistic

X² = Σ_{i=1}^{I} Σ_{j=1}^{J} (n_ij - m̂_ij)² / m̂_ij

can be computed by using the MLE cell estimates under H₀, and the statistic
follows a chi-square distribution approximately, with (I - 1)(J - 1) degrees of freedom, for
sufficiently large N.

5.4.3 Example 5.4

The example below is adapted from Lyman Ott (1984). Independent random samples of 83, 60, 56, and 62 faculty members of a state university system from four
system universities were polled and asked which of three collective bargaining
agents (union 101, union 102, union 103) they preferred. The resulting data are
displayed in Table 5.8.

                    Bargaining agent
University      101     102     103     Total
    1            42      29      12       83
    2            31      23       6       60
    3            26      28       2       56
    4             8      17      37       62
  Total         107      97      57      261

Table 5.8: Observed table for the cross-classification


Interest centers on whether there is evidence of a difference in the distribution of preference across the four state universities. For universities 1 and 2 the
majority preference was for unit 101, while the majority preferences for universities 3
and 4 were for units 102 and 103, respectively.

Analysis

Let n_ij denote the frequency for bargaining agent j = 1, 2, 3 in university
i = 1, 2, 3, 4. Here N_i+ = {83, 60, 56, 62}, for i = 1, 2, 3, 4, is fixed by design. Hence,
E(n_ij) = m_ij = N_i+ π_ij, where Σ_j π_ij = 1 and N_i+ is the total number of faculty sampled in university
i. That is, we let π_ij correspond to the probability of a bargaining agent being
classified as of type j in university i. The hypothesis of homogeneity is:

H₀: π_ij = π_*j,   j = 1, 2, 3


Parameter Estimates

Since we have shown that p_*j = N_+j/N, we therefore have

p_*1 = 107/261 = 0.40996;   p_*2 = 97/261 = 0.37165;   and   p_*3 = 57/261 = 0.21839

The expected values for the column "101", for instance, are respectively

83(0.40996) = 34.0267
60(0.40996) = 24.5976
56(0.40996) = 22.9578
62(0.40996) = 25.4175

and similarly for the remaining two columns. These computed expected values,
together with the observed values, are displayed in Table 5.9.
                            Bargaining agent
University         101             102             103         Total
    1          42 (34.0267)    29 (30.8470)    12 (18.1263)      83
    2          31 (24.5976)    23 (22.2990)     6 (13.1034)      60
    3          26 (22.9578)    28 (20.8124)     2 (12.2298)      56
    4           8 (25.4175)    17 (23.0423)    37 (13.5402)      62
  Total           107              97              57           261

Table 5.9: Observed and expected values (in parentheses) for the data in Table 5.8
Hence, Pearson's X² is computed as

X² = (42 - 34.0267)²/34.0267 + ··· + (37 - 13.5402)²/13.5402 = 75.197

The corresponding likelihood ratio test statistic is G² = 71.991.
In both cases the pvalue = 0.0000, which indicates that we would have to reject
the null hypothesis of no differences in the distribution of preference across the
four universities. That is, we must conclude that bargaining agent preference was not uniform
across the universities. The SAS software implementation for these data is presented
in the next section.
Alternatively, we could compare the X² value of 75.197 with the critical value of the standard chi-square
distribution with (4 - 1)(3 - 1) = 6 degrees of freedom.

Having established the fact that the null hypothesis is untenable, we would
next wish to locate those cells that are not consistent with the null, with a view to
isolating them. We shall endeavor to answer specific questions with regard to the
data above later in this chapter.

5.5 The Full Multinomial Probability Model

If we assume that neither of the marginal distributions M₁ and M₂ is fixed, then the
underlying probability structure for the observed frequencies can be summarized as
in Table 5.10. Data giving rise to this structure are usually derived from a simple
random sample from a bivariate distribution.
Sample      1       2      ...      J      Total
  1       π_11    π_12    ...    π_1J      π_1+
  2       π_21    π_22    ...    π_2J      π_2+
  ⋮
  I       π_I1    π_I2    ...    π_IJ      π_I+
Total     π_+1    π_+2    ...    π_+J        1

Table 5.10: Probability structure under this scheme


Here, the π_ij are unknown parameters such that 0 < π_ij < 1, and they satisfy the
constraint Σ_i Σ_j π_ij = 1. The corresponding observed table of
frequencies is displayed in Table 5.11.
Sample      1       2      ...      J      Total
  1       n_11    n_12    ...    n_1J      N_1+
  2       n_21    n_22    ...    n_2J      N_2+
  ⋮
  I       n_I1    n_I2    ...    n_IJ      N_I+
Total     N_+1    N_+2    ...    N_+J       N

Table 5.11: Observed frequency table for the probability structure in Table 5.10
The observed frequencies can be represented compactly by a vector
n' = (n'_1, n'_2, ..., n'_I), and n follows the multinomial distribution with parameters N and Π, which can
be written as

P{n | N, Π} = N! Π_{i=1}^{I} Π_{j=1}^{J} π_ij^{n_ij} / n_ij!     (5.13)

with the constraints Σ_i Σ_j n_ij = N and Σ_i Σ_j π_ij = 1.

Independence Hypothesis

The null hypothesis of independence is of primary interest here. This hypothesis


can be stated formally as
#o : TTij = 7ri+7T+j, for i = 1, 2, , / and j = 1, 2, , J
(5.14)


Under H₀ as in (5.14), the joint probabilities can be obtained directly as products
of the corresponding marginal probabilities. As a result, the local odds ratios

α_ij = (π_ij π_{i+1,j+1}) / (π_{i,j+1} π_{i+1,j})

all equal 1, and the null hypothesis in (5.14) can equivalently be stated as

H₀: log(α_ij) = 0;   for i = 1, 2, ..., (I - 1); j = 1, 2, ..., (J - 1)

Under H₀, the probability model in (5.13) simplifies to

P{n | N, Π} = N! Π_{i=1}^{I} Π_{j=1}^{J} (π_i+ π_+j)^{n_ij} / n_ij!     (5.15)

and the number of degrees of freedom is given by:

d.f. = IJ - 1 - (I + J - 2) = IJ - I - J + 1 = (I - 1)(J - 1)

since there is one constraint and (I - 1) + (J - 1) parameters are to be estimated from
the data.

5.5.2 Parameter Estimates

For each cell, let the overall sample proportion be

p_ij = n_ij / N,   for i = 1, 2, ..., I; j = 1, 2, ..., J

Then from earlier results on the multinomial in Chapter 2, we again have

E{p_ij} = π_ij;   Var{p_ij} = π_ij(1 - π_ij)/N;   and   Cov{p_ij, p_i'j'} = -π_ij π_i'j' / N

Under H₀, it can be shown that the MLE (though not unbiased) for the π_ij can be
obtained (using arguments similar to those in Chapter 3) as:

π̂_ij = π̂_i+ π̂_+j = (N_i+ N_+j)/N²     (5.16)

so that the MLE of the m_ij can be obtained as

m̂_ij = N π̂_ij = N_i+ N_+j / N

It should be noted here that the hypothesis of independence in any of the three
forms {π_ij, α_ij, log(α_ij)} is linear only on the log scale (i.e., log-linear), and a straightforward
linear test similar to the previous quadratic form statistics is not possible
here. We also note that the m̂_ij above are not unbiased. For the general I x J table, this hypothesis can
be tested by a straightforward extension of the B² statistic
(Lindley, 1964) discussed in Chapter 4 for the 2 x 2 table; the formulation uses
linear contrasts on the log scale.

In general, the hypothesis of independence is quite readily handled in the log-linear model formulation. Finally, we note that the test statistics for H₀ here are identical
to those for H₀ in the product-multinomial scheme, as the m̂_ij are the same in both models
when using either Pearson's criterion X² or the likelihood ratio statistic G².

5.5.3 Example 5.5

Again consider the example in Table 5.8. The null hypothesis of interest, as in (5.14),
can be written as:

H₀: Bargaining agent preference is independent of university
Hₐ: Bargaining agent preference is associated with university

We then have under H₀, for instance, the expected values

m̂_11 = 83 x 107/261 = 34.0268;   m̂_12 = 83 x 97/261 = 30.8467;   ...;   m̂_43 = 62 x 57/261 = 13.5403

The computed expected values are exactly the same as those computed earlier
in Table 5.9 and hence lead to the same X² = 75.197 as before. Once again, our result
indicates that we would have to reject the hypothesis of independence. That is, the
preference for a bargaining agent depends on the university to which the faculty member belongs.

The SAS software program using PROC FREQ to implement the model described above is presented below. The corresponding modified SAS software output
is presented in appendix D.8. The summary statistics from the model fitting are
also presented below.
data example; input unv agent count @@; datalines;
1 1 42  1 2 29  1 3 12  2 1 31  2 2 23  2 3 6
3 1 26  3 2 28  3 3 2   4 1 8   4 2 17  4 3 37
;
proc freq order=data; weight count;
tables unv*agent/chisq expected cellchi2 nocol norow nopercent; run;
Statistics for Table of unv by agent

Statistic                        DF      Value      Prob
Chi-Square                        6     75.1974    <.0001
Likelihood Ratio Chi-Square       6     71.9911    <.0001

The first line in the SAS software output in appendix D.8 gives the observed frequencies, the second line gives the corresponding expected values m̂_ij, while the
third line gives the individual cell contributions to the overall X² of 75.1974.

5.6 Residual Analysis

Having established that there is a need to reject the null hypothesis of independence,
homogeneity, or randomness under sampling scheme III, II, or I, our
next concern is to locate the source or sources of the lack of independence. The simplest
way to do this is to examine one of the following:

(i) The standardized residual z_ij, defined (Haberman, 1978) as

z_ij = (n_ij - m̂_ij) / √m̂_ij


                        H01                    H02
Cells    n_ij     m̂_ij      z_ij        m̂_ij      z_ij
 11       42      34.0       1.4        39.16      0.45
 12       29      30.8      -0.3        35.50     -1.09
 13       12      18.1      -1.4         8.34      1.27
 21       31      24.6       1.3        28.31      0.51
 22       23      22.3       0.1        25.66     -0.53
 23        6      13.1      -2.0         6.03     -0.01
 31       26      23.0       0.6        26.42     -0.08
 32       28      20.8       1.6        23.95      0.83
 33        2      12.2      -2.9         5.63     -1.53
 41        8      25.4      -3.5        13.11     -1.41
 42       17      23.0      -1.3        11.89      1.48
 43       37      13.5       6.4        37.00      0.00
 X²               75.20                 10.756
 G²               71.99                 11.45
 df                6                      5

Table 5.12: Results of fit based on H01 and H02


We note that Σ_ij z²_ij = X², the Pearson test statistic. When the model holds, z_ij is
asymptotically distributed as normal with mean 0 and variance 1. A standardized residual will be said to exhibit lack of fit if |z_ij| > 2.0.

(ii) The adjusted residual r_ij (Haberman, 1973) is the value of the standardized
residual z_ij divided by its estimated standard error. For the test of independence in a
two-way contingency table, it simplifies to

r_ij = (n_ij - m̂_ij) / √[ m̂_ij (1 - p_i+)(1 - p_+j) ]

where, as before, m̂_ij = N p_i+ p_+j. Table 5.12 gives the values of z_ij for the data
in Table 5.8 under the model of independence H01 and the model of quasi-independence
H02. By quasi-independence, we mean that the table exhibits independence
with respect to a reduced or incomplete table (see the definition later).

An examination of the values of the standardized residuals in Table 5.12 under
model H01 shows that cells (2,3), (3,3), (4,1), and (4,3) all have |z_ij| > 2. The
value for cell (4,3) is particularly high. A positive z_ij indicates that more
respondents are observed in that cell than the hypothesis expects. Similarly, a
negative z_ij indicates that the observed number of respondents for that cell is less
than what is expected under the model of interest. Thus for cell (4,3), there are
far more faculty respondents in that category than is expected under the model
of independence. The four identified cells above have been referred to by various
authors as the "rogue" or aberrant cells (Upton, 1982).

Corresponding values of r_ij can similarly be obtained from the expression
for r_ij above. For instance, for cell (4,3) we have

p_4+ = 62/261 = 0.23755;   p_+3 = 57/261 = 0.21839

hence

r_43 = (37 - 13.5403) / √[ 13.5403 (1 - 0.23755)(1 - 0.21839) ] = 8.26

PROC GENMOD in SAS software generates both z_ij and r_ij with the keywords
RESCHI and STDRESCHI, respectively, in the OUTPUT statement.
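For instance, the residuals in Table 5.12 under H01 could be obtained along the following lines (a minimal sketch using the dataset example created in the previous section; the output dataset and the variable names z and r are arbitrary):

proc genmod data=example;
  class unv agent;
  model count = unv agent / dist=poi link=log;   /* independence model */
  output out=resid reschi=z stdreschi=r;         /* z: standardized, r: adjusted */
run;
proc print data=resid; run;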
We now discuss two ways of looking at the above data more critically, with a view
to explaining why the data do not uphold our hypothesis of independence. The
first method is to partition the G² (rather than X²) from the 4 x 3 table into
smaller components. The other method is to fit the model of quasi-independence.

5.7 Partitioning of the G² Statistic

We now wish to decompose the G² (likelihood ratio test) statistic, which was
based on 6 d.f. in our analysis above. Specifically, we wish to subdivide the 6 d.f.
such that one degree of freedom relates in particular to cell (4,3). Since there is
more than one cell with |z_ij| > 2, we usually start by partitioning on the cell with
the highest |z_ij|, which in this case happens to be cell (4,3). Decomposition of G²
requires some experience, and a decomposition alternative to the one adopted for this
particular problem may lead to different conclusions. Maxwell (1961) and Iverson
(1979) give detailed treatments of the technique, and we give below some rules that
must be satisfied.

1. If there are g degrees of freedom for the original table, then no more than g subtables can be formed. In this case there cannot be more than 6 such
subtables.

2. Each observed cell frequency in the original table must appear as a cell
frequency in one and only one subtable.

3. Each marginal total of the original table must appear as a marginal total of
one and only one subtable.

4. Subtable cell frequencies not appearing in the original table must appear as
marginal totals in a different subtable. Marginal totals not appearing in the
original table must appear as either cells or grand totals.

Based on our observations above, Tables 5.13 to 5.16 give a decomposition of the
observed G² for the original table into components that satisfy rules 1 to 4 above.
The values of the goodness-of-fit statistic G² for each of the subtables 5.13 to
5.16 are displayed and summarized in Table 5.17. We note that the components
sum exactly to 71.9910, which is the G² for the original 4 x 3 table.
The decomposition works for the G² statistic because it can be readily decomposed additively; this does not work with X². To see the decomposing ability of G²
(here into four components), suppose we expand G² as follows (the expansion appears after Tables 5.13 to 5.16):


                 Bargaining agent
University    101 & 102    103    Total
   1–3           179        20     199
    4             25        37      62
  Total          204        57     261

Table 5.13: Dichotomized university and agent, isolating cell (4,3)

                 Bargaining agent
University      101     102    Total
    1            42      29      71
    2            31      23      54
    3            26      28      54
  Total          99      80     179

Table 5.14: Units 101 and 102, universities 1–3


                 Bargaining agent
University      101     102    Total
   1–3           99      80     179
    4             8      17      25
  Total         107      97     204

Table 5.15: Bargaining units by dichotomized university


                 Bargaining agent
University    101 & 102    103    Total
    1            71         12      83
    2            54          6      60
    3            54          2      56
  Total         179         20     199

Table 5.16: Bargaining units by universities (1–3)

G² = 2 Σ_{i=1}^{I} Σ_{j=1}^{J} n_ij ln(n_ij / m̂_ij)

Because the logarithm turns the products of marginal proportions in the m̂_ij into
sums, this overall G² splits exactly into the sum of the G² values of the component
subtables formed under rules 1 to 4.
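As an illustration, the G² = 60.5440 entry for Table 5.13 in Table 5.17 below can be verified directly with PROC FREQ; in this sketch the level labels u123, u4, a12, and a3 are arbitrary names for the collapsed categories:

data sub13;
  input unv $ agent $ count @@;   /* the 2 x 2 subtable of Table 5.13 */
  datalines;
u123 a12 179  u123 a3 20  u4 a12 25  u4 a3 37
;
proc freq data=sub13; weight count;
  tables unv*agent / chisq;       /* "Likelihood Ratio Chi-Square" is G² */
run;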

The results in Table 5.17 indicate that the model of independence is quite tenable
for Tables 5.14 and 5.16, and for Table 5.15 (though not as strongly as for the other two,
at α = 0.01), but certainly not tenable for Table 5.13. It is evident therefore that only

Table     df       G²       pvalue
 5.13      1     60.5440    0.0000
 5.14      2      1.6378    0.4411
 5.15      1      4.8441    0.0277
 5.16      2      4.9660    0.0835
 Total     6     71.9910    0.0000

Table 5.17: Results of the decompositions (Tables 5.13 to 5.16)


the (101 & 102 versus 103) by (universities 1–3 combined versus university 4) table, Table 5.13, is significant. We
can thus recombine the remaining tables to simplify our final summary. Tables 5.14
and 5.16 are now recombined to produce Table 5.18.
                 Bargaining agent
University      101     102     103    Total
    1            42      29      12      83
    2            31      23       6      60
    3            26      28       2      56
  Total          99      80      20     199

Table 5.18: Bargaining units by universities (1–3)


The final decomposition goodness-of-fit G² values for these subtables are
as displayed in Table 5.19.

Table     df       G²       pvalue
 5.13      1     60.5440    0.0000
 5.15      1      4.8441    0.0277
 5.18      4      6.6038    0.1584
 Total     6     71.9910    0.0000

Table 5.19: Results of the final decompositions


Our conclusions in light of the above analyses are:

(i) Table 5.18 shows that the preference distribution is homogeneous across universities 1 to 3.

(ii) At α = 0.01, Table 5.15 shows that preference for bargaining units 101 and 102
was homogeneous between universities 1–3 combined and university 4.

(iii) Table 5.13 shows that preference for a bargaining unit is independent of the
faculty member's university, with the exception that if a faculty member belongs to
university 4, then he or she is much more likely than would otherwise have
been expected to show preference for bargaining unit 103 (and vice versa).

5.8 The Quasi-Independence Model

In our original analysis, we were interested in either the hypothesis of independence
or the hypothesis of homogeneity. We showed that when either hypothesis holds,
the expected values and the values of the test statistics are equivalent for the two
situations. Under the model of independence, we recall that

H01: π_ij = π_i+ π_+j     (5.17)

From the previous analysis, we recognize that this model does not fit the data.
A natural extension of this hypothesis therefore is to ask whether the responses
(preferences) are independent for most, but not all, of the cells. That is, suppose we
exclude cell (4,3) from the model; would the table now exhibit independence on
this reduced (or incomplete) table? The latter hypothesis is given by

H02: π_ij = π_i+ π_+j   for (i,j) ≠ (4,3)     (5.18)

and will be termed the hypothesis of quasi-independence.

5.8.1 Computing Expected Values for Incomplete Tables

When rows and columns correspond to two variables, it is sometimes impossible to
observe particular category combinations. If such a logical impossibility occurs in cell
(i,j), we say that we have a structural zero with π_ij = m_ij = 0, and we refer to the
array of nonzero cells as an incomplete table. For example, if cell (4,3) were removed
from our example data above, then the resulting Table 5.20 would be described as an
incomplete table, where for brevity we have coded "101" as 1, "102" as 2, and
"103" as 3, respectively.
                      Unit
University      1       2       3     Total
    1           42      29      12      83
    2           31      23       6      60
    3           26      28       2      56
    4            8      17       –      25
  Total        107      97      20     224

Table 5.20: Incomplete table with cell (4,3) deleted


We notice that in this incomplete table, the marginal totals have been correspondingly reduced
by the amount of the excluded cell count (in this case, 37). In order
to calculate the expected values under the new hypothesis of quasi-independence
in the incomplete table, Bishop et al. (1975) have enunciated several procedures. In
particular, their rule 3 for block-triangular tables can be employed to calculate expected values for this case. Other patterns of incomplete tables exist, and for these
other cases calculating the expected frequencies can be very time-consuming; it
would be reasonable to employ any of the well-known computer algorithms in
such situations. For now we consider the case in which the pattern is of type 3, the
block-triangular table.

5.8.2 Block-Triangular Tables

We say that an incomplete table is in block-triangular form if, after suitable permutation of rows and columns, δ_ij = 0 implies δ_kl = 0 for all k > i and l > j, where
δ_ij = 1 for (i,j) ∈ S, the set of cells that do not contain structural zeros, and
δ_ij = 0 for (i,j) ∉ S.

Examples

As an illustration of tables of this form, consider the following pattern, in which m_ij
marks an observable cell and a blank marks a structural zero:

m_11  m_12  m_13  m_14
m_21  m_22  m_23  m_24
m_31  m_32  m_33
m_41

We call tables like this block-triangular, because the non-structural-zero cells,
after suitable permutation of rows and columns, form a right-angled triangle with
a block of cells lying along the hypotenuse of the triangle.
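The defining condition can also be checked mechanically. The following PROC IML sketch tests a zero pattern for the block-triangular property; the 4 x 4 pattern shown is a hypothetical illustration, not one of the patterns from the text:

proc iml;
  /* 1 = observable cell, 0 = structural zero (hypothetical pattern) */
  delta = {1 1 1 0,
           1 1 1 0,
           1 1 0 0,
           1 0 0 0};
  ok = 1;
  do i = 1 to nrow(delta);
    do j = 1 to ncol(delta);
      if delta[i,j] = 0 then
        do k = i+1 to nrow(delta);      /* every cell strictly below ... */
          do l = j+1 to ncol(delta);    /* ... and strictly to the right */
            if delta[k,l] = 1 then ok = 0;
          end;
        end;
    end;
  end;
  print ok;    /* ok = 1 when the pattern is block-triangular */
quit;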
Returning to our data, we recognize that the incomplete table of the previous
section, with cell (4,3) presumed to be a structural zero, satisfies the concept of a
block-triangular table. Hence the expected values are computed as follows. First,
we compute the expected values for the cells adjacent to the structural zero, that is,
cells (4,2) and (3,3):

m̂_42 = (25 x 97)/(224 - 20) = 11.887
m̂_33 = (20 x 56)/(224 - 25) = 5.628

Because of the marginal constraints, we thus have

m̂_41 = 25 - 11.887 = 13.113

Next, the marginal totals are adjusted to reflect these estimates; we now have the
reduced table with the corresponding reduced marginal totals (carried forward
from the previous calculations).

               1         2         3       Total
  1           42        29        12        83
  2           31        23         6        60
  3           26        28         –      50.372
New Total   93.887    85.113    14.372    193.372

Again,

m̂_32 = (50.372 x 85.113)/(193.372 - 14.372) = 23.952
m̂_23 = (60 x 14.372)/(193.372 - 50.372) = 6.030

Because of the marginal constraints again, we thus have

m̂_31 = 50.372 - 23.952 = 26.420   and   m̂_13 = 14.372 - 6.030 = 8.342

And finally, we have the reduced 2 x 2 table after adjusting for the marginal totals:

              1          2        Total
  1          42         29       74.658
  2          31         23       53.970
Total      67.467     61.161    128.628

Hence,

m̂_11 = (74.658 x 67.467)/128.628 = 39.159

Again, because of the new marginal constraints, we now have

m̂_12 = 74.658 - 39.159 = 35.499
m̂_21 = 67.467 - 39.159 = 28.308   and
m̂_22 = 53.970 - 28.308 = 25.662
These expected values are displayed earlier in Table 5.12; the computed G² = 11.45
and is based on 5 d.f. In general, the number of degrees of freedom is given by

d.f. = (I - 1)(J - 1) - number of excluded cells

and in our case we have (6 - 1) = 5 d.f. The corresponding pvalue is 0.043, which
indicates that we would fail to reject the hypothesis of quasi-independence at α = 0.01.

The above procedure for obtaining cell estimates for incomplete tables can be
applied to any table with a block-triangular structure. For other structures, the
extensive treatment provided in Bishop et al. (1975) will be found to cover most
of the possible structures that occur. Of course, if we have several cells with
structural zeros, the cell estimation may not be as straightforward as in the preceding
example, and iterative procedures would have to be employed in that situation (see
Chapter 6). In any case, any of the standard statistical packages will handle this
situation. Other forms of hypotheses relating to incomplete tables will be discussed later
in Chapter 6.

Because of the extensive calculations involved above in obtaining the expected
frequencies, there are standard algorithms for computing these values under the
model of quasi-independence. We give the SAS software program for fitting the
quasi-independence model using PROC GENMOD.


DATA EXAMPLE;
DO UNV=1 TO 4;
  DO AGENT=1 TO 3;
    INPUT COUNT @@;
    IF UNV EQ 4 AND AGENT EQ 3 THEN WT=0;
    ELSE WT=1; OUTPUT;
  END;
END;
DATALINES;
42 29 12 31 23 6 26 28 2 8 17 37
;
*** fits the model of independence ***;
PROC GENMOD;
CLASS UNV AGENT;
MODEL COUNT=UNV AGENT/DIST=POI LINK=LOG; RUN;
*** fits the model of quasi-independence ***;
PROC GENMOD DATA=EXAMPLE;
CLASS UNV AGENT WT;
MODEL COUNT=UNV AGENT WT/DIST=POI LINK=LOG;
RUN;
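Note the role of the variable WT in the second model: it equals 0 only for the excluded cell (4,3), so entering WT as a CLASS factor gives that single cell its own parameter. The cell is then fitted exactly (its residual is zero in the output that follows), and the remaining eleven cells are modeled as independent, which is precisely hypothesis H02 on (I - 1)(J - 1) - 1 = 5 degrees of freedom.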
INDEPENDENCE MODEL

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value     Value/DF
Deviance                6     71.9911     11.9985
Pearson Chi-Square      6     75.1973     12.5329

UNV   AGENT   COUNT      Pred      Reschi    Streschi
 1      1       42     34.0268     1.3669      2.1547
 1      2       29     30.8467    -0.3325     -0.5079
 1      3       12     18.1265    -1.4390     -1.9709
 2      1       31     24.5977     1.2909      1.9150
 2      2       23     22.2988     0.1485      0.2134
 2      3        6     13.1035    -1.9624     -2.5293
 3      1       26     22.9579     0.6349      0.9326
 3      2       28     20.8123     1.5756      2.2427
 3      3        2     12.2299    -2.9252     -3.7334
 4      1        8     25.4176    -3.4548     -5.1508
 4      2       17     23.0422    -1.2587     -1.8185
 4      3       37     13.5403     6.3754      8.2586
QUASI-INDEPENDENCE MODEL

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value     Value/DF
Deviance                5     11.4471      2.2894
Pearson Chi-Square      5     10.7552      2.1510

UNV   AGENT   COUNT      Pred      Reschi    Streschi
 1      1       42     39.1590     0.4540      0.7877
 1      2       29     35.4993    -1.0908     -1.8170
 1      3       12      8.3417     1.2666      1.7492
 2      1       31     28.3077     0.5060      0.8139
 2      2       23     25.6621    -0.5255     -0.8117
 2      3        6      6.0302    -0.0123     -0.0155
 3      1       26     26.4205    -0.0818     -0.1300
 3      2       28     23.9513     0.8273      1.2626
 3      3        2      5.6281    -1.5293     -1.9022
 4      1        8     13.1127    -1.4119     -2.1859
 4      2       17     11.8873     1.4829      2.1859
 4      3       37     37.0000     0.0000        .

The expected values obtained from SAS software are displayed in Table 5.12 under
hypothesis H02. We shall elaborate more on the use of the above SAS software
code after our discussion of log-linear models in Chapter 6. We shall also revisit


the analysis of incomplete tables by discussing current approaches for analyzing such
data in the next chapter.
The corresponding SAS software program using PROC CATMOD is also
displayed below:
data new;
set example;
if unv eq 4 and agent eq 3 then count=0;
proc catmod; weight count;
model unv*agent= _response_/ml nogls freq pred=freq
noparm noresponse;
loglin unv agent; run;

5.9 Problems with Small Expected Frequencies

As discussed in Chapter 4, when the sample size in a contingency table is small, the
resulting expected values may become correspondingly small, and consequently the
asymptotic chi-square approximation may not be valid. Specifically, several authors have
considered the chi-square approximation to Pearson's X² test statistic under the hypothesis
of independence: Hommel (1978), Larntz (1978), Koehler (1986), Lawal and Upton
(1984), Lawal (1989b), and Lawal and Upton (1990) are but a few of them.

The approximation becomes suspect because, as Lawal and Upton observed, in
most such cases the true significance level differs appreciably from the nominal one.
Lawal and Upton therefore sought a value k such that

Pr{X² ≥ k χ²_d(α)} = α

Their results suggest that k varies with α, and they recommended that for any
I x J contingency table containing N observations, Pearson's X² should be
calculated as usual, with expected values based on the independence model, and its
value compared with

(1 - 3/(2N)) χ²_d(0.01)   at the 1% level, and

{1 - (1/N)(1 - d^{-1/2})} χ²_d(0.05)   at the 5% level

where χ²_d(α) is the upper 100α% critical value of a chi-square distribution with d = (I - 1)(J - 1)
degrees of freedom.
The following restrictions apply to the use of these procedures:

(i) The dimensions of the table should satisfy I, J ≥ 2.

(ii) The average cell frequency should be greater than 0.5; that is, N/(IJ) > 0.5.

(iii) The smallest cell expected value, m, should satisfy the inequality

m > s d^{-3/2}

where s is the number of cells having expected frequencies less than 3.

5.9.1 Example 5.6

The following data relate to a survey conducted by a video rental store of its customers. The two responses of most interest to the store were the customer's frequency
of renting (coded 1 for lowest frequency and 4 for highest frequency) and the customer's
rating of the adequacy of the store's selection (coded 1 for poorest rating and 4 for
highest rating).

                                Adequacy
Freq.         1            2            3            4        Total
  1        1 (4.38)     4 (11.94)   37 (35.04)   44 (34.64)     86
  2        2 (3.41)     6 (9.31)    30 (27.30)   29 (26.99)     67
  3        3 (2.04)     8 (5.56)    16 (16.30)   13 (16.11)     40
  4        5 (1.17)    12 (3.19)     5 (9.37)     1 (9.26)      23
Total        11           30           88           87         216

Values in parentheses are the expected frequencies under the model of independence.
We note the following:

(a) X² = 61.04 and G² = 53.35.

(b) The average observed cell frequency is 216/16 = 13.5.

(c) There are two expected values (1.17 and 2.04) that are less than 3.0. Hence,
by rule (iii) above, the smallest expected value allowed is 2(9)^{-3/2} = 0.074.
The above two expected values satisfy this condition. At α = 0.05 we should
therefore compare the computed X² with

{1 - (1/216)(1 - 9^{-1/2})} χ²₉(0.05) = 0.9969 x 16.92 = 16.87

Based on the above, we would therefore reject the hypothesis that the two classificatory variables are independent. A StatXact test based on 10,000 simulations gives
a pvalue of 0.0000, indicating once again that the hypothesis would have to be rejected. A quasi-independence model may be needed after suitable residual analysis
and diagnostics.
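The adjusted critical values are easily computed with the SAS function CINV; the following sketch (variable names arbitrary) reproduces the 5% comparison value of Example 5.6 above:

data crit;
  n = 216; d = 9;                     /* Example 5.6: 4 x 4 table    */
  k01 = 1 - 3/(2*n);                  /* multiplier at the 1% level  */
  k05 = 1 - (1 - 1/sqrt(d))/n;        /* multiplier at the 5% level  */
  crit01 = k01*cinv(0.99, d);
  crit05 = k05*cinv(0.95, d);         /* 0.9969 x 16.92 = 16.87      */
  put k05= crit05= k01= crit01=;
run;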

5.10 Association Measures in Two-Way Tables

Discussions of the global and local odds ratios are presented in appendix
D.9. In this section, we discuss concordance and discordance in the context of an
I x J contingency table. An example is also presented.

5.10.1 Concordance and Discordance

For two variables A and B in a two-way table, we say that a pair of observations
is concordant if the subject that ranks higher on variable A also ranks higher on
variable B. Similarly, a pair is said to be discordant if the subject that ranks higher
on A ranks lower on B. Formulae for calculating the number of concordant pairs
(C) and the number of discordant pairs (D) are given by Agresti (1984) as:

C = Σ_{i<k} Σ_{j<s} n_ij n_ks;      D = Σ_{i<k} Σ_{j>s} n_ij n_ks     (5.19)

where for C the first summation is over all pairs of rows i < k and the second
summation is over all pairs of columns j < s. Computationally, for a two-way table
having I rows and J columns, C and D are built up by pairing each cell (i,j) with
every cell in a lower row, counting cells in higher columns toward C and cells in
lower columns toward D; the following example illustrates the procedure.

Example 5.7

The data in Table 5.21 are taken from Agresti (1984) and relate to a comparison of
four (ordinal) different operations for treating duodenal ulcer patients.

                     Dumping severity
Operation     None    Slight    Moderate    Total
    A          61       28          7         96
    B          68       23         13        104
    C          58       40         12        110
    D          53       38         16        107
  Total       240      129         48        417

Table 5.21: Data on duodenal ulcer patients


For the data in Table 5.21, we have

C = n_11(n_22 + n_23 + n_32 + n_33 + n_42 + n_43) + n_12(n_23 + n_33 + n_43)
  + n_21(n_32 + n_33 + n_42 + n_43) + n_22(n_33 + n_43) + n_31(n_42 + n_43) + n_32(n_43)

D = n_12(n_21 + n_31 + n_41) + n_13(n_21 + n_22 + n_31 + n_32 + n_41 + n_42)
  + n_22(n_31 + n_41) + n_23(n_31 + n_32 + n_41 + n_42) + n_32(n_41) + n_33(n_41 + n_42)

and M = [I(I - 1)/2][J(J - 1)/2] gives the number of product terms in each of the
expressions for C and D for general values of I and J. In this example, M = 6 x 3 = 18.
Similarly, we define T_A to be the total number of pairs of observations for which
i = i'; it is computed as

T_A = ½ Σ_i n_i+ (n_i+ - 1)

T_B is defined as the total number of pairs of observations for which j = j', and it is
computed as

T_B = ½ Σ_j n_+j (n_+j - 1)

Also, T_AB equals the total number of pairs of observations for which both i = i' and j = j';
it is computed as

T_AB = ½ Σ_i Σ_j n_ij (n_ij - 1)

We note here that

C + D + T_A + T_B - T_AB = N(N - 1)/2

We now describe three measures of association that are all based on the notion of
concordant and discordant pairs for an I x J ordered table having variables A and
B, respectively, and where we shall assume that category 1 ranks higher than category
2, and so forth.
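The double sums in (5.19) can also be evaluated by brute force. The following PROC IML sketch, which simply mirrors the definition cell by cell (it is illustrative rather than efficient), reproduces C = 21434, D = 15194, and γ = 0.1704 for Table 5.21:

proc iml;
  n = {61 28  7,
       68 23 13,
       58 40 12,
       53 38 16};                 /* Table 5.21 */
  C = 0; D = 0;
  do i = 1 to nrow(n)-1;
    do j = 1 to ncol(n);
      do k = i+1 to nrow(n);      /* pair (i,j) with every lower row */
        do s = 1 to ncol(n);
          if s > j then C = C + n[i,j]*n[k,s];        /* concordant */
          else if s < j then D = D + n[i,j]*n[k,s];   /* discordant */
        end;
      end;
    end;
  end;
  gamma = (C - D)/(C + D);
  print C D gamma;
quit;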

5.10.2 Goodman and Kruskal's γ

The gamma measure proposed by Goodman and Kruskal (1954) is defined as:

γ = (C - D)/(C + D)     (5.20)

The range of gamma is -1 ≤ γ ≤ 1, and under independence γ = 0. However,
γ = 0 does not necessarily imply that A and B are independent. For the data in
Table 5.21, C = 21434 and D = 15194, and hence

γ = (21434 - 15194)/(21434 + 15194) = 0.1704

5.10.3 Somers's d

Somers (1962) proposed a measure that is a variation of γ and is said to be more
appropriate when one of the variables, say B, is dependent on the other variable.
This statistic, defined as d_BA, is given by:

d_BA = (C - D)/(C + D + T_B)     (5.21)

The statistic is interpreted in Upton (1978) as

    the difference between the probabilities of like and unlike orders for
    two observations chosen at random from the population, conditional on
    their not having tied ranks for variable A.

Again, for the data in Table 5.21, we have T_B = 38064 and hence d_BA = 0.084.
The corresponding statistic that assumes that A is the dependent variable is
d_AB = 0.107, since T_A = 21582. Also, T_AB = 9538.

5.10.4 Kendall's τ

This statistic is defined as:

τ = 2(C - D)/√[(C + D + T_A)(C + D + T_B)]     (5.22)

Similarly, for the data in Table 5.21, we have τ = 0.189.


It has been advocated that the gamma statistic be used because of its ease of
computation and interpretation and, more importantly, when the two variables are of equal
interest. However, gamma tends to depend on the number of categories and the
way these categories are defined. Consequently, Somers's d_BA is particularly useful
if one of the variables is dependent; it is also particularly useful for the general
2 x J tables in which the column variable B is an ordinal response variable. We can
implement the various measures in SAS software as follows:

DATA MEASR;
DO A=1 TO 4;
  DO B=1 TO 3;
    INPUT COUNT @@; OUTPUT;
  END;
END;
DATALINES;
61 28 7 68 23 13 58 40 12 53 38 16
;
PROC FREQ; WEIGHT COUNT; TABLES A*B /MEASURES;
TITLE 'ASSOCIATION MEASURES'; RUN;
Association Measures

Statistics for Table of A by B

Statistic                                Value       ASE
Gamma                                   0.1704     0.0647
Kendall's Tau-b                         0.1108     0.0423
Stuart's Tau-c                          0.1077     0.0412
Somers' D C|R                           0.0958     0.0366
Somers' D R|C                           0.1282     0.0490
Pearson Correlation                     0.1222     0.0478
Spearman Correlation                    0.1263     0.0482
Lambda Asymmetric C|R                   0.0000     0.0000
Lambda Asymmetric R|C                   0.0456     0.0395
Lambda Symmetric                        0.0289     0.0253
Uncertainty Coefficient C|R             0.0140     0.0083
Uncertainty Coefficient R|C             0.0094     0.0056
Uncertainty Coefficient Symmetric       0.0113     0.0067

Sample Size = 417
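For instance, an approximate 95% confidence interval for γ based on the reported ASE is 0.1704 ± 1.96(0.0647) = (0.044, 0.297), which points to a weak positive association between operation and dumping severity.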

The special case of the 2 x r ordered scored responses is presented in appendix D.10.
Examples relating to this are also presented in the appendix.

5.11 Exercises

1. The data for this exercise relate people's attitudes on the government's role in
guaranteeing jobs to their votes in state and local elections (Reynolds, 1977).

Voting in 1956 State and Local Elections by Attitudes on
Government-Guaranteed Jobs

                              Attitude
Vote            Liberal    Moderate    Conservative
Democratic        312         34           115
Even              159         24           110
Republican        210         32           137

Fit a model of independence to the above data and conduct an analysis of
residuals. Based on your residual analysis, partition the table into suitable
components and again test for independence. Draw your conclusions.

2. Refer again to the data above, and fit an appropriate model of
quasi-independence.

3. For the data on voting attitudes, obtain Goodman and Kruskal's γ, Kendall's τ, and Somers's
d measures of association. Interpret these measures.
4. A survey of drivers was conducted to compare the proportions who use seatbelts regularly across various age categories. These data are displayed in the
accompanying table.

                   Regularity of seatbelt usage
Age Group     Always    Regularly    Sometimes    Never
 16–20           1          19           10         70
 21–25           4          80            8          8
 26–30           8          77            5         10
  >30           15          30           49          6

Analyze the data (including residual analysis) and draw conclusions.


5. The data in the next table show a cross-classification of level of smoking and
myocardial infarction for a sample of young women (Agresti, 1989; Shapiro
et al., 1979). Conduct a test of independence for the data, and comment on
your results.

                          Smoking Level (Cigarettes/Day)
Patients                      0       1–24      ≥ 25
Control                      25        25        12
Myocardial infarction         0         1         3

6. The data below refer to 264 marriages in Surinam (Speckmann, 1965). Here
husbands and wives are categorized in terms of four religious groups
(C = Christian, M = Moslem, S = Sanatan Dharm, A = Arya Samaj). S and A
are two Hindustani religious groups.

Marriage in Surinam (Speckmann, 1965)

                       Wives
Husbands      C      M      S      A     Total
   C          17      1      4      3      25
   M           1     66      4      2      73
   S           5      4     96     14     119
   A           4      2     18     23      47
 Total        27     73    122     42     264

(a) Fit the independence model.
(b) Conduct a residual analysis.
(c) Can you fit a quasi-independence model to these data?
7. The data in this example relate to the classification of 2000 sixth-grade children by school performance and weight category.

                           Underweight    Normal    Overweight    Obese
Poor at school                  36          160          65         50
Satisfactory at school         180          840         300        185
Above average                   34          100          35         15

Fit the independence model to the data and perform residual analysis.
8. For the Surinam marriage data, obtain Kendall's, Somers's, and Goodman and
Kruskal's measures of association and interpret them. Relate these to your
results above.

Chapter 6

Log-Linear Models for Contingency Tables

6.1 Introduction

The concept of log-linear analysis of contingency tables is analogous to the concept
of analysis of variance (ANOVA) for continuously distributed factor-response variables. While the response observations are assumed to be continuous with underlying
normal distributions in ANOVA, log-linear analysis assumes that the response
observations are counts having Poisson distributions.

Basically, what we have done in the preceding chapters is to analyze simple (that
is, two-factor or factor-response) two-way contingency tables, where the emphasis is on
whether the classificatory variables are homogeneous or independent. The
methods discussed in Chapter 5 cannot readily be extended to situations
where there are more than two underlying variables. Consequently, in this chapter,
we develop a new methodology that will enable us to study the various interactions in
multiway contingency tables.

As discussed in Chapter 4, the odds ratio is invariant under interchange of rows
and columns. This property of the odds ratio makes it very attractive for use,
especially when the dependent variable (i.e., the response) is not obvious. The
invariance property ensures that the odds ratio can be utilized for studies of independence. The log-linear parameterization, which models association or dependence
in terms of odds ratios, is therefore suitable for the analysis of multiway
contingency tables.

In this chapter, the odds ratio will be used to model multiple-response-structure contingency tables. The log-linear model formulations (Goodman, 1970, 1971a, 1972a,b;
Bishop et al., 1975; Haberman, 1978) are similar to analysis of variance
(ANOVA) techniques, except that the models are applied to the natural logarithms
of the multinomial probabilities or expected frequencies.

We begin by examining the log-linear model for a 2 x 2 table, then extend the
formulation to a general I x J table, and finally to a general multiway contingency
table.
6.2 The 2 x 2 Table

If for a 2 x 2 table we assume a multinomial sampling scheme, then only the sample
size n is fixed, and the observed frequencies n_ij follow a multinomial distribution
with parameters π_ij and n. Consider the following example on the treatment of angina.
In a study to evaluate the effectiveness of the drug Timolol (Samuels & Witmer,
1999) in preventing angina attacks, patients were randomly allocated to receive a
daily dosage of either Timolol or placebo for 28 weeks. The numbers of patients
who became free of angina attacks are displayed in Table 6.1.

                   Response
Treatment     Free    Not Free    Total
Timolol        44       116        160
Placebo        19       128        147
Total          63       244        307

Table 6.1: Response to angina treatment


Let T and R represent the treatment and response variables, respectively. Let the
joint probability that an observation falls in category i of variable T and category
j of variable R be π_ij. That is, P(T = i, R = j) = π_ij > 0 for i, j = 1, 2. Let
treatment T be indexed by i = 1, 2 for Timolol and placebo, respectively. Similarly,
let the response variable R be indexed by j = 1, 2 for free and not free, respectively.
Under the multinomial sampling scheme, the expected frequency is m_ij = n π_ij,
where n is the sample size, and the natural logarithm of the expected value, ln(m_ij),
denoted henceforth by l_ij, is:

l_ij = ln(n) + ln(π_ij)

Since the first term on the right-hand side of the above expression is constant for
fixed n, the model can be reexpressed in terms of either the underlying probabilities or the expected frequencies. The latter is often used, because the maximum
likelihood estimators are then in terms of the observed frequencies n_ij.

The log-linear formulation for the 2 x 2 table in Table 6.1 in terms of l_ij is:

l_ij = μ + λ_i^T + λ_j^R + λ_ij^TR     (6.1)

where the terms on the right-hand side of the above model are the parameters and
correspond, in order, to the overall mean, the main effect of T at level i, the main
effect of R at level j, and the interaction effect of T and R at level combination
(i,j), respectively. For instance, the main effect for the Timolol treatment is the
difference between the average l̄_1+ and the overall average l̄_++. In general, we define
an estimate of the main effect of factor T at the i-th level as:

λ̂_i^T = l̄_i+ - l̄_++

Similarly, the estimate of the main effect of the j-th level of factor R is defined as:

λ̂_j^R = l̄_+j - l̄_++

Model (6.1) above has too many parameters. We notice that there are at most four
values of l_ij, but there are nine model parameters: μ, λ_1^T, λ_2^T, λ_1^R, λ_2^R,
λ_11^TR, λ_12^TR, λ_21^TR, and λ_22^TR. Since there are more parameters than underlying probabilities, this


model is overparameterized. The above implies that we have four equations but
nine unknowns, which would result in an infinite number of solutions.

To overcome this, we impose some constraints, or identifiability conditions, on the
parameters of model (6.1). We describe below three forms of constraints usually
imposed to solve this problem.

(i) Sum-to-zero constraints on the parameters. Here the parameters are constrained
to sum to zero, either row-wise or column-wise, for main and interaction effects.
We shall illustrate this in the next section. PROC CATMOD in SAS employs this form.

(ii) Only the parameter of the last category of each variable and the corresponding
interaction terms are set to zero. This is the default approach employed by
PROC GENMOD in SAS.

(iii) Similar to (ii), except that the parameter of the first category of each variable
and the corresponding interaction terms are set to zero.¹

We shall establish the correspondence of the first two approaches later in this chapter. We start, however, with the sum-to-zero constraint approach, which is the
most popular. In this case, the relevant constraint conditions for the log-linear
formulation of the data in Table 6.1, viz. equation (6.1), are:

Σ_i λ_i^T = Σ_j λ_j^R = Σ_i λ_ij^TR = Σ_j λ_ij^TR = 0

The above conditions imply that

λ_1^T + λ_2^T = 0;   λ_1^R + λ_2^R = 0;
λ_11^TR + λ_21^TR = 0;   λ_11^TR + λ_12^TR = 0;   λ_12^TR + λ_22^TR = 0

The sixth constraint, λ_21^TR + λ_22^TR = 0, is not necessary because it follows from
the last three constraints involving the interaction terms.

With the above conditions, the number of free parameters is the original nine less
the five constraints, that is, 9 - 5 = 4. A model with as many parameters as the
number of cells in the table is called a saturated model. For example, the model in (6.1) with the
above conditions imposed is a saturated model.

Solutions for these parameters lead to the following expressions for the parameter
estimates:
μ̂ = l̄_++     (6.2a)

λ̂_i^T = l̄_i+ - l̄_++     (6.2b)

λ̂_j^R = l̄_+j - l̄_++     (6.2c)

λ̂_ij^TR = l_ij - l̄_i+ - l̄_+j + l̄_++     (6.2d)

¹This is the default approach in GLIM. The parameterizations in both (ii) and (iii) can be implemented in both GENMOD and GLIM.


where l_++ = Σ_i Σ_j ln(n_ij), l_i+ = Σ_j ln(n_ij), and l_+j = Σ_i ln(n_ij). Similarly, l̄_i+,
l̄_+j, and l̄_++ are, respectively, the averages of the row, column, and overall sums of
the logarithms of the observations.

For the data in Table 6.1, we display in Table 6.2 the observed logs of the frequency
counts as well as the relevant means of the sums of the log-counts.

                           Response
Treatment           1                   2              Sum l_i+    Avg l̄_i+
    1         ln(44) = 3.7842    ln(116) = 4.7536       8.5378      4.2689
    2         ln(19) = 2.9444    ln(128) = 4.8520       7.7964      3.8982
Sum (l_+j)         6.7286              9.6056          16.3342
Avg (l̄_+j)        3.3643              4.8028                        4.0836

Table 6.2: Logs of the observed counts, l_ij = ln(n_ij)

In Table 6.2, for example, 8.5378 = 3.7842 + 4.7536, 9.6056 = 4.7536 + 4.8520, and
l̄_2+ = 7.7964/2 = 3.8982.
From Table 6.2, and using equations (6.2a) to (6.2d), we have

μ̂ = [ln(44) + ln(116) + ln(19) + ln(128)]/4 = 4.0836

λ̂_1^T = l̄_1+ - l̄_++ = 4.2689 - 4.0836 = 0.1853

λ̂_1^R = l̄_+1 - l̄_++ = 3.3643 - 4.0836 = -0.7193

and

λ̂_11^TR = ln(44) - 4.2689 - 3.3643 + 4.0836 = 0.2346

Alternatively, we can obtain the latter as

λ̂_11^TR = ln(44) - λ̂_1^T - λ̂_1^R - μ̂ = 3.7842 - 0.1853 - (-0.7193) - 4.0836 = 0.2346
We can similarly obtain the remaining λ̂_ij^TR by making use of the expressions in (6.2d) and substituting the relevant values. With the above estimates, and utilizing the zero-sum
identifiability constraints, we have

λ̂_2^T = -0.1853,   λ̂_2^R = 0.7193

and

λ̂_12^TR = λ̂_21^TR = -0.2346 = -λ̂_11^TR = -λ̂_22^TR
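The arithmetic above is easily checked in a short DATA step (a minimal sketch; the variable names are arbitrary):

data check;
  l11 = log(44); l12 = log(116); l21 = log(19); l22 = log(128);
  mu      = (l11 + l12 + l21 + l22)/4;        /* overall mean, 4.0836          */
  lamT1   = (l11 + l12)/2 - mu;               /* T main effect, 0.1853         */
  lamR1   = (l11 + l21)/2 - mu;               /* R main effect, -0.7193        */
  lamTR11 = (l11 - l12 - l21 + l22)/4;        /* interaction, 0.2346           */
  put mu= lamT1= lamR1= lamTR11=;
run;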

On the other hand, if we substitute the expected counts for the l_ij, we obtain for i = 1:

λ_1^T = [ln(m_11) + ln(m_12)]/2 - [ln(m_11) + ln(m_12) + ln(m_21) + ln(m_22)]/4
      = [ln(m_11) - ln(m_21) + ln(m_12) - ln(m_22)]/4
      = (1/4) ln[(m_11 x m_12)/(m_21 x m_22)]

Similarly, for the classificatory variable R at j = 1,

λ_1^R = (1/4) ln[(m_11 x m_21)/(m_12 x m_22)]


and the ML estimates (with the m terms replaced by the n_ij) are given for our
example as

λ̂_1^T = (1/4) ln[(n_11 x n_12)/(n_21 x n_22)] = (1/4) ln[(44 x 116)/(19 x 128)] = (1/4)(0.74131) = 0.1853

λ̂_1^R = (1/4) ln[(n_11 x n_21)/(n_12 x n_22)] = (1/4) ln[(44 x 19)/(116 x 128)] = (1/4)(-2.87699) = -0.7192

The log odds ratio for the data in Table 6.1 is computed as

ln(θ̂) = ln[(44 x 128)/(116 x 19)] = 0.9382,   so that   λ̂_11^TR = (1/4)(0.9382) = 0.2346

The above result illustrates that the interaction-term parameter estimate of the
log-linear model is a function of the corresponding log odds ratio.

6.2.1 Estimates from Other Constraints

Under the PROC GENMOD constraints, we would have the following corresponding
solutions:

μ̂ = ln(n_22) = l_22 = 4.8520
λ̂_1^T = l_12 - l_22 = 4.7536 - 4.8520 = -0.0984
λ̂_1^R = l_21 - l_22 = 2.9444 - 4.8520 = -1.9076
λ̂_11^TR = l_11 - l_12 - l_21 + l_22 = 0.9382

with

λ̂_2^T = 0.0;   λ̂_2^R = 0.0;   and   λ̂_12^TR = λ̂_21^TR = λ̂_22^TR = 0.0

We shall give a general formula for estimating parameters for the general case later
in this chapter.

6.2.2 Standard Errors of Parameter Estimates

For the sum-to-zero constraints approach, the expression for the estimator of the
interaction term λ_11^TR is:

λ̂_11^TR = l_11 - l̄_1+ - l̄_+1 + l̄_++

where the l's are as defined in the previous section. Letting h_ij = ln(n_ij) be the
logs of the observed frequencies, we have in particular

λ̂_11^TR = h_11 - (h_11 + h_12)/2 - (h_11 + h_21)/2 + (h_11 + h_12 + h_21 + h_22)/4 = (h_11 - h_12 - h_21 + h_22)/4

As shown in Chapter 4 using the delta method, the estimate of the asymptotic
variance of ln(n_ij) = h_ij is 1/n_ij,
and in general, any estimator of the λ parameters can be written as a linear combination (a contrast) of the logs of the observed cells as

λ̂ = Σ_i Σ_j a_ij h_ij

where the {a_ij} satisfy Σ_ij a_ij = 0. Consequently, an estimate of the asymptotic
variance is given by

Var(λ̂) = Σ_i Σ_j a_ij² / n_ij

The asymptotic standard error (a.s.e.) of λ̂ therefore equals:

a.s.e.(λ̂) = √( Σ_i Σ_j a_ij² / n_ij )

For any particular saturated model, the estimated asymptotic parameter variances
need not all be the same; they generally depend on the number of categories of each
variable. Thus, to "put the λ's on an equal footing" (Upton, 1982), it is necessary to
standardize them so that the standardized value Z has variance 1:

Z(λ̂) = λ̂ / a.s.e.(λ̂)     (6.3)

Similar to the standardized residuals discussed in Chapter 5, a λ parameter will be
considered important if |Z(λ̂)| > 2.0 (the upper 5% point of a standard normal); equivalently,
[Z(λ̂)]² can be compared with the upper tail of a chi-square distribution with 1 degree of
freedom.

In the example above, |a_ij| = 1/4 for each of the λ's, and the estimates of the
asymptotic variances are given by

(1/16)(1/44 + 1/116 + 1/19 + 1/128) = 0.00574

The asymptotic standard error equals √0.00574 = 0.0758. The Z's for the effects
and interaction are Z(T) = 0.1853/0.0758 = 2.445, Z(R) = -0.7193/0.0758 =
-9.489, and Z(TR) = 0.2346/0.0758 = 3.095. We notice immediately that all the
effects have |Z(·)| > 2. We shall explore this result further later in the chapter.

A 100(1 - α)% asymptotic confidence interval may also be obtained using a
standard normal approximation. For example, for the T effect, a 95% confidence
interval equals

0.1853 ± 1.96(0.0758) = (0.0367, 0.3339)

Similar results may be obtained for the other effects and the interaction terms.
Under the GENMOD constraints, the estimated asymptotic standard errors are:

a.s.e.(λ̂_1^T) = √(1/116 + 1/128) = 0.1282

a.s.e.(λ̂_1^R) = √(1/19 + 1/128) = 0.2459

a.s.e.(λ̂_11^TR) = √(1/44 + 1/116 + 1/19 + 1/128) = 0.3030

We present below the SAS software program and relevant output from PROC CATMOD for implementing the saturated model for the data in Table 6.1. The parameter estimates and their asymptotic standard error estimates agree with the ones
computed in the previous sections.

data tab61;
/* read in the row and column variable names along with the
   corresponding count; the $ sign indicates that the levels
   are read in alphanumeric format */
input t $ r $ count @@;
datalines;
timo free 44 timo not 116 pcebo free 19 pcebo not 128
;
proc catmod order=data; weight count;
model t*r=_response_/ml;
loglin t|r; run;
OUTPUT

The CATMOD Procedure

Response            T*R       Response Levels      4
Weight Variable     COUNT     Populations          1
Data Set            TAB61     Total Frequency    307
Frequency Missing   0         Observations         4

Maximum Likelihood Analysis of Variance

Source              DF    Chi-Square    Pr > ChiSq
T                    1        5.99        0.0144
R                    1       90.17        <.0001
T*R                  1        9.59        0.0020
Likelihood Ratio     0         .             .

Analysis of Maximum Likelihood Estimates

                                  Standard      Chi-
Effect    Parameter   Estimate      Error      Square    Pr > ChiSq
T             1        0.1853      0.0757       5.99       0.0144
R             2       -0.7192      0.0757      90.17       <.0001
T*R           3        0.2345      0.0757       9.59       0.0020

We observe here that the estimates produced by PROC CATMOD for the effect R
and the interaction term TR above differ in the fourth decimal place from those
obtained earlier. These differences are due to rounding errors.
The following are the corresponding SAS software program and relevant output
for implementing the saturated model using PROC GENMOD.

proc genmod data=tab61 order=data; class t r;
model count=t|r/dist=poi link=log type3; run;

PARTIAL OUTPUT

The GENMOD Procedure

Model Information

Data Set             WORK.TAB61
Distribution         Poisson
Link Function        Log
Dependent Variable   COUNT
Observations Used    4

Class Level Information

Class    Levels    Values
T           2      TIMO PCEBO
R           2      FREE NOT

Criteria For Assessing Goodness Of Fit

Criterion             DF     Value    Value/DF
Deviance               0     0.0000
Pearson Chi-Square     0     0.0000

Analysis Of Parameter Estimates

                                   Standard      Wald 95%             Chi-
Parameter           DF  Estimate    Error    Confidence Limits      Square   Pr > ChiSq
Intercept            1    4.8520    0.0884    4.6788    5.0253     3013.40     <.0001
T       TIMO         1   -0.0984    0.1282   -0.3497    0.1528        0.59     0.4425
T       PCEBO        0    0.0000    0.0000    0.0000    0.0000         .          .
R       FREE         1   -1.9076    0.2459   -2.3895   -1.4257       60.20     <.0001
R       NOT          0    0.0000    0.0000    0.0000    0.0000         .          .
T*R     TIMO FREE    1    0.9382    0.3030    0.3444    1.5320        9.59     0.0020
T*R     TIMO NOT     0    0.0000    0.0000    0.0000    0.0000         .          .
T*R     PCEBO FREE   0    0.0000    0.0000    0.0000    0.0000         .          .
T*R     PCEBO NOT    0    0.0000    0.0000    0.0000    0.0000         .          .
Scale                0    1.0000    0.0000    1.0000    1.0000

LR Statistics For Type 3 Analysis

Source    DF    Chi-Square    Pr > ChiSq
T          1        6.32        0.0119
R          1      119.73        <.0001
T*R        1       10.24        0.0014

The saturated model is most useful in an initial analysis to determine which effects or
interaction terms are sufficiently important for a proper understanding of the data.
In this example, the results from the sum-to-zero constraints indicate that all
three effects are important for future model consideration, while the results from
the last-category-zero constraints indicate that only the R effect and the interaction
(TR) effect are important. We examine these further in the next section. Note
that the model degrees of freedom are zero because this is a saturated model.

6.2.3 Independence and Other Models for the 2 x 2 Table

The model in (6.1), with the identifiability constraints imposed, contains as many
parameters as there are cells. Consequently, the model always fits perfectly. However, if we wish to find simpler models that may also explain the variation in our
data, we would like to test models other than the saturated one. One of these possible models is the model based on the hypothesis of independence. For this case,
we set the interaction term λ_ij^TR = 0, so that (6.1) becomes

l_ij = μ + λ_i^T + λ_j^R     (6.4)

Under the null hypothesis of independence H₀, we recall that m̂_ij = n_i+ n_+j / n.
The relevant likelihood ratio test statistic G² can then be calculated and tests made.
Other possible models are:

l_ij = μ + λ_i^T     (6.5a)

l_ij = μ + λ_j^R     (6.5b)

l_ij = μ     (6.5c)

In (6.5a), we postulate that the R categories are equally probable; this also implies
that T and R are independent. Similarly, we could postulate that the categories of
T are equally probable, leading to model (6.5b). We may finally postulate that all
the categories are equally probable, which leads to the model in (6.5c). Table 6.3
lists the expected values under these models, their respective G² values, and the
corresponding degrees of freedom. The models in (6.4), (6.5a), (6.5b), and (6.5c)
are respectively designated as {T,R}, {T}, {R}, and {μ}. Table 6.3 gives the results
of implementing these models on the data in Table 6.1.

                                        Models
                  (6.1)      (6.4)     (6.5a)     (6.5b)     (6.5c)
 Cells   n_{ij}   sat.       {T,R}      {T}        {R}        {\mu}
 1 1       44      44.00      32.83      80.00      31.50      76.75
 1 2      116     116.00     127.17      80.00     122.00      76.75
 2 1       19      19.00      30.17      73.50      31.50      76.75
 2 2      128     128.00     116.83      73.50     122.00      76.75
 G^2                0.00      10.24     124.20      10.79     124.75
 df                    0          1          2          2          3

Table 6.3: Results when models are applied to data in Table 6.1
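The G^2 entries in Table 6.3 can be verified directly. As an illustration, the following minimal SAS/IML sketch (ours, not part of the original text) reproduces the {T,R} column: it computes the independence fit \hat{m}_{ij} = n_{i+} n_{+j}/n and G^2 = 2 \sum n_{ij} \ln(n_{ij}/\hat{m}_{ij}) for the data of Table 6.1.

proc iml;
n = {44 116, 19 128};           /* observed counts from Table 6.1 */
m = n[,+] * n[+,] / n[+];       /* independence fit: row total x column total / n */
G2 = 2 * sum(n # log(n/m));     /* likelihood ratio statistic, G2 = 10.24 on 1 df */
print m[format=8.2] G2;
quit;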
The saturated model and the models in (6.4) and (6.5a)-(6.5c) are implemented in SAS software
with the statements below; partial output from each fit is displayed to two decimal places only (see the format statement).
set tab61; proc genmod; make 'obstats' out=aa;
class t r; model count=t|r/dist=poi link=log obstats; run;
proc print data=aa noobs; var t r count pred; format pred 8.2; run;

     t       r      count      Pred
   timo    free       44      44.00
   timo    not       116     116.00
   pcebo   free       19      19.00
   pcebo   not       128     128.00

***fit model 4***;
proc genmod order=data; class t r;
model count=t r/dist=poi obstats; run;

     t       r      count      Pred
   timo    free       44      32.83
   timo    not       116     127.17
   pcebo   free       19      30.17
   pcebo   not       128     116.83

***fit model 5a***;
proc genmod order=data; class t r;
model count=t/dist=poi; run;

     t       r      count      Pred
   timo    free       44      80.00
   timo    not       116      80.00
   pcebo   free       19      73.50
   pcebo   not       128      73.50

***fit model 5b***;
proc genmod order=data; class t r;
model count=r/dist=poi; run;

     t       r      count      Pred
   timo    free       44      31.50
   timo    not       116     122.00
   pcebo   free       19      31.50
   pcebo   not       128     122.00

***fit model 5c***;
proc genmod order=data;
model count=/dist=poi; run;

     t       r      count      Pred
   timo    free       44      76.75
   timo    not       116      76.75
   pcebo   free       19      76.75
   pcebo   not       128      76.75

6.3    Log-Linear Models for I x J Contingency Tables

We now consider an extension of the above formulation to general two-way
I x J tables. If we denote the row and column classificatory variables by A and
B, respectively, then let A be indexed by i = 1, 2, ..., I and let B be similarly
indexed by j = 1, 2, ..., J. For either multinomial or product-multinomial sampling
schemes, the log-linear formulation for the saturated model in an I x J table is given
by

l_{ij} = \mu + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB},   i = 1, ..., I;  j = 1, ..., J          (6.6)

where l_{ij} = \ln(m_{ij}). We again impose the sum-to-zero identifiability conditions

\sum_i \lambda_i^A = \sum_j \lambda_j^B = \sum_i \lambda_{ij}^{AB} = \sum_j \lambda_{ij}^{AB} = 0

The model now has as many parameters as the total number of cells in the
table, namely, 1 + (I - 1) + (J - 1) + (I - 1)(J - 1) = IJ.
Then the parameter estimates are given by the following expressions:

\hat{\mu} = \bar{l}_{++}          (6.7a)

\hat{\lambda}_i^A = \bar{l}_{i+} - \bar{l}_{++}          (6.7b)

\hat{\lambda}_j^B = \bar{l}_{+j} - \bar{l}_{++}          (6.7c)

\hat{\lambda}_{ij}^{AB} = l_{ij} - \bar{l}_{i+} - \bar{l}_{+j} + \bar{l}_{++} = l_{ij} - (\hat{\mu} + \hat{\lambda}_i^A + \hat{\lambda}_j^B)          (6.7d)

where \bar{l}_{++} = \sum_{ij} \ln(n_{ij})/(IJ), \bar{l}_{i+} = \sum_j \ln(n_{ij})/J, and \bar{l}_{+j} = \sum_i \ln(n_{ij})/I.

If any of the observed counts n_{ij} equals zero, the standard practice is to use
instead

l_{ij} = \ln(n_{ij} + 0.5)

That is, we add 0.5 to each observed value n_{ij}. This is necessary for the implementation of the saturated model in PROC CATMOD. The alternative to adding 0.5 to
our data values when some have zero frequencies is to fit an unsaturated or reduced
model to the data. For reduced models, the expected values are only functions of
the marginal totals rather than the individual cell counts. With PROC GENMOD
in SAS, adding 0.5 to the observed data before fitting the saturated model is not
necessary.

6.3.1    Example

The data in Table 6.4 (Hildebrand & Ott, 1991) relate to the popularity of three
alternative flexible time-scheduling plans among clerical workers in four different
offices. A random sample of 216 workers yields the following counts:

                          Office
 Favored Plan     1     2     3     4   Total
      1          15    32    18     5      70
      2           8    29    23    18      78
      3           1    20    25    22      68
  Total          24    81    66    45     216

Table 6.4: Popularity of scheduling times among clerical workers

The corresponding natural logarithms of the observed counts are displayed in Table
6.5. The results are given to three decimal places for brevity and clarity.
                             Office
 Favored Plan      1       2       3       4      l_{i+}    \bar{l}_{i+}
      1          2.708   3.466   2.890   1.609    10.673      2.668
      2          2.079   3.367   3.136   2.890    11.472      2.868
      3          0.000   2.996   3.219   3.091     9.306      2.327
 l_{+j}          4.787   9.829   9.245   7.590    31.451
 \bar{l}_{+j}    1.596   3.276   3.082   2.530                2.621

Table 6.5: Log of the observed counts l_{ij} = ln(n_{ij})


In this example, I = 3 and J = 4; hence there are IJ = 12 cells in the table. Thus,

\hat{\lambda}_1^A = \bar{l}_{1+} - \bar{l}_{++} = 2.668 - 2.621 = 0.047

Similarly,

\hat{\lambda}_2^A = 2.868 - 2.621 = 0.247

and

\hat{\lambda}_3^A = -(0.047 + 0.247) = -0.294

The latter result is obtained as a result of the sum-to-zero identifiability constraints.
Similarly, for the offices (B),

\hat{\lambda}_1^B = 1.596 - 2.621 = -1.025
\hat{\lambda}_2^B = 3.276 - 2.621 =  0.655
\hat{\lambda}_3^B = 3.082 - 2.621 =  0.461

Again, because of the sum-to-zero constraints,

\hat{\lambda}_4^B = -(-1.025 + 0.655 + 0.461) = -0.091

For the interaction terms,

\hat{\lambda}_{11}^{AB} = l_{11} - \bar{l}_{1+} - \bar{l}_{+1} + \bar{l}_{++} = 2.708 - 2.668 - 1.596 + 2.621 = 1.065

Similarly,

\hat{\lambda}_{12}^{AB} = 3.466 - 2.668 - 3.276 + 2.621 = 0.143

These maximum likelihood (ML) estimates of the A's are given in Table 6.6.
Office
Favored Plan
1
2
3

1
1.065
0.236
-1.301

2
0.143
-0.156
0.013

Af

-0.239
-0.193
0.432

-0.969
0.113
0.856

0.047
0.247
-0.294

Af

-1.025

0.655

0.461

-0.091

2.621

Table 6.6: ML Estimates of the X^B parameters
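As a check on these hand computations, the following minimal SAS/IML sketch (ours, not part of the original text) reproduces the entries of Table 6.6 directly from the observed counts of Table 6.4, using the estimators in (6.7a)-(6.7d).

proc iml;
n = {15 32 18 5, 8 29 23 18, 1 20 25 22};   /* counts from Table 6.4 */
l = log(n);                                  /* l_ij = ln(n_ij) */
lbar  = l[:];                                /* grand mean, 2.621 */
rmean = l[,:];                               /* row means, for (6.7b) */
cmean = l[:,];                               /* column means, for (6.7c) */
lamA  = rmean - lbar;                        /* lambda-hat_i^A */
lamB  = cmean - lbar;                        /* lambda-hat_j^B */
lamAB = l - repeat(rmean,1,4) - repeat(cmean,3,1) + lbar;   /* (6.7d) */
print lamA[format=8.3] lamB[format=8.3], lamAB[format=8.3];
quit;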


Note that the parameter estimate \hat{\lambda}_{31}^{AB} = -1.301 is relatively small and \hat{\lambda}_{11}^{AB} = 1.065
relatively large. These correspond to a deficit among office type 1 respondents
favoring plan 3 and an excess among office type 1 respondents favoring plan 1,
respectively. We see that the results from a saturated log-linear model formulation
allow us to characterize explicitly the variations among the table categories. Below
are the corresponding SAS software program and output from PROC CATMOD.
The results in Table 6.6 agree with the SAS software output under the sum-to-zero
constraints only to the second decimal place, because
of rounding errors. For example, \hat{\lambda}_{12}^{AB} = 0.143, but in the SAS software output it is
0.1421.
data tab64;
***we define labels here but they will not be used
in the output ***;
label a ='plan'
      b ='office';
do a=1 to 3;
 do b=1 to 4;
  input count @@;
  output;
 end;
end;
datalines;
15 32 18 5 8 29 23 18 1 20 25 22
;
proc catmod;
weight count;
model a*b=_response_/ml;
loglin a|b;
run;
                  The CATMOD Procedure

 Response            a*b        Response Levels      12
 Weight Variable     count      Populations           1
 Data Set            TAB64      Total Frequency     216
 Frequency Missing   0          Observations         12

          Maximum Likelihood Analysis of Variance

        Source              DF    Chi-Square    Pr > ChiSq
        a                    2          3.65        0.1611
        b                    3         27.85        <.0001
        a*b                  6         21.02        0.0018
        Likelihood Ratio     0           .             .
             Analysis of Maximum Likelihood Estimates

                                    Standard      Chi-
  Effect    Parameter   Estimate       Error    Square    Pr > ChiSq
  a             1         0.0474      0.1396      0.12        0.7341
                2         0.2472      0.1323      3.49        0.0618
  b             3        -1.0252      0.2797     13.43        0.0002
                4         0.6553      0.1362     23.14        <.0001
                5         0.4606      0.1409     10.69        0.0011
  a*b           6         1.0648      0.3110     11.72        0.0006
                7         0.1421      0.1764      0.65        0.4207
                8        -0.2386      0.1910      1.56        0.2115
                9         0.2364      0.3232      0.54        0.4644
               10        -0.1561      0.1723      0.82        0.3649
               11        -0.1932      0.1802      1.15        0.2836

In the output above, the first column gives the effects (remember that a is the
favored plan variable) and the third column gives the parameter estimates. For instance, \hat{\lambda}_1^A =
0.0474, while \hat{\lambda}_2^A = 0.2472; these are labeled as parameters 1 and 2, respectively.
Hence, by the sum-to-zero constraint, \hat{\lambda}_3^A would be -0.294, as displayed in
Table 6.6. Similarly, the B main effects \hat{\lambda}_1^B, \hat{\lambda}_2^B, and \hat{\lambda}_3^B are designated as parameters 3,
4, and 5, respectively; the corresponding \hat{\lambda}_4^B can also be obtained from the
sum-to-zero constraint. For the interaction term AB, the parameter estimates are
given by the a*b effects. For instance, \hat{\lambda}_{11}^{AB}, \hat{\lambda}_{12}^{AB}, \hat{\lambda}_{13}^{AB}, \hat{\lambda}_{21}^{AB}, \hat{\lambda}_{22}^{AB}, and \hat{\lambda}_{23}^{AB} are labeled as
parameters 6, 7, 8, 9, 10, and 11, respectively. The others are again obtained from the
sum-to-zero constraints. The standard error column gives the estimated asymptotic standard errors
for each of the parameter estimates presented, and the chi-square column gives the corresponding
Wald statistics for testing null hypotheses of the form H_0: \lambda = 0. For \hat{\lambda}_{11}^{AB}, for
instance, it is computed as (1.0648/0.3110)^2 = 11.72. The corresponding p-value is
given in the last column as 0.0006.

6.3.2    Asymptotic Standard Errors of Parameter Estimates

The asymptotic standard errors (a.s.e.) for the parameter estimates (under the
sum-to-zero constraints) are not as simple to obtain in I x J tables as in the
2 x 2 table. However, the principles remain basically the same.
To find the asymptotic standard error corresponding to the parameter estimate
\hat{\lambda}_1^A, for instance, we observe that

\hat{\lambda}_1^A = \bar{l}_{1+} - \bar{l}_{++}
       = (l_{11} + l_{12} + l_{13} + l_{14})/4 - (l_{11} + l_{12} + ... + l_{34})/12
       = [2(l_{11} + l_{12} + l_{13} + l_{14}) - (l_{21} + l_{22} + ... + l_{34})]/12

hence,

Var(\hat{\lambda}_1^A) = [4(1/n_{11} + 1/n_{12} + 1/n_{13} + 1/n_{14}) + (1/n_{21} + 1/n_{22} + ... + 1/n_{34})]/144
             = 0.0195

The estimated a.s.e. therefore equals \sqrt{0.0195} = 0.1396, and the standardized value
equals

Z = 0.047/0.1396 = 0.337
Similarly, for the B's,

\hat{\lambda}_1^B = \bar{l}_{+1} - \bar{l}_{++}
       = (l_{11} + l_{21} + l_{31})/3 - (l_{11} + l_{12} + ... + l_{34})/12
       = [3(l_{11} + l_{21} + l_{31}) - (l_{12} + l_{22} + l_{32} + ... + l_{34})]/12

hence,

Var(\hat{\lambda}_1^B) = [9(1/n_{11} + 1/n_{21} + 1/n_{31}) + (1/n_{12} + 1/n_{22} + ... + 1/n_{34})]/144
             = 0.0783

and the corresponding estimated asymptotic standard error equals \sqrt{0.0783} =
0.2799. Similar calculations yield the results for the other \hat{\lambda}_j^B.
For the interaction terms, the asymptotic standard error for \hat{\lambda}_{11}^{AB}, for instance,
can be computed from the fact that:

\hat{\lambda}_{11}^{AB} = l_{11} - \bar{l}_{1+} - \bar{l}_{+1} + \bar{l}_{++}
        = [6 l_{11} - 2(l_{12} + l_{13} + l_{14}) - 3(l_{21} + l_{31})
           + (l_{22} + l_{23} + l_{24} + l_{32} + l_{33} + l_{34})]/12

Hence,

Var(\hat{\lambda}_{11}^{AB}) = [36/n_{11} + 4(1/n_{12} + 1/n_{13} + 1/n_{14}) + 9(1/n_{21} + 1/n_{31})
               + (1/n_{22} + 1/n_{23} + 1/n_{24} + 1/n_{32} + 1/n_{33} + 1/n_{34})]/144
             = 0.0968

The estimated a.s.e. again equals \sqrt{0.0968} = 0.3112.

In general, for an I x J contingency table, the estimates of the asymptotic
variances of the \lambda parameter estimates in a saturated model (under the sum-to-zero
constraints) are given by:

Var(\hat{\lambda}_r^A) = [\alpha^2 \sum_{j=1}^J 1/n_{rj} + \sum_{i \ne r} \sum_j 1/n_{ij}] / (IJ)^2          (6.8a)

Var(\hat{\lambda}_c^B) = [\beta^2 \sum_{i=1}^I 1/n_{ic} + \sum_i \sum_{j \ne c} 1/n_{ij}] / (IJ)^2          (6.8b)

Var(\hat{\lambda}_{rc}^{AB}) = [\alpha^2 \beta^2 / n_{rc} + \alpha^2 \sum_{j \ne c} 1/n_{rj} + \beta^2 \sum_{i \ne r} 1/n_{ic}
                + \sum_{i \ne r} \sum_{j \ne c} 1/n_{ij}] / (IJ)^2          (6.8c)

where \alpha = (I - 1), \beta = (J - 1), r = 1, ..., I, and c = 1, ..., J.

                            Office
 Favored Plan      1        2        3        4      Totals
      1          0.0667   0.0313   0.0556   0.2000   0.3536
      2          0.1250   0.0345   0.0435   0.0556   0.2586
      3          1.0000   0.0500   0.0400   0.0455   1.1355
 Totals          1.1917   0.1158   0.1391   0.3011   1.7477

Table 6.7: Table of reciprocals of cell counts


We give below examples of the use of these expressions. Suppose we wish to find
the estimated asymptotic standard errors (a.s.e.) for \hat{\lambda}_2^A, \hat{\lambda}_3^B, and \hat{\lambda}_{12}^{AB} for the data
in Table 6.4, whose cell count reciprocals are displayed in Table 6.7. From this table, we have
\alpha = 2, \beta = 3, and \alpha\beta = 6. For \hat{\lambda}_2^A, r = 2, and we have from (6.8a), using the
table of cell count reciprocals in Table 6.7:

a.s.e. = \sqrt{[4(0.2586) + (1.7477 - 0.2586)]/144} = 0.1324

Similarly, for \hat{\lambda}_3^B, c = 3, and once again using (6.8b), we have

a.s.e. = \sqrt{[9(0.1391) + (1.7477 - 0.1391)]/144} = 0.1409

For \hat{\lambda}_{12}^{AB}, we have r = 1, c = 2, and using (6.8c), we also have

a.s.e. = \sqrt{[36(0.0313) + 4(0.3536 - 0.0313) + 9(0.1158 - 0.0313) + 1.3096]/144} = 0.1765

where

1.3096 = 1.7477 - 0.3536 - 0.1158 + 0.0313

Similar computations can be carried out to obtain other estimates of the asymptotic
standard errors.
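For instance, a.s.e.(\hat{\lambda}_{12}^{AB}) can be reproduced with the short SAS/IML sketch below (again ours, not the text's), which evaluates (6.8c) from the reciprocals in Table 6.7.

proc iml;
n = {15 32 18 5, 8 29 23 18, 1 20 25 22};
r = 1/n;                         /* reciprocals of cell counts, Table 6.7 */
alpha = 2; beta = 3;             /* alpha = I-1, beta = J-1 */
/* equation (6.8c) with row r = 1 and column c = 2 */
v = ( alpha##2 # beta##2 # r[1,2]
    + alpha##2 # (r[1,+] - r[1,2])
    + beta##2  # (r[+,2] - r[1,2])
    + (r[+] - r[1,+] - r[+,2] + r[1,2]) ) / 144;
print (sqrt(v));                 /* 0.1765 */
quit;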

6.3.3    Estimates Based on PROC GENMOD

For the GLM constraints, where the last category of each variable and the corresponding
interaction terms are set to zero, we have for the data in Table 6.4:
\hat{\mu} = \ln(n_{34}) = 3.0910
\hat{\lambda}_1^A = l_{14} - l_{34} = 1.6094 - 3.0910 = -1.4816
\hat{\lambda}_2^A = l_{24} - l_{34} = 2.8904 - 3.0910 = -0.2006
\hat{\lambda}_3^A = 0
\hat{\lambda}_1^B = l_{31} - l_{34} = 0.0000 - 3.0910 = -3.0910
\hat{\lambda}_2^B = l_{32} - l_{34} = 2.9957 - 3.0910 = -0.0953
\hat{\lambda}_3^B = l_{33} - l_{34} = 3.2189 - 3.0910 = 0.1279

Similarly, \hat{\lambda}_4^B = 0. For the interaction terms,

\hat{\lambda}_{11}^{AB} = l_{11} - l_{14} - l_{31} + l_{34} = 4.1897
\hat{\lambda}_{12}^{AB} = l_{12} - l_{14} - l_{32} + l_{34} = 1.9516
\hat{\lambda}_{13}^{AB} = l_{13} - l_{14} - l_{33} + l_{34} = 1.1531
\hat{\lambda}_{21}^{AB} = l_{21} - l_{24} - l_{31} + l_{34} = 2.2801
\hat{\lambda}_{22}^{AB} = l_{22} - l_{24} - l_{32} + l_{34} = 0.5722
\hat{\lambda}_{23}^{AB} = l_{23} - l_{24} - l_{33} + l_{34} = 0.1173

where \hat{\lambda}_{i4}^{AB} = 0 for all i and \hat{\lambda}_{3j}^{AB} = 0 for all j. Below is the SAS
software implementation using PROC GENMOD.


set tab64; proc genmod; class a b;
model count=a|b/dist=poi link=log type3; run;
                  The GENMOD Procedure
                Class Level Information
        Class     Levels     Values
        a              3     1 2 3
        b              4     1 2 3 4

                      Analysis Of Parameter Estimates

                           Standard      Wald 95%           Chi-
 Parameter    DF  Estimate    Error   Confidence Limits   Square  Pr > ChiSq
 Intercept     1    3.0910   0.2132    2.6732    3.5089   210.20      <.0001
 a   1         1   -1.4816   0.4954   -2.4526   -0.5106     8.94      0.0028
 a   2         1   -0.2007   0.3178   -0.8236    0.4222     0.40      0.5278
 a   3         0    0.0000   0.0000    0.0000    0.0000      .           .
 b   1         1   -3.0910   1.0225   -5.0951   -1.0870     9.14      0.0025
 b   2         1   -0.0953   0.3090   -0.7009    0.5102     0.10      0.7577
 b   3         1    0.1278   0.2923   -0.4451    0.7008     0.19      0.6619
 b   4         0    0.0000   0.0000    0.0000    0.0000      .           .
 a*b 1 1       1    4.1897   1.1455    1.9446    6.4348    13.38      0.0003
 a*b 1 2       1    1.9516   0.5716    0.8313    3.0719    11.66      0.0006
 a*b 1 3       1    1.1531   0.5840    0.0086    2.2976     3.90      0.0483
 a*b 1 4       0    0.0000   0.0000    0.0000    0.0000      .           .
 a*b 2 1       1    2.2801   1.1073    0.1099    4.4503     4.24      0.0395
 a*b 2 2       1    0.5722   0.4307   -0.2719    1.4164     1.77      0.1840
 a*b 2 3       1    0.1173   0.4295   -0.7246    0.9591     0.07      0.7848

                LR Statistics For Type 3 Analysis
                               Chi-
        Source       DF      Square      Pr > ChiSq
        a             2        4.56          0.1024
        b             3       46.75          <.0001
        a*b           6       30.59          <.0001

The model statement in PROC GENMOD requests the fit of a saturated model,
and the type3 option requests partial tests of each effect as if the effect
enters the model last. The interaction term is highly significant. We would expect
from this analysis that the model of independence would not fit the data in Table 6.4.
The type3 analysis presented is similar to the Maximum Likelihood Analysis of
Variance presented in PROC CATMOD. Because of the identifiability constraints,
we note that some of the parameters of the a, b, and a*b effects are zero.
The output from PROC GENMOD above also gives in column 1 the parameter
of interest. Immediately following the parameters are the level numbers, where
levels 1 and 2 for a refer to the parameter estimates \hat{\lambda}_1^A and \hat{\lambda}_2^A, respectively. We note that
\hat{\lambda}_3^A = 0 in this case. The column headed DF gives the appropriate degrees of
freedom for the parameter estimates. Similarly, the column headed standard error
gives the asymptotic standard errors for each of the parameter estimates, while
the Wald 95% confidence limits give the corresponding 95% confidence intervals for each of
the parameter estimates. The chi-square column entries are obtained as explained
earlier under PROC CATMOD, and the last column gives the corresponding p-values.

6.3.4    Estimating Asymptotic Standard Errors

The estimates of the asymptotic standard errors in PROC GENMOD are obtained,
for example, as follows:

a.s.e.(\hat{\lambda}_1^A) = (1/n_{14} + 1/n_{34})^{1/2} = (1/5 + 1/22)^{1/2} = 0.4954

a.s.e.(\hat{\lambda}_2^B) = (1/n_{32} + 1/n_{34})^{1/2} = (1/20 + 1/22)^{1/2} = 0.3090

a.s.e.(\hat{\lambda}_{22}^{AB}) = (1/n_{22} + 1/n_{24} + 1/n_{32} + 1/n_{34})^{1/2} = 0.4307

Again, these results agree with those produced by PROC GENMOD in SAS as
displayed above.

6.3.5    Analysis of Parameter Estimates

We see that each of the \lambda parameter estimates can be written as a function of the
log of the observed values. That is, if we let

\hat{\lambda} = \sum_{ij} a_{ij} l_{ij}   with   \sum_{ij} a_{ij} = 0

then, from previous results in chapter 5, the estimated asymptotic standard error of
\hat{\lambda} is given by

a.s.e.(\hat{\lambda}) = \left( \sum_{ij} a_{ij}^2 / n_{ij} \right)^{1/2}

and a test of any of the \lambda values being equal to zero can be conducted by the
statistic Z = \hat{\lambda} / a.s.e.(\hat{\lambda}).
Under the null hypothesis, the test statistic for H_0: \lambda_{ij}^{AB} = 0 versus H_a: \lambda_{ij}^{AB} \ne 0
is

Z = \hat{\lambda}_{ij}^{AB} / a.s.e.(\hat{\lambda}_{ij}^{AB})

The corresponding 95% confidence interval for \lambda_{ij}^{AB} is given by

\hat{\lambda}_{ij}^{AB} \pm 1.96 \, a.s.e.(\hat{\lambda}_{ij}^{AB})

For example, the test of the hypothesis that \lambda_{11}^{AB} = 0 gives a Z value of 3.42 with
a corresponding 95% confidence interval

1.065 \pm 1.96(0.3112) = [0.455, 1.675]

Alternatively, Z^2 can be compared to a \chi^2 distribution with one degree of freedom.
PROC CATMOD and PROC GENMOD in SAS use this latter test.

6.3.6    Model of Independence

When the hypothesis of independence holds, the log-linear formulation reduces to

l_{ij} = \mu + \lambda_i^A + \lambda_j^B          (6.9)

In chapter 5, we showed that, irrespective of the sampling scheme,

\hat{m}_{ij} = n_{i+} n_{+j}/n          (6.10a)

so that

\hat{l}_{ij} = -\ln(n) + \ln(n_{i+}) + \ln(n_{+j})          (6.10b)

The formulation in (6.10b) can be seen to be equivalent to the formulation in (6.9).
That is, under the model of independence, the formulation
in (6.9) holds. On the other hand, given that the log-linear model (6.9) holds, can
we show that \hat{m}_{ij} = n_{i+} n_{+j}/n?
Under the null hypothesis of independence, we can show that the above is always
true, and this is demonstrated below for the product-multinomial sampling scheme.
If

l_{ij} = \ln(m_{ij}) = \mu + \lambda_i^A + \lambda_j^B

holds, then

m_{ij} = \phi \, \psi_i \, \phi_j

where \phi = e^{\mu}, \psi_i = e^{\lambda_i^A}, and \phi_j = e^{\lambda_j^B}.
For the product-multinomial case, \hat{m}_{ij} = n_{i+} \hat{\pi}_{ij}^*, where \pi_{ij}^* = m_{ij}/m_{i+}, with
\sum_j \pi_{ij}^* = 1 and m_{i+} = n_{i+}. Consequently,

\pi_{ij}^* = \frac{\phi \, \psi_i \, \phi_j}{\sum_j \phi \, \psi_i \, \phi_j} = \frac{\phi_j}{\sum_j \phi_j}

since \phi and \psi_i are constants relative to the summation over j. The above is true
for any i, so that

\pi_{ij}^* = \pi_j^* = \frac{\phi_j}{\sum_j \phi_j},   i = 1, 2, ..., I

Summing \hat{m}_{ij} = n_{i+} \hat{\pi}_j^* over i gives n_{+j} = n \hat{\pi}_j^*, so that \hat{\pi}_j^* = n_{+j}/n and hence
\hat{m}_{ij} = n_{i+} n_{+j}/n, as required.
In Table 6.8 are the results of fitting the models {A,B}, {A}, and
{B} to the data in Table 6.4, where model {A,B} implies the model containing the
effects \mu, A, and B (this is the model of independence).

        Models      df        G^2       p-value
        {A,B}        6      30.586     < 0.001
        {A}          9      67.428     < 0.001
        {B}          8      31.355     < 0.001

Table 6.8: Results of fitting reduced models to the data in Table 6.4
Similarly, the model defined as {A} implies the model containing \mu and the
effects of A, and ditto for the model described by {B}. The degrees of freedom for
models {A,B}, {A}, and {B} are computed respectively as follows:

{A,B}:  IJ - {1 + (I - 1) + (J - 1)} = (I - 1)(J - 1)
{A}:    IJ - {1 + (I - 1)} = I(J - 1)
{B}:    IJ - {1 + (J - 1)} = J(I - 1)

The ML estimates under the various models are easily obtained. For instance, for
the independence model {A,B}, we have

\hat{\mu} = R_+/I + C_+/J - \ln(n),  \quad  \hat{\lambda}_i^A = R_i - R_+/I,  \quad  \hat{\lambda}_j^B = C_j - C_+/J

where R_i is the logarithm of the total for row i and C_j is similarly the logarithm
of the total for column j. Further, R_+ = \sum_i R_i and C_+ = \sum_j C_j. We leave
the computations of these estimates and associated variances as an exercise to the
reader.
We give below the SAS software codes required to fit the log-linear models
presented in Table 6.8 to the data in Table 6.4.
set tab64;
***fit models using proc CATMOD***;
proc catmod; weight count;
model a*b=_response_/ml;
loglin a b;
run;
loglin a;
run;
loglin b;
run;
***fit models using proc GENMOD***;
proc genmod; class a b; make 'obstats' out=aa;
model count=a b/dist=poi link=log obstats; run;
proc print data=aa; var count pred xbeta resraw reschi streschi; run;
proc genmod; class a b; model count=a/dist=poi link=log; run;
proc genmod; class a b; model count=b/dist=poi link=log; run;

We give below partial log-linear model output from SAS software under the model
of independence for the data in Table 6.4.

                  The CATMOD Procedure
          Maximum Likelihood Analysis of Variance

        Source              DF    Chi-Square    Pr > ChiSq
        a                    2          0.78        0.6782
        b                    3         31.68        <.0001
        Likelihood Ratio     6         30.59        <.0001

             Analysis of Maximum Likelihood Estimates

                                    Standard      Chi-
  Effect    Parameter   Estimate       Error    Square    Pr > ChiSq
  a             1        -0.0264      0.0970      0.07        0.7854
                2         0.0818      0.0944      0.75        0.3864
  b             3        -0.7142      0.1629     19.21        <.0001
                4         0.5022      0.1090     21.23        <.0001
                5         0.2974      0.1153      6.66        0.0099
                  The GENMOD Procedure

            Criteria For Assessing Goodness Of Fit
        Criterion              DF      Value     Value/DF
        Deviance                6    30.5856       5.0976
        Pearson Chi-Square      6    27.1350       4.5225
        Log Likelihood              427.1250

                      Analysis Of Parameter Estimates

                           Standard      Wald 95%           Chi-
 Parameter    DF  Estimate    Error   Confidence Limits   Square  Pr > ChiSq
 Intercept     1    2.6509   0.1797    2.2987    3.0031   217.57      <.0001
 a   1         1    0.0290   0.1703   -0.3047    0.3627     0.03      0.8648
 a   2         1    0.1372   0.1659   -0.1880    0.4624     0.68      0.4083
 a   3         0    0.0000   0.0000    0.0000    0.0000      .           .
 b   1         1   -0.6286   0.2528   -1.1240   -0.1332     6.18      0.0129
 b   2         1    0.5878   0.1859    0.2234    0.9522     9.99      0.0016
 b   3         1    0.3830   0.1933    0.0041    0.7619     3.92      0.0476
 b   4         0    0.0000   0.0000    0.0000    0.0000      .           .
 Scale         0    1.0000   0.0000    1.0000    1.0000

                       Observation Statistics

 Obs  count      Pred     Xbeta    Resraw    Reschi  Streschi
   1     15    7.7778    2.0513    7.2222    2.5897    3.3409
   2     32   26.2500    3.2677    5.7500    1.1223    1.7267
   3     18   21.3889    3.0629   -3.3889   -0.7328   -1.0695
   4      5   14.5833    2.6799   -9.5833   -2.5095   -3.4306
   5      8    8.6667    2.1595   -0.6667   -0.2265   -0.3005
   6     29   29.2500    3.3759   -0.2500   -0.0462   -0.0732
   7     23   23.8333    3.1711   -0.8333   -0.1707   -0.2563
   8     18   16.2500    2.7881    1.7500    0.4341    0.6104
   9      1    7.5556    2.0223   -6.5556   -2.3849   -3.0560
  10     20   25.5000    3.2387   -5.5000   -1.0892   -1.6644
  11     25   20.7778    3.0339    4.2222    0.9263    1.3428
  12     22   14.1667    2.6509    7.8333    2.0812    2.8258

The estimates of the parameters under the model of independence are parameterized
and given in GENMOD as follows:

\hat{\lambda}_1^A - \hat{\lambda}_3^A = 0.0290;   \hat{\lambda}_2^A - \hat{\lambda}_3^A = 0.1372
\hat{\lambda}_1^B - \hat{\lambda}_4^B = -0.6286;  \hat{\lambda}_2^B - \hat{\lambda}_4^B = 0.5878
\hat{\lambda}_3^B - \hat{\lambda}_4^B = 0.3830

Thus, to obtain the equivalent estimates based on PROC CATMOD (that is, under
the parameter sum-to-zero constraints), we note that by adding the two equations above
involving the effects of A, we have

\hat{\lambda}_1^A + \hat{\lambda}_2^A - 2\hat{\lambda}_3^A = 0.0290 + 0.1372 = 0.1662

that is,

-3\hat{\lambda}_3^A = 0.1662

The last expression is a result of the constraint \sum_{i=1}^3 \hat{\lambda}_i^A = 0. Hence,

\hat{\lambda}_3^A = -0.0554
\hat{\lambda}_1^A = 0.0290 + \hat{\lambda}_3^A = -0.0264 and
\hat{\lambda}_2^A = 0.1372 + \hat{\lambda}_3^A = 0.0818

A similar algebraic procedure gives the corresponding parameter
estimates for the effects of B. These parameter estimates agree with those given by
CATMOD. The above results therefore indicate the equivalence between the two
parameterizations used by PROC CATMOD and PROC GENMOD in SAS.
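The conversion from the GENMOD (last level zero) parameterization to the CATMOD (sum-to-zero) parameterization amounts to centering each set of estimates, as the algebra above shows. A minimal SAS/IML sketch of this (ours, for illustration only):

proc iml;
gA = {0.0290, 0.1372, 0.0000};   /* GENMOD estimates for effect A, last level set to 0 */
sA = gA - gA[:];                 /* centering enforces the sum-to-zero constraint */
print sA;                        /* -0.0264, 0.0818, -0.0554, matching CATMOD */
quit;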

6.4    Interaction Analysis

The model of independence, when applied to the data in Table 6.4, obviously does
not fit the data. In order to determine which cells are responsible for the lack of fit
of the independence model, we start by examining the standardized residuals z_{ij} or
adjusted residuals r_{ij} under the model of independence. Either of these residuals,
designated in the SAS software output above as Reschi and Streschi, respectively,
indicates that cells {(1,1), (1,4), (3,1), (3,4)} can be considered to be at variance
with the null hypothesis of independence in view of their magnitudes being greater
than 2.0.
We can test a model of quasi-independence by writing

l_{ij} = \mu + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}   for (i,j) \in S
l_{ij} = \mu + \lambda_i^A + \lambda_j^B               for (i,j) \notin S

where S is the set containing cells in which the interaction is significant. Thus the
model of quasi-independence implies that the rows and columns are independent
except in cells contained in S. We usually let the number of elements in S be as
small as possible, and generally |S| < (I - 1)(J - 1).
For the model of quasi-independence, the expected values satisfy

\hat{m}_{ij} = \delta_{ij} \alpha_i \beta_j          (6.11)

where the \alpha's and \beta's relate to the row and column variables, respectively, and
i = 1, 2, ..., I, j = 1, 2, ..., J, with

\delta_{ij} = 0 if (i,j) \in S, and \delta_{ij} = 1 otherwise

The MLE of the \hat{m}_{ij} satisfy the marginal constraints

\hat{m}_{i+} = n_{i+},   i = 1, 2, ..., I
\hat{m}_{+j} = n_{+j},   j = 1, 2, ..., J

where the margins are taken over the cells not in S. The Deming-Stephan iterative proportional fitting technique can then be used to
obtain expected values under the model of quasi-independence. Basically, we start
the iteration by setting at the 0th step

\hat{m}_{ij}^{(0)} = \delta_{ij}          (6.12)

for all (i,j). Then the \nu-th cycle of the iteration has, for all (i,j),

\hat{m}_{ij}^{(2\nu - 1)} = \hat{m}_{ij}^{(2\nu - 2)} \, n_{i+} / \hat{m}_{i+}^{(2\nu - 2)}          (6.13a)

and

\hat{m}_{ij}^{(2\nu)} = \hat{m}_{ij}^{(2\nu - 1)} \, n_{+j} / \hat{m}_{+j}^{(2\nu - 1)}          (6.13b)

The iteration is continued for \nu = 1, 2, ... until we achieve the desired accuracy.
This procedure is demonstrated in the example below for the data in Table 6.4.

Example
For the data in Table 6.4, suppose we wish to fit the model of quasi-independence to
the data with cells (1,4) and (3,1) removed (the two cells identified by the stepwise
residual analysis described later in this section). We would then have the reduced table,
Table 6.9, where observations in
cells (1,4) and (3,1) have been deleted and these cells are accordingly treated
as structural zeros.

                        Office
 Favored Plan     1     2     3     4   Total
      1          15    32    18     -      65
      2           8    29    23    18      78
      3           -    20    25    22      67
  Total          23    81    66    40     210

Table 6.9: Observed values in the reduced table

Here, the reduced marginal totals are:

n_{i+} = {65, 78, 67}  and  n_{+j} = {23, 81, 66, 40}

and the initial estimates of the MLE are given by:

\hat{m}^{(0)} =
  1  1  1  0
  1  1  1  1
  0  1  1  1

For the first cycle of iteration, we start by first utilizing (6.13a), which matches the
row n_{i+} marginal totals. That is,

\hat{m}^{(1)} =
 21.667  21.667  21.667       0   |  65
 19.500  19.500  19.500  19.500   |  78
      0  22.333  22.333  22.333   |  67
 41.167  63.500  63.500  41.833   | 210

Now we can complete the first cycle of iteration by also matching the column marginal
totals, using (6.13b) with the reduced n_{+j} above. We notice that these marginal
totals have been altered as a result of the first half of the cycle. The first cycle is
completed with the new expected values below:

\hat{m}^{(2)} =
 12.105  27.638  22.520       0   |  62.263
 10.895  24.874  20.268  18.645   |  74.682
      0  28.488  23.213  21.355   |  73.055
 23.000  81.000  66.000  40.000   | 210.000

The above completes the first iteration cycle, where

12.105 = (21.667 x 23)/41.167   and   24.874 = (19.5 x 81)/63.500,   etc.

The iteration continues in this way until we have convergence. By the seventh cycle
we have convergence, and the corresponding MLE are

\hat{m} =
 12.157  29.118  23.726       0   |  65.000
 10.843  25.971  21.162  20.023   |  78.000
      0  25.911  21.113  19.977   |  67.000
 23.000  81.000  66.000  40.000   | 210.000
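The same iterative proportional fitting computation can be scripted; below is a minimal SAS/IML sketch (ours, not part of the original text), assuming the data of Table 6.9 with cells (1,4) and (3,1) as structural zeros.

proc iml;
n = {15 32 18 5, 8 29 23 18, 1 20 25 22};
delta = {1 1 1 0, 1 1 1 1, 0 1 1 1};         /* 0 flags the structural zeros */
ni = (n#delta)[,+];                          /* reduced row totals: 65, 78, 67 */
nj = (n#delta)[+,];                          /* reduced column totals: 23, 81, 66, 40 */
m = delta;                                   /* starting values, equation (6.12) */
do v = 1 to 7;                               /* seven cycles suffice here */
   m = m # repeat(ni/m[,+], 1, ncol(m));     /* match row margins, (6.13a) */
   m = m # repeat(nj/m[+,], nrow(m), 1);     /* match column margins, (6.13b) */
end;
print m[format=8.3];                         /* converges to the MLE shown above */
quit;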
Below are the estimated expected values for the various models of quasi-independence considered. The model whose expected values have just been obtained is model 3 in the next table.

                                    Models
                       1          2          3          4          5
 Cells   n_{ij}    \hat{m}    \hat{m}    \hat{m}    \hat{m}    \hat{m}
  1 1      15       7.780     10.878     12.157     10.878     12.292
  1 2      32      26.250     24.942     29.118     25.541     30.796
  1 3      18      21.390     20.323     23.726     19.392     21.912
  1 4       5      14.580     13.887*       -       14.189*       -
  2 1       8       8.670     12.122     10.843     12.122     10.708
  2 2      29      29.250     27.792     25.971     28.459     26.828
  2 3      23      23.830     22.646     21.162     21.608     19.088
  2 4      18      16.250     15.440     20.023     15.811     21.375
  3 1       1       7.560        -          -          -          -
  3 2      20      25.500     28.266     25.911     27.000     23.375
  3 3      25      20.750     23.031     21.113        -          -
  3 4      22      14.170     15.703     19.977     15.000     18.625
  G^2              30.585     18.178      6.269     17.786      4.676
  df                    6          5          4          4          3
An asterisk indicates a cell with a significant |Z| value, a dash indicates a cell treated
as a structural zero under that model, and the n_{ij} are the observed
cell counts. Models (1) to (5) are described below:
Model 1: This is the model of independence.
Model 2: This is the model of quasi-independence with cell (3,1) set to a structural
zero.
Model 3: This is the model of quasi-independence with cells (1,4) and (3,1) set to
structural zeros.
Model 4: This is the model with cells (3,1) and (3,3) set to structural zeros, based
on our residual analysis earlier.
Model 5: This model sets cells (3,1), (3,3), and (1,4) to structural zeros, because
model 4 still has cell (1,4) with a significant standardized residual.
The basic approach to fitting the model of quasi-independence is to set cells having
significant standardized residuals to zero in stages, beginning with cells having
large absolute values of Z under the model of independence. Model 2, which removes
cell (3,1), still does not fit the data because cell (1,4) continues to give a significant |Z| value. This cell
is further removed, resulting in model 3, which now gives a fitted G^2 value of 6.269
on (6 - 1 - 1) = 4 degrees of freedom. This model fits the data well.
If we are interested in estimating the {\alpha_i} and {\beta_j} parameters that will ultimately lead to the estimation of the \hat{m}_{ij} terms, then the following procedure,
due to Goodman (1964, 1968), which is also presented in Bishop et al. (1975), is
available.
The maximum likelihood equations above can be written as

\sum_j \delta_{ij} \hat{\alpha}_i \hat{\beta}_j = n_{i+},   i = 1, 2, ..., I
\sum_i \delta_{ij} \hat{\alpha}_i \hat{\beta}_j = n_{+j},   j = 1, 2, ..., J

The above can be written succinctly as

\hat{\alpha}_i = n_{i+} / \sum_j \delta_{ij} \hat{\beta}_j,  \qquad  \hat{\beta}_j = n_{+j} / \sum_i \delta_{ij} \hat{\alpha}_i

The latter representation suggests an iterative procedure for estimating the {\alpha_i}
and {\beta_j}.
Thus, if we begin by setting

\hat{\beta}_j^{(0)} = 1   for j = 1, 2, ..., J

and then continue at the \nu-th cycle of the iteration (\nu \ge 1) by setting

\hat{\alpha}_i^{(\nu)} = n_{i+} / \sum_j \delta_{ij} \hat{\beta}_j^{(\nu - 1)}   for i = 1, 2, ..., I

\hat{\beta}_j^{(\nu)} = n_{+j} / \sum_i \delta_{ij} \hat{\alpha}_i^{(\nu)}   for j = 1, 2, ..., J

then, after the \nu-th cycle, the estimates of the expected values are given by

\hat{m}_{ij}^{(\nu)} = \delta_{ij} \hat{\alpha}_i^{(\nu)} \hat{\beta}_j^{(\nu)}
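This algorithm is easily programmed; below is a minimal SAS/IML sketch (ours, not part of the original text) for the same two deleted cells, which reproduces the IPF solution obtained earlier.

proc iml;
n = {15 32 18 5, 8 29 23 18, 1 20 25 22};
delta = {1 1 1 0, 1 1 1 1, 0 1 1 1};            /* delta_ij = 0 for cells in S */
beta = j(1, 4, 1);                              /* beta_j^(0) = 1 */
do v = 1 to 20;
   alpha = (n#delta)[,+] / (delta # repeat(beta, 3, 1))[,+];
   beta  = (n#delta)[+,] / (delta # repeat(alpha, 1, 4))[+,];
end;
m = delta # (alpha * beta);                     /* mhat_ij = delta_ij alpha_i beta_j */
print m[format=8.3];                            /* agrees with the IPF solution */
quit;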

The model of quasi-independence can be implemented in SAS software as follows.
In the first SAS software program and output below, we set cell (1,4) — the cell with the
largest absolute standardized residual under independence — to a structural
zero and fit a model of quasi-independence. This is accomplished in SAS software
by creating a dummy variable wt. The results of this fit from the residual analyses
indicate that, while G^2 has been reduced considerably from 30.585 to 16.992 on 5 d.f.,
cells (1,1), (1,3), and (3,1) still exhibit significant residuals. (Note that this fit removes
cell (1,4), whereas model 2 in the table above removes cell (3,1); either starting point
leads to the same final model.) We next set the cell with the largest absolute
standardized residual to a structural zero and refit the model of quasi-independence.
The next cell removed is cell (3,1), and the resulting model is model 3 above.
set tab64;
if a=1 and b=4 then wt=0;
else wt=1;
proc genmod;
make 'obstats' out=aa;
class a b wt;
model count=a b wt/dist=poi link=log obstats
type3;
run;
proc print data=aa;
var count pred reschi streschi;
format pred reschi streschi 7.4;
run;
            Criteria For Assessing Goodness Of Fit

        Criterion               DF       Value     Value/DF
        Deviance                 5     16.9922       3.3984
        Scaled Deviance          5     16.9922       3.3984
        Pearson Chi-Square       5     14.6320       2.9264
        Scaled Pearson X2        5     14.6320       2.9264
        Log Likelihood                433.9217

 Obs   count      Pred     Reschi  Streschi
   1      15    9.1228     1.9458    2.6656
   2      32   30.7895     0.2182    0.3819
   3      18   25.0877    -1.4151   -2.2936
   4       5    5.0000     0.0000    0.0000
   5       8    7.9481     0.0184    0.0239
   6      29   26.8248     0.4200    0.6539
   7      23   21.8572     0.2444    0.3604
   8      18   21.3699    -0.7290   -1.2536
   9       1    6.9291    -2.2524   -2.8333
  10      20   23.3857    -0.7001   -1.0511
  11      25   19.0550     1.3619    1.9392
  12      22   18.6301     0.7807    1.2536

We next set cell (3,1) to a structural zero and refit, by again creating a dummy variable
wt, where

wt = 0 for cell (1,4);   wt = 2 for cell (3,1);   wt = 1 elsewhere

This is again implemented in SAS software as:

set tab64;
if a=1 and b=4 then wt=0;
else if a=3 and b=1 then wt=2;
else wt=1;
proc genmod; class a b wt;
model count=a b wt/dist=poi link=log obstats
type3; run;

            Criteria For Assessing Goodness Of Fit

        Criterion               DF       Value     Value/DF
        Deviance                 4      6.2697       1.5674
        Scaled Deviance          4      6.2697       1.5674
        Pearson Chi-Square       4      6.0639       1.5160
        Scaled Pearson X2        4      6.0639       1.5160
        Log Likelihood                439.2830

 Obs   count      Pred     Reschi  Streschi
   1      15   12.1568     0.8155    1.3075
   2      32   29.1177     0.5341    0.9203
   3      18   23.7255    -1.1755   -1.8748
   4       5    5.0000     0.0000    0.0000
   5       8   10.8432    -0.8634   -1.3075
   6      29   25.9715     0.5943    0.9093
   7      23   21.1619     0.3996    0.5805
   8      18   20.0234    -0.4522   -0.7581
   9       1    1.0000     0.0000    0.0000
  10      20   25.9108    -1.1612   -1.8622
  11      25   21.1125     0.8461    1.2715
  12      22   19.9766     0.4527    0.7581

To recap the stepwise approach: we set cells having
significant standardized residuals to structural zeros in stages, beginning with the cell with the
largest absolute value of Z under the model of independence. In our case, this was cell
(1,4). Removing this cell and fitting a quasi-independence model does
not fit the data, because cells (1,1), (1,3), and (3,1) still have significant |Z|
values. Cell (3,1) is then further removed, resulting in model 3, which gives a
fitted G^2 value of 6.269 on (6 - 1 - 1) = 4 degrees of freedom. This model now fits
the data well; none of its standardized residuals is significant.
Care must be taken in implementing the model of quasi-independence, however,
if we try to simultaneously set to zero two or more cells initially exhibiting
significant absolute Z values under the model of independence; model 4 is
an example of such a case. This model removes the two cells (3,1) and (3,3)
simultaneously, based on the earlier residual analysis.
The resulting model, unfortunately, still does not fit, as we would still have to
remove cell (1,4), which continues to exhibit a significant Z. The model resulting from the
removal of this cell gives a G^2 of 4.676 on 3 d.f., a good fit, but with fewer degrees
of freedom than model 3. We shall consider parsimony of models later, but it seems
preferable to adopt model 3 because it is based on 4 d.f. Since this model fits
well, this implies that the four different offices are independent of the favored plans,
except for office 4 workers choosing favored plan 1 and office 1 workers choosing
favored plan 3.
Alternatively, we could have fitted model 3 above, again in SAS software, by setting
the counts in the two deleted cells to missing, so that GENMOD excludes them from
the fit while still producing predicted values for them:

set tab64;
if a=1 and b=4 then count=.;
else if a=3 and b=1 then count=.;
proc genmod;
make 'obstats' out=aa;
class a b;
model count=a b/dist=poi link=log obstats type3;
run;

            Criteria For Assessing Goodness Of Fit

        Criterion               DF       Value     Value/DF
        Deviance                 4      6.2697       1.5674
        Scaled Deviance          4      6.2697       1.5674
        Pearson Chi-Square       4      6.0639       1.5160
        Scaled Pearson X2        4      6.0639       1.5160
        Log Likelihood                437.2358

 Obs   count      Pred     Resraw    Reschi  Streschi
   1      15   12.1568    2.8432     0.8155    1.3075
   2      32   29.1177    2.8823     0.5341    0.9203
   3      18   23.7255   -5.7255    -1.1755   -1.8748
   4       .   22.4490       .          .         .
   5       8   10.8432   -2.8432    -0.8634   -1.3075
   6      29   25.9715    3.0285     0.5943    0.9093
   7      23   21.1619    1.8381     0.3996    0.5805
   8      18   20.0234   -2.0234    -0.4522   -0.7581
   9       .   10.8179       .          .         .
  10      20   25.9108   -5.9108    -1.1612   -1.8622
  11      25   21.1125    3.8875     0.8461    1.2715
  12      22   19.9766    2.0234     0.4527    0.7581

PROC CATMOD could also be used to fit the above models, but since CATMOD
cannot handle empty cells directly, we would first have to set the counts or frequencies
in these cells to a negligibly small value such as 1e-20.
Yet another alternative for obtaining MLE under the quasi-independence model is
presented in appendix E.1.
Other quasi-independence models will be discussed when we consider square
tables having ordered classificatory variables in chapter 11.

6.5    Three-Way Contingency Tables

As an example of a three-way table, consider the data in Table 6.10, which relate
to surveys carried out in 1975 (Y1) and 1976 (Y2) asking individuals whether they
favored registration of guns (Aicken, 1983). The question was asked either in a
form Q1 or in a form Q2 slanted against gun registration. The individual responses
are (R1) opposes gun registration and (R2) favors gun registration. The data in
Table 6.10 are thus the responses R on gun registration for the years
1975 and 1976 (Y) under the two forms of question (Q).

                              Question Form
   Response       Year        Q1        Q2
   Opposes        1975       126       141
                  1976       152       182
   Favors         1975       319       290
                  1976       463       403

Table 6.10: Gun registration data

Our discussion of log-linear models for three-way contingency tables will use for
illustration the 2^3 three-way table in Table 6.10, where the factor variables are Y
and Q and the response variable is R, all having two categories. Let R, Y, and Q
be indexed by i = 1, 2, j = 1, 2, and k = 1, 2, respectively.
The saturated log-linear formulation for this table is of the form:
l_{ijk} = \mu + \lambda_i^R + \lambda_j^Y + \lambda_k^Q + \lambda_{ij}^{RY} + \lambda_{ik}^{RQ} + \lambda_{jk}^{YQ} + \lambda_{ijk}^{RYQ}          (6.14)

subject to the parameter sum-to-zero constraints

\sum_i \lambda_i^R = \sum_j \lambda_j^Y = \sum_k \lambda_k^Q = 0          (6.15a)
\sum_i \lambda_{ij}^{RY} = \sum_j \lambda_{ij}^{RY} = \sum_i \lambda_{ik}^{RQ} = \sum_k \lambda_{ik}^{RQ} = \sum_j \lambda_{jk}^{YQ} = \sum_k \lambda_{jk}^{YQ} = 0,
\sum_i \lambda_{ijk}^{RYQ} = \sum_j \lambda_{ijk}^{RYQ} = \sum_k \lambda_{ijk}^{RYQ} = 0          (6.15b)

The above model contains as many parameters as the number of cells. If we let
l_{ijk} denote the natural logarithm of the expected counts, then the estimates of
the \lambda parameters are easily obtained by substituting the observed cell counts for
the expected cell counts, either in the expressions below or by using a standard
statistical package such as SAS. We give below expressions for the parameter
estimates of the model in (6.14).
\hat{\mu} = \frac{1}{8} \ln(\hat{m}_{111}\,\hat{m}_{112}\,\hat{m}_{121}\,\hat{m}_{122}\,\hat{m}_{211}\,\hat{m}_{212}\,\hat{m}_{221}\,\hat{m}_{222})

\hat{\lambda}_1^R = \bar{l}_{1++} - \bar{l}_{+++} = \frac{1}{8} \ln\!\left(\frac{\hat{m}_{111}\,\hat{m}_{112}\,\hat{m}_{121}\,\hat{m}_{122}}{\hat{m}_{211}\,\hat{m}_{212}\,\hat{m}_{221}\,\hat{m}_{222}}\right)

Similar expressions can be written for \hat{\lambda}_1^Y and \hat{\lambda}_1^Q. For the first-order interaction
terms such as \hat{\lambda}_{11}^{RY}, we have

\hat{\lambda}_{11}^{RY} = \bar{l}_{11+} - \bar{l}_{1++} - \bar{l}_{+1+} + \bar{l}_{+++} = \frac{1}{8} \ln\!\left(\frac{\hat{m}_{111}\,\hat{m}_{112}\,\hat{m}_{221}\,\hat{m}_{222}}{\hat{m}_{121}\,\hat{m}_{122}\,\hat{m}_{211}\,\hat{m}_{212}}\right)

and similarly for \hat{\lambda}_{11}^{RQ} and \hat{\lambda}_{11}^{YQ}. For the second-order or three-factor interaction
term \hat{\lambda}_{111}^{RYQ}, we also have

\hat{\lambda}_{111}^{RYQ} = \frac{1}{8} \ln\!\left(\frac{\hat{m}_{111}\,\hat{m}_{122}\,\hat{m}_{212}\,\hat{m}_{221}}{\hat{m}_{112}\,\hat{m}_{121}\,\hat{m}_{211}\,\hat{m}_{222}}\right)

The maximum likelihood estimates of the parameters for the saturated model can
be obtained from the above expressions by replacing the expected counts \hat{m}_{ijk} with
the observed counts n_{ijk}. We have

\hat{\mu} = \frac{1}{8}\ln[126(141)(152)(182)(319)(290)(463)(403)] = 5.4481

\hat{\lambda}_1^R = \frac{1}{8}\ln\!\left(\frac{126(141)(152)(182)}{319(290)(463)(403)}\right) = -0.4449

\hat{\lambda}_1^Y = \frac{1}{8}\ln\!\left(\frac{126(141)(319)(290)}{152(182)(463)(403)}\right) = -0.1431

\hat{\lambda}_1^Q = \frac{1}{8}\ln\!\left(\frac{126(152)(319)(463)}{141(182)(290)(403)}\right) = -0.0073

\hat{\lambda}_{11}^{RY} = \frac{1}{8}\ln\!\left(\frac{126(141)(463)(403)}{152(182)(319)(290)}\right) = 0.0323

\hat{\lambda}_{11}^{RQ} = \frac{1}{8}\ln\!\left(\frac{126(290)(152)(403)}{141(182)(319)(463)}\right) = -0.0658

\hat{\lambda}_{11}^{YQ} = \frac{1}{8}\ln\!\left(\frac{126(182)(319)(403)}{141(152)(290)(463)}\right) = 0.0030

\hat{\lambda}_{111}^{RYQ} = \frac{1}{8}\ln\!\left(\frac{126(463)(182)(290)}{152(319)(141)(403)}\right) = 0.0139

The large-sample standard error for each of the estimated \lambda parameters is a
straightforward generalization of our previous results. Since each effect is obtained
as a difference of four positive and four negative terms, each with coefficient 1/8,
the asymptotic standard error becomes:

a.s.e. = \left(\frac{1}{64}\sum_{ijk} \frac{1}{n_{ijk}}\right)^{1/2} = 0.0245
An SAS software program and a revised output for fitting the saturated 2 x 2 x 2
log-linear model to the data in Table 6.10 are displayed below.
data tab611;
do R=1,2;
 do Y=1,2;
  do Q=1,2;
   input count @@; output;
  end;
 end;
end;
datalines;
126 141 152 182 319 290 463 403
;
run;
proc catmod; weight count;
model r*y*q=_response_/ml;
loglin r|y|q;
run;
proc genmod; class r y q; model count=r|y|q/dist=poi type3;
run;
                  The CATMOD Procedure
          Maximum Likelihood Analysis of Variance

        Source         DF    Chi-Square    Pr > ChiSq
        R               1        330.45        <.0001
        Y               1         34.17        <.0001
        R*Y             1          1.75        0.1863
        Q               1          0.09        0.7651
        R*Q             1          7.24        0.0071
        Y*Q             1          0.02        0.9018
        R*Y*Q           1          0.32        0.5703

             Analysis of Maximum Likelihood Estimates

                                    Standard      Chi-
  Effect    Parameter   Estimate       Error    Square    Pr > ChiSq
  R             1        -0.4449      0.0245    330.45        <.0001
  Y             2        -0.1431      0.0245     34.17        <.0001
  R*Y           3         0.0323      0.0245      1.75        0.1863
  Q             4        -0.00731     0.0245      0.09        0.7651
  R*Q           5        -0.0658      0.0245      7.24        0.0071
  Y*Q           6         0.00302     0.0245      0.02        0.9018
  R*Y*Q         7         0.0139      0.0245      0.32        0.5703

                  The GENMOD Procedure
               Analysis Of Parameter Estimates

                          Standard      Chi-
 Parameter  DF  Estimate     Error    Square   Pr > ChiSq
 Intercept   1    5.9989    0.0498   14502.9       <.0001
 R           1   -0.7949    0.0893     79.23       <.0001
 Y           1   -0.3291    0.0770     18.26       <.0001
 R*Y         1    0.0738    0.1361      0.29       0.5875
 Q           1    0.1388    0.0681      4.15       0.0416
 R*Q         1   -0.3189    0.1293      6.08       0.0136
 Y*Q         1   -0.0435    0.1059      0.17       0.6815
 R*Y*Q       1    0.1111    0.1958      0.32       0.5703
 Scale       0    1.0000    0.0000

            LR Statistics For Type 3 Analysis
                          Chi-
        Source     DF    Square    Pr > ChiSq
        R           1    363.33        <.0001
        Y           1     34.45        <.0001
        R*Y         1      1.74        0.1867
        Q           1      0.09        0.7650
        R*Q         1      7.26        0.0071
        Y*Q         1      0.02        0.9018
        R*Y*Q       1      0.32        0.5703

Considering the maximum likelihood analysis of variance given by PROC CATMOD
or the type 3 analysis from PROC GENMOD for the saturated model, only the
interaction term \lambda^{RQ} is significantly different from zero, with a p-value of 0.0071.
The three-factor interaction term \lambda^{RYQ}, as well as the other first-order interaction
terms \lambda^{RY} and \lambda^{YQ}, are clearly not significantly different from
zero. Hence, we may conclude (based on the nonsignificance of the three-factor
interaction term) that the association between any two of the variables {R,Y,Q}
does not depend on the level of the third variable. For example, the association
between R and Q does not depend on which year the survey was conducted. The
preceding conclusions imply that we would fail to reject the hypothesis of no three-factor
interaction in the data. We summarize our findings below:

(a) That the three-factor interaction is zero is formally stated as H_0: \lambda^{RYQ} = 0,
and we fail to reject this hypothesis.

(b) That the association between year (Y) and the response variable (R) is not
significant may be stated formally as the hypothesis H_0: \lambda^{RY} = 0, which we
also fail to reject.

(c) That there is a significant negative association between the form of question
(Q) and the response variable (R). Here, the hypothesis H_0: \lambda^{RQ} = 0 will be
rejected, and we would conclude that the response variable R and the form of
question asked (Q) are dependent, after adjusting for the years of the survey.

(d) The association between years (Y) and form of question (Q) is not significant,
and we may again state the hypothesis H_0: \lambda^{YQ} = 0, which we would fail
to reject.

Since only the first-order interaction term \lambda^{RQ} is significant, the estimated adjusted
(or conditional) log-odds ratio equals (under the parameter sum-to-zero constraints
- CATMOD)

\hat{\lambda}_{11}^{RQ} + \hat{\lambda}_{22}^{RQ} - (\hat{\lambda}_{12}^{RQ} + \hat{\lambda}_{21}^{RQ}) = 4(-0.0658) = -0.2632

since \hat{\lambda}_{11}^{RQ} = -\hat{\lambda}_{12}^{RQ} = -\hat{\lambda}_{21}^{RQ} = \hat{\lambda}_{22}^{RQ}. That is, the adjusted odds ratio is estimated to
be e^{-0.2632} = 0.769. Because the log-odds ratio equals 4\hat{\lambda}_{11}^{RQ}, its estimated asymptotic
standard error is 4(0.0245) = 0.0980. This gives a 95% confidence interval for the population
conditional log-odds ratio of -0.2632 \pm 1.96(0.0980), or (-0.455, -0.071), giving an odds
ratio confidence interval of (0.63, 0.93). That is, we are 95% confident that the odds
of opposing gun registration among respondents asked question form Q1 are between 0.63 and
0.93 times the corresponding odds among those asked form Q2, after adjusting for the
years of survey.
We can fit simpler models to the above data instead of the saturated model that
we applied in the preceding section. The next simpler model for the above data
is the model that assumes that there is no third-factor interaction term, that is,
the model that sets \lambda^{RYQ} = 0. This model, {RY,RQ,YQ}, has the log-linear
formulation:

l_{ijk} = \mu + \lambda_i^R + \lambda_j^Y + \lambda_k^Q + \lambda_{ij}^{RY} + \lambda_{ik}^{RQ} + \lambda_{jk}^{YQ}

with the relevant constraints. The model maintains homogeneous odds ratios between
any two variables at each level of the third variable. The model has been
described as the homogeneous association model. The model, when fitted to our
data, has a G^2 = 0.3222 on 1 degree of freedom. A simpler model is model
{YQ,RQ}, which has a G^2 = 2.0154 on 2 degrees of freedom.
The difference between these two models leads to a difference of 1.6932 in G^2 with
a corresponding 1 degree of freedom. Obviously, this indicates that the inclusion
of the additional parameter \lambda^{RY} in the model is not significant. We shall explore
other reduced models for the above data in a later section of this chapter. Both
models, as expected, fit the data well. In particular, model {YQ,RQ} states that R
and Y are conditionally independent given the levels of Q. It does appear as if there
is no time-effect association with the response R for the above data. We
give below the GENMOD output for both models, together with predicted values
and appropriate residuals.
MODEL {RY,RQ,YQ}
set tab611;
proc genmod;
make 'obstats' out=bb;
class r y q;
model count=r|y|q@2/dist=poi type3 obstats;
run;
proc print data=bb noobs;
var count pred Xbeta Resraw Reschi Streschi;
format pred Xbeta Resraw Reschi Streschi 8.4;
run;
                  The GENMOD Procedure

            Criteria For Assessing Goodness Of Fit
        Criterion               DF       Value     Value/DF
        Deviance                 1      0.3222       0.3222
        Scaled Deviance          1      0.3222       0.3222
        Pearson Chi-Square       1      0.3223       0.3223
        Scaled Pearson X2        1      0.3223       0.3223

                      Analysis Of Parameter Estimates

                           Standard      Wald 95%           Chi-
 Parameter    DF  Estimate    Error   Confidence Limits   Square  Pr > ChiSq
 Intercept     1    6.0061   0.0480    5.9120    6.1002  15647.6      <.0001
 R             1   -0.8182   0.0797   -0.9744   -0.6619   105.28      <.0001
 Y             1   -0.3463   0.0709   -0.4852   -0.2073    23.86      <.0001
 R*Y           1    0.1274   0.0978   -0.0643    0.3192     1.70      0.1927
 Q             1    0.1253   0.0638    0.0002    0.2505     3.85      0.0496
 R*Q           1   -0.2705   0.0971   -0.4608   -0.0803     7.77      0.0053
 Y*Q           1   -0.0109   0.0891   -0.1856    0.1637     0.02      0.9023
 Scale         0    1.0000   0.0000    1.0000    1.0000

            LR Statistics For Type 3 Analysis
                          Chi-
        Source     DF    Square    Pr > ChiSq
        R           1    363.23        <.0001
        Y           1     35.09        <.0001
        R*Y         1      1.69        0.1932
        Q           1      0.10        0.7532
        R*Q         1      7.79        0.0053
        Y*Q         1      0.02        0.9023

                       Observation Statistics

 count      Pred     Xbeta    Resraw    Reschi  Streschi
   126   123.101    4.8130    2.8992    0.2613    0.5677
   141   143.899    4.9691   -2.8992   -0.2417   -0.5677
   152   154.899    5.0428   -2.8992   -0.2329   -0.5677
   182   179.101    5.1879    2.8992    0.2166    0.5677
   319   321.899    5.7742   -2.8992   -0.1616   -0.5677
   290   287.101    5.6598    2.8992    0.1711    0.5677
   463   460.101    6.1314    2.8992    0.1352    0.5677
   403   405.899    6.0061   -2.8992   -0.1439   -0.5677

MODEL {YQ,RQ}
set tab611;
proc catmod; weight count; model r*y*q=_response_/ml;
loglin r|q y|q; run;
proc genmod; make 'obstats' out=dd;
class r y q; model count=r|q y|q/dist=poi type3 obstats; run;
proc print data=dd noobs;
var count pred Xbeta Resraw Reschi Streschi;
format pred Xbeta Resraw Reschi Streschi 8.4; run;

CATMOD OUTPUT
          Maximum Likelihood Analysis of Variance

        Source              DF    Chi-Square    Pr > ChiSq
        Y                    1         50.07        <.0001
        Q                    1          0.11        0.7439
        Y*Q                  1          0.04        0.8392
        R                    1        343.22        <.0001
        R*Q                  1          7.79        0.0052
        Likelihood Ratio     2          2.02        0.3642

             Analysis of Maximum Likelihood Estimates

                                    Standard      Chi-
  Effect    Parameter   Estimate       Error    Square    Pr > ChiSq
  Y             1        -0.1573      0.0222     50.07        <.0001
  Q             2        -0.00801     0.0245      0.11        0.7439
  Y*Q           3        -0.00451     0.0222      0.04        0.8392
  R             4        -0.4494      0.0243    343.22        <.0001
  R*Q           5        -0.0677      0.0243      7.79        0.0052
GENMOD OUTPUT

            Criteria For Assessing Goodness Of Fit
        Criterion               DF       Value     Value/DF
        Deviance                 2      2.0154       1.0077
        Scaled Deviance          2      2.0154       1.0077
        Pearson Chi-Square       2      2.0228       1.0114
        Scaled Pearson X2        2      2.0228       1.0114
        Log Likelihood                9683.5985

               Analysis Of Parameter Estimates

                          Standard      Chi-
 Parameter  DF  Estimate     Error    Square   Pr > ChiSq
 Intercept   1    5.9890    0.0466  16543.30       <.0001
 Y           1   -0.3055    0.0635     23.16       <.0001
 Q           1    0.1284    0.0643      3.99       0.0456
 Y*Q         1   -0.0180    0.0889      0.04       0.8392
 R           1   -0.7634    0.0674    128.39       <.0001
 R*Q         1   -0.2709    0.0970      7.79       0.0052
 Scale       0    1.0000    0.0000

            LR Statistics For Type 3 Analysis
                          Chi-
        Source     DF    Square    Pr > ChiSq
        Y           1     50.69        <.0001
        Q           1      0.11        0.7439
        Y*Q         1      0.04        0.8392
        R           1    378.84        <.0001
        R*Q         1      7.81        0.0052

                       Observation Statistics

 count      Pred     Xbeta    Resraw    Reschi  Streschi
   126   116.708    4.7597    9.2921    0.8601    1.3147
   141   137.021    4.9201    3.9793    0.3400    0.5425
   152   161.293    5.0832   -9.2928   -0.7317   -1.3148
   182   185.979    5.2256   -3.9793   -0.2918   -0.5425
   319   328.293    5.7939   -9.2925   -0.5129   -1.3148
   290   293.979    5.6835   -3.9793   -0.2321   -0.5425
   463   453.708    6.1175    9.2925    0.4363    1.3148
   403   399.021    5.9890    3.9793    0.1992    0.5425
In Table 6.11 are displayed the expected counts under the homogeneous association model.

                   Y1=1975              Y2=1976
   R            Q1        Q2         Q1        Q2
   R1=1      123.10    143.90     154.90    179.10
   R2=2      321.90    287.10     460.10    405.90

Table 6.11: Expected cell counts under the model of no three-factor interaction
Since we are usually interested in the ratio of the odds of, say, Y1 when the response
was R1 relative to the odds of Y1 when the response was R2, if there is no
three-factor interaction, then we would expect the log odds ratio not to depend on
the level k of variable Q. That is,

\ln\!\left(\frac{m_{11k}\, m_{22k}}{m_{12k}\, m_{21k}}\right)

should be constant (Birch, 1963) for k = 1, 2. In our example, from the expected counts
in Table 6.11,

\ln\!\left(\frac{123.10 \times 460.10}{154.90 \times 321.90}\right) = 0.1274 = \ln\!\left(\frac{143.90 \times 405.90}{179.10 \times 287.10}\right)

The log-odds ratio of 0.1274 equals the parameter estimate for \lambda^{RY} under the
last-equals-zero GENMOD constraint. Similarly, it can be shown that the log-odds
ratios for the R-Q subtables at each of the two levels of Y would equal -0.2705,
the estimate of \lambda^{RQ}.
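A quick numerical check of this constancy (a sketch of ours, using the fitted values of Table 6.11):

proc iml;
/* fitted counts: rows R1, R2; columns (Y1,Q1), (Y1,Q2), (Y2,Q1), (Y2,Q2) */
m = {123.10 143.90 154.90 179.10, 321.90 287.10 460.10 405.90};
logOR_Q1 = log( m[1,1]#m[2,3] / (m[1,3]#m[2,1]) );  /* R-Y log odds ratio at Q1 */
logOR_Q2 = log( m[1,2]#m[2,4] / (m[1,4]#m[2,2]) );  /* R-Y log odds ratio at Q2 */
print logOR_Q1 logOR_Q2;    /* both equal 0.1274 */
quit;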

6.5.1    General I x J x K Contingency Tables

The above log-linear analysis for the 2^3 contingency table can readily be extended to
general I x J x K three-way contingency tables, with variables A, B, and C that
are indexed by i = 1, 2, ..., I, j = 1, 2, ..., J, and k = 1, 2, ..., K, respectively. In
this case, the saturated log-linear model formulation for such a table is again of the
form

l_{ijk} = \mu + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC}

subject to constraints similar to those in (6.15). The l_{ijk} are the natural logarithms
of the expected counts, with the ML estimates for the saturated model obtained by
substituting the observed counts n_{ijk}. The saturated model above will succinctly be
described as the model {ABC}.
The ML estimates in this case are given below for a few of the terms:

\hat{\mu} = \bar{l}_{+++}

\hat{\lambda}_i^A = \bar{l}_{i++} - \bar{l}_{+++}

\hat{\lambda}_{ij}^{AB} = \bar{l}_{ij+} - \bar{l}_{i++} - \bar{l}_{+j+} + \bar{l}_{+++}

\hat{\lambda}_{ijk}^{ABC} = l_{ijk} - \bar{l}_{ij+} - \bar{l}_{i+k} - \bar{l}_{+jk} + \bar{l}_{i++} + \bar{l}_{+j+} + \bar{l}_{++k} - \bar{l}_{+++}

where, for example, \bar{l}_{i++} = l_{i++}/(JK), \bar{l}_{ij+} = l_{ij+}/K, and \bar{l}_{+++} = l_{+++}/(IJK).
Similar expressions can be written for the other parameter estimates (\hat{\lambda}_j^B, \hat{\lambda}_k^C,
\hat{\lambda}_{ik}^{AC}, and \hat{\lambda}_{jk}^{BC}). Simpler models are formed by deleting terms from the saturated
model {ABC}, and the model formed is referred to as an unsaturated or reduced model.

6.5.2    Hierarchy Principle

For the general I x J x K three-way table, the saturated model, which can be written
succinctly as {ABC}, implies that the \lambda parameters associated with the following
are included in the model: \mu, A, B, C, AB, AC, BC, and ABC. Similarly, the model
written {AB} includes the parameters \mu, A, B, and the AB interaction term,
while the model of no three-factor interaction {AB,AC,BC} has the \mu, A, B, C, AB,
AC, and BC parameters in the model and has the formulation:

l_{ijk} = \mu + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC}

Thus, if an interaction, say AB, is included in a model, then, by the hierarchical
principle, the lower order terms A and B are also implicitly included in the model, along
with, of course, that interaction term. Conversely, if the interaction term AC = 0,
then by the hierarchy principle, the higher order interaction ABC
must also be zero. Further, we say that model {AB} is nested in model {AB,AC,BC},
since the parameters of the former are a subset of the parameters of the latter
model. We can take advantage of this in comparing nested log-linear models. We
shall develop this further later in the chapter.

6.6    Sufficient Statistics for Log-Linear Models

The sufficient statistics are the configurations of sums that correspond to the effect
terms of the log-linear model. To derive these statistics, we need to relate the
log-linear model of interest to the likelihood function.
Consider the saturated model {ABC} for the three-dimensional contingency table,
where A, B, and C are indexed by i = 1, 2, ..., I, j = 1, 2, ..., J, and k =
1, 2, ..., K, respectively. The model has the formulation

\ln(m_{ijk}) = \mu + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC}          (6.16)

where l_{ijk} is the natural logarithm of the expected counts. The usual identifiability
constraints are assumed imposed to ensure that the number of parameters equals
the number of cells in the table.
The log-likelihood (l) under the multinomial sampling scheme is again presented
here as:

l = \ln(n!) + \sum_{ijk} n_{ijk} \ln(m_{ijk}/n) - \sum_{ijk} \ln(n_{ijk}!)          (6.17)

The first and last terms are constants for any set of \hat{m}_{ijk} (as is the term in \ln(1/n)),
and we are only interested
in the remaining term, known as the kernel of the function. That is, the kernel of
the multinomial is

\sum_{ijk} n_{ijk} \ln(m_{ijk})          (6.18)

Similarly, for either the product-multinomial or Poisson sampling schemes, the kernels
are given by the same expression as above. For example, for the Poisson
distribution, the log-likelihood is

l = \sum_{ijk} n_{ijk} \ln(m_{ijk}) - \sum_{ijk} m_{ijk} - \sum_{ijk} \ln(n_{ijk}!)          (6.19)

Again, as before, the kernel is given by the first term in the expression above, since
the last two terms are constants, with the second term on the RHS being equal to
the sample size n.
Substituting for \ln(m_{ijk}) from (6.16) into (6.18) and applying the summation, we have

\sum_{ijk} n_{ijk} \ln(m_{ijk}) = n\mu + \sum_i n_{i++}\lambda_i^A + \sum_j n_{+j+}\lambda_j^B + \sum_k n_{++k}\lambda_k^C
   + \sum_{ij} n_{ij+}\lambda_{ij}^{AB} + \sum_{ik} n_{i+k}\lambda_{ik}^{AC} + \sum_{jk} n_{+jk}\lambda_{jk}^{BC} + \sum_{ijk} n_{ijk}\lambda_{ijk}^{ABC}          (6.20)

Both the multinomial and Poisson distributions belong to the class of exponential
probability density functions, as explained in chapter 2. For this class of models, it
is well known that the sufficient statistics are the coefficients of the parameters. In
the above formulation of the kernel of the saturated model, the result is general,
but for unsaturated models, terms will drop out as appropriate, and the terms that
remain will give coefficients that will be the sufficient statistics.
We consider sufficient statistics for some of these unsaturated models. We shall
illustrate with the general three-way ABC contingency table.

6.6.1    The No Three-Factor Effect

For brevity, \lambda_{ijk}^{ABC} = 0 for all i, j, k will be written succinctly as ABC = 0. If we
therefore set ABC = 0, the reduced model is {AB,AC,BC}, and so the
last term in (6.20) disappears; the sample size n and the configurations with entries
n_{i++}, n_{+j+}, n_{++k}, n_{ij+}, n_{i+k}, and n_{+jk} are the sufficient statistics for this model.
The last three configurations yield the others and form the complete minimal set
(since n_{i++} = \sum_j n_{ij+}, etc.). Using the more succinct notation of Bishop et al.
(1975), we would say that the minimal sufficient statistics are C_{12}, C_{13}, and C_{23},
which are succinct forms of n_{ij+}, n_{i+k}, and n_{+jk}, respectively.
6.6.2    One-Factor Effect Term

If we further remove one additional term (say AC) by putting AC = ABC = 0, the
natural logarithm of the kernel of the likelihood function becomes

\sum_{ijk} n_{ijk} \ln(m_{ijk}) = n\mu + \sum_i n_{i++}\lambda_i^A + \sum_j n_{+j+}\lambda_j^B + \sum_k n_{++k}\lambda_k^C
   + \sum_{ij} n_{ij+}\lambda_{ij}^{AB} + \sum_{jk} n_{+jk}\lambda_{jk}^{BC}

and the sufficient statistics are C_{12} and C_{23}, that is, n_{ij+} and n_{+jk}. We note that
by setting AC = 0, we are implying by the hierarchy principle that ABC is also
zero.
We give in Table 6.12 the sufficient statistics for the complete models in a
three-dimensional table.
There are three kinds of model type (2), namely, models with absent terms (AC,
ABC), (AB, ABC), and (BC, ABC). Similarly, there are three kinds of model type
(3), again with absent terms (AB, AC, ABC), (AB, BC, ABC), and (AC, BC, ABC).

 Model    Absent               Sufficient
 Type     term(s)              Configurations          df
 1        ABC                  C_12, C_13, C_23        (I-1)(J-1)(K-1)
 2        AC, ABC              C_12, C_23              (I-1)J(K-1)
 3        AB, AC, ABC          C_23, C_1               (I-1)(JK-1)
 4        AB, AC, BC, ABC      C_1, C_2, C_3           IJK-(I+J+K)+2

Table 6.12: Complete models in three dimensions

6.7    Maximum Likelihood Estimates, MLE

Again, for the three-dimensional case, once a set of sufficient statistics has been
identified — for example, for the model {AB,AC,BC} the sufficient
statistics are C_{12}, C_{13}, and C_{23}, with corresponding marginal counts n_{ij+}, n_{i+k},
and n_{+jk}, respectively — the following results, due to Birch (1963), apply.

(1) The minimal sufficient statistics are the ML estimates of the corresponding
marginal distributions. That is, for log-linear models, the likelihood equations
match the sufficient statistics to their expected values. Thus the MLE of
m_{ij+}, m_{i+k}, and m_{+jk} must correspond to the observed values. That is,

\hat{m}_{ij+} = n_{ij+}
\hat{m}_{i+k} = n_{i+k}          (6.21)
\hat{m}_{+jk} = n_{+jk}          (6.22)

(2) There is a unique set of elementary cell estimates that satisfies the conditions
of the model and these marginal constraints. For example, for the hypothesis
ABC = 0, we have

\frac{\hat{m}_{ijk}\,\hat{m}_{rsk}}{\hat{m}_{isk}\,\hat{m}_{rjk}} = \frac{\hat{m}_{ijt}\,\hat{m}_{rst}}{\hat{m}_{ist}\,\hat{m}_{rjt}}   for i \ne r, j \ne s, k \ne t          (6.23)

Further, there is only one set of estimates satisfying relations (6.21), (6.22), and
(6.23), and this set maximizes the likelihood function.

6.7.1    Two-Dimensional Case

In the two-dimensional I x J contingency table case, if we set AB = 0, which is the
model of independence, the sufficient configurations are C_1 and C_2, with corresponding
cell entries n_{i+} and n_{+j}, respectively. Thus the likelihood equations are n_{i+} = \hat{m}_{i+}
and n_{+j} = \hat{m}_{+j}, for all i and j. The expected values are

\hat{m}_{ij} = n_{i+}\, n_{+j}/n

and they satisfy these equations.

6.7.2    Three-Dimensional Tables

Again, let us consider the three-dimensional case with AC = ABC = 0. The
sufficient statistics are C_{12} and C_{23}, and they have C_2 in common. Thus, the ML
estimates are given by:

\hat{m}_{ijk} = n_{ij+}\, n_{+jk} / n_{+j+}

We give below the types of direct estimates possible for three-dimensional tables.

 Model          Sufficient            Direct
                configurations        estimates
 {A,B,C}        C_1, C_2, C_3         \hat{m}_{ijk} = n_{i++} n_{+j+} n_{++k} / n^2
 {AB,C}         C_12, C_3             \hat{m}_{ijk} = n_{ij+} n_{++k} / n
 {AB,AC}        C_12, C_13            \hat{m}_{ijk} = n_{ij+} n_{i+k} / n_{i++}
 {AB,AC,BC}     C_12, C_13, C_23      No direct estimates
 {ABC}          C_123                 \hat{m}_{ijk} = n_{ijk}  (no restrictions)
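As an illustration of the direct estimates, the sketch below (ours, not part of the original text) computes the fitted values for model {AB,C} applied to the gun registration data of Table 6.10, treating (R,Y) jointly as the rows and Q as the third variable.

proc iml;
/* rows are the four (R,Y) cells; columns are the K = 2 levels of Q */
n = {126 141, 152 182, 319 290, 463 403};
/* model {AB,C} direct estimate: mhat_ijk = n_ij+ n_++k / n */
m = n[,+] * n[+,] / n[+];
print m[format=8.2];
quit;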

6.7.3    Four-Dimensional Tables

For four factors A, B, C, and D indexed by i = 1, 2, ..., I, j = 1, 2, ..., J, k =
1, 2, ..., K, and l = 1, 2, ..., L, respectively, there are 4 main effects, 6 two-factor
effects, 4 three-factor effects, and 1 four-factor effect. In the next table, we give
direct estimates for various configurations when a certain effect or effects are removed
from the saturated model {ABCD}, where 4F, 3F, and 2F mean four-factor,
three-factor, and two-factor terms, respectively. Only the first row (set 1) is
reproduced here; the remaining rows of the original table remove additional 3F and
2F terms in the same fashion, down to the main-effects-only model with sufficient
configurations C_1, C_2, C_3, C_4.

 Set   Effects removed (4F; 3F)   Sufficient configurations   Direct estimates
 1     ABCD; ACD, BCD             C_123, C_124                \hat{m}_{ijkl} = n_{ijk+} n_{ij+l} / n_{ij++}

The general form of the direct estimates is predictable. For instance, the form
of the ML estimates for set 1 is suggestive. The sufficient configurations C_{123} and
C_{124} have C_{12} in common; hence the denominator n_{ij++} in the expression.
We give below the general rules for obtaining these estimates.

(i) The numerator has entries from each sufficient configuration.

(ii) The denominator has entries from the redundant configurations caused by
"overlapping"; terms in powers of n appear either in the numerator or in the
denominator to ensure the right order of magnitude. The order of the denominator
is always one less than that of the numerator.

Sometimes, no combination of the sufficient statistics yields a direct estimate.
In such cases, we shall employ iterative procedures. For instance, obtaining the
estimates for ABCD = 0 needs the use of an iterative procedure; {ABC, ABD, ACD,
BCD} will be described as the generating class for this model, that is, the highest
order interaction terms in the model, whereas for the model in set 1 above, we
would say that the generating class is {ABC, ABD}. Further, the model {ABCD}
is comprehensive because it includes all main effects for each of the factors in the
table. A noncomprehensive model, therefore, is one with at least one missing main
effect.

6.7.4

Closed Loops

When overlapping configurations can be linked to each other to form a closed loop,
then no direct estimates exists. As an example, the model with the ABC term =
0 in a three-way table has as its sufficient configurations C\2,Ci3, and 6*23, which
as we can see from Figure 1 in Figure 6.1, forms a closed loop. Hence, no direct
estimates exists. These figures are presented in the next page.

Figure 6.1: Figures 1 and 2


When a closed loop exists, whether all or some of the configurations are only involved
in the loop, iterative methods are necessary to obtain ML estimates, as no direct
estimates exist in this case.
As another example, in a five-dimensional table, direct estimates exist for the
configurations, C\2, 623, 634, and C45, that is, model {AB,BC,CD,DE}, because
these do not form a closed loop (see Figure 2 in Figure 6.1). But we would be
forced to use iterative procedures if we were to add a further two-factor term or
configuration to the four above. For instance, adding Cis, that is, AC to the model
gives a loop connecting ^12,623, and C\z (Figure 3). Figures 3 and 4 are again
respectively presented in Figure 6.2.

208

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

However, we can obtain direct estimates from the configurations C\<23 and 6*345
even though these configurations have six two-factor effects because the configurations do have loops (Figure 4), but the generating classes {ABC} and {CDE} are
also included in the model; that is, the model is decomposable (see next section).
The model in Figure 3, {AB,BC,CD,DE,AC}, includes the generating class (ABC)
The above discussions bring us to the notions of decomposable and graphical
log-linear models.

Figure 6.2: Figures 3 and 4

6.8

Decomposable and Graphical


Log-Linear Models

A decomposable log-linear model (Goodman, 1970, 1971b; Haberman, 1978) is the


class of models with closed form maximum likelihood estimates. They are very easy
to interpret because they have very simple structures.
On the other hand, a graphical log-linear model (Christensen, 1990) is one that
can be interpreted in terms of conditional independence arguments and can be
formally defined as:
A model is graphical if, whenever the model contains all two-factor terms
generated by a higher order interaction, the model also contains the
higher interaction term.
For a three-way table, the model {AB,AC,BC} is not graphical nor decomposable because by definition it should contain the generating three-factor interaction
ABC. Also, model {AB,BC,AC,CD,DE} in a five-factor table is not decomposable
because it does not have the three-factor generating term ABC (that forms the
closed loop) in its model. If we were to add this term (i.e., ABC) to the model,
then, the new model (ABC, CD, DE} would be decomposable and graphical. For
the same five-way table, the model {ABC,CDE} is decomposable because it not
only has the six two-factor terms but also their generating three-factor terms, even
though the ABC and the CDE formed closed loops.
Bishop et al. (1975) have proposed the following rules for determining the
existence of direct estimates (decomposability) for a general fc-way log-linear model.
They refer to these as classifying rules.

6.8.

DECOMPOSABLE AND GRAPHICAL LOG-LINEAR MODELS

6.8.1

209

Rules for Detecting Existence of Direct Estimates

Example
Consider the configurations 6123,6*124,6*456 in a six-way contingency table.
First:
1. Relabel variables that appear together (here variables 1 and 2) as
61/3, 61/4, 6455.

2. Delete common variables to all the three (here none}.


3. Delete variables that appear once (variables 3, 5, and 6) 6*i/, 6*1/4,64.
4. Remove redundancies 6*1/4.
The final configurations do not form a closed loop, hence direct estimates exist and
the model is not only decomposable but is also graphical. This is not surprising
as the model contains all three-factor generating terms that give rise to all the
two-factor interaction terms in the model.
To obtain the corresponding maximum likelihood estimates of model, {ABC,
ABD,DEF}, we note that the first two configurations 6123 and 6*124 have 6*12 in
common, so it yields an estimate of 6*1234, that is,
Estimates of Cells
Add the next configuration
Estimates of Cells

6*1234 = Cells of

612
~

6123455 Cells of
n

= Cells ofc

61236*1246*456

^12^4

The implications of the above rules are that:


1. If direct estimates exist, we can compute cell estimates in stages.
2. If direct estimates exist, then at least one of the two-factor terms is absent in
the model.
3. We can always compute direct estimates by inspection if we can reduce to
only two configurations.
For model {AB,AC,BC} in the three-way table, therefore, direct estimates do not
exist because it has all the three two-factor terms in the model.
The main reasons why we always wish to know if direct estimates exist are:
Difference(s) between two nested direct estimable models can be determined
from the appropriate marginal tables rather than from the complete tables.
Asymptotic variances of the estimated parameters can readily be obtained.
Models can be interpreted in terms of hypotheses of independence, conditional
independence, and equiprobability (Goodman, 1970).

210

6.8.2

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

Structural Breakdown of G2

When we have a log-linear model where direct estimates exist, that is, with closedform maximum likelihood estimates, we need not compute the expected values {m}
to get G2. For an / x J x K three-way table, for example, consider the model given
by setting AC = ABC = 0; that is, model {AB,AC}.
The sufficient statistics are Ci2, C23 with
C-2
and

G = 2 2_^ nijk

ln(nijkn+j+/nij+n+jk)

ijk

= 2 2_^ ^ijk hi(nijfc) 4- 2 ^ n^k ln(n +J - + )


ijk

ijk

ijk

ijk

nijk

= 2

\n(n+j+}

ijk

-2

n+jk

= 2 [Gi23(N) 4- G 2 (N) - Gi 2 (N) - G 23 (N)]


For a three-way contingency table, there are eight unsaturated hierarchical loglinear models, seven of which have direct estimates. We can compute G2 for these
seven models directly from the following quantities:
G f i 2 3 (N),G 1 2 (N),G 1 3 (N),G' 2 3(N),G 1 (N),G 2 (N),G'3(N) and Nln(N)
For any log-linear model with direct estimates therefore in any number of dimensions, the form of the structural breakdown of G2 is always the same and of the
form:
f
G2=2 G(N) -^G*(N)+^G'**(N)
(6.24)
*
**
where * indexes the minimal sufficient configurations associated with the given
model and ** indexes the overlapping of the minimal sufficient statistics.
It must be stated here that since computer software is readily available for
obtaining these estimates (whether direct or indirect), it is therefore not of serious
importance to know of their direct MLE existence. It is, however, challenging to
know how these estimates are derived.
Example
Consider fitting the model {RQ,YQ} to the data in Table 6.10. Then the sufficient
statistics are rii+k and n+jk. In Tables 6.13 and 6.16 are the cell counts for this
data when collapsed over Y and R, that is, n^+fc, and n+jk, respectively, with

6.8. DECOMPOSABLE AND GRAPHICAL LOG-LINEAR

R(i)
1
2
Total

Q(fc)
i

2
323
693
1016

278
782
1060

MODELS

211

Total
601
1475
2076

Table 6.13: Data in Table 6.10 collapsed over Y

Q(*0
Y(j)
1
2
Total

1
445
615
1060

2
431
585
1016

Total
876
1200
2076

Table 6.14: Data in Table 6.10 collapsed over R

= 1060 and

= 1016

Direct ML estimates are therefore given by:


n++k

For instance, mm =
- = 116.708. This value agrees with the expected
values displayed from the previous SAS software output under this model. This
model has a corresponding G2 of 2.0154 on 2 d.f.
Now applying the structural breakdown of G2 to obtain our result, we have:
^ nijk \n(nijk) +/ ^
n++k ln(n ++fc )
^
fc

ijk

-2

ni+k ln(n i+fe ) - 2^ n+jk


ik

jk

G2 = 2{[1261n(126) + 1411n(141) + 1521n(152) + 1821n(182)


+ 319 ln(319) + 290 ln(290) + 463 ln(463) + 403 ln(403)]
+ [10601n(1060) + 10161n(1016)]
- [2781n(278) + 3231n(323) + 7821n(782) + 6931n(693)]
- [445ln(445) + 431 ln(431) + 615ln(615) + 585ln(585)]}
-2.0154
which agrees with the value of G2 obtained via the computed expected values.
Consider, for example, a seven-dimensional table and a log-linear model with
minimal sufficient configurations
> -235 1

212

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

Then we can write


G2 = 2 [G(N) - Gi 24 (N) - G 235 (N) - Gi 36 (N) - G 57 (N) - Gi 23 (N)]
+ 2 [G12(N) + G 23 (N) + G 13 (N) + G 5 (N)]

6.9

MLE via Iterative Procedures

Two standard methods of estimating the expected values in the general log-linear
models are the iterative proportional fitting (IFF) algorithm due to Deming and
Stephan (1940) and the Newton-Raphson NR algorithm, which fits a series of
weighted regression analysis and is therefore some times referred to as the iteratively
reweighted least squares. We describe below these two algorithms when applied to
the data in Table 6.10, where the model of interest is the no three-factor interaction model that is, model {RY,RQ,YQ}, which does not have closed-form direct
estimates.

6.9.1

Iterative Proportional Fitting Algorithm (IFF)

For all models where direct or indirect estimates do or do not exist, model {AB,AC,
BC} in a three-way contingency table, for instance, does not have direct estimates.
In this case, we can obtain maximum likelihood estimates for each of the elementary
cells under such models by iterative fitting of the sufficient configurations. This
method of successive proportional adjustments has the following properties.
It always converges to the required unique set of MLE estimates.
The estimates depend only on the sufficient configurations, and so no special
provisions need be made for sporadic cells with no observations.
Any set of starting values may be chosen that conforms to the model being
fitted.
If direct estimates exist, the procedure yields the exact estimates in one cycle.
A stopping rule may be used that ensures accuracy to any desired degree
(usually 0.001) in the elementary cell estimates, instead of a rule that only
ensures accuracy in one or more summary statistics.
For example , consider again the 2 x 2 x 2 gun registration data in Table 6.10
with variables R, Y, and Q. We give in the next table the observed cell counts for
the data.
Cells

111

flijk

126

211
319

121
141

221
290

112
152

212
463

122
182

222
403

For model {RY,RQ,YQ}, therefore, the sufficient configurations (statistics) are


Gi2,Gi 3 , and G23 respectively. That is, riij+,ni+k, and n+jk, respectively.
STARTING POINT:
To start the IFF procedure, we start by initially setting rh\A. = 1 for every cell
in the table. This correspondingly sets ln(m\jk) 0.

6.9. MLE VIA ITERATIVE PROCEDURES

213

We then adjust the preliminary estimates to fit successively, C^Cis, and C^zFITTING Ci2
Fitting Ci2 gives
(i)
- (0) nij
^
m
m^ '
}

ijk -

ijk , (

FITTING
We next fit the marginal (7i3, giving
- (2) _ m, (i) ni+k
m
ijk - ijk (i)
m
i+k
FITTING C23
We now finally, complete the first cycle by fitting C-23 to give
* (3) Tn
- (2)
^ '
ijk ~ mijk

m
m ^ '

We will then repeat this three-step cycle until convergence to the desired accuracy
is attained. A satisfactory stopping rule is to choose 5 = 0.01 or 6 0.0001 such
that
- (3r)
- (3r-3)
ijk ~ mijk

The above procedure is implemented below for our data.


Cell
111
211
121
221
112
212
122
222

nijk
126
319
152
463
141
290
182
403

m(0)
I
1
1
1
1
1
1
1

ro$
133.5
304.5
167.0
433.0
133.5
304.5
167.0
433.0

- (3)

m(V

123.50
322.87
154.50
459.13
143.50
286.13
179.50
406.87

iik

123.12
321.88
154.84
460.16
143.96
287.04
179.08
405.92

- (12)
iik

123.10
321.90
154.90
460.10
143.90
287.10
179.10
405.90

In the three tables below, we give the sufficient marginals for both the observed and
estimated fitted values for the first cycle of iteration:
C12
11+
21+
12+
22+

nij+
267
609
334
866

m^l
2
2
2
2

Ci3
1+1
2+1
1+2
2+2

ni+k
278
782
323
693

m^fc
300.5
737.5
300.5
737.5

C23
+11
+21
+12
+22

n+jk
445
615
431
585

m(^k
446.37
613.63
429.63
586.37

where, for instance, the estimates for mm and 771222 for the three steps of the first
cycle are (using the marginals in the above tables) given by:

214

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY


m (0)
- 1
111
i( i)

(0)
m222
-1

_ 1 x 267

roiVi = -^

= 133-5

1 x 866

^^y
- 777.0oo :~::: "~""~~~

222

;A
mr 4oO .OU

,(2)
TOi =

TABLES

133.5 x 278

300.5

= 123'5

, (2 ) _ 433.0 x 693
737.5

771222

123.50 x 445
446.37

and
,(3) -

*"

= 123.121

406.87 x 585
= 405.922
586.37

We can show that the above computations are all equivalent. For example,
. (2) _ rnijkni+k _ njj+ni+k
ijk ~ TTt7 7 1 ) ~ ~ 7'^7-1\-

rn

i+k

6.9.2

l++

Successive Cycles of Iteration

At the end of the first cycle of iteration above, we obtained ih^A These expected
values are such that the constraints C^j, are satisfied, That is, the marginals now
add up to {445,615,431,585} but they no longer satisfy the constraints C\i and C\^\
hence there is a need to repeat the process until the difference between the rhijk in
the (3r) and (3r 3) cycles satisfy our accuracy target 6. In the above example, we
have convergence by r = 4, that is, at the 12th step or at the end of the fourth cycle
of iteration with 6 = 0.0 as the convergence criterion. The iterative proportional
algorithm can be implemented in SAS software by the use of the IFF algorithm in
PROC IML in SAS.

6.9.3

The Newton-Raphson Iterative Procedure

Any log-linear model can be reparameterized as a linear model. Consider, for example, the saturated model {AB} in a 2 x 2 table, where

! = u + XA + XB + \-B
where the usual constraints are imposed for identifiability and lij is the log of the
expected counts.
The above model can be rewritten in terms of identifiable parameters as:
Xl
where

Zli =

f 1 if i = 1
j ^
| -1 i f i = 2 and

f 1 if ?' = 1
** = \ -1 i f j = 2

Consequently, the models above can be written in the form:

6.9. MLE VIA ITERATIVE

215

PROCEDURES

" ill "


112
hi

" 1
1
1 - 1

/j,
AfB

1
1 "
1 - 1

\2
. n .

1
1 - 1 - 1
1 - 1 - 1
1

. *22 .

A B

where ^ = In(m^) have the familiar linear model formulation

Similarly, for the saturated 2 x 2 x 2 table, we have


ijk

jfc

with the usual constraints again imposed for identifiability. The above can also be
rewritten as:
7 ,, _L \IA 7li _i_
\B
n Z \iZ-2j
**i7 k A*
" 1 A
-\- A T I

^o-i^Mfc 4- Aini

where

1 if i = 1
-1 i f i = 2

the Z2j and ^3^ are similarly defined. Again, the model reduces to
Jill

" 1

^112

1 - 1 - 1 - 1 - 1

^121

1-1-1

^122

1 - 1 - 1 - 1 - 1

^211

1-1

1-1

^212

1-1

1-1-1

^221

1 - 1 - 1

/222

1-1-1

1-1

M
Af
Af
Aft*
Af
Aff
Aff

1"

1 - 1 - 1

1-1

1-1

1-1

1-1-1

\ABC
lll

1-1

which can again be written as:


For reduced models in the 22 and 23 tables, such models can similarly be written
in the linear model format. Again, consider the models of independence and no
three-factor interactions in the 22 and 23 tables, respectively, for example. The
models have the following matrix formulations, respectively.
' 1
1
1"
112
1 -1
1
1
1 -1
\s
/\o
1
-1
-1
h-2

" 111 '

hi

" Jin "

' 1

Jll2

1 - 1 - 1 - 1

Jl21

1 - 1 - 1
1-1-1

1
1

Jl22

^211

1-1

1-1

-1

hl2

1-1

1 - 1 - 1

h-2i

1 - 1 - 1

h22

1 - 1 - 1

1-1

1"

XX

1-1
-1

1-1

1 -1

1-1
1

-1
1

-.

\BC

216

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

In both cases, the last column in each of the X matrices (of constants) have been
deleted. That is, the models can now be written as:

where X* is the reduced matrix of coefficients.


We see from the above representation of log-linear models that the standard
least-squares approach (weighted) may be applied to estimate the parameters of
interest and consequently, the expected values under a given model.
To demonstrate the implementation of Newton's algorithm, let us consider again,
the data in Table 6.10 where we wish to again fit the model {RQ,RY,YQ} to the
data. This model has RYQ = 0.
Let
.._r-i
i _i i _i i i
i]
be the column containing the ABC (RYQ) interaction in the original matrix X.
Clearly, 2_. cijk 0. Also, let the initial estimates of the MLE be m^ and we
ijk

would initially set these equal to the observed counts, that is, rh.-l = rim?.- To start
the iteration, we also need to compute the followings:
-ijk
ijk

l,(o)
ijk J

(r)
ijk

-'ijk
(r)
ijk

m;
N

'"'ijk - &

where l\-k = \n(nijk) and N is the total sample size.


For our data, we have
7 (0) = 0.11113 /i(0) = 0.03833 / (0) = 2.89952
and using these we have </' = 0.99992.
The results for the first iteration are given in the following table.
Cells
nijk
u(0)
Vijk
rn (1)
m
iik

111
126

111
152

121
141

122
182

211
319

212
463

221
290

222
403

4.813

5.043

4.969

5.188

5.774

6.131

5.660

6.006

123.12

154.92

143.92

179.11

321.89

460.07

287.09

405.88

6. 1 0. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

21 7

To commence subsequent iterations, we would need to adjust the preceeding value


of l^ and repeat the above cycle of computations. The initial adjustment is of
the form:
,
_ - (*K
*J

For the second iteration, for example, we also have


7(1) = 0.11114 hw = 0.03834

and /(1) = 2.89922

and using these again, </^ = 1.000. The results for the second iteration are
given in the table below.

Cells
nl}k

/4'fc
~ (2)
iik

126

112
141

121
152

122
182

211
319

212
290

221
463

222
403

4.813

5.043

4.969

5.188

5.774

6.131

5.660

6.006

123.10

154.90

143.90

179.10

321.90

460.10

287.10

405.90

111

Since g^ = 1.000, we have thus attained convergence and m^A are the ML estimates based on the Newton- Rap hson iterative procedure. We observe that these
estimates agree to two decimal places with our previous estimates with the iterative
proportional fitting algorithm.
However, if convergence has not yet been attained, the cycle of iterations would
continue. In general,
g

> 1.000

/ * > /

and

mk > mijk

As pointed out in Haberman (1978), the GQ test statistic under the above model is
given by
= (2.89952)2(0.03833) = 0.32222
This value of GQ agrees with the G2 computed for this model in the displayed
SAS software output. The test statistic GQ has been noted by Goodman (1964) to
be equivalent to Woolf's (1955) test statistic for a no three- factor interaction in a
2 x 2 x 2 contingency table.
Haberman (1978) has given a comprehensive approach to the Newton's iterative
procedure. Most major statistical softwares adopt either the IFF algorithm or
Newton's algorithm.

6.10

Interpretation of Parameters in
Higher Tables

We consider in this section, the interpretations of parameters in log-linear models


in three-way and four-way tables. Because of its simplicity relative to other higher
dimensional tables, we will first consider the three-way tables in the next subsection.

218

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

6.10.1

TABLES

The General Three- Way / x J x K Tables

The two- factor effect AB represents the interaction between variables A and B
averaged over variable C. We can therefore define two-factor effects as products of
cross-product ratios. Let
For the 2 x 2 x 2 , the above reduces to:

The three-dimensional table has the three-factor effect ABC, which the twodimensional model does not have. We derive ABC as the average value of (AB)
across the tables and the particular value exhibited by k. Thus
A ABC
111

All the cells whose subscripts sum to an odd number appear in the numerator and
all those whose subscripts sum to an even number appear in the denominator. The
data in Table 6.10 are represented in the form below with computed relative risk
estimates, where a.s.e. stands for asymptotic standard error.
Year
Y(j)
1975 (1)
1976 (2)

Quest.

Q(*0
Qi
Q2
Qi
Q2

R(i)
1
2
126 319
141 290
152 463
182 403

Estimates
-0.9289
-0.7211
-1.1138
-0.7949

a.s.e
0.1052
0.1027
0.0935
0.0893

Table 6.15: Estimates of relative risk


Then, the relative risk of a response of oppose gun registration rather than favor
gun registration is rhijk/rhzjk, for persons in year j and form of question k. Hence
,R.YQ _ , _ / "ij'fc \ * = 1,2
where f is the estimated log-odds and the estimated asymptotic standard error is
given by:
nijk
These estimates are given in the table above. We shall use these results in the next
sections.

6.10.2

Three-Factor Effect Absent

Setting the ABC effect to zero enables us to describe a table with constant two-factor
effects. That is, the model {AB,AC,BC}, which is given in log-linear formulation
as

hjk = M + \ + ^j -t-iCAk -t- A^- -t- A- fc - -t- A-^


The model states that there is "partial association" between each pair of variables.
The model has

6.10. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

d.f. = UK -

219

- I)}

= (I 1)(J l)(K 1)

degrees of freedom

The model assumes that every variable is directly associated with every other variable and that controlling one variable does not remove the association between the
other two variables. In terms of odds ratios, the above implies that if we fix one
factor variable, then the odds ratios relating the remaining two variables are constant for each categories of the fixed variable. For instance, if we hold variable C
constant, then the odds ratio relating variables A and B at say fixed levels of C
(k = I and k = k) are such that

corresponding to the hypothesis,

for i = 1, 2, , (/ - 1), j = 1, 2, , ( J - 1) and k = 1, 2, , K .


The above of course states that the model holds, if and only if the conditional
log cross-products T\4>\ k are constants across variable C. That is,
. ,
,
.
,
= constant, k = 1,02, ..,r /
= ln
The ML estimates for this model cannot be obtained directly, and iterative methods
such as the IFF or Newton's algorithm are used to estimate the expected values.
We have earlier used both algorithms to obtain the MLE for this model for the data
in Table 6.10.
Under this model, the estimates of the partial log odds ratios are equal. That
is,
In
M = In
n =ln

for i ^ i', j ^ j', and k ^ k'. The asymptotic variance for Ti(ii..,\,,,
>\(kki\} is given by:
_i

Similar expressions for the asymptotic variances of the other two parameter estimates can be easily formulated.
Model {RY, RQ, YQ} implies the following:
,
,
=m
m
= -0.2705

220

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY TABLES

with estimated asymptotic variance computed as:


11
11
1
1
1
1
1
1
1
1
l
^
'
'
mm
mm
ra
i
m
n
123.10
460.10
321.90
m-211
2 2
2
= 0.0199 for j = l
and

I
A

77li22

I
1

y\

m.221

1
1

1
1 lI 11 lI 11
287.10 ' 179.10 ' 143.90
ra222
= 0.0185 for j = 2

_[_

1
154.90

1
405.90

Hence,
a.s.e.ff^^^) = [ ^^r + ^ )
0.0185/

= 0.0979

An approximate 95% confidence interval for 7V12w12) nas bounds


-0.2705 - 1.96(0.0979) = -0.4624
-0.2705 + 1.96(0.0979) = -0.0786
These values agree with the results obtained from SAS PROG GENMOD under
model {RY,RQ,YQ}. Thus, given the year of the questionnaire, the odds that
an individual will oppose gun registration rather than favor gun registration are
estimated to be exp(-0.4624) = 0.63 to exp(-0.0786) = 0.92, from 0.63 to 0.92
times higher for those administered the question of form 1 (Fl) than for those
administered the question of form 2 (F2).
We can from the SAS software output obtain the parameter estimates and confidence intervals for the other two parameters. For instance, T/ 12 x/j 2 \ = 0.1274
with corresponding 95% confidence interval given as (0.0643,0.3192). Thus given
the type of question, the odds that an individual will oppose gun registration
rather than favor gun registration are estimated to be exp( 0.0643) = 0.94 to
exp(0.3192) = 1.38, from 0.94 to 1.38 times higher for those interviewed in 1975
than for those interviewed in 1976.

6.10.3

Three-Factor and One Two-factor Effect Absent

There are three versions of the model with the three-factor effect and one two-factor
effect missing. The three generating classes have respectively
a ABC = BC = 0 implies model{AB,AC}
b ABC = AC = 0 implies model{AB,BC}
c ABC = AB = 0 implies model{AC,BC}
Selecting (a) above, that is, the model with BC (and hence ABC) absent, we have
lijk = /, + \f + Xf + A? + X*B + \fkc
This model states that variables B and C are independent for every level of variable
A, but each is associated with variable A. In other words, variables B and C are
conditionally independent, given the level of variable A that is,

6.10. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

221

where vr^-fe is the underlying probability under multinomial sampling scheme. The
model has direct MLEs given by
nij+ni+k
mijk = J^
ni++
with corresponding d.f. given by
d.f. = IJK-[l + (/-!) + (/-!)]

The parameters \f-B and \A^ in the above log-linear formulation refer to the A-B
and A-C partial associations, respectively. We may also note here that the conditional independence of B and C does not rule out the possibility that they may
however be marginally dependent.
Under the conditional independence of B and C given A, written (Anderson,
1997) as B _L C\A, that is, if model {AB,AC} is fitted to the data, then we would
expect the estimates of the log cross-products
ntfc' i = ^. That is,
}i

= In (

'"" / J

\= 0,

i = l,2, - , /

Similar results and interpretations can also be obtained and made for models (b)
and (c) above. Thus, if models {AB,BC} and {AC,BC} are fitted, then we would
expect, respectively,
S.AC.B
_ i
and

Again, when any one of these conditional independence models is applied, then
the partial association coefficients are equal to their respective marginal association
coefficients. That is,
BC A
BC

AC.B

__

AC

and the log-odds ratios (all zeros) reduce respectively to the following:
frhikrhi-'k' \
In I -^
1 = In
In

w~'j

=m
f

i
'm
I H i j TYt
iii'ij'

i'"lJK""l' IK'

In

= In

Only models {RY,YQ} and {RQ,YQ} are applied to the data in Table 6.10. The
third model {RY, RQ} is not considered because that would mean fitting a model
conditional on the response variable R. The results from the SAS software program
below are presented below in Table 6.16.

222

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

set tab611;
proc genmod;
class r y q;
model count=r|y y|q/dist=poi;
run;
proc genmod;
class r y q;
model count=r|q yIq/dist=poi;
run;

Cell
111
112
121
122
211
212
221
222
df
G'2
X2

nijk
126
141
152
182
319
290
463
403

Table 6.16: Estimated cell counts

{RQ,YQ}
116.708
137.021
161.293
185.979
328.293
293.979
453.708
399.021
2
2.015
2.0228

{RY,YQ}
135.634
131.366
171.175
162.825
309.366
299.634
443.825
422.175
2
8.109
8.1059

under the two models considered above

Model {RY, YQ} for instance has,


RQ.Y
,
= ln

,
= ln

F (135.634) (299.634)1
[(309.366)(131.366)J
= In(l.OOOO)

_
~

r(171.175)(422.175)
[(443.825)(162.825)

=0

For the two-way table collapsed over the Y variable, that is, the R-Q subtable,
the expected values under the model of marginal independence are ran = 306.869,
294.131, ra2i = 753.131, ra22 = 721.869. Consequently,
In

Q.o

Similar results can be obtained for model {RQ, YQ}. Of the two models, only model
{RQ, YQ} fits the data with a G2 = 2.0154 on 2 d.f. (pvalue = 0.3651). Therefore,
let us consider the variation involving individuals interviewed in year j (Yl = 1975,
Y2 1976) given that an individual is administered the form of questions (Q) at
levels k = 1 or k = 2, the estimated log-odds (log of relative risk) is estimated by:

6.10. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

223

For the form of question Ql, (k 1, in this case), we have


,

= -1.0342 - (-1.0342)
=0
with corresponding estimated asymptotic standard error

Thus, for question form 1 (Ql), we have a standardized value of (0.0/0.1407) = 0,


which is not much strong evidence of different relative risks between the form of
questions. An approximate 95% confidence lower and upper bound are estimated
as -0.2709 1.96(0.0970) = -0.2709 0.1902 = [-0.46,-0.08].

6.10.4

Three-Factor and Two Two-Factor Effects Absent

Again, for the three variables A, B, and C, there are three versions of the model with
the three-factor effect and two two-factor effects missing. These are respectively:
a ABC = AB = AC = 0 implies model {A,BC}
b ABC = AB = BC = 0 implies model {B,AC}
c ABC = AC = BC = 0 implies model {C,AB}
Selecting (a) again, we have the log-linear model formulation
/ = u + XA + XB + Xc + XBC
Variable A is now completely independent of the other two variables, while variables
B and C are associated. The model has the probability structure
#03 :

and expected cell values

Kijk = 7Tj ++ 7r + jfc

^
rii++n+jk
mijk =
jj

with corresponding
d.f. = UK - [I + (I - 1) + (J - 1) + (K - 1) + (J - l)(K - I)}
We say that A is jointly independent of B and C written succinctly as A J_ B, C
and this implies that A and B are conditionally independent given C. In addition,
A and C are also conditionally independent given B. Further, A is independent of B
and C in the A-B and A-C tables, and thus the partial odds ratios in the B-C table
at the level of A and its corresponding marginal odds ratio are equal. A further
implication of the above is that we can collapse the three-way table over variable A

224

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY TABLES


Cell
111
112
121
122
211
212
221
222
d.f.
G'2

nijk
126
141
152
182
319
290
463
403

fO
f~\ 5 ~\f\
1
Jrtv^
I r

117.31
136.29
160.69
186.71
329.98
292.42
452.02
400.58
3
2.057

{RY,Q}
136.33
130.67
170.54
163.46
310.95
298.05
442.18
423.82
3
8.151

{YQ,R}
128.83
124.77
178.04
169.36
316.17
306.23
436.96
415.64
3
9.829

Table 6.17: Estimated cell counts rhik under the three models considered above
as it is independent of B and C without affecting any of the remaining parameters
of the subscripted terms.
Similar results can also be obtained for models (b) and (c) above. As an example,
we apply the three models to the gun registration data in Table 6.10 and the results
are presented in Table 6.17.
The model {RQ,Y}, which fits the data, postulates that the year of survey (Y) is
completely independent of the remaining two variables R and Q. This model has a
G2 value of 2.0566 on 3 d.f. with a corresponding pvalue of 0.561. That is, Y and
R are conditionally independent given Q and that Y and Q are also conditionally
independent given R. From models {RQ,YQ} in Table 6.16 and model {RQ,Y} in
the table above, the contribution of the interaction term {YQ} has a G2 = 0.042
on 1 d.f., which is clearly not significant. Hence model {RQ,Y} seems the most
parsimonious model to the data in Table 6.10. This model also suggests that the
form of the question (Q) is associated with gun registration response (R) and that
this association is independent of the level of Y. We can put this in terms of log
odds ratio as
_ i
T RQ-y
(n')(kk').j - i
That is, for j = 1, 2 the expression below should be a constant:

\
For the data, we have for j = 1 and j = 2 the partial association log odds ratios as:
,
In

117.31 x 292.42 \
, / 160.69 x 400.58
n nfr
_____ = -0.2709 = In
^329.98 x 136.29 /
\452.02 x 186.71

The corresponding observed log-odds of the marginal RQ table is

The parameter estimates for XR,XR, XQ should be equal to those of the saturated
model based on the 2x2 R-Q marginal table (collapsed over Y). That is, XRQ, XR, XQ
in the marginal table. We give the SAS software program and modified outputs
when these two models are fitted to the data in Table 6.10 below. As expected, the
parameter estimates are equal.

6.10. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

225

set tab611;proc genmod;


class r y q; model count=r|q y/dist=poi; run;
MODEL {RQ,Y}

1 2
1 2
1 2
Criteria For Assessing Goodness Of Fit
Criterion

DF

Deviance
Pearson Chi-Square

Parameter
Intercept
R
Q
R*Q
Y

DF

1
1
1
1

1
1
1
1
1

Value

Value/DF

2.0566
2.0620

0.6855
0.6873

Analysis Of Parameter Estimates


Standard
ChiEstimate
Error
Square Pr > ChiSq
5.9929
-0.7634
0.1208
-0.2709
-0.3147

0 . 0424
0.0674
0.0522
0.0970
0 . 0444

5.9099
-0.8954
0.0186
-0.4610
-0.4018

<.0001
<.0001
0.0206
0.0052
<.0001

MODEL {RQj-Saturated Model.


proc genmod;
class r q;
model count=r|q/dist=poi;
run;
***fit independence model to collapsed table****;
proc freq; weight count;
tables r*q/chisq;
run;
Analysis Of Parameter Estimates
Standard
ChiParameter
DF Estimate
Error
Square Pr > ChiSq
Intercept
r

r*q

1 1

1
1
1
1

6.5410
-0.7634
0.1208
-0.2709

0 . 0380
0.0674
0.0522
0.0970

29650.10
128.39
5.36
7.79

<.0001
<.0001
0.0206
0.0052

The resulting collapsed 2 x 2 R-Q table has fitted G2=7.8133 on 1 d.f. for the
test of independence. Clearly, the response R of an individual strongly depends
on the form of question Q regardless of the year of survey. Since both marginal
{R,Q} and partial {RY,YQ} indicate strong dependence of variables R and Q (no
contradiction: both did not hold), we can thus collapse the three-way table over
variable Y without distorting the primary association between variables R and Q.
On the other hand, the partial and marginal fits of models {RQ,YQ}, {R,Y}
give G2 = 2.015 and G2 = 1.7193 on 2 and 1 degrees of freedom, respectively.
However, model {RY,Q} does not fit the data. Thus in this case, both the partial
association and marginal models fit the data but the model of joint independence
does not fit the data. Hence, collapsibility is not possible over variable Q without
distorting the primary association between variables R and Y.
It is obvious from the above results that a condition that must hold for the log
odds ratios of the partial and marginal associations to be equal is that the joint
independence model of the form {AB,C} must necessarily hold true. In our case,

226

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

only model {RQ,Y}, fits the data as displayed in Table 6.17, and it should therefore
be possible to collapse over variable Y. We shall explore collapsibility conditions
further in chapter 7.

6.10.5

Three-Factor and All Two-Factor Absent

This model represents complete independence of all the three variables and has the
log-linear representation
, _
, \A , \B , \c
kjk ^ + *i + AJ + *k
The model is written as {A,B,C} and has the probability structure
#04 :

and expected cell values

Kijk

_ rii++n+j+n++k
j^2

mi k =

and corresponding UK -[I + (I - 1) + ( J - 1) + (K - 1)], that is, (IJK-I-JK+2) degrees of freedom. The model is sometimes described as the model of mutual
or pairwise independence. Tables satisfying this hypothesis have the important
property that collapsing the table over any of the variables retains the independence
of the other two variables. That is, any two pairs of variables are conditionally
independent (given the third variable) and also marginally independent. This model
when applied to the data in Table 6.10, that is, model {R, Y, Q}, has a G2 = 9.8699
and is based on 4 d.f. with a pvalue of 0.043. This model therefore does not fit
the data, and indicates that the three variables are not pairwise independent. The
model can be written succinctly as A _L B _L C.
A general relationship between the various forms of independence described in
this section can be succinctly put in the form:
(a) => (6) => {c(i) or c(ii)}
where
(a) means A, B, C mutually independent {A,B,C}: A _L B JL C
(b) means B is jointly independent of A and C, {B,AC}: B _L A, C
(c{i}) means A and B marginally independent, {A,B}: A A, B
(c{ii}) means A and B conditionally independent, {AC,BC}: A _L B\C: AB missing.
Generally therefore, marginal independence implies conditional independence (or
vice versa). For example, model
{R, Y} => {RQ, YQ} if both models fit data
{R, Q} => [RY, YQ} hence both model do not fit data
Further, the partial association between variables A and B given level k of C (for
example) has the log odds ratios defined by:

AB.C
where H is a constant. If this partial association model holds, then H would be
equal to zero. That is, H = 0 when the model {AC,BC} holds. As discussed above,
there are three variants of this model.

6.10. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

227

Also, the marginal association between variables A and B (variable C being


ignored) has its log odds ratios defined by:
=W

AB

If this model holds, then W = 0.


The basic relationship between partial association and marginal association measures is that TU^)(JJ') k = T(i^)(jj') holds if either A and C is conditionally independent given B (that is, model {AB, BC} holds) or B and C are conditionally
independent given A (that is, model {AB, AC} holds). For three-way tables, therefore, if models {AB, BC} and {AB, AC} fit the data set, then it will be possible to
collapse over variable C without incurring Simpson's paradox. In our example, the
models {RY, RQ} and {RQ,YQ} hold, and therefore it is possible to collapse over
variable Y.

6.10.6

Interpretation of Parameters

For the data in Table 6.10, the most parsimonious model is model {RQ, Y} with
G2 = 2.0566 on 2 d.f. The ML parameter estimates from both GENMOD and
CATMOD procedures for the \fkQ are
Parameter estimates for \ under GENMOD
R(i)
I
2

Q(*)
1
2
-0.2709 0
0
0

Parameter estimates for

under CATMOD
Q(*0

R(i)
1
2

-0.0677
0.0677

0.0677
-0.0677

The interaction between gun registration response and form of question indicates
that fewer respondents oppose gun registration when asked the form of question
1 than question 2. Thus the odds of an individual opposing gun registration is
e _ ( _o.2709) = I3l = e-(4*-.0677) higher among those asked question form 2 than
among those asked question form 1, averaged over the years. Similarly, more of
the respondents favor gun registration when asked the form of question 1 than
when asked the form of question 2. The odds here also are 1.31. Parameter estimates for the main effect Xj (0.1574,0.1574) indicate that there are more
respondents in 1976 than in 1975 in the survey. Similarly, parameter estimates
for A^ = (0.4494,0.4494) again indicate that there are more respondents overall
favoring gun registration in the survey.

6.10.7

Example: Three-Way Table

The following example from Aicken (1983) relates to a follow-up survey in which
voters were first asked to express their political preference (P), and then, upon fol-

228

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY TABLES

lowup, their actual voting pattern (V). The voters were further classified according
to their income level (I).

P(0

PO

I 0')
< 10,000
10-15,000
15-20,000
> 20, 000

Totals
PI

Totals
119
96
84
97
396

< 10,000
10-15,000
15-20,000
> 20, 000

67
75
67
67
276

37
45
57
63
202

104
120
124
130
478

< 10,000
10-15,000
15-20,000
> 20, 000

5
3
9
14
31

35
28
35
71
169

40
31
44
85
200

Totals
P2

V (fc)
VO
VI
112
7
83
13
76
8
86
11
357
39

Totals

Table 6.18: Observed counts for the political preference survey


For Table 6.18,
P = political preference with categories:
PO = Democrat
PI = Independent
P2 = Republican
V = actual vote
VO = Democrat
VI = Republican
I = income level (in $) per year.
10 = < 10,000
11 = 10,000-15,000
12 = 15,000-20,000
13 = > 20,000
The above table is therefore a 3 x 4 x 2 three-way contingency table. If we fit first a
saturated model to the data, then we would be in a position to examine each main
effect and interaction individually from the type3 tests. The saturated model from
SAS software using PROC GENMOD is displayed in the modified output below:
data tab616;
do p=l to 3; do i=l to 4; do v=l to 2;
input count <8<B; output; end; end; end;
datalines;
112 7 83 13 76 8 86 11 67 37 76 45 67 57 67 63

6.10. INTERPRETATION

OF PARAMETERS IN HIGHER TABLES

229

5 35 3 28 9 35 14 71

proc freq; weight count; tables p*(I,V) I*V/chisq; run;


proc genmod; class p i v;
model count=p|iIv/dist=poi link=log type3; run;
Analysis Of Parameter Estimates

Parameter
Intercept
P
P
I
I
I
P*I
P*I
P*I
P*I
P*I
P*I
V
P*V
P*V
I*V
I*V
I*V
P*I*V
P*I*V
P*I*V
P*I*V
P*I*V
P*I*V

1
2
1
2
3
1
1
1
2
2
2
1
1
2
1
2
3
1
1
1
2
2
2

1
2
3
1
2
3
1
1
1
1
1
1
2
3
1
2
3

1
1
1
1
1
1

DF

Estimate

Standard
Error

ChiSquare

Pr > ChiSq

1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

4..2627
-1..8648
-0..1195
-0..7073
-0.,9305
-0..7073
0.,2553
1.,0975
0.,3889
0.,1751
0.,5940
0.,6072
-1.,6236
3.,6801
1.6852
-0.3223
-0.6100
0.2655
1.0384
0.4074
-0.0707
0.,8545
1.,0592
-0.,1654

0..1187
0,.3240
0,.1731
0,.2065
0,,2232
0,,2065
0.5258
0,.4665
0,.5085
0,.2925
0,.2965
0,.2758
0,.2924
0,.4336
0,.3410
0.,5604
0.,6742
0.,4745
0.,7539
0.,8038
0..6826
0..6220
0..7217
0..5371

1290 .10
33 .12
0.48
11 .73
17 .39
11 .73
0.24
5 .53
0.58
0.36
4.01
4.85
30 .83
72,.02
24 .42
0,.33
0,.82
0,.31
1..90
0,.26
0.,01
1,.89
2,.15
0,.09

< .0001
<,.0001
0.4898
0,.0006
<,.0001
0,.0006
0.6272
0.0186
0.4444
0.5494
0.0451
0.0277
<,.0001
< .0001
<,.0001
0,.5652
0,.3656
0,.5758
0,.1684
0,.6123
0,.9175
0,.1695
0,.1422
0,.7581

LR Statistics For Type 3 Analysis


Chi-

Source

P
I
P*I

DF

Square

2
3

179.28
14.07
16.47
6.13
329.65
2.47
7.51

6
1

P*V
I*V
P*I*V

2
3
6

Pr > ChiSq
<.0001
0.0028.
0.0115 **
0.0133
<.0001 **
0.4812
0.2763

The typeS statement in the above program instructs GENMOD to produce a maximum likelihood analysis of the parameters. Examination of these indicate that only
the PI and PV interaction terms are significantly different from zero (**). The interaction term IV is obviously not important in the model. SAS software also gives
the parameter estimates (under last equal zero constraints) under the saturated
model (together with the standard errors). The estimates are read in the other of
appearance of the variables in the model. Thus Af = 1.86482, Af = 0.1195,
with Af = 0. The next three parameter estimates relate to I (i.e., #4, 5, 6), while
the parameter estimate numbered 7 (#7} relates to V, etc.
The number of parameters for the PIV interaction term equals (3 1) x (4
1) x (2-1) =6.
An examination of the pvalues for these parameters indicates that none of the six
terms in the three-factor interaction is significantly different from zero. Similarly, of
the two-factor interaction terms, all three IV terms indicate nonsignificant pvalues.

230

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

However, both PI and PV two-factor interaction terms each have one parameter
term significantly different from zero.
While results from the above table do suggest that possible candidates for consideration in the final model might be the model (PI, PV}, the saturated approach
has some drawbacks, especially when there are many factor variables. One drawback arose because for multifactor situations, it is not always clear which effects
are to be removed without affecting the others in the model. Removing an effect
of course implies that that effect has been tested to be significantly not different
from zero. Thus the problem with the saturated procedure has been to determine
which effect or effects are to be tested for zero. We shall consider two procedures
that have been advocated for testing each of the individual effects, namely, tests of
partial and marginal associations, in the next section.
Another starting point for determining the most parsimonious model for our
data is to start by fitting log-linear models first to all the r one-factor effects, then
to all the two-factor effects, then to all the three-factor effects and so on and so
forth, that is, a stepwise approach. In our example above this would mean that we
fit the following models:
(a) (P, I, V}
(b) {PI, PV, VI}
(c) {PIV}

The results of fitting the above models are displayed in the following table:
Model

{P,I,V}
(PI, PV, VI}
{PIV}

df
17
6
0

G2
392.9177
7.5096
0

X'2
379.5464
7.4134
0

P-value
< 0.0001
0.276

We can conduct goodness-of-fit tests on each of the above models to determine the
smallest model that fits our data. In our example above, the smallest model that
fits the data is model (b). We may therefore consider eliminating terms from model
(b). This procedure also has the drawback that the choice of an initial model is
very crucial, as different initial models may give different results. Having identified
model (b) as our possible starting point, the next question of course would be if
indeed we would need all the three terms PI, PV, and VI in the model. We shall
discuss these data further at a later section in this chapter.

6.11

Model Interpretations in Higher Tables

Consider a four-dimensional contingency table with factors A, B, C, and D indexed


by , j, k, I and with I, J, K, L Categories, respectively. We give below some possible
models for the four-way table. Some of these models are described in Christensen
(1990).

6.11. MODEL INTERPRETATIONS

6.11.1

IN HIGHER TABLES

231

No Four-Fact or Interaction

Possible models and their interpretations are the following:


1. {ABC,ABD} This model is interpreted as given A and B, factors C and D are
conditionally independent, and is written as C JL D\A, B. The model is
graphic and decomposable.
2. {ABC,AD,BD} Given A and B, factors C and D are conditionally independent,
written as C JL D\A, B
3. {ABC,AD,BD,CD} Each pair of factors is conditionally dependent, but at each
level of D, the association between A and B or between A and C or between B
and C varies across the levels of the remaining factor (Agresti, 1990). Further,
the partial association between D and another variable is the same at each
combination of levels of the other two factors.
4. {ABC,AD} Given A, factor D is independent of factors B and C, written as
D B, C\A. The model is graphic and decomposable.
5. {ABC,D} Factor D is independent of factors A, B, and C, written as D J_
A, B, C. The model is graphic and decomposable.

6.11.2

No Four-, and Three-Factor Terms in the Models

Possible models are:


6. {AB,AC,AD,BC,BD,CD} For this model, each pair of factors is conditionally
dependent, given the other two factors. Thus if A and B are conditionally
independent at each combination levels of C and D, then X^LD = 0. In general,
conditional independence between two factors at the several combination levels
of the other two factors in a four-way table implies the absence of some twofactor interaction term.
7. {AB,AC,AD,BC} Given A, factor D is independent of factors B and C, written
as D _L B, C\A.
8. {AB,BC,CD,AD} Given B and D, A and C are independent, and given A and
C, B and D are independent. The model can be written as A _L C\B,D and
B _L D\A, C and the model is graphical.
9. {AB,AC,AD} Given A, factors B, C, and D are all independent, written as:
B LC LD\A.
10. {AB,AC,BD} Given A, factor C is independent of factors B and D, written as
C J_ B, D\A, and given B, factor D is independent of factors A and C, written
as D _L A,C\B.
11. {AB,CD} Factors A and B are independent of factors C and D (and vice-versa),
written as A _L JB|C, D. The model is graphic and decomposable.

232

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

12. {AB,AC,D} Factor D is independent of factors A, B, and C. Given factor


A, factor B is independent of factor C. The model is written succinctly as
D _L A, B,C and B _1_ C\A, respectively. The model is both graphic and
decomposable.
13. {AB,C,D} Factor C is independent of factors A, B, and D, that is, C J_ A, B, D,
and factor D is independent of factors A, B, and C written as D J_ A, B, C.
The model is graphic and decomposable. An alternative interpretation provided for this model in Anderson (1997) is that factors A, B are independent
of factors C, D, written as: A, B _L C, D, and factor C is independent of factor
D, again written as C A. D.

6.11.3

No Four-, Three-, and Two-Factor Terms


in the Models

The only possible model here is the model of mutual independence:


14. {A,B,C,D} All factors are independent of all other factors, written as A _L B _L
C A, D. The model is both graphic and decomposable.
Note: For models 2 and 7, these models imply their interpretations, but the interpretations do not imply the models.
Expressions for the maximum likelihood estimates for all of these models when
they exist are given in the next table.
Model Number

MLE

1.
2.

No closed form (ABD missing)

3.

No closed form

4.

5.

6.

No closed form

7.

No closed form (ABC missing)

8.

No closed form

9.
10.
11.

12.
13.
14.

N'*
N3

The models numbered 3, 6, and 8 do not have direct ML estimates because they do
not contain their generating term ABCD.

6.12. TESTS OF MARGINAL AND PARTIAL ASSOCIATIONS

6.12

233

Tests of Marginal and Partial Associations

When we are faced with higher dimensional contingency tables with several factors
(presumably more than three), we do not usually have a defined hypothesis of
interest, and in this case we are usually interested in the parameters that need to
be included in a model that will fit the data well. One possible solution to this
would be to set up a table of all possible hierarchical log-linear models for the data.
However, for tables involving four or more factor variables, this can be so numerous
that one would be motivated to search for a solution that would reduce considerably
the number of such possible hierarchical models.
Brown (1976) proposed such a screening procedure that enables us to make an
initial assessment on the significance of the individual parameters in the saturated
model. The procedure proposed by Brown is described as the marginal and partial
association tests.
We are therefore interested in testing whether or not to include a particular
parameter in the model. Our approach would be that, for both marginal and
partial tests, we would fit a model containing the effect of interest and another not
containing the effect of interest, and assessing its significance by calculating the
relevant G2 and the corresponding difference in the number of degrees of freedom.

6.12.1

Tests of Partial Association

To illustrate this, consider a situation where we have four (s = 4) factor variables


A, B, C and D. Then the test for partial association is conducted by first fitting
the largest model of third order (s 1), which in this case would be the model
{ABC,ABD,ACD,BCD}. Next, we drop, say, the \ABC or the ABC-term from the
model. That is, we would now fit the reduced model {ABD,ACD,BCD}. The difference in G2 and the corresponding d.f., enables us to test for the partial association of
the ABC term. Similarly, a partial association test of the BD-term, say, is again obtained by the test of {AB,AC,AD,BC,BD,CD} against {AB,AC,AD,BC,CD}. This
procedure is implemented in SAS software by the type3 likelihood ratio tests for
each of these terms. First the model of pairwise independence is fitted and type3
G2 for each term is obtained; next the model of partial association is fitted, and
finally the saturated model is fitted. Below is the result of such analysis for the
3 x 4 x 2 data in Table 6.18.
set tab616;
proc gerunod; class p i v; model count=p i v/dist=poi type3; run;
proc gerunod; class p i v; model count=p| i I vffl2/dist=poi type3; run;
proc gerunod; class p i v; model count=p| i| v/dist=poi typeS; run;
LR Statistics For Type 3 Analysis
ChiSource
DF
Square
Pr > ChiSq

P*I
P*V
I*V

2
3
1

123.37
9.61
60.64

<.0001
0.0222

6
2
3

17.69
336.10
4.30

0.0071
C.OOOl
0.2304

7.51

0.2763

<.ooo

234

CHAPTER 6. LOG-LINEAR MODELS FOR CONTINGENCY

TABLES

The results here indicate that important two-factor terms are PI and PV.

6.12.2

Tests for Marginal Association

The test for marginal association of A, B, and C is made by collapsing the table to
the A,B,C-margin. That is, we collapse over any factor or factors not included in
the particular term of interest (in this case, the ABC term). We would then test
that \ABC = 0 in the marginal table thus formed, that is, the test of {AB,AC,BC}
against the model {ABC}. Similarly, a test for marginal association ABCD is the
test of {ABC,ABD,ACD,BCD} against {ABCD}. Again, the test of marginal association AD is the test of {A,D} against {AD}. The above tests are each equivalent
to dropping (i) \ABC from model {ABC}, (ii) \ABCD from model {ABCD}, and
(iii) \AD from model {AD}, respectively.
Let us illustrate marginal association implementation with the 3 x 4 x 2 data
in Table 6.18. The highest marginal association models we can fit here are the
two-factor models since there are only three factors in the data. These marginal
association models are accomplished in SAS by PROC FREQ and we give below
modified results from these implementations.
set tab616;
proc freq; weight count; tables p*(v i) v*i/chisq; run;
Statistics for Table of P by I
Statistic

DF

Value

Prob

6
6

32.3191
31.3466

<.0001
<.0001

DF

Value

Prob

2
2

319.8592
349.7572

<.0001
c.OOOl

DF

Value

Prob

17 . 9056
17 . 9649

0 . 0005
0 . 0004

Chi-Square
Likelihood Ratio Chi-Square
Statistics for Table of P by V
Statistic
Chi-Square
Likelihood Ratio Chi-Square
Statistics for Table of I by V
Statistic
Chi-Square
Likelihood Ratio Chi-Square

3
3

Clearly, the marginal association (or interaction) (PV) has the highest G2 value
of 319.8592 on 2 degrees of freedom, reflecting the strong dependence in the P-V
subtable displayed in the next table.
V

P
1
2
3
Total

1
357
276
31
664

2
39
202
169
410

Total
396
478
200
1074

All the marginal associations are highly significant based on marginal association
analyses here. Marginal associations are prone to the risk of Simpson's paradox,
and a proper procedure to combine the information from the marginal association

6.12. TESTS OF MARGINAL AND PARTIAL ASSOCIATIONS

235

analyses with those from the partial associations analyses would be needed. We can
summarize these results in Table 6.19.
No
(i)
(ii)
(Hi)

k
3
2
1

df
6
17
23

G'2
7.510
392.918
586.545

Hypothesis
PIV = 0

PI=PV=VI=O
P=V=I=0

pvalue
0.2763
< 0.0001
< 0.0001

Decision
Fail to reject
Reject
Reject

Table 6.19: Tests that /c-factor and higher order effects are simultaneously zero
The first line, k = 3, gives the G2 for the model without the three-factor interaction
PIV. That is, the line tests the hypothesis that PIV = 0. From this result, there
is no sufficient reason not to accept this hypothesis. The line with k 2 indicates
the model without the third- and second-order effects (because of the hierarchy
principle). The pvalue for this hypothesis strongly suggest that this hypothesis is
not tenable. The last line, k = 1, corresponds to a model that has no effects (that
is, all effects are zero), except for the grand mean. Again, this model is not tenable.
We note that the above tests are based on fitting all fc-factor marginals to the
relevant data. A model with first- and second-order effects would seem adequate to
represent our data.
It is sometimes necessary to test whether all interactions of a given order are
simultaneously zero. In the above analysis, interest centers on whether all effects
greater than a certain order are zero; here, however, our interest centers on whether
that particular order interaction is zero. Table 6.20 displays the relevant information
obtained from Table 6.19 by taking the difference.

No
(i)
(")
(iii)

k
I
2
3

d.f.
6
11
6

G'2
193.627
385.408
7.510

pvalue
0.0000
0.0000
0.2763

Table 6.20: Tests that fc-factor effects are simultaneously zero


The G value of 193.627 in Table 6.20 is the difference between the G2 values of a
model with only the mean and first-order effects (586.545-392.918 = 193.627). This
value is an indication of how the inclusion of the first-order effects have improved
our model. The pvalue thus indicates a significant contribution. On the other hand,
the table also informs us that the contribution of the third-order term PIV is not
significant. Once again, this indicates that only first- and second-order terms need
be in our model.
The partial tests displayed earlier also indicate that, of all the main-effect and second-order terms identified as a plausible model, the second-order term IV is found not to be significant. We may note here that the degrees of freedom for conducting both marginal and partial tests are the same. Thus, although the marginal odds ratios describe the association when the third variable is ignored (i.e., when collapsed over the third variable), the partial odds ratios, on the other hand, describe the associations present when the third variable is partially controlled or fixed.
Christensen (1990) suggested the following four ways of choosing an initial model using Brown's tests:


(a) We include in the model all terms with significant marginal tests.
(b) We include all terms with significant partial tests.
(c) We include all terms for which either the marginal or partial association tests
are significant.
(d) We include all terms for which both the partial and marginal association tests
are significant.
It is obvious from the above that model (d) will always give the smallest initial
model, while (c) will always give the largest initial model. Model (d) can therefore
be used to determine an initial model for a forward selection procedure, while (c)
can similarly be used for a corresponding backward selection procedure. (These
procedures will be fully described in chapter 7.) Of course, we can always use as an
initial model a model that is intermediate between (c) and (d).
When Brown's tests are applied to the data in our example above, we have both
the IP and PV being very significant. Thus an initial model would be the model
{IP,PV}, which gives a G2 value of 11.8140 on 9 degrees of freedom. This model
turns out to be the most parsimonious model for describing this data, and we would
thus conclude that income and voting are conditionally independent given political
preference. In other words, political preference determines to a great extent the
actual voting behavior of the respondents. (This is the last model implemented in
the SAS software statements above.)
We note here that, because of the sampling constraint, the effect IP must necessarily be in the explanatory model.
We caution here that the results from marginal and partial tests may be completely different in terms of describing the associations present in a given k-way table, and they can sometimes lead to conflicting results. As an illustration, for the three-way table in Table 6.18, the partial tests indicate significant effects for (PI) and (PV), with the effect (IV) not significant, suggesting that this effect is unimportant; the marginal test for (IV), however, gives a pvalue of 0.0004, which makes it very unlikely that the effect is unimportant. This is illustrated below.
The GENMOD Procedure
LR Statistics For Type 3 Analysis

Source    DF    ChiSquare    Pr > ChiSq
I*V        3       4.30        0.2304      partial assoc. test
I*V        3      17.9649      0.0004      marginal assoc. test

The above contradictions between the marginal and partial tests indicate that one must therefore be very careful in eliminating possibly significant effects at the initial stage of a log-linear model analysis.

6.12.3 Interpretation of Chosen Model

The selected model can be interpreted as I ⊥ V | P.

            V(k)
P(i)        1          2
1        0.9687    -0.9687
2        0.0177    -0.0177
3       -0.9864     0.9864

Table 6.21: Parameter estimates for λ_ik^PV under CATMOD

Of particular importance to us are the parameter estimates of the λ^PV interaction, which are shown in Table 6.21.
The parameters indicate that there are more Democrats voting Democratic, while there is a correspondingly significant number of Republicans voting for the Republican party. The independents are least likely to vote for either of the parties.
Democrats are e^{2(0.9687+0.9864)} = e^{3.9102} = 49.91 times more likely to vote Democratic than are Republicans. Similarly, Independents are e^{2(0.0177+0.9864)} = e^{2.0082} = 7.45 times more likely to vote Democratic than are Republicans, while the odds are e^{(3.9102-2.0082)} = e^{1.9020} = 6.70 times greater for a Democrat to vote Democratic than for an Independent.
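These figures follow directly from the estimates in Table 6.21: for preference categories $i$ and $i'$, the log odds-ratio of a Democratic rather than a Republican vote is

\ln\theta_{ii'} = (\hat\lambda_{i1}^{PV} - \hat\lambda_{i2}^{PV}) - (\hat\lambda_{i'1}^{PV} - \hat\lambda_{i'2}^{PV})

so that, for Democrats (i = 1) versus Republicans (i' = 3),

\ln\theta_{13} = 2(0.9687 + 0.9864) = 3.9102, \qquad \theta_{13} = e^{3.9102} = 49.91.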

6.13 Collapsibility Conditions for Contingency Tables

We have seen from the previous section that the partial and marginal associations need not be the same, and indeed in many situations they are not. In this section we give sufficient conditions under which the partial and marginal associations are the same. First, let us consider again the difference between marginal and conditional independence.
Given three variables A, B, and C, the following condition is necessary for the partial and marginal odds-ratios to be the same for the A-B association (or A-C or B-C, as the case may be):

\tau_{ij(1)}^{AB} = \tau_{ij(2)}^{AB} = \cdots = \tau_{ij(K)}^{AB}

where τ defines a log odds-ratio, variable C has K categories, and 1 ≤ i ≤ (I - 1), 1 ≤ j ≤ (J - 1).
As demonstrated in our previous analysis of the gun registration data, the above collapsibility conditions will hold if either (a) or (b) below, or both, hold:

(a) \theta_{i(j)k}^{AC \cdot B} = 1, that is, all the conditional A-C odds-ratios equal 1, for 1 ≤ i ≤ (I - 1), 1 ≤ j ≤ J, 1 ≤ k ≤ (K - 1)

(b) \theta_{(i)jk}^{BC \cdot A} = 1, that is, all the conditional B-C odds-ratios equal 1, for 1 ≤ i ≤ I, 1 ≤ j ≤ (J - 1), 1 ≤ k ≤ (K - 1)

Condition (a) implies that A and C are conditionally independent given B and is described by model {AB,BC}. That is, either model {AB,C} holds or both models {AB,AC} and {AC,BC} hold.
Similarly, (b) implies that B and C are conditionally independent given A and is described by the model {AB,AC}.


For both (a) and (b) to hold, model {AB,C} must hold; that is, C must be jointly independent of A and B.

6.13.1 Example

Consider again the data on voting preference that were analyzed in Table 6.18. We suggested that the best parsimonious model was the model {IP,PV}, which has G2 = 11.814 on 9 d.f.; that is, the model that says that I is conditionally independent of V given P. The conditional I-V odds ratios would therefore be expected to equal 1.00 under this model (since the model imposes certain marginal constraints). However, model {IP,IV} does not fit the data, with G2 = 343.6063 on 8 d.f. This should be obvious, since model {IV,P} does not fit the data set: it has G2 = 374.9528 on 14 degrees of freedom. Consequently, we would not be able to collapse the above three-way table into, say, a two-way I-V table over variable P without distorting the underlying association between I and V.
It is worth mentioning here that because the factor variable income (I) is ordinal in nature, we may take advantage of this fact and fit a less restrictive IP interaction term, I(1)*P, the linear component of the I-P interaction. This model, when applied to the data in Table 6.18, gives G2 = 20.577 on 13 d.f. The model fits the data, but on examining its adjusted residuals, we notice that the cells (212, 241, 312), with observed counts 37, 67, and 35, have expected counts 48.67, 80.93, and 27.38, respectively. The corresponding adjusted residual values are -2.26, -2.51, and 2.32, respectively. Thus, we would not adopt this model over our earlier model that was based on 9 d.f.
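For readers who wish to reproduce the linear-component fit just described, a minimal GENMOD sketch follows. The dataset name vote and the equally spaced scores iscore = 1, ..., 4 for income are our assumptions; note also that one of the three slopes in iscore*p is aliased with the income main effects, so the deviance ends up on 13 d.f.:

data vote2; set vote;
iscore=i;    * equally spaced scores for the ordinal income variable (an assumption);
proc genmod data=vote2; class i p v;
* I(1)*P: income main effects plus only the linear component of the I-P interaction;
model count=i p v p*v iscore*p/dist=poi link=log;
run;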

6.13.2 Another Example

Aicken (1983) presents the data in Table 6.22, which relate to a sample of women during the Great Depression. The factors are:

E = Amount of full-time employment (i = 1, 2, 3):
    E0 = None
    E1 = 1-4 years
    E2 = > 4 years
C = Class (j = 1, 2):
    C0 = Middle class
    C1 = Working class
D = Deprivation (k = 1, 2):
    D0 = Not deprived
    D1 = Deprived
Interest here centers on fitting the models described by the following hypotheses to
the data in Table 6.22.

           C0            C1
       D0     D1     D0     D1
E0      4      5      2      3
E1      7      8      5     13
E2      2      7      4      9

Table 6.22: Three-way table relating employment, class, and deprivation


Hypothesis    Model
H1            {E, C, D}      (i.e., π_ijk = π_i++ π_+j+ π_++k)
H2            {E, CD}
H3            {C, ED}
H4            {D, EC}
H5            {ED, CD}
H6            {EC, CD}
H7            {EC, ED}
H8            {EC, ED, CD}

Table 6.23: Hypotheses of interest
For each of the models, the likelihood ratio test statistic G2 is computed and the corresponding degrees of freedom are obtained. The expected values for these models are displayed in Table 6.24.
From Table 6.24, we see that model H1 holds; that is, variables E, C, and D are pairwise independent. It is not surprising, therefore, that models H2 through H7 also hold, since they all contain model H1. Thus E is independent of C, C is independent of D, and D is independent of E. This model is interpreted as E ⊥ C ⊥ D.
We may, if we so wish, consider for instance the interaction (CD) to be a single factor variable with four categories and test the hypothesis that E and (CD) are independent. The result of such a test will not contradict the above result. We may also, if we wish, collapse the three-way table over any of the variables, and our results above will still stand. We consider two of these possibilities in Tables 6.25 and 6.26.
In Table 6.25, under the model of independence between the two classificatory variables E and (CD), the expected values are equal, as expected, to those obtained for model H2. The resulting G2 = 3.924 is based on (3 - 1)(4 - 1) = 6 degrees of freedom. The result also confirms that E and (CD) are independent.

Cells   n_ijk     H1      H2      H3      H4      H5      H6      H7      H8
111       4      2.33    2.64    2.87    3.13    3.25    3.55    3.86    4.11
112       5      4.37    4.06    3.83    5.87    3.56    5.45    5.14    4.89
121       2      2.54    2.23    3.13    1.74    2.75    1.53    2.14    1.89
122       3      4.76    5.07    4.17    3.26    4.44    3.47    2.86    3.11
211       7      5.49    6.22    5.74    5.22    6.50    5.91    5.45    6.08
212       8     10.29    9.57   10.04    9.78    9.33    9.09    9.55    8.92
221       5      5.99    5.26    6.26    6.26    5.50    5.50    6.55    5.92
222      13     11.23   11.96   10.96   11.74   11.67   12.50   11.45   12.08
311       2      3.66    4.14    2.87    3.13    3.25    3.55    2.45    2.81
312       7      6.86    6.38    7.65    5.87    7.11    5.45    6.55    6.19
321       4      3.99    3.51    3.13    4.52    2.75    3.97    3.55    3.19
322       9      7.49    7.97    8.35    8.48    8.89    9.03    9.45    9.81
G2               4.52    3.92    3.52    2.48    2.93    1.89    1.49    1.08
df                  7       6       5       5       4       4       3       2
p-value          0.72    0.69    0.62    0.78    0.57    0.76    0.69    0.58

Table 6.24: Results from the eight hypotheses considered


                    CD
E       C0D0    C0D1    C1D0    C1D1    Total
E0        4       5       2       3       14
E1        7       8       5      13       33
E2        2       7       4       9       22
Total    13      20      11      25       69

Table 6.25: The 3 x 4 two-way subtable relating employment and class-deprivation
Now suppose the three-way table is collapsed over class (C), for instance; the resulting table is shown in Table 6.26.

           D
E       D0     D1    Total
E0       9      5      14
E1      15     18      33
E2       9     13      22
Total   33     36      69

Table 6.26: The 3 x 2 collapsed two-way subtable relating employment and deprivation

Here again, G2 = 2.0329 for testing independence, based on 2 d.f., which again indicates that the model of independence holds; that is, E ⊥ D. The same would be true if we had collapsed instead over E or C. Analyses of the resulting collapsed tables will be consistent with the earlier conclusions of pairwise independence.
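The independence test on the collapsed table can be verified with PROC FREQ; a minimal sketch using the counts of Table 6.26 (the dataset name ed is ours):

data ed;
input e d count @@;
datalines;
1 1 9   1 2 5
2 1 15  2 2 18
3 1 9   3 2 13
;
proc freq data=ed;
weight count;
tables e*d / chisq;    * the likelihood ratio statistic gives G2 = 2.0329 on 2 d.f.;
run;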
As another example where collapsibility is possible, consider again the 2 x 2 x 3 contingency table relating cancer status (C) to spousal smoking (S) from three countries (Y), discussed in chapter 4. Next consider the eight hypotheses corresponding to the earlier hypotheses H1 to H8. In terms of the terms included, these hypotheses are: H1: {Y,S,C}, H2: {Y,SC}, H3: {S,YC}, H4: {C,YS}, H5: {YC,SC}, H6: {YS,SC}, H7: {YS,YC}, H8: {YS,YC,SC}. Table 6.27 gives the results of the analysis for the eight hypotheses H1 to H8 considered in the previous section.

Models    df       G2      pvalue
H1         7    22.3399    0.0022
H2         6    16.5606    0.0110
H3         5    21.2911    0.0007
H4         5     6.8440    0.2325
H5         4    15.5118    0.0037
H6         4     1.0647    0.8998
H7         3     5.7952    0.1220
H8         2     0.2396    0.8871

Table 6.27: G2 and pvalues for all the models considered


We observe from the above analysis that models H4, H6, H7, and H8 are tenable for these data. The best parsimonious model is the model based on H6, that is, model {YS,SC}. This is the model that assumes that given any level of factor S (spousal smoking status), cancer status (C) is independent of country (Y). Because of this result, if we collapse the tables over variable S, we obtain G2 = 1.0488 on 2 d.f. for the test of independence between C and Y. There is no contradiction in this case, and Simpson's paradox does not manifest itself in this example.
The results in the above table are generated from the following SAS software programs in PROC CATMOD or GENMOD for the eight hypotheses of interest.

data d618a;
do e=1 to 3; do c=1 to 2; do d=1 to 2;
input count @@; output; end; end; end;
datalines;
4 5 2 3 7 8 5 13 2 7 4 9
;
proc catmod; weight count;
model e*c*d=_response_/ml nogls freq pred=freq
noparm noresponse;
loglin e c d; run;     loglin e c|d; run;     loglin c d|e; run;
loglin d e|c; run;     loglin e|d c|d; run;   loglin e|c c|d; run;
loglin e|c e|d; run;   loglin e|c|d @2; run;
***Corresponding GENMOD codes***;
proc genmod; class e c d; model count=e c d/dist=poi link=log; run;
proc genmod; class e c d;
model count=e|c|d@2/dist=poi link=log; run;

We are able to collapse because the collapsibility conditions discussed above are fully met in these data: models {C,YS}, {YS,SC}, and {YS,YC} all fit the data.

6.14 Problems Associated with Fitting Log-Linear Models

Clogg and Eliason (1988) identified what they described as "some common problems in log-linear analysis." We discuss briefly below some insights into these problems; readers interested in a more thorough coverage of the topic should consult the aforementioned article by Clogg and Eliason and the references therein.

6.14.1 Problems with Small Cell Counts

In many situations, the sample size n may be very small relative to the dimension M of the contingency table. In such cases, the observed counts tend to be thinly or sparsely spread across the cells of the table. Sampling zeros are not uncommon in this situation, and there are usually counts that are very small (say, 1's, 2's, and 3's). The data in Table 6.28 give an example of such a situation. The table is reproduced from Agresti (1990), and the data relate to the effectiveness of immediately injected or 1 1/2-hour-delayed penicillin in protecting rabbits against lethal injection with β-hemolytic streptococci.
Penicillin                    Response
level         Delay        Cured    Died
1/8           None            0       6
              1 1/2           0       5
1/4           None            3       3
              1 1/2           0       6
1/2           None            6       0
              1 1/2           2       4
1             None            5       1
              1 1/2           6       0
4             None            2       0
              1 1/2           5       0
Mar. Prob.                  0.537   0.463

Table 6.28: The 5 x 2 x 2 data for this example


This is a 5 x 2 x 2 contingency table, with n = 54 observations spread across the 20 cells, of which 7 are sampling zeros, and a total of 16 counts are less than or equal to 5. The theories on which the log-linear formulation is based assume large-sample or asymptotic theory. The consequences of violating this theory are:

1. The goodness-of-fit test statistics may not possess the approximating χ2 distribution with the relevant degrees of freedom. Tests of significance are therefore considerably hampered.

2. The sparseness of the data often results in what Clogg and Eliason (1988) called the existence problem, which means that estimates of the λ parameters originally postulated for the model cannot be calculated because one or more of them take the value ±∞.

3. The statistic λ̂/ASE(λ̂) for each of the parameters of the model (except the constant term or other terms required by marginal constraints) may not follow the standardized normal variate because the asymptotic standard errors are grossly inflated. Indeed, the sampling distribution of the estimates of λ may be far from normal, and thus the standardized parameter estimates may be misleading.
One possible solution suggested by Goodman (1984) is that model fits should be accompanied by both the likelihood ratio statistic G2 and the corresponding Pearson statistic X2, since both have the same asymptotic distribution when the sample size is large and different distributions when the sample size is small (Lawal, 1984). Goodman argues that when the two statistics lead to different conclusions, the sample size is not large enough to make a good decision. However, when the two statistics lead to the same conclusion, we are assured that the sample size may be adequate for the model of interest. Thus inspecting both G2 and X2 may be sensible when we are confronted with sparse data.
Very attractive and famous in the analysis of contingency tables is the addition of 0.5 to each cell count when we are confronted with data with sampling zeros. This procedure is particularly recommended for fitting the saturated model; however, it is not uncommon to see analysts add 0.5 to each cell count when analyzing unsaturated models. This approach may be misleading. We illustrate this with the above data.
Suppose we wish to fit the model {PD,PR} to the above data. Here, P refers to penicillin level, D to delay, and R to the response. Let these variables be indexed by i = 1, ..., 5; j = 1, 2; and k = 1, 2, respectively. ML estimates do not exist for this model (by Birch's theorem) because marginal totals such as n_{1+1} and n_{5+2} are zero, and estimates exist only when all such marginal counts are positive.
Parameter estimates (main effects only) for model {PD,PR}

                  Poisson              +0.50             +Weighted values
Parameter    Estimate     ASE     Estimate    ASE      Estimate     ASE
μ            -24.000     1.136    -0.406     1.027     -1.147      1.489
λ1^P          25.609     1.221     2.117     1.108      2.804      1.550
λ2^P          25.504     1.219     2.015     1.108      2.699      1.549
λ3^P          24.693     1.275     1.322     1.152      1.947      1.587
λ4^P          23.307     0.612     0.406     1.276      0.832      1.725
λ1^D          -0.916     0.837    -0.693     0.707     -0.789      0.763
λ1^R          25.609     1.045     2.079     1.061      2.790      1.514
G2            14.2938              9.7138              11.2021
Model {PD,PR} applied to the data gives G2 = 14.2938 on 5 d.f., with unreliable parameter estimates. Further, PROC GENMOD warns with the following statement:

WARNING: Negative of Hessian not positive definite.

In this case, therefore, our parameter estimates are most unreliable. If we add 0.5 to each of the 20 observations in the data set and refit this model, we obtain the summary of the SAS software output for main effects only in column 2 of the table above. The SAS software program below implements the models in columns 1 and 2 of the above table, using PROC GENMOD.
data example;
do p=1 to 5; do d=1 to 2; do r=1 to 2;
input count @@; output; end; end; end;
datalines;
0 6 0 5 3 3 0 6 6 0 2 4 5 1 6 0 2 0 5 0
;
proc genmod; class p d r; model count=p|d p|r/dist=poi; run;
data new; set example; count=count+0.5;
proc genmod order=data; class p d r;
model count=p|d p|r/dist=poi link=log type3; run;


In this case, we have what appear to be stable asymptotic standard errors. However, we have achieved this stability by adding 0.5 to each cell count, which in turn has inadvertently increased the sample size from 54 to 64, an almost 19% increase. This might lead to serious distortions in the interpretation of the underlying associations within the original data. To minimize this distortion, Clogg has proposed an approach in which we shrink the data toward the marginal distribution of the response variable, in this case R. From the table, the marginal distribution of R is (0.537, 0.463). The model of interest {PD,PR} has 5 degrees of freedom; hence 5 observations must be distributed across the entire 20 cells. For cells with R = cured, the required constant is 0.537 x (5/10) = 0.2685, and similarly, for cells with R = died, we add 0.463 x (5/10) = 0.2315.
The result of fitting model {PD,PR} to the augmented data is displayed in the SAS software output below and in column 3 of the table above. We notice that this model has a G2 of 11.9791 on 5 d.f. The estimates of the parameters of the model and the corresponding ASEs are very much in agreement with those from adding 0.5, and this time we have only added 9% to the sample size to obtain these very consistent results. We would therefore suggest adopting the Clogg et al. (1990) approach to analyzing this type of data set.
data new1;
set example;
if r=1 then count=count+0.2685;
else count=count+0.2315;
proc genmod order=data;
class p d r; model count= p|d p|r/dist=poi link=log type3; run;

6.14.2 Log-Linear Analysis of Incomplete Contingency Tables

We consider in this section the analysis of incomplete multiway contingency tables in which sampling zeros are observed. Table 6.29 is such an example; it gives the number of pediatric contacts by subscriber's age, number of children, and age of youngest child (Freeman, 1987).
                                          Number of Pediatric Contacts
Age of        Number        Age of
Subscriber    of Children   Youngest     0    1    2    3   4-6  7-9  10+
<25           1             <5           4    8    2    5   10    8    19
              2+            <5           3    2    2    1    8    8    25
25-44         1             <5           3    2    4    7   14    7    26
              2+            <5          17    7    9   18   37   41   172
              2+            5-9         17   22   12   19   33    9    46
              2+            10+         25    5    4    7    5    5     9
45-59         1             10+         63    5    5    5    4    3     3
              2+            5-9          5    5    7    3    8    1     4
              2+            10+         32    4    6    2    6    5     3

Table 6.29: Data for this example


In the above four-way contingency table, variable (D), "age of subscriber," has three levels (<25, 25-44, 45-59); variable (C), "number of children," has two levels (1, 2+); and variable (B), "age of youngest," has three levels (<5, 5-9, 10+), while the number of contacts (variable A) has seven categories. Thus a complete contingency table would have observations in each of the 3 x 2 x 3 x 7 = 126 cells of the four-way table. However, the observed counts in the above table indicate that we do not have counts in all 126 cells. Indeed, we have counts in only 63 of the cells. Thus the table is incomplete.
We next ask whether the missing cell entries should be treated as structural zeros, in which case we would wish to fit quasi-log-linear models, or as sampling zeros. In this example, the number of contacts is ordinal and is designated as variable A here. In the data, there are no subscribers under 25 years of age whose youngest child is aged 5 or more; likewise, several combinations of number of children and age of youngest child are unobserved in the 25-44 and 45-59 age groups. Hence, instead of having 3 x 2 x 3 = 18 samples, we only have 9 samples in our data. The resulting table is therefore incomplete. However, these missing cells are not inherently zero; they simply were not observed in our sample. We are thus dealing here with sampling zeros rather than structural zeros. We present in Table 6.30 the observed cell counts for the data in Table 6.29, including the sampling zeros. Table 6.30 is the complete table of the number of pediatric contacts by subscriber's age, number of children, and age of youngest child, with the sampling-zero cells included.
                                          Number of pediatric contacts
Age of        Number         Age of
subscriber    of children    youngest    0    1    2    3   4-6  7-9  10+
<25           1              <5          4    8    2    5   10    8    19
                             5-9         0    0    0    0    0    0     0
                             10+         0    0    0    0    0    0     0
              2+             <5          3    2    2    1    8    8    25
                             5-9         0    0    0    0    0    0     0
                             10+         0    0    0    0    0    0     0
25-44         1              <5          3    2    4    7   14    7    26
                             5-9         0    0    0    0    0    0     0
                             10+         0    0    0    0    0    0     0
              2+             <5         17    7    9   18   37   41   172
                             5-9        17   22   12   19   33    9    46
                             10+        25    5    4    7    5    5     9
45-59         1              <5          0    0    0    0    0    0     0
                             5-9         0    0    0    0    0    0     0
                             10+        63    5    5    5    4    3     3
              2+             <5          0    0    0    0    0    0     0
                             5-9         5    5    7    3    8    1     4
                             10+        32    4    6    2    6    5     3

Table 6.30: The complete table for the data in Table 6.29
Our analysis of the data in Table 6.30 is complicated by the fact that there are many sampling zeros in the data, leading to nine of the marginal totals being zero. The log-linear model analysis is based on large-sample statistical theory, and this will no doubt be violated if care is not taken in modeling these data. Further, the number of degrees of freedom on which our various models will be based is seriously compromised by the presence of so many sampling zeros. Other problems associated with modeling data such as those in Table 6.30 are that the relevant goodness-of-fit statistics, such as G2, may not possess the desired null distribution even with the use of the correct degrees of freedom. Further, there is also the problem relating to the existence of parameters, as mentioned in the preceding section.
The above and similar problems are considered in the analysis below of the data in Table 6.30. Any program that is based on Newton's iterative procedure will be able to handle the analysis of this type of data. The IPF-based algorithms (software) usually give the wrong number of degrees of freedom when there are many sampling zeros resulting in one or more zero group or marginal totals.
The constructed weight (wt) variable in our SAS software program instructs the program to treat the cells having zero values as sampling zeros and to take this into account when calculating the relevant degrees of freedom for all the models that will be considered. Logit models that fit the marginal distribution of the set of explanatory variables B, C, and D are considered. In PROC GENMOD, this is specified by including the B|C|D term in the model specification. In PROC CATMOD, this is accomplished by specifying POPULATION B C D. The logit models displayed in appendix E.2, whose results are displayed in Table 6.31, are fitted to the data in Tables 6.30 and 6.29.
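One convenient way to supply the sampling zeros is to merge the observed counts onto a complete factorial grid; the following is a minimal sketch, assuming the 63 observed cells are held in a dataset named observed with variables d, c, b, a, and count (all of these names are ours):

data grid;    * all 3 x 2 x 3 x 7 = 126 cells of the complete table;
do d=1 to 3; do c=1 to 2; do b=1 to 3; do a=1 to 7; output; end; end; end; end;
proc sort data=grid; by d c b a;
proc sort data=observed; by d c b a;
data full;
merge grid observed; by d c b a;
if count=. then count=0;    * unobserved cells become sampling zeros;
run;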
Both PROC CATMOD and GENMOD give the same goodness-of-fit test statistics for all the models.
           Log-linear            Logit
Number     models                models            G2          X2        d.f.   pvalue
(i)        BCD,A                 Intercept only   395.8815    422.7455    48    0.0000
(ii)       BCD,AB                B                 58.4539     64.4891    36    0.0104
(iii)      BCD,AC                C                357.0956    363.0394    42    0.0000
(iv)       BCD,AD                D                172.8302    190.2218    36    0.0000
(v)        BCD,AB,AC,AD          B, C, D            9.7354      9.5321    18    0.9402
(vi)       BCD,AB,AD             B, D              25.0227     25.3667    24    0.4046
(vii)      BCD,AB,AC             B, C              33.3738     33.6263    30    0.3063

Table 6.31: Results of logit and equivalent log-linear models

Results in Table 6.31 indicate that the response pattern cannot be wholly explained in terms of the B, C, or D variables alone, as observed from the pvalues of models (ii), (iii), and (iv). The difference in G2 between models (v) and (vi) provides a test of the hypothesis that the term AC is zero. In this case, we have G2 = 25.0227 - 9.7354 = 15.2873 on 6 d.f. (pvalue = 0.0181). This clearly indicates that the interaction term AC cannot be considered zero. Similarly, the difference between models (v) and (vii) provides a test of the hypothesis that the interaction term AD is zero. Again, G2 = 33.3738 - 9.7354 = 23.6384 on 12 d.f. (pvalue = 0.0228). This result also indicates that the term AD cannot be considered zero in the model. Of course, from models (i) and (ii), we can show that the AB interaction is very important for the model. Thus, based on the above results, we can say that both the logit models {B,D} and {B,C} are adequate for an explanation of the model, but each is missing either the AC or the AD term, terms that have been shown to be important in the model. Hence, we propose the final logit model {B,C,D}, which is equivalent to the log-linear model {BCD,AB,AC,AD}, for a proper explanation of the data. This model fits the data well, and the analysis of residuals indicates a very good fit. That is, the number of pediatric contacts is dependent on the age of the subscriber, the number of children registered, and the age of the youngest child.

We also tried fitting linear-by-linear association models to the data; all such models fail to fit.

Another Example

Andersen (1997) analyzed the following table of cross-classified data from the Danish Welfare Study. The sample is classified by the variables:

A: Strenuous work, with categories yes; yes, sometimes; and no.
B: Type of employment, with categories blue collar employee, white collar employee, and employer.
C: Social group, with four categories (I-II, III, IV, and V).
A: Strenuous   B: Type of              C: Social group
work           employment        I-II    III     IV      V
Yes            Blue collar         0      0      64    182
               White collar       79     98     110      0
               Employer           38    126      19      0
Yes,           Blue collar         0      0     131    265
sometimes      White collar      156    166     292      0
               Employer           28     52     150      0
No             Blue collar         0      0     156    556
               White collar      136    166     382      0
               Employer           18     54     180      0

Table 6.32: Table of counts for this problem


Because the sampling design in this example has fixed the B-C marginal totals, the expected values m̂_ijk for any model that is employed have to satisfy the condition that

m̂_{+jk} = n_{+jk}  for all j, k.

Possible models satisfying this are: (i) {AB,AC,BC}; (ii) {AB,BC}; (iii) {AC,BC}; (iv) {ABC}; and (v) {A,BC}. We fit the log-linear models (i), (ii), and (iii) to the above data. These models correspond to the first three models of Table 5.3 in Andersen (1997). We decided not to implement the fourth model {AB,AC} in Andersen because it does not satisfy the condition imposed by the sampling design. The equivalent logit models for the three models above are, respectively, (ia) {B,C}, (iia) {B}, and (iiia) {C}. The implementation of these models is carried out with the SAS software programs in appendix E.3 using PROC CATMOD and in appendix E.4 using GENMOD. We have also presented relevant partial outputs based on the logit model (ia). The model is based on 4 degrees of freedom with G2 = 13.5575.
Maximum Likelihood Analysis of Variance

Source              DF    Chi-Square    Pr > ChiSq
Intercept            2        185.64        <.0001
b                    4         14.98        0.0047
c                    6         55.48        <.0001
Likelihood Ratio     4         13.56        0.0088
Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance                4    13.5575      3.3894
Pearson Chi-Square      4    13.6843      3.4211

We have presented above a partial output from the implementation of the log-linear model (i); the expected values under the model are given in appendices E.3 and E.4. The results of fitting the above logit models are presented in the table below.

           Log-linear        Logit
Number     models            models       G2         X2        d.f.
(i)        BC, AB, AC        B, C       13.5575    13.6843      4
(ii)       BC, AB            B          70.3204    69.3636     10
(iii)      BC, AC            C          28.4867    30.0944      8

The values of the goodness-of-fit test statistics obtained here are different from those
obtained in Andersen (1997), although our degrees of freedom do agree. We have
therefore presented the expected values obtained under model (i) with our results.

6.15 Weighted Data and Models for Rates

Most data in sociological studies arise from stratified sampling schemes in which the data of interest are weighted on a case-by-case basis. The solution to this is to fit a weighted log-linear model of the form

ln(m_r / z_r) = Xλ

where r refers to a particular cell in the table, X and λ are as defined previously, and z_r is referred to as the start table in programs based on the IPF algorithm (BMDP), or as the exposure, cell weights, or offset in programs based on the Newton-Raphson method (SPSS, SAS GENMOD). This approach will be fully explored in chapter 9.
For the analysis of rare or not-so-rare events, log-rate models have been proposed, where, for a three-way table in which the response variable C is dichotomous, we would have

ln(m_{ij1} / n_{ij+}) = θ + θ_i^A + θ_j^B

where the n_{ij+} are the "group totals," the marginal distribution of the joint variable composed of all predictors, and the responses pertain to rare events. The above model is equivalent to the weighted log-linear model discussed above if we set the cell weight or start table to n_{ij+}. We shall explore this model further in chapter 9.
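In PROC GENMOD, the start table enters through the OFFSET= option; a minimal log-rate sketch follows, where the offset logn = ln(n_{ij+}) is computed beforehand, and the dataset and variable names are ours:

data rate; set counts;    * one record per (i,j) cell, with response count m and group total n;
logn=log(n);              * offset ln(n_ij+);
proc genmod data=rate; class a b;
model m=a b/dist=poi link=log offset=logn;   * fits ln(m/n) = theta + theta_i^A + theta_j^B;
run;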

6.16 Problems with Interpretation

Most log-linear models produce too many parameters in the final model, which consequently makes full interpretation of these parameters almost impossible. The problem of interpretation in log-linear models has been echoed by, among others, Upton (1991), Kaufman and Schervish (1986), Amemiya (1981), and Alba (1987). These authors have provided some insights into this problem and have also suggested measures for reducing some of these models to simpler and more interpretable models.
Concurrent with the problem of interpretation are the following: the problem of retrieving the odds ratios in the final model, interpreting higher-order interactions, and the choice of an appropriate scale, since log-linear model parameters are expressed in logarithmic or exponential forms. Clogg et al. (1986) proposed a purging method as a possible solution to the latter problem.

6.17 Exercises

1. In the saturated log-linear model for a four-dimensional table, let λ^{BD} = 0 and let all of the corresponding higher-order terms also be zero, e.g., λ^{ABD} = 0. Show whether the resulting model is decomposable or not, and find a formula for m̂_{ijkl} without using the IPF algorithm.

2. Obtain estimates of μ, λ_i^A, and λ_j^B in a 3 x 2 table displaying independence. Hence, for the general I x J table, consider the log-linear model

   log(m_{ij}) = μ + λ_i^A.

   Show that m̂_{ij} = ȳ_{i.} and that its residual d.f. = I(J - 1).


3. A 2 x 2 x 2 table with observed probabilities {π_ijk} satisfies π_{i++} = π_{+j+} = π_{++k} = 1/2 for all i, j, and k. Give examples of cell probabilities that satisfy the log-linear models
   (a) (A,B,C)
   (b) (AB,C)
   (c) (AB,BC)
   (d) (AB,AC,BC)
   (e) (ABC)
   For cases (a), (b), and (c) find the following:
   (i) minimal sufficient statistics,
   (ii) likelihood equations,
   (iii) fitted values, and
   (iv) residual d.f. for testing goodness of fit.
4. In a general three-way I x J x K contingency table with observed counts n_{ijk}, give a decomposition of the likelihood ratio test statistic

   G2 = 2 Σ_{ijk} n_{ijk} log(n_{ijk} / m̂_{ijk})

   for the model {AB,C}.
5. The table below is from a national study of 15- to 16-year-old adolescents described in Morgan and Teachman (1988). The event of interest is ever having had sexual intercourse.

                          Intercourse
   Race      Gender      Yes      No
   White     Male         43     134
             Female       26     149
   Black     Male         29      23
             Female       22      36

Calculate the conditional odds ratios between gender and intercourse and
between race and intercourse. Interpret.
6. Explain what is meant by "no statistical interaction" in modeling response C and explanatory variables A and B in the following cases. Use graphs or tables to illustrate:
(a) All variables are continuous (multiple regression).
(b) C and A are continuous, B is categorical (analysis of covariance).
(c) C is continuous, A and B are categorical (two-way ANOVA).
(d) All variables are categorical.
7. Suppose {π_ij} satisfy the log-linear model of independence.
   (a) Show that λ_a^B - λ_b^B = ln(π_{+a} / π_{+b}).
   (b) Show that {all λ_j^B = 0} is equivalent to π_{+j} = 1/J for all j = 1, 2, ..., J.
8. For a 2 x 2 x 2 table, show that for the model (AB,AC,BC), the conditional A-B odds ratios are the same at both levels of C.
9. Suppose the log-linear model (AB,AC) holds; find an expression for m_{ijk} and hence calculate m_{ij+} and ln(m_{ij+}). Show that when (AB,AC) holds, the log-linear model for the A-B marginal totals has the same association parameters λ^{AB} as in model (AB,AC). Hence deduce that the odds ratios are the same in the A-B marginal table as in the partial tables. Derive a similar result for model (AB,BC) and hence the collapsibility condition in this chapter.
10. The table below, from DiFrancisco and Gitelman (1984), relates to the effects of nationality and education level on whether one follows politics (FP) regularly. Analyze these data using log-linear models. How do the countries differ with regard to following politics regularly?

                                 FP regularly
    Country     Education level     Yes      No
    USSR        Primary              94      84
                Secondary           318     120
                College             473      72
    USA         Primary             227     112
                Secondary           371      71
                College             180       8
    UK          Primary             356     144
                Secondary           256      76
                College              22       2
    ITALY       Primary             166     256
                Secondary           142     103
                College              47       7
    MEXICO      Primary             447     430
                Secondary            78      25
                College              22       2
11. The data below relate to a health survey in which a single sample of respondents was taken from a population of adults in a certain county. The survey included questions on health status and access to health care. The results from two of the questions are displayed in the table below. The first question was, "In general, would you say your health is excellent, good, fair, or poor?" The second question was, "Are you able to afford the kind of medical care you should have?"

                             Afford medical care
    Health       Almost     Not
    status       never      often     Often     Always     Total
    Excellent      21          4        20         99        144
    Good           43         12        59        195        309
    Fair           21         11        15         58        105
    Poor            8          8         9         17         42
    Total          93         35       103        369        600

    (a) Test the hypothesis that the answers to the health status question are independent of the answers to the access-to-health-care question for adults in the selected county. Fit the appropriate log-linear model; use α = 0.05.
    (b) Are there any cells contributing more than usual to the overall X2?
    (c) Fit an appropriate quasi-independence model and draw your conclusions.
12. The table below is reproduced from Agresti (1990) and refers to the effect of academic achievement on self-esteem among black and white college students.

                            Black                         White
              Cumul.   High          Low           High          Low
    Gender    GPA      self-esteem   self-esteem   self-esteem   self-esteem
    Males     High         15             9            17            10
              Low          26            17            22            26
    Females   High         13            22            22            32
              Low          24            23             3            17

    Which log-linear model do you think would be most appropriate for the data? Interpret the data using your chosen model.
13. Repeat problem 12 above for model (ABC,AD,BD).

14. Consider the log-linear model (A,B,C) for a 2 x 2 x 2 table.
    (a) Express the model in the form ln m = Xβ.
    (b) Show that the likelihood equations X'n = X'm̂ equate the observed and fitted counts in the one-dimensional margins.
15. Apply IPF to the log-linear model (A,BC), and show that the ML estimates are obtained within one cycle.

16. Write out the terms in the log-linear model with the given generating class. Show that it is decomposable, and find the functional form of its maximum likelihood estimates.

17. The table below gives a three-factor table based on gender, socio-economic status, and opinion about legalized abortion. Status has two categories: low and not low. The table of counts is given below. Analyze these data.

                               Opinion (k)
    Gender (i)   Status (j)    Support    Not support    Total
    Female       Low             171           79          250
                 Not low         138          112          250
                 Total           309          191          500
    Male         Low             152          148          300
                 Not low         167          133          300
                 Total           319          281          600
18. Find closed-form estimated expected counts under the following models:
    (a) {A,B,C}
    (b) {ABC}
    (c) {AB,AC}
    (d) {AB,BC}
    (e) {AB,C}
    Interpret all the above models in the context of the discussion in this chapter.

19. Fit log-linear models to the data in exercise 17. Which is the most parsimonious model? Can we collapse the table over gender?


Chapter 7

Strategies for Log-Linear Model Selection

7.1 Introduction

We now consider some strategies for building the most parsimonious model for the
general contingency table. For a fc-way contingency table, the number of effects
and interactions increases very rapidly as k increases. For instance, for k = 3, there
are 7 effects and interactions, and for k = 4 this number increases to 15, while the
number is further increased to 31 for k = 5. In general, the number of effects and
interactions equals (2fc 1). Thus trying to fit all possible models, for instance, if
k were 5, would be unwieldily and a more selective procedure would in this case be
most desirable.
Some important notions (Santner & Duffy, 1989) for model selection would be
appropriate at this point before we discuss the various procedures for model selection
strategies.
1. Parsimony: The model should have as few parameters as possible while adequately explaining the data. That is, we balance between a model that has
as enough parameters to explain the data, while at the same time, it is easy
enough to interpret.
2. Interpretability: It would not be reasonable to fit, for instance, a model with interaction effects without the main effects being included in the model. This situation does not pose problems as long as we restrict ourselves to hierarchical log-linear models.
3. Significant effects: The removal or addition of any effect or interaction must be judged on whether its contribution to the model is significant or not.

4. Coherence: If a model is rejected, then all of its submodels should also be rejected; conversely, if a model is accepted, then all models containing it should be accepted. This result is due to Gabriel (1969).
5. Factor variables: If there are factor and response variables, then all possible models to be considered should include the highest interaction term among the factor variables, and the selection process must focus on terms linking the response variables, or the response with the factor variables. For example, if A, B, and C are factor variables and D is a response variable, then it would be advisable to include the interaction effect ABC in the model, and subsequent models should explore the associations between D and any of the other six interaction and main-effect terms for the factor variables. That is, the marginal total n_{ijk+} is fixed. It would not make much sense to determine whether there is pairwise or conditional independence between the factor variables.

The last consideration above is often referred to as the sampling condition consideration because, in the example above, the marginal totals n_{ijk+} are assumed fixed and the corresponding ML cell estimates must satisfy these marginal constraints.
As discussed in chapter 6, the initial exploratory analysis of a k-way contingency table could take the form of examining the significant Z values (or chi-squared values) when a saturated model is fitted. But as also noted, this method is by no means without its drawbacks. We also consider, as an alternative to examining the individual effects in a saturated model, the tests based on marginal and partial associations. These usually provide a good starting point for model building.
We will consider two other important methods for the next phase of the model-building process for a k-way table. These two methods are the forward selection and backward selection techniques discussed next.

7.1.1 The Forward Selection Procedure

The forward selection method belongs to the general class of stepwise selection procedures. For a stepwise procedure, rules are developed for adding terms to, or removing terms from, an initial model with the sole purpose of arriving at a more parsimonious final model.
For this procedure, terms are added to an initial or working smaller model. The procedure is sequential in that it adds or deletes terms one at a time. At each stage, the test statistic G2 is computed both for the current model and for the larger model in which an additional term has been added. The corresponding G2 values are obtained for both models, and a pvalue is computed to test for the significance of the extra term added. For a predetermined α level, the hypothesis that the extra term is zero is rejected, and the term is added, if the pvalue is less than or equal to α.
The procedure assumes that we have identified an initial model by one of the methods suggested in section 6.15. For a k-factor table, for instance, the forward selection procedure:

(a) Adds the k-factor term not already in the model that has the most significant pvalue.

(b) Continues with the process until further additions do not show any appreciable significance (based on the chosen α) in fit.
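The pvalue at each step is based on the usual conditional likelihood-ratio comparison: if $M_0$ denotes the current model and $M_1$ the larger model containing the added term, then, under $M_0$,

G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1) \sim \chi^2_{df_0 - df_1}

where $df_0$ and $df_1$ are the residual degrees of freedom of the two models.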

7.1.2 The Backward Selection Procedure

For the backward selection procedure, we begin with an initial large or complex model, and terms are then sequentially deleted from the model in turn. Again, just as for the forward selection procedure, at each stage we compute the corresponding test statistic G2 for both the current model and the reduced model. Deletion of a term is based on the nonsignificance of its corresponding pvalue when compared with a predetermined α. This process continues until further deletion would lead to a significantly poorer fit to the data. Thus:

(a) Delete the k-factor term with the least significant pvalue from the model.

(b) Continue with the process until further deletions show deviations of the pvalues from the predetermined α level.

Care must be taken, especially when using the backward selection procedure, to ensure that we do not delete an effect that may cause the saturated model test to be rejected.
As discussed earlier, a term may be forced into the model by the sampling scheme (e.g., factors) or by the presence of a higher-order interaction term. For example, consider the table with four variables A, B, C, and D. Suppose we fit the model {ABD,CD}. Then, when considering all two-factor effects to be eliminated, all of {AB,AD,BD} are forced into the model as a result of {ABD}. The only candidate for elimination would be the effect {CD}.

7.2 Example 7.1: The Stepwise Procedures

We present in Table 7.1 an example of the forward and backward selection procedures. The data are from Aicken (1983) and relate to a study of marijuana use among adults in a large city and a suburb. The variables are G = geographical location, with G1 = San Francisco and G2 = Contra Costa; family status F, with levels F1: married with children and F2: unmarried or without children; religion R, with R1: Protestant or Catholic and R2: others; and the response M, with M1: used marijuana and M2: did not use marijuana. We therefore have a 2 x 2 x 2 x 2 contingency table.
                   Geographical region
                     G1               G2
M    R            F1     F2       F1     F2
1    1            52      3       23     12
     2            37      9       69     17
2    1            35     15       23     35
     2           130     67      109    136

Table 7.1: Marijuana use (M) by religion (R) by family status (F) by geographical region (G) for a sample of adults (Aicken, 1983)
We start by first fitting the all three-factor model, the all two-factor model, the model of complete independence (that is, the all one-factor model), and the equiprobable model to the data. The results of these fits are displayed below together with the relevant SAS software codes for implementing them.

Model                      df      G2        pvalue
{MRG,MRF,MGF,RGF}           1     3.823      0.051
{MR,MG,MF,RG,RF,GF}         5    16.495      0.006
{M,R,G,F}                  11   147.978      0.000
Equiprobable               15   534.797      0.000
Only the all three-factor model fits the data relative to the saturated model since
pvalue >0.05 in this case.
data chap71;
do M=1 to 2; do R=1 to 2; do G=1 to 2; do F=1 to 2;
input count @@; output; end; end; end; end;
datalines;
52 3 23 12 37 9 69 17 35 15 23 35 130 67 109 136
;
proc genmod order=data; class M R G F;
model count=M|R|G|F@3/dist=poi; run;
proc genmod order=data; class M R G F;
model count=M|R|G|F@2/dist=poi; run;
proc genmod order=data; class M R G F;
model count=M R G F/dist=poi; run;
proc genmod; model count=/dist=poi; run;

7.2.1 The Forward Selection Procedure

Since only the all three-factor model fits the data, we start the forward selection procedure with the all two-factor model, {MR,MG,MF,RG,RF,GF}. We shall adopt here α = 0.05 as the criterion for adding terms. That is, we add a term to the current model if it has the most significant pvalue.
Because the initial model is the model {MR,MG,MF,RG,RF,GF}, we next add each of the three-factor terms to this model in turn and check for the significance of their pvalues. For instance, to test for the MRG term, we compute the difference of G2 under the models {MR,MG,MF,RG,RF,GF} and {MF,RF,GF,MRG}, giving a value of (16.495 - 8.411) = 8.084 on (5 - 4) = 1 d.f. These differences are also displayed below.
These differences are also displayed below.
                               Added                       Differences
Model                          term    d.f.     G2       d.f.     G2       pvalue
{MR,MG,MF,RG,RF,GF}            -        5     16.495      -       -         -
{MF,RF,GF,MRG}                 MRG      4      8.411      1      8.084     0.0045
{MG,RG,GF,MRF}                 MRF      4     16.207      1      0.288     0.5915
{MR,RG,RF,MGF}                 MGF      4     16.429      1      0.066     0.7973
{MR,MG,MF,RGF}                 RGF      4      9.141      1      7.354     0.0067

From the above, the term MRG has the smallest pvalue, and so the effect {MRG} must be added to the initial model, giving a revised current model {MF,RF,GF,MRG}. The next step is to add another simple effect to the new model. To determine which effect to add next, we again start with the revised (or current) model {MF,RF,GF,MRG} and add each of the three-factor terms MRF, MGF, and RGF in turn. Once again, the fitted models and their corresponding differences are displayed in Table 7.2. We note here that adding, for instance, the term MRF to the new model {MF,RF,GF,MRG} changes the model to {GF,MRG,MRF}, since the new term already incorporates the two-factor terms MF and RF into the model.

                           Added                       Differences
Model                      term    d.f.     G2       d.f.     G2       pvalue
{MF,RF,GF,MRG}             -        4      8.411      -       -         -
{GF,MRG,MRF}               MRF      3      8.411      1      0.000     1.0000
{RF,MRG,MGF}               MGF      3      8.351      1      0.060     0.8065
{MF,MRG,RGF}               RGF      3      4.340      1      4.071     0.0436

Table 7.2: Models and corresponding differences


Only the term RGF has the smallest pvalue that is less than our cut-off criterion
a = 0.05. Hence, the term RGF should now be included in the model, yielding the
model {MF,MRG,RGF}
We are still interested in whether the inclusion of the remaining two three-factor
terms MRF and MGF would be necessary in our model (given that MF, MRG, and
RGF are already in the model). The above question could simply be answered
by noting that the pvalues in the last display are all not significant, indicating
that we need to stop at this point. Alternatively, we could proceed as before and
fit additional terms MRF and MGF to the new current model {MF,MRG,RGF},
yielding the following display once again in Table 7.3.
                        Added                       Differences
Model                   term    d.f.     G2       d.f.     G2       pvalue
{MF,MRG,RGF}            -        3      4.340      -       -         -
{MRG,RGF,MRF}           MRF      2      4.329      1      0.011     0.9165
{MRG,RGF,MGF}           MGF      2      3.856      1      0.484     0.4860

Table 7.3: Models with additional terms


As expected, neither of the two pvalues is significant at the cutoff point α = 0.05. Consequently, we have arrived at the most parsimonious model, {MF,MRG,RGF}, for our data based on the forward selection procedure. This model has G2 = 4.340 (pvalue = 0.2270) and is based on 3 degrees of freedom. We will consider this final model later.
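The final model can be fitted directly; a minimal sketch using the chap71 dataset created earlier:

proc genmod data=chap71 order=data; class M R G F;
model count=M|F M|R|G R|G|F/dist=poi;   * {MF,MRG,RGF}: G2 = 4.340 on 3 d.f.;
run;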

7.2.2 The Backward Selection Procedure

We next consider the backward selection procedure for our data. For this procedure, it is always desirable to start with the most complex model, which in this case is the all three-factor model {MRG,MRF,MGF,RGF}. We again use a cutoff point of α = 0.05 as our criterion. At each stage of the selection, we delete the term whose pvalue is least significant (that is, has the highest value). These results are presented in Table 7.4.
At the first stage, deletion of the term MRF has the least significant pvalue, and MRF is thus deleted from the model to give a new reduced model {MRG,MGF,RGF}. At the second stage, we next delete in turn the terms MRG, MGF, and RGF. Only the term MGF gives a nonsignificant pvalue, and it is thus the next candidate for deletion, leaving us with a new reduced model {MRG,RGF}. At this point, neither of the two effects not yet deleted gives a pvalue greater than α = 0.05. Hence, at this stage, it would not be desirable to delete any further three-factor term.
                                    Deleted                        Differences
Stage   Model                       term    d.f.     G2        d.f.     G2        pvalue
1       {MRG,MRF,MGF,RGF}           -        1      3.823       -       -          -
        {MRF,MGF,RGF}               MRG      2      7.970       1      4.147      0.0417
        {MRG,MGF,RGF}               MRF      2      3.856       1      0.033      0.855*
        {MRG,MRF,RGF}               MGF      2      4.329       1      0.506      0.4768
        {MRG,MRF,MGF}               RGF      2      8.349       1      4.526      0.0334
2       {MRG,MGF,RGF}               -        2      3.856       -       -          -
        {MGF,RGF}                   MRG      3      8.528       1      4.672      0.0306
        {MF,MRG,RGF}                MGF      3      4.340       1      0.484      0.486*
        {RF,MRG,MGF}                RGF      3      8.351       1      4.495      0.0340
3       {MF,MRG,RGF}                -        3      4.340       -       -          -
        {MG,MR,MF,RGF}              MRG      4      9.141       1      4.801      0.0285
        {MF,RF,GF,MRG}              RGF      4      8.411       1      4.071      0.0436
        {MRG,RGF}                   MF       4     56.521       1     52.181      0.0000

Table 7.4: Models based on the backward selection procedure


We could, however, proceed to the third stage by noting that the five two-factor terms MR, MG, RG, RF, and GF are automatically included in the chosen current model. The only two-factor term not included is the term MF. Consequently, we fit the model {MF,MRG,RGF} and use the backward procedure to test for the deletion of the terms MRG, RGF, and MF. None of these terms can be deleted based on our cutoff point. Hence the final model is once again given by {MF,MRG,RGF} and is based on 3 degrees of freedom.

7.2.3 Selection Based on the Saturated Parameters

For a saturated model fitted to the above data set, we give below the results of the type3 analysis on each of the terms in the model as provided by SAS software's PROC GENMOD.

data; set chap71;
proc genmod order=data; class M R G F;
model count=M|R|G|F/dist=poi type3; run;
LR Statistics For Type 3 Analysis

Source       DF    Chi-Square    Pr > ChiSq
M             1        91.22        <.0001
R             1        75.57        <.0001
M*R           1        10.96        0.0009
G             1         9.55        0.0020
M*G           1         0.87        0.3496
R*G           1         0.68        0.4109
M*R*G         1         0.37        0.5423
F             1        74.05        <.0001
M*F           1        39.94        <.0001
R*F           1         0.52        0.4706
M*R*F         1         0.56        0.4524
G*F           1        24.36        <.0001
M*G*F         1         0.00        0.9464
R*G*F         1         7.93        0.0049
M*R*G*F       1         3.82        0.0506

Clearly, from the results above, the significant (pvalue < 0.05) first-order and second-order effects are (RGF), (MR), (MF), and (GF) at α = 0.05. An initial possible model would therefore be {RGF,MR,MF}, since the term GF is already in the model as a result of RGF being in the model. This model, when fitted to the data, has G2 = 13.382 on 5 d.f. Obviously, this model does not fit the data, as it has a pvalue of 0.02.
The next obvious decision would be to introduce the two-factor term MG, which is the only two-factor term not yet in the model. This model, designated as (ii) in the table below, barely fits the data with a pvalue of 0.058 and shows that the inclusion of the MG term is significant. Having included all six two-factor terms, the next possible models involve the inclusion of either the MRG, the MGF, or the MRF term. These additional models are displayed in (iii) to (v) in Table 7.5. Using the model in (ii) as a baseline, only the term MRG is significant and is worth adding to the model in (ii). Once again, the model chosen by this procedure is the model {MF,MRG,RGF}.
                              Added                       Differences
No      Model                 term    d.f.     G2       d.f.     G2       pvalue
(i)     {MF,MR,RGF}           -        5     13.382      -       -         -
(ii)    {MR,MF,MG,RGF}        MG       4      9.141      1     4.241*     0.0395
(iii)   {MF,MRG,RGF}          MRG      3      4.340      1     4.801*     0.0284
(iv)    {MR,RGF,MGF}          MGF      3      8.528      1     0.6130     0.4337
(v)     {MG,RGF,MRF}          MRF      3      8.601      1     0.5400     0.4624

Table 7.5: Additional models having {MF,MR,RGF} as baseline

7.3 Selection Based on Marginal and Partial Associations

Brown (1976) suggested marginal and partial association tests for screening effects in multidimensional contingency tables. To implement the marginal association screening tests in SAS software, we give below the relevant GENMOD statements to fit the following marginal association models. The first GENMOD statement fits the equiprobable model, while the last fits the all three-factor model.
data; set chap71;
proc genmod order=data; class M R G F;
(i)    model count=/dist=poi type3; run;
proc genmod order=data; class M R G F;
(ii)   model count=M R G F/dist=poi type3; run;
proc genmod order=data; class M R G F;
(iii)  model count=M|R|G|F@2/dist=poi type3; run;
proc genmod order=data; class M R G F;
(iv)   model count=M|R|G|F@3/dist=poi type3; run;

The results of these fits are summarized in Table 7.6, in the order in which they are fitted in the program above.



Model                      d.f.     G2        pvalue
Equiprobable                15    534.797     0.000
{M,R,G,F}                   11    147.978     0.000
{MR,MG,MF,RG,RF,GF}          5     16.495     0.006
{MRG,MRF,MGF,RGF}            1      3.823     0.051

Table 7.6: Marginal association models


The marginal association test, for instance, that all one-factor effects (M,R,G,F) are simultaneously zero is conducted via the difference G2 = (534.797 - 147.978) = 386.819 on (15 - 11) = 4 degrees of freedom. These marginal tests are displayed in Table 7.7.

k    d.f.     G2        pvalue     Decision
1      4    386.819     0.000      Reject
2      6    131.483     0.000      Reject
3      4     12.672     0.013      Reject
4      1      3.823     0.051      Fail to reject

Table 7.7: Marginal tests: Tests that k-way effects are zero
The first line, k = 1, in the table tests the hypothesis

H0: λ_i^M = λ_j^R = λ_k^G = λ_l^F = 0        (7.1)

against the alternative that at least one of the above parameters is not zero. The result indicates that the main-effect terms are necessary in the model. In any case, we do not wish to eliminate zero-order terms (noncomprehensive models) from whatever model we finally adopt. The marginal associations for both the six two-factor and the four three-factor effects are also necessary in our eventual model, though the actual individual terms are not apparent from this result; hopefully, the partial association tests will shed more light on these individual terms. The marginal four-factor term is not significant; we would not wish to include this term in the final model in any case, as it would represent the saturated model. The marginal association tests thus tell us to look out for some two-factor and three-factor terms.
The partial association tests are implemented in SAS GENMOD with the same statements above by simply requesting type3 likelihood ratio test statistics. Below is presented a modified output for the partial association tests from PROC GENMOD. These results are obtained from the model statements in (ii), (iii), and (iv) in the SAS software statements above. That is, the upper section of the table below gives Type 3 LR values for the main effects, obtained from the model designated as (ii) in the SAS software program at the beginning of this section. Similarly, the middle Type 3 values for the first-order interaction effects are produced from the model statement in (iii), while the bottom section was produced from the model statement in (iv).
The partial association tests below tell us that all terms whose pvalues are <0.05 are important. These are MRG, RGF, MR, MG, RG, MF, and GF, plus the main effects, which are all embedded in the effects listed because of the hierarchy principle. Indeed, the effects MR, MG, and RG are contained in MRG, and similarly, GF is contained in RGF. Consequently, the effects to be fitted are {MRG,RGF,MF} as the basic model. Our earlier results indicate that this is indeed
7.4. AITKIN'S SELECTION METHOD

263

the final model selected by the previous procedures. We thus see that if Brown's
marginal and partial tests are conducted right from the onset, it usually leads to a
faster selection of the final model. PROG CATMOD can also be employed to obtain
these partial association tests. We present in Table 7.8 the partial and marginal
association tests for each effect in the model.
LR Statistics For Type 3 Analysis
PARTIAL ASSOC. TESTS

MARGINAL ASSOC. TESTS

ChiSquare

DF

ChiSquare

M
R
G
F

1
1
1
1

143.89
191.16
7.49
44.28

<.0001
<.0001
0 . 0062
<.0001

M*R
M*G
R*G
M*F
R*F
G*F

1
1
1
1
1
1

33.09
3.68
6.98
54.97
0.10
35.44

<.0001
0.0551
0 . 0082
<.0001
0.7511
<.0001

34.44
0.02
6.78
54.58
3.16
33.48

<.0001
0.8822
0.0092
<.0001
0.0754
<.0001

M*R*GM*R*F
M*G*F
R*G*F

1
1
1
1

4.15
0.03
0.51
4.53

0.0417
0.8552
0 . 4768
0 . 0334

4.15
0.03
0.51
4.53

0.0417
0.8552
0.4768
0.0334

Pr > ChiSq

Pr > ChiSq

Table 7.8: Partial and marginal association tests: Tests that each effect is zero

7.4

Aitkin's Selection Method

Aitkin (1979) proposed a method for testing the significance of effects in an all
(k 1) factor model against an all /c-factor model. The method consists basically of
fitting an all k, (k 1), (k 2), (A; 3), ,1 models to the k-way data and noting
both the corresponding G2 and the relevant degrees of freedom.
For instance, for a four-way table, we would fit an all four-factor model (that is,
the saturated model), an all three- factor, an all two- factor and finally an all main
effect models. For each of these, G2 and their corresponding degrees of freedom
would be obtained.
We let G2_l and G2 be the computed G2 values for the all (s l)-factor and
s-factor models (5 < fc), respectively. Also, let rfs_i and ds be their corresponding
degrees of freedom. Then by Aitken's method, we would reject the hypothesis of
no s-factor effects if
Gj_1-Gj>X2(l-7a,da-i-d.)

(7-2)

where 7S is chosen so that


,k

The above test is a test for the adequacy of all (s 1) factor model.
In the above, 7fc a is the assigned level (usually 0.05 or 0.01), k is the number
of factors in the table, and s = 2, 3, , k is all the s-factor effects. Further, the
7's are such that 7 must lie within 0.25 and 0.5, where:

264

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION

7=1-11(1-7s=2

For example, consider the case in which k = 4 and 74 = a = 0.05. Then


1 _ 72 = (l - 0.05)b - 0.7351

since

1 - 73 = (1 - 0.05)4 = 0.8145

=6

since

=4

/4\
1 - 74 = (1 - 0.05)l = 0.9500 since ( J = 1

Consequently,

72 - (1 - 0.7351) = 0.2649
73 = (1 - 0.8145) = 0.1855

and of course
Hence,

74 = 0.05

7 = 1- (0.7351)(0.8145)(1 - 0.05) = 1 - 0.5688 = 0.4312

Clearly, this value of 7 = 0.4321 satisfies the condition that it lies between 0.25 and
0.5.
For our example in Table 7.1, Table 7.9 displays for the all s-factor s = 1, 2,3,4
the computed G2 and their corresponding degrees of freedom.
s

1
2
3
4

Model
{M,R,G,F}
{MR,MG,MF,RG,RF,GF}
{MRG,MRF,MGF,RGF}
{MRGF}

G\
147.978
16.495
3.823
0

ds
11
5
1
0

Table 7.9: Computed G2 for the s-factor models


Aitkin's method would on the above basis, to select the all three-factor effects model.
As pointed out by Christensen (1990), Aitkin's method may be problematic and at
best could only provide a starting point to which higher order interactions could be
added. Comparisons of s- and (s l)-factor models using Aitken's proecedure are
dispalyed in Table 7.10.

s
4
3
2

Tests
3 Vs4
2 Vs3
1 Vs2

Gs-i - Gs
3.823
12.672
131.483

X2(%,da-i -da)

X (0.95,l) =3.841
X2(0.815,4) =6.178
X2 (0.735, 6) = 7.638

Table 7.10: Aitkin's method results


We feel that the selection methods described above, when used effectively, will
more than take care of most problems relating to log-linear model selection situations for any fc-way contingency table.

7.5. SELECTION CRITERIA

7.5

265

Selection Criteria

As indicated at the beginning of this chapter, one of the basic characteristics of any
selection procedure would have to be parsimony: that is, the marrying together of a
model complex enough to explain the data, while at the same time it is easy enough
to interpret. To this end, several criteria have been advocated for selecting the best
parsimonious model in a log-linear formulation. Most of these have been regression
type measures such as the R2, adjusted R2, and Mallow's Cp. Others are Akaike's
information criterion (AIC). A good review of these criteria are given in Clayton et
al. (1986).
Since it does not make much sense to have several selection criteria simultaneously, previous results have shown that the Akaike's information criterion AIC is by
far the best of all the model selection criteria, and it will be adopted in this book.

7.5.1

Akaike's Information Criterion AIC

Akaike (1974) considered the expected value of the logarithm of the joint likelihood
of a set of data and proposed that the model selected should be that which maximized this quantity. For log-linear models, maximizing this quantity is equivalent
to choosing the model that minimizes (Sakamoto et al., 1986).
AIC = G2 - 2 d
where G2 is the likelihood ratio test statistic computed for the model and d is its
corresponding degrees of freedom.
Obviously, as we consider every sequence of simpler models (in terms of fewer
parameters), both G2 and d will increase. Akaike suggested that the preferred
model would be the one with a minimum AIC value. However, as Raftery (1986)
pointed out, Akaike's procedure does not take into account the increasing certainty
of selecting a particular model as the sample size increases, but a Bayesian approach
due to Schwarz (1978) does have this desirable property. The strength of this
approach is that Schwarz was able to show that, for large enough sample size, the
exact form of the prior distribution was irrelevant and that the correct model was
certain to be selected. As Raftery shows, this approach is equivalent to selecting the
model for which the quantity BIC (Bayesian information criterion) is a minimum,
2
where BIG is given by
BIC = G - d In (JV)
where N is the total sample size. Unlike the AIC statistic, BIC takes account not
only of the model complexity, but also of the sample size. Upton (1992) strongly
advocated the use of the BIC criterion. We give in Table 7.11 the computation of
these criteria for all models that seemingly fitted our data when our selection cutoff
point was a = 0.05 where N = 772 for these data.
Our results indicate that if the AIC criterion were employed as the sole selection criterion for these data, the best parsimonious model would be the model
{MRG,RGF,MF}. This is the final model selected by all our selection procedures.
However, if the BIC is employed, the best parsimonious model is now model number
5, that is, model {MRG,MF,RF,GF}.
The above results clearly demonstrate the fact that the selection criteria may
sometimes result in the selection of different models. If the original cutoff point had
been a = 0.01 instead of 0.05, the former model would still have been preferred

266

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION


No

1.
2.
3.
4.
5.
6.

Models
{MRG,MRF,MGF,RGF}
{MRG,MGF,MRF}
{MRG,MGF,RGF}
{MRG,RGF,MF}
{MRG,MF,RF,GF}
{RGF,MR,MG,MF}

d.f.
1
2
2
3
4
4

G*

AIC

3.823
4.329
3.856
4.340
8.411
9.141

1.82
0.33
-0.14
-1.66
0.41
1.14

BIG
-2.83
-8.97
-9.44
-15.61
-18.18
-17.45

Table 7.11: Model selection based on AIC and BIG criteria


while the latter model would not have qualified for a good fit in the first instance.
Thus the former model would have been more robust than the model chosen by the
BIG were the selection cutoff point changed. In any case, the model selected by the
BIG is very much like the former in that the interaction effects of interest MRG
and MF are in both models. Really, the effects RF and GF do not tell us much in
relation to M (the response variable). Thus the data can be interpreted in terms of
the effects MRG and MF. We would in any case prefer the former model because
of its robustness to cutoff point a.
One final check on the models is to examine how well the models fit our data
by examining both the standardized and adjusted residuals. These are presented
below for the two models.
MODEL: {MGR, RGF, MF}
set tab71;
proc gerunod order=data; make 'obstats' out=aa; class M R G F;
model count=M|G|R R|G|F M|F/dist=poi liiik=log type3 obstats; run;
count

52
3
23
12
37
9
69
17
35
15
23
35
130
67
109
136

Pred

Resraw

Reschi

Streschi

50. 3218
4.6782
24.4340
10. 5660
40.2815
5. 7185
65. 9627
20.0373
36. 6782
13. 3218
21. 5660
36. 4340
126 .718
70. 2815
112 .037
132 .963

1 .6782
-1 .6782
-1 .4340
1 .4340
-3 .2815
3 .2815
3 .0373
-3 .0373
-1 .6782
1,.6782
1,.4340
-1,.4340
3 .2815
-3..2815
-3,.0373
3..0373

0.2366
-0 .7759
-0 .2901
0.4412
-0 .5170
1..3722
0.3740
-0 .6785
-0 .2771
0.4598
0..3088
-0,.2376
0.2915
-0 .3914
-0 .2870
0.2634

1..0377
-1..0377
-0,.7312
0..7312
-1.,7107
1.,7107
1,.2265
-1,.2265
-1,.0377
1..0377
0.,7312
-0.,7312
1,.7107
-1,.7107
-1,.2265
1.,2265

MODEL: {RGF, MR, MF, MG}


set tab71; proc genmod order=data; make 'obstats' out=bb; class M R G F;
model count=R|G|F M|R M|F M|G/dist=poi link=log typeS obstats; run;
count

52
3
23
12
37
9
69
17

Pred
45.2572
3.7844
27.9829
12.9755
45 . 5498
6 . 4086
62.2101
17.8315

Resraw

Reschi

6 . 7428
-0.7844
-4.9829
-0.9755
-8.5498
2.5914
6.7899
-0.8315

1.0023
-0.4032
-0 . 9420
-0 . 2708
-1.2668
1.0236
0.8609
-0.1969

Streschi
2.4188
-0 . 4967
-1.9527
-0.4244
-2.7221
1.2901
2 . 2842
-0.3131

7.6. INTERPRETATION
35
15
23
35
130
67
109
136

41.7428
14.2156
18.0171
34.0245
121.450
69.5914
115.790
135.168

OF FINAL MODEL

-6 . 7428
0 . 7844
4.9829
0.9755
8 . 5498
-2.5914
-6.7899
0.8315

-1.0436
0.2080
1.1739
0.1672
0.7758
-0.3106
-0.6310
0.0715

267

-2.4188
0 . 4967
1.9527
0 . 4244
2.7221
-1.2901
-2.2842
0.3131

It is evident from the results above that while both models have no significant
Reschi value, their standardized values show that model {MGR,RGF,MF} is much
better than the competing model. Consequently, this model will be chosen as the
most parsimonious model for this data.
rcsclii
We may note here that 1 equals the leverage a,ijki of the ijkl-ih cell
streschi
defined in Christensen (1990).

7.6

Interpretation of Final Model

Since the final chosen model is {MGR,RGF,MF}, Table 7.12 displays the observed
and expected values under this model.

M
1

R
1
2

1
2

Geographical region
G2
Gl
F2
F2
Fl
Fl
52
12
3
23
(24.41)
(10.55)
(50.33)
(4.68)
17
37
9
69
(20.04)
(5.73)
(40.30)
(65.96)
35
(36.69)
130
(126.68)

15
(13.32)
67
(70.28)

23
(21.59)
109
(112.04)

35
(36.44)
136
(132.96)

Table 7.12: Observed and expected counts for the marijuana data in Table 7.1
The model has a G2 value of 4.340 on 3 d.f., with a pvalue of 0.227. Clearly, this
model fits the data well. The model implies that given the response Mh, religion
Rh and geographical location Gh of the respondents are conditionally independent
of family status F. That is, religion and geographical location do not affect the
relationship between family status and the response M. The equivalent log odds
ratio of this relationship can be formulated as:
.MF.RG

for i, i', j,fc,/, I' = 1,2. Note here that we are keeping the levels jk constant. There
are four such combinations of j and fc, namely, (1,1), (1, 2), (2,1), and (2,2). For
each of these, we have

268

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION


Result
1.361

j,k
I 1

i,i'
12

JJ'
12

12

12

12

In

77111217712122
Tn,2121m1122

1.361

21

12

12

In 77ii2iim22i2
7712211777,1212

1.361

12

777,1221 7T12222

1.361

22

12

~.M.K..KOr'

In 77iniiTn2ii2
T7l211JTnill2

In

77122217711222

Below are presented the log-parameter estimates produced by PROC GENMOD


when this model is fitted to the data in Table 7.1. Clearly, the estimate of the
parameter XMF = 1.3627, which corresponds to the values of fMF-RG computed
from expected values above.
Analysis Of Parameter Estimates
Standard
ChiDF Estimate
Error
Square

Parameter
Intercept
M
G
M*G
R
M*R
R*G
M*R*G
F
R*F
G*F
R*G*F
M*F

1
1
1
1
1
1
1
1
1
1
1
1

1
1
1
1 1
1
1
1 1
1

1
1
1
1
1
1
1
1
1
1
1
1
1

4.8901
-1.8925
-0.6376
-0.6163
-1.2946
0.6S46
-0.3685
0.8077
-0.1712
-0.3532
0 . 7607
0 . 7765
1 . 3627

0 . 0847
0.1929
0.1420
0.2139
0.1753
0.2617
0.3207
0.3701
0.1216
0 . 2487
0.1829
0.3900
0.2019

3333.23
96.26
20.15
8.30
54.52
6.26
1.32
4.76
1.98
2.02
17.29
3.96
45.55

Pr > ChiSq
<.0001
<.0001
<.0001
0 . 0040
<.0001
0.0124
0 . 2505
0.0291
0.1591
0.1556
<.0001
0.0465
<.0001

That is, the model stipulates that there is a constant association as measured by
the log odds ratio between the M and F in each of the four (R-G ) subtables. That
is, if we form the R-G subtables of expected counts, then the log odds ratio would
be constant in each of these subtables: These can be further demonstrated clearly
as follows:
Subtable G1R1
Ml
M2
Fl 50.33 36.69
F2
4.68 13.32

Subtable G1R2
Ml
M2
Fl 40.30 126.68
F2
5.73
70.28

Subtable G2R1
Ml
M2
Fl 24.41
21.59
F2 10.55 36.44

Subtable G2R2
M2
Ml
Fl 65.96 112.04
F2 20.04 132.96

Table 7.13: R-G Subtables of expected counts


In each of subtables (G1R1, G1R2, G2R1, G2R2), the estimates of the odds ratios
are:
50.33 x 13.32
TGIRI =
-^r = 1.361
36.69 x 4.68
40.30 x 70.28
1.361
TG1R2
126.68 x 5.73

7.6. INTERPRETATION

269

OF FINAL MODEL

24.41 x 36.44
21.59 x 10.55
65.96 x 132.96
= 1.361
TG2R2. =
112.04 x 20.04
In each of the cases, the log of the odds ratio equals 1.361 and can be seen to be
constant from one-subtable to another. Consequently, 9 = e1-3627 = 3.91. In other
words, for a given religion and geographical location, the odds that an individual
would have smoked marijuana is 3.9 times higher for married respondents with
children than those without children. Similarly, the odds that an individual would
have responded as smoking marijuana is 0 e-6546 = 1.92 higher for Protestant or
Catholic than for those with religious affiliation. On the other hand, the odds are
lower for those respondents resident in San Francisco than for those from Contra
Costa, being about 54% of those from Contra Costa.
Again, since the MRG effect is very important, we present the observed counts
table of religion, geographical location, and response (smoked marijuana) in Table
7.14.
TG2RI =

G
S-Francisco
C-Costa

R
Christians
Others
Christians
Others

Total

Yes
Ml
55
46
35
86
222

No
M2
50
197
58
245
550

Total
105
243
93
331
772

Table 7.14: Observed marginal table for MRG


We observe that there are more respondents in the study from Contra Costa than
from San Francisco. Among the individuals from San Francisco, the proportions of
those who reported that they had used and those that had not used marijuana are
about the same among Catholic and Protestants. On the other hand, the proportion
who responded that they have used marijuana is very much lower than those who
reported they have never used marijuana among the non-Christians. This same
pattern is exhibited among the respondents from Contra Costa.
Smoked marijuana?
Family
status
With children
Without children
Total

Yes
181
41
222

No
297
253
550

Total
478
294
772

Table 7.15: Marginal table MF collapsed over R and G.


When the table is again collapsed over the variables R and G (religion and geographical location), we see that the proportion of individuals who responded yes
to having used marijuana are much more higher for those that are married with
children than those that are unmarried or without children. Clearly, the reporting
of having used marijuana or not is highly associated with the family status of the
individual.

270

7.7

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION

Danish Welfare Study Data: An example

The data for our second example relate to the response to the question of whether
there was a freezer in the household or not among subjects in the 1976 Danish
Welfare Study (from Andersen, 1997). The data are presented in Table 7.16.

A :Sex

B:Age

C: Income
High

Old

Medium
Low

Male
High
Young

Medium
Low

High
Old

Medium
Low

Female
High
Young

Medium
Low

D:Sector
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public
Private
Public

E: Freezer in
the household
Yes
No
152
39
82
18
135
31
35
12
89
45
20
9
259
46
101
26
183
55
54
15
108
54
22
13
82
17
85
16
46
16
60
11
29
29
40
18
160
23
152
28
89
17
56
21
57
41
34
28

Table 7.16: Danish Welfare Study cross-classified according to possession of a freezer


The data are a five-way 2 x 2 x 3 x 2 x 2 contingency table, with variables: A sex,
B age, C family taxable income, D employment sector, and E whether there is a
freezer in the household. The age categories were:
fLow
if<60,000 D.kr
fold
if>40
age =
income = < Med. if 60,000 - 100,000 D.kr
[Young if < 40
I High if > 100,000 D.kr
We shall adopt the selection cutoff point of a 0.05 in this analysis.Results produced from PROC GENMOD are displayed in the table below for both the partial
and marginal associations. The SAS software program for implementing the partial
and marginals are: First, for the partial results, we need to execute GENMOD
to fit the following:
data tab77;

271

7.7. DANISH WELFARE STUDY DATA: AN EXAMPLE


do a=l to 2; do b=l to 2; do c=l to 3; do d=l to 2; do e=l to 2;
input count <B(B; output; end; end; end; end; end; datalines;
152 39 82 18 135 31 35 12 89 45 20 9 259 46 101 26 183 55 54 15 108 54 22
13 82 17 85 16 46 16 60 11 29 29 40 18 160 23 152 28 89 17 56 21 57 41 34 28
proc genmod; class a b c d e; model count=a b e d e/dist=poi type3; run;
model count=a|b|c|d|e<D2/dist=poi type3;run;
model count=alb|c|d|eQ3/dist=poi type3; run;
model count=a|blc|d|eQ4/dist=poi type3; run;
proc freq; weight count; tables a b c d e a*(b c d e) b*(c d e) c*(d e) d*e/chisq; run;

The partials for zero-order, two-factor, three-factor, and four-factor effects are obtained sequentially from the above GENMOD statements, respectively. The twofactor marginal partials are obtained from the PROC FREQ statements. The G2
here is what would be expected under the model of independence between the
two variables involved. These marginal tests can also be obtained from the fourth
GENMOD statement. The G2 obtained for the two-factor interactions are exactly
those obtained from the various tests of independence provided by PROC FREQ
statements. As mentioned in chapter 6 care should be taken in relying solely on
the marginal tests because of problems associated with collapsibility of contingency
tables, namely, Simpson's paradox. Brown's partial association takes care of this
problem as it only measures the contribution of each term as if it enters last in the
model. In general, the partial and marginal G2 values are often very close. These
G2 values are presented in Table 7.17.
Source

d.f.

a
b
c
d
e
a*b
a*c
a*d
a*e
b*c
b*d
b*e
c*d
c*e
d*e

1
1
2
1
1
1
2
1
1
2
1
1
2
2
1
2
1
1
2
2
1
2
2
1
2
2
2
1
2
2

a*b*c
a*b*d
a*b*e
a*c*d
a*c*e
a*d*e
b*c*d
b*c*e
b*d*e
c*d*e
a*b*c*d
a*b*c*e
a*b*d*e
a*c*d*e
b*c*d*e

Partial assoc. tests


G'4
pvalue
< .0001
73.10
< .0001
100.93
235.73
< .0001
263.74
< .0001
864.16
< .0001
3.50
0.0615
0.0353
6.69
< .0001
144.78
0.9817
0.00
0.0364
6.63
0.0288
4.78
0.8631
0.03
25.75
< .0001
98.82
< .0001
0.5683
0.33
0.4882
1.43
0.4160
0.66
0.5524
0.35
6.44
0.0400
0.0291
7.07
0.5795
0.31
0.6809
0.77
2.76
0.2520
4.98
0.0256
1.04
0.5950
4.96
0.0839
0.06
0.9696
0.1250
2.35
0.39
0.8228
0.8976
0.22

Marginal assoc. tests


G*
pvalue

2.09
8.28
144.86
0.03
6.13
2.44
0.41
26.11
98.98
0.07
1.80
0.83
0.04
6.35
6.83
0.23
0.37
2.38
3.57
0.26
3.82
0.16
1.68
0.43
.0.31

0.1485
0.0160
< .0001
0.8536
0.0466
0.1187
0.5245
< .0001
< .0001
0.7979
0.4068
0.3630
0.8373
0.0417
0.0329
0.6335
0.8302
0.3045
0.0589
0.8789
0.1481
0.9214
0.1953
0.8072
0.8556

Table 7.17: Results of Association tests from PROC GENMOD.


We present the G2 values for the equiprobable, zero-order, first-order, second-order,
third-order, and saturated models for the data in Table 7.16 in Table 7.18.
The corresponding marginal tests are presented in Table 7.19:

272

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION

k
0

1
2
3
4
5

df
47
41
27
11
2
0

& p- value
1863 4400
0.0000
325 7866
0.0000
35 8556
0.1186
11 3376
0.4154
3 9186
0.1410
0 0000
-

Table 7.18: G2 values under various models for the data in Table 7.16

k
1
2
3
4
5

d.f.
6
14
16
9
2

G'z
1537.6534
289.9310
24.5180
7.4190
3.9186

pvalue
0.0000
0.0000
0.0788
0.5936
0.1410

Decision
Reject
Reject
Fail to reject
Fail to reject
Fail to reject

Table 7.19: Marginal tests: Tests that A;-way effects are zero
The marginal fc-way tests indicate that four-way and three-way marginals are not
necessary in the model. Because marginal tests are subject to Simpson's paradox,
we examine below the partial association tests obtained from the SAS software
output displayed in Table 7.17.
Results from the marginal and partial tests indicate that the following effects
need to be included in the final model as they have significant partial and marginal
effects. These are presented in order of their importance (pvalues): AD, CE, CD,
AC, BC, BD, CE, BDE, ACE, ACD. We may note that the BDE interaction is
only significant in the partial association test. A possible all two-factor initial
model would therefore be the model with generating class {AD,CE,CD,AC,BC,BD}.
However, we started our model building by the initial model {AD,CE,B} designated
here as model (1). We present in Table 7.20 some possible hierarchical log-linear
models for the data.
Differences
Number
1
2
3
4
5
6
7
8
9

Added
term
CD
AC
BC
BD
na
ACD
AB

ABCD

d.f.
38
36
34
32
31
29
27
26
18

G2
81.9450
55.8432
49.1389
43.0077
39.7386
36.1553
29.8022
27.8265
16.5724

pvalue
0.0000
0.0185
0.0449
0.0926
0.1351
0.1691
0.3231
0.3670
0.5527

d.f.

2
2
2
1
na
2
1
na

G2
26.1018
6.7043
6.1312
3.2691
6.3531
1.9757
na

Table 7.20: Possible hierarchical models


From Table 7.21, we see that model 1 has the two most significant of the 10 twofactor effects as the generating class. Closed-form expression for the expected values

7.7. DANISH WELFARE STUDY DATA: AN EXAMPLE


Model

1
2
3
4
5
6
7
8
9

273

Generating Class

AD, CE, B
AD,CE,CD,B
AD,CE,CD,AC, B
AD,CE,CD,AC, BC
AD,CE,CD,AC, BC, BD
ACE, AD, CD, BC
ACE, ACD,BC
ACE, ACD, BC, AB
ABCD, ACE

Table 7.21: Generating classes for the models in Table 7.20


exists because the model is decomposable. The second model introduce the next
important effect, namely, the CD. The resulting model (2) does not fit the data.
We next introduce the AC term. The resulting model (3) is not decomposable since
it does not have its three- factor generating class ACD. We next introduce the BC
term, leading to model (4). The new model fits the data with a pvalue of 0.0926.
We introduce the BD term in model (5). While this model fits, the contribution of
the BD term is not significant. Hence, we remove the BD term from the model and
then introduce the ACE term in model (6). This model fits the data with a pvalue
of 0.1691. We next introduce the ACD term in the model, leading to model (7).
The model fits the data well with a G2 value of 29.8022 on 27 degrees of freedom
(pvalue = 0.3231). The model is also decomposable as its has the generating class
ACE and ACD in the model.
If we let A, B, C, D, and E be indexed by i,j,k,l, and p, respectively, then
under model 7, we have the estimates as:
n

+jk++

Each of models 4 to 7 fit the data based on their corresponding pvalues. Model 4 is
thus the minimal model for this data. If a four-factor interaction term, representing
the factor variables, is not included in our choice of model, then model (7) is the
most parsimonious, simplest model for these data. Examination of its standardized
and adjusted standardized residuals, however, indicates that four of the cells have
their adjusted standardized residuals greater in absolute values than 2, although all
its standardized residuals seem acceptable as none of them is significant.
Admitting the two-factor interaction term AB into model {ACD,ACE,BC} leads
to model {ACD,ACE,BC,AB}. This is designated as model (8) above. The model
fits even better from the pvalue, but again, three of the cells have significant adjusted standardized residuals, hence not acceptable. The model, however, lacks its
generating term ABCD. If we include this term in the model, we now have a better
model as all the adjusted standardized residuals are no longer significant. Hence,
the final model is model {ABCD, ACE} with a G2 = 16.5724 on 18 d.f. (pvalue =
0.5527), which is model (9) above.

7.7.1

Equivalent Logit Model

If we consider variable E as a response variable, then it would be necessary to start


off by fitting logit models to the data. A logit model would necessarily include

274

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION

the four-factor interaction term ABCD in its log-linear equivalence. The saturated
logit model for the data in Table 7.16 is implemnted in PROC LOGISTIC as follows
together with the type 3 statistics.
set tab77; proc logistic; class a b e d ; weight count;
model e=a|b|c|d/scale=none aggregate; run;
Type III Analysis of Effects
Wald
Effect
DF
Chi-Square
Pr > ChiSq

a
b
a*b
c
a*c
b*c
a*b*c
d
a*d
b*d
a*b*d
c*d
a*c*d
b*c*d
a*b*c*d

1
1
1
2
2
2
2
1
1
1
1
2
2
2
2

0.0063
0.0082
0.0073
78.3490
5.3566
1 . 3830
0 . 0669
0 . 0076
0.7091
5 . 3445
3.1979
1.1274
0.4192
0.1556
3.9139

0 . 9366
0.9278
0.9320
C.0001
0.0687
0.5008
0.9671
0.9306
0.3997
0.0208
0.0737
0.5691
0.8109
0.9252
0.1413

Significant terms from the type 3 analysis above are C and BD with pvalues of
<0.0001 and 0.0208, respectively. Hence our baseline model would be the logit
model {C,BD} which is equivalent to the log-linear model {ABCD,CE,BDE}. Fitting this model, we have G2 = 19.0454 on 18 d.f. (pvalue = 0.3890). The model
fits very well and is implemented with the following SAS software program and
corresponding partial output.
set tab77;
proc logistic;class a (ref=last) b (ref=last)
c (ref=last) d (ref=last)/param=ref;
weight count; model e=c b|d/scale=none aggregate=(a b e d ) ; run;

Deviance and Pearson Goodness-of-Fit Statistics


Criterion

DF

Value

Value/DF

Deviance
Pearson

18
18

19.0454
19.0695

1.0581
1.0594

1
2
1
1
11

1
1
1
1
1
1

0.3768
1.1054
0.7884
0.2432
0.2223
-0.3960

0.1271
0.1120
0.1179
0.1614
0.1272
0.1987

Odds Ratio Estimates

Effect

c 1 vs 3
c 2 vs 3

0 . 3890
0 . 3876

Analysis of Maximum Likelihood Estimates


Standard
DF
Estimate
Error
Chi-Square

Parameter

Intercept
c
c
b
d
b*d

Pr > ChiSq

Point
Estimate
3.020
2.200

95'/. Wald
Confidence Limits
2.425
1.746

3.762
2.772

8.7887
97.4328
44.7073
2.2720
3.0575
3.9728

Pr > ChiSq

0.0030
<.0001
<.0001
0.1317
0 . 0804
0.0462

7.7. DANISH WELFARE STUDY DATA: AN EXAMPLE

275

The class statement in the SAS software program above instructs SAS software
to use the cell reference coding scheme as in PROG GENMOD. The parameter
estimates with this coding therefore will be exactly the same as those that would
be obtained using PROG GENMOD to fit the equivalent log-linear model. The
logit model {C,BD} fits the data very well. We also consider a linear effect of
variable C on the model. The model that assumes the linear effect of variable C has
G2 = 24.530 on 19 d.f. (pvalue = 0.1754) and a log-parameter estimate 0.5469.

7.7.2

Interpretation

C
k
I
2
3

Freezer Status
1
2
1073
213
658
178
399
237

Table 7.22: Observed CE marginal table with C fixed at k = 1,2,3


From the parameter estimates, the odds are 3.02 times higher for an individual with
a low income to respond to yes as against no for having a freeze in the household,
and 2.20 times higher among the middle income group. The linear effect parameter
also indicate that as the income increases, the odds reduce since the estimate is
negative.
The BDE interaction being significant indicates that given the response Ep, age
and employment sector are conditionally independent of sex and income. That is,
age and employment do not affect the relationship between sex and the response
variable.
The above indicates that age and freezer response given the employment sector
do not depend on sex and income.
Let us form the three-way BDE table as in Table 7.23 and we will use this to
obtain the estimated log of the odds ratio.

B
3
I
2
I
2

D
/

1
2

E
P
I
533
856
322
419

2
177
236
84
131

Table 7.23: Observed BDE marginal table with D fixed at / = 1, 2


Estimated log odds ratios for TBE'D are 0.186 and 0.181 at levels of D = 1 and
2, respectively. Thus the odds of a yes response are about 17% lower among the
over 40's than for the under 40's among the private-sector respondents, while the
odds are about 20% higher between the two age groups among the public sector
respondents.
Logit models based on the forward, backward and stepwise procedures from
PROG LOGISTIC when applied to the data in Tables 7.1 and 7.16 will be discussed

276

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION

in later chapters.

7.8

Exercises

1. The data in Table 7.24 make uo a four-way 2 x 2 x 2 x 2 table of severity of


drivers' injuries in auto accident data (Christensen, 1990).
Accident type
Driver
Car ejected
Injury Collision Rollover
Small
No Not severe
350
60
Severe
150
112
Yes Not severe
26
19
Severe
23
80
Standard
No Not severe
1878
148
Severe
1022
404
Yes Not severe
111
22
Severe
161
265
Table 7.24: Auto Accident Data
(a) Use the forward selection procedure to build a suitable model.
(b) Use the backward elimination procedure to build a model.
(c) For both situations above, identify the graphical models as described in
6.8 and hence give the MLE expressions for your models. Interpret your
model. (You may also examine the odds and odds ratios for a proper
interpretation.)
2. Refer to problem 1 above; use backward and forward selection models to
build a model that treats type of car, driver ejection and accident type as
explanatory variables.
3. The data for this problem, presented in Table 7.25 are from Stern et al. (1977)
and relate to a survey to study the relationship between Pap testing and
hysteectomies. Interest in the data centers on whether there is association
between Pap testing and ethnicity.
(a) Fit the saturated log-linear model to the data and identify the important
interactions and hence fit the most parsimonious model based on this
result.
(b) Identify a log-linear model based on both the forward and backward
elimination procedures.
(c) Identify a log-linear model based on partial associations.
(d) Which of the forward, backward, and partial associations procedure
would you prefer and why? Based on this, select the final model and
perform residual analysis. Draw your conclusions.
4. Refer to problem 6.10. Use backward, saturated, and partial association procedures to determine the most parsimonious model for the data.

7.8.

EXERCISES

277

Income
Middle

Low

Age

35-44
45-54
55-64
65+
35-44
45-54
55-64
65+

White
< 2 2+
22
3
23
3
23
5
21
8
21
10
14
11
11
15
13
15

Ethnicity
Black
< 2 2+
37
6
27
4
13
1
6
4
39
16
44
16
24
18
11
25

Spanish
< 2 2+
36
4
30
9
16
4
3
4
15
5
5
15
2
4
3
9

Table 7.25: Data relating ethnicity and pap testing (Freeman, 1987)
The data for this problem is from Andersen (1997) and relate to the results of
an investigation of 1314 employees who had left their jobs during the second
half of the year. These layoffs were cross classified by A length of employment
prior to the lay-off, B cause of layoff, and C employment status. The data are
presented in Table 7.26
A: Length
of employment
<1 month
1 - 3 months
3 months - 1 year
1 - 2 years
2 - 5 years
> 5 years

B: Cause of
layoff
Closure
Replacement
Closure
Replacement
Closure
Replacement
Closure
Replacement
Closure
Replacement
Closure
Replacement

C: Employment status
Still
Got a
new job unemployment
10
8
24
40
42
35
42
85
86
70
41
181
80
62
16
85
67
56
27
118
35
38
10
56

Table 7.26: Data for exercise problem 7.5


(a) Use the backward procedure to find the most parsimonious model.
(b) Use marginal and partial association procedures to determine the most
parsimonious model for the data.
6. The data in Table 7.27 relate to the attitudes toward nontherapeutic abortions
among white Christian subjects in the 1972 - 1974 General Social Surveys as
reported by Haberman (1978). Find an appropriate model for the data and
interpret your results.

278

CHAPTER 7. STRATEGIES FOR LOG-LINEAR MODEL SELECTION


Attitudes

Year
1972

Education
in years

Positive

Mixed

Negative

9
85
77

16
52
30

41
105
38

8
35
37

8
29
15

46
54
22

9-12
> 13

11
47
25

14
35
21

38
115
42

<8
9-12
> 13

17
102
88

17
38
15

42
84
31

<8

9-12
> 13

14
61
49

11
30
11

34
59
19

<8
9-12
> 13

6
60
31

16
29
18

26
108
50

<8

9-12
> 13

23
106
79

13
50
21

32
88
31

S. Prot.

<8
9-12
> 13

5
38
52

15
39
12

37
54
32

Catholic

<8
9-12
> 13

8
65
37

10
39
18

24
89
43

Religion
N. Prot.

<8

9-12
> 13
S. Prot.

<8

9-12
> 13
Catholic

1973

N. Prot.

S. Prot.

Catholic

1974

N. Prot.

<8

Table 7.27: Attitudes towards abortion, Haberman (1978)

Chapter 8

Models for Binary Responses


8.1

Introduction

We shall consider here a response variable having binary or dichotomous categories


such as "yes or no," "alive or dead," "present or absent," "survive or does not
survive," etc., with factor variable(s) denoted by X. The terms "success" and
"failure" are usually used generically for these two categories. If the binary random
variable is defined as:
1 if outcome is a success
Y=
0 if outcome is a failure
where P(Y = 1) = TT (the underlying probability of success) and P(Y = 0) =
1 TT, then with n such independent random variables Y\, 2 , . . . , Yn with constant
probability of success TT, the random variable

has the binomial distribution 6(n, TT):


r = 0,1,2, ,n

(8.1)

To motivate the discussion in this chapter, let us consider the following data in
Table 8.1, relating to a 30-year coronary heart disease (CHD) mortality study of
Black and White men aged 35 to 74 years in Charleston, SC, and Evans County,
GA (Keil et al., 1995).
City or
county
Charleston
Evans

Black men
Examined Died
319
56
407
69

White men
Examined Died
635
139
184
711

Table 8.1: Coronary disease mortality in the Charleston, SC and Evans County,
GA, Heart Studies, 1960 to 1990
In the above example, the outcome Y is binary (died or not died), that is,
279

280

CHAPTERS.

MODELS FOR BINARY RESPONSES

f 1 if died
( 0 if alive
and the explanatory variables are sites (Charleston or Evans county) and race (Black
or White). Evidently, here, deaths are coded as successes. Interest centers on the
relationship between the sites and race on coronary heart disease mortality. We
would like to ask whether there is any difference (and to what degree) in the rates
of mortality between the two sites and if such differences depend on the race of the
individual. These and other similar questions will be considered in this chapter by
fitting models that can adequately describe the relationships that are inherent in
the data.
To develop the necessary tools for handling such data, let us first consider k independent random variables R\, R%, , Rk (each having the binomial distribution
described above) corresponding to the number of successes in k different subgroups.
Let the probability of success P(Ri = TTJ) be constant within each subgroup and
let Hi be the corresponding number of trials in each subgroup. We display this in
Table 8.2:
Outcome
Success
Failure
Totals

Subgroups

Ri

Rk
<n

HI

f?

...

n<2

Tl

nk - Rk
nk

f?

rii

Table 8.2: Outcome display for k subgroups


With the above setup, it follows that Ri ~ 6(n^,7rj). That is,
P(Ri=ri)=(ni\li(l-iri}ni-ri

ri = 0 , l , 2 , . - - ,n

and i = l,2,..,fc

and the log-likelihood function is given by:


L(7T,r) = V Lin (-r^-}
+n i ln(l - ^) + ^ Ml
1
r

^L

v -^/

\ iJ\

(8-2)

With Ri ~ 6(n;, TT^) under the assumption that TT^ is constant within each subgroup,
/?
it follows that an estimate of TTJ is pi = - and that
Hi

E(Pi}=7n,

8.2

and

"

Generalized Linear Model

Our goal is to be able to describe the observed proportion of success in each subgroup
in terms of the explanatory variables X. That is, we wish to model the probabilities
as
0(7rO=X/3
(8.3)
where X is a vector of explanatory or factor variables, /3 is a vector of parameters
and g is a link function as defined in section 6 of chapter 2. (We note here that we
usually use dummy variables for factor levels and measured values for covariates.)

8.2. GENERALIZED LINEAR MODEL

281

The well-known general linear model has


= X-

One disadvantage of the linear model is that estimates of TT^ may sometimes lie
outside the interval [0,1]. To ensure that this does not happen, we use instead the
cumulative distribution
x
F(x)^g-l(K'i(3)^ \ f(y)dy
J oo

where f(y) is the probability density function, and is sometimes referred to as the
p.d.f. of the tolerance distribution. Tolerance distributions that have been well
documented in the literature and are commonly used are the normal, the logistic,
and the extreme value distributions. We discuss these distributions in the next
subsections.

8.2.1

Dose- Response Models

Let us consider in particular a dose-response experiment and let TT; be the theoretical
population survival rate using drug (X) at dosage level Xi (or usually log-dosage).
Let pi be the corresponding observed value of TTJ and we will be interested in modeling TTi as a function of the dose levels Xi . The simplest form of this relationship is

g(^i] = A)
OT

8.2.2

The Normal Tolerance Distribution

If the normal distribution is employed as the tolerance distribution, then we would


\/27r 7-

where <& denotes the cumulative probability distribution for the standard normal
7V(0,1). Thus
(<*) = &-1(ir-)=0 +3 xwhere g(i^i) is the link function, PQ ^ and j3\ ^. The link function g is the
inverse Cumulative Normal probability function $~1. When the tolerance distribution is the normal, the relevant model is called the probit model. The probit of a
probability TT is denned for each dose level ;, i = 1, 2, , k to be a value s such
that
i
i
*
where s?- =
= P(Z < Si],

that is, TTi = $(sj)

The model is appropriately fitted by invoking the probit link in PROC GENMOD
in SAS. SAS also has a PROC PROBIT that can be readily employed. Finney
(1971) and Bliss (1935) have given extensive discussions on probit analysis and
interested readers are encouraged to consult these texts. We display in Figure 8.1
a graph of the probit model for values of 0 < p < 1.0.

CHAPTER 8. MODELS FOR BINARY

282

RESPONSES

Figure 8.1: Probit Plot


8.2.3

The logistic distribution

The logistic tolerance distribution (Berkson, 1953, 1955) has density

f(y} =
and

4. e /3o+/3iy]2

*(x) = J
I

exp(/30 + fax)
I + exp(A) + fax)

An alternative definition of F(x) is:


(8.4)

which leads to:


In

(8.5)

for (i = 1, 2, , fc), which is referred to as the logistic function. The corresponding


link function is given by the logit:

/ nx = iIn

Clearly from the above expression, the odds of response 1 (success) are

syr-'U

\/

In general,
(8.6)

where x^ ft = /30 + faXl + (32X2 + + (3kXk.


Equation (8.6) is described as a linear logistic regression model because it is a
regression type model if the explanatory variables are quantitative. Similarly, (8.6)
will be an ANOVA-type model when the explanatory variables are categorical. In
this case, we often refer to it as the logit model.

8.2. GENERALIZED LINEAR MODEL

283

\i is sometimes referred to as the logit or (log-odds) of the probability of survival


for the i-th dosage level. "Logit" is a contraction of the phrase "logarithmic unit" in
analogy to Bliss's (1935) probit for "probability unit." The logit is a transformation
that takes a number TT between 0 and 1 and transforms it to In f j^ J . Similarly,
the logistic transformation takes a number x on the real line and transforms it to
ex/(l + ex). We note here that the logistic and logit transformations are inverses
of each other, with the former transformation giving TT and the latter giving x.
The shapes of the function f ( y ) and TT(X) are similar to those of the probit
model except in the tails of the distributions (see Cox & Snell, 1989). We display
in Figure 8.2 a plot of the logistic distribution for 0 < p < 1. Any value of TT in the
range (0,1) is transformed into a value of the logit(TT) in (00, oo). Thus as TT > 0,
logit (TT) > oo.

Figure 8.2: Logistic plot


In Figure 8.3 is the plot of the transformation of 0 < X < 80 to TT(X).

Figure 8.3: Transformation of X to TT(X)


In Figure 8.3, TC(X) is defined either as in (8.5) or (8.6). Thus, as x > OO,TT(X) 4- 0
when Pi < 0 and TT(X) 11 when (3\ > 0. As 0i - 0, the curve flattens to a horizontal
straight line and when the model holds true with /3 = 0, the dichotomous or binary
response is independent of X.
TT(X) has d^(x]/dx = ^\-K(X][\ TT(O;)]. Thus the curve has its steepest slope at
x value where TT(X) = ^, which is x = (3o/(3\. Logit(?r) is a sigmoid function that
is symmetric about TT = 0.5. The graph is approximately linear between TT = 0.2

284

CHAPTER 8. MODELS FOR BINARY RESPONSES

and TT = 0.8, but is definitely nonlinear outside this range.


For more details on the properties of ?r(x) see Agresti (1990).
The probit (discussed earlier) and the logit links are similar, and it can be shown
that the variances of the two are equal if the logistic (3^ is ^- times the probit ft\,
and the two models usually give similar fitted values, with parameter estimates for
logit models being usually 1.6 to 1.8 times those of the probit model.

8.2.4

Extreme Value Distribution

Other models that have also been considered for the dose-response data include the
complementary log-log and the log-log link models. The latter, sometimes referred
to as the extreme value model, is characterized by:

and

Figure 8.4: Complimentary log-log


Figure 8.4 displays the graph of
F(x] = I - exp[-ex]
against x for values of x ranging over [10,10]. A transformation of the form
l n [ - l n ( l - 7 r ) ] = A ) + ftz
transforms TT(Z) above to the well-recognized linear model form. The link function,
In [ In (1 TT)], is called the complementary log-log function or CLL for short. The
model is usually preferred to either the probit or the logistic models for values of TT
near 0 or 1. In Figure 8.4 is a plot of the function that transforms a probability TT
in the range (0,1) to a value in (00, oo). From the graph, we see that the function
is not symmetric about TT = 0.5 in this case, and as TT -> 1, ln(7r) -> oo and for
TT = 0.5, In (TT) = 0. The function is a special form of the Gumbel distribution,
which has been found very useful in modeling the breaking strength of materials.
In Figure 8.5, the plot of the link transformation against p uses values of p ranging
from 0.001 to 0.999.

8.2. GENERALIZED LINEAR MODEL

285

Figure 8.5: Complimentary log-log


For particular values of X (say x\ < x 2 ), we have
In [1 - ln{l - 7r(x 2 )}] - In [1 - In (1 - TT(XI)}] = (3i(x2 so that

In[l-7r(z 2 )]

= exp[/3i(z2 -

and

That is, the probability of surviving at x2 equals the probability of surviving at xi


raised to the power exp(/3i) higher for each unit increase in the distance x2 x\.
The complimentary log-log link is not symmetric, but for small TT, it is very close
to the logit link since in this case In (1 TT) ~ TT, so that
In ( In (1 TT)) w In TT,

In (TT/(I TT)) w In TT + TT

The complimentary log-log link model is most suitable in asymmetric situations


when TT(X) increases from 0 fairly slowly, but approaches 1 quite rapidly as X increases.
On the other hand, if TT(X) departs from 1 slowly but approaches 0 sharply as X
increases, the appropriate model is the log-log link, which is defined as:
where F(x) = exp(-e'i;) = exp{- exp[-(/?0 + fax}}}. F(x] is the CDF for the
reversed extreme value distribution. We note here that if v has the extreme value
distribution with CDF
F(x) = 1 - exp(-e v )
then r v has the reversed extreme value distribution with
F(r) = exp(-e~ r )
For TT(X) near 1, the log- log link is very close to the logit link.
The SAS PROC GENMOD and PROC LOGISTIC can be used to fit some
of the models described in the preceding sections by specifying the following link
functions in PROC GENMOD and LOGISTIC.
a. Logistic model (logit link function): LOGIT.
b. Probit model (inverse cumulative Normal link function $-1): PROBIT.

286

CHAPTER 8. MODELS FOR BINARY

RESPONSES

c. Extreme value model (complimentary log log function): CLL


Figure 8.6 gives the graphs of the logit, probit, and complimentary log-log against p
ranging from 0.001 to 0.999. We note that both the probit and logistic are symmetric
about p 0.5. The complimentary log-log is asymmetric in this case.

Figure 8.6: Plots of the three functions

8.3

Estimating the Parameters of a Logistic


Regression Model

For the simple linear logistic regression, we have

(8.7)
where Aj is the logit of the probability of "survival" or "death" for the z-th level of
X.
Thus the logistic model in (8.7) can be fitted by a linear model
Pi
= A) +

where

Si = l n

Pi
-Pi

are the observed logits. Accordingly,


We can compare the above with the classical simple linear regression model E(yi) =
f3o + PiXi, where consistent estimates of the parameters /3o and 0i can be easily
obtained by ordinary least squares (OLS) method. The OLS assumes that the yi
are homoscedastic, that is, the yi has constant variance denoted by a2 and hence
the OLS estimates are given by:
= (X'X)-1X'Y
where

8.3.

PARAMETERS ESTIMATION IN LOGISTIC

REGRESSION

287

However, in our case, the variance of 6i is not constant, being dependent on pi (the
observed success probability) and rii (the binomial denominator) , the probability of
survival when the i-th dosage level is applied and the number of subjects receiving
the dosage, respectively. We encourage students to show, using the delta method,
that the variance of i is given by (appendix F.I)
Var(<5i) =

,2,---,fe

pi)

(8.8)

Pi)]~l is an element of a diagonal matrix.


Thus an estimate of Var((5j) = [n^
we
If we assume independent sampling for each subpopulation i = 1,2,
have
Cov((*>i, Sj) = 0 for i 7^ j
If we therefore write the observed logits in vector form as
&' = (61,62,-- ,6k)

and the theoretical logits similarly in vector form as:


/ \ / 2)
\ ' ' ' j Afc)
\ \
A' \'\l)
then the model in (8.7) can be written as:
where X is an k x 2 and /3 = 2 x 1 matrix. That is,

X=

and /3 =
I

A)

xk

Furthermore, the unrestricted linearized covariance matrix of 6 in (8.6) is given by:

0
0

We see from the above that the variances of the 5's are not constant and hence the
method of weighted least squares (WLS) will be employed to obtain estimates of
our parameters. From WLS theory, we have:

Var(/3) = (X'Vg-

-V

where V^" is the inverse of V^ given above.

8.3.1

Example 8.1: A Bioassay Example

The data in Table 8.3 give the effect of different concentrations of nicotine sulphate in a 1% saponin solution on an insect Drosophila melanogaster, the fruit fly.
We usually employ the logarithm of Xi, and in this case we chose to use log (x^, that
is, the log to base 10 of the doses. Table 8.4 gives the relevant initial calculations

CHAPTER 8. MODELS FOR BINARY RESPONSES

288

Number of
insects

Nicotine sulphate
g/100 cc

Number
killed

Xi

Ti

Ui

0.10
0.15
0.20
0.30
0.50
0.70
0.95

8
14
24
32
38
50
50

47
53
55
52
46
54
52

Table 8.3:
Effect of different concentrations
Drosophila melanogaster, (Hubert, 1992)

logio(zt)
-1.000
-0.824
-0.699
-0.523
-0.301
-0.155
-0.022

Observed
proportions
Pi
0.170
0.264
0.436
0.615
0.826
0.926
0.962

Observed
logits
Si
-1.5856
-1.0253
-0.2574
0.4684
1.5575
2.5268
3.2314

of nicotine sulphate

on

Expected
proportions
T?i

0.1448
0.2864
0.4253
0.6369
0.8387
0.9141
0.9532

Table 8.4: Results of analysis


for the data. While I do not for a moment think that the linear logistic moment
should be fitted by what follows in this section, it is nevertheless incorporated here
so that students can have a proper understanding of what is really going on from
the use of statistical packages.
The linear logistic model can be fitted by writing
' -1.5856 "
" 1 -1.000 "
1 -0.824
-1.0253
-0.2574
1 -0.699
0.4684 = 1 -0.523
1 -0.301
1.5575
2.5268
1 -0.155
1 -0.022
3.2314
that is,
and

V^"1 = diag{6.6317,10.2981,13.5247,12.3123,6.6113,3.7003,1.9009}

which is a 7 x 7 diagonal matrix, where, for instance, 6.6317 is obtained as:


1
6.6317 =
47(0.170)(1 -0.170)

-i
= 7lipi(I

-pi)

8.3. PARAMETERS ESTIMATION IN LOGISTIC REGRESSION


HenCe

289

1189
/Y'V
-1652 I
A
(X
V^ -1^-1
X) _
- T^ Q1652 Q2n2 I and

'

5005
x'v^-^ = f[7'13.4499

(30 1

[ 0.1189 0.1652 1 I" 7.5005


0.1652 0.2712 J [ 13.4499
3.1137
4.8867

Hence, /30 = 3.1137, /3i = 4.8867, with corresponding variances 0.1189<r2 and
0.2712a2, respectively. Thus for a unit change in Iog10 of nicotine sulphate (X),
the estimated odds of the number of insects killed are multiplied by exp{4.8867} =
132.5156.
The expected or predicted logits from the fitted model are:
[-1.7730, -0.9125, -0.3020,0.5585,1.6427,2.3567,3.0048]
with estimated proportions
?fi = [0.145,0.286,0.425,0.637,0.838,0.914,0.953]
With Xi = Iog10 Xi, the above are obtained from the estimated regression equation:
-l }= Po +faXi= 3.1137 + 4.8867z;
\l-Ki)
Thus with Xi 1.00, we have
In

In

-r ) = -1.7730 = -r = 0.1698228
\l-TTiJ

'

0.1698228
1 + 0.1698228

1 - TTj

0.1698228

The level of the dosage that would result in a 50% response by subjects in
the population under study is an important parameter in dose-response models.
A measure of the potency of the drug is the statistic LD50 median lethal dose
(or ED50 for effective median dose, LC50 for median lethal concentration, and its
corresponding EC50, median effective concentration). In this example, LD50 is the
lethal dosage at which 50% of the subjects (insects) are expected to be killed, and in
experiments where the response is not death, we refer to the ED50, median effective
dose. Thus,

LD50 = if)-0-6372 = 0.2306 g/100 cc.


( 9 1 Q79 _ ft \
Similarly, an LD90 is given by 10U , where U -;--, which when

That is, Iogi0 (LD50) = -0.6372

=*

computed gives a value of 0.6493 g/100 cc.


The X\OK
(t fr ^e k x 2 table is given by
Dgit
X?ogit

CHAPTER 8. MODELS FOR BINARY RESPONSES

290

where if we define
= rii+pi(I - pi)

for i = 1,2, ,/K

which are simply the inverses of the asymptotic variances of the observed logits Si,
then
v^ r
is the weighted mean of the observed logits.
For the data in Table 8.3, we have the following calculations:
i
1
2
3
4
5
6
7

rii
47
53
55
52
46
54
52

Pi
0.1702
0.2642
0.4364
0.6154
0.8261
0.9259
0.9615

Pi(l-Pi)
0.1412
0.1944
0.2460
0.2367
0.1437
0.0686
0.0370

Total

U{

Si

Ui8i

6.6383
10.3019
13.5273
12.3077
6.6087
3.7037
1.9231
55.0106

-1.5841
-1.0245
-0.2559
0.4700
1.5581
2.5257
3.2189

-10.5159
-10.5543
-3.4621
5.7847
10.2973
9.3546
6.1901
7.0944

2.5094
1.0496
0.0655
0.2209
2.4278
6.3793
10.3612

**?
16.6584
10.8130
0.8861
2.7188
16.0447
23.6271
19.9253
90.6733

Hence,
7.0944
= 0.1290
55.0106
and as a result
Total x2ogit =
= 90.6733 - (55.0106) * (0.1290)2
= 90.6733 - 0.9149
= 89.7584
Comparison with the WLS Solution
The total sum of squares is given by:

where Si ]T)wi#i and m = riipi(l Pi).


Thus QT = 90.6733 - 0.9149 = 89.7584 on 6 degrees of freedom. The regression
contribution is accordingly given by

ArS"

These results are summarized in Table 8.5.


Source
Regression
Lack of fit
Total

d.f.
1
5
6

x2
88.0525
1.7059
89.7584

pvalue
0.0000
0.8882

Table 8.5: Lack of fit test


This model fits the data well, as can be seen from the nonsignificant x2 lack of fit
pvalue ( 0.8882), while the test of (3\ = 0 was significant with x$eg = 88.0525,
indicating that the shape parameter should be kept in the model.

8.3. PARAMETERS ESTIMATION IN LOGISTIC

REGRESSION

291

We give in the next table, the fits of the various models discussed in the preceding
sections to the data in Table 8.3 using the statistical package SAS^ (equivalent
SPSS implementation can be found in the appendix). The MLE for these models
are obtained by the iterative re-weighted least squares discussed in section 2.6 of
chapter 2. For the logistic model for instance, the likelihood equations are derived
using the Fisher's scoring algorithm discussed in the same section of chapter 2,
which are derived from
where p,i rupi. The derivation of this is provided in appendix F.2.
We observe here that the ratio of the logistic parameter model estimates to those
of the logit models is (3.1236/1.8255)=(4.8995/2.8749)=1.7. This ratio is within the
expected range of 1.6 to 1.8.
Models
Logistic
Probit
CLL
E-V

d.f.
5
5
5
5

G2
0.7336
0.5437
1.5578
4.1421

pvalue
0.9811
0.9904
0.9063
0.5291

Parameters
0Q
/3
3.1236
4.8995
1.8255
2.8749
1.3888
2.9196
-2.7309 -3.5081

CLL: complimentary log-log model


E-V: Extreme value model
All the models provide adequate fits of the data, but both the probit and the logistic
models provide better fits for the data, although the probit models seems better
both in terms of G2 values and the standardized residuals (not printed). All the
models are of course based on 5 degrees of freedom.
The following are the corresponding SAS software statements for fitting the above
models using PROCs LOGISTIC and GENMOD.
data tab83;
input x r n <0;
dose=loglO(x);
surv=n-r;
datalines;
datalines;
0.10 8 47 0.15 14 53 0.20 24 55 0.30 32 52
0.50 38 46 0.70 50 54 0.95 50 52
*** (i) fit logistic model ***;
proc logistic data=tab83;
model r/n=dose/scale=none aggregate influence covb;
output out=aa p=phat stdxbeta=selp h=lev; run;
proc print data=aa; run;
*** (ii) Fits probit model with Proc Logistic ***;
proc logistic;
model r/n=dose/link=normit scale=none aggregate; run;
(iii)
*** (iii) Fits CLL model with Proc Logistic ***;
proc logistic;
model r/n=dose/link=cloglog scale=none aggregate; run;
*** (iv) Fits Extreme value model with Proc Logistic ***;
proc logistic;
model surv/n=dose/link=cloglog scale=none aggregate; run;

Equivalent program using PROC GENMOD is also provided below:

292

CHAPTER 8. MODELS FOR BINARY RESPONSES

set tab83;
proc genmod;
model r/n=dose/dist=bin link=logit obstats; run;
proc genmod;
model r/n=dose/dist=bin link=probit obstats; run;
proc genmod;
model r/n=dose/dist=bin link=cll obstats; run;
proc genmod;
model surv/n=dose/dist=bin link=cll obstats; run;

In PROC LOGISTIC, P, STDXBETA, PRED, and H contain respectively the


predicted values of the probabilities, the standard errors of the linear predictor,
the predicted logits, and the leverages for the model. The AGGREGATE and
influence statements in (i) allow the goodness-of-fit tests (deviance or G2 and the
Pearson's X2) to be presented, while the influence statement allows the detection
of influential observations. The INFLUENCE option may result in a large volume
of output especially if there are many observations in the data. Below is the result
from PROC LOGISTIC when the option aggregate, which must accompany the
SCALE=NONE option in the logistic model statement, is invoked.
The LOGISTIC Procedure
Model Information
Data Set
Response Variable (Events)
Response Variable (Trials)
Number of Observations
Model
Optimization Technique

WORK.TAB83
r
n
7
binary logit
Fisher's scoring

Deviance said Pearson Goodness-of-Fit Statistics


Criterion
Deviance
Pearson

DF

Value

Value/DF

Pr > ChiSq

5
5

0.7336
0.7351

0.1467
0.1470

0.9811
0.9810

Number of unique profiles: 7

The parameter estimates from the logistic model using PROC LOGISTIC are displayed as:
Analysis of Maximum Likelihood Estimates

Parameter

DF

Estimate

Standard
Error

Wald
Chi-Square

Pr > ChiSq

Intercept
dose

1
1

3.1236
4.8995

0.3349
0.5098

86.9809
92.3620

<.0001
<.0001

Odds Ratio Estimates

Effect
dose

Point
Estimate
134.228

95*/. Wald
Confidence Limits
49.419

364.582

Association of Predicted Probabilities and Observed Responses


Percent Concordant
Percent Discordant
Percent Tied
Pairs

80.0
10.8
9.2
30888

Somers' D
Gamma
Tau-a
c

0.692
0.762
0.333
0.846

8.3. PARAMETERS ESTIMATION IN LOGISTIC REGRESSION

293

Estimated Covariance Matrix


Variable

Intercept

dose

Intercept
dose

0.112172
0.15627

0.15627
0.259907

The parameter estimates, β̂0 = 3.1236 and β̂1 = 4.8995, are both significantly
(p < .0001) different from zero. The odds for the intercept would be
e^{3.1236} = 22.73. Thus at log (base 10) dosage level zero (dose = 0), the
insect Drosophila melanogaster is almost 23 times more likely to die than not
to die. Note that this dose level is equivalent to a 1.0 g/100 cc nicotine
sulphate concentration. Similarly, with each unit increase in log10 dosage
level, the odds of an insect dying increase by a factor of e^{4.8995} = 134.23.
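These quantities are easy to verify by hand; the short DATA step below is a
sketch that simply plugs in the two estimates taken from the output above.

data odds_check;
   b0 = 3.1236;                      /* intercept estimate from the output */
   b1 = 4.8995;                      /* slope for dose = log10(x)          */
   odds_at_zero = exp(b0);           /* odds of death at log10(dose) = 0   */
   or_per_unit  = exp(b1);           /* odds ratio per unit log10(dose)    */
   put odds_at_zero= or_per_unit=;   /* about 22.73 and 134.23             */
run;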
A test of the hypothesis concerning the slope parameter of the logistic model is
carried out as follows. To test H0: β1 = 0 against Ha: β1 ≠ 0, the statistic is:
\[ t^* = \frac{\hat{\beta}_1}{s.e.(\hat{\beta}_1)} \]
For binomial data, however, this is not usually distributed as Student's t but,
when H0 holds, t* is distributed approximately as a standard normal. In our
case, t* = 4.8995/0.5098 = 9.6105, which clearly indicates that the null
hypothesis is not tenable. This test is provided in the SAS software output
below, together with the parameter estimates, when the logistic model is fitted:
Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      145.2882     1        <.0001
Score                 127.2173     1        <.0001
Wald                   92.3620     1        <.0001

We observe that 9.6106² = 92.3636, which should equal the Wald value of 92.3620
in the above test except for rounding error.
The estimated killing probability as a function of the log dosage is:
\[ \hat{\pi} = \frac{\exp(3.1236 + 4.8995\,\mathrm{dose})}{1 + \exp(3.1236 + 4.8995\,\mathrm{dose})} \]
which for the first dosage (dose = log10(0.10) = -1) becomes
\[ \hat{\pi} = \frac{\exp[3.1236 + 4.8995(-1)]}{1 + \exp[3.1236 + 4.8995(-1)]} = \frac{0.1693}{1.1693} = 0.1448 \]
The expected number of deaths at this dosage level is n1 × π̂ = 47 × 0.1448 = 6.8056,
or about 7 insects. These and other relevant parameters are displayed below.
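As a quick check, the fitted probabilities and expected counts in the listing
that follows can be reproduced directly from the two estimates; the DATA step
below is a sketch to that effect.

data fitted;
   input x r n @@;
   dose = log10(x);
   phat = exp(3.1236 + 4.8995*dose)/(1 + exp(3.1236 + 4.8995*dose));
   yhat = n*phat;                    /* expected number of deaths */
   datalines;
0.10 8 47 0.15 14 53 0.20 24 55 0.30 32 52
0.50 38 46 0.70 50 54 0.95 50 52
;
proc print data=fitted; run;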

Obs     r     n       yhat      phat      selp      resid       lev
1       8    47     6.8056    0.1448    0.2440     0.4838    0.3465
2      14    53    15.1792    0.2864    0.1763    -0.3604    0.3368
3      24    55    23.3915    0.4253    0.1439     0.1657    0.2782
4      32    52    33.1188    0.6369    0.1408    -0.3206    0.2382
5      38    46    38.5802    0.8387    0.2041    -0.2305    0.2591
6      50    54    49.3614    0.9141    0.2646     0.3172    0.2968
7      50    52    49.5664    0.9532    0.3246     0.2926    0.2442

To fit a probit model, specify the link function as NORMIT in the MODEL statement
(see (ii)) in PROC LOGISTIC. Similarly, we specify LINK=CLOGLOG for the
complementary log-log model, as in (iii). The default link in SAS software is the
LOGIT link. The above SAS software results can also be obtained with PROC CATMOD.
The final model based on the logistic regression is given by
\[ \ln\left(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\right) = 3.1236 + 4.8995\,\log_{10}(\mathrm{dose}_i), \quad i = 1, 2, \ldots, 7 \]  (8.9)

The logistic model can also be implemented with PROC CATMOD. We present
below the SAS software program required to implement this, together with a partial
SAS software output, again for the data in Table 8.3.
DATA NEW; SET TAB83;
COUNT=R; Y=0; OUTPUT; COUNT=N-R; Y=1; OUTPUT; DROP N R;
PROC CATMOD DATA=NEW; WEIGHT COUNT; DIRECT DOSE;
MODEL Y=DOSE/FREQ ML NOGLS; RUN;
The CATMOD Procedure

Maximum Likelihood Analysis of Variance

Source              DF    Chi-Square    Pr > ChiSq
Intercept            1         86.98        <.0001
dose                 1         92.36        <.0001
Likelihood Ratio     5          0.73        0.9811

Analysis of Maximum Likelihood Estimates

                        Standard    Chi-
Parameter   Estimate       Error    Square    Pr > ChiSq
Intercept     3.1236      0.3349     86.98        <.0001
dose          4.8996      0.5098     92.36        <.0001
Results obtained from the application of PROC CATMOD agree with those obtained earlier for the logistic model.

8.3.2 Implementing a Probit Model

We also implement both the probit and logistic models for the data in Table 8.3
with PROC PROBIT in SAS software. In the program below, notice that there is no
need to formally transform the dose to log10 or natural log: PROC PROBIT does
this automatically when either LOG10 or LOG is specified in the PROC statement.
The option LACKFIT conducts the usual lack-of-fit tests (X2 and G2; this test
assumes that the data have been sorted by the explanatory variable), while
INVERSECL computes confidence limits for values of the first continuous
explanatory variable, in this case dose, together with the response rates. To
implement the logistic regression in this case, specify the distribution
(D=LOGISTIC) in the MODEL statement, as in the second model statement below.
DATA TAB83; INPUT DOSE R N @@; DATALINES;
0.10 8 47 0.15 14 53 0.20 24 55 0.30 32 52
0.50 38 46 0.70 50 54 0.95 50 52
;
PROC PROBIT DATA=TAB83 LOG10; MODEL R/N=DOSE/LACKFIT INVERSECL;
OUTPUT OUT=AA P=PROB; RUN;
PROC PROBIT DATA=TAB83 LOG10; MODEL R/N=DOSE/D=LOGISTIC LACKFIT INVERSECL;
OUTPUT OUT=BB P=PROB2; RUN;
DATA NEW; MERGE AA BB; PROC PRINT DATA=NEW; RUN;
Probit Procedure

Goodness-of-Fit Tests

Statistic              Value    DF    Pr > ChiSq
Pearson Chi-Square    0.5451     5        0.9904
L.R. Chi-Square       0.5437     5        0.9904

Analysis of Parameter Estimates

                             Standard    95% Confidence      Chi-
Parameter     DF  Estimate      Error    Limits              Square    Pr > ChiSq
Intercept      1    1.8255     0.1753    1.4819    2.1690    108.47        <.0001
Log10(DOSE)    1    2.8749     0.2710    2.3438    3.4061    112.53        <.0001

Probit Model in Terms of Tolerance Distribution

        MU           SIGMA
 -0.634963      0.34783305

Estimated Covariance Matrix for Tolerance Parameters

               MU        SIGMA
MU       0.000766    -0.000173
SIGMA   -0.000173     0.001075

Probit Analysis on Log10(DOSE)

Probability    Log10(DOSE)    95% Fiducial Limits
0.01             -1.44414     -1.64886   -1.30058
0.02             -1.34932     -1.53355   -1.21955
0.03             -1.28917     -1.46052   -1.16801
...
0.45             -0.67867     -0.73963   -0.62476
0.50             -0.63496     -0.69216   -0.58172
0.55             -0.59125     -0.64591   -0.53746
...
0.98              0.07940     -0.03443    0.23977
0.99              0.17422      0.04691    0.35477

Probit Analysis on DOSE

Probability       DOSE      95% Fiducial Limits
0.01           0.03596      0.02245    0.05005
0.02           0.04474      0.02927    0.06032
0.03           0.05138      0.03463    0.06792
...
0.45           0.20957      0.18213    0.23727
0.50           0.23176      0.20316    0.26199
0.55           0.25630      0.22599    0.29009
...
0.98           1.20060      0.92378    1.73688
0.99           1.49354      1.11406    2.26344


Partial results for the probit analysis on log10(dose) and dose are presented above;
the complete results are displayed in Appendix F.3. The partial results above
indicate that the tolerance distribution mean has an estimate of -0.634963. The
variance-covariance matrix of the parameter estimates is also displayed. The
LD50 for log10(dose) is -0.63496, that is, the log10(dose) corresponding to a
probability of 0.5. The LD50 in this case has a 95% confidence interval (C.I.) of
(-0.69216, -0.58172). Similarly, the LD50 for dose is 0.23176 with a 95% C.I. of
(0.20316, 0.26199).
The partial results below are those obtained when PROC PROBIT is employed for
the logistic regression approach. Again, the results obtained here agree with
those obtained earlier from PROC LOGISTIC. The LD50 under this model is 0.23039
with a 95% C.I. of (0.20194, 0.26096). The detailed results are displayed in
Appendix F.4.
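Since MU = -β̂0/β̂1 for the probit fit, the LD50 on both scales is easily
checked by hand; the DATA step below is a sketch using the probit estimates
printed above.

data ld50;
   b0 = 1.8255; b1 = 2.8749;   /* probit estimates from the output above */
   mu = -b0/b1;                /* LD50 on the log10(dose) scale          */
   ld50_dose = 10**mu;         /* LD50 on the original dose scale        */
   put mu= ld50_dose=;         /* approximately -0.6350 and 0.2318       */
run;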
Goodness-of-Fit Tests

Statistic              Value    DF    Pr > ChiSq
Pearson Chi-Square    0.7351     5        0.9810
L.R. Chi-Square       0.7336     5        0.9811

Analysis of Parameter Estimates

                             Standard    95% Confidence      Chi-
Parameter     DF  Estimate      Error    Limits              Square    Pr > ChiSq
Intercept      1    3.1236     0.3349    2.4672    3.7800     86.98        <.0001
Log10(DOSE)    1    4.8996     0.5098    3.9003    5.8988     92.36        <.0001

Probit Model in Terms of Tolerance Distribution

        MU           SIGMA
 -0.637529      0.20409985

Estimated Covariance Matrix for Tolerance Parameters

               MU        SIGMA
MU       0.000773    -0.000080
SIGMA   -0.000080     0.000451

Probit Analysis on Log10(DOSE)

Probability    Log10(DOSE)    95% Fiducial Limits
0.01             -1.57539     -1.83216   -1.40317
0.02             -1.43185     -1.65304   -1.28275
...
0.45             -0.67849     -0.73938   -0.62429
0.50             -0.63753     -0.69479   -0.58342
0.55             -0.59657     -0.65152   -0.54121
...
0.99              0.30033      0.13984    0.53908

Probit Analysis on DOSE

Probability       DOSE      95% Fiducial Limits
0.01           0.02658      0.01472    0.03952
0.02           0.03700      0.02223    0.05215
...
0.45           0.20966      0.18223    0.23752
0.50           0.23039      0.20194    0.26096
0.55           0.25318      0.22309    0.28760
...
0.99           1.99680      1.37988    3.46005


Below are presented the predicted probabilities under both the probit and logistic
models. As mentioned before, the probit model fits better.

Obs     x       r     n      PROBIT    LOGISTIC
1      0.10     8    47     0.14698     0.14480
2      0.15    14    53     0.29349     0.28635
3      0.20    24    55     0.42700     0.42530
4      0.30    32    52     0.62636     0.63685
5      0.50    38    46     0.83148     0.83871
6      0.70    50    54     0.91623     0.91409
7      0.95    50    52     0.96092     0.95322

In order to test the adequacy of this model, we would need to examine plots of
residuals and conduct other diagnostic procedures.

8.3.3 Example 8.2: Beetle Mortality Data

While both the logistic and probit models seem to fit the data in Table 8.3 well,
there are other data sets for which both models would be found inadequate. We
give below in Table 8.6 the beetle mortality data (Bliss, 1935), which relate to
the numbers of insects dead after 5 hours of exposure to gaseous carbon
disulphide at various concentrations. The data have also been analyzed by Dobson
(1990) and Agresti (1990).
                                     Models
log10 xi     n     r    Logistic       CLL     Probit     E-V
1.6907      59     6      3.457      5.589      3.358    56.637 (53)
1.7242      60    13      9.842     11.281     10.722    47.447 (47)
1.7552      62    18     22.451     20.954     23.482    34.214 (44)
1.7842      56    28     33.898     30.369     33.815    19.568 (28)
1.8113      63    52     50.096     47.776     49.615    13.437 (11)
1.8369      59    53     53.291     54.143     53.319     7.618 (6)
1.8610      62    61     59.222     61.113     59.664     4.898 (1)
1.8839      60    60     58.743     59.947     59.228     2.941 (0)
d.f.                          6          6          6          6
G2                       11.232      3.446     10.120     27.917
X2                       10.025      3.295      9.513     25.093

Table 8.6: Expected values under various models for the beetle mortality data
(Bliss, 1935)
We present below extracts from the SAS software output for the CLL model from
PROCs LOGISTIC and GENMOD, respectively.
DATA TAB86;
INPUT DOSE N R @@;
DATALINES;
1.6907 59 6 1.7242 60 13 1.7552 62 18
1.7842 56 28 1.8113 63 52 1.8369 59 53
1.8610 62 61 1.8839 60 60
;
PROC LOGISTIC DATA=TAB86 DESCENDING;
MODEL R/N=DOSE/LINK=CLOGLOG PLCL PLRL LACKFIT AGGREGATE
SCALE=DEVIANCE;


OUTPUT OUT=AA PREDICTED=PROBS;
RUN;
PROC PRINT DATA=AA; RUN;
proc gslide; run;
GOPTIONS CBACK=WHITE COLORS=(BLACK) vsize=6 hsize=6;
PROC SORT DATA=AA; BY dose; RUN;
SYMBOL2 I=spline VALUE=none HEIGHT=.75;
axis1 label=(angle=-90 rotate=90 'EXPECTED PROBS');
axis2 label=('LOG OF DOSAGE');
PROC GPLOT DATA=AA; PLOT PROBS*dose/vaxis=axis1 haxis=axis2 VREF=.5;
RUN;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF     Value    Value/DF    Pr > ChiSq
Deviance      6    3.4464      0.5744        0.7511
Pearson       6    3.2947      0.5491        0.7711

Number of unique profiles: 8


Analysis of Maximum Likelihood Estimates

                             Standard
Parameter    DF  Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1  -39.5725      3.2403      149.1487        <.0001
DOSE          1   22.0412      1.7994      150.0498        <.0001

USING PROC GENMOD:


DATA GEN;
SET TAB86;
PROC GENMOD DATA=TAB86 DESCENDING; MODEL r/n=dose/link=cloglog;
RUN;
Criteria For Assessing Goodness Of Fit

Criterion              DF     Value    Value/DF
Deviance                6    3.4464      0.5744
Pearson Chi-Square      6    3.2947      0.5491

Analysis Of Parameter Estimates

                             Standard    Chi-
Parameter    DF  Estimate       Error    Square    Pr > ChiSq
Intercept     1  -39.5723      3.2290    150.19        <.0001
DOSE          1   22.0412      1.7931    151.10        <.0001

The results suggest that the complementary log-log model, with G2 = 3.446 on 6
d.f., fits the data more adequately than either the logistic or the probit model.
The extreme-value (E-V) model fits poorly. For the E-V model, the figures in
parentheses in Table 8.6 are the observed numbers of insects that survived, that
is, (n - r), which that model tries to match. The final CLL and logistic models
for these data are given as
\[ \log[-\log(1 - \hat{\pi}_i)] = -39.5725 + 22.0412\,x_i \]
where x_i = log10(dose_i), and
\[ \ln\left(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\right) = -60.7114 + 34.2669\,x_i \]
The plot of the estimated complementary log-log model for the data in Table 8.6
is displayed in Figure 8.7.

Figure 8.7: Fitted complementary log-log model

8.3.4 Example 8.3: Status Data

The following example comes from Schork and Remington (2000). The data represent
the outcome variable HIV status (0 = no, 1 = yes), the factor variable IV
(intravenous) drug use status (0 = no, 1 = yes), and the number of sexual
partners for 25 men selected from a homeless shelter.
 ID   HIVSTAT   IVDRUG   SEXPART        ID   HIVSTAT   IVDRUG   SEXPART
  1      0         0        4           14      0         0        5
  2      0         1        4           15      1         1        9
  3      1         1        3           16      1         0       19
  4      0         0        2           17      0         0        7
  5      0         0        7           18      1         1       10
  6      1         0       12           19      0         0        5
  7      1         1        8           20      1         1        8
  8      0         0        1           21      0         0       14
  9      1         0        9           22      0         1        8
 10      0         0        5           23      1         0       14
 11      0         0        6           24      1         1        9
 12      0         1        4           25      1         1       17
 13      0         1        2

Table 8.7: Data for the HIV status example


Let the response variable Y_i for individual i be defined as:
\[ Y_i = \begin{cases} 1 & \text{if HIVSTAT is yes} \\ 0 & \text{otherwise} \end{cases} \]
Suppose the probability of a positive HIV status depends on drug, sexpart, and
the interaction between drug and sexpart, that is,
\[ \pi_i = \Pr[Y_i = 1 \mid \mathrm{drug}_i, \mathrm{part}_i, \mathrm{drug{*}part}_i]
 = \frac{\exp(\beta_0 + \beta_1\,\mathrm{drug}_i + \beta_2\,\mathrm{part}_i + \beta_3\,\mathrm{drug{*}part}_i)}
        {1 + \exp(\beta_0 + \beta_1\,\mathrm{drug}_i + \beta_2\,\mathrm{part}_i + \beta_3\,\mathrm{drug{*}part}_i)} \]  (8.10)
where π_i is the probability of the i-th individual having HIV, drug represents
the IV drug effect, part represents the effect of the number of sexual partners,
and drug*part represents the interaction effect of IV drug and number of sexual
partners. Also,


\[ \mathrm{drug}_i = \begin{cases} 1 & \text{if yes} \\ 0 & \text{otherwise} \end{cases} \]
Model (8.10) therefore becomes:
\[ \ln\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1\,\mathrm{drug}_i + \beta_2\,\mathrm{part}_i + \beta_3\,\mathrm{drug{*}part}_i \]  (8.11a)
that is,
\[ \mathrm{logit}_i = \beta_0 + \beta_1\,\mathrm{drug}_i + \beta_2\,\mathrm{part}_i + \beta_3\,\mathrm{drug{*}part}_i \]  (8.11b)

The above model is implemented in SAS software with the following program,
presented with a partial output.
data ex84;
input hiv drug sexpart @@;
datalines;
0 0 4  0 1 4  1 1 3  0 0 2  0 0 7  1 0 12  1 1 8  0 0 1  1 0 9  0 0 5
0 0 6  0 1 4  0 1 2  0 0 5  1 1 9  1 0 19  0 0 7  1 1 10  0 0 5  1 1 8
0 0 14  0 1 8  1 0 14  1 1 9  1 1 17
;
proc logistic data=ex84 descending;
model hiv=drug|sexpart/scale=none aggregate lackfit; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     13    11.5979      0.8921        0.5609
Pearson      13    10.4732      0.8056        0.6549

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       16.1069     3        0.0011
Score                  12.2969     3        0.0064
Wald                    7.5477     3        0.0563

Analysis of Maximum Likelihood Estimates

                                Standard
Parameter       DF  Estimate       Error    Chi-Square    Pr > ChiSq
Intercept        1   -5.3304      2.5223        4.4659        0.0346
DRUG             1    2.5151      3.2281        0.6070        0.4359
SEXPART          1    0.4816      0.2417        3.9707        0.0463
DRUG*SEXPART     1    0.0342      0.3867        0.0078        0.9295

The DESCENDING option in the PROC statement above tells SAS software to model
HIV=1 rather than HIV=0; that is, we wish to model those having a positive HIV
status. The deviance or G2 for this model is 11.5979 on 13 d.f. (p-value =
0.5609); this model fits the data well. The "Testing Global Null Hypothesis"
output tests the hypothesis
\[ H_0: \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{versus} \quad H_1: \text{at least one of the } \beta\text{s} \neq 0 \]
Three alternative tests are provided for this hypothesis, utilizing the
likelihood ratio, score, and Wald statistics. The likelihood ratio and score
tests indicate that we would reject H0, suggesting that at least one of the β
parameters is not equal to zero.


Examination of the parameter estimates indicates that the interaction term is
not significant and can be removed from the model, given that both drug and
sexpart are already in the model. Hence, we next fit the reduced model
\[ \mathrm{logit}(\pi_i) = \beta_0 + \beta_1\,\mathrm{drug}_i + \beta_2\,\mathrm{part}_i \]  (8.12)
The model in (8.12) is implemented in SAS software with the following program,
again with a partial output display.
set ex84; proc logistic data=ex84 descending;
model hiv=drug sexpart/scale=none aggregate lackfit rsq plcl plrl
waldcl waldrl;
units sexpart=1 2 3 drug=1;
output out=aa p=phat; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     14    11.6057      0.8290        0.6379
Pearson      14    10.3988      0.7428        0.7325

Number of unique profiles: 17


Analysis of Maximum Likelihood Estimates

                           Standard
Parameter  DF  Estimate       Error    Chi-Square    Pr > ChiSq
Intercept   1   -5.4649      2.0683        6.9816        0.0082
DRUG        1    2.7748      1.3881        3.9960        0.0456
SEXPART     1    0.4954      0.1901        6.7938        0.0091

Odds Ratio Estimates

             Point              95% Wald
Effect    Estimate     Confidence Limits
DRUG        16.035     1.056     243.551
SEXPART      1.641     1.131       2.382

Wald Confidence Interval for Adjusted Odds Ratios

Effect       Unit    Estimate    95% Confidence Limits
DRUG       1.0000      16.035      1.056      243.551
SEXPART    1.0000       1.641      1.131        2.382
SEXPART    2.0000       2.693      1.279        5.674
SEXPART    3.0000       4.420      1.446       13.514

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square    DF    Pr > ChiSq

The implementation of the model described in (8.12) gives a G2 = 11.6057 on 14
d.f.; the model fits the data very well, and the parameter estimates of β1 and
β2 are both significant. The analysis shows that the odds of a positive HIV
status are 16.035 times higher for IV drug users than for those not on IV drugs
when the effect of sexual partners is controlled. Similarly, the odds increase
by a factor of 1.641 for a unit increase in the number of sexual partners. The
odds increase by 2.693 = e^{2×0.4954} = 1.6413² and by 4.420 = e^{3×0.4954} =
1.6413³ for 2-unit and 3-unit increases in the number of sexual partners,
respectively. The UNITS option in the model statement generates these additional
results along with their Wald-based confidence intervals. The Hosmer and
Lemeshow goodness-of-fit test indicates that our model is adequate.
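The UNITS results are simply powers of the single-unit odds ratio, as the
following sketch (using the SEXPART estimate from the output above) confirms.

data or_sexpart;
   b2 = 0.4954;                       /* sexpart estimate from the output */
   do units = 1 to 3;
      odds_ratio = exp(units*b2);     /* approx. 1.641, 2.693, 4.420      */
      output;
   end;
run;
proc print data=or_sexpart; run;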


We also obtain the expected probabilities π̂_i of having a positive HIV status
based on the estimated logistic model:
\[ \ln\left(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\right) = -5.4649 + 2.7748\,\mathrm{drug}_i + 0.4954\,\mathrm{part}_i \]  (8.13)
These expected probabilities are presented below. They are generated in the
program with the OUTPUT statement into a file named aa, the contents of which
are printed below with a SAS software PRINT statement.
proc print data=aa; run;

Obs   HIV   DRUG   SEXPART   LEVEL       PHAT
  1    0      0       4        1      0.02979
  2    0      1       4        1      0.32991
  3    1      1       3        1      0.23077
  4    0      0       2        1      0.01127
  5    0      0       7        1      0.11950
  6    1      0      12        1      0.61770
  7    1      1       8        1      0.78125
  8    0      0       1        1      0.00690
  9    1      0       9        1      0.26769
 10    0      0       5        1      0.04797
 11    0      0       6        1      0.07638
 12    0      1       4        1      0.32991
 13    0      1       2        1      0.15455
 14    0      0       5        1      0.04797
 15    1      1       9        1      0.85425
 16    1      0      19        1      0.98106
 17    0      0       7        1      0.11950
 18    1      1      10        1      0.90583
 19    0      0       5        1      0.04797
 20    1      1       8        1      0.78125
 21    0      0      14        1      0.81314
 22    0      1       8        1      0.78125
 23    1      0      14        1      0.81314
 24    1      1       9        1      0.85425
 25    1      1      17        1      0.99677

The expected probabilities are very much consistent with the observed data. For
instance, among those who were HIV positive in the sample, the expected
probabilities are quite high except for individuals 3 and 9. Similarly,
individuals 21 and 22 have high expected probabilities but were HIV negative. We
therefore decided to see to what degree the observed HIV status agrees with the
predicted probabilities if we classify an individual as HIV positive whenever
his expected probability is at least 0.5, that is, if π̂_i ≥ 0.5. We must
confess here that this cutoff is somewhat subjective. We present below the
result of McNemar's test (test of agreement) for these data based on this
classification.
data new;
set aa;
predicts=(phat ge 0.5);
proc freq data=new; tables hiv*predicts/norow nocol nopercent agree;
run;
Statistics for Table of HIV by PREDICTS

McNemar's Test

Statistic (S)    0.0000
DF                    1
Pr > S           1.0000


Simple Kappa Coefficient

Kappa                    0.6753
ASE                      0.1487
95% Lower Conf Limit     0.3840
95% Upper Conf Limit     0.9667

Sample Size = 25

Table of HIV by PREDICTS

               PREDICTS
HIV          0         1     Total
0           12         2        14
1            2         9        11
Total       14        11        25

The estimate of the agreement statistic κ and McNemar's test of agreement
indicate very strong agreement between the observed HIV status and the expected
HIV status based on the model employed.
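The kappa value reported by PROC FREQ can be verified by hand from the 2 x 2
agreement table above; the following DATA step is a sketch of that computation.

data kappa_check;
   n  = 25;
   po = (12 + 9)/n;                  /* observed agreement (diagonal)  */
   pe = (14*14 + 11*11)/(n*n);       /* agreement expected by chance   */
   kappa = (po - pe)/(1 - pe);       /* = 0.6753, as reported above    */
   put po= pe= kappa=;
run;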
The graph of the predicted probabilities versus the number of sexual partners
for the two levels of IV drug use is presented below with the accompanying SAS
software statements.
set aa;
proc gslide; run;
GOPTIONS CBACK=WHITE COLORS=(BLACK) vsize=6 hsize=6;
PROC SORT DATA=AA; BY sexpart; RUN;
SYMBOL1 I=spline VALUE=+ HEIGHT=.75;
SYMBOL2 I=spline VALUE=none HEIGHT=.75;
axis1 label=(angle=-90 rotate=90 'EXPECTED PROBS');
axis2 label=('NO OF SEXUAL PARTNERS');
PROC GPLOT DATA=AA;
PLOT Phat*sexpart=drug/vaxis=axis1 haxis=axis2;
RUN;

Figure 8.8: Predicted probabilities plot from model (8.13)

8.4 Example 8.4: Relative Potency in Bioassays

We give below an example relating to the fit of a linear logistic model when two
drugs are administered to mice. The following data are an example in which we
are interested in the comparison of two or more groups after adjusting for some
factor. In this case, the groups are the two drugs A and B, the factor is the
dosage level, and the response is categorical (that is, binary). This of course
leads to what is known as a quantal assay, in analogy to covariance analysis
where the response variable is continuous. In the example below, the acute
toxicities of two drugs A and B were tested by intravenous injection into mice.
Each drug was given at different doses to four groups of 20 mice, and deaths
were recorded 5 minutes later.
        Dose    Coded level    Number dead    Proportion    Logit
Drug    x_i         s_i            r_i           p_i
A        2           0              2           0.10        -2.20
A        4           1              9           0.45        -0.20
A        8           2             14           0.70         0.85
A       16           3             19           0.95         2.94
B        0.3         0              1           0.05        -2.94
B        0.6         1              6           0.30        -0.85
B        1.2         2             14           0.70         0.85
B        2.4         3             17           0.85         1.73

Table 8.8: Data for Example 8.4


It has been established (see Collett, 1991) that responses of subjects to drugs tend
to vary proportionately to the log dosage rather than to dose levels themselves, and
hence an arbitrary log-scale can be established.
In the example above, there are four dosage levels each for drugs A and B
respectively. Of most interest to us is to compare the relative potency of the drugs
after adjusting for the various dosage levels.
To answer the above question, first let y^- denote the number of deaths in the
j-ih dose for the i-ih drug and n^ the corresponding sample size (n^- = 20 for all
i and j ) .
Let P (death | i-ih drug, j-ih dose) = TT; > 0. Then a linear logistic model of
the following form fitted to the data is considered for the two drugs (that is, we fit
two separate logistic regression lines, one for each drug):
loglt (iTij) = (30i +

where i = 1, 2; j 1, 2, 3, 4; and dose^- = In dose for the i-ih drug and j-ih level of
dose.
We are also interested to know if the dosage effect is the same for the two
groups and, given that this is the case, is there a significant difference between the
two drugs? To answer this, let us consider fitting a model with drugs as a factor
variable with two levels (A and B) and the dosage levels as the covariate. Because
the dosage levels are not to the same scale, it would be necessary to transform the
dosage levels by either using the natural or common logarithmic transformations

8.4. EXAMPLE 8.4: RELATIVE POTENCY IN BIOASSAYS

305

to ensure uniformity from one drug to the other. We chose to use the common
logarithmic transformation, that is, In(dosej).
Let us first examine how PROC LOGISTIC and PROC GENMOD code factor variables
that are invoked with the CLASS statement. If we let drug denote the drug dummy
variable, then in PROC LOGISTIC (effect coding), drug takes the values:
\[ \mathrm{drug} = \begin{cases} 1 & \text{if drug A} \\ -1 & \text{if drug B} \end{cases} \]
Drug A is coded 1 since, alphabetically, it comes before B. In PROC GENMOD, drug
is coded as:
\[ \mathrm{drug} = \begin{cases} 1 & \text{if drug A} \\ 0 & \text{if drug B} \end{cases} \]
Again, drug A is coded 1 because of its alphabetical order relative to drug B.
We can achieve the same coding scheme from PROC LOGISTIC by specifying
class drug (ref=last)/param=ref; in the CLASS statement, as sketched below.
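As an illustration, the statements below sketch how this reference coding would
be requested in PROC LOGISTIC for the present example; the model statement shown
is just one of the fits considered later.

proc logistic data=tab88 order=data;
   class drug (ref=last) / param=ref;   /* drug: A=1, B=0, as in GENMOD */
   model r/n = drug|dose / scale=none aggregate;
run;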
All the models implemented for this example are shown in the statements below:
data tab88;
input drug $ x r n @@;
dose=log10(x);
datalines;
A 2 2 20 A 4 9 20 A 8 14 20 A 16 19 20
B 0.3 1 20 B 0.6 6 20 B 1.2 14 20 B 2.4 17 20
;
proc logistic order=data;
class drug;
(i)   model r/n=drug/scale=none aggregate; run;
(ii)  model r/n=drug|dose/scale=none aggregate; run;
(iii) model r/n=drug dose/scale=none aggregate; run;
(iv)  model r/n=dose drug*dose/scale=none aggregate; run;
(v)   model r/n=dose/scale=none aggregate; run;
*ditto for proc genmod;

(a) The model in (i) is:
\[ \mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_1\,\mathrm{drug}_i, \quad i = 1, 2 \]
and tests the hypothesis of no drug difference. For this model, we have
G2 = 74.184 on 6 d.f. Obviously, this model is untenable, and thus the effects
of the two drugs are significantly different.
set tab88;
proc logistic order=data; class drug;
model r/n=drug/scale=none aggregate=(dose); run;

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance      6    74.1835     12.3639        <.0001
Pearson       6    64.1999     10.7000        <.0001

(b) The second fit statement fits two separate regression lines of the form
\[ \mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_{1i}\,\mathrm{drug}_i + \beta_{2}\,\mathrm{dose}_j + \beta_{3ij}\,(\mathrm{drug}*\mathrm{dose})_{ij} \]  (8.14)
where
drug_i is the effect of the i-th drug,
dose_j is the effect of the j-th dosage, and
(drug*dose)_ij is the interaction between drug and dose levels.
The model has a G2 = 1.6398 on 4 d.f.; that is, the model is adequate. The SAS
software program and partial outputs are displayed for both PROC LOGISTIC and
PROC GENMOD.
set tab88;
proc logistic order=data;
class drug; model r/n=drug|dose/scale=none aggregate; run;
proc genmod; class drug; model r/n=drug|dose/dist=b link=logit; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF     Value    Value/DF    Pr > ChiSq
Deviance      4    1.6398      0.4100        0.8016
Pearson       4    1.6412      0.4103        0.8014

Analysis of Maximum Likelihood Estimates

                               Standard
Parameter      DF  Estimate       Error    Chi-Square    Pr > ChiSq
Intercept       1   -1.6678      0.4346       14.7279        0.0001
drug a          1   -1.8623      0.4346       18.3629        <.0001
dose            1    5.0969      0.7608       44.8766        <.0001
dose*drug a     1    0.0411      0.7608        0.0029        0.9569

GENMOD:

Criteria For Assessing Goodness Of Fit

Criterion              DF     Value    Value/DF
Deviance                4    1.6398      0.4100
Pearson Chi-Square      4    1.6412      0.4103

Analysis Of Parameter Estimates

                               Standard    Wald 95%                Chi-
Parameter      DF  Estimate       Error    Confidence Limits       Square    Pr > ChiSq
Intercept       1    0.1945      0.2928    -0.3794     0.7684        0.44        0.5066
drug a          1   -3.7246      0.8692    -5.4282    -2.0211       18.36        <.0001
dose            1    5.0558      1.0667     2.9650     7.1465       22.46        <.0001
dose*drug a     1    0.0822      1.5217    -2.9002     3.0647        0.00        0.9569

The estimated overall regression models under PROC LOGISTIC and PROC GENMOD are,
respectively,
\[ \mathrm{logit}(\hat{\pi}_{ij}) = -1.6678 - 1.8623\,\mathrm{drug}_i + 5.0969\,\mathrm{dose}_j + 0.0411\,(\mathrm{drug}*\mathrm{dose})_{ij} \]  (8.15)
\[ \mathrm{logit}(\hat{\pi}_{ij}) = 0.1945 - 3.7246\,\mathrm{drug}_i + 5.0558\,\mathrm{dose}_j + 0.0822\,(\mathrm{drug}*\mathrm{dose})_{ij} \]  (8.16)
where drug takes the values (1, -1) and (1, 0) for drugs A and B, respectively,
in models (8.15) and (8.16). These are the default coding schemes in PROC
LOGISTIC and PROC GENMOD, respectively.


Notice that the parameter estimate for drug in PROC LOGISTIC is half that from
PROC GENMOD. This is also true for the interaction term, as well as for their
corresponding standard errors. Specifically, if we adopt the PROC LOGISTIC
output, this model reduces to the two separate estimated regression lines
\[ \mathrm{logit}(\hat{\pi}) = \begin{cases} -3.5301 + 5.138\,\mathrm{dose}_j & \text{for drug A (drug = 1)} \\ \phantom{-}0.1945 + 5.056\,\mathrm{dose}_j & \text{for drug B (drug = -1)} \end{cases} \]
Although this model fits the data very well, further examination of the
parameters shows that the interaction term β3 is not significant (p-value =
0.9569). Hence this term can be removed from the model, leading to parallel
response models.
(c) The third model fits parallel regression lines with a common slope but
different intercepts; that is,
\[ \mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_{1i}\,\mathrm{drug}_i + \beta_2\,\mathrm{dose}_j \]
This model gives a G2 = 1.6427 on 5 degrees of freedom. Again, the SAS software
program for implementing this and a partial output from PROC LOGISTIC are
presented below:
set tab88;
proc logistic order=data;
class drug; model r/n=drug dose/scale=none aggregate; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF     Value    Value/DF    Pr > ChiSq
Deviance      5    1.6427      0.3285        0.8960
Pearson       5    1.6496      0.3299        0.8952

Analysis of Maximum Likelihood Estimates

                           Standard
Parameter  DF  Estimate       Error    Chi-Square    Pr > ChiSq
Intercept   1   -1.6522      0.3232       26.1349        <.0001
drug a      1   -1.8486      0.3520       27.5840        <.0001
dose        1    5.0964      0.7607       44.8826        <.0001

This model fits our data well and indicates that there is a common dose effect
for the two drugs, which is another way of stating the parallelism hypothesis.
The model has the estimated response function
\[ \mathrm{logit}(\hat{\pi}_{ij}) = -1.6522 - 1.8486\,\mathrm{drug}_i + 5.0964\,\mathrm{dose}_j \]
where drug and dose are as defined previously.
The "relative potency" of the two drugs from this analysis under PROC LOGISTIC
(effect coding scheme) is obtained as
\[ \text{Relative potency} = 10^{-2\hat{\beta}_1/\hat{\beta}_2} = 10^{0.7253} = 5.313 \]
It is not too difficult to see that the individual estimated regression lines are:
\[ \mathrm{logit}(\hat{\pi}) = \begin{cases} -3.5008 + 5.0964\,\mathrm{dose}_j & \text{if drug A} \\ \phantom{-}0.1964 + 5.0964\,\mathrm{dose}_j & \text{if drug B} \end{cases} \]

The relative potency from these individual models can similarly be obtained as
\[ 10^{-(-3.5008 - 0.1964)/5.0964} = 10^{3.6972/5.0964} = 10^{0.7255} = 5.315 \]
If PROC GENMOD is employed, the relative potency is computed from the overall
estimated equation as simply 10^{-β̂1/β̂2}. With PROC GENMOD, β̂1 = -3.6972 and
β̂2 = 5.0964; hence, the relative potency is again 10^{0.7255} = 5.315.
(d) The fourth fit statement fits a model with different slopes and a common
intercept; that is, the model is given by:
\[ \mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_2\,\mathrm{dose}_j + \beta_{3ij}\,(\mathrm{drug}*\mathrm{dose})_{ij} \]
The model has a G2 = 26.634 on 5 d.f. and is inadequate.

(e) The final fit statement (v) fits the covariate dose ignoring the grouping,
to see whether a single line fits the data; that is, the model is
\[ \mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_2\,\mathrm{dose}_j \]
Again, the corresponding G2 = 39.262 on 6 d.f. This model also does not fit the
data.
From the above analyses, the best models for the data are the parallel
regression models fitted in (c) above. A sketch of the parallel models is
displayed in Figure 8.9.

Figure 8.9: Predicted probabilities plot

The analysis clearly indicates that there are significant differences between
the two drugs, as measured by their relative potency. The above analysis for the
data in Table 8.8 can therefore be summarized as follows:
(i) A linear logistic model seems appropriate for the data.
(ii) The dosage effect is the same for both drugs.
(iii) Drug B is 5.313 times more potent than drug A.

8.5 Analyzing Data Arising from Cohort Studies

A cohort study involves selecting a sample of individuals free of the disease
under investigation. The individuals in the sample are then stratified according
to the exposure factors of interest and followed up for a given period of time.
Each individual is then classified according to whether or not he or she has
developed the disease being studied. The relationship between the probability of
disease occurrence and the exposure factors is then modeled. Such a study has
also been characterized as a prospective study.

8.5.1 Example 8.5: The Framingham Study Data

The data in Table 8.9 come from a cohort study where the relationship between
the probability of disease occurrence and the exposure factor is to be modeled.
Framingham is an industrial town located some 20 miles west of Boston. In 1948,
a cohort study was begun with the broad aim of determining which of a number of
potential risk factors are related to the occurrence of coronary heart disease
(CHD). At the start of the study, a large proportion of the town's inhabitants
were examined for the presence of CHD. Measurements were also made on a number
of other variables, including age, serum cholesterol level, systolic blood
pressure, smoking history, and the result of an electrocardiogram. Those
individuals found to be free of CHD at that time were followed up for 12 years,
and those who developed CHD during that period were identified. The resulting
data set consisted of this binary response variable and information on the risk
factors of 2187 men and 2669 women aged between 30 and 62. The summary data
below are adapted from Truett, Cornfield, and Kannel (1967) and relate to the
initial serum cholesterol level (in units of mg/100 ml) of these individuals,
cross-classified according to their age and sex.
                            Serum Cholesterol Level
Sex       Age Group    < 190     190-219    220-249    >= 250
Male      30-49       13/340     18/408     40/421     57/362
          50-62       13/123     33/176     35/174     49/183
Female    30-49        6/542      5/552     10/412     18/357
          50-62        9/58      12/135     21/218     48/395

Table 8.9: Proportions of cases of CHD, cross-classified by age, sex, and
initial cholesterol level
Source: Journal of Chronic Diseases, 20, 511-524.
In the data in Table 8.9, the ratio 13/340 for males aged 30 to 49, for
instance, denotes that of the 340 individuals in this category, 13 had CHD by
the end of the study. Our goal here is to model the extent to which CHD is
associated with initial serum cholesterol level, after adjusting for the effects
of age group and sex (confounders), and to determine whether the degree of
association is similar for each sex and age group. The data have also been
analyzed by Collett (1991). The exposure or risk factor here is the serum
cholesterol level, and the disease is coronary heart disease. The model of
interest is given by:
\[ \mathrm{logit}(p_{ijk}) = \beta_0 + \beta_{1i}\,\mathrm{sex}_i + \beta_{2j}\,\mathrm{age}_j + \beta_{3k}\,\mathrm{chol}_k + \beta_{4ij}\,(\mathrm{sex}*\mathrm{age})_{ij} + \beta_{5jk}\,(\mathrm{age}*\mathrm{chol})_{jk} + \beta_{6ik}\,(\mathrm{sex}*\mathrm{chol})_{ik} + \text{higher order terms} \]  (8.17)
where
sex_i is the effect of the i-th sex,
age_j is the effect of the j-th age group, and
chol_k is the effect of the k-th level of cholesterol.
The last three terms in (8.17) are two-factor interaction terms. Note that
because sex and age are confounders, we must include them, and possibly their
interaction (if significant), in the model. We give below the SAS software
statements and the corresponding partial output for the analysis of these data.
corresponding partial output for the analysis of the above data.
data cohort;
do sex=1 to 2; do age=1 to 2; do chol=1 to 4;
input r n @@; output; end; end; end;
datalines;
13 340 18 408 40 421 57 362 13 123 33 176 35 174 49 183
6 542 5 552 10 412 18 357 9 58 12 135 21 218 48 395
;
proc logistic; class sex age chol;
model r/n=sex|age|chol/selection=forward details;
run;
Type III Analysis of Effects

                        Wald
Effect       DF   Chi-Square    Pr > ChiSq
sex           1      78.2833        <.0001
age           1     109.9376        <.0001
sex*age       1       5.5202        0.0188
chol          3      55.8450        <.0001
age*chol      3      14.5409        0.0023

Residual Chi-Square Test

Chi-Square    DF    Pr > ChiSq
    8.8651     6        0.1813

Analysis of Effects Not in the Model

                       Score
Effect       DF   Chi-Square    Pr > ChiSq
sex*chol      3       4.4495        0.2168

NOTE: No (additional) effects met the 0.05 significance level for entry into
the model.
Summary of Forward Selection

        Effect              Number         Score
Step    Entered       DF    In        Chi-Square    Pr > ChiSq
1       age            1    1           142.8944        <.0001
2       sex            1    2            82.3854        <.0001
3       chol           3    3            62.2124        <.0001
4       age*chol       3    4            13.3407        0.0040
5       sex*age        1    5             5.5649        0.0183

We employed PROC LOGISTIC to conduct a forward selection procedure for the model
above. The significant effects and interactions are those displayed in the
summary of the forward selection procedure; thus, other higher-order terms are
not needed in the model. A similar result from PROC GENMOD gives the Type 3
analysis displayed below:



set cohort; proc genmod; class sex age chol;
model r/n=sex|age|chol/dist=b type3; run;

LR Statistics For Type 3 Analysis

                       Chi-
Source          DF    Square    Pr > ChiSq
sex              1     55.93        <.0001
age              1    114.31        <.0001
sex*age          1      9.50        0.0021
chol             3     47.44        <.0001
sex*chol         3      3.94        0.2681  ns
age*chol         3     15.92        0.0012
sex*age*chol     3      3.47        0.3248  ns

Based on the above initial analysis, therefore, our reduced model is now of the form:
\[ \mathrm{logit}(p_{ijk}) = \beta_0 + \beta_{1i}\,\mathrm{sex}_i + \beta_{2j}\,\mathrm{age}_j + \beta_{3k}\,\mathrm{chol}_k + \beta_{4ij}\,(\mathrm{sex}*\mathrm{age})_{ij} + \beta_{5jk}\,(\mathrm{age}*\mathrm{chol})_{jk} \]  (8.18)
We present the SAS software program and output from PROC LOGISTIC for
implementing the reduced model in (8.18).
set cohort;
proc logistic; class sex (ref=last) age (ref=last) chol (ref=last)/param=ref;
model r/n=sex|age age|chol/scale=none aggregate covb; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF     Value    Value/DF    Pr > ChiSq
Deviance      6    7.5847      1.2641        0.2701
Pearson       6    8.8651      1.4775        0.1813

Number of unique profiles: 16


Type III Analysis of Effects

                       Wald
Effect       DF   Chi-Square    Pr > ChiSq
sex           1      26.1780        <.0001
age           1      23.6540        <.0001
sex*age       1       5.5202        0.0188
chol          3       8.3190        0.0399
age*chol      3      14.5409        0.0023

Analysis of Maximum Likelihood Estimates

                                 Standard        Wald
Parameter        DF  Estimate       Error  Chi-Square    Pr > ChiSq
Intercept         1   -1.8972      0.1309    210.0556        <.0001
sex       1       1    0.7907      0.1545     26.1780        <.0001
age       1       1   -1.1161      0.2295     23.6540        <.0001
sex*age   1 1     1    0.5718      0.2434      5.5202        0.0188
chol      1       1   -0.6667      0.2611      6.5198        0.0107
chol      2       1   -0.3806      0.2019      3.5526        0.0595
chol      3       1   -0.3009      0.1856      2.6291        0.1049
age*chol  1 1     1   -0.8776      0.3715      5.5812        0.0182
age*chol  1 2     1   -1.1075      0.3181     12.1190        0.0005
age*chol  1 3     1   -0.3191      0.2676      1.4223        0.2330

Estimated Covariance Matrix

Variable     Intercept        sex1        age1    sex1age1       chol1       chol2
Intercept     0.017135    -0.01034    -0.01714    0.010337    -0.00888    -0.00991
sex1          -0.01034    0.023886    0.010337    -0.02389    -0.00874    -0.00636
age1          -0.01714    0.010337    0.052665    -0.03699    0.008881    0.009908
sex1age1      0.010337    -0.02389    -0.03699    0.059229    0.008737    0.006364
chol1         -0.00888    -0.00874    0.008881    0.008737     0.06818     0.01499
chol2         -0.00991    -0.00636    0.009908    0.006364     0.01499    0.040777
chol3          -0.0111     -0.0036    0.011102    0.003605    0.013981    0.013622
age1chol1     0.008881    0.008737    -0.02581    -0.00674    -0.06818    -0.01499
age1chol2     0.009908    0.006364    -0.02596    -0.00553    -0.01499    -0.04078
age1chol3     0.011102    0.003605    -0.02596    -0.00436    -0.01398    -0.01362

Estimated Covariance Matrix (continued)

Variable         chol3   age1chol1   age1chol2   age1chol3
Intercept      -0.0111    0.008881    0.009908    0.011102
sex1           -0.0036    0.008737    0.006364    0.003605
age1          0.011102    -0.02581    -0.02596    -0.02596
sex1age1      0.003605    -0.00674    -0.00553    -0.00436
chol1         0.013981    -0.06818    -0.01499    -0.01398
chol2         0.013622    -0.01499    -0.04078    -0.01362
chol3          0.03444    -0.01398    -0.01362    -0.03444
age1chol1     -0.01398     0.13801    0.030461    0.029362
age1chol2     -0.01362    0.030461    0.101201    0.029029
age1chol3     -0.03444    0.029362    0.029029    0.071599
This model fits the data with a deviance of 7.5847 on 6 degrees of freedom.

8.5.2 Model Parameter Interpretations

Since the interaction term (age*chol) is significant, we need to concentrate on
it rather than on the main effects of age and cholesterol level. It is important
to note here that the main effect of a variable that is also in a two-way
interaction can be interpreted as the effect of that variable when the other
variable is 0. The adjusted log-parameter estimates for chol and for (age*chol)
under PROC LOGISTIC are given below, respectively.

                   Cholesterol levels (k)
Parameter       1          2          3         4
β̂3k          -0.6667    -0.3806    -0.3009    0.0000
Odds1          0.5134     0.6835     0.7402    1.0000
Odds2          1.0000     1.3313     1.4418    1.9478

Log-parameters β̂5jk for the interaction term

                     Cholesterol levels (k)
Age Group (j)      1          2          3         4
30-49 (1)       -0.8776    -1.1075    -0.3191    0.0000
50-62 (2)        0.0000     0.0000     0.0000    0.0000

Here, Odds1 and Odds2 refer, respectively, to the odds based on referencing the
last category (GENMOD, or LOGISTIC with the REF option) and the odds from the
perspective of referencing the first (lowest) category. We have asked PROC
LOGISTIC to code all variables with the reference cell approach (this is
accomplished above with REF=LAST in the CLASS statement); PROC GENMOD uses this
coding scheme in any case. Thus here, the variable age is coded as:
\[ \mathrm{age} = \begin{cases} 1 & \text{if 30-49} \\ 0 & \text{if 50-62} \end{cases} \]


Thus, the cholesterol (chol) coefficients represent the effect of cholesterol
when age = 0, that is, when the individual is aged between 50 and 62. For
individuals aged 30-49, we add the interaction coefficients to those of chol.
These sums are displayed below.

             Cholesterol levels (k)
Age           1          2          3         4
30-49      -1.5443    -1.4881    -0.6200    0.0000
50-62      -0.6667    -0.3806    -0.3009    0.0000
Here, for instance, -1.5443 = -0.6667 + (-0.8776) and -0.6200 = -0.3009 +
(-0.3191). That is, for individuals in the j-th age group, the log-odds λ_jk are
computed for j = 1, 2 and k = 1, 2, 3, 4 as:
\[ \lambda_{jk} = \hat{\beta}_{3k} + \hat{\beta}_{5jk} \]
That is, under model (8.18), for individuals in the j-th age group, the log odds
ratio for an individual exposed to level k of cholesterol, relative to someone
exposed to level k', is estimated by:
\[ \lambda = \left(\hat{\beta}_{3k} + \hat{\beta}_{5jk}\right) - \left(\hat{\beta}_{3k'} + \hat{\beta}_{5jk'}\right) \]  (8.19)
The expression in (8.19) can be written more succinctly as:
\[ \lambda = (\hat{\beta}_{3k} - \hat{\beta}_{3k'}) + (\hat{\beta}_{5jk} - \hat{\beta}_{5jk'}) \]  (8.20)
Thus for comparisons among the 50-62 age group, β̂5jk = β̂5jk' = 0, and the
relative log odds reduce to
\[ \lambda = \hat{\beta}_{3k} - \hat{\beta}_{3k'} \]  (8.21)
Similarly, for comparisons among persons aged between 30 and 49, the relative
log odds are as given in (8.20).
Exponentiating these log-odds, we have the following estimated odds ratios
relative to level 4 (GENMOD) and to level 1, respectively.

                        Cholesterol levels (k)
Age       Odds        1          2          3          4
30-49     Odds1     0.2135     0.2258     0.5379     1.0000
          Odds2     1.0000     1.0576     2.5194     4.6838
50-62     Odds1     0.5134     0.6835     0.7402     1.0000
          Odds2     1.0000     1.3313     1.4418     1.9478

The above table shows that the relative risk of CHD increases more rapidly with
increasing initial serum cholesterol level for persons in the 30-49 age group
than for persons in the 50-62 age group. Further, the relative risk of CHD for
persons with serum cholesterol at level 2 is greater for those in the older age
group, whereas from level 3 upward the relative risk of CHD is greater for
individuals in the younger age group.
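The entries of this table can be regenerated directly from the parameter
estimates; the DATA step below is a sketch that reproduces the odds referenced
to level 4.

data rel_odds;
   array b3{4} _temporary_ (-0.6667 -0.3806 -0.3009 0);   /* chol effects    */
   array b5{4} _temporary_ (-0.8776 -1.1075 -0.3191 0);   /* age*chol, 30-49 */
   do k = 1 to 4;
      odds_50_62 = exp(b3{k});           /* Odds1 row for ages 50-62 */
      odds_30_49 = exp(b3{k} + b5{k});   /* Odds1 row for ages 30-49 */
      output;
   end;
run;
proc print data=rel_odds; run;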
Suppose we also wish to obtain confidence intervals for the true odds ratios.
Note, for example, that the log odds of CHD occurring in persons aged 50-62 with
initial cholesterol level 2, relative to those with initial cholesterol level 1,
is estimated using (8.21) by:
\[ \hat{\beta}_{32} - \hat{\beta}_{31} = -0.3806 - (-0.6667) = 0.2861 \]
The estimated variance from the variance-covariance matrix above is given by
\[ \mathrm{Var}\{\hat{\beta}_{32} - \hat{\beta}_{31}\} = \mathrm{Var}(\hat{\beta}_{31}) + \mathrm{Var}(\hat{\beta}_{32}) - 2\,\mathrm{Cov}(\hat{\beta}_{31}, \hat{\beta}_{32}) = 0.06818 + 0.04078 - 2(0.01499) = 0.07898 \]
These are given, respectively, by the parameter variances and covariance of
chol1 and chol2 in the SAS software output above. Consequently, a 95% confidence
interval for the difference in the log odds is computed as:
\[ 0.2861 \pm 1.96\sqrt{0.07898} = 0.2861 \pm 0.5508 = [-0.2647, 0.8369] \]
The corresponding interval for the true odds ratio is (0.77, 2.31). This
interval includes 1, which suggests that the risk of CHD occurring in persons
aged 50-62 is not significantly different for those with initial serum
cholesterol levels less than 190 mg/100 ml compared to those with cholesterol
levels between 190 and 219 mg/100 ml. We can implement the above in SAS software
using PROC LOGISTIC with the following statements, since this contrast from
(8.21) is β32 - β31, with k = 2 and k' = 1. We present only the results
pertaining to the contrast statement below.
set cohort; proc logistic;
class sex (ref=last) age (ref=last) chol (ref=last)/param=ref;
model r/n=sex|age age|chol/scale=none aggregate;
contrast '2 vs 1 (50-62)' chol -1 1 0 0/estimate=both;
run;
Contrast Test Results

                              Wald
Contrast          DF    Chi-Square    Pr > ChiSq
2 vs 1 (50-62)     1        1.0365        0.3086

Contrast Rows Estimation and Testing Results

                                       Standard             Lower      Upper
Contrast          Type   Row  Estimate    Error    Alpha    Limit      Limit
2 vs 1 (50-62)    PARM     1    0.2861   0.2810     0.05   -0.2647    0.8369
2 vs 1 (50-62)    EXP      1    1.3312   0.3741     0.05    0.7674    2.3093

                                    Wald
Contrast          Type   Row  Chi-Square    Pr > ChiSq
2 vs 1 (50-62)    PARM     1      1.0365        0.3086
2 vs 1 (50-62)    EXP      1      1.0365        0.3086

The SAS software program requests the reference cell coding scheme. The CONTRAST
statement allows us to formally estimate the relative odds of cholesterol level
2 to cholesterol level 1 for individuals aged 50-62. The log contrast estimate
of 0.2861 agrees with our earlier result, and the actual odds ratio is 1.3312
with a 95% C.I. of (0.7674, 2.3093). Similarly, the C.I. for persons aged 30-49
with initial cholesterol levels of, say, 3 and 1 can also be obtained using
(8.20). This is given by (β̂33 - β̂31) + (β̂513 - β̂511), since j = 1. Again,
this is implemented in SAS software with the following program and a partial
output.


set cohort; proc logistic;
class sex (ref=last) age (ref=last) chol (ref=last)/param=ref;
model r/n=sex|age age|chol/scale=none aggregate;
contrast '3 vs 1 (30-49)' chol -1 0 1 0
age*chol -1 0 1 0 /estimate=both; run;
Contrast Test Results

                              Wald
Contrast          DF    Chi-Square    Pr > ChiSq
3 vs 1 (30-49)     1       11.2089        0.0008

Contrast Rows Estimation and Testing Results

                                       Standard             Lower      Upper
Contrast          Type   Row  Estimate    Error    Alpha    Limit      Limit
3 vs 1 (30-49)    PARM     1    0.9243   0.2761     0.05    0.3832    1.4655
3 vs 1 (30-49)    EXP      1    2.5202   0.6958     0.05    1.4670    4.3296

                                    Wald
Contrast          Type   Row  Chi-Square    Pr > ChiSq
3 vs 1 (30-49)    PARM     1     11.2089        0.0008
3 vs 1 (30-49)    EXP      1     11.2089        0.0008

The estimate of 0.9243 agrees with that obtained from the table above, namely,
-0.6200 - (-1.5443) = 0.9243. The standard error, and hence confidence
intervals, can be obtained accordingly. Similarly, for a given level of
cholesterol, the true odds and corresponding confidence intervals for comparing
the two age groups at fixed cholesterol levels 1 and 4 (k = 1 and 4,
respectively) are implemented with the following CONTRAST statements in PROC
LOGISTIC.
set cohort; proc logistic;
class sex (ref=last) age (ref=last) chol (ref=last)/param=ref;
model r/n=sex|age age|chol/scale=none aggregate;
contrast 'age1 vs age2 at k=1' age 1
         age*chol 1 0 0 /estimate=both;
contrast 'age1 vs age2 at k=4' age 1 /estimate=both;
run;
Contrast Rows Estimation and Testing Results

                                             Standard             Lower      Upper
Contrast               Type   Row  Estimate    Error    Alpha     Limit      Limit
age1 vs age2 at k=1    PARM     1   -1.9938   0.3729     0.05   -2.7246    -1.2629
age1 vs age2 at k=1    EXP      1    0.1362   0.0508     0.05    0.0656     0.2828
age1 vs age2 at k=4    PARM     1   -1.1161   0.2295     0.05   -1.5659    -0.6663
age1 vs age2 at k=4    EXP      1    0.3275   0.0752     0.05    0.2089     0.5136

                                          Wald
Contrast               Type   Row   Chi-Square    Pr > ChiSq
age1 vs age2 at k=1    PARM     1      28.5863        <.0001
age1 vs age2 at k=1    EXP      1      28.5863        <.0001
age1 vs age2 at k=4    PARM     1      23.6540        <.0001
age1 vs age2 at k=4    EXP      1      23.6540        <.0001

We can obtain true confidence intervals for all the other odds ratios in a similar
manner.


We can also employ PROC GENMOD to estimate some of the comparison odds ratios
produced above. For instance, the contrast '3 vs 1 (30-49)' can be implemented
as:
set cohort;
proc genmod;
class sex age chol;
model r/n=sex|age age|chol/dist=b type3;
contrast '3 vs 1 (30-49)' chol -1 0 1 0
age*chol -1 0 1 0 /wald;
estimate '3 vs 1 (30-49)' chol -1 0 1 0
age*chol -1 0 1 0 /exp; run;
Contrast Estimate Results

                                  Standard
Label                  Estimate      Error    Alpha    Confidence Limits
3 vs 1 (30-49)           0.9245     0.2761     0.05    0.3833    1.4656
Exp(3 vs 1 (30-49))      2.5206     0.6960     0.05    1.4671    4.3303
Contrast Results

                         Chi-
Contrast          DF    Square    Pr > ChiSq    Type
3 vs 1 (30-49)     1     11.21        0.0008    Wald

8.5.3 Further Analysis

The pattern of the estimated odds ratios, and hence of the parameter values,
suggests that there is a linear trend in the effect of cholesterol level. We
explore this further by treating the chol variable as a continuous variable with
integer scores 1, 2, 3, 4. This allows us to fit a model with a linear effect of
cholesterol level to our data. Implementing this in SAS software with PROC
GENMOD or PROC LOGISTIC, we have the following output:
set cohort; proc genmod; class sex age;
model r/n=sex|age chol|age/dist=b type3; run;
proc logistic;
class sex (ref=last) age (ref=last)/param=ref;
model r/n=sex|age age|chol/scale=none aggregate;
units chol=1 2 3 4;
Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance               10    11.0318      1.1032
Pearson Chi-Square     10    12.1454      1.2145

Analysis Of Parameter Estimates

                           Standard    Wald 95%                Chi-
Parameter   DF  Estimate      Error    Confidence Limits       Square    Pr > ChiSq
Intercept    1   -2.7563     0.2725    -3.2903    -2.2222      102.32        <.0001
sex          1    0.7898     0.1544     0.4872     1.0924       26.16        <.0001
age          1   -2.6652     0.4006    -3.4503    -1.8801       44.27        <.0001
sex*age      1    0.5707     0.2431     0.0943     1.0471        5.51        0.0189
chol         1    0.2099     0.0748     0.0633     0.3566        7.87        0.0050
chol*age     1    0.3845     0.1111     0.1668     0.6022       11.98        0.0005


LR Statistics For Type 3 Analysis

                   Chi-
Source       DF   Square    Pr > ChiSq
sex           1    86.34        <.0001
age           1    44.89        <.0001
sex*age       1     5.65        0.0175
chol          1    56.20        <.0001
chol*age      1    12.07        0.0005

A test of whether the linear model is worthwhile is provided by G2 = 11.0318 -
7.5847 = 3.4471 on 10 - 6 = 4 degrees of freedom (p-value = 0.4860). This model
not only fits the data but also indicates that a linear effect of chol is very
appropriate. Thus when age = 0, the estimated relative risk of CHD for the 50-62
age group is exp(0.2099) = 1.2336; that is, each one-level increase in chol
multiplies the relative risk of CHD by 1.2336 among persons aged 50-62, and a
two-level increase makes this relative risk 1.2336² = 1.5218. Similarly, for
persons aged 30-49, the estimated relative risk is exp(0.2099 + 0.3845) =
exp(0.5944) = 1.8119; once again, each one-level increase in chol multiplies the
relative risk of CHD by 1.8119 among persons aged 30-49, and a two-level
increase makes this relative risk 1.8119² = 3.2830. The results are similar when
PROC LOGISTIC is employed. We note, however, that there are very slight
differences in some of the parameter estimates; this does not in any way affect
our overall conclusions.
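These relative risks follow directly from the two coefficients; the DATA step
below is a sketch of the computation for one-, two-, and three-level increases
in chol.

data rr_chol;
   b_chol = 0.2099; b_int = 0.3845;        /* estimates from the output above */
   do k = 1 to 3;
      rr_50_62 = exp(k*b_chol);            /* 1.2336, 1.5218, 1.8773          */
      rr_30_49 = exp(k*(b_chol + b_int));  /* 1.8119, 3.2830, 5.9487          */
      output;
   end;
run;
proc print data=rr_chol; run;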
In the above analysis of cohort study data, we have assumed that the follow-up
time is the same for each person. However, if the follow-up time differed from
person to person, because individuals had been at risk for different periods
before the study began or were at risk intermittently through the duration of
the study, it would be sensible to take account of this variation in the
analysis. One appropriate method (Collett, 1991) is Poisson regression with the
person-years of exposure as an offset: the person-years of exposure are computed
for each individual, the number of individuals who develop the disease in a
particular group is expressed as a proportion of the person-years of exposure,
and this rate of occurrence is then modeled by Poisson regression.
On the other hand, if the time from entry into the study until the occurrence of
a particular disease is of interest, then models developed for survival
analysis, such as Cox's proportional hazards model, can be used.

8.6 Example 8.6: Analysis of Data in Table 8.1

For the data in Table 8.1, our analysis starts by fitting a saturated model
involving the explanatory variables site, race, and their interaction using PROC
GENMOD. The Type 3 partial analysis below indicates that the interaction between
site and race is not significant, and neither is the site effect.
data tab81;
input site $ race $ r n @@;
datalines;
chton black 56 319 chton white 139 635
evans black 69 407 evans white 184 711
;
proc genmod order=data; class race site;
model r/n=site|race/dist=bin type3; run;


LR Statistics For Type 3 Analysis

                    Chi-
Source       DF    Square    Pr > ChiSq
site          1      0.57        0.4508
race          1     12.24        0.0005
race*site     1      1.23        0.2675

Based on this information, we fitted a model involving site and race. Although
this model fits the data, with a G2 = 1.1178 on 1 d.f., the Type 3 partial
analysis again indicates that the effect of site (partial p-value = 0.1722) is
not significant. We therefore next fit a model involving only race.
set tab81;
proc genmod order=data;
class race site;
model r/n=race/dist=bin type3; run;
Criteria For Assessing Goodness Of Fit

Criterion              DF     Value    Value/DF
Deviance                2    2.9812      1.4906
Pearson Chi-Square      2    2.9722      1.4861

Analysis Of Parameter Estimates

                              Standard    Wald 95%                Chi-
Parameter      DF  Estimate      Error    Confidence Limits       Square    Pr > ChiSq
Intercept       1   -1.1528     0.0638    -1.2779    -1.0277      326.27        <.0001
race  black     1   -0.4174     0.1172    -0.6472    -0.1877       12.68        0.0004

As seen from the SAS software output above, this model fits the data with G2 =
2.9812 on 2 d.f. We notice that the odds that a Black person would die of CHD
over the period of the study are e^{-0.4174} = 0.66 times those for White men.
This conclusion is consistent with that obtained by the authors of the original
article. The estimated odds ratio has a 95% C.I. of (0.529, 0.829). We also
notice from the above output that the ratio "Value/DF" is 1.4906; ideally, we
would want this value to be very close to 1.00. This value affects our parameter
estimates as well as their estimated standard errors, and we will reconsider
this analysis later in the discussion of overdispersion in binomial- or
Poisson-based models.

8.7 Diagnostics

Influential observations and goodness of fit in logistic regression can be
assessed by some of the residual statistics discussed in Chapter 5 and by some
other measures that will be discussed in this section. These are obtained by
using the IPLOTS and INFLUENCE options in PROC LOGISTIC. Let us give an example
by revisiting the data in Table 8.6. In that section, we found that the
complementary log-log model fits the data well. We reproduce part of the results
below with the IPLOTS and INFLUENCE options invoked, and then discuss some of
the terms in this output.
set tab86;
proc logistic;
model r/n=dose/link=cloglog scale=none aggregate influence iplots; run;

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF     Value    Value/DF    Pr > ChiSq
Deviance      6    3.4464      0.5744        0.7511
Pearson       6    3.2947      0.5491        0.7711

Having fitted the model, the first diagnostic step is usually to examine both
the chi-squared (x_i) and deviance (d_i) residuals to identify observations that
the model does not fit well. The two residuals are defined as:
\[ x_i = \frac{y_i - \hat{\mu}_i}{\sqrt{n_i \hat{p}_i (1 - \hat{p}_i)}} \]  (8.22)
and
\[ d_i = \pm\left\{2\left[y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) + (n_i - y_i)\ln\left(\frac{n_i - y_i}{n_i - \hat{\mu}_i}\right)\right]\right\}^{1/2} \]
where the sign is negative if y_i < μ̂_i, positive if y_i > μ̂_i, and μ̂_i =
n_i p̂_i. Here x_i and d_i are referred to as the Pearson and deviance
residuals, respectively. These two residuals are designated RESCHI and RESDEV in
SAS software and are produced below for our data. Standardized versions are
obtained by dividing the raw residuals by appropriate factors based on the
leverages; the corresponding standardized residuals are designated STRESCHI and
STRESDEV, respectively. We may also note here that G2 = Σ_i d_i². These are
produced by PROC GENMOD in SAS.
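As an illustration of (8.22), the short DATA step below is a sketch that
recomputes the Pearson and deviance residuals for the CLL fit, using the fitted
counts from Table 8.6; the IFN guards handle the y = n case, where the second
deviance term is taken to be zero.

data res_check;
   input y n yhat @@;        /* observed deaths, trials, fitted count */
   phat = yhat/n;
   reschi = (y - yhat)/sqrt(n*phat*(1 - phat));      /* Pearson residual */
   t1 = ifn(y > 0, y*log(y/yhat), 0);
   t2 = ifn(n - y > 0, (n - y)*log((n - y)/(n - yhat)), 0);
   resdev = sign(y - yhat)*sqrt(2*(t1 + t2));        /* deviance residual */
   datalines;
6 59 5.589  13 60 11.281  18 62 20.954  28 56 30.369
52 63 47.776  53 59 54.143  61 62 61.113  60 60 59.947
;
proc print data=res_check; run;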
The chi-squared and deviance residuals from the complementary log-log model
applied to the data in Table 8.6 are:
Obs      DOSE      Reschi    Streschi     Resdev    Stresdev     Reslik
1      1.6907      0.1825      0.2111     0.1806      0.2088     0.2094
2      1.7242      0.5681      0.6701     0.5577      0.6579     0.6614
3      1.7552     -0.7932     -0.9258    -0.8033     -0.9376    -0.9345
4      1.7842     -0.6355     -0.7132    -0.6344     -0.7120    -0.7122
5      1.8113      1.2430      1.4560     1.2888      1.5097     1.4953
6      1.8369     -0.5413     -0.6721    -0.5237     -0.6503    -0.6580
7      1.8610     -0.1212     -0.1445    -0.1188     -0.1416    -0.1425
8      1.8839      0.2298      0.2391     0.3250      0.3380     0.3315

While the raw residuals do not have unit variance, the standardized residuals
do, and although the standardized Pearson residuals are not closely approximated
by a normal distribution, Collett (1991) has advocated the use of the
standardized deviance residuals for routine model checking. Any standardized
residual outside [-2, 2] would be considered unsatisfactory. We see from the
above results that none of the observations is unsatisfactory.
A check for influential observations is obtained by conducting influence
diagnostics with either PROC LOGISTIC or PROC GENMOD. The DfBetas for the
intercept and dose parameters (in general, for the intercept and covariate
parameters) indicate that none of the observations, when deleted from the model,
has any significant effect on the original parameter estimates of -39.5725 and
22.0412, respectively. The diagnostic measures Delta Deviance and Delta
Chi-Square indicate by how much the goodness-of-fit statistics G2 and X2 would
change if the designated observation were deleted from the model.

Plots of these against the case number are provided by iplots and displayed by the
influence option in PROC LOGISTIC. The results show that were we to remove
observation 5 from the model, G2 and X2 would change by 2.2360 and 2.1200,
respectively: that is, by almost 65% and 64%, respectively.
Regression Diagnostics
(The index plots printed by the iplots and influence options are omitted here; the
values they display are collected in the following table. C and CBar are the confidence
interval displacement diagnostics; Delta Dev and Delta ChiSq are the changes in the
deviance and Pearson chi-square when the case is deleted.)

Case   Pearson    Deviance   Hat Matrix   Intercept      DOSE
No.    Residual   Residual    Diagonal     DfBeta       DfBeta        C        CBar     Delta Dev  Delta ChiSq
 1      0.1825     0.1806      0.2522      0.1093      -0.1082     0.0150    0.0112      0.0439      0.0446
 2      0.5681     0.5578      0.2813      0.3320      -0.3274     0.1758    0.1263      0.4374      0.4491
 3     -0.7932    -0.8033      0.2660     -0.3340       0.3263     0.3106    0.2280      0.8732      0.8570
 4     -0.6355    -0.6344      0.2060     -0.0920       0.0862     0.1320    0.1048      0.5072      0.5087
 5      1.2430     1.2888      0.2712     -0.1039       0.1152     0.7888    0.5749      2.2360      2.1200
 6     -0.5413    -0.5237      0.3515      0.1127      -0.1162     0.2448    0.1588      0.4330      0.4517
 7     -0.1212    -0.1188      0.2961      0.0173      -0.0176     0.00878   0.00618     0.0203      0.0209
 8      0.2298     0.3249      0.0757     -0.00848      0.00859    0.00468   0.00433     0.1099      0.0571
These values are quite substantial and influential. We therefore find it attractive to
remove this observation (the 5th observation) from the data and refit the revised
model based on the remaining observations. Results from this implementation are
displayed below. The model gives a better fit, and the diagnostic procedures reveal
no further lack of fit or influential observations.
data new; set tab86;
proc genmod; where dose ne 1.8113;
model r/n=dose/dist=binomial link=cloglog obstats;
run;

           Criteria For Assessing Goodness Of Fit

Criterion               DF      Value     Value/DF
Deviance                 5     1.2364       0.2473
Pearson Chi-Square       5     1.1525       0.2305

              Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Wald 95% CL (lower)    Pr > ChiSq
Intercept     1    -39.0409        3.2115             -45.3353            <.0001
DOSE          1     21.7096        1.7838              18.2134            <.0001
Scale         0      1.0000        0.0000

Obs     DOSE     Reschi    Streschi     Resdev    Stresdev     Reslik
 1     1.6907    0.2537     0.2924      0.2499     0.2881      0.2892
 2     1.7242    0.7108     0.8404      0.6943     0.8209      0.8265
 3     1.7552   -0.5728    -0.6793     -0.5784    -0.6859     -0.6840
 4     1.7842   -0.3137    -0.3626     -0.3136    -0.3624     -0.3624
 5     1.8369   -0.0640    -0.0844     -0.0637    -0.0841     -0.0842
 6     1.8610    0.2311     0.2936      0.2395     0.3044      0.3003
 7     1.8839    0.3144     0.3354      0.4445     0.4741      0.4595

In conclusion, care must be taken in accepting a logistic regression model without
conducting diagnostic tests for lack of fit and influential observations. Collett (1991)
gives a comprehensive treatment of diagnostic tests for linear logistic regression models.
Interested readers are encouraged to consult this reference.

8.8 Overdispersion

A problem that often arises with modeling binomial and Poisson data is overdispersion,
where data involving proportions or counts tend to be more variable than the
underlying binomial or Poisson distributions can accommodate. This phenomenon
in the binomial case is also known as extra-binomial variation or simply overdispersion.
This problem usually arises when the $y_i$ observations are correlated. For instance,
for the logistic model fitted to binomial data, we assume that the $y_i$ are independent
and follow a binomial distribution. However, if the $y_i$ are positively correlated,


then the variance of $y_i$ will be greater than $n_ip_i(1-p_i)$, and in this case we would
have overdispersion. On the other hand, when the $y_i$ are negatively correlated,
the variance of the binomial response variable will be less than the above expected
variance, and in this case we would have underdispersion. This latter case is less
frequently encountered than the former; hence the former, that is, overdispersion,
is considered in this section.

Overdispersion also occurs in incorrectly specified models where some interactions
or transformed variables have been omitted from the model, although correlation is
the more common source. For positively correlated binomial observations, we show
below that the ratio of the residual deviance ($G^2$) to the corresponding degrees of
freedom, that is, the residual mean deviance RMD, would take one of the three possible
values below, which correspond respectively to overdispersion, no dispersion,
and underdispersion. That is,

$$\text{RMD} \begin{cases} > 1 & \text{overdispersion} \\ = 1 & \text{no dispersion} \\ < 1 & \text{underdispersion} \end{cases}$$
Consider the binomial case where a random variable $Y$ has probability $\pi$ of
success. If $Y$ given $\pi$ has a binomial distribution $b(n, \pi)$ and $\pi$ has a
distribution of its own with mean $\mu$ and variance $\sigma^2$, we can write

$$E(Y) = E[E(Y \mid \pi)] = n\mu$$

and the unconditional variance of $Y$ can be shown to be

$$\text{Var}\{Y\} = n\mu(1-\mu) + n(n-1)\sigma^2.$$

If we let $\sigma^2 = \phi\mu(1-\mu)$, then the above becomes

$$\text{Var}\{Y\} = n\mu(1-\mu)[1 + (n-1)\phi]$$

where $\phi$ is an unknown scale parameter.

From the above, it follows that if the $Y_i$ are uncorrelated, then $\phi = 0$ and the
above reduces to $\text{Var}\{Y\} = n\mu(1-\mu)$. However, if $\phi > 0$, then $\text{Var}(Y)$ will exceed the
binomial variance $n_ip_i(1-p_i)$, and in this case we have overdispersion. Similarly,
if $\phi < 0$, then $\text{Var}(Y) < n_ip_i(1-p_i)$, and in this case we have underdispersion.
The unconditional variance of $Y$ above is obtained from the conditional probability
property, whereby the unconditional mean and variance of a random variable $Y$ can be
obtained from the conditional moments of $Y$ given $X$. That is,

$$E(Y) = E\{E(Y \mid X)\} \quad \text{and} \quad V(Y) = E\{V(Y \mid X)\} + V\{E(Y \mid X)\}$$
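The variance inflation factor $[1 + (n-1)\phi]$ can be illustrated by simulation. The
following sketch (all names and settings here are illustrative, not from the text) draws
$\pi$ from a beta distribution with mean $\mu$ and variance $\phi\mu(1-\mu)$ and then draws
$Y$ from $b(n, \pi)$; the sample variance of $Y$ should then be close to
$n\mu(1-\mu)[1 + (n-1)\phi]$ rather than to the binomial variance $n\mu(1-\mu)$.

data simover;
   call streaminit(271828);
   n = 20; mu = 0.3; phi = 0.1;
   /* beta parameters chosen so that var(pi) = phi*mu*(1-mu) */
   s = (1 - phi)/phi;
   a = mu*s; b = (1 - mu)*s;
   do i = 1 to 10000;
      pi = rand('beta', a, b);
      y  = rand('binomial', pi, n);
      output;
   end;
run;
proc means data=simover mean var; var y; run;
/* expected: mean near 20*0.3 = 6 and variance near
   20*0.3*0.7*(1 + 19*0.1) = 12.18, versus the binomial value 4.2 */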

8.8.1 Modeling Overdispersed Data

In this section, we consider methods that have been used to model overdispersion
in binomial data. We shall in all cases reanalyze the data in Table 8.1, where
the logistic model gives a $G^2 = 2.9812$ on 2 d.f. and a corresponding $X^2 = 2.9722$
on 2 d.f. The ratio Value/DF = Deviance/$(n - p)$, an estimate of $\phi$, is in either case
greater than 1, indicating overdispersion for this data set. The first method, which


is applicable for the case of equal $n_i$, estimates $\phi$ by the ratio of Pearson's $X^2$ to
its corresponding degrees of freedom. First let us reproduce our earlier analysis, but
requesting a printout of the variance-covariance matrix of the parameters. This is
displayed below.
data new; set tab81;
proc logistic order=data;
class race(ref=last) site(ref=last)/param=ref;
model r/n=race/scale=none covb;
run;
       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        2     2.9812      1.4906        0.2252
Pearson         2     2.9722      1.4861        0.2262

          Analysis of Maximum Likelihood Estimates

Parameter      DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept       1     -1.1528        0.0638             326.2642        <.0001
race  black     1     -0.4173        0.1172              12.6761        0.0004

              Estimated Covariance Matrix

Variable       Intercept     raceblack
Intercept       0.004073      -0.00407
raceblack       -0.00407      0.013736

An estimate of $\phi$ is provided by the deviance or Pearson mean square (1.4906 and
1.4861, respectively). Further, the estimated variance-covariance matrix under this
model is:

$$\hat{C}_{ij} = \begin{bmatrix} 0.0041 & -0.0041 \\ -0.0041 & 0.0137 \end{bmatrix}$$
With the estimate of $\phi$ obtained above, a new model is then fitted to the data,
and this is accomplished in SAS software in either PROC LOGISTIC or PROC GENMOD
by specifying the option SCALE=(DEVIANCE or PEARSON). Suppose we invoke
SCALE=DEVIANCE in PROC LOGISTIC for the data in Table 8.1; then a new estimate
of $\phi$ is computed and the model is refitted with this new value. We give below
the SAS software output when this is invoked.
data new; set tab81;
proc logistic order=data;
class race(ref=last)/param=ref;
model r/n=race/scale=deviance covb;
run;
       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        2     2.9812      1.4906        0.2252
Pearson         2     2.9722      1.4861        0.2262

Number of events/trials observations: 4

NOTE: The covariance matrix has been multiplied by the heterogeneity factor
(Deviance / DF) 1.49058.

          Analysis of Maximum Likelihood Estimates

Parameter      DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept       1     -1.1528        0.0779             218.8835        <.0001
race  black     1     -0.4173        0.1431               8.5041        0.0035

              Estimated Covariance Matrix

Variable       Intercept     raceblack
Intercept       0.006072      -0.00607
raceblack       -0.00607      0.020475

For these data, an estimate of $\phi$ is given by the column Value/DF = 1.49058.
We note here that the estimates of the parameters $\beta_0$ and $\beta_1$ are unaffected by
this scaling, but the variance-covariance matrix of the parameters is duly
affected. The new variance-covariance matrix is given from the output by:

$$\begin{bmatrix} 0.0061 & -0.0061 \\ -0.0061 & 0.0205 \end{bmatrix}$$

This variance-covariance matrix has been adjusted to take into account the overdispersion
present in the data. We note that $C^*_{ij} = \hat{\phi}\,\hat{C}_{ij} = 1.49058\,\hat{C}_{ij}$. That is,

$$\begin{bmatrix} 0.0061 & -0.0061 \\ -0.0061 & 0.0205 \end{bmatrix} = 1.49058 \begin{bmatrix} 0.0041 & -0.0041 \\ -0.0041 & 0.0137 \end{bmatrix}$$
As expected, this model has a scaled deviance of 2.0000 on 2 d.f., that is, a $\phi$ value of
1.000, which now suggests no overdispersion. The standard errors of the parameter
estimates are computed as $\sqrt{C^*_{ii}}$. We can obtain a similar result for Pearson's $X^2$
by specifying SCALE=PEARSON in the options line of the model statement in PROC
LOGISTIC. In PROC GENMOD, you can use Pearson's $X^2$ either with the PSCALE
option, which estimates $\phi$ by Value/DF, or with the SCALE=PEARSON option, which
reports the square root of $\phi$ as the scale. The corresponding options for the deviance
($G^2$) are DSCALE and SCALE=DEVIANCE. We give below a similar partial output
when the option DSCALE is used in PROC GENMOD.
data new; set tab81;
proc genmod order=data; class race;
model r/n=race/dist=b dscale covb;
run;
           Criteria For Assessing Goodness Of Fit

Criterion               DF      Value     Value/DF
Deviance                 2     2.9812       1.4906
Scaled Deviance          2     2.0000       1.0000
Pearson Chi-Square       2     2.9722       1.4861
Scaled Pearson X2        2     1.9940       0.9970

          Estimated Covariance Matrix

             Prm1          Prm2
Prm1     0.006072     -0.006072
Prm2    -0.006072       0.02048


              Analysis Of Parameter Estimates

Parameter     DF   Estimate   Standard     Wald 95%           ChiSquare   Pr > ChiSq
                                Error    Confidence Limits
Intercept      1    -1.1528     0.0779    -1.3056   -1.0001     218.88      <.0001
race  black    1    -0.4174     0.1431    -0.6979   -0.1370       8.51      0.0035
Scale          0     1.2209     0.0000     1.2209    1.2209

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

We observe for GENMOD that the scale parameter is 1.2209 instead of the customary or familiar 1.0. This value is simply the square root of $\phi$, that is, $\sqrt{1.4906}$.

8.8.2 Williams' Procedure

While the above corrections for overdispersion leave the parameter estimates identical
in the uncorrected and corrected models, the difference being only the adjustment
of the variance-covariance matrix of the parameter estimates, Williams
(1975, 1982) proposed a procedure for fitting overdispersed data with several
explanatory variables that modifies not only the variance-covariance matrix (and hence
the standard errors) but also the parameter estimates of the model. This procedure
also requires an estimate of $\phi$, the heterogeneity factor, from the data. Then, if
we consider each binomial observation $y_i$ to have weight $w_i$, the Pearson test
statistic $X^2$ becomes

$$X^2 = \sum_i \frac{w_i\,(y_i - n_i\hat{p}_i)^2}{n_i\hat{p}_i(1-\hat{p}_i)}$$

which has approximately the expected value

$$\sum_i w_i(1 - w_i d_i v_i)\,[1 + \phi(n_i - 1)]$$

where $v_i = n_ip_i(1-p_i)$ and $d_i$ is the $i$-th diagonal element of the variance-covariance
matrix of the linear predictor $\hat{g}_i = \sum_j \hat{\beta}_j x_{ji}$. Williams's procedure can only be
implemented in PROC LOGISTIC; GENMOD does not yet have options for Williams's
procedure. Once $\phi$ is estimated from the data, the weights $[1 + (n_i - 1)\hat{\phi}]^{-1}$ are
then used in fitting models that have fewer terms than our original full model. We
present again the SAS software implementation of Williams's procedure on the data in
Table 8.1.
proc logistic order=data;
class race(ref=last) site(ref=last)/param=ref;
model r/n=race site/scale=williams covb; run;
                    Model Information

Data Set                        WORK.TAB81
Response Variable (Events)      r
Response Variable (Trials)      n
Number of Observations          4
Weight Variable                 1 / ( 1 + 0.000522 * (n - 1) )
Sum of Weights                  1605.3396016
Model                           binary logit
Optimization Technique          Fisher's scoring

       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        1     0.9968      0.9968        0.3181
Pearson         1     0.9999      0.9999        0.3173

Number of events/trials observations: 4

NOTE: Since the Williams method was used to accommodate overdispersion, the Pearson
chi-squared statistic and the deviance can no longer be used to assess the
goodness of fit of the model.

          Analysis of Maximum Likelihood Estimates

Parameter      DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept       1     -1.0908        0.0934             136.3507        <.0001
race  black     1     -0.4197        0.1306              10.3307        0.0013
site  chton     1     -0.1353        0.1228               1.2139        0.2706

We observe here that we have fitted the full model involving site and race in order to
implement Williams's overdispersed model. It has been suggested that it is better
to use a full model when employing Williams's procedure, as this reduces the
risk of corrupting $\phi$ with a misspecified or incorrect model. In the analysis above,
the estimate of $\phi$ is 0.000522 and is given at the beginning of the SAS software
output under the formula for the WEIGHT variable. Because again the site effect
is not significant, we can remove this variable from the model, giving a reduced
model involving only race. In general, for multivariable explanatory models, the
maximum likelihood estimates will be examined to eliminate variables that are not
significant at this stage, and a revised model will be fitted by specifying, in the
options of the model statement, the value of $\phi$ estimated above, namely
SCALE=WILLIAMS($\hat{\phi}$); that is, scale=williams(0.000522).
data new; set tab81;
proc logistic order=data;
class race(ref=last)/param=ref;
model r/n=race/scale=williams(0.000522) covb;
run;
       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        2     2.2141      1.1070        0.3305
Pearson         2     2.2086      1.1043        0.3314

Number of events/trials observations: 4

          Analysis of Maximum Likelihood Estimates

Parameter      DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept       1     -1.1544        0.0742             241.8578        <.0001
race  black     1     -0.4153        0.1305              10.1345        0.0015

                  Odds Ratio Estimates

Effect                   Point Estimate    95% Wald Confidence Limits
race black vs white           0.660             0.511       0.852

              Estimated Covariance Matrix

Variable       Intercept     raceblack
Intercept        0.00551      -0.00551
raceblack       -0.00551      0.017019

When this was applied to our data, not only did the parameter values change, but their
estimated standard errors are also lower than in the model uncorrected for dispersion.
This in turn affects the estimated odds ratio, as well as its corresponding
95% confidence interval.

8.9 Other Topics: Case-Control Data Analysis

8.9.1 Modeling Case-Control Study Data

In what follows, we give examples of the analysis of data arising from case-control
studies with a binary outcome. The first example relates to analyzing general
case-control study data, while the second example analyzes data arising from a
matched case-control study. The data for both examples are taken from Collett
(1991), who offers similar analyses for both data sets.

8.9.2 Example 8.7: Diverticular Disease Data

Diverticular disease and carcinoma of the colon are among the most common diseases
of the large intestine in the populations of the UK and North America. Diverticular
disease is estimated to be present in one-third of the population over 40
years of age, while carcinoma of the colon accounts for about 10,000 deaths in the
UK alone per annum. It has been found that there is a geographical association
between the prevalence of the two diseases in that their worldwide incidences tend
to go hand in hand. In order to investigate whether the two conditions tend to be
associated in individual patients, a case-control study was carried out and reported
by Berstock, Villers, and Latto (1978).

A total of 80 patients, admitted to the Reading group of hospitals between
1974 and 1977 with carcinoma of the colon, were studied, and the incidence of
diverticular disease in each patient was recorded. A control group of 131 individuals
was randomly selected from a population known not to have colonic cancer and
was similarly tested for the presence of diverticular disease. Table 8.10 gives the
proportions of individuals with diverticular disease, classified by age, sex, and
according to whether or not they had colonic cancer.
To facilitate analysis, the data are first transformed to reflect the proportion of cases
in the case-control study. In Table 8.11, for instance, are the case-control tables
for the 40-49 M, 40-49 F, 80-89 M, and 80-89 F groups, respectively.
The original data with the other variables are then rearranged, and the results are
presented in Table 8.12 together with the proportion of cases with or without the
presence of diverticular disease (DD).
In this setup, if an individual has diverticular disease, then DD takes the value
1, and 0 for those without the disease. That is,

$$DD = \begin{cases} 1 & \text{if diseased} \\ 0 & \text{otherwise} \end{cases}$$
The following SAS software program and the accompanying partial outputs accomplish
the above, together with the fitting of one of the many models applied to the
data. The full output is available in appendix F.5.

data case;
input age $ sex $ r1 n1 r2 n2 @@;


                                         Proportion with DD
Age        Midpoint of                Cancer
interval    age range      Sex       patients      Controls
40-49         44.5          M          0/3           0/7
                            F          0/6           1/15
50-54         52.0          M          1/2           1/7
                            F          0/0           0/0
55-59         57.0          M          2/5           3/15
                            F          1/7           4/18
60-64         62.0          M          1/5           5/18
                            F          0/2           2/8
65-69         67.0          M          1/4           6/11
                            F          0/5           7/17
70-74         72.0          M          0/5           1/4
                            F          3/13          2/6
75-79         77.0          M          1/3           0/0
                            F          5/9           0/0
80-89         84.5          M          1/2           4/5
                            F          4/9           0/0

Table 8.10: Proportion of individuals with diverticular disease (DD) classified by
age, sex, and the presence of colonic cancer
rr1=n1-r1; rr2=n2-r2;
ndd=r1+r2; nndd=rr1+rr2;
rdd=r1; rndd=rr1;
drop rr1 rr2 r1 n1 r2 n2;
datalines;
40-49 m 0 3 0 7   40-49 f 0 6 1 15
50-54 m 1 2 1 7   50-54 f 0 0 0 0
55-59 m 2 5 3 15  55-59 f 1 7 4 18
60-64 m 1 5 5 18  60-64 f 0 2 2 8
65-69 m 1 4 6 11  65-69 f 0 5 7 17
70-74 m 0 5 1 4   70-74 f 3 13 2 6
75-79 m 1 3 0 0   75-79 f 5 9 0 0
80-89 m 1 2 4 5   80-89 f 4 9 0 0
;
proc print;
var rndd nndd rdd ndd;
run;
data new1 (rename=(rndd=r nndd=n));
set case; dd=0; output;
drop rdd ndd; run;
data new2 (rename=(rdd=r ndd=n));
set case; dd=1; output;
drop rndd nndd; run;
data comb; set new1 new2;
proc print data=comb; run;
data comb2; set comb;
if r=0 and n=0 then delete;
proc print data=comb2; run;
proc genmod order=data data=comb2;
class age sex dd;
model r/n=age|sex dd/dist=binomial link=logit type3;
run;
proc logistic order=data data=comb2;
class age sex dd;
model r/n=age|sex dd/scale=none aggregate;
run;


40-49 M                               40-49 F
DD      Cases  Controls  Total        DD      Cases  Controls  Total
1         0       0        0          1         0       1        1
0         3       7       10          0         6      14       20
Total     3       7       10          Total     6      15       21

80-89 M                               80-89 F
DD      Cases  Controls  Total        DD      Cases  Controls  Total
1         1       4        5          1         4       0        4
0         1       1        2          0         5       0        5
Total     2       5        7          Total     9       0        9

Table 8.11: Transformations before analysis for 40-49 M, 40-49 F, 80-89 M, and
80-89 F, respectively
              Proportion of Cases
Age interval   Sex   Without DD   With DD
40-49           M       3/10        0/0
                F       6/20        0/1
  ...          ...       ...        ...
75-79           M       2/2         1/1
                F       4/4         5/5
80-89           M       1/2         1/5
                F       5/5         4/4

Table 8.12: Rearranged results for the data in Table 8.10


Obs    age     sex    n     r    dd
  1    40-49    m     10    3    0
  2    40-49    f     20    6    0
  3    50-54    m      7    1    0
  4    50-54    f      0    0    0
  5    55-59    m     15    3    0
  6    55-59    f     20    6    0
  7    60-64    m     17    4    0
  8    60-64    f      8    2    0
  9    65-69    m      8    3    0
 10    65-69    f     15    5    0
 11    70-74    m      8    5    0
 12    70-74    f     14   10    0
 13    75-79    m      2    2    0
 14    75-79    f      4    4    0
 15    80-89    m      2    1    0
 16    80-89    f      5    5    0
 17    40-49    m      0    0    1
 18    40-49    f      1    0    1
 19    50-54    m      2    1    1
 20    50-54    f      0    0    1
 21    55-59    m      5    2    1
 22    55-59    f      5    1    1
 23    60-64    m      6    1    1
 24    60-64    f      2    0    1
 25    65-69    m      7    1    1
 26    65-69    f      7    0    1
 27    70-74    m      1    0    1
 28    70-74    f      5    3    1
 29    75-79    m      1    1    1
 30    75-79    f      5    5    1
 31    80-89    m      5    1    1
 32    80-89    f      4    4    1

           Criteria For Assessing Goodness Of Fit

Criterion               DF      Value     Value/DF
Deviance                13     9.1927       0.7071
Pearson Chi-Square      13     8.5699       0.6592

                  The LOGISTIC Procedure

                    Model Information

Data Set                        WORK.COMB2
Response Variable (Events)      r
Response Variable (Trials)      n
Number of Observations          29
Link Function                   Logit
Optimization Technique          Fisher's scoring

                    Response Profile

Ordered Value    Binary Outcome    Total Frequency
      1             Event                 80
      2             Nonevent             131

                Class Level Information

                              Design Variables
Class    Value      1    2    3    4    5    6    7
age      40-49      1    0    0    0    0    0    0
         50-54      0    1    0    0    0    0    0
         55-59      0    0    1    0    0    0    0
         60-64      0    0    0    1    0    0    0
         65-69      0    0    0    0    1    0    0
         70-74      0    0    0    0    0    1    0
         75-79      0    0    0    0    0    0    1
         80-89      0    0    0    0    0    0    0
sex      f          1
         m          0
dd       0          1
         1          0

       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       13     9.1929      0.7071        0.7582
Pearson        13     8.5700      0.6592        0.8046

Number of unique profiles: 29

                  Odds Ratio Estimates

Effect        Point Estimate    95% Wald Confidence Limits
dd 0 vs 1          2.058             0.857       4.945
The analysis is based on 29 of the 32 observations in the data set. This is
because observations 4, 17, and 20 have both r and n equal to 0 and are therefore not
used in the analysis. We present in Table 8.13 other possible models applied to the
data, together with their deviances and corresponding degrees of freedom.
Model                  d.f.     G2
Null                     28    73.49
{Age}                    21    24.05
{Sex}                    27    69.01
{Age, Sex}               20    22.73
{Age Sex}                14    11.97
{Age Sex, DD}            13     9.19
{Age Sex, DD Age}         6     3.71
{Age Sex, DD Sex}        12     8.13

Table 8.13: Results of models applied to the data


There are possible confounding effects of age and sex on the relationship between
the occurrence of colon cancer and diverticular disease. Hence, the effect of the
exposure factor DD needs to be assessed after due allowance for the effects of age
and sex. Only age seems to have a potential confounding effect, while the sex effect
is very slight. There is interaction present between age and sex.

Once DD is added to the model, there do not seem to be significant interaction
effects of DD with either age or sex, based on the differences in the G2 values and
corresponding degrees of freedom. The parameter for DD in GENMOD is 0.7217 for
DD=0 and 0 for DD=1. Hence the ratio of the odds of colonic cancer for a patient
with diverticular disease relative to one without is exp(-0.7217) = 0.49. This implies
that an estimate of the relative risk of colonic cancer occurring in patients with
diverticular disease, relative to those without, is 0.49. That is, the occurrence of
diverticular disease decreases the risk of colonic cancer; in other words, a patient
with diverticular disease is less likely to be affected by colonic cancer than one
without. Put another way, those without diverticular disease are 2.058 = 1/0.49
times more likely to have colonic cancer than those with the disease.

If the cell referencing option is not employed in the class statement in PROC
LOGISTIC, the estimate of the DD0 parameter is given by 0.3609 and the parameter
DD1 = -0.3609. Hence DD(1) versus DD(0) equals -0.3609 - 0.3609 = -0.7218.
The estimated odds ratio is that of DD(0) versus DD(1) = 2.058. Hence the
estimated relative risk of DD(1) versus DD(0) is 1/2.058 = 0.49.
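The arithmetic linking the GENMOD parameterization to the odds ratios quoted above
can be traced in a few lines; a small sketch (the data-set and variable names are
illustrative only):

data ddodds;
   beta_dd0 = 0.7217;                   /* estimate for DD=0, with DD=1 as reference */
   or_with_vs_without = exp(-beta_dd0); /* = 0.49 */
   or_without_vs_with = exp(beta_dd0);  /* = 2.058, as in the PROC LOGISTIC output */
   put or_with_vs_without= or_without_vs_with=;
run;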

8.9.3 Matched Case-Control Study Data

An alternative to adjusting for confounding effects is to take account of their presence
in the study design. This is achieved by what is known as a matched case-control
study, in which each diseased person included in the study as a case is matched to
one or more disease-free persons who will be included as controls. The matching is
usually based on potential confounding variables, such as gender, ethnicity, marital
status, parity, etc., or by residential area or place of work for those variables that
cannot be easily quantified.
1. A design with M controls per case is known as a 1:M matched study, and the
individuals that constitute the one case and the M controls to which the case
has been matched are referred to as a matched set. Usually, in this case,
1 <= M <= 5.

2. A design in which M = 1 is called a 1:1 matching design. Here, the matched
set consists of one case and one control from each stratum.

3. In an m:n matched study, the matched set consists of n cases with m controls,
where usually 1 <= m, n <= 5.
The appropriate form of logistic regression for these types of data is called conditional
logistic regression, which is based on conditional likelihood estimation (Chamberlain,
1980). In constructing the likelihood function, we condition on the number
of 1's and 0's that are observed for each individual. Stokes et al. (1995) have
presented an elegant treatment of the derivation of this procedure. Suffice it to say here
that for a randomized clinical trial with $i = 1, 2, \cdots, q$ centers that are randomly
selected, then for two individuals from the same matched set, the $i$-th say, the ratio
of the odds of the disease (event) occurring in a person with covariates $x_{i1}$ relative to
one with covariates $x_{i2}$ is given by

$$\exp\{\beta_1(x_{i11} - x_{i21}) + \beta_2(x_{i12} - x_{i22}) + \cdots + \beta_k(x_{i1k} - x_{i2k})\} \qquad (8.24)$$

where $i = 1, 2, \cdots, q$. The process of estimating the $\beta$-parameters in the above
model is often referred to as conditional logistic modeling. A model that has no
explanatory variable is referred to as the null model, and in this case, $\beta_h = 0$ for
$h = 1, 2, \cdots, k$, and the deviance reduces to $2q\log(1 + M)$. For 1:1 matching,
the deviance reduces to $2q\log(2)$.
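The null-model deviance can be checked against this example: with q = 217 matched
pairs and 1:1 matching, $2q\log(2) = 300.83$, which reappears later in this section as
the "-2 Log L" without covariates in the PROC PHREG output. A one-line sketch:

data _null_;
   q = 217;
   dev0 = 2*q*log(2);   /* 300.826... */
   put dev0=;
run;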

8.9.4 Example 8.8: An Etiological Study Data

Kelsey and Hardy (1975) described an etiological study to investigate whether driving
of motor vehicles is a risk factor for low back pain caused by acute herniated lumbar
intervertebral discs. The design used was a matched case-control study in which one
control was matched to each case. The cases were selected from persons between
the ages of 20 and 64 living in the area of New Haven, Connecticut, who had X-rays
taken of the low back between June 1971 and May 1973. Those diagnosed as
having acute herniated lumbar intervertebral discs, and who had only recently acquired
symptoms of the disease, were used as the cases in the study. The controls were
drawn from patients who were admitted to the same hospital as a case, or who
presented at the same radiologist's clinic as a case, with a condition unrelated to
the spine. Individual controls were further matched to a case on the basis of sex
and age. The age matching was such that the ages of the case and control in each
matched pair were within 10 years of one another. In total, 217 matched pairs were
recruited, consisting of 89 female pairs and 128 male pairs.

After an individual had been entered into the study as either a case or a control,
information on a number of potential risk factors, including place of residence and
driving habits, was obtained. Those individuals who lived in New Haven itself were
considered to be city residents, while others were classified as suburban residents.
Data on whether or not each individual was a driver and their residential status are
presented in the table below.


The number of matched sets of cases and controls according to driving and residence.

STATUS     Driver?   Suburban resident?   # of matched sets
case         no            no                     9
control      no            no
case         no            yes                    2
control      no            no
case         yes           no                    14
control      no            no
case         yes           yes                   22
control      no            no
case         no            no                     0
control      no            yes
case         no            yes                    2
control      no            yes
case         yes           no                     1
control      no            yes
case         yes           yes                    4
control      no            yes
case         no            no                    10
control      yes           no
case         no            yes                    1
control      yes           no
case         yes           no                    20
control      yes           no
case         yes           yes                   32
control      yes           no
case         no            no                     7
control      yes           yes
case         no            yes                    1
control      yes           yes
case         yes           no                    29
control      yes           yes
case         yes           yes                   63
control      yes           yes
Since this is a 1:1 matching design, i.e., M = 1, and there is no treatment except the
explanatory variables, the conditional likelihood function can be written (Collett,
1991) as

$$L(\beta) = \prod_{i=1}^{q}\left[1 + \exp\left(-\sum_{h=1}^{k}\beta_h z_{ih}\right)\right]^{-1} \qquad (8.25)$$

where $z_{ih} = x_{i1h} - x_{i2h}$, so that $z_{ih}$ is the value of the $h$-th explanatory variable
for the case minus that for the control in the $i$-th matched set. Equation (8.25)
is the likelihood for a linear logistic model with $q$ binary observations that are all
equal to unity, where the linear part of the model contains $k$ explanatory variables
$z_{i1}, z_{i2}, \cdots, z_{ik}$, with $i = 1, 2, \cdots, q$, and so no constant term. To implement this
model in SAS software for the data in example 8.8, we first create the following
indicator variables:
$$x_1 = \begin{cases} 0 & \text{if the individual is not a driver (no)} \\ 1 & \text{if the individual is a driver (yes)} \end{cases}$$

and

$$x_2 = \begin{cases} 0 & \text{if the individual is not a suburban resident (no)} \\ 1 & \text{if the individual is a suburban resident (yes)} \end{cases}$$


If we assume that interaction may be present between residence and driving, we can
create the variable $x_3 = x_1 \times x_2$ at this point. Note that both the cases and the controls
are characterized by these three variables. From these, we now create the z's, which
are the differences between the corresponding x's for the cases and the controls. The
following SAS software program fits the conditional logistic regression model to the
data in example 8.8.
data cond;
input idd status $ driver $ res $ count @@;
if driver='yes' then x1=1;
else x1=0;
if res ='yes' then x2=1;
else x2=0;
x3=x1*x2;
datalines;
1 case no no 9     1 control no no 9      2 case no yes 2     2 control no no 2
3 case yes no 14   3 control no no 14     4 case yes yes 22   4 control no no 22
5 case no no 0     5 control no yes 0     6 case no yes 2     6 control no yes 2
7 case yes no 1    7 control no yes 1     8 case yes yes 4    8 control no yes 4
9 case no no 10    9 control yes no 10    10 case no yes 1    10 control yes no 1
11 case yes no 20  11 control yes no 20   12 case yes yes 32  12 control yes no 32
13 case no no 7    13 control yes yes 7   14 case no yes 1    14 control yes yes 1
15 case yes no 29  15 control yes yes 29  16 case yes yes 63  16 control yes yes 63
;
proc print; run;
data new;
set cond;
if status = 'control' then delete;
r=count;
drop driver res count;
output; run;
data new1;
set cond;
if status = 'case' then delete;
y1=x1; y2=x2; y3=x3; n=count;
drop x1-x3 driver res count;
output; run;
data comb;
merge new new1;
by idd;
z1=x1-y1; z2=x2-y2; z3=x3-y3;
case=0; run;
proc print data=comb; run;
proc logistic data=comb;
weight n;
model case=z1-z2/noint details; run;

Obs   idd   status   x1   x2   x3    r   y1   y2   y3    n   z1   z2   z3   case
  1     1    case     0    0    0    9    0    0    0    9    0    0    0    0
  2     2    case     0    1    0    2    0    0    0    2    0    1    0    0
  3     3    case     1    0    0   14    0    0    0   14    1    0    0    0
  4     4    case     1    1    1   22    0    0    0   22    1    1    1    0
  5     5    case     0    0    0    0    0    1    0    0    0   -1    0    0
  6     6    case     0    1    0    2    0    1    0    2    0    0    0    0
  7     7    case     1    0    0    1    0    1    0    1    1   -1    0    0
  8     8    case     1    1    1    4    0    1    0    4    1    0    1    0
  9     9    case     0    0    0   10    1    0    0   10   -1    0    0    0
 10    10    case     0    1    0    1    1    0    0    1   -1    1    0    0
 11    11    case     1    0    0   20    1    0    0   20    0    0    0    0
 12    12    case     1    1    1   32    1    0    0   32    0    1    1    0
 13    13    case     0    0    0    7    1    1    1    7   -1   -1   -1    0
 14    14    case     0    1    0    1    1    1    1    1   -1    0   -1    0
 15    15    case     1    0    0   29    1    1    1   29    0   -1   -1    0
 16    16    case     1    1    1   63    1    1    1   63    0    0    0    0

                    Response Profile

Ordered Value    case    Total Frequency    Total Weight
      1            0           15            217.00000

NOTE: 1 observation having zero frequency or weight was excluded since it does
not contribute to the analysis.

          Testing Global Null Hypothesis: BETA=0

Test                  Chi-Square    DF    Pr > ChiSq
Likelihood Ratio          9.5456     2        0.0085
Score                     9.3130     2        0.0095
Wald                      8.8484     2        0.0120

          Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
z1            1      0.6576         0.2940           5.0043        0.0253
z2            1      0.2554         0.2258           1.2792        0.2581

                  Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
z1             1.930             1.085       3.434
z2             1.291             0.829       2.010

NOTE: Since there is only one response level, measures of association between
the observed and predicted values were not calculated.

The results of some of the models applied to the data in this example are presented
in Table 8.14, while Table 8.15 gives the conditional or partial tests based on the
results displayed in Table 8.14.

Model             d.f.      G2
z1                 14     292.57
z2                 14     296.54
z1 + z2            13     291.28
z1 + z2 + z3       12     291.20

Table 8.14: Results of models applied to the data

Source of variation    d.f.     G2     Comments
z3 | (z1, z2)            1      0.08   Interaction, adjusted for driving and residence
z2 | z1                  1      1.29   Residence, adjusted for driving
z1 | z2                  1      5.26   Driving, adjusted for residence

Table 8.15: Analysis of G2 based on the models above


From the model fitted involving z1 and z2, the approximate relative risk of a herniated
disc in a driver (x1 = 1) relative to a nondriver (x1 = 0), after adjusting for the
place of residence, is exp(0.6576) = 1.93. Similarly, the approximate relative risk of
a herniated disc occurring in a suburban resident (x2 = 1) relative to a city resident
(x2 = 0), adjusted for driving, is exp(0.2554) = 1.291.


Based on the above results, therefore, we may conclude that the risk of a herniated
disc occurring in a driver is about twice that of a nondriver, but that the risk is not
affected by whether the individual is a suburban or a city resident.

8.9.5 Alternative Analysis of the Data in Example 8.8

Conditional logistic modeling can also be implemented by using Cox's proportional
hazards model, which utilizes PROC PHREG in SAS software. However, in
order to do this, we must create subject-specific covariates and a subject-specific
dependent variable. Consequently, we read the data in for each individual, giving a
total of 434 observations (217 pairs). The first 20 and the last 22 observations are
reproduced below from the implementation of the SAS software data step in the
program below.
data cond;
infile 'c:\classdata\cl8\condiii.txt';
input idd status $ driver $ res $ count @@;
if driver='yes' then x1=1;
else x1=0;
if res ='yes' then x2=1;
else x2=0;
x3=x1*x2;
if status='case' then event=1;
else event=0;
drop count;
proc sort;
by idd;
run;
data new;
set cond;
event=2-event;
run;
proc print;
run;
proc phreg nosummary;
model event=x1 x2/ties=discrete;
strata idd;
run;

Obs    idd    status     driver    res    x1    x2    x3    event
  1      1    case         no      no      0     0     0      1
  2      1    control      no      no      0     0     0      2
  3      2    case         no      no      0     0     0      1
  4      2    control      no      no      0     0     0      2
  5      3    case         no      no      0     0     0      1
  6      3    control      no      no      0     0     0      2
  7      4    case         no      no      0     0     0      1
  8      4    control      no      no      0     0     0      2
  9      5    case         no      no      0     0     0      1
 10      5    control      no      no      0     0     0      2
 11      6    case         no      no      0     0     0      1
 12      6    control      no      no      0     0     0      2
 13      7    case         no      no      0     0     0      1
 14      7    control      no      no      0     0     0      2
 15      8    case         no      no      0     0     0      1
 16      8    control      no      no      0     0     0      2
 17      9    case         no      no      0     0     0      1
 18      9    control      no      no      0     0     0      2
 19     10    case         no      yes     0     1     0      1
 20     10    control      no      no      0     0     0      2

 ...

413    207    case         yes     yes     1     1     1      1
414    207    control      yes     yes     1     1     1      2
415    208    case         yes     yes     1     1     1      1
416    208    control      yes     yes     1     1     1      2
417    209    case         yes     yes     1     1     1      1
418    209    control      yes     yes     1     1     1      2
419    210    case         yes     yes     1     1     1      1
420    210    control      yes     yes     1     1     1      2
421    211    case         yes     yes     1     1     1      1
422    211    control      yes     yes     1     1     1      2
423    212    case         yes     yes     1     1     1      1
424    212    control      yes     yes     1     1     1      2
425    213    case         yes     yes     1     1     1      1
426    213    control      yes     yes     1     1     1      2
427    214    case         yes     yes     1     1     1      1
428    214    control      yes     yes     1     1     1      2
429    215    case         yes     yes     1     1     1      1
430    215    control      yes     yes     1     1     1      2
431    216    case         yes     yes     1     1     1      1
432    216    control      yes     yes     1     1     1      2
433    217    case         yes     yes     1     1     1      1
434    217    control      yes     yes     1     1     1      2
2

The DISCRETE option is necessary because the data consist of matched pairs,
with each pair containing a 1 and a 0 on the original dependent variable. For either 1:m
or n:m matching designs, the DISCRETE option is essential. In the current
setting, every matched pair contains both of the two values of the dependent variable.
In treatment-control matching, there are several ways of carrying out the analysis,
but in case-control matching designs, the options are limited (e.g., the conditional logit
and Cox approaches). Generalized estimating equations (GEE) are usually
suitable for treatment-control matching.

The event is recoded so that the 0's (the controls) become 2's. This ensures that
the modeled probability is the probability of being a case. The result of implementing
the above program is displayed in the partial output below.
                  The PHREG Procedure

                   Model Information

Data Set               WORK.NEW
Dependent Variable     event
Ties Handling          DISCRETE

                  Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 LOG L            300.826             291.280
AIC                 300.826             295.280
SBC                 300.826             303.426

          Testing Global Null Hypothesis: BETA=0

Test                  Chi-Square    DF    Pr > ChiSq
Likelihood Ratio          9.5456     2        0.0085
Score                     9.3130     2        0.0095
Wald                      8.8484     2        0.0120

          Analysis of Maximum Likelihood Estimates

Variable   DF   Parameter    Standard   Chi-Square   Pr > ChiSq   Hazard
                 Estimate      Error                               Ratio
x1          1    0.65761     0.29396       5.0043       0.0253     1.930
x2          1    0.25542     0.22583       1.2792       0.2581     1.291


We observe that the results presented from this alternative approach are identical
to those obtained from the conditional logistic approach using the differences of the
variables. Both procedures have their pros and cons. The PHREG approach
requires the data to be subject specific. This does not pose any problem if our data come
in this form (as is usual), and it has the further advantage that no transformations
to the z's are necessary.

The conditional logistic approach is also simple to use if the data are subject
specific, but one would need to output the cases as well as the controls in order to take
differences. It has the advantage, however, that it can easily be adapted to random-
or mixed-effects models in which the centers constitute a random sample from a
larger number of centers, i.e., the case in which the nuisance parameters $\alpha_i$ are
random. In such cases, the SAS macro GLIMMIX, which is available on the
SAS web site, will be appropriate.

8.10 A Five-Factor Response Example

The data for this example came from example 7.2, the Danish Welfare Study data
previously analyzed in chapter 7 (see page 270). The data relate to the response to
the question of whether there was a freezer in the household or not among subjects
in the 1976 Danish Welfare Study (from Andersen, 1997).

The data form a five-way 2 x 2 x 3 x 2 x 2 contingency table, with variables A sex,
B age, C family taxable income, D employment sector, and E whether there is a
freezer in the household. The age and income variables are defined as:

$$\text{Age} = \begin{cases} \text{Old} & \text{if age} \ge 40 \\ \text{Young} & \text{if age} < 40 \end{cases} \qquad \text{Income} = \begin{cases} \text{Low} & \text{if} < 60{,}000 \text{ D.kr} \\ \text{Medium} & \text{if } 60{,}000\text{--}100{,}000 \text{ D.kr} \\ \text{High} & \text{if} > 100{,}000 \text{ D.kr} \end{cases}$$

We wish to reanalyze these data by employing the subset selection strategies (forward,
backward, and stepwise) available in PROC LOGISTIC. These strategies
are discussed in turn below.

8.10.1 Model Based on the Forward Selection Procedure

This model is implemented in SAS software with the following program, together
with a partial output.

data new; set tab77;
proc logistic;
class a (ref=last) b (ref=last)
c (ref=last) d (ref=last)/param=ref;
weight count;
model e=a|b|c|d/scale=none aggregate=(a b c d)
selection=forward details;
run;
Step 0. Intercept entered:

          Analysis of Effects Not in the Model

Effect    DF    Score Chi-Square    Pr > ChiSq
a          1           0.0341          0.8536
b          1           0.4058          0.5241
c          2         105.1919          <.0001
d          1           0.0655          0.7980

Step 1. Effect c entered:

              Residual Chi-Square Test

Chi-Square    DF    Pr > ChiSq
   23.4781    21        0.3190

          Analysis of Effects Not in the Model

Effect    DF    Score Chi-Square    Pr > ChiSq
a          1           0.0232          0.8791
b          1           0.0369          0.8476
d          1           0.3582          0.5495

NOTE: No (additional) effects met the 0.05 significance level for entry into the
model.

       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       21    23.4249      1.1155        0.3218
Pearson        21    23.4781      1.1180        0.3190

          Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1      0.5209         0.0820             40.3436        <.0001
c  1          1      1.0960         0.1111             97.2486        <.0001
c  2          1      0.7865         0.1177             44.6221        <.0001

                  Odds Ratio Estimates

Effect      Point Estimate    95% Wald Confidence Limits
c 1 vs 3         2.992             2.407       3.720
c 2 vs 3         2.196             1.743       2.766

For the forward selection model, step 0 enters the intercept into the model. The
analysis of effects not in the model indicates that variable C is the only one with a
p-value less than 0.05 (the cutoff point for inclusion of a variable in the model).
Hence, at step 1, variable C is admitted into the model. The resulting analysis
of effects not in the model indicates that no other variable meets the criterion for
inclusion. The model involving only variable C is therefore selected; it has a G2 of
23.4249 (residual X2 = 23.4781) on 21 d.f. and fits the data. We next consider
the model selection strategy based on the backward procedure.

8.10.2 Model Based on the Backward Selection Procedure

This model is implemented in SAS software with the following program, together
with a partial output.

data new;
set tab77;
proc logistic;
class a (ref=last) b (ref=last)
c (ref=last) d (ref=last)/param=ref;
weight count;
model e=a|b|c|d/scale=none aggregate=(a b c d)
selection=backward details; run;


Step 0. The following effects were entered:
Intercept a b a*b c a*c b*c a*b*c d a*d b*d a*b*d c*d a*c*d b*c*d a*b*c*d

Step 1. Effect a*b*c*d is removed.
Step 2. Effect a*b*c is removed.
Step 3. Effect b*c*d is removed.
Step 4. Effect a*c*d is removed.
Step 5. Effect c*d is removed.
Step 6. Effect b*c is removed.
Step 7. Effect a*b*d is removed.
Step 8. Effect a*b is removed.
Step 9. Effect a*d is removed.

              Residual Chi-Square Test

Chi-Square    DF    Pr > ChiSq
   11.7796    15        0.6956

NOTE: No (additional) effects met the 0.05 significance level for removal from the
model.

            Summary of Backward Elimination

Step    Effect Removed    DF    Number In    Wald Chi-Square    Pr > ChiSq
  1       a*b*c*d          2        14            3.9139           0.1413
  2       a*b*c            2        13            0.0619           0.9695
  3       b*c*d            2        12            0.3052           0.8585
  4       a*c*d            2        11            0.4001           0.8187
  5       c*d              2        10            1.0766           0.5837
  6       b*c              2         9            2.3672           0.3062
  7       a*b*d            1         8            2.9163           0.0877
  8       a*b              1         7            0.3077           0.5791
  9       a*d              1         6            0.4656           0.4950

       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       15    11.8149      0.7877        0.6930
Pearson        15    11.7796      0.7853        0.6956

Number of unique profiles: 24

            Type III Analysis of Effects

Effect    DF    Wald Chi-Square    Pr > ChiSq
a          1           4.6557          0.0310
b          1           2.5730          0.1087
c          2          74.3851          <.0001
a*c        2           7.1950          0.0274
d          1           2.9882          0.0839
b*d        1           4.4481          0.0349

          Analysis of Maximum Likelihood Estimates

Parameter     DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept      1      0.1749         0.1521              1.3228        0.2501
a    1         1      0.3619         0.1677              4.6557        0.0310
b    1         1      0.2607         0.1625              2.5730        0.1087
c    1         1      1.4396         0.1707             71.1437        <.0001
c    2         1      1.0335         0.1853             31.1068        <.0001
a*c  1 1       1     -0.5933         0.2259              6.8963        0.0086
a*c  1 2       1     -0.4325         0.2411              3.2180        0.0728
d    1         1      0.2237         0.1294              2.9882        0.0839
b*d  1 1       1     -0.4213         0.1997              4.4481        0.0349


The analysis begins at step 0 with the inclusion of all effects in the model. The
criterion for the removal of a variable from the model is that its Type 3 p-value be
greater than 0.05; for two effects with competing p-values, the strategy is
to remove the higher-order effect first. The backward selection procedure requires
10 steps (0-9) to obtain the best subset of variables for these data. The effects removed
at each stage or step are summarized in the Summary of Backward Elimination in
the output above. For instance, at step 1, the a*b*c*d effect was removed because
its p-value of 0.1413 is greater than 0.05. At each stage, a Type 3 analysis was carried
out (not included in the output above), and effects were removed according to their
Type 3 p-values being greater than the cutoff point of 0.05. The final model is the logit
model {AC, BD}, which is equivalent to the log-linear model {ABCD, ACE, BDE}.
The model is based on 15 d.f. with a G2 value of 11.8149.

8.10.3 Model Based on the Stepwise Selection Procedure

This model is implemented in SAS software with the following program, together
with a partial output.

data new;
set tab77;
proc logistic;
class a (ref=last) b (ref=last)
c (ref=last) d (ref=last)/param=ref;
weight count;
model e=a|b|c|d/scale=none aggregate=(a b c d)
selection=stepwise; run;
Step 0. Intercept entered:

          Analysis of Effects Not in the Model

Effect    DF    Score Chi-Square    Pr > ChiSq
a          1           0.0341          0.8536
b          1           0.4058          0.5241
c          2         105.1919          <.0001
d          1           0.0655          0.7980

Step 1. Effect c entered:

          Analysis of Effects Not in the Model

Effect    DF    Score Chi-Square    Pr > ChiSq
a          1           0.0232          0.8791
b          1           0.0369          0.8476
d          1           0.3582          0.5495

NOTE: No (additional) effects met the 0.05 significance level for entry into the
model.

            Summary of Stepwise Selection

Step    Effect Entered    DF    Number In    Score Chi-Square    Pr > ChiSq
  1          c             2         1            105.1919          <.0001

       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       21    23.4249      1.1155        0.3218
Pearson        21    23.4781      1.1180        0.3190

Number of unique profiles: 24

          Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1      0.5209         0.0820             40.3436        <.0001
c  1          1      1.0960         0.1111             97.2486        <.0001
c  2          1      0.7865         0.1177             44.6221        <.0001

                  Odds Ratio Estimates

Effect      Point Estimate    95% Wald Confidence Limits
c 1 vs 3         2.992             2.407       3.720
c 2 vs 3         2.196             1.743       2.766

Again, for the stepwise approach, at step 0 the intercept is introduced into the
model. At step 1, inclusion in the model is based on a p-value less than 0.05. The
score statistics show that effect C is the prime candidate at this stage, and effect C
is therefore introduced. The analysis of the effects not in the
model at this stage indicates that no other effect meets the entry criterion of 0.05.
The entry criterion can be changed in SAS software with SLE=value; here the
default value, 0.05, is used. Similarly, the criterion to stay or remain in the model can
be changed with the option SLSTAY=value; again, the default is 0.05. The model
selected is the logit model {C}, which is equivalent to the log-linear model
{ABCD, CE}. This is the same model selected by the forward selection procedure.

One obvious disadvantage of the forward and stepwise selection procedures is
the lack of consideration for entry of two-factor, three-factor, or four-factor effects.
We can change this by forcing the procedures to include, say, the first n of the
s = 2^r - 1 effect terms, where r is the number of factor variables. In our case,
s = 2^4 - 1 = 15. We present below a stepwise procedure with a starting value of 10.
The model selected with this option corresponds to that selected by the backward
selection procedure, namely, the logit model {AC, BD}. The partial output for the
implementation is presented below.
proc logistic;
class a (ref=last) b (ref=last)
c (ref=last) d (ref=last)/param=ref;
weight count;
model e=a|b|c|d/scale=none aggregate=(a b c d)
selection=stepwise details start=10;
run;
Step 0. The following effects were entered:
Intercept a b a*b c a*c b*c a*b*c d a*d b*d

Step 1. Effect a*b*c is removed.
Step 2. Effect a*b is removed.
Step 3. Effect a*d is removed.
Step 4. Effect b*c is removed.

NOTE: No (additional) effects met the 0.05 significance level for entry into the
model.

            Summary of Stepwise Selection

Step    Effect Removed    DF    Number In    Wald Chi-Square    Pr > ChiSq
  1        a*b*c           2         9            0.1773           0.9151
  2        a*b             1         8            0.3720           0.5419
  3        a*d             1         7            0.3882           0.5332
  4        b*c             2         6            2.7224           0.2563

       Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       15    11.8149      0.7877        0.6930
Pearson        15    11.7796      0.7853        0.6956

            Type III Analysis of Effects

Effect    DF    Wald Chi-Square    Pr > ChiSq
a          1           4.6557          0.0310
b          1           2.5730          0.1087
c          2          74.3851          <.0001
a*c        2           7.1950          0.0274
d          1           2.9882          0.0839
b*d        1           4.4481          0.0349

8.10.4 Interpretations of Results

Based on the above analyses, the most parsimonious model selected is the logit
model {AC, BD}, which is equivalent to the log-linear model {ABCD, ACE, BDE}.
The model has a G2 = 11.8149 on 15 d.f., and the interaction parameters under the
logit model are presented below.
Parameter     DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
a*c  1 1       1     -0.5933         0.2259              6.8963        0.0086
a*c  1 2       1     -0.4325         0.2411              3.2180        0.0728
b*d  1 1       1     -0.4213         0.1997              4.4481        0.0349

The ACE interaction term indicates that, given the response $E_p$, the sex $A_i$ and family
income $C_k$ of respondents are conditionally independent of age and employment
sector. Thus, if we keep sex constant, we can compute and interpret the relevant log
odds ratios pertaining to family income and freezer status. But first we form the
three-way ACE marginal table, Table 8.16, and use it to obtain the estimated
log odds ratios. These estimates are displayed in Table 8.17.
We observe that all six estimates in Table 8.17 are positive. The overall pattern
is that, for the same values of k and k', females are more likely to have a freezer in
the household. For females, the odds of having a freezer relative to not having one
are $e^{1.419} = 4.13$ times higher for respondents with low family income than for those
with high family income. For men, this value is $e^{0.846} = 2.33$. Similarly, for females
the odds are $e^{1.029} = 2.80$ times higher for those on medium income than for those
with high family income; again, the corresponding value for men is $e^{0.601} = 1.82$.
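The entries of Table 8.17 can be reproduced directly from the Table 8.16 counts; for
example, for females with low versus high income, $\log[(479 \times 116)/(160 \times 84)] = 1.419$.
A short sketch (data-set and variable names are illustrative only):

data logor_ac;
   input sex $ yes_k no_k yes_kp no_kp @@;
   /* log odds ratio of freezer (yes vs no) for income level k vs k' */
   theta = log((yes_k*no_kp)/(yes_kp*no_k));
   datalines;
m 594 129 407 113  m 594 129 239 121  m 407 113 239 121
f 479 84 251 65    f 479 84 160 116   f 251 65 160 116
;
proc print data=logor_ac; var sex theta; run;
/* gives 0.246, 0.846, 0.601 for males and 0.390, 1.419, 1.029 for females */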
For the BDE interaction effect, we present in Table 8.18 the observed BDE marginal
table. The estimated log odds ratios $\hat{\theta}^{DE \cdot B}_{(ll')(pp') \cdot j}$ are -0.241 and 0.126 for j = 1 and
j = 2, respectively. Thus, for the older respondents (age $\ge$ 40), the odds of owning a
freezer are about 21% lower ($e^{-0.241} = 0.79$) for those in the private sector, relative to those in the public


                              Freezer status
Sex A      Income C        Yes          No
  i           k            p=1          p=2
  1           1            594          129
              2            407          113
              3            239          121
  2           1            479           84
              2            251           65
              3            160          116

Table 8.16: Observed ACE marginal table with A fixed at 1 and 2

Sex      i    Family income (C)     k    k'    Freezer status    p    p'    $\hat{\theta}^{CE \cdot A}_{(kk')(pp') \cdot i}$
Male     1    Low vs Medium         1    2       Yes vs No       1    2              0.246
              Low vs High           1    3       Yes vs No       1    2              0.846
              Medium vs High        2    3       Yes vs No       1    2              0.601
Female   2    Low vs Medium         1    2       Yes vs No       1    2              0.390
              Low vs High           1    3       Yes vs No       1    2              1.419
              Medium vs High        2    3       Yes vs No       1    2              1.029

Table 8.17: Estimated log odds ratios for the interaction of family income and
freezer status, given sex

sector. However, for the younger respondents (under 40 years old), the corresponding
odds of having a freezer are about 13% higher ($e^{0.126} = 1.13$) for those in the private
sector relative to those in the public sector.
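A companion sketch for Table 8.18 reproduces the two BD log odds ratios just quoted
(again, the data-set and variable names are illustrative):

data logor_bd;
   input j yes_1 no_1 yes_2 no_2;
   /* log odds ratio of freezer (yes vs no), sector 1 vs sector 2, within age j */
   theta = log((yes_1*no_2)/(yes_2*no_1));
   datalines;
1 533 177 322 84
2 856 419 236 131
;
proc print data=logor_bd; var j theta; run;   /* -0.241 and 0.126 */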

8.11 Exercises

1. The data in the table below are reported in Woodward et al. (1941) and
are reproduced from Christensen (1990). The data examine the relationship
between exposure to chloroacetic acid and the death of mice. Ten mice were
exposed at each dose level, and the doses are measured in grams per kilogram
of body weight.

                               Freezer status
Age B    Employment D        Yes          No
  j           l              p=1          p=2
  1           1              533          177
              2              322           84
  2           1              856          419
              2              236          131

Table 8.18: Observed BDE marginal table with B fixed at 1 and 2


Dose      Dead    Exposed
0.0794      1        10
0.1000      2        10
0.1259      1        10
0.1413      0        10
0.1500      1        10
0.1588      2        10
0.1778      4        10
0.1995      6        10
0.2239      4        10
0.2512      5        10
0.2818      5        10
0.3162      8        10

Fit the logistic regression model to the data and estimate the LD50, LD90,
and LD99.9. Discuss the possible danger of extrapolation to the LD99.9. Determine
how well the model fits the data.

2. For the data in problem 1 above, fit the probit, complementary log-log,
extreme value, and probability models. What can you say about your
fits? Calculate the residuals and test whether there is lack of fit. (Note: A
probability model has $\pi(x) = \beta_0 + \beta_1 x$, where $\pi(x)$ is the proportion, and is
fitted by invoking the identity link.) Also obtain the estimated proportions $\hat{\pi}(x)$
for all the models.
3. Show that the logistic distribution function

$$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

has its steepest slope when $\pi(x) = \frac{1}{2}$. By rewriting $\pi(x)$ as a linear model,
show that the LD50 is obtained as $-\beta_0/\beta_1$.
4. Find the LD50 for the complementary log-log function.
5. The data in Table 8.19 compare male lung cancer patients with control patients
having other diseases, according to the average number of cigarettes
smoked daily over the 10-year period preceding the onset of lung cancer.
The data are from a retrospective study of the incidence of lung cancer
and tobacco smoking among patients in hospitals in English cities (Agresti,
1990).
1990).
Daily average              Disease group
no. of cigarettes    Control patients    Lung cancer patients
       0                    61                    7
       3                   129                   55
       9.5                 570                  489
      19.5                 431                  475
      37                   154                  293
      55                    12                   38

Table 8.19: Incidence of cancer and smoking on patients


Fit a logit model to the data.

6. An antihistaminic drug was used at various doses to protect test animals
against a certain lethal dose of histamine, with the results given below.

Dose (mg/kg)    Alive/total
   1000             8/8
    500             7/8
    250             4/8
    125             4/8
     62.5           1/8

Fit the logistic and probit models to the data above and compute the LD50
in each case. Comment on your models.
7. The proportion of individuals in a certain city in Zaire with malaria antibodies
present was recorded (Morgan, 1992) as in the following table:

Mean age    Seropositive    Total        Mean age    Seropositive    Total
   1.0           2            60           22.0           20           84
   2.0           3            63           27.5           19           77
   3.0           3            53           32.0           19           58
   4.0           3            48           36.8           24           75
   5.0           1            31           41.6            7           30
   7.3          18           182           49.7           25           62
  11.9          14           140           60.8           44           74
  17.1          20           138
(a) Fit a logistic model to the above data.
(b) Repeat the analysis with a probit model.
(c) Which of the models fits the data better?

Seropositive
20
19
19
24

7
25
44

Total
84
77
58
75
30
62
74


(d) With your chosen model, predict the expected probability for an individual with a mean age of 25 years.
(e) Find an ED90 from your chosen model and interpret it.
8. Two anticonvulsant drugs were compared by administering them to mice,
which were then given electric shock under conditions that caused all control
mice to convulse. The results of the experiment are displayed in the table
below (Goldstein, 1965).

        Drug A                           Drug B
Dose (mg/kg)   Convulsed/total    Dose (mg/kg)   Convulsed/total
     10            13/15              200            12/15
     30             9/15              600             6/15
     90             4/15             1800             2/15

(a) Fit separate regression lines for both drugs and hence obtain an estimate
of the relative potency from estimates of their LD50's.
(b) Fit a combined regression line and test for equality of slopes. Test whether
there are dosage and/or drug effects. Summarize your conclusions.
9. The data below are reproduced from Collett (1991) and were originally reported
by Strand (1930). The experiment was to assess the response of
the confused flour beetle, Tribolium confusum, to gaseous carbon disulphide
(CS2). Prescribed volumes of liquid carbon disulphide were added to flasks
in which a tubular cloth cage containing a batch of about thirty beetles was
suspended. Duplicate batches of beetles were used for each concentration of
CS2. At the end of a 5-hour period, the proportion killed was recorded, and the
actual concentration of gaseous CS2 in the flask, measured in mg/liter, was
determined by a volumetric analysis. The mortality data are presented below.

Number of beetles killed, y, out of n exposed to concentrations of CS2

Concentration      Replicate I      Replicate II
   of CS2            y      n         y      n
   49.06             2     29         4     30
   52.99             7     30         6     30
   56.91             9     28         9     34
   60.84            14     27        14     29
   64.76            23     30        29     33
   68.69            29     31        24     28
   72.61            29     30        32     32
   76.54            29     29        31     31

(a) Fit separate logistic models to the data in each replicate. How well do
these models fit the data?

(b) Now combine the data and fit a third-degree polynomial logistic regression
model. Discuss your findings and hence give the most parsimonious model for
the combined data. Estimate both the LD50 and the LD90 under your assumed
model and interpret them.

10. Repeat the above analysis using the probit models.

11. The data in Table 8.20 from Breslow and Day (1980) relate to the occurrence
of esophageal cancer in Frenchmen. Potential risk factors related to the occurrence
are age and alcohol consumption, where any consumption of wine of
more than one liter a day is considered high.

Age group    Alcohol          Cancer
             consumption      Yes    No
25-34        High             1      9
             Low              0      106
35-44        High             4      26
             Low              5      164
45-54        High             25     29
             Low              21     138
55-64        High             42     27
             Low              34     139
65-74        High             19     18
             Low              36     88
75+          High             5      0
             Low              8      31

Table 8.20: Occurrence of esophageal cancer

(a) Fit a logistic model with the explanatory variables age and alcohol consumption, by first considering age as a categorical variable and then as a continuous variable.
(b) Consider fitting the interaction term in both situations above. Use the stepwise regression procedure to find the most parsimonious model. Interpret your results.

12. For the data in the table below (reproduced from Collett, 1991):

Mortality of the tobacco budworm 72 hours after exposure to cypermethrin (Holloway, 1989)

Gender      Dosage of        Number affected
of moth     cypermethrin     out of 20
Male        1.0              1
            2.0              4
            4.0              9
            8.0              13
            16.0             18
            32.0             20
Female      1.0              0
            2.0              2
            4.0              6
            8.0              10
            16.0             12
            32.0             16
(a) Fit separate lines to the male and female data.
(b) Fit parallel lines for the two genders.
(c) Fit a common line to both data sets and discuss your results in each case.
(d) Also fit a logistic model with gender and dose as the explanatory variables (you may also consider including their interaction term in the model). Again, discuss your results.
(e) Consider the fact that the dosage levels are continuous and fit a logistic model with gender as an explanatory variable, including terms with powers of dosage level (e.g., D^5).
13. Refer to problem 6.10:
(a) Fit a log-linear model that explains following politics regularly in terms
of nationality and education level. Hint: We assume here that FP is a
response variable.
(b) Repeat (a) but this time ignore the fact that FP is a response variable.
That is, treat all variables as factor variables.
(c) Compare your results.
14. The following data from Finney (1941) and Pregibon (1981) relate to the
occurrence of vasoconstriction in the skin of the fingers as a function of the
rate and volume of air breathed. In times of stress, vasoconstriction restricts
blood flow to the extremities (such as fingers and toes), forcing blood to the
central vital organs. The data are reproduced below. A constriction value of
1 indicates that constriction occurred.

Constriction   Volume   Rate        Constriction   Volume   Rate
1              0.825    3.7         0              2.0      0.4
1              1.09     3.5         0              1.36     0.95
1              2.5      1.25        0              1.35     1.35
1              1.5      0.75        0              1.36     1.5
1              3.2      0.8         1              1.78     1.6
1              3.5      0.7         0              1.5      0.6
0              0.75     0.6         1              1.5      1.8
0              1.7      1.1         0              1.9      0.95
0              0.75     0.9         1              0.95     1.9
0              0.45     0.9         0              0.4      1.6
0              0.57     0.8         1              0.75     2.7
0              2.75     0.55        0              0.03     2.35
0              3.0      0.6         0              1.83     1.1
1              2.33     1.4         1              2.2      1.1
1              3.75     0.75        1              2.0      1.2
1              1.64     2.3         1              3.33     0.8
1              1.6      3.2         0              1.9      0.95
1              1.415    0.85        0              1.9      0.75
0              1.06     1.7         1              1.625    1.3
1              1.8      1.8
(a) Use SAS software to plot the graph (a half page) of rate against volume for both constriction values on the same page. Are there any outliers or influential observations?
(b) Fit a logistic regression to the data with rate and volume as explanatory variables. Discuss your results.
(c) Pregibon (1981) fits a logit response model having logs of volume and rate as explanatory variables. How does this model compare with your model in (b)? Pregibon suggests that the rate for observation 32 should be 0.3 rather than 0.03 as it appears in the table above. Is there any evidence for this suggestion? Identify any lack of fit by carrying out the necessary diagnostic procedures.
15. The data below relate to a sample of patients with coronary heart disease (CHD) and a "normal" sample free of CHD (Lunneborg, 1994). A 1 indicates that the patient has no CHD, while a 2 indicates that the patient has CHD. Three risk factors are being evaluated: systolic blood pressure (SBP), blood-cholesterol level (Chol), and age of the patients.
group    sbp    chol    age
1        135    227     45
1        122    228     41
1        130    219     49
1        148    245     52
1        146    223     54
1        129    215     47
1        162    245     60
1        160    262     48
1        144    230     44
1        166    255     64
1        138    222     59
1        152    250     51
1        138    264     54
1        140    271     56
1        134    220     50
2        145    238     60
2        142    232     64
2        135    225     54
2        149    230     48
2        180    255     43
2        150    240     43
2        161    253     63
2        170    280     63
2        152    271     62
2        164    260     65
If we define the variable Y to be

Y = 1 if CHD, and Y = 0 if no CHD,

fit a parsimonious linear logistic regression model to the data and interpret your results.
16. The data in Table 8.21 are reproduced from Slaton et al. (2000). They came from an experiment that examined the in utero damage in laboratory rodents after exposure to boric acid. The experiment used four levels of boric acid and recorded the number of rodents in the litter and the number of dead embryos.
   Dose=0.0        Dose=0.1        Dose=0.2        Dose=0.3
Dead  Litter    Dead  Litter    Dead  Litter    Dead  Litter
      size            size            size            size
0     15        0     6         1     12        12    12
0     3         1     14        0     12        1     12
1     9         1     12        0     11        0     13
1     12        0     10        0     13        2     8
1     13        2     14        0     12        2     12
2     13        0     12        0     14        4     13
0     16        0     14        4     15        0     13
0     11        3     14        0     14        1     13
1     11        0     10        0     12        0     12
2     8         2     12        1     6         1     9
0     14        3     13        2     13        3     9
0     13        1     11        0     10        0     11
3     14        1     11        1     14        1     14
1     13        0     11        1     12        0     10
0     8         0     13        0     10        3     12
0     13        0     10        0     9         2     21
2     14        1     12        1     12        3     10
3     14        0     11        0     13        3     11
0     11        2     10        1     14        1     11
2     12        2     12        0     13        1     11
0     15        2     15        0     14        8     14
0     15        3     12        1     13        0     15
2     14        1     12        2     12        2     13
1     11        0     12        1     14        8     11
1     16        1     12        0     13        4     12
0     12        1     13        0     12        2     12
0     14        1     15        1     7

Table 8.21: Damage in laboratory rodents after exposure to boric acid


Analyze the above data and discuss your results.
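One possible starting point for this exercise is sketched below: read the (dead, litter size) pairs for each dose group and fit a binomial logit model with the events/trials syntax in PROC GENMOD. The variable names are illustrative, and only the first few litters per dose are shown in the datalines; the remaining pairs from Table 8.21 would be entered in the same way.

data boric;
input dose dead size @@;  * dose level, dead embryos, litter size;
datalines;
0.0 0 15   0.0 0 3    0.0 1 9
0.1 0 6    0.1 1 14   0.1 1 12
0.2 1 12   0.2 0 12   0.2 0 11
0.3 12 12  0.3 1 12   0.3 0 13
;
proc genmod data=boric;
class dose;
model dead/size = dose / dist=bin link=logit type3;
run;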


Chapter 9

Logit and Multinomial Response Models

9.1 Introduction

In logistic regression, discussed in chapter 8, the emphasis has been on the conditional probabilities of a single response variable given one or several factors. In this chapter, however, we shall be concerned with data having a dependent variable and several other independent or factor variables. Specifically, we shall be interested in categorical dependent and/or explanatory variables, that is, ANOVA type models. The class of log-linear models that utilize the binary nature of the dependent variable is called logit models. These logit models are examined in the first few sections of this chapter. We shall then extend this methodology to situations where the response variable has multiple outcomes. Specifically, we shall examine in turn cases where the multiple outcomes are either nominal or ordinal in nature.
However, before we go any further, let us consider the general 2 x J table below.

            B
A      1       2       ...     J
1      m11     m12     ...     m1J
2      m21     m22     ...     m2J

For the data in the table above, the saturated log-linear model is given by:

$$\ln(m_{ij}) = \mu + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$$

with i = 1, 2, j = 1, 2, 3, ..., J, and with the usual constraints.
However, since variable A is the dependent variable and is dichotomous (or binary), we can use logits that are given by:

$$\ln\left(\frac{m_{1j}}{m_{2j}}\right) = \ln(m_{1j}) - \ln(m_{2j}) = (\lambda_1^A - \lambda_2^A) + (\lambda_{1j}^{AB} - \lambda_{2j}^{AB})$$

In particular, $\lambda_1^A - \lambda_2^A = 2\lambda_1^A$, and similarly, $\lambda_{1j}^{AB} - \lambda_{2j}^{AB} = 2\lambda_{1j}^{AB}$.


Thus, the logits are functions of the same λ parameters that appear in the general log-linear model.
Let us now extend this further to a three-way 2 x 2 x 2 contingency table of expected values (under some model) below:

                          C
A        B         k = 1      k = 2
i = 1    j = 1     m_111      m_112
         j = 2     m_121      m_122
i = 2    j = 1     m_211      m_212
         j = 2     m_221      m_222

In the above 2^3 table of expected values, let us suppose that A and B are factor variables and C is a binary response or dependent variable.
Let $\pi_{k \cdot ij} = m_{ijk}^{C \cdot AB}/m_{ij}^{AB}$ denote the conditional probability that C = k given that A = i and B = j, and let us further define

$$\eta_{ij} = \ln\left(\frac{m_{ij1}}{m_{ij2}}\right) \qquad (9.1)$$

to be the log-odds (i.e., logit) that C is 1 rather than 2 given that A = i and B = j. Since $\pi_{1 \cdot ij} + \pi_{2 \cdot ij} = 1$, we have

$$\hat{\eta}_{ij} = \ln\left[\frac{p_{1 \cdot ij}}{1 - p_{1 \cdot ij}}\right]$$

where the p's are the corresponding observed probabilities.


The three-variable saturated log-linear model for the table is given by:

$$\ln(m_{ijk}) = \mu + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC} \qquad (9.2)$$

But from (9.1),

$$\eta_{ij} = \ln(m_{ij1}) - \ln(m_{ij2}) \qquad (9.3)$$

Hence, substituting (9.3) in (9.2), we have

$$\eta_{ij} = (\lambda_1^C - \lambda_2^C) + (\lambda_{i1}^{AC} - \lambda_{i2}^{AC}) + (\lambda_{j1}^{BC} - \lambda_{j2}^{BC}) + (\lambda_{ij1}^{ABC} - \lambda_{ij2}^{ABC}) = 2(\lambda_1^C + \lambda_{i1}^{AC} + \lambda_{j1}^{BC} + \lambda_{ij1}^{ABC})$$

which can succinctly be written in the logit model form:

$$\eta_{ij} = \eta + \eta_i^A + \eta_j^B + \eta_{ij}^{AB} \qquad (9.4)$$

with $\eta_1^A + \eta_2^A = 0$, etc., and where

$$\eta = 2\lambda_1^C, \quad \eta_i^A = 2\lambda_{i1}^{AC}, \quad \eta_j^B = 2\lambda_{j1}^{BC}, \quad \eta_{ij}^{AB} = 2\lambda_{ij1}^{ABC}$$

We can now give in Table 9.1 equivalent logit models for the corresponding log-linear models for a three-way contingency table.
We can now give in Table 9.1 equivalent logit models for the corresponding log-linear
model for a three-way contingency table.
For any given logit model, the corresponding log-linear equivalent model can be obtained by noting the following:

Model    Logit model                                          Log-linear
1        $\eta + \eta_i^A + \eta_j^B + \eta_{ij}^{AB}$        ABC
2        $\eta + \eta_i^A + \eta_j^B$                         AB/AC/BC
3        $\eta + \eta_i^A$                                    AB/AC
4        $\eta + \eta_j^B$                                    AB/BC
5        $\eta$                                               AB/C

Table 9.1: Equivalence of logit and log-linear models


(a) The log-linear equivalent always includes the two-way interaction between the
explanatory or factor variables, that is, AB in this case.
(b) The log-linear equivalent contains the interaction of the response variable C
with the effects that are specified in the logit model.
For simplicity, we would like to write logit models succinctly. For example, the
saturated model:

n + rf + rif+'riij*
would be written succinctly as {AB}. The logit model {B} is equivalent to the
log-linear model given by {AB,BC} using (a) and (b) above.
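As a quick numerical check of rules (a) and (b), the sketch below fits the logit model {B} with PROC LOGISTIC and its log-linear equivalent {AB,BC} with PROC GENMOD; the counts and variable names (a, b, c) form a small hypothetical 2 x 2 x 2 table used only for illustration, and the two deviances should agree.

data equiv;
input a b c count @@;
datalines;
1 1 1 25 1 1 2 15 1 2 1 30 1 2 2 10
2 1 1 20 2 1 2 20 2 2 1 35 2 2 2 5
;
* logit model {B} for the response C;
proc logistic data=equiv;
class a b; weight count;
model c = b / scale=none aggregate=(a b);
run;
* equivalent log-linear model {AB,BC};
proc genmod data=equiv;
class a b c;
model count = a|b b|c / dist=poi link=log;
run;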
If we extend this to a four-way 2^4 contingency table having A, B, and C as factor variables and D as the response variable, we similarly have, for example, the following logit and log-linear model equivalents.
Models    Logit          Log-linear
1         ABC            ABCD
2         B/C            ABC/BD/CD
3         A/C            ABC/AD/CD
4         A/B            ABC/AD/BD
5         A/B/C          ABC/AD/BD/CD
6         BC/A           ABC/BCD/AD
7         AC/B           ABC/ACD/BD
8         AB/C           ABC/ABD/CD
9         AC/BC          ABC/ACD/BCD
10        AB/BC          ABC/ABD/BCD
11        AB/AC          ABC/ABD/ACD
12        AB/AC/BC       ABC/ABD/ACD/BCD
13        $\eta$         ABC/D

Table 9.2: Equivalent models for four-way tables


We note here again that, by (a) above, each log-linear equivalent model contains the three-way factor interaction ABC. The remaining terms are D multiplied by the other terms in the logit model formulation.

9.1.1 Gun Registration Data Revisited: Example 9.1

In chapter 6, we analyzed the gun registration data presented in that chapter in Table 6.10. There, we have a 2^3 contingency table with a response variable R, response to gun registration (favors, opposes), Y, year of survey (1975, 1976), and Q,

the form of questionnaire administered (Q1, Q2). Since R is a response variable, we start by fitting the saturated logit model (YQ), which is equivalent to the log-linear model (RYQ). This can be accomplished by using either SAS PROC CATMOD, PROC LOGISTIC, or PROC GENMOD. Let us use PROC LOGISTIC in this example. Preliminary analysis gives the following SAS software output.

DATA TAB521;
INPUT R $ Y $ Q $ COUNT @@;
DATALINES;
OPP 1975 Q1 126 OPP 1975 Q2 141 OPP 1976 Q1 152 OPP 1976 Q2 182
FAV 1975 Q1 319 FAV 1975 Q2 290 FAV 1976 Q1 463 FAV 1976 Q2 403
;
PROC LOGISTIC;
CLASS Y Q; WEIGHT COUNT; MODEL R=Y|Q/SCALE=NONE AGGREGATE; RUN;
Type III Analysis of Effects

Effect    DF    Wald Chi-Square    Pr > ChiSq
Y         1     1.7466             0.1863
Q         1     7.2378             0.0071
Y*Q       1     0.3220             0.5704

We see that only the effect of Q on R is important. We also note here that we are
modeling the response category "favors gun registration." Had we wanted to model
"oppose gun registration," we would have included the statement descending after
the PROC LOGISTIC statement. We next therefore fit the logit model (Q), which is
equivalent to the log-linear model (YQ,RQ). We use PROCs CATMOD, LOGISTIC
and GENMOD to implement this model. The following SAS software program and
partial outputs illustrate these fits.
proc catmod; population y q;
weight count; model r=q; run;
The CATMOD Procedure
Analysis of Maximum Likelihood Estimates

Effect       Parameter    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1            0.8988      0.0485            343.22        <.0001
Q            2            0.1354      0.0485            7.79          0.0052

proc logistic; class y q; weight count;


model r=q/scale=none aggregate=(y q); output out=aa p=pred; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF    Value     Value/DF    Pr > ChiSq
Deviance     2     2.0154    1.0077      0.3651
Pearson      2     2.0228    1.0114      0.3637

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     0.8988      0.0485            343.2217      <.0001
Q1           1     0.1354      0.0485            7.7922        0.0052

Odds Ratio Estimates

Effect      Point Estimate
Q1 vs Q2    1.311


proc genmod; class r y q; model count=y|q r|q/dist=poi link=log type3;
run;

Analysis Of Parameter Estimates

Parameter       DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
R*Q  FAV  Q1    1     0.2709      0.0970            0.0807    0.4610              7.79          0.0052
R*Q  FAV  Q2    0     0.0000      0.0000            0.0000    0.0000

The statement population Y Q in CATMOD induces a sampling scheme with the YQ margin fixed. The same is accomplished in LOGISTIC with the aggregate option. Because the coding schemes in PROC CATMOD and LOGISTIC are effect coding schemes, the parameter estimate for Q and its corresponding standard error are half of those given by PROC GENMOD, and the odds of Q1 against Q2 equal $e^{0.2709} = e^{2(0.1354)} = 1.311$. The logit model (Q) gives a $G^2$ = 2.0154 on 2 d.f. (p = 0.3651), a good fit. The parameter estimate under the GENMOD model equals $\hat{\lambda}^{RQ} = 0.2709$; the equivalent parameter estimated in CATMOD is 0.1354. Thus the odds ratio for Q1 versus Q2 (for those who favor gun registration) is $e^{0.2709} = 1.31$ under the GENMOD model and is equal to $e^{2(0.1354)} = 1.31$ under the CATMOD model. Thus those that are administered the form of questionnaire 1 are 1.31 times more likely to respond favorably to gun registration than those that were administered the form of questionnaire 2. This conclusion is consistent with that obtained earlier in chapter 6. The following prediction probabilities (pred) for favoring gun registration also confirm our result regardless of year of survey.
Y       Q     pred
1975    Q1    0.7377
1975    Q2    0.6821
1976    Q1    0.7377
1976    Q2    0.6821
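These predicted probabilities can be verified directly from the effect-coded estimates above (intercept 0.8988, Q effect plus or minus 0.1354); a minimal check:

data predcheck;
p_q1 = 1/(1 + exp(-(0.8988 + 0.1354)));  * = 0.7377 for Q1;
p_q2 = 1/(1 + exp(-(0.8988 - 0.1354)));  * = 0.6821 for Q2;
put p_q1= p_q2=;
run;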

9.1.2 Example 9.2

The data in Table 9.3 are for a 2^4 contingency table from Demo and Parker (1987). They relate to the effect of academic achievement on self-esteem among black and white college students.

SEX(i)     GPA(j)    RACE(k)    ESTEEM(l)
                                High    Low
Males      High      Black      15      9
                     White      17      10
           Low       Black      26      17
                     White      22      26
Females    High      Black      13      22
                     White      22      32
           Low       Black      24      23
                     White      3       17

Table 9.3: A 2^4 table from Demo and Parker (1987)


If we regard the variable "self-esteem" as a response variable, we can employ the
logit model formulation to find a model that adequately describes the data. We


give below the relevant SAS software program for fitting the saturated logit model to these data, using PROCs CATMOD, LOGISTIC, and GENMOD.

data tab82;
do s=1 to 2; do g=1 to 2; do r=1 to 2; do e=1 to 2;
input count @@; output; end; end; end; end;
datalines;
15 9 17 10 26 17 22 26 13 22 22 32 24 23 3 17
;
*** Use CATMOD ***;
proc catmod; weight count; population s g r; model e=s|g|r; run;
*** Use PROC LOGISTIC ***;
proc logistic; class s g r; weight count; model e=s|g|r/scale=none aggregate=(s g r); run;
*** Use PROC GENMOD ***;
proc genmod; class s g r e; model count=s|g|r|e/dist=poi link=log type3; run;

We start modeling the above data by first fitting the saturated logit model to the data using PROC CATMOD and LOGISTIC (version 8.0) and the equivalent log-linear model using PROC GENMOD.
In the use of PROC CATMOD above, we fit the logit saturated model {SGR} to the data, which produces the analysis of variance table below that enables us to determine which effect or effects need to be included in a future reduced model. We present a modified output from PROC CATMOD below.
The CATMOD Procedure
Maximum Likelihood Analysis of Variance

Source    DF    Chi-Square    Pr > ChiSq
s         1     12.69         0.0004 *
r         1     4.05          0.0443 *
g*r       1     5.40          0.0201 *

Of the effects and interactions {S, G, SG, R, SR, GR, SGR} in the model, only the interaction term GR is significant, with a G^2 value of 5.40 on 1 d.f. (pvalue = 0.020). Neither of the other two-factor interactions SG and SR is significant. The main effect terms S and R are also significant. Thus our reduced model would be the logit model {S,GR}. This model corresponds to the log-linear model {SGR,ES,EGR}. Below are modified SAS software outputs from the saturated models based on ML analysis (or Type 3) of effects and interactions from PROC LOGISTIC and GENMOD. In the GENMOD output, we are only looking for significant interactions involving E (the response variable). Note that we have only reported those that are significant.
The LOGISTIC Procedure
Type III Analysis of Effects

Effect    DF    Wald Chi-Square    Pr > ChiSq
s         1     12.6839            0.0004 *
r         1     4.0442             0.0443 *
g*r       1     5.4027             0.0201 *

The GENMOD Procedure
LR Statistics For Type 3 Analysis

Source    DF    Chi-Square    Pr > ChiSq
s*e       1     13.82         0.0002 *
r*e       1     4.22          0.0398 *
g*r*e     1     5.68          0.0171 *


Having identified our reduced model, that is the logit model {S,GR} with corresponds to the log-linear model {SGR, ES, EGR}, we present below the SAS software
program to implement this model for the three procedures together with the modified outputs again from the three procedures. The reduced model fits the data with
G2 = 2.5165 on 3 d.f. (pvalue = 0.4723).
set tab82; proc catmod; weight count; population s g r; model e=glr s; run;
proc logistic; class s g r; weight count; model e=g|r s/scale=none aggregate=(s g r);
run;
proc genmod; class s g r e; model count=s|g|r g|r|e sIe/dist=poi link=log type3; run;
The CATMOD Procedure
Maximum Likelihood Analysis of Variance

Source              DF    Chi-Square    Pr > ChiSq
Intercept           1     0.84          0.3600
g                   1     1.14          0.2847
r                   1     3.41          0.0647
g*r                 1     4.84          0.0278
s                   1     11.52         0.0007
Likelihood Ratio    3     2.52          0.4723

Analysis of Maximum Likelihood Estimates

Effect       Parameter    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1            -0.1109     0.1212            0.84          0.3600
g            2            0.1336      0.1249            1.14          0.2847
r            3            0.2254      0.1220            3.41          0.0647
g*r          4            -0.2722     0.1237            4.84          0.0278
s            5            0.4262      0.1262            11.52         0.0007

The LOGISTIC Procedure
Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF    Value     Value/DF    Pr > ChiSq
Deviance     3     2.5165    0.8388      0.4723
Pearson      3     2.4482    0.8161      0.4847

Type III Analysis of Effects

Effect    DF    Wald Chi-Square    Pr > ChiSq
g         1     1.1446             0.2847
r         1     3.4120             0.0647
g*r       1     4.8429             0.0278
s         1     11.5162            0.0007

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     -0.1109     0.1212            0.8379        0.3600
g            1     0.1336      0.1249            1.1446        0.2847
r            1     0.2254      0.1220            3.4120        0.0647
g*r          1     -0.2722     0.1237            4.8429        0.0278
s            1     0.4282      0.1262            11.5162       0.0007

Odds Ratio Estimates

Effect      Point Estimate    95% Wald Confidence Limits
s 1 vs 2    2.355             1.436    3.861
0.0007


The GENMOD Procedure
Criteria For Assessing Goodness Of Fit

Criterion             DF    Value     Value/DF
Deviance              3     2.5165    0.8388
Pearson Chi-Square    3     2.4482    0.8161

Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     2.7255      0.2361            133.26        <.0001
g*e          1     0.8116      0.3591            5.11          0.0238
r*e          1     0.9951      0.3443            8.35          0.0038
g*r*e        1     -1.0887     0.4947            4.84          0.0278
s*e          1     0.8564      0.2524            11.52         0.0007
Scale        0     1.0000      0.0000

LR Statistics For Type 3 Analysis

Source    DF    Chi-Square    Pr > ChiSq
e         1     0.84          0.3595
g*e       1     1.15          0.2831
r*e       1     3.44          0.0636
g*r*e     1     4.92          0.0265
s*e       1     11.91         0.0006

The three procedures each give a model fit of G^2 = 2.5165 on 3 d.f. (pvalue = 0.4723), a very good fit. Again, the parameter estimates from both PROC CATMOD and LOGISTIC are the same because of the common coding scheme employed by these two procedures. The parameter estimates from GENMOD, being from the cell reference coding scheme, are much easier to interpret. Let us therefore concentrate on the parameter estimates from PROC GENMOD for a moment. Since we are only concerned with those interactions involving the response variable E, we notice that the following effects appear to be significant: GE, RE, GRE, SE. Of these four terms, only SE and GRE belong to the logit model formulation. However, when all the above terms are examined in the light of their type 3 contributions to the model, only the GRE and SE terms are important. These results are consistent with those from PROC CATMOD and LOGISTIC, these being the only terms significant at α = 0.05. These are the two terms that will be employed in interpreting the data in Table 9.3. In Table 9.4 are the expected values obtained under this model.
SEX(i)     GPA(j)    RACE(k)    ESTEEM(l)
                                High      Low
Males      High      Black      14.392    9.608
                     White      16.792    10.208
           Low       Black      28.553    14.447
                     White      20.264    27.736
Females    High      Black      13.608    21.392
                     White      22.208    31.792
           Low       Black      21.447    25.553
                     White      4.736     15.264

Table 9.4: Estimated expected values under the logit model S/GR


The estimated odds ratio from the logit model of a high self-esteem to a low self-esteem for a male with high GPA who is Black = 14.392/9.608 = 1.498. The above is obtained from the table of estimated expected cell values. The estimated odds ratios for all cells are given in Table 9.5.

Sex        GPA     Race
                   Black     White
Males      High    1.498     1.645
           Low     1.9764    0.7306
Females    High    0.6361    0.6985
           Low     0.8393    0.3103

Table 9.5: Estimated odds ratios of high to low self-esteem perception change under logit model S/GR
logit model S/GR
From Table 9.5, we can show that the estimated odds of having a high self-esteem to a low self-esteem are 2.355 times greater for males than for females. This is computed as:

$$2.355 = \left(\frac{1.498}{0.6361} \times \frac{1.645}{0.6985} \times \frac{1.9764}{0.8393} \times \frac{0.7306}{0.3103}\right)^{1/4}$$

In other words, each male-to-female ratio of the cell odds ratios equals 2.355.
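A quick numerical check of this geometric-mean computation, using the cell odds ratios of Table 9.5:

data orcheck;
gm = ((1.498/0.6361)*(1.645/0.6985)*(1.9764/0.8393)*(0.7306/0.3103))**0.25;
put gm=;  * = 2.355;
run;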
The parameter estimates for the SE interaction from the above logit model (CATMOD and LOGISTIC) are:

                 Sex:    Male      Female
$\lambda^{SE}$:          0.4282    -0.4282

Since high self-esteem was modeled, the parameter estimates indicate that high self-esteem is higher among males than females, with the odds of a male having a high self-esteem relative to a female being $e^{0.4282-(-0.4282)} = e^{2(0.4282)} = 2.355$, as obtained from above. Similarly, from the PROC GENMOD output, we can obtain this from the parameter estimate for the SE interaction effect, which gives $e^{0.8564} = 2.355$. The 2.355 measures the main effect for sex in the logit model, and it involves the GPA-race interaction. PROC LOGISTIC gives us this estimate in its output displayed above.
Because the logit term GR is significant in the logit model, we present below the parameter estimates for this term (which corresponds to the three-factor interaction $\lambda^{GRE}$), again from either PROC CATMOD or LOGISTIC. The results indicate that high self-esteem is lower among the combinations of high GPA-Blacks and low GPA-Whites than among high GPA-Whites and low GPA-Blacks.

                           Race
$\lambda^{GRE}$:   GPA     Black      White
                   High    -0.2722    0.2722
                   Low     0.2722     -0.2722

We also present the prediction probabilities $\hat{\pi}_{ijk}$ for the high self-esteem category. This probability is designated as PHAT in the following SAS software output.



s    g    r    PHAT
1    1    1    0.5997
1    1    2    0.6219
1    2    1    0.6640
1    2    2    0.4222
2    1    1    0.3888
2    1    2    0.4113
2    2    1    0.4563
2    2    2    0.2368

It is immediately obvious from the output presented above that the males have higher prediction probabilities for high self-esteem and that the highest prediction probability for high self-esteem is obtained among Black males with low GPA, while the lowest prediction probability is obtained among White females with low GPA. It is therefore obvious from the above analyses that the chances of having a high self-esteem are much higher among males than females. Further, White females tend to have the lowest chance of changing their high self-esteem, even when they have low GPA. Within each gender, however, Whites with low GPA tend to have less propensity for changing their self-esteem from high to low.

9.1.3 Another Example: Example 9.3

We reanalyze here the 2^4 table example in chapter 7. The table presented in Table 7.1 relates to the study of marijuana use among adults in a large city and a suburb. The variables are G (geographical location, 1 for San Francisco and 2 for Contra Costa), family status F (married with children vs. unmarried or without children), and religion R (Protestant, Catholic, or others). The response variable is M (used marijuana, did not use marijuana). The data were analyzed in chapter 7 as a log-linear model. We present here the reanalysis of these data using logit models.
Our analysis starts by employing SAS PROC LOGISTIC to select the most parsimonious logit model using the forward, backward, and stepwise selection strategies. We present below the SAS software program and a partial output for the implementation of the forward selection procedure.
data chap93;
do M=1 to 2;
do R=1 to 2;
do G=1 to 2;
do F=1 to 2;
input count @@;
output;
end; end; end; end;
datalines;
52 3 23 12 37 9 69 17 35 15 23 35 130 67 109 136
;
proc logistic order=data; class R (ref=last) G (ref=last) F (ref=last)/param=ref;
weight count; model M=R|G|F/scale=none aggregate=(R G F)
selection=forward details; run;
Analysis of Effects Not in the Model

Effect    DF    Score Chi-Square    Pr > ChiSq
R*F       1     0.0112              0.9155
G*F       1     0.4942              0.4821

NOTE: No (additional) effects met the 0.05 significance level for entry into the model.

Summary of Forward Selection

Step    Effect Entered    DF    Number In    Score Chi-Square    Pr > ChiSq
1       F                 1     1            50.8415             <.0001
2       R                 1     2            32.6132             <.0001
3       G                 1     3            4.2012              0.0404
4       R*G               1     4            4.7890              0.0286

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF    Value     Value/DF    Pr > ChiSq
Deviance     3     4.3403    1.4468      0.2270
Pearson      3     4.5169    1.5056      0.2108

Number of unique profiles: 8


Type III Analysis of Effects

Effect

R
G
R*G
F

DF

Wald
Chi-Square

1
1
1
1

6.2577
8.3019
4.7637
45.5430

'
Pr > ChiSq

0.0124
0 . 0040
0.0291
<.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     -1.8923     0.1929            96.2575            <.0001
R            1     0.6546      0.2617            6.2577             0.0124
G            1     -0.6163     0.2139            8.3019             0.0040
R*G          1     0.8077      0.3701            4.7637             0.0291
F            1     1.3626      0.2019            45.5430            <.0001

Odds Ratio Estimates

Effect      Point Estimate    95% Wald Confidence Limits
F 1 vs 2    3.906             2.630    5.802

The forward selection procedure selects the logit model {RG,F} for the above data. The model fits with G^2 = 4.3403 (pvalue = 0.22). This model is equivalent to the log-linear model {RGF,RGM,FM}, which is the model arrived at in chapter 7. The backward and stepwise procedures come up with the same logit model {RG,F} in this example.

9.1.4 Interpretations of Selected Model

First we interpret the significance of F. The odds of 3.906 indicate that, for a given religion and geographical location, the odds are 3.91 times higher of having smoked marijuana as against not having smoked marijuana among those individuals having children versus those not having children.
Since the R*G interaction is significant, we can interpret the interaction effects by constructing relevant contrasts in SAS software. These contrasts are implemented in the SAS software program below, together with an output from their implementation.


set tab93;
proc logistic order=data;
class R (ref=last) G (ref=last) F (ref=last)/param=ref;
weight count; model M=R|G F/scale=none aggregate=(R G F);
CONTRAST 'R1 vers 0, G=1' R 1 -1 R*G 1 0 -1 0/ESTIMATE=EXP;
CONTRAST 'R1 vers 0, G=0' R 1 -1 R*G 0 1 0 -1/ESTIMATE=EXP;
CONTRAST 'G1 vers 0, R=1' G 1 -1 R*G 1 -1/ESTIMATE=EXP;
CONTRAST 'G1 vers 0, R=0' G 1 -1 R*G 0 0 1 -1/ESTIMATE=EXP; run;
Odds Ratio Estimates

Effect      Point Estimate    95% Wald Confidence Limits
F 1 vs 2    3.906             2.630    5.802

Contrast Test Results

Contrast          DF    Wald Chi-Square    Pr > ChiSq
R1 vers 0, G=1    1     31.2547            <.0001
R1 vers 0, G=0    1     6.2577             0.0124
G1 vers 0, R=1    1     0.3876             0.5336
G1 vers 0, R=0    1     8.3019             0.0040

Contrast Rows Estimation and Testing Results

Contrast          Type    Row    Estimate    Standard Error    Alpha    Lower Limit    Upper Limit
R1 vers 0, G=1    EXP     1      4.3158      1.1289            0.05     2.5848         7.2062
R1 vers 0, G=0    EXP     1      1.9243      0.5035            0.05     1.1522         3.2137
G1 vers 0, R=1    EXP     1      1.2110      0.3723            0.05     0.6628         2.2124
G1 vers 0, R=0    EXP     1      0.5399      0.1155            0.05     0.3550         0.8211

For a given family status, the odds are 4.32 times higher that an individual will respond as having smoked rather than not having smoked for a San Francisco respondent among Protestants and Catholics than among those with other religions. This odds ratio is significant. The corresponding odds ratio for respondents living in Contra Costa is 1.92 times higher among Protestants and Catholics than among other religious groups. This odds value is also significant.
For a given family status, the odds are 46% lower for having smoked to not having smoked among San Francisco residents than among Contra Costa residents, given that the respondent has other religious beliefs. This odds value is also very significant. Its corresponding odds value for the Protestant or Catholic respondents is not significant.
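Each of these contrast estimates can also be recovered directly from the parameter estimates reported earlier (R = 0.6546, G = -0.6163, R*G = 0.8077); a minimal check:

data contrcheck;
r_g1 = exp(0.6546 + 0.8077);   * R1 vers 0 at G=1: 4.316;
r_g0 = exp(0.6546);            * R1 vers 0 at G=0: 1.924;
g_r1 = exp(-0.6163 + 0.8077);  * G1 vers 0 at R=1: 1.211;
g_r0 = exp(-0.6163);           * G1 vers 0 at R=0: 0.540;
put r_g1= r_g0= g_r1= g_r0=;
run;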

9.2 Poisson Regression, and Models for Rates

9.2.1 The Poisson Regression

Suppose the observed counts $n_i$ ($n_i \ge 0$) follow a Poisson distribution with parameter $\lambda$. Then, from Poisson distribution properties, we have

$$E(n_i) = \lambda \qquad \text{and} \qquad \mathrm{Var}(n_i) = \lambda$$

We assume here that observed counts occur over a fixed interval, and because these counts are nonnegative, a Poisson regression model is defined in terms of the log of the expected counts ($m_i$) as:

$$\ln(m_i) = \mathbf{x}'\boldsymbol{\beta} \qquad (9.5)$$

where the x represents the explanatory variables. The above is equivalent to modeling the intensity ($\lambda$) as:

$$\lambda = \exp(\mathbf{x}'\boldsymbol{\beta})$$

The latter is the multiplicative version of (9.5). We are often concerned with the rate or intensity with which our events occur and whether this intensity is constant or changing over time. Events with constant intensity are described as homogeneous Poisson processes, while those with varying intensities are appropriately described as nonhomogeneous processes. We now consider an approach for modeling these kinds of data.
As an example, consider the data below from Zeger (1988), reproduced from
Lindsey (1995), relating the monthly numbers of cases of poliomyelitis over 14 years
in the United States.
Year    J   F   M   A   M   J   J   A   S   O   N   D
1970    0   1   0   0   1   3   0   2   3   5   3   5
1971    2   2   0   1   0   1   3   3   2   1   1   5
1972    0   3   1   0   1   4   0   0   1   6   14  1
1973    1   0   0   1   1   1   1   0   1   0   1   0
1974    1   0   1   0   1   0   1   0   1   0   0   2
1975    0   1   0   1   0   0   1   2   0   0   1   2
1976    0   3   1   1   0   2   0   4   0   2   1   1
1977    1   1   0   1   1   0   2   1   3   1   2   4
1978    0   0   0   1   0   1   0   2   2   4   2   3
1979    3   0   0   2   7   8   2   4   1   1   2   4
1980    0   1   1   1   3   0   0   0   0   1   0   1
1981    1   0   0   0   0   0   1   2   0   2   0   0
1982    0   1   0   1   0   1   0   2   0   0   1   2
1983    0   1   0   0   0   1   2   1   0   1   3   6

A homogeneous Poisson process model contains a common mean for all the cases. When this model is implemented, G^2 = 326.2621 on 167 d.f. with $\hat{\lambda} = e^{0.2467} = 1.28$. The SAS software output for implementing this and other models in GENMOD is displayed below:

data poiss;
do year=1970 to 1983; do month=1 to 12;
input count @@; output; end; end;
datalines;
0 1 0 0 1 3 0 2 3 5 3 5
2 2 0 1 0 1 3 3 2 1 1 5
0 3 1 0 1 4 0 0 1 6 14 1
1 0 0 1 1 1 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 0 2
0 1 0 1 0 0 1 2 0 0 1 2
0 3 1 1 0 2 0 4 0 2 1 1
1 1 0 1 1 0 2 1 3 1 2 4
0 0 0 1 0 1 0 2 2 4 2 3
3 0 0 2 7 8 2 4 1 1 2 4
0 1 1 1 3 0 0 0 0 1 0 1
1 0 0 0 0 0 1 2 0 2 0 0
0 1 0 1 0 1 0 2 0 0 1 2
0 1 0 0 0 1 2 1 0 1 3 6
;
***Fit homogeneous model***;
proc genmod order=data; model count=/dist=poi; run;
***Fit non-homogeneous year model***;
proc genmod order=data; class year month; model count=year/dist=poi type3;


contrast 'yy' year 0 0 0 1 -1 0 0 0 0 0 0 0 0 0,
              year 0 0 0 0 1 -1 0 0 0 0 0 0 0 0; run;
Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     0.2231      0.2582            0.75          0.3875
year 1970    1     0.4274      0.3319            1.66          0.1978
year 1971    1     0.3365      0.3381            0.99          0.3196
year 1972    1     0.7259      0.3145            5.33          0.0210
year 1973    1     -0.7621     0.4577            2.77          0.0959
year 1974    1     -0.7621     0.4577            2.77          0.0959
year 1975    1     -0.6286     0.4378            2.06          0.1510
year 1976    1     -0.0000     0.3651            0.00          1.0000
year 1977    1     0.1252      0.3542            0.12          0.7238
year 1978    1     -0.0000     0.3651            0.00          1.0000
year 1979    1     0.8183      0.3100            6.97          0.0083
year 1980    1     -0.6286     0.4378            2.06          0.1510
year 1981    1     -0.9163     0.4830            3.60          0.0578
year 1982    1     -0.6286     0.4378            2.06          0.1510
year 1983    0     0.0000      0.0000

Contrast Results

Contrast    DF    Chi-Square    Pr > ChiSq    Type
yy          2     0.09          0.9562        LR

If we allow the process to vary with the years (that is, different intensities for each year), then a nonhomogeneous model gives G^2 = 260.1858 on 154 d.f. The difference in G^2 = 66.08 on 13 d.f. (pvalue < 0.0001) is highly significant.
From examination of the parameter estimates for this model, we observe that the first three years (1970-1972) have positive contrasts of log intensities, while the next three years (1973-1975) all have negative contrasts of log intensities. Further, the magnitudes of these contrasts of log intensities are very similar for the years 1973-1975. Could it be then that these three years have the same intensity? This is equivalent to equating the three parameters for these years.
We tested the above hypothesis with the contrast statement in GENMOD above, and the result gives a G^2 = 0.09 on 2 d.f. Clearly this hypothesis is tenable. That is, the intensities of the cases of poliomyelitis are not significantly different for the years 1973-1975. The next four years (1976-1979) all have positive (or zero) contrasts of log intensities, and the last three years (1980-1982) all have negative contrasts.
On the other hand, we may also allow the process to vary with the months (that is, different intensities for each month): then a nonhomogeneous model gives G^2 = 271.0113 on 156 d.f. The difference in G^2 = 55.2508 on 11 d.f. (pvalue < .0001) is highly significant. Below are the GENMOD statements and a modified output.

proc genmod order=data; class month; model count=month/dist=poi;
run;
Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     0.9445      0.1667            32.11         <.0001
month 1      1     -1.3863     0.3727            13.84         0.0002
month 2      1     -0.9445     0.3150            8.99          0.0027
month 3      1     -2.1972     0.5270            17.38         <.0001
month 4      1     -1.2809     0.3575            12.84         0.0003
month 5      1     -0.8755     0.3073            8.12          0.0044
month 6      1     -0.4925     0.2706            3.31          0.0688
month 7      1     -1.0186     0.3236            9.91          0.0016
month 8      1     -0.4480     0.2669            2.82          0.0933
month 9      1     -0.9445     0.3150            8.99          0.0027
month 10     1     -0.4055     0.2635            2.37          0.1239
month 11     1     -0.1495     0.2450            0.37          0.5417
month 12     0     0.0000

One of the main differences between log-linear modeling and Poisson regression is that, unlike log-linear modeling techniques in which margins, rows, or columns are fitted, in Poisson regression the rows, columns, or other margins of the data do not come into play. These margins take the role of "explanatory variables" in Poisson regression. In other words, we do not fit the marginal totals as in log-linear modeling. However, we have observed for these data that both the month and the year are important in explaining the number of cases of poliomyelitis during this period. Consequently, we incorporate these two factors into our model in the GENMOD statement below, asking for type 1 and type 3 tests. The model when implemented has G^2 = 204.9351 on 143 d.f. Results from both the type 1 and type 3 tests indicate that both factor variables are important in our model.

proc genmod order=data; class year month;
model count=year month/dist=poi type1 type3; run;

LR Statistics For Type 1 Analysis

Source       Deviance    DF    Chi-Square    Pr > ChiSq
Intercept    326.2621
year         260.1858    13    66.08         <.0001
month        204.9351    11    55.25         <.0001

LR Statistics For Type 3 Analysis

Source    DF    Chi-Square    Pr > ChiSq
year      13    66.08         <.0001
month     11    55.25         <.0001

From the model, the intensities can be obtained from the xbeta output from PROC GENMOD.

9.2.2 Another Example

The data below from Upton and Fingleton (1985) relate to hypothetical quadrant data where there is a possibility of north-south or east-west trends in the data.

Quadrant counts displaying possible north-south trend

0    0    0    1    0
3    3    5    2    0
2    3    6    2    5
1    5    4    6    7
6    2    4    3    4


Following Upton and Fingleton, to model the possible north-south and east-west trends in the data, suppose we let X1 and X2 be quadrant coordinates ranging from (1,1) in the north-west corner to (5,5) in the south-east corner. A constant intensity model gives G^2 = 50.1493 on 24 d.f., a poor fit. However, when the continuous explanatory variables X1 and X2 are introduced into the model,

$$E(n_{ij}) = \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2j})$$

PROC GENMOD produces the following summary results for the parameters of the model:

data upton;
do x1=1 to 5; do x2=1 to 5;
input count @@; output; end; end;
datalines;
0 0 0 1 0 3 3 5 2 0 2 3 6 2 5 1 5 4 6 7
6 2 4 3 4
;
proc genmod order=data; model count=x1 x2/dist=poi type1 type3;
run;
Criteria For Assessing Goodness Of Fit

Criterion             DF    Value      Value/DF
Deviance              22    34.9906    1.5905
Pearson Chi-Square    22    29.7345    1.3516

Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     -0.1788     0.4252            0.18          0.6742
x1           1     0.3248      0.0878            13.68         0.0002
x2           1     0.0609      0.0824            0.55          0.4598
Scale        0     1.0000      0.0000

The model has a G^2 of 34.991 on 22 d.f., which is a good fit. The estimated cell counts for row i and column j are given by:

$$\hat{m}_{ij} = (0.8363)(1.3838)^{X_{1i}}(1.0628)^{X_{2j}}$$

Thus the estimated Poisson parameter λ for, say, the quadrant in the third row and fourth column is $0.8363 \times 1.3838^3 \times 1.0628^4 = 2.8273$. Thus a movement down r rows of the table implies an increase in λ by the multiplicative factor $1.3838^r$, and similarly, for movement across the columns, we have a multiplicative factor of 1.0628. The type3 analysis suggests that β2 is not significant (p = 0.4598) and can well be removed from the model. Removing this parameter from the model now gives a model with parameter estimates $\hat{\beta}_0 = 0.0077$ (a.s.e. = 0.3386) and $\hat{\beta}_1 = 0.3248$ (a.s.e. = 0.0878). The model has a G^2 of 35.538 on 23 d.f., which is again a good fit. The above results indicate that the data exhibit only the north-south trend. This is not surprising, as the original artificial data were randomly generated from a Poisson distribution having λ = i, where i is the row of the table.
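A minimal check of the multiplicative form of the fitted model, evaluating the quadrant in the third row and fourth column from the estimates above:

data lamcheck;
lambda = exp(-0.1788 + 0.3248*3 + 0.0609*4);  * = 0.8363*1.3838**3*1.0628**4 = 2.8273;
put lambda=;
run;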

9.2.3 Weighted Log-Linear Models

The data in Table 9.6 are from a stratified sampling that resulted in disproportionate population sizes across age groups for the two cities. The data relate to the incidence of nonmelanoma skin cancer among women in Minneapolis-St. Paul and Dallas-Ft. Worth (Le, 1992). We shall therefore analyze the data using as weights the population size in each category. The advantage of the weighted analysis is that it removes the bias due to the unequal population sizes. Such an analysis gives rise to what has been dubbed Poisson regression models. The justification for a Poisson regression in this case is that few responses are observed out of a very large number of possible responses, or a very large population size.
Age Group    Minn-St. Paul              Dallas-Ft. Worth
(yr)         No. of    Population      No. of    Population
             cases     size            cases     size
15-24        1         172,675         4         181,343
25-34        16        123,065         38        146,207
35-44        30        96,216          119       121,374
45-54        71        92,051          221       111,353
55-64        102       72,159          259       83,004
65-74        130       54,722          310       55,932
75-84        133       32,185          226       29,007
85+          40        8,328           65        7,538

Table 9.6: Incidence of nonmelanoma skin cancer among women


We can model the incidence of nonmelanoma skin cancer on age (A) and city (C) as follows. Let $n_{ij}$ be the observed number of cases reported for age group i and city j in a total population of size $W_{ij}$. Then, if we also let the corresponding expected cell value be $m_{ij}$, our model becomes (Agresti, 1990):

$$\ln(m_{ij}/W_{ij}) = \mu + \lambda_i^A + \lambda_j^C \qquad (9.6)$$

where the term on the left-hand side is the log of the weighted frequency. Estimates of the λ parameters are obtained by conditioning on the cell weights. The model assumes that there is no interaction between age and city. The above model looks like the familiar log-linear model formulation except that the left-hand side equals $\ln(m_{ij}) - \ln(W_{ij})$ instead of the familiar $\ln(m_{ij})$. The term $\ln(W_{ij})$, which is referred to as the adjustment term, is called an offset. The above model can be implemented in SAS GENMOD by specifying in the option statement that log(popl) is an offset, that is, off=log(popl).

proc genmod order=data;
class city age;
model cases=city age/dist=poi link=log offset=off type3 obstats; run;

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     -4.6754     0.0991            2225.55       <.0001
city msp     1     -0.8043     0.0522            237.34        <.0001
age 15-24    1     -6.1782     0.4577            182.17        <.0001
age 25-34    1     -3.5480     0.1675            448.76        <.0001
age 35-44    1     -2.3308     0.1275            334.36        <.0001
age 45-54    1     -1.5830     0.1138            193.38        <.0001
age 55-64    1     -1.0909     0.1109            96.75         <.0001
age 65-74    1     -0.5328     0.1086            24.06         <.0001
age 75-84    1     -0.1196     0.1109            1.16          0.2809
age 85+      0     0.0000      0.0000
Scale        0     1.0000      0.0000

The model above considers age and city as factor variables. When this model is fitted to the data in Table 9.6, it gives a G^2 value of 8.1950 on 7 d.f. with a city estimate (Minneapolis/St. Paul) of -0.8043. Consequently, the relative risk for women from Minneapolis/St. Paul of contracting the skin cancer given the age group of an individual (that is, after adjusting for age group) is $e^{-0.8043} = 0.4474$ times that of Dallas-Ft. Worth. Put another way, the relative risk of contracting nonmelanoma skin cancer among women is $1/0.4474 = 2.235$ times higher in Dallas-Ft. Worth (after adjusting for age group) than for women in the Twin Cities of Minneapolis-St. Paul. Figure 9.1 gives the cancer rates for these two cities for varying age groups. The figure confirms the higher rate for Dallas and also indicates the gradual rise in the rate as individuals get older. The rates seem to decline between the ages of 65 and 70.

Figure 9.1: Plot of rates against age


An alternative analysis for the data in Table 9.6 is to assume that age is not a categorical variable. Kleinbaum et al. (1998) suggested a transformation of the form:

$$U_i = \frac{(\text{midpoint of } i\text{-th age group}) - 15}{35}$$

and then fitted a Poisson model that has $\ln(U_i)$ as a covariate: that is, the model

$$\ln(m_{ij}/W_{ij}) = \beta_0 + \beta_1 \ln(U_i) + \beta_2\,\text{city}_j \qquad (9.7)$$

When this model is implemented in GENMOD, we have the following summary results:

U=(AGE-15)/35; T=LOG(U); off=log(popl);
proc genmod order=data; class city;
model cases=city T/dist=poi offset=off type3 obstats; run;
Criteria For Assessing Goodness Of Fit

Criterion             DF    Value      Value/DF
Deviance              13    14.2877    1.0991
Pearson Chi-Square    13    14.1568    1.0890

Analysis Of Parameter Estimates

Parameter      DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept      1     -6.2354     0.0324            -6.2989    -6.1718            36988.0       <.0001
city msp       1     -0.8027     0.0522            -0.9050    -0.7004            236.67        <.0001
city dallas    0     0.0000      0.0000            0.0000     0.0000
T              1     2.2493      0.0621            2.1276     2.3710             1312.44       <.0001
Scale          0     1.0000      0.0000            1.0000     1.0000


This model fits the data with a deviance of 14.287 on 13 d.f., with an adjusted odds ratio of $1/\exp(-0.8027) = 2.2316$, indicating again that the risk is about 2.2316 times higher for women in Dallas-Ft. Worth than in Minneapolis/St. Paul after adjusting for the effect of age. Figure 9.2 gives a sketch of the rates under this model. We observe that the graph is smoother this time around.

Figure 9.2: Plot of rates against age

9.2.4 Log-Linear Models for Rates

There is a clear relationship between odds and rates and hence between models for rates and logit models. Odds and rates are approximately equal (Clogg & Eliason, 1988) when the Poisson approximation to the binomial can be used, that is, whenever rare events are considered. Thus, when analyzing rare events (that is, when p is very small), logit models for a binary response variable will be virtually identical to log-linear models for rates.
Consider an I x J x 2 contingency table with observed frequencies $n_{ijk}$, where the third variable is the binary response variable (success or failure). The logit model has:

$$\ln(m_{ij1}/m_{ij2}) = \eta + \eta_i^A + \eta_j^B \qquad (9.8)$$

which represents a model with additive effects of the factors on the log-odds of success. Logit models always fit the "group totals," in this case the AB interaction or $n_{ij+}$, the marginal distribution of the joint variable composed of all independent variables. Suppose we define the expected rate of success as $m_{ij1}/n_{ij+}$. Then, if we condition on the observed group totals $n_{ij+}$, the following log-rate model can be used:

$$\ln(m_{ij1}/n_{ij+}) = \eta + \eta_i^A + \eta_j^B \qquad (9.9)$$

The η parameters relate to the parameters of the logit model. The model above is equivalent to the log-linear model {AB,AC,BC}.
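A generic sketch of how the log-rate model (9.9) might be fitted in PROC GENMOD, with the group totals entering through an offset; the variable names (a, b, success, total) are illustrative, and only a small subset of the Table 9.7 counts below is used for the datalines:

data rates;
input a b success total @@;
off = log(total);  * the group totals n_ij+ enter as an offset;
datalines;
1 1 9 403   1 2 22 688
2 1 12 256  2 2 17 525
;
proc genmod data=rates;
class a b;
model success = a b / dist=poi link=log offset=off;
run;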

9.2.5 Example 9.3

The data in Table 9.7 relate to a 2 x 5 x 2 table (Clogg & Eliason, 1988) involving the variables industry, age, and lung functioning (abnormal versus normal), since



Independent variables
Industry Age group
Manufact.
20-29
30-39
40-49
50-59
60+
Service

20-29
30-39
40-49
50-59
60+

Response
Abnormal Normal
394
9
22
666
668
15
502
37
17
116
12
17
17
30
14

Total
403
688
683
539
133

244
508
582
423
141

256
525
599
453
155

Table 9.7: Lung functioning by age and industry


the rate of abnormal lung functioning is a rare event; indeed, observed rates are
below 0.10 for all age-industry combinations.
The data have been displayed to reflect the marginal total riij+, the number of
observations in category i of variable A and category j of variable B averaged over
the response variable C. Following Clogg and Eliason (1988), let Nij = nij+ =
n>iji + fiij2. For example, for the data in the Table 9.7, NU = 403 and the observed
proportion of abnormal lung functioning in the manufacturing industry among the
age-group 1 (20-29) is thus 9/403 = 0.0223, giving an observed odds of 0.0223/(1.00.0223) = 0.0228 and a resulting observed logit of -3.7808. The three quantities
proportions, odds, and logit will be used to model contingency tables in this section.
To fit the logit model, we would use the observed 20 cell counts, whereas to fit
the log-rates model, we would use only the 10 success counts (abnormal) and the
marginal totals riij+ as the weights.
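The three quantities for the first manufacturing cell can be reproduced with a short data step:

data quantities;
p = 9/403;          * observed proportion = 0.0223;
odds = p/(1 - p);   * observed odds = 0.0228;
logit = log(odds);  * observed logit = -3.7808;
put p= odds= logit=;
run;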
Three models are considered for modeling the data in Table 9.7. These are:

(a) The linear-probability model (Grizzle et al., 1969), where, if we let $p_{ij} = m_{ij1}/N_{ij}$ denote the expected probability that the level of response = 1 when industry and age are at levels i and j, respectively, then the saturated model is

$$p_{ij} = \alpha + \beta_i + \gamma_j + \delta_{ij} \qquad (9.10)$$

where α is a constant, $\beta_i$ and $\gamma_j$ pertain to the main effects of industry and age, and $\delta_{ij}$ denotes the interaction term if significant, or the departure from additivity (see Clogg & Eliason, 1988).

(b) The logit model (Goodman, 1978) has the formulation:

$$\ln[p_{ij}/(1 - p_{ij})] = \alpha + \beta_i + \gamma_j + \delta_{ij} \qquad (9.11)$$

where $\ln[p_{ij}/(1 - p_{ij})] = \ln(n_{ij1}/n_{ij2})$ is the observed logit when variable C takes the value 1 (abnormal) and industry and age are at levels i and j, respectively. The parameters are as explained in the previous case, although estimates from the two models are not necessarily the same.


(c) When the proportion (p) is referred to as the rate, the model in (a) above can be regarded as a linear model for rates. The relevant model here is the log-rate model or Poisson regression, which is defined as:

$$\ln(m_{ij1}/N_{ij}) = \alpha + \beta_i + \gamma_j + \delta_{ij} \qquad (9.12)$$

The linear-probability model can be implemented using weighted least squares (as indicated in chapter 8), since the variances of the $p_{ij}$ are not constant, being functions of the $N_{ij}$, that is, the marginal totals. We will concentrate our attention on the other two models, in (b) and (c) above. The saturated logit and log-rate models are first fitted to the data, and type 3 partial tests are examined to find important explanatory terms. That is, we wish to fit a log-linear model of the form

$$\ln(m_{ij}/W_{ij}) = \alpha + \sum_{k=1}^{K} \beta_k X_{ijk}$$

where $m_{ij}$ are the expected cell counts under some model and $W_{ij}$ are the weights (or offsets). Thus, to fit the log-rate model, we specify the weights to be the $N_{ij}$. The logit model is also fitted by specifying the fixed marginal totals; that is, the logit model is fitted by conditioning on the industry-age marginal.

data tab96;
input Ind $ agel $ abn total @@; off=log(total);
datalines;
mf 20-29 9 403 mf 30-39 22 688 mf 40-49 15 683 mf 50-59 37 539
mf 60+ 17 133 sv 20-29 12 256 sv 30-39 17 525 sv 40-49 17 599
sv 50-59 30 453 sv 60+ 14 155
;
***Fit saturated logit model***;
proc genmod order=data; class ind agel;
model abn/total=ind|agel/dist=bin link=logit type3; run;
LR Statistics For Type 3 Analysis

Source     DF    Chi-Square    Pr > ChiSq
Ind        1     0.58          0.4448
age        4     50.28         <.0001
Ind*age    4     4.40          0.3546

The results of the saturated logit model indicate that only the effect of age is important in explaining abnormal lung functioning. We next fit the saturated log-rate model and display partial results below:

set tab96;
proc genmod order=data;
class ind agel;
model abn=ind|agel/dist=poi offset=off type3;
run;

LR Statistics For Type 3 Analysis

Source     DF    Chi-Square    Pr > ChiSq
Ind        1     0.64          0.4236
age        4     47.71         <.0001
Ind*age    4     4.18          0.3817


The results from the saturated log-rate model also indicate that only age is important for a proper explanation of the data in Table 9.7. The χ² values are very close in both cases. We note here that the log-rate model employs the weights $\ln(N_{ij})$ as offsets.
The above results indicate that the type of industry does not seem significant, nor do the interaction effects between age and industry. Thus a reduced model without the interaction terms is next fitted to the data. Both the logit and log-rate models now fit the data, with G² values of 4.3999 and 4.1837, respectively, on 4 degrees of freedom. We also notice that the estimates of the parameters are very similar for both models, indicating that for rare events both models can be considered equivalent. However, examination of the parameter estimates in both cases again confirms that the type of industry is not significant in either model. The industry parameter has G² values of 0.17 and 0.16, respectively, on 1 d.f. for the logit and log-rate models (pvalue > 0.8816).
Since age has been found to be significant, we next consider models involving only age as the explanatory variable. In this case, the logit and log-rate models give G² values of 4.5683 and 4.3427, respectively, on 5 degrees of freedom. The parameter estimates under both models are displayed in Table 9.8.
                   Logit                    Log-rate
Parameters    Estimates    ASE         Estimates    ASE
α             -2.1151      0.1901      -2.2290      0.1796
Age: γ1       -1.2987      0.2921      -1.2172      0.2826
     γ2       -1.2895      0.2503      -1.2083      0.2406
     γ3       -1.5501      0.2611      -1.4615      0.2520
     γ4       -0.5100      0.2284      -0.4661      0.2172
     γ5       0            0           0            0

Table 9.8: Parameter estimates, logit and log-rate models, with only age as the explanatory variable
The contrast between age 1 and age 5 (i.e., between the 20-29 and 60+ groups) under the log-rate model is -1.2172; hence, the estimated relative risk ratio is given by $e^{-1.2172} = 0.296$. That is, the risk of abnormal lung functioning for the 20-29 year olds is about 29.6% of that for the 60+ year olds. In other words, those 60+ have a relative risk that is $1/0.296 = 3.38$ times higher than those 20-29 years old. Based on the above analyses, the response is very well explained by the single explanatory variable, age. The equivalent log-linear model {IA,AR} has G² = 5.3001 on 5 degrees of freedom. The corresponding interaction AR parameter estimates are {-1.2457, -1.2365, -1.4970, -0.4889, 0}, which are again close to those presented above.
Because of the ordinal nature of the age variable, we next consider fitting a model with a linear trend and a quadratic response in age, using the centers of the age classes (ignoring industry). The results of the quadratic model are displayed in the partial SAS software output below.
set tab96;
if agel='20-29' then age=.5*(20+29);
else if agel='30-39' then age=.5*(30+39);
else if agel='40-49' then age=.5*(40+49);
else if agel='50-59' then age=.5*(50+59);
else age=.5*(60+69);
age2=age*age;
proc genmod; model abn=age age2/dist=poi offset=off type3; run;
Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1     -1.9041     0.9281            4.21          0.0402
age          1     -0.0991     0.042             5.38          0.0204
age2         1     0.0015      0.0005            10.14         0.0014
Scale        0     1.0000      0.0000

Under the log-rate model, the model gives a G² value of 11.3355 on 7 degrees of freedom. When we next consider a cubic term in the model, although the model fits the data, the extra cubic parameter is not significant, and we therefore conclude that an appropriate model for the data is the log-rate or logit model with a quadratic age effect. The estimated model is given by the equation below, and the plot of this response is given in Figure 9.3.

$$\ln(\hat{m}_{ij}/W_{ij}) = -1.9041 - 0.0991(\text{age}) + 0.0015(\text{age})^2 \qquad (9.13)$$

Figure 9.3: Plot of rates against age
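As a quick check of (9.13), the fitted abnormality rate at, say, age 45 can be computed directly; the value is well below 0.10, consistent with the rare-event assumption:

data ratecheck;
rate = exp(-1.9041 - 0.0991*45 + 0.0015*45**2);  * approximately 0.036;
put rate=;
run;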


It should however be noted that when we have nonrare events, the two models (logit
and log-rate) can give quite different parameter estimates. In this situation, we
would then need "to use both statistical and substantive criteria to choose between
the two" (Clogg & Eliason, 1988).

9.2.6 Example 9.4

As our last example in this section, the data in Table 9.9 are reproduced from Clogg and Eliason (1988). The data arise from a longitudinal study where exposures are obtained by following individuals through time and the frequencies of occurrence of events are obtained over time. The data in the table relate to death counts among subjects who had received kidney transplants from donors (cadaver or living), time, and match grade (number of matched antigens out of a maximum of four). The resulting table is a 2 x 5 x 3 contingency table. If we designate the exposure in cell (i, j, k) by $E_{ijk}$, then a saturated Poisson regression model for the data is given by:

$$\ln(m_{ijk}/E_{ijk}) = \alpha + \lambda_i^D + \lambda_j^T + \lambda_k^G + \lambda_{ij}^{DT} + \lambda_{ik}^{DG} + \lambda_{jk}^{TG} + \lambda_{ijk}^{DTG}$$

where D refers to donors, T to time, and G to match grade.

                        Deaths:                Exposures:
                        Match grade            Match grade
Donor       Time        0-2    3     4         0-2        3         4
Cadaver     0-1 mo.     204    30    1         27390.5    3471      521
            1-3 mo.     170    16    4         43470      5340      930
            3-6 mo.     94     10    2         54000      7110      1080
            6-12 mo.    54     6     0         94860      12600     1980
            > 1 yr.     91     10    3         351090     48600     6570
Living      0-1 mo.     35     19    5         13677      4665.5    4839
relative    1-3 mo.     64     9     6         24510      8325      9300
            3-6 mo.     31     8     4         32715      11880     13590
            6-12 mo.    26     4     3         60210      22590     26595
            > 1 yr.     34     5     6         242100     104940    115380

Table 9.9: Graft failures following kidney transplants


The fit of the saturated weighted log-linear model {DTG}, with the weights being the log of the exposures, indicates that only the interaction term TG is significant, while a partial test of the two-factor interaction terms (DT, DG, TG) indicates that only the interaction term DG is important. This model is implemented, for instance, with PROC GENMOD in SAS software with the following:

data tab98;
do donor=1 to 2; do time=1 to 5; do grade=1 to 3;
input death exps @@; off=log(exps); output; end; end; end;
datalines;
204 27390.5 30 3471 1 521 170 43470 16 5340 4 930
94 54000 10 7110 2 1080 54 94860 6 12600 0 1980
91 351090 10 48600 3 6570 35 13677 19 4665.5 5 4839
64 24510 9 8325 6 9300 31 32715 8 11880 4 13590
26 60210 4 22590 3 26595 34 242100 5 104940 6 115380
;
run;
proc genmod order=data; class donor time grade;
model death=donor|time donor|grade time|grade/dist=poi link=log offset=off type3;
run;

Results with various models applied to the data in Table 9.9 give contradictory conclusions, and we therefore decided to fit various combinations of these two-factor interactions to the data. These results are summarized in Table 9.10.
Model (i) fits the data well, and we therefore seek a more parsimonious model with fewer parameters than model (i). Models (ii), (iii), and (iv) enable us to test the hypotheses that each of the interaction terms DT, DG, and TG, respectively, is zero. The results of these tests indicate that only the DG interaction term is worthy of inclusion in our model. We therefore fit model {T,DG} in (v), and the pvalue for this model of 0.1104 indicates that the model fits the data. As expected, models (vi) and (vii) do not fit the data, in view of our earlier conclusion.
We again sought to fit a model more reduced than (v), by fitting the linear components of the variables T and G, which are ordinal in nature. To implement this, we used
9.2. POISSON REGRESSION, AND MODELS FOR RATES


Number
(i)
(ii)
(iii)
(iv)
(v)
(vi)
(vii)
(viii)
(ix)
(x)
(xi)

G2-

Model
DT,DG,TG
DG,TG
DT,TG
DG,DT

9.1353
16.0698
15.7920
22.3274
27.9643
23.4127
29.8700
28.2226
28.6098
42.5739
42.9260

T,DG
D,TG
G,DT
G,T,D,D*G(1)
G(1),T,D,D*G(1)
G,T(1),D,D*G(1)
G(1),T(1),D,D*G(1)

d.f.
8
12
10
16
20
14
18
21
22
24
25

377

pvalue
0.3307
0.1882
0.1059
0.1334
0.1104
0.0537
0.0385
0.1342
0.1566
0.0111
0.0142

Table 9.10: Results of fitting various models to the data in Table 9.9
the midpoints of variable T as scores and integer scores {1,2,3} for variable G.
Our attempts at fitting these reduced models give models (viii) to (xi), from which
model (ix) (T, DG(1)) emerged as the most parsimonious model. This model has a
G2 value of 28.6098 and is based on 22 degrees of freedom. We present in the next
SAS software output, the parameter estimates from this model.
Clogg and Shockey (1988) also give an equivalent log-rate model utilizing cell
covariates. Their model has the saturated log-linear model formulation:
In(mtjfc) = Q. + Xf +
\DG
Xik

i
\TG
+ Xjk

+ X + DT
\DTG

A fit of model (ix) with the covariate gives G2=28.4356 on 21 degrees of freedom.
This is the model {G(1),T,D,D*G(1),X}, where X in this case is the covariate,
that is, the log of the exposures EIJ^. For this data, there is only one covariate.
However, it is common to have situations involving several covariates. We give
the SAS software statements and partial output from the Poisson regression model
applied to the above data.
set tab98;
proc genmod order=data; class donor time;
model death=donorI grade time/dist=poi offset=off typeS;
contrast 'donorl vs 2' donor 1 -1
donor*grade 1 -1;
estimate 'donorl vs 2' donor 1 -1
donor*grade 1 -1/exp;

run;

Analysis Of Parameter Estimates

Parameter
Intercept
donor
donor
grade
grade*donor
grade*donor
time
time
time
time
time
Scale

1
2
1
2
1
2
3
4
5

DF

Estimate

Standard
Error

1
1
0
1
1
0
1
1
1
1
0
0

-8 .3446
0.1599
0.0000
-0 .5410
0.4461
0.0000
3 .3464
2 .7644
1 .9274
0.8747
0,.0000
1.,0000

0,.1614
0,.1861
0.,0000
0.,0946
0.,1376
0..0000
0. 1007
0..1022
0..1159
0.,1322
0. 0000
0. 0000

Wald 957.
Confidence Limits

-8 .6610
-0,,2048
0,.0000
-0..7263
0,.1764
0..0000
3.,1489
2.,5641
1.,7002
0..6157
0. 0000

1. 0000

-8 .0282
0.5246
0.0000
-0 .3557
0.7158
0.0000
3,.5438
2,.9647
2,.1545
1,.1337
0.,0000
1.,0000

ChiSquare
2672,.33

Pr > ChiSq

0,.74

<.0001
0.3902

32..73
10,.51

<.0001
0.0012

1103.,41
731.,49
276.,58

<.0001
<.0001
<.0001
<.0001

43..80

378

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

Based on these estimates, the estimated log-rate model becomes:


]n(rhijk/Eijk)

= 0.1599 donor* - 0.5410 grade,,


+ 0.4461 (donor * grade);./ + 3.3464 timei
+ 2.7644 time2 + 1.9274 time3 + 0.8747 time4

For the second model employing the cell covariates, we would normally expect the
value of 7 to be one (Clogg & Shockey, 1988). Our results in the above table
indicate that 7 = 0.884 in our model. Of course if 7 = 1, the parameter estimates
from both models would be expected to be identical. Our results further indicate
that the log-rate model utilizing the cell weights is more parsimonious for our data
than the second model, because it has a smaller AIC (Akaike information criterion;
see chapter 7). Further, parameter estimates from this model all have smaller
asymptotic standard errors (a.s.e.) Hence, we could conclude that the explanatory
variable Time is important for a proper explanation of the data, as well as the type
of donor. However, only the linear component of match grade helps to explain the
variation in the data.
The linear match grade and donor interaction is also very significant in the
model. Controlling for the time effects, an estimate of the overall parameter for
interpreting this interaction can be formulated as follows:
~

J 0.5410 * grade,
if donor=0
~ JO. 1599 - 0.0949 * grade, if donor=l

Thus when donor is zero (that is, living relative donors), the odds are exp(0.5410),
exp(0.5410*2), exp(0.5410*3) for match grades 0-2, 3, and 4, respectively. That
is, the odds of rejection are {0.582, 0.339, 0.197} for the match grades, respectively.
Thus for living relative donors, the least odds are those with match grade 4.
Similarly, when donor = 1, that is, those with cadaver donors, the odds are
exp(0.065),exp(-0.0299), and exp(-0.1249) for match grades 0-2, 3, and 4, respectively. That is, the odds of rejection are {1.067, 0.971, 0.883} for the match grades,
respectively. Thus for those who received kidney transplants from cadaver donors,
the least odds of rejection are again those with match grade 4. In comparing the
two results, its quite clear here that those that received living relative organs are
relatively less likely to reject the organ than those that receive cadaver organs for
the same match grade. It should be noted that the main effect of donor in itself is
not significant. It only becomes important when match grade is incorporated into
it. The results of the "contrast" and "estimate" statements in the SAS software
program are displayed below.
Contrast Estimate Results

Label
donorl vs 2
Exp(donorl vs 2)

Contrast
donorl vs 2

Estimate

Standard
Error

Alpha

0.6060
1.8331

0.0815
0.1494

0.05
0.05

Contrast Results
ChiDF
Square
Pr > ChiSq
1

60.54

<.0001

Confidence Limits
0.4463
1.5625

Type
LR

0.7657
2.1505

ChiSquare

Pr > ChiSq

55.30

<.0001

9.2. POISSON REGRESSION, AND MODELS FOR RATES

379

The results indicate that the odds of rejection for match grade (0-2) are 1.83 times
higher among those who received cadaver organs than among those who received
living relative organs. This odds increase to 1.8332 = 3.36 for those with match
grade 3 and to 1.8333 = 6.15 for those receiveing match grade 4 organs. In all the
comparisons, theses differences are statistically significant.
The effect of time is very significant. Thus the odds of rejection are exp(3.3464)
= 28.40 times higher for 0-1 months survived patients than those that received the
transplant and lived for a year or more. Similar odds relative to those with one
year or more after transplantation are 15.87, 6.87, 2.40 for those who had received
transplants for between 1-3, 3-6, and 6-12 months, respectively, while the odds for
1-3 months patients are exp(2.7644 1.9274) = 2.31 times higher than those of 3-6
months patients. These results indicate that the odds of rejection decreases as the
elapsed time between transplantion increases after adjusting for donor and match
grade.

9.2.7

Survival-Time Models

The log-rate or Poisson regression model discussed above can easily be adapted
for fitting survival-time models. Here, if we let rh; denote the expected number of
deaths (or events) for subject i, then
where A is the hazard function, which for the negative exponential equals a constant;
T is the time to death, Xi is the explanatory variable i, and ft are the parameters of
the model. Further, (X]^i) ig the total exposure time at each setting. A log-linear
rate model is therefore usually of the form:

In (mijklEijk) = M + Af + \f +
where A, B, , are the explanatory variables. Agresti (1990, p. 195) analyzed
an example of a lung cancer survival data for 539 males diagonised as having lung
cancer in 1973. The data are reproduced in Table 9.11.
The prognostic factors are histology (3 levels), stage of disease (3 levels), and followup period which was divided into seven 2-month intervals. Models of the form:
In (rrujk/Eijk) = V + Af + Af + A I +
where Eijk, the offset, is the log of exposure time, are considered. Results from
partial tests analysis indicate that only the effect of stage (S) is important. All
other terms turn out not to be significant.
LR Statistics For Type 3 Analysis

Source

DF

Square

t
s
t*s
h
t*h
h*s

6
2
12
2
12
4

10,97
47.92
15.22
2.78
10.12
2.73

Pr > ChiSq
0.0892
c.OOOl
0.2295
0 . 2486
0.6051
0 . 6042

380

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS


Histology

I
Stage
Time
Interval
Stage:
0-

II
Stage

III
Stage

(157

2
12
134

3
42
212

1
5
77

2
4
71

3
28
130

1
1
21

2
1
22

101)

2-

2
(139

7
110

26
136

2
68

3
63

19
72

1
17

1
18

11
63)

4-

9
(126

5
96

12
90

3
63

5
58

10
42

1
14

3
14

7
43)

6-

10
(102

10
86

10
64

2
55

4
42

5
21

1
12

1
10

6
32)

8-

1
(88

4
66

5
47

2
50

2
35

0
14

0
10

3
21)

10-

3
(82

3
59

4
39

2
45

1
32

3
13

1
8

0
8

3
14)

12-

1
(76

4
51

1
29

2
42

4
28

2
7

0
6

2
6

3
10)

1
9

3
19

Table 9.11: Number of deaths and total follow-up for the time periods, histology,
and stage of disease
Criteria For Assessing Goodness Of Fit
Criterion

DF

Value

Value/DF

Deviance
Pearson Chi-Square

60
60

57.4021
54.3456

0.9567
0.9058

Analysis Of Parameter Estimates

Parameter
Intercept

s
s

1
2

DF

Estimate

Standard
Error

ChiSquare

Pr > ChiSq

-1.7010
-1.3758
-0.8929

0.0676
0.1477
0.1331

633.66
86.80
44.98

<.0001
C.OOOl
<.0001

1
1

The model {S}, therefore, when applied to the data gives a parsimonious fit with
G2 57.4021 on 60 d.f. and the following estimates of the parameters of S based
on PROC GENMOD.
A f - A f = -1.3758
Af - Af = -0.8929
From the above, we have for instance Af Af = 0.4829 and we can therefore estimate
the corresponding sum to zero parameters by first adding the two expressions and
imposing the sum to zero constraints to give Af = 0.6196, Af = 0.1367, hence,
Af = 0.7562 and Af - Af = -0.1367 + 0.6196 = 0.4829. For instance, regardless of
the follow-up time and histology levels, we estimate the risk to be e-4829=1.62 times
higher at the second stage of disease than at the first stage. Similarly, AS A2 =
0.8929 with a risk of 2.44 times higher at the third stage of the disease than at the
second stage. This result is consistent with fitting a more complicated model of
the form (H,T,S) to the model. The latter model would give a relative risk ratio of
about 2.349.

9.3. MULTICATEGORY RESPONSE MODELS

381

Below are the relevant SAS software statements used in implementing the model
discussed above for the data in Table 9.11.
data surv;
do t=0 to 12 by 2; do h=l to 3; do s=l to 3;
input count tot 8Q; off=log(tot) ;
output; endl; end; end;
datalines;
9 157 12 134 42 212 5 77 4 71 28 130

4 51 1 29 2 42 4 28 2 7 0 6 2 6 3 10
proc genmod order=data; class t h s;
model count=s/dist=poi offset=off typeS; run;

9.3

Multicategory Response Models

So far, we have discussed cases in which the response or outcome variables have
two categories. But what happens if the response category has more than two
categories? Do we collapse the many categories to two, so that we might be able to
use the theory and methods developed for binary response situations? As pointed
out by Lawal (1980), collapsing categories together may sometimes detract from
the main import of the investigation. What happens for those situations where
collpasing categories does not really make sense? A typical example of this is the
response "party affiliation" with categories (Republican, Democrat, Independent):
how do we collapse these categories to only two categories without detracting from
the differences (even if philosophically) among the three parties? We shall develop
in this section the necessary methodology required to analyzing data in which the
outcome variable has many categories.
When a response variable has many categories, we often refer to that variable as
multicategory or we may simply describe it as a polychotomous response variable.
In this section we shall examine the various multinomial logit models that have
been proposed for analyzing data in which a polychotomous outcome variable is
either nominal or ordinal. Let us first consider the case when the outcome variable
is multicategory but nominal.

9.3.1

Baseline Category Model

Consider a two-way / x J contingency table indexed by variables A and R, respectively. Suppose variable R is a response variable. We have shown earlier that for
the case when J = 2, the logit model can be written as
A + A^

(9.14)

with the usual restrictions on A^ . We see that the model of independence implies
that A^ be zero. Now suppose J > 2, and let j and f be any two columns of the
response variable R. Then,

where j is a fixed reference point and j takes all values except j and with
0 being the ( J 1) linear constraints imposed on the parameters. Then, we have

382

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS


\ H _ H _
j j

H
j'

^.R = a^R - aAR


The model described by (9.15) has been referred to as the baseline category model.
The model treats R as a nominal variable and category j as the baseline category. It
does not matter which category is chosen as the baseline. The parameter estimates
under any other choice of j', say j*, are linear transformations of the baseline model
with j = j', and the goodness-of-fit statistics will be identical in both situations
in which j = j' or j = j*. Generally, the last category of variable R is often used
as the baseline, and the baseline category logit model with categorical explanatory
variable A becomes
(9.16)
which corresponds to the log-linear model
(9.17)
where the Vj are scores such that v\ < v-i < , < vj and j3 = (/3^ (3 ft ) . Similarly,
the baseline category logit model with continuous explanatory variable(s) can be
written in the form:
J

9.3.2

j = 1,2,- - . , ( . 7 - 1 )

(9.18)

Example 9.6: Severity of Pneumoconiosis

The data in Table 9.12 are reproduced from Lindsey (1995) and show the severity
of pneumoconiosis as related to the number of years working at a coal face.
Years
0.5-11
12-18
19-24
25-30
31-36
37-42
43-49
50-59

Pneumoconiosis
Normal Mild Severe
98
0
0
2
51
1
34
3
6
35
5
8
32
9
10
23
7
8
12
6
10
4
2
5

Table 9.12: Years of employment and frequency of pneumoconiosis examination


outcome among coal miners
In this example, J = 3 and for a specific year i, let
TTn = the probability that pneumoconiasis is severe for year i
KM = the probability that pneumoconiasis is mild for year i
?T3i = the probability that pneumoconiasis is normal for year i
The baseline category logit model or multinomial logit model, with the last category
being the baseline, sought to model ln(7rij/7T3^) and In(7r2j/7r3j), where TTJ-^J =

9.3. MULTICATEGORY RESPONSE MODELS

383

1,2,3 refer respectively to severe, mild, and normal cases of pneumoconiosis for
a given number of years (i) on the coal surface. Here, we have used the normal
category as the reference, because we are assuming that workers are more likely to
fall ill with increasing number of years of exposure to the coal face than otherwise.
That is, a worker's health would rather deteriorate from normal to severe than the
reverse with increasing years of exposure. To model these data, we could consider
years as either a factor or categorical variable with 8 categories and try to fit a
saturated baseline category model to the data. To implement this, however, we
notice that for years group (0.5-11), two of the observations are zero. This would
result in an infinite number of estimates for years (Agresti, 1996, p. 222). We can
overcome this by simply adding some constant (0.1 or 0.5) to the two observations.
The comparisons of the first year category will definitely depend on the choice of
constants so employed.
The second approach that we would explore here is to consider a linear trend
in age by utilizing the midpoints of the classes for years. In this case, the response
R is the severity of pneumoconiosis and X is the number of years at the coal face.
That is, we wish to fit the model,
/TT- \
In M- =aj+/3jxi, j = l,2;i = l , 2 , - - - ,8
(9.19)
It has been suggested (McCullagh & Nelder, 1989) that a logarithmic transformation
of years will explain the data better. We explore both options (years and log-years)
in what follows. With explanatory variable years, the baseline-category logit model
gives G2 13.90 on 12 d.f. (pvalue = 0.3073), while with the explanatory variable
being the log of years (designated as lyear here), we have G2 = 5.32 on 12 d.f.
(pvalue = 0.9461). Clearly, the log-year model fits better and we will therefore
adopt this model. Basically, the baseline category model reduces for our example
to the following:
. f Severe year 1
^ ,
,
. ^ .
In
r-r= ai + j3i lyear and
(9.20a)
Normal | year
Mild | year
In
(9.20b)
Normal | year
The model in (9.20) has four parameters to be estimated. Below we give the parameter estimates under this model as well as the ML analysis of variance table.
This model fits the data with a G2 = 5.33 on 12 d.f. The effects of lyear are also
significant (pvalue < 0.0001).
data base;
input years $ rep $ year count <B<8;
if rep eq 'sev' then resp='asever';
else if rep eq 'mild' then resp='bmild';
else resp=>normal';
lyear=log(year);
datalines;
1 norm 5.75 98 1 mild 5.75 0 1 sev 5.75 0
8 norm 54.5 4 8 mild 54.5 2 8 sev 54.5 5

run;
proc catmod;
weight count;
direct lyear; model resp=lyear;
run;

384

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS


Maximum Likelihood Analysis of Variance

Source
Intercept
lyear
Likelihood Ratio

DF

Chi-Square

Pr > ChiSq

2
2

60.45
45.26

<.0001
C.OOOl

12

5.33

0.9461

Analysis of Maximum Likelihood Estimates

Parameter
Intercept
lyear

Function
Number

1
2
1
2

Estimate
-11.8366
-8 . 8549
3.0249
2 . 1402

Standard
Error
1.9533
1 . 5558
0.5514
0.4501

ChiSquare

Pr > ChiSq

36.72
32.39
30.09
22.61

<.0001
<.0001
<.0001
<.0001

Notice that to implement this model, we do not need to add constants to the two
obervations whose average age on the coal face is 5.75 years because we are now not
fitting a saturated model. The data=order statement asks that the severity, which
is now receded as asever, be used as the first category. Generally, CATMOD always
makes the category with the highest value of the response variable the reference
category or the last in alphabetical order if the categories of the response variable
are read in as alphanumeric.
From the above result, we have, where x In (years),
ln(7ri/7T 3 ) = -11.8366 + 3.0249 Zi and
In (712/71-3) = -8.8549 + 2.1402 Xi
Consequently, the estimated log odds that the response is "severe" rather than
"mild" equals for a given x.;,
In(7ri/7r 2 ) = (-11-8366 + 8.8549) + (3.0249- 2.1402) Zi
= -2.9817 + 0.8847 Xi
^ ' '
The SAS software output shows that the Wald statistics for the four parameters
are all significant (p < .0001).
The odds of severe versus normal pneumoconiosis are estimated to be
=exp[-l 1.8366 + 3.0249* lyears]
L Normal J
For coal miners employed for two years, the model estimates the odds of pneumoconiosis to be YTTjoo and is at -, , at 10, 15, and 20 years, respectively.
That is, one of every 17 miners working on the coal face for 20 years will be at
severe risk of the illness.
Similarly, the odds of a mild pneumoconiosis is estimated to be
Mlld
1 =exp[-8.8549 + 2.1402(lyears)l
Normal J
The odds of a mild form of pneumoconiosis for miners who have been employed

for 2, 10, 15, and 20 years on the coal face are estimated to be , , ,
respectively. That is, 1 in 13 coal face miners who have been employed for 20 years
will develop a mild illness of pneumoconiosis.

9.3. MULTICATEGORY RESPONSE MODELS

9.3.3

385

Obtaining Response Probabilities

Estimated response probabilities (Agresti, 1990) can be obtained from the following
expression
iJ 9J ...
J V *( ^/ _ "H
/

fQ
22")
\*Jiitl

c+&*)

Using (9.22), we can obtain the estimated response probabilities of the outcomes
(severe, mild and normal) as:
exp(-11.8366 + 3.0249z)
1 + exp(-11.8366 + 3.0249x) + exp(-8.8549 + 2.1402z)
exp(-8.8549 + 2.1402x)
7T =
1 + exp(-11.8366 + 3.0249z) -f exp(-8.8549 + 2.1402x)
1
7T3 =
1 + exp(-11.8366 + 3.0249z) + exp(-8.8549 + 2.1402z)
2

where again x = lyears is the log of years. When j; = J under the baseline category
model, then, dj = /3j = 0. Consequently, exp(o;3 + ^3) = 1. This accounts for the
1 in the expression for the numerator in the expression for TT3 above. For year 46,
then, x = In (46) = 3.8286 and therefore,
.
exp[-8.8549 + 2.1402(3.8286)]
71-2
~ 1 + exp[-11.8366 + 3.0249(3.8286)] + exp[-8.8549 + 2.1402(3.8286)]
= 0.2254

Figure 9.4: Predicted probabilities for severity of sickness


In Figure 9.4 is the plot of these predicted probabilities against log of years on the
coal face. Here, the solid line represents those for mild, the normal has the top
dotted line while the severe category has the elongated dotted line.
If we wish to test the hypothesis that all the parameter estimates for the
ln(7rii/7T3j) model are equal to those of the In(7r2i/7r3i) model in equations (9.20),
that is,
HQ : cti = a-2

and fl\ = fa

we can accomplish this with the following statements in SAS software by including
the keyword -RESPONSE, as a variable in the model statement. In this case, a
single set of parameters rather than separate sets of parameters at each cutpoint
will be produced.

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

386

set base;
proc catmod; weight count; direct lyear;
model resp=_response_ lyear/noiter; run;
Maximum Likelihood Analysis
Maximum likelihood computations converged.
Maximum Likelihood Analysis of Variance
Source
Intercept
_RESPONSE_
lyear

DF

Chi-Square

1
1
1

59.87
0.44
44.77

Pr > ChiSq
<.0001
0 . 5080
<.0001

Likelihood Ratio

Analysis of Maximum Likelihood Estimates

Parameter

Estimate

Standard
Error

ChiSquare

Pr > ChiSq

Intercept
_RESPONSE_ 1
lyear

-10.2032
0.0733
2.5445

1.3187
0.1107
0.3803

59.87
0.44
44.77

<.0001
0.5080
<.0001

The corresponding partial output is also displayed and if we compare the G2 for this
model with the earlier model with separate sets of parameters for each cutpoint,
we have G2 = 7.21 5.33 = 1.88 on 1 d.f., which indicates that this hypothesis
is tenable. Thus the pair of coefficients do not differ across the two models. We
can also accomplish the above test by including a contrast statement in the original
statements:
model resp=lyear; contrast 'test' 31 lyear 1 82 lyear -1;

The contrast statement on implementation gives a Wald's Q = 1.88 on 1 degree of


freedom. Again, there is no reason to doubt the hypothesis of equal set of parameter
values for the two cutpoints. In other words, we can fit a single model of the form:
In (Tri/TTg) = -10.2032 + 2.5445z; and
In (7T2/7T3) = -10.2032 + 2.5445^
from which the relevant estimated response probabilities can be computed.

9.3.4

Example 9.7: Breast Examination Data

The data below are the results of a survey of women, relating frequency of breast
self-examination and age (Senie et al., 1981).
Age

<45
45-59
60+

Freq. of breast self-examination


Monthly Occasionally Never
91
90
51
150
200
155
172
109
198

Table 9.13: Frequency of self-examination of breast by age


For the data in Table 9.13, the explanatory variable is categorical and we would
want to fit a baseline category logit model of the form:

9.3. MULTICATEGORY RESPONSE MODELS

387

In & = aj + Pj age j = 1, 2

(9.23)

Because variable age is categorical with three levels, two dummy variables Z\ and
Zi would need to be created. PROC CATMOD creates its dummy variable by
utilizing the effect coding scheme, where
f
1 if age <45
f
1 if age 45-59
Zi = < -1 if age 60+
Z2 = < -1 if age 60+
[ 0 Otherwise
[ 0 Otherwise
In this setup the baseline category logit model in (9.23) takes the form:
In />A

= a + p Zl

+ #yZ 2 ,

j = l,2

(9.24)

Of course, it would be much simpler, especially if further analysis would be necessary


to use the reference coding scheme where
1 for age <45

f 1 for age 45-59


*) - \
0 elsewhere
| 0 elsewhere
If we adopt this last coding scheme, then a baseline category logit model where the
order of the response variable is (monthly, occasionally, never} has Never as the
baseline category. We display below the SAS software output from the fit of this
model to the data in Table 9.13. The ages are coded 1 to 3 for age groups <45,
45-59, and 60+, respectively.
DATA breast;
INPUT AGE EXAM $ COUNT <D<8;
Z1=(AGE= 1);
Z2=(AGE= 2);
X=AGE;
DATALINES;
1 MONTH 91 1 OCCASS 90 1 NEVER 51
2 MONTH 150 2 OCCASS 200 2 NEVER 155
3 MONTH 109 3 OCCASS 198 3 NEVER 172
PROC PRINT;
RUN;
PROC CATMOD ORDER=DATA; WEIGHT COUNT; DIRECT Zl Z2;
MODEL EXAM=Z1 Z2/WLS NOITER; RUN;
Maximum Likelihood Analysis of Variance
Source

DF

Intercept
Zl
Z2

2
2
2

Likelihood Ratio

Chi-Square
25.61
24.33
6.67

Pr > ChiSq
<.0001
<.0001
0.0356

Analysis of Maximum Likelihood Estimates

Effect
Intercept

Zl
Z2

Parameter

1
2
3
4
5
6

Estimate

Standard
Error

ChiSquare

-0.4561
0.1408
1.0352
0.4272
0.4234
0.1141

0.1224
0.1042
0.2135
0.2039
0.1677
0.1494

13.88
1.82
23.51
4.39
6.38
0.58

Pr > ChiSq
0.0002
0.1768
<.0001
0.0362
0.0116
0 . 4449

388

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

The model is saturated and indicates that the effect of age group is very evident
in the frequency of self-examination of breasts by women in the survey. The above
estimates of the parameters lead to the following estimated logit equations:
In (Tri/Tra) = -0.4561 + 1.0352zi + 0.4234z2
In (7T2/7T3) = 0.1408 + 0.4272^1 + 0.1141z2
The age effect seems to be concentrated on the <45 group relative to the over 60+
group for the "monthly" versus "never" equation and similarly on the 45-59 group
relative to the 60+ group for the "monthly" versus "never" equation with pvalues
of <0.0001 and 0.0116, respectively. We can therefore say that those who are under
45 years of age are exp(1.0352) = 2.82 times more likely to examine their breasts
monthly rather than never than in the 60+ years old group. Similarly, the 45-59
age group has odds that are exp(0.4234) = 1.53 times higher than the over 60+
years group of monthly breast examination rather than never breast examination.
The analysis also show that the group under 45 years are exp(0.4272) = 1.53 times
more likely to examine their breasts occassionally than the 60+ age group.
The predicted probabilities of each of our responses are designated here as pil
to pi3. For instance, the estimated probability for ^2 = "occassionally" when age
group 45-59 is 0.3960, which is computed as:
1 + ui + (jJi

where
wi = exp[-0.4561 + (1.0352 * zi) + (0.4234 * z 2 )]
u<2 = exp[0.1408 + (0.4272 * zi) + (0.1141 * z2)]
and substituting z\ = 0 and z? = 1 in the above expression.
In the table below are these estimated probabilities for values of the explanatory
variables.
AGE

1
1
1
2
2
2
3
3
3

ZI

Z2

pil

1
1
1
0
0
0
0
0
0

0
0
0
1
1
1
0
0
0

0.3923
0.3923
0.3923
0.2970
0.2970
0.2970
0.2276
0.2276
0.2276

pi2

0 . 3879
0.3879
0.3879
0 . 3960
0 . 3960
0 . 3960
0.4134
0.4134
0.4134

pi3

0.2198
0.2198
0.2198
0 . 3069
0.3069
0 . 3069
0.3591
0.3591
0.3591

The model that assumes equal slope for Z\ and Z2, that is, HQ : @\j faj,! = 1 2
in equation (9.24) gives G2 = 10.52 on 2 d.f. (pvalue = 0.0052), indicating that the
model is not tenable.
Suppose, instead of the above analysis, we have assumed that the variable age
group has a linear effect. That is, the effect from age group "<45" to "45-59" is the
same as the effect from "45-59" to "60+." If we designate these levels by integer
scores {X1, 2, 3}, respectively, and we apply this in our model, we have a model
with G2 = 0.53 on 2 d.f. (p = 0.7674). The model fits well the data. The parameter
estimates under this model are given below.

9.4. MODELS FOR ORDINAL RESPONSE VARIABLES

389

set breast;
PROC CATMOD ORDER=DATA;
WEIGHT COUNT; DIRECT X; MODEL EXAM=X/ML NOITER; RUN;
Analysis of Maximum Likelihood Estimates

Effect

Parameter

Intercept

1
2
3
4

Estimate

Standard
Error

ChiSquare

1.0139
0.6878
-0.4985
-0.1904

0.2367
0.2283
0.1025
0.0955

18.35
9.08
23.64
3.97

Pr > ChiSq

< . 0001
0 . 0026
<.0001
0 . 0463

Again, a model that assumes equal slopes for the X variable gives a G2 = 11.18
0.53 = 10.65 on 1 d.f. This again indicates that this assumption is not tenable.
Hence, the estimated baseline category models are:
In (TTI/TTS) = 1.0139 - 0.4985 x
ln(7T2/7T3) = 0.6878 - 0.1904 a:
The baseline category logit model can also be fitted to situations involving either
several explanatory categorical variables or a mixture of explanatory categorical
variables and continuous type variables. Some of the exercises at the end of this
chapter are examples of these cases.

9.4

Models for Ordinal Response Variables

We shall consider here the situation in which the multinomial response variable has
J ordered categories. In this case, it is always possible to reduce these J categories
to a two-category variable by collapsing over some categories: for example, category
1 versus others or category i versus category j such that i / j and where category
j implies all other categories (except i) combined. There are (2) such possible
pairs of categories, and it is quite possible to model each pair by using either the
complimentary log-log model or the logit model, where analysis is carried out in
turn on each (2) pairs of categories.
The above approach, however, is not without its drawbacks, namely, that the
resulting (2) models for the proportions in the J categories may not in general give
fitted probabilities that sum to 1, and that the models themselves might involve
different functions of the explanatory variable or different link functions (see Aitken
et al., 1989). The above drawbacks can be overcome by the use of the multinomial
logit model. We shall next discuss some of the specialized models that are available
for the ordinal response variable.

9.4.1

Cumulative Logit Model

The cumulative logits are defined for observed counts fij as:
J

for j 1, 2, , ( J 1). We illustrate this concept with an example below where


for a given i, the ordinal response variable has J = 4 categories.

390

CHAPTER 9. LOGIT AND MULTINOMIAL

RESPONSE MODELS

j
i

1
Low
/ii

2
Medium
/

3
High
fi3

4
Very high
/*

Then, for a particular level i of the explanatory variable X, we have the following
decomposition of the cumulative model:
/ii
In
= on +
ji

= logit

Low
> Medium/

In
Low

In

ri4

The above model thus has one slope (/3) but three intercepts (cut points): 01,02,
and 0:3 respectively.
The simplest cumulative logit model has
The model implies that the response variable is independent (simultaneously) of
the explanatory variable X. In this case, the {o^}, the cut-point parameters, are
nondecreasing in j. An application of the cumulative logit model to the data in
Table 9.12 is implemented in PROC GENMOD as follows:
set base;
proc genmod; freq count;
model resp=lyear/dist=multinomial
link=cumlogit
aggregate=lyear typel;

run;

Model Information
Data Set
WORK.BASE
Distribution
Multinomial
Cumulative Logit
Link Function
Dependent Variable
resp
Frequency Weight Variable
count
Observations Used
22
Sum Of Frequency Weights
371
Missing Values
2
Response Profile
Ordered
Total
Value
resp
Frequency
1
asever
44
2
bmild
38
3
normal
289
PROC GENMOD is modeling the probabilities of levels of resp having LOWER Ordered
Values in the response profile table. One way to change this to model the
probabilities of HIGHER Ordered Values is to specify the DESCENDING option in the
PROC statement.

9.4. MODELS FOR ORDINAL RESPONSE VARIABLES

391

Criteria For Assessing Goodness Of Fit


Criterion

DF

Deviance
Scaled Deviance
Pearson Chi-Square
Scaled Pearson X2
Log Likelihood

13
13
13
13

Value

Value/DF

5 . 0007
5 . 0007
4.6806
4.6806
-204.2611

0.3847
0 . 3847
0 . 3600
0 . 3600

Analysis Of Parameter Estimates

Parameter
Interceptl
Intercept2
lyear
Scale

DF

1
1
1
0

Estimate
-10.5729
-9.6673
2 . 5943
1 . 0000

Standard
Error
1 . 3446
1.3241
0.3812
0 . 0000

Wald 957. Confidence


Limits
-13.2081
-12.2625
1.8472
1 . 0000

-7.9376
-7.0721
3.3414
1 . 0000

Source

LR Statistics For Type 1 Analysis


ChiDeviance
DF
Square
Pr > ChiSq

Intercepts
lyear

10 1 . 6406
5.0007

96.64

ChiSquare

Pr > ChiSq

61.83
53.30
46.32

<.0001
<.0001
C.0001

<.0001

The aggregate^ lyear in the model statement asked that lyear be used to define
the multinomial populations for computing the goodness-of-fit test statistics. A test
of the significance of the covariate lyear is provided by the typel options. This test
is highly significant. It is important to make sure that the response profile is such
that the level we wish to model is indeed the one being modeled. PROC GENMOD
has several options to implement this reordering should this be the case. From the
above results we have the following estimated cumulative logit equations
In ( 7 - j = -10.5729 + 2.5943 lyear

(9.26a)

V 1- T

In ( 7r / 1+71 ' 2 , J

= -9.6673 + 2.5943 lyear

(9.26b)

We consider in the next sections some of the specialized cumulative models that
have been applied to categorical data having ordinal response variable(s).

9.4.2

The Proportional Odds Model

For this model, each category of the J ordinal category response variable is considered in turn and the frequency of response at least up to that point on the ordinal
scale is compared to the frequency for all points higher on the scale. That is, the
first category is compared to all the rest combined, then the first and the second
combined are compared to all the rest combined, and so on, which results in the
original J-response table being converted into a series of J 1 subtables, each with
a binary categorization, lower or higher than the point on the scale. The model can
be formally written for a single explanatory variable X as:
r p(y < j I r.\i
In L " . * = a j + f e j = l , 2 , . . . ( J - l ) ; * = l , 2 , . . . , /
(9.27)
> j xz)
We see that there are (J 1) separate proportional odds equations for each possible
cut point j, with each having a common slope /3 and different intercepts aj. In this
formulation, the o^'s are not themselves important, but the slope parameter (3 is.

392

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

As an example of the application of this model, let us consider again the data
in Table 9.12 relating years of employment on the coal face to the development of
pneumo coniosis .
If we let 7Tj|i be the probability that the response is in category j given that the
explanatory variable is Xf, then
j
5^7Tj-|i = li

= 1,2, , /

3=1

Thus there is a linear restriction on the Ttj\i, and there would therefore be I(J 1)
parameters of the conditional multinomial distribution. For the data in Table 9.12,
J = 3, and we would therefore have (J 1) = 2 cut points in our model. The
proportional odds model for this data can thus be formulated as follows:
P(Severe I x)
. [
'" p(Mild or Normal
. [PfSevere or Mild I x^)]
.,
In L
P,M-n\
P(Normal
| Xi)
\ = 0:2 + & lyear
Although there are four parameters to be estimated in the above model, the proportional odds model assumes that J3\ = fa (3. This assumption will be tested with
what is called the score test for the proportional odds assumption. If this assumption holds, then the model has three parameters to be estimated from the data. We
give below the SAS software statements to implement this model together with a
modified SAS software output from the implementation.
set base; proc logistic; weight count;
model resp=lyear/scale=none aggregate; run;
Score Test for the Proportional Odds Assumption
Chi-Square

DF

Pr > ChiSq

0.7096

0.1387

Deviance and Pearson Goodness-of-Fit Statistics


Criterion
DF
Value
Value/DF
Pr > ChiSq
Deviance
13
5.0007
Pearson
13
4.6806
Number of unique profiles: 8

Analysis of Maximum Likelihood Estimates


Standard
DF
Estimate
Error
Chi-Square

Parameter

Intercept
Intercept2
lyear

Effect
lyear

0.3847
0.3600

1
1
1

-10.5728
-9.6672
2 . 5943

1 . 3463
1 . 3249
0.3813

61.6776
53.2392
46 . 2850

0.9752
0.9816

Pr > ChiSq

C.0001
C.0001
<.0001

Odds Ratio Estimates


Point
957. Wald
Estimate
Confidence Limits
13.387

6.340

28.268

The score test for the proportional odds assumption gives a G2 = 0.1387 on 1 d.f. (p
= 0.7096). Thus the assumption that (3\ fa is well justified. Its degree of freedom
is obtained as P(J2), where P is the number of explanatory variables in the model.

9.4. MODELS FOR ORDINAL RESPONSE VARIABLES

393

In our case, P = 1 and hence the d.f. = 1(3-2) = 1. The goodness-of-fit test statistic
for this model is the deviance or G2 = 5.0007 on I(J - 1) - r = 8(3 - 1) - 3 = 13
degrees of freedom. Here r is the number of parameters being estimated and I
is the number of cell combinations generated by the product of the levels of the
explanatory variables. In this example, 7 = 8 and r = 3.
The odds of severe pneumoconiosis is estimated to be
if^Vf-*T*f-*

Mild or Normal]

= exp(-10.5728 + 2.5943 lyear)

For coal miners employed for 2 years, the model estimates the odds of pneumoconiosis to be
, and
, , and at 10, 15, and 20 years, respectively. That
is, one of every 18 miners working on the coal face for 20 years will be at severe risk
of the illness.
Similarly, the estimates of the odds of severe or mild pneumoconiosis are estimated to be
r Sever or Mild
,
(_9.6672 + 2.5943 j
L Normal J
The odds of pneumoconiosis in this case, for 2, 10, 15, and 20 years on the coal
face, are again estimated to be
, , , and -, respectively. That is, 1 in 8
coal face miners who have been employed for twenty years will develop a severe or
mild illness of pneumoconiosis. The results here are very consistent with our earlier
results, which utilized the baseline category logit model. However, the proportional
odds model is more parsimonious than the baseline category model. The above
analysis is contingent upon the proportional odds assumption being true. The
assumption assumes the same slope for each odds.

9.4.3

Breast Examination Example Revisited

We shall next consider, in the following sections, the fitting of the proportional odds
model when the explanatory variable has more than two categories or when there
are several explanatory variables, each having two or more categories. We shall
illustrate the first with the data in Table 9.13. The proportional odds model can
be formulated again as:
, f
P (monthly
y I age)
In
' K'
P(occasionally or never | age)
In

f P(monthly or occasionally | age) 1


;
=
[
P(never | age)
J

Since the explanatory variable is categorical, we again create two dummy variables
with the last category as the reference category. The two indicator variables are
j 1 for age <45
10 elsewhere

Z\ = <

j 1 for age 45-59


10 elsewhere

Zi = <

Specifically in this problem, we have the following 6 equations:

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

394

In

P(monthly | 45-59)
P(occasionally or never | 45-59)

In
In
In

P(monthly | <45)
P(occasionally or never <45)

f
P(monthly | 60+)
[P (occasionally or never | 60+)

P(monthly or occasionally | <45)


P(never | <45)
P(monthly or occasionally | 45-59)
P (never | 45-59)

In

= Oil + 02

P (monthly or occasionally 60+)]


P(never 60+)

Here J 3, / = 3. The model has a total of 4 parameters to be estimated from the


data. We give below some SAS software output from this analysis.
set breast; proc logistic order=data; freq count;
MODEL EXAM=zl z2/scale=none aggregate; test: test zl=z2; run;
Score Test for the Proportional Odds Assumption
Chi-Square

DF

Pr > ChiSq

0.7077

0.7020

Deviance and Pearson Goodness-of-Fit Statistics


Criterion

DF

Deviance
Pearson

Value

Value/DF

Pr > ChiSq

0.7124
0.7127

0.3562
0.3563

0.7003
0.7002

Number of unique profiles: 3


Analysis of Maximum Likelihood Estimates
Standard
Error

DF

Intercept
Intercept2
Zl
Z2

Effect
Zl
Z2

1
1
1
1

-1.1783
0.5514
0.7314
0 . 2895

0.0943
0.0888
0.1494
0.1181

Chi-Square
156.1188
38 . 5888
23.9616
6 . 0063

Pr > ChiSq
<.0001
<.0001
c.OOOl
0.0143

Odds Ratio Estimates


Point
957. Wald
Confidence Limits
Estimate
2.078
1.336

1.550
1.060

2.785
1.684

The proportional odds model assumption that the slope is the same across the
response profiles is tested with a score test of 0.7077 on 2 d.f. (p = 0.7020). Here
P = 2, hence d.f. = 2(3 2) = 2. This assumption is well satisfied. The model
gives G2 = 0.7124 on 3(3 - 1) - 4 = 2 d.f. (p = 0.7003), which indicates that the
model again fits well the data. The proportional odds or cumulative odds model
can also be implemented for the Breast examination data in Table 9.13 via PROC
CATMOD and GENMOD with the following statements.

9.4. MODELS FOR ORDINAL RESPONSE

395

VARIABLES

set breast; PROC CATMOD order=data; WEIGHT COUNT;


DIRECT Zl Z2; response clogits;
MODEL EXAM=_response_ zl z2/WLS NOITER; RUN;
proc genmod rorder=data; class age; freq count;
model exam=age/dist=multinomial
link=cumlogit
aggregate=age
typel;
estimate 'oddsl' age 1 -1/exp; estimate 'odds2' age 1 0 -1/exp;
estimate 'oddsS' age 0 1 -1/exp; run;
Contrast Estimate Results

Label
oddsl
Exp( oddsl)
odds 2
Exp(odds2)
odds3
Exp(odds3)

Estimate

0..4419
1..5557
0,.7315
2,.0781
0..2896
1.,3359

Standard
Error
0.1477
0.2298
0.1493
0.3104
0.1181
0.1577

Alpha

Confidence Limits

0,.05

0,.1523

0.05
0.05
0.05
0.05

1 .1645
0.4388
1 .5508
0..0582
1,.0599

0.05

0,.7315
2.0782
1 .0242
2 .7848
0.5210
1..6837

ChiSquare

Pr > ChiSq

8.95

0.0028

23.99

<.0001

6.02

0.0142

Notice that the implementation of this model in PROC GENMOD instructs genmod
to model the response profiles as MONTHLY, OCCASIONALLY, and NEVER in
that order. This is accomplished with the rorder option in the PROC GENMOD
statement line. The odds computed agree with those obtained under the logistic
model approach. We also observe here that we have have used the CLASS statement in GENMOD, because the coding here is reference coding. We accomplish
same in PROC LOGISTIC with the REFlast statement.
A further test of the equality of the j3 parameters, that is, (3i =02, gives a Wald
test statistic of 8.9872 on 1 d.f. (p = 0.0027) in PROC LOGISTIC, indicating the
rejection of this hypothesis. This hypothesis is tested in PROC GENMOD with the
oddsl estimate with a pvalue of 0.0028. The same conclusion is obtained.
Similarly, a test of the hypothesis 0i = {3% = 0, which is equivalent to the model
of independence, also gives a Wald test statistic of 24.1866 on 2 d.f. (p < .0001),
again indicating that the model of independence is not tenable and that the model
permitting an effect fits better than the model of independence. These tests are
implemented in PROC LOGISTIC with the following for this model.
proc logistic order=data; freq count;
model exam=zl z2/aggregate scale=none; test: test zl=z2;
test2: test zl, z2; output out=aa p=phat; run;
Linear Hypotheses Testing Results
Wald
Label
Chi-Square
DF
Pr > ChiSq
test
test2

8.9872
24.1866

1
2

0 . 0027
<.0001

Estimated Response Probabilities

Estimated response probabilities can easily be calculated. For instance, the cumulative probabilities are computed from

P(R < j ) =

I + exp(o!j + izi + /32z2]

396

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

The second estimated cumulative probability for respondents who are in the age
group 45-59 and are monthly or occasionally examining their breasts (z\ = 0, 22
l,j = 2) equals
= 0.2914
where
w2 - exp[-1.1783 + 0.2895 (1)]
These probabilities are obtained in PROC LOGISTIC with the option statement
output out=aa p=phat in the program above.
Sometimes if the proportional odds model does not fit well the data of interest,
a partial proportional odds model (Koch et al., 1985) is suggested as an alternative.
We can of course seek other multinomial logit models that might fit the data better.
In what follows, we again consider age as a continuous variable having a linear
effect with categories again assigned integer scores {1, 2, 3}. In this case, the
cumulative logit model again fits the data with G2 = 1.1839 on 3 d.f. (p = 0.7569).
The hypothesis of common slope, which is tested by the score test, has p =
0.4736, indicating that the proportional odds model assumption of equal slopes is
satisfied. We present below the result of this analysis.
set breast; proc logistic order=data; weight count;
model exam=x/scale=none aggregate; run;
Score Test for the Proportional Odds Assumption
Chi-Square

DF

Pr > ChiSq

0.5135

0.4736

Deviance and Pearson Goodness-of-Fit Statistics


Criterion
Deviance
Pearson

DF

Value

Value/DF

Pr > ChiSq

3
3

1.1839
1.1904

0.3946
0.3968

0.7569
0.7553

Analysis of Maximum Likelihood Estimates

Parameter
Intercept
Intercept2
X

DF

1
1
1

Estimate
-0.1394
1 . 5899
-0.3535

Standard
Error

Chi-Square

0.1684
0.1751
0.0726

0.6853
82.4020
23.7230

Pr > ChiSq
0 . 4078
<.0001
<.0001

We have for a given age = z, exp(0.3535) 0.7022, implying that each level unit
increase in the level of that variable multiplies the odds of monthly versus (occasionally or never), (monthly or occasionally) versus (never) by 0.7022. Consequently,
the odds of examination monthly are 0.7022 times greater when the respondent is
in the age group <45 than when the respondent is in age group 45-59. The odds are
0.70222 = 0.4931 times greater when the respondent is in age group <45 or 45-59
than when the respondent is in age group 60+. In other words, those in the 60+
group are 2.03 times more likely to monthly examine their breasts than those in the
<59 year age group.
It is obvious from the two examples examined in this section that the proportional odds model is equivalent to the cumulative logit model. While PROC GENMOD and CATMOD can be employed to implement the cumulative logit model,
PROC LOGISTIC can similarly be employed to implement the equivalent proportional odds model.

9.4. MODELS FOR ORDINAL RESPONSE VARIABLES

9.4.4

397

Adjacent- Category Logits

For an ordinal response variable with J categories, comparing adjacent categories


(that is, each category to the next category) leads to what has been described (denned here in terms of expected frequencies) as the adjacent category logits ( Agresti,
1990). It is defined for an individual or subject i to be:
-=a

-+X,

j = l,2,..-,(J-l)

(9.28)

If we let the expected frequencies be modeled as in (9.28), then the above model is
also equivalent to:
J+i.\=a.+f3'Xi

j = i)2,...,(J-l)

(9.29)

In the above two formulations, the adjacent-category model is obtained by the


imposition of a constraint on the set of ( J 1) equations in either (9.28) or (9.29).
Specifically, we impose the constraint ftj = {3 for all j. This is accomplished in
CATMOD with the .response, in the model statement.
The above does define logits for all ( 2 ) pairs of response categories, and can be
interpreted as the odds of getting category j relative to category j + I. They can
also be viewed as the conditional odds given that either category j or j + 1 occurs.
Let us reanalyze the breast examination survey data data in Table 9.13 using
the adjacent category model approach. This model has indicator variables Zi and
Z2 as defined previously. To fit adjacent-category models in PROC CATMOD
in SAS software, we shall define the response as ALOGITS or ALOGIT. The
adjacent-category model when fitted to the data has the partial SAS software output
displayed below. The goodness-of-fit test statistic for this model has G2 = 0.63 on
2 d.f. (p = 0.7308). The model fits the data very well.
set breast;
proc cat mod order=data; weight count; direct zl z2 x;
response alogits; model exam=_response_ zl z2/ml noiter;
contrast
'three' zl 1 z2 -1; run;
Analysis of Weighted Least Squares Estimates

Effect
Intercept
_RESPONSE_
Zl
Z2

Parameter

1
2
3
4

Estimate
0.2226
0.3112
-0.5158
-0.2059

Standard
Error
0 . 0608
0.0590
0.1059
0.0835

ChiSquare

Pr > ChiSq

13.39
27.86
23.71
6.08

0.0003
<.0001
<.0001
0.0137

The ML estimates of the age group effects are J3\ = 0.5158 and fa 0.2059.
The effects of f3s are significant. Based on the above parameter estimates, therefore,
the estimated odds that an individual in age group <45 does do self-examination
of breast as being in category j + 1 rather than in j are exp(0.5158) = 0.597
times the estimated odds for those in age group 60+. Further, the estimated odds
of "monthly" instead of "occasionally" self-examination of breast of the <45 age
group is about 60% of the odds of the 60+ group. In general, the estimated odds
for any pair of response categories (cl,c2),cl > c2 is exp[/?i(cl c2)] for the <45
age group and exp[fa(cl c2)] for the 45-59 age group (all relative to the last
category). For instance, the estimated odds of "Monthly" (category 1) instead of

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

398

"Never" (category 3) is e[-o.5i58(3-i)] = 0.355. That is, those in the age group <45
have odds that are about 36% of the odds of those in the 60+ group in the response
to self-examination of breast. In the above model, we see that the effect of Zi is
also significant and similar interpretations can be made for the 45-59 group relative
to the 60+ group. A test of whether the effect of Z\ is significantly different from
that of Z<i is provided by the contrast statement in the program above. This test
yields a Wald WQ = 8.95 on 1 d.f., with a pvalue of 0.0028.
An equivalent log-linear model formulation of the adjacent category model can
be implemented with the following SAS software statements:
data breast2;
input age exam count <B<D;
examl=exam;
datalines;
1 1 91 1 2 90 1 3 51 2 1 150 2 2 200 2 3 155
3 1 109 3 2 198 3 3 172
proc genmod; class exam age;
model count=age exam age*examl/dist=poi type3; run;
Criteria For Assessing Goodness Of Fit
Criterion
Deviance
Scaled Deviance
Pearson Chi-Square
Scaled Pearson X2

DF

Value

Value/DF

2
2
2
2

0.,6268
0.,6268
0..6274
0.,6274

0.3134
0.3134
0.3137
0.3137

Analysis Of Parameter Estimates


Standard
Error

Wald 95'/. Confidence


Limits

Parameter

DF

Intercept

5 . 1664

0.0717

5.0257

1
1

-0.5160
-0 . 2060

0.1060
0.0833

-0.7237
-0.3693

examl*age
examl*age

1
2

Estimate

5 . 3070
-0.3083
-0.0428

ChiSquare

Pr > ChiSq

5185.60

<.0001

23.71
6.12

<.0001
0.0134

Note that, "exam" and "age" group are declared as categorical variables, while a derived variable examl from a data step is declared as a quantitative (integer scored)
variable. The log parameter estimates for examl*age agree with those obtained
from PROC CATMOD.
Again if we consider age as a continuous variable, then a fit of this model has
the following results:
set breast;
proc catmod order=data; weight count; direct x;
response alogits; model exam=_response_ x/uls noiter; run;
Analysis of Weighted Least Squares Estimates

Effect
Intercept
_RESPONSE_
X

Estimate

Standard
Error

ChiSquare

Pr > ChiSq

-0.5112
0.3107
0.2498

0.1190
0.0589
0.0516

18.45
27.78
23.47

<.0001
<.0001
<.0001

The WLS estimate for this model is /3 = 0.2498. The model fits the data with Wald
statistic Q = 1.07 on 3 d.f. (p = 0.7832). Based on this model, the estimated odds
that an individual in age group <45 does do self-examination of breast as being in

9.4. MODELS FOR ORDINAL RESPONSE VARIABLES

399

category j + 1 rather than j are exp(0.2498) = 1.284 times the estimated odds for
those in age group 45-59. This implies that a percentage change in the odds for
each 1-unit increase (group change e.g. <45 to 45-59) in age group is associated
with a 28% increase in the odds of being in j + 1-th rather than in the j-th response
category.
The estimated odds of "Monthly" (category 1) instead of "Never" (category 3)
is e[-2498(3-i)] _ i ^5 That is, those in the age group <45 have odds that are about
65% higher than those in the 45-59 age group. Further, those in the age group <45
have odds that are about (1 - 1.652) x 100 = 172% higher than those in the 60+
group in the response to self-examination of breast. The latter result is as a result
of the linear assumption for the effect of age.
When the adjacent-odds ratio model is applied to the coal miners data (with 0.1
added to the cells with zero counts), the model gives a Wald statistic of Q = 4.70 on
13 d.f. (p 0.9812). The model clearly fits the data with an estimated equation:
In (

mij

) = 6.0428 - 1.5490 lyear for j = 1, 2

(9.30)

For a miner who has spent 10 years on the coal face, the odds that the individual would have a severe pnemoconiosis rather than a mild pnumoconiosis are
exp(2.476)=11.9 times higher. Similarly, the predicted odds that the individual
would have mild pnemoconiosis rather than normal are also 11.9 times higher. In
other words, whenever we compare adjacent pneumoconiosis categories, miners have
about 11.9 times more chance of being in the worse category for those that have
been employed for 10 years. Similar odds can be obtained for those with 15 or 20
years of employment on the coal face.

9.4.5

Continuation Ratio Model

For an ordered response variable, the continuation ratio logits are defined as

(9-31)

which in terms of probabilities, becomes


In ( -?1- } + In ( -^- + + In
\K2 + --- + TrjJ
\7r3 + - - - + 7

(9.32)

The model can alternatively be defined as:


In

li

+ In

aa

+ . . .+m

--

(g>33)

where the first category is compared to the second, the first and second combined
to the third, and so on. The two model formulations lead to different parameter
estimates as well as different goodness-of-fit test statistics. We shall adopt the form
in (9.33) in this text.
The continuation ratios model resembles the proportional odds model, except
that for each category of the ordinal table considered in turn, the frequency of the

400

CHAPTER 9. LOGIT AND MULTINOMIAL RESPONSE MODELS

response variable at least up to that point on the ordinal scale is compared only to
the frequency for the immediately following category. The implementation of this
model involves creating a series of subtables and fitting individual logistic regression
models to these subtables. We illustrate the fitting of the continuation ratio model
by applying this model to the data in Tables 9.12 and 9.13 again. First for the data
in Table 9.12, we form two separate subtables of successes/failures (a) and (b) in
the first subtable and (a)+(b) and (c) in the second subtable. The first subtable
/severe \
(ratio 1, Table 1) corresponding to the ratio I - 1, while the second ratio (Table
V mild /
.
/ severe+mild \ _
.
_
.
,
,
2)N corresponds to
: . Ihe results from forming these subtables are
\ normal /
displayed in the Table 9.14.
severe
(a)
0
1
3
8
9
8
10
5

Table 1
Mild
(b)
0
2
6
5
10
7
6
2

n
0
3
9
13
19
15
16
7

Table 2
Severe+mild Normal
(a)+(b)
(c)
0
98
3
51
34
9
13
35
32
19
15
23
12
16
4
7

n
98
54
43
48
51
38
28
11

years
5.75
15.00
21.50
27.50
33.50
39.50
46.00
51.50

Table 9.14: Subtables formed from the original data


To fit the continuation ratio model to these data, we now fit separate logistic regression models to each subtable using the log of years as the explanatory variables.
Goodness-of-fit statistics for these models as well as the parameter estimates are
displayed below.
data crm;
input normal mild severe year <D<D;
lyear=log(year);
datalines;
98 0 0 5.75 51 2 1 15.0

12 6 10 46.0 4 2 5 51.50
run;
data newl;
set crm;
table=l; r=severe; n=severe+mild; output;
table=2; r=severe+mild; n=normal+mild+severe; output; run;
proc print; run;
SUB-TABLE ONE
proc logistic; where table=l;
model r/n=lyear/scale=none aggregate;
run;
Deviance and Pearson Goodness-of-Fit Statistics
Criterion
Deviance
Pearson

DF

Value

Value/DF

Pr > ChiSq

1.7471
1.7356

0.3494
0.3471

0.8829
0.8844

9.4. MODELS FOR ORDINAL RESPONSE VARIABLES

401

Analysis of Maximum Likelihood Estimates

Parameter

DF

Estimate

Standard
Error

Chi-Square

Pr > ChiSq

Intercept
lyear

1
1

-3.8639
1.1363

2.6880
0.7588

2.0663
2.2424

0.1506
0.1343

SUB-TABLE TWO
proc logistic;
where table=2;
model r/n=lyear/scale=none aggregate;
run;
Deviance and Pearson Goodness-of-Fit Statistics
Criterion
Deviance
Pearson

DF

Value

Value/DF

Pr > ChiSq

6
6

3.1045
2.5801

0.5174
0.4300

0.7956
0.8594

Number of unique profiles: 8


Analysis of Maximum Likelihood Estimates

Parameter

DF

Estimate

Standard
Error

Chi-Square

Pr > ChiSq

Intercept
lyear

1
1

-9.5997
2.5734

1.3399
0.3866

51.3293
44.3199

<.0001
<.0001

The G2 for the full continuation model is the sum of the separate G2 for each
subtable and corresponds to the G2 for simultaneous fitting of the two models. In
this case our overall G2 = 1.7471 + 3.1045 = 4.8516 on 5 4- 6 = 11 degrees of
freedom. The model certainly fits the data. The odds can be computed for each
ratios as before, and we see here that the continuation ratio has two intercepts and
two slopes, in contrast to the proportional odds model with a common slope.
Lindsey (1995) has suggested that the full continuation ratio should include the
subtables as a factor variable in the model. When we implement this, we have the
following results from SAS software. Note the parameterization of tables in this
case. The table variable is coded 1 and 0 for Tables 1 and 2 respectively. This is
accomplished with the ref= in the class statement.
proc logistic;
class table (ref=last)/param=ref;
model r/n=table|lyear/scale=none aggregate; run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       11     4.8516      0.4411        0.9381
Pearson        11     4.3156      0.3923        0.9598

Analysis of Maximum Likelihood Estimates

                               Standard
Parameter       DF   Estimate     Error   Chi-Square   Pr > ChiSq
Intercept        1    -9.5995    1.3399      51.3293       <.0001
table 1          1     5.7355    3.0034       3.6467       0.0562
lyear            1     2.5734    0.3865      44.3198       <.0001
lyear*table 1    1    -1.4370    0.8516       2.8473       0.0915

The result of fitting the model with table, lyear, and the lyear by table interaction
terms gives a deviance value that agrees with the sum of deviances obtained from
fitting the individual models as outlined above. However, we observe that the
interaction term is not significant in this model. Removing this term and refitting
the continuation ratio model gives the result displayed below.
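The SAS software statements for this refit are not shown in the original listing; presumably the interaction term is simply dropped from the model statement, along the following lines (a sketch, not verified against the original):

proc logistic;
class table (ref=last)/param=ref;
model r/n=table lyear/scale=none aggregate;   /* interaction term lyear*table dropped */
run;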
Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       12     7.5899      0.6325        0.8163
Pearson        12     6.9192      0.5766        0.8629

Analysis of Maximum Likelihood Estimates

                          Standard
Parameter   DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept    1   -8.7251    1.1288      59.7468       <.0001
table 1      1    0.6824    0.2750       6.1585       0.0131
lyear        1    2.3189    0.3270      50.3053       <.0001

Odds Ratio Estimates

                   Point           95% Wald
Effect          Estimate     Confidence Limits
table 1 vs 2       1.979      1.154      3.392
lyear             10.165      5.356     19.293

The model still fits the data with G2 = 7.5899 on 12 d.f. The continuation ratio
therefore provides an alternative to both the baseline category logit and proportional
odds (cumulative logit) models.
To fit the continuation ratio model to the breast data, we need to first generate
the following two subtables of successes/failures.
             Table 1                 Table 2
Age        r        n              r        n
 1        91      181            181      232
 2       150      350            350      505
 3       109      307            307      479

(For Table 1, r = month and n = month + occass; for Table 2, r = month + occass and n = month + occass + never.)
The above is accomplished by the following SAS software statements and partial
output:
options nodate nonumber ls=85 ps=66;
data breast;
do age= 1 to 3;
input month occass never @@;
output; end;
datalines;
91 90 51 150 200 155 109 198 172
;
run;
data new1;
set breast;
table=1; r=month; n=month+occass; output;
table=2; r=month+occass; n=month+occass+never; output;
run; proc print; run;
Obs   age   month   occass   never   table     r      n
 1     1      91       90      51       1     91    181
 2     1      91       90      51       2    181    232
 3     2     150      200     155       1    150    350
 4     2     150      200     155       2    350    505
 5     3     109      198     172       1    109    307
 6     3     109      198     172       2    307    479


A logistic regression fitted to subtable 1 gives a G2 = 0.0013 on 1 d.f. with slope
estimate -0.3045, a.s.e. (0.0943). Similarly, the logistic model applied to subtable 2 gives
a G2 = 0.6601 on 1 d.f. with slope estimate -0.3199, a.s.e. (0.0865). Consequently, the
combined continuation ratio G2 = 0.0013 + 0.6601 = 0.6614 on 2 d.f. Each of our
models assumes a linear trend in age group where integer scores are assigned. We
are exploiting the intrinsic ordering of this variable. The continuation ratio model
therefore fits the breast data with a p-value of 0.7184.
For each of the above subtables with three observations per subtable, a logistic
model that treats age group as a categorical explanatory variable with three categories would have been a saturated model. Consequently, we assume in
the previous analysis that the variable "age" has a linear effect by assigning integer
scores (1,2,3) to the three categories. An alternative method of fitting the continuation ratio model (Lindsey, 1995) fits a single model simultaneously to the
subtables so created, treating the subtables as a factor variable, and does not
in any way assume a linear trend effect of the original explanatory variable.
A logistic regression fitted to the above data with age and subtable as factor
variables gives G2 = 0.3649 on 2 d.f. (p = 0.8332).
proc logistic order=data;
class table (ref=last) age (ref=last)/param=ref;
model r/n=table age/scale=none aggregate;
run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        2     0.3649      0.1824        0.8332
Pearson         2     0.3643      0.1822        0.8335

Analysis of Maximum Likelihood Estimates

                          Standard
Parameter   DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept    1    0.5730    0.0830      47.6906       <.0001
table 1      1   -1.1599    0.0946     150.4801       <.0001
age 1        1    0.6431    0.1315      23.9054       <.0001
age 2        1    0.2672    0.1038       6.6270       0.0100

Odds Ratio Estimates

                   Point           95% Wald
Effect          Estimate     Confidence Limits
table 1 vs 2       0.314      0.260      0.377
age 1 vs 3         1.902      1.470      2.462
age 2 vs 3         1.306      1.066      1.601

If we assume that the variable "age" has an equal interval scale, then the continuation ratio model is implemented as:
proc logistic order=data;
class table (ref=last)/param=ref;
model r/n=table|agel/scale=none aggregate;   /* agel = age as a numeric linear score */
run;
Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        2     0.6614      0.3307        0.7184
Pearson         2     0.6609      0.3304        0.7186

Analysis of Maximum Likelihood Estimates

                              Standard
Parameter      DF  Estimate      Error   Chi-Square   Pr > ChiSq
Intercept       1    1.5117     0.2060      53.8560       <.0001
table 1         1   -1.1934     0.2959      16.2681       <.0001
agel            1   -0.3200     0.0865      13.6877       0.0002
agel*table 1    1    0.0155     0.1280       0.0146       0.9038

Since the interaction term is not significant, a model that fits the linear effect of
age and the effects of the subtables gives the following results:
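The corresponding SAS software statements are not shown in the original; a minimal sketch, assuming the reduced model simply drops the agel by table interaction, would be:

proc logistic order=data;
class table (ref=last)/param=ref;
model r/n=table agel/scale=none aggregate;
run;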
Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance        3     0.6760      0.2253        0.8788
Pearson         3     0.6741      0.2247        0.8793

Analysis of Maximum Likelihood Estimates

                          Standard
Parameter   DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept    1    1.4957    0.1574      90.2701       <.0001
table 1      1   -1.1595    0.0945     150.4096       <.0001
agel         1   -0.3129    0.0637      24.1118       <.0001

Odds Ratio Estimates

                   Point           95% Wald
Effect          Estimate     Confidence Limits
table 1 vs 2       0.314      0.261      0.377
agel               0.731      0.645      0.829

The advantage of the continuation ratio model is that it is more readily interpretable
than the baseline category model with the same number of parameters. Further, the
link function can readily be changed to the complementary log-log or probit.

9.4.6 Mean Response Models

Sometimes we wish to obtain regression-type models because of the ease of
interpretation of their parameters. In contingency table analysis employing an
ordinal response variable, this can be accomplished by the use of the mean response
model, where the response variable is assigned scores (usually integer scores) and the
mean of these scores is then used as a response function.
For example, consider again the Breast Self-Examination Survey data in Table
9.13, which relate to the frequency of breast self-examination in individual age
groups. In these data, both the explanatory and the response variables are ordinal.
Let us assign scores {u_i}, i = 1,2,3 and {v_j}, j = 1,2,3 to the two variables (age
and examination), respectively. We discuss the possibility of other forms of scores
in the next chapter. Thus within each level of the factor variable, the conditional
mean of the response variable is

    M_i = (Sum_j v_j m_ij) / n_i+,    i = 1,2,3        (9.34)

where n_i+ is the marginal observed total and m_ij is the expected count for cell (i,j).
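As a quick numerical illustration (our own computation from the observed breast data), the observed conditional mean for the first age group, using integer scores v_j = (1, 2, 3) and the counts (91, 90, 51) with n_1+ = 232, is

    M_1 = (1 x 91 + 2 x 90 + 3 x 51)/232 = 424/232 = 1.8276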


The regression model therefore becomes (Agresti, 1984)

    M_i = mu + beta(u_i - u-bar)        (9.35)

where mu is the average of the conditional means and beta is the change in the conditional mean for a unit change in X (the explanatory variable). Since only two
parameters (mu, beta) are being estimated in this case, the model would therefore be
based on (I - 2) degrees of freedom. Hence, I >= 3 is required to obtain an unsaturated model. The parameter estimates can only be obtained by using a weighted
least squares (WLS) approach (Bhapkar, 1966; Grizzle et al., 1969; Williams & Grizzle, 1972). The WLS solution is applicable only when the explanatory variable(s)
are categorical.
We fit the model described in (9.35) to the Breast Self-Examination Survey data
with frequency of self-examination as the response variable and age as a continuous
explanatory variable. Using PROC CATMOD in SAS software, we obtain the
output below, with mu-hat = 1.6983 (a.s.e. = 0.0690) and beta-hat = 0.1472 (a.s.e. = 0.0295), for
a WLS solution based on integer scores (1,2,3). The predicted increase in the mean
response is 0.1472 categories for each change of age group (from <45 to 45-59,
and from 45-59 to 60+). The hypothesis H0: beta = 0 is tested by a Wald value of 24.83
on 1 d.f. and is not tenable. The residual G2 of 0.43 on 1 d.f. indicates that the
model adequately fits the data and that there is a very strong association between
the two classificatory variables. The predicted mean responses are {1.8276, 2.0099,
2.1315}. It does appear that the frequencies of breast examination are different in
the three age group categories, and that individuals who are 60+ exhibit more
frequent breast self-examination.
DATA BREAST2;
INPUT AGE EXAM $ COUNT @@;
AGEL=AGE;
DATALINES;
1 MONTH 91 1 OCCASS 90 1 NEVER 51 2 MONTH 150 2 OCCASS 200 2 NEVER 155
3 MONTH 109 3 OCCASS 198 3 NEVER 172
;
PROC PRINT; RUN;
PROC CATMOD ORDER=DATA; WEIGHT COUNT; DIRECT AGEL;
RESPONSE 1 2 3; MODEL EXAM=AGEL/FREQ PROB; RUN;

Analysis of Variance

Source        DF    Chi-Square    Pr > ChiSq
Intercept      1        606.00        <.0001
AGEL           1         24.83        <.0001
Residual       1          0.43        0.5099

Analysis of Weighted Least Squares Estimates

                                       Standard
Effect       Parameter    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept        1          1.6983      0.0690        606.00        <.0001
AGEL             2          0.1472      0.0295         24.83        <.0001

9.4.7 Multidimensional Tables

For multidimensional tables, we give in Table 9.15 the results of a study examining
the relationship between the drinking habits of subjects living in group quarters,
their location in New York, and the length of time they have been there (Upton, 1978).



Number of years                      Drinking Habits
in quarters      Location       Light    Moderate    Heavy
0                Bowery           25        21         26
1-4              Bowery           21        18         23
5+               Bowery           20        19         21
0                Camp             29        27         38
1-4              Camp             16        13         24
5+               Camp              8        11         30
0                Park Slope       44        19          9
1-4              Park Slope       18         9          4
5+               Park Slope        6         8          3

Table 9.15: Drinking habits of subjects living in New York neighborhoods


Below is the SAS software output for our preliminary analysis. Notice that because
the response variable is read in as a character variable, we specify the scores assigned
to each category in the response statement line. The saturated model fitted indicates
that the interaction term between length of time spent and location is not significant
(p = 0.4327). Hence, a reduced model can now be fitted to the data.
data mean2; input years $ loc $ habit $ count @@;
datalines;
0 bowery light 25 0 bowery mod 21 0 bowery heavy 26
1-4 bowery light 21 1-4 bowery mod 18 1-4 bowery heavy 23
5+ bowery light 20 5+ bowery mod 19 5+ bowery heavy 21
0 camp light 29 0 camp mod 27 0 camp heavy 38
1-4 camp light 16 1-4 camp mod 13 1-4 camp heavy 24
5+ camp light 8 5+ camp mod 11 5+ camp heavy 30
0 park light 44 0 park mod 19 0 park heavy 9
1-4 park light 18 1-4 park mod 9 1-4 park heavy 4
5+ park light 6 5+ park mod 8 5+ park heavy 3
;
proc catmod order=data; weight count;
response 1 2 3; model habit=years|loc/wls noiter; run;
Analysis of Variance

Source        DF    Chi-Square    Pr > ChiSq
Intercept      1       2633.35        <.0001
years          2          5.96        0.0508
loc            2         38.36        <.0001
years*loc      4          3.81        0.4327
Residual       0

The reduced model fits the data with Q_W = 3.81 on 4 d.f. (p = 0.4327). The
main effect of location is highly significant, while the effect of length of time as
measured in years is borderline at a specified 0.05 alpha level. We nonetheless decide
to keep it in the model. In order to make comparisons among the locations and
lengths of time, we constructed the following contrasts, which are self-explanatory.
From the analysis below, it is indicative that those with 0 years (newcomers) are
significantly different from the over 5 years residents, but the newcomers are not
in any way different from those with (1-4) years length of stay. However, for the
location, all three pairs are significantly different, with the major difference
between those residing in the Camp neighborhood and those at Park Slope.


set mean2;
proc catmod order=data;
weight count; response 1 2 3 ; model habit=years loc;
contrast '0 versus (1-4)' years 1 -1;
contrast '0 versus 5+' years 2 1;
contrast '(1-4) versus 5+' years 1 2;
contrast 'Bowery versus camp' loc 1 -1;
contrast 'Bowery versus Park' loc 2 1; contrast 'Camp versus Park' loc 1 2; run;
Analysis of Contrasts

Contrast                 DF    Chi-Square    Pr > ChiSq
0 versus (1-4)            1          0.30        0.5844
0 versus 5+               1          5.62        0.0177
(1-4) versus 5+           1          2.79        0.0946
Bowery versus Camp        1          5.67        0.0172
Bowery versus Park        1         21.19        <.0001
Camp versus Park          1         48.63        <.0001

For instance, the contrast "0 versus 5+" is constructed in CATMOD as "years 2
1." This is so because the contrast is beta_1 - beta_3. But beta_3 = -(beta_1 + beta_2) because of
the usual sum-to-zero constraints employed in CATMOD. Substituting this in the
contrast gives the stated contrast coefficients. Here, we have assumed that beta_1, beta_2, beta_3
refer to the parameters for year levels 0, 1-4, and 5+, respectively.
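In detail (our own algebra, following the sum-to-zero constraint), the coefficients "2 1" arise from

    beta_1 - beta_3 = beta_1 - [-(beta_1 + beta_2)] = 2*beta_1 + beta_2

so the contrast coefficients on (beta_1, beta_2) are (2, 1). Similarly, "(1-4) versus 5+" gives beta_2 - beta_3 = beta_1 + 2*beta_2, that is, "years 1 2."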
It should be noted here that the mean response model is most appropriate
for making inferences about an underlying continuous set of variables, where the
categories of the variables reflect some underlying continuous characteristics.

9.4.8 Further Analysis of the Data in Table 9.15

So far, we applied the multicategory logit models to the pneumoconiosis data as well
as to a two-way contingency table with an ordered response variable (the breast self-examination frequency data). In order to further explain how to implement all the
above multicategory logit models for other multidimensional contingency tables, we
now show the application of these models to the data in Table 9.15, which has three
response categories and two factor or explanatory variables, each at three levels.
We now fit the proportional odds, the adjacent category, and continuation ratio
odds models to this data set. First, we create dummy variables LOC1, LOC2
(Location 3 is the reference), and YY1 and YY2 to represent dummy variables for
the year. Note that the response category "heavy" is coded to have the highest value
on the drinking scale. The following SAS software statements and selected output
for the proportional odds model indicate that the proportional odds assumption is
satisfied based on the score test (p = 0.5089). The model also fits the data with
a deviance value of 8.6706 on 12 d.f. (p = 0.7308). Examination of the parameter
estimates and the corresponding Wald tests indicates that location 1 and location
2 are highly significant, as is year 1 of residence relative to year level 3.
Based on the estimated odds ratios below, we can conclude that a resident living
in Bowery is 0.391 times as likely as one living in Park Slope to have light drinking
habits. Put another way, those living in Bowery are 1/0.391 = 2.554 times as
likely (about 155% more likely) to have heavy drinking habits than those living in Park Slope. Similarly,
those who have recently moved into this neighborhood (less than 1 year) are 59%
more likely than those who have lived in the neighborhood for at least 5 years to
have light rather than heavy drinking habits.
On the other hand, those residents who are living in Camp are 1/0.250 = 4 times
more likely to have heavy drinking habits rather than moderate drinking habits than
those living in Park Slope. Since the effect of year of living is not significant in this
case, we decide not to give any interpretation to the corresponding odds ratio. We
can just take differences if interest centers on comparing those with moderate and
light habits, as explained earlier.
data tab917;
if lo eq 'bowery' then loc=1;
else if lo eq 'camp' then loc=2;
else loc=3;
loc1=loc eq 1;
loc2=loc eq 2;
yy1=year eq 1;
yy2=year eq 2;
if habit eq 'light' then habit=1;
else if habit eq 'mod' then habit=2;
else habit=3;
proc logistic descending;
freq count; model habit=loc1 loc2 yy1 yy2/scale=n aggregate; run;
Score Test for the Proportional Odds Assumption

Chi-Square      DF      Pr > ChiSq
3.3005           4          0.5089

Deviance and Pearson Goodness-of-Fit Statistics

Criterion      DF      Value    Value/DF    Pr > ChiSq
Deviance       12     8.6706      0.7226        0.7308
Pearson        12     8.9117      0.7426        0.7105

Number of unique profiles: 9


Analysis of Maximum Likelihood Estimates

Parameter

DF

Intercept
Intercept2
loci
Ioc2
yyl

1
1
1
1
1
1

Estimate
-0 . 0228
1 . 2402
-0.9378
-1.3849
0 . 4662
0.3491

Standard
Error
0.2460
0.2520
0.2276
0.2276
0.2105
0.2285

Chi-Square
0 . 0086
24.2126
16.9797
37.0231
4 . 9036
2.3333

Pr > ChiSq
0 . 9260
< . 0001
<.0001
<.0001
0 . 0268
0.1266

Odds Ratio Estimates

Effect      Odds Ratio
loc1           0.391
loc2           0.250
yy1            1.594
yy2            1.418
We next implement the adjacent category model on this set of data to produce
the following selected SAS software output. Here again, this model fits with G2 = 8.06
on 12 d.f. (p = 0.7805). The Wald tests also indicate that only LOC1, LOC2, and
YY1 are significant (p < 0.05). Again, for those living in Bowery versus those
in Park Slope, the odds of having a heavy drinking habit rather than a moderate
drinking habit are exp(0.6472) = 1.910 times higher. For those with less than 1 year
versus those with more than 5 years, the odds of a heavy drinking habit to a moderate
drinking habit are exp(-0.2921) = 0.747. Again, for those living in Camp versus
those living at Park Slope, the odds of having a moderate drinking habit rather
than a light drinking habit are exp(0.9237) = 2.519 times higher. These conclusions are
very similar to the earlier conclusions under the proportional odds model.


proc catmod;
weight count;
direct loc1 loc2 yy1 yy2;
response alogit;
model habit= _response_ loc1 loc2 yy1 yy2/wls; run;
The CATMOD Procedure

Analysis of Variance

Source          DF    Chi-Square    Pr > ChiSq
Intercept        1          6.38        0.0115
_RESPONSE_       1          2.49        0.1146
loc1             1         16.55        <.0001
loc2             1         33.76        <.0001
yy1              1          4.25        0.0392
yy2              1          2.13        0.1448
Residual        12          8.06        0.7805
Analysis of Weighted Least Squares Estimates

                                        Standard
Effect        Parameter    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept         1         -0.4364      0.1728         6.38        0.0115
_RESPONSE_        2         -0.1591      0.1009         2.49        0.1146
loc1              3          0.6472      0.1591        16.55        <.0001
loc2              4          0.9237      0.1590        33.76        <.0001
yy1               5         -0.2921      0.1417         4.25        0.0392
yy2               6         -0.2220      0.1523         2.13        0.1448

To implement the continuation ratio model, we first construct the tables Light versus Moderate and (Light and Moderate) versus Heavy. The SAS software output
below gives the fit of the continuation ratio model to the data in Table 9.15. The
LOGISTIC model for each of the tables gives deviances of 2.0894 and 3.2870, respectively, on 4 d.f. The continuation ratio model therefore gives a deviance value
of 5.3764 (2.0894 + 3.2870) on 8 d.f. The model fits the data. Instead of fitting
individual logistic models for each table, we can fit a single model to the data that
incorporates the interaction terms of tables with LOC1, LOC2, YY1, and YY2.
This model produces the 8 d.f. deviance in the result below. However, this model
indicates that the interaction terms involving LOC1 and LOC2 with tables are not
significant (Wald tests not shown). Deleting these terms from the model leads to a
more parsimonious continuation ratio model with a deviance of 5.5163 on 10 d.f.
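The single model with all the table interaction terms is not shown in the program below; a sketch of the corresponding GENMOD statements (our own reconstruction) would be:

proc genmod;
class table;
model r/tot=loc1 loc2 yy1 yy2 table loc1*table loc2*table
            yy1*table yy2*table/dist=b type3;   /* full interaction model, 8 d.f. */
run;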
data cont;
do year=1 to 3;
do loc=1 to 3;
input r s table @@;
yy1=year eq 1;
yy2=year eq 2;
loc1=loc eq 1;
loc2=loc eq 2;
tot=r+s; output; end; end;
datalines;
25 21 1 21 18 1 20 19 1 29 27 1 16 13 1 8 11 1
44 19 1 18 9 1 6 8 1 46 26 2 39 23 2 39 21 2
56 38 2 29 24 2 19 30 2 63 9 2 27 4 2 14 3 2
;
proc genmod;
where table=1;
model r/tot=loc1 loc2 yy1 yy2/dist=b type3;
run;
proc genmod;
class table;
model r/tot=loc1 loc2 table|yy1 table|yy2/dist=b type3;
run;

year  loc    r    s   table  yy1  yy2  loc1  loc2  tot
  1    1    25   21     1     1    0     1     0    46
  1    2    21   18     1     1    0     0     1    39
  1    3    20   19     1     1    0     0     0    39
  2    1    29   27     1     0    1     1     0    56
  2    2    16   13     1     0    1     0     1    29
  2    3     8   11     1     0    1     0     0    19
  3    1    44   19     1     0    0     1     0    63
  3    2    18    9     1     0    0     0     1    27
  3    3     6    8     1     0    0     0     0    14
  1    1    46   26     2     1    0     1     0    72
  1    2    39   23     2     1    0     0     1    62
  1    3    39   21     2     1    0     0     0    60
  2    1    56   38     2     0    1     1     0    94
  2    2    29   24     2     0    1     0     1    53
  2    3    19   30     2     0    1     0     0    49
  3    1    63    9     2     0    0     1     0    72
  3    2    27    4     2     0    0     0     1    31
  3    3    14    3     2     0    0     0     0    17

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance (Table 1)      4     2.0894      0.5223
Deviance (Table 2)      4     3.2870      0.8217

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance                8     5.3764      0.6720
Pearson Chi-Square      8     5.3626      0.6703

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance               10     5.5163      0.5516
Scaled Deviance        10     5.5163      0.5516
Pearson Chi-Square     10     5.5053      0.5505

Analysis Of Parameter Estimates

                             Standard
Parameter     DF  Estimate      Error   Chi-Square   Pr > ChiSq
Intercept      1    1.5522     0.3036       26.13        <.0001
loc1           1    0.4072     0.1847        4.86        0.0275
loc2           1    0.3194     0.2002        2.54        0.1106
table 1        1   -1.2432     0.3391       13.44        0.0002
yy1            1   -1.2296     0.3097       15.77        <.0001
yy1*table 1    1    0.7991     0.4125        3.75        0.0527
yy2            1   -1.7107     0.3052       31.42        <.0001
yy2*table 1    1    1.1318     0.4175        7.35        0.0067
Scale          0    1.0000     0.0000

LR Statistics For Type 3 Analysis

Source        DF    Chi-Square    Pr > ChiSq
loc1           1          4.86        0.0275
loc2           1          2.55        0.1103
table          1         14.46        0.0001
yy1            1         16.71        <.0001
yy1*table      1          3.85        0.0498
yy2            1         32.52        <.0001
yy2*table      1          7.59        0.0059


The most obviously significant parameter here is that involving the interaction between YY2 and table. For the continuation ratio model to hold, the effects of
YY1 and YY2 are very important. Based on the significant YY2 by table interaction, the odds that an individual will go from a light or moderate drinking habit to a heavy
drinking habit are exp(-1.7107) = 0.181 times as large for those who have lived 1-4 years
as for those who have lived at least 5 years, after adjusting for neighborhood residence. In other words, the 5+ years residents have odds that are 5.5 times
higher than those who have lived 1-4 years of going from moderate or light drinking
to a heavy drinking habit. Similarly, the odds are exp(-1.7107 + 1.1318) = 0.56 times
as great for those who have lived 1-4 years as for those who have lived 5+ years of
going from a light to a moderate drinking habit. That is, the 5+ years residents have
odds that are 1.78 times higher than those who have lived 1-4 years of going from
light to moderate drinking habits, again having controlled for the neighborhood
residence.
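To verify these figures (our own arithmetic): exp(-1.7107) = 0.181 and 1/0.181 = 5.5, while exp(-1.7107 + 1.1318) = exp(-0.5789) = 0.56 and 1/0.56 = 1.78.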

9.5 Exercises

1. For the data in exercise 5 of chapter 6, we plan to construct a log-linear model
that corresponds to a logit model in which intercourse is the response. Based
on the odds ratios obtained in that exercise, which log-linear model seems
appropriate?
2. For a three-way table with binary response C, give the equivalent log-linear
and logit models for which:
(a) C is jointly independent of A and B.
(b) C is conditionally independent of B.
(c) There is no interaction between A and B in their effects on C.
3. For a four-way table with binary response D, give the equivalent log-linear
and logit models for which:
(a) D is jointly independent of A, B, and C.
(b) D is jointly independent of B and C, given A.
(c) There are main effects of A and B on D, but D is conditionally independent
of C, given A and B.
(d) There is interaction between A and B in their effects on D, and C has
main effects.
4. Refer to exercise 7 in chapter 8. The limiting distribution for the binomial
is the Poisson. Reanalyze these data by using a Poisson regression model.
Discuss the differences from your chosen model in that exercise.
5. Radelet (1981) gives data on the relationship between race and the imposition
of the death penalty. The data are given in the table below. Analyze the data
using logit models by considering death penalty as a response variable.
6. For the data in Table 4.16, fit a suitable mean response model. Interpret your
results.



Defendant's    Victim's       Death penalty
race           race           Yes       No
Black          Black            6       97
               White           11       52
White          Black            0        9
               White           19      132

7. The following data are taken from the 1984 General Social Survey of the National Data Program in the United States, as reproduced by Agresti (1990).

                                    Job satisfaction
                   Very           Little        Moderately     Very
Income (US$)    dissatisfied   dissatisfied     satisfied    satisfied
<6000               20             24               80           82
6000-15,000         22             38              104          125
15,000-25,000       13             28               81          113
>25,000              7             18               54           92

Table 9.16: Cross-classification of job satisfaction by income


Treating job satisfaction as the response variable, fit a cumulative logit model
to the data that gives a good fit and interpret the estimated effect.

8. The data in the table below present (Hedlund, 1978) the relationship between
an ordinal variable, political ideology, and a nominal variable, party affiliation,
for a sample of voters in the 1976 presidential primary election in Wisconsin.

                               Political ideology
Party affiliation    Liberal    Moderate    Conservative
Democrat               143        156           100
Independent            119        210           141
Republican              15         72           127

Analyze the above data using cumulative logits, treating political ideology as
the response. Test the adequacy of the model fit and interpret parameter
estimates. What can you deduce from your analysis as being the influence of
party affiliation on ideology?
9. The data below relate to patients undergoing chemotherapy who were categorized by the severity of nausea and by whether or not they received cisplatinum
(treatment) (Farewell, 1982).

                              Response
Treatment    None    Mild    Moderate    Severe    Total
No             43      39        50         29       161
Yes             7       7        30         14        58


Analyze the above data.


10. The data below are from Agresti (1984) and relate to attitude toward abortion
and schooling of a sample of people.

                               Attitude
Education        Disapprove    Middle    Approve
< High school        209         101        237
High school          151         126        426
> High school         16          21        138

Fit a proportional odds model to the data and draw your conclusions.
11. The data below are from Christensen (1990) and relate to 1237 men between the
ages of 40 and 59 (who did not develop coronary heart attack) taken from a
study conducted in Massachusetts. The men were cross-classified according
to their serum cholesterol and systolic blood pressure.

Cholesterol            Blood pressure (in mm Hg)
(in mg/100 cc)    <127    127-148    147-166    167+
<200               117      121         47       22
200-219             85       98         43       20
220-259            119      209         68       43
>260                67       99         46       33

Fit the adjacent category, proportional odds, and continuation ratio
models to the above data. Based on your analyses, which of these models
would you recommend and why?
12. The data below relate to 3-year survival of breast cancer patients according
to nuclear grade and diagnostic center (Whittaker, 1990, p. 220).

                   Malignant              Benign
Center          Died    Survived      Died    Survived
Boston           35        59          47       112
Glamorgan        42        77          26        76

Fit an appropriate logit model to these data. What would be the equivalent
log-linear model?

13. The table below relates opinions on whether one agrees or not that grocery
shopping is tiring to the availability of a car, obtained in a survey in Oxford,
England (Lindsey, 1995).
Opinions from the Oxford Shopping Survey

                          Grocery shopping is tiring
Car                      Tend to       In       Tend to
available    Disagree    disagree    between     agree     Agree
No              55          11           7         20       100
Sometimes      101          16          18         25       103
Always          91          17          23         16        77



Fit the proportional odds, adjacent category, and continuation ratio models
to the above data. Which model is the most parsimonious? Interpret your
choice model.

Chapter 10

Models in Ordinal Contingency Tables

10.1 Introduction

We shall in this chapter explore the special analyses of the general I x J contingency
table when one or both classificatory variables are ordered. For this class of tables,
the usual independence or quasi-independence analyses may not be adequate
for a proper explanation of the variation in the data. As an example, the data
below in Table 10.1 (Christensen, 1990) relate to a sample of 1237 men between
the ages of 40 and 59 (who did not develop coronary heart attack) taken from
the Framingham longitudinal study. The men were cross-classified according to
their serum cholesterol and systolic blood pressure. In this example, I = J = 4,
though a genuine case with I not equal to J, the well-analyzed 4 x 6 midtown
Manhattan mental health by parents' socioeconomic status data (Goodman, 1979a),
is also presented in Table 10.2.
Cholesterol                Blood pressure (in mm Hg)
(in mg/100 cc)    <127    127-148    147-166    167+
<200               117      121         47       22
200-219             85       98         43       20
220-259            119      209         68       43
>260                67       99         46       33

Table 10.1: Classification by SCL and BP


For the data in Table 10.1 we see that both row and column variables can be
assumed to be ordered, leading to a 4 x 4 ordered table. In Table 10.2, we can also
assume that the response variable "mental health status" has some intrinsic order
in its categories. In Table 10.3, too, both rows and columns can be assumed
to have an ordinal ordering of their categories.
Let f_ij be the observed frequency in the (i,j)-th cell of a general I x J table with at
least one variable ordered, and also let m_ij be the corresponding expected frequency
under some model. Then the saturated log-linear model formulation for this table
becomes:


Mental health                    Parents' socioeconomic status
status                       A      B      C      D      E      F
Well                        64     57     57     72     36     21
Mild symptom formation      94     94    105    141     97     71
Moderate symptom formation  58     54     65     77     54     54
Impaired                    46     40     60     94     78     71

Table 10.2: Cross-classification of mental health status by socioeconomic status


    ln(m_ij) = mu + lambda_i^X + lambda_j^Y + lambda_ij^XY        (10.1)

for i = 1, 2, ..., I and j = 1, 2, ..., J, where X and Y relate to the row and column
variables, respectively. Further, we assume the usual identifiability constraints on
the parameters as discussed in chapter 6. The corresponding independence model,
which Goodman called the null or (O) model, has the formulation

    ln(m_ij) = mu + lambda_i^X + lambda_j^Y        (10.2)

which can be written in the multiplicative form as m_ij = alpha_i beta_j.

Periodontal           Calcium intake level
condition         1      2      3      4
A                 5      3     10     11
B                 4      5      8      6
C                26     11      3      6
D                23     11      1      2

Table 10.3: Cross-classification of 135 women according to their periodontal condition and calcium intake level (Goodman, 1979a)
The model of independence is based on (I - 1)(J - 1) degrees of freedom. For
the data in Tables 10.1 and 10.3, the model of independence yields G2 = 20.38
and 46.817 on 9 degrees of freedom, respectively, which clearly indicates that the
model of independence fits neither data set. We will now explore the possibility
of exploiting the fact that one or both of the classificatory variables are ordered in
modeling the tables by assigning known scores to the categories of the variables.
We also note here that the inadequacy of the model of independence suggests that
there is a significant interaction or association term between the row variable X and
the column variable Y.
Suppose we assign known scores x_i and y_j pertaining to the row and column categories, respectively, and let us now introduce these known scores into the
model in (10.1) (that is, rewriting the interaction term) to give a revised model of
the form:

    ln(m_ij) = mu + lambda_i^X + lambda_j^Y + phi x_i y_j        (10.3)

where phi is a parameter describing the intrinsic association between variables X and
Y. The model in (10.3) is known as the linear-by-linear association model. The
scores x_i and y_j are either specified or known in advance. Possible values of the
scores are:

(i) Integer scores: that is, x_i = i and y_j = j.

(ii) Centered scores: where x_i = i - (I+1)/2 and y_j = j - (J+1)/2, which centers the scores.

For a 4 x 4 table, the integer scores assume the values (i, j) = 1, 2, 3, 4, while the
second set of scores assumes the values (x_i, y_j) = -1.5, -0.5, 0.5, 1.5.
The model formulation in (10.3) assumes that the categories are equally spaced
(that is, interval scaled) and that the lambda_ij^XY interaction in (10.1) has been modeled
or smoothed (or restricted) by the term phi x_i y_j.
If we consider the 2 x 2 subtable formed from adjacent rows (i.e., rows i and
i+1) and adjacent columns (i.e., columns j and j+1) of the general I x J
table, then let theta_ij denote the corresponding odds ratio for i = 1, 2, ..., (I-1) and
j = 1, 2, ..., (J-1) based on the expected cell frequencies m_ij. Then,

    theta_ij = (m_ij m_{i+1,j+1}) / (m_{i,j+1} m_{i+1,j})        (10.4)

and there are (I-1)(J-1) such odds ratios in such a table.


If we take the natural logarithm of theta_ij above and define the log odds ratio as
Phi_ij = ln theta_ij, then from (10.4) we have

    Phi_ij = ln m_ij + ln m_{i+1,j+1} - ln m_{i,j+1} - ln m_{i+1,j}        (10.5)

Rewriting (10.5), changing the subscripts appropriately using (10.3), we have:

    Phi_ij = phi(x_{i+1} - x_i)(y_{j+1} - y_j) = phi Delta_i^X Delta_j^Y        (10.6)

where Delta_i^X = (x_{i+1} - x_i) is the distance, under the postulated set of scores, between
the i-th and the (i+1)-th categories of the row variable X, and where Delta_j^Y is similarly
defined.
We see that the log odds ratio in the 2 x 2 subtable is a function of the intrinsic
association phi, the distance between the row categories Delta_i^X, and the distance between
the column categories Delta_j^Y. Obviously, if Delta_i^X = Delta_j^Y = 1, then Phi_ij = phi. That is,
Phi_ij = phi if and only if the adjacent row and column categories are one unit apart.

10.1.1 Properties of phi

1. phi is unaffected by a location shift.

For instance, if we change x_i to x_i* = x_i + a and y_j to y_j* = y_j + b (where a
and b are constants), while these location shifts will produce changes in the
main effect parameters in (10.3), they will not change the value of phi, since
in this case

    Phi_ij = phi(x*_{i+1} - x*_i)(y*_{j+1} - y*_j)
           = phi{x_{i+1} + a - (x_i + a)}{y_{j+1} + b - (y_j + b)}

The above is equivalent to (10.6), which indicates that phi is unchanged.


2. A scale change (or unit point restriction) of the known scores x_i and y_j,
however, does have an effect on phi, in addition to producing changes in the
main effect parameters. To show this, consider x_i* = a x_i and y_j* = b y_j. If we
let phi* be the new intrinsic association parameter, then we have

    Phi_ij = phi*(a x_{i+1} - a x_i)(b y_{j+1} - b y_j)
           = phi* ab (x_{i+1} - x_i)(y_{j+1} - y_j)
           = phi* ab Delta_i^X Delta_j^Y

Comparing this with (10.6), we have phi* ab = phi, and consequently we see that
in this case phi* = phi/(ab). The new intrinsic association parameter is thus a
fraction of the original phi, being phi/(ab).
Under integer scoring, Delta_i^X = Delta_j^Y = 1; hence, we have, using (10.6),

    Phi_ij = phi        (10.7a)
    theta_ij = e^phi = theta        (10.7b)

The model described above has been called the uniform association (U) model by
Goodman (1979). Either phi or theta can be used as a measure of association between
the X and Y variables if the model holds. Specifically, we can transform the measure
to the well-known [-1, +1] scale by taking Q = (theta - 1)/(theta + 1), so that there is no difficulty at
all in summarizing the association.


The multiplicative form of the model is given by
rhij = ai/3j613
The (U) model has one parameter (namely, 0) more than the model of independence;
hence, it will be based on (/ 1)(J 1) 1 = (U I J) degrees of freedom.
The model of independence, which Goodman described as the null or O model, of
course is a special case of the linear-by-linear association model and is obtained
when 0 = 0. For our data in Table 10.1 the uniform association (U) model gives
G2 = 7.4291 and Pearson's X2 = 7.4628 on 8 d.f., which indicates that the model is
very satisfactory and that the integer scoring system appears reasonable. Here, the
MLE estimate of 4> = 0.1044, which gives 0 = e* = 1.110. Thus when the U model
holds, then each 2 x 2 subtable formed from adjacent rows and adjacent columns has
an odds ratio of this magnitude, which gives a Yule's Q of 0.0521, which represents
a modest but significant positive association between the two variables.
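To verify these figures (our own arithmetic): theta = e^{0.1044} = 1.110, and

    Q = (1.110 - 1)/(1.110 + 1) = 0.110/2.110 = 0.0521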
Model (U) is implemented in SAS software for the data in Table 10.1 with the
following program and partial output.
data tab101;
do chol=1 to 4;
do bp=1 to 4;
input count @@;
u=chol*bp; /* creates product of integer scores */
output;
end; end;
datalines;
117 121 47 22 85 98 43 20 119 209 68 43 67 99 46 33
;
proc genmod; class chol bp;
model count=chol bp u/dist=poi link=log type3;
run;

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance                8     7.4291      0.9286
Pearson Chi-Square      8     7.4628      0.9328

Analysis Of Parameter Estimates

                            Standard    Wald 95% Confidence
Parameter   DF   Estimate      Error          Limits           Chi-Square   Pr > ChiSq
Intercept    1     1.7591     0.4120     0.9515     2.5667        18.23        <.0001
u            1     0.1044     0.0293     0.0470     0.1617        12.73        0.0004

In general, the odds ratio between arbitrary rows i and i' (i' > i) and columns
j and j' (j' > j) is given by

    theta_(i,i')(j,j') = theta^(rc)        (10.8)

where r = i' - i and c = j' - j are, respectively, the differences between the levels
of the X and Y variables. That is, the expression in (10.8) can be written in terms
of scores as

    theta_(i,i')(j,j') = exp{(x_i' - x_i)(y_j' - y_j) phi}

If r = c = 1, we have the earlier result above. For instance, the odds ratio between
the third and first categories of Y and the first and second categories of X is given
by theta = e^{2(0.1044)} = 1.232, since r = 1 and c = 2.

            Y=1        Y=2        Y=3       Y=4
X=1      112.034    130.954     43.088    20.930
X=2       81.276    105.450     38.512    20.764
X=3      130.141    187.416     75.974    45.467
X=4       64.549    103.180     46.427    30.840

Table 10.4: Expected values under model U for the data in Table 10.1
Thus,

    theta_(1,1)(2,3) = [(112.034)(38.512)] / [(81.276)(43.088)] = 1.232

while for r = 2 and c = 2 we have theta = e^{4(0.1044)} = 1.518. We can demonstrate this
again from the table of expected values as:

    [(112.034)(75.974)] / [(130.141)(43.088)] = 1.518    and    [(81.276)(46.427)] / [(64.549)(38.512)] = 1.518
If the model of independence holds, then theta = 1, and this implies that G2(O) - G2(U)
has a chi-square distribution with 1 d.f., which tests independence conditional on the (U)
model holding true.
Usually, the (U) association model fits the data of interest, but in case
it does not, we explore below some other association models that can be
used to further examine the variability in the table and to test whether the (U)
model is sufficient for a proper explanation of the row-column association in the
data.

10.2 The Row Association (R) Model

If the column variable is ordinal, then suppose we assign scores {v_j} to these
column categories. With this setup, the row association model (R) can be defined
in terms of the odds ratios as

    theta_ij = theta_{i+}    for i = 1, 2, ..., I-1        (10.9)

The model has (I-1) more parameters, namely the theta_{i+}, than the (O) model, and is
therefore based on (I-1)(J-1) - (I-1) = (I-1)(J-2) degrees of freedom. The
model assumes that the row categories are not necessarily ordinal. The model can
be written in the log-linear form (Agresti, 1984) as:

    ln(m_ij) = mu + lambda_i^X + lambda_j^Y + tau_i(v_j - v-bar)        (10.10)

where v-bar = (Sum_j v_j)/J and Sum lambda_i^X = Sum lambda_j^Y = Sum tau_i = 0.


The model is equivalent to the model of independence when T; = 0 for all i.
The row association parameter {r^} is interpreted as being the deviation within a
particular row of In (rhij} from row independence of a known function of the ordinal
variable with slope n.
For arbitrary rows i and i' (i' > i} and columns j and j' (j1 > j } , we have the
log odds ratios for the model above as

TTL J' ' JTL >' t J'i \


/

rriij'mi'j )

= In (rhij} + In (rhi>j>} - [In (rhij'} + In (rh^j}]

= (TV - Ti}(i>j> -v}- (rv - r)(i/j - v)

= (TV -Ti}(vj< - V j ]
That is, the log odds ratio is proportional to the distance between the columns and
is always positive whenever (TV r^) > 0. If \Vj = j}, that is, integer scores, then
the log odds ratio is constant and equals (r^ T V ) for all ( J 1) pairs of adjacent
columns.
The row association model is naturally suited for the general / x J table having
nominal row variable and ordinal column variable since the model has the same form
(and produces the same G2} if rows of the table are permuted. That is, permuting
the row categories would not produce any changes in the value of G2. In other
words, the rows are permutation proof.
Goodman (1979a) has formulated an alternative form of the row association
model (R). His formulation is of the form:

    theta_ij = theta xi_{i+}        (10.11)

where theta_{i+} is equated to theta xi_{i+} and where

    Product over i of xi_{i+} = 1        (10.12)

Taking natural logarithms, (10.11) and (10.12) are equivalent on the additive
scale to

    ln theta_ij = psi + eta_{i+}        (10.13)

with

    Sum over i of eta_{i+} = 0        (10.14)

where psi = ln(theta) and eta_{i+} = ln(xi_{i+}). The above implicitly assumes that
xi_{I+} = 1 and eta_{I+} = 0 under the alternative last-category constraint.
10.2.1 Example 10.1: Analysis of Data in Table 10.3

We consider in the table below the 4 x 4 data relating to the periodontal condition and
calcium intake level of 135 women (Goodman, 1986a), described previously in Table
10.3. The SAS software program for implementing the row association model is
presented below, with partial output from PROC GENMOD.
data tab103;
do cond=1 to 4; do level=1 to 4;
input count @@;
nu=level; /* creates integer column scores */
mu=cond; /* creates integer row scores */
output; end; end;
datalines;
5 3 10 11 4 5 8 6 26 11 3 6 23 11 1 2
;
proc print;
proc genmod; class cond level;
model count=cond|nu level/dist=poi; run; /* fits (R) model */
Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance                6     9.8761      1.6460
Pearson Chi-Square      6     9.2897      1.5483
Analysis Of Parameter Estimates

                             Standard
Parameter     DF   Estimate     Error    Chi-Square   Pr > ChiSq
Intercept      1     3.4416    1.1800        8.51        0.0035
nu             1    -0.7634    0.3896        3.84        0.0501
nu*cond 1      1     1.2701    0.2812       20.40        <.0001
nu*cond 2      1     1.0824    0.2850       14.42        0.0001
nu*cond 3      1     0.3101    0.2543        1.49        0.2227

In order to find the sum-to-zero parameter estimates in terms of the eta's, we note
the following from the above SAS software output:

    eta_1 - eta_4 = 1.2701    eta_2 - eta_4 = 1.0824    and    eta_3 - eta_4 = 0.3101

Adding the above and remembering that the eta_i sum to zero (so that the sum of the
three differences equals -4 eta_4, giving eta_4 = -2.6626/4 = -0.6657), we obtain the
equivalent sum-to-zero constraint parameters:

    eta_1 = 0.6045    eta_2 = 0.4167    eta_3 = -0.3556    and    eta_4 = -0.6657

We can compute estimated odds ratios from values of either the sum-to-zero parameter estimates or the GENMOD estimates (last parameter set-to-zero constraint).
In either case, for example, eta_3 - eta_1 = -0.9600, and the estimated odds ratio is
given by exp{-0.9600} = 0.383. This could have been obtained from the GENMOD
parameter estimates as exp{0.3101 - 1.2701} = exp{-0.9600} = 0.383, as before.

10.2.2 Estimating the Log Odds Ratios

We can estimate the odds ratios (or log of them) based on the above parameter
estimates. If we assume the row association model with integer scores for the
column variable, then we should expect constant odds ratios for adjacent column

422

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY TABLES

categories in the multiplicative model, which is reduced to constant difference of log


odds ratios for adjacent categories.
$(i,j)(i',i') = "Hi1 - "Hi

hence

0(i,j)(i',j') = exp(rV - ?))

(10.15)

To illustrate, 7)3 r)i = 0.960, with corresponding odds-ratio 0 = 0.383, which


means that the odds of calcium intake being 4 rather than 3, or 3 rather than 2 or 2
rather than 1 are 0.383 lower for periodontal condition C than condition A. These
odds ratios can be obtained from the table of expected cell frequencies presented
below under model (R):
Periodontal                    Calcium intake level
condition           1             2             3             4
A               5 (4.365)     3 (5.321)    10 (7.261)    11 (12.052)
B               4 (4.867)     5 (4.918)     8 (5.563)     6 (7.652)
C              26 (24.657)   11 (11.508)    3 (6.013)     6 (3.822)
D              23 (24.110)   11 (8.253)     1 (3.163)     2 (1.474)

Table 10.5: Table of observed and expected values under model R
(Values in parentheses denote expected values under the R association model.)
For instance, eta_3 - eta_1 = -0.960 can be obtained from the table of expected values
as:

    ln[(4.365 x 11.508)/(5.321 x 24.657)] = ln[(5.321 x 6.013)/(7.261 x 11.508)]
                                          = ln[(7.261 x 3.822)/(12.052 x 6.013)] = -0.960

Similarly,

    eta_2 - eta_1 = -0.1877    eta_3 - eta_2 = -0.7723    eta_4 - eta_3 = -0.3101    while    eta_4 - eta_1 = -1.2701

which are the log odds ratios for comparing A with B, B with C, C with D, and A
with D conditions, respectively.
A conditional test of independence for the data in Table 10.3, given that the row
association model holds true, is accomplished by testing the hypothesis on the
multiplicative (tau) or additive (eta) scales, respectively, as

    tau_1 = tau_2 = ... = tau_I = 1    or    eta_1 = eta_2 = ... = eta_I = 0

If the model holds, then the homogeneous row association model
corresponds to the model of independence as stated before, and the conditional
test is based on

    G2(O | R) = G2(O) - G2(R)

which will be based on (I-1)(J-1) - (I-1)(J-2) = (I-1) degrees of freedom.
For the data in Table 10.3, we present the results of this analysis below:

Model    d.f.      G2
O          9     46.887
R          6      9.876
O - R      3     37.011


The analysis above indicates that the conditional test gives a G2 value of 37.011 on
3 d.f., which shows very strong evidence of row association. It should be noted
here that the row association model can be employed not only in situations
where the row category is nominal, but also when that category is ordinal.

10.2.3 Effect of Changing Scoring

In the above analysis, we have assumed integer scoring for the column categories.
That is, we assume that v_j is in {1, 2, 3, 4}. We present below the analysis based on the
centered scoring scheme, where v_j = j - (J+1)/2 is in {-1.5, -0.5, 0.5, 1.5} in this case.
This is accomplished in SAS software with the following statements, together with a partial
output.
set tab103;
if level eq 1 then score=-1.5;
else if level eq 2 then score=-0.5;
else if level eq 3 then score=0.5; else score=1.5;
proc genmod; class cond level;
model count=cond|score level/dist=poi link=log type3; run;

Criteria For Assessing Goodness Of Fit

Criterion              DF      Value    Value/DF
Deviance                6     9.8761      1.6460
Pearson Chi-Square      6     9.2897      1.5483

Analysis Of Parameter Estimates

                                Standard       Wald 95%
Parameter       DF   Estimate      Error   Confidence Limits    Chi-Square   Pr > ChiSq
Intercept        1     1.5331     0.4049    0.7396    2.3266       14.34        0.0002
score            1    -0.7634     0.3896   -1.5270    0.0002        3.84        0.0501
score*cond 1     1     1.2701     0.2812    0.7189    1.8212       20.40        <.0001
score*cond 2     1     1.0824     0.2850    0.5238    1.6410       14.42        0.0001
score*cond 3     1     0.3101     0.2543   -0.1884    0.8086        1.49        0.2227
score*cond 4     0     0.0000     0.0000    0.0000    0.0000

The parameter changes in this case affect only the row and column parameter
estimates. The estimates of the eta's are not affected, and neither are the goodness-of-fit test statistics.

10.3 The Column Association (C) Model

The column association model (C) is defined in terms of the odds ratios as

    theta_ij = theta_{+j}    for j = 1, 2, ..., J-1        (10.16)

The model also has (J-1) more parameters, namely the theta_{+j}, than the (O) model, and
is based on (I-1)(J-1) - (J-1) = (I-2)(J-1) degrees of freedom. The
model can similarly be written in the log-linear model form as:

    ln(m_ij) = mu + lambda_i^X + lambda_j^Y + rho_j(u_i - u-bar)        (10.17)

where u-bar = (Sum_i u_i)/I and Sum lambda_i^X = Sum lambda_j^Y = Sum rho_j = 0.
The model is equivalent to the model of independence when rho_j = 0. The column
association parameter {rho_j} is interpreted as the deviation, within a particular
column, of ln(m_ij) from column independence as a linear function of the ordinal
variable with slope rho_j.


Again, for arbitrary rows i and i' (i' > i) and columns j and j' (j' > j), we can
show that the log odds ratio is

    Phi_(ij)(i'j') = (rho_j' - rho_j)(u_i' - u_i)

That is, the log odds ratio is again proportional to the distance between the rows
and is always positive whenever (rho_j' - rho_j) > 0. If {u_i = i}, that is, integer scores,
then the log odds ratio is constant and equals (rho_j' - rho_j) for all (I-1) pairs of
adjacent rows.
Like the row association model, the column association model is naturally suited
for the general I x J table having a nominal column variable and an ordinal row variable,
since the model has the same form (and produces the same G2) if the columns of the
table are permuted.
All the models considered so far, namely the O, U, R, and C association models,
are nested in the sense that O implies U and U implies both R and C. However, R
and C are not themselves nested. We can thus carry out conditional tests of the
form G2(O | U), G2(U | R), or G2(O | C). These can be used to conditionally test the
significance of the row or column parameters, or simply to decompose the baseline
G2 value for the model of independence into parts corresponding to contributions
from the factors to the total.

10.4 The R+C Association Model

We next consider a generalization of the row and column association models that
includes an overall effect plus the effects of the row and column associations.
This model can be written as:

    theta_ij = theta theta_{i+} theta_{+j}    or    Phi_ij = phi + phi_{i+} + phi_{+j}        (10.18)

The former is the multiplicative form of the model, while the latter is the additive
form. In the above model, theta_{i+} and theta_{+j} are unspecified, and so are the
phi's.
Model (10.18) describes the (I-1)(J-1) odds ratios theta_ij and log odds ratios
ln(theta_ij) in terms of the row and column effects, and it is based on (I-2)(J-2)
degrees of freedom. The model is called the R+C model because both the row
effect and the column effect are added to the overall effect phi. The model is log-linear
or additive in the log odds ratios and is sometimes referred to as model I.
The R+C model requires ordering of both classificatory variables. That
is, it assumes that both the row and column variables are ordinal. Thus, changing
the order of any two row or column categories changes the model structurally. We
recall that changing the order of rows in the R model or the order of columns in
the C model does not in any way change the model structurally. The R+C model
is therefore naturally suited for contingency tables with doubly ordered
categories where the spacings of the categories are also assumed known. The
model is based on (I-2)(J-2) degrees of freedom.

10.5 The RC Association Model

Because of the restrictions imposed on the R+C model above, namely,

(i) known row and column integer scores and

(ii) both rows and columns being ordinal,

the model RC, which is the multiplicative form of the additive log odds ratio model
for the R+C model, was proposed by Goodman (1979a). The RC model has
the log odds ratios Phi_ij modeled as:

    Phi_ij = phi' phi'_{i+} phi'_{+j}        (10.19)

where the phi' values here should not be confused with the phi values in the R+C
model. It is sometimes referred to as model II, in comparison to the R+C model,
which is model I.
When written as a linear-by-linear model, this model has row score
parameters mu_i and column score parameters nu_j, where the mu_i and the nu_j are
unknown and need to be estimated from the data, and the corresponding log of the
expected frequencies can be written in the form:

    ln(m_ij) = mu + lambda_i^X + lambda_j^Y + phi mu_i nu_j        (10.20)

with the following constraints imposed:

    Sum_i mu_i = Sum_j nu_j = 0        (10.21a)

    Sum_i mu_i^2 = Sum_j nu_j^2 = 1        (10.21b)

Constraints (10.21a) and (10.21b) are often referred to as properties that ensure
that a scale has a zero point and unit variance. It is therefore desirable, in order
to make scale comparisons meaningful, to invoke these zero and unit properties.
The constraints in (10.21a) and (10.21b) are due to Goodman (1979b); the first
constraint ensures that the scores are centered at zero, while the second
ensures that the length of each score vector is normed to 1. The above is analogous
to the standardized normal Z variable with mean 0 and unit variance.
The above method has been described (Goodman, 1991) as the unweighted solution or marginal independent constraints. Goodman (1981, 1985) also proposed
instead the marginal-weighted scores, where (10.21a) and (10.21b) are now defined
as:

    Sum_i mu_i P_{i+} = Sum_j nu_j P_{+j} = 0        (10.22a)

    Sum_i mu_i^2 P_{i+} = Sum_j nu_j^2 P_{+j} = 1        (10.22b)

Here the score parameters are normed with the marginal distributions, where P_{i+} =
Sum_j p_ij, P_{+j} = Sum_i p_ij, and the p_ij are the observed probabilities.
A more general rule is to define row weights g_i and column weights h_j (see
Becker & Clogg, 1989), which may or may not sum to 1. If g_i > 0 for all i and
Sum_i g_i = 1, then we may regard g_i as a probability distribution for the rows, with
similar arguments for the h_j for the columns. With these weights, we have:

    Sum_i mu_i g_i = Sum_j nu_j h_j = 0        (10.23a)

    Sum_i mu_i^2 g_i = Sum_j nu_j^2 h_j = 1        (10.23b)

In all cases, the first constraint centers the scores (or the mean score, or the weighted
score) at zero. The second constraint adjusts the length of the score vectors in all cases.
The unweighted scores discussed above are a special case of the above when g_i =
h_j = 1 for all i and j. Similarly, the marginal-weighted solution is obtained if we let
g_i = P_{i+} and h_j = P_{+j}. Other choices have g_i = 1/I and h_j = 1/J, which has been
referred to as the uniform-weighted solution. Other weights that have been employed
include g_i = i - (I+1)/2, i = 1, 2, ..., I and h_j = j - (J+1)/2, j = 1, 2, ..., J,
which adjust the sums of the weights to zero.
The corresponding log odds ratio from (10.20) thus becomes

    Phi_ij = phi(mu_{i+1} - mu_i)(nu_{j+1} - nu_j) = phi phi'_{i+} phi'_{+j}        (10.24)

We notice immediately that this model is not log-linear but rather multiplicative
in the log odds ratios, and this generally complicates the estimation procedure; but
the ANOAS algorithm by Goodman, or the ANOAS module in CDAS by Eliason
(1990), makes the estimation and fitting of this model very simple.
With mu and nu defined as scores, it thus becomes obvious that phi'_{i+} = mu_{i+1} - mu_i denotes the
distance between rows i and i+1, and that phi'_{+j} = nu_{j+1} - nu_j similarly denotes the distance
between columns j and j+1. We also note that the ordering of the categories has
been made redundant because the score parameters are estimated from the data, and it
can be shown that this model is unchanged by any permutation of rows or columns.
Because of this property, it does not matter whether we model the RC model in
terms of the log odds ratios or in terms of the log expected frequencies. The RC
association model, like its additive R+C counterpart, is also based on (I-2)(J-2)
degrees of freedom.
Models O, U, R, and C are all special cases of the RC model. For instance,
the (O) model is obtained when phi = 0. For a more detailed discussion of this
model, readers are referred to Goodman (1979a), Haberman (1981), and Clogg (1982).
Various scores have been advocated, ranging from marginal-weighted scores
(Goodman, 1981, 1985) to more general row and column weights (Becker & Clogg, 1989).

10.6 Homogeneous R+C or RC Models

The homogeneous row-column effect models imply the following additional constraints:

    (RC)_H:  mu_1 = nu_1;  mu_2 = nu_2;  ...;  mu_I = nu_I

We note that both the R+C and the RC homogeneous models can only be employed for square contingency tables in which I = J, and that there are additional
I constraints on the homogeneous model parameters. But because of the location
and scale constraints, there are (I-2) nonredundant (identifiable or estimable) parameters, and the degrees of freedom are given by (I-2)(J-2) + (I-2) =
(I-2)(I-1). Table 10.6 gives the degrees of freedom for all the models discussed
in the preceding sections.
Models       d.f.
O            (I-1)(J-1)
U            IJ - I - J
R            (I-1)(J-2)
C            (I-2)(J-1)
R+C          (I-2)(J-2)
(R+C)_H      (I-2)(I-1)
RC           (I-2)(J-2)

Table 10.6: Models considered in this chapter


while Table 10.7 gives the degrees of freedom for the conditional association tests,
together with their corresponding G2.

Effects on association              Models used    d.f.            G2
1. General effect                   O - U          1               G2(O | U) = G2(O) - G2(U)
2. Row effects                      U - R          I - 2           G2(R | U)
3. Column effects                   U - C          J - 2           G2(C | U)
4. Column effects given rows C|R    RC - R         J - 2           G2(RC | R)
5. Row effects given columns R|C    RC - C         I - 2           G2(RC | C)
6. Residual                         RC             (I-2)(J-2)      G2(RC)
7. Total                            O              (I-1)(J-1)      G2(O)

Table 10.7: Degrees of freedom for conditional association tests

10.6.1 General Comments

- When the classificatory variables are both ordered, that is, when we have a
doubly ordered contingency table, the R+C model should be used instead
of the RC model.
- If only the row variable is ordered but not the column variable, then the U
model would not be relevant, nor would the R model (since it assumes ordering
of columns) or the R+C model. In this case, only the O, C, and RC models
would be appropriate. This would also be the case even if the column category
were partially ordered.
- When neither rows nor columns are ordered, as in the case of a nominal-nominal
table, only the O and the RC models would be appropriate, since they
are the only models unaffected by permutation of either row categories or
column categories.

10.6.2 Example 10.2: Analysis of Data in Table 10.1

We give below the results of applying the models listed in Table 10.6 to the data in
Table 10.1.

Model      d.f.      G2        BIC      AIC
O            9     20.378    -43.71     2.38
U            8      7.429    -49.53    -8.57
R            6      7.404    -35.32    -4.60
C            6      5.534    -37.19    -6.47
R+C          4      5.488    -22.99    -2.51
(R+C)_H      6      6.236    -36.49    -5.76
RC           4      4.914    -23.57    -3.09
(RC)_H       6      6.619    -36.10    -5.38

The above models are implemented in SAS software with the following SAS software
program.
set tablOl;
***create uniform, column and row integer scores ***;
u=chol*bp;
cl=bp;
rl=chol;
*** INDEPENDENCE MODEL ***;
proc genmod; class chol bp;
model count=chol bp/dist=poi typeS; run;
*** (U)-ASSOC. MODEL ***;
proc genmod; class chol bp;
model count=chol bp u/dist=poi; run;
*** R-ASSOC. MODEL ***;
proc genmod; class chol bp;
model count=cholIcl bp/dist=poi typeS; run;
*** (C)-ASSOC. MODEL ***;
proc genmod; class chol bp;
model count=chol bp|rl/dist=poi typeS; run;
*** R+C-ASSOC. MODEL ***;
proc genmod; class chol bp;
model count=cholIcl bp|rl/dist=poi type3; run;

Apart from the null model (O), all the other models considered adequately fit the
data. Again, using the parsimony consideration in this case, model U (the uniform
association) seems to be the most parsimonious of all the models. This model gives
a BIG (Bayesian information criterion) and an AIC (Akaike information criterion)
of 49.534 and 8.571, respectively. Both criteria agree in this case on the choice
of the most parsimonious model. For these data, the sample size is 1237. Model RC
which is implemented in CD AS, for instance, has parameter estimates displayed in
the following table.
<p
fa:
j>j\

0.51962
i= 1
-0.64330
j=1
-0.72656

2
-0.32248
2
0.06975

3
0.39399
3
-0.02623

4
0.57179
4
0.68305

Table 10.8: MLE under model RC for the data in Table 10.1
Similarly, the ML estimates (MLE) under the homogeneous (RC) model are presented in Table 10.9.

429

10.7. THE GENERAL ASSOCIATION MODEL


4>
fa:
Vj-

0.52582
i=l
-0.71443
.7 = 1
-0.71443

2
-0.12398
2
-0.12398

3
0.17145
3
0.17145

4
0.66696
4
0.66696

Table 10.9: MLE under model (RC)# for the data in Table 10.1
Similarly for the data in Table 10.2, we give the fits of the various models described
in the preceding sections to these data.
Model
O
U
R
C
R+C
RC

d.f.
15
14
12
10
8
8

G*
47.418
9.895
6.281
6.829
3.045
3.571

Using the independence model as the baseline model, the U model accounts for
about 80% of the G2 under independence and fits the data quite well. That the U
model fits the data well indicates that there is positive association between the two
classificatory variables.

10.7

The General Association Model

The general RC(M) association model (Goodman, 1986a, 1996) is defined in terms
of log expected frequencies as:
In (rhij] = A + A^ -f A^ -f- \^ ^ml^im^jm

(10.25)

m=l

where M min(7, J) 1. Here again, the parameters Him and Vjm are scores to be
estimated from the data, while the parameter 0m is again defined to be a measure
of the intrinsic association (in the m dimensions) that will be estimated from the
data. We may assume without loss of generality that the </>m are ordered so that:
When M = 0, the model in (10.25) reduces to the model of independence, that
is, the null model (O). Similarly, when M 1, the model defined by RC(1) is
also equivalent to the (RC) model discussed in the previous section. And cases
where M > I are of interest in this section as they relate to the dimensionality of
the association which can be shown to be analogous to the usual correspondence
analysis. Further, the fit of models in which M > 1 would be necessary for those
data in which the RC model proves inadequate. The RC(M) model is based on
(I M l ) ( J M 1) degrees of freedom.
The log odds ratio formed from rows i and %' and columns j and j' can be
decomposed as:
M

>m(f*im ~ Vi'm)(Vjm
m=l

~ Vj'm)

(10.26)

430

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY TABLES

Thus, the log odds ratio for any 2 x 2 subtable formed from rows i and i' and
columns j and j' is represented by a sum of M components, one component for
each dimension allowed under the model. Further, each component reflects the
intrinsic level of association in that dimension, the distance between rows i and i',
and also the distance between columns j and j'.
For identifiability purposes, constraints would need to be imposed on the score
parameters (the usual zero-point and unit-point restrictions) because the RC(M)
model has M </>'s, MI /z's and MJ z/s, for a total of M(l + / + J) parameter values.
For example, for the 4 x 4 data in Table 10.1, / = J = 4 and model RC(M) has
M = 3, which would give a total of 27 parameter values for this model, in contrast
to only (4 1) (4 1) = 9 nonredundant possible interaction parameters.
Consequently, if we let gi and hj be sets of weights, then we have for the zeropoint restriction
fjm

= 0

(10.27a)

=1

(10.27b)

Similarly, the unit-point restrictions are:


J

7 -(z/ jm )

where in both cases m = 1,2, , M. The above restrictions impose respectively


2M constraints and thus a total of 4M restrictions. This will in effect reduce the
total number of parameters to M(l + / + J) - 4M = M(I + J - 3). Thus for our
4 x 4 table, we would now have for the RC(3) model 15, which are still too many
parameters. We next impose orthogonal constraints in addition to the constraints
imposed above. That is:
/

y^diP'imP'im* y~^ hjVjmVjm* = 0 for all m ^ m*


(10.28)
i=\
j=\
Goodman (1986b, 1991) has described the general association model RC(m) for
various weighted situations. Goodman (1985, 1996) has also discussed extensively
correlation models in the general contingency tables. These models will not be
covered in this text, and interested readers are referred to the references cited above.
These impose ( 2 ) restrictions on the fj, and v parameters, respectively. We
thus now have a total of M(I -f J 3)
^ 2~ '
2
M(I + J M 2)
interaction parameters. The degrees of freedom for the RC(M) model is therefore
given by:
(/ - 1)(J - 1) - M(I + J - M - 2 ) = ( / - M - 1)( J - M - 1)
1S

'

J ( I - M - 1 ) ( J - M - 1 ) if M < min(/, J) - 1
'~ \
0
if M - in(7, J) - 1
The case when M = min(/, J) 1 refers to the saturated situation. All other cases
in which M < min(/, J) 1 refer to unsaturated cases.
We may also note here that the weights Qi and hj may take any of the three
(unweighted, marginally weighted, and uniformly weighted] weighted scores discussed
in the previous section.
The RC(M) model can be implemented by the use of program RCDIM in the
CDAS group of programs by Scott Eliason (1990).

10.7. THE GENERAL ASSOCIATION MODEL


10.7.1

431

Example 10.3: Application to Table 10.1

Again, let us consider the set of data in Table 10.1. For this data we have the
following results when we fit either the unsaturated RC model or the RC(M) to
the data when M* = 1. Both models are equivalent in terms of estimates of
parameters etc only if marginal weighted scores are employed. That is, ANOAS
uses the row marginal scores Pi+ = (0.2482, 0.1987, 0.3549, 0.1981} and column
marginal scores P+j = {0.3137,0.4260, 0.1649, 0.1981}. And it is only in this case
that the estimated correlation can be obtained.
If we were to fit the RC(1) association model to the data for the five weighted
situations listed below, then:
1. Unweighted, that is, gi = l,hj = 1 for all (i, j).
2. Uniformly weighted, that is, gi = j and hj = j
3. Integer-weighted,that is, gi = i and hj = j; i 1, 2, , / and j = 1, 2, , J.
4. Marginally weighted, that is, gi = Pi+ and hj = P+j.
5. Centered scores, where gi = i (I -f l)/2 and hj = j (J + l)/2.
We would find that in all the five situations above, the G2 values would remain the
same indicating that the choice of scores do not affect the magnitude of the relevant
test statistic (Becker & Clogg, 1989). However, the // and v estimates would be
different for all the five scores, as well as the estimates of the intrinsic </> association
parameter. We give an example to demonstrate this general theory. Again, we
consider the data in Table 10.1 for analyses using scores in 1, 2, 3, and 4 as defined
above. We have in Table 10.10 the following results when model RC(1) is fitted. A
more detailed treatment is provided in Becker and Clogg (1989).

ROWS
1
2
3
4
COLUMNS
1
2
3
4
0

G2

Unweighted
SCORES(l)
Ai
Ai
-0.643 -0.464
-0.323 -0.232
0.394
0.284
0.572
0.412
Vi

Vi

-0.727 -0.524
0.070
0.050
-0.026 -0.019
0.683
0.492
0.5195
NA
4.9138

Uniform
SCORES(2)
Ai

Ai

-1.287
-0.645
0.788
1.143

-0.464
-0.232
0.284
0.412

Vi

Vi

-1.453 -0.524
0.140
0.050
-0.053 -0.019
1.366
0.492
0.1299
NA
4.9138

Integer
SCORES(3)

Ai

Ai

-0.622
-0.390
0.127
0.255

-0.626
-0.393
0.128
0.257

Vi

Vi

-0.663 -0.668
-0.097 -0.098
-0.166 -0.167
0.339
0.341
1.0125
NA
4.9138

Marginal
SCORES(4)
Ai
-1.366
-0.715
-0.741
1.101

Ai
-0.458
-0.239
0.249
0.369

Vi

i>i

-1.340 -0.449
0.471
0.158
0.252
0.085
1.865
0.626
0.1125
0.1116
4.9138

Table 10.10: Estimated scores from fitting the RC(1) model under four different
scoring systems

432

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

In all the four cases above, we notice that the G2 values are the same and the only
changes are in the parameter estimates as well as the estimates for the intrinsic
association parameter (/>. The correlation between the scores //, and v is only obtainable for the case when the scores are marginally weighted, that is, case 4 in
which Qi Pi+ and hj = P+j. The models above are each based on 4 d.f. The adjusted scores estimates /^ and v^ are obtained by multiplying by the corresponding

10.8

Correlation Models

For the I x J contingency table, let PIJ denote the probability that an observation
falls in cell (i, j){i = 1, 2, , /; j = 1, 2, , J}. Then the model of independence
is given by:
Pij = Pi+P+j
where Pi+ = \ Ptj and P+j = P^.
Goodman (1985) has introduced a generalization of the above model, which he
described as the correlation or canonical correlation model. This generalization
takes the form:
M
Pij = Pi+P+j(l

+ E PmXimyjm)

(10-29)

m=l

where M = min(/ 1, J 1), and where Xim are the row scores. Similarly, the
yjm are the column scores and the parameter pm is a measure of the correlation
between the row score Xim and the column score yjm. Further, the scores satisfy
the following constraints for m^mf:

]P XiTnPi+ = 0

E yjm P+j = 0

t=l

J=l

/ ^ XjmXim' rj-\- = (J

/ ^ yj

i=l

3=1

From the above, we notice that the Xim and the yjm scores pertaining to the ra-th
component have been standardized and that for m ^ m', the corresponding row
and column scores are uncorrelated with each other.
With the above constraints, it is not too difficult to show that
/ J
E E XimyjmPij

= Pm

for

m = 1, 2, , M

t=l J=l
If we consider the 2 x 2 subtable formed from rows i and i' and from columns j
and j', then the correlation coefficient is given by:
p = (PijPvj,

- PwP

Specifically, for a 2 x 2 contingency table, the above reduces to

10. 8. CORRELATION MODELS

433

P = PllP22 -

and the usual odds ratio becomes


We see that p is marginal dependent, while 9 is marginal free. 0 is therefore useful
for modeling unweighted association models while p is most useful for modeling
correlation models.
We see from (10.29) that
M

(Pij - Pi+P+j)/Pi+P+j = ^ Pmximyjm

(10.30)

m=l

Squaring (10.30) and summing over i and j, we have


i j
i J / M

p+j} = E E E

and the Pearson's X2 like quantity:


M

If we multiply both sides of the above by JV, the sample size, we have
M

*2 = w]T/4
m=l

For a 4 x 4 table, M = 3 and from the above, we see that


o

Pi

P2

X2/N

Pi

X2/N

X2/N

and these correspond to the first inertia, second inertia and third inertia quantities respectively in the usual correspondence analysis. We note that these inertia
quantities add up to 1. That is,
9
Pi

X /N

2
P2

X /N

2
P3

X2/N

-,

As noted by Clogg, the scores parameters derive from principal components (or
singular value decomposition) so that the correlations pm represent correlations
between principal components. Thus a singular value decomposition algorithm can
be used to calculate the eigenvalues (p'ms), the left (row) eigenvectors (xim) and
the right (column) eigenvectors (yjm) based on the estimate of A;J, where

434

10.8.1

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

Example 10.4

We now fit all the models that have been described in the preceding sections to the
data in Table 10.2.
Model
O

R
C
R+C
RC

d.f.
15
14
12
10
8
8

G2
47.418
9.895
6.281
6.829
3.045
3.571

0
0.152
0.160
0.158
0.004
0.166

P
0.150
0.157
0.156
0.100
0.161

The results above are obtained by using the program ANOAS by Leo Goodman.
We note here that for all the models, the centered scores gi i (/ + 2)/2 and
hj j (J + 2)/2 are employed. Under model U, for instance, the estimated values
of the row and column scores are respectively:
/ti = {-1.439, -0.481, 0.477, 1.436} and
Vj = (-1.539, -0.918, -0.298, 0.323, 0.9441.565} with <j> = 0.152
To convert the estimate of the intrinsic association parameter to those based on
equally spaced rows and column scores, we use the relationship

In our example above, 0' = 0.153(-1.439 + 0.481)(-1.539 + .918) = 0.091. Consequently, 0 = exp(0 ) = 1.095. Also, using the independence model as the baseline
model, the U model accounts for about 80% of the G2 under independence and fits
the data quite well. That the U model fits the data well indicates that there is
positive association between the two classificatory variables.
Similarly, the RC(1) models for the data under various weights have as expected
a G2 value of 3.571 on 8 d.f. However, the estimates of the intrinsic association 0 and
the row and column scores differ in each of the cases considered. For instance, we
have < = {0.9649,0.1970,2.4081,0.1665} respectively for the uniform weights & =
hj = 1, uniform marginal weights, gi 1/4, hj 1/6; integer scores gi = i,hj = j
and marginal weighted gi = Pi+,hj = P+j. In the latter case, the estimated
correlation p = 0.1611.
We present in Table 10.11, estimates of the parameters when models RC, RC(1),
and the correlation model are applied to the data in Table 10.2.

10.9

Grouping of Categories in Two- Way Tables

A reasonable grouping of the rows and column categories of a two-way contingency


table can sometimes simplify the analysis of association between the two classificatory variables. Thus by grouping categories, we may "get a more parsimonious
and compact summary of the data" (Fienberg, 1980, p. 154). These will lead to a
considerable reduction in the number of parameters under the hypothesized model.
Thus, we are interested in collapsing our original I x J table to an /* x J* table,
where /* and J* represent lower dimensions, that is, /* < / and J* < J.

10.9. GROUPING OF CATEGORIES IN TWO-WAY TABLES

Parameter
Ai
M2

As
A4
J/i

^2
3
Z>4
l>5
Z>6

RC
-1.678
-0.140
0.137
1.414

-1.112
-1.121
-0.371
0.027
1.010
1.818

435

Estimates
CORR.
RC(1)
1.637
-1.678
0.102
-0.140
-0.105
0.137
1.414
-1.428
-1.112
-1.121
-0.371
0.027
1.010
1.818

1.201
1.289
-0.067
0.040
-0.934
-1.743

Table 10.11: Results from correlation analysis


From chapter 6, the independence model, when it holds, is always collapsible. For other models, we need to establish reasonable criteria for collapsibility.
Goodman (1981) has introduced the homogeneity criterion (amongst others), which
assumes that "particular rows or columns can be combined if these particular rows
or columns are homogeneous" (Gilula, 1986). This translates to:
Two distinct columns a and b are said to be homogeneous if Pia/P+a
for all 1 < i < I and similarly for rows. Using the results of Gilula (1986),
the above implies that a necessary and sufficient condition for two distinct
columns a and b to be homogeneous is that the scores va0L ^faa where
1 < a < min(/ 1, J 1). In our case a 1. The case for the rows is
similarly denned.
If we let G 2 (O) be the likelihood ratio statistic under the model of independence,
then using the Williams (1952) and Goodman (1985) procedure, one can use G\ =
G2(O) G2(S) as a test statistic to judge whether the grouping that leads to model
S is justified. The statistic is distributed asymptotically as x2 w^h (/ 1)(J
1) (/i 1)( Ji 1) d.f., where /i and J\ are the reduced dimensions of the table.
We would use the estimated row and column scores obtained above to explore the
homogeneity of the rows and columns for the data in Table 10.2.
Based on the correlation scores in the above Table 10.11, Gilula (1986) has
suggested that rows 2 and 3 are homogeneous as are columns 1 and 2, and columns
3 and 4. Thus combining rows 1 and 2, columns 1 and 2, and columns 3 and 4, we
have the following new 3 x 4 observed table, that is, Table 10.12.
Mental health
status
Well
Mild+ moderate
Impaired

Socioeconomic status
A+B
C+D
E
F
121
21
129
36
300
388
151 125
154
78
86
71

Table 10.12: Table 10.2 with some rows and columns combined
Clogg and Shihadeh (1994) has examined the case where rows 2 and 3 are combined

436

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

and columns 1 and 2, columns 3 and 4, and columns 5 and 6 are also combined,
leading to the new 3 x 3 table given as Table 10.13.
Socioeconomic status
A+B C+D E+F
121
129
57
300
388
276
86
154
149

Mental health
status
Well
Mild+moderate
Impaired

Table 10.13: Table 10.2 with Clogg's collapsibility rule


Based on the row and column scores obtained from fitting models U, R ,C, and RC
to the data in Table 10.2, I consider that only rows 2 and 3 need to combined, as
well as columns 1 and 2 only. These are the only ones that can be justified from
the use of the above association models. The correlation model of course leads to
the 3 x 4 table suggested by Gilula. Our approach would therefore lead to a 3 x 5
table displayed in Table 10.14.
Mental health
status
Well
Mild+moderate
Impaired

A+B
121
300
86

Socioeconomic status
C
D
E
F
57
72
36
21
170 210 151 125
60
95
78
71

Table 10.14: Table 10.2 with Gilula's rule


We now give the results of fitting the independence and uniform association models
to the data in Tables 10.12 to 10.14. These results are displayed in Table 10.15.

15
14

G2
47.418
9.895

<f>
0 091

9
1 095

O
U

8
7

45.100
2.390

0 181

1 198

O
U

6
5

43.437
1.270

0 256

1 290

O
U

4
3

41.449
1.119

0 318

1 374

Tables
4 x6

Models
O
U

3 x5

3x4
3 x3

d.f.

Table 10.15: Results of fitting both the independence and uniform association models to the collapsed tables
The independence model for all the three collapsed tables gives G2 values of 45.100,
43.437, and 41.449 on 8, 6, and 4 degrees of freedom, respectively. These values
compare very favorably to the previous value of 47.418 on 15 d.f. Each of these,
therefore, gives values of (47.418-45.100) = 2.318 (7 d.f.), (47.418-43.437) = 3.981
(9 d.f.) and (47.418 41.449) = 5.969 (11 d.f.), respectively, for the grouping error.
None of these values is significant.

10.9. GROUPING OF CATEGORIES IN TWO-WAY TABLES

437

The estimates of the 9 parameter for the three cases under the (U) model are
1.198, 1.290, and 1.374 respectively. When these are compared with the original
value of 9 = 1.095 for the 4 x 6 table, all the three uniform association models
indicate that the U model is adequate or satisfactory for the collapsed tables; the
estimates of the associations, as measured by #, did change appreciably for the three
tables. This implies that inferences drawn from a full table and a collapsed table
may not necessarily be the same since the parameter estimates are not the same.
In order to reconcile these differences, Clogg and Shihadeh (1994) has advocated
that since for the full table the estimates of the parameters are based on integer
scores ^ = i and v$ = j, a near equivalent result for a collapsed table can be
obtained by modifying the scoring system for comparability. For instance, for the
3 x 3 collapsed table, if we score rows and columns as //i = 1, ^2 = 2.5, and //3 = 4
and v\ = 1.5, vi = 3.5, and ^3 = 5.5, then, these would be more consistent with
the scoring for the full table. In this case, G2 = 1.12, the same as before but with
an estimate of 9 given by 9 = 1.1118, which is very close to the original 1.095 for
the full table.

10.9.1

Association Versus Correlation Models

The question that readily comes to mind is, "When is the correlation model equivalent to the association model?" The correlation association models dates back to
the earlier works of Pearson, while association models can similarly be linked to
the earlier works of Yule. To answer the above question, Goodman (1996) suggests
that in order to compare the two schools of thought, first consider, R(.} to be a
monotonic increasing function of x. Then, for the correlation models, define R(x)
as:

M
m=l

Similarly, we can define


M

In R(x) = Pij =

,1=1
where 0 is unweighted. Of course, 0 can be weighted.
If R(x) = x, then we would have the correspondence (correlation or canonical
correlation) analysis, that is, Pearsonian approach. On the other hand, if R(x) =
In (z), then we would have the association model.
Consequently, if we define a family of monotonic increasing function as
xc
R(x) = , 0 < c < 1
then
if c = 1
as c > 0

where R(x) = In x is obtained as a limiting form of R(x) as x approaches zero.


That is,
xc
lim = In x
c>0 C

Goodman (1996) has suggested that while c = 1 gives the correlation model, which
is equivalent to the correspondence analysis of Pearson (1947), c = 0, gives the

438

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

usual association (Yule's) models. A middle of the road can be taken such that
c = 1/2 (where the two approaches will be equivalent). He describes such a case as
the mid-way association models.
In general, Goodman (1986b) has suggested that in most situations, the association models seem better than the correlation model. With association models,
you can think about rows and columns of the table being symmetric.
As an example, for the data in Table 10.1, we have p 0.112, which is not
too high; consequently, the equivalent RC model can be employed for this data.
Goodman (1991) has advocated that the RC model be used in these cases.
The analysis of association models in multi-way contingency tables when one or
more classificatory variable are ordered is fully discussed in the next section.

10.10

Higher Dimensional Tables

In this section, we extend the concept of association models to higher dimensional


contingency tables. First, consider a three-way contingency table having variables
A, B, and C with I, J, and K categories respectively. If one, two, or all of these
variables have ordinal categories, then we can exploit the ordinal nature of these
categories in our analysis. Our interest here is to model the association in such a
three-way table. Three types of models that exploit these ordering of categories of
the variables will be discussed here.
(i) The first of such models relates to the case in which two of the variables (say
A and B) are nominal and the third variable C has ordinal categories. Such
models are very common and we will illustrate such a case with the data in
Table 10.16 in the next section.
(ii) The second of such models relate to the case in which two of the variables
(say, A and B) are ordinal and the third variable C has nominal categories (or
groups or layers). Such models are referred to by Clogg (1982) as conditional
association models. An alternative approach to modeling this class of tables
is to fit K row-column association models in m dimensions, that is, we fit the
RC(rn) to the two ordinal variables at various levels of the third (nominal)
variable. This type of models are fully discussed in Becker and Clogg (1989).
(iii) The third type of models to be discussed, considers the three variables A,
B, and C to have ordinal categories. Again, following, Clogg (1982), models
associated with these are described as the partial association models.
In all our discussions in this section, we would suppose that we have a three-way
I x J x K table. In addition, if we let n^ be the observed count in the table,
then rhijk would be the corresponding expected cell count under some model, for
i = 1,2, , /, j = 1, 2, , J, and k = 1, 2, , K.

10.10.1

Type I Group of Models

The models in this class have two variables that are nominal, with the third variable
(response variable) having ordinal categories. An example is given in Table 10.16,
which is reproduced from Clogg and Shockey (1988) and relates to job satisfaction

10.10.

HIGHER DIMENSIONAL

TABLES

439

on a 4-point ordinal scale and explanatory variables race (White, other) and degree
attained (high school or less, more than high school), resulting in a 2 x 2 x 4
contingency table. Let us assume for the sake of our analysis here that the variable
"degree attained" is nominal.

Race
White
Other

Job satisfaction
Moderately
A Little
satisfied
dissatisfied

Very
dissatisfied

Degree
attained

Very
satisfied

<HS
>HS

404
112

319
81

81
16

52
10

<HS
>HS

48
8

46
14

21
1

7
0

Table 10.16: Job satisfaction data


Here, if we let the response variable (job satisfaction) be represented by S, degree
attained by D, and race by R, then a series of log-linear models can be fitted to
this data. The models fitted with their corresponding G2 and degrees of freedom
are presented in Table 10.17.
Model
{RDS}
{RD, RS}
{RD, DS}
{RD,RS,DS}

d.f.
9
6
6
3

G'2
17.4553
12.9673
9.8122
5.5650

pvalue
0.0412
0.0435
0.1326
0.1348

Table 10.17: Results of fitting various logit models


The above models are implemented in SAS software with the following statements:
options nodate nonumber nocenter ls=85 ps=66;
data tab!013;
do r=l to 2; do d=l to 2; do s=4 to 1 by -1;
input countflffl;sl=s; output; end; end; end;
datalines;
404 319 81 52 112 81 16 10 48 46 21 7 8 14 1 0
proc genmod
class r d s
proc genmod
class r d s
proc genmod
class r d s
proc genmod
class r d s

model count=r|d s/dist=poi link=log; run;


model count=r|d s|d/dist=poi link=log; run;

model count=r|d s|r/dist=poi link=log; run;


model count=r|d s|d s|r/dist=poi link=log; run;

For the four models fitted above, their log-linear model formulations can be written.
For instance, for the first model {RD,S}, we have:

\S
A

\RD

We see that the first two models {RD,S} and {RD,R} do not fit the data. Model
{RD,S} states that job satisfaction is jointly independent of both race and degree
attained. The last two models however provided better fits. Since the response

440

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

variable has four levels, and if we assume integer scores, we can examine the linear
effect component of this variable on the other factor variables. We therefore incorporate next, the linear components of S into each of the four models resulting in
the results below in Table 10.18.
Model
RD, S(l)
RD, S, R*S(1)
RD, S, D*S(1)
RD, S, R*S(1), D*S(1)

d.f.
11
8
8
7

G2
91.5181
14.3779
14.3037
11.4406

pvalue
0.0000
0.0724
0.0741
0.1204

Table 10.18: Results of fitting equivalent log-linear models with only the linear
components of S
Both model which suggest linear association between the response variable S and
either race or degree attained have G2 in the neighborhood of 14.3, and each in
turn indicates that the interaction components are significantly different from zero.
Since neither can be eliminated from the model, the last model {RD, S, R*S(1),
D*S(1)} gives a G2 value of 11.4406 on 7 degrees of freedom. This model fits the
data very well and is given by the log-linear formulation: D
In (mijk) = v + Af + Af + A + \%
+ Af/K - fi) + Affcs(fc - u)
where {uk} are the scores (integer) relating to the response variable S. Note that
we have scored the response variable from 1 to 4 with a score of 4 corresponding
to "very satisfied" and a score of 1 corresponding to "very dissatisfied." The ML
estimates of the RS and DS associations in this case are 0.1715 (a.s.e. = 0.0985)
and 0.1473 (a.s.e. = 0.0885), respectively. Since this model fits the data, we can
therefore conclude that being White is associated with more job satisfaction (less
job dissatisfaction) than being nonwhite. Similarly, having at most an high school
degree is associated with less job satisfaction (more job dissatisfaction) than having
more than a high school degree. The model is implemented in SAS software with
the following, where si are the integer scores for the levels of S.
set tablOlS;
sl=s;
proc genmod;
class r d s; model count=r|d s r l s l d|sl/dist=poi link=log; run;

10.11

Conditional Association Models

The models in this class relate to situations in which two ordinal variables (say
A and B) are observed for each of K groups (or K levels of variable C). We are
interested in possible sources of between-group heterogeneity in the association.
The sampling scheme in this case is the product multinomial for the different IJdimensional multinomials.
We shall assume that variables A and B have scores {ui} and {u/}, respectively.
Then for the three-way contingency table, the conditional odds ratios are defined
oc

Oij(k) = (TOijfcmt+ij+i,0/(TOij+i,fcTOi+i i<7 - )fc )

(10.32)

where, (ij) = 1 , 2 , - - - , (/ - 1), and k = 1, 2, ,K.


The data in Table 10.19 is an example of data relating to this class of models.

10.11.

10.11.1

CONDITIONAL ASSOCIATION MODELS

441

Example 10.6

The data in Table 10.19 relate to the 2 x 3 x 3 table of cross-classification of popularity ratings of two groups of children over two occasions (von Eye & Spiel, 1996).
Ratings at Time 2
Groups
1

Ratings at
Time 1
1
2
3

Total
1
2
3

Total

1
1
4
0
5
7
3
1
11

2
3
43
4
50
4
36
7
47

3
0
18
1 3
31
0
9
25
34

Total
4
65
17
86
11
48
33
92

Table 10.19: Cross-tabulation of popularity ratings of two groups of children over


two occasions
If we designate ratings at time 1 to be variable A, ratings at time 2 to be variable
B, and the groups to be variable C, then variables A, B, and C are indexed by
i = 1,2, 3, j = 1, 2,3, and k 1,2, respectively.
Under the product multinomial sampling scheme, the baseline model is the loglinear model {AC,BC}, which is usually described as the null conditional association
model. The following models of interest will be considered for the data in Table
10.19. These models are defined in terms of the odds ratios in (10.32) and the
corresponding equivalent log-linear model formulations in terms of lijk =
(i) The null conditional association model has
(10.33a)
jk

+ A;

,AC

,BC

(10.33b)

The model is based on K(I 1)(J 1) degrees of freedom and is equivalent


to the log-linear model {AC,BC}. In other words, A and B are conditional
independent given the levels of variable C.
(ii) The homogeneous conditional uniform association model has
(10.34a)
,A
*i

, \B _,_ \ C , \ AC
' A i A^, -p Aj^.

j_ \AB(

(10.34b)

\aiu3 I

where Ui = /^ Hi and Vi = Vj Vj. The model states that variables A


and B are uniformly associated for every level of variable C. The model has
K(I 1)(J 1) 1 d.f. The model can be fitted in SAS software as the model
(AC, BC, A(1)*B(1)}. That is, the model containing the linear-by-linear
component of the AB interaction term.
(iii) The heterogeneous uniform association has:

442

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES
(10.35a)

Xj + xk+xij (Ul)(Vj) + xik

(iQ ^ b)

Here, the strength of the uniform association between variables A and B


changes across the levels of variable C. The model has K(IJ I J) degrees of
freedom and can be fitted in SAS software as the model: {AC, BC, A(1)*B(1),
A(1)*B(1)*C }.
(iv) A homogeneous row effect model under (PM) has:
Oij(k) = Oi++

(10.36a)
c

lijk = n + \f + Af + \

+ Xfk

+ Affc + Xt (Vj)

(10.36b)

The model implies that there are only row effects on the association and that
they are homogeneous for each category of variable C. The model can be
implemented as: {AB, AC, A*B(1)} and has (/ - l)(JK - K - I) degrees of
freedom.
(v) The simple heterogeneous row effect model under the (PM) sampling has:
0 y(fc) = 9i++0++k

(10.37a)
c

lijk = n + \f + Af + X k + A + X
i)(vj)

and is implemented as: {AC, BC, A*B(1), A(1)*B(1)*C}. The model implies
that there are only row effects on the association but the overall effects on the
association is different for each level of variable C. The model has K(IJ I
K} (I 2) degrees of freedom.
(vi) A heterogeneous row effect model, which allows for heterogeneity both in the
overall effects on the association and in the row effects on the association,
similarly has:
(10.38a)
A

AC

\
_1_
\
_l_
\
J _U
~r /\
~r AiT" AJL.

BC
-L~r \^ik

(ia38b)

The model which is based on K(IJ 21 + J + 2) degrees of freedom is implemented as model {AC, BC, A*B(1), A*B(1)*C }.
Corresponding column models of the above row models (iv) (vi) can similarly be
defined and the various interpretations for the row effects models discussed therein
can be extended to the column effects models.

10.11.2

The Conditional Association RC Models under


Product Multinomial Sampling Scheme

The row-column conditional association models under the product multinomial sampling S(PM) scheme are defined as follows:

10.11.

CONDITIONAL ASSOCIATION MODELS

443

(vii) The homogeneous RC effects model has:


eij(k) = ei++9+j+

(10.39a)

7
M
i, ~t"
_i_ A,\^ +
_i_ A.\B -f_i_ AL,
XC1 +
_L A,-i.
X^C1 -r
_i_ A.-jr.
\ BC>
ni'fc

A*
4
+ ABM + A>J.)

(10.39b)

This model is based on [K(IJ + !)-(/ + J)(K + 1) - 3] degrees of freedom


and is implemented in SAS software as {AC, BC, A(1)*B, A*B(1)}.
(viii) The heterogeneous row, RC effects model has:
(10.40a)
c

A? + Af + A? + Affc + Xff
^

'

The model is based on (J 2)(IK K 1) degrees of freedom is implemented


as: {AC, BC, A(1)*B, A*B(1), AC*B(1)}.
(ix) The heterogeneous column, RC effects model has:
n
i'j(
k'}

n
n
C/24-4-(7_i_<7ffa\
A

f i J-VJ.Ti.clJ
n / 1 1 a^
\
B

AC

/ ijfc -= //
/v 4- \i 4- Xj 4- X 4- Xik
k

4- A^
jk

(io.41b)

The model is based on (/ 2}(JK K I) degrees of freedom and is implemented as model {AC, BC, A(1)*B, A*B(1), A(1)*BC}.
(x) The heterogeneous row-column, RC effects model has:
(10.42a)

\ikAC _i_' \BC


jk

/^
lABCf^

The model is based on K(I 2)( J 2) degree of freedom and can be implemented in SAS software as: {AC, BC, A(1)*B, A*B(1), A(1)*BC, AC*B(1)}.
The above models are implemented in SAS using PROC GENMOD with the
following statements.
data cond;
do gp=l to 2; do timel=l to 3; do time2=l to 3;
input count fflfl; rl=timel; cl=time2;
output ; end ; end ; end ;
datalines;
1304
43 18 04 13 7 4 0 3 36 917 25
proc
RUN;
proc
(1)
(2a)
(2b)
(3a)
(3b)
(3c)
(4a)

print;
genmod
model
model
model
model
model
model
model

class timel time 2 gp;


ount=timel Igp time2|gp/dist=poi;run;
ount=timel Igp time2|gp rl| cl/dist=poi;run;
ount=timel Igp time2|gp rl|cl |gp/dist=poi;run;
ount=timel Igp time2|gp timel |cl/dist=poi;run;
ount=timel Igp time2|gp timel Icl rl 1 cl|gp/dist=poi ;run;
ount=timel Igp time2|gp timel Icl timel | cl |gp/dist=poi;run;
ount=timel Igp time2|gp time2|rl/dist=poi ;run;

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

444

(4b)
(4c)
(5a)
(5b)
(5c)
(5d)

model
model
model
model
model
model

count=timelIgp
count=timelIgp
count=timelIgp
count=timelIgp
count=timelIgp
count=timel Igp

time2lgp
time2|gp
time2|gp
time2|gp
time2lgp
time2|gp

TABLES

time2|rl rl|cl|gp/dist=poi;run;
time2|rl|gp/dist=poi;run;
timellcl time2|rl/dist=poi;run;
timellclIgp time2|rl/dist=poi;run;
timellcl time2lrl|gp/dist=poi;run;
timel|cl|gp time2|rl|gp /dist=poi;run;

The models have been renumbered from 1 to 5d, where rl and cl are integer scores
for the linear effects at Timel and Time2, respectively. We have displayed the
model statements in SAS software together for brevity, actual implementation calls
using PROC GENMOD.

10.11.3

Models Under the Multinomial Sampling Scheme

The preceding models were obtained under the product-multinomial sampling


scheme. Agresti (1984) considers fitting similar models to the cholesterol data in his
example. In the product multinomial (PM) scheme, the baseline model is the null
conditional association model {AC,BC}, with K(I 1)( J 1)=8 d.f. for the data in
Table 10.19. However, in the multinomial (M) scheme, the baseline model would be
the model of mutual or pairwise independence {A,B,C}, with (UK I-J K + 2)
which would be 12 d.f. for the data in Table 10.19. Equivalent models discussed in
the previous section can be implemented under the multinomial sampling scheme
with the following models (without the actual model formulation) in SAS software.
Model #
(1)

(26)
(3a)

(36)
(3c)
(5a)
(56)

Model Des cription

Mod el Implementation

Null as.iocia HOT.

{A, B, C}
fi rtA >

He erogeneo us uniform
Ho nogeneo- * row effect.
Sir pie hete vgeneous
rov. effect
He erogeneow row effect.
Ho noyeneoiw RC effect*
He er-ogeneo us r-ow
PC
ffffftv
/IL>
ejjeciy
He er-ogeneo us column
RC effects
He erogeneo us
rail-column RC effect*

{A, B', C, A(1)"C, B(l)*c! A(1)*B(1),A(1)*B(1)*C}


{A, B, C, A(1)*C, B(1)*C, A(1)*B}
{A, B, C, A(1)*C, B(1)*C, A(1)*B, A(1)*B(1)*C}
{A, B, C, A(1)*C, B(1)*C, A(1)*B, A(1)*BC}
{A, B, C, A(1)*C, B(1)*C, A(1)*B,A*B(1)}
{A, B, C, A(1)*C, B(1)*C, A(1)*B,A*B(1), A(1)*BC}

(5c)
(5d)

(A, B, C, A(1)*C, B(1)*C, A(1)*B,A*B(1), AC*B(1)}


{A, B, C, A(1)*C, B(1)*C, A(1)*B,A*B(1), A(1)*BC, AC*B(1)}

We present the results of employing these models under both the product multinomial and multinomial sampling schemes to the data in Table 10.19 in Table 10.20.
Based on these results, the most parsimonious model would be model (2a) with
G2 value of 5.155 on 7 d.f. This is the model of homogeneous conditional uniform
association, which gives %(&) = 1.100 for i = 1,2, j = 1,2, and all k 1,2.
The parameter 9 is estimated as exp(2.2565) = 9.5496 where 2.2565 is the logparameter estimate of X^B(UI u)(vi v] with (a.s.e. = 0.3352) obtained from
PROC GENMOD.
Models for the case when all three variables are ordinal have been described by
Clogg (1982) as the partial association models. These models are not considered in
this text but such models can easily be implemented in SAS software.
Other models that have been suggested for handling the kind of data presented in
the latter sections of this chapter include the RC(m)-G model discussed in Becker
and Clogg (1989). A general class of symmetry models is fully discussed in Clogg
(1982) and in Clogg et al. (1990).
In addition to the models considered above, some of the models discussed earlier,
namely, the uniform association, row, and column association models, are sometimes

EXERCISES

10.12.

445
Product Multinomial
G2
X2
d.f.
8
71.897
79.854

Multinomial Scheme
G2
X2
d.f.
12
85.205
118.319

1.

Models
Null

2.
Pa)
(2b)

Uniform :
Homogeneous
Heterogeneous

7
6

5.155
4.737

8.377
10.278

9
8

15.071
7.728

19.550
19.930

3.
(3a)
(3b)
(3c)

Row Effects:
Homogeneous
Simple Heterogeneous
Heterogeneous

6
5
4

5.070
4.677
4.275

8.138
9.940
9.025

8
7
5

15.033
7.504
7.336

18.859
16.170
19.208

4.
(4a)
(4b)
(4c)

Column Effects:
Homogeneous
Simple Heterogeneous
Heterogeneous

6
5
4

4.909
4.413
4.365

6.704
7.653
8.105

8
7
5

14.710
7.551
4.287

18.273
18.334
8.650

5.
(5a)
(5b)
(5c)
(5d)

RC Effects:
Homogeneous
Heterogeneous Row
Heterogeneous Col
Heterogeneous Row-Col

5
3
3
2

2.361
1.142
1.714
0.457

2.132
0.802
1.329
0.255

7
4
4
2

11.916
4.496
1.263
0.457

11.091
4.126
0.963
0.255

Table 10.20: Conditional association models for the data in Table 10.19
very useful for modeling contingency tables having one, two, or three ordinal classificatory variables. However, the RC(m)-G model has an advantage over most
other models because of its ability to fit models in more than one dimension and
estimating the row and column scores in individual groups or for the combined
group. For more detailed explanations and merits of the RC(rn)-G models, the
reader is referred to the following references: Clogg (1982), Clogg and Goodman
(1984), Becker and Clogg (1989), Gilula and Haberman (1988), and Gilula, Krieger,
and Ritov (1988).

10.12

Exercises

1. The data below measure the joint distribution of years of education of married
couples in the United States as obtained from the 1972 General Social Survey
(Haberman, 1978).

Distribution of years of education of married couples


Years of
education
of husband

0-11
12
13-15
16f

Years of education of wife

0-11
283
82
20
4

12
141
180
104
52

13-15
25
43
43
41

16+
4
14

20

69

Fit the model of independence and the uniform association model to this data.
For the uniform association model, estimate O(1,2),(3,4). Comment.
2. Refer to the table above:

446

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

(a) Fit the row association model to the data and estimate the r's. Summarize the difference in the years of education distribution of husbands for
the different years of education of wives.
(b) Fit a column effects model to describe how the years of education distribution of wives differs for the years of education of husbands.
(c) Fit the uniform association and R+C association model. Compare these
with those in (a) and (b).
3. Refer again to the data in exercise 5.4; fit the R, RC, and the R+C models
to the data and comment on your results.
4. The data in the table below relate to the initial and follow-up blood pressure
according to hypertension status in the WHO definition (BPD).
Blood Pressure Normal Borderline Elevated
Normal
105
9
3
12
Borderline
10
1
Elevated
2
3
7
Fit both the row and column associations models to the data and comment.
5. The data in the table below are from Clogg and Shockey (1988) and relate to
voting for President Reagan by color and political views.
Political view
Color
1
2
4
6
3
5
7
1 13 44
White
155 92
100
18
1
2
Nonwhite
0
2
0
0
0
41
Other
White
12 57 71 146 61
8
Nonwhite
16 23
31
7
4
6
8
The political views, coded 1-7 range from extremely liberal to extremely conservative. Analyze these data by using scores for the variable "political view."
If you consider vote as a response variable, fit a suitable logit model to these
data based on the results of your earlier analysis.
Vote
Reagan

6. Bachman, Johnson, and O'Malley (1980) present Table 10.21 for two ordinal
variables. A sample of high school seniors in 1979 gave responses to the
classificatory variables: A, how often he or she rides a car for fun (1 = almost
every day, 2 = at least once a week, 3 = once or twice a month, 4 = a few
times a year and 5 = never) and B; drug use (1 = none, 2 = marijuana only,
3 = a few pills, 4 = more pills, 5 = heroin)
Fit the row, column, and RC models to these data. Interpret the estimated
scores for each model and test whether each of them would give a better fit
than the uniform association model.
7. Bachman, Johnson, and O'Malley (1980) also present Table 10.22 for two
ordinal variables. Here again, a sample of high school seniors in 1980 gave
responses to the classificatory variables: A, attitudes toward school (1 = like
very much, 2 = like quite a lot, 3 = like some, 4 = don't like very much, 5 =

10.12.

EXERCISES

447

A
1
2
3
4
5

1
290
402
206
157
81

Drug Use-B
2
3
4
296
196 395
342
148 216
118
58
62
72
27
37
48
27
27

5
29
10
3
2
1

Table 10.21: Source: Clogg & Shockey (1988)


don't like at all) and B, drug use (1 = none, 2 = marijuana only, 3 = a few
pills, 4 = more pills, 5 = heroin) as in above.
Again, fit the row, column, and RC models to these data. Interpret the
estimated scores for each model and test whether each of them would give a
better fit than the uniform association model.
A
1
2
3
4
5

1
199
398
372
72
20

Drug use-B
2
3
95
44
256
119
343 152
91
34
24
13

4
54
162
291
85
51

5
1
6
6
1
1

Table 10.22: Source: Clogg & Shockey (1988)


8. Fit relevant models discussed in this chapter to the rheumatoid arthritis data
in exercise 7 in chapter 5.. Repeat for the data in exercise 11 in chapter 6.
9. Duncan and McRae (1979) present the following data, which relate to the
evaluations of performances of radio and TV networks in 1959 and 1971. The
data previously appeared in Duncan et al., (1973).
Year
1959

1971

Respondents's
race
White
Black
White
Black

Performance networks
Poor Fair
Good
54
253
325
4
23
81
636
600
158
24
144
224

Table 10.23: Radio and TV network evaluation


Analyze this data and draw your conclusions.
10. Refer to the data in exercise 5, chapter 5, treating smoking level as an ordinal
variable. Fit ordinal models to the data and draw conclusions on the nature
of association present in the data.
11. Refer to the data in Table 6.31, treating strenous work as an ordinal variable.
Reanalyze these data using models discussed in this chapter.
12. The data below from Ku and Kullback (1974), which have also been analyzed
by Agresti (1984), relate to a sample of male residents of Framingham, MA,

448

CHAPTER 10. MODELS IN ORDINAL CONTINGENCY

TABLES

aged 40 to 59, which was classified on blood pressure (BP) and serum cholesterol (CH) levels. During a six-year follow up period, they were observed and
classified according to whether they developed coronary heart disease (D),
which is a binary response variable. The variables BP and CH are ordinal
factor variables.
Coronary
heart
disease
Present

Absent

Serum
cholesterol
(mg/lOOcc)
<200
200-219
220-259
> 260
<200
200-219
220-259
>260

Systolic
Blood Pressure (mm Hg)
< 127 127-146 147-166 167+
2
3
8
7

3
2
11
12

3
0
6
11

4
3
6
11

117
85
119
67

121
98
209
99

47
43
68
46

22
20
43
33

Table 10.24: Classification of men by blood pressure, serum cholesterol, and heart
failure
(a) Fit a log-linear model that the log odds of heart disease depends on blood
pressure and cholesterol, treated as categorical variables.
(b) Fit a log-linear model that says that the log odds of heart disease depends
on blood pressure and cholesterol, treated as covariates.
(c) Obtain confidence intervals for the effects of unit increases in blood pressure and cholesterol levels on the odds of heart disease.

Chapter 11

Analysis of Doubly Classified


Categorical Data
11.1

Introduction

In chapter 10, we discussed association models that provide insight into associations present in the general I x J contingency tables having ordered (one or both)
categories. These have been described as either the uniform, row, column, and RC
association models for the two-way tables.
To what extent can we say that a variable is ordinal? In other words, what
is the degree of ordinality of a variable? The variable age, for instance, may have
categories that represent a discretized version of an underlying continuous distribution, while the variable family size has categories that truly reflect an underlying
discrete distribution. On the other hand, we may have nebulous ordinal variables
like the variable having categories: {strongly agree, disagree,...,strongly disagree}
or a variable with categories: {social class 1, social class 2,...}. In the former, a
definite ordering does exist even if respondents all have different conception of the
locations of the relevant cutoff points. In the latter however, the existence of an
underlying continuum for the purportedly ordinal variable is itself open to question.
While the association models discussed in chapter 10 may be appropriate for the
general I x J table, there remains, as Upton (1985) puts it, "one distinct class of
data for which these models are inappropriate. These are the square I x I tables (
ordered or unordered) in which the classificatory variables are intimately related."
Such data usually arise from repeated measures or from longitudinal studies. Square
tables may arise in several different ways. I give below a few cases that may give
rise to such tables.
1. When a sample of individuals or subjects is cross-classified according to two
essentially similar categorical variables (e.g., vision of right and left eyes; strength
of right hand and strength of left hand). In Table 11.1 is an example which relates
unaided distance vision of 7477 women aged 30-39 employed in the Royal Ordnance
factories from 1943 to 1946 (Stuart, 1955).
2. When samples of pairs of matched individuals or subjects (arising from matchedpair design) such as husbands and wives, or fathers and sons are each classified
449

450

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

Right Eye Grade


Best (1)
Second (2)
Third (3)
Highest (4)
Total

Best
(1)
1520
234
117
36
1907

Left Eye Grade


Second Third Worst
(2)
(3)
(4)
266
124
66
1512
432
78
362
1772
205
492
82
179
2222
2507
841

Total
1976
2256
2456
789
7477

Table 11.1: Unaided distant vision as reported in Stuart (1955)


according to some categorical variable of interest, e.g., mobility, migration, religious
affiliation, attained highest qualification, etc. Two data examples that fall into this
category are displayed in Tables 11.2 and 11.3.
Father's
Status

(1)
(2)
(3)
(4)
(5)
Total

(1)
50
28
11
14
3
106

Son's Status
(2)
(3)
(4)
45
8
18
174
84
154
78
110
223
714
150
185
42
72
320
1429
489
459

(5)
8
55
96
447
411

Total

1017

3500

129
495
518
1510

848

Table 11.2: British Occupational mobility data, (Glass, 1955)


where,
1. Professional and high administrative
2. Managerial, executive and high supervisory
3. Low inspectional and supervisory
4. Routine non-manual and skilled manual
5. Semi and unskilled manual).

Residence
at age 16
NE
South
NC
West
Total

Current residence
North
North
East South Central West
263
22
14
13
26
399
36
30
10
41
368
46
1
8
51 4 8
300
470
423
237

Total
312
491
465
162
1430

Table 11.3: Migration data (Haberman, 1978)


3. In panel studies where each individual or subject in a sample is classified according to the same criterion at two different points in time, (e.g., party affiliation,
party loyalty, religious affiliation, etc.). Again two data examples are presented in
Tables 11.4 and 11.5.

11.2. SYMMETRY

MODELS

451
1970

1966
C
LIB
L
IND

Total

C
68
12
12
8
100

LIB
1
60
3
2
66

L
1
5
13
3
22

IND
7
10
2
6
25

Total
77
87
30
19
213

Table 11.4: British Election Study, Upton (1978)


where, C = Conservative, LIB = Liberal, L = Labor, and IND = Independents.
Table 11.5 refers to the religious mobility data for Great Britain (Breen & Hayes,
1996).
Affiliation
at age 16
1
2
3
4
Total

Religious Affiliation Now


1
2
3
4
863
30
1
52
50
320
0
33
1
1
28
1
27
8
0
33
941
359
29
119

Total
946
403
31
68
1448

Table 11.5: Subjects' religious affiliation (1991)


where: 1 = Protestant, 2 = Catholic, 3 Jewish, 4 = None or others.
4. In rating experiments in which a sample of TV individuals or subjects is rated
independently by the same two raters into one of / nominal or ordinal categories.
Thus, the entries in such a resulting / x / table relate to individuals that are jointly
classified into category i by the first rater and category j by the second rater. The
example in Table 11.6 arose from diagnosis of multiple sclerosis (MS) from Landis
and Koch (1977a): 149 Winnipeg patients were examined by two neurologists, one
from New Orleans, and the other from Winnipeg. The two neurologists classified
each patient into one of the following classes: (1 = Certain MS, 2 = Probable MS,
3 = Possible MS, 4 = Doubtful, unlikely, or definitely not MS).
New Orleans
neurologist
1
2
3
4
Total

Winnipeg neurologist
2
1
4
3
1
5
0
38
11
3
0
33
14
5
10
6
7
3
3
10
37
11
17
84

Total
44
47
35
23
149

Table 11.6: Diagnostic classification regarding multiple sclerosis for the Winnipeg
patients

11.2

Symmetry Models

When the single classificatory variable is nominal rather than ordinal, but individuals are jointly classified with this variable over time, like the data in Table 11.5,
where, strictly speaking, the categories of the variable cannot be truly considered

452

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

ordinal, we are usually interested in fitting models of symmetry and all its associated
decompositions or derivatives.
Specifically, we are concerned with associations that exhibit symmetric or pattern about the main diagonal of the table. I shall illustrate this group of models
with the models of complete symmetry, marginal symmetry or homogeneity, quasisymmetry, quasi-independence, and conditional symmetry.

11.2.1

The Complete Symmetry Model

If we let the joint distribution for the square table be given by {7^-} for 1 < ( i , j ) <
/, then the model of complete symmetry (S) has the hypothesized form:
HS : TTij = TTji

f O T l < i < j < I

(11.1)

The log-linear model formulation for this model can be written in the form:
In (my-) = n + \i + Xj + Aij

(11.2)

with \ij = Xji and }> \i j ^ Xj = 0 and \ ^ A^- = 0 j 1 , , / .


i

The MLE under the model of symmetry (S) has

>ii = - /it,

, .

for i = J

where fa denote the corresponding joint observed frequency in the i-ih row and
j-th column. The model S is based on /(/ l)/2 degrees of freedom. Both the
Pearson's and likelihood ratio statistics reduce respectively to the following under
this model:
i=2 j=i

i=l j=l

For the case when / = 2, Pearson's test reduces to McNemar's symmetry statistic,
V2
M

_ (/2i /i2) 2
~ ~~1
j77
/12 +
721

which is used to test the symmetry hypothesis in a 2 x 2 table. The statistic is


based on 1 d.f.

11.2.2

The Quasi-Symmetry Model

The symmetry model (S) rarely fits the data of interest because of its highly structured form. A less restrictive form of the hypothesis in (11.1) assumes that the
symmetric model would have held if it were not for the distorting effect of the marginal totals. The model of quasi- symmetry (QS) has the log-linear formulation of
the form:
ln(m y ) = /i + A? + Af + Agc, for i + j
(11.3)

with \%c = Ajf; ]T Af = ^ Af - 0 and Agc = 0, j = 1, - , / . The model


i

has the additional constraints Af = A^7 for i ^ j.

11.2. SYMMETRY MODELS

453

Model QS was first introduced by Caussinus (1965). The symmetry model is


the special case in which Af = A^7 for i = 1, 2, , / and where R and C relate to
the row and column variables respectively. The model is based on (/ !)(/ 2)/2
d.f and has the likelihood equations:
mi+ = /;+, i = 1,2, ,/
+j = /+j> j = 1,2, ,/
rhij + rhji = fa + fa, for i ^ j
For both models S and QS, it should be obvious that rha f a . We now propose a procedure for implementing these and other symmetry models in SAS^
using PROC GENMOD in the following sections. Our approach here is to fit these
models using the generalized linear modeling capability provided in SAS^ PROC
GENMOD by employing factor and regression generated variables (Lawal, 2001).
All models that will be considered in this chapter shall be applied to the data in
Table 11.1 as an example.

11.2.3

A Nonstandard Log-Linear Model

A nonstandard log-linear model (von Eye & Spiel, 1996) can be written in the
generalized linear model form (Clogg et al., 1990):
(11.4)
where X is a design matrix consisting of Os and Is that are derived from the factor
or regression variable, A is a vector of parameters, i = ln(m), and m is a I2 vector of
expected values under some model. The formulation above allows us to incorporate
various contrasts of interest in the factor variable as well as several other possible
models (von Eye &; Niedermeier, 1999).
To implement the symmetry and other similar models, we need to be able to
generate the appropriate factor or regression variables necessary for their implementation. Kateri (1993) and Kutylowski (1989) have discussed the generation of
factor variables required for the implementation of some of the models being considered in this chapter. Our implementation of the symmetry model here for instance,
is consistent with the procedure proposed in Friedl (1995) except that our factor
variable for the symmetry model is defined differently. Both ours and Friedl have
' ^ ' levels. The implementation of the symmetry model therefore would involve
only this single factor variable, whereas the approach by Kateri and Kutylowski
involves two such factor variables designated as sc_ and ss_ in both their papers.
Further, their programs are written for GLIM.
The factor variable for implementing the symmetry model in our case is generated for the general I x / table from the recurrence relation (Lawal & Sundheim,
2002) as:
Skh = Skh_l + (I + 2-h), for/i = 2 , . . . , ( / - f c )
(11.5)
where k =\ i j | = 0, 1, , ( / ! ) , k is the k-ih diagonal and S% k + 1. For
a 4 x 4 table for instance, k =\ i j |= 0, 1,3. The main diagonal elements have
k = 0 and h 2,3,4. Lawal and Sundheim (2002) have developed a SAS macro
for implementing all the models that are being considered in this chapter. In their
programming in SAS software for instance, the above recurrence relation and hence

454

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED

DATA

the entries for the factor variable S are generated with the following expressions for
all ( i , j ) :

_((k + l)-(i
13

+ l}(\i + !) + (! + 3)(i + 1) - 3 - 21

~ \(k + 1) - (j + l}(\j + !) + (/ + 3)(j + 1) - 3 - 21

if i < j
]fi<j

where k and I are as defined above. We note here that when i = j, then fc = 0
in (11.6). The S factor variable has levels that equal /(/ + l)/2 = 10 for the case
when 7 = 4. Hence, the resulting vector (this is indicated as a factor variable in
SAS software) necessary for implementing the complete symmetry model which is
generated from the above expression for the 4 x 4 table example is:

         1   2   3   4
    S =  2   5   6   7
         3   6   8   9
         4   7   9  10

We may note here that the factor variable defined for S has entries that do not exactly match those generated in Friedl (1995), but the common feature of the two is that each has 10 levels, as expected. The equivalent S' in Friedl is generated from the expression:

    S'_ij = 2^(i-1) + 2^(j-1),   for i, j = 1, 2, ..., I

Both the complete symmetry (S) and the quasi-symmetry (QS) models can be applied to the data in Table 11.1 by employing the GENMOD procedure as follows:

data tab1; I=4;
do r=1 to 4; do c=1 to 4;
  input count @@; k=abs(r-c);
  if r<=c then s=(k+1)-(r+1)*(r+2)/2+(I+3)*(r+1)-3-2*I;   /* S from (11.6) */
  else s=(k+1)-(c+1)*(c+2)/2+(I+3)*(c+1)-3-2*I;
  output; end; end;
datalines;
1520 266 124 66 234 1512 432 78 117 362 1772 205 36 82 179 492
;
proc genmod; class r c s; model count=s/dist=poi;      /* fits the S model  */
proc genmod; class r c s; model count=r c s/dist=poi;  /* fits the QS model */
run;

The above SAS software statements for the complete symmetry model, for example, translate into the log-linear model:

    ℓ_ij = μ + λ_k^S,   with k = S_ij

where ℓ_ij = ln(m_ij) and we impose the usual last-parameter-set-to-zero identifiability constraint on the λs. The above can also be rewritten as:

    ℓ_ij = μ + λ_1 Z_1ij + λ_2 Z_2ij + λ_3 Z_3ij + λ_4 Z_4ij + λ_5 Z_5ij
             + λ_6 Z_6ij + λ_7 Z_7ij + λ_8 Z_8ij + λ_9 Z_9ij + λ_10 Z_10ij

where

    Z_1ij = 1 if S_ij = 1, and 0 elsewhere
    Z_2ij = 1 if S_ij = 2, and 0 elsewhere

The other indicator variables are similarly defined. However, because of the structure of S, there are only 10 + 1 = 11 such parameters, including the common intercept. Thus the equivalent log-linear model in this case reduces to:

    ℓ_11       1 1 0 0 0 0 0 0 0 0 0
    ℓ_12       1 0 1 0 0 0 0 0 0 0 0
    ℓ_13       1 0 0 1 0 0 0 0 0 0 0        μ
    ℓ_14       1 0 0 0 1 0 0 0 0 0 0        λ_1
    ℓ_21       1 0 1 0 0 0 0 0 0 0 0        λ_2
    ℓ_22       1 0 0 0 0 1 0 0 0 0 0        λ_3
    ℓ_23       1 0 0 0 0 0 1 0 0 0 0        λ_4
    ℓ_24   =   1 0 0 0 0 0 0 1 0 0 0        λ_5
    ℓ_31       1 0 0 1 0 0 0 0 0 0 0        λ_6
    ℓ_32       1 0 0 0 0 0 1 0 0 0 0        λ_7
    ℓ_33       1 0 0 0 0 0 0 0 1 0 0        λ_8
    ℓ_34       1 0 0 0 0 0 0 0 0 1 0        λ_9
    ℓ_41       1 0 0 0 1 0 0 0 0 0 0        λ_10
    ℓ_42       1 0 0 0 0 0 0 1 0 0 0
    ℓ_43       1 0 0 0 0 0 0 0 0 1 0
    ℓ_44       1 0 0 0 0 0 0 0 0 0 1

which can be written as:

    ℓ = Xλ                                                        (11.7)
where ℓ_ij = ln(m_ij) has the familiar form in (11.4) and X is the design matrix consisting of 0s and 1s derived from the indicator variables representing the levels of the factor variable S. For instance, columns 2 and 3 represent, respectively, the indicator variables Z_1 and Z_2. When models S and QS are applied to the data in Table 11.1, we have the following expected values and estimated local odds ratios under the symmetry model:
    m̂_ij =  1520.0   250.0   120.5    51.0       θ̂_ij =  36.7718  0.5447  0.4761
              250.0  1512.0   397.0    80.0                 0.5447 16.9994  0.5377
              120.5   397.0  1772.0   192.0                 0.4761  0.5377 23.6497
               51.0    80.0   192.0   492.0

For this model, G² = 19.25 on 6 d.f., and for i < j, θ̂_ij/θ̂_ji = 1.

Clearly, the complete symmetry model (S) does not fit the data. We shall examine this further in section 11.2.5.
The QS model can also be characterized in terms of the symmetry of its odds ratios: it has θ_ij = θ_ji, where θ is the local odds ratio. Because of this property, Goodman (1979a) has described the QS model as the symmetric association model. We illustrate this property below with the expected frequencies under the QS model for the data in Table 11.1, together with its estimated local odds ratios.

    m̂_ij =  1520.000  263.380  133.584   59.036      θ̂_ij =  36.877  0.546  0.477
              236.620 1512.000  418.986   88.394               0.546 17.052  0.539
              107.416  375.014 1772.000  201.570               0.477  0.539 23.709
               42.964   71.605  182.431  492.000
From the above results, we see that for this model, θ̂_ij = θ̂_ji. Further, for example:

    θ̂_13 = (m̂_13 m̂_24)/(m̂_14 m̂_23) = (133.584 x 88.394)/(59.036 x 418.986) = 0.477

while

    θ̂_31 = (m̂_31 m̂_42)/(m̂_32 m̂_41) = (107.416 x 71.605)/(375.014 x 42.964) = 0.477

thus θ̂_13 = θ̂_31. The above result holds for all i and j. For this model, G² = 7.27 on (I - 1)(I - 2)/2 = 3 d.f. The model barely fits the data. Under the QS model,

    ln(m̂_ij/m̂_ji) = α̂_j - α̂_i,   i < j                           (11.8)

The expression in (11.8) therefore leads to the solutions: α̂_2 = 0.1072; α̂_3 = 0.2180; α̂_4 = 0.3178, with α̂_1 = 0.
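As a quick check (our addition, not part of the original text), the solutions to (11.8) can be recovered directly from the QS fitted values shown above; the following PROC IML sketch assumes those expected values:

proc iml;
m = {1520.000  263.380  133.584   59.036,
      236.620 1512.000  418.986   88.394,
      107.416  375.014 1772.000  201.570,
       42.964   71.605  182.431  492.000};
alpha = j(4,1,0);                    /* alpha_1 is fixed at 0        */
do j = 2 to 4;
  alpha[j] = log(m[1,j]/m[j,1]);     /* gives 0.1072, 0.2180, 0.3178 */
end;
print alpha;
quit;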

11.2.4 The Marginal Homogeneity Model (MHM)

When H_S, the symmetry model, holds, we have Σ_j π_ij = Σ_j π_ji, that is, π_{i+} = π_{+i}. The latter is the formulation of the model of unconditional marginal homogeneity (UMH). The model assumes that the marginal totals are symmetric but the body of the table is not. This model is linear and is therefore not log-linear. For the UMH model, the differences Σ_i m_{ik} - Σ_i m_{ki}, k = 1, 2, ..., I, are compared (Forthofer & Lehnen, 1981) with the null value of 0 in order to test UMH. We shall distinguish this from the conditional marginal homogeneity (CMH) model test, which is described below.

We see that the model of complete symmetry implies marginal homogeneity. Specifically, if I = 2, the S and UMH models are equivalent. But for I > 2, while S implies UMH, the converse is not always true. That is, marginal homogeneity does not imply symmetry (S). The UMH model is based on (I - 1) d.f. and is not log-linear. We give the implementation of this model in SAS software below, together with a modified output.
proc catmod data=tab1; weight count;
response marginals; model r*c=_response_ / ml freq; repeated time 2; run;
                   Analysis of Variance
    Source        DF    Chi-Square    Pr > ChiSq
    Intercept      3      78744.17        <.0001
    time           3         11.98        0.0075
    Residual       0             .             .

The goodness-of-fit test statistic for the UMH model is provided in the SAS software output in the "time" line. In this case, the UMH model has G² = 11.98 on 3 degrees of freedom, which indicates that the UMH model does not fit.
Often used, however, is the corresponding conditional marginal homogeneity (CMH) model, which is related to both the symmetry and quasi-symmetry models by the relation:

    S = QS ∩ CMH                                                  (11.9)

A conditional test of marginal homogeneity (CMH), assuming that QS is true, is provided by examining the quantity G²(S) - G²(QS). This conditional test for marginal homogeneity is based on I(I - 1)/2 - (I - 1)(I - 2)/2 = (I - 1) degrees of freedom (d.f.). For this test to be valid, the QS model must hold. In cases where QS is not true, the unconditional test for marginal homogeneity (that is, UMH) should be used. For the data in Table 11.1, since the model QS holds, the conditional marginal homogeneity (CMH) test is based on G² = (19.25 - 7.27) = 11.98 on (6 - 3) = 3 degrees of freedom. This model does not fit the data.
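The conditional test is easily scripted. The following sketch (our construction, not from the text) captures the deviances of the S and QS fits with ODS and differences them; it assumes the tab1 data set created earlier, and the data set names fit_s, fit_qs, and cmh are arbitrary:

ods output ModelFit=fit_s;
proc genmod data=tab1; class r c s; model count=s/dist=poi; run;
ods output ModelFit=fit_qs;
proc genmod data=tab1; class r c s; model count=r c s/dist=poi; run;
data cmh;
merge fit_s (where=(criterion='Deviance') rename=(value=g2_s  df=df_s))
      fit_qs(where=(criterion='Deviance') rename=(value=g2_qs df=df_qs));
g2_cmh = g2_s - g2_qs;              /* 19.25 - 7.27 = 11.98 */
df_cmh = df_s - df_qs;              /* 6 - 3 = 3            */
p_value = 1 - probchi(g2_cmh, df_cmh);
run;
proc print data=cmh; run;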

11.2.5 The Conditional Symmetry Model

We indicated earlier that the symmetry model does not fit the data in Table 11.1. To see why the complete symmetry model does not fit, consider the lower and upper off-diagonal triangles, Δ1 and Δ2, below for the data.

    Δ1 (lower-left, i > j)            Δ2 (upper-right, i < j)
            j = 1    2     3                  j = 2    3     4
    i = 2    234                      i = 1    266   124    66
    i = 3    117   362                i = 2          432    78
    i = 4     36    82   179          i = 3                205

         n_1 = 1010                        n_2 = 1171

The symmetry model, in terms of the two triangles above, provides a test of whether the probability of falling into cell (i, j) of Δ1 is the same as the probability of falling into cell (i, j) of Δ2. This model, however, does not take into account the fact that the overall observed subtotals in the two triangles are not always equal. In our example, these subtotals are 1010 and 1171, respectively. Nonequality of these subtotals can seriously affect the probability of membership in each of the two triangles.

The conditional symmetry (CS) model, on the other hand, remedies this anomaly by testing whether the probabilities of falling into corresponding cells of the two triangles are equal after adjusting for the observed subtotals in the two triangles. That is, we are interested in testing whether the probability of falling in cell (i, j) is the same for both triangles, assuming that the probability of membership in the two triangles is equal. In this case, the expected values are weighted by the proportion of cases in each triangle. Thus the CS model preserves the triangle totals.
The CS model (McCullagh, 1978) can be formulated as:

    π_ij = γ π_ji,   for (i < j)                                  (11.10)

or in logit form as:

    ln(m_ij/m_ji) = ln(γ),   for i < j                            (11.11)

The regression variable required to implement this model in SAS software is given below for a 4 x 4 table as:

          1 1 1 1
    CS =  2 1 1 1
          2 2 1 1
          2 2 2 1


and in general, for an I x I table,

    CS_ij = 1 if i ≤ j;   CS_ij = 2 if i > j
The model is implemented in SAS software with the following:

data tab2; set tab1; cs = 1 + (r > c);     /* CS regression variable */
proc genmod; class s; model count=s cs/dist=poi; run;

The implementation of this model implies that the CS model is a composite model (Lawal & Upton, 1990b; Lawal, 1996). That is, we can modify the nonstandard log-linear model above as:

    ℓ_ij = μ + λ_k^S + τ CS_ij

and the generalized linear model in (11.4) as:

    ln(m) = Xλ + Zυ = X'λ'                                        (11.12)

where X, the design matrix in (11.4), is augmented by adding newly generated column(s) to represent either new factor variable(s) or regression variable(s). For example, the column vector Z here is

    Z = [1 1 1 1 2 1 1 1 2 2 1 1 2 2 2 1]'

with corresponding parameter vector υ = [λ_CS]'. Thus the CS model has the design matrix X modified to include the regression variable representing the CS component. Consequently, the new design matrix X' has 12 columns.
The model, when applied to the data in Table 11.1, gives the expected values:

    m̂_ij =  1520.000  268.455  129.395   54.765      θ̂_ij =  36.974  0.585  0.476
              231.542 1512.000  426.306   85.906               0.505 17.093  0.577
              111.605  367.694 1772.000  206.173               0.476  0.498 23.779
               47.235   74.095  177.827  492.000

Under this model,

    ln(m̂_ij/m̂_ji) = ln(γ̂),   for i < j                           (11.13)

Thus the expected value in cell (2,1) under the CS model is 231.542, which is (231.542/1010 = 0.2292) of the Δ1 subtotal. Similarly, the expected value in cell (1,2) is 268.455, which is (268.455/1171 = 0.2292) of the Δ2 subtotal. We can therefore see that the CS model allows the estimated probabilities to be the same for the two triangles, even though the subtotals are not the same. The above leads to the solution:

    γ̂ = m̂_ij/m̂_ji = 1.1594,   for i < j

That is, the left eye is better than the right eye because γ̂ > 1.
When γ = 1, we have the model of complete symmetry (S). The CS model is based on (I + 1)(I - 2)/2 d.f. and has the likelihood equations:

    m̂_ij + m̂_ji = f_ij + f_ji   and   ΣΣ_{i<j} m̂_ij = ΣΣ_{i<j} f_ij

For this model, G² = 7.354 on 5 d.f. The model fits the data in Table 11.1. We may note here that the complete symmetry model S implies both the CS and the QS models.

11.2.6 The Quasi-Conditional Symmetry Model

The quasi-conditional symmetry (QCS) model has the formulation:

    π_ij = γ φ_j π_ji,   for (i < j)                              (11.14)

where γ is unspecified. A special case of (11.14) is the conditional symmetry model, which has φ_j = 1. For the QCS model:

    π_ij π_jk π_ki = γ π_ji π_kj π_ik,   (i < j < k)

That is, for this model, Θ_(ij,jk) = Θ_(ij,ik) = γ for 1 ≤ i < j < k ≤ I. The QCS model is sometimes described as the extended quasi-symmetry (EQS) model (Tomizawa, 1987). The model is based on I(I - 3)/2 degrees of freedom. Here θ_(ij,st) denotes the odds ratio for the 2 x 2 subtable formed from the i-th and j-th rows and the s-th and t-th columns, that is:

    θ_(ij,st) = (π_is π_jt)/(π_it π_js)

and Θ_(ij,st) = θ_(ij,st)/θ_(st,ij). Estimates of Θ_(ij,st) under some model are obtained from its expected values. Thus,

    Θ̂_(ij,st) = (m̂_is m̂_jt m̂_ti m̂_sj)/(m̂_si m̂_tj m̂_it m̂_js)

Model QS in this context has Θ_(ij,st) = 1 for 1 ≤ i < j ≤ I and 1 ≤ s < t ≤ I. Further, for the local odds ratios formed from adjacent rows (i, i+1) and adjacent columns (j, j+1), θ_ij = θ_ji.

Model QCS is implemented with the following SAS software statements:

data tab2; set tab1; cs = 1 + (r > c);
proc genmod; class r c s;
model count=r c s cs/dist=poi; run;

The model, when applied to the data in Table 11.1, has G² = 6.823 on 2 degrees of freedom. The model does not fit the data.

11.3 The Diagonal-Parameters Symmetry Models

This class of models is appropriate for square tables having ordinal classificatory variables.

11.3.1 The DPS Model

A decomposition of the CS model is Goodman's (1979b) diagonal-parameters symmetry (DPS) model, which has the multiplicative form:

    π_ij = π_ji δ_k,   i < j

where k = j - i and the parameter δ_k represents the odds that an observation falls in cell (i, j) instead of in cell (j, i), k = 1, 2, ..., (I - 1). The CS model has δ_k = γ for all k. Model DPS has the nonstandard log-linear model:

    ℓ_ij = μ + λ_k^S + λ_k^D

and can be implemented in SAS software with the following statements:


proc genmod data=tab1d; class s d;   /* tab1d adds the D factor shown below; see the generation sketch after the display */
model count=s d/dist=poi; run;

where D is the factor variable defined as:

         7 1 2 3
    D =  4 7 1 2
         5 4 7 1
         6 5 4 7

When the model is applied to the data in Table 11.1, we have:

    m̂_ij =  1520.000  269.070  121.401   66.000      θ̂_ij =  36.987  0.626  0.347
              230.930 1512.000  427.284   80.599               0.468 17.099  0.618
              119.599  366.716 1772.000  206.646               0.719  0.462 23.788
               36.000   79.402  177.354  492.000
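The D factor need not be keyed in cell by cell. The following data step is a sketch (our construction, not from the text) that generates it for a general I x I table from the tab1 variables r, c, and I created earlier:

data tab1d; set tab1;
if r < c then d = c - r;                  /* upper diagonals: levels 1,...,I-1  */
else if r > c then d = (I-1) + (r - c);   /* lower diagonals: levels I,...,2I-2 */
else d = 2*I - 1;                         /* main diagonal: level 2I-1          */
run;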

The model has G² = 0.498 on (I - 2)(I - 1)/2 = 3 d.f. If we define τ_{j-i} as the log of the parameter δ_{j-i}, that is, τ_{j-i} = ln(δ_{j-i}), then the maximum likelihood estimates of τ_{j-i} from this model are {τ̂_1 = 0.1529, τ̂_2 = 0.0150, τ̂_3 = 0.6061}. Consequently, we have

    δ̂_1 = 1.1652,   δ̂_2 = 1.0151,   δ̂_3 = 1.8334

Because these parameter estimates are each greater than 1, we can conclude that the left eye vision is better than the right eye vision. These results are consistent with those obtained in Goodman (1972). Because of our parameterization here, our values are the reciprocals of those in Goodman (1972).
The models based on the null hypotheses H01: δ_1 = δ_2 and H02: δ_2 = 1 have, respectively, G² = 2.03 and G² = 0.52, each on 4 degrees of freedom (Goodman, 1979c). Lawal (2001) has provided an alternative form of testing these and other similar null hypotheses for this kind of data. Both models based on H01 and H02 fit the data very well, but we would not encourage making statistical inference based on these models, since they are arrived at after the analysis of the data.
In terms of expected values under this model, therefore, we have:

    ln(m̂_12) - ln(m̂_21) = 0.1529 = τ̂_1
    ln(m̂_13) - ln(m̂_31) = 0.0150 = τ̂_2
    ln(m̂_14) - ln(m̂_41) = 0.6061 = τ̂_3

We may note here that ln(m̂_23) - ln(m̂_32) = ln(m̂_34) - ln(m̂_43) = τ̂_1 and ln(m̂_24) - ln(m̂_42) = τ̂_2.
With τ_{j-i} defined as the log of the parameter δ_{j-i}, we can show that these log parameter estimates satisfy:

    τ̂_{j-i} = (j - i) τ̂_1 - Σ_{t=1}^{j-i} Σ_{s=1}^{t} ln(Δ̂_s)          (11.15)

where ln(Δ̂_1) = 0. Similarly, in terms of the log odds ratios Φ_ij, and using (11.15), we have:

    ln(Δ̂_2) = Φ̂_12 - Φ̂_21 = 0.2907 = 2τ̂_1 - τ̂_2
    ln(Δ̂_3) = Φ̂_13 - Φ̂_31 = -0.7290 = -τ̂_3 + 2τ̂_2 - τ̂_1


Again, we note here that ln(Δ̂_2) = Φ̂_23 - Φ̂_32 = 2τ̂_1 - τ̂_2. The DPS model for any 4 x 4 table, for 1 ≤ i < j < k ≤ I, has, for example:

    Θ_(12,23) = Θ_(23,34) = exp{2τ̂_1 - τ̂_2}
    Θ_(13,34) = Θ_(12,24) = exp{-τ̂_3 + 2τ̂_2 - τ̂_1}

11.3.2 The LDPS Model

Agresti (1983) considered a simpler version of the DPS model, the linear diagonals-parameter symmetry (LDPS) model, in which the ln(δ_k) = τ_k have a linear pattern. The model has the multiplicative form:

    π_ij = π_ji δ^(j-i),   j > i                                  (11.16)

where the log odds that an observation is a certain distance above the main diagonal, instead of the same distance below it, is assumed to depend linearly on the distance (Agresti, 1983). The model is also referred to by Tomizawa (1986) simply as the linear diagonals-parameter (LDP) model, and (11.16) above leads to:

    ln(m_ij/m_ji) = τ (j - i),   for i < j                        (11.17)

where τ = ln(δ). The nonstandard log-linear model formulation for this model is:

    ℓ_ij = μ + λ_k^S + τ F_ij

The LDPS model for the 4 x 4 table, for 1 ≤ i < j < k ≤ I, has:

    Θ_(12,23) = Θ_(23,34) = Θ_(13,34) = Θ_(12,24) = Θ_(12,34) = 1

The model is based on (I + 1)(I - 2)/2 degrees of freedom and can be implemented with the SAS software statements:

proc genmod data=tab1f;   /* tab1f adds the regression variable F; see the sketch below */
class s; model count=s f/dist=poi; run;

where F is a regression variable defined as:

         1 2 3 4
    F =  1 1 2 3
         1 1 1 2
         1 1 1 1
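As with the other generated variables, F can be computed rather than keyed in; the data step below is a sketch (our construction) using the tab1 variables r and c:

data tab1f; set tab1;
if c >= r then f = c - r + 1;     /* distance above (or on) the main diagonal */
else f = 1;                       /* constant below the diagonal              */
run;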

The expected values under this model for the data in Table 11.1 are given below:

    m̂_ij =  1520.000  263.370  133.352   59.121
              236.630 1512.000  418.232   88.532
              107.648  375.768 1772.000  202.268
               42.879   71.468  181.732  492.000

For this model, G² = 7.2804 on (I + 1)(I - 2)/2 = 5 d.f., with ln(δ̂) = 0.1071 and

    ln(m̂_ij/m̂_ji) = τ̂ (j - i) = 0.1071 (j - i),   for i < j

where τ̂ = ln(δ̂) = 0.1071 for these data.

An extension of the LDPS model that employs two ratio parameters (Tomizawa, 1987) is the 2-ratios-parameter symmetry (2RPS) model, which is defined as:

    π_ij = δ^(j-i) η π_ji,   for (i < j)                          (11.18)

Under this model, the MLE m̂_ij satisfies likelihood equations that preserve the symmetric pair totals f_ij + f_ji, the triangle total ΣΣ_{i<j} f_ij, and the distance-weighted total ΣΣ_{i<j} (j - i) f_ij. The model in (11.18) reduces to models LDPS and CS when η = 1 and δ = 1, respectively. The model is based on (I² - I - 4)/2 degrees of freedom and can be implemented with the SAS software statements:

data tab2; set tab1f; cs = 1 + (r > c);
proc genmod; class s;
model count=s f cs/dist=poi; run;

The model, when applied to the data in Table 11.1, has G² = 6.8252 on 4 d.f., log-parameter estimates f̂ = 0.0577 and ĉs = -0.0743 (the sign of the CS coefficient depends on the coding of the CS variable), and expected values:

    m̂_ij =  1520.000  266.470  131.890   57.274      θ̂_ij =  36.932  0.565  0.477
              233.530 1512.000  423.155   87.562               0.525 17.074  0.558
              109.110  370.845 1772.000  204.649               0.477  0.518 23.753
               44.726   72.438  179.351  492.000

11.3.3 The QDPS Model

A further decomposition of the DPS model is the quasi-diagonals-parameter symmetry (QDPS) model, which takes the form:

    π_ij = φ_j π_ji δ_k,   for (i < j)                            (11.19)

where the parameter δ_k represents the odds that an observation falls in a cell (i, j) satisfying j - i = k instead of a cell (j, i) satisfying j - i = k, k = 1, 2, ..., (I - 1). The QDPS model satisfies the expression in (11.15) and has the nonstandard log-linear model form:

    ℓ_ij = μ + λ_i^R + λ_j^C + λ_k^S + λ_k^D

The model is implemented in SAS software with the statements:

proc genmod data=tab1d; class r c s d;
model count=r c s d/dist=poi; run;

For the quasi-diagonals-parameter symmetry (QDPS) model, the expected values and the corresponding odds ratios are:

    m̂_ij =  1520.000  267.923  122.077   66.000      θ̂_ij =  36.962  0.627  0.342
              232.077 1512.000  432.000   79.923               0.467 17.133  0.619
              118.923  362.000 1772.000  203.077               0.731  0.462 23.729
               36.000   80.077  180.923  492.000

For this model, G² = 0.2222 on (I - 2)(I - 3)/2 = 1 d.f., and the log estimates of the D parameters under this model are τ̂_1 = -0.0567 and τ̂_2 = -0.4077.
Following Tomizawa (1987), the log odds ratios satisfy:

    Φ̂_ij - Φ̂_ji = ln(Δ̂_{j-i+1}),   for i < j

That is, the log odds ratio Φ_ij exceeds the log odds ratio Φ_ji by ln(Δ_{j-i+1}), so Φ̂_ij - Φ̂_ji is uniform for i < j whenever j - i is constant. For the data in Table 11.1 in our example, the log odds ratios therefore satisfy:

    ln(Δ̂_2) = Φ̂_12 - Φ̂_21 = 0.2946 = -τ̂_2 + 2τ̂_1
    ln(Δ̂_3) = Φ̂_13 - Φ̂_31 = -0.7596 = 2τ̂_2 - τ̂_1

with, again, ln(Δ̂_2) = Φ̂_23 - Φ̂_32 = 0.2946.
The model has, for the 4 x 4 table, the following:

    Θ̂_(12,23) = Δ̂_2 = exp{-τ̂_2 + 2τ̂_1} = Θ̂_(23,34)              (11.20a)
    Θ̂_(12,34) = Δ̂_3 = exp{2τ̂_2 - τ̂_1}                           (11.20b)
    Θ̂_(13,34) = Δ̂_2 Δ̂_3 = exp{τ̂_2 + τ̂_1} = Θ̂_(12,24)           (11.20c)

11.4 The Odds-Symmetry Models

Two extensions of the conditional symmetry model are the odds-symmetry models I and II (Tomizawa, 1985b), which are defined respectively as:

    H_OS1:  π_ij/π_{i,j+1} = π_ji/π_{j+1,i},   (i < j)            (11.21a)
    H_OS2:  π_ij/π_{i+1,j} = π_ji/π_{j,i+1},   (i < j)            (11.21b)

Following Tomizawa (1985a), model OS1, for instance, indicates that

    the odds that the column value is in j instead of j + 1 in row i in the
    upper-right triangle of the table is equal to the symmetric odds that the
    row value is in j instead of j + 1 in column i in the lower-left triangle
    of the same table.

Model OS2 can be interpreted similarly. For unspecified parameters τ_i and δ_j, the two models can be expressed respectively as:

    H_OS1:  π_ij/π_ji = τ_i,   (i < j)                            (11.22a)
    H_OS2:  π_ij/π_ji = δ_j,   (i < j)                            (11.22b)

The two models are log-linear and are easily programmed in SAS software.


A simple generalization of the odds-symmetry models is the quasi-odds symmetry (QOS) model (Tomizawa, 1985b). Model QOS is equivalent to the adjusted quasi-symmetry (AQS) model described in Bishop et al. (1975), because the model adjusts to preserve the marginal totals of both the lower-left triangle Δ1, i > j, and the upper-right triangle Δ2, i < j, of observed frequencies, rather than the marginal totals of the original contingency table. Under this model, the expected values in cells (1,2), (2,1), (I-1, I), and (I, I-1) are exactly identical to the observed counts. For the QOS model, Θ_(ij,jk) = exp(γ_i) for 1 ≤ i < j < k ≤ I, and this translates into the following:

    Θ_(12,23) = Θ_(12,24) = γ_1
    Θ_(13,34) = Θ_(23,34) = γ_2
    Θ_(12,34) = 1

Models OS1 and OS2 are each based on (I - 1)(I - 2)/2 d.f., while model QOS is based on (I - 2)(I - 3)/2 d.f.

The factor variables for implementing models OS1 and OS2 (Lawal & Sundheim, 2000) may be taken, respectively, as:

           1 2 2 2                 1 2 3 4
    OS1 =  1 1 3 3          OS2 =  1 1 3 4
           1 1 1 4                 1 1 1 4
           1 1 1 1                 1 1 1 1

(any labeling that is constant over the diagonal and lower triangle and that indexes the rows of the upper triangle for OS1, or its columns for OS2, will serve).
Models OS1, OS2, and QOS are implemented in SAS software with the following statements (the os1 and os2 variables are generated in the sketch below):

proc genmod data=tab1os; class s os1; model count=s os1/dist=poi; run;          /* fits OS1 */
proc genmod data=tab1os; class s os2; model count=s os2/dist=poi; run;          /* fits OS2 */
proc genmod data=tab1os; class r c s os1; model count=r c s os1/dist=poi; run;  /* fits QOS */
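The two factors can be generated in a data step; the following sketch is our construction, not from the text:

data tab1os; set tab1;
if r < c then do; os1 = r + 1; os2 = c; end;   /* upper triangle indexed by row (OS1) or column (OS2) */
else do; os1 = 1; os2 = 1; end;                /* diagonal and lower triangle share the baseline level */
run;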

For the odds-symmetry I (OS1) model, the expected values and the corresponding odds ratios are:

    m̂_ij =  1520.000  270.463  130.363   55.174      θ̂_ij =  37.093  0.582  0.476
              229.537 1512.000  424.465   85.535               0.507 17.081  0.574
              110.637  369.535 1772.000  205.000               0.476  0.501 23.759
               46.826   74.465  179.000  492.000

For this model, G² = 7.2637 on 3 d.f., and the log estimates of the parameters under this model are ln(τ̂_1) = 0.1641, ln(τ̂_2) = 0.1386, ln(τ̂_3) = 0.1356. Consequently, for i < j, we have:

    ln(m̂_12/m̂_21) = ln(m̂_13/m̂_31) = ln(m̂_14/m̂_41) = 0.1641 = ln(τ̂_1)
    ln(m̂_23/m̂_32) = ln(m̂_24/m̂_42) = 0.1386 = ln(τ̂_2)
    ln(m̂_34/m̂_43) = 0.1356 = ln(τ̂_3)

Further, the log odds ratios satisfy, for i < j:

    Φ̂_12 - Φ̂_21 = ln(τ̂_2) = 0.1386
    Φ̂_23 - Φ̂_32 = ln(τ̂_3) = 0.1356
    Φ̂_13 - Φ̂_31 = 0.000


Similarly, the odds-symmetry II model, when applied to the data, has G² = 7.2757 on 3 d.f., and the log estimates of the parameters under this model are ln(γ̂_1) = 0.1613, ln(γ̂_2) = 0.1491, ln(γ̂_3) = 0.1282, again leading to:

    ln(m̂_14/m̂_41) = ln(m̂_24/m̂_42) = ln(m̂_34/m̂_43) = 0.1613 = ln(γ̂_1)
    ln(m̂_13/m̂_31) = ln(m̂_23/m̂_32) = 0.1491 = ln(γ̂_2)
    ln(m̂_12/m̂_21) = 0.1282 = ln(γ̂_3)

Again, the log odds ratios satisfy under this model:

    Φ̂_12 - Φ̂_21 = ln(γ̂_3) = 0.1282
    Φ̂_23 - Φ̂_32 = ln(γ̂_2) = 0.1491
    Φ̂_13 - Φ̂_31 = 0.000

The estimated log odds ratios under the OS1 model can be summarized as follows:

    Φ̂_ij - Φ̂_ji = ln(τ̂_{i+1})  if j = i + 1;   0 elsewhere

Similarly, the estimated log odds ratios under the OS2 model satisfy the summary relation:

    Φ̂_ij - Φ̂_ji = ln(γ̂_{I-i})  if j = i + 1;   0 elsewhere

When the quasi-odds symmetry model is applied to the data in Table 11.1, we have G² = 6.7934 on 1 d.f. Obviously, this model does not fit the data. However, the expected values and corresponding estimated odds ratios under this model are given as:

    m̂_ij =  1520.000  266.000  132.814   57.186      θ̂_ij =  36.923  0.561  0.477
              234.000 1512.000  423.186   86.814               0.531 17.074  0.564
              108.186  370.814 1772.000  205.000               0.477  0.512 23.759
               44.814   73.186  179.000  492.000

Note the expected values in cells (1,2), (2,1), (3,4), and (4,3): they correspond exactly to the observed frequencies, as a result of the constraints imposed under this model. Under model QOS, the log parameter estimates are:

    γ̂_1 = 0.0000,   γ̂_2 = 0.0552,   γ̂_3 = 0.0970

and for i < j we have:

    Φ̂_ij - Φ̂_ji = γ̂_{i+1}  if j = i + 1;   0 elsewhere
In Table 11.7 are displayed the goodness-of-fit statistics G² when the models described above are fitted to the 4 x 4 tables indicated.


    Model    d.f.    Table 11.1 G²    Table 11.3 G²
    S          6        19.249           65.081
    QS         3         7.271            3.842
    UMH        3        11.977           57.108
    CMH        3        11.978           61.239
    CS         5         7.354           46.379
    QCS        2         6.823            3.577
    DPS        3         0.498           35.780
    LDPS       5         7.280           37.980
    QDPS       1         0.222            3.160
    2RPS       4         6.825           36.875
    OS1        3         7.264           22.895
    OS2        3         7.276            3.833
    QOS        1         6.793            0.462

Table 11.7: Results of the above models applied to the 4 x 4 data in Tables 11.1 and 11.3

11.5 Generalized Independence Models

For the I x I square contingency table, with row variable denoted by R and column variable denoted by C, let f_ij and m̂_ij denote, respectively, the observed and corresponding expected frequencies associated with the cell in row i and column j. We shall assume here that a multinomial sampling scheme applies to the I x I table, though other sampling schemes may also be assumed without loss of generality. The log-linear model for an I x I table can be written in the form:

    ℓ_ij = μ + λ_i^R + λ_j^C + λ_ij^RC

with the usual identifiability constraints imposed. In this setup, the λ_ij^RC relate to the interaction term in the model. Goodman (1979b, 1985) has considered several models having different structures for λ_ij^RC. The model of independence (O), for instance, has λ_ij^RC = 0 in the formulation above. A nonindependence model assumes the independence model as the baseline and tries to model the interaction structure λ_ij^RC. In this context, therefore, the independence model is often referred to as the null or baseline model. Often, λ_ij^RC is modeled as a function of the local odds ratios θ_ij = (m_ij m_{i+1,j+1})/(m_{i,j+1} m_{i+1,j}) (Goodman, 1979b). Yamaguchi (1990) has also modeled the interaction term in terms of Ω_ij = ln(θ_ij/θ_ji).

If we define Φ_ij = ln(θ_ij) as the log odds ratios, then for doubly classified I x I contingency tables having ordinal categories, Φ has a diagonal pattern for most of the models considered in this section. For instance, the independence model (O) has Φ_ij = 0 for (i, j) = 1, 2, ..., (I - 1). The independence model applied to the data in Table 11.1 gives a G² = 5704.27 on 9 d.f. Clearly, this model does not fit the data. It is well known that the independence model is not adequate for describing the observed frequencies in general I x I ordered tables in which the classificatory variables are intimately related. For example, for the data in Table 11.1, N = 7477 and the diagonal cell counts are f_ii = {1520, 1512, 1772, 492} for i = 1, ..., 4. The diagonal cells therefore account for about 71% of the entire data. Any model that is to explain the observed variation in the data must therefore take cognizance of the diagonal cells. The corresponding expected values for the diagonal cells under the model of independence


are m̂_ii = {503.976, 670.434, 823.484, 88.745}. We see that the diagonal counts are generally underestimated under this model. We may therefore consider a wide range of modifications to this baseline model (Goodman, 1972).

The inadequacy of the independence model is due primarily to the deflated main-diagonal expected counts under the model. Thus, the introduction of a single deflation factor, say ε, applied to the off-diagonal cells has the log-linear model formulation:

    ℓ_ij = μ + λ_i^R + λ_j^C + δ_ij ε                             (11.23)

where Σ λ_i^R = Σ λ_j^C = 0 and δ_ij is defined such that:

    δ_ij = 0 if i = j;   δ_ij = 1 if i ≠ j

The model given by (11.23) is equivalent to the model obtained by introducing an inflation factor, say ε, applied to the main diagonal cells (Goodman, 1972). In this case, δ_ij takes the form:

    δ_ij = 1 if i = j;   δ_ij = 0 if i ≠ j
The model described by either method has been termed the constant loyalty or uniform loyalty model; Goodman refers to it in the context of social mobility as the uniform inheritance model, while Scheuren and Oh (1975) name it the smoothed quasi-independence model. In the context of modeling agreement data, it has been described as the model of exact agreement. The model has one parameter more than the model of independence. Consequently, the model is based on (I - 1)² - 1 = I(I - 2) degrees of freedom. Following Lawal and Upton (1990b), the model will be designated here as model L.

Since we are going to employ nonstandard log-linear formulations to fit all the models discussed in this chapter, the nonstandard log-linear form for model L is:

    ℓ_ij = μ + λ_i^R + λ_j^C + φ L_ij                             (11.24)

where L is a regression variable, defined as:

         2 1 1 1
    L =  1 2 1 1
         1 1 2 1
         1 1 1 2

If the value of ε is not constant down the main diagonal, then we have the variable loyalty or nonuniform loyalty model. This model is also well known as the quasi-independence model and is designated model Q, with ε replaced by ε_i in the log-linear formulation in (11.23). The model has I extra parameters (ε_i, i = 1, 2, ..., I) relative to the baseline (O) model. Hence it has (I - 1)² - I = (I² - 3I + 1) degrees of freedom. Model Q is more familiarly known as the Mover-Stayer model (Upton & Sarlvik, 1981), or the model of quasi-perfect mobility discussed in Bishop et al. (1975). The parameters ε_i were termed by Goodman (1969) the new indices of immobility. When applied to the data in Table 11.1, models L and Q have G² = 492.465 and 199.106 on 8 and 5 degrees of freedom, respectively. Neither model fits the data.


11.5.1 The Uniform Association Model

The uniform association (U) model has the multiplicative form:

    m_ij = α_i β_j θ^(ij)

and both the odds ratios and log odds ratios are expressed as:

    θ_ij = θ   and   Φ_ij = ln(θ)

The model has one more parameter (viz., θ) than the O model and thus has I(I - 2) degrees of freedom. For this model all local odds ratios are equal, and there is said to be a uniform local association for all cells in the square table. The model is sometimes referred to as the uniform diagonals model. It is implemented in SAS software by defining a regression variable U = i*j. Models L, Q, and U are implemented in SAS software with the following statements, respectively:

data tab1l; set tab1;
u = r*c;                                /* U regression variable           */
L = 1 + (r = c);                        /* loyalty variable: 2 on diagonal */
if r = c then q = r; else q = I + 1;    /* Q factor: diagonal levels 1-4, off-diagonal level 5 */
proc genmod; class r c; model count=r c L/dist=poi; run;
proc genmod; class r c q; model count=r c q/dist=poi; run;
proc genmod; class r c; model count=r c u/dist=poi; run;

Model U, when fitted in SAS software to Table 11.1, gives a G² = 1818.870 on 8 d.f. and θ̂_ij = 1.3582 for all (i, j). Hence, the estimated log odds ratio under model U becomes Φ̂_ij = 0.3062.

11.6 Diagonal Models

In this section, we consider the class of nonindependence models, which can be classified into three different groupings. These three groupings of diagonal models are described below (Upton, 1985):

(i) The principal diagonal class models, where Φ_ij = 0 unless i = j.

(ii) The diagonal band class models, where Φ_ij = 0 unless |i - j| ≤ 1.

(iii) The full diagonal class models, which need not have zero terms but preserve features of the original structure of the simpler models.

We describe below some of the models that belong to the classes enumerated above.

11.6.1 The Principal Diagonal Models

Both the fixed-distance and variable-distance models (Goodman, 1972, 1979b; Haberman, 1978) belong to this class of models. The fixed-distance model is equivalent (in the context of social mobility) to the vertical mobility model "V" in Hope (1982), which represents vertical distances traveled from the "origin" class to the "destination" class.

11.6.2 The Fixed and Variable Distance Models

The fixed-distance model, with constant or fixed distance parameter δ (that is, adjacent categories have constant distances between them), has for an I x I table the multiplicative form:

    m_ij = α_i β_j δ^k,   for k = |i - j|

The model, following Lawal and Upton (1990b), is usually designated model F. For this model, the structure of the log odds ratios is given by:

    Φ_ij = δ* if i = j;   Φ_ij = 0 if i ≠ j                       (11.25)

where δ* denotes the common diagonal log odds ratio. We note here that the U model is equivalent to the above model if we replace k = |i - j| by the product i*j. The corresponding log-linear formulation for the model is:

    ln(m_ij) = μ + λ_i^R + λ_j^C + λ_k^F,   (i, j = 1, 2, ..., I)  (11.26)

The model has one more parameter (δ) than the O model and hence is based on I(I - 2) degrees of freedom. Model F has the nonstandard log-linear formulation:

    ℓ_ij = μ + λ_i^R + λ_j^C + τ F_ij

where F is the regression variable defined earlier under the (LDPS) model.
The variable-distance model is also a member of the principal diagonal class of models and has the multiplicative form:

    m_ij = α_i β_j Δ_ij

where

    Δ_ij = Π_{k=i}^{j-1} δ_k  if i < j;   Δ_ij = Π_{k=j}^{i-1} δ_k  if i > j;   Δ_ij = 1  if i = j

That is, δ_1, δ_2, ..., δ_{I-1} are the distances from categories 1 to 2, 2 to 3, ..., and (I - 1) to I, respectively; the model thus allows different intervals among the categories. The structure of the log odds ratios under this model, for some δ_k, 1 ≤ k ≤ (I - 1), can be derived as:

    Φ_ij = δ_i* if i = j;   Φ_ij = 0 elsewhere                    (11.27)

This variable-distance model (V) is based on (I - 1)(I - 2) degrees of freedom. The equivalent log-linear formulation of this model is similarly given by (11.4), except that the interaction term now takes the form, for some δ_k, 1 ≤ k ≤ (I - 1):

    λ_ij^RC = Σ_{k=i}^{j-1} λ_k^δ  (i < j);   λ_ij^RC = Σ_{k=j}^{i-1} λ_k^δ  (i > j)


where the δ = {δ_1, δ_2, ..., δ_{I-1}} are the distances from categories 1 to 2, 2 to 3, ..., and (I - 1) to I, respectively. Model V also has the nonstandard log-linear model formulation:

    ℓ_ij = μ + λ_i^R + λ_j^C + Σ_{k=1}^{I-1} λ_k^V V_kij

where the V_1, ..., V_{I-1} are factor variables defined for a 4 x 4 table as:

          1 2 2 2          1 1 2 2          1 1 1 2
    V1 =  2 1 1 1    V2 =  1 1 2 2    V3 =  1 1 1 2
          2 1 1 1          2 2 1 1          1 1 1 2
          2 1 1 1          2 2 1 1          2 2 2 1

In general, for an I x I table, we need to construct (I - 1) such patterned factor variables to implement model V in SAS software. Models F and V are implemented in SAS software with the following statements (the v1-v3 variables are generated in the sketch below):

proc genmod data=tab1f; class r c;
model count=r c f/dist=poi; run;                 /* fits the fixed-distance model    */
proc genmod data=tab1v; class r c v1-v3;
model count=r c v1-v3/dist=poi; run;             /* fits the variable-distance model */
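The patterned factors V1-V3 can likewise be generated rather than keyed in. The following data step is a sketch (our construction): V_k is set to level 2 whenever cell (i, j) spans the boundary between categories k and k + 1, and to level 1 otherwise:

data tab1v; set tab1;
array v{3} v1-v3;
do kk = 1 to 3;
  if min(r,c) <= kk & kk < max(r,c) then v{kk} = 2;   /* cell crosses boundary kk */
  else v{kk} = 1;
end;
drop kk;
run;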

The log-parameter estimates when PROC GENMOD in SAS is applied to the data in Table 11.1 are given by δ̂* = 2.6964 for model F, and v̂_1 = 1.5671, v̂_2 = 1.1941, v̂_3 = 1.3276 for model V. Consequently, the estimated log odds ratios under models F and V, using equations (11.25) and (11.27), become:

    Φ̂(F) =  2.696 0.000 0.000      Φ̂(V) =  1.567 0.000 0.000
             0.000 2.696 0.000               0.000 1.194 0.000
             0.000 0.000 2.696               0.000 0.000 1.328

The estimates of Φ satisfy the condition required for classification as principal diagonal class models.

Alternatively, if the variables V1, V2, V3 for model V are treated as quantitative (regression) variables, that is, omitted from the CLASS statement of PROC GENMOD, then the nonstandard log-linear model can be written as:

    ℓ_ij = μ + λ_i^R + λ_j^C + Σ_{k=1}^{I-1} ln(δ̃_k) V_kij

and the structure of the log odds ratios in this case takes the form:

    Φ_ij = i ln(δ̃_i) if i = j;   0 elsewhere

for i = 1, 2, ..., (I - 1). In this formulation, the log parameter estimates are given as:

    ln(δ̃_1) = 1.5671,   ln(δ̃_2) = 0.5970,   ln(δ̃_3) = 0.4425

Substituting these values into the above expression for Φ_ij leads to the same results obtained earlier. Models F and V applied to the data in Table 11.1 give G² = 255.7083 on 8 d.f. and G² = 204.6620 on 6 d.f., respectively. Again, neither model fits the data.

11.6.3 The Diagonal Band Models

This class of models has:

    Φ_ij = 0 unless |i - j| ≤ 1

The uniform loyalty, the quasi-independence, and the triangles-parameters models belong to this class. We discuss these models in what follows.



The Uniform Loyalty Model

The constant loyalty model (L) discussed earlier belongs to this class of models. In the language of mobility, the model differentiates those who do not change from those who do. That is, it differentiates between the diagonal and off-diagonal cells, where the diagonal members are assumed to be homogeneous, that is, to have the same probability of inheritance.

The model has the nonstandard log-linear formulation in (11.24), and in general, for any square table, the structure of the log odds ratios under model L becomes (Goodman, 1969):

    Φ_ij = 2 ln(φ)  if i = j;   -ln(φ)  if |i - j| = 1;   0  elsewhere

where ln(φ) is the log estimate of the parameter φ. When PROC GENMOD is applied to the data in Table 11.1, ln(φ̂) = 1.9107 and the estimated log odds ratios become:

    Φ̂_ij =   3.821  -1.911   0.000
             -1.911   3.821  -1.911
              0.000  -1.911   3.821
The Quasi-Independence Model

The quasi-independence (Q) model, described earlier and reformulated below (Tomizawa, 1992), belongs to this class of models:

    m_ij = α_i β_j ψ_ij,   where ψ_ij = 1 for i ≠ j

The corresponding log-linear model formulation is given by:

    ℓ_ij = μ + λ_i^R + λ_j^C + δ_ij ε_i

where δ_ij is as defined previously. Model Q is sometimes described as the symmetric diagonal band model and, as discussed earlier, is based on (I² - 3I + 1) degrees of freedom. The model has the nonstandard log-linear form:

    ℓ_ij = μ + λ_i^R + λ_j^C + λ^Q

where Q is a factor variable defined as:

         1 5 5 5
    Q =  5 2 5 5
         5 5 3 5
         5 5 5 4

If we therefore let q = {q_1, q_2, q_3, q_4, q_5} be a vector of parameters (of length I + 1), then it can be shown (Lawal, 2002b) that the estimated log odds ratios under this model can be obtained from the following expression:

    Φ̂_ij = q̂_i + q̂_{i+1}  if i = j;   -q̂_{min(i,j)+1}  if |i - j| = 1;   0  for |i - j| > 1

Note that q̂_{I+1} = 0, and the estimates of the log-parameter vector q when the model is implemented in SAS software are given by:

    q̂ = {2.9381, 1.2610, 1.5326, 2.4439, 0}

From the general expression above, the estimated log odds ratios under model Q are calculated as:

    Φ̂_ij =   4.199  -1.261   0.000
             -1.261   2.794  -1.533
              0.000  -1.533   3.977

11.6.4 The Triangles Parameters Model

Goodman (1985) described the nonindependence triangles (T) model, where

    m_ij = α_i β_j γ_ij

with

    γ_ij = τ_1  for i > j;   τ_2  for i < j;   1  for i = j

The τ parameters pertain to the lower-left and upper-right triangles of the square table. The model has been described as the asymmetric diagonal band model (Goodman, 1972). It has the nonstandard log-linear form:

    ℓ_ij = μ + λ_i^R + λ_j^C + λ^T

where T is a factor variable defined for a 4 x 4 table as:

         1 1 1 1
    T =  2 1 1 1
         2 2 1 1
         2 2 2 1

The structure of the estimated log odds ratios Φ̂_ij can be written succinctly as:

    Φ̂_ij = τ̂_1 + τ̂_2  if i = j;   -τ̂_2  if j = i + 1;   -τ̂_1  if i = j + 1;   0  elsewhere

The log parameter estimates when this model is implemented in SAS software are {τ̂_1 = 2.0240, τ̂_2 = 1.7996}. Hence, using the above expressions, the estimated log odds ratios become:

    Φ̂_ij =   3.824  -1.800   0.000
             -2.024   3.824  -1.800
              0.000  -2.024   3.824

Model T has two more parameters than model O. Consequently, it is based on (I² - 2I - 1) degrees of freedom. The estimates of Φ for models L, T, and Q all satisfy the condition for classification as diagonal band class models, since Φ = 0 for |i - j| > 1. We can again implement this class of models in SAS software with the following statements:


data tab1t; set tab1l;
t = 1 + (r > c);                          /* T factor: level 2 below the diagonal */
proc genmod; class r c; model count=r c L/dist=poi; run;     /* fits model (L) */
proc genmod; class r c t; model count=r c t/dist=poi; run;   /* fits model (T) */
proc genmod; class r c q; model count=r c q/dist=poi; run;   /* fits model (Q) */

Models T and Q have, respectively, G² = 488.0967 and G² = 199.1062 on 7 and 5 degrees of freedom. Again, neither model fits the data in Table 11.1.

11.7 The Full Diagonal Models

Several models have been developed to take into account the diagonal symmetry of square tables with ordinal categories. We assume here that changing status by one step in either direction on the scale has a different probability from that for two steps, and so on for greater distances. Goodman (1972) introduced the diagonal (D) model, which has:

    m_ij = α_i β_j δ_k,   for i ≠ j                               (11.28)

where k = i - j. In this model, we assume that changes of status behave like a random walk (with drift), with steps taken with different probabilities in each direction. The model is sometimes called the asymmetric minor diagonal model. If we consider the 2 x 2 subtable formed from adjacent rows (i, i + 1) and adjacent columns (j, j + 1), then the odds ratio for this subtable becomes, using (11.28) (Goodman, 1969):

    θ_k = δ_k² / (δ_{k+1} δ_{k-1})                                (11.29)

with k = (i - j). Hence, taking logarithms,

    Φ_k = ln(δ_k²/(δ_{k+1} δ_{k-1})) = 2 ln(δ_k) - ln(δ_{k+1}) - ln(δ_{k-1})     (11.30)

where the δ's are the parameters of the model. There would be (2I - 3) such distinct log odds ratios, corresponding to the (2I - 3) diagonals of an I x I table. These log odds are conveniently labeled Φ_s, s = -(I - 2), ..., (I - 2). Because there are (2I - 3) parameters for this model, let us define a parameter vector d as d = {d_1, d_2, ..., d_{2I-1}}, which for our 4 x 4 example becomes:

    d = {d_1, d_2, d_3, d_4, d_5, d_6, d_7}

The model has the nonstandard log-linear form:

    ℓ_ij = μ + λ_i^R + λ_j^C + λ_k^D

where D, for the 4 x 4 Table 11.1 example, is the same D factor variable presented in our discussion of the DPS model. Although the factor variable D implies that there are (2I - 1) parameters to be estimated, two of these parameters are redundant, since there are only (2I - 3) distinct parameters (diagonals). Consequently, parameters d_6 and d_7 are set to zero in the SAS software implementation, and the estimates of the distinct parameters of the model are:

    d̂_1 = -2.5734,  d̂_2 = -4.6413,  d̂_3 = -5.7242,  d̂_4 = -0.6090,  d̂_5 = -0.4042

with d̂_6 = d̂_7 = 0.000. The estimated log odds ratios under model D satisfy:


    Φ̂_ij = -{d̂_1 + d̂_I}                                 if i = j
    Φ̂_ij = -{d̂_{k+1} - 2d̂_k + d̂_{k-1}}                  for i < j and k = j - i
    Φ̂_ij = -{d̂_{I+k'} - 2d̂_{I+k'-1} + d̂_{I+k'-2}}       for i > j and k' = i - j

where k = 1, 2, ..., (I - 2) and similarly k' = 1, 2, ..., (I - 2), with d̂_0 = 0. In applying the expression for Φ̂_ij in the i > j case, we must take d̂_{I-1} = 0 for the expression to work. Thus k = (j - i) = 1, 2 correspond to the log parameter estimates {d̂_1, d̂_2, d̂_3} = {-2.5734, -4.6413, -5.7242} for i < j. Similarly, k' = (i - j) = 1, 2 correspond to parameters {d̂_4, d̂_5, d̂_6} = {-0.6090, -0.4042, 0.000} for i > j, respectively, with d̂_7 = 0. For this model, therefore, the estimated log odds ratios are given by:

    Φ̂_ij =   3.182  -0.506  -0.980
             -0.814   3.182  -0.506
             -0.200  -0.814   3.182

For instance, Φ̂_21 and Φ̂_31 have k' = 1 and k' = 2, respectively. Hence,

    Φ̂_21 = -(d̂_5 - 2d̂_4 + d̂_3) = -{-0.4042 - 2(-0.6090) - 0.000} = -0.814
    Φ̂_31 = -(d̂_6 - 2d̂_5 + d̂_4) = -{0.0 - 2(-0.4042) + (-0.6090)} = -0.1995

In general, d̂_{2I-2} = d̂_{2I-1} = 0 under this model.

11.7.1 The Diagonals-Absolute Model

The diagonals-absolute (DA) model has the multiplicative form:

    m_ij = α_i β_j ψ_k,   for i ≠ j                               (11.31)

where k = |i - j|. The model has been referred to as the symmetric minor diagonal model (Lindsey, 1989; Fingleton, 1984), where the categories are assumed to be ordered and changing categories one step in either direction on the scale has a different probability from that for two steps, and so on for greater distances. The nonstandard log-linear model for this model is:

    ℓ_ij = μ + λ_i^R + λ_j^C + λ_k^DA

where DA is a factor variable defined as:

          4 1 2 3
    DA =  1 4 1 2
          2 1 4 1
          3 2 1 4

For model DA, there are I parameters to be estimated, and if we denote such a parameter vector for a 4 x 4 table as v = {v_1, v_2, v_3, v_4}', then the estimated log odds ratios Φ̂_ij are given by:

    Φ̂_ij = -2v̂_1  if i = j;   2v̂_k - v̂_{k+1} - v̂_{k-1}  if k = 1, 2, ..., (I - 2)

where k = |i - j| and v̂_0 = v̂_I = 0. When this model is fitted in SAS software to the data in Table 11.1, the log-parameter estimates are:


    v̂ = {v̂_1, v̂_2, v̂_3, v̂_4} = {-1.5895, -2.5286, -2.8288, 0.0000}

The estimated log odds ratios under model DA therefore become:

    Φ̂_ij =   3.179  -0.650  -0.639
             -0.650   3.179  -0.650
             -0.639  -0.650   3.179

Models D and DA have, when applied to the full table, (I - 2)² and (I - 1)(I - 2) degrees of freedom, respectively (Goodman, 1986a). The models are implemented in SAS software with the following statements (the da variable is generated in the sketch below):

proc genmod data=tab1d; class r c d; model count=r c d/dist=poi; run;      /* fits model (D)  */
proc genmod data=tab1da; class r c da; model count=r c da/dist=poi; run;   /* fits model (DA) */
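The DA factor follows a one-line rule, so it too can be generated in a data step; the sketch below is our construction, paralleling the earlier ones:

data tab1da; set tab1;
if r = c then da = I;           /* main diagonal: level I                */
else da = abs(r - c);           /* symmetric diagonals: levels 1,...,I-1 */
run;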

The uniform association model U described earlier is a member of this class of models, as it has Φ_ij = φ for all (i, j). For this model all local odds ratios are equal, and there is said to be a uniform local association for all cells in the square table. The model has I(I - 2) d.f. and is sometimes referred to as the uniform diagonals model.

11.7.2 Composite Models

None of the models O, U, L, F, V, T, D, and DA described above can alone completely explain the complex variation in most occupational mobility or similar data we are likely to come across. We therefore consider second-order or third-order combinations of these models. These combinations, or simply composite models, generally fit such data better than the individual models. We examine some of the composite models that have received attention. The composite model approach has been successfully employed by Goodman (1972, 1979a, 1986a,b, 1991), Upton (1985), Lawal and Upton (1990b), and Lawal (1992d), among others. For instance, the model DAT employed in Goodman (1986a) is a combination of models DA and T, fitted as shown in the sketch below.
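A composite model is fitted by simply listing its component terms together in the MODEL statement. The following sketch is our construction; it assumes a data set (here called tab1all) that collects the factor and regression variables generated in the preceding sketches:

proc genmod data=tab1all; class r c da t;
model count=r c da t/dist=poi; run;      /* model DAT = DA + T */
proc genmod data=tab1all; class r c;
model count=r c L f/dist=poi; run;       /* model LF = L + F   */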
    Model   d.f.      G²        Model   d.f.      G²
    O        9      5704.27     Q        5       475.63
    U        8       463.45     QU       4        43.36
    F        8        27.47     QF       4        13.66
    UF       7        27.25     LF       7        26.45
    L        8       921.49     UL       7        52.37
    V        6        19.22     QV       4        13.66
    UV       5        17.11     VDA      4         6.07
    LV       5        13.66     VT       4        12.54
    DA       6        23.32     QDA      3         5.72
    D        4        96.36     QD       1         2.02
    VD       2         2.38     DAT      5        21.91
    T        7       907.67     QT       4       466.39
    UT       6        51.18     FT       6        25.18

Table 11.8: Results of analysis (up to first order only) for Table 11.1
Similarly, model LF, which combines the loyalty and fixed-distance models, has been described as the loyalty-distance model by Upton and Sarlvik (1981). They found that the model works very well with voting data. The model is based on (I² - 3I + 1) d.f. Similarly, model VT is a combination of models V and T, while model UV is a combination of U and V. Model UV, for instance, has been shown to be the most parsimonious model for analyzing the 5 x 5 British social mobility data (Lawal, 1992d) in Table 11.2.

Other composite models are models fitted to the off-diagonal cells (i ≠ j) of square tables, which are prefixed here by Q. Some of these are the QO, QV, QUV, QFT, QVT, QT, and QU models. Thus model QUV is the composite model UV fitted to the off-diagonal cells. Model QO, for instance, is the quasi-independence model, while model QV has been described by Goodman (1972) as the crossing-parameter model. We present in Tables 11.8 and 11.9 the results of applying all the models considered in this chapter to the 4 x 4 and 5 x 5 data in Tables 11.1 and 11.2, respectively.
    Model   d.f.      G²        Model   d.f.      G²
    O       16      792.1901    Q       11      235.782
    U       15       79.4411    QU      10       14.079
    F       15       93.0318    QF      10       17.791
    UF      14       55.6923    LF      14       66.613
    L       15      475.973     UL      14       54.241
    V       12       38.196     QV       9        7.736
    UV      11       10.762     VDA      9       10.448
    LV      11       14.528     VT      10       12.222
    DA      12       54.034     QDA      8       12.804
    D        9      105.428     QD       5        9.486
    VD       6        6.910     DAT     11       52.115
    T       14      463.752     QT      10      227.933
    UT      13       52.346     FT      13       64.393

Table 11.9: Results for the 5 x 5 British social mobility data in Table 11.2
We notice from the results that the fixed-distance and variable-distance models fitted to the off-diagonal cells, that is, models QF and QV, are equivalent for 4 x 4 tables, even though the two models are based, respectively, on I(I - 3) and (I - 2)² degrees of freedom. When I = 4 both models therefore have the same number of degrees of freedom, hence the equivalence in this case. However, real differences between the two models can be observed in tables with I ≥ 5. For 3 x 3 tables, model QF does not exist, while model QV does exist and is based on 1 degree of freedom.

Parsimonious models for each of the tables discussed at the beginning of this chapter can be found from one or more combinations of the models discussed so far. For instance, for the data in Table 11.1, while model VDA, with G² = 6.07 on 4 d.f., would be the most parsimonious under this group of models, we recall that the conditional symmetry and LDPS models give G² = 7.354 and 7.2804, each on 5 degrees of freedom. Clearly, model LDPS is the most parsimonious for the data in Table 11.1.
We also give below some of the models that are equivalent, for 4 x 4 tables only, to models discussed earlier:

    D = DPS
    QD = QDPS
    QDAT = QCS
    QDA = QS

11.8 Classification of All the Above Models

All the symmetry-based and generalized independence models discussed so far in this chapter can be classified into three categories, namely:

(a) Nonindependence models: This group of models has the independence model (O) as its baseline model. Consequently, the independence model is often described for this group as the null model. Belonging to this group are models L, F, V, Q, T, D, and DA. Goodman (1986a) refers to this group as the nonindependence models. Each of the above models can be applied to the off-diagonal cells of the table; in such cases the models carry the prefix Q, the exception being model Q itself.

(b) Asymmetric models: A model that captures deviations from the symmetry model is described as an asymmetric model. Such a model has the symmetry model as its baseline and, consequently, the symmetry model is referred to in this case as the null (O) asymmetry model (Goodman, 1985). Belonging to this group are models CS, LDPS, DPS, and 2RPS; they are the composite models S+T, S+F, S+D, and S+T+F, respectively. Also belonging to this category are the odds-symmetry models I and II, that is, models OS1 and OS2, which are again the composite models S+OS1 and S+OS2, respectively. In all these cases, the symmetry model (S) acts as the baseline model.

(c) Skew-symmetric models: Yamaguchi (1990) introduced the skew-symmetry level models, which are characterized by deviations from the QS model; following Yamaguchi (1990), the QS model is described as the null skew-symmetric model (O). That is, the QS model is the baseline model for this category. Belonging to this group are models QS, QDPS, QCS, and QOS. Model QS, the null model for this group, has been described by Goodman as the RC asymmetry model. Models QDPS, QCS, and QOS are the composite models QS+D, QS+T, and QS+OS1, respectively. Yamaguchi (1990) has described the QCS model, for instance, as the uniform skew-symmetric level model or the triangles-parameter skew-symmetry (SPSK) model. Similarly, model QOS is described as the middle-value-effect skew-symmetry model, designated the MSK model in Yamaguchi (1990). The QDPS model, on the other hand, has been described as the diagonals-parameter skew-symmetry model, which Yamaguchi designates the DPSK model. We present the equivalent models as described in Yamaguchi (1990):



    QCS = SPSK
    QDPS = DPSK
    QOS = MSK

We may note here that the models in this category are equivalent to the symmetry + nonindependence models discussed in Goodman (1985). Parsimonious models for the data in Tables 11.1, 11.2, and 11.3 are presented in Table 11.10 under the three classifications given above.
             Non-Ind                  Asymmetric              Skew-symmetric
    Table   Model  d.f.    G²        Model  d.f.    G²       Model  d.f.    G²
    11.1    QDA     3      5.72      DPS     3     0.498     QDPS    1     0.222
    11.2    UV     11     10.762     CS      9    10.346     QCS     5     2.697
    11.3    LF      7      9.37      OS2     3     3.833     QS      3     3.842

Table 11.10: Parsimonious models for the three sets of data


Lawal and Sundheim (2002) have written a SAS macro that will fit all the models
discussed in the previous sections for any given / x / contingency table.

11.9 The Bradley-Terry Model

Consider a group of individuals asked to compare all possible pairs of I items, stating which is preferred, that is, to make comparisons between all possible pairs of such items. Such a situation could arise in a rating experiment where, for instance, a cigarette smoker may be asked to rate I brands of cigarettes for taste. For given pairs of cigarette brands, a rater could probably state a preference after smoking them on the same occasion. Data collected this way are often referred to as pairwise comparison data. Such data result in square tables showing how many individuals prefer each brand of cigarettes as opposed to each other brand. The two variables of interest here can be described (Lindsey, 1989) as prefer and not prefer, each having I categories (the number of brands or items to be compared). We are generally interested in ranking the brands in order of "global preference" for the group of individuals. The ranks are obtained from the number of positive preferences expressed.

A model that is suitable for preference data is that proposed by Bradley and Terry (1952). The model assumes that each item or brand has a probability π_i of being preferred. Thus the probability that brand B_i is preferred to brand B_j is:

    P(B_i > B_j) = Π_ij = π_i / (π_i + π_j)                       (11.32)

where Π_ij is the conditional probability that brand i is preferred to brand j, π_i > 0 (i = 1, 2, ..., I), and Σ_i π_i = 1. Further, we shall assume that Π_ij + Π_ji = 1. The model assumes independence of the ratings of the same pair by different judges and of different pairs by the same judge.
If we let x_ij be the observed number of times that B_i is preferred to B_j in the comparisons, then the expected frequencies under the Bradley-Terry model, denoted here by m̂_ij, are given by the following expression (Fienberg & Larntz, 1976):

    m_ij = n_ij π_i / (π_i + π_j)                                 (11.33)

where n_ij = x_ij + x_ji, and the estimates of the expected frequencies must satisfy:

    m̂_ij + m̂_ji = x_ij + x_ji = n_ij,   for i ≠ j                 (11.34)
The above constraints suggest that the Bradley-Terry model can be implemented by fitting the quasi-symmetry (QS) model to the generating I x I table having zeros on the main diagonal. Such a model is based on (I - 1)(I - 2)/2 degrees of freedom. That is, the Bradley-Terry model is equivalent to a quasi-symmetry model fitted to the square table having zeros on its main diagonal. To implement this model in SAS software, we can consider two approaches:

1. The first approach fits the B-T model by simply fitting the quasi-symmetry model to the I x I table, using the same approach we discussed earlier.

2. Alternatively, we can fit a model having a preference factor variable together with the equivalent S-factor design matrix discussed earlier for the 6 x 6 table. The advantage of this approach is that we can readily recognize the order of preference from the estimates of the parameters of the preference factor variable.

11.9.1 Example 11.1

The table below is from Andersen (1980). The table relates to preferences expressed for a series of six collective facilities in a Danish municipality. The data were originally analyzed in Lindsey (1989).

                        Not preferred
    Preferred    1    2    3    4    5    6
    1            -   29   25   22   17    9
    2           49    -   35   34   16   14
    3           50   42    -   40   22   15
    4           54   43   37    -   33   16
    5           61   61   54   44    -   27
    6           69   64   63   62   51    -

Table 11.11: Preference for collective facilities in Denmark


Table 11.11 above is a 6 x 6 table having zeros on its main diagonal. Since the Bradley-Terry model is equivalent to the model of quasi-symmetry fitted to this table, we can implement the B-T model with the following SAS software statements:

data terry;
do pref=1 to 6;
do notp=1 to 6;
input count @@; output; end; end;
datalines;
0 29 25 22 17 9 . . . 69 64 63 62 51 0
;
data two; set terry; input bt s @@;
datalines;
0 1 1 2 2 3 3 4 4 5 5 6 . . . 5 6 9 11 12 15 14 18 15 20 0 21
;
run;
***Fits QS Model****;
proc genmod; make 'obstats' out=aa; class pref notp s;
model count=pref notp s/dist=poi link=log obstats; run;
***Fits PREF+SYMMETRY Model****;
proc genmod; where pref ne notp; make 'obstats' out=bb;
class pref notp bt s;
model count=pref s/dist=poi link=log obstats; run;

where, for a 6 x 6 table, the factor variables S and BT are defined as:

         1  2  3  4  5  6            0  1  2  3  4  5
         2  7  8  9 10 11            1  0  6  7  8  9
    S =  3  8 12 13 14 15     BT =   2  6  0 10 11 12
         4  9 13 16 17 18            3  7 10  0 13 14
         5 10 14 17 19 20            4  8 11 13  0 15
         6 11 15 18 20 21            5  9 12 14 15  0

Instead of the S factor variable, we could also use the BT symmetric factor variable generated above, which has zeros on its main diagonal and is constructed so that it has a number of levels equal to the number of possible pairings, that is, I(I - 1)/2, which equals 15 in this case.

The QS model, when applied to the data, gives a G² = 6.0721 on 10 degrees of freedom. The {PREF+BT} model also gives G² = 6.0721 on the same degrees of freedom. The two procedures lead to exactly the same results.

The Bradley-Terry model fits the data well, but in order to determine the order of ranking or preference, we need to compute the estimated probabilities for each preference given by (11.33). The expected frequencies under these models are displayed in Table 11.12.
                                Not preferred
    Preferred     1        2        3        4        5        6
    1             -      30.071   25.007   23.272   15.016    8.634
    2           47.929     -      34.158   31.798   21.203   12.912
    3           49.993   42.842     -      40.907   24.531   15.541
    4           52.728   45.202   36.093     -      27.005   17.158
    5           62.984   55.797   51.469   49.995     -      26.754
    6           69.366   65.088   62.459   60.842   51.246     -

Table 11.12: Expected values under the Bradley-Terry model

11.9.2 Computing Estimated Probabilities

From the constraints above, it is not too difficult to see that:

    m̂_ji / m̂_ij = π̂_j / π̂_i

Therefore, if we let

    ω_i = Σ_{j≠i} (m̂_ji / m̂_ij),   for i = 1, 2, ..., I

then the estimated probabilities under the Bradley-Terry model are given by:

    π̂_i = 1 / (1 + ω_i)                                          (11.35)

For the data example above, using (11.35), we have the following expression to estimate π_1, for instance:

    1 - π̂_1 = (47.929/30.071 + 49.993/25.007 + 52.728/23.272 + 62.984/15.016 + 69.366/8.634) π̂_1
            = (1.5939 + 1.9991 + 2.2657 + 4.1945 + 8.0342) π̂_1
            = (18.0875) π̂_1

Hence, ω̂_1 = 18.0875, and

    π̂_1 = 1/19.0875 = 0.0524
The table below gives the values of u; and the corresponding estimated probabilities.
i
1
2
3
4
5
6

Ui
18.0875
10.9755
8.5480
7.4244
3.5506
1.3758

Ki
0.0524
0.0835
0.1047
0.1187
0.2198
0.4209

From the magnitudes of the estimated probabilities in the above table, we can
therefore conclude that the preferences are ranked in the order (6, 5, 4, 3, 2, 1),
that is, in the same order as they are presented in Table 11.11 above, with facility
6 being the most preferred.
Alternatively, from the PREF+SYMMETRY fit, we have the following parameter estimates from SAS software for factor variable Preference (pref).
Parameter
Intercept
pref
pref
pref
pref
pref
pref

DF

1
2
3
4
5
6

1
1
1
1
1
1
0

Estimate

3
-2
-1
-1
-1
-0
0

9366
0837
6175
3910
2658
6499
0000

Standard
Error

0
0
0
0
0
0
0

1246
1652
1576
1554
1541
1519
0000

ChiSquare

997
159
105
80
67
18

54
13
39
17
44
31

Pr > ChiSq

.0001
.0001
.0001
.0001
.0001
.0001

Relative to category 6 of variable PREF, we see that the magnitudes of the parameters are in the order (0, -0.65, -1.26, -1.39, -1.62, -2.08). These estimates retain
the order as the data was presented again. This approach is much simpler since
we do not need to compute probabilities to determine the order of preference or
ranking.

482

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

The conditional probabilities 11^ can be computed from either the expected
frequencies or estimated probabilities. For example,
ra,t,J

TTi + TTj

"

TTlij + '

For the data in example 11.1, we have for instance,


^
7T2
0.0835
7T2+7T3

0.1882

77123

34.158

77.000

= 0.444
= 0.444

Below are the computed conditional probabilities H^ for the data in Table 11.11.
0.6145
0.6666
0.6938
0.8075
0.8893

11.9.3

0.3855 0.3334
0.4436
0.5564
0.5870 0.5313
0.7246 0.6772
0.8345 0.8008

0.3062 0.1925 0.1107 "


0.4130 0.2754 0.1655
0.4687 0.3228 0.1992
0.3507 0.2200
0.6493
0.3430
0.7800 0.6570

Example 11.2

The following example relates to the matches for five professional women tennis
players in 1988 (Agresti, 1990).

Winner
Graf
Navratilova
Sabatini
Evert
Shriver

Graf
0
2
0
1

Navratilova
1
0
2
0

Loser
Sabatini
3
3
1
1

Evert
2
3
2
1

Shriver
2
2
1
0
-

Table 11.13: Results of 1988 tennis tournament for five women players
The Bradley-Terry model fits the data well with G2 value of 8.0071 on (5 - 1)(5
2)/2 = 6 degrees of freedom. The estimated probabilities are:
Player
Graf
Navratilova
Sabatini
Evert
Shriver

TTt

0.3477
0.3151
0.1264
0.0826
0.1281

Based on the above estimates of the probabilities, there is a need to re-rank the
players. The new ranking, which must take cognizance of the estimated probabilities
would be: (Graf, Navratilova, Shriver, Sabatini, and Evert) in that order.
The corresponding RANK parameter estimates from SAS software output when
the alternative method is employed is displayed in the following:

11.10.

MEASURES OF AGREEMENT

Intercept
rank
rank
rank
rank
rank

1
2
3
4
5

DF

Estimate

Standard
Error

1
1
1
1
1
0

-0,.4977
0.,9988
0..9003
-0,.0131
-0,.4386
0.,0000

1,.0711
0,,8994
0,.9485
0,.8766
0.,9788
0.,0000

483
ChiSquare Pr > ChiSq

0,.22
1.,23
0,.90
0,.00
0,.20

0..6422
0..2668
0,.3425
0,.9881
0.,6541

The above log estimates relate to (Graf, Nav, Sabb, Evert, Shriver) = (0.999, 0.900,
0.013,0.439, 0). Based on the magnitude of these estimates, obviously, there is
a need to re-rank these estimates in order of magnitude. Consequently, the new
ranking would be in the order (0.999, 0.900, 0, -0.013, -0.439), that is, in the
order (Graf, Navratilova, Shriver, Sabbatini, Evert). This ranking is consistent
again with that obtained from the estimated probabilities.
The estimated conditional probabilities based on the original ranking are given
as:
0.5246 0.7334 0.8080 0.7308
0.4754
0.7137 0.7923 0.7110
0.2666 0.2863
0.6048 0.4967
0.3921
0.1920 0.2077 0.3952
0.2692 0.2890 0.5033 0.6079
Thus the predicted probability that Graf would beat Navratilova in a match in 1988
is 0.5246, while the probability that Navratilova will defeat Graf in the same match
is 1 0.5246 = 0.4754. Similar interpretations could be given to the other estimated
predicted probabilities. These models are implemented in SAS software with S and
RANK defined for the 5 x 5 appropriately as in previous sections.
Lawal (2002a) has applied the Bradley-Terry model to the 1984-1993 10-year
season results from the 14 teams in the American Professional Baseball, comprising of teams from both Eastern and Western divisions. The model incorporates
home field advantage into the Bradley-Terry model for each of the 10 years. Logit
and Poisson regression approaches were employed to model the usual "home field"
advantage. The results extend those provided in Agresti (1990).

11.10

Measures of Agreement

We consider here the case in which a sample of N individuals or subjects is rated


independently by the same two raters (A and B), to one of I nominal or ordinal
categories. The responses fij of the two raters can then be used to construct a twoway 7x1 contingency table with the main diagonal cells representing the agreement
between the two raters. The fij relate to the number of subjects jointly classified
into category i by rater A and category j by rater B. Thus maximum agreement
occurs when both raters give the same categorical response. For nominal scale
of measurement, a table may display association (which is the dependency of one
categorical level in A on another in B) and yet low or high agreement. For example,
if on an ordinal scale of measurements objects are consistently rated one level higher
than rater B, then we would expect the association to be strong but agreement to
be weak.
Several authors have argued the case for the often-called chance-agreement effect, where for example two raters A and B employ different set of criteria for

484

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

classifying objects. In such a case, the observed agreement will be said to be primarily due to chance. We shall be concerned here with observed agreement for the
beyond-chance situations.We discuss below two measures of agreement.

11.10.1

Two Measures of Agreement

The first measure of agreement is the kappa measure of agreement, proposed by


Cohen (1960), which measures the proportion of agreement between two raters. The
kappa is an adjustment for agreement by chance, as defined under independence.
The measure is defined in terms of observed frequencies as:
Ar v*
f \~*
f f/4-7
J-v
_
f/ -* JTa
"H
z/ Ji J/i-u
* i v r
/ 1 1 ^tf^\
fc

/ ^1 J 1 v J i~^

As pointed out by Tanner and Young (1985), the numerator in the expression for
kappa above can be rewritten as:
fa - rhu)

where rhu = lilL

which shows that k is based on the sum of differences between the observed and
expected cell counts on the main diagonal of the table with the expected counts
being those obtained under the model of independence. Thus k employs the model
of independence as a baseline for measuring the beyond chance agreement between
two raters. Thus an observed cell count will be considered discrepant if it is significantly different from the corresponding expected cell count under the model of
independence.
K ranges from oo < K < 1. I t i s O when the observed and expected by chance
alone amounts of agreement are equal, and it equals 1 when there is complete
agreement between the raters. Kappa will be positive if the observed agreement is
greater than chance agreement and will be negative if the observed agreement is
less than the chance agreement. The asymptotic variance of kappa is given (Fleiss
et al., 1969) as:

where u; = ^pi+p+i,
i

pi+

N(l-u>2)
= ^P/ty, and p^ = fy/N.
3

Jolayemi (1990a) also proposed a measure of agreement, denoted here by r,


where 1 < r < 1, and defined as:
f V A where
.
X2
(n-37)
A is an R2-type statistic, and X2 is the value of Pearson's goodness-of-fit test
statistic under the model of independence. He classified the agreement as being
poor or almost perfect as follows:
0.00-0.20 Poor
0.21 - 0.40 Slight
0.41 - 0.60 Moderate
0.61 - 0.81 Substantial
> 0.81 Almost perfect

11.10.

MEASURES OF AGREEMENT

485

The r measure of agreement has been demonstrated to be better than the K measure
for not too large sample sizes.

11.10.2

Example 11.3

The data in Table 11.14 below relate self-reporting of crimes as reported in San
Jose, CA, from the Law Enforcement Assistance Administration (1972). The data
were reported in Bishop et al. (1975). The 5 x 5 table is a cross-classification of
original police descriptions of a sample of crimes versus the victims' categorization
of the crimes based on recall
Police
categorization
Assault
Burglary
Larceny
Robbery
Rape
Totals

Assault
33
0
0
0
5
38

Victim's recall
Burglary Larceny Robbery
0
91
12
0
0
103

5
0
0
54
0
59

0
2
56
6
0
64

Rape

Totals

1
0
0
0
25
26

39
93
68
60
30
290

Table 11.14: Law Enforcement Assistance Administration data.


For the above data: V /,+/+ = [39 x 38 H

+ 26 x 30] = 19733. Hence,

. 75110 - 19733

K=
~2
= 0.8603
290 - 19733
Similarly, a model of independence fitted to the above data yields an X"2 = 872.6969,
and hence,
872.6969
A =
4 x 290
= 0.7523
From the above, r \/0.7523 = 0.8674. This estimated value of r agrees very
closely with the K estimate.
There is therefore strong evidence of a very strong agreement (almost perfect)
between the characterization of the crimes by the police and the victims. In order
words, there is a strong agreement between the police and victims in their characterizations of the crimes than if the characterizations were independent. The characterizations are dependent, since the model of independence yields an X2 = 872.6969
on 16 degrees of freedom. The SAS software implementation of the above result is
presented below together with a relevant output.
data agree;
do pol=l to 5; do vic=l to 5;
input L Q sym cs count <8<8; output; end; end;
datalines;
2 2 1 0 33
2 6 15 0 25
proc freq; weight count; tables pol*vic/agree; run;

Statistic

Kappa Statistics
Value
ASE
95"/, Confidence Limits

Simple Kappa
0.8603
Weighted Kappa
0.8465
Sample Size = 290

0.0236
0.0312

0.8140
0.7854

0.9066
0.9076

486

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

Other models that are considered for the data in order to explain fully the dependent
structure in the data are the symmetry model, the conditional symmetry model and
the unconditional marginal homogeneity model (UMH). The symmetry model gives
a G2 value of 26.0851 on 10 d.f. The conditional symmetry also gives a G2 value
of 18.5134 on 9 d.f., while the UMH gives a value of 22.4182 on 4 d.f. The quasisymmetry model is very difficult to fit to this data because of so many zeros. The
data clearly provide strong evidence against marginal homogeneity.
Apart from the independence model, which assumes independent ratings by the
two raters, the model below measures the structure of the overall agreement in the
table and has the log-linear model formulation:
In (rriij) = /z + Xf + \f + 8^1(1 = j)

(11.38)

where I(i = j) is an indicator function. The case in which 8ij = 6 has (11.38) being
described as the homogeneous agreement model (HA), which measures the overall
agreement in the table. The model has two components, the first three terms on the
RHS representing chance and the last term representing agreement. The model is
very sensitive only to discrepancies that may be present on the main diagonal. The
model is sensitive only to discrepancies that may be present on the main diagonal.
Melia and Diener-West describe the parameter 6 in model (11.38) as the exact
agreement term, and the model has been referred to as the exact agreement model.
This model when implemented is equivalent to the uniform loyalty model (Upton
fe Sarlvik, 1981) or the equal weight agreement model (Tanner & Young, 1985) and
has G2 = 76.7793 on 15 degrees of freedom. The estimate of parameter 5 is 3.570,
with asymptotic standard error (a.s.e. = 0.2124) indicating very strong agreement
along the main diagonals of the table.
A model similar to the above but that have the 6ij 5; is the familiar quasiindependence or nonuniform loyalty model (Q) discussed earlier. This model when
applied to data has G2 = 67.5917 on 11 degrees of freedom. The quasi-independence
model when used in the context of agreement of raters can help us assess patterns
of agreement (Tanner & Young, 1985). Tanner and Young describe this model as
the differential weight agreement model. For the data in Table 11.14, neither model
HA or Q fits the data.

11.10.3

The Case of Ordinal Categories

While models HA and Q may be appropriate for situations in which the categories
are nominal, they are unsuitable for situations in which the response variables are
ordinal, as, for example, in a rating experiment. This is because, while both models
may account for the high concentration of observations on the main diagonal, they
do not account for the off-diagonal cells where disagreement between the raters are
manifested. Further, with ordered categories, high ratings by rater A are almost
always accompanied by high ratings by rater B . Similarly, low ratings by A are also
accompanied by low ratings by B. In this case, Agresti (1990) suggests that we might
consider the beyond-chance agreement model as consisting of two components. The
first component relates to the linear-by-linear association between the raters, and
a second component that reflects agreement in excess of what we would normally
expect by chance from the linear-by-linear baseline model.

11.10.

MEASURES OF AGREEMENT

487

(a) The first component which is the linear-by-linear association component between the raters is modeled as:
\n(mij) =n + X? + \f -i-puiUj
(11.39)
where {HI} are ordered fixed scores assigned to the levels of the ordinal scale
such that HI < U2 < < uj. With integer scores, then, Ui = i, and the model
in (11.39) becomes the linear-by-linear association (LL) model in Goodman
(1979a). The model is based on /(/ 2) degrees of freedom. The model
is equivalent to the uniform association model (U) when integer scores are
employed. As Agresti (1996) observed, the parameter (3 in equation (11.39)
relates to the direction and strength of the association between A and B. The
model in (11.39) reduces to the independence model when [3 = 0, and if (3 > 0,
then the association is positive, that is, A increases as B increases. Similarly,
when (3 < 0, then the association is negative, and A decreases as B increases.
Further, the association is stronger as \/3\ increases.
(b) The second component reflects agreement in excess of what we would normally
expect by chance from the linear-by-linear baseline model. Agresti (1990)
suggests a log-linear model of the form
In (rriij) = LL + AtA + Xf + (3uiUj + 8I(i = j)
(11.40)
This model which is based on (J2 27 1) degrees of freedom is referred to
as the parsimonious quasi-symmetry (PQS) model. A generalization of the
above model occurs when we do not have homogeneity along the diagonals,
in which case 6 = 6i for i = 1 , 2 , - - - ,/. In this case, the log-linear model
formulation becomes
In (rriij) = fj. + Xf + \f + fiuiUj + 8ijl(i = j }
(H-41)
The model expressed by (11.41) has been described by Goodman (1979a) as
the quasi-uniform association (QUA) model and it is based on 1(13) degrees
of freedom.
Another model that is often employed is the ordinal quasi-symmetry (OQS) model,
which has the log-linear model formulation (Agresti, 1996):
In (rriij) = n + Xf + Xf + \ff

+ /3Ui

(11.42)

where \fjB = A^"4 for all i and j. For this model,


Af - Af = 0Ui

It is obvious that this model is equivalent to the quasi-symmetry model when (3 = 0.


When (3 > 0, then the responses are more likely to be at the low end of the ordinal
scale for the column variable than for the row variable, and that when (3 < 0, the
mean response will be higher for the column variable (Agresti, 1996).
The structure of the estimated log odds ratios formed from adjacent rows and
column categories (that is, local odds ratios) under models (11.39) and (11.40) are
given respectively by:
fc = 4, for (i, j) = 1,2, - . , / - 1
(11.43)
and

488

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED

DATA

26 + p i f ( i = j)
-* + P if | i-.7 |=1
(11.44)
/3 elsewhere
On the other hand, the odds ratios from 2 x 2 subtable formed from rows i and j
and columns i and j is defined as:
n

9n =

TflnTfljj

. . .

for % / i
rriijinji

In this case, the corresponding log odds, given in terms of parameter estimates for
models (11.39), and (11.40), respectively, are:
$ij = (ui - urf/3
2

and

4ij = (m - Uj) p + 26

(11.45a)
(11.45b)

where (i, j) = 1,2, , ( / 1). Following Darrock and McCloud (1986), two categories i and j, are said to be indistinguishable if Oij 1 and Oik Qjk for all k ^ i,j.
Consequently, the index of distinguishability which is based on the odds-ratios is:
Vij = 1 - 0--1

(11.46)

Clearly, Vij is maximum when there is perfect agreement. For most cases 0 < Vij <
1.
We may observe here that there is a relationship between the expected values
under model (OQS) as expressed in (11.42) and to those obtained from the the
linear-diagonal parameters model (LDPS) expressed in (11.17). Consequently, for
4 x 4 tables, models OQS and LDPS are equivalent. That is,
OQS = LDPS
Application of the LDPS model therefore, gives direct estimate of the (3 parameter
in model OQS. The model can also be implemented in SAS software by the use of
PROC LOGISTIC.

11.10.4

Example 11.4

As an example, we consider the data below which arose from diagnosis of multiple
sclerosis, reported in Weslund and Kurland (1953). Sixty-nine New Orleans patients
were examined by two neurologists, one from New Orleans, and the other from
Winnipeg. The two neurologists classified each patient into one of the following
classes: (1) certain multiple sclerosis; (2) probable multiple sclerosis; (3) possible
multiple sclerosis (odds 50:50); and (4) doubtful, unlikely, or definitely not multiple
sclerosis.
Employing the methodology described in the previous sections, the complete symmetry (S) when applied to the above data gives G2 = 11.9483 on 6 d.f. The
conditional symmetry (CS) model also gives a G2 value of 6.3575 on 5 d.f., while
the quasi-symmetry (QS) model gives a G2 = 2.0367 on 3 d.f. Since the categories
are considered ordered, then a conditional marginal symmetry model is obtained
(since the conditional symmetry model holds) as: G2 = 11.9483 6.3575 = 5.5908
on 1 d.f. The conditional test is valid because the CS model holds. The symmetry,
the conditional symmetry and the QS models all fit these data. The marginal homogeneity hypothesis, however, is not tenable for the data. The symmetry model

11.10.

MEASURES OF AGREEMENT

489

Winnipeg neurologist
New Orleans
neurologist
1
2
3
4
Totals

1
5
3
2
1
11

2
3
11
13
2
29

3
0
4
3
4
11

0
0
4
14
18

Totals
8
18
22
21
69

Table 11.15: Diagnostic classification regarding multiple sclerosis for the New Orleans patients
indicates that mis- classification by either pathologist were compensating misclassification (n^ approximately equals to n^) for all i ^ j. Also QS fitting the data
similarly indicates that the association is roughly symmetric.
Let us now focus on implementing the log-linear models discussed in the previous
sections to the data above. These models can be implemented in SAS software with
the following statements.
data agree;
do new=l to 4; do win=l to 4; input L Q sym cs count fflffl;
ul=new; vl=uin; output; end; end;
datalines;
2 2 1 0 5 ... 25 10 0 14
****Fit exact agreement (HA) model****;
proc genmod; class new win s q; model count=new win L/dist=poi link=log; run;
****Fit QUA model****;
proc genmod; class new win s q; model count=new win Q/dist=poi link=log;
****Fit linear-by-linear (U) model****;
proc genmod; class new win s q; model count=new win ul*vl/dist=poi link=log;
****Fit PQS model****;
model count=new win ul*vl L/dist=poi link=log; model count=ul S/dist=poi link=log;
****Fit OQS model****;
run;

When the models discussed above are each applied to the data in Table 11.15, we
display the results of these fits in Table 11.16.
Models
O
HA
Q
U
PQS
QUA

OQS
DA

d.f.

9
8
5
8
7
4
5
6

G'z
46.2621
29.2252
10.1855
8.8430
8.8367
4.0184
4.2904
6.4601

X'2
44.0662
26.2960
8.1690
10.4662
10.3516
4.2060
3.7510
6.9508

Table 11.16: Results of fitting the models to the data in Table 11.15
The exact agreement model (HA) fits poorly from the results in Table 11.16. The
parsimonious quasi-symmetry (PQS) or the "linear-by-linear+exact agreement"
model fits the data well with the following log parameter estimates:

490

Parameter
(3
6

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED

d.f.
1
1

Estimate
1.0412
0.0277

s.e.
0.2971
0.3487

Wald 95% Confidence


limits
1.6234
0.4589
-0.6557
0.7111

Chisquare
12.28
0.01

DATA

Pr>X2
0.0005
0.9367

We notice that while the effect of the (3 parameter is significant, that of the covariate
is not. The model has a G2 value of 8.8367 on 7 d.f. Since the last term is not
significant, we can drop this term from the model and refit the reduced model. The
reduced model , which is the uniform association model (U), gives a G2 value of
8.8434 on 8 d.f. (this is the most parsimonious model) with J3 = 1.0556 (a.s.e. =
0.2360), and 9 = e1'0556 = 2.874, for adjacent rows and column categories, indicating
that the odds that the diagnosis of New Orleans neurologist is i + 1 rather then i
is estimated to be 2.874 higher when the diagnosis of the Winnipeg neurologist is
j + I than when it is j for all cases in which | i j \= 1. Similarly, the odds that
the diagnosis of New Orleans neurologist is i + 2 rather then i is estimated to be
(2.S74)2 = 8.260 higher when the diagnosis of the Winnipeg neurologist is j +1 than
when it is j in this case. Lawal (2003b) discusses the case when the 6 parameter is
significant (see exercise 6 at the end of this chapter).
Another approach to measuring agreement when categories are ordinal is to
obtain a different measure of agreement different from K. The K that we discussed
earlier is most useful for nominal categories. An alternative measure with ordinal
categories is the weighted kappa, KW, which is defined (Spitzer et al., 1967) by:
__ E E ^tjTTij - E Uij

where 0 < uij = 1 (i j)2/(I - I) 2 < 1. The following results are obtained when
this is implemented in SAS software.
set agree;
proc freq; weight count; tables new*win/chisq agree; run;
Test of Symmetry
Statistic (S)
DF
Pr > S

Statistic
Simple Kappa
Weighted Kappa
Sample Size = 69

9.7647
6
0.1349
Kappa Statistics
Value
ASE
95'/. Confidence Limits
0.2965
0.4773

0.0785
0.0730

0.1427
0.3341

0.4504
0.6204

For the above data, kw = 0.4773. Under the model of independence, X2 = 44.0662
and G2 = 46.2641. Hence r = v/44.0662/(3 x 69) = 0.4614. The measure of
agreement based on r indicates that the agreement between the two neurologists is
moderate.

11.11

Multirater Case

We now extend the theory developed in the earlier sections to the case in which
we have multiple raters. The simplest of this is the case involving three raters.

11.11.

MULTIRATER CASE

491

The data in Table 11.17 below relate to the degree of necrosis for 612 tumors as
cross-classified by three raters with grading scale: 1, none; 2, <10%; 3, >10%. The
data, a 2 x 3 x 3 contingency table are again taken from Melia and Diener-West
(1994).

Rater A
1

Rater B
1
2
3

1
315
14
3

Rater C
2
105
22
0

3
13
3
1

1
2
3

33
8
0

16
16
2

2
7
4

1
2
3

5
1
0

6
3
4

1
4
24

Table 11.17: Degree of necrosis of tumor in 612 eyes cross-classified by three raters
The PQS model, in (11.40) for example, can be extended to the case where we have
three raters. In this case, we would have:
c
In (rriijk) = ^ + Af + Af + A

i = k)
+

(11.48)
+

The /i, /2, ^3 pertain to exact agreement between pairs of raters A and B; A and C;
and B and C, respectively, beyond that due to the linear-by-linear association while
4 describes the additional exact agreement among all the three raters. The /'s are
created by the following expressions.
2 ifi = j
0 otherwise

2 ifj = k
0 otherwise

otherwise

otherwise

We have adopted the scores Ui = i 2, that is, scores centered at zero for these
data set so that we can compare our parameter estimates with those obtained in
Melia and Diener-West (1994).
We present below the log-linear formulations of the models described in Melia
and Diener-West (1994).

492

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA


(Ml)

In (mijk) = H + Xf + Xf + Af + /i/(i - j) + I2l(i = k)


+ kl(j = k) + lj(i = j = k]

In (mijk) = n + Xf + Xf + Xk + Pi
In .

\ ,, _l_ \A , \B

, \C ,

_k) H + \ 4-Aj 4-A fc +

In (mijk) = V 4 Af 4 Af 4- \k +

(M2)

4 (32UiUk +
j = k) + lj(i =j = k)

(M3a)

4 02UiUk 4- (33UjUk +

(M3b)

4-

(M5a)

i = k) + I3l(j = k)
In

- p- + X + X + Xk +

In (rriijk) = fi + Af 4- Xf + Xk 44- hl(i = j) 4- I2li = k


In (

4-

+ (32UiUk + /33UjUk 4- k,j,kl(i, j, k))


(M5b)
+ /32UiUk +

(j = k)

(M5c)

4- (32UiUk +

+ kl(i = j) + I2l(i = k) + I3l(j = k) + lj(i = j = k)


(M5d)
Model (M5b), which assumes homogeneity in pairwise /, has l(i,j, k) defined in this
case by:
l ( i , j , k ) = h(i = j) +I2(i = k) + I3(j = k)

11.11.1

Results

In Table 11.18 are the results of employing the models in (Ml) to (M5d).
Model
Ml
M2
M3
(a)
(b)
M5
(a)
(b)
(c)
(d)

d.f.
20
16

Deviance
384.3596
114.0314

AIC
-

17
16

28.3525
23.0422

-5.6
-9.0

14
15
13
12

19.1596
15.5831
9.6941
9.5375

-8.8
-14.4
-16.31
-14.40

Table 11.18: Comparison of log-linear models for the data in Table 11.17
The results in Table 11.18 are consistent with those in Melia and Diener-West
(1994). The column labeled AIC refers to the Akaike information criterion for
selecting the most parsimonious models among those models that fit the data. These
values are presented for only models that fit our data. It is obvious from the values of
the AIC that model (M5c) fits best. That is, model (M5c) is the most parsimonious
model for the data in Table 11.17. Relevant parameter estimates under this model
are presented in Table 11.19.

11.11.

Parameter

01
02
03
04
ll
h

493

MULTIRATER CASE
d.f.
1
1
1
1
1
1
1

Estimate
0.5689
0.9383
1.0040
0.4997
0.6613
-0.0837
0.3627

Standard
error
0.2310
0.2294
0.2334
0.1695
0.2093
0.1891
0.1930

Wald 95%
confidence limits
0.1161
1.0217
0.4887 1.3880
0.5465
1.4614
0.1675
0.8318
0.2511
1.0715
-0.4543
0.2868
-0.0157
0.7410

Chisquare
6.06
16.73
18.50
8.69
9.98
0.20
3.53

Pr > Chisq
0.0138
<0.0001
<0.0001
0.0032
0.0016
0.6578
0.0603

Table 11.19: Analysis of parameter estimates


The SAS software implementation of these models discussed above is carried out
with the following SAS software program.
data agree;
do A=-l to 1 BY 1; do B=-l to 1 BY 1; do C=-l to 1 BY 1;
input COUNT <8ffl;
IF A EQ B THEN Dl=2;
ELSE Dl=l;
IF A EQ C THEN D2=2;
ELSE D2=l;
IF B EQ C THEN D3=2;
ELSE D3=l;
IF A EQ B EQ C THEN D4=2; ELSE D4=l;
D=D1+D2+D3; U12=A*B; U13=A*C; U23=B*C; U123=A*B*C;
output; end; end; end;
datalines;
315 105 13 14 22 3 3 0 1 33 16 2 8 16 7 0 2 4 5 6 1 1 3 4 0 4 24

run;

(i)
(ii)
(iii)
(iv)

(v)
(vi)
(vii)
(viii)

proc genmod; class A B C ;


model ount=A B C/dist=poi; run;
model ount=A B C Dl-D4/dist=poi; run;
model ount=A B C U12 U13 U23/dist=poi; run;
model ount=A B C U12 U13 U23 U123/dist=poi; run;
model ount=A B C U12 U13 U23 Dl-D3/dist=poi; run;
model ount=A B C U12 U13 U23 D/dist=poi; run;
model ount=A B C U12 U13 U23 U123 Dl-D3/dist=poi; run;
model ount=A B C U12 U13 U23 U123 Dl-D4/dist=poi; run;

** We represent the /i,/2^3 and /4 in the text respectively by D1-D4 in the SAS
software implementation above.
Because the effect of exact agreement as measured by /i between raters A and B is
highly significant, we can therefore conclude that our chosen model indicates that
there is strong agreement beyond that would be expected by chance in the form of
the contributions of /3i,/?2,^3 and /?4, the linear-by-linear associations parameters.
The results also indicate that /34, the three-way linear-by-linear association parameter is highly significant. Following Melia and Diener-West (1994), we present below
parameter estimates of the linear-by-linear association for each pair of raters by the
level of the third rater. That is, for any pair p = 1, 2, 3, we calculate the parameter
estimate to be equal to:
ftp +

Ugfa

where u\ = 1, u^ = 0, and uz 1 are respectively the centered scores derived


from Ui = i 2. For the pair of raters {A,C} for example, corresponding to p = 2
in the table below and the third level of rater C, we have, since q 3 in this case,
Az + (1 * At) = 0.9383 + (1 * 0.4997) = 1.4380.
These results are displayed in Table 11.20.
As observed in Melia and Diener-West (1994), increase in the degree of necrosis
increases the strength of the linear-by-linear association between any pairs of raters,
as the level of the third rater also increases. This indicates that raters tend to agree

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

494

p
1
2
3

Rater pair
{A,B}
{A,C}
{B,C}

Level of third rater, q


1
2
3
0.0692 0.5689
1.0686
0.4386 0.9383 1.4380
0.5043 1.0040 1.5037

Table 11.20: Estimates of linear-by-linear association for each pair of raters by level
of third rater
about which cases have a large degree of necrosis than they about which cases have
little or no necrosis. However, this degree of agreement depends on which pair of
raters being considered. While the linear-by-linear association is weaker between
raters A and B than between A and C or B and C, raters A and B have the strongest
additional exact agreement, l\ = 0.661. There is no significant additional agreement
between raters B and C with /3 = 0.3627 and also no significant additional exact
agreement between A and C since 1-2 0.0837 is not significant both at a 0.05.
The above findings that raters A and B have strongest exact agreement while
raters A and C and raters B and C have strongest linear-by-linear association suggest as Melia and Diener-West put it, that "rater C's evaluations may be shifted
with respect to those of raters A and B." This is exemplified by the marginal distributions. For instance, rater C is less likely to assign cases to "no necrosis," 379
of 612 (or 62%), than are raters A and B; 78% (476 of 612) and 81% (496 of 612),
respectively, and correspondingly, more likely to assign a higher degree of necrosis,
59 of 612 for rater C as compared to 48 and 38 out of 612 for raters A and B
respectively.

11.12

Exercises

1. For the data in Table 10.21 in chapter 10,


(a) Fit the symmetry model to these data.
(b) Fit the quasi-symmetry model to the data and use it to test for marginal
homogeneity.
(c) Fit the QI, CS, and diagonal parameter models to the above data.
2. For the data in Table 10.22 in chapter 10,
(a) Fit the models of independence, QI, F, V, T, Q, D, and DA to these data,
(a) Fit symmetry models to these data.
(c) Fit skew-symmetry models to the above data and discuss the most parsimonious model.
3. The following data is supplied by E. Jensen of Faellesforeningen for Danmarks
Brugsforeninger, Copenhagen: 15 persons examined all possible pairings of 4
different samples for taste, resulting in the following preference table (David,
1988, pp. 115 to 116).

11.12.

EXERCISES

495
AI

AI

A-2

12
13
13

A3
A,

A2
3
4
12

A4
2
3
5
-

A3
2
11
10

Analyze the data assuming the Bradley-Terry model and also test the goodness of fit of your model. Give a ranking for the four samples and estimate
the probabilities TTJ.
4. The data below relate to 44 Caucasian women from North Carolina who were
under 30 and married to their first husbands. Women were asked to respond
for pairs of numbers x and y between 0 and 6 with x < y. The question
asked was, "given a choice of having, during your entire lifetime, either x or y
children, which would you choose?" The data are summarized below (Imrey,
et al, 1976).
Alternative
choice
0
1
2
3
4
5
6

0
2
1
3
1
1
2

Preferred number
2
3
1
17 22 22
19 13
11
0
1
7
10 12 13
11 18 15
13 20 22

of children
4
5
15 26
9
10
11
6
2
6
4
17
14 12

6
25
11
6
6
0
11

Table 11.21: Family size preference


(a) Test whether the Bradley-Terry model fits.
(b) Estimate the probabilities TT;.
The data below relate to two pathologist classifying each of 118 slides in terms
of carcinoma in situ of the uterine cervix (Landis & Koch, 1977a)based on the
most involved lesion. The classification is into one of the ordered categories
(1 = negative, 2 = atypical squamous hyperplasia, 3 = carcinoma in situ, 4
= squamous carcinoma with early stromal invasion, 5 = invasive carcinoma)
resulting into the 5 x 5 table below:
Pathologist B
Pathologist A
1
2
3
4
5

22
5
0
0
0

2
7
2
1
0

2
14
36
14
3

0
0
0
7
0

0
0
0
0
3

Obtain estimates of K and T for the above data. Find a parsimonious loglinear model of the form discussed in this chapter for these data and interpret
your parameter estimates.

496

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

6. The data in Table 11.22, are from the Collaborative Ocular Melanoma Study
(COMS) (1989) and were analyzed in Melia and Diener-West (1994).

Rater A
1
2
3
4
5

1
291
186
2
3
1

Rater B
2 3
74 1
256 7
4 0
10 1
7 1

4
1
7
2
14
8

5
1
3
0
2
3

Table 11.22: Scleral extension in 885 eyes: cross-classified by two raters (Melia &
Diener-West, 1994)
They came from multi-center clinical trials investigating the treatment of
choroidal melanoma, a very rare cancer of the eye. A detailed description
of the data is available in Melia and Diener-West (1994). The data are a
summary of the classification by two raters A and B (pathologists) of the
extent of scleral extension of choroidal melanoma in 885 eyes. The category
grading scale is: (1) none or innermost layers; (2) within sclera, but does not
extend to scleral surface; (3) extends to scleral surface; (4) extrascleral extension without transection; (5) extrascleral extension with presumed residual
tumor in orbit. The categories are assumed ordered, and agreement among
raters has implications for the grading system reliability of the histopathological features believed to be important prognostic indicators for the disease.
Melia and Diener-West have already analyzed these data. Show that the PQS
model is the most parsimonious model for this data.
7. The data below refer to 264 marriages in Surinam (Speckman, 1965). Here
husbands and wives are categorized in terms of four religious groups: C =
Christian, M = Moslems, S = Sanatin Dharm, A = Arya Samaj. S and A are
two Hindustan religious groups.
Wives
Husbands
C

M
S
A
Total

C
17
1
5
4
27

M
1
66
4
2
73

S
4
4
96
18
122

A
3
2
14
23
42

Total
25
73
119
47
264

Table 11.23: Marriage in Surinam (Speckmann, 1965)


Fit the models of symmetry, conditional symmetry, quasi-symmetry, and
quasi-independence to this data. Do any of the distance models fit these
data? Also fit the various diagonal models to the data.
The table below relates mother's education to father's education (Mullins
& Sites, 1984) for a sample of eminent Black Americans (defined as persons
having biographical sketch in the publication Who's Who Among Black Amer-

11.12.

EXERCISES

497

icans). Fit the symmetry, conditional symmetry, marginal homogeneity, and


quasi-symmetry to the data and interpret the data.

Mother's
education
8th Grade or less
Part high school
High school
College

8th Grade
or less
81
14
43
21

Father's education
Part High
High
school
school
3
9
8
9
7
43
24
6

College
11
6
18
87

9. For the two data sets below, fit appropriate models to these data and interpret.

Danish occupational mobility data (Svalastoga, 1959)


Father's
Status

(1)
(2)
(3)
(4)
(5)
Total

Origin
religion
(1)
(2)
(3)
(4)
(5)
(6)
Total

(1)
123
10
2
0
0
1
136

Son's Status

(4)
4
59
217
384
201
865

(5)
2
21
95
198
246
562

Total

Current religion
(5)
(3)
(4)
(2)
1
2
0
0
1
4
9
420
1
21 102
5
2
0
15
8
7
4
0
0
1
1
0
3
18
18
458 113

(6)
48
217
54
6
5
62
392

Total
174
661
185
31
16
68
1135

(1)
18
24
23
8
6
79

(2)
17
105
84
49
8
263

(3)
16
109
289
175
69
658

57
318
708
814
530
2427

Table 11.24: Religious Mobility: Cross-classification of origin religion by current


religion
where: (1) Catholic, (2) Anglican, (3) Mainline Protestant, (4) Fundamentalist Protestant, (5) Other protestant, (6) None.
10. Caussinus (1965) presented the data in the table below, which relate to a crossclassification of individuals by their social and professional status in 1954 and
for the same individuals in 1962.

498

CHAPTER 11. ANALYSIS OF DOUBLY CLASSIFIED DATA

1954
1
2
3
4
5
6

1
187
4
22
6
1
0

2
13
191
8
6
3
2

Status in 1962
4
3
11
17
4
9
182 20
10
323
4
2
2
5

5
3
22
14
7
126
1

6
1
1
3
4
17
153

(a) Fit the symmetry model to the data. Does this model fit the data?
(b) Fit both the QS and QI models to the data. Test for marginal homogeneity.
(c) Fit nonindependence models to the data. Which is the most parsimonious
model?

Chapter 12

Analysis of Repeated
Measures Data
12.1

Introduction

Data arising from repeated measures either by design or by epidemiologic (observational) designs often occur when observations on subjects or objects are taken over
several times or occasions. Such data are often described as longitudinal data.The
theory on the analysis of repeated measures data is not new, especially when the
response variable if of the continous type. Classical multivariate methodology exists for this analysis. In recent years, because of the inherent correlation between
the outcome observations over time (it is sometimes believed that observations
closer together are often more correlated than those farther apart) several methods of analysis have been developed. The mixed effects models for example, have
been developed in recent years to take care of this problem, and the SAS^ book
SAS^ System for Mixed Models by Littel et al., (1996) provides excellent examples and various correlation structures for analyzing longitudinal data when the
outcome variable is continuous.
Given that we are concerned in this chapter with the situation in which the
outcome variable is categorical, that is, longitudinal data with categorical outcome
variable, the logistic regression for a binary outcome for example, assumes that
observations are independent across time. But it is not uncommon in longitudinal
data to at least imagine, for example, if observations were taken at say, time 1, time
2 up to time 6 then observations at times 1 and 2 are more likely to have higher
correlations than say, at times 1 and 6.
Recent advances in the methodology of analyzing repeated measured data with
categorical outcome variable (Liang & Zeger, 1986) have made it possible to actually
model the covariance structure of the repeated observations. We shall examine this
with examples in a later section. We will, however, begin our discussion in this
chapter with the following binary correlated data.
499

500

CHAPTER 12. ANALYSIS OF REPEATED MEASURES DATA

12.2

Logit Models for Repeated Binary Response

Consider the case when the response variable is binary, that is, / = 2 categories,
and let the k = 1, 2, ,p denote the relevant populations. If we assume that there
are t repeated occassions, then the logit of the response can be modeled for the ji-th
occassion as:
where ipi(j, k} represents the i-th response probability at the j-th occassion in population k. The above can be succintly written in the form:
+ f3

that is,

logit,,.fc = /3X
where P refers to the population, T to the occassion and P * T to the interaction
between the population and occassion (if it exists). The usual constraints are needed
for proper identifiability of the parameters.

12.2.1

Example 12.1

The table below is from Woolson and Clarke (1984) and relates to the longitudinal
studies of coronary risk factors in schoolchildren. A sample of 11 to 13 year-old
children in 1977 were classified by gender and by relative weight (obese, not obese)
in 1977, 1979, and 1981.
Gender
Male
Female

NNN

119
129

NNO
7
8

NON
8
7

Responses
ONN
NOO
3
13
9
6

ONO
4
2

OON
11
7

ooo
16
14

Total
181
182

Table 12.1: Classification of children by gender and relative weight


Here NNN indicates not obese in 1977, 1979 and 1981, and NON similarly indicates
not obese in 1977, obese in 1979, and not obese in 1981. Similar interpretations can
be obtained for the remaining six outcomes. Here, there are two subpopulations
and the response variable obese, not obese is observed at T = 3 occasions (1977,
1979, and 1981). We thus have a 2 x 23 contingency table. We intend to model the
nonobsese response category.
If we let 1977, 1979, and 1981 occasions be represented by j = 1,2, and 3
respectively and k = 1 for male and k = 2 for female similarly, then the logit can
be appropriately represented by L(j,k). Since the occasions are ordered, we may
employ the linear occasion effects {flf (3vi}, for fixed scores {r//}. There are
6 marginal distributions, since there are two gender levels and 3 occasions. The
following model is first employed for our analysis:
(12.2)
If the marginal distribution for 1997 response (occasion 1) is identical for males
and female respondents, then the method of Lipsitz (1988) allows us to fit this
population-averaged or marginal model. In this case, models are specified in terms
of marginal parameters such as those describing the mean response at time 1, time
2, or time 3. Alternatively, we can fit the cumulative logit model to the data. The

12.2. MODELS FOR REPEATED BINARY

RESPONSE

501

results of both considerations are displayed respectively below. The models assume
that interaction is present that leads to a saturated model. Let us first consider the
marginal model to the data in Table 12.1. We give below the relevant SAS software
instructions (together with the corresponding output) for implementing the model
in (12.2) where yearl, year2, and yearS refer respectively to 1977, 1979, and 1981.
data binary;
input gender$ yearl$ year2$ year3$ count <8<D;
datalines;

m n n n 119 rnnno? ... f o o n 7 f o o o ! 4


proc catmod order=data; weight count;
response marginals;
/* use marginal homogeneity model*/
model yearl*year2*year3=gender|_response_; repeated year; run;
Population Profiles
Sample

gender

1
2

Sample Size

m
f

181
182

Response Profiles
Response

1
2
3
4
5
6
7
8

Sample

Function
Number

Response
Function

yearl

year2

yearS

n
n
n
n
o
o
o
o

n
n
o
o
n
n
o
o

n
o
n
o
n
o
n
o

2
1
1

1
2
3

0.75691
0 . 79006
0.83425

1
1
1

1
2
3

0 . 84066
0 . 79670
0.81868

1
1
1

Design Matrix
3
4

1
0
1

0
1
-1

1
0
-1

0
1
-1

1
0
-1

0
1
-1

-1
0
1

0
-1
1

1
1
-1

Analysis of Variance
DF
Chi-Square

Source

1
1
2
2
0

Intercept
gender
year
gender*year
Residual

2267.75
0 . 54
2 . 87
5 . 93

Pr > ChiSq
< .0001
.4613
0
0
.2382
0
.0516

Analysis of Weighted Least Squares Estimates

Effect
Intercept
gender
year
gender*year

Parameter

1
2
3
4
5
6

Estimate

Standard
Error

ChiSquare

Pr > ChiSq

0.8062
-0.0125
-0.00743
-0.0128
-0.0294
0.00915

0.0169
0.0169
0.0121
0.0111
0.0121
0.0111

2267.75
0.54
0.37
1.33
5.86
0.67

<.0001
0.4613
0.5410
0 . 2496
0.0155
0.4117

The above response functions are obtained from Table 12.2, which are generated in
SAS software by PROC FREQ.

CHAPTER 12. ANALYSIS OF REPEATED MEASURES DATA

502

Sample
1 (Men)
2 (Females)

Response function
yearl year2 year3
143
137
151
145
149
153

Total
181
182

Table 12.2: Frequencies of all no obese


where, for instance, 0.75691 is computed from 137/181 and so on. The model has six
parameters representing respectively, the overall effect (1), a gender effect (2), two
years effects (3,4) and the two gender*year interaction effects (5,6). The numbers
in the parentheses represent corrresponding columns in the design matrix columns
as well as parameters in the analysis of WLS estimates in the output.
Similarly, we have for the cumulative logit fit the following select SAS software
program and corresponding output.
set binary;
proc catmod order=data; weight count;
response logits; /* fits cummulative logit model */
model yearl*year2*year3= gender I_response_; repeated year;
run;

Sample

Function
Number

Response
Function

Design Matrix
3
4

1
2
3

1..13579
1..32526
1 .61608

1
1
1

1
1
1

1
0
-1

0
1
-1

1
0
-1

0
1
-1

1
2
3

1 .66314
1 .36582
1 . 50744

1
1
1

-1
-1
-1

1
0
-1

0
1
-1

-1
0
1

0
-1
1

Analysis of Variance
DF

Chi-Square

1
1
2
2
0

Intercept
gender
year
gender*year
Residual

172.28
0.49
2.71
5.70

Pr > ChiSq

<.0001
0 . 4840
0.2577
0.0580

Analysis of Weighted Least Squares Estimates

Estimate

Intercept
gender
year
gender*year

1
2
3
4
5
6

1 . 4356
-0 . 0765
-0.0361
-0.0901
-0.1871
0.0563

Standard
Error

0.1094
0.1094
0.0785
0 . 0698
0.0785
0.0698

ChiSquare

172.28
0.49
0.21
1.66
5.68
0.65

Pr > ChiSq

<.0001
0 . 4840
0.6453
0.1971
0.0171
0.4203

The results from both analyses are very similar. While the overall effects each
of gender, year, and gender*year terms are not significant based on the pvalues
from the Wald test, examination of the parameter estimates indicates that for 1997
there is a significant interaction presence between gender and year with pvalues of
0.0155 and 0.0171, respectively, from both models. We can therefore fit unsaturated
model to this data set, bearing in mind that we must include the effects of the first
intercept, gender (male) effect and linear effect of year and the first interaction term.
In other words, we need the design vectors of parameters 1, 2, 3, and the product of

12.2. MODELS FOR REPEATED BINARY RESPONSE

503

2 and 3 in our model statement. These correspond respectively to design vectors 1,


2, 3, and 5 respectively. The reduced model is implemented in SAS software with
the following statements and a partial output.
set binary;
proc catmod order=data; weight count; population gender; response logits;
model yearl*year2*year3=(l 1 1 1,
1 1 0 0 ,
1 1 - 1- 1 ,
1 - 1 1- 1 ,
1 - 1 0 0,
1 -1 -1 1)
(!='Intercept',
2='Gender Male',
3='Year 1977' ,
4= 'Gender Year I')/ pred freq; run;
Analysis of Variance
Source

DF

Chi-Square

Pr > ChiSq

Intercept
Gender Male
Year 1977
Gender Year 1

1
1
1
1

172.04
0.47
1.29
4.43

<.0001
0.4918
0.2562
0.0353

Residual

2.50

0.2861

Analysis of Weighted Least Squares Estimates

Effect
Model

Parameter

1
2
3
4

Estimate
1 . 4345
-0.0752
-0.0820
-0.1520

Standard
Error

ChiSquare

0.1094
0.1094
0.0722
0.0722

172.04

0.47
1.29
4.43

Pr > ChiSq
<.0001
0.4918
0.2562
0.0353

The weighted least squares analysis from the SAS software output gives WQ = 2.50
on 2 d.f. The model fits the data with a pvalue of 0.2861. The parameter estimates
are also displayed. Since the interaction term is significant (pvalue 0.0353), for
males therefore the odds are e2*(--0752) 0.86, indicating that they are 14% less
likely to be classified as nonobese than being obese in 1977 among boys. Put
another way, the odds are 1/0.86 = 1.16, that is, 16% times higher to be classified
as obese than nonobese among boys in 1977. Among girls (females), the odds is
e 2*(-o.0752-o.i520) = QQ^ indicating that females are 36% less likely to be classified
as nonobese than being obese or are (I/.64 = 1.58), 58% more likely to be classified
as obese in 1977, a slightly higher odds than the boys. For the year 1977 the
response to not being obese has the odds e(--08<2) Q.92 against being obese, that
is, in 1977 girls are 9% more likely to be obese than not being obese. For the boys,
we have the odds to be e (--0820-2*(-.i520)) = 1>25? that is, boys are 25% more likely
to be classified not being obese than being obese In 1977.

12.2.2

Alternative Analysis

An alternative analysis of the data in example 12.1 is to fit a Rasch model to this
data set. The Rasch model, originally developed by Rasch (1961), is a logistic item
response model that describes subject i's response to item k. If this response is
denoted by y ifc , then the Rasch model is formulated as:

504

CHAPTER 12. ANALYSIS OF REPEATED MEASURES DATA

Andersen (1970, 1973) proposed a conditional argument for estimating the parameter f5 and Tjur (1982) showed that the conditional model can be fitted as a log- linear
model. To implement this, each margin Rk of each item will be fitted together with
a factor variable giving the total score of successes (see Lindsey, 1995). In order to
implement this model, we define binary variables Rl, R2, R3 for years 1977, 1979,
and 1981 respectively, which takes the value 1 if respondent replies not obsese (that
is, N) and 0 otherwise. That is,
_ J 1 if not obese (N)
i
~ JO if obese (O)
For the three years, we next count how many 1's (maximum of 3) are recorded for
each year. We can implement this in SAS software with the following:
set binary;
if yearl eq 'n' then
if year2 eq 'n' then
if yearS eq 'n' then
total =rl+r2+r3;
datalines;
m n n n 119 1 1 1 3

rl=l;else rl=0;
r2=l;else r2=0;
r3=l;else r3=0;

. . . f o o o 1 40 0 0 0

run;
gender

m
m
m
m
m
m
m
m
f
f
f
f
f
f
f
f

yearl

n
n
n
n
o
o
0

o
n
n
n
n
o
o
o
o

year2

n
n
o
o
n
n
o
o
n
n
o
o
n
n
o
o

year3

n
o
n
o
n
0

n
o
n
o
n
o
n
o
n
o

rl

r2

r3

1
1

0
0

1
1

0
0

0
0

1
1

1
1
0
0

0
0
0
0

1
1

0
0

1
1
1
1
0
0
0
0

1
1
1
1

1
1
1

total

count

3
2
2
1
2
1
1
0
3
2
2
1
2
1
1
0

119
7
8
3
13
4
11
16
129
8
7
9
6
2
7
14

In the above table, we see that for the (NNN) combination (RI,R2,R3] = (1,1,1)
and yields a total of 3 positive (1's) responses. The Rasch model now can be
implemented in GENMOD with binary explanatory variables Rl, R2, R3, factor
variable TOTAL with 4 levels and any other covariates (in this case gender). The
following models are employed with their appropriate interpretations.
(i) {Rl, R2, R3, TOTAL, GENDER}: Model 1
This model implies that the responses are the same for each gender. The
model gives a deviance of 11.9916 on 11 d.f. This model fits, indicating that
the no obesity classification of the school children is uniform across gender.
(ii) { (Rl, R2, R3, TOTAL)* GENDER }: Model 2
This model has the dependence of gender incorporated into the model. The
model gives a deviance of 4.3312 on 4 degrees of freedom. This model also
fits well but is too structured. We relax some of these restrictions in the next
two models.

12.2. MODELS FOR REPEATED BINARY

RESPONSE

505

(iii) {(Rl, R2, R3)* GENDER, TOTAL}: Model 3


This model tries to answer that the "no obese" classification is the same for
the two sexes. This hypothesis tests if the total number of "N" varies from
males and females. Notice the absence of the gender*total interaction term in
this model. Again this model, gives a deviance of 5.1658 on 6 d.f. This model
fits indicating that the classificatory variable does not vary among the sexes.
(iv) {Rl, R2, R3, TOTAL*GENDER}: Model 4
This last model, in the language of item response methodology, tests if the
item characteristic curve is the same across gender. Notice again the absence
of the interaction terms between gender and the item responses. This model
gives a deviance of 10.4542 on 6 d.f. The model fits but is not as good as
model 3. The responses seem to have the same frequency of occurrence of "no
obesity" for the two sexes.
(v) {TOTAL, GENDER, TOTAL*GENDER}: Model 5
Model 5 examines if the probability of "no obese" responses are the same.
This model has a deviance value of 13.4024 on 8 d.f. Again this model fits
but we consider model 3, the most parsimonious.
The above models are implemented in GENMOD with the following model statements (put together for brevity only).
proc genmod;
class gender total;
model count=rl r2 r3 total gender /dist=poi type3; run;/* fits model 1 */
model count=rlI gender r2|gender r3|gender total|gender/dist=p; run;/* fits model 2 */
model count=rlI gender r21 gender r31 gender total/dist=poi type3; run; /* fits model 3 */
model count=rl r2 r3 totallgender /dist=poi typeS; run;/* fits model 4 */
model count=totalI gender /dist=poi type3;run; /* fits model 5 */

Our chosen model here is model 3, and we display below partial output of the parameter estimates under this model from GENMOD. Once again, the effect of gender
is not significant (p = 0.6030). While the total number of "no obsese" recorded
are significantly different (with positive NO's being the largest frequency), the estimates below (based on the magnitudes of rl, r2,r3) indicate that 1977 recorded the
expected classification "no obese" most often from the model, followed by 1981 and
lastly 1979. Only the 1977 gender interaction rl*gender is significant (p = 0.0121),
indicating that the distribution of no responses is more frequent in 1977 among girls
than boys since the parameter estimate is negative.
Analysis Of Parameter Estimates

Parameter

DF

Estimate

Standard
Error

Intercept

1
1

2 .6303
-0 .6345
0..1499
-0..8367
-1..2353
0..1711
-0..9426
0,.4584
5..0306
1..2843

0,.2394
0,.3349
0..2881
0.,3335
0.,3480
0.,3223
0.,3428
0. 3425
0. 6334
0. 4084

rl
gender
rl*gender

m
m

1
1

r2
r2*gender

r3
r3*gender
total
total

m
3
2

1
1
1

Wald 95'/. Confidence


Limits

ChiSquare

3..0995
0..0219
0.,7146
-0.,1830
-0..5532
0.,8028
-0.,2707
1. 1297
6. 2721
2. 0848

120 .73
3 .59
0.27
6.29
12 .60
0.28
7 .56
1 .79
63 .07
9.89

2 .1611
-1 .2909
-0 .4148
-1 .4904
-1 .9173
-0 .4607
-1 .6144
-0.,2128
3,,7890
0,,4838

Pr > ChiSq

< .0001
0.0582
0.6030
0.0121
0.0004
0.5956
0.0060

0.1807

< .0001
0.0017

The results of the analysis described above are displayed in Table 12.3.

506

CHAPTER 12. ANALYSIS OF REPEATED MEASURES DATA


Model
I
II
III
IV
V

d.f.
9
4
6
6
8

G*
11.9916
4.3312
5.1658
10.4542
13.4024

pvalue
0.2138
0.3630
0.5227
0.1068
0.0987

Table 12.3: Results of analysis

12.3

Generalized Estimating Equations

For longitudinal data, observations of subjects at multiple points in time are usually
correlated. Basically, the generalized estimating equations (GEE) comprise the
following:
(a) n subjects, indexed by i 1,2, , n, and all observations arising from the
multiple measurements or observations on a single subject are usually referred
to as a cluster. Such multiple measurements on the same subject, whether
under different conditions or over time, introduces within-subject correlation.
Generalized estimating equations (GEE) procedure allows the within-subject
correlation to be worked into the parameter estimation procedure. We would
therefore expect n clusters from n subjects. The GEE analysis is based on
first collapsing across subjects, and then model the marginal parameters. This
approach yields the population-averaged regression parameters.
(b) TI measurements over time on the iih subject (t = 1, 2, , T, i = 1, 2, , n).
(c) ya is the observed response on subject i at time t (or measurement ), where,
for a binary response outcome,

l if subject has response


0 otherwise

(d) Xjt is the set of p covariates measured on subject i at time i, where the p
covariates are indexed by k = 1,2, ,p. Both time-varying (within-subject)
and time-stationary (between-subject) covariates could be included here.
A typical data structure for longitudinal data is presented in Table 12.4 where each
of Xj. and
are i-dimensional.
Response

Subjects
1
2

Xn

Xj2

Covariates
'''

X^pi

Xip

X21

X22

' ''

X 2 ,p_l

X2t

Xnl

X n2

-'

X^p.!

X Bt

":
yn

Table 12.4: Typical data structure


The GEE approach collapses first across subjects, and then models the marginal parameters, yielding population-averaged regression parameters. A natural approach

12.3. GENERALIZED ESTIMATING EQUATIONS

507

to the analysis of such data is to model the marginal distributions of the observed
response at each measurement times as a function of the covariates. Such a model
allows us to look forward to future applications of results from it. The marginal
models are sometimes referred to as cross-sectional models.
The marginal density of yu, which is Bernoulli, is given by
f(Vit x it ) = 7rf t "(l-^) ( 1 - V i t )
where, following Williamson et al., (1999), we shall assume that
Kit = 7rt(/3) = pr(yit = l|x;t,/3) =

/ ePo+ftXit

(12.4)

~ ,
\1 + e ^o+Pi x "y

(12.5)

Although we have used the logit link in (12.5) above, in general if we assume a
GLM for y_it, then

    E(y_it) = μ_it,    g(μ_it) = x_it' β                                    (12.6)

where g is the link function. Liang and Zeger (1986) show that consistent estimates
of the marginal model parameters can be obtained from the solutions of the estimating
equations by treating the correlation parameters as nuisance parameters. This
procedure models the within-subject correlation structure, which in turn increases
the efficiency of the estimators of the βs. The estimating equations are

    Σ_{i=1}^n D_i' V_i^{-1} (Y_i − μ_i) = Σ_{i=1}^n D_i' [A_i^{1/2} R_i(α) A_i^{1/2}]^{-1} (Y_i − μ_i) = 0    (12.7)

where
(i) D_i = ∂μ_i/∂β is a T_i × p matrix whose (t, k)-th component is ∂μ_it/∂β_k;
(ii) A_i is a T_i × T_i symmetric positive definite diagonal matrix containing the
variances of Y_i, which are given by Var(Y_it | x_it) = π_it(1 − π_it);
(iii) R_i(α) is a T_i × T_i working correlation matrix which depends on the
vector of unknown parameters α, and V_i = A_i^{1/2} R_i(α) A_i^{1/2} is the
corresponding T_i × T_i working covariance matrix;
(iv) α = (α_1, α_2, ..., α_p) are parameters describing the within-subject correlation;
(v) (Y_i − μ_i) is a T_i × 1 residual vector, measuring deviations of the observed
responses of the i-th subject from their means.

The estimate β̂ is the solution to the estimating equations, obtained by an iterative
procedure, usually the Gauss-Newton procedure. If R_i is set to the identity matrix,
then the estimating equations become those that would apply if the measurements
were independent. In this case, all the observations both within and among subjects
are assumed to be independent, and the GEE in (12.7) reduces to

    Σ_{i=1}^n D_i' V_i^{-1} (Y_i − μ_i) = 0,   with V_i diagonal            (12.8)

We notice that the approach requires the specification of the first and second moments of the vector of correlated binary responses for each individual. Prentice
(1988) has also adopted this procedure. Before we consider our first example here,
we may note the following concerning the GEE.


When employing the GEE method, a working correlation structure must be employed,
and one must be chosen that is valid for the marginal expectation (Pepe & Anderson,
1994). The choice of the working correlation structure (since this is often not
known) is left to the analyst. We shall discuss the possible choices of
working correlation structures that are available to the analyst in the next
section.
The estimator of the covariance matrix of β̂ is robust. That is, the GEE
method yields a consistent estimator of the covariance
matrix of the estimators of β even if the working correlation matrix is
misspecified (Chang, 2000).
Diagnostic tests should be carried out to ascertain whether the final model from the
GEE fits the data as accurately as possible. In this case, it has been found
that the usual residual plots can be misleading, and a nonparametric method,
the Wald-Wolfowitz runs test (Chang, 2000), would be very appropriate.
It is also important that we define the concepts of time-varying and time-stationary
covariates.
A time-stationary covariate is a between-subject variate whose value is repeated
in each of the T_i measurements for the i-th subject. An example here
is the variable gender.
A time-varying covariate is a within-subject variate that can assume different values
across the T_i measurements on the i-th subject. Examples are income, which
varies over time, and age.
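As a concrete illustration, the hypothetical data step below lays out repeated
measurements in the long (one-row-per-measurement) format that the GEE analyses
in this chapter use; gender repeats unchanged within each subject (time-stationary),
while income changes from row to row (time-varying). The variable names and values
here are illustrative only.

data long_form;                        /* hypothetical illustration       */
   input id time gender $ income y;
   /* gender: time-stationary, repeated within each subject's rows        */
   /* income: time-varying, may differ across a subject's T_i rows        */
   datalines;
1 1 M 30 0
1 2 M 32 1
1 3 M 35 0
2 1 F 41 0
2 2 F 44 0
2 3 F 47 1
;
run;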

12.4  Example 12.2: Mother Rat Data

The following example, courtesy of Stuart Lipsitz, relates to the effect of a
possibly toxic substance on the offspring of n = 46 pregnant rats who were given
different doses of the substance. The outcome for a baby (offspring) in the litter
is whether it has a birth defect. The data also contain information about
the weight of the offspring (lighter babies tend to have more birth defects, since
one possible defect is a missing limb) and the sex of the offspring. We therefore wish
to model the probability of a defect as a function of dose, sex, and weight. Since
births occur in litters (offspring from the same parents), we can consider
data arising from this study to be clustered (by litter).
Let the response for the j-th offspring in the i-th dose group be defined as:

    Y_ij = 1 if a defect is found
           0 otherwise

Suppose the probability (in the j-th cluster) of a birth defect depends on dose,
sex, and weight, that is,

    π_ij = pr[Y_ij = 1 | dose_i, sex_ij, weight_ij]
         = exp(β0 + β1 dose_i + β2 sex_ij + β3 weight_ij) / [1 + exp(β0 + β1 dose_i + β2 sex_ij + β3 weight_ij)]    (12.9)


where dose_i is a cluster-level covariate, since it is the same for all offspring in the
litter (we only need the subscript i).
Also,

    sex_ij = 1 if male
             0 if female

is a within-cluster covariate, since it can be different for different offspring in the
litter, and weight_ij is similarly a within-cluster covariate.
The model in (12.9) can be written as:

    log[π_ij / (1 − π_ij)] = β'X                                            (12.10)

that is,

    logit π_ij = β0 + β1 dose_i + β2 sex_ij + β3 weight_ij

The GEE procedure uses iterative generalized least squares with a weight matrix
W having nonzero off-diagonal elements that are functions of the correlations among
the observations. Using the correlations among the Pearson residuals, the
matrix W is reestimated at each iteration, until convergence is attained.
For our data, we are interested in estimating β = [β0, β1, β2, β3]'. We present a
sample of the mother rat data below for the first 2 of the 46 clusters. The entire
data set is presented in Appendix G.1.
CLUSTER    DOSE     WEIGHT   SEX   DEFECT (DEFECT: 1=Yes, 0=No)
   49      0.000     0.989    F      0
   49      0.000     0.898    M      0
   49      0.000     0.945    M      0
   49      0.000     0.899    M      0
   49      0.000     0.933    F      0
   49      0.000     0.842    F      0
   49      0.000     0.896    F      0
   49      0.000     1.006    M      0
   49      0.000     1.115    M      0
   49      0.000     1.007    F      0
   49      0.000     0.958    F      0
   49      0.000     0.999    M      0
   49      0.000     0.909    F      0
   49      0.000     0.848    F      0
   49      0.000     0.999    F      0
   53      0.100     0.751    F      0
   53      0.100     0.902    F      0
   53      0.100     0.875    F      0
   53      0.100     0.964    M      0
   53      0.100     0.973    M      0
   53      0.100     0.965    M      0
   53      0.100     0.925    M      0
   53      0.100     0.936    M      0
   53      0.100     1.012    M      0
   53      0.100     0.858    M      0
   53      0.100     0.816    F      0
   53      0.100     1.007    M      0

12.4.1  Correlation Structure

GEE uses several correlation structures to model the correlation matrix among the
observations within each cluster. We shall first examine two of these structures for
now and extend the results to other structures later in the chapter.
1. If we assume the compound symmetry correlation structure, which SAS software
calls EXCH for exchangeable, the structure assumes equal correlations
within subjects at all time points in the model. That is,

    Corr(y_it, y_it') = ρ   for all t ≠ t'

2. Also, if we assume that observations in a cluster are independent (that is,
all correlations are assumed zero), which is the usual assumption in logistic
regression, then this will be equivalent to specifying that the correlation
structure is IND for independent in SAS software.

We implement the GEE in PROC GENMOD in the following sections for the data in
our example above.

12.4.2  GEE with Exchangeable Structure

The following SAS software program is employed to fit the GEE with the EXCH
correlation structure.
data rat;
input cluster dose weight sex $ defect @@;   /* @@ reads several records per line */
datalines;
49 0.000 0.989 F 0 49 0.000 0.898 M 0 ... 199 0.000 0.791 M 0 199 0.000 0.961 F 0
;
proc genmod data=rat descending;
class cluster sex;                 /* must use class for cluster id          */
model defect = dose weight sex /
      link=logit dist=binomial;    /* logistic regression; binomial =        */
                                   /* bernoulli when binomial sample size = 1 */
repeated subject=cluster / type=EXCH corrw;
                                   /* subject = cluster id; type = correlation; */
                                   /* corrw prints the working correlation matrix */
run;

In the above, the GEE is invoked with the repeated statement. The type=EXCH
option asks for the exchangeable correlation structure, while corrw asks for the printing
of the working correlation matrix. The descending option specifies that
pr(defect = 1) be modeled. We present below selected output from implementing the
above program in SAS software.
                   GEE Model Information
Correlation Structure                    Exchangeable
Subject Effect                     cluster (46 levels)
Number of Clusters                                  46
Correlation Matrix Dimension                        16
Maximum Cluster Size                                16
Minimum Cluster Size                                 2

                      Analysis Of GEE Parameter Estimates
                       Empirical Standard Error Estimates

                          Standard      95% Confidence
Parameter     Estimate      Error       Limits                   Z    Pr > |Z|
Intercept       3.2840     1.7096     -0.0667      6.6347      1.92     0.0547
dose           35.8357     6.0009     24.0742     47.5972      5.97     <.0001
weight         -7.4991     1.9348    -11.2912     -3.7071     -3.88     0.0001
sex       F    -0.4600     0.2811     -1.0109      0.0910     -1.64     0.1018
sex       M     0.0000     0.0000      0.0000      0.0000       .         .


In the selected output above, the GEE model information tells us that there are 46
clusters (litters) in the data, with cluster sizes ranging from 2 to 16. This implies
that while some litters have sixteen offspring, some have just two. The working
correlation matrix is therefore 16 x 16. Since the correlation structure assumes
equal correlations between pairs of observations within each cluster, we have
presented only the first seven columns of row 1 of this matrix below. Essentially,
ρ̂ = 0.1384.
                       Working Correlation Matrix
  COL1      COL2      COL3      COL4      COL5      COL6      COL7
 1.0000    0.1384    0.1384    0.1384    0.1384    0.1384    0.1384

The estimate of the intracluster correlation coefficient using the weighted (GEE)
approach is 0.1384 (SAS software doesn't print a pvalue, but a test would probably
reject H0: ρ = 0). We observe here that the intracluster correlation is probably
significant, and we would expect its estimated value to be smaller for a model
without the covariates and to increase with a model involving one or more covariates
that are essential for a better explanation of the variability within the clusters.
Naive Estimate Under Independence
A similar model employing the independence correlation structure (the usual logistic
model under the scoring algorithm) is implemented with the following SAS software
program, with the corresponding selected output.

proc genmod data=rat descending;
class cluster sex;
model defect = dose weight sex / link=logit dist=binomial;
repeated subject=cluster / type=IND;   /* naive independence */
run;
                      Analysis Of GEE Parameter Estimates
                       Empirical Standard Error Estimates

                         Empirical     95% Confidence
Parameter     Estimate    Std Err      Limits                   Z       Pr>|Z|
INTERCEPT       2.5701     1.7347     -0.8299      5.9700     1.4816    0.1385
DOSE           34.3778     6.1975     22.2310     46.5246     5.5471    0.0000
WEIGHT         -6.7969     2.0081    -10.7327     -2.8610    -3.385     0.0007
SEX       F    -0.2840     0.3148     -0.9010      0.3330     -.9021    0.3670
SEX       M     0.0000     0.0000      0.0000      0.0000     0.0000    0.0000
Scale           0.9200

12.4.3  Comparing Estimates

The parameter estimates from the GEE with exchangeable correlation and independent structures are displayed below.
                  EXCH         IND
Parameter       Estimate    Estimate
INTERCEPT        3.2840      2.5701
DOSE            35.836*     34.378*
SEX F           -0.4600     -0.2840
WEIGHT          -7.499*     -6.797*

* implies significance at .05 using robust variance

The following observations are presented:


The estimates from both methods are similar (since both are asymptotically
unbiased).
The biggest difference is in the SEX effect, which is not significant using either
estimate. We estimate that the odds of a birth defect increase by a factor of

    exp(−β̂2) ≈ exp(0.46) = 1.58

for male versus female offspring. The other two covariates appear to significantly
predict birth defects.
From this output, we see that male offspring and offspring whose mothers had
higher doses tend to have increased odds of birth defects, and offspring who
weigh more have decreased odds of a birth defect.
We estimate that the odds of a birth defect increase by a factor of

    exp(β̂3 × (−0.1)) ≈ exp(−7.5 × −0.1) = 2.12

for an offspring that weighs 0.1 kg less.
Similarly, we estimate that the odds of a birth defect increase by a factor of

    exp(β̂1 × 0.15) ≈ exp(35.84 × 0.15) = 216.16

for those whose mother had a high dose (.15) versus no dose (0).
All the odds above are computed using the parameter estimates from the
exchangeable model.

12.4.4  Naive Standard Errors

By default, SAS PROC GENMOD prints the robust (or empirical) standard
errors of the parameter estimates under GEE. We can obtain printed values of the
"naive" (sometimes called "model-based") standard errors by including
the modelse option in the repeated statement. The model-based parameter standard
errors for the GEE under the EXCH and IND correlation structures
are respectively given below.

proc genmod data=rat descending;
class cluster sex;
model defect = dose weight sex / link=logit dist=binomial;
repeated subject=cluster / type=EXCH modelse;
run;

                      Analysis Of GEE Parameter Estimates
              Model-Based Standard Error Estimates (EXCH STRUCTURE)

                          Standard      95% Confidence
Parameter     Estimate      Error       Limits                   Z    Pr > |Z|
Intercept       3.2840     1.5677      0.2113      6.3567      2.09     0.0362
dose           35.8357     6.2331     23.6190     48.0524      5.75     <.0001
weight         -7.4991     1.7409    -10.9113     -4.0870     -4.31     <.0001
sex       F    -0.4600     0.3412     -1.1287      0.2087     -1.35     0.1776
sex       M     0.0000     0.0000      0.0000      0.0000       .         .
Scale           1.0000        .           .           .         .         .


              Model-Based Standard Error Estimates (IND STRUCTURE)

                          Standard      95% Confidence
Parameter     Estimate      Error       Limits                   Z    Pr > |Z|
Intercept       2.5701     1.4737     -0.3183      5.4584      1.74     0.0812
dose           34.3778     4.3212     25.9083     42.8473      7.96     <.0001
weight         -6.7969     1.5775     -9.8886     -3.7051     -4.31     <.0001
sex       F    -0.2840     0.3498     -0.9695      0.4016     -0.81     0.4169
sex       M     0.0000     0.0000      0.0000      0.0000       .         .
Scale           1.0000        .           .           .         .         .

We notice immediately that while the parameter estimates are identical to the
earlier cases, the standard errors are now different. The standard errors
presented in the IND case now equal those that would normally be obtained from a
logistic regression fitted by iteratively reweighted least squares, that is, from the
Wald approach.

12.4.5  Efficiency

We now examine the efficiency of the GEE versus ordinary logistic regression
(GEE independence). The table below gives the estimated efficiency of the
independence GEE (ordinary logistic regression) versus the weighted GEE; the
variances are the squares of the robust standard errors reported earlier.

                      VARIANCE            Efficiency (%)
PARAMETER         EXCH        IND      [Var(EXCH)/Var(IND)]
INTERCEP         2.9227     3.0092           97.13
DOSE            36.0108    38.4090           93.76
SEX              0.0790     0.0991           79.72
WEIGHT           3.7435     4.0325           92.83

Now we see that, for the cluster-level covariate dose (and also for the intercept),
ordinary logistic regression is almost as efficient as weighted logistic
regression.
Ordinary logistic regression also appears to be quite efficient for estimating the
effect of weight, a within-cluster covariate.
However, it appears to be very inefficient (79.72%) for estimating
the effect of sex, a within-cluster covariate.
In general, ordinary logistic regression can be very efficient for estimating the
effects of cluster-level covariates, but can be inefficient for the effects of
within-cluster covariates.
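The efficiency column above can be reproduced directly from the robust standard
errors of the two fits (variance = squared standard error); a sketch:

data efficiency;
   input parameter $ se_exch se_ind;
   eff_pct = 100 * (se_exch**2) / (se_ind**2);   /* Var(EXCH)/Var(IND) */
   datalines;
INTERCEP 1.7096 1.7347
DOSE     6.0009 6.1975
SEX      0.2811 0.3148
WEIGHT   1.9348 2.0081
;
proc print data=efficiency; run;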

12.4.6  Hypothesis Involving Parameters

We can also use PROC GENMOD to make joint tests about the β's. For
example, suppose we want to test that none of the covariates in the model
below are important:

    logit(π_ij) = β0 + β1 dose_i + β2 sex_ij + β3 weight_ij

That is, we wish to test the hypothesis:

    H0: β1 = 0,   β2 = 0,   β3 = 0

We can write this null as a contrast of the elements of the parameter vector
β = [β0, β1, β2, β3]'. In particular, we can use

          [ 0 1 0 0 ] [ β0 ]   [ β1 ]
    Cβ =  [ 0 0 1 0 ] [ β1 ] = [ β2 ] = 0
          [ 0 0 0 1 ] [ β2 ]   [ β3 ]
                      [ β3 ]
Since there is no likelihood involved with the estimating equations, the statistic
calculated for a contrast is a Wald statistic (a multivariate generalization of the
estimate divided by its standard error). The contrast statement is implemented in
GENMOD for the EXCH model, for instance, by:

data rat; set rat;
if sex='F' then sexg=1; if sex='M' then sexg=0;
run;
proc genmod data=rat descending;
class cluster sex;
model defect = dose weight sexg / link=logit dist=binomial;
CONTRAST 'no effects'
     dose 1 weight 0 sexg 0,
     dose 0 weight 1 sexg 0,
     dose 0 weight 0 sexg 1 / wald;
CONTRAST 'ALTER' dose 1, weight 1, sexg 1 / wald;
repeated subject=cluster / type=EXCH;
run;
              Contrast Results for GEE Analysis

Contrast        DF    ChiSquare    Pr > ChiSq    Type
NO EFFECT        3        43.77        <.0001    Wald
ALTER            3        43.77        <.0001    Wald

The contrast test gives QW = 43.77 on 3 d.f. with p < 0.0001. A similar
result was obtained from the GEE with the IND correlation structure; there,
QW = 37.34 on 3 d.f. with p < 0.0001. Both tests tell us that there are covariate
effects. The hypothesis above can alternatively be constructed within SAS software
as indicated with the "ALTER" contrast. Both contrast formulations lead to the
same result, as expected.

12.5  The Six Cities Longitudinal Study Data

Now we apply the above methods to the Six Cities Study (Ware et al., 1984) on the
health effects of pollution; we analyze the data from only two of the cities.
Children in Kingston-Harriman and Portage were examined for wheezing at each
of ages 7 through 10 years. The mothers' smoking habits were also recorded
at the start of the study.
The response of interest at age t (t = 7, 8, 9, 10) is the wheeze status of the child,
where

    wheeze status = 0 if no wheeze
                    1 if wheeze

The covariates are city_i, the child's city of residence (city_i equals 1 if the child lived
in Kingston-Harriman, the more polluted city, and 0 if the child lived in Portage);
smoke_ij, the maternal cigarette smoking at that age in packs per day; and the child's
age in years (t_i1 = 7, t_i2 = 8, t_i3 = 9, t_i4 = 10). The complete data are presented
in Appendix G.2.
appendix G.2.
Interest centers on whether age has an effect (because as the child gets older,
we would expect him or her to get stronger physically, and consequently wheeze less).
Since age has four equally spaced levels, we sought to find whether this effect
is linear, quadratic, or cubic. We therefore created linear, quadratic, and cubic
orthogonal polynomials in age, say (agel_ij, ageq_ij, agec_ij). The coefficients of these
polynomials are displayed below. In particular, we have the following orthogonal
polynomials:

                      Age
               7     8     9    10
    agel      -3    -1     1     3
    ageq       1    -1    -1     1
    agec      -1     3    -3     1

Let the probability of wheeze be modeled as

    π_ij = pr[WHEEZE_ij = YES | x_ij]

Then the model becomes

    log[π_ij / (1 − π_ij)] = β0 + β1 city_i + β2 smoke_ij + β3 agel_ij
                             + β4 ageq_ij + β5 agec_ij                     (12.11)

or as

    logit[π_ij] = β0 + β1 city_i + β2 smoke_ij + β3 agel_ij
                  + β4 ageq_ij + β5 agec_ij                                (12.12)
+ 0,
The model is implemented in SAS software with the following statements (with the
necessary data transformation statements). The orthogonal components of the age
effects are defined as tl, tq, and tc in the SAS software statements.

DATA ONE(KEEP=ID City TIME Smoke WHEEZE);
infile 'six.dat';
INPUT ID City S1 S2 S3 S4 WHEEZE1 WHEEZE2 WHEEZE3 WHEEZE4;
WHEEZE=WHEEZE1; TIME=1; Smoke=S1; OUTPUT;
WHEEZE=WHEEZE2; TIME=2; Smoke=S2; OUTPUT;
WHEEZE=WHEEZE3; TIME=3; Smoke=S3; OUTPUT;
WHEEZE=WHEEZE4; TIME=4; Smoke=S4; OUTPUT; RUN;
/* FORMING ORTHOGONAL POLYNOMIALS */
DATA TWO; SET ONE;
if time=1 then do; tl=-3; tq=1;  tc=-1; end;
if time=2 then do; tl=-1; tq=-1; tc=3;  end;
if time=3 then do; tl=1;  tq=-1; tc=-3; end;
if time=4 then do; tl=3;  tq=1;  tc=1;  end; run;
proc genmod data=two descending;
class time id;
model wheeze = city smoke tl tq tc / dist=bin link=logit type3;
repeated subject=id / type=EXCH corrw WITHINSUBJECT=time modelse;
/* WITHINSUBJECT is the time variable */
run;


The data for this example are presented in Appendix G.2; even though they contain
many missing values, the sample size is still sufficient to estimate an unstructured
correlation matrix, which is the most general.
Although we have specified the exchangeable correlation structure in the model
statement, we now consider other correlation structures that have gained wide
acceptance in GEE theory.
1. Autoregressive
If we assume that the correlation structure has a first-order autoregressive
form, AR(1), then the structure is of the form:

    ρ_ijk = Corr(y_ij, y_ik) = ρ^|j−k|,   0 < ρ < 1

Here, the correlation between two observations c time points apart is ρ^c. That is,
for T = 4, we have

        [ 1     ρ     ρ2    ρ3 ]
        [ ρ     1     ρ     ρ2 ]
        [ ρ2    ρ     1     ρ  ]
        [ ρ3    ρ2    ρ     1  ]

That is, adjacent observations have higher correlations than nonadjacent ones.

2. MDEP(m)
The m-dependent structure, with m = 1, 2, ..., has a correlation structure
of the following form:

    ρ_jk = ρ_|j−k|   if |j − k| ≤ m
         = 0         if |j − k| > m

The 1-dependent structure, for instance, has

        [ 1    ρ    0    0 ]
        [ ρ    1    ρ    0 ]
        [ 0    ρ    1    ρ ]
        [ 0    0    ρ    1 ]

which indicates that correlations between adjacent observations are nonzero
and equal.
The corresponding 2-dependent structure has

        [ 1     ρ1    ρ2    0  ]
        [ ρ1    1     ρ1    ρ2 ]
        [ ρ2    ρ1    1     ρ1 ]
        [ 0     ρ2    ρ1    1  ]

which indicates that observations one and two time periods apart have nonzero
and equal correlations ρ1 and ρ2, respectively.

3. Unstructured
Here the correlation matrix is unstructured and has the form

    ρ_ijk = ρ_jk

which indicates that all correlations are estimated independently from the data.
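To make the patterned structures above concrete, the SAS/IML sketch below builds
the AR(1) matrix ρ^|j−k| and the 1-dependent band matrix for T = 4; the value of ρ
used here is illustrative only (the sketch requires SAS/IML).

proc iml;
   rho = 0.48;  T = 4;                     /* illustrative values only      */
   R_ar1 = j(T, T, 0);  R_1dep = i(T);
   do jj = 1 to T;
      do kk = 1 to T;
         R_ar1[jj, kk] = rho ** abs(jj - kk);           /* AR(1): rho^|j-k| */
         if abs(jj - kk) = 1 then R_1dep[jj, kk] = rho; /* 1-dependent band */
      end;
   end;
   print R_ar1, R_1dep;
quit;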


The above correlation structures are specified in the REPEATED statement option
as type=AR(1), type=MDEP(2), and type=UN, respectively. The type=IND option
still refers to the situation in which we assume that the observations are
independently distributed. We present below the GEE parameter estimates
with empirical standard errors when each of these correlation structures
is invoked on the data in this example.
Correlation Structure: Independence

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6131     0.1595     -1.9256    -1.3005     -10.12    <.0001
City           0.5177     0.1916      0.1420     0.8933       2.70    0.0069
Smoke          0.0128     0.0076     -0.0021     0.0278       1.68    0.0929
tl            -0.0531     0.0250     -0.1021    -0.0041      -2.12    0.0338
tq             0.0072     0.0513     -0.0934     0.1078       0.14    0.8884
tc            -0.0227     0.0206     -0.0632     0.0177      -1.10    0.2708
Correlation Structure: Exchangeable

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6305     0.1535     -1.9313    -1.3298     -10.63    <.0001
City           0.5357     0.1891      0.1650     0.9064       2.83    0.0046
Smoke          0.0124     0.0069     -0.0011     0.0259       1.80    0.0715
tl            -0.0492     0.0247     -0.0977    -0.0007      -1.99    0.0470
tq            -0.0033     0.0503     -0.1019     0.0954      -0.06    0.9482
tc            -0.0220     0.0204     -0.0620     0.0180      -1.08    0.2816

Correlation Structure: AR(1)

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6406     0.1555     -1.9454    -1.3357     -10.55    <.0001
City           0.5478     0.1899      0.1755     0.9200       2.88    0.0039
Smoke          0.0132     0.0072     -0.0008     0.0273       1.85    0.0650
tl            -0.0477     0.0244     -0.0955     0.0001      -1.95    0.0507
tq             0.0076     0.0503     -0.0910     0.1062       0.15    0.8802
tc            -0.0254     0.0208     -0.0662     0.0153      -1.22    0.2213

Correlation Structure: MDEP(1)

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6238     0.1561     -1.9298    -1.3178     -10.40    <.0001
City           0.5334     0.1906      0.1599     0.9070       2.80    0.0051
Smoke          0.0127     0.0073     -0.0016     0.0270       1.74    0.0820
tl            -0.0429     0.0244     -0.0908     0.0049      -1.76    0.0787
tq             0.0231     0.0501     -0.0752     0.1213       0.46    0.6452
tc            -0.0274     0.0214     -0.0693     0.0145      -1.28    0.1997

Correlation Structure: MDEP(2)

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6600     0.1594     -1.9724    -1.3477     -10.42    <.0001
City           0.5742     0.1948      0.1924     0.9559       2.95    0.0032
Smoke          0.0136     0.0075     -0.0011     0.0283       1.82    0.0691
tl            -0.0512     0.0247     -0.0995    -0.0028      -2.07    0.0382
tq            -0.0041     0.0515     -0.1049     0.0968      -0.08    0.9371
tc            -0.0245     0.0207     -0.0650     0.0160      -1.19    0.2353


Correlation Structure: MDEP(3)

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6320     0.1534     -1.9327    -1.3313     -10.64    <.0001
City           0.5368     0.1889      0.1666     0.9069       2.84    0.0045
Smoke          0.0127     0.0069     -0.0008     0.0261       1.84    0.0662
tl            -0.0478     0.0245     -0.0959     0.0003      -1.95    0.0516
tq             0.0022     0.0501     -0.0960     0.1003       0.04    0.9656
tc            -0.0235     0.0206     -0.0638     0.0168      -1.14    0.2536

Correlation Structure: Unstructured

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6304     0.1534     -1.9311    -1.3298     -10.63    <.0001
City           0.5366     0.1888      0.1665     0.9068       2.84    0.0045
Smoke          0.0124     0.0069     -0.0011     0.0260       1.80    0.0711
tl            -0.0478     0.0245     -0.0959     0.0003      -1.95    0.0512
tq             0.0025     0.0500     -0.0955     0.1006       0.05    0.9597
tc            -0.0235     0.0206     -0.0639     0.0168      -1.14    0.2530

12.5.1  Comparing Parameter Estimates

If we look at the estimates of β using the different correlation models, we see
that the estimates are similar using all of the correlation structures. This
is to be expected since they are all (asymptotically) unbiased. In particular, city
is significant, and smoking is marginally significant. The linear age effect is also
marginally significant.
Suppose we want to test for no age effect in the model defined in (12.11), that
is, we wish to test

    H0: β3 = β4 = β5 = 0

We can construct this contrast as

          [ 0 0 0 1 0 0 ]        [ β3 ]
    Cβ =  [ 0 0 0 0 1 0 ]  β  =  [ β4 ] = 0
          [ 0 0 0 0 0 1 ]        [ β5 ]

This contrast is not significant using any of the correlation models when
implemented in SAS software. These results are presented in Table 12.5 for the
various correlation structures considered.
This contrast is not significant using any of the correlation models when implemented in SAS software. These results are presented in Table 12.5 for the various
correlation structures considered.
CORK. MODEL
IND
EXCH
AR(1)
MDEP(l)
MDEP(2)
MDEP(3)
UNSTR

d.f.
3
3
3
3
3
3
3

Qw
6.23
5.66
5.90
5.40
6.31
5.63
5.65

pvalue
0.1008
0.1292
0.1165
0.1447
0.0974
0.1310
0.1300

Table 12.5: Test for no age effects


The results in Table 12.5 indicate that, irrespective of the correlation structure
employed, we would fail to reject the null hypothesis, indicating that there is
no age effect. The test is implemented in SAS software by the contrast statement
contrast 'no time effects' tl 1, tq 1, tc 1 / e wald;. The relevant SAS software
statement is presented under the Estimated Relative Efficiency section.
Alternatively, we could have conducted the above test by looking at the log-likelihood.
When the age effects are in the model (that is, the full model) under the
exchangeable structure, for instance, the log-likelihood is −726.5196. When the
effects are removed from the model (leading to a reduced model), the new
log-likelihood is −728.4838. Hence, a test of the hypotheses

    H0: β3 = β4 = β5 = 0   versus   HA: at least one β ≠ 0

equals

    −2 × (log-likelihood_Reduced − log-likelihood_Full) = −2{−728.4838 + 726.5196} = 3.9284

or we could have used the difference in the deviance values, which equal 1456.9677
(on 1384 d.f.) for the reduced model and 1453.0392 (on 1381 d.f.) for the full model.
Once again, the test is based on

    (Deviance_Reduced − Deviance_Full)/φ = (1456.9677 − 1453.0392)/1.00 = 3.9285
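Either statistic is referred to a chi-squared distribution on 3 d.f.; the corresponding
pvalue (not printed above) can be obtained with a one-line computation:

data _null_;
   chisq = 3.9284;  df = 3;
   p = 1 - probchi(chisq, df);   /* about 0.27: well above 0.05, so not significant */
   put p=;
run;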
Because the cubic component of the age effect is not significant in any of the models,
we removed this effect from all subsequent models. We give below the
estimated correlation matrices for the responses at ages 7, 8, 9, and 10 for the
different models.

12.5.2  Correlation Matrices

Exchangeable                 Working Correlation Matrix
            Col1      Col2      Col3      Col4
Row1      1.0000    0.4289    0.4289    0.4289
Row2      0.4289    1.0000    0.4289    0.4289
Row3      0.4289    0.4289    1.0000    0.4289
Row4      0.4289    0.4289    0.4289    1.0000

AR(1)
            Col1      Col2      Col3      Col4
Row1      1.0000    0.4814    0.2317    0.1115
Row2      0.4814    1.0000    0.4814    0.2317
Row3      0.2317    0.4814    1.0000    0.4814
Row4      0.1115    0.2317    0.4814    1.0000

MDEP(1)
            Col1      Col2      Col3      Col4
Row1      1.0000    0.4824    0.0000    0.0000
Row2      0.4824    1.0000    0.4824    0.0000
Row3      0.0000    0.4824    1.0000    0.4824
Row4      0.0000    0.0000    0.4824    1.0000

MDEP(2)
            Col1      Col2      Col3      Col4
Row1      1.0000    0.4807    0.3686    0.0000
Row2      0.4807    1.0000    0.4807    0.3686
Row3      0.3686    0.4807    1.0000    0.4807
Row4      0.0000    0.3686    0.4807    1.0000


MDEP(3)
            Col1      Col2      Col3      Col4
Row1      1.0000    0.4806    0.3681    0.4000
Row2      0.4806    1.0000    0.4806    0.3681
Row3      0.3681    0.4806    1.0000    0.4806
Row4      0.4000    0.3681    0.4806    1.0000

Unstructured Correlation Matrix
            Col1      Col2      Col3      Col4
Row1      1.0000    0.4713    0.3698    0.3998
Row2      0.4713    1.0000    0.4980    0.3738
Row3      0.3698    0.4980    1.0000    0.4906
Row4      0.3998    0.3738    0.4906    1.0000

If one looks at the unstructured correlation matrix, observations closest in time
are the most highly correlated. Further, the correlation matrix from the
unstructured model looks quite similar to the MDEP(3) correlation matrix, suggesting
that the MDEP(3) structure is probably a good fit. What is obvious from this
analysis, and most important, is that the observations look far from
independent, and we would like to see how much efficiency we gain in estimating
β when using more complex correlation structures over independence (ordinary
logistic regression).

12.5.3  Interpretation of Parameter Estimates

Since β̂1 is approximately 0.5 using all of the correlation structures, living in the more
polluted city (Kingston-Harriman) tends to increase the odds of wheezing by a factor of

    ≈ exp(0.5) = 1.65

(given the age and maternal smoking level).
Since β̂2 is approximately 0.013 using all of the correlation structures, an increase
of two packs (40 cigarettes) per day smoked by the mother tends to increase the
odds of the child wheezing by a factor of

    exp(β̂2 × 40) ≈ exp(0.013 × 40) = 1.68

(given the age and city).

12.5.4  Confidence Intervals

A 95% confidence interval for the log-odds ratio can be calculated via

    β̂_j ± 1.96 √Var(β̂_j)

where Var(β̂_j) comes from the output (and it is better to use the robust variance).
Thus, a 95% C.I. for the odds ratio is calculated by exponentiating the endpoints
of this confidence interval:

    exp[β̂_j ± 1.96 √Var(β̂_j)]

Suppose we use the exchangeable correlation model for confidence intervals (with
the robust variance):


Correlation Structure: Exchangeable
PARAMETER ESTIMATES with robust variance
Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6305     0.1535     -1.9313    -1.3298     -10.63    <.0001
City           0.5357     0.1891      0.1650     0.9064       2.83    0.0046
Smoke          0.0124     0.0069     -0.0011     0.0259       1.80    0.0715
tl            -0.0492     0.0247     -0.0977    -0.0007      -1.99    0.0470
tq            -0.0033     0.0503     -0.1019     0.0954      -0.06    0.9482
tc            -0.0220     0.0204     -0.0620     0.0180      -1.08    0.2816

Then a 95% confidence interval for the city odds ratio is

    exp[0.536 ± 1.96(0.189)] = [1.18, 2.48]

Similarly, the 95% confidence interval for the odds-ratio factor associated with an
increase of two packs (40 cigarettes) per day smoked by the mother is

    exp{40[β̂2 ± 1.96 √Var(β̂2)]} = exp{40[0.0124 ± 1.96(0.00689)]} = [0.96, 2.82]    (12.13)
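Both intervals can be verified with a short data step using the estimates and robust
standard errors from the output above:

data _null_;
   b_city = 0.536;   se_city = 0.189;
   b_smk  = 0.0124;  se_smk  = 0.00689;
   city_lo = exp(b_city - 1.96*se_city);        /* about 1.18 */
   city_hi = exp(b_city + 1.96*se_city);        /* about 2.48 */
   smk_lo  = exp(40*(b_smk - 1.96*se_smk));     /* about 0.96 */
   smk_hi  = exp(40*(b_smk + 1.96*se_smk));     /* about 2.82 */
   put city_lo= city_hi= smk_lo= smk_hi=;
run;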

12.5.5  Estimated Relative Efficiency

To get a rough idea of the asymptotic efficiency of the estimates, we can compare
the robust variance estimators of β̂ under two different correlation models. First, we
give parameter estimates and the robust standard errors under the different models
for the case excluding both the quadratic and cubic age effects (since the preceding
analysis indicates that both effects are not significant at α = 0.05) under the IND,
EXCH, AR(1), MDEP(3), and UNSTR correlation structures. Partial SAS software
outputs for these implementations are presented below.
Independence:

proc genmod data=two descending;
class id time;
model wheeze = city smoke tl / link=logit dist=binomial;
contrast 'no time effects' tl 1 / e wald;
repeated subject=id / type=IND corrw WITHINSUBJECT=time modelse;
/* WITHINSUBJECT is the time variable */
run;

                 Analysis Of GEE Parameter Estimates
                  Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6137     0.1596     -1.9266    -1.3008     -10.11    <.0001
City           0.5181     0.1917      0.1425     0.8938       2.70    0.0069
Smoke          0.0128     0.0076     -0.0021     0.0278       1.68    0.0920
tl            -0.0530     0.0251     -0.1023    -0.0038      -2.11    0.0347


                 Analysis Of GEE Parameter Estimates
                 Model-Based Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6137     0.1076     -1.8245    -1.4029     -15.00    <.0001
City           0.5181     0.1318      0.2597     0.7765       3.93    <.0001
Smoke          0.0128     0.0057      0.0017     0.0240       2.25    0.0244
tl            -0.0530     0.0293     -0.1105     0.0044      -1.81    0.0702

                Contrast Results for GEE Analysis

Contrast             DF    ChiSquare    Pr > ChiSq    Type
no linear effect      1         4.46        0.0347    Wald

Exchangeable:
                 Analysis Of GEE Parameter Estimates
                  Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6308     0.1537     -1.9321    -1.3295     -10.61    <.0001
City           0.5367     0.1891      0.1661     0.9073       2.84    0.0045
Smoke          0.0124     0.0069     -0.0011     0.0258       1.80    0.0724
tl            -0.0488     0.0248     -0.0974    -0.0003      -1.97    0.0485

                 Analysis Of GEE Parameter Estimates
                 Model-Based Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6308     0.1516     -1.9279    -1.3337     -10.76    <.0001
City           0.5367     0.1895      0.1652     0.9082       2.83    0.0046
Smoke          0.0124     0.0073     -0.0019     0.0266       1.70    0.0895
tl            -0.0488     0.0229     -0.0937    -0.0040      -2.13    0.0329

AR(1):
                 Analysis Of GEE Parameter Estimates
                  Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6399     0.1555     -1.9446    -1.3351     -10.55    <.0001
City           0.5496     0.1901      0.1771     0.9222       2.89    0.0038
Smoke          0.0132     0.0072     -0.0009     0.0272       1.84    0.0661
tl            -0.0526     0.0244     -0.1005    -0.0047      -2.15    0.0312

                 Analysis Of GEE Parameter Estimates
                 Model-Based Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6399     0.1439     -1.9219    -1.3578     -11.40    <.0001
City           0.5496     0.1784      0.2000     0.8993       3.08    0.0021
Smoke          0.0132     0.0070     -0.0005     0.0269       1.88    0.0595
tl            -0.0526     0.0291     -0.1096     0.0043      -1.81    0.0701


MDEP(3):
                 Analysis Of GEE Parameter Estimates
                  Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6328     0.1536     -1.9338    -1.3317     -10.63    <.0001
City           0.5383     0.1889      0.1681     0.9085       2.85    0.0044
Smoke          0.0126     0.0069     -0.0009     0.0261       1.83    0.0671
tl            -0.0506     0.0245     -0.0986    -0.0026      -2.07    0.0388

                 Analysis Of GEE Parameter Estimates
                 Model-Based Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6328     0.1520     -1.9306    -1.3349     -10.74    <.0001
City           0.5383     0.1898      0.1662     0.9103       2.84    0.0046
Smoke          0.0126     0.0073     -0.0017     0.0269       1.73    0.0831
tl            -0.0506     0.0243     -0.0982    -0.0030      -2.08    0.0374

Unstructured:
                 Analysis Of GEE Parameter Estimates
                  Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6328     0.1536     -1.9339    -1.3317     -10.63    <.0001
City           0.5386     0.1889      0.1683     0.9089       2.85    0.0044
Smoke          0.0124     0.0069     -0.0012     0.0259       1.79    0.0729
tl            -0.0501     0.0245     -0.0981    -0.0021      -2.05    0.0408

                 Analysis Of GEE Parameter Estimates
                 Model-Based Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6328     0.1522     -1.9310    -1.3346     -10.73    <.0001
City           0.5386     0.1901      0.1661     0.9112       2.83    0.0046
Smoke          0.0124     0.0073     -0.0019     0.0267       1.70    0.0890
tl            -0.0501     0.0243     -0.0977    -0.0025      -2.06    0.0390

From the outputs above, the estimated robust (empirical) standard errors of β̂
under exchangeability, AR(1), MDEP(3), and unstructured are very similar under
the reduced model:

    logit[π_ij] = β0 + β1 city_i + β2 smoke_ij + β3 agel_ij                (12.14)

Hence, we appear to gain the greatest efficiency for estimating β by assuming an
exchangeable model instead of independence, and we do not appear to gain much
additional efficiency by assuming AR(1), MDEP(m), or unstructured instead of
exchangeable. As long as we use some correlation model other than independence,
the simplest of which is exchangeable, we gain efficiency. The estimated relative
efficiency of the elements of β̂ for an independence model versus an exchangeable
model is given in Table 12.6.
                 Standard errors        Efficiency (%)
Parameters      EXCH        IND      [Var(EXCH)/Var(IND)]
Intercept      0.1537     0.1596           92.74
City           0.1891     0.1917           97.31
Smoke          0.0069     0.0076           82.43
tl             0.0248     0.0251           97.62

     Table 12.6: Relative efficiency of IND versus EXCH models

The efficiencies are very high, with the smallest being for the maternal smoking
effect. The estimates under the naive assumption of independence appear very
efficient for all effects except the within-cluster (time-varying) covariate, maternal
smoking. This result is very similar to the result we found for the teratology data
set in Example 12.2. In that data set, the within-cluster covariate was the sex of
the offspring, and the efficiency of the independence (ordinary logistic regression)
estimates was only 79.72%. As a general rule, therefore, ordinary logistic regression
gives high efficiency for the time-stationary (cluster-level) covariates, and low
efficiency for the time-varying (within-cluster) covariates.

12.5.6  "Naive" Versus Robust Variance Estimate

Thus far, we have not discussed using the "naive" variance estimate. One question
often asked is: when can I use the naive estimate of variance (the one that assumes
you have modeled the correlation structure correctly)? The general rule is that for
the robust estimator to be a good estimate, the number of clusters (n) should be
large. On the other hand, if you have specified the correlation model correctly, the
naive variance estimator will be a good estimate as long as Σ_i K_i is large. This
can occur if the number of subjects within the cluster is large (K_i is large),
or the number of clusters (n) is large, or both are intermediate. Lipsitz (1999)
suggested, for instance, that when n is small, say n < 25, you should
carefully model your correlation so that you can use the naive variance estimate
(the robust estimate is not good in this case).
However, if only n = 20 children had been available, with wheeze measurements
each day for a year (K_i = 365 days), then one would be better off assuming that
the correlation model is correct and using the "naive" variance estimate. For the Six
Cities Study example in this section, the number of clusters n = 412 is large enough
to use the robust variance estimator (one would like to have at least n = 25).
One may also wish to calculate the relative bias of the "naive" estimator by
looking at

    REL BIAS = [SE(naive) − SE(robust)] / SE(robust)

This is necessary because with large n the "robust" variance estimators are correct,
while the naive ones are correct only if we have modeled the correlation correctly. Thus
we can get an idea of the relative bias of the naive estimator by looking at the
relative bias under different correlation structures. We present these results for
both the independence and exchangeable correlation structures below, under the
reduced model that excludes the quadratic and cubic age effects. First we present
that of the independence correlation structure:


(a) Ordinary Logistic Regression

Correlation Structure: Independence
PARAMETER ESTIMATES with naive variance (Model-Based Estimates)

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6137     0.1076     -1.8245    -1.4029     -15.00    <.0001
City           0.5181     0.1318      0.2597     0.7765       3.93    <.0001
Smoke          0.0128     0.0057      0.0017     0.0240       2.25    0.0244**
tl            -0.0530     0.0293     -0.1105     0.0044      -1.81    0.0702

PARAMETER ESTIMATES with robust variance (Empirical Estimates)

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6137     0.1596     -1.9266    -1.3008     -10.11    <.0001
City           0.5181     0.1917      0.1425     0.8938       2.70    0.0069
Smoke          0.0128     0.0076     -0.0021     0.0278       1.68    0.0920
tl            -0.0530     0.0251     -0.1023    -0.0038      -2.11    0.0347

The biases of the standard errors of the parameters are summarized in Table 12.7.

Parameter    SE(NAIVE)    SE(ROBUST)    Relative bias
INTERCEP       0.1076        0.1596        -0.3258
City           0.1318        0.1917        -0.3125
Smoke          0.0057        0.0076        -0.2500
tl             0.0293        0.0251         0.1673

  Table 12.7: Relative bias of standard errors under independence model
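The relative biases in Table 12.7 follow directly from the two sets of standard errors;
a quick check:

data relbias;
   input parameter $ se_naive se_robust;
   rel_bias = (se_naive - se_robust) / se_robust;
   datalines;
INTERCEP 0.1076 0.1596
City     0.1318 0.1917
Smoke    0.0057 0.0076
tl       0.0293 0.0251
;
proc print data=relbias; run;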


Here, we see that the relative biases are mostly greater than 25%. A "myth"
that has prevailed is that, since ordinary logistic regression treats all observations within clusters as independent, we are in effect assuming that we have
more information than we actually do, and that we will therefore always underestimate the true variance. We see here that that is not always true, i.e.,
the naive variance does not always underestimate the true variance: It depends on what parameter you are estimating. We demonstrate this with a
simple theoretical justification below:
Suppose we have two correlated means, Y\ and Y-2, where Cov(Yi, y2) > 0, and
also suppose we look at (Yi Y^), then the naive variance under independence
is
Var (Yi - F2) = Var (Yi) + Var(Y2)
which is larger than the true variance,
Var(Yi - y2) = Varft) + Var(y 2 ) - 2Cov(Yi, y2)
since Cov(Yi,Y 2 ) > 0.
On the other hand, suppose we consider instead, (Yi + Y^)- The naive variance
under independence is again,
Var (Yi + y2) = Var (Yi) + Var (Y2)



which this time is smaller than the true variance,
Var (Yi - y2) = Var(Yi) + Var (Y2) + 2Cov(F 1 ,y 2 )

12.5.7  Effect of Bias on P Values

From the results for both the "naive" (model-based) and "robust" (empirical)
estimates presented above, smoking appears significant using the "naive" estimate,
but not significant when the robust estimate is employed under the independence
model.
Hence, we should always use the robust variance when using ordinary logistic
regression.
(b) Bias of Naive Variance in the Exchangeable Model
Results under this model are similarly displayed below.

Correlation Structure: Exchangeable
PARAMETER ESTIMATES with naive variance

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6308     0.1516     -1.9279    -1.3337     -10.76    <.0001
City           0.5367     0.1895      0.1652     0.9082       2.83    0.0046
Smoke          0.0124     0.0073     -0.0019     0.0266       1.70    0.0895
tl            -0.0488     0.0229     -0.0937    -0.0040      -2.13    0.0329

PARAMETER ESTIMATES with robust variance

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept     -1.6308     0.1537     -1.9321    -1.3295     -10.61    <.0001
City           0.5367     0.1891      0.1661     0.9073       2.84    0.0045
Smoke          0.0124     0.0069     -0.0011     0.0258       1.80    0.0724
tl            -0.0488     0.0248     -0.0974    -0.0003      -1.97    0.0485

The corresponding summary table for the biases is presented in Table 12.8. The
relative biases of the "naive" standard errors from the exchangeable model are all
less than 10%.

Variable     SE(NAIVE)    SE(ROBUST)    Relative bias
Intercept      0.1516        0.1537        -0.0137
City           0.1895        0.1891         0.0021
Smoke          0.0073        0.0069         0.0580
tl             0.0229        0.0248        -0.0766

  Table 12.8: Relative bias of standard errors under exchangeable model
Table 12.8: Relative bias of standard errors under exchangeable model


For the exchangeable model, unlike the independence model, smoking is not
significant using either the "naive" estimate and the robust estimate under
the reduced model (pvalue >0.05 in each case).


The point is that the robust estimate should definitely be used if you are using
the independence correlation model, and it is still a good idea to use it with the
other correlation models, though it is probably not absolutely necessary, especially
when n is small (since the robust estimate needs n to be large for it to be good).

12.5.8  Parsimonious Model

For the data in Example 12.3, both the quadratic and cubic effects of age are not
significant, as shown previously; hence the reduced model is given by the expression
in (12.14). This model incorporates only the linear effect of age.
The robust parameter estimates under this model with the exchangeable correlation
structure are displayed below.

proc genmod data=two;
class id time;
model wheeze = city smoke tl / dist=bin link=logit;
repeated subject=id / type=exch corrw within=time modelse;
run;
                 Analysis Of GEE Parameter Estimates
                  Empirical Standard Error Estimates

                         Standard     95% Confidence
Parameter    Estimate      Error      Limits                   Z    Pr > |Z|
Intercept      1.6308     0.1537      1.3295     1.9321      10.61    <.0001
City          -0.5367     0.1891     -0.9073    -0.1661      -2.84    0.0045
Smoke         -0.0124     0.0069     -0.0258     0.0011      -1.80    0.0724
tl             0.0488     0.0248      0.0003     0.0974       1.97    0.0485

(Note that without the descending option this run models pr(wheeze = 0), so the
signs are reversed relative to the earlier fits; the magnitudes are unchanged.)

Both the linear effect of age and the city effect are significant, with an estimated
correlation parameter ρ̂ = 0.4287. From the output above, we can conclude that the
odds of wheezing are exp(0.5367) ≈ 1.71 times higher in the more polluted city
(Kingston-Harriman) than in the less polluted city of Portage. Similarly, for a unit
increase in the age of the child, the odds of wheezing increase by exp(0.0488) = 1.05,
while the odds increase by exp(2 × 0.0488) = 1.10 for a two-unit increase in age
(say from 7 to 9 years, for instance).

12.6  Analysis of Nonbinary Response Data

The examples of the GEE approach in the previous sections dealt mainly with cases
in which the outcome variable is binary or dichotomous. We consider in this section
the case in which the outcome variable is not binary. The following example from
Thall and Vail (1990) relates to data arising from a clinical trial of 59 epileptics.
The complete data are presented in Appendix G.3. We present next the
first and last 5 observations from the complete data.
 id    y1    y2    y3    y4    trt    base    age
  1     5     3     3     3      0      11     31
  2     3     5     3     3      0      11     30
  3     2     4     0     5      0       6     25
  4     4     4     1     4      0       8     36
  5     7    18     9    21      0      66     22
 ...
 55     3     5     4     3      1      16     32
 56     1    23    19     8      1      22     26
 57     2     3     0     1      1      25     21
 58     0     0     0     0      1      13     36
 59     1     4     3     2      1      12     37

The data relate to patients suffering from simple or complex partial seizures who
were randomized to receive either the antiepileptic drug progabide or a placebo, as
an adjuvant to chemotherapy. At each of four successive postrandomization clinic
visits (y1, y2, y3, y4), the number of seizures occurring over the previous 2 weeks
was reported. The data above also display the age (in years) of the epileptic,
the treatment assigned, and the 8-week baseline seizure count.
Following Thall and Vail (1990), the following covariates were employed: the log
of the baseline seizure rate, obtained as the log of 1/4 of the 8-week seizure count;
the log of age; the treatment; and Visit4, an indicator for the fourth clinic visit.
However, before we adopt the model proposed by Thall and Vail (1990), let us
consider the four 2-week visits as a covariate having four equally spaced levels.
This leads us to further consider employing the linear, quadratic, and cubic effects
of the covariate visits in our model. We give below the SAS software program for
the data step, together with a sample output of the first twenty observations for
the first five epileptics.
with a sample output of the first twenty observations for the first five epileptics.
data epilepsy;
infile 'c:\classdata\clll\lepsyii.txt';
input id y1-y4 trt base age;
run;
data new; set epilepsy; by id;
array xx[*] y1-y4; do visit=1 to 4; y=xx[visit]; output; end;
drop y1-y4; run;
DATA TWO; SET new;
*** FORMING ORTHOGONAL CONTRASTS ***;
/* Statements here are similar to those in the previous example */
logage=log(age); logbase4=log(base/4); v4=(visit=4); run;
proc print; run;

Obs  id  trt  base  age  visit    y   wkl  wkq  wkc   logage   logbase4   v4
  1   1   0    11    31     1     5    -3    1   -1   3.43399   1.01160    0
  2   1   0    11    31     2     3    -1   -1    3   3.43399   1.01160    0
  3   1   0    11    31     3     3     1   -1   -3   3.43399   1.01160    0
  4   1   0    11    31     4     3     3    1    1   3.43399   1.01160    1
  5   2   0    11    30     1     3    -3    1   -1   3.40120   1.01160    0
  6   2   0    11    30     2     5    -1   -1    3   3.40120   1.01160    0
  7   2   0    11    30     3     3     1   -1   -3   3.40120   1.01160    0
  8   2   0    11    30     4     3     3    1    1   3.40120   1.01160    1
  9   3   0     6    25     1     2    -3    1   -1   3.21888   0.40547    0
 10   3   0     6    25     2     4    -1   -1    3   3.21888   0.40547    0
 11   3   0     6    25     3     0     1   -1   -3   3.21888   0.40547    0
 12   3   0     6    25     4     5     3    1    1   3.21888   0.40547    1
 13   4   0     8    36     1     4    -3    1   -1   3.58352   0.69315    0
 14   4   0     8    36     2     4    -1   -1    3   3.58352   0.69315    0
 15   4   0     8    36     3     1     1   -1   -3   3.58352   0.69315    0
 16   4   0     8    36     4     4     3    1    1   3.58352   0.69315    1
 17   5   0    66    22     1     7    -3    1   -1   3.09104   2.80336    0
 18   5   0    66    22     2    18    -1   -1    3   3.09104   2.80336    0
 19   5   0    66    22     3     9     1   -1   -3   3.09104   2.80336    0
 20   5   0    66    22     4    21     3    1    1   3.09104   2.80336    1



We propose a model of the form:
f-ij = 00 + 0i trtt + 02 base4tj + 0s trt*base4jj + 04
+ /35 wklij + 06 wkqfj- +
where
trt =

(12.15)

I if progabide
0 if placebo
V

and lij are the log of counts, trt*base4 is the interaction between treatment group
and log base4 as denned previously, and wkl, wkq, and wkc are the orthogonal linear,
quadratic, and cubic components of the weeks of visits, respectively. We present
below the results of employing the usual log-linear model to the epileptic data and
those from the GEE using the independent (IND) and exchangeable correlation
structures. The results from the other correlation structures are very similar.
data prelim;
set two;
proc genmod data=prelim;
title 'Epileptic seizure data: Standard Poisson regression model';
model y = trt logbase4 trt*logbase4 logage wkl wkq wkc / dist=poisson type3;
run;
proc genmod data=prelim;
title 'Epileptic seizure data: Marginal model with exchangeable correlation structure';
class id visit;
model y = trt logbase4 trt*logbase4 logage wkl wkq wkc / dist=poisson type3;
contrast 'linear' wkl 1, wkq 1, wkc 1 / e wald;
repeated subject=id / type=EXCH corrw within=visit modelse;
run;

(Replacing type=EXCH with type=IND gives the independence fit.)
LOG-LINEAR MODEL
                        Analysis Of Parameter Estimates

                                  Standard   Wald 95% Confidence    Chi-
Parameter        DF   Estimate      Error    Limits                Square   Pr > ChiSq
Intercept         1    -2.7661     0.4074   -3.5647    -1.9676      46.09      <.0001
trt               1    -1.3386     0.1568   -1.6459    -1.0314      72.90      <.0001*
logbase4          1     0.9486     0.0436    0.8632     1.0341     473.46      <.0001*
trt*logbase4      1     0.5615     0.0635    0.4370     0.6860      78.16      <.0001*
logage            1     0.8876     0.1165    0.6593     1.1159      58.05      <.0001*
wkl               1    -0.0301     0.0102   -0.0502    -0.0101       8.66      0.0033*
wkq               1    -0.0180     0.0227   -0.0625     0.0266       0.63      0.4290
wkc               1    -0.0111     0.0101   -0.0308     0.0087       1.20      0.2724
Scale             0     1.0000     0.0000    1.0000     1.0000

GEE with IND correlation structure:

                      Analysis Of GEE Parameter Estimates
                       Empirical Standard Error Estimates

                              Standard     95% Confidence
Parameter        Estimate       Error      Limits                   Z    Pr > |Z|
Intercept         -2.7661      0.9393     -4.6072    -0.9250     -2.94     0.0032
trt               -1.3386      0.4255     -2.1726    -0.5047     -3.15     0.0017
logbase4           0.9486      0.0965      0.7595     1.1377      9.83     <.0001
trt*logbase4       0.5615      0.1739      0.2207     0.9024      3.23     0.0012
logage             0.8876      0.2727      0.3530     1.4222      3.25     0.0011
wkl               -0.0301      0.0170     -0.0634     0.0031     -1.78     0.0755
wkq               -0.0180      0.0438     -0.1037     0.0678     -0.41     0.6814
wkc               -0.0111      0.0196     -0.0494     0.0273     -0.57     0.5716

GEE with EXCH structure:

                      Analysis Of GEE Parameter Estimates
                       Empirical Standard Error Estimates

                              Standard     95% Confidence
Parameter        Estimate       Error      Limits                   Z    Pr > |Z|
Intercept         -2.8434      0.9512     -4.7077    -0.9792     -2.99     0.0028
trt               -1.3528      0.4287     -2.1930    -0.5125     -3.16     0.0016**
logbase4           0.9489      0.0985      0.7558     1.1420      9.63     <.0001**
trt*logbase4       0.5696      0.1744      0.2277     0.9115      3.27     0.0011**
logage             0.9097      0.2752      0.3704     1.4491      3.31     0.0009**
wkl               -0.0301      0.0170     -0.0634     0.0031     -1.78     0.0755
wkq               -0.0180      0.0438     -0.1037     0.0678     -0.41     0.6814
wkc               -0.0111      0.0196     -0.0494     0.0273     -0.57     0.5716

The results indicate that the linear, quadratic, and cubic effects of weeks of visits are
not significant based on the type 3 analysis. Hence, for a hypothesis of no weeks-of-visits
effect in (12.15), that is, testing

    H0: β5 = β6 = β7 = 0

we can construct the following contrast:

          [ 0 0 0 0 0 1 0 0 ]        [ β5 ]
    Cβ =  [ 0 0 0 0 0 0 1 0 ]  β  =  [ β6 ] = 0
          [ 0 0 0 0 0 0 0 1 ]        [ β7 ]

This contrast is not significant using any of the correlation structures when
implemented in SAS software. The results for the exchangeable correlation structure,
for instance, give a Wald test value of 6.70 on 3 degrees of freedom with a
corresponding nonsignificant pvalue of p = 0.0820. Refitting the model with no
weeks-of-visits effect, we have the reduced model

    ℓ_ij = β0 + β1 trt_i + β2 logbase4_i + β3 trt*logbase4_i + β4 logage_i   (12.16)

and the following parameter estimates from the various correlation structures.
and the following parameter estimates from the various correlation structures.
Parameter

POISSON

Intercept
trt
Iogbase4
trt*logbase4
logage

-2,.7634
-1,.3356
0,.9486
0,.5615
0,.8876

EXCH

-2.,7634
-1,,3356
0,.9486
0,.5615
0,.8876

AR(1)

-3.0533
-1.4864
0.9413
0 . 6200
0.9790

UNSTR

1-DEP

-3,.0533
-1 .4864
0.9413
0.6200
0.9790

-3.,2516
-1.,5876
0.,9364
0.,6600
1.,0413

-3.0529
-1.4862
0.9413
0.6199
0.9788

-3.0932
-1.5211
0.9320
0.6359
0.9973

It is clear that not much is gained in terms of parameter estimates between the
log-linear, independent, and exchangeable correlation structures. Similarly, the
parameter estimates from both the AR(1) and 1-dependent correlation structures are
exactly the same, apart from their different standard errors. The 2-dependent,
3-dependent, and unstructured correlation structures returned different parameter
estimates, as shown. For all correlation structures employed for this example,
we notice that the standard errors produced under the GEE are much greater than
those produced from the usual log-linear regression. The standard errors produced
under the log-linear model assume that the observations are independent from visit
to visit, which is not necessarily the case. Hence, these standard errors may be
misleading, and the appropriate model would be one that assumes some correlation
between the observations from visit 1 to visit 4. Under the GEE models, all the
parameters of the model are highly significant.
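The structure-by-structure fits compared above can be generated with a small macro
loop rather than repeating the proc genmod step by hand. A sketch follows; the
macro name is ours, and %str is used to mask the parentheses in type values such
as AR(1):

%macro geefit(corr);
   proc genmod data=prelim;
      class id visit;
      model y = trt logbase4 trt*logbase4 logage / dist=poisson;
      repeated subject=id / type=&corr within=visit;
   run;
%mend geefit;
%geefit(IND);            %geefit(EXCH);
%geefit(%str(AR(1)));    %geefit(%str(MDEP(2)));
%geefit(%str(MDEP(3)));  %geefit(UN);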

12.6.1  Interpretation of Parameter Estimates

Since β̂4 is approximately 0.98 using all the correlation structures, we see that the
expected number of epileptic attacks increases by a factor of

    exp(β̂4) ≈ exp(0.98) = 2.66

that is, roughly threefold, for a unit increase in log(age), given the levels of the
other covariates over the 8-week period. Since there is a significant interaction
effect of treatment and logbase4, the negative value of the treatment estimate β̂1
indicates that the seizure count is significantly lower after 8 weeks for the treatment
group compared to the placebo group and that, further, this decrease is affected by
the baseline seizure counts for each group.

12.6.2  Thall and Vail Model

Thall and Vail (1990) proposed the following model for the epileptic data in this
example:

    ℓ_ij = β0 + β1 trt_i + β2 logbase4_i + β3 trt*logbase4_i + β4 logage_i + β5 v4_ij   (12.17)

where

    trt = 1 if progabide        v4 = 1 if visit = 4
          0 if placebo               0 otherwise

The values of v4 and trt were displayed earlier for the first 5 subjects.
The model in (12.17) is implemented under various correlation structures, and
the parameter estimates under both the IND and EXCH correlation structures are
displayed below, along with the accompanying SAS software program.
data two; set epilepsy;
proc genmod data=two;
title 'Epileptic seizure data: Marginal model with INDEPENDENT correlation
structure';
class id visit;
model y=trt Iogbase4 trt*logbase4 logage v4/ dist=poisson type3;
repeated subject=id/type=IND corrw within=visit modelse;
run;
proc genmod data=two;
title 'Epileptic seizure data: Marginal model with exchangeable correlation
structure';
class id visit;
model y=trt Iogbase4 trt*logbase4 logage v4/ dist=poisson type3;
repeated subject=id/type=EXCH corrw within=visit modelse;
run;


IND:
                      Analysis Of GEE Parameter Estimates
                       Empirical Standard Error Estimates

                              Standard     95% Confidence
Parameter        Estimate       Error      Limits                   Z    Pr > |Z|
Intercept         -2.7258      0.9382     -4.5646    -0.8870     -2.91     0.0037
trt               -1.3386      0.4255     -2.1726    -0.5047     -3.15     0.0017
logbase4           0.9486      0.0965      0.7595     1.1377      9.83     <.0001
trt*logbase4       0.5615      0.1739      0.2207     0.9024      3.23     0.0012
logage             0.8876      0.2727      0.3530     1.4222      3.25     0.0011
v4                -0.1598      0.0651     -0.2874    -0.0321     -2.45     0.0142
EXCH:
                      Analysis Of GEE Parameter Estimates
                       Empirical Standard Error Estimates

                              Standard     95% Confidence
Parameter        Estimate       Error      Limits                   Z    Pr > |Z|
Intercept         -2.7599      0.9491     -4.6202    -0.8996     -2.91     0.0036
trt               -1.3361      0.4293     -2.1775    -0.4946     -3.11     0.0019
logbase4           0.9495      0.0987      0.7561     1.1428      9.62     <.0001
trt*logbase4       0.5625      0.1749      0.2197     0.9053      3.22     0.0013
logage             0.8965      0.2751      0.3574     1.4356      3.26     0.0011
v4                -0.1598      0.0651     -0.2874    -0.0321     -2.45     0.0142

The overdispersion parameter for the above model is 4.4139, with an estimated
correlation coefficient under the exchangeable structure of ρ̂ = 0.3543. The type 3
GEE analysis also indicates that trt, logbase4, logage, and v4 are significant at the
5% level, with pvalues of 0.0171, 0.0038, 0.0111, and 0.0417, respectively.

12.6.3  Diggle, Liang, and Zeger Model

Diggle, Liang, and Zeger (1995) proposed the following Poisson regression model
for the epileptic data:

    ℓ_ij = ln E(y_ij) = ln(t_ij) + β0 + β1 x1_ij + β2 trt_i + β3 x1_ij * trt_i    (12.18)

where the covariates are defined as:

    x1_ij = 1 if visit = 1, 2, 3, or 4        trt = 1 if progabide
            0 if baseline                           0 if placebo

and

    t_ij = 8 if j = 0 (baseline)
           2 if j = 1, 2, 3, or 4

The t_ij are included in the model (as the offset ln t_ij) to account for the different
observation periods. Notice that there is no age effect in the model. Diggle et al.
(1995) conclude that patients in the two treatment groups "appear to be comparable
in terms of baseline age and eight-week baseline seizure counts".
Again, the results of implementing this model for the exchangeable correlation
structure are displayed below, where we have included the data transformation for
the first two patients in the partial output.



data new1; set new; output;
if visit=1 then do; y=base; visit=0; output; end; run;
data new2; set new1; if visit=0 then do; x1=0; ltime=log(8); end;
else do; x1=1; ltime=log(2); end; x1trt=x1*trt; run;
proc genmod data=new2;
class id;
model y = x1 trt x1*trt / dist=poisson offset=ltime type3;
repeated subject=id / type=EXCH corrw modelse; run;

 Obs   id   base   age   trt   intercpt   visit     y   x1      ltime
   1    1     11    31     0          1       1     5    1    0.69315
   2    1     11    31     0          1       0    11    0    2.07944
   3    1     11    31     0          1       2     3    1    0.69315
   4    1     11    31     0          1       3     3    1    0.69315
   5    1     11    31     0          1       4     3    1    0.69315
   6    2     11    30     0          1       1     3    1    0.69315
   7    2     11    30     0          1       0    11    0    2.07944
   8    2     11    30     0          1       2     5    1    0.69315
   9    2     11    30     0          1       3     3    1    0.69315
  10    2     11    30     0          1       4     3    1    0.69315

                     Analysis Of GEE Parameter Estimates
                      Empirical Standard Error Estimates

                            Standard    95% Confidence Limits
 Parameter     Estimate        Error       Lower       Upper         Z   Pr > |Z|
 Intercept       1.3476       0.1574      1.0392      1.6560      8.56     <.0001
 x1              0.1087       0.1156     -0.1179      0.3354      0.94     0.3472
 trt             0.0265       0.2219     -0.4083      0.4613      0.12     0.9049
 x1*trt         -0.1016       0.2134     -0.5198      0.3166     -0.48     0.6339

Results from this model suggest that there is very little difference between the
treatment and placebo groups in the change of seizure counts before and after
treatment, since the estimated coefficient β3 is nonsignificant in the above analysis.
The overdispersion parameter for this model is 19.69, with an estimated ρ of 0.7712;
this overdispersion is considerably higher than that obtained from the Thall and
Vail model.

12.6.4  Effect of Removing Patient 49

We observe that the patient with id number 49 (equivalent to patient 207 in Thall
and Vail (1990) and Diggle et al. (1995)) has an unusually high seizure count of
151 in the 8 weeks at baseline and a total of 302 seizures during the 8 weeks of
follow-up. We examine the effect of dropping this patient on our analyses below.

Again, the model incorporating the linear, quadratic, and cubic components of
weeks of visit gives a Wald statistic of Q_W = 4.15 on 3 d.f., with a p-value of
0.2457, for the hypothesis of no time effects, again indicating that this hypothesis
is tenable.

When patient 49 is deleted from the data, with the analysis employing the Thall
and Vail (1990) model in (12.17), we have the following parameter estimates under
the exchangeable correlation structure.
proc genmod data=two;
where id ne 49;
class id visit;
model y=trt Iogbase4 trt*logbase4 logage v4/ dist=poisson type3;
repeated subject=id/type=EXCH corrw within=visit;
run;


                     Analysis Of GEE Parameter Estimates
                      Empirical Standard Error Estimates

                               Standard    95% Confidence Limits
 Parameter        Estimate        Error       Lower       Upper         Z   Pr > |Z|
 Intercept         -2.3223       0.8734     -4.0342     -0.6103     -2.66     0.0078
 trt               -0.5160       0.4178     -1.3349      0.3029     -1.24     0.2168
 logbase4           0.9500       0.0983      0.7573      1.1426      9.67     <.0001
 trt*logbase4       0.1375       0.1946     -0.2439      0.5189      0.71     0.4797
 logage             0.7662       0.2535      0.2694      1.2630      3.02     0.0025
 v4                -0.1464       0.0758     -0.2949      0.0022     -1.93     0.0534

The model indicates again that the treatment has very little effect on seizure counts
after randomization of the subjects to the two groups. The negative value of the
treatment coefficient, however, indicates that there is some reduction in the seizure
counts for the progabide group, although the effect is not significant. The
overdispersion parameter for this model is 4.1496, with an estimated correlation
coefficient of 0.3353.
Similarly, when we employ the Diggle et al. (1995) model in (12.18), we obtain
the following results, again based on the exchangeable correlation structure.

proc genmod data=new2;
where id ne 49;
class id;
model y=x1 trt x1*trt/ dist=poisson offset=ltime type3;
repeated subject=id/type=EXCH corrw;
run;
                     Analysis Of GEE Parameter Estimates
                      Empirical Standard Error Estimates

                            Standard    95% Confidence Limits
 Parameter     Estimate        Error       Lower       Upper         Z   Pr > |Z|
 Intercept       1.3476       0.1574      1.0392      1.6560      8.56     <.0001
 x1              0.1087       0.1156     -0.1179      0.3354      0.94     0.3472
 trt            -0.1080       0.1937     -0.4876      0.2716     -0.56     0.5770
 x1*trt         -0.2995       0.1709     -0.6345      0.0354     -1.75     0.0797

The model again indicates that the treatment has very little effect on seizure counts
after randomization of the subjects to the two groups. The negative value of the
treatment coefficient, however, indicates that there is some reduction in the seizure
counts for the progabide group, although this effect is not significant. The
overdispersion parameter for this model is 10.5308, with an estimated correlation
coefficient of 0.5932.

12.7  Exercises

1. Ware et al. (1984) analyzed the wheezing data in Table 12.9 from a six-city
   study on the health effects of pollution. We give below the data from only
   one of the cities. Children in Steubenville, OH, were examined for wheezing
   at each of ages 7 through 10. The mothers' smoking habits were also
   recorded at the start of the study.
                              Mother smokes
                                           Age 10
       Age 7    Age 8    Age 9         No       Yes
       No       No       No           118         6
                         Yes            8         2
                Yes      No            11         1
                         Yes            6         4
       Yes      No       No             7         3
                         Yes            3         1
                Yes      No             4         2
                         Yes            4         7

                            Nonsmoking mother
                                           Age 10
       Age 7    Age 8    Age 9         No       Yes
       No       No       No           237        10
                         Yes           15         4
                Yes      No            16         2
                         Yes            7         3
       Yes      No       No            24         3
                         Yes            3         2
                Yes      No             6         2
                         Yes            5        11

                       Table 12.9: Wheezing data


   Analyze the above data and interpret your results. Is there an age effect on
   the child's wheezing? What effect does the mother's smoking status have on
   the child's wheezing?
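   As a starting point, the sketch below shows one way of arranging these data
   for a GEE analysis with PROC GENMOD. The variable names (smoke, w7-w10) and
   the expansion of the pattern counts into one record per child per age are
   our own choices and are not part of the original data set.

   data wheeze;
      input smoke w7 w8 w9 w10 count @@;   /* 1 = wheezing at that age */
      datalines;
   1 0 0 0 0 118  1 0 0 0 1  6  1 0 0 1 0  8  1 0 0 1 1  2
   1 0 1 0 0  11  1 0 1 0 1  1  1 0 1 1 0  6  1 0 1 1 1  4
   1 1 0 0 0   7  1 1 0 0 1  3  1 1 0 1 0  3  1 1 0 1 1  1
   1 1 1 0 0   4  1 1 1 0 1  2  1 1 1 1 0  4  1 1 1 1 1  7
   0 0 0 0 0 237  0 0 0 0 1 10  0 0 0 1 0 15  0 0 0 1 1  4
   0 0 1 0 0  16  0 0 1 0 1  2  0 0 1 1 0  7  0 0 1 1 1  3
   0 1 0 0 0  24  0 1 0 0 1  3  0 1 0 1 0  3  0 1 0 1 1  2
   0 1 1 0 0   6  0 1 1 0 1  2  0 1 1 1 0  5  0 1 1 1 1 11
   ;
   data long; set wheeze;
      do k=1 to count;            /* one child for each unit of count */
         id + 1;                  /* distinct subject (cluster) identifier */
         age=7;  wheeze=w7;  output;
         age=8;  wheeze=w8;  output;
         age=9;  wheeze=w9;  output;
         age=10; wheeze=w10; output;
      end;
      keep id smoke age wheeze;
   run;
   proc genmod data=long descending;
      class id;
      model wheeze = smoke age / dist=bin link=logit type3;
      repeated subject=id / type=exch corrw;
   run;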
2. Responses to three questions on abortion in surveys conducted over three
years (Haberman, 1978, p. 482) are given in Table 12.10.
                                     Year
            Response       1972      1973      1974
            YYY             334       428       413
            YYN              34        29        29
            YNY              12        13        16
            YNN              15        17        18
            NYY              53        42        60
            NYN              63        53        57
            NNY              43        31        37
            NNN             501       453       430

                 Table 12.10: Abortion survey data


   The questions were: Should a pregnant woman be able to obtain a legal
   abortion if
   (i) she is married and does not want more children;
   (ii) the family has very low income and cannot afford any more children;
   (iii) she is not married and does not want to marry the man?
Fit a Rasch model to the above data.
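   Hint: one standard route is the log-linear representation of the Rasch model,
   in which the three question (item) effects are accompanied by a factor
   variable for the raw score, the number of Yes responses (cf. Tjur, 1982).
   The sketch below, for the 1972 responses only, uses our own variable names;
   note that one parameter will be aliased because the item indicators sum to
   the raw score.

   data abort72;
      input q1 q2 q3 count @@;   /* 1 = Yes on questions (i)-(iii) */
      score = q1 + q2 + q3;      /* raw score: number of Yes responses */
      datalines;
   1 1 1 334  1 1 0  34  1 0 1  12  1 0 0  15
   0 1 1  53  0 1 0  63  0 0 1  43  0 0 0 501
   ;
   proc genmod data=abort72;
      class score;
      model count = q1 q2 q3 score / dist=poisson link=log;
   run;

   The 1973 and 1974 responses can be handled in the same way, or all three
   years can be fitted jointly by adding year and year-by-item terms.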
3. Data on responses (C: correct, W: wrong) to four questions from the arithmetic
   reasoning test on the Armed Services Vocational Aptitude Battery, with
   samples from White and Black males and females (Lindsey, 1995), are
   reproduced in Table 12.11.


                           White            Black
         Response        M       F        M       F
         CCCC           42      86        2       4
         WCCC            1       7        0       3
         CWCC           19       6        1       2
         WWCC            2       2        3       3
         CCWC           11      15        9       5
         WCWC            3       5        5       5
         CWWC            6       8       10      10
         WWWC            5       8        5       8
         CCCW           23      20       10       8
         WCCW            6      11        4       6
         CWCW            7       9       11       8
         WWCW           12      14        7      15
         CCWW           21      18        7      19
         WCWW           16      20       14      16
         CWWW           22      23       14      15
         WWWW           23      20       29      27

              Table 12.11: Arithmetic reasoning data


Fit a Rasch model to the above data and test if your results are the same for
each of the four groups.
4. The data below are a subset of data from Woolson and Clarke (1984). The
   variables are, respectively, sex (0 = female), y1-y3 (1 = obese), and the count
   of the number of people with each pattern. Thus, this data set contains records
   on 1014 children who were 7-9 years old in 1977 (the first year measurements
   were obtained). Measures of obesity were obtained in three survey years:
   1977, 1979, and 1981 (y1, y2, y3, respectively).

   1 0   1 0   1 19   0 13
   0 0 0 0 0 0
   0 0 0 . . .
   1 0 . 1 0 .

Analyze the above data.
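   Hint: as in Exercise 1, the pattern counts can be expanded into one record
   per child per survey year before a GEE model is fitted. The sketch below
   assumes the records have been read into a data set obese containing the
   variables sex, y1-y3, and count named above.

   data obese_long; set obese;
      do k=1 to count;
         id + 1;                     /* distinct child identifier */
         year=1977; obesity=y1; output;
         year=1979; obesity=y2; output;
         year=1981; obesity=y3; output;
      end;
      keep id sex year obesity;
   run;
   proc genmod data=obese_long descending;
      class id;
      model obesity = sex year / dist=bin link=logit type3;
      repeated subject=id / type=exch corrw;
   run;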


5. The following data are from a longitudinal study of dental growth in children
   (Potthoff & Roy, 1964). The distance from the center of the pituitary gland
   to the pterygomaxillary fissure was measured for children at ages 8, 10, 12,
   and 14. The data are presented below.

                  Age (in years): Girls
   Girl        8        10        12        14
     1        21        20      21.5        23
     2        21      21.5        24      25.5
     3      20.5        24      24.5        26
     4      23.5      24.5        25      26.5
     5      21.5        23      22.5      23.5
     6        20        21        21      22.5
     7      21.5      22.5        23        25
     8        23        23      23.5        24
     9        20        21        22      21.5
    10      16.5        19        19      19.5
    11      24.5        25        28        28

                  Age (in years): Boys
    Boy        8        10        12        14
     1        26        25        29        31
     2      21.5      22.5        23      26.5
     3        23      22.5        24      27.5
     4      25.5      27.5      26.5        27
     5        20      23.5      22.5        26
     6      24.5      25.5        27      28.5
     7        22        22      24.5      26.5
     8        24      21.5      24.5      25.5
     9        23      20.5        31        26
    10      27.5        28        31      31.5
    11        23        23      23.5        25
    12      21.5      23.5        24        28
    13        17      24.5        26      29.5
    14      22.5      25.5      25.5        26
    15        23      24.5        26        30
    16        22      21.5      23.5        25

   (a) Perform an analysis of the above data assuming an exchangeable structure.
   (b) Repeat (a) using an AR(1) structure.
   (c) Compare and contrast the two analyses. Which seems best?
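   Since the distances are continuous responses, the GEE machinery of this
   chapter applies with a normal distribution and identity link. A minimal
   sketch, assuming the measurements have been arranged one record per child
   per age in a data set dental with variables id, sex, age, and distance:

   proc genmod data=dental;
      class id;
      model distance = sex age / dist=normal link=identity type3;
      repeated subject=id / type=exch corrw;    /* part (a): exchangeable */
   run;
   proc genmod data=dental;
      class id;
      model distance = sex age / dist=normal link=identity type3;
      repeated subject=id / type=ar(1) corrw;   /* part (b): AR(1) */
   run;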


Appendices
All appendices for this book can be found on the CD-ROM enclosed with this text.
A separate table of contents and indexes are prepared for this chapter.


Table of the Chi-Squared Distribution

                            Right-Tail Probability
 d.f.    0.250    0.100    0.050    0.025    0.010    0.005    0.001
   1      1.32     2.71     3.84     5.02     6.63     7.88    10.83
   2      2.77     4.61     5.99     7.38     9.21    10.60    13.82
   3      4.11     6.25     7.81     9.35    11.34    12.84    16.27
   4      5.39     7.78     9.49    11.14    13.28    14.86    18.47
   5      6.63     9.24    11.07    12.83    15.09    16.75    20.52
   6      7.84    10.64    12.59    14.45    16.81    18.55    22.46
   7      9.04    12.02    14.07    16.01    18.48    20.28    24.32
   8     10.22    13.36    15.51    17.53    20.09    21.95    26.12
   9     11.39    14.68    16.92    19.02    21.67    23.59    27.88
  10     12.55    15.99    18.31    20.48    23.21    25.19    29.59
  11     13.70    17.28    19.68    21.92    24.72    26.76    31.26
  12     14.85    18.55    21.03    23.34    26.22    28.30    32.91
  13     15.98    19.81    22.36    24.74    27.69    29.82    34.53
  14     17.12    21.06    23.68    26.12    29.14    31.32    36.12
  15     18.25    22.31    25.00    27.49    30.58    32.80    37.70
  16     19.37    23.54    26.30    28.85    32.00    34.27    39.25
  17     20.49    24.77    27.59    30.19    33.41    35.72    40.79
  18     21.60    25.99    28.87    31.53    34.81    37.16    42.31
  19     22.72    27.20    30.14    32.85    36.19    38.58    43.82
  20     23.83    28.41    31.41    34.17    37.57    40.00    45.31
  21     24.93    29.62    32.67    35.48    38.93    41.40    46.80
  22     26.04    30.81    33.92    36.78    40.29    42.80    48.27
  23     27.14    32.01    35.17    38.08    41.64    44.18    49.73
  24     28.24    33.20    36.42    39.36    42.98    45.56    51.18
  25     29.34    34.38    37.65    40.65    44.31    46.93    52.62
  26     30.43    35.56    38.89    41.92    45.64    48.29    54.05
  27     31.53    36.74    40.11    43.19    46.96    49.64    55.48
  28     32.62    37.92    41.34    44.46    48.28    50.99    56.89
  29     33.71    39.09    42.56    45.72    49.59    52.34    58.30
  30     34.80    40.26    43.77    46.98    50.89    53.67    59.70
  40     45.62    51.81    55.76    59.34    63.69    66.77    73.40
  50     56.33    63.17    67.50    71.42    76.15    79.49    86.66
  60     66.98    74.40    79.08    83.30    88.38    91.95    99.61
  70     77.58    85.53    90.53    95.02   100.43   104.21   112.32
  80     88.13    96.58   101.88   106.63   112.33   116.32   124.84
  90     98.65   107.57   113.15   118.14   124.12   128.30   137.21
 100    109.14   118.50   124.34   129.56   135.81   140.17   149.45
 110    119.61   129.39   135.48   140.92   147.41   151.95   161.58
 120    130.05   140.23   146.57   152.21   158.95   163.65   173.62

Bibliography
Abramowitz, M., & Stegun, I.A. (1970). Handbook of Mathematical Functions with Formulas,
Graphs and mathematical Tables. U.S. Govt. Printing Office, Washington.
Agresti, A. (1983). A simple diagonals-parameter symmetry and quasi-symmetry model. Statist. Prob.
Letters, 1, 313-316.
Agresti, A. (1984). Analysis of Ordinal Categorical Data. John Wiley and Sons, New York.
Agresti, A. (1990). Categorical Data Analysis Wiley, New York.
Agresti, A. (1992). A survey of exact inference for contingency tables (with discussion). Statistical
Science, 7: 131-177.
Agresti, A. (1996). An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.
Agresti, A., Mehta, C.R., & Patel, N.R. (1990). Exact inference for contingency tables with ordered
categories. JASA, 85: 453-458.
Agresti, A., & Wackerly, D. (1977). Some exact conditional tests of independence for R x C crossclassification tables. Psychometrika, 42: 111-125.
Agresti A., Wackerly D., & Boyett J.M. (1979). Exact conditional tests for cross-classification: approximation of attained significance levels. Psychometrika, 44: 75-83.
Aitkin, M. (1979). A simultaneous test procedure for contingency tables. Applied Statistics, 28, 233242.
Aitken, M., Anderson, D., Francis, B. & Hinde, J. (1989). Statistical Modelling in GLIM. Clarendon
Press, Oxford.
Aicken, M. (1983). Linear Statistical Analysis of Discrete Data J. Wiley & Sons, New York.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Autom. Contr.
AC-19, 716-723.
Alba, R.D. (1987). Interpreting the parameters of log-linear models. Sociological Methods and Research, 16, 45-77.
Amemiya, T. (1981). Qualitative response models: a survey. J. of Econ. Literature, 19, 1483-1536.
Andersen, E. B. (1970) Asymptotic properties of conditional maximum likelihood estimators. Jour,
of the Royal Statistical Society, B32, 283-301.
Andersen, E. B. (1973). Conditional Inference and Models for Measuring. Copenhagen: Mentalhygieniejuisk Forlag.
Andersen, E.B. (1980). Discrete Statistical Models with Social Science Applications. North-Holland,
Amsterdam.
Anderson, E.B. (1997) Introduction to the Statistical Analysis of Categorical Data., Springer-Verlag,
Berlin.
Anderson, T.W., & Goodman, L.A. (1957). Statistical inference about Markov chains. Ann. Math.
Statist., 28, 89-110.

Bachman, J.G., Johnson, L.D., & O'Malley, P.M. (1980). Monitoring the future: Questionnaire
responses from the nation's high school seniors, 1979 and 1980. Ann Arbor, MI: Survey Research
Center.
Baker, R.J. (1977). Exact distributions derived from two-way tables. Applied Statistics, 26: 199-206,
Corr. 27: 109.
Barnard, G.A. (1989). On alleged gains in power from lower P-values. Statist. Med., 8: 1469-1477.
Barnard, G.A. (1990) Comment Statist. Med., 9, 373-375.
Becker, M.P., and Clogg, C.C. (1989). Analysis of sets of two-way contingency tables using association
models. J. Amer. Statist. Assoc., 84, 142-151.
Berchtold, H. (1972). vertrauensgrenzen und vergleich zweier wahrschein-lichkeiten. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 22, 112-119.
Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay
with quantal response based on the logistic function. J. Amer. Statist. Assoc., 41, 569-99.
Berkson, J. (1955). Maximum likelihood and minimum x 2 estimate of the logistic function. J. Amer.
Statist. Assoc., 50, 130-162.
Berstock, D. A., Villers, C., & Latto, C. (1978). Diverticular disease and carcinoma of the colon.
Unpublished manuscript.
Best, D.J., & Roberts, D.E. (1975). Algorithm AS 91: The percentage points of the x distribution.
Applied Statist., 24, 385-388.
Bhapkar, V.P. (1966). A note on the equivalence of two test criteria for hypotheses in categorical
data. J. Amer. Statist. Assoc., 61, 228-235.
Bhapkar, V.P. (1979). On tests of marginal symmetry and quasi-symmetry in two and threedimensional contingency tables. Biometrics, 35, 417-426.
Birch, M.W. (1963). Maximum likelihood in three-way contingency tables. J. Roy. Statist. Soc., B25,
220-233.
Birch, M.W. (1964). The detection of partial association, I: The 2 x 2 case. J. Roy. Statist. Soc.,
B26, 313-324.
Bishop, Y.M.M, Fienberg, S.E., & Holland, P.W. (1975). Discrete Multivariate Analysis. MIT Press.
Bliss, C.I. (1935). The calculation of the dosage-mortality curve. Ann. Appl. Biol., 22, 134-167.
Bortkewicz, L. von (1898). Das Gesetz der Kleinen Zahlen. Leipzig: Teubner.
Bradley, J. V. (1968). Distribution Free Statistical Tests. Englewood Cliffs, NJ: Prentice Hall.
Bradley, R.A., & Terry, M.E. (1952). Rank analysis of incomplete block designs I. The method of
paired comparisons. Biometrika, 39, 324-345.
Breslow, N.E., & Day, N.E. (1980). Statistical Methods in Cancer Research. Lyon: International
Agency for Research on Cancer.
Brown, M.B. (1976). Screening effects in multidimensional contingency tables. Applied Statistics, 25,
37-46.
Burstein, H. (1981). Binomial 2x2 test for independent samples with independent proportions. Comm.
Statist., A10, 1, 11-29.
Caussinus, H. (1965). Contributions a 1'analyse statistique des tableaux de correlation. Ann. Fac.
Sci. Univ. Toulouse, 29, 77-182.
Chamberlain, G. (1980). Analysis of covariance with qualitative data. Review of Economic Statistics,
48, 225-238.
Chang, Y. (2000). Residuals analysis of the generalized linear models for longitudinal data. Statist.
Med., 19, 1277-1293.
Chatterji, S.D. (1963). Some elementary characterizations of the Poisson distribution. Amer. Math.
Monthly, 70, 958-964.


Christensen, R. (1987). Plane Answers to Complex Questions: The Theory of Linear Models.
Springer-Verlag, New York.
Christensen, R. (1990). Log-Linear Models. Springer-Verlag, New York.


Clarke, S. R., & Norman, J.M. (1995). Home ground advantage of individual clubs in English soccer.
The Statistician, 44, 509-521.
Clayton, M.K., Geisser, S., & Jennings, D.E. (1986). A comparison of several model selection procedures. In Bayesian Inference and Decision Techniques, edited by P. Goel and A. Zellner. Amsterdam: North-Holland.
Clogg, C.C. (1978). Adjustments of rates using multiplicative models. Demography, 15, 523-539.
Clogg, C.C. (1982). Some models for the analysis of association in multiway cross-classifications having
ordered categories. Jour. Araer. Statist. Assoc., 77, 803-815.
Clogg, C.C. (1995). Latent Class Models, in Handbook of Statistical Modeling for the Social and
Behavioral Sciences, eds Arminger, G.E, Clogg, C.C. & Sobel, M. Plenum Press, New York. pp.
311-359.
Clogg, C.C., &: Eliason, S. R. (1987). Some common problems in log- linear analysis. Social. Meth.
& Research, 16, 8-44.
Clogg. C.C. & Eliason, S.R. (1988). A flexible procedure for adjusting rates and proportions, including
statistical methods for group comparisons. Amer. Social. Rev., 53, 267-283.
Clogg, C.C., Eliason, S.R., &: Grego, J.M. (1990). Models for the analysis of change in discrete
variables.in Statistical Methods in Longitudinal Research, 2, Time series and categorical longitudinal
data.
Clogg, C.C., & Goodman, L.A. (1984). Latent structure analysis of a set of multidimensional contingency tables. Jour. Amer. Statist. Assoc., 79, 762-771.
Clogg, C.C., & Shihadeh, E.S. (1994). Statistical Models for Ordinal Variables, Thousand Oaks,
CA:Sage.
Clogg, C.C. & Shockey, J.W. (1988). Multivariate Analysis of discrete Data, in Handbook of Multivariate Experimental Psychology, eds J.R. Nesselroade and R.B. Cattell, New York:Plenum Press,
pp. 337-365.
Clogg, C.C, Shockey, J.W., & Eliason, S.R. (1990). A general statistical framework for adjustment of
rates.Social. Meth. & Res., 19, 156-195
Cochran, W.G. (1942). The χ² correction for continuity. Iowa State College Journal of Science, 16,
421-436.
Cochran, W.G. (1952). The χ² test of goodness-of-fit. Ann. Math. Statist., 23, 315-345.
Cochran, W.G. (1954). Some methods of strengthening the common χ² tests. Biometrics, 10, 417-451.
Cohen, J. (1960). A coefficient of Agreement for nominal scales. Educ. Psychol. Meas., 20, 37-46.
Collet, D. (1991). Modeling Binary Data. Chapman & Hall, London.
Conover, W.J. (1974). Some reasons for not using the Yates continuity correction on 2 x 2 contingency
tables (with comments). J. Amer. Statist. Assoc., 69, 374-382.
Correa, D., Pickle, L.W., Fontham, E., Lin, Y., &: Haenszel, W. (1983). Passive smoking and lung
cancer Lancet 8350, 595-597.
Cox, C.P. (1987). A Handbook of Statistical Methods. Wiley, New York.
Cox, D.R. (1970a). The Analysis of Binary Data. Methuen, London.
Cox, D.R. (1970b). The continuity correction. Biometrika, 57, 217-219.
Cox, D.R., &: Snell, E.J. (1989). Analysis of Binary Data, Second Edition. Chapman &; Hall, London.
Cox, M.A.A., & Plackett, R.L. (1980). Small samples in contingency tables. Biometrika, 67: 1-14.
Cramer, H. (1946). Mathematical Methods of Statistics. Princeton, NJ: Princenton University Press.


Cressie, N., & Read, T.R.C. (1984). Multinomial goodness-of-fit tests. J. R. Statist. Soc., B, 46,
440-464.
Cressie, N., & Read, T.R.C. (1988). Goodness-of-Fit Statistics for Discrete Multivariate Data.
Springer-Verlag, New York.

Darroch, J. N., & McCloud, P. I. (1986). Category distinguishability and observed agreement. Australian Journal of Statistics, 28, 371-388.
David, H.A. (1988). The Method of Paired Comparisons. Oxford: Oxford University Press.
Davidson, R.R., & Beaver, R.J. (1977). On extending the Bradley-Terry model to incorporate withinpair order effects. Biometrics, 33, 693-702.
Deming,W.E., & Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table
when the expected marginal totals are known. Ann. Math. Statist., 11, 427-444.
Demo, D.H., & Parker, F.F. (1987). Academic achievement and self-esteem among black and white
college students. J. Social Psychol., 127, 345-355.
DiFrancisco, W., & Critelman, Z. (1984). Soviet political culture and covert participation in policy
implementation. Amer. Polit. Sci. Rev., 78, 603-621.
Diggle, P. J., Liang, K., & Zeger, S. L. (1995). Analysis of Longitudinal Data. Oxford Science Publications, Oxford.
Dobson, A.J. (1990). An Introduction to Generalized Linear Models, Chapman and Hall, London.
Dozzi, M., & Riedwyl, H. (1984). Small sample properties of asymptotic tests for two binomial proportions. Biom. J., 26, 505-516.
Duncan, O.D., & McRae, J.A. Jr. (1979). Multiway contingency analysis with a scaled response or
factor. In Sociological Methodology. San Francisco: Jossey-Bass.
Duncan, O.D., Schuman, H., & Duncan, B. (1973). Social Change in a Metropolitan Community.
New York: Russell Sage Foundation.

Edwards, A.W.F. (1963). The measure of association in a 2 x 2 table. J. Roy. Statist. Soc., A126,
109-114.
Eliason, S. (1990). Categorical Data Analysis System, Version 3.50. Dept. of Sociology, University of
Iowa, Ames.
Elson, R.C., & Johnson, W.D. (1994). Essentials of Biostatistics, F.A. Davis, Philadelphia.
Everitt, B.S. (1977). The Analysis of Contingency Tables. London: Chapman and Hall.
Farewell, V. T. (1982). A note on regression analysis of ordinal data with variability of classification.
Biometrika, 69: 27-32.
Fienberg, S.E. (1980). The Analysis of Cross-Classified Categorical Data. New York: John Wiley.
Fienberg, S.E., & Larntz, K. (1976). Loglinear representation for paired and multiple comparison
models. Biometrika, 63, 245-254.
Fingleton, B. (1984). Models of Category Counts. Cambridge University Press, Cambridge.
Finney, D.J. (1941). The estimation from individual records of the relationship between dose and
quantal response. Biometrika, 34, 320-334.
Finney, D.J. (1971). Probit Analysis, 3rd edition, Cambridge University Press.
Firth D., & Treat, B. R. (1988). Square contingency tables and GLIM. GLIM Newletter,l6, 16-20.
Fisher, R. A. (1924). The conditions under which χ² measures the discrepancy between observation
and hypothesis. Jour. R. Statist. Soc., 87, 442-450.
Fisher, R.A. (1935). The case of zero survivors (Appendix to Bliss, C.I. (1935)). Ann. Appl. Biol.,
22, 164-165.
Fisher, R.A. (1970). Statistical Methods for Research Workers, 14th edition. Oliver & Boyd,
Edinburgh.


Fisher, R. A. (1973). Statistical Methods and Scientific Inference. New York: Hafner.
Fleiss, J.L. (1981). Statistical Methods for Rates and Proportions, second edition. New York: Wiley.
Fleiss, J.L., Cohen, J., and Everitt, B. S. (1969). Large-sample standard errors of kappa and weighted
kappa. Psychol. Bull., 72, 323-327.
Forthofer, R. N., & Lehmen, R.G. (1981). Public Program Analysis: A New Categorical Data Approach. Belmont, CA: Lifetime Learning Publications.
Freeman, D.H. (1987). Applied Categorical Data Analysis. Marcel Dekker, Inc., New York.
Freeman, G.H., & Halton, J.H. (1951). Note on an exact treatment of contingency, goodness of fit
and other problems. Biometrika, 38: 141-149.
Freeman, M.F., & Tukey, J.W. (1950). Transformations related to the angular and the square root.
Ann. Math. Statist., 27, 601-611.
Friedl, H. (1995). A note on generating the factors for symmetry parameters in log-linear models.
GLIM Newletter, 24, 33-36.
Gabriel, K.R. (1969). Simultaneous test procedures: Some theory of multiple comparisons. Ann.
Math. Statist., 40, 224-250.

Gail, M.H., &: Mantel, N. (1977). Counting the number of r X c contingency tables with fixed margins.
J. Amer. Statist. Assoc., 72: 859-862.
Gibbons, J.D., &: Pratt, J.W. (1975). P-values: Interpretation and methodology. Amer. Statist., 29,
20-25.
Gilula, Z. (1986). Grouping and association in contingency tables: An exploratory canonical correlation approach. J. Amer. Statist. Assoc., 81, 773-779.
Gilula, Z., Krieger, A.M., & Ritov, Y. (1988). Ordinal association in contingency tables: Some interpretive aspects. Jour. Amer. Statist. Assoc., 83, 540-545.
Glass, D.V. (1954). Social Mobility in Britain. Glenco, III: Free Press.
Goldstein, A. (1965). Biostatistics: An Introductory Text. New York: Macmillan.
Good, I.J. (1976). On the application of symmetric Dirichlet distributions and their mixtures to
contingency tables. Ann. Math. Statist., 4: 1159-1189.
Good, I.J., Cover, T.N., & Mitchell, G.J. (1970). Exact distributions for X2 and for the likelihoodratio statistic for the equi-probable multinomial distribution. J. Amer. Statist. Assoc., 65, 267-283.
Goodman, L.A. (1962). Statistical methods for analyzing processes of change. Amer. J. Social., 68,
57-78.
Goodman, L.A. (1964). Interaction in multi-dimensional contingency tables. Ann. Math. Statist. 35,
632-646.
Goodman, L.A. (1968). The analysis of cross-classified data: independence, quasi-independence, and
interactions in contingency tables with or without missing entries. J. Amer. Statist. Assoc., 63,
1091-1131.
Goodman, L.A. (1969). On partioning chi-square and detecting partial association in multiple classifications. J. Roy. Statist. Soc., B31, 486-498.
Goodman, .A. (1970). The multivariate analysis of qualitative data: interactions among multiple
classifications. J. Amer. Statist. Assoc., 65, 226-256.
Goodman, L.A. (1971a). The analysis of multidimensional contingency tables: stepwise procedures
and direct estimation methods for building models for multiple classifications. Technometrics, 13,
33-61.
Goodman, L.A. (1971b). The partioning of chi-square, the analysis of marginal contingency tables,
and the estimation of expected frequencies in multidimensional contingency tables. J. Amer. Statist.
Assoc., 66, 339-344.
Goodman, L.A. (1972). Some multiplicative models for the analysis of cross-classified data, in L. Le
Cam et al., Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 649-696. Berkeley: University of California Press.


Goodman, L.A. (1978). Analyzing Qualitative/Categorical Data. Cambridge, MA: Abt Books.
Goodman, L.A. (1979a). Simple models for the analysis of Association in cross-classifications having
ordered categories. Jour. Amer. Statist. Assoc., 74, 537-552.
Goodman, L.A. (1979b). Multiplicative models for the analysis of occupational mobility tables and
other kinds of cross-classification tables. Amer. Jour. Sociology, 84, 804-829.
Goodman, L. A. (1979c). Multiplicative models for square contingency tables with ordered categories.
Biometrika, 66, 413-418.
Goodman, L.A. (1981). Criteria for determining whether certain categories in a cross-classification
table should be combined with special reference to occupational categories in an occupational mobility
table. Amer. J. Sociology, 87, 612-650.
Goodman, L.A. (1984). The Analysis of Cross-Classified Data Having Ordered Categories, Cambridge, MA: Harvard University Press.
Goodman, L.A. (1985). The analysis of cross-classified data having ordered and/or unordered categories: Association models, correlation models and symmetry models for contingency tables with or
without missing entries. The Annals of Statistics, 13, 10-69.
Goodman, L.A. (1986a). Some useful extensions of the usual correspondence analysis approach and
the usual log-linear models approach in the analysis of contingency tables. Internal. Statist. Rev.,
54, 243-309.
Goodman, L.A (1986b). Correspondence analysis and log-linear models Int. Statist. Rev., 54, 243-309.
Goodman, L. A. (1991). Measures, models, and graphical displays in the analysis of cross-classified
data. J. Amer. Statist. Assoc., 86, 1085-1111. (with comments)
Goodman, L.A. (1996). A single general method for the analysis of cross-classified data: Reconciliation
and synthesis of some methods of Pearson, Yule, and Fisher, and also some methods of correspondence
analysis and association analysis. J. Amer. Statist. Assoc., 91, 408-428.
Goodman, L.A., & Kruskal, W. H. (1954). Measures of association for cross classifications. J. Amer.
Statist. Assoc., 49, 732-764
Green, P.J. (1984). Iteratively re weighted least squares for maximum likelihood estimation and some
robust and resistant alternatives (with discussion). J. R. Statist. Soc., B, 46, 149-192.
Greenland, S. (1991). On the logical justification of conditional tests for two-by-two contingency
tables. Amer. Statist., 45, 248-251.
Grizzle, J.E. (1967). Continuity correction in the chi-square test for 2 x 2 tables. Amer. Statist., 21,
28-32.
Grizzle, J.E., Starmer, C.F., & Koch, G.G. (1969). Analysis of categorical data by linear models.
Biometrics, 25, 489-504.
Guttman, L. (1968). A general nonmetric technique for finding the smallest coordinate space for a
configuration of points. Psychometrika, 33, 469-506.
Haberman, S.J. (1973). The analysis of residuals in cross-classification tables. Biometrics, 29: 205-220.
Haberman, S.J. (1978). Analysis of Qualitative Data, Volumes 1 & 2. Academic Press: New York.
Haberman, S. (1981). Tests for independence in two-way contingency tables based on canonical correlation and on a linear-by-linear interactions. The Annals of Statistics, 9, 1178-1186.
Haldane, J.B.S. (1937). The exact value of the moments of the distribution of χ², used as a test of
goodness-of-fit when expectations are small. Biometrika, 29, 133-143; Corr. 13, 220.
Hansteen, V., Moinichen, E., Lorentsen, E., Andersen, A., Strom, O., Soiland, K., Dyrbekk, D.,
Refsum, A.M., Tromsdal, A., Knudsen, K., Eika, C., Bakken, J., Smith, P., & Hoff, P.I. (1982).
One year's treatment with propranolol after myocardial infarction: preliminary report of Norwegian
multicentre trial. Brit. Med. J. (Clin Res Ed), 284(6310), 155-60.
Hedlund, R.D. (1978). Cross-over voting in a 1976 presidential primary. Public Opinion Quart., 41,
498-514.
Hildebrand, D., & Ott, L. (1991). Statistical Thinking for Managers. Third edition. PWS Kent:MA.


Hill, I.D. (1988). Discussion on the paper by William Rice. Biometrics,44, 14-16.
Hirji, K. F., Tan, S.J., & Elashoff, R.M. (1991). A quasi-exact test for comparing two binomial
proportions. Statist. Med., 10, 1137-1153.
Hogg, R. V., & Tanis, E. A. (1997). Probability and Statistical Inference, Fifth Edition, Prentice
Hall. Upper Saddle River, NJ.
Holloway, J.W. (1989). A comparison of the toxicity of the pyrethroid trans-cypermethrin, with and
without the synergist piperonyl butoxide, to adult moths from two strains of Heliothis virescens.
Final year dissertation, University of Reading, UK.
Hommel, G. (1978). Tail probabilities for contingency tables with small expectations. JASA, 23:
764-766.
Hope, K. (1982). Vertical and non vertical class mobility in three countries. Amer. Social. Review,
47, 99-113.
Hubert, J.J. (1992). Bioassay. Kendall/Hunt Publishing Company: Dubuque, Iowa.
Imrey, P.B., Johnson, W.D., & Koch, G.G. (1976). An incomplete contingency table approach to
paired comparison experiments. Jour. Amer. Statist. Assoc., 71, 614-623.
Iverson, G.R. (1979). Decomposing chi-square: A forgotten technique. Social. Methods Res., 8: 143-157.
Jarret, R.G. (1979). A note on the intervals between coal-mining disasters. Biometrika, 66, 191-193.
Johnson, N.L., &: Kotz, S. (1982). Distributions in Statistics: Discrete Distributions. Boston:
Houghton Mifflin.
Jolayemi, E.T. (1990a). On the measure of agreement between two raters. Biom. J, 32, 87-93.
Jolayemi, E.T. (1990b). The model selection for one-dimensional multinomials. Biometrical Journal,
32, 827-834.
Kateri, M. (1993). Fitting asymmetry and non-independence models with GLIM. GLIM Newsletter,
22, 46-50.
Kaufman, R.L. and Schervish, P.G. (1986). Using adjusted cross tabulations to interpret log-linear
relationships. Amer. Soc. Rev., 51, 717-733.
Kaufman, R.L, & Schervish, P.G. (1987). Variations on a theme: More uses of odds ratios to interpret
log-linear models. Sociological Methods and Research. 16, 218-255.
Kelsey, J.L., & Hardy, R. J. (1975). Driving of motor vehicles as a risk factor for acute herniated
lumbar intervertebral disc. Amer. Journal of Epidemiology, 102, 63-73.
Kendall, M.G. (1945). The treatment of ties in rank problems. Biometrika, 33, 239-251.
Kendall, M. G. (1952). The Advanced Theory of Statistics, Vol. 1, 5th edition. Griffin, London.
Keil, J.E., Sutherland, S.E., Hames, C., Lackland, D., Gazes, P., Knapp, R., & Tyroler, H. (1995).
Coronary disease mortality and risk factors in black and white men. Archives of Internal Medicine,
155, 1521-1527.
Kleinbaum, D.G., Kupper, L.L., Muller, K.E., & Nizam, A. (1998). Applied Regression Analysis and
Other Multivariable Methods. Duxbury Press, Pacific Grove, CA.
Klotz, J., & Teng, J. (1977) One-way layout for counts and the exact enumeration of the KruskalWallis H distribution with ties. J. Amer. Statist. Assoc., 72, 165-169.
Koch, G.G., Amara, I.A., Davis, G.W. & Gillings, D.B. (1982). A review of some statistical methods
for covariance analysis of categorical data. Biometrics, 38, 563-595.
Koch, G.G., & Bhapkar, V.P. (1982). Chi-square tests. In Encyclopedia of Statistical Sciences, Jonson, N.L., and Kotz, S. (eds), 442-457. Wiley, New York.
Koch, G.G., Imrey, P.B., Singer, J.M., Atkinson, S.S., & Stokes, M.E. (1985). Lecture Notes for
Analysis of Categorical Data. Montreal: Les Presses de L'Universite de Montreal.
Koch, G.G & Stokes, M.E. (1981). Chi-square tests: numerical examples. In Encyclopedia of Statistical Sciences, Johnson, N.L., & Kotz, S. (eds). Wiley, New York.


Koehler, K. (1986). Goodness-of-fit tests for log-linear models in sparse contingency tables. JASA,
81: 483-493.
Koehler, K.J., & Larntz, K. (1980). Empirical investigation of goodness-of-fit statistics for sparse
multinomials. J. Amer. Statist. Assoc., 75, 336-344.
Kreiner, S. (1992). Exact inference in multidimensional tables. Comment on Agresti (1992). Statistical
Science 7: 163-165.
Kudo, A. & Tarumi, T. (1978). 2 x 2 Tables emerging out of different chance mechanisms. Commun.
Statist., 977-986.
Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.
Kutylowski, A. J. (1989). Analysis of symmetric cross-classifications. In Lecture Notes in Statistics,
57, 188-197.
Lancaster, H. O. (1949a). The derivation and partition of χ² in certain discrete distributions. Biometrika, 36, 117-122.
Lancaster, H.O. (1949b). Statistical control of counting experiments. Biometrika, 39, 419-422.
Lancaster, H.O. (1961). Significance tests in discrete distributions. J. Amer. Statist. Assoc., 56, 223-234.
Landis, J.R., & Koch, G.G. (1977a). The measurement of observer agreement for categorical data.
Biometrics, 33, 159-174.
Landis, J.R., &: Koch, G.G. (1977b). An application of hierarchical Kappa-type statistics in the
assessment of majority agreement among multiple observers. Biometrics, 33, 363-374.
Larntz, K. (1978). Small-sample comparison of exact levels for chi- squared goodness-of-fit statistics.
JASA, 73: 253-263.
Lawal, H.B. (1980). Tables of percentage points of Pearson's goodness-of-fit statistic for use with
small expectations. Appl. Statist., 29, 292-298.
Lawal, H.B. (1984). Comparisons of X², Y², Freeman-Tukey and Williams' improved G² test statistics
in small samples of one-way multinomials. Biometrika, 71, 415-458.
Lawal, H.B. (1986). Review of approximations to the null distribution of Pearson's X2 test statistic.
Nig. Jour. Scientific Research, 1, 65-69.
Lawal, H.B. (1989a). Comparing power behaviors for some goodness-of-fit test statistics in sparse
multinomials. Biom. Journal., 31, 297-306.
Lawal, H.B. (1989b). On the X² statistic for testing independence in two-way contingency tables.
AMSE Review, 12: 37-51.
Lawal, H.B. (1992a). A modified X2 test when some cells have small expectations in the multinomial
distribution. J. Statist. Comput. Simul., 40, 15-27.
Lawal, H.B. (1992b). Using discrete distributions to compute powers for goodness-of-fit test statistics
in a one-way multinomial setting. Biometrical Journal, 34, 429-435.
Lawal, H.B. (1992c). Approximating the non-null distribution of the likelihood ratio test statistic.
Biometrical Journal, 34, 473-483.
Lawal, H.B. (1992d). Parsimonious uniform-distance association models for the occupational mobility
table data. J. Japan Statist. Soc., 22, 123-134.
Lawal, H.B. (1993a). Association, symmetry, and diagonal models for occupational mobility and other
similar square contingency tables having ordered categorical variables. Biometrical Journal, 35, 193-206.
Lawal, H.B. (1993b). Comparisons of some chi-squared goodness-of-fit test statistics in sparse one-way
multinomials. Biometrical Journal, 35, 589-599.
Lawal, H.B. (1996). Using SAS Genmod Procedure to fit Diagonal class Models to Square Contingency
Tables Having Ordered Categories Proceedings of the MidWest SAS Users Group 96, 149-160.
Lawal, H.B. (2001). Modeling symmetry models in square contingency tables. Jour. Statist. Comp.
Simul., 71, 59-83.


Lawal, H.B. (2002a). Modeling the 1984-1993 American League Baseball results as dependent categorical data. Jour. Applied Probability, 27, 53-66.
Lawal, H. B. (2002b). The structure of the log odds-ratios in non-independence and symmetry diagonal
models for square contingency tables. Jour, of Quality & Quantity, 36(3), 197-220.
Lawal, H. B. (2003a). Implementing point-symmetry models for square tables having ordered categories in SAS. To appear in Journal of Italian Statistical Society.
Lawal, H.B. (2003b) Application of non standard log-linear models to symmetry and diagonals parameter models in square contingency tables. Submitted, South African Statist. Jour..
Lawal, H.B., & Upton, G.J.G (1980). An approximation to the distribution of the X2 goodness-of-fit
statistic for use with small expectations. Biometrika, 67, 447-453.
Lawal, H.B., & Upton, G.J.G. (1984).On the use of X 2 as a test of independence in contingency tables
with small cell expectations. Australian J. Statist., 26, 75-85.
Lawal, H.B., & Upton, G.J.G. (1990a). Comparisons of some chi- squared tests for the test of independence in sparse two-way contingency tables. Biom. J., 32: 59-72.
Lawal, H.B. & Upton, G.J.G. (1990b). Alternative interaction structures in square contingency tables
having ordered classificatory variables. Quality and Quantity, 24, 107-127.
Lawal, H.B., & Upton, G.J.G. (1995). A Computer algorithm for fitting models to N x N contingency
tables having ordered categories. Comm. Statist., Simula, 24(3), 793-805.
Lawal, H.B., & Sundheim, R. (2002). Generating factor variables for asymmetry, non-independence
and skew-symmetry models in square contingency tables using a SAS macro. Journal of Statistics
and Computing, Vol. 7, (8), 1-23.
Le, C.T. (1992). Fundamentals of bio statistical inference; New York: Marcel Dekker.
Lehmann, E.L. (1966). Some concepts of dependence. Ann. Math. Statist., 37, 1137-1153.
Liang, K.Y., & Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Lindgren, B.W. (1968). Statistical Theory. Macmillan, London.
Lindsey, D.V. (1964). The Bayesian analysis of contingency tables. Ann. Math. Statist., 35: 1622-1643.
Lindsey, J.K. (1989). The Analysis of Categorical Data Using GLIM. John Wiley, New York.
Lindsey, J.K. (1992). Fitting distributions in GLIM as log-linear models. GLIM Newsletter, 21, 9-12.
Lindsey, J.K. (1995). Modelling Frequency and Count Data. Oxford Science Publications, Oxford.
Lipsitz, S. (1988). Methods for analyzing repeated categorical outcomes. Unpublished PhD dissertation, Department of Biostatistics, Harvard University.
Lipsitz, S. (1999). Lecture Notes on the GEE for BMTRY 726. Department of Biometry and Epidemiology, MUSC, Charleston, SC.
Littel, R.C., Milliken, G.A., Stroup, W.W., & Wolfinger, R.D. (1996). SAS System for Mixed
Models. SAS Institute, Inc., Cary, NC.
Lord, F. (1953). On the statistical treatment of Footbal Numbers. Amer. Psychologist, 8, 750-751.
Lunneborg, C. E. (1994). Modeling Experimental and Observational Data. Duxbury. Belmont, CA.
Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the Mantel-Haenszel
procedure. J. Amer. Statist. Assoc., 58, 690-700.
Mantel, N., & Greenhouse, S.W. (1968). What is the continuity correction? Amer. Statist., 22, 27-30.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective
studies of disease. J. Nat. Cancer Inst., 22, 719-748.
Maxwell, A.E. (1961). Analyzing Qualitative Data, Methuen, London.


McCullagh, P. (1978). A class of parametric models for the analysis of square contingency tables with
ordered categories. Biometrika 65, 413-418.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). J. R. Statist. Soc., B 42,
109-142.
McCullagh, P., & Nelder, J.A. (1989). Generalized Linear Models. 2nd edition, Chapman and Hall,
London.
Mead, R., & Curnow, R.N. (1983). Statistical Methods in Agriculture and Experimental Biology.
Chapman and Hall, London.

Mecklenburg, R.S., Benson, E.A., Benson, J.W., Fredlung, P.N., Guinn, T., Metz, R.J., Nielsen, R.L.,
& Sannar, C.A. (1984). Acute complications associated with insulin pump therapy: Report of
experience with 161 patients. JAMA, 252(23), 3265-3269.
Mehta, C.R., & Hilton, J.F. (1993). Exact power of conditional and unconditional tests: Going beyond
the 2 x 2 contingency table. Amer. Statistician, 47, 91-98.
Mehta, C.R., & Patel, N.R.(1983). A network algorithm for performing Fisher's exact test in r x c
contingency tables. Jour. Amer. Statist. Assoc. 78, 427-434.
Mehta, C.R., & Patel, N.R. (1990). Exact significance testing by the method of control variates. COMPSTAT, 9, 141-144.
Melia, B.M., & Diener-West M. (1994). Modeling inter rater agreement for pathologic features of
choroidal melanoma. In Case Studies in Biometry, edited by Nicolas Lange et al.. Wiley and Sons.
Mendenhall, W. (1968). Introduction to Linear Models and the Design and Analysis of Experiments.
Belmont, CA: Wadsworth.
Morgan, B.J.T. (1992). Analysis of Quantal Response Data. London:Chapman and Hall.
Morgan, S.P., & Teachman, J.D. (1988). Logistic regression: Description, examples, and comparisons.
J. Marriage Family, 50, 929-936.
Mosteller, F., & Tukey, J. W. (1977). Data Analysis and Regression. Boston: Addison-Wesley.
Mullins, E.J. & Sites, P. (1984). The origins of contemporary eminent black Americans: A threegeneration analysis of social origin. Amer. Social. Rev., 49, 672-685.
Nass, C.A.G. (1959). The X2 test for small expectations in contingency tables with special reference
to accidents and absenteeism. Biometrika, 46, 365-85.
Nelder, J.A., & Wedderburn, R.W.M. (1972). Generalized linear models. J. R. Statist. Soc., A, 135,
370-384.
Neyman, J. (1949). Contribution to the theory of the χ² test. Proceedings of the First Berkeley
Symposium on Mathematical Statistics and Probability, 239-273.
Ott, L. (1984). An Introduction to Statistical Methods and Data Analysis. Boston: Duxbury Press.
Pagano, M. & Gauvreau, K. (1993). Principles of Bio statistics. Duxbury Press, Belmont, CA.
Paul, J.S, Ronald, S.R., & Ronald, W.M. (1979). Exact and approximate distribution of the chi-square
statistic for equi-probability. Comm. Statist., B8(2), 131-147.
Pearson, E.S. (1947). The choice of a statistical test illustrated on the interpretation of data classified
in 2 x 2 tables. Biometrika, 34, 139-167.
Pearson, K. (1900). On a criterion that a given system of deviations from the probable in the case of a
correlated system of variables is such that it can be reasonably supposed to have arisen from random
sampling. Philo. Mag., Series 5, 50: 157-175.
Pearson, K. (1904). Mathematical contributions to the theory of evolution XIII: On the theory of
contingency and its relation to association and normal correlation. Draper's Co. Research Memoirs.
Biometric Series.
Pepe, M.S., & Anderson, G.L. (1994). A cautionary note on inference for marginal regression models
with longitudinal data and general correlated response data. Comm. Statist.-Simula, 23(4), 939-951.
Pirie, W.R., & Hamdan, M.A. (1972). Some revised continuity corrections for discrete distributions.
Biometrics, 28, 693-701.


Plackett, R.L. (1981). The Analysis of Categorical Data. Griffin, London.


Potthoff, R.F., & Roy, S.N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51, 665-680.
Pregibon, D. (1981). Logistic regression diagnostics. Ann. Statist., 9, 705-724.
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observations. Biometrics, 44, 1033-48.
Prentice, R. L., & Zhao, L. P. (1991). Estimating equations for parameters in means and covariances
of multivariate discrete and continuous responses. Biometrics, 47, 825-39.
Radelet, M. (1981). Racial characteristics and the imposition of the death penalty. Amer. Social.
Rev., 46, 918-927.
Radlow, R., &: Alf, E.F. (1975). An alternate multinomial assessment of the accuracy of the x2 test
of goodness of fit. JASA, 70: 811-813.
Raftery, A.E. (1986). A note on Bayes factors for log-linear contingency table models with vague prior
information. Jour. R. Statist. Soc., 48, 249-250.
Randolf, R.H., & Wolfe, D.A. (1979). Introduction to the Series of Nonparametric Statistics. John
Wiley, New York.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology, pp. 321-333. In
Proc. 4th Berkeley Symp. Math. Statist. Probab., Vol. 4, ed. J. Neyman. University of California Press,
Berkeley.
Rayner, J.C.W., &: Best, D.J. (1982). The choice of class probabilities and number of classes for the
simple X2 goodness-of-fit test. Sankhya, B44, 28-38.
Read, T.R.C., & Cressie, N. (1988). Goodness-of-Fit Statistics for Discrete Multivariate Data.
Springer-Verlag, New York.

Reynolds, H.T. (1977). The Analysis of Cross-Classifications. Collier Macmillan, London.


Rice, W.R. (1988). A new probability model for determining exact P values for 2 x 2 contingency
tables when comparing binomial proportions. Biometrics, 44, 1-22.
Roscoe, J.T., & Byars, J.A. (1971). An investigation of the restraints with respect to sample size
commonly imposed on the use of the chi- square statistic. J. Amer. Statist. Assoc., 66, 755-759.
Rosner, B. (2000). Fundamentals of Biostatistics. Duxbury, Belmont, CA.
Sachs, L. (1986). Alternatives to the chi-square test of homogeneity in 2 x 2 tables and to Fisher's
exact test. Biometrical Journal, 28, 975-979.
Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986). Akaike Information Criterion Statistics. KTL
Scientific Publishers, MA.

Samuels, M. L., & Witmer, J. A. (1999). Statistics for the Life Sciences. Prentice Hall, Upper Saddle
River, NJ.
Santner, T., & Duffy, D.E. (1989). The Statistical Analysis of Discrete Data. New York: Springer-Verlag.
Schork, M.A., & Remington, R.D. (2000). Statistics with Applications to the Biological and Health
Sciences. Prentice Hall, Englewood Cliffs, NJ.
Schouten, F.J., &: Loch Oh, H. (1975). A data analysis approach to fitting square tables. Comm.
Statist., 4, 595-615.
Schouten, F.J., Molenaar, I.W, Van Strik, R., & Boomsma, A. (1980). Comparing two independent
binomial proportions by a modified chi- square test. Biom. J. 22, 241-248.
Schwarz, G. (1978). Estimating the dimensions of a model. Annals of Statistics, 6, 461-464.
Senchaudhuri, P., Cyrus, R.M., & Patel, N.R. (1993). Estimating exact P-values by the method of
control variates, or Monte Carlo Rescue. ASA, San Francisco Conference
Senie, R.T, Rosen, P.P., Lesser, M.L., & Kinne, D.W. (1981). Breast self-examination and medical
examination related to breast cancer status. Amer. J. of Public Health, 71, 583-590.


Shapiro, S., Slone, D., Rosenberg, L., Kaufman, D., Stolley, P.O., & Miettinen, O.S. (1979). Oral
contraceptive use in relation to myocardial infarction. Lancet, i: 743-746.
Simpson, E.H. (1951). The interpretation of interaction in contingency tables. J. Roy. Statist. Soc.,
B13: 238-241.
Slaton, T.L., Piegorsch, W.W., and Durham, S.D. (2000). Estimation and testing with overdispersed
proportions using the beta-logistic regression model of Heckman and Willis. Biometrics, 56, 125-133.
Slomczynski, K.M., & Krauze, T.K. (1987). Cross-National similarity in social mobility patterns: A
direct test of the Featherman-Jones-Hauser hypothesis. Amer. Sociological Review, 52, 598-611.
Slutsky, E.E. (1925). Uber Stochastische asymptoten und Grenzwerte. Matron, 5, 1-90.
Snedecor, G.W., & Cochran, G.W. (1973). Statistical Methods. Iowa State University Press, Ames.
Sobel, M.E., Hout, M., & Duncan, O.T. (1985). Exchange, structure and symmetry in occupational
mobility. Amer. Jour. Sociology, 91, 359-372.
Somers, R.H. (1962). A new asymmetric measure of association for ordinal variables Amer. Social.
Rev., 27, 799-811.
Speckman, J.J. (1965). Marriage and kinship among the East Indians in Suriname. Assen, Netherlands: Van Gorcum.
Spitzer, R. L., Cohen, J., Fleiss, J.L, and Endicott, J. (1967). Quantification of agreement in psychiatric diagnosis. Arch. Gen. Psychiatry, 17, 83-87.
Stanley, G., Appadu, B., Mead, M.R. & Rowbotham, D.J. (1996). Dose requirements, efficacy and
side effects of morphine and pethidine delivered by patient controlled analgesia after gynaecological
surgery. Brit. J. Anaesth., 76, 484-486.
StatXact (1991). StatXact: Statistical Software for Exact Nonparametric Inference. Cytel Software,
Cambridge, MA.
Stern, E., Misczyncki, M., Greenland, S., Damus, K., & Coulson, A. (1977). Pap testing and hysterectomy prevalence: a survey of communities with high and low cervical cancer rates. Amer. J. of
Epidem., 106, 296-305.
Stevens, S. S. (1946). On the theory of scale measurements. Science, 103, 677-680.
Stevens S. S.(1951). Mathematics, measurement, and psychophysics, pp. 1-49. In Handbook of Experimental Psychology, ed. by S.S. Stevens. New York: John Wiley.
Stevens, S.S. (1968). Measurement, statistics and the schematic view. Science, 161, 849-854.
Stokes, M.E., Davis C. S., & Koch, G. G. (1995). Categorical Data Analysis Using the SAS System.
SAS Institute, Inc., Gary, NC.
Strand, A.L. (1930). Measuring the toxicity of insect fumigants. Industrial and Engineering Chemistry: analytical edition, 2, 4-8.
Stuart, A. (1955). A test for homogeneity of the marginal distributions in a two-way classification.
Biometrika, 42, 412-416.
Sugiura, N., & Otake, M. (1974). An extension of the Mantel-Haenszel procedure to k 2 × c contingency tables and the relation to the logit models. Comm. Statist., 3(9), 829-842.
Svalastoga, K. (1959). Prestige, class, and mobility. Heinemann, London.
Tanner, M.A., & Young, M.A. (1985). Modeling agreement among raters. J. Amer. Statist. Assoc.,
80, 175-180.
Tate, M.W., & Hyer, L.A. (1973). Inaccuracy of the X² test of goodness-of-fit when expected frequencies are small. J. Amer. Statist. Assoc., 68, 836-841.
Thall, P.F., & Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion.
Biometrics, 46, 657-671.
Tjur, T. (1982) A connection between Rasch's item analysis model and a multiplicative Poisson model.
Scandinavian Journal of Statistics, 9, 23-30.


Tomizawa, S. (1985a). Decompositions for odds-symmetry models in a square contingency table with
ordered categories. J. Japan Statist. Soc., 15, 151-159.
Tomizawa, S. (1985b). Analysis of data in square contingency tables with ordered categories using the
conditional symmetry and its decomposed models. Environmental Health Perspectives, 63, 235-239.
Tomizawa, S. (1986). A decomposition for the inclined point-symmetry model in a square contingency
table. Biom. J. 28, 371-380.
Tomizawa, S. (1987). Decompositions for 2-Ratios-Parameter symmetry model in square contingency
tables with ordered categories. Biom. J., 29, 45-55.
Tomizawa, S. (1992). Quasi-diagonals-parameter symmetry model for square contingency tables with
ordered categories. Calcutta Statist. Assoc. Bull., 39, 53-61.
Truett, J., Cornfield, J. & Kannel, W. (1967). A multivariate analysis of the risk of coronary heart
disease in Framingham. J. Chronic Diseases, 20, 511-524.
Tukey, J. W. (1961). Data analysis and behavioral science or learning to bear the quantitative man's
burden by shunning badmandments. In The Collected Works of John W. Tukey, Vol III (1986), ed.
L.V. Jones. Belmont, CA: Wadsworth, pp. 391-484.
Upton, G.J.G. (1978). The Analysis of Cross-Tabulated Data. John Wiley, Chichester.
Upton, G.J.G. (1982). A comparison of alternative tests for the 2 x 2 comparative trial. J. Roy.
Statist. Soc., Ser. A, 145, 86-105.
Upton, G.J.G. (1985). A survey of log-linear models for ordinal variables in an / X J contingency
tables. Guru Nanak Journal of Sociology,6, 1-18.
Upton, G.J.G. (1991). The explanatory analysis of survey data using log-linear models. The Statistician, 40, 169-182.
Upton, G.J.G. (1992). Fisher's exact test J. Roy. Statist. Soc., Ser. A, 155, 395-402.
Upton, G.J.G., & Sarlvik, B. (1981). A loyalty-distance model for voting change. J. R. Statist. Soc.,
A, 144, 247-259.
Upton, G.J.G. & Fingleton, (1985). Spatial Data Analysis by Example. Vol. I. John Wiley, New York.
Velleman, P.F., & Wilkinson, L. (1993). Nominal, ordinal, interval and ratio typologies are misleading.
American Statistician, 47, 65-72.
von Eye, A., & Indurkhya, A. (2000). Log-linear representations of the Mantel-Haenszel and BreslowDay tests. Methods of Psychol. Research, 5, No. 4, 13-30.
von Eye, A. & Niedermeier, K. E. (1999). Statistical Analysis of Longitudinal Categorical Data in
the Social and Behavioral Sciences. Lawrence Erlbaum Associates, Mahwah, NJ.
von Eye, A., & Spiel, C. (1996). Standard and non-standard log-linear symmetry models for measuring
change in categorical variables. American Statistician, 50, 300-305.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of
observations is large. Amer. Math. Soc. Transactions, 54, 426-482.
Ware, J.H., Dockery, D.W., Spiro, A. III, Speizer, F.E., & Ferris, B.C. Jr. (1984). Passive smoking,
gas cooking and respiratory health in children living in six cities. Amer. Review of Respiratory
Diseases, 129, 366-374.
Ware, J.H., Lipsitz, S., & Speizer, F. E. (1988). Issues in the analysis of repeated categorical outcomes.
Statistics in Medicine, 7, 95-107.
West, E.N., & Kempthorne, O. (1972). A comparison of the χ² and likelihood ratio tests for composite
alternatives. J. Statist. Comput. Simul., 1, 1-33.

Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. New York: Wiley.
Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist. 9, 60-62.
Williams, D.A. (1975). The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics, 31, 949-952.


Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Applied Statist., 31, 144-148.
Williams, E.J. (1952). Use of scores for the analysis of association in contingency tables, Biometrika,
39, 274-289.
Williamson, J., Lipsitz, S., & Kim, K. M. (1999). GEECAT and GEEGOR: Computer programs for the
analysis of correlated categorical response data. Computer Methods and Programs in Biomedicine,
58 25-34.
Woodward, G., Lange, S.W., Nelson, K.W., & Calvert, H.O. (1941). The acute oral toxicity of acetic,
chloracetic, dichloracetic and trichloracetic acids. J. of Industrial Hygience and Toxicology, 23, 78-81.
Woolf, B. (1955). On estimating the relation between blood group and disease. Ann. Human Genet.,
19, 251-253.
Woolson, R. F., & Clarke, W. R. (1984). Analysis of categorical incomplete longitudinal data. J.
Royal Statistical Society, A147: 87-99.
Yamaguchi, K. (1987). Models for comparing mobility tables: Toward parsimony and substance.
Amer. Social. Rev., 32, 482-494.
Yamaguchi, K. (1990). Some models for the analysis of asymmetric association in square contingency
tables with ordered categories. In Social. Method., ed. C.C. Clogg. Oxford: Basil Blackwell, 181-212.
Yarnold, J.K. (1970). The minimum expectation in χ² goodness-of-fit tests and the accuracy of approximations for the null distribution. J. Amer. Statist. Assoc., 65, 865-886.
Yates, F. (1934). Contingency tables involving small numbers and the χ² test. J. R. Statist. Soc.,
Suppl., 1, 217-235.

Yule, G.U. (1900). On the association of attributes in Statistics. Phil. Trans., A, 194, 257-319.
Yule, G.U. (1912). On the methods of measuring association between two attributes (with discussion)
J. Roy. Statist. Soc., 75, 579-642.
Zeger, S.L. (1988). A regression model for time series counts. Biometrika, 75, 621-629.
Zelterman, D. (1987). Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc., 82, 624-629.
Zelterman, D., Chan, S.F., & Mielke, P.W. (1993). Exact tests of significance in higher dimensional
tables. ASA, San Francisco Conference.

Subject Index
Adjacent-Category Models 397
Aitken's Selection Method 263
Akaike information criterion 265
Approximations to X² distribution 48
   The continuity correction 48
   The C(m) distribution 49
   The Gamma approximation 49
   The log-normal approximation 50
Association Measures 163
   implementation with SAS software 166
Association Models 415
   Null Association Model 418
   Uniform Association Model 418
   Row Association Model 420
   Column Association Model 423
   R+C Association Model 424
   RC Association Model 424
   Homogeneous Models 426
   Implementations with SAS software 428
   In higher tables 438
      Conditional assoc. 438, 440, 442
      Partial assoc. 438
Asymmetry Models 477
Backward Selection Strategy 256, 259, 339
   Examples 259
Baseline Category Model 381
   an Example 382
   Response Probabilities 385
Bayesian information criterion 265
Beetle Mortality Data 297
Bias in P-values 526
Binary Variable 2, 279
Binomial Distribution 11
   asymptotic properties 13
   Moment generating function 11
   other properties 16
Bioassay 287
   an Example 287
Birch's Criteria 205
Block Triangular Table 158
Bradley-Terry Model 478
   an Example 479
Breast Examination Survey Data 386
Breslow-Day Test 126
Brown's Tests 235
Case-Control Study 94, 95, 327
   Diverticular data 327
   SAS software implementation 327
   Matched case-control 331-336
Categorical variable 1
Closed Loops 207
Clusters 506, 508
   cluster-level covariates 509
CMH Test 124
Coding Schemes 305
Cohort Study 94, 309
   Framingham Study 309-318
Collapsibility conditions 237
   Employment Data 238
Complementary log-log 284, 285
   an Example 297
Composite Models 475
Comprehensive Model 207
Concordant 164, see also 106
Conditional Independence 208, 221, 226, 231
Conditional logistic 332
Conditional Symmetry 457
Conditional test 456
Constant Loyalty Model 467
Continuation ratio model 399
   In higher tables 405-411
Coronary data 500
Correlation matrices 519
Correlation models 432
Correlation structure 507, 509
   Compound 509
   Exchangeable: EXCH 509
   Autoregressive AR(1) 516
   MDEP 511
   Unstructured 516
Cressie-Read I(λ) 44
Cross-sectional models 506
Cumulative Logit Model 389
Danish Welfare Study Data 270
Decomposable models 208, 232
Delta Method 15, 287
Design Matrix 453, 455, 458
Deviance 43
Diagnostics 318
   Dfbeta 319
   Delta deviance 319
   Delta Chi-Square 319
   Implementation with SAS software 321
Diagonal Band Models 470, see definition 468
   Uniform loyalty model 471
   Quasi-Independence Model 471
   Triangles Parameter Model 472
Diagonal Models 468
   Principal Diagonal class 468, see also 469
   Diagonal Band Models 468, see also 470
   Full Diagonal Models 468, see also 473
Diagonal-Parameter Symmetry 459
Dichotomous variable 2
Differential weight agreement 486
Direct Estimates Rules 209
Discordant 164, see also 106
Distance Models 468
Dose-Response Models 281
Doubly Classified Data 449
ED50 289
Effect Modifier 127
Efficiency 513
Empirical standard errors 512
Epileptic data 527
   Thall & Vail model 531
   Diggle, Liang & Zeger 532
Estimating Equations 506
Exact Agreement Model 467, see also 486
Exact Binomial test 105
Exact conditional test 140
Exact multinomial test 40, 41
Extra binomial 321
Extreme Value Distribution 281, 284
Factor Variable 453
Feldman & Klinger method 82
Fisher's Exact Test 85, see also 140
   implementation 87
Fisher's Scoring algorithm 33
Follow-up study 94
Forward Selection Strategy 256, 258, 338
   Examples 258
Freeman-Tukey T² 43
Full Diagonal Models 473
   Diagonals-Absolute Model 474
Full Multinomial Model 99, 151
   Independence hypothesis 100, 176
Functions of Odds-ratios 112
   Asymptotic Variance of g(θ) 112
   Yule's Q 112
G² 42
General Association Model 429-432
General 3-way Tables 202
   Political data 227
Generalized Estimating Equations 506, 508
   With binary response 506
   With nonbinary response 527
Generalized Independence Models 466
Generalized least squares 509
Generalized Linear Models 27, 280
   Random components of 28
   Systematic components of 30
   Parameter estimation of 32
   Fisher's Scoring algorithm 33
   Canonical Link Functions 30
      for Normal distribution 30
      for Poisson distribution 31
      for Binomial distribution 30, 31
      for Gamma distribution 31
   Summary of Canonical Links 32
Generating classes 207
GOF for Binomial Data 68
   Variance test for the Binomial 72
GOF for Poisson data 57
   Horse Kicks data 58
   Variance test for Poisson 60
   Test for change in Poisson level 61
Goodness-of-fit test statistics 41, 52
   Likelihood ratio test 42
   The deviance 43
   The Freeman-Tukey 43
   Power-Divergence statistics 44
Graphical Models 208, 232
Grouping Categories 434
Hierarchy Principle 202
HIV Status Data 299
Homogeneity Hypothesis 95
   In Two-way Tables 147
   HIV Testing Example 97
Hypergeometric distribution 25, 26, 81
   Generalizations of 25
   Means and Variances 25, 83
Independence Models 176, see also 100, 186
   for I x J Tables 178
   ASE of Parameter estimates 181
   Estimates based on SAS GENMOD 184, 185
   Implementation with SAS software 153, 187
Indicator variables 454
Interaction Analysis 189
Intracluster correlation 511
Intrinsic Association 417, 418
Iterative Proportional Fitting algorithm 212, see also 190
Kruskal's gamma 165
Large Sample test 90
   Yates continuity correction 91
   In Two-way Tables 142
LD50 289
   Computation from Logistic models 289
   Computations from Probit models 296
Likelihood equations 33
Likert variable 2
Lindsay Statistic 102
Linear-by-linear 416, see also 487, 493
Linear Diagonal Parameter Symmetry 461
Linear Probability Model 372
Linear Trend Models 62
Link functions 30
Local effects model 66
Logistic Regression 279
   Factor-response Example 338
   Implementations with SAS software 291, 294, 295
Logit Models 273, 283, 285, 353, 354
Logits 15, 283
Log-Linear Model 169, see also 99
   Overparametrization 171
   Identifiability constraints 171
      Sum to zero 171
      Last Category to zero 171
   Saturated model 171
   Selection based on 260
   Parameter Estimates from GENMOD 173
   Standard errors 173
   CATMOD implementation 175, 180
   GENMOD implementation 175, 184
   Problems associated with 241
   In incomplete tables 244
Longitudinal studies 499, 506
Loyalty Model 467, 470, 471
Marginal Association 221, 225, 226, 233, 234
   Selection Strategy based on 261
Marginal Homogeneity 456
Marginal modeling 500
   marginal parameters 506
   population-averaged parameters 506
Matched-pair 106, 132, 331, 449
McNemar's test 107, 452
   Concordant pairs 106, 164
   Discordant pairs 106, 164
Mean Response Models 404
Measures of Agreement 483
   Kappa 484
   Homogeneous model 486
   Exact model 486
   PQS Model 487
   QUA model 487
   OQS model 487
Measures of Association 109
   In Two-way Tables 163
   Odds-ratio θ 110
   Cross-product ratio 110
Melia & Diener-West 491
metric variable 3
Mobility Data 451
Model based s.e. 512
Model Selection Strategies 255
Modified X² test 51
Moment Generating Functions 9
   Definition 9
   Properties 10
   for Normal Distribution 11
   for Binomial Distribution 12
   for Poisson 16
Mother Rats data 508
Mover-stayer model 467
Multi-Category Response Models 381, see also 415
Multinomial Distribution 19, 39
   Factorial moments of 21
   Means & Variances 21
   Maximum Likelihood Estimation of 22
Multi-Rater 490
Multivariate Hypergeometric 136
   Moments 137
Naive estimate 511, 524
   standard errors 512, 524
Newton-Raphson Iterative algorithm 214
Nominal variable 53
Nonbinary response 527
Non-Independence Models 477
Non-standard Log-Linear 453
Obesity 500, 536
Odds symmetry models 463
Ordered categories 2, 55, 486
Ordinal Response Models 389, 415
Ordinal-quasi symmetry 487
Overdispersion 321
   Modeling of 322
Parameters Interpretation in LLM 217
   in 3-way Tables 218-227
   in Higher Tables 230
Partial Association 221, 225, 233
   Selection Strategy based on 261-263
Partitioning the G² Statistic 155
Party Affiliation 450
Party Loyalty 450
Pearson's Correlation Coefficient 117
Pearson's Residuals 509
Pearson's X² 41
Pivot cell 81
Pneumoconiosis 382
Poisson Distribution 16
   Conditional property 17
   Asymptotic property 17
Poisson regression 529
   offset 248
   exposure 248
   cell weight 248
Predictive values 115
Preference Model 479
   Estimated probabilities 480
Primary tail probability 87, 89
Principal Diagonal Class Models 468, 469-470
   Fixed Distance Model 469
   Variable Distance Model 469
Probit 15, 281, 285, 295
Product Binomial Model 93
Product Multinomial Model 146
Proportional hazard model 336
Proportional Odds Model 391
   Oxford Shopping Example 393
   Estimated Probabilities 395
   in higher tables 405-411
Prospective Studies 94, 95
Quasi-Conditional Symmetry 459, 479
Quasi-Diagonals Parameters 462
Quasi-Independence 157, 189, 471
   Incomplete Tables 158, 244
   Implementations 160, 162, 193, 194, 195
Quasi log-linear model 245, 247
   implementation with SAS software 246
Quasi-odds symmetry 465
Quasi-perfect mobility 467
Quasi-Symmetry 451, 452, 454, see also 479
Ranking Procedures 88
   by probabilities 40, 86, 89, 90
   by Log-Likelihood 86, 89, 90
   by CP 86, 89, 90
Rasch Model 503
   item response model 503
Rating Experiment 451
Reduced models 199
Regression Variable 453
Relative Potency 304
   Computations of 307
Relative Risk 95, 113
Repeated binary logit model 500, see also 273
Repeated Measures design 499
Residual Analysis 153
   Rogue cells 154
Retrospective Studies 94, 95
Robust standard error 512
Row Association Model 420
   an Example of 421
   Estimating log-odds ratios of 126, 421
Scaled Deviance 324
Scores 416, 417
   effect of changing 423
   row mean score 146
Secondary tail probability 87, 89
Selection Criteria 265
   AIC 265
   BIC 265
Sensitivity 115
Simpson's paradox 122, 241
   Mantel-Haenszel Test 124
Six city example 514
Skew-Symmetry Models 477
Small expected values 162
   in log-linear models 242
Somers' d 165
Specificity 115
Stepwise Selection Strategy 257, see also 341
Structural Breakdown of G² 210
Structural zero 158, 159, 244
Sub-total 458
Sufficient Configurations 205
   for two dimensional Tables 205
   for three dimensional Tables 205
   for four dimensional Tables 206
Sufficient Statistics 203
   Sufficient Configurations 205
Summary Recommendations 102
Symmetry Models 55, 451, 452, 454, 455
The 2 x 2 Table 79
   Sampling Consideration 80
The Mid-P test 92
   in Two-way Tables 141
Three-way Tables 195
   Gun registration data 195, 221, 355
   SAS software implementation 197, 199
Tolerance Distributions 281
   Normal Tolerance Distribution 281
   Logistic Tolerance Distribution 282
Two-way I x J table 135
Unconditional marginal homogeneity 456
Uniform Association Model 468
Uniform Inheritance Model 467
Wald Statistic 101
Wald-Wolfowitz run test 508
Weighted data 248
Wheezing Data 515, 535
Williams' Procedure 325
Within-subjects 506, 508
Yarnold's rule 48, 50
