
Article

Measures of Departure from Local Marginal Homogeneity for Square Contingency Tables

1 Department of Information Science, Graduate School of Science and Technology, Tokyo University of Science, Noda City 278-8510, Japan
2 Department of Information Science, Faculty of Science and Technology, Tokyo University of Science, Noda City 278-8510, Japan
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(6), 1075; https://doi.org/10.3390/sym14061075
Submission received: 24 March 2022 / Revised: 27 April 2022 / Accepted: 19 May 2022 / Published: 24 May 2022
(This article belongs to the Special Issue Advances in Quasi-Symmetry Models)

Abstract

When focusing on changes in political party support, it is crucial to determine whether or not there has been a change in the aggregate. From this perspective, various types of marginal homogeneity models have been proposed. We propose local marginal homogeneity models, which indicate that equality holds for only one pair of symmetric marginal probabilities or cumulative probabilities. In addition, we propose two measures, one for nominal categories and one for ordered categories, to express the degree of departure from the local marginal homogeneity models. We also apply the measures to real data and confirm that they help compare the degrees of departure from the models across several tables.

1. Introduction

Let us consider $r \times r$ contingency tables with the same row and column classifications. In such tables, the test of independence is not meaningful because the observations tend to be concentrated in the cells on the main diagonal. Therefore, we analyze the symmetry structure of the table instead. Let $p_{ij}$ denote the probability that an observation will fall in the $(i, j)$th cell of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). For contingency tables with nominal categories, several models of symmetry with respect to the main diagonal have been considered. The symmetry (S) model (Bowker [1] and Bishop et al. [2]) is defined as
$$p_{ij} = p_{ji} \quad \text{for all } (i, j;\ i \neq j).$$
The partial symmetry (PS) model (Saigusa et al. [3]) is defined as
$$p_{ij} = p_{ji} \quad \text{for at least one } (i, j;\ i \neq j).$$
The local symmetry (LS) model (Saigusa et al. [4]) is defined as
$$p_{ij} = p_{ji} \quad \text{for only one } (i, j;\ i \neq j).$$
The LS model indicates that the probability that an observation falls in the $i$th row category and the $j$th ($j > i$) column category is equal to the probability that it falls in the $j$th row category and the $i$th column category, for only one pair $(i, j)$. Because the S model imposes strong constraints, various models based on marginal probabilities have been proposed to relax them. The marginal homogeneity (MH) model (Stuart [5]) is defined as
$$p_{i\cdot} = p_{\cdot i} \quad \text{for all } i = 1, \ldots, r,$$
where $p_{i\cdot} = \sum_{t=1}^{r} p_{it}$ and $p_{\cdot i} = \sum_{s=1}^{r} p_{si}$. The partial marginal homogeneity (PMH) model (Saigusa et al. [6]) is defined as
$$p_{i\cdot} = p_{\cdot i} \quad \text{for at least one } i = 1, \ldots, r.$$
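The difference between "for all", "for at least one", and "for only one" in these definitions can be made concrete with a small check. The sketch below is our own illustration rather than part of the original method: the function name `symmetry_structure` is hypothetical, and equality is judged with an absolute tolerance `tol`, an assumption needed for floating-point tables.

```python
import numpy as np


def symmetry_structure(p, tol=1e-12):
    """Report which of the S, PS, LS, MH and PMH definitions a square table satisfies.

    Equality is judged with an absolute tolerance `tol` (an assumption for
    floating-point input; the definitions themselves use exact equality).
    """
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    # Number of off-diagonal pairs (i, j), i < j, with p_ij == p_ji.
    equal_pairs = sum(
        bool(np.isclose(p[i, j], p[j, i], rtol=0.0, atol=tol))
        for i in range(r)
        for j in range(i + 1, r)
    )
    # Number of categories i with p_i. == p_.i (row total equals column total).
    equal_margins = int(np.sum(np.isclose(p.sum(axis=1), p.sum(axis=0), rtol=0.0, atol=tol)))
    return {
        "S": equal_pairs == r * (r - 1) // 2,  # for all pairs
        "PS": equal_pairs >= 1,                # for at least one pair
        "LS": equal_pairs == 1,                # for only one pair
        "MH": equal_margins == r,              # for all i
        "PMH": equal_margins >= 1,             # for at least one i
    }
```

For example, a $3 \times 3$ table whose only symmetric off-diagonal pair is $p_{12} = p_{21}$ is reported as satisfying PS and LS but not S.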
In addition to these, other symmetry models (e.g., quasi-symmetry [7]) and asymmetry models (e.g., conditional symmetry [8], diagonals-parameter symmetry [9], and linear diagonals-parameter symmetry [10]) have been proposed.
Symmetry models have also been proposed for square contingency tables with ordered categories, based on cumulative probabilities taken from the upper-right and lower-left corners of the table. Let us denote the row and column variables by $X$ and $Y$, respectively. The cumulative probability is defined as
$$C_{ij} = \begin{cases} P(X \le i,\, Y \ge j) = \sum_{s=1}^{i} \sum_{t=j}^{r} p_{st} & \text{when } i < j, \\ P(X \ge i,\, Y \le j) = \sum_{s=i}^{r} \sum_{t=1}^{j} p_{st} & \text{when } i > j. \end{cases}$$
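As an illustrative sketch (our own; `cumulative_C` is a hypothetical helper, and indices are 0-based as in NumPy), the cumulative probabilities can be computed by slicing the table. For a symmetric table the resulting matrix satisfies $C_{ij} = C_{ji}$, in line with the alternative expression of the S model that follows.

```python
import numpy as np


def cumulative_C(p):
    """Cumulative probabilities C_ij of an r x r table (0-based indices).

    C[i, j] = P(X <= i, Y >= j) for i < j (upper-right corner) and
    C[i, j] = P(X >= i, Y <= j) for i > j (lower-left corner);
    diagonal entries are undefined and set to NaN.
    """
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    C = np.full((r, r), np.nan)
    for i in range(r):
        for j in range(r):
            if i < j:
                C[i, j] = p[: i + 1, j:].sum()
            elif i > j:
                C[i, j] = p[i:, : j + 1].sum()
    return C
```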
Then, the S model can also be expressed as
$$C_{ij} = C_{ji} \quad \text{for all } (i, j;\ i \neq j).$$
The cumulative partial symmetry (CPS) model (Saigusa et al. [11]) is defined as
$$C_{ij} = C_{ji} \quad \text{for at least one } (i, j;\ i \neq j).$$
The cumulative local symmetry (CLS) model (Saigusa et al. [12]) is defined as
$$C_{ij} = C_{ji} \quad \text{for only one } (i, j;\ i \neq j).$$
The CLS model states that the probability that an observation falls in the $i$th row category or below and the $j$th ($j > i$) column category or above (upper-right corner) is equal to the probability that it falls in the $j$th row category or above and the $i$th column category or below (lower-left corner), for only one pair $(i, j)$. Marginal homogeneity models based on cumulative probabilities have also been proposed. Define the cumulative probabilities
$$G_{1(i)} = P(X \le i,\, Y \ge i+1) = \sum_{s=1}^{i} \sum_{t=i+1}^{r} p_{st}, \qquad G_{2(i)} = P(X \ge i+1,\, Y \le i) = \sum_{s=i+1}^{r} \sum_{t=1}^{i} p_{st}.$$
Then, the MH model is expressed as
$$G_{1(i)} = G_{2(i)} \quad \text{for all } i = 1, \ldots, r-1.$$
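The equivalence of this form with the original definition of the MH model follows from a short calculation, sketched here as our own addition:

```latex
\begin{aligned}
P(X \le i) &= P(X \le i,\, Y \le i) + G_{1(i)}, \\
P(Y \le i) &= P(X \le i,\, Y \le i) + G_{2(i)}, \\
\Rightarrow\quad G_{1(i)} - G_{2(i)} &= P(X \le i) - P(Y \le i) = \sum_{k=1}^{i} \left( p_{k\cdot} - p_{\cdot k} \right).
\end{aligned}
```

Hence $G_{1(i)} = G_{2(i)}$ for all $i = 1, \ldots, r-1$ if and only if $p_{k\cdot} = p_{\cdot k}$ for $k = 1, \ldots, r-1$; the remaining equality for $k = r$ then follows because both sets of marginal probabilities sum to 1.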
The cumulative partial marginal homogeneity (CPMH) model (Nakagawa et al. [13]) is defined as
$$G_{1(i)} = G_{2(i)} \quad \text{for at least one } i = 1, \ldots, r-1.$$
Statistics for testing the goodness of fit of the MH model are given by, for example, Stuart [5], Bhapkar [14], Fleiss and Everitt [15], Bishop et al. [2], and Agresti [16]. Now consider several square tables. When none of these tables has the MH structure, we are interested in measuring and comparing the degrees of departure from MH across the tables. A test statistic can be used to test the goodness of fit of the MH model, but it is not suitable for comparing the degrees of departure from the MH model across several square tables; see Tomizawa et al. [17] for details.
Test statistics cannot measure the degree of departure from a model for tables that do not fit the model. Therefore, measures have been proposed to quantify the degree of departure from a model. In the analysis of two-way contingency tables, the degree of departure from independence is assessed by measures of association between the row and column variables, for example, Yule's coefficients of association and colligation [18,19], Cramér's coefficient [20], and Goodman and Kruskal's coefficient [21]. For contingency tables with nominal categories, measures of the degree of departure from the S, PS, and LS models have been developed (Tomizawa et al. [22], Saigusa et al. [3], and Saigusa et al. [4]). These measures are expressed as weighted arithmetic, geometric, and harmonic means of the diversity index of Patil and Taillie [23] computed from the cell probabilities. Because the values of these measures do not depend on the order of the categories, they may not be suitable for ordered contingency tables. For square contingency tables with ordered categories, several measures based on cumulative probabilities have been proposed that incorporate information about the order of the categories. The measures for the S, CPS, and CLS models are given as weighted arithmetic, geometric, and harmonic means of the diversity index computed from the cumulative probabilities $C_{ij}$ (Tomizawa et al. [24], Saigusa et al. [11], and Saigusa et al. [12]). Similarly, measures of the degree of departure from several MH models have been proposed. For square contingency tables with nominal categories, the measures for the MH and PMH models are given as weighted arithmetic and geometric means of the diversity index computed from the marginal probabilities (Tomizawa and Makii [25], Altun and Aktaş [26], and Saigusa et al. [6]). The values of these measures do not depend on the order of the categories. For square contingency tables with ordered categories, the measures for the MH and CPMH models are given as weighted arithmetic and geometric means of the diversity index computed from the cumulative probabilities $G_{1(i)}$ and $G_{2(i)}$ (Tomizawa et al. [17] and Nakagawa et al. [13]).
On the other hand, the Rand index [27] has been proposed as a measure of agreement between two partitions. Hubert and Arabie [28] introduced an extension of the Rand index and applied it to the rows and columns of contingency tables. This application regards the row and column categories as two partitions of the whole sample, which together form the contingency table. Thus, the symmetry-related measures and the Rand index serve different purposes. In addition, the Rand index is calculated from the number of observations in each cell of the table, whereas the measures proposed in previous studies and in this paper are functions of the cell probabilities.
This paper proposes local marginal homogeneity models for marginal probabilities and for cumulative probabilities, together with weighted harmonic mean measures of departure from these models. Section 2 proposes the new models and measures for the local marginal homogeneity of the marginal probabilities $p_{i\cdot}$ and $p_{\cdot i}$ (nominal categories) and of the cumulative probabilities $G_{1(i)}$ and $G_{2(i)}$ (ordered categories). Section 3 provides approximate confidence intervals for the measures. Section 4 examines the properties of the measures using artificial data sets. Section 5 evaluates the coverage probabilities of the confidence intervals by simulation, and Section 6 applies the measures to real data.

2. New Models and Measures

In Section 2.1, we propose a new model that has the structure of local marginal homogeneity for a square contingency table with nominal categories; we also propose its measure, which expresses the degree of departure from the model. In Section 2.2, we define another model with cumulative local marginal homogeneity structure for a square contingency table with ordered categories; we also provide its measure.

2.1. For the Nominal Category

For square contingency tables with nominal categories, we propose a local marginal homogeneity (LMH) model defined by
$$p_{i\cdot} = p_{\cdot i} \quad \text{for only one } i \ (i = 1, \ldots, r).$$
The LMH model states that the probability that an observation falls in the $i$th row category is equal to the probability that it falls in the $i$th column category, for only one $i$.
Assume that $p_{i\cdot} + p_{\cdot i} \neq 0$ ($i = 1, \ldots, r$) and that $p_{i\cdot} \neq p_{\cdot i}$ for every $i$ except at most one. We propose the following measure:
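$$\psi_{MH}^{(H)}(\lambda) = \frac{\prod_{s=1}^{r} \psi_s(\lambda)}{\sum_{i=1}^{r} \pi_i \prod_{s=1, s \neq i}^{r} \psi_s(\lambda)} \quad (\lambda > -1),$$
where $\pi_i = (p_{i\cdot} + p_{\cdot i})/2$, $p_{1(i)} = p_{i\cdot}/(p_{i\cdot} + p_{\cdot i})$, $p_{2(i)} = p_{\cdot i}/(p_{i\cdot} + p_{\cdot i})$,
$$\psi_i(\lambda) = 1 - \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1} I_i(\lambda),$$
$$I_i(\lambda) = \frac{1}{\lambda}\left(1 - p_{1(i)}^{\lambda+1} - p_{2(i)}^{\lambda+1}\right).$$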
For $\lambda = 0$, we define $\psi_{MH}^{(H)}(0) = \lim_{\lambda \to 0} \psi_{MH}^{(H)}(\lambda)$. Note that $\lambda$ is a real value chosen by the user. The index $I_i(\lambda)$ is the diversity index of degree $\lambda$ for $\{p_{1(i)}, p_{2(i)}\}$; it includes the Shannon entropy (when $\lambda = 0$) and the Gini concentration (when $\lambda = 1$) as special cases. For more details of this diversity index, see Patil and Taillie [23]. The submeasure $\psi_i(\lambda)$ can be rewritten as follows:
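$$\psi_i(\lambda) = \frac{\lambda(\lambda+1)}{2^{\lambda} - 1} D_i^{(\lambda)}\left(\{p_{k(i)}\}; \left\{\tfrac{1}{2}\right\}\right),$$
$$D_i^{(\lambda)}\left(\{p_{k(i)}\}; \left\{\tfrac{1}{2}\right\}\right) = \frac{1}{\lambda(\lambda+1)} \left[ p_{1(i)}\left\{\left(\frac{p_{1(i)}}{1/2}\right)^{\lambda} - 1\right\} + p_{2(i)}\left\{\left(\frac{p_{2(i)}}{1/2}\right)^{\lambda} - 1\right\} \right].$$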
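The rewriting can be checked directly; the following short derivation is our own addition, valid for $\lambda > -1$, $\lambda \neq 0$, and uses $p_{1(i)} + p_{2(i)} = 1$:

```latex
\begin{aligned}
D_i^{(\lambda)} &= \frac{1}{\lambda(\lambda+1)}\left[ 2^{\lambda}\left(p_{1(i)}^{\lambda+1} + p_{2(i)}^{\lambda+1}\right) - 1 \right], \\
\psi_i(\lambda) &= 1 - \frac{2^{\lambda}}{2^{\lambda}-1}\left(1 - p_{1(i)}^{\lambda+1} - p_{2(i)}^{\lambda+1}\right)
 = \frac{2^{\lambda}\left(p_{1(i)}^{\lambda+1} + p_{2(i)}^{\lambda+1}\right) - 1}{2^{\lambda}-1}
 = \frac{\lambda(\lambda+1)}{2^{\lambda}-1}\, D_i^{(\lambda)}.
\end{aligned}
```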
$D_i^{(\lambda)}$ is the power divergence between the two distributions $\{p_{1(i)}, p_{2(i)}\}$ and $\{1/2, 1/2\}$. The power divergence includes the Kullback–Leibler (KL) information (when $\lambda = 0$) and the Pearson chi-squared type discrepancy (when $\lambda = 1$) as special cases. For more details of the power divergence, see Cressie and Read [29] and Read and Cressie [30]. For any $\lambda > -1$, the measure $\psi_{MH}^{(H)}(\lambda)$ has the following properties:
1. $\psi_{MH}^{(H)}(\lambda)$ lies between 0 and 1.
2. $\psi_{MH}^{(H)}(\lambda) = 0$ if and only if the LMH model holds.
3. $\psi_{MH}^{(H)}(\lambda) = 1$ if and only if the degree of departure from LMH is maximal, in the sense that $p_{i\cdot} = 0$ (then $p_{\cdot i} > 0$) or $p_{\cdot i} = 0$ (then $p_{i\cdot} > 0$) for all $i = 1, \ldots, r$.
When the LMH model does not hold, it is easy to see that
$$\psi_{MH}^{(H)}(\lambda) = \left( \sum_{i=1}^{r} \frac{\pi_i}{\psi_i(\lambda)} \right)^{-1}.$$
Namely, since the weights $\pi_i$ sum to 1, the measure is the weighted harmonic mean of $\{\psi_i(\lambda)\}$.
The measure $\psi_{MH}^{(H)}(\lambda)$ is appropriate for analyzing data on a nominal scale because its value is invariant under any permutation applied simultaneously to the row and column categories.
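A minimal NumPy sketch of $\psi_{MH}^{(H)}(\lambda)$, assuming a table of nonnegative entries with $p_{i\cdot} + p_{\cdot i} > 0$ for every $i$. The function names are hypothetical; the table is normalized internally, the $\lambda = 0$ case uses the Shannon-entropy limit, and the product form of the definition is used so that a single zero submeasure yields a measure of 0.

```python
import numpy as np


def lmh_submeasures(p, lam):
    """Return (pi_i, psi_i(lam)) for a square table p; p is normalized first."""
    if lam <= -1:
        raise ValueError("lam must be greater than -1")
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)      # p_{i.} and p_{.i}
    pi = (row + col) / 2.0
    p1, p2 = row / (row + col), col / (row + col)
    if lam == 0:
        # lam -> 0 limit: I_i(0) is the Shannon entropy and psi_i = 1 - I_i(0) / log 2.
        ent = -sum(x * np.log(np.where(x > 0, x, 1.0)) for x in (p1, p2))
        psi = 1.0 - ent / np.log(2.0)
    else:
        # Diversity index of degree lam, then the normalized submeasure psi_i(lam).
        div = (1.0 - p1 ** (lam + 1) - p2 ** (lam + 1)) / lam
        psi = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return pi, psi


def psi_mh_h(p, lam=1.0):
    """psi_MH^(H)(lam) in product form: prod(psi) / sum_i pi_i prod_{s != i} psi_s."""
    pi, psi = lmh_submeasures(p, lam)
    num = np.prod(psi)
    den = sum(pi[i] * np.prod(np.delete(psi, i)) for i in range(len(psi)))
    return float(num / den)
```

Because the counts are divided by their total inside `lmh_submeasures`, `psi_mh_h(table, lam=1.0)` can be called on a table of observed frequencies to obtain the sample version of the measure.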

2.2. For the Ordered Category

For square contingency tables with ordered categories, we propose the cumulative local marginal homogeneity (CLMH) model defined by
$$G_{1(i)} = G_{2(i)} \quad \text{for only one } i \ (i = 1, \ldots, r-1).$$
The CLMH model states that the probability that an observation falls in the $i$th row category or below and the $(i+1)$th column category or above is equal to the probability that it falls in the $(i+1)$th row category or above and the $i$th column category or below, for only one $i$.
Assume that $G_{1(i)} + G_{2(i)} \neq 0$ ($i = 1, \ldots, r-1$) and that $G_{1(i)} \neq G_{2(i)}$ for every $i$ except at most one. We propose the following measure:
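$$\tau_{MH}^{(H)}(\lambda) = \frac{\prod_{s=1}^{r-1} \omega_s(\lambda)}{\sum_{i=1}^{r-1} \left(G_{1(i)}^{*} + G_{2(i)}^{*}\right) \prod_{s=1, s \neq i}^{r-1} \omega_s(\lambda)} \quad (\lambda > -1),$$
where $G_{s(i)}^{*} = G_{s(i)}/\Delta$ with $\Delta = \sum_{i=1}^{r-1} (G_{1(i)} + G_{2(i)})$, and $G_{s(i)}^{c} = G_{s(i)}/(G_{1(i)} + G_{2(i)})$ ($s = 1, 2$),
$$\omega_i(\lambda) = 1 - \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1} H_i(\lambda), \qquad H_i(\lambda) = \frac{1}{\lambda}\left(1 - \left(G_{1(i)}^{c}\right)^{\lambda+1} - \left(G_{2(i)}^{c}\right)^{\lambda+1}\right).$$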
For $\lambda = 0$, we define $\tau_{MH}^{(H)}(0) = \lim_{\lambda \to 0} \tau_{MH}^{(H)}(\lambda)$. The measure has the following properties, which are the same as those of the measure for the LMH model in Section 2.1. For any $\lambda > -1$:
(1) $\tau_{MH}^{(H)}(\lambda)$ lies between 0 and 1.
(2) $\tau_{MH}^{(H)}(\lambda) = 0$ if and only if the probability table has the CLMH structure.
(3) $\tau_{MH}^{(H)}(\lambda) = 1$ if and only if the probability table has the structure of complete marginal inhomogeneity, in the sense that $G_{1(i)} = 0$ (then $G_{2(i)} > 0$) or $G_{2(i)} = 0$ (then $G_{1(i)} > 0$) for all $i = 1, \ldots, r-1$.
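It should be noted that, when the CLMH model does not hold, $\tau_{MH}^{(H)}(\lambda) = \left( \sum_{i=1}^{r-1} (G_{1(i)}^{*} + G_{2(i)}^{*})/\omega_i(\lambda) \right)^{-1}$; that is, the measure is the weighted harmonic mean of $\{\omega_i(\lambda)\}$ with weights $\{G_{1(i)}^{*} + G_{2(i)}^{*}\}$, which sum to 1.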
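An analogous sketch for $\tau_{MH}^{(H)}(\lambda)$, under the same assumptions as before (hypothetical names, 0-based indices, $G_{1(i)} + G_{2(i)} > 0$ for every $i$). Because the weights $G_{1(i)}^{*} + G_{2(i)}^{*}$ and the ratios $G_{s(i)}^{c}$ do not change when the table is rescaled, counts can be passed directly.

```python
import numpy as np


def clmh_submeasures(p, lam):
    """Return (weights G*_1(i) + G*_2(i), omega_i(lam)) for i = 1, ..., r-1."""
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    # G_1(i) = P(X <= i, Y >= i+1), G_2(i) = P(X >= i+1, Y <= i); 0-based k = i - 1.
    G1 = np.array([p[: k + 1, k + 1 :].sum() for k in range(r - 1)])
    G2 = np.array([p[k + 1 :, : k + 1].sum() for k in range(r - 1)])
    w = (G1 + G2) / np.sum(G1 + G2)              # G*_1(i) + G*_2(i)
    g1, g2 = G1 / (G1 + G2), G2 / (G1 + G2)      # G^c_1(i), G^c_2(i)
    if lam == 0:
        ent = -sum(x * np.log(np.where(x > 0, x, 1.0)) for x in (g1, g2))
        omega = 1.0 - ent / np.log(2.0)
    else:
        div = (1.0 - g1 ** (lam + 1) - g2 ** (lam + 1)) / lam
        omega = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return w, omega


def tau_mh_h(p, lam=1.0):
    """tau_MH^(H)(lam) in product form, as in the definition above."""
    w, omega = clmh_submeasures(p, lam)
    num = np.prod(omega)
    den = sum(w[i] * np.prod(np.delete(omega, i)) for i in range(len(omega)))
    return float(num / den)
```

For instance, with the hypothetical counts `[[5, 1, 2], [2, 5, 3], [1, 1, 5]]` we have $G_{1(1)} = G_{2(1)} = 3$, so the sketch returns 0 up to rounding. Interchanging the first two categories gives $G_{1(i)} = 5$ and $G_{2(i)} = 2$ for both $i$, so the value becomes positive, while $\psi_{MH}^{(H)}(\lambda)$ would be unaffected by the same interchange. This illustrates why $\tau_{MH}^{(H)}(\lambda)$ depends on the order of the categories.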

3. Approximate Confidence Interval of the Measures

In this section, we construct approximate confidence intervals for $\psi_{MH}^{(H)}(\lambda)$ and $\tau_{MH}^{(H)}(\lambda)$. As seen in Section 2, both measures are functions of $\{p_{ij}\}$. For generality, we first consider a function $\Phi(\lambda)$ of $\{p_{ij}\}$ and construct an approximate confidence interval for it; the intervals for $\psi_{MH}^{(H)}(\lambda)$ and $\tau_{MH}^{(H)}(\lambda)$ are then obtained by replacing $\Phi(\lambda)$ with each measure. Let $n_{ij}$ denote the observed frequency in the $(i, j)$th cell of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). Assuming that the $r \times r$ table follows a multinomial distribution, we derive the approximate standard error and the large-sample confidence interval for $\Phi(\lambda)$ using the delta method, which is described in, for example, Bishop et al. [2] and Agresti [31]. The sample version of $\Phi(\lambda)$, denoted $\hat{\Phi}(\lambda)$, is given by $\Phi(\lambda)$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, where $\hat{p}_{ij} = n_{ij}/N$ and $N = \sum_{i=1}^{r}\sum_{j=1}^{r} n_{ij}$. By the delta method, $\sqrt{N}\,(\hat{\Phi}(\lambda) - \Phi(\lambda))$ asymptotically (as $N \to \infty$) has a normal distribution with mean zero and variance $\sigma^2$, where
$$\sigma^2 = \sum_{i=1}^{r}\sum_{j=1}^{r} p_{ij}\left(\frac{\partial \Phi(\lambda)}{\partial p_{ij}}\right)^2 - \left(\sum_{i=1}^{r}\sum_{j=1}^{r} p_{ij}\,\frac{\partial \Phi(\lambda)}{\partial p_{ij}}\right)^2 \quad (\lambda > -1).$$
Let $\hat{\sigma}^2$ denote $\sigma^2$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$. Then, $\hat{\sigma}/\sqrt{N}$ is an estimated approximate standard error for $\hat{\Phi}(\lambda)$, and $\hat{\Phi}(\lambda) \pm z_{\alpha/2}\,\hat{\sigma}/\sqrt{N}$ gives an approximate $100(1-\alpha)\%$ confidence interval for $\Phi(\lambda)$, where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.
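The construction can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: it approximates $\partial\Phi(\lambda)/\partial p_{ij}$ by central finite differences with a relative step (our choice) instead of the closed forms given next, handles only $\lambda \neq 0$, and takes $z_{\alpha/2}$ from Python's `statistics.NormalDist`. Cells with $\hat{p}_{ij} = 0$ are skipped because they receive zero weight in $\hat{\sigma}^2$; the names `lmh_measure` and `delta_method_ci` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def lmh_measure(p, lam=1.0):
    """psi_MH^(H)(lam) from a probability table (sketch: lam != 0, all psi_i > 0)."""
    row, col = p.sum(axis=1), p.sum(axis=0)
    pi = (row + col) / 2.0
    p1, p2 = row / (row + col), col / (row + col)
    div = (1.0 - p1 ** (lam + 1) - p2 ** (lam + 1)) / lam
    psi = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return 1.0 / np.sum(pi / psi)  # weighted harmonic mean of psi_i


def delta_method_ci(measure, counts, alpha=0.05, rel_step=1e-5, **kwargs):
    """Return (estimate, lower, upper) for Phi_hat +/- z_{alpha/2} sigma_hat / sqrt(N)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n                                   # p_hat_ij = n_ij / N
    est = measure(p, **kwargs)
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        if p[idx] == 0.0:
            continue                                 # zero cells get zero weight in sigma^2
        h = rel_step * p[idx]
        step = np.zeros_like(p)
        step[idx] = h
        grad[idx] = (measure(p + step, **kwargs) - measure(p - step, **kwargs)) / (2.0 * h)
    # sigma^2 = sum p (dPhi/dp)^2 - (sum p dPhi/dp)^2, evaluated at p_hat.
    sigma2 = max(np.sum(p * grad ** 2) - np.sum(p * grad) ** 2, 0.0)
    se = np.sqrt(sigma2 / n)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # upper alpha/2 point
    return est, est - z * se, est + z * se
```

Because $\sigma^2$ is the variance of the partial derivatives under $\{p_{ij}\}$, adding a constant to all of them leaves it unchanged; this is why it does not matter whether the measure is renormalized inside the differentiated function.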
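The confidence interval for $\psi_{MH}^{(H)}(\lambda)$ is obtained by replacing $\partial \Phi(\lambda)/\partial p_{ij}$ with $\gamma_{ij}(\lambda)$, where
$$\gamma_{ij}(\lambda) = -\left(\psi_{MH}^{(H)}(\lambda)\right)^2 \left( \frac{A_{12(i)}}{(\psi_i(\lambda))^2} + \frac{A_{21(j)}}{(\psi_j(\lambda))^2} \right) \quad (\lambda \neq 0),$$
with
$$A_{12(i)} = \frac{\psi_i(\lambda)}{2} - \frac{2^{\lambda-1}(\lambda+1)}{2^{\lambda}-1}\, p_{2(i)} \left(p_{1(i)}^{\lambda} - p_{2(i)}^{\lambda}\right), \qquad A_{21(i)} = \frac{\psi_i(\lambda)}{2} + \frac{2^{\lambda-1}(\lambda+1)}{2^{\lambda}-1}\, p_{1(i)} \left(p_{1(i)}^{\lambda} - p_{2(i)}^{\lambda}\right),$$
and the confidence interval for $\tau_{MH}^{(H)}(\lambda)$ is obtained by replacing $\partial \Phi(\lambda)/\partial p_{ij}$ with $\beta_{ij}(\lambda)$, where
$$\beta_{ij}(\lambda) = \begin{cases} \dfrac{(\tau_{MH}^{(H)}(\lambda))^2}{\Delta} \displaystyle\sum_{k=i}^{j-1} B_{12(k)} + \dfrac{(j-i)\,\tau_{MH}^{(H)}(\lambda)}{\Delta} & (i < j), \\ 0 & (i = j), \\ \dfrac{(\tau_{MH}^{(H)}(\lambda))^2}{\Delta} \displaystyle\sum_{k=j}^{i-1} B_{21(k)} + \dfrac{(i-j)\,\tau_{MH}^{(H)}(\lambda)}{\Delta} & (i > j), \end{cases}$$
with
$$B_{12(k)} = \frac{2^{\lambda}(\lambda+1)\, G_{2(k)}^{c}}{(2^{\lambda}-1)(\omega_k(\lambda))^2}\left(\left(G_{1(k)}^{c}\right)^{\lambda} - \left(G_{2(k)}^{c}\right)^{\lambda}\right) - \frac{1}{\omega_k(\lambda)}, \qquad B_{21(k)} = \frac{2^{\lambda}(\lambda+1)\, G_{1(k)}^{c}}{(2^{\lambda}-1)(\omega_k(\lambda))^2}\left(\left(G_{2(k)}^{c}\right)^{\lambda} - \left(G_{1(k)}^{c}\right)^{\lambda}\right) - \frac{1}{\omega_k(\lambda)},$$
and $\gamma_{ij}(0) = \lim_{\lambda \to 0} \gamma_{ij}(\lambda)$, $\beta_{ij}(0) = \lim_{\lambda \to 0} \beta_{ij}(\lambda)$. The diagonal case $\beta_{ii}(\lambda) = 0$ holds because the diagonal cells do not enter any $G_{s(k)}$.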

4. Properties of Measures

In this section, we use artificial data to check the properties of the measures proposed in this paper and their relationship to the measures proposed in previous studies. First, we show that, for both nominal and ordered contingency tables, the proposed measure is the smallest among the corresponding measures. Let us denote the measures for the MH and PMH models for nominal contingency tables by $\psi_{MH}^{(A)}$ and $\psi_{MH}^{(G)}$, respectively (see Appendix A). Since a weighted arithmetic mean is never smaller than the corresponding weighted geometric mean, which in turn is never smaller than the weighted harmonic mean, it holds that
$$\psi_{MH}^{(H)}(\lambda) \le \psi_{MH}^{(G)}(\lambda) \le \psi_{MH}^{(A)}(\lambda),$$
and the equalities hold only when
$$\psi_1(\lambda) = \psi_2(\lambda) = \cdots = \psi_r(\lambda).$$
From the formula for $\psi_i(\lambda)$, this means that the ratio of $p_{1(i)}$ to $p_{2(i)}$ is the same for all $i$, up to taking its reciprocal (because $\psi_i(\lambda)$ is unchanged when $p_{1(i)}$ and $p_{2(i)}$ are interchanged).
Let us denote the measures for the MH and CPMH models for ordered contingency tables by $\tau_{MH}^{(A)}$ and $\tau_{MH}^{(G)}$, respectively (see Appendix A). In the same manner as above, it holds that
$$\tau_{MH}^{(H)}(\lambda) \le \tau_{MH}^{(G)}(\lambda) \le \tau_{MH}^{(A)}(\lambda),$$
and the equalities hold only when
$$\omega_1(\lambda) = \omega_2(\lambda) = \cdots = \omega_{r-1}(\lambda).$$
From the formula for $\omega_i(\lambda)$, this means that the ratio of $G_{1(i)}^{c}$ to $G_{2(i)}^{c}$ is also the same for all $i$, up to taking its reciprocal.
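The inequalities can be checked numerically. In this sketch (hypothetical helper names, $\lambda \neq 0$), `weighted_means` returns the weighted arithmetic, geometric, and harmonic means; applied to $\{\psi_i(\lambda)\}$ with weights $\{\pi_i\}$ they are $\psi_{MH}^{(A)}$, $\psi_{MH}^{(G)}$, and $\psi_{MH}^{(H)}$ of Appendix A and Section 2.1, and applied to $\{\omega_i(\lambda)\}$ with weights $\{G_{1(i)}^{*} + G_{2(i)}^{*}\}$ they would give the corresponding $\tau$ measures.

```python
import numpy as np


def weighted_means(values, weights):
    """Weighted arithmetic, geometric and harmonic means of positive values."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    arithmetic = np.sum(w * v)
    geometric = np.exp(np.sum(w * np.log(v)))
    harmonic = 1.0 / np.sum(w / v)
    return arithmetic, geometric, harmonic


def psi_submeasures(p, lam):
    """Weights pi_i and submeasures psi_i(lam) of Section 2.1 (lam != 0)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)
    p1, p2 = row / (row + col), col / (row + col)
    div = (1.0 - p1 ** (lam + 1) - p2 ** (lam + 1)) / lam
    psi = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return (row + col) / 2.0, psi
```

For a hypothetical table in which every $p_{1(i)}/p_{2(i)}$ equals 47/11 or 11/47, such as `[[5, 20, 2, 20], [2, 5, 2, 2], [2, 20, 5, 20], [2, 2, 2, 5]]`, all $\psi_i(\lambda)$ coincide and the three means are equal, matching the equality condition above.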
Now, we check the above properties using the artificial data in Table 1 and Table 2. A glance at Table 2 shows that properties (1) and (2) are satisfied. Table 1a has $p_{1\cdot} = p_{\cdot 1}$ and $G_{1(1)} = G_{2(1)}$, and Table 2a(a) and Table 2b(a) confirm that $\psi_{MH}^{(H)}(\lambda) = \tau_{MH}^{(H)}(\lambda) = 0$. In Table 1c,d, direct calculation shows that $G_{1(i)}^{c}/G_{2(i)}^{c}$ equals 1/2 or 2, so that $\omega_1(\lambda) = \omega_2(\lambda) = \omega_3(\lambda)$, and that $p_{1(i)}/p_{2(i)}$ equals 1/3 or 3, so that $\psi_1(\lambda) = \psi_2(\lambda) = \psi_3(\lambda) = \psi_4(\lambda)$, respectively. Therefore, Table 2a(c,d) confirms that $\psi_{MH}^{(H)}(\lambda) = \psi_{MH}^{(G)}(\lambda) = \psi_{MH}^{(A)}(\lambda)$ and $\tau_{MH}^{(H)}(\lambda) = \tau_{MH}^{(G)}(\lambda) = \tau_{MH}^{(A)}(\lambda)$. Tables 1b and 1c differ only in that (1) and (4) are interchanged. Table 2a(b,c) shows that $\psi_{MH}^{(H)}(\lambda)$ is unchanged by this interchange, whereas Table 2b(b,c) shows that $\tau_{MH}^{(H)}(\lambda)$ changes. This confirms that $\tau_{MH}^{(H)}(\lambda)$ takes the order of the categories into account. Table 1e,f gives examples of contingency tables with the greatest departures from CLMH and LMH, respectively; these two structures are not necessarily the same.

5. Simulation

This section uses simulation to evaluate the coverage probabilities of the confidence intervals for the measures of the LMH and CLMH models.
Simulations were performed on randomly generated $4 \times 4$ contingency tables. For each sample size of 200, 500, and 1000, tables were generated 1000 times according to the probability structure of the contingency tables. Confidence intervals for the LMH and CLMH measures were calculated for eight values of $\lambda$ (−0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0), and we recorded the proportion of 95% confidence intervals that contained the true value of the measure.
The confidence intervals appear sufficiently reliable, since the coverage probability exceeds 90% in most of the cells in Table 3. The coverage probability tends to increase with the sample size, but there are exceptions, e.g., the sample size of 1000 with $\lambda = 0.0$ in Table 3a. This may be because, when the sample size is large, the simulation completes without problems even when the scale takes extreme values.
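The coverage computation can be sketched as follows. The probability table, the number of replications, the seed, and the finite-difference gradient are our own assumptions (the probability structures behind Table 3 are not reproduced here), so the output is not comparable with Table 3; the sketch only shows how a coverage probability is estimated for $\psi_{MH}^{(H)}(\lambda)$ with $\lambda \neq 0$. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def psi_h(p, lam):
    """psi_MH^(H)(lam) as a weighted harmonic mean (assumes lam != 0, all psi_i > 0)."""
    row, col = p.sum(axis=1), p.sum(axis=0)
    p1, p2 = row / (row + col), col / (row + col)
    div = (1.0 - p1 ** (lam + 1) - p2 ** (lam + 1)) / lam
    psi = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return 1.0 / np.sum((row + col) / 2.0 / psi)


def delta_ci(counts, lam, z, rel_step=1e-5):
    """Delta-method interval with finite-difference derivatives (an assumption)."""
    n = counts.sum()
    p = counts / n
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        if p[idx] > 0:
            step = np.zeros_like(p)
            step[idx] = rel_step * p[idx]
            grad[idx] = (psi_h(p + step, lam) - psi_h(p - step, lam)) / (2.0 * step[idx])
    sigma2 = max(np.sum(p * grad ** 2) - np.sum(p * grad) ** 2, 0.0)
    est = psi_h(p, lam)
    half = z * np.sqrt(sigma2 / n)
    return est - half, est + half


def coverage(p_true, n, lam=1.0, reps=1000, alpha=0.05, seed=0):
    """Share of simulated intervals that contain the true value psi_MH^(H)(lam)."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    target = psi_h(p_true, lam)
    hits = 0
    for _ in range(reps):
        counts = rng.multinomial(n, p_true.ravel()).reshape(p_true.shape).astype(float)
        lo, hi = delta_ci(counts, lam, z)
        hits += int(lo <= target <= hi)
    return hits / reps


# Hypothetical 4 x 4 probability table (not one of the tables behind Table 3).
base = np.array([[5, 20, 2, 20], [2, 5, 2, 2], [2, 20, 5, 20], [2, 2, 2, 5]], dtype=float)
p_true = base / base.sum()
```

A call such as `coverage(p_true, n=1000, lam=1.0)` then returns the estimated coverage probability of the nominal 95% interval for this hypothetical structure; repeating it over sample sizes and values of $\lambda$ gives a grid analogous in form to Table 3.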

6. Example

In this section, we apply the measures to real contingency tables with nominal or ordered categories.
The first data set, taken from Upton [32], is an example of a contingency table with nominal categories; it shows changes in voting choice among three parties (Conservative, Labour, and Liberal) and abstention in 1964, 1966, and 1970. To assess the degree of departure from the LMH model, Table 4a gives the estimates of the measure $\psi_{MH}^{(H)}(\lambda)$ for the change in voting choice from 1964 to 1966, and Table 4b gives them for the change from 1966 to 1970. Table 4a shows that the changes between 1964 and 1966 fit the LMH model well. Table 4b shows that the degree of departure from the LMH model is greater for the changes between 1966 and 1970 than for those between 1964 and 1966.
The second data set, taken from Tominaga [33], is an example of a contingency table with ordered categories; it gives the cross-classification of the occupational status of Japanese fathers and their sons in 1955 and 1975. Although treating occupational classes as ordered may seem odd for modern society, we treat them as ordered categories, following the references. The categories are (1) professional and managers; (2) clerical and sales; (3) skilled manual, semiskilled manual, and unskilled manual; and (4) farmers. To assess the degree of departure from the CLMH model, Table 5a gives the estimates of the measure $\tau_{MH}^{(H)}(\lambda)$ for the occupational status of fathers and sons in 1955, and Table 5b gives them for 1975. Table 5 shows that the values in the confidence intervals for $\tau_{MH}^{(H)}(\lambda)$ are greater in Table 5b than in Table 5a. Therefore, the degree of departure from the CLMH model for father–son pairs is estimated to be larger in 1975 than in 1955.

7. Concluding Remarks

For $r \times r$ square contingency tables, we proposed the LMH model for nominal categories and the CLMH model for ordered categories, together with harmonic-mean-type measures of departure from these models. As the examples in Section 6 illustrate, categories may be nominal or ordered. Applying a measure for ordered categories to a nominal contingency table would introduce extraneous order information, whereas applying a measure for nominal categories to an ordered contingency table would discard the information about the order. Therefore, when analyzing a contingency table, it is necessary to consider whether its categories are ordered.
As described in Sections 1 and 2, the measures for the MH, PMH, and LMH models are constructed using weighted arithmetic, geometric, and harmonic means, respectively. We seek to express these three measures in a single formula.

Author Contributions

All authors contributed to the writing and reviewing of the paper. Additionally, K.S. and N.T. implemented the method, contributed the original draft, and co-wrote and revised the paper. A.I. and T.N. contributed to the validation and co-wrote the original and revised versions of the paper. S.T. defined and reviewed the methodology and supervised the whole study and the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in [32,33].

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Measures Proposed in Previous Studies

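The measures for the MH and PMH models for nominal contingency tables and for the MH and CPMH models for ordered contingency tables are shown below. Assuming that $p_{i\cdot} + p_{\cdot i} \neq 0$, Tomizawa and Makii [25] proposed a measure of the degree of departure from the MH model as follows:
$$\psi_{MH}^{(A)}(\lambda) = \sum_{i=1}^{r} \pi_i \psi_i(\lambda) \quad \text{for } \lambda > -1,$$
where
$$\pi_i = \frac{p_{i\cdot} + p_{\cdot i}}{2}, \qquad p_{1(i)} = \frac{p_{i\cdot}}{p_{i\cdot} + p_{\cdot i}}, \qquad p_{2(i)} = \frac{p_{\cdot i}}{p_{i\cdot} + p_{\cdot i}},$$
$$\psi_i(\lambda) = \begin{cases} 1 - \dfrac{\lambda 2^{\lambda}}{2^{\lambda} - 1} I_i(\lambda) & \text{for } \lambda \neq 0, \\ 1 - \dfrac{1}{\log 2} I_i(0) & \text{for } \lambda = 0, \end{cases}$$
$$I_i(\lambda) = \begin{cases} \dfrac{1}{\lambda}\left(1 - p_{1(i)}^{\lambda+1} - p_{2(i)}^{\lambda+1}\right) & \text{for } \lambda \neq 0, \\ -p_{1(i)} \log p_{1(i)} - p_{2(i)} \log p_{2(i)} & \text{for } \lambda = 0. \end{cases}$$
Saigusa et al. [6] proposed a measure for the PMH model defined by
$$\psi_{MH}^{(G)}(\lambda) = \prod_{i=1}^{r} \left(\psi_i(\lambda)\right)^{\pi_i} \quad \text{for } \lambda > -1.$$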
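Assuming that $G_{1(i)} + G_{2(i)} \neq 0$, Tomizawa et al. [17] proposed a measure of the degree of departure from the MH model as follows:
$$\tau_{MH}^{(A)}(\lambda) = \sum_{i=1}^{r-1} \left(G_{1(i)}^{*} + G_{2(i)}^{*}\right) \omega_i(\lambda) \quad \text{for } \lambda > -1,$$
where
$$G_{s(i)}^{*} = \frac{G_{s(i)}}{\Delta}, \qquad \Delta = \sum_{i=1}^{r-1}\left(G_{1(i)} + G_{2(i)}\right), \qquad G_{s(i)}^{c} = \frac{G_{s(i)}}{G_{1(i)} + G_{2(i)}} \quad (s = 1 \text{ or } 2),$$
$$\omega_i(\lambda) = \begin{cases} 1 - \dfrac{\lambda 2^{\lambda}}{2^{\lambda} - 1} H_i(\lambda) & \text{for } \lambda \neq 0, \\ 1 - \dfrac{1}{\log 2} H_i(0) & \text{for } \lambda = 0, \end{cases}$$
$$H_i(\lambda) = \begin{cases} \dfrac{1}{\lambda}\left(1 - \left(G_{1(i)}^{c}\right)^{\lambda+1} - \left(G_{2(i)}^{c}\right)^{\lambda+1}\right) & \text{for } \lambda \neq 0, \\ -G_{1(i)}^{c} \log G_{1(i)}^{c} - G_{2(i)}^{c} \log G_{2(i)}^{c} & \text{for } \lambda = 0. \end{cases}$$
Nakagawa et al. [13] proposed a measure for the CPMH model defined by
$$\tau_{MH}^{(G)}(\lambda) = \prod_{i=1}^{r-1} \left(\omega_i(\lambda)\right)^{G_{1(i)}^{*} + G_{2(i)}^{*}} \quad \text{for } \lambda > -1.$$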
It can be seen that the measures $\psi_{MH}^{(A)}(\lambda)$ and $\tau_{MH}^{(A)}(\lambda)$ are weighted arithmetic means of the submeasures $\psi_i(\lambda)$ and $\omega_i(\lambda)$, respectively, and that $\psi_{MH}^{(G)}(\lambda)$ and $\tau_{MH}^{(G)}(\lambda)$ are weighted geometric means of the same submeasures.

Appendix B. Differentiation of the Proposed Measures

Appendix B.1. Measure of LMH

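Consider $p_{ij}$ ($i \neq j$; $i, j = 1, \ldots, r$). Differentiating $\psi_{MH}^{(H)}(\lambda)$ with respect to $p_{ij}$, we obtain
$$\frac{\partial \psi_{MH}^{(H)}(\lambda)}{\partial p_{ij}} = \left( \sum_{k=1}^{r} \pi_k \prod_{l \neq k} \psi_l(\lambda) \right)^{-1} \frac{\partial}{\partial p_{ij}} \prod_{l=1}^{r} \psi_l(\lambda) + \prod_{l=1}^{r} \psi_l(\lambda) \cdot \frac{\partial}{\partial p_{ij}} \left( \sum_{k=1}^{r} \pi_k \prod_{l \neq k} \psi_l(\lambda) \right)^{-1}$$
$$= \left(\psi_{MH}^{(H)}(\lambda)\right)^2 \left( \frac{\pi_i}{(\psi_i(\lambda))^2} \frac{\partial \psi_i(\lambda)}{\partial p_{ij}} + \frac{\pi_j}{(\psi_j(\lambda))^2} \frac{\partial \psi_j(\lambda)}{\partial p_{ij}} \right) - \left(\psi_{MH}^{(H)}(\lambda)\right)^2 \left( \frac{1}{\psi_i(\lambda)} \frac{\partial \pi_i}{\partial p_{ij}} + \frac{1}{\psi_j(\lambda)} \frac{\partial \pi_j}{\partial p_{ij}} \right).$$
Considering the derivatives of $\psi_i(\lambda)$ and $\psi_j(\lambda)$, we obtain
$$\frac{\partial \psi_i(\lambda)}{\partial p_{ij}} = \frac{2^{\lambda-1}(\lambda+1)}{2^{\lambda}-1} \frac{p_{2(i)}}{\pi_i} \left(p_{1(i)}^{\lambda} - p_{2(i)}^{\lambda}\right), \qquad \frac{\partial \psi_j(\lambda)}{\partial p_{ij}} = -\frac{2^{\lambda-1}(\lambda+1)}{2^{\lambda}-1} \frac{p_{1(j)}}{\pi_j} \left(p_{1(j)}^{\lambda} - p_{2(j)}^{\lambda}\right).$$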
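Because $\partial \pi_i/\partial p_{ij}$ and $\partial \pi_j/\partial p_{ij}$ are both equal to 1/2, we obtain
$$\frac{\partial \psi_{MH}^{(H)}(\lambda)}{\partial p_{ij}} = -\left(\psi_{MH}^{(H)}(\lambda)\right)^2 \left( \frac{1}{2\psi_i(\lambda)} - \frac{2^{\lambda-1}(\lambda+1)\, p_{2(i)}}{(2^{\lambda}-1)(\psi_i(\lambda))^2} \left(p_{1(i)}^{\lambda} - p_{2(i)}^{\lambda}\right) \right) - \left(\psi_{MH}^{(H)}(\lambda)\right)^2 \left( \frac{1}{2\psi_j(\lambda)} + \frac{2^{\lambda-1}(\lambda+1)\, p_{1(j)}}{(2^{\lambda}-1)(\psi_j(\lambda))^2} \left(p_{1(j)}^{\lambda} - p_{2(j)}^{\lambda}\right) \right),$$
which is $\gamma_{ij}(\lambda)$ in Section 3. For the diagonal cells ($i = j$), the same calculation with $\partial \pi_i/\partial p_{ii} = 1$ yields the same expression with $j = i$.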
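The closed form can be checked against central finite differences. This sketch (hypothetical names, $\lambda \neq 0$, and a hypothetical table in which every $\psi_i(\lambda)$ is clearly positive) evaluates $\gamma_{ij}(\lambda) = -(\psi_{MH}^{(H)}(\lambda))^2 \left(A_{12(i)}/\psi_i(\lambda)^2 + A_{21(j)}/\psi_j(\lambda)^2\right)$ for all cells, including the diagonal, and compares it with numerical derivatives of the unnormalized function. The overall sign of the gradient does not affect $\sigma^2$ in Section 3.

```python
import numpy as np


def psi_parts(p, lam):
    """p_1(i), p_2(i), psi_i(lam) and psi_MH^(H)(lam) for an unnormalized table p."""
    row, col = p.sum(axis=1), p.sum(axis=0)
    pi = (row + col) / 2.0
    p1, p2 = row / (row + col), col / (row + col)
    div = (1.0 - p1 ** (lam + 1) - p2 ** (lam + 1)) / lam
    psi = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return p1, p2, psi, 1.0 / np.sum(pi / psi)


def gamma_closed_form(p, lam):
    """gamma_ij(lam) = -(psi^H)^2 (A_12(i) / psi_i^2 + A_21(j) / psi_j^2)."""
    p1, p2, psi, h = psi_parts(p, lam)
    c = 2.0 ** (lam - 1) * (lam + 1) / (2.0 ** lam - 1.0)
    diff = p1 ** lam - p2 ** lam
    a12 = psi / 2.0 - c * p2 * diff   # depends on the row category i
    a21 = psi / 2.0 + c * p1 * diff   # depends on the column category j
    return -h ** 2 * ((a12 / psi ** 2)[:, None] + (a21 / psi ** 2)[None, :])


def gamma_finite_difference(p, lam, step=1e-6):
    """Central differences of psi^H with respect to each cell p_ij."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        e = np.zeros_like(p)
        e[idx] = step
        g[idx] = (psi_parts(p + e, lam)[3] - psi_parts(p - e, lam)[3]) / (2.0 * step)
    return g


counts = np.array([[3, 9, 1, 7], [1, 4, 1, 2], [2, 8, 5, 9], [1, 1, 2, 6]], dtype=float)
p = counts / counts.sum()
```

Comparing `gamma_closed_form(p, lam)` with `gamma_finite_difference(p, lam)` for a few values of $\lambda$ gives agreement up to finite-difference error.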

Appendix B.2. Measure of CLMH

Consider $p_{st}$ ($s < t$; $s, t = 1, \ldots, r$). Differentiating $\tau_{MH}^{(H)}(\lambda)$ with respect to $p_{st}$, we obtain
$$\frac{\partial \tau_{MH}^{(H)}(\lambda)}{\partial p_{st}} = \left( \sum_{i=1}^{r-1} \left(G_{1(i)}^{*} + G_{2(i)}^{*}\right) \prod_{l \neq i} \omega_l(\lambda) \right)^{-1} \frac{\partial}{\partial p_{st}} \prod_{l=1}^{r-1} \omega_l(\lambda) + \prod_{l=1}^{r-1} \omega_l(\lambda) \cdot \frac{\partial}{\partial p_{st}} \left( \sum_{i=1}^{r-1} \left(G_{1(i)}^{*} + G_{2(i)}^{*}\right) \prod_{l \neq i} \omega_l(\lambda) \right)^{-1}$$
$$= \left(\tau_{MH}^{(H)}(\lambda)\right)^2 \sum_{k=s}^{t-1} \frac{G_{1(k)}^{*} + G_{2(k)}^{*}}{(\omega_k(\lambda))^2} \frac{\partial \omega_k(\lambda)}{\partial p_{st}} - \left(\tau_{MH}^{(H)}(\lambda)\right)^2 \sum_{k=1}^{r-1} \frac{1}{\omega_k(\lambda)} \frac{\partial \left(G_{1(k)}^{*} + G_{2(k)}^{*}\right)}{\partial p_{st}}.$$
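Since $p_{st}$ enters $G_{1(k)}$ exactly for $k = s, \ldots, t-1$, the derivative of $\omega_k(\lambda)$ for these $k$ is
$$\frac{\partial \omega_k(\lambda)}{\partial p_{st}} = \frac{2^{\lambda}(\lambda+1)\, G_{2(k)}^{c}}{(2^{\lambda}-1)\left(G_{1(k)} + G_{2(k)}\right)} \left( \left(G_{1(k)}^{c}\right)^{\lambda} - \left(G_{2(k)}^{c}\right)^{\lambda} \right).$$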
Next, consider the derivative of $G_{1(i)}^{*} + G_{2(i)}^{*}$. If $G_{1(n)}$ contains $p_{st}$ and $G_{1(m)}$ does not, then, since $\partial \Delta/\partial p_{st} = t - s$, we have
$$\frac{\partial \left(G_{1(n)}^{*} + G_{2(n)}^{*}\right)}{\partial p_{st}} = \frac{1}{\Delta}\left\{ 1 - (t-s)\left(G_{1(n)}^{*} + G_{2(n)}^{*}\right) \right\}, \qquad \frac{\partial \left(G_{1(m)}^{*} + G_{2(m)}^{*}\right)}{\partial p_{st}} = -\frac{t-s}{\Delta}\left(G_{1(m)}^{*} + G_{2(m)}^{*}\right).$$
Substituting these derivatives into the derivative of $\tau_{MH}^{(H)}(\lambda)$, we obtain
$$\frac{\partial \tau_{MH}^{(H)}(\lambda)}{\partial p_{st}} = \frac{\left(\tau_{MH}^{(H)}(\lambda)\right)^2}{\Delta} \sum_{k=s}^{t-1} \left\{ \frac{2^{\lambda}(\lambda+1)\, G_{2(k)}^{c}}{(2^{\lambda}-1)(\omega_k(\lambda))^2} \left( \left(G_{1(k)}^{c}\right)^{\lambda} - \left(G_{2(k)}^{c}\right)^{\lambda} \right) - \frac{1}{\omega_k(\lambda)} \right\} + \frac{(t-s)\, \tau_{MH}^{(H)}(\lambda)}{\Delta}.$$
Similarly, consider $p_{st}$ ($s > t$; $s, t = 1, \ldots, r$), which enters $G_{2(k)}$ exactly for $k = t, \ldots, s-1$. Noting that the derivative of $\omega_k(\lambda)$ for these $k$ is
$$\frac{\partial \omega_k(\lambda)}{\partial p_{st}} = \frac{2^{\lambda}(\lambda+1)\, G_{1(k)}^{c}}{(2^{\lambda}-1)\left(G_{1(k)} + G_{2(k)}\right)} \left( \left(G_{2(k)}^{c}\right)^{\lambda} - \left(G_{1(k)}^{c}\right)^{\lambda} \right),$$
the derivative of $\tau_{MH}^{(H)}(\lambda)$ is
$$\frac{\partial \tau_{MH}^{(H)}(\lambda)}{\partial p_{st}} = \frac{\left(\tau_{MH}^{(H)}(\lambda)\right)^2}{\Delta} \sum_{k=t}^{s-1} \left\{ \frac{2^{\lambda}(\lambda+1)\, G_{1(k)}^{c}}{(2^{\lambda}-1)(\omega_k(\lambda))^2} \left( \left(G_{2(k)}^{c}\right)^{\lambda} - \left(G_{1(k)}^{c}\right)^{\lambda} \right) - \frac{1}{\omega_k(\lambda)} \right\} + \frac{(s-t)\, \tau_{MH}^{(H)}(\lambda)}{\Delta}.$$
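The same kind of check applies to $\beta_{ij}(\lambda)$. In this sketch (hypothetical names, $\lambda \neq 0$, and a hypothetical table with $G_{1(k)} \neq G_{2(k)}$ for all $k$), 0-based indices are used, so the cell $(i, j)$ with $i < j$ enters $G_{1(k)}$ for $k = i, \ldots, j-1$, which is the slice `b12[i:j]`; diagonal cells get $\beta_{ii}(\lambda) = 0$.

```python
import numpy as np


def tau_parts(p, lam):
    """G^c_1(k), G^c_2(k), omega_k(lam), Delta and tau_MH^(H)(lam) for an unnormalized p."""
    r = p.shape[0]
    G1 = np.array([p[: k + 1, k + 1 :].sum() for k in range(r - 1)])
    G2 = np.array([p[k + 1 :, : k + 1].sum() for k in range(r - 1)])
    delta = np.sum(G1 + G2)
    g1, g2 = G1 / (G1 + G2), G2 / (G1 + G2)
    div = (1.0 - g1 ** (lam + 1) - g2 ** (lam + 1)) / lam
    omega = 1.0 - lam * 2.0 ** lam / (2.0 ** lam - 1.0) * div
    return g1, g2, omega, delta, 1.0 / np.sum((G1 + G2) / delta / omega)


def beta_closed_form(p, lam):
    """beta_ij(lam) with 0-based indices: k runs over i..j-1 (i < j) or j..i-1 (i > j)."""
    g1, g2, omega, delta, tau = tau_parts(p, lam)
    K = 2.0 ** lam * (lam + 1) / (2.0 ** lam - 1.0)
    b12 = K * g2 * (g1 ** lam - g2 ** lam) / omega ** 2 - 1.0 / omega
    b21 = K * g1 * (g2 ** lam - g1 ** lam) / omega ** 2 - 1.0 / omega
    r = p.shape[0]
    beta = np.zeros((r, r))  # diagonal cells do not enter any G_s(k)
    for i in range(r):
        for j in range(r):
            if i < j:
                beta[i, j] = tau ** 2 / delta * b12[i:j].sum() + (j - i) * tau / delta
            elif i > j:
                beta[i, j] = tau ** 2 / delta * b21[j:i].sum() + (i - j) * tau / delta
    return beta


def beta_finite_difference(p, lam, step=1e-6):
    """Central differences of tau^H with respect to each cell p_ij."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        e = np.zeros_like(p)
        e[idx] = step
        g[idx] = (tau_parts(p + e, lam)[4] - tau_parts(p - e, lam)[4]) / (2.0 * step)
    return g


counts = np.array([[3, 6, 4, 5], [1, 4, 5, 3], [1, 1, 5, 6], [1, 2, 1, 4]], dtype=float)
p = counts / counts.sum()
```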

References

1. Bowker, A.H. A test for symmetry in contingency tables. J. Am. Stat. Assoc. 1948, 43, 572–574.
2. Bishop, Y.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; The MIT Press: Cambridge, MA, USA, 1975.
3. Saigusa, Y.; Tahata, K.; Tomizawa, S. Measure of departure from partial symmetry for square contingency tables. J. Math. Stat. 2016, 12, 152–156.
4. Saigusa, Y.; Takami, M.; Ishii, A.; Tomizawa, S. Measure of departure from local symmetry for square contingency tables. Int. J. Stat. Probab. 2019, 8, 140–145.
5. Stuart, A. A test for homogeneity of the marginal distributions in a two-way classification. Biometrika 1955, 42, 412–416.
6. Saigusa, Y.; Kubo, Y.; Tahata, K.; Tomizawa, S. A measure of departure from partial marginal homogeneity for square contingency tables. J. Stat. Appl. Probab. Lett. 2020, 7, 1–7.
7. Caussinus, H. Contribution to correlation analysis of two qualitative variables. Ann. Fac. Des Sci. L’Univ. Toulouse 1965, 29, 77–182. (In French)
8. McCullagh, P. A class of parametric models for the analysis of square contingency tables with ordered categories. Biometrika 1978, 65, 413–418.
9. Goodman, L.A. Multiplicative models for square contingency tables with ordered categories. Biometrika 1979, 66, 413–418.
10. Agresti, A. A simple diagonals-parameter symmetry and quasi-symmetry model. Stat. Probab. Lett. 1983, 1, 313–316.
11. Saigusa, Y.; Takami, M.; Ishii, A.; Nakagawa, T.; Tomizawa, S. Measure for departure from cumulative partial symmetry square contingency tables with ordered categories. J. Stat. Adv. Theory Appl. 2019, 21, 53–70.
12. Saigusa, Y.; Takada, T.; Ishii, A.; Nakagawa, T.; Tomizawa, S. Measure of departure from cumulative local symmetry for square contingency tables having ordered categories. Biom. Lett. 2020, 57, 23–35.
13. Nakagawa, T.; Takei, T.; Ishii, A.; Tomizawa, S. Geometric mean type measure of marginal homogeneity for square contingency tables with ordered categories. J. Math. Stat. 2020, 16, 170–175.
14. Bhapkar, V.P. A note on the equivalence of two test criteria for hypotheses in categorical data. J. Am. Stat. Assoc. 1966, 61, 228–235.
15. Fleiss, J.L.; Everitt, B.S. Comparing the marginal totals of square contingency tables. Br. J. Math. Stat. Psychol. 1971, 24, 117–123.
16. Agresti, A. Testing marginal homogeneity for ordinal categorical variables. Biometrics 1983, 39, 505–510.
17. Tomizawa, S.; Miyamoto, N.; Ashihara, N. Measure of departure from marginal homogeneity for square contingency tables having ordered categories. Behaviormetrika 2003, 30, 173–193.
18. Yule, G.U. On the association of attributes in statistics. Philos. Trans. R. Soc. London. Ser. A Contain. Pap. Math. Phys. Character 1900, 194, 257–319.
19. Yule, G.U. On the methods of measuring association between two attributes. J. R. Stat. Soc. 1912, 75, 579–652.
20. Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946.
21. Goodman, L.A.; Kruskal, W.H. Measures of association for cross classifications. J. Am. Stat. Assoc. 1954, 49, 732–764.
22. Tomizawa, S.; Seo, T.; Yamamoto, H. Power-divergence-type measure of departure from symmetry for square contingency tables that have nominal categories. J. Appl. Stat. 1998, 25, 387–398.
23. Patil, G.P.; Taillie, C. Diversity as a concept and its measurement. J. Am. Stat. Assoc. 1982, 77, 548–561.
24. Tomizawa, S.; Miyamoto, N.; Hatanaka, Y. Measure of asymmetry for square contingency tables having ordered categories. Aust. N. Z. J. Stat. 2001, 43, 335–349.
25. Tomizawa, S.; Makii, T. Generalized measures of departure from marginal homogeneity for contingency tables with nominal categories. J. Stat. Res. 2001, 35, 1–24.
  26. Altun, G.; Aktaş, S. Measures of departure from marginal homogeneity model in square contingency tables. İstat. Derg. İstat. Aktüerya 2018, 11, 93–108. [Google Scholar]
  27. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850. [Google Scholar] [CrossRef]
  28. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
  29. Cressie, N.; Read, T.R. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B (Methodol.) 1984, 46, 440–464. [Google Scholar] [CrossRef]
  30. Read, T.R.; Cressie, N.A. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer Science and Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  31. Agresti, A. Categorical Data Analysis, 2nd ed.; Wiley: New York, NY, USA, 2002. [Google Scholar]
  32. Upton, G.J.G. A memory model for voting transitions in British elections. J. R. Stat. Soc. Ser. A (Gen.) 1977, 140, 86–94. [Google Scholar] [CrossRef]
  33. Tominaga, K. Nippon no Kaisou Kouzou (Japanese Hierarchical Structure); University of Tokyo Press: Tokyo, Japan, 1979. (In Japanese) [Google Scholar]
Table 1. Artificial data.
(a)
       (1)   (2)   (3)   (4)   Total
(1)    0.12  0.09  0.07  0.02  0.30
(2)    0.08  0.09  0.12  0.02  0.31
(3)    0.06  0.03  0.06  0.05  0.20
(4)    0.04  0.01  0.08  0.06  0.19
Total  0.30  0.22  0.33  0.15  1.00

(b)
       (1)   (2)   (3)   (4)   Total
(1)    0.16  0.12  0.05  0.03  0.36
(2)    0.02  0.10  0.03  0.02  0.17
(3)    0.04  0.01  0.14  0.02  0.21
(4)    0.04  0.10  0.00  0.12  0.26
Total  0.26  0.33  0.22  0.19  1.00

(c)
       (1)   (2)   (3)   (4)   Total
(1)    0.12  0.10  0.00  0.04  0.26
(2)    0.02  0.10  0.03  0.02  0.17
(3)    0.02  0.01  0.14  0.04  0.21
(4)    0.03  0.12  0.05  0.16  0.36
Total  0.19  0.33  0.22  0.26  1.00

(d)
       (1)   (2)   (3)   (4)   Total
(1)    0.02  0.09  0.12  0.04  0.27
(2)    0.02  0.03  0.03  0.02  0.10
(3)    0.02  0.01  0.08  0.04  0.15
(4)    0.03  0.17  0.22  0.06  0.48
Total  0.09  0.30  0.45  0.16  1.00

(e)
       (1)   (2)   (3)   (4)   Total
(1)    0.00  0.20  0.00  0.10  0.30
(2)    0.00  0.00  0.30  0.05  0.35
(3)    0.00  0.00  0.00  0.35  0.35
(4)    0.00  0.00  0.00  0.00  0.00
Total  0.00  0.20  0.30  0.50  1.00

(f)
       (1)   (2)   (3)   (4)   Total
(1)    0.00  0.20  0.00  0.45  0.65
(2)    0.00  0.00  0.00  0.00  0.00
(3)    0.00  0.05  0.00  0.30  0.35
(4)    0.00  0.00  0.00  0.00  0.00
Total  0.00  0.25  0.00  0.75  1.00
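Table 1 lists only cell probabilities and totals, so a short script can make the marginal structure of each artificial table explicit. The Python sketch below is a descriptive check only, not the authors' measures ψ_MH and τ_MH (which are not reproduced here). It recomputes the row marginals p_i· and column marginals p_·i, their differences p_i· − p_·i, and the differences of the cumulative marginals for i = 1, 2, 3, then lists the categories at which a symmetric pair coincides. The dictionary TABLES and the helper names are hypothetical, and rounding to two decimals is an assumption made only to suppress floating-point noise.

```python
from itertools import accumulate

# Cell probabilities of the six artificial 4x4 tables in Table 1, transcribed
# row by row (entry [i][j] is row category (i+1), column category (j+1)).
TABLES = {
    "a": [[0.12, 0.09, 0.07, 0.02], [0.08, 0.09, 0.12, 0.02],
          [0.06, 0.03, 0.06, 0.05], [0.04, 0.01, 0.08, 0.06]],
    "b": [[0.16, 0.12, 0.05, 0.03], [0.02, 0.10, 0.03, 0.02],
          [0.04, 0.01, 0.14, 0.02], [0.04, 0.10, 0.00, 0.12]],
    "c": [[0.12, 0.10, 0.00, 0.04], [0.02, 0.10, 0.03, 0.02],
          [0.02, 0.01, 0.14, 0.04], [0.03, 0.12, 0.05, 0.16]],
    "d": [[0.02, 0.09, 0.12, 0.04], [0.02, 0.03, 0.03, 0.02],
          [0.02, 0.01, 0.08, 0.04], [0.03, 0.17, 0.22, 0.06]],
    "e": [[0.00, 0.20, 0.00, 0.10], [0.00, 0.00, 0.30, 0.05],
          [0.00, 0.00, 0.00, 0.35], [0.00, 0.00, 0.00, 0.00]],
    "f": [[0.00, 0.20, 0.00, 0.45], [0.00, 0.00, 0.00, 0.00],
          [0.00, 0.05, 0.00, 0.30], [0.00, 0.00, 0.00, 0.00]],
}


def marginals(p):
    """Return (row marginals p_i., column marginals p_.i)."""
    rows = [sum(r) for r in p]
    cols = [sum(c) for c in zip(*p)]
    return rows, cols


def marginal_gaps(p, ndigits=2):
    """p_i. - p_.i for every category i; rounding removes float noise."""
    rows, cols = marginals(p)
    return [round(a - b, ndigits) for a, b in zip(rows, cols)]


def cumulative_gaps(p, ndigits=2):
    """Cumulative row minus cumulative column marginal, for i = 1..r-1."""
    rows, cols = marginals(p)
    f1, f2 = list(accumulate(rows)), list(accumulate(cols))
    # The last cumulative sums are both 1, so they are dropped.
    return [round(a - b, ndigits) for a, b in zip(f1[:-1], f2[:-1])]


def zero_positions(gaps):
    """1-based categories at which the two (cumulative) marginals coincide."""
    return [i for i, g in enumerate(gaps, start=1) if g == 0]


if __name__ == "__main__":
    for name, p in TABLES.items():
        m, c = marginal_gaps(p), cumulative_gaps(p)
        print(f"({name}) marginal gaps {m} equal at {zero_positions(m)}; "
              f"cumulative gaps {c} equal at {zero_positions(c)}")
```

Run as a script, it prints one line per table. Among the six tables, only (a) has a coinciding pair (category 1, for both the marginals and the cumulative marginals), which is in line with the 0.000 values of the G- and H-type measures for table (a) in Table 2.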
Table 2. Values of six measures related to various marginal homogeneity models, applied to the artificial data in Table 1.
(a) Measures of nominal categories
                      Applied tables
Measure         λ     (a)    (b)    (c)    (d)    (e)    (f)
ψ̂_MH^(A)(λ)    0.00  0.019  0.029  0.029  0.189  0.416  1.000
                0.50  0.024  0.036  0.036  0.230  0.420  1.000
                1.00  0.026  0.039  0.039  0.250  0.422  1.000
ψ̂_MH^(G)(λ)    0.00  0.000  0.011  0.011  0.189  0.076  1.000
                0.50  0.000  0.014  0.014  0.230  0.087  1.000
                1.00  0.000  0.016  0.016  0.250  0.092  1.000
ψ̂_MH^(H)(λ)    0.00  0.000  0.002  0.002  0.189  0.012  1.000
                0.50  0.000  0.002  0.002  0.230  0.015  1.000
                1.00  0.000  0.002  0.002  0.250  0.017  1.000

(b) Measures of ordered categories
                      Applied tables
Measure         λ     (a)    (b)    (c)    (d)    (e)    (f)
τ̂_MH^(A)(λ)    0.00  0.022  0.060  0.082  0.180  1.000  0.877
                0.50  0.028  0.075  0.101  0.217  1.000  0.897
                1.00  0.031  0.082  0.111  0.234  1.000  0.905
τ̂_MH^(G)(λ)    0.00  0.000  0.052  0.082  0.046  1.000  0.847
                0.50  0.000  0.065  0.101  0.056  1.000  0.878
                1.00  0.000  0.071  0.111  0.060  1.000  0.889
τ̂_MH^(H)(λ)    0.00  0.000  0.044  0.082  0.004  1.000  0.811
                0.50  0.000  0.055  0.101  0.005  1.000  0.855
                1.00  0.000  0.061  0.111  0.006  1.000  0.871
Table 3. Simulation results for LMH and CLMH.
(a) Results for LMH
        Sample size
λ       200    500    1000
−0.5    0.941  0.955  0.949
0.0     0.939  0.929  0.897
0.5     0.874  0.890  0.918
1.0     0.949  0.941  0.965
1.5     0.940  0.956  0.910
2.0     0.962  0.940  0.951
2.5     0.939  0.851  0.923
3.0     0.934  0.948  0.943

(b) Results for CLMH
        Sample size
λ       200    500    1000
−0.5    0.874  0.885  0.940
0.0     0.946  0.951  0.954
0.5     0.906  0.948  0.885
1.0     0.942  0.940  0.947
1.5     0.937  0.950  0.952
2.0     0.934  0.962  0.917
2.5     0.939  0.934  0.948
3.0     0.936  0.927  0.875
Table 4. The estimated measures, estimated approximate standard errors, and approximate 95% confidence intervals for ψ_MH^(H)(λ), applied to voting changes in the 1964, 1966, and 1970 British elections; taken from Upton [32].
(a) Result of voting changes between the 1966 and 1964 British elections
λ      Estimated measure  Standard error  Confidence interval
−0.5   0.0000             0.0005          (−0.0009, 0.0010)
0.0    0.0001             0.0008          (−0.0015, 0.0016)
0.5    0.0001             0.0010          (−0.0019, 0.0021)
1.0    0.0001             0.0011          (−0.0021, 0.0023)
1.5    0.0001             0.0011          (−0.0021, 0.0023)
2.0    0.0001             0.0011          (−0.0021, 0.0023)
2.5    0.0001             0.0010          (−0.0019, 0.0021)
3.0    0.0001             0.0009          (−0.0018, 0.0020)

(b) Result of voting changes between the 1966 and 1970 British elections
λ      Estimated measure  Standard error  Confidence interval
−0.5   0.0079             0.0033          (0.0014, 0.0144)
0.0    0.0133             0.0056          (0.0024, 0.0243)
0.5    0.0167             0.0070          (0.0030, 0.0304)
1.0    0.0184             0.0077          (0.0033, 0.0335)
1.5    0.0188             0.0079          (0.0034, 0.0343)
2.0    0.0184             0.0077          (0.0033, 0.0335)
2.5    0.0173             0.0072          (0.0031, 0.0315)
3.0    0.0158             0.0066          (0.0028, 0.0288)
Table 5. The estimated measures, estimated approximate standard errors, and approximate 95% confidence intervals for τ_MH^(H)(λ), applied to cross-classifications of the occupational statuses of Japanese fathers and sons in 1955 and 1975 (Tominaga [33]).
(a) Result in 1955
λ      Estimated measure  Standard error  Confidence interval
−0.5   0.0032             0.0094          (−0.0151, 0.0216)
0.0    0.0055             0.0158          (−0.0255, 0.0364)
0.5    0.0068             0.0198          (−0.0319, 0.0456)
1.0    0.0076             0.0218          (−0.0352, 0.0504)
1.5    0.0078             0.0224          (−0.0361, 0.0516)
2.0    0.0076             0.0218          (−0.0352, 0.0504)
2.5    0.0071             0.0205          (−0.0331, 0.0474)
3.0    0.0065             0.0188          (−0.0303, 0.0433)

(b) Result in 1975
λ      Estimated measure  Standard error  Confidence interval
−0.5   0.0713             0.0196          (0.0328, 0.1098)
0.0    0.1172             0.0314          (0.0556, 0.1788)
0.5    0.1443             0.0379          (0.0700, 0.2187)
1.0    0.1576             0.0410          (0.0773, 0.2379)
1.5    0.1611             0.0417          (0.0793, 0.2428)
2.0    0.1576             0.0410          (0.0773, 0.2379)
2.5    0.1495             0.0392          (0.0726, 0.2265)
3.0    0.1385             0.0369          (0.0662, 0.2109)
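The captions of Tables 4 and 5 call the intervals approximate 95% confidence intervals. Up to four-decimal rounding, the reported limits appear consistent with a normal-approximation (Wald-type) interval, estimate ± 1.96 × standard error; this is an inference from the table values, not a description of the authors' computation. A minimal Python sketch, assuming that form, recomputes the λ = 1.0 row of each panel. The function name wald_interval and the ROWS dictionary are hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, se, level=0.95):
    """Return (lower, upper) = estimate -/+ z * se, z = normal (1 + level)/2 quantile.

    Assumption: the tables use this normal-approximation form.
    """
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.959964 when level = 0.95
    return estimate - z * se, estimate + z * se


# (estimated measure, standard error, reported interval) at lambda = 1.0,
# copied from panels (a) and (b) of Tables 4 and 5.
ROWS = {
    "Table 4(a)": (0.0001, 0.0011, (-0.0021, 0.0023)),
    "Table 4(b)": (0.0184, 0.0077, (0.0033, 0.0335)),
    "Table 5(a)": (0.0076, 0.0218, (-0.0352, 0.0504)),
    "Table 5(b)": (0.1576, 0.0410, (0.0773, 0.2379)),
}

if __name__ == "__main__":
    for label, (est, se, reported) in ROWS.items():
        lower, upper = wald_interval(est, se)
        print(f"{label}: recomputed ({lower:.4f}, {upper:.4f}), "
              f"reported {reported}, contains 0: {lower <= 0 <= upper}")
```

Recomputed limits can differ from the reported ones in the fourth decimal because the inputs are already rounded. Across all λ, the panel (a) intervals of both tables contain 0, while the panel (b) intervals do not.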

MDPI and ACS Style

Saito, K.; Takakubo, N.; Ishii, A.; Nakagawa, T.; Tomizawa, S. Measures of Departure from Local Marginal Homogeneity for Square Contingency Tables. Symmetry 2022, 14, 1075. https://doi.org/10.3390/sym14061075

AMA Style

Saito K, Takakubo N, Ishii A, Nakagawa T, Tomizawa S. Measures of Departure from Local Marginal Homogeneity for Square Contingency Tables. Symmetry. 2022; 14(6):1075. https://doi.org/10.3390/sym14061075

Chicago/Turabian Style

Saito, Ken, Nozomi Takakubo, Aki Ishii, Tomoyuki Nakagawa, and Sadao Tomizawa. 2022. "Measures of Departure from Local Marginal Homogeneity for Square Contingency Tables" Symmetry 14, no. 6: 1075. https://doi.org/10.3390/sym14061075

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
