Open AccessArticle

Constrained Inference When the Sampled and Target Populations Differ

Huijun Yi

^1,† and

Bhaskar Bhattacharya

^2,*,†

Department of Mathematics, Troy University, Troy, AL 36082, USA

Department of Mathematics, Southern Illinois University, Carbondale, IL 62901, USA

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Entropy 2016, 18(3), 97; https://doi.org/10.3390/e18030097

Submission received: 9 November 2015 / Revised: 23 February 2016 / Accepted: 7 March 2016 / Published: 16 March 2016

(This article belongs to the Section Information Theory, Probability and Statistics)

Download

Browse Figures

Versions Notes

Abstract

In the analysis of contingency tables, often one faces two difficult criteria: sampled and target populations are not identical and prior information translates to the presence of general linear inequality restrictions. Under these situations, we present new models of estimating cell probabilities related to four well-known methods of estimation. We prove that each model yields maximum likelihood estimators under those restrictions. The performance ranking of these methods under equality restrictions is known. We compare these methods under inequality restrictions in a simulation study. It reveals that these methods may rank differently under inequality restriction than with equality. These four methods are also compared while US census data are analyzed.

Keywords:

categorical data; inequalities; least square; maximum likelihood; minimum chi-squared; raking

1. Introduction

When working with a sample contingency table, a researcher might need to adjust it based on information available from other sources. This information might come from prior surveys, censuses, established theories or other sources. Often it comes as marginal information such as row and/or column totals. For example, consider a data set where each subject is cross-classified by income (low/high) and urbanity (urban/rural), and, marginal information about income and urbanity is available from a census. One would like to adjust the sample data to conform to the desired margins from census.

For two-way contingency tables of size (

I \times J

), four well-known [1,2] margin-adjusting methods for estimating cell probabilities are raking (RAKE), least squares (LSQ), minimum chi-squared (MCSQ) and maximum likelihood under random sampling (MLRS). Assume that a random sample

{n_{i j}, 1 \leq i \leq I, 1 \leq j \leq J}

is available from a multinomial

(n, π)

probability distribution, where

n = \sum_{i, j} n_{i j}, π = (π_{i j}, \forall i, j)

. Let

p_{i j} = \frac{n_{i j}}{n}

denote the sample cell proportions. Then RAKE finds the estimates

{{\hat{π}}_{i j}^{R K}}

that minimize the discrimination information,

\sum_{i = 1}^{I} \sum_{j = 1}^{J} {\hat{π}}_{i j} ln (\frac{{\hat{π}}_{i j}}{p_{i j}}),

under the marginal constraints

\sum_{j} {\hat{π}}_{i j} = π_{i +}, \sum_{i} {\hat{π}}_{i j} = π_{+ j}, i = 1, \dots, I, j = 1, \dots, J,

(1)

where

{\hat{π}}_{i j}

denotes the estimators of target cell probabilities

π_{i j}

\forall i, j

π_{i +}, π_{+ j}

are known,

\forall i, j

Under the same constraints (1), the methods LSQ, MCSQ, MLRS find the estimates

{{\hat{π}}_{i j}^{L S Q}}, {{\hat{π}}_{i j}^{M C S Q}}, {{\hat{π}}_{i j}^{M L}}

that minimize

\sum_{i = 1}^{I} \sum_{j = 1}^{J} \frac{{(p_{i j} - {\hat{π}}_{i j})}^{2}}{p_{i j}}, \sum_{i = 1}^{I} \sum_{j = 1}^{J} \frac{{({\hat{π}}_{i j} - p_{i j})}^{2}}{{\hat{π}}_{i j}}, - \sum_{i = 1}^{I} \sum_{j = 1}^{J} p_{i j} ln ({\hat{π}}_{i j}),

respectively.

Instead of given marginal totals, one might like to use restrictions of a more general nature. Consider the survey data [3] from the second National Health and Nutrition Examination Survey (NHANES II).

Table 1a shows the sample proportions and corresponding census proportions of

2 \times 2

contingency tables of income by urbanity, and Table 1b shows the sample proportions and corresponding census proportions of

2 \times 2

contingency tables of education by urbanity. We observe differences in the census and sample values, possibly due to differences in target and sampled populations. For example, in Table 1a census data, the magnitude of row totals (

0.3191 < 0.6809

) is different from that of the sample data (

0.5260 > 0.4740

). Similarly, in Table 1b census data, the off-diagonal entries satisfy an order relation (

0.2625 > 0.2107

), but, in samples, the relation goes in the opposite direction (

0.2360 < 0.2682

). If such constraints are known a priori (e.g., from census or other sources), then it is wiser to incorporate them into the analysis while adjusting the sample data.

Much prior work (e.g., [2]) assumed that random samples were directly taken from the

target population

with

known

row and column margins (

π_{i +}, π_{+ j}

respectively). However, in practice, there are situations in which a random sample from the target population is inaccessible. For example, often sample units are too expensive to locate or unwilling to participate in the survey. In this case, to estimate the target cell probabilities, we have to take a random sample from a

sampled population

that is systematically different from the target population. Clearly, the resulting estimators are typically biased. Researchers in [3] have studied such discrepancies under marginal row and column constraints. A similar problem in a regression context can be found in [4].

It is well-known that all four margin-adjusting methods are asymptotically equivalent under simple random sampling. However, their small sample results can be different. Using simulation methods, [5] found that MCSQ is best, followed by MLRS, RAKE and LSQ, in order of performance in average root mean squared error. However, for margin adjusting, [3] found that both RAKE and MLRS dominate MCSQ; and LSQ is inferior to all three methods when the sampled population is systematically different from the target population. In this paper, we consider general linear constraints (not necessarily marginal) under inequality restrictions and study the performance of those four methods. For simulation (Section 4), we have restricted our attention to

(2 \times 2)

tables to facilitate comparison with Little and Wu [3].

2. Solutions from Each Method

First, we vectorize the

I \times J

contingency table of probabilities lexicographically, say,

π = (π_{i j})

denote the

I J \times 1

target population probability vector. Thus, the pair

(i, j) = t

, for some

t, 1 \leq t \leq I J

. We assume that the available knowledge of the population can be expressed as r constraints as

A^{T} π \leq c,

(2)

where

A = (a_{i j})

denotes an

(I J \times r), r \leq I + J - 1

, matrix of constants with rank

(A) = r

c = (c_{i})

denotes the

(r \times 1)

corresponding known values vector.

First, we set each of these four methods as an optimization problem. The objective function

f (π)

takes the form

\sum_{t = 1}^{I J} π_{t} ln (\frac{π_{t}}{p_{t}})

for RAKE,

\sum_{t = 1}^{I J} \frac{{(π_{t} - p_{t})}^{2}}{p_{t}}

for LSQ,

\sum_{t = 1}^{I J} \frac{{(π_{t} - p_{t})}^{2}}{π_{t}}

for MCSQ and

- \sum_{t = 1}^{I J} p_{t} ln π_{t}

for MLRS, respectively. We seek to minimize the convex function

f (π)

over a region ℓ defined as

ℓ = {π ∣ A^{T} π \leq c, π_{t} > 0, \forall t, \sum_{t} π_{t} = 1},

(3)

and this is known as the

primal

problem. The Kuhn–Tucker method [6] identifies an equivalent

dual

problem that could be substantially easier to solve than the primal problem (for larger

I, J

The

Lagrangian

of the problem is defined by

L (λ, π) = f (π) + λ^{T} (A^{T} π - c), i f λ_{i} \geq 0 f o r a l l i, π \in ℓ,

and,

L (λ, π) = - \infty,

λ_{i} < 0

for some

i, π \in ℓ,

L (λ, π) = + \infty,

π \notin ℓ

, where

λ = (λ_{1}, \dots, λ_{r})

are called Lagrange multipliers.

Consider maximization in

λ

and minimization in

π

L (λ, π)

. Suppose there exists

(λ^{*}, π^{*})

for which

L (λ, π^{*}) \leq L (λ^{*}, π^{*}) \leq L (λ^{*}, π), \forall λ, π,

if and only if

\begin{matrix} \frac{\partial L}{\partial π_{t}} = \sum_{s} [\frac{\partial f (π_{s})}{\partial π_{s}} + \sum_{i = 1}^{r} λ_{i} (a_{i t} - c_{i})] = 0, f o r t = 1, 2, \dots, I J, a t (λ^{*}, π^{*}), \\ λ_{i}^{*} (\sum_{t} a_{i t} π_{t}^{*} - c_{i}) = 0, 1 \leq i \leq r, A^{T} π^{*} - C \leq 0, λ_{i}^{*} \geq 0, 1 \leq i \leq r, \end{matrix}

(4)

then

λ^{*}

is a Kuhn–Tucker vector and

π^{*}

is an optimal solution of the primal problem, and

L (λ^{*}, π^{*})

is the optimal value of the

L (λ, π)

More generally,

λ^{*}

is a Kuhn–Tucker vector if and only if

- \infty < {inf}_{π} L (λ^{*}, π) = {inf}_{π} {sup}_{λ} L (λ, π) = {sup}_{λ} {inf}_{π} L (λ, π) .

The dual problem is given by

{sup}_{λ} g (λ)

, where the function g is defined by

g (λ) = {inf}_{π} L (λ, π)

[6]. Often, the dual problem has a nice form, and

λ^{*}

can be found by numerical methods. Then, one can use the relation (4) to find the solution

π^{*}

to the primal problem.

3. Models Relating the Sampled and Target Populations

Suppose a random sample of size n is taken from the sampled population. For the

(i, j)

th cell, let

π_{i j}, τ_{i j}

be the target and sampled probabilities, respectively. Consider the RAKE model in Equation (5), below which it specifies how the sampled and target populations are connected. Theorem 1 shows that the solution to the model in (5) are the maximum likelihood (ML) estimators under the RAKE model.

Theorem 1. Suppose the probabilities of target and sampled populations are related by

\begin{matrix} ln (π_{t}^{*} / τ_{t}) = \sum_{i = 1}^{r} λ_{i}^{*} (\sum_{t} a_{i t} - c_{i}) \\ A^{T} π^{*} - c \leq 0, λ_{i}^{*} (\sum_{t} a_{i t} π_{t}^{*} - c_{i}) = 0, λ_{i}^{*} \geq 0, 1 \leq i \leq r . \end{matrix}

(5)

Then

π_{t}^{*}

, given by (5), are the maximum likelihood estimates of the cell probabilities

π_{t}

in the target population.

Proof. Consider the (primal) raking problem of minimizing

\sum_{t = 1}^{I J} π_{t} ln (\frac{π_{t}}{τ_{t}})

subject to

A^{T} π - c \leq 0

. Using an example from (p. 309, [7]), the Lagrangian for this problem is given by

L (λ, π) = \sum_{t = 1}^{I J} π_{t} ln (\frac{π_{t}}{τ_{t}}) + \sum_{i = 1}^{r} λ_{i} (\sum_{t} a_{i t} π_{t} - c_{i}),

(6)

λ_{i} \geq 0

for all i,

π \in ℓ

L (λ, π) = - \infty

λ_{i} < 0

for some i,

π \in ℓ

L (λ, π) = + \infty

, if

π \notin ℓ

To find the dual problem, define

\begin{matrix} g (λ) = inf_{π} L (λ, π) & = & inf_{π} \sum_{t = 1}^{I J} π_{t} ln [\frac{π_{t}}{τ_{t} exp [- \sum_{i = 1}^{r} λ_{i} (a_{i t} - c_{i})]}], \\ = - ln [\sum_{t = 1}^{I J} τ_{t} exp [- \sum_{i = 1}^{r} λ_{i} (a_{i t} - c_{i})]], \end{matrix}

(7)

λ_{i} \geq 0, \forall i

= - \infty

, if

λ_{i} < 0

for some i. This follows easily by Jensen’s inequality. The dual problem is

{sup}_{λ \geq 0} g (λ)

- {inf}_{λ \geq 0} \sum_{t = 1}^{I J} τ_{t} exp [- \sum_{i = 1}^{r} λ_{i} (a_{i t} - c_{i})],

from (7).

The dual problem is solved numerically (e.g., using the Newton–Raphson method), and we obtain the dual solutions

λ = λ^{*}

as functions of

τ_{t}

. Differentiating L in (6) with respect to each

π_{t}

and setting equal to zero, the primal solutions of

({\hat{π}}_{t})

are given by

{\hat{π}}_{t} = {\hat{τ}}_{t} exp [\sum_{i = 1}^{r} λ_{i}^{*} (\sum_{t = 1}^{I J} a_{i t} - c_{i})] .

Assuming the counts

n_{t}

follow a multinomial

(n, τ_{t}, \forall t)

distribution, the MLE of

τ_{t}

is given by

{\hat{τ}}_{t} = p_{t} = n_{t} / n

. Since

{\hat{λ}}_{i}^{*}

are functions of

p_{t}

, hence

{\hat{π}}_{t}

are MLEs. Thus, raking yields ML estimates for the RAKE model (5). ☐

In general, consider the model

\begin{matrix} {(π_{t}^{*} / τ_{t})}^{α} = \sum_{i = 1} λ_{i} (\sum_{t = 1}^{I J} a_{i t} - c_{i}) \\ A^{T} π^{*} - c \leq 0, λ_{i}^{*} (\sum_{t} a_{i t} π_{t}^{*} - c_{i}) = 0, λ_{i}^{*} \geq 0, 1 \leq i \leq r . \end{matrix}

(8)

By using similar arguments as above, LSQ is ML for the LSQ model obtained by setting

α = 1

, MLRS is ML for the MLRS model obtained by setting

α = - 1

, and MCSQ is ML for the MCSQ model obtained by setting

α = - 2

in (8). Of course,

α \to 0

in (8) corresponds to RAKE.

Theorem 3.1 shows that for any model α, (8) yields MLEs of

π_{t}

for that model. If

π_{t}

is generated from a method different from α in (8), the solution is still available, but it is not MLE under (8). Hence, it is of interest how these four different methods stack up against each other (as MLE versus not MLE) in a given situation. To address this issue, a simulation study is conducted in the next section.

4. A Simulation Study

We performed a simulation study to compare the methods in a systematic way. We restrict our attention to

(2 \times 2)

tables so that comparison with equality [3] is facilitated. In contrast to margin-adjusting methods (e.g., [3]) where only one parameter, e.g.

π_{11}

, is enough to consider, for inequality constraints one needs to consider all

π_{i j}, \forall i, j

. In this simulation, we have saught solution of the primal problem itself because the table dimensions (

2 \times 2

) are the smallest, and duality approach does not help much to reduce the necessary computation load.

We have considered two types of inequality restrictions in the simulation: isotonic and nonisotonic (see [7] for definitions). For each of the 16 designs described below, sample sizes

n = 30, 100, 1000

are considered. Thus, in each of

16 \times 3 = 48

cases, for a given

π

as the target population vector, we vary

λ

and find

τ

using (8). Then, we take multinomial random samples from this

τ

and calculate p. This process is repeated 200 times for each of 48 cases.

For isotonic constraints, we use a tree order as:

π_{11} \leq {π_{12}, π_{21}}

. The initial choices are

π = (π_{11}, π_{12}, π_{21}, π_{22}) = (. 232, . 232, . 232, . 304)

(0.231, 0.303, 0.264, 0.202); λ = (λ_{1}, λ_{2}) = (0.5, 0.5), (0, 0.5), (0.1, 0.1), (0.1, 0.5) .

[The results from

λ = (0.5, 0), (0.5, 0.1)

are not reported because performancewise

(0, 0.5) \approx (0.5, 0), (0.5, 0.1) \approx (0.1, 0.5)

For isotonic constraints, closed-form solutions (

π^{*}

) are available for all four methods as follows. The LSQ under tree order is calculated using the algorithm on page 19 of [7], and, MLRS = LSQ. The RAKE and MCSQ values are given by least square projections of

ln p

and

p^{2}

on to the constraints of interest, and then applying the inverse of those transformations (see pages 240 and 278 of [7], respectively).

For nonisotonic constraints, we consider the constraints:

π_{11} + π_{12} \leq c_{1}

π_{11} + π_{21} \leq c_{2}

, where

(c_{1}, c_{2}) = (0.4, 0.6)

or (0.6,0.7). Here, we use

π = (0.184, 0.216, 0.416, 0.184)

(0.387, 0.213, 0.313, 0.087)

with

λ = (λ_{1}, λ_{2}) = (0.5, 0.5), (0, 0.5), (0.1, 0.1), (0.1, 0.5)

With given

λ

and the target probabilities

π_{i j}

, first we determine the sample probabilities

τ_{i j}

using NEQNF of IMSL libraries of Fortran (version 7, Rogue Wave Software, Inc., Louisville, CO, USA). Then, a multinomial random sample of size n is taken from the sampled population by using the multinomial random number generator GGMTN in the IMSL subroutine library, and we calculate p.

Next,

π^{*}

is found for each of four methods. When there is no violation, no adjustment is needed. When there is a violation, the solution is found by using the subroutine LCONG of IMSL.

After we find the estimates

π^{*} = {π_{i}^{*}}

for either constraints, we calculate the root mean squared error of the estimates as RMSE =

\sum_{i = 1}^{4} {(π_{i} - π_{i}^{*})}^{2}

, where

π_{i}

is the true value of the target probability. To provide a more systematic comparison between these four methods, we compute a relative RMSE (RRMSE) defined as

R R M S E = 100 * ln (\frac{R M S E}{R M S E^{*}}),

(9)

where

R M S E^{*}

is the root mean squared error of the method that is ML under the model that generated the data, so

R M S E = R M S E^{*}

R R M S E = 0

for each model under its corresponding method.

Figure 1, Figure 2 and Figure 3 give visual comparisons of the methods under each model, for sample sizes

n = 30, 100, 1000

, respectively. For each figure the horizontal reference line with 0 RRMSE corresponds to the ML estimates under the model that was used to generate the data.

As mentoned earlier, for each sample size a total of 16 designs are considered, 1–8 are nonisotonic and 9–16 are isotonic; these are listed below. These designs are so numbered on the horizontal axis of each of the Figure 1, Figure 2 and Figure 3.

1. Nonisotonic;

π_{11} + π_{12} \leq 0.4, π_{11} + π_{21} \leq 0.4, π = (0.184, 0.216, 0.416, 0.184)

with

λ = (0.5, 0.5)

; 2. ...

λ = (0, 0.5)

; 3. ...

λ = (0.1, 0.1)

; 4. ...

λ = (0.1, 0.5)

; 5. Nonisotonic;

π_{11} + π_{12} \leq 0.6, π_{11} + π_{21} \leq 0.7, π = (0.387, 0.213, 0.313, 0.087)

with

λ = (0.5, 0.5)

; 6. ...

λ = (0, 0.5)

; 7. ...

λ = (0.1, 0.1)

; 8. ...

λ = (0.1, 0.5)

; 9. Isotonic;

π_{11} \leq {π_{12}, π_{21}}, π = (0.232, 0.232, 0.232, 0.304)

with

λ = (0.5, 0.5)

; 10. ...

λ = (0, 0.5)

; 11. ...

λ = (0.1, 0.1)

; 12. ...

λ = (0.1, 0.5)

; 13. Isotonic;

π_{11} \leq {π_{12}, π_{21}}, π = (0.231, 0.303, 0.264, 0.202)

with

λ = (0.5, 0.5)

; 14. ...

λ = (0, 0.5)

; 15. ...

λ = (0.1, 0.1)

; 16. ...

λ = (0.1, 0.5)

Overall RMSE of estimators. A crude comparison of the estimators is presented in Table 2, which gives the average RMSEs for each method over the 16 designs in each of isotonic and nonisotonic cases. Although the designs are different, this gives some illustration of the performance of the four methods. The RNDM values are obtained when the sample is taken directly from the target population. One would expect these values to be smaller than those that were generated from the sampled population, but we did not find that to be the case in our simulation study although they are pretty close.

When the target and sampled populations differ, one would expect that the method that is ML under the model that generated the data would have the lowest RMSE. For nonisotonic cases, RAKE satisfies this property; although MLRS does not satisfy this property, it follows RAKE closely. The RAKE estimates had the lowest RMSE under LSQ and MCSQ models as well. Thus, RAKE estimates seem to perform best, while MLRS follows RAKE very closely in each case. For the isotonic case, however, a different picture arises. Here, the LSQ estimates had the smallest RMSEs for the data generated under the RAKE model. Both LSQ and MCSQ estimates had the smallest RMSEs when the data were generated under the respective models. For MLRS, the MLRS estimates had slightly higher RMSEs than that of MCSQs.

Figure 1, Figure 2 and Figure 3 present RRMSEs for data generated under each of the four models for all 16 problems with n = 30, 100, 1000, respectively. To interpret them, first note that smaller values of

c_{1}, c_{2}

mean stronger constraints. In addition, a negative value of RRMSE reflects that bias from model misspecification is represented by lower variance than the method that is ML for the model that generated the data.

Certain reasonable patterns emerge from these figures; estimates based on the correct model dominate other methods when the sample size is large, or when the constraints are isotonic; here, the bias from model misspecification dominates RMSE. Results from nonisotonic constraints are more homogeneous. For them, LSQ turned out to be generally larger than MLRS.

Panel a of the figures summarizes results for the data generated under the RAKE model. For nonisotonic constraints, RAKE and MLRS performed similarly. For n = 30, 100, LSQ is slightly inferior to the other methods for the nonisotonic constraints with

(c_{1}, c_{2}) = (0.4, 0.6)

but is competitive when

(c_{1}, c_{2}) = (0.6, 0.7)

. RAKE seems to dominate (or close) and MCSQ performs worst (except when n = 30) of all nonisotonic constraints cases 1–8. RAKE performs slightly worse for isotonic constraints cases when n = 30, but is best again when n = 100, 1000.

Panel b of the figures summarizes results for data generated under the LSQ model. For all constraints with n 1000, LSQ and MLRS performed similarly. For n = 30, 100, LSQ is much inferior to MLRS for the nonisotonic constraints with

(c_{1}, c_{2}) = (0.4, 0.6)

but performs similarly when

(c_{1}, c_{2}) = (0.6, 0.7)

. MCSQ performs worst throughout, except for isotonic constraints with n = 30, when all three methods did better than RAKE, but this turned around when n = 100, 1000.

Panel c of the figures summarizes results for the data generated under the MCSQ model. Although for isotonic constraints LSQ = MLRS, for nonisotonic constraints, LSQ performed much worse than MLRS. The MCSQ values were close to the LSQ values for all constraints, except for isotonic constraints designs 9, 12 with n = 1000 when MCSQ is way off. Rake performed competitively with MLRS for nonisotonic cases. However, for isotonic constraints, RAKE was outperformed by other three methods for all n.

Panel d of the figures summarizes results for data generated under the MLRS model. Although for isotonic constraints MCSQ performed best for all n, with nonisotonic constraints, MCSQ is beaten by all other methods for n = 100, and by RAKE and MLRS when n = 30. LSQ performed much worse than MLRS for all nonisotonic cases. MLRS performed best for nonisotonic constraints and was close to best (MCSQ) for isotonic constraints, for all n.

5. Applying Four Methods to Real World Data

In this section, we illustrate the four methods studied in this paper using the data from [3] on the second National Health and Nutrition Examination Survey (NHANES ⨿). The data are presented in Table 1.

It is not hard to observe that in Table 1a the sample cell and marginal proportions differ considerably from the census (see Section 1). From the census values for the income case from Table 1a, we see

π_{11} + π_{12} = 0.3191

. Hence, it is reasonable to consider

π_{11} + π_{12} \leq 0.32

; observing similar other discrepancies, we consider the following three inequality restrictions:

\{\begin{matrix} π_{11} + π_{12} \leq 0.32 \\ π_{12} + π_{22} \leq 0.30 \\ π_{11} - 2 π_{12} \leq 0 \end{matrix},

(10)

and for the education case, we consider

\{\begin{matrix} π_{11} + π_{12} \leq . 67 \\ π_{11} + π_{21} \geq . 72 \end{matrix} .

(11)

For each problem, the estimates and RMSEs are computed and the results are displayed in Table 3. We consider the census proportions as the target probabilities

{π_{i j}}

Let NHANES ⨿ data be our sample proportions

{p_{i j}}

. Let

{{\hat{π}}_{i j}}

be the estimates under the constraints in (10), also under (11) (considered separately). Below, we define the adjusted RMSE, the unadjusted RMSE, and the proportional unexplained root mean square error (PURMSE), where “adjusted” means estimates under constraints, and “unadjusted” means unrestricted sample proportions.

a d j u s t e d R M S E = \sqrt{\sum_{i} \sum_{j} {({\hat{π}}_{i j} - π_{i j})}^{2}},

(12)

u n a d j u s t e d R M S E = \sqrt{\sum_{i} \sum_{j} {(p_{i j} - π_{i j})}^{2}},

(13)

P U R M S E = 100 * \frac{u n a d j u s t e d R M S E - a d j u s t e d R M S E}{u n a d j u s t e d R M S E} .

(14)

The larger value of PURMSE means that the adjusted MSEs are smaller than the unadjusted, which subsequently means that the used constraints give estimates that are quite close to the target values. Table 3 shows that all four methods perform well. Among all methods, LSQ performs comparatively the worst.

6. Conclusions

The paper [3] compared four margin-adjusting methods using equality constraints of known marginal totals for

(I \times J)

contingency tables. Here, under general linear inequality constraints, theoretical models are proposed for the differences between the sampled and target populations. To compare the performance of these four methods, a simulation is performed for the case of

I = J = 2

. Based on this simulation, we find that the performance of the methods depends on the specific type of constraints. For nonisotonic constraints, we find RAKE to perform the best, with MLRS being a close second. The MCSQ and LSQ perform worse, of which MCSQ is slightly better than LSQ. These findings are parallel to those of [3].

However, we also find that the performance ranking of the methods changes for isotonic constraints (tree order). Here, MCSQ and LSQ (= MLRS) perform better than RAKE. In addition, MCSQ is slightly better than LSQ (= MLRS). These results are very different from those of [3].

The theoretical models for the differences between the sampled and target populations, and the corresponding methods and techniques described can be extended to higher dimensions in a similar manner as in Theorem 1.

As opposed to the case under equality restrictions, the distribution of estimators under inequality constraints is not known; hence, their mean and standard errors are also not tractable. It is difficult to explain the different behavior of the estimators under isotonic and non-isotonic constraints as seen in simulation. It is well-known that estimators under isotonic constraints have special properties of partial order relations.

We have fixed the values of

c_{1}, c_{2}

earlier in the simulation keeping in par with applications. This makes the choices of

π_{i j}

restricted under inequality constraints. Note that with equality constraints, once we fix

π_{11}

, all values of

π_{i j}

are fixed. This is not the case under inequality constraints. The choices of

π_{i j}, λ_{i}

have to be such that the optimization problem has a solution.

Acknowledgments

The authors thank the referees for several suggestions which led to improved presentation of this paper.

Author Contributions

The authors have equal contribution. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Causey, B.D. Estimation under generalized sampling of cell proportions for contingency tables subject to marginal constraints. Commun. Stat. Theory Meth. 1984, 13, 2487–2494. [Google Scholar] [CrossRef]
Read, T.R.C.; Cressie, N. Goodness-of-fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988. [Google Scholar]
Little, R.J.A.; Wu, M.-M. Models for contingency tables with known margins when target and sampled populations differ. J. Am. Stat. Assoc. 1991, 86, 87–95. [Google Scholar] [CrossRef]
Hellerstein, J.K.; Imbens, G.W. Imposing moment restrictions from auxiliary data by weighting. Rev. Econ. Stat. 1996, 81, 1–14. [Google Scholar] [CrossRef]
Causey, B.D. Estimation of proportions for multinomial contingency tables subject to known marginal constraints. Commun. Stat. Theory Meth. 1983, 12, 2581–2587. [Google Scholar] [CrossRef]
Rockafellar, R.T. Convex Analysis; John Wiley & Sons, Inc.: New York, NY, USA, 1970. [Google Scholar]
Robertson, T.; Wright, F.T.; Dykstra, R.L. Order Restricted Statistical Inference; John Wiley & Sons, Inc.: New York, NY, USA, 1988. [Google Scholar]

Figure 1. RRMSEs for data generated under four models (a) RAKE, (b) LSQ, (c) MCSQ, (d) MLRS, when

n = 30