Abstract
In multi-site studies, sharing individual-level information across multiple data-contributing sites usually poses a significant risk to data security. Thus, due to privacy constraints, analytical tools that do not require individual-level data have drawn considerable attention from researchers in recent years. In this work, we consider regression analysis of current status data arising from multi-site cross-sectional studies and develop two distributed estimation methods tailored to the unstratified and stratified additive hazards models, respectively. In particular, instead of utilizing the individual-level data, the proposed methods only require transferring summary statistics from each site to the analysis center, which achieves the aim of privacy protection. We establish the asymptotic properties of the proposed estimators, including consistency and asymptotic normality. Specifically, the derived distributed estimators are shown to be asymptotically equivalent to those based on the pooled individual-level data. Simulation studies and an application to a multi-site gonorrhea infection data set demonstrate the proposed methods’ satisfactory performance and practical utility.
Data availability
The data that support the findings in this paper are available on request from the corresponding author. The data are not publicly available due to privacy.
References
Andersen, P.K., Gill, R.D.: Cox’s regression model for counting processes: a large sample study. Ann. Stat. 10(4), 1100–1120 (1982)
Anschuetz, G.L., Asbel, L., Spain, C.V., Salmon, M., et al.: Association between enhanced screening for Chlamydia trachomatis and Neisseria gonorrhoeae and reductions in sequelae among women. J. Adolesc. Health 51(1), 80–85 (2012)
Brighton, R.W., Wilding, K.: Delayed diagnosis of gonococcal arthritis of the foot caused by beta-lactamase-producing Neisseria gonorrhoeae. Med. J. Aust. 156(5), 368 (1992)
Duan, R., Boland, M.R., Liu, Z., Liu, Y., et al.: Learning from electronic health records across multiple sites: a communication-efficient and privacy-preserving distributed algorithm. J. Am. Med. Inform. Assoc. 27(3), 376–385 (2020)
Greenhalgh, T., Stramer, K., Bratan, T., Byrne, E., Mohammad, Y., Russell, J.: Introduction of shared electronic records: multi-site case study using diffusion of innovation theory. BMJ-Br. Med. J. 337, 1786 (2008)
Huang, J.: Efficient estimation for the proportional hazards model with interval censoring. Ann. Stat. 24, 540–568 (1996)
Huang, C., Wei, K., Wang, C., Yu, Y., Qin, G.: Covariate balance-related propensity score weighting in estimating overall hazard ratio with distributed survival data. BMC Med. Res. Methodol. 23(1), 233 (2023)
Kalbfleisch, J.D., Prentice, R.L.: The Statistical Analysis of Failure Time Data. Wiley, New Jersey (2002)
Li, S., Peng, L.: Instrumental variable estimation of complier causal treatment effect with interval-censored data. Biometrics 79, 253–263 (2023)
Li, S., Hu, T., Sun, J.: Regression analysis of misclassified current status data. J. Nonparametr. Stat. 32(1), 1–19 (2020)
Li, S., Tian, T., Hu, T., Sun, J.: A simulation-extrapolation approach for regression analysis of misclassified current status data with the additive hazards model. Stat. Med. 40(28), 6309–6320 (2021)
Li, D., Lu, W., Shu, D., Toh, S., Wang, R.: Distributed Cox proportional hazards regression using summary-level information. Biostatistics 24(3), 776–794 (2023)
Li, S., Hu, T., Wang, L., McMahan, C.S., Tebbs, J.M.: Regression analysis of group-tested current status data. Biometrika (2024). https://doi.org/10.1093/biomet/asae006
Lin, D.Y., Oakes, D., Ying, Z.: Additive hazards regression with current status data. Biometrika 85(2), 289–298 (1998)
Lu, C.L., Wang, S., Ji, Z., Wu, Y., et al.: WebDISCO: a web service for distributed Cox model learning without patient-level data sharing. J. Am. Med. Inform. Assoc. 22(6), 1212–1219 (2015)
Luo, C., Islam, M.N., Sheils, N.E., Buresh, J., et al.: dPQL: a lossless distributed algorithm for generalized linear mixed model with application to privacy-preserving hospital profiling. J. Am. Med. Inform. Assoc. 29(8), 1366–1371 (2022)
Martinussen, T., Scheike, T.H.: Efficient estimation in additive hazards regression with current status data. Biometrika 89(3), 649–658 (2002)
Mateos, G., Bazerque, J.A., Giannakis, G.B.: Distributed sparse linear regression. IEEE Trans. Signal Process. 58(10), 5262–5276 (2010)
McMahan, C.S., Wang, L., Tebbs, J.M.: Regression analysis for current status data using the EM algorithm. Stat. Med. 32(25), 4452–4466 (2013)
McMurry, A.J., Murphy, S.N., MacFadden, D., Weber, G., et al.: SHRINE: enabling nationally scalable multi-site disease studies. PLoS One 8(3), e55811 (2013)
Russell, M.W.: Immune responses to Neisseria gonorrhoeae: challenges and opportunities with respect to pelvic inflammatory disease. J. Infect. Dis. 224, 96–102 (2021)
Sanderson, S.C., Brothers, K.B., Mercaldo, N.D., Clayton, E.W., et al.: Public attitudes toward consent and data sharing in biobank research: a large multi-site experimental survey in the US. Am. J. Hum. Genet. 100(3), 414–427 (2017)
Shu, D., Yoshida, K., Fireman, B.H., Toh, S.: Inverse probability weighted Cox model in multi-site studies without sharing individual-level data. Stat. Methods Med. Res. 29(6), 1668–1681 (2020)
St Cyr, S., Barbee, L., Workowski, K.A., Bachmann, L.H., et al.: Update to CDC’s treatment guidelines for gonococcal infection, 2020. MMWR Morb. Mortal. Wkly. Rep. 69(50), 1911–1916 (2020)
Stewart, K., Carlson, M., Segal, A.M., White, C.S.: Gonococcal arthritis caused by penicillinase-producing strains of Neisseria gonorrhoeae. Arthritis Rheum. 34(2), 245–246 (1991)
Sun, J.: The Statistical Analysis of Interval-Censored Failure Time Data. Springer, New York (2006)
Tian, L., Cai, T.: On the accelerated failure time model for current status and interval censored data. Biometrika 93(2), 329–342 (2006)
Toh, S.: Analytic and data sharing options in real-world multidatabase studies of comparative effectiveness and safety of medical products. Clin. Pharmacol. Ther. 107(4), 834–842 (2020)
Toh, S., Wellman, R., Coley, R.Y., et al.: Combining distributed regression and propensity scores: a doubly privacy-protecting analytic method for multicenter research. Clin. Epidemiol. 10, 1773–1786 (2018)
Tsevat, D.G., Wiesenfeld, H.C., Parks, C., Peipert, J.F.: Sexually transmitted diseases and infertility. Am. J. Obstet. Gynecol. 216(1), 1–9 (2017)
Wang, L., Sun, J., Tong, X.: Regression analysis of case II interval-censored failure time data with the additive hazards model. Stat. Sin. 20(4), 1709–1723 (2010)
Wolfson, M., Wallace, S.E., Masca, N., Rowe, G., et al.: DataSHIELD: resolving a conflict in contemporary bioscience: performing a pooled analysis of individual-level data without sharing the data. Int. J. Epidemiol. 39(5), 1372–1382 (2010)
Zeng, D., Mao, L., Lin, D.Y.: Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103(2), 253–271 (2016)
Zeng, D., Gao, F., Lin, D.Y.: Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104(3), 505–525 (2017)
Zhao, X., Duan, R., Zhao, Q., Sun, J.: A new class of generalized log rank tests for interval-censored failure time data. Comput. Stat. Data Anal. 60, 123–131 (2013)
Acknowledgements
We are grateful to the associate editor and two reviewers for their insightful comments and suggestions, which greatly improved this article. Shuwei Li’s research was partially supported by the Natural Science Foundation of Guangdong Province of China (Grant No. 2022A1515011901) and the National Statistical Science Research Project (Grant No. 2022LY041). Xinyuan Song’s research was partially supported by a GRF Grant (Grant No. 14303622) from the Research Grant Council of the Hong Kong Special Administrative Region.
Author information
Contributions
Conceptualization, SL and XS; methodology, PH, SL and XS; software, PH; resources, SL; data curation, SL; writing—original draft preparation, PH, SL and XS; supervision, XS; funding acquisition, SL and XS; All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Regularity conditions
Recall that \(\Omega = \{1, \dots , n\}\) is the index set of the observed data from K data-contributing sites. \(\Omega _k = \{i: i\) is from site k, for \(i = 1, \ldots ,n\}\) denotes the index set of the kth site, and \(n_k\) is the size of \(\Omega _k\), so that \(n=n_1+ \ldots +n_K\). The size of \(\widetilde{\Omega }\) is d, where \(\widetilde{\Omega }=\{i:\delta _i =1,\) for \(i=1, \ldots , n\}\). For each \(k = 1, \ldots , K\), let \(\widetilde{\Omega }_k=\{i: \delta _i=1, i\) is from site k, for \(i=1, \ldots , n\}\). Denote by \(\varvec{\beta }_0\) the true value of \(\varvec{\beta }\) for the population formed by the K sites, and by \(\varvec{\beta }_{0(k)}\) the true value of \(\varvec{\beta }\) for the kth site. Let \(\varvec{\theta _0}=(\varvec{\beta }_{0(1)}^{\top },\ldots ,\varvec{\beta }_{0(K)}^{\top },\varvec{\beta }_{0}^{\top })^{\top }\) denote the true value of \(\varvec{\theta }=(\varvec{\beta }_{(1)}^{\top },\ldots ,\varvec{\beta }_{(K)}^{\top },\varvec{\beta }^{ \top })^{\top }\).
Let \(S^{(m)}(\varvec{\beta },t)=\sum _{j \in \Omega } Y_{j}(t)e^{-\varvec{\beta }^{\top } \varvec{Z}_{j}t}{(\varvec{Z}_{j}t)}^{\otimes m} \), \(S_k^{(m)}(\varvec{\beta },t)=\sum _{j \in \Omega _k} Y_{j(k)}(t)e^{-\varvec{\beta }_{(k)}^{\top } \varvec{Z}_{j(k)}t}{(\varvec{Z}_{j(k)}t)}^{\otimes m}, \) for \(k=1, \ldots , K\) and \(m = 0,1,2\), where \(\varvec{a}^{\otimes 0} = 1, \varvec{a}^{\otimes 1} = \varvec{a}\), and \( \varvec{a}^{\otimes 2} = \varvec{a}\varvec{a}^\top \) for a column vector \(\varvec{a}\). Define \(E_k(\varvec{\beta },t)= S_k^{(1)}(\varvec{\beta },t)/S_k^{(0)}(\varvec{\beta },t)\), \(V_k(\varvec{\beta },t)= S_k^{(2)}(\varvec{\beta },t)/S_k^{(0)}(\varvec{\beta },t) - E_k(\varvec{\beta },t)^{\otimes {2}}\) for \(t \in (0, \tau ]\) and \(\varvec{\beta }\in \mathcal {B}\), where \(\mathcal {B}\) is a compact set in \(\mathbb {R}^p\). For a matrix \(\varvec{A}\) or a vector \(\varvec{a}\), \(||\varvec{A}||=\sup _{i,j}|a_{ij}|\) and \(||\varvec{a}||=\sup _{i}|a_{i}|\), where \(a_{ij}\) is the (i, j)th element of \(\varvec{A}\) and \(a_{i}\) is the ith component of \(\varvec{a}\). Let “\(\overset{P}{\rightarrow }\)” and “\(\overset{D}{\rightarrow }\)” denote the convergence in probability and the convergence in distribution, respectively.
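As a concrete illustration of this notation, the following Python sketch evaluates \(S_k^{(m)}(\varvec{\beta },t)\), \(E_k(\varvec{\beta },t)\) and \(V_k(\varvec{\beta },t)\) for a single site at a fixed t, with \(Y_j(t) = I(C_j \ge t)\). The function names `site_summaries` and `mean_and_variance`, the array layout and the toy numbers are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def site_summaries(beta, t, C, Z):
    """Return (S0, S1, S2) for one site at time t.

    C : (n_k,) monitoring times; Z : (n_k, p) covariates.
    The at-risk indicator is Y_j(t) = I(C_j >= t).
    """
    beta = np.asarray(beta, dtype=float)
    C = np.asarray(C, dtype=float)
    Z = np.asarray(Z, dtype=float)
    w = (C >= t) * np.exp(-(Z @ beta) * t)   # Y_j(t) * exp(-beta' Z_j t)
    Zt = Z * t                               # row j holds Z_j t
    S0 = w.sum()                             # m = 0, a scalar
    S1 = Zt.T @ w                            # m = 1, a p-vector
    S2 = (Zt * w[:, None]).T @ Zt            # m = 2, a p x p matrix
    return S0, S1, S2

def mean_and_variance(S0, S1, S2):
    """E_k = S1 / S0 and V_k = S2 / S0 - E_k E_k^T."""
    E = S1 / S0
    V = S2 / S0 - np.outer(E, E)
    return E, V

# Toy site with three subjects and one covariate (made-up numbers).
C = np.array([0.5, 1.0, 2.0])
Z = np.array([[1.0], [2.0], [4.0]])
S0, S1, S2 = site_summaries([0.0], 1.0, C, Z)
E, V = mean_and_variance(S0, S1, S2)
```

Because \(\Omega \) is partitioned into the \(\Omega _k\), the pooled \(S^{(m)}(\varvec{\beta },t)\) at a common \(\varvec{\beta }\) is the sum of the site-level quantities, which is what makes site-level summaries sufficient for this part of the computation.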
To establish the asymptotic properties of the proposed estimator \(\hat{\varvec{\beta }}\), we need the following regularity conditions:
(C1) \(P(Y(t) = 1 \mid \varvec{Z}) > 0\) for all \(t \in (0, \tau ]\), where \(\tau \) satisfies \(P(C \ge \tau ) > 0\) and \(Y(t) = I(C \ge t)\).
(C2) \(\varvec{Z}_i\) is bounded for \(i=1,2,\ldots ,n\). The true value of H(t) denoted by \(H_0(t)\) at \(\tau \) is finite. For \(k = 1, \ldots , K\), the true value of \(H_k(t)\) denoted by \(H_{0(k)}(t)\) at \(\tau \) is finite, where \(\textrm{d} H_k(t) = e^{-\Lambda _k (t)}\textrm{d}\Lambda _c(t)\) and \(\Lambda _k (t)=\int _{0}^{t}\lambda _k (s)\textrm{d}s\).
(C3) The true regression vector \(\varvec{\beta }_0\) lies in the interior of \(\mathcal {B}\) and there exist functions \(s^{(m)}(\varvec{\beta },t)\) with \(m=0,1,2\) defined on \(\mathcal {B}\times (0,\tau ]\) satisfying
  (a) \(\sup \limits _{\varvec{\beta }\in \mathcal {B}, t\in (0,\tau ]} ||n^{-1}S^{(m)}(\varvec{\beta },t)-s^{(m)}(\varvec{\beta },t)|| \overset{P}{\rightarrow }\textbf{0}\), as \(n\rightarrow \infty \).
  (b) \(s^{(0)}(\varvec{\beta },t)\) is bounded away from 0.
  (c) For \(m=0,1,2\), \(s^{(m)}(\varvec{\beta },t)\) is a uniformly continuous function of \(\varvec{\beta }\) in \((0, \tau ]\), where \(s^{(1)}(\varvec{\beta },t) = \partial {s^{(0)}(\varvec{\beta },t)}/{\partial \varvec{\beta }} \) and \(s^{(2)}(\varvec{\beta },t)=\partial ^2{s^{(0)}(\varvec{\beta },t)}/{\partial \varvec{\beta }\partial \varvec{\beta }^{\top }}.\)
  (d) For \(\varvec{\beta }\in \mathcal {B}\), let \(e(\varvec{\beta },t)=s^{(1)}(\varvec{\beta },t)/s^{(0)}(\varvec{\beta },t)\), \(v(\varvec{\beta },t)=s^{(2)}(\varvec{\beta },t)/s^{(0)}(\varvec{\beta },t)-e(\varvec{\beta },t) e(\varvec{\beta },t)^{\top }.\) \(\varvec{\Sigma }^*(\varvec{\beta }_0)=\int _0^{\tau }v(\varvec{\beta }_0,u)s^{(0)}(\varvec{\beta }_0,u)\textrm{d} H_0(u)\) is positive definite.
(C4)
  (a) Underlying parameters are identical across all sites, that is, \(\varvec{\beta }_{0(k)}=\varvec{\beta }_0\) for \(k=1,\ldots , K\);
  (b) Limiting processes of \(S_{k}^{(m)}(\varvec{\beta },t)\)’s are identical across all sites, that is, for \(k=1,\ldots , K\), \(\sup \limits _{\varvec{\beta }\in \mathcal {B}, t\in (0,\tau ]} ||n_k^{-1}S_k^{(m)}(\varvec{\beta },t)-s^{(m)} (\varvec{\beta },t)|| \overset{P}{\rightarrow }\ 0\), as \(n_k\rightarrow \infty \) for \(\varvec{\beta }\in \mathcal {B}\) and \(t \in (0, \tau ]\).
Conditions (C1)–(C3) are standard assumptions in the survival analysis literature; see, for instance, Andersen and Gill (1982). Condition (C4) is usually referred to as the homogeneity assumption, which implies that data from different sites follow the same underlying model, or different models with the same regression parameters (see, for example, Li et al. 2023).
Appendix B: Proof of Theorem 1
Let \(\hat{\varvec{\beta }}\) denote the estimator of \(\varvec{\beta }\) obtained by solving \(\widetilde{\varvec{U}}(\varvec{\beta })=0\). For each \(k = 1, \dots , K\), we first obtain the estimate of \(\varvec{\beta }\), denoted by \(\hat{\varvec{\beta }}_{(k)}\), by solving \(\varvec{U}_k^*(\varvec{\beta }) = 0\) within site k, where \(\varvec{U}^*_k(\varvec{\beta })\) has the same form as \(\varvec{U}^* (\varvec{\beta })\) in (3) after replacing \(\Omega \) with \(\Omega _k\). Define \(\hat{\varvec{\theta }} = (\hat{\varvec{\beta }}_{(1)}^{\top }, \ldots , \hat{\varvec{\beta }}_{(K)}^{\top }, \hat{\varvec{\beta }}^{\top })^{\top }\). By applying a Taylor series expansion about \(\varvec{\beta }_{0}\) to \(\varvec{U}_k^*(\hat{\varvec{\beta }}_{(k)}) \), we have
where \(\varvec{\beta }^*_{(k)}\) is between \(\hat{\varvec{\beta }}_{(k)}\) and \(\varvec{\beta }_0\) and \(\varvec{I}^*_k(\varvec{\beta })\) has the same form as \(\varvec{I}^*(\varvec{\beta })\) in (4) after replacing \(\Omega \) with \(\Omega _k\). By the arguments given in Andersen and Gill (1982) and Conditions (C1)–(C4), it can be shown that \(\hat{\varvec{\beta }}_{(k)} \overset{P}{\rightarrow }\ \varvec{\beta }_0\), and \(n_k^{-1}\varvec{I}_k^*(\varvec{\beta }^*_{(k)})\overset{P}{\rightarrow }\varvec{\Sigma }^*(\varvec{\beta }_0)\) for any \(\varvec{\beta }_{(k)}^*\) that converges in probability to \(\varvec{\beta }_0\). For each \(k=1, \ldots , K\), following Lin et al. (1998), we can conclude that \(n_k^{-1/2} \varvec{U}_k^*(\varvec{\beta }_0) \overset{D}{\rightarrow } \varvec{N}(\varvec{0},\varvec{\Sigma }^*(\varvec{\beta }_0))\) and \(n_k^{1/2}(\hat{\varvec{\beta }}_{(k)}-\varvec{\beta }_0)\overset{D}{\rightarrow } N(\varvec{0}, \varvec{\Sigma }^*(\varvec{\beta }_0)^{-1})\) under Conditions (C1)–(C4). Note that \(\varvec{\Sigma }^*(\varvec{\beta }_0) = \lim \limits _{n_k \rightarrow \infty }n_k^{-1}\varvec{I}_k^*(\varvec{\beta }_0)\), and under Condition (C4), we also have \(\varvec{\Sigma }^*(\varvec{\beta }_0) = \lim \limits _{n \rightarrow \infty }n^{-1}\varvec{I}(\varvec{\beta }_0)\).
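The site-level limiting distribution stated above follows from a standard argument, sketched here for completeness. The sketch assumes the sign convention \(\varvec{I}_k^*(\varvec{\beta }) = -\partial \varvec{U}_k^*(\varvec{\beta })/\partial \varvec{\beta }^{\top }\), which is an assumption of this illustration rather than something stated in the text. Since \(\varvec{U}_k^*(\hat{\varvec{\beta }}_{(k)}) = \varvec{0}\),

\[
\varvec{0} = \varvec{U}_k^*(\varvec{\beta }_0) - \varvec{I}_k^*(\varvec{\beta }^*_{(k)})\,(\hat{\varvec{\beta }}_{(k)}-\varvec{\beta }_0)
\quad \Longrightarrow \quad
n_k^{1/2}(\hat{\varvec{\beta }}_{(k)}-\varvec{\beta }_0) = \{n_k^{-1}\varvec{I}_k^*(\varvec{\beta }^*_{(k)})\}^{-1}\, n_k^{-1/2}\varvec{U}_k^*(\varvec{\beta }_0).
\]

Combining \(n_k^{-1}\varvec{I}_k^*(\varvec{\beta }^*_{(k)})\overset{P}{\rightarrow }\varvec{\Sigma }^*(\varvec{\beta }_0)\) with \(n_k^{-1/2} \varvec{U}_k^*(\varvec{\beta }_0) \overset{D}{\rightarrow } N(\varvec{0},\varvec{\Sigma }^*(\varvec{\beta }_0))\), Slutsky's theorem gives the limiting covariance

\[
\varvec{\Sigma }^*(\varvec{\beta }_0)^{-1}\,\varvec{\Sigma }^*(\varvec{\beta }_0)\,\varvec{\Sigma }^*(\varvec{\beta }_0)^{-1} = \varvec{\Sigma }^*(\varvec{\beta }_0)^{-1}.
\]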
Consider the following Taylor series expansion
Then \(S^{(0)}(\varvec{\beta },t)\), \(S^{(1)}(\varvec{\beta },t)\) and \(S^{(2)}(\varvec{\beta },t)\) can be approximated by
and
respectively. Under Conditions (C1)–(C4), we have \(\sup \limits _{\varvec{\beta }\in \mathcal {B}, t\in (0,\tau ]} ||n^{-1}\widetilde{S}^{(m)}(\varvec{\beta },t)-s^{(m)} (\varvec{\beta },t)|| \overset{P}{\rightarrow }\ 0\) as \(n\rightarrow \infty \), for \(m=0,1,2\) and \(\hat{\varvec{\beta }}_{(k)}\) that is close to \(\varvec{\beta }\). The pseudo log-likelihood function related to the score function (7) is given by
The first and second derivatives of \(\widetilde{l}(\varvec{\beta })\) with respect to \(\varvec{\beta }\) are
and
respectively. By following Andersen and Gill (1982) and Kalbfleisch and Prentice (2002), we define
for \(t \le \tau \), where \(\widetilde{l}(\varvec{\beta }, t)=-\sum _{i \in \widetilde{\Omega }} \int _0^{t} \{\varvec{\beta }^{\top }\varvec{Z}_{i}u + \textrm{log} [\widetilde{S}^{(0)}(\varvec{\beta },u) ] \} \textrm{d}N_{i}(u)\). Note that \(\hat{\varvec{\beta }}\) maximizes \(X(\varvec{\beta },\tau )\). The compensator of \(X(\varvec{\beta },t)\) is given by
By following Andersen and Gill (1982), it can be shown that \(X(\varvec{\beta }, t)-\tilde{X}(\varvec{\beta }, t)\) converges to 0 for any \(t \in (0,\tau ]\). Define
Under Conditions (C1)–(C4), it can be concluded that \(\tilde{X}(\varvec{\beta },\tau )\overset{P}{\rightarrow }f(\varvec{\beta })\). By applying Lenglart’s inequality as given in Andersen and Gill (1982), we have \(X(\varvec{\beta },\tau )\overset{P}{\rightarrow }f(\varvec{\beta })\). The first and second derivatives of \(f(\varvec{\beta })\) with respect to \(\varvec{\beta }\) are
and
respectively. Since \(\partial f(\varvec{\beta })/\partial \varvec{\beta }|_{\varvec{\beta }=\varvec{\beta }_0} = 0\) and the negative of \(\partial ^{2} f(\varvec{\beta })/\partial \varvec{\beta }\partial \varvec{\beta }^{\top }\) is a positive definite matrix, \(\varvec{\beta }_0\) is the unique maximizer of \(f(\varvec{\beta })\). Therefore, by following Andersen and Gill (1982), we can conclude that \(\hat{\varvec{\beta }} \overset{P}{\rightarrow }\ \varvec{\beta }_0\) as \(n {\rightarrow } \infty \).
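To make the role of the approximations \(\widetilde{S}^{(m)}\) in this appendix more tangible, the Python sketch below builds a surrogate for the pooled \(S^{(0)}(\varvec{\beta },t)\) from site-level summaries evaluated at the local estimates \(\hat{\varvec{\beta }}_{(k)}\). It assumes a second-order expansion of \(e^{-\varvec{\beta }^{\top }\varvec{Z}t}\) around \(\hat{\varvec{\beta }}_{(k)}\); the order of the expansion, the helper names `s_moments` and `surrogate_S0`, and the toy data are assumptions of this illustration and may differ from the authors' construction.

```python
import numpy as np

def s_moments(beta, t, C, Z):
    """Site-level S^(0), S^(1), S^(2) at time t, with Y_j(t) = I(C_j >= t)."""
    w = (C >= t) * np.exp(-(Z @ beta) * t)
    Zt = Z * t
    return w.sum(), Zt.T @ w, (Zt * w[:, None]).T @ Zt

def surrogate_S0(beta, local_betas, summaries):
    """Second-order surrogate for the pooled S^(0)(beta, t).

    local_betas : list of site estimates beta_hat_(k)
    summaries   : list of (S0_k, S1_k, S2_k), each evaluated at beta_hat_(k)
    Uses exp(-x) ~ 1 - x + x^2 / 2 with x = (beta - beta_hat_(k))' Z_j t.
    """
    total = 0.0
    for b_k, (S0, S1, S2) in zip(local_betas, summaries):
        d = beta - b_k
        total += S0 - d @ S1 + 0.5 * d @ S2 @ d
    return total

# Two toy sites (made-up numbers); only the summaries leave each site.
t = 1.0
C1, Z1 = np.array([0.5, 1.0, 1.5, 2.0]), np.array([[0.2], [0.8], [0.5], [1.0]])
C2, Z2 = np.array([1.2, 0.7, 1.8]), np.array([[0.3], [0.9], [0.6]])
local_betas = [np.array([0.45]), np.array([0.55])]
summaries = [s_moments(local_betas[0], t, C1, Z1),
             s_moments(local_betas[1], t, C2, Z2)]

beta = np.array([0.5])
approx = surrogate_S0(beta, local_betas, summaries)
exact = s_moments(beta, t, np.concatenate([C1, C2]), np.vstack([Z1, Z2]))[0]
```

In this toy setting `approx` and `exact` agree closely because each \(\hat{\varvec{\beta }}_{(k)}\) is near \(\varvec{\beta }\), which mirrors the requirement above that \(\hat{\varvec{\beta }}_{(k)}\) be close to \(\varvec{\beta }\).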
Notice that
By applying Taylor series expansion about \(\varvec{\theta }_0\) to \(\widetilde{\varvec{U}}(\hat{\varvec{\theta }})\) and following the arguments given in Li et al. (2023), we have
where \(\varvec{\theta ^*}\) is between \(\hat{\varvec{\theta }}\) and \(\varvec{\theta }_0\), \(\widetilde{\varvec{I}}_{k} (\varvec{\theta })= \partial \widetilde{\varvec{U}}(\varvec{\theta })/\partial \varvec{\beta }_{(k)}^{\top }\) and \(\widetilde{\varvec{I}}_{K+1} (\varvec{\theta })= \partial \widetilde{\varvec{U}}(\varvec{\theta })/\partial \varvec{\beta }^{\top }\). The consistency of \(\hat{\varvec{\theta }}\) indicates that \(n_k^{-1}\widetilde{\varvec{I}}_{k}({\varvec{\theta }^*}) \overset{P}{\rightarrow } \varvec{\Sigma }_{k}(\varvec{\theta }_0)\) and \(n^{-1}\widetilde{\varvec{I}}_{K+1}({\varvec{\theta }^*}) \overset{P}{\rightarrow } \varvec{\Sigma }_{K+1}(\varvec{\theta }_0)\) for any \(\varvec{\theta }^*\) that converges in probability to \(\varvec{\theta }_0\), where \(\varvec{\Sigma }_{k}(\varvec{\theta }_0) = \lim \limits _{n_k \rightarrow \infty }n_k^{-1}\widetilde{\varvec{I}}_k(\varvec{\theta }_0)\) and \(\varvec{\Sigma }_{K+1}(\varvec{\theta }_0) = \lim \limits _{n \rightarrow \infty }n^{-1}\widetilde{\varvec{I}}_{K+1}(\varvec{\theta }_0)\).
Next, we show that \(\textrm{Cov}(\varvec{U}^*_k({\varvec{\beta }_{0}}),\widetilde{\varvec{U}}({\varvec{\theta }_0}))=\varvec{0}\).
For \(k=1,\ldots ,K\), the observed data from the kth site are denoted by \(\mathcal {O} = \{ (C_j, \delta _j, \varvec{Z}_j), j\in \Omega _k \}\). Without loss of generality, we temporarily assume \(K=2\). The covariance between \(\varvec{U}_1^{*}(\varvec{\beta }_{0})\) and \(\widetilde{\varvec{U}}(\varvec{\theta }_0)\) is
Since \(\mathbb {E}(\varvec{U}_1^*(\varvec{\beta }_{0}))=\varvec{0}\), it can be concluded that \(\textrm{Cov}(\varvec{U}_1^*(\varvec{\beta }_{0}), \widetilde{\varvec{U}}(\varvec{\theta }_0))=\varvec{0}\). Similarly, we have \(\textrm{Cov}(\varvec{U}_2^*(\varvec{\beta }_{0}),\widetilde{\varvec{U}}(\varvec{\theta }_0))=\varvec{0}\). Additionally, owing to the independence of the K data-contributing sites, we have \( \textrm{Cov}(\varvec{U}_i^*(\varvec{\beta }_{0}), \varvec{U}_j^*(\varvec{\beta }_{0}))=\varvec{0}\) for \(i \ne j\). By applying the Central Limit Theorem (Andersen and Gill 1982; Kalbfleisch and Prentice 2002), we can conclude that
where
Furthermore, by (13) and (14), we have
where
Hence, we have
as \(n {\rightarrow } \infty \). Then by (15), we can conclude that \(n^{1/2}(\hat{\varvec{\beta }}-\varvec{\beta }_0)\) converges in distribution to a zero-mean normal random vector with covariance matrix
which corresponds to the bottom right corner block of \( \textbf{P}^{-1}\textbf{H}(\textbf{P}^{\top })^{-1}\). Note that
where
and
where
and
In (14), under Conditions (C1)–(C4), we know that \(\varvec{\Sigma }_{K+1}(\varvec{\theta }_0)\) equals the limit of \(n^{-1}\widetilde{\varvec{I}}_{K+1}(\varvec{\theta }_0)\), \(\varvec{\Sigma }_k(\varvec{\theta }_0)\) equals the limit of \(n_k^{-1}\widetilde{\varvec{I}}_{k}(\varvec{\theta }_0)\) and \(\varvec{\Sigma }^*(\varvec{\beta }_{0})\) is the limit of \(n^{-1}{\varvec{I}} (\varvec{\beta }_0)\). Therefore, since \(\varvec{\Sigma }_k(\varvec{\theta }_0) =\varvec{0}\) for \(k=1,\ldots ,K\) and \(\varvec{\Sigma }_{K+1}(\varvec{\theta }_0)=\varvec{\Sigma }^*(\varvec{\beta }_{0})\), we can conclude from (16) that
Notice that when the pooled individual-level data from the K sites are accessible, the estimate of \(\varvec{\beta }\) can be obtained by solving \(\varvec{U}(\varvec{\beta })=0\). Based on the arguments provided in Andersen and Gill (1982) and Conditions (C1)–(C4), the limiting covariance matrix of the estimator of \(\varvec{\beta }\) based on the pooled individual-level data is the same as that of the proposed distributed estimator \(\hat{\varvec{\beta }}\). This implies that the distributed estimator \(\hat{\varvec{\beta }}\) is asymptotically equivalent to the estimator obtained from analyzing the pooled individual-level data.
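For reference, the pooled-data benchmark mentioned above can be sketched in Python as follows. The sketch maximizes the pooled analogue of the pseudo log-likelihood, \(-\sum _{i \in \widetilde{\Omega }} \{\varvec{\beta }^{\top }\varvec{Z}_{i}C_i + \textrm{log}\, S^{(0)}(\varvec{\beta },C_i)\}\), by Newton–Raphson with step-halving. The function names, the step-halving rule and the tolerances are assumptions of this illustration; it is not the authors' distributed algorithm, and it requires the individual-level data that the distributed method is designed to avoid sharing.

```python
import numpy as np

def loglik_score_info(beta, C, delta, Z):
    """Pooled pseudo log-likelihood, its gradient, and the information matrix."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    p = beta.size
    ll, score, info = 0.0, np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(delta == 1):         # subjects with delta_i = 1
        t = C[i]
        w = (C >= t) * np.exp(-(Z @ beta) * t)   # Y_j(t) exp(-beta' Z_j t)
        Zt = Z * t
        S0 = w.sum()
        E = (Zt.T @ w) / S0                      # S^(1) / S^(0)
        V = (Zt * w[:, None]).T @ Zt / S0 - np.outer(E, E)
        ll -= beta @ Zt[i] + np.log(S0)
        score -= Zt[i] - E                       # gradient of ll
        info += V                                # minus the Hessian of ll
    return ll, score, info

def newton_fit(C, delta, Z, max_iter=50, tol=1e-10):
    """Newton-Raphson with step-halving, started from beta = 0."""
    beta = np.zeros(Z.shape[1])
    ll, score, info = loglik_score_info(beta, C, delta, Z)
    for _ in range(max_iter):
        step = np.linalg.solve(info, score)
        scale = 1.0
        for _ in range(30):                      # halve the step if ll drops
            cand = beta + scale * step
            ll_new, score_new, info_new = loglik_score_info(cand, C, delta, Z)
            if ll_new >= ll - 1e-8 * (1.0 + abs(ll)):
                break
            scale /= 2.0
        beta, ll, score, info = cand, ll_new, score_new, info_new
        if np.max(np.abs(scale * step)) < tol:
            break
    return beta, info
```

Here `C`, `delta` and `Z` are NumPy arrays of monitoring times, indicators and covariates. The returned `info` divided by n plays the role of \(n^{-1}\varvec{I}(\hat{\varvec{\beta }})\), so its inverse gives a plug-in covariance estimate for the pooled estimator.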
By the conclusions obtained above, we know that \(\varvec{\Sigma }_{K+1}(\varvec{\theta _0})\), \(\varvec{\Sigma }_{k}(\varvec{\theta _0})\) and \(\varvec{\Sigma }^{*}(\varvec{\theta _0})\) can be consistently estimated by \(n^{-1}\widetilde{\varvec{I}}( \hat{\varvec{\theta }})\), \(n_k^{-1}\varvec{\widetilde{I}}_{k}(\hat{\varvec{\theta }})\) and \(n_k^{-1}\varvec{I}_k^*(\hat{\varvec{\beta }}_{(k)})\), respectively. In light of (16), we can obtain a consistent estimate of the covariance matrix of \(\hat{\varvec{\beta }}\), which is given by
Appendix C: Proof of Theorem 2
Under Conditions (C1)–(C4) and by following Andersen and Gill (1982), we have \(\hat{\varvec{\beta }}_{(k)}\overset{P}{\rightarrow }\ \varvec{\beta }_0\), \(n_k^{-1}\varvec{I}^*_k(\hat{\varvec{\beta }}_{(k)})\overset{P}{\rightarrow } \varvec{\Sigma }^*(\varvec{\beta }_0)\), \(n_k^{-1/2}\varvec{\Phi }_k(\varvec{\beta }_0)\overset{D}{\rightarrow } \varvec{N}(\varvec{0},\varvec{\Sigma }^*(\varvec{\beta }_0))\) and \(n_k^{1/2}(\hat{\varvec{\beta }}_{(k)}-\varvec{\beta }_0)\overset{D}{\rightarrow } N(\varvec{0}, \varvec{\Sigma }^*(\varvec{\beta }_0)^{-1})\) as \(n_k {\rightarrow } \infty \), where \(\varvec{\Sigma }^*(\varvec{\beta }_0) = \lim \limits _{n_k \rightarrow \infty }n_k^{-1}\varvec{I}_k^*(\varvec{\beta }_0)\). Let \(\hat{\varvec{\beta }}\) denote the estimator of \(\varvec{\beta }\) obtained by solving \(\widetilde{\varvec{\Phi }}(\varvec{\beta })=0\). Under Condition (C4), we also have that \(\varvec{\Sigma }^*(\varvec{\beta }_0) = \lim \limits _{n \rightarrow \infty }n^{-1}\varvec{H}(\varvec{\beta }_0)\). Notice that \(S_k^{(0)}(\varvec{\beta })\), \(S_k^{(1)}(\varvec{\beta })\) and \(S_k^{(2)}(\varvec{\beta })\) are approximated by
and
respectively. The pseudo log-likelihood function related to the score function (11) is given by
The first and second derivatives of \(\widetilde{l}(\varvec{\beta })\) with respect to \(\varvec{\beta }\) are
and
respectively. By following arguments similar to those given in Appendix B, we can conclude that \(\hat{\varvec{\beta }}\overset{P}{\rightarrow } \varvec{\beta }_0\) as \(n {\rightarrow } \infty \). Notice that
To derive the asymptotic normality of \(\hat{\varvec{\beta }}\), we first apply Taylor series expansion about \(\varvec{\theta }_0\) to \(\widetilde{\varvec{\Phi }}(\hat{\varvec{\theta }})\) and obtain
where \(\varvec{\theta ^*}\) is between \(\hat{\varvec{\theta }}\) and \(\varvec{\theta }_0\), \(\varvec{\widetilde{H}}_{k} (\varvec{\theta })= \partial \varvec{\widetilde{\Phi }}(\varvec{\theta })/\partial \varvec{\beta }_{(k)}^{\top }\) and \(\varvec{ \widetilde{H}}_{K+1} (\varvec{\theta })= \partial \varvec{\widetilde{\Phi }}(\varvec{\theta })/\partial \varvec{\beta }^{\top }\). By following Lin et al. (1998) and Li et al. (2023), for \(k=1,\ldots ,K\), it can be concluded that \(n_k^{-1}\varvec{\widetilde{H}}_{k}({\varvec{\theta }^*}) \overset{P}{\rightarrow } \varvec{\Sigma }_{k}(\varvec{\theta }_0)\) and \(n^{-1}\varvec{ \widetilde{H}}_{K+1}({\varvec{\theta }^*}) \overset{P}{\rightarrow } \varvec{\Sigma }_{K+1}(\varvec{\theta }_0)\) for any \(\varvec{\theta }^*\) that converges in probability to \(\varvec{\theta }_0\), where \(\varvec{\Sigma }_{k}(\varvec{\theta }_0) = \lim \limits _{n_k \rightarrow \infty }n_k^{-1}\varvec{\widetilde{H}}_k(\varvec{\theta }_0)\) and \(\varvec{\Sigma }_{K+1}(\varvec{\theta }_0) = \lim \limits _{n \rightarrow \infty }n^{-1}\varvec{\widetilde{H}}_{K+1}(\varvec{\theta }_0)\). By following arguments similar to those given in Appendix B, it can be concluded that
where
Furthermore, by a Taylor expansion of \(\varvec{\Phi }_k(\hat{\varvec{\beta }}_{(k)})\) about \(\varvec{\beta }_0\) and equation (17), we have
where
Therefore, we have that
as \(n {\rightarrow } \infty \). Then we can conclude that \(n^{1/2}(\hat{\varvec{\beta }}-\varvec{\beta }_0)\) converges in distribution to a zero-mean normal random vector with covariance matrix
which corresponds to the bottom right corner block of \( \textbf{P}^{-1}\textbf{H}(\textbf{P}^{\top })^{-1}\). Note that
where
and
where
and
In (17), by Conditions (C1)–(C4), we know that \(\varvec{\Sigma }_{K+1}(\varvec{\theta }_0)\) is the limit of \(n^{-1}\varvec{\widetilde{H}}_{K+1}(\varvec{\theta }_0)\), \(\varvec{\Sigma }_k(\varvec{\theta }_0)\) is the limit of \(n_k^{-1}\varvec{\widetilde{H}}_{k}(\varvec{\theta }_0)\) and \(\varvec{\Sigma }^*(\varvec{\beta }_{0})\) is the limit of \(n^{-1}\varvec{H}(\varvec{\beta }_0)\). Since \(\varvec{\Sigma }_k(\varvec{\theta }_0)=\varvec{0}\) for \(k=1,\ldots ,K\) and \(\varvec{\Sigma }_{K+1}(\varvec{\theta }_0)=\varvec{\Sigma }^*(\varvec{\beta }_{0})\), we can conclude from (19) that
The limiting covariance matrix of the distributed estimator \(\hat{\varvec{\beta }}\) is the same as that of the estimator of \(\varvec{\beta }\) based on the pooled individual-level data (Andersen and Gill 1982).
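The way the bottom right corner block of \( \textbf{P}^{-1}\textbf{H}(\textbf{P}^{\top })^{-1}\) collapses to \(\varvec{\Sigma }^*(\varvec{\beta }_0)^{-1}\) once \(\varvec{\Sigma }_k(\varvec{\theta }_0)=\varvec{0}\) can be checked numerically. The Python sketch below assumes, purely for illustration, that \(\textbf{P}\) is block lower-triangular with diagonal blocks equal to \(\varvec{\Sigma }^*\) and a last block row \((\varvec{\Sigma }_1,\ldots ,\varvec{\Sigma }_K,\varvec{\Sigma }_{K+1})\), and that \(\textbf{H}\) is block-diagonal with blocks \(\varvec{\Sigma }^*\) (uncorrelated scores). This block structure, the omission of site-size weights, and the numerical values are assumptions, not the paper's exact matrices.

```python
import numpy as np

def bottom_right_block(P, H, p):
    """Bottom-right p x p block of P^{-1} H (P^T)^{-1}."""
    P_inv = np.linalg.inv(P)
    return (P_inv @ H @ P_inv.T)[-p:, -p:]

K, p = 2, 2
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # stands in for Sigma^*(beta_0)

# Assumed structure: one block row per site plus a final block row for beta.
P = np.kron(np.eye(K + 1), Sigma)            # last diagonal block plays Sigma_{K+1} = Sigma^*
H = np.kron(np.eye(K + 1), Sigma)            # zero cross-covariances between scores
# The off-diagonal blocks in the last row of P play the role of Sigma_k; here they are 0.

cov_beta = bottom_right_block(P, H, p)       # matches the inverse of Sigma

# With a non-zero Sigma_1 block the result no longer equals the inverse of Sigma.
P_nonzero = P.copy()
P_nonzero[-p:, :p] = 0.3 * np.eye(p)
cov_beta_nonzero = bottom_right_block(P_nonzero, H, p)
```

Under this assumed structure the site-size weights on the first K block rows do not affect the bottom right block when the \(\varvec{\Sigma }_k\) blocks vanish, which is why they are left out of the sketch.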
In addition, note that \(\varvec{\Sigma }_{K+1}(\varvec{\theta _0})\), \(\varvec{\Sigma }_{k}(\varvec{\theta _0})\) and \(\varvec{\Sigma }^*(\varvec{\theta _0})\) can be consistently estimated by \(n^{-1}\varvec{\widetilde{H}}( \hat{\varvec{\theta }})\), \(n_k^{-1}\varvec{\widetilde{H}}_{k}(\hat{\varvec{\theta }})\) and \(n_k^{-1}\varvec{I}^*_k(\hat{\varvec{\beta }}_{(k)})\), respectively. In light of (19), we can derive a consistent estimate of the covariance matrix of \(\hat{\varvec{\beta }}\), taking the form
About this article
Cite this article
Huang, P., Li, S. & Song, X. Distributed additive hazards regression analysis of multi-site current status data without using individual-level data. Stat Comput 34, 208 (2024). https://doi.org/10.1007/s11222-024-10523-4