Abstract
Sparse clustering divides samples into distinct groups while simultaneously reducing dimensionality in ultra-high dimensional unsupervised learning. To achieve sparse clustering that screens out noise variables, an adaptive sparse clustering framework based on sufficient variable screening, abbreviated as adaptive sufficient sparse clustering (ASSC), is developed by controlling false discovery. Without assuming any specific model, ASSC employs a composite hypothesis testing procedure that leverages conditional marginal correlations of variables across distinct groups, aiming to pinpoint sufficient variables via sparse clustering and to improve the performance of sparse clustering in ultra-high dimensions. To control the false discoveries of the hypothesis testing procedure at a predetermined level, ASSC provides an adaptive threshold for identifying conditional marginal dependency, which ensures the accuracy of clustering with high probability. Under mild conditions, the sufficient screening properties and sufficient clustering properties of ASSC are established for the variable screening procedure with false discovery control. Numerical studies on synthetic data and real datasets corroborate the performance and flexibility of ASSC and underscore its utility in unsupervised learning.
Data availability
Sequence datasets that support the findings of this study were downloaded from the following websites: 1. gene expression cancer RNA-Seq: https://archive.ics.uci.edu/dataset/401/gene+expression+cancer+rna+seq 2. lung CT image dataset: https://www.kaggle.com/datasets/kmader/finding-lungs-in-ct-data
References
Adeen, N., Abdulazeez, M., Zeebaree, D.: Systematic review of unsupervised genomic clustering algorithms techniques for high dimensional datasets. Technol. Rep. Kansai Univ. 62(3), 355–374 (2020)
Amiri, S., Clarke, B.S., Clarke, J.L.: Clustering categorical data via ensembling dissimilarity matrices. J. Comput. Graph. Stat. 27(1), 195–208 (2018)
Benati, S., García, S., Puerto, J.: Mixed integer linear programming and heuristic methods for feature selection in clustering. J. Operat. Res. Soc. 69(9), 1379–1395 (2018)
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc.: Ser. B (Methodol.) 57(1), 289–300 (1995)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Costa, V., Aprile, M., Esposito, R., Ciccodicola, A.: RNA-Seq and human complex diseases: recent accomplishments and future perspectives. Eur. J. Hum. Genet. 21(2), 134–142 (2013)
Cui, H., Li, R., Zhong, W.: Model-free feature screening for ultrahigh dimensional discriminant analysis. J. Am. Stat. Assoc. 110(510), 630–641 (2015)
Chipman, H.A., Tibshirani, R.: Hybrid hierarchical clustering with applications to microarray data. Biostatistics 7, 286–301 (2005)
Chang, J., Tang, C.Y., Wu, Y.: Marginal empirical likelihood and sure independence feature screening. Ann. Stat. 41(4), 2123–2148 (2013)
Chang, X., Wang, Y., Li, R., Xu, Z.: Sparse k-means with \(\ell _\infty \)/\(\ell _0\) penalty for high-dimensional data clustering. Stat. Sin. 28(3), 1265–1284 (2018)
Donoho, D.L.: High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geomet. 35(4), 617–652 (2006)
Fan, J., Feng, Y., Song, R.: Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 106(494), 544–557 (2011)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. B Stat. Methodol. 70(5), 849–883 (2008)
Fan, J., Song, R.: Sure independence screening in generalized linear models with np-dimensionality. Ann. Stat. 38(6), 3567–3604 (2010)
Guo, J., Levina, E., Michailidis, G., Zhu, J.: Pairwise variable selection for high-dimensional model-based clustering. Biometrics 66(3), 793–804 (2010)
Guo, X., Ren, H., Zou, C., Li, R.: Threshold selection in feature screening for error rate control. J. Am. Stat. Assoc. 1–13 (2022)
Han, X.: Nonparametric screening under conditional strictly convex loss for ultrahigh dimensional sparse data. Ann. Stat. 47(4), 1995–2022 (2019)
Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)
Hallquist, M.N., Wiley, J.F.: MplusAutomation: an R package for facilitating large-scale latent variable analyses in Mplus. Struct. Equ. Model. 25(4), 621–638 (2018)
He, X., Wang, L., Hong, H.G.: Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 41(1), 342–369 (2013)
Hao, N., Zhang, H.H.: A note on high-dimensional linear regression with interactions. Am. Stat. 71(4), 291–297 (2017)
Lin, X., Guan, J., Chen, B., Zeng, Y.: Unsupervised feature selection via orthogonal basis clustering and local structure preserving. IEEE Trans. Neural Netw. Learn. Syst. 1–12 (2021)
Liu, W., Ke, Y., Liu, J., Li, R.: Model-free feature screening and FDR control with knockoff features. J. Am. Stat. Assoc. 117(537), 428–443 (2022)
Lee, H., Li, J.: Variable selection for clustering by separability based on ridgelines. J. Comput. Graph. Stat. 21(2), 315–336 (2012)
Liu, W., Li, R.: Variable selection and feature screening. In: Fuleky, P. (ed.) Macroeconomic Forecasting in the Era of Big Data, pp. 293–326. Springer, Cham (2020)
Lu, J., Lin, L.: Model-free conditional screening via conditional distance correlation. Stat. Pap. 61(1), 225–244 (2020)
Li, G., Peng, H., Zhang, J., Zhu, L.: Robust rank correlation based screening. Ann. Stat. 40(3), 1846–1877 (2012)
Lim, D.K., Rashid, N.U., Ibrahim, J.G.: Model-based feature selection and clustering of RNA-Seq data for unsupervised subtype discovery. Annals Appl. Stat. 15(1), 481–508 (2021)
Lin, L., Sun, J., Zhu, L.: Nonparametric feature screening. Comput. Stat. Data Anal. 67, 162–174 (2013)
Li, Z., Tang, J.: Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Trans. Image Process. 24(12), 5343–5355 (2015)
Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. J. Am. Stat. Assoc. 107(499), 1129–1139 (2012)
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Oakland (1967)
Mohamed, I.B., Mirakhmedov, S.M.: Approximation by normal distribution for a sample sum in sampling without replacement from a finite population. Sankhya A 78, 188–220 (2016)
Maji, P., Pal, S.K.: Fuzzy-rough feature selection using f-information measures, pp. 117–159 (2012)
Pollard, D.: Convergence of Stochastic Processes. Springer, Berlin (2012)
Qiu, H., Chen, J., Yuan, Z.: Quantile correlation-based sufficient variable screening by controlling false discovery rate. Adv. Theory Simul. 7(5), 2301099 (2024)
Rosenberg, J.M., Beymer, P.N., Anderson, D.J., van Lissa, C.J., Schmidt, J.A.: tidyLPA: an R package to easily carry out latent profile analysis (LPA) using open-source or commercial software. J. Open Sour. Softw. 3(30), 978 (2018)
Seo, B., Lin, L., Li, J.: Block-wise variable selection for clustering via latent states of mixture models. J. Comput. Graph. Stat. 31(1), 138–150 (2022)
Shao, S., Tunc, C., Al-Shawi, A., Hariri, S.: Automated twitter author clustering with unsupervised learning for social media forensics. In: 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), pp. 1–8 (2019)
Tan, M., Tsang, I.W., Wang, L.: Minimax sparse logistic regression for very high-dimensional feature selection. IEEE Trans. Neural Netw. Learn. Syst. 24(10), 1609–1622 (2013)
Tang, W., Xie, J., Lin, Y., Tang, N.: Quantile correlation-based variable selection. J. Bus. Econ. Stat. 1–13 (2021)
Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Wallace, M.L., Buysse, D.J., Germain, A., Hall, M.H., Iyengar, S.: Variable selection for skewed model-based clustering: Application to the identification of novel sleep phenotypes. J. Am. Stat. Assoc. 113(521), 95–110 (2018)
Wang, Y., Chang, X., Li, R., Xu, Z.: Sparse k-means with the \(\ell _q(0 \le q < 1)\) constraint for high-dimensional data clustering. In: 2013 IEEE 13th International Conference on Data Mining, pp. 797–806 (2013)
Wan, Q., Dingerdissen, H., Fan, Y., Gulzar, N., Pan, Y., Wu, T.-J., Yan, C., Zhang, H., Mazumder, R.: BioXpress: an integrated RNA-Seq-derived gene expression database for pan-cancer analysis. Database 2015, bav019 (2015)
Wang, H., Pang, G., Shen, C., Ma, C.: Unsupervised representation learning by predicting random distances. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 2950–2956 (2021)
Witten, D., Tibshirani, R.: A framework for feature selection in clustering. J. Am. Stat. Assoc. 105, 713–726 (2010)
Wang, B., Zhang, Y., Sun, W.W., Fang, Y.: Sparse convex clustering. J. Comput. Graph. Stat. 27(2), 393–403 (2018)
Xie, J., Lin, Y., Yan, X., Tang, N.: Category-adaptive variable screening for ultra-high dimensional heterogeneous categorical data. J. Am. Stat. Assoc. 115(530), 747–760 (2020)
Yang, M.-S., Benjamin, J.B.M.: Sparse possibilistic c-means clustering with lasso. Pattern Recogn. 138, 109348 (2023)
Yuan, Q., Chen, X., Ke, C., Yin, X.: Independence index sufficient variable screening for categorical responses. Comput. Stat. Data Anal. 174, 107530 (2022)
Yuan, Z., Chen, J., Qiu, H., Huang, Y.: Quantile-adaptive sufficient variable screening by controlling false discovery. Entropy 25(3), 524 (2023)
Yin, X., Hilafu, H.: Sequential sufficient dimension reduction for large \(p\), small \(n\) problems. J. R. Stat. Soc. B (Stat. Methodol.) 77(4), 879–892 (2015)
Yao, C., Liu, Y.-F., Jiang, B., Han, J., Han, J.: Lle score: a new filter-based unsupervised feature selection method based on nonlinear manifold embedding and its application to image recognition. IEEE Trans. Image Process. 26(11), 5257–5269 (2017)
Yu, H., Wang, Y., Zeng, D.: A general framework of nonparametric feature selection in high-dimensional data. Biometrics 79(2), 951–963 (2023)
Zhu, L.-P., Li, L., Li, R., Zhu, L.-X.: Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 106(496), 1464–1475 (2011)
Zhao, Y., Shrivastava, A.K., Tsui, K.L.: Regularized gaussian mixture model for high-dimensional clustering. IEEE Trans. Cyber. 49(10), 3677–3688 (2018)
Zhou, T., Zhu, L., Xu, C., Li, R.: Model-free forward screening via cumulative divergence. J. Am. Stat. Assoc. 115(531), 1393–1405 (2020)
Acknowledgements
Many thanks to the reviewers for their positive feedback, valuable comments, and constructive suggestions, which helped improve the quality of this article. Many thanks also to the editors for their great help and coordination in the publication of this paper.
Funding
This research was supported by the National Natural Science Foundation of China under Grant No.81671633 to J.C.
Author information
Contributions
Conceptualization, J.C.; Data curation, Z.Y.; Formal analysis, Z.Y.; Funding acquisition, J.C.; Investigation, Z.Y.; Methodology, Z.Y.; Project administration, H.Q. and H.W.; Supervision, J.C.; Visualization, H.Q. and H.W.; Validation, Y.H.; Writing-original draft, Z.Y.; Writing-review and editing, Z.Y. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Main proof
1.1 Proof of Lemma 1
Proof
According to the definition of \(\mathcal {A}_k\) in Assumption (i), for any \(\varvec{x}_{\mathcal {A}_k^c} \in \mathbb {R}_{\mathcal {A}_k^c}\), we have
Since \(\varvec{X} = ( \varvec{X}_{\mathcal {A}_k}, \varvec{X}_{\mathcal {A}_k^{c}} )\), for any \(\varvec{x}_{\mathcal {A}_k^c} \in \mathbb {R}_{\mathcal {A}_k^c}\), multiplying both sides of Eq. (A1) by \(\textrm{Pr} (\varvec{X}_{\mathcal {A}_k^{c}} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k})\) yields
By the multiplication theorem of probability, it is clear that \( \textrm{I} ( \varvec{X} \in \mathcal {C}_k ) {\not \perp \!\!\! \perp } \mathcal {A}_k^{c} \mid \mathcal {A}_k. \) By invertibility, we have proved that
Thus, \(\mathcal {A}_k\) is the sufficient screening variables index set. When \(K = 2\), it is easy to find that \(\textrm{Pr} (\varvec{X} \in \mathcal {C}_2 | \varvec{X}) = 1 -\textrm{Pr} (\varvec{X} \in \mathcal {C}_1 | \varvec{X})\). Thus, \(\mathcal {A}_1 = \mathcal {A}_2\).
Note that \(\mathcal {A}_k^c = \{ 1 \le j \le p: \textrm{I} ( \varvec{X} \in \mathcal {C}_k ) {\perp \!\!\! \perp } X_j | \varvec{X}_{\mathcal {A}_k}\}\). This indicates that for any \(\varvec{x}_{\mathcal {A}_k^c}\) belonging to the support set \(\mathbb {R}_{\mathcal {A}_k^c}\), we have
Thus, we obtain that \(F_j(x\mid \varvec{X} \in \mathcal {C}_k) - F_j(x) = 0\) holds when \(j \notin \mathcal {A}_k\). \(\square \)
1.2 Proof of Proposition 1
Proof
Suppose Assumptions (i)–(ii) hold. If \(x_j\) is not sufficiently active for the kth cluster \(\mathcal {C}_k\), we have \(F_{j, k}\left( x\right) = \textrm{Pr} \left( x_j \le x | \varvec{X} \in \mathcal {C}_k \right) = \textrm{Pr} (x_j \le x) = F_{j}\left( x\right) \). By the definition in Eq. (3), we obtain
Hence, \(\upsilon _{j k} = 12 \cdot (n + 1) \cdot p_k / (1 - p_k) \cdot \tau _{j k}^2 = 0\). When \(x_j\) is not a sufficient variable for the partition \(\mathbb {C}\), it implies that \(\upsilon _{j k} = 0\) holds for all \(k = 1, \ldots , K\). Finally, we have \(\sum _{k=1}^{K} \upsilon _{j k} = 0\). \(\square \)
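For concreteness, these rank-based utilities can be computed directly from a clustered sample. The sketch below is illustrative only: it assumes \(\xi _{ij} = \sum _{k \ne i} \textrm{I}(X_{ij} \le X_{kj})\) as in the proof of Lemma 2, takes \(\hat{\tau }_{j k}\) to be the centered within-cluster mean of the normalized ranks \((\xi _{ij} + 1)/(n+1)\) (consistent with the definition of \(\hat{\zeta }_{j}(k)\) in the proof of Theorem 1), and uses hypothetical function names rather than the authors' implementation.

```python
import numpy as np

def cluster_utilities(X, labels, K):
    """Sketch of the screening utilities tau_hat[j, k] and
    upsilon_hat[j, k] = 12 (n+1) p_k / (1 - p_k) * tau_hat[j, k]**2.
    Assumes continuous columns (no ties) and labels in {0, ..., K-1}."""
    n, p = X.shape
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # 1-based column-wise ranks
    xi = n - ranks                                 # xi[i, j] = #{k != i: X[i,j] <= X[k,j]}
    tau = np.empty((p, K))
    upsilon = np.empty((p, K))
    for k in range(K):
        mask = labels == k
        p_k = mask.mean()                          # empirical mixing proportion
        # centered mean of normalized ranks within cluster k
        tau[:, k] = (xi[mask] + 1).mean(axis=0) / (n + 1) - 0.5
        upsilon[:, k] = 12 * (n + 1) * p_k / (1 - p_k) * tau[:, k] ** 2
    return tau, upsilon
```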
1.3 Proof of Lemma 2
Proof
Let \(\Omega = \{{0}, {1}, \ldots , {n-1}\}\), and let \(\Omega _k = \{ \xi _1, \xi _2, \ldots , \xi _e \}\) \((e=1, \ldots , n)\) be a sample of size e drawn with equal probability and without replacement from the population, where \(\xi _{i}\) \((i = 1, 2, \ldots , e)\) denotes the ith random draw and follows a discrete uniform distribution on \(\{0, \ldots , n-1\}\), that is, \(\xi _{i} \sim {\text {DiscreteU}}({0}, n-1)\). \(\xi _{i}\) has the following properties:
For all \(i_1 \ne i_2\), where \(i_1, i_2 \in \{1, 2, \ldots , n\}\), we get
For continuous variables subject to an arbitrary distribution \(\varvec{X}_j=\left\{ X_{1 j}, X_{2 j}, \ldots , X_{n j} \right\} \), let \(\xi _{ij} = \sum _{k \ne i}^{n} \textrm{I} ( X_{i j} \le X_{k j} )\). It is easy to see that the random variable \(\xi _{ij}\) \((i = 1, 2, \ldots , n)\) is a special case of the setting in Mohamed and Mirakhmedov (2016) with \(e = n\). According to the definitions of \(\hat{\tau }_{j k}\) and \(\xi _{i j}\), it clearly holds that \(\textrm{Pr}(\xi _i = r)= 1/n\) \((i = 1, 2, \ldots , n;\ r = 0, 1, \ldots , n-1)\) and \(\xi _{i_1} \ne \xi _{i_2}\) \((\forall i_1 \ne i_2)\). By the definition of \(\xi _{i j}\), for all \(j=1,2,\ldots , p\) and \(k=1,2,\ldots ,K\),
If \(\textrm{H}_{0, j, k}\) is true, the expectation and variance of \(\hat{\tau }_{j k}\) can be obtained as \(\mathbb {E}(\hat{\tau }_{j k}) = 0\) and \(\textrm{Var}(\hat{\tau }_{j k}) = \frac{1 - \hat{p}_k}{12 (n+1) \hat{p}_k}\).
Let \(\Omega _{n} = (\zeta _{1 n}, \ldots , \zeta _{n n} )\) be a random permutation of \(\{0, 1, \ldots ,n-1\}\), and let \(r=(r_1, \ldots , r_n)\) be a random vector independent of \(\zeta _{1 n}, \ldots , \zeta _{n n}\) satisfying \(\textrm{Pr}\{ r_1 = k_1, \ldots , r_{n} = k_{n} \} = 1 / (n!)\), where \(k=(k_1,\ldots ,k_{n})\) is a random permutation of \(1,\ldots , n\). Note that \(S_{n \hat{p}_k, n} = \zeta _{r_1 n} + \cdots + \zeta _{r_{n \hat{p}_k} n}\) represents the sum of \(n \hat{p}_k\) samples chosen at random without replacement from the population \(\Omega _{n}\). It can be expressed equivalently as \(S_{n \hat{p}_k, n} = \textrm{I} \{ \varvec{X}_1 \in \hat{\mathcal {C}}_k \} \zeta _{1 n} + \cdots + \textrm{I} \{ \varvec{X}_n \in \hat{\mathcal {C}}_k \} \zeta _{n n}\), where \(\textrm{I}(\cdot )\) denotes the indicator function. It follows that \(\sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j}\) and \(S_{n \hat{p}_k, n}\) have the same distribution. Denote
Let \(F_{n}(u)=\textrm{Pr}\left\{ S_{n \hat{p}_k, n}< u \sigma ^{*} \sqrt{n \hat{p}_k}+ n \hat{p}_k \tau \right\} \). Since \(S_{n \hat{p}_k, n}\) and \(\sum _{i = 1}^{n} \textrm{I} (\varvec{X}_i \in \hat{\mathcal {C}}_k) \xi _{i j}\) have the same distribution, for any j we have
Considering
where the continuous variable \(x_j\) satisfies \(\textrm{Pr}(x_{ij} = x_{kj}) = 0\). Define \(\xi _{ij}^{\prime } = \sum _{k \ne i}^{n} \textrm{I} (X_{i j} \ge X_{k j})\); then \(\xi _{ij}^{\prime }\) has the same distribution as \(S_{n \hat{p}_k, n}\), and consequently \(\xi _{ij}^{\prime }\) has the same distribution as \(\xi _{ij}\). In other words, \(\hat{\tau }_{j k}\) and \(-\hat{\tau }_{j k}\) have the same distribution, that is, \(F_{\hat{\tau }_{j k}}(u) = 1 - F_{\hat{\tau }_{j k}}(-u)\). If \(\lim \limits _{n \rightarrow \infty } \hat{p}_{k} (1 - \hat{p}_{k})>0\), then by Theorem 3.4 and Corollaries 3.4–3.6 of Mohamed and Mirakhmedov (2016), we obtain the following:
-
For all \(u = o(\sqrt{n})\ge 0\), we obtain that
where \(L_{n}(v)\) is an odd power series that, for all sufficiently large n, is majorized by a power series with coefficients not depending on n and convergent in some disc; moreover, \(L_{n}(v)\) converges uniformly in n for sufficiently small values of v, with \(L_{n}(0) = 0\).
-
For all \(u = o(n^{1/4}) \ge 0\), we obtain that
-
For all \(u = o(n^{1 / 6}) \ge 0\), we obtain that
-
For all u satisfies that \(0 \le u \le C \min \left( \frac{\sqrt{{n^{*}} \hat{q}} }{ \max |\hat{Y}_{m N}|},\frac{({n^{*}} \hat{q})^{1 / 6}}{ B_{3}^{1 / 3}}\right) \), we have
Based on the above cases, using Taylor’s expansion of Eqs. (A2)–(A5), the following conclusion can be obtained by comparing the orders of u and n: if \(\textrm{H}_{0, j, k} (j = 1, 2, \ldots , p; k = 1, 2, \ldots , K)\) is true, then
where \(\Phi (-u) = 1/(2p) = O(n^{\alpha })\) and \(\alpha \le 1/2\). In other words, \(\hat{\tau }_{j k}\) has the asymptotic normal distribution \(\mathcal {N} (0, {(1 - \hat{p}_k)}/{(12 (n+1) \hat{p}_k)} )\) for all \(j \in \{ 1, 2, \ldots , p \}\) and all \(k \in \{ 1, 2, \ldots , K \}\). \(\square \)
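A small simulation can corroborate this null law. The sketch below, under the same assumed form of \(\hat{\tau }_{j k}\) as above and with cluster labels drawn independently of the data (so that \(\textrm{H}_{0, j, k}\) holds), compares the empirical standard deviation of \(\hat{\tau }_{j k}\) with \(\sqrt{(1 - \hat{p}_k)/(12 (n+1) \hat{p}_k)}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_k, reps = 500, 0.3, 5000
m = int(n * p_k)                                    # cluster size under H0
sd_theory = np.sqrt((1 - p_k) / (12 * (n + 1) * p_k))

taus = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)                      # one noise variable
    members = rng.choice(n, size=m, replace=False)  # labels independent of x
    xi = n - (x.argsort().argsort() + 1)            # xi_i = #{k != i: x_i <= x_k}
    taus[r] = (xi[members] + 1).mean() / (n + 1) - 0.5

print(taus.std(), sd_theory)                        # the two should nearly coincide
```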
1.4 Proof of Corollary 1
Proof
By the definition of \(\hat{\upsilon }_{j k}\), one can easily see that \(\hat{\upsilon }_{j k}\) has the asymptotic distribution \(\chi ^2_1\), the chi-square distribution with one degree of freedom. Since \(\sum _{k=1}^{K}\hat{p}_k \hat{\tau }_{j k} = 0\), \(\hat{\upsilon }_{j}=\sum _{k=1}^{K}\hat{\upsilon }_{j k}\) has the asymptotic distribution \(\chi ^2_{K-1}\) with \(K-1\) degrees of freedom. \(\square \)
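Corollary 1 reduces screening to ordinary chi-square calibration. A minimal sketch, reusing the output of the hypothetical `cluster_utilities` above:

```python
from scipy import stats

def screening_pvalues(upsilon, K):
    """Asymptotic p-values for H0_j: variable j is inactive for the partition.
    upsilon is the (p, K) array of per-cluster utilities; by Corollary 1 the
    row sums are asymptotically chi-square with K - 1 degrees of freedom."""
    return stats.chi2.sf(upsilon.sum(axis=1), df=K - 1)
```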
1.5 Proof of Theorem 1
Proof
Let \(p_{k}=\textrm{Pr}\left( \varvec{X} \in \mathcal {C}_k\right) \), \(F_{j}(k, x)=\textrm{I}\left( X_{j} \le x, \varvec{X} \in \mathcal {C}_k\right) \) and \(\eta _{k}=\textrm{I}\left( \varvec{X} \in \mathcal {C}_k\right) \). For some \(j=1, \ldots , p\), let \(\{X_{i j}: i=1, \ldots , n\}\) be a random sample of \(X_j\). Write \(\eta _{k i}=\textrm{I}\left( \varvec{X}_i \in \mathcal {C}_k\right) \), \(f_{i j}(k, x)=\textrm{I}\left( X_{i j} \le x, \varvec{X}_i \in \mathcal {C}_k\right) \), \(\zeta _{j}(k)=\mathbb {E}_{x}\left\{ F_{j, k}(x)\right\} \), \(\tilde{\zeta }_{j}(k)=\frac{1}{n^{2}} \sum _{l=1}^{n} \sum _{i=1}^{n} \frac{\textrm{I}\left( X_{i j} \le X_{l j}, \varvec{X}_i \in \mathcal {C}_k\right) }{\hat{p}_{k}}\) and \(\hat{\zeta }_{j}(k)=\frac{1}{n(n+1)} \sum _{l=1}^{n} \sum _{i=1}^{n} \frac{\textrm{I}\left( X_{i j} \le X_{l j}, \varvec{X}_i \in \mathcal {C}_k\right) }{\hat{p}_{k}}\). Note that \(\eta _{k i}\) follows \(\textrm{Bernoulli}\left( p_{k}\right) \) with \(\mathbb {E}\left( \eta _{k i}\right) =p_{k}\), and \(\sum _{i=1}^{n} \eta _{k i}\) follows \(\textrm{Binomial}\left( n, p_{k}\right) \). By Bernstein’s inequality, for any k and j, we have
Since \(\left| \textrm{I}\left( X_{i j} \le x\right) \right| \le 1\), it can be proved straightforwardly by Hoeffding’s inequality and empirical process theory (Pollard 2012) that, for given \(k=1, \ldots , K\) and \(j=1, \ldots , p\), we get
By Bernstein’s inequality, empirical process theory (Pollard 2012), and
for given \(k=1, \ldots , K\) and \(j=1, \ldots , p\), we obtain
Note that
where \(\hat{F}_{j}(x)=n^{-1} \sum _{i=1}^{n} \textrm{I}\left( X_{i j} \le x\right) \) and \(F_{j}(x)=\textrm{Pr}\left( X_j \le x\right) \).
In terms of \(\textrm{I}_{1}\), from the inequality
and Eq. (A8), we have
In terms of \(\textrm{I}_{2}\), noting that
write
Thereby, from Eqs. (A6) and (A7), for any \(\epsilon >0\), we have that
where \(c_{4}=\frac{\left( c_{1} / 4 K\right) ^{2}}{2\left( p_{k}K+\epsilon c_{1} / 12 \right) } >0\) and the inequality above holds based on the following inequality
Under Condition C3, \(c_{4} n \epsilon ^{2} / K<2 n \epsilon ^{2}\). Therefore, we have
Let \(\hat{\zeta }_{j}(k)-\zeta _{j}(k) \equiv \frac{n}{n+1}\left\{ \tilde{\zeta }_{j}(k)-\zeta _{j}(k)\right\} +\Delta \), where \(\Delta =-\zeta _{j}(k) /(n+1)\). It follows from Condition C3 and Eqs. (A5) and (A9) that \(K=O\left( n^{\xi }\right) \) for \(\xi + 2\kappa < 1\). By letting \(\epsilon = c_{3}^{*} n^{-\kappa } = c_{3}^{*} n^{-1/(2\beta )}\) for \(0 \le \kappa < 1 / 4\) and \(c_{3}^{*}>0\), we have
where \(c_{4}\) is a constant. The last inequality above holds due to \(\Delta = O(1 / n)\). Let \(c_3 = 12 (n+1) \hat{p}_k / (1-\hat{p}_k)c_3^{*} \), by the definition of \(\hat{\upsilon }_{j k}\), we have
where we use the inequality \(|\hat{\tau }_{j, k}+\tau _{j, k}| \le |\hat{\tau }_{j, k}| + |\tau _{j, k}| \le 1/2 + 1/2 = 1\). \(\square \)
1.6 Proof of Theorem 2
Proof
Let \(c_{3} n^{-\kappa } = \rho _0 / 4\). It follows from Condition C2 and Eq. (A9) that
where \(c_{4}^{*}>0\) is a constant. The conditions \(K \log (n)= o\left( n \rho _{0}^{2}\right) \) and \(K \log (p)=o\left( n \rho _{0}^{2}\right) \) ensure that there exists some \(n_{0}>0\) such that for \(n>n_{0}\), \(p \le \exp \left( c_{4}^{*} n \rho _{0}^{2} / 2 K\right) \) and \(c_{4}^{*} n \rho _{0}^{2} / 2 K \ge 3 \log \{4(n+2)\}\). Consequently, we have that
By the Borel–Cantelli lemma,
almost surely. \(\square \)
1.7 Proof of Corollary 2
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. First, according to Theorem 1, for some constant \(c_4>0\), we have
Write \(c_5 = (\sum _{k=1}^{K}s_k) \cdot c_5^{*}\). Then, we obtain
Similarly, letting \(c_6 = s \cdot K \cdot c_6^{*}\), we have
\(\square \)
1.8 Proof of Lemma 3
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C2) hold. For convenience, in this proof the sparse clustering utility \(\Upsilon \) is regarded as a function of a given partition \(\mathbb {C}= \{\mathcal {C}_1, \ldots , \mathcal {C}_K\}\); that is, denote the sparse clustering utilities of hypothesis tests (2) and (8) by \({\Upsilon }(\mathbb {C})\) and \({\Upsilon }^{*}(\mathbb {C})\), respectively. Correspondingly, write the sufficient variable screening utilities of the jth variable as \({\upsilon }_{j k}(\mathbb {C})\) and \({\upsilon }_{j}(\mathbb {C})\).
Let the true partition \(\mathbb {C}_0 = \left\{ \mathcal {C}_{0,1}, \ldots , \mathcal {C}_{0,K} \right\} \) separate the samples into K groups with the true mixture distributions \(\mathcal {F}_0 = \{F_{0,1}, \ldots , F_{0,K} \}\), where the mixing proportions satisfy \(p_{0k} = \textrm{Pr}(\mathcal {C}_{0, k} \mid \varvec{x})\). For any given partition \(\mathbb {C} = \left\{ \mathcal {C}_{1}, \ldots , \mathcal {C}_{K} \right\} \), write the corresponding mixture distributions as \(\mathcal {F} = \{F_{1}, \ldots , F_{K} \}\) with mixing proportions \(p_{k} = \textrm{Pr}(\varvec{X} \in \mathcal {C}_{k} )\). Denote \(\varvec{p} = (p_1, \ldots , p_K)^{\textsf{T}}\), \(\varvec{p}_0 = (p_{01}, \ldots , p_{0K})^{\textsf{T}}\) and \(\varvec{P}=[p_{kl}]_{K \times K}\), where the element \(p_{k l} = \textrm{Pr}(\varvec{X} \in \mathcal {C}_{k} \mid \varvec{X} \in \mathcal {C}_{0,l})\) of \(\varvec{P}\) represents the probability that the lth component distribution of the partition \(\mathbb {C}_0\) is assigned to the kth component distribution of the partition \(\mathbb {C}\), with \(\sum _{k=1}^{K} p_{k l} = 1\) for \(l = 1, \ldots , K\). Then for any \(\mathbb {C}\), we obtain \(\varvec{p} = \varvec{P} \cdot \varvec{p}_0\) and \(F(\varvec{x}) = \varvec{p} \cdot (F_{1}(\varvec{x}), \cdots , F_{K}(\varvec{x})) = \varvec{p}_0 \cdot (F_{01}(\varvec{x}), \cdots , F_{0K}(\varvec{x}))\). For any \(j \in \{1, \ldots , p\}\) and \(k \in \{1, \ldots , K\}\), the marginal utility satisfies
where \(p_{0 \bar{k}} = 1 - p_{0 k}\), \(p_{k \bar{k}} = 1 - p_{k k}\) and \(\tau _{j \bar{k}}(\mathbb {C}_0) = - p_k \cdot \tau _{j k}(\mathbb {C}_0) / (1-p_k)\). By the definition of \(\upsilon _{j k}\), we can obtain that
where \(\Delta _{p_{kk}} = p_{0 k} \cdot p_{k k} + p_{0 \bar{k}} \cdot p_{k \bar{k}}\). Since \((2 \cdot p_{k k} - 1)^2\) achieves its maximum if and only if \(p_{kk} = 0\) or \(p_{kk}= 1\), and \(\Delta _{p_{kk}}\) achieves its maximum if and only if \(p_{kk} = 0\) or \(p_{kk}= 1\), we have
Considering the label-free nature of clustering and the constraint that \(\sum _{k=1}^{K} p_{k l} = 1\) for any \(l=1, \ldots , K\), without loss of generality let \(p_{kk}=1\) and \(p_{k \bar{k}} = 0\). Then for any j, we conclude that
where equality holds only when the matrix \(\varvec{P}\) is the identity matrix. Thus, \(\varvec{P} = \varvec{I}_{K \times K}\) is the unique global optimal solution of \({\Upsilon _{d}}_j(\mathbb {C})\) when \(S_j = 1\). Similarly, for any j, we obtain
where equality holds only when the matrix \(\varvec{P}\) is the identity matrix, the same as for \(S_j(\mathbb {C})\). It indicates that \({\Upsilon }(\mathbb {C})\) and \({\Upsilon }^{*}(\mathbb {C})\) achieve the maximum values \({\Upsilon }(\mathbb {C}_0)\) and \({\Upsilon }^{*}(\mathbb {C}_0)\), respectively, if and only if \(\mathbb {C} = \mathbb {C}_0\), where
\(\square \)
1.9 Proof of Theorem 3
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. For constants \(c_9 = 2 \cdot c_9^{*}\) and \(c_{10} = 2 \cdot c_{10}^{*}\), the following inequalities are established by Corollary 2 and Lemma 3:
and
where \(c_{11}>0\) and \(c_{12}>0\) are some constants. As \(n \rightarrow \infty \), it can be written as
Denote \(\tilde{\mathbb {C}}_0 = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} {S}_{j k} \cdot b_{j k}\) and \(\tilde{\mathbb {C}}_0^{*} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} {S}_{j} \cdot b_{j}\). We can get that \(\liminf \limits _{n \rightarrow \infty } \tilde{\mathbb {C}}_0 = \liminf \limits _{n \rightarrow \infty } \tilde{\mathbb {C}}_0^{*}\). \(\square \)
1.10 Proof of Lemma 4
Proof
By Eq. (15), the component of the OVR BCSS-RV cost objective can be rewritten as
where we use the equation \(\bar{{\xi }}_{k j} = {n_k}^{-1} \sum _{i \in \mathcal {C}_k} \xi _{i j}\), \(\bar{{\xi }}_{k j}^{\prime } = (n - n_k)^{-1} \sum _{i \notin \mathcal {C}_k} \xi _{i j}\), and note \(\bar{{\xi }}_{j} = {n}^{-1} \sum _{i = 1}^{n} \xi _{i j} = (n + 1) / 2\). Hence, one can obtain
By Eq. (15), the component of the OVR BCSS-RV cost objective satisfies \(b_{j k} \propto (n-1)(n+1)^4 + \hat{\upsilon }_{j k}\) for any \(j = 1, \ldots , p\) and any \(k = 1, \ldots , K\). Hence, we also get that \(b_j = \sum _{k=1}^{K} b_{j k} \propto K (n-1)(n+1)^4 + \sum _{k=1}^{K} \hat{\upsilon }_{j k} = K (n-1)(n+1)^4 + \hat{\upsilon }_{j}\). Since \(b_{j_1} > b_{j_2}\) for any \(j_1 \in \hat{\mathcal {A}}\) and \(j_2 \notin \hat{\mathcal {A}}\), we obtain that \(\sum _{j=1}^{p} w_{j} b_{j}\) reaches its optimal value at \(w_j = S_j\) under \(\Vert \varvec{w}\Vert _{\infty } \le 1\) and \(\Vert \varvec{w}\Vert _{0} \le s\).
Similarly, the component of the OVR BCSS-RV cost objective can be rewritten as
where we use that \(b_j = \sum _{k=1}^{K} b_{j k}\) and \(\hat{\upsilon }_{j} = \sum _{k=1}^{K} \hat{\upsilon }_{j k} = \sum _{k=1}^{K} \frac{12 (n + 1) {\hat{p}}_k}{1 - {\hat{p}}_k} \hat{\tau }_{j k}^2\). Thus, we have
Therefore, we get that \(b_{j k} \propto (n-1)(n+1)^4 + \hat{\upsilon }_{j k}\) for any \(j = 1, \ldots , p\) and any \(k = 1, \ldots , K\). Since \(b_{j_1} > b_{j_2}\) for any \(j_1 \in \hat{\mathcal {A}_{k}}\) and \(j_2 \notin \hat{\mathcal {A}_{k}}\) with the same k, one can obtain that \(\sum _{j=1}^{p} w_{j k} b_{j k}\) reaches the optimal value at \(w_{j k} = S_{j k}\) under \(\Vert \varvec{w}_k\Vert _{\infty } \le 1\) and \(\Vert \varvec{w}_k\Vert _{0} \le s\) for all \(k = 1, \ldots , K\). \(\square \)
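In other words, the weight optimization in Lemma 4 reduces to placing unit weight on the s largest scores. A minimal sketch of that selection step (names illustrative):

```python
import numpy as np

def optimal_weights(b, s):
    """Maximize sum_j w_j * b_j subject to ||w||_inf <= 1 and ||w||_0 <= s:
    put weight 1 on the s largest entries of the score vector b."""
    w = np.zeros_like(b, dtype=float)
    w[np.argsort(b)[-s:]] = 1.0                    # indices of the s largest scores
    return w
```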
1.11 Proof of Theorem 4
Proof
Due to the equivalence of \(\hat{\upsilon }_{j k} \ge \rho \) and \(|\hat{\tau }_{j, k}| \ge \sqrt{\rho \frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}}\), the proof of Theorem 4 can be carried out in terms of \(|\hat{\tau }_{j, k}|\). For \(k = 1, \ldots , K\) and \(j = 1, \ldots , p\), note that
Then, we have that if rejecting \(\textrm{H}_{0,j,k}\), then we get
Hence, we obtain
When \(\beta \in (2, + \infty )\), \(\rho \) is a constant and hence \(\rho = o(n^{1/4})\). We then obtain
By the definition of \(F_{|\hat{\tau }_{k}|}^{c}(x)\), it implies that
In the ultra-high dimensional setting, we have \(p = o\left( \exp \{n^{c} \} \right) \) for some \(c > 0\) and \(s_k = o(n)\). If \(c > 1/2\), according to Eqs. (A10) and (A11), we have
To conclude, \(F_{|\hat{\tau }_{k}|}^{c}(k) \rightarrow 1 - e^{-1}\) almost surely as \(n \rightarrow \infty \); in other words,
The number of variables screened into the adaptive false discovery set is governed by p independent Bernoulli trials; then, the expectation and variance of the number of false discoveries are
\(\square \)
1.12 Proof of Theorem 5
Proof
According to Condition C2, the definition of \(\hat{\rho }_0\) in Sect. 4.1, and the bound \(\max \limits _{j \in \mathcal {A}_{k}} |\hat{\upsilon }_{j, k} - \upsilon _{j, k} | \le c_{13}^{*} n^{-\kappa }\) from Theorem 1, we have
Therefore, we obtain that
holds for some constant \(c_{13}>0\). \(\square \)
1.13 Proof of Corollary 3
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. First, by Theorem 1, for the positive constant \(c_{15}^{*}\) satisfying \(c_{15}^{*} n^{-\kappa } = \hat{\rho }_{0}\), we get
holds for some constant \(c_{17} > 0\). Let \(c_{15} = (K+\sum _{k=1}^{K}s_k) \cdot c_{15}^{*}\); then, by Theorems 1 and 4, we have
where
Similarly, by Theorem 1, for the positive constant \(c_{16}^{*}\) satisfying \(c_{16}^{*} n^{-\kappa } = \hat{\rho }_{0}\), we get
holds for some constant \(c_{18} > 0\). Let \(c_{16} = (s + 1) \cdot K \cdot c_{16}^{*}\). By Theorems 1 and 4, we obtain
where
\(\square \)
1.14 Proof of Theorem 6
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. By Lemma 2 and Corollary 3, there exist positive constants \(c_{19,1}\) and \(c_{19,2}\) such that
and
hold for some positive constant \(c_{21}\). Let \(c_{19} = c_{19,1} + c_{19,2}\). Thus, we obtain that
Similarly, by Lemma 2 and Corollary 3, for the constant \(c_{20} = c_{20,1} + c_{20,2}\), there exist positive constants \(c_{20,1}\) and \(c_{20,2}\) such that
and
hold for some constant \(c_{22}>0\). Then, we get that
When \(n \rightarrow \infty \), it can be written as
Denote \(\hat{\mathbb {C}}_{0, \hat{\rho }_0} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} \hat{S}_{\hat{\rho }_0, j k} \cdot \hat{\upsilon }_{j k}\) and \(\hat{\mathbb {C}}_{0, \hat{\rho }_0}^{*} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \hat{S}_{\hat{\rho }_0, j} \cdot \hat{\upsilon }_{j}\). We have that
\(\square \)
1.15 Proof of Theorem 7
Proof
Similar to the proof of Theorem 5, we have
holds for some constant \(c_{7}>0\).
To prove \(\widehat{\textrm{FDR}}_{\hat{\rho }_k} \rightarrow \alpha \) in probability, under the assumption that \(q_k/p \rightarrow 1\) as \(p\rightarrow \infty \) and for any \(\rho > 0\), by Corollary 1 and Hoeffding’s inequality, it suffices to show that
in probability as \(n \rightarrow \infty \), where \(\epsilon > 0\) is a constant and \(S_{\chi ^2_1}(\rho ) = 1 - F_{\chi ^2_1}(\rho )\) is the survival function of the \(\chi ^2_1\) distribution. Thus,
Notice that \(\sum _{j \in \mathcal {A}_{k}} I({\hat{\upsilon }}_{j k} \ge \rho )\) is monotone in \(\rho \) and asymptotically converges to \(s_k\), and \(S_{\chi ^2_1}(\rho )\) is continuous and monotone. Then, there exists a unique constant \(0< \tilde{\rho }_k \le C n^{-\beta }\) such that
in probability as \(n \rightarrow \infty \). Therefore, according to the Eqs. (A12) and (A13), we obtain that
\(\square \)
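The plug-in estimator appearing in this proof suggests a direct computation of the data-driven threshold. The sketch below assumes \(\widehat{\textrm{FDR}}_{\rho } = p \, S_{\chi ^2_1}(\rho ) / \max \{1, \#\{j: \hat{\upsilon }_{j k} \ge \rho \}\}\) and scans the observed utilities as candidate thresholds; it is an illustration, not the paper's implementation.

```python
import numpy as np
from scipy import stats

def adaptive_threshold(upsilon_k, alpha):
    """Smallest observed utility rho whose estimated FDR is at most alpha."""
    p = len(upsilon_k)
    for rho in np.sort(upsilon_k):                 # ascending candidate thresholds
        n_selected = np.sum(upsilon_k >= rho)
        fdr_hat = p * stats.chi2.sf(rho, df=1) / max(1, n_selected)
        if fdr_hat <= alpha:
            return rho
    return np.inf                                  # nothing passes at level alpha
```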
1.16 Proof of Corollary 4
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. By Theorem 1, for the positive constant \(c_{24}^{*}\) satisfying \(c_{24}^{*} n^{-\kappa } = \max \{ \hat{\rho }_{\alpha , 1}, \ldots , \hat{\rho }_{\alpha , K} \}\),
holds for some constant \(c_{26} > 0\). Noting that
by Theorems 1, 4, and 7, letting \(c_{24} = c_{24}^{*} \sum _{k=1}^{K}s_k/(1-\alpha )\), we have
Similarly, by Theorem 1, for the constant \(c_{25}^{*} > 0\) satisfying \(c_{25}^{*} n^{-\kappa } = \hat{\rho }_{\alpha }\),
holds for some constant \(c_{27} > 0\). Let \(c_{25} = c_{25}^{*} Ks/(1-\alpha )\). By Theorems 1 and 4, we obtain
\(\square \)
1.17 Proof of Theorem 8
Proof
Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. By Lemma 2 and Corollary 4, there exist positive constants \(c_{28,1}\) and \(c_{28,2}\) with \(c_{28}= c_{28,1} + c_{28,2}\) such that
hold for some positive constant \(c_{21}\). Thus, we obtain
Similarly, by Lemma 2 and Corollary 4, for the constant \(c_{29} = c_{29,1} + c_{29,2}\), there exist positive constants \(c_{29,1}\) and \(c_{29,2}\) such that
hold for some positive constant \(c_{31}\). Then, we get
When \(n \rightarrow \infty \), it can be written as
Denote \(\hat{\mathbb {C}}_{0, \alpha } = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} \hat{S}_{\alpha , j k} \cdot \hat{\upsilon }_{j k}\) and \(\hat{\mathbb {C}}_{0, \alpha }^{*} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \hat{S}_{\alpha , j} \cdot \hat{\upsilon }_{j}\). We have that
\(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yuan, Z., Chen, J., Qiu, H. et al. Adaptive sufficient sparse clustering by controlling false discovery. Stat Comput 34, 193 (2024). https://doi.org/10.1007/s11222-024-10507-4
Keywords
- Unsupervised ultra-high dimensional data
- Sufficient variable screening
- Adaptive sufficient sparse clustering
- False discovery control
- Sufficient clustering property