Adaptive sufficient sparse clustering by controlling false discovery

Original Paper · Statistics and Computing

Abstract

Sparse clustering divides samples into distinct groups while simultaneously reducing dimensionality in ultra-high dimensional unsupervised learning. To achieve sparse clustering that screens out noise variables, an adaptive sparse clustering framework based on sufficient variable screening, abbreviated as adaptive sufficient sparse clustering (ASSC), is developed by controlling false discovery. Without assuming any specific model, ASSC employs a composite hypothesis testing procedure that leverages conditional marginal correlations of variables across distinct groups, aiming to pinpoint sufficient variables for sparse clustering and to improve the performance of sparse clustering in ultra-high dimensions. To control the false discoveries of the hypothesis testing procedure at a predetermined level, ASSC provides an adaptive threshold for identifying conditional marginal dependence, which ensures the accuracy of clustering with high probability. Under mild conditions, the sufficient screening and sufficient clustering properties of ASSC are established for the variable screening procedure under false discovery control. Numerical studies on synthetic data and real datasets corroborate the performance and flexibility of ASSC and underscore its utility in unsupervised learning.


Data availability

The data sets that support the findings of this study can be downloaded from the following websites:

  1. Gene expression cancer RNA-Seq: https://archive.ics.uci.edu/dataset/401/gene+expression+cancer+rna+seq
  2. Lung CT image dataset: https://www.kaggle.com/datasets/kmader/finding-lungs-in-ct-data

References

  • Adeen, N., Abdulazeez, M., Zeebaree, D.: Systematic review of unsupervised genomic clustering algorithms techniques for high dimensional datasets. Technol. Rep. Kansai Univ. 62(3), 355–374 (2020)

  • Amiri, S., Clarke, B.S., Clarke, J.L.: Clustering categorical data via ensembling dissimilarity matrices. J. Comput. Graph. Stat. 27(1), 195–208 (2018)

  • Benati, S., García, S., Puerto, J.: Mixed integer linear programming and heuristic methods for feature selection in clustering. J. Oper. Res. Soc. 69(9), 1379–1395 (2018)

  • Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57(1), 289–300 (1995)

  • Bishop, C.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  • Costa, V., Aprile, M., Esposito, R., Ciccodicola, A.: RNA-Seq and human complex diseases: recent accomplishments and future perspectives. Eur. J. Hum. Genet. 21(2), 134–142 (2013)

  • Cui, H., Li, R., Zhong, W.: Model-free feature screening for ultrahigh dimensional discriminant analysis. J. Am. Stat. Assoc. 110(510), 630–641 (2015)

  • Chipman, H.A., Tibshirani, R.: Hybrid hierarchical clustering with applications to microarray data. Biostatistics 7, 286–301 (2005)

  • Chang, J., Tang, C.Y., Wu, Y.: Marginal empirical likelihood and sure independence feature screening. Ann. Stat. 41(4), 2123–2148 (2013)

  • Chang, X., Wang, Y., Li, R., Xu, Z.: Sparse k-means with \(\ell _\infty \)/\(\ell _0\) penalty for high-dimensional data clustering. Stat. Sin. 28(3), 1265–1284 (2018)

  • Donoho, D.L.: High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35(4), 617–652 (2006)

  • Fan, J., Feng, Y., Song, R.: Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 106(494), 544–557 (2011)

  • Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. B Stat. Methodol. 70(5), 849–883 (2008)

  • Fan, J., Song, R.: Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 38(6), 3567–3604 (2010)

  • Guo, J., Levina, E., Michailidis, G., Zhu, J.: Pairwise variable selection for high-dimensional model-based clustering. Biometrics 66(3), 793–804 (2010)

  • Guo, X., Ren, H., Zou, C., Li, R.: Threshold selection in feature screening for error rate control. J. Am. Stat. Assoc. 1–13 (2022)

  • Han, X.: Nonparametric screening under conditional strictly convex loss for ultrahigh dimensional sparse data. Ann. Stat. 47(4), 1995–2022 (2019)

  • Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)

  • Hallquist, M.N., Wiley, J.F.: MplusAutomation: an R package for facilitating large-scale latent variable analyses in Mplus. Struct. Equ. Model. 25(4), 621–638 (2018)

  • He, X., Wang, L., Hong, H.G.: Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 41(1), 342–369 (2013)

  • Hao, N., Zhang, H.H.: A note on high-dimensional linear regression with interactions. Am. Stat. 71(4), 291–297 (2017)

  • Lin, X., Guan, J., Chen, B., Zeng, Y.: Unsupervised feature selection via orthogonal basis clustering and local structure preserving. IEEE Trans. Neural Netw. Learn. Syst. 1–12 (2021)

  • Liu, W., Ke, Y., Liu, J., Li, R.: Model-free feature screening and FDR control with knockoff features. J. Am. Stat. Assoc. 117(537), 428–443 (2022)

  • Lee, H., Li, J.: Variable selection for clustering by separability based on ridgelines. J. Comput. Graph. Stat. 21(2), 315–336 (2012)

  • Liu, W., Li, R.: Variable selection and feature screening. In: Fuleky, P. (ed.) Macroeconomic Forecasting in the Era of Big Data, pp. 293–326. Springer, Cham (2020)

  • Lu, J., Lin, L.: Model-free conditional screening via conditional distance correlation. Stat. Pap. 61(1), 225–244 (2020)

  • Li, G., Peng, H., Zhang, J., Zhu, L.: Robust rank correlation based screening. Ann. Stat. 40(3), 1846–1877 (2012)

  • Lim, D.K., Rashid, N.U., Ibrahim, J.G.: Model-based feature selection and clustering of RNA-Seq data for unsupervised subtype discovery. Ann. Appl. Stat. 15(1), 481–508 (2021)

  • Lin, L., Sun, J., Zhu, L.: Nonparametric feature screening. Comput. Stat. Data Anal. 67, 162–174 (2013)

  • Li, Z., Tang, J.: Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Trans. Image Process. 24(12), 5343–5355 (2015)

  • Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. J. Am. Stat. Assoc. 107(499), 1129–1139 (2012)

  • MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Oakland (1967)

  • Mohamed, I.B., Mirakhmedov, S.M.: Approximation by normal distribution for a sample sum in sampling without replacement from a finite population. Sankhya A 78, 188–220 (2016)

  • Maji, P., Pal, S.K.: Fuzzy-rough feature selection using f-information measures, pp. 117–159 (2012)

  • Pollard, D.: Convergence of Stochastic Processes. Springer, Berlin (2012)

  • Qiu, H., Chen, J., Yuan, Z.: Quantile correlation-based sufficient variable screening by controlling false discovery rate. Adv. Theory Simul. 7(5), 2301099 (2024)

  • Rosenberg, J.M., Beymer, P.N., Anderson, D.J., van Lissa, C.J., Schmidt, J.A.: tidyLPA: an R package to easily carry out latent profile analysis (LPA) using open-source or commercial software. J. Open Source Softw. 3(30), 978 (2018)

  • Seo, B., Lin, L., Li, J.: Block-wise variable selection for clustering via latent states of mixture models. J. Comput. Graph. Stat. 31(1), 138–150 (2022)

  • Shao, S., Tunc, C., Al-Shawi, A., Hariri, S.: Automated Twitter author clustering with unsupervised learning for social media forensics. In: 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), pp. 1–8 (2019)

  • Tan, M., Tsang, I.W., Wang, L.: Minimax sparse logistic regression for very high-dimensional feature selection. IEEE Trans. Neural Netw. Learn. Syst. 24(10), 1609–1622 (2013)

  • Tang, W., Xie, J., Lin, Y., Tang, N.: Quantile correlation-based variable selection. J. Bus. Econ. Stat. 1–13 (2021)

  • Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  • Wallace, M.L., Buysse, D.J., Germain, A., Hall, M.H., Iyengar, S.: Variable selection for skewed model-based clustering: application to the identification of novel sleep phenotypes. J. Am. Stat. Assoc. 113(521), 95–110 (2018)

  • Wang, Y., Chang, X., Li, R., Xu, Z.: Sparse k-means with the \(\ell _q(0 \le q < 1)\) constraint for high-dimensional data clustering. In: 2013 IEEE 13th International Conference on Data Mining, pp. 797–806 (2013)

  • Wan, Q., Dingerdissen, H., Fan, Y., Gulzar, N., Pan, Y., Wu, T.-J., Yan, C., Zhang, H., Mazumder, R.: BioXpress: an integrated RNA-Seq-derived gene expression database for pan-cancer analysis. Database 2015, bav019 (2015)

  • Wang, H., Pang, G., Shen, C., Ma, C.: Unsupervised representation learning by predicting random distances. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 2950–2956 (2021)

  • Witten, D., Tibshirani, R.: A framework for feature selection in clustering. J. Am. Stat. Assoc. 105, 713–726 (2010)

  • Wang, B., Zhang, Y., Sun, W.W., Fang, Y.: Sparse convex clustering. J. Comput. Graph. Stat. 27(2), 393–403 (2018)

  • Xie, J., Lin, Y., Yan, X., Tang, N.: Category-adaptive variable screening for ultra-high dimensional heterogeneous categorical data. J. Am. Stat. Assoc. 115(530), 747–760 (2020)

  • Yang, M.-S., Benjamin, J.B.M.: Sparse possibilistic c-means clustering with Lasso. Pattern Recogn. 138, 109348 (2023)

  • Yuan, Q., Chen, X., Ke, C., Yin, X.: Independence index sufficient variable screening for categorical responses. Comput. Stat. Data Anal. 174, 107530 (2022)

  • Yuan, Z., Chen, J., Qiu, H., Huang, Y.: Quantile-adaptive sufficient variable screening by controlling false discovery. Entropy 25(3), 524 (2023)

  • Yin, X., Hilafu, H.: Sequential sufficient dimension reduction for large \(p\), small \(n\) problems. J. R. Stat. Soc. B (Stat. Methodol.) 77(4), 879–892 (2015)

  • Yao, C., Liu, Y.-F., Jiang, B., Han, J., Han, J.: LLE score: a new filter-based unsupervised feature selection method based on nonlinear manifold embedding and its application to image recognition. IEEE Trans. Image Process. 26(11), 5257–5269 (2017)

  • Yu, H., Wang, Y., Zeng, D.: A general framework of nonparametric feature selection in high-dimensional data. Biometrics 79(2), 951–963 (2023)

  • Zhu, L.-P., Li, L., Li, R., Zhu, L.-X.: Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 106(496), 1464–1475 (2011)

  • Zhao, Y., Shrivastava, A.K., Tsui, K.L.: Regularized Gaussian mixture model for high-dimensional clustering. IEEE Trans. Cybern. 49(10), 3677–3688 (2018)

  • Zhou, T., Zhu, L., Xu, C., Li, R.: Model-free forward screening via cumulative divergence. J. Am. Stat. Assoc. 115(531), 1393–1405 (2020)


Acknowledgements

Many thanks to the reviewers for their positive feedback, valuable comments, and constructive suggestions that helped improve the quality of this article. Many thanks also to the editors for their great help and coordination in publishing this paper.

Funding

This research was supported by the National Natural Science Foundation of China under Grant No. 81671633 to J.C.

Author information


Contributions

Conceptualization, J.C.; Data curation, Z.Y.; Formal analysis, Z.Y.; Funding acquisition, J.C.; Investigation, Z.Y.; Methodology, Z.Y.; Project administration, H.Q. and H.W.; Supervision, J.C.; Visualization, H.Q. and H.W.; Validation, Y.H.; Writing – original draft, Z.Y.; Writing – review and editing, Z.Y. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jiaqing Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Main proofs

1.1 Proof of Lemma 1

Proof

According to the definition of \(\mathcal {A}_k\) in Assumption (i), for any \(\varvec{x}_{\mathcal {A}_k^c} \in \mathbb {R}_{\mathcal {A}_k^c}\), we have

$$\begin{aligned} \textrm{Pr} (\varvec{X} \in \mathcal {C}_k \mid \varvec{X}) = \textrm{Pr} (\varvec{X} \in \mathcal {C}_k \mid \varvec{X}_{\mathcal {A}_{{k}}}). \end{aligned}$$
(A1)

Since \(\varvec{X} = ( \varvec{X}_{\mathcal {A}_k}, \varvec{X}_{\mathcal {A}_k^{c}} )\), multiplying both sides of Eq. (A1) by \(\textrm{Pr} (\varvec{X}_{\mathcal {A}_k^{c}} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k})\) for any \(\varvec{x}_{\mathcal {A}_k^c} \in \mathbb {R}_{\mathcal {A}_k^c}\), we obtain that

$$\begin{aligned}&\textrm{Pr} (\varvec{X} \in \mathcal {C}_k, \varvec{X}_{\mathcal {A}^{c}_k} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k}) \\&= \textrm{Pr} (\varvec{X}_{\mathcal {A}^{c}_k} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k}) \cdot \textrm{Pr} (\varvec{X} \in \mathcal {C}_k \mid \varvec{X}_{\mathcal {A}_k}). \end{aligned}$$

By the multiplication theorem of probability, it is clear that \( \textrm{I} ( \varvec{X} \in \mathcal {C}_k ) {\perp \!\!\! \perp } \varvec{X}_{\mathcal {A}_k^{c}} \mid \varvec{X}_{\mathcal {A}_k}. \) Since these steps are invertible, we have proved that

$$\begin{aligned} \mathcal {A}_k&= \{ 1 \le j \le p: \textrm{Pr} (\varvec{X} \in \mathcal {C}_k \mid \varvec{X}) \\&\qquad \text { functionally depends on } X_j \} \\&= \{ 1 \le j \le p: \textrm{I} ( \varvec{X} \in \mathcal {C}_k ) {\not \perp \!\!\! \perp } X_j \mid \varvec{X}_{\mathcal {A}_k}\}. \end{aligned}$$

Thus, \(\mathcal {A}_k\) is the index set of sufficient screening variables. When \(K = 2\), it is easy to see that \(\textrm{Pr} (\varvec{X} \in \mathcal {C}_2 \mid \varvec{X}) = 1 -\textrm{Pr} (\varvec{X} \in \mathcal {C}_1 \mid \varvec{X})\). Thus, \(\mathcal {A}_1 = \mathcal {A}_2\).

Note that \(\mathcal {A}_k^c = \{ 1 \le j \le p: \textrm{I} ( \varvec{X} \in \mathcal {C}_k ) {\perp \!\!\! \perp } X_j \mid \varvec{X}_{\mathcal {A}_k}\}\). This indicates that for any \(\varvec{x}_{\mathcal {A}_k^c}\) belonging to the support set \(\mathbb {R}_{\mathcal {A}_k^c}\), we have

$$\begin{aligned}&\textrm{Pr} ( \varvec{X} \in \mathcal {C}_k , \varvec{X}_{\mathcal {A}_k^c} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k} ) \\&= \textrm{Pr} ( \varvec{X} \in \mathcal {C}_k \mid \varvec{X}_{\mathcal {A}_k} ) \cdot \textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k} )\\ \iff&\textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X} \in \mathcal {C}_k , \varvec{X}_{\mathcal {A}_k} ) \\&= \textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k} ) \\ \iff&\mathbb {E}_{\varvec{X}_{\mathcal {A}_k^c}} \left\{ \textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X} \in \mathcal {C}_k , \varvec{X}_{\mathcal {A}_k} ) \right\} \\&= \mathbb {E}_{\varvec{X}_{\mathcal {A}_k^c}} \left\{ \textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} \le \varvec{x}_{\mathcal {A}_k^c} \mid \varvec{X}_{\mathcal {A}_k} ) \right\} \\ \iff&\textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} \mid \varvec{X} \in \mathcal {C}_k ) = \textrm{Pr} ( \varvec{X}_{\mathcal {A}_k^c} ) . \end{aligned}$$

Thus, we obtain that \(F_j(x\mid \varvec{X} \in \mathcal {C}_k) - F_j(x) = 0\) holds when \(j \notin \mathcal {A}_k\). \(\square \)
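The following is a small numerical sketch of Lemma 1's conclusion (my own illustration, not the authors' code; the two-cluster Gaussian setup and all names are assumptions): for a noise variable \(j \notin \mathcal {A}_k\), the empirical analogue of \(F_j(x \mid \varvec{X} \in \mathcal {C}_k) - F_j(x)\) should vanish, while for an active variable it stays bounded away from zero.

```python
# A minimal illustration of Lemma 1 (not the authors' code): for a noise
# variable j not in A_k, the within-cluster CDF F_j(x | X in C_k) matches the
# marginal CDF F_j(x); for an active variable the two differ.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 5
labels = rng.integers(0, 2, size=n)        # two clusters C_1, C_2
X = rng.normal(size=(n, p))
X[:, 0] += 3.0 * labels                    # only variable j = 0 is active

def cdf_gap(x_col, mask):
    """sup over the sample points of |F_hat_j(x | cluster) - F_hat_j(x)|."""
    grid = np.sort(x_col)
    f_cond = np.searchsorted(np.sort(x_col[mask]), grid, side="right") / mask.sum()
    f_marg = np.arange(1, len(grid) + 1) / len(grid)
    return np.max(np.abs(f_cond - f_marg))

print("active j=0:", cdf_gap(X[:, 0], labels == 1))   # clearly bounded away from 0
print("noise  j=1:", cdf_gap(X[:, 1], labels == 1))   # ~ O(n^{-1/2})
```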

1.2 Proof of Proposition 1

Proof

Suppose Assumptions (i)–(ii) hold. If \(x_j\) is not a sufficient active variable for the kth cluster \(\mathcal {C}_k\), we have \(F_{j, k}\left( x\right) = \textrm{Pr} \left( x_j \le x \mid \varvec{X} \in \mathcal {C}_k \right) = \textrm{Pr} (x_j \le x) = F_{j}\left( x\right) \). By the definition in Eq. (3), we obtain

$$\begin{aligned} \tau _{j k}&= \mathbb {E}_x \left\{ F_{j, k}\left( x\right) - F_{j}\left( x\right) \right\} = \mathbb {E}_x\left\{ F_{j, k}\left( x\right) \right\} -\frac{1}{2} \\&= \mathbb {E}_x\left\{ F_{j}\left( x\right) \right\} - \frac{1}{2} = 0. \end{aligned}$$

Hence, \(\upsilon _{j k} = 12 (n + 1) p_k \tau _{j k}^2 / (1 - p_k) = 0\). When \(x_j\) is not a sufficient variable for the partition \(\mathbb {C}\), \(\upsilon _{j k} = 0\) holds for all \(k = 1, \ldots , K\). Finally, we have \(\sum _{k=1}^{K} \upsilon _{j k} = 0\). \(\square \)
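To make the quantities in Proposition 1 concrete, here is a hedged sketch (under my reading of Eq. (3) and the utility definition above; the data-generating setup and names are invented for illustration) of how \(\hat{\tau }_{j k}\) and \(\hat{\upsilon }_{j k}\) can be computed from a sample with no ties:

```python
# A sketch (assumptions mine) of the screening statistics: tau_hat estimates
# tau_{jk}, and upsilon_hat is its scaled form 12 (n+1) p_hat / (1 - p_hat) * tau_hat^2.
import numpy as np

def tau_hat(xj, mask):
    """Rank-based estimate of tau_{jk}; assumes continuous data with no ties."""
    n = len(xj)
    counts = n - np.argsort(np.argsort(xj))   # #{l : x_ij <= x_lj}
    return counts[mask].sum() / (n * (n + 1) * mask.mean()) - 0.5

def upsilon_hat(xj, mask):
    n, p_hat = len(xj), mask.mean()
    return 12 * (n + 1) * p_hat / (1 - p_hat) * tau_hat(xj, mask) ** 2

rng = np.random.default_rng(1)
n = 1000
labels = rng.integers(0, 2, size=n)
active = rng.normal(size=n) + 2.0 * labels    # distribution shifts across clusters
noise = rng.normal(size=n)                    # carries no cluster information
print(upsilon_hat(active, labels == 1))       # large: variable is sufficient
print(upsilon_hat(noise, labels == 1))        # O_p(1): roughly chi^2_1 under H_0
```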

1.3 Proof of Lemma 2

Proof

Let \(\Omega = \{{0}, {1}, \ldots , {n-1}\}\), and let \(\Omega _k = \{ \xi _1, \xi _2, \ldots , \xi _e \}\) \((e=1, \ldots , n)\) be a sample of size e drawn with equal probability and without replacement from the population, where \(\xi _{i}\) \((i = 1, 2, \ldots , e)\) denotes the ith random sample and marginally follows a discrete uniform distribution on \(\{0, \ldots , n-1\}\), that is, \(\xi _{i} \sim {\text {DiscreteU}}({0}, n-1)\). \(\xi _{i}\) has the following properties:

$$\begin{aligned}&\mathbb {E} \xi _{i} = \frac{n - 1}{2}, \quad \mathbb {E} \xi ^2_i = \frac{(n - 1)(2n - 1)}{6}, \\&{\text {Var}}\xi _{i} = \frac{n^2 - 1}{12}. \end{aligned}$$

For all \(i_1 \ne i_2\), where \(i_1, i_2 \in \{1, 2, \ldots , n\}\), we get

$$\begin{aligned} \mathbb {E} \xi _{i_1} \xi _{i_2} = \frac{(n - 2)(3n - 1)}{12}, \quad {\text {Cov}} \left( \xi _{i_1}, \xi _{i_2} \right) = -\frac{n + 1}{12}. \end{aligned}$$
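These moment identities can be verified by brute force; the following check (mine, not part of the proof) enumerates the population \(\{0, \ldots , n-1\}\) exactly for a small n:

```python
# A brute-force check (not part of the proof) of the stated moments for
# equal-probability sampling without replacement from {0, 1, ..., n-1}.
from fractions import Fraction

n = 6
pop = list(range(n))
pairs = [(a, b) for a in pop for b in pop if a != b]

E1 = Fraction(sum(pop), n)                                # E xi_i
E2 = Fraction(sum(a * a for a in pop), n)                 # E xi_i^2
E12 = Fraction(sum(a * b for a, b in pairs), len(pairs))  # E xi_{i1} xi_{i2}

assert E1 == Fraction(n - 1, 2)
assert E2 == Fraction((n - 1) * (2 * n - 1), 6)
assert E2 - E1 ** 2 == Fraction(n * n - 1, 12)            # Var xi_i
assert E12 == Fraction((n - 2) * (3 * n - 1), 12)
assert E12 - E1 ** 2 == Fraction(-(n + 1), 12)            # Cov(xi_{i1}, xi_{i2})
print("all moment identities hold for n =", n)
```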

For continuous variables \(\varvec{X}_j=\left\{ X_{1 j}, X_{2 j}, \ldots , X_{n j} \right\} \) subject to an arbitrary distribution, let \(\xi _{ij} = \sum _{k \ne i}^{n} \textrm{I} ( X_{i j} \le X_{k j} )\). It is easy to see that the random variables \(\xi _{ij}\) \((i = 1, 2, \ldots , n)\) form a special case of Mohamed and Mirakhmedov (2016) with \(e = n\). According to the definitions of \(\hat{\tau }_{j k}\) and \(\xi _{i j}\), it clearly holds that \(\textrm{Pr}(\xi _i = r)= 1/n\) \((i = 1, 2, \ldots , n;\; r = 0, 1, \ldots , n-1)\) and \(\xi _{i_1} \ne \xi _{i_2}\) \((\forall i_1 \ne i_2)\). By the definition of \(\xi _{i j}\), for all \(j=1,2,\ldots , p\) and \(k=1,2,\ldots ,K\),

$$\begin{aligned}&\frac{1}{n+1} \sum _{k=1}^{n}{ \frac{1}{n} \sum _{i=1}^{n} \frac{\textrm{I}(X_{ij} \le X_{kj}, \varvec{X}_i \in \hat{\mathcal {C}}_k )}{{\hat{p}}_k}} \\&= \frac{1}{n(n+1){\hat{p}}_k} \sum _{k=1}^{n} \sum _{i=1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \textrm{I}(X_{ij}\le X_{kj}) \\&= \frac{1}{n(n+1){\hat{p}}_k} \left[ \sum _{k=1}^{n} \sum _{i \ne k}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \textrm{I}(X_{ij}\le X_{kj}) + \right. \\&\quad \left. + \sum _{k=1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \right] \\&= \frac{1}{n(n+1){\hat{p}}_k} \sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j} + \frac{1}{n+1}. \end{aligned}$$

If \(\textrm{H}_{0, j, k}\) is true, the expectation and variance of \(\hat{\tau }_{j k}\) can be obtained as follows:

$$\begin{aligned}&\mathbb {E} \left\{ \hat{\tau }_{j k} \right\} \\&= \mathbb {E} \left\{ \frac{1}{n+1} \sum _{k=1}^{n} { \frac{1}{n} \sum _{i=1}^{n}\frac{ \textrm{I} ( X_{ij} \le X_{kj}, \varvec{X}_i \in \hat{\mathcal {C}}_k ) }{ {\hat{p}}_k}} - \frac{1}{2} \right\} \\&= \mathbb {E} \left\{ \frac{1}{n(n+1){\hat{p}}_k} \sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j} \right\} + \frac{1}{n+1} - \frac{1}{2} \\&= \frac{1}{n(n+1){\hat{p}}_k} \sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \mathbb {E} \left\{ \xi _{i j} \right\} + \frac{1}{n+1} - \frac{1}{2} \\&= \frac{1}{n(n+1){\hat{p}}_k} \frac{n-1}{2} n {\hat{p}}_k + \frac{1}{n+1} - \frac{1}{2} =0 \end{aligned}$$

and

$$\begin{aligned}&{\text {Var}}\left\{ \hat{\tau }_{j k} \right\} \\&= {\text {Var}} \left\{ \frac{1}{n+1}\sum _{k=1}^{n}{\frac{1}{n}\sum _{i=1}^{n}\frac{\textrm{I}(X_{ij}\le X_{kj}, \varvec{X}_i \in \hat{\mathcal {C}}_k )}{{\hat{p}}_k}} - \frac{1}{2} \right\} \\&= {\text {Var}} \left\{ \frac{1}{n(n+1){\hat{p}}_k} \sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j} \right\} \\&= \frac{1}{n^2 (n+1)^2 {{\hat{p}}_k}^2} \left[ \sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) {\text {Var}} \left\{ \xi _{i j} \right\} \right. \\&\quad \left. + \sum _{i_1 \ne i_2}^{n} \textrm{I}\left( \varvec{X}_{i_1} \in \hat{\mathcal {C}}_k, \varvec{X}_{i_2} \in \hat{\mathcal {C}}_k \right) {\text {Cov}} \left( \xi _{i_1 j},\xi _{i_2 j}\right) \right] \\&= \frac{1}{n^2 (n+1)^2 {{\hat{p}}_k}^2} \left\{ n {\hat{p}}_k \frac{n^{2} - 1}{12} - n {\hat{p}}_k \left( n \hat{p}_{k} - 1\right) \frac{n+1}{12} \right\} \\&= \frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}. \end{aligned}$$

Let \(\Omega _{n} = (\zeta _{1 n}, \ldots , \zeta _{n n} )\) be a random permutation of \(\{0,1,\ldots ,n-1\}\), and let \(r=(r_1, \ldots , r_n)\) be a random vector independent of \(\zeta _{1 n}, \ldots , \zeta _{n n}\) satisfying \(\textrm{Pr}\{ r_1 = k_1, \ldots , r_{n} = k_{n} \} = 1 / (n!)\), where \(k=(k_1,\ldots ,k_{n})\) is any permutation of \(1,\ldots , n\). Note that \(S_{n \hat{p}_k, n} = \zeta _{r_1 n} + \cdots + \zeta _{r_{n \hat{p}_k} n}\) represents the sum of \(n \hat{p}_k\) samples chosen at random without replacement from the population \(\Omega _{n}\). It can be expressed equivalently as \(S_{n \hat{p}_k, n} = \textrm{I} \{ \varvec{X}_1 \in \hat{\mathcal {C}}_k \} \zeta _{1 n} + \cdots + \textrm{I} \{ \varvec{X}_n \in \hat{\mathcal {C}}_k \} \zeta _{n n}\), where \(\textrm{I}(\cdot )\) denotes the indicator function. It follows that \(\sum _{i = 1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j}\) and \(S_{n \hat{p}_k, n}\) have the same distribution. Denote

$$\begin{aligned}&\tau =\frac{n - 1}{2}, \quad b^{2}=\frac{n^2-1}{12}, \quad \hat{\zeta }_{m n}= \frac{\left( \zeta _{m n}-\tau \right) }{b}, \\&\sigma ^{2}= \frac{(1-\hat{p}_k)(n^2-1)}{12}, \quad {\sigma ^{*}}^{2}= \frac{(1-\hat{p}_k)(n+1)n}{12}, \\&A_{k}=\frac{1}{n}\sum _{m=1}^{n} \hat{\zeta }_{m n}^{k}, \quad B_{k}=\frac{1}{n}\sum _{m=1}^{n} |\hat{\zeta }_{m n}|^{k}. \end{aligned}$$

Let \(F_{n}(u)=\textrm{Pr}\left\{ S_{n \hat{p}_k, n}< u \sigma ^{*} \sqrt{n \hat{p}_k}+ n \hat{p}_k \tau \right\} \). Due to the same distribution of \(S_{n \hat{p}_k, n}\) and \(\sum _{i = 1}^{n} \textrm{I} (\varvec{X}_i \in \hat{\mathcal {C}}_k) \xi _{i j}\), for any j, we have

$$\begin{aligned}&F_{n}(u) \\&= \textrm{Pr}\left\{ \sum _{i = 1}^{n} \textrm{I} \left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j}< u \sigma ^{*} \sqrt{n \hat{p}_k}+ n \hat{p}_k \tau \right\} \\&= \textrm{Pr}\left\{ \frac{1}{n \hat{p}_k (n+1)} \left( \sum _{i = 1}^{n} \textrm{I} \left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j} - n \hat{p}_k \frac{n - 1}{2} \right) \right. \\&\quad \left.< u \frac{\sigma ^{*}}{\sqrt{n \hat{p}_k}(n+1)} \right\} \\&= \textrm{Pr}\left\{ \hat{\tau }_{j k} < u \sqrt{\frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}} \right\} := F_{\hat{\tau }_{j k}}(u). \end{aligned}$$

Considering

$$\begin{aligned}&\hat{\tau }_{j k} \\&= \frac{1}{n+1} \sum _{k=1}^{n} {\frac{1}{n} \sum _{i=1}^{n} \frac{\textrm{I} (X_{ij}\le X_{kj}, \varvec{X}_i \in \hat{\mathcal {C}}_k )}{{\hat{p}}_k}} - \frac{1}{2} \\&= \frac{1}{n(n+1){\hat{p}}_k} \left[ \sum _{k=1}^{n} \sum _{i \ne k}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \textrm{I}(X_{ij} \le X_{kj}) \right. \\&\quad \left. + \sum _{k=1}^{n} \textrm{I}\left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \right] - \frac{1}{2} \\&= \frac{1}{n(n+1){\hat{p}}_k} \left[ \sum _{k=1}^{n} \sum _{i \ne k}^{n} \textrm{I} \left( \varvec{X}_i \in \hat{\mathcal {C}}_k \right) \textrm{I} (X_{ij} \le X_{kj}) \right] \\&\quad - \frac{n-1}{2(n+1)} \\&= \frac{n-1}{2(n+1)} \\&\quad - \frac{1}{n(n+1){\hat{p}}_k} \left[ \sum _{k=1}^{n} \sum _{i \ne k}^{n} \textrm{I} \left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \textrm{I} (X_{ij} \ge X_{kj}) \right] \\&= -\left[ \frac{1}{n(n+1){\hat{p}}_k} \sum _{i = 1}^{n} \textrm{I} \left( \varvec{X}_i \in \hat{\mathcal {C}}_k\right) \xi _{i j}^{\prime } - \frac{n-1}{2(n+1)} \right] , \end{aligned}$$

where the continuous variable \(x_j\) satisfies \(\textrm{Pr}(x_{ij} = x_{kj}) = 0\). Define \(\xi _{ij}^{\prime } = \sum _{k \ne i}^{n} \textrm{I} (X_{i j} \ge X_{k j})\); then \(\sum _{i=1}^{n} \textrm{I} ( \varvec{X}_i \in \hat{\mathcal {C}}_k ) \xi _{ij}^{\prime }\) has the same distribution as \(S_{n \hat{p}_k, n}\), and consequently the same distribution as \(\sum _{i=1}^{n} \textrm{I} ( \varvec{X}_i \in \hat{\mathcal {C}}_k ) \xi _{ij}\). In other words, \(\hat{\tau }_{j k}\) and \(-\hat{\tau }_{j k}\) have the same distribution, that is, \(F_{\hat{\tau }_{j k}}(u) = 1 - F_{\hat{\tau }_{j k}}(-u)\). If \(\lim \limits _{n \rightarrow \infty } \hat{p}_{k} (1 - \hat{p}_{k})>0\), then by Theorem 3.4 and Corollaries 3.4–3.6 of Mohamed and Mirakhmedov (2016), we can obtain the following:

  • For all \(u = o(\sqrt{n})\ge 0\), we obtain that

$$\begin{aligned}&\frac{1-F_{\hat{\tau }_{j k}}(u)}{1-\Phi (u)} = \frac{F_{\hat{\tau }_{j k}}(-u)}{\Phi (-u)} \nonumber \\&= \exp \left\{ \frac{u^{3}}{\sqrt{n}} L_{n}\left( \frac{u}{\sqrt{n}}\right) \right\} \left( 1+O\left( \frac{u+1}{\sqrt{n}}\right) \right) . \end{aligned}$$
(A2)

where \(L_{n}(v)\) is an odd power series that, for all sufficiently large n, is majorized by a power series with coefficients not depending on n and convergent in some disk; moreover, \(L_{n}(v)\) converges uniformly in n for sufficiently small values of v, with \(L_{n}(0) = 0\).

  • For all \(u = o(n^{1/4}) \ge 0\), we obtain that

$$\begin{aligned}&\frac{1-F_{\hat{\tau }_{j k}}(u)}{1-\Phi (u)}= \frac{F_{\hat{\tau }_{j k}}(-u)}{\Phi (-u)} \nonumber \\&= \exp \left\{ \frac{u^{3}}{\sqrt{n}} O\left( \frac{u}{\sqrt{n}}\right) \right\} \left( 1+O\left( \frac{u+1}{\sqrt{n}}\right) \right) \nonumber \\&= \left( 1 + O\left( \frac{u^4}{n}\right) \right) \left( 1+O\left( \frac{u+1}{\sqrt{n}}\right) \right) . \end{aligned}$$
(A3)
  • For all \(u = o(n^{1 / 6}) \ge 0\), we obtain that

$$\begin{aligned} \frac{1-F_{\hat{\tau }_{j k}}(u)}{1-\Phi (u)} = \frac{F_{\hat{\tau }_{j k}}(-u)}{\Phi (-u)} = 1+O\left( \frac{(u+1)^{3}}{\sqrt{n}}\right) . \end{aligned}$$
(A4)
  • For all u satisfying \(0 \le u \le C \min \left( \frac{\sqrt{{n^{*}} \hat{q}} }{ \max |\hat{Y}_{m N}|},\frac{({n^{*}} \hat{q})^{1 / 6}}{ B_{3}^{1 / 3}}\right) \), we have

$$\begin{aligned} \begin{aligned}&1-F_{\hat{\tau }_{j k}}(u) \\&=(1-\Phi (u))\left( 1+ O \left( \frac{(u+1)^{3} B_{3} }{\sqrt{{n^{*}} \hat{q}}}\right) \right) . \end{aligned} \end{aligned}$$
(A5)

Based on the above cases and Taylor expansions of Eqs. (A2)–(A5), the following conclusion can be obtained from the relative orders of u and n: if \(\textrm{H}_{0, j, k}\) \((j = 1, 2, \ldots , p;\; k = 1, 2, \ldots , K)\) is true, then

$$\begin{aligned} \frac{1 - F_{\hat{\tau }_{j k}}(u)}{1 - \Phi (u)} = \frac{F_{\hat{\tau }_{j k}}(-u)}{\Phi (-u)}= 1 + O \left( n^{-1/2} \right) , \end{aligned}$$

where \(\Phi (-u) = 1/(2p) = O(n^{\alpha })\) and \(\alpha \le 1/2\). In other words, \(\hat{\tau }_{j k}\) has the asymptotic normal distribution \(\mathcal {N} (0, {(1 - \hat{p}_k)}/{(12 (n+1) \hat{p}_k)} )\) for all \(j \in \{ 1, 2, \ldots , p \}\) and all \(k \in \{ 1, 2, \ldots , K \}\). \(\square \)
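As an empirical companion to Lemma 2, the null mean and variance of \(\hat{\tau }_{j k}\) can be checked by Monte Carlo. This is a sketch under assumptions of my own: a balanced cluster assignment that is independent of the data, and continuous \(X_j\) with no ties.

```python
# A Monte Carlo check (not the authors' code) of Lemma 2: under H_0, tau_hat
# is approximately N(0, (1 - p_k) / (12 (n+1) p_k)).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 5000
p_k = 0.5

def tau_hat(xj, mask):
    m = len(xj)
    counts = m - np.argsort(np.argsort(xj))      # #{l : x_ij <= x_lj}, no ties
    return counts[mask].sum() / (m * (m + 1) * mask.mean()) - 0.5

stats = np.empty(reps)
for r in range(reps):
    xj = rng.normal(size=n)                      # H_0: xj carries no cluster signal
    mask = rng.permutation(n) < n // 2           # balanced split, p_hat = 1/2
    stats[r] = tau_hat(xj, mask)

print("mean:", stats.mean(), "(theory: 0)")
print("var :", stats.var(), "(theory:", (1 - p_k) / (12 * (n + 1) * p_k), ")")
```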

1.4 Proof of Corollary 1

Proof

By the definition of \(\hat{\upsilon }_{j k}\), we can easily see that \(\hat{\upsilon }_{j k}\) is asymptotically distributed as \(\chi ^2_1\), the chi-square distribution with one degree of freedom. Since \(\sum _{k=1}^{K}\hat{p}_k \hat{\tau }_{j k} = 0\), \(\hat{\upsilon }_{j}=\sum _{k=1}^{K}\hat{\upsilon }_{j k}\) is asymptotically distributed as \(\chi ^2_{K-1}\), with \(K-1\) degrees of freedom. \(\square \)
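A quick simulation sketch of Corollary 1 (setup and names are mine, not the authors'): under \(\textrm{H}_{0}\) the sample mean of \(\hat{\upsilon }_{j} = \sum _{k} \hat{\upsilon }_{j k}\) should be close to \(K-1\), the mean of the \(\chi ^2_{K-1}\) distribution.

```python
# A simulation sketch of Corollary 1 (not the authors' code): under H_0 the
# statistic upsilon_hat_j = sum_k upsilon_hat_{jk} is approximately chi^2_{K-1}.
import numpy as np

rng = np.random.default_rng(3)
n, K, reps = 600, 3, 3000

def upsilon_j(xj, labels, K):
    m = len(xj)
    counts = m - np.argsort(np.argsort(xj))      # #{l : x_ij <= x_lj}, no ties
    total = 0.0
    for k in range(K):
        mask = labels == k
        p_hat = mask.mean()
        tau = counts[mask].sum() / (m * (m + 1) * p_hat) - 0.5
        total += 12 * (m + 1) * p_hat / (1 - p_hat) * tau ** 2
    return total

vals = [upsilon_j(rng.normal(size=n), rng.permutation(n) % K, K)
        for _ in range(reps)]
print("mean:", np.mean(vals), "(theory: K - 1 =", K - 1, ")")
```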

1.5 Proof of Theorem 1

Proof

Let \(p_{k}=\textrm{Pr}\left( \varvec{X} \in \mathcal {C}_k\right) \), \(F_{j}(k, x)=\textrm{I}\left( X_{j} \le x, \varvec{X} \in \mathcal {C}_k\right) \) and \(\eta _{k}=\textrm{I}\left( \varvec{X} \in \mathcal {C}_k\right) \). For some \(j=1, \ldots , p\), let \(\{X_{i j}: i=1, \ldots , n\}\) be a random sample of \(X_j\). Write \(\eta _{k i}=\textrm{I}\left( \varvec{X}_i \in \mathcal {C}_k\right) \), \(f_{i j}(k, x)=\textrm{I}\left( X_{i j} \le x, \varvec{X}_i \in \mathcal {C}_k\right) \), \(\zeta _{j}(k)=\mathbb {E}_{x}\left\{ F_{j, k}(x)\right\} \), \(\tilde{\zeta }_{j}(k)=\frac{1}{n^{2}} \sum _{l=1}^{n} \sum _{i=1}^{n} \frac{\textrm{I}\left( X_{i j} \le X_{l j}, \varvec{X}_i \in \mathcal {C}_k\right) }{\hat{p}_{k}}\) and \(\hat{\zeta }_{j}(k)=\frac{1}{n(n+1)} \sum _{l=1}^{n} \sum _{i=1}^{n} \frac{\textrm{I}\left( X_{i j} \le X_{l j}, \varvec{X}_i \in \mathcal {C}_k\right) }{\hat{p}_{k}}\). Since \(\mathbb {E}\left( \eta _{k i}\right) =p_{k}\), \(\eta _{k i}\) follows \(\textrm{Bernoulli}\left( p_{k}\right) \) and \(\sum _{i=1}^{n} \eta _{k i}\) follows \(\textrm{Binomial}\left( n, p_{k}\right) \). By Bernstein’s inequality, for any k and j, we have

$$\begin{aligned} \begin{aligned}&\textrm{Pr}\left\{ \left| \frac{1}{n} \sum _{i=1}^{n} \eta _{k i}-\mathbb {E}\left( \eta _{k}\right) \right| \ge \epsilon \right\} \\&=\textrm{Pr}\left\{ \left| \sum _{i=1}^{n}\left( \eta _{k i}-p_{k}\right) \right| \ge n \epsilon \right\} \\&\le 2 \exp \left\{ -\frac{n^{2} \epsilon ^{2}}{2\left( n p_{k}+n \epsilon / 3\right) }\right\} \\&=2 \exp \left\{ -\frac{n \epsilon ^{2}}{2\left( p_{k}+\epsilon / 3\right) }\right\} . \end{aligned} \end{aligned}$$
(A6)

Since \(\left| \textrm{I}\left( X_{i j} \le x\right) \right| \le 1\), it can be proved straightforwardly by Hoeffding’s inequality and empirical process theory (Pollard 2012) that for given \(k=1, \ldots , K\) and \(j=1, \ldots , p\),

$$\begin{aligned}&\textrm{Pr}\left[ \sup _{x \in \mathbb {R}_{x_j}}\left| \frac{1}{n} \sum _{i=1}^{n} \textrm{I}\left( X_{i j} \le x\right) -\mathbb {E}\left\{ \textrm{I}\left( X_{j} \le x\right) \right\} \right| \ge \epsilon \right] \nonumber \\&\le 2(n+1) \exp \left( -2 n \epsilon ^{2}\right) . \end{aligned}$$
(A7)

By Bernstein’s inequality, empirical process theory (Pollard 2012), and

$$\begin{aligned}&\left| f_{i j}(k, x)-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| \\&=\left| \textrm{I}\left( X_{i j} \le x, \varvec{X}_i \in \mathcal {C}_k\right) -F_{j k}\left( x\right) p_{k}\right| \le 1, \end{aligned}$$

for given \(k=1, \ldots , K\) and \(j=1, \ldots , p\), we obtain

$$\begin{aligned} \begin{aligned}&\textrm{Pr}\left[ \sup _{x \in \mathbb {R}_{x_j}}\left| \frac{1}{n} \sum _{i=1}^{n} f_{i j}(k, x)-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| \ge \epsilon \right] \\&\le 2(n+1) \exp \left\{ -\frac{n \epsilon ^{2}}{2\left( p_{k}+\epsilon / 3\right) }\right\} . \end{aligned} \end{aligned}$$
(A8)

Note that

$$\begin{aligned}&\left| \tilde{\zeta }_{j}(k)-\zeta _{j}(k)\right| \\&= \left| \int \frac{1}{n} \sum _{i=1}^{n} \frac{f_{i j}(k, x)}{\hat{p}_{k}} d \hat{F}_{j}(x)-\mathbb {E}_{x}\left\{ F_{j, k}(x)\right\} \right| \\&\le \left| \int \frac{1}{n} \sum _{i=1}^{n} \frac{f_{i j}(k, x)}{\hat{p}_{k}} d\left\{ \hat{F}_{j}(x)-F_{j}(x)\right\} \right| \\&\quad +\sup _{x \in \mathbb {R}_{x_j}}\left| \frac{1}{n} \sum _{i=1}^{n} \frac{f_{i j}(k, x)}{\hat{p}_{k}}-\frac{\mathbb {E}\left\{ F_{j}(k, x)\right\} }{p_{k}}\right| \\&\equiv \textrm{I}_{1}+\textrm{I}_{2} , \end{aligned}$$

where \(\hat{F}_{j}(x)=n^{-1} \sum _{i=1}^{n} \textrm{I}\left( X_{i j} \le x\right) \) and \(F_{j}(x)=\textrm{Pr}\left( X_j \le x\right) \).

In terms of \(\textrm{I}_{1}\), from the inequality

$$\begin{aligned}&\left| \int \frac{1}{n} \sum _{i=1}^{n} \frac{f_{i j}(k, x)}{\hat{p}_{k}} d\left\{ \hat{F}_{j}(x)-F_{j}(x)\right\} \right| \\&\le \sup _{x \in \mathbb {R}_{x_{j}}}\left| \hat{F}_{j}(x)-F_{j}(x)\right| \end{aligned}$$

and Eq. (A7), we have

$$\begin{aligned}&\textrm{Pr}\left( \textrm{I}_{1}>\epsilon \right) \le \textrm{Pr}\left\{ \sup _{x \in \mathbb {R}_{x_{j}}}\left| \hat{F}_{j}(x)-F_{j}(x)\right| >\epsilon \right\} \\&\le 2(n+1) \exp \left( -2 n \epsilon ^{2}\right) . \end{aligned}$$

In terms of \(\textrm{I}_{2}\), since

$$\begin{aligned} \sup _{x \in \mathbb {R}_{x_{j}}} \mathbb {E}\left\{ F_{j}(k, x)\right\} =\sup _{x \in \mathbb {R}_{x_{j}}} \textrm{Pr}\left( x_{j} \le x, \varvec{x} \in \mathcal {C}_k\right) =p_{k}, \end{aligned}$$

we can write

$$\begin{aligned}&\sup _{x \in \mathbb {R}_{x_{j}}}\left| \frac{1}{n} \sum _{i=1}^{n} \frac{f_{i j}(k, x)}{\hat{p}_{k}}-\frac{\mathbb {E}\left\{ F_{j}(k, x)\right\} }{p_{k}}\right| \\&\le \sup _{x \in \mathbb {R}_{x_{j}}} \frac{\left| \sum _{i=1}^{n} f_{i j}(k, x) / n-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| }{\hat{p}_{k}} \\&\quad + \sup _{x \in \mathbb {R}_{x_{j}}}\frac{\mathbb {E}\left\{ F_{j}(k, x)\right\} \left| \hat{p}_{k}-p_{k}\right| }{p_{k} \hat{p}_{k}} \\&= \sup _{x \in \mathbb {R}_{x_{j}}} \frac{\left| \sum _{i=1}^{n} f_{i j}(k, x) / n-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| }{\hat{p}_{k}} + \frac{\left| \hat{p}_{k}-p_{k}\right| }{\hat{p}_{k}}. \end{aligned}$$

Thereby, from Eqs. (A6) and (A8), for any \(\epsilon >0\), we have that

$$\begin{aligned}&\textrm{Pr}\left( \textrm{I}_{2} \ge \epsilon \right) \\&\le \textrm{Pr}\left[ \sup _{x \in \mathbb {R}_{x_{j}}}\left| \frac{1}{n} \sum _{i=1}^{n} \frac{f_{i j}(k, x)}{\hat{p}_{k}}-\frac{\mathbb {E}\left\{ F_{j}(k, x)\right\} }{p_{k}}\right| \ge \epsilon \right] \\&\le \textrm{Pr}\left( \sup _{x \in \mathbb {R}_{x_{j}}} \frac{\left| n^{-1} \sum _{i=1}^{n} f_{i j}(k, x) - \mathbb {E}\left\{ F_{j}(k, x)\right\} \right| }{\hat{p}_{k}} \right. \\&\qquad \left. + \frac{\left| \hat{p}_{k}-p_{k}\right| }{\hat{p}_{k}} \ge \epsilon \right) \\&\le \textrm{Pr}\left( \sup _{x \in \mathbb {R}_{x_{j}}} \frac{\left| n^{-1} \sum _{i=1}^{n} f_{i j}(k, x) - \mathbb {E}\left\{ F_{j}(k, x)\right\} \right| }{\hat{p}_{k}} \right. \\&\qquad \left. + \frac{\left| n^{-1} \sum _{i=1}^{n} \eta _{k i} - \mathbb {E}\left( \eta _{k}\right) \right| }{\hat{p}_{k}} \ge \epsilon , \right. \\&\qquad \left. \min _{1 \le k \le K} \hat{p}_{k} \ge c_{1} / 2 K\right) \\&\quad +\textrm{Pr}\left( \min _{1 \le k \le K} \hat{p}_{k}< c_{1} / 2 K\right) \\&\le \textrm{Pr}\left( \sup _{x \in \mathbb {R}_{x_{j}}} \left| \frac{1}{n} \sum _{i=1}^{n} f_{i j}(k, x)-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| \right. \\&\qquad \left. + \left| \frac{1}{n} \sum _{i=1}^{n} \eta _{k i}-\mathbb {E}\left( \eta _{k}\right) \right| \ge \epsilon c_{1} / 2 K\right) \\&\quad +\textrm{Pr}\left( \min _{1 \le k \le K} \hat{p}_{k}<c_{1} / 2 K\right) \\&\le \textrm{Pr}\left( \sup _{x \in \mathbb {R}_{x_{j}}} \left| \frac{1}{n} \sum _{i=1}^{n} f_{i j}(k, x)-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| \right. \\&\qquad \left. + \left| \frac{1}{n} \sum _{i=1}^{n} \eta _{k i}-\mathbb {E}\left( \eta _{k}\right) \right| \ge \epsilon c_{1} / 2 K\right) \\&\quad + \textrm{Pr}\left( \left| \frac{1}{n} \sum _{i=1}^{n} \eta _{k i}-\mathbb {E}\left( \eta _{k}\right) \right| \ge c_{1} / 2 K \right) \\&\le \textrm{Pr}\left( \sup _{x \in \mathbb {R}_{x_{j}}}\left| \frac{1}{n} \sum _{i=1}^{n} f_{i j}(k, x)-\mathbb {E}\left\{ F_{j}(k, x)\right\} \right| \right. \\&\qquad \left. \ge \epsilon c_{1} / 4 K\right) \\&\quad + 2 \textrm{Pr}\left( \left| \frac{1}{n} \sum _{i=1}^{n} \eta _{k i}-\mathbb {E}\left( \eta _{k}\right) \right| \ge \epsilon c_{1} / 4 K\right) \\&\le 4 \exp \left\{ -\frac{n\left( \epsilon c_{1} / 4 K\right) ^{2}}{2\left( p_{k}+\epsilon c_{1} / 12 K\right) }\right\} \\&\quad +2(n+1) \exp \left\{ -\frac{n\left( \epsilon c_{1} / 4 K\right) ^{2}}{2\left( p_{k}+\epsilon c_{1} / 12 K\right) }\right\} \\&\le 2(n+3) \exp \left( -c_{4} n \epsilon ^{2} / K\right) , \end{aligned}$$

where \(c_{4}=\frac{\left( c_{1} / 4 K\right) ^{2}}{2\left( p_{k}K+\epsilon c_{1} / 12 \right) } >0\), and the inequality above holds on account of the following inequality:

$$\begin{aligned}&\left| \frac{1}{n} \sum _{i=1}^{n} \eta _{k i}-\mathbb {E}\left( \eta _{k}\right) \right| =\left| \hat{p}_{k}-p_{k}\right| \ge p_{k}-\hat{p}_{k}\\&>\frac{c_{1}}{K}-\frac{c_{1}}{2 K}=\frac{c_{1}}{2 K}. \end{aligned}$$

Under Condition C3, \(c_{4} n \epsilon ^{2} / K<2 n \epsilon ^{2}\). Therefore, we have

$$\begin{aligned} \begin{aligned}&\textrm{Pr}\left\{ \left| \tilde{\zeta }_{j}(k)-\zeta _{j}(k)\right| \ge \epsilon \right\} \le \textrm{Pr}\left( \textrm{I}_{1}+\textrm{I}_{2} \ge \epsilon \right) \\&\le \textrm{Pr}\left( \textrm{I}_{1} \ge \epsilon / 2\right) +\textrm{Pr}\left( \textrm{I}_{2} \ge \epsilon / 2\right) \\&\le 4(n+2) \exp \left( -c_{4} n \epsilon ^{2} / K\right) . \end{aligned} \end{aligned}$$
(A9)

Let \(\hat{\zeta }_{j}(k)-\zeta _{j}(k) \equiv \frac{n}{n+1}\left\{ \tilde{\zeta }_{j}(k)-\zeta _{j}(k)\right\} +\Delta \), where \(\Delta =-\zeta _{j}(k) /(n+1)\). It follows from Condition C3 and Eqs. (A5) and (A9) that \(K=O\left( n^{\xi }\right) \) with \(\xi + 2\kappa < 1\). Letting \(\epsilon = c_{3}^{*} n^{-\kappa } = c_{3}^{*} n^{-1/2\beta }\) for \(0 \le \kappa < 1 / 4\) and \(c_{3}^{*}>0\), we have

$$\begin{aligned}&\textrm{Pr}\left( \max _{1 \le j \le p}\left| \hat{\tau }_{j, k}-\tau _{j, k}\right| \ge c_{3}^{*} n^{-\kappa }\right) \\&\le p \textrm{Pr}\left( \left| \hat{\tau }_{j, k}-\tau _{j, k}\right| \ge c_{3}^{*} n^{-\kappa }\right) \\&= p \textrm{Pr}\left\{ \left| \hat{\zeta }_{j}(k)-\zeta _{j}(k)\right| \ge c_{3}^{*} n^{-\kappa }\right\} \\&\le p \textrm{Pr}\left\{ \left| \tilde{\zeta }_{j}(k)-\zeta _{j}(k)\right| \ge {n^{-1}(n+1)\left( c_{3}^{*} n^{-\kappa }-|\Delta |\right) }\right\} \\&\le 4(n+2) p \exp \left( -c_{4} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

where \(c_{4}\) is a constant. The last inequality above holds because \(\Delta = O(1 / n)\). Letting \(c_3 = 12 (n+1) \hat{p}_k c_3^{*} / (1-\hat{p}_k)\), by the definition of \(\hat{\upsilon }_{j k}\), we have

$$\begin{aligned}&\textrm{Pr} \left( \max _{1 \le j \le p}\left| \hat{\upsilon }_{j k}-\upsilon _{j k}\right| \ge c_{3} n^{-\kappa } \right) \\&= \textrm{Pr}\left( \max _{1 \le j \le p}\left| \hat{\tau }_{j, k}-\tau _{j, k}\right| \cdot \left| \hat{\tau }_{j, k}+\tau _{j, k}\right| \ge c_{3}^{*} n^{-\kappa }\right) \\&\le \textrm{Pr}\left( \max _{1 \le j \le p}\left| \hat{\tau }_{j, k}-\tau _{j, k}\right| \ge c_{3}^{*} n^{-\kappa }\right) \\&\le 4(n+2) p \exp \left( -c_{4} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

where we use the inequalities that \(|\hat{\tau }_{j, k}+\tau _{j, k}| \le |\hat{\tau }_{j, k}| + |\tau _{j, k}| \le 1/2 + 1/2 = 1\). \(\square \)

1.6 Proof of Theorem 2

Proof

Let \(c_{3} n^{-\kappa } = \rho _0 / 4\). It follows from Condition C2 and Eq. (A9) that

$$\begin{aligned}&\textrm{Pr}\left( \min _{j_1 \in \mathcal {A}_{k_1}}\left| \hat{\upsilon }_{j, k}\right| -\max _{j_2 \notin \mathcal {A}_{k_2}}\left| \hat{\upsilon }_{j, k}\right|<\frac{\rho _{0}}{2}\right) \\&\le \textrm{Pr}\left\{ \left( \min _{j_1 \in \mathcal {A}_{k_1}}\left| \hat{\upsilon }_{j, k}\right| -\max _{j_2 \notin \mathcal {A}_{k_2}}\left| \hat{\upsilon }_{j, k}\right| \right) \right. \\&\quad \left. - \left( \min _{j_1 \in \mathcal {A}_{k_1}}\left| \upsilon _{j, k}\right| -\max _{j_2 \notin \mathcal {A}_{k_2}}\left| \upsilon _{j, k}\right| \right) <-\frac{\rho _{0}}{2}\right\} \\&\le \textrm{Pr}\left\{ \left| \left( \min _{j_1 \in \mathcal {A}_{k_1}}\left| \hat{\upsilon }_{j, k}\right| -\max _{j_2 \notin \mathcal {A}_{k_2}}\left| \hat{\upsilon }_{j, k}\right| \right) \right. \right. \\&\quad \left. \left. -\left( \min _{j_1 \in \mathcal {A}_{k_1}}\left| \upsilon _{j, k}\right| -\max _{j_2 \notin \mathcal {A}_{k_2}}\left| \upsilon _{j, k}\right| \right) \right|>\frac{\rho _{0}}{2}\right\} \\&\le \textrm{Pr}\left( 2 \max _{1 \le j \le p}\left| \hat{\upsilon }_{j, k}-\upsilon _{j, k}\right| >\frac{\rho _{0}}{2}\right) \\&\le 4(n+2) p \exp \left( -c_{4}^{*} n \rho _{0}^{2} / K\right) , \end{aligned}$$

where \(c_{4}^{*}>0\) is a constant. The conditions \(K \log (n)= o\left( n \rho _{0}^{2}\right) \) and \(K \log (p)=o\left( n \rho _{0}^{2}\right) \) ensure that there exists some \(n_{0}>0\) such that, for \(n>n_{0}\), \(p \le \exp \left( c_{4}^{*} n \rho _{0}^{2} / 2 K\right) \) and \(c_{4}^{*} n \rho _{0}^{2} / 2 K \ge 3 \log \{4(n+2)\}\). Consequently, we have that

$$\begin{aligned}&\sum _{n=n_0}^{\infty } 4(n+2) p \exp \left( -c_{4}^{*} n \rho _{0}^{2} / K\right) \\&\le \sum _{n=n_{0}}^{\infty } \exp \left[ -c_{4}^{*} n \rho _{0}^{2} / K + c_{4}^{*} n \rho _{0}^{2} / 2 K+\log \{4(n+2)\}\right] \\&\le \sum _{n=n_{0}}^{\infty } \exp [-3 \log \{4(n+2)\}+\log \{4(n+2)\}] \\&\le \sum _{n=n_{0}}^{\infty }\left\{ 4(n+2) \right\} ^{-2} < \infty . \end{aligned}$$

By the Borel–Cantelli lemma,

$$\begin{aligned} \liminf \limits _{n \rightarrow \infty } \left\{ \min \limits _{j_1 \in \mathcal {A}_{k_1}} \hat{\upsilon }_{j_1, k_1} -\max \limits _{j_2 \notin \mathcal {A}_{k_2}} \hat{\upsilon }_{j_2, k_2} \right\} > 0 \end{aligned}$$

almost surely. \(\square \)

1.7 Proof of Corollary 2

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. First, according to Theorem 1, we have

$$\begin{aligned}&\textrm{Pr}\left( \max _{1 \le j \le p}\left| \hat{\upsilon }_{j k} - \upsilon _{j k}\right| \ge c_{3} n^{-\kappa }\right) \\&\le 4(n+2) p \exp \left( -c_{4} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

where \(c_4>0\) is a constant. Write \(c_5 = (\sum _{k=1}^{K}s_k) \cdot c_5^{*}\). Then, we obtain

$$\begin{aligned}&\textrm{Pr} ( |{\tilde{\Upsilon }} - {\Upsilon }| \ge c_{5} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( |{\tilde{\Upsilon }} - {\Upsilon }| \le c_{5} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} \left( \sum _{k=1}^K \sum _{j \in \mathcal {A}_k}|\hat{\upsilon }_{jk} - {\upsilon }_j| \le c_5^{*} n^{-\kappa } \sum _{k=1}^{K}s_k \right) \\&\le 1 - \prod _{k=1}^K \prod _{j \in \mathcal {A}_k} \textrm{Pr} ( |\hat{\upsilon }_{jk} - {\upsilon }_j| \le c_{5}^{*} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{7} n^{1-2 \kappa -\xi } ) ]^{\sum _{k=1}^{K}s_k}. \end{aligned}$$

Similarly, let \(c_6 = s \cdot K \cdot c_6^{*}\), we have

$$\begin{aligned}&\textrm{Pr} ( |{\tilde{\Upsilon }^{*}} - {\Upsilon }^{*} | \ge c_{6} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( |{\tilde{\Upsilon }^{*}} - {\Upsilon }^{*} | \le c_{6} n^{-\kappa } )\\&\le 1 - \prod _{j \in \mathcal {A}} \textrm{Pr} ( |{\tilde{\Upsilon }_{j}^{*}} - {\Upsilon }_j^{*}| \le c_{6}^{*} K n^{-\kappa } ) \\&\le 1 - \prod _{j \in \mathcal {A}} \prod _{k=1}^{K} \textrm{Pr} ( |\hat{\upsilon }_{jk} - {\upsilon }_j | \le c_{6}^{*} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{8} n^{1-2 \kappa -\xi } ) ]^{Ks}. \end{aligned}$$

\(\square \)

1.8 Proof of Lemma 3

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C2) hold. For convenience, in this proof the sparse clustering utility \(\Upsilon \) is regarded as a function of a given partition \(\mathbb {C}= \{\mathcal {C}_1, \ldots , \mathcal {C}_K\}\); i.e., we write the sparse clustering utilities for hypothesis tests (2) and (8) as \({\Upsilon }(\mathbb {C})\) and \({\Upsilon }^{*}(\mathbb {C})\), respectively. Correspondingly, write the sufficient variable screening utilities of the jth variable as \({\upsilon }_{j k}(\mathbb {C})\) and \({\upsilon }_{j}(\mathbb {C})\).

Let the true partition \(\mathbb {C}_0 = \left\{ \mathcal {C}_{0,1}, \ldots , \mathcal {C}_{0,K} \right\} \) separate the samples into K groups with the true mixture distributions \(\mathcal {F}_0 = \{F_{0,1}, \ldots , F_{0,K} \}\), where the mixing proportions satisfy \(p_{0k} = \textrm{Pr}(\mathcal {C}_{0, k} \mid \varvec{x})\). For any given partition \(\mathbb {C} = \left\{ \mathcal {C}_{1}, \ldots , \mathcal {C}_{K} \right\} \), write the true mixture distributions as \(\mathcal {F} = \{F_{1}, \ldots , F_{K} \}\) with mixing proportions \(p_{k} = \textrm{Pr}(\varvec{X} \in \mathcal {C}_{k} )\). Denote \(\varvec{p} = (p_1, \ldots , p_K)^{\textsf{T}}\), \(\varvec{p}_0 = (p_{01}, \ldots , p_{0K})^{\textsf{T}}\) and \(\varvec{P}=[p_{kl}]_{K \times K}\), where the element \(p_{k l} = \textrm{Pr}(\varvec{X} \in \mathcal {C}_{k} \mid \varvec{X} \in \mathcal {C}_{0l})\) of \(\varvec{P}\) represents the probability that the lth component distribution of the partition \(\mathbb {C}_0\) is assigned to the kth component distribution of the partition \(\mathbb {C}\), and \(\sum _{k=1}^{K} p_{k l} = 1\), \(l = 1, \ldots , K\). Then for any \(\mathbb {C}\), we can obtain that \(\varvec{p} = \varvec{P} \cdot \varvec{p}_0\) and \(F(\varvec{x}) = \varvec{p} \cdot (F_{1}(\varvec{x}), \cdots , F_{K}(\varvec{x})) = \varvec{p}_0 \cdot (F_{01}(\varvec{x}), \cdots , F_{0K}(\varvec{x}))\). For any \(j \in \{1, \ldots , p\}\) and \(k \in \{1, \ldots , K\}\), the marginal utility satisfies

$$\begin{aligned} \tau _{j k}(\mathbb {C})&= \frac{p_{0 k} \cdot p_{k k}}{p_{0 k} \cdot p_{k k} + p_{0 \bar{k}} \cdot p_{k \bar{k}}} \cdot \tau _{j k}(\mathbb {C}_0) \\&\quad + \frac{p_{0 \bar{k}} \cdot p_{k \bar{k}}}{p_{0 k} \cdot p_{k k} + p_{0 \bar{k}} \cdot p_{k \bar{k}}} \cdot \tau _{j \bar{k}}(\mathbb {C}_0) \\&= \frac{2 \cdot p_{kk}-1}{p_{0 k} \cdot p_{k k} + p_{0 \bar{k}} \cdot p_{k \bar{k}}} \cdot p_{0 k} \cdot \tau _{j k}(\mathbb {C}_0), \end{aligned}$$

where \(p_{0 \bar{k}} = 1 - p_{0 k}\), \(p_{k \bar{k}} = 1 - p_{k k}\) and \(\tau _{j \bar{k}}(\mathbb {C}_0) = - p_k \cdot \tau _{j k}(\mathbb {C}_0) / (1-p_k)\). By the definition of \(\upsilon _{j k}\), we can obtain that

$$\begin{aligned} \upsilon _{j k}(\mathbb {C}) = \frac{12 (n+1) (2 p_{kk}-1)^2 }{\Delta _{p_{kk}} (1 - \Delta _{p_{kk}})} p_{0 k}^2 \tau _{j k}^2(\mathbb {C}_0), \end{aligned}$$

where \(\Delta _{p_{kk}} = p_{0 k} \cdot p_{k k} + p_{0 \bar{k}} \cdot p_{k \bar{k}}\). Since \((2 \cdot p_{k k} - 1)^2\) achieves its maximum if and only if \(p_{kk} = 0\) or \(p_{kk}= 1\), and \(\Delta _{p_{kk}}\) achieves its maximum if and only if \(p_{kk} = 0\) or \(p_{kk}= 1\), we have

$$\begin{aligned} \max _{p_{kk}} \upsilon _{j k}(\mathbb {C})&= \max \{\upsilon _{j k}(\mathbb {C}_0)|_{p_{kk}=0}, \upsilon _{j k}(\mathbb {C}_0)|_{p_{kk}=1}\} \\&= \upsilon _{j k}(\mathbb {C})|_{p_{kk}=0} = \upsilon _{j k}(\mathbb {C})|_{p_{kk}=1}= \upsilon _{j k}(\mathbb {C}_0). \end{aligned}$$

Considering the label-invariance of clustering and the constraint \(\sum _{k=1}^{K} p_{k l} = 1\) for any \(l=1, \ldots , K\), without loss of generality, let \(p_{kk}=1\) and \(p_{k \bar{k}} = 0\). Then for any j, we conclude that

$$\begin{aligned} {\upsilon }_j(\mathbb {C}) = S_j \sum _{k = 1}^{K} \upsilon _{j k}(\mathbb {C}) \le {\upsilon }_j(\mathbb {C}_0), \end{aligned}$$

where the equality holds only when the matrix \(\varvec{P}\) is an identity matrix. Thus, \(\varvec{P} = \varvec{I}_{K \times K}\) is the unique global optimal solution of \({\upsilon }_j(\mathbb {C})\) when \(S_j = 1\). Similarly, for any j, we obtain

$$\begin{aligned} {\upsilon }_{jk}(\mathbb {C})&= \sum _{k = 1}^{K} S_{j k} \upsilon _{j k}(\mathbb {C}) \le \sum _{k = 1}^{K} S_{j k} \cdot \upsilon _{j k}(\mathbb {C}_0) \\&= {\upsilon }_{j k}(\mathbb {C}_0), \end{aligned}$$

where the equality holds only when the matrix \(\varvec{P}\) is an identity matrix, as for \(S_j(\mathbb {C})\). This indicates that \({\Upsilon }(\mathbb {C})\) and \({\Upsilon }^{*}(\mathbb {C})\) achieve their maximum values \({\Upsilon }(\mathbb {C}_0)\) and \({\Upsilon }^{*}(\mathbb {C}_0)\), respectively, if and only if \(\mathbb {C} = \mathbb {C}_0\), where

$$\begin{aligned} \mathbb {C}_0 = \mathop {\arg \max }_{\mathbb {C}} {\Upsilon }(\mathbb {C}) = \mathop {\arg \max }_{\mathbb {C}} {\Upsilon }^{*}(\mathbb {C}). \end{aligned}$$

\(\square \)

1.9 Proof of Theorem 3

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. For constants \(c_9 = 2 \cdot c_9^{*}\) and \(c_{10} = 2 \cdot c_{10}^{*}\), the following inequalities are established by Corollary 2 and Lemma 3:

$$\begin{aligned}&\textrm{Pr} ( {\tilde{\Upsilon }}_0 - {\tilde{\Upsilon }} \ge c_{9} n^{-\kappa } ) = 1 - \textrm{Pr} ( {\tilde{\Upsilon }}_0 - {\tilde{\Upsilon }} \le c_{9} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} \left( |{\tilde{\Upsilon }}_0 - {{\Upsilon }_0} + {\tilde{\Upsilon }} - {{\Upsilon }}| \le 2 \cdot c_{9}^{*} n^{-\kappa } \right) \\&\le 1 - \textrm{Pr} ( |{\tilde{\Upsilon }}_0 - {{\Upsilon }_0}| \le c_9^{*} n^{-\kappa } ) \textrm{Pr} ( |{\tilde{\Upsilon }} - {{\Upsilon }}| \le c_9^{*} n^{-\kappa } ) \\&\le \left\{ 1 - [ 1 - 4(n+2) p \exp (-c_{11} n^{1-2 \kappa -\xi } ) ]^{\sum _{k=1}^{K}s_k} \right\} ^2 \end{aligned}$$

and

$$\begin{aligned}&\textrm{Pr} ( {\tilde{\Upsilon }}_0^{*} - {\tilde{\Upsilon }}^{*} \ge c_{10} n^{-\kappa } ) = 1 - \textrm{Pr} ( {\tilde{\Upsilon }}_0^{*} - {\tilde{\Upsilon }}^{*} \le c_{10} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} \left( |{\tilde{\Upsilon }}_0^{*} - {{\Upsilon }_0^{*}} + {\tilde{\Upsilon }^{*}} - {{\Upsilon }^{*}}| \le 2 \cdot c_{10}^{*} n^{-\kappa } \right) \\&\le 1 - \textrm{Pr} ( |{\tilde{\Upsilon }}_0^{*} - {{\Upsilon }_0^{*}}| \le c_{10}^{*} n^{-\kappa } ) \textrm{Pr} ( |{\tilde{\Upsilon }}^{*} - {{\Upsilon }^{*}}| \le c_{10}^{*} n^{-\kappa } ) \\&\le \left\{ 1 - [ 1 - 4(n+2) p \exp (-c_{12} n^{1-2 \kappa -\xi } ) ]^{Ks} \right\} ^2, \end{aligned}$$

where \(c_{11}>0\) and \(c_{12}>0\) are some constants. As \(n \rightarrow \infty \), it can be written as

$$\begin{aligned} \liminf _{n \rightarrow \infty } \{ \tilde{\Upsilon }_0 - \tilde{\Upsilon } \} \ge 0, \quad \liminf _{n \rightarrow \infty } \{ \tilde{\Upsilon }^{*}_0 - \tilde{\Upsilon }^{*} \} \ge 0. \end{aligned}$$

Denote \(\tilde{\mathbb {C}}_0 = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} {S}_{j k} \cdot b_{j k}\) and \(\tilde{\mathbb {C}}_0^{*} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} {S}_{j} \cdot b_{j}\). We can get that \(\liminf \limits _{n \rightarrow \infty } \tilde{\mathbb {C}}_0 = \liminf \limits _{n \rightarrow \infty } \tilde{\mathbb {C}}_0^{*}\). \(\square \)

1.10 Proof of Lemma 4

Proof

By Eq. (15), the components of the OVR BCSS-RV cost objective can be rewritten as

$$\begin{aligned} b_{j k}&= \frac{1}{n} \sum _{i, i^{\prime }}^{n} d_{i i^{\prime } j}- \frac{1}{n_k} \sum _{i, i^{\prime } \in \mathcal {C}_k} d_{i i^{\prime } j} - \frac{1}{n - n_k} \sum _{i, i^{\prime } \notin \mathcal {C}_k} d_{i i^{\prime } j} \\&= \sum _{i = 1}^n ( \xi _{i j} - \bar{{\xi }}_{j} )^2 - \sum _{i \in \mathcal {C}_k} ( \xi _{i j} - \bar{{\xi }}_{k j} )^2 \\&\quad - \sum _{i \notin \mathcal {C}_k} ( \xi _{i j} - \bar{{\xi }}_{k j}^{\prime } )^2 \\&= n_k ( \bar{{\xi }}_{k j} - \bar{{\xi }}_{j} )^2 + (n - n_k) ( \bar{{\xi }}_{k j}^{\prime } - \bar{{\xi }}_{j} )^2 \\&= n_k^{-1} ( \sum _{i \in \mathcal {C}_k} {\xi }_{i j} - n_k \bar{{\xi }}_{j} )^2 \\&\quad + (n - n_k)^{-1} ( \sum _{i \notin \mathcal {C}_k} {\xi }_{i j} - (n - n_k) \bar{{\xi }}_{j} )^2 \\&= \frac{n}{n_k (n - n_k)} ( \sum _{i \in \mathcal {C}_k} {\xi }_{i j} - n_k \bar{{\xi }}_{j} )^2 \\&= \frac{n (n+1)^2 n_k}{(n - n_k)} \left( \frac{1}{n+1} \bar{{\xi }}_{k j} - \frac{1}{2} \right) ^2 \\&\propto \frac{12 (n + 1) {\hat{p}}_k}{1 - {\hat{p}}_k} \hat{\tau }_{j k}^2 \\&= \hat{\upsilon }_{j k}, \end{aligned}$$

where we use \(\bar{{\xi }}_{k j} = {n_k}^{-1} \sum _{i \in \mathcal {C}_k} \xi _{i j}\), \(\bar{{\xi }}_{k j}^{\prime } = (n - n_k)^{-1} \sum _{i \notin \mathcal {C}_k} \xi _{i j}\), and \(\bar{{\xi }}_{j} = {n}^{-1} \sum _{i = 1}^{n} \xi _{i j} = (n + 1) / 2\). Hence, one can obtain

$$\begin{aligned} \mathop {\arg \max }_{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} \hat{S}_{j k} \cdot \hat{\upsilon }_{j k} = \mathop {\arg \max }_{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} \hat{S}_{j k} \cdot b_{j k}. \end{aligned}$$

By Eq. (15), the components of the OVR BCSS-RV cost objective satisfy \(b_{j k} \propto (n-1)(n+1)^4 + \hat{\upsilon }_{j k}\) for any \(j = 1, \ldots , p\) and any \(k = 1, \ldots , K\). Hence, we also get \(b_j = \sum _{k=1}^{K} b_{j k} \propto K (n-1)(n+1)^4 + \sum _{k=1}^{K} \hat{\upsilon }_{j k} = K (n-1)(n+1)^4 + \hat{\upsilon }_{j}\). Since \(b_{j_1} > b_{j_2}\) for any \(j_1 \in \hat{\mathcal {A}}\) and \(j_2 \notin \hat{\mathcal {A}}\), we obtain that \(\sum _{j=1}^{p} w_{j} b_{j}\) attains its optimal value at \(w_j = S_j\) under \(\Vert \varvec{w}\Vert _{\infty } \le 1\) and \(\Vert \varvec{w}\Vert _{0} \le s\).

Similarly, the components of the OVR BCSS-RV cost objective can be rewritten as

$$\begin{aligned} b_j = \sum _{k=1}^{K} b_{j k} \propto \sum _{k=1}^{K} \frac{12 (n + 1) {\hat{p}}_k}{1 - {\hat{p}}_k} \hat{\tau }_{j k}^2 = \hat{\upsilon }_{j}, \end{aligned}$$

where we use that \(b_j = \sum _{k=1}^{K} b_{j k}\) and \(\hat{\upsilon }_{j} = \sum _{k=1}^{K} \hat{\upsilon }_{j k} = \sum _{k=1}^{K} \frac{12 (n + 1) {\hat{p}}_k}{1 - {\hat{p}}_k} \hat{\tau }_{j k}^2\). Thus, we have

$$\begin{aligned} \mathop {\arg \max }_{\mathbb {C}} \sum _{j=1}^{p} \hat{S}_{j} \cdot \hat{\upsilon }_{j} = \mathop {\arg \max }_{\mathbb {C}} \sum _{j=1}^{p} \hat{S}_{j} \cdot b_{j}. \end{aligned}$$

Therefore, we get that \(b_{j k} \propto (n-1)(n+1)^4 + \hat{\upsilon }_{j k}\) for any \(j = 1, \ldots , p\) and any \(k = 1, \ldots , K\). Since \(b_{j_1} > b_{j_2}\) for any \(j_1 \in \hat{\mathcal {A}}_{k}\) and \(j_2 \notin \hat{\mathcal {A}}_{k}\) with the same k, one can obtain that \(\sum _{j=1}^{p} w_{j k} b_{j k}\) attains its optimal value at \(w_{j k} = S_{j k}\) under \(\Vert \varvec{w}_k\Vert _{\infty } \le 1\) and \(\Vert \varvec{w}_k\Vert _{0} \le s\) for all \(k = 1, \ldots , K\). \(\square \)
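The proportionality \(b_{j k} \propto \hat{\upsilon }_{j k}\) at the heart of Lemma 4 can be checked numerically. The sketch below is my own (assuming 1-based ranks and continuous data with no ties; under these conventions the constant of proportionality works out to \(n(n+1)/12\)):

```python
# A numeric check (assumptions mine) of the identity in Lemma 4: the
# between-cluster sum of squares of the ranks, b_jk, is proportional to
# upsilon_hat_{jk}, with constant n (n + 1) / 12 under these conventions.
import numpy as np

rng = np.random.default_rng(4)
n = 200
xj = rng.normal(size=n)
mask = rng.permutation(n) < 80                     # an arbitrary cluster C_k
xi = np.argsort(np.argsort(xj)) + 1.0              # ranks 1..n
n_k, xbar = mask.sum(), (n + 1) / 2

# BCSS of the ranks for the one-vs-rest split {C_k, C_k^c}
b_jk = (n_k * (xi[mask].mean() - xbar) ** 2
        + (n - n_k) * (xi[~mask].mean() - xbar) ** 2)

# Screening utility: upsilon_hat_{jk} = 12 (n+1) p_hat / (1 - p_hat) * tau_hat^2
p_hat = n_k / n
tau = 0.5 - xi[mask].mean() / (n + 1)
ups = 12 * (n + 1) * p_hat / (1 - p_hat) * tau ** 2

print(b_jk, ups * n * (n + 1) / 12)                # the two values coincide
```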

1.11 Proof of Theorem 4

Proof

Due to the equivalence of \(\hat{\upsilon }_{j k} \ge \rho \) and \(|\hat{\tau }_{j, k}| \ge \sqrt{\rho }\, \sqrt{\frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}}\), the proof of Theorem 4 proceeds in terms of \(|\hat{\tau }_{j, k}|\). For \(k = 1, \ldots , K\) and \(j = 1, \ldots , p\), note that

$$\begin{aligned}&\textrm{Pr}_{|\hat{\tau }_{j, k}|}(x) := \textrm{Pr} \left\{ |\hat{\tau }_{j, k}|< x \sqrt{\frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}} \right\} , \\&\textrm{Pr}_{|\hat{\tau }_{k}|}(x) := \textrm{Pr} \left\{ |\hat{\tau }_{j, k}| < x \sqrt{\frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}} , \right. \\&\qquad \qquad \qquad \qquad \left. \forall j \in \{j: j \notin \mathcal {A}_k\} \right\} , \\&\textrm{Pr}_{|\hat{\tau }_{k}|}^{c}(x) := \textrm{Pr} \left\{ |\hat{\tau }_{j, k}| \ge x \sqrt{\frac{1 - \hat{p}_{k}}{12 (n+1) \hat{p}_{k}}} , \right. \\&\qquad \qquad \qquad \qquad \left. \exists j \in \{j: j \notin \mathcal {A}_k\} \right\} . \end{aligned}$$

Then, under \(\textrm{H}_{0,j,k}\), the null distribution of \(\hat{\tau }_{j, k}\) is symmetric about zero, so we get

$$\begin{aligned} \textrm{Pr}_{|\hat{\tau }_{j, k}|}(x)&= \textrm{Pr}_{\hat{\tau }_{j, k}}(x) - \textrm{Pr}_{\hat{\tau }_{j, k}}(-x) \\&= 1 - 2 \textrm{Pr}_{\hat{\tau }_{j, k}}(-x). \end{aligned}$$

Hence, we obtain

$$\begin{aligned}&\textrm{Pr}_{|\hat{\tau }_{k}|}(x) = [ \textrm{Pr}_{|\hat{\tau }_{j, k}|}(x) ]^{p-s_k} = [ 1 - 2 \textrm{Pr}_{\hat{\tau }_{j, k}}(-x) ]^{p-s_k}, \\&\textrm{Pr}_{|\hat{\tau }_{k}|}^{c}(x) = 1 - \textrm{Pr}_{|\hat{\tau }_{k}|}(x) = 1 - [ 1 - 2 \textrm{Pr}_{\hat{\tau }_{j, k}}(-x) ]^{p-s_k}. \end{aligned}$$

When \(\beta \in (2, + \infty )\), the threshold \(\rho = o(n^{1/4})\) may be treated as a constant. Hence, we obtain

$$\begin{aligned} \frac{1-F_{\hat{\tau }_{j k}}(\sqrt{\rho })}{(2p)^{-1}}=\frac{F_{\hat{\tau }_{j k}}(-\sqrt{\rho })}{(2p)^{-1}}= 1 + O \left( n^{-1/2} \right) . \end{aligned}$$
(A10)

By the definition of \(\textrm{Pr}_{|\hat{\tau }_{k}|}^{c}(x)\), it follows that

$$\begin{aligned} \textrm{Pr}_{|\hat{\tau }_{k}|}^{c}(\sqrt{\rho }) = 1 - \left\{ 1 - p^{-1} \left[ {1 + O \left( n^{-1/2} \right) } \right] \right\} ^{q_k}. \end{aligned}$$
(A11)

In the ultra-high-dimensional setting, we have \(p = o\left( \exp \{n^{c} \} \right) \) for some \(c > 0\) and \(s_k = o(n)\). If \(c > 1/2\), then according to Eqs. (A10) and (A11), we have

$$\begin{aligned}&\left\{ 1 - p^{-1} \left[ {1 + O \left( n^{-1/2} \right) } \right] \right\} ^{q_k} \\&= \exp \left\{ q_k \cdot \log \left\{ 1 - p^{-1} \left[ {1 + O \left( n^{-1/2} \right) } \right] \right\} \right\} \\&= \exp \left\{ - q_k \cdot \left( p^{-1} \left[ {1 + O \left( n^{-1/2} \right) } \right] + o(1/p) \right) \right\} \\&= \exp \left\{ - 1 + O ( n^{-1/2} ) \right\} \exp \left\{ o(1/p) \right\} \rightarrow e^{-1}. \end{aligned}$$

To conclude, \(\textrm{Pr}_{|\hat{\tau }_{k}|}^{c}(\sqrt{\rho }) \rightarrow 1 - e^{-1}\) as \(n \rightarrow \infty \); in other words,

$$\begin{aligned}&\lim _{p \rightarrow \infty } {\text {Pr}} \{ {\text {FD}}_{k, \hat{\rho }_0}> 0 \} \\&= \lim _{p \rightarrow \infty } {\text {Pr}} \{ \sum _{j \in \mathcal {A}_{k}^c} I({\hat{\upsilon }}_{j k} \ge \hat{\rho }_0 ) > 0 \} \\&= 1 - e^{-1} \quad (k = 1, \ldots , K). \end{aligned}$$

The number of noise variables screened into the adaptive discovery set is a sum of \(q_k\) Bernoulli indicators, each with success probability of order \(1/p\); hence, the expectation and variance of the number of false discoveries are

$$\begin{aligned}&{\text {EFD}} = q_k \cdot 1/p = ( 1 + O ( n^{-1/2}))(1 - o(p^{-1})) \\&= 1 + O ( n^{-1/2} ), \\&{\text {VFD}} = q_k \cdot 1/p \cdot (1-1/p) \\&= ( 1 + O ( n^{-1/2}))(1 - o(p^{-1}))^2 \\&= 1 + O ( n^{-1/2} ). \end{aligned}$$

\(\square \)
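The limiting probability \(1 - e^{-1}\) and the unit expectation and variance of the false-discovery count are easy to reproduce by simulation. A minimal sketch, under the simplifying assumption that each of the \(q_k \approx p\) null statistics crosses the adaptive threshold independently with probability \(1/p\):

```python
import numpy as np

rng = np.random.default_rng(1)
p, trials = 10_000, 5_000

# Under the stated assumption, FD_k ~ Binomial(p, 1/p).
fd = rng.binomial(p, 1.0 / p, size=trials)

print((fd > 0).mean())        # ~ 1 - exp(-1) ~ 0.632, i.e. Pr{FD_k > 0}
print(fd.mean(), fd.var())    # EFD ~ 1 and VFD ~ 1
print((1 - 1.0 / p) ** p)     # ~ exp(-1), matching the displayed limit
```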

1.12 Proof of Theorem 5

Proof

According to Condition C2, the definition of \(\hat{\rho }_0\) in Sect. 4.1, and the bound \(\max \limits _{j \in \mathcal {A}_{k}} |\hat{\upsilon }_{j, k} - \upsilon _{j, k} | \le c_{13}^{*} n^{-\kappa }\) from Theorem 1, we have

$$\begin{aligned}&\min _{j \in \mathcal {A}_{k}} |\hat{\upsilon }_{j k}| \ge \min _{j \in \mathcal {A}_{k}} ( |\upsilon _{j k} | - |\hat{\upsilon }_{j, k} - \upsilon _{j k} | ) \\&\ge \min _{j \in \mathcal {A}_{k}} | \upsilon _{j k} |-\max _{j \in \mathcal {A}_{k}} |\upsilon _{j k} - \hat{\upsilon }_{j, k} | \ge c_{13}^{*} n^{-\kappa }. \end{aligned}$$

Therefore, we obtain that

$$\begin{aligned}&{\text {Pr}}\left( \mathcal {A}_{k} \subset \hat{\mathcal {A}}_{k, \hat{\rho }_0}\right) \ge {\text {Pr}}\left( \max _{j \in \mathcal {A}_{k}} |\hat{\upsilon }_{j, k}-\upsilon _{j, k}| \le c_{13}^{*} n^{-\kappa }\right) \\&\ge 1 - 4 (n + 1) p s_{k} \exp \left( -c_{13} n^{1-2 \kappa -\xi }\right) \end{aligned}$$

holds for some constant \(c_{13}>0\). \(\square \)

1.13 Proof of Corollary 3

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. First, by Theorem 1, for the positive constant \(c_{15}^{*}\) satisfying \(c_{15}^{*} n^{-\kappa } = \hat{\rho }_{0}\), we get

$$\begin{aligned}&\textrm{Pr}\left( \max _{1 \le j \le p} \left| \hat{\upsilon }_{j k} - \upsilon _{j k} \right| \ge c_{15}^{*} n^{-\kappa }\right) \\&\le 4(n+2) p \exp \left( -c_{17} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

holds for some constant \(c_{17} > 0\). Let \(c_{15} = (K+\sum _{k=1}^{K}s_k) \cdot c_{15}^{*}\). By Theorems 1 and 4, we have

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{\rho } - {\Upsilon } | \ge c_{15} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( |{\hat{\Upsilon }}_{\rho } - {\Upsilon } | \le c_{15} n^{-\kappa } )\\&\le 1 - \prod _{k=1}^{K} \textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{15}^{*} n^{-\kappa } \right) \\&\qquad \cdot \prod _{k=1}^{K} \prod _{j \in \mathcal {A}_k} \textrm{Pr} ( |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{15}^{*} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{17} n^{1-2 \kappa -\xi } ) ]^{K+\sum _{k=1}^{K}s_k}, \end{aligned}$$

where

$$\begin{aligned}&\textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{15}^{*} n^{-\kappa } \right) \\&= 1 - \textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \ge c_{15}^{*} n^{-\kappa } \right) \\&\ge 1 -\sum _{j \notin \mathcal {A}_k} \textrm{Pr} \left( |\hat{\upsilon }_{jk} - \upsilon _{jk}| \ge c_{15}^{*} n^{-\kappa } | j \in \hat{\mathcal {A}}_{\hat{\rho }_0, k} \right) \\&\qquad \cdot \textrm{Pr} \left( j \in \hat{\mathcal {A}}_{\hat{\rho }_0, k} \right) \\&\quad - \sum _{j \notin \mathcal {A}_k} \textrm{Pr} \left( |\hat{\upsilon }_{jk} - \upsilon _{jk}| \ge c_{15}^{*} n^{-\kappa } | j \notin \hat{\mathcal {A}}_{\hat{\rho }_0, k} \right) \\&\qquad \cdot \textrm{Pr} \left( j \notin \hat{\mathcal {A}}_{\hat{\rho }_0, k} \right) \\&\ge 1 - q_k/p \cdot 4(n+2) p \exp (-c_{17} n^{1-2 \kappa -\xi } ) - q_k/p \cdot 0\\&\ge 1 - 4(n+2) p \exp (-c_{17} n^{1-2 \kappa -\xi } ) \end{aligned}$$

Similarly, by Theorem 1, for the positive constant \(c_{16}^{*}\) satisfying \(c_{16}^{*} n^{-\kappa } = \hat{\rho }_{0}\), we get

$$\begin{aligned}&\textrm{Pr}\left( \max _{1 \le j \le p} \left| \hat{\upsilon }_{j k} - \upsilon _{j k} \right| \ge c_{16}^{*} n^{-\kappa }\right) \\&\le 4(n+2) p \exp \left( -c_{18} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

holds for some constant \(c_{18} > 0\). Let \(c_{16} = (s + 1) \cdot K \cdot c_{16}^{*}\). By Theorems 1 and 4, we obtain

$$\begin{aligned}&\textrm{Pr} ( | {\hat{\Upsilon }_{\rho }}^{*} - {\Upsilon }^{*} | \ge c_{16} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( | {\hat{\Upsilon }_{\rho }}^{*} - {\Upsilon }^{*} | \le c_{16} n^{-\kappa } )\\&\le 1 - \textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{j} - \upsilon _{j}| \le c_{16}^{*} K n^{-\kappa } \right) \\&\quad \cdot \prod _{j \in \mathcal {A}} \textrm{Pr} ( |\hat{\upsilon }_{j} - \upsilon _{j}| \le c_{16}^{*} K n^{-\kappa } ) \\&\le 1 - \prod _{k=1}^{K} \textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{16}^{*} n^{-\kappa } \right) \\&\quad \cdot \prod _{j \in \mathcal {A}} \prod _{k=1}^{K} \textrm{Pr} ( |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{16}^{*} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{18} n^{1-2 \kappa -\xi } ) ]^{(s+1)K}, \end{aligned}$$

where

$$\begin{aligned}&\textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{j} - \upsilon _{j}| \le Kc_{16}^{*} n^{-\kappa } \right) \\&= 1 - \textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{j} - \upsilon _{j}| \ge Kc_{16}^{*} n^{-\kappa } \right) \\&\ge 1 - \sum _{k=1}^{K} \textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \ge Kc_{16}^{*} n^{-\kappa } \right) \\&\ge 1 -\sum _{k=1}^{K} \sum _{j \notin \mathcal {A}} \textrm{Pr} \left( |\hat{\upsilon }_{j} - \upsilon _{j}| \ge c_{16}^{*} n^{-\kappa } | j \in \hat{\mathcal {A}}_{\hat{\rho }_0} \right) \\&\qquad \cdot \textrm{Pr} \left( j \in \hat{\mathcal {A}}_{\hat{\rho }_0} \right) \\&\quad - \sum _{k=1}^{K} \sum _{j \notin \mathcal {A}} \textrm{Pr} \left( |\hat{\upsilon }_{j} - \upsilon _{j}| \ge c_{16}^{*} n^{-\kappa } | j \notin \hat{\mathcal {A}}_{\hat{\rho }_0} \right) \\&\qquad \cdot \textrm{Pr} \left( j \notin \hat{\mathcal {A}}_{\hat{\rho }_0} \right) \\&\ge \left[ 1 - q/p \cdot 4(n+2) p \exp (-c_{18} n^{1-2 \kappa -\xi } ) - q/p \cdot 0 \right] ^{K}\\&\ge \left[ 1 - 4(n+2) p \exp (-c_{18} n^{1-2 \kappa -\xi } ) \right] ^{K}. \end{aligned}$$

\(\square \)
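To see that bounds of this form are meaningful in the ultra-high-dimensional regime, one can evaluate them with illustrative numbers. In the sketch below every constant (\(n\), \(\kappa \), \(\xi \), \(K\), \(s\), the growth rate of \(p\), and the unit constant in the exponent) is a hypothetical choice, since the corollary only asserts the existence of suitable constants:

```python
import numpy as np

# Hypothetical constants, for illustration only.
n, kappa, xi_, K, s = 1_000, 0.2, 0.1, 4, 20
p = np.exp(n ** 0.3)        # ultra-high dimension: p = o(exp{n^c})

# Per-variable deviation probability from the Theorem 1-type bound (c = 1 here).
per_var = 4 * (n + 2) * p * np.exp(-n ** (1 - 2 * kappa - xi_))

# Corollary 3-type failure probability for the aggregated statistic.
failure = 1 - (1 - per_var) ** ((s + 1) * K)
print(per_var, failure)     # both are vanishingly small for large n
```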

1.14 Proof of Theorem 6

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. By Lemma 2 and Corollary 3, there exist positive constants \(c_{19,1}\) and \(c_{19,2}\) such that

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{0, \hat{\rho }_0} - {{\Upsilon }}_{0, \hat{\rho }_0}| \ge c_{19,1} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{21} n^{1-2 \kappa -\xi } ) ]^{K+\sum _{k=1}^{K}s_k} \end{aligned}$$

and

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{\hat{\rho }_0} - {{\Upsilon }}_{\hat{\rho }_0}| \ge c_{19,2} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{21} n^{1-2 \kappa -\xi } ) ]^{K+\sum _{k=1}^{K}s_k} \end{aligned}$$

hold for some positive constant \(c_{21}\). Let \(c_{19} = c_{19,1} + c_{19,2}\). Thus, we obtain that

$$\begin{aligned}&\textrm{Pr} ( {\hat{\Upsilon }}_{0, \hat{\rho }_0} - {\hat{\Upsilon }}_{\hat{\rho }_0} \ge c_{19} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( {\hat{\Upsilon }}_{0, \hat{\rho }_0} - {\hat{\Upsilon }}_{\hat{\rho }_0} \le c_{19} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} ( |{\hat{\Upsilon }}_{0, \hat{\rho }_0} - {{\Upsilon }}_{0, \hat{\rho }_0}| \le c_{19,1} n^{-\kappa } ) \\&\quad \cdot \textrm{Pr} ( |{\hat{\Upsilon }}_{\hat{\rho }_0} - {{\Upsilon }}_{\hat{\rho }_0}| \le c_{19,2} n^{-\kappa } ) \\&\le \left\{ 1 - [ 1 - 4(n+2) p \exp (-c_{21} n^{1-2 \kappa -\xi } ) ]^{K+\sum _{k=1}^{K}s_k} \right\} ^2. \end{aligned}$$

Similarly, by Lemma 2 and Corollary 3, there exist positive constants \(c_{20,1}\) and \(c_{20,2}\), with \(c_{20} = c_{20,1} + c_{20,2}\), such that

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{0, \hat{\rho }_0}^{*} - {{\Upsilon }}_{0, \hat{\rho }_0}^{*}| \ge c_{20,1} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{22} n^{1-2 \kappa -\xi } ) ]^{K(s+1)} \end{aligned}$$

and

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{\hat{\rho }_0}^{*} - {{\Upsilon }}_{\hat{\rho }_0}^{*}| \ge c_{20,2} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{22} n^{1 - 2 \kappa -\xi } ) ]^{K(s+1)} \end{aligned}$$

hold for some constant \(c_{22}>0\). Then, we get that

$$\begin{aligned}&\textrm{Pr} ( {\hat{\Upsilon }}_{0, \hat{\rho }_0}^{*} - {\hat{\Upsilon }}_{\hat{\rho }_0}^{*} \ge c_{20} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( {\hat{\Upsilon }}_{0, \hat{\rho }_0}^{*} - {\hat{\Upsilon }}_{\hat{\rho }_0}^{*} \le c_{20} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} ( |{\hat{\Upsilon }}_{0, \hat{\rho }_0}^{*} - {{\Upsilon }}_{0, \hat{\rho }_0}^{*}| \le c_{20,1} n^{-\kappa } ) \\&\quad \cdot \textrm{Pr} ( |{\hat{\Upsilon }}_{\hat{\rho }_0}^{*} - {{\Upsilon }}_{\hat{\rho }_0}^{*}| \le c_{20,2} n^{-\kappa } ) \\&\le \{1 - [ 1 - 4(n+2) p \exp (-c_{22} n^{1-2 \kappa -\xi } ) ]^{K(s+1)}\}^2. \end{aligned}$$

Letting \(n \rightarrow \infty \), these bounds imply

$$\begin{aligned} \liminf _{n \rightarrow \infty } \{ {\hat{\Upsilon }}_{0, \hat{\rho }_0} - {\hat{\Upsilon }}_{\hat{\rho }_0} \} \ge 0, \quad \liminf _{n \rightarrow \infty } \{ {\hat{\Upsilon }}^{*}_{0, \hat{\rho }_0} - {\hat{\Upsilon }}^{*}_{\hat{\rho }_0} \} \ge 0. \end{aligned}$$

Denote \(\hat{\mathbb {C}}_{0, \hat{\rho }_0} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} \hat{S}_{\hat{\rho }_0, j k} \cdot \hat{\upsilon }_{j k}\) and \(\hat{\mathbb {C}}_{0, \hat{\rho }_0}^{*} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \hat{S}_{\hat{\rho }_0, j} \cdot \hat{\upsilon }_{j}\). We have that

$$\begin{aligned} \liminf _{n \rightarrow \infty } \hat{\mathbb {C}}_{0, \hat{\rho }_0} = \liminf _{n \rightarrow \infty } \hat{\mathbb {C}}_{0, \hat{\rho }_0}^{*}. \end{aligned}$$

\(\square \)

1.15 Proof of Theorem 7

Proof

Similarly to the proof of Theorem 5, we have

$$\begin{aligned} {\text {Pr}} (\mathcal {A}_{k} \subset \hat{\mathcal {A}}_{k, \alpha } ) \ge 1- 8p s_{k} \exp \left( -c_{23} n^{1-2 \kappa -\xi }\right) \end{aligned}$$

holds for some constant \(c_{23}>0\).

To prove \(\widehat{\textrm{FDR}}_{\hat{\rho }_k} \rightarrow \alpha \) in probability, under the assumption that \(q_k/p \rightarrow 1\) as \(p\rightarrow \infty \) and for any \(\rho > 0\), by Corollary 1 and Hoeffding's inequality, it suffices to show that

$$\begin{aligned} \sup _{0 \le \rho \le \epsilon }\left[ \sum _{j \in \mathcal {A}_{k}^c} I({\hat{\upsilon }}_{j k} \ge \rho ) / \left\{ p S_{\chi ^2_1}(\rho )\right\} \right] \rightarrow 1 \end{aligned}$$
(A12)

in probability as \(n \rightarrow \infty \), where \(\epsilon > 0\) is a constant and \(S_{\chi ^2_1}(\rho ) = 1 - F_{\chi ^2_1}(\rho )\) is the survival function of the \(\chi ^2_1\) distribution. Thus,

$$\begin{aligned} {\text {FDP}}_{k, \rho }&= \frac{{\text {FD}}_{k, \rho }}{\max \{\sum _{j \in \mathcal {I}} I({\hat{\upsilon }}_{j k} \ge \rho ), 1 \}} \\&= \frac{\sum _{j \in \mathcal {A}_{k}^c} I({\hat{\upsilon }}_{j k} \ge \rho ) / q_k}{\max \{\sum _{j \in \mathcal {I}} I({\hat{\upsilon }}_{j k} \ge \rho ), 1 \} / p} \\&= \frac{p S_{\chi ^2_1}(\rho )}{\max \{\sum _{j \in \mathcal {I}} I({\hat{\upsilon }}_{j k} \ge \rho ), 1 \}} \\&= \frac{p S_{\chi ^2_1}(\rho )}{\max \{p S_{\chi ^2_1}(\rho ) + \sum _{j \in \mathcal {A}_{k}} I({\hat{\upsilon }}_{j k} \ge \rho ), 1 \}}. \end{aligned}$$

Notice that \(\sum _{j \in \mathcal {A}_{k}} I({\hat{\upsilon }}_{j k} \ge \rho )\) is monotone in \(\rho \) and converges asymptotically to \(s_k\), while \(S_{\chi ^2_1}(\rho )\) is continuous and monotone. Then, there exists a unique constant \(0< \hat{\rho }_k \le C n^{-\beta }\) such that

$$\begin{aligned} \frac{p S_{\chi ^2_1}(\hat{\rho }_k) }{\max \{\sum _{j \in \mathcal {I}} I({\hat{\upsilon }}_{j k} \ge \hat{\rho }_k), 1 \}} = \alpha \end{aligned}$$
(A13)

in probability as \(n \rightarrow \infty \). Therefore, according to Eqs. (A12) and (A13), we obtain

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{\widehat{{\text {FDR}}}_{\hat{\rho }_k}}{\alpha }=1. \end{aligned}$$

\(\square \)
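In practice, the data-driven threshold \(\hat{\rho }_k\) solving Eq. (A13) can be located by scanning the observed statistics as candidate thresholds. The Python sketch below is a hypothetical implementation of this search (the helper `adaptive_threshold` is not from the paper, and the toy data assume the null statistics are approximately \(\chi ^2_1\), matching the survival function above):

```python
import numpy as np
from scipy.stats import chi2

def adaptive_threshold(v, alpha):
    """Smallest candidate rho with
    p * S_{chi^2_1}(rho) / max(#{v_j >= rho}, 1) <= alpha  (cf. Eq. (A13)),
    scanning the observed statistics in ascending order."""
    p = len(v)
    for rho in np.sort(v):
        n_sel = (v >= rho).sum()
        if p * chi2.sf(rho, df=1) / max(n_sel, 1) <= alpha:
            return rho
    return np.inf                              # level alpha is never attained

# Toy data: 2000 null chi^2_1 statistics plus 10 strong signals.
rng = np.random.default_rng(2)
v = np.concatenate([chi2.rvs(df=1, size=2_000, random_state=rng),
                    np.full(10, 30.0)])
print(adaptive_threshold(v, alpha=0.1))
```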

1.16 Proof of Corollary 4

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. By Theorem 1, for the positive constant \(c_{24}^{*}\) satisfying \(c_{24}^{*} n^{-\kappa } = \max \{ \hat{\rho }_{\alpha , 1}, \ldots , \hat{\rho }_{\alpha , K} \}\), we have

$$\begin{aligned}&\textrm{Pr}\left( \max _{1 \le j \le p} \left| \hat{\upsilon }_{j k} - \upsilon _{j k} \right| \ge c_{24}^{*} n^{-\kappa }\right) \\&\le 4(n+2) p \exp \left( -c_{26} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

holds for some constant \(c_{26} > 0\). Note that

$$ \begin{aligned}&\textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le \frac{\alpha }{1-\alpha } s_k c_{24}^{*} n^{-\kappa } \right) \\&= 1 - \textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \ge \frac{\alpha }{1-\alpha } s_k c_{24}^{*} n^{-\kappa } \right) \\&\ge 1 - \textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k \& j \in \hat{\mathcal {A}}_{\alpha , k}} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \ge \frac{\alpha }{1-\alpha } s_k c_{24}^{*} n^{-\kappa } \right) \\&\ge \left[ 1 - 4(n+2) p \exp (-c_{26} n^{1-2 \kappa -\xi } ) \right] ^{\frac{\alpha }{1-\alpha } s_k}. \end{aligned}$$

Then, by Theorems 1, 4, and 7, letting \(c_{24} = c_{24}^{*} \sum _{k=1}^{K}s_k/(1-\alpha )\), we have

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{\alpha } - {\Upsilon } | \ge c_{24} n^{-\kappa } ) = 1 - \textrm{Pr} ( |{\hat{\Upsilon }}_{\alpha } - {\Upsilon } | \le c_{24} n^{-\kappa } )\\&\le 1 - \prod _{k=1}^{K} \textrm{Pr} \left( \sum _{j \notin \mathcal {A}_k} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le \frac{\alpha }{1-\alpha } s_k c_{24}^{*} n^{-\kappa } \right) \\&\qquad \cdot \prod _{k=1}^{K} \prod _{j \in \mathcal {A}_k} \textrm{Pr} ( |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{24}^{*} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{26} n^{1-2 \kappa -\xi } ) ]^{\sum _{k=1}^{K}s_k/(1-\alpha )}. \end{aligned}$$

Similarly, by Theorem 1, for the constant \(c_{25}^{*} > 0\) satisfying \(c_{25}^{*} n^{-\kappa } = \hat{\rho }_{\alpha }\),

$$\begin{aligned}&\textrm{Pr}\left( \max _{1 \le j \le p} \left| \hat{\upsilon }_{j k} - \upsilon _{j k} \right| \ge c_{25}^{*} n^{-\kappa }\right) \\&\le 4(n+2) p \exp \left( -c_{27} n^{1-2 \kappa -\xi }\right) , \end{aligned}$$

holds for some constant \(c_{27} > 0\). Let \(c_{25} = c_{25}^{*} Ks/(1-\alpha )\). By Theorems 1 and 4, we obtain

$$\begin{aligned}&\textrm{Pr} ( | {\hat{\Upsilon }_{\alpha }}^{*} - {\Upsilon }^{*} | \ge c_{25} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( | {\hat{\Upsilon }_{\alpha }}^{*} - {\Upsilon }^{*} | \le c_{25} n^{-\kappa } )\\&\le 1 - \textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{j} - \upsilon _{j}| \le \frac{\alpha }{1-\alpha } s K c_{25}^{*} n^{-\kappa } \right) \\&\qquad \cdot \prod _{j \in \mathcal {A}} \textrm{Pr} ( |\hat{\upsilon }_{j} - \upsilon _{j}| \le c_{25}^{*} K n^{-\kappa } ) \\&\le 1 - \prod _{k=1}^{K} \textrm{Pr} \left( \sum _{j \notin \mathcal {A}} |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le \frac{\alpha }{1-\alpha } s c_{25}^{*} n^{-\kappa } \right) \\&\qquad \cdot \prod _{j \in \mathcal {A}} \prod _{k=1}^{K} \textrm{Pr} ( |\hat{\upsilon }_{jk} - \upsilon _{jk}| \le c_{25}^{*} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{27} n^{1-2 \kappa -\xi } ) ]^{sK/(1-\alpha )}. \end{aligned}$$

\(\square \)

1.17 Proof of Theorem 8

Proof

Suppose Assumptions (i)–(ii) and Conditions (C1)–(C3) hold. By Lemma 2 and Corollary 4, there exist positive constants \(c_{28,1}\) and \(c_{28,2}\), with \(c_{28}= c_{28,1} + c_{28,2}\), such that

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{0, \alpha } - {{\Upsilon }}_{0, \alpha }| \ge c_{28,1} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{30} n^{1-2 \kappa -\xi } ) ]^{\sum _{k=1}^{K}s_k /(1-\alpha )}, \\&\textrm{Pr} ( |{\hat{\Upsilon }}_{\alpha } - {{\Upsilon }}_{\alpha }| \ge c_{28,2} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{30} n^{1-2 \kappa -\xi } ) ]^{\sum _{k=1}^{K}s_k / (1-\alpha )} \end{aligned}$$

hold for some positive constant \(c_{30}\). Thus, we obtain

$$\begin{aligned}&\textrm{Pr} ( {\hat{\Upsilon }}_{0, \alpha } - {\hat{\Upsilon }}_{\alpha } \ge c_{28} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( {\hat{\Upsilon }}_{0, \alpha } - {\hat{\Upsilon }}_{\alpha } \le c_{28} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} ( |{\hat{\Upsilon }}_{0, \alpha } - {{\Upsilon }}_{0, \alpha }| \le c_{28,1} n^{-\kappa } ) \\&\qquad \cdot \textrm{Pr} ( |{\hat{\Upsilon }}_{\alpha } - {{\Upsilon }}_{\alpha }| \le c_{28,2} n^{-\kappa } ) \\&\le \left\{ 1 - [ 1 - 4(n+2) p \exp (-c_{30} n^{1-2 \kappa -\xi } ) ]^{\frac{\sum _{k=1}^{K}s_k}{(1-\alpha )}} \right\} ^2. \end{aligned}$$

Similarly, by Lemma 2 and Corollary 4, there exist positive constants \(c_{29,1}\) and \(c_{29,2}\), with \(c_{29} = c_{29,1} + c_{29,2}\), such that

$$\begin{aligned}&\textrm{Pr} ( |{\hat{\Upsilon }}_{0, \alpha }^{*} - {{\Upsilon }}_{0, \alpha }^{*}| \ge c_{29,1} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{31} n^{1-2 \kappa -\xi } ) ]^{Ks/(1-\alpha )}, \\&\textrm{Pr} ( |{\hat{\Upsilon }}_{\alpha }^{*} - {{\Upsilon }}_{\alpha }^{*}| \ge c_{29,2} n^{-\kappa } ) \\&\le 1 - [ 1 - 4(n+2) p \exp (-c_{31} n^{1 - 2 \kappa -\xi } ) ]^{Ks/(1-\alpha )} \end{aligned}$$

hold for some positive constant \(c_{31}\). Then, we get

$$\begin{aligned}&\textrm{Pr} ( {\hat{\Upsilon }}_{0, \alpha }^{*} - {\hat{\Upsilon }}_{\alpha }^{*} \ge c_{29} n^{-\kappa } ) \\&= 1 - \textrm{Pr} ( {\hat{\Upsilon }}_{0, \alpha }^{*} - {\hat{\Upsilon }}_{\alpha }^{*} \le c_{29} n^{-\kappa } ) \\&\le 1 - \textrm{Pr} ( |{\hat{\Upsilon }}_{0, \alpha }^{*} - {{\Upsilon }}_{0, \alpha }^{*}| \le c_{29,1} n^{-\kappa } ) \\&\qquad \cdot \textrm{Pr} ( |{\hat{\Upsilon }}_{\alpha }^{*} - {{\Upsilon }}_{\alpha }^{*}| \le c_{29,2} n^{-\kappa } ) \\&\le \left\{ 1 - [ 1 - 4(n+2) p \exp (-c_{31} n^{1-2 \kappa -\xi } ) ]^{Ks/(1-\alpha )} \right\} ^2. \end{aligned}$$
(Figures 6, 7 and 8 show boxplots of the PD data in each cluster under the clustering results of SSC-A, ASSC-AFD-A and ASSC-FDR-A, respectively.)

Letting \(n \rightarrow \infty \), these bounds imply

$$\begin{aligned} \liminf _{n \rightarrow \infty } \{ {\hat{\Upsilon }}_{0, \alpha } - {\hat{\Upsilon }}_{\alpha } \} \ge 0, \quad \liminf _{n \rightarrow \infty } \{ {\hat{\Upsilon }}^{*}_{0, \alpha } - {\hat{\Upsilon }}^{*}_{\alpha } \} \ge 0. \end{aligned}$$

Denote \(\hat{\mathbb {C}}_{0, \alpha } = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \sum _{k=1}^{K} \hat{S}_{\alpha , j k} \cdot \hat{\upsilon }_{j k}\) and \(\hat{\mathbb {C}}_{0, \alpha }^{*} = \mathop {\arg \max }\limits _{\mathbb {C}} \sum _{j=1}^{p} \hat{S}_{\alpha , j} \cdot \hat{\upsilon }_{j}\). We have that

$$\begin{aligned} \liminf _{n \rightarrow \infty } \hat{\mathbb {C}}_{0, \alpha } = \liminf _{n \rightarrow \infty } \hat{\mathbb {C}}_{0, \alpha }^{*}. \end{aligned}$$

\(\square \)

Supplements

(Figures 9 and 10 show boxplots of the PD data in each cluster under the clustering results of SSC-C and ASSC-FDR-C, respectively.)
