Two-Stage Online Debiased Lasso Estimation and Inference for High-Dimensional Quantile Regression with Streaming Data

Journal of Systems Science and Complexity

Abstract

In this paper, the authors propose a two-stage online debiased lasso estimation and statistical inference method for high-dimensional quantile regression (QR) models with streaming data. In the first stage, the authors modify the QR score function by kernel smoothing and obtain the online lasso smoothed QR estimator through iterative algorithms. The estimation process involves only the current data batch and specific historical summary statistics, which suits the batch-by-batch arrival of streaming data. In the second stage, an online debiasing procedure removes the bias caused by the lasso penalty and by the accumulated approximation error, so that the asymptotic normality of the resulting estimator can be established. Extensive numerical experiments demonstrate the effectiveness of the proposed method and support the theoretical results. An application to the Beijing PM2.5 dataset is also presented.
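
The first-stage idea in the abstract (a kernel-smoothed QR score combined with a lasso penalty and solved iteratively) can be illustrated with a minimal single-batch sketch. This is not the authors' algorithm: the logistic kernel, the proximal gradient solver, and the names `smoothed_score`, `soft_threshold`, `lasso_smoothed_qr`, `h`, and `lam` are all assumptions made for illustration, and the sketch does not reproduce the historical summary statistics or the second-stage debiasing.

```python
import numpy as np

def smoothed_score(residual, tau, h):
    """Smoothed version of the QR score tau - 1{residual < 0}.

    Assumption: the indicator is replaced by a logistic kernel CDF
    evaluated at -residual / h, where h > 0 is a bandwidth.
    """
    smooth_indicator = 0.5 * (1.0 + np.tanh(-residual / (2.0 * h)))
    return tau - smooth_indicator

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_smoothed_qr(X, y, tau=0.5, h=0.5, lam=0.05, n_iter=500):
    """Proximal gradient iterations for lasso-penalized smoothed QR
    on one batch (hypothetical helper, illustration only)."""
    n, p = X.shape
    beta = np.zeros(p)
    # The smoothed loss has Hessian (1/n) X^T diag(g(-r/h)/h) X, and the
    # logistic density g is at most 1/4, which gives this Lipschitz bound.
    lipschitz = np.linalg.norm(X, 2) ** 2 / (4.0 * h * n)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ smoothed_score(residual, tau, h) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 400, 20
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:2] = [2.0, -1.5]
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    print(np.round(lasso_smoothed_qr(X, y), 2))
```

Smoothing makes the score differentiable, so a plain gradient step followed by soft-thresholding is enough here. The shrinkage visible in the nonzero coefficients is the lasso bias that a debiasing stage is meant to correct.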

Author information

Corresponding author

Correspondence to Lei Wang.

Ethics declarations

The authors declare no conflict of interest.

Additional information

This research was supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China under Grant No. 12271272.

About this article

Cite this article

Peng, Y., Wang, L. Two-Stage Online Debiased Lasso Estimation and Inference for High-Dimensional Quantile Regression with Streaming Data. J Syst Sci Complex 37, 1251–1270 (2024). https://doi.org/10.1007/s11424-023-3014-y

  • DOI: https://doi.org/10.1007/s11424-023-3014-y
