Abstract
In this paper, the authors propose a two-stage online debiased lasso method for estimation and statistical inference in high-dimensional quantile regression (QR) models with streaming data. In the first stage, the authors smooth the QR score function with a kernel and obtain the online lasso smoothed QR estimator through iterative algorithms. Each update uses only the current data batch and specific historical summary statistics, which makes the procedure well suited to the structure of streaming data. In the second stage, an online debiasing procedure removes the bias introduced by the lasso penalty as well as the accumulated approximation error, so that the asymptotic normality of the resulting estimator can be established. The authors conduct extensive numerical experiments to evaluate the proposed method; these experiments demonstrate its effectiveness and support the theoretical results. An application to the Beijing PM2.5 Dataset is also presented.
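The first stage can be illustrated with a minimal Python sketch. It assumes a Gaussian kernel for smoothing the indicator in the QR score, a renewable-style quadratic surrogate that carries the history through an accumulated Hessian, and proximal gradient (ISTA) iterations for the lasso step. These choices, and every name in the code (`OnlineSmoothedLassoQR`, `smoothed_score`, bandwidth `h`, penalty `lam`, step size `step`), are illustrative assumptions, not the authors' exact algorithm. The second-stage online debiasing is not shown.

```python
import math
import numpy as np

# Hypothetical sketch of stage one only; names and update rule are assumptions.
_erf = np.vectorize(math.erf)

def kernel_cdf(z):
    """Integrated Gaussian kernel Phi(z), used in place of the indicator I(u < 0)."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def kernel_pdf(z):
    """Gaussian kernel density phi(z)."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def smoothed_score(X, y, beta, tau, h):
    """Batch-averaged gradient of the smoothed check loss: x_i * (Phi((x_i'beta - y_i)/h) - tau)."""
    z = (X @ beta - y) / h
    return X.T @ (kernel_cdf(z) - tau) / len(y)

def smoothed_hessian(X, y, beta, h):
    """Hessian of the smoothed loss, summed (not averaged) over the batch."""
    z = (X @ beta - y) / h
    w = kernel_pdf(z) / h
    return (X * w[:, None]).T @ X

def soft_threshold(v, t):
    """Proximal operator of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

class OnlineSmoothedLassoQR:
    """Keeps only (beta, accumulated Hessian, sample count) between batches."""

    def __init__(self, p, tau=0.5, h=0.5, lam=0.05, step=0.5, n_iter=300):
        self.tau, self.h, self.lam, self.step, self.n_iter = tau, h, lam, step, n_iter
        self.beta = np.zeros(p)
        self.H_sum = np.zeros((p, p))  # historical summary statistic
        self.N = 0                     # observations seen so far

    def update(self, X, y):
        n = len(y)
        N_new = self.N + n
        beta_prev = self.beta.copy()
        beta = beta_prev.copy()
        for _ in range(self.n_iter):
            # Current-batch gradient plus a quadratic surrogate for past batches.
            g = (n * smoothed_score(X, y, beta, self.tau, self.h)
                 + self.H_sum @ (beta - beta_prev)) / N_new
            # ISTA step: gradient move followed by soft-thresholding.
            beta = soft_threshold(beta - self.step * g, self.step * self.lam)
        self.H_sum += smoothed_hessian(X, y, beta, self.h)
        self.N = N_new
        self.beta = beta
        return beta
```

Between batches only `beta`, `H_sum`, and `N` are retained, so raw data from earlier batches can be discarded. The soft-thresholding step shrinks nonzero coefficients toward zero, which is the kind of bias a second-stage debiasing procedure is meant to remove.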
Ethics declarations
The authors declare no conflict of interest.
Additional information
This research was supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China under Grant No. 12271272.
About this article
Cite this article
Peng, Y., Wang, L. Two-Stage Online Debiased Lasso Estimation and Inference for High-Dimensional Quantile Regression with Streaming Data. J Syst Sci Complex 37, 1251–1270 (2024). https://doi.org/10.1007/s11424-023-3014-y
DOI: https://doi.org/10.1007/s11424-023-3014-y