1. Introduction
Time series analysis is useful in many application domains for understanding patterns over time and for forecasting into the future. Most time series methods have been developed for regularly spaced time series, where the gaps between two consecutive time points are fixed and equal. Regular time series include hourly, daily, monthly, or annually recorded time series. Recently, there has been increasing interest in developing methods for analyzing and forecasting irregularly spaced time series, where the gaps between subsequent observations are not the same. Such irregular time series are observed in diverse fields such as astronomy [1], climatology [2], finance [3], etc. For instance, intra-day transaction-level data in finance consist of prices of a financial asset recorded at each trade within a trading day, resulting in irregular time intervals (in seconds, say) between consecutive trades. An example in real estate consists of the time series of sale prices of houses, where the discrete time gaps (in days or weeks) between subsequent sale dates are typically nonconstant.
Consider an irregularly spaced time series consisting of observations $y_{t_1}, y_{t_2}, \ldots, y_{t_m}$ at discrete times $t_1, t_2, \ldots, t_m$, where $t_1 < t_2 < \cdots < t_m$. The gaps between consecutive time points, denoted by $g_j = t_j - t_{j-1}$ for $j = 2, \ldots, m$, are not constant.
To model irregularly spaced time series with discrete gaps, Nagaraja et al. [4] proposed the stationary gap time autoregressive (Gap AR(1)) model:

$$y_{t_j} = \phi^{g_j}\, y_{t_{j-1}} + \sigma \sqrt{\frac{1 - \phi^{2 g_j}}{1 - \phi^{2}}}\; e_{t_j}, \qquad j = 2, \ldots, m, \tag{1}$$

where $e_{t_j}$ is the error term with the standard normal distribution, i.e., $e_{t_j} \sim N(0, 1)$, $\phi$ with $|\phi| < 1$ is the autoregressive parameter, and $\sigma > 0$ is a scale parameter. They used (1) to model and forecast house prices and then constructed a house price index. Other models have also been constructed in the literature for irregularly spaced time series. Erdogan et al. [5] described a nonstationary irregularly spaced autoregressive (NIS-AR(1)) model and illustrated its use on astronomy data. Anantharaman et al. [6] described Bayesian modeling for an irregular stochastic volatility autoregressive conditional duration (IR-SV-ACD) model and used this for estimating and forecasting inter-transaction gaps and the volatility of financial log-returns.
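To make the data-generating mechanism concrete, the following Julia sketch simulates a series from model (1); Julia is the language used for the numerical work in Section 5. The function name, parameter values, and the integer gap range are illustrative assumptions rather than part of the original specification.

```julia
using Random

# Minimal sketch: simulate an IS-AR(1) (Gap AR(1)) series of length m.
# The gap distribution and parameter values below are illustrative only.
function simulate_isar1(m::Int; ϕ = 0.5, σ = 1.0, maxgap = 10,
                        rng = Random.default_rng())
    g = rand(rng, 1:maxgap, m - 1)                   # integer gaps g_j between observations
    y = zeros(m)                                     # y[1] serves as the initial data point
    for j in 2:m
        gj = g[j - 1]
        scale = σ * sqrt((1 - ϕ^(2gj)) / (1 - ϕ^2))  # gap-dependent innovation scale
        y[j] = ϕ^gj * y[j - 1] + scale * randn(rng)
    end
    return y, g
end

y, g = simulate_isar1(10_000; ϕ = 0.8, σ = 2.0)
```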
In this paper, we refer to the Gap AR(1) model in (1) as the irregularly spaced AR(1) or the IS-AR(1) model. Estimation in this model can be handled by the method of maximum likelihood. While a model for irregularly spaced time series enables handling a wider range of temporal scenarios, the statistical analysis involves considerable computational complexity. Specifically, the task of capturing temporal structures by estimating the model parameters becomes more challenging due to the irregular gaps. One potential solution to address these computational challenges is to perform the computations of interest on subsamples that are selected from the full data.
Despite extensive recent research on subsampling techniques, subsampling methods tailored for irregularly spaced time series data are sparse. In this article, we fill this gap by proposing novel subsampling methods specifically designed for irregularly spaced time series data analyzed by the IS-AR(1) model. In comparison to other subsampling methods, our proposed algorithms are founded upon optimality criteria under classical statistical frameworks. It should be noted that the term “subsample” in time series analysis generally refers to a subset of data in the form of multiple series, blocks, or sequences, and the main objective of their analysis is to provide estimates of the variance of the full-data estimator; see Carlstein [7], Fukuchi [8], and Politis [9]. By contrast, the subsampling methods discussed in this paper are used to provide estimates of model parameters based on subsamples from the full data in situations where obtaining such estimates from the full data is computationally prohibitive.
The remainder of this paper is organized as follows. Section 2 provides background on irregularly spaced time series and reviews subsampling methods from the recent literature. Section 3 presents full-data estimation for the parameters of the IS-AR(1) model. Subsampling methods for the IS-AR(1) model are described in Section 4; these include a random subsampling method based on A-optimality (in Section 4.1), information-based subdata selection (in Section 4.2), and a sequential thinning method (in Section 4.3). Section 5 illustrates the techniques using simulated data. Lastly, in Section 6, we summarize the key findings of our study.
2. Background
Section 2.1 gives a brief review of irregularly spaced time series, while Section 2.2 reviews subsampling methods from the recent literature.
2.1. Modeling Irregularly Spaced Time Series
Irregularly spaced (or unevenly spaced) time series occur in many domains including astronomy, biomedicine, climatology, ecology, environment, finance, geology, etc. For instance, high-frequency financial transactions typically occur at irregularly spaced time points within a trading day, as each trade is recorded. Within a trading day, the elapsed times (durations) between consecutive trades are not the same for any selected stock. These times also vary between different stocks. In any given time interval (say, one hour), transactions of a stock may occur rapidly, separated by short durations, or occur slowly with longer durations. Since methods that are used for modeling and forecasting regular time series [10] are not useful for analyzing irregular time series, modified approaches have been developed.
A recent approach for analyzing irregularly spaced time series is the gap time autoregressive (AR) model in (1), which was discussed in [4] for modeling house prices. In this model, the larger the gap $g_j$, the less useful $y_{t_{j-1}}$ becomes for explaining and predicting $y_{t_j}$. Erdogan et al. [5] described AR type models for stationary and nonstationary irregular time series. Similar models were used in [1,11] to model and forecast future values of irregularly spaced astronomical light curves from variable stars, for which Elorrieta et al. [12] proposed a bivariate model.
Ghysels and Jasiak [13] proposed the autoregressive conditional duration generalized autoregressive conditionally heteroscedastic (ACD-GARCH) model for analyzing irregularly spaced financial returns, employing the computationally cumbersome Generalized Method of Moments (GMM) approach for estimating parameters. Meddahi et al. [14] proposed a GARCH-type model for irregularly spaced time series by discretizing a continuous time stochastic volatility process, thereby combining the advantages of the ACD-GARCH model [13] and the ACD model [15]. Maller et al. [16] described a continuous version of the GARCH (i.e., the COGARCH) model for irregularly spaced time series, while [17] proposed a multivariate local-level model with score-driven covariance matrices for intra-day log prices, treating the asynchronicity as a missing data problem. Recently, [6] extended the gap time modeling idea of [4] to construct useful time series models to understand volatility patterns in irregularly spaced financial time series, considering the gaps as random variables. An alternate stochastic volatility model treating the gaps as fixed constants was discussed in [18].
Most methods for analyzing long irregularly spaced time series are computationally demanding. Subsampling methods can help us obtain estimates in a computationally feasible way.
2.2. Subsampling Methods
Constructing parameter estimates based on subsamples from the full data is a popular technique to speed up computations. While the simplest approach of uniform sampling may not be effective for extracting useful information from a large dataset, optimized subsampling methods do provide a better trade-off between estimation efficiency and computational efficiency. Such methods have attracted significant attention in recent years because they are designed to (a) give higher preference to more informative data points and (b) be subject to less information loss. Typical practices include stochastic subsampling and deterministic subdata selection.
Stochastic subsampling methods are successful because they specify inclusion probabilities that allow more informative data points to have a higher chance of being included in the subsample. In early attempts, Drineas et al. [19] advocated the use of normalized statistical leverage scores as subsampling probabilities in least squares estimation problems, while Yang et al. [20] showed that using the normalized square roots of the statistical leverage scores provides a tighter error bound. Ma et al. [21] examined the statistical properties of estimators resulting from subsampling methods based on statistical leverage scores and termed this approach algorithmic leveraging. Xie et al. [22] applied the statistical leverage scores for subsampling under a vector autoregressive model.
This approach has been used in several statistical modeling frameworks. Zhu [23] proposed a subsampling method using the gradients of the objective function for linear models. Wang et al. [24] proposed optimal subsampling probabilities under A-optimality and L-optimality criteria for logistic regression. Teng et al. [25] examined the asymptotic properties of subsampling estimators for generalized linear models under unbounded design. The optimal subsampling framework has been extended to other modeling scenarios such as multiclass logistic regression, generalized linear models, and quantile regression.
While the aforementioned studies are based on sampling with replacement (in which the same data point may be included more than once in the subsample), further research derived optimal subsampling probabilities and a distributed sampling strategy under Poisson sampling (sampling without replacement). Wang et al. [26] further showed that Poisson sampling can be superior to sampling with replacement in terms of estimation efficiency. Poisson sampling for irregular time series analysis is described in Section 4.1.
Deterministic subdata selection uses a particular criterion to determine a subsample without involving additional randomness. Wang et al. [27] introduced a novel method, termed information-based optimal subdata selection (IBOSS), designed to select data points with extreme values to approximate the D-optimality criterion in designed experiments. Pronzato and Wang [28] later proposed an online selection strategy that leverages directional derivatives to decide whether to include a data point in a subsample, aiming to achieve optimality according to a given criterion. This approach processes only the data points encountered up to the current time, determining whether the current point should be included in the subsample. As a result, it is especially well suited for applications involving streaming data.
Despite the rapid recent developments mentioned above, applications of the stochastic subsampling and deterministic subdata selection methods remain unexplored for irregularly spaced time series.
3. Full-Data Estimation for the IS-AR(1) Model
Given a time series of length m from the IS-AR(1) model in (1), we present the full-data maximum likelihood estimates (MLEs) of the model parameters. Since m is assumed to be large, we can obtain the MLEs of the unknown parameters $(\phi, \sigma)$ by maximizing the conditional log-likelihood function (which ignores the marginal distribution of the initial data point $y_{t_1}$ at time $t_1$). Up to a normalizing constant, this has the form

$$\ell(\phi, \sigma) = -(m - 1) \log \sigma - \frac{1}{2} \sum_{j=2}^{m} \log \frac{1 - \phi^{2 g_j}}{1 - \phi^{2}} - \frac{1}{2 \sigma^{2}} \sum_{j=2}^{m} \frac{(1 - \phi^{2}) \left( y_{t_j} - \phi^{g_j} y_{t_{j-1}} \right)^{2}}{1 - \phi^{2 g_j}}. \tag{2}$$
The MLE $(\hat{\phi}, \hat{\sigma})$ is the maximizer of (2) and must be found numerically since an analytical solution is infeasible. Finding the MLE is challenging due to (a) the domain restriction (i.e., $|\phi| < 1$) and (b) the nonconcavity of the objective function. These challenges often cause convergence issues for the classical Newton–Raphson algorithm.
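For concreteness, a direct Julia translation of the conditional log-likelihood in (2), up to an additive constant, might look as follows; this is a sketch based on the model form in (1) as written above, not a reproduction of the authors' implementation.

```julia
# Conditional log-likelihood of (ϕ, σ), up to an additive constant, given the
# observations y (length m) and the gaps g (length m - 1); assumes model (1).
function cloglik(ϕ, σ, y, g)
    m = length(y)
    ll = 0.0
    for j in 2:m
        gj = g[j - 1]
        w  = (1 - ϕ^(2gj)) / (1 - ϕ^2)     # gap-dependent variance weight
        e  = y[j] - ϕ^gj * y[j - 1]        # one-step prediction error
        ll += -log(σ) - 0.5 * log(w) - e^2 / (2 * σ^2 * w)
    end
    return ll
end
```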
Since $\phi$ is a correlation coefficient constrained to lie between $-1$ and 1, an unconstrained gradient-based optimization method may not work well. To remedy this, we use Fisher’s z-transformation to map $\phi$ from $(-1, 1)$ onto $(-\infty, \infty)$ so that an unconstrained optimizing algorithm can be used on the transformed parameter. Although the choice of the transformation is not unique and other transformations may be used, Fisher’s z-transformation provides a well-understood estimator of the correlation coefficient, especially when the error term follows a normal distribution [29]. Specifically, let

$$z = \frac{1}{2} \log \frac{1 + \phi}{1 - \phi}, \quad \text{so that} \quad \phi = \tanh(z). \tag{3}$$

We have $\frac{d\phi}{dz} = \operatorname{sech}^{2}(z)$ and $\frac{d^{2}\phi}{dz^{2}} = -2 \operatorname{sech}^{2}(z) \tanh(z)$, where $\tanh$ and $\operatorname{sech}$ are, respectively, the hyperbolic tangent and secant functions. Below, we show the use of the coordinate ascent algorithm to maximize (2) in terms of z; we then insert the estimate $\hat{z}$ into (3) to obtain the estimate of $\phi$.
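In code, the transformation, its inverse, and the chain-rule factor are one-liners; the derivative shown is the standard identity $d\phi/dz = \operatorname{sech}^2(z)$.

```julia
# Fisher's z-transformation: maps ϕ ∈ (-1, 1) to z ∈ (-∞, ∞) and back.
z_of_ϕ(ϕ) = atanh(ϕ)        # z = (1/2) * log((1 + ϕ) / (1 - ϕ))
ϕ_of_z(z) = tanh(z)         # inverse map, as in (3)
dϕ_dz(z)  = sech(z)^2       # chain-rule factor for derivatives with respect to z
```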
We denote the gradient vector and Hessian matrix of the per-observation log-likelihood
in (
2) as follows:
and
The first and second derivatives of
with respect to
z are, respectively, denoted by
We describe the coordinate ascent algorithm. Suppose
is obtained in the
k-th step of the algorithm. We find the value of
given
by solving
, which gives
Given
, we find the value of
z in the
-th step by implementing the Newton–Raphson algorithm, i.e., computing
for
until convergence, where
and
. We alternate this updating of the values of
and
z until the values become stable.
We summarize the aforementioned estimation procedure in Algorithm 1. Here, is an initial value, while and are error tolerances to determine convergence.
For completeness, we provide below detailed expressions of the gradient vector and Hessian matrix of the per-observation log-likelihood:
where,
.
Algorithm 1. Numerical optimization algorithm.
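Since the contents of the algorithm box are not reproduced above, the following is only a rough Julia sketch of the coordinate ascent idea, assuming the `cloglik` function and the model form given earlier: σ² is updated in closed form given the current ϕ = tanh(z), and the z-update here uses a bounded one-dimensional search (via the Optim package) in place of the Newton–Raphson step described in the text.

```julia
using Optim   # assumed available; supplies a bounded 1-D optimizer (Brent's method)

# Coordinate-ascent sketch for the IS-AR(1) conditional log-likelihood.
function fit_isar1(y, g; z0 = 0.0, tol = 1e-8, maxit = 100)
    m  = length(y)
    z  = z0
    σ2 = 1.0
    for _ in 1:maxit
        ϕ = tanh(z)
        # Closed-form σ² update given ϕ: weighted average of squared prediction errors.
        s = 0.0
        for j in 2:m
            gj = g[j - 1]
            w  = (1 - ϕ^(2gj)) / (1 - ϕ^2)
            s += (y[j] - ϕ^gj * y[j - 1])^2 / w
        end
        σ2_new = s / (m - 1)
        # z update: maximize the log-likelihood over a bounded interval in z.
        res   = optimize(zz -> -cloglik(tanh(zz), sqrt(σ2_new), y, g), -5.0, 5.0)
        z_new = Optim.minimizer(res)
        if abs(z_new - z) < tol && abs(σ2_new - σ2) < tol
            z, σ2 = z_new, σ2_new
            break
        end
        z, σ2 = z_new, σ2_new
    end
    return (ϕ = tanh(z), σ = sqrt(σ2))
end
```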
4. Subsampling Methods for the IS-AR(1) Model
If the size of the full data m is moderate, we can easily obtain the full-data MLE by maximizing (2) using Algorithm 1. However, it is computationally challenging to implement this algorithm when m is very large. In this case, we may resort to subsampling strategies, using a small fraction of the full data to obtain estimators. We propose three distinct methods: (1) optimal subsampling under A-optimality (opt), (2) information-based optimal subdata selection (iboss), and (3) sequential thinning (thin). We describe these procedures and propose practical algorithms in the following subsections.
4.1. Optimal Subsampling Under A-Optimality
Optimal subsampling is a stochastic subsampling strategy satisfying a certain optimality property. The inclusion of subsampled data points is determined by carefully designed probabilities in order to meet the optimality criterion. In this section, we implement optimal Poisson subsampling, which is suitable for time series and avoids the need to simultaneously access the full data. In contrast to sampling with replacement, Poisson sampling never includes the same data point more than once. Since repeatedly sampled data points do not contribute any new information, Poisson sampling preserves more distinct information from the full data and is therefore more efficient than sampling with replacement. See Wang et al. [26] for a theoretical justification of the superiority of Poisson sampling.
Let
be the sampling probability and
be the indicator variable for the inclusion of the
j-th data point in the subsample. With
randomly sampled from the uniform distribution
, we set
when
and
otherwise. The actual subsample size from Poisson sampling, which is denoted by
, is random. The expected subsample size is
. We obtain the subsample estimator
by maximizing the following target function:
The efficacy of
depends on the subsampling probabilities
. We obtain the A-optimal subsampling probabilities which depend on the unknown true parameter and minimize the asymptotic mean squared error of
[
24,
26]. In practice, this unknown parameter could be replaced by a pilot estimate
obtained from a pilot subsample of size
. The A-optimal subsampling probabilities with the pilot estimate
take the general form
where
is a parameter to ensure that the expected sample size is set around
r and prevent it from being too small, the notation
,
is the average Hessian matrix with the pilot subsample, and
is the indicator variable for inclusion of the
j-th data point in the pilot sample. While the exact value of
requires additional computation, it can be approximated by
when the sampling rate
is small. This is usually the case in practice.
The approximated subsampling probabilities
in (
7) are subject to additional disturbance from the pilot estimation, and small values of
may inflate the asymptotic variance of the resulting estimator. To address this problem, we mix
with the uniform subsampling probabilities to prevent the sampling probabilities from getting too small. Specifically, we use
where
is a tuning parameter. The final subsample estimator
is obtained by combining the pilot estimator
and the optimal subsample estimator
:
where
We summarize the A-optimal subsampling strategy in Algorithm 2.
Algorithm 2. Poisson sampling under A-optimality.
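As with Algorithm 1, the box itself is not reproduced here, so the sketch below only illustrates the Poisson sampling mechanics: per-point inclusion probabilities proportional to a generic influence score (a stand-in for the A-optimality weights computed from a pilot estimate), mixed with a uniform component and compared against independent uniform draws. The function and argument names are ours.

```julia
using Random

# Poisson subsampling sketch: point j is included independently with probability p[j].
# `score` stands in for the per-point A-optimality weight from the pilot step.
function poisson_subsample(score, r; mix = 0.1, rng = Random.default_rng())
    m     = length(score)
    p_opt = r .* score ./ sum(score)                          # expected subsample size ≈ r
    p     = min.((1 - mix) .* p_opt .+ mix .* (r / m), 1.0)   # mix with uniform, cap at 1
    keep  = rand(rng, m) .< p                                 # independent inclusion indicators
    return findall(keep), p
end
```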
4.2. Information-Based Optimal Subdata Selection
The iboss method is a deterministic selection approach to obtain a subsample. Its basic motivation is to maximize the determinant of the Fisher information matrix conditioned on the covariates. Under the model specified in (2), the conditional Fisher information for the observed data at time
is
where
The information matrix
has full rank. We select data points with the
r largest values of
,
. This can be undertaken efficiently by established methods like some partition-based selection algorithms [
30]. We implement the procedure via the following steps:
Calculate for .
Take the r data points with the largest using the Quickselect algorithm. Denote their inclusion indicators as s.
Obtain
by maximizing the following target function:
Obtain the final subsample estimator
by
where
is defined in (
8) and
We present the iboss method in Algorithm 3.
Algorithm 3. Information-based optimal subdata selection.
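The algorithm box is again not reproduced; the sketch below shows only the deterministic selection step, keeping the r data points with the largest values of a per-point information score (a placeholder for the quantity in (10)). Julia's built-in partial sorting is used here instead of a hand-written Quickselect.

```julia
# IBOSS-style selection sketch: keep the r points with the largest information scores.
# `info` is a placeholder for the per-observation criterion values.
function iboss_select(info, r)
    idx  = partialsortperm(info, 1:r, rev = true)  # indices of the r largest scores
    keep = falses(length(info))
    keep[idx] .= true                              # inclusion indicators
    return keep
end
```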
4.3. Sequential Thinning on Streaming Data
The
thin method proposed in Pronzato and Wang [
28] is another deterministic subdata selection approach. Unlike
iboss, which requires simultaneous access to the full data,
thin only uses the data points that are already included in the subsample to determine whether the next data point should be included in the subsample or not. This online decision nature of the
thin algorithm makes it suitable for time series data. We present the main idea of the
thin method based on D-optimality below.
Let the average information matrix of a subsample indexed by
be
where
and
is defined in (
10). The contribution of a data point, say
, to the average information matrix, if included in the subsample, can be measured by the directional derivative, which is defined as
The
thin method aims to include
in the subsample if
is large enough. This is motivated by the key result in optimal experimental design theory that the optimal design consists of design points with the largest directional derivatives in the design space [
31,
32]. In the context of subsampling, this means that we need to find the subsample with the
r largest directional derivatives under the unknown optimal average information matrix. In order to achieve this in an online manner, we need to sequentially estimate the upper
quantile for the distribution of the directional derivatives.
Unlike the linear models discussed in Pronzato and Wang [28], for which the information matrix is completely known, the information matrix for our model depends on the unknown parameter. We therefore need a pilot step to obtain a pilot estimator. We present the outline of the thin method for our problem in the following steps:
Reserve the first
data points up to time
as a pilot sample, and use it to obtain a pilot estimate
as in
Section 4.1.
Calculate and obtain its sample upper -quantile , where and is the average information matrix for the pilot sample.
For , let be the average information matrix and be the estimated quantile from the subsample collected up to time . If , include the data point in the subsample and calculate the updated ; otherwise, . Calculate the updated .
Let
be the indicators for the
thin subsample collected in Step 3. Obtain the subsample estimator
by maximizing
Obtain the final subsample estimator
by
where
is defined in (
8) and
A detailed algorithm implementing thin is given in Algorithm 4.
Algorithm 4. Sequential thinning under D-optimality.
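Because the box for Algorithm 4 is likewise not reproduced, the sketch below conveys only the online flavour of thin: each incoming point is scored, compared with a running estimate of the upper quantile of the scores, and kept only if it exceeds the threshold. The scoring function is a placeholder for the directional derivative described above, and the stochastic-approximation quantile update is an assumption of ours, not necessarily the update used by the authors.

```julia
# Sequential thinning sketch: keep point j if its score exceeds a running estimate of
# the upper quantile of the scores, targeting a selection rate of α = r / m.
# `score(j)` stands in for the directional derivative of data point j.
function thin_select(score, m, r; q0 = 0.0, step = 1.0)
    α    = r / m
    q    = q0                        # running quantile estimate (e.g., from a pilot sample)
    keep = falses(m)
    for j in 1:m
        s = score(j)
        keep[j] = s > q              # online inclusion decision
        # stochastic-approximation update of the upper-α quantile estimate
        q += (step / j) * ((s > q) - α)
    end
    return keep
end
```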
5. Numerical Results
We perform numerical simulation to evaluate the performance of the subsampling methods. We present the results on estimation efficiency in Section 5.1 and the results on computational efficiency in Section 5.2.
We consider the performances of the uniform subsample estimator (unif), the A-optimal subsample estimator (opt), the information-based optimal subdata selection estimator (iboss), and the sequential thinning estimator (thin). We set the full-data sample size as and consider different values of the true parameters; we allow and , 10 and 20. The time gaps are randomly sampled from the set .
We consider different subsample sizes , 10,000, 15,000, 20,000, 25,000, and 30,000, corresponding, respectively, to , and of the full dataset. We let . For opt, we set the mixing rate . To implement the thin subsampling procedure, we set the tuning parameters , , and . We also implement the uniform subsampling method with a subsample size as a benchmark for comparisons.
5.1. Estimation Efficiency
We use the empirical mean squared error (MSE) to measure the estimation efficiency of the subsample estimator with respect to the true parameter. We implement the subsampling methods discussed in Section 4 and repeat the simulation 1000 times to calculate the empirical MSE, which is defined as

$$\mathrm{MSE} = \frac{1}{1000} \sum_{s=1}^{1000} \left\| \hat{\theta}^{(s)} - \theta_{\mathrm{true}} \right\|^{2},$$

where $\hat{\theta}^{(s)}$ is the subsample estimate in the s-th repetition. Due to the nonconcavity of the objective function, some methods failed to converge in a few repetitions. We drop the results from these repetitions when calculating the MSE. This does not affect the accuracy of the MSE because the nonconvergence rate is very low.
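As a trivial illustration, the empirical MSE over repetitions could be computed as below, assuming the squared-error definition written above and dropping non-converged runs.

```julia
# Empirical MSE over repetitions: `est` is a vector of estimate vectors (ϕ, σ),
# `truth` is the true parameter vector; non-finite (non-converged) runs are skipped.
function empirical_mse(est, truth)
    ok = filter(e -> all(isfinite, e), est)
    return sum(e -> sum(abs2, e .- truth), ok) / length(ok)
end
```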
The results in terms of the empirical MSE are presented in
Figure 1. It is seen that all three proposed subsampling methods outperform the uniform sampling in all the cases. Overall, the
opt algorithm displays the best performance, especially when
or
is large. The deterministic methods,
iboss and
thin, have higher estimation efficiency when
is small.
The varying performances in empirical MSE among the proposed subsampling methods are partly due to the different optimality criteria adopted. The A-optimality criterion used by opt seeks to minimize the trace of the asymptotic variance matrix, whereas the D-optimality criterion used by both iboss and thin focuses on minimizing the determinant of the asymptotic variance matrix. As a result, opt places greater emphasis on parameter components with larger variances, while iboss and thin distribute their focus more evenly across all parameters. When is small (e.g., ), the variance in estimating is low across all subsampling methods, allowing iboss and thin to express their advantage in reducing the variance of estimating , thereby outperforming opt. However, when , the variance in estimating becomes the dominant contributor to the empirical MSE. Since iboss and thin do not prioritize the estimation of , their performance weakens in this case. A similar reasoning applies to the differing performances across various values of . To better understand this, we further decompose the empirical MSEs for and to validate our explanation.
In order to further examine the performance of the subsampling methods on estimating different types of parameters, we plot $\mathrm{MSE}_{\phi}$ for estimating $\phi$ in Figure 2 and $\mathrm{MSE}_{\sigma}$ for estimating $\sigma$ in Figure 3. Irrespective of the values of $\phi$ and $\sigma$, iboss and thin outperform opt in estimating $\phi$, while they fall short in estimating $\sigma$ compared to opt.
We observe that the proposed subsampling methods provide greater benefit for smaller r. As r increases, the differences between the methods gradually diminish. This is to be expected, since all estimators converge to the full-data estimator as r gets closer to m. Therefore, with larger datasets, where the affordable subsample is only a small fraction of the full data, the advantage of using optimal subsampling estimators becomes more pronounced.
In the IS-AR(1) model, the predicted value at any given time point is directly influenced by the correlation coefficient $\phi$. Therefore, more accurate estimates of $\phi$ lead to better prediction accuracy. As shown in Figure 2, the optimal subsampling methods provide exceptional performance in estimating $\phi$, especially when using the iboss and thin approaches. Notably, even with a much smaller subsample size, both iboss and thin outperform the uniform sampling method, which uses a significantly larger subsample size of r = 30,000, for estimating $\phi$.
5.2. Time Complexity
To evaluate the computational efficiency of the subsampling methods, we repeat the simulation 30 times and record the average computational times for each method. For comparison, we implement the full-data estimator (full) using the algorithm described in Section 3 as a benchmark.
We consider three different full-data sample sizes,
,
, and
, and six subsample sizes
, and 3000.
Table 1 reports the average computational times in milliseconds for the case where
and
. The computation times for other values of
and
follow a similar pattern. These results are based on simulations using a Julia implementation run on an Apple MacBook Pro with an M1 Pro chip.
The entries in the table show a substantial reduction in computation times by using the subsampling methods compared to the time taken for full-data estimation. Unlike the uniform subsampling method, which incurs almost no computational overhead during the subsampling process, optimal subsampling algorithms require additional computations to determine the inclusion probability for each data point. The numerical optimization using subsamples has a time complexity that scales with the subsample size r, whereas calculating the subsampling probabilities for nonuniform subsampling methods incurs a cost that scales with the full-data size m.
Compared to full-data-based estimation, both opt and iboss demonstrate exceptional efficiency, reducing the computation time to roughly 1/40 of the full-data time on average when the sample size m is sufficiently large. Although not as fast as these two methods, thin still achieves significant computational savings, taking less than one-tenth of the time required for the full-data estimation. Additionally, an advantage of using thin is that subsample selection can be performed sequentially. Overall, all three proposed optimal subsampling methods for the IS-AR(1) model offer reliable strategies for reducing the computational burden in large-scale data applications.
6. Conclusions
In this paper, we investigated the technique of computationally feasible subsampling in the context of irregularly spaced time series data. We proposed practical algorithms for implementing the opt, iboss, and thin methods for the IS-AR(1) model. The numerical results demonstrated that the proposed subsampling methods outperform the naive uniform subsampling approach with improved estimation efficiency and show significant benefits in reducing the computation time compared with the full-data estimation.
While our work is currently focused on the IS-AR(1) model, it highlights the potential of optimal subsampling methods for time series data. In future research, these techniques can be extended to more complex models, such as IS-AR(p) for p > 1. Typically, as the number of parameters increases, optimal subsampling methods become even more effective in reducing computation times. We hope that our work paves the way for further exploration of subsampling strategies in time series analysis.
Author Contributions
Conceptualization, J.L. and Z.W.; methodology, J.L. and Z.W.; formal analysis, J.L. and Z.W.; investigation, J.L. and Z.W.; writing—original draft preparation, Z.W.; writing—review and editing, N.R. and H.W.; supervision, N.R. and H.W.; project administration, N.R. and H.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data used for the paper are computer-generated. Codes for generating the data are available upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Elorrieta, F.; Eyheramendy, S.; Palma, W. Discrete-time autoregressive model for unequally spaced time-series observations. Astron. Astrophys. 2019, 627, A120. [Google Scholar] [CrossRef]
- Mudelsee, M. Trend analysis of climate time series: A review of methods. Earth-Sci. Rev. 2019, 190, 310–322. [Google Scholar] [CrossRef]
- Dutta, C.; Karpman, K.; Basu, S.; Ravishanker, N. Review of statistical approaches for modeling high-frequency trading data. Sankhya B 2023, 85, 1–48. [Google Scholar] [CrossRef]
- Nagaraja, C.H.; Brown, L.D.; Zhao, L.H. An autoregressive approach to house price modeling. Ann. Appl. Stat. 2011, 5, 124–149. [Google Scholar] [CrossRef]
- Erdogan, E.; Ma, S.; Beygelzimer, A.; Rish, I. Statistical models for unequally spaced time series. In Proceedings of the 2005 SIAM International Conference on Data Mining. SIAM, Newport Beach, CA, USA, 21–23 April 2005; pp. 626–630. [Google Scholar]
- Anantharaman, S.; Ravishanker, N.; Basu, S. Hierarchical modeling of irregularly spaced financial returns. Stat 2024, 13, e692. [Google Scholar] [CrossRef]
- Carlstein, E. The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Stat. 1986, 14, 1171–1179. [Google Scholar] [CrossRef]
- Fukuchi, J.I. Subsampling and model selection in time series analysis. Biometrika 1999, 86, 591–604. [Google Scholar] [CrossRef]
- Politis, D.N. Scalable subsampling: Computation, aggregation and inference. Biometrika 2023, 111, 347–354. [Google Scholar] [CrossRef]
- Shumway, R. Time Series Analysis and Its Applications; Springer: Berlin/Heidelberg, Germany, 2000. [Google Scholar]
- Eyheramendy, S.; Elorrieta, F.; Palma, W. An autoregressive model for irregular time series of variable stars. Proc. Int. Astron. Union 2016, 12, 259–262. [Google Scholar] [CrossRef]
- Elorrieta, F.; Eyheramendy, S.; Palma, W.; Ojeda, C. A novel bivariate autoregressive model for predicting and forecasting irregularly observed time series. Mon. Not. R. Astron. Soc. 2021, 505, 1105–1116. [Google Scholar] [CrossRef]
- Ghysels, E.; Jasiak, J. GARCH for irregularly spaced financial data: The ACD-GARCH model. Stud. Nonlinear Dyn. Econom. 1998, 2, 1–19. [Google Scholar] [CrossRef]
- Meddahi, N.; Renault, E.; Werker, B. GARCH and irregularly spaced data. Econ. Lett. 2006, 90, 200–204. [Google Scholar] [CrossRef]
- Engle, R.F.; Russell, J.R. Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica 1998, 66, 1127–1162. [Google Scholar] [CrossRef]
- Maller, R.A.; Müller, G.; Szimayer, A. GARCH modelling in continuous time for irregularly spaced time series data. Bernoulli 2008, 14, 519–542. [Google Scholar] [CrossRef]
- Buccheri, G.; Bormetti, G.; Corsi, F.; Lillo, F. A score-driven conditional correlation model for noisy and asynchronous data: An application to high-frequency covariance dynamics. J. Bus. Econ. Stat. 2021, 39, 920–936. [Google Scholar] [CrossRef]
- Dutta, C. Modeling Multiple Irregularly Spaced High-Frequency Financial Time Series. Ph.D. Thesis, University of Connecticut, Storrs, CT, USA, 2022. [Google Scholar]
- Drineas, P.; Mahoney, M.W.; Muthukrishnan, S. Sampling algorithms for l2 regression and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, USA, 22–24 January 2006; pp. 1127–1136. [Google Scholar]
- Yang, T.; Zhang, L.; Jin, R.; Zhu, S. An explicit sampling dependent spectral error bound for column subset selection. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 135–143. [Google Scholar]
- Ma, P.; Mahoney, M.W.; Yu, B. A statistical perspective on algorithmic leveraging. J. Mach. Learn. Res. 2015, 16, 861–991. [Google Scholar]
- Xie, R.; Wang, Z.; Bai, S.; Ma, P.; Zhong, W. Online decentralized leverage score sampling for streaming multidimensional time series. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 16–18 April 2019; Volume 89, pp. 2301–2311. [Google Scholar]
- Zhu, R. Gradient-based sampling: An adaptive importance sampling for least-squares. Adv. Neural Inf. Process. Syst. 2018, 29, 406–414. [Google Scholar]
- Wang, H.; Zhu, R.; Ma, P. Optimal subsampling for large sample logistic regression. J. Am. Stat. Assoc. 2018, 113, 829–844. [Google Scholar] [CrossRef]
- Teng, G.; Tian, B.; Zhang, Y.; Fu, S. Asymptotics of Subsampling for Generalized Linear Regression Models under Unbounded Design. Entropy 2022, 25, 84. [Google Scholar] [CrossRef]
- Wang, J.; Zou, J.; Wang, H. Sampling with replacement vs Poisson sampling: A comparative study in optimal subsampling. IEEE Trans. Inf. Theory 2022, 68, 6605–6630. [Google Scholar] [CrossRef]
- Wang, H.; Yang, M.; Stufken, J. Information-based optimal subdata selection for big data linear regression. J. Am. Stat. Assoc. 2019, 114, 393–405. [Google Scholar] [CrossRef]
- Pronzato, L.; Wang, H. Sequential online subsampling for thinning experimental designs. J. Stat. Plan. Inference 2021, 212, 169–193. [Google Scholar] [CrossRef]
- Casella, G.; Berger, R. Statistical Inference; CRC Press: Boca Raton, FL, USA, 2024. [Google Scholar]
- Kleinberg, J.; Tardos, E. Algorithm Design; Pearson/Addison-Wesley: Hoboken, NJ, USA, 2006. [Google Scholar]
- Wynn, H. Optimum Submeasures with Applications to Finite Population Sampling; Academic Press: Cambridge, MA, USA, 1982. [Google Scholar]
- Fedorov, V.V.; Hackl, P. Model-Oriented Design of Experiments; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]