1. Introduction
Time series analysis is useful in many application domains for understanding patterns over time and for forecasting into the future. Most time series methods have been developed for regularly spaced time series, where the gaps between two consecutive time points are fixed and equal. Regular time series include hourly, daily, monthly, or annually recorded time series. Recently, there has been increasing interest in developing methods for analyzing and forecasting irregularly spaced time series, where the gaps between subsequent observations are not the same. Such irregular time series are observed in diverse fields such as astronomy [1], climatology [2], finance [3], etc. For instance, intra-day transaction-level data in finance consist of prices of a financial asset recorded at each trade within a trading day, resulting in irregular time intervals (in seconds, say) between consecutive trades. An example in real estate consists of the time series of sale prices of houses, where the discrete time gaps (in days or weeks) between subsequent sale dates are typically nonconstant.
Consider an irregularly spaced time series consisting of observations $y_{t_1}, y_{t_2}, \ldots, y_{t_m}$ at discrete times $t_1, t_2, \ldots, t_m$, where $t_1 < t_2 < \cdots < t_m$. The gaps between consecutive time points, denoted by $g_j = t_j - t_{j-1}$ for $j = 2, \ldots, m$, are not constant.
To model irregularly spaced time series with discrete gaps, Nagaraja et al. [4] proposed the stationary gap time autoregressive (Gap AR(1)) model:

$$y_{t_j} = \phi^{g_j}\, y_{t_{j-1}} + \sigma \sqrt{\frac{1 - \phi^{2 g_j}}{1 - \phi^{2}}}\; e_{t_j}, \qquad j = 2, \ldots, m, \tag{1}$$

where $e_{t_j}$ is the error term with the standard normal distribution, i.e., $e_{t_j} \sim N(0, 1)$, $\phi$ with $|\phi| < 1$ is the autoregressive parameter, and $\sigma > 0$ is a scale parameter. They used (1) to model and forecast house prices and then constructed a house price index. Other models have also been constructed in the literature for irregularly spaced time series. Erdogan et al. [5] described a nonstationary irregularly spaced autoregressive (NIS-AR(1)) model and illustrated its use on astronomy data. Anantharaman et al. [6] described Bayesian modeling for an irregular stochastic volatility autoregressive conditional duration (IR-SV-ACD) model and used this for estimating and forecasting inter-transaction gaps and the volatility of financial log-returns.
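To make the data-generating mechanism concrete, the following Julia sketch simulates a series from model (1); Julia is the language used for the numerical work in Section 5. The function name, parameter values, and the integer gap range are illustrative assumptions rather than part of the original specification.

```julia
using Random

# Minimal sketch: simulate an IS-AR(1) (Gap AR(1)) series of length m.
# The gap distribution and parameter values below are illustrative only.
function simulate_isar1(m::Int; ϕ = 0.5, σ = 1.0, maxgap = 10,
                        rng = Random.default_rng())
    g = rand(rng, 1:maxgap, m - 1)                   # integer gaps g_j between observations
    y = zeros(m)                                     # y[1] serves as the initial data point
    for j in 2:m
        gj = g[j - 1]
        scale = σ * sqrt((1 - ϕ^(2gj)) / (1 - ϕ^2))  # gap-dependent innovation scale
        y[j] = ϕ^gj * y[j - 1] + scale * randn(rng)
    end
    return y, g
end

y, g = simulate_isar1(10_000; ϕ = 0.8, σ = 2.0)
```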
In this paper, we refer to the Gap AR(1) model in (1) as the irregularly spaced AR(1) or the IS-AR(1) model. Estimation in this model can be handled by the method of maximum likelihood. While a model for irregularly spaced time series enables handling a wider range of temporal scenarios, the statistical analysis involves considerable computational complexity. Specifically, the task of capturing temporal structures by estimating the model parameters becomes more challenging due to the irregular gaps. One potential solution to address these computational challenges is to perform the computations of interest on subsamples that are selected from the full data.
Despite extensive recent research on subsampling techniques, subsampling methods tailored for irregularly spaced time series data are sparse. In this article, we fill this gap by proposing novel subsampling methods specifically designed for irregularly spaced time series data analyzed by the IS-AR(1) model. In comparison to other subsampling methods, our proposed algorithms are founded upon optimality criteria under classical statistical frameworks. It should be noted that the term “subsample” in time series analysis generally refers to a subset of data in the form of multiple series, blocks, or sequences, and the main objective of their analysis is to provide estimates of the variance of the full-data estimator; see Carlstein [7], Fukuchi [8], and Politis [9]. By contrast, the subsampling methods discussed in this paper are used to provide estimates of model parameters based on subsamples from the full data in situations where obtaining such estimates from the full data is computationally prohibitive.
The remainder of this paper is organized as follows. Section 2 provides background on irregularly spaced time series and reviews subsampling methods from the recent literature. Section 3 presents full-data estimation for the parameters of the IS-AR(1) model. Subsampling methods for the IS-AR(1) model are described in Section 4; these include a random subsampling method based on A-optimality (in Section 4.1), information-based subdata selection (in Section 4.2), and a sequential thinning method (in Section 4.3). Section 5 illustrates the techniques using simulated data. Lastly, in Section 6, we summarize the key findings of our study.
2. Background
Section 2.1 gives a brief review of irregularly spaced time series, while Section 2.2 reviews subsampling methods from the recent literature.
2.1. Modeling Irregularly Spaced Time Series
Irregularly spaced (or unevenly spaced) time series occur in many domains including astronomy, biomedicine, climatology, ecology, environment, finance, geology, etc. For instance, high-frequency financial transactions typically occur at irregularly spaced time points within a trading day, as each trade is recorded. Within a trading day, the elapsed times (durations) between consecutive trades are not the same for any selected stock. These times also vary between different stocks. In any given time interval (say, one hour), transactions of a stock may occur rapidly, separated by short durations, or occur slowly with longer durations. Since methods that are used for modeling and forecasting regular time series [10] are not useful for analyzing irregular time series, modified approaches have been developed.
A recent approach for analyzing irregularly spaced time series is the gap time autoregressive (AR) model in (1), which was discussed in [4] for modeling house prices. In this model, the larger the gap $g_j$, the less useful $y_{t_{j-1}}$ becomes for explaining and predicting $y_{t_j}$. Erdogan et al. [5] described AR type models for stationary and nonstationary irregular time series. Similar models were used in [1,11] to model and forecast future values of irregularly spaced astronomical light curves from variable stars, for which Elorrieta et al. [12] proposed a bivariate model.
Ghysels and Jasiak [13] proposed the autoregressive conditional duration generalized autoregressive conditionally heteroscedastic (ACD-GARCH) model for analyzing irregularly spaced financial returns, employing the computationally cumbersome Generalized Method of Moments (GMM) approach for estimating parameters. Meddahi et al. [14] proposed a GARCH-type model for irregularly spaced time series by discretizing a continuous time stochastic volatility process, thereby combining the advantages of the ACD-GARCH model [13] and the ACD model [15]. Maller et al. [16] described a continuous version of the GARCH (i.e., the COGARCH) model for irregularly spaced time series, while [17] proposed a multivariate local-level model with score-driven covariance matrices for intra-day log prices, treating the asynchronicity as a missing data problem. Recently, [6] extended the gap time modeling idea of [4] to construct useful time series models to understand volatility patterns in irregularly spaced financial time series, considering the gaps as random variables. An alternate stochastic volatility model treating the gaps as fixed constants was discussed in [18].
Most methods for analyzing long irregularly spaced time series are computationally demanding. Subsampling methods can help us obtain estimates in a computationally feasible way.
2.2. Subsampling Methods
Constructing parameter estimates based on subsamples from the full data is a popular technique to speed up computations. While the simplest approach of uniform sampling may not be effective for extracting useful information from a large dataset, optimized subsampling methods do provide a better trade-off between estimation efficiency and computational efficiency. Such methods have attracted significant attention in recent years because they are designed to (a) give higher preference to more informative data points and (b) be subject to less information loss. Typical practices include stochastic subsampling and deterministic subdata selection.
Stochastic subsampling methods are successful because they specify inclusion probabilities that allow more informative data points to have a higher chance of being included in the subsample. In early attempts, Drineas et al. [19] advocated the use of normalized statistical leverage scores as subsampling probabilities in least squares estimation problems, while Yang et al. [20] showed that using the normalized square roots of the statistical leverage scores provides a tighter error bound. Ma et al. [21] examined the statistical properties of estimators resulting from subsampling methods based on statistical leverage scores and termed this approach algorithmic leveraging. Xie et al. [22] applied the statistical leverage scores for subsampling under a vector autoregressive model.
This approach has been used in several statistical modeling frameworks. Zhu [23] proposed a subsampling method using the gradients of the objective function for linear models. Wang et al. [24] proposed optimal subsampling probabilities under A-optimality and L-optimality criteria for logistic regression. Teng et al. [25] examined the asymptotic properties of subsampling estimators for generalized linear models under unbounded design. The optimal subsampling framework has been extended to other modeling scenarios such as multiclass logistic regression, generalized linear models, and quantile regression.
While the aforementioned studies are based on sampling with replacement (in which the same data point may be included more than once in the subsample), further research derived optimal subsampling probabilities and a distributed sampling strategy under Poisson sampling (sampling without replacement). Wang et al. [26] further showed that Poisson sampling can be superior to sampling with replacement in terms of estimation efficiency. Poisson sampling for irregular time series analysis is described in Section 4.1.
Deterministic subdata selection uses a particular criterion to determine a subsample without involving additional randomness. Wang et al. [27] introduced a novel method, termed information-based optimal subdata selection (IBOSS), designed to select data points with extreme values to approximate the D-optimality criterion in designed experiments. Pronzato and Wang [28] later proposed an online selection strategy that leverages directional derivatives to decide whether to include a data point in a subsample, aiming to achieve optimality according to a given criterion. This approach processes only the data points encountered up to the current time, determining whether the current point should be included in the subsample. As a result, it is especially well suited for applications involving streaming data.
Despite the rapid recent developments mentioned above, applications of the stochastic subsampling and deterministic subdata selection methods remain unexplored for irregularly spaced time series.
3. Full-Data Estimation for the IS-AR(1) Model
Given a time series of length m from the IS-AR(1) model in (1), we present the full-data maximum likelihood estimates (MLEs) of the model parameters. Since m is assumed to be large, we can obtain the MLEs of the unknown parameters $(\phi, \sigma)$ by maximizing the conditional log-likelihood function (which ignores the marginal distribution of the initial data point $y_{t_1}$ at time $t_1$). Up to a normalizing constant, this has the form

$$\ell(\phi, \sigma) = -(m - 1) \log \sigma - \frac{1}{2} \sum_{j=2}^{m} \log \frac{1 - \phi^{2 g_j}}{1 - \phi^{2}} - \frac{1}{2 \sigma^{2}} \sum_{j=2}^{m} \frac{(1 - \phi^{2}) \left( y_{t_j} - \phi^{g_j} y_{t_{j-1}} \right)^{2}}{1 - \phi^{2 g_j}}. \tag{2}$$
The MLE $(\hat{\phi}, \hat{\sigma})$ is the maximizer of (2) and must be found numerically since an analytical solution is infeasible. Finding the MLE is challenging due to (a) the domain restriction (i.e., $|\phi| < 1$) and (b) the nonconcavity of the objective function. These challenges often cause convergence issues for the classical Newton–Raphson algorithm.
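For concreteness, a direct Julia translation of the conditional log-likelihood in (2), up to an additive constant, might look as follows; this is a sketch based on the model form in (1) as written above, not a reproduction of the authors' implementation.

```julia
# Conditional log-likelihood of (ϕ, σ), up to an additive constant, given the
# observations y (length m) and the gaps g (length m - 1); assumes model (1).
function cloglik(ϕ, σ, y, g)
    m = length(y)
    ll = 0.0
    for j in 2:m
        gj = g[j - 1]
        w  = (1 - ϕ^(2gj)) / (1 - ϕ^2)     # gap-dependent variance weight
        e  = y[j] - ϕ^gj * y[j - 1]        # one-step prediction error
        ll += -log(σ) - 0.5 * log(w) - e^2 / (2 * σ^2 * w)
    end
    return ll
end
```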
Since $\phi$ is a correlation coefficient constrained to lie between $-1$ and 1, an unconstrained gradient-based optimization method may not work well. To remedy this, we use Fisher’s z-transformation to map $\phi$ from $(-1, 1)$ onto $(-\infty, \infty)$ so that an unconstrained optimizing algorithm can be used on the transformed parameter. Although the choice of the transformation is not unique and other transformations may be used, Fisher’s z-transformation provides a well-understood estimator of the correlation coefficient, especially when the error term follows a normal distribution [29]. Specifically, let

$$z = \frac{1}{2} \log \frac{1 + \phi}{1 - \phi}, \quad \text{so that} \quad \phi = \tanh(z). \tag{3}$$

We have $\frac{d\phi}{dz} = \operatorname{sech}^{2}(z)$ and $\frac{d^{2}\phi}{dz^{2}} = -2 \operatorname{sech}^{2}(z) \tanh(z)$, where $\tanh$ and $\operatorname{sech}$ are, respectively, the hyperbolic tangent and secant functions. Below, we show the use of the coordinate ascent algorithm to maximize (2) in terms of z; we then insert the estimate $\hat{z}$ into (3) to obtain the estimate of $\phi$.
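In code, the transformation, its inverse, and the chain-rule factor are one-liners; the derivative shown is the standard identity $d\phi/dz = \operatorname{sech}^2(z)$.

```julia
# Fisher's z-transformation: maps ϕ ∈ (-1, 1) to z ∈ (-∞, ∞) and back.
z_of_ϕ(ϕ) = atanh(ϕ)        # z = (1/2) * log((1 + ϕ) / (1 - ϕ))
ϕ_of_z(z) = tanh(z)         # inverse map, as in (3)
dϕ_dz(z)  = sech(z)^2       # chain-rule factor for derivatives with respect to z
```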
We denote the gradient vector and Hessian matrix of the per-observation log-likelihood
in (
2) as follows:
and
The first and second derivatives of
with respect to
z are, respectively, denoted by
We describe the coordinate ascent algorithm. Suppose
is obtained in the
k-th step of the algorithm. We find the value of
given
by solving
, which gives
Given
, we find the value of
z in the
-th step by implementing the Newton–Raphson algorithm, i.e., computing
for
until convergence, where
and
. We alternate this updating of the values of
and
z until the values become stable.
We summarize the aforementioned estimation procedure in Algorithm 1. Here, is an initial value, while and are error tolerances to determine convergence.
For completeness, we provide below detailed expressions of the gradient vector and Hessian matrix of the per-observation log-likelihood:
where,
.
Algorithm 1. Numerical optimization algorithm.
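Since the contents of the algorithm box are not reproduced above, the following is only a rough Julia sketch of the coordinate ascent idea, assuming the `cloglik` function and the model form given earlier: σ² is updated in closed form given the current ϕ = tanh(z), and the z-update here uses a bounded one-dimensional search (via the Optim package) in place of the Newton–Raphson step described in the text.

```julia
using Optim   # assumed available; supplies a bounded 1-D optimizer (Brent's method)

# Coordinate-ascent sketch for the IS-AR(1) conditional log-likelihood.
function fit_isar1(y, g; z0 = 0.0, tol = 1e-8, maxit = 100)
    m  = length(y)
    z  = z0
    σ2 = 1.0
    for _ in 1:maxit
        ϕ = tanh(z)
        # Closed-form σ² update given ϕ: weighted average of squared prediction errors.
        s = 0.0
        for j in 2:m
            gj = g[j - 1]
            w  = (1 - ϕ^(2gj)) / (1 - ϕ^2)
            s += (y[j] - ϕ^gj * y[j - 1])^2 / w
        end
        σ2_new = s / (m - 1)
        # z update: maximize the log-likelihood over a bounded interval in z.
        res   = optimize(zz -> -cloglik(tanh(zz), sqrt(σ2_new), y, g), -5.0, 5.0)
        z_new = Optim.minimizer(res)
        if abs(z_new - z) < tol && abs(σ2_new - σ2) < tol
            z, σ2 = z_new, σ2_new
            break
        end
        z, σ2 = z_new, σ2_new
    end
    return (ϕ = tanh(z), σ = sqrt(σ2))
end
```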
4. Subsampling Methods for the IS-AR(1) Model
If the size of the full data m is moderate, we can easily obtain the full-data MLE by maximizing (2) using Algorithm 1. However, it is computationally challenging to implement this algorithm when m is very large. In this case, we may resort to subsampling strategies, using a small fraction of the full data to obtain estimators. We propose three distinct methods: (1) optimal subsampling under A-optimality (opt), (2) information-based optimal subdata selection (iboss), and (3) sequential thinning (thin). We describe these procedures and propose practical algorithms in the following subsections.
4.1. Optimal Subsampling Under A-Optimality
Optimal subsampling is a stochastic subsampling strategy satisfying a certain optimality property. The inclusion of subsampled data points is determined by carefully designed probabilities in order to meet the optimality criterion. In this section, we implement optimal Poisson subsampling, which is suitable for time series and avoids the need to simultaneously access the full data. In contrast to sampling with replacement, Poisson sampling never includes the same data point more than once. Since repeatedly sampled data points do not contribute any new information, Poisson sampling preserves more distinct information from the full data and is therefore more efficient than sampling with replacement. See Wang et al. [26] for a theoretical justification of the superiority of Poisson sampling.
Let
be the sampling probability and
be the indicator variable for the inclusion of the
j-th data point in the subsample. With
randomly sampled from the uniform distribution
, we set
when
and
otherwise. The actual subsample size from Poisson sampling, which is denoted by
, is random. The expected subsample size is
. We obtain the subsample estimator
by maximizing the following target function:
The efficacy of
depends on the subsampling probabilities
. We obtain the A-optimal subsampling probabilities which depend on the unknown true parameter and minimize the asymptotic mean squared error of
[
24,
26]. In practice, this unknown parameter could be replaced by a pilot estimate
obtained from a pilot subsample of size
. The A-optimal subsampling probabilities with the pilot estimate
take the general form
where
is a parameter to ensure that the expected sample size is set around
r and prevent it from being too small, the notation
,
is the average Hessian matrix with the pilot subsample, and
is the indicator variable for inclusion of the
j-th data point in the pilot sample. While the exact value of
requires additional computation, it can be approximated by
when the sampling rate
is small. This is usually the case in practice.
The approximated subsampling probabilities
in (
7) are subject to additional disturbance from the pilot estimation, and small values of
may inflate the asymptotic variance of the resulting estimator. To address this problem, we mix
with the uniform subsampling probabilities to prevent the sampling probabilities from getting too small. Specifically, we use
where
is a tuning parameter. The final subsample estimator
is obtained by combining the pilot estimator
and the optimal subsample estimator
:
where
We summarize the A-optimal subsampling strategy in Algorithm 2.
Algorithm 2. Poisson sampling under A-optimality.
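As with Algorithm 1, the box itself is not reproduced here, so the sketch below only illustrates the Poisson sampling mechanics: per-point inclusion probabilities proportional to a generic influence score (a stand-in for the A-optimality weights computed from a pilot estimate), mixed with a uniform component and compared against independent uniform draws. The function and argument names are ours.

```julia
using Random

# Poisson subsampling sketch: point j is included independently with probability p[j].
# `score` stands in for the per-point A-optimality weight from the pilot step.
function poisson_subsample(score, r; mix = 0.1, rng = Random.default_rng())
    m     = length(score)
    p_opt = r .* score ./ sum(score)                          # expected subsample size ≈ r
    p     = min.((1 - mix) .* p_opt .+ mix .* (r / m), 1.0)   # mix with uniform, cap at 1
    keep  = rand(rng, m) .< p                                 # independent inclusion indicators
    return findall(keep), p
end
```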
4.2. Information-Based Optimal Subdata Selection
The iboss method is a deterministic selection approach to obtain a subsample. Its basic motivation is to maximize the determinant of the Fisher information matrix conditioned on the covariates. Under the model specified in (2), the conditional Fisher information for the observed data at time
is
where
The information matrix
has full rank. We select data points with the
r largest values of
,
. This can be undertaken efficiently by established methods like some partition-based selection algorithms [
30]. We implement the procedure via the following steps:
Calculate for .
Take the r data points with the largest using the Quickselect algorithm. Denote their inclusion indicators as s.
Obtain
by maximizing the following target function:
Obtain the final subsample estimator
by
where
is defined in (
8) and
We present the iboss method in Algorithm 3.
Algorithm 3. Information-based optimal subdata selection.
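The algorithm box is again not reproduced; the sketch below shows only the deterministic selection step, keeping the r data points with the largest values of a per-point information score (a placeholder for the quantity in (10)). Julia's built-in partial sorting is used here instead of a hand-written Quickselect.

```julia
# IBOSS-style selection sketch: keep the r points with the largest information scores.
# `info` is a placeholder for the per-observation criterion values.
function iboss_select(info, r)
    idx  = partialsortperm(info, 1:r, rev = true)  # indices of the r largest scores
    keep = falses(length(info))
    keep[idx] .= true                              # inclusion indicators
    return keep
end
```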
4.3. Sequential Thinning on Streaming Data
The
thin method proposed in Pronzato and Wang [
28] is another deterministic subdata selection approach. Unlike
iboss, which requires simultaneous access to the full data,
thin only uses the data points that are already included in the subsample to determine whether the next data point should be included in the subsample or not. This online decision nature of the
thin algorithm makes it suitable for time series data. We present the main idea of the
thin method based on D-optimality below.
Let the average information matrix of a subsample indexed by
be
where
and
is defined in (
10). The contribution of a data point, say
, to the average information matrix, if included in the subsample, can be measured by the directional derivative, which is defined as
The
thin method aims to include
in the subsample if
is large enough. This is motivated by the key result in optimal experimental design theory that the optimal design consists of design points with the largest directional derivatives in the design space [
31,
32]. In the context of subsampling, this means that we need to find the subsample with the
r largest directional derivatives under the unknown optimal average information matrix. In order to achieve this in an online manner, we need to sequentially estimate the upper
quantile for the distribution of the directional derivatives.
Unlike the linear models discussed in Pronzato and Wang [28], for which the information matrix is completely known, the information matrix for our model depends on the unknown parameter. We therefore need a pilot step to obtain a pilot estimator. We present the outline of the thin method for our problem in the following steps:
Reserve the first
data points up to time
as a pilot sample, and use it to obtain a pilot estimate
as in
Section 4.1.
Calculate and obtain its sample upper -quantile , where and is the average information matrix for the pilot sample.
For , let be the average information matrix and be the estimated quantile from the subsample collected up to time . If , include the data point in the subsample and calculate the updated ; otherwise, . Calculate the updated .
Let
be the indicators for the
thin subsample collected in Step 3. Obtain the subsample estimator
by maximizing
Obtain the final subsample estimator
by
where
is defined in (
8) and
A detailed algorithm implementing thin is given in Algorithm 4.
Algorithm 4. Sequential thinning under D-optimality.
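Because the box for Algorithm 4 is likewise not reproduced, the sketch below conveys only the online flavour of thin: each incoming point is scored, compared with a running estimate of the upper quantile of the scores, and kept only if it exceeds the threshold. The scoring function is a placeholder for the directional derivative described above, and the stochastic-approximation quantile update is an assumption of ours, not necessarily the update used by the authors.

```julia
# Sequential thinning sketch: keep point j if its score exceeds a running estimate of
# the upper quantile of the scores, targeting a selection rate of α = r / m.
# `score(j)` stands in for the directional derivative of data point j.
function thin_select(score, m, r; q0 = 0.0, step = 1.0)
    α    = r / m
    q    = q0                        # running quantile estimate (e.g., from a pilot sample)
    keep = falses(m)
    for j in 1:m
        s = score(j)
        keep[j] = s > q              # online inclusion decision
        # stochastic-approximation update of the upper-α quantile estimate
        q += (step / j) * ((s > q) - α)
    end
    return keep
end
```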
5. Numerical Results
We perform numerical simulation to evaluate the performance of the subsampling methods. We present the results on estimation efficiency in Section 5.1 and the results on computational efficiency in Section 5.2.
We consider the performances of the uniform subsample estimator (unif), the A-optimal subsample estimator (opt), the information-based optimal subdata selection estimator (iboss), and the sequential thinning estimator (thin). We set the full-data sample size as and consider different values of the true parameters; we allow and , 10 and 20. The time gaps are randomly sampled from the set .
We consider different subsample sizes , 10,000, 15,000, 20,000, 25,000, and 30,000, corresponding, respectively, to , and of the full dataset. We let . For opt, we set the mixing rate . To implement the thin subsampling procedure, we set the tuning parameters , , and . We also implement the uniform subsampling method with a subsample size as a benchmark for comparisons.
5.1. Estimation Efficiency
We use the empirical mean squared error (MSE) to measure the estimation efficiency of the subsample estimator with respect to the true parameter. We implement the subsampling methods discussed in Section 4 and repeat the simulation 1000 times to calculate the empirical MSE, which is defined as

$$\mathrm{MSE} = \frac{1}{1000} \sum_{s=1}^{1000} \left\| \hat{\theta}^{(s)} - \theta_{\mathrm{true}} \right\|^{2},$$

where $\hat{\theta}^{(s)}$ is the subsample estimate in the s-th repetition. Due to the nonconcavity of the objective function, some methods failed to converge in a few repetitions. We drop the results from these repetitions when calculating the MSE. This does not affect the accuracy of the MSE because the nonconvergence rate is very low.
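As a trivial illustration, the empirical MSE over repetitions could be computed as below, assuming the squared-error definition written above and dropping non-converged runs.

```julia
# Empirical MSE over repetitions: `est` is a vector of estimate vectors (ϕ, σ),
# `truth` is the true parameter vector; non-finite (non-converged) runs are skipped.
function empirical_mse(est, truth)
    ok = filter(e -> all(isfinite, e), est)
    return sum(e -> sum(abs2, e .- truth), ok) / length(ok)
end
```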
The results in terms of the empirical MSE are presented in
Figure 1. It is seen that all three proposed subsampling methods outperform the uniform sampling in all the cases. Overall, the
opt algorithm displays the best performance, especially when
or
is large. The deterministic methods,
iboss and
thin, have higher estimation efficiency when
is small.
The varying performances in empirical MSE among the proposed subsampling methods are partly due to the different optimality criteria adopted. The A-optimality criterion used by opt seeks to minimize the trace of the asymptotic variance matrix, whereas the D-optimality criterion used by both iboss and thin focuses on minimizing the determinant of the asymptotic variance matrix. As a result, opt places greater emphasis on parameter components with larger variances, while iboss and thin distribute their focus more evenly across all parameters. When is small (e.g., ), the variance in estimating is low across all subsampling methods, allowing iboss and thin to express their advantage in reducing the variance of estimating , thereby outperforming opt. However, when , the variance in estimating becomes the dominant contributor to the empirical MSE. Since iboss and thin do not prioritize the estimation of , their performance weakens in this case. A similar reasoning applies to the differing performances across various values of . To better understand this, we further decompose the empirical MSEs for and to validate our explanation.
In order to further examine the performance of the subsampling methods on estimating different types of parameters, we plot $\mathrm{MSE}_{\phi}$ for estimating $\phi$ in Figure 2 and $\mathrm{MSE}_{\sigma}$ for estimating $\sigma$ in Figure 3. Irrespective of the values of $\phi$ and $\sigma$, iboss and thin outperform opt in estimating $\phi$, while they fall short in estimating $\sigma$ compared to opt.
We observe that the proposed subsampling methods provide greater benefit for smaller r. As r increases, the differences between the methods gradually diminish. This is to be expected, since all estimators converge to the full-data estimator as r gets closer to m. Therefore, with larger datasets, where the affordable subsample is only a small fraction of the full data, the advantage of using optimal subsampling estimators becomes more pronounced.
In the IS-AR(1) model, the predicted value at any given time point is directly influenced by the correlation coefficient $\phi$. Therefore, more accurate estimates of $\phi$ lead to better prediction accuracy. As shown in Figure 2, the optimal subsampling methods provide exceptional performance in estimating $\phi$, especially when using the iboss and thin approaches. Notably, even with a much smaller subsample size, both iboss and thin outperform the uniform sampling method, which uses a significantly larger subsample size of r = 30,000, for estimating $\phi$.
5.2. Time Complexity
To evaluate the computational efficiency of the subsampling methods, we repeat the simulation 30 times and record the average computational times for each method. For comparison, we implement the full-data estimator (full) using the algorithm described in Section 3 as a benchmark.
We consider three different full-data sample sizes,
,
, and
, and six subsample sizes
, and 3000.
Table 1 reports the average computational times in milliseconds for the case where
and
. The computation times for other values of
and
follow a similar pattern. These results are based on simulations using a Julia implementation run on an Apple MacBook Pro with an M1 Pro chip.
The entries in the table show a substantial reduction in computation times by using the subsampling methods compared to the time taken for full-data estimation. Unlike the uniform subsampling method, which incurs almost no computational overhead during the subsampling process, optimal subsampling algorithms require additional computations to determine the inclusion probability for each data point. The numerical optimization using subsamples has a time complexity that scales with the subsample size r, whereas calculating the subsampling probabilities for nonuniform subsampling methods incurs a cost that scales with the full-data size m.
Compared to full-data-based estimation, both opt and iboss demonstrate exceptional efficiency, reducing the computation time to roughly 1/40 of the full-data time on average when the sample size m is sufficiently large. Although not as fast as these two methods, thin still achieves significant computational savings, taking less than one-tenth of the time required for the full-data estimation. Additionally, an advantage of using thin is that subsample selection can be performed sequentially. Overall, all three proposed optimal subsampling methods for the IS-AR(1) model offer reliable strategies for reducing the computational burden in large-scale data applications.
6. Conclusions
In this paper, we investigated the technique of computationally feasible subsampling in the context of irregularly spaced time series data. We proposed practical algorithms for implementing the opt, iboss, and thin methods for the IS-AR(1) model. The numerical results demonstrated that the proposed subsampling methods outperform the naive uniform subsampling approach with improved estimation efficiency and show significant benefits in reducing the computation time compared with the full-data estimation.
While our work is currently focused on the IS-AR(1) model, it highlights the potential of optimal subsampling methods for time series data. In future research, these techniques can be extended to more complex models, such as IS-AR(p) for p > 1. Typically, as the number of parameters increases, optimal subsampling methods become even more effective in reducing computation times. We hope that our work paves the way for further exploration of subsampling strategies in time series analysis.
Author Contributions
Conceptualization, J.L. and Z.W.; methodology, J.L. and Z.W.; formal analysis, J.L. and Z.W.; investigation, J.L. and Z.W.; writing—original draft preparation, Z.W.; writing—review and editing, N.R. and H.W.; supervision, N.R. and H.W.; project administration, N.R. and H.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data used for the paper are computer-generated. Codes for generating the data are available upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Elorrieta, F.; Eyheramendy, S.; Palma, W. Discrete-time autoregressive model for unequally spaced time-series observations. Astron. Astrophys. 2019, 627, A120. [Google Scholar] [CrossRef]
- Mudelsee, M. Trend analysis of climate time series: A review of methods. Earth-Sci. Rev. 2019, 190, 310–322. [Google Scholar] [CrossRef]
- Dutta, C.; Karpman, K.; Basu, S.; Ravishanker, N. Review of statistical approaches for modeling high-frequency trading data. Sankhya B 2023, 85, 1–48. [Google Scholar] [CrossRef]
- Nagaraja, C.H.; Brown, L.D.; Zhao, L.H. An autoregressive approach to house price modeling. Ann. Appl. Stat. 2011, 5, 124–149. [Google Scholar] [CrossRef]
- Erdogan, E.; Ma, S.; Beygelzimer, A.; Rish, I. Statistical models for unequally spaced time series. In Proceedings of the 2005 SIAM International Conference on Data Mining. SIAM, Newport Beach, CA, USA, 21–23 April 2005; pp. 626–630. [Google Scholar]
- Anantharaman, S.; Ravishanker, N.; Basu, S. Hierarchical modeling of irregularly spaced financial returns. Stat 2024, 13, e692. [Google Scholar] [CrossRef]
- Carlstein, E. The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Stat. 1986, 14, 1171–1179. [Google Scholar] [CrossRef]
- Fukuchi, J.I. Subsampling and model selection in time series analysis. Biometrika 1999, 86, 591–604. [Google Scholar] [CrossRef]
- Politis, D.N. Scalable subsampling: Computation, aggregation and inference. Biometrika 2023, 111, 347–354. [Google Scholar] [CrossRef]
- Shumway, R. Time Series Analysis and Its Applications; Springer: Berlin/Heidelberg, Germany, 2000. [Google Scholar]
- Eyheramendy, S.; Elorrieta, F.; Palma, W. An autoregressive model for irregular time series of variable stars. Proc. Int. Astron. Union 2016, 12, 259–262. [Google Scholar] [CrossRef]
- Elorrieta, F.; Eyheramendy, S.; Palma, W.; Ojeda, C. A novel bivariate autoregressive model for predicting and forecasting irregularly observed time series. Mon. Not. R. Astron. Soc. 2021, 505, 1105–1116. [Google Scholar] [CrossRef]
- Ghysels, E.; Jasiak, J. GARCH for irregularly spaced financial data: The ACD-GARCH model. Stud. Nonlinear Dyn. Econom. 1998, 2, 1–19. [Google Scholar] [CrossRef]
- Meddahi, N.; Renault, E.; Werker, B. GARCH and irregularly spaced data. Econ. Lett. 2006, 90, 200–204. [Google Scholar] [CrossRef]
- Engle, R.F.; Russell, J.R. Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica 1998, 66, 1127–1162. [Google Scholar] [CrossRef]
- Maller, R.A.; Müller, G.; Szimayer, A. GARCH modelling in continuous time for irregularly spaced time series data. Bernoulli 2008, 14, 519–542. [Google Scholar] [CrossRef]
- Buccheri, G.; Bormetti, G.; Corsi, F.; Lillo, F. A score-driven conditional correlation model for noisy and asynchronous data: An application to high-frequency covariance dynamics. J. Bus. Econ. Stat. 2021, 39, 920–936. [Google Scholar] [CrossRef]
- Dutta, C. Modeling Multiple Irregularly Spaced High-Frequency Financial Time Series. Ph.D. Thesis, University of Connecticut, Storrs, CT, USA, 2022. [Google Scholar]
- Drineas, P.; Mahoney, M.W.; Muthukrishnan, S. Sampling algorithms for l2 regression and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, USA, 22–24 January 2006; pp. 1127–1136. [Google Scholar]
- Yang, T.; Zhang, L.; Jin, R.; Zhu, S. An explicit sampling dependent spectral error bound for column subset selection. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 135–143. [Google Scholar]
- Ma, P.; Mahoney, M.W.; Yu, B. A statistical perspective on algorithmic leveraging. J. Mach. Learn. Res. 2015, 16, 861–991. [Google Scholar]
- Xie, R.; Wang, Z.; Bai, S.; Ma, P.; Zhong, W. Online decentralized leverage score sampling for streaming multidimensional time series. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 16–18 April 2019; Volume 89, pp. 2301–2311. [Google Scholar]
- Zhu, R. Gradient-based sampling: An adaptive importance sampling for least-squares. Adv. Neural Inf. Process. Syst. 2018, 29, 406–414. [Google Scholar]
- Wang, H.; Zhu, R.; Ma, P. Optimal subsampling for large sample logistic regression. J. Am. Stat. Assoc. 2018, 113, 829–844. [Google Scholar] [CrossRef]
- Teng, G.; Tian, B.; Zhang, Y.; Fu, S. Asymptotics of Subsampling for Generalized Linear Regression Models under Unbounded Design. Entropy 2022, 25, 84. [Google Scholar] [CrossRef]
- Wang, J.; Zou, J.; Wang, H. Sampling with replacement vs Poisson sampling: A comparative study in optimal subsampling. IEEE Trans. Inf. Theory 2022, 68, 6605–6630. [Google Scholar] [CrossRef]
- Wang, H.; Yang, M.; Stufken, J. Information-based optimal subdata selection for big data linear regression. J. Am. Stat. Assoc. 2019, 114, 393–405. [Google Scholar] [CrossRef]
- Pronzato, L.; Wang, H. Sequential online subsampling for thinning experimental designs. J. Stat. Plan. Inference 2021, 212, 169–193. [Google Scholar] [CrossRef]
- Casella, G.; Berger, R. Statistical Inference; CRC Press: Boca Raton, FL, USA, 2024. [Google Scholar]
- Kleinberg, J.; Tardos, E. Algorithm Design; Pearson/Addison-Wesley: Hoboken, NJ, USA, 2006. [Google Scholar]
- Wynn, H. Optimum Submeasures with Applications to Finite Population Sampling; Academic Press: Cambridge, MA, USA, 1982. [Google Scholar]
- Fedorov, V.V.; Hackl, P. Model-Oriented Design of Experiments; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]