Efficient EM-variational inference for nonparametric Hawkes process

Feng Zhou¹,
Simon Luo^2,3,
Zhidong Li⁴,
Xuhui Fan⁵,
Yang Wang⁴,
Arcot Sowmya¹ &
…
Fang Chen⁴

886 Accesses
4 Citations
1 Altmetric
Explore all metrics

Abstract

The classic Hawkes process assumes the baseline intensity to be constant and the triggering kernel to be a parametric function. Differently, we present a generalization of the parametric Hawkes process by using a Bayesian nonparametric model called quadratic Gaussian Hawkes process. We model the baseline intensity and trigger kernel as the quadratic transformation of random trajectories drawn from a Gaussian process (GP) prior. We derive an analytical expression for the EM-variational inference algorithm by augmenting the latent branching structure of the Hawkes process to embed the variational Gaussian approximation into the EM framework naturally. We also use a series of schemes based on the sparse GP approximation to accelerate the inference algorithm. The results of synthetic and real data experiments show that the underlying baseline intensity and triggering kernel can be recovered efficiently and our model achieved superior performance in fitting capability and prediction accuracy compared to the state-of-the-art approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Stochastic variational inference for scalable non-stationary Gaussian process regression

Article Open access 17 February 2023

Fast multi-resolution segmentation for nonstationary Hawkes process using cumulants

Article 08 June 2020

Sparse estimation for generalized exponential marked Hawkes process

Article 13 April 2022

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Notes

References

Adams, R.P., Murray, I., MacKay, D.J.: Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 9–16. ACM (2009)
Bacry, E., Muzy, J.F.: First-and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Trans. Inf. Theory 62(4), 2184–2202 (2016)
Article MathSciNet Google Scholar
Bacry, E., Mastromatteo, I., Muzy, J.F.: Hawkes processes in finance. Mark. Microstruct. Liq. 1(01), 1550005 (2015)
Article Google Scholar
Bishop, C.: Pattern recognition and machine learning (Information Science and Statistics), 1st edn. 2006. corr. 2nd printing edn. Springer, New York (2007)
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
MathSciNet Google Scholar
Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Probability and Its Applications, vol. I. Springer, Berlin (2003)
MATH Google Scholar
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)
MathSciNet MATH Google Scholar
Eichler, M., Dahlhaus, R., Dueck, J.: Graphical modeling for multivariate Hawkes processes with nonparametric link functions. J. Time Ser. Anal. 38(2), 225–242 (2017)
Article MathSciNet Google Scholar
Flaxman, S., Teh, Y.W., Sejdinovic, D., et al.: Poisson intensity estimation with reproducing kernels. Electron. J. Stat. 11(2), 5081–5104 (2017)
Article MathSciNet Google Scholar
Hawkes, A.G.: Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90 (1971)
Article MathSciNet Google Scholar
Hewlett, P.: Clustering of order arrivals, price impact and trade path optimisation. In: Workshop on Financial Modeling with Jump processes, Ecole Polytechnique, pp 6–8 (2006)
Lewis, E., Mohler, G.: A nonparametric EM algorithm for multiscale Hawkes processes. J. Nonparametr. Stat. 1(1), 1–20 (2011)
Google Scholar
Lloyd, C., Gunter, T., Osborne, M., Roberts, S.: Variational inference for Gaussian process modulated Poisson processes. In: International Conference on Machine Learning, pp. 1814–1822 (2015)
Marsan, D., Lengline, O.: Extending earthquakes’ reach through cascading. Science 319(5866), 1076–1079 (2008)
Article Google Scholar
Mei, H., Eisner, J.M.: The neural Hawkes process: a neurally self-modulating multivariate point process. In: Advances in Neural Information Processing Systems, pp. 6754–6764 (2017)
Mohler, G.O., Short, M.B., Brantingham, P.J., Schoenberg, F.P., Tita, G.E.: Self-exciting point process modeling of crime. J. Am. Stat. Assoc. 106(493), 100–108 (2011)
Article MathSciNet Google Scholar
Ogata, Y.: Space-time point-process models for earthquake occurrences. Ann. Inst. Stat. Math. 50(2), 379–402 (1998)
Article Google Scholar
Opper, M., Archambeau, C.: The variational Gaussian approximation revisited. Neural Comput. 21(3), 786–792 (2009)
Article MathSciNet Google Scholar
Reynaud-Bouret, P., Schbath, S., et al.: Adaptive estimation for Hawkes processes; application to genome analysis. Ann. Stat. 38(5), 2781–2822 (2010)
Article MathSciNet Google Scholar
Rizoiu, M.A., Mishra, S., Kong, Q., Carman, M., Xie, L.: SIR-Hawkes: linking epidemic models and Hawkes processes to model diffusions in finite populations. In: Proceedings of the 2018 World Wide Web Conference, pp. 419–428 (2018)
Samo, Y.L.K., Roberts, S,: Scalable nonparametric Bayesian inference on point processes with Gaussian processes. In: International Conference on Machine Learning, pp. 2227–2236 (2015)
Titsias, M.: Variational learning of inducing variables in sparse Gaussian processes. In: Artificial Intelligence and Statistics, pp. 567–574 (2009)
Wheatley, S., Schatz, M., Sornette, D.: The ARMA point process and its estimation. arXiv preprint arXiv:1806.09948 (2018)
Zhang, R., Walder, C., Rizoiu, M.A.: Variational inference for sparse Gaussian process modulated Hawkes process. arXiv preprint arXiv:1905.10496v2 (2019)
Zhou, K., Zha, H., Song, L.: Learning triggering kernels for multi-dimensional Hawkes processes. In: International Conference on Machine Learning, pp. 1301–1309 (2013)
Zhou, F., Li, Z., Fan, X., Wang, Y., Sowmya, A., Chen, F.: A refined MISD algorithm based on Gaussian process regression. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp 584–596. Springer (2018)

Download references

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Kensington, Australia
Feng Zhou & Arcot Sowmya
School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
Simon Luo
Data Analytics for Research and Environment, Australian Research Council, Canberra, Australia
Simon Luo
Data Science Institute, University of Technology Sydney, Ultimo, Australia
Zhidong Li, Yang Wang & Fang Chen
School of Mathematics and Statistics, University of New South Wales, Kensington, Australia
Xuhui Fan

Authors

Feng Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Simon Luo
View author publications
You can also search for this author in PubMed Google Scholar
Zhidong Li
View author publications
You can also search for this author in PubMed Google Scholar
Xuhui Fan
View author publications
You can also search for this author in PubMed Google Scholar
Yang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Arcot Sowmya
View author publications
You can also search for this author in PubMed Google Scholar
Fang Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhidong Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proof of lower-bound

The lower-bound $Q(\mu (t),\phi (\tau )|\mu ^{(s)}(t),\phi ^{(s)}(\tau ))$ in Eq. (4) is induced as follows. Based on Jensen’s inequality, we have

$$\begin{aligned}&\log p(D|\mu (t), \phi (\tau ))=\sum _{i=1}^N \log \left( \mu (t_i)+\sum _{j=1}^{i-1}\phi (t_i-t_j)\right) \nonumber \\&\qquad -\int _0^T\left( \mu (t)+\sum _{t_i<t}\phi (t-t_i)\right) dt\nonumber \\&\quad \ge \sum _{i=1}^N\left( p_{ii}\log \frac{\mu (t_i)}{p_{ii}}+\sum _{j=1}^{i-1}p_{ij}\log \frac{\phi (t_i-t_j)}{p_{ij}}\right) \nonumber \\&\qquad -\int _0^T\mu (t)dt-\sum _{i=1}^N\int _{t_i}^{t_i+T_{\phi }}\phi (t-t_i)dt\nonumber \\&\quad =\sum _{i=1}^Np_{ii}\log \mu (t_i)-\int _0^T\mu (t)dt\nonumber \\&\qquad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\log \phi (t_i-t_j)-\sum _{i=1}^N\int _{t_i}^{t_i+T_\phi }\phi (t-t_i)dt+C\nonumber \\ \end{aligned}$$

(13)

where C is a constant because $p_{ii}$ and $p_{ij}$ are given in the E-step.

Analytical solution of ELBO

The $\text {KL}\left( q(\mathbf {u}_f)||p(\mathbf {u}_f)\right) $ can be written as

$$\begin{aligned} \begin{aligned} \text {KL}\left( q(\mathbf {u}_f)||p(\mathbf {u}_f)\right) =&\frac{1}{2}\bigg [\text {Tr}(\mathbf {K}_{z_f z_f}^{-1}\mathbf {S}_f)+\log \frac{|\mathbf {K}_{z_f z_f}|}{|\mathbf {S}_f|}\\&-M_f+\mathbf {m}_f^T\mathbf {K}_{z_f z_f}^{-1}\mathbf {m}_f\bigg ], \end{aligned} \end{aligned}$$

(14)

where $\text {Tr}(\cdot )$ means trace, $|\cdot |$ means determinant and $M_f$ is the dimensionality of $\mathbf {u}_f$.

The last two terms in Eq. (8) have analytical solutions (Lloyd et al. 2015)

$$\begin{aligned} \int _0^T\mathbb {E}^2_{q(f)}[f(t)]dt=\mathbf {m}_f^T\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f\mathbf {K}^{-1}_{z_f z_f}\mathbf {m}_f, \end{aligned}$$

(15)

$$\begin{aligned} \begin{aligned} \int _0^T\text {Var}_{q(f)}[f(t)]dt=&\theta _0^fT-\text {Tr}(\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f)+\\&\text {Tr}(\mathbf {K}_{z_f z_f}^{-1}\mathbf {S}_f\mathbf {K}_{z_f z_f}^{-1}\varvec{\Psi }_f), \end{aligned} \end{aligned}$$

(16)

where $\varPsi _f(z_f,z_f^\prime )=\int _0^Tk(z_f,t)k(t,z_f^\prime )dt$. For the squared exponential kernel, $\varPsi _f$ can be written as (Lloyd et al. 2015)

$$\begin{aligned} \varPsi _f(z_f,z_f^\prime )=&-\frac{(\theta _0^f)^2}{2}\sqrt{\frac{\pi }{\theta _1^f}}\exp \left( -\frac{\theta _1^f(z_f-z_f^\prime )^2}{4}\right) \nonumber \\&\left[ \text {erf}\left( \sqrt{\theta _1^f}(\bar{z}_f-T)\right) -\text {erf}\left( \sqrt{\theta _1^f}\bar{z}_f\right) \right] ,\nonumber \\ \end{aligned}$$

(17)

where $\text {erf}(\cdot )$ is Gauss error function and $\bar{z}_f=(z_f+z_f^\prime )/2$.

The first term in Eq. (8) also has an analytical solution (Lloyd et al. 2015)

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{q(f)}\left[ \log f^2(t_i)\right] \\&\quad =\int _{-\infty }^\infty \log f^2(t_i)\mathcal {N}(f(t_i)|\tilde{v}_f(t_i),{\tilde{\sigma }}^2_f(t_i))df(t_i)\\&\quad =-\tilde{G}\left( -\frac{\tilde{v}^2_f(t_i)}{2{\tilde{\sigma }}^2_f(t_i)}\right) +\log \left( \frac{{\tilde{\sigma }}^2_f(t_i)}{2}\right) -C, \end{aligned} \end{aligned}$$

(18)

where ${\tilde{\sigma }}^2_f(t_i)$ is the diagonal entry of ${\tilde{\varSigma }}_f(t,t^\prime )$ in Eq. (7) at $t_i$, C is the Euler-Mascheroni constant 0.57721566 and $\tilde{G}(z)$ is a special case of the partial derivative of the confluent hyper-geometric function $_1F_1(a,b,z)$ (Lloyd et al. 2015)

$$\begin{aligned} \tilde{G}(z)={_1F_1}^{(1,0,0)}(0,0.5,z). \end{aligned}$$

(19)

It is worth noting that $\tilde{G}(z)$ does not need to be computed for inference. Actually we only need to know $\tilde{G}(0)=0$ because $\mathbf {m}_f^*=\mathbf {0}$ as we have shown in the section of inference speed up.

Matrix derivative of ELBO

Given $\mathbf {m}_f=\mathbf {0}$, $\text {ELBO}_\mu $ can be written as

$$\begin{aligned}&\text {ELBO}_\mu \nonumber \\&\quad =-\left( \theta _0^fT-\text {Tr}(\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f)+\text {Tr}(\mathbf {K}^{-1}_{z_f z_f}\mathbf {S}_f\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f)\right) \nonumber \\&\qquad +\sum _{i=1}^Np_{ii}\left( -\tilde{G}(0)+\log ({\tilde{\sigma }}^2_f(t_i))-\log 2-C\right) \nonumber \\&\qquad -\frac{1}{2}\left( \text {Tr}(\mathbf {K}_{z_f z_f}^{-1}\mathbf {S}_f)+\log |\mathbf {K}_{z_f z_f}|-\log |\mathbf {S}_f|-M_f\right) .\nonumber \\ \end{aligned}$$

(20)

If $\mathbf {S}_f$ is symmetric, $\partial \text {ELBO}_\mu /\partial \mathbf {S}_f$ can be written as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\mu }{\partial \mathbf {S}_f}&=-(2\mathbf {K}_{z_f z_f}^{-1}\varPsi _f\mathbf {K}_{z_f z_f}^{-1}-\mathbf {K}_{z_f z_f}^{-1}\varPsi _f\mathbf {K}_{z_f z_f}^{-1}\circ \mathbf {I})\\&\quad +\sum _{i=1}^Np_{ii}\bigg (2\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\\&\quad -\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}\bigg )/{\tilde{\sigma }}_f^2(t_i)\\&\quad -\frac{1}{2}\left( 2\mathbf {K}^{-1}_{z_f z_f}-\mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}-(2\mathbf {S}_f^{-1}-\mathbf {S}_f^{-1}\circ \mathbf {I})\right) , \end{aligned} \end{aligned}$$

(21)

where $\mathbf {I}$ denotes the identity matrix, $\circ $ denotes the Hadamard (elementwise) product and ${\tilde{\sigma }}_f^2(t_i)=\theta _0^f-\mathbf {k}_{t_i z_f}\mathbf {K}_{z_f z_f}^{-1}\mathbf {k}_{z_f t_i}+\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\mathbf {S}_f\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}$ is the diagonal entry of ${\tilde{\varSigma }}_f(t,t^\prime )$ in Eq. (7).

If $\mathbf {S}_f$ is diagonal, $\partial \text {ELBO}_\mu /\partial \mathbf {S}_f$ can be further simplified as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\mu }{\partial \mathbf {S}_f}&=-\mathbf {K}_{z_f z_f}^{-1}\varPsi _f\mathbf {K}_{z_f z_f}^{-1}\circ \mathbf {I}\\&\quad +\sum _{i=1}^Np_{ii}\frac{\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}}{{\tilde{\sigma }}_f^2(t_i)}\\&\quad -\frac{1}{2}\left( \mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}-\mathbf {S}_f^{-1}\right) . \end{aligned} \end{aligned}$$

(22)

Similarly given $\mathbf {m}_g=\mathbf {0}$, $\text {ELBO}_\phi $ can be written as

$$\begin{aligned}&\text {ELBO}_\phi \nonumber \\&\quad =-\sum _{i=1}^N\left( \theta _0^gT_\phi -\text {Tr}(\mathbf {K}^{-1}_{z_g z_g}\varvec{\Psi }_g)+\text {Tr}(\mathbf {K}^{-1}_{z_g z_g}\mathbf {S}_g\mathbf {K}^{-1}_{z_g z_g}\varvec{\Psi }_g)\right) \nonumber \\&\qquad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\left( -\tilde{G}(0)+\log ({\tilde{\sigma }}^2_g(\tau _{ij}))-\log 2-C\right) \nonumber \\&\qquad -\frac{1}{2}\left( \text {Tr}(\mathbf {K}_{z_g z_g}^{-1}\mathbf {S}_g)+\log |\mathbf {K}_{z_g z_g}|-\log |\mathbf {S}_g|-M_g\right) .\nonumber \\ \end{aligned}$$

(23)

If $\mathbf {S}_g$ is symmetric, $\partial \text {ELBO}_\phi /\partial \mathbf {S}_g$ can be written as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\phi }{\partial \mathbf {S}_g}&=-\sum _{i=1}^N(2\mathbf {K}_{z_g z_g}^{-1}\varPsi _g\mathbf {K}_{z_g z_g}^{-1}-\mathbf {K}_{z_g z_g}^{-1}\varPsi _g\mathbf {K}_{z_g z_g}^{-1}\circ \mathbf {I})\\&\quad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\bigg (2\mathbf {K}^{-1}_{z_g z_g}\mathbf {k}_{z_g \tau _{ij}}\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\\&\quad -\mathbf {K}^{-1}_{z_g z_g}\mathbf {k}_{z_g \tau _{ij}}\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}\bigg )/{\tilde{\sigma }}_g^2(\tau _{ij})\\&\quad -\frac{1}{2}\left( 2\mathbf {K}^{-1}_{z_g z_g}-\mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}-(2\mathbf {S}_g^{-1}-\mathbf {S}_g^{-1}\circ \mathbf {I})\right) , \end{aligned} \end{aligned}$$

(24)

where ${\tilde{\sigma }}_g^2(\tau _{ij})=\theta _0^g-\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}_{z_g z_g}^{-1}\mathbf {k}_{z_g \tau _{ij}}+\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\mathbf {S}_g\mathbf {K}^{-1}_{z_g z_g} \mathbf {k}_{z_g \tau _{ij}}$ is the diagonal entry of ${\tilde{\varSigma }}_g(\tau ,\tau ^\prime )$.

If $\mathbf {S}_g$ is diagonal, $\partial \text {ELBO}_\phi /\partial \mathbf {S}_g$ can be further simplified as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\phi }{\partial \mathbf {S}_g}&=-\sum _{i=1}^N\mathbf {K}_{z_g z_g}^{-1}\varPsi _g\mathbf {K}_{z_g z_g}^{-1}\circ \mathbf {I}\\&\quad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\frac{\mathbf {K}^{-1}_{z_g z_g}\mathbf {k}_{z_g \tau _{ij}}\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}}{{\tilde{\sigma }}_g^2(\tau _{ij})}\\&\quad -\frac{1}{2}\left( \mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}-\mathbf {S}_g^{-1}\right) . \end{aligned} \end{aligned}$$

(25)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhou, F., Luo, S., Li, Z. et al. Efficient EM-variational inference for nonparametric Hawkes process. Stat Comput 31, 46 (2021). https://doi.org/10.1007/s11222-021-10021-x

Download citation

Received: 02 May 2020
Accepted: 18 May 2021
Published: 04 June 2021
DOI: https://doi.org/10.1007/s11222-021-10021-x

Efficient EM-variational inference for nonparametric Hawkes process

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Stochastic variational inference for scalable non-stationary Gaussian process regression

Fast multi-resolution segmentation for nonstationary Hawkes process using cumulants

Sparse estimation for generalized exponential marked Hawkes process

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Appendices

Proof of lower-bound

Analytical solution of ELBO

Matrix derivative of ELBO

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Efficient EM-variational inference for nonparametric Hawkes process

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Stochastic variational inference for scalable non-stationary Gaussian process regression

Fast multi-resolution segmentation for nonstationary Hawkes process using cumulants

Sparse estimation for generalized exponential marked Hawkes process

Explore related subjects

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Appendices

Proof of lower-bound

Analytical solution of ELBO

Matrix derivative of ELBO

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation