Abstract
The classic Hawkes process assumes a constant baseline intensity and a parametric triggering kernel. In contrast, we present a generalization of the parametric Hawkes process based on a Bayesian nonparametric model, the quadratic Gaussian Hawkes process. We model the baseline intensity and the triggering kernel as quadratic transformations of random trajectories drawn from a Gaussian process (GP) prior. We derive an analytical expression for the EM-variational inference algorithm by augmenting the latent branching structure of the Hawkes process, which embeds the variational Gaussian approximation naturally into the EM framework. We also apply a series of schemes based on the sparse GP approximation to accelerate the inference algorithm. The results of synthetic and real data experiments show that the underlying baseline intensity and triggering kernel can be recovered efficiently, and that our model achieves superior fitting capability and prediction accuracy compared with state-of-the-art approaches.
References
Adams, R.P., Murray, I., MacKay, D.J.: Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 9–16. ACM (2009)
Bacry, E., Muzy, J.F.: First- and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Trans. Inf. Theory 62(4), 2184–2202 (2016)
Bacry, E., Mastromatteo, I., Muzy, J.F.: Hawkes processes in finance. Mark. Microstruct. Liq. 1(01), 1550005 (2015)
Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Probability and Its Applications, vol. I. Springer, Berlin (2003)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)
Eichler, M., Dahlhaus, R., Dueck, J.: Graphical modeling for multivariate Hawkes processes with nonparametric link functions. J. Time Ser. Anal. 38(2), 225–242 (2017)
Flaxman, S., Teh, Y.W., Sejdinovic, D., et al.: Poisson intensity estimation with reproducing kernels. Electron. J. Stat. 11(2), 5081–5104 (2017)
Hawkes, A.G.: Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90 (1971)
Hewlett, P.: Clustering of order arrivals, price impact and trade path optimisation. In: Workshop on Financial Modeling with Jump Processes, Ecole Polytechnique, pp. 6–8 (2006)
Lewis, E., Mohler, G.: A nonparametric EM algorithm for multiscale Hawkes processes. J. Nonparametr. Stat. 1(1), 1–20 (2011)
Lloyd, C., Gunter, T., Osborne, M., Roberts, S.: Variational inference for Gaussian process modulated Poisson processes. In: International Conference on Machine Learning, pp. 1814–1822 (2015)
Marsan, D., Lengline, O.: Extending earthquakes’ reach through cascading. Science 319(5866), 1076–1079 (2008)
Mei, H., Eisner, J.M.: The neural Hawkes process: a neurally self-modulating multivariate point process. In: Advances in Neural Information Processing Systems, pp. 6754–6764 (2017)
Mohler, G.O., Short, M.B., Brantingham, P.J., Schoenberg, F.P., Tita, G.E.: Self-exciting point process modeling of crime. J. Am. Stat. Assoc. 106(493), 100–108 (2011)
Ogata, Y.: Space-time point-process models for earthquake occurrences. Ann. Inst. Stat. Math. 50(2), 379–402 (1998)
Opper, M., Archambeau, C.: The variational Gaussian approximation revisited. Neural Comput. 21(3), 786–792 (2009)
Reynaud-Bouret, P., Schbath, S., et al.: Adaptive estimation for Hawkes processes; application to genome analysis. Ann. Stat. 38(5), 2781–2822 (2010)
Rizoiu, M.A., Mishra, S., Kong, Q., Carman, M., Xie, L.: SIR-Hawkes: linking epidemic models and Hawkes processes to model diffusions in finite populations. In: Proceedings of the 2018 World Wide Web Conference, pp. 419–428 (2018)
Samo, Y.L.K., Roberts, S.: Scalable nonparametric Bayesian inference on point processes with Gaussian processes. In: International Conference on Machine Learning, pp. 2227–2236 (2015)
Titsias, M.: Variational learning of inducing variables in sparse Gaussian processes. In: Artificial Intelligence and Statistics, pp. 567–574 (2009)
Wheatley, S., Schatz, M., Sornette, D.: The ARMA point process and its estimation. arXiv preprint arXiv:1806.09948 (2018)
Zhang, R., Walder, C., Rizoiu, M.A.: Variational inference for sparse Gaussian process modulated Hawkes process. arXiv preprint arXiv:1905.10496v2 (2019)
Zhou, K., Zha, H., Song, L.: Learning triggering kernels for multi-dimensional Hawkes processes. In: International Conference on Machine Learning, pp. 1301–1309 (2013)
Zhou, F., Li, Z., Fan, X., Wang, Y., Sowmya, A., Chen, F.: A refined MISD algorithm based on Gaussian process regression. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 584–596. Springer (2018)
Appendices
Proof of lower-bound
The lower bound \(Q(\mu (t),\phi (\tau )|\mu ^{(s)}(t),\phi ^{(s)}(\tau ))\) in Eq. (4) is derived as follows. Based on Jensen's inequality, we have

$$\begin{aligned} \log p(D|\mu(t),\phi(\tau))&=\sum_{i=1}^N\log\left(\mu(t_i)+\sum_{t_j<t_i}\phi(t_i-t_j)\right)-\int_0^T\mu(t)dt-\sum_{i=1}^N\int_0^{T-t_i}\phi(\tau)d\tau\\ &\ge \sum_{i=1}^N\left[p_{ii}\log\frac{\mu(t_i)}{p_{ii}}+\sum_{t_j<t_i}p_{ij}\log\frac{\phi(t_i-t_j)}{p_{ij}}\right]-\int_0^T\mu(t)dt-\sum_{i=1}^N\int_0^{T-t_i}\phi(\tau)d\tau\\ &=Q(\mu(t),\phi(\tau)|\mu^{(s)}(t),\phi^{(s)}(\tau))+C, \end{aligned}$$

where \(C=-\sum_{i=1}^N\big[p_{ii}\log p_{ii}+\sum_{t_j<t_i}p_{ij}\log p_{ij}\big]\) is a constant because \(p_{ii}\) and \(p_{ij}\) are given in the E-step.
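For concreteness, a minimal Python sketch of the E-step that produces these responsibilities; `mu_s` and `phi_s` are hypothetical callables returning the current estimates \(\mu^{(s)}(t)\) and \(\phi^{(s)}(\tau)\), not code from the paper.

```python
import numpy as np

def e_step(t, mu_s, phi_s):
    """E-step: branching-structure responsibilities for sorted event times t.

    p[i, i] is the probability that event i comes from the baseline,
    p[i, j] (j < i) that it was triggered by event j; each row sums to 1,
    so C = -sum(p * log p) stays fixed during the M-step.
    """
    n = len(t)
    p = np.zeros((n, n))
    for i in range(n):
        p[i, i] = mu_s(t[i])                # baseline contribution mu(t_i)
        for j in range(i):
            p[i, j] = phi_s(t[i] - t[j])    # triggering contribution phi(t_i - t_j)
        p[i, :i + 1] /= p[i, :i + 1].sum()  # normalize by lambda(t_i)
    return p

# Example: p = e_step(np.array([0.3, 1.1, 1.4]), lambda t: 0.5, lambda d: np.exp(-d))
```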
Analytical solution of ELBO
The \(\text {KL}\left( q(\mathbf {u}_f)||p(\mathbf {u}_f)\right) \) between \(q(\mathbf{u}_f)=\mathcal{N}(\mathbf{m}_f,\mathbf{S}_f)\) and \(p(\mathbf{u}_f)=\mathcal{N}(\mathbf{0},\mathbf{K}_{z_f z_f})\) can be written as

$$\text{KL}\left(q(\mathbf{u}_f)||p(\mathbf{u}_f)\right)=\frac{1}{2}\left[\text{Tr}\left(\mathbf{K}_{z_f z_f}^{-1}\mathbf{S}_f\right)+\mathbf{m}_f^\top\mathbf{K}_{z_f z_f}^{-1}\mathbf{m}_f-M_f+\log\frac{|\mathbf{K}_{z_f z_f}|}{|\mathbf{S}_f|}\right],$$

where \(\text {Tr}(\cdot )\) denotes the trace, \(|\cdot |\) the determinant and \(M_f\) the dimensionality of \(\mathbf {u}_f\).
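This closed form is easy to check numerically; the sketch below is a minimal implementation under the same assumptions (\(q=\mathcal{N}(\mathbf{m}_f,\mathbf{S}_f)\), \(p=\mathcal{N}(\mathbf{0},\mathbf{K}_{z_f z_f})\)), using `slogdet` for a numerically stable log-determinant.

```python
import numpy as np

def kl_gaussian(m, S, K):
    """KL( N(m, S) || N(0, K) ) via the closed form above."""
    Kinv = np.linalg.inv(K)
    _, logdet_K = np.linalg.slogdet(K)  # log|K|, stable even for large matrices
    _, logdet_S = np.linalg.slogdet(S)  # log|S|
    M = len(m)
    return 0.5 * (np.trace(Kinv @ S) + m @ Kinv @ m - M + logdet_K - logdet_S)
```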
The last two terms in Eq. (8) have analytical solutions (Lloyd et al. 2015): the KL divergence is given above, and the expected integral is

$$\mathbb{E}_{q(f)}\left[\int_0^T f^2(t)dt\right]=\theta_0^f T-\text{Tr}\left(\mathbf{K}_{z_f z_f}^{-1}\varPsi_f\right)+\text{Tr}\left(\mathbf{K}_{z_f z_f}^{-1}\varPsi_f\mathbf{K}_{z_f z_f}^{-1}\left(\mathbf{m}_f\mathbf{m}_f^\top+\mathbf{S}_f\right)\right),$$

where \(\varPsi _f(z_f,z_f^\prime )=\int _0^Tk(z_f,t)k(t,z_f^\prime )dt\). For the squared exponential kernel \(k(t,t^\prime)=\theta_0^f\exp\left(-(t-t^\prime)^2/(2\theta_1^f)\right)\), \(\varPsi _f\) can be written as (Lloyd et al. 2015)

$$\varPsi_f(z_f,z_f^\prime)=\frac{(\theta_0^f)^2\sqrt{\pi\theta_1^f}}{2}\exp\left(-\frac{(z_f-z_f^\prime)^2}{4\theta_1^f}\right)\left[\text{erf}\left(\frac{T-\bar{z}_f}{\sqrt{\theta_1^f}}\right)+\text{erf}\left(\frac{\bar{z}_f}{\sqrt{\theta_1^f}}\right)\right],$$

where \(\text {erf}(\cdot )\) is the Gauss error function and \(\bar{z}_f=(z_f+z_f^\prime )/2\).
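The expression vectorizes directly. The sketch below evaluates \(\varPsi_f\) over a vector of inducing points under the kernel parameterization stated above (`theta0`, `theta1` standing in for \(\theta_0^f\) and \(\theta_1^f\)); it can be validated against numerical quadrature of \(\int_0^T k(z_f,t)k(t,z_f^\prime)dt\).

```python
import numpy as np
from scipy.special import erf

def psi_se(z, T, theta0, theta1):
    """Psi(z, z') = int_0^T k(z,t) k(t,z') dt for the SE kernel
    k(t, t') = theta0 * exp(-(t - t')**2 / (2 * theta1))."""
    diff = z[:, None] - z[None, :]          # z_f - z_f'
    zbar = 0.5 * (z[:, None] + z[None, :])  # (z_f + z_f') / 2
    s = np.sqrt(theta1)
    return (theta0**2 * np.sqrt(np.pi * theta1) / 2
            * np.exp(-diff**2 / (4 * theta1))
            * (erf((T - zbar) / s) + erf(zbar / s)))
```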
The first term in Eq. (8) also has an analytical solution (Lloyd et al. 2015)

$$\mathbb{E}_{q(f(t_i))}\left[\log f^2(t_i)\right]=-\tilde{G}\left(-\frac{\tilde{m}_f^2(t_i)}{2\tilde{\sigma}_f^2(t_i)}\right)+\log\left(\frac{\tilde{\sigma}_f^2(t_i)}{2}\right)-C,$$

where \(\tilde{m}_f(t_i)\) is the mean of \(q(f(t_i))\), \({\tilde{\sigma }}^2_f(t_i)\) is the diagonal entry of \({\tilde{\varSigma }}_f(t,t^\prime )\) in Eq. (7) at \(t_i\), C is the Euler–Mascheroni constant \(\approx 0.57721566\), and \(\tilde{G}(z)\) is a special case of the partial derivative of the confluent hypergeometric function \(_1F_1(a,b,z)\) (Lloyd et al. 2015)

$$\tilde{G}(z)=\left.\frac{\partial\,{}_1F_1(a,b,z)}{\partial a}\right|_{a=0,\,b=\frac{1}{2}}.$$
It is worth noting that \(\tilde{G}(z)\) does not need to be computed for inference. In fact, we only need \(\tilde{G}(0)=0\) because \(\mathbf {m}_f^*=\mathbf {0}\), as shown in the section on inference speed-up.
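A one-off Monte Carlo check of this simplification: with \(\tilde{m}_f=0\) the expectation reduces to \(\log(\tilde{\sigma}_f^2/2)-C\), which the snippet below confirms empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.7                                   # an arbitrary marginal variance
f = rng.normal(0.0, np.sqrt(sigma2), 1_000_000)
mc = np.log(f**2).mean()                       # Monte Carlo E[log f^2] with m = 0
closed = np.log(sigma2 / 2) - np.euler_gamma   # log(sigma^2/2) - C, since G(0) = 0
print(mc, closed)                              # the two closely agree
```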
Matrix derivative of ELBO
Given \(\mathbf {m}_f=\mathbf {0}\), \(\text {ELBO}_\mu \) can be written as

$$\text{ELBO}_\mu=\sum_{i=1}^N p_{ii}\left[\log\left(\frac{\tilde{\sigma}_f^2(t_i)}{2}\right)-C\right]-\theta_0^f T+\text{Tr}\left(\mathbf{K}_{z_f z_f}^{-1}\varPsi_f\right)-\text{Tr}\left(\mathbf{K}_{z_f z_f}^{-1}\varPsi_f\mathbf{K}_{z_f z_f}^{-1}\mathbf{S}_f\right)-\frac{1}{2}\left[\text{Tr}\left(\mathbf{K}_{z_f z_f}^{-1}\mathbf{S}_f\right)-M_f+\log\frac{|\mathbf{K}_{z_f z_f}|}{|\mathbf{S}_f|}\right].$$
If \(\mathbf {S}_f\) is symmetric, \(\partial \text {ELBO}_\mu /\partial \mathbf {S}_f\) can be written as

$$\frac{\partial\,\text{ELBO}_\mu}{\partial\mathbf{S}_f}=\mathbf{D}_f+\mathbf{D}_f^\top-\mathbf{D}_f\circ\mathbf{I},\qquad \mathbf{D}_f=\sum_{i=1}^N\frac{p_{ii}}{\tilde{\sigma}_f^2(t_i)}\mathbf{K}_{z_f z_f}^{-1}\mathbf{k}_{z_f t_i}\mathbf{k}_{t_i z_f}\mathbf{K}_{z_f z_f}^{-1}-\mathbf{K}_{z_f z_f}^{-1}\varPsi_f\mathbf{K}_{z_f z_f}^{-1}-\frac{1}{2}\mathbf{K}_{z_f z_f}^{-1}+\frac{1}{2}\mathbf{S}_f^{-1},$$

where \(\mathbf {I}\) denotes the identity matrix, \(\circ \) denotes the Hadamard (elementwise) product and \({\tilde{\sigma }}_f^2(t_i)=\theta _0^f-\mathbf {k}_{t_i z_f}\mathbf {K}_{z_f z_f}^{-1}\mathbf {k}_{z_f t_i}+\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\mathbf {S}_f\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\) is the diagonal entry of \({\tilde{\varSigma }}_f(t,t^\prime )\) in Eq. (7).
If \(\mathbf {S}_f\) is diagonal, \(\partial \text {ELBO}_\mu /\partial \mathbf {S}_f\) can be further simplified as

$$\frac{\partial\,\text{ELBO}_\mu}{\partial\mathbf{S}_f}=\mathbf{D}_f\circ\mathbf{I}.$$
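Assembled from the reconstruction above, the gradient computation might look as follows; the names (`k_tz` for the \(N\times M_f\) matrix of cross-covariances \(k(t_i,z_f)\), `Psi` for \(\varPsi_f\)) are illustrative, and the `diagonal` flag switches between the two cases.

```python
import numpy as np

def grad_elbo_mu_S(p_ii, k_tz, K_zz, Psi, S, sigma2, diagonal=False):
    """Gradient of ELBO_mu with respect to S_f.

    p_ii: (N,) responsibilities; k_tz: (N, M) cross-covariances k(t_i, z_f);
    sigma2: (N,) marginal variances sigma_f^2(t_i) at the events.
    """
    Kinv = np.linalg.inv(K_zz)
    A = Kinv @ k_tz.T                 # columns a_i = K^{-1} k(z_f, t_i)
    D = (A * (p_ii / sigma2)) @ A.T   # sum_i (p_ii / sigma_i^2) a_i a_i^T
    D += -Kinv @ Psi @ Kinv - 0.5 * Kinv + 0.5 * np.linalg.inv(S)
    if diagonal:
        return np.diag(np.diag(D))    # D o I
    return D + D.T - np.diag(np.diag(D))  # D + D^T - D o I (symmetric case)
```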
Similarly, given \(\mathbf {m}_g=\mathbf {0}\), \(\text {ELBO}_\phi \) can be written as

$$\text{ELBO}_\phi=\sum_{i=1}^N\sum_{t_j<t_i}p_{ij}\left[\log\left(\frac{\tilde{\sigma}_g^2(\tau_{ij})}{2}\right)-C\right]-\sum_{i=1}^N\theta_0^g(T-t_i)+\text{Tr}\left(\mathbf{K}_{z_g z_g}^{-1}\varPsi_g\right)-\text{Tr}\left(\mathbf{K}_{z_g z_g}^{-1}\varPsi_g\mathbf{K}_{z_g z_g}^{-1}\mathbf{S}_g\right)-\frac{1}{2}\left[\text{Tr}\left(\mathbf{K}_{z_g z_g}^{-1}\mathbf{S}_g\right)-M_g+\log\frac{|\mathbf{K}_{z_g z_g}|}{|\mathbf{S}_g|}\right],$$

where \(\tau_{ij}=t_i-t_j\), \(\varPsi_g(z_g,z_g^\prime)=\sum_{i=1}^N\int_0^{T-t_i}k(z_g,\tau)k(\tau,z_g^\prime)d\tau\) and \(M_g\) is the dimensionality of \(\mathbf{u}_g\).
If \(\mathbf {S}_g\) is symmetric, \(\partial \text {ELBO}_\phi /\partial \mathbf {S}_g\) can be written as

$$\frac{\partial\,\text{ELBO}_\phi}{\partial\mathbf{S}_g}=\mathbf{D}_g+\mathbf{D}_g^\top-\mathbf{D}_g\circ\mathbf{I},\qquad \mathbf{D}_g=\sum_{i=1}^N\sum_{t_j<t_i}\frac{p_{ij}}{\tilde{\sigma}_g^2(\tau_{ij})}\mathbf{K}_{z_g z_g}^{-1}\mathbf{k}_{z_g\tau_{ij}}\mathbf{k}_{\tau_{ij}z_g}\mathbf{K}_{z_g z_g}^{-1}-\mathbf{K}_{z_g z_g}^{-1}\varPsi_g\mathbf{K}_{z_g z_g}^{-1}-\frac{1}{2}\mathbf{K}_{z_g z_g}^{-1}+\frac{1}{2}\mathbf{S}_g^{-1},$$

where \({\tilde{\sigma }}_g^2(\tau _{ij})=\theta _0^g-\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}_{z_g z_g}^{-1}\mathbf {k}_{z_g \tau _{ij}}+\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\mathbf {S}_g\mathbf {K}^{-1}_{z_g z_g} \mathbf {k}_{z_g \tau _{ij}}\) is the diagonal entry of \({\tilde{\varSigma }}_g(\tau ,\tau ^\prime )\).
If \(\mathbf {S}_g\) is diagonal, \(\partial \text {ELBO}_\phi /\partial \mathbf {S}_g\) can be further simplified as

$$\frac{\partial\,\text{ELBO}_\phi}{\partial\mathbf{S}_g}=\mathbf{D}_g\circ\mathbf{I}.$$
Cite this article
Zhou, F., Luo, S., Li, Z. et al. Efficient EM-variational inference for nonparametric Hawkes process. Stat Comput 31, 46 (2021). https://doi.org/10.1007/s11222-021-10021-x