Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Empirical geodesic graphs and CAT(k) metrics for data analysis

  • Published:
Statistics and Computing Aims and scope Submit manuscript

Abstract

A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. For a probability distribution, the length along a path between two points can be defined as the amount of probability mass accumulated along the path. The geodesic, then, is the shortest such path and defines a geodesic metric. Such metrics are transformed in a number of ways to produce parametrised families of geodesic metric spaces, empirical versions of which allow computation of intrinsic means and associated measures of dispersion. These reveal properties of the data, based on geometry, such as those that are difficult to see from the raw Euclidean distances. Examples of application include clustering and classification. For certain parameter ranges, the spaces become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal spanning tree of a graph based on the data becomes CAT(0). In another, a so-called “metric cone” construction allows extension to CAT(k) spaces. It is shown how to empirically tune the parameters of the metrics, making it possible to apply them to a number of real cases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  • Alexander, L., Jones, P.: Updated precipitation series for the UK and discussion of recent extremes. Atmos. Sci. Lett. 1(2), 142–150 (2000)

    Article  Google Scholar 

  • Amari, S.I.: Differential-Geometrical Methods in Statistics. Springer, New York (1985)

    Book  Google Scholar 

  • Amari, S.I., Nagaoka, H.: Methods of Information Geometry. American Mathematical Society, Providence (2007)

    Book  Google Scholar 

  • Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017). arXiv:1701.07875

  • Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge/A Series of Modern Surveys in Mathematics. Springer, Berlin (2017)

    Book  Google Scholar 

  • Bache, K., Lichman, M.: UCI Machine Learning Repository (2013). http://archive.ics.uci.edu/ml/. Accessed 9 Jan 2015

  • Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14, pp. 585–591. MIT Press, Cambridge (2002)

    Google Scholar 

  • Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)

    Article  Google Scholar 

  • Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28(1), 235–242 (2000)

    Article  Google Scholar 

  • Bhattacharya, A., Bhattacharya, R.: Nonparametric Inference on Manifolds: With Applications to Shape Spaces. Cambridge University Press, Cambridge (2012a)

    Book  Google Scholar 

  • Bhattacharya, A., Bhattacharya, R.: Nonparametric inference on manifolds: with applications to shape spaces, vol. 2. Cambridge University Press, Cambridge (2012b)

    Book  Google Scholar 

  • Billera, L.J., Holmes, S.P., Vogtmann, K.: Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001)

    Article  MathSciNet  Google Scholar 

  • Bridson, M.R., Haefliger, A.: Metric Spaces of Non-positive Curvature, vol. 319. Springer, Berlin (2011)

    MATH  Google Scholar 

  • Cayton, L.: Algorithms for manifold learning. Univ. Calif. San Diego Tech. Rep. 12(1–17), 1 (2005)

    Google Scholar 

  • CIESEN, FAO, CIAT: Gridded population of the world, version 3 (GPWV3): population count grid (2005). http://dx.doi.org/10.7927/H4639MPP. Accessed 16 June 2016

  • Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: International Conference on Machine Learning, jmlr.org, pp. 685–693 (2014)

  • De Berg, M., Van Kreveld, M., Overmars, M., Schwarzkopf, O.C.: Computational Geometry, pp. 1–17. Springer, Berlin (2000)

    Book  Google Scholar 

  • Deza, M.M., Deza, E.: Encyclopedia of Distances, pp. 1–583. Springer, Berlin (2009)

    Book  Google Scholar 

  • Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis: With Applications in R. Wiley, Hoboken (2016)

    Book  Google Scholar 

  • Floyd, R.W.: Algorithm 97: shortest path. Commun. ACM 5(6), 345 (1962)

    Article  Google Scholar 

  • Gromov, M.: Hyperbolic groups. In: Gersten, S.M. (ed.) Essays in Group Theory, pp. 75–263. Springer, Berlin (1987)

    Chapter  Google Scholar 

  • Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. Lond. Math. Soc. 16(2), 81–121 (1984)

    Article  MathSciNet  Google Scholar 

  • Kendall, M.G., Morán, PA.: Geometrical probability. Technical Report (1963)

  • Kendall, D.G., Barden, D., Carne, T.K., Le, H.: Shape and Shape Theory. Wiley, Hoboken (2009)

    MATH  Google Scholar 

  • Komaki, F.: Shrinkage priors for Bayesian prediction. Ann. Stat. 34(2), 808–819 (2006)

    Article  MathSciNet  Google Scholar 

  • Lin, Y., Lu, L., Yau, S.T.: Ricci curvature of graphs. Tohoku Math. J. 63(4), 605–627 (2011)

    Article  MathSciNet  Google Scholar 

  • Marron, J.S., Alonso, A.M.: Overview of object oriented data analysis. Biom. J. 56(5), 732–753 (2014)

    Article  MathSciNet  Google Scholar 

  • McCullagh, P.: Tensor Methods in Statistics. Courier Dover Publications, Mineola (1986)

    MATH  Google Scholar 

  • Nye, T.M.W.: Principal components analysis in the space of phylogenetic trees. Ann. Stat. 39(5), 2716–2739 (2011)

    Article  MathSciNet  Google Scholar 

  • Okabe, A., Boots, B., Sugihara, K., Chiu, S.N.: Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, vol. 501. Wiley, Hoboken (2009)

    MATH  Google Scholar 

  • Owen, M., Provan, J.S.: A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1), 2–13 (2011)

    Article  Google Scholar 

  • Panaretos, V.M., Zemel, Y.: Statistical aspects of Wasserstein distances (2018). arXiv:1806.05500

  • Patrangenaru, V., Ellingson, L.: Nonparametric Statistics on Manifolds and Their Applications to Object Data Analysis. CRC Press, Boca Raton (2015)

    Book  Google Scholar 

  • Peyré, G., Cuturi, M.: Computational optimal transport (2018). arXiv:1803.00567

  • Ramsay, J.O., Silverman, B.W.: Applied Functional Data Analysis: Methods and Case Studies. Springer, Berlin (2007)

    MATH  Google Scholar 

  • Saul, L.K.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4, 119–155 (2003)

    MathSciNet  MATH  Google Scholar 

  • Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Gr. 34(4), 66:1–66:11 (2015)

    Article  Google Scholar 

  • Srivastava, A., Klassen, E.P.: Functional and Shape Data Analysis. Springer Series in Statistics. Springer, Berlin (2016)

    Book  Google Scholar 

  • Takatsu, A.: Wasserstein geometry of Gaussian measures. Osaka J. Math. 48(4), 1005–1026 (2011)

    MathSciNet  MATH  Google Scholar 

  • Tanaka, F., Komaki, F.: A superharmonic prior for the autoregressive process of the second-order. J. Time Ser. Anal. 29(3), 444–452 (2008)

    Article  MathSciNet  Google Scholar 

  • Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)

    Article  Google Scholar 

  • Vallender, S.: Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)

    Article  Google Scholar 

  • Villani, C.: Optimal Transport: Old and New. Springer, Berlin (2008)

    MATH  Google Scholar 

  • Wang, H., Marron, J.S.: Object oriented data analysis: sets of trees (2007). arXiv:0711.3147

    Article  MathSciNet  Google Scholar 

  • Yang, L., Jin, R.: Distance metric learning: a comprehensive survey. Mich. State Univ. 2(2), 4 (2006)

    Google Scholar 

Download references

Acknowledgements

Funding was provided by JST, PRESTO (JPMJPR14E3) and JSPS, KAKENHI (26280009,16K02843), Japan. The first author would like to thank Masayuki Sakai, Takaaki Koike and Tatsuhiro Aoshima for their excellent computation and visualization of the results. He also appreciates Reiko Miyaoka and Hiroshi Kokubu for their helpful and encouraging advice.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kei Kobayashi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

JST PREST, JSPS KAKENHI Grant Numbers 16K02843 and 26280009.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 81 KB)

A Proof of Theorem 6

A Proof of Theorem 6

(1) Denote the mapped points of abc and x by the projection \(\tilde{{\mathcal {X}}}_\beta \rightarrow {\mathcal {X}}_\beta \) as ABC and X, respectively, as shown in Fig. 14 (left). Denote the origin of the metric cone as O. If the sum of the lengths of the geodesics \({\widetilde{AB}}\), \({\widetilde{AC}}\) and \({\widetilde{BC}}\) in \({\mathcal {X}}_\beta \) exceeds \(2\pi \beta \), it is easy to see that the cone spanned by \({\overline{ab}} \cup {\overline{ac}} \cup {\overline{bc}}\) becomes CAT(0) and \(\varDelta abc\) satisfies the CAT(0) property. Therefore, assume that \(|{\widetilde{AB}}|+|{\widetilde{AC}}|+|{\widetilde{BC}}|\le 2 \pi \beta \).

Fig. 14
figure 14

The cone spanned by geodesics \({\overline{ab}}\), \({\overline{ac}}\), \({\overline{bc}}\) and \({\overline{ax}}\) (left) and the cone spanned by a comparison triangle \(\varDelta a'b'c'\) (right)

Next, let \(\varDelta a'b'c'\) be a comparison triangle of \(\varDelta abc\) and let \(x'\) be a point on a geodesic \(\overline{b'c'}\) such that \(|{\overline{bx}}|=|\overline{b'x'}|\). Thus, \(|\overline{a'x'}|< |{\overline{ax}}|\). Arrange the points \(a'\), \(b'\) and \(c'\) in a three-dimensional Euclidean space with origin \(O'\) such that the lengths of \(\overline{O'a'}\), \(\overline{O'b'}\) and \(\overline{O'c'}\) are equal to the lengths of \({\overline{Oa}}\), \({\overline{Ob}}\) and \({\overline{Oc}}\), respectively. Denote the radial projection of \(a'\), \(b'\), \(c'\) and \(x'\) to a unit sphere as \(A'\), \(B'\), \(C'\) and \(X'\), respectively, as shown in Fig. 14 (right). By the definition of a metric cone, \(|{\overline{Ox}}|=|\overline{O'x'}|\) and the geodesics \(\widetilde{A'B'}\), \(\widetilde{A'C'}\), \(\widetilde{B'C'}\) and \(\widetilde{A'X'}\) in the unit sphere are arcs satisfying \(|\widetilde{A'B'}|=|{\widetilde{AB}}|\), \(|\widetilde{A'C'}|=|{\widetilde{AC}}|\), \(|\widetilde{B'C'}|=|{\widetilde{BC}}|\) and \(|\widetilde{B'X'}|=|{\widetilde{BX}}|\).

From the argument above, \(|\widetilde{A'B'}|+|\widetilde{A'C'}|+|\widetilde{B'C'}|= |{\widetilde{AB}}|+|{\widetilde{AC}}|+|{\widetilde{BC}}|\le 2 \pi \). Since the unit sphere has a positive constant curvature and \({\mathcal {X}}_\beta \) is CAT(0), \(|\widetilde{A'X'}|> |{\widetilde{AX}}|\). However, since \(|{\overline{Oa}}|=|\overline{O'a'}|\) and \(|{\overline{Ox}}|=|\overline{O'x'}|\), \(|\widetilde{A'X'}|> |{\widetilde{AX}}|\) implies that \(|\widetilde{a'x'}|> |{\widetilde{ax}}|\) by the property of a metric cone. Thus, \(\varDelta abc\) has CAT(0) property and (1) of the theorem is proved.

(2) Assume that \(0<\beta _1<\beta _2<\infty \) and a metric cone \(\tilde{{\mathcal {X}}}_{\beta _1}\) is not CAT(0) for proving the latter half of the theorem by contradiction. Then, there is a geodesic triangle \(\varDelta a_1 b_1 c_1\) in \(\tilde{{\mathcal {X}}}_{\beta _1}\) and a point \(x_1\) on the geodesic \(\overline{b_1 c_1}\) such that the geodesic \(\overline{a_1 x_1}\) is longer than the corresponding geodesic of a comparison triangle. By defining \(A_1, B_1, C_1, X_1,a'_1,b'_1,c'_1,x'_1,A'_1, B'_1, C'_1\) and \(X'_1\) as above, we can say that \(|\widetilde{A_1 X_1}|>|\widetilde{A'_1 X'_1}|\).

Next, each of \(A_1, B_1, C_1\) and \(X_1\) corresponds to a point in \({\mathcal {X}}_{\beta _1}\) and we can consider the corresponding points \(A_2, B_2, C_2\) and \(X_2\) in the other metric cone \(\tilde{{\mathcal {X}}}_{\beta _2}\). When restricted to \({\mathcal {X}}_{\beta _1}\), a geodesic \(\widetilde{A_1 X_1}\) is just a rescaling of \(\widetilde{A_2 X_2}\) and \(|\widetilde{A_1 X_1}|=\frac{\beta _1}{\beta _2}|\widetilde{A_2 X_2}|\).

Now, \(\varDelta A_2'B_2'C_2'\) is a geodesic triangle on the unit sphere, but after rescaling by \(\frac{\beta _2}{\beta _1}\), we can get a geodesic triangle \(\varDelta A_2''B_2''C_2''\) on a sphere of radius \(\frac{\beta _2}{\beta _1}\) whose edges have the same length as \(\varDelta A_1'B_1'C_1'\). By a known result on spherical triangles with the same edge lengths on different spheres, a larger radius implies a “thinner” triangle and \(|\widetilde{A_1 X_1}|<|\widetilde{A_2'' X_2''}|\) where \(X_2''\) is a point on the geodesic \(\widetilde{B_2'' C_2''}\) such that \(|\widetilde{B_1 X_1}|=|\widetilde{B_2'' X_2''}|\).

Combining all the arguments gives

$$\begin{aligned} \textstyle |\widetilde{A_2 X_2}|=\frac{\beta _2}{\beta _1}|\widetilde{A_1 X_1}|>\frac{\beta _2}{\beta _1}|\widetilde{A'_1 X'_1}| >\frac{\beta _2}{\beta _1}|\widetilde{A_2'' X_2''}|=|\widetilde{A'_2 X'_2}|. \end{aligned}$$

Select a non-degenerate geodesic triangle in \(\tilde{{\mathcal {X}}}_{\beta _2}\) by selecting arbitrary points \(a_2,b_2\) and \(c_2\) on the geodesics \(\overline{OA_2}\), \(\overline{OB_2}\) and \(\overline{OC_2}\) in \(\tilde{{\mathcal {X}}}_{\beta _2}\), respectively, and let \(x_2\) be the intersection point of \(\overline{OX_2}\) and \(\overline{b_2 c_2}\). Then, by \(|\widetilde{A_2 X_2}|>|\widetilde{A'_2 X'_2}|\), we can say that \(|\overline{a_2 x_2}|>|\overline{a'_2 x'_2}|\). This implies that \(\tilde{{\mathcal {X}}}_{\beta _2}\) is not CAT(0) and (2) of the theorem is proved.

(3) For \(k=0\), the statement holds by (1). For \(k>0\) and \(\beta \le \pi \), it is sufficient to prove for \(\beta =\pi /\sqrt{k}\) by (2). Let \(\varDelta abc\) be a geodesic triangle in \(\tilde{{\mathcal {X}}}_\beta \) and let \(\varDelta ABC\) be a geodesic triangle in \({\mathcal {X}}_\beta \). Let ABC be the projection of abc, respectively. If the perimeter of \(\varDelta ABC\) is longer than or equal to \(2\pi \), the cone spanned by the perimeter becomes CAT(0) by the same argument as that for (1). Therefore, \(\varDelta abc\) is CAT(0) and satisfies the CAT(0) property.

If the perimeter of \(\varDelta ABC\) is smaller than \(2\pi \), since \({\mathcal {X}}\) is CAT(k) and \({\mathcal {X}}_\beta \) is CAT(1), for any \(X\in {\widetilde{BC}}\), \({\widetilde{BX}}\) is shorter than the corresponding great arc \(\widetilde{B'X'}\) of a comparison triangle \(\varDelta A'B'C'\), which is a spherical triangle on the unit sphere. Since a comparison triangle \(\varDelta a'b'c'\) of \(\varDelta abc\) can be embedded on the cone spanned by \(\varDelta A'B'C'\), \({\widetilde{bx}}\) is shorter than the corresponding line segment \(\widetilde{b'x'}\). This means the \(\varDelta abc\) satisfies the CAT(0) property. \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kobayashi, K., Wynn, H.P. Empirical geodesic graphs and CAT(k) metrics for data analysis. Stat Comput 30, 1–18 (2020). https://doi.org/10.1007/s11222-019-09855-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11222-019-09855-3

Keywords

Navigation