Empirical geodesic graphs and CAT(k) metrics for data analysis

383 Accesses
1 Altmetric
Explore all metrics

Abstract

A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. For a probability distribution, the length along a path between two points can be defined as the amount of probability mass accumulated along the path. The geodesic, then, is the shortest such path and defines a geodesic metric. Such metrics are transformed in a number of ways to produce parametrised families of geodesic metric spaces, empirical versions of which allow computation of intrinsic means and associated measures of dispersion. These reveal properties of the data, based on geometry, such as those that are difficult to see from the raw Euclidean distances. Examples of application include clustering and classification. For certain parameter ranges, the spaces become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal spanning tree of a graph based on the data becomes CAT(0). In another, a so-called “metric cone” construction allows extension to CAT(k) spaces. It is shown how to empirically tune the parameters of the metrics, making it possible to apply them to a number of real cases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient geodesics and an effective algorithm for distance in the complex of curves

Article 22 January 2016

Mean Distance on Metric Graphs

Article Open access 19 March 2024

Continuous Mean Distance of a Weighted Graph

Article Open access 29 April 2023

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Alexander, L., Jones, P.: Updated precipitation series for the UK and discussion of recent extremes. Atmos. Sci. Lett. 1(2), 142–150 (2000)
Article Google Scholar
Amari, S.I.: Differential-Geometrical Methods in Statistics. Springer, New York (1985)
Book Google Scholar
Amari, S.I., Nagaoka, H.: Methods of Information Geometry. American Mathematical Society, Providence (2007)
Book Google Scholar
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017). arXiv:1701.07875
Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge/A Series of Modern Surveys in Mathematics. Springer, Berlin (2017)
Book Google Scholar
Bache, K., Lichman, M.: UCI Machine Learning Repository (2013). http://archive.ics.uci.edu/ml/. Accessed 9 Jan 2015
Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14, pp. 585–591. MIT Press, Cambridge (2002)
Google Scholar
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Article Google Scholar
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28(1), 235–242 (2000)
Article Google Scholar
Bhattacharya, A., Bhattacharya, R.: Nonparametric Inference on Manifolds: With Applications to Shape Spaces. Cambridge University Press, Cambridge (2012a)
Book Google Scholar
Bhattacharya, A., Bhattacharya, R.: Nonparametric inference on manifolds: with applications to shape spaces, vol. 2. Cambridge University Press, Cambridge (2012b)
Book Google Scholar
Billera, L.J., Holmes, S.P., Vogtmann, K.: Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001)
Article MathSciNet Google Scholar
Bridson, M.R., Haefliger, A.: Metric Spaces of Non-positive Curvature, vol. 319. Springer, Berlin (2011)
MATH Google Scholar
Cayton, L.: Algorithms for manifold learning. Univ. Calif. San Diego Tech. Rep. 12(1–17), 1 (2005)
Google Scholar
CIESEN, FAO, CIAT: Gridded population of the world, version 3 (GPWV3): population count grid (2005). http://dx.doi.org/10.7927/H4639MPP. Accessed 16 June 2016
Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: International Conference on Machine Learning, jmlr.org, pp. 685–693 (2014)
De Berg, M., Van Kreveld, M., Overmars, M., Schwarzkopf, O.C.: Computational Geometry, pp. 1–17. Springer, Berlin (2000)
Book Google Scholar
Deza, M.M., Deza, E.: Encyclopedia of Distances, pp. 1–583. Springer, Berlin (2009)
Book Google Scholar
Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis: With Applications in R. Wiley, Hoboken (2016)
Book Google Scholar
Floyd, R.W.: Algorithm 97: shortest path. Commun. ACM 5(6), 345 (1962)
Article Google Scholar
Gromov, M.: Hyperbolic groups. In: Gersten, S.M. (ed.) Essays in Group Theory, pp. 75–263. Springer, Berlin (1987)
Chapter Google Scholar
Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. Lond. Math. Soc. 16(2), 81–121 (1984)
Article MathSciNet Google Scholar
Kendall, M.G., Morán, PA.: Geometrical probability. Technical Report (1963)
Kendall, D.G., Barden, D., Carne, T.K., Le, H.: Shape and Shape Theory. Wiley, Hoboken (2009)
MATH Google Scholar
Komaki, F.: Shrinkage priors for Bayesian prediction. Ann. Stat. 34(2), 808–819 (2006)
Article MathSciNet Google Scholar
Lin, Y., Lu, L., Yau, S.T.: Ricci curvature of graphs. Tohoku Math. J. 63(4), 605–627 (2011)
Article MathSciNet Google Scholar
Marron, J.S., Alonso, A.M.: Overview of object oriented data analysis. Biom. J. 56(5), 732–753 (2014)
Article MathSciNet Google Scholar
McCullagh, P.: Tensor Methods in Statistics. Courier Dover Publications, Mineola (1986)
MATH Google Scholar
Nye, T.M.W.: Principal components analysis in the space of phylogenetic trees. Ann. Stat. 39(5), 2716–2739 (2011)
Article MathSciNet Google Scholar
Okabe, A., Boots, B., Sugihara, K., Chiu, S.N.: Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, vol. 501. Wiley, Hoboken (2009)
MATH Google Scholar
Owen, M., Provan, J.S.: A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1), 2–13 (2011)
Article Google Scholar
Panaretos, V.M., Zemel, Y.: Statistical aspects of Wasserstein distances (2018). arXiv:1806.05500
Patrangenaru, V., Ellingson, L.: Nonparametric Statistics on Manifolds and Their Applications to Object Data Analysis. CRC Press, Boca Raton (2015)
Book Google Scholar
Peyré, G., Cuturi, M.: Computational optimal transport (2018). arXiv:1803.00567
Ramsay, J.O., Silverman, B.W.: Applied Functional Data Analysis: Methods and Case Studies. Springer, Berlin (2007)
MATH Google Scholar
Saul, L.K.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4, 119–155 (2003)
MathSciNet MATH Google Scholar
Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Gr. 34(4), 66:1–66:11 (2015)
Article Google Scholar
Srivastava, A., Klassen, E.P.: Functional and Shape Data Analysis. Springer Series in Statistics. Springer, Berlin (2016)
Book Google Scholar
Takatsu, A.: Wasserstein geometry of Gaussian measures. Osaka J. Math. 48(4), 1005–1026 (2011)
MathSciNet MATH Google Scholar
Tanaka, F., Komaki, F.: A superharmonic prior for the autoregressive process of the second-order. J. Time Ser. Anal. 29(3), 444–452 (2008)
Article MathSciNet Google Scholar
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
Article Google Scholar
Vallender, S.: Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)
Article Google Scholar
Villani, C.: Optimal Transport: Old and New. Springer, Berlin (2008)
MATH Google Scholar
Wang, H., Marron, J.S.: Object oriented data analysis: sets of trees (2007). arXiv:0711.3147
Article MathSciNet Google Scholar
Yang, L., Jin, R.: Distance metric learning: a comprehensive survey. Mich. State Univ. 2(2), 4 (2006)
Google Scholar

Download references

Acknowledgements

Funding was provided by JST, PRESTO (JPMJPR14E3) and JSPS, KAKENHI (26280009,16K02843), Japan. The first author would like to thank Masayuki Sakai, Takaaki Koike and Tatsuhiro Aoshima for their excellent computation and visualization of the results. He also appreciates Reiko Miyaoka and Hiroshi Kokubu for their helpful and encouraging advice.

Author information

Authors and Affiliations

Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
Kei Kobayashi
London School of Economics, Houghton Street, London, WC2A 2AE, UK
Henry P. Wynn

Authors

Kei Kobayashi
View author publications
You can also search for this author in PubMed Google Scholar
Henry P. Wynn
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kei Kobayashi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

JST PREST, JSPS KAKENHI Grant Numbers 16K02843 and 26280009.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 81 KB)

A Proof of Theorem 6

(1) Denote the mapped points of a, b, c and x by the projection $\tilde{{\mathcal {X}}}_\beta \rightarrow {\mathcal {X}}_\beta $ as A, B, C and X, respectively, as shown in Fig. 14 (left). Denote the origin of the metric cone as O. If the sum of the lengths of the geodesics ${\widetilde{AB}}$, ${\widetilde{AC}}$ and ${\widetilde{BC}}$ in ${\mathcal {X}}_\beta $ exceeds $2\pi \beta $, it is easy to see that the cone spanned by ${\overline{ab}} \cup {\overline{ac}} \cup {\overline{bc}}$ becomes CAT(0) and $\varDelta abc$ satisfies the CAT(0) property. Therefore, assume that $|{\widetilde{AB}}|+|{\widetilde{AC}}|+|{\widetilde{BC}}|\le 2 \pi \beta $.

Next, let $\varDelta a'b'c'$ be a comparison triangle of $\varDelta abc$ and let $x'$ be a point on a geodesic $\overline{b'c'}$ such that $|{\overline{bx}}|=|\overline{b'x'}|$. Thus, $|\overline{a'x'}|< |{\overline{ax}}|$. Arrange the points $a'$, $b'$ and $c'$ in a three-dimensional Euclidean space with origin $O'$ such that the lengths of $\overline{O'a'}$, $\overline{O'b'}$ and $\overline{O'c'}$ are equal to the lengths of ${\overline{Oa}}$, ${\overline{Ob}}$ and ${\overline{Oc}}$, respectively. Denote the radial projection of $a'$, $b'$, $c'$ and $x'$ to a unit sphere as $A'$, $B'$, $C'$ and $X'$, respectively, as shown in Fig. 14 (right). By the definition of a metric cone, $|{\overline{Ox}}|=|\overline{O'x'}|$ and the geodesics $\widetilde{A'B'}$, $\widetilde{A'C'}$, $\widetilde{B'C'}$ and $\widetilde{A'X'}$ in the unit sphere are arcs satisfying $|\widetilde{A'B'}|=|{\widetilde{AB}}|$, $|\widetilde{A'C'}|=|{\widetilde{AC}}|$, $|\widetilde{B'C'}|=|{\widetilde{BC}}|$ and $|\widetilde{B'X'}|=|{\widetilde{BX}}|$.

From the argument above, $|\widetilde{A'B'}|+|\widetilde{A'C'}|+|\widetilde{B'C'}|= |{\widetilde{AB}}|+|{\widetilde{AC}}|+|{\widetilde{BC}}|\le 2 \pi $. Since the unit sphere has a positive constant curvature and ${\mathcal {X}}_\beta $ is CAT(0), $|\widetilde{A'X'}|> |{\widetilde{AX}}|$. However, since $|{\overline{Oa}}|=|\overline{O'a'}|$ and $|{\overline{Ox}}|=|\overline{O'x'}|$, $|\widetilde{A'X'}|> |{\widetilde{AX}}|$ implies that $|\widetilde{a'x'}|> |{\widetilde{ax}}|$ by the property of a metric cone. Thus, $\varDelta abc$ has CAT(0) property and (1) of the theorem is proved.

(2) Assume that $0<\beta _1<\beta _2<\infty $ and a metric cone $\tilde{{\mathcal {X}}}_{\beta _1}$ is not CAT(0) for proving the latter half of the theorem by contradiction. Then, there is a geodesic triangle $\varDelta a_1 b_1 c_1$ in $\tilde{{\mathcal {X}}}_{\beta _1}$ and a point $x_1$ on the geodesic $\overline{b_1 c_1}$ such that the geodesic $\overline{a_1 x_1}$ is longer than the corresponding geodesic of a comparison triangle. By defining $A_1, B_1, C_1, X_1,a'_1,b'_1,c'_1,x'_1,A'_1, B'_1, C'_1$ and $X'_1$ as above, we can say that $|\widetilde{A_1 X_1}|>|\widetilde{A'_1 X'_1}|$.

Next, each of $A_1, B_1, C_1$ and $X_1$ corresponds to a point in ${\mathcal {X}}_{\beta _1}$ and we can consider the corresponding points $A_2, B_2, C_2$ and $X_2$ in the other metric cone $\tilde{{\mathcal {X}}}_{\beta _2}$. When restricted to ${\mathcal {X}}_{\beta _1}$, a geodesic $\widetilde{A_1 X_1}$ is just a rescaling of $\widetilde{A_2 X_2}$ and $|\widetilde{A_1 X_1}|=\frac{\beta _1}{\beta _2}|\widetilde{A_2 X_2}|$.

Now, $\varDelta A_2'B_2'C_2'$ is a geodesic triangle on the unit sphere, but after rescaling by $\frac{\beta _2}{\beta _1}$, we can get a geodesic triangle $\varDelta A_2''B_2''C_2''$ on a sphere of radius $\frac{\beta _2}{\beta _1}$ whose edges have the same length as $\varDelta A_1'B_1'C_1'$. By a known result on spherical triangles with the same edge lengths on different spheres, a larger radius implies a “thinner” triangle and $|\widetilde{A_1 X_1}|<|\widetilde{A_2'' X_2''}|$ where $X_2''$ is a point on the geodesic $\widetilde{B_2'' C_2''}$ such that $|\widetilde{B_1 X_1}|=|\widetilde{B_2'' X_2''}|$.

Combining all the arguments gives

$$\begin{aligned} \textstyle |\widetilde{A_2 X_2}|=\frac{\beta _2}{\beta _1}|\widetilde{A_1 X_1}|>\frac{\beta _2}{\beta _1}|\widetilde{A'_1 X'_1}| >\frac{\beta _2}{\beta _1}|\widetilde{A_2'' X_2''}|=|\widetilde{A'_2 X'_2}|. \end{aligned}$$

Select a non-degenerate geodesic triangle in $\tilde{{\mathcal {X}}}_{\beta _2}$ by selecting arbitrary points $a_2,b_2$ and $c_2$ on the geodesics $\overline{OA_2}$, $\overline{OB_2}$ and $\overline{OC_2}$ in $\tilde{{\mathcal {X}}}_{\beta _2}$, respectively, and let $x_2$ be the intersection point of $\overline{OX_2}$ and $\overline{b_2 c_2}$. Then, by $|\widetilde{A_2 X_2}|>|\widetilde{A'_2 X'_2}|$, we can say that $|\overline{a_2 x_2}|>|\overline{a'_2 x'_2}|$. This implies that $\tilde{{\mathcal {X}}}_{\beta _2}$ is not CAT(0) and (2) of the theorem is proved.

(3) For $k=0$, the statement holds by (1). For $k>0$ and $\beta \le \pi $, it is sufficient to prove for $\beta =\pi /\sqrt{k}$ by (2). Let $\varDelta abc$ be a geodesic triangle in $\tilde{{\mathcal {X}}}_\beta $ and let $\varDelta ABC$ be a geodesic triangle in ${\mathcal {X}}_\beta $. Let A, B, C be the projection of a, b, c, respectively. If the perimeter of $\varDelta ABC$ is longer than or equal to $2\pi $, the cone spanned by the perimeter becomes CAT(0) by the same argument as that for (1). Therefore, $\varDelta abc$ is CAT(0) and satisfies the CAT(0) property.

If the perimeter of $\varDelta ABC$ is smaller than $2\pi $, since ${\mathcal {X}}$ is CAT(k) and ${\mathcal {X}}_\beta $ is CAT(1), for any $X\in {\widetilde{BC}}$, ${\widetilde{BX}}$ is shorter than the corresponding great arc $\widetilde{B'X'}$ of a comparison triangle $\varDelta A'B'C'$, which is a spherical triangle on the unit sphere. Since a comparison triangle $\varDelta a'b'c'$ of $\varDelta abc$ can be embedded on the cone spanned by $\varDelta A'B'C'$, ${\widetilde{bx}}$ is shorter than the corresponding line segment $\widetilde{b'x'}$. This means the $\varDelta abc$ satisfies the CAT(0) property. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kobayashi, K., Wynn, H.P. Empirical geodesic graphs and CAT(k) metrics for data analysis. Stat Comput 30, 1–18 (2020). https://doi.org/10.1007/s11222-019-09855-3

Download citation

Received: 22 March 2018
Accepted: 02 February 2019
Published: 21 February 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11222-019-09855-3

Empirical geodesic graphs and CAT(k) metrics for data analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others