arXiv:2212.08172 (cs)

[Submitted on 15 Dec 2022 (v1), last revised 31 Jul 2023 (this version, v2)]

Title: Reliable Measures of Spread in High Dimensional Latent Spaces

Authors: Anna C. Marbut, Katy McKinney-Bock, Travis J. Wheeler

Abstract: Understanding geometric properties of natural language processing models' latent spaces allows the manipulation of these properties for improved performance on downstream tasks. One such property is the amount of data spread in a model's latent space, or how fully the available latent space is being used. In this work, we define data spread and demonstrate that the commonly used measures of data spread, Average Cosine Similarity and a partition function min/max ratio I(V), do not provide reliable metrics to compare the use of latent space across models. We propose and examine eight alternative measures of data spread, all but one of which improve over these current metrics when applied to seven synthetic data distributions. Of our proposed measures, we recommend one principal component-based measure and one entropy-based measure that provide reliable, relative measures of spread and can be used to compare models of different sizes and dimensionalities.
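
The two baseline measures named in the abstract can be sketched as follows. This is a minimal, stdlib-only Python sketch assuming the definitions common in the isotropy literature: Average Cosine Similarity is the mean cosine over all unordered pairs of distinct vectors, and I(V) = min_c Z(c) / max_c Z(c) with partition function Z(c) = sum_i exp(c · v_i), evaluated over a set of unit directions c. The function names `average_cosine_similarity` and `partition_ratio` are hypothetical, the caller is assumed to supply the directions (usually the eigenvectors of VᵀV), and neither function is the authors' code.

```python
import math
from itertools import combinations


def average_cosine_similarity(vectors):
    """Mean cosine similarity over all unordered pairs of distinct vectors.

    Hypothetical helper; assumes at least two non-zero vectors.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        if norm_u == 0 or norm_v == 0:
            raise ValueError("cosine similarity is undefined for zero vectors")
        return dot / (norm_u * norm_v)

    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def partition_ratio(vectors, directions):
    """I(V) = min_c Z(c) / max_c Z(c), where Z(c) = sum_i exp(c . v_i).

    Hypothetical helper. `directions` are assumed to be unit vectors c;
    in the isotropy literature these are usually the eigenvectors of V^T V,
    which this stdlib-only sketch expects the caller to supply.
    """
    def z(c):
        # Partition function: sum of exp(dot product) over all data vectors.
        return sum(math.exp(sum(a * b for a, b in zip(c, v))) for v in vectors)

    scores = [z(c) for c in directions]
    return min(scores) / max(scores)


if __name__ == "__main__":
    collapsed = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]  # all on one axis
    spread = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]  # symmetric
    axes = [(1.0, 0.0), (0.0, 1.0)]  # eigenvectors for both toy sets
    print(average_cosine_similarity(collapsed))  # 1.0
    print(partition_ratio(collapsed, axes))      # ~0.099
    print(partition_ratio(spread, axes))         # ~1.0
```

On these toy inputs, vectors collapsed onto one axis give an Average Cosine Similarity of 1.0 and an I(V) far below 1, while the symmetric set gives I(V) close to 1, the value usually read as "isotropic". Note that exp(c · v_i) can overflow for large-norm embeddings; a production version would work in log space.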

Comments:	24 pages, 11 figures, 13 tables
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2212.08172 [cs.LG]
	(or arXiv:2212.08172v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.08172

Submission history

From: Anna Marbut
[v1] Thu, 15 Dec 2022 22:15:11 UTC (422 KB)
[v2] Mon, 31 Jul 2023 19:11:04 UTC (440 KB)