Efficient Discretization of Optimal Transport
Figure 1
<p>Discretization of “Girl with a Pearl Earring” and “Starry Night” using EDOT with 2000 discretization points for each RGB channel. <math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <mn>0.01</mn> <mo>×</mo> <msup> <mi>diam</mi> <mn>2</mn> </msup> </mrow> </semantics></math>.</p>
Figure 2
<p>(<b>a</b>–<b>c</b>) Plots of the EDOT discretizations of Examples (1)–(3). In (<b>c</b>), the <span class="html-italic">x</span>-axis and <span class="html-italic">y</span>-axis are the 2D coordinates, and the probability density of <math display="inline"><semantics> <mi>μ</mi> </semantics></math> and the weights of <math display="inline"><semantics> <msub> <mi>μ</mi> <mi>m</mi> </msub> </semantics></math> are encoded by color. (<b>d</b>,<b>e</b>) show comparisons between EDOT and i.i.d. sampling for Examples (1) and (2). EDOTs are calculated with <math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math> to 7 (3 to 8). The four boundary curves of the shaded region are the <math display="inline"><semantics> <mrow> <mn>5</mn> <mo>%</mo> </mrow> </semantics></math>-, <math display="inline"><semantics> <mrow> <mn>25</mn> <mo>%</mo> </mrow> </semantics></math>-, <math display="inline"><semantics> <mrow> <mn>75</mn> <mo>%</mo> </mrow> </semantics></math>-, and <math display="inline"><semantics> <mrow> <mn>95</mn> <mo>%</mo> </mrow> </semantics></math>-percentile curves; the orange line represents the level of <math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>. (<b>f</b>) plots the entropy-regularized Wasserstein distance <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mi>k</mi> <mo>,</mo> <mi>ζ</mi> </mrow> <mi>k</mi> </msubsup> <mrow> <mo>(</mo> <mi>μ</mi> <mo>,</mo> <msub> <mi>μ</mi> <mi>m</mi> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> versus the SGD steps for Example (2) with <math display="inline"><semantics> <msub> <mi>μ</mi> <mi>m</mi> </msub> </semantics></math> optimized by 5-point EDOT. <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mspace width="-0.166667em"/> <mo>=</mo> <mspace width="-0.166667em"/> <mn>0.01</mn> </mrow> </semantics></math> in all cases.</p>
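Distances like the entropy-regularized Wasserstein cost in panels (d–f) can be estimated between a sampled measure and a candidate discretization via Sinkhorn iterations. The sketch below is illustrative only — the function name, the point counts, and the 5-atom candidate are assumptions, not the paper's implementation:

```python
import numpy as np

def sinkhorn_w2(x, a, y, b, zeta=0.01, iters=500):
    """Entropy-regularized squared-distance transport cost between two
    weighted point clouds (x, a) and (y, b), via Sinkhorn iterations."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # cost matrix
    K = np.exp(-C / zeta)                                      # Gibbs kernel
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(iters):                                     # matrix scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                            # transference plan
    return np.sum(P * C)

rng = np.random.default_rng(0)
x = rng.random((400, 1))            # samples standing in for a continuous mu
a = np.full(400, 1 / 400)
y = rng.random((5, 1))              # a hypothetical 5-point candidate mu_m
b = np.full(5, 1 / 5)
print(sinkhorn_w2(x, a, y, b))
```

An EDOT step would then differentiate this cost with respect to the atom positions `y` and weights `b` and feed the gradients to SGD.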
Figure 3
<p>(<b>a</b>): Approximation of a transference plan. <b>Left</b>: <math display="inline"><semantics> <mrow> <mn>5</mn> <mo>×</mo> <mn>5</mn> </mrow> </semantics></math> EDOT approximation. <b>Right</b>: <math display="inline"><semantics> <mrow> <mn>25</mn> <mo>×</mo> <mn>25</mn> </mrow> </semantics></math> naive approximation. In both figures, the magnitude of each point is color-coded, and the background grayscale density represents the true EOT plan. (<b>b</b>): An example of adaptive refinement on a unit square. <b>Left</b>: division of a 10,000-point sample <span class="html-italic">S</span> approximating a mixture of two truncated Gaussian distributions, and the refinement for 30 discretization points. The number of discretization points assigned to each region is marked in black; e.g., the upper-left region needs 6 points. <b>Right</b>: the discretization optimized locally and combined as a probability measure with <math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>.</p>
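The refinement in (b) first budgets the 30 discretization points across the cells of the partition, then optimizes each cell locally. A minimal sketch of such a budgeting step, with a proportional-to-sample-mass allocation rule used purely for illustration (the paper's actual rule may differ):

```python
import numpy as np

def allocate_points(samples, m_total, grid=2):
    """Partition [0,1]^2 into a grid x grid array of cells and assign each
    cell a share of m_total discretization points proportional to its
    sample mass (illustrative allocation rule)."""
    ix = np.minimum((samples[:, 0] * grid).astype(int), grid - 1)
    iy = np.minimum((samples[:, 1] * grid).astype(int), grid - 1)
    counts = np.zeros((grid, grid), dtype=int)
    np.add.at(counts, (ix, iy), 1)            # samples per cell
    mass = counts / counts.sum()
    alloc = np.floor(mass * m_total).astype(int)
    # hand leftover points to the cells with the largest remainders
    for _ in range(m_total - alloc.sum()):
        idx = np.unravel_index(np.argmax(mass - alloc / m_total), alloc.shape)
        alloc[idx] += 1
    return alloc

rng = np.random.default_rng(1)
s = rng.random((10000, 2))
print(allocate_points(s, 30))
```

Each cell's budget then seeds an independent local EDOT run, and the local results are reweighted by cell mass and concatenated into one discrete measure.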
Figure 4
<p>(<b>a</b>) <b>Left</b>: 30,000 samples from <math display="inline"><semantics> <msub> <mi>μ</mi> <mrow> <mi>s</mi> <mi>p</mi> <mi>h</mi> <mi>e</mi> <mi>r</mi> <mi>e</mi> </mrow> </msub> </semantics></math> and the 3D cells under the divide-and-conquer algorithm. <b>Right</b>: 40-point EDOTs in 3D. (<b>b</b>) The 40-point CW-EDOTs in 2D. Red dots: samples; other dots: discrete atoms with weights represented in colors. <b>Left</b>: upper hemisphere. <b>Right</b>: lower hemisphere; stereographic projections about the poles. <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <mn>0.01</mn> <mo>×</mo> <msup> <mi>diam</mi> <mn>2</mn> </msup> </mrow> </semantics></math>.</p>
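The 2D CW-EDOTs in (b) operate on stereographic projections of the two hemispheres. A minimal version of that projection for the unit sphere (names and the sample latitude are illustrative):

```python
import numpy as np

def stereographic(p, from_north=True):
    """Project points p (shape (n, 3)) on the unit sphere onto the plane
    z = 0, projecting from the north pole (from_north=False: south pole).
    Points on the hemisphere opposite the pole land inside the unit disk."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    s = (1 - z) if from_north else (1 + z)
    return np.stack([x / s, y / s], axis=1)

# a ring of points below the equator, projected from the north pole
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
lat = 2.0                                    # polar angle > pi/2: lower hemisphere
pts = np.stack([np.cos(theta) * np.sin(lat),
                np.sin(theta) * np.sin(lat),
                np.full(8, np.cos(lat))], axis=1)
print(stereographic(pts))
```

Projecting each hemisphere from the opposite pole keeps the map's distortion bounded, which is why the figure shows two separate charts.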
Figure 5
<p>EDOT of an example from [<a href="#B19-entropy-25-00839" class="html-bibr">19</a>]. Portrait of Riemann, discretization of size 625. <b>Left</b>: green dots show the positions and weights of the EDOT discretization (same as right); the cells in the background are a discretization of the same size from the original [<a href="#B19-entropy-25-00839" class="html-bibr">19</a>]. <b>Right</b>: a size-10,000 discretization from [<a href="#B19-entropy-25-00839" class="html-bibr">19</a>]; we applied EDOT directly to this picture, treating it as the continuous distribution. <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <mn>0.01</mn> <mo>×</mo> <msup> <mi>diam</mi> <mn>2</mn> </msup> </mrow> </semantics></math>.</p>
Figure 6
<p>A comparison of EDOT (<b>left</b>), EDOT-Equal (<b>mid</b>), and [<a href="#B18-entropy-25-00839" class="html-bibr">18</a>] (<b>right</b>) on the Canary Islands, treated as a binary distribution with a constant density on the islands and 0 in the sea. The discretization for each method is shown by black bullets. Wasserstein distances: EDOT: <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>0.005</mn> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.02876</mn> </mrow> </semantics></math>, EDOT-Equal: <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>0.005</mn> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.05210</mn> </mrow> </semantics></math>, Claici: <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>0.005</mn> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.05288</mn> </mrow> </semantics></math>. Map size is <math display="inline"><semantics> <mrow> <mn>3.13</mn> <mo>×</mo> <mn>1.43</mn> </mrow> </semantics></math>.</p>
Figure 7
<p>Discretization of a kitty. The discretization by each method is shown as red bullets on top of the kitty image. (<b>a</b>) EDOT, 10 points, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>0.001</mn> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.009176</mn> </mrow> </semantics></math>, radius represents weight; (<b>b</b>) EDOT-Equal, 10 points, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>0.001</mn> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.008960</mn> </mrow> </semantics></math>; (<b>c</b>) [<a href="#B18-entropy-25-00839" class="html-bibr">18</a>], 10 points, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>0.001</mn> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.009832</mn> </mrow> </semantics></math>; (<b>d</b>) [<a href="#B18-entropy-25-00839" class="html-bibr">18</a>], 200 points. Figure size <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>×</mo> <mn>1</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <mn>0.01</mn> <mo>×</mo> <msup> <mi>diam</mi> <mn>2</mn> </msup> </mrow> </semantics></math>.</p>
Figure 8
<p>2000-point discretizations. (<b>a</b>) EDOT (weights plotted in color); (<b>b</b>) EDOT-Equal; (<b>c</b>) the relation between <math display="inline"><semantics> <mrow> <mo form="prefix">log</mo> <mo>(</mo> <msup> <mi>W</mi> <mn>2</mn> </msup> <mo>)</mo> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mo form="prefix">log</mo> <mi>m</mi> </mrow> </semantics></math> (all with divide and conquer); the advantage of <math display="inline"><semantics> <msub> <mi>W</mi> <mi>EDOT</mi> </msub> </semantics></math> over <math display="inline"><semantics> <msub> <mi>W</mi> <mi>Equal</mi> </msub> </semantics></math> grows with the size of the discretization.</p>
Figure 9
<p>Discretization of 16 blue disks in a unit square with 24 points (black). (<b>a</b>) EDOT, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mi>ζ</mi> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.001398</mn> </mrow> </semantics></math>; (<b>b</b>) EDOT-Equal, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mi>ζ</mi> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.002008</mn> </mrow> </semantics></math>; (<b>c</b>) [<a href="#B18-entropy-25-00839" class="html-bibr">18</a>], <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mi>ζ</mi> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.002242</mn> </mrow> </semantics></math>. <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>4</mn> </mrow> </msup> </mrow> </semantics></math>. Figure size is <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>×</mo> <mn>1</mn> </mrow> </semantics></math>.</p>
Figure A1
<p>Divide-and-conquer strategies: original and Variation 1. The original tends to assign more atoms to vast regions with small weights, while Variation 1 does better. Example: a distribution on <math display="inline"><semantics> <mrow> <mo>[</mo> <mn>0</mn> <mo>,</mo> <mn>10</mn> <mo>]</mo> </mrow> </semantics></math>; the pdf is plotted as a blue curve, and the discretization size of each cell is marked in black.</p>
Figure A2
<p>Richardson extrapolation: the power of the expansion in <math display="inline"><semantics> <msup> <mi>N</mi> <mrow> <mo>−</mo> <mn>1</mn> </mrow> </msup> </semantics></math>. We take the EDOT <math display="inline"><semantics> <msub> <mi>μ</mi> <mn>5</mn> </msub> </semantics></math> of Example (2) (the 1-dim truncated normal mixture) as the target <math display="inline"><semantics> <msub> <mi>μ</mi> <mi>m</mi> </msub> </semantics></math> and use evenly positioned <math display="inline"><semantics> <msub> <mi>μ</mi> <mi>N</mi> </msub> </semantics></math> for different <span class="html-italic">N</span>s to estimate it. The <span class="html-italic">y</span>-axis is <math display="inline"><semantics> <mrow> <mo form="prefix">ln</mo> <mo>(</mo> <msubsup> <mi>W</mi> <mrow> <mn>2</mn> <mo>,</mo> <mn>0.01</mn> </mrow> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <msub> <mi>μ</mi> <mrow> <mo>(</mo> <mi>r</mi> <mi>N</mi> <mo>)</mo> </mrow> </msub> <mo>,</mo> <msub> <mi>μ</mi> <mn>5</mn> </msub> <mo>)</mo> </mrow> <mo>−</mo> <msubsup> <mi>W</mi> <mrow> <mn>2</mn> <mo>,</mo> <mn>0.01</mn> </mrow> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <msub> <mi>μ</mi> <mrow> <mo>(</mo> <mi>N</mi> <mo>)</mo> </mrow> </msub> <mo>,</mo> <msub> <mi>μ</mi> <mn>5</mn> </msub> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </semantics></math>, calculated for <math display="inline"><semantics> <mrow> <mi>r</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>r</mi> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math>. With <math display="inline"><semantics> <mrow> <mo form="prefix">ln</mo> <mi>N</mi> </mrow> </semantics></math> as the <span class="html-italic">x</span>-axis, the relation is visibly linear. The slopes are both about <math display="inline"><semantics> <mrow> <mo>−</mo> <mn>1.9998</mn> </mrow> </semantics></math>, which represents the exponent in <span class="html-italic">N</span> of the leading non-constant term of <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mn>2</mn> <mo>,</mo> <mn>0.01</mn> </mrow> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <msub> <mi>μ</mi> <mrow> <mo>(</mo> <mi>N</mi> <mo>)</mo> </mrow> </msub> <mo>,</mo> <msub> <mi>μ</mi> <mn>5</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math>, while the theoretical exponent is <math display="inline"><semantics> <mrow> <mo>−</mo> <mi>k</mi> <mo>/</mo> <mi>d</mi> <mo>=</mo> <mo>−</mo> <mn>2</mn> </mrow> </semantics></math>. The differences come from higher-order terms in <span class="html-italic">N</span>.</p>
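The slope estimate works because the constant limit cancels in the difference: if the cost behaves like L + cN⁻ᵖ, then ln of the difference between the rN-point and N-point values is linear in ln N with slope −p. A synthetic sketch of that fit, with a hypothetical function f standing in for the actual regularized distances:

```python
import numpy as np

def leading_exponent(f, Ns, r=2):
    """Estimate p in f(N) ~ L + c * N**(-p) from the slope of
    ln|f(r*N) - f(N)| against ln(N); the limit L cancels in the difference."""
    diffs = np.array([f(r * N) - f(N) for N in Ns])
    slope, _ = np.polyfit(np.log(Ns), np.log(np.abs(diffs)), 1)
    return -slope

# synthetic stand-in with limit 1 and leading term proportional to N^(-2)
f = lambda N: 1.0 + 3.0 * N ** -2.0
print(leading_exponent(f, np.arange(10, 200, 10)))  # close to 2
```

With exact power-law data the fitted slope recovers p to machine precision; on measured distances, as in the figure, higher-order terms perturb it slightly (hence −1.9998 rather than −2).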
Figure A3
<p>The sphere example with the 3D discretization (same as in the paper) from two view directions. The colors of the dots represent the weights of the atoms in the distribution.</p>
Figure A4
<p>The McCann interpolation figures at a finer time resolution, visualizing the transference plan from (<b>1</b>)–(<b>11</b>). This is a refined version of the original figure. We can see that the larger bubbles (representing a large probability mass) moved a short distance, while the smaller pieces moved farther.</p>
Figure A5
<p>Discretization of a distribution supported on a Swiss Roll. <b>Left</b>: A total of 15,000 samples from the truncated normal mixture distribution <math display="inline"><semantics> <msub> <mi>μ</mi> <mi>swiss</mi> </msub> </semantics></math> over <math display="inline"><semantics> <msub> <mi>X</mi> <mi>swiss</mi> </msub> </semantics></math>. <b>Right</b>: A 50-point 3D discretization using Variation 2 of Algorithm A2; the refinement cells are shown in colored boxes.</p>
Figure A6
<p>Swiss Roll under isometry. (<b>a</b>) Refinement cells under the 3D Euclidean metric (samples from each refinement cell share one color). (<b>b</b>) Refinement cells under the 2D metric induced by the isometry. (<b>c</b>) EDOT of 50 points with respect to the 2D refinement. <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <mn>0.01</mn> <mo>×</mo> <msup> <mi>diam</mi> <mn>2</mn> </msup> </mrow> </semantics></math> for all.</p>
Figure A7
<p>A 7-point barycenter of two Gaussian distributions. (<b>a</b>): EDOT, the areas of the dots represent the weights, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mi>s</mi> <mi>u</mi> <mi>m</mi> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.7322</mn> </mrow> </semantics></math>; (<b>b</b>): EDOT-Equal, <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mi>s</mi> <mi>u</mi> <mi>m</mi> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.7380</mn> </mrow> </semantics></math>; (<b>c</b>): [<a href="#B18-entropy-25-00839" class="html-bibr">18</a>], <math display="inline"><semantics> <mrow> <msubsup> <mi>W</mi> <mrow> <mi>s</mi> <mi>u</mi> <mi>m</mi> </mrow> <mn>2</mn> </msubsup> <mo>=</mo> <mn>0.7389</mn> </mrow> </semantics></math>. The <span class="html-italic">W</span> are regularized with <math display="inline"><semantics> <mrow> <mi>ζ</mi> <mo>=</mo> <mn>0.04</mn> </mrow> </semantics></math>.</p>
Abstract
1. Introduction
2. Efficient Discretizations
3. Gradient of the Objective Function
4. The Discretization Algorithm
5. Methods of Improvement
6. Analysis of the Algorithms
7. Related Work and Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
OT | Optimal Transport
EOT | Entropy-Regularized Optimal Transport
EDOT | Efficient Discretization of Optimal Transport
SGD | Stochastic Gradient Descent
Appendix A. Proof of Proposition 1
Appendix B. Gradient of
Appendix B.1. The Gradient
Appendix B.2. Second Derivatives
Appendix C. Algorithms
Appendix C.1. Algorithm: Simple EDOT
Algorithm A1 Simple EDOT using minibatch SGD
Appendix C.2. Proof of Proposition 2
Appendix C.3. Proof of Proposition 3
Appendix C.4. Adaptive Refinement via DFS: Midpoints
Algorithm A2 Adaptive Refinement via Depth First Search
Algorithm A3 Adaptive EDOT Variation 1
Appendix C.5. Adaptive Refinement: Variation 1
Appendix C.6. Adaptive Refinement: Variation 2
Appendix D. Empirical Parts
Appendix D.1. Estimate : Richardson Extrapolation and Others
Appendix D.2. Example: The Sphere
Spherical Geometry
Appendix D.3. A Note on the Implementation of SGD with Momentum
Appendix D.4. An Example on Transference Plan with Adaptive EDOT
Appendix D.5. Example: Swiss Roll
Appendix D.6. Example: Figure Densities
Appendix D.7. Example: Simple Barycenter Problems
References
- Kantorovich, L.V. On the translocation of masses. J. Math. Sci. 2006, 133, 1381–1382.
- Peyré, G.; Cuturi, M. Computational optimal transport. Found. Trends Mach. Learn. 2019, 11, 355–607.
- Villani, C. Optimal Transport: Old and New; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; Volume 338.
- Janati, H.; Muzellec, B.; Peyré, G.; Cuturi, M. Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form. Adv. Neural Inf. Process. Syst. 2020, 33, 10468–10479.
- Genevay, A.; Cuturi, M.; Peyré, G.; Bach, F. Stochastic optimization for large-scale optimal transport. arXiv 2016, arXiv:1605.08527.
- Allen-Zhu, Z.; Li, Y.; Oliveira, R.; Wigderson, A. Much faster algorithms for matrix scaling. In Proceedings of the 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, 15–17 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 890–901.
- Lin, T.; Ho, N.; Jordan, M.I. On the efficiency of the Sinkhorn and Greenkhorn algorithms and their acceleration for optimal transport. arXiv 2019, arXiv:1906.01437.
- Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2292–2300.
- Sinkhorn, R.; Knopp, P. Concerning nonnegative matrices and doubly stochastic matrices. Pac. J. Math. 1967, 21, 343–348.
- Wang, J.; Wang, P.; Shafto, P. Sequential Cooperative Bayesian Inference. In Proceedings of the International Conference on Machine Learning, PMLR, Online/Vienna, Austria, 12–18 July 2020; pp. 10039–10049.
- Tran, M.N.; Nott, D.J.; Kohn, R. Variational Bayes with intractable likelihood. J. Comput. Graph. Stat. 2017, 26, 873–882.
- Overstall, A.; McGree, J. Bayesian design of experiments for intractable likelihood models using coupled auxiliary models and multivariate emulation. Bayesian Anal. 2020, 15, 103–131.
- Wang, P.; Wang, J.; Paranamana, P.; Shafto, P. A mathematical theory of cooperative communication. Adv. Neural Inf. Process. Syst. 2020, 33, 17582–17593.
- Luise, G.; Rudi, A.; Pontil, M.; Ciliberto, C. Differential properties of Sinkhorn approximation for learning with Wasserstein distance. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2018; pp. 5859–5870.
- Accinelli, E. A Generalization of the Implicit Function Theorems. 2009. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1512763 (accessed on 3 February 2021).
- Weed, J.; Bach, F. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli 2019, 25, 2620–2648.
- Dudley, R.M. The speed of mean Glivenko-Cantelli convergence. Ann. Math. Stat. 1969, 40, 40–50.
- Claici, S.; Chien, E.; Solomon, J. Stochastic Wasserstein Barycenters. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 999–1008.
- Mérigot, Q. A multiscale approach to optimal transport. In Computer Graphics Forum; Blackwell Publishing Ltd.: Oxford, UK, 2011; Volume 30, pp. 1583–1592.
- Solomon, J.; de Goes, F.; Peyré, G.; Cuturi, M.; Butscher, A.; Nguyen, A.; Du, T.; Guibas, L. Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains. ACM Trans. Graph. 2015, 34, 1–11.
- Staib, M.; Claici, S.; Solomon, J.M.; Jegelka, S. Parallel Streaming Wasserstein Barycenters. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: 2017; Volume 30.
- Beugnot, G.; Genevay, A.; Greenewald, K.; Solomon, J. Improving Approximate Optimal Transport Distances using Quantization. arXiv 2021, arXiv:2102.12731.
- Jacobs, M.; Léger, F. A fast approach to optimal transport: The back-and-forth method. Numer. Math. 2020, 146, 513–544.
- Genevay, A.; Chizat, L.; Bach, F.; Cuturi, M.; Peyré, G. Sample complexity of Sinkhorn divergences. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Naha, Okinawa, Japan, 16–18 April 2019; pp. 1574–1583.
- Bottou, L.; Curtis, F.E.; Nocedal, J. Optimization methods for large-scale machine learning. SIAM Rev. 2018, 60, 223–311.
- Mensch, A.; Peyré, G. Online Sinkhorn: Optimal transport distances from sample streams. Adv. Neural Inf. Process. Syst. 2020, 33, 1657–1667.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, J.; Wang, P.; Shafto, P. Efficient Discretization of Optimal Transport. Entropy 2023, 25, 839. https://doi.org/10.3390/e25060839