Improved Practical Matrix Sketching with Guarantees

Mina Ghashami, Amey Desai & Jeff M. Phillips

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8737)

Included in the following conference series:

European Symposium on Algorithms


Abstract

Matrices have become essential data representations for many large-scale problems in data analytics, and hence matrix sketching is a critical task. Although much research has focused on improving the error/size tradeoff under various sketching paradigms, we find that a simple heuristic, iSVD, which has no guarantees, tends to outperform all known approaches. In this paper we adapt the best-performing guaranteed algorithm, FrequentDirections, in a way that preserves the guarantees and nearly matches iSVD in practice. We also demonstrate an adversarial dataset on which iSVD performs quite poorly, but our new technique has almost no error. Finally, we provide easy replication of our studies on APT, a new testbed that makes available not only code and datasets but also a computing platform with fixed environmental settings.
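
To make the two baselines named in the abstract concrete, the following is a minimal sketch of a basic FrequentDirections variant and of an iSVD-style heuristic, as they are commonly described in the literature. It is not the adapted algorithm this paper introduces, which the abstract does not specify. The function names, the row-at-a-time update, the restriction `ell <= d`, and the error measure `covariance_error` are assumptions made for illustration. For the shrinking step used here, the standard analysis bounds the spectral norm of AᵀA − BᵀB by ‖A‖_F²/ℓ, whereas the truncation step of the iSVD-style heuristic carries no such bound.

```python
import numpy as np


def _sketch(A, ell, shrink):
    """Stream the rows of A through an ell-row sketch B.

    `shrink` maps the singular values of B to new values whose last
    entry is zero, so the last row of B is free for the next insertion.
    """
    n, d = A.shape
    assert 1 <= ell <= d, "this illustration assumes ell <= d"
    B = np.zeros((ell, d))
    for row in A:
        B[-1] = row  # the last row of B is zero at this point
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        B = shrink(s)[:, None] * Vt  # rebuild B = S' V^T
    return B


def frequent_directions(A, ell):
    """Basic FrequentDirections: subtract the smallest squared singular
    value from every squared singular value after each insertion."""
    def shrink(s):
        delta = s[-1] ** 2
        return np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    return _sketch(A, ell, shrink)


def isvd(A, ell):
    """iSVD-style heuristic (assumed form): keep the top ell - 1 singular
    values unchanged and drop only the smallest one."""
    def shrink(s):
        out = s.copy()
        out[-1] = 0.0
        return out
    return _sketch(A, ell, shrink)


def covariance_error(A, B):
    """Spectral norm of A^T A - B^T B, relative to ||A||_F^2."""
    diff = A.T @ A - B.T @ B
    return np.linalg.norm(diff, 2) / np.linalg.norm(A, "fro") ** 2


# Hypothetical usage on synthetic data (not one of the paper's datasets).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30))
ell = 10
print("FD   error:", covariance_error(A, frequent_directions(A, ell)))
print("iSVD error:", covariance_error(A, isvd(A, ell)))
print("FD bound  :", 1.0 / ell)
```

Computing a full SVD per row keeps the sketch short; practical implementations typically buffer several rows between decompositions to amortize that cost.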



Author information

Authors and Affiliations

University of Utah, Utah, USA
Mina Ghashami, Amey Desai & Jeff M. Phillips


Editor information

Editors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, USA
Andreas S. Schulz
Karlsruhe Institute of Technology (KIT), Kaiserstraße 12, 76131, Karlsruhe, Germany
Dorothea Wagner


About this paper

Cite this paper

Ghashami, M., Desai, A., Phillips, J.M. (2014). Improved Practical Matrix Sketching with Guarantees. In: Schulz, A.S., Wagner, D. (eds) Algorithms - ESA 2014. ESA 2014. Lecture Notes in Computer Science, vol 8737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44777-2_39


DOI: https://doi.org/10.1007/978-3-662-44777-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44776-5
Online ISBN: 978-3-662-44777-2
eBook Packages: Computer Science, Computer Science (R0)
