Abstract
Matrices have become essential data representations for many large-scale problems in data analytics, and hence matrix sketching is a critical task. Although much research has focused on improving the error/size tradeoff under various sketching paradigms, we find a simple heuristic iSVD, with no guarantees, tends to outperform all known approaches. In this paper we adapt the best performing guaranteed algorithm, FrequentDirections, in a way that preserves the guarantees, and nearly matches iSVD in practice. We also demonstrate an adversarial dataset for which iSVD performs quite poorly, but our new technique has almost no error. Finally, we provide easy replication of our studies on APT, a new testbed which makes available not only code and datasets, but also a computing platform with fixed environmental settings.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Concept drift in machine learning and knowledge discovery group, http://mlkd.csd.auth.gr/concept_drift.html
vision.caltech, http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
Achlioptas, D., McSherry, F.: Fast computation of low rank matrix approximations. In: STOC (2001)
Agarwal, P.K., Cormode, G., Huang, Z., Phillips, J.M., Wei, Z., Yi, K.: Mergeable summaries. In: Proceedings of the 31st Symposium on Principles of Database Systems (2012)
Arasu, A., Babu, S., Widom, J.: An abstract semantics and concrete language for continuous queries over streams and relations (2002)
Bonnet, P., Gehrke, J., Seshadri, P.: Towards sensor database systems. In: Tan, K.-L., Franklin, M.J., Lui, J.C.-S. (eds.) MDM 2001. LNCS, vol. 1987, pp. 3–14. Springer, Heidelberg (2000)
Boutsidis, C., Drineas, P., Magdon-Ismail, M.: Near optimal column-based matrix reconstruction. In: FOCS (2011)
Brand, M.: Incremental singular value decomposition of uncertain data with missing values. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 707–720. Springer, Heidelberg (2002)
Chen, J., DeWitt, D.J., Tian, F., Wang, Y.: Niagaracq: A scalable continuous query system for internet databases. ACM SIGMOD Record 29(2), 379–390 (2000)
Clarkson, K.L., Woodruff, D.P.: Numerical linear algebra in the streaming model. In: STOC (2009)
Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. In: VLDB (2008)
Cortes, C., Fisher, K., Pregibon, D., Rogers, A.: Hancock: a language for extracting signatures from data streams. In: KDD (2000)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, p. 348. Springer, Heidelberg (2002)
Deshpande, A., Vempala, S.S.: Adaptive Sampling and Fast Low-Rank Matrix Approximation. In: Díaz, J., Jansen, K., Rolim, J.D.P., Zwick, U. (eds.) APPROX 2006 and RANDOM 2006. LNCS, vol. 4110, pp. 292–303. Springer, Heidelberg (2006)
Drineas, P., Kannan, R.: Pass efficient algorithms for approximating large matrices. In: SODA (2003)
Drineas, P., Kannan, R., Mahoney, M.W.: Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM Journal on Computing 3636(1), 158–183 (2006)
Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications 30, 844–881 (2008)
Frieze, A., Kannan, R., Vempala, S.: Fast Monte-Carlo algorithms for finding low-rank approximations. In: FOCS (1998)
Ghashami, M., Liberty, E., Phillips, J.M.: Frequent directions: Simple and deterministic matrix sketchings. In: Personal Communication (2014)
Ghashami, M., Phillips, J.M.: Relative errors for deterministic low-rank matrix approximation. In: SODA (2014)
Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Quicksand: Quick summary and analysis of network data. Technical report, DIMACS 2001-43 (2001)
Golub, G.H., van Loan, C.F.: Matrix Computations, vol. 3. JHUP (2012)
Hall, P., Marshall, D., Martin, R.: Incremental eigenanalysis for classification. In: British Machine Vision Conference (1998)
Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM ToDS 28, 51–55 (2003)
Levey, A., Lindenbaum, M.: Sequential Karhunen-Loeve basis extraction and its application to images. IEEE ToIP 9, 1371–1374 (2000)
Liberty, E.: Simple and deterministic matrix sketching. In: KDD (2013)
Liberty, E., Woolfe, F., Martinsson, P.-G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. PNAS 104(51), 20167–20172 (2007)
Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. PNAS 106, 697–702 (2009)
Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM ToDS 31, 1095–1133 (2006)
Misra, J., Gries, D.: Finding repeated elements. Sc. Comp. Prog. 2, 143–152 (1982)
Papadimitriou, C.H., Tamaki, H., Raghavan, P., Vempala, S.: Latent semantic indexing: A probabilistic analysis. In: PODS (1998)
Ricci, R.: Apt (adaptable profile-driven testbed) (2014), http://www.flux.utah.edu/project/apt
Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental learning for robust visual tracking. IJCV 77, 125–141 (2008)
Rudelson, M., Vershynin, R.: Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM 54(4), 21 (2007)
Sarlos, T.: Improved approximation algorithms for large matrices via random projections. In: FOCS (2006)
Sullivan, M., Heybey, A.: A system for managing large databases of network traffic. In: Proceedings of USENIX (1998)
Emulab testbed, http://www.flux.utah.edu/project/emulab
Weinberger, K., Dasgupta, A., Langford, J., Smola, A., Attenberg, J.: Feature hashing for large scale multitask learning. In: ICML (2009)
Zhu, Y., Shasha, D.: Statstream: Statistical monitoring of thousands of data streams in real time. In: VLDB (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ghashami, M., Desai, A., Phillips, J.M. (2014). Improved Practical Matrix Sketching with Guarantees. In: Schulz, A.S., Wagner, D. (eds) Algorithms - ESA 2014. ESA 2014. Lecture Notes in Computer Science, vol 8737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44777-2_39
Download citation
DOI: https://doi.org/10.1007/978-3-662-44777-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44776-5
Online ISBN: 978-3-662-44777-2
eBook Packages: Computer ScienceComputer Science (R0)