Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3183713.3196930acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Sketching Linear Classifiers over Data Streams

Published: 27 May 2018 Publication History

Abstract

We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently-occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that establishes recovery guarantees for batch and online learning, and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods, including count-based sketches and feature hashing.

References

[1]
Dimitris Achlioptas. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of computer and System Sciences Vol. 66, 4 (2003), 671--687.
[2]
Jimmy Ba and Rich Caruana. 2014. Do deep nets really need to be deep?. In Advances in neural information processing systems. 2654--2662.
[3]
Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, and Sahaana Suri. 2017. Macrobase: Prioritizing attention in fast data. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 541--556.
[4]
Nagender Bandi, Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2007. Fast data stream algorithms using associative memories Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM, 247--256.
[5]
Avrim Blum, Adam Kalai, and John Langford. 1999. Beating the hold-out: Bounds for k-fold and progressive cross-validation Proceedings of the twelfth annual conference on Computational learning theory. ACM, 203--208.
[6]
Oscar Boykin, Sam Ritchie, Ian O'Connell, and Jimmy Lin. 2014. Summingbird: A framework for integrating batch and online mapreduce computations. Proceedings of the VLDB Endowment Vol. 7, 13 (2014), 1441--1451.
[7]
Daniela Brauckhoff, Xenofontas Dimitropoulos, Arno Wagner, and Kavè Salamatian. 2012. Anomaly extraction in backbone networks using association rules. IEEE/ACM Transactions on Networking (TON) Vol. 20, 6 (2012), 1788--1799.
[8]
Cristian Buciluv a, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model compression Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 535--541.
[9]
Robert Calderbank, Sina Jafarpour, and Robert Schapire . {n. d.}. Compressed Learning: Universal Sparse Dimensionality Reduction and Learning in the Measurement Domain. (.{n. d.}).
[10]
Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2002. Finding frequent items in data streams. Automata, languages and programming (2002), 784--784.
[11]
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005 (2013).
[12]
Jeffrey Considine, Feifei Li, George Kollios, and John Byers. 2004. Approximate aggregation techniques for sensor databases Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 449--460.
[13]
Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. arXiv preprint arXiv:1701.08230 (2017).
[14]
Graham Cormode and Marios Hadjieleftheriou. 2008. Finding frequent items in data streams. Proceedings of the VLDB Endowment Vol. 1, 2 (2008), 1530--1541.
[15]
Graham Cormode and Shan Muthukrishnan. 2005 a. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms Vol. 55, 1 (2005), 58--75.
[16]
Graham Cormode and S Muthukrishnan. 2005 b. What's new: Finding significant differences in network data streams. IEEE/ACM Transactions on Networking (TON) Vol. 13, 6 (2005), 1219--1232.
[17]
Koby Crammer, Jaz Kandola, and Yoram Singer. 2004. Online classification on a budget. In Advances in neural information processing systems. 225--232.
[18]
Alberto Dainotti, Antonio Pescape, and Kimberly C Claffy. 2012. Issues and future directions in traffic classification. IEEE network, Vol. 26, 1 (2012).
[19]
Ofer Dekel, Shai Shalev-Shwartz, and Yoram Singer. 2006. The Forgetron: A kernel-based perceptron on a fixed budget Advances in neural information processing systems. 259--266.
[20]
Erik D Demaine, Alejandro López-Ortiz, and J Ian Munro. 2002. Frequency estimation of internet packet streams with limited space European Symposium on Algorithms. Springer, 348--360.
[21]
John Duchi and Yoram Singer. 2009. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research Vol. 10, Dec (2009), 2899--2934.
[22]
Benjamin V Durme and Ashwin Lall. 2009. Streaming pointwise mutual information. In Advances in Neural Information Processing Systems. 1892--1900.
[23]
Pavlos S Efraimidis and Paul G Spirakis. 2006. Weighted random sampling with a reservoir. Inform. Process. Lett. Vol. 97, 5 (2006), 181--185.
[24]
Philippe Flajolet. 1985. Approximate counting: a detailed analysis. BIT Numerical Mathematics Vol. 25, 1 (1985), 113--134.
[25]
Daniel Golovin, D Sculley, Brendan McMahan, and Michael Young. 2013. Large-scale learning with less ram via randomization Proceedings of the 30th International Conference on Machine Learning (ICML-13). 325--333.
[26]
Michael Greenwald and Sanjeev Khanna. 2001. Space-efficient online computation of quantile summaries ACM SIGMOD Record, Vol. Vol. 30. ACM, 58--66.
[27]
Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, and Prateek Jain. 2017. ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices International Conference on Machine Learning. 1331--1340.
[28]
Michael Gutmann and Aapo Hyvämodels Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 297--304.
[29]
Elad Hazan et almbox. 2016. Introduction to online convex optimization. Foundations and Trends® in Optimization, Vol. 2, 3--4 (2016), 157--325.
[30]
Elad Hazan, Amit Agarwal, and Satyen Kale. 2007. Logarithmic regret algorithms for online convex optimization. Machine Learning, Vol. 69, 2 (2007), 169--192.
[31]
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
[32]
Steven CH Hoi, Jialei Wang, Peilin Zhao, and Rong Jin. 2012. Online feature selection for mining big data. In Proceedings of the 1st international workshop on big data, streams and heterogeneous source mining: Algorithms, systems, programming models and applications. ACM, 93--100.
[33]
William B Johnson and Joram Lindenstrauss. 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemporary mathematics Vol. 26, 189--206 (1984), 1.
[34]
Nobuhiro Kaji and Hayato Kobayashi. 2017. Incremental skip-gram model with negative sampling. arXiv preprint arXiv:1704.03956 (2017).
[35]
Daniel M Kane and Jelani Nelson. 2014. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM (JACM) Vol. 61, 1 (2014), 4.
[36]
Ashish Kapoor, Simon Baker, Sumit Basu, and Eric Horvitz. 2012. Memory constrained face recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2539--2546.
[37]
Richard M Karp, Scott Shenker, and Christos H Papadimitriou. 2003. A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS), Vol. 28, 1 (2003), 51--55.
[38]
Jakub Konevcnỳ, Brendan McMahan, and Daniel Ramage. 2015. Federated optimization: Distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575 (2015).
[39]
Ashish Kumar, Saurabh Goyal, and Manik Varma. 2017. Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things International Conference on Machine Learning. 1935--1944.
[40]
John Langford, Lihong Li, and Tong Zhang. 2009. Sparse online learning via truncated gradient. Journal of Machine Learning Research Vol. 10, Mar (2009), 777--801.
[41]
Kasper Green Larsen, Jelani Nelson, Huy L Nguyên, and Mikkel Thorup. 2016. Heavy hitters via cluster-preserving clustering. Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on. IEEE, 61--70.
[42]
Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization Advances in neural information processing systems. 2177--2185.
[43]
David D Lewis, Yiming Yang, Tony G Rose, and Fan Li. 2004. RCV1: A new benchmark collection for text categorization research. Journal of machine learning research Vol. 5, Apr (2004), 361--397.
[44]
Brent Longstaff, Sasank Reddy, and Deborah Estrin. 2010. Improving activity classification for health applications on mobile devices using active and semi-supervised learning. In International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth). IEEE.
[45]
Ge Luo, Lu Wang, Ke Yi, and Graham Cormode. 2016. Quantiles over data streams: experimental comparisons, new analyses, and further improvements. The VLDB Journal, Vol. 25, 4 (2016), 449--472.
[46]
Justin Ma, Lawrence K Saul, Stefan Savage, and Geoffrey M Voelker. 2009. Identifying suspicious URLs: an application of large-scale online learning Proceedings of the 26th annual international conference on machine learning. ACM, 681--688.
[47]
Gurmeet Singh Manku and Rajeev Motwani. 2002. Approximate frequency counts over data streams. In Proceedings of the 28th international conference on Very Large Data Bases. VLDB Endowment, 346--357.
[48]
Chandler May, Kevin Duh, Benjamin Van Durme, and Ashwin Lall. 2017. Streaming Word Embeddings with the Space-Saving Algorithm. arXiv preprint arXiv:1704.07463 (2017).
[49]
Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hacsim Sak, Alexander Gruenstein, Franccoise Beaufays, et almbox. 2016. Personalized speech recognition on mobile devices. Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 5955--5959.
[50]
H Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, et almbox. 2013. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1222--1230.
[51]
Alexandra Meliou, Sudeepa Roy, and Dan Suciu. 2014. Causality and explanations in databases. In VLDB.
[52]
Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2005. Efficient computation of frequent and top-k elements in data streams International Conference on Database Theory. Springer, 398--412.
[53]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[54]
Katsiaryna Mirylenka, Graham Cormode, Themis Palpanas, and Divesh Srivastava . 2015. Conditional heavy hitters: detecting interesting correlations in data streams. The VLDB Journal, Vol. 24, 3 (2015), 395--414.
[55]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should i trust you?: Explaining the predictions of any classifier Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135--1144.
[56]
Pratanu Roy, Arijit Khan, and Gustavo Alonso. 2016. Augmented sketch: Faster and more accurate stream processing Proceedings of the 2016 International Conference on Management of Data. ACM, 1449--1463.
[57]
Robert Schweller, Ashish Gupta, Elliot Parsons, and Yan Chen. 2004. Reversible sketches for efficient and accurate change detection over network data streams Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. ACM, 207--212.
[58]
Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter . 2011. Pegasos: Primal estimated sub-gradient solver for svm. Mathematical programming Vol. 127, 1 (2011), 3--30.
[59]
Ohad Shamir. 2016. Without-Replacement Sampling for Stochastic Gradient Methods. Advances in Neural Information Processing Systems 29, bibfieldeditorD. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 46--54. http://papers.nips.cc/paper/6245-without-replacement-sampling-for-stochastic-gradient-methods.pdf
[60]
Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alexander L Strehl, Alex J Smola, and SVN Vishwanathan. 2009. Hash kernels International Conference on Artificial Intelligence and Statistics. 496--503.
[61]
Nisheeth Shrivastava, Chiranjeeb Buragohain, Divyakant Agrawal, and Subhash Suri . 2004. Medians and beyond: new aggregation techniques for sensor networks Proceedings of the 2nd international conference on Embedded networked sensor systems. ACM, 239--249.
[62]
Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet S Talwalkar. 2017. Federated Multi-Task Learning. In Advances in Neural Information Processing Systems. 4427--4437.
[63]
J. Stamper, A. Niculescu-Mizil, S. Ritter, G.J. Gordon, and K.R. Koedinger. 2010. Algebra I 2008--2009. Challenge data set from KDD Cup 2010 Educational Data Mining Challenge. (2010). Find it at http://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp.
[64]
Jacob Steinhardt and John Duchi. 2015. Minimax rates for memory-bounded sparse linear regression Conference on Learning Theory. 1564--1587.
[65]
Peter D Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research Vol. 37 (2010), 141--188.
[66]
CAIDA UCSD. 2008. The CAIDA UCSD Anonymized Passive OC48 Internet Traces Dataset. (2008). http://www.caida.org/data/passive/passive_oc48_dataset.xml
[67]
Balajee Vamanan, Gwendolyn Voskuilen, and TN Vijaykumar. 2010. EffiCuts: optimizing packet classification for memory and throughput ACM SIGCOMM Computer Communication Review, Vol. Vol. 40. ACM, 207--218.
[68]
Shoba Venkataraman, Dawn Song, Phillip B Gibbons, and Avrim Blum. 2005. New streaming algorithms for fast detection of superspreaders. Department of Electrical and Computing Engineering (2005), 6.
[69]
Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1113--1120.
[70]
Eugene Wu and Samuel Madden. 2013. Scorpion: Explaining away outliers in aggregate queries. Proceedings of the VLDB Endowment Vol. 6, 8 (2013), 553--564.
[71]
Lin Xiao. 2010. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research Vol. 11, Oct (2010), 2543--2596.
[72]
Tianbao Yang, Lijun Zhang, Rong Jin, and Shenghuo Zhu. 2015. Theory of dual-sparse regularized randomized reduction International Conference on Machine Learning. 305--314.
[73]
Hsiang-Fu Yu, Hung-Yi Lo, Hsun-Ping Hsieh, Jing-Kai Lou, Todd G McKenzie, Jung-Wei Chou, Po-Han Chung, Chia-Hua Ho, Chun-Fu Chang, Yin-Hsuan Wei, et almbox. 2010. Feature engineering and classifier ensemble for KDD cup 2010 KDD Cup.
[74]
Minlan Yu, Lavanya Jose, and Rui Miao. 2013. Software Defined Traffic Measurement with OpenSketch. NSDI, Vol. Vol. 13. 29--42.
[75]
Tong Yu, Yong Zhuang, Ole J Mengshoel, and Osman Yagan. 2016. Hybridizing personal and impersonal machine learning models for activity recognition on mobile devices.
[76]
Ce Zhang, Arun Kumar, and Christopher Ré. 2016. Materialization optimizations for feature selection workloads. ACM Transactions on Database Systems (TODS), Vol. 41, 1 (2016), 2.
[77]
Lijun Zhang, Mehrdad Mahdavi, Rong Jin, Tianbao Yang, and Shenghuo Zhu. 2014. Random projections for classification: A recovery approach. IEEE Transactions on Information Theory Vol. 60, 11 (2014), 7300--7316.
[78]
Martin Zinkevich. 2003. Online convex programming and generalized infinitesimal gradient ascent Proceedings of the 20th International Conference on Machine Learning (ICML-03). 928--936.
[79]
Hui Zou and Trevor Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 67, 2 (2005), 301--320.

Cited By

View all
  • (2024)Relative Keys: Putting Feature Explanation into ContextProceedings of the ACM on Management of Data10.1145/36392632:1(1-28)Online publication date: 26-Mar-2024
  • (2024)Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming DataIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2024.338858946:11(7136-7153)Online publication date: Nov-2024
  • (2024)Unbiased Real-Time Traffic SketchingIEEE Transactions on Network Science and Engineering10.1109/TNSE.2023.328400411:3(2371-2383)Online publication date: May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. linear classification
  2. online learning
  3. sketching

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '18
Sponsor:

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)155
  • Downloads (Last 6 weeks)38
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Relative Keys: Putting Feature Explanation into ContextProceedings of the ACM on Management of Data10.1145/36392632:1(1-28)Online publication date: 26-Mar-2024
  • (2024)Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming DataIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2024.338858946:11(7136-7153)Online publication date: Nov-2024
  • (2024)Unbiased Real-Time Traffic SketchingIEEE Transactions on Network Science and Engineering10.1109/TNSE.2023.328400411:3(2371-2383)Online publication date: May-2024
  • (2024)Robust Sparse Online Learning through Adversarial Sparsity Constraints2024 9th IEEE International Conference on Smart Cloud (SmartCloud)10.1109/SmartCloud62736.2024.00015(42-47)Online publication date: 10-May-2024
  • (2023)Online Level-wise Hierarchical ClusteringProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599455(1733-1745)Online publication date: 6-Aug-2023
  • (2023)MicroscopeSketch: Accurate Sliding Estimation Using Adaptive ZoomingProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599432(2660-2671)Online publication date: 6-Aug-2023
  • (2023)BurstSketch: Finding Bursts in Data StreamsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.322368635:11(11126-11140)Online publication date: 1-Nov-2023
  • (2023)SGX-Stream: A Secure Stream Analytics Framework In SGX-enabled Edge CloudJournal of Information Security and Applications10.1016/j.jisa.2022.10340372(103403)Online publication date: Feb-2023
  • (2023)Local differentially private frequency estimation based on learned sketchesInformation Sciences10.1016/j.ins.2023.119667649(119667)Online publication date: Nov-2023
  • (2022)Memory bounds for the experts problemProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing10.1145/3519935.3520069(1158-1171)Online publication date: 9-Jun-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media