Nothing Special   »   [go: up one dir, main page]

Skip to main content

Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware

  • Conference paper
  • First Online:
New Frontiers in Mining Complex Patterns (NFMCP 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9607))

Included in the following conference series:

  • 622 Accesses

Abstract

Data streams are unbounded and infinite flows of data arriving at high rates which cannot be stored for offline processing. Because of this, classical approaches for Data Mining cannot be used straightforwardly in data stream scenario. This paper introduces a single-pass hardware-based algorithm for frequent itemsets mining on data streams that uses the top-k frequent 1-itemsets. Experimental results of the hardware implementation of the proposed algorithm are also presented and discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Dataset is referred to databases, unstructured data file, relational databases or any other data source. In this paper, dataset is used to refer data streams.

References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB 1994, pp. 487–499. Morgan Kaufmann Publishers Inc, San Francisco (1994)

    Google Scholar 

  2. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2002, pp. 1–16. ACM, New York (2002)

    Google Scholar 

  3. Baker, Z.K., Prasanna, V.K.: Efficient hardware data mining with the Apriori algorithm on FPGAs. In: Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2005, pp. 3–12. IEEE Computer Society, Washington (2005)

    Google Scholar 

  4. Baker, Z.K., Prasanna, V.K.: An architecture for efficient hardware data mining using reconfigurable computing systems. In: 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006, FCCM 2006, pp. 67–75, April 2006

    Google Scholar 

  5. Baralis, E., Cerquitelli, T., Chiusano, S., Grand, A., Grimaudo, L.: An efficient itemset mining approach for data streams. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES 2011, Part II. LNCS, vol. 6882, pp. 515–523. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  6. Borgelt, C.: Software for frequent pattern mining. http://www.borgelt.net/fpm.html. Accessed on 20 May 2015

  7. Borgelt, C.: Efficient implementations of Apriori and Eclat. In: FIMI. CEUR Workshop Proc., vol. 90. CEUR-WS.org (2003)

    Google Scholar 

  8. Borgelt, C.: An implementation of the FP-growth algorithm. In: Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, OSDM 2005, pp. 1–5. ACM (2005)

    Google Scholar 

  9. Cheng, J., Ke, Y., Ng, W.: A survey on algorithms for mining frequent itemsets over data streams. Knowl. Inf. Syst. 16(1), 1–27 (2008)

    Article  MathSciNet  Google Scholar 

  10. Compton, K., Hauck, S.: Reconfigurable computing: A survey of systems and software. ACM Comput. Surv. 34(2), 171–210 (2002)

    Article  Google Scholar 

  11. Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases, VLDB 2003, vol. 29, pp. 464–475. VLDB Endowment (2003)

    Google Scholar 

  12. Corporation, I.B.M.: IBM quest market-basket synthetic data generator. http://www.cs.loyola.edu/cgiannel/assoc_gen.html. Accessed on 20 April 2015

  13. Golab, L., Ozsu, M.: Data stream management issues - a survey. SIGMOD Rec. 32(2), 5–14 (2003)

    Article  Google Scholar 

  14. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 1–12. ACM, New York (2000)

    Google Scholar 

  15. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2001, pp. 97–106. ACM, New York (2001)

    Google Scholar 

  16. Jiang, N., Gruenwald, L.: Research issues in data stream association rule mining. SIGMOD Rec. 35(1), 14–19 (2006)

    Article  Google Scholar 

  17. Jin, R., Agrawal, G.: Frequent pattern mining in data streams. In: Aggarwal, C. (ed.) Data Streams. Advances in Database Systems, vol. 31, pp. 61–84. Springer, US (2007)

    Chapter  Google Scholar 

  18. Lai, Y.K., Wang, N.C., Chou, T.Y., Lee, C.C., Wellem, T., Nugroho, H.T.: Implementing on-line sketch-based change detection on a NetFPGA platform. In: 1st Asia NetFPGA Developers Workshop (2010)

    Google Scholar 

  19. Lee, W., Stolfo, S.J., Mok, K.W.: Adaptive intrusion detection: A data mining approach. Artif. Intell. Rev. 14(6), 533–567 (2000)

    Article  MATH  Google Scholar 

  20. Thanh, L.H., Calders, T.: Mining top-k frequent items in a data stream with flexible sliding windows. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, pp. 283–292. ACM, New York (2010)

    Google Scholar 

  21. Lichman, M.: UCI machine learning repository (2013). Accessed on 20 April 2015

    Google Scholar 

  22. Malarvizhi, S.P., Sathiyabhama, B.: Enhanced reconfigurable weighted association rule mining for frequent patterns of web logs. Int. J. Comput. 13(2), 97–105 (2014)

    Google Scholar 

  23. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 346–357. VLDB Endowment (2002)

    Google Scholar 

  24. Mesa, A., Feregrino-Uribe, C., Cumplido, R., Hernández-Palancar, J.: A highly parallel algorithm for frequent itemset mining. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Kittler, J. (eds.) MCPR 2010. LNCS, vol. 6256, pp. 291–300. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  25. Metwally, A., Agrawal, D.P., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 398–412. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  26. Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)

    Article  Google Scholar 

  27. Pramod, S., Vyas, O.P.: Data stream mining: A review on windowing approach. Global J. Comput. Sci. Technol. 12(11-C) (2012)

    Google Scholar 

  28. Shi, S., Qi, Y., Wang, Q.: Accelerating intersection computation in frequent itemset mining with FPGA. In: 2013 IEEE 10th International Conference on High Performance Computing and Communications 2013, pp. 659–665, November 2013

    Google Scholar 

  29. Shie, B.E., Yu, P.S., Tseng, V.S.: Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Syst. Appl. 39(17), 12947–12960 (2012)

    Article  Google Scholar 

  30. Sun, S., Steffen, M., Zambreno, J.: A reconfigurable platform for frequent pattern mining. In: International Conference on Reconfigurable Computing and FPGAs, 2008, ReConFig 2008, pp. 55–60, December 2008

    Google Scholar 

  31. Sun, S., Zambreno, J.: Mining association rules with systolic trees. In: International Conference on Field Programmable Logic and Applications, 2008, FPL 2008, pp. 143–148, September 2008

    Google Scholar 

  32. Sun, S., Zambreno, J.: Design and analysis of a reconfigurable platform for frequent pattern mining. IEEE Trans. Parallel Distrib. Syst. 22(9), 1497–1505 (2011)

    Article  Google Scholar 

  33. Thoni, D.W., Strey, A.: Novel strategies for hardware acceleration of frequent itemset mining with the Apriori algorithm. In: International Conference on Field Programmable Logic and Applications, 2009, FPL 2009, pp. 489–492, August 2009

    Google Scholar 

  34. Tong, D., Prasanna, V.: Online heavy hitter detector on FPGA. In: 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1–6. IEEE (2013)

    Google Scholar 

  35. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 226–235. ACM, New York (2003)

    Google Scholar 

  36. Wen, Y.H., Huang, J.W., Chen, M.S.: Hardware-enhanced association rule mining with hashing and pipelining. IEEE Trans. Knowl. Data Eng. 20(6), 784–795 (2008)

    Article  Google Scholar 

  37. Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)

    Article  MathSciNet  Google Scholar 

  38. Zhang, Y., Singh, S., Sen, S., Duffield, N., Lund, C.: Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications. In: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, IMC 2004, pp. 101–114. ACM, New York (2004)

    Google Scholar 

  39. Zhang, Y., Zhang, F., Jin, Z., Bakos, J.D.: An FPGA-based accelerator for frequent itemset mining. ACM Trans. Reconfigurable Technol. Syst. 6(1), 2:1–2:17 (2013)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lázaro Bustio .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bustio, L., Cumplido, R., Hernández, R., Bande, J.M., Feregrino, C. (2016). Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science(), vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-39315-5_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39314-8

  • Online ISBN: 978-3-319-39315-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics