Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware

Lázaro Bustio^18,19,
René Cumplido¹⁹,
Raudel Hernández¹⁸,
José M. Bande¹⁸ &
…
Claudia Feregrino¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9607))

Included in the following conference series:

International Workshop on New Frontiers in Mining Complex Patterns

629 Accesses

Abstract

Data streams are unbounded and infinite flows of data arriving at high rates which cannot be stored for offline processing. Because of this, classical approaches for Data Mining cannot be used straightforwardly in data stream scenario. This paper introduces a single-pass hardware-based algorithm for frequent itemsets mining on data streams that uses the top-k frequent 1-itemsets. Experimental results of the hardware implementation of the proposed algorithm are also presented and discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

On the design of hardware-software architectures for frequent itemsets mining on data streams

Article 16 May 2017

Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator

Frequent Itemsets in Data Streams Using Dynamically Generated Minimum Support

Notes

1.
Dataset is referred to databases, unstructured data file, relational databases or any other data source. In this paper, dataset is used to refer data streams.

References

Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB 1994, pp. 487–499. Morgan Kaufmann Publishers Inc, San Francisco (1994)
Google Scholar
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2002, pp. 1–16. ACM, New York (2002)
Google Scholar
Baker, Z.K., Prasanna, V.K.: Efficient hardware data mining with the Apriori algorithm on FPGAs. In: Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2005, pp. 3–12. IEEE Computer Society, Washington (2005)
Google Scholar
Baker, Z.K., Prasanna, V.K.: An architecture for efficient hardware data mining using reconfigurable computing systems. In: 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006, FCCM 2006, pp. 67–75, April 2006
Google Scholar
Baralis, E., Cerquitelli, T., Chiusano, S., Grand, A., Grimaudo, L.: An efficient itemset mining approach for data streams. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES 2011, Part II. LNCS, vol. 6882, pp. 515–523. Springer, Heidelberg (2011)
Chapter Google Scholar
Borgelt, C.: Software for frequent pattern mining. http://www.borgelt.net/fpm.html. Accessed on 20 May 2015
Borgelt, C.: Efficient implementations of Apriori and Eclat. In: FIMI. CEUR Workshop Proc., vol. 90. CEUR-WS.org (2003)
Google Scholar
Borgelt, C.: An implementation of the FP-growth algorithm. In: Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, OSDM 2005, pp. 1–5. ACM (2005)
Google Scholar
Cheng, J., Ke, Y., Ng, W.: A survey on algorithms for mining frequent itemsets over data streams. Knowl. Inf. Syst. 16(1), 1–27 (2008)
Article MathSciNet Google Scholar
Compton, K., Hauck, S.: Reconfigurable computing: A survey of systems and software. ACM Comput. Surv. 34(2), 171–210 (2002)
Article Google Scholar
Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases, VLDB 2003, vol. 29, pp. 464–475. VLDB Endowment (2003)
Google Scholar
Corporation, I.B.M.: IBM quest market-basket synthetic data generator. http://www.cs.loyola.edu/cgiannel/assoc_gen.html. Accessed on 20 April 2015
Golab, L., Ozsu, M.: Data stream management issues - a survey. SIGMOD Rec. 32(2), 5–14 (2003)
Article Google Scholar
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 1–12. ACM, New York (2000)
Google Scholar
Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2001, pp. 97–106. ACM, New York (2001)
Google Scholar
Jiang, N., Gruenwald, L.: Research issues in data stream association rule mining. SIGMOD Rec. 35(1), 14–19 (2006)
Article Google Scholar
Jin, R., Agrawal, G.: Frequent pattern mining in data streams. In: Aggarwal, C. (ed.) Data Streams. Advances in Database Systems, vol. 31, pp. 61–84. Springer, US (2007)
Chapter Google Scholar
Lai, Y.K., Wang, N.C., Chou, T.Y., Lee, C.C., Wellem, T., Nugroho, H.T.: Implementing on-line sketch-based change detection on a NetFPGA platform. In: 1st Asia NetFPGA Developers Workshop (2010)
Google Scholar
Lee, W., Stolfo, S.J., Mok, K.W.: Adaptive intrusion detection: A data mining approach. Artif. Intell. Rev. 14(6), 533–567 (2000)
Article MATH Google Scholar
Thanh, L.H., Calders, T.: Mining top-k frequent items in a data stream with flexible sliding windows. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, pp. 283–292. ACM, New York (2010)
Google Scholar
Lichman, M.: UCI machine learning repository (2013). Accessed on 20 April 2015
Google Scholar
Malarvizhi, S.P., Sathiyabhama, B.: Enhanced reconfigurable weighted association rule mining for frequent patterns of web logs. Int. J. Comput. 13(2), 97–105 (2014)
Google Scholar
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 346–357. VLDB Endowment (2002)
Google Scholar
Mesa, A., Feregrino-Uribe, C., Cumplido, R., Hernández-Palancar, J.: A highly parallel algorithm for frequent itemset mining. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Kittler, J. (eds.) MCPR 2010. LNCS, vol. 6256, pp. 291–300. Springer, Heidelberg (2010)
Chapter Google Scholar
Metwally, A., Agrawal, D.P., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 398–412. Springer, Heidelberg (2005)
Chapter Google Scholar
Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)
Article Google Scholar
Pramod, S., Vyas, O.P.: Data stream mining: A review on windowing approach. Global J. Comput. Sci. Technol. 12(11-C) (2012)
Google Scholar
Shi, S., Qi, Y., Wang, Q.: Accelerating intersection computation in frequent itemset mining with FPGA. In: 2013 IEEE 10th International Conference on High Performance Computing and Communications 2013, pp. 659–665, November 2013
Google Scholar
Shie, B.E., Yu, P.S., Tseng, V.S.: Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Syst. Appl. 39(17), 12947–12960 (2012)
Article Google Scholar
Sun, S., Steffen, M., Zambreno, J.: A reconfigurable platform for frequent pattern mining. In: International Conference on Reconfigurable Computing and FPGAs, 2008, ReConFig 2008, pp. 55–60, December 2008
Google Scholar
Sun, S., Zambreno, J.: Mining association rules with systolic trees. In: International Conference on Field Programmable Logic and Applications, 2008, FPL 2008, pp. 143–148, September 2008
Google Scholar
Sun, S., Zambreno, J.: Design and analysis of a reconfigurable platform for frequent pattern mining. IEEE Trans. Parallel Distrib. Syst. 22(9), 1497–1505 (2011)
Article Google Scholar
Thoni, D.W., Strey, A.: Novel strategies for hardware acceleration of frequent itemset mining with the Apriori algorithm. In: International Conference on Field Programmable Logic and Applications, 2009, FPL 2009, pp. 489–492, August 2009
Google Scholar
Tong, D., Prasanna, V.: Online heavy hitter detector on FPGA. In: 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1–6. IEEE (2013)
Google Scholar
Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 226–235. ACM, New York (2003)
Google Scholar
Wen, Y.H., Huang, J.W., Chen, M.S.: Hardware-enhanced association rule mining with hashing and pipelining. IEEE Trans. Knowl. Data Eng. 20(6), 784–795 (2008)
Article Google Scholar
Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)
Article MathSciNet Google Scholar
Zhang, Y., Singh, S., Sen, S., Duffield, N., Lund, C.: Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications. In: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, IMC 2004, pp. 101–114. ACM, New York (2004)
Google Scholar
Zhang, Y., Zhang, F., Jin, Z., Bakos, J.D.: An FPGA-based accelerator for frequent itemset mining. ACM Trans. Reconfigurable Technol. Syst. 6(1), 2:1–2:17 (2013)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Advanced Technologies Application Center, 7a ♯ 21812 e/218 y 222, Rpto. Siboney, Playa, CP 12200, Havana, Cuba
Lázaro Bustio, Raudel Hernández & José M. Bande
National Institute for Astrophysics, Optics and Electronic, Luis Enrique Erro No 1, Sta. Ma. Tonantzintla, 72840, Puebla, Mexico
Lázaro Bustio, René Cumplido & Claudia Feregrino

Authors

Lázaro Bustio
View author publications
You can also search for this author in PubMed Google Scholar
René Cumplido
View author publications
You can also search for this author in PubMed Google Scholar
Raudel Hernández
View author publications
You can also search for this author in PubMed Google Scholar
José M. Bande
View author publications
You can also search for this author in PubMed Google Scholar
Claudia Feregrino
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lázaro Bustio .

Editor information

Editors and Affiliations

Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Università degli Studi di Bari Aldo Moro, Bari, Italy
Corrado Loglisci
ICAR-CNR, Rende, Italy
Giuseppe Manco
ICAR-CNR, Rende, Italy
Elio Masciari
University of North Carolina, Charlotte, North Carolina, USA
Zbigniew W. Ras

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bustio, L., Cumplido, R., Hernández, R., Bande, J.M., Feregrino, C. (2016). Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science(), vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_3

Download citation

DOI: https://doi.org/10.1007/978-3-319-39315-5_3
Published: 18 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39314-8
Online ISBN: 978-3-319-39315-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics