Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets

Published: 27 May 2017 Publication History

Abstract

Stream processing has become extremely popular for analyzing huge volumes of data for a variety of applications, including IoT, social networks, retail, and software logs analysis. Streams of data are produced continuously and are mined to extract patterns characterizing the data. A class of data mining algorithm, called generate-and-test, produces a set of candidate patterns that are then evaluated over data. The main challenges of these algorithms are to achieve high throughput, low latency, and reduced power consumption. In this article, we present a novel power-efficient, fast, and versatile hardware architecture whose objective is to monitor a set of target patterns to maintain their frequency over a stream of data. This accelerator can be used to accelerate data-mining algorithms, including itemsets and sequences mining.
The massive fine-grain reconfiguration capability of field-programmable gate array (FPGA) technologies is ideal to implement the high number of pattern-detection units needed for these intensive data-mining applications. We have thus designed and implemented an IP that features high-density FPGA occupation and high working frequency. We provide detailed description of the IP internal micro-architecture and its actual implementation and optimization for the targeted FPGA resources. We validate our architecture by developing a co-designed implementation of the Apriori Frequent Itemset Mining (FIM) algorithm, and perform numerous experiments against existing hardware and software solutions. We demonstrate that FIM hardware acceleration is particularly efficient for large and low-density datasets (i.e., long-tailed datasets). Our IP reaches a data throughput of 250 million items/s and monitors up to 11.6k patterns simultaneously, on a prototyping board that overall consumes 24W in the worst case. Furthermore, our hardware accelerator remains generic and can be integrated to other generate and test algorithms.

References

[1]
Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In Proceedings of the International Conference on Very Large Databases (VLDB’94), Vol. 1215. 487--499.
[2]
Chris Anderson. 2006. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion.
[3]
Z. K. Baker and V. K. Prasanna. 2005. Efficient hardware data mining with the Apriori algorithm on FPGAs. In Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. 3--12.
[4]
Z. K. Baker and V. K. Prasanna. 2006. An architecture for efficient hardware data mining using reconfigurable computing systems. In Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. 67--75.
[5]
Christian Borgelt. 2003. Efficient implementations of apriori and eclat. In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations.
[6]
Lázaro Bustio, René Cumplido, Raudel Hernández, José M. Bande, and Claudia Feregrino. 2016. Proceedings of the New Frontiers in Mining Complex Patterns: 4th International Workshop (NFMCP’16). Springer International Publishing, 32--45.
[7]
Octavian Cret, Zsolt Mathe, Paul Ciobanu, Sonia Marginean, and Adrian Darabant. 2009. A hardware algorithm for the exact subsequence matching problem in DNA strings. Roman. J. Inform. Sci. Technol. 12, 1 (2009), 51--67.
[8]
FIMI Repository. 2003. Frequent Itemset Mining Dataset Repository. Retrieved from http://fimi.ua.ac.be/data/.
[9]
Xiaoqi Gu, Yongxin Zhu, Shengyan Zhou, Chaojun Wang, Meikang Qiu, and Guoxing Wang. 2016. A real-time FPGA-based accelerator for ECG analysis and diagnosis using association-rule mining. ACM Trans. Embed. Comput. Syst. 15, 2 (2016), 25.
[10]
Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. In ACM Sigmod Record, Vol. 29. ACM, 1--12.
[11]
IBM. 2012. IBM Quest Synthetic Data Generator. (2012). Retrieved from http://sourceforge.net/projects/ibmquestdatagen/.
[12]
M. Jacobsen, D. Richmond, M. Hogains, and R. Kastner. 2015. RIFFA 2.1: A reusable integration framework for FPGA accelerators. ACM Trans. Reconfig. Technol. Syst. 8, 4 (Sept. 2015).
[13]
Micron, Inc. 2016. Micron Automata Developer Portal - Hardware. (2016). Retrieved from http://www.micronautomata.com/hardware.
[14]
Micron Technology, Inc. 2013. Micron Automata Processor—A Brief Introduction.
[15]
V. B. Nikam and B. B. Meshram. 2014. Scalable frequent itemset mining using heterogeneous computing: ParApriori algorithm. Int. J. Distrib. Parallel Syst. 5, 5 (2014), 13.
[16]
Shaobo Shi, Yue Qi, and Qin Wang. 2013. FPGA acceleration for intersection computation in frequent itemset mining. In Proceedings of the 2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery. 514--519.
[17]
Song Sun, M. Steffen, and J. Zambreno. 2008. A reconfigurable platform for frequent pattern mining. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs. 55--60.
[18]
S. Sun and J. Zambreno. 2011. Design and analysis of a reconfigurable platform for frequent pattern mining. IEEE Trans. Parallel Distrib. Syst. 22, 9, 1497--1505.
[19]
D. W. Thoni and Alfred Strey. 2009. Novel strategies for hardware acceleration of frequent itemset mining with the apriori algorithm. In Proceedings of the 2009 International Conference on Field Programmable Logic and Applications.
[20]
Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. 2003. LCM: An efficient algorithm for enumerating frequent closed item sets. In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations.
[21]
Takeaki Uno, Masashi Kiyomi, and Hiroki Arimura. 2005. LCM Ver.3: Collaboration of array, bitmap and prefix tree for frequent itemset mining. In Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations (OSDM’05). ACM, 77--86.
[22]
Ke Wang, Yanjun Qi, J. J. Fox, M. R. Stan, and K. Skadron. 2015. Association rule mining with the micron automata processor. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium. 689--699.
[23]
Ying-Hsiang Wen, Jen-Wei Huang, and Ming-Syan Chen. 2008. Hardware-enhanced association rule mining with hashing and pipelining. IEEE Trans. Knowl. Data Eng. 20, 6, 784--795.
[24]
Xilinx Inc. 2015. Device Reliability Report—First Half 2015. Technical Report.
[25]
Osmar Zaiane. 2014. Rich Data: Risks, Issues, Controversies 8 Hype. Keynote speech at the International Conference on Advanced Data Mining and Applications.
[26]
Mohammed J. Zaki. 2000. Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12, 3, 372--390.
[27]
Mohammed J. Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li. 1997. New algorithms for fast discovery of association rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. 283--286.
[28]
Fan Zhang, Yan Zhang, and Jason D. Bakos. 2013a. Accelerating frequent itemset mining on graphics processing units. J. Supercomput. 66, 1, 94--117.
[29]
Yan Zhang, Fan Zhang, Zheming Jin, and Jason D. Bakos. 2013b. An FPGA-based accelerator for frequent itemset mining. ACM Trans. Reconfig. Technol. Syst. 6, 1, Article 2.

Cited By

View all
  • (2023)Fast approximation of the top‐k items in data streams using FPGAsIET Computers & Digital Techniques10.1049/cdt2.1205317:2(60-73)Online publication date: 19-Feb-2023
  • (2020)On the design of hardware architectures for parallel frequent itemsets miningExpert Systems with Applications10.1016/j.eswa.2020.113440157(113440)Online publication date: Nov-2020
  • (2019)Accelerating Itemset Sampling using Satisfiability Constraints on FPGA2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE.2019.8714932(1046-1051)Online publication date: Mar-2019
  • Show More Cited By

Index Terms

  1. Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image ACM Transactions on Reconfigurable Technology and Systems
        ACM Transactions on Reconfigurable Technology and Systems  Volume 10, Issue 3
        September 2017
        187 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/3102109
        • Editor:
        • Steve Wilton
        Issue’s Table of Contents
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 27 May 2017
        Accepted: 01 December 2016
        Revised: 01 September 2016
        Received: 01 June 2016
        Published in TRETS Volume 10, Issue 3

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. Apriori algorithm
        2. Data mining
        3. FPGA architecture
        4. frequent itemset mining
        5. hardware accelerator
        6. sequence mining
        7. stream mining

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Grenoble Alpes Métropole through the Nano2017 Esprit project

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)9
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 16 Feb 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2023)Fast approximation of the top‐k items in data streams using FPGAsIET Computers & Digital Techniques10.1049/cdt2.1205317:2(60-73)Online publication date: 19-Feb-2023
        • (2020)On the design of hardware architectures for parallel frequent itemsets miningExpert Systems with Applications10.1016/j.eswa.2020.113440157(113440)Online publication date: Nov-2020
        • (2019)Accelerating Itemset Sampling using Satisfiability Constraints on FPGA2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE.2019.8714932(1046-1051)Online publication date: Mar-2019
        • (2019)Application of data mining for spare parts information in maintenance schedule: a case studyJournal of Manufacturing Technology Management10.1108/JMTM-09-2018-0303ahead-of-print:ahead-of-printOnline publication date: 25-Jul-2019

        View Options

        Login options

        Full Access

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media