DOI: 10.5555/1315451.1315461
Article

A regression-based temporal pattern mining scheme for data streams

Published: 09 September 2003

Abstract

In this paper we devise a regression-based algorithm, FTP-DS (Frequent Temporal Patterns of Data Streams), to mine frequent temporal patterns from data streams. While providing a general framework for pattern frequency counting, FTP-DS has two major features: a single data scan for online statistics collection, and a regression-based compact pattern representation. To achieve the single data scan, data segmentation and pattern growth scenarios are explored for frequency counting: FTP-DS scans online transaction flows and generates candidate frequent patterns in real time. To meet the space constraint, we devise a compact ATF (Accumulated Time and Frequency) form for pattern representation, which aggregates all the information required for regression analysis. In addition, we develop segmentation tuning and segment relaxation techniques to enhance FTP-DS. With these features, FTP-DS can not only mine with variable time intervals but also detect trends effectively. Synthetic data and a real dataset of network alarm logs from a major telecommunication company are used to verify the feasibility of FTP-DS.
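The abstract does not spell out the fields of the ATF form, so the following is only a minimal sketch of the general idea it describes: keeping a constant-size summary per pattern that is sufficient for regression analysis, updated in one pass. The class name `RunningRegression`, its fields, and the choice of simple least-squares regression of frequency against time are assumptions made for illustration, not the paper's actual definition.

```python
from dataclasses import dataclass


@dataclass
class RunningRegression:
    """Hypothetical constant-space summary for one pattern (not the paper's ATF)."""
    n: int = 0            # number of (time, frequency) observations
    sum_t: float = 0.0    # sum of time values
    sum_f: float = 0.0    # sum of frequency values
    sum_tt: float = 0.0   # sum of time squared
    sum_tf: float = 0.0   # sum of time * frequency

    def add(self, t: float, f: float) -> None:
        """Fold one observation into the summary; the raw point is not stored."""
        self.n += 1
        self.sum_t += t
        self.sum_f += f
        self.sum_tt += t * t
        self.sum_tf += t * f

    def merge(self, other: "RunningRegression") -> "RunningRegression":
        """Combine two summaries, e.g. for adjacent time segments."""
        return RunningRegression(
            self.n + other.n,
            self.sum_t + other.sum_t,
            self.sum_f + other.sum_f,
            self.sum_tt + other.sum_tt,
            self.sum_tf + other.sum_tf,
        )

    def fit(self):
        """Return (slope, intercept) of the least-squares line, or None if undefined."""
        denom = self.n * self.sum_tt - self.sum_t ** 2
        if self.n < 2 or denom == 0:
            return None
        slope = (self.n * self.sum_tf - self.sum_t * self.sum_f) / denom
        intercept = (self.sum_f - slope * self.sum_t) / self.n
        return slope, intercept


# Usage: summarize two segments separately, then merge them into a wider interval.
early, late = RunningRegression(), RunningRegression()
for t, f in [(1, 3), (2, 5)]:
    early.add(t, f)
for t, f in [(3, 7), (4, 9)]:
    late.add(t, f)
trend = early.merge(late).fit()
```

Because the five sums are additive, summaries of adjacent segments can be merged without revisiting the stream, which is one plausible way a compact form could support variable time intervals; a positive or negative fitted slope then serves as a simple trend indicator.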

Published In

VLDB '03: Proceedings of the 29th international conference on Very large data bases - Volume 29
September 2003
1134 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment

Qualifiers

  • Article

Conference

VLDB '03
Sponsor:
  • VLDB Endowment
VLDB '03: Very large data bases
September 9 - 12, 2003
Berlin, Germany

Cited By

  • (2018) Maximally informative k-itemset mining from massively distributed data streams. Proceedings of the 33rd Annual ACM Symposium on Applied Computing (502-509). DOI: 10.1145/3167132.3167187. Online publication date: 9-Apr-2018
  • (2017) A highly scalable parallel algorithm for maximally informative k-itemset mining. Knowledge and Information Systems 50:1 (1-26). DOI: 10.1007/s10115-016-0931-2. Online publication date: 1-Jan-2017
  • (2014) Plan, Activity, and Intent Recognition. Online publication date: 10-Mar-2014
  • (2013) Sequential pattern mining -- approaches and algorithms. ACM Computing Surveys 45:2 (1-39). DOI: 10.1145/2431211.2431218. Online publication date: 12-Mar-2013
  • (2012) Detection of variable length anomalous subsequences in data streams. International Journal of Intelligent Information and Database Systems 6:3 (273-288). DOI: 10.1504/IJIIDS.2012.047005. Online publication date: 1-May-2012
  • (2011) Incremental mining of closed inter-transaction itemsets over data stream sliding windows. Journal of Information Science 37:2 (208-220). DOI: 10.1177/0165551511401539. Online publication date: 1-Apr-2011
  • (2011) A false negative maximal frequent itemset mining algorithm over stream. Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I (29-41). DOI: 10.1007/978-3-642-25853-4_3. Online publication date: 17-Dec-2011
  • (2010) An efficient approach for mining segment-wise intervention rules in time-series streams. Proceedings of the 11th international conference on Web-age information management (238-249). DOI: 10.5555/1884017.1884049. Online publication date: 15-Jul-2010
  • (2010) Discovering highly informative feature sets from data streams. Proceedings of the 21st international conference on Database and expert systems applications: Part I (91-104). DOI: 10.5555/1881867.1881877. Online publication date: 30-Aug-2010
  • (2010) Modeling and prediction of moving region trajectories. Proceedings of the ACM SIGSPATIAL International Workshop on GeoStreaming (23-30). DOI: 10.1145/1878500.1878507. Online publication date: 2-Nov-2010
