DOI: 10.1145/3448016.3452757
short-paper

FeatTS: Feature-based Time Series Clustering

Published: 18 June 2021

Abstract

Clustering time series is a recurrent problem in real-life applications involving data science and data analytics pipelines. Existing time series clustering algorithms are ineffective for feature-rich real-world time series, since they either compare the time series based only on raw data or use a fixed set of features to determine similarity. In this paper, we showcase FeatTS, a feature-based semi-supervised clustering framework addressing the above issues for variable-length and heterogeneous time series. Specifically, FeatTS leverages a graph encoding of the time series that is obtained by considering a large number of significant extracted features. It then employs community detection and builds upon a co-occurrence matrix in order to unify all the best clustering results. We let the user explore the various steps of FeatTS by visualizing the initial data, its graph encoding and its division into communities, along with the obtained clusters. We show how the user can interact with the process to choose the features and to vary the percentage of input labels and the various parameters. In view of its characteristics, FeatTS outperforms the state-of-the-art clustering methods and is the first to be able to digest domain-specific time series such as healthcare time series, while still being robust and scalable.
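
The idea of unifying several clustering results through a co-occurrence matrix can be illustrated with a minimal sketch. This is not the FeatTS implementation: the function names co_occurrence and consensus_clusters, the 0.5 threshold, and the connected-components merge are assumptions made only for illustration. Each input labeling stands for one clustering result (for example, the communities found on one feature-specific graph), and the matrix records the fraction of results in which two time series end up in the same group.

```python
from itertools import combinations


def co_occurrence(labelings):
    """Fraction of labelings in which items i and j share a label."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(labelings)
    return [[c / total for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Merge items whose co-occurrence exceeds the threshold.

    Hypothetical consensus rule: connected components of the
    thresholded matrix, computed with a small union-find.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three toy clustering results over four time series (indices 0..3).
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_occurrence(labelings)
print(consensus_clusters(matrix))
```

In this toy input, series 0 and 1 are grouped together in all three results and series 2 and 3 in two of three, so both pairs pass the threshold, while every cross pair falls below it. Label values need not match across results, since only same-group membership is counted.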

Supplementary Material

MP4 File (3448016.3452757.mp4)
The problem of clustering time series has several applications in real-life contexts, especially in data science and data analytics pipelines. Existing time series clustering algorithms are ineffective for feature-rich real-world time series since they only compute the similarity of time series based on raw data or use a fixed set of features. In this paper, we showcase FeatTS, a feature-based semi-supervised clustering framework addressing the above issues for variable-length and heterogeneous time series. Specifically, it first relies on a graph encoding of the time series that is obtained by considering a large number of significant extracted features. It then employs community detection and leverages a co-occurrence matrix in order to group together all the best clustering results. We let the user delve into the various steps of FeatTS by visualizing the initial data, its graph encoding and its division into communities, along with the obtained clusters. We show how the user can interact with the process to choose the features and to vary the percentage of input labels and the parameters. In view of its characteristics, FeatTS outperforms the state-of-the-art clustering methods and is the first to be able to digest domain-specific time series such as healthcare time series, while still being robust and scalable.

Information & Contributors

Information

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering for data science
  2. community detection
  3. features selection
  4. semi-supervised clustering

Qualifiers

  • Short-paper

Conference

SIGMOD/PODS '21
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 111
  • Downloads (last 6 weeks): 10
Reflects downloads up to 16 Nov 2024

Cited By

  • (2024) From Peaks to Troughs. Using Strategy Analytics for Business Value Creation and Competitive Advantage, 188-206. DOI: 10.4018/979-8-3693-2823-1.ch009. Online publication date: 28-Jun-2024
  • (2024) DARKER: Efficient Transformer with Data-Driven Attention Mechanism for Time Series. Proceedings of the VLDB Endowment 17:11, 3229-3242. DOI: 10.14778/3681954.3681996. Online publication date: 30-Aug-2024
  • (2024) Improving Building Temperature Forecasting: A Data-driven Approach with System Scenario Clustering. 2024 IEEE Power & Energy Society General Meeting (PESGM), 1-5. DOI: 10.1109/PESGM51994.2024.10689186. Online publication date: 21-Jul-2024
  • (2024) Power Profile Monitoring and Tracking Evolution of System-Wide HPC Workloads. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 93-104. DOI: 10.1109/ICDCS60910.2024.00018. Online publication date: 23-Jul-2024
  • (2024) Unsupervised feature selection using chronological fitting with Shapley Additive explanation (SHAP) for industrial time-series anomaly detection. Applied Soft Computing 155:C. DOI: 10.1016/j.asoc.2024.111426. Online publication date: 2-Jul-2024
  • (2023) Querying Similar Multi-Dimensional Time Series with a Spatial Database. ISPRS International Journal of Geo-Information 12:4, 179. DOI: 10.3390/ijgi12040179. Online publication date: 21-Apr-2023
  • (2023) A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs. Information Fusion 93:C, 1-20. DOI: 10.1016/j.inffus.2022.12.017. Online publication date: 1-May-2023
  • (2023) PLAHS. Applied Soft Computing 147:C. DOI: 10.1016/j.asoc.2023.110718. Online publication date: 1-Nov-2023
  • (2022) Time2Feat. Proceedings of the VLDB Endowment 16:2, 193-201. DOI: 10.14778/3565816.3565822. Online publication date: 1-Oct-2022
