FeatTS: Feature-based Time Series Clustering

Authors:

Donato Tiano,

Angela Bonifati,

Raymond Ng

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 2784 - 2788

https://doi.org/10.1145/3448016.3452757

Published: 18 June 2021

Abstract

Clustering time series is a recurrent problem in real-life applications involving data science and data analytics pipelines. Existing time series clustering algorithms are ineffective for feature-rich real-world time series, since they either compare the time series based only on raw data or use a fixed set of features to determine similarity. In this paper, we showcase FeatTS, a feature-based semi-supervised clustering framework addressing the above issues for variable-length and heterogeneous time series. Specifically, FeatTS leverages a graph encoding of the time series that is obtained by considering a large number of significant extracted features. It then employs community detection and builds upon a co-occurrence matrix in order to unify all the best clustering results. We let the user explore the various steps of FeatTS by visualizing the initial data, its graph encoding, and its division into communities, along with the obtained clusters. We show how the user can interact with the process by choosing the features and by varying the percentage of input labels and the various parameters. In view of its characteristics, FeatTS outperforms the state-of-the-art clustering methods and is the first to be able to digest domain-specific time series such as healthcare time series, while still being robust and scalable.
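The abstract does not spell out how the co-occurrence matrix unifies the clustering results, so the following is only a minimal sketch of the general idea, not the FeatTS implementation. It assumes each candidate clustering is given as a list of labels over the same time series, scores each pair by the fraction of clusterings that group it together, and then (as a stand-in consensus step chosen for illustration) links pairs whose score exceeds a threshold and reads off connected components. The function names, the toy labels, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def co_occurrence_matrix(clusterings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        if len(labels) != n:
            raise ValueError("all clusterings must label the same items")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                matrix[i][j] += 1.0
                matrix[j][i] += 1.0
    k = len(clusterings)
    for i in range(n):
        for j in range(n):
            matrix[i][j] /= k
        matrix[i][i] = 1.0  # an item always co-occurs with itself
    return matrix


def consensus_groups(matrix, threshold=0.5):
    """Connected components of the graph linking pairs scored above threshold.

    This consensus rule is an assumption made for the sketch.
    """
    n = len(matrix)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for other in range(n):
                if other not in seen and matrix[node][other] > threshold:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# Three hypothetical clusterings of six time series (toy labels, not real data).
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
matrix = co_occurrence_matrix(clusterings)
groups = consensus_groups(matrix)
print(groups)
```

Note that the pair scores depend only on whether two series share a label within a clustering, so the third toy clustering agrees with the first even though its label values are swapped.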

Supplementary Material

MP4 File (3448016.3452757.mp4)

The problem of clustering time series has several applications in real-life contexts, especially in data science and data analytics pipelines. Existing time series clustering algorithms are ineffective for feature-rich real-world time series since they only compute the similarity of time series based on raw data or use a fixed set of features. In this paper, we showcase FeatTS, a feature-based semi-supervised clustering framework addressing the above issues for variable-length and heterogeneous time series. Specifically, it first relies on a graph encoding of the time series that is obtained by considering a high number of significant extracted features. It then employs community detection and leverages a co-occurrence matrix in order to group together all the best clustering results. We let the user delve into the various steps of FeatTS by visualizing the initial data, its graph encoding and its division into communities along with the obtained clusters. We show how the user can interact with the process for the choice of the features and for varying the percentage of input labels and the parameters. In view of its characteristics, FeatTS outperforms the state-of-the-art clustering methods and is the first to be able to digest domain-specific time series such as healthcare time series, while still being robust and scalable.

Index Terms

FeatTS: Feature-based Time Series Clustering
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
    2. Machine learning algorithms
      1. Feature selection
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Time series analysis

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

June 2021

2969 pages

ISBN: 9781450383431

DOI: 10.1145/3448016

General Chairs: Guoliang Li, Tsinghua University (China); Zhanhuai Li, Northwestern Polytechnical University (China)

Program Chairs: Stratos Idreos, Harvard University (USA); Divesh Srivastava, AT&T (USA)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Agence Nationale de la Recherche

Conference

SIGMOD/PODS '21

Sponsor:

SIGMOD

SIGMOD/PODS '21: International Conference on Management of Data

June 20 - 25, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
