Abstract
Virtually all existing multi-task learning methods for string data require either domain-specific knowledge to extract feature representations or careful tuning of many input parameters. In this work, we propose a feature-free and parameter-light multi-task clustering algorithm for string data. To transfer knowledge between different domains, a novel dictionary-based compression dissimilarity measure is proposed. Experimental results with extensive comparisons demonstrate the generality and effectiveness of our proposal.
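The abstract does not spell out the measure itself. As a minimal, hedged sketch of the general idea behind compression-based dissimilarity (using `zlib` as a stand-in compressor; the paper's own measure is dictionary-based and differs in detail), one can compare how well two strings compress jointly versus separately:

```python
import zlib

def compression_dissimilarity(x: bytes, y: bytes) -> float:
    """Generic compression-based dissimilarity: C(xy) / (C(x) + C(y)).

    Values near 0.5 suggest the strings share structure (compressing
    them together costs little extra); values near 1 suggest they are
    unrelated. zlib is an illustrative stand-in, not the paper's method.
    """
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return cxy / (cx + cy)

# Similar strings score lower than dissimilar ones.
a = b"ab" * 200
b = b"cd" * 200
assert compression_dissimilarity(a, a) < compression_dissimilarity(a, b)
```

Because such a measure needs no hand-crafted features and no compressor tuning beyond defaults, it fits the "feature-free and parameter-light" goal stated above.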
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Thach, N.H., Shao, H., Tong, B., Suzuki, E. (2011). A Compression-Based Dissimilarity Measure for Multi-task Clustering. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_14
Print ISBN: 978-3-642-21915-3
Online ISBN: 978-3-642-21916-0