Abstract
Virtually all existing multi-task learning methods for string data require either domain-specific knowledge to extract feature representations or careful tuning of many input parameters. In this work, we propose a feature-free and parameter-light multi-task clustering algorithm for string data. To transfer knowledge between different domains, a novel dictionary-based compression dissimilarity measure is proposed. Experimental results with extensive comparisons demonstrate the generality and effectiveness of our proposal.
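The abstract does not spell out the measure itself. As a minimal, hedged sketch of the general idea behind compression-based dissimilarity (using `zlib` as a stand-in compressor; the paper's own measure is dictionary-based and differs in detail), one can compare how well two strings compress jointly versus separately:

```python
import zlib

def compression_dissimilarity(x: bytes, y: bytes) -> float:
    """Generic compression-based dissimilarity: C(xy) / (C(x) + C(y)).

    Values near 0.5 suggest the strings share structure (compressing
    them together costs little extra); values near 1 suggest they are
    unrelated. zlib is an illustrative stand-in, not the paper's method.
    """
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return cxy / (cx + cy)

# Similar strings score lower than dissimilar ones.
a = b"ab" * 200
b = b"cd" * 200
assert compression_dissimilarity(a, a) < compression_dissimilarity(a, b)
```

Because such a measure needs no hand-crafted features and no compressor tuning beyond defaults, it fits the "feature-free and parameter-light" goal stated above.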
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Thach, N.H., Shao, H., Tong, B., Suzuki, E. (2011). A Compression-Based Dissimilarity Measure for Multi-task Clustering. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_14
Print ISBN: 978-3-642-21915-3
Online ISBN: 978-3-642-21916-0