Article

Model-shared subspace boosting for multi-label classification

Authors:

John R. Smith

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 834-843

https://doi.org/10.1145/1281192.1281281

Published: 12 August 2007

Abstract

Typical approaches to the multi-label classification problem require learning an independent classifier for every label from all the examples and features. This can become a computational bottleneck for sizeable datasets with a large label space. In this paper, we propose an efficient and effective multi-label learning algorithm called model-shared subspace boosting (MSSBoost) as an attempt to reduce the information redundancy in the learning process. This algorithm automatically finds, shares, and combines a number of base models across multiple labels, where each model is learned from a random feature subspace and bootstrap data samples. The decision functions for each label are jointly estimated, so a small number of shared subspace models can support the entire label space. Our experimental results on both synthetic data and real multimedia collections demonstrate that the proposed algorithm can achieve better classification performance than the non-ensemble baseline classifiers, with a significant speedup in the learning and prediction processes. It can also use a smaller number of base models to achieve the same classification performance as its non-model-shared counterpart.
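To make the model-sharing idea concrete, here is a toy Python sketch of boosting over one base-model pool shared by all labels. It is not the paper's MSSBoost procedure: the decision-stump base learner, the round-robin choice of which label trains each new pooled model, the discrete AdaBoost weighting, and the function names (`train_stump`, `fit_shared_boost`, `predict`) are illustrative assumptions. What it does show is the structure the abstract describes: each base model is fit on a random feature subspace and a bootstrap sample, enters a single shared pool, and every label builds its additive decision function by reusing models from that pool rather than training a private set.

```python
import math
import random

# Illustrative sketch only: the stump learner, round-robin label choice and
# discrete AdaBoost weighting are assumptions, not the paper's exact MSSBoost.


def train_stump(X, y, w, idx, feats):
    """Fit a one-feature threshold stump on rows `idx`, searching only `feats`.

    Returns (feature, threshold, polarity) minimising weighted 0/1 error.
    Rows may repeat in `idx` (bootstrap), so duplicates count multiple times.
    """
    total = sum(w[i] for i in idx)
    best = (float("inf"), feats[0], 0, 1)  # (error, feature, threshold, polarity)
    for f in feats:
        for t in sorted({X[i][f] for i in idx}):
            # Weighted error of predicting +1 when x[f] > t, else -1.
            err = sum(w[i] for i in idx if (1 if X[i][f] > t else -1) != y[i]) / total
            # Flipping the polarity turns error e into 1 - e.
            for pol, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, f, t, pol)
    return best[1:]


def stump_predict(stump, x):
    f, t, pol = stump
    return pol if x[f] > t else -pol


def fit_shared_boost(X, Y, n_rounds=20, subspace=2, seed=0):
    """Boost every label over one pool of subspace/bootstrap stumps (toy sketch)."""
    rng = random.Random(seed)
    n, d, L = len(X), len(X[0]), len(Y[0])
    weights = [[1.0 / n] * n for _ in range(L)]  # one AdaBoost distribution per label
    pool = []                                     # base models shared by all labels
    ensembles = [[] for _ in range(L)]            # per label: list of (alpha, pool index)
    for r in range(n_rounds):
        # 1. Grow the shared pool by one stump, trained for a round-robin label
        #    on a bootstrap sample restricted to a random feature subspace.
        l_new = r % L
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), min(subspace, d))
        y_new = [Y[i][l_new] for i in range(n)]
        pool.append(train_stump(X, y_new, weights[l_new], idx, feats))
        # 2. Each label reuses whichever pooled model has the lowest weighted error.
        for l in range(L):
            y = [Y[i][l] for i in range(n)]
            w = weights[l]
            errs = [sum(w[i] for i in range(n) if stump_predict(m, X[i]) != y[i])
                    for m in pool]
            k = min(range(len(pool)), key=errs.__getitem__)
            err = min(max(errs[k], 1e-10), 1 - 1e-10)
            if err >= 0.5:
                continue  # nothing in the pool beats chance for this label yet
            alpha = 0.5 * math.log((1 - err) / err)
            ensembles[l].append((alpha, k))
            # 3. Discrete AdaBoost reweighting for label l, then renormalise.
            new_w = [w[i] * math.exp(-alpha * y[i] * stump_predict(pool[k], X[i]))
                     for i in range(n)]
            s = sum(new_w)
            weights[l] = [wi / s for wi in new_w]
    return pool, ensembles


def predict(pool, ensembles, x):
    """Sign of each label's weighted vote over its selected shared models."""
    return [1 if sum(a * stump_predict(pool[k], x) for a, k in ens) >= 0 else -1
            for ens in ensembles]


# Hypothetical toy data: 3 binary features; label 0 copies feature 0 and
# label 1 copies feature 1 (mapped to -1/+1); feature 2 is irrelevant.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
Y = [[1 if x[0] else -1, 1 if x[1] else -1] for x in X]
pool, ensembles = fit_shared_boost(X, Y, n_rounds=20, subspace=2, seed=0)
print(len(pool), [len(e) for e in ensembles])
print(predict(pool, ensembles, [1, 0, 1]))
```

Because every label draws from the same pool, the sketch trains `n_rounds` stumps in total, whereas a non-shared, per-label version of the same loop would train `n_rounds × L`. That difference in the number of base models learned is the kind of saving the abstract points to, although this toy still scores every pooled model for every label in each round.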

Cited By

Awal Kassim M, Viktor H, Michalowski W (2024). Multi-Label Lifelong Machine Learning: A Scoping Review of Algorithms, Techniques, and Applications. IEEE Access 12, 74539-74557. Online publication date: 2024. https://doi.org/10.1109/ACCESS.2024.3403569
Wang W, Zhang X, Fu H (2024). Personalized Medicine with Multiple Treatments. Statistics in Precision Health, 131-161. Online publication date: 25-Jun-2024. https://doi.org/10.1007/978-3-031-50690-1_6
Mokhberi M, Biswas A, Masud Z, Kteily-Hawa R, Goldstein A, Gillis J, Rayana S, Ahmed S (2023). Development of a COVID-19–Related Anti-Asian Tweet Data Set: Quantitative Study. JMIR Formative Research 7, e40403. Online publication date: 28-Feb-2023. https://doi.org/10.2196/40403

Index Terms

Model-shared subspace boosting for multi-label classification
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Recommendations

Clustered intrinsic label correlations for multi-label classification

The classifier for each label consists of a label-specific part and a shared one. The label-specific part characterizes the corresponding label. The shared part represents the information shared by all labels. Intrinsic label correlations are represented ...
Improving multi-label classification using semi-supervised learning and dimensionality reduction
PRICAI'12: Proceedings of the 12th Pacific Rim international conference on Trends in Artificial Intelligence

Multi-label classification has been increasingly recognized since it can assign multiple class labels to an object. This paper proposes a new method to solve simultaneously two major problems in multi-label classification; (1) requirement of sufficient ...
Incorporating label dependency into the binary relevance framework for multi-label classification

In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label ...


Information

Published In


KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2007

1080 pages

ISBN:9781595936097

DOI:10.1145/1281192

General Chair: Pavel Berkhin (Yahoo!, USA)

Program Chairs: Rich Caruana (Cornell University, USA), Xindong Wu (University of Vermont, USA)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 August 2007

Qualifiers

Article

Conference

KDD07: The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 12-15, 2007

San Jose, California, USA

Acceptance Rates

KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total citations: 87

Total downloads: 1,335

Downloads (last 12 months): 22

Downloads (last 6 weeks): 5

Reflects downloads up to 25 Nov 2024

Citations

Malik S, Idrees M, Danish H, Ahmad A, Khalid S, Shahzad S (2023). Classification of Call Transcriptions. VAWKUM Transactions on Computer Sciences 11:2, 18-34. Online publication date: 7-Oct-2023. https://doi.org/10.21015/vtcs.v11i2.1591
Dong H, Wang F, He D, Liu Y (2023). Decision system for copper flotation backbone process. Engineering Applications of Artificial Intelligence 123, 106410. Online publication date: Aug-2023. https://doi.org/10.1016/j.engappai.2023.106410
Dong H, Wang F, He D, Liu Y (2022). The intelligent decision-making of copper flotation backbone process based on CK-XGBoost. Knowledge-Based Systems 243:C. Online publication date: 11-May-2022. https://dl.acm.org/doi/10.1016/j.knosys.2022.108429
Li L, Cao P, Yang J, Zaiane O (2022). Modeling global and local label correlation with graph convolutional networks for multi-label chest X-ray image classification. Medical & Biological Engineering & Computing 60:9, 2567-2588. Online publication date: 4-Jul-2022. https://doi.org/10.1007/s11517-022-02604-1
Li X, Guo Y (2022). Bi-directional Representation Learning for Multi-label Classification. Machine Learning and Knowledge Discovery in Databases, 209-224. Online publication date: 10-Mar-2022. https://dl.acm.org/doi/10.1007/978-3-662-44851-9_14
Xie J, Li S, Sun P (2022). Analysis and Detection Against Overlapping Phenomenon of Behavioral Attribute in Network Attacks. Science of Cyber Security, 217-232. Online publication date: 30-Sep-2022. https://doi.org/10.1007/978-3-031-17551-0_14
Ghawi R, Pfeffer J (2021). A Hybrid Thresholding Strategy combining RCut and PCut for Multi-label Classification. The 23rd International Conference on Information Integration and Web Intelligence, 278-287. Online publication date: 29-Nov-2021. https://dl.acm.org/doi/10.1145/3487664.3487702
