research-article

Batch Mode Active Learning for Networked Data

Authors:

Jie Tang

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 3, Issue 2

Article No.: 33, Pages 1 - 25

https://doi.org/10.1145/2089094.2089109

Published: 01 February 2012

Abstract

We study a novel problem: batch mode active learning for networked data. In this setting, data instances are connected by links and their labels are correlated with each other. The goal is to exploit both the link-based dependencies and the node-specific content information to actively select a batch of instances to query the user, so as to learn an accurate model for labeling the unknown instances in the network. We present three criteria (minimum redundancy, maximum uncertainty, and maximum impact) to quantify the informativeness of a set of instances, and formalize the problem as selecting the set of instances that maximizes an objective function combining link and content information. Because maximizing this objective is NP-hard, we present an efficient algorithm that optimizes it with a bounded approximation ratio. To scale to large real-world networks, we develop a parallel implementation of the algorithm. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of our approach.
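
To make the selection problem concrete, the following is a minimal sketch of greedy batch selection on a small graph. It is not the algorithm from the article, whose objective is not given in the abstract. The scoring terms are assumptions chosen only to mirror the three named criteria: prediction entropy stands in for uncertainty, the count of not-yet-covered neighbours stands in for impact, and the count of links to already selected nodes stands in for redundancy. The names `greedy_batch`, `alpha`, `beta`, `EXAMPLE_PROBS`, and `EXAMPLE_ADJ` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def greedy_batch(probs, adj, k, alpha=1.0, beta=1.0):
    """Greedily pick up to k nodes by marginal gain of a toy objective.

    probs: dict mapping node -> list of predicted class probabilities
    adj:   dict mapping node -> set of neighbouring nodes
    The gain of a candidate is an assumed stand-in, not the article's formula:
    uncertainty + alpha * impact - beta * redundancy.
    """
    selected = []
    covered = set()          # selected nodes plus all of their neighbours
    candidates = set(probs)
    while candidates and len(selected) < k:
        best, best_gain = None, -math.inf
        for v in sorted(candidates):      # sorted for deterministic ties
            nbrs = adj.get(v, set())
            uncertainty = entropy(probs[v])
            impact = len(nbrs - covered)  # neighbours not yet reached
            redundancy = sum(1 for u in selected if u in nbrs)
            gain = uncertainty + alpha * impact - beta * redundancy
            if gain > best_gain:
                best, best_gain = v, gain
        selected.append(best)
        candidates.remove(best)
        covered |= {best} | adj.get(best, set())
    return selected


# Hypothetical toy data: node "a" is a hub linked to "b" and "c"; "d" is isolated.
EXAMPLE_PROBS = {
    "a": [0.5, 0.5],
    "b": [0.5, 0.5],
    "c": [0.9, 0.1],
    "d": [0.6, 0.4],
}
EXAMPLE_ADJ = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}

if __name__ == "__main__":
    print(greedy_batch(EXAMPLE_PROBS, EXAMPLE_ADJ, k=2))  # prints ['a', 'd']
```

In this toy run the hub `a` is chosen first because it is both uncertain and well connected. The second pick is the isolated node `d` rather than the equally uncertain `b`, since `b` is linked to `a` and adds no new coverage. The sketch makes no claim about approximation guarantees for this particular objective.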

Index Terms

Batch Mode Active Learning for Networked Data
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval

Published In

ACM Transactions on Intelligent Systems and Technology Volume 3, Issue 2

February 2012

455 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/2089094

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2012

Accepted: 01 August 2011

Revised: 01 June 2011

Received: 01 March 2011

Published in TIST Volume 3, Issue 2

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 25
Total Downloads: 409
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Oct 2024

Cited By

Zhang K, Qian B, Wei J, Yin C, Cao S, Li X, Cao Y, Zheng Q (2023). Adaptive batch mode active learning with deep similarity. Egyptian Informatics Journal 24:4 (100412). Online publication date: Dec-2023.
https://doi.org/10.1016/j.eij.2023.100412
Luo X, Du H, Zhou G, Li X, Mao F, Zhu D, Xu Y, Zhang M, He S, Huang Z (2021). A Novel Query Strategy-Based Rank Batch-Mode Active Learning Method for High-Resolution Remote Sensing Image Classification. Remote Sensing 13:11 (2234). Online publication date: 7-Jun-2021.
https://doi.org/10.3390/rs13112234
Bashar M, Nayak R (2021). Active Learning for Effectively Fine-Tuning Transfer Learning to Downstream Task. ACM Transactions on Intelligent Systems and Technology 12:2 (1-24). Online publication date: 11-Feb-2021.
https://dl.acm.org/doi/10.1145/3446343
Zhou Q, Li L, Wu X, Cao N, Ying L, Tong H (2021). Attent: Active Attributed Network Alignment. Proceedings of the Web Conference 2021 (3896-3906). Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3449886
Li Y, Yin J, Chen L (2021). SEAL: Semisupervised Adversarial Active Learning on Attributed Graphs. IEEE Transactions on Neural Networks and Learning Systems 32:7 (3136-3147). Online publication date: Jul-2021.
https://doi.org/10.1109/TNNLS.2020.3009682
Kumar K, Kumar Kori A (2021). Power Load flow analysis for Active Islanding Mode. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) (1670-1674). Online publication date: 11-Nov-2021.
https://doi.org/10.1109/I-SMAC52330.2021.9640725
Hasan M, Paul S, Mourikis A, Roy-Chowdhury A (2020). Context-Aware Query Selection for Active Learning in Event Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 42:3 (554-567). Online publication date: 1-Mar-2020.
https://doi.org/10.1109/TPAMI.2018.2878696
Das B, Isufi E, Leus G (2020). Active Semi-Supervised Learning for Diffusions on Graphs. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (9075-9079). Online publication date: May-2020.
https://doi.org/10.1109/ICASSP40776.2020.9054300
Luo R, Wang X (2020). Batch Active Learning With Two-Stage Sampling. IEEE Access 8 (46518-46528). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2979315
Bhattacharjee S, Tolone W, Paranjape V (2019). Identifying malicious social media contents using multi-view Context-Aware active learning. Future Generation Computer Systems. Online publication date: May-2019.
https://doi.org/10.1016/j.future.2019.03.015
