Abstract
For automatic object detection tasks, large numbers of training images are usually labeled so that the object classifiers can be trained more reliably; this is costly, because it requires hiring professionals to label large-scale training image sets. When a large number of object classes are considered, obtaining a sufficient number of labeled training images becomes even more critical. There are three potential solutions to reduce the burden of image labeling: (1) allowing people to provide object labels loosely at the image level rather than at the object level (e.g., loosely-tagged images in which the exact object locations are not identified); (2) harnessing large-scale collaboratively-tagged images that are available on the Internet; and (3) developing new machine learning algorithms that can directly leverage large-scale collaboratively- or loosely-tagged images to train a large number of object classifiers more effectively. Based on these observations, a multi-task multi-label multiple instance learning (MTML-MIL) algorithm is developed in this paper by leveraging both inter-object correlations and large-scale loosely-labeled images for object classifier training. By seamlessly integrating multi-task learning, multi-label learning, and multiple instance learning, our MTML-MIL algorithm can achieve more accurate training of a large number of inter-related object classifiers, where an object network is constructed to determine the inter-related learning tasks directly in the feature space rather than in the label space. Our experimental results show that the MTML-MIL algorithm can achieve higher detection accuracy for automatic object detection.
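To make the loosely-tagged setting concrete, the following is a minimal sketch of the standard multiple instance learning assumption only, not the MTML-MIL algorithm described in this paper. An image is treated as a bag of region feature vectors, and a label is assigned to the image if at least one region scores above a threshold. The linear scorers, the function names (`instance_score`, `bag_scores`, `predict_labels`), the feature values, and the threshold are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: one feature vector per image region (instance),
# and one linear scorer (weights, bias) per object class.
Instance = Sequence[float]
LinearScorer = Tuple[Sequence[float], float]


def instance_score(weights: Sequence[float], bias: float, x: Instance) -> float:
    """Score a single region with a linear model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def bag_scores(bag: List[Instance],
               classifiers: Dict[str, LinearScorer]) -> Dict[str, float]:
    """Image-level score per class: the maximum over all region scores.

    This encodes the standard MIL assumption that a bag is positive
    for a class if at least one of its instances is positive.
    """
    return {
        label: max(instance_score(w, b, x) for x in bag)
        for label, (w, b) in classifiers.items()
    }


def predict_labels(bag: List[Instance],
                   classifiers: Dict[str, LinearScorer],
                   threshold: float = 0.0) -> List[str]:
    """Multi-label prediction: keep every class whose bag score exceeds the threshold."""
    scores = bag_scores(bag, classifiers)
    return sorted(label for label, s in scores.items() if s > threshold)


# Illustrative scorers and images (made-up numbers, not learned from data).
classifiers = {
    "sky": ([1.0, 0.0], -0.5),
    "grass": ([0.0, 1.0], -0.5),
}
image_a = [[0.9, 0.1], [0.2, 0.3]]   # only one region resembles "sky"
image_b = [[0.9, 0.1], [0.1, 0.8]]   # one "sky" region and one "grass" region

labels_a = predict_labels(image_a, classifiers)
labels_b = predict_labels(image_b, classifiers)
```

Because the image-level score is a maximum over regions, the training labels never need to say which region contains the object, which is the property that makes image-level tags usable. The sketch scores each class independently and therefore leaves out the inter-object correlations and the object network that the paper exploits.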
Cite this article
Shen, Y., Fan, Jp. Multi-task multi-label multiple instance learning. J. Zhejiang Univ. - Sci. C 11, 860–871 (2010). https://doi.org/10.1631/jzus.C1001005
DOI: https://doi.org/10.1631/jzus.C1001005
Key words
- Object network
- Loosely tagged images
- Multi-task learning
- Multi-label learning
- Multiple instance learning