Learning from label proportions with pinball loss

Yong Shi²,³,⁴,
Limeng Cui¹,
Zhensong Chen² &
…
Zhiquan Qi ORCID: orcid.org/0000-0001-9289-9110⁵

Abstract

Learning from label proportions is a new kind of learning problem that has drawn much attention in recent years. Unlike the well-known supervised learning setting, it groups instances into bags and uses the label proportion of each bag instead of the label of each instance. Because obtaining instance labels is not always feasible, it has been widely used in areas such as modeling voting behavior and spam filtering. However, learning from label proportions still faces great challenges due to the interference of noise, the improper partition of bags, and so on. In this paper, we propose a novel method for learning from label proportions based on the pinball loss, called “pSVM-pin”, to address these issues. The pinball loss is introduced to obtain an effective classifier that reduces the impact of noise. Experimental results demonstrate the accuracy of pSVM-pin relative to competing methods.

Acknowledgements

We thank the anonymous reviewer for thoroughly reading our manuscript and providing helpful comments. This work is supported by the National Natural Science Foundation of China (Grant nos. 91546201, 71331005, 71110107026, 61402429).

Author information

Authors and Affiliations

School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
Limeng Cui
School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China
Yong Shi & Zhensong Chen
Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
Yong Shi
College of Information Science & Technology, University of Nebraska Omaha, Omaha, NE, USA
Yong Shi
Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, China
Zhiquan Qi

Corresponding author

Correspondence to Zhiquan Qi.

Appendices

Appendix A: Dual problem of pSVM-pin

The problem in Eq. (7) can be transformed into:

$$\begin{aligned}&\min _{w, b, \xi }\ \frac{1}{2}\Vert w\Vert ^2+C\sum ^N_{i=1}\xi _i \\&\text{s.t.}\quad y_i(w^T\phi (x_i)+b)\ge 1-\xi _i, \quad i=1,2,\dots ,N, \\&\qquad \quad y_i(w^T\phi (x_i)+b)\le 1+\frac{1}{\tau }\xi _i, \quad i=1,2,\dots ,N. \end{aligned}$$

(18)
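
For fixed $w$ and $b$, the smallest $\xi _i$ that satisfies both constraints in (18) is $\max (u_i, -\tau u_i)$ with $u_i=1-y_i(w^T\phi (x_i)+b)$, which is the pinball loss of the margin residual. The following is a minimal Python sketch of that correspondence, assuming NumPy; the names `pinball_loss`, `u` and `tau` and the sample residuals are illustrative and do not come from the paper.

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball loss of the margin residual u = 1 - y * f(x).

    Returns the smallest xi with xi >= u and xi >= -tau * u,
    i.e. the two constraints of problem (18) solved for xi.
    """
    u = np.asarray(u, dtype=float)
    return np.maximum(u, -tau * u)

# Hypothetical residuals: inside the margin, exactly on it, and far beyond it
u = np.array([0.5, 0.0, -2.0])

# With tau > 0, points far beyond the margin (u < 0) are penalized too
losses = pinball_loss(u, tau=0.5)

# In the max form, tau = 0 gives the hinge loss as a limiting case
# (the constraint form in (18) itself requires tau > 0)
hinge = pinball_loss(u, tau=0.0)
```

The penalty on negative residuals is what distinguishes the pinball loss from the hinge loss: every training point, not only those near the boundary, influences the solution.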

Introduce the Lagrange function

$$\begin{aligned}&L(w, b ,\xi , \alpha , \beta ) \\&\quad =\frac{1}{2}\vert \vert w\vert \vert ^2+C\sum ^N_{i=1}{\xi _i} \\&\qquad -\sum ^N_{i=1}{\alpha _i(y_i(w^T\phi (x_i)+b)-1+\xi _i)} \\&\qquad -\sum ^N_{i=1}{\beta _i(-y_i(w^T\phi (x_i)+b)+1+\frac{1}{\tau }\xi _i)} \end{aligned}$$

(19)

where $\alpha =(\alpha _1,\dots ,\alpha _N)^T$ and $\beta =(\beta _1,\dots ,\beta _N)^T$ are the Lagrange multiplier vectors.

Then the KKT necessary and sufficient optimality conditions of problem (18) are given by

$$\begin{aligned}&w-\sum ^N_{i=1}(\alpha _i-\beta _i)y_i\phi (x_i)=0, \\&\quad -\sum ^N_{i=1}{(\alpha _i-\beta _i)y_i}=0, \\&C-\alpha _i-\frac{1}{\tau }\beta _i=0, \\&\sum ^N_{i=1}{\alpha _i(y_i(w^T\phi (x_i)+b)-1+\xi _i)}=0, \\&\sum ^N_{i=1}{\beta _i(-y_i(w^T\phi (x_i)+b)+1+\frac{1}{\tau }\xi _i)}=0, \\&\quad \alpha _i\ge 0, \beta _i\ge 0. \end{aligned}$$

(20)

In particular, the first condition gives

$$\begin{aligned} w=\sum ^N_{i=1}(\alpha _i-\beta _i)y_i\phi (x_i). \end{aligned}$$

(21)
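
As a worked intermediate step (our own expansion, not reproduced from the paper), collecting the terms of the Lagrange function (19) and applying the stationarity conditions in (20) removes $w$, $b$ and $\xi$:

$$\begin{aligned} L&=\frac{1}{2}\Vert w\Vert ^2-\sum ^N_{i=1}(\alpha _i-\beta _i)y_iw^T\phi (x_i)-b\sum ^N_{i=1}(\alpha _i-\beta _i)y_i \\&\quad +\sum ^N_{i=1}\left( C-\alpha _i-\frac{1}{\tau }\beta _i\right) \xi _i+\sum ^N_{i=1}(\alpha _i-\beta _i) \\&=-\frac{1}{2}\Vert w\Vert ^2+\sum ^N_{i=1}(\alpha _i-\beta _i). \end{aligned}$$

Here the second term equals $\Vert w\Vert ^2$ by (21), and the third and fourth terms vanish by the second and third conditions in (20). Expanding $\Vert w\Vert ^2$ through (21) yields the quadratic form in $\alpha _i-\beta _i$.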

The dual problem of (18) is obtained as follows,

$$\begin{aligned}&\max _{\alpha , \beta }-\frac{1}{2}\sum ^N_{i=1}{\sum ^N_{j=1}{(\alpha _i-\beta _i)y_i\phi (x_i)^T\phi (x_j)y_j(\alpha _j-\beta _j)}} \\&\quad +\sum ^N_{i=1}{{(\alpha _i-\beta _i)}} \\&s.t. \sum ^N_{i=1}{(\alpha _i-\beta _i)y_i=0}, \\&\qquad \alpha _i+\frac{1}{\tau }\beta _i=C, \quad i=1,2,\dots ,N, \\&\qquad \alpha _i\ge 0, \quad i=1,2,\dots ,N, \\&\qquad \beta _i\ge 0, \quad i=1,2,\dots ,N. \end{aligned}$$

(22)

Introducing the variables $\gamma _i=\alpha _i-\beta _i$ and eliminating the equality constraint $\alpha _i+\frac{1}{\tau }\beta _i=C$, we get

$$\begin{aligned}&\max _{\gamma }-\frac{1}{2}\sum ^N_{i=1}{\sum ^N_{j=1}{\gamma _iy_i\phi (x_i)^T\phi (x_j)y_j\gamma _j}}+\sum ^N_{i=1}{{\gamma _i}} \\&\text{s.t.}\quad \sum ^N_{i=1}{\gamma _iy_i=0}, \\&\qquad \quad -\tau C\le \gamma _i\le C, \quad i=1,2,\dots ,N. \end{aligned}$$

(23)
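
The box constraint on $\gamma _i$ follows from a short calculation (a worked step added here for clarity, assuming $\tau >0$). Solving the equality constraint for $\alpha _i$ gives $\alpha _i=C-\frac{1}{\tau }\beta _i$, so $\alpha _i\ge 0$ and $\beta _i\ge 0$ together mean $0\le \beta _i\le \tau C$, and

$$\begin{aligned} \gamma _i=\alpha _i-\beta _i=C-\left( 1+\frac{1}{\tau }\right) \beta _i, \qquad 0\le \beta _i\le \tau C \ \Longrightarrow \ -\tau C\le \gamma _i\le C. \end{aligned}$$

The upper bound is attained at $\beta _i=0$ and the lower bound at $\beta _i=\tau C$, so $\beta$ no longer needs to appear as an optimization variable.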

The dual problem (23) has the same solution set w.r.t. $\gamma$ as the following convex quadratic programming problem in the Euclidean space $R^N$:

$$\begin{aligned}&\min _{\gamma }\frac{1}{2}\sum ^N_{i=1}{\sum ^N_{j=1}{\gamma _iy_i\phi (x_i)^T\phi (x_j)y_j\gamma _j}}-\sum ^N_{i=1}{{\gamma _i}} \\&\text{s.t.}\quad \sum ^N_{i=1}{\gamma _iy_i=0}, \\&\qquad \quad -\tau C\le \gamma _i\le C, \quad i=1,2,\dots ,N. \end{aligned}$$

(24)

Suppose $\gamma ^*=(\gamma _1^*, \gamma _2^*, \dots , \gamma _N^*)^T$ is a solution to problem (24).

We then have

$$\begin{aligned} w^*=\sum ^N_{i=1}{\gamma _i^*}y_i\phi (x_i), \end{aligned}$$

(25)

and

$$\begin{aligned} b^*=y_j-\sum ^N_{i=1}y_i\gamma _i^*\phi (x_i)^T\phi (x_j), \end{aligned}$$

(26)

where $j$ is any index such that $-\tau C<\gamma _j^*<C$.

Then the resulting decision function can be represented as

$$\begin{aligned} f(x)=\sum ^N_{i=1}y_i\gamma _i^*\phi (x_i)^T\phi (x)+b^*. \end{aligned}$$

(27)
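
To make the steps from (24) to (27) concrete, the following is a minimal Python sketch for fixed labels $y_i$. It is not the authors' solver: it assumes NumPy and SciPy, replaces $\phi (x_i)^T\phi (x_j)$ with an RBF kernel, and hands the quadratic program to SciPy's generic SLSQP routine. The function names, the `kernel_width` parameter, the toy data, and the averaging of $b^*$ over all interior indices are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, kernel_width=1.0):
    """Kernel matrix k(a, b) = exp(-kernel_width * ||a - b||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-kernel_width * np.maximum(sq, 0.0))

def fit_pin_svm_dual(X, y, C=1.0, tau=0.5, kernel_width=1.0):
    """Solve problem (24) for gamma*, then recover b* as in Eq. (26)."""
    K = rbf_kernel(X, X, kernel_width)
    Q = (y[:, None] * y[None, :]) * K  # Q_ij = y_i y_j k(x_i, x_j)
    n = len(y)
    res = minimize(
        fun=lambda g: 0.5 * g @ Q @ g - g.sum(),   # objective of (24)
        x0=np.zeros(n),                            # feasible starting point
        jac=lambda g: Q @ g - 1.0,
        bounds=[(-tau * C, C)] * n,                # -tau*C <= gamma_i <= C
        constraints=[{"type": "eq", "fun": lambda g: g @ y, "jac": lambda g: y}],
        method="SLSQP",
    )
    g = res.x
    # Eq. (26) holds for any index strictly inside the box; average over them.
    tol = 1e-6 * C
    interior = (g > -tau * C + tol) & (g < C - tol)
    # Assumption: if no interior index exists, fall back to all points (heuristic).
    idx = np.where(interior)[0] if interior.any() else np.arange(n)
    b = np.mean(y[idx] - (g * y) @ K[:, idx])
    return g, b

def decision_function(X_train, y, g, b, X_new, kernel_width=1.0):
    """Eq. (27): f(x) = sum_i y_i gamma_i* k(x_i, x) + b*."""
    return (g * y) @ rbf_kernel(X_train, X_new, kernel_width) + b

# Toy data (hypothetical): two well-separated clusters with labels -1 and +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.1, (10, 2)), rng.normal(2.0, 0.1, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)

g, b = fit_pin_svm_dual(X, y, C=10.0, tau=0.5, kernel_width=0.5)
pred = np.sign(decision_function(X, y, g, b, X, kernel_width=0.5))
```

A general-purpose solver keeps the sketch short but scales poorly with $N$; a dedicated quadratic programming or decomposition method would be the more realistic choice for larger problems.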

Appendix B: Additional experiment results

We show additional experiment results in Tables 9, 10 and 11.

Table 9 Accuracy with RBF kernel and its standard deviation (bag size: 2, 4, 8, 16, 32, 64); the best accuracy achieved by LLP methods is shown in bold

Table 10 Bag error with RBF kernel (bag size: 2, 4, 8, 16, 32, 64); the lowest bag error obtained by LLP methods is shown in bold

Table 11 Instance-level accuracy with RBF kernel (bag size: 2, 4, 8, 16, 32, 64); the best results obtained by LLP methods are shown in bold

About this article

Cite this article

Shi, Y., Cui, L., Chen, Z. et al. Learning from label proportions with pinball loss. Int. J. Mach. Learn. & Cyber. 10, 187–205 (2019). https://doi.org/10.1007/s13042-017-0708-2

Received: 14 September 2016
Accepted: 31 July 2017
Published: 19 August 2017
Issue Date: 31 January 2019
DOI: https://doi.org/10.1007/s13042-017-0708-2
