
DOI: 10.1145/3398682.3400060
Research article · Open access

Graph Learning with Loss-Guided Training

Published: 14 June 2020

Abstract

Classically, ML models trained with stochastic gradient descent (SGD) are designed to minimize the average loss per example, using a distribution over training examples that remains static throughout training. Research in recent years has demonstrated, empirically and theoretically, that significant acceleration is possible with methods that dynamically adjust the training distribution during training so that it focuses on examples with higher loss. We explore loss-guided training in a new domain: node embedding methods pioneered by DeepWalk. These methods work with an implicit and large set of positive training examples generated by random walks on the input graph, and are therefore not amenable to typical example-selection methods. We propose computationally efficient methods that enable loss-guided training in this framework. Our empirical evaluation on a rich collection of datasets shows significant acceleration over the static baseline methods, both in total training performed and in overall computation.
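To make the mechanism concrete, below is a minimal, hypothetical sketch of loss-guided example selection for skip-gram-style node-embedding training on random-walk positives. The toy graph, the sigmoid pair loss, the sampling batch size, and loss-proportional sampling are all illustrative assumptions, not the paper's algorithm; the paper proposes computationally efficient schemes tailored to implicit walk-generated example sets rather than this naive rescoring loop.

```python
# Minimal sketch (illustrative assumptions throughout, NOT the authors' method):
# loss-guided selection of positive (node, context) pairs produced by
# DeepWalk-style random walks, trained with a skip-gram sigmoid loss.
import numpy as np

rng = np.random.default_rng(0)

# Toy graph as an adjacency list; a real run would use the input graph.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
n_nodes, dim, lr = len(graph), 16, 0.05

emb = 0.1 * rng.standard_normal((n_nodes, dim))  # node ("input") embeddings
ctx = 0.1 * rng.standard_normal((n_nodes, dim))  # context ("output") embeddings

def random_walk_pairs(length=10, window=2):
    """DeepWalk-style positives: (node, context) pairs co-occurring in a walk."""
    v = int(rng.integers(n_nodes))
    walk = [v]
    for _ in range(length - 1):
        v = int(rng.choice(graph[v]))
        walk.append(v)
    pairs = []
    for i, u in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((u, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

def pair_loss(u, v):
    """Negative log-sigmoid loss of a positive pair; higher means 'harder'."""
    return -np.log(1.0 / (1.0 + np.exp(-emb[u] @ ctx[v])) + 1e-12)

for step in range(200):
    pairs = random_walk_pairs()
    # Loss-guided step: sample pairs proportionally to their current loss
    # rather than uniformly (the static baseline).
    losses = np.array([pair_loss(u, v) for u, v in pairs])
    probs = losses / losses.sum()
    for idx in rng.choice(len(pairs), size=5, p=probs):
        u, v = pairs[idx]
        s = 1.0 / (1.0 + np.exp(-emb[u] @ ctx[v]))  # sigmoid score
        g = s - 1.0                                 # dLoss/dScore for a positive
        emb[u], ctx[v] = emb[u] - lr * g * ctx[v], ctx[v] - lr * g * emb[u]
```

The only difference from a static method is the sampling distribution `probs`: a static baseline would draw pairs uniformly, whereas here higher-loss pairs are drawn more often and losses are rescored each round. The cost of that rescoring is exactly what makes naive loss-guided selection expensive on implicit walk-generated example sets.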




Information

Published In

GRADES-NDA'20: Proceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
June 2020
93 pages
ISBN: 9781450380218
DOI: 10.1145/3398682

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '20

Acceptance Rates

GRADES-NDA'20 paper acceptance rate: 9 of 15 submissions (60%).
Overall acceptance rate: 29 of 61 submissions (48%).
