Abstract
The proliferation of networked data in various disciplines motivates a surge of research interests on network or graph mining. Among them, node classification is a typical learning task that focuses on exploiting the node interactions to infer the missing labels of unlabeled nodes in the network. A vast majority of existing node classification algorithms overwhelmingly focus on static networks and they assume the whole network structure is readily available before performing learning algorithms. However, it is not the case in many real-world scenarios where new nodes and new links are continuously being added in the network. Considering the streaming nature of networks, we study how to perform online node classification on this kind of streaming networks (a.k.a. online learning on streaming networks). As the existence of noisy links may negatively affect the node classification performance, we first present an online network embedding algorithm to alleviate this problem by obtaining the embedding representation of new nodes on the fly. Then we feed the learned embedding representation into a novel online soft margin kernel learning algorithm to predict the node labels in a sequential manner. Theoretical analysis is presented to show the superiority of the proposed framework of online learning on streaming networks (OLSN). Extensive experiments on real-world networks further demonstrate the effectiveness and efficiency of the proposed OLSN framework.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
It should be mentioned that for the convenience of presentation, we assume only one new node is added at time stamp \(t+1\). But it can be naturally extended to the case when a new set of m (\(m\ge 1\)) nodes are added to the existing network.
Definition RKHS Given a nonempty set \(\mathcal X\) and a Hilbert space \({{\mathcal H}}\) of functions \(f:\mathcal X\mapsto \mathcal R\), \(\mathcal H\) is a RKHS (Schölkopf and Smola 2002) endowed with kernel function \(k:\mathcal X \times \mathcal X\mapsto \mathcal R\) if k has the reproducing property:
$$\begin{aligned} \langle f(\cdot ),k(\mathbf{x},\cdot )\rangle _{\mathcal H}=f(\mathbf{x}),\forall f\in \mathcal H,\forall \mathbf{x}\in \mathcal X, \end{aligned}$$in particular, \(\langle k(\mathbf{x},\cdot ),k(\mathbf z ,\cdot )\rangle _{\mathcal H}=k(\mathbf{x},\mathbf z ), \forall \mathbf{x},\mathbf z \in \mathcal X\), and k is called the reproducing kernel for \(\mathcal H\).
References
Aggarwal CC, Li N (2011) On node classification in dynamic content-based networks. In: The 11th SIAM international conference on data mining. SIAM, pp 355–366
Ahmed A, Shervashidze N, Narayanamurthy S, Josifovski V, Smola AJ (2013) Distributed large-scale natural graph factorization. In: Proceedings of the 22nd international conference on World Wide Web. ACM, pp 37–48
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 14:585–591
Bhagat S, Cormode G, Muthukrishnan S (2011) Node classification in social networks. In: Aggarwal C (ed) Social network data analytics. Springer, pp 115–148
Chang S, Han W, Tang J, Qi GJ, Aggarwal CC, Huang TS (2015) Heterogeneous network embedding via deep architectures. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 119–128
Chang S, Zhang Y, Tang J, Yin D, Chang Y, Hasegawa-Johnson M, Huang TS (2016) Positive-unlabeled learning in streaming networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 755–764
Chen C, Tong H, Xie L, Ying L, He Q (2016) Fascinate: fast cross-layer dependency inference on multi-layered networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 765–774
Crammer K, Singer Y (2002) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292
Crammer K, Singer Y (2003) Ultraconservative online algorithms for multiclass problems. J Mach Learn Res 3:951–991
Crammer K, Dekel O, Keshet J, Shalev-Shwartz S, Singer Y (2006) Online passive-aggressive algorithms. J Mach Learn Res 7:551–585
Francesco O, Crammer K (2010) New adaptive algorithms for online classification. In: Advances in neural information processing systems, pp 1840–1848
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 855–864
Gu Q, Han J (2014) Online spectral learning on a graph with bandit feedback. In: Proceedings of the 2014 IEEE international conference on data mining. IEEE, pp 833–838
Guo T, Zhu X, Pei J, Zhang C (2014) Snoc: streaming network node classification. In: Proceedings of the 2014 IEEE international conference on data mining. IEEE, pp 150–159
Hazan E, Kale S (2011) Newtron: an efficient bandit algorithm for online multiclass prediction. In: Advances in neural information processing systems, pp 891–899
Helmbold DP, Kivinen J, Warmuth MK (1999) Relative loss bounds for single neurons. IEEE Trans Neural Netw 10(6):1291–1304
Huang X, Li J, Hu X (2017a) Accelerated attributed network embedding. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, pp 633–641
Huang X, Li J, Hu X (2017b) Label informed attributed network embedding. In: Proceedings of the 10th ACM international conference on web search and data mining. ACM, pp 731–739
Jian L, Shen S, Li J, Liang X, Li L (2016) Budget online learning algorithm for least squares svm. IEEE transactions on neural networks and learning systems
Hu X, Tang L, Tang J, Liu H (2013) Exploiting social relations for sentiment analysis in microblogging. In: Proceedings of the 6th ACM international conference on web search and data mining. ACM, pp 537–546
Kakade SM, Shalev-Shwartz S, Tewari A (2008) Efficient bandit algorithms for online multiclass prediction. In: Proceedings of the 25th international conference on machine learning. ACM, pp 440–447
Kivinen J, Smola AJ, Williamson RC (2002) Large margin classification for moving targets. In: International conference on algorithmic learning theory. Springer, pp 113–127
Kivinen J, Smola AJ, Williamson RC (2004) Online learning with kernels. IEEE Trans Signal Process 52(8):2165–2176
Li L, Tong H (2015) The child is father of the man: Foresee the success at the early stage. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 655–664
Li J, Hu X, Tang J, Liu H (2015a) Unsupervised streaming feature selection in social media. In: Proceedings of the 24th ACM international conference on conference on information and knowledge management. ACM, pp 1041–1050
Li L, Tong H, Xiao Y, Fan W (2015b) Cheetah: fast graph kernel tracking on dynamic graphs. In: Proceedings of the 2015 SIAM international conference on data mining. SIAM, pp 280–288
Li J, Hu X, Wu L, Liu H (2016) Robust unsupervised feature selection on networked data. In: Proceedings of the 2016 SIAM international conference on data mining. SIAM
Li J, Wu L, Zaïane OR, Liu H (2017a) Toward personalized relational learning. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, pp 444–452
Li J, Dani H, Hu X, Tang J, Chang Y, Liu H (2017b) Attributed network embedding for learning in a dynamic environment. arXiv preprint arxiv:1706.01860
Lu Q, Getoor L (2003) Link-based classification. In: Proceedings of the 20th international conference on machine learning, pp 496–503
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 701–710
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Shalev-Shwartz S, Singer Y, Srebro N, Cotter A (2011) Pegasos: primal estimated sub-gradient solver for svm. Math Program 127(1):3–30
Singh AP, Gordon GJ (2008) Relational learning via collective matrix factorization. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 650–658
Staiano J, Lepri B, Aharony N, Pianesi F, Sebe N, Pentland A (2012) Friends don’t lie: inferring personality traits from social network structure. In: Proceedings of the 2012 ACM conference on ubiquitous computing. ACM, pp 321–330
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web. ACM, pp 1067–1077
Wang Z, Crammer K, Vucetic S (2012a) Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale svm training. J Mach Learn Res 13(10):3103–3131
Wang J, Zhao P, Hoi SCH (2012b) Exact soft confidence-weighted learning. In: Proceedings of the 29th international conference on machine learning, pp 121–128
Wang J, Zhao P, Hoi SCH (2014) Cost-sensitive online classification. IEEE Trans Knowl Data Eng 26(10):2425–2438
Wei X, Cao B, Yu PS (2016) Unsupervised feature selection on networks: a generative view. In: Proceedings of the 30th AAAI conference on artificial intelligence
Wilkinson JH, Wilkinson JH (1965) The algebraic eigenvalue problem, vol 87. Clarendon Press, Oxford
Xia H, Hoi SC, Jin R, Zhao P (2014) Online multiple kernel similarity learning for visual search. IEEE Trans Pattern Anal Mach Intell 36(3):536–549
Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Yang C, Liu Z, Zhao D, Sun M, Chang EY (2015) Network representation learning with rich text information. In: Proceedings of the 24 international conference on artificial intelligence. IJCAI/AAAI Press, pp 2111–2117
Zhang D, Yin J, Zhu X, Zhang C (2016) Collective classification via discriminative matrix factorization on sparsely labeled networks. In: Proceedings of the 25th ACM international conference on conference on information and knowledge management. ACM, pp 1563–1572
Zhao P, Hoi SCH (2013) Cost-sensitive online active learning with application to malicious url detection. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 919–927
Zhu S, Yu K, Chi Y, Gong Y (2007) Combining content and link for classification using matrix factorization. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 487–494
Acknowledgements
The authors would like to thank Liangyue Li for helpful discussions. Ling Jian is supported by the National Natural Science Foundation of China under Grant Nos. 61403419, 61503412, 11671418, Natural Science Foundation of Shandong Province under Grant Nos. ZR2013FQ034, ZR2014AP004, and Fundamental Research Funds for the Central Universities under Grant No. 16CX02048A. He also would like to thank the support of China Scholarship Council of visiting the Data Mining and Machine Learning Laboratory of Arizona State University during the year 2015 to 2016. Jundong Li and Huan Liu are supported by the National Science Foundation under Grant 1614576.
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Charu Aggarwal.
Appendix
Appendix
In this section, we conduct experiments to see if Perceptron, Pegasos, and OSLG are sensitive to the model parameters involved. We only present the results on DBLP and Citeseer datasets as we have similar observations on the other datasets. In the experiments, we specify the embedding dimension as 30 and the percentage of training as 50%. We vary the parameter of learning rate \(\eta \) in Perceptron and the regularization parameter \(\mu \) in OLSG among \(\{0.1, 0.2, 0.5, 1, 2, 5\}\). For Pegasos, we vary the regularization parameter \(\lambda \) among \(\{10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}\).
The performance variance w.r.t these parameters are listed in Tables 5 and 6. As can be observed, the changes of the parameter \(\eta \) in Perceptron do not affect the classification accuracy at all. For Pegasos and OSLG, different parameter settings (i.e., regularization parameters \(\lambda \) and \(\mu \)) slightly affect the end classification accuracy. However, in Pegasos and OSLG, the default parameter setting is good enough when prior knowledge is not available.
Rights and permissions
About this article
Cite this article
Jian, L., Li, J. & Liu, H. Toward online node classification on streaming networks. Data Min Knowl Disc 32, 231–257 (2018). https://doi.org/10.1007/s10618-017-0533-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-017-0533-y