Abstract
In this paper, we present a topic level expertise search framework for heterogeneous networks. Unlike traditional Web search engines, which perform retrieval and ranking at the document level (or at the object level), the framework performs expertise search at the topic level. We study this problem in an academic search and mining system that extracts and integrates academic data from the distributed Web. We present a unified topic model that simultaneously models the topical aspects of the different objects in the academic network. Based on the learned topic models, we investigate the expertise search problem along three dimensions: ranking, citation tracing analysis, and topical graph search. Specifically, we propose a topic level random walk method for ranking the different objects. In citation tracing analysis, we aim to uncover how a piece of work influences its follow-up work. Finally, we have developed a topical graph search function based on the topic modeling and citation tracing analysis. Experimental results show that various expertise search and mining tasks can indeed benefit from the proposed topic level analysis approach.
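To make the idea of ranking at the topic level concrete, the following is a minimal sketch of a topic-sensitive, PageRank-style random walk over a small citation graph. It is not the method proposed in the paper, whose details the abstract does not give. The function name `topic_random_walk`, the damping factor, the iteration count, and the toy data are all assumptions chosen for illustration. Each node carries one weight per topic (standing in for a learned topic distribution), and a separate walk is run for every topic, with the teleport step biased toward nodes that are strong on that topic.

```python
def topic_random_walk(edges, topic_weights, num_topics, damping=0.85, iters=50):
    """Return {topic: {node: score}} from one biased random walk per topic.

    edges         -- list of (source, target) pairs, e.g. "source cites target"
    topic_weights -- {node: [weight for topic 0, weight for topic 1, ...]}
    All names and defaults here are illustrative assumptions.
    """
    nodes = sorted(topic_weights)
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    scores = {}
    for z in range(num_topics):
        # Teleport distribution: proportional to each node's weight on topic z.
        total = sum(topic_weights[n][z] for n in nodes)
        teleport = {n: topic_weights[n][z] / total for n in nodes}
        rank = dict(teleport)
        for _ in range(iters):
            nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
            for n in nodes:
                targets = out_links[n]
                if targets:
                    # Spread this node's score evenly over the nodes it cites.
                    share = damping * rank[n] / len(targets)
                    for t in targets:
                        nxt[t] += share
                else:
                    # Node with no outgoing links: fall back to the teleport step.
                    for m in nodes:
                        nxt[m] += damping * rank[n] * teleport[m]
            rank = nxt
        scores[z] = rank
    return scores


# Toy citation graph (hypothetical data): p2 and p3 cite p1, p4 cites p3.
edges = [("p2", "p1"), ("p3", "p1"), ("p4", "p3")]
topic_weights = {
    "p1": [0.9, 0.1],
    "p2": [0.8, 0.2],
    "p3": [0.2, 0.8],
    "p4": [0.1, 0.9],
}
scores = topic_random_walk(edges, topic_weights, num_topics=2)
for z in sorted(scores):
    ranked = sorted(scores[z], key=scores[z].get, reverse=True)
    print("topic", z, ranked)
```

The point of the sketch is that the same object can receive a different score under each topic, so a query can be matched to topics first and objects ranked by their scores on those topics, rather than by a single global score.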
Editors: S.V.N. Vishwanathan, Samuel Kaski, Jennifer Neville, and Stefan Wrobel.
Cite this article
Tang, J., Zhang, J., Jin, R. et al. Topic level expertise search over heterogeneous networks. Mach Learn 82, 211–237 (2011). https://doi.org/10.1007/s10994-010-5212-9
DOI: https://doi.org/10.1007/s10994-010-5212-9