Topic level expertise search over heterogeneous networks

Jie Tang¹, Jing Zhang¹, Ruoming Jin², Zi Yang¹, Keke Cai³, Li Zhang³ & Zhong Su³

Abstract

In this paper, we present a topic level expertise search framework for heterogeneous networks. Unlike traditional Web search engines, which perform retrieval and ranking at the document level (or at the object level), we investigate the problem of expertise search at the topic level over heterogeneous networks. In particular, we study this problem in an academic search and mining system that extracts and integrates academic data from the distributed Web. We present a unified topic model that simultaneously models the topical aspects of the different objects in the academic network. Based on the learned topic models, we investigate the expertise search problem along three dimensions: ranking, citation tracing analysis, and topical graph search. Specifically, we propose a topic level random walk method for ranking the different objects. In citation tracing analysis, we aim to uncover how a piece of work influences its follow-up work. Finally, we have developed a topical graph search function based on the topic modeling and citation tracing analysis. Experimental results show that various expertise search and mining tasks can indeed benefit from the proposed topic level analysis approach.
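The abstract does not give the formulation of the topic level random walk, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simplified, topic-sensitive PageRank-style walk: one rank vector per topic, with the teleport step biased towards nodes whose (hypothetical) topic distribution P(z | node) is high. The function name, the parameters, and the toy graph are all assumptions made for illustration.

```python
def topic_level_random_walk(links, topic_dist, damping=0.85, iters=50):
    """Return rank[z][node]: one PageRank-style score per topic z.

    links      -- dict mapping node -> list of out-neighbours (assumed input)
    topic_dist -- dict mapping node -> list of P(z | node) values (assumed input)
    """
    nodes = list(topic_dist)
    num_topics = len(next(iter(topic_dist.values())))

    # Per-topic teleport distribution: P(z | node) normalised over all nodes.
    # Assumes every topic has non-zero mass on at least one node.
    teleport = {}
    for z in range(num_topics):
        total = sum(topic_dist[v][z] for v in nodes)
        teleport[z] = {v: topic_dist[v][z] / total for v in nodes}

    # Start each topic's walk from its teleport distribution.
    rank = {z: dict(teleport[z]) for z in range(num_topics)}

    for _ in range(iters):
        for z in range(num_topics):
            # Teleport contribution.
            new = {v: (1 - damping) * teleport[z][v] for v in nodes}
            for u in nodes:
                outs = links.get(u, [])
                if outs:
                    # Spread u's score evenly over its out-links.
                    share = damping * rank[z][u] / len(outs)
                    for v in outs:
                        new[v] += share
                else:
                    # Dangling node: hand its score back via the teleport vector.
                    for v in nodes:
                        new[v] += damping * rank[z][u] * teleport[z][v]
            rank[z] = new
    return rank


# Toy network with two topics; all values are made up for illustration.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
topic_dist = {
    "a": [0.9, 0.1],
    "b": [0.5, 0.5],
    "c": [0.1, 0.9],
    "d": [0.5, 0.5],
}
ranks = topic_level_random_walk(links, topic_dist)
```

Each `ranks[z]` sums to 1 over the nodes, so scores are comparable within a topic. A query could then be scored by weighting the per-topic ranks with the query's own topic distribution, which is again an assumption rather than something the abstract specifies.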

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Jie Tang, Jing Zhang & Zi Yang
Department of Computer Science, Kent State University, Kent, OH, 44241, USA
Ruoming Jin
IBM, China Research Lab, Beijing, China
Keke Cai, Li Zhang & Zhong Su

Corresponding author

Correspondence to Jie Tang.

Additional information

Editors: S.V.N. Vishwanathan, Samuel Kaski, Jennifer Neville, and Stefan Wrobel.

About this article

Cite this article

Tang, J., Zhang, J., Jin, R. et al. Topic level expertise search over heterogeneous networks. Mach Learn 82, 211–237 (2011). https://doi.org/10.1007/s10994-010-5212-9

Received: 01 June 2009
Accepted: 01 May 2010
Published: 17 September 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10994-010-5212-9
