Abstract
In this paper we introduce an efficient approach to detecting and ranking topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection and topic ranking have become challenging tasks. Our approach uses closed frequent keyword-sets to form topics. We devise a modified, time-independent PageRank algorithm that assigns an authoritative score to each topic by considering the sub-graph in which the topic appears, producing a ranked list of topics. The use of the citation network and the introduction of time invariance into the topic ranking algorithm yield interesting results. Our approach also provides a clustering technique for research papers that uses topics as the similarity measure. We extend our algorithms to study several aspects of topic evolution, which gives insight into trends in research areas over time. Our algorithms also detect hot topics and landmark topics over the years. We test our algorithms on the DBLP dataset and show that they are fast, effective and scalable.
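To make the notion of a closed frequent keyword-set concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes each paper is represented as a set of keywords, enumerates keyword subsets by brute force (practical only for small toy inputs), and keeps a frequent keyword-set as a topic when no proper superset has the same support. The names `closed_frequent_keyword_sets`, `cluster_by_topic`, `papers` and `min_support`, as well as the toy data, are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def closed_frequent_keyword_sets(papers, min_support):
    """Return {keyword_set: support} for all closed frequent keyword-sets.

    papers: dict mapping a paper id to its set of keywords (assumed input format).
    min_support: minimum number of papers that must contain a keyword-set.
    """
    # Brute-force support counting over every keyword subset of every paper.
    support = {}
    for keywords in papers.values():
        items = sorted(keywords)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                support[key] = support.get(key, 0) + 1

    # Keep only keyword-sets that occur in at least min_support papers.
    frequent = {k: s for k, s in support.items() if s >= min_support}

    # A frequent set is closed if no proper superset has the same support.
    closed = {}
    for k, s in frequent.items():
        if not any(k < other and s == s_other for other, s_other in frequent.items()):
            closed[k] = s
    return closed


def cluster_by_topic(papers, topics):
    """Group paper ids under each topic whose keywords they all contain."""
    return {
        topic: sorted(pid for pid, kws in papers.items() if topic <= kws)
        for topic in topics
    }


# Toy corpus, invented for illustration.
papers = {
    "p1": {"topic", "ranking", "pagerank"},
    "p2": {"topic", "ranking"},
    "p3": {"topic", "evolution"},
    "p4": {"pagerank", "citation"},
}

topics = closed_frequent_keyword_sets(papers, min_support=2)
clusters = cluster_by_topic(papers, topics)

for topic, count in sorted(topics.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(topic), count, clusters[topic])
```

In this toy corpus, {ranking} is frequent but not closed, because {topic, ranking} occurs in exactly the same two papers. The keyword-set {topic, ranking} is therefore kept as the topic instead. Grouping papers under the topics they contain gives one simple reading of using topics as a similarity measure; the clustering technique of the paper may differ.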
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shubhankar, K., Singh, A.P., Pudi, V. (2011). An Efficient Algorithm for Topic Ranking and Modeling Topic Evolution. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_23
DOI: https://doi.org/10.1007/978-3-642-23088-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23087-5
Online ISBN: 978-3-642-23088-2
eBook Packages: Computer Science (R0)