Abstract
Acquiring an overview of an unfamiliar discipline and exploring its relevant papers and journals is often a laborious task for researchers. In this paper we show how exploratory search can be supported on a large collection of academic papers, allowing users to answer complex scientometric questions that traditional retrieval approaches do not support well. We use our ConceptCloud browser, which combines concept lattices and tag clouds, to present academic publication data (specifically, the ACM Digital Library) visually in a browsable format that facilitates exploratory search. We augment this dataset with semantic categories, obtained through automatic keyphrase extraction from papers' titles and abstracts, to give the user a uniform set of keyphrases describing the underlying collection. We also use papers' citations and references to provide additional mechanisms for exploring relevant research. A novel aspect of our approach is that aggregated citation and reference data are presented not only for single papers but also across topics, authors and journals. To evaluate the approach, we conducted a user study in which 34 participants from different academic backgrounds, with varying degrees of research experience, answered a variety of scientometric questions using the ConceptCloud browser. Participants answered these complex questions with a mean correctness of 73%, and their prior research experience had no statistically significant effect on the results.
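To make the combination of concept lattices and tag clouds more concrete, the following is a minimal sketch, not the ConceptCloud implementation. It assumes a hypothetical toy formal context in which papers are objects and keyphrases and authors are attributes, and it uses the standard derivation operators of formal concept analysis. Selecting tags narrows the set of papers (the extent). The tag cloud is then assumed to show how often each attribute occurs in that extent. Citation links pointing into the selection are summed, which illustrates aggregation across a topic or author rather than a single paper. All names (`CONTEXT`, `CITES`, `extent`, `intent`, `tag_cloud`, `citations_received`) and all data are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy formal context: paper (object) -> attributes (keyphrases, authors).
# Illustrative data only, not taken from the ACM Digital Library.
CONTEXT: dict[str, set[str]] = {
    "p1": {"kw:concept lattice", "kw:tag cloud", "author:Greene"},
    "p2": {"kw:concept lattice", "kw:software archives", "author:Greene"},
    "p3": {"kw:keyphrase extraction", "kw:tag cloud", "author:Frank"},
}

# Hypothetical citation links: citing paper -> set of papers it cites.
CITES: dict[str, set[str]] = {
    "p1": {"p2"},
    "p3": {"p1", "p2"},
}


def extent(selected: set[str]) -> set[str]:
    """Derivation B -> B': all papers that carry every selected attribute."""
    return {p for p, attrs in CONTEXT.items() if selected <= attrs}


def intent(papers: set[str]) -> set[str]:
    """Derivation A -> A': attributes shared by all given papers.
    By definition, the empty set of papers shares every attribute."""
    if not papers:
        return set().union(*CONTEXT.values())
    return set.intersection(*(CONTEXT[p] for p in papers))


def tag_cloud(selected: set[str]) -> Counter:
    """Frequency of each attribute among the papers in the current extent.
    In this sketch, a tag cloud would scale each tag's size by its count."""
    return Counter(a for p in extent(selected) for a in CONTEXT[p])


def citations_received(selected: set[str]) -> int:
    """Number of citation links pointing into the selection (e.g. a topic or an author)."""
    papers = extent(selected)
    return sum(len(cited & papers) for cited in CITES.values())
```

For example, selecting the tag `kw:concept lattice` narrows the view to `p1` and `p2`. Applying `intent(extent(...))` closes the selection to the full intent of the corresponding formal concept, which here also contains `author:Greene`. Selecting `author:Greene` aggregates three incoming citation links across that author's papers.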
Notes
The introductory video of the ConceptCloud Browser for academic papers is available at https://www.youtube.com/watch?v=8zJ618yOWBI.
Acknowledgements
This research is funded in part by a STIAS Doctoral Scholarship, CAIR, NRF Grant 93582 and the MIH Media Lab.
About this article
Cite this article
Dunaiski, M., Greene, G.J. & Fischer, B. Exploratory search of academic publication and citation data using interactive tag cloud visualizations. Scientometrics 110, 1539–1571 (2017). https://doi.org/10.1007/s11192-016-2236-3