Abstract
Scientific mapping has become an important subject in scientometrics. Journal clustering can provide insight into both the internal relations among journals and the evolution of research topics. In this paper, we apply the affinity propagation (AP) algorithm to cluster scientific journals. The AP algorithm identifies clusters by detecting their representative points through message passing among the data points. Unlike many other clustering algorithms, it provides a representative for each cluster and does not require the number of clusters to be specified in advance. Because the input to the AP algorithm is a similarity matrix over the data points, it can be applied to many kinds of data sets with different similarity metrics. We extract similarity matrices from the journal data set in both a cross-citation view and a text view, and use the AP algorithm to cluster the journals. Empirical analysis shows that the clustering results from the two single views are highly complementary. We therefore combine text information with cross-citation information using a simple averaging scheme and apply the AP algorithm to perform multi-view clustering, which aims to obtain refined clusters by integrating information from multiple views. With the text and citation views integrated, experiments on the Web of Science journal data set verify that, as expected, the AP algorithm obtains better clustering results.
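The pipeline summarized above (one similarity matrix per view, a simple average, then AP on the combined matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-journal similarity values are invented toy data, the helper names `toy_view`, `combine_views` and `affinity_propagation` are hypothetical, and the preference defaults to the median off-diagonal similarity, a common convention that the abstract does not specify.

```python
import numpy as np

def toy_view(pairs, cross, n=6):
    """Symmetric similarity matrix from {(i, j): value}; other pairs get `cross`."""
    M = np.full((n, n), cross, dtype=float)
    for (i, j), value in pairs.items():
        M[i, j] = M[j, i] = value
    return M

def combine_views(views):
    """Simple average of same-shaped similarity matrices, one per view."""
    return np.mean(np.stack(views), axis=0)

def affinity_propagation(S, preference=None, damping=0.5, n_iter=200):
    """Plain AP message passing on a precomputed similarity matrix.

    Assumptions: fixed iteration count, no convergence check, and no
    random tie-breaking noise (the toy data below has no problematic ties).
    """
    S = np.array(S, dtype=float)  # work on a copy
    n = S.shape[0]
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Each point joins its most similar exemplar; exemplars label themselves.
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels

# Invented toy data: journals 0-2 and 3-5 form two groups in both views.
citation = toy_view({(0, 1): 1.0, (1, 2): 0.7, (0, 2): 0.4,
                     (3, 4): 0.6, (4, 5): 1.0, (3, 5): 0.3}, cross=0.0)
text = toy_view({(0, 1): 0.8, (1, 2): 0.9, (0, 2): 0.6,
                 (3, 4): 0.8, (4, 5): 0.9, (3, 5): 0.5}, cross=0.2)

combined = combine_views([citation, text])
exemplars, labels = affinity_propagation(combined)
print("exemplars:", exemplars)
print("labels:   ", labels)
```

Each label is the index of the exemplar journal representing that point's cluster, so the number of clusters is an output of the run rather than an input. Because only the similarity matrix is passed in, swapping the averaging step for another fusion scheme would leave `affinity_propagation` unchanged.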
Cite this article
Meng, X., Liu, X., Tong, Y. et al. Multi-view clustering with exemplars for scientific mapping. Scientometrics 105, 1527–1552 (2015). https://doi.org/10.1007/s11192-015-1682-7
DOI: https://doi.org/10.1007/s11192-015-1682-7