Abstract
Among massive volumes of microblog text, ultra-short microblog texts are difficult to understand in isolation because of characteristics such as data sparseness and content fragmentation. To address this problem, this paper presents an associated semantic representation model for ultra-short microblog text (ASRM-UMT) that helps users understand such texts. First, multi-layer associated semantic views of the ultra-short microblog text are built: the ICTCLAS system extracts feature keywords from the microblog texts, and a proposed mining algorithm operating on a dynamic time window mines the associated semantic relations among those keywords. The mining process takes into account three aspects of microblog texts: context, comments and transmissions. Then the multi-layer associated semantic views are optimized by comparing the clustering coefficients of several candidate views and selecting the optimal one. Experimental results show that the proposed model represents ultra-short microblog text accurately and effectively.
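The view-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a view is an undirected keyword co-occurrence graph, that keywords appearing in the same text are linked, and that the view with the largest average clustering coefficient is taken as optimal (the abstract does not state the selection criterion). The function names `build_view`, `average_clustering` and `select_view`, and the view labels, are hypothetical.

```python
from itertools import combinations


def build_view(keyword_sets):
    """Build an undirected co-occurrence graph (assumption: keywords
    that appear in the same text are linked to each other)."""
    graph = {}
    for keywords in keyword_sets:
        unique = sorted(set(keywords))
        for word in unique:
            graph.setdefault(word, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes.
    Nodes with fewer than two neighbours contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def select_view(views):
    """Return the name of the view with the largest average clustering
    coefficient (assumed criterion, not confirmed by the source)."""
    return max(views, key=lambda name: average_clustering(views[name]))


# Hypothetical candidate views: one built from context only,
# one that also merges keywords from comments.
views = {
    "context_only": build_view([["a", "b"], ["b", "c"]]),
    "context_comments": build_view([["a", "b", "c"]]),
}
best = select_view(views)
```

In this toy input the first view is a chain of three keywords (no triangles), while the second is a fully connected triangle, so the second has the higher coefficient under the assumed criterion.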
About this article
Cite this article
Zhang, S., Wang, Y., Zhang, S. et al. Building associated semantic representation model for the ultra-short microblog text jumping in big data. Cluster Comput 19, 1399–1410 (2016). https://doi.org/10.1007/s10586-016-0602-9
DOI: https://doi.org/10.1007/s10586-016-0602-9