Abstract
Analytic jobs over social media data typically need to explore data from different periods. However, most existing keyword search work merely uses the creation time of items as the measure of their recency. In this paper we propose the top-k temporal keyword query, which ranks items by the aggregate number of times they were shared during a given time window. We provide a query algorithm that can be executed over a general temporal inverted index. A complexity analysis based on the power-law distribution gives an upper bound on the number of accessed items. Furthermore, a two-tier structure and a piecewise maximum approximation sketch are proposed as refinements. Extensive empirical studies on a real-life dataset show that the combination of the two refinements achieves a remarkable performance improvement under different query settings.
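To make the query semantics concrete, the following is a minimal brute-force sketch in Python. It only illustrates the ranking criterion described above (share count within a time window). It is not the paper's index-based algorithm, and it uses neither the two-tier structure nor the approximation sketch. The toy data, the names `ITEMS`, `shares_in_window` and `top_k_temporal`, and the tie-breaking rule are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy data: item id -> (keywords, sorted share timestamps).
ITEMS = {
    "a": ({"apple", "phone"}, [1, 2, 3, 10, 11]),
    "b": ({"apple"}, [4, 5, 6, 7]),
    "c": ({"phone"}, [5, 6]),
    "d": ({"apple", "pie"}, [20, 21, 22]),
}


def shares_in_window(timestamps, start, end):
    """Count timestamps t with start <= t <= end; timestamps must be sorted."""
    return bisect_right(timestamps, end) - bisect_left(timestamps, start)


def top_k_temporal(items, keyword, start, end, k):
    """Return up to k (item_id, score) pairs for items containing `keyword`,
    ranked by the number of shares inside [start, end].

    Assumption: items with no shares in the window are omitted, and ties
    are broken by item id in ascending order.
    """
    scored = []
    for item_id, (keywords, timestamps) in items.items():
        if keyword not in keywords:
            continue
        score = shares_in_window(timestamps, start, end)
        if score > 0:
            scored.append((item_id, score))
    # Sort by score descending, then item id ascending, and keep the first k.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    print(top_k_temporal(ITEMS, "apple", 3, 10, 2))
```

This baseline scans every item that matches the keyword, so its cost grows with the length of the keyword's posting list. The abstract's contribution is precisely to bound and reduce the number of items that must be accessed.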
Acknowledgements
This work is partially supported by National High-tech R&D Program (863 Program) under grant number 2015AA015307 and National Science Foundation of China under grant number 61432006.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Xia, F., Yu, C., Qian, W., Zhou, A. (2016). Top-k Temporal Keyword Query over Social Media Data. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_15
DOI: https://doi.org/10.1007/978-3-319-45814-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45813-7
Online ISBN: 978-3-319-45814-4
eBook Packages: Computer Science (R0)