Abstract
Social media platforms are a rich source of information, but only a small fraction of what is available interests any given user. To help users keep up with the latest topics of interest amid the large volume of information on social media, we present a framework for data stream summarization based on relevant content filtering. Specifically, given a topic or event of interest, the framework dynamically discovers relevant information and separates it from irrelevant information in the text stream provided by social media platforms. It then captures the most representative and up-to-date information to generate a sequential summary, or event storyline, that follows the evolution of the topic or event. The framework does not depend on any labeled data; instead, it uses weak supervision provided by the user, which matches the real scenario of users searching for information about an ongoing event. Experiments on two real events tracked through Twitter verify the effectiveness of the proposed framework. Its robustness when using the most easily obtained form of weak supervision, i.e., a trending topic or hashtag, indicates that the framework can be easily integrated into social media platforms such as Twitter to generate sequential summaries for events of interest. We also make the manually generated gold-standard sequential summaries of the two test events publicly available (https://drive.google.com/open?id=15jRw13i0xARUW3HqBn3BdR45IXk7P2Qj-HO__OFmMW0) for future use in the community.
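The filter-then-summarize idea described in the abstract can be illustrated with a small sketch. This is not the authors' method: it swaps in a simple seed-centroid cosine filter and a per-window term-frequency selection as stand-ins, and the names `filter_relevant`, `sequential_summary`, `STOPWORDS`, the threshold, and the window size are all assumptions made for illustration. The only weak supervision used is a single hashtag, as the abstract suggests.

```python
import math
import re
from collections import Counter, defaultdict

# Toy stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "after", "and", "for", "i", "in", "of", "the", "to"}


def raw_tokens(text):
    """Lowercase the text and split it into word and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())


def content_tokens(text):
    """Distinct tokens with hashtags and stopwords removed."""
    return {t for t in raw_tokens(text)
            if not t.startswith("#") and t not in STOPWORDS}


def filter_relevant(tweets, hashtag, threshold=0.3):
    """Keep tweets whose cosine similarity to the hashtag-seeded centroid
    reaches the threshold. Tweets carrying the hashtag act as weak labels."""
    seed = [text for _, text in tweets if hashtag in raw_tokens(text)]
    centroid = Counter(t for text in seed for t in content_tokens(text))
    c_norm = math.sqrt(sum(v * v for v in centroid.values()))
    kept = []
    for ts, text in tweets:
        toks = content_tokens(text)
        if not toks or c_norm == 0:
            continue
        cos = sum(centroid[t] for t in toks) / (c_norm * math.sqrt(len(toks)))
        if cos >= threshold:
            kept.append((ts, text))
    return kept


def sequential_summary(tweets, window):
    """For each time window, pick the tweet whose tokens are, on average,
    the most frequent within that window."""
    buckets = defaultdict(list)
    for ts, text in tweets:
        buckets[ts // window].append((ts, text))

    summary = []
    for key in sorted(buckets):
        items = buckets[key]
        tf = Counter(t for _, text in items for t in content_tokens(text))

        def centrality(item):
            toks = content_tokens(item[1])
            return sum(tf[t] for t in toks) / len(toks) if toks else 0.0

        summary.append(max(items, key=centrality))
    return summary


# Hypothetical stream of (timestamp, text) pairs.
stream = [
    (0, "Earthquake hits the city #quake"),
    (1, "Strong earthquake shakes the city centre"),
    (2, "I love pizza and coffee"),
    (10, "Rescue teams arrive after earthquake #quake"),
    (11, "Rescue teams searching city after earthquake"),
    (12, "Coffee and pizza for lunch"),
]
relevant = filter_relevant(stream, "#quake")
for ts, text in sequential_summary(relevant, window=10):
    print(ts, text)
```

In this toy stream, tweets without the hashtag can still pass the filter when their vocabulary overlaps with the hashtag-bearing tweets, which is the role the weak supervision plays here.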
Notes
- 1.
It is not explicitly labeled for the classification task, but is instead obtained from the data itself.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Dong, C., Agarwal, A. (2016). A Relevant Content Filtering Based Framework for Data Stream Summarization. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10047. Springer, Cham. https://doi.org/10.1007/978-3-319-47874-6_14
DOI: https://doi.org/10.1007/978-3-319-47874-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47873-9
Online ISBN: 978-3-319-47874-6
eBook Packages: Computer Science, Computer Science (R0)