Abstract
We introduce a tool that supports knowledge workers who want to gain insights from a tweet collection but, due to time constraints, cannot go over all tweets. Our system first pre-processes, de-duplicates, and clusters the tweets. The detected clusters are presented to the expert as so-called information threads. Subsequently, based on the information thread labels provided by the expert, a classifier is trained that can be used to classify additional tweets. As a case study, the tool is evaluated on a tweet collection based on the key terms ‘genocide’ and ‘Rohingya’. The average precision and recall of the classifier on six classes are 0.83 and 0.82, respectively. At this level of performance, experts can use the tool to manage tweet collections efficiently without missing much information.
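The pre-processing and de-duplication steps mentioned in the abstract could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the tool's actual implementation: the abstract does not specify the normalization rules, so the placeholder tokens, the retweet-prefix handling, and the function names `normalize` and `deduplicate` are all hypothetical, and only exact duplicates after normalization are merged.

```python
import re

# Assumed normalization rules (not taken from the paper): URLs and user
# mentions are replaced by placeholder tokens so that tweets differing only
# in a shortened link or an addressed user collapse to the same key.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_PREFIX_RE = re.compile(r"^rt <user>:?\s*")
WHITESPACE_RE = re.compile(r"\s+")


def normalize(tweet: str) -> str:
    """Return a normalized key for a tweet (hypothetical rules)."""
    text = tweet.lower()
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    # Drop a leading retweet marker such as "rt <user>:".
    text = RETWEET_PREFIX_RE.sub("", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def deduplicate(tweets):
    """Group tweets by normalized key.

    Returns a list of (first_seen_tweet, count) pairs in first-seen order,
    so one representative per group can be passed on to clustering while
    the repetition count is kept.
    """
    groups = {}
    for tweet in tweets:
        key = normalize(tweet)
        if key not in groups:
            groups[key] = [tweet, 0]
        groups[key][1] += 1
    return [(representative, count) for representative, count in groups.values()]
```

Calling `deduplicate` on a list of raw tweet texts yields one representative per group together with how often it occurred. Near-duplicate detection, clustering, and classifier training are outside this sketch.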
Notes
- 6. We used scikit-learn v0.17.1 for all machine learning tasks in this study (http://scikit-learn.org).
- 7. The annotation is designed to be done or coordinated by a single person in our setting.
- 8. The expert may prefer to tolerate a few different tweets at the end of the group if the majority of the tweets are coherent, and treat the cluster as coherent.
- 9. We note that the repetition pattern analysis is valuable in its own right. However, this information is not within the scope of the present study.
Acknowledgements
This research was funded by the Dutch national research programme COMMIT. We gratefully acknowledge the contribution by Statistics Netherlands.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Hürriyetoǧlu, A., Gudehus, C., Oostdijk, N., van den Bosch, A. (2016). Relevancer: Finding and Labeling Relevant Information in Tweet Collections. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10047. Springer, Cham. https://doi.org/10.1007/978-3-319-47874-6_15
DOI: https://doi.org/10.1007/978-3-319-47874-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47873-9
Online ISBN: 978-3-319-47874-6
eBook Packages: Computer Science (R0)