Relevancer: Finding and Labeling Relevant Information in Tweet Collections

Ali Hürriyetoǧlu¹⁵,¹⁶,
Christian Gudehus¹⁷,
Nelleke Oostdijk¹⁶ &
Antal van den Bosch¹⁶

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10047)

Included in the following conference series:

International Conference on Social Informatics


Abstract

We introduce a tool that supports knowledge workers who want to gain insights from a tweet collection but, due to time constraints, cannot go over all tweets. Our system first pre-processes, de-duplicates, and clusters the tweets. The detected clusters are presented to the expert as so-called information threads. Subsequently, based on the information thread labels provided by the expert, a classifier is trained that can be used to classify additional tweets. As a case study, the tool is evaluated on a tweet collection based on the key terms ‘genocide’ and ‘Rohingya’. The average precision and recall of the classifier on six classes are 0.83 and 0.82, respectively. At this level of performance, experts can use the tool to manage tweet collections efficiently without missing much information.
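The de-duplication step and the averaged precision and recall mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the normalization rules (masking URLs and user mentions, dropping a leading retweet marker), the function names, and the choice of unweighted (macro) averaging over classes are all assumptions made for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter

# Hypothetical normalization patterns; the rules used by the tool may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize(tweet):
    """Lower-case a tweet and mask URLs and user mentions (assumed rules)."""
    text = URL_RE.sub("urlurl", tweet.lower())
    text = MENTION_RE.sub("usrusr", text)
    text = re.sub(r"^rt\s+", "", text.strip())  # drop a leading retweet marker
    return " ".join(text.split())  # collapse runs of whitespace


def deduplicate(tweets):
    """Keep the first tweet for every distinct normalized form."""
    seen, unique = set(), []
    for tweet in tweets:
        key = normalize(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


def macro_precision_recall(gold, predicted):
    """Unweighted mean of per-class precision and recall (assumed averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p gets a false positive
            fn[g] += 1  # gold class g gets a false negative
    classes = sorted(set(gold))

    def ratio(hits, misses):
        return hits / (hits + misses) if hits + misses else 0.0

    precision = sum(ratio(tp[c], fp[c]) for c in classes) / len(classes)
    recall = sum(ratio(tp[c], fn[c]) for c in classes) / len(classes)
    return precision, recall
```

In this sketch, two retweets of the same text by different users collapse to a single entry because their normalized forms are identical. Clustering and classifier training are left out; any clustering or classification method could be slotted in between the two steps shown.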


Notes

1. For example, https://wiki.ushahidi.com/display/WIKI/SwiftRiver, https://github.com/qcri-social/AIDR/wiki/AIDR-Overview, and https://github.com/JakobRogstadius/CrisisTracker.
2. https://bitbucket.org/hurrial/relevancer.
3. http://relevancer.science.ru.nl.
4. http://www.oed.com/view/Entry/71808.
5. https://dev.twitter.com/rest/public.
6. We used scikit-learn v0.17.1 for all machine learning tasks in this study; see http://scikit-learn.org.
7. In our setting, the annotation is designed to be done or coordinated by a single person.
8. If the majority of the tweets in a group are coherent, the expert may prefer to tolerate a few deviating tweets at the end of the group and treat the cluster as coherent.
9. We note that the repetition pattern analysis is valuable in its own right. However, this information is not within the scope of the present study.


Acknowledgements

This research was funded by the Dutch national research programme COMMIT. We gratefully acknowledge the contribution by Statistics Netherlands.

Author information

Authors and Affiliations

Statistics Netherlands, P.O. Box 4481, 6401 CZ, Heerlen, The Netherlands
Ali Hürriyetoǧlu
Centre for Language Studies, Radboud University, P.O. Box 9103, 6500 HD, Nijmegen, The Netherlands
Ali Hürriyetoǧlu, Nelleke Oostdijk & Antal van den Bosch
Faculty of Social Science, Ruhr University Bochum, 150 Building GB 04/148, 44801, Bochum, Germany
Christian Gudehus


Corresponding author

Correspondence to Ali Hürriyetoǧlu.

Editor information

Editors and Affiliations

University of Washington, Seattle, Washington, USA
Emma Spiro
Indiana University, Bloomington, Indiana, USA
Yong-Yeol Ahn

About this paper

Cite this paper

Hürriyetoǧlu, A., Gudehus, C., Oostdijk, N., van den Bosch, A. (2016). Relevancer: Finding and Labeling Relevant Information in Tweet Collections. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10047. Springer, Cham. https://doi.org/10.1007/978-3-319-47874-6_15


DOI: https://doi.org/10.1007/978-3-319-47874-6_15
Published: 19 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47873-9
Online ISBN: 978-3-319-47874-6
eBook Packages: Computer Science, Computer Science (R0)
