Abstract
We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Tools for collecting event information from data produced in multiple countries enable comparative studies in sociology and politics. We have annotated English news articles from a source country and a target country so that we can measure how tools developed on data from one country perform on data from another. Our preliminary experiments show that the performance of tools developed using English texts from India drops to a level that is not usable when they are applied to English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well independent of the source of the text, and that it will accelerate progress toward generalizable NLP systems.
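The cross-country measurement described above can be illustrated with a minimal sketch. It assumes a binary labeling (1 = protest-related, 0 = not) and uses F1 as the metric; the function names, the choice of metric, and the toy labels are all hypothetical and are not taken from the paper or its experiments.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Binary F1 for the positive class (1 = protest-related)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def generalization_gap(source_f1: float, target_f1: float) -> float:
    """Absolute F1 lost when a tool moves from source to target country."""
    return source_f1 - target_f1


# Toy placeholder labels, invented purely for illustration.
# "source" stands for held-out data from the country the tool was built on,
# "target" for data from a different country.
source_gold = [1, 0, 1, 1, 0, 0]
source_pred = [1, 0, 1, 1, 0, 1]
target_gold = [1, 1, 0, 1, 0, 0]
target_pred = [1, 0, 0, 0, 1, 0]

source_f1 = f1_score(source_gold, source_pred)
target_f1 = f1_score(target_gold, target_pred)
gap = generalization_gap(source_f1, target_f1)
print(f"source F1={source_f1:.3f}  target F1={target_f1:.3f}  gap={gap:.3f}")
```

A large positive gap on real annotated data would indicate that a tool does not generalize from the source to the target country.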
Notes
- 1.
- 2. http://www.clef-initiative.eu, accessed January 19, 2019.
- 3. http://clef2019.clef-initiative.eu, accessed January 19, 2019.
- 4. https://emw.ku.edu.tr/clef-protestnews-2019, accessed January 19, 2019.
- 5. Existing corpora that may already be distributed freely are not an option for our setting, because we require a representative sample from the source and target countries. The dataset must also contain data created in more than one country to be useful in our setting.
- 6. The overlap ratio is 100%.
- 7. We mainly annotate the event trigger, place, time, participant, organizer, and target of the protest.
- 8. The differences between our annotation manual and those of these projects potentially affect the precision and recall as well.
- 9. https://github.com/emerging-welfare/ie-tools-test-on-India-b1, accessed January 19.
Acknowledgments
This work is funded by the European Research Council (ERC) Starting Grant 714868 awarded to Dr. Erdem Yörük for his project Emerging Welfare (https://emw.ku.edu.tr, accessed January 19). We are grateful to the steering committee members of our CLEF 2019 lab: Sophia Ananiadou, Antal van den Bosch, Kemal Oflazer, Arzucan Özgür, Aline Villavicencio, and Hristo Tanev. Finally, we thank Theresa Gessler and Peter Makarov for their contributions to organizing the CLEF lab by reviewing the annotation manuals and sharing their work with us, respectively.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hürriyetoğlu, A. et al. (2019). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11438. Springer, Cham. https://doi.org/10.1007/978-3-030-15719-7_42
DOI: https://doi.org/10.1007/978-3-030-15719-7_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15718-0
Online ISBN: 978-3-030-15719-7
eBook Packages: Computer Science, Computer Science (R0)