DOI: 10.1145/2786451.2786470
research-article

Analyzing Discourse Communities with Distributional Semantic Models

Published: 28 June 2015

Abstract

This paper presents a new corpus-driven approach applicable to the study of language patterns in social and political contexts, or Critical Discourse Analysis (CDA), using Distributional Semantic Models (DSMs). The approach considers changes in word semantics both over time and between communities with differing viewpoints. The geometric spaces constructed by DSMs, or "word spaces", offer an objective, robust exploratory analysis tool for revealing novel patterns and similarities between communities, and for highlighting when these changes occur. To quantify differences between word spaces built from different time periods and different communities, we analyze the nearest neighboring words in the DSM, a process we relate to analyzing "concordance lines". This makes the approach intuitive and interpretable to practitioners. We demonstrate the usefulness of the approach with two case studies, following groups with opposing political ideologies in the Scottish Independence Referendum and the 2014 US Midterm Elections.
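The abstract does not specify which DSM is used or how nearest-neighbor lists are compared across word spaces. The following is only a minimal sketch under stated assumptions: each community's word space is a simple count-based co-occurrence model (the paper may well use a different DSM, for example a predictive model such as word2vec), similarity is cosine, and the difference for a target word is taken to be the Jaccard overlap of its top-k neighbors in the two spaces. The function names (`build_word_space`, `nearest_neighbors`, `neighbor_overlap`) and the toy sentences are hypothetical and serve illustration only.

```python
import math
from collections import Counter, defaultdict


def build_word_space(sentences, window=2):
    """Hypothetical count-based word space: word -> Counter of context-word co-occurrences."""
    space = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Count every token within `window` positions of `word` as a context.
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1
    return dict(space)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[ctx] for ctx, count in u.items() if ctx in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(space, target, k=5):
    """Top-k words closest to `target` by cosine; ties broken alphabetically."""
    if target not in space:
        return []
    scored = [(cosine(space[target], vec), word)
              for word, vec in space.items() if word != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


def neighbor_overlap(space_a, space_b, target, k=5):
    """Jaccard overlap of the target's top-k neighbors in two spaces (1.0 = same neighbors)."""
    a = set(nearest_neighbors(space_a, target, k))
    b = set(nearest_neighbors(space_b, target, k))
    union = a | b
    if not union:
        return 0.0  # no neighbors in either space: report no overlap
    return len(a & b) / len(union)


# Toy corpora standing in for two communities (invented for illustration).
community_a = ["independence means freedom for scotland",
               "freedom and independence for scotland"]
community_b = ["independence means risk for scotland",
               "risk and uncertainty after independence"]

space_a = build_word_space(community_a)
space_b = build_word_space(community_b)
print(nearest_neighbors(space_a, "independence", k=3))
print(nearest_neighbors(space_b, "independence", k=3))
print(neighbor_overlap(space_a, space_b, "independence", k=3))
```

Under these assumptions, a low overlap for a word such as "independence" would flag it for closer reading, much as a practitioner would compare concordance lines drawn from the two corpora. Building one space per time slice instead of per community gives the temporal comparison in the same way.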

Information

Published In

ACM Conferences
WebSci '15: Proceedings of the ACM Web Science Conference
June 2015
366 pages
ISBN:9781450336727
DOI:10.1145/2786451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WebSci '15: ACM Web Science Conference
June 28 - July 1, 2015
Oxford, United Kingdom

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 22
  • Downloads (last 6 weeks): 1
Reflects downloads up to 05 Feb 2025

Cited By

  • (2024) Analysis of the Formation of Scientific Communities in the Journal Research and Education in Nursing (2010 - 2020) and its Disciplinary Influence: an Approach from Bibliometric Analysis, Network Analysis, and Natural Language Processing. Investigación y Educación en Enfermería 42:2. DOI: 10.17533/udea.iee.v42n2e12. Online publication date: 2024.
  • (2024) A Pertinence Score for Political Discourse Analysis: The Case of 2018 Colombian Elections. Digital Government: Research and Practice 5:3 (1-15). DOI: 10.1145/3689213. Online publication date: 8-Oct-2024.
  • (2024) Climate change denial and ideology in Swedish online media: measuring ideology change using a computational approach. Journal of Computational Social Science 8:1. DOI: 10.1007/s42001-024-00343-x. Online publication date: 16-Dec-2024.
  • (2022) Discursive construction of migrant otherness on Facebook: A distributional semantics approach. Discourse & Society 34:2 (236-254). DOI: 10.1177/09579265221117014. Online publication date: 21-Oct-2022.
  • (2022) A scoping review on the use of natural language processing in research on political polarization: trends and research prospects. Journal of Computational Social Science 6:1 (289-313). DOI: 10.1007/s42001-022-00196-2. Online publication date: 19-Dec-2022.
  • (2020) EPIC30M: An Epidemics Corpus of Over 30 Million Relevant Tweets. 2020 IEEE International Conference on Big Data (Big Data) (1206-1215). DOI: 10.1109/BigData50022.2020.9377739. Online publication date: 10-Dec-2020.
  • (2018) Identify Shifts of Word Semantics through Bayesian Surprise. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (825-834). DOI: 10.1145/3209978.3210040. Online publication date: 27-Jun-2018.
  • (2018) Stance Classification through Proximity-based Community Detection. Proceedings of the 29th on Hypertext and Social Media (220-228). DOI: 10.1145/3209542.3209549. Online publication date: 3-Jul-2018.
  • (2018) Viewpoint Discovery and Understanding in Social Networks. Proceedings of the 10th ACM Conference on Web Science (47-56). DOI: 10.1145/3201064.3201076. Online publication date: 15-May-2018.
  • (2017) Users Are Known by the Company They Keep. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (87-96). DOI: 10.1145/3132847.3132897. Online publication date: 6-Nov-2017.
