Time-sensitive Arabic multiword expressions extraction from social networks

Daoud Daoud¹,
Akram Al-Kouz¹ &
Mohammad Daoud²

Abstract

In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWEs) from social networks. We collected and processed 15 million tweets to form our data set. Because of the complexity of processing Arabic and the lack of resources, we built an experimental system that extracts and relates similar MWEs using statistical methods. We introduce a new metric for measuring valid MWEs in social networks. We compare the results obtained from our experimental system against a semantic graph obtained from a web knowledge base.
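To make "statistical methods" concrete, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a standard association measure (see Church and Hanks, 1990, in the references). It is only an illustration: it is not the system or the new metric described in this paper, and the function name `pmi_bigrams`, the `min_count` threshold, and the toy tweets are all assumptions made for the example.

```python
import math
from collections import Counter


def pmi_bigrams(tokenized_tweets, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not the paper's metric.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in tokenized_tweets:
        unigrams.update(tokens)
        # Adjacent pairs within one tweet are the MWE candidates.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, already-tokenized tweets (assumed data, not from the paper's corpus).
tweets = [
    ["عاصفة", "الحزم", "اليوم"],
    ["عاصفة", "الحزم", "مستمرة"],
    ["اليوم", "الطقس", "جميل"],
    ["الطقس", "اليوم", "حار"],
]
ranked = pmi_bigrams(tweets)
```

On this toy input, only the pair "عاصفة الحزم" occurs often enough to pass the `min_count` filter, so it is the single candidate returned. A real pipeline would also need Arabic normalization and stemming before counting, which this sketch omits.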

Notes

https://lucene.apache.org/.
Google Translate renders “عاصفة الحزم” in English as “Storm packets”, which is unrelated to the source MWE. This clearly demonstrates the need to treat an MWE as a single unit.
http://wiki.dbpedia.org/.

References

Al-Haj, H. (2010). Identifying multi-word expressions by leveraging morphological and syntactic idiosyncrasy. In Proceedings of the 23rd international conference on computational linguistics (COLING).
Alkouz, A. & Albayrak, S. (2012). An interests discovery approach in social networks based on semantically enriched graphs. In International conference on advances in social networks analysis and mining, Istanbul.
Baldwin, T. et al. (2008). A machine learning approach to multiword expression extraction. In Proceedings of the LREC workshop towards a shared task for multiword expressions.
Bar, K. & Dershowitz N. (2014). Inferring paraphrases for a highly inflected language from a monolingual corpus. In Computational linguistics and intelligent text processing, Lecture notes in computer science, New York: Springer, 8404, pp 254–270.
Bruce, C., et al. (2009). Search engines: Information retrieval in practice. Boston: Addison-Wesley Publishing Company.
Covington, M. A. (1992). A dependency parser for variable-word-order languages. In K. R. Billingsley, H. U. Brown III, & E. Derohanes (Eds.), Computer assisted modeling on the IBM 3090: Papers from the 1989 IBM supercomputing competition. Athens: Baldwin Press.
Daoud, D. (2005). Arabic Deconversion in the framework of the universal networking language. In J. Cardeñosa, A. Gelbukh & E. Tovar (Eds.), Universal networking language, Advances in Theory and Applications. Research on Computing Science (Vol. 12).
Daoud, D. & Qais H. (2011). Stemming arabic using longest-match and dynamic normalization. In Arabic language technology international conference (ALTIC) 2011, Bibliotheca Alexandrina (B.A.), Alexandria.
Daoud, D., & Boitet, C. (2014). Correctness, strength and similarity evaluation of stemming algorithms for arabic. The Egyptian Journal of Language Engineering, 1(1), 17–23.
Daoud, D., et al. (2015). Arabic tweets clustering and labeling based on lingual and semantically enriched bayesian network model. Recent Patents on Computer Science, 8(2), 1–14.
Ethnologue (2015). Ethnologue languages of the world. Retrieved 2015, from http://www.ethnologue.com/statistics/size.
Frank, S. (1993). Retrieving collocations from text: Xtract. Computational Linguistics, 19(1), 143–177.
Graham, K. & Giesbrecht, E. (2006). Automatic identification of non-compositional multiword expressions using latent semantic analysis. In Workshop on multiword expressions: Identifying and exploiting underlying properties, Sydney: Association for Computational Linguistics.
Grinev, M. et al. (2011). Analytics for the realtime web. In Proceedings of the VLDB endowment.
Haewoon, K., et al. (2010). What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web. Raleigh: ACM.
Ivan, A. S. et al. (2002). Multiword expressions: a pain in the neck for NLP. In Proceedings of the third international conference on computational linguistics and intelligent text processing, Springer-Verlag.
Jackendoff, R. (1997). The architecture of the language faculty. Cambridge, MA: MIT Press.
Kenneth Ward, C., & Patrick, H. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29.
Meghdad, F. & Ronaldo M. (2014). A supervised model for extraction of multiword expressions, based on statistical context features. In Proceedings of the 10th workshop on multiword expressions (MWE), Gothenburg: Association for Computational Linguistics.
Piao, S. S., et al. (2005). Comparing and combining a semantic tagger and a statistical tool for MWE extraction. Computer Speech & Language, 19(4), 378.
Ramisch, C. (2015). Multiword expressions acquisition: A generic and open framework. Cham: Springer.
Ramisch, C. et al. (2010). Multiword Expressions in the wild? The mwetoolkit comes in handy. COLING (Demos), In Demonstrations volume.
Salloum, W. & Habash N. (2011). Dialectal to standard Arabic paraphrasing to improve Arabic-English statistical machine translation. In Dialects workshop at the conference for empirical methods in natural language processing, Edinburgh.
Uherčík, T. et al. (2013). Utilizing microblogs for web page relevant term acquisition. In SOFSEM 2013: Theory and practice of computer science lecture notes in computer science, 7741: pp. 457–468.
Veronika Vincze, N. T. I. & Berend G. (2011). Multiword expressions and named entities in the Wiki50 corpus. In International conference recent advances in natural language processing, RANLP.
Yassin, Y. A. (2003). Why arabic is the most difficult language for localization. Globalization Insider, XII(3.6), 5.
Yulia, T. & Shuly W. (2010). Extraction of multi-word expressions from small parallel corpora. In Proceedings of the 23rd international conference on computational linguistics: Posters, Beijing: Association for Computational Linguistics.
Yulia, T., & Shuly, W. (2014). Identification of multiword expressions by combining multiple linguistic information sources. Computational Linguistics, 40, 449–468.

Author information

Authors and Affiliations

Department of Computer Science, Princess Sumaya University for Technology, Amman, Jordan
Daoud Daoud & Akram Al-Kouz
Department of Computer Science, American University of Madaba, Madaba, Jordan
Mohammad Daoud

Corresponding author

Correspondence to Akram Al-Kouz.

About this article

Cite this article

Daoud, D., Al-Kouz, A. & Daoud, M. Time-sensitive Arabic multiword expressions extraction from social networks. Int J Speech Technol 19, 249–258 (2016). https://doi.org/10.1007/s10772-015-9315-3

Received: 04 June 2015
Accepted: 18 October 2015
Published: 29 October 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10772-015-9315-3
