Abstract
In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWEs) from social networks. We collected and processed 15 million tweets to form our data set. Because Arabic is complex to process and language resources are scarce, we built an experimental system that extracts MWEs and relates similar ones using statistical methods. We introduce a new metric for measuring the validity of MWEs in social networks, and we compare the results of our experimental system against a semantic graph obtained from a web knowledge base.
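The abstract does not name the statistical measures used. As an illustration only, the sketch below scores adjacent word pairs in tokenized tweets by pointwise mutual information (PMI), a standard association measure for collocation and MWE candidate extraction. It is not the paper's new metric. The function name `pmi_bigrams`, the `min_count` threshold, and plain whitespace tokenization are all hypothetical choices made for this example.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Iterable


def pmi_bigrams(tweets: Iterable[list[str]], min_count: int = 2) -> dict[tuple[str, str], float]:
    """Score adjacent word pairs by PMI (generic measure, not the paper's metric)."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in tweets:
        unigrams.update(tokens)
        # Count adjacent pairs within one tweet only, never across tweets.
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores: dict[tuple[str, str], float] = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # Rare pairs give unreliable PMI estimates.
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy input: three already-tokenized tweets.
tweets = [
    "عاصفة الحزم بدأت".split(),
    "تأييد عاصفة الحزم".split(),
    "عاصفة قوية".split(),
]
scores = pmi_bigrams(tweets)
print(scores)  # only ("عاصفة", "الحزم") reaches min_count=2; its PMI is about 2.09
```

A positive PMI means the two words co-occur more often than chance predicts. In practice, Arabic text would first need normalization and stemming, and high-scoring pairs would still need validation before being accepted as MWEs.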
Notes
Translating “عاصفة الحزم” (usually rendered in English as “Operation Decisive Storm”) with Google Translate yields “Storm packets”, which is unrelated to the meaning of the source MWE. This clearly demonstrates why an MWE must be treated as a single unit.
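One simple way to treat an MWE as a single unit is to merge its words into one token before translation or further analysis. The sketch below does this by greedy longest match against a lexicon of known MWEs. The lexicon, the `_` joiner, whitespace tokenization, and the name `merge_mwes` are hypothetical choices for illustration, not part of the described system.

```python
from __future__ import annotations


def merge_mwes(tokens: list[str], lexicon: set[tuple[str, ...]], joiner: str = "_") -> list[str]:
    """Greedy longest match: replace each known MWE (2+ words) with one joined token."""
    max_len = max((len(m) for m in lexicon), default=1)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so a longer MWE wins over its own prefix.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                out.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])  # No MWE starts here; keep the single word.
            i += 1
    return out


lexicon = {("عاصفة", "الحزم")}
print(merge_mwes("بدأت عاصفة الحزم اليوم".split(), lexicon))
# ['بدأت', 'عاصفة_الحزم', 'اليوم']
```

Real Arabic input would also need handling of attached clitics (for example a prefixed و), which this sketch ignores.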
Cite this article
Daoud, D., Al-Kouz, A. & Daoud, M. Time-sensitive Arabic multiword expressions extraction from social networks. Int J Speech Technol 19, 249–258 (2016). https://doi.org/10.1007/s10772-015-9315-3