Time-sensitive Arabic multiword expressions extraction from social networks

  • Special Issue Article
International Journal of Speech Technology

Abstract

In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWEs) from social networks. We collected and processed 15 million tweets to form our data set. Because Arabic is complex to process and resources are scarce, we built an experimental system that extracts and relates similar MWEs using statistical methods. We introduce a new metric for measuring valid MWEs in social networks. We compare the results obtained from our experimental system against a semantic graph obtained from a web knowledge base.

Notes

  1. https://lucene.apache.org/.

  2. Google Translate renders “عاصفة الحزم” in English as “Storm packets”, which is unrelated to the source MWE. This clearly demonstrates the need to treat an MWE as a single unit.

  3. http://wiki.dbpedia.org/.

Author information

Corresponding author

Correspondence to Akram Al-Kouz.

About this article

Cite this article

Daoud, D., Al-Kouz, A. & Daoud, M. Time-sensitive Arabic multiword expressions extraction from social networks. Int J Speech Technol 19, 249–258 (2016). https://doi.org/10.1007/s10772-015-9315-3

  • DOI: https://doi.org/10.1007/s10772-015-9315-3
