DOI: 10.3115/1075218.1075244
Article

A morphologically sensitive clustering algorithm for identifying Arabic roots

Published: 03 October 2000

Abstract

We present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for information retrieval (IR). Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show successful treatment of infixes and clustering accuracy of up to 94.06% on unedited Arabic text samples, without the use of dictionaries.
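The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It assumes the classic Adamson and Boreham (1974) measure, a Dice coefficient 2C/(A+B) over the sets of unique character bigrams of two words, where C is the number of shared bigrams. Everything else is a hypothetical choice for illustration: the Buckwalter-style transliteration, the prefix and suffix lists, the 0.5 threshold and the single-link grouping. The paper's Arabic-specific modifications of the coefficient are not reproduced here.

```python
from itertools import combinations

# Hypothetical affix inventories in Buckwalter-style transliteration
# (Al = definite article, w/b = proclitics, p = ta marbuta, h = pronoun suffix).
# These are illustrative assumptions, NOT the lists used in the paper.
PREFIXES = ("wAl", "bAl", "Al", "w", "b")
SUFFIXES = ("wn", "yn", "At", "p", "h")


def light_stem(word: str, min_len: int = 3) -> str:
    """Stage 1 (sketch): strip at most one prefix and one suffix,
    never leaving fewer than min_len characters."""
    for p in sorted(PREFIXES, key=len, reverse=True):  # longest match first
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[: -len(s)]
            break
    return word


def bigrams(word: str) -> set[str]:
    """Set of unique adjacent character pairs."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: str, b: str) -> float:
    """Stage 2 (sketch): Adamson-Boreham style Dice coefficient 2C / (A + B)
    over unique bigrams; no Arabic-specific weighting is applied here."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def cluster(words: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Single-link grouping (assumed): words whose light stems score
    >= threshold end up in the same cluster."""
    stems = {w: light_stem(w) for w in words}
    parent = {w: w for w in words}  # union-find forest

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if dice(stems[a], stems[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(cluster(["Alktb", "ktb", "wktbh", "drs", "Aldrs"]))
    print(dice("ktb", "kAtb"))
```

Run as a script, this prints `[['Aldrs', 'drs'], ['Alktb', 'ktb', 'wktbh']]` and then `0.4`. Light stemming alone is enough to group the prefixed and suffixed forms, but `dice("ktb", "kAtb")` scores only 0.4 because the long-vowel infix breaks the shared bigrams, so the plain bigram measure misses that pair at a 0.5 threshold. This is the kind of infix case the abstract says the paper's morphology-sensitive coefficients are designed to handle.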

References

[1] Adams, E. (1991) A Study of Trigrams and their feasibility as Index Terms in a full text Information Retrieval System. PhD Thesis, George Washington University, USA.
[2] Adamson, George W. and J. Boreham (1974) The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Information Storage and Retrieval, 10, pp. 253--260.
[3] Al-Fedaghi, Sabah S. and Fawaz Al-Anzi (1989) A new algorithm to generate Arabic root-pattern forms. Proceedings of the 11th National Computer Conference, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia, pp. 04--07.
[4] Al-Kharashi, I. and M. Evens (1994) Comparing words, stems, and roots as Index terms in an Arabic Information Retrieval system. Journal of the American Society for Information Science, 45/8, pp. 548--560.
[5] Al-Najem, Salah R. (1998) An Explanation of Computational Arabic Morphology. DATR Documentation Report, University of Sussex.
[6] Al-Raya (1997) Newspaper. Qatar.
[7] Al-Shalabi, R. and M. Evens (1998) A Computational Morphology System for Arabic. Proceedings of COLING-ACL, New Brunswick, NJ.
[8] Al-Watan (2000) Newspaper. Qatar.
[9] Beesley, K. B. (1996) Arabic Finite-State Morphological Analysis and Generation. Proceedings of COLING-96, pp. 89--94.
[10] Beesley, K. B. (1998) Arabic Morphological Analysis on the Internet. Proceedings of the 6th International Conference and Exhibition on Multi-Lingual Computing, Cambridge.
[11] El-Sadany, T. and M. Hashish (1989) An Arabic morphological system. IBM Systems Journal, 28/4.
[12] Harman, D. (1991) How effective is suffixing? Journal of the American Society for Information Science, 42/1, pp. 7--15.
[13] Hmeidi, I., G. Kanaan and M. Evens (1997) Design and Implementation of Automatic Indexing for Information Retrieval with Arabic Documents. Journal of the American Society for Information Science, 48/10, pp. 867--881.
[14] Kiraz, G. (1994) Multi-tape two-level Morphology: a case study in Semitic non-linear morphology. Proceedings of COLING-94, pp. 180--186.
[15] Lovins, J. B. (1968) Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics, 11/1.
[16] Popovic, M. and P. Willett (1992) The effectiveness of stemming for natural language access to Slovene textual data. Journal of the American Society for Information Science, 43/5, pp. 384--390.
[17] Porter, M. F. (1980) An algorithm for suffix stripping. Program, 14/3, pp. 130--137.
[18] Stalls, B. and K. Knight (1998) Translating names and technical terms in Arabic text. Proceedings of COLING-ACL, New Brunswick, NJ.
[19] van Rijsbergen, C. J. (1979) Information Retrieval. Butterworths, London.
[20] Robertson, A. and P. Willett (1992) Searching for historical word-forms in a database of 17th-century English text using spelling-correction methods. Proceedings of the 15th Annual International ACM SIGIR Conference.
[21] Abu-Salem, H., M. Al-Omari and M. Evens (1999) Stemming methodologies over individual query words for an Arabic information retrieval system. Journal of the American Society for Information Science, 50/6, pp. 524--529.

Published In

ACL '00: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
October 2000
598 pages

Publisher

Association for Computational Linguistics, United States


Acceptance Rates

Overall acceptance rate: 85 of 443 submissions (19%)


Article Metrics

• Downloads (last 12 months): 53
• Downloads (last 6 weeks): 9

Reflects downloads up to 12 Nov 2024.


Cited By

• (2016) Arabic Cross-Language Information Retrieval. ACM Transactions on Asian and Low-Resource Language Information Processing, 15:3 (1-44). DOI: 10.1145/2789210. Online publication date: 28-Jan-2016
• (2014) Naïve Bayes classifiers for authorship attribution of Arabic texts. Journal of King Saud University - Computer and Information Sciences, 26:4 (473-484). DOI: 10.1016/j.jksuci.2014.06.006. Online publication date: 1-Dec-2014
• (2011) A Fast Corpus-Based Stemmer. ACM Transactions on Asian Language Information Processing, 10:2 (1-16). DOI: 10.1145/1967293.1967295. Online publication date: 1-Jun-2011
• (2010) A comparison study of some Arabic root finding algorithms. Journal of the American Society for Information Science and Technology, 61:5 (1015-1024). DOI: 10.5555/1814508.1814522. Online publication date: 1-May-2010
• (2007) YASS. ACM Transactions on Information Systems, 25:4 (18-es). DOI: 10.1145/1281485.1281489. Online publication date: 1-Oct-2007
• (2006) Arabic OCR error correction using character segment correction, language modeling, and shallow morphology. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (408-414). DOI: 10.5555/1610075.1610132. Online publication date: 22-Jul-2006
• (2006) Word-Based correction for retrieval of arabic OCR degraded documents. Proceedings of the 13th international conference on String Processing and Information Retrieval (205-216). DOI: 10.1007/11880561_17. Online publication date: 11-Oct-2006
• (2005) POS tagging of dialectal Arabic. Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages (55-62). DOI: 10.5555/1621787.1621798. Online publication date: 29-Jun-2005
• (2005) Applying Authorship Analysis to Extremist-Group Web Forum Messages. IEEE Intelligent Systems, 20:5 (67-75). DOI: 10.1109/MIS.2005.81. Online publication date: 1-Sep-2005
• (2005) Dictionary-based techniques for cross-language information retrieval. Information Processing and Management: an International Journal, 41:3 (523-547). DOI: 10.1016/j.ipm.2004.06.012. Online publication date: 1-May-2005
