Article DOI: 10.1145/973620.973690

Building an inflectional stemmer for Bulgarian

Published: 19 June 2003

Abstract

The paper opens with an overview of the most important approaches to stemming for English and for some Slavic languages. It then describes the design, implementation and evaluation of an inflectional stemmer for Bulgarian. Stemming is treated as a machine-learning task, trained on a large morphological dictionary. A detailed automatic evaluation reports under-stemming, over-stemming and coverage for different parameter values.
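The three evaluation measures named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the toy dictionary, the suffix-learning step (longest common prefix per lemma), the `min_stem` parameter and the exact definitions of the measures are all assumptions made for illustration. Here coverage is taken to be the share of word forms changed by some rule, under-stemming the share of same-lemma pairs that receive different stems, and over-stemming the share of different-lemma pairs that receive the same stem.

```python
from collections import Counter
from itertools import combinations
from os.path import commonprefix

# Hypothetical stand-in for a morphological dictionary: lemma -> inflected forms.
TOY_DICT = {
    "книга": ["книга", "книги", "книгата", "книгите"],
    "маса": ["маса", "маси", "масата", "масите"],
}

def learn_suffixes(dictionary):
    """Count the endings left after removing each lemma group's common prefix."""
    counts = Counter()
    for forms in dictionary.values():
        prefix = commonprefix(forms)  # character-wise longest common prefix
        for form in forms:
            suffix = form[len(prefix):]
            if suffix:
                counts[suffix] += 1
    return counts

def stem(word, suffixes, min_stem=2):
    """Strip the longest known suffix that leaves at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word  # no rule applied

def evaluate(dictionary, suffixes, min_stem=2):
    """Compute coverage, under-stemming and over-stemming (assumed definitions)."""
    stemmed = [(lemma, form, stem(form, suffixes, min_stem))
               for lemma, forms in dictionary.items() for form in forms]
    covered = sum(1 for _, form, s in stemmed if s != form)
    same = same_split = diff = diff_merged = 0
    for (l1, _, s1), (l2, _, s2) in combinations(stemmed, 2):
        if l1 == l2:
            same += 1
            same_split += s1 != s2    # same lemma, different stems
        else:
            diff += 1
            diff_merged += s1 == s2   # different lemmas, same stem
    return {
        "coverage": covered / len(stemmed),
        "under_stemming": same_split / same if same else 0.0,
        "over_stemming": diff_merged / diff if diff else 0.0,
    }

if __name__ == "__main__":
    suffixes = learn_suffixes(TOY_DICT)
    for min_stem in (2, 4):
        print(min_stem, evaluate(TOY_DICT, suffixes, min_stem))
```

Running the evaluation for several values of `min_stem` mirrors, in miniature, the idea of comparing the measures across parameter settings: a stricter minimum stem length blocks some rules, which tends to lower coverage and raise under-stemming.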


Published In

CompSysTech '03: Proceedings of the 4th international conference on Computer systems and technologies: e-Learning
June 2003
732 pages
ISBN: 9549641333
DOI: 10.1145/973620
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information retrieval
  2. lemmatization
  3. machine learning
  4. stemming
  5. text processing

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 241 of 492 submissions, 49%
