Abstract
The paper addresses the problem of extracting acronyms and their expansions from text. We propose a support vector machine (SVM) based approach to the problem. First, all likely acronyms are identified using heuristic rules. Second, expansion candidates are generated from the text surrounding each acronym. Finally, an SVM model is employed to select the genuine expansions. Analysis shows that the proposed approach offers savings over conventional rule-based approaches. Experimental results show that our approach outperforms the rule-based baseline method. We also show that the trained SVM model is generic and adapts easily to other domains.
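The three steps above can be sketched as a small pipeline. This is a minimal illustration under our own assumptions, not the authors' method: the acronym heuristic (2 to 10 alphanumeric characters with at least two capitals), the candidate window (word spans ending just before the acronym, at most len(acronym) + 2 words), the three penalty features, the toy `TRAIN` sentences and every function name are hypothetical. The SVM is a linear one trained with the Pegasos stochastic sub-gradient method in pure Python so the sketch has no dependencies; a library SVM such as scikit-learn's `LinearSVC` could stand in for it.

```python
import random
import re

# Hypothetical toy training data: (sentence, acronym, gold expansion).
TRAIN = [
    ("Decoding uses a hidden Markov model (HMM) for tagging.", "HMM", "hidden Markov model"),
    ("We index terms with latent semantic analysis (LSA) today.", "LSA", "latent semantic analysis"),
    ("We apply a singular value decomposition (SVD) to the matrix.", "SVD", "singular value decomposition"),
]


def tokenize(text):
    # Words plus parentheses, which mark the common "expansion (ACRONYM)" pattern.
    return re.findall(r"\w+|[()]", text)


def find_acronyms(tokens):
    # Step 1 (hypothetical heuristic): 2-10 alphanumeric chars, >= 2 capitals.
    return [i for i, t in enumerate(tokens)
            if 2 <= len(t) <= 10 and t.isalnum()
            and sum(c.isupper() for c in t) >= 2]


def candidates(tokens, i, extra=2):
    # Step 2 (hypothetical window): every span that ends just before the
    # acronym (skipping "(") and is at most len(acronym) + extra words long.
    end = i - 1 if i > 0 and tokens[i - 1] == "(" else i
    start_min = max(0, end - len(tokens[i]) - extra)
    return [tokens[s:end] for s in range(start_min, end)]


def features(acronym, words):
    # Hypothetical penalty features, all 0 for a perfect match:
    # [fraction of letters unmatched, first-letter mismatch, word-count gap].
    letters = [c.lower() for c in acronym if c.isalpha()]
    initials = [w[0].lower() for w in words]
    j = 0
    for ch in initials:  # greedy in-order match of acronym letters
        if j < len(letters) and ch == letters[j]:
            j += 1
    missing = 1 - j / len(letters)
    mismatch = 0.0 if initials[0] == letters[0] else 1.0
    gap = abs(len(words) - len(letters)) / len(letters)
    return [missing, mismatch, gap]


def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    # Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    # regularized hinge loss); a constant 1 feature acts as the bias term.
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1:  # hinge loss active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    # SVM decision value w . [x, 1]; positive means "genuine expansion".
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))


def build_training_set(examples):
    # Every candidate is labeled +1 if it equals the gold expansion, else -1.
    X, y = [], []
    for text, acronym, gold in examples:
        tokens = tokenize(text)
        i = tokens.index(acronym)
        for cand in candidates(tokens, i):
            X.append(features(acronym, cand))
            y.append(1 if " ".join(cand) == gold else -1)
    return X, y


def extract(text, w):
    # Step 3: keep the highest-scoring candidate if its decision value is > 0.
    tokens = tokenize(text)
    pairs = {}
    for i in find_acronyms(tokens):
        best, best_score = None, 0.0
        for cand in candidates(tokens, i):
            score = decision(w, features(tokens[i], cand))
            if score > best_score:
                best, best_score = cand, score
        if best is not None:
            pairs[tokens[i]] = " ".join(best)
    return pairs


if __name__ == "__main__":
    X, y = build_training_set(TRAIN)
    w = train_linear_svm(X, y)
    print(extract("We propose a support vector machines (SVM) based approach.", w))
```

Running the script trains on the three toy sentences and applies the model to a phrase from the abstract. Because every gold expansion in this toy set has all-zero penalties, the learned weights on the penalty features can only be zero or negative, so an exact initial-letter match outranks the other windows; a realistic system would need a larger labeled corpus and a validated feature set.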
Cite this article
Xu, J., Huang, Y. Using SVM to Extract Acronyms from Text. Soft Comput 11, 369–373 (2007). https://doi.org/10.1007/s00500-006-0091-5