A Comparison of Methods for Automatic Term Extraction for Domain Analysis

William B. Frakes,
Gregory Kulczycki &
Jason Tilley

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8919)

Included in the following conference series:

International Conference on Software Reuse

Abstract

Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying vocabulary in a domain. Fifteen domain-engineering projects were examined to measure how closely the vocabularies selected by the fourteen metrics matched the vocabularies produced by domain engineers. Stemming and stopword removal were also evaluated to measure their impact on selecting proper vocabulary terms. The results of the experiment show that stemming and stopword removal improve performance and that term frequency is a valuable contributor to performance. Most word frequency metrics gave similar results; a few performed poorly compared to the others.
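The kind of comparison the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the fourteen metrics, so plain term frequency stands in for them; the stopword list, the `crude_stem` suffix stripper (not Porter's algorithm), the sample text, and the reference vocabulary are all hypothetical; and precision and recall are used here as one plausible way to measure agreement with an engineer-produced vocabulary, not necessarily the measure the authors used.

```python
import re
from collections import Counter

# Hypothetical stopword list; the list used in the study is not given in the abstract.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "with"}


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, k, use_stopwords=True, use_stemming=True):
    """Return the k most frequent terms, optionally after stopword removal and stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    return [term for term, _ in Counter(tokens).most_common(k)]


def precision_recall(selected, reference):
    """Compare a selected vocabulary against a reference vocabulary."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical domain text and engineer-produced vocabulary.
    text = ("The parser reads tokens. The parsers emit tokens for the lexer. "
            "A lexer feeds the parser.")
    reference = {"parser", "token", "lexer"}

    raw = top_terms(text, 3, use_stopwords=False, use_stemming=False)
    cleaned = top_terms(text, 3)
    print("raw:", raw, precision_recall(raw, reference))
    print("cleaned:", cleaned, precision_recall(cleaned, reference))
```

On this toy input the unprocessed ranking is led by the stopword "the" and counts "tokens" and "parsers" as separate surface forms, so it agrees with the reference vocabulary less well than the ranking produced after stopword removal and stemming. That mirrors the direction of the result reported in the abstract, though a toy example says nothing about its magnitude.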

Author information

Authors and Affiliations

Software Reuse Laboratory, Virginia Tech, Falls Church, VA, USA
William B. Frakes, Gregory Kulczycki & Jason Tilley

Editor information

Editors and Affiliations

Institut für Softwaretechnik und Fahrzeuginformatik, Technische Universität Braunschweig, Mühlenpfordtstr. 23, 38106, Braunschweig, Germany
Ina Schaefer
Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
Ioannis Stamelos

About this paper

Cite this paper

Frakes, W.B., Kulczycki, G., Tilley, J. (2014). A Comparison of Methods for Automatic Term Extraction for Domain Analysis. In: Schaefer, I., Stamelos, I. (eds) Software Reuse for Dynamic Systems in the Cloud and Beyond. ICSR 2015. Lecture Notes in Computer Science, vol 8919. Springer, Cham. https://doi.org/10.1007/978-3-319-14130-5_19

DOI: https://doi.org/10.1007/978-3-319-14130-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14129-9
Online ISBN: 978-3-319-14130-5
eBook Packages: Computer Science, Computer Science (R0)
