Abstract
Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying the vocabulary of a domain. Fifteen domain-engineering projects were examined to measure how closely the vocabularies selected by the fourteen metrics matched the vocabularies produced by domain engineers. Stemming and stopword removal were also evaluated to measure their effect on the selection of proper vocabulary terms. The results of the experiment show that stemming and stopword removal improve performance and that term frequency is a valuable contributor to performance. Most word frequency metrics gave similar results; a few performed poorly compared with the others.
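The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes plain term frequency as the only metric (the abstract does not name the fourteen metrics), a tiny hand-picked stopword list, a toy suffix stripper standing in for a real stemmer such as Porter's, and precision and recall against a reference vocabulary as the measure of closeness. The names `STOPWORDS`, `crude_stem`, `top_terms`, and `precision_recall` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list used in the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "by"}


def crude_stem(word):
    """Toy suffix stripper; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "es", "s"):
        # Strip the suffix only if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, k, use_stopwords=True, use_stemming=True):
    """Return the k most frequent terms, optionally filtered and stemmed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    counts = Counter(tokens)
    # Descending frequency, then alphabetical order so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall(selected, reference):
    """Compare a selected vocabulary with a reference (expert) vocabulary."""
    selected_set, reference_set = set(selected), set(reference)
    hits = len(selected_set & reference_set)
    precision = hits / len(selected_set) if selected_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall
```

Calling `top_terms` on the same text with and without the two preprocessing flags, and passing each result to `precision_recall` together with a hand-built reference vocabulary, gives a small-scale analogue of the experiment: the effect of stemming and stopword removal shows up as a change in precision and recall.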
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Frakes, W.B., Kulczycki, G., Tilley, J. (2014). A Comparison of Methods for Automatic Term Extraction for Domain Analysis. In: Schaefer, I., Stamelos, I. (eds) Software Reuse for Dynamic Systems in the Cloud and Beyond. ICSR 2015. Lecture Notes in Computer Science, vol 8919. Springer, Cham. https://doi.org/10.1007/978-3-319-14130-5_19
DOI: https://doi.org/10.1007/978-3-319-14130-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14129-9
Online ISBN: 978-3-319-14130-5
eBook Packages: Computer Science, Computer Science (R0)