Abstract
Manual term extraction works as its name suggests: a translator reads through a text, classifies the words, and prepares them for translation. Terminology is a concentrated carrier of domain expertise, and the creation, popularization, and disappearance of terms dynamically reflect the development and evolution of an industry. Automatic term extraction is a key technology for building professional terminology databases and a key topic in natural language processing. This paper studies a term extraction algorithm based on machine learning and a comprehensive feature strategy. To address the poor generality of current term extraction algorithms and their reliance on a single statistical feature, it proposes an improved domain ontology term extraction algorithm based on a comprehensive feature strategy. It also reports automatic term extraction experiments with two word-based machine learning models, a maximum entropy model and a conditional random field model; the word-based conditional random field model outperforms the maximum entropy model. The experimental results show that the algorithm based on the comprehensive feature strategy improves accuracy by 8.6% compared with the TF-IDF algorithm and the C-value term extraction algorithm. The algorithm extracts terms from text effectively and generalizes well.
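For orientation, the TF-IDF baseline named in the abstract can be sketched as follows. This is a minimal illustration of plain TF-IDF word scoring, not the comprehensive feature strategy the paper proposes. The function names, the toy corpus, and the choice of raw term frequency with an unsmoothed logarithmic inverse document frequency are all assumptions made here for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Score each word in each tokenized document with TF-IDF.

    docs: list of token lists. Returns one {word: score} dict per document.
    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def top_terms(docs, doc_index, k=3):
    """Return the k highest-scoring words of one document (ties broken alphabetically)."""
    ranked = sorted(
        tfidf_scores(docs)[doc_index].items(),
        key=lambda item: (-item[1], item[0]),
    )
    return [word for word, _ in ranked[:k]]


# Hypothetical toy corpus, already tokenized on whitespace.
corpus = [
    "the ontology term extraction uses ontology features".split(),
    "the model uses statistical features".split(),
    "the translator browses the text".split(),
]

print(top_terms(corpus, 0, k=1))
```

A word that occurs in every document, such as "the" in this toy corpus, receives a score of zero, which is why single-feature statistical scoring of this kind favors words concentrated in few documents.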
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
There is no funding for this study.
Ethics declarations
Conflict of interest
There is no potential conflict of interest in this study.
About this article
Cite this article
Gong, X., Cheng, B., Hu, X. et al. A term extraction algorithm based on machine learning and comprehensive feature strategy. Neural Comput & Applic 36, 2385–2398 (2024). https://doi.org/10.1007/s00521-023-08960-9
DOI: https://doi.org/10.1007/s00521-023-08960-9