A Machine Learning Approach for the NLP-Based Analysis of Cyber Threats and Vulnerabilities of the Healthcare Ecosystem †
Figure 1. Individual Risk Assessment methodology schema.
Figure 2. A conceptual schema of the threat prioritization step.
Figure 3. CVE data format.
Figure 4. Flow diagram of the proposed supervised text-based machine learning pipeline for the attack vector.
Figure 5. Flow diagram of the supervised text-based machine learning pipeline for the exploitability and impact metrics.
Figure 6. Unseen data evaluation phase.
Figure 7. Histogram of the number of samples per severity level in the test data.
Figure 8. Confusion Matrix for Logistic Regression.
Figure 9. Confusion Matrix for XGBoost.
Figure 10. Comparison of Accuracy Level obtained with logistic regression and XGBoost.
Abstract
1. Introduction
2. Related Works
2.1. Threat Modeling and Cyber Attacks in the Healthcare Sector
2.2. Threat and Vulnerability Analysis Using Machine Learning Models
3. Proposed Approach
- Determination of the Scope and Context;
- Analysis of the Health Care Supply Chain;
- Individual Risk Assessment;
- Cascading Risk Assessment;
- Risk Controls.
- Healthcare Ecosystem Context, which identifies the main assets of the healthcare ecosystem, grouping them into four distinct healthcare areas and categorizing them according to their functionalities.
- Threat Assessment, which identifies and prioritizes the threats related to the services and assets of the HCII, adopting an NLP-based approach. The identified threats are categorized through threat taxonomies and then are assessed in a qualitative manner using threat scales.
- Vulnerability Assessment, which provides an automated vulnerability scoring system, based on a supervised ML solution.
3.1. Healthcare Ecosystem Context
3.2. Threat Assessment
- Abstraction: Defines the different abstraction levels that apply to an attack pattern. A Meta level attack pattern provides an abstract characterisation of a specific methodology or technique used for an attack and generalization of a related group of standard level attack patterns. It is often void of specific technology or implementation and provides an understanding of a high-level approach.
- Status: Defines the different status values of an entry of the CAPEC catalog including view, category, attack pattern.
- Description: A short description of the threat.
- Alternate Terms: Indicates one or more other names used to describe this attack pattern.
- Vendor and Item: Respectively identify the vendor and item (e.g., Google and Chrome) affected by the CS issue.
- Likelihood of Attack: Captures an overall likelihood value for attacks that leverage this attack pattern, with the understanding that it may not be completely accurate for all attacks.
- Typical Severity: It is used to capture an overall average severity value for attacks that leverage this attack pattern with the understanding that it will not be completely accurate for all attacks.
- Related Attack Patterns: Refers to other attack patterns and related high-level categories. These relationships give insight to similar items that may exist at higher and lower levels of abstraction.
- Execution Flow: It is used to provide a detailed step-by-step flow performed by an adversary for a specific attack pattern. It is applicable to attack patterns with an abstraction level of Detailed.
- Prerequisites: Indicates one or more prerequisite conditions necessary for an attack.
- Skills and Resources Required: Describes the skill level or knowledge and the possible resources (e.g., CPU cycles, IP addresses, tools) required by an adversary for an attack.
- Indicators: The possible indicators including activities, events, conditions, or behaviors that may indicate an attack which could be imminent, in progress, or has occurred. Each Indicator element provides a textual description of the indicator.
- Consequences: The possible consequences associated with an attack pattern. The required Scope element identifies the security property that is violated. The optional Impact element describes the technical impact that arises if an adversary succeeds in their attack.
- Mitigation: The suitable countermeasures to prevent an attack or mitigate its risk. The approaches described in each mitigation element should help improve the resiliency of the target system, reduce its attack surface, or reduce the impact of the attack if it is successful.
- Example Instances: It is used to describe one or more example instances of the attack pattern. An example helps the reader understand the nature, context, and variability of the attack in more practical and concrete terms.
- Related Weaknesses: Contains references to weaknesses associated with this attack pattern. The association implies a weakness that must exist for a given attack to be successful. If multiple weaknesses are associated with the attack pattern, then any of the weaknesses (but not necessarily all) may be present for the attack to be successful. Each related weakness is identified by a Common Weakness Enumeration (CWE) identifier (https://cwe.mitre.org (accessed on 30 October 2022)).
- Taxonomy Mappings: It is used to provide a mapping from an entry (Attack Pattern or Category) in CAPEC to an equivalent entry in a different taxonomy.
- Notes: It is used to provide any additional comments that cannot be captured using the other elements of the view.
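As an illustration of how the fields listed above could be held in memory, the following is a minimal sketch of a record type covering a subset of them. The class name, the attribute names, the helper function, and the sample values are all hypothetical and are not taken from the CAPEC schema or from the proposed approach.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttackPattern:
    """Hypothetical container for a subset of the CAPEC fields listed above."""
    capec_id: str                                # catalog identifier of the entry
    abstraction: str                             # e.g. "Meta", "Standard", "Detailed"
    description: str                             # short description of the threat
    likelihood_of_attack: Optional[str] = None   # qualitative value, if present
    typical_severity: Optional[str] = None       # qualitative value, if present
    prerequisites: List[str] = field(default_factory=list)
    related_weaknesses: List[str] = field(default_factory=list)  # CWE identifiers
    mitigations: List[str] = field(default_factory=list)


def patterns_for_weakness(patterns, cwe_id):
    """Return the attack patterns that reference the given CWE identifier."""
    return [p for p in patterns if cwe_id in p.related_weaknesses]


# Illustrative values only.
example = AttackPattern(
    capec_id="CAPEC-66",
    abstraction="Standard",
    description="SQL Injection",
    related_weaknesses=["CWE-89"],
)
print(patterns_for_weakness([example], "CWE-89"))
```

Keeping the related weaknesses as plain CWE identifiers makes it straightforward to join attack patterns with vulnerability records that carry the same identifiers.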
3.3. Vulnerability Assessment
- Network;
- Adjacent network;
- Local;
- Physical.
- Attack complexity (“low”, “high” labels);
- Privileges required (“none”, “low”, “high” labels);
- User interaction (“none”, “required” labels);
- Scope (“unchanged”, “changed” labels);
- Confidentiality (“high”, “low”, “none” labels);
- Integrity (“high”, “low”, “none” labels);
- Availability (“high”, “low”, “none” labels).
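The labels listed above are the base metrics of CVSS v3.1, so once a label has been predicted for each metric, a numeric score can be derived from them. The sketch below implements the public CVSS v3.1 base score equations and metric weights; whether the CVSS-like score of the proposed approach is computed in exactly this way is an assumption, and the function and dictionary names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification, keyed by the labels used above.
AV = {"network": 0.85, "adjacent network": 0.62, "local": 0.55, "physical": 0.2}
AC = {"low": 0.77, "high": 0.44}
UI = {"none": 0.85, "required": 0.62}
CIA = {"high": 0.56, "low": 0.22, "none": 0.0}
PR_UNCHANGED = {"none": 0.85, "low": 0.62, "high": 0.27}
PR_CHANGED = {"none": 0.85, "low": 0.68, "high": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Compute the CVSS v3.1 base score from the eight metric labels."""
    changed = scope == "changed"
    # Impact sub-score.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    # Exploitability sub-score; the privileges weight depends on the scope.
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("network", "low", "none", "none", "unchanged",
                 "high", "high", "high"))
```

Because the score is a deterministic function of the labels, an error in a single predicted metric propagates directly into the final score, which is one reason to evaluate each metric classifier separately.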
4. Experimental Assessment
4.1. Datasets
4.2. Resources and Tools
4.3. Metrics
4.4. Threat Assessment Experiments
- The clipboard poisoning attack [THREAT] is said to have been accidentally introduced in Chrome version 104 [ASSET], according to developer Jeff Johnson.
- By uploading a JSP file to the tomcat’s [ASSET] root directory, it is possible to achieve code execution [THREAT], leading to command execution [THREAT].
- Threat actors are increasingly mimicking legitimate applications such as Skype [ASSET], Adobe Reader [ASSET], and VLC Player [ASSET] as a means to abuse trust relationships [THREAT] and increase the likelihood of a successful social engineering attack [THREAT].
- “There are indications that CVE-2021-22600 [THREAT] may be under limited, targeted exploitation,” Google noted in its Android [ASSET] Security Bulletin for May 2022.
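Annotations such as those in the examples above are commonly converted into token-level tags before training an NER model. The following sketch assumes whitespace tokenisation and the BIO tagging scheme; both are assumptions made for illustration, and the sentence is a shortened, hypothetical variant of the third example.

```python
def bio_tags(tokens, spans):
    """Convert entity spans into BIO tags.

    spans is a list of (start, end, label) tuples, where start is the index
    of the first token of the entity and end is exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = "Threat actors mimic Adobe Reader for a social engineering attack".split()
spans = [(3, 5, "ASSET"), (7, 10, "THREAT")]  # hypothetical annotation
tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```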
4.5. Vulnerability Assessment Experiments
- Logistic regression
  - penalty: [l1, l2]
  - C: [100, 10, 1.0, 0.1, 0.01]
  - solver: [liblinear]
  - max_iter: [100, 1000, 2500, 5000]
- XGBoost
  - n_estimators: [100, 400, 800]
  - max_depth: [3, 6, 9]
  - learning_rate: [0.05, 0.1, 0.20]
  - min_child_weight: [1, 10, 100]
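The hyperparameter values above can be written as parameter grids for an exhaustive search. The sketch below is one possible way to do this with scikit-learn's `GridSearchCV`; the use of grid search, the 5-fold cross-validation, and the accuracy scoring are assumptions, and the helper names are illustrative.

```python
import math

# Grids copied from the hyperparameter lists above.
LOGREG_GRID = {
    "penalty": ["l1", "l2"],
    "C": [100, 10, 1.0, 0.1, 0.01],
    "solver": ["liblinear"],
    "max_iter": [100, 1000, 2500, 5000],
}
XGB_GRID = {
    "n_estimators": [100, 400, 800],
    "max_depth": [3, 6, 9],
    "learning_rate": [0.05, 0.1, 0.20],
    "min_child_weight": [1, 10, 100],
}


def n_candidates(grid):
    """Number of hyperparameter combinations an exhaustive search evaluates."""
    return math.prod(len(values) for values in grid.values())


def build_search(estimator, grid, cv=5):
    """Wrap an estimator in an exhaustive grid search (cv and scoring assumed)."""
    # Imported lazily so the grids can be inspected without scikit-learn installed.
    from sklearn.model_selection import GridSearchCV
    return GridSearchCV(estimator, grid, cv=cv, scoring="accuracy", n_jobs=-1)


print(n_candidates(LOGREG_GRID), n_candidates(XGB_GRID))
```

A search would then be run with, for example, `build_search(LogisticRegression(), LOGREG_GRID).fit(X_train, y_train)`, where `X_train` holds the vectorised vulnerability descriptions and `y_train` the labels of one metric. The candidate counts give a quick estimate of the training cost before any model is fitted.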
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Islam, S.; Abba, A.; Ismail, U.; Mouratidis, H.; Papastergiou, S. Vulnerability prediction for secure healthcare supply chain service delivery. Integr. Comput. Aided Eng. 2022, 29, 389–409.
- Ponemon Institute. Sixth Annual Benchmark Study on Privacy & Security of Healthcare Data; Technical Report; Ponemon Institute: North Traverse City, MI, USA, 2016.
- Coventry, L.; Branley, D. Cybersecurity in healthcare: A narrative review of trends, threats and ways forward. Maturitas 2018, 113, 48–52.
- Islam, S.; Papastergiou, S.; Kalogeraki, E.M.; Kioskli, K. Cyberattack Path Generation and Prioritisation for Securing Healthcare Systems. Appl. Sci. 2022, 12, 4443.
- McKee, D.; Laulheret, P. McAfee Enterprise ATR Uncovers Vulnerabilities in Globally Used B. Braun Infusion Pump; Trellix: Milpitas, CA, USA, 2021.
- Halperin, D.; Heydt-Benjamin, T.S.; Ransford, B.; Clark, S.S.; Defend, B.; Morgan, W.; Fu, K.; Kohno, T.; Maisel, W.H. Pacemakers and implantable cardiac defibrillators: Software radio attacks and zero-power defenses. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (sp 2008), Oakland, CA, USA, 18–22 May 2008; pp. 129–142.
- Nifakos, S.; Chandramouli, K.; Nikolaou, C.K.; Papachristou, P.; Koch, S.; Panaousis, E.; Bonacina, S. Influence of Human Factors on Cyber Security within Healthcare Organisations: A Systematic Review. Sensors 2021, 21, 5119.
- Islam, S.; Papastergiou, S.; Mouratidis, H. A Dynamic Cyber Security Situational Awareness Framework for Healthcare ICT Infrastructures. In Proceedings of the PCI 2021: 25th Pan-Hellenic Conference on Informatics, Volos, Greece, 26–28 November 2021; ACM: New York, NY, USA, 2021; pp. 334–339.
- Di Sarno, C.; Formicola, V.; Sicuranza, M.; Paragliola, G. Addressing Security Issues of Electronic Health Record Systems through Enhanced SIEM Technology. In Proceedings of the 2013 International Conference on Availability, Reliability and Security, Regensburg, Germany, 2–6 September 2013; pp. 646–653.
- Tikhomirov, M.; Loukachevitch, N.V.; Sirotina, A.; Dobrov, B.V. Using BERT and Augmentation in Named Entity Recognition for Cybersecurity Domain. In Proceedings of the Natural Language Processing and Information Systems: 25th International Conference on Applications of Natural Language to Information Systems, NLDB, Saarbrücken, Germany, 24–26 June 2020; Springer: Cham, Switzerland, 2020; Volume 12089, pp. 16–24.
- Mendsaikhan, O.; Hasegawa, H.; Yamaguchi, Y.; Shimada, H. Identification of Cybersecurity Specific Content Using the Doc2Vec Language Model. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; Volume 1, pp. 396–401.
- Ciampi, M.; De Pietro, G.; Masciari, E.; Silvestri, S. Some Lessons Learned Using Health Data Literature for Smart Information Retrieval. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic, 30 March–3 April 2020; pp. 931–934.
- Lima, A.Q.; Keegan, B. Chapter 3: Challenges of using machine learning algorithms for cybersecurity: A study of threat-classification models applied to social media communication data. In Cyber Influence and Cognitive Threats; Benson, V., Mcalaney, J., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 33–52.
- Boyd, D.; Crawford, K. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 2012, 15, 662–679.
- Ma, P.; Jiang, B.; Lu, Z.; Li, N.; Jiang, Z. Cybersecurity named entity recognition using bidirectional long short-term memory with conditional random fields. Tsinghua Sci. Technol. 2021, 26, 259–265.
- Zhou, S.; Liu, J.; Zhong, X.; Zhao, W. Named Entity Recognition Using BERT with Whole World Masking in Cybersecurity Domain. In Proceedings of the 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA), Xiamen, China, 5–8 March 2021; pp. 316–320.
- Chen, Y.; Ding, J.; Li, D.; Chen, Z. Joint BERT Model Based Cybersecurity Named Entity Recognition. In Proceedings of the ICSIM 2021: 2021 The 4th International Conference on Software Engineering and Information Management, Yokohama, Japan, 16–18 January 2021; pp. 236–242.
- Gao, C.; Zhang, X.; Liu, H. Data and knowledge-driven named entity recognition for cyber security. Cybersecurity 2021, 4, 9.
- Mavroeidis, V.; Bromander, S. Cyber Threat Intelligence Model: An Evaluation of Taxonomies, Sharing Standards, and Ontologies within Cyber Threat Intelligence. In Proceedings of the 2017 European Intelligence and Security Informatics Conference (EISIC), Athens, Greece, 11–13 September 2017; pp. 91–98.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the KDD ’16: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
- Wu, D.; Guo, P.; Wang, P. Malware Detection based on Cascading XGBoost and Cost Sensitive. In Proceedings of the 2020 International Conference on Computer Communication and Network Security (CCNS), Xi’an, China, 21–23 August 2020; pp. 201–205.
- Tang, L.; Mahmoud, Q.H. A Survey of Machine Learning-Based Solutions for Phishing Website Detection. Mach. Learn. Knowl. Extr. 2021, 3, 672–694.
- Dixit, P.; Silakari, S. Deep Learning Algorithms for Cybersecurity Applications: A Technological and Status Review. Comput. Sci. Rev. 2021, 39, 100317.
- Paleyes, A.; Urma, R.G.; Lawrence, N.D. Challenges in Deploying Machine Learning: A Survey of Case Studies. ACM Comput. Surv. 2022, 55, 1–29.
- Shevchenko, N. Threat Modeling: 12 Available Methods; Carnegie Mellon University: Pittsburgh, PA, USA, 2018.
- Center for Internet Security (CIS). Cyber Attacks: In the Healthcare Sector; Center for Internet Security (CIS): East Greenbush, NY, USA, 2017.
- Goud, N. Malware and Ransomware Attack on Medical Devices; Cybersecurity Insiders: Baltimore, MD, USA, 2017.
- Argaw, S.T.; Troncoso-Pastoriza, J.R.; Lacey, D.; Florin, M.; Calcavecchia, F.; Anderson, D.; Burleson, W.P.; Vogel, J.; O’Leary, C.; Eshaya-Chauvin, B.; et al. Cybersecurity of Hospitals: Discussing the challenges and working towards mitigating the risks. BMC Med. Inform. Decis. Mak. 2020, 20, 146.
- Ghaffarian, S.M.; Shahriari, H.R. Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A Survey. ACM Comput. Surv. 2017, 50, 56.
- Yeboah-Ofori, A.; Mouratidis, H.; Ismail, U.; Islam, S.; Papastergiou, S. Cyber Supply Chain Threat Analysis and Prediction Using Machine Learning and Ontology. In Proceedings of the Artificial Intelligence Applications and Innovations: 17th IFIP WG 12.5 International Conference, AIAI 2021, Crete, Greece, 25–27 June 2021; Springer: Cham, Switzerland, 2021; Volume 627, pp. 518–530.
- Haque, N.I.; Rahman, M.A.; Shahriar, M.H.; Khalil, A.A.; Uluagac, A.S. A Novel Framework for Threat Analysis of Machine Learning-based Smart Healthcare Systems. arXiv 2021, arXiv:2103.03472.
- Zong, S.; Ritter, A.; Mueller, G.; Wright, E. Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; Volume 1, pp. 1380–1390.
- Satyapanich, T.; Ferraro, F.; Finin, T. CASIE: Extracting Cybersecurity Event Information from Text. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, the Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, New York, NY, USA, 7–12 February 2020; AAAI Press: Palo Alto, CA, USA, 2020; pp. 8749–8757.
- Alicante, A.; Benerecetti, M.; Corazza, A.; Silvestri, S. A distributed architecture to integrate ontological knowledge into information extraction. Int. J. Grid Util. Comput. 2016, 7, 245–256.
- Silvestri, S.; Gargiulo, F.; Ciampi, M. Improving Biomedical Information Extraction with Word Embeddings Trained on Closed-Domain Corpora. In Proceedings of the 2019 IEEE Symposium on Computers and Communications (ISCC), Barcelona, Spain, 29 June–3 July 2019; pp. 1129–1134.
- Nikoloudakis, Y.; Kefaloukos, I.; Klados, S.; Panagiotakis, S.; Pallis, E.; Skianis, C.; Markakis, E.K. Towards a Machine Learning Based Situational Awareness Framework for Cybersecurity: An SDN Implementation. Sensors 2021, 21, 4939.
- Singh, K.; Grover, S.S.; Kumar, R.K. Cyber Security Vulnerability Detection Using Natural Language Processing. In Proceedings of the 2022 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 6–9 June 2022; pp. 174–178.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; Volume 1, pp. 4171–4186.
- Ameri, K.; Hempel, M.; Sharif, H.; Lopez Jr., J.; Perumalla, K. CyBERT: Cybersecurity Claim Classification by Fine-Tuning the BERT Language Model. J. Cybersecur. Priv. 2021, 1, 615–637.
- Alam, M.T.; Bhusal, D.; Park, Y.; Rastogi, N. CyNER: A Python Library for Cybersecurity Named Entity Recognition. arXiv 2022, arXiv:2204.05754.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Akbik, A.; Bergmann, T.; Blythe, D.; Rasul, K.; Schweter, S.; Vollgraf, R. FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 54–59.
- Islam, S.; Papastergiou, S.; Silvestri, S. Cyber Threat Analysis Using Natural Language Processing for a Secure Healthcare System. In Proceedings of the 2022 IEEE Symposium on Computers and Communications (ISCC), Rhodes, Greece, 30 June–3 July 2022; pp. 1–7.
- Silvestri, S.; Gargiulo, F.; Ciampi, M. Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases. Appl. Sci. 2022, 12, 5775.
- Fu, J.; Liu, P.; Zhang, Q. Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, the Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, New York, NY, USA, 7–12 February 2020; AAAI Press: Palo Alto, CA, USA, 2020; pp. 7732–7739.
- Aizawa, A. An information-theoretic perspective of TF–IDF measures. Inf. Process. Manag. 2003, 39, 45–65.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations (ICLR 2013), Scottsdale, AZ, USA, 2–4 May 2013.
- Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146.
- FIRST.Org. Common Vulnerability Scoring System Version 3.1 Specification Document; Technical Report; FIRST.Org: Cary, NC, USA, 2019.
- Stucco-Data Cyber Security Data Sources. Available online: http://stucco.github.io/data/ (accessed on 20 September 2022).
- Phandi, P.; Silva, A.; Lu, W. SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP). In Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, LA, USA, 5–6 June 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 697–706.
- Hugging Face—The AI Community Building the Future. Available online: https://huggingface.co (accessed on 20 September 2022).
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240.
- SpaCy. Industrial–Strength Natural Language Processing in Python. Available online: https://spacy.io (accessed on 20 September 2022).
- Beautiful Soup Documentation. Available online: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ (accessed on 20 September 2022).
- Scikit-learn. Machine Learning in Python. Available online: https://scikit-learn.org/stable/index.html (accessed on 20 September 2022).
- XGBoost Documentation. Available online: https://xgboost.readthedocs.io/en/stable/index.html (accessed on 20 September 2022).
- Gargiulo, F.; Silvestri, S.; Ciampi, M.; De Pietro, G. Deep neural network for hierarchical extreme multi-label text classification. Appl. Soft Comput. 2019, 79, 125–138.
- Karunasingha, D.S.K. Root mean square error or mean absolute error? Use their ratio as well. Inf. Sci. 2022, 585, 609–629.
- Kasuya, E. On the use of r and r squared in correlation and regression. Ecol. Res. 2019, 34, 235–236.
- Alicante, A.; Corazza, A.; Isgrò, F.; Silvestri, S. Unsupervised entity and relation extraction from clinical records in Italian. Comput. Biol. Med. 2016, 72, 263–275.
Paper | Area | Method and Review |
---|---|---|
[29] | Review on ML and Data Mining techniques for software vulnerabilities | Vulnerability prediction based on text mining of software source code produced better results than metrics-based work, despite the availability of metrics. Anomaly detection approaches are applicable to mature software systems, but they lack a focus on security-related vulnerabilities and suffer from high false positive rates. |
[30] | Supply Chain threat analysis | Random Forest and XGBoost algorithms are used for the threat analysis, based on threat intelligence features. |
[31] | Identification of potential attack in smart healthcare system | Machine Learning and formal analysis capabilities are integrated to identify attack vectors, based on a Dynamic Casual Modeling (DCM) supervised model and an Automated decision-making (ADM) unsupervised ML model. |
[32] | Cyber threat severity analysis | NLP based on logistic regression is used to identify threat severity from tweets describing software vulnerabilities. |
[15,18] | Cyber Security information/entity identification | An NLP DL-based architecture is used for Named Entity Recognition (NER) in cyber security, based on an unstructured NER dataset. A data-driven DL method is combined with a knowledge-driven dictionary method to improve NER. |
[37] | Software code vulnerability detection | Automated software vulnerability detection using recent DL approaches. The vulnerability in software code is treated as an NLP problem. |
[40] | Cyber Security Claim Classification | CS feature claims classifier based on BERT model, which also includes an approach to obtain optimal hyperparameters. The model obtains SOTA results, but it needs a specifically annotated corpus for the fine-tuning. |
[17] | Cyber Security NER | A BERT-based model fine-tuned for the CS NER task. The obtained results are improved using CS-domain dictionaries. |
[41] | Cyber Security NER | An XLM RoBERTa-large model pretrained on threat reports and fine-tuned for the NER task in the CS domain. The approach improves performance by combining additional components (regular expressions and KBs, an ML-based model for generic domain entities, and a Flair-based NER model) and merging the extracted entities on a priority basis. |
[16] | Cyber Security NER | CS NER model that integrates BERT and BiLSTM-CRF architectures, improving baseline performance. |
Area | Name |
---|---|
1 | User interactions with implants and sensors |
2 | Medical equipment and IT devices |
3 | Services and processes |
4 | Interdependent HCIIs – Ecosystem |
Category | Functionalities |
---|---|
Influence | Found in most organizations, distinct |
Type | Software, hardware, Operating System (OS), information
Sensitivity | Restricted, unrestricted |
Criticality | Essential, required, deferrable |
Threat Level | Percentage of Occurrence Range |
---|---|
Very High | [80–100] |
High | [60–80] |
Medium | [40–60] |
Low | [20–40] |
Very Low | [1–20] |
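The threat scale above can be applied programmatically once a percentage of occurrence is available for a threat. The sketch below is a minimal mapping function; since adjacent ranges in the table share their endpoints, it assumes that a boundary value (e.g., 80) belongs to the higher level, and it rejects values outside [1, 100]. The function name and this boundary convention are illustrative assumptions.

```python
# Lower bound of each range in the threat scale, from highest to lowest level.
THREAT_LEVELS = [
    (80, "Very High"),
    (60, "High"),
    (40, "Medium"),
    (20, "Low"),
    (1, "Very Low"),
]


def threat_level(percentage):
    """Map a percentage of occurrence to a qualitative threat level."""
    if not 1 <= percentage <= 100:
        raise ValueError("percentage of occurrence must be within [1, 100]")
    for lower_bound, label in THREAT_LEVELS:
        if percentage >= lower_bound:
            return label


print(threat_level(45))
```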
Severity Level | CVSS-Like Score Range |
---|---|
Very High | |
High | |
Medium | |
Low | |
Very Low | |
Dataset | News Count | Word Count | Average Word Count | Word Stddev | Sentence Count | Average Sentence Count | Sentence Stddev |
---|---|---|---|---|---|---|---|
The Hacker News Dataset (6 September 2022) | 1064 | 514,220 | 484.18 | 245.33 | 21,093 | 19.86 | |
NER Training set | 224 | 39,826 | | | 4708 | | |
NER Test set | 84 | 20,086 | | | 1701 | | |
Threat Level (TL) dataset | 756 | 454,308 | | | 14,595 | | |
Dataset | Reports Count | Total Word Count | Average Length | Standard Deviation | Median |
---|---|---|---|---|---|
CVE Dataset | 77,441 | 2,880,401 | | | 34 |
Training set | 58,080 | 2,153,576 | | | 34 |
Test set | 19,361 | 726,785 | | | 34 |
Severity Level | Number of Samples |
---|---|
Very High | 3393 |
High | 6385 |
Medium | 3583 |
Low | 355 |
Very Low | 4 |
Method | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
DS | ||||
BERT | ||||
SecBERT | 0.9662 | 0.7995 | 0.8750 | 0.9975 |
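As a consistency check on the SecBERT row, the F1-score can be recomputed from the reported precision and recall, since it is their harmonic mean. The short sketch below does this; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for SecBERT in the table above.
secbert_f1 = f1_score(0.9662, 0.7995)
print(f"{secbert_f1:.4f}")  # prints 0.8750
```

The recomputed value agrees with the reported F1-score of 0.8750. The gap between the high accuracy (0.9975) and the lower recall is typical of token-level NER evaluation, where most tokens belong to no entity.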
Entity Type | Number of Entities |
---|---|
Threat | 2145 |
Asset | 6483 |
Asset | Threat Level |
---|---|
Apache Tomcat | Medium |
Adobe Reader | High |
Google Chrome | Very High |
Laravel framework | Low |
Debian Linux | Medium |
Android | High |
ML Model | MAE | MSE | |
---|---|---|---|
Multiclass Logistic Regression | |||
XGBoost |
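For reference, the two error metrics named in the table header can be computed as follows. This is a minimal sketch in plain Python; the sample scores are hypothetical and are not results from the experiments.

```python
def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between reference and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reference and predicted severity scores.
reference = [9.8, 6.1, 7.5]
predicted = [9.8, 5.3, 8.1]
print(mae(reference, predicted), mse(reference, predicted))
```

MSE penalises large deviations more heavily than MAE, so reporting both indicates whether the errors are dominated by a few badly scored vulnerabilities.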
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Silvestri, S.; Islam, S.; Papastergiou, S.; Tzagkarakis, C.; Ciampi, M. A Machine Learning Approach for the NLP-Based Analysis of Cyber Threats and Vulnerabilities of the Healthcare Ecosystem. Sensors 2023, 23, 651. https://doi.org/10.3390/s23020651