1 Introduction

The Internet of Things (IoT) is a computing concept that describes connecting everyday physical objects to the internet so that they can identify themselves to other devices. IoT brings together many interrelated things and is expected to develop dramatically over time for human well-being. It has untapped potential in areas such as data-driven decision making, monitoring and tracking of objects, reduced workload through automation, increased efficiency and production through savings in resources and money, and a better quality of life. Rapid growth in communication and information technology, together with greater computing capabilities, has extended the potential of IoT to the medical domain, where it is known as the Internet of Medical Things (IoMT). Healthcare IoT, or IoMT, refers to the connected infrastructure of medical devices and software applications that can communicate with various healthcare information technology systems [1].

IoMT is an umbrella term that covers varied types of medical devices employed at different stages of the healthcare process and in several distinct capacities. The underlying concept behind all IoMT devices is that they capture, compile, and use data to streamline processes and maximize performance in the various services of the medical sector (Fig. 1). IoMT systems use several smart actuators and biosensors, which gather real-time and confidential patient data. This allows medical professionals to understand and interpret sensitive information better and more effectively [2]. These sensors can be implanted inside the human body and produce enormous volumes of real-time data for analysis and effective decision making. Over the next few decades, IoMT is expected to transform the healthcare sector through many applications, from remote monitoring to the integration of medical equipment. As a result, IoMT-based healthcare assessment can prevent fatal outcomes and increase a nation’s productivity, which in turn can bring substantial growth to developing countries.

Fig. 1 IoMT services

Moreover, IoMT can keep modern society functionally productive even in pandemic circumstances. IoMT-enabled devices help maintain the security and safety of patients and support doctors in providing excellent treatment. IoMT facilitates the continuity of healthcare services and keeps the patient status up to date for those who need real-time, regular medical monitoring and preventive interventions [3]. IoMT also supports diagnosis and treatment in areas such as chronic disorders, exercise services, and elderly care. As interactions with doctors become more effective and less complicated, with increased patient involvement and satisfaction, remote patient health monitoring tends to avoid re-admissions and reduce hospital stays. It offers many benefits, as shown in Fig. 1, and more will be discovered as it continues to grow in the health sector.

The swift development of IoMT systems, with advancements in data and device management, leads to security vulnerabilities. Security-related issues such as the lack of valid authentication, improper encryption of transmitted data, insecure interfaces, and harmful firmware or software are significant concerns for any IoMT system. IoMT systems require an enhanced design framework that addresses complexity management and scalability to avoid attacks. In addition to the challenges of control, the scope and scale of IoMT devices give rise to another significant challenge: users’ privacy. While attacks on IoT systems are becoming progressively more common, security metrics frequently center on networks and software. Attackers can acquire control and carry out malicious activities, e.g., attacking other devices near the compromised nodes. An IoT-based healthcare system works with the physiological conditions of users recorded by wearable sensors. These can be supplemented by contextual information to predict unusual patterns more precisely. The sensed data from the various biosensors are communicated in a timely manner to centralized (cloud) storage through ZigBee, Wi-Fi, or Bluetooth using smart devices. An attacker looks for ways to exploit vulnerabilities by scanning for weak sensors and injecting malicious data to gain access to and control over the medical data. The malicious data are then unknowingly written to the centralized storage, as shown in Fig. 2.

Fig. 2 General overview of the IoMT healthcare system

Most IoMT devices do not have efficient malware or virus protection software, which reflects their low-memory, low-power design. The lack of malware and virus protection on IoMT devices allows compromised devices to act as bots and spread malicious activity to other devices in the network. Moreover, by attacking such devices, hackers can also access sensitive information gathered and transmitted through them. The absence of strong security, integrity, and confidentiality of data in IoMT limits the widespread implementation of this technology. Despite all its advantages, IoMT is accompanied by the possibility of vulnerabilities and new security breaches in the healthcare system. This is related to the following factors: (1) medical devices primarily capture and exchange confidential patient data, (2) IoMT design technology causes complexity and incompatibility issues, and (3) medical IoT system manufacturers do not prioritize security features. Major security concerns regarding confidentiality, integrity, and availability are increasing due to these factors, so implementing apt security measures is crucial.

Following the discussion above, we list seven factors in Table 1, ranging from security concerns to the threat of high client expectations. These factors are significant concerns for the development and growth of IoT in the medical domain. Overcoming them is key to achieving long-term productivity and prosperity through these technologies.

Table 1 Challenges of IoMT in Healthcare

To deal with such issues in the IoMT environment, several machine learning frameworks have been introduced in the recent past. Begli et al. [4] proposed a secure remote healthcare architecture for remote monitoring (collecting patients’ data) and responding appropriately in an emergency. To provide a safe framework against User to Root (U2R) and Denial of Service (DoS) attacks, they proposed a multi-layer (hybrid) intrusion detection system with a machine learning algorithm (support vector machine). The method is designed with security measures for possible attacks that can minimize the uncertainty of complex decisions and allocate specific values to the outcomes of actions. However, because of its multi-agent, layered, rule-based architecture, it consumes more energy and incurs a substantial computational cost, which makes the system ineffective against eavesdropping.

Marwan et al. [5] proposed machine learning techniques for preventing illegal access to medical records and personal information and for providing secure data processing in a cloud environment. The machine learning method (support vector machine) was implemented with fuzzy C-means clustering to classify image pixels effectively. It also incorporated the CloudSec module to reduce the risk of exposing medical data through a conventional two-layered architecture. This approach aims at optimal feature extraction and could help cloud providers avoid expensive encryption methods for data protection. However, the model’s consistency is uncertain, because the classifier’s performance may differ across dataset types, and training takes longer due to complex matrix operations on images and Gaussian assumptions, which lead to higher computational costs.

A closer look at these developments shows that traditional machine learning algorithms face significant challenges in algorithm selection, data acquisition, time and resources, interpretation of results, and high error-susceptibility. Similarly, deep learning applications need large annotated datasets, which are hard to obtain; annotation is often time-consuming, costly, and ambiguous. Additionally, over-trained NNs generalize poorly, so validation and appropriate stopping criteria are required when minimizing the cost function. The backpropagation algorithm can also become trapped in local minima, which makes it unstable. All these factors make the learning process difficult to use optimally.

Unlike conventional machine learning models such as NNs and deep learning implementations, ELMs are single- or multi-hidden-layer feed-forward neural networks used to solve many real-world data mining and other complex problems. In ELM, the weight parameters between the input and hidden layers need not be tuned, which makes the computation straightforward. The number of hidden nodes is arbitrarily assigned and has an automatic updating procedure with the corresponding ancestors. Experimental results show better generalization performance and faster learning than classical, popular learning strategies for classification problems and for approximating objective benchmark functions. In the traditional approach of tuning hyperparameters with grid search or randomized search, each set of hyperparameters must be evaluated with the objective function, which becomes computationally costly when dealing with a large number of hyperparameter settings. The cost becomes even worse when the training time of the model is high.

Keeping all these aspects in view, a novel hybrid Bayesian optimization and ELM-based lightweight framework is designed to identify malicious access in the IoMT environment. The major contributions of this article are as follows:

  i. An optimized ELM model has been designed to identify and mitigate malicious activities in the IoMT environment, using an efficient Bayesian optimization approach. The method has been adopted for finding the optimal set of hyper-parameters of ELM to analyze the big data generated by sensors and IoT devices in an IoMT environment.

  ii. An intelligent hybrid security framework is designed by using a realistic dataset named ToN_IoT [23] to realize the impact of security measures on a dynamic scenario.

  iii. The performance of the proposed model is evaluated against state-of-the-art ensemble and other conventional machine learning-based methods to establish its efficacy over others.

The rest of the paper is organized as follows. Section 2 discusses the literature study of the related IoT security research with intelligent methods. Section 3 elaborates the proposed Bayesian optimized hyper-tuned ELM approach to detect intrusive activities in an IoMT environment. The environmental setup for this experiment and the analysis of the results are described in Sect. 4. In this section, a rigorous performance analysis among all the machine learning, ensemble learning-based methods, and the proposed method is conducted to analyze the proposed method’s efficiency compared to others. Section 5 concludes this work with a few critical future concerns.

2 Literature study

Several studies have been carried out on the security issues of IoMT network-based devices. Newaz et al. [6] proposed a smart healthcare system that used implantable, wearable, and other medical devices to monitor patients’ vital signs continuously and automatically detect critical medical conditions so that they can be prevented. A machine learning framework with a security-based health guard was proposed to detect malicious activities. The health guard observes the vital signs from the connected devices and correlates them to understand the functions of the patient’s body, distinguishing between normal and malicious activities. Machine learning techniques such as artificial neural network (NN), decision tree (DT), random forest (RF), and k-nearest neighbor (KNN) are applied to detect malicious activities in a smart healthcare system. The model is easy to compute, and its accuracy can be improved for large datasets. However, the results may be unreliable when dependence (or correlation) exists between variables. It also increases the ambiguity of complex decisions and does not assign precise values to the outcomes of actions when data are limited.

He et al. [7] studied connected healthcare systems for remote monitoring of patients’ physical conditions. The paper deals with the security aspects and vulnerabilities of such systems and derives a new intrusion detection method based on a stacked autoencoder. The connected healthcare system follows a client–server architecture and is composed of three parts: the acquisition unit for human physiological data, the field control unit, and the remote monitoring center. The major advantages of the proposed model are that it reduces the feature dimensions by extracting more distinguishing features and that it can detect derivative attacks that have not occurred earlier. However, the NN’s processing elements are difficult to interpret (the black-box phenomenon), and the model requires high processing time, making it unsuitable for large coupled data.

Al-Shaher et al. [8] noted that the recent development of malicious code had raised significant security concerns about unauthorized access to patients’ electronic health records. The authors proposed an intelligent healthcare security system that includes a wavelet neural network approach, a smart firewall, an intelligent network intrusion detection subsystem, and an intelligent web filter for dealing with unwanted security threats. A multi-layer perceptron NN is also used to detect and classify attacks. The analyzed structures of wavelet neural networks are used to develop an optimal neural network paradigm for the security problem. The proposed model minimizes the ambiguity of complex decisions and assigns precise values to the outcomes of actions. It can process high-dimensional data and is highly scalable and self-organized in detecting attacks. However, it demands high computational costs due to packet-based classification and probabilistic graphical rules.

To implement healthcare applications in a distributed environment, Lakhan et al. [9] have analyzed various offloading and scheduling problems in IoMT fog-cloud network. Further, the authors have developed a novel framework based on deep reinforcement learning and blockchain-enabled approaches. The framework consists of multi-criteria offloading that makes use of policies of deep reinforcement learning and blockchain-enabled task scheduling algorithms including task sequencing for the implementation of healthcare applications in distributed IoMT environment. The empirical results reveal that the suggested deep reinforcement learning and blockchain-enabled approaches enhance the performance of the framework by minimizing the computation cost and communication time in the distributed environment.

Lakhan et al. [10] designed a novel and cost-effective IoMT architecture based on blockchain-enabled fog-cloud technology to minimize the cost of healthcare applications and to secure the data in healthcare networks. To minimize the cost, the framework makes use of the blockchain-enabled smart-contract cost-efficient scheduling algorithm framework (BECSAF) scheme. It uses the smart-contract blockchain scheme to provide data consistency and a symmetric cryptography algorithm for validation. Experimental results show that the suggested schemes perform better than the standard approaches in implementing healthcare applications.

To implement changes in the dynamic environment, a novel security framework has been developed by Lakhan et al. [11]. The proposed framework makes use of the deep neural networks energy cost-efficient partitioning and task scheduling (DNNECTS) algorithm. It consists of three components, namely application partitioning, task sequencing, and scheduling, for processing critical healthcare tasks in the dynamic environment. The empirical results indicate that the suggested framework performs better in the dynamic environment in terms of applications’ cost and energy utilization.

To manage resources efficiently, a healthcare resource management optimization (HRMO) framework has been suggested by Mutlag et al. [12]. In this framework, fog computing is incorporated as an intermediate layer to reduce the drawbacks of cloud computing. The chain of fog nodes processes crucial healthcare tasks through the multi-agent system (MAS), which is considered the major responsibility of fog computing. Thus, MAS plays a major role in connecting all processing levels, namely edge, fog, and cloud, in the proposed framework.

Golec et al. [13] developed a lightweight security- and privacy-based architecture known as iFaaSBus to protect the data acquired from IoT devices and to forecast the trend of the ailment. To diagnose COVID-19 infection and to manage resources efficiently, the framework makes use of machine learning, IoT, and Function as a Service (FaaS). The patient’s health data are secured using privacy based on the OAuth-2.0 authorization protocol and security based on JSON Web Token and the Transport Layer Security (TLS) protocol, both provided by iFaaSBus. The model was validated using various machine learning approaches. The results show that KNN (K-nearest neighbor) attained the best performance, with an accuracy of 97.51%, compared with the other approaches. The results also show that iFaaSBus attained a better response time than non-serverless computing.

Yuvaraj et al. [14] proposed a machine learning-based model for parallelizing the allocated jobs and minimizing the runtime problems of serverless frameworks. The approach uses GWO (gray wolf optimization) to enhance the task allocation mechanism. In addition, it uses reinforcement learning (RIL) to optimize the GWO parameters, which in turn improves task allocation. The simulation outcomes indicate that the suggested GWO-RIL approach reduces runtimes and adapts to differing load conditions.

To easily manage relative 3D distances, a centralized heterogeneous formation flight position control design based on an LQR PI (linear quadratic regulator proportional integral) controller has been proposed by Pirbhulal et al. [15]. In the proposed model, two wingmen quadcopters track the output of the leader quadcopter. The pole placement control method and the LQR PI control method are used to control the leader and the two followers, respectively. During flight, the formation geometry may be altered to an arbitrary shape by the control scheme if it incorporates a collision avoidance mechanism. Further, singular values are used to analyze the closed-loop system stability. Finally, the approach has been validated using MATLAB/Simulink, and the results indicate that the model attains promising output tracking and leader stability even in the presence of critical perturbations.

To efficiently classify intruder attacks in the IoMT environment, a hybrid approach known as PCA-GWO (principal component analysis-grey wolf optimization) based on a deep neural network has been suggested by Priya et al. [16]. Initially, categorical data are transformed into numerical data using one-hot encoding. Then, PCA and GWO are applied to the pre-processed dataset to reduce the attribute dimensions and select the most significant attributes. Further, machine learning approaches, namely Naïve Bayes, support vector machine, K-nearest neighbor, random forest, and deep neural networks, have been used to classify the reduced dataset. The experimental results indicate that the proposed approach achieves a 15% improvement in accuracy and a 32% reduction in time complexity compared with conventional machine learning approaches in classifying and predicting cyber-attacks in the IoMT environment.

To predict the network resource consumption and to enhance the transmission of IoT services on time, a model based on machine learning and SDN (software-defined network) framework has been presented by Haseeb et al. [17]. The proposed framework makes use of the SDN centralized model to minimize the overhead caused by the control plane in the deployed network of IoT. In addition, the proposed approach makes use of a machine learning approach to optimize the performance of routing in a real-time environment. Then, the proposed approach makes use of dynamic metrics and SDN architecture for the prediction of link status and refinement of the strategies. Finally, the SDN controller makes use of a security algorithm for the efficient management and safeguard of the IoT nodes from anonymous occurrences. From the experimental outcomes, it was observed that the developed model obtained better performance in terms of network throughput and data delay by 10% and 21%, respectively.

For accurate identification of the brain tumor concerning its grade at the early stage, an automated security system that makes use of PART (partial tree) has been presented by Khan et al. [18]. Further, the proposed approach has been validated using tenfold cross-validation and an advanced feature set that has not been utilized formerly for the accurate recognition of the brain tumor. From the empirical results, it was identified that the suggested approach obtained better performance in terms of computational cost and accuracy when compared with other approaches such as random tree, Naive Bayes, rep tree, and random forest.

For the diagnosis of age-related macular degeneration (AMD), a teleophthalmology framework based on scalable cloud technology that makes use of the Internet of Medical Things (IoMT) has been proposed by Das et al. [19]. The framework forwards retinal fundus images captured by the patients’ head-mounted cameras to their personal, secure cloud storage for the prediction and detection of AMD severity. The severity of AMD is then identified by analyzing the images with the 152-layer AMD-ResNet convolutional neural network. For diagnosing disease severity, the model was trained using 130,000 AREDS (Age-Related Eye Disease Study) fundus images acquired over 12 years from the NIH (National Institutes of Health). The experimental results show that the model obtained 94.97 ± 0.5% sensitivity and 98.32 ± 0.1% specificity. Moreover, the framework uses a temporal long short-term memory (LSTM) deep neural network for predicting the progression of AMD and for precision medicine.

3 Proposed system

In this work, an ELM-based model [20, 21] with optimized parameters is developed for the efficient detection of intrusive behaviors in an IoT framework (Fig. 3).

Fig. 3 Proposed system architecture

The problem can be viewed as an optimization problem where the objective is to select the best \(p_{i} = \left\{ {f_{i} ,H_{i} ,\alpha_{i} } \right\}\) in \(P = \left\{ {p_{1} ,p_{2} ...p_{n} } \right\}\) (a population of \(n\) hyper-parameter sets). Here, \(p_{i}\) is the ith randomly generated hyper-parameter set, drawn from the following allowed ranges: \(f_{i} \in {\text{list}}\left[ {1,2,3,4,5,6,7,8} \right]\), \(H_{i} \in {\text{range}}\left[ {1,60} \right]\), and \(\alpha_{i} \in {\text{range}}[0.1,1.0]\). Here, \(f_{i}\), \(H_{i}\), and \(\alpha_{i}\) are the ith activation function, number of hidden neurons, and learning rate, respectively. The activation function \(f_{i}\) takes the values ‘1,’ ‘2,’ ‘3,’ ‘4,’ ‘5,’ ‘6,’ ‘7,’ and ‘8’ for \(Sine\), \(Tanh\), \(Tribas\), \(Sigmoid\), \(Hardlim\), \(Softlim\), \(Gaussian\), and \(Multiquadric\), respectively. The performance of ELM in predicting the attack type depends on the parameters \(f\), \(H\), and \(\alpha\). For the baseline ELM model, the impact of \(\alpha\), \(H\), and \(f\) on the prediction performance is shown in Figs. 4, 5, and 6, respectively. The goal is thus to obtain the optimal \(p_{i}^{*} = \left\{ {f_{i} ,H_{i} ,\alpha_{i} } \right\}\) in \(P\), i.e., the parameter set of ELM that best identifies the various attack types in an IoT network. Given the IoT access profiles with connection traces \(X = \left\{ {x_{i} ,y_{i} } \right\}_{i = 1}^{m}\), the hyper-parameter population \(P = \left\{ {p_{i} } \right\}_{i = 1}^{n}\), and the model \(ELM\left( {p_{i} ,X} \right)\), the objective is to find the optimal \(p_{i}^{*}\) that maximizes the following objective function (Eq. 1):

$$ \begin{aligned} p_{i}^{*} & = \mathop {\arg \max }\limits_{{p_{i} \in P}} \left\{ {s_{i} = {\text{score}}\left( {y,\hat{y} = ELM\left( {p_{i} ,X} \right)} \right)} \right\} \\ & = \mathop {\arg \max }\limits_{{p_{i} \in P}} \left\{ {s_{i} = {\text{score}}\left( {y,\hat{y}} \right) = \frac{1}{m}\sum\limits_{i = 1}^{m} {I\left( {\hat{y}_{i} ,y_{i} } \right)} } \right\} \\ \end{aligned} $$
(1)
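As a minimal illustration of Eq. (1), the sketch below assumes that \(I\left( {\hat{y}_{i} ,y_{i} } \right)\) is an indicator equal to 1 when the predicted and true labels match, so that the score is the fraction of correct predictions. The names `score`, `best_hyperparameters`, and `evaluate` are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def score(y_true, y_pred):
    """Score of Eq. (1): mean of the indicator I(y_hat_i, y_i).

    Assumption: the indicator is 1 for a matching label and 0 otherwise.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def best_hyperparameters(candidates, evaluate):
    """Exhaustive arg max over a finite population P of hyper-parameter sets.

    `evaluate` is a hypothetical callable that maps one p_i to its score s_i.
    """
    scores = [evaluate(p) for p in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

Evaluating every \(p_{i}\) in this way is what grid or randomized search would do; the Bayesian procedure described later in this section aims to reach a good \(p_{i}^{*}\) with fewer calls to `evaluate`.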

The connection traces dataset \(X = \left\{ {x_{i} ,y_{i} } \right\}_{i = 1}^{m}\) is the collection of instances \(x_{i}\) with 42 features and one class label \(at_{i}\) representing seven different attack types and one normal type. The proposed ELM model has been trained with these instances. In ELM, the class label is predicted using Eq. (2), where \(H\_out\) (Eq. 3) is the hidden-layer output matrix, \(\beta\) (Eq. 4) is the weight matrix holding the weights between the hidden-layer neurons and the output neuron, and \(\hat{y}\) (Eq. 5) is the prediction.

$$ \hat{y} = H\_{\text{out}} \times \beta $$
(2)
$$ H\_{\text{out}} = \left[ {\begin{array}{*{20}c} {f\left( {b_{1} + x_{1} \times w_{1} } \right)} & {...} & {f\left( {b_{L} + x_{1} \times w_{L} } \right)} \\ {...} & {...} & {...} \\ {f\left( {b_{1} + x_{N} \times w_{1} } \right)} & {...} & {f\left( {b_{L} + x_{N} \times w_{L} } \right)} \\ \end{array} } \right]_{N \times L} $$
(3)
$$ \beta = \left[ {\beta_{1} ,\beta_{2} ...\beta_{L} } \right]^{{\text{T}}}_{L \times 1} $$
(4)
$$ \hat{y} = \left[ {y_{1} ,y_{2} ...,y_{N} } \right]^{{\text{T}}}_{N \times 1} $$
(5)

In this work, the hyperparameters of ELM are optimized with Bayesian optimization [22]. For unlabeled and complex (large) data, this optimization often performs well in optimizing the objective function, especially for unknown functions. In the problem considered here, the objective is to find the optimal set of hyperparameters that maximizes the score function defined in Eq. (1). ELM is successful and widely adopted for its ease of implementation and its support for incremental, batch, and sequential learning, owing to its efficiency, learning speed, generalization ability, and fast convergence. It differs from a traditional neural network in that it uses the Moore–Penrose generalized inverse technique for weight adjustment.
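The following sketch illustrates Eqs. (2)–(5) and the Moore–Penrose step under several assumptions that are not stated in the text: input weights and biases are drawn from a standard normal distribution, `tanh` is used as the activation \(f\), and the class labels are one-hot encoded so that \(\beta\) has one column per class (Eq. 4 shows a single output column). The learning rate \(\alpha\) is not modeled here, and the class name `SimpleELM` is hypothetical.

```python
import numpy as np

class SimpleELM:
    """Single-hidden-layer ELM sketch: random hidden layer, pinv-solved output weights."""

    def __init__(self, n_hidden=20, activation=np.tanh, seed=0):
        self.n_hidden = n_hidden          # L, the number of hidden neurons
        self.activation = activation      # f, applied element-wise
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # H_out of Eq. (3): entry (i, j) is f(b_j + x_i . w_j)
        return self.activation(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot targets (assumption made for the multi-class case)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are random and never tuned
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from the Moore-Penrose generalized inverse of H_out
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        # Eq. (2): y_hat = H_out x beta, then pick the highest-scoring class
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

Because only `beta` is solved for, training reduces to one pseudo-inverse computation, which is consistent with the fast learning speed attributed to ELM above.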

Fig. 4 Study on the impact of learning rate (\(\alpha\)) on F1-measure

Fig. 5 Study on impact of no. of hidden layer (\(H\)) on F1-measure

Fig. 6 Study on the impact of activation functions (\(f\)) on F1-measure

Bayesian optimization is an efficient alternative to grid and randomized search for optimizing hyperparameters because it searches the hyperparameter space in an informed manner. It starts the search from a small region of interest by using a surrogate function, which is an approximation of the objective function. Initial candidate solutions (points) in the search space are sampled, and the surrogate function is fitted to them. The surrogate function is then used to identify potential candidate solutions. Based on these candidates, the surrogate function is updated and other promising regions are identified. In each iteration, the method identifies and focuses on more regions of interest by updating the surrogate function. Unlike other optimization approaches that optimize the objective function directly, Bayesian optimization optimizes the surrogate function. The surrogate is a probabilistic model that computes the probability of a score given a set of hyperparameter values. These probabilities are used to select suitable hyperparameter values and to guide the search. This usually shortens the search time, since the search focuses on those areas of the search space with a higher probability of a good score. Bayesian optimization operates with a probability distribution for each hyper-parameter from which it samples; these distributions have to be set by the user. The approach starts with a wide search space and gradually focuses on a specific area around the optimal parameters retrieved from previous iterations. The domains of the hyper-parameters are defined together with their distribution functions.

In this work, the considered hyperparameters are the activation function (\(f\)), alpha (\(\alpha\)), and the number of hidden neurons (\(H\)). Here, \(f\) is the mathematical function that determines whether a neuron’s input is significant for prediction, \(\alpha\) is the parameter controlling the adjustment of weights, and \(H\) is the number of hidden neurons, which strongly affects performance and network stability.
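The hyper-parameter domains can be sketched as a small sampler. The ranges follow those given for \(f\), \(H\), and \(\alpha\) earlier in this section; the uniform distributions, the lowercase activation names, and the function name `sample_hyperparameters` are assumptions made for illustration, since the text does not specify which distribution functions were used.

```python
import random

# Index-to-activation mapping as listed in the text (names lowercased here)
ACTIVATIONS = {
    1: "sine", 2: "tanh", 3: "tribas", 4: "sigmoid",
    5: "hardlim", 6: "softlim", 7: "gaussian", 8: "multiquadric",
}

def sample_hyperparameters(n, seed=None):
    """Draw n random hyper-parameter sets p_i = {f_i, H_i, alpha_i}.

    Assumption: each hyper-parameter is sampled uniformly over its domain.
    """
    rng = random.Random(seed)
    return [
        {
            "f": rng.choice(list(ACTIVATIONS)),   # activation index in 1..8
            "H": rng.randint(1, 60),              # hidden neurons, inclusive bounds
            "alpha": rng.uniform(0.1, 1.0),       # learning rate
        }
        for _ in range(n)
    ]
```

A call such as `sample_hyperparameters(20, seed=0)` would give an initial population \(P\) of 20 candidate sets.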

This work focuses on finding the optimal parameters (\(p_{i}\)) of ELM by using Bayesian optimization. The main components of Bayesian optimization are the objective function, the surrogate function, and the selection function. The objective function evaluates a parameter combination (\(p_{i}\)) and outputs a score \(s_{i} = ELM\left( {p_{i} ,X} \right)\), which indicates how well the set of hyperparameters performs for the considered problem. For the present problem, we have considered the ‘F1 score’ as the evaluation metric, and the goal is to maximize the objective function presented in Eq. (1). A further aim is to limit the number of evaluations of the objective function.
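One way to picture an objective function with a limited evaluation budget is sketched below. It assumes a macro-averaged F1 score (the averaging mode is not specified in the text) and returns the negative score so that a minimizer can be used. `BudgetedObjective` and `train_and_predict` are hypothetical names; the latter stands for any routine that trains a model with \(p_{i}\) and returns predicted labels.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging mode)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_values = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1_values))

class BudgetedObjective:
    """Wraps model evaluation, counts calls, and returns the negative score."""

    def __init__(self, train_and_predict, y_true, max_evals):
        self.train_and_predict = train_and_predict  # hypothetical callable: p -> y_pred
        self.y_true = y_true
        self.max_evals = max_evals
        self.history = []                           # collected (p_i, s_i) pairs

    def __call__(self, p):
        if len(self.history) >= self.max_evals:
            raise RuntimeError("evaluation budget exhausted")
        s = macro_f1(self.y_true, self.train_and_predict(p))
        self.history.append((p, s))
        return -s  # negated so that lower is better
```

The stored `history` corresponds to the \(\left( {p_{i} ,s_{i} } \right)\) pairs from which a surrogate model can be built.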

figure a

Algorithm 1 takes a set of hyperparameters \(p_{i} = \left\{ {f_{i} ,H_{i} ,\alpha_{i} } \right\}\), trains the model \(ELM\left( {p_{i} ,X} \right)\), and returns the accuracy as a score. Here, the negative score is used because the optimization process minimizes its objective. Initially, \(n\) hyperparameter sets \(P = \left\{ {p_{i} } \right\}_{i = 1}^{n}\) are generated randomly from the hyperparameters’ domain distribution functions, where each \(p_{i} = \left\{ {f_{i} ,H_{i} ,\alpha_{i} } \right\}\) represents the ith hyperparameter set. Each \(p_{i}\) is evaluated by using the objective function \(s_{i} \leftarrow ELM\left( {X,p_{i} } \right)\), which recognizes the IoT security attack type. This process generates \(n\) pairs \(\left( {p_{i} ,s_{i} } \right)\), from which the probability of a set of hyperparameters given a score is broken down into two distributions, \(l\left( P \right)\) and \(g\left( P \right)\), as in Eq. (6).

$$ {\text{prob}}\left( {P \mid - s} \right) = \left\{ {\begin{array}{ll} {l\left( P \right)} & {{\text{if }} - s < {\text{threshold}}\left( { - s} \right)} \\ {g\left( P \right)} & {{\text{if }} - s \ge {\text{threshold}}\left( { - s} \right)} \\ \end{array} } \right. $$
(6)
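The split in Eq. (6) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the threshold on \(-s\) is taken as a lower quantile of the observed negative scores, and the function name `split_by_threshold` and the default quantile of 0.25 are hypothetical.

```python
import math


def split_by_threshold(trials, quantile=0.25):
    """Split (p, loss) pairs, with loss = -s, into the groups behind l(P) and g(P).

    Returns (good, bad, threshold): `good` holds pairs with loss < threshold,
    `bad` holds pairs with loss >= threshold, as in Eq. (6).
    """
    losses = sorted(loss for _, loss in trials)
    # Index of the quantile boundary, clamped to the valid range.
    k = min(max(1, math.ceil(quantile * len(losses))), len(losses) - 1)
    threshold = losses[k]
    good = [(p, loss) for p, loss in trials if loss < threshold]
    bad = [(p, loss) for p, loss in trials if loss >= threshold]
    return good, bad, threshold
```

The hyperparameter sets in `good` would then be used to estimate \(l\left( P \right)\) and those in `bad` to estimate \(g\left( P \right)\).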

By applying a threshold \(\lambda\) to the score, \(m\) hyperparameter sets (where \(m < n\)), \(P^{\prime} = \left\{ {p_{1} ,p_{2} , \ldots ,p_{m} } \right\}\), are selected from \(P\). Then, the expected improvement \(EI_{{s^{*} }} \left( {p_{i} } \right)\) of each \(p_{i}\) in \(P^{\prime}\) is computed to select the optimal hyperparameter set \(p_{i}^{*}\) with which to update the surrogate function.

$$ EI_{{s^{*} }} \left( P \right) = \frac{{\lambda \times s^{*} \times l\left( P \right) - l\left( P \right) \times \int\limits_{ - \infty }^{{s^{*} }} {{\text{prob}}\left( s \right){\text{d}}s} }}{{\lambda \times l\left( P \right) + \left( {1 - \lambda } \right) \times g\left( P \right)}} \propto \left( {\lambda + \frac{g\left( P \right)}{{l\left( P \right)}}\left( {1 - \lambda } \right)} \right)^{ - 1} $$
(7)
$$ {\text{prob}}\left( {s_{i} \mid p_{i} } \right) = \frac{{{\text{prob}}\left( {p_{i} \mid s_{i} } \right) \times {\text{prob}}\left( {s_{i} } \right)}}{{{\text{prob}}\left( {p_{i} } \right)}} $$
(8)

Here, the objective is to maximize the ratio \(l\left( P \right)/g\left( P \right)\), which in turn maximizes the expected improvement. From \(P^{\prime}\), the optimal \(p_{i}^{*}\) that maximizes the expected improvement \(EI_{{s^{*} }} \left( {p_{i} } \right)\) is selected. Using the selected \(p_{i}^{*}\), the surrogate function is updated along with the feedback of the objective function (Algorithm 1). This process is repeated until the optimal hyperparameters no longer change or the expected improvement shows no further gain.
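The selection step can be sketched for a single numeric hyperparameter such as \(H\). The sketch below assumes that \(l\) and \(g\) are estimated with Gaussian kernel density estimates and that candidates are drawn around the well-performing values. The text does not specify how the two densities are modelled, so these choices, the function names `kde` and `suggest_next`, and the bandwidth are illustrative assumptions.

```python
import math
import random


def kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built on `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in points
        )

    return density


def suggest_next(good, bad, n_candidates=50, bandwidth=5.0, seed=0):
    """Pick the candidate with the largest ratio l(x) / g(x).

    `good` and `bad` are the hyperparameter values on either side of the
    threshold; maximizing l/g corresponds to maximizing expected improvement.
    """
    rng = random.Random(seed)
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates from l by perturbing randomly chosen good values.
    candidates = [
        rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)
    ]
    # The small constant guards against division by zero far from `bad`.
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))
```

In a full optimization loop, the returned value would be evaluated with the objective function, the new pair \(\left( {p_{i} ,s_{i} } \right)\) would be added to the history, and the two densities would be re-estimated.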

4 Experimental setup and result analysis

This section describes the experimental setup and the evaluation of the proposed system, which is based on the ensemble ELM approach and cloud architecture, for mitigating cyber-attacks in a real-time IoMT environment.

4.1 Dataset collection and environment setup

In this study, the proposed approach is evaluated on the ToN_IoT dataset [23], which was obtained from a practical, large-scale network developed by the UNSW Canberra Cyber IoT Lab, School of Engineering and Information Technology (SEIT), UNSW Canberra @ The Australian Defence Force Academy (ADFA). The dataset consists of 43 features and a total of 461,043 observations, of which 300,000 are normal observations and 161,043 are cyber-attack observations. The simulation was performed on an HP ProDesk 600 G2 MT desktop running Windows 10 Pro 64-bit, with an Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz (8 CPUs) and 4096 MB of RAM. The Python libraries Pandas, Imblearn, and NumPy are used for data analysis, Matplotlib is used for data visualization, and sklearn and Mlxtend are used to implement the machine learning models and ensemble-based stacking. In addition, the optimized values of the distinct ELM hyperparameters for all the considered models are listed in Table 2.

Table 2 Optimized parameters of all considered models

4.2 Result analysis

To demonstrate the effectiveness of the proposed approach, several machine learning and ensemble learning approaches are considered in the experiments. The proposed approach is also evaluated with the k-fold cross-validation technique, which randomly divides the dataset into ‘k’ folds. In this study, stratified tenfold cross-validation is chosen to obtain an efficient error estimate with low bias and variance. The performance of all the considered approaches, along with the proposed approach, is assessed with the following metrics: precision, recall, F1 score, F2 score, Fbeta score, and ROC-AUC. Particular focus is given to the F1 score because the distribution of class labels is highly imbalanced and non-uniform.
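The evaluation protocol above can be illustrated with a short dependency-free sketch of stratified fold assignment and the F-beta family of scores (F1 for \(\beta = 1\), F2 for \(\beta = 2\)). It is only an illustration; the function names are hypothetical, and scikit-learn provides equivalent functionality through `StratifiedKFold` and `fbeta_score`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds


def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # beta > 1 weights recall more heavily; beta < 1 favours precision.
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with 80 true positives, 10 false positives, and 20 false negatives, the F1 score is \(160/190 \approx 0.842\) and the F2 score is \(400/490 \approx 0.816\). The F2 score is lower here because recall (0.800) is lower than precision (0.889).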

A comparative analysis of the evaluation metrics for all the ML and ensemble ELM approaches, together with the proposed method, is presented in Table 3. The results show that the proposed method is superior to the considered ML and ensemble ELM approaches in terms of precision, recall, F1 score, F2 score, Fbeta score, and ROC-AUC. Figure 7a–g shows the actual versus predicted performance of all the considered ML approaches, together with the proposed model, on the activities of IoMT communication profiles.

Table 3 Comparative analysis of performance metrics among all models
Fig. 7 AUC-ROC analysis on all the methods. (a) DT prediction performance. (b) NB prediction performance. (c) LR prediction performance. (d) RF prediction performance. (e) XGBoost prediction performance. (f) Prediction performance of ELM. (g) Prediction performance of ELM + Bayesian Optimization

From the figures, it is observed that, apart from DT, NB, LR, and RF, the remaining approaches (XGBoost, ELM, and ELM with Bayesian optimization) perform well. The proposed approach also surpasses all the considered approaches by achieving the best predicted results in all the activities of the IoMT communication profiles.

Table 4 reports the experimental outcomes of the considered ensemble approaches, namely ELM + RS, ELM + GA, and ELM + Bayesian optimization, on stratified tenfold cross-validated data. The empirical results in Table 4 show that the proposed ELM with Bayesian optimization surpasses the other two approaches in almost all the folds.

Table 4 Comparison of F1-measure among three ELM models

The AUC-ROC curves of the ELM approach and of the proposed ELM with Bayesian optimization are depicted in Fig. 8a, b. Both show consistent performance in every class of the IoMT communication profiles. Moreover, the actual versus predicted performance of the ensemble ELM approaches, namely ELM + RS, ELM + GA, and ELM + Bayesian optimization, is shown in Fig. 9a–c. These plots indicate that the predicted performance of ELM with Bayesian optimization surpasses that of the other two approaches in all the classes of the IoMT communication profiles. The F1 scores of all the considered ML approaches and the proposed approach are presented in Fig. 10, which shows that the proposed method attains a higher F1 score than the considered ML and other ensemble approaches. Therefore, from all the visual findings, it can be concluded that the proposed approach attains superior performance in all the classes of IoMT communication profiles and is the most robust of the compared approaches for mitigating cyber-attacks in real-world applications of IoMT.

Fig. 8 AUC-ROC analysis of ELM and ELM + BO. (a) ELM. (b) ELM + BO

Fig. 9 (a) ELM with randomized search. (b) ELM + Genetic Algorithm. (c) ELM + Bayesian Optimization

Fig. 10 F1 score comparison analysis of the studied models

The performance of the proposed model is also compared with existing research on IoMT security. Table 5 presents the performance comparison of the proposed model with similar intelligent models studied in the literature. The analysis shows that the proposed method is superior to the other described methods. The experimental results indicate that the proposed system, based on the cloud and the ensemble learning approach, has several advantages. The main advantage is that it can quickly identify malicious activities in highly dynamic and heterogeneous IoMT networks, because the framework of the proposed system is simple and easy to implement. Another advantage is that few parameters are used in the training and testing phases to design the IDS. Moreover, these parameters can be easily updated during real-time prediction to enhance the overall efficiency of the proposed system in terms of accuracy, detection rate, false-positive rate, and processing time.

Table 5 Performance comparison with other similar IoMT security models

5 Conclusion and future work

Over the last few decades, advances in technology and population growth worldwide have increased the cost of healthcare and the prices of services. Moreover, the rapid advancement of IoT technology has contributed to the vast development of the Internet of Medical Things (IoMT), which aims to enhance the quality of life of patients. Therefore, the transformation of the healthcare sector to IoMT is required to ensure a better quality of medical services at an affordable cost. The significant challenge in deploying an IoMT environment is the growing number of cyber-attacks. To protect the IoMT environment from unforeseen cyber-attacks, various researchers are working in this domain and have proposed different approaches for securing it. This study proposed a hybrid approach based on ELM and Bayesian optimization that utilizes cloud architecture to mitigate cyber-attacks in a real-time IoMT environment. The fundamental advantage of using an ensemble learning approach is that it produces improved predictions by combining the predictions obtained from the individual machine learning approaches.

In this study, the effectiveness of the proposed ELM with Bayesian optimization for reducing cyber-attacks in the IoMT environment was demonstrated by comparing its results with those of several machine learning approaches (DT, NB, LR, RF, XGBoost, and ELM) and ensemble learning approaches (ELM with GA and ELM with RS). The experimental results demonstrated the superiority of the proposed method over the considered machine learning and ensemble methodologies in terms of precision, recall, F1 score, F2 score, Fbeta score, and AUC-ROC, with values of 0.990300, 0.990300, 0.990300, 0.989175, 0.986652, and 0.870034, respectively. Future work may focus on the security and privacy concerns of multiple cloud/fog-based dynamic environments that include large numbers of devices. Also, ELM works well when few patterns need to be learned, but it may fail for solutions that require approximating larger nonlinear data. In such cases, it is advisable to adopt a deep learning neural network with better adaptability and learning capability for features not seen during training. Several security issues, such as hijacking, message tampering, eavesdropping, device cloning, denial of service, and denial of power attacks [32], arise in the deployment of IoT technology because the components of IoMT are interconnected from different locations. In addition, the deployment of IoMT technology suffers from several limitations, such as low battery capacity, limited processing power and memory, interoperability, and security. In this respect, it is highly appropriate to adopt intelligent methods to protect the devices and the network. Moreover, based on domain-specific expert knowledge and other related information, the present study may be extended to provide correct user-centric recommendations in a real-time cloud/fog-based environment.