Open Access Article

False Data Injection Attacks Detection Based on Stacking and MIC-DCXGB

Tong Li^1,2, Tian Xia^3,*, Haoming Zhang^3, Dongyang Liu^3, Hai Zhao^1 and Zhuolin Liu

1 School of Computer and Engineering, Northeastern University, Shenyang 110169, China

2 State Grid Liaoning Electric Power Research Institute, Shenyang 110006, China

3 School of Information Science and Engineering, Northeastern University, Shenyang 110819, China

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(22), 9692; https://doi.org/10.3390/su16229692

Submission received: 9 October 2024 / Revised: 30 October 2024 / Accepted: 5 November 2024 / Published: 7 November 2024

(This article belongs to the Special Issue Renewable and Sustainable Energy Systems: Architecture, Methodology and Technology, 2nd Edition)


Abstract

With the integration of sustainable energy, the power grid has become increasingly information-intensive and complex. False data injection attacks can prevent power grid cyber-physical systems from operating securely and stably. To address this problem, this paper proposes a two-stage detection method based on Stacking and Maximum Information Coefficient and Dual-layer Confidence Extreme Gradient Boosting (MIC-DCXGB). Firstly, a Stacking classification model consisting of multiple heterogeneous learners detects anomalies in real-time measurement data samples to determine whether false data are present. Secondly, the method uses the Maximum Information Coefficient (MIC) for feature selection, which measures non-linear correlations between data features and fairly removes redundant features by evaluating the amount of information one feature variable contains about another. This approach tackles the high-dimensional redundancy problem commonly faced in false data injection attack detection. Then, the paper introduces a dual-layer confidence Extreme Gradient Boosting (XGBoost) tree with positive feedback information transmission to classify node states. By combining grid topology learning with label correlation, it selectively uses preceding label information to reduce errors in the predictions learned by subsequent classifiers, achieving precise localization of the attack positions. Finally, extensive simulations validate the effectiveness of the proposed method.

Keywords:

stacking; MIC-DCXGB; FDIA; distributed network

1. Introduction

The integration of renewable energy into the distribution network has markedly increased the complexity of information transmission within the cyber-physical systems of these networks. With this increased complexity comes a heightened risk of adversarial attacks, among which false data injection attacks are particularly disruptive. The detection and mitigation of such attacks are therefore paramount to ensure the secure and stable operation of distribution networks, which is essential for the sustainable integration of renewable energy sources [1,2,3]. Power CPSs have leveraged efficient data transmission to effectively harness the advantages of highly integrated cyber-physical systems. However, frequent data communication also made them more vulnerable to network attacks. In 2015, several regions in Ukraine experienced large-scale power outages due to a hacker attack, marking a significant system failure caused by external attacks on power CPSs [4]. Cybersecurity issues have become a top priority for power CPSs [5,6].

False Data Injection Attack (FDIA), a particularly threatening type of network attack, is extremely stealthy. It can bypass the existing state information detection mechanisms in energy management systems, tamper with grid measurement data, and cause control centers to make incorrect estimates of the current power CPS status, thereby issuing erroneous commands that affect the power grid [7,8,9]. Ref. [10] proposed an FDIA method that bypasses traditional bad data detection mechanisms and directly impacts state estimation, but this method requires the attacker to know the entire system topology. Ref. [11] relaxed the attack conditions, requiring the attacker to know only the phase angle differences within the attack area to construct an FDIA that evades bad data detection mechanisms. Thus, the detection and identification of FDIA have become necessary conditions for ensuring the stable operation of power CPSs. Existing FDIA detection methods are mainly divided into model-driven state estimation detection methods and data-driven machine learning detection methods. State estimation FDIA detection methods generally rely on the operational state of the grid and use residual searching for FDIA detection and identification [12]. Ref. [13] proposed a novel state estimation algorithm that is resilient to sparse data injection attacks and robust to additive and multiplicative modeling errors. Ref. [14] developed an attack-resilient estimation algorithm for linear discrete-time stochastic systems with inequality constraints on the actuator attacks and states. Ref. [15] generated two Markov Chain models to detect FDIA by comparing the state estimation in response to the variable characteristics of new energy internet operation. However, as the power system continues to evolve towards smart technology, big data are gradually being applied in power CPSs. Model-driven detection methods can no longer cope with the increasing volume of grid data and fail to meet the growing online application demands of the power system.

The preliminary exploration of this study observed that some machine learning methods address certain drawbacks of neural networks in localization and detection problems and offer advantages comparable to neural networks. For instance, the XGBoost algorithm proposed in [16] has characteristics such as independence from hyperparameters, simplicity, efficiency, and strong interpretability, which allow it to effectively extract non-stationary and non-linear attack features and demonstrate significant advantages in localization and detection of FDIA. However, the high dimensionality of measurement data generally leads to the “curse of dimensionality”, which greatly increases the training time and space consumption of XGBoost, thus severely interfering with localization and detection. Ref. [17] combined active learning and Bayesian feature selection to optimize the XGBoost detector and enhance the model’s localization performance. However, Bayesian feature selection involved choosing a subset of features for each state variable, leading to increasing computational overhead as the number of state variables grows, reducing generalizability, and not accounting for label correlations.

Therefore, this study introduces the MIC [18] for feature selection on measurement data [19]. MIC not only measures various associations between data features non-linearly but also fairly removes irrelevant and redundant features based on the amount of information contained in one feature variable about another. The selected features have strong generalizability and can be used for localization detection of all state variables, effectively addressing the high-dimensional redundancy problem in measurement data. Consequently, this study treats the FDIA localization and detection problem as a multi-label binary classification problem and proposes an MIC-XGB-based FDIA localization and detection method. However, MIC-XGB decomposes the multi-label learning problem into several independent binary classification problems based on binary associations. Although it has high localization and detection capabilities, it still does not learn label correlations.

To address this, the study incorporates the MIC to learn label correlations by connecting the transmission of preceding labels based on the power grid topology, allowing MIC-XGB to capture label dependencies.

This study proposes a defense strategy against False Data Injection based on Stacking and MIC-DCXGB. The Stacking method is used to detect whether FDIA exists in measurement samples; MIC is introduced for feature selection to address the high-dimensional redundancy of measurement data; and DCXGB classifies the status of each node to precisely localize attack locations. The method sets up a two-layer detector framework to leverage positive feedback in information transmission, continually reducing errors in the predictive information learned by subsequent classifiers and thereby localizing FDIA more accurately. Specifically, MIC-XGB and the MIC-XGB that learns label correlations are used as the first- and second-layer detectors, respectively, and the resulting method is defined as MIC-XGB with a “double-layer confidence” structure (MIC-DCXGB).

2. Principles of Related Models

2.1. Principles of False Data Injection Attack

Attackers, by carefully constructing attack vectors, can make the attacked measurement data conform to the physical laws, thereby evading bad data detection. The attacked measurement vector is

$$z' = z + a \tag{1}$$

where $a$ represents the attack vector and the real-time measurement vector $z \in \mathbb{R}^{m}$ generally includes measurements such as node voltage amplitudes, node injection power, and branch power.

With the development of digital grids, power CPS exhibits multidimensional and heterogeneous characteristics [20]. In the context of power CPS, the behavior of FDIA is illustrated in Figure 1. Attackers can inject false data into the measurement system, communication system, and terminal control system, tampering with network information to cause decision failures and achieve their attack objectives [21]. However, the continuous advancement of CPS technology presents more challenges to successfully implementing FDIA.

Firstly, state estimators that were formerly based on a single supervisory control and data acquisition (SCADA) system are becoming increasingly heterogeneous. Therefore, when constructing an FDIA, corresponding adjustments are needed to avoid detection by PMUs and the resulting failure of the attack. Secondly, current research often constructs DC FDIA models using simplified linear equations, which lack generalizability and accuracy, making them easily detectable.

Therefore, this paper adopts a heterogeneous state estimation AC FDIA model, where the constraint objective for the attack vector $a$ is

$$\min \left\| Z^{℧_{A}} - h^{℧_{A}}(x^{a}) \right\|_{0} \tag{2}$$

where $℧_{A}$ represents the attack area, whose measurement vector $Z^{℧_{A}}$ includes the power injections and flows measured by SCADA as well as the voltage magnitudes and current vectors measured by PMUs. The function mapping $h(\cdot)$ characterizes the relationship between the attacked values and the corresponding state variables for both SCADA and PMU. Lastly, $x^{a} = [θ^{a}; U^{a}]$ denotes the values of the phase angles and voltage magnitudes.

Equation (2) sparsifies the attack vector by minimizing its number of non-zero elements, thereby enhancing concealment. Additionally, this paper adopts the measurement function $h(\cdot)$ from reference [22], which defines an AC power grid model that satisfies system constraints. Under the aforementioned conditions, the attack strategy is to gradually overload the target transmission line without detection, ultimately leading to a cascading failure in the power CPS. The process of overloading the target transmission line is represented as

$$\sqrt{(P_{1} + Δ P_{1}^{a})^{2} + (Q_{1} + Δ Q_{1}^{a})^{2}} \geq S_{1}^{\max} \tag{3}$$

where $P_{1}$ and $Q_{1}$ represent the active power and reactive power of the target transmission line, respectively; $Δ P_{1}^{a}$ and $Δ Q_{1}^{a}$ denote the increments of $P_{1}$ and $Q_{1}$; and $S_{1}^{\max}$ is the limit of the line. In summary, the attacker first performs optimal power flow analysis to estimate the state information of the grid and select the attack line. Then, the attacker solves Equation (2) to obtain the optimal attack vector and injects this vector to gradually overload the target line, thereby achieving a False Data Injection Attack (FDIA).
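The overload condition in Equation (3) can be checked numerically. The following is a minimal sketch, not the authors' implementation; the per-unit flow values, the increments, and the function names are hypothetical and chosen only for illustration.

```python
import math

def apparent_power(p, q):
    """Magnitude sqrt(P^2 + Q^2) of a line flow."""
    return math.hypot(p, q)

def is_overloaded(p, q, dp, dq, s_max):
    """True when the tampered flow satisfies the inequality of Equation (3)."""
    return apparent_power(p + dp, q + dq) >= s_max

# Hypothetical per-unit values for the target line (assumptions, not from the paper).
p1, q1, s_max = 0.80, 0.30, 1.00

before = is_overloaded(p1, q1, 0.0, 0.0, s_max)    # sqrt(0.64 + 0.09) is about 0.854
after = is_overloaded(p1, q1, 0.15, 0.10, s_max)   # sqrt(0.9025 + 0.16) is about 1.031
print(before, after)
```

With these illustrative numbers the line is within its limit before the injection and exceeds it afterwards, which is the condition the attacker aims to reach gradually.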

Finally, existing research often assumes that attackers have complete information about the grid, which is costly and unrealistic to obtain [23]. Compared to the global attack model that requires comprehensive information, the FDIA model described above allows for the selection of an attack area A based on the available information, enabling more feasible local attacks and offering greater versatility.

2.2. The Principle of Stacking for False Data Detection

Ensemble learning is a commonly used method in machine learning to improve model accuracy, typically divided into parallel and serial ensemble methods. Stacking is a type of parallel ensemble method, generally structured in two layers. The learners used in the first layer are called base learners, while the learner in the second layer is referred to as the meta-learner.

Stacking employs heterogeneous models as base learners, observing the original dataset from different perspectives and data structures through various base learners and extracting different features and outputs. The combination of features extracted by each base learner is referred to as meta-features. To prevent overfitting, the meta-learner does not directly observe the original dataset; instead, it observes the meta-features and outputs the final prediction results.

The Stacking method can combine the advantages of various types of machine learning models, thereby improving accuracy and generalization ability. In this paper, we utilize the Stacking classification model to detect the presence of FDIA in power grid measurement data, with the specific detection method illustrated in Figure 2.
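The two-layer structure described above can be illustrated with a small, dependency-free sketch. It is not the paper's model: the two base learners (nearest centroid and 1-nearest-neighbour), the out-of-fold construction of meta-features, the logistic meta-learner settings, and the toy two-feature data are all assumptions made for illustration.

```python
import math

def fit_centroid(X, y):
    """Base learner 1: predict the class whose centroid is nearest."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min((0, 1), key=lambda c: math.dist(x, cents[c]))

def fit_1nn(X, y):
    """Base learner 2: copy the label of the nearest training sample."""
    return lambda x: min(zip(X, y), key=lambda xy: math.dist(x, xy[0]))[1]

BASE_FITTERS = [fit_centroid, fit_1nn]

def out_of_fold_meta(X, y, k=3):
    """Meta-features: each sample is scored by base learners that never saw it."""
    meta = [[0] * len(BASE_FITTERS) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(BASE_FITTERS):
            model = fit(Xt, yt)
            for i in held:
                meta[i][j] = model(X[i])
    return meta

def fit_logistic(M, y, lr=0.5, epochs=300):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(M[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for m, lab in zip(M, y):
            z = sum(wi * mi for wi, mi in zip(w, m)) + b
            p = 1 / (1 + math.exp(-z))
            gb += p - lab
            gw = [g + (p - lab) * mi for g, mi in zip(gw, m)]
        w = [wi - lr * g / len(y) for wi, g in zip(w, gw)]
        b -= lr * gb / len(y)
    return lambda m: int(sum(wi * mi for wi, mi in zip(w, m)) + b >= 0)

def fit_stacking(X, y):
    """Train the meta-learner on meta-features, then refit base learners on all data."""
    meta_model = fit_logistic(out_of_fold_meta(X, y), y)
    bases = [fit(X, y) for fit in BASE_FITTERS]
    return lambda x: meta_model([base(x) for base in bases])

# Toy measurements: label 0 = normal sample, label 1 = sample containing false data.
X = [[1.00, 0.00], [1.02, 0.01], [0.98, -0.01], [1.01, 0.02], [0.99, 0.00], [1.03, -0.02],
     [1.30, 0.40], [1.28, 0.42], [1.32, 0.38], [1.31, 0.41], [1.29, 0.39], [1.33, 0.43]]
y = [0] * 6 + [1] * 6
model = fit_stacking(X, y)
print(model([1.00, 0.01]), model([1.30, 0.40]))
```

The meta-learner only ever sees the base learners' outputs, never the raw measurements, which mirrors the overfitting safeguard described in the text.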

3. MIC-DCXGB

3.1. MIC-XGB Detector

MIC [18] is a method based on mutual information that compares the strength of data associations both horizontally and vertically, allowing for the broad capture of various associations. It is commonly used in filter-based feature selection to identify the most useful features from the raw data, reducing data dimensionality and improving model performance.

For any two variables in the dataset $D$, their value ranges are divided into $x$ and $y$ bins, respectively. If the grid sizes $x$ and $y$ are fixed, then the size of the two-dimensional grid $G = x \times y$ defined on the two coordinate axes is also fixed. In this case, $\max I(D|G)$ represents the highest mutual information attainable over grids of that size.

By traversing all possible grid partition sizes $x$ and $y$ and normalizing the mutual information to the range 0–1, we obtain the mutual information matrix $M$. This process is represented as

$$M_{(D)} = \frac{\max I(D|G)}{\log_{2} \min\{x, y\}}, \qquad I = \iint p(x, y) \log_{2} \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy \tag{4}$$

where $p(x, y)$ represents the joint distribution of the two variables, and $p(x)$ and $p(y)$ are their marginal distributions. The maximum information coefficient $C_{MIC}$ is obtained by taking the maximum value of $M_{(D)}$:

$$C_{MIC} = \max_{xy \leq B(m)} \{M_{(D)}\} \tag{5}$$

where $B(m)$ represents the upper limit on the number of grid partitions. When the two variables are independent, $C_{MIC}$ is 0; the stronger the correlation between the two variables, the larger the value of $C_{MIC}$.
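To make Equations (4) and (5) concrete, the sketch below approximates the coefficient in plain Python. It is a simplification under stated assumptions: it searches only equal-width grids instead of optimizing the partition boundaries as the full MIC algorithm does, it takes $m$ to be the number of samples, and it uses $B(m) = m^{0.6}$, the default suggested in the original MIC paper rather than a value given in this article. The function names are hypothetical.

```python
import math
from collections import Counter

def _bins(values, k):
    """Equal-width bin index (0..k-1) for every value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * k), k - 1) for v in values]

def mutual_information(bx, by):
    """Discrete mutual information in bits between two binned sequences."""
    n = len(bx)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def mic_equal_width(xs, ys, exponent=0.6):
    """Largest normalized mutual information over grids with kx * ky <= m ** exponent."""
    budget = len(xs) ** exponent
    best = 0.0
    for kx in range(2, int(budget // 2) + 1):
        for ky in range(2, int(budget // kx) + 1):
            mi = mutual_information(_bins(xs, kx), _bins(ys, ky))
            best = max(best, mi / math.log2(min(kx, ky)))
    return best

xs = [i / 99 for i in range(100)]
linear = mic_equal_width(xs, xs)                               # identical variables
parabola = mic_equal_width(xs, [(x - 0.5) ** 2 for x in xs])   # non-linear dependence
constant = mic_equal_width(xs, [3.0] * 100)                    # no information shared
print(linear, parabola, constant)
```

Identical variables reach the maximum score, a constant variable scores zero, and the parabola receives a positive score even though its linear correlation with `xs` is close to zero, which is the property that motivates MIC for feature selection.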

Additionally, in the multi-label binary classification problem mentioned in the introduction, “multi-label” refers to the voltage magnitudes and phase angles of the various nodes in the power grid system, where 0 indicates a normal state and 1 indicates contamination. An XGB classifier is trained separately for each label based on the measurement features selected by MIC to form the MIC-XGB detector. The detector outputs the predicted probability $t$ for each label based on the corresponding features of the current input sample. The rule for predicting the final result is

$$t = \begin{cases} 1, & t \geq 0.5 \\ 0, & t < 0.5 \end{cases} \tag{6}$$

For instance, in an IEEE-57 bus system, the MIC-XGB method determines whether each node is under attack based on the final predicted results of 1 or 0 for the positions of the nodes. The dispatcher identifies the attacked area or location based on the differentiation of the state variables, thereby achieving localization detection.

3.2. MIC-DCXGB Detector

MIC-XGB can precisely utilize feature information while reducing runtime and computational costs. However, its method of converting the multi-label binary classification problem using a “first-order” binary association approach does not leverage label correlations, which limits its learning performance.

Label correlation, which involves constructing detectors by incorporating possible relationships between labels in a highly structured output space, aims to improve the model’s effectiveness for specific problems [24]. In the power grid, which is inherently physically connected between nodes and constrained by KCL, KVL laws, and the topological characteristics of power flows [25], there is a significant radial structure and some dependence between labels, such as voltage magnitudes and phase angles, at different nodes. Therefore, this paper proposes MIC-DCXGB based on MIC-XGB to capture label correlations and enhance model performance.

Unlike the classifier chain method mentioned in the introduction, which utilizes label correlations, this paper considers the physical connections in the power grid topology to account for label correlations between adjacent nodes and strengthens the use of label correlations by defining the detection sequence according to the number of adjacent nodes. For example, in the IEEE-14 bus system, assuming MIC-XGB has already completed classification for nodes 9 and 12, the state prediction information obtained is shown in Figure 3.

Nodes 4, 7, 10, and 14 are adjacent to node 9, and when utilizing prediction information, only node 9’s state prediction is considered while avoiding node 12 to prevent feature redundancy. Given that measurement features provide current flow data on each line, node 9’s state prediction information serves as an enhancement feature to determine the state of another line node.

Since node 12 has no direct flow with these nodes and the status prediction of intermediary nodes (5, 6, 11, 13) is unknown, incorporating node 12’s state prediction could introduce interference, reducing the classifier’s accuracy [26]. The proposed method prevents high-dimensional prediction information from compromising prediction efficiency while maintaining feature quality, especially in large systems with many labels.

For example, once node 4’s state prediction is obtained by combining node 9’s prediction, it can then use the predictions from nodes 4 and 9 to classify node 7’s state. The more adjacent nodes’ predictions are available, the more information can be utilized for subsequent nodes. The prediction information from node 9 influences node 4’s prediction, and this, in turn, affects nodes 2 and 5, showing that early state predictions propagate through the network, affecting subsequent nodes. It is evident that a node that obtains state prediction results earlier will have more neighboring nodes and thus receive more prediction information from its neighbors [27]. With this diffusion process, the impact on the state predictions of other nodes becomes broader, and the utilization of label correlations is more comprehensive.

Therefore, the detection order is specified from nodes with more neighbors to fewer neighbors to better utilize label correlations. A MIC-XGB classifier is trained for each label in sequence, considering the label correlations of neighboring nodes, resulting in a feature-enhanced MIC-XGB detector. The advantages are as follows:

(1): Low Complexity: Despite being a higher-order modeling approach, selective use of prediction information keeps the model simple, with minimal computational and spatial costs.
(2): High Expandability: The method performs well and can be extended to any network size for FDIA detection problems.
(3): Strong Interpretability: Based on “information theory”, both MIC feature selection and XGB classification offer a more rational explanation of predictions.
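The ordering rule above (nodes with more neighbors are detected first, and each classifier reuses the predictions of adjacent nodes that are already available) can be sketched as follows. The branch list is the IEEE 14-bus topology as commonly published and is assumed here rather than taken from the article; breaking ties by node number and keeping node 1 in the ordering are additional simplifying assumptions.

```python
# Branch list of the IEEE 14-bus test system as commonly published (assumption).
BRANCHES = [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5), (4, 7),
            (4, 9), (5, 6), (6, 11), (6, 12), (6, 13), (7, 8), (7, 9),
            (9, 10), (9, 14), (10, 11), (12, 13), (13, 14)]

def adjacency(branches):
    """Map every node to the set of nodes it is directly connected to."""
    adj = {}
    for a, b in branches:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def detection_order(adj):
    """Nodes sorted from most to fewest neighbors (ties broken by node number)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))

def enhancement_sources(adj, order):
    """For each node, the adjacent nodes whose predictions are already available."""
    done, sources = set(), {}
    for node in order:
        sources[node] = sorted(adj[node] & done)
        done.add(node)
    return sources

adj = adjacency(BRANCHES)
order = detection_order(adj)
sources = enhancement_sources(adj, order)
print(order)
print(sources[7])   # predictions that would be appended to node 7's features
```

Only adjacent, already classified nodes contribute enhancement features, so the extra input dimension for each classifier is bounded by the node's degree rather than by the total number of labels.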

However, while the enhanced MIC-XGB detector improves classification ability, it can introduce new errors. Incorrect predictions in one label classifier may mislead subsequent classifiers, potentially causing more errors. To address this, the paper suggests adjusting the XGB classifier’s classification method based on confidence thresholds before and after feature enhancement.

Firstly, the input to the MIC-XGB classifier before feature enhancement consists entirely of measurement data and does not include erroneous state prediction information. Although its predictive capability is lower, it does not mislead subsequent label detection. Therefore, the paper uses MIC-XGB as the first-layer classifier and sets a confidence threshold $α$ (with $0.5 < α < 1$) for its label prediction probability $t$. Since the critical value for distinguishing whether the current label state is contaminated is 0.5, the closer the prediction $t$ is to this critical value, the lower its reliability. Conversely, if the prediction $t$ is far from the critical value and falls outside the range of unreliable predictions delimited by $1 - α$ and $α$, the prediction for the current label state is considered reliable. The following rule is proposed:

$$\begin{cases} t < 1 - α \ \text{or} \ t > α, & \text{confident} \\ 1 - α \leq t \leq α, & \text{not confident} \end{cases} \tag{7}$$

When the confidence condition is not met, the predictions of MIC-XGB for the current label on the input sample are not considered absolutely reliable.

Since the feature-enhanced MIC-XGB can leverage the label-related information present in its feature inputs, it is used to improve the prediction capability for labels that the first-layer classifier does not predict with absolute reliability. Thus, the feature-enhanced MIC-XGB is employed as the second-layer classifier. For labels deemed not absolutely reliable by the first-layer classifier, the prediction probability $t'$ from the second-layer classifier is compared with the probability $t$ from the first-layer classifier, and the more reliable prediction is taken as the current label’s prediction probability $t_{last}$. The process is as follows:

$$t_{last} = \begin{cases} t, & |t - 0.5| \geq |t' - 0.5| \\ t', & |t - 0.5| < |t' - 0.5| \end{cases} \tag{8}$$

This method not only avoids introducing new errors but also utilizes label-related information to correct some errors. Since only the labels predicted as not absolutely reliable by the first-layer classifier are referred to the second-layer classifier [28], any errors contained in the second-layer classifier will not interfere with the detection samples that have been predicted reliably by the first layer. At the same time, incorporating label-related information enhances the prediction capability for samples predicted as not absolutely reliable, thereby improving the overall accuracy of the current label. More accurate predictions of preceding labels enable the second-layer classifier for adjacent node labels to learn fewer errors from the enhanced features, improving its classification capability and, consequently, the overall accuracy of the subsequent adjacent node labels. As detection progresses downward, this positive feedback process gradually corrects errors in subsequent labels and continuously improves the detector’s performance.

This method involves training a MIC-XGB classifier with a dual-layer confidence structure for each label in sequence, resulting in the MIC-DCXGB detector. The method flow is shown in Figure 4.
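The dual-layer confidence rule of Equations (6) to (8) reduces to a few comparisons per label. The sketch below is illustrative only: the threshold value 0.8 is a hypothetical choice, the function names are not from the paper, and keeping the first-layer output when both probabilities are equally far from 0.5 is an assumption.

```python
def is_confident(t, alpha):
    """Equation (7): reliable when t lies outside the interval [1 - alpha, alpha]."""
    return t < 1 - alpha or t > alpha

def fuse(t_first, t_second, alpha):
    """Return t_last for one label from the first- and second-layer probabilities."""
    if is_confident(t_first, alpha):
        return t_first            # the second layer is not consulted
    # Equation (8): keep the probability farther from 0.5 (ties keep t_first).
    if abs(t_first - 0.5) >= abs(t_second - 0.5):
        return t_first
    return t_second

def to_label(t):
    """Equation (6): 1 = contaminated state, 0 = normal state."""
    return 1 if t >= 0.5 else 0

ALPHA = 0.8  # hypothetical confidence threshold

print(fuse(0.95, 0.10, ALPHA))              # confident first layer is kept
print(fuse(0.55, 0.10, ALPHA))              # unreliable first layer is overridden
print(to_label(fuse(0.55, 0.10, ALPHA)))
```

In a full detector the second-layer classifier would only need to be evaluated for labels that fail the confidence check, which is why its errors cannot disturb predictions already accepted by the first layer.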

4. Results Verification

4.1. Base Learners and Meta-Learners

In Stacking ensemble learning, the base learners should first be selected based on the principle of “good and diverse” models. In this study, the initial base learners considered are Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Extra Trees (ET), XGBoost (XGB), and LightGBM (LGB). These models are tested on the dataset, and the final selection of base learners is determined by comparing and analyzing evaluation metrics, taking into account both accuracy and diversity requirements. Table 1 shows the performance metrics of each model.

From Table 1, it can be seen that among the seven models, SVM performs the worst, with an accuracy and ROC curve area both below 0.6 and an F1 score of only 0.28. This indicates that the SVM model has low stability and poor generalization capability, making it ineffective at accurately identifying FDIA. LR performs the worst among the remaining models, with all three evaluation metrics around 0.6. Both SVM and LR have ROC curve areas close to 0.5, indicating that they are prone to misclassifying normal data as attacked and attacked data as normal, which can significantly impact the normal operation of the power grid. Therefore, SVM and LR are excluded from the base learners. Differentiated learners, which observe diverse features from the training set, provide more opportunities for the meta-learner to improve and help avoid overfitting. The Q values of the remaining five learners are compared in Table 2. For binary classification problems, a larger Q value generally indicates that two learners behave more similarly, that is, they are less diverse. KNN has the smallest Q values with respect to the other learners, so it is retained. The Q values between RF, ET, XGB, and LGB are all above 0.9. Considering that RF and ET are improved algorithms based on bagging decision trees, and XGB and LGB are improved algorithms based on boosting decision trees, ET and LGB are chosen from these four based on algorithm diversity, classification performance, and training time. Therefore, the chosen base learners for the model are KNN, ET, and LGB.
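The article does not define the Q value. Assuming it is Yule's Q-statistic, a common pairwise diversity measure for classifiers, it can be computed from how often two learners are right or wrong on the same samples. The sketch below uses made-up predictions; returning 0.0 when the denominator vanishes is a convention chosen here, not something stated in the source.

```python
def q_statistic(pred_a, pred_b, truth):
    """Yule's Q for two classifiers, from their correctness on the same samples."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(pred_a, pred_b, truth):
        ok_a, ok_b = a == y, b == y
        if ok_a and ok_b:
            n11 += 1          # both correct
        elif not ok_a and not ok_b:
            n00 += 1          # both wrong
        elif ok_a:
            n10 += 1          # only the first learner is correct
        else:
            n01 += 1          # only the second learner is correct
    num = n11 * n00 - n01 * n10
    den = n11 * n00 + n01 * n10
    return num / den if den else 0.0

# Made-up labels and predictions for three hypothetical learners.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
learner_a = [1, 0, 1, 0, 0, 1, 1, 0]   # wrong on samples 3 and 5
learner_b = [1, 0, 1, 0, 0, 1, 1, 1]   # wrong on samples 3, 5 and 7
learner_c = [0, 0, 1, 1, 0, 0, 1, 1]   # wrong on samples 0 and 7

print(q_statistic(learner_a, learner_b, truth))   # errors coincide
print(q_statistic(learner_a, learner_c, truth))   # errors never coincide
```

Under this definition, a pair whose errors coincide scores near 1 and a pair whose errors fall on different samples scores near -1, so low values indicate learners that complement each other.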

In Stacking ensemble learning, the base learners handle the high-dimensional raw dataset, while the meta-learner deals with the lower-dimensional meta-dataset produced by the base learners. Therefore, a simpler learner is generally considered for the meta-learner to fit the outputs of the base learners. In this study, LR and DT are chosen as candidate models for the meta-learner. The performance of both models is compared to determine the meta-learner.

LR and DT are individually used as meta-learners in the Stacking ensemble learning strategy, combining the selected three base learners. The models are trained 10 times on the dataset, and Table 3 shows the results for average accuracy, F1 score, ROC-AUC value, and training time. It is evident that after Stacking, the three evaluation metrics for both LR and DT exceed 0.98, demonstrating the effectiveness of the Stacking ensemble strategy. Since LR outperforms DT in both evaluation metrics and training time, LR is chosen as the meta-learner.

4.2. MIC Feature Selection Analysis

The initial number of features for the 14-node system was 104 dimensions, and for the 57-node system it was 419 dimensions. After MIC feature selection, the dimensions were reduced to 73 and 207, respectively, representing reductions of 29.81% and 50.56%. This indicates that the method effectively reduced the feature dimensions, with a more noticeable effect on removing redundant features in larger node systems.

Features with larger $C_{MIC}$ values are more strongly correlated with other features, leading to overlapping information. Feature selection can eliminate these redundant features while retaining those with stronger independence. Features with MIC values of 0.50, 0.40, 0.30, 0.20, and 0.10 exhibit stronger independence and are all retained. Subsequent numerical examples are simulated using the selected feature subset as the input of the detector.
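One way to turn pairwise MIC values into a reduced feature set is a greedy redundancy filter, sketched below. The article does not spell out its exact selection procedure, so the greedy order, the cut-off of 0.5 (chosen to be consistent with features at 0.50 and below being retained), the feature names, and the MIC matrix are all assumptions for illustration.

```python
def drop_redundant(names, mic, threshold=0.5):
    """Keep a feature unless its MIC with an already kept feature exceeds the threshold."""
    kept = []
    for i in range(len(names)):
        if all(mic[i][j] <= threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]

# Hypothetical measurement features and pairwise MIC values (symmetric matrix).
names = ["P_inj_2", "Q_inj_2", "P_flow_2_3", "V_mag_2"]
mic = [
    [1.00, 0.35, 0.82, 0.10],
    [0.35, 1.00, 0.30, 0.20],
    [0.82, 0.30, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
]

selected = drop_redundant(names, mic)
print(selected)
```

Here `P_flow_2_3` is dropped because it shares most of its information with `P_inj_2`, while the remaining features are kept for all label classifiers.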

4.3. Attack Examples with Complete Information

The 14-node system has a total of 28 labels. Since the voltage magnitude and reference phase angle of the balance node remain unchanged before and after the attack, the 2 labels of balance node 1 are excluded, and the remaining 26 effective detection labels were evaluated using MIC-DCXGB on the test set. The average accuracy for each state under the various detection methods is shown in Figure 5. The classic ELM achieves better detection accuracy for some labels than the deep CNN. However, both methods exhibit issues such as high fluctuation in detection accuracy and insufficient learning capability, resulting in lower overall accuracy. In contrast, XGB shows significant advantages in localization detection, with overall accuracy improving by 1.38% and 2.58% compared to DBN-ELM and CNN, respectively. Additionally, the overall accuracy of the CNN-XGB method, which combines deep learning with ensemble learning, is 0.64% higher than that of XGB but slightly lower than that of MIC-XGB.

The proposed MIC-DCXGB method achieves an overall accuracy that is 1.14% higher than that of CNN-XGB. This indicates that the proposed method not only provides more stable and reliable classification performance for each state in the small-node system but also excels in overall accuracy.

The ROC curve of MIC-DCXGB lies closer to the top-left corner than those of the other methods, meaning that it identifies contaminated states more accurately at a lower false positive rate. Table 4 provides a comparison of various localization metrics and the Area Under the Curve values, further validating the superior performance of the proposed method. It is important to note that, because MIC-DCXGB considers label correlations, its localization rate metrics, which reflect absolute differentiation capability, are significantly higher than those of neural networks that directly map all label prediction results through fully connected layers.

4.4. Attack Examples with Partial Information

In reality, obtaining complete grid information is expensive and unrealistic [29], so this paper only uses partial information for simulation in the 57-node system. In a 57-node system with a total of 114 labels, 112 effective detection labels, excluding the 2 labels for node 1, were simulated using MIC-DCXGB on the test set. The average accuracy for each state is compared with various detection methods and shown in Figure 6.

Due to the high measurement dimensions, large data volume, and complexity, the detection difficulty in the 57-node system is greater than that in the 14-node system, leading to a general decline in detection accuracy. At this point, XGB shows slight fluctuations, while the fluctuations of DBN-ELM and CNN are more pronounced. XGB outperforms DBN-ELM and CNN by 6.57% and 6.76%, respectively. Additionally, MIC-XGB and CNN-XGB show overall accuracy that is higher than XGB by 0.72% and 0.65%, respectively. Both methods maintain similar accuracy in the 14-node and 57-node systems, indicating that the MIC feature selection method provides data processing results comparable to the feature expression capabilities of deep learning.

Moreover, the overall accuracy of MIC-DCXGB is 1.06% higher than that of MIC-XGB and 1.78% higher than that of XGB, demonstrating its superior detection performance in large node systems and good generalization capability.

MIC-DCXGB still shows superior performance in various localization detection metrics for the 57-node system, as shown in Table 5. Additionally, the feature-enhanced MIC-XGB achieves an average correction rate of 17.01% for misclassified labels in the 14-node system and 22.39% in the 57-node system. This indicates that the proposed method, which integrates grid topology relationship learning to assess label correlations, effectively locates and detects FDIA.

5. Conclusions

This paper addresses the challenge of accurately locating FDIA in power CPSs by proposing an FDIA localization and detection method based on Stacking and MIC-DCXGB. The main contributions of this paper are as follows:

After the proposed Stacking model detects the presence of FDIA, the FDIA localization problem is transformed into a multi-label binary classification problem.
This approach not only reduces the impact of feature redundancy on classifier performance and improves learning efficiency but also incorporates grid topology relationships to utilize label correlations.
This enhancement strengthens the learning of samples with less reliable classifications, resulting in more precise and reliable detection outcomes.

The availability of our method in localization detection problems has been validated through extensive simulations on IEEE-14 and 57-node systems, demonstrating its applicability across different node system topologies and attack scenarios.
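The idea of selectively passing preceding label information to later classifiers can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the paper's implementation: the function name `gated_chain_input`, the confidence threshold of 0.9, and the neutral fill value of 0.5 are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def gated_chain_input(
    features: Sequence[float],
    prev_probs: Sequence[float],
    threshold: float = 0.9,   # assumed confidence cut-off (hypothetical)
    neutral: float = 0.5,     # assumed placeholder for unreliable labels
) -> List[float]:
    """Build the input for the next label's classifier.

    Each preceding label contributes its hard prediction (0.0 or 1.0) only
    when the earlier classifier was confident; otherwise a neutral value is
    appended so that an unreliable prediction is not propagated.
    """
    gated = []
    for p in prev_probs:
        confidence = max(p, 1.0 - p)  # confidence of a binary prediction
        if confidence >= threshold:
            gated.append(1.0 if p >= 0.5 else 0.0)
        else:
            gated.append(neutral)
    return list(features) + gated


if __name__ == "__main__":
    # Two measurement features and three preceding label probabilities.
    print(gated_chain_input([0.2, 1.3], [0.97, 0.55, 0.04]))
    # prints [0.2, 1.3, 1.0, 0.5, 0.0]
```

In this sketch the second preceding label (probability 0.55) is treated as unreliable and masked, while the confident predictions 0.97 and 0.04 are passed on as 1.0 and 0.0.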

Author Contributions

T.L. conceived the idea for the manuscript, and T.X., H.Z. (Haoming Zhang), H.Z. (Hai Zhao), D.L. and Z.L. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science and Technology Project of Electric Power Research Institute of State Grid Liaoning Electric Power Supply Co., Ltd.—Research on Lightweight Identification and Protection of Malicious Service Instructions for Active Distribution System (2024YF-78).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Huang, B.; Li, Y.; Zhan, F.; Sun, Q.; Zhan, H. A Distributed Robust Economic Dispatch Strategy for Integrated Energy System Considering Cyber-Attacks. IEEE Trans. Ind. Inform. 2022, 18, 880–890.
2. Huang, B.; Zheng, S.; Wang, R.; Wang, H.; Xiao, J.; Wang, P. Distributed Optimal Control of DC Microgrid Considering Balance of Charge State. IEEE Trans. Energy Convers. 2022, 37, 2162–2174.
3. Li, Y.; Zhang, H.; Liang, X.; Huang, B. Event-triggered based distributed cooperative energy management for multienergy systems. IEEE Trans. Ind. Inform. 2019, 15, 2008–2022.
4. Liu, Z.; Huang, B.; Hu, X.; Du, P.; Sun, Q. Blockchain-Based Renewable Energy Trading Using Information Entropy Theory. IEEE Trans. Netw. Sci. Eng. 2023, 10, 1–12.
5. Li, Y.; Gao, D.W.; Gao, W.; Zhang, H.; Zhou, J. A Distributed Double-Newton Descent Algorithm for Cooperative Energy Management of Multiple Energy Bodies in Energy Internet. IEEE Trans. Ind. Inform. 2021, 17, 5993–6003.
6. Li, D.; Sun, Q.; Wang, R.; Sui, Z. Transient Stability Analysis and Enhancement of Inverter-Based Microgrid Considering Current Limitation. IEEE Trans. Power Electron. 2024, 10, 1109.
7. Wang, J.; Hui, L.C.; Yiu, S.M.; Wang, E.K.; Fang, J. A survey on cyber attacks against nonlinear state estimation in power systems of ubiquitous cities. Pervasive Mob. Comput. 2017, 39, 52–64.
8. Chaojun, G.; Jirutitijaroen, P.; Motani, M. Detecting false data injection attacks in ac state estimation. IEEE Trans. Smart Grid 2015, 6, 2476–2483.
9. Li, Y.; Gao, D.W.; Gao, W.; Zhang, H.; Zhou, J. Double-Mode Energy Management for Multi-Energy System via Distributed Dynamic Event-Triggered Newton-Raphson Algorithm. IEEE Trans. Smart Grid 2020, 11, 5339–5356.
10. Liu, Y.; Ning, P.; Reiter, M.K. False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 2011, 14, 309–341.
11. Liu, X.; Li, Z. False data attacks against ac state estimation with incomplete network information. IEEE Trans. Smart Grid 2017, 8, 2239–2248.
12. Erjian, Y. Zero Residual Identification Method for Bad Data in Power System State Estimation. Electr. Power Technol. 1981, 66–73.
13. Yong, S.Z.; Foo, M.Q.; Frazzoli, E. Robust and resilient estimation for Cyber-Physical Systems under adversarial attacks. In Proceedings of the 2016 American Control Conference (ACC), Boston, MA, USA, 6–8 July 2016; pp. 308–315.
14. Wan, W.; Kim, H.; Hovakimyan, N.; Voulgaris, P.G. Attack-resilient Estimation for Linear Discrete-time Stochastic Systems with Input and State Constraints. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 5107–5112.
15. Yang, S.; Tan, B.; Guo, J. Detection of False Data Injection Attacks in New Energy Internet Based on Double Markov Chains. Electr. Power Autom. Equip. 2021, 41, 131–137.
16. Xue, W.; Wu, T. Active Learning-Based XGBoost for Cyber Physical System Against Generic AC False Data Injection Attacks. IEEE Access 2020, 8, 144575–144584.
17. Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R.; McVean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.C. Detecting Novel Associations in Large Data Sets. Science 2011, 334, 1518–1524.
18. Hu, R.; Ma, X.; Sun, B.; Wang, M.; Cheng, W.; Zhang, H.; Ye, H.; Tang, J.; Zheng, L. Transient Safety Assessment and Its Interpretability Based on Feature Selection. Power Syst. Technol. 2023, 47, 755–762.
19. Zia, M.F.; Inayat, U.; Noor, W.; Pangracious; Benbouzid, M. Locational Detection of False Data Injection Based on Multilabel Machine Learning Attack in Smart Grid Classification Methods. In Proceedings of the 2023 IEEE IAS Global Conference on Renewable Energy and Hydrogen Technologies (GlobConHT), Male, Maldives, 11–12 March 2023; pp. 1–5.
20. Liu, Z.; Huang, B.; Li, Y.; Sun, Q.; Pedersen, T.B.; Gao, D.W. Pricing Game and Blockchain for Electricity Data Trading in Low-Carbon Smart Energy Systems. IEEE Trans. Ind. Inform. 2024, 20, 6446–6456.
21. Wei, S.; Wu, Z.; Xu, J.; Hu, Q. Multiarea Probabilistic Forecasting-Aided Interval State Estimation for FDIA Identification in Power Distribution Networks. IEEE Trans. Ind. Inform. 2024, 20, 4271–4282.
22. Zhang, M.-L.; Zhou, Z.-H. A Review on Multi-Label Learning Algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837.
23. Huang, B.; Liu, L.; Zhang, H.; Li, Y.; Sun, Q. Distributed Optimal Economic Dispatch for Microgrids Considering Communication Delays. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 1634–1642.
24. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
25. Huang, B.; Li, Y.; Zhang, H.; Sun, Q. Distributed optimal co-multi-microgrids energy management for energy internet. IEEE/CAA J. Autom. Sin. 2016, 3, 357–364.
26. Mohammadpourfard, M.; Weng, Y.; Genc, I.; Kim, T. An Accurate False Data Injection Attack (FDIA) Detection in Renewable-Rich Power Grids. In Proceedings of the 2022 10th Workshop on Modelling and Simulation of Cyber-Physical Energy Systems (MSCPES), Milan, Italy, 3 May 2022; pp. 1–5.
27. Liu, Z.; Wang, D.; Wang, J.; Wang, X.; Li, H. A Blockchain-Enabled Secure Power Trading Mechanism for Smart Grid Employing Wireless Networks. IEEE Access 2020, 8, 177745–177756.
28. Verma, B.; Rahman, A. Cluster-Oriented Ensemble Classifier: Impact of Multicluster Characterization on Ensemble Classifier Learning. IEEE Trans. Knowl. Data Eng. 2012, 24, 605–618.
29. Liu, Z.; Xu, Y.; Zhang, C.; Elahi, H.; Zhou, X. A blockchain-based trustworthy collaborative power trading scheme for 5G-enabled social internet of vehicles. Digit. Commun. Netw. 2022, 8, 976–983.

Figure 1. The behavior of FDIA.

Figure 2. Stacking-based FDIA detection method.

Figure 3. Detection path diagram for the IEEE 14-node system.

Figure 4. Flowchart.

Figure 5. Detection accuracy for each state in the IEEE 14-node system.

Figure 6. Detection accuracy for each state in the IEEE 57-node system.

Table 1. Evaluation indicators.

Model	Accuracy	F1 Value	ROC-AUC
LR	0.6202	0.5935	0.6203
KNN	0.8457	0.8176	0.8458
SVM	0.5753	0.2860	0.5755
RF	0.9979	0.9979	0.9979
ET	0.9992	0.9992	0.9992
XGB	0.9764	0.9759	0.9764
LGB	0.9949	0.9948	0.9949

Table 2. Q values of each model.

Model	KNN	RF	ET	XGB	LGB
LGB	0.6440	0.9936	0.9917	0.9994	-
XGB	0.5290	0.9890	0.9663	-	0.9994
ET	0.9086	0.9907	-	0.9663	0.9917
RF	0.5630	-	0.9907	0.9890	0.9936
KNN	-	0.5630	0.9086	0.5290	0.6440
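
The pairwise values in Table 2 are symmetric (for example, the KNN and LGB rows share 0.6440), which is consistent with a pairwise diversity measure. Assuming they are the standard Q-statistic (Yule's Q) between two classifiers, which this section does not state explicitly, a minimal sketch of the computation is given below. The function name `q_statistic` and the toy correctness vectors are illustrative, not taken from the paper.

```python
from typing import Sequence


def q_statistic(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Yule's Q for two classifiers.

    Each input marks, per sample, whether that classifier was correct.
    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), where N11 counts samples
    both got right, N00 both got wrong, and N10 / N01 only one got right.
    Values near 1 mean the classifiers err on the same samples (low diversity).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("inputs must have the same length")
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        raise ValueError("Q is undefined for these counts")
    return (n11 * n00 - n01 * n10) / denom


if __name__ == "__main__":
    a = [1, 1, 1, 0, 0, 1, 0, 1]  # toy data: 1 = classifier A correct
    b = [1, 1, 0, 0, 1, 1, 0, 1]  # toy data: 1 = classifier B correct
    print(round(q_statistic(a, b), 4))  # prints 0.7778
```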

Table 3. Evaluation indicators of each model.

Model	Accuracy	F1 Value	ROC-AUC	Time/s
LR	0.9994	0.9993	0.9994	24.1472
DT	0.9856	0.9857	0.9857	28.3963

Table 4. Comparison of localization metrics for the 14-node system.

Model	Localization	Accuracy	Precision	Recall	F1 Value	AUC Value
MIC-DCXGB	0.8688	0.9902	0.9890	0.9882	0.9886	0.9977
MIC-XGB	0.8354	0.9798	0.9839	0.9688	0.9763	0.9962
CNN-XGB	0.8333	0.9788	0.9811	0.9694	0.9752	0.9969
XGB	0.8104	0.9724	0.9878	0.9476	0.9673	0.9954
DBN-ELM	0.6042	0.9586	0.9703	0.9321	0.9508	0.9904
CNN	0.4771	0.9466	0.9591	0.9148	0.9364	0.9873
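
As a consistency check on Table 4, the F1 value can be recomputed as the harmonic mean of precision and recall. This is a sketch that assumes the tabulated F1 is related to the tabulated precision and recall in this way; the helper name `f1_from_precision_recall` is introduced here for illustration. For the three rows checked, the recomputed values round to the tabulated 0.9886, 0.9763, and 0.9673.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 4 (14-node system).
TABLE_4 = {
    "MIC-DCXGB": (0.9890, 0.9882),
    "MIC-XGB": (0.9839, 0.9688),
    "XGB": (0.9878, 0.9476),
}

if __name__ == "__main__":
    for model, (p, r) in TABLE_4.items():
        print(model, round(f1_from_precision_recall(p, r), 4))
```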

Table 5. Comparison of localization metrics for the 57-node system.

Model	Localization	Accuracy	Precision	Recall	F1 Value	AUC Value
MIC-DCXGB	0.5875	0.9785	0.9706	0.9677	0.9691	0.9950
MIC-XGB	0.5104	0.9679	0.9622	0.9450	0.9535	0.9932
CNN-XGB	0.4917	0.9672	0.9592	0.9444	0.9517	0.9923
XGB	0.5063	0.9602	0.9549	0.9298	0.9422	0.9917
DBN-ELM	0.1708	0.8926	0.8854	0.7943	0.8374	0.9367
CNN	0.2125	0.8964	0.8869	0.7990	0.8406	0.9581
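
In Tables 4 and 5 the Localization column is much lower than the Accuracy column. One reading, which is an assumption since this section does not define the column, is that Localization counts a sample as correct only when every label is predicted correctly, while Accuracy is measured per label. The sketch below contrasts the two on toy data; the function names and the data are illustrative only.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one row of 0/1 labels per sample


def label_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual labels predicted correctly."""
    total = sum(len(row) for row in y_true)
    hits = sum(
        t == p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return hits / total


def exact_match_ratio(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of samples whose whole label vector is predicted correctly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


if __name__ == "__main__":
    y_true = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0]]
    y_pred = [[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
    print(label_accuracy(y_true, y_pred))     # prints 0.875
    print(exact_match_ratio(y_true, y_pred))  # prints 0.5
```

With more labels per sample, a single wrong label is enough to fail the exact match, which would explain why such a score drops faster than per-label accuracy as the system grows.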

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).