CN115664720A

CN115664720A - A secure power information transmission method using data representation desensitization

Info

Publication number: CN115664720A
Application number: CN202211200261.9A
Authority: CN
Inventors: 李荷婷; 何平; 兴胜利; 薛劲松; 杨钰
Original assignee: Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Current assignee: Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2023-01-31

Abstract

The invention discloses a method for the secure transmission of electric power information using data representation desensitization, comprising the following steps: inputting non-sensitive input data into a variational autoencoder to obtain mean and variance data; constructing a simulated Gaussian distribution from them; randomly sampling from the Gaussian distribution to obtain representation features; dividing the representation features into a zx part and a za part; fitting the transmission information with the representation features and fitting the sensitive input data with the za part; training with a generative adversarial network to improve the independence of the zx part and the za part; inputting the zx part output after training into a regression network to obtain an individual evaluation value; and carrying out an efficiency check on the individual evaluation value to judge whether preset requirements are met. The method processes the features of the information to be transmitted with a data representation method from deep learning, so that an attacker cannot infer the key sensitive information in the original data from the processed data.

Description

A secure power information transmission method using data representation desensitization

Technical field

The invention relates to the technical field of information security, and in particular to a secure power information transmission method using data representation desensitization.

Background

With the rapid development of information technology, people rely ever more closely on computer technology. However, computer networks in operation are exposed to attack, and once a user's information security cannot be guaranteed, the user faces serious risks to privacy and property. In a power distribution network system, an attacker who steals user information can derive sensitive information such as power demand, personal details, and home address, causing potential harm. By intercepting the feedback that a user terminal sends to the system, an attacker can also send the power distributor false information about the user's power usage, causing data inconsistencies and even disorder in power distribution. Improving network information security keeps the business of an organization or institution running normally and prevents leaks of important information. An information attack means that, while information is exchanged between different network environments, an attacker uses data interception, illegal software, and similar techniques to obtain important information from the transmitted data, so that the core information of enterprises and institutions is stolen. The core of data transmission security is to encrypt data during sending, transmission, and reception, so that an attacker cannot easily obtain the content of the information.

At present, the most common way to improve information security is to encrypt the data: even if the information is seized by an attacker, the attacker cannot readily decipher its meaning, which protects the data of the institution or organization from further violation. However, attackers can break encrypted information indirectly through differential analysis, data mining, and similar means. The sensitive information of a system most needs protection; once an attacker fully obtains and understands the sensitive data of an institution or organization, greater losses follow. Even when only non-sensitive information is stolen, the data owner's sensitive information can be inferred from it with the help of other data, which poses a serious security risk. Moreover, encrypting all transmitted data takes considerable processing time, which lengthens transmission, lowers overall efficiency, and fails the efficiency requirements of power data transmission. Part of the power information data is non-sensitive, and sensitive and non-sensitive data have different security requirements. A method is therefore urgently needed that both ensures the security of data transmission and fully meets the high-efficiency, quasi-real-time requirements of power transmission.

The above background is disclosed only to assist in understanding the inventive concept and technical solution of the present invention. It does not necessarily belong to the prior art of this patent application, nor does it necessarily provide technical teaching. In the absence of clear evidence that the above content was disclosed before the filing date of this patent application, the above background shall not be used to evaluate the novelty and inventiveness of this application.

Summary of the invention

To overcome the deficiencies of the prior art, the present invention provides a secure power information transmission method using data representation desensitization. The specific technical solution is as follows:

In one aspect, a secure power information transmission method using data representation desensitization is provided, comprising the following steps:

Establishing a deep learning model for the transmission information by means of machine learning, and defining the corresponding model input data (x, a, y) and model output data $\hat{y}$: the transmission information is divided into non-sensitive input data and sensitive input data, where x is the non-sensitive input data, a is the sensitive input data, x and a are distinguished by information sensitivity categories predefined according to user privacy, y is the task label to be predicted, and $\hat{y}$ is the individual evaluation value given by the enterprise through the deep learning model;

In the deep learning model, inputting the non-sensitive input data into a variational autoencoder to obtain mean and variance data; constructing a simulated Gaussian distribution from the obtained mean and variance; randomly sampling from the Gaussian distribution to obtain representation features, denoted z; dividing the representation features into two parts, denoted the zx part and the za part; fitting the transmission information with the representation features and fitting the sensitive input data with the za part; training with a generative adversarial network to improve the independence of the zx part and the za part; inputting the zx part output after training, as the encrypted feature from which the sensitive input has been removed, into a regression network to obtain $\hat{y}$; and carrying out an efficiency check on $\hat{y}$ to judge whether preset requirements are met. If the preset requirements are not met, the deep learning model is further optimized until they are met, and the zx part output after optimization is transmitted as the encrypted information.

Further, the efficiency check comprises a usability check of the encrypted information and a check of the sensitive-information index of the data. The usability check of the encrypted information is carried out using the following formula,

[formula not reproduced in this text]

where $A_t$ is the efficiency with which the encrypted information restores individual t, $y_t$ is the individual evaluation value that the enterprise predicts from the actual information, and $\hat{y}_t$ is the individual evaluation value that the enterprise predicts from the encrypted information given by the deep learning model;

The check of the sensitive-information index of the data is carried out using the following formula,

[formula not reproduced in this text]

where the index computed by the formula measures the ability of the compressed data to eliminate sensitive attributes; the evaluation result that the enterprise makes from the encrypted information representation given by the deep learning model is $\hat{y}$; var(x) is the variance statistic of x; T is the total number of regions; k is the sensitive input type; $k_t$ denotes the value distribution of sensitive attribute k for individual t; the mean of x also enters the formula; and m is the specified number of sensitive information items.

Further, the variational autoencoder comprises a mean encoder and a variance encoder. Each is defined as a 2-layer neural network that uses the relu function as the inter-layer activation function and the sigmoid function as the final-layer activation function,

wherein the transformation formula of the mean encoder is:

$u = \mathrm{sigmoid}(W_{u2} \times \mathrm{relu}(W_{u1} \times x + b_{u1}) + b_{u2})$

and the transformation formula of the variance encoder is:

$v = \mathrm{sigmoid}(W_{v2} \times \mathrm{relu}(W_{v1} \times x + b_{v1}) + b_{v2})$

where u and v denote the mean and variance of the data, respectively, $W_{*i}$ denotes the weight of the i-th network layer that processes *, $b_{*i}$ denotes the bias of the i-th network layer that processes *, and * is u or v.

Further, the representation features are used to fit the transmission information, with the following fitting formula:

[formula not reproduced in this text]

The za part is used to fit the sensitive input data, with the following fitting formula:

[formula not reproduced in this text]

where $\hat{x}$ is the fitted transmission information, $\hat{a}$ is the fitted sensitive input data, $W_{*i}$ denotes the weight of the i-th network layer that processes *, $b_{*i}$ denotes the bias of the i-th network layer that processes *, and * is x or a.

Further, fitting the transmission information with the representation features corresponds to a neural network that outputs the restoration of the non-sensitive input information, whose loss function is calculated as follows:

[formula not reproduced in this text]

Fitting the sensitive input data with the za part corresponds to a neural network that outputs the restoration of the sensitive input information, whose loss function is calculated as follows:

[formula not reproduced in this text]

The loss function that makes the zx part and the za part independent of each other is calculated as follows:

[formula not reproduced in this text]

where p(x|z) is the decoding distribution, q(z|x) is the encoding distribution, p(z) is the prior of the representation features, p(a|za) is the decoding distribution of the sensitive information, $za_k$ is the k-th representation dimension used to predict the sensitive input, and z, zx, and za are related by z = [zx, za];

From the three loss functions above, the overall loss function for the encrypted representation is:

$\mathrm{Loss} = \mathrm{Loss}_{VAE} + \mathrm{Loss}_{a} + \lambda \times \mathrm{Loss}_{z}$

where λ is the parameter that controls the degree to which sensitive information is eliminated.

Further, if the usability of the encrypted information meets the requirement but the sensitive-information index of the data does not, λ is increased and learning optimization is run again; if the sensitive-information index of the data meets the requirement but the usability of the encrypted information does not, λ is decreased and learning optimization is run again, where λ takes values in [0, 1].

Further, the regression network is denoted f, and its loss function is as follows:

[formula not reproduced in this text]

in which the regression threshold appears as a parameter.

Further, the transmission information is communication data between the power supply company and the user; the non-sensitive input data comprises the user's electricity quantity information, electricity usage time, and account number; and the sensitive input data comprises the user's name, ID card number, gender, address, and contact information.

In another aspect, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the secure power information transmission method when executing the computer program.

In yet another aspect, a computer-readable storage medium is provided on which a computer program is stored, wherein the computer program, when executed by a processor, implements any of the secure power information transmission methods described above.

Compared with the prior art, the present invention has the following advantage: a data representation method from deep learning is used to process the features of the information to be transmitted, so that an attacker cannot infer the key sensitive information in the original data from the processed data.

Description of the drawings

Fig. 1 is a schematic diagram of the variational autoencoder data processing in the secure power information transmission method using data representation desensitization provided by an embodiment of the present invention;

Fig. 2 is a flow chart of the overall training in the secure power information transmission method using data representation desensitization provided by an embodiment of the present invention;

Fig. 3 is a flow chart of the removal of sensitive information in the secure power information transmission method using data representation desensitization provided by an embodiment of the present invention.

Detailed description

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments of the invention described here can be practiced in orders other than those illustrated or described here. Furthermore, the terms "comprising" and "having", and any variations of them, are intended to cover a non-exclusive inclusion.

Deep learning is a machine learning method that uses multi-layer neural networks for data fitting, prediction, and analysis. Representation learning is a data processing method in deep learning that encodes features: by transforming the space the data is mapped into, it uncovers latent information in the data. A variational autoencoder is an unsupervised generative model that encodes input data into a representation space and then restores the data from that space. The encoding process not only encrypts the data but also yields representation features that are more independent across columns.

The purpose of the present invention is to improve data encryption so that sensitive factors are eliminated from the information while it is encrypted. Neither the data owner nor a data attacker can then infer the data owner's sensitive information from the encrypted information, which greatly improves the security of the information system.

In one embodiment of the present invention, a secure power information transmission method using data representation desensitization is provided, comprising the following steps:

Step 1. Mathematical modeling

Taking a power distribution management system as an example, the corresponding model input data are as follows:

Non-sensitive input information x: describes the owner's information and electricity usage information. It excludes the defined identity-sensitive input information and mainly comprises data such as the user's electricity quantity information, electricity usage time, and account number;

Sensitive input information a: the sensitive information of users in the power distribution system, including the user's name, ID card number, gender, address, contact information, regional information (Northeast China, North China, South China, and so on), electricity usage type (industrial, commercial, residential, or paddy-field drainage and irrigation), user discounts, and so on;

Task label y to be predicted: this label is optional. It can be used to check the usability of the encrypted information, since encryption is meaningful only if the encrypted information can still be used normally. For example, in a real electricity usage scenario, the user's consumption for the next month is predicted from the encrypted form of the user's historical consumption data, and the prediction is compared with the user's actual future consumption to evaluate how usable the encrypted information is. The evaluation need not rely on a prediction task: the model can provide a corresponding decoder that restores the encrypted information, and the match between the data before encryption and after restoration likewise measures the usability of the encryption. This embodiment uses the label prediction process as the example for evaluating model usability. A deep learning information encryption model is judged on two main indicators. The first is the usability of the encrypted information: it must be restorable to a form the enterprise can understand, and the evaluation $\hat{y}$ that the enterprise gives an individual from the encrypted information must agree with the actual y. The second is the security of the encrypted information: it must not allow the sensitive information carried by the original data to be restored. Even where the original information is not itself sensitive, sensitive information could be inferred from it by a differential attack, which is undesirable; the encrypted information must therefore keep sensitive information confidential.
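The partition of a record into (x, a, y) described in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names, the sample record, and the label name `next_month_kwh` are hypothetical, chosen only to mirror the categories listed above.

```python
# Hypothetical field names: the text lists the categories but gives no schema.
NON_SENSITIVE_FIELDS = ("energy_kwh", "usage_hour", "account_number")
SENSITIVE_FIELDS = ("name", "id_number", "gender", "address", "contact",
                    "region", "usage_type", "discount")


def split_record(record, label_field="next_month_kwh"):
    """Partition one customer record into (x, a, y).

    x: non-sensitive inputs, a: sensitive inputs,
    y: optional task label (None when the record carries no label).
    """
    x = {key: record[key] for key in NON_SENSITIVE_FIELDS}
    a = {key: record[key] for key in SENSITIVE_FIELDS}
    y = record.get(label_field)
    return x, a, y


# Made-up sample record for illustration only.
sample = {
    "energy_kwh": 312.5, "usage_hour": 19, "account_number": "0001",
    "name": "N/A", "id_number": "N/A", "gender": "N/A", "address": "N/A",
    "contact": "N/A", "region": "East China", "usage_type": "residential",
    "discount": 0.0, "next_month_kwh": 298.0,
}
x, a, y = split_record(sample)
```

Only x is passed to the encoder; a is used during training as the fitting target for the za part, and y, when present, serves to check usability.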

Step 2. Model network construction

In the deep learning model, referring to Fig. 1 and Fig. 2, the non-sensitive input data is input into the variational autoencoder to obtain mean and variance data. The obtained mean and variance are used to construct a simulated Gaussian distribution, from which representation features, denoted z, are randomly sampled. The representation features are divided into two parts, denoted the zx part and the za part. The representation features are used to fit the transmission information, the za part is used to fit the sensitive input data, and training with a generative adversarial network improves the independence of the zx part and the za part.

Specifically, a variational autoencoder and a generative adversarial network are used to desensitize the data feature representation. The autoencoder model is built first; formulas (3)-(5) describe how data is transformed by the variational autoencoder. The variational autoencoder consists of a mean encoder and a variance encoder. Both are defined as 2-layer neural networks with the relu function as the inter-layer activation function and the sigmoid function as the final-layer activation function. The predicted mean and variance are used to construct a simulated distribution, and the final representation features are randomly sampled from that distribution.

The resulting representation is divided into two parts, zx and za. [zx+za] is used to fit the original data and za is used to fit the sensitive input data a; formulas (6)-(8) describe this adversarial learning process. Finally, the independence of zx and za is increased, and zx is output as the encrypted feature from which the sensitive input has been removed.

$u = \mathrm{sigmoid}(W_{u2} \times \mathrm{relu}(W_{u1} \times x + b_{u1}) + b_{u2})$ (1)

$v = \mathrm{sigmoid}(W_{v2} \times \mathrm{relu}(W_{v1} \times x + b_{v1}) + b_{v2})$ (2)

$zx, za = \mathrm{split}(z)$ (4)

where u and v denote the mean and variance of the data, respectively; $\hat{x}$ is the fitted transmission information; $\hat{a}$ is the fitted sensitive input data; $W_{*i}$ denotes the weight of the i-th network layer that processes *; $b_{*i}$ denotes the bias of the i-th network layer that processes *; and * is u, v, x, or a. The above operations serve to produce a desensitized representation. The variational autoencoder and the adversarial network together form the representation model, which therefore comprises the following sub-networks: a neural network that outputs the mean of the data, a neural network that outputs the variance of the data, a neural network that outputs the restoration of the non-sensitive input information, and a neural network that outputs the restoration of the sensitive input information.
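The encoding path of Step 2 (formulas (1), (2), and (4), plus the sampling step between them) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the layer sizes, the random weights, the input values, and the choice to place zx first in z are all hypothetical, and the sampling is written as an element-wise draw from a Gaussian with mean u and variance v, which is one reading of the text rather than the patent's stated formula.

```python
import math
import random


def relu(vec):
    return [max(0.0, t) for t in vec]


def sigmoid(vec):
    # Numerically stable logistic function, applied element-wise.
    out = []
    for t in vec:
        if t >= 0:
            out.append(1.0 / (1.0 + math.exp(-t)))
        else:
            e = math.exp(t)
            out.append(e / (1.0 + e))
    return out


def affine(W, vec, b):
    # W is a list of rows; computes W * vec + b.
    return [sum(w * t for w, t in zip(row, vec)) + bi for row, bi in zip(W, b)]


def encode(x, W1, b1, W2, b2):
    # Two-layer encoder of formulas (1)/(2): sigmoid(W2 * relu(W1 * x + b1) + b2).
    return sigmoid(affine(W2, relu(affine(W1, x, b1)), b2))


def sample_z(u, v, rng):
    # Assumption: v is a variance, so the standard deviation is sqrt(v).
    return [rng.gauss(mean, math.sqrt(var)) for mean, var in zip(u, v)]


def split(z, zx_dim):
    # Formula (4). Assumption: the first zx_dim entries form zx, the rest za.
    return z[:zx_dim], z[zx_dim:]


def random_layer(rows, cols, rng):
    # Hypothetical untrained weights, used here only to exercise the shapes.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    return W, b


rng = random.Random(0)
x = [0.2, 0.7, 0.1]                    # three made-up non-sensitive features
W_u1, b_u1 = random_layer(4, 3, rng)   # mean encoder, layer 1
W_u2, b_u2 = random_layer(5, 4, rng)   # mean encoder, layer 2
W_v1, b_v1 = random_layer(4, 3, rng)   # variance encoder, layer 1
W_v2, b_v2 = random_layer(5, 4, rng)   # variance encoder, layer 2

u = encode(x, W_u1, b_u1, W_u2, b_u2)
v = encode(x, W_v1, b_v1, W_v2, b_v2)
z = sample_z(u, v, rng)
zx, za = split(z, zx_dim=3)
```

Because both encoders end in a sigmoid, every entry of u and v lies between 0 and 1, so v can be used directly as a variance without a further positivity constraint.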

Step 3. Model optimization

The zx part of the final output, as the encrypted feature from which the sensitive input has been removed, is input into a regression network to obtain a predicted electricity demand value. An efficiency check is carried out on the predicted electricity demand value to judge whether preset requirements are met. If they are not met, the deep learning model is further optimized until they are.

Specifically, model optimization includes the following:

In the variational autoencoder, each feature column is sampled from a Gaussian distribution generated by its own predicted mean and variance. Constraining the relationship between the generated Gaussian distribution and the standard Gaussian distribution makes the representation features more independent of one another.
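One common way to constrain a generated Gaussian toward the standard Gaussian is the closed-form Kullback-Leibler divergence used in standard variational autoencoders. The sketch below shows that standard term; it is an assumption that the patent's constraint takes this form, since its exact loss formula is not reproduced in this text. The function name is hypothetical.

```python
import math


def kl_to_standard_normal(u, v):
    """KL( N(u, diag(v)) || N(0, I) ), summed over representation dimensions.

    u: per-dimension means, v: per-dimension variances (each must be > 0).
    """
    return 0.5 * sum(vi + ui * ui - 1.0 - math.log(vi) for ui, vi in zip(u, v))


# The divergence vanishes when the generated distribution is already standard,
# and grows as the mean moves away from 0 or the variance away from 1.
kl_standard = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])
kl_shifted = kl_to_standard_normal([1.0], [1.0])
```

Each dimension contributes its own term to the sum, which matches the description of every feature column having its own predicted mean and variance.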

Fitting the transmission information with the representation features corresponds to a neural network that outputs the restoration of the non-sensitive input information. A simple mean square error loss function constrains the information carried in the representation; the loss function of this network is calculated as follows:

[formula not reproduced in this text]

Fitting the sensitive input data with the za part corresponds to a neural network that outputs the restoration of the sensitive input information. A simple mean square error loss function constrains the information carried in the za part of the representation; the loss function of this network is calculated as follows:

[formula not reproduced in this text]

$\mathrm{Loss} = \mathrm{Loss}_{VAE} + \mathrm{Loss}_{a} + \lambda \times \mathrm{Loss}_{z}$

where λ is the parameter that controls the degree to which the representation eliminates input information, with values in [0, 1]. The larger λ is, the more strongly desensitization is optimized. As the representation becomes more desensitized, it contains less useful information, so choosing an appropriate λ as the balancing parameter of the optimization is important.
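The combination of the three loss terms can be sketched as below. The text describes the two reconstruction losses as mean square errors, so `mse` illustrates that piece; the individual terms Loss_VAE, Loss_a, and Loss_z are taken as already-computed numbers because their exact formulas are not reproduced here. The function names and the range check are illustrative assumptions.

```python
def mse(pred, target):
    """Mean square error between a fitted vector and its target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def total_loss(loss_vae, loss_a, loss_z, lam):
    """Loss = Loss_VAE + Loss_a + lambda * Loss_z, with lambda in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return loss_vae + loss_a + lam * loss_z


# Made-up numbers: a larger lambda gives the independence term more weight.
reconstruction_error = mse([1.0, 2.0], [1.0, 4.0])
low_weight = total_loss(1.0, 0.5, 2.0, lam=0.25)
high_weight = total_loss(1.0, 0.5, 2.0, lam=1.0)
```

With the other terms fixed, raising λ increases the share of the total loss owed to the zx/za independence term, which is the trade-off between desensitization and retained information described above.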

From the final representation zx, a regression network f outputs the individual evaluation value, namely $\hat{y} = f(zx)$.

The loss function of the regression network f is as follows:

[formula not reproduced in this text]

in which the regression threshold appears as a parameter.

An efficiency check is carried out on the obtained predicted electricity demand value to judge whether preset requirements are met. The efficiency check comprises a usability check of the encrypted information and a check of the sensitive-information index of the data,

wherein the usability check of the encrypted information is carried out using the following formula,

[formula not reproduced in this text]

where $A_t$ is the efficiency with which the encrypted information restores individual t; the larger its value, the lower the information loss rate of the encrypted information and the more efficient the encrypted transmission. $\hat{y}_t$ is the individual evaluation value that the enterprise predicts from the encrypted information given by the deep learning model, and $y_t$ is the individual evaluation value that the enterprise predicts from the actual information.

In the formula for the sensitive-information index of the data, the index measures the ability of the compressed data to eliminate sensitive attributes, and the larger its value, the less sensitive information the compressed information contains; the evaluation result that the enterprise makes from the encrypted information representation given by the deep learning model is $\hat{y}$; var(x) is the variance statistic of x; T is the total number of individuals; k is the sensitive input type; $k_t$ denotes the value distribution of sensitive attribute k for individual t; the mean of x also enters the formula; and m is the specified number of sensitive information items.

If the usability of the encrypted information meets the requirement but the sensitive-information index of the data does not, λ is increased and learning optimization is run again. If the sensitive-information index of the data meets the requirement but the usability of the encrypted information does not, λ is decreased and learning optimization is run again. This adjustment is repeated until the usability requirement and the sensitive-information index requirement are both met.
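The adjustment rule for λ can be sketched as a small decision function. This is a hedged illustration: the step size, the clamping to [0, 1], the function name, and the behavior when both checks fail are assumptions, since the text specifies only the direction of the adjustment in the two mixed cases.

```python
def adjust_lambda(lam, usable, desensitized, step=0.1):
    """Return (next_lambda, done) for one round of the tuning loop.

    usable:       the encrypted-information usability check passed
    desensitized: the sensitive-information index check passed
    """
    if usable and desensitized:
        return lam, True                       # both requirements met: stop
    if usable:
        return min(1.0, lam + step), False     # index fails: raise lambda
    if desensitized:
        return max(0.0, lam - step), False     # usability fails: lower lambda
    # Both checks fail: the text gives no rule, so lambda is kept (assumption).
    return lam, False


raised = adjust_lambda(0.5, usable=True, desensitized=False, step=0.25)
lowered = adjust_lambda(0.5, usable=False, desensitized=True, step=0.25)
finished = adjust_lambda(0.3, usable=True, desensitized=True)
```

In a full training loop, each call would be followed by retraining the representation model with the returned λ and re-running both checks until `done` is true.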

In the embodiment of the present invention, the main work of step one is to model the problem mathematically. The main work of step two is to build the network frameworks of the model: the representation is output by the variational autoencoder, the information in the representation is separated by strengthening the independence between its different parts, and the information in the representation is retained by the decoding network. The main work of step three is to optimize the different neural networks iteratively, each with its own optimization objective. In a preferred embodiment of the present invention, referring to Fig. 3, the network modules are initialized after the mathematical modeling, and the representation model is then trained, that is, the variational autoencoder and the adversarial network are trained, with λ adjusted to reach the expected information-masking ability. The regression model is then trained, and λ is adjusted and the representation model retrained so that the regression model meets the expected accuracy requirement. Finally, the representation model and the regression model are output, and the model input data is passed through the representation model and then the regression model to obtain the predicted electricity demand.
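The inference path described above (input data, then representation model, then regression model) can be sketched in plain Python. The two-layer encoder form, sigmoid(W2 × relu(W1 × x + b1) + b2), follows the conversion formulas in claim 3. Everything else is an assumption made for illustration: the layer sizes, the untrained random weights, the position of the zx/za split, and the single linear layer standing in for the regression network f.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def linear(W, b, v):
    """Return W v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (untrained placeholders)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def encode(layers, x):
    """Two-layer encoder: sigmoid(W2 * relu(W1 * x + b1) + b2)."""
    (W1, b1), (W2, b2) = layers
    return sigmoid(linear(W2, b2, relu(linear(W1, b1, x))))


def represent(mean_enc, var_enc, x, rng, zx_dim):
    """Sample z from N(u, v) element-wise and split it into (zx, za)."""
    u = encode(mean_enc, x)   # mean
    v = encode(var_enc, x)    # variance, in (0, 1) because of the sigmoid
    z = [rng.gauss(ui, math.sqrt(vi)) for ui, vi in zip(u, v)]
    return z[:zx_dim], z[zx_dim:]


rng = random.Random(0)
x_dim, hidden, z_dim, zx_dim = 3, 8, 6, 4   # illustrative sizes only
mean_enc = [init_layer(rng, hidden, x_dim), init_layer(rng, z_dim, hidden)]
var_enc = [init_layer(rng, hidden, x_dim), init_layer(rng, z_dim, hidden)]
reg_W, reg_b = init_layer(rng, 1, zx_dim)   # hypothetical one-layer regression network f

x = [0.2, 0.7, 0.1]                         # made-up non-sensitive input
zx, za = represent(mean_enc, var_enc, x, rng, zx_dim)
y_hat = linear(reg_W, reg_b, zx)[0]         # only zx feeds the regression network
```

Only the zx part is passed on to the regression network, which mirrors the requirement that the za part, used to fit the sensitive input, does not enter the transmission link.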

An embodiment of the present invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above secure power information transmission method when executing the computer program. The concept of this device embodiment is the same as the working process of the detection method in the above embodiments; the entire content of the above detection method embodiments is incorporated into this device embodiment by reference and is not repeated here.

An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above secure power information transmission method. The concept of this storage medium embodiment is the same as the working process of the detection method in the above embodiments; the entire content of the above detection method embodiments is incorporated into this storage medium embodiment by reference and is not repeated here.

The secure power information transmission method provided by the present invention uses representation learning from deep learning to remove sensitive information from the transmitted data, thereby achieving secure transmission of information and preventing differential attacks. The present invention represents the data with a variational autoencoder and a generative adversarial network and encrypts the information with a deep learning model, which strengthens the protection of sensitive information and prevents more serious information violations.

The secure power information transmission method provided by the present invention uses a variational autoencoder and a generative adversarial network to reduce the correlation between the sensitive information and the transmitted information and to separate the information in the transmitted data, so that the sensitive information in the data does not enter the transmission link and, in addition, the sensitive information cannot be inferred from the non-sensitive information being transmitted.

The above are only preferred embodiments of the present invention and do not thereby limit the scope of its patent. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims

1. A secure power information transmission method using data representation desensitization, characterized by comprising the following steps:

establishing a deep learning model for the transmission information by means of machine learning, and defining the corresponding model input data (x, a, y) and model output data $\hat{y}$, wherein the transmission information is divided into non-sensitive input data and sensitive input data, x is the non-sensitive input data, a is the sensitive input data, x and a are distinguished according to the information sensitivity categories predefined for user privacy, y is the task label to be predicted, and $\hat{y}$ is the individual evaluation value given by the enterprise through the deep learning model;

in the deep learning model, inputting the non-sensitive input data into a variational autoencoder for conversion to obtain mean and variance data, constructing a simulated Gaussian distribution from the obtained mean and variance, randomly sampling from the Gaussian distribution to obtain representation features, denoted z, dividing the representation features into two parts, denoted the zx part and the za part respectively, fitting the transmission information by using the representation features, fitting the sensitive input data by using the za part, training by means of a generative adversarial network to improve the independence of the zx part and the za part, and inputting the zx part output after training, as an encrypted feature with the sensitive input removed, into a regression network to obtain $\hat{y}$;

carrying out an efficiency investigation on $\hat{y}$ to judge whether the preset requirements are met, and if the preset requirements are not met, further optimizing the deep learning model until the preset requirements are met, and transmitting the optimized zx part as the encrypted information.

2. The secure power information transmission method according to claim 1, wherein the efficiency investigation comprises an encrypted-information availability investigation and a data-contains-sensitive-information indicator investigation, wherein the encrypted-information availability investigation is performed by using the following formula,

in the formula, $A_t$ is the efficiency with which the encrypted information recovers individual t, $y_t$ is the individual evaluation value predicted by the enterprise from the actual information, and $\hat{y}_t$ is the individual evaluation value predicted by the enterprise from the encrypted information through the deep learning model;

the data-contains-sensitive-information indicator investigation is performed by using the following formula,

in the formula,

is an indicator of the ability of the compressed data to eliminate sensitive attributes;

is the evaluation result made by the enterprise from the encrypted information representation given by the deep learning model,

var(x) is the variance of x; T is the total number of individuals; k is the sensitive input type; $k_t$ is the value distribution of sensitive attribute k for individual t; $\bar{x}$ is the mean of x; and m is the specified number of sensitive information items.

3. The secure power information transmission method according to claim 1, wherein the variational autoencoder comprises a mean encoder and a variance encoder, each of which is defined as a 2-layer neural network that uses the relu function as the interlayer activation function and the sigmoid function as the final-layer activation function,

the conversion formula corresponding to the mean encoder is as follows:

$$u = \mathrm{sigmoid}\left(W_{u2} \times \mathrm{relu}\left(W_{u1} \times x + b_{u1}\right) + b_{u2}\right)$$

the conversion formula corresponding to the variance encoder is as follows:

$$v = \mathrm{sigmoid}\left(W_{v2} \times \mathrm{relu}\left(W_{v1} \times x + b_{v1}\right) + b_{v2}\right)$$

in the formula, u and v represent the mean and the variance of the data, $W_{*i}$ represents the weight of the i-th network layer of process *, $b_{*i}$ represents the bias of the i-th network layer of process *, and * is u or v.

4. The secure power information transmission method according to claim 2, wherein

the transmission information is fitted by using the representation features, and the corresponding fitting formula is as follows:

the sensitive input data is fitted by using the za part, and the corresponding fitting formula is as follows:

in the formula,

is the fitted transmission information,

is the fitted sensitive input data, $W_{*i}$ represents the weight of the i-th network layer of process *, $b_{*i}$ represents the bias of the i-th network layer of process *, and * is x or a.

5. The secure power information transmission method according to claim 4, wherein fitting the transmission information by using the representation features corresponds to a neural network that restores and outputs the non-sensitive input information, and the loss function of this neural network is calculated as follows:

fitting the sensitive input data by using the za part corresponds to a neural network that restores and outputs the sensitive input information, and the loss function of this neural network is calculated as follows:

the loss function that makes the zx part and the za part independent of each other is calculated as follows:

where p(x|z) is the decoding distribution, q(z|x) is the encoding distribution, p(z) is the representation prior, p(a|za) is the sensitive-information decoding distribution, $za_k$ is the k-th dimension of the representation used to predict the sensitive input, and z, zx and za are related by z = [zx, za];

from the three loss functions, the overall loss function for obtaining the encrypted representation is as follows:

$$Loss = Loss_{VAE} + Loss_{a} + \lambda \times Loss_{z}$$

in the formula, λ is a parameter that controls the degree to which the sensitive information is eliminated.

6. The secure power information transmission method according to claim 5, wherein if the encrypted-information availability meets the requirement but the data-contains-sensitive-information indicator does not, the value of λ is increased and learning optimization is performed again, and if the data-contains-sensitive-information indicator meets the requirement but the encrypted-information availability does not, the value of λ is decreased and learning optimization is performed again, wherein the value range of λ is [0, 1].

7. The method according to claim 1, wherein the regression network is denoted as f, and the corresponding loss function is as follows:

wherein,

is the regression threshold.

8. The secure power information transmission method according to claim 1, wherein the transmission information is communication data between a power supply company and a user, the non-sensitive input data comprises the user's power consumption information, power usage time and user number, and the sensitive input data comprises the user's name, identity card number, gender, address and contact information.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for secure transmission of power information according to any one of claims 1 to 8 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the power information secure transmission method according to any one of claims 1 to 8.