CN118070040A - Steel mill data acquisition method and device, electronic equipment and storage medium

Info

Publication number: CN118070040A
Application number: CN202410218789.1A
Authority: CN
Inventors: 童俊; 周克; 孔大明
Original assignee: CISDI Engineering Co Ltd
Current assignee: CISDI Engineering Co Ltd
Priority date: 2024-02-28
Filing date: 2024-02-28
Publication date: 2024-05-24

Abstract

The invention relates to a steel mill data acquisition method and device, electronic equipment, and a storage medium. The method comprises: acquiring time series data of the current production process of a steel mill; inputting the time series data into a feature extraction model to obtain nonlinear data features, the feature extraction model being obtained by training a pre-constructed extraction model with sample data; determining a dynamic threshold based on a dynamic density estimation value of the nonlinear data features; and adjusting the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold, so that data of the steel mill are acquired at the adjusted frequency. Because the acquisition frequency follows this comparison result, the invention solves the technical problem of over-acquisition or under-acquisition caused by acquiring steel mill data at a fixed frequency.

Description

Steel mill data acquisition method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the technical field of data analysis, and in particular to a steel mill data acquisition method and device, electronic equipment, and a storage medium.

Background Art

With the rapid development of Industry 4.0 and intelligent manufacturing, steel mills, as an important link in basic industrial production, have a growing need for real-time data acquisition and accurate analysis. The steelmaking process involves multiple complex steps, such as smelting, continuous casting and rolling, and each step generates a large amount of data. These data not only reflect the state of the production process but are also closely related to key indicators such as equipment health, production efficiency and product quality.

Related data acquisition and analysis methods are usually based on a fixed acquisition frequency and predefined models, which limits the depth and breadth of data analysis and leads to the following technical problems: (1) because the production environment of a steel mill is highly dynamic and uncertain, acquiring steel mill data at a fixed frequency results in under-acquisition when the data change sharply and over-acquisition when the data change slowly, which both wastes resources and risks missing key information; (2) it is difficult to capture deep patterns and features in steel mill data, especially features at different time scales; (3) because the steelmaking process is complex, the data often contain a large amount of noise and interference, and effective methods for identifying and handling the interfering part of the data may be lacking, so the analysis results are disturbed and their accuracy is low.

Summary of the Invention

In view of the above shortcomings of the prior art, the present application provides a steel mill data acquisition method and device, electronic equipment, and a storage medium to solve the above technical problems.

The present invention provides a steel mill data acquisition method, which comprises: acquiring time series data of the current production process of a steel mill; inputting the time series data into a feature extraction model to obtain nonlinear data features, wherein the feature extraction model is obtained by training a pre-constructed extraction model with sample data, the pre-constructed extraction model comprises a time series analysis layer for capturing structural features, a self-attention mechanism layer for identifying abnormal structural features, and a nonlinear transformation layer for extracting nonlinear data features, and the sample data comprise all or part of the time series data of the historical production process of the steel mill; determining a dynamic threshold based on a dynamic density estimation value of the nonlinear data features; and adjusting the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold, so as to acquire data from the steel mill at the adjusted data acquisition frequency.

In one embodiment of the present invention, after data are acquired from the steel mill at the adjusted data acquisition frequency, the steel mill data acquisition method further comprises: constructing a multi-level data vector field based on the acquired steel mill data, the multi-level data vector field being used to extract data features at different levels; inputting the multi-level data vector field into a preset interference identification function to obtain interference data; and removing the interference data from the steel mill data to obtain clean steel mill data.

In one embodiment of the present invention, the process of adjusting the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold comprises: obtaining the current data acquisition frequency of the time series data;

if the dynamic density estimation value is less than or equal to the dynamic threshold, determining that the data feature corresponding to the dynamic density estimation value is a normal data feature, and adjusting the current data acquisition frequency to a first data acquisition frequency that is lower than the current data acquisition frequency; if the dynamic density estimation value is greater than the dynamic threshold, determining that the data feature corresponding to the dynamic density estimation value is an abnormal data feature, and adjusting the current data acquisition frequency to a second data acquisition frequency that is higher than the current data acquisition frequency.
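The comparison rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the source states that the first frequency is lower and the second frequency is higher than the current one, but it does not say by how much, so the multiplicative factors `slow_factor` and `fast_factor` and the function name `adjust_frequency` are assumptions introduced here.

```python
def adjust_frequency(density, threshold, current_hz,
                     slow_factor=0.5, fast_factor=2.0):
    """Return (new_frequency_hz, label) for one comparison step.

    slow_factor and fast_factor are hypothetical tuning parameters;
    the source only requires new < current (normal) or new > current (abnormal).
    """
    if not 0.0 < slow_factor < 1.0:
        raise ValueError("slow_factor must lie in (0, 1)")
    if fast_factor <= 1.0:
        raise ValueError("fast_factor must be greater than 1")
    if density <= threshold:
        # Normal data feature: acquire less often.
        return current_hz * slow_factor, "normal"
    # Abnormal data feature: acquire more often.
    return current_hz * fast_factor, "abnormal"


print(adjust_frequency(0.2, 0.5, 10.0))
print(adjust_frequency(0.9, 0.5, 10.0))
```

A practical deployment would probably also bound the frequency between a minimum and a maximum supported by the sensors; that bound is left out here because the source does not mention it.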

In one embodiment of the present invention, if the pre-constructed extraction model further comprises a multi-scale analysis layer for extracting data features at different scales, the process of training the pre-constructed extraction model with the sample data to obtain the feature extraction model comprises: inputting the sample data into the time series analysis layer to obtain structural features; inputting the structural features into the self-attention mechanism layer to obtain abnormal structural features; inputting the abnormal structural features into the nonlinear transformation layer to obtain nonlinear data features; inputting the nonlinear data features into the multi-scale analysis layer to obtain data features at different scales; constructing a loss function based on the data features at different scales and the time series data, and, with the goal of minimizing the loss function, updating the parameters of the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer and the multi-scale analysis layer; and combining the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer and the multi-scale analysis layer with their updated parameters to obtain the feature extraction model.

In one embodiment of the present invention, the symbols used in the expressions of the model layers, the loss function, the dynamic density estimation value and the dynamic threshold are defined as follows.

In the expression of the time series analysis layer, ts(x) represents the structural features of the time series data, x represents the time series data, sigm represents the activation function, W_t represents the weight, b_t represents the bias, β represents the weight coefficient of the Gaussian integral, and T represents transposition.

In the expression of the self-attention mechanism layer, Q, K and V represent the query matrix, key matrix and value matrix of the self-attention mechanism layer respectively, a(ts) represents the abnormal structural features, T represents transposition, d_k represents the dimension of the keys, and ts represents the structural features.

In the expression of the nonlinear transformation layer, f(a) represents the nonlinear data features, W_f represents the weight matrix of the nonlinear transformation layer, b_f represents the bias vector of the nonlinear transformation layer, μ represents the regularization coefficient, a represents the weight coefficient of the sine function, W_r represents the regularization weight matrix, and a represents the abnormal structural features.

In the expression of the multi-scale analysis layer, m(f) represents the data features at different scales, W_i represents the weight matrix of the i-th scale, σ_i represents the standard deviation of the i-th scale, and f represents the nonlinear data features.

In the expression of the loss function, L represents the loss function, μ₁ and μ₂ are both regularization coefficients, h_i represents the state of the neuron at the i-th scale, F denotes the Frobenius norm, W_i represents the weight matrix of the i-th scale, and x represents the time series data.

In the calculation formula of the dynamic density estimation value, d_t(f) represents the dynamic density estimation value of the nonlinear data feature f at time t, N represents the number of nonlinear data features, K represents the space-time kernel function, which measures the similarity between nonlinear data features in space and time, u represents the spatial distance between nonlinear data features, ω represents the time decay function, and Δt represents the time difference between any time t₀ of a nonlinear data feature and the current time t.

In the calculation formula of the dynamic threshold, Φ represents the dynamic threshold and is computed from the mean of the dynamic density estimation values d_t of the nonlinear data features, θ represents a constant, and σ represents the standard deviation of the Gaussian kernel function.
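The symbols listed for the self-attention mechanism layer (Q, K, V, the transpose T and the key dimension d_k) match those of standard scaled dot-product attention, softmax(QKᵀ/√d_k)V. The sketch below implements that standard form in plain Python as an illustration only; the source does not reproduce the layer's expression, so treating it as standard scaled dot-product attention is an assumption, as is the idea that Q, K and V have already been projected from the structural features ts.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(Q K^T / sqrt(d_k)) V (assumed form)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: two time steps, two-dimensional keys and values.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, weights = scaled_dot_product_attention(q, k, v)
print(weights)
print(out)
```

Each row of `weights` sums to one, so every output row is a weighted average of the value rows, with more weight on the time steps whose keys best match the query.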

In one embodiment of the present invention, the process of constructing the multi-level data vector field based on the acquired steel mill data comprises: performing a dimensional transformation on the steel mill data to obtain a data point set; computing a data persistence diagram based on the data point set; and computing the multi-level data vector field based on the data persistence diagram.

In one embodiment of the present invention, the expression of the data point set is $P(t) = M(U) = \exp(-k_1 t^2)\,U \sin(k_2 t) + k_3 t U$, where k₁ represents the time decay factor, k₂ represents the periodic adjustment factor, t represents time, k₃ represents the linear time influence factor, U represents the steel mill data, M(U) represents the mapping function that performs the dimensional transformation of the steel mill data, and P(t) represents the data point set.

In the expression of the data persistence diagram, λ represents the decay factor, P(t) represents the data point set, D(t) represents the data persistence diagram, and t represents time.

In the expression of the multi-level data vector field, V_k(t) represents the k-th layer data vector field at time t, ∇ represents the gradient operator, γ represents the weight parameter, and D_k(t′) represents the data persistence diagram of the k-th layer at time t′, with t′ ∈ [0, t].

In the expression of the preset interference identification function, Idf represents the preset interference identification function, θ represents the threshold function, V_k(t) represents the k-th layer data vector field at time t and is compared against its average vector, and I_k(t) represents the interference data identified in the k-th layer data vector field at time t.

In the expression of the clean steel mill data, y(t) represents the clean steel mill data at time t, w_kj represents the weight of the j-th interference vector of the k-th layer, I_kj represents the j-th interference vector identified in the k-th layer at time t, n_k represents the number of interference vectors in the k-th layer, K represents the total number of layers of the data vector field, δ represents the adjustment factor, U(t) represents the steel mill data at time t, and I_k(t) represents the combination of the vectors I_kj identified in the k-th layer at time t.
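The data point set mapping P(t) = exp(-k₁t²)·U·sin(k₂t) + k₃tU is the one expression in this passage that survives in full, so it can be evaluated directly. The sketch below is illustrative: the function name `data_point_set`, the treatment of U as a single scalar reading, and the sample parameter values are assumptions, since the source does not give concrete values for k₁, k₂ or k₃.

```python
import math


def data_point_set(u, t, k1, k2, k3):
    """Evaluate P(t) = exp(-k1 * t^2) * U * sin(k2 * t) + k3 * t * U.

    u  : one steel mill reading U (scalar, assumed for simplicity)
    t  : time
    k1 : time decay factor
    k2 : periodic adjustment factor
    k3 : linear time influence factor
    """
    damped_periodic = math.exp(-k1 * t ** 2) * u * math.sin(k2 * t)
    linear_drift = k3 * t * u
    return damped_periodic + linear_drift


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
readings = [2.0, 2.0, 2.0]
times = [0.0, 1.0, 2.0]
points = [data_point_set(u, t, k1=0.0, k2=math.pi / 2, k3=0.5)
          for u, t in zip(readings, times)]
print(points)
```

With k₁ = 0 the exponential factor is 1, so the first term reduces to U·sin(k₂t) and the second term grows linearly in t; a positive k₁ would progressively suppress the periodic term at later times.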

According to one aspect of the embodiments of the present invention, a steel mill data acquisition device is provided, which comprises: a data acquisition module for acquiring time series data of the current production process of a steel mill; a feature extraction module for inputting the time series data into a feature extraction model to obtain nonlinear data features, wherein the feature extraction model is obtained by training a pre-constructed extraction model with sample data, the pre-constructed extraction model comprises a time series analysis layer for capturing structural features, a self-attention mechanism layer for identifying abnormal structural features, and a nonlinear transformation layer for extracting nonlinear data features, and the sample data comprise all or part of the time series data of the historical production process of the steel mill; a threshold determination module for determining a dynamic threshold based on a dynamic density estimation value of the nonlinear data features; and a frequency determination module for adjusting the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold, so as to acquire data from the steel mill at the adjusted data acquisition frequency.

According to one aspect of the embodiments of the present invention, an electronic device is provided, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the steel mill data acquisition method described above.

According to one aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor of a computer, the computer executes the steel mill data acquisition method described above.

Beneficial effects of the present invention: the present invention acquires time series data of the current production process of a steel mill, inputs the time series data into a feature extraction model obtained by training a pre-constructed extraction model with sample data to obtain nonlinear data features, determines a dynamic threshold based on a dynamic density estimation value of the nonlinear data features, and adjusts the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold, so that data are acquired from the steel mill at the adjusted frequency. Because the acquisition frequency is adjusted according to this comparison result, the technical problem of over-acquisition or under-acquisition caused by acquiring steel mill data at a fixed frequency is solved.

In addition, a feature extraction model that combines the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer, an orthogonality constraint and the multi-scale analysis layer can capture the complex patterns and features of time series data in depth, in particular features at different time scales, and can take the time dependence of the data into account. It is therefore better suited to a highly time-dependent data environment such as a steel mill, and can analyze and process time series data more accurately. Based on the topological structure of the data and the multi-level data vector field, the interfering part of the steel mill data can be accurately identified, which improves both the accuracy and the efficiency of data acquisition.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.

Brief Description of the Drawings

The drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present application, and together with the specification serve to explain the principles of the present application. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic diagram of an exemplary system architecture according to an exemplary embodiment of the present application;

FIG. 2 is a flow chart of a steel mill data acquisition method according to an exemplary embodiment of the present application;

FIG. 3 is a flow chart of a steel mill data acquisition method according to another exemplary embodiment of the present application;

FIG. 4 is a structural diagram of a feature extraction model according to another exemplary embodiment of the present application;

FIG. 5 is a block diagram of a steel mill data acquisition device according to an exemplary embodiment of the present application;

FIG. 6 is a schematic structural diagram of a computer system of an electronic device suitable for implementing the embodiments of the present application.

Detailed Description

Embodiments of the present invention are described below with reference to the accompanying drawings and preferred embodiments. Those skilled in the art can readily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be understood that the preferred embodiments only illustrate the present invention and do not limit its scope of protection.

It should be noted that the illustrations provided in the following embodiments only schematically explain the basic concept of the present invention. The drawings therefore show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation. In an actual implementation, the form, quantity and proportion of each component may vary freely, and the component layout may be more complex.

In the following description, numerous details are discussed to provide a more thorough explanation of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the embodiments can be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail, to avoid obscuring the embodiments of the present invention.

FIG. 1 is a schematic diagram of an exemplary system architecture according to an exemplary embodiment of the present application.

As shown in FIG. 1, the system architecture may include an acquisition device 101 and a computer device 102. The computer device 102 may be at least one of a desktop graphics processing unit (GPU) computer, a GPU computing cluster, a neural network computer, and the like. A technician may use the computer device 102 to obtain time series data of the current production process of the steel mill, input the time series data into a feature extraction model obtained by training a pre-constructed extraction model with sample data to obtain nonlinear data features, determine a dynamic threshold based on the dynamic density estimation value of the nonlinear data features, and adjust the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold, so as to acquire data from the steel mill at the adjusted frequency. The acquisition device 101 is used to acquire the time series data. In this embodiment, the acquisition device 101 uses sensors and similar equipment to acquire the time series data and provides the data to the computer device 102 for processing.

Illustratively, after obtaining the time series data from the acquisition device 101, the computer device 102 inputs the time series data into the feature extraction model, which is obtained by training a pre-constructed extraction model with sample data, to obtain nonlinear data features; determines a dynamic threshold based on the dynamic density estimation value of the nonlinear data features; and adjusts the data acquisition frequency according to the result of comparing the dynamic density estimation value with the dynamic threshold, so as to acquire data from the steel mill at the adjusted frequency. Because the acquisition frequency is adjusted according to this comparison result, the technical problem of over-acquisition or under-acquisition caused by acquiring steel mill data at a fixed frequency is solved.

It should be noted that the steel mill data acquisition method provided in the embodiments of the present application is generally executed by the computer device 102; accordingly, the steel mill data acquisition device is generally arranged in the computer device 102.

The implementation details of the technical solution of the embodiments of the present application are described below:

FIG. 2 is a flow chart of a steel mill data acquisition method according to an exemplary embodiment of the present application. The method may be executed by a computing device, which may be the computer device 102 shown in FIG. 1. Referring to FIG. 2, the steel mill data acquisition method includes at least steps S210 to S240, described in detail as follows:

In step S210, time series data of the current production process of the steel mill are acquired.

In one embodiment of the present application, the time series data include parameter variation data from the current production process of the steel mill, for example temperature over time, chemical composition over time, pressure over time, and material flow over time.

In step S220, the time series data are input into the feature extraction model to obtain nonlinear data features.

In this embodiment, the feature extraction model is obtained by training a pre-constructed extraction model with sample data. The pre-constructed extraction model includes a time series analysis layer for capturing structural features, a self-attention mechanism layer for identifying abnormal structural features, and a nonlinear transformation layer for extracting nonlinear data features. The sample data include all or part of the time series data of the historical production process of the steel mill.

In this embodiment, the structural features of the time series data include stationary structural features and non-stationary structural features, and the abnormal structural features (or key structural features) include the non-stationary structural features and the like.

In this embodiment, the feature extraction module extracts the nonlinear data features of the time series data and thereby captures the intrinsic structural features of the steel mill data, which helps reduce the dimensionality or complexity of the steel mill data.

In step S230, a dynamic threshold is determined based on the dynamic density estimation value of the nonlinear data features.

In this embodiment, in the calculation formula of the dynamic density estimation value, d_t(f) represents the dynamic density estimation value of the nonlinear data feature f at time t, N represents the number of nonlinear data features, K represents the space-time kernel function, which measures the similarity between nonlinear data features in space and time, u represents the spatial distance between nonlinear data features, ω represents the time decay function, and Δt represents the time difference between any time t₀ of a nonlinear data feature and the current time t.

In this embodiment, computing the dynamic density estimation value assigns a weight to each extracted nonlinear data feature. The weight is based on the time attribute of the nonlinear data feature, which means that the data features closest in time to an abnormal data feature are given higher weights.

In this embodiment, the expression of the time decay function is:

$$\omega(\Delta t) = e^{-\beta \Delta t} \tag{2}$$

Here β is a constant that determines how fast the weight decays, Δt represents the time difference between any time t₀ of a nonlinear data feature and the current time t, and ω represents the time decay function. Choosing an exponential decay function as the time decay function ensures that the data points closest in time to an abnormal data feature receive larger weights.
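To make the effect of β concrete, the following minimal Python sketch evaluates ω(Δt) = e^(-βΔt) for a few time differences. The function name `time_decay` and the sample value β = 0.5 are assumptions chosen for illustration; the source does not state a value for β.

```python
import math


def time_decay(delta_t, beta):
    """Exponential time decay weight: omega(delta_t) = exp(-beta * delta_t)."""
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    if beta <= 0:
        raise ValueError("beta must be positive")
    return math.exp(-beta * delta_t)


# Hypothetical beta; larger values make older features fade faster.
for dt in (0.0, 1.0, 2.0, 4.0):
    print(dt, round(time_decay(dt, beta=0.5), 4))
```

A feature observed at the current time (Δt = 0) always has weight 1, and the weight halves every ln(2)/β time units, so β can be chosen from the time span over which past features should still matter.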

In this embodiment, the space-time kernel function is expressed as:

where σ denotes the standard deviation of the Gaussian kernel function, which determines the width of the kernel; s denotes the dimension of the time series data; u denotes the spatial distance between nonlinear data features; ω denotes the time decay function; Δt denotes the time difference between any time t₀ of a nonlinear data feature and the current time t; and K denotes the space-time kernel function.

In this embodiment, the space-time kernel function measures the similarity between nonlinear data features in space and time. It considers the spatial and temporal attributes of the data together by combining a Gaussian kernel function with the time decay function: the closer two nonlinear data features are in both space and time, the higher their similarity.
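The combination of a Gaussian kernel and the time decay can be sketched as follows. The exact form of the space-time kernel is not reproduced in this text, so the product of an s-dimensional Gaussian density and exp(-βΔt) used here is an assumption chosen only to match the symbol definitions (σ, s, u, ω, Δt); the function name is hypothetical.

```python
import math


def space_time_kernel(u: float, delta_t: float, sigma: float, s: int, beta: float) -> float:
    """Assumed form: s-dimensional Gaussian in the spatial distance u,
    multiplied by the exponential time decay omega(delta_t) = exp(-beta * delta_t)."""
    normaliser = (2.0 * math.pi * sigma * sigma) ** (s / 2.0)
    gaussian = math.exp(-(u * u) / (2.0 * sigma * sigma)) / normaliser
    return gaussian * math.exp(-beta * delta_t)


# Two features that are near in space and time score higher than distant ones.
near = space_time_kernel(u=0.5, delta_t=1.0, sigma=1.0, s=2, beta=0.1)
far = space_time_kernel(u=2.0, delta_t=5.0, sigma=1.0, s=2, beta=0.1)
```

Under this assumed form, the kernel value falls when either the spatial distance or the time difference grows, which is the behaviour the paragraph above describes.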

In this embodiment, the dynamic threshold is calculated as:

where Φ denotes the dynamic threshold; the remaining terms are the mean of the dynamic density estimates d_t of the nonlinear data features, a constant θ, and the standard deviation σ of the Gaussian kernel function.

In this embodiment, the dynamic threshold is calculated from the mean of the dynamic density estimates d_t of all nonlinear data features and the standard deviation of the Gaussian kernel function.
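The density estimate and threshold described above can be sketched together. Both formulas are assumptions, since the equations themselves are not reproduced in this text: the density of a feature is taken as the average kernel value against all N features, and the threshold as the mean density plus θ·σ. All function names and numeric values are hypothetical.

```python
import math
from statistics import fmean


def kernel(u: float, delta_t: float, sigma: float, s: int, beta: float) -> float:
    # Assumed space-time kernel: Gaussian in u times exp(-beta * delta_t).
    gaussian = math.exp(-(u * u) / (2.0 * sigma * sigma))
    gaussian /= (2.0 * math.pi * sigma * sigma) ** (s / 2.0)
    return gaussian * math.exp(-beta * delta_t)


def dynamic_density(features, times, t_now, sigma, beta):
    """Assumed estimate: d_t(f) = (1/N) * sum_j K(||f - f_j||, t_now - t_j)."""
    n = len(features)
    s = len(features[0])
    densities = []
    for f in features:
        total = 0.0
        for g, t0 in zip(features, times):
            total += kernel(math.dist(f, g), t_now - t0, sigma, s, beta)
        densities.append(total / n)
    return densities


def dynamic_threshold(densities, theta, sigma):
    """Assumed threshold: mean of the density estimates plus theta * sigma."""
    return fmean(densities) + theta * sigma


# Illustrative data: two clustered features and one isolated feature.
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
dens = dynamic_density(feats, times=[0.0, 1.0, 2.0], t_now=2.0, sigma=1.0, beta=0.1)
phi = dynamic_threshold(dens, theta=0.01, sigma=1.0)
```

The threshold is recomputed from the current set of density estimates each time, which is what makes it dynamic rather than a fixed cut-off.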

In step S240, the data collection frequency is adjusted according to the result of comparing the dynamic density estimate with the dynamic threshold, and data is then collected from the steel mill at the adjusted data collection frequency.

In this embodiment, the dynamic density estimate d_t of each nonlinear data feature is compared with the dynamic threshold Φ. If the estimate exceeds the threshold, the feature is regarded as an abnormal data feature (or key data feature), and the data sampling frequency is increased to obtain more detailed steel mill data. Conversely, if the estimate is less than or equal to the threshold, the data sampling frequency is reduced to lighten the burden of collecting and processing steel mill data. The dynamic data density estimation method thus solves the technical problem of over-collection or under-collection caused by sampling a steel mill at a fixed frequency: it identifies abnormal data features in the time series data more accurately, improves the efficiency of steel mill data collection, and helps the steel mill better monitor and optimize its production process.

In one embodiment of the present application, after data is collected from the steel mill at the adjusted data collection frequency, the steel mill data collection method includes:

Based on the collected steel mill data, a multi-level data vector field is constructed; the multi-level data vector field is used to extract data features at different levels.

In this embodiment, the multi-level data vector field can reveal flow directions and patterns in the steel mill data. The data vector field of each layer is built at a different resolution, so that data features at different levels are captured.

The multi-level data vector field is input into a preset interference identification function to obtain interference data.

In this embodiment, the preset interference identification function is expressed as:

where Idf denotes the preset interference identification function; θ denotes a threshold function used to screen the difference vector between V_k(t) and its average vector; V_k(t) denotes the k-th layer data vector field at time t; and I_k(t) denotes the interference data identified in the k-th layer data vector field at time t.

In this embodiment, the preset interference identification function identifies vectors that do not agree with the flow direction of normal data features; these vectors correspond to the interference in the steel mill data.

The interference data is removed from the steel mill data to obtain pure steel mill data.

In this embodiment, the pure steel mill data is expressed as:

where y(t) denotes the pure steel mill data at time t; w_kj denotes the weight of the j-th interference vector in the k-th layer; I_kj denotes the j-th interference vector identified in the k-th layer at time t; n_k denotes the number of interference vectors in the k-th layer; K denotes the total number of layers of the data vector field; δ denotes an adjustment factor; U(t) denotes the steel mill data at time t; and I_k(t) denotes the combination of the I_kj identified in the k-th layer at time t.

In this embodiment, removing the interference data from the steel mill data effectively reduces the interference in the collected data and ensures the accuracy of the collected steel mill data.

In one embodiment of the present application, the process of adjusting the data collection frequency according to the result of comparing the dynamic density estimate with the dynamic threshold includes:

Obtain the current data collection frequency of the time series data.

In this embodiment, the current data collection frequency can be set according to the process stage, production conditions, and so on of the steel mill, and is not described further here.

If the dynamic density estimate is less than or equal to the dynamic threshold, the data feature corresponding to the dynamic density estimate is determined to be a normal data feature, and the current data collection frequency is adjusted to a first data collection frequency.

In this embodiment, the first data collection frequency is lower than the current data collection frequency. The first data collection frequency can be set according to actual conditions and is not described further here.

In this embodiment, data is collected from the steel mill at the first data collection frequency to reduce the burden of collecting and processing steel mill data.

If the dynamic density estimate is greater than the dynamic threshold, the data feature corresponding to the dynamic density estimate is determined to be an abnormal data feature, and the current data collection frequency is adjusted to a second data collection frequency.

In this embodiment, the second data collection frequency is higher than the current data collection frequency. The second data collection frequency can be set according to actual conditions and is not described further here.

In this embodiment, data is collected from the steel mill at the second data collection frequency to obtain more detailed abnormal data. This avoids missing abnormal data, improves the efficiency and accuracy of steel mill data collection, and helps the steel mill better monitor and optimize its production process.
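The comparison rule above can be sketched as a small decision function. The names `FrequencyDecision` and `adjust_frequency`, the use of hertz as the unit, and the numeric values are illustrative assumptions; the source leaves the first and second frequencies to be set according to actual conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyDecision:
    is_abnormal: bool
    frequency_hz: float


def adjust_frequency(density: float, threshold: float, current_hz: float,
                     first_hz: float, second_hz: float) -> FrequencyDecision:
    """Choose the next collection frequency from the density/threshold comparison."""
    # The text requires: first frequency < current frequency < second frequency.
    if not (first_hz < current_hz < second_hz):
        raise ValueError("expected first_hz < current_hz < second_hz")
    if density > threshold:
        # Abnormal data feature: collect at the higher, second frequency.
        return FrequencyDecision(is_abnormal=True, frequency_hz=second_hz)
    # Normal data feature (density <= threshold): collect at the lower, first frequency.
    return FrequencyDecision(is_abnormal=False, frequency_hz=first_hz)


# Hypothetical values: 10 Hz now, 1 Hz when normal, 100 Hz when abnormal.
decision = adjust_frequency(0.9, 0.5, current_hz=10.0, first_hz=1.0, second_hz=100.0)
```

Note that the boundary case, where the estimate equals the threshold, is treated as normal, matching the "less than or equal to" wording of the text.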

In one embodiment of the present application, if the pre-constructed extraction model further includes a multi-scale analysis layer for extracting data features at different scales, the process of training the pre-constructed extraction model with sample data to obtain the feature extraction model includes:

The sample data is input into the time series analysis layer to obtain structural features.

In this embodiment, the time series analysis layer is expressed as:

where tx(x) denotes the structural features of the time series data, x denotes the time series data, sigm denotes the activation function, W_t denotes the weight, b_t denotes the bias, β denotes the weight coefficient of the Gaussian integral, and T denotes transposition.

In this embodiment, processing the time series data with the time series analysis layer captures local patterns in the data and gives a better understanding of its intrinsic structural features. For example, by analyzing how temperature and chemical composition change over time, furnace temperature and material charging can be predicted and adjusted to optimize the steelmaking process.

The structural features are input into the self-attention mechanism layer to obtain abnormal structural features.

In this embodiment, the self-attention mechanism layer is expressed as:

where Q, K, and V denote the query matrix, key matrix, and value matrix of the self-attention mechanism layer, respectively; a(ts) denotes the abnormal structural features; T denotes transposition; d_k denotes the dimension of the keys; and ts denotes the structural features.

In this embodiment, the query matrix Q is expressed as:

Q = W_q · ts    Formula (9)

where ts denotes the structural features, W_q denotes the weight matrix of the query matrix, and Q denotes the query matrix of the self-attention mechanism layer.

In this embodiment, the key matrix K is expressed as:

K = W_k · ts    Formula (10)

where ts denotes the structural features, W_k denotes the weight matrix of the key matrix, and K denotes the key matrix of the self-attention mechanism layer.

In this embodiment, the value matrix V is expressed as:

V = W_v · ts    Formula (11)

where ts denotes the structural features, W_v denotes the weight matrix of the value matrix, and V denotes the value matrix of the self-attention mechanism layer.

In this embodiment, the self-attention mechanism layer can capture and focus on abnormal time points or abnormal structural features in the time series data. Once these are captured, the important influencing factors in the steelmaking process are analyzed to find the cause of the abnormality. For example, when the temperature fluctuates abnormally, potential production problems can be identified and handled in time.
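The symbols defined for the self-attention layer (Q, K, V, the transpose T, and the key dimension d_k) match standard scaled dot-product attention, softmax(Q·Kᵀ/√d_k)·V. The sketch below assumes that form, since the layer's equation is not reproduced in this text; it takes Q, K, and V as already-projected lists of row vectors, one row per time step, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Assumed form: softmax(Q K^T / sqrt(d_k)) V, with one row per time step."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        # Attention-weighted sum of the value rows.
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return out


# A zero query attends to both keys equally, so the output is the mean of the values.
result = attention(Q=[[0.0, 0.0]], K=[[1.0, 0.0], [0.0, 1.0]], V=[[2.0], [4.0]])
```

Time steps whose keys align strongly with a query receive larger weights, which is how such a layer can concentrate on unusual points in the sequence.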

The abnormal structural features are input into the nonlinear transformation layer to obtain nonlinear data features.

In this embodiment, the nonlinear transformation layer is expressed as:

where f(a) denotes the nonlinear data features, W_f denotes the weight matrix of the nonlinear transformation layer, b_f denotes the bias vector of the nonlinear transformation layer, μ denotes the regularization coefficient, α denotes the weight coefficient of the sine function, W_r denotes the regularization weight matrix, and a denotes the abnormal structural features.

In this embodiment, adding the nonlinear transformation layer enables the feature extraction model to process complex and interrelated production data, such as temperature, pressure, and material proportions, which both ensures independence among the data and avoids redundancy and error accumulation. An orthogonality constraint is introduced through the regularization weight matrix to ensure that the resulting data features are orthogonal.

The nonlinear data features are input into the multi-scale analysis layer to obtain data features at different scales.

In this embodiment, the multi-scale analysis layer is expressed as:

where m(f) denotes the data features at different scales, W_i denotes the weight matrix of the i-th scale, σ_i denotes the standard deviation of the i-th scale, and f denotes the nonlinear data features.

In this embodiment, the multi-scale analysis layer captures the nonlinear data features of the time series data at different time scales, yielding the feature changes of the time series data across different temporal and spatial scales, such as rapid short-term changes and long-term trends, so that the production process can be monitored comprehensively.

A loss function is constructed from the data features at different scales and the time series data. With minimizing the loss function as the objective, the parameters of the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer, and the multi-scale analysis layer are updated.

In this embodiment, the loss function is expressed as:

where L denotes the loss function, μ₁ and μ₂ are regularization coefficients, h_i denotes the state of the neuron at the i-th scale, F denotes the Frobenius norm, W_i denotes the weight matrix of the i-th scale, and x denotes the time series data.

In this embodiment, the parameters of the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer, and the multi-scale analysis layer are updated and optimized by gradient descent on the loss function. Minimizing the loss ensures that the feature extraction model accurately captures the structural features of the steel mill data and that these features reflect the key information in the data. It also reduces the dimensionality and complexity of the steel mill data, making subsequent data processing and analysis more efficient and accurate.

The time series analysis layer, self-attention mechanism layer, nonlinear transformation layer, and multi-scale analysis layer, each with its updated parameters, are combined to obtain the feature extraction model.

In this embodiment, a feature extraction model that combines the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer, the orthogonality constraint, and the multi-scale analysis layer can capture the complex patterns and features of time series data in depth, in particular features at different time scales, and can account for the temporal dependence of the data. It is therefore better suited to a highly time-dependent data environment such as a steel mill and can analyze and process time series data more accurately.

In one embodiment of the present application, the process of constructing the multi-level data vector field based on the collected steel mill data includes:

The steel mill data is dimensionally transformed to obtain a data point set.

In this embodiment, the data point set is expressed as:

P(t) = M(U) = exp(-k₁t²)·U·sin(k₂t) + k₃tU    Formula (15)

where k₁ denotes the time decay factor, k₂ denotes the periodic adjustment factor, t denotes time, k₃ denotes the linear time influence factor, U denotes the steel mill data, M(U) denotes the mapping function that dimensionally transforms the steel mill data, and P(t) denotes the data point set.

In this embodiment, k₁ controls the decay rate of the exp(-k₁t²) term, k₂ affects the frequency at which the data varies, and k₃ controls the rate at which the data changes linearly with time. The term exp(-k₁t²) represents the decay of the steel mill data over time and reflects the influence of time more directly; sin(k₂t) preserves the capture of periodic variation; and k₃tU is a linear term that ties the time factor directly to the steel mill data U, representing the tendency of U to increase or decrease linearly over time.
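The mapping of Formula (15) can be sketched as below for one sample of steel mill data U at time t. The linear term uses k₃, as the symbol definitions describe. The function name `map_to_point`, the representation of U as a list of floats, and all numeric values are illustrative assumptions.

```python
import math


def map_to_point(u, t: float, k1: float, k2: float, k3: float):
    """P(t) = exp(-k1 * t^2) * U * sin(k2 * t) + k3 * t * U, applied element-wise to U."""
    # Both terms are proportional to U, so they share one scalar factor.
    scale = math.exp(-k1 * t * t) * math.sin(k2 * t) + k3 * t
    return [scale * x for x in u]


# Hypothetical sample: a temperature and a pressure reading at t = 2.0.
point = map_to_point([1500.0, 2.5], t=2.0, k1=0.05, k2=1.0, k3=0.01)
```

Because both terms scale U by a time-dependent factor, the decaying sinusoid dominates for small t, while the linear term k₃tU takes over as the exponential term dies out.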

Based on the data point set, a data persistence diagram is calculated.

In this embodiment, the data persistence diagram is expressed as:

where λ denotes the decay factor, P(t) denotes the data point set, D(t) denotes the data persistence diagram, and t denotes time.

In this embodiment, λ balances the first-order derivative and the second-order derivative of the data point set. The data persistence diagram is computed from the data point set by a topological data analysis method, which further reveals the topological structure and data characteristics of the steel mill data.

Based on the data persistence diagram, the multi-level data vector field is calculated.

In this embodiment, the multi-level data vector field is expressed as:

where V_k(t) denotes the k-th layer data vector field at time t, ∇ denotes the gradient operator, γ denotes a weight parameter, and D_k(t′) denotes the k-th layer data persistence diagram at time t′, with t′ ∈ [0, t].

In this embodiment, the gradient operator captures the directional characteristics of the steel mill data, and the weight parameter γ balances the gradient term and the quadratic term.

In this embodiment, the interference in the steel mill data is accurately identified on the basis of the data topology and the multi-level data vector field, which improves both the accuracy and the efficiency of data collection.

FIG. 3 is a flowchart of a steel mill data collection method according to another exemplary embodiment of the present application. As shown in FIG. 3, the method includes: (1) constructing a feature extraction model and using it to capture the intrinsic structural features of the time series data; (2) determining a dynamic threshold with a data density estimation method based on time series characteristics and spatial distribution, identifying abnormal data features from the comparison between the dynamic density estimates and the dynamic threshold, and performing targeted data collection; and (3) collecting steel mill data at the dynamically adjusted collection frequency, and removing interference data from the steel mill data with a processing method based on data topology and multi-level vector fields.

In this embodiment, a feature extraction model is constructed to capture the intrinsic structural features of the time series data, which helps reduce the dimensionality or complexity of the steel mill data.

In this embodiment, a data density estimation method based on time series characteristics and spatial distribution is used to determine the dynamic threshold, and abnormal data features are identified from the comparison between the dynamic density estimates and the dynamic threshold. Data can then be collected in a targeted way for the abnormal data features, so that abnormal data is not missed.

In this embodiment, interference data is removed from the steel mill data with a processing method based on data topology and multi-level vector fields, which helps ensure the accuracy of data collection.

In this embodiment, the collection device gathers data through sensors and dynamically adjusts the collection frequency of steel mill data according to the nonlinear data features of the time series data, thereby achieving adaptive collection. For example, during smelting, the collection frequency is raised when temperature and chemical composition change rapidly, to obtain more precise data, and lowered during relatively stable stages, to reduce the data processing burden.

In this embodiment, an edge computing node is connected to the collection device through an interface and monitors parameter change data of the steelmaking process in real time, such as temperature, pressure, and material flow change data. The nonlinear data features of the change data are computed in real time: when they exceed the dynamic threshold, the data collection frequency is increased; when they fall below the dynamic threshold, the data collection frequency is reduced.

FIG. 4 is a structural diagram of a feature extraction model according to another exemplary embodiment of the present application. As shown in FIG. 4, the feature extraction model includes a time series analysis layer, a self-attention mechanism layer, a nonlinear transformation layer, and a multi-scale analysis layer.

In this embodiment, the time series analysis layer uses convolution kernels to capture the structural features of the time series data, which helps in understanding the intrinsic structure of the data.

In this embodiment, the self-attention mechanism layer allows the feature extraction model to focus on different parts of the time series data. By attending to abnormal time points or abnormal structural features in the production process, potential production problems can be identified and handled in time.

In this embodiment, the nonlinear transformation layer enables the feature extraction model to process complex and interrelated production data, such as temperature, pressure, and material proportions, which both ensures independence among the data and avoids redundancy and error accumulation. An orthogonality constraint is introduced through the regularization weight matrix to ensure that the resulting data features are orthogonal.

In this embodiment, the multi-scale analysis layer captures the nonlinear features of the time series data at different time scales, yielding the feature changes of the time series data across different temporal and spatial scales, such as rapid short-term changes and long-term trends, so that the production process can be monitored comprehensively.

The following describes apparatus embodiments of the present application, which can be used to carry out the steel mill data collection method of the above embodiments. For details not disclosed in the apparatus embodiments, refer to the above embodiments of the steel mill data collection method.

FIG. 5 is a block diagram of a steel mill data collection apparatus according to an exemplary embodiment of the present application. The apparatus can be applied to the implementation environment shown in FIG. 1 and is specifically configured in the computer device 102. The apparatus can also be applied to other exemplary implementation environments and configured in other devices; this embodiment does not limit the implementation environment to which the apparatus applies.

As shown in FIG. 5, the exemplary steel mill data collection apparatus includes:

A data acquisition module 501, configured to acquire time series data of the current production process of the steel mill.

A feature extraction module 502, configured to input the time series data into the feature extraction model to obtain nonlinear data features.

A threshold determination module 503, configured to determine a dynamic threshold based on the dynamic density estimates of the nonlinear data features.

A frequency determination module 504, configured to adjust the data collection frequency according to the result of comparing the dynamic density estimate with the dynamic threshold, so that data is collected from the steel mill at the adjusted data collection frequency.

In this embodiment, the structural features of the time series data include stationary structural features and non-stationary structural features, and the abnormal structural features include, among others, the non-stationary structural features.

In this embodiment, the dynamic density estimate is calculated by Formula (1), which is not repeated here. Computing the dynamic density estimate assigns a weight to each extracted nonlinear data feature. The weight is based on the temporal attribute of the feature, so the data features closest to an abnormal data feature receive higher weights.

In this embodiment, the time decay function is given by Formula (2) and the space-time kernel function by Formula (3); neither is repeated here.

In this embodiment, the dynamic threshold is calculated by Formula (4), which is not repeated here. The dynamic threshold is calculated from the mean of the dynamic density estimates d_t of all nonlinear data features and the standard deviation of the Gaussian kernel function.

In this embodiment, the dynamic density estimate d_t of each nonlinear data feature is compared with the dynamic threshold Φ. If the estimate exceeds the threshold, the feature is regarded as an abnormal data feature (key data feature), and the data sampling frequency is increased to obtain more detailed steel mill data. Conversely, if the dynamic density estimate of the nonlinear data feature is less than or equal to the threshold, the data sampling frequency is reduced to lighten the burden of collecting and processing steel mill data. The dynamic data density estimation method thus solves the technical problem of over-collection or under-collection caused by sampling a steel mill at a fixed frequency: it identifies abnormal data features in the time series data more accurately, improves the efficiency of steel mill data collection, and helps the steel mill better monitor and optimize its production process.

It should be noted that the steel mill data acquisition device provided in the above embodiment and the steel mill data acquisition method provided in the above embodiment belong to the same concept. The specific manner in which each module and unit operates has been described in detail in the method embodiment and is not repeated here. In practical applications, the functions of the steel mill data acquisition device may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. No limitation is imposed in this respect.

An embodiment of the present application further provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the steel mill data acquisition method provided in the above embodiments.

Fig. 6 shows a schematic structural diagram of a computer system of an electronic device suitable for implementing the embodiments of the present application. It should be noted that the computer system 600 of the electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes, such as the method in the above embodiments, according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for system operation. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), as well as a speaker and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a local area network (LAN) card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to the embodiments of the present application, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present application includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the various functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium described in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a computer-readable computer program. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer program contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless or wired media, or any suitable combination of these.

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. Each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in a block diagram or flowchart, and any combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not limit the units themselves.

Another aspect of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor of a computer, the computer program causes the computer to perform the steel mill data acquisition method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.

Another aspect of the present application further provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the steel mill data acquisition method provided in the above embodiments.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by a person of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims

1. A steel mill data acquisition method, characterized by comprising the following steps:

acquiring time series data of the current production process of a steel mill;

inputting the time series data into a feature extraction model to obtain nonlinear data features, wherein the feature extraction model is obtained by training a pre-built extraction model with sample data, the pre-built extraction model comprises a time series analysis layer for capturing structural features, a self-attention mechanism layer for identifying abnormal structural features, and a nonlinear transformation layer for extracting nonlinear data features, and the sample data comprises all or part of the time series data of a historical production process of the steel mill;

determining a dynamic threshold based on the dynamic density estimation value of the nonlinear data features; and

adjusting the data acquisition frequency according to the comparison result of the dynamic density estimation value and the dynamic threshold, so as to acquire data of the steel mill at the adjusted data acquisition frequency.

2. The steel mill data acquisition method according to claim 1, wherein, after data of the steel mill is acquired at the adjusted data acquisition frequency, the steel mill data acquisition method further comprises:

constructing a multi-level data vector field based on the acquired steel mill data, wherein the multi-level data vector field is used for extracting data features of different levels;

inputting the multi-level data vector field into a preset interference recognition function to obtain interference data; and

eliminating the interference data from the steel mill data to obtain pure steel mill data.

3. The steel mill data acquisition method according to claim 1 or 2, wherein the process of adjusting the data acquisition frequency according to the comparison result of the dynamic density estimation value and the dynamic threshold comprises:

acquiring the current data acquisition frequency of the time series data;

if the dynamic density estimation value is less than or equal to the dynamic threshold, determining that the data feature corresponding to the dynamic density estimation value is a normal data feature, and adjusting the current data acquisition frequency to a first data acquisition frequency, wherein the first data acquisition frequency is lower than the current data acquisition frequency; and

if the dynamic density estimation value is greater than the dynamic threshold, determining that the data feature corresponding to the dynamic density estimation value is an abnormal data feature, and adjusting the current data acquisition frequency to a second data acquisition frequency, wherein the second data acquisition frequency is higher than the current data acquisition frequency.

4. The steel mill data acquisition method according to claim 1 or 2, wherein, if the pre-built extraction model further comprises a multi-scale analysis layer for extracting data features of different scales, training the pre-built extraction model with the sample data to obtain the feature extraction model comprises:

inputting the sample data into the time series analysis layer to obtain structural features;

inputting the structural features into the self-attention mechanism layer to obtain abnormal structural features;

inputting the abnormal structural features into the nonlinear transformation layer to obtain nonlinear data features;

inputting the nonlinear data features into the multi-scale analysis layer to obtain data features of different scales;

constructing a loss function based on the data features of different scales and the time series data, and updating the parameters of the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer, and the multi-scale analysis layer with the aim of minimizing the loss function; and

combining the time series analysis layer, the self-attention mechanism layer, the nonlinear transformation layer, and the multi-scale analysis layer, each after its parameter update, to obtain the feature extraction model.

5. The steel mill data acquisition method according to claim 4, wherein the expression of the time series analysis layer is:

wherein ts(x) represents the structural features of the time series data, x represents the time series data, sigm represents an activation function, W_t represents a weight, b_t represents a bias, β represents the weight coefficient of the Gaussian integration, and T denotes transposition;

the expression of the self-attention mechanism layer is as follows:

wherein Q, K, and V represent the query matrix, the key matrix, and the value matrix in the self-attention mechanism layer, respectively, a(ts) represents the abnormal structural features, T denotes transposition, d_k represents the dimension of the keys, and ts represents the structural features;

the expression of the nonlinear transformation layer is as follows:

wherein f(a) represents the nonlinear data features, W_f represents the weight matrix of the nonlinear transformation layer, b_f represents the bias vector of the nonlinear transformation layer, μ represents a regularization coefficient, α represents the weight coefficient of the sine function, W_r represents the regularized weight matrix, and a represents the abnormal structural features;

the expression of the multi-scale analysis layer is as follows:

wherein m(f) represents the data features of different scales, W_i represents the weight matrix of the i-th scale, σ_i represents the standard deviation of the i-th scale, and f represents the nonlinear data features;

the expression of the loss function is:

wherein L represents the loss function, μ₁ and μ₂ are regularization coefficients, h_i represents the state of the neurons at the i-th scale, F denotes the Frobenius norm, W_i represents the weight matrix of the i-th scale, and x represents the time series data;

the calculation formula of the dynamic density estimation value is as follows:

wherein d_t(f) represents the dynamic density estimation value of the nonlinear data feature f at time t, N represents the number of nonlinear data features, K represents a space-time kernel function for measuring the similarity between nonlinear data features in space and time, u represents the spatial distance between nonlinear data features, ω represents a time decay function, and Δt represents the time difference between any time t₀ of the nonlinear data features and the current time t;

the calculation formula of the dynamic threshold is as follows:

wherein Φ represents the dynamic threshold, the mean term is the mean of the dynamic density estimation values d_t of the nonlinear data features, θ represents a constant, and σ represents the standard deviation of the Gaussian kernel function.
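For the self-attention mechanism layer recited in claim 5, the symbols Q, K, V, the transpose T, and the key dimension d_k match those of standard scaled dot-product attention, softmax(QKᵀ/√d_k)·V. The Python sketch below assumes that standard form purely for illustration; the claimed expression may differ, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows.

    This is the standard attention form, assumed here for illustration.
    """
    d_k = len(K[0])  # dimension of each key vector
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output
```

In the claimed layer, Q, K, and V would be derived from the structural features ts; how they are derived is not stated in this passage, so the sketch takes them as given inputs.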

6. The steel mill data acquisition method according to claim 2, wherein the process of constructing a multi-level data vector field based on the acquired steel mill data comprises:

performing dimension conversion on the steel mill data to obtain a data point set;

calculating a data persistence map based on the data point set; and

calculating the multi-level data vector field based on the data persistence map.

7. The steel mill data acquisition method according to claim 6, wherein the expression of the data point set is:

P(t) = M(U) = exp(-k₁t²)·U·sin(k₂t) + k₃tU

wherein k₁ represents a time attenuation factor, k₂ represents a periodicity adjustment factor, t represents time, k₃ represents a linear time influence factor, U represents the steel mill data, M(U) represents the mapping function for dimension conversion of the steel mill data, and P(t) represents the data point set;

the expression of the data persistence map is as follows:

wherein λ represents an attenuation factor, P(t) represents the data point set, D(t) represents the data persistence map, and t represents time;

the expression of the multi-level data vector field is as follows:

wherein V_k(t) represents the k-th layer data vector field at time t, ∇ represents the gradient operator, γ represents a weight parameter, and D_k(t′) represents the data persistence map of the k-th layer at time t′, with t′ ∈ [0, t];

the expression of the preset interference recognition function is as follows:

wherein Idf represents the preset interference recognition function, θ represents a threshold function, V_k(t) represents the k-th layer data vector field at time t, the mean term denotes the average vector of V_k(t), and I_k(t) represents the interference data identified in the k-th layer data vector field at time t;

the expression of the pure steel mill data is as follows:

wherein y(t) represents the pure steel mill data at time t, w_kj represents the weight of the j-th interference vector of the k-th layer, I_kj represents the j-th interference vector identified in the k-th layer at time t, n_k represents the number of interference vectors of the k-th layer, K represents the total number of layers of the data vector field, δ represents an adjustment factor, U(t) represents the steel mill data at time t, and I_k(t) represents the combination of the I_kj identified in the k-th layer at time t.
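The data point set expression of claim 7, P(t) = exp(-k₁t²)·U·sin(k₂t) + k₃tU, can be evaluated directly. The Python sketch below is illustrative only: it assumes U is a scalar reading per time stamp (the claim does not fix the shape of U), and the function names and sample values are hypothetical.

```python
import math


def data_point(u, t, k1, k2, k3):
    """Evaluate P(t) = exp(-k1 * t^2) * U * sin(k2 * t) + k3 * t * U.

    u is treated as a single scalar reading, which is an assumption.
    """
    decayed_periodic = math.exp(-k1 * t * t) * u * math.sin(k2 * t)
    linear_trend = k3 * t * u
    return decayed_periodic + linear_trend


def data_point_set(samples, k1, k2, k3):
    """Map a sequence of (t, U) samples to their P(t) values."""
    return [data_point(u, t, k1, k2, k3) for t, u in samples]


if __name__ == "__main__":
    # Hypothetical readings: (time, value) pairs.
    readings = [(0.0, 5.0), (0.5, 5.2), (1.0, 4.9)]
    print(data_point_set(readings, k1=0.1, k2=math.pi / 2, k3=0.5))
```

The first term decays with k₁ and oscillates with k₂, while the second grows linearly with time through k₃, so the three factors can be tuned independently.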

8. A steel mill data acquisition device, characterized in that the steel mill data acquisition device comprises:

a data acquisition module, configured to acquire time series data of the current production process of a steel mill;

a feature extraction module, configured to input the time series data into a feature extraction model to obtain nonlinear data features, wherein the feature extraction model is obtained by training a pre-built extraction model with sample data, the pre-built extraction model comprises a time series analysis layer for capturing structural features, a self-attention mechanism layer for identifying abnormal structural features, and a nonlinear transformation layer for extracting the nonlinear data features, and the sample data comprises all or part of the time series data of a historical production process of the steel mill;

a threshold determining module, configured to determine a dynamic threshold based on the dynamic density estimation value of the nonlinear data features; and

a frequency determining module, configured to adjust the data acquisition frequency according to the comparison result of the dynamic density estimation value and the dynamic threshold, so as to acquire data of the steel mill at the adjusted data acquisition frequency.

9. An electronic device, comprising:

one or more processors; and

a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the steel mill data acquisition method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the steel mill data acquisition method of any one of claims 1 to 7.