
CN116402679A - Lightweight infrared super-resolution self-adaptive reconstruction method

Lightweight infrared super-resolution self-adaptive reconstruction method

Info

Publication number
CN116402679A
Authority
CN
China
Prior art keywords
image
resolution
model
infrared
self
Prior art date
Legal status
Granted
Application number
CN202211692350.XA
Other languages
Chinese (zh)
Other versions
CN116402679B (en)
Inventor
蒋一纯
刘云清
詹伟达
陈宇
韩登
于永吉
Current Assignee
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changchun University of Science and Technology filed Critical Changchun University of Science and Technology
Priority to CN202211692350.XA priority Critical patent/CN116402679B/en
Publication of CN116402679A publication Critical patent/CN116402679A/en
Application granted granted Critical
Publication of CN116402679B publication Critical patent/CN116402679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 — Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 — Computing arrangements using knowledge-based models
    • G06N5/04 — Inference or reasoning models
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 — Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 — Road transport of goods or passengers
    • Y02T10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T10/40 — Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image processing and in particular provides a lightweight infrared super-resolution self-adaptive reconstruction method comprising the following steps. Step 1, construct a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module. Step 2, prepare a data set: prepare an infrared image data set and apply simulated downsampling and data augmentation to it for subsequent network training. Step 3, train the network model: train the infrared image super-resolution reconstruction model. The self-adaptive image feature processing unit provided by the invention confines the self-attention mechanism to a sliding window and adaptively computes and updates the feature values in the window from each feature within the window, which avoids applying the same convolution kernel across a local window, improves expressive capability, and reduces the computation generated during self-attention training and inference.

Description

Lightweight infrared super-resolution self-adaptive reconstruction method
Technical Field
The invention relates to the technical field of image processing, in particular to a lightweight infrared super-resolution self-adaptive reconstruction method.
Background
Infrared imaging works by sensing the thermal radiation emitted by objects in the environment; it does not depend on reflected ambient light or artificial light sources and offers strong anti-interference and all-weather working capability. Owing to this excellent identification capability and its passive-imaging character, it is widely applied in fields such as military, automatic driving and security. However, the manufacturing process of infrared imaging sensors is complex, and dense detector arrays require cryocooler support, so infrared imaging sensors generally have low resolution and high cost. Compared with directly improving the imaging sensor, recovering the high-frequency information lost from infrared images with an image super-resolution method can improve image resolution and quality at low cost, so it can effectively improve imaging quality and has important practical significance and broad application prospects. Infrared image super-resolution is a highly underdetermined problem: the lost details must be estimated from a large number of image structural relations, which makes super-resolution reconstruction of infrared images difficult. The current mainstream scheme uses a convolutional neural network to complete the mapping from a low-resolution infrared image to a high-resolution infrared image, but it is limited by the convolution kernel parameter-multiplexing principle of convolutional networks.
Chinese patent publication CN112308772B, entitled "Super-resolution reconstruction method based on deep learning local and non-local information", constructs a deep neural network model in which the same set of feature screening networks is time-division multiplexed after an image is input into the network; its two modules comprise a local network and a non-local enhancement network, and the details lost in the image are recovered through very deep convolution operations. Because the convolution operation adopts a fixed convolution kernel at each layer, the expressive capability of a shallow network is poor, so such networks are often designed to be deep and wide, with high computational complexity and storage occupancy. Therefore, how to overcome the limitations of the convolution operation and achieve high-quality super-resolution reconstruction with a small number of learnable parameters and multiply-add operations is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
(I) solving the technical problems
Aiming at the defects of the prior art, the invention provides a lightweight infrared super-resolution self-adaptive reconstruction method, which solves the problems in the background art.
(II) technical scheme
To achieve the above purposes, the invention adopts the following technical scheme:
a lightweight infrared super-resolution self-adaptive reconstruction method comprises the following steps:
step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module;
step 2, preparing a data set: preparing an infrared image data set, and performing simulated downsampling and data augmentation on the infrared image data set for subsequent network training;
step 3, training a network model: training the infrared image super-resolution reconstruction model by inputting the data set prepared in step 2 into the network model constructed in step 1;
step 4, minimizing the loss function and selecting optimal evaluation indexes: minimizing the loss function between the network output image and the label; when the number of training iterations reaches a set threshold or the loss value falls within a set range, pre-training of the model parameters is considered complete and the parameters are saved; at the same time, optimal evaluation indexes are selected to measure the accuracy of the algorithm and evaluate system performance;
step 5, fine-tuning the model: preparing several additional infrared image data sets and training to fine-tune the model, obtaining better model parameters and further improving its generalization ability, so that the final model maintains good reconstruction quality across infrared imagers of various models;
step 6, saving the model: solidifying the finally determined model parameters; when infrared image super-resolution reconstruction is needed, the image is input directly into the network to obtain the final reconstructed image.
In the above lightweight infrared super-resolution self-adaptive reconstruction method, the input initialization layer of the infrared image super-resolution reconstruction model in step 1 is a single convolution layer that maps the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four self-adaptive image feature processing units; specifically, each self-adaptive image feature processing unit consists of a first convolution layer, a self-attention layer and a second convolution layer, wherein the self-attention layer comprises linear feature disassembly, a self-attention mechanism, a relative position coding layer, a first fully connected layer, a second fully connected layer and feature recombination; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel reorganization layer.
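As a concrete reading of this structure, the following is a minimal PyTorch sketch of the three-part model with dense channel concatenation. It is an illustrative approximation, not the patented implementation: the class names, the channel width, the scale factor and the stand-in body of the adaptive unit (a fuller sketch of the unit accompanies example 2 below) are assumptions.

    import torch
    import torch.nn as nn

    class AdaptiveUnit(nn.Module):
        """Stand-in for the self-adaptive image feature processing unit
        (first conv -> windowed self-attention -> second conv)."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.body = nn.Conv2d(in_ch, out_ch, 1)  # placeholder body
        def forward(self, x):
            return self.body(x)

    class LightweightIRSR(nn.Module):
        """Input initialization layer -> four dense-concat adaptive units
        -> channel compression, global skip and pixel reorganization."""
        def __init__(self, channels=32, scale=2, n_units=4):
            super().__init__()
            self.init = nn.Conv2d(1, channels, 3, stride=1, padding=1, bias=True)
            # the n-th unit receives nC channels; concatenation grows the width by C per unit
            self.units = nn.ModuleList(
                AdaptiveUnit((n + 1) * channels, channels) for n in range(n_units))
            self.compress = nn.Conv2d((n_units + 1) * channels, channels, 1)
            self.expand = nn.Conv2d(channels, scale ** 2, 1)
            self.shuffle = nn.PixelShuffle(scale)
        def forward(self, x):
            f0 = self.init(x)                       # initial feature, C x H x W
            f = f0
            for unit in self.units:
                f = torch.cat([unit(f), f], dim=1)  # channel-wise concatenation
            y = self.compress(f) + f0               # global skip connection
            return self.shuffle(self.expand(y))     # 1 x sH x sW output

    model = LightweightIRSR(scale=2)
    print(model(torch.rand(1, 1, 64, 64)).shape)    # torch.Size([1, 1, 128, 128])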
According to the lightweight infrared super-resolution self-adaptive reconstruction method, the FLIR ADAS dataset is used as the infrared image dataset in the training process of step 2; the infrared images in the dataset are downsampled by simulated factors of 2, 3 and 4 respectively, for the supervised training of super-resolution reconstruction models at different super-resolution scales.
According to the lightweight infrared super-resolution self-adaptive reconstruction method, an adaptive loss function is selected as the loss function in the training process of step 4: when the deviation is large, pixel loss is introduced to optimize the network parameters stably and quickly and avoid gradient explosion; when the deviation falls below the threshold, structural loss is adopted so that optimization of the network parameters focuses on restoring image texture details. The choice of loss function affects the quality of the model: it should truly reflect the difference between the predicted value and the true value and correctly feed back the quality of the model.
According to the lightweight infrared super-resolution self-adaptive reconstruction method, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected as the evaluation indexes in the training process of step 4; they effectively evaluate the quality of the algorithm's super-resolution reconstruction result and its degree of distortion relative to the real high-resolution image, and thus measure the performance of the network model.
In the above-mentioned light-weight infrared super-resolution adaptive reconstruction method, in the step 5, MFNet and TNO datasets are used in the process of fine tuning model parameters.
The invention also provides a lightweight infrared super-resolution electronic device, comprising: a multifunctional video stream input/output interface, a central processing unit, a plurality of graphics processing units, a storage device, and a computer program stored on the storage device and executable on the processors; the steps of the above method are implemented when the central processing unit and the plurality of graphics processing units execute the computer program.
The invention also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, perform the steps of the above method.
(III) beneficial effects
Compared with the prior art, the invention provides a lightweight infrared super-resolution self-adaptive reconstruction method, which has the following beneficial effects:
the self-adaptive image feature processing unit provided by the invention limits the self-attention mechanism in the sliding window, and self-adaptively calculates and updates the feature value in the window depending on each feature in the sliding window, so that the same convolution kernel is avoided being adopted in the local window, the expression capability is improved, and the calculated amount generated in the self-attention mechanism training and reasoning process is reduced.
In the self-adaptive image feature processing unit, relative position codes are added within the sliding window to prevent overlapping regions from being counted repeatedly when self-attention is computed; when recombining the overlapping parts, the mathematical expectation over the corresponding areas of each window is used for the update, so no additional inter-window information exchange is needed.
The invention does not use layer normalization in the self-attention computation, which preserves the integrity of image structure and contrast information; meanwhile, the input feature vector is spliced with the new feature vector before being input into the feedforward network for the update, which better preserves the low-frequency structure of the image.
The invention provides an adaptive loss function that monitors the state of the network model in real time during training and automatically selects whether the network learns overall similarity or image texture details, improving the reconstruction performance of the final network model.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram showing a network model structure according to the present invention;
FIG. 3 is a process flow diagram of an adaptive image processing unit of the present invention;
FIG. 4 is a schematic diagram of the working principle of the feature map in the sliding-window self-attention mechanism according to the present invention;
FIG. 5 is a schematic diagram illustrating the operation of the pixel reorganization according to the present invention;
FIG. 6 is a comparison of the main performance indexes of the lightweight infrared super-resolution method of the present invention and the prior art;
FIG. 7 is a schematic diagram of the internal structure of an electronic device for implementing the lightweight infrared super-resolution method according to the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, the lightweight infrared super-resolution adaptive reconstruction method specifically comprises the following steps:
step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module; the input initialization layer is a single convolution layer that maps the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four self-adaptive image processing units; specifically, each self-adaptive image processing unit consists of a first convolution layer, a feature disassembly layer, a relative position coding layer, a self-attention layer, a feature recombination layer and a second convolution layer, wherein the self-attention layer consists of a linear self-attention mechanism, a first fully connected layer and a second fully connected layer; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel reorganization layer;
step 2, preparing a data set: preparing the FLIR ADAS infrared image data set; the infrared images in the data set are augmented and downsampled by simulated factors of 2, 3 and 4 respectively, for the supervised training of super-resolution reconstruction models at different super-resolution scales;
step 3, training a network model: training the infrared image super-resolution reconstruction model by inputting the data set prepared in step 2 into the network model constructed in step 1;
step 4, minimizing the loss function and selecting optimal evaluation indexes: an adaptive loss function is selected as the loss function in the training process; when the deviation is large, pixel loss is introduced to optimize the network parameters stably and quickly and avoid gradient explosion; when the deviation falls below the threshold, structural loss is adopted so that optimization of the network parameters focuses on restoring image texture details; the loss function between the network output image and the label is minimized, and when the number of training iterations reaches a set threshold or the loss value falls within a set range, pre-training of the model parameters is considered complete and the parameters are saved; at the same time, optimal evaluation indexes are selected to measure the accuracy of the algorithm and evaluate system performance;
step 5, fine-tuning the model: preparing the MFNet and TNO infrared image data sets and training to fine-tune the model, obtaining better model parameters and further improving its generalization ability, so that the final model maintains good reconstruction quality across infrared imagers of various models;
step 6, saving the model: and solidifying the finally determined model parameters, and directly inputting the image into a network to obtain a final reconstructed image when the infrared super-resolution operation is needed.
Example 2:
as shown in fig. 1, the lightweight infrared super-resolution adaptive reconstruction method specifically comprises the following steps:
step 1, constructing a network model;
the super-resolution reconstruction model of the whole infrared image in the step 1 comprises an input initialization layer, an image feature extraction module and an output image reconstruction module; the input initialization layer is a convolution layer with 3×3 convolution kernel, 1 step size, 1 padding and offset parameter set, which is derived from input I ir1×H×W Conversion into feature space to obtain initial feature f 1C×H×W The process of (1) can be expressed as:
f 1 =W 1 *I ir +B 1
in which W is 1 To input the convolution kernel in the initialization layer, B 1 For the bias in the convolution operation, represent the convolution operation;
then, the features are further processed by the image feature extraction module, which comprises 4 self-adaptive image processing units; each unit processes the feature map output by the previous stage, and its output is spliced with its input along the channel dimension before being passed on. Feeding the initial feature f_0 ∈ ℝ^(C×H×W) into the image feature extraction module gives the output features f_n ∈ ℝ^((n+1)C×H×W), n = 1, 2, 3, 4, of each unit; the specific process can be expressed as:

f_n = [H_n(f_(n-1)), f_(n-1)], n = 1, 2, 3, 4

where H_n(·) denotes the n-th self-adaptive image processing unit and [·, ·] denotes concatenation along the channel dimension.
the n-th self-adaptive image processing unit has the working principle shown in figure 2; in the adaptive image processing unit, a feature map f i ′∈ C×H×W The channel is changed into the length of the characteristic vector required by the self-attention mechanism by convolution with a convolution kernel of 1 multiplied by 1 and a step length of 1 to obtain a new characteristic diagram f 1 ′∈ C ' ×H×W The process can be expressed as:
f 1 ′=σ(W i ′*f i ′+B i ′)
where σ(x) = max(x, 0) + min(x, p) is a parametric linear rectification function; owing to its efficiency and excellent fitting ability, all activation functions in the present invention are designed as parametric linear rectification functions. Next, f_1′ is processed by the sliding-window self-attention mechanism shown in fig. 4: within each window of size n×n and stride m, the features are split along the channel dimension into n×n vectors of length C′, giving the vector sets

w_(i,j) ∈ ℝ^(n²×C′), i = 1, 2, …, H/m, j = 1, 2, …, W/m

where i and j are the window indices along the height and width directions respectively. Then the query weight W_Q, the key weight W_K and the value weight W_V multiply each vector, differentiating the feature vectors into query vectors Q, key vectors K and value vectors V; the process can be expressed as:
Q = W_Q · w_(i,j), K = W_K · w_(i,j), V = W_V · w_(i,j)
The query vectors Q are matrix-multiplied with the transposed key vectors K^T, i.e. the inner products within the vector set are computed, to measure the correlation between the different vectors in the set; the correlation matrix is normalized by softmax and then multiplied by the value vectors V to obtain the output w̃_(i,j) of the self-attention mechanism:

w̃_(i,j) = softmax(Q·K^T/√d_k + B_P)·V
where B_P is the relative position coding, used to reduce the repeated self-attention computation introduced by the sliding window, and d_k is the length of the feature vectors. The feature vectors are then passed through fully connected layer one, spliced with the input vectors, and passed through fully connected layer two to obtain the output vectors ŵ_(i,j); this can be expressed as:

ŵ_(i,j) = W_2′·[W_1′·w̃_(i,j) + B_1′, w_(i,j)] + B_2′
where W_1′ and W_2′ are the weight parameters of fully connected layers one and two, and B_1′ and B_2′ are their bias parameters; after the output vectors are obtained, they are recombined in their original order into a feature map f_2′ ∈ ℝ^(C′×H×W), in which each overlapping pixel is replaced by the expected gray value of that pixel over the windows covering it. Finally, the feature map f_2′ is passed through a convolution with a 1×1 kernel and stride 1 and added to the input feature map f_i′ to realize a local residual connection, giving the output feature f_o′; the process can be expressed as:

f_o′ = σ(W_o′ * f_2′ + B_o′) + f_i′
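Under stated assumptions, the unit just described can be sketched in PyTorch roughly as follows. The window size n, stride m, channel widths and the plain ReLU (standing in for the parametric rectifier σ) are assumptions; with m equal to n the windows tile the image, while m < n gives overlapping windows whose pixels the fold-based division below averages, matching the expectation-based recombination.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WindowedSelfAttentionUnit(nn.Module):
        """Hedged sketch of the self-adaptive image processing unit: 1x1 conv,
        sliding-window self-attention with a relative position term and no
        layer normalization, a feed-forward step on the [new, input] splice,
        1x1 conv and a local residual connection."""
        def __init__(self, in_ch, att_ch=16, n=8, m=8):
            super().__init__()
            self.n, self.m = n, m
            self.proj_in = nn.Conv2d(in_ch, att_ch, 1)
            self.qkv = nn.Linear(att_ch, 3 * att_ch, bias=False)     # W_Q, W_K, W_V
            self.pos_bias = nn.Parameter(torch.zeros(n * n, n * n))  # B_P
            self.fc1 = nn.Linear(att_ch, att_ch)                     # layer one
            self.fc2 = nn.Linear(2 * att_ch, att_ch)                 # layer two, on the splice
            self.proj_out = nn.Conv2d(att_ch, in_ch, 1)

        def forward(self, x):
            f = F.relu(self.proj_in(x))               # f_1' = sigma(W_i' * f_i' + B_i')
            B, C, H, W = f.shape                      # assumes (H - n) % m == 0
            cols = F.unfold(f, self.n, stride=self.m)                # (B, C*n*n, L)
            L = cols.shape[-1]
            w = cols.view(B, C, self.n * self.n, L).permute(0, 3, 2, 1)  # (B, L, n*n, C)
            q, k, v = self.qkv(w).chunk(3, dim=-1)
            att = q @ k.transpose(-2, -1) / (C ** 0.5) + self.pos_bias
            w2 = att.softmax(dim=-1) @ v              # no layer normalization anywhere
            w2 = self.fc2(torch.cat([self.fc1(w2), w], dim=-1))      # splice then update
            cols2 = w2.permute(0, 3, 2, 1).reshape(B, C * self.n * self.n, L)
            out = F.fold(cols2, (H, W), self.n, stride=self.m)
            cover = F.fold(torch.ones_like(cols2), (H, W), self.n, stride=self.m)
            out = out / cover.clamp(min=1)            # mean over overlapping windows
            return F.relu(self.proj_out(out)) + x     # f_o' = sigma(W_o' * f_2' + B_o') + f_i'

    unit = WindowedSelfAttentionUnit(32)
    print(unit(torch.rand(1, 32, 64, 64)).shape)      # torch.Size([1, 32, 64, 64])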
After the image feature extraction module produces the output feature f_4 ∈ ℝ^(5C×H×W), a channel compression layer (a convolution with a 1×1 kernel and stride 1) compresses the channel count back to that of the initial feature f_0 ∈ ℝ^(C×H×W); after addition to the initial feature, a further convolution layer with a 1×1 kernel and stride 1 reduces the channels to the square of the scale factor, and finally the pixel reorganization shown in fig. 5 outputs the final super-resolution reconstructed image I_SR ∈ ℝ^(1×sH×sW) (s is the super-resolution factor); this operation can be expressed as:

I_SR = G_pixelshuffle(W_c2 * (σ(W_c1 * f_4) + f_0))

where W_c1 and W_c2 are the weight parameters of the channel compression layer and the convolution layer, and G_pixelshuffle(·) denotes the pixel reorganization operation;
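The pixel reorganization step corresponds to the standard sub-pixel rearrangement; a short illustration (shapes assumed) of how s² channels become an s-times larger single-channel plane:

    import torch
    import torch.nn as nn

    s = 2                                        # super-resolution factor
    shuffle = nn.PixelShuffle(s)
    features = torch.rand(1, s * s, 64, 64)      # after reduction to s^2 channels
    sr = shuffle(features)                       # rearranged to (1, 1, 128, 128)
    print(sr.shape)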
step 2, preparing a data set;
the dataset in step 2 is the FLIR ADAS dataset, comprising 8,862 thermal infrared images with a resolution of 512×640; the images are first cut into 256×256 blocks, giving 37,976 image blocks in total; low-resolution images are then obtained by bicubic downsampling and combined with the originals into high/low-resolution image pairs; to expand the data volume, the images undergo horizontal flipping, vertical flipping, rotation, translation and scaled-cropping transformations;
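The preparation just described can be approximated as follows; the antialiased bicubic call assumes PyTorch 1.11 or later, and translation and scaled cropping are omitted for brevity:

    import torch
    import torch.nn.functional as F

    def augment(block):
        """Random horizontal/vertical flips and a 90-degree rotation."""
        if torch.rand(1) < 0.5:
            block = torch.flip(block, dims=[-1])      # horizontal flip
        if torch.rand(1) < 0.5:
            block = torch.flip(block, dims=[-2])      # vertical flip
        if torch.rand(1) < 0.5:
            block = torch.rot90(block, 1, dims=[-2, -1])
        return block

    def make_pair(hr_block, scale):
        """Simulate the low-resolution input by bicubic downsampling."""
        lr = F.interpolate(hr_block, scale_factor=1.0 / scale, mode='bicubic',
                           align_corners=False, antialias=True)
        return lr.clamp(0, 1), hr_block

    hr = augment(torch.rand(1, 1, 256, 256))          # one cropped 256x256 block
    lr, hr = make_pair(hr, scale=2)                   # lr: (1, 1, 128, 128)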
step 3, training a network model;
the training scheme in step 3 is specifically as follows: the number of training iterations is set to 100; the number of images input to the network per batch is about 16 to 32, the upper limit being determined mainly by the performance of the computer's graphics processor, and a batch size within the 16-32 range makes network training more stable and gives better training results. The learning rate in the training process is set to 0.001, which ensures training speed while avoiding gradient explosion; after the 100th, 150th and 175th training rounds the learning rate is reduced to 0.1 of its current value, which better approaches the optimal parameter values. The adaptive moment estimation algorithm is selected as the network parameter optimizer; its advantage is that after bias correction each iteration's learning rate has a definite range, which keeps the parameters stable. The threshold of the loss function value is set to 0.01; when the loss function value falls below this threshold, training of the whole network can be considered essentially complete;
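Read as a PyTorch recipe, the scheme above amounts to Adam at 1e-3 with decays at the 100th, 150th and 175th rounds and an early stop at the 0.01 loss threshold. The model class comes from the earlier sketch, adaptive_loss from the loss sketch given with step 4 below, and the epoch budget and toy batch are assumptions:

    import torch

    model = LightweightIRSR(scale=2)                  # from the earlier sketch
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive moment estimation
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150, 175], gamma=0.1)

    loader = [(torch.rand(16, 1, 128, 128), torch.rand(16, 1, 256, 256))]  # stand-in batches
    threshold = 0.01
    for epoch in range(200):                          # budget assumed
        for lr_img, hr_img in loader:
            loss = adaptive_loss(model(lr_img), hr_img)  # see the step 4 sketch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
        if loss.item() < threshold:                   # training essentially complete
            break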
step 4, minimizing a loss function and selecting an optimal evaluation index;
in step 4, the loss value is calculated between the network output and the label, and a better super-resolution reconstruction effect is achieved by minimizing the loss function; the loss function selects between structural similarity and pixel loss, adjusting which is used according to the current training state of the model; the structural similarity calculation formula is as follows:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ
where l(x, y) is the luminance comparison function, c(x, y) the contrast comparison function and s(x, y) the structure comparison function; the three functions are defined as follows:

l(x, y) = (2μ_x μ_y + C_1)/(μ_x² + μ_y² + C_1)
c(x, y) = (2σ_x σ_y + C_2)/(σ_x² + σ_y² + C_2)
s(x, y) = (σ_xy + C_3)/(σ_x σ_y + C_3)
In practical application α, β and γ are all taken as 1 and C_3 = 0.5·C_2, so the structural similarity formula can be expressed as:

SSIM(x, y) = ((2μ_x μ_y + C_1)(2σ_xy + C_2))/((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where x and y denote the pixels of corresponding N×N windows in the two images; μ_x and μ_y are the means of x and y, serving as luminance estimates; σ_x and σ_y are the standard deviations of x and y, usable as contrast estimates; σ_xy is the covariance of x and y, usable as a structural similarity measure; C_1 and C_2 are small constants that prevent the denominator from being 0, usually derived from 0.01 and 0.03 respectively. The structural similarity of the whole image is calculated by definition as follows:
MSSIM(X, Y) = (1/(MN)) Σ_(i,j) SSIM(x_ij, y_ij)
x and Y respectively represent two images to be compared, MN is the total number of windows, X ij And y ij Each local window in the two pictures; the structural similarity has symmetry and the numerical value ranges are 0,1]The closer the numerical value is to 1, the more similar the structureThe greater the sex, the smaller the difference between the two images; in general, the difference between the two components and 1 is directly reduced through network optimization, and the structural similarity loss is as follows:
SSIM_loss = 1 − MSSIM(I_ir, I_SR)
by optimizing the structural similarity loss, the structural difference between the output image and the reference image is gradually reduced, so that the images become closer in luminance and contrast, more similar in intuitive perception, and the generated image quality is higher;
the pixel loss function is defined as the per-pixel deviation between the reconstructed image I_SR and the high-resolution label, averaged over the image;
when network training starts, or when serious fluctuation occurs, the pixel loss stably optimizes the network parameters so that the network continues to train in the correct direction; however, the pixel loss mainly reflects the difference in the low-frequency part where the energy concentrates, and even when it is small, the structural similarity loss, which focuses on differences in image structure, is better suited to fine adjustment of the network. On this basis, the total loss function switches between the two:

Loss = Pixel_loss while the pixel deviation exceeds the threshold τ, and Loss = SSIM_loss once it falls below τ.
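A compact rendering of this switching loss, assuming an L1 form for the pixel loss, a uniform local window for MSSIM instead of the patent's exact windowing, and an illustrative threshold tau:

    import torch
    import torch.nn.functional as F

    def mssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2, win=11):
        """Mean structural similarity with a uniform local window."""
        p = win // 2
        mu_x = F.avg_pool2d(x, win, 1, p)
        mu_y = F.avg_pool2d(y, win, 1, p)
        var_x = F.avg_pool2d(x * x, win, 1, p) - mu_x ** 2
        var_y = F.avg_pool2d(y * y, win, 1, p) - mu_y ** 2
        cov = F.avg_pool2d(x * y, win, 1, p) - mu_x * mu_y
        num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
        den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
        return (num / den).mean()

    def adaptive_loss(sr, hr, tau=0.1):
        """Pixel loss while the deviation is large, 1 - MSSIM below tau."""
        pixel = (sr - hr).abs().mean()   # assumed L1 pixel loss
        if pixel.item() > tau:
            return pixel
        return 1.0 - mssim(sr, hr)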
for the evaluation indexes in step 4, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected; peak signal-to-noise ratio is based on the error between corresponding pixels, i.e. error-sensitive image quality evaluation, while structural similarity measures the degree of similarity of two digital images in terms of luminance, contrast and structure. Structural similarity is defined above; the peak signal-to-noise ratio quality assessment is defined as follows:

PSNR = 10·log_10(MAX_I²/MSE)

where MAX_I is the maximum possible pixel value of the image and MSE is the mean squared error between the reconstructed image and the reference image;
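For completeness, a direct PSNR implementation under the above definition (images assumed scaled to [0, 1]); SSIM evaluation can reuse the mssim helper from the loss sketch above:

    import torch

    def psnr(sr, hr, max_val=1.0):
        """Peak signal-to-noise ratio in decibels."""
        mse = torch.mean((sr - hr) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    sr, hr = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
    print(f"PSNR: {psnr(sr, hr).item():.2f} dB")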
step 5, fine tuning the model;
in step 5, the infrared image data of the MFNet and TNO datasets are adopted, comprising about 2,000 infrared images with a resolution of 640×480; the images undergo the same preprocessing as in step 2 to obtain the model fine-tuning dataset; the model weight parameters obtained in step 4 are loaded, the learning rate is adjusted to 0.000001, and the image pairs of the model fine-tuning dataset are input into the model for 10 further training epochs;
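The fine-tuning stage then reduces to reloading the pre-trained weights and training briefly at the lower rate; the checkpoint path and the stand-in MFNet/TNO batch are assumptions, and the model class and adaptive_loss come from the earlier sketches:

    import torch

    model = LightweightIRSR(scale=2)
    model.load_state_dict(torch.load('pretrained_flir_adas.pth'))  # step 4 weights (assumed path)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)      # reduced fine-tuning rate
    finetune_loader = [(torch.rand(16, 1, 120, 160), torch.rand(16, 1, 240, 320))]  # stand-in
    for epoch in range(10):                                        # 10 further training periods
        for lr_img, hr_img in finetune_loader:
            loss = adaptive_loss(model(lr_img), hr_img)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()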
step 6, saving the model and parameters;
after the network training in step 6 is completed, all parameters in the network are saved; thereafter, inputting an image of any size yields the super-resolution reconstruction result;
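Solidifying and deploying the parameters is then a plain save/load cycle (file name assumed; model class from the earlier sketch):

    import torch

    torch.save(model.state_dict(), 'ir_sr_final.pth')       # solidify final parameters

    deployed = LightweightIRSR(scale=2)
    deployed.load_state_dict(torch.load('ir_sr_final.pth'))
    deployed.eval()
    with torch.no_grad():
        ir = torch.rand(1, 1, 240, 320)                      # a low-resolution infrared frame
        sr = deployed(ir)                                    # (1, 1, 480, 640) at scale 2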
convolution, splicing, up-/down-sampling and other such operations are algorithms well known to those skilled in the art; their specific flows and methods can be found in corresponding textbooks or technical literature.
The lightweight infrared super-resolution self-adaptive reconstruction method can obtain a higher-quality super-resolution reconstruction effect and, owing to its lightweight structure, has a smaller parameter count than existing complex networks and can be applied to various mobile devices; the feasibility and superiority of the method are further verified by computing the relevant indexes of the images obtained with existing methods; a comparison of the relevant indexes of the prior art and the proposed method is shown in fig. 6;
based on the same inventive concept as the above super-resolution image reconstruction method, embodiments of the present application further provide an electronic device, which may specifically be a desktop computer, portable computer, edge computing device, tablet computer, smart phone or the like having signal transmission, floating-point operation and storage capability; as shown in fig. 7, the electronic device may be composed of a processor, a memory and a communication interface as its main components;
the processor may be a general-purpose processor such as a central processing unit (CPU), digital signal processor (DSP), graphics processor (GPU), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application; a general-purpose processor may be a microprocessor or any conventional processor; the steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor;
the memory serves as a nonvolatile computer-readable storage medium for storing nonvolatile software programs, nonvolatile computer-executable programs and modules; the memory may include at least one type of storage medium, for example random access memory (RAM), static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, optical disk and the like; the memory is any medium, without limitation, that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer; the memory in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, configured to store program instructions or data;
the communication interface may be used for data transmission between the computing device and other computing devices, terminals or imaging devices, and may employ a general-purpose protocol such as Universal Serial Bus (USB), universal synchronous/asynchronous receiver-transmitter (USART), Controller Area Network (CAN) and the like; the communication interface may be, without limitation, any interface and associated communication protocol for transferring data between different devices; the communication interface in the embodiments of the present application may also be optical communication or any other means or protocol capable of implementing information transmission;
the invention also provides a computer-readable storage medium for lightweight infrared super-resolution self-adaptive reconstruction, which may be the computer-readable storage medium contained in the device of the above embodiment, or a stand-alone computer-readable storage medium not assembled into a device; the computer-readable storage medium stores one or more programs used by one or more processors to perform the methods described herein;
it should be noted that although the electronic device shown in fig. 7 shows only a memory, a processor and a communication interface, in a specific implementation the apparatus also includes other devices necessary for proper operation; meanwhile, as will be appreciated by those skilled in the art, the apparatus may further include components implementing other additional functions according to specific needs; furthermore, the apparatus may also include only the devices necessary to implement the embodiments of the present invention, and not necessarily all of the devices shown in fig. 7.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention, and the present invention is not limited thereto; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of the technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. A lightweight infrared super-resolution self-adaptive reconstruction method, characterized in that the method comprises the following steps:
step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module;
step 2, preparing a data set: preparing an infrared image data set, and performing analog downsampling and data augmentation on the infrared image data set so as to perform subsequent network training;
step 3, training a network model: training the infrared image super-resolution reconstruction model by inputting the data set prepared in step 2 into the network model constructed in step 1;
step 4, minimizing the loss function and selecting optimal evaluation indexes: minimizing the loss function between the network output image and the label; when the number of training iterations reaches a set threshold or the loss value falls within a set range, pre-training of the model parameters is considered complete and the parameters are saved; at the same time, optimal evaluation indexes are selected to measure the accuracy of the algorithm and evaluate system performance;
step 5, fine-tuning the model: preparing several additional infrared image data sets and training to fine-tune the model, obtaining better model parameters and further improving its generalization ability, so that the final model maintains good reconstruction quality across infrared imagers of various models;
step 6, saving the model: solidifying the finally determined model parameters; when infrared image super-resolution reconstruction is needed, the image is input directly into the network to obtain the final reconstructed image.
2. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, characterized in that: the input initialization layer of the infrared image super-resolution reconstruction model in step 1 is a single convolution layer that maps the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four self-adaptive image feature processing units; specifically, each self-adaptive image feature processing unit consists of a first convolution layer, a self-attention layer and a second convolution layer, wherein the self-attention layer comprises linear feature disassembly, a self-attention mechanism, a relative position coding layer, a first fully connected layer, a second fully connected layer and feature recombination; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel reorganization layer.
3. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the self-attention mechanism in step 1.
4. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, characterized in that: in step 1, the transformer module consists of two normalization layers and two summation operations combining an efficient global-local multi-head self-attention (EGLMSA) and a multi-layer perceptron (MLP); the efficient global-local multi-head self-attention layer extracts the global context and the local context respectively, the global context being critical for semantic segmentation of complex urban scenes while local information is critical for preserving rich spatial detail, so the proposed effective global-local attention constructs two parallel branches. The local branch is a relatively shallow structure that uses two parallel convolution layers to extract the local context, then adds two batch normalization operations before the final summation operation. The global branch first deploys a depthwise convolution to reduce the image resolution, compressing computation and memory; the result is then input to layer normalization, and three linear projections produce the three vectors Q, K and V by linearly transforming the input word vectors X, each matrix W being obtainable by learning, a transformation that improves the fitting ability of the model. The resulting Q can be understood as the information to be queried, K as the queried vector, and V as the value obtained; the Q and K vectors undergo matrix multiplication and pass through a convolution layer, a Softmax activation function and an instance normalization operation, after which the obtained attention and the V vectors undergo matrix multiplication. Finally, the global context of the global branch and the local context of the local branch are further aggregated to generate the global-local context, and depthwise convolution, batch normalization and standard convolution are used to represent the global-local context at fine granularity.
5. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, characterized in that: the semantic segmentation dataset in step 2 uses the MFNet dataset; the pictures of the training set and the validation set are cut into several block pictures, the resolution and dimensions of each block picture being the initial resolution and dimensions; and the classes of the segmented pictures are given semantic segmentation labels.
6. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, characterized in that: in step 3, the MFNet dataset is used as the semantic segmentation dataset in the pre-training process; visible light color images and infrared images are obtained by separating the four image channels of the dataset, images with complex scenes, many details and complete categories are selected as training samples, the remaining images serve as test set samples, and the visible light images and infrared images are respectively input into the network for training.
7. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, characterized in that: in step 4, the Dice Loss function is selected as the loss function in the training process; the choice of loss function affects the quality of the model: it should truly reflect the difference between the predicted value and the true value and correctly feed back the quality of the model.
8. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, characterized in that: in step 5, SODA is used when fine-tuning the model parameters.
CN202211692350.XA 2022-12-28 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method Active CN116402679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211692350.XA CN116402679B (en) 2022-12-28 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211692350.XA CN116402679B (en) 2022-12-28 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method

Publications (2)

Publication Number Publication Date
CN116402679A (en) 2023-07-07
CN116402679B CN116402679B (en) 2024-05-28

Family

ID=87016642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211692350.XA Active CN116402679B (en) 2022-12-28 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method

Country Status (1)

Country Link
CN (1) CN116402679B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196959A (en) * 2023-11-08 2023-12-08 华侨大学 Self-attention-based infrared image super-resolution method, device and readable medium
CN117495681A (en) * 2024-01-03 2024-02-02 国网山东省电力公司济南供电公司 Infrared image super-resolution reconstruction system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092330A (en) * 2021-11-19 2022-02-25 长春理工大学 Lightweight multi-scale infrared image super-resolution reconstruction method
CN114331831A (en) * 2021-11-19 2022-04-12 长春理工大学 Light-weight single-image super-resolution reconstruction method
CN115113303A (en) * 2022-06-21 2022-09-27 天津大学 Early warning method and device for extreme weather of Ernino based on meta learning
CN115131214A (en) * 2022-08-31 2022-09-30 南京邮电大学 Indoor aged person image super-resolution reconstruction method and system based on self-attention
CN115496658A (en) * 2022-09-25 2022-12-20 桂林理工大学 Lightweight image super-resolution reconstruction method based on double attention mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092330A (en) * 2021-11-19 2022-02-25 长春理工大学 Lightweight multi-scale infrared image super-resolution reconstruction method
CN114331831A (en) * 2021-11-19 2022-04-12 长春理工大学 Light-weight single-image super-resolution reconstruction method
CN115113303A (en) * 2022-06-21 2022-09-27 天津大学 Early warning method and device for extreme weather of Ernino based on meta learning
CN115131214A (en) * 2022-08-31 2022-09-30 南京邮电大学 Indoor aged person image super-resolution reconstruction method and system based on self-attention
CN115496658A (en) * 2022-09-25 2022-12-20 桂林理工大学 Lightweight image super-resolution reconstruction method based on double attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIKANG DING et al.: "TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers", IEEE, 27 September 2022 (2022-09-27), pages 8585-8594 *
夏威: "基于卷积神经网络的热红外图像语义分割研究", 中国优秀硕士学位论文全文数据库 信息科技辑, 31 July 2020 (2020-07-31), pages 138 - 1118 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196959A (en) * 2023-11-08 2023-12-08 华侨大学 Self-attention-based infrared image super-resolution method, device and readable medium
CN117196959B (en) * 2023-11-08 2024-03-01 华侨大学 Self-attention-based infrared image super-resolution method, device and readable medium
CN117495681A (en) * 2024-01-03 2024-02-02 国网山东省电力公司济南供电公司 Infrared image super-resolution reconstruction system and method
CN117495681B (en) * 2024-01-03 2024-05-24 国网山东省电力公司济南供电公司 Infrared image super-resolution reconstruction system and method

Also Published As

Publication number Publication date
CN116402679B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN112507898B (en) Multi-modal dynamic gesture recognition method based on lightweight 3D residual error network and TCN
CN111652321B (en) Marine ship detection method based on improved YOLOV3 algorithm
CN109191382B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN111882002A (en) MSF-AM-based low-illumination target detection method
CN117478978B (en) Method, system and equipment for generating movie video clips through texts
CN113326930B (en) Data processing method, neural network training method, related device and equipment
CN116402679B (en) Lightweight infrared super-resolution self-adaptive reconstruction method
CN113095254B (en) Method and system for positioning key points of human body part
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN111028153A (en) Image processing and neural network training method and device and computer equipment
CN111931779A (en) Image information extraction and generation method based on condition predictable parameters
CN113538527B (en) Efficient lightweight optical flow estimation method, storage medium and device
CN115393690A (en) Light neural network air-to-ground observation multi-target identification method
Hui et al. Two-stage convolutional network for image super-resolution
CN115565039A (en) Monocular input dynamic scene new view synthesis method based on self-attention mechanism
CN115272082A (en) Model training method, video quality improving method, device and computer equipment
CN114529793A (en) Depth image restoration system and method based on gating cycle feature fusion
CN117218031B (en) Image reconstruction method, device and medium based on DeqNLNet algorithm
CN115409697A (en) Image processing method and related device
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
CN118096922A (en) Method for generating map based on style migration and remote sensing image
CN117274115A (en) Image enhancement method and system based on multi-scale sparse transducer network
CN117576483A (en) Multisource data fusion ground object classification method based on multiscale convolution self-encoder
CN114764880B (en) Multi-component GAN reconstructed remote sensing image scene classification method
CN114022362B (en) Image super-resolution method based on pyramid attention mechanism and symmetric network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant