
CN113205018A - High-resolution image building extraction method based on multi-scale residual error network model - Google Patents

High-resolution image building extraction method based on multi-scale residual error network model

Info

Publication number
CN113205018A
Authority
CN
China
Prior art keywords
building
image
layer
network
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110434612.1A
Other languages
Chinese (zh)
Other versions
CN113205018B (en)
Inventor
眭海刚
杜卓童
李强
段志强
肖昶
王海涛
王挺
程旗
冯文卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110434612.1A priority Critical patent/CN113205018B/en
Publication of CN113205018A publication Critical patent/CN113205018A/en
Application granted granted Critical
Publication of CN113205018B publication Critical patent/CN113205018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a high-resolution image building extraction method based on a multi-scale residual network model. First, the types and characteristics of typical buildings in high-resolution remote sensing images are analyzed, a data augmentation strategy is designed to meet the large data demand of the deep learning network, and the hyperparameter ratio of the training and validation sample sets is determined. Second, a dense shortcut structure is incorporated into the basic unit of the symmetric U-Net architecture: a residual mapping unit is designed and the arrangement of the convolutional layers within the basic unit is improved, which facilitates model training. In addition, the input stage of the improved network is designed as a feature pyramid structure so that image features are learned at multiple scales; combined with the designed residual skip connections, multi-scale features are fused and the building segmentation result is refined through multi-level residual unit operations. This strengthens the reuse of multi-level features across network layers, effectively improves gradient propagation through the network, and accelerates model convergence.

Description

High-resolution image building extraction method based on multi-scale residual error network model
Technical Field
The invention relates to the technical field of remote sensing applications, and in particular to a high-resolution image building extraction method based on a multi-scale residual network model.
Background
Buildings are among the basic elements of urban structure; their intelligent extraction is a vital task for urban planning, monitoring and management, and has important application value for urban development analysis. Compared with medium- and low-resolution remote sensing imagery, detecting buildings in high-resolution remote sensing imagery has clear advantages. The imagery contains richer ground-object information: a man-made building that appears as a point target in medium- or low-resolution imagery becomes a distinct area target occupying many pixels in a high-resolution image. The spatial structure, texture and other characterization information of ground objects of the same type is also richer, and better reflects their local characteristics and internal detail differences. However, while high-resolution imagery brings rich detail, it also amplifies interference that is subtle and negligible at lower resolutions, creating new factors that affect building detection. Although the improved spatial resolution alleviates the mixed-pixel problem of low-resolution sensors, differences in building materials cause large variations in spectral response within the area corresponding to a single building. Buildings in complex backgrounds have variable structures and staggered heights, are easily confused with surrounding objects such as trees and roads, and show pronounced "different objects with the same spectrum" and "same object with different spectra" phenomena, which increases the difficulty of building extraction. Elevation discontinuities, increased occlusion in the image, and the shadows cast by complex building structures make detection even more challenging.
In recent years, with advances in computing power and deep learning algorithms, convolutional neural networks have gradually surpassed the best traditional algorithms in object detection, recognition and image semantic segmentation, and end-to-end deep network training has greatly improved the accuracy of building extraction from remote sensing imagery. Among these approaches, deep encoder-decoder networks have been widely applied to building extraction. The encoder mainly extracts deep abstract features; most encoders adopt classical models such as VGGNet, ResNet and DenseNet, discard the fully connected layers, and apply repeated pooling to the input image patches, so that the intermediate feature maps are compressed several times. The decoder learns from the features produced by the encoder and restores the image to obtain the building prediction map. Most current networks use upsampling and skip connections, passing features learned in shallow layers to higher layers to recover the detail lost during image restoration. However, simply connecting the feature maps extracted by the encoder directly to the symmetric decoder does not fully exploit the feature information at multiple levels, and the detailed position information of building targets is still not effectively recovered. In addition, deep models often place excessive demands on GPU memory and hardware; improving extraction efficiency while balancing accuracy and computational cost is another major problem.
Disclosure of Invention
To address the problems in the prior art, the invention adopts a strategy of multi-level feature integration and multi-scale feature fusion and designs a multi-scale residual-connection network model. It aims to solve the loss of building detail caused by pooling operations in deep networks and exploits rich multi-scale context information to achieve more precise building segmentation. At the same time, the deep network model constructed by the invention reduces the number of training parameters and the memory requirements.
The technical scheme of the invention is as follows: a high-resolution image building extraction method based on a multi-scale residual network model, comprising the following steps:
Step 1, analyzing the image characteristics of buildings of different types and styles in typical building areas of a high-resolution remote sensing image, expanding the samples with a data augmentation strategy, and determining the hyperparameter ratio of the training set to the validation set;
Step 2, designing the overall model structure of a multi-scale residual-connection deep network based on a basic symmetric convolutional neural network structure, a dense shortcut structure, residual skip connections and a feature pyramid input structure, comprising the following sub-steps:
step 2.1, the multi-scale residual-connection deep network as a whole comprises an encoder part and a decoder part;
step 2.2, the encoder part adopts a feature pyramid input structure to obtain images at m different scales; each image is processed by a convolutional layer so that the input of the next level matches the size of the feature map output by the previous level; the convolutional feature map output at the previous scale is merged with the image feature map produced by the convolutional layer at the current scale, and the result serves as the input of the next level, which is then processed by a residual mapping unit and a max-pooling layer;
the residual mapping unit comprises two branches: the main branch comprises several convolutional layer units and the shortcut branch comprises one convolutional layer unit, where each convolutional layer unit consists of a convolutional layer, a rectified linear unit and a batch normalization layer; let the input be x, denote the main branch mapping as F(x) and the shortcut branch mapping as G(x); the output of the residual mapping unit is then given by equation (2):

y = F(x) + G(x)    (2)
step 2.3, the decoder part comprises upsampling layers and residual mapping units corresponding to those of the encoder part;
step 2.4, at each scale, the deep feature map output by the encoder part is combined, via a residual skip connection, with the feature map obtained by the upsampling layer of the decoder part at the corresponding scale;
step 2.5, finally, the output of the decoder part is processed by a convolutional layer, and the two-dimensional feature map is converted into a classification map by a Sigmoid activation layer;
Step 3, training the multi-scale residual-connection deep network with the training sample set from Step 1, obtaining the optimal multi-scale residual-connection deep network model by means of the validation sample set, and finally performing high-resolution image building extraction on the test set with the optimal model.
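As an illustration of Step 3, the following is a minimal training sketch, assuming a binary cross-entropy loss and the Adam optimizer (neither is specified in the text) and selecting the weights with the lowest validation loss; `model` stands for any network that maps a batch of 512 × 512 image tiles to per-pixel building probabilities, such as the one designed in Step 2.

```python
# Minimal sketch of Step 3: train on the training set, select the best model on
# the validation set, then apply it to the test set. Loss and optimizer are assumptions.
import copy
import torch
import torch.nn as nn

def train_and_select(model, train_loader, val_loader, epochs=50, lr=1e-3, device="cuda"):
    model = model.to(device)
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state = float("inf"), None
    for epoch in range(epochs):
        model.train()
        for image, label in train_loader:              # label: (B, 1, 512, 512) in {0, 1}
            image, label = image.to(device), label.to(device).float()
            optimizer.zero_grad()
            loss = criterion(model(image), label)
            loss.backward()
            optimizer.step()
        # Validation pass: keep the weights with the lowest average loss.
        model.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for image, label in val_loader:
                image, label = image.to(device), label.to(device).float()
                val_loss += criterion(model(image), label).item() * image.size(0)
                n += image.size(0)
        if val_loss / n < best_val:
            best_val, best_state = val_loss / n, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model  # apply to the test set, e.g. building mask = (model(x) > 0.5)
```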
Further, Step 1 is specifically implemented as follows:
(1) analyzing the characteristics of typical building areas in the high-resolution remote sensing image:
(a) multi-storey residential areas of brick-concrete structure: the buildings are regularly arranged and well planned, with many storeys; buildings within the same residential area have a uniform layout and a uniform style;
(b) high-rise residential districts, single high-rise office buildings and commercial buildings with clear frame structures on the roof: the streets are neat, adjacent buildings are tall and cast long, narrow shadows, the spacing between buildings is large, and each building differs in height and appearance;
(c) suburban buildings: roofs are sparsely distributed, the houses are low and irregular in shape, interlocking in a jagged pattern and connected to one another;
(d) villas: arranged in order, all detached buildings of consistent length and width with short shadows; roof shapes and exterior wall materials are consistent, and each villa has its own garden;
(2) expanding the training sample set with several data augmentation strategies:
(a) randomly cropping the input image and the output label image;
(b) randomly rotating the input image and the output label image by an angle t_r ∈ [-5, 5];
(c) multiplying each band of the input image by a random value n, where n ∈ [0.5, 1];
(d) randomly flipping the input image and the output label image horizontally and vertically;
(3) data partition: dividing the data set of building instances and other surface objects covering various urban, suburban and rural areas into a training set, a validation set and a test set with a training-to-validation ratio of 5:1, and cutting the data samples into input images of size 512 × 512 for subsequent model training and evaluation of the training effect.
Further, in step 2.2, the convolution kernels of the convolutional layers in the main branch are of two sizes, 3 × 3 and 1 × 1, the stride is set to 1 and the padding to 1, and the convolutional layer in the shortcut branch uses 1 × 1 kernels.
Further, in step 2.2, a feature pyramid network input structure is adopted, and images at five different scales, namely 512 × 512 × 3, 256 × 256 × 3, 128 × 128 × 3, 64 × 64 × 3 and 32 × 32 × 3, are fed to convolutional layers for image feature learning at different scales.
Based on a multi-scale residual-connection deep network model, the invention studies a method for extracting individual buildings from high-resolution remote sensing imagery, and is characterized by the following:
(1) existing open-source building data sets mostly come from the same sensor or from images with similar acquisition times, so the data distributions of the test and training images are very close and deep network models generalize poorly; by analyzing building characteristics in multi-source, multi-temporal imagery, a data augmentation strategy is designed and the data hyperparameters for deep network training are determined, improving the generalization ability of the deep convolutional neural network;
(2) drawing on the advantages of the U-Net and ResNet architectures, the basic convolutional unit of the deep network is improved: a residual mapping unit is designed on the basis of the U-Net basic unit and the arrangement of its convolutional layers is improved, ensuring gradient propagation while improving training efficiency;
(3) because network layers with different receptive fields have different capabilities for representing geometric and semantic information, a feature pyramid input structure is designed to learn image features at multiple scales; and since directly connecting the feature maps extracted by the encoder to the symmetric decoder cannot fully fuse the feature information, a residual skip connection method is further developed to enhance the reuse of multi-level features across network layers.
Drawings
FIG. 1 is a flowchart of the method for extracting individual buildings from high-resolution remote sensing images based on the multi-scale residual-connection deep network model.
Fig. 2 shows the basic structural unit of residual mapping in the building extraction network, where "Conv" denotes a convolutional layer, "ReLU" a rectified linear unit, and "BN" a batch normalization layer.
Fig. 3 shows the designed "Res Path" skip connection scheme. Unlike the direct connection of encoder features to decoder features, multi-level residual unit operations are used to fuse the multi-level features.
Detailed Description
The invention provides a high-resolution image building extraction method based on a multi-scale residual network model. First, the types and characteristics of typical buildings in high-resolution remote sensing images are analyzed, a data augmentation strategy is designed to meet the large data demand of the deep learning network, and the hyperparameter ratio of the training and validation sample sets is determined. Second, a dense shortcut structure, namely a residual mapping unit, is incorporated into the basic unit of the symmetric U-Net architecture, and the arrangement of the convolutional layers within the basic unit is improved; this ensures gradient propagation, avoids vanishing gradients in the deep network, improves training efficiency, and facilitates model training. In addition, the input stage of the improved network is designed as a feature pyramid structure so that image features are learned at multiple scales; combined with the designed residual skip connections, multi-scale features are fused and the building segmentation result is refined through multi-level residual unit operations, strengthening the reuse of multi-level features across network layers, improving gradient propagation, and accelerating model convergence. The method adaptively learns and analyzes features from shallow local features to deep abstract features, and adopts the strategies of multi-level feature integration and multi-scale feature fusion to obtain rich multi-scale context information and achieve more precise building segmentation.
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and an embodiment. The flowchart is shown in Fig. 1, and the workflow of the embodiment comprises the following steps:
Step 1: analyze the image characteristics of buildings of different types and styles in typical building areas of the high-resolution remote sensing image, design a data augmentation strategy to meet the large data demand of the deep learning network, and determine the hyperparameter ratio of the training and validation sample sets. The sample-data hyperparameters for deep network training are obtained as follows:
(1) Data analysis. Analyze the different characteristics of typical building areas in the high-resolution remote sensing image:
(a) multi-storey residential areas of brick-concrete structure: generally regularly arranged and well planned, with many storeys; buildings within the same residential area have a fairly uniform layout and an essentially consistent style;
(b) high-rise residential districts, single high-rise office buildings and commercial buildings with clear frame structures on the roof: the streets are neat, adjacent buildings are tall and cast long, narrow shadows, the spacing between buildings is large, and each building differs in height and appearance;
(c) suburban buildings: roofs are sparsely distributed, mostly low houses with irregular shapes, interlocking in a jagged pattern and connected to one another;
(d) villas: generally arranged in order, all detached buildings of consistent length and width with short shadows; roof shapes and exterior wall materials are essentially uniform, and each villa has its own garden.
(2) Data augmentation. To give the deep convolutional neural network good generalization ability and avoid overfitting, a large number of training samples covering diverse building characteristics is required, so the training sample set is expanded with several augmentation strategies (a minimal code sketch follows this step):
(a) randomly crop the input image and the output label image;
(b) randomly rotate the input image and the output label image by an angle t_r ∈ [-5, 5];
(c) multiply each band of the input image by a random value n ∈ [0.5, 1];
(d) randomly flip the input image and the output label image horizontally and vertically.
(3) Data partition. Divide the data sets of building instances and other surface objects covering various urban, suburban and rural areas into a training set, a validation set and a test set with a training-to-validation ratio of about 5:1, and cut the samples into 512 × 512 input images for subsequent model training and evaluation of the training effect.
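A minimal sketch of augmentation strategies (a)-(d), assuming the image tile and its label mask are NumPy arrays (image H × W × C, label H × W); the rotation unit (degrees), interpolation orders and boundary handling are assumptions not fixed by the text.

```python
# Joint augmentation of an image tile and its label mask, following (a)-(d) above.
import numpy as np
from scipy.ndimage import rotate

def augment(image, label, crop=512, rng=None):
    if rng is None:
        rng = np.random.default_rng()

    # (a) random crop to crop x crop
    h, w = label.shape
    y0 = rng.integers(0, h - crop + 1)
    x0 = rng.integers(0, w - crop + 1)
    image = image[y0:y0 + crop, x0:x0 + crop]
    label = label[y0:y0 + crop, x0:x0 + crop]

    # (b) random rotation, t_r in [-5, 5] (assumed to be degrees)
    angle = rng.uniform(-5, 5)
    image = rotate(image, angle, reshape=False, order=1, mode="reflect")
    label = rotate(label, angle, reshape=False, order=0, mode="reflect")

    # (c) multiply each band by a random factor n in [0.5, 1]
    factors = rng.uniform(0.5, 1.0, size=image.shape[-1])
    image = image * factors

    # (d) random horizontal / vertical flips
    if rng.random() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]
    if rng.random() < 0.5:
        image, label = image[::-1, :], label[::-1, :]
    return image.copy(), label.copy()
```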
Step 2: design the basic structural unit of the convolutional layers of the multi-scale residual-connection deep network model based on a symmetric structure and dense shortcut connections, promoting information propagation through the network, avoiding vanishing gradients in the deep network, and facilitating model training. The improved residual mapping unit of the network is obtained as follows:
(1) Design the basic structural unit of the network's convolutional layers, i.e., residual shortcut blocks. The residual mapping basic unit has two branches, which promotes information propagation, accelerates model convergence and facilitates training (a minimal code sketch follows this subsection):
(a) Main branch design:
The convolution kernel sizes (kernel) of the convolutional layers are 3 × 3 and 1 × 1, the stride parameter (stride) is set to 1 and the padding parameter (padding) to 1, and each convolution is followed by a rectified linear unit (ReLU) and a batch normalization layer (BN). Let the input image be x and denote the main branch mapping as F(x), as shown in the left branch of Fig. 2. The ReLU activation function is given by equation (1):

f(x) = max(0, x)    (1)
(b) Shortcut branch design:
The convolutional layer uses a 1 × 1 kernel and is followed by a ReLU layer and a BN layer; the shortcut branch mapping is denoted G(x), as shown in the right branch of Fig. 2.
(2) The output of the residual mapping basic unit is given by equation (2):

y = F(x) + G(x)    (2)
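A minimal PyTorch sketch of the residual mapping unit described above: the main branch F(x) stacks Conv-ReLU-BN units with 3 × 3 kernels, stride 1 and padding 1, the shortcut branch G(x) uses a single 1 × 1 convolution unit, and the output is their sum as in equation (2). The number of units per branch and the channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualMappingUnit(nn.Module):
    """Two-branch residual block: y = F(x) + G(x), cf. Fig. 2 and equation (2)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Main branch F(x): two 3x3 Conv -> ReLU -> BN units, stride 1, padding 1.
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),
        )
        # Shortcut branch G(x): a single 1x1 Conv -> ReLU -> BN unit.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.main(x) + self.shortcut(x)  # y = F(x) + G(x)
```

Because the shortcut is a learned 1 × 1 mapping rather than a plain identity, the two branches can change the channel dimension while still behaving like a residual connection.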
Step 3: learn image features at different scales with a feature pyramid network input structure, perform multi-scale feature fusion with the designed residual skip connections, and refine the building segmentation result through multi-level residual unit operations to improve network training performance. Multi-scale feature representations are fused via the residual skip connections as follows:
(1) A feature pyramid network input structure is adopted: images at five different scales, namely 512 × 512 × 3, 256 × 256 × 3, 128 × 128 × 3, 64 × 64 × 3 and 32 × 32 × 3, are taken as inputs of the convolutional layers for feature learning at each scale, and the input of each level is kept consistent in size with the feature map output by the previous level.
(2) The convolutional feature map output at the previous scale is merged with the image input at the current scale, and the merged result is used as the input of a new convolutional layer for feature learning, thereby achieving multi-scale feature fusion.
(3) A residual skip connection (Res Path) is designed to replace the conventional practice of directly connecting the encoder feature map to the symmetric decoder part: the output feature map of the encoder is first passed through residual unit operations (equation (2)) and then connected to the corresponding upsampled feature of the decoder, so that the low-level feature map is integrated with the symmetric high-level feature map into a new tensor for subsequent computation and processing. The structure of the residual skip connection is shown in Fig. 3 (a minimal code sketch follows).
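As a hedged sketch of the Res Path idea in (3), the skip connection below refines the encoder feature map with a chain of residual mapping units (reusing the ResidualMappingUnit class from the previous sketch) before it is concatenated with the decoder's upsampled feature map; the number of units per path is an assumption and may differ from the configuration shown in Fig. 3 and Table 1.

```python
import torch.nn as nn

class ResPath(nn.Module):
    """Skip connection that refines the encoder feature map with a chain of
    residual mapping units before it meets the decoder (cf. equation (2) and Fig. 3)."""
    def __init__(self, in_ch, out_ch, n_units=2):
        super().__init__()
        units = [ResidualMappingUnit(in_ch, out_ch)]
        units += [ResidualMappingUnit(out_ch, out_ch) for _ in range(n_units - 1)]
        self.path = nn.Sequential(*units)

    def forward(self, x):
        return self.path(x)

# In the decoder, the refined skip is fused with the upsampled feature, e.g.:
#   fused = torch.cat([res_path(encoder_feat), upsampled_decoder_feat], dim=1)
```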
Step 4: design the overall model structure of the multi-scale residual-connection deep network based on the basic convolutional structural unit, the residual skip connection and the feature pyramid input structure; the specific parameters are given in Table 1. The overall structure of the multi-scale residual-connection deep network model is obtained as follows:
(1) Network encoder structure design:
The encoder part adopts the strategies of multi-level feature integration and multi-scale feature fusion and is built by stacking successive convolutional layers, residual mapping units (shortcut blocks) and max-pooling layers.
(2) Network decoder structure design:
The decoder part is built by stacking the corresponding upsampling layers, dense shortcut units (dense shortcut blocks) and convolutional layers; a Sigmoid activation layer converts the two-dimensional deep feature map into a classification map.
(3) Based on the residual skip connection designed in Step 3 (3), five additional Res Path skip connections are added to the five pairs of the down-sampling/up-sampling symmetric structure, fusing the pixel position information of the shallow encoder feature maps with the semantic information of the upsampled decoder feature maps, thereby refining the building segmentation result and improving network training performance (an end-to-end code sketch follows Table 1).
Table 1. Overall model parameters of the multi-scale residual-connection deep network
[Table 1 is provided as an image in the original patent document and is not reproduced here.]
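The following is a compact end-to-end sketch in the spirit of the overall structure described above, assembling the pyramid image inputs, the residual mapping units and Res Paths from the earlier sketches, max-pooling in the encoder, and upsampling plus a Sigmoid head in the decoder. The channel widths, the use of transposed convolutions for upsampling, and the use of four skip connections (the text describes five Res Paths) are assumptions rather than the configuration of Table 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleResidualNet(nn.Module):
    def __init__(self, img_ch=3, base_ch=64):
        super().__init__()
        chs = [base_ch * 2 ** i for i in range(5)]               # assumed widths 64..1024
        # One convolutional branch per pyramid scale to process the resized image.
        self.img_convs = nn.ModuleList(
            [nn.Conv2d(img_ch, c, 3, padding=1) for c in chs])
        # Encoder: one residual mapping unit per level; below the top level the input
        # merges the previous pooled features with the current-scale image features.
        self.enc = nn.ModuleList(
            [ResidualMappingUnit(c if i == 0 else chs[i - 1] + c, c)
             for i, c in enumerate(chs)])
        self.pool = nn.MaxPool2d(2)
        # Res Path skip connections for the encoder/decoder pairs in this sketch.
        self.res_paths = nn.ModuleList([ResPath(c, c) for c in chs[:4]])
        # Decoder: transposed-conv upsampling followed by residual mapping units.
        self.up = nn.ModuleList(
            [nn.ConvTranspose2d(chs[i + 1], chs[i], 2, stride=2) for i in range(4)])
        self.dec = nn.ModuleList(
            [ResidualMappingUnit(chs[i] * 2, chs[i]) for i in range(4)])
        self.head = nn.Conv2d(chs[0], 1, kernel_size=1)          # final 1x1 conv

    def forward(self, x):
        # Feature-pyramid inputs: the image at full, 1/2, 1/4, 1/8 and 1/16 resolution.
        pyramid = [x] + [F.interpolate(x, scale_factor=0.5 ** i, mode="bilinear",
                                       align_corners=False) for i in range(1, 5)]
        skips, feat = [], None
        for i in range(5):
            img_feat = self.img_convs[i](pyramid[i])
            inp = img_feat if i == 0 else torch.cat([feat, img_feat], dim=1)
            feat = self.enc[i](inp)
            if i < 4:
                skips.append(self.res_paths[i](feat))            # refine the skip via Res Path
                feat = self.pool(feat)
        for i in reversed(range(4)):
            feat = self.up[i](feat)
            feat = self.dec[i](torch.cat([skips[i], feat], dim=1))
        return torch.sigmoid(self.head(feat))                    # per-pixel building probability

# Usage (shapes only):
#   model = MultiScaleResidualNet()
#   prob = model(torch.randn(1, 3, 512, 512))                   # -> (1, 1, 512, 512)
```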
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications, additions or substitutions to the described embodiments without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (4)

1. A high-resolution image building extraction method based on a multi-scale residual network model, characterized by comprising the following steps:
Step 1, analyzing the image characteristics of buildings of different types and styles in typical building areas of a high-resolution remote sensing image, expanding the samples with a data augmentation strategy, and determining the hyperparameter ratio of the training set to the validation set;
Step 2, designing the overall model structure of a multi-scale residual-connection deep network based on a basic symmetric convolutional neural network structure, a dense shortcut structure, residual skip connections and a feature pyramid input structure, comprising the following sub-steps:
step 2.1, the multi-scale residual-connection deep network as a whole comprises an encoder part and a decoder part;
step 2.2, the encoder part adopts a feature pyramid input structure to obtain images at m different scales; each image is processed by a convolutional layer so that the input of the next level matches the size of the feature map output by the previous level; the convolutional feature map output at the previous scale is merged with the image feature map produced by the convolutional layer at the current scale, and the result serves as the input of the next level, which is then processed by a residual mapping unit and a max-pooling layer;
the residual mapping unit comprises two branches: the main branch comprises several convolutional layer units and the shortcut branch comprises one convolutional layer unit, where each convolutional layer unit consists of a convolutional layer, a rectified linear unit and a batch normalization layer; let the input be x, denote the main branch mapping as F(x) and the shortcut branch mapping as G(x); the output of the residual mapping unit is then given by equation (2):

y = F(x) + G(x)    (2)
step 2.3, the decoder part comprises upsampling layers and residual mapping units corresponding to those of the encoder part;
step 2.4, at each scale, the deep feature map output by the encoder part is combined, via a residual skip connection, with the feature map obtained by the upsampling layer of the decoder part at the corresponding scale;
step 2.5, finally, the output of the decoder part is processed by a convolutional layer, and the two-dimensional feature map is converted into a classification map by a Sigmoid activation layer;
Step 3, training the multi-scale residual-connection deep network with the training sample set from Step 1, obtaining the optimal multi-scale residual-connection deep network model by means of the validation sample set, and finally performing high-resolution image building extraction on the test set with the optimal model.
2. The high-resolution image building extraction method based on a multi-scale residual network model according to claim 1, characterized in that Step 1 is implemented as follows:
(1) analyzing the characteristics of typical building areas in the high-resolution remote sensing image:
(a) multi-storey residential areas of brick-concrete structure: the buildings are regularly arranged and well planned, with many storeys; buildings within the same residential area have a uniform layout and a uniform style;
(b) high-rise residential districts, single high-rise office buildings and commercial buildings with clear frame structures on the roof: the streets are neat, adjacent buildings are tall and cast long, narrow shadows, the spacing between buildings is large, and each building differs in height and appearance;
(c) suburban buildings: roofs are sparsely distributed, the houses are low and irregular in shape, interlocking in a jagged pattern and connected to one another;
(d) villas: arranged in order, all detached buildings of consistent length and width with short shadows; roof shapes and exterior wall materials are consistent, and each villa has its own garden;
(2) expanding the training sample set with several data augmentation strategies:
(a) randomly cropping the input image and the output label image;
(b) randomly rotating the input image and the output label image by an angle t_r ∈ [-5, 5];
(c) multiplying each band of the input image by a random value n, where n ∈ [0.5, 1];
(d) randomly flipping the input image and the output label image horizontally and vertically;
(3) data partition: dividing the data set of building instances and other surface objects covering various urban, suburban and rural areas into a training set, a validation set and a test set with a training-to-validation ratio of 5:1, and cutting the data samples into input images of size 512 × 512 for subsequent model training and evaluation of the training effect.
3. The high-resolution image building extraction method based on a multi-scale residual network model according to claim 1, characterized in that: in step 2.2, the convolution kernels of the convolutional layers in the main branch are of sizes 3 × 3 and 1 × 1, the stride is set to 1 and the padding to 1, and the convolutional layer in the shortcut branch uses 1 × 1 kernels.
4. The high-resolution image building extraction method based on a multi-scale residual network model according to claim 3, characterized in that: in step 2.2, a feature pyramid network input structure is adopted, and images at five different scales, namely 512 × 512 × 3, 256 × 256 × 3, 128 × 128 × 3, 64 × 64 × 3 and 32 × 32 × 3, are fed to convolutional layers for image feature learning at different scales.
CN202110434612.1A 2021-04-22 2021-04-22 High-resolution image building extraction method based on multi-scale residual error network model Active CN113205018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110434612.1A CN113205018B (en) 2021-04-22 2021-04-22 High-resolution image building extraction method based on multi-scale residual error network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110434612.1A CN113205018B (en) 2021-04-22 2021-04-22 High-resolution image building extraction method based on multi-scale residual error network model

Publications (2)

Publication Number Publication Date
CN113205018A true CN113205018A (en) 2021-08-03
CN113205018B CN113205018B (en) 2022-04-29

Family

ID=77027900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110434612.1A Active CN113205018B (en) 2021-04-22 2021-04-22 High-resolution image building extraction method based on multi-scale residual error network model

Country Status (1)

Country Link
CN (1) CN113205018B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657480A (en) * 2021-08-13 2021-11-16 江南大学 Clothing analysis method based on feature fusion network model
CN113902792A (en) * 2021-11-05 2022-01-07 长光卫星技术有限公司 Building height detection method and system based on improved RetinaNet network and electronic equipment
CN114580564A (en) * 2022-03-21 2022-06-03 滁州学院 Dominant tree species remote sensing classification method and classification system based on unmanned aerial vehicle image
CN115393868A (en) * 2022-08-18 2022-11-25 中化现代农业有限公司 Text detection method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934153A (en) * 2019-03-07 2019-06-25 张新长 Building extracting method based on gate depth residual minimization network
US20200111214A1 (en) * 2018-10-03 2020-04-09 Merck Sharp & Dohme Corp. Multi-level convolutional lstm model for the segmentation of mr images
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111898543A (en) * 2020-07-31 2020-11-06 武汉大学 Building automatic extraction method integrating geometric perception and image understanding

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200111214A1 (en) * 2018-10-03 2020-04-09 Merck Sharp & Dohme Corp. Multi-level convolutional lstm model for the segmentation of mr images
CN109934153A (en) * 2019-03-07 2019-06-25 张新长 Building extracting method based on gate depth residual minimization network
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111898543A (en) * 2020-07-31 2020-11-06 武汉大学 Building automatic extraction method integrating geometric perception and image understanding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
He Daiyi et al., "Building Extraction from Remote Sensing Images Based on Improved Mask-RCNN", Computer Systems & Applications *
Liu Yifan et al., "Building Extraction from Remote Sensing Images Using Deep Residual Networks", Remote Sensing Information *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657480A (en) * 2021-08-13 2021-11-16 江南大学 Clothing analysis method based on feature fusion network model
CN113902792A (en) * 2021-11-05 2022-01-07 长光卫星技术有限公司 Building height detection method and system based on improved RetinaNet network and electronic equipment
CN113902792B (en) * 2021-11-05 2024-06-11 长光卫星技术股份有限公司 Building height detection method, system and electronic equipment based on improved RETINANET network
CN114580564A (en) * 2022-03-21 2022-06-03 滁州学院 Dominant tree species remote sensing classification method and classification system based on unmanned aerial vehicle image
CN115393868A (en) * 2022-08-18 2022-11-25 中化现代农业有限公司 Text detection method and device, electronic equipment and storage medium
CN115393868B (en) * 2022-08-18 2023-05-26 中化现代农业有限公司 Text detection method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113205018B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN113205018B (en) High-resolution image building extraction method based on multi-scale residual error network model
CN117078943B (en) Remote sensing image road segmentation method integrating multi-scale features and double-attention mechanism
CN111126202B (en) Optical remote sensing image target detection method based on void feature pyramid network
CN115601549B (en) River and lake remote sensing image segmentation method based on deformable convolution and self-attention model
CN111259906B (en) Method for generating remote sensing image target segmentation countermeasures under condition containing multilevel channel attention
CN113850825A (en) Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN111598045B (en) Remote sensing farmland change detection method based on object spectrum and mixed spectrum
CN110263705A (en) Towards two phase of remote sensing technology field high-resolution remote sensing image change detecting method
CN114187450B (en) Remote sensing image semantic segmentation method based on deep learning
CN113486764B (en) Pothole detection method based on improved YOLOv3
CN115223063B (en) Deep learning-based unmanned aerial vehicle remote sensing wheat new variety lodging area extraction method and system
CN113345082A (en) Characteristic pyramid multi-view three-dimensional reconstruction method and system
CN114092697B (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN113657326A (en) Weed detection method based on multi-scale fusion module and feature enhancement
CN110334719B (en) Method and system for extracting building image in remote sensing image
CN115223017B (en) Multi-scale feature fusion bridge detection method based on depth separable convolution
CN109740485A (en) Reservoir or dyke recognition methods based on spectrum analysis and depth convolutional neural networks
CN114663439A (en) Remote sensing image land and sea segmentation method
CN112883887B (en) Building instance automatic extraction method based on high spatial resolution optical remote sensing image
CN108256464A (en) High-resolution remote sensing image urban road extracting method based on deep learning
CN114821069A (en) Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features
CN115359366A (en) Remote sensing image target detection method based on parameter optimization
CN110147780B (en) Real-time field robot terrain identification method and system based on hierarchical terrain
Guo et al. Monitoring the spatiotemporal change of Dongting Lake wetland by integrating Landsat and MODIS images, from 2001 to 2020
Liu et al. A new multi-channel deep convolutional neural network for semantic segmentation of remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant