Deep Learning Meets SAR


Xiao Xiang Zhu, Fellow, IEEE, Sina Montazeri, Mohsin Ali, Yuansheng Hua, Member, IEEE, Yuanyuan
Wang, Member, IEEE, Lichao Mou, Member, IEEE, Yilei Shi, Member, IEEE, Feng Xu, Senior Member, IEEE,
Richard Bamler, Fellow, IEEE

Abstract—This is the pre-acceptance version; to read the final version please go to IEEE Geoscience and Remote Sensing Magazine on IEEE Xplore.

Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in Synthetic Aperture Radar (SAR) data processing, despite successful first attempts, its huge potential remains locked. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing the special characteristics of SAR data, review the state of the art of deep learning applied to SAR in depth, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet under-exploited research field and to pave the way for the use of deep learning in big SAR data processing workflows.

Index Terms—Benchmarks, deep learning, despeckling, Interferometric Synthetic Aperture Radar (InSAR), object detection, parameter inversion, Synthetic Aperture Radar (SAR), SAR-optical data fusion, terrain surface classification.

The work of X. Zhu is jointly supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. [ERC-2016-StG-714087], Acronym: So2Sat), by the Helmholtz Association through the Framework of Helmholtz AI - Local Unit "Munich Unit @Aeronautics, Space and Transport (MASTr)" and the Helmholtz Excellent Professorship "Data Science in Earth Observation - Big Data Fusion for Urban Research", and by the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI lab "AI4EO" (Grant number: 01DD20001). (Corresponding author: Xiao Xiang Zhu)
X. Zhu, M. Ali, Y. Hua, and L. Mou are with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Germany, and with Data Science in Earth Observation (SiPEO, former: Signal Processing in Earth Observation), Technical University of Munich (TUM), Germany (e-mail: xiaoxiang.zhu@dlr.de).
S. Montazeri and Y. Wang are with DLR-IMF, Wessling, Germany.
Y. Shi is with the Chair of Remote Sensing Technology (LMF), TUM, 80333 Munich, Germany.
F. Xu is with the Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, China.
R. Bamler is with DLR-IMF, 82234 Wessling, Germany, and TUM-LMF, 80333 Munich, Germany.

I. MOTIVATION

In recent years, deep learning [1] has been developed at a dramatic pace, achieving great success in many fields. Unlike conventional algorithms, deep learning-based methods commonly employ hierarchical architectures, such as deep neural networks, to extract feature representations of raw data for numerous tasks. For instance, convolutional neural networks (CNNs) are capable of learning low- and high-level features from raw images with stacks of convolutional and pooling layers, and then applying the extracted features to various computer vision tasks, such as large-scale image recognition [2], object detection [3], and semantic segmentation [4].

Inspired by numerous successful applications in the computer vision community, the use of deep learning in remote sensing is now obtaining wide attention [5]. As first attempts in Synthetic Aperture Radar (SAR), deep learning-based methods have been adopted for a variety of tasks, including terrain surface classification [6], object detection [7], parameter inversion [8], despeckling [9], specific applications in Interferometric SAR (InSAR) [10], and SAR-optical data fusion [11].

For terrain surface classification from SAR and Polarimetric SAR (PolSAR) images, effective feature extraction is essential. These features are extracted based on expert domain knowledge and are usually applicable to only a small number of cases and data sets. Deep learning feature extraction, however, has proved to overcome, to some degree, both of the aforementioned issues [6]. For SAR target detection, conventional approaches mainly rely on template matching, where specific templates are created manually [12] to classify different categories, or on traditional machine learning approaches, such as Support Vector Machines (SVMs) [13], [14]; in contrast, modern deep learning algorithms aim at applying deep CNNs to extract discriminative features automatically for target recognition [7]. For parameter inversion, deep learning models are employed to learn the latent mapping function from SAR images to the estimated parameters, e.g., sea ice concentration [8]. Regarding despeckling, conventional methods often rely on hand-designed filters and may suffer from mistakenly eliminating sharp features when denoising. Furthermore, the development of joint analysis of SAR and optical images has been motivated by the capacity of deep networks to extract features from both types of images. For applications in InSAR, only a few studies have been carried out, such as the work described in [10]. However, these algorithms neglect the special characteristics of phase and simply use an out-of-the-box deep learning-based model.

Despite the first successes, and unlike the evaluation of optical data, the huge potential of deep learning in SAR and InSAR remains locked. For example, to the best knowledge of the authors, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. This paper aims at stimulating more research in this interesting yet under-exploited research field.

In the remainder of this paper, Section II first introduces the most commonly used deep learning models in remote sensing. Section III describes the specific characteristics of SAR data that have to be taken into account to exploit the full potential of SAR combined with deep learning. Section IV details recent advances in the utilization of deep learning for different SAR applications, as outlined above. Section V reviews the existing benchmark data sets for different applications of SAR and their limitations. Finally, Section VI concludes current research and gives an overview of promising future directions.
II. INTRODUCTION TO RELEVANT DEEP LEARNING MODELS AND CONCEPTS

In this section, we briefly review relevant deep learning algorithms originally proposed for visual data processing that are widely used in state-of-the-art research on deep learning in SAR. In addition, we mention the latest developments in deep learning that are not yet widely applied to SAR but may help create the next generation of its algorithms. Fig. 1 gives an overview of the deep learning models we discuss in this section.

Before discussing deep learning algorithms, we would like to stress that the importance of high-quality benchmark datasets in deep learning research cannot be overstated. Especially in supervised learning, the knowledge that can be learned by the model is bounded by the information present in the training dataset. For example, the MNIST dataset [25] played a key role in Yann LeCun's seminal paper on convolutional neural networks and gradient-based learning [26]. Similarly, there would be no AlexNet [27], the network that kick-started the current deep learning renaissance, without the ImageNet dataset [28], which contains over 14 million images and 22,000 classes. ImageNet has been such an important part of deep learning research that, even more than 10 years after its publication, it is still used as a standard benchmark to evaluate the performance of CNNs for image classification.

A. Deep Learning Models

The main principle of deep learning models is to encode input data into effective feature representations for target tasks. To exemplify how a deep learning framework works, we take the autoencoder as an example: it first maps the input data to a latent representation via a trainable nonlinear mapping and then reconstructs the input through a reverse mapping. The reconstruction error is usually defined as the Euclidean distance between the input and the reconstructed input. The parameters of autoencoders are optimized during the backpropagation step by gradient descent based optimizers, such as stochastic gradient descent (SGD), RMSProp [29], and Adam [30].

1) Convolutional Neural Networks (CNNs): With the success of AlexNet in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012), where it scored a top-5 test error of 15.3% compared to 26.2% for the second best entry, CNNs have attracted worldwide attention and are now used for many image understanding tasks, such as image classification, object detection, and semantic segmentation. AlexNet consists of five convolutional layers, three max-pooling layers, and three fully-connected layers. One of the key innovations of AlexNet was the use of GPUs, which made it possible to train such large networks on huge datasets without using supercomputers. In just two years, VGGNet [2] overtook AlexNet in performance by achieving a 6.8% top-5 test error in ILSVRC-2014; the main difference was that VGGNet used only 3x3-sized convolutional kernels, which enabled it to have a larger number of channels and, in turn, to capture more diverse features.

ResNet [31], U-Net [32], and DenseNet [33] were the next major CNN architectures. The main feature of all these architectures is the idea of connecting not only neighboring layers but any two layers in the network by using skip connections. This helps reduce the loss of information across networks, mitigates the problem of vanishing gradients, and allows the design of deeper networks. U-Net is one of the most commonly used image segmentation networks. It has an autoencoder-based architecture in which skip connections concatenate features from the first layer to the last, from the second to the second-to-last, and so on; this way, fine-grained information is passed from the initial layers to the final layers. U-Net was initially proposed for medical image segmentation, where data labeling is a big problem. The authors used heavy data augmentation on the input data, making it possible to learn from only a few hundred annotated samples. In ResNet, skip connections are used within individual blocks rather than across the whole network. Since its initial proposal, ResNet has seen many architectural tweaks, and even after 4-5 years its variants are consistently among the top scorers on ImageNet. In DenseNet, every layer is connected to all preceding layers, reducing the size of the network, albeit at the cost of memory usage. For more detailed explanations of different CNN models, interested readers are referred to [34]. These CNN models have also proved their worth in SAR processing tasks, e.g., see [35], [36], [37]. For more examples and details of CNNs in SAR, we refer our readers to Section IV.

2) Recurrent Neural Networks (RNNs): Besides CNNs, RNNs [38] are another major class of deep networks. Their main building blocks are recurrent units, which take the current input and the output of the previous state as input. They provide state-of-the-art results for processing data of variable length, such as text and time series. Their weights can be replaced with convolutional kernels for visual processing tasks, such as image captioning and predicting future frames or points in visual time series. Long short-term memory (LSTM) [39] is one of the most popular RNN architectures: its cells can store values from past instances without being severely affected by the problem of vanishing gradients. Just as for any other kind of time series data, RNNs are a natural choice for processing SAR time series, e.g., see [40].

3) GANs: Proposed by Ian Goodfellow et al. [41], GANs are among the most popular and exciting inventions in the field of deep learning. Based on game-theoretic principles, they consist of two networks called a generator and a discriminator. The generator's objective is to learn a latent space through which it can generate samples from the same distribution as the training data, while the discriminator tries to learn to distinguish whether a sample comes from the generator or from the training data. This very simple mechanism is responsible for many cutting-edge algorithms in various applications, e.g., generating artificial photo-realistic images and videos, super-resolution, and text-to-image synthesis. In the SAR domain, for example, GANs have already been successfully used in cloud removal applications [42], [43]. The reader is referred to Section IV for more examples.
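Before moving on, the following minimal, illustrative PyTorch sketch makes the autoencoder training principle from Section II-A concrete: a nonlinear encoder maps the input to a latent code, a decoder reconstructs it, and the Euclidean (MSE) reconstruction error is minimized with Adam. The layer sizes and the random toy batch are arbitrary demonstration choices, not taken from any of the cited works.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: nonlinear encoder to a latent code, decoder back."""
    def __init__(self, n_in=784, n_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(),
                                     nn.Linear(128, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                                     nn.Linear(128, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # Euclidean reconstruction error

x = torch.rand(64, 784)            # toy batch of flattened image patches
for _ in range(10):                # a few gradient-descent steps
    optimizer.zero_grad()
    loss = criterion(model(x), x)  # reconstruction error
    loss.backward()                # backpropagation
    optimizer.step()
```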
Fig. 1: A selection of relevant deep learning models. Sources of the images: VGG [15], ResNet [16], U-Net [17], LSTM [18], RNN [19], VAE [20], GAN [21], CGNN [22], RGNN [23], and DeepRL [24].

B. Supervised, Unsupervised and Reinforcement Learning

1) Supervised Learning: Most popular deep learning models fall under the category of supervised deep learning, i.e., they need labelled datasets to learn their objective functions. One of the big challenges of supervised learning is generalization, i.e., how well a trained model performs on test data. It is therefore vital that the training data truly represent the underlying distribution of the data, so that the model can handle unseen samples. If a model fits the training data well but fails on test data, this is called overfitting; the deep learning literature offers several techniques to avoid it, e.g., dropout [44].

2) Unsupervised Learning: Unsupervised learning refers to the class of algorithms where the training data do not contain labels. For instance, in classical data analysis, principal component analysis (PCA) [45] can be used to reduce the data dimension, followed by a clustering algorithm to group similar data points. In deep learning, generative models such as autoencoders, variational autoencoders (VAEs) [46], and Generative Adversarial Networks (GANs) [41] are popular techniques for unsupervised learning. Their primary goal is to generate output data from the same distribution as the input data. An autoencoder consists of an encoder part, which finds a compressed latent representation of the input, and a decoder part, which decodes that representation back to the original input. VAEs take autoencoders to the next level by learning the whole distribution, instead of just a single representation, at the end of the encoder part, which in turn can be used by the decoder to generate the whole distribution of outputs. The trick to learning this distribution is to learn the variance along with the mean of the latent representation at the encoder-decoder meeting point and to add a KL-divergence-based loss term to the standard reconstruction loss function of the autoencoder (see the sketch at the end of Section II-B).

3) Deep Reinforcement Learning (DeepRL): Reinforcement Learning (RL) tries to mimic human learning behavior, i.e., taking actions and then adjusting them for the future according to feedback from the environment. For example, young children learn to repeat or not repeat their actions based on the reaction of their parents. An RL model consists of an environment with states, actions to transition between those states, and a reward system for ending up in different states. The objective of the algorithm is to learn the best actions for given states using the feedback from the reward system. In classical RL algorithms, function approximators are used to calculate the probability of different actions in different states. DeepRL uses different types of neural networks to create these functions [47], [48]. Recently, DeepRL has received particular attention and popularity due to the success of Google DeepMind's AlphaGo [49], which defeated the Go board game world champion, a task considered impossible for computers until just a few years ago.
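As a concrete illustration of the VAE objective mentioned in Section II-B.2, here is a minimal, illustrative PyTorch sketch of the loss computation: the encoder is assumed to output a mean and a log-variance of the latent Gaussian, the reparameterization trick draws a latent sample, and a KL-divergence term is added to the standard reconstruction loss. All layer sizes are arbitrary demonstration values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 2 * 32)  # outputs mean and log-variance of a 32-d latent
dec = nn.Linear(32, 784)      # decodes a latent sample back to the input size

def vae_loss(x):
    mu, logvar = enc(x).chunk(2, dim=-1)  # latent mean and log-variance
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
    recon = torch.sigmoid(dec(z))         # reconstructed input in [0, 1]
    recon_loss = F.mse_loss(recon, x, reduction="sum")  # reconstruction term
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

x = torch.rand(64, 784)  # toy batch
print(vae_loss(x))
```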
C. Relevant Deep Learning Concepts

1) Automatic Machine Learning (AutoML): Deep networks have many hyperparameters to choose from, for example, the number of layers, kernel sizes, the type of optimizer, skip connections, and the like. There are billions of possible combinations of these parameters, and given the high computational, time, and energy costs of evaluating each candidate, it is hard to find the best performing network even from among a few hundred candidates. In the case of deep learning, the objective of AutoML is mainly to find the most efficient and highest performing deep network for a given dataset and task. The first major attempt in this field was by Zoph et al. [50], who used DeepRL to find the optimum CNN for image classification. In their system, an RNN creates CNN architectures and, based on their classification results, proposes changes to them. This process loops until the optimum architecture is found. The algorithm was able to find networks competitive with the state of the art, but it required over 800 GPUs, which is unrealistic for practical applications. Recently, there have been many new developments in the AutoML field that have made it possible to perform such tasks in more intelligent and efficient ways. More details about the field of neural architecture search can be found in [51]. Furthermore, AutoML has already been successfully applied in SAR for PolSAR classification [52]. The method shows great potential for segmentation and classification tasks in particular.

2) Geometric Deep Learning – Graph Neural Networks (GNNs): Apart from well-structured image data, there is a large amount of unstructured data in real life, e.g., knowledge graphs and social networks, that cannot be directly processed by a deep CNN. Usually, these data are represented in the form of graphs, where each node represents an entity and edges delineate their mutual relations. To learn from unstructured data, geometric deep learning has been attracting increasing attention; the most commonly used architecture is the GNN, which has also proven successful in dealing with structured data. Using the terminology of graphs, the nodes of a graph can be regarded as feature descriptions of entities, and their edges are established by measuring their relations or distances and are encoded in an adjacency matrix. Once a graph is constructed, messages can be propagated among the nodes by simply performing matrix multiplication (a minimal sketch follows below). Accordingly, [53] proposed Graph Convolutional Networks (GCNs), characterized by utilizing graph convolutions, and [45] accelerated the process. Moreover, recurrent units in Recurrent Graph Neural Networks (RGNNs) [54], [55] have also proven successful in learning from graphs. The usefulness of GNNs in SAR is still to be properly explored; [56] is one of the only attempts to do so.
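To illustrate the message passing just described, the following is a minimal, illustrative NumPy sketch of a single graph-convolution step in the spirit of [53]: node features are propagated by multiplying them with a normalized adjacency matrix and a trainable weight matrix. The tiny four-node graph and all matrix sizes are made up for demonstration.

```python
import numpy as np

# Toy graph: 4 nodes, undirected edges encoded in an adjacency matrix.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(1) ** -0.5)  # degree normalization
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalized adjacency

X = np.random.rand(4, 8)   # node feature descriptions (4 nodes, 8 features)
W = np.random.rand(8, 16)  # trainable weights of one graph-convolution layer

H = np.maximum(A_norm @ X @ W, 0)  # one propagation step with ReLU
print(H.shape)  # (4, 16): new node representations
```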
III. POSSIBLE PITFALLS

To develop tailored deep learning architectures and prepare suitable training datasets for SAR or InSAR tasks, it is important to understand that SAR data are different from optical remote sensing data, not to mention images downloaded from the internet. In this section, we discuss the special characteristics of, and possible pitfalls encountered with, SAR data when applying deep learning to them.

What makes SAR data and SAR data processing by neural networks unique? SAR data are substantially different from optical imagery in many respects. These are a few points to be considered when transferring CNN experience and expertise from optical to SAR data:

• Dynamic Range. Depending on their spatial resolution, the dynamic range of SAR images can be up to 90 dB (TerraSAR-X high resolution spotlight data with a resolution of about 1 m). Moreover, the distribution is extremely asymmetric, with the majority of pixels in the low amplitude range (distributed scatterers) and a long tail representing bright discrete scatterers, particularly in urban areas. Standard CNNs are not able to handle such dynamic ranges and, hence, most approaches apply dynamic range compression as a preprocessing step. In [57], the authors first take only amplitude values from 0 to 255 and then subtract the mean value of each image. In [11], [58], normalization is performed as a pre-processing step, which compresses the dynamic range significantly.

• Signal Statistics. In order to retrieve features from SAR (amplitude or intensity) images, the speckle statistics must be considered. Speckle is a multiplicative, rather than an additive, phenomenon. This has consequences: while the optimum estimator of the radar brightness of a homogeneous image patch under speckle is a simple moving average (i.e., a convolution, as in the additive noise case), other detectors of edges and low-level features that are optimum under additive Gaussian noise may no longer be optimum in the case of SAR. A popular example is Touzi's CFAR edge detector [59] for SAR images, which uses the ratio of two spatial averages over adjacent windows. This operation cannot be emulated by the first layer of a standard CNN.
Some studies use a logarithmic mapping of the SAR images prior to feeding them into a CNN [60], [9]. This turns speckle into an additive random variable and, as a side effect, reduces the dynamic range. But still, a single convolutional layer can only emulate approximations to optimum SAR feature estimators. It could be valuable to supplement the original log-SAR image with a few lowpass-filtered and logarithmized versions as input to the CNN. Another approach is to apply some sophisticated speckle reduction filter before entering the CNN, e.g., non-local averaging [61], [62], [63].

• Imaging Geometry. The SAR image coordinates range and azimuth are not arbitrary coordinates like East and North or x and y, but rather reflect the peculiarities of the image generation process. Layover always occurs at the near-range side and shadow always at the far-range side of an object. That means that data augmentation by rotation of SAR images would lead to nonsense imagery that would never be generated by a SAR.

• The Complex Nature of SAR Data. The most valuable information of SAR data lies in its phase. This applies to SAR image formation, which takes place in the complex signal domain, as well as to polarimetric, interferometric, and tomographic SAR data processing. This means that the entire CNN must be able to handle complex numbers. For the convolution operation this is trivial. The nonlinear activation function and the loss function, however, require thorough consideration. Depending on whether the activation function acts on the real and imaginary parts of the signal independently, or only on its magnitude, and on where a bias is added, the phase will be distorted to different degrees.
If we use polarimetric SAR data for land cover or target classification, a nonlinear processing of the phase is even desirable, because the phase between different polarimetric channels has physical meaning and, hence, contributes to the classification process.
In SAR interferometry and tomography, however, the absolute phase has no meaning, i.e., the CNN must be invariant to an arbitrary phase offset. Assume some interferometric input signal x to a CNN and the output signal CNN(x) with phase

  φ̂ = ∠CNN(x).   (1)

Any constant phase offset φ₀ does not change the meaning of the interferogram. Hence, we require an invariance that we refer to as "phase linearity" (valid at least in expectation):

  CNN(x e^{jφ₀}) = CNN(x) e^{jφ₀}.   (2)

This linearity is violated, for example, if the activation function is applied to the real and imaginary parts separately, or if a bias is added to the complex numbers.
Another point to consider in regression-type InSAR CNN processing (e.g., for noise reduction) is the loss function. If the quantity of interest is not the complex number itself but its phase, the loss function must be able to handle the cyclic nature of phases. It may also be advantageous for the loss function to be independent, at least to a certain degree, of the signal magnitude, to relieve the CNN from modelling the magnitude. A loss function that meets these requirements is, for example,

  L = |E[e^{j(∠CNN(x) − ∠y)}]|,   (3)

where y is the reference signal (a minimal sketch of this loss follows after this list).
Some authors use magnitude and phase, rather than real and imaginary parts, as input to the CNN. This approach is not invariant to a phase offset, either. The interpretation of a phase function as a real-valued function forces the CNN to disregard the sharp discontinuities at the ±π transitions, whose positions are inconsequential. A standard CNN would pounce on these, interpreting them as edges.

• Simulation-based Training and Validation Data? The prevailing lack of ground truth for regression-type tasks, like speckle reduction or InSAR denoising, might tempt us to use simulated SAR data for training and validation of neural networks. However, this bears the risk that our networks will learn models that are far too simplified. Unlike in the optical imaging field, where highly realistic scenes can be simulated, e.g., by PC games, the simulation of SAR data is more of a scientific topic without the power of commercial companies and a huge market behind it. SAR simulators focus on specific scenarios, e.g., vegetation (only distributed scatterers considered) or persistent (point) scatterers. The most advanced simulators are probably the ones for computing radar backscatter signatures of single military objects, like vessels. To our knowledge, though, there is no simulator available that can, e.g., generate realistic interferometric data of rugged terrain with layover, spatially varying coherence, and diverse scattering mechanisms. Often, simplified scattering assumptions are made, e.g., that speckle is multiplicative. Even this is not true; pure Gaussian scattering can only be found for quite homogeneous surfaces and low-resolution SARs. As soon as the resolution increases, the chance of a few dominating scatterers in a resolution cell increases as well, and the statistics become substantially different from those of fully developed speckle.
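As an illustration of the phase-linearity requirement of Eq. (2) and the magnitude-independent loss of Eq. (3), below is a minimal, illustrative NumPy sketch; the identity "network" stands in for a complex-valued CNN, and the random interferogram patch is synthetic demonstration data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic complex interferogram patch (magnitude * unit phasor).
x = rng.rayleigh(1.0, (64, 64)) * np.exp(1j * rng.uniform(-np.pi, np.pi, (64, 64)))

def cnn(z):
    """Stand-in for a phase-linear complex-valued network (here: identity)."""
    return z

# Check phase linearity, Eq. (2): CNN(x * e^{j*phi0}) == CNN(x) * e^{j*phi0}.
phi0 = 0.7
lhs = cnn(x * np.exp(1j * phi0))
rhs = cnn(x) * np.exp(1j * phi0)
print(np.allclose(lhs, rhs))  # True for a phase-linear operator

def phase_loss(pred, ref):
    """Eq. (3): magnitude of the mean unit phasor of the phase difference.
    Equals 1 for a perfect (offset-free) phase match, so in practice one
    would minimize 1 - L during training."""
    return np.abs(np.mean(np.exp(1j * (np.angle(pred) - np.angle(ref)))))

print(phase_loss(cnn(x), x))  # 1.0 for identical phases
```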
IV. RECENT ADVANCES IN DEEP LEARNING APPLIED TO SAR

In this section, we provide an in-depth review of deep learning methods applied to SAR data from six perspectives: terrain surface classification, object detection, parameter inversion, despeckling, SAR interferometry (InSAR), and SAR-optical data fusion. For each of these six applications, notable developments are presented in chronological order, and their advantages and disadvantages are reported. Each subsection is concluded with a brief summary. It is worth mentioning that the application of deep learning to SAR image formation is not explicitly treated here. For SAR focusing, we have to distinguish between general-purpose focusing and the imaging of objects with a priori known properties, like sparsity. General-purpose algorithms produce data for applications like land use and land cover (LULC) classification, glacier monitoring, biomass estimation, or interferometry. These are complex-valued focused data that retain all the information contained in the raw data. General-purpose focusing has a well-defined system model and requires a sequence of Fast Fourier Transforms (FFTs) and phasor multiplications, i.e., linear operations like matrix-vector multiplications. For decades, optimal algorithms have been developed to perform these operations at the highest speed and with diffraction-limited accuracy. There is no reason why deep neural networks should perform better or faster than this gold standard. If we want to introduce prior knowledge about the imaged objects, however, specialized focusing algorithms may be beneficially learned by neural networks. But even then, it might make sense to focus the raw data first by a standard algorithm and apply deep learning for post-processing. In [64], a CNN is trained to focus sparse military targets; but even in this approach, the raw data are partially focused by FFT before entering the CNN.

A. Terrain Surface Classification

As an important direction of SAR applications, terrain surface classification using PolSAR images is rapidly advancing with the help of deep learning. Regarding feature extraction, most conventional methods rely on exploring physical scattering properties [65] and texture information [66] in SAR images. However, these features are mainly human-designed based on specific problems and the characteristics of the data sources. Compared to conventional methods, deep learning is superior in terrain surface classification due to its capability of automatically learning discriminative features. Moreover, deep learning approaches, such as CNNs, can effectively extract not only polarimetric characteristics but also the spatial patterns of PolSAR images [6]. Some of the most notable deep learning techniques for PolSAR image classification are reviewed in the following.

Xie et al. [67] first applied deep learning to terrain surface classification using PolSAR images. They employed a stacked autoencoder (SAE) to automatically learn deep features from PolSAR data and then fed them to a softmax classifier. Remarkable improvements in both classification accuracy and visual effect proved that this method can effectively learn a comprehensive feature representation for classification purposes.

Instead of simply applying an SAE, Geng et al. [70] proposed a deep convolutional autoencoder (DCAE) for automatically extracting features and performing classification. The first layer of the DCAE is a hand-crafted convolutional layer, where the filters are pre-defined, such as gray-level co-occurrence matrices and Gabor filters. The second layer of the DCAE performs a scale transformation, which integrates correlated neighboring pixels to reduce speckle. Following these two hand-crafted layers, a trained SAE, similar to [67], is attached for learning more abstract features. Tested on high-resolution single-polarization TerraSAR-X images, the method achieved remarkable classification accuracy.

Based on the DCAE, Geng et al. [68] proposed a framework called deep supervised and contractive neural network (DSCNN) for SAR image classification, which introduces histogram of oriented gradient (HOG) descriptors. In addition, a supervised penalty is designed to capture relevant information between features and labels, and a contractive restriction, which can enhance local invariance, is employed in the subsequent trainable autoencoder layers. An example of applying DSCNN to TerraSAR-X data from a small area in Norway is shown in Fig. 2. Compared to other algorithms, the capability of DSCNN to achieve a highly accurate and noise-free classification map is evident.

In addition to the aforementioned methods, many studies integrate SAE models with conventional classification algorithms for terrain surface classification. Hou et al. [73] proposed an SAE combined with superpixels for PolSAR image classification. Multiple layers of the SAE are trained on a pixel-by-pixel basis. Superpixels are formed based on Pauli-decomposed pseudo-color images. The outputs of the SAE are used as features in the final step of k-nearest-neighbor clustering of superpixels. Zhang et al. [74] applied a stacked sparse AE to PolSAR image classification by taking into account local spatial information. Qin et al. [75] applied adaptive boosting of RBMs to PolSAR image classification. Zhao et al. [76] proposed a discriminant DBN (DisDBN) for SAR image classification, in which discriminant features are learned by combining ensemble learning with a deep belief network in an unsupervised manner.

Moreover, taking into account that most current deep learning methods aim at exploiting features either from the polarization information or from the spatial information of PolSAR images, Gao et al. [72] proposed a dual-branch CNN to learn features from both perspectives for terrain surface classification. This method is built on two feature extraction channels: one to extract polarization features from the 6-channel real matrix, and the other to extract spatial features from a Pauli decomposition. Next, the extracted features are combined using two parallel fully connected layers and finally fed to a softmax layer for classification. The detailed architecture of this network is illustrated in Fig. 3.

Different variations of CNNs have been used for terrain surface classification as well. In [77], Zhou et al. first extracted a 6-channel covariance matrix and then fed it to a trainable CNN for PolSAR image classification. Wang et al. [78] proposed a fully convolutional network integrated with sparse and low-rank subspace representations for classifying PolSAR images. Chen et al. [79] improved CNN performance by incorporating expert knowledge of target scattering mechanism interpretation and polarimetric feature mining. In a more recent work [80], He et al. proposed combining features learned from nonlinear manifold embedding with the application of a fully convolutional network (FCN) to the input PolSAR images; the final classification was carried out in an ensemble approach by an SVM. In [81], the authors focused on the computational efficiency of deep learning methods, proposing the use of lightweight 3D CNNs. They showed that a classification accuracy comparable to other CNN methods was achievable while significantly reducing the number of learned parameters and therefore gaining computational efficiency.

Apart from these single-image classification schemes using CNNs, the use of time series of SAR images for crop classification has been shown in [40], [82]. The authors of both papers experimented with Recurrent Neural Network (RNN)-based architectures to exploit the temporal dependency of multi-temporal SAR images to improve classification accuracy.

A unique approach for tackling PolSAR classification was recently proposed in [52], where for the first time the authors utilized an AutoML technique to find the optimum CNN architecture for each dataset. The approach takes into account the complex nature of PolSAR images, is cost-effective, and achieves high classification accuracy [52].

Most of the aforementioned methods rely primarily on preprocessing or transforming the raw complex-valued data into features in the real domain and then inputting them into a common CNN, which constrains the possibility of directly learning features from the raw data. To tackle this problem, Zhang et al. [83] proposed a novel complex-valued CNN (CV-CNN) specifically designed to process the complex values in PolSAR data, i.e., the off-diagonal elements of a coherency or covariance matrix. The CV-CNN not only takes complex numbers as input but also employs complex weights and complex operations throughout its layers. A complex-valued backpropagation algorithm was also developed for CV-CNN training.
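To give an idea of how the complex weights of such a CV-CNN work, the following is a minimal, illustrative PyTorch sketch of a single complex-valued convolution implemented with two real-valued kernels; this is a generic construction for demonstration, not the exact layer of [83]. Note that no bias is added, in line with the phase-distortion caveat of Section III.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution (a+jb)*(w_r+j w_i) built from two real convolutions."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.conv_i = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)

    def forward(self, x_r, x_i):
        # (x_r + j x_i) * (w_r + j w_i) = (x_r*w_r - x_i*w_i) + j(x_r*w_i + x_i*w_r)
        y_r = self.conv_r(x_r) - self.conv_i(x_i)
        y_i = self.conv_i(x_r) + self.conv_r(x_i)
        return y_r, y_i

# Toy complex PolSAR patch: batch of 1, 3 complex channels, 32x32 pixels.
x_r, x_i = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
y_r, y_i = ComplexConv2d(3, 8)(x_r, x_i)
print(y_r.shape, y_i.shape)  # torch.Size([1, 8, 32, 32]) each
```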
Fig. 2: Classification maps obtained from a TerraSAR-X image of a small area in Norway [68]. Subfigures (a)-(f) depict the results of classification using SVM (accuracy = 78.42%), the sparse representation classifier (SRC) (accuracy = 85.61%), random forest (accuracy = 82.20%) [69], SAE (accuracy = 87.26%) [67], DCAE (accuracy = 94.57%) [70], and contractive AE (accuracy = 88.74%). Subfigures (g)-(i) show the combination of DSCNN with SVM (accuracy = 96.98%), with SRC (accuracy = 92.51%) [71], and with random forest (accuracy = 96.87%). Subfigures (j) and (k) represent the classification results of DSCNN (accuracy = 97.09%) and DSCNN followed by spatial regularization (accuracy = 97.53%), which achieve higher accuracy than the other methods.

Fig. 3: The architecture of the dual-branch deep convolutional neural network (Dual-CNN) for PolSAR image classification proposed in [72].

Other notable complex-valued deep learning approaches for classification using PolSAR images can be found in [84], [85], [86]. Differently from the previously mentioned works, which exploit the complex-valued nature of SAR images for PolSAR image classification, Huang et al. [87] have recently proposed a novel deep learning framework called Deep SAR-Net for land use classification, focusing on feature extraction from single-pol complex SAR images. The authors perform a feature fusion based on spatial features learned from intensity images and time-frequency features extracted from spectral analysis of the complex SAR images. Since the time-frequency features are highly relevant for distinguishing different backscattering mechanisms within SAR images, they gain accuracy in classifying man-made objects compared to the use of typical CNNs, which only focus on spatial information.
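To hint at what such time-frequency features look like, here is a minimal, illustrative NumPy sketch of a sub-look (sub-aperture) decomposition: the azimuth spectrum of a complex SAR patch is split into two halves, each transformed back into a lower-resolution sub-look image whose intensities can differ for non-stationary scatterers. The random patch is synthetic demonstration data; Deep SAR-Net [87] uses its own, more elaborate spectral analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
slc = rng.standard_normal((256, 256)) + 1j * rng.standard_normal((256, 256))

# Azimuth spectrum (FFT along axis 0) of the single-look complex patch.
spec = np.fft.fftshift(np.fft.fft(slc, axis=0), axes=0)

# Split the spectrum into two sub-apertures and go back to the image domain.
low, high = spec.copy(), spec.copy()
low[128:, :] = 0    # keep the lower half of the azimuth bandwidth
high[:128, :] = 0   # keep the upper half
sublook_1 = np.fft.ifft(np.fft.ifftshift(low, axes=0), axis=0)
sublook_2 = np.fft.ifft(np.fft.ifftshift(high, axes=0), axis=0)

# Intensities of the two sub-looks; their correlation is a simple
# time-frequency feature that helps separate scattering mechanisms.
i1, i2 = np.abs(sublook_1) ** 2, np.abs(sublook_2) ** 2
print(np.corrcoef(i1.ravel(), i2.ravel())[0, 1])
```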
Although not directly related to terrain surface classification, it is also worth mentioning that the combination of SAR and PolSAR images with feed-forward neural networks has been extensively used for sea ice classification. This topic is not treated any further in this section; the interested reader is referred to [88], [89], [90], [91], [92] for more information. Similar to the polarimetric signature, InSAR coherence provides information about physical scattering properties. In [35], interferometric volume decorrelation is used as a feature for forest/non-forest mapping, together with radar backscatter and incidence angle. The authors used bistatic TanDEM-X data, where temporal decorrelation can be neglected. They compared different architectures and concluded that CNNs outperform random forests, and U-Net [32] proved best for this segmentation task.

To summarize, it is apparent that deep learning-based SAR and PolSAR classification algorithms have advanced considerably in the past few years. Although at first the focus was on low-rank representation learning using SAE [67] and its modifications [70], later research focused on a multitude of issues relevant to SAR imagery, such as taking into account speckle [70], [68], preserving spatial structures [72], and handling the complex nature of the data [83], [84], [85], [87]. It can also be seen that the challenge of the scarcity of labeled data has driven researchers to use semi-supervised learning algorithms [86], although weakly supervised methods for semantic annotation, which have been proposed for high resolution optical data [93], have not been explicitly explored for classification tasks using SAR data. Furthermore, specific metric learning approaches to enhance class separability [94] could be adopted for SAR imagery in order to improve the overall classification accuracy. Finally, AutoML, an important field of machine learning that had not been exploited extensively by the remote sensing community, has found its application in PolSAR image classification [52].

B. Object Detection

Although various characteristics distinguish SAR images from optical RGB images, the SAR object detection problem is still analogous to optical image classification and segmentation in the sense that feature extraction from raw data is always the first and crucial step. Hence, given the success in the optical domain, there is no doubt that deep learning is one of the most promising ways to develop state-of-the-art SAR object detection algorithms.

The majority of earlier works on SAR object detection using deep learning consists of taking successful deep learning methods for optical object detection and applying them with minor tweaks to military vehicle detection (MSTAR dataset; see subsection V-C) or to ship detection on custom datasets. Even small-sized networks are easily able to achieve more than 90% test accuracy on most of these tasks.

The first attempt at military vehicle detection can be found in [7], where Chen et al. used an unsupervised sparse autoencoder to generate convolution kernels from random patches of a given input for a single-layer CNN, which generated features to train a softmax classifier for classifying military targets in the MSTAR dataset [96]. The experiments in [7] showed great potential for applying CNNs to SAR target recognition. With this discovery, Chen et al. [97] proposed A-ConvNets, a simple 5-layer CNN that was able to achieve state-of-the-art accuracy of about 99% on the MSTAR dataset.

Following this trend, more and more authors applied CNNs to the MSTAR dataset [37], [98], [99]. Morgan [37] successfully applied a modestly sized 3-layered CNN on MSTAR; building upon it, Wilmanski et al. [100] investigated the effects of initialization and optimizer selection on the final results. Ding et al. [98] investigated the capabilities of a CNN model combined with domain-specific data augmentation techniques (e.g., pose synthesis and speckle adding) for SAR object detection. Furthermore, Du et al. [99] proposed a displacement- and rotation-insensitive CNN, and claimed that data augmentation on training samples is necessary and critical in the pre-processing stage.

On the same dataset, instead of treating the CNN as an end-to-end model, Wagner [101] and, similarly, Gao [102] integrated a CNN and an SVM by first using the CNN to extract features and then feeding them to the SVM for the final prediction. Specifically, Gao et al. [103] added class separation information to the cross-entropy cost function as a regularization term, which, as they show, explicitly facilitates intra-class compactness and separability, in turn improving the quality of the extracted features. More recently, Furukawa [104] proposed VersNet, an encoder-decoder style segmentation network, to not only identify but also localize multiple objects in an input SAR image. Moreover, Zhang et al. [95] proposed an approach based on multi-aspect image sequences as a pre-processing step. In this contribution, they take into account backscattering signals from different viewing geometries, apply feature extraction using Gabor filters and dimensionality reduction, and eventually feed the results to a bidirectional LSTM model for the joint recognition of targets. The flowchart of this SAR ATR framework is illustrated in Fig. 4.

Besides vehicle detection, ship detection is another frequently tackled SAR object detection task. Early studies on applying deep learning models to ship detection [105], [106], [107], [108], [109] mainly consist of two stages: first cropping patches from the whole SAR image and then identifying whether the cropped patches belong to target objects using a CNN. Because of the fixed patch sizes, these methods were not robust enough to cater for variations in ship geometry, like size and shape. This problem was overcome by using region-based CNNs [110], [111], with creative use of skip connections and feature fusion techniques in the later literature. For example, Li et al. [112] fuse the features of the last three convolution layers before feeding them to a region proposal network (RPN). Kang et al. [113] proposed a contextual region-based network that fuses features from different levels. Meanwhile, to make the most of features at different resolutions, Jiao et al. [114] densely connected each layer to its subsequent layers and fed the features from all layers to separate RPNs to generate proposals; in the end, the best proposal was chosen based on an intersection-over-union score.
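Since several of the ship detectors above rank candidate boxes by their intersection-over-union score, a minimal, illustrative implementation of that score is given below; the box format (x0, y0, x1, y1) and the example coordinates are arbitrary demonstration choices.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1)."""
    x0 = max(box_a[0], box_b[0])
    y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2])
    y1 = min(box_a[3], box_b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A proposal overlapping a ground-truth ship bounding box.
print(iou((10, 10, 50, 30), (20, 12, 60, 32)))  # ~0.51
```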
Fig. 4: The flowchart of the multi-aspect-aware bidirectional approach for SAR ATR proposed in [95].

In more recent works on SAR object detection, scientists have explored many other interesting ideas to complement existing work. Dechesne et al. [115] proposed a multitask network that simultaneously learns to detect, classify, and estimate the length of ships. Mullissa et al. [116] showed that CNNs can be trained directly on complex-valued SAR data; Kazemi et al. [117] performed object classification using an RNN-based architecture directly on the received SAR signal instead of processed SAR images; and Rostami et al. [118] and Huang et al. [119] explored knowledge transfer, or transfer learning, from other domains to the SAR domain for SAR object detection.

Perhaps one of the more interesting recent works in this application area is building detection by Shahzad et al. [120]. They tackle the problem of Very High Resolution (VHR) SAR building detection using an FCN [121] architecture for feature extraction, followed by a CRF-RNN [122], which helps assign similar weights to neighboring pixels. This architecture produced building segmentation masks with up to 93% accuracy. An example of the detected buildings can be seen in Fig. 5, where the left subfigure shows the amplitude of the input TerraSAR-X image of Berlin and the right subfigure the predicted building mask. Another major contribution of that paper addresses the problem of the lack of training data by introducing an automatic annotation technique, which annotates the TomoSAR data using Open Street Map (OSM) data.

As an extension of the abovementioned work, Sun et al. [123] tackled the problem of individual building segmentation in large-scale urban areas. They propose a conditional GIS-aware network (CG-Net) that learns multi-level visual features and employs building footprint data to normalize these features for predicting building masks. Thanks to the novel network architecture and the large number of building labels automatically generated from an accurate DEM and GIS building footprints, this network achieves an F1 score of 75.08% for individual building segmentation. With the predicted building masks, large-scale levels-of-detail (LoD) 1 building models are reconstructed with a mean height error of 2.39 m.

Overall, deep learning has shown very good performance on existing SAR object detection tasks. There are two main challenges that the algorithm designer needs to keep in mind when tackling any SAR object detection task. The first is the challenge posed by the identifying characteristics of SAR imagery, like imaging geometry, object size, and speckle noise. The second and bigger challenge is the lack of good quality standardized datasets. As we observed, the most popular dataset, MSTAR, is too easy for deep nets, and for ship detection the majority of authors created their own datasets, which makes it very hard to judge the quality of the proposed algorithms and even harder to compare different algorithms. An example of a difficult-to-create dataset is one for global building detection. The shape, size, and style of buildings change from region to region quite drastically, and so a good dataset for this purpose requires training examples of buildings from around the world, which in turn requires quite a big effort to produce high quality annotations of enough buildings that deep nets can learn something from them.

C. Parameter Inversion

Parameter inversion from SAR images is a challenging field in SAR applications. As one important branch, ice concentration estimation is now attracting great attention due to its importance to ice monitoring and climate research [124]. Since there are complex interactions between SAR signals and sea ice [125], empirical algorithms face difficulties in interpreting SAR images for accurate ice concentration estimation.

Wang et al. [8] resorted to a CNN for generating ice concentration maps from dual-polarized SAR images. Their method takes image patches of the intensity-scaled dual-band SAR images as inputs and outputs the ice concentration directly. In [126], [127], Wang et al. employed various CNN models to estimate ice concentration from SAR images during the melt season. Labels were produced by ice experts via visual interpretation. The algorithm was tested on dual-pol RadarSat-2 data. Since the problem considered is the regression of a continuous value, the mean squared error is selected as the loss function. Experimental results demonstrate that CNNs can offer more accurate results than comparative operational products.
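As a sketch of this kind of patch-wise regression, the following minimal, illustrative PyTorch model maps a dual-pol intensity patch to a single concentration value in [0, 1] and is trained with the mean squared error; the architecture, patch size, and random training batch are assumptions for demonstration and do not reproduce the networks of [8], [126], [127].

```python
import torch
import torch.nn as nn

# Tiny regression CNN: 2 input channels (dual-pol intensities) -> 1 value.
model = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1), nn.Sigmoid(),  # ice concentration in [0, 1]
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # regression of a continuous value

patches = torch.rand(8, 2, 64, 64)  # toy batch of dual-pol patches
targets = torch.rand(8, 1)          # expert-labeled concentrations
for _ in range(5):
    optimizer.zero_grad()
    loss = criterion(model(patches), targets)
    loss.backward()
    optimizer.step()
```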
Fig. 5: Very high resolution TerraSAR-X image of Berlin (left) and the predicted building mask [120] (right).

Fig. 6: The architecture of the CNN for SAR image despeckling proposed in [60].

In a different application, Song et al. [130] used a deep CNN, consisting of five pairs of convolutional and max-pooling layers followed by two fully connected layers, for inverting rough surface parameters from SAR images. The training of the network was based solely on simulated data, due to the scarcity of real training data. The method was able to invert the desired parameters with reasonable accuracy, and the authors showed that training a CNN for parameter inversion purposes can be done quite efficiently. Furthermore, Zhao et al. [131] designed a complex-valued CNN to directly learn physical scattering signatures from PolSAR images. Notably, the authors proposed a framework to automatically generate labeled data, which led to a supervised learning algorithm for the aforementioned parameter inversion. The approach is similar to the study presented in [132], where the authors used deep learning for SAR image colorization, i.e., learning a fully polarimetric SAR image from single-pol data. Another interesting application of deep learning to parameter inversion was recently published in [133]. The authors propose a deep neural network architecture containing a CNN and a GAN to automatically learn SAR image simulation parameters from a small number of real SAR images. They then feed the learned parameters to a SAR simulator, such as RaySAR [134], to generate a wide variety of simulated SAR images, which can increase training data production and improve the interpretation of SAR images with complex backscattering scenarios.

On the whole, deep learning-based parameter estimation for SAR applications has not yet been fully exploited. Unfortunately, most of the focus of the remote sensing community has been devoted to classical problems that overlap with computer vision tasks, such as classification, object detection, segmentation, and denoising. One reason for this might be that, since parameter estimation usually requires the incorporation of appropriate physical models and tackles the problem at hand as regression rather than classification, domain knowledge is quite essential in order to apply deep learning to such tasks, especially for SAR images with their peculiar physical characteristics. One interesting study is [87], already described in detail in subsection IV-A, which designs discriminative features by spectral analysis of complex-valued SAR data; it is an important work toward including deep learning in parameter inversion studies using SAR data. We hope that in the future more studies will be carried out in this direction.

D. Despeckling

Speckle, caused by the coherent interaction among scattered signals from sub-resolution objects, often makes the processing and interpretation of SAR images difficult. Therefore, despeckling is a crucial procedure before applying SAR images to various tasks. Conventional methods aim at removing speckle either spatially, where local spatial filters such as the Lee filter [135], the Kuan filter [136], and the Frost filter [137] are employed, or by using wavelet-based methods [138], [139], [140]. For a full overview of these techniques, the reader is referred to [141].
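To make the multiplicative noise model underlying all of the following methods explicit, here is a minimal, illustrative NumPy sketch that corrupts a clean reflectivity image with fully developed L-look speckle (gamma-distributed with unit mean); the synthetic image and the number of looks are arbitrary demonstration choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "clean" reflectivity: a bright square on a dark background.
clean = np.ones((128, 128))
clean[32:96, 32:96] = 10.0

# Fully developed L-look intensity speckle: multiplicative, unit-mean,
# gamma-distributed with shape L and scale 1/L.
L = 4
speckle = rng.gamma(shape=L, scale=1.0 / L, size=clean.shape)
noisy = clean * speckle  # the multiplicative model: I = R * n

# Homomorphic trick used by several CNN despecklers: log() turns the
# multiplicative noise into additive noise (and compresses dynamic range).
log_noisy = np.log(noisy)
print(noisy.mean(), clean.mean())  # speckle preserves the mean reflectivity
```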
Fig. 7: Comparison of speckle reduction among SAR-BM3D [128], SAR-CNN [60], and CNN-NLM, applied to a small strip of COSMO-SkyMed data over Caserta, Italy, where the reference clean image has been obtained by temporal multi-looking of a stack of SAR images [129].

In the past decade, patch-based methods for speckle reduction have gained high popularity due to their ability to preserve spatial features while not sacrificing image resolution [142]. Deledalle et al. [143] proposed one of the first nonlocal patch-based methods for speckle reduction, combining the statistical properties of speckle with the original nonlocal image denoising algorithm introduced in [144]. A vast number of variations of the nonlocal method for SAR despeckling have been proposed, with the most notable ones included in [145], [146]. However, on the one hand, the manual selection of appropriate parameters for conventional algorithms is not easy and is sensitive to the reference images. On the other hand, it is difficult to achieve a balance between preserving distinct image features and removing artifacts with empirical despeckling methods. To address these limitations, methods based on deep learning have been developed.

Inspired by the success of image denoising using a residual learning network architecture in the computer vision community [147], Chierchia et al. [60] first introduced a residual learning CNN for SAR image despeckling, presenting a 17-layer CNN that learns to subtract speckle components from noisy images. Considering that speckle noise is assumed to be multiplicative, a homomorphic approach with coupled log- and exp-transformations is applied before and after feeding images to the network. In this way, the multiplicative speckle noise is transformed into an additive form and can be recovered by residual learning, where the log-speckle noise is regarded as the residual. As shown in Fig. 6, an input log-noisy image is mapped identically to a fusion layer via a shortcut connection and then added element-wise to the learned residual image to produce a log-clean image. Afterwards, the denoised image is obtained by an exp-transformation.

Wang et al. [9] proposed a CNN, called ID-CNN, for image despeckling, which can directly learn denoised images via a component-wise division-residual layer with skip connections. In other words, no homomorphic processing is introduced to transform the multiplicative noise into additive noise; instead, at the final stage, the noisy image is divided by the learned noise to yield the clean image.

As a step forward with respect to the two aforementioned residual-based learning methods, Zhang et al. [148] employed a dilated residual network, SAR-DRN, instead of simply stacking convolutional layers. Unlike [60] and similar to [9], SAR-DRN is trained in an end-to-end fashion using a combination of dilated convolutions and skip connections with a residual learning structure, which means that prior knowledge, such as a noise description model, is not required in the workflow.
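The homomorphic residual scheme of [60] can be summarized in a few lines. Below is a minimal, illustrative PyTorch sketch, with a deliberately tiny residual CNN standing in for the 17-layer network; only the log/residual/exp structure, not the depth or training procedure, reflects the original method.

```python
import torch
import torch.nn as nn

# Tiny stand-in for the residual CNN (the original uses 17 layers).
residual_net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

def despeckle(noisy_intensity):
    """Homomorphic residual despeckling: log -> fuse residual -> exp."""
    log_noisy = torch.log(noisy_intensity + 1e-6)  # multiplicative -> additive
    # The network is trained so that its output cancels the log-speckle;
    # the shortcut connection adds it element-wise to the log-noisy input.
    log_clean = log_noisy + residual_net(log_noisy)
    return torch.exp(log_clean)                    # back to the intensity domain

noisy = torch.rand(1, 1, 128, 128) + 0.5  # toy noisy intensity patch
print(despeckle(noisy).shape)             # torch.Size([1, 1, 128, 128])
```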
In [149], Yue et al. proposed a novel deep neural network architecture specifically designed for SAR despeckling. It uses a convolutional neural network to extract image features and reconstruct a discrete RCS probability density function (PDF). It is trained by a hybrid loss function which measures the distance between the actual SAR image intensity PDF and the estimated one, the latter being derived from the convolution between the reconstructed RCS PDF and the prior speckle PDF. Experimental results demonstrated that the proposed despeckling neural network can achieve performance comparable to non-learning state-of-the-art methods.

The unique distribution of SAR intensity images was also taken into account in [150], which proposed a loss function containing three terms comparing the true and the reconstructed image: the common L2 loss, the L2 difference between the gradients of the two images, and the Kullback-Leibler divergence between the distributions of the two images. The three terms are designed to emphasize the spatial details, the identification of strong scatterers, and the speckle statistics, respectively. Experiments in [150] show improved performance compared to SAR-BM3D [128] and SAR-DRN [148].

In [57], the problem of despeckling was tackled using a time series of images. Using a stack of images for despeckling is not unique to deep learning-based methods, as has recently been demonstrated in [151] as well. In [57], the authors utilized a multi-layer perceptron with several hidden layers to learn the non-linear intensity characteristics of training image patches. This approach has shown promising results and reported performance comparable with state-of-the-art despeckling algorithms.

Again using single images instead of time series, in [36] the authors proposed a deep encoder-decoder CNN architecture with a focus on feature preservation, which is a weakness of CNNs. They modified U-Net [32] in order to accommodate speckle statistical features. Another notable CNN approach was introduced in [129], where the authors used a nonlocal structure in which the weights for the pixel-wise similarity measures are assigned using a CNN. The results of this approach, called CNN-NLM, are reported in Fig. 7, where the superiority of the method with respect to both feature preservation and speckle reduction is clearly observed.

One of the drawbacks of the aforementioned algorithms is the requirement of noise-free and noisy image pairs for training. Often, those training data are simulated using optical images with multiplicative noise, which is of course not an ideal scenario for training; the networks either output the clean image in an end-to-end fashion or use residual-based techniques to learn the underlying noise model. With the availability of large archives of time series thanks to the Sentinel-1 mission, an interesting direction is to exploit the temporal correlation of speckle characteristics for despeckling applications. One critical issue that needs to be addressed is over-smoothing in despeckling. Many of the CNN-based methods perform well in terms of speckle removal but are not able to preserve sharp edges. This is quite problematic in the despeckling of high resolution SAR images of urban areas in particular. Another problem in supervised deep learning-based despeckling techniques is the lack of ground truth data. In many studies, the training data set is built by corrupting optical images with multiplicative noise. This is far from realistic for despeckling applied to real SAR data. Therefore, despeckling in an unsupervised manner would be highly desirable and worth attention.

E. InSAR

Interferometric SAR (InSAR) is one of the most important SAR techniques and is widely used for reconstructing the topography of the Earth's surface, i.e., digital elevation model (DEM) generation [154], [155], [65], and for detecting topographic displacements, e.g., monitoring volcanic eruptions [156], [157], [158], earthquakes [159], [160], land subsidence [161], and urban areas using time series methods [162], [163], [164].

The principle of InSAR is to first measure the interferometric phase between signals received by two antennas located at different positions and then extract topographic information from the obtained interferogram by unwrapping and converting the absolute phase to height. However, an actual interferogram often suffers from a large number of singular points, which originate from interference distortion and noise in the radar measurements. These points result in unwrapping errors and, consequently, low-quality DEMs. To tackle this problem, Ichikawa and Hirose [165] applied a complex-valued neural network, CVNN, in the spectral domain to restore singular points. With the help of the Complex Markov Random Field (CMRF) filter [166], they aimed at learning the ideal relationship between the spectrum of neighboring pixels and that of the center pixels via a one-hidden-layer CVNN. Notably, the center pixels of each training sample are supposed to be ideal points, which indicates that singular points are not fed to the network during
for real SAR images. Therefore, one elegant solution is the the training procedure. Similarly, Oyama and Hirose [167]
noise2noise framework [152], where the network only requires restored singular points with a CVNN in the spectrum domain.
two noisy images of the same area. [152] proves that the Related to topography extraction, Costante et al. [169]
network is able to learn a clean representation of the image proposed a fully CNN Encoder-Decoder architecture for es-
given the noise distributions of the two noisy images are timating DEM from single-pass image acquisitions. It is
independent and identical. This idea has been employed in demonstrated that this model is capable of extracting high-
SAR despeckling in [153]. The authors make use of multi- level features from input radar images using an encoder
temporal SAR images of a same area as the input to the section and then reconstructing full resolution DEM via a
noise2noise network. To mitigate the effect of the temporal decoder section. Moreover, the network can potentially solve
change between the input SAR image pairs, the authors the layover phenomenon in one single-look SAR image with
multiples a patch similarity term to the original loss function. contextual features.
From the deep learning-based despeckling methods re- In addition to reconstructing DEMs, Schwegmann et al.
viewed in this subsection, it can be observed that most methods [170] presented a CNN-based technique to detect subsidence
employ CNN-based architectures with single images of the deformations from interferograms. They employed a 9-layer
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 13

Fig. 8: The workflow of volcano deformation detection proposed in [168]. The CNN is trained on simulated data and is later
used to detect phase gradients and a decorrelation mask from input wrapped interferograms to locate ground deformation
caused by volcanoes.

network to extract salient information in interferograms and moving volcanoes by using a time series of interferograms in
displacement maps for discriminating deformation targets from [173]. In another study related to automatic volcanic deforma-
deformation-like targets. Furthermore, Anantrasirichai et al. tion detection, Valade et al. [168] designed and trained a CNN
[10], [171], [172] used a pre-trained CNN to automatically from scratch to learn a decorrelation mask from input wrapped
detect volcanic ground deformation from InSAR images. They interferograms, which then was used to detect volcanic ground
divided each image into patches, and relabeled them with deformation. The flowchart of this approach can be seen in
binary labels, i.e., ”background” and ”volcano”, and finally Fig. 8. The training in both of the aforementioned works [173],
fed them to the network to predict volcano deformation. They [168] was based on simulated data. Another geophysically
further improved their method to be able to detect slow- motivated example of using deep learning on InSAR data,
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 14

which was actually proposed earlier than the above-mentioned and validated on the SARptical dataset [182], [183], which is
CNN-based studies, was seen in [174], [175], [176], where the specifically built for joint analysis of VHR SAR and optical
authors used simple feed-forward shallow neural networks for images in dense urban areas.
seismic event characterization and automatic seismic source In [184], the authors proposed a deep learning frame-
parameter inversion by exploiting the power of neural net- work that can learn an end-to-end mapping between image
works in solving non-linear problems. patch pairs and their matching labels. An image pair is first
Recently, deep learning has been utilized for tomographic transformed into two 1-D vectors and then concatenated to
processing as well. An unfolded deep network which involves build a large 1-D vector as the input of the network. Then
the vector approximate message passing algorithms has been hidden layers are stacked for learning the mapping between
proposed in [177]. Experiments with simulated and real data input vectors and output binary labels, which indicate their
have been performed, which shows the spectral estimation correspondence.
gains speed up and achieves competitive performance. In For the purpose of matching SAR and optical images,
[178], a real-valued deep neural network is applied for MIMO Merkle et al. [185] presented a CNN that comprises of a
SAR 3-D imaging. It shows a better super-resolution power feature extraction stage (Siamese network) and a similarity
compared with other compressive sensing-based methods. measure stage (dot product layer). Specifically, features of
In summary, it can be concluded that the use of deep input optical and SAR images are extracted via two separate
learning methods in InSAR is still at a very early stage. 9-layer branches and then fed to a dot product layer for
Although deep learning has been used in different applications predicting the shift of the optical image within the large SAR
combined with InSAR, the full potential of interferograms is reference patch. Experimental results indicate that this deep
not yet fully exploited except in the pioneering work of Hirose learning-based method outperforms state-of-the-art matching
[179]. Many applications treat interferograms or deformation approaches [186], [187]. Furthermore, Abulkhanov et al. [188]
maps obtained from interferograms as images similar to RGB successfully trained a neural network to build feature point
or gray-scale ones and therefore the complex nature of in- descriptors to identify corresponding patches among SAR and
terferograms has remained unnoticed. Apart from this issue, optical images and match the detected descriptors using the
like the SAR despeckling problem using deep learning, lack RANSAC algorithm [189].
of ground truth data for either detection or image restora- In contrast to training a model to identify corresponding
tion problems is a motivation to focus on developing semi- image patches, Merkle et al. [190] first employed a conditional
supervised and unsupervised algorithms that combine deep generative adversarial network (cGAN) to generate artificial
learning and InSAR. Otherwise a training database consisting SAR-like images from optical images, then matched them with
of interferograms for different scenarios and also for different real SAR images. The authors demonstrate that the matching
phase contributions could be beneficial for supervised learning accuracy and precision are both improved with the proposed
applications. Simulation-based interferogram generation for strategy. Inspired by their study, more researchers resorted to
the latter has been recently proposed [180]. using GANs for the purpose of SAR-optical image matching
(see [191], [192] for a review).
With respect to applications of SAR and optical image
F. SAR-Optical Data fusion matching, Yao et al. [193] aimed at applying SAR and optical
The fusion of SAR and optical images can provide comple- images to semantic segmentation with deep neural networks.
mentary information about targets. However, considering the They collected corresponding optical patches from Google
two different sensing modalities, prior identification and co- Earth according to TerraSAR-X patches and built ground truths
registration of corresponding images are challenging [181], but using data from OpenStreetMap. Then SAR and optical images
compulsory for joint applications of SAR and optical images. were separately fed to different CNNs to predict semantic
For the purpose of identifying and matching SAR and optical labels (building, natural, land use, and water). Despite their
images, many current methods resort to deep learning, given experimental results not outperforming the state of the art by
its powerful capabilities of extracting effective features from the time [194] likely because of network design or training
complex images. strategy, they deduced that introducing advanced models and
In [58], the authors proposed a CNN for identifying corre- simultaneously using both data sources can greatly improve the
sponding image patches of very high resolution (VHR) optical performance of semantic segmentation. Another application
and SAR imagery of complex urban scenes. Their network mentioned in [195] demonstrated that standard fusion tech-
consists of two streams: one designed for extracting features niques for SAR and optical images require data from both
from optical images, the other responsible for learning features sources, which indicates that it is still not easy to interpret SAR
from SAR images. Next the extracted features are fused via images without the support of optical images. To address this
a concatenation layer for further binary prediction of their issue, Schmitt et al. [195] proposed an automatic colorization
correspondence. A selection of True Positives, False Positives, network, composed of a VAE and a mixture density network
False Negatives, and True Negatives of SAR-optical image (MDN) [196], to predict artificially colored SAR images (i.e.,
patches from [58] can be seen in Fig. 9. Similarly, Hughes Sentinel-1 images). These images are proven to disclose more
et al. [11] proposed a pseudo-Siamese CNN for learning a information to the human interpreter than the original SAR
multi-sensor correspondence predictor for SAR and optical data.
image patches. Notably, both networks in [58], [11] are trained In [42], the authors tackled the problem of cloud removal
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 15

Fig. 9: Randomly selected patches obtained from the testing phase of the network for SAR-optical image patch correspondence
detection proposed in [11].

from optical imagery. They introduced a cGAN architecture In particular, we consider the following categories of deep
to fuse SAR and cloud-corrupted multi-spectral data for learning problems in SAR.
generating cloud- and haze-free multi-spectral optical data. • Image classification: each pixel or patch in one image
Experiments proved the effectiveness of the proposed network is classified into a single label. This is often the case in
for removing cloud from multi-spectral data with auxiliary typical land use land cover classification problems.
SAR data. Extending previous multi-modal networks for cloud • Scene classification: similar to image classification, one
removal, [43] proposed a cycle-consistent GAN architecture image or patch is classified into a single label. However,
[197] that utilizes a image forward-backward translation con- one scene is usually much larger than an image patch.
sistency loss. Cloud-covered optical information is recon- Hence, it requires a different network architecture.
structed via SAR data fusion, while changes to cloud-free • Semantic segmentation: one image or patch is segmented
areas are minimized through use of the cycle consistency loss. to a classification map of the same dimension. Training
The cycle-consistent architecture allows training without pixel- of such neural networks also requires densely annotated
wise correspondences between cloudy input and cloud-free training data.
target optical imagery, relaxing requirements on the training • Object detection: similar to scene classification. However,
data set. detection often requires the estimation of the object
In summary, it can be seen that the utilization of deep location.
learning methods for SAR-optical data fusion has been a hot • Registration/matching: provide binary classification
topic in the remote sensing community. Although a handful (matched or unmatched), or estimate the translation
of data sets consisting of optical and SAR corresponding between two image patches. This type of task requires
image patches are available for different terrain types and matching pairs of two different image patches as training
applications, one of the biggest problems in this task is still data.
the scarcity of high quality training data. Semi-supervised
methods, as proposed in [198], seems to be a viable option to
tackle the problem. A great challenge in the SAR-optical im- A. Image/Scene Classification
age matching is the extreme difference in viewing geometries • So2Sat LCZ42 [200]: So2Sat LCZ42 follows the local
of the two sensors. For this it is important to exploit auxiliary climate zones (LCZs) classification scheme. The dataset
3D data in order to assist the training data generation. comprises 400,673 pairs of dual-pol Sentinel-1 and multi-
spectral Sentinel-2 image patches from 42 urban ag-
V. E XISTING B ENCHMARK DATASETS AND THEIR glomerations, plus 10 additional smaller areas, across
LIMITATIONS five continents. The image patches are hand-labelled
In order to train and evaluate deep learning models, large into one of the 17 LCZ classes [213]. The Sentinel-1
datasets are indispensable. Unlike RGB images in the com- image patches in this dataset contain both the geocoded
puter vision community, which can be easily collected and single look complex image, as well as a despeckled
interpreted, SAR images are much more difficult to annotate Lee filtered variant. In particular, it is the first Earth
due to their complex properties. Our research shows that big observation dataset that provides a quantitative measure
SAR datasets created for the primary purpose of deep learning of the label uncertainty, achieved by letting a group of
research are nearly non-existent in the community. In recent domain experts cast 10 independent votes on 19 cities
years, only a few SAR datasets have been made public for in the dataset. The dataset therefore can be considered
training and assessing deep learning models. In the following, a large-scale data fusion and classification benchmark
we categorize those datasets according to their best suited deep dataset for cutting-edge machine learning methodological
learning problem and focus on openly accessible and well- developments, such as automatic topology learning, data
curated large datasets. fusion, and quantification of uncertainties.
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 16

Fig. 10: Samples of the OpenSARUrban [199]. Six classes are shown from the top to the bottom: dense and low-rise residential
buildings, general residential area, high-rise buildings, villas, industrial storage area, and vegetation.

• OpenSARUrban [199]: OpenSARUrban consists of as scene classification or semantic segmentation for land
33,358 patches of Sentinel-1 dual-pol images covering 21 cover mapping.
major cities in China. The dataset was manually annotated • MSAW [204]: The multi-sensor all-weather mapping
according to a hierarchical classification scheme, with 10 (MSAW) dataset includes high-resolution SAR data,
classes of urban scenes at its finest level. Each image which covers 120 km2 in the area of Rotterdam, the
patch has a dimension of 100 by 100 pixels with a Netherlands. The quad-polarized X-band SAR imagery
pixel spacing of 10 m (Sentinel-1 GRD product). This from Capella Space with 0.5 m spatial resolution was
dataset can support deep learning studies of urban target used for the SpaceNet 6 Challenge. A total of 48,000
characterization, and content-based SAR image queries. unique building footprints have been labeled with addi-
Fig. 10 shows some samples from the OpenSARUrban tional building heights.
dataset. • PolSF [205]: This dataset consists of PolSAR im-
ages of San Francisco from eight different sensors,
including AIRSAR, ALOS-1, ALOS-2, RADARSAT-
B. Semantic Segmentation/Classification 2, SENTINEL-1A, SENTINEL-1B, GAOFEN-3, and
• SEN12MS [202]: SEN12MS was created based on its RISAT (data compiled by E. Pottier of IETR). Five of
previous version SEN1-2 [203]. SEN12MS consists of the eight images were densely labeled to five or six land
180,662 triplets of dual-pol Sentinel-1 image patches, use land cover classes in [205]. These densely annotated
multi-spectral Sentinel-2 image patches, and MODIS land images correspond to roughly 3,000 training patches of
cover maps. The patches are georeferenced with a ground 128 by 128 pixels. Although the data volume is relatively
sampling distance of 10 m. Each image patch has a low for deep learning research, this dataset is the only
dimension of 256 by 256 pixels. We expect this dataset annotated multi-sensory PolSAR dataset, to the best of
to support the community in developing sophisticated our knowledge. Therefore, we suggest that the creator of
deep learning-based approaches for common tasks such this dataset increase the number of annotated images to
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 17

TABLE I: Summary of available open SAR datasets


Name Description Suitable tasks Related work
So2Sat LCZ421 [200], 400,673 pairs of corresponding Sentinel-1 dual-pol image patch, Sentinel-2 image classification, [201]
TensorFlow API2 multispectral image patch, and manually labeled local climate zones classes data fusion,
over 42 urban agglomerations (plus 10 additional smaller areas) across the quantification of uncer-
globe. It is the first EO dataset that provides a quantitative measure of the label tainties
uncertainty, achieved by having a group of domain experts cast 10 independent
votes on 19 cities in the dataset.
OpenSARUrban3 [199] 33,358 Sentinel-1 dual-pol images patches covering 21 major cities in China, image classification
labeled with 10 classes of urban scenes.
SEN12MS4 [202] 180,748 corresponding image triplets containing Sentinel-1 dual-pol SAR image classification, [203]
data, Sentinel-2 multi-spectral imagery, and MODIS-derived land cover maps, semantic segmentation,
covering all inhabited continents during all meteorological seasons. data fusion
MSAW5 [204] quad-pol X-band SAR imagery from Capella Space with 0.5 m spatial reso- semantic segmentation
lution, which covers 120 km2 in the area of Rotterdam, the Netherlands. A
total number of 48,000 unique building footprints are labeled with associated
height information curated from the 3D Basis registratie Adressen en Gebouwen
(3DBAG) dataset.
PolSF, Data6 , The dataset includes PolSAR images of San Francisco from five different image classification, [206]
Label7 [205] sensors. Each image was densely labeled to five or six classes, such as semantic segmentation
mountain, water, high-density urban, low-density urban, vegetation, developed, data fusion
and bare soil.
MSTAR8 [207] 17,658 X-band very high resolution SAR images chips (patches) of 10 classes object detection, [97] [98] [208]
of different vehicles plus one class of simple geometric shaped target. SAR scene classification
images of pure clutter are also included in the dataset.
OpenSARShip 2.09 [209] 34,528 Sentinel-1 SAR image chips of ships with the ship geometric infor- object detection, [210]
mation, the ship type, and the corresponding automatic identification system scene classification
(AIS) information.
SAR-Ship-Dataset10 [211] 43,819 Gaofen-3 or Sentinel-1 image chips of different ships. Each image chip object detection, scene
has a dimension of 256 by 256 pixels in range and azimuth. classification
SARptical11 [212] 10,108 coregistered pairs of TerraSAR-X very high resolution spotlight image image matching [11], [183]
patch and UltraCAM aerial RGB image patch in Berlin, Germany. The
coregistration is defined by the matching of the 3D position of the center of
the image pair.
SEN1-212 [203] 282,384 pairs of corresponding Sentinel-1 single polarization intensity, and image matching [202]
Sentinel-2 RGB image patches, collected across the globe. The patches are of data fusion
dimension 256 by 256 pixels.

enable greater potential use of this dataset. type by verifying this data on the Marine Traffic website
[209]. Among all the patches, about one-third is extracted
C. Object Detection from Sentinel-1 GRD products, and the other two-thirds
are from Sentinel-1 SLC products. OpenSARShip 2.0 is
• MSTAR [207]: The Moving and Stationary Target Ac-
one of the handful of SAR datasets suitable for object
quisition and Recognition (MSTAR) dataset is one of
detection.
the earliest datasets for SAR target recognition. The
• SAR-Ship-Dataset [211]: This dataset was created using
dataset consists of total 17,658 X-band SAR image chips
102 Gaofen-3 and 108 Sentinel-1 images. It consists
(patches) of 10 classes of vehicle plus one class of
of 43,819 ship chips of 256 pixels in both range and
simple geometric shaped target. The collected SAR image
azimuth. These ships mainly have distinct scales and
patches are 128 by 128 pixels with a resolution of one
backgrounds. Therefore, this dataset can be employed for
foot in range and azimuth. In addition, 100 SAR images
developing multi-scale object detection models.
of clutter were also provided in the dataset.
• FUSAR-Ship [214]: This dataset was created using
In our opinion, the number of image patches in this
space-time matched-up datasets of Gaofen-3 SAR images
dataset is relatively low for deep learning models, espe-
and ship AIS messages. It consists of over 5000 ship chips
cially considering the number of classes. In addition, this
with corresponding ship information extracted from AIS
dataset represents a rather ideal and unrealistic scenario:
messages, which can be used to trace back to each unique
vehicles in the dataset are centered in the patch, and
ship of any particular chip.
the clutter is quite homogeneous without disturbing sig-
• AIR-SARShip-1.0/2.0 [215]: This dataset comprises 31
nals. However, considering the scarcity of such datasets,
(300) SAR images from the Geofen-3 satellite, which
MSTAR is a valuable source for target recognition.
includes 1m and 3m resolution imagery with different
• OpenSARShip 2.0 [209]: This dataset was built based
imaging modes, such as spotlight and stripmap. There are
on its previous version, OpenSARShip [210]. It contains
more than ten object categories including ships, tankers,
34,528 Sentinel-1 SAR image patches of different ships
fishing boats and others. The scene types in the dataset
with automatic identification system (AIS) information.
include ports, islands, reefs and sea surfaces of different
For each SAR image patch, the creators manually ex-
levels.
tracted the ship length, width, and direction, as well as its
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 18

D. Registration/Matching VI. C ONCLUSION AND F UTURE T RENDS


• SARptical [212], [183]: The SARptical dataset was de- This paper reviews the current state-of-the-art of an im-
signed for interpreting VHR spaceborne SAR images of portant and under-exploited research field — deep learning
dense urban areas. This dataset consists of 10,108 pairs in SAR. Relevant deep learning models are introduced, and
of corresponding very high resolution SAR and optical their applications in six application fields — terrain surface
image patches, whose location is precisely coregistered in classification, object detection, parameter inversion, despeck-
3D. They are extracted from TerraSAR-X VHR spotlight ling, InSAR, and SAR-optical data fusion — are analyzed in
images with resolution better than 1 m and UltraCAM depth. Exisiting benchmark datasets and their limitations are
aerial optical images of 20 cm pixel spacing, respec- discussed. In summary, despite early successes, full exploita-
tively. Unlike low and medium resolution images, high tion of deep learning in SAR is mostly limited by 1) the lack of
resolution SAR and optical images in dense urban areas large and representative benchmark datasets and 2) the defect
have very distinct geometries. Therefore, in the SARptical of tailored deep learning models that make full consideration
dataset, the center points of each image pair are matched of SAR signal characteristics.
in 3D space via sophisticated 3D reconstruction and Looking forward, the years ahead will be exciting. Next
matching algorithms. The UTM coordinates of the center generation spaceborne SAR missions will simultaneously pro-
pixel of each pair are also made available publicly in the vide high resolution and global coverage, which will enable
dataset. This dataset contributes to applications of multi- novel applications such as monitoring the dynamic Earth.
modal data classification, and SAR optical images co- To retrieve geo-parameters from these data, development of
registering. However, we believe more training samples new analytics methods are warranted. Deep learning is among
are required for learning complicated SAR optical image the most promising methods. To fully unlock its potential in
to image mapping. SAR/InSAR applications in this big SAR data era, there are
• SEN1-2 [203]: The SEN1-2 dataset consists of 282,384 several promising future directions:
pairs of corresponding Sentinel-1 single polarization in-
• Large and Representative Benchmark Datasets: As
tensity and Sentinel-2 RGB image patches, collected
summarized in this article, there is only a handful of SAR
from across the globe and throughout all meteorological
benchmarks, in particular when excluding multi-modal
seasons. The patches are of dimension 256 by 256 pixels.
ones. For instance, in SAR target detection, methods are
Their distribution over the four seasons is roughly even.
mainly tested on a single benchmark data set — the
SEN1-2 is the first large open dataset of this kind. We
MSTAR dataset, where only several thousands of target
believe it will support further developments in the field of
samples in total (several hundreds for each class) are
deep learning for remote sensing as well as multi-sensor
provided for training. With respect to InSAR, due to
data fusion, such as SAR image colorization, and SAR-
the lack of ground truth, datasets are extremely deficient
optical image matching.
or nearly nonexistent. Large and representative expert-
annotated benchmark datasets are in high demand in the
SAR community, and deserve more attention.
E. Other Datasets
• Unsupervised Deep Learning: To bypass the deficien-
• Sample PolSAR images from ESA: https://earth.esa. cies in annotated data in SAR, unsupervised deep learning
int/web/polsarpro/data-sources/sample-datasets. For ex- is a promising direction. These algorithms derive insights
ample, the Flevoland PolSAR Dataset. Several works directly from the data itself, and work as feature learning,
make use of this dataset for agricultural land use land representation learning, or clustering, which could be
cover classification. The authors of [216], [217], [218] further used for data-driven analytics. Autoencoders and
have manually labeled the dataset according to different their extensions, such as variational autoencoders (VAEs)
classification schemes. and deep embedded clustering algorithms, are popular
• SAR Image Land Cover Datasets [219]: This dataset choices. With respect to denoising, in despeckling, the
is not publicly available. Please contact the creator. high complexity of SAR images and lack of ground truth
• Airbus Ship Detection Challenge: https://www.kaggle. make it infeasible to produce appropriate benchmarks
com/c/airbus-ship-detection. from real data. Noise2Noise [152] is an elegant exam-
ple of unsupervised denoising where the authors learn
1 https://doi.org/10.14459/2018mp1483140
denoised data without clean data. Despite the nice visual
2 https://www.tensorflow.org/datasets/catalog/so2sat
3 https://doi.org/10.21227/3sz0-dp26
appearance of the results, preserving details is a must for
4 https://mediatum.ub.tum.de/1474000 SAR applications.
5 https://spacenet.ai/sn6-challenge/ • Interferometric Data Processing: Since deep learning
6 https://www.ietr.fr/polsarpro-bio/san-francisco/ methods are initially applied to perception tasks in com-
7 https://github.com/liuxuvip/PolSF
puter vision, many methods resort to transforming SAR
8 https://www.sdms.afrl.af.mil/index.php?collection=mstar
9 http://opensar.sjtu.edu.cn/Data/Search
images, e.g., PolSAR images, into RGB-like images in
10 https://github.com/CAESAR-Radi/SAR-Ship-Dataset advance or focus only on intensities. In other words,
11 https://www.sipeo.bgu.tum.de/downloads/SARptical data.zip the most essential component of a SAR measurement —
12 https://mediatum.ub.tum.de/1436631 the phase information — is not appropriately considered.
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 19

Although CV-CNNs are capable of learning phase infor- interest, like ships, in real-time. Based on these detections
mation and show great potential in processing CV-SAR the transmit waveform can be modified such as to zoom
images, only a few such attempts have been made [83]. into the region of interest and allow for a close-up look
Extending CNN to complex domain, while being able to of the object and possibly classify or even identify it.
preserve the precious phase information, would enable Reinforcement (online) learning is part of the concept as
networks to directly learn features from raw data, and well as fast and reliable detectors or classifiers (trained
would open up a wide range of SAR/InSAR applications. offline), e.g. based on deep learning. All this is edge
• Quantification of Uncertainties: Generally speaking, computing; the learning algorithms have to perform in
geo-parameter estimates without uncertainty measures real-time and with the limited compute resources onboard
are considered invalid in remote sensing. Appropriately the satellite or airplane.
trained deep learning models can achieve highly accu- Last but not least, technology advances in deep learning in
rate predictions. Yet, they fail in quantifying the un- remote sensing would only be possible if experts in remote
certainty of these predictions. Here, giving a statement sensing and machine learning work closely together. This is
about the predictive uncertainty, while considering both particularly true when it comes to SAR. Thus, we encourage
aleatoric uncertainty and epistemic uncertainty, is of cru- more joint initiatives working collaboratively toward deep
cial importance. The Bayesian deep learning community learning powered, explainable and reproducible big SAR data
has developed a model-agnostic and easy-to-implement analytics.
methodology to estimate both data and model uncertainty
within deep learning models [220], which are awaiting
exploration by the SAR community. R EFERENCES
• Large Scale Nonlinear Optimization Problems: The
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol.
development of inversion algorithms should keep up the 521, no. 7553, pp. 436–444, 2015.
pace of data growth. Fast solvers are demanded for [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks
many advanced parameter inversion models, which often for large-scale image recognition,” arXiv:1409.1556, 2014.
[3] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object detection with
involve non-convex, nonlinear, and complex-valued op- deep learning: A review,” IEEE Transactions on Neural Networks and
timization problems, such as compressive-sensing-based Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019.
tomographic inversion, or low rank complex tensor de- [4] Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, “A review of semantic
segmentation using deep neural networks,” International Journal of
composition for InSAR time series data analysis. In some Multimedia Information Retrieval, vol. 7, no. 2, pp. 87–93, 2018.
cases, the iterations of the optimization algorithms per- [5] X. X. Zhu, D. Tuia, L. Mou, G. Xia, L. Zhang, F. Xu, and F. Fraundor-
form similar computations as layers in neural networks, fer, “Deep learning in remote sensing: A comprehensive review and list
of resources,” IEEE Geoscience and Remote Sensing Magazine, vol. 5,
that is, a linear step followed by a non-linear activation no. 4, pp. 8–36, 2017.
(see for example, the iteratively reweighted least-squares [6] H. Parikh, S. Patel, and V. Patel, “Classification of SAR and PolSAR
approach). And it is thus meaningful to replace the images using deep learning: a review,” International Journal of Image
computationally expensive optimization algorithms with and Data Fusion, vol. 11, no. 1, pp. 1–32, 2020.
[7] S. Chen and H. Wang, “SAR target recognition based on deep learning,”
unrolled deep architectures that could be trained from in International Conference on Data Science and Advanced Analytics
simulated data [221]. (DSAA), 2014.
• Cognitive Sensors: Radars –– and SARs in particular – [8] L. Wang, A. Scott, L. Xu, and D. Clausi, “Ice concentration estimation
from dual-polarized SAR images using deep convolutional neural
– are very complex and versatile imaging machines. A networks,” IEEE Transactions on Geoscience and Remote Sensing,
variety of modes (stripmap, spotlight, ScanSAR, TOPS, 2014.
etc.), swath-widths, incidence angles and polarizations [9] P. Wang, H. Zhang, and V. Patel, “SAR image despeckling using a
convolutional neural network,” IEEE Signal Processing Letters, vol. 24,
can be programmed in near real-time. Cognitive radars go no. 12, pp. 1763–1767, 2017.
a giant step further; they adapt their operational modes [10] N. Anantrasirichai, J. Biggs, F. Albino, P. Hill, and D. Bull, “Appli-
autonomously to the environment to be imaged by an in- cation of machine learning to classification of volcanic deformation
in routinely generated InSAR data,” Journal of Geophysical Research:
telligent interplay of transmit waveforms, adaptive signal Solid Earth, 2018.
processing on the receiver side and learning. Cognitive [11] L. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu, “Identifying
SARs are still in their conceptual and experimental phase corresponding patches in SAR and optical images with a pseudo-
siamese CNN,” IEEE Geoscience and Remote Sensing Letters, vol. 15,
and are often justified by the stunning capabilities of no. 5, pp. 784–788, 2018.
the echo-location system of bats. In his early pioneering [12] K. Ikeuchi, T. Shakunaga, M. Wheeler, and T. Yamazaki, “Invariant
article [222] Haykin defines three ingredients of a cogni- histograms and deformable template matching for SAR target recog-
nition,” in Proceedings CVPR IEEE Computer Society Conference on
tive radar: “1) intelligent signal processing, which builds Computer Vision and Pattern Recognition. IEEE, 1996, pp. 100–105.
on learning through interactions of the radar with the [13] Q. Zhao and J. Principe, “Support vector machines for SAR automatic
surrounding environment; 2) feedback from the receiver target recognition,” IEEE Transactions on Aerospace and Electronic
to the transmitter, which is a facilitator of intelligence; Systems, vol. 37, no. 2, pp. 643–654, 2001.
[14] M. Bryant and F. Garber, “SVM classifier applied to the MSTAR public
and 3) preservation of the information content of radar data set,” in Algorithms for Synthetic Aperture Radar Imagery, 1999.
returns, which is realized by the Bayesian approach to [15] M. Ferguson, R. Ak, Y.-T. T. Lee, and K. H. Law, “Automatic
target detection through tracking.” Such a SAR could, localization of casting defects with convolutional neural networks,”
in 2017 IEEE International Conference on Big Data (Big Data).
e.g., perform a low resolution, yet wide swath, surveil- Boston, MA: IEEE, Dec. 2017, pp. 1726–1735. [Online]. Available:
lance of a coastal area and in a first step detect objects of http://ieeexplore.ieee.org/document/8258115/
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 20

[16] C. Bourez, “Deep learning course,” [Accessed May 27, tion using multitemporal SAR sentinel-1 for camargue, france,” Remote
2020]. [Online]. Available: http://christopher5106.github.io/img/ Sensing, vol. 10, no. 8, p. 1217, 2018.
deeplearningcourse/DL46.png [41] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
[17] S. Panchal, “Cityscape image segmentation with tensorflow 2.0,” S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in
[Accessed May 27, 2020]. [Online]. Available: https://miro.medium. Advances in neural information processing systems, 2014, pp. 2672–
com/max/2000/1*3FGS0kEAS55XmqxIXkp0mQ.png 2680.
[18] Wikipedia, “Long short-term memory,” [Accessed May 27, 2020]. [42] C. Grohnfeld, M. Schmitt, and X. X. Zhu, “A conditional generative ad-
[Online]. Available: https://upload.wikimedia.org/wikipedia/commons/ versarial network to fuse SAR and multispectral optical data for cloud
thumb/3/3b/The LSTM cell.png/1280px-The LSTM cell.png removal from Sentinel-2 images,” in IEEE International Geoscience
[19] W. Feng, N. Guan, Y. Li, X. Zhang, and Z. Luo, “Audio visual and Remote Sensing Symposium (IGARSS), 2018.
speech recognition with multimodal recurrent neural networks,” in [43] P. Ebel, M. Schmitt, and X. Zhu, “Cloud removal in unpaired sentinel-
2017 International Joint Conference on Neural Networks (IJCNN). 2 imagery using cycle-consistent gan and sar-optical data fusion,”
Anchorage, AK, USA: IEEE, May 2017, pp. 681–688. [Online]. IGARSS 2020 IEEE International Geoscience and Remote Sensing
Available: http://ieeexplore.ieee.org/document/7965918/ Symposium, 2020.
[20] “Under the hood of the variational autoencoder (in prose and [44] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhut-
code),” [Accessed May 27, 2020]. [Online]. Available: http: dinov, “Dropout: a simple way to prevent neural networks from
//fastforwardlabs.github.io/blog-images/miriam/imgs code/vae.4.png overfitting,” The journal of machine learning research, vol. 15, no. 1,
[21] T. Silva, “An intuitive introduction to generative ad- pp. 1929–1958, 2014.
versarial networks (gans),” [Accessed May 26, 2020]. [45] K. Pearson, “Liii. on lines and planes of closest fit to systems of points
[Online]. Available: https://cdn-media-1.freecodecamp.org/images/ in space,” The London, Edinburgh, and Dublin Philosophical Magazine
m41LtQVUf3uk5IOYlHLpPazxI3pWDwG8VEvU and Journal of Science, vol. 2, no. 11, pp. 559–572, 1901.
[22] M. Zitnik, M. Agrawal, and J. Leskovec, “Modeling polypharmacy side [46] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv
effects with graph convolutional networks,” Bioinformatics, vol. 34, preprint arXiv:1312.6114, 2013.
no. 13, p. 457–466, 2018. [47] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G.
[23] B. Huang and K. M. Carley, “Residual or Gate? Towards Deeper Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski
Graph Neural Networks for Inductive Graph Representation Learning,” et al., “Human-level control through deep reinforcement learning,”
arXiv:1904.08035 [cs, stat], Aug. 2019, arXiv: 1904.08035. [Online]. Nature, vol. 518, no. 7540, pp. 529–533, 2015.
Available: http://arxiv.org/abs/1904.08035 [48] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource man-
[24] B. Zoph and Q. V. Le, “Neural Architecture Search with Reinforcement agement with deep reinforcement learning,” in Proceedings of the 15th
Learning,” arXiv:1611.01578 [cs], Feb. 2017, arXiv: 1611.01578. ACM Workshop on Hot Topics in Networks, 2016, pp. 50–56.
[Online]. Available: http://arxiv.org/abs/1611.01578 [49] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van
[25] Y. LeCun, C. Cortes, and C. Burges, “Mnist handwritten digit Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam,
database,” IEEE, 2010. M. Lanctot et al., “Mastering the game of go with deep neural networks
[26] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based and tree search,” nature, vol. 529, no. 7587, p. 484, 2016.
learning applied to document recognition,” Proceedings of the IEEE, [50] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement
vol. 86, no. 11, pp. 2278–2324, 1998. learning,” arXiv preprint arXiv:1611.01578, 2016.
[27] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification [51] T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A
with deep convolutional neural networks,” in Advances in Neural survey,” arXiv preprint arXiv:1808.05377, 2018.
Information Processing Systems, 2012. [52] H. Dong, B. Zou, L. Zhang, and S. Zhang, “Automatic design of
[28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: CNNs via differentiable neural architecture search for PolSAR image
A large-scale hierarchical image database,” in 2009 IEEE conference classification,” IEEE Transactions on Geoscience and Remote Sensing,
on computer vision and pattern recognition. Ieee, 2009, pp. 248–255. pp. 1–14, 2020.
[29] T. Tieleman and G. Hinton, “Lecture 6.5—RmsProp: Divide the [53] T. N. Kipf and M. Welling, “Semi-supervised classification with graph
gradient by a running average of its recent magnitude,” COURSERA: convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
Neural Networks for Machine Learning, 2012. [54] B. Huang and K. M. Carley, “Residual or gate? towards deeper graph
[30] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” neural networks for inductive graph representation learning,” arXiv
arXiv preprint arXiv:1412.6980, 2014. preprint arXiv, 2019.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for [55] Y. Shi, Q. Li, and X. X. Zhu, “Building segmentation through a
image recognition,” in IEEE International Conference on Computer gated graph convolutional neural network with deep structured feature
Vision and Pattern Recognition (CVPR), 2016. embedding,” ISPRS Journal of Photogrammetry and Remote Sensing,
[32] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional net- vol. 159, pp. 184–197, 2020.
works for biomedical image segmentation,” in International Confer- [56] F. Ma, F. Gao, J. Sun, H. Zhou, and A. Hussain, “Attention graph
ence on Medical image computing and computer-assisted intervention. convolution network for image segmentation in big sar imagery data,”
Springer, 2015, pp. 234–241. Remote Sensing, vol. 11, no. 21, p. 2586, 2019.
[33] G. Huang, Z. Liu, K. Weinberger, and L. Maaten, “Densely connected [57] X. Tang, L. Zhang, and X. Ding, “SAR image despeckling with a
convolutional networks,” in IEEE International Conference on Com- multilayer perceptron neural network,” International Journal of Digital
puter Vision and Pattern Recognition (CVPR), 2017. Earth, pp. 1–21, 2018.
[34] T. Hoeser and C. Kuenzer, “Object Detection and Image Segmentation [58] L. Mou, M. Schmitt, Y. Wang, and X. X. Zhu, “A CNN for the
with Deep Learning on Earth Observation Data: A Review-Part I: identification of corresponding patches in SAR and optical imagery
Evolution and Recent Trends,” Remote Sensing, vol. 12, no. 10, p. of urban scenes,” in Urban Remote Sensing Event (JURSE), 2017.
1667, 2020. [59] R. Touzi, A. Lopes, and P. Bousquet, “A statistical and geometrical
[35] A. Mazza, F. Sica, P. Rizzoli, and G. Scarpa, “TanDEM-X edge detector for SAR images,” IEEE Transactions on Geoscience and
Forest Mapping Using Convolutional Neural Networks,” Remote Remote Sensing, vol. 26, no. 6, pp. 764–773, 1988.
Sensing, vol. 11, no. 24, p. 2980, Jan. 2019. [Online]. Available: [60] G. Chierchia, D. Cozzolino, G. Poggi, and L. Verdoliva, “SAR
https://www.mdpi.com/2072-4292/11/24/2980 image despeckling through convolutional neural networks,”
[36] F. Lattari, B. Gonzalez Leon, F. Asaro, A. Rucci, C. Prati, and arXiv:1704.00275, 2017.
M. Matteucci, “Deep learning for SAR image despeckling,” Remote [61] Y. Shi, X. X. Zhu, and R. Bamler, “Optimized parallelization of non-
Sensing, vol. 11, no. 13, p. 1532, 2019. local means filter for image noise reduction of InSAR image,” in IEEE
[37] D. Morgan, “Deep convolutional neural networks for ATR from SAR International Conference on Information and Automation, 2015.
imagery,” in Algorithms for Synthetic Aperture Radar Imagery, 2015. [62] X. X. Zhu, R. Bamler, M. Lachaise, F. Adam, Y. Shi, and M. Eineder,
[38] B. A. Pearlmutter, “Learning state space trajectories in recurrent neural “Improving TanDEM-X DEMs by non-local InSAR filtering,” in Eu-
networks,” Neural Computation, vol. 1, no. 2, pp. 263–269, 1989. ropean Conference on Synthetic Aperture Radar (EUSAR), 2014.
[39] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural [63] L. Denis, C.-A. Deledalle, and F. Tupin, “From patches to deep
computation, vol. 9, no. 8, pp. 1735–1780, 1997. learning: Combining self-similarity and neural networks for sar image
[40] E. Ndikumana, D. Ho Tong Minh, N. Baghdadi, D. Courault, and despeckling,” in IGARSS 2019 - 2019 IEEE International Geoscience
L. Hossard, “Deep recurrent neural network for agricultural classifica- and Remote Sensing Symposium. IEEE, 2019, pp. 5113–5116.
ACCEPTED BY IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2021 21

[64] J. Gao, B. Deng, Y. Qin, H. Wang, and X. Li, “Enhanced Radar [85] L. Li, L. Ma, L. Jiao, F. Liu, Q. Sun, and J. Zhao, “Complex contourlet-
Imaging Using a Complex-Valued Convolutional Neural Network,” CNN for polarimetric SAR image classification,” Pattern Recognition,
IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 1, pp. 35– p. 107110, 2019.
39, 2019. [86] W. Xie, G. Ma, F. Zhao, H. Liu, and L. Zhang, “PolSAR image
[65] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and classification via a novel semi-supervised recurrent complex-valued
K. P. Papathanassiou, “A tutorial on synthetic aperture radar,” IEEE convolution neural network,” Neurocomputing, vol. 388, pp. 255–268,
Geoscience and Remote Sensing Magazine, vol. 1, no. 1, pp. 6–43, 2020.
2013. [87] Z. Huang, M. Datcu, Z. Pan, and B. Lei, “Deep SAR-Net: Learning
[66] C. He, S. Li, Z. Liao, and M. Liao, “Texture classification of PolSAR objects from signals,” ISPRS Journal of Photogrammetry and Remote
data based on sparse coding of wavelet polarization textons,” IEEE Sensing, vol. 161, pp. 179–193, 2020.
Transactions on Geoscience and Remote Sensing, vol. 51, no. 8, pp. [88] R. Ressel, A. Frost, and S. Lehner, “A neural network-based classi-
4576–4590, 2013. fication for sea ice types on x-band SAR images,” IEEE Journal of
[67] H. Xie, S. Wang, K. Liu, S. Lin, and B. Hou, “Multilayer feature Selected Topics in Applied Earth Observations and Remote Sensing,
learning for polarimetric synthetic radar data classification,” in IEEE vol. 8, no. 7, pp. 3672–3680, 2015.
International Geoscience and Remote Sensing Symposium (IGARSS), [89] R. Ressel, S. Singha, and S. Lehner, “Neural network based automatic
2014. sea ice classification for CL-pol RISAT-1 imagery,” in 2016 IEEE
[68] J. Geng, H. Wang, J. Fan, and X. Ma, “Deep supervised and contractive International Geoscience and Remote Sensing Symposium (IGARSS).
neural network for SAR image classification,” IEEE Transactions on IEEE, 2016, pp. 4835–4838.
Geoscience and Remote Sensing, vol. 55, no. 4, pp. 2442–2459, 2017. [90] R. Ressel, S. Singha, S. Lehner, A. Rosel, and G. Spreen, “Investigation
[69] S. Uhlmann and S. Kiranyaz, “Integrating color features in polarimetric into different polarimetric features for sea ice classification using x-
SAR image classification,” IEEE Transactions on Geoscience and band synthetic aperture radar,” IEEE Journal of Selected Topics in
Remote Sensing, vol. 52, no. 4, pp. 2197–2216, 2014. Applied Earth Observations and Remote Sensing, vol. 9, no. 7, pp.
[70] J. Geng, J. Fan, H. Wang, X. Ma, B. Li, and F. Chen, “High-resolution 3131–3143, 2016.
SAR image classification via deep convolutional autoencoders,” IEEE [91] S. Singha, M. Johansson, N. Hughes, S. M. Hvidegaard, and H. Skou-
Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2351– rup, “Arctic sea ice characterization using spaceborne fully polarimetric
2355, 2015. l-, c-, and x-band SAR with validation by airborne measurements,”
[71] B. Hou, B. Ren, G. Ju, H. Li, L. Jiao, and J. Zhao, “SAR image IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 7,
classification via hierarchical sparse representation and multisize patch pp. 3715–3734, 2018.
features,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 1, [92] N. Zakhvatkina, V. Smirnov, and I. Bychkova, “Satellite SAR data-
pp. 33–37, 2016. based sea ice classification: An overview,” Geosciences, vol. 9, no. 4,
p. 152, 2019.
[72] F. Gao, T. Huang, J. Wang, J. Sun, A. Hussain, and E. Yang, “Dual-
branch deep convolution neural network for polarimetric SAR image [93] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, “Semantic Annotation
classification,” Applied Sciences, vol. 7, no. 5, p. 447, 2017. of High-Resolution Satellite Images via Weakly Supervised Learning,”
IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6,
[73] B. Hou, H. Kou, and L. Jiao, “Classification of polarimetric SAR
pp. 3660–3671, 2016.
images using multilayer autoencoders and superpixels,” IEEE Journal
[94] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, “When Deep Learning
of Selected Topics in Applied Earth Observations and Remote Sensing,
Meets Metric Learning: Remote Sensing Image Scene Classification via
vol. 9, no. 7, pp. 3072–3081, 2016.
Learning Discriminative CNNs,” IEEE Transactions on Geoscience and
[74] L. Zhang, W. Ma, and D. Zhang, “Stacked sparse autoencoder in
Remote Sensing, vol. 56, no. 5, pp. 2811–2821, 2018.
PolSAR data classification using local spatial information,” IEEE
[95] F. Zhang, C. Hu, Q. Yin, W. Li, H. Li, and W. Hong, “SAR target
Geoscience and Remote Sensing Letters, vol. 13, no. 9, pp. 1359–1363,
recognition using the multi-aspect-aware bidirectional LSTM recurrent
2016.
neural networks,” arXiv:1707.09875, 2017.
[75] F. Qin, J. Guo, and W. Sun, “Object-oriented ensemble classification
[96] E. Keydel, S. Lee, and J. Moore, “MSTAR extended operating condi-
for polarimetric SAR imagery using restricted Boltzmann machines,”
Xiao Xiang Zhu (S’10–M’12–SM’14–F’21) received the Master (M.Sc.) degree, the Doctor of Engineering (Dr.-Ing.) degree, and the “Habilitation” in the field of signal processing from the Technical University of Munich (TUM), Munich, Germany, in 2008, 2011, and 2013, respectively.
She is currently the Professor for Data Science in Earth Observation (formerly: Signal Processing in Earth Observation) at the Technical University of Munich (TUM) and the Head of the Department “EO Data Science” at the Remote Sensing Technology Institute, German Aerospace Center (DLR). Since 2019, she has been a co-coordinator of the Munich Data Science Research School (www.mu-ds.de) and has also headed the Helmholtz Artificial Intelligence Research Field “Aeronautics, Space and Transport”. Since May 2020, she has been the director of the international future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond”, Munich, Germany. Since October 2020, she has also served on the board of directors of the Munich Data Science Institute (MDSI), TUM. Prof. Zhu was a guest scientist or visiting professor at the Italian National Research Council (CNR-IREA), Naples, Italy, Fudan University, Shanghai, China, the University of Tokyo, Tokyo, Japan, and the University of California, Los Angeles, United States, in 2009, 2014, 2015, and 2016, respectively. Her main research interests are remote sensing and Earth observation, signal processing, machine learning, and data science, with a special application focus on global urban mapping.
Dr. Zhu is a member of the young academy (Junge Akademie/Junges Kolleg) at the Berlin-Brandenburg Academy of Sciences and Humanities, the German National Academy of Sciences Leopoldina, and the Bavarian Academy of Sciences and Humanities. She is an Associate Editor of the IEEE Transactions on Geoscience and Remote Sensing.

Sina Montazeri received the B.Sc. degree in geodetic engineering from the University of Isfahan, Isfahan, Iran, in 2011, the M.Sc. degree in geomatics from Delft University of Technology (TU Delft), Delft, The Netherlands, in 2014, and the Ph.D. degree in radar remote sensing from the Technical University of Munich (TUM), Munich, Germany, in 2019, with a dissertation on Geodetic SAR Interferometry. In 2012, he spent two weeks with the Laboratoire des Sciences de l’Image, de l’Informatique et de la Télédétection, University of Strasbourg, Strasbourg, France, as a Junior Researcher working on thermal remote sensing. From 2013 to 2015, he was a Research Assistant with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), where he was involved in the absolute localization of point clouds obtained from SAR tomography. From 2015 to 2019, he was a research associate with TUM-SiPEO and DLR-IMF, working on the automatic positioning of ground control points from multi-view radar images. He is currently a Senior Researcher with the Department of EO Data Science of DLR-IMF, focused on developing machine learning algorithms applied to radar imagery. His research interests include advanced InSAR techniques for deformation monitoring of urban infrastructure, image and signal processing relevant to radar imagery, and applied machine learning.
Dr. Montazeri was the recipient of the DLR Science Award and the IEEE Geoscience and Remote Sensing Society Transactions Prize Paper Award, in 2016 and 2017, respectively, for his work on Geodetic SAR Tomography.

Mohsin Ali received the Bachelor’s degree in computer engineering from the National University of Science and Technology (NUST), Islamabad, Pakistan, in 2013, and the Master’s degree in computer science from the University of Freiburg, Germany, in 2018. Since April 2019, he has been a Ph.D. candidate at the Earth Observation Center, DLR, supervised by Prof. Dr. Xiaoxiang Zhu. His main research interest is uncertainty estimation in deep learning models for remote sensing applications.

Yuansheng Hua (S’18) received the Bachelor’s degree in remote sensing science and technology from Wuhan University, Wuhan, China, in 2014, and the Master’s degree in Earth Oriented Space Science and Technology (ESPACE) from the Technical University of Munich (TUM), Munich, Germany, in 2018.
He is currently pursuing the Ph.D. degree with the German Aerospace Center (DLR), Wessling, Germany, and the Technical University of Munich (TUM), Munich, Germany. In 2019, he was a visiting researcher with Wageningen University & Research, Wageningen, Netherlands. His research interests include remote sensing, computer vision, and deep learning, especially their applications in remote sensing.

Yuanyuan Wang (S’08–M’11) received the B.Eng. degree (Hons.) in electrical engineering from The Hong Kong Polytechnic University, Hong Kong, in 2008, and the M.Sc. and Dr.-Ing. degrees from the Technical University of Munich (TUM), Munich, Germany, in 2010 and 2015, respectively. In June and July 2014, he was a Guest Scientist with the Institute of Visual Computing, ETH Zürich, Zürich, Switzerland. He is currently with the Department of EO Data Science, Remote Sensing Technology Institute of the German Aerospace Center, Weßling, Germany, where he leads the working group Big SAR Data. He is also a guest member of the Professorship of Data Science in Earth Observation, Technical University of Munich, Munich, Germany, where he supports the scientific management of the ERC projects So2Sat (so2sat.eu) and AI4SmartCities (cordis.europa.eu/project/id/957467). His research interests include optimal and robust parameter estimation in multibaseline InSAR techniques, multisensor fusion algorithms for synthetic aperture radar (SAR) and optical data, nonlinear optimization with complex numbers, machine learning in SAR, and high-performance computing for big data.
Dr. Wang serves as a reviewer for multiple IEEE GRSS and other remote sensing journals. He was one of the best reviewers of the IEEE Transactions on Geoscience and Remote Sensing in 2016. He is also an associate editor of the Geoscience Data Journal of the UK Royal Meteorological Society.

Lichao Mou received the Bachelor’s degree in automation from the Xi’an University of Posts and Telecommunications, Xi’an, China, in 2012, the Master’s degree in signal and information processing from the University of Chinese Academy of Sciences (UCAS), China, in 2015, and the Dr.-Ing. degree from the Technical University of Munich (TUM), Munich, Germany, in 2020.
He is currently a Guest Professor at the Munich AI Future Lab AI4EO, TUM, and the Head of the Visual Learning and Reasoning team at the Department “EO Data Science”, Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany. Since 2019, he has been an AI Consultant for the Helmholtz Artificial Intelligence Cooperation Unit (HAICU). In 2015, he spent six months at the Computer Vision Group at the University of Freiburg, Germany. In 2019, he was a Visiting Researcher with the Cambridge Image Analysis Group (CIA), University of Cambridge, UK. From 2019 to 2020, he was a Research Scientist at DLR-IMF.
He was the recipient of first place in the 2016 IEEE GRSS Data Fusion Contest and a finalist for the Best Student Paper Award at the 2017 and 2019 Joint Urban Remote Sensing Events.
Yilei Shi (M’18) received the Diploma (Dipl.-Ing.) degree in mechanical engineering and the Doctorate (Dr.-Ing.) degree in engineering from the Technical University of Munich (TUM), Germany. In April and May 2019, he was a guest scientist with the Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom. He is currently a senior scientist with the Chair of Remote Sensing Technology, Technical University of Munich.
His research interests include computational intelligence; fast solvers and parallel computing for large-scale problems; advanced methods for SAR and InSAR processing; machine learning and deep learning for a variety of data sources, such as SAR, optical, and medical images; and PDE-related numerical modeling and computing.

Feng Xu (S’06–M’08–SM’14) received the B.E. (Hons.) degree in information engineering from Southeast University, Nanjing, China, in 2003, and the Ph.D. (Hons.) degree in electronic engineering from Fudan University, Shanghai, China, in 2008.
From 2008 to 2010, he was a Post-Doctoral Fellow with the NOAA Center for Satellite Application and Research, Camp Springs, MD, USA. From 2010 to 2013, he was with Intelligent Automation Inc., Rockville, MD, USA, while partly with the NASA Goddard Space Flight Center, Greenbelt, MD, USA, as a Research Scientist. In 2012, he was selected into China’s Global Experts Recruitment Program, and he subsequently returned to Fudan University, Shanghai, China, in 2013, where he is currently a Professor with the School of Information Science and Technology and the Vice Director of the MoE Key Laboratory for Information Science of Electromagnetic Waves. He has authored more than 30 papers in peer-reviewed journals and many conference papers, co-authored two books, and holds two patents. His research interests include electromagnetic scattering modeling, SAR information retrieval, and radar system development.
Dr. Xu was a recipient of the second-class National Natural Science Award of China, the 2014 Early Career Award of the IEEE Geoscience and Remote Sensing Society, and the 2014 SUMMA Graduate Fellowship in the advanced electromagnetics area. He currently serves as an Associate Editor of the IEEE Geoscience and Remote Sensing Letters. He is the Founding Chair of the IEEE GRSS Shanghai Chapter.

Richard Bamler (M’95–SM’00–F’05) received his Diploma degree in Electrical Engineering, his Doctorate in Engineering, and his “Habilitation” in the field of signal and systems theory from the Technical University of Munich, Germany, in 1980, 1986, and 1988, respectively.
He worked at the university from 1981 to 1989 on optical signal processing, holography, wave propagation, and tomography. He joined the German Aerospace Center (DLR), Oberpfaffenhofen, in 1989, where he is currently the Director of the Remote Sensing Technology Institute. In early 1994, he was a visiting scientist at the Jet Propulsion Laboratory (JPL) in preparation for the SIR-C/X-SAR missions, and in 1996 he was a guest professor at the University of Innsbruck. Since 2003, he has held a full professorship in remote sensing technology at the Technical University of Munich as a double appointment with his DLR position. His teaching activities include university lectures and courses on signal processing, estimation theory, and SAR. Since he joined DLR, he, his team, and his institute have been working on SAR and optical remote sensing, image analysis and understanding, stereo reconstruction, computer vision, ocean color, passive and active atmospheric sounding, and laboratory spectrometry. They were and are responsible for the development of the operational processors for SIR-C/X-SAR, SRTM, TerraSAR-X, TanDEM-X, Tandem-L, ERS-2/GOME, ENVISAT/SCIAMACHY, MetOp/GOME-2, Sentinel-5P, Sentinel-4, DESIS, EnMAP, etc.
His research interests are in algorithms for optimum information extraction from remote sensing data, with an emphasis on SAR. This involves new estimation algorithms, such as sparse reconstruction, compressive sensing, and deep learning.
