
Geo-spatial Information Science

ISSN: 1009-5020 (Print) 1993-5153 (Online) Journal homepage: https://www.tandfonline.com/loi/tgsi20

Deep learning for geometric and semantic tasks in photogrammetry and remote sensing

Christian Heipke & Franz Rottensteiner

To cite this article: Christian Heipke & Franz Rottensteiner (2020) Deep learning for geometric
and semantic tasks in photogrammetry and remote sensing, Geo-spatial Information Science,
23:1, 10-19, DOI: 10.1080/10095020.2020.1718003

To link to this article: https://doi.org/10.1080/10095020.2020.1718003

Published online: 03 Feb 2020.

GEO-SPATIAL INFORMATION SCIENCE
2020, VOL. 23, NO. 1, 10–19
https://doi.org/10.1080/10095020.2020.1718003

Deep learning for geometric and semantic tasks in photogrammetry and remote sensing

Christian Heipke and Franz Rottensteiner
Institute of Photogrammetry and GeoInformation (IPI), Leibniz University Hannover, Hannover, Germany

ABSTRACT
During the last few years, artificial intelligence based on deep learning, and particularly based on convolutional neural networks, has acted as a game changer in just about all tasks related to photogrammetry and remote sensing. Results have shown partly significant improvements in many projects all across the photogrammetric processing chain, from image orientation to surface reconstruction, scene classification as well as change detection, object extraction and object tracking and recognition in image sequences. This paper summarizes the foundations of deep learning for photogrammetry and remote sensing before illustrating, by way of example, different projects being carried out at the Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, in this exciting and fast moving field of research and development.

ARTICLE HISTORY
Received 18 December 2019
Accepted 14 January 2020

KEYWORDS
Deep learning; machine learning; convolutional neural networks (CNN); example project from IPI

1. Introduction

The use of neurons and neural networks for artificial intelligence in general, and for tasks related to image understanding in particular, is not new. Artificial neurons were described by McCulloch and Pitts as early as 1943. Rosenblatt (1958) developed the first computer program which implemented the so-called concept of perceptrons (see Figure 1) and was able to learn based on trial and error. After Minsky and Papert (1969) proved mathematically that the original concept could not model the important XOR statement (exclusive OR; the result is true only for an odd number of positive inputs), which dealt the research on neural networks a significant blow, the field was revived about two decades later with the introduction of backpropagation (Rummelhart, Hinton, and Williams 1986; LeCun 1987), which allowed the efficient training of multi-layer artificial neural networks (see Figure 2), to which the theoretical restrictions noted by Minsky and Papert (1969) do not apply. Other important steps were the introduction of Convolutional Neural Networks (CNN, LeCun et al. 1989; LeCun and Bengio 1998) and deep belief networks (Hinton, Osindero, and Teh 2006). The breakthrough of deep learning came when Krizhevsky, Sutskever, and Hinton (2012) won the ImageNet Large-Scale Recognition Challenge, a classification task involving 1000 different classes (Russakovsky et al. 2015), using a CNN-based approach. Their network, called AlexNet, lowered the remaining error by nearly 50% compared to the previous best result.

Since then, deep learning based on neural networks has seen tremendous success in many different areas, including photogrammetry and remote sensing (Zhu et al. 2017). The main reasons are twofold: (a) for a few years now, computers have been powerful enough to process and store data using large networks with many layers (called "deep" networks), in particular when using GPUs (graphical processing units) during training, and (b) more and more training data became available for the different tasks (it should be noted that AlexNet used some 1.2 million labeled training images to learn a total of some 60 million parameters). The most comprehensive textbook available for deep learning today is the one by Goodfellow, Bengio, and Courville (2016).

This paper is structured as follows: after a brief summary of the principles of deep learning and CNN, by way of example we describe the work carried out along those lines at the Institute of Photogrammetry and GeoInformation (IPI) of Leibniz University Hannover. We subdivide the main chapter into geometric approaches and those used in aerial image analysis and close range. Finally, some conclusions are drawn.

2. Convolutional networks for image analysis

In principle, a CNN can be considered a classifier. In traditional classifiers (random forests, support vector machines, conditional random fields, maximum likelihood estimation, etc.) features representing the different classes are extracted from the data set in a pre-processing step, and classification is then performed based on these features. It is clear then that the results can only be as good as the selected features. CNN overcome this problem by learning the features together with the
CONTACT Christian Heipke heipke@ipi.uni-hannover.de


© 2020 Wuhan University. Published by Informa UK Limited, trading as Taylor & Francis Group.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1. Concept of a perceptron j. Depicted are the input xi, the weight wji, the bias bj, the (non-linear) function f and the
resulting output aj.
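In symbols, the perceptron of Figure 1 computes aj = f(Σi wji·xi + bj). A minimal sketch of this computation (plain NumPy; the choice of ReLU for the non-linear function f is purely illustrative, the historical perceptron used a step function):

```python
import numpy as np

def perceptron(x, w, b, f=lambda z: np.maximum(z, 0.0)):
    """Single perceptron j: output a_j = f(sum_i w_ji * x_i + b_j).
    The activation f is chosen as ReLU here for illustration only."""
    return f(np.dot(w, x) + b)

# two inputs with hand-picked weights and bias: 0.5*1.0 - 0.25*2.0 + 0.1 = 0.1
a = perceptron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1)  # a == 0.1
```

Stacking many such units per layer, and several layers on top of each other, yields the multi-layer network of Figure 2.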

corresponding label for each data sample (see Figure 3). The price to pay is the fact that a very large amount of training data is needed to estimate this largely increased number of unknowns. Since often the required amount of training data is not available, additional data are generated from the available ones (data augmentation), or simulation results are used as a substitute for real training data.

Figure 2. Artificial neural network with input layer, two hidden layers and output layer.

Figure 3. Concept of a standard classifier (top) and a CNN classifier (bottom). The advantage of the latter is that the features and the model parameters are learned simultaneously from the training data.

In a CNN architecture, in principle three different steps are carried out in each layer (see Figure 4): (a) the convolution step, where a set of digital filters is applied to an input image of fixed size; (b) a so-called pooling step, where from a larger group of filtered pixels only one (the one with the maximum entry in the case of max-pooling) is retained; and (c) an activation step, where the remaining set of pixels is subjected to a non-linear function. In most current works the rectified linear unit (ReLU) has been chosen as the activation function. These steps are followed by processing through a few densely connected layers, which eventually results in a feature vector representing the complete input image. This feature vector is then classified using an arbitrary classifier. Typically, the softmax classifier is used, as it has several advantages (Kreinovich and Quintana 1991).

Similar to the concept of image pyramids, the pooling step is employed to increase the context area considered by each filter. A non-linear activation function must be used since otherwise all steps could be substituted by one (linear) layer between input and output, which is known not to be expressive enough for learning any but very simple tasks. The elements of the filters are considered as unknown parameters which are learned from training data via stochastic gradient descent. Initial values can typically be selected arbitrarily, and the gradients are computed by backpropagation. Updates for the unknowns are found based on a specially designed loss function, which for the training data minimizes a function of the differences between the class predicted by the network and the known class. Various training strategies are in use regarding the size of the sample set used simultaneously (called batch size) in one parameter update

Figure 4. Architecture of a typical Convolutional Neural Network for image analysis. The figure shows the successive steps of
convolution and pooling to generate a feature vector which is classified in the final step, typically using the softmax classifier (the
non-linear activation function is not depicted).
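The successive steps depicted in Figure 4, i.e. convolution, pooling, activation and a final softmax classification of the feature vector, can be sketched in plain NumPy (single channel, single filter; all sizes and weights are illustrative, not those of any actual network):

```python
import numpy as np

def conv2d(img, kernel):
    """(a) Convolution step: slide a digital filter over the image (valid range)."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, s=2):
    """(b) Pooling step: of each s x s block, only the maximum entry is retained."""
    h, w = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

def relu(x):
    """(c) Activation step: subject the result to a non-linear function."""
    return np.maximum(x, 0.0)

def softmax(z):
    """Final classification of the feature vector into class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

img = np.random.rand(8, 8)
feat = relu(max_pool(conv2d(img, np.ones((3, 3)))))  # 8x8 -> 6x6 -> 3x3
scores = softmax(np.ones((2, 9)) @ feat.ravel())     # dummy dense layer, 2 classes
```

In a real CNN, the kernel and dense-layer entries are the unknowns estimated from training data, rather than being fixed as above.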

step and the selection of nodes used for each training sample (in the so-called dropout strategy, some of the nodes are not always used, to increase the generalization capabilities of the network).

As should have become apparent from this description, when using a CNN, several parameters need to be fixed prior to processing the images. These comprise among others the number of filters and their size, the number of nodes in each layer and the number of layers. The latter one is of particular importance (Baral, Fuentes, and Kreinovich 2018): In principle, a neural network (as any supervised classifier) can be seen as an interpolation function with the training samples serving as support. Each path between input and output through the network represents such a function. In order to increase the accuracy of the overall results, many different functions are needed. However, permutations within a layer lead to the same function being implemented through different paths. Therefore, the number of nodes per layer should be kept reasonably small, and as a consequence, many layers are needed in order to obtain the number of unknowns necessary for complex tasks; this explains the fact that in general deeper networks yield better results (e.g. He et al. 2015).

While the original concept of a CNN would typically learn a feature vector to represent a whole image, other tasks have also been solved using CNN. Among those are pixel-wise classification (called semantic segmentation in Computer Vision (CV) terminology), where Fully Convolutional Networks (FCN; Long, Shelhamer, and Darrell 2015) are employed. Encoder-decoder networks (Hinton and Salakhutdinov 2006; Ronneberger, Fischer, and Brox 2015, see Figure 5) carry out the upsampling required to get pixel-wise class predictions in a series of steps in the decoder part that mirror the structure of the downsampling procedure of the encoder network. The U-net structure of Ronneberger, Fischer, and Brox (2015) includes so-called skip connections to better preserve object boundaries. Also object detection, where objects are described by bounding boxes (Ren et al. 2017), and object delineation (instance segmentation in the CV world; He et al. 2017), where in addition to these

Figure 5. The U-net architecture, an example of an encoder network with skip connections (Ronneberger, Fischer, and Brox 2015).

bounding boxes a mask is computed for each object with pixels belonging to either fore- or background, describe very useful tasks tackled using CNNs. Other network architectures comprise Siamese networks (Bromley 1993), where weights are shared between two different parts of the network, often to determine the similarity of two images (e.g. in image matching), Recurrent Neural Networks (RNN, e.g. Grave et al. 2009) for dealing with time-dependent data, and Generative Adversarial Networks (GAN, Goodfellow et al. 2014), which can learn new data with the same statistical distribution as a given data set. The latter can be useful, e.g. in transfer learning (Yosinski et al. 2014; Tzeng et al. 2017). Finally, CNN techniques have also been applied to unstructured 3D point data (Landrieu and Simonovsky 2018), e.g. representing depth (Qi et al. 2016).

In particular, for pixel-wise classification and for object delineation it is important in our field to consider the geometric accuracy of the object boundary, as a different label is sought for each pixel. Thus, in some works maximum pooling, which acts as a low pass filter and thus blurs the boundary, is not used. In order to still keep the number of filter elements, and thus of unknown parameters to be estimated, at a reasonable number, filter elements are interpolated from a selected number of unknowns in successive layers, or dilated convolution, originally developed for wavelet decomposition (Holschneider et al. 1990; Yu and Koltun 2016), is used, where a number of elements are set to zero. In both cases, care should be taken not to violate the sampling theorem.

3. Deep learning research at IPI

In photogrammetry and remote sensing, and in particular when dealing with aerial or satellite images, some of the conditions which hold true for typical computer vision applications do not apply: (a) the images are much larger and contain a multitude of objects, each often only a few pixels in size; (b) the image orientation and the ground sampling distance are typically known; (c) there is no preferred direction in the image ("up" does not point to the sky); (d) besides 3-channel color images, other modalities such as additional bands (e.g. the infrared channel) and depth are often available, sometimes also other data such as maps, social media data or Volunteered Geographical Information (VGI); (e) often, considerable prior knowledge about the scene is available; (f) typically, there is a shortage of training data, while at least in an update scenario outdated map data are given; and finally (g) the accuracy requirements are typically more stringent, both for geometric and for semantic results. Thus, the question arose a few years ago as to what extent deep learning and CNN can be used to advantage also in photogrammetry and remote sensing. This question has also influenced work at the Institute of Photogrammetry and GeoInformation, as will be shown in the following.

3.1. CNN for geometric tasks

Problems relating to image orientation and dense surface reconstruction are considered geometric tasks in this context. We report on projects related to these two tasks.

In image orientation, a specific problem is the detection, description and matching of conjugate point pairs. While in standard cases different operational solutions based on the well-known SIFT (Scale Invariant Feature Transform, Lowe 2004) operator exist, these solutions reach their limits for wide baseline image pairs with largely different viewing directions and different scales. This is for instance the case when oblique aerial images of different viewing directions need to be matched. Chen, Rottensteiner, and Heipke (2016) suggest a Siamese network to learn a feature descriptor to solve this problem. The loss function is designed according to the triplet loss paradigm (Weinberger and Saul 2009): it pulls the descriptors of matching patches closer in feature space while pushing the descriptors of non-matching pairs further away from each other.

Even after decades of research and development, 3D surface reconstruction cannot be considered a problem solved under all circumstances: areas with poor and repetitive texture, as well as sharp depth discontinuities and resulting occlusions, continue to pose difficulties. The first solution based on CNN was presented by Zbontar and LeCun (2015). At IPI we deal with this problem on two levels: On the one hand, Kang et al. (2019) developed a new dense stereo method based on dilated convolution, which does not only use depth as training data but also includes a depth gradient term in the loss function (see Figure 6). The results show that more detail can be retrieved, in particular in the presence of depth discontinuities, if (and only if) the gradients in the training data are reliable. On the other hand, Mehltretter and Heipke (2019) improve the quality of dense stereo matching by analyzing the 3D cost volume of the related disparity space image. In a novel CNN architecture, features for confidence estimation are directly learned from the volumetric 3D data.

3.2. Aerial image analysis

The automatic analysis of aerial imagery has been a major focus of research at IPI for a number of decades. We currently work on three different topics with a connection to deep learning: land cover and land use classification, transfer learning and bomb crater detection.
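The triplet loss paradigm used for descriptor learning in Section 3.1 can be sketched as follows (a simplified squared-Euclidean variant with an assumed margin of 1; the actual loss of Chen, Rottensteiner, and Heipke (2016) may differ in its details):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the descriptor of a matching patch (positive) towards the anchor,
    push the descriptor of a non-matching patch (negative) away, until the
    two squared distances differ by at least the margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
print(triplet_loss(a, np.array([0.1, 0.0]), np.array([2.0, 0.0])))  # well separated: 0.0
```

Once the negative is sufficiently far from the anchor relative to the positive, the loss vanishes and the triplet no longer influences the gradient.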

Figure 6. Network architecture for dense matching (Kang et al. 2019).
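Dilated convolution, the building block of the network in Figure 6, enlarges a filter's receptive field by inserting zeros between its elements, without adding unknown parameters; a minimal sketch:

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert rate-1 zeros between the elements of filter k (dilated convolution).
    The number of unknown parameters stays the same; the context area grows."""
    kh, kw = k.shape
    out = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    out[::rate, ::rate] = k
    return out

kd = dilate_kernel(np.ones((3, 3)), 2)  # 9 parameters now cover a 5x5 area
```

As noted in Section 2, care must be taken that the zero-filled filter does not violate the sampling theorem with respect to the image content.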

The first one is concerned with the update of land cover and land use databases. Yang, Rottensteiner, and Heipke (2018, 2019) have suggested two network architectures, one for land cover and another one for land use update. For the land cover, an ensemble classifier combining RGB data with an infrared channel and height in the form of a normalized Digital Terrain Model is being used in an encoder-decoder network structure with skip connections (see Figure 7). In the following land use estimation, the object shapes are taken from the topographic database to stabilize the solution, while for each object the label is estimated using the input information as well as the result of land cover classification. The results confirm that CNN can outperform the best methods employed previously, i.e. Conditional Random Fields (Albert, Rottensteiner, and Heipke 2017).

Another topic we work on is related to transfer learning with the goal of pixel-wise classification of mono-temporal data (Wittich and Rottensteiner 2019). Assuming the availability of labeled training data for existing data (called the source domain), we adapt a CNN trained on these data to new data (target domain) that have a different joint distribution of class labels and features. In domain adaptation, a specific setting of transfer learning, this adaptation is to be achieved without new hand-labeled training samples from the new domain. For that purpose, we adapt Adversarial Discriminative Domain Adaptation (ADDA; Tzeng et al. 2017) to the prediction of land cover from aerial images and a Digital Surface Model (DSM). Adversarial methods try to train a neural network to produce a feature representation that is independent of the domain from which a sample is drawn; similarity is measured by the capability of another neural network (called the discriminator) to predict from which domain a feature vector was drawn. While ADDA gives encouraging results for similar domains, there is clearly room for improvement if the domains are very different, especially with respect to the distribution of class labels.
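The adversarial idea can be illustrated with a linear toy discriminator: it is trained to tell the domains apart, while the feature extractor is trained with the opposite objective, so that target features become indistinguishable from source features (a hypothetical sketch of the two competing objectives, not the actual ADDA architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(w, feat, is_target):
    """Cross-entropy of a linear discriminator guessing the domain of a feature."""
    p = sigmoid(np.dot(w, feat))            # predicted probability of 'target'
    return -np.log(p) if is_target else -np.log(1.0 - p)

def extractor_loss(w, feat):
    """Adversarial objective: target features should look like source features,
    i.e. the discriminator should classify them as 'source'."""
    return -np.log(1.0 - sigmoid(np.dot(w, feat)))

w = np.zeros(3)                              # untrained discriminator: p = 0.5
loss = discriminator_loss(w, np.array([1.0, 2.0, 3.0]), True)   # == ln 2
```

Training alternates between minimizing the discriminator loss over w and minimizing the extractor loss over the feature representation.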

Figure 7. Architecture of the ensemble classifier for semantic segmentation of land cover, including skip connections. The top encoder part takes color images as input, the bottom part the infrared channel and height information. The encoder part ensures detailed information for each pixel (Yang et al. 2019).
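The mirrored down- and upsampling of such an encoder-decoder structure, and the skip connections that reinject high-resolution encoder features into the decoder, can be sketched as follows (nearest-neighbour upsampling and max-pooling are chosen here for simplicity; actual networks use learned convolutions at each stage):

```python
import numpy as np

def encode(x):
    """Encoder step: downsample by 2x2 max-pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def decode(x):
    """Decoder step: upsample back towards the input resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.random.rand(8, 8)
bottleneck = encode(encode(x))      # 8x8 -> 4x4 -> 2x2
up = decode(decode(bottleneck))     # 2x2 -> 4x4 -> 8x8, mirrors the encoder
skip = np.stack([up, x])            # skip connection: reuse encoder features
```

The stacked array illustrates why skip connections help preserve object boundaries: the decoder output alone is blurred by the pooling, while the reinjected encoder features retain per-pixel detail.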

In a more classical pattern recognition approach, Clermont et al. (2019) extract bomb craters from images acquired during the Second World War (see Figure 8). The background of this work is the fact that a number of bombs did not explode during the war and are still sitting in the ground, posing a significant danger in particular during ground construction work. The rationale of the project is that finding the bomb craters will give an indication of where unexploded bombs might lie. The work is based on a variant of the ResNet architecture (He et al. 2015); the results show that this seemingly not so difficult problem is indeed challenging, partly because of the lack of a sufficient number of training data.

3.3. Close range applications

In this area, we are concerned with mobility, as well as a project dealing with artwork. In the field of mobility, we have designed and implemented a system that can recognize and determine the relative poses of cars in a stereoscopic image sequence based on adaptive shape models. In a related project, pedestrians are detected and tracked in these sequences. Finally, we are working on the re-identification of persons being viewed from different cameras of a sensor network. All three projects are connected to the German Science Foundation as part of the Research Training Network "Integrity and Collaboration in Dynamic Sensor Networks" funded at our university (i.c.sens 2019).

In the first project (Coenen, Rottensteiner, and Heipke 2019), for every detected object a CAD model is fitted into a stereo image pair and the derived point cloud, allowing to estimate the pose of the car relative to the camera position and, consequently, of the camera relative to the other car. If the detected cars are equipped with a GNSS receiver and can communicate their position to the camera, these cars thus act as dynamic control points for image orientation and, thus, for the positioning of the cars. The core of the method is 3D reconstruction by optimizing a probabilistic energy function involving several data and prior terms. A multi-task CNN delivers some of the image-related data terms by predicting the positions of keypoints and model edges in the image, while also providing a prior term for the coarse orientation (rotation about the vertical axis) of the car. Figure 9 shows qualitative results of the 3D reconstruction based on Coenen, Rottensteiner, and Heipke (2019).

Pedestrian detection and tracking (Nguyen, Rottensteiner, and Heipke 2019) rely on the Mask R-CNN approach (He et al. 2017) to generate and classify region proposals assumed to contain pedestrians. Since stereo information is available, detection and tracking are carried out in 3D space, which allows to employ additional geometric constraints (a position in 3D can only be occupied by one person). Data association is then based on the triplet loss using TriNet (Hermans, Beyer, and Leibe 2017) and takes into account the local context. Experiments indicate the good quality of the results, both when evaluating the geometric accuracy of the resulting trajectories and also when investigating their length: the new approach shows fewer identity switches and thus longer trajectories than comparable solutions.

Person re-identification is tackled by using a fisheye camera in nadir viewing position (Blott, Takami, and Heipke 2018; Blott, Yu, and Heipke 2019). In this way, multiple views of a person (front, side, back) can be extracted from the image sequences, before comparing this 3-view set of images with a database in order to re-identify the person. Classification of the different views uses a ResNet variant (He et al. 2015), while in the matching stage TriNet is used to extract features. The results are promising and the approach outperforms existing approaches by a significant margin, partly due to the fact that more information is available than in single image solutions.

The last project we want to discuss is related to cultural heritage documentation. There are many museums having collections of silk fabrics. These collections are also documented in digital records, typically consisting of digital images and a corresponding text.

Figure 8. Results of automatic detection of bomb craters in historic wartime images using convolutional neural networks.
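A shortage of training data, as noted for the crater detection task, is commonly alleviated by the data augmentation introduced in Section 2: label-preserving transformations turn each available image into several training samples (a minimal sketch; the chosen transformations are illustrative only):

```python
import numpy as np

def augment(img):
    """Generate additional training samples from one image by
    label-preserving transformations (data augmentation)."""
    return [img, np.fliplr(img), np.flipud(img), np.rot90(img)]

samples = augment(np.arange(12.0).reshape(3, 4))  # 1 image -> 4 samples
```

For rotation-invariant targets such as roughly circular craters, flips and rotations are natural choices, since aerial images have no preferred "up" direction.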

Figure 9. Four qualitative results of 3D vehicle reconstruction based on (Coenen et al. 2019). Left: Input image, superimposed with
extracted model wireframes. Right: 3D view on the reconstructed scene.
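The reconstruction shown in Figure 9 rests on minimizing a probabilistic energy over the pose. The principle can be sketched with a one-dimensional toy example (both terms below are purely illustrative, not the actual terms of Coenen, Rottensteiner, and Heipke (2019)):

```python
import numpy as np

def energy(pose, data_terms, prior_terms):
    """Energy = sum of data terms (from image observations) and prior terms."""
    return sum(t(pose) for t in data_terms) + sum(t(pose) for t in prior_terms)

observed = 0.3                            # e.g. a CNN-based heading estimate
data = [lambda a: (a - observed) ** 2]    # data term from image measurements
prior = [lambda a: 0.1 * a ** 2]          # coarse-orientation prior
angles = np.linspace(-np.pi, np.pi, 2001)
best = min(angles, key=lambda a: energy(a, data, prior))  # close to 0.6 / 2.2
```

The minimizer is a compromise between the observation and the prior; in the actual system the optimization runs over the full 3D pose rather than a single angle.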

The information contained in the text, e.g. describing the time or place of production of a fabric, is very important for art historians, but it is not provided in a standardized way, and sometimes important pieces of information are missing. In the context of an EU H2020 project (SILKNOW 2019), a multi-task CNN based on ResNet (He et al. 2015) was developed that simultaneously predicts the production time, the production place and the production technique from a digital image, deriving the training data automatically by analyzing existing collections (Dorozynski, Clermont, and Rottensteiner 2019). The results show that by combining these prediction tasks, the accuracy of prediction is increased if high-quality training samples are used.

4. Conclusions

The short summary of the individual projects had the goal to convince the reader that, indeed, deep learning and CNN-based solutions carry great value in photogrammetry and remote sensing. In both geometric and semantic tasks, CNN-based solutions outperform those based on more traditional image analysis. The strength of CNN is the combined estimation of the feature representation and the labels during classification, and it seems that deeper networks are practically guaranteed to yield better results than shallow networks, as long as enough training data are available. Open source implementations for CNN exist, and the industry has started to make heavy use of these algorithms.

Having said that, one should not forget that in essence, a CNN (and any deep learning approach) is a classifier. As such it comes with the same general limitations as any other classifier. Therefore, a number of questions need further attention:

● A CNN needs a sufficient number of representative training data, well balanced with respect to the related classes. Otherwise there is a risk of overfitting the classifier to the training data, and a bias is likely to be introduced into the results. To increase the amount of training data, data augmentation, transfer learning, approaches which are able to tolerate a certain amount of incorrect labels (label noise), and semi-supervised and unsupervised learning (clustering) can be employed and should be studied. In some cases, simulation techniques may also help.
● A CNN "cannot learn the unseen"; the generalization capabilities are limited to previously seen training data.
● Incremental learning and forgetting (or "unlearning") data, e.g. those which are not relevant anymore due to a changing environment, is a topic which has received little attention in our field so far, yet this area offers a large potential, in particular for multi-temporal analysis.
● A number of design decisions need to be taken, e.g. with respect to the network architecture and the design of the loss function. It is not clear in general how different choices influence the results, and how robust the classifiers are. Some works suggest that CNN can indeed be fooled relatively easily (Nguyen, Yosinski, and Clune 2015).
● A CNN is based on correlations of different data sets. We argue that understanding a task to then reason about possible solutions in a way humans do is far beyond the scope of the currently employed methods (note that this does not mean that reasoning is not done, e.g. in a game of chess or Go. It does mean, however, that CNN does not have an intuition for possibly correct solutions and abstract deductive learning).
● A CNN is largely a black box. While it may deliver very good results, it is largely unknown why and how exactly these results are being reached. Besides
GEO-SPATIAL INFORMATION SCIENCE 17

being a little frustrating from a scientific point of Fernerkundung Geoinformation”. Being the Chairman of the
view, this means that the limitations of these meth- ISPRS Working Group II/4, he initiated and conducted the
ods cannot clearly be stated, resulting in some ISPRS benchmark on urban object detection and 3D building
reconstruction.
doubts whether the methods can be employed in
real-world safety- and security-related areas –
autonomous driving is a good example. ORCID
Thus, it seems that a number of difficult research ques- Christian Heipke http://orcid.org/0000-0002-7007-9549
tions still exist in our field. Besides taking care of a better Franz Rottensteiner http://orcid.org/0000-0003-1942-
8210
geometric and semantic accuracy of the results, improv-
ing their reliability is of great importance. This will only
be possible by investigating better ways to explain why References
deep learning approaches give the results they do (see
e.g. Roscher et al. 2019). Another important aspect is the General references
integration of deep learning approaches with other
Baral, C., O. Fuentes, and V. Kreinovich. 2018.“Why Deep
learning paradigms and prior knowledge, according to
Neural Networks: A Possible Theoretical Explanation”. In
the motto, “Why learn what we already know?”. So far, Constraint Programming and Decision Making: Theory
the approaches discussed in this paper are mainly stand- and Applications, edited by M. Ceberio and V.
alone solutions. We believe that in the long run, only Kreinovich, 1–6. Cham, Switzerland: Springer. http://
a combination of different methods will lead to success. www.cs.utep.edu/vladik/2015/tr15-55.pdf
Bromley, J., J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun,
C. Moore, and R. Shah. 1993. “Signature Verification
Using a “Siamese” Time Delay Neural Network.”
Disclosure statement International Journal of Pattern Recognition and
No potential conflict of interest was reported by the authors. Artificial Intelligence 7 (04): 669–688. doi:10.1142/
S0218001493000339.
Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-
Farley, S. Ozair, A. Courville, and Y. Bengio, 2014.
Notes on contributors

Christian Heipke is a professor of photogrammetry and remote sensing at Leibniz University Hannover, where he currently leads a group of about 25 researchers. His professional interests comprise all aspects of photogrammetry, remote sensing, image understanding and their connection to computer vision and GIS. He has authored or coauthored more than 300 scientific papers, more than 70 of which appeared in peer-reviewed international journals. He is the recipient of the 1992 ISPRS Otto von Gruber Award, the 2012 ISPRS Fred Doyle Award, and the 2013 ASPRS Photogrammetric (Fairchild) Award. He is an ordinary member of various learned societies. From 2004 to 2009, he served as vice president of EuroSDR. From 2011 to 2014 he was chair of the German Geodetic Commission (DGK), and from 2012 to 2016 ISPRS Secretary General. Currently he serves as ISPRS President.

Franz Rottensteiner is an Associate Professor and leader of the research group "Photogrammetric Image Analysis" at Leibniz University Hannover. He received the Dipl.-Ing. degree in surveying and the Ph.D. degree and venia docendi in photogrammetry, all from Vienna University of Technology (TUW), Vienna, Austria. His research interests include all aspects of image orientation, image classification, automated object detection and reconstruction from images and point clouds, and change detection from remote sensing data. Before joining LUH in 2008, he worked at TUW and the Universities of New South Wales and Melbourne, respectively, both in Australia. He has authored or coauthored more than 150 scientific papers, 36 of which have appeared in peer-reviewed international journals. He received the Karl Rinner Award of the Austrian Geodetic Commission in 2004 and the Carl Pulfrich Award for Photogrammetry, sponsored by Leica Geosystems, in 2017. Since 2011, he has been the Associate Editor of the ISI-listed journal "Photogrammetrie – Fernerkundung – Geoinformation" (PFG).

"Generative Adversarial Nets." Advances in Neural Information Processing Systems 27 (NIPS'14), Montreal, Quebec, Canada, December 8–13, 2672–2680.

Goodfellow, I., Y. Bengio, and A. Courville. 2016. Deep Learning. Cambridge, MA: MIT Press.

Graves, A., M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber. 2009. "A Novel Connectionist System for Improved Unconstrained Handwriting Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/TPAMI.2008.137.

He, K., G. Gkioxari, P. Dollar, and R. Girshick. 2017. "Mask R-CNN." Proc. International Conference on Computer Vision (ICCV), Venice, Italy, 2980–2988.

He, K., X. Zhang, S. Ren, and J. Sun. 2015. "Delving Deep into Rectifiers: Surpassing Human-level Performance on ImageNet Classification." IEEE International Conference on Computer Vision (ICCV), Las Condes, Santiago, Chile, 1026–1034.

Hermans, A., L. Beyer, and B. Leibe. 2017. "In Defence of the Triplet Loss for Person Re-identification." In CoRR. arXiv:abs/1703.07737. Ithaca, NY: Cornell University. https://arxiv.org/abs/1703.07737

Hinton, G., and R. Salakhutdinov. 2006. "Reducing the Dimensionality of Data with Neural Networks." Science 313 (5786): 504–507. doi:10.1126/science.1127647.

Hinton, G., S. Osindero, and Y. Teh. 2006. "A Fast Learning Algorithm for Deep Belief Nets." Neural Computation 18: 1527–1554. doi:10.1162/neco.2006.18.7.1527.

Holschneider, M., R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. 1990. "A Real-time Algorithm for Signal Analysis with the Help of the Wavelet Transform." In Wavelets, edited by J. M. Combres, A. Grossmann, and P. Tchamitchian, 286–297. Berlin, Heidelberg: Springer.

Kreinovich, V., and C. Quintana. 1991. "Neural Networks: What Non-linearity to Choose?" Proceedings of the 4th
University of New Brunswick Artificial Intelligence Workshop, Fredericton, New Brunswick, 627–637.

Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2012. "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems 25 (NIPS'12), Lake Tahoe, NV, 1097–1105.

Landrieu, L., and M. Simonovsky. 2018. "Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT.

LeCun, Y. 1987. "Modèles connexionnistes de l'apprentissage." Thèse de Doctorat, Université Paris 6.

LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. "Handwritten Digit Recognition with a Back-propagation Network." 2nd International Conference on Neural Information Processing Systems (NIPS'89), Denver, CO, 396–404.

LeCun, Y., and Y. Bengio. 1998. "Convolutional Networks for Images, Speech, and Time Series." In The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Long, J., E. Shelhamer, and T. Darrell. 2015. "Fully Convolutional Networks for Semantic Segmentation." IEEE Computer Vision and Pattern Recognition (CVPR '15), Boston, MA.

Lowe, D. G. 2004. "Distinctive Image Features from Scale-invariant Keypoints." International Journal of Computer Vision 60 (2): 91–110. doi:10.1023/B:VISI.0000029664.99615.94.

McCulloch, W., and W. Pitts. 1943. "A Logical Calculus of the Ideas Immanent in Nervous Activity." Bulletin of Mathematical Biophysics 5: 115–133. doi:10.1007/BF02478259.

Minsky, M., and S. Papert. 1969. Perceptrons. Cambridge, MA: MIT Press.

Nguyen, A., J. Yosinski, and J. Clune. 2015. "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images." IEEE Computer Vision and Pattern Recognition (CVPR '15), Boston, MA.

Qi, C. R., H. Su, K. Mo, and L. Guibas. 2016. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV.

Ren, S., K. He, R. Girshick, and J. Sun. 2017. "Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6): 1137–1149. doi:10.1109/TPAMI.2016.2577031.

Ronneberger, O., P. Fischer, and T. Brox. 2015. "U-Net: Convolutional Networks for Biomedical Image Segmentation." 18th International Conference on Medical Image Computing and Computer Assisted Intervention, Munich, Germany.

Roscher, R., B. Bohn, M. F. Duarte, and J. Garcke. 2019. "Explainable Machine Learning for Scientific Insights and Discoveries." In arXiv:1905.08883. Ithaca, NY: Cornell University.

Rosenblatt, F. 1958. "The Perceptron. A Probabilistic Model for Information Storage and Organization in the Brain." Psychological Review 65: 386–408. doi:10.1037/h0042519.

Rumelhart, D., G. Hinton, and R. Williams. 1986. "Learning Representations by Back-propagating Errors." Nature 323: 533–536. doi:10.1038/323533a0.

Russakovsky, O., J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, et al. 2015. "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision 115 (3): 211–252. doi:10.1007/s11263-015-0816-y.

Tzeng, E., J. Hoffman, K. Saenko, and T. Darrell. 2017. "Adversarial Discriminative Domain Adaptation." Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, July 21–26.

Weinberger, K. Q., and L. K. Saul. 2009. "Distance Metric Learning for Large Margin Nearest Neighbor Classification." Journal of Machine Learning Research 10: 207–244.

Yosinski, J., J. Clune, Y. Bengio, and H. Lipson. 2014. "How Transferable are Features in Deep Neural Networks?" Advances in Neural Information Processing Systems 27 (NIPS'14), Montreal, Quebec, Canada, December 8–13.

Yu, F., and V. Koltun. 2016. "Multi-Scale Context Aggregation by Dilated Convolutions." 4th International Conference on Learning Representations, Caribe Hilton, San Juan, Puerto Rico, May 2–4.

Zbontar, J., and Y. LeCun. 2015. "Computing the Stereo Matching Cost with a Convolutional Neural Network." CVPR 1592–1599. doi:10.1109/CVPR.2015.7298767.

Zhu, X., D. Tuia, L. Mou, G. Xia, L. Zhang, F. Xu, and F. Fraundorfer. 2017. "Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources." IEEE GRSS Magazine 5 (4): 8–36.

IPI contributions

Albert L., Rottensteiner F., and Heipke C. 2017. "A Higher Order Conditional Random Field Model for Simultaneous Classification of Land Cover and Land Use." ISPRS Journal of Photogrammetry and Remote Sensing 130 (2017): 63–80.

Blott G., Yu J., and Heipke C. 2019. "Multi-View Person Re-Identification in a Fisheye Camera Network with Different Viewing Directions." PFG. doi:10.1007/s41064-019-00083-y.

Blott G., Takami M., and Heipke C. 2018. "Semantic Segmentation of Fisheye Images." Computer Vision – ECCV 2018 Workshops Part I – 6th Workshop on Computer Vision for Road Scene Understanding and Autonomous Driving, Springer LNCS 11,129, Cham, 181–196.

Chen L., Rottensteiner F., and Heipke C. 2016. "Invariant Descriptor Learning Using a Siamese Convolutional Neural Network." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences III-3, Prague, Czech Republic, July 12–19.

Clermont D., Kruse C., Rottensteiner F., and Heipke C. 2019. "Supervised Detection of Bomb Craters in Historical Aerial Images Using Convolutional Neural Networks." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W16: 67–74. doi:10.5194/isprs-archives-XLII-2-W16-67-2019.

Coenen M., Rottensteiner F., and Heipke C. 2019. "Precise Vehicle Reconstruction for Autonomous Driving Applications." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W5: 21–28. doi:10.5194/isprs-annals-IV-2-W5-21-2019.

Dorozynski M., Clermont D., and Rottensteiner F. 2019. "Multi-task Deep Learning with Incomplete Training Samples for the Image-based Prediction of Variables
Describing Silk Fabrics." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W6: 47–54.

i.c.sens. 2019. Accessed 20 November 2019. https://www.icsens.uni-hannover.de/start.html?&L=1

Kang J., Chen L., Deng F., and Heipke C. 2019. "Context Pyramidal Network for Stereo Matching Regularized by Disparity Gradients." ISPRS Journal of Photogrammetry and Remote Sensing 157 (2019): 201–215.

Mehltretter M., and Heipke C. 2019. "CNN-based Cost Volume Analysis as Confidence Measure for Dense Matching." ICCV Workshop on 3D Reconstruction in the Wild (3DRW2019). http://openaccess.thecvf.com/content_ICCVW_2019/papers/3DRW/Mehltretter_CNN-Based_Cost_Volume_Analysis_as_Confidence_Measure_for_Dense_Matching_ICCVW_2019_paper.pdf

Nguyen U., Rottensteiner F., and Heipke C. 2019. "Confidence-aware Pedestrian Tracking Using a Stereo Camera." ISPRS Annals IV-2/W5, Enschede, The Netherlands, June 10–14, 53–60.

SILKNOW. 2019. Accessed 20 November 2019. http://silknow.eu/

Wittich D., and Rottensteiner F. 2019. "Adversarial Domain Adaptation for the Classification of Aerial Images and Height Data Using Convolutional Neural Networks." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W7: 197–204.

Yang C., Rottensteiner F., and Heipke C. 2018. "Classification of Land Cover and Land Use Based on Convolutional Neural Networks." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences IV-3: 251–258. doi:10.5194/isprs-annals-IV-3-251-2018.

Yang C., Rottensteiner F., and Heipke C. 2019. "Classification of Land Cover and Land Use Based on Convolutional Neural Networks." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences III-3: 251–258.
