Abstract
This paper presents a fast and robust architecture for scene understanding in aerial images recorded from an Unmanned Aerial Vehicle (UAV). The architecture uses a Deep Wavelet Scattering Network to extract translation- and rotation-invariant features that are then used by a Conditional Random Field (CRF) to perform scene segmentation. Experiments are conducted with the proposed framework on two annotated datasets, of 1277 images and 300 aerial images respectively, introduced in the paper. An overall pixel accuracy of 81% and 78% is achieved on the two datasets. A comparison with another similar framework is also presented.
1 Introduction
Unmanned Aerial Vehicles (UAVs) have recently become a useful information-gathering medium for numerous applications such as surveillance [15], vegetation management [23], disaster (flood) management [18], atmospheric pollution monitoring [22] and coastline management [2]. UAVs have particularly gained popularity for data collection in the aftermath of natural catastrophes such as floods [18] and earthquakes due to their ease of deployment, their ability to fly at low altitudes and their ability to capture images at high resolution. These systems have been used in the past to segment aerial images into regions in order to develop emergency route plans that can help rescue trapped victims and estimate the incurred damage.
Numerous attempts have been made in the past to segment regions from aerial imagery. Initial methods in this area focused only on segmenting roads from aerial images. Some achieved this task using traditional methods such as scale space and snakes [9], while others used higher-order active contours [17] or intensity features [3]. These methods were later extended to segment aerial images into other natural and man-made landmarks (in addition to roads), which can help construct detailed maps of the terrain of interest and further benefit route planning. Ghiasi et al. [7], Dubuisson-Jolly et al. [6] and Rezaeian et al. [16] used a fusion of color and texture features to achieve semantic segmentation of aerial images. Lathuiliere et al. [10] combined a Markov model with an SVM for aerial scene segmentation, while Montoya-Zegarra et al. [13] used class-specific priors with a Conditional Random Field (CRF) to achieve pixel-wise labeling. Marmanis et al. [12] used an ensemble of Convolutional Neural Networks (CNNs) to segment vegetation regions from aerial images.
Hand-engineered color and texture features achieve only nominal scene segmentation accuracy, while, despite the success of CNNs, the design and optimal configuration of these networks is not well understood, which makes them difficult to develop. In addition, it is difficult to train CNNs for aerial scene segmentation because only limited training data is available. Bruna et al. [1] and Sifre et al. [21] have shown that wavelet-based ScatterNets, built on accumulated knowledge of geometrical image properties, can give performance competitive with that of trained networks. Hence, we use the Deep Wavelet Scattering architecture proposed by Sifre and Mallat [21] as the front-end of our proposed pipeline to extract translation- and rotation-invariant scattering features. The Conditional Random Field (CRF) is the obvious choice for the back-end, as it gives superior performance over the Markov Random Field (MRF) [5]. Hence, a CRF is used as the back-end of the proposed network; it uses the translation- and rotation-invariant features extracted by the scattering network to perform the desired scene segmentation.
This paper presents a framework for scene understanding in aerial images recorded from an Unmanned Aerial Vehicle. The main contributions of the paper are stated below:
- Scene Understanding Architecture: The proposed architecture extracts translation- and rotation-invariant features using a handcrafted, computationally efficient Deep Wavelet Scattering network (front-end), which are then used by a Conditional Random Field (CRF) (back-end) to achieve the necessary scene segmentation.
- Datasets: Since the CRF is a supervised learning algorithm, a dataset of 1277 annotated images, carefully collected from the Stanford Background dataset [8] and the CMU Urban Image dataset [14], that contains the selected natural and man-made landmarks that appear in aerial images is introduced (note that these images are not recorded from the UAV). This dataset is used to pre-train the CRF. Next, a UAV aerial image dataset of 300 annotated images recorded from a UAV, containing the same landmarks, is used to fine-tune the pre-trained CRF.
The proposed framework is used to perform scene understanding on the introduced datasets. The average segmentation accuracy for each class for both datasets is presented. In addition, an extensive comparison of the proposed pipeline with other scene segmentation methods is presented.
The paper is divided into the following sections. Section 2 presents the datasets introduced in the paper, while Sect. 3 presents the proposed scene segmentation framework. Section 4 presents the experimental results and Sect. 5 draws conclusions.
2 Introduced Annotated Datasets
The paper presents two annotated datasets which contain natural and man-made landmarks that appear in aerial images. The landmarks most commonly seen in aerial images are included in the datasets, namely: "Sky", "Tree", "Road", "Grass", "Water", "Building", "Mountain" and "Foreground objects". The first dataset (D1) is a collection of 1277 annotated images carefully chosen from the Stanford Background dataset [8] and the CMU Urban Image dataset [14]. All images are resized to a fixed resolution of \(200 \times 300\). The second, UAV aerial image dataset (D2) introduced in the paper includes 300 annotated images recorded from the UAV. These images contain the landmarks mentioned above. The UAV and two example images from the UAV aerial image dataset are shown in Fig. 1.
3 Scene Understanding Framework
This section introduces the proposed scene understanding framework that is used to segment the image into regions which can then be utilized to interpret the scene. The framework is composed of a front-end that extracts discriminatory features and a back-end that uses these features to segment the image into different regions. We use the Deep Wavelet Scattering architecture proposed by Sifre and Mallat [21] as the front-end of the proposed pipeline to extract translation- and rotation-invariant scattering features, while a Conditional Random Field (CRF) is used as the back-end of the proposed network, which utilizes the extracted features to perform the desired scene segmentation. The pipeline is shown in Fig. 2.
3.1 Deep Wavelet Scattering Network
Deep wavelet scattering networks are multilayer networks that incorporate geometric knowledge to produce high-dimensional image representations that are discriminative and approximately invariant to translation and rotation [1, 21]. The invariants at the first layer of the network are obtained by filtering the image with multi-scale and multi-directional complex Morlet wavelet decompositions followed by a point-wise nonlinearity and local averaging. The high frequencies lost due to averaging are recovered at the later layers using cascaded wavelet transformations with non-linearities, justifying the need for a multilayer network. Bruna et al. [1] and Sifre et al. [19–21] have proposed numerous convolutional scattering architectures that produce invariant and discriminative feature descriptors. We present the generic idea behind all the models below.
ScatterNets decompose an input image x using multi-scale and multi-directional complex Morlet wavelets that are obtained by dilating and rotating a single band-pass filter \(\psi \) for any scale j and direction \(\theta \). The wavelet transform filters the signal x with a complex wavelet \(\psi _{\theta ,j_{1}}\).
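Following [1, 21], one plausible form of this filtering is

\(x \star \psi _{\theta ,j_{1}}(u) = x \star \psi ^{a}_{\theta ,j_{1}}(u) + i\, x \star \psi ^{b}_{\theta ,j_{1}}(u)\)    (1)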
where \({\psi }^{a}\) is the real and \({\psi }^{b}\) is the imaginary part of the wavelet. The wavelet transform response commutes with translations and is therefore not translation invariant. To build a translation-invariant representation, an \(L_{2}\) point-wise non-linearity is first applied to the wavelet coefficients.
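In the form used in [1], this non-linearity is the complex modulus

\(|x \star \psi _{\theta ,j_{1}}(u)| = \sqrt{|x \star \psi ^{a}_{\theta ,j_{1}}(u)|^{2} + |x \star \psi ^{b}_{\theta ,j_{1}}(u)|^{2}}\)    (2)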
Separable scattering architecture. First spatial scattering layers in grey, second scattering layers in black. Spatial wavelet-modulus operators (grey arrows) are averaged (dotted grey arrows), as in [1]. Outputs of the first scattering are reorganized in different orbits (large black circles) of the action of the rotation on the representation. A second cascade of wavelet-modulus operators along the orbits (black arrows) splits the angular information in several paths that are averaged (dotted black arrows) along the rotation to achieve rotation invariance. Output nodes are colored with respect to the order m, \(\overset{^{\circ }}{m}\) of their corresponding paths. Modified from [19].
\(L_{2}\) is a good non-linearity as it is stable to deformations and non-expansive, which makes it stable to additive noise [1]. It results in the regular envelope of the filtered signal, which still commutes with translations.
The resulting wavelet-modulus operator applied to the signal x is denoted \(\widetilde{W_{1}}\).
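Following [1, 21], it can be written as

\(\widetilde{W_{1}}x = \left( x \star \phi _{J},\; |x \star \psi _{\theta ,j}| \right)_{\theta ,j} = \left( x \star \phi _{J},\; U_{1}x \right)\)    (3)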
where \(x\star \phi _{J}\) is the low-pass coefficient and \(|x \star \psi _{\theta ,j}|_{\theta ,j}\) are the high-pass coefficients. The invariant part of \(U_{1}\) is computed with an averaging over the spatial and angle variables. It is implemented, for each fixed scale \(j_{1}\), with a roto-translation convolution of \(Y(h) = U_{1}x(h,j_{1})\) along the \(h = ({u}',{\theta }')\) variable, with an averaging kernel \(\varPhi _{J}(h)\). For \(p_{1} = (g_{1}, j_{1})\) and \(g_{1} = (u, \theta _{1})\), this defines the first-order scattering coefficients \(S_{1}x(p_{1})\).
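Following the roto-translation convolution of [21], these plausibly take the form

\(S_{1}x(p_{1}) = Y \circledast \varPhi _{J}(g_{1}) = \int Y(g)\, \varPhi _{J}(g^{-1}g_{1})\, dg\)    (4)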
We choose \(\varPhi _{J}({u}',{\theta }') = (2\pi )^{-1}\varPhi _{J}({u}')\) to perform an averaging over all angles \(\theta \) and over a spatial domain proportional to \(2^J\).
The high frequencies lost by this averaging are recovered through roto-translation convolutions with separable wavelets. Roto-translation wavelets are computed with three separable products: complex quadrature-phase spatial wavelets \(\psi _{\theta _{2},j_{2}}(u)\) or averaging filters \(\phi _{J}(u)\) are multiplied by a complex \(2\pi \)-periodic wavelet \(\bar{\psi }_{k}(\theta ) \) or by \(\bar{\phi }(\theta ) = (2 \pi )^{-1}\).
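Following [19, 21], the resulting three separable roto-translation wavelets can plausibly be written as \(\varPsi _{\theta _{2},j_{2},k_{2}}(u,\theta ) = \psi _{\theta _{2},j_{2}}(u)\, \bar{\psi }_{k_{2}}(\theta )\), \(\varPsi _{\theta _{2},j_{2}}(u,\theta ) = \psi _{\theta _{2},j_{2}}(u)\, \bar{\phi }(\theta )\) and \(\varPsi _{J,k_{2}}(u,\theta ) = \phi _{J}(u)\, \bar{\psi }_{k_{2}}(\theta )\).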
Roto-translation wavelets of the second layer are then computed as \(\widetilde{W_{2}}U_{1}x = (S_{1}x,U_{2}x)\), where \(S_{1}x\) is defined in (4) and \(U_{2}x\) is a second layer of wavelet-modulus coefficients.
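Following [21], \(U_{2}x\) can be written as

\(U_{2}x(p_{2}) = \left| U_{1}x(\cdot , j_{1}) \circledast \varPsi _{\theta _{2},j_{2},k_{2}} \right| (g_{1})\)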
with \(g_{1} = (u, \theta _{1})\), \(p_{2} = (g_{1}, \bar{p_{2}})\), and \(\bar{p_{2}} = (j_{1}, \theta _{2}-\theta _{1}, j_{2}, k_{2})\). Since \(U_{2}x(p_{2})\) is computed with a roto-translation convolution, it remains covariant to the action of the roto-translation group. Fast computations of roto-translation convolutions with separable wavelet filters \(\varPsi _{\theta _{2},j_{2},k_{2}}(u,\theta ) = \psi _{\theta _{2},j_{2}} (u) \bar{\psi }_{k_{2}}(\theta )\) are performed by factorizing
It is thus computed with a two-dimensional convolution of \(Y(u,\theta ')\) with \(\psi _{\theta _{2},j_{2}}(r_{\theta }u)\) along \(u = (u_{1}, u_{2})\), followed by a one-dimensional circular convolution of the result with \(\bar{\psi }_{k_{2}}\) along \(\theta \). This convolution rotates the spatial support of \(\psi _{\theta _{2},j_{2}}(u)\) by \(\theta \) while multiplying its amplitude by \( \bar{\psi }_{k_{2}}(\theta )\).
Applying \(\widetilde{W_{3}} = \widetilde{W_{2}}\) to \(U_{2}x\) computes second order scattering coefficients as a convolution of \(Y(g) = U_{2} x(g,\bar{p}_{2})\) with \(\varPhi _{J}(g)\), for fixed \(\bar{p}_{2}\).
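Following [21], these coefficients plausibly take the form

\(S_{2}x(p_{2}) = U_{2}x(\cdot ,\bar{p}_{2}) \circledast \varPhi _{J}(g_{1}) = \int Y(g)\, \varPhi _{J}(g^{-1}g_{1})\, dg\)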
The output of the second order roto-translation scattering representation is a vector of coefficients.
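Following [1, 21], this vector plausibly concatenates the zeroth, first and second order coefficients,

\(Sx = \left( x \star \phi _{J}(u),\; S_{1}x(p_{1}),\; S_{2}x(p_{2}) \right)\)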
with \(p_{1} = (u,\theta _{1},j_{1})\) and \(p_{2} = (u,\theta _{1},j_{1},\theta _{2},j_{2},k_{2})\).
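To make the front-end concrete, the following minimal NumPy sketch computes zeroth and first order spatial scattering coefficients with a Morlet filter bank at 4 scales and 8 orientations, i.e. the translation-invariant part \(|x \star \psi _{\theta ,j}| \star \phi _{J}\) of the representation. It omits the roto-translation second layer of [19], and all function names and filter parameters are our own illustrative choices rather than the authors' implementation.

```python
import numpy as np

def morlet_filter(shape, j, theta, xi=3 * np.pi / 4, sigma=0.8):
    """Complex Morlet filter dilated by 2**j and rotated by theta
    (an illustrative variant of the filters used in [1])."""
    h, w = shape
    y, x = np.mgrid[-h // 2:h // 2, -w // 2:w // 2].astype(float)
    s = 2.0 ** j
    xr = (np.cos(theta) * x + np.sin(theta) * y) / s       # rotate, then dilate
    yr = (-np.sin(theta) * x + np.cos(theta) * y) / s
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    wave = np.exp(1j * xi * xr)
    k = (envelope * wave).sum() / envelope.sum()            # zero-mean correction
    return envelope * (wave - k) / s ** 2                   # band-pass filter

def gaussian_lowpass(shape, J, sigma=0.8):
    """Averaging filter phi_J with spatial support proportional to 2**J."""
    h, w = shape
    y, x = np.mgrid[-h // 2:h // 2, -w // 2:w // 2].astype(float)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * (sigma * 2.0 ** J) ** 2))
    return g / g.sum()

def conv2d_fft(image, kernel):
    """Circular 2-D convolution via the FFT (sufficient for this sketch)."""
    return np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(np.fft.ifftshift(kernel)))

def scattering_features(x, J=4, L=8):
    """Zeroth order x*phi_J and first order |x*psi_{theta,j}|*phi_J coefficients."""
    phi = gaussian_lowpass(x.shape, J)
    feats = [np.real(conv2d_fft(x, phi))]                   # S0
    for j in range(J):
        for l in range(L):
            psi = morlet_filter(x.shape, j, np.pi * l / L)
            u1 = np.abs(conv2d_fft(x, psi))                 # wavelet modulus U1
            feats.append(np.real(conv2d_fft(u1, phi)))      # local averaging S1
    return np.stack(feats)                                  # (1 + J*L, H, W)

# 4 scales and 8 orientations, as in Sect. 4, on a 200 x 300 image
x = np.random.rand(200, 300)
print(scattering_features(x, J=4, L=8).shape)               # (33, 200, 300)
```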
3.2 Conditional Random Field
The Conditional Random Field (CRF) is a probabilistic framework that allows us to describe the relationship between related output variables, such as the labels of the pixels in an image, as a function of observed features such as pixel colors [5]. This framework is thus ideal for combining multiple visual cues for scene understanding.
The CRF undirected graphical model used in this paper is a pairwise 4-connected grid consisting of a finite number of vertices, or nodes, and edges connecting these nodes. Each node corresponds to a random variable denoted by X. Edges define the neighbourhood relation between these unobserved random variables.
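As a minimal illustration (with our own variable names, not the authors' implementation), the node and edge structure of such a pairwise 4-connected grid over an H x W image can be built as follows:

```python
import numpy as np

def grid_crf_structure(H, W):
    """Nodes are pixel indices; edges connect each pixel to its right and
    bottom neighbour, giving the pairwise 4-connected grid used by the CRF."""
    nodes = np.arange(H * W).reshape(H, W)
    edges = []
    for r in range(H):
        for c in range(W):
            if c + 1 < W:                       # horizontal edge
                edges.append((nodes[r, c], nodes[r, c + 1]))
            if r + 1 < H:                       # vertical edge
                edges.append((nodes[r, c], nodes[r + 1, c]))
    return nodes, np.array(edges)

nodes, edges = grid_crf_structure(200, 300)
print(len(edges))   # 2*H*W - H - W = 119500 edges for a 200 x 300 image
```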
A loss function, which must be minimized to obtain the optimal labelling, is defined by fitting two matrices F and G to the unary and edge features. Let \(\gamma _{i}\) represent the set of parameter values \(\gamma (x_i)\) for all values of \(x_{i}\), and let \(k(\mathbf {y},i)\) represent the unary features for variable i given an input image \(\mathbf {y}\).
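Following the linear parametrization of [5], the unary fit plausibly takes the form \(\gamma _{i} = F\, k(\mathbf {y},i)\).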
In a similar fashion, the parameter values for each pair \(x_{i}, x_{j}\) are denoted by \(\gamma _{ij}\), and \(v(\mathbf {y},i,j)\) represents the edge features for the pair (i, j).
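Again following [5], the pairwise fit is plausibly \(\gamma _{ij} = G\, v(\mathbf {y},i,j)\).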
The gradients of the loss with respect to F and G, needed for optimization, can then be obtained with the chain rule.
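Given the linear fits assumed above, these gradients take the form

\(\frac{\partial L}{\partial F} = \sum _{i} \frac{\partial L}{\partial \gamma _{i}}\, k(\mathbf {y},i)^{T}, \qquad \frac{\partial L}{\partial G} = \sum _{(i,j)} \frac{\partial L}{\partial \gamma _{ij}}\, v(\mathbf {y},i,j)^{T}\)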
This is under the assumption that \(\frac{\partial L}{\partial \gamma }\) has been calculated.
A clique loss function is used in this paper to achieve the scene segmentation, with Tree-Reweighted inference [5] and the L-BFGS optimization algorithm. In this process, the number of images used for training is repeatedly doubled while the number of learning iterations is halved. A marginal-based clique loss function is used to calculate the loss at each iteration. After every iteration the loss value is checked and, if a bad search direction is encountered, L-BFGS is reinitialized [5].
4 Results
The proposed scene segmentation pipeline was evaluated and compared with other similar frameworks on both datasets introduced in Sect. 2.
The front-end of the pipeline uses the deep wavelet scattering network to extract features using Morlet filters at 4 scales (j) and 8 pre-defined orientations (\(\theta \)), as explained in Sect. 3.1. The features are extracted from each image of the 1277-image annotated dataset (D1) constructed by combining images from the Stanford Background dataset [8] and the CMU Urban Image dataset [14]. Each image of the dataset has a fixed resolution of \(200 \times 300\). The Conditional Random Field is trained on the image features using a 5-fold cross-validation split of the dataset. The average accuracies for "Sky", "Tree", "Road", "Grass", "Water", "Building", "Mountain" and "Foreground objects" are presented in Table 1. Two images selected from the D1 dataset, along with the ground truth and the segmentation produced by the trained CRF, are shown in Fig. 3. The trained CRF is able to recognize the above-mentioned landmarks in images contained in the D1 dataset.
Next, the trained CRF is fine-tuned to detect the same landmarks in aerial images recorded from the UAV. The trained CRF model is fine-tuned on features extracted from the annotated UAV aerial image dataset (D2) presented in Sect. 2. The features are extracted with the deep wavelet scattering network using Morlet filters with the above-mentioned parameters, from images of resolution \(200 \times 300\), using a 5-fold cross-validation split of the UAV image dataset. The average accuracies for the above-mentioned labels on the UAV image dataset are presented in Table 1. Two images selected from the D2 dataset, along with the ground truth and the segmentation produced by the fine-tuned CRF, are shown in Fig. 3.
The proposed scene segmentation pipeline is compared with segmentation pipelines that use (i) hand-crafted or (ii) learned features to achieve this task. The scene segmentation results of the proposed method are first compared with the segmentation obtained from hand-crafted features, formed by combining RGB intensities, HOG [4] and pixel locations, which are then used in a CRF framework on both datasets. The proposed method is then compared with the scene segmentation results obtained by training a Fully Convolutional Network (FCN) with an 8-pixel stride [11] on the D1 dataset and then fine-tuning the learned network on the D2 dataset. The results are presented in Table 1. It is evident from Table 1 that the proposed method outperforms the segmentation pipeline that uses hand-crafted features on both datasets. The proposed method also outperforms the Fully Convolutional Network [11] on both datasets; the reason appears to be the small size of the D1 and D2 datasets, which prevents the FCN from being learned effectively.
5 Conclusions
The paper introduces a novel application of scene understanding to aerial images, which can be vital in surveillance and disaster management applications. The proposed architecture has also shown the value of the ScatterNet, which extracts invariant features that can replace popular hand-crafted features owing to their superior performance. The proposed framework can also be used in applications with little training data, as only the back-end of the framework requires learning. The proposed framework achieves good overall pixel accuracy for scene segmentation on both annotated datasets introduced in the paper. We hope to extend the framework to make use of large corpora of partially labeled data, or perhaps to use motion cues in videos to obtain segmentation labels. An important and natural extension of the method is to incorporate object-based reasoning directly into the model, which can lead to better understanding of images.
References
Bruna, J., Mallat, S.: Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1872–1886 (2013)
Casella, E., Rovere, A., Pedroncini, A., Mucerino, L., Casella, M., Cusati, L.A., Vacchi, M., Ferrari, M., Firpo, M.: Study of wave runup using numerical models and low-altitude aerial photogrammetry: a tool for coastal management. Estuar. Coast. Shelf Sci. 149, 160–167 (2014)
Christophe, E., Inglada, J.: Robust road extraction for high resolution satellite images. In: 2007 IEEE International Conference on Image Processing, pp. 437–440. IEEE (2007)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
Domke, J.: Learning graphical model parameters with approximate marginal inference. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2454–2467 (2013)
Dubuisson-Jolly, M., Gupta, A.: Color and texture fusion: application to aerial image segmentation and GIS updating. Image Vis. Comput. 18, 823–832 (2010)
Ghiasi, M., Amirfattahi, R.: Fast semantic segmentation of aerial images based on color and texture. In: 8th Iranian Conference on Machine Vision and Image Processing (MVIP) (2013)
Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: International Conference on Computer Vision (ICCV) (2009)
Laptev, I., Mayer, H., Lindeberg, T., Eckstein, W., Steger, C., Baumgartner, A.: Automatic extraction of roads from aerial images based on scale space and snakes. Mach. Vis. Appl. 12(1), 23–31 (2000)
Lathuiliere, S., Vu, H., Le, T., Tran, T., Hung, D.: Semantic regions recognition in UAV images sequence. Knowl. Syst. Eng. 326, 313–324 (2015)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Marmanis, D., Wegner, J.D., Galliani, S., Schindler, K., Datcu, M., Stilla, U.: Semantic segmentation of aerial images with an ensemble of CNNs. ISPRS Ann. Photogrammetry Remote Sens. Spatial Inf. Sci. 3, 473–480 (2016)
Montoya-Zegarra, J., Wegner, J., Ladicky, L., Schindler, K.: Semantic segmentation of aerial images in urban areas with class-specific higher-order cliques. ISPRS Ann. Photogrammetry Remote Sens. Spatial Inf. Sci. 2, 127–133 (2015)
Munoz, D., Bagnell, J.A., Hebert, M.: Co-inference for multi-modal scene analysis. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 668–681. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33783-3_48
Penmetsa, S., Minhuj, F., Singh, A., Omkar, S.: Autonomous UAV for suspicious action detection using pictorial human pose estimation and classification. Electron. Lett. Comput. Vis. Image Anal. 3(1), 18–32 (2014)
Rezaeian, M., Amirfattahi, R., Sadri, S.: Semantic segmentation of aerial images using fusion of color and texture features. J. Comput. Secur. 1, 225–238 (2013)
Rochery, M., Jermyn, I.H., Zerubia, J.: Higher order active contours. Int. J. Comput. Vis. 69(1), 27–42 (2006)
Șerban, G., Rus, I., Vele, D., Brețcan, P., Alexe, M., Petrea, D.: Flood-prone area delimitation using UAV technology, in the areas hard-to-reach for classic aircrafts: case study in the north-east of Apuseni Mountains, Transylvania. Nat. Hazards 82, 1–16 (2016)
Sifre, L.: Rigid-motion scattering for image classification. Ph.D. thesis (2014)
Sifre, L., Mallat, S.: Combined scattering for rotation invariant texture analysis. In: European Symposium on Artificial Neural Networks (ESANN) (2012)
Sifre, L., Mallat, S.: Rotation, scaling and deformation invariant scattering for texture discrimination. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1233–1240 (2013)
Šmídl, V., Hofman, R.: Tracking of atmospheric release of pollution using unmanned aerial vehicles. Atmos. Environ. 67, 425–436 (2013)
Su, Y., Guo, Q., Fry, D.L., Collins, B.M., Kelly, M., Flanagan, J.P., Battles, J.J.: A vegetation mapping strategy for conifer forests by combining airborne lidar data and aerial imagery. Can. J. Remote Sens. 42(1), 1–15 (2016)