Article

Model Selection in Atmospheric Remote Sensing with an Application to Aerosol Retrieval from DSCOVR/EPIC, Part 1: Theory

1 Remote Sensing Technology Institute, German Aerospace Center (DLR), 82234 Oberpfaffenhofen, Germany
2 Jet Propulsion Laboratory (NASA-JPL), California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(22), 3724; https://doi.org/10.3390/rs12223724
Submission received: 1 October 2020 / Revised: 1 November 2020 / Accepted: 6 November 2020 / Published: 12 November 2020
(This article belongs to the Special Issue Advances of Remote Sensing Inversion)
Figure 1. Relative errors $\varepsilon_{\mathrm{mean}}^\tau$, $\varepsilon_{\max}^\tau$, $\varepsilon_{\mathrm{mean}}^H$, and $\varepsilon_{\max}^H$ versus $\tau_{\mathrm{t}}$ for $H_{\mathrm{t}} = 3.0$ km. All four aerosol models are involved in the retrieval.
Figure 2. Relative errors $\varepsilon_{\mathrm{mean}}^\tau$, $\varepsilon_{\max}^\tau$, $\varepsilon_{\mathrm{mean}}^H$, and $\varepsilon_{\max}^H$ versus $H_{\mathrm{t}}$ for $\tau_{\mathrm{t}} = 1.0$. All four aerosol models are involved in the retrieval.
Figure 3. Relative errors $\varepsilon_{\mathrm{mean}}^\tau$, $\varepsilon_{\max}^\tau$, $\varepsilon_{\mathrm{mean}}^H$, and $\varepsilon_{\max}^H$ versus $\tau_{\mathrm{t}}$ for $H_{\mathrm{t}} = 3.0$ km. All aerosol models except the exact one are involved in the retrieval.
Figure 4. Relative errors $\varepsilon_{\mathrm{mean}}^\tau$, $\varepsilon_{\max}^\tau$, $\varepsilon_{\mathrm{mean}}^H$, and $\varepsilon_{\max}^H$ versus $H_{\mathrm{t}}$ for $\tau_{\mathrm{t}} = 1.0$. All aerosol models except the exact one are involved in the retrieval.
Figure 5. Upper panels: mean a posteriori density of $\tau$ computed by GCV for Test Examples 1 (left) and 2 (right). Lower panels: mean a posteriori density of $H$ computed by GCV for Test Examples 1 (left) and 2 (right). The true values are $\tau_{\mathrm{t}} = 1.5$ and $H_{\mathrm{t}} = 3.0$ km.
Figure 6. Relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ versus $\tau_{\mathrm{t}}$ for $H_{\mathrm{t}} = 3.0$ km (upper panels) and versus $H_{\mathrm{t}}$ for $\tau_{\mathrm{t}} = 1.0$ (lower panels). The surface albedo is included in the retrieval and all four aerosol models are considered.
Figure 7. Relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ versus $\tau_{\mathrm{t}}$ for $H_{\mathrm{t}} = 3.0$ km (upper panels) and versus $H_{\mathrm{t}}$ for $\tau_{\mathrm{t}} = 1.0$ (lower panels). The surface albedo is included in the retrieval and all aerosol models except the exact one are considered.

Abstract

The retrieval of aerosol and cloud properties such as their optical thickness and/or layer/top height requires the selection of a model that describes their microphysical properties. We demonstrate that, if there is not enough information for an appropriate microphysical model selection, the solution’s accuracy can be improved if the model uncertainty is taken into account and appropriately quantified. For this purpose, we design a retrieval algorithm accounting for the uncertainty in model selection. The algorithm is based on (i) the computation of each model solution using the iteratively regularized Gauss–Newton method, (ii) the linearization of the forward model around the solution, and (iii) the maximum marginal likelihood estimation and the generalized cross-validation to estimate the optimal model. The algorithm is applied to the retrieval of aerosol optical thickness and aerosol layer height from synthetic measurements corresponding to the Earth Polychromatic Imaging Camera (EPIC) instrument onboard the Deep Space Climate Observatory (DSCOVR) satellite. Our numerical simulations show that the heuristic approach based on the solution minimizing the residual, which is frequently used in the literature, is completely unrealistic when both the aerosol model and the surface albedo are unknown.


1. Introduction

The limited information provided by satellite measurements does not allow for a complete determination of aerosol and cloud properties and, in particular, their microphysical properties. To deal with this problem, a set of models describing the microphysical properties is typically used. For example, the aerosol models implemented in the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol algorithm over land [1] consist of three spherical, fine-dominated types and one spheroidal, coarse-dominated type, while typical cloud models consist of water and ice clouds with a predefined particle shape and size. In this regard, a priori information such as the selection of a microphysical model is an essential part of the retrieval process. Standard retrieval algorithms usually ignore model uncertainty, i.e., a model is chosen from a given set of candidate models, and the retrieval is performed as if the selected model represents the true state. This approach is very risky and may result in large errors in the retrieved parameters if the model does not reflect the real scenario. An efficient way to quantify the uncertainty in model selection is the Bayesian approach and, in particular, Bayesian model selection and model averaging [2]. By model selection, we mean the specific problem of choosing the most suitable model from a given set of candidate models. In general, model selection is not a trivial task because for a given measurement, there will be several models that fit the data equally well. These tools were used, among others, in Refs. [3,4] to study uncertainty quantification in remote sensing of aerosols from Ozone Monitoring Instrument (OMI) data.
The key quantity in Bayesian model selection is relative evidence, which is a measure of how well the model fits the measurement. This is expressed in terms of the marginal likelihood, which is the integral over the state vector space of the a priori density times the likelihood density (see below). An accurate computation of the marginal likelihood is not a trivial task because some parameters characterizing the a priori and likelihood densities, e.g., the data error variance and the a priori state variance, are not precisely known and must be estimated. Moreover, the integral over the state vector space cannot be computed analytically because the likelihood density depends on the nonlinear forward model, for which an analytical representation is not available.
In this paper, we aim to eliminate the drawbacks of the Bayesian approach by using:
  • an iteratively regularized Gauss–Newton method for computing the solution of each model and estimating the ratio of the data error variance and the a priori state variance;
  • a linearization of the forward model around the solution for integrating the likelihood density over the state vector;
  • parameter choice methods from linear regularization theory for estimating the optimal model and the data error variance.
The paper is organized as follows. In Section 2, we derive a scaled data model with white noise in order to simplify the subsequent analysis. Section 3 is devoted to a review of the Bayesian parameter estimation and model selection in order to describe the problem and clarify the nomenclature. In Section 4, we summarize the iteratively regularized Gauss–Newton method and emphasize some special features of this method, including the estimation of the ratio of the data error variance and the a priori state variance. In Section 5, we assume a linearization of the forward model around the solution and, under this assumption, extend some regularization parameter choice methods for linear problems (the maximum marginal likelihood estimation and the generalized cross-validation) for data error variance estimation and model selection. Section 6 summarizes the computational steps of the proposed retrieval algorithm. In Section 7, we apply the algorithm to the retrieval of aerosol optical thickness and aerosol layer height from the Earth Polychromatic Imaging Camera (EPIC)/Deep Space Climate Observatory (DSCOVR) measurements.

2. Data Model

Consider $N_m$ microphysical models, and let $F_m(x) \in \mathbb{R}^M$ be the forward model corresponding to the model $m$, for $m = 1, \ldots, N_m$. More specifically, in our analysis, $F_m(x)$ is the vector of the logarithms of the simulated radiances at all wavelengths $\lambda_i$, $i = 1, \ldots, M$, i.e., $[F_m(x)]_i = \ln I_m(\lambda_i, x)$, and $x \in \mathbb{R}^N$ is the state vector encapsulating the atmospheric parameters to be retrieved, e.g., the aerosol optical thickness and layer height.
The nonlinear data model under examination consists of the equations:
$y^\delta = y + \delta_{\mathrm{mes}},$  (1)
and:
$y = F_m(x) + \delta_{\mathrm{aer},m},$  (2)
where $y^\delta \in \mathbb{R}^M$ is the measurement vector or the noisy data vector, $y$ the exact data vector, $\delta_{\mathrm{mes}} \in \mathbb{R}^M$ the measurement error vector, and $\delta_{\mathrm{aer},m} \in \mathbb{R}^M$ the model error vector, i.e., the error due to an inadequate model. Defining the (total) data error vector by:
$\delta_m = \delta_{\mathrm{mes}} + \delta_{\mathrm{aer},m},$
the data model (1) and (2) becomes:
$y^\delta = F_m(x) + \delta_m.$  (3)
In a deterministic setting, $\delta_m$ is characterized by the noise level $\Delta_m$ (defined as an upper bound on the error norm, i.e., $\|\delta_m\| \leq \Delta_m$), $x$ is a deterministic vector, and we are faced with the solution of the nonlinear equation $y^\delta = F_m(x)$. In a stochastic setting, $\delta_m$ and $x$ are random vectors, and the data model (3) is solved by means of a Bayesian approach.
Using the prewhitening technique, we transform the data model (3) into a model with white noise. For this purpose, we take:
• $\delta_{\mathrm{mes}}$ as a Gaussian random vector with zero mean and covariance matrix:
$C_{\mathrm{mes}} = \mathrm{diag}(\sigma_{\mathrm{mes},i}^2)_M = \sigma_{\mathrm{mes}}^2 \overline{C}_{\mathrm{mes}}, \quad \overline{C}_{\mathrm{mes}} = \mathrm{diag}\Big( \frac{\sigma_{\mathrm{mes},i}^2}{\sigma_{\mathrm{mes}}^2} \Big)_M, \quad \sigma_{\mathrm{mes}}^2 = \frac{1}{M} \sum_{i=1}^M \sigma_{\mathrm{mes},i}^2,$
and
• $\delta_{\mathrm{aer},m}$ as a Gaussian random vector with zero mean and covariance matrix:
$C_{\mathrm{aer},m} = \sigma_{\mathrm{aer},m}^2 I_M.$
If $\delta_{\mathrm{mes}}$ and $\delta_{\mathrm{aer},m}$ are independent random vectors, then $\delta_m$ is a Gaussian random vector with zero mean and covariance matrix:
$C_{\delta m} = C_{\mathrm{mes}} + C_{\mathrm{aer},m} = \sigma_m^2 \overline{C}_{\delta m}, \quad \overline{C}_{\delta m} = w_{\mathrm{mes},m} \overline{C}_{\mathrm{mes}} + (1 - w_{\mathrm{mes},m}) I_M,$
where:
$\sigma_m^2 = \sigma_{\mathrm{mes}}^2 + \sigma_{\mathrm{aer},m}^2$
is the variance of the data error vector and:
$w_{\mathrm{mes},m} = \sigma_{\mathrm{mes}}^2 / \sigma_m^2$
is the weighting factor giving the contribution of $\overline{C}_{\mathrm{mes}}$ to the covariance matrix $\overline{C}_{\delta m}$. After these preliminary constructions, we introduce the scaled data model:
$\bar{y}^\delta = \bar{F}_m(x) + \bar{\delta}_m,$
where:
$\bar{y}^\delta = P y^\delta, \quad \bar{F}_m(x) = P F_m(x), \quad \bar{\delta}_m = P \delta_m,$
and:
$P = \overline{C}_{\delta m}^{-1/2} = \mathrm{diag}\Big( \Big[ w_{\mathrm{mes},m} \frac{\sigma_{\mathrm{mes},i}^2}{\sigma_{\mathrm{mes}}^2} + (1 - w_{\mathrm{mes},m}) \Big]^{-1/2} \Big)_M$  (5)
is the scaling matrix. Taking into account that:
$\overline{C}_{\bar{\delta} m} = E\{\bar{\delta}_m \bar{\delta}_m^T\} = \sigma_m^2 P \overline{C}_{\delta m} P^T = \sigma_m^2 I_M,$
we see that by means of the prewhitening technique, the Gaussian random vector $\delta_m \sim N(0, C_{\delta m} = \sigma_m^2 \overline{C}_{\delta m})$ is transformed into the white noise $\bar{\delta}_m \sim N(0, \overline{C}_{\bar{\delta} m} = \sigma_m^2 I_M)$. Here and in the following, the notation $N(x_{\mathrm{mean}}, C_x)$ stands for a normal distribution with mean $x_{\mathrm{mean}}$ and covariance matrix $C_x$.
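For illustration, the prewhitening step amounts to a simple diagonal scaling. The following minimal numpy sketch (the function name and interface are ours; it assumes the per-channel measurement standard deviations and the weighting factor $w_{\mathrm{mes},m}$ are given) builds $P$ as in Equation (5) and scales the data vector:

```python
import numpy as np

def prewhiten(y_delta, sigma_mes_i, w_mes):
    """Build the scaling matrix P of Equation (5) and scale the data.

    sigma_mes_i : per-channel measurement standard deviations (length M)
    w_mes       : weighting factor sigma_mes^2 / sigma_m^2 (assumed given here)
    """
    sigma_mes2 = np.mean(sigma_mes_i ** 2)                          # mean measurement variance
    c_bar = w_mes * (sigma_mes_i ** 2 / sigma_mes2) + (1.0 - w_mes)  # diagonal of C-bar_{delta m}
    P = np.diag(c_bar ** -0.5)                                       # P = C-bar_{delta m}^{-1/2}
    return P, P @ y_delta                                            # scaling matrix and scaled data
```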
The following comments are in order:
• In [3], the covariance matrix $C_{\mathrm{aer},m}$ was estimated by empirically exploring a set of residuals of model fits to the measurements. Essentially, $C_{\mathrm{aer},m}$ is assumed to be of the form:
$[C_{\mathrm{aer},m}]_{ij} = \begin{cases} \sigma_1^2 \exp[-(\lambda_i - \lambda_j)^2 / l^2], & i \neq j, \\ \sigma_0^2 + \sigma_1^2, & i = j, \end{cases}$  (6)
where $\sigma_0^2$, $\sigma_1^2$, and $l$ (representing the non-spectral diagonal variance, the spectral variance, and the correlation length, respectively) are computed by means of an empirical semivariogram, i.e., the variances of the residual differences are calculated for each wavelength pair with the distance $d = |\lambda_i - \lambda_j|$, and a theoretical Gaussian variogram model is fitted to these empirical semivariogram values. As compared to Equation (6), in our analysis, we use the diagonal matrix approximation $C_{\mathrm{aer},m} \approx \sigma_{\mathrm{aer},m}^2 I_M$ with $\sigma_{\mathrm{aer},m}^2 = \sigma_0^2 + \sigma_1^2$.
• In principle, the scaling matrix $P$ depends on the model error variance $\sigma_{\mathrm{aer},m}^2$ through the weighting factor $w_{\mathrm{mes},m}$. However, for $[F_m(x)]_i = \ln I_m(\lambda_i, x)$, we usually have $\sigma_{\mathrm{mes},i}^2 \approx \sigma_{\mathrm{mes}}^2$ for all $i = 1, \ldots, M$. In this case, it follows that $\overline{C}_{\mathrm{mes}}$, together with $\overline{C}_{\delta m}$ and $P$, is close to the identity matrix. This does not mean that $\sigma_{\mathrm{aer},m}^2$ plays no role in our model; $\sigma_{\mathrm{aer},m}^2$ is included in $\sigma_m^2$, which is the subject of an estimation process.

3. Bayesian Approach

In this section, we review the basics of Bayesian parameter estimation and model selection by following the analysis given in Refs. [2,3].

3.1. Bayesian Parameter Estimation

In statistical inversion theory, the state vector $x$ is assumed to be a random vector, and in this regard, we take $x \sim N(x_a, C_x)$, where $x_a$ is the a priori state vector, the best beforehand estimate of the solution, and:
$C_x = \mathrm{diag}(\sigma_{x,i}^2)_N = \sigma_x^2 \overline{C}_x, \quad \overline{C}_x = \mathrm{diag}\Big( \frac{\sigma_{x,i}^2}{\sigma_x^2} \Big)_N, \quad \sigma_x^2 = \frac{1}{N} \sum_{i=1}^N \sigma_{x,i}^2.$
Furthermore, the matrix $L$ is defined through the Cholesky factorization:
$\overline{C}_x^{-1} = L^T L, \quad \text{that is,} \quad L = \mathrm{diag}\Big( \frac{\sigma_x}{\sigma_{x,i}} \Big)_N,$
and the parameter $\alpha$ is defined as:
$\alpha = \frac{\sigma_m^2}{\sigma_x^2}.$
The data model (3) gives a relation between the three random vectors $\bar{y}^\delta$, $x$, and $\bar{\delta}_m$, and therefore, their probability densities depend on each other. The following probability densities are relevant in Bayesian parameter estimation: (i) the a priori density $p(x \mid m)$, which represents the conditional probability density of $x$ given the model $m$ before performing the measurement $\bar{y}^\delta$; (ii) the likelihood density $p(\bar{y}^\delta \mid x, m)$, which represents the conditional probability density of $\bar{y}^\delta$ given the state $x$ and the model $m$; and (iii) the a posteriori density $p(x \mid \bar{y}^\delta, m)$, which represents the conditional probability density of $x$ given the data $\bar{y}^\delta$ and the model $m$.
The Bayes theorem of inverse problems relates the a posteriori density to the likelihood density:
$p(x \mid \bar{y}^\delta, m) = \frac{p(\bar{y}^\delta \mid x, m)\, p(x, m)}{p(\bar{y}^\delta, m)} = \frac{p(\bar{y}^\delta \mid x, m)\, p(x \mid m)}{p(\bar{y}^\delta \mid m)}.$  (7)
In Bayesian parameter estimation, the marginal likelihood density $p(\bar{y}^\delta \mid m)$, defined by:
$p(\bar{y}^\delta \mid m) = \int p(x, \bar{y}^\delta \mid m)\, dx = \int p(\bar{y}^\delta \mid x, m)\, p(x \mid m)\, dx,$  (8)
plays the role of a normalization constant and is usually ignored. However, as we will see in the next section, this density is of particular importance in Bayesian model selection.
For $x \sim N(x_a, C_x = \sigma_m^2 (\alpha L^T L)^{-1})$ and $\bar{\delta}_m \sim N(0, \overline{C}_{\bar{\delta} m} = \sigma_m^2 I_M)$, the probability densities $p(x \mid m)$ and $p(\bar{y}^\delta \mid x, m)$ are given by:
$p(x \mid m) = \frac{1}{\sqrt{(2\pi\sigma_m^2)^N \det[(\alpha L^T L)^{-1}]}} \exp\Big[ -\frac{\alpha}{2\sigma_m^2} \| L(x - x_a) \|^2 \Big]$  (9)
and:
$p(\bar{y}^\delta \mid x, m) = \frac{1}{\sqrt{(2\pi\sigma_m^2)^M}} \exp\Big[ -\frac{1}{2\sigma_m^2} \| \bar{y}^\delta - \bar{F}_m(x) \|^2 \Big],$  (10)
respectively. The Bayes formula yields the following expression for the a posteriori density:
$p(x \mid \bar{y}^\delta, m) \propto \exp\Big[ -\frac{1}{2} V_\alpha(x \mid \bar{y}^\delta, m) \Big],$  (11)
where the a posteriori potential $V_\alpha(x \mid \bar{y}^\delta, m)$ is defined by:
$V_\alpha(x \mid \bar{y}^\delta, m) = \frac{1}{\sigma_m^2} \big[ \| \bar{y}^\delta - \bar{F}_m(x) \|^2 + \alpha \| L(x - x_a) \|^2 \big].$  (12)
The maximum a posteriori estimator $\hat{x}_{m\alpha}^\delta$ maximizing the conditional probability density $p(x \mid \bar{y}^\delta, m)$ also minimizes the potential $V_\alpha(x \mid \bar{y}^\delta, m)$, i.e.,
$\hat{x}_{m\alpha}^\delta = \arg\min_x V_\alpha(x \mid \bar{y}^\delta, m).$  (13)

3.2. Bayesian Model Selection

The relative evidence, also known as the a posteriori probability $p(m \mid \bar{y}^\delta)$ of the model $m$ given the measurement $\bar{y}^\delta$, is used for model comparison. By the Bayes theorem, this conditional probability density is given by:
$p(m \mid \bar{y}^\delta) = \frac{p(\bar{y}^\delta \mid m)\, p(m)}{p(\bar{y}^\delta)}.$
The mean formula:
$p(\bar{y}^\delta) = \sum_{m=1}^{N_m} p(\bar{y}^\delta \mid m)\, p(m)$
and the assumption that all models are equally likely, i.e., $p(m) = 1/N_m$, yield:
$p(m \mid \bar{y}^\delta) = \frac{p(\bar{y}^\delta \mid m)}{\sum_{m=1}^{N_m} p(\bar{y}^\delta \mid m)}.$  (14)
Intuitively, the model with the highest evidence is the best among all the models involved, and in this regard, we define the maximum solution estimate as:
$\hat{x}_{\max}^\delta = \hat{x}_{m\alpha}^\delta, \quad m = \arg\max_m p(m \mid \bar{y}^\delta).$  (15)
In fact, we can compare the models to see whether one of them clearly shows the highest evidence, or whether there are several models with comparable values of the evidence. When several models provide equally good fits to the measurement, we can use the Bayesian model averaging technique to combine the individual a posteriori densities weighted by their evidence. Specifically, using the relation:
$p(x, \bar{y}^\delta) = \sum_{m=1}^{N_m} p(x, \bar{y}^\delta, m) = \sum_{m=1}^{N_m} p(x \mid \bar{y}^\delta, m)\, p(m \mid \bar{y}^\delta)\, p(\bar{y}^\delta)$
on the one hand, and $p(x, \bar{y}^\delta) = p(x \mid \bar{y}^\delta)\, p(\bar{y}^\delta)$ on the other hand, we are led to the Bayesian model averaging formula:
$p_{\mathrm{mean}}(x \mid \bar{y}^\delta) = \sum_{m=1}^{N_m} p(x \mid \bar{y}^\delta, m)\, p(m \mid \bar{y}^\delta),$  (16)
and, consequently, to the mean solution estimate:
$\hat{x}_{\mathrm{mean}}^\delta = \sum_{m=1}^{N_m} \hat{x}_{m\alpha}^\delta\, p(m \mid \bar{y}^\delta).$  (17)
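In code, Equations (15) and (17) reduce to an argmax and a weighted average. A minimal numpy sketch (the helper name is ours; it assumes the per-model solutions and their relative evidences have already been computed):

```python
import numpy as np

def combine_models(x_hat, evidence):
    """Maximum and mean solution estimates, Equations (15) and (17).

    x_hat    : (N_m, N) array of per-model solutions
    evidence : (N_m,) array of relative evidences p(m | y_delta), summing to 1
    """
    x_max = x_hat[np.argmax(evidence)]   # solution of the model with the highest evidence
    x_mean = evidence @ x_hat            # evidence-weighted average over all models
    return x_max, x_mean
```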
The above analysis shows that in Bayesian parameter estimation and model selection, we are faced with the following problems:
Problem 1.
From Equations (11)–(13), it is apparent that the computation of the estimator $\hat{x}_{m\alpha}^\delta$ requires knowledge of the parameter $\alpha$, i.e., the ratio of the data error variance to the a priori state variance.
Problem 2.
From Equations (15) and (17), we see that the solution estimates $\hat{x}_{\max}^\delta$ and $\hat{x}_{\mathrm{mean}}^\delta$ are expressed in terms of the relative evidence $p(m \mid \bar{y}^\delta)$, which in turn, according to Equation (14), is expressed in terms of the marginal likelihood density $p(\bar{y}^\delta \mid m)$. In view of Equation (8), the computation of the marginal likelihood density $p(\bar{y}^\delta \mid m)$ requires knowledge of the likelihood density $p(\bar{y}^\delta \mid x, m)$ and therefore, in view of Equation (10), of the data error variance $\sigma_m^2$.
Problem 3.
The dependency of the likelihood density $p(\bar{y}^\delta \mid x, m)$ on the nonlinear forward model $\bar{F}_m(x)$ does not allow an analytical evaluation of the integral in Equation (8).

4. Iteratively Regularized Gauss–Newton Method

In the framework of Tikhonov regularization, a regularized solution $x_{m\alpha}^\delta$ to the nonlinear equation $\bar{y}^\delta = \bar{F}_m(x)$ is computed as the minimizer of the Tikhonov function:
$\mathcal{F}_{m\alpha}(x) = \| \bar{y}^\delta - \bar{F}_m(x) \|^2 + \alpha \| L(x - x_a) \|^2,$  (18)
that is,
$x_{m\alpha}^\delta = \arg\min_x \mathcal{F}_{m\alpha}(x).$  (19)
In classical regularization theory, the first term on the right-hand side of Equation (18) is the squared residual and the second is the penalty term, while $\alpha$ and $L$ are the regularization parameter and the regularization matrix, respectively. From Equations (12) and (18), we see that $V_\alpha(x \mid \bar{y}^\delta, m) = (1/\sigma_m^2)\, \mathcal{F}_{m\alpha}(x)$, and from Equations (13) and (19), we deduce that $\hat{x}_{m\alpha}^\delta = x_{m\alpha}^\delta$. Thus, the maximum a posteriori estimate coincides with the Tikhonov solution, and therefore, Bayesian parameter estimation can be regarded as a stochastic version of the method of Tikhonov regularization with an a priori chosen regularization parameter $\alpha$.
In the framework of Tikhonov regularization, the computation of the optimal regularization parameter is a crucial issue. With too little regularization, reconstructions deviate significantly from the a priori, and the solution is said to be under-regularized. With too much regularization, the reconstructions are too close to the a priori, and the solution is said to be over-regularized. In the Bayesian framework, the optimal regularization parameter is identified as the true ratio of the data error variance and the a priori state variance.
Several regularization parameter choice methods were discussed in [5]. These include methods with constant regularization parameters, e.g., the maximum likelihood estimation, the generalized cross-validation, and the nonlinear L-curve method. Unfortunately, at present, there is no fail-safe regularization parameter choice method that guarantees small solution errors in any circumstance, that is for any errors in the data vector.
A remedy for the problems associated with regularization parameter selection is provided by the so-called iterative regularization methods. These approaches are less sensitive to overestimates of the regularization parameter, but require more iteration steps to achieve convergence. A representative iterative approach is the iteratively regularized Gauss–Newton method.
At iteration step $k$ of the iteratively regularized Gauss–Newton method, the forward model $\bar{F}_m(x)$ is linearized around the current iterate $x_{mk}^\delta$,
$\bar{F}_m(x) \approx \bar{F}_m(x_{mk}^\delta) + \bar{K}_{mk}(x - x_{mk}^\delta),$
where:
$\bar{K}_{mk} = \frac{\partial \bar{F}_m}{\partial x}(x_{mk}^\delta)$
is the Jacobian matrix at $x_{mk}^\delta$, and the nonlinear equation $\bar{F}_m(x) = \bar{y}^\delta$ is replaced by its linearization:
$\bar{K}_{mk}\, p = \bar{y}_{mk}^\delta,$  (20)
where:
$p = x - x_a$
is the step vector with respect to the a priori and:
$\bar{y}_{mk}^\delta = \bar{y}^\delta - \bar{F}_m(x_{mk}^\delta) + \bar{K}_{mk}(x_{mk}^\delta - x_a)$
is the linearized data vector at iteration step $k$.
Since the nonlinear problem is ill-posed, its linearization is also ill-posed. Therefore, the linearized Equation (20) is solved by means of Tikhonov regularization with the penalty term $\alpha_k \| L p \|^2$, where the regularization parameters $\alpha_k$ are the terms of a decreasing (geometric) sequence, i.e., $\alpha_k = q \alpha_{k-1}$ with $q < 1$. Note that in the method of Tikhonov regularization, the regularization parameter $\alpha$ is kept constant during the iterative process. The Tikhonov function for the linearized equation takes the form:
$\mathcal{F}_{lmk}(p) = \| \bar{y}_{mk}^\delta - \bar{K}_{mk}\, p \|^2 + \alpha_k \| L p \|^2,$
and its minimizer is given by:
$p_{mk}^\delta = \bar{K}_{mk}^\dagger\, \bar{y}_{mk}^\delta,$
where:
$\bar{K}_{mk}^\dagger = (\bar{K}_{mk}^T \bar{K}_{mk} + \alpha_k L^T L)^{-1} \bar{K}_{mk}^T$
is the regularized generalized inverse at $x_{mk}^\delta$. The new iterate is computed as:
$x_{mk+1}^\delta = x_a + \bar{K}_{mk}^\dagger\, \bar{y}_{mk}^\delta.$
For iterative regularization methods, the number of iteration steps $k$ plays the role of the regularization parameter, and the iterative process is stopped after an appropriate number of steps $k^\star$ in order to avoid an uncontrolled expansion of the errors in the data. In fact, a mere minimization of the residual $\| r_{mk}^\delta \|$, where:
$r_{mk}^\delta = \bar{y}^\delta - \bar{F}_m(x_{mk}^\delta)$
is the residual vector at $x_{mk}^\delta$, leads to a semi-convergent behavior of the iterated solution: while the error in the residual decreases as the number of iteration steps increases, the error in the solution starts to increase after an initial decay. A widely used a posteriori choice for the stopping index $k^\star$ is the discrepancy principle. According to this principle, the iterative process is terminated after $k^\star$ steps such that:
$\| r_{mk^\star}^\delta \|^2 \leq \eta \Delta_m^2 < \| r_{mk}^\delta \|^2, \quad 0 \leq k < k^\star,$
with $\eta > 1$; hence, the regularized solution is $x_{mk^\star}^\delta$. Since the noise level cannot be estimated a priori for many practical problems arising in atmospheric remote sensing, we adopt a practical approach. This is based on the observation that the squared residual $\| r_{mk}^\delta \|^2$ decreases during the iterative process and attains a plateau at approximately $\Delta_m^2$. Thus, if the nonlinear residuals $\| r_{mk}^\delta \|$ converge to $\| r_m^\delta \|$ within a prescribed tolerance, we use the estimate:
$\Delta_m^2 = \| r_m^\delta \|^2.$
The above heuristic stopping rule does not have any mathematical justification, but works sufficiently well in practice.
Since the amount of regularization is gradually decreased during the iterative processes, the iteratively regularized Gauss–Newton method can handle problems that practically do not require much regularization.
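The following numpy sketch illustrates the basic iteration (omitting the step-length control and stopping rule detailed in Section 6). The names `forward` and `jacobian` are hypothetical callables standing in for the scaled radiative transfer model and its linearization; the initialization of $\alpha_1$ from the singular values follows the choice stated in Section 6:

```python
import numpy as np

def irgn(forward, jacobian, y_delta, x_a, L, q=0.1, n_iter=20):
    """Schematic iteratively regularized Gauss-Newton loop.

    forward, jacobian : callables returning F_m(x) and its Jacobian (already scaled)
    L                 : regularization matrix; alpha decreases geometrically, alpha_k = q * alpha_{k-1}
    """
    x = x_a.copy()
    alpha = None
    for _ in range(n_iter):
        F, K = forward(x), jacobian(x)
        y_lin = y_delta - F + K @ (x - x_a)            # linearized data vector
        if alpha is None:                              # alpha_1 = gamma_1 * gamma_N (cf. Section 6)
            g = np.linalg.svd(K @ np.linalg.inv(L), compute_uv=False)
            alpha = g[0] * g[-1]
        p = np.linalg.solve(K.T @ K + alpha * (L.T @ L), K.T @ y_lin)
        x = x_a + p                                    # new iterate x_{k+1} = x_a + K^dagger y
        alpha *= q                                     # geometric decrease of the regularization
    return x, alpha / q                                # alpha used at the last step estimates alpha_hat
```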
The numerical experiments performed in [5] showed that at the solution $x_{mk^\star}^\delta$, (i) $\alpha_{k^\star-1}$ is close to the optimal regularization parameter, and (ii) $x_{mk^\star}^\delta$ is close to the Tikhonov solution corresponding to the optimal regularization parameter. In this regard, we make the following assumptions:
• $\hat{\alpha} = \alpha_{k^\star-1}$ is an estimate for the optimal regularization parameter, and
• $x_{m\hat{\alpha}}^\delta = x_{mk^\star}^\delta$ is the minimizer of the Tikhonov function $\mathcal{F}_{m\hat{\alpha}}(x)$ with regularization parameter $\hat{\alpha}$.
Some consequences of the second assumption are the following. The optimality condition $\mathcal{F}'_{m\hat{\alpha}}(x_{m\hat{\alpha}}^\delta) = 0$ yields:
$\bar{K}_{m\hat{\alpha}}^T [\bar{F}_m(x_{m\hat{\alpha}}^\delta) - \bar{y}^\delta] + \hat{\alpha} L^T L (x_{m\hat{\alpha}}^\delta - x_a) = 0$
and further,
$x_{m\hat{\alpha}}^\delta = x_a + \bar{K}_{m\hat{\alpha}}^\dagger\, \bar{y}_{m\hat{\alpha}}^\delta,$
where $\bar{K}_{m\hat{\alpha}} = \bar{K}_{mk^\star}$,
$\bar{y}_{m\hat{\alpha}}^\delta = \bar{y}_{mk^\star}^\delta = \bar{y}^\delta - \bar{F}_m(x_{m\hat{\alpha}}^\delta) + \bar{K}_{m\hat{\alpha}}(x_{m\hat{\alpha}}^\delta - x_a),$  (21)
and:
$\bar{K}_{m\hat{\alpha}}^\dagger = (\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)^{-1} \bar{K}_{m\hat{\alpha}}^T.$  (22)
Consequently, we see that:
$p_{m\hat{\alpha}}^\delta = x_{m\hat{\alpha}}^\delta - x_a = \bar{K}_{m\hat{\alpha}}^\dagger\, \bar{y}_{m\hat{\alpha}}^\delta$  (23)
is the minimizer of the Tikhonov function:
$\mathcal{F}_{lm\hat{\alpha}}(p) = \| \bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p \|^2 + \hat{\alpha} \| L p \|^2,$
and that the residual vector of the linear equation $\bar{y}_{m\hat{\alpha}}^\delta = \bar{K}_{m\hat{\alpha}}\, p$ is equal to the nonlinear residual vector at the solution, i.e.,
$\bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p_{m\hat{\alpha}}^\delta = \bar{y}^\delta - \bar{F}_m(x_{m\hat{\alpha}}^\delta) = r_{m\hat{\alpha}}^\delta.$
In summary, in the iteratively regularized Gauss–Newton method:
• the computation of the regularized solution $x_{m\hat{\alpha}}^\delta$ depends only on the initial value $\alpha_1$ and the ratio $q$ of the geometric sequence $\{\alpha_k\}$, which determine the rate of convergence, and
• the regularization parameter at the solution is an estimate for the optimal regularization parameter and, therefore, for the ratio of the data error variance to the a priori state variance.
Taking these results into account, we may conclude that the iteratively regularized Gauss–Newton method gives a solution to Problem 1 of Bayesian parameter estimation.

5. Parameter Estimation and Model Selection

In this section, we address Problems 2 and 3 related to Bayesian model selection.
To solve Problem 3, we suppose that the forward model can be linearized around the solution $x_{m\hat{\alpha}}^\delta$ [6]. In other words, in the first-order Taylor expansion:
$\bar{F}_m(x) = \bar{F}_m(x_{m\hat{\alpha}}^\delta) + \bar{K}_{m\hat{\alpha}}(x - x_{m\hat{\alpha}}^\delta) + \bar{R}_m(x, x_{m\hat{\alpha}}^\delta),$
the remainder term $\bar{R}_m(x, x_{m\hat{\alpha}}^\delta)$ can be neglected. As a result, the nonlinear data model $\bar{y}^\delta = \bar{F}_m(x) + \bar{\delta}_m$ becomes the linear model:
$\bar{y}_{m\hat{\alpha}}^\delta = \bar{K}_{m\hat{\alpha}}\, p + \bar{\delta}_m,$  (26)
where $\bar{y}_{m\hat{\alpha}}^\delta$ is given by Equation (21), $p = x - x_a \sim N(0, C_x = \sigma_m^2 (\hat{\alpha} L^T L)^{-1})$, and $\bar{\delta}_m \sim N(0, \overline{C}_{\bar{\delta} m} = \sigma_m^2 I_M)$. The direct consequences of dealing with the linear model (26) are the following results:
• the a posteriori density $p(x \mid \bar{y}^\delta, m)$ can be expressed as the Gaussian distribution:
$p(x \mid \bar{y}^\delta, m) \propto \exp\Big[ -\frac{1}{2} (x - x_{m\hat{\alpha}}^\delta)^T \hat{C}_{xm}^{-1} (x - x_{m\hat{\alpha}}^\delta) \Big],$
where:
$\hat{C}_{xm} = \sigma_m^2 (\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)^{-1}$
is the a posteriori covariance matrix, and
• as shown in Appendix A, the marginal likelihood density $p(\bar{y}^\delta \mid m)$ can be computed analytically; the result is:
$p(\bar{y}^\delta \mid m) = \sqrt{\frac{\det(I_M - \hat{A}_{m\hat{\alpha}})}{(2\pi\sigma_m^2)^M}} \exp\Big[ -\frac{1}{2\sigma_m^2} \bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta \Big],$
where $\hat{A}_{m\hat{\alpha}} = \bar{K}_{m\hat{\alpha}} \bar{K}_{m\hat{\alpha}}^\dagger$ is the influence matrix at the solution $x_{m\hat{\alpha}}^\delta$ (a numerical sketch of these quantities is given after this list).
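The determinant, trace, and quadratic form built from the influence matrix can be evaluated cheaply from the singular value decomposition of the quotient matrix $\bar{K}_{m\hat{\alpha}} L^{-1}$; the same quantities reappear as $\bar{y}_k$, $t_k$, and $d_k$ in Section 6. A sketch under the assumptions of this section (the helper name and interface are ours, and $M \geq N$ is assumed):

```python
import numpy as np

def influence_scalars(K_bar, L, alpha, y_lin):
    """Scalars built from A_hat = K (K^T K + alpha L^T L)^{-1} K^T via the quotient-matrix SVD.

    Returns y_bar = y^T (I - A_hat) y, r2 = ||(I - A_hat) y||^2,
            t = trace(I - A_hat), d = det(I - A_hat).
    """
    M, N = K_bar.shape
    U, g, _ = np.linalg.svd(K_bar @ np.linalg.inv(L))   # singular values gamma_i, i = 1..N
    w = np.ones(M)                                      # filter factors of I - A_hat
    w[:N] = alpha / (g ** 2 + alpha)                    # alpha / (gamma_i^2 + alpha) for i <= N
    z = U.T @ y_lin                                     # data rotated into the U basis
    y_bar = np.sum(w * z ** 2)                          # quadratic form y^T (I - A_hat) y
    r2 = np.sum((w * z) ** 2)                           # squared residual ||(I - A_hat) y||^2
    t = (M - N) + np.sum(w[:N])                         # trace(I - A_hat)
    d = np.prod(w[:N])                                  # det(I - A_hat)
    return y_bar, r2, t, d
```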
The replacement of the nonlinear data model by a linear data model will also enable us to deal with Problem 2. More specifically, for this purpose, we will adapt some regularization parameter choice methods for linear problems, i.e., maximum marginal likelihood estimation [7,8,9] and generalized cross-validation [10,11], to model selection. Furthermore, estimates for the data error variance σ m 2 delivered by these methods will be used to compute the marginal likelihood density and, therefore, the relative evidence.
In Appendix B, it is shown that:
• in the framework of maximum marginal likelihood estimation, the data error variance can be estimated by:
$\hat{\sigma}_{Lm}^2 = \frac{1}{M} \bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta,$  (29)
and the model with the smallest value of the marginal likelihood function, defined by:
$\lambda(m) = \frac{\bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta}{\sqrt[M]{\det(I_M - \hat{A}_{m\hat{\alpha}})}},$  (30)
is optimal;
• in the framework of generalized cross-validation, the data error variance can be estimated by:
$\hat{\sigma}_{Gm}^2 = \frac{\| r_{m\hat{\alpha}}^\delta \|^2}{\mathrm{trace}(I_M - \hat{A}_{m\hat{\alpha}})},$  (31)
and the model with the smallest value of the generalized cross-validation function, defined by:
$\upsilon(m) = \frac{\| r_{m\hat{\alpha}}^\delta \|^2}{[\mathrm{trace}(I_M - \hat{A}_{m\hat{\alpha}})]^2},$  (32)
is optimal.
Denoting by $p_L(\bar{y}^\delta \mid m)$ and $p_G(\bar{y}^\delta \mid m)$ the marginal likelihoods corresponding to $\hat{\sigma}_{Lm}^2$ and $\hat{\sigma}_{Gm}^2$, respectively, we define:
• the relative evidence of the approach based on the marginal likelihood and the computation of the data error variance in the framework of the Maximum Marginal Likelihood Estimation (MLMMLE) by:
$p_{\mathrm{MLMMLE}}(m \mid y^\delta) = \frac{p_L(\bar{y}^\delta \mid m)}{\sum_{m=1}^{N_m} p_L(\bar{y}^\delta \mid m)},$  (33)
• the relative evidence of the approach based on the marginal likelihood and the computation of the data error variance in the framework of the Generalized Cross-Validation (MLGCV) by:
$p_{\mathrm{MLGCV}}(m \mid y^\delta) = \frac{p_G(\bar{y}^\delta \mid m)}{\sum_{m=1}^{N_m} p_G(\bar{y}^\delta \mid m)}.$  (34)
Note that for the data error variance estimate (29), the marginal likelihood density becomes:
$p_L(\bar{y}^\delta \mid m) = c_M [\lambda(m)]^{-M/2},$
with:
$c_M = \sqrt{\Big( \frac{M}{2\pi} \Big)^M} \exp\Big( -\frac{M}{2} \Big).$
On the other hand, the statements (30) and (32) are equivalent to (cf. Equation (15)):
$\hat{x}_{\max}^\delta = x_{m\hat{\alpha}}^\delta, \quad m = \arg\max_m \frac{1}{\lambda(m)},$
and:
$\hat{x}_{\max}^\delta = x_{m\hat{\alpha}}^\delta, \quad m = \arg\max_m \frac{1}{\upsilon(m)},$
respectively. Deviating from a stochastic interpretation of the relative evidence and regarding this quantity in a deterministic setting merely as a merit function characterizing the solution $x_{m\hat{\alpha}}^\delta$, we define:
• the relative evidence of the approach based on Maximum Marginal Likelihood Estimation (MMLE) by:
$p_{\mathrm{MMLE}}(m \mid y^\delta) = \frac{1/\lambda(m)}{\sum_{m=1}^{N_m} 1/\lambda(m)},$  (37)
• the relative evidence of the approach based on Generalized Cross-Validation (GCV) by:
$p_{\mathrm{GCV}}(m \mid y^\delta) = \frac{1/\upsilon(m)}{\sum_{m=1}^{N_m} 1/\upsilon(m)}.$  (38)
Note that $p_{\mathrm{MMLE}}(m \mid y^\delta)$ and $p_{\mathrm{GCV}}(m \mid y^\delta)$ do not depend on the data error variance $\sigma_m^2$. Note also that $\hat{\sigma}_{Gm}^2$ is defined in terms of the squared residual:
$\| r_{m\hat{\alpha}}^\delta \|^2 = \| \bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p_{m\hat{\alpha}}^\delta \|^2 = \| (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta \|^2,$
while, according to Equation (29), $\hat{\sigma}_{Lm}^2$ is defined in terms of the quantity $\bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta$. The same difference exists between the numerators of the generalized cross-validation function $\upsilon(m)$ and the marginal likelihood function $\lambda(m)$ (see Equations (32) and (30)).
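Taken together, the selection criteria of this section reduce to a few array operations on the per-model scalars. A sketch using the quantities returned by the `influence_scalars` helper above (the function name and interface are ours):

```python
import numpy as np

def selection_criteria(y_bar, r2, t, d, M):
    """Per-model criteria of Section 5; all inputs are arrays over the N_m models."""
    lam = y_bar / d ** (1.0 / M)                 # marginal likelihood function, Eq. (30)
    v = r2 / t ** 2                              # generalized cross-validation function, Eq. (32)
    sigma2_L = y_bar / M                         # MMLE variance estimate, Eq. (29)
    sigma2_G = r2 / t                            # GCV variance estimate, Eq. (31)
    p_MMLE = (1.0 / lam) / np.sum(1.0 / lam)     # deterministic relative evidence, Eq. (37)
    p_GCV = (1.0 / v) / np.sum(1.0 / v)          # deterministic relative evidence, Eq. (38)
    return lam, v, sigma2_L, sigma2_G, p_MMLE, p_GCV
```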

6. Algorithm Description

An algorithmic implementation of the iteratively regularized Gauss–Newton method in connection with model selection is as follows (a compact driver tying the steps together is sketched after this list):
1. compute the scaling matrix $P$ by means of Equation (5) and the scaled data vector $\bar{y}^\delta = P y^\delta$;
2. given the current iterate $x_{mk}^\delta$ at step $k$, compute the forward model $F_m(x_{mk}^\delta)$, the Jacobian matrix $K_{mk}$, and the scaled quantities $\bar{F}_m(x_{mk}^\delta) = P F_m(x_{mk}^\delta)$ and $\bar{K}_{mk} = P K_{mk}$;
3. compute the linearized data vector:
$\bar{y}_{mk}^\delta = \bar{y}^\delta - \bar{F}_m(x_{mk}^\delta) + \bar{K}_{mk}(x_{mk}^\delta - x_a);$
4. compute the singular value decomposition of the quotient matrix $\bar{K}_{mk} L^{-1} = U \Gamma V^T$, where $\Gamma = \mathrm{diag}(\gamma_{mi})_M$ with $\gamma_{mi} = 0$ for $i > N$ is a diagonal matrix containing the singular values $\gamma_{mi}$ in decreasing order, and $U = [u_1, \ldots, u_M] \in \mathbb{R}^{M \times M}$ and $V = [v_1, \ldots, v_N] \in \mathbb{R}^{N \times N}$ are orthogonal matrices containing the left and right singular column vectors $u_i$ and $v_i$, respectively;
5. if $k = 1$, choose $\alpha_k = \max(\gamma_{m1} \gamma_{mN}, \alpha_{\min})$, where $\gamma_{m1}$ and $\gamma_{mN}$ are the largest and smallest singular values, respectively; otherwise, set $\alpha_k = \max(q \alpha_{k-1}, \alpha_{\min})$;
6. compute the minimizer of the Tikhonov function for the linearized equation,
$p_{mk}^\delta = \sum_{i=1}^N \frac{\gamma_{mi}}{\gamma_{mi}^2 + \alpha_k} (u_i^T \bar{y}_{mk}^\delta)\, \hat{v}_i, \quad \hat{v}_i = L^{-1} v_i,$
and the new iterate, $x_{mk+1}^\delta = p_{mk}^\delta + x_a$;
7. compute the nonlinear residual vector at $x_{mk+1}^\delta$,
$r_{mk+1}^\delta = \bar{y}^\delta - \bar{F}_m(x_{mk+1}^\delta),$
and the residual $r_{k+1} = \| r_{mk+1}^\delta \|^2$;
8. compute the condition number $c_k = \gamma_{m1} / \gamma_{mN}$, the scalar quantities:
$\bar{y}_k = \sum_{i=1}^N \frac{\alpha_{k-1}}{\gamma_{mi}^2 + \alpha_{k-1}} (u_i^T \bar{y}_{mk}^\delta)^2 + \sum_{i=N+1}^M (u_i^T \bar{y}_{mk}^\delta)^2, \quad t_k = M - N + \sum_{i=1}^N \frac{\alpha_{k-1}}{\gamma_{mi}^2 + \alpha_{k-1}}, \quad d_k = \prod_{i=1}^N \frac{\alpha_{k-1}}{\gamma_{mi}^2 + \alpha_{k-1}},$
and the normalized covariance matrix:
$\hat{\bar{C}}_{xmk} = \hat{V} \, \mathrm{diag}\Big( \frac{1}{\gamma_{mi}^2 + \alpha_{k-1}} \Big)_N \hat{V}^T,$
where $\hat{V} = L^{-1} V$;
9. if $r_{k+1} \geq r_k$, recompute $x_{mk+1}^\delta$ by means of a step-length algorithm such that $r_{k+1} < r_k$; if the residual cannot be reduced, set $r_\star = r_k$ and go to Step 12;
10. compute the relative decrease of the residual, $\Delta r = (r_k - r_{k+1}) / r_k$;
11. if $\Delta r > \varepsilon_R$, go to Step 1; otherwise, set $r_\star = r_{k+1}$ and go to Step 12;
12. determine $k^\star$ such that $r_{k^\star} \leq \eta r_\star < r_k$ for all $0 \leq k < k^\star$;
13. since $r_{k^\star} = \| r_{m\hat{\alpha}}^\delta \|^2$, $\bar{y}_{k^\star} = \bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta$, $t_{k^\star} = \mathrm{trace}(I_M - \hat{A}_{m\hat{\alpha}})$, and $d_{k^\star} = \det(I_M - \hat{A}_{m\hat{\alpha}})$, compute the estimates:
$\lambda(m) = \frac{\bar{y}_{k^\star}}{\sqrt[M]{d_{k^\star}}}, \quad \upsilon(m) = \frac{r_{k^\star}}{t_{k^\star}^2}, \quad \hat{\sigma}_{Lm}^2 = \frac{1}{M} \bar{y}_{k^\star}, \quad \hat{\sigma}_{Gm}^2 = \frac{r_{k^\star}}{t_{k^\star}},$
the marginal likelihood:
$p_X(\bar{y}^\delta \mid m) = \sqrt{\frac{d_{k^\star}}{(2\pi \hat{\sigma}_{Ym}^2)^M}} \exp\Big( -\frac{\bar{y}_{k^\star}}{2 \hat{\sigma}_{Ym}^2} \Big),$
where X stands for the character strings MLMMLE and MLGCV when the character variable Y takes the values L and G, respectively, and the covariance matrix:
$\hat{C}_{xm} = \hat{\sigma}_{Ym}^2 \hat{\bar{C}}_{xmk^\star};$
14. compute the relative evidence $p_X(m \mid y^\delta)$, where X stands for the character strings MLMMLE, MLGCV, MMLE, and GCV, by using Equations (33), (34), (37), and (38), respectively;
15. compute the maximum and mean solution estimates $\hat{x}_{\max}^\delta$ and $\hat{x}_{\mathrm{mean}}^\delta$ by using Equations (15) and (17), respectively, and the mean a posteriori density $p_{\mathrm{mean}}(x \mid y^\delta)$ by using Equation (16).
The control parameters of the algorithm are: (i) the ratio $q$ of the geometric sequence of regularization parameters, (ii) the minimum acceptable value $\alpha_{\min}$ of the regularization parameter, (iii) the tolerance $\varepsilon_R$ of the residual convergence test, and (iv) the tolerance $\eta$ of the discrepancy-principle stopping rule.
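As announced above the list, the steps can be arranged into a compact driver. The following sketch reuses the hypothetical helpers introduced in the previous sections (`irgn`, `influence_scalars`, `selection_criteria`, `combine_models`) and, by way of example, scores the models by their GCV evidence; it is an illustration of the control flow, not the full algorithm with step-length control and stopping rule:

```python
import numpy as np

def retrieve(models, y_delta, x_a, L, M):
    """Run the regularized inversion for every candidate model and combine the
    solutions; `models` is a list of (forward, jacobian) callable pairs."""
    sols, scalars = [], []
    for forward, jacobian in models:
        x, alpha = irgn(forward, jacobian, y_delta, x_a, L)     # Steps 1-12 (schematic)
        K = jacobian(x)
        y_lin = y_delta - forward(x) + K @ (x - x_a)            # linearized data vector at the solution
        scalars.append(influence_scalars(K, L, alpha, y_lin))   # y_bar, r2, t, d per model
        sols.append(x)
    y_bar, r2, t, d = (np.array(s) for s in zip(*scalars))
    _, _, _, _, p_MMLE, p_GCV = selection_criteria(y_bar, r2, t, d, M)   # Steps 13-14
    return combine_models(np.array(sols), p_GCV)                # Step 15: x_max and x_mean
```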
We conclude our theoretical analysis with some comments:
• Another estimate for the data error variance is derived in Appendix C; this is given by:
$\hat{\sigma}_{Rm}^2 = \frac{1}{M - N} \| r_{m\hat{\alpha}}^\delta \|^2.$
If $\hat{\alpha} \ll \gamma_{mi}^2$ for all $i = 1, \ldots, N$, we can approximate (cf. Equation (A17)):
$\mathrm{trace}(I_M - \hat{A}_{m\hat{\alpha}}) = M - N + \sum_{i=1}^N \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}} \approx M - N,$
and in view of Equation (31), we deduce that $\hat{\sigma}_{Rm}^2 \approx \hat{\sigma}_{Gm}^2$.
• If a model is far from reality, it is natural to assume that the model parameter errors are large or, equivalently, that the data error variance is large. This observation suggests that, in a deterministic setting, we may define the relative evidence as:
$p(m \mid y^\delta) = \frac{1/\hat{\sigma}_{Ym}^2}{\sum_{m=1}^{N_m} 1/\hat{\sigma}_{Ym}^2},$
where the character variable Y takes the values R, L, or G. In this case, for $\hat{\sigma}_{Rm}^2 = \| r_{m\hat{\alpha}}^\delta \|^2 / (M - N)$, the model with the smallest value of the squared residual is considered to be optimal.

7. Application to the EPIC Instrument

The Deep Space Climate Observatory (DSCOVR) flies in a Lissajous orbit about the first Earth–Sun Lagrange point (L$_1$), which is 1.5 million kilometers from the Earth towards the Sun. The Earth Polychromatic Imaging Camera (EPIC) is one of the two Earth observing instruments on board DSCOVR. This instrument views the entire sunlit side of the Earth from sunrise to sunset in the backscattering direction (scattering angles between 168.5° and 175.5°) with 10 narrowband filters, ranging from 317.5 to 779.5 nm [12]. The unique observing geometry of EPIC, compared to instruments in Sun-synchronous orbits that rarely view the Earth at such large scattering angles [13], makes it a suitable candidate for several climate science applications, including the measurement of cloud reflectivity and of aerosol optical thickness and layer height [14].
We apply the algorithm proposed in Section 6 to the retrieval of aerosol optical thickness and aerosol layer height by generating synthetic measurements corresponding to the EPIC instrument. Specifically, Channels 7 and 8 in the Oxygen B-band at 680 and 687.75 nm, respectively, and Channels 9 and 10 in the Oxygen A-band at 764 and 779.5 nm, respectively, are used in the retrieval. The linearized radiative transfer model is based on the discrete ordinates method with matrix exponential and uses the correlated k-distribution method in conjunction with the principal component analysis technique to speed up the computations [15,16].
We consider the aerosol models implemented in the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol retrieval algorithm over land [1]. There are three spherical, fine-dominated model types (non-absorbing, moderately absorbing, and absorbing) and one spheroidal, coarse-dominated model type (dust). These aerosol models, which depend on location and season, are the result of a cluster analysis of the climatology of almucantar retrievals from global AERONET (AErosol RObotic NETwork) measurements [17]. Each model consists of two log-normal modes (accumulation and coarse). A single log-normal mode is described by the number size distribution:
$\frac{dN(r)}{d\ln r} = \frac{N_0}{\sqrt{2\pi}\,\sigma} \exp\Big[ -\frac{(\ln r - \ln r_{\mathrm{mod}})^2}{2\sigma^2} \Big],$
where $r_{\mathrm{mod}}$ is the modal or median radius of the number size distribution, $\sigma$ is the standard deviation, and:
$N_0 = \int_0^\infty \frac{dN(r)}{d\ln r}\, d\ln r$
is the total number of particles per cross-section of the atmospheric column. Table 1 displays the log-normal size parameters and the complex refractive indices $m$ for the four aerosol models, where, for each log-normal mode,
$r_v = r_{\mathrm{mod}} \exp(3\sigma^2)$
is the median radius of the volume size distribution:
$\frac{dV(r)}{d\ln r} = \frac{V_0}{\sqrt{2\pi}\,\sigma} \exp\Big[ -\frac{(\ln r - \ln r_v)^2}{2\sigma^2} \Big],$
and:
$V_0 = \int_0^\infty \frac{4\pi r^3}{3} \frac{dN(r)}{d\ln r}\, d\ln r = N_0 \frac{4\pi r_{\mathrm{mod}}^3}{3} \exp(4.5\sigma^2)$
is the volume of particles per cross-section of the atmospheric column.
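To make the size-distribution conventions concrete, the following small sketch evaluates a single log-normal mode and the derived volume parameters (the function names and the numerical values are illustrative placeholders, not the Table 1 entries):

```python
import numpy as np

def number_distribution(r, N0, r_mod, sigma):
    """Log-normal number size distribution dN/dln r."""
    return N0 / (np.sqrt(2.0 * np.pi) * sigma) * \
        np.exp(-np.log(r / r_mod) ** 2 / (2.0 * sigma ** 2))

def volume_parameters(N0, r_mod, sigma):
    """Volume median radius r_v and column volume V0 implied by the number mode."""
    r_v = r_mod * np.exp(3.0 * sigma ** 2)
    V0 = N0 * (4.0 * np.pi * r_mod ** 3 / 3.0) * np.exp(4.5 * sigma ** 2)
    return r_v, V0

# e.g., a hypothetical accumulation mode with r_mod = 0.1 um and sigma = 0.5:
r = np.logspace(-2, 1, 200)                       # radii in micrometers
dN = number_distribution(r, N0=1.0, r_mod=0.1, sigma=0.5)
r_v, V0 = volume_parameters(N0=1.0, r_mod=0.1, sigma=0.5)
```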
In our numerical analysis, the state vector is $x = [\tau, H]^T$, where $\tau$ is the aerosol optical thickness and $H$ the aerosol layer height. The true aerosol optical thicknesses to be retrieved are $\tau_{\mathrm{t}} = 0.25$, 0.50, 0.75, 1.0, 1.25, and 1.5, while for the true aerosol layer height, we take $H_{\mathrm{t}} = 1.0$, 1.5, 2.0, 2.5, and 3.0 km. The a priori values, which coincide with the initial guesses, are $\tau_{\mathrm{a}} = 2.0$ and $H_{\mathrm{a}} = 4$ km. A Lambertian surface is assumed, and if not stated otherwise, the surface albedo is $A = 0.06$. We generate synthetic measurements by choosing the moderately absorbing aerosol model (Model 2) as the true or exact model (a sketch of this scheme is given after the list). Specifically, we:
• compute the radiances for the moderately absorbing aerosol model and the true values $\tau_{\mathrm{t}}$ and $H_{\mathrm{t}}$, $I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})$;
• add the measurement noise $\delta_{I\mathrm{mes}}$,
$I_{\mathrm{modabs}}^\delta(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}}) = I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}}) + [\delta_{I\mathrm{mes}}]_i,$
where $\delta_{I\mathrm{mes}} \sim N(0, C_{I\mathrm{mes}})$, $C_{I\mathrm{mes}} = \mathrm{diag}(\sigma_{I\mathrm{mes},i}^2)_M$, and:
$\sigma_{I\mathrm{mes},i} = \frac{I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})}{\mathrm{SNR}}, \quad \mathrm{SNR} = 290,$
which is the maximum signal-to-noise ratio according to the EPIC camera specifications; and
• in view of the approximation:
$\ln I_{\mathrm{modabs}}^\delta(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}}) \approx \ln I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}}) + \frac{[\delta_{I\mathrm{mes}}]_i}{I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})},$
identify $[y^\delta]_i = \ln I_{\mathrm{modabs}}^\delta(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})$, $[y]_i = \ln I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})$, and:
$[\delta_{\mathrm{mes}}]_i = \frac{[\delta_{I\mathrm{mes}}]_i}{I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})},$
implying:
$\sigma_{\mathrm{mes},i} = \frac{\sigma_{I\mathrm{mes},i}}{I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})} = \frac{1}{\mathrm{SNR}}.$
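A minimal sketch of this synthetic-measurement scheme, with `I_true` standing for the radiances $I_{\mathrm{modabs}}(\lambda_i, \tau_{\mathrm{t}}, H_{\mathrm{t}})$ produced by the radiative transfer model (the function name and seed handling are ours):

```python
import numpy as np

def synthetic_log_data(I_true, snr=290.0, seed=0):
    """Noisy log-radiance data: additive radiance noise with sigma_i = I_i / SNR,
    mapped to log space via the first-order approximation above."""
    rng = np.random.default_rng(seed)
    sigma_I = I_true / snr                         # per-channel radiance noise level
    I_noisy = I_true + rng.normal(0.0, sigma_I)    # add Gaussian measurement noise
    y_delta = np.log(I_noisy)                      # [y_delta]_i = ln I_modabs^delta(lambda_i)
    sigma_mes = np.full(I_true.shape, 1.0 / snr)   # sigma_mes_i = 1/SNR in log space
    return y_delta, sigma_mes
```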
The above scheme yields $\overline{C}_{\delta m} = I_M$ and $P = I_M$, showing that the regularized solution does not depend on the scaling matrix. Coming to the regularization matrix, we first note that the choice $\sigma_{x,i} = \varepsilon_x [x_a]_i$ for some $\varepsilon_x$ gives:
$L = \mathrm{diag}\Big( \frac{\sigma_x}{\sigma_{x,i}} \Big)_N = \mathrm{diag}\Bigg( \frac{\sqrt{\frac{1}{N} \sum_{i=1}^N [x_a]_i^2}}{[x_a]_i} \Bigg)_N.$
However, to have better control on the amount of regularization applied to each component of the state vector, we introduce the weight $w_{x,i}$ of the component $[x]_i$ and take:
$L = \mathrm{diag}\Bigg( w_{x,i} \frac{\sqrt{\frac{1}{N} \sum_{i=1}^N [x_a]_i^2}}{[x_a]_i} \Bigg)_N.$
For $w_{x,i} \to \infty$, the component $[x]_i$ is tied to the a priori, while for $w_{x,i} \to 0$, the component $[x]_i$ is practically unconstrained. In our simulations, we used $w_\tau = w_H = 1.0$. What is left is the specification of the control parameters of the algorithm; these are $q = 0.1$, $\alpha_{\min} = 10^{-6} \gamma_{mN}$, where $\gamma_{mN}$ is the smallest singular value of the quotient matrix $\bar{K}_{mk} L^{-1}$ at $k = 1$ (the first iteration step), $\varepsilon_R = 10^{-3}$, and $\eta = 1.05$.
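For instance, the weighted regularization matrix can be assembled as follows; the three-component example mimics Test Example 3 below, where the surface albedo is appended to the state vector and tightly constrained (the helper name is ours):

```python
import numpy as np

def regularization_matrix(x_a, weights):
    """Diagonal regularization matrix L with per-component weights w_x_i."""
    scale = np.sqrt(np.mean(x_a ** 2))   # sqrt((1/N) sum_i [x_a]_i^2)
    return np.diag(weights * scale / x_a)

# tau and H retrieved with unit weights; albedo tied to its a priori (w_A = 1e3):
L = regularization_matrix(np.array([2.0, 4.0, 0.06]), np.array([1.0, 1.0, 1.0e3]))
```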
To analyze the accuracy of the aerosol retrieval, we consider several test examples.
• Test Example 1
In the first example, we analyze the efficiency of the aerosol model selection by considering all models, i.e., we take m = 1, 2, 3, 4. Thus, the algorithm has the chance to identify the correct aerosol model. The solution accuracy is characterized by the relative errors:
$\varepsilon_{\mathrm{mean}}^\tau = \frac{|\tau_{\mathrm{mean}} - \tau_{\mathrm{t}}|}{\tau_{\mathrm{t}}} \quad \text{and} \quad \varepsilon_{\mathrm{mean}}^H = \frac{|H_{\mathrm{mean}} - H_{\mathrm{t}}|}{H_{\mathrm{t}}}$
corresponding to (cf. Equation (17)) $\hat{x}_{\mathrm{mean}}^\delta = [\tau_{\mathrm{mean}}, H_{\mathrm{mean}}]^T$, and:
$\varepsilon_{\max}^\tau = \frac{|\tau_{\max} - \tau_{\mathrm{t}}|}{\tau_{\mathrm{t}}} \quad \text{and} \quad \varepsilon_{\max}^H = \frac{|H_{\max} - H_{\mathrm{t}}|}{H_{\mathrm{t}}}$
corresponding to (cf. Equation (15)) $\hat{x}_{\max}^\delta = [\tau_{\max}, H_{\max}]^T$. In Figure 1 and Figure 2, we illustrate the variations of the relative errors with respect to $\tau_{\mathrm{t}}$ and $H_{\mathrm{t}}$, respectively, while in Table 2, we show the average relative errors:
$\varepsilon_{\mathrm{mean}|\tau}^\tau = \frac{1}{N_\tau} \sum_{i=1}^{N_\tau} \varepsilon_{\mathrm{mean},i}^\tau \quad \text{and} \quad \varepsilon_{\mathrm{mean}|\tau}^H = \frac{1}{N_\tau} \sum_{i=1}^{N_\tau} \varepsilon_{\mathrm{mean},i}^H$
over $\tau_{\mathrm{t}}$ for $N_\tau = 6$, and:
$\varepsilon_{\mathrm{mean}|H}^\tau = \frac{1}{N_H} \sum_{i=1}^{N_H} \varepsilon_{\mathrm{mean},i}^\tau \quad \text{and} \quad \varepsilon_{\mathrm{mean}|H}^H = \frac{1}{N_H} \sum_{i=1}^{N_H} \varepsilon_{\mathrm{mean},i}^H$
over $H_{\mathrm{t}}$ for $N_H = 5$. The following conclusions can be drawn:
• the relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ corresponding to the generalized cross-validation (GCV and MLGCV) are in general smaller than those corresponding to the maximum marginal likelihood estimation (MMLE and MLMMLE);
• the relative errors $\varepsilon_{\max}^\tau$ and $\varepsilon_{\max}^H$ are very small, and so, we deduce that the algorithm recognizes the exact aerosol model;
• the best method is GCV, in which case the average relative errors are $\varepsilon_{\mathrm{mean}|\tau}^\tau = 0.009$, $\varepsilon_{\mathrm{mean}|\tau}^H = 0.020$, $\varepsilon_{\mathrm{mean}|H}^\tau = 0.001$, and $\varepsilon_{\mathrm{mean}|H}^H = 0.024$;
• the aerosol optical thickness is better retrieved than the aerosol layer height.
• Test Example 2
In the second example, we consider all aerosol models except the exact one, i.e., we take m = 1, 3, 4. In this more realistic scenario, the algorithm tries to find an aerosol model that is as close as possible to the exact one. The variations of the relative errors with respect to $\tau_{\mathrm{t}}$ and $H_{\mathrm{t}}$ are illustrated in Figure 3 and Figure 4, respectively, while the average relative errors are given in Table 3. The following conclusions can be drawn:
• the relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ corresponding to the generalized cross-validation (GCV and MLGCV) are still smaller than those corresponding to the maximum marginal likelihood estimation (MMLE and MLMMLE);
• the relative errors $\varepsilon_{\max}^\tau$ and $\varepsilon_{\max}^H$ are extremely large, and so, we infer that the maximum solution estimate $\hat{x}_{\max}^\delta = [\tau_{\max}, H_{\max}]^T$ is completely unrealistic;
• as before, the best method is GCV, characterized by the average relative errors $\varepsilon_{\mathrm{mean}|\tau}^\tau = 0.060$, $\varepsilon_{\mathrm{mean}|\tau}^H = 0.111$, $\varepsilon_{\mathrm{mean}|H}^\tau = 0.022$, and $\varepsilon_{\mathrm{mean}|H}^H = 0.096$;
• the relative errors are significantly larger than those corresponding to the case when all four aerosol models are taken into account.
In Figure 5, we show the mean a posteriori densities of $\tau$ and $H$ computed by GCV for Test Examples 1 and 2. In Test Example 1, the a posteriori density is sharply peaked, indicating small retrieval errors, while in Test Example 2, the a posteriori density is wide and spread over all the aerosol models.
• Test Example 3
In the third series of our numerical experiments, we include the surface albedo in the retrieval. In fact, the surface albedo, regarded as a model parameter, can be (i) assumed to be known, (ii) included in the retrieval, or (iii) treated as a model uncertainty. The second option, which leads to more accurate results, is considered here. In this case, however, the aerosol optical thickness and the surface albedo are strongly correlated (the condition number of $\bar{K}_{mk} L^{-1}$ at $k = 1$ is $10^3$–$10^4$ times larger than for the case in which the surface albedo is not part of the retrieval). Since we are not interested in an accurate surface albedo retrieval, we choose the weight controlling the constraint on the surface albedo in the regularization matrix as $w_A = 10^3$. As $w_\tau = w_H = 1.0$, this means that the surface albedo is tightly constrained to the a priori. We use the a priori and true values $A_{\mathrm{a}} = 0.06$ and $A_{\mathrm{t}} = 0.063$, respectively; thus, the uncertainty in the surface albedo with respect to the a priori is 5%. The synthetic measurements are generated with the true surface albedo $A_{\mathrm{t}} = 0.063$.
In Figure 6, we illustrate the variations of the relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ with respect to $\tau_{\mathrm{t}}$ and $H_{\mathrm{t}}$ when all four aerosol models are taken into account. The average relative errors are given in Table 4. The results show that:
• the relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ corresponding to generalized cross-validation (GCV and MLGCV) are in general larger than those corresponding to maximum marginal likelihood estimation (MMLE and MLMMLE);
• the best methods are MMLE and MLMMLE; the average relative errors given by MLMMLE are $\varepsilon_{\mathrm{mean}|\tau}^\tau = 0.051$, $\varepsilon_{\mathrm{mean}|\tau}^H = 0.060$, $\varepsilon_{\mathrm{mean}|H}^\tau = 0.011$, and $\varepsilon_{\mathrm{mean}|H}^H = 0.027$;
• the relative errors are significantly larger than those obtained in the first two test examples (in which the surface albedo is known exactly).
Figure 7 illustrates the variations of the relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ with respect to $\tau_{\mathrm{t}}$ and $H_{\mathrm{t}}$ when all aerosol models except the exact one are considered. The corresponding average relative errors are given in Table 5. The results show that:
• as in the previous scenario, the relative errors $\varepsilon_{\mathrm{mean}}^\tau$ and $\varepsilon_{\mathrm{mean}}^H$ corresponding to generalized cross-validation (GCV and MLGCV) are in general larger than those corresponding to maximum marginal likelihood estimation (MMLE and MLMMLE);
• the best methods are MMLE and MLMMLE; the average relative errors delivered by MMLE are $\varepsilon_{\mathrm{mean}|\tau}^\tau = 0.101$, $\varepsilon_{\mathrm{mean}|\tau}^H = 0.113$, $\varepsilon_{\mathrm{mean}|H}^\tau = 0.070$, and $\varepsilon_{\mathrm{mean}|H}^H = 0.204$;
• the relative errors are the largest among all the test examples.

8. Conclusions

We designed a retrieval algorithm that takes into account uncertainty in model selection. The solution corresponding to a specific model is characterized by a metric called relative evidence, which is a measure of the fit between the model and the measurement. Based on this metric, the maximum solution estimate, corresponding to the model with the highest evidence, and the mean solution estimate, representing a linear combination of solutions weighted by their evidence, are introduced.
The retrieval algorithm is based on:
  • an application of the prewhitening technique in order to transform the data model into a scaled model with white noise;
  • a deterministic regularization method, i.e., the iteratively regularized Gauss–Newton method, in order to compute the regularized solution (equivalent to the maximum a posteriori estimate of the solution in a Bayesian framework) and to determine the optimal value of the regularization parameter (equivalent to the ratio of the data error and a priori state variances in a Bayesian framework);
  • a linearization of the forward model around the solution in order to transform the nonlinear data model into a linear model and, in turn, facilitate an analytical integration of the likelihood density over the state vector;
  • an extension of maximum marginal likelihood estimation and generalized cross-validation to model selection and data error variance estimation.
Essentially, the algorithm includes four model selection approaches corresponding to:
  • the two parameter choice methods used (maximum marginal likelihood estimation and generalized cross-validation) and
  • the two settings in which the relative evidence is treated (stochastic and deterministic).
The algorithm is applied to the retrieval of aerosol optical thickness and aerosol layer height from synthetic measurements corresponding to the EPIC instrument. In the simulations, the aerosol models implemented in the MODIS aerosol algorithm over land are considered, and the surface albedo is either assumed to be known or included in the retrieval. The following conclusions are drawn:
  • The differences between the results corresponding to the stochastic and deterministic interpretations of the relative evidence are not significant.
  • If the surface albedo is assumed to be known, generalized cross-validation is superior to maximum marginal likelihood estimation; if the surface albedo is included in the retrieval, the contrary is true.
  • The errors in the aerosol optical thickness retrieval are smaller than those in the aerosol layer height retrieval. In the most realistic situation, when the exact aerosol model and surface albedo are unknown, the average relative errors in the retrieved aerosol optical thickness are about 10%, while the corresponding errors in the aerosol layer height are about 20%.
  • The maximum solution estimate is completely unrealistic when both the aerosol model and surface albedo are unknown.

Author Contributions

Conceptualization: V.N. and A.D.; Data curation: S.S.; Formal analysis: S.S. and V.M.G.; Funding acquisition: V.N., D.L. and A.D.; Investigation: S.S.; Methodology: V.N. and A.D.; Project administration: D.L. and A.D.; Supervision: V.N., D.S.E. and A.D.; Validation: V.N. and A.D.; Writing—original draft: S.S.; Writing—review & editing: S.S., V.N., D.S.E., D.L. and A.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the German Aerospace Center (DLR) and the German Academic Exchange Service (DAAD) through the program DLR/DAAD Research Fellowships 2015 (57186656), with Reference Numbers 91613528 and 91627488.

Acknowledgments

A portion of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004). V.N. acknowledges support from the NASA Earth Science US Participating Investigator program (Solicitation NNH16ZDA001N-ESUSPI).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this Appendix, we compute the marginal likelihood for the linear data model (26) with $p \sim N(0, C_x = \sigma_m^2 (\hat{\alpha} L^T L)^{-1})$ and $\bar{\delta}_m \sim N(0, \overline{C}_{\bar{\delta} m} = \sigma_m^2 I_M)$. In this case, the a priori and likelihood densities $p(p \mid m)$ and $p(\bar{y}^\delta \mid p, m)$ are given by:
$p(p \mid m) = \frac{1}{\sqrt{(2\pi\sigma_m^2)^N \det[(\hat{\alpha} L^T L)^{-1}]}} \exp\Big( -\frac{\hat{\alpha}}{2\sigma_m^2} p^T L^T L\, p \Big)$
and:
$p(\bar{y}^\delta \mid p, m) = \frac{1}{\sqrt{(2\pi\sigma_m^2)^M}} \exp\Big[ -\frac{1}{2\sigma_m^2} (\bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p)^T (\bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p) \Big],$
respectively. Using the identity:
$(\bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p)^T (\bar{y}_{m\hat{\alpha}}^\delta - \bar{K}_{m\hat{\alpha}}\, p) + \hat{\alpha}\, p^T L^T L\, p = (p - p_{m\hat{\alpha}}^\delta)^T (\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)(p - p_{m\hat{\alpha}}^\delta) + \bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta,$
where (cf. Equation (23)) $p_{m\hat{\alpha}}^\delta = \bar{K}_{m\hat{\alpha}}^\dagger \bar{y}_{m\hat{\alpha}}^\delta$ and $\hat{A}_{m\hat{\alpha}} = \bar{K}_{m\hat{\alpha}} \bar{K}_{m\hat{\alpha}}^\dagger$, we express the integrand in Equation (8) as:
$p(\bar{y}^\delta \mid x, m)\, p(x \mid m) = \frac{1}{\sqrt{(2\pi\sigma_m^2)^{N+M} \det[(\hat{\alpha} L^T L)^{-1}]}} \exp\Big[ -\frac{1}{2\sigma_m^2} (p - p_{m\hat{\alpha}}^\delta)^T (\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)(p - p_{m\hat{\alpha}}^\delta) \Big] \exp\Big[ -\frac{1}{2\sigma_m^2} \bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta \Big].$
The normalization condition:
$\int \exp\Big\{ -\frac{1}{2} (p - p_{m\hat{\alpha}}^\delta)^T \big[ \sigma_m^2 (\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)^{-1} \big]^{-1} (p - p_{m\hat{\alpha}}^\delta) \Big\}\, dp = \sqrt{(2\pi\sigma_m^2)^N \det[(\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)^{-1}]},$
the identities $\det(A^{-1}) = [\det(A)]^{-1}$ and:
$(\bar{K}_{m\hat{\alpha}}^T \bar{K}_{m\hat{\alpha}} + \hat{\alpha} L^T L)^{-1} \hat{\alpha} L^T L = I_N - A_{m\hat{\alpha}},$
where $A_{m\hat{\alpha}} = \bar{K}_{m\hat{\alpha}}^\dagger \bar{K}_{m\hat{\alpha}}$ is the averaging kernel matrix, and the relation:
$\det(I_N - A_{m\hat{\alpha}}) = \det(I_M - \hat{A}_{m\hat{\alpha}})$
yield the following expression for the marginal likelihood density:
$p(\bar{y}^\delta \mid m) = \sqrt{\frac{\det(I_M - \hat{A}_{m\hat{\alpha}})}{(2\pi\sigma_m^2)^M}} \exp\Big[ -\frac{1}{2\sigma_m^2} \bar{y}_{m\hat{\alpha}}^{\delta T} (I_M - \hat{A}_{m\hat{\alpha}}) \bar{y}_{m\hat{\alpha}}^\delta \Big].$

Appendix B

In this Appendix, we present a general tool for estimating the data error variance and the optimal model. Consider the linear model (26), i.e., $\bar{y}_{m\hat{\alpha}}^\delta = \bar{K}_{m\hat{\alpha}}\, p + \bar{\delta}_m$, and a singular value decomposition of the quotient matrix $\bar{K}_{m\hat{\alpha}} L^{-1} = U \Gamma V^T$, where $\Gamma = \mathrm{diag}(\gamma_{mi})_M$ with the convention $\gamma_{mi} = 0$ for $i > N$, $U = [u_1, \ldots, u_M]$, and $V = [v_1, \ldots, v_N]$. The covariance matrix of the data $\bar{y}_{m\hat{\alpha}}^\delta$ can be computed as:
$E\{\bar{y}_{m\hat{\alpha}}^\delta \bar{y}_{m\hat{\alpha}}^{\delta T}\} = \bar{K}_{m\hat{\alpha}} C_x \bar{K}_{m\hat{\alpha}}^T + \overline{C}_{\bar{\delta} m} = \frac{\sigma_m^2}{\hat{\alpha}} \bar{K}_{m\hat{\alpha}} (L^T L)^{-1} \bar{K}_{m\hat{\alpha}}^T + \sigma_m^2 I_M = U \Sigma_{\bar{y}} U^T,$
where:
$\Sigma_{\bar{y}} = \mathrm{diag}\Big( \sigma_m^2 \frac{\gamma_{mi}^2 + \hat{\alpha}}{\hat{\alpha}} \Big)_M.$  (A2)
Define the scaled data:
$\bar{Y}_{m\hat{\alpha}}^\delta = U^T \bar{y}_{m\hat{\alpha}}^\delta$
and note that the covariance matrix of $\bar{Y}_{m\hat{\alpha}}^\delta$ is the diagonal matrix $\Sigma_{\bar{y}}$, i.e.,
$E\{\bar{Y}_{m\hat{\alpha}}^\delta \bar{Y}_{m\hat{\alpha}}^{\delta T}\} = \Sigma_{\bar{y}}.$
If $\sigma_m^2$ is the correct data error variance, we must have (cf. Equation (A2)):
$E\{\bar{Y}_{m\hat{\alpha},i}^{\delta\,2}\} = \sigma_m^2 \frac{\gamma_{mi}^2 + \hat{\alpha}}{\hat{\alpha}}, \quad i = 1, \ldots, M,$
where $\bar{Y}_{m\hat{\alpha},i}^\delta = [\bar{Y}_{m\hat{\alpha}}^\delta]_i$. If $\sigma_m^2$ is unknown, we can find an estimate $\hat{\sigma}_m^2$ from the equations:
$E\{\bar{Y}_{m\hat{\alpha},i}^{\delta\,2}\} = \hat{\sigma}_m^2 \frac{\gamma_{mi}^2 + \hat{\alpha}}{\hat{\alpha}}, \quad i = 1, \ldots, M.$  (A3)
However, since only one realization of the random vector Y ¯ m α ^ δ is known, the calculation of σ ^ m 2 from Equation (A3) may lead to erroneous results. Therefore, we look for another selection criterion.
Set:

$$a_i(\sigma_m^2) = \sigma_m^2\, \frac{\gamma_{mi}^2 + \hat{\alpha}}{\hat{\alpha}} \tag{A4}$$

and define the function:

$$f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2) = \sum_{i=1}^{M} \Big\{\psi(a_i(\sigma_m^2)) + \psi'(a_i(\sigma_m^2))\, [\bar{Y}^{\delta 2}_{m \hat{\alpha} i} - a_i(\sigma_m^2)]\Big\},$$

with $\psi(a)$ being a strictly concave function, i.e., $\psi''(a) < 0$. The expected value of $f(\cdot)$ is given by:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\} = \sum_{i=1}^{M} \Big\{\psi(a_i(\sigma_m^2)) + \psi'(a_i(\sigma_m^2))\, [E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\} - a_i(\sigma_m^2)]\Big\};$$

hence, defining the estimate $\hat{\sigma}_m^2$ through the relation (cf. Equations (A3) and (A4)):

$$E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\} = a_i(\hat{\sigma}_m^2), \tag{A5}$$

we express $E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\}$ as:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\} = \sum_{i=1}^{M} \Big\{\psi(a_i(\sigma_m^2)) + \psi'(a_i(\sigma_m^2))\, [a_i(\hat{\sigma}_m^2) - a_i(\sigma_m^2)]\Big\}.$$

Then, we obtain:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\} - E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\} = \sum_{i=1}^{M} \Big\{\psi(a_i(\sigma_m^2)) - \psi(a_i(\hat{\sigma}_m^2)) + \psi'(a_i(\sigma_m^2))\, [a_i(\hat{\sigma}_m^2) - a_i(\sigma_m^2)]\Big\}. \tag{A6}$$

Considering the second-order Taylor expansion around $a_i(\sigma_m^2)$,

$$\psi(a_i(\hat{\sigma}_m^2)) = \psi(a_i(\sigma_m^2)) + \psi'(a_i(\sigma_m^2))\, [a_i(\hat{\sigma}_m^2) - a_i(\sigma_m^2)] + \tfrac{1}{2} \psi''(\xi_i)\, [a_i(\hat{\sigma}_m^2) - a_i(\sigma_m^2)]^2$$

for some $\xi_i$ between $a_i(\sigma_m^2)$ and $a_i(\hat{\sigma}_m^2)$, and taking into account the fact that $\psi$ is strictly concave, we deduce that each term in the sum (A6) is non-negative and vanishes only for $a_i(\sigma_m^2) = a_i(\hat{\sigma}_m^2)$. Thus, we have:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\} \geq E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}$$

for all $\sigma_m^2$. In conclusion, $\hat{\sigma}_m^2$, defined through Equation (A5), is the unique global minimizer of $E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\}$, i.e.,

$$\hat{\sigma}_m^2 = \arg\min_{\sigma_m^2} E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\}.$$
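Because $E\{f\}$ is linear in $E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}$, the expectation can be evaluated in closed form for a synthetic spectrum, which allows a quick numerical check of this minimizer property. The following Python sketch is ours (the toy spectrum and all names are assumptions, not from the paper); it scans $E\{f(\cdot \mid m, \sigma_m^2)\}$ over $\sigma_m^2$ for $\psi(a) = \ln a$, and the minimum falls at the variance satisfying the moment-matching relation (A5), here the "true" one.

```python
import numpy as np

# Toy filter spectrum: a_i(s2) = s2 * (gamma_i^2 + alpha) / alpha (cf. Eq. (A4)).
gamma2 = np.logspace(-2.0, 2.0, 20)   # assumed singular values gamma_i^2
alpha = 1.0                           # assumed regularization parameter
s2_true = 0.5                         # assumed true data error variance

def a(s2):
    return s2 * (gamma2 + alpha) / alpha

def expected_f(s2, psi, dpsi):
    # E{f} = sum_i [ psi(a_i(s2)) + psi'(a_i(s2)) * (E{Y_i^2} - a_i(s2)) ],
    # where E{Y_i^2} = a_i(s2_true) by construction of the toy model.
    ai = a(s2)
    return np.sum(psi(ai) + dpsi(ai) * (a(s2_true) - ai))

s2_grid = np.linspace(0.1, 2.0, 191)  # grid with step 0.01, contains 0.5
values = [expected_f(s2, np.log, lambda x: 1.0 / x) for s2 in s2_grid]
print(s2_grid[np.argmin(values)])     # -> approximately 0.5 = s2_true
```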
Coming to the optimal model $\hat{m}$, it seems that a natural choice of $\hat{m}$ is:

$$\hat{m} = \arg\min_{m} E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}. \tag{A8}$$

However, instead of Equation (A8), we use a more general selection criterion, namely, the optimal model $\hat{m}$ defined as:

$$\hat{m} = \arg\min_{m} F\big(E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}\big),$$

where $F$ is a monotonically increasing function of $E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}$.
The generalized cross-validation [10,11] and maximum marginal likelihood estimation [7,8,9] criteria are obtained for particular choices of the function $\psi(a)$.
  • Generalized cross-validation
For the choice:

$$\psi(a) = 1 - \frac{1}{a},$$

we obtain:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\} = M + \sum_{i=1}^{M} \Big[\frac{E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{a_i^2(\sigma_m^2)} - \frac{2}{a_i(\sigma_m^2)}\Big]. \tag{A10}$$

Since $\hat{\sigma}_m^2$ is the unique global minimizer of $E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\}$, the condition:

$$\frac{d}{d\sigma_m^2} E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\}\Big|_{\sigma_m^2 = \hat{\sigma}_m^2} = 0 \tag{A11}$$

yields:

$$\sum_{i=1}^{M} \Big[\frac{E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{a_i^2(\hat{\sigma}_m^2)} - \frac{1}{a_i(\hat{\sigma}_m^2)}\Big] = 0. \tag{A12}$$

The above equation, together with the relation:

$$a_i(\hat{\sigma}_m^2) = \hat{\sigma}_m^2\, \frac{\gamma_{mi}^2 + \hat{\alpha}}{\hat{\alpha}},$$

provides the following estimate for the data error variance:

$$\hat{\sigma}_m^2 = \frac{\sum_{i=1}^{M} \Big(\frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big)^2 E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{\sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}}. \tag{A13}$$

On the other hand, from Equations (A10) and (A12), we obtain:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\} = M - \sum_{i=1}^{M} \frac{1}{a_i(\hat{\sigma}_m^2)}, \tag{A14}$$

and we define the function $F(\cdot)$ by:

$$F\big(E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}\big) = \frac{1}{M - E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}}.$$

Taking Equations (A13) and (A14) into account, we find that the function $F(\cdot)$ can be calculated as:

$$F\big(E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}\big) = \frac{\sum_{i=1}^{M} \Big(\frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big)^2 E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{\Big[\sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big]^2}.$$

In practice, the expectation $E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}$ cannot be computed since only one realization $\bar{Y}^{\delta}_{m \hat{\alpha} i} = \mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}}$ is known. Therefore, we approximate $E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\} \approx (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2$ and consider the so-called generalized cross-validation function:

$$v(m) = \frac{\sum_{i=1}^{M} \Big(\frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big)^2 (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2}{\Big[\sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big]^2}.$$

Finally, using the representations:

$$\|\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} - \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}^{\delta}_{m \hat{\alpha}}\|^2 = \|\mathbf{r}^{\delta}_{m \hat{\alpha}}\|^2 = \sum_{i=1}^{M} \Big(\frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big)^2 (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2,$$

$$\mathrm{trace}(\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}}) = \sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}},$$

we express the estimate of the data error variance as:

$$\hat{\sigma}_m^2 = \frac{\|\mathbf{r}^{\delta}_{m \hat{\alpha}}\|^2}{\mathrm{trace}(\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}})},$$

and the generalized cross-validation function as:

$$v(m) = \frac{\|\mathbf{r}^{\delta}_{m \hat{\alpha}}\|^2}{[\mathrm{trace}(\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}})]^2}.$$
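Both the GCV function and the associated variance estimate can be evaluated directly from the singular values of the quotient matrix. The following Python sketch is a minimal implementation under our own naming conventions (not from the paper), assuming $M > N$ and an invertible $\mathbf{L}$.

```python
import numpy as np

def gcv_and_variance(K, L, y, alpha):
    """Return (v(m), sigma_hat^2) via the SVD-based forms of Eq. (A13) and v(m).

    K: (M, N) scaled Jacobian, L: (N, N) regularization matrix,
    y: (M,) scaled data, alpha: regularization parameter; assumes M > N.
    """
    M, N = K.shape
    # SVD of the quotient matrix K L^{-1} = U Gamma V^T
    U, gamma, _ = np.linalg.svd(K @ np.linalg.inv(L), full_matrices=True)
    g2 = np.zeros(M)
    g2[:N] = gamma**2                  # convention gamma_i^2 = 0 for i > N
    w = alpha / (g2 + alpha)           # weights alpha / (gamma_i^2 + alpha)
    c = (U.T @ y)**2                   # (u_i^T y)^2, one-sample proxy for E{Y_i^2}
    sigma2_hat = np.sum(w**2 * c) / np.sum(w)
    v = np.sum(w**2 * c) / np.sum(w)**2
    return v, sigma2_hat
```

Equivalently, $v(m) = \|\mathbf{r}^{\delta}_{m \hat{\alpha}}\|^2 / [\mathrm{trace}(\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}})]^2$, which avoids the SVD when the residual and the influence matrix are already available.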
  • Maximum marginal likelihood estimation
For the choice:

$$\psi(a) = \ln a, \quad \text{so that} \quad \psi'(a) = \frac{1}{a},$$

we obtain:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \sigma_m^2)\} = -M + \sum_{i=1}^{M} \Big[\frac{E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{a_i(\sigma_m^2)} + \ln a_i(\sigma_m^2)\Big]. \tag{A20}$$

From the minimization condition (A11), we find that $\hat{\sigma}_m^2$, satisfying the equation:

$$\sum_{i=1}^{M} \frac{E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{a_i(\hat{\sigma}_m^2)} = M, \tag{A21}$$

is given by:

$$\hat{\sigma}_m^2 = \frac{1}{M} \sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\, E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}.$$

Equations (A20) and (A21) yield:

$$E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\} = \sum_{i=1}^{M} \ln a_i(\hat{\sigma}_m^2),$$

and we define the function $F(\cdot)$ by:

$$F\big(E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}\big) = \exp\Big[\frac{1}{M}\, E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}\Big].$$

Furthermore, using the result:

$$\sum_{i=1}^{M} \ln a_i(\hat{\sigma}_m^2) = M \ln\Big[\frac{1}{M} \sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\, E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}\Big] - \sum_{i=1}^{M} \ln \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}},$$

we express $F(\cdot)$ as:

$$F\big(E\{f(\bar{\mathbf{Y}}^{\delta}_{m \hat{\alpha}} \mid m, \hat{\sigma}_m^2)\}\big) = \frac{\frac{1}{M} \sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\, E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\}}{\Big[\prod_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big]^{1/M}}.$$

Considering the approximation $E\{\bar{Y}^{\delta 2}_{m \hat{\alpha} i}\} \approx (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2$, and discarding the constant factor $1/M$, which does not affect the minimization over $m$, we define the maximum likelihood function $\lambda(m)$ by:

$$\lambda(m) = \frac{\sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\, (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2}{\Big[\prod_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big]^{1/M}}.$$

Using the results:

$$\bar{\mathbf{y}}^{\delta T}_{m \hat{\alpha}} (\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}}) \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} = \sum_{i=1}^{M} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\, (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2,$$

$$\det(\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}}) = \prod_{i=1}^{N} \frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}},$$

we express the estimate for the data error variance as:

$$\hat{\sigma}^2_{\mathrm{L}m} = \frac{1}{M}\, \bar{\mathbf{y}}^{\delta T}_{m \hat{\alpha}} (\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}}) \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}},$$

and the maximum likelihood function as:

$$\lambda(m) = \frac{\bar{\mathbf{y}}^{\delta T}_{m \hat{\alpha}} (\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}}) \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}}}{\sqrt[M]{\det(\mathbf{I}_M - \hat{\mathbf{A}}_{m \hat{\alpha}})}}.$$
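The maximum likelihood function admits the same SVD-based evaluation. A minimal sketch, reusing the conventions of the GCV code above (all names are ours, not from the paper):

```python
import numpy as np

def mmle_and_variance(K, L, y, alpha):
    """Return (lambda(m), sigma_hat^2) for maximum marginal likelihood estimation."""
    M, N = K.shape
    U, gamma, _ = np.linalg.svd(K @ np.linalg.inv(L), full_matrices=True)
    g2 = np.zeros(M)
    g2[:N] = gamma**2                  # gamma_i^2 = 0 for i > N
    w = alpha / (g2 + alpha)
    c = (U.T @ y)**2                   # one-sample proxy for E{Y_i^2}
    sigma2_hat = np.mean(w * c)        # (1/M) sum_i w_i (u_i^T y)^2
    # lambda(m) = sum_i w_i (u_i^T y)^2 / (prod_i w_i)^(1/M); w_i = 1 for i > N,
    # so the product over M entries equals det(I_M - A_hat).
    lam = np.sum(w * c) / np.exp(np.mean(np.log(w)))
    return lam, sigma2_hat
```

The geometric mean in the denominator is evaluated in log space to avoid underflow of the determinant for large $M$.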

Appendix C

In this Appendix, we derive an estimate for the data error variance by analyzing the residual of the linear equation $\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} = \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}$. In terms of a singular value decomposition of the quotient matrix $\bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{L}^{-1} = \mathbf{U} \boldsymbol{\Gamma} \mathbf{V}^T$, with $\boldsymbol{\Gamma} = \mathrm{diag}(\gamma_{mi}) \in \mathbb{R}^{M \times N}$, $\gamma_{mi}^2 = 0$ for $i > N$, $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_M]$, and $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_N]$, the squared norm of the residual is:

$$\|\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} - \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}^{\delta}_{m \hat{\alpha}}\|^2 = \sum_{i=1}^{M} \Big(\frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big)^2 (\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2. \tag{A28}$$

In the data model (cf. Equation (26)) $\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} = \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p} + \bar{\boldsymbol{\delta}}_m$, we set $\bar{\mathbf{y}}_{m \hat{\alpha}} = \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}$ and rewrite this equation as:

$$\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} = \bar{\mathbf{y}}_{m \hat{\alpha}} + \bar{\boldsymbol{\delta}}_m. \tag{A29}$$

Using the relation:

$$E\{(\mathbf{u}_i^T \bar{\boldsymbol{\delta}}_m)(\mathbf{u}_j^T \bar{\boldsymbol{\delta}}_m)\} = E\Big\{\sum_{k=1}^{M} \sum_{l=1}^{M} [\bar{\boldsymbol{\delta}}_m]_k [\bar{\boldsymbol{\delta}}_m]_l [\mathbf{u}_i]_k [\mathbf{u}_j]_l\Big\} = \sigma_m^2\, \mathbf{u}_i^T \mathbf{u}_j = \sigma_m^2\, \delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta, we get:

$$E\{(\mathbf{u}_i^T \bar{\boldsymbol{\delta}}_m)^2\} = \sigma_m^2, \tag{A30}$$

so that from Equations (A29) and (A30), we obtain:

$$E\{(\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2\} = (\mathbf{u}_i^T \bar{\mathbf{y}}_{m \hat{\alpha}})^2 + \sigma_m^2. \tag{A31}$$

Since $\bar{\mathbf{y}}_{m \hat{\alpha}} = \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}$ belongs to the range of the matrix operator $\bar{\mathbf{K}}_{m \hat{\alpha}}$, which, in turn, is spanned by the vectors $\{\mathbf{u}_i\}_{i=1}^{N}$, we have $\mathbf{u}_i^T \bar{\mathbf{y}}_{m \hat{\alpha}} = 0$ for $i > N$, and further (cf. Equation (A31)):

$$E\{(\mathbf{u}_i^T \bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}})^2\} = \sigma_m^2 \quad \text{for } i > N. \tag{A32}$$

Taking the expected value of Equation (A28) and using Equations (A31) and (A32), we get:

$$E\{\|\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} - \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}^{\delta}_{m \hat{\alpha}}\|^2\} = (M - N)\, \sigma_m^2 + \sum_{i=1}^{N} \Big(\frac{\hat{\alpha}}{\gamma_{mi}^2 + \hat{\alpha}}\Big)^2 \big[(\mathbf{u}_i^T \bar{\mathbf{y}}_{m \hat{\alpha}})^2 + \sigma_m^2\big].$$

Now, if $\hat{\alpha} \ll \gamma_{mi}^2$ for all $i = 1, \ldots, N$, we may use the approximation:

$$E\{\|\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} - \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}^{\delta}_{m \hat{\alpha}}\|^2\} \approx (M - N)\, \sigma_m^2,$$

and deduce that an estimate for the data error variance is:

$$\hat{\sigma}_m^2 \approx \frac{1}{M - N}\, E\{\|\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} - \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}^{\delta}_{m \hat{\alpha}}\|^2\} \approx \frac{1}{M - N}\, \|\bar{\mathbf{y}}^{\delta}_{m \hat{\alpha}} - \bar{\mathbf{K}}_{m \hat{\alpha}} \mathbf{p}^{\delta}_{m \hat{\alpha}}\|^2 = \frac{1}{M - N}\, \|\mathbf{r}^{\delta}_{m \hat{\alpha}}\|^2.$$
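A direct implementation of this estimate is straightforward; the sketch below (names ours) assumes $M > N$ and computes the regularized solution explicitly.

```python
import numpy as np

def residual_variance(K, L, y, alpha):
    """sigma_hat^2 ~ ||y - K p||^2 / (M - N), valid when alpha << gamma_i^2."""
    M, N = K.shape
    # Regularized solution p = (K^T K + alpha L^T L)^{-1} K^T y
    G = K.T @ K + alpha * (L.T @ L)
    p = np.linalg.solve(G, K.T @ y)
    r = y - K @ p                      # residual r_{m,alpha}^delta
    return float(r @ r) / (M - N)
```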

References

  1. Levy, R.C.; Remer, L.A.; Dubovik, O. Global aerosol optical properties and application to Moderate Resolution Imaging Spectroradiometer aerosol retrieval over land. J. Geophys. Res. Atmos. 2007, 112.
  2. Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian model averaging: A tutorial. Stat. Sci. 1999, 14, 382–401.
  3. Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. Quantification of uncertainty in aerosol optical thickness retrieval arising from aerosol microphysical model and other sources, applied to Ozone Monitoring Instrument (OMI) measurements. Atmos. Meas. Tech. 2014, 7, 1185–1199.
  4. Kauppi, A.; Kolmonen, P.; Laine, M.; Tamminen, J. Aerosol-type retrieval and uncertainty quantification from OMI data. Atmos. Meas. Tech. 2017, 10, 4079–4098.
  5. Doicu, A.; Trautmann, T.; Schreier, F. Numerical Regularization for Atmospheric Inverse Problems; Springer: Berlin/Heidelberg, Germany, 2010.
  6. Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation; SIAM: Philadelphia, PA, USA, 2005.
  7. Patterson, H.; Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 1971, 58, 545–554.
  8. Smyth, G.K.; Verbyla, A.P. A conditional likelihood approach to residual maximum likelihood estimation in generalized linear models. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 565–572.
  9. Stuart, A.; Ord, J.; Arnold, S. Kendall's Advanced Theory of Statistics. Volume 2A: Classical Inference and the Linear Model; Oxford University Press: Oxford, UK, 1999.
  10. Wahba, G. Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 1977, 14, 651–667.
  11. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990.
  12. Marshak, A.; Herman, J.; Szabo, A.; Blank, K.; Carn, S.; Cede, A.; Geogdzhayev, I.; Huang, D.; Huang, L.K.; Knyazikhin, Y.; et al. Earth Observations from DSCOVR EPIC Instrument. Bull. Am. Meteorol. Soc. 2018, 99, 1829–1850.
  13. Geogdzhayev, I.V.; Marshak, A. Calibration of the DSCOVR EPIC visible and NIR channels using MODIS Terra and Aqua data and EPIC lunar observations. Atmos. Meas. Tech. 2018, 11, 359–368.
  14. Xu, X.; Wang, J.; Wang, Y.; Zeng, J.; Torres, O.; Reid, J.; Miller, S.; Martins, J.; Remer, L. Detecting layer height of smoke aerosols over vegetated land and water surfaces via oxygen absorption bands: Hourly results from EPIC/DSCOVR in deep space. Atmos. Meas. Tech. 2019, 12, 3269–3288.
  15. García, V.M.; Sasi, S.; Efremenko, D.S.; Doicu, A.; Loyola, D. Radiative transfer models for retrieval of cloud parameters from EPIC/DSCOVR measurements. J. Quant. Spectrosc. Radiat. Transf. 2018, 213, 228–240.
  16. García, V.M.; Sasi, S.; Efremenko, D.S.; Doicu, A.; Loyola, D. Linearized radiative transfer models for retrieval of cloud parameters from EPIC/DSCOVR measurements. J. Quant. Spectrosc. Radiat. Transf. 2018, 213, 241–251.
  17. Holben, B.; Eck, T.; Slutsker, I.; Tanré, D.; Buis, J.; Setzer, A.; Vermote, E.; Reagan, J.; Kaufman, Y.; Nakajima, T.; et al. AERONET—A federated instrument network and data archive for aerosol characterization. Remote Sens. Environ. 1998, 66, 1–16.
Figure 1. Relative errors ε^τ_mean, ε^τ_max, ε^H_mean, and ε^H_max versus τ_t for H_t = 3.0 km. All four aerosol models are involved in the retrieval.

Figure 2. Relative errors ε^τ_mean, ε^τ_max, ε^H_mean, and ε^H_max versus H_t for τ_t = 1.0. All four aerosol models are involved in the retrieval.

Figure 3. Relative errors ε^τ_mean, ε^τ_max, ε^H_mean, and ε^H_max versus τ_t for H_t = 3.0 km. All aerosol models except the exact one are involved in the retrieval.

Figure 4. Relative errors ε^τ_mean, ε^τ_max, ε^H_mean, and ε^H_max versus H_t for τ_t = 1.0. All aerosol models except the exact one are involved in the retrieval.

Figure 5. Upper panels: mean a posteriori density of τ computed by GCV for Test Examples 1 (left) and 2 (right). Lower panels: mean a posteriori density of H computed by GCV for Test Examples 1 (left) and 2 (right). The true values are τ_t = 1.5 and H_t = 3.0 km.

Figure 6. Relative errors ε^τ_mean and ε^H_mean versus τ_t for H_t = 3.0 km (upper panels) and versus H_t for τ_t = 1.0 (lower panels). The surface albedo is included in the retrieval, and all four aerosol models are considered.

Figure 7. Relative errors ε^τ_mean and ε^H_mean versus τ_t for H_t = 3.0 km (upper panels) and versus H_t for τ_t = 1.0 (lower panels). The surface albedo is included in the retrieval, and all aerosol models except the exact one are considered.
Table 1. Optical properties of the aerosol models. For each model, the first and second lines refer to the accumulated and coarse modes, respectively. The complex refractive index of dust corresponds to the wavelengths 470 (1), 550 (2), 660 (3), and 2100 (4) nm. In the simulations, the values of the refractive index at λ = 660 nm are assumed.

Model | Mode | r_v (μm) | σ | m = (Re(m), Im(m)) | V_0 (μm³/μm²)
1 Nonabs. | accum. | 0.160 + 0.0434τ | 0.364 + 0.1529τ | (1.42, 0.004 − 0.0015τ) | 0.1718 τ^0.821
1 Nonabs. | coarse | 3.325 + 0.1411τ | 0.759 + 0.0168τ | (1.42, 0.004 − 0.0015τ) | 0.0934 τ^0.639
2 Modabs. | accum. | 0.145 + 0.0203τ | 0.374 + 0.1365τ | (1.43, 0.008 − 0.002τ) | 0.1642 τ^0.775
2 Modabs. | coarse | 3.101 + 0.3364τ | 0.729 + 0.098τ | (1.43, 0.008 − 0.002τ) | 0.1482 τ^0.684
3 Abs. | accum. | 0.134 + 0.0096τ | 0.383 + 0.0794τ | (1.51, 0.02) | 0.1748 τ^0.891
3 Abs. | coarse | 3.448 + 0.9489τ | 0.743 + 0.0409τ | (1.51, 0.02) | 0.1043 τ^0.682
4 Dust | accum. | 0.1416 τ^−0.052 | 0.7561 τ^0.148 | (1.48 τ^−0.021, 0.0025 τ^0.132) (1); (1.48 τ^−0.021, 0.002) (2); (1.48 τ^−0.021, 0.0018 τ^−0.08) (3); (1.46 τ^0.040, 0.0018 τ^−0.30) (4) | 0.0871 τ^1.026
4 Dust | coarse | 2.2 | 0.554 τ^−0.052 | same as accumulated mode | 0.6786 τ^1.057
Table 2. Average relative errors for the results plotted in Figures 1 and 2. MMLE, maximum marginal likelihood estimation; GCV, generalized cross-validation.

Method | ε^τ_mean (τ_t scan) | ε^τ_mean (H_t scan) | ε^H_mean (τ_t scan) | ε^H_mean (H_t scan)
MMLE | 0.046 | 0.091 | 0.027 | 0.102
MLMMLE | 0.009 | 0.062 | 0.022 | 0.059
GCV | 0.009 | 0.020 | 0.001 | 0.024
MLGCV | 0.009 | 0.025 | 0.006 | 0.042
Table 3. Average relative errors for the results plotted in Figures 3 and 4.

Method | ε^τ_mean (τ_t scan) | ε^τ_mean (H_t scan) | ε^H_mean (τ_t scan) | ε^H_mean (H_t scan)
MMLE | 0.095 | 0.206 | 0.073 | 0.318
MLMMLE | 0.180 | 0.210 | 0.139 | 0.348
GCV | 0.060 | 0.111 | 0.022 | 0.096
MLGCV | 0.059 | 0.200 | 0.076 | 0.259
Table 4. Average relative errors for the results plotted in Figure 6.

Method | ε^τ_mean (τ_t scan) | ε^τ_mean (H_t scan) | ε^H_mean (τ_t scan) | ε^H_mean (H_t scan)
MMLE | 0.042 | 0.046 | 0.018 | 0.042
MLMMLE | 0.051 | 0.060 | 0.011 | 0.027
GCV | 0.101 | 0.213 | 0.076 | 0.237
MLGCV | 0.119 | 0.311 | 0.100 | 0.398
Table 5. Average relative errors for the results plotted in Figure 7.

Method | ε^τ_mean (τ_t scan) | ε^τ_mean (H_t scan) | ε^H_mean (τ_t scan) | ε^H_mean (H_t scan)
MMLE | 0.101 | 0.113 | 0.070 | 0.204
MLMMLE | 0.117 | 0.203 | 0.057 | 0.291
GCV | 0.136 | 0.269 | 0.109 | 0.274
MLGCV | 0.116 | 0.307 | 0.147 | 0.346