Marcos V. Conde, Steven McDonagh, Matteo Maggioni, Aleš Leonardis, Eduardo Pérez-Pellitero
Huawei Noah’s Ark Lab
considered important when training for specific intermediate tasks within the ISP (e.g. colour constancy). Additionally, as there is no model regularization, training CycleISP remains a complex procedure requiring large amounts of RAW data.

Contemporary to our presented work, InvISP (Xing, Qian, and Chen 2021) models the camera ISP as an invertible ISP using a single invertible neural network (Kingma and Dhariwal 2018; Ho et al. 2019) to perform both the RAW-to-RGB and RGB-to-RAW mappings. This normalizing-flow-based approach has the advantage of being invertible and learnable; however, like CycleISP, it lacks interpretability and control, requires large amounts of training data, and is constrained by the invertible blocks (i.e. input and output sizes must be identical).

In this paper we introduce a novel hybrid approach that tackles the aforementioned limitations of ISP modelling and retains the best of both model-based and end-to-end learnable approaches (Shlezinger et al. 2021). We propose a modular, parametric, model-driven approach with a novel parameter dictionary learning strategy that builds on Brooks et al. (2019). We further improve this flexible, interpretable and constrained ISP architecture with additional lens shading modelling and a more flexible parametric tone mapping. To address the lack of in-camera parameters, discussed previously, we design an end-to-end learnable dictionary representation of inner camera parameters. This provides a set of parameter bases for optimal end-to-end reconstruction, and enables unlimited data augmentation in the RAW image manifold. Our proposed method is modular, interpretable and governed by well-understood camera parameters. It provides a framework to learn an end-to-end ISP and its related parameters from data. Moreover, it can be learnt successfully from very few samples, even when these are corrupted by noise. Note that we focus on the RAW reconstruction task and its downstream applications (e.g. denoising, HDR imaging). The forward pass, or RAW-to-RGB processing, although related, is a different research problem (Ignatov, Van Gool, and Timofte 2020; Schwartz, Giryes, and Bronstein 2019; Liang et al. 2021).

Our main contributions can be summarized as follows: (1) a modular and differentiable ISP model, composed of canonical camera operations and governed by interpretable parameters; (2) a training mechanism that, in conjunction with our model contribution, is capable of end-to-end learning of rich parameter representations, i.e. dictionaries or bases and related linear decomposition decoders, that result in compact ISP models, free from direct parameter supervision; (3) an extensive experimental investigation: our learned RGB-to-RAW mappings are used to enable data augmentation towards downstream task performance improvement, in multiple data regimes of varying size and noise.

2 Image Signal Processor

The group of operations necessary to convert the camera sensor readings into natural-looking RGB images is generally referred to as the Image Signal Processor (ISP). There is great variability in ISP designs, with varying levels of complexity and functionality; however, the majority of them contain at least a number of operations that are generally considered to be a canonical representation of a basic ISP, namely white balance, color correction, gamma expansion and tone mapping (Brown 2016; Heide et al. 2014; Delbracio et al. 2021). Brooks et al. (2019) introduce a modular, differentiable ISP model where each module is an invertible function that approximates one of the aforementioned canonical operations. In this section we review that work, and introduce notation and parameter details for each operation, as well as the complete function composition.

Let us initially define two image spaces: the RAW image domain Y and the sRGB image domain X. The transformation done by the camera ISP can thus be defined as f : Y → X. Intuitively, we can define a modular ISP function f as a composite function as follows:

x = (f_n ∘ ··· ∘ f_2 ∘ f_1)(y, p_n, …, p_2, p_1),  (1)

where f_i is a function with related parameters p_i, for a composition of arbitrary length n. In order to recover a RAW image y from the respective sRGB observation x (i.e. a mapping from X → Y), we can choose the f_i to be invertible and tractable bijective functions:

y = (f_1^{-1} ∘ f_2^{-1} ∘ ··· ∘ f_n^{-1})(x, p_1, p_2, …, p_n).  (2)
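To make the composition in Eqs. (1)-(2) concrete, the following is a minimal Python sketch, assuming each stage is represented as a (forward, inverse) pair of callables with its parameters p_i bound via closures. The function names and the two example stages (digital gain and gamma) are illustrative, not the paper's released implementation.

```python
# Illustrative sketch of Eqs. (1)-(2): the ISP as a composition of
# invertible stages. Names and signatures are our own, not the paper's code.
from typing import Callable, Sequence, Tuple

import numpy as np

# Each stage i is a pair (f_i, f_i^{-1}); parameters p_i are bound via closures.
Stage = Tuple[Callable[[np.ndarray], np.ndarray],
              Callable[[np.ndarray], np.ndarray]]

def forward_isp(y: np.ndarray, stages: Sequence[Stage]) -> np.ndarray:
    """x = (f_n o ... o f_1)(y): RAW -> sRGB, applying f_1 first, f_n last."""
    x = y
    for f, _ in stages:
        x = f(x)
    return x

def reverse_isp(x: np.ndarray, stages: Sequence[Stage]) -> np.ndarray:
    """y = (f_1^{-1} o ... o f_n^{-1})(x): sRGB -> RAW, inverting f_n first."""
    y = x
    for _, f_inv in reversed(stages):
        y = f_inv(y)
    return y

# Example: a two-stage pipeline with a digital gain (Sec. 2.3) and a gamma
# curve (Sec. 2.5); both are exactly invertible away from the clamp.
gain = 1.8
stages = [
    (lambda img: img * gain, lambda img: img / gain),
    (lambda img: np.maximum(img, 1e-8) ** (1 / 2.2),
     lambda img: np.maximum(img, 1e-8) ** 2.2),
]
y = (0.1 + 0.8 * np.random.rand(8, 8, 3)) / gain
assert np.allclose(reverse_isp(forward_isp(y, stages), stages), y)
```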
2.1 Colour Filter Array Mosaic

Camera sensors use a Colour Filter Array (CFA) in order to capture wavelength-specific colour information. The sensor is covered with colour filters arranged in a given spatial pattern, e.g. the well-known Bayer pattern, a 2 × 2 distribution of R-G-G-B colours, which effectively produces a single colour measurement per spatial position. In order to obtain the missing colour samples at each spatial position, so-called demosaicing methods aim to recover the missing pixels. This is commonly an ill-posed problem which, for the sake of simplicity, we will address as a simple bilinear interpolation: f_6(y) = bic(y). The inverse of this function is, however, a straightforward mosaicing operation. It can be defined as:

f_6^{-1}(x_5, k_m) = x_5 ∗ k_m,  (3)

where ∗ denotes a convolution with a kernel k_m containing the mosaicing pattern, generally strictly formed by {0, 1} entries.
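Since the mosaicing kernel k_m is binary, the convolution in Eq. (3) reduces to masking with a binary CFA pattern. Below is an illustrative sketch of that equivalent formulation for an RGGB Bayer layout; the helper names are ours.

```python
# A sketch of the mosaicing operator f_6^{-1} in Eq. (3): instead of an
# explicit convolution with a {0,1} kernel k_m, build the equivalent binary
# Bayer (RGGB) mask and apply it per channel. Illustrative only.
import numpy as np

def bayer_mask(h: int, w: int) -> np.ndarray:
    """Binary RGGB mask of shape (h, w, 3): one active colour per pixel."""
    m = np.zeros((h, w, 3))
    m[0::2, 0::2, 0] = 1.0  # R at even rows, even cols
    m[0::2, 1::2, 1] = 1.0  # G at even rows, odd cols
    m[1::2, 0::2, 1] = 1.0  # G at odd rows, even cols
    m[1::2, 1::2, 2] = 1.0  # B at odd rows, odd cols
    return m

def mosaic(x: np.ndarray) -> np.ndarray:
    """Collapse an (h, w, 3) RGB image onto a single-channel CFA plane."""
    h, w, _ = x.shape
    return (x * bayer_mask(h, w)).sum(axis=-1)

rgb = np.random.rand(4, 4, 3)
raw = mosaic(rgb)  # (4, 4) sensor plane
assert raw[0, 0] == rgb[0, 0, 0] and raw[1, 1] == rgb[1, 1, 2]
```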
2.2 Lens Shading Effect

The Lens Shading Effect (LSE) is the phenomenon of a reduced amount of light being captured by the photoreceptors when moving from the centre of the optical axis towards the borders of the sensor, mostly caused by obstruction from elements of the lens assembly. We can define this function as:

f_5(x_5, M) = x_5 ⊙ M,  (4)

where M is a mask containing per-pixel lens-shading gains. This operation can be inverted by inverting each per-pixel gain.
2.3 White Balance and Digital Gain

The White Balance (WB) stage aims to neutralize the scene light source colour such that, after correction, its appearance matches that of an achromatic light source. In practice, this is achieved by a global per-channel gain for two of the colours,
i.e. red and blue gains, namely g_r and g_b respectively, which we arrange in a three-colour vector g_wb = [g_r, 1, g_b]. The scene illuminant is generally estimated heuristically, although more sophisticated approaches have also been explored (Gijsenij, Gevers, and Lucassen 2009; Barron and Tsai 2017; Hernandez-Juarez et al. 2020).

WB is normally applied in conjunction with a scalar digital gain g_d, which is applied globally to all three channels and scales the image intensities as desired. This process can be conveniently described as a convolution:

f_4(x_4, g_wb, g_d) = x_4 ∗ (g_d · g_wb).  (5)

To obtain the inverse function f_4^{-1}, we simply invert each of the gains individually; however, instead of using the naive division 1/g, we follow the highlight-preserving cubic transformation of Brooks et al. (2019).
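As a rough sketch of Eq. (5) and its inversion: the forward direction is a per-channel multiplication, while the reverse direction blends toward unit gain near saturated pixels, in the spirit of the highlight-preserving inversion of Brooks et al. (2019). The inflection point and the quadratic weighting below are illustrative choices, not their exact transformation.

```python
# Sketch of Eq. (5) and a highlight-aware inverse. The naive inverse 1/g
# dims saturated highlights, so gains are blended toward 1 near white.
# Constants here are illustrative, not Brooks et al.'s exact formula.
import numpy as np

def apply_gains(x: np.ndarray, gd: float, gr: float, gb: float) -> np.ndarray:
    """f_4: scale channels by the digital gain and WB gains [gr, 1, gb]."""
    return x * (gd * np.array([gr, 1.0, gb]))

def safe_invert_gains(x: np.ndarray, gd: float, gr: float, gb: float,
                      inflection: float = 0.9) -> np.ndarray:
    inv = 1.0 / (gd * np.array([gr, 1.0, gb]))
    gray = x.mean(axis=-1, keepdims=True)
    # Weight grows from 0 to 1 as pixels approach saturation.
    w = (np.clip(gray - inflection, 0.0, None) / (1.0 - inflection)) ** 2
    safe_inv = np.maximum(w + (1.0 - w) * inv, inv)
    return x * safe_inv

x = np.random.rand(6, 6, 3) * 0.4
rt = safe_invert_gains(apply_gains(x, 1.2, 2.0, 1.5), 1.2, 2.0, 1.5)
assert np.allclose(rt, x)  # exact for pixels far from saturation
```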
2.4 Color Correction

The ISP converts the sensor color space into the output color space. This step is often necessary, as the CFA colour spectral sensitivity does not necessarily match the output color standard, e.g. sRGB (Brown 2016; Afifi et al. 2021). A global change in the color space can be achieved with a 3 × 3 Color Correction Matrix (CCM):

f_3(x_3, C_m) = X_3 C_m,  (6)

where C_m denotes the CCM parameters and X_3 denotes x_3 reshaped for convenience as a matrix, i.e. X_3 ∈ R^{hw×3}. Similarly to f_3, we can obtain f_3^{-1} by using the pseudo-inverse of C_m.
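Eq. (6) and its inverse amount to one 3 × 3 matrix product per pixel; a small sketch follows. The CCM values are made up (with columns summing to one so that white is preserved under this orientation), and np.linalg.pinv realizes the pseudo-inverse used in the reverse pass.

```python
# Sketch of Eq. (6): colour correction as a right-multiplication of the
# hw x 3 pixel matrix by the CCM, with the reverse pass using the
# (pseudo-)inverse of C_m. The example CCM is made up, not device data.
import numpy as np

def apply_ccm(x: np.ndarray, ccm: np.ndarray) -> np.ndarray:
    """f_3: reshape (h, w, 3) -> (hw, 3), multiply by the 3x3 CCM, reshape back."""
    h, w, _ = x.shape
    return (x.reshape(-1, 3) @ ccm).reshape(h, w, 3)

# Illustrative CCM; each column sums to one, so white [1,1,1] is preserved.
ccm = np.array([[ 1.06, -0.15,  0.02],
                [-0.05,  1.25, -0.30],
                [-0.01, -0.10,  1.28]])
x = np.random.rand(8, 8, 3)
x_srgb = apply_ccm(x, ccm)
x_back = apply_ccm(x_srgb, np.linalg.pinv(ccm))  # f_3^{-1}
assert np.allclose(x_back, x)
```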
2.5 Gamma Correction

The camera sensor readings are linearly proportional to the light received; however, the human visual system does not perceive light linearly, but rather is more sensitive in darker regions. Thus, it is common practice to adapt the linear sensor readings with a gamma function:

f_2(x_2, γ) = max(x_2, ε)^{1/γ},  (7)

where γ is a parameter regulating the amount of compression/expansion, generally with values around γ = 2.2. The inverse function can be defined as follows:

f_2^{-1}(x_1, γ) = max(x_1, ε)^{γ}.  (8)
2.6 Tone Mapping

Tone Mapping Operators (TMOs) are generally used to adapt images to their final display device, the most common case being a TMO applied to High Dynamic Range images (HDRI) for typical Low Dynamic Range display devices. As opposed to using an S-shaped polynomial function as proposed by (Reinhard et al. 2002; Brooks et al. 2019), we instead use a parametric piece-wise linear function that we model as a shallow convolutional neural network (Punnappurath and Brown 2020) composed only of 1 × 1 kernels and ReLU activations:

f_1(x_1, θ_f) = φ_t(x_1, θ_f),  (9)

where φ_t is a shallow CNN with learnable parameters θ_f for the forward pass. A different set of weights θ_r can be optimized for the reverse pass.
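A possible realization of the tone-mapping operator φ_t in Eq. (9) is sketched below; the 32-channel width follows the "Conv + ReLU (32)" and "Conv (3)" labels in Figure 1, but the exact architecture is specified in the supplementary material, so this should be read as illustrative.

```python
# Sketch of the parametric tone-mapping operator of Eq. (9): a shallow CNN
# with only 1x1 convolutions and ReLUs, i.e. a learnable point-wise curve.
# Layer widths are illustrative (taken from the Figure 1 legend).
import torch
import torch.nn as nn

class ToneMapCNN(nn.Module):
    def __init__(self, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, width, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 3, H, W)
        return self.net(x)

# Separate weights theta_f and theta_r for the forward and reverse passes.
tmo_forward, tmo_reverse = ToneMapCNN(), ToneMapCNN()
x = torch.rand(1, 3, 64, 64)
print(tmo_forward(x).shape)  # torch.Size([1, 3, 64, 64])
```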
3 Learning Parameter Representations

In the previous section we introduced a modular and parametric ISP model; however, that model alone does not allow for end-to-end training. In this section we introduce a strategy to enable end-to-end training of the presented ISP model. To the best of our knowledge, our method is the first data-driven, model-based approach to tackle the reverse ISP problem.

In Figure 1 we show an overview of our proposed approach, formed by a 6-stage ISP model (in blue) and separate networks that learn parameter dictionaries and feed parameters to the model (in green).

3.1 Parameter Dictionaries

Color Correction. Modern smartphone cameras typically use different CCMs depending on specific light conditions and capture modes, so any method that assumes a single CCM might struggle to cope with colour variability. Additionally, an ISP model might be trained to reconstruct RAW images captured with different cameras, and thus with different ISPs and CCMs. As previously discussed, these matrices are generally not accessible to the end user. In order to learn the color space transformation done by the ISP, we create a dictionary D_ccm ∈ R^{N×3×3} of size N, where each atom is a CCM. To preserve the significance and physical meaning of these matrices, and to avoid learning non-realistic parameters, we constrain the learnt atoms in the dictionary by ℓ1-normalizing the columns of each matrix, as this is one of the most representative properties of realistic CCMs (Brooks et al. 2019; Koskinen, Yang, and Kämäräinen 2019). We perform the color correction as a convolution operation, where the convolutional kernels are the atoms of D_ccm and the input is the intermediate representation from the previous function in the ISP model. As the result of this operation we obtain I_ccm ∈ R^{N×H×W×3}, which represents N RGB images, each one the result of applying one atom to the input image. This representation I_ccm passes through a CNN encoder E_ccm that produces a vector of weights w_ccm ∈ R^N. The resultant color-transformed sRGB image is obtained as a linear combination of I_ccm and w_ccm, which is equivalent to linearly combining the atoms in the dictionary and applying the resultant CCM to the image. As illustrated in Figure 2, the model simultaneously learns D_ccm and E_ccm. This novel dictionary representation of the camera parameters allows learning the CCMs of various cameras at once. Note that the encoder E_ccm^r used during the reverse pass is different from the E_ccm^f used in the forward pass, as shown in Figure 1; however, both encoders have the same functionality.
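The following sketch illustrates this dictionary mechanism, assuming a PyTorch implementation: the value of N and the encoder architecture are ours, and the softmax normalization of the mixing weights is one possible choice of linear-decomposition decoder, not something the paper commits to.

```python
# Sketch of the CCM dictionary of Sec. 3.1: N learnable 3x3 atoms with
# l1-normalized columns; every atom is applied to the input, a small encoder
# predicts mixing weights w_ccm, and the output is their weighted combination.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CCMDictionary(nn.Module):
    def __init__(self, n_atoms: int = 8):
        super().__init__()
        # D_ccm in R^{N x 3 x 3}, initialized near the identity.
        self.atoms = nn.Parameter(torch.eye(3).repeat(n_atoms, 1, 1)
                                  + 0.01 * torch.randn(n_atoms, 3, 3))
        self.encoder = nn.Sequential(  # E_ccm: (B, N*3, H, W) -> (B, N)
            nn.Conv2d(n_atoms * 3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_atoms),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 3, H, W)
        b, _, h, w = x.shape
        # Constrain the atoms: l1-normalize each column of each CCM.
        d = self.atoms / self.atoms.abs().sum(dim=1, keepdim=True)
        # I_ccm: apply every atom to the image -> (B, N, 3, H, W).
        i_ccm = torch.einsum('nij,bjhw->bnihw', d, x)
        w_ccm = self.encoder(i_ccm.reshape(b, -1, h, w))  # (B, N)
        w_ccm = F.softmax(w_ccm, dim=1)  # convex combination (a design choice)
        return (w_ccm[:, :, None, None, None] * i_ccm).sum(dim=1)  # (B, 3, H, W)

out = CCMDictionary()(torch.rand(2, 3, 16, 16))
print(out.shape)  # torch.Size([2, 3, 16, 16])
```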
Digital Gain and White Balance. Similarly to the CCM dictionary, we define D_wb ∈ R^{N×3} as a dictionary of N white balance and digital gains; each atom is thus a triplet of scalars (g_d, g_r, g_b). We apply each atom g from the dictionary as described by Brooks et al. and obtain I_wb ∈ R^{N×H×W×3}, which represents a linear decomposition of the results of applying each g_i to the input image. An encoder E_wb produces a set of weights w_wb ∈ R^N from this representation. Note that this encoder is different from the E_ccm used in the color correction step. The encoder and dictionary are learned jointly in the optimization. The linear combination of I_wb and w_wb then yields the gain-adjusted image, analogously to the color correction case.
Figure 1: A visualization of our proposed model, using as backbone (blue) the classical ISP operations described in Section 2, and the additional learning components (green) described in Section 3. RAW images are visualized through bilinear demosaicing.
3.3 Lens Shading Modelling

Due to the sensor optics, the amount of light hitting the sensor falls off radially towards the edges, producing a vignetting effect known as lens shading. A typical early ISP stage constitutes Lens Shading Correction (LSC) (Young 2000), which is used to correct the effects of uneven light hitting the sensor, towards providing a uniform light response. This is done by applying a mask, typically pre-calibrated by the manufacturer, that compensates for the non-uniform light fall-off (Delbracio et al. 2021). Modelling the ISP therefore requires a method to add or correct the Lens Shading Effect (LSE) by modelling such a mask. We propose to model this mask as a pixel-wise gain map, in two variants (a sketch of the Gaussian variant is given below):

1. A Gaussian mask G_mask(x, y) ∼ N_2(µ, Σ), fitted from filtered sensor readings, which assigns more or less intensity depending on the pixel position (x, y). Its two parameters µ and Σ are further optimized together with the end-to-end ISP model.

2. An attention-guided mask A_mask produced by a CNN attention block, as illustrated in Figure 1. These shallow blocks have constrained capacity to ensure that the Lens Shading block only corrects per-pixel gains; thus, we maintain the interpretability of the entire pipeline.

Both masks live in the space R^{H×W}. During the reverse pass, we apply the masks to the image using an element-wise multiplication (per-pixel gain), recreating the sensor's lens shading effect. To reverse this transformation, i.e. to correct the LSE, we apply the LSC mask: (i) the inverse of G_mask (element-wise division), and (ii) A_mask^{-1} estimated by the attention block in the forward pass.
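The Gaussian variant can be sketched as follows: a 2-D Gaussian gain map whose parameters µ and Σ would be optimized jointly with the model; the values below are illustrative.

```python
# Sketch of the Gaussian lens-shading mask G_mask(x, y) ~ N_2(mu, Sigma):
# an unnormalized 2-D Gaussian per-pixel gain map, peaking on the optical
# axis and decaying radially. mu and Sigma values here are illustrative.
import numpy as np

def gaussian_mask(h: int, w: int, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Per-pixel gain map in R^{h x w} from an unnormalized 2-D Gaussian."""
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.stack([xs, ys], axis=-1) - mu            # (h, w, 2) offsets from mu
    inv = np.linalg.inv(sigma)
    quad = np.einsum('hwi,ij,hwj->hw', p, inv, p)   # squared Mahalanobis distance
    return np.exp(-0.5 * quad)                      # peak gain 1 at mu

h, w = 64, 64
mask = gaussian_mask(h, w, mu=np.array([w / 2, h / 2]),
                     sigma=np.diag([w ** 2, h ** 2]).astype(float))
raw = np.random.rand(h, w)
shaded = raw * mask        # reverse pass: recreate the LSE (Eq. 4)
restored = shaded / mask   # forward pass: lens shading correction
assert np.allclose(restored, raw)
```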
3.4 Training

The complete pipeline is end-to-end trainable, and we can use a simple ℓ2 distance between the training RAW image y and the estimated RAW image ŷ. To ensure the complete pipeline is invertible, we add ℓ2 loss terms for each intermediate image, and also a consistency loss between the decomposition vectors w of the forward and reverse encoders. For more details we refer the reader to the supplementary material, where we also provide other relevant information about the training process, e.g. GPU devices, batch sizes and network architectures.
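A sketch of this objective, assuming PyTorch tensors, is given below; the loss weights are illustrative, as the exact values and schedules are given in the supplementary material.

```python
# Sketch of the training objective of Sec. 3.4: an l2 loss on the
# reconstructed RAW, l2 terms on intermediate images, and a consistency
# loss between forward and reverse decomposition vectors w.
# The lambda weights below are illustrative, not the paper's values.
import torch
import torch.nn.functional as F

def isp_loss(raw_pred, raw_gt, inter_pred, inter_gt, w_fwd, w_rev,
             lam_inter=0.1, lam_w=0.01):
    loss = F.mse_loss(raw_pred, raw_gt)              # ||y_hat - y||^2
    for p, g in zip(inter_pred, inter_gt):           # per-stage l2 terms
        loss = loss + lam_inter * F.mse_loss(p, g)
    loss = loss + lam_w * F.mse_loss(w_rev, w_fwd)   # encoder consistency
    return loss

raw_pred = torch.rand(1, 3, 8, 8, requires_grad=True)
loss = isp_loss(raw_pred, torch.rand(1, 3, 8, 8),
                [torch.rand(1, 3, 8, 8)], [torch.rand(1, 3, 8, 8)],
                torch.rand(1, 8), torch.rand(1, 8))
loss.backward()  # the objective is differentiable end-to-end
```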
4 Experimental Results

Throughout this section we provide evidence that our method can effectively learn the RGB-to-RAW mapping of real, unknown camera ISPs, obtaining state-of-the-art RAW reconstruction performance, and we validate the robustness of our model when operating under noisy data and data frugality (i.e. few-shot learning set-ups). Additionally, we conduct experiments on a downstream task, RAW image denoising, in order to validate ISP modelling beyond RAW image reconstruction, as well as the effectiveness of our proposed data augmentations. In all our experiments, we use the reverse pass of our model (Figure 1). During the denoising experiments, we use our ISP model as an on-line domain adaptation from RGB to RAW, guided by the proposed dictionary augmentations (see Section 3.1). We use PSNR as a metric for quantitative evaluation, defining two variants: PSNRr for RAW reconstruction and PSNRd for denoising.

4.1 Datasets

SIDD (Abdelhamed, Lin, and Brown 2018; Abdelhamed, Timofte, and Brown 2019). Due to their small apertures and sensors, high-resolution smartphone images have notably more noise than those from DSLRs. This dataset provides real noisy images with their ground truth, in both raw sensor data (raw-RGB) and sRGB color spaces. The images are captured using five different smartphone cameras under different lighting conditions, poses and ISO levels. There are 320 ultra-high-resolution image pairs available for training (e.g. 5328×3000). The validation set consists of 1280 image pairs.

MIT-Adobe FiveK (Bychkovsky et al. 2011). We use the train-test sets proposed by InvISP (Xing, Qian, and Chen 2021) for the Canon EOS 5D and the Nikon D700, and the same processing using the LibRaw library to render ground-truth sRGB images from the RAW images.

4.2 RAW Image Reconstruction

We compare our RAW image reconstruction against other state-of-the-art methods, namely: UPI (Brooks et al. 2019), a modular, invertible and differentiable ISP model, which requires parameter tuning to fit the distribution of the SIDD dataset; CycleISP (Zamir et al. 2020), a data-driven approach for modelling camera ISP pipelines in the forward and reverse directions, for which we generate synthetic RAW images using their publicly available pre-trained model, fine-tuned on the SIDD dataset; and U-Net (Ronneberger, Fischer, and Brox 2015), a popular architecture that has previously been utilized to learn ISP models (Ignatov, Van Gool, and Timofte 2020), serving as a naive baseline trained end-to-end without any other model assumptions or regularization.

Method   | PSNRr | W 25% | B 25% | PSNRd | Par. (M)
UPI      | 36.84 | 14.87 | 57.10 | 49.30 | 0.0
CycleISP | 37.62 | 15.90 | 51.65 | 49.77 | 3.1
U-Net    | 39.84 | 20.27 | 49.61 | 49.69 | 11.7
Ours     | 45.21 | 21.58 | 66.33 | 50.02 | 0.6

Table 1: Quantitative RAW reconstruction results on SIDD. The reconstruction PSNRr (dB) and the Worst (W) and Best (B) 25% percentiles are shown for each baseline method. We also show quantitative RAW denoising results in terms of PSNRd to measure the impact of the synthetic data, and include the number of parameters (Par.) for each model, in millions.

In Table 1 we show reconstruction results in terms of PSNRr on the SIDD validation set. Our model performs better than CycleISP despite being ∼5× smaller, achieving a +7.6 dB improvement, and better than U-Net despite being ∼20× smaller. We also outperform hand-crafted methods such as UPI, by +8.37 dB, which demonstrates our capacity for learning camera parameters. In Figure 4 we show a qualitative comparison of RAW reconstruction methods. Additionally, we aim to prove that our pipeline is invertible by performing the cycle mapping (sRGB to RAW and back to sRGB); our model
[Page partially lost in extraction: the remainder of Sec. 4.2 and Table 2 (Method / Nikon PSNRr / Canon PSNRr; only the UPI row, 29.30 dB on Nikon, is recoverable) are missing, along with the accompanying RGB/Luma figure.]
Figure 5: Qualitative RAW denoising samples (Noisy, CycleISP, Ours, Clean). Our model removes noise while keeping textures and details. More comparisons can be found in the supplementary material.

Method                             | PSNR  | SSIM
Noisy                              | 37.18 | 0.850
EPLL (Zoran and Weiss 2011)        | 40.73 | 0.935
GLIDE (Talebi and Milanfar 2014)   | 41.87 | 0.949
TNRD (Chen and Pock 2017)          | 42.77 | 0.945
FoE (Roth and Black 2005)          | 43.13 | 0.969
MLP (Burger et al. 2012)           | 43.17 | 0.965
KSVD (Aharon et al. 2006)          | 43.26 | 0.969
DnCNN (Zhang et al. 2017)          | 43.30 | 0.965
NLM (Buades, Coll, and Morel 2005) | 44.06 | 0.971
WNNM (Gu et al. 2014)              | 44.85 | 0.975
BM3D (Dabov et al. 2007)           | 45.52 | 0.980
Ours-u                             | 49.90 | 0.982
DHDN (Park, Yu, and Jeong 2019)    | 52.02 | 0.988
Ours-f                             | 52.05 | 0.986
CycleISP (Zamir et al. 2020)       | 52.38 | 0.990
Ours                               | 52.48 | 0.990

Table 3: RAW denoising results on the SIDD dataset. Few-shot and unsupervised variants of our method are denoted as "Ours-f" and "Ours-u" respectively.
References

Abdelhamed, A.; Afifi, M.; Timofte, R.; and Brown, M. S. 2020. NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

Abdelhamed, A.; Lin, S.; and Brown, M. S. 2018. A High-Quality Denoising Dataset for Smartphone Cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Abdelhamed, A.; Timofte, R.; and Brown, M. S. 2019. NTIRE 2019 Challenge on Real Image Denoising: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

Afifi, M.; Abdelhamed, A.; Abuolaim, A.; Punnappurath, A.; and Brown, M. S. 2021. CIE XYZ Net: Unprocessing Images for Low-Level Computer Vision Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

Aharon, M.; Elad, M.; and Bruckstein, A. 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11): 4311–4322.

Barron, J. T.; and Tsai, Y. 2017. Fast Fourier Color Constancy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Brooks, T.; Mildenhall, B.; Xue, T.; Chen, J.; Sharlet, D.; and Barron, J. T. 2019. Unprocessing Images for Learned Raw Denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Brown, M. S. 2016. Understanding the In-Camera Image Processing Pipeline for Computer Vision. In IEEE Computer Vision and Pattern Recognition - Tutorial.

Buades, A.; Coll, B.; and Morel, J.-M. 2005. A non-local algorithm for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Buckler, M.; Jayasuriya, S.; and Sampson, A. 2017. Reconfiguring the Imaging Pipeline for Computer Vision. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).

Burger, H. C.; Schuler, C. J.; and Harmeling, S. 2012. Image denoising: Can plain neural networks compete with BM3D? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2392–2399.

Bychkovsky, V.; Paris, S.; Chan, E.; and Durand, F. 2011. Learning photographic global tonal adjustment with a database of input/output image pairs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Chen, Y.; and Pock, T. 2017. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1256–1272.

Dabov, K.; Foi, A.; Katkovnik, V.; and Egiazarian, K. 2007. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing, 16(8): 2080–2095.

Delbracio, M.; Kelly, D.; Brown, M. S.; and Milanfar, P. 2021. Mobile Computational Photography: A Tour. arXiv:2102.09000.

Gharbi, M.; Chaurasia, G.; Paris, S.; and Durand, F. 2016. Deep Joint Demosaicking and Denoising. ACM Transactions on Graphics, 35(6).

Gijsenij, A.; Gevers, T.; and Lucassen, M. P. 2009. Perceptual analysis of distance measures for color constancy algorithms. Journal of the Optical Society of America A, 26(10).

Gu, S.; Zhang, L.; Zuo, W.; and Feng, X. 2014. Weighted Nuclear Norm Minimization with Application to Image Denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Heide, F.; Steinberger, M.; Tsai, Y.-T.; Rouf, M.; Pajak, D.; Reddy, D.; Gallo, O.; Liu, J.; Heidrich, W.; Egiazarian, K.; Kautz, J.; and Pulli, K. 2014. FlexISP: A Flexible Camera Image Processing Framework. ACM Transactions on Graphics, 33(6).

Hernandez-Juarez, D.; Parisot, S.; Busam, B.; Leonardis, A.; Slabaugh, G.; and McDonagh, S. 2020. A multi-hypothesis approach to color constancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2270–2280.

Ho, J.; Chen, X.; Srinivas, A.; Duan, Y.; and Abbeel, P. 2019. Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design. In Proceedings of the International Conference on Machine Learning.

Huiskes, M. J.; Thomee, B.; and Lew, M. S. 2010. New Trends and Ideas in Visual Concept Detection: The MIR Flickr Retrieval Evaluation Initiative. In Proceedings of the International Conference on Multimedia Information Retrieval.

Ignatov, A.; Malivenko, G.; Plowman, D.; Shukla, S.; and Timofte, R. 2021. Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

Ignatov, A.; Van Gool, L.; and Timofte, R. 2020. Replacing Mobile Camera ISP With a Single Deep Learning Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

Imamura, R.; Itasaka, T.; and Okuda, M. 2019. Zero-Shot Hyperspectral Image Denoising With Separable Image Prior. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.

Kingma, D. P.; and Dhariwal, P. 2018. Glow: Generative Flow with Invertible 1x1 Convolutions. In Advances in Neural Information Processing Systems, volume 31.

Koskinen, S.; Yang, D.; and Kämäräinen, J. 2019. Reverse Imaging Pipeline for Raw RGB Image Augmentation. In Proceedings of the IEEE International Conference on Image Processing (ICIP).

Liang, Z.; Cai, J.; Cao, Z.; and Zhang, L. 2021. CameraNet: A Two-Stage Framework for Effective Camera ISP Learning. IEEE Transactions on Image Processing, 30: 2248–2262.

Liu, Y.-L.; Lai, W.-S.; Chen, Y.-S.; Kao, Y.-L.; Yang, M.-H.; Chuang, Y.-Y.; and Huang, J.-B. 2020. Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Mantiuk, R.; Mantiuk, R.; Tomaszewska, A.; and Heidrich, W. 2009. Color correction for tone mapping. In Computer Graphics Forum, volume 28.

Park, B.; Yu, S.; and Jeong, J. 2019. Densely Connected Hierarchical Network for Image Denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

Punnappurath, A.; and Brown, M. S. 2020. Learning Raw Image Reconstruction-Aware Deep Image Compressors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4): 1013–1019.

Qian, G.; Gu, J.; Ren, J. S.; Dong, C.; Zhao, F.; and Lin, J. 2019. Trinity of Pixel Enhancement: a Joint Solution for Demosaicking, Denoising and Super-Resolution. arXiv preprint arXiv:1905.02538.

Reinhard, E.; Stark, M.; Shirley, P.; and Ferwerda, J. 2002. Photographic Tone Reproduction for Digital Images. ACM Transactions on Graphics, 21(3): 267–276.

Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI).

Roth, S.; and Black, M. J. 2005. Fields of Experts: a framework for learning image priors. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, 860–867.

Schwartz, E.; Giryes, R.; and Bronstein, A. M. 2019. DeepISP: Toward Learning an End-to-End Image Processing Pipeline. IEEE Transactions on Image Processing, 28(2): 912–923.

Shlezinger, N.; Whang, J.; Eldar, Y. C.; and Dimakis, A. G. 2021. Model-Based Deep Learning: Key Approaches and Design Guidelines. In Proceedings of the IEEE Data Science and Learning Workshop (DSLW).

Talebi, H.; and Milanfar, P. 2014. Global Image Denoising. IEEE Transactions on Image Processing, 23(2): 755–768.

Wronski, B.; Garcia-Dorado, I.; Ernst, M.; Kelly, D.; Krainin, M.; Liang, C.-K.; Levoy, M.; and Milanfar, P. 2019. Handheld Multi-Frame Super-Resolution. ACM Transactions on Graphics, 38(4).

Xia, M.; Liu, X.; and Wong, T.-T. 2018. Invertible Grayscale. ACM Transactions on Graphics, 37(6).

Xing, Y.; Qian, Z.; and Chen, Q. 2021. Invertible Image Signal Processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Young, I. T. 2000. Shading correction: compensation for illumination and sensor inhomogeneities. Current Protocols in Cytometry, 14(1).

Zamir, S. W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F. S.; Yang, M.-H.; and Shao, L. 2020. CycleISP: Real Image Restoration via Improved Data Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; and Zhang, L. 2017. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7): 3142–3155.

Zoran, D.; and Weiss, Y. 2011. From learning models of natural image patches to whole image restoration. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 479–486.