Abstract
We introduce coded two-bucket (C2B) imaging, a new operating principle for computational sensors with applications in active 3D shape estimation and coded-exposure imaging. A C2B sensor modulates the light arriving at each pixel by controlling which of the pixel’s two “buckets” should integrate it. C2B sensors output two images per video frame—one per bucket—and allow rapid, fully-programmable, per-pixel control of the active bucket. Using these properties as a starting point, we (1) develop an image formation model for these sensors, (2) couple them with programmable light sources to acquire illumination mosaics, i.e., images of a scene under many different illumination conditions whose pixels have been multiplexed and acquired in one shot, and (3) show how to process illumination mosaics to acquire live disparity or normal maps of dynamic scenes at the sensor’s native resolution. We present the first experimental demonstration of these capabilities, using a fully-functional C2B camera prototype. Key to this unique prototype is a novel programmable CMOS sensor that we designed from the ground up, fabricated and turned into a working system.
1 Introduction
New camera designs—and new types of imaging sensors—have been instrumental in driving the field of computer vision in exciting new directions. In the last decade alone, time-of-flight cameras [1, 2] have been widely adopted for vision [3] and computational photography tasks [4,5,6,7]; event cameras [8] that support asynchronous imaging have led to new vision techniques for high-speed motion analysis [9] and 3D scanning [10]; high-resolution sensors with dual-pixel [11] and assorted-pixel [12] designs are defining the state of the art for smartphone cameras; and sensors with pixel-wise coded-exposure capabilities are starting to appear [13, 14] for compressed sensing applications [15].
Against this backdrop, we introduce a new type of computational video camera to the vision community—the coded two-bucket (C2B) camera (Fig. 1). The C2B camera is a pixel-wise coded-exposure camera that never blocks the incident light. Instead, each pixel in its sensor contains two charge-collection sites—two “buckets”—as well as a one-bit writeable memory that controls which bucket is active. The camera outputs two images per video frame—one per bucket—and performs exposure coding by rapidly controlling the active bucket of each pixel, via a programmable sequence of binary 2D patterns. Key to this unique functionality is a novel programmable CMOS sensor that we designed from the ground up, fabricated in a CMOS image sensor (CIS) process technology [16] for the first time, and turned into a working camera system.
The light efficiency and electronic per-pixel coding capabilities of C2B cameras open up a range of applications that go well beyond what is possible today. This potentially includes compressive acquisition of high-speed video [17] with optimal light efficiency; simultaneous acquisition of both epipolar-only [18] and non-epipolar video streams; fully-electronic acquisition of high-dynamic-range AC-flicker videos [19]; conferring EpiScan3D-like functionality [20] to non-rectified imaging systems; and performing many other coded-exposure imaging tasks [15, 21, 22] with a compact camera platform.
Our focus in this first paper, however, is to highlight the novel capabilities of C2B cameras for live dense one-shot 3D reconstruction: we show that from just one grayscale C2B video frame of a dynamic scene under active illumination, it is possible to reconstruct the scene’s 3D snapshot (i.e., per-pixel disparity or normal, plus albedo) at a resolution comparable to the sensor’s pixel array. We argue that C2B cameras allow us to reduce this very difficult 3D reconstruction problem [23,24,25,26,27,28] to the potentially much easier 2D problems of image demosaicing [29, 30] and illumination multiplexing [31].
In particular, we show that C2B cameras can acquire—in one frame—images of a scene under a set of linearly-independent illuminations, multiplexed across the buckets of neighboring pixels. We call such a frame a two-bucket illumination mosaic. In this setting, reconstruction at full sensor resolution involves four steps (Fig. 2): (1) control bucket activities and light sources to pack distinct low-resolution images of the scene into one C2B frame (i.e., a set of low-resolution images in each bucket); (2) upsample these images to full resolution by demosaicing; (3) demultiplex all the upsampled images jointly, to obtain a full-resolution image for each of the linearly-independent illuminations; and (4) use these images to solve for shape and albedo at each pixel independently. We demonstrate the effectiveness of this procedure by recovering dense 3D shape and albedo from one shot with two of the oldest and simplest active 3D reconstruction algorithms available—multi-pattern cosine phase shifting [33, 34] and photometric stereo [35].
From a hardware perspective, we build on previous attempts to fabricate sensors with C2B-like functionality [36,37,38], which did not rely on a CMOS image sensor process technology. More broadly, our prototype can be thought of as generalizing three families of sensors. Programmable coded-exposure sensors [13] allow individual pixels to be “masked” for brief periods during the exposure of a video frame (Fig. 3, left). Just like the C2B sensor, they have a writeable one-bit memory inside each pixel to control masking, but their pixels lack a second bucket so light falling onto “masked” pixels is lost. Continuous-wave time-of-flight sensors [1, 2] can be thought of as having complementary functionality to coded-exposure sensors: their pixels have two buckets whose activity can be toggled programmatically (so no light is lost), but they have no in-pixel writeable memory. As such, the active bucket is constrained to be the same for all pixels (Fig. 3, middle). This makes programmable per-pixel coding—and acquisition of illumination mosaics in particular—impossible without specialized optics (e.g., [17]). Multi-bucket (a.k.a., “multi-tap”) sensors [39,40,41,42] have more than two buckets in each pixel but they have no writeable memory either, so per-pixel coding is not possible. In theory, a sensor with \(K\) buckets per pixel would be uniquely suited for dense one-shot reconstruction because it can acquire, in each frame, \(K\) full-resolution images corresponding to any set of \(K\) illuminations [43]. In practice, however, C2B sensors have several advantages: they are scalable because they can pack the equivalent set of linearly-independent images into one frame for any value of \(K\)—without hard-wiring this value into the pixel’s CMOS design; they are much more light efficient because each extra bucket reduces the pixel’s photo-sensitive region significantly for a given pixel size; and they have a broader range of applications because they enable per-pixel coding. To our knowledge, 2D sensors with more than four buckets have not been fabricated in a standard CMOS image sensor process, and it is unclear if they could offer acceptable imaging performance.
On the conceptual side, our contributions are the following: (1) we put forth a general model for the C2B camera that opens up new directions for coded-exposure imaging with active sources; (2) we formulate its control as a novel multiplexing problem [31, 44,45,46,47,48,49] in the bucket and pixel domains; (3) we draw a connection between two-bucket imaging and algorithms that operate directly on intensity ratios [50]; and (4) we provide an algorithm-independent framework for dense one-shot reconstruction that is simpler than earlier attempts [18] and is compatible with standard image processing pipelines.
Last but not least, we demonstrate all the above experimentally, on the first fully-operational C2B camera prototype.
2 Coded Two-Bucket Imaging
We begin by introducing an image formation model for C2B cameras. We consider the most general setting in this section, where a whole sequence of C2B frames may be acquired instead of just one.
C2B cameras output two images per video frame—one for each bucket (Fig. 2). We refer to these images as the bucket-\(1\) image and bucket-\(0\) image.
The Code Tensor. Programming a C2B camera amounts to specifying the time-varying contents of its pixels’ memories at two different timescales: (1) at the scale of sub-frames within a video frame, which correspond to the updates of the in-pixel memories (Fig. 1, right), and (2) at the scale of frames within a video sequence. For a video sequence with \(F\) frames, captured by a camera that has \(P\) pixels and supports \(S\) sub-frames per frame, bucket activities can be represented as a three-dimensional binary tensor \(\mathsf {C}\) with one dimension each for frames, sub-frames and pixels. We call \(\mathsf {C}\) the code tensor (Fig. 4a).
We use two specific 2D “slices” of the code tensor in our analysis below, and have special notation for them. For a specific pixel \(p\), slice \({\mathbf {C}}^{p}\) describes the activity of pixel \(p\)’s buckets across all frames and sub-frames. Similarly, for a specific frame \(f\), slice \({\mathbf {C}}_{f}\) describes the bucket activity of all pixels across all sub-frames of \(f\):
$$\begin{aligned} {\mathbf {C}}^{p} = \begin{bmatrix} {\mathbf {c}}^{p}_{1} \\ \vdots \\ {\mathbf {c}}^{p}_{F} \end{bmatrix}, \qquad {\mathbf {C}}_{f} = \begin{bmatrix} {\mathbf {c}}^{}_{f1}&\cdots&{\mathbf {c}}^{}_{fS} \end{bmatrix}, \end{aligned}$$ (1)
where \({\mathbf {c}}^{p}_{f}\) is an \(S\)-dimensional row vector that specifies the active bucket of pixel \(p\) in the \(S\) sub-frames of frame \(f\); and \({\mathbf {c}}^{}_{fs}\) is a \(P\)-dimensional column vector that specifies the active bucket of all pixels in sub-frame \(s\) of frame \(f\).
The Illumination Matrix. Although C2B cameras can be used for passive imaging applications [15], we model the case where illumination is programmable at sub-frame timescales too. In particular, we represent the scene’s time-varying illumination condition as an illumination matrix \({\mathbf {L}}\) that applies to all frames:
$$\begin{aligned} {\mathbf {L}} = \begin{bmatrix} {\mathbf {l}}_{1} \\ \vdots \\ {\mathbf {l}}_{S} \end{bmatrix}, \end{aligned}$$ (2)
where row vector \({\mathbf {l}}_{s}\) denotes the scene’s illumination condition in sub-frame \(s\) of every frame. We consider two types of scene illumination in this work: a set of \(L\) directional light sources whose intensities are given by the \(L\)-dimensional vector \({\mathbf {l}}_{s}\); and a projector that projects a pattern specified by the first \(L-1\) elements of \({\mathbf {l}}_{s}\) in the presence of ambient light, which we treat as an \(L\)-th source that is “always on” (i.e., element \({\mathbf {l}}_{s}[{L}] = 1\) for all \(s\)).
Two-Bucket Image Formation Model for Pixel \(p\). Let \({\mathbf {i}}^{p}\) and \({\mathbf {\hat{i}}}^{p}\) be \(F\)-dimensional column vectors holding the intensities of pixel \(p\)’s bucket 1 and bucket 0, respectively, in the \(F\) frames. We model these intensities as the result of light transport from the light sources to the pixel’s two buckets (Fig. 4b):
$$\begin{aligned} {\mathbf {i}}^{p} = {\mathbf {C}}^{p}\,{\mathbf {L}}\,{\mathbf {t}}^{p}, \qquad {\mathbf {\hat{i}}}^{p} = \overline{{\mathbf {C}}^{p}}\,{\mathbf {L}}\,{\mathbf {t}}^{p}, \end{aligned}$$ (3)
where \(\overline{{\mathbf {b}}}\) denotes the binary complement of matrix or vector \({\mathbf {b}}\), \({\mathbf {C}}^{p}\) is the slice of the code tensor corresponding to \(p\), and \({\mathbf {t}}^{p}\) is the pixel’s transport vector. Element \({\mathbf {t}}^{p}[{l}]\) of this vector describes the transport of light from source \(l\) to pixel \(p\) in the timespan of one sub-frame, across all light paths.
To gain some intuition about Eq. (3), consider the buckets’ intensities in frame \(f\):
$$\begin{aligned} {\mathbf {i}}^{p}[f] = {\mathbf {c}}^{p}_{f}\,{\mathbf {L}}\,{\mathbf {t}}^{p}, \qquad {\mathbf {\hat{i}}}^{p}[f] = \overline{{\mathbf {c}}^{p}_{f}}\,{\mathbf {L}}\,{\mathbf {t}}^{p}. \end{aligned}$$ (4)
In effect, the two buckets of pixel \(p\) can be thought of as “viewing” the scene under two potentially different illuminations given by the vectors \({\mathbf {c}}^{p}_{f}{\mathbf {L}}\) and \(\overline{{\mathbf {c}}^{p}_{f}}\,{\mathbf {L}}\), respectively. Moreover, if \({\mathbf {c}}^{p}_{f}\) varies from frame to frame, these illumination conditions may vary as well.
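For concreteness, the following numpy sketch simulates Eqs. (3) and (4) for a single pixel. All sizes, code values, the illumination matrix and the transport vector here are illustrative assumptions, not the configuration used in our experiments.

```python
import numpy as np

# Illustrative sizes (assumptions): F frames, S sub-frames, L light sources.
F, S, L = 2, 3, 3

# Code-tensor slice C^p for one pixel: an F x S binary matrix whose entry
# [f, s] is 1 when bucket 1 is active in sub-frame s of frame f.
C_p = np.array([[1, 0, 0],
                [0, 1, 0]])

# Illumination matrix: row l_s gives the source intensities in sub-frame s
# (here, one source fully on per sub-frame).
L_mat = np.eye(S)

# Transport vector t^p: light transport from each of the L sources to pixel p.
t_p = np.array([0.7, 0.2, 0.5])

# Eq. (3): bucket-1 and bucket-0 intensities across the F frames.
i_p     = C_p       @ L_mat @ t_p      # bucket 1
i_hat_p = (1 - C_p) @ L_mat @ t_p      # bucket 0

# Eq. (4): in frame f the two buckets "see" the scene under the effective
# illuminations c^p_f L and (1 - c^p_f) L, respectively.
f = 0
assert np.isclose(i_p[f],     C_p[f]       @ L_mat @ t_p)
assert np.isclose(i_hat_p[f], (1 - C_p[f]) @ L_mat @ t_p)
```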
Bucket Ratios as Albedo “Quasi-Invariants”. Since the two buckets of pixel \(p\) generally represent different illumination conditions, the two ratios
$$\begin{aligned} {\mathbf {r}}^{p}[f] = \frac{{\mathbf {i}}^{p}[f]}{{\mathbf {i}}^{p}[f] + {\mathbf {\hat{i}}}^{p}[f]}, \qquad {\mathbf {\hat{r}}}^{p}[f] = \frac{{\mathbf {\hat{i}}}^{p}[f]}{{\mathbf {i}}^{p}[f] + {\mathbf {\hat{i}}}^{p}[f]} \end{aligned}$$ (5)
defined by \(p\)’s buckets are illumination ratios [50,51,52]. Moreover, we show in [32] that under zero-mean Gaussian image noise, these ratios are well approximated by Gaussian random variables whose means are the ideal (noiseless) ratios and whose standard deviations depend weakly on albedo. In effect, C2B cameras provide two “albedo-invariant” images per frame. We exploit this feature of C2B cameras for both shape recovery and demosaicing in Sects. 3 and 5, respectively.
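A small follow-up sketch illustrates the albedo quasi-invariance of Eq. (5) in the noiseless case: scaling the transport vector by an albedo factor cancels in the bucket ratios (the full noise analysis is in [32]). The setup is again an illustrative assumption.

```python
import numpy as np

S = 3
C_p   = np.array([[1, 0, 0], [0, 1, 0]])     # illustrative 2 x 3 code slice
L_mat = np.eye(S)
t_p   = np.array([0.7, 0.2, 0.5])

def buckets(t):
    """Bucket-1 and bucket-0 intensities for transport vector t (Eq. 3)."""
    return C_p @ L_mat @ t, (1 - C_p) @ L_mat @ t

i1, i0 = buckets(t_p)
r1 = i1 / (i1 + i0)                          # bucket ratios, Eq. (5)

# Scaling t^p by an albedo factor leaves the noiseless ratios unchanged.
i1a, i0a = buckets(0.37 * t_p)
assert np.allclose(r1, i1a / (i1a + i0a))
```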
2.1 Acquiring Two-Bucket Illumination Mosaics
A key feature of C2B cameras is that they offer an important alternative to multi-frame acquisition: instead of capturing \(F\) frames in sequence, they can capture a spatially-multiplexed version of them in a single C2B frame (Fig. 2). We call such a frame a two-bucket illumination mosaic, in analogy to the RGB filter mosaics of color image sensors [12, 53, 54]. Unlike filter mosaics, however, which are attached to the sensor and cannot be changed, acquisition of illumination mosaics is fully programmable for any \(F\).
The Bucket-1 and Bucket-0 Image Sequences. Collecting the two buckets’ intensities in Eq. (4) across all frames and pixels, we define two \(F \times P\) matrices that hold all this data:
$$\begin{aligned} {\mathbf {I}} = \begin{bmatrix} {\mathbf {i}}^{1}&\cdots&{\mathbf {i}}^{P} \end{bmatrix}, \qquad {\mathbf {\hat{I}}} = \begin{bmatrix} {\mathbf {\hat{i}}}^{1}&\cdots&{\mathbf {\hat{i}}}^{P} \end{bmatrix}. \end{aligned}$$ (6)
Code Tensor for Mosaic Acquisition. Formally, a two-bucket illumination mosaic is a spatial sub-sampling of the sequences \({\mathbf {I}}\) and \({\mathbf {\hat{I}}}\) in Eq. (6). Acquiring it amounts to specifying a one-frame code tensor that spatially multiplexes the corresponding \(F\)-frame tensor in Fig. 4(a). We do this by (1) defining a regular tiling of the sensor plane and (2) specifying a correspondence \(p \mapsto f(p)\) between the pixels \(p\) in a tile and the \(F\) frames. The rows of the one-frame code tensor are then defined to be
$$\begin{aligned} \tilde{{\mathbf {c}}}^{p}_{1} = {\mathbf {c}}^{p}_{f(p)} \quad \text {for every pixel } p. \end{aligned}$$ (7)
Mosaic Acquisition Example. The C2B frames in Fig. 2 were captured using a \(2\times 2\) pixel tile to spatially multiplex a three-frame code tensor. The tensor assigned identical illumination conditions to all pixels within a frame and different conditions across frames. Pixels within each tile were assigned to individual frames using the correspondence \(\{(1,1) \rightarrow 1,~(1,2) \rightarrow 2,~(2,1) \rightarrow 2,~(2,2) \rightarrow 3\}\).
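The sketch below reproduces this tiling logic: it expands a three-frame set of per-frame code rows into a single-frame, per-pixel code assignment using the \(2\times 2\) correspondence above. The specific code rows and sensor size are illustrative assumptions.

```python
import numpy as np

S, F, H, W = 4, 3, 8, 8                     # illustrative sizes (assumptions)

# Per-frame code rows c^p_f, identical for all pixels within a frame:
# frame f uses row f of this F x S matrix.
C_frames = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0]])

# 2x2 tile -> frame correspondence from the text (1-based -> 0-based):
# (1,1)->1, (1,2)->2, (2,1)->2, (2,2)->3
tile_to_frame = np.array([[0, 1],
                          [1, 2]])

# One-frame code tensor: every pixel takes the code row of its assigned frame,
# giving an H x W x S array of bucket activities for a single C2B frame.
frame_of_pixel = np.tile(tile_to_frame, (H // 2, W // 2))
mosaic_codes = C_frames[frame_of_pixel]
print(mosaic_codes.shape)                   # (8, 8, 4)
```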
3 Per-Pixel Estimation of Normals and Disparities
Let us now turn to the problem of normal and disparity estimation using photometric stereo and structured-light triangulation, respectively. We consider the most basic formulation of these tasks, where all computations are done independently at each pixel and the relation between observations and unknowns is expressed as a system of linear equations. These formulations should be treated merely as examples that showcase the special characteristics of two-bucket imaging; as with conventional cameras, using advanced methods to handle more general settings [55, 56] is certainly possible.
From Bucket Intensities to Demultiplexed Intensities. As a starting point, we expand Eq. (3) to get a relation that involves only sub-frame intensities:
$$\begin{aligned} \begin{bmatrix} {\mathbf {i}}^{p} \\ {\mathbf {\hat{i}}}^{p} \end{bmatrix} = \underbrace{\begin{bmatrix} {\mathbf {C}}^{p} \\ \overline{{\mathbf {C}}^{p}} \end{bmatrix}}_{{\mathbf {W}}} \begin{bmatrix} i^{p}_{1} \\ \vdots \\ i^{p}_{S} \end{bmatrix}. \end{aligned}$$ (8)
Each scalar \(i^{p}_{s}\) in the right-hand side of Eq. (8) is the intensity that a conventional camera pixel would record if the scene’s illumination condition was \({\mathbf {l}}_{s}\). Therefore, Eq. (8) tells us that as far as a single pixel \(p\) is concerned, C2B cameras capture the same measurements a conventional camera would capture for 3D reconstruction—except that these measurements are multiplexed over the two buckets’ intensities. To retrieve them, these intensities must be demultiplexed by inverting Eq. (8) in the least-squares sense:
$$\begin{aligned} \begin{bmatrix} i^{p}_{1}&\cdots&i^{p}_{S} \end{bmatrix}' = ({\mathbf {W}}'{\mathbf {W}})^{-1}{\mathbf {W}}' \begin{bmatrix} {\mathbf {i}}^{p} \\ {\mathbf {\hat{i}}}^{p} \end{bmatrix}, \end{aligned}$$ (9)
where \({}'\) denotes matrix transpose. This inversion is only possible if \({\mathbf {W}}'{\mathbf {W}}\) is non-singular. Moreover, the signal-to-noise ratio (SNR) of the demultiplexed intensities depends heavily on the choice of \({\mathbf {W}}\) (Sect. 4). Setting aside this issue for now, we consider below the task of shape recovery from already-demultiplexed intensities. For notational simplicity, we drop the pixel index \(p\) from the equations below.
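A minimal demultiplexing sketch, assuming the stacked matrix \({\mathbf {W}} = [\,{\mathbf {C}}^{p};\, \overline{{\mathbf {C}}^{p}}\,]\) of Eq. (8) and a least-squares inversion as in Eq. (9); the particular code slice, intensities and noise level are illustrative.

```python
import numpy as np

F, S = 3, 4
rng = np.random.default_rng(0)

C_p = np.array([[1, 1, 0, 0],               # illustrative F x S code slice
                [1, 0, 1, 0],
                [1, 0, 0, 1]])
W = np.vstack([C_p, 1 - C_p])               # 2F x S bucket-multiplexing matrix

i_true = np.array([0.9, 0.4, 0.7, 0.2])     # per-sub-frame intensities i^p_s
meas = W @ i_true + 0.01 * rng.standard_normal(2 * F)   # bucket 1 then bucket 0

# Least-squares demultiplexing (Eq. 9); needs rank(W) = S, i.e. F >= S - 1.
i_est, *_ = np.linalg.lstsq(W, meas, rcond=None)
print(np.round(i_est, 3))                   # close to i_true
```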
Per-Pixel Constraints on 3D Shape. The relation between the demultiplexed intensities and the pixel’s unknowns takes exactly the same form in both photometric stereo and structured-light triangulation with cosine patterns:
$$\begin{aligned} \begin{bmatrix} i_{1} \\ \vdots \\ i_{S} \end{bmatrix} = a\,{\mathbf {D}}\,{\mathbf {x}} + {\mathbf {e}}, \end{aligned}$$ (10)
where the matrix \({\mathbf {D}}\), with rows \({\mathbf {d}}_{1},\dots ,{\mathbf {d}}_{S}\), is known; \({\mathbf {x}}\) is a 3D vector that contains the pixel’s shape unknowns; \(a\) is the unknown albedo; and \({\mathbf {e}}\) is observation noise. See Table 1 for a summary of each problem’s assumptions and for the mapping of problem-specific quantities to Eq. (10).
There are (at least) three ways to turn Eq. (10) into a constraint on normals and disparities under zero-mean Gaussian noise. The resulting constraints are not equivalent when combining measurements from small pixel neighborhoods—as we implicitly do—because they are not equally invariant to spatial albedo variations:
1. Direct method (DM): treat Eq. (10) as providing \(S\) independent constraints on the vector \(a{\mathbf {x}}\) and solve for both \(a\) and \({\mathbf {x}}\). The advantage of this approach is that errors are Gaussian by construction; its disadvantage is that Eq. (10) depends on albedo.
2. Ratio constraint (R): divide individual intensities by their total sum to obtain an illumination ratio, as in Eq. (5). This yields the following constraint on \({\mathbf {x}}\):
$$\begin{aligned} \left( {\mathbf {d}}_{l} - r_{l}\,{\mathbf {1}}{\mathbf {D}} \right) {\mathbf {x}} = 0, \end{aligned}$$ (11)
where \(r_{l} = i_{l}/\sum _{k} i_{k}\) and \({\mathbf {1}}\) is a row vector of all ones. The advantage here is that both \(r_{l}\) and Eq. (11) are approximately invariant to albedo.
3. Cross-product constraint (CP): instead of computing an explicit ratio from Eq. (10), eliminate \(a\) to obtain
$$\begin{aligned} i_{l}\,{\mathbf {d}}_{k}{\mathbf {x}} = i_{k}\,{\mathbf {d}}_{l}{\mathbf {x}}. \end{aligned}$$ (12)
Since Eq. (12) has the intensities \(i_{l}, i_{k}\) as factors, it does implicitly depend on albedo.
Solving for the Unknowns. Both structured light and photometric stereo require at least three independent constraints for a unique solution. In the DM method, we use least-squares to solve for \(a{\mathbf {x}}\); when using the R or CP constraints, we apply singular-value decomposition to solve for \({\mathbf {x}}\).
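The per-pixel linear algebra can be sketched as follows, here for photometric stereo under the assumption that \({\mathbf {D}}\)’s rows are light directions and \({\mathbf {x}}\) is the unit normal; the directions, albedo and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Known matrix D (rows d_l); for photometric stereo these are light directions.
D = np.array([[ 0.0,  0.0, 1.0],
              [ 0.6,  0.0, 0.8],
              [ 0.0,  0.6, 0.8],
              [-0.4, -0.4, 0.8]])
n_true = np.array([0.2, -0.3, 0.93]); n_true /= np.linalg.norm(n_true)
a_true = 0.7
i = a_true * D @ n_true + 1e-3 * rng.standard_normal(len(D))      # Eq. (10)

# Direct method (DM): least squares for a*x, then split into albedo and normal.
ax, *_ = np.linalg.lstsq(D, i, rcond=None)
a_dm, n_dm = np.linalg.norm(ax), ax / np.linalg.norm(ax)

# Ratio / cross-product style constraints: a homogeneous system A x = 0 built
# from Eq. (12) for all pairs (l, k), solved via SVD (smallest singular value).
A = np.array([i[l] * D[k] - i[k] * D[l]
              for l in range(len(D)) for k in range(l + 1, len(D))])
x = np.linalg.svd(A)[2][-1]
n_cp = x * np.sign(x[2]) / np.linalg.norm(x)                       # fix sign
print(np.round(n_dm, 3), np.round(n_cp, 3))
```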
4 Code Matrices for Bucket Multiplexing
The previous section gave ways to solve for 3D shape when we have enough independent constraints per pixel. Here we consider the problem of controlling a C2B camera to actually obtain them for a pixel \(p\). In particular, we show how to choose (1) the number of frames \(F\), (2) the number of sub-frames per frame \(S\), and (3) the pixel-specific slice \({\mathbf {C}}^{p}\) of the code tensor, which defines the multiplexing matrix \({\mathbf {W}}\) in Eq. (8).
Determining these parameters can be thought of as an instance of the optimal multiplexing problem [31, 44,45,46,47,48,49]. This problem has been considered in numerous contexts before, as a one-to-one mapping from desired measurements to actual, noisy observations. In the case of coded two-bucket imaging, however, the problem is slightly different because each frame yields two measurements instead of just one.
The results below provide further insight into this particular multiplexing problem (see [32] for proofs). Observation 1 implies that even though a pixel’s two buckets provide \(2F\) measurements in total across \(F\) frames, at most \(F+1\) of them can be independent because the multiplexing matrix \({\mathbf {W}}\) is rank-deficient:
Observation 1
\({{\,\mathrm{rank}\,}}\,{\mathbf {W}} \le F + 1\).
Intuitively, a C2B camera should not be thought of as being equivalent to two coded-exposure cameras that operate completely independently. This is because the activities of a pixel’s two buckets are binary complements of each other, and thus not independent.
Corollary 1
Multiplexing \(S\) intensities requires \(F \ge S - 1\) frames.
Corollary 2
The minimal configuration for fully-constrained reconstruction at a pixel \(p\) is \(F = 2\) frames, \(S = 3\) sub-frames per frame, and three linearly-independent illumination vectors of dimension at least three. The next-highest configuration is 3 frames, 4 sub-frames/illumination vectors.
We now seek the optimal multiplexing matrix \({\mathbf {W}}\), i.e., the matrix that maximizes the SNR of the demultiplexed intensities in Eq. (9). Lemma 1 extends the lower-bound analysis of Ratner et al. [45] to obtain a lower bound on the mean-squared error (MSE) of two-bucket multiplexing [32]:
Lemma 1
For every multiplexing matrix \({\mathbf {W}}\), the MSE of the best unbiased linear estimator satisfies a fixed lower bound; see [32] for the exact expression and proof.
Although Lemma 1 does not provide an explicit construction, it does ensure the optimality of matrices whose MSEs achieve the lower bound. We used this observation to prove the optimality of matrices derived from the standard Hadamard construction [31]:
Proposition 1
Let \({\mathbf {C}}^{p}\) be derived from the \(S \times S\) Hadamard matrix by removing its row of ones, to create an \((S-1) \times S\) matrix. The bucket-multiplexing matrix \({\mathbf {W}}\) defined by this \({\mathbf {C}}^{p}\) is optimal.
The smallest values of \(S\) for which Proposition 1 applies correspond to the smallest Hadamard matrices. Since our main goal is one-shot acquisition, optimal matrices for other small values of \(S\) are also of significant interest. To find them, we conducted a brute-force search over the space of small binary matrices to find the ones with the lowest MSE. These matrices are shown in Table 2. See Fig. 6(a), (b) and [32] for an initial empirical SNR analysis.
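As an illustration of such a search, the sketch below scores small binary code matrices by the usual least-squares multiplexing MSE proxy, \(\mathrm {trace}\,(({\mathbf {W}}'{\mathbf {W}})^{-1})\), and keeps the best one. This proxy and the tiny search space are assumptions made for the example; the exact criterion and the resulting matrices are given in [32] and Table 2.

```python
import numpy as np
from itertools import product

def mse_proxy(C):
    """Average demultiplexing MSE (up to noise variance) for an F x S code matrix C."""
    W = np.vstack([C, 1 - C])                 # stacked bucket-multiplexing matrix
    G = W.T @ W
    if np.linalg.matrix_rank(G) < G.shape[0]:
        return np.inf                         # cannot demultiplex all S intensities
    return np.trace(np.linalg.inv(G)) / G.shape[0]

F, S = 2, 3                                   # the minimal configuration
best_score, best_C = np.inf, None
for bits in product([0, 1], repeat=F * S):    # brute force over F x S binary matrices
    C = np.array(bits).reshape(F, S)
    score = mse_proxy(C)
    if score < best_score:
        best_score, best_C = score, C

print(best_C, best_score)
```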
5 One-Shot Shape from Two-Bucket Illumination Mosaics
We use three different ways of estimating shape from a two-bucket illumination mosaic:
1. Intensity demosaicing (ID): treat the intensities in a mosaic tile as separate “imaging dimensions” for the purpose of demosaicing; upsample these intensities by applying either an RGB demosaicing algorithm to three of these dimensions at a time, or a more general assorted-pixel procedure [12, 54] that takes all of them into account; demultiplex the upsampled images using Eq. (9); and apply any of the estimation methods in Sect. 3 to the result. Figure 2 illustrates this approach, and a simplified end-to-end sketch follows this list.
2. Bucket-ratio demosaicing (BRD): apply Eq. (5) to each pixel in the mosaic to obtain an albedo-invariant two-bucket ratio mosaic; demosaic and demultiplex the ratios; and compute 3D shape using the ratio constraint of Sect. 3. See Fig. 5 for an example.
3. No demosaicing (ND): instead of upsampling, treat each mosaic tile as a “super-pixel” whose unknowns (i.e., normal, disparity, etc.) do not vary within the tile, and compute one shape estimate per tile using any of the methods of Sect. 3.
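The sketch referenced in item 1 above: a much-simplified, end-to-end simulation of the ID path on a synthetic C2B frame. Nearest-neighbour tile replication stands in for a real demosaicing algorithm, the code matrix and lights are illustrative, and basic photometric stereo is the per-pixel step; none of these choices should be read as our actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
H = W = 64
F, S = 3, 4

# Synthetic scene: a constant unit normal and a random albedo map (assumptions).
n = np.array([0.1, -0.2, 1.0]); n /= np.linalg.norm(n)
albedo = 0.5 + 0.5 * rng.random((H, W))

L_dirs = np.array([[0, 0, 1], [0.6, 0, 0.8], [0, 0.6, 0.8], [-0.4, -0.4, 0.8]])
i_full = albedo[..., None] * (L_dirs @ n)          # H x W x S "conventional" images

# Bucket multiplexing with an illustrative F x S code slice (same for all pixels).
C_p = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]])
Wmat = np.vstack([C_p, 1 - C_p])                   # 2F x S

# Spatial multiplexing: the 2x2 tile of Sect. 2.1 assigns each pixel to a frame.
frame_of_pixel = np.tile(np.array([[0, 1], [1, 2]]), (H // 2, W // 2))
b1 = np.take_along_axis(i_full @ C_p.T,       frame_of_pixel[..., None], 2)[..., 0]
b0 = np.take_along_axis(i_full @ (1 - C_p).T, frame_of_pixel[..., None], 2)[..., 0]

# "Demosaic": replicate each tile's (frame, bucket) samples over the whole tile,
# a crude stand-in for per-channel upsampling; then demultiplex every pixel.
meas = np.zeros((H, W, 2 * F))
for dy in range(2):
    for dx in range(2):
        f = frame_of_pixel[dy, dx]
        meas[..., f]     = np.kron(b1[dy::2, dx::2], np.ones((2, 2)))
        meas[..., F + f] = np.kron(b0[dy::2, dx::2], np.ones((2, 2)))

i_demux = np.linalg.lstsq(Wmat, meas.reshape(-1, 2 * F).T, rcond=None)[0].T

# Per-pixel photometric stereo on the demultiplexed intensities.
ax = np.linalg.lstsq(L_dirs, i_demux.T, rcond=None)[0].T
n_est = ax / np.linalg.norm(ax, axis=1, keepdims=True)
err = np.degrees(np.arccos(np.clip(n_est @ n, -1, 1)))
print(err.mean())                                  # mean angular error in degrees
```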
Performance Evaluation of One-Shot Photometric Stereo on Synthetic Data. Figures 6(c) and (d) analyze the effective resolution and albedo invariance of normal maps computed by several combinations of methods from Sects. 3 and 5, plus two more—Baseline, which applies basic photometric stereo to three full-resolution images; and Color, the one-shot color photometric stereo technique in [23]. To generate synthetic data, we (1) generated scenes with random spatially-varying normal maps and RGB albedo maps, (2) applied a spatial low-pass filter to albedo maps and the spherical coordinates of normal maps, (3) rendered them to create three sets of images—a grayscale C2B frame; three full-resolution grayscale images; and a Bayer color mosaic—and (4) added zero-mean Gaussian noise to each pixel, corresponding to a peak SNR of 30 dB. Since all calculations except demosaicing are done per pixel, any frequency-dependent variations in performance must be due to this upsampling step. Our simulation results do match the intuition that performance should degrade for very high normal map frequencies regardless of the type of neighborhood processing. For spatial frequencies up to 0.3 times the Nyquist limit, however, one-shot C2B imaging confers a substantial performance advantage. A similar evaluation for structured-light triangulation can be found in [32].
6 Live 3D Imaging with a C2B Camera
Experimental Conditions. Both C2B frame acquisition and scene reconstruction run at 20 Hz for all experiments, using the corresponding optimal code matrix from Table 2 and the \(2\times 2\) mosaic tile defined in Sect. 2.1. C2B frames are always processed by the same sequence of steps—demosaicing, demultiplexing and per-pixel reconstruction. For structured light, we fit an 8 mm Schneider Cinegon f/1.4 lens to our camera with its aperture set to f/2, and use a TI LightCrafter to project \(684 \times 608\)-pixel, 24-gray-level patterns in sync with the sub-frames. The stereo baseline was approximately 20 cm, the scene was 1.1–1.5 m away, and the cosine frequency was 5 for all patterns and experiments. For photometric stereo we switch to a 23 mm Schneider APO-Xenoplan f/1.4 lens to approximate orthographic imaging conditions, and illuminate a scene 2–3 m away with four sub-frame-synchronized Luxdrive 7040 Endor Star LEDs, fitted with 26.5 mm Carclo Technical Plastics lenses.
Quantitative Experiments. Our goal was to compare the 3D accuracy of one-shot C2B imaging against that of full-resolution sequential imaging—using the exact same system and algorithms. Figure 7 shows the static scenes used for these experiments, along with example reconstructions for photometric stereo and structured light, respectively. The “ground truth,” which served as our reference, was computed by averaging 1000 sequentially-captured, bucket-1 images per illumination condition and applying the same reconstruction algorithm to the lower-noise, averaged images. To further distinguish the impact of demosaicing from that of sensor-specific non-idealities, we also compute shape from a simulated C2B frame; to create it we spatially multiplex the averaged images computationally in a way that simulates the operation of our C2B sensor. Row 3 of Fig. 7 shows some of these comparisons for structured light. The BRD-R method, coupled with OpenCV’s demosaicing algorithm, yields the best performance in this case, corresponding to a disparity error of \(4\%\). See [32] for more details and additional results.
Reconstructing Dynamic Scenes. Figure 8 shows several examples.
7 Concluding Remarks
Our experiments relied on some of the very first images from a C2B sensor. Issues such as fixed-pattern noise; slight variations in gain across buckets and across pixels; and other minor non-idealities do still exist. Nevertheless, we believe that our preliminary results support the claim that 3D data are acquired at near-sensor resolution.
We intentionally used raw, unprocessed intensities and the simplest possible approaches for demosaicing and reconstruction. There is no doubt that denoised images and more advanced reconstruction algorithms could improve reconstruction performance considerably. Our use of generic RGB demosaicing software is also clearly sub-optimal, as such algorithms do not take into account the actual correlations that exist across C2B pixels. A promising approach would be to train an assorted-pixel algorithm on precisely such data.
Last but certainly not least, we are particularly excited about C2B cameras sparking new vision techniques that take full advantage of their advanced imaging capabilities.
References
Lange, R., Seitz, P.: Solid-state time-of-flight range camera. IEEE J. Quantum Electron. 37(3), 390–397 (2001)
Bamji, C.S., et al.: A 0.13 \(\mu \)m CMOS system-on-chip for a \(512\times 424\) time-of-flight image sensor with multi-frequency photo-demodulation up to 130 MHz and 2 GS/s ADC. IEEE J. Solid-State Circ. 50(1), 303–319 (2015)
Newcombe, R.A., Fox, D., Seitz, S.: DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time. In: Proceedings of IEEE CVPR (2015)
Heide, F., Hullin, M.B., Gregson, J., Heidrich, W.: Low-budget transient imaging using photonic mixer devices. In: Proceedings of ACM SIGGRAPH (2013)
Kadambi, A., Bhandari, A., Whyte, R., Dorrington, A., Raskar, R.: Demultiplexing illumination via low cost sensing and nanosecond coding. In: Proceedings of IEEE ICCP (2014)
Shrestha, S., Heide, F., Heidrich, W., Wetzstein, G.: Computational imaging with multi-camera time-of-flight systems. In: Proceedings of ACM SIGGRAPH (2016)
Callenberg, C., Heide, F., Wetzstein, G., Hullin, M.B.: Snapshot difference imaging using correlation time-of-flight sensors. In: Proceedings of ACM SIGGRAPH, Asia (2017)
Lichtsteiner, P., Posch, C., Delbruck, T.: A 128\(\times \)128 120 dB 15 \(\mu \)s latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State Circ. 43(2), 566–576 (2008)
Kim, H., Leutenegger, S., Davison, A.J.: Real-time 3D reconstruction and 6-DoF tracking with an event camera. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 349–364. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_21
Matsuda, N., Cossairt, O., Gupta, M.: MC3D: motion contrast 3D scanning. In: Proceedings of IEEE ICCP (2015)
Jang, J., Yoo, Y., Kim, J., Paik, J.: Sensor-based auto-focusing system using multi-scale feature extraction and phase correlation matching. Sensors 15(3), 5747–5762 (2015)
Yasuma, F., Mitsunaga, T., Iso, D., Nayar, S.K.: Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum. IEEE-TIP 19(9), 2241–2253 (2010)
Zhang, J., Etienne-Cummings, R., Chin, S., Xiong, T., Tran, T.: Compact all-CMOS spatiotemporal compressive sensing video camera with pixel-wise coded exposure. Opt. Express 24(8), 9013–9024 (2016)
Sonoda, T., Nagahara, H., Endo, K., Sugiyama, Y., Taniguchi, R.: High-speed imaging using CMOS image sensor with quasi pixel-wise exposure. In: Proceedings of IEEE ICCP (2016)
Baraniuk, R.G., Goldstein, T., Sankaranarayanan, A.C., Studer, C., Veeraraghavan, A., Wakin, M.B.: Compressive video sensing: algorithms, architectures, and applications. IEEE Sig. Process. Mag. 34(1), 52–66 (2017)
Fossum, E.R., Hondongwa, D.B.: A review of the pinned photodiode for CCD and CMOS image sensors. IEEE J. Electron Devices Soc. 2(3), 33–43 (2014)
Hitomi, Y., Gu, J., Gupta, M., Mitsunaga, T., Nayar, S.K.: Video from a single coded exposure photograph using a learned over-complete dictionary. In: Proceedings of IEEE ICCV (2011)
O’Toole, M., Mather, J., Kutulakos, K.N.: 3D shape and indirect appearance by structured light transport. IEEE T-PAMI 38(7), 1298–1312 (2016)
Sheinin, M., Schechner, Y., Kutulakos, K.N.: Computational imaging on the electric grid. In: Proceedings of IEEE CVPR (2017)
O’Toole, M., Achar, S., Narasimhan, S.G., Kutulakos, K.N.: Homogeneous codes for energy-efficient illumination and imaging. In: Proceedings of ACM SIGGRAPH (2015)
Heintzmann, R., Hanley, Q.S., Arndt-Jovin, D., Jovin, T.M.: A dual path programmable array microscope (PAM): simultaneous acquisition of conjugate and non-conjugate images. J. Microsc. 204(2), 119–135 (2001)
Raskar, R., Agrawal, A., Tumblin, J.: Coded exposure photography: motion deblurring using fluttered shutter. In: Proceedings of ACM SIGGRAPH (2006)
Hernandez, C., Vogiatzis, G., Brostow, G.J., Stenger, B., Cipolla, R.: Non-rigid photometric stereo with colored lights. In: Proceedings of IEEE ICCV (2007)
Kim, H., Wilburn, B., Ben-Ezra, M.: Photometric stereo for dynamic surface orientations. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 59–72. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_5
Fyffe, G., Yu, X., Debevec, P.: Single-shot photometric stereo by spectral multiplexing. In: Proceedings of IEEE ICCP (2011)
Van der Jeught, S., Dirckx, J.J.J.: Real-time structured light profilometry: a review. Opt. Lasers Eng. 87, 18–31 (2016)
Sagawa, R., Furukawa, R., Kawasaki, H.: Dense 3D reconstruction from high frame-rate video using a static grid pattern. IEEE T-PAMI 36(9), 1733–1747 (2014)
Narasimhan, S.G., Koppal, S.J., Yamazaki, S.: Temporal dithering of illumination for fast active vision. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 830–844. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88693-8_61
Gharbi, M., Chaurasia, G., Paris, S., Durand, F.: Deep joint demosaicking and denoising. In: Proceedings of ACM SIGGRAPH Asia (2016)
Heide, F., et al.: FlexISP: a flexible camera image processing framework. In: Proceedings of ACM SIGGRAPH, Asia (2014)
Schechner, Y.Y., Nayar, S.K., Belhumeur, P.N.: Multiplexing for optimal lighting. IEEE T-PAMI 29(8), 1339–1354 (2007)
Wei, M., Sarhangnejad, N., Xia, Z., Gusev, N., Katic, N., Genov, R., Kutulakos, K.N.: Coded two-bucket cameras for computer vision: supplemental document. In: Proceedings of ECCV (2018), also available at http://www.dgp.toronto.edu/C2B
Salvi, J., Fernandez, S., Pribanic, T., Llado, X.: A state of the art in structured light patterns for surface profilometry. Pattern Recogn. 43(8), 2666–2680 (2010)
Salvi, J., Pages, J., Batlle, J.: Pattern codification strategies in structured light systems. Pattern Recogn. 37(4), 827–849 (2004)
Woodham, R.J.: Photometric method for determining surface orientation from multiple images. Opt. Eng. 19(1), 191139 (1980)
Sarhangnejad, N., Lee, H., Katic, N., O’Toole, M., Kutulakos, K.N., Genov, R.: CMOS image sensor architecture for primal-dual coding. In: International Image Sensor Workshop (2017)
Luo, Y., Mirabbasi, S.: Always-on CMOS image sensor pixel design for pixel-wise binary coded exposure. In: IEEE International Symposium on Circuits & Systems (2017)
Luo, Y., Ho, D., Mirabbasi, S.: Exposure-programmable CMOS pixel with selective charge storage and code memory for computational imaging. IEEE Trans. Circ. Syst. 65(5), 1555–1566 (2018)
Wan, G., Li, X., Agranov, G., Levoy, M., Horowitz, M.: CMOS image sensors with multi-bucket pixels for computational photography. IEEE J. Solid-State Circ. 47(4), 1031–1042 (2012)
Wilburn, B.S., Ben-Ezra, M.: Time interleaved exposures and multiplexed illumination. US Patent 9,100,581 (2015)
Wan, G., Horowitz, M., Levoy, M.: Applications of multi-bucket sensors to computational photography. Technical report, Stanford Computer Graphics Lab (2012)
Seo, M.W., et al.: 4.3 A programmable sub-nanosecond time-gated 4-tap lock-in pixel CMOS image sensor for real-time fluorescence lifetime imaging microscopy. In: Proceedings of IEEE ISSCC (2017)
Yoda, T., Nagahara, H., Taniguchi, R.I., Kagawa, K., Yasutomi, K., Kawahito, S.: The dynamic photometric stereo method using a multi-tap CMOS image sensor. Sensors 18(3), 786 (2018)
Wetzstein, G., Ihrke, I., Heidrich, W.: On plenoptic multiplexing and reconstruction. Int. J. Comput. Vis. 101(2), 384–400 (2013)
Ratner, N., Schechner, Y.Y., Goldberg, F.: Optimal multiplexed sensing: bounds, conditions and a graph theory link. Opt. Express 15(25), 17072–17092 (2007)
Brown, C.M.: Multiplex imaging and random arrays. Ph.D. thesis, University of Chicago (1972)
Ratner, N., Schechner, Y.Y.: Illumination multiplexing within fundamental limits. In: Proceedings of IEEE CVPR (2007)
Nonoyama, M., Sakaue, F., Sato, J.: Multiplex image projection using multi-band projectors. In: IEEE Workshop on Color and Photometry in Computer Vision (2013)
Mitra, K., Cossairt, O.S., Veeraraghavan, A.: A framework for analysis of computational imaging systems: role of signal prior, sensor noise and multiplexing. IEEE T-PAMI 36(10), 1909–1921 (2014)
Liu, Z., Shan, Y., Zhang, Z.: Expressive expression mapping with ratio images. In: Proceedings of ACM SIGGRAPH (2001)
Wang, L., Yang, R., Davis, J.: BRDF invariant stereo using light transport constancy. IEEE T-PAMI 29(9), 1616–1626 (2007)
Pilet, J., Strecha, C., Fua, P.: Making background subtraction robust to sudden illumination changes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 567–580. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88693-8_42
Bayer, B.E.: Color imaging array. US Patent 3,971,065 (1976)
Narasimhan, S.G., Nayar, S.: Enhancing resolution along multiple imaging dimensions using assorted pixels. IEEE T-PAMI 27(4), 518–530 (2005)
Queau, Y., Mecca, R., Durou, J.D., Descombes, X.: Photometric stereo with only two images: a theoretical study and numerical resolution. Image Vis. Comput. 57, 175–191 (2017)
Gupta, M., Nayar, S.K.: Micro phase shifting. In: Proceedings of IEEE CVPR (2012)
Acknowledgements
We gratefully acknowledge the support of the Natural Sciences and Engineering Research Council of Canada under the RGPIN, RTI and SGP programs, and of DARPA under the REVEAL program. We also wish to thank Hui Feng Ke and Gilead Wolf Posluns for FPGA programming related to the C2B sensor, Sarah Anne Kushner for help with live imaging experiments, and Michael Brown, Harel Haim and the anonymous reviewers for their many helpful comments and suggestions on earlier versions of this manuscript.