1 Introduction

New camera designs—and new types of imaging sensors—have been instrumental in driving the field of computer vision in exciting new directions. In the last decade alone, time-of-flight cameras [1, 2] have been widely adopted for vision [3] and computational photography tasks [4,5,6,7]; event cameras [8] that support asynchronous imaging have led to new vision techniques for high-speed motion analysis [9] and 3D scanning [10]; high-resolution sensors with dual-pixel [11] and assorted-pixel [12] designs are defining the state of the art for smartphone cameras; and sensors with pixel-wise coded-exposure capabilities are starting to appear [13, 14] for compressed sensing applications [15].

Against this backdrop, we introduce a new type of computational video camera to the vision community—the coded two-bucket (C2B) camera (Fig. 1). The C2B camera is a pixel-wise coded-exposure camera that never blocks the incident light. Instead, each pixel in its sensor contains two charge-collection sites—two “buckets”—as well as a one-bit writeable memory that controls which bucket is active. The camera outputs two images per video frame—one per bucket—and performs exposure coding by rapidly controlling the active bucket of each pixel, via a programmable sequence of binary 2D patterns. Key to this unique functionality is a novel programmable CMOS sensor that we designed from the ground up, fabricated in a CMOS image sensor (CIS) process technology [16] for the first time, and turned into a working camera system.

Fig. 1. The C2B camera. Left: Our prototype’s sensor outputs video at 20 frames per second and consists of two arrays: a \(244 \times 160\)-pixel array that supports relatively slow bucket control (up to 4 sub-frames per frame) and a \(35 \times 48\) array with much faster control (up to 120 sub-frames per frame). Right: Each frame is divided into sub-frames during which the pixel’s SRAM memory remains unchanged. A user-specified sequence of 2D binary patterns determines the SRAM’s value at each pixel and sub-frame. Note that the two buckets of a pixel are never in the same state (i.e., both active or both inactive) as this would degrade imaging performance—see [32] for a discussion of this and other related CMOS design issues. The light-generated charges of both buckets are read, digitized and cleared only once, at the end of each frame.

The light efficiency and electronic per-pixel coding capabilities of C2B cameras open up a range of applications that go well beyond what is possible today. This potentially includes compressive acquisition of high-speed video [17] with optimal light efficiency; simultaneous acquisition of both epipolar-only [18] and non-epipolar video streams; fully-electronic acquisition of high-dynamic-range AC-flicker videos [19]; conferring EpiScan3D-like functionality [20] to non-rectified imaging systems; and performing many other coded-exposure imaging tasks [15, 21, 22] with a compact camera platform.

Our focus in this first paper, however, is to highlight the novel capabilities of C2B cameras for live dense one-shot 3D reconstruction: we show that from just one grayscale C2B video frame of a dynamic scene under active illumination, it is possible to reconstruct the scene’s 3D snapshot (i.e., per-pixel disparity or normal, plus albedo) at a resolution comparable to the sensor’s pixel array. We argue that C2B cameras allow us to reduce this very difficult 3D reconstruction problem  [23,24,25,26,27,28] to the potentially much easier 2D problems of image demosaicing [29, 30] and illumination multiplexing [31].

Fig. 2. Dense one-shot reconstruction with C2B cameras. The procedure runs in real time and is illustrated for structured-light triangulation. Please zoom in to the electronic copy to see individual pixels of the C2B frame and refer to the listed sections for notation and details. Photometric stereo is performed in an analogous way, by replacing the structured-light projector with a set of directional light sources (a reconstruction example is shown in the lower right).

In particular, we show that C2B cameras can acquire—in one frame—images of a scene under several linearly-independent illuminations, multiplexed across the buckets of neighboring pixels. We call such a frame a two-bucket illumination mosaic. In this setting, reconstruction at full sensor resolution involves four steps (Fig. 2): (1) control bucket activities and light sources to pack distinct low-resolution images of the scene into one C2B frame (half of them in each bucket); (2) upsample these images to full resolution by demosaicing; (3) demultiplex all the upsampled images jointly, to obtain a set of linearly-independent full-resolution images; and (4) use these images to solve for shape and albedo at each pixel independently. We demonstrate the effectiveness of this procedure by recovering dense 3D shape and albedo from one shot with two of the oldest and simplest active 3D reconstruction algorithms available—multi-pattern cosine phase shifting [33, 34] and photometric stereo [35].

Fig. 3. Comparison of basic sensor abilities. Coded-exposure sensors can rapidly mask individual pixels but cannot collect all the incident light; continuous-wave ToF sensors always collect all the incident light but they cannot mask pixels individually; C2B sensors can do both. The column vectors \({\mathbf {c}}^{}_{fs}\) and \({\mathbf {\overline{c}}}^{}_{fs}\) denote bucket-1 masks/activities and their binary complements, respectively.

From a hardware perspective, we build on previous attempts to fabricate sensors with C2B-like functionality [36,37,38], which did not rely on a CMOS image sensor process technology. More broadly, our prototype can be thought of as generalizing three families of sensors. Programmable coded-exposure sensors [13] allow individual pixels to be “masked” for brief periods during the exposure of a video frame (Fig. 3, left). Just like the C2B sensor, they have a writeable one-bit memory inside each pixel to control masking, but their pixels lack a second bucket, so light falling onto “masked” pixels is lost. Continuous-wave time-of-flight sensors [1, 2] can be thought of as having complementary functionality to coded-exposure sensors: their pixels have two buckets whose activity can be toggled programmatically (so no light is lost), but they have no in-pixel writeable memory. As such, the active bucket is constrained to be the same for all pixels (Fig. 3, middle). This makes programmable per-pixel coding—and acquisition of illumination mosaics in particular—impossible without specialized optics (e.g., [17]). Multi-bucket (a.k.a. “multi-tap”) sensors [39,40,41,42] have more than two buckets in each pixel, but they have no writeable memory either, so per-pixel coding is not possible. In theory, a sensor with \(K\) buckets per pixel would be uniquely suited for dense one-shot reconstruction because it can acquire, in each frame, \(K\) full-resolution images corresponding to any set of \(K\) illuminations [43]. In practice, however, C2B sensors have several advantages: they are scalable, because they can pack any number of linearly-independent images into one frame without hard-wiring that number into the pixel’s CMOS design; they are much more light efficient, because each extra bucket significantly reduces the pixel’s photo-sensitive region for a given pixel size; and they have a broader range of applications because they enable per-pixel coding. To our knowledge, 2D sensors with more than four buckets have not been fabricated in a standard CMOS image sensor process, and it is unclear whether they could offer acceptable imaging performance.

On the conceptual side, our contributions are the following: (1) we put forth a general model for the C2B camera that opens up new directions for coded-exposure imaging with active sources; (2) we formulate its control as a novel multiplexing problem [31, 44,45,46,47,48,49] in the bucket and pixel domains; (3) we draw a connection between two-bucket imaging and algorithms that operate directly on intensity ratios [50]; and (4) we provide an algorithm-independent framework for dense one-shot reconstruction that is simpler than earlier attempts [18] and is compatible with standard image processing pipelines.

Last but not least, we demonstrate all the above experimentally, on the first fully-operational C2B camera prototype.

2 Coded Two-Bucket Imaging

We begin by introducing an image formation model for C2B cameras. We consider the most general setting in this section, where a whole sequence of C2B frames may be acquired instead of just one.

C2B cameras output two images per video frame—one for each bucket (Fig. 2). We refer to these images as the bucket-\(1\) image and bucket-\(0\) image.

Fig. 4. (a) Structure of the code tensor \(\mathsf {C}\). (b) Image formation model for pixel \(p\). We show the transport vector \({\mathbf {t}}^{p}\) for a structured-light setting, where one element of the vector holds the contribution of ambient illumination, the corresponding projector pixel is \(l\), and the scene point’s albedo is \({\mathbf {t}}^{p}[{l}] = a\).

The Code Tensor. Programming a C2B camera amounts to specifying the time-varying contents of its pixels’ memories at two different timescales: (1) at the scale of sub-frames within a video frame, which correspond to the updates of the in-pixel memories (Fig. 1, right), and (2) at the scale of frames within a video sequence. For a video sequence with \(F\) frames, acquired by a camera that has \(P\) pixels and supports \(S\) sub-frames per frame, bucket activities can be represented as a three-dimensional binary tensor \(\mathsf {C}\) indexed by pixel, frame and sub-frame. We call \(\mathsf {C}\) the code tensor (Fig. 4a).

We use two specific 2D “slices” of the code tensor in our analysis below, and have special notation for them. For a specific pixel \(p\), slice \({\mathbf {C}}^{p}\) describes the activity of pixel \(p\)’s buckets across all frames and sub-frames. Similarly, for a specific frame \(f\), slice \({\mathbf {C}}^{}_{f}\) describes the bucket activity of all pixels across all sub-frames of \(f\):

$$\begin{aligned} {\mathbf {C}}^{p}~\overset{\text {def}}{=}~\begin{bmatrix} {\mathbf {c}}^{p}_{1} \\ \vdots \\ {\mathbf {c}}^{p}_{F} \end{bmatrix} \ \ , \qquad {\mathbf {C}}^{}_{f}~\overset{\text {def}}{=}~\begin{bmatrix} {\mathbf {c}}^{}_{f1}&\cdots&{\mathbf {c}}^{}_{fS} \end{bmatrix} \ \ , \end{aligned}$$
(1)

where \({\mathbf {c}}^{p}_{f}\) is an \(S\)-dimensional row vector that specifies the active bucket of pixel \(p\) in the \(S\) sub-frames of frame \(f\); and \({\mathbf {c}}^{}_{fs}\) is a \(P\)-dimensional column vector that specifies the active bucket of all \(P\) pixels in sub-frame \(s\) of frame \(f\).
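For concreteness, the following NumPy sketch (ours, not the authors’ code) stores a toy code tensor as an array indexed as (pixel, frame, sub-frame)—an ordering chosen only for illustration—and extracts the two slices defined in Eq. (1):

```python
import numpy as np

P, F, S = 6, 3, 4                        # pixels, frames, sub-frames (toy sizes)
rng = np.random.default_rng(0)
C = rng.integers(0, 2, size=(P, F, S))   # code tensor: C[p, f, s] = bucket-1 activity

def pixel_slice(C, p):
    """Slice C^p: an F x S matrix whose rows are the vectors c^p_f."""
    return C[p]

def frame_slice(C, f):
    """Slice C_f: a P x S matrix whose columns are the vectors c_{fs}."""
    return C[:, f, :]

print(pixel_slice(C, 0).shape, frame_slice(C, 1).shape)   # (3, 4) (6, 4)
```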

The Illumination Matrix. Although C2B cameras can be used for passive imaging applications [15], we model the case where illumination is programmable at sub-frame timescales too. In particular, we represent the scene’s time-varying illumination condition as an illumination matrix \({\mathbf {L}}\) that applies to all frames:

$$\begin{aligned} {\mathbf {L}}~\overset{\text {def}}{=}~\begin{bmatrix} {\mathbf {l}}_{1} \\ \vdots \\ {\mathbf {l}}_{S} \end{bmatrix} \ \ , \end{aligned}$$
(2)

where row vector \({\mathbf {l}}_{s}\) denotes the scene’s illumination condition in sub-frame \(s\) of every frame. We consider two types of scene illumination in this work: a set of directional light sources whose intensity is given by vector \({\mathbf {l}}_{s}\); and a projector that projects a pattern specified by the leading elements of \({\mathbf {l}}_{s}\) in the presence of ambient light, which we treat as one extra source that is “always on” (i.e., its corresponding element of \({\mathbf {l}}_{s}\) is equal to 1 for all \(s\)).

Two-Bucket Image Formation Model for Pixel \(p\). Let \({\mathbf {i}}^{p}_{}\) and \({\mathbf {\hat{i}}}^{p}_{}\) be column vectors holding the intensities of pixel \(p\)’s bucket 1 and bucket 0, respectively, in all \(F\) frames. We model these intensities as the result of light transport from the light sources to the pixel’s two buckets (Fig. 4b):

$$\begin{aligned} \begin{bmatrix} {\mathbf {i}}^{p}_{} \\ {\mathbf {\hat{i}}}^{p}_{} \end{bmatrix} ~=~ \begin{bmatrix} {\mathbf {C}}^{p} \\ \overline{{\mathbf {C}}^{p}} \end{bmatrix} {\mathbf {L}}\,{\mathbf {t}}^{p} \ \ , \end{aligned}$$
(3)

where \(\overline{{\mathbf {b}}}\) denotes the binary complement of a matrix or vector \({\mathbf {b}}\), \({\mathbf {C}}^{p}\) is the slice of the code tensor corresponding to \(p\), and \({\mathbf {t}}^{p}\) is the pixel’s transport vector. Element \({\mathbf {t}}^{p}[{l}]\) of this vector describes the transport of light from source \(l\) to pixel \(p\) in the timespan of one sub-frame, across all light paths.
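The model of Eq. (3) is easy to simulate. The sketch below is our own illustration with toy sizes and assumed variable names (Cp for \({\mathbf {C}}^{p}\), Lmat for \({\mathbf {L}}\), t for \({\mathbf {t}}^{p}\)); it also checks that, together, the two buckets collect all the light arriving during a frame:

```python
import numpy as np

F, S, L = 3, 4, 4                        # frames, sub-frames, light sources (toy sizes)
rng = np.random.default_rng(1)

Cp   = rng.integers(0, 2, size=(F, S)).astype(float)   # C^p: bucket-1 activities of pixel p
Lmat = rng.random((S, L))                               # illumination matrix (one row per sub-frame)
t    = rng.random(L)                                    # transport vector t^p

# Eq. (3): bucket 1 integrates the sub-frames where Cp is 1, bucket 0 the complement.
i_b1 = Cp @ (Lmat @ t)                   # F bucket-1 intensities, one per frame
i_b0 = (1.0 - Cp) @ (Lmat @ t)           # F bucket-0 intensities

# Sanity check: no light is ever blocked, so the buckets sum to the total per-frame exposure.
assert np.allclose(i_b1 + i_b0, np.ones(S) @ (Lmat @ t))
```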

To gain some intuition about Eq. (3), consider the buckets’ intensities in frame \(f\):

$$\begin{aligned} {\mathbf {i}}^{p}_{}[{f}]~=~{\mathbf {c}}^{p}_{f}\,{\mathbf {L}}\,{\mathbf {t}}^{p} \ \ , \qquad {\mathbf {\hat{i}}}^{p}_{}[{f}]~=~{\mathbf {\overline{c}}}^{p}_{f}\,{\mathbf {L}}\,{\mathbf {t}}^{p} \ \ . \end{aligned}$$
(4)

In effect, the two buckets of pixel \(p\) can be thought of as “viewing” the scene under two potentially different illuminations given by the vectors \({\mathbf {c}}^{p}_{f}{\mathbf {L}}\) and \({\mathbf {\overline{c}}}^{p}_{f}{\mathbf {L}}\), respectively. Moreover, if \({\mathbf {c}}^{p}_{f}\) varies from frame to frame, these illumination conditions may vary as well.

Bucket Ratios as Albedo “Quasi-Invariants”. Since the two buckets of pixel \(p\) generally represent different illumination conditions, the two ratios

$$\begin{aligned} r~=~\frac{{\mathbf {i}}^{p}_{}[{f}]}{{\mathbf {i}}^{p}_{}[{f}]+{\mathbf {\hat{i}}}^{p}_{}[{f}]} \ \ , \qquad \hat{r}~=~\frac{{\mathbf {\hat{i}}}^{p}_{}[{f}]}{{\mathbf {i}}^{p}_{}[{f}]+{\mathbf {\hat{i}}}^{p}_{}[{f}]} \ \ , \end{aligned}$$
(5)

defined by \(p\)’s buckets are illumination ratios [50,51,52]. Moreover, we show in [32] that under zero-mean Gaussian image noise, these ratios are well approximated by Gaussian random variables whose means are the ideal (noiseless) ratios and whose standard deviations depend weakly on albedo. In effect, C2B cameras provide two “albedo-invariant” images per frame. We exploit this feature of C2B cameras for both shape recovery and demosaicing in Sects. 3 and 5, respectively.

2.1 Acquiring Two-Bucket Illumination Mosaics

A key feature of C2B cameras is that they offer an important alternative to multi-frame acquisition: instead of capturing \(F\) frames in sequence, they can capture a spatially-multiplexed version of them in a single C2B frame (Fig. 2). We call such a frame a two-bucket illumination mosaic in analogy to the RGB filter mosaics of color image sensors [12, 53, 54]. Unlike filter mosaics, however, which are attached to the sensor and cannot be changed, acquisition of illumination mosaics is programmable for any number of multiplexed frames \(F\).

The Bucket-1 and Bucket-0 Image Sequences. Collecting the two buckets’ intensities in Eq. (4) across all frames and pixels, we define two matrices \({\mathbf {I}}\) and \({\mathbf {\hat{I}}}\) that hold all this data:

$$\begin{aligned} {\mathbf {I}}~\overset{\text {def}}{=}~\begin{bmatrix} {\mathbf {i}}^{1}_{}&\cdots&{\mathbf {i}}^{P}_{} \end{bmatrix} \ \ , \qquad {\mathbf {\hat{I}}}~\overset{\text {def}}{=}~\begin{bmatrix} {\mathbf {\hat{i}}}^{1}_{}&\cdots&{\mathbf {\hat{i}}}^{P}_{} \end{bmatrix} \ \ , \end{aligned}$$
(6)

i.e., column \(p\) of \({\mathbf {I}}\) (respectively \({\mathbf {\hat{I}}}\)) holds the \(F\) bucket-1 (respectively bucket-0) intensities of pixel \(p\).

Code Tensor for Mosaic Acquisition. Formally, a two-bucket illumination mosaic is a spatial sub-sampling of the sequences \({\mathbf {I}}\) and \({\mathbf {\hat{I}}}\) in Eq. (6). Acquiring it amounts to specifying a one-frame code tensor \(\widetilde{\mathsf {C}}\) that spatially multiplexes the corresponding \(F\)-frame tensor in Fig. 4(a). We do this by (1) defining a regular tiling of the sensor plane and (2) specifying a correspondence \(p_i \rightarrow f_i\) between the pixels \(p_i\) in a tile and the \(F\) frames. The per-pixel rows of \(\widetilde{\mathsf {C}}\) are then defined to be

$$\begin{aligned} \widetilde{{\mathbf {c}}}^{p_i}_{1}~\overset{\text {def}}{=}~{\mathbf {c}}^{p_i}_{f_i} \ \ .\end{aligned}$$
(7)

Mosaic Acquisition Example. The C2B frames in Fig. 2 were captured using a \(2\times 2\) pixel tile to spatially multiplex a three-frame code tensor. The tensor assigned identical illumination conditions to all pixels within a frame and different conditions across frames. Pixels within each tile were assigned to individual frames using the correspondence \(\{(1,1) \rightarrow 1,~(1,2) \rightarrow 2,~(2,1) \rightarrow 2,~(2,2) \rightarrow 3\}\).
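To make Eq. (7) concrete, the sketch below (our own illustration, with hypothetical array shapes) builds a one-frame code tensor from a three-frame tensor using the \(2\times 2\) tile correspondence of the example above:

```python
import numpy as np

H, W, F, S = 4, 4, 3, 4                       # toy sensor size, frames, sub-frames
rng = np.random.default_rng(3)
C = rng.integers(0, 2, size=(H, W, F, S))     # F-frame code tensor: one F x S slice per pixel

# Correspondence between positions in a 2x2 tile and the F frames (example above).
tile_to_frame = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 2}

# Eq. (7): every pixel of the one-frame tensor copies the code row of its assigned frame.
C_mosaic = np.empty((H, W, 1, S), dtype=C.dtype)
for y in range(H):
    for x in range(W):
        f = tile_to_frame[(y % 2, x % 2)]
        C_mosaic[y, x, 0] = C[y, x, f]        # tilde{c}^{p_i}_1 = c^{p_i}_{f_i}
```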

3 Per-Pixel Estimation of Normals and Disparities

Let us now turn to the problem of normal and disparity estimation using photometric stereo and structured-light triangulation, respectively. We consider the most basic formulation of these tasks, where all computations are done independently at each pixel and the relation between observations and unknowns is expressed as a system of linear equations. These formulations should be treated merely as examples that showcase the special characteristics of two-bucket imaging; as with conventional cameras, using advanced methods to handle more general settings [55, 56] is certainly possible.

Table 1. The two basic multi-image reconstruction techniques considered in this work.

From Bucket Intensities to Demultiplexed Intensities. As a starting point, we expand Eq. (3) to get a relation that involves only intensities:

$$\begin{aligned} \begin{bmatrix} {\mathbf {i}}^{p}_{} \\ {\mathbf {\hat{i}}}^{p}_{} \end{bmatrix} ~=~ \underbrace{\begin{bmatrix} {\mathbf {C}}^{p} \\ \overline{{\mathbf {C}}^{p}} \end{bmatrix}}_{{\mathbf {W}}} \begin{bmatrix} i^{p}_{1} \\ \vdots \\ i^{p}_{S} \end{bmatrix} \ \ , \qquad i^{p}_{s}~\overset{\text {def}}{=}~{\mathbf {l}}_{s}\,{\mathbf {t}}^{p} \ \ . \end{aligned}$$
(8)

Each scalar \(i^{p}_{s}\) in the right-hand side of Eq. (8) is the intensity that a conventional camera pixel would record if the scene’s illumination condition was \({\mathbf {l}}_{s}\). Therefore, Eq. (8) tells us that as far as a single pixel \(p\) is concerned, C2B cameras capture the same measurements a conventional camera would capture for 3D reconstruction—except that these measurements are multiplexed over bucket intensities. To retrieve them, these intensities must be demultiplexed by inverting Eq. (8):

$$\begin{aligned} \begin{bmatrix} i^{p}_{1} \\ \vdots \\ i^{p}_{S} \end{bmatrix} ~=~ ({\mathbf {W}}'{\mathbf {W}})^{-1}{\mathbf {W}}' \begin{bmatrix} {\mathbf {i}}^{p}_{} \\ {\mathbf {\hat{i}}}^{p}_{} \end{bmatrix} \ \ , \end{aligned}$$
(9)

where \({}'\) denotes matrix transpose. This inversion is only possible if \({\mathbf {W}}'{\mathbf {W}}\) is non-singular. Moreover, the signal-to-noise ratio (SNR) of the demultiplexed intensities depends heavily on \(F\) and on the choice of \({\mathbf {W}}\) (Sect. 4). Setting aside this issue for now, we consider below the task of shape recovery from already-demultiplexed intensities. For notational simplicity, we drop the pixel index \(p\) from the equations below.
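For concreteness, the sketch below (ours; the code slice used is just one valid choice, not the optimal matrix of Sect. 4) simulates Eqs. (8) and (9) for a single pixel:

```python
import numpy as np

F, S = 3, 4
rng = np.random.default_rng(4)

# A valid (not necessarily optimal) code slice; the resulting W below has full column rank.
Cp = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.]])
W = np.vstack([Cp, 1.0 - Cp])                 # 2F x S bucket-multiplexing matrix of Eq. (8)

i_true = rng.random(S)                        # intensities under the S illuminations
meas = W @ i_true + 0.01 * rng.standard_normal(2 * F)    # noisy bucket measurements

# Eq. (9): least-squares demultiplexing, equivalent to applying (W'W)^{-1} W'.
i_hat, *_ = np.linalg.lstsq(W, meas, rcond=None)
print(np.abs(i_hat - i_true).max())           # small residual error
```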

Per-Pixel Constraints on 3D Shape. The relation between demultiplexed intensities and the pixel’s unknowns takes exactly the same form in both photometric stereo and structured-light triangulation with cosine patterns:

$$\begin{aligned} \begin{bmatrix} i^{}_{1} \\ \vdots \\ i^{}_{S} \end{bmatrix} ~=~ a_{}\,{\mathbf {D}}\,{\mathbf {x}} + {\mathbf {e}} \ \ , \end{aligned}$$
(10)

where \({\mathbf {D}}\) is a known \(S \times 3\) matrix whose rows are the vectors \({\mathbf {d}}_{1},\ldots ,{\mathbf {d}}_{S}\); \({\mathbf {x}}\) is a 3D vector that contains the pixel’s shape unknowns; \(a_{}\) is the unknown albedo; and \({\mathbf {e}}\) is observation noise. See Table 1 for a summary of each problem’s assumptions and for the mapping of problem-specific quantities to Eq. (10).

There are (at least) three ways to turn Eq. (10) into a constraint on normals and disparities under zero-mean Gaussian noise. The resulting constraints are not equivalent when combining measurements from small pixel neighborhoods—as we implicitly do—because they are not equally invariant to spatial albedo variations:

  1. Direct method (DM): treat Eq. (10) as providing independent constraints on the vector \(a_{}{\mathbf {x}}\) and solve for both \(a_{}\) and \({\mathbf {x}}\). The advantage of this approach is that errors are Gaussian by construction; its disadvantage is that Eq. (10) depends on albedo.

  2. Ratio constraint (R): divide individual intensities by their total sum to obtain an illumination ratio, as in Eq. (5). This yields the following constraint on \({\mathbf {x}}\):

     $$\begin{aligned} \left( {\mathbf {d}}_{l} - r_{l}\,{\mathbf {1}}{\mathbf {D}} \right) {\mathbf {x}}~=~0 \ \ , \end{aligned}$$
     (11)

     where \(r_{l} = i^{}_{l}/\sum _{k} i^{}_{k}\) and \({\mathbf {1}}\) is a row vector of all ones. The advantage here is that both \(r_{l}\) and Eq. (11) are approximately invariant to albedo.

  3. Cross-product constraint (CP): instead of computing an explicit ratio from Eq. (10), eliminate \(a_{}\) to obtain

    $$\begin{aligned} i^{}_{l}{\mathbf {d}}_{k}{\mathbf {x}}= i^{}_{k}{\mathbf {d}}_{l}{\mathbf {x}}. \end{aligned}$$
    (12)

    Since Eq. (12) has intensities \(i^{}_{l},i^{}_{k}\) as factors, it does implicitly depend on albedo.

Solving for the Unknowns. Both structured light and photometric stereo require at least three independent constraints per pixel for a unique solution. In the DM method, we use least-squares to solve for \(a_{}{\mathbf {x}}\); when using the R or CP constraints, we apply singular-value decomposition to solve for \({\mathbf {x}}\).
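The sketch below illustrates how the DM and R estimates might be computed at a single pixel, using made-up directional-lighting data in place of the Table 1 quantities; it is our own illustration of the constraints above, not the authors’ implementation:

```python
import numpy as np

rng = np.random.default_rng(5)
S = 4
D = np.array([[ 0.5,  0.0, 0.87],             # known S x 3 matrix (here: light directions)
              [-0.5,  0.0, 0.87],
              [ 0.0,  0.5, 0.87],
              [ 0.0, -0.5, 0.87]])
n_true = np.array([0.2, -0.3, 0.93]); n_true /= np.linalg.norm(n_true)
albedo = 0.7
i = albedo * D @ n_true + 1e-3 * rng.standard_normal(S)   # demultiplexed intensities, Eq. (10)

# Direct method (DM): least squares for a*x, then split into magnitude and direction.
ax, *_ = np.linalg.lstsq(D, i, rcond=None)
a_dm, n_dm = np.linalg.norm(ax), ax / np.linalg.norm(ax)

# Ratio constraint (R): one row (d_l - r_l * 1 D) per intensity, nullspace via SVD (Eq. 11).
r = i / i.sum()
A = D - np.outer(r, np.ones(S) @ D)
n_r = np.linalg.svd(A)[2][-1]                 # right singular vector of smallest singular value
n_r *= np.sign(n_r @ n_true)                  # resolve the sign ambiguity for comparison
print(np.allclose(n_dm, n_true, atol=1e-2), np.allclose(n_r, n_true, atol=1e-2))
```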

4 Code Matrices for Bucket Multiplexing

The previous section gave ways to solve for 3D shape when we have enough independent constraints per pixel. Here we consider the problem of controlling a C2B camera to actually obtain them for a pixel \(p\). In particular, we show how to choose (1) the number of frames \(F\), (2) the number of sub-frames per frame \(S\), and (3) the pixel-specific slice \({\mathbf {C}}^{p}\) of the code tensor, which defines the multiplexing matrix \({\mathbf {W}}\) in Eq. (8).

Determining these parameters can be thought of as an instance of the optimal multiplexing problem [31, 44,45,46,47,48,49]. This problem has been considered in numerous contexts before, as a one-to-one mapping from desired measurements to actual, noisy observations. In the case of coded two-bucket imaging, however, the problem is slightly different because each frame yields two measurements instead of just one.

The results below provide further insight into this particular multiplexing problem (see [32] for proofs). Observation 1 implies that even though a pixel’s two buckets provide \(2F\) measurements in total across \(F\) frames, at most \(F+1\) of them can be independent because the multiplexing matrix \({\mathbf {W}}\) is rank-deficient:

Observation 1

\(\mathrm{rank}\,{\mathbf {W}} \le F + 1\).

Intuitively, a C2B camera should not be thought of as being equivalent to two coded-exposure cameras that operate completely independently. This is because the activities of a pixel’s two buckets are binary complements of each other, and thus not independent.
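Observation 1 is easy to check numerically: the bucket-0 rows of the multiplexing matrix are complements of the bucket-1 rows, so they add at most one new dimension (the all-ones vector) to the row space. A small self-contained check (ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
F, S = 3, 6
for _ in range(1000):
    Cp = rng.integers(0, 2, size=(F, S)).astype(float)
    W = np.vstack([Cp, 1.0 - Cp])             # bucket-multiplexing matrix of Eq. (8)
    assert np.linalg.matrix_rank(W) <= F + 1  # Observation 1
```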

Corollary 1

Multiplexing \(S\) intensities requires \(F \ge S - 1\) frames.

Corollary 2

The minimal configuration for fully-constrained reconstruction at a pixel \(p\) is \(F = 2\) frames, \(S = 3\) sub-frames per frame, and three linearly-independent illumination vectors. The next-highest configuration is 3 frames, 4 sub-frames/illumination vectors.

Table 2. Optimal code matrices for small \(S\). Note that the lower bound given by Eq. (13) is attained only for the smallest Hadamard-based construction of Proposition 1.

We now seek the optimal multiplexing matrix \({\mathbf {W}}\), i.e., the matrix that maximizes the SNR of the demultiplexed intensities in Eq. (9). Lemma 1 extends the lower-bound analysis of Ratner et al. [45] to obtain a lower bound on the mean-squared error (MSE) of two-bucket multiplexing [32]:

Lemma 1

For every multiplexing matrix \({\mathbf {W}}\), the MSE of the best unbiased linear estimator satisfies the lower bound

(13)

Although Lemma 1 does not provide an explicit construction, it does ensure the optimality of matrices whose MSEs achieve the lower bound. We used this observation to prove the optimality of matrices derived from the standard Hadamard construction [31]:

Proposition 1

Let \({\mathbf {C}}^{p}\) be the binary \((S-1) \times S\) matrix obtained from the \(S \times S\) Hadamard matrix by removing its row of ones and mapping the remaining entries \(\pm 1\) to \(\{0,1\}\). The bucket-multiplexing matrix \({\mathbf {W}}\) defined by this \({\mathbf {C}}^{p}\) is optimal.

Proposition 1 applies only to values of \(S\) for which an \(S \times S\) Hadamard matrix exists. Since our main goal is one-shot acquisition, optimal matrices for other small values of \(S\) are also of significant interest. To find them, we conducted a brute-force search over the space of small binary matrices \({\mathbf {C}}^{p}\) to find the ones with the lowest MSE. These matrices are shown in Table 2. See Fig. 6(a), (b) and [32] for an initial empirical SNR analysis.
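A search of this kind can be sketched as follows; here we score candidate code slices by \(\mathrm{tr}\,(({\mathbf {W}}'{\mathbf {W}})^{-1})\), which is proportional to the demultiplexing MSE under i.i.d. Gaussian noise and serves only as our stand-in for the paper’s exact objective:

```python
import itertools
import numpy as np

def mse_score(Cp):
    """Proportional to the demultiplexing MSE under i.i.d. Gaussian noise (lower is better)."""
    W = np.vstack([Cp, 1.0 - Cp])
    G = W.T @ W
    if np.linalg.matrix_rank(G) < G.shape[0]:
        return np.inf                         # degenerate code: cannot demultiplex
    return np.trace(np.linalg.inv(G))

F, S = 2, 3                                   # smallest configuration (Corollary 2)
best, best_score = None, np.inf
for bits in itertools.product([0.0, 1.0], repeat=F * S):
    Cp = np.asarray(bits).reshape(F, S)
    score = mse_score(Cp)
    if score < best_score:
        best, best_score = Cp, score
print(best, best_score)
```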

Fig. 5. Processing ratio mosaics. Left to right: intermediate results of the BRD reconstruction procedure of Sect. 5, starting from the raw C2B frame shown in Fig. 2, Step 1. In contrast to the result of Steps 2 and 3 in Fig. 2, the images above are largely unaffected by albedo variations.

5 One-Shot Shape from Two-Bucket Illumination Mosaics

We use three different ways of estimating shape from a two-bucket illumination mosaic:

  1. Intensity demosaicing (ID): treat the intensities in a mosaic tile as separate “imaging dimensions” for the purpose of demosaicing; upsample these intensities to full resolution by applying either an RGB demosaicing algorithm to three of these dimensions at a time or a more general assorted-pixel procedure [12, 54] that takes all of them into account; demultiplex the upsampled images using Eq. (9); and apply any of the estimation methods in Sect. 3 to the result. Figure 2 illustrates this approach, and a sketch of the full pipeline appears after this list.

  2. Bucket-ratio demosaicing (BRD): apply Eq. (5) to each pixel in the mosaic to obtain an albedo-invariant two-bucket ratio mosaic; demosaic and demultiplex the ratio images; and compute 3D shape using the ratio constraint of Sect. 3. See Fig. 5 for an example.

  3. No demosaicing (ND): instead of upsampling, treat each mosaic tile as a “super-pixel” whose unknowns (i.e., normal, disparity, etc.) do not vary within the tile; then compute one shape estimate per tile using any of the methods of Sect. 3.
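The sketch below walks through steps 1–3 of the ID pipeline on a synthetic mosaic (our own toy example: nearest-neighbour upsampling stands in for demosaicing, and the code rows are an arbitrary valid choice rather than the optimal ones):

```python
import numpy as np

rng = np.random.default_rng(7)
H, W, S = 8, 8, 4
tile_to_frame = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 2}   # Sect. 2.1 example, F = 3

# Per-frame code rows (a toy choice that keeps the multiplexing matrix invertible).
C = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.]])

# Piecewise-constant toy scene (constant over each 2x2 tile) under S illuminations.
scene = np.repeat(np.repeat(rng.random((H // 2, W // 2, S)), 2, axis=0), 2, axis=1)

# Simulate the two bucket images of one C2B illumination mosaic.
b1 = np.zeros((H, W)); b0 = np.zeros((H, W))
for y in range(H):
    for x in range(W):
        c = C[tile_to_frame[(y % 2, x % 2)]]
        b1[y, x] = c @ scene[y, x]
        b0[y, x] = (1 - c) @ scene[y, x]

# Steps 1-2: de-interleave by tile position and upsample (nearest neighbour stands in
# for demosaicing here).
def upsample(img, dy, dx):
    return np.repeat(np.repeat(img[dy::2, dx::2], 2, axis=0), 2, axis=1)

channels, rows = [], []
for (dy, dx), f in tile_to_frame.items():
    channels.append(upsample(b1, dy, dx)); rows.append(C[f])
    channels.append(upsample(b0, dy, dx)); rows.append(1 - C[f])
channels = np.stack(channels, axis=-1)        # H x W x 8 upsampled bucket images
Wmat = np.array(rows)                         # 8 x S multiplexing matrix

# Step 3: joint per-pixel demultiplexing (Eq. 9). Step 4 would proceed as in Sect. 3.
demux = channels @ np.linalg.pinv(Wmat).T     # H x W x S full-resolution images
print(np.abs(demux - scene).max())            # ~0 for this piecewise-constant scene
```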

Performance Evaluation of One-Shot Photometric Stereo on Synthetic Data. Figures 6(c) and (d) analyze the effective resolution and albedo invariance of normal maps computed by several combinations of methods from Sects. 3 and 5, plus two more—Baseline, which applies basic photometric stereo to three full-resolution images; and Color, the one-shot color photometric stereo technique in [23]. To generate synthetic data, we (1) generated scenes with random spatially-varying normal maps and RGB albedo maps, (2) applied a spatial low-pass filter to the albedo maps and to the spherical coordinates of the normal maps, (3) rendered them to create three sets of images—a grayscale C2B frame; three full-resolution grayscale images; and a Bayer color mosaic—and (4) added zero-mean Gaussian noise to each pixel, corresponding to a peak SNR of 30 dB. Since all calculations except demosaicing are done per pixel, any frequency-dependent variations in performance must be due to this upsampling step. Our simulation results do match the intuition that performance should degrade for very high normal-map frequencies regardless of the type of neighborhood processing. For spatial frequencies up to 0.3 times the Nyquist limit, however, one-shot C2B imaging confers a substantial performance advantage. A similar evaluation for structured-light triangulation can be found in [32].

Fig. 6. (a) Optimal versus sub-optimal multiplexing. We applied bucket multiplexing to the scene shown in (b) and empirically measured the average SNR of the demultiplexed images when (1) the code matrix is the optimal one from Table 2 and (2) the code matrix is replaced by a non-degenerate but sub-optimal alternative built from the identity matrix (sub-optimal according to its MSE from Eq. (13)). The ratios of these SNRs are shown in blue, suggesting that SNR gains are possible. (b) One of the demultiplexed images obtained with each code matrix. The optimal matrix yielded visibly less noisy images (please zoom in to the electronic copy). (c) Angular root-mean-squared error (RMSE) of normal estimates as a function of the normal map’s highest spatial frequency. Frequency 1.0 corresponds to the Nyquist limit. The highest spatial frequency of the albedos was set to 0.3 times the Nyquist limit. (d) Angular error as a function of the spatial frequency of the albedo map, with the maximum spatial frequency of the normal map set to 0.3 times the Nyquist limit. Line colors are as indicated in (c). (Color figure online)

6 Live 3D Imaging with a C2B Camera

Experimental Conditions. Both C2B frame acquisition and scene reconstruction run at 20 Hz for all experiments, using \(S = 4\) sub-frames per frame, the corresponding optimal code matrix from Table 2, and the \(2\times 2\) mosaic tile defined in Sect. 2.1. C2B frames are always processed by the same sequence of steps—demosaicing, demultiplexing and per-pixel reconstruction. For structured light, we fit an 8 mm Schneider Cinegon f/1.4 lens to our camera with its aperture set to f/2, and use a TI LightCrafter to project \(684 \times 608\)-pixel, 24-gray-level patterns in sync with the sub-frames. The stereo baseline was approximately 20 cm, the scene was 1.1–1.5 m away, and the cosine frequency was 5 for all patterns and experiments. For photometric stereo we switch to a 23 mm Schneider APO-Xenoplan f/1.4 lens to approximate orthographic imaging conditions, and illuminate a scene 2–3 m away with four sub-frame-synchronized Luxdrive 7040 Endor Star LEDs, fitted with 26.5 mm Carclo Technical Plastics lenses.

Fig. 7. Quantitative experiments for photometric stereo (Row 1) and structured light (Rows 2, 3). Per-pixel unit normals \({\mathbf {n}}_{}\) are visualized by assigning to them the RGB color vector \(0.5{\mathbf {n}}_{}+0.5\).

Fig. 8. Live 3D acquisition experiments for photometric stereo (top) and structured light (bottom). Scenes were chosen to exhibit significant albedo, color, normal and/or depth variations, as well as discontinuities. For reference, color photos of these scenes are shown as insets in Column 1. Qualitatively, the reconstructions appear consistent with the scenes’ actual 3D geometry except in regions of low albedo (e.g., hair) or cast shadows. (Color figure online)

Quantitative Experiments. Our goal was to compare the 3D accuracy of one-shot C2B imaging against that of full-resolution sequential imaging—using the exact same system and algorithms. Figure 7 shows the static scenes used for these experiments, along with example reconstructions for photometric stereo and structured light, respectively. The “ground truth,” which served as our reference, was computed by averaging 1000 sequentially-captured, bucket-1 images per illumination condition and applying the same reconstruction algorithm to the lower-noise, averaged images. To further distinguish the impact of demosaicing from that of sensor-specific non-idealities, we also compute shape from a simulated C2B frame; to create it we spatially multiplex the averaged images computationally in a way that simulates the operation of our C2B sensor. Row 3 of Fig. 7 shows some of these comparisons for structured light. The BRD-R method, coupled with OpenCV’s demosaicing algorithm, yields the best performance in this case, corresponding to a disparity error of \(4\%\). See [32] for more details and additional results.

Reconstructing Dynamic Scenes. Figure 8 shows several examples.

7 Concluding Remarks

Our experiments relied on some of the very first images from a C2B sensor. Issues such as fixed-pattern noise; slight variations in gain across buckets and across pixels; and other minor non-idealities do still exist. Nevertheless, we believe that our preliminary results support the claim that 3D data are acquired at near-sensor resolution.

We intentionally used raw, unprocessed intensities and the simplest possible approaches for demosaicing and reconstruction. There is no doubt that denoised images and more advanced reconstruction algorithms could improve reconstruction performance considerably. Our use of generic RGB demosaicing software is also clearly sub-optimal, as their algorithms do not take into account the actual correlations that exist across C2B pixels. A prudent approach would be to train an assorted-pixel algorithm on precisely such data.

Last but certainly not least, we are particularly excited about C2B cameras sparking new vision techniques that take full advantage of their advanced imaging capabilities.